In my last post I talked about maintaining the workflow history lists throughout SharePoint. This post is about maintaining Nintex workflow progress data. This data is found in the WorkflowProgress table in a Nintex content database.
When a Nintex Workflow is executed, each action is recorded in the WorkflowProgress table. This gives the system the opportunity to show a graphical representation of the workflow history for a specific instance. You can imagine that this table can contain a lot of information. Nintex recommends to keep this table below 15 million rows to avoid performance issues.
To maintain this table, you can use the PurgeWorkflowData operation of nwadmin.
The PurgeWorkflowData operation has a lot of parameters to specify which records it needs to purge. These parameters might not be enough to limit the records you want to purge. There’s a little catch in purging workflow progress data. To allow proper maintenance of the workflow history data, you need to make sure that you only purge workflow progress data for workflow instances where NO workflow history exists anymore. This means that there’s an order in how to maintain both the history lists and the workflowprogress table:
- Workflow History Lists
- WorkflowProgress table
If you purge the WorkflowProgress table before you clean the workflow history, you will end up with history list items which cannot be purged anymore in a selective way (using the ‘PurgeHistoryListData’ operation of nwadmin). You can only purge them by clearing the list completely.
If you read my previous post, I offered an alternative way for maintaining the history lists and one of the options was to clear the list based on the type of event. This basically means that for a lot of workflow instances, some sort of history will be retained. So, you need to take this into account when maintaining the WorkflowProgress table and make sure you don’t remove the workflow progress data for those instances.
To be able to do this, I created a script which does the following:
- Query the WorkflowProgress table of a Nintex content database to get all workflowProgress data which is created at least 6 months ago.
- It tries to resolve the Site, Web and HistoryList for each workflow instance it found. When the resolution of one of these fail, the instance is obsolete and can be removed from the table.
- If the resolution succeeds, a CAML query is performed to see if the instance has any history data in the history list. When nothing is found, the instance is obsolete and can be removed from the table.
- For each obsolete instance, a SQL query is executed to delete all workflow progress for that instance
If you specify a -LogOnly switch, nothing is deleted. All obsolete instances are written to a file.
Be aware that this script can run for a long time, depending on the amount of instances it finds and needs to check.
Script
<#
.SYNOPSIS
Delete obsolete WorkflowProgress data.
.DESCRIPTION
Delete obsolete workflowProgress data.
.PARAMETER DBServer
Specifies the name of the database server where the Nintex database exists. If omitted, the local connection will be used.
.PARAMETER DBName
Specifies the name of the Nintex content database to target. If omitted, all content databases will be in scope.
.PARAMETER LogOnly
Include this switch to run in log-only mode to simulate the actual cancellations.
.NOTES
File Name: Delete-WorkflowProgressData.ps1
Author : Bart Kuppens
Version : 1.0
.EXAMPLE
PS C:\> .\Delete-WorkflowProgressData.ps1 -DBServer SHPDB -DBName NTX_PRD_Content
Description
-----------
This script will delete all rows in the WorflowProgress table of the specified database for those instances for which no Workflow History
was found.
#>
[CmdletBinding()]
param(
[Parameter(Position=0,Mandatory=$false,ValueFromPipeline=$false)]
[string]$DBServer,
[Parameter(Position=1,Mandatory=$false,ValueFromPipeline=$false)]
[string]$DBName,
[Parameter(Position=2,Mandatory=$false,ValueFromPipeline=$false)]
[switch]$LogOnly
)
Add-Type -TypeDefinition @"
public enum LogType
{
SUCCESS,
FAILURE,
INFO,
ERROR
}
"@
###############################
# Functions
###############################
function Run-SQLQuery($server, $database, $query, $parameters=@{})
{
try
{
$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = "Server=$server;Database=$database;Integrated Security=True"
$command = New-Object System.Data.SqlClient.SqlCommand($query,$connection)
$parameters.Keys | % {
$command.Parameters.AddWithValue("@$_",$parameters[$_])
}
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter
$adapter.SelectCommand = $command
$adapter.SelectCommand.CommandTimeout = 300
$dataset = New-Object System.Data.DataSet
$adapter.Fill($dataset) >> $null
$connection.Close()
$dataset.Tables[0]
}
catch
{
Write-Log -message "Error getting data from SQL ($($_.Exception.InnerException.Message))" -MessageType ERROR
}
}
function Run-SQLNonQuery($server, $database, $query, $parameters=@{})
{
try
{
$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = "Server=$server;Database=$database;Integrated Security=True"
$command = New-Object System.Data.SqlClient.SqlCommand($query,$connection)
$parameters.Keys | % {
$command.Parameters.AddWithValue("@$_",$parameters[$_])
}
$connection.Open()
$rowsDeleted = $command.ExecuteNonQuery()
$connection.Close()
$rowsDeleted
}
catch
{
Write-Log -message "Error deleting data from SQL ($($_.Exception.InnerException.Message))" -MessageType ERROR
}
}
function Write-Log([string]$message, [LogType]$MessageType)
{
$TimeStamp = $([DateTime]::Now.ToString('yyyy/MM/dd HH:mm:ss'))
$message = "$TimeStamp - $MessageType : $message"
Out-File -InputObject $message -FilePath $global:LogFile -Append
}
function Process-Database($server, $database)
{
$global:LogFile = "$ScriptLogLocation\ProgressDataCleanup_$($database)_" + $([DateTime]::Now.ToString('yyyyMMdd_HHmmss')) + ".log"
if ($LogOnly)
{
Write-Log -message "!!!!!!!! LOG-ONLY Mode activated !!!!!!!!" -MessageType INFO
}
Write-Log -message "Cleanup progress data in database '$database'" -MessageType INFO
Write-Log -message "Last Action Date : $LastActivityDateUTC" -MessageType INFO
Write-Progress -Activity "Processing Data in $database" -Status "Querying database... (might take a while)" -PercentComplete 95
$data = Run-SQLQuery -server $server -database $database -query $selectQuery -parameters @{LastActivityDateUTC="'$LastActivityDateUTC'"}
if ($data.Count -gt 0)
{
Write-Log -Message "# Instances in WorkflowProgress table : $($data.Count)" -MessageType INFO
if ((Get-PSSnapin Microsoft.SharePoint.PowerShell -ea SilentlyContinue) -eq $null)
{
Add-PSSnapin Microsoft.SharePoint.PowerShell
}
$instancesToDelete = 0
$instanceRowsToDelete = 0
$instanceIndx = 0
foreach ($row in $data)
{
$instanceIndex++
Write-Progress -Activity "Processing Data in $database" -Status "Processing instance $instanceIndex/$($data.Count) - Obsolete Data Collected : $($instancesToDelete):$($instanceRowsToDelete)" -PercentComplete 95
$instanceIsObsolete = $false
if ($oldHistoryListID -ne $row.HistoryListID)
{
# dispose old web object if needed
if ($web -ne $null) {$web.Dispose()}
$oldHistoryListID = $row.HistoryListID
$output = NWAdmin.exe -o ResolveSharePointIds -siteId $row.SiteID -webId $row.WebID -listId $row.HistoryListID
if (($output.Count -eq 1) -or ($output.Count -gt 1 -and $output[1].Contains("Command line execution error")))
{
# The site, web or list doesn't exist. The current instance is therefore obsolete.
$instanceIsObsolete = $true
}
else
{
$web = Get-SPWeb $output[0]
$historyList = $web.Lists.TryGetList($output[1])
}
}
if (!($instanceIsObsolete) -and $historyList -ne $null)
{
$spquery = New-Object Microsoft.SharePoint.SPQuery
$spquery.ViewFields = "<FieldRef Name='ID' />"
$spquery.ViewFieldsOnly = $true
$spquery.Query = [string]::Format($queryString,"{$($row.WorkflowInstanceID)}")
$listItems = $historyList.GetItems($spquery)
if ($listItems.Count -eq 0)
{
# No history items found for this instance. Instance data is obsolete
$instanceIsObsolete = $true
}
}
if ($instanceIsObsolete)
{
# we can remove all Workflow Progress for the instance
$instancesToDelete++
$instanceRowsToDelete += $row.ProgressData
if ($LogOnly.IsPresent)
{
Write-Log -message "$($row.ProgressData) obsolete rows found for instance $($row.WorkflowInstanceID)" -MessageType SUCCESS
}
else
{
$deletedInstances = Run-SQLNonQuery -server $server -database $database -query $deleteQuery -parameters @{WorkflowInstanceID=$row.WorkflowInstanceID}
Write-Log -message "Deleted $($deletedInstances[1]) rows for workflow instance $($row.WorkflowInstanceID)" -MessageType SUCCESS
}
}
}
Write-Log -message "Instances to clean : $instancesToDelete" -MessageType INFO
Write-Log -message "ProgressData Rows to remove : $instanceRowsToDelete" -MessageType INFO
}
}
###############################
# Parameters
###############################
$global:ScriptLogLocation = "D:\Logs\NintexCleanup" # The location where the output of the script is saved
$global:LogFile = ""
$MonthsToKeep = 6 # The amount of months events are kept
###############################
# Main Script
###############################
# Initialize the cutoff date.
if ($MonthsToKeep -eq 0)
{
$cutOffDate = Get-Date
}
else
{
$cutOffDate = (Get-Date).AddMonths(-$MonthsToKeep)
}
$LastActivityDateUTC = $cutOffDate.ToUniversalTime().ToString("yyyy-MM-dd HH:MM:ss")
$query = @"
SELECT wi.SiteID,
wi.WebID,
wi.HistoryListID,
wi.WorkflowInstanceID,
count(wp.InstanceID) as ProgressData
FROM WorkflowProgress wp
LEFT JOIN WorkflowInstance wi on wi.instanceID = wp.InstanceID
WHERE wi.state = 4
AND CONVERT(VARCHAR(19),wp.TimeStamp,120) <= @LastActivityDateUTC
GROUP BY wi.SiteID, wi.WebID, wi.HistoryListID, wi.WorkflowInstanceID
ORDER BY wi.SiteID, wi.WebID, wi.HistoryListID, wi.WorkflowInstanceID
"@
$deleteQuery = @"
DELETE wp
FROM WorkflowProgress wp
INNER JOIN WorkflowInstance wi on wi.instanceID = wp.InstanceID
WHERE wi.WorkflowInstanceID = @WorkflowInstanceID
"@
# Query to get items for the cleanup
$queryString = @"
<Where>
<Eq>
<FieldRef Name='WorkflowInstance'/>
<Value Type='Text'>{0}</Value>
</Eq>
</Where>"
"@
if ([string]::IsNullOrEmpty($DBServer) -or [string]::IsNullOrEmpty($DBName))
{
# No Database or DBServer was specified in the startup params. Process all Nintex content databases
[System.Reflection.Assembly]::LoadWithPartialName("Nintex.Workflow") | Out-Null
$configdb = [Nintex.Workflow.Administration.ConfigurationDatabase]::GetConfigurationDatabase()
if ($configdb -ne $null)
{
foreach ($contentdb in $configdb.ContentDatabases)
{
Process-Database -server $contentdb.SqlConnectionString["Data Source"] -database $contentdb.SqlConnectionString["Initial Catalog"]
}
}
}
else
{
# Database and server have been specified at startup. Process only that database.
Process-Database -server $DBServer -database $DBName
}