Maintaining Nintex Workflow Progress Data

In my last post I talked about maintaining the workflow history lists throughout SharePoint. This post is about maintaining Nintex workflow progress data. This data is found in the WorkflowProgress table in a Nintex content database.

When a Nintex Workflow is executed, each action is recorded in the WorkflowProgress table. This gives the system the opportunity to show a graphical representation of the workflow history for a specific instance. You can imagine that this table can contain a lot of information. Nintex recommends to keep this table below 15 million rows to avoid performance issues.

To maintain this table, you can use the PurgeWorkflowData operation of nwadmin.

The PurgeWorkflowData operation has a lot of parameters to specify which records it needs to purge. These parameters might not be enough to limit the records you want to purge. There’s a little catch in purging workflow progress data. To allow proper maintenance of the workflow history data, you need to make sure that you only purge workflow progress data for workflow instances where NO workflow history exists anymore. This means that there’s an order in how to maintain both the history lists and the workflowprogress table:

  1. Workflow History Lists
  2. WorkflowProgress table

If you purge the WorkflowProgress table before you clean the workflow history, you will end up with history list items which cannot be purged anymore in a selective way (using the ‘PurgeHistoryListData’ operation of nwadmin). You can only purge them by clearing the list completely.

If you read my previous post, I offered an alternative way for maintaining the history lists and one of the options was to clear the list based on the type of event. This basically means that for a lot of workflow instances, some sort of history will be retained. So, you need to take this into account when maintaining the WorkflowProgress table and make sure you don’t remove the workflow progress data for those instances.

To be able to do this, I created a script which does the following:

  • Query the WorkflowProgress table of a Nintex content database to get all workflowProgress data which is created at least 6 months ago.
  • It tries to resolve the Site, Web and HistoryList for each workflow instance it found. When the resolution of one of these fail, the instance is obsolete and can be removed from the table.
  • If the resolution succeeds, a CAML query is performed to see if the instance has any history data in the history list. When nothing is found, the instance is obsolete and can be removed from the table.
  • For each obsolete instance, a SQL query is executed to delete all workflow progress for that instance

If you specify a -LogOnly switch, nothing is deleted. All obsolete instances are written to a file.

Be aware that this script can run for a long time, depending on the amount of instances it finds and needs to check.

Script

<#
    .SYNOPSIS
        Delete obsolete WorkflowProgress data.

    .DESCRIPTION
        Delete obsolete workflowProgress data.
       
    .PARAMETER DBServer
        Specifies the name of the database server where the Nintex database exists. If omitted, the local connection will be used.
    
    .PARAMETER DBName
        Specifies the name of the Nintex content database to target. If omitted, all content databases will be in scope.

    .PARAMETER LogOnly
        Include this switch to run in log-only mode to simulate the actual cancellations.

    .NOTES
        File Name: Delete-WorkflowProgressData.ps1
        Author   : Bart Kuppens
        Version  : 1.0

    .EXAMPLE
        PS C:\> .\Delete-WorkflowProgressData.ps1 -DBServer SHPDB -DBName NTX_PRD_Content

        Description
        -----------
        This script will delete all rows in the WorflowProgress table of the specified database for those instances for which no Workflow History
        was found.

#>
[CmdletBinding()]
param(
    [Parameter(Position=0,Mandatory=$false,ValueFromPipeline=$false)] 
    [string]$DBServer,
    [Parameter(Position=1,Mandatory=$false,ValueFromPipeline=$false)] 
    [string]$DBName,
    [Parameter(Position=2,Mandatory=$false,ValueFromPipeline=$false)]
    [switch]$LogOnly
)

Add-Type -TypeDefinition @"
    public enum LogType
    {
        SUCCESS,
        FAILURE,
        INFO,
        ERROR
    }
"@

###############################
# Functions
###############################

function Run-SQLQuery($server, $database, $query, $parameters=@{})
{
    try
    {
        $connection = New-Object System.Data.SqlClient.SqlConnection
        $connection.ConnectionString = "Server=$server;Database=$database;Integrated Security=True"
        $command = New-Object System.Data.SqlClient.SqlCommand($query,$connection)
        $parameters.Keys | % {
            $command.Parameters.AddWithValue("@$_",$parameters[$_])
        }
        $adapter = New-Object System.Data.SqlClient.SqlDataAdapter
        $adapter.SelectCommand = $command
        $adapter.SelectCommand.CommandTimeout = 300
        $dataset = New-Object System.Data.DataSet
        $adapter.Fill($dataset) >> $null
        $connection.Close()
        $dataset.Tables[0]
    }
    catch
    {
        Write-Log -message "Error getting data from SQL ($($_.Exception.InnerException.Message))" -MessageType ERROR
    }
}

function Run-SQLNonQuery($server, $database, $query, $parameters=@{})
{
    try
    {
        $connection = New-Object System.Data.SqlClient.SqlConnection
        $connection.ConnectionString = "Server=$server;Database=$database;Integrated Security=True"
        $command = New-Object System.Data.SqlClient.SqlCommand($query,$connection)
        $parameters.Keys | % {
            $command.Parameters.AddWithValue("@$_",$parameters[$_])
        }
        $connection.Open()
        $rowsDeleted = $command.ExecuteNonQuery()
        $connection.Close()
        $rowsDeleted
    }
    catch
    {
        Write-Log -message "Error deleting data from SQL ($($_.Exception.InnerException.Message))" -MessageType ERROR
    }
}


function Write-Log([string]$message, [LogType]$MessageType)
{
    $TimeStamp = $([DateTime]::Now.ToString('yyyy/MM/dd HH:mm:ss'))
    $message = "$TimeStamp - $MessageType : $message"
    Out-File -InputObject $message -FilePath $global:LogFile -Append
}

function Process-Database($server, $database)
{
    $global:LogFile = "$ScriptLogLocation\ProgressDataCleanup_$($database)_" + $([DateTime]::Now.ToString('yyyyMMdd_HHmmss')) + ".log"
    if ($LogOnly)
    {
        Write-Log -message "!!!!!!!! LOG-ONLY Mode activated !!!!!!!!" -MessageType INFO
    }
    Write-Log -message "Cleanup progress data in database '$database'" -MessageType INFO
    Write-Log -message "Last Action Date    : $LastActivityDateUTC" -MessageType INFO
    Write-Progress -Activity "Processing Data in $database" -Status "Querying database... (might take a while)" -PercentComplete 95
    $data = Run-SQLQuery -server $server -database $database -query $selectQuery -parameters @{LastActivityDateUTC="'$LastActivityDateUTC'"}
    if ($data.Count -gt 0)
    {
        Write-Log -Message "# Instances in WorkflowProgress table : $($data.Count)" -MessageType INFO
        if ((Get-PSSnapin Microsoft.SharePoint.PowerShell -ea SilentlyContinue) -eq $null)
        {
            Add-PSSnapin Microsoft.SharePoint.PowerShell
        }

        $instancesToDelete = 0 
        $instanceRowsToDelete = 0
        $instanceIndx = 0
        foreach ($row in $data)
        {
            $instanceIndex++
            Write-Progress -Activity "Processing Data in $database" -Status "Processing instance $instanceIndex/$($data.Count) - Obsolete Data Collected : $($instancesToDelete):$($instanceRowsToDelete)" -PercentComplete 95
            $instanceIsObsolete = $false
            if ($oldHistoryListID -ne $row.HistoryListID)
            {
                # dispose old web object if needed
                if ($web -ne $null) {$web.Dispose()}

                $oldHistoryListID = $row.HistoryListID
                $output = NWAdmin.exe -o ResolveSharePointIds -siteId $row.SiteID -webId $row.WebID -listId $row.HistoryListID
                if (($output.Count -eq 1) -or ($output.Count -gt 1 -and $output[1].Contains("Command line execution error")))
                {
                    # The site, web or list doesn't exist. The current instance is therefore obsolete.
                    $instanceIsObsolete = $true
                }
                else
                {
                    $web = Get-SPWeb $output[0]
                    $historyList = $web.Lists.TryGetList($output[1])
                }
            }
            if (!($instanceIsObsolete) -and $historyList -ne $null)
            {
                $spquery = New-Object Microsoft.SharePoint.SPQuery 
                $spquery.ViewFields = "<FieldRef Name='ID' />" 
                $spquery.ViewFieldsOnly = $true 
                $spquery.Query = [string]::Format($queryString,"{$($row.WorkflowInstanceID)}") 
                $listItems = $historyList.GetItems($spquery)
                if ($listItems.Count -eq 0)
                {
                    # No history items found for this instance. Instance data is obsolete
                    $instanceIsObsolete = $true
                }
            }
            if ($instanceIsObsolete)
            {
                # we can remove all Workflow Progress for the instance
                $instancesToDelete++
                $instanceRowsToDelete += $row.ProgressData
                if ($LogOnly.IsPresent)
                {
                    Write-Log -message "$($row.ProgressData) obsolete rows found for instance $($row.WorkflowInstanceID)" -MessageType SUCCESS
                }
                else
                {
                    $deletedInstances = Run-SQLNonQuery -server $server -database $database -query $deleteQuery -parameters @{WorkflowInstanceID=$row.WorkflowInstanceID}
                    Write-Log -message "Deleted $($deletedInstances[1]) rows for workflow instance $($row.WorkflowInstanceID)" -MessageType SUCCESS
                }
            }
        }
        Write-Log -message "Instances to clean          : $instancesToDelete" -MessageType INFO
        Write-Log -message "ProgressData Rows to remove : $instanceRowsToDelete" -MessageType INFO
    }
}

###############################
# Parameters
###############################

$global:ScriptLogLocation = "D:\Logs\NintexCleanup"  # The location where the output of the script is saved
$global:LogFile = ""
$MonthsToKeep = 6                                    # The amount of months events are kept 

###############################
# Main Script
############################### 


# Initialize the cutoff date. 
if ($MonthsToKeep -eq 0) 
{ 
    $cutOffDate = Get-Date 
} 
else 
{ 
    $cutOffDate = (Get-Date).AddMonths(-$MonthsToKeep) 
}

$LastActivityDateUTC = $cutOffDate.ToUniversalTime().ToString("yyyy-MM-dd HH:MM:ss")
$query = @"
    SELECT wi.SiteID,
           wi.WebID,
           wi.HistoryListID,
           wi.WorkflowInstanceID,
           count(wp.InstanceID) as ProgressData
      FROM WorkflowProgress wp
      LEFT JOIN WorkflowInstance wi on wi.instanceID = wp.InstanceID
     WHERE wi.state = 4
       AND CONVERT(VARCHAR(19),wp.TimeStamp,120) <= @LastActivityDateUTC 
     GROUP BY wi.SiteID, wi.WebID, wi.HistoryListID, wi.WorkflowInstanceID
     ORDER BY wi.SiteID, wi.WebID, wi.HistoryListID, wi.WorkflowInstanceID
"@

$deleteQuery = @"
    DELETE wp
      FROM WorkflowProgress wp
     INNER JOIN WorkflowInstance wi on wi.instanceID = wp.InstanceID
     WHERE wi.WorkflowInstanceID = @WorkflowInstanceID
"@

# Query to get items for the cleanup 
$queryString = @"
<Where>
  <Eq>
    <FieldRef Name='WorkflowInstance'/>
    <Value Type='Text'>{0}</Value>
  </Eq>
</Where>" 
"@

if ([string]::IsNullOrEmpty($DBServer) -or [string]::IsNullOrEmpty($DBName))
{
    # No Database or DBServer was specified in the startup params. Process all Nintex content databases
    [System.Reflection.Assembly]::LoadWithPartialName("Nintex.Workflow") | Out-Null
    $configdb = [Nintex.Workflow.Administration.ConfigurationDatabase]::GetConfigurationDatabase()
    if ($configdb -ne $null)
    {
        foreach ($contentdb in $configdb.ContentDatabases)
        {
            Process-Database -server $contentdb.SqlConnectionString["Data Source"] -database $contentdb.SqlConnectionString["Initial Catalog"]
        }
    }
}
else
{
    # Database and server have been specified at startup. Process only that database.
    Process-Database -server $DBServer -database $DBName
} 

By Bart

Bart is a certified SharePoint consultant / architect at CTG Belgium NV with a broad professional experience in IT, a background in software development with a specialisation in Microsoft products and technologies and a solid knowledge and experience in Microsoft SharePoint Products and Technologies. He started as a COBOL developer on a mainframe environment and grew into software development for Windows platforms. Participated in projects varying from migrations of existing applications to development of Web applications and Windows applications. Became fascinated by the SharePoint 2007 platform and strongly believed in the added business value of this platform. Is since then fully committed to SharePoint and focuses on SharePoint implementations, migrations, integrations, design and coaching. Stays on top of new developments within the SharePoint technology stack and related technologies.