The Importance of Completing Tasks in Workflows

When you use workflows in your organization to automate business processes, you have to deal with the fact that people are lazy. They don’t like work. One of the things you do a lot in workflows is create tasks: somewhere in a process, somebody needs to complete a task before the next step in the process can start. So the task is created and your workflow waits until someone completes it. The importance of completing tasks in the context of workflows is often underestimated. Completing a task in the context of a workflow is actually a 2-step process, and because lots of people don’t seem to understand this, it leads to a lot of problems.

  • Step 1: Executing the action in the task
  • Step 2: Marking the task as completed

Let me give you an example of how this might lead to issues.

You have an onboarding workflow for new employees. This workflow initiates several other workflows.
One of those other workflows is the creation of an Exchange mailbox. This “Exchange Mailbox Creation” workflow creates a task for the Exchange admins which instructs them to create a mailbox for the new employee.

Meanwhile, the main workflow handles some other stuff which is important for onboarding the new employee. Somewhere at the end of the process, the main workflow is going to check if the mailbox has been created. As long as the mailbox is not created, the onboarding process cannot be completed.
To do this, the workflow contains a loop which checks if the Exchange Mailbox Creation workflow is completed. If it isn’t, it pauses for 1 hour and checks again. This repeats until the Exchange Mailbox Creation workflow is completed.

The problem is… that workflow will NEVER end, because the people who are responsible for creating Exchange mailboxes do understand the importance of creating a mailbox for a new employee, but don’t understand the importance of marking the task completed in the system. They don’t know that the overall onboarding process relies on the completion of those tasks.

Direct results:

  • The Exchange Mailbox Creation workflow will wait indefinitely unless someone sets that task to completed
  • The onboarding process for this employee will never finish because the Exchange Mailbox Creation workflow never finishes

Indirect results:

  • The history list will grow constantly because of the loop, which generates 2 list items every hour (for the pause action). Read this post to understand the importance of maintaining a history list.
  • The WorkflowProgress table will grow constantly because of the same loop. Read this post to understand the importance of WorkflowProgress data.

A workflow designer has the responsibility of covering this by:

  • Including reminder notifications in the tasks
  • Having some kind of escalation procedure in case nobody bothers to complete those tasks. Escalating to a manager often helps
  • Making sure that the task notifications stress the importance of completing the tasks for proper process termination. This should be emphasized in the reminder notifications and escalations as well

For the example of the onboarding process, I had 91 running instances which had been running for months, some up to a year, because of these uncompleted tasks. These 91 instances are responsible for a history list that grows by almost 4,400 list items every single day.

You can avoid all of this by thinking through your process and including logic in your workflows and tasks to cover these situations. But if you are tasked with cleaning up an environment where these kinds of issues have existed for a while, you might be facing thousands of running workflows that keep filling your history lists and WorkflowProgress tables with useless information, and you need a way to stop this madness.

If you are using Nintex Workflow, you can use the script below, which allows you to terminate Nintex workflows in batch. You specify your Nintex content database and a date, and the script will terminate all running instances that didn’t have any activity since the specified date.

Keep in mind that terminating workflows might result in notification emails being sent by the system.
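A sketch of such a script is shown below. Treat it as a starting point only: the table and column names (dbo.WorkflowInstance, dbo.WorkflowProgress, State, ActivityDate, WorkflowInstanceID) and the running-state value are assumptions about a typical Nintex content database, so verify them against your own schema, and test on a non-production farm first.

```powershell
# Sketch: terminate Nintex workflow instances with no activity since a cutoff date.
# Table/column names and the State value are ASSUMPTIONS - verify against your schema.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$connectionString = "Data Source=SQLSERVER;Initial Catalog=NintexContentDb;Integrated Security=SSPI"
$cutoff = "2015-01-01"   # terminate instances with no activity since this date

$sql = @"
SELECT I.SiteID, I.WebID, I.ListID, I.ItemID, I.WorkflowInstanceID
FROM dbo.WorkflowInstance I
WHERE I.State = 2 -- running (assumption)
AND NOT EXISTS (
    SELECT 1 FROM dbo.WorkflowProgress P
    WHERE P.InstanceID = I.InstanceID AND P.ActivityDate >= '$cutoff'
)
"@

$table = New-Object System.Data.DataTable
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($sql, $connectionString)
[void]$adapter.Fill($table)

foreach ($row in $table.Rows) {
    $site = New-Object Microsoft.SharePoint.SPSite([guid]$row.SiteID)
    $web  = $site.OpenWeb([guid]$row.WebID)
    $item = $web.Lists[[guid]$row.ListID].GetItemById($row.ItemID)
    foreach ($wf in @($item.Workflows)) {
        if ($wf.InstanceId -eq [guid]$row.WorkflowInstanceID) {
            # CancelWorkflow terminates the instance; this can trigger
            # the notification emails mentioned above.
            [Microsoft.SharePoint.Workflow.SPWorkflowManager]::CancelWorkflow($wf)
        }
    }
    $web.Dispose(); $site.Dispose()
}
```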

 

Maintaining Nintex Workflow Progress Data

In my last post I talked about maintaining the workflow history lists throughout SharePoint. This post is about maintaining Nintex workflow progress data. This data is found in the WorkflowProgress table in a Nintex content database.

When a Nintex workflow is executed, each action is recorded in the WorkflowProgress table. This gives the system the opportunity to show a graphical representation of the workflow history for a specific instance. You can imagine that this table can contain a lot of information. Nintex recommends keeping this table below 15 million rows to avoid performance issues.
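A quick way to check where you stand (run this against each Nintex content database; the table sits in the dbo schema by default):

```sql
-- Current row count of the WorkflowProgress table
SELECT COUNT(*) AS ProgressRows
FROM dbo.WorkflowProgress;
```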

To maintain this table, you can use the PurgeWorkflowData operation of nwadmin.

The PurgeWorkflowData operation has a lot of parameters to specify which records it needs to purge, but these might not be enough to limit the records you want to purge. There’s a little catch in purging workflow progress data: to allow proper maintenance of the workflow history data, you need to make sure that you only purge workflow progress data for workflow instances where NO workflow history exists anymore. This means that there’s an order in how to maintain both the history lists and the WorkflowProgress table:

  1. Workflow History Lists
  2. WorkflowProgress table

If you purge the WorkflowProgress table before you clean the workflow history, you will end up with history list items which cannot be purged anymore in a selective way (using the ‘PurgeHistoryListData’ operation of nwadmin). You can only purge them by clearing the list completely.
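In practice, that order looks something like the following. These NWAdmin invocations are illustrative only: parameter names vary between Nintex versions, so check the output of NWAdmin.exe -help for yours, and the URL and date are placeholders.

```shell
REM Step 1: purge the workflow history lists first
NWAdmin.exe -o PurgeHistoryListData -siteUrl http://portal/hr -lastActivityBefore 2015-01-01 -state Completed

REM Step 2: only then purge the matching workflow progress data
NWAdmin.exe -o PurgeWorkflowData -url http://portal/hr -lastActivityBefore 2015-01-01 -state Completed
```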


Cleaning Workflow History in SharePoint

A lot of organizations that use SharePoint, use the platform to automate business processes using workflows. But few organizations are aware that because of these workflows, your SharePoint environment needs a little more maintenance than usual. Almost all organizations which use a lot of workflows start having performance issues at some point. SharePoint starts to get slow, workflows start to show weird startup behavior or don’t start at all. Those kinds of things. And they don’t know why this is happening. One of the causes is often the workflow history lists.

Which lists? Workflow history? We don’t have such lists.
Yes you do, but you don’t know it, because they are hidden! The comments and events you see when you look at a started workflow… these are stored in the workflow history list. Because those lists are hidden, most organizations that never had to deal with these issues don’t know these lists exist, and so those lists grow and grow and grow. Up to the point where it starts to get problematic.

To give an example, I had a case where the largest workflow history list contained 24 million items. For the entire environment, the combined size of all workflow history was 39 million items. That’s a lot of history. And honestly… nobody cares about this information. It’s only used in the event something goes wrong in a workflow and you need to find out why. That said, in highly regulated or audited environments, this history data can be important.

If you use Nintex Workflow, you can purge those lists using a single command. But if you try to do this on a list that contains millions of items, chances are that you will run into issues. The list has become too big. And again… purging the data is not always possible due to policies.

Now what? You will probably start searching for a solution… in the end, it’s just a SharePoint list like any other list, right? Online, you find some hints on how to approach those lists:

  • Easiest option… create a new list, change the workflows to use that new list and just delete the old one. That’s the quickest way to deal with this. But then you will lose all history. Each history list item has an event type from 0 to 11. Suppose you want to retain the events which hold a record of task outcomes. Deleting the list isn’t an option anymore and you need to find a way to selectively delete list items.
  • Create a script that deletes the items. Well, seems logical to do this. But you need to find the most optimal way of deleting items. Ever deleted SharePoint items in large lists? If you have, you know that SharePoint takes its time to do this. In a specific case I had a delete frequency of 1 item per 9 seconds, for a list of 32,000 items. You do the math… that’s 80 hours. Which is a lot. Imagine you have to do this on a list of 24 million items. Best case… it takes 9 seconds per item. That’s 2,500 days! Or almost 7 years! Completely insane.
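The slow path from that second option is typically a straightforward loop like this (illustrative only; the URL and list title are placeholders):

```powershell
# Naive one-by-one deletion. Works, but at roughly 9 seconds per item
# it does not scale to lists with millions of items.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$web  = Get-SPWeb "http://portal/site"   # placeholder URL
$list = $web.Lists["Workflow History"]   # placeholder list title

# Iterate backwards so deleting doesn't shift the indexes we still have to visit
for ($i = $list.ItemCount - 1; $i -ge 0; $i--) {
    $list.Items[$i].Delete()
}
$web.Dispose()
```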

So, out of options?

Well no. If you need to retain specific items on that list, deleting individual list items is still an option. But you need to use a different approach.

Instead of iterating over the complete collection of items in a list and deleting them one by one, you can use a batch processing method which exists on the SPWeb object. This method accepts an XML structure that contains “methods”. Each method is an instruction to delete an item in a list.

Each method contains the GUID of the list you are targeting, the ID of the item, and a command. In our case “Delete”.

Once you have assembled this structure, you pass it as a parameter to the ProcessBatchData method on the SPWeb object and SharePoint will perform all of the methods in batch.
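Such a batch structure looks like this (the list GUID and item IDs below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Batch OnError="Continue">
  <Method ID="1" Cmd="Delete">
    <SetList>c3a2e7f1-0000-0000-0000-000000000001</SetList>
    <SetVar Name="ID">101</SetVar>
  </Method>
  <Method ID="2" Cmd="Delete">
    <SetList>c3a2e7f1-0000-0000-0000-000000000001</SetList>
    <SetVar Name="ID">102</SetVar>
  </Method>
</Batch>
```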

To give you an idea of the performance of this method: it deleted 215,000 items in 16 hours. Compare this to 32,000 items in 80 hours. That’s a huge improvement.

How would you practically do this?

Well, you use a CAML query to get a batch of items from your history list and you assemble your batch XML for those items. Once it has been processed, you repeat the query and do it again… until you run out of items.

Here’s an example. The script below removes all list items which have an event type of 0, 4, 5 or 11. The query returns 2000 items at a time, and the script assembles the required batch XML for those 2000 items. The “ListItemCollectionPosition” property tells you when you are at the end of the list and out of items to delete: when it is null after you execute a query, there are no more items to query.
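A sketch of that script is below. It assumes the event type lives in the history list’s “Event” field (the internal name on standard workflow history lists); verify the field name, URL and list title against your own environment first.

```powershell
# Sketch: batch-delete history items of event type 0, 4, 5 or 11,
# 2000 at a time, using SPWeb.ProcessBatchData.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$web  = Get-SPWeb "http://portal/site"   # placeholder URL
$list = $web.Lists["Workflow History"]   # placeholder list title

$query = New-Object Microsoft.SharePoint.SPQuery
$query.RowLimit = 2000
$query.ViewFields = "<FieldRef Name='ID'/>"
$query.Query = @"
<Where><In><FieldRef Name='Event'/><Values>
  <Value Type='Integer'>0</Value><Value Type='Integer'>4</Value>
  <Value Type='Integer'>5</Value><Value Type='Integer'>11</Value>
</Values></In></Where>
"@

do {
    $items = $list.GetItems($query)
    # Remember the paging position before we delete the current batch;
    # it is null once the query has reached the end of the list.
    $query.ListItemCollectionPosition = $items.ListItemCollectionPosition

    $sb = New-Object System.Text.StringBuilder
    [void]$sb.Append('<?xml version="1.0" encoding="UTF-8"?><Batch OnError="Continue">')
    $i = 1
    foreach ($item in $items) {
        [void]$sb.Append("<Method ID=`"$i`" Cmd=`"Delete`"><SetList>$($list.ID)</SetList><SetVar Name=`"ID`">$($item.ID)</SetVar></Method>")
        $i++
    }
    [void]$sb.Append('</Batch>')
    $web.ProcessBatchData($sb.ToString()) | Out-Null
} while ($query.ListItemCollectionPosition -ne $null)

$web.Dispose()
```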

Running this on a list with 24 million items still requires a lot of time. And you might question the importance of retaining events for such lists… but in any case, you now have a faster way of deleting items in a SharePoint list.

In the end, you want to avoid getting into such a situation in the first place. You can do this by establishing some governance policies regarding workflows and creating some routines to enforce those policies by performing maintenance on a continuous basis. That way, you will be able to keep those history lists under control.

Gaining Insights into Your Nintex Workflow Data

Working with workflows in SharePoint is always a challenging task. Especially when you are an administrator and people start complaining about performance issues, workflows that don’t start or resume, and so on. Using Nintex Workflow doesn’t change that, but it does give you a bit more leverage.

The first thing I always do is try to get some insight into the size of the issue. So, we’re talking about data: how much workflow data is involved? When Nintex is involved, there are 2 places where I’m going to look:

  • Workflow History Lists (SharePoint)
  • WorkflowProgress table (SQL Database)

Nintex has an article on both topics which I can definitely recommend.

https://support.nintex.com/SharePoint/Forms/SharePoint_Maintenance_for_Nintex_Workflow

This article links to a PDF which explains how to maintain the WorkflowProgress table. You can find that PDF here.

In this PDF, you will find a SQL query which queries this table and the WorkflowInstance table to give you insight into the sheer amount of data which is stored and which might be the root cause of your issues with workflows.

Just for you lazy people, here’s the query:
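The query below is a reconstruction of the kind of query in that PDF, not a verbatim copy: it joins the WorkflowInstance and WorkflowProgress tables and aggregates action and instance counts per site/web/list/item and workflow state. Verify the table and column names against your Nintex version before relying on it.

```sql
-- Reconstruction (not the exact query from the Nintex PDF):
-- action count and instance count per site/web/list/item and workflow state.
SELECT
    I.SiteID,
    I.WebID,
    I.ListID,
    I.ItemID,
    I.State,
    COUNT(*)                     AS ActionCount,
    COUNT(DISTINCT I.InstanceID) AS InstanceCount
FROM dbo.WorkflowInstance AS I
INNER JOIN dbo.WorkflowProgress AS P
        ON P.InstanceID = I.InstanceID
GROUP BY I.SiteID, I.WebID, I.ListID, I.ItemID, I.State
ORDER BY ActionCount DESC;
```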

This gives you a lot of information which you can use in your remediation strategy.
Just export that information to a CSV, load it into Excel, and you have the following information at hand in no time.

[Table: workflow data]

This is a table which gives you, per Nintex content database, the number of records in the WorkflowProgress table (ActionCount) and the number of workflow instances involved. And it splits up this information for the different states a workflow can be in (running, completed, canceled, error). Whoever came up with the concept of Pivot tables… thank you! 🙂

While this query is great for getting that information and reporting on it, it’s not that useful for the people who are responsible for creating or maintaining those workflows. Look at the query. It gives you the site, web, list and even the item that’s involved. But those are just IDs. You still need to resolve them to a URL or the title of a list before they mean anything to someone who isn’t a SharePoint administrator who knows their way around PowerShell.

You could customize this query and include the SharePoint content databases in it to get the needed information, but you know you shouldn’t touch those databases directly in SQL! So, that’s not an option.

I decided to make my life a little less complicated and use PowerShell to solve it.
The script below executes the query above (with some minor adjustments to get some extra information) and, after getting the results, resolves the corresponding URLs for site and web, and the title of the list (if needed). All of this information is exported to a CSV.
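A sketch of such a script, under the same schema assumptions as before (server name, database name and output path are placeholders):

```powershell
# Sketch: run the aggregation query against the Nintex content database,
# resolve SiteID/WebID/ListID GUIDs to URLs and titles, and export to CSV.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$connectionString = "Data Source=SQLSERVER;Initial Catalog=NintexContentDb;Integrated Security=SSPI"
$sql = "SELECT I.SiteID, I.WebID, I.ListID, I.State, COUNT(*) AS ActionCount " +
       "FROM dbo.WorkflowInstance I JOIN dbo.WorkflowProgress P ON P.InstanceID = I.InstanceID " +
       "GROUP BY I.SiteID, I.WebID, I.ListID, I.State"

$table = New-Object System.Data.DataTable
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($sql, $connectionString)
[void]$adapter.Fill($table)

# Cache resolved GUIDs so we only hit SharePoint once per site/web/list combination
$cache = @{}

$results = foreach ($row in $table.Rows) {
    $key = "$($row.SiteID)/$($row.WebID)/$($row.ListID)"
    if (-not $cache.ContainsKey($key)) {
        $site = Get-SPSite -Identity ([guid]$row.SiteID) -ErrorAction SilentlyContinue
        $web  = if ($site) { $site.OpenWeb([guid]$row.WebID) }
        $list = if ($web) { $web.Lists | Where-Object { $_.ID -eq [guid]$row.ListID } }
        $cache[$key] = [pscustomobject]@{
            WebUrl    = if ($web) { $web.Url } else { "(not found)" }
            ListTitle = if ($list) { $list.Title } else { "" }
        }
        if ($web)  { $web.Dispose() }
        if ($site) { $site.Dispose() }
    }
    [pscustomobject]@{
        WebUrl      = $cache[$key].WebUrl
        ListTitle   = $cache[$key].ListTitle
        State       = $row.State
        ActionCount = $row.ActionCount
    }
}

$results | Export-Csv -Path "C:\temp\nintex-workflow-data.csv" -NoTypeInformation
```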

This not only gives me information I can work with immediately, but it also allows me to do it all from my SharePoint server. I no longer need SQL Server Management Studio or direct access to the database server.

Depending on the number of rows returned from SQL, the execution of the script can take some time though. I tried to minimize the calls to SharePoint by storing the URLs and titles for GUIDs that have already been resolved. This reduces the stress on SharePoint a bit, but it still takes some time to go through each row.

Using this list, you can prioritize the things to focus on during a cleanup. And it also gives you the ability to predict the impact of a purge on that table using specific parameters.