Cleaning Workflow History in SharePoint

A lot of organizations that use SharePoint, use the platform to automate business processes using workflows. But little organizations are aware that because of these workflows, your SharePoint environment needs to be maintained a little more than usual. Almost all organizations which use a lot of workflows start having performance issues. SharePoint is starting to get slow, workflows are starting to show weird startup behaviors or don’t start at all. Those kind of things. And they don’t know why this is happening. One of the causes for this are probably the workflow history lists.

Which lists? Workflow history? We don’t have such lists.
Yes you do, but you don’t know it, because they are hidden! The comments and events you see when you look at a started workflow… these are stored in the workflow history list. Because those lists are hidden, most of the organizations who never had to deal with these issues, don’t know these lists exist and so, those lists grow and grow and grow. Up to a point it starts to get problematic.

To give an example, I had a case where the largest workflow history list contained 24 million items. For the entire environment, the combined size of all workflow history was 39 million items. That’s a lot of history. And honestly… nobody cares about this information. It’s only used in the event something goes wrong in a workflow and you need to find out why. But in highly regulated or audited environments, this history data can be important.

If you use Nintex Workflow, you can purge those lists using a single command. But if you try to do this on a list that contains millions of items, chances are that you will run into issues. The list has become too big. And again… purging the data is not always possible due to policies.

Now what? You will probably start searching for a solution… in the end, it’s just a SharePoint list like any other list, right? Online, you find some hints on how to approach those lists:

  • Easiest option… create a new list, change the workflows to use that new list and just delete the old one. That’s the quickest way to deal with this. But then you will lose all history. Each history list item has an event type from 1 to 11. Suppose you want to retain the events which hold a record of task outcomes. Deleting the list isn’t an option anymore and you need to find a way to selectively delete list items.
  • Create a script that deletes the items. Well, seems logical to do this. But you need to find the most optimal way of deleting items. Ever deleted SharePoint items in large lists? If you have, you know that SharePoint takes it time to do this. In a specific case I had a delete frequency of 1 item per 9 seconds, for a list of 32000 items. You do the math… that’s 80 hours. Which is a lot. Imagine you have to do this on a list of 24 million items. Best case… it takes 9 seconds per item. That’s 2500 days! Or almost 7 years! Completely insane.

So, out of options?

Well no. If you need to retain specific items on that list, deleting individual list items is still an option. But you need to use a different approach.

Instead of iterating over the complete collection of items in a list and delete one by one, you can use a batch processing method which exists on the SPWeb object. This batch processing method accepts an XML structure that contains “methods”. Each method is an instruction to delete an item in a list.

Each method contains the GUID of the list you are targeting, the ID of the item, and a command. In our case “Delete”.

Once you have assembled this structure, you pass it as a parameter to the ProcessBatchData method on the SPWeb object and SharePoint will perform all of the methods in batch.

To give you an idea on the performance of deleting items using this method. It deleted 215000 items in 16 hours. Compare this to 32000 items in 80 hours. That’s a huge improvement.

How would you practically do this?

Well, you use a CAML query to get a bunch of items from your history list and you assemble your batch xml for these items. Once it has been processed, you repeat the query and do it again… until you run out of items.

Here’s an example. The script below is going to remove all list items which have an event type of 0, 4, 5 or 11. The query returns 2000 items at a time and assembles the required batch xml for these 2000 items. The “ListItemCollectionPosition” is used to know when we are at the end of the list and out of items to delete. When this is null after you execute a query, there are no more items to query.

Running this on a list with 24 million items still requires a lot of time. And you might question the importance of retaining events for such lists… but in any case, you now have a faster way of deleting items in a SharePoint list.

In the end, you need to avoid getting in such a situation in the first place. You can do this by establishing some governance policies regarding workflows and create some routines to enforce those policies by performing maintenance on a continuous basis. That way, you will be able to keep those history lists under control.

3 thoughts on “Cleaning Workflow History in SharePoint

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.