The Importance of Completing Tasks in Workflows

When you use workflows in your organization to automate business processes, you have to deal with the fact that people are lazy. They don’t like work. One of the things you do a lot in workflows is create tasks. Somewhere in a process, somebody needs to complete a task before the next step in the process can start. So, the task is created and your workflow waits until someone completes it. The importance of completing tasks in the context of workflows is often underestimated. Completing a task in a workflow is a two-step process. Lots of people don’t seem to understand this concept, and that leads to a lot of problems.

  • Step 1: Execute the action described in the task
  • Step 2: Mark the task as completed

Let me give you an example of how this can lead to issues.

You have an onboarding workflow for new employees. This workflow initiates several other workflows.
One of those other workflows is the creation of an Exchange mailbox. This “Exchange Mailbox Creation” workflow creates a task for the Exchange admins which instructs them to create a mailbox for the new employee.

Meanwhile, the main workflow handles some other stuff which is important for onboarding the new employee. Somewhere at the end of the process, the main workflow is going to check if the mailbox has been created. As long as the mailbox is not created, the onboarding process cannot be completed.
To do this, the workflow contains a loop which checks whether the Exchange Mailbox Creation workflow is completed. If it’s not, it pauses for 1 hour and checks again. This repeats until the Exchange Mailbox Creation workflow is completed.

The problem is… that workflow will NEVER end because the people who are responsible for creating Exchange mailboxes do understand the importance of creating a mailbox for a new employee, but don’t understand the importance of marking the task completed in the system. They don’t know that the overall onboarding process relies on the completion of those tasks.

Direct results:

  • The Exchange Mailbox Creation workflow will wait indefinitely unless someone sets that task to completed
  • The onboarding process for this employee will never finish because the Exchange Mailbox Creation workflow never finishes

Indirect results:

  • The history list will grow constantly because the loop generates 2 list items every hour (for the pause action). Read this post to understand the importance of maintaining a history list.
  • The WorkflowProgress table will grow constantly because of the same loop. Read this post to understand the importance of maintaining WorkflowProgress data.

A workflow designer is responsible for covering this by:

  • Including reminder notifications in the tasks
  • Having some kind of escalation procedure in case nobody bothers to complete those tasks. Escalating to a manager often helps
  • Making sure that the task notifications stress the importance of completing the tasks for a proper process termination. The reminder notifications and escalations should emphasize this too

In the onboarding example, I had 91 running instances that had been running anywhere from several months up to a year because of these uncompleted tasks. Together, these 91 instances made the history list grow by almost 4,400 list items every single day.
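Here is a quick sanity check of that figure. It assumes, as described above, that each waiting instance writes 2 history items per hourly pause:

```python
# Assumption from the post: each waiting instance logs 2 history items per hourly pause.
instances = 91
items_per_pause = 2
pauses_per_day = 24  # one pause every hour

items_per_day = instances * items_per_pause * pauses_per_day
print(items_per_day)  # 4368, i.e. "almost 4,400"
```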

You can avoid all of this by thinking your process through and including logic in your workflows and tasks to prevent these situations. But if you are tasked with cleaning up an environment where these kinds of issues have existed for a while, you might find thousands of running workflows that keep filling your history lists and WorkflowProgress tables with useless information. You then need a way to stop this madness.

If you are using Nintex Workflow, you can use the script below to terminate Nintex workflows in batch. You specify your Nintex content database and a date, and the script terminates all running instances that have had no activity since that date.
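The selection rule described above can be sketched in Python. This is a minimal illustration only. The `WorkflowInstance` record and its field names are hypothetical stand-ins for the instance data kept in the Nintex content database, and the actual termination step is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WorkflowInstance:
    # Hypothetical record; field names are illustrative, not the Nintex schema.
    instance_id: str
    state: str             # e.g. "Running", "Completed"
    last_activity: datetime


def stale_running(instances, cutoff: datetime):
    """Return running instances that had no activity since the cutoff date."""
    return [
        wi for wi in instances
        if wi.state == "Running" and wi.last_activity < cutoff
    ]


instances = [
    WorkflowInstance("a", "Running", datetime(2016, 3, 1)),
    WorkflowInstance("b", "Running", datetime(2017, 2, 1)),
    WorkflowInstance("c", "Completed", datetime(2015, 6, 1)),
]
to_terminate = stale_running(instances, cutoff=datetime(2017, 1, 1))
print([wi.instance_id for wi in to_terminate])  # ['a']
```

Only instance "a" qualifies. Instance "b" was active after the cutoff, and "c" is not running.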

Keep in mind that terminating workflows might result in notification emails being sent by the system.


Maintaining Nintex Workflow Progress Data

In my last post, I talked about maintaining the workflow history lists throughout SharePoint. This post is about maintaining Nintex workflow progress data, which is stored in the WorkflowProgress table in a Nintex content database.

When a Nintex Workflow is executed, each action is recorded in the WorkflowProgress table. This gives the system the opportunity to show a graphical representation of the workflow history for a specific instance. You can imagine that this table can contain a lot of information. Nintex recommends keeping this table below 15 million rows to avoid performance issues.

To maintain this table, you can use the PurgeWorkflowData operation of nwadmin.

The PurgeWorkflowData operation has a lot of parameters to specify which records it needs to purge. Even so, these parameters might not be enough to narrow down exactly the records you want to purge. There’s also a catch in purging workflow progress data. To allow proper maintenance of the workflow history data, you need to make sure that you only purge workflow progress data for workflow instances for which NO workflow history exists anymore. This means that you need to maintain the history lists and the WorkflowProgress table in a specific order:

  1. Workflow History Lists
  2. WorkflowProgress table
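The rule above can be expressed as a simple set difference. This is a minimal Python sketch. It assumes you already have the IDs of the instances present in WorkflowProgress and the IDs still referenced by workflow history items; both inputs are hypothetical here.

```python
def safe_to_purge_progress(progress_instance_ids, history_instance_ids):
    """Instances whose WorkflowProgress rows can be purged without orphaning history.

    Only instances that no longer have any workflow history items qualify.
    """
    return set(progress_instance_ids) - set(history_instance_ids)


progress = {"wf-1", "wf-2", "wf-3"}  # instances with rows in WorkflowProgress
history = {"wf-2"}                   # instances still referenced by history items
print(sorted(safe_to_purge_progress(progress, history)))  # ['wf-1', 'wf-3']
```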

If you purge the WorkflowProgress table before you clean the workflow history, you will end up with history list items that can no longer be purged selectively (using the ‘PurgeHistoryListData’ operation of nwadmin). You can only purge them by clearing the list completely.


Cleaning Workflow History in SharePoint

A lot of organizations use SharePoint to automate business processes with workflows. But few organizations are aware that, because of these workflows, their SharePoint environment needs a little more maintenance than usual. Almost all organizations that use a lot of workflows start having performance issues: SharePoint gets slow, and workflows show weird startup behavior or don’t start at all. Those kinds of things. And they don’t know why this is happening. One likely cause is the workflow history lists.

Which lists? Workflow history? We don’t have such lists.
Yes, you do, but you don’t know it, because they are hidden! The comments and events you see when you look at a started workflow are stored in the workflow history list. Because those lists are hidden, most organizations that have never had to deal with these issues don’t know these lists exist. So those lists grow and grow and grow, up to the point where they become a problem.

To give an example, I had a case where the largest workflow history list contained 24 million items. For the entire environment, the combined size of all workflow history was 39 million items. That’s a lot of history. And honestly… nobody cares about this information. It’s only used in the event something goes wrong in a workflow and you need to find out why. But in highly regulated or audited environments, this history data can be important.

If you use Nintex Workflow, you can purge those lists using a single command. But if you try to do this on a list that contains millions of items, chances are that you will run into issues. The list has become too big. And again… purging the data is not always possible due to policies.

Now what? You will probably start searching for a solution… in the end, it’s just a SharePoint list like any other list, right? Online, you find some hints on how to approach those lists:

  • Easiest option: create a new list, change the workflows to use that new list, and just delete the old one. That’s the quickest way to deal with this, but you will lose all history. Each history list item has an event type from 0 to 11. Suppose you want to retain the events that record task outcomes. Then deleting the list isn’t an option anymore, and you need a way to selectively delete list items.
  • Create a script that deletes the items. That seems logical, but you need to find the most efficient way of deleting items. Ever deleted SharePoint items in large lists? If you have, you know that SharePoint takes its time. In one specific case, I had a delete rate of 1 item per 9 seconds for a list of 32,000 items. You do the math… that’s 80 hours, which is a lot. Now imagine you have to do this on a list of 24 million items. Best case, at 9 seconds per item, that’s 2,500 days! Almost 7 years! Completely insane.
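The math in the second option, worked out in Python. It assumes the observed rate of 9 seconds per deleted item stays constant:

```python
SECONDS_PER_ITEM = 9  # one-by-one delete rate observed in the post


def hours_to_delete(item_count, seconds_per_item=SECONDS_PER_ITEM):
    """Estimated hours to delete item_count items at a constant rate."""
    return item_count * seconds_per_item / 3600


print(hours_to_delete(32_000))      # 80.0 hours
days = hours_to_delete(24_000_000) / 24
print(days, round(days / 365, 1))   # 2500.0 days, about 6.8 years
```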

So, out of options?

Well no. If you need to retain specific items on that list, deleting individual list items is still an option. But you need to use a different approach.

Instead of iterating over the complete collection of items in a list and deleting them one by one, you can use a batch processing method that exists on the SPWeb object. This method accepts an XML structure that contains “methods”. Each method is an instruction to delete an item in a list.

Each method contains the GUID of the list you are targeting, the ID of the item, and a command, in our case “Delete”.

Once you have assembled this structure, you pass it as a parameter to the ProcessBatchData method on the SPWeb object and SharePoint will perform all of the methods in batch.
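The batch structure can be assembled with any XML library. The minimal Python sketch below builds it for a list GUID and a set of item IDs. The element and attribute names (`Batch`, `Method`, `SetList`, `SetVar Name="ID"`, `SetVar Name="Cmd"`, `OnError`, `Scope="Request"`) follow the commonly used ProcessBatchData delete format; verify them against the SharePoint documentation for your version. The GUID used is a placeholder.

```python
import xml.etree.ElementTree as ET


def build_delete_batch(list_id: str, item_ids) -> str:
    """Build a ProcessBatchData-style XML string that deletes the given items."""
    batch = ET.Element("Batch", OnError="Continue")
    for n, item_id in enumerate(item_ids, start=1):
        method = ET.SubElement(batch, "Method", ID=str(n))
        set_list = ET.SubElement(method, "SetList", Scope="Request")
        set_list.text = list_id                                # GUID of the target list
        ET.SubElement(method, "SetVar", Name="ID").text = str(item_id)
        ET.SubElement(method, "SetVar", Name="Cmd").text = "Delete"
    declaration = '<?xml version="1.0" encoding="UTF-8"?>'
    return declaration + ET.tostring(batch, encoding="unicode")


batch_xml = build_delete_batch("{00000000-0000-0000-0000-000000000000}", [12, 13])
print(batch_xml)
```

The resulting string is what you would pass to ProcessBatchData. One method is generated per item, and all of them are sent in a single call.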

To give you an idea of the performance of this method: it deleted 215,000 items in 16 hours. Compare this to 32,000 items in 80 hours. That’s a huge improvement.
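Expressed as throughput, using only the two measurements above. The projection for the 24-million-item list assumes the batch rate stays constant, which is not guaranteed on a list that large.

```python
old_rate = 32_000 / 80      # items per hour, deleting one by one
batch_rate = 215_000 / 16   # items per hour, using ProcessBatchData

print(old_rate, batch_rate)             # 400.0 13437.5
print(round(batch_rate / old_rate, 1))  # 33.6 times faster

# Rough projection for a 24-million-item list, assuming the batch rate holds.
days = 24_000_000 / batch_rate / 24
print(round(days))                      # about 74 days
```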

How would you practically do this?

Well, you use a CAML query to get a batch of items from your history list and assemble your batch XML for these items. Once it has been processed, you run the query again and repeat… until you run out of items.

Here’s an example. The script below removes all list items that have an event type of 0, 4, 5, or 11. The query returns 2,000 items at a time, and the script assembles the required batch XML for those 2,000 items. The “ListItemCollectionPosition” property tells you when you have reached the end of the list: when it is null after you execute a query, there are no more items to query.
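The paging loop can be sketched in Python against an in-memory list instead of SharePoint. `query_page` is a hypothetical stand-in for the CAML query: its position marker plays the role of ListItemCollectionPosition, and removing a page from the list plays the role of the ProcessBatchData call.

```python
from typing import List, Optional, Tuple

DELETE_EVENT_TYPES = {0, 4, 5, 11}  # event types removed in the post's example
ROW_LIMIT = 2000                    # items per query, as in the post


def query_page(items: List[dict], after_id: Optional[int],
               row_limit: int) -> Tuple[List[dict], Optional[int]]:
    """Hypothetical stand-in for a paged CAML query.

    Returns one page of matching items ordered by ID, plus a position marker
    (the last ID of the page) that is None when no further page exists.
    """
    start = after_id if after_id is not None else 0
    matches = sorted(
        (i for i in items if i["ID"] > start and i["EventType"] in DELETE_EVENT_TYPES),
        key=lambda i: i["ID"],
    )
    page = matches[:row_limit]
    next_position = page[-1]["ID"] if len(matches) > row_limit else None
    return page, next_position


def purge(items: List[dict], row_limit: int = ROW_LIMIT) -> int:
    """Delete matching items page by page; returns the number of batches sent."""
    position: Optional[int] = None
    batches = 0
    while True:
        page, position = query_page(items, position, row_limit)
        if page:
            doomed = {i["ID"] for i in page}
            # Stand-in for ProcessBatchData: remove the whole page in one go.
            items[:] = [i for i in items if i["ID"] not in doomed]
            batches += 1
        if position is None:  # like ListItemCollectionPosition being null
            return batches


history = [{"ID": n, "EventType": n % 12} for n in range(1, 12001)]
batches = purge(history)
print(batches, len(history))  # 2 8000
```

Because the next page starts after the last ID of the previous page, deleting a page does not cause later items to be skipped.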

Running this on a list with 24 million items still requires a lot of time. And you might question the importance of retaining events for such lists… but in any case, you now have a faster way of deleting items in a SharePoint list.

In the end, you need to avoid getting into such a situation in the first place. You can do this by establishing governance policies for workflows and creating routines that enforce those policies through continuous maintenance. That way, you will be able to keep those history lists under control.

Gaining insights in your Nintex Workflow Data

Working with workflows in SharePoint is always challenging, especially when you are an administrator and people start complaining about performance issues, workflows that don’t start or resume, and so on. Using Nintex Workflow doesn’t change that, but it gives you a bit more leverage.

The first thing I always do is try to get some insight into the size of the issue. In other words: how much workflow data is involved? When Nintex is involved, there are 2 places where I look:

  • Workflow History Lists (SharePoint)
  • WorkflowProgress table (SQL Database)

Nintex has an article covering both topics which I can definitely recommend.

https://support.nintex.com/SharePoint/Forms/SharePoint_Maintenance_for_Nintex_Workflow

This article links to a PDF which explains how to maintain the WorkflowProgress table. You can find that PDF here.

In this PDF, you will find a SQL query that queries this table and the WorkflowInstance table. It gives you insight into the sheer amount of data that is stored, which might be the root cause of your issues with workflows.

Just for you lazy people, here’s the query:

This gives you a lot of information that you can use in your remediation strategy.
Just export the results to a CSV and load it into Excel, and you can have the following information at hand in no time.

[Image: pivot table of the workflow data per Nintex content database]

This table gives you, per Nintex content database, the number of records in the WorkflowProgress table (ActionCount) and the number of workflow instances involved. It splits up this information by the different states a workflow can be in (running, completed, canceled, error). Whoever came up with the concept of pivot tables… thank you! 🙂
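If you prefer to build that summary without Excel, here is a minimal Python pivot. It assumes the exported rows carry a database name, a state, an instance ID and an action count. The column names and values below are hypothetical, not the actual output of the Nintex query.

```python
from collections import defaultdict

# Hypothetical exported rows; column names are illustrative only.
rows = [
    {"Database": "NW_Content1", "State": "Running",   "InstanceId": "i1", "ActionCount": 1200},
    {"Database": "NW_Content1", "State": "Running",   "InstanceId": "i2", "ActionCount": 300},
    {"Database": "NW_Content1", "State": "Completed", "InstanceId": "i3", "ActionCount": 50},
    {"Database": "NW_Content2", "State": "Error",     "InstanceId": "i4", "ActionCount": 75},
]


def pivot(rows):
    """Sum ActionCount and count distinct instances per (database, state)."""
    actions = defaultdict(int)
    instances = defaultdict(set)
    for r in rows:
        key = (r["Database"], r["State"])
        actions[key] += r["ActionCount"]
        instances[key].add(r["InstanceId"])
    return {k: (actions[k], len(instances[k])) for k in actions}


summary = pivot(rows)
for (db, state), (action_count, instance_count) in sorted(summary.items()):
    print(db, state, action_count, instance_count)
```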

While this query is great for getting that information and reporting on it, it’s not that useful for the people who are responsible for creating or maintaining those workflows. Look at the query. It gives you the site, web, list, and even the item involved, but only as IDs. You still need to resolve these to a URL or a list title before they are useful to anyone who isn’t a SharePoint administrator who knows their way around PowerShell.

You could customize this query to include the SharePoint content databases and get the information you need, but you know you shouldn’t touch those databases in SQL! So, that’s not an option.

I decided to make my life a little less complicated and use PowerShell to solve it.
The script below executes the query above (with some minor adjustments to get some extra information). After getting the results, it looks up the corresponding URLs for the site and web, and the title of the list (if needed). All of this information is exported to a CSV.

This not only gives me information I can work with immediately, but it also allows me to do it all from my SharePoint server. I don’t need SQL Server Management Studio or physical access to the database server anymore.

Depending on the number of rows returned from SQL, the script can take some time to execute. I tried to minimize the calls to SharePoint by storing the URLs and titles of already resolved GUIDs in arrays. This reduces the stress on SharePoint a bit, but it still takes some time to go through each row.
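The caching idea can be sketched in Python with a dictionary instead of the arrays the post mentions. A dictionary gives constant-time lookups by GUID. The lookup function here is a hypothetical stand-in for the expensive call to SharePoint.

```python
class UrlResolver:
    """Cache GUID-to-URL lookups so each GUID hits SharePoint only once."""

    def __init__(self, lookup):
        self._lookup = lookup  # function that performs the expensive call
        self._cache = {}
        self.calls = 0

    def resolve(self, guid):
        if guid not in self._cache:
            self.calls += 1
            self._cache[guid] = self._lookup(guid)
        return self._cache[guid]


# Hypothetical lookup standing in for a call to SharePoint.
fake_sharepoint = {"g1": "https://intranet/sites/hr", "g2": "https://intranet/sites/it"}
resolver = UrlResolver(fake_sharepoint.__getitem__)

web_guids = ["g1", "g2", "g1", "g1", "g2"]  # web GUIDs from the SQL result
urls = [resolver.resolve(g) for g in web_guids]
print(resolver.calls)  # 2 lookups for 5 rows
```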

Using this list, you can prioritize what to focus on during a cleanup. It also lets you predict the impact of a purge on that table using specific parameters.

Following sites not working in a Hybrid Sites scenario – 401 Unauthorized

Earlier this week, I was at a customer that has a SharePoint 2013 implementation. They had an issue where following sites no longer worked. When they clicked the Follow link, they got an error saying that the site could not be followed.

They have a hybrid implementation with OneDrive for Business and Hybrid Sites set up.
When I looked in the ULS, I saw the following error popping up:

Loud and clear… authentication issues.

Microsoft has excellent resources that outline the roadmaps for implementing hybrid features.

Both roadmaps outline the steps which are needed to set up those features. Since OneDrive for Business was working fine, I focused on the Hybrid Sites features and started going through the steps of the roadmap to see if everything was set up correctly.

  1. Configure Office 365 for SharePoint hybrid – Check!
  2. Set up SharePoint services for hybrid environments – Check!
  3. Install the September PU for SharePoint Server 2013 – We were on the December 2016 CU, so … Check!
  4. Configure S2S authentication from SharePoint Server 2013 to SharePoint Online –  Hmmm… I don’t recall doing this in the past.
  5. Configure hybrid sites features in Central Administration – Check!

Since I was getting authentication issues and didn’t recall doing the S2S authentication configuration step, I figured that this was the cause of the problem.

When you follow the link for that step, you will see that there’s some work to do to set it up. Luckily, Microsoft provided a tool which actually does it for you. It’s called the Hybrid Picker. This simplifies things a bit.


Change SharePoint Service Identities using PowerShell

After installing SharePoint and setting up my farm, one of the first things I always do is change SharePoint service identities. In a freshly installed SharePoint farm, most services are running under the farm account or under a local identity (LocalService, LocalSystem). Some of the services I change right away:

  • Search Host Controller Service
  • SharePoint Server Search
  • Distributed Cache
  • SharePoint Tracing Service

With the exception of the SharePoint Tracing Service, all of these identities can be changed from the “Service Accounts” page in Central Administration. But where’s the fun in that? Besides, this page has one big disadvantage: you can change a service to run under a managed account, but you can’t set it to run under a local account (LocalService, LocalSystem, NetworkService). So, if you changed a service from a local account to a domain account, you can’t undo this change using the UI. You need to use PowerShell.

The script below allows you to set a domain account or a local account.


Cleaning up obsolete SharePoint groups

During the lifetime of a SharePoint implementation, sites come and go. When a site collection grows, you typically see the number of SharePoint groups growing as well, because you want to give people access to those sites in an organized way. When sites go, those groups are left behind. Removing those obsolete SharePoint groups can be challenging, because groups created for a specific site can be used for other sites as well. So, before removing a group, you need to be sure that it’s not used on any other subsites.

SharePoint groups live on the root web of a site collection.

If a group is created as part of the site creation process, it will have a description that clearly states which site it was created for. This doesn’t mean it can’t be used on other sites. If you want a list of all SharePoint groups that exist in a site collection, you can use the following PowerShell snippet to get the collection.

[Screenshot: PowerShell snippet and output listing the SiteGroups collection]

If you want to know which groups are used on a specific subsite of the site collection, you can use the UI and check the Site Permissions section of a site. This will give you all permissions for that site. You can also use the following PowerShell snippet to get these.

[Screenshot: PowerShell snippet and output listing the Groups collection of a web]

See the difference? The SiteGroups collection contains a group “MyCustomGroup” that is not part of the Groups collection of the same web. This means that the group exists in the site collection but isn’t used at this point. When I give this group explicit permissions on my site, it will be added to this collection and it will be in use.

So, cleaning up obsolete groups means checking the Groups collection of each subsite and seeing which site groups are used to give people access. If a site group is not part of the Groups collection of any site, it’s not used, and you can remove that group from the site collection.

You can do this manually, or you can automate this process and use the following script for this task.

This script works in 2 modes.
In Simulation mode, it looks for obsolete groups and outputs them to the console. Nothing more.
In Execution mode, it looks for obsolete groups and deletes them from the site collection.

My advice… run it in Simulation mode before running it in Execution mode. That way, you have an idea of what groups were found and will be deleted.
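The core check can be sketched in Python. This is a minimal illustration of the logic only, not the PowerShell script. It assumes you have already collected the site collection's SiteGroups names and each web's Groups names. The `delete` callback is a hypothetical stand-in for removing a group.

```python
def find_obsolete_groups(site_groups, groups_per_web):
    """Site groups that are not in the Groups collection of any web."""
    used = set().union(*groups_per_web.values())
    return sorted(set(site_groups) - used)


def cleanup(site_groups, groups_per_web, execute=False, delete=None):
    """Simulation mode reports obsolete groups; Execution mode deletes them."""
    if execute and delete is None:
        raise ValueError("Execution mode needs a delete callback")
    obsolete = find_obsolete_groups(site_groups, groups_per_web)
    for name in obsolete:
        if execute:
            delete(name)                # Execution mode: remove the group
        else:
            print(f"Obsolete: {name}")  # Simulation mode: report only
    return obsolete


site_groups = ["Owners", "Members", "Visitors", "MyCustomGroup"]
groups_per_web = {
    "/": {"Owners", "Members", "Visitors"},
    "/projects": {"Members"},
}
obsolete = cleanup(site_groups, groups_per_web)  # Simulation mode
print(obsolete)  # ['MyCustomGroup']
```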

Some situations need clarification.

Inherited permissions

What happens when you have subsites that inherit permissions? This is not an issue. Suppose you have a subsite that inherits permissions from the root web of the site collection. When a SharePoint group is given access to the root web, it gets access to all subsites that inherit their permissions. As such, that group will be part of the Groups collection of those subsites.

Audience targeting

What happens when a group is exclusively used for audience targeting? Well, this is a problem, because that group is not part of the Groups collection of the site where you have used it as an audience. In my opinion, you should avoid this situation. Permissions are what give a collection of users access to a site, and in theory an audience is a subset of those authorized users, right? You want to target specific content on a site to specific users. If they can’t reach the site, what’s the point in targeting content to them?

If you do find yourself in a situation where you have SharePoint groups that are exclusively used for audience targeting, a good approach is to give those groups distinctive names that clearly indicate they are audience targeting groups. For example, you could start each group name with “AUD_”. This way, you can extend the script above with a check that skips groups whose names start with “AUD_”.

SharePoint issues when using a trust with Selective Authentication

If you have some experience with SharePoint, the issue where you get a credential prompt three times before hitting a 401 Unauthorized is probably not new to you. We all know this happens when you try to navigate to a SharePoint site from the web front-end servers themselves. Resolving this is common knowledge for SharePoint admins: you disable the loopback check in the registry, or you use the recommended BackConnectionHostNames registry key. This is documented in KB896861.

Last week, I was at a customer doing an assessment of a SharePoint implementation, and one of their developers approached me with a weird issue on their Extranet. They have a SharePoint farm in a separate extranet domain. Between the internal domain and the extranet domain is a one-way trust that allows users from the internal domain to use their accounts to log on to a site on the Extranet. He was able to do this from the web front-end servers of the Extranet farm but not from his laptop. On his laptop, he had to enter his credentials, and this kept failing… seems familiar, right?

I double-checked the BackConnectionHostNames on the servers and sure enough, the key and hosts were there.

I tried the same thing on my machine with my account, and it worked! I was able to go to the site from my machine. When he tried it from my machine with his account, it failed. We tested this on several other clients with several users… ALL of them had the same issue. Nobody was able to sign in. Only I was able to sign in from anywhere.

I will spare you the checks and comparisons we did, but I will tell you that we were able to solve it!

Servers in a domain are, like user accounts, just objects in Active Directory. When you open the properties of such a computer object in AD and go to the Security tab, you can specify a lot of permissions that specific AD objects have on this computer. One of those permissions is “Allowed to authenticate”. On the servers in that Extranet farm, my account had explicitly been granted that permission, while the “Authenticated Users” group had not…

[Screenshot: the “Allowed to authenticate” permission on the Security tab of the computer object]

Under normal circumstances, this doesn’t pose any issue. If you have 1 domain that contains your users and servers, this permission is not required. Furthermore, if you have multiple domains with a one-way trust and you keep the default trust authentication level (Forest-wide authentication), you will not have any issues with users from the trusted domain authenticating to resources in the trusting domain.

[Screenshot: trust authentication level setting]

However, when you are using “Selective Authentication”, you need to explicitly grant users the “Allowed to authenticate” permission on the resources they need to access. When we verified the authentication level at the customer, we got confirmation that they were using selective authentication. So, to resolve the issue, we gave “Authenticated Users” this permission on the SharePoint servers in the Extranet AD.

See the following articles for more information on selective authentication for trusts.

An eye for details… changing the ImageUrl for migrated lists

Migrating from SharePoint 2007 to SharePoint 2013 can cause all kinds of headaches, but this post is not about those headaches. It’s about details, or better… having an eye for details. Ever noticed what happens after you migrate a site to SharePoint 2013 and complete the visual upgrade to SharePoint 2013 mode? The list icons are not the fancy icons you get when you create a new list or library in SharePoint 2013. Migrated lists and libraries still use the old icons from the early days of SharePoint 2007.

[Screenshot: migrated lists and libraries still showing the old SharePoint 2007 icons]

The icon for a list or library is stored in the ImageUrl property of the list. For migrated lists, this property points to a .gif or .png in the “/_layouts/images/” folder. When you create a new list in SharePoint 2013, the property points to “/_layouts/15/images”. Furthermore, if you compare a migrated document library with a new one, you notice that the value differs not only in the location the icon is loaded from, but also in the file type. Take a simple document library, for instance:

  • Migrated document library : /_layouts/images/itdl.gif
  • New document library : /_layouts/15/images/itdl.png?rev=23

While I can imagine that a lot of people don’t see any issue with this and don’t care what those icons look like, I don’t like loose ends. If you migrate an environment, you might as well do it completely, replace the list icons with the new versions, and end up with the following result.

[Screenshot: the same lists after replacing the icons with the SharePoint 2013 versions]

Admit it, this looks much better than those old-school icons. It’s a small detail, but it just makes more sense. If you have a smart user who actually cares about the environment, the question of why new lists have different icons than existing lists will eventually pop up anyway. If you tell them that this is the result of the migration, the next question will be whether you can change them to match the new lists. Show your customers or users you have an eye for detail and do it proactively.

Changing these icons can be done very easily using PowerShell. The only thing you need is a mapping between the old and new icon.

I created a script that replaces the icons for all lists and libraries. In this script, a mapping is defined for most of the icons that are used. It might not be complete, so feel free to add missing icons. Some scripts out there don’t use a mapping; they simply replace all .gif icons with .png versions. However, some icons don’t have a .png counterpart, so if you replace those, your list icon will be broken.
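The mapping approach can be sketched in Python. Only the document library entry comes from this post; any other entries would have to be added from your own environment. Returning `None` for unmapped icons means those lists keep their current icon instead of getting a broken one.

```python
# Only the document library mapping comes from the post; extend as needed.
ICON_MAP = {
    "/_layouts/images/itdl.gif": "/_layouts/15/images/itdl.png?rev=23",
}


def new_image_url(old_url):
    """Return the mapped icon URL, or None when no safe mapping is known."""
    return ICON_MAP.get(old_url.lower())  # keys are stored in lowercase


for url in ["/_layouts/images/itdl.gif", "/_layouts/images/unknown.gif"]:
    print(url, "->", new_image_url(url))
```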

You can find this script in my PowerShell repository on GitHub.

Replacing event receivers in SharePoint

I’m currently migrating a SharePoint 2007 environment to SharePoint 2013. For this particular environment, a custom solution had been built that involves a number of event receivers. The customer wanted to retain this functionality, so I had to port this solution to SharePoint 2013. One problem though… the source code was not available. We had to resort to reverse engineering the solution with ILSpy to recreate the source code and build a new solution. We made sure that all feature IDs were the same and that our namespaces and class names were also the same. After deploying and testing the solution, it worked.

During the migration, we attached the content database to the SharePoint 2013 web application and that’s when we noticed something.
When you add an event receiver to a SharePoint list, the “Assembly” property of the event receiver contains the assembly signature of the DLL that contains the event receiver class. When we attached the database, SharePoint complained that it was missing an assembly: the assembly of the old solution. When we compared the assembly signature of the old solution with the signature of our new solution, we saw that it had a different PublicKeyToken. We had completely overlooked this. This was one of those “Doh!” moments.
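This kind of mismatch is easy to spot by splitting the two signatures into their parts. Below is a minimal Python sketch. The assembly names and token values are hypothetical placeholders; only the "Name, Version=..., Culture=..., PublicKeyToken=..." shape reflects real .NET assembly names.

```python
def parse_assembly_name(full_name):
    """Split 'Name, Version=..., Culture=..., PublicKeyToken=...' into parts."""
    name, *pairs = [part.strip() for part in full_name.split(",")]
    props = dict(pair.split("=", 1) for pair in pairs)
    return name, props


# Hypothetical signatures of the old and the rebuilt assembly.
old = "Contoso.EventReceivers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=1111111111111111"
new = "Contoso.EventReceivers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=2222222222222222"

old_name, old_props = parse_assembly_name(old)
new_name, new_props = parse_assembly_name(new)
differences = {k for k in old_props if old_props[k] != new_props.get(k)}
print(old_name == new_name, differences)  # True {'PublicKeyToken'}
```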

It turns out that changing the PublicKeyToken is not that straightforward. I found a way to extract the public key from a DLL and generate a strong name key (SNK) file:

sn.exe -e myassembly.dll mykey.snk

But this strong name key file is missing one crucial piece of information: the private key. If you want to sign your solution with this strong name, you need to use delay signing. Your solution will build, and the signature matches the one from the old assembly. However, when you try to deploy it to SharePoint, it fails because the assembly can’t be added to the GAC due to the missing private key.

I figured that, instead of looking for workarounds, the easiest way to solve this was to replace the old event receivers with new ones that have the correct signature. This proved to be an easy solution. I created 2 scripts that helped me with this.

Get all event receivers with a specific signature

This script returns all event receivers that have a specific signature.

You can export this output to a CSV file, which can be used in the next script. All the information needed to replace these event receivers is included in the output.
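The filter-and-export step can be sketched in Python. The receiver records and their field names are hypothetical; a real script would read these properties from the event receivers in SharePoint. Note that the csv module quotes the Assembly value because it contains commas.

```python
import csv
import io

FIELDS = ["WebUrl", "List", "Name", "Type", "Class", "Assembly"]  # illustrative
OLD_SIGNATURE = ("Contoso.Receivers, Version=1.0.0.0, Culture=neutral, "
                 "PublicKeyToken=1111111111111111")

# Hypothetical receiver records standing in for data read from SharePoint.
receivers = [
    {"WebUrl": "https://intranet/sites/a", "List": "Documents", "Name": "ItemAdded",
     "Type": "ItemAdded", "Class": "Contoso.Receivers.DocHandler",
     "Assembly": OLD_SIGNATURE},
    {"WebUrl": "https://intranet/sites/b", "List": "Tasks", "Name": "Other",
     "Type": "ItemUpdated", "Class": "Other.Handler",
     "Assembly": "Other, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3333333333333333"},
]


def export_matching(receivers, signature):
    """Write receivers whose Assembly equals the given signature as CSV text."""
    matches = [r for r in receivers if r["Assembly"] == signature]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(matches)
    return out.getvalue()


csv_text = export_matching(receivers, OLD_SIGNATURE)
print(csv_text)
```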

Delete and recreate event receivers

Using the CSV file created from the previous script’s output, the script below deletes the old event receivers and replaces them with new ones. It reuses the information about the old event receivers from the CSV. The signature passed in as a parameter becomes the assembly signature of the new event receivers.
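The replacement logic can be sketched in Python. Each CSV row yields one "delete this receiver" entry and one "add it again" entry, where only the Assembly value changes. The CSV columns and signatures are hypothetical, and the actual delete and add calls against SharePoint are not modeled.

```python
import csv
import io

# Hypothetical CSV, in the shape an export of the old receivers might have.
CSV_TEXT = """WebUrl,List,Name,Type,Class,Assembly
https://intranet/sites/a,Documents,ItemAdded,ItemAdded,Contoso.Receivers.DocHandler,"Contoso.Receivers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=1111111111111111"
"""

NEW_SIGNATURE = ("Contoso.Receivers, Version=1.0.0.0, Culture=neutral, "
                 "PublicKeyToken=2222222222222222")


def plan_replacements(csv_text, new_signature):
    """For each old receiver, describe the delete and the re-add with the new signature."""
    plan = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        replacement = dict(row, Assembly=new_signature)  # keep everything else
        plan.append({"delete": row, "add": replacement})
    return plan


plan = plan_replacements(CSV_TEXT, NEW_SIGNATURE)
print(plan[0]["add"]["Assembly"])
```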

You can find these scripts in my PowerShell repository on GitHub.