Following sites not working in a Hybrid Sites scenario – 401 Unauthorized

Earlier this week, I was at one of my customers, who has a SharePoint 2013 implementation. They had an issue where following sites no longer worked. When they clicked the Follow link, they got an error saying that the site could not be followed.

They have a hybrid implementation with OneDrive for Business and Hybrid Sites set up.
When I looked in the ULS logs, one particular error kept popping up.

Loud and clear… authentication issues.

Microsoft has excellent resources where they outline the roadmaps to implement the hybrid features.

Both roadmaps outline the steps which are needed to set up those features. Since OneDrive for Business was working fine, I focused on the Hybrid Sites features and started going through the steps of the roadmap to see if everything was set up correctly.

  1. Configure Office 365 for SharePoint hybrid – Check!
  2. Set up SharePoint services for hybrid environments – Check!
  3. Install the September PU for SharePoint Server 2013 – We were on the December 2016 CU, so … Check!
  4. Configure S2S authentication from SharePoint Server 2013 to SharePoint Online – Hmmm… I don't recall doing this in the past.
  5. Configure hybrid sites features in Central Administration – Check!

Since I was getting authentication issues, and I didn't recall doing the S2S authentication configuration step, I figured that this was the cause of the problem.

When you follow the link for that step, you will see that there’s some work to do to set it up. Luckily, Microsoft provided a tool which actually does it for you. It’s called the Hybrid Picker. This simplifies things a bit.


Convert a SharePoint 2016 front-end to a front-end with Distributed Cache

I have been running a SharePoint 2016 farm with 4 servers (Front-end, Application, Distributed Cache, Search), but now that Feature Pack 1 has been released, I'm looking to reduce this to 2 servers and use the 2 combined MinRoles which have been added. To be more precise, I want to eliminate my Distributed Cache server and convert my Front-end to a Front-end with Distributed Cache. After that, I want to do the same for my Search server and convert my Application server to an Application server with Search. Doing this for my Distributed Cache server, however, proved to be a challenge. I was getting all kinds of errors. This might have something to do with the order in which I did things, though.
Before I did the conversion, I removed my Distributed Cache server from the farm and I didn’t do this in a graceful way… I just disconnected it from the farm.

My first attempt to convert a server failed with a very self-explanatory error:

The Distributed Cache service instance must be configured. Run ‘Add-SPDistributedCacheServiceInstance -Role WebFrontEndWithDistributedCache’ on this server to configure the Distributed Cache service instance.


That’s pretty clear. This is indeed a front-end server and in SharePoint 2016, this role doesn’t have the Distributed Cache service instance. So, I added it like the error proposed.
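In other words, from the SharePoint 2016 Management Shell on that front-end server:

# Register the Distributed Cache service instance for the combined front-end role,
# exactly as the error message suggests.
Add-SPDistributedCacheServiceInstance -Role WebFrontEndWithDistributedCache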

The result however was not what I expected. I got the error:

Failed to connect to hosts in the cluster


Come to think of it… this actually makes sense. I already mentioned that I kicked out the Distributed Cache server in a very direct way, so my cluster was probably still referencing this host. The only way to verify this is to look at the cluster configuration, which you can export to a text file.
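Assuming the AppFabric cmdlets are available on the server, the export looks something like this (the output path is a placeholder):

# Load the Distributed Cache administration cmdlets and connect to the local cluster.
Import-Module DistributedCacheAdministration
Use-CacheCluster

# Dump the current cluster configuration to a text file for inspection.
Export-CacheClusterConfig -Path "C:\Temp\CacheClusterConfig.txt"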

Somewhere near the end of the file, there is a "hosts" section, and this confirmed it: the old host was still listed in the configuration. I had to remove this host from the cluster configuration first.
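Unregistering an orphaned host is done with the AppFabric Unregister-CacheHost cmdlet — a sketch, with a placeholder host name and a placeholder connection string to the SharePoint configuration database:

# Remove the orphaned server from the cache cluster configuration.
$configDb = "Data Source=SQL01;Initial Catalog=SharePoint_Config;Integrated Security=True;Enlist=False"
Unregister-CacheHost -Provider "SPDistributedCacheClusterProvider" -ConnectionString $configDb -HostName "OLDCACHE01.contoso.local"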

This successfully unregistered my host from the cluster, and I re-ran the command to add the Distributed Cache service instance.

This time, no error! Hooray!

I proceeded to run the Set-SPServer command again to start the conversion. Again, an error appeared. This time, the error was different.

The Distributed Cache service instance must be reconfigured. Run ‘Remove-SPDistributedCacheServiceInstance’ on this server to remove the Distributed Cache service instance. Then run ‘Add-SPDistributedCacheServiceInstance -Role WebFrontEndWithDistributedCache’ on this server to reconfigure the Distributed Cache service instance….


At this point, when you look at the Distributed Cache service in Central Administration, you can see that it’s stopped. If you open the Windows Services console on the server, the AppFabric Caching Service is disabled.

I tried to remove this service using Remove-SPDistributedCacheServiceInstance, but it failed with the error:

cacheHostInfo is null


To get past this point and get the service removed, you can execute the following piece of PowerShell script:
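The gist of it is to grab the orphaned service instance object directly and delete it — a minimal sketch:

# Find the Distributed Cache service instance on this server and delete the orphaned object.
$instance = Get-SPServiceInstance | Where-Object { $_.TypeName -eq "Distributed Cache" -and $_.Server.Name -eq $env:COMPUTERNAME }
$instance.Delete()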

This worked and it removed the service instance completely from my server.

I then proceeded to execute the Add-SPDistributedCacheServiceInstance command again but this time without the -Role parameter!!!

When the command finished, I went to the Services on Server page in Central Administration and noticed that the Distributed Cache service had been added again. This time, it was started! It also stated that my server was not compliant anymore, but who cares… we are going to change the role anyway.


I re-ran the Set-SPServer command to start the conversion (shown here with a placeholder server name):
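# Convert this server to the combined role; "SP2016-WFE01" is a placeholder server name.
Set-SPServer -Identity "SP2016-WFE01" -Role WebFrontEndWithDistributedCache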

And yes, now it worked. I was notified that a timer job had been scheduled to do the conversion.


I only had to wait until the job was completed. When I looked at the Servers page in Central Administration, I could clearly see that my role was changed and that the server was compliant.


Part of this could possibly have been avoided by removing the Distributed Cache server in a more graceful way. Lessons learned… don’t cut corners!  😎 

DistributedCache Server role vs SkipRegisterAsDistributedCacheHost

Since SharePoint 2013, the New-SPConfigurationDatabase and Connect-SPConfigurationDatabase cmdlets have a switch parameter called "SkipRegisterAsDistributedCacheHost". When this switch is specified during the creation of a new farm, or when a server is added to an existing farm, the local server will not be registered as a Distributed Cache host. With the arrival of SharePoint 2016, we also got the MinRole feature. This feature enables you to designate the local server to be a "DistributedCache" server. How to do this is explained in one of my previous posts, where I provide a script to create a farm and to join a farm. I was wondering what happens when you use the DistributedCache server role together with the -SkipRegisterAsDistributedCacheHost switch.

The best way to find out is to try it. I did, and the server was in fact a Distributed Cache host after I joined it to the farm. So, the role ignores the switch completely.
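This is roughly what that test looks like when joining a server to an existing farm (database server, database name and passphrase are placeholders):

# Join the farm as a dedicated DistributedCache server while also passing the skip switch.
$passphrase = ConvertTo-SecureString "FarmPassphrase1!" -AsPlainText -Force
Connect-SPConfigurationDatabase -DatabaseServer "SQL01" -DatabaseName "SharePoint_Config" -Passphrase $passphrase -LocalServerRole DistributedCache -SkipRegisterAsDistributedCacheHost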

But what about the other roles? How about Custom? And Search? To see how this is handled, I fired up ILSpy and started digging through the code.

In the Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheServiceInstance class, there is a method called ShouldRegisterAsDistributedCacheHost, which is called somewhere during the execution of one of the 2 cmdlets mentioned above.

This method contains the test that decides whether or not a server should be registered as a Distributed Cache host.

Bottom line…

If the ServerRole parameter is DistributedCache, SingleServerFarm or WebFrontEndWithDistributedCache, the presence of the SkipRegisterAsDistributedCacheHost doesn’t matter. The server will be a Distributed Cache host. Period.

If the ServerRole parameter is not specified, is "Custom", or the MinRole feature is not in play, the SkipRegisterAsDistributedCacheHost switch is taken into account and used in the decision.
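Paraphrased in PowerShell terms — this is just a summary of the decision described above, not the decompiled code, and $ServerRole and $SkipRegisterAsDistributedCacheHost stand for the values passed to the cmdlet:

# Roles that always become a Distributed Cache host, regardless of the switch.
$rolesThatAlwaysGetCache = @("DistributedCache", "SingleServerFarm", "WebFrontEndWithDistributedCache")

if ($rolesThatAlwaysGetCache -contains $ServerRole) {
    $registerAsCacheHost = $true                                     # switch is ignored
}
else {
    $registerAsCacheHost = -not $SkipRegisterAsDistributedCacheHost  # switch decides
}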

Change SharePoint Service Identities using PowerShell

After installing SharePoint and setting up my farm, one of the first things I always do is change SharePoint service identities. In a freshly installed SharePoint farm, most services are running under the farm account or under a local identity (LocalService, LocalSystem). Some of the services I change right away:

  • Search Host Controller Service
  • SharePoint Server Search
  • Distributed Cache
  • SharePoint Tracing Service

With the exception of the SharePoint Tracing Service, all of these identities can be changed from the "Service Accounts" page in Central Administration. But where's the fun in that? Besides, this page has one big disadvantage: you can change a service to run with a managed account, but you can't set it to run under a local account (LocalService, LocalSystem, NetworkService). So, if you changed your service from a local account to a domain account, you can't undo this change using the UI. You need PowerShell for that.

The script below allows you to set a domain account or a local account.
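The essence of such a script — shown here for the Distributed Cache service, with a placeholder account name — is the ProcessIdentity pattern:

# Change the identity of the Distributed Cache service to a managed (domain) account.
$cacheService = (Get-SPFarm).Services | Where-Object { $_.Name -eq "AppFabricCachingService" }
$cacheService.ProcessIdentity.CurrentIdentityType = "SpecificUser"
$cacheService.ProcessIdentity.ManagedAccount = Get-SPManagedAccount "CONTOSO\svc_cache"
$cacheService.ProcessIdentity.Update()
$cacheService.ProcessIdentity.Deploy()

# To go back to a local identity (which the UI can't do), set the identity type instead:
# $cacheService.ProcessIdentity.CurrentIdentityType = "LocalService"
# $cacheService.ProcessIdentity.Update()
# $cacheService.ProcessIdentity.Deploy()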

 

Cleaning up obsolete SharePoint groups

During the lifetime of a SharePoint implementation, sites come and go. When a site collection grows, you typically see the number of SharePoint groups growing as well, because you want to give people access to those sites in an organized way. When sites go, those groups are left behind. Removing those obsolete SharePoint groups can be a challenging task, because groups which have been created for a specific site can be used for other sites as well. So, before removing a group, you need to be sure that it's not used on any other subsite.

SharePoint groups live on the root web of a site collection.

If a group is created as part of the site creation process, it will have a description which clearly states for which site it has been created. This doesn't mean it can't be used on any other sites. If you want to get a list of all SharePoint groups which exist in a site collection, you can use the following PowerShell snippet to get that collection.
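For example, with a placeholder site collection URL:

# All SharePoint groups that exist in the site collection (they live on the root web).
$site = Get-SPSite "https://intranet.contoso.com/sites/teamsite"
$site.RootWeb.SiteGroups | Select-Object Name, Description
$site.Dispose()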


If you want to know which groups are used on a specific subsite of the site collection, you can use the UI and check the Site Permissions section of a site. This will give you all permissions for that site. You can also use the following PowerShell snippet to get these.
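For example, for a subsite (placeholder URL again):

# Only the SharePoint groups that are actually used on this specific web.
$web = Get-SPWeb "https://intranet.contoso.com/sites/teamsite/subsite"
$web.Groups | Select-Object Name
$web.Dispose()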


See the difference? The SiteGroups collection has a group "MyCustomGroup" which is not part of the Groups collection of the same web. This means that the group exists in the site collection but, at this point, it's not used. When I give this group explicit permissions on my site, it will be added to the Groups collection and it will be in use.

So, the process of cleaning up obsolete groups is to check the Groups collection on each sub site and see which site groups are used for giving people access. If you have a site group which is not part of any Groups collection of any site, it’s not used and you can remove that group from the site collection.

You can do this manually, or you can automate this process and use the following script for this task.

This script does 2 things.
If run in Simulation mode, it will look for obsolete groups and output them to the console. Nothing more.
If run in Execution mode, it will look for obsolete groups and delete them from the site collection.

My advice… run it in Simulation mode before running it in Execution mode. That way, you have an idea of what groups were found and will be deleted.
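Stripped down to its essence, such a script looks like this — a sketch with a placeholder parameter set, where -Simulate stands in for Simulation mode:

param(
    [Parameter(Mandatory=$true)][string]$SiteUrl,
    [switch]$Simulate
)

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$site = Get-SPSite $SiteUrl

# Collect the names of all groups that are actually used on any web in the site collection.
$usedGroups = @()
foreach ($web in $site.AllWebs) {
    $usedGroups += $web.Groups | ForEach-Object { $_.Name }
    $web.Dispose()
}

# Every site group that is not used on any web is considered obsolete.
foreach ($group in @($site.RootWeb.SiteGroups)) {
    if ($usedGroups -notcontains $group.Name) {
        if ($Simulate) {
            Write-Host "Obsolete group found: $($group.Name)"
        }
        else {
            Write-Host "Removing obsolete group: $($group.Name)"
            $site.RootWeb.SiteGroups.Remove($group.Name)
        }
    }
}

$site.Dispose()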

There are some situations which need clarification.

Inherited permissions

What happens when you have subsites which inherit permissions? This is no issue. Suppose you have a subsite which inherits permissions of the root web of the site collection. When a SharePoint group is given access to the root web of the site collection, it will be given access to all sub sites which inherit their permissions and as such, that group will be part of the Groups collection of those sub sites.

Audience targeting

What happens when a group is exclusively used for audience targeting? Well, this is a problem, because that group is not part of the Groups collection of the site where you use it as an audience. In my opinion, this is a situation you should avoid, because an audience is, in theory, a subset of authorized users. You want to target specific content on a site to specific users; if they can't reach the site, what's the point in targeting content to them?

If you do find yourself in a situation where you have SharePoint groups which are exclusively used for audience targeting, a good approach would be to give those groups distinctive names that clearly indicate they are audience targeting groups. For example, you could start each group name with "AUD_". This way, you can extend the script above and include a check to skip groups whose name starts with "AUD_".

SharePoint issues when using a trust with Selective Authentication

If you have some experience with SharePoint, the issue where you get a credential request three times before hitting the 401 Unauthorized is probably not new to you. We all know this happens when you try to navigate to a SharePoint site from the web front-end servers. Resolving this is common knowledge for SharePoint admins… You disable the loopback check in the registry or you use the recommended BackConnectionHostNames registry key. This has been documented in KB896861.
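As a refresher, the BackConnectionHostNames fix boils down to adding a multi-string value under the Lsa\MSV1_0 key — a sketch with placeholder host names:

# Add the host names of your web applications to BackConnectionHostNames (KB896861).
$path = "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0"
$hostNames = @("intranet.contoso.com", "mysites.contoso.com")
New-ItemProperty -Path $path -Name "BackConnectionHostNames" -PropertyType MultiString -Value $hostNames -Force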

Last week, I was at a customer doing an assessment of a SharePoint implementation, and one of their developers approached me with a weird issue on their extranet. They have a SharePoint farm in a separate extranet domain. Between the internal domain and the extranet domain there is a one-way trust, which allows users from the internal domain to use their accounts to log on to a site on the extranet. He was able to do this from the web front-end servers of the extranet farm, but not from his laptop. On his laptop, he had to enter his credentials and this kept failing… seems familiar, right?

I double-checked the BackConnectionHostNames on the servers and sure enough, the key and hosts were there.

I tried the same thing on my machine with my account, and it worked! I was able to go to the site from my machine. When he tried it from my machine with his account, it failed. We tested this on several other clients with several users… ALL of them had the same issue. Nobody was able to sign in. I was the only one who was able to sign in, from any machine.

I will spare you the checks and comparisons we did, but I will tell you that we were able to solve it!

Servers in a domain are, just like user accounts, objects in Active Directory. When you open the properties of such a computer object in AD and go to the Security tab, you can specify a whole range of permissions which specific AD objects can have on this computer. One of those permissions is "Allowed to authenticate". For the servers in that extranet farm, I was explicitly granted that permission, while the "Authenticated Users" group was not…


In normal circumstances, this doesn’t pose any issue. If you have 1 domain which contains your users and servers, this permission is not required. Furthermore, if you have multiple domains and a one-way trust and you keep the default trust authentication level (Forest-wide authentication), you will not have any issues with users from the trusted domain authenticating to resources in the trusting domain.


However, when you are using “Selective Authentication”, you need to explicitly grant the “Allowed to authenticate” permission to all users on the resources they need to access. When we verified this authentication level at the customer, we got confirmation that they were using selective authentication. So, we had to give “Authenticated Users” this permission on the SharePoint servers in the AD of the Extranet to resolve this issue.

See the following articles for more information on selective authentication for trusts.

The SharePoint 2013 installer doesn’t like .NET Framework 4.6.x

One of the prerequisites of SharePoint 2013 is the .NET Framework 4.5. I have been installing countless SharePoint 2013 environments without issues, but recently I started noticing that the SharePoint installer fails with the following error:

Setup is unable to proceed due to the following error(s):
This product requires Microsoft .NET Framework 4.5.

 


This might seem strange, since the prerequisite installer installed everything. When this happens, have a look at the versions of the .NET Framework that are actually installed. To check this, you can download the .NET Framework Setup Verification Utility from Microsoft. When you run this tool, you will see an overview of all installed versions, and you will probably notice that .NET Framework 4.6 or higher is among them.


When .NET Framework 4.6 or 4.6.1 is listed, the SharePoint installer cannot detect the presence of version 4.5.

With Service Pack 1, SharePoint 2013 itself is compatible with version 4.6, but the installer isn't. You can have version 4.6, but only AFTER you have installed SharePoint 2013.

So, in order to get SharePoint installed on that machine, you need to uninstall version 4.6 or higher before you can install SharePoint 2013. And this is where it gets a bit annoying. The tool to verify the .NET Framework also has the ability to uninstall a version. Except… this doesn’t work! You can try it but nothing happens. When you recheck, it’s still there. Even after a reboot.

This version of the .NET Framework can be removed by uninstalling Windows update KB3102467. But this is just the first step… if you are installing SharePoint in an organization which pushes Windows Updates, you probably also want to block this version of the framework from being installed on that machine for the time being.

To do that, you can execute the following PowerShell script. It adds a 'BlockNetFramework461' value to the registry and sets it to 1 to block the installation of that version. Once SharePoint is installed, you can remove this value if you want.
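A sketch of such a script, using the registry location documented in KB3133990:

# Block the installation of the .NET Framework 4.6.1 through Windows Update.
$path = "HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\WU"
if (-not (Test-Path $path)) {
    New-Item -Path $path -Force | Out-Null
}
New-ItemProperty -Path $path -Name "BlockNetFramework461" -PropertyType DWord -Value 1 -Force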

For the official article on blocking the installation of the .NET Framework 4.6.1, see the following link:

https://support.microsoft.com/en-us/kb/3133990

Once that version is uninstalled, you can proceed with the installation of SharePoint 2013 on that machine.

Update (14/09/2016)

Microsoft released a fix for the installer issue. You can find this fix in KB3087184.

Cleaning up large content databases and reclaiming unused disk space

Dealing with large SharePoint content databases can be a daunting task. Microsoft recommends keeping content databases that are in daily use below the 200GB mark. But when you are dealing with backup/restore, migrations and general operations which involve moving those kinds of databases around, even 200GB can be a huge pain in the ass and will cost you hours of staring at a progress bar or watching a percentage creep slowly to 100%.

A solution for this is to split up the content database into smaller databases, provided that the database contains multiple site collections which can be moved out.
Relocating a site collection to a different content database is very easy. You can do this with the Move-SPSite cmdlet.
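For example, with placeholder URLs and database names:

# Move a single site collection to another content database of the same web application.
Move-SPSite "https://intranet.contoso.com/sites/projectx" -DestinationDatabase "WSS_Content_ProjectX"
# Run an IISRESET on the SharePoint servers afterwards so the change is picked up everywhere.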

Once you have moved a site collection out to a different database, you will notice that your huge database is not getting any smaller. That's because the site collection which has been moved is still in that database; it's just marked for deletion.
The actual removal of that site collection is done by the Gradual Site Delete timer job. This job runs once a day. Once it has run, the site collection is completely removed from the database.
But still, if you look at the database, it will not be any smaller than before. The used space in the database has decreased and the unused space has grown, but that unused space is not released.
To release unused space, … *ducks for cover* … you can shrink the database. There, I said it.

Generally speaking, shrinking content databases is not done. You can do it, but it has no advantages and it has a negative effect on performance, because as soon as someone adds content to a site in that database, the database has to grow again before it can store anything.
So, shrinking is definitely something you should avoid at all costs… except for the case where you have a content database so huge that you've split it up into smaller content databases. The reason for splitting up the database in the first place was to make it smaller, right? To get it to a size which is much more manageable. But in order to get it back to a reasonable size, you need to shrink it. There's no way around it.

During a migration from SharePoint 2007 to SharePoint 2013, I had to migrate a content database of 220GB. All things considered, this is not huge. It's large, but not huge. This content database contained around 20 site collections. Backing up this database was not an issue… 20 minutes. Copying this backup to a SharePoint 2010 migration server was frustrating: it took over an hour. Yeah, it SHOULD go faster, but if you pass through a 10Mbit router, you are copying at USB 2.0 speed! But this was nothing compared to the time the Mount-SPContentDatabase cmdlet needed to attach this database to the web application and do the actual upgrade from SP2007 to SP2010. This attach/upgrade took almost 3 hours and then it just aborted due to lack of disk space. The migration server had a data disk of 600GB, and it had filled up completely with the transaction log that was created as part of the attach/upgrade process. So, I lost 3 hours, had to wait until extra disk space was added, and had to restart the whole thing. By the time the database was attached and upgraded, its size had actually increased to 330GB.

When everything was attached and upgraded, I decided that I was not going to go through this again for the migration from SP2010 to SP2013. I needed databases which were easier to handle. So, I split up this database into 5 databases, of which the largest was still 115GB. But OK, nothing I could do about that in the short term.

Running the Gradual Site Delete job, however, proved to be a pain as well… it took almost 6 hours to complete! It started around 2PM, I went home at 5PM, and the next day I noticed it had finished around 8PM. So, I started the shrink operation of the database… and lost half a day with that. I wasn't able to do anything with that database for the better part of the day.

Since this was only a "test" migration, I realised that history was going to repeat itself during the final migration, and that I needed a way to make use of those lost hours between the end of the Gradual Site Delete and the start of the shrink: when the Gradual Site Delete is done, start shrinking right away, so it's finished by the morning.

Enter… PowerShell!

The script below kicks off the Gradual Site Delete timer job for a specific web application and then monitors this job every 5 minutes to see if it has completed. Once it has completed, it continues with the shrinking of the database. The shrinking happens in 2 stages. The first stage is done with NOTRUNCATE. This means that SQL Server moves all pages to the front of the file, but it does not reclaim unused space; the size stays the same. The next step is a shrink operation with TRUNCATEONLY, which removes all unused space from the end of the file. It's basically the same thing that happens when you do a Shrink Database from Management Studio.

Again, don't do this as part of a weekly maintenance routine, because the first step of the shrink will introduce index fragmentation in your database. Seriously! For me, this was a necessary cleanup I had to do as part of a migration project, to reorganize the content databases and minimize the migration time. The environment I was doing this in was an intermediate SharePoint 2010 environment, not a live environment.

Also, the Shrink operation in the script allows you to specify a percentage of free space it should reserve for unused space.

I used 5%. This way, for a content database of 100GB, 5GB of free space is retained. You can change this if you want, or you can add an additional parameter which allows you to specify the amount of free space it should keep.
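Condensed, the flow of the script looks like this — web application URL, SQL instance and database name are placeholders, "job-site-deletion" is the internal name of the Gradual Site Delete job, and Invoke-Sqlcmd assumes the SQL Server PowerShell module is available:

$webAppUrl = "https://intranet.contoso.com"
$sqlServer = "SQL01"
$database = "WSS_Content_Intranet"
$freeSpacePercent = 5

# Kick off the Gradual Site Delete timer job for the web application and wait until it has run.
$job = Get-SPTimerJob "job-site-deletion" -WebApplication $webAppUrl
$lastRun = $job.LastRunTime
$job.RunNow()

while ((Get-SPTimerJob "job-site-deletion" -WebApplication $webAppUrl).LastRunTime -le $lastRun) {
    Start-Sleep -Seconds 300   # check every 5 minutes
}

# Stage 1: move all pages to the front of the file (releases nothing, introduces index fragmentation).
Invoke-Sqlcmd -ServerInstance $sqlServer -Query "DBCC SHRINKDATABASE([$database], $freeSpacePercent, NOTRUNCATE)" -QueryTimeout 65535

# Stage 2: release the unused space at the end of the file back to the operating system.
Invoke-Sqlcmd -ServerInstance $sqlServer -Query "DBCC SHRINKDATABASE([$database], TRUNCATEONLY)" -QueryTimeout 65535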

You can find the full script in my PowerShell repository on GitHub.

An eye for details… changing the ImageUrl for migrated lists

Migrating from SharePoint 2007 to SharePoint 2013 can cause all kinds of headaches, but this post is not about those headaches. It's about details, or better… having an eye for detail. Ever noticed that after you migrate a site to SharePoint 2013 and complete the visual upgrade to SharePoint 2013 mode, the list icons are not the fancy icons you get when you create a new list or library in SharePoint 2013? The icons for migrated lists and libraries are still the old icons from the early days of SharePoint 2007.


The icon for a list or library is stored in the ImageUrl property of the list, and for migrated lists this property points to a gif or png in the "/_layouts/images/" folder. When you create a new list in SharePoint 2013, the value of the property points to "/_layouts/15/images". Furthermore, if you compare, for instance, a migrated document library with a new document library, you notice that the value of the property differs not only in the location the icon is served from, but also in the type of file. Take a simple document library, for instance:

  • Migrated document library : /_layouts/images/itdl.gif
  • New document library : /_layouts/15/images/itdl.png?rev=23

While I can imagine that a lot of people really don't see any issue with this and don't care what those icons look like, I don't like loose ends. If you migrate an environment, you might as well finish the job and replace the list icons with new versions as well.


Admit it… the new icons look much better than those old-school ones. It's a small detail, but it just makes more sense. If you have a smart user who actually cares about the environment, the question why new lists have different icons than existing lists will eventually pop up anyway, and if you tell them that this is the result of the migration, the next question will be whether you can change the existing ones to resemble the new lists. Show your customers or users that you have an eye for detail and do it proactively.

Changing these icons can be done very easily using PowerShell. The only thing you need is a mapping between the old and new icon.
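The core of it is a lookup table and a loop over the lists — a sketch with a placeholder URL and only the document library icon mapped:

# Map old SharePoint 2007-style icons to their SharePoint 2013 counterparts.
$iconMap = @{
    "/_layouts/images/itdl.gif" = "/_layouts/15/images/itdl.png?rev=23"
    # ... extend with the rest of the old-to-new mappings
}

$web = Get-SPWeb "https://intranet.contoso.com/sites/migratedsite"

foreach ($list in $web.Lists) {
    if ($iconMap.ContainsKey($list.ImageUrl)) {
        $list.ImageUrl = $iconMap[$list.ImageUrl]
        $list.Update()
    }
}

$web.Dispose()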

I created a script which replaces all icons for lists and libraries. In this script, a mapping is done for most of the icons which are used. It might not be the complete list, but feel free to add missing icons. There are some scripts out there which don't use a mapping but simply replace all .gif icons with .pngs. However, there are some icons which don't have a .png counterpart, so if you replace those, your list icon will be broken.

You can find the full script in my PowerShell repository on GitHub.

Replacing event receivers in SharePoint

I'm currently migrating a SharePoint 2007 environment to SharePoint 2013. For this particular environment, a custom solution was built which involves a number of event receivers. The customer wanted to retain this functionality, so I had to port this solution to SharePoint 2013. One problem though… the source code was not available. We had to resort to reverse engineering the solution using ILSpy to recreate the source code and build a new solution. We made sure that all feature IDs were the same and that our namespaces and class names were also the same. After deploying and testing the solution, it worked.

During the migration, we attached the content database to the SharePoint 2013 web application and that’s when we noticed something.
When you add an event receiver to a SharePoint list, the "Assembly" property of the event receiver contains the assembly signature of the DLL which contains the event receiver class. When we attached the database, SharePoint complained that it was missing an assembly: the assembly of the old solution. When we compared the assembly signature of the old solution with the signature of our new solution, we saw it had a different PublicKeyToken. We had completely overlooked this. This was one of those "Doh!" moments.

It turns out that it's not that straightforward to reuse the old PublicKeyToken. I found a way to extract the public key from the old DLL and generate a strong name key (SNK) file:

sn.exe -e myassembly.dll mykey.snk

But this strong name key is missing one crucial piece of information: the private key. If you want to sign your solution with this key, you need to use delay signing. Your solution will build and the signature will match the one from the old assembly, but when you try to deploy it to SharePoint, it fails because the assembly can't be added to the GAC due to the missing private key.

I figured that instead of looking for workarounds, the easiest way to solve this was to replace the old event receivers with new ones which have the correct signature. This proved to be an easy solution. I created 2 scripts which helped me with this.

Get all event receivers with a specific signature

This script returns all event receivers which have a specific assembly signature.

You can export this output to a CSV file, which can be used in the next script. All the information which is needed to replace these event receivers is included in the output.
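A sketch of that first script — it walks all lists in all webs of a web application and outputs every receiver whose Assembly property matches the signature you pass in (the URL and signature in the example are placeholders):

param(
    [Parameter(Mandatory=$true)][string]$WebApplicationUrl,
    [Parameter(Mandatory=$true)][string]$AssemblySignature
)

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Get-SPWebApplication $WebApplicationUrl | Get-SPSite -Limit All | Get-SPWeb -Limit All | ForEach-Object {
    $web = $_
    foreach ($list in $web.Lists) {
        foreach ($receiver in @($list.EventReceivers)) {
            if ($receiver.Assembly -eq $AssemblySignature) {
                # Everything needed to recreate the receiver later.
                [PSCustomObject]@{
                    WebUrl         = $web.Url
                    ListTitle      = $list.Title
                    ReceiverId     = $receiver.Id
                    Name           = $receiver.Name
                    Type           = $receiver.Type
                    SequenceNumber = $receiver.SequenceNumber
                    ClassName      = $receiver.Class
                }
            }
        }
    }
    $web.Dispose()
}

# Example: .\Get-EventReceivers.ps1 -WebApplicationUrl "https://intranet.contoso.com" -AssemblySignature "My.Receivers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0123456789abcdef" | Export-Csv receivers.csv -NoTypeInformation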

Delete and recreate event receivers

Using the CSV file created by the previous script, the script below deletes the old event receivers and replaces them with new ones. It uses the information about the old event receivers that is included in the CSV, and the signature which is passed in as a parameter becomes the assembly signature of the new event receivers.
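And a sketch of the second script, which consumes that CSV together with the new assembly signature (file path and signature are again placeholders):

param(
    [Parameter(Mandatory=$true)][string]$CsvPath,
    [Parameter(Mandatory=$true)][string]$NewAssemblySignature
)

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Import-Csv $CsvPath | ForEach-Object {
    $row = $_
    $web = Get-SPWeb $row.WebUrl
    $list = $web.Lists[$row.ListTitle]

    # Delete the old receiver, identified by its Id.
    $oldReceiver = $list.EventReceivers | Where-Object { $_.Id -eq [Guid]$row.ReceiverId }
    if ($oldReceiver) {
        $oldReceiver.Delete()
    }

    # Recreate it with the same name, type and class, but with the new assembly signature.
    $newReceiver = $list.EventReceivers.Add()
    $newReceiver.Name = $row.Name
    $newReceiver.Type = [Microsoft.SharePoint.SPEventReceiverType]$row.Type
    $newReceiver.SequenceNumber = [int]$row.SequenceNumber
    $newReceiver.Assembly = $NewAssemblySignature
    $newReceiver.Class = $row.ClassName
    $newReceiver.Update()

    $web.Dispose()
}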

You can find the full scripts in my PowerShell repository on GitHub.