Cleaning up large content databases and reclaiming unused disk space

Dealing with large SharePoint content databases can be a daunting task. Microsoft has the recommendation to keep your content databases below the 200GB mark for databases which are used daily. But when you are dealing with backup/restore, migrations and general operations which involve moving around those kind of databases, even 200GB can be a huge pain in the ass and will cost you in terms of time you are spending looking at a progress bar or watching a percentage creeping slowly to 100%.

A solution for this is to split up the content database into smaller databases provided that you have multiple site collections in that database which can be moved out.
Relocating a site collection to a different content database is very easy. You can do this with the Move-SPSite cmdlet.

Once you moved out a site collection to a different database, you will notice that your huge database is not getting smaller. That’s because the site collection which has been moved, is still in that database but it’s marked for deletion.
The actual removal of that site collection is done by the Gradual Site Delete timer job. This job runs once a day. Once it has run, the site collection is completely removed from the database.
But still, if you look at the database, it will not be any smaller than before. When you look at the used space in the database, you will see that this has decreased and the unused space has grown. The unused space is not released.
To release unused space, … *ducks for cover* … you can shrink the database. There, I said it.

Generally speaking, shrinking content databases is not done. You can do this, but it has no advantages and it has a negative effect on performance because when someone adds something to a site in that database, it has to expand to be able to store anything.
So, shrinking is definitely something you should avoid at any cost… except for the case where you have such a huge content database that you’ve split up into smaller content databases. The reason for splitting up the database in the first place was to make it smaller, right? To have a size which is much more manageable. But in order to get it back to a reasonable size, you need to shrink it. There’s no way around it.

During a migration from SharePoint 2007 to SharePoint 2013, I had to migrate a content database of 220GB. All things considered, this is not huge. It’s large, but not huge. This content database contained around 20 site collections. Backup of this database was not an issue… 20 minutes to backup this database. Copying this backup to a SharePoint 2010 migration server was frustrating. It took over an hour. Yeah, It SHOULD go faster but if you pass through a 10Mbit router, you are copying at USB 2.0 speed! But this was nothing compared to the time the Mount-SPContentDatabase cmdlet needed to complete to attach this database to the web application and do the actual upgrade from SP2007 to SP2010. This attach/upgrade took almost 3 hours and then it just aborted due to lack of disk space. The migration server had a data disk of 600GB and it just filled completely with the transaction log that was created as part of the attach/upgrade process. So, I lost 3 hours, had to wait until extra disk space was added and restart the whole thing again. By the time it had attached and upgraded, the size of the database was actually increased to 330GB.

When everything was attached and upgraded, I decided that I’m not going through this again when I do the migration from SP2010 to SP2013. I needed to have databases which are easier to handle. So, I split up this database into 5 databases of which the largest was still 115GB. But ok, nothing I could do about that in the short term.

Running the Gradual Site Delete job however proved to be a pain as well… took almost 6 hours to complete! Started around 2PM. Went home at 5PM. Next day, I noticed it finished around 8PM. So, I started the shrink operation of the database… lost half a day with that. Wasn’t able to do anything with that database for the larger part of the day.

Since this was only a “test” migration, I realised that history was going to repeat itself during the final migration and that I needed a way to make use of those lost hours between the finishing of the gradual site delete and the shrink. When the gradual site delete is done, start with the shrinking to have it done in the morning.

Enter… PowerShell!

The script below is going to kick off the Gradual Site Delete timer job for a specific web application and it then will monitor this job every 5 minutes to see if has completed. Once it has completed, it will continue with the shrinking of the database. The shrinking happens in 2 stages. The first stage is done with NOTRUNCATE. This means that SQL will move all pages to the front of the file but it will not reclaim unused space. The size stays the same. The next step is a shrink operation with TRUNCATEONLY. This will just remove all unused space from the end of the file. It’s basically the same thing that happens when you do a Shrink Database from the Management Studio.

Again, don’t do this as a part of a weekly maintenance routine because the first step of the shrink will introduce index fragmentation in your database. Seriously! For me, this was a necessary cleanup I had to do as part of a migration project to reorganize the content database and minimize the migration time! The environment I was doing this in, was a intermediate SharePoint 2010 environment, not a live environment.

Also, the Shrink operation in the script allows you to specify a percentage of free space it should reserve for unused space.

I used 5%. This way, for a content database of 100GB, 5GB of free space is retained. You can change this if you want, or you can add an additional parameter which allows you to specify the amount of free space it should keep.

You can find this script in my PowerShell repository on GitHub.