Over the past month I have been working on a major incident on our production SharePoint 2007 environment.
When opening a sub site in a site collection by entering its URL without the default.aspx file, SharePoint would show an HTTP 500 error (Internal Server Error). When you tried to open the default.aspx file directly, SharePoint would display the error "Cannot complete this action. Please try again.". The symptoms were similar to:
About a month ago a site administrator of a sub site noticed that the NT AUTHORITY\AUTHENTICATED USERS group was listed in the site members group. In other words, everybody who was authenticated in the domain or in trusted domains had Contribute permissions. As you can imagine this was not supposed to happen, so he tried to delete the AUTHENTICATED USERS group from the members group. Exactly what the user did and what happened is not really clear, but the result was that the sub site was no longer reachable.
After searching for a solution for about a day, we could not figure it out and called in Microsoft support. Unfortunately they weren't sure how to solve it either, but they did mention the possibility that installing SP1 might solve the issue. On the other hand, they also feared that installing SP1 would fail because of the broken site. To be sure, we preferred to test the installation of SP1 first. In the meantime, the users couldn't access their data and we had to fix that first.
Microsoft's suggestion was to use stsadm -o export/import. We would restore a known good copy of the database to a restore environment, export the sub site tree, copy it to the production environment and import it on top of the broken sub site. Unfortunately this did not work; we received the same error. The alternative was to restore to a new sub site. The import started fine, but after about seven hours stsadm crashed! A second import crashed as well... now what?
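The export/import round trip described above can be sketched as follows. This is only a sketch and not necessarily the exact commands we ran: the URLs and the file name are hypothetical placeholders, the -includeusersecurity switch is an assumption (it is optional, and useful when permissions should travel with the content), and the small Python helper only builds the stsadm command lines so they can be reviewed before running them on the servers.

```python
# Sketch: build the stsadm export/import command lines (nothing is executed).
# The URLs and the file name are hypothetical placeholders.

def export_cmd(site_url, filename):
    """Command to export a sub site tree from the restore environment."""
    return ["stsadm", "-o", "export", "-url", site_url,
            "-filename", filename,
            "-includeusersecurity"]  # assumption: optional switch

def import_cmd(site_url, filename):
    """Command to import the exported package into the target sub site."""
    return ["stsadm", "-o", "import", "-url", site_url,
            "-filename", filename,
            "-includeusersecurity"]  # assumption: optional switch

export_line = " ".join(export_cmd("http://restore/sites/teams/subsite", "subsite.cmp"))
import_line = " ".join(import_cmd("http://production/sites/teams/subsite", "subsite.cmp"))
print(export_line)
print(import_line)
```

The export runs on the restore environment and the import on production, after the exported package has been copied across.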
We had bought MetaLogix Migration Manager to migrate content from file shares, ASP/HTML websites and other SharePoint environments into our new central MOSS2007 environment. So we gave that tool a go. Fortunately that worked. After several hours, the data was available again for the users.
FIX THE PRODUCTION ENVIRONMENT:
I wanted to test the installation of SP1 on a copy of that site collection. I tried to use stsadm -o backup/restore to copy the production site collection to a restore environment. The backup went fine, but the restore failed after 8 to 10 hours :-(
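For reference, a site collection backup and restore with stsadm looks roughly like this. Again a sketch under assumptions: the URLs and the backup path are hypothetical, only the required parameters are shown, and the Python code just prints the command lines instead of running them.

```python
# Sketch: stsadm site collection backup/restore command lines (not executed).
# The URLs and the backup file path are hypothetical placeholders.

def backup_cmd(site_collection_url, filename):
    """Command to back up a whole site collection to a file."""
    return ["stsadm", "-o", "backup",
            "-url", site_collection_url, "-filename", filename]

def restore_cmd(site_collection_url, filename):
    """Command to restore that backup file to a site collection URL."""
    return ["stsadm", "-o", "restore",
            "-url", site_collection_url, "-filename", filename]

backup_line = " ".join(backup_cmd("http://production/sites/teams", "D:\\backup\\teams.bak"))
restore_line = " ".join(restore_cmd("http://restore/sites/teams", "D:\\backup\\teams.bak"))
print(backup_line)
print(restore_line)
```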
I then tried to create a backup of the entire database, restore that to the restore database server and attach to an empty web application. Fortunately that worked and I had an environment to test SP1 on. The installation of SP1 completed successfully and we now know that we can install SP1 without any issues on the environment.
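The attach step can be sketched with stsadm -o addcontentdb. The SQL Server backup and restore of the content database happen before this and are not shown. The web application URL, database name and server name below are hypothetical, and the Python code only builds the command line.

```python
# Sketch: attach a restored content database to an empty web application.
# Nothing is executed; URL, database name and server name are hypothetical.

def attach_content_db_cmd(web_app_url, database_name, database_server):
    """Command line for stsadm -o addcontentdb."""
    return ["stsadm", "-o", "addcontentdb",
            "-url", web_app_url,
            "-databasename", database_name,
            "-databaseserver", database_server]

attach_line = " ".join(
    attach_content_db_cmd("http://restore", "WSS_Content_Teams", "RESTORESQL01"))
print(attach_line)
```

Because this works on the database as a whole instead of moving content item by item, it avoids the long-running stsadm export and restore operations that crashed for us.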
The entire site collection was about 48GB in size. The sub site tree was about 25GB. Moving the data using the default stsadm tools did not work. It looks like stsadm has some issues with large sites.
According to Microsoft Product Support, SP1 contains stored procedures which check and fix security issues. They weren't sure if those stored procedures would fix our issue. Obviously it did :-)