Getting the production database to work with locally - sql-server

I'm nearing the stage of saying "lets go live" of a client's system I've been working on for the past few months, basically a few autonomous services including a public facing website, an intranet website, an hourly exporter from a legacy OLTP DB based on modified flags/triggers etc.. and few others services that compose the system, with each their own specialized database etc.. , communicating with each other via messages (NServiceBus).
When I started I tried to keep a local replication of everything but its proving more and more difficult and on reflection probably a major friction point of the past few weeks, I like to keep regularly up to date as the legacy database is growing and causing hundreds events daily. Having high latency & mediocre bandwidth (between myself and the client's site, I'm in SE Asia where bandwidth is generally crap anyway) is also an issue for RDP, SQL tools, remote connection strings etc.. Tracking down integration bugs and understanding scenarios they present during feedback/integration/QA is also difficult as my data doesn't reflect the current state of the client's DB (the clients' staff have been working and evolving the data their end) and means another break, coffee and lengthy sync again. It would be ideal to do it all locally and then deploy at the end but I have to deliver parts incrementally (to get the check), and some parts are even in use (although not critical) so bug fixing on in use pats needs turn around quick, and with it being a small company, incremental feedback, it helps flush out some of the more vague requirements along the way (curse me).
I was thinking it would be good to have twice daily sync between the environments (their DBs to mine), I somewhat have design control over everything apart from the legacy SQL server database.
What are the best options SO users?
I was thinking of setting up a Windows 2003 light VM on my dev box. And in this install the same setup of the client sites (but not spread across multiple servers obviously). And then for syncing the databases I was thinking about SQL Server replication? or batch scripts? Or is there any better tools - ones that are fast and good compressions? I don't want my changes to go back to production (I have a separate CI & deployment procedure), I just want (I think I want.. tell me if better idea) my databases to be refreshed every night or twice a day (maybe whilst I'm at lunch bandwidth permitting).
How does everyone approach this?

I would recommend two ways to do this:
Snapshot Replication
Backing up the transaction log and manually (or batch) applying it
Snapshot replication can be difficult to get working, but it is possible even in offline situations such as physcially carrying snapshots to another location.
The transaction log method can be used as part of your standard backup procedures. ie: A full backup twice a week with transaction log backups more regularly.
Remember that best practice is to cleanse the data before using it in a test environment. At the very least this should be changing all personal data, especially email addresses, passwords and any other method which could result in some automated process making contact with the user in your database.

Related

How hard to migrate from IaaS to PaaS on Azure

So I'm thinking of dipping a toe in the Azure pool
Our web App Suite will soon be a pure ASP.Net + SQL Server affair
For various reasons it will be simpler to initially create a SQL VM and run everything from there initially.
How hard will it be to ...
...migrate SQL off the VM and into either "Cloud Services" or "Data Management"?
...migrate the suite of WebApps off the VM and into "Websites"?
It is my understanding that having achieved this migration, the OS level updates will no longer be my concern as they will be handled by the service. Thus at this point I'll be able to throw the original VM away :)
This isn't exactly answering your questions, but it might help educate you on more questions to ask and giving you a boost out of the gate. These were all lessons learned before, during, or after our migration of our systems to Azure. Now that we're up there, we have a ~50GB database with ~6 services running across ~30 instances. As long as our database backup behaves, total amount of effort in upgrading all of this is less than an hour (and could be much less if we didn't have many safeguards meant to force us to be aware of what's going on during the migration process - we don't want it to be too easy to deploy just to protect us from ourselves).
Preparing to migrate your system to Azure:
If you're planning to go to Azure, you first need to make sure your architectures and technologies are compatible. This isn't to say you have to code everything specific to Azure. This means some of the following things:
You should realize that "high availability" does not mean "error-free". In fact, high-availability environments usually have more errors that you have to handle and manage. For example, if you have a request going over the network to a server that just had a motherboard fry and was taken offline, that network request will be unsuccessful. That's not typically a problem you code for in "standard" server apps. To take it even further, what if that failed network is for a Database Connection that gets put back into a connection pool? That will cause that connection to be poisoned and broken the next time somebody pulls it out regardless of the future state of that server that went poof! There are just some extra things to worry about here because you're no longer depending on just 1 network with 4 servers on it but are now depending on hundreds of networks with thousands of servers on them. That 0.05% error scenario will happen MUCH more often to you than you've ever experienced in the past and you really have to be aware of this!
You should use dependency-injection to easily change things around. Proper separation of concerns will changes that seem very difficult become very easy in Azure.
You should use architectures for "high-availability". For example, a web application that would break when ran in a web farm would also break in Azure but a web application designed to work in a web farm would be very easy to run in Azure.
You should have automated deployments and configuration transforms for all of your applications. Anything else is just unsustainable unless it's nothing more than one little web site or something like that.
Depending on your needs, you can do it in phases. If database latency is something that isn't a big deal, perhaps a hybrid approach (over VPN from Azure to your data center) is acceptable to get your apps in Azure first while you later migrate your database. Or perhaps the opposite. What we did was keep primary apps and database in our data center but put secondary apps up in Azure first. Then some primary apps (that took a performance hit for a month until) later our database and critical apps. That final migration sure made for a very long weekend and not much sleep, but it is SOO much nicer now that we're done!
Migrating your applications to Azure:
This ultimately depends very heavily on what your application is or does, and every scenario has different steps/issues/benefits. I'm not going to cover this deeply other than to say, "Use Google, it's your friend!". Beyond that, for us, getting our applications up into Azure was the largest payoff when compared to our data. The ROI on our app migration was less than a month between hosting costs, licensing, and management effort. Instead of taking a couple days to setup a server, I can now take a day to setup an entirely new and duplicated environment of all of our SaaS applications/databases/etc and have them running on ~25 different Cloud instances!
Instead of trying to tell you how to migrate these, let me give you a few words of caution so you know sooner rather than later:
If you have app problems in Windows 2012, humor me and try it in Windows 2008 R2. There are a couple bugs in some of the 2012 images that they've prepared. It's incredibly trivial to switch back and forth!
Go make your logging 1000x's better than what it is now. If you don't do that now, you'll regret it.
Don't depend 100% on the easy-to-implement "Azure Logging". It works well enough but it more-or-less requires your applications to start successfully and is absolutely useless in debugging startup problems. If you don't have an alternative, then you will waste many, many hours just debugging stupid little problems when your app starts up. By the time you're done with it, you could have easily added 5 other logging frameworks and had an amazingly awesome logging system in place plus a running app instead of nothing but a running app to show for the same amount of time. Really, do this! (Profiling is a good idea as well, although Mini-Profiler has load-balancing issues if you have multiple instances.)
If you add new endpoints to your deployment (ports, etc), you cannot simply "Upgrade" an existing deployment. You must delete it (the deployment, not the service) and install from scratch. You can delete the Staging one, deploy to Staging, then swap.
If you have WCF apps, pretend you don't know about Windows Activation Services. They're disabled in Azure by default. You can either hack them to turn them on (startup scripts) or create your own self-hosting application. We self-host so we can more easily tweak service configurations once we're deployed (it's not easy to edit web.config files in a way that sticks in Azure). Web services work in "IIS" in Azure but TCP, named pipes, etc. do not.
Go learn about and add the Transient Errors Application Block (or something equivalent) to anything you communicate with. If you don't do it now, you'll regret it.
Go make your logging better! Really, really, REALLY do this!
Migrating your SQL Server Database to Azure:
Getting your database up into Azure is a bit of a painful process. There isn't a quick and easy way to just get it up there and making it work. Some people have to make some major changes while others just have to tweak a few ignorable things here and there. However, no matter how large or small your database is, you really REALLY must devote a lot of time to testing it. Test your migration process. Test your scripts to prepare your database. Test the performance and stability of the database up in the cloud. Test your backup procedures. Test your upgrade procedures. Test your backup restoration procedures. Test ALL of this because I guarantee you that you will find some surprises!
Schema:
Go learn about all of the limitations of SQL Azure. No Heaps, etc. Learn them before you start! Go learn them now! They're all mostly to very reasonable.
Be aware of the 2GB T-Log limitation! This means some very large indexes can never be rebuilt! (that said, our 30GB table isn't yet hitting this)
To deploy your schema, go into SSMS for your local db and use the "Tasks -> Extract Data-tier Application..." feature (it's in different areas of the menu in different versions of SSMS). Take this file and go into SSMS for your Azure database and use the "Deploy Data-tier Application" feature. (This will help you catch some of the Azure limitations you aren't honoring if this process fails.) This is, by far, the easiest way to get an empty version of your database up into Azure.
Use a tool like Redgate SQL Compare to verify your work (you'll have to tweak a couple options, like WITH NOCHECK to get a clean comparison).
You'll have to cleanup users, schemas, broken sprocs, etc. before you succeed at this. (this is a good thing!)
Data:
Go learn about all of the limitations of SQL Azure. Learn them before you start! Go learn them now! They're all mostly to very reasonable.
Go download the Azure Database Migration Wizard from Codeplex (or wherever the latest version is). It's not the most amazing software (kinda unstable) but even if it crashes once or twice on you, it'll still save you a LOT of time!
I strongly recommend RedGate's SQL Data Compare. The previously-mentioned migration wizard will help you identify problems (it's on you to fix those) and will get you ~98% migrated but you'll want to come back and clean up after it. It has some bugs that misses nullable BIT fields (and upper ascii characters) and some other things that a tool like SQL Data Compare can easily identify and fix. It can also give you the peace-of-mind that you can depend on your database.
If your database is large, consider spinning up a temporary VM in Azure (they have them with SQL Server pre-installed and available in ~20 minutes) to do your migrations from. If you do this, it's best to upload a compressed database backup to Blob storage (Cerebrata's storage too is great for this) and then grab it and restore to SQL Server in that VM. Then stage your migrations all from there!
Test, test, TEST!!!
Be careful running SQL on a VM, it's not a high availability solution. Azure VMs are prone to restarting from time to time. Unless you have multiple VMs running SQL Server in an availability group, or you have some sort of mirroring and load balancing setup, you won't have a high availability solution. I too originally favoured the IaaS to PaaS route, in the end it seemed to be a false economy as migrating IaaS to PaaS is about as much work as migrating on-premise to PaaS. In the end I decided to take the time to optimise my application for PaaS, i.e. moving durable storage to blobs, implementing transient error handling and retry logic, etc.
What you're proposing is certainly possible but having a multi VM arrangement to deliver high availability SQL takes a bit of work and is expensive! Have a read of the following guide, it was really helpful to me when I started the migration process:
Top 7 Concerns of Migrating a .NET Application to Azure
Just yesterday Microsoft announced their plan to host also Iaas solutions and not only Saas solution on their Azure platform.
http://weblogs.asp.net/scottgu/archive/2013/04/16/windows-azure-general-availability-of-infrastructure-as-a-service-iaas.aspx
About migration, it really depends. We work with a distribution mechanism: TFS + Octopus so the deployment is very easy and it works on Iaas or SQL Azure, it doesn't really matter.
There are also other things to keep into consideration when moving into Saas. Probably your code should be refactored if it's not Saas oriented or your application may have a very high hosting cost over Azure.

Updating the production IIS WebSite and SQL Server Database without stopping

There is a testing server that uses the testing database. We test the website on the testing server. If it is okay, we update the website and the database schema from the testing server to the production server. But this method is very painful and risky.
First, we have to redirect the users to a maintenance page, so the website is paused for a while.
Second, if something goes wrong when updating, we have to back to old website, because we can't put the website in a maintenance mode for a long time.
So I'm seeking a solid solution to update an IIS website and an Sql Server Database without data loss and using maintenance mode. Is there any way to do this? How the big websites do this without data loss and pausing.
We've thought using a release candidate website. We've planned to use this RC website for temporary. First, we update the RC site, then swap the bindings between RC and production website. But this time the database is problem. Because we can change the database schema, and the old one can't work with new database. So, if we use a temp site with temp db, there will be data loss. Also, the updated website won't work with old database if the temp site uses old production database. So I need a solid and practical solution for this problem.
This is orders of magnitude more complicated than you imagine. This specifically is not about HA nor about contiguous integration. Neither of those will provide what you need, they're only pieces of the much more complex puzzle.
There simply isn't possible to write code changes in a manner that is transparent/oblivious to schema changes as they occur. At best you can write the code in a manner that supports the schema at v. N and at v. N+1, which in itself is a big challenge. But is impossible to write the code in a manner that supports the schema as it transitions from v. N to v. N+1. The schema change induced by a deployment has to be atomic for the code operating on the schema. Since the schema change itself cannot be atomic, it follow that the upgrade has two possible avenues:
take the code offline during the schema change. This is what you're doing now and is the safest approach. Of course, it implies service availability down time and runs the risks you already experienced (rollback of failed upgrade, lengthy upgrade, etc). A variant of this approach is to redirect the service to a read-only copy of the data and offer a degraded service experience(no changes are possible during the downtime) which may or may not be acceptable, depending on the business specifics.
standby upgrade. This implies that you take a snapshot of the service data (various HA solutions may provide a standby snapshot out-of-the-box, eg. log shipping). Upgrade the snapshot, then apply all the transactions that occurred on the real service data to the upgraded snapshot. This is always tricky, because it requires a technology to detect, capture and apply the changes (eg. change tracking, replication, custom solution etc) and requires to transform each change to the new, upgraded, schema. Once the upgraded schema is up to date with changes from the main service, the service can be redirected to the upgraded schema. This redirection is also much more complex than it sounds. For one choosing the moment when to cut-off the old schema and stop accepting new changes, while making sure all changes were applied to the new upgraded schema DB is a challenge in itself. Another challenge is to resolve the conflict of the code understanding pre-upgrade and post-upgrade schema versions. Developing code that handles both is, as I said, problematic and error prone, so one solution is to, again, take the service offline for a short period and replace the code. Another solution is to have a standby service, running code that handles the post-upgrade DB schema and is connected to the post-upgrade DB, and redirect the live requests to your standby, upgraded, service.
And I did not even touch the thorny subject of service interaction, when a particular service of a much larger deployed solution has to be upgraded. This is when service API protocol back compatibility plays the major role to allow the post-upgrade service to play along with its peer services.
Ultimately there just isn't any silver bullet. I've witnessed single machine large DB deployments that took weeks to roll out version N+1, with transcriptional replication contiguously feeding the post-upgrade DB schema with changes from the pre-upgrade DB. And I witnessed deployments of thousands of machines deploying version N+1 in stages, as a complicated dance of enabling code and data changes over the course of several days to reach the full functionality of the post-upgrade. This problem is just plain hard.
This is what Azure is fantastic for. The Azure Cloud platform allows for staging servers and production servers. You can set it up so once you commit your changes to Git or TFS it is pushed to either a staging or production server automatically. You can also set up to manually push changes. Most of the ORM libraries like Entity Framework have migration support.
There is a lot more information out there on this topic like:
Azure seamless upgrade when database schema changes
Staging or Production Instance?
Youre describing a high availability solution (HA). HA solutions are expensive and can quickly be overkill. you would need redundancy for your app and db servers, which means setting up a db cluster. All this increases the amount of time you would spend deploying changes, but the tradeoff is that your app is always available.
The main thing with deployment is having a repeatable process. So my best recommendation is to script or automate as much as possible.

Backing up SQL Database for Reports

I'm looking for some help/suggestions for backing up two large databases to one server dedicated to reports. The situation is;
My company has two databases for its internal website. One for the UK and one for Europe. Both are mirrored for DR.
I have a server based in Europe which is dedicated to Microsoft Reporting Services, where we run reports based on the data collected in those two databases.
We do not want to point reporting services to the live databases for performance/security reasons so we currently backup both databases on a daily basis and restore them to our Reporting Services server.
However this means we are putting a strain on our networks by backing up the entire databases, and also the data is only up-to-date by midnight yesterday.
Our aim is to have the data up to date by at least 15 minutes, it has been suggested to look at Log Shipping so I wondered if anyone had any experience in setting this up and what are the pros and cons and whether there is a better alternative?
Any help would be greatley appreciated,
Thanks
We developed a similar environment. We used Mirroring to get the data off to our reporting server and created an automated routine to create Snapshots of the database every 15 min. These snapshots only take 1 to 2 seconds to create in our environment and give us a read only copy of the database. Let me know if you would like me to go into deeper detail.
Note we are running Enterprise on both servers.
Log shipping is a great solution for this. We've got articles about it over at SQLServerPedia's Log Shipping section, and I've got a video tutorial on there talking you through your different options. One thing to keep in mind about log shipping is that when the restores happen, your users will be kicked out of the reporting database.
Replication doesn't have that problem, but replication is nowhere near "set-it-and-forget-it" - it's time-intensive to manage, and isn't quite as reliable as you'd like it to be. In addition, you may have to make schema modifications in order to use replication. Log shipping is more automatic & stable, but at the cost of kicking users out at restore time.
You can minimize that by having two log shipping schedules - one for daytime during business hours, and one for the rest. During business hours, you only restore the data once per hour (or less), and the rest of the time you do it every 15 minutes.
You should look at replication as an alternative to backups.
I would recommend that you look into using Transactional Replication.
It sounds as though you are looking to implement a scenario that is similar to what we are currently implementing ourselves.
We use Transaction Replication (albeit real time, you would most likely wish to synchronize your environment on a less frequent schedule) to offload a copy of our live production database to another server for reporting purposes.
Offloading reporting data is a common replication scenario and is described here in the Microsoft Replication documentation.
http://msdn.microsoft.com/en-us/library/ms151784.aspx
Brent is right in that there is indeed an element of configuration required with Replication, along with security considerations that would need to be addressed however, there are a number of key advantages to using Replication in my opinion, including:
Reduced latency in comparison to log
shipping.
The ability to Publish only the
Articles (tables) that are required
for reporting.
Reduced storage requirements.
Less data being published means less
network traffic.
Access to your reporting
data/database at all times.
For example, in our environment, we decided to replicate only the specific tables (articles) from our production database that we actually require for reporting.
I hope what I have described is clear and makes sense but please do feel free to contact me if you have any queries.

Database Design Theory For Multiple Application Instances

I'm working on a SaaS project that will have each customer having an instance of the application (customer1.application.com, customer2.application.com, etc.) and ideally each customer would have their "own" space in the DB. The current plan is to create a DB for each customer and deploy an instance of the application into the web farm. The idea is that each customer could opt out of an upgrade to maintain the status quo (something one of our investors REALLY wanted, largely in part because he hates how Facebook keeps changing how it works.)
Last night I attempted to roll out, to my two test accounts, an update that altered the database. While the ensuing errors that were caused were my fault (forgetting a small but apparently very important change in the DDL) I'm starting to worry about my overall theory of operation because missing one ALTER COLUMN statement and a whole upgrade cycle could be blown to hell. So after that long build up here's my questions:
1) Is there a way to do a diff between two databases (the "test" production database and a actual production database) that will accurately record each change being made?
2) Is there another database (and/or application) design model I should be considering? I know if I take away supporting multiple versions of the application that I actually remove a lot of the long term support headaches.
Food for thought:
Code upgrades happen more frequently than DB Schema upgrades. Make sure you have a really good SCM in place to handle the code upgrades. We use git with great success.
Code is easy to manage, databases are not (in comparison). The reason is that they are mutable, and change each moment. Plus, they are really hard to roll back (possible, but time consuming, with downtime). So we must arrive at a simple way to track schema updates (along with associated data changes), and be able to apply them in the future to other similar databases.
Each database schema version should be given a unique, sequential integer version number. Start with 100 per say.
Each time you have to upgrade it, write an sql scripts like
100-101.sql
101-102.sql
102-103.sql
It is the job of each script to perform the upgrade for that specific version. It can be as simple as adding a table, or as complicated as re-arranging foreign keys. But in any event, they will be reliable in what they are designed to do.
You can apply any given script many times during testing (on fresh data) to ensure it will work as expected.
So when you find yourself needing to upgrade a client from version 130 to 180, you can safely apply the sql scripts (IN ORDER), and you will arrive at the correct destination.
You should never be changing DBs by hand. Do it by a script that does all DDL changes, etc...
Ideally, there should be a generic DB release script that uses DDL version as configuration/input.
(and DDL changes should be tagged with a specific tag in a versioning system)
You can go Microsoft route re: supporting multiple versions as a headache - simply designate all versions prior to X (say 2 versions back) as un-supported. That way, you can support last 2-3 versions but don't waste resources on anything more, while allowing per-client flexibility to a large extent.
You should carefully weigh pros/cons of having versioned app/DB system like you propose.
List the the pros (such as placating an investor, positive experience for a client when version changes unexpectedly that you mentioned - translated into marginal probability to retain/add new clients who require such feature, plus an easy way to do BETA/UAT testing, plus a failrly wasy way to roll back the schema changes gone awry by loading client's data into DB schema from prior version).
List the cons (cost of DB space, cost of your time to implement, cost of support)
Compare the two and decide which is better for your business.
Redgate's SQL Compare does a good job of comparing and diffing two SQL Server databases (warning: commercial third-party product). Also, I think there's free stuff out there that does much the same thing.
If you want to be able to leave some customers behind on older versions of your product, it might make more sense to maintain a one-database-per-customer model, with the scripts for building each version of the databases under source control. This keeps your customers isolated from each other, and even allows you to switch database vendors (e.g. from SQL Server to Oracle) or versions (i.e. from SQL Server 2000 to Sql Server 2005) on some customers while keeping other customers on the older versions.
Manual run scripts will not work. Nor diff tools, for the matter. Diff works for 2,4 maybe 10 databases. But does not scale, because what you need is reliability in presence of failures (offline databases, server restarting all that).
You deploy by scheduling upgrade scripts. For instance, see how MySpace does this for over 1000 databases: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. the key take is that they use a guaranteed, reliable, delivery mechanism (SSB) to deploy schema maintenance scripts. You need an asynchronous, reliable, mechanism to run scripts because destination databases may be offline, running scheduled maintenance, unreacahbe etc, and a reliable delivery mechanism like Service Broker can handle all the retries and related issues (handling duplicates, acknowledgments etc). You can also look at Asynchronous procedure execution for an example of how to handle script execution via SSB.
As for the scripts themselves, I recommend you start looking at your database schema and configuration data as a versioned resource. I have addresses this problem already several times, eg. see Do you put your database static data into source-control ? How?
Update
I guess I own some explanation why I consider diffing a wrong approach. Just to makes things clear, I'm talking about deployments of hundreds of servers and thousands of databases. The original post compares itself to facebook and I whish them to have the success to reach that size, but also the questions asks about design principles, so I say that discussing about cloud level scale is appropiate.
I see two problems with diff tools:
Availability. All diff tools work by connecting to both the 'master' and the 'copy', so they can do their job only when both are online. This creates a hot spot, a single point of failure, the 'master' copy, whose availability becomes critical for deploying upgrades. High availability always comes at a cost. It also leaves the problem of 'copy' availability as a minor implementation details, the upgrade scheme must handle retries and time outs and disconnects from the client on its own (not a trivial problem by any means).
Atomicity. The diff tools expect a stable schema of the 'master'. This in effect places a freeze on 'master' while an upgrade is taking place. While this can be controlled on a small scale, on large scales it becomes a problem as upgrading the master itself to v. N+1 becomes a race against all the thousands of databases, when some may be still upgrading from v. N-1.
Script based solutions that ship the upgrade script to the 'copy' solve both of these problems. Also diff tools like the VSDB .dbschema based vsdbcmd.exe are better than a 'live' diff tool since the 'master' dbschema file can be delivered to the 'copy' machine and turn the whole upgrade process into a local operation.
Overall I also belive that script based upgrade, using metadata versioning, is supperior to diff based upgrade, because of reasons of testing and source control I had already talked about in the link to Q1525591.
if I take away supporting multiple
versions of the application that I
actually remove a lot of the long term
support headaches
Any change, however small, has a chance of breaking something that is important for someone.
So if you have multiple customers, rolling out a fix for customer 1 will upset customer 2. It doesn't even have to be a bugged release; it might just be a change in behaviour they disagree with. For most customers, not controlling the release schedule is simply unacceptable.
So I'd advise you to keep a different codebase for every customer. Roll out fixes only after agreement with a customer.
There is a number of customers where this approach breaks down (think Yahoo mail), but reading your question I think you're safely below that number. And for a compare tool, I can't help but agree with the posts suggesting Redgate's SQL Compare.

How would you migrate hundreds of MS Access databases to a central service?

We have literally 100's of Access databases floating around the network. Some with light usage and some with quite heavy usage, and some no usage whatsoever. What we would like to do is centralise these databases onto a managed database and retain as much as possible of the reports and forms within them.
The benefits of doing this would be to have some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
There is no real constraints on RDBMS (Oracle, MS SQL server) or the stack it would run on (LAMP, ASP.net, Java) and there obviously won't be a silver bullet for this. We would like something that can remove the initial grunt work in an automated fashion.
We upsize (either using the upsize wizard or by hand) users to SQL server. It's usually pretty straight forward. Replace all the access tables with linked tables to the sql server and keep all the forms/reports/macros in access. The investment in access isn't lost and the users can keep going business as usual. You get reliability of sql server and centralized backups. Keep in mind - we’ve done this for a few large access databases, not hundreds. I'd do a pilot of a few dozen and see how it works out.
UPDATE:
I just found this, the sql server migration assitant, it might be worth a look:
http://www.microsoft.com/sql/solutions/migration/default.mspx
Update: Yes, some refactoring will be necessary for poorly designed databases. As for how to handle access sprawl? I've run into this at companies with lots of technical users (engineers esp., are the worst for this... and excel sprawl). We did an audit - (after backing up) deleted any databases that hadn't been touched in over a year. "Owners" were assigned based the location &/or data in the database. If the database was in "S:\quality\test_dept" then the quality manager and head test engineer had to take ownership of it or we delete it (again after backing it up).
Upsizing an Access application is no magic bullet. It may be that some things will be faster, but some types of operations will be real dogs. That means that an upsized app has to be tested thoroughly and performance bottlenecks addressed, usually by moving the data retrieval logic server-side (views, stored procedures, passthrough queries).
It's not really an answer to the question, though.
I don't think there is any automated answer to the problem. Indeed, I'd say this is a people problem and not a programming problem at all. Somebody has to survey the network and determine ownership of all the Access databases and then interview the users to find out what's in use and what's not. Then each app should be evaluated as to whether or not it should be folded into an Enterprise-wide data store/app, or whether its original implementation as a small app for a few users was the better approach.
That's not the answer you want to hear, but it's the right answer precisely because it's a people/management problem, not a programming task.
Oracle has a migration workbench to port MS Access systems to Oracle Application Express, which would be worth investigating.
http://apex.oracle.com
So? Dedicate a server to your Access databases.
Now you have the benefit of some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
This is what you were going to do anyway, only you wanted to use a different database engine instead of NTFS.
And now you have to force the users onto your server.
Well, you can encourage them by telling them that you aren't going to overwrite their data with old backups anymore, because now you will own the data, and you won't do that anymore.
Also, you can tell them that their applications will run faster now, because you are going to exclude the folder from on-access virus scanning (you don't do that to your other databases, which is why they are full of sql-injection malware, but these databases won't be exposed to the internet), and planning to turn packet signing off (you won't need that on a dedicated server: it's only for people who put their file-share on their domain-server).
Easy upgrade path, improved service to users, greater centralization and control for IT. Everyone's a winner.
Further to David Fenton's comments
Your administrative rule will be something like this:
If the data that is in the database is just being used by one user, for their own work (alone), then they can keep it in their own network share.
If the data that is in the database is for being used by more than one person (even if it is only two), then that database must go on a central server and go under IT's management (backups, schema changes, interfaces, etc.). This is because, someone experienced needs to coordinate the whole show or we will risk the time/resources of the next guy down the line.

Resources