As all databases should be, the source for ours is versioned using source control. The database is upgraded using a series of SQL scripts generated by Red Gate's comparison tool, which is essentially the same as an 'up' migration in the numerous database migration frameworks that seem to have sprung up recently.
But what's the point in the 'down' migrations in these frameworks? Often the code for the 'up' migration is extremely complex (typically complex data migration as features evolve) and I struggle to see the purpose of having to write it all in reverse for the 'down' one. It's certainly something I've never felt the need for. Am I missing something here...?
It seems that the pertinent question here is:
Why would a scripted rollback ever be preferable to a full database restore from a backup taken immediately before the upgrade?
I can think of several reasons:
The database is very large - say a few hundred GB - and your company cannot afford the downtime and/or administrative overhead that would be involved in a full restore.
A bug was introduced that was not discovered until a week or two into production. If you've never experienced this before, you're lucky. Once you've got a week's worth of transactions in the new database, you can forget about just restoring from backup.
The bug was not discovered until months into the release. In other words, you don't even have the backup anymore, and you're officially in damage control/disaster recovery mode. I've never experienced this, but I've heard stories. It's a scary thought - how do you undo all the damage that was done? In this case your downgrade might not be perfect, but it might still be better than the alternative.
By contrast, perhaps the database changes were trivial - adding a few rows here, a few triggers there. In this case, a scripted rollback is going to take much less time than a restore. It's possible that some things that took hours to upgrade - such as creation of new indexes or addition of new columns - may only take seconds to downgrade (drop).
You're deploying to customer sites. Some of them may not have backups at all (yeah, it's pathetic, but there's nothing you can do about it). If one of them needs a rollback, this is your only option.
There may be other reasons to have downgrade scripts - this is just off the top of my head.
Customer: "We don't like the new version and want to go back to the old version."
Rollbacks. You push everything into production and it blows up - down migrations are a good safety net for rolling back.
Or you're developing with multiple code branches - you can go back and forth between versions to your heart's content.
If you upgrade, and subsequently data is added to your database that you want to preserve, a rollback script (as long as it is designed as such) should achieve this, whereas if you simply restore a backup you'll lose it.
But you could get round the above by restoring a backup and using SQL Data Compare to copy the additional data across.
Related
I've outgrown the Sql Server custom actions available in WiX, so I'm taking the bold step of creating my own using Deployment Tools Foundation. I want to be a good citizen and make sure that mine support rollback. But what's the best way of doing it?
I need to support SQL Server 2005 and later, all editions.
The problem, as I see it, is that Windows Installer works in two phases: it does the work, storing undo information as it goes. Then, when all the pieces are in place it either commits (deleting the undo information) or does a rollback.
This means that standard transactions won't do the job. They would have to be completed inside my Execute custom action, and I wouldn't get a chance to roll them back later.
I've considered taking a copy-only backup of the database that I can restore in the rollback action if necessary but I think this approach, whilst simple has shortcomings. I don't know how big our databases will get, for example - so I can't guarantee that there will be space available to hold the backup on the target machine. Also, backup and restore can take a while to complete, and I don't want typical installs (where rollback doesn't happen) to be unnecessarily slow.
So that brings me to my current favoured idea: make sure the Distributed Transaction Coordinator is started up, then initialise a Distributed Transaction before making changes, then either committing it or rolling it back in the appropriate custom actions.
It seems I can uses the members of the TransactionInterop class to export a cookie that will enable me to share the transaction between my different custom actions.
Can anyone with experience of this kind of thing say if it is likely to work?
Some database/instance operations cannot be done inside a transaction (eg. CREATE/ALTER/DROP ENDPOINT), and other operations cannot be done inside a distributed transaction (eg. SAVE TRANSACTION). So you won't be able to do them at all in your proposed plan. Also your DB upgrade scripts will have to all work correctly when run inside an uncommitted transaction.
I would say that there are fewer risks of going down the backup/restore path (or alternatively creating a database snapshot and restoring from the snapshot on rollback, with the drawback of requiring EE).
Also an option is to have an undo script for every do script run during upgrade, and have the undo script run during rollback and remove the effects of the installation. I understand that this is a hard problem, probably doubles the amount of scripts that have to be developed (and tested...) and requires some serious developer discipline.
I've done quite a few installers with SQL scripts over the years and I've kind of come to the opinion that it's only suited for simple databases like here's my VB app with a local MSDE / MySQL database or here's my local store for code table lookups and temporary commits while we wait to sync it somewhere else.
Once you get into industrial strength heavy lifting enterprise app type situations I like to get my DB configuration out of the installer and into the application as a first run type story. You can do a lot heavier lifting with C# there and not be constrained by MSI.
I'm working on a SaaS project that will have each customer having an instance of the application (customer1.application.com, customer2.application.com, etc.) and ideally each customer would have their "own" space in the DB. The current plan is to create a DB for each customer and deploy an instance of the application into the web farm. The idea is that each customer could opt out of an upgrade to maintain the status quo (something one of our investors REALLY wanted, largely in part because he hates how Facebook keeps changing how it works.)
Last night I attempted to roll out, to my two test accounts, an update that altered the database. While the ensuing errors that were caused were my fault (forgetting a small but apparently very important change in the DDL) I'm starting to worry about my overall theory of operation because missing one ALTER COLUMN statement and a whole upgrade cycle could be blown to hell. So after that long build up here's my questions:
1) Is there a way to do a diff between two databases (the "test" production database and a actual production database) that will accurately record each change being made?
2) Is there another database (and/or application) design model I should be considering? I know if I take away supporting multiple versions of the application that I actually remove a lot of the long term support headaches.
Food for thought:
Code upgrades happen more frequently than DB Schema upgrades. Make sure you have a really good SCM in place to handle the code upgrades. We use git with great success.
Code is easy to manage, databases are not (in comparison). The reason is that they are mutable, and change each moment. Plus, they are really hard to roll back (possible, but time consuming, with downtime). So we must arrive at a simple way to track schema updates (along with associated data changes), and be able to apply them in the future to other similar databases.
Each database schema version should be given a unique, sequential integer version number. Start with 100 per say.
Each time you have to upgrade it, write an sql scripts like
100-101.sql
101-102.sql
102-103.sql
It is the job of each script to perform the upgrade for that specific version. It can be as simple as adding a table, or as complicated as re-arranging foreign keys. But in any event, they will be reliable in what they are designed to do.
You can apply any given script many times during testing (on fresh data) to ensure it will work as expected.
So when you find yourself needing to upgrade a client from version 130 to 180, you can safely apply the sql scripts (IN ORDER), and you will arrive at the correct destination.
You should never be changing DBs by hand. Do it by a script that does all DDL changes, etc...
Ideally, there should be a generic DB release script that uses DDL version as configuration/input.
(and DDL changes should be tagged with a specific tag in a versioning system)
You can go Microsoft route re: supporting multiple versions as a headache - simply designate all versions prior to X (say 2 versions back) as un-supported. That way, you can support last 2-3 versions but don't waste resources on anything more, while allowing per-client flexibility to a large extent.
You should carefully weigh pros/cons of having versioned app/DB system like you propose.
List the the pros (such as placating an investor, positive experience for a client when version changes unexpectedly that you mentioned - translated into marginal probability to retain/add new clients who require such feature, plus an easy way to do BETA/UAT testing, plus a failrly wasy way to roll back the schema changes gone awry by loading client's data into DB schema from prior version).
List the cons (cost of DB space, cost of your time to implement, cost of support)
Compare the two and decide which is better for your business.
Redgate's SQL Compare does a good job of comparing and diffing two SQL Server databases (warning: commercial third-party product). Also, I think there's free stuff out there that does much the same thing.
If you want to be able to leave some customers behind on older versions of your product, it might make more sense to maintain a one-database-per-customer model, with the scripts for building each version of the databases under source control. This keeps your customers isolated from each other, and even allows you to switch database vendors (e.g. from SQL Server to Oracle) or versions (i.e. from SQL Server 2000 to Sql Server 2005) on some customers while keeping other customers on the older versions.
Manual run scripts will not work. Nor diff tools, for the matter. Diff works for 2,4 maybe 10 databases. But does not scale, because what you need is reliability in presence of failures (offline databases, server restarting all that).
You deploy by scheduling upgrade scripts. For instance, see how MySpace does this for over 1000 databases: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. the key take is that they use a guaranteed, reliable, delivery mechanism (SSB) to deploy schema maintenance scripts. You need an asynchronous, reliable, mechanism to run scripts because destination databases may be offline, running scheduled maintenance, unreacahbe etc, and a reliable delivery mechanism like Service Broker can handle all the retries and related issues (handling duplicates, acknowledgments etc). You can also look at Asynchronous procedure execution for an example of how to handle script execution via SSB.
As for the scripts themselves, I recommend you start looking at your database schema and configuration data as a versioned resource. I have addresses this problem already several times, eg. see Do you put your database static data into source-control ? How?
Update
I guess I own some explanation why I consider diffing a wrong approach. Just to makes things clear, I'm talking about deployments of hundreds of servers and thousands of databases. The original post compares itself to facebook and I whish them to have the success to reach that size, but also the questions asks about design principles, so I say that discussing about cloud level scale is appropiate.
I see two problems with diff tools:
Availability. All diff tools work by connecting to both the 'master' and the 'copy', so they can do their job only when both are online. This creates a hot spot, a single point of failure, the 'master' copy, whose availability becomes critical for deploying upgrades. High availability always comes at a cost. It also leaves the problem of 'copy' availability as a minor implementation details, the upgrade scheme must handle retries and time outs and disconnects from the client on its own (not a trivial problem by any means).
Atomicity. The diff tools expect a stable schema of the 'master'. This in effect places a freeze on 'master' while an upgrade is taking place. While this can be controlled on a small scale, on large scales it becomes a problem as upgrading the master itself to v. N+1 becomes a race against all the thousands of databases, when some may be still upgrading from v. N-1.
Script based solutions that ship the upgrade script to the 'copy' solve both of these problems. Also diff tools like the VSDB .dbschema based vsdbcmd.exe are better than a 'live' diff tool since the 'master' dbschema file can be delivered to the 'copy' machine and turn the whole upgrade process into a local operation.
Overall I also belive that script based upgrade, using metadata versioning, is supperior to diff based upgrade, because of reasons of testing and source control I had already talked about in the link to Q1525591.
if I take away supporting multiple
versions of the application that I
actually remove a lot of the long term
support headaches
Any change, however small, has a chance of breaking something that is important for someone.
So if you have multiple customers, rolling out a fix for customer 1 will upset customer 2. It doesn't even have to be a bugged release; it might just be a change in behaviour they disagree with. For most customers, not controlling the release schedule is simply unacceptable.
So I'd advise you to keep a different codebase for every customer. Roll out fixes only after agreement with a customer.
There is a number of customers where this approach breaks down (think Yahoo mail), but reading your question I think you're safely below that number. And for a compare tool, I can't help but agree with the posts suggesting Redgate's SQL Compare.
You know the one I am talking about.
We have all been there at some point. You get that awful feeling of dread and the realisation of oh my god did that actually just happen.
Sure you can laugh about it now though, right, so go on and share your SQL Server mishaps with us.
Even better if you can detail how you resolved your issue so that we can learn from our mistakes together.
So in order to get the ball rolling, I will go first……..
It was back in my early years as a junior SQL Server Guru. I was racing around Enterprise Manager, performing a few admin duties. You know how it is, checking a few logs, ensuring the backups ran ok, a little database housekeeping, pretty much going about business on autopilot and hitting the enter key on the usual prompts that pop up.
Oh wait, was that a “Are you sure you wish to delete this table” prompt. Too late!
Just to confirm for any aspiring DBA’s out there, deleting a production table is a very very bad thing!
Needless to say a world record was promptly set for the fastest database restore to a new database, swiftly followed by a table migration, oh yeah. Everyone else was none the wiser of course but still a valuable lesson learnt. Concentrate!
I suppose everyone has missed the WHERE clause off a DELETE or UPDATE at some point...
Inserted 5 million test persons into a production database. The biggest mistake in my opinion was to let me have write access to the production db in the first place. :P Bad dba!
My biggest SQL Server mistake was assuming it was as capable as Oracle when it came to concurrency.
Let me explain.
When it comes to transactional isolation level in SQL Server you have two choices:
Dirty reads: transactions can see uncommitted data (from other transactions); or
Selects block on uncommitted updates.
I believe these come from ANSI SQL.
(2) is the default isolation level and (imho) the lesser of two evils. But it's a huge problem for any long-running process. I had to do a batch load of data and could only do it out of hours because it killed the website while it was running (it took 10-20 minutes as it was inserting half a million records).
Oracle on the other hand has MVCC. This basically means every transaction will see a consistent view of the data. They won't see uncommitted data (unless you set the isolation level to do that). Nor do they block on uncommitted transactions (I was stunned at the idea an allegedly enterprise database would consider this acceptable on a concurrency basis).
Suffice it to say, it was a learning experience.
And ya knkow what? Even MySQL has MVCC.
I changed all of the prices to zero on a high-volume, production, eCommerce site. I had to take the site down and restore the DB from backup.. VERY UGLY.
Luckily, that was a LOOONG time ago.
forgetting to highlight the WHERE clause when updating or deleting
scripting procs and checking drop dependent objects first and running this on production
I was working on the payment system on a large online business. Multi-million Euro business.
Get a script from a colleague with a few small updates.
Run it on production.
Get an error report 30 minutes later from helpdesk, complaining about no purchases last 30 minutes.
Discover that all connections are waiting on a table lock to be released
Discover that the script from my colleague started with an explicit BEGIN TRANSACTION and expected me to manually type COMMIT TRANSACTION at the end.
Explain to boss why 30 minutes of sales were lost.
Blame myself for not reading the script documentation properly.
Starting off a restore from last week onto a production instance instead of the development instance. Not a good morning.
I've seen plenty other people miss a WHERE clause.
Myself, I always type the WHERE clause first, and then go back to the start of the line and type in the rest of the query :)
Thankfully we only ever make one cock-up before you realise that using transactions really is very, very trivial. I've amended thousands of records on accident before, luckily roll-back is there...
If you're querying the live environment without having thoroughly tested your scripts then I think embarrassing should really be foolhardy or perhaps unprofessional.
One of my favorites happened in an automated import when the client changed the data structure without telling us first. The Social Security number column and the amount of money we were to pay the person got switched. Luckily we found it before the system tried to pay someone his social security number. We now have checks in automated imports that look for funny data before running and stop it if the data seems odd.
Like zabzonk said, forgot the WHERE clause on an update or two in my day.
We had an old application that didn't handle syncing with our HR database for name updates very efficiently, mainly due to the way they keyed in changes to titles. Anyway, a certain woman got married, and I had to write a database change request to update her last name, I forgot the where clause and everyone in said application's name was now Allison Smith.
Columns are nullable, and parameter values fail to retrieve the correct information...
The biggest mistake was giving developers "write" access to the production DB
many DEV and TEST records were inserted / overwritten and backup- ed too production until it was wisely suggested (by me!) to only allow read access!
Sort of SQL-server related. I remember learning about how important it is to always dispose of a SqlDataReader. I had a system that worked fine in development, and happened to be running against the ERP database. In production, it brought down the database because I assumed it was enough to close SqlConnection, and had hundreds, if not thousands of open connections.
At the start of my co-op term I ended up expiring access to everyone who used this particular system (which was used by a lot of applications in my Province). In my defense, I was new to SQL Server Management Studio and didn't know that you could 'open' tables and edit specific entries with a sql statement.
I expired all the user access with a simple UPDATE statement (access to this application was given by a user account on the SQL box as well as a specific entry in an access table) but when I went to highlight that very statement and run it, I didn't include the WHERE clause.
A common mistake I'm told. The quick fix was unexpire everyones accounts (including accounts that were supposed to be expired) until the database could be backed up. Now I either open tables and select specific entries with SQL or I wrap absolutely everything inside a transaction followed by an immediate rollback.
Our IT Ops decided to upgrade to SQL 2005 from SQL 2000.
The next Monday, users were asking why their app didn't work. Errors like:
DTS Not found etc.
This lead to a nice set of 3 Saturdays in the office rebuilding the packages in SSIS with a good overtime package :)
Not, exactly a "mistake" but back when I was first learning PHP and MYSQL I would spend hours daily, trying to figure out why my code was not working, not knowing that I had the wrong password/username/host/database credentials to my SQL database. You cant believe how much time I wasted on that, and to make it even worse this was not a one time incident. But LOL, its all good, it builds character.
I once, and only once, typed something similar to the following:
psql> UPDATE big_table SET foo=0; WHERE bar=123
I managed to fix the error rather quickly. Since that and another error my updates always start out as:
psql> UPDATE table SET WHERE foo='bar';
Much easier to avoid errors that way.
I worked with a junior developer once who got confused and called "Drop" instead of "Delete" on a table.
It was a long night working to get the backup restored...
Edit: I should have mentioned, this was in a production environment, and the table was full of data...
This was before the days when Google could help. I didn't encounter this problem with SQL Server, but with it's ugly older cousin Sybase.
I updated a table schema in a production environment. Not understanding at the time that stored procedures that use SELECT * must be recompiled to pickup new fields, I proceeded to spend the next eight hours trying to figure out why the stored procedure that performed a key piece of work kept failing. Only after a server reboot did I clue in.
Losing thousands of dollars and hundreds of (end user) man-hours at your flagship customer's site is quite an educational experience. Highly recommended!!
A healthy amount of years ago I was working on a clients site, that had a nice script to clear the dev environment of all orders, carts and customers.. to ease up testing, so I of course put the damn script on the productions server query analyzer and ran it.
Took some 5-6 minutes to run too, I was bitching about how slow the dev server was until the number of deleted rows came up. :)
Fortunately I had just ran a full backup since I was about to do an installation..
Beyond the typical where clause error. Ran a drop on an incorrect DB and thus had to run a restore. Now I triple check my server name. Thankfully I had a good backup.
I set the maximum server memory to 0. I was thinking at the time that would automatically tell SQL server to use all available memory (It was early). No such luck. SQL server decided to use only 16 MB and I had to connect in single user mode to get the setting changed back.
Hit "Restore" instead of "Backup" in Management Studio.
Today we're using a shared SQL Server database and that is perfect as I don't know anything about SQL Server maintenance. But for economic reasons we're need to upgrade to a dedicated server.
Given that I don't have time to read the entire documentation, What do I absolutely need to know about SQL Server to not screw this up?
Resource suggestions appreciated!
The answer probably has to do with how data-intensive your application is. If it's like most business applications, you're probably OK reading a couple quick start guides and winging it (as long as you back up regularly ... that's important, so read up on that carefully). SQL Server is generally pretty self-tuning, and if you're not talking millions of rows and high TPS, you're probably fine for a little while.
If it is a data-intensive application, or has high availability or throughput needs ... get a DBA, even just on contract. Don't put all your eggs in a basket you can't carry.
Backing up!
Oh, it's the accidental DBA!
Brent Ozar has a handful of useful articles: http://www.brentozar.com/sql/
Don't forget about SQLServerPedia - http://sqlserverpedia.com/wiki/Main_Page
Cheers!
In terms of backing up, don't forget to backup the transaction log as well as the database, unless you'd like your transaction log to grow until it takes over the entire drive.
I'd also read up on indexing, and statistics and rebuilding each.
Also you should probably get a good understanding of how database security works.
If at all possible get a dev server as well as a prod one. Much much better to test changes on dev than directly in production! Then limit prod access to only a couple of people and make all changes to production happen through tested scripts.
In order of importance:
How to schedule backups
How to create indexes
How to rebuild indexes
The Profiler and Tuning wizard can help you with 2 and 3.
If you are programming the database and not just administering it, I'd reccommend Robert Vieira's book. It's a great introduction.
The backup and restore process of a large database or collection of databases on sql server is very important for disaster & recovery purposes. However, I have not found a robust solution that will guarantee the whole process is as efficient as possible, 100% reliable and easily maintainable and configurable accross multiple servers.
Microsft's Maintenance Plans doesn't seem to be sufficient. The best solution I have used is one that I created manually using many jobs with many steps per database running on the source server (backup) and destination server (restore). The jobs use stored procedures to do the backup, copying & restoring. This runs once a day (full backup/restore) and intraday every 5 mins (transaction log shipping).
Although my current process works and reports any job failures via email, I know the whole process isn't very reliable and cannot be easily maintained/configured on all our servers by a non-DBA without having in-depth knowledge of the process.
I would like to know if others have this same backup/restore process and how others overcome this issue.
I've used a similar step to keep dev/test/QA databases 'zero-stepped' on a nightly basis for developers and QA folks to use.
Documentation is the key - if you want to remove what Scott Hanselman calls 'bus factor' (i.e. the danger that the creator of the system will get hit by a bus and everything starts to suck).
That said, for normal database backups and disaster recovery plans, I've found that SQL Server Maintenance Plans work out pretty well. As long as you include:
1) Decent documentation
2) Routine testing.
I've outlined some of the ways to go about doing that (for anyone drawn to this question looking for an example of how to go about creating a disaster recovery plan):
SQL Server Backup Best Practices (Free Tutorial/Video)
The key part of your question is the ability for the backup solution to be managed by a non-DBA. Any native SQL Server answer like backup scripts isn't going to meet that need, because backup scripts require T-SQL knowledge.
Because of that, you want to look toward third-party solutions like the ones Mitch Wheat mentioned. I work for Quest (the makers of LiteSpeed) so of course I'm partial to that one - it's easy to show to non-DBAs. Before I left my last company, I had a ten minute session to show the sysadmins and developers how the LiteSpeed console worked, and that was that. They haven't called since.
Another approach is using the same backup software that the rest of your shop uses. TSM, Veritas, Backup Exec and Microsoft DPM all have SQL Server agents that let your Windows admins manage the backup process with varying degrees of ease-of-use. If you really want a non-DBA to manage it, this is probably the most dead-easy way to do it, although you sacrifice a lot of performance that the SQL-specific backup tools give you.
I am doing precisely the same thing and have various issues semi regularly even with this process.
How do you handle the spacing between copying the file from Server A to Server B and restoring the transactional backup on Server B.
Every once in a while the transaction backup is larger than normal and takes a longer time to copy. The restore job then gets an operating system error that the file is in use.
This is not such a big deal since the file is automatically applied the next time around however it would be nicer to have a more elegant solution in general and one that specifically fixes this issue.