Evolutionary Database Migration and Default Data

Evolutionary Database Migration and Default Data - database

We're re-evaluating our database upgrade process for our application to try and remove the pain of having to generate all the upgrade scripts for a release at the end of the release cycle. We're looking to move towards a more evolutionary process, using migrations which are checked in alongside features with a tool such as migratordotnet, and this seems like a very clean way of managing schema changes.
However, the default data which is shipped with our database is quite regularly subject to change and some of these data updates do not fit well with the migration process. For example, inserts to tables which have an Identity primary key are not easily identifiable and so cannot be reversed on downgrading.
So I'm wondering how people manage the migration of default data? Do they managed it outside of the scheme migration process? Or are inserts performed during migrations but the removal of the data is not performed during downgrades?

For us, DB migration is part of the daily development process. Developers have to either commit a program or a script which performs the necessary changes. This happens as soon the related feature is implemented and never "at the end of the release cycle".
Downgrade is rarely an issue but if you have it, create a column with a version information. When the upgrade succeeds and the customer decides to stay with the new version, drop the column again (or keep it).
The key for success is that we have extensive test cases which create the database from scratch (using an In-Memory DB like H2 or a DB which is installed on each developer machine) including all data and then migrate it all the way through and back. We can import anonymized data from the productions servers into the test cases to track down bugs and improve tests without violating the privacy of our customers or getting in the way of the developers.

Related

Flyway: how to prevent accidentally running migrations from developer on a central database?

When I debug against a (central) database, Flyway can update the database schema. This happens when my local application runs on a development branch which is further ahead than the deployed application.
Running the local application will then invoke migration scripts on the central database. In the worst case, this might update a production database of course.
Another scenario is one where 2 developers work against 1 development database with test-data. Both developers are working on different features and both are modifying the schema. When one developer updates the db, the other is (probably) confronted with a checksum issue, and otherwise surprised by the changes made.
I'm thinking about a solution to such problems.
Of course, there are some solutions outside of Flyway, like only connecting to throw-away databases, or preventing access to important db's.
I'm interested in which options Flyway offers.

There are some options, which seems to be useful in your cases.
For the first one, you can set the target property with the same version your DB has, to prevent Flyway to update it.
For the second case with 2 developers working simultaneously with the same DB instance, you can try to turn off the validateOnMigrate property to avoid validation failures or the ignoreMissingMigrations to ignore the migrations some of the developers doesn't have yet.
Here you can find all available via console properties for migration task. You didn't specify, haw exactly you run the Flyway, if it's done vie Spring Boot, then not all this properties are available, just some of them - you can find it here under the Flyway section.
But for most cases, I think, the best solution is to simply turn Flyway migrations off during the development and debugging if it's possible and use for the delivery of ready features.

Updating the production IIS WebSite and SQL Server Database without stopping

There is a testing server that uses the testing database. We test the website on the testing server. If it is okay, we update the website and the database schema from the testing server to the production server. But this method is very painful and risky.
First, we have to redirect the users to a maintenance page, so the website is paused for a while.
Second, if something goes wrong when updating, we have to back to old website, because we can't put the website in a maintenance mode for a long time.
So I'm seeking a solid solution to update an IIS website and an Sql Server Database without data loss and using maintenance mode. Is there any way to do this? How the big websites do this without data loss and pausing.
We've thought using a release candidate website. We've planned to use this RC website for temporary. First, we update the RC site, then swap the bindings between RC and production website. But this time the database is problem. Because we can change the database schema, and the old one can't work with new database. So, if we use a temp site with temp db, there will be data loss. Also, the updated website won't work with old database if the temp site uses old production database. So I need a solid and practical solution for this problem.

This is orders of magnitude more complicated than you imagine. This specifically is not about HA nor about contiguous integration. Neither of those will provide what you need, they're only pieces of the much more complex puzzle.
There simply isn't possible to write code changes in a manner that is transparent/oblivious to schema changes as they occur. At best you can write the code in a manner that supports the schema at v. N and at v. N+1, which in itself is a big challenge. But is impossible to write the code in a manner that supports the schema as it transitions from v. N to v. N+1. The schema change induced by a deployment has to be atomic for the code operating on the schema. Since the schema change itself cannot be atomic, it follow that the upgrade has two possible avenues:
take the code offline during the schema change. This is what you're doing now and is the safest approach. Of course, it implies service availability down time and runs the risks you already experienced (rollback of failed upgrade, lengthy upgrade, etc). A variant of this approach is to redirect the service to a read-only copy of the data and offer a degraded service experience(no changes are possible during the downtime) which may or may not be acceptable, depending on the business specifics.
standby upgrade. This implies that you take a snapshot of the service data (various HA solutions may provide a standby snapshot out-of-the-box, eg. log shipping). Upgrade the snapshot, then apply all the transactions that occurred on the real service data to the upgraded snapshot. This is always tricky, because it requires a technology to detect, capture and apply the changes (eg. change tracking, replication, custom solution etc) and requires to transform each change to the new, upgraded, schema. Once the upgraded schema is up to date with changes from the main service, the service can be redirected to the upgraded schema. This redirection is also much more complex than it sounds. For one choosing the moment when to cut-off the old schema and stop accepting new changes, while making sure all changes were applied to the new upgraded schema DB is a challenge in itself. Another challenge is to resolve the conflict of the code understanding pre-upgrade and post-upgrade schema versions. Developing code that handles both is, as I said, problematic and error prone, so one solution is to, again, take the service offline for a short period and replace the code. Another solution is to have a standby service, running code that handles the post-upgrade DB schema and is connected to the post-upgrade DB, and redirect the live requests to your standby, upgraded, service.
And I did not even touch the thorny subject of service interaction, when a particular service of a much larger deployed solution has to be upgraded. This is when service API protocol back compatibility plays the major role to allow the post-upgrade service to play along with its peer services.
Ultimately there just isn't any silver bullet. I've witnessed single machine large DB deployments that took weeks to roll out version N+1, with transcriptional replication contiguously feeding the post-upgrade DB schema with changes from the pre-upgrade DB. And I witnessed deployments of thousands of machines deploying version N+1 in stages, as a complicated dance of enabling code and data changes over the course of several days to reach the full functionality of the post-upgrade. This problem is just plain hard.

This is what Azure is fantastic for. The Azure Cloud platform allows for staging servers and production servers. You can set it up so once you commit your changes to Git or TFS it is pushed to either a staging or production server automatically. You can also set up to manually push changes. Most of the ORM libraries like Entity Framework have migration support.
There is a lot more information out there on this topic like:
Azure seamless upgrade when database schema changes
Staging or Production Instance?

Youre describing a high availability solution (HA). HA solutions are expensive and can quickly be overkill. you would need redundancy for your app and db servers, which means setting up a db cluster. All this increases the amount of time you would spend deploying changes, but the tradeoff is that your app is always available.
The main thing with deployment is having a repeatable process. So my best recommendation is to script or automate as much as possible.

SQL Server 2008 Auto Backup

We want to have our test servers databases updated from our production server databases on a nightly basis to ensure we're developing on the most recent data. We, however, want to ensure that any fn, sp, etc that we're currently working on in the development environment doesn't get overwritten by the backup process. What we were thinking of doing was having a prebackup program that saves objects selected by our developers and a postbackup program to add them back in after the backup process is complete.
I was wondering what other developers have been doing in a situation like this. Is there an existing tool to do this for us that can run automatically on a daily basis and allow us to set objects not to overwrite (without requiring the attention of a sysadmin to run it daily).

All the objects in our databases are maintained in code - tables, view, triggers, stored procedures, everything - if we expect to find it in the database then it should be in DDL in code that we can run. Actual schema changes are versioned - so there's a table in the database that says this is schema version "n" and if this is not the current version (according to the update code) then we make the necessary changes.
We endeavour to separate out triggers and views - don't, although we probably should, do much with SP and FN - with drop and re-create code that is valid for the current schema version. Accordingly it should be "safe" to drop and recreate anything that isn't a table, although there will be sequencing issues with both the drop and the create if there are dependencies between objects. The nice thing about this generally is that we can confidently bring a schema from new to current and have confidence that any instance of the schema is consistent.
Expanding to your case, if you have the ability to run the schema update code including the code to recreate all the database objects according to the current definitions then your problem should substantially go away... backup, restore, run schema maint logic. This would have the further benefit that you can introduce schema (table) changes in the dev servers and still keep the same update logic.
I know this isn't a completely generic solution. And its worth noting that it probably works better with database per developer (I'm an old fashioned programmer, so I see all problems as having code based solutions (-:) but as a general approach I think it has considerable merit because it gives you a consistent mechanism to address a number of problems.

Step-by-step instructions for updating an (SQL Server) database?

Just a question about best-practices when upgrading an existing database. Assuming there will be all kinds of modifications to the data itself, the structure, the relations, additional columns, disappearing columns and whatever more.
My problem is a simple one. I'm working on a project that will use SQL Server. No problem there, since I'm enough of an expert to handle this. But this project will be upgraded later on and I need to specify a protocol that needs to be followed by the upgrade mechanism. Basically, this protocol needs to be followed when creating upgrade scripts...
Right now, I have these simple steps:
Add the new columns to the tables.
Add constraints to the new columns.
Add new tables.
Drop constraints where needed.
Drop columns that need to be removed.
Drop tables that need to be removed.
Somehow, this list feels incomplete. Is there a more extended list somewhere describing the proper steps which needs to be followed during an upgrade?
Also, is it always possible to do a complete upgrade within a single database transaction (with SQL Server) or are there breakpoints that need to be included within the protocol where one transaction should end and another one starts?While automated tools will provide a nice, automated solution, I still can't really use them. The development team working on this system has 4 developers, each with their own database on their local system. Every developer keeps track of their own updates to the structure and keeps track of them by generating both an Upgrade and Downgrade script for his own modifications, both for structural changes and data changes. These scripts can then be used by the other developers to keep their own system up-to-date. Whenever the system is going to be released, those scripts are all merged into one big script.
The system does not include any stored procedures or other "special" features. The database is just that: a data storage with just tables and relations between them. No roles, no users, no stored procedures, no triggers, no complex datatypes...The DB is used by an application where users work from 9-to-5 so shutting down can be done easily, including upgrades for the clients. We also add a version number to the database and applications will check if they're linked to the correct database version.
During development, all developers use their own database instance, which they can fully control. Since we're not the ones who use the application, we tend to develop for the Express edition, not any more expensive one. To be honest, we don't develop our application to support a lot of users, but we'll inform our users that since it uses SQL Server, they could install the system on a bigger SQL Server platform, according to their own needs. They will need their own DBA for this, though. We do have a bigger SQL Server available for ourselves, which we also use for our own web interface, but this server is located in a special dataserver where it is being maintained for us, not by us.
The project previously used MS Access for it's data storage and was intended for single-user development, but as it turned out, many users still decided to share their databases and this had shown that the datamodel itself is reliable enough for multi-user environments. So we migrated to SQL Server to support smaller offices with 3 or more users and some big organisation who will have 500 or more users at the same time.
Since we need to keep the cost of the software low, we don't have a big budget to spend on expensive tools or a more expensive server.

Check out Red-Gate's SQL Compare (structure comparison), SQL Data Compare (data comparison), and SQL Packager (for packaging up updates scripts into a C# project or a .NET executable).
They provide a nice, clean, fully functional and easy-to-use solution for all your database upgrade needs. They're well worth their license fees - that pays for itself in a few weeks or months.
Highly recommended!

In my opinion, it's an absolute bear doing these manually. For Microsoft SQL Server, I'd recommend using the Database editiion of Team System, since it includes complete source control capabilities for your database, and can automatically build your scripts for upgrading/downgrading versions.
Another option is SQLCompare with Redgate, which can also handle these kinds of upgrades/downgrades, and will result in a very nice SQL script. I've used both, and keeping the historic scripts has helped us troubleshoot issues and resolve many a mystery.
If you are working with a manual script as above, don't forget to also account for SP changes in your scripts. Also, any hand-edited script should be able to be executed multiple times on a database - i.e. if your script includes a table creation or drop, be sure to check for existance first, otherwise your script will fail if executed back to back.
Again, while it's possible to build a manual protocol I'd still fall back on using one of the purpose-built tools out there, and both Team System and SQL Compare will be able to output scripts that you could include as part of an installation/upgrade package.

With database updates I always believe it should be all or nothing. If any of the DB updates fail your application will be left in an unknown state that could be harmful to the data so I think it is best practice to either apply them all or none (1 transaction around them all).
I also like to backup the database before applying updates so that if anything does go wrong the database can be rolled back (this has saved me numerous times when working with live data).
Hope this helps.

Best practices for upgrading a production database schema actually look pretty bad on the surface. Unless you can completely shut down your system for the upgrade, which is often not possible, your changes all need to be backwards compatible. If you have many clients accessing the database, you can't update them all simultaneously, so any schema changes you make need to allow old code to run.
That means never renaming a column, and making all new columns nullable. This doesn't mean you leave it like that forever. You write two scripts, one for the initial change, which is backwards compatible, then another to clean things up after all clients have been updated.
Automated tools are great for validation of schemas, but they are not so good when it comes to actually modifying a complex system. You should break your changes up into many small, discrete change scripts so each can be run manually. If there's a failure, it's easier to pinpoint the cause and fix it. Basically, each feature gets its own script. Give each a unique name and then store that name in the database itself when you run the script so you can query the database to find out what's been run and what hasn't. This is invaluable when you have instances on developer's machines, test servers, production, etc.

Database Design Theory For Multiple Application Instances

I'm working on a SaaS project that will have each customer having an instance of the application (customer1.application.com, customer2.application.com, etc.) and ideally each customer would have their "own" space in the DB. The current plan is to create a DB for each customer and deploy an instance of the application into the web farm. The idea is that each customer could opt out of an upgrade to maintain the status quo (something one of our investors REALLY wanted, largely in part because he hates how Facebook keeps changing how it works.)
Last night I attempted to roll out, to my two test accounts, an update that altered the database. While the ensuing errors that were caused were my fault (forgetting a small but apparently very important change in the DDL) I'm starting to worry about my overall theory of operation because missing one ALTER COLUMN statement and a whole upgrade cycle could be blown to hell. So after that long build up here's my questions:
1) Is there a way to do a diff between two databases (the "test" production database and a actual production database) that will accurately record each change being made?
2) Is there another database (and/or application) design model I should be considering? I know if I take away supporting multiple versions of the application that I actually remove a lot of the long term support headaches.

Food for thought:
Code upgrades happen more frequently than DB Schema upgrades. Make sure you have a really good SCM in place to handle the code upgrades. We use git with great success.
Code is easy to manage, databases are not (in comparison). The reason is that they are mutable, and change each moment. Plus, they are really hard to roll back (possible, but time consuming, with downtime). So we must arrive at a simple way to track schema updates (along with associated data changes), and be able to apply them in the future to other similar databases.
Each database schema version should be given a unique, sequential integer version number. Start with 100 per say.
Each time you have to upgrade it, write an sql scripts like
100-101.sql
101-102.sql
102-103.sql
It is the job of each script to perform the upgrade for that specific version. It can be as simple as adding a table, or as complicated as re-arranging foreign keys. But in any event, they will be reliable in what they are designed to do.
You can apply any given script many times during testing (on fresh data) to ensure it will work as expected.
So when you find yourself needing to upgrade a client from version 130 to 180, you can safely apply the sql scripts (IN ORDER), and you will arrive at the correct destination.

You should never be changing DBs by hand. Do it by a script that does all DDL changes, etc...
Ideally, there should be a generic DB release script that uses DDL version as configuration/input.
(and DDL changes should be tagged with a specific tag in a versioning system)
You can go Microsoft route re: supporting multiple versions as a headache - simply designate all versions prior to X (say 2 versions back) as un-supported. That way, you can support last 2-3 versions but don't waste resources on anything more, while allowing per-client flexibility to a large extent.
You should carefully weigh pros/cons of having versioned app/DB system like you propose.
List the the pros (such as placating an investor, positive experience for a client when version changes unexpectedly that you mentioned - translated into marginal probability to retain/add new clients who require such feature, plus an easy way to do BETA/UAT testing, plus a failrly wasy way to roll back the schema changes gone awry by loading client's data into DB schema from prior version).
List the cons (cost of DB space, cost of your time to implement, cost of support)
Compare the two and decide which is better for your business.

Redgate's SQL Compare does a good job of comparing and diffing two SQL Server databases (warning: commercial third-party product). Also, I think there's free stuff out there that does much the same thing.
If you want to be able to leave some customers behind on older versions of your product, it might make more sense to maintain a one-database-per-customer model, with the scripts for building each version of the databases under source control. This keeps your customers isolated from each other, and even allows you to switch database vendors (e.g. from SQL Server to Oracle) or versions (i.e. from SQL Server 2000 to Sql Server 2005) on some customers while keeping other customers on the older versions.

Manual run scripts will not work. Nor diff tools, for the matter. Diff works for 2,4 maybe 10 databases. But does not scale, because what you need is reliability in presence of failures (offline databases, server restarting all that).
You deploy by scheduling upgrade scripts. For instance, see how MySpace does this for over 1000 databases: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. the key take is that they use a guaranteed, reliable, delivery mechanism (SSB) to deploy schema maintenance scripts. You need an asynchronous, reliable, mechanism to run scripts because destination databases may be offline, running scheduled maintenance, unreacahbe etc, and a reliable delivery mechanism like Service Broker can handle all the retries and related issues (handling duplicates, acknowledgments etc). You can also look at Asynchronous procedure execution for an example of how to handle script execution via SSB.
As for the scripts themselves, I recommend you start looking at your database schema and configuration data as a versioned resource. I have addresses this problem already several times, eg. see Do you put your database static data into source-control ? How?
Update
I guess I own some explanation why I consider diffing a wrong approach. Just to makes things clear, I'm talking about deployments of hundreds of servers and thousands of databases. The original post compares itself to facebook and I whish them to have the success to reach that size, but also the questions asks about design principles, so I say that discussing about cloud level scale is appropiate.
I see two problems with diff tools:
Availability. All diff tools work by connecting to both the 'master' and the 'copy', so they can do their job only when both are online. This creates a hot spot, a single point of failure, the 'master' copy, whose availability becomes critical for deploying upgrades. High availability always comes at a cost. It also leaves the problem of 'copy' availability as a minor implementation details, the upgrade scheme must handle retries and time outs and disconnects from the client on its own (not a trivial problem by any means).
Atomicity. The diff tools expect a stable schema of the 'master'. This in effect places a freeze on 'master' while an upgrade is taking place. While this can be controlled on a small scale, on large scales it becomes a problem as upgrading the master itself to v. N+1 becomes a race against all the thousands of databases, when some may be still upgrading from v. N-1.
Script based solutions that ship the upgrade script to the 'copy' solve both of these problems. Also diff tools like the VSDB .dbschema based vsdbcmd.exe are better than a 'live' diff tool since the 'master' dbschema file can be delivered to the 'copy' machine and turn the whole upgrade process into a local operation.
Overall I also belive that script based upgrade, using metadata versioning, is supperior to diff based upgrade, because of reasons of testing and source control I had already talked about in the link to Q1525591.

if I take away supporting multiple
versions of the application that I
actually remove a lot of the long term
support headaches
Any change, however small, has a chance of breaking something that is important for someone.
So if you have multiple customers, rolling out a fix for customer 1 will upset customer 2. It doesn't even have to be a bugged release; it might just be a change in behaviour they disagree with. For most customers, not controlling the release schedule is simply unacceptable.
So I'd advise you to keep a different codebase for every customer. Roll out fixes only after agreement with a customer.
There is a number of customers where this approach breaks down (think Yahoo mail), but reading your question I think you're safely below that number. And for a compare tool, I can't help but agree with the posts suggesting Redgate's SQL Compare.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight