Automating DB Object Migrations from Source Control

Automating DB Object Migrations from Source Control - sql-server

I'm looking for some "Best Practices" for automating the deployment of Stored Procedures/Views/Functions/Table changes from source control. I'm using StarTeam & ANT so the labeling is taken care of; what I am looking for is how some of you have approached automating the pull of these objects from source - not necessarily StarTeam.
I'd like to end up with one script that can then be executed, checked in, and labeled.
I'm NOT asking for anyone to write that - just some ideas or approaches that have (or haven't) worked in the past.
I'm trying to clean up a mess and want to make sure I get this as close to "right" as I can.
We are storing the tables/views/functions etc. in individual files in StarTeam and our DB is SQL 2K5.

We use SQL Compare from redgate (http://www.red-gate.com/).
We have a production database, a development database and each developer has their own database.
The development database is synchronised with the changes a developer has made to their database when they check in their changes.
The developer also checks in a synchronisation script and a comparison report generated by SQL Compare.
When we deploy our application we simply synchronise the development database with the production database using SQL Compare.
This works for us because our application is for in-house use only. If this isn't your scenario then I would look at SQL Packager (also from redgate).

I prefer to separate views, procedures, and triggers (objects that can be re-created at will) from tables. For views, procedures, and triggers, just write a job that will check them out and re-create the latest.
For tables, I prefer to have a database version table with one row. Use that table to determine what new updates have not been applied. Then each update is applied and the version number is updated. If an update fails, you have only that update to check and you can re-run know that the earlier updates will not happen again.

Related

Deploying latest changes to production DB - without lose of data

I have been doing some research on the correct procedure to follow when working with both a development and live production database. The best article i have found was this one: Strategies for Database Development and Deployment, but i can not accept the idea that i have to maintain a Word document, manually, of every change i make to the DB. That seems ridiculous to me...
I use SQL Server Management Studio to manage my SQL databases both in dev and in prod. Is there a way to deploy latest changes to production WITHOUT destroying tables and data. Can someone please point me to a good procedural article on how this is done in SSMS.
Thanks

It is irresponsible to make changes to a database design without creating change scripts that are put into source control.
However, if you are already in this curcumstance, I suggest buying red_gate's SQLCompare. It will look at the two databases and script the differnces. You still can't run this willy-nilly though - sometimes you have made changes to the dev database that are not yet part of the curent version being pushed to prod and SQLCompare has no way to know this. It is far better to create the scripts as you go (Using alter table when the table currently exists so as not to disrupt existing data) and keep them in source control with the rest of code that you will be pushing at the same time.

The only right way to do it in prodiction - with or without Management Studio - prepare, check, test and run the scripts manually.
WITH FRESH BACKUP!

A common strategy is to keep an ordered set of change scripts, e.g. prefixed with date or database version, which can easily be tested on the development database by starting with a fresh backup from production. The change scripts can often be generated from SQL Server Management Studio when making changes, or could be crafted manually in case of more complex changes.
Another suggestion would be to version control the database definitions (tables, procedures etc). This can be easily achieved by using SQL Server Management Studio to generate create scripts for all objects after each update. This way you can easily compare changes made over time, or between different environments.

How do you deal with multiple developers and database changes?

I would like to know how you guys deal with development database changes in groups of 2 or more devs? Do you have a global db everyone access, maybe a local copy and manually apply script changes? It would be nice to see pros and cons that you've noticed for each approach and the number of devs in your team.

Start with "Evolutionary Database Design" by Martin Fowler. This sums it up nicely
There are have been other questions about DB development that may be useful too, for example Is RedGate SQL Source Control for me?

Our approach is that everyone has their own DB, the complete DB can be created from create scripts with base data if required. All the scripts required for this are in source control.
All scripts are CREATE scripts and they reflect the current state of the database schema. Upgrades are in separate SQL files which can upgrade existing DBs from a specific version to a newer one (run sequentially). After all the updates have been applied, the schema must be identical to what you would get from running the setup scripts.
We have some tools to do this (we use SQL Server and .NET):
Scripting is done with a tool which also applies a standard formatting so that the changes are well traceable with text diff tools (and by the SCM)
A runtime module takes care of comparing the existing DB objects, run updates if required, automatically apply "non-destructive" changes, then check the DB objects again to ensure a correct migration before committing the changes
The toolset is available as open-source project (licensed under LGPL), it's called the bsn ModuleStore (note that it is limited to SQL Server 2005/2008/Azure and to .NET for the runtime part).

We use what was code named "Data Dude" - the database features in TFS and Visual Studio - to deal with this. When you "get latest" and bring in code that relies on a schema change, you also bring in the revised schemas, stored procedures etc. You rigght-click the database project and Deploy; that gets your local schema and sp in sync but doesn't overwrite your data. The job of working out the script to get you from your old schema to the new one falls to Visual Studio, not to you or your DBA. We also have "populate" scripts for things like lists of provinces and a deploy runs them for you.
So much better than the old way which always fell apart at high stress times, with people checking in code then going home and nobody knowing what columns to add to make the code work etc.

Database source control vs. schema change scripts

Building and maintaining a database that is then deplyed/developed further by many devs is something that goes on in software development all the time. We create a build script, and maintain further update scripts that get applied as the database grows over time. There are many ways to manage this, from manual updates to console apps/build scripts that help automate these processes.
Has anyone who has built/managed these processes moved over to a Source Control solution for database schema management? If so, what have they found the best solution to be? Are there any pitfalls that should be avoided?
Red Gate seems to be a big player in the MSSQL world and their DB source control looks very interesting:
http://www.red-gate.com/products/solutions_for_sql/database_version_control.htm
Although it does not look like it replaces the (default) data* management process, so it only replaces half the change management process from my pov.
(when I'm talking about data, I mean lookup values and that sort of thing, data that needs to be deployed by default or in a DR scenario)
We work in a .Net/MSSQL environment, but I'm sure the premise is the same across all languages.

Similar Questions
One or more of these existing questions might be helpful:
The best way to manage database changes
MySQL database change tracking
SQL Server database change workflow best practices
Verify database changes (version-control)
Transferring changes from a dev DB to a production DB
tracking changes made in database structure
Or a search for Database Change

I look after a data warehouse developed in-house by the bank where I work. This requires constant updating, and we have a team of 2-4 devs working on it.
We are fortunate because there is only the one instance of our "product", so we do not have to cater for deploying to multiple instances which may be at different versions.
We keep a creation script file for each object (table, view, index, stored procedure, trigger) in the database.
We avoid the use of ALTER TABLE whenever possible, preferring to rename a table, create the new one and migrate the data over. This means that we don't have to look through a history of ALTER scripts - we can always see the up to date version of every table by looking at its create script. The migration is performed by a separate migration script - this can be partly auto-generated.
Each time we do a release, we have a script which runs the create scripts / migration scripts in the appropriate order.
FYI: We use Visual SourceSafe (yuck!) for source code control.

I've been looking for a SQL Server source control tool - and came across a lot of premium versions that do the job - using SQL Server Management Studio as a plugin.
LiquiBase is a free one but i never quite got it working for my needs.
There is another free product out there though that works stand along from SSMS and scripts out objects and data to flat file.
These objects can then be pumped into a new SQL Server instance which will then re-create the database objects.
See gitSQL

Maybe you're asking for LiquiBase?

Verify database changes (version-control)

I have read lots of posts about the importance of database version control. However, I could not find a simple solution how to check if database is in state that it should be.
For example, I have a databases with a table called "Version" (version number is being stored there). But database can be accessed and edited by developers without changing version number. If for example developer updates stored procedure and does not update Version database state is not in sync with version value.
How to track those changes? I do not need to track what is changed but only need to check if database tables, views, procedures, etc. are in sync with database version that is saved in Version table.
Why I need this? When doing deployment I need to check that database is "correct". Also, not all tables or other database objects should be tracked. Is it possible to check without using triggers? Is it possible to be done without 3rd party tools? Do databases have checksums?
Lets say that we use SQL Server 2005.
Edited:
I think I should provide a bit more information about our current environment - we have a "baseline" with all scripts needed to create base version (includes data objects and "metadata" for our app). However, there are many installations of this "base" version with some additional database objects (additional tables, views, procedures, etc.). When we make some change in "base" version we also have to update some installations (not all) - at that time we have to check that "base" is in correct state.
Thanks

You seem to be breaking the first and second rule of "Three rules for database work". Using one database per developer and a single authoritative source for your schema would already help a lot. Then, I'm not sure that you have a Baseline for your database and, even more important, that you are using change scripts. Finally, you might find some other answers in Views, Stored Procedures and the Like and in Branching and Merging.
Actually, all these links are mentioned in this great article from Jeff Atwood: Get Your Database Under Version Control. A must read IMHO.

We use DBGhost to version control the database. The scripts to create the current database are stored in TFS (along with the source code) and then DBGhost is used to generate a delta script to upgrade an environment to the current version. DBGhost can also create delta scripts for any static/reference/code data.
It requires a mind shift from the traditional method but is a fantastic solution which I cannot recommend enough. Whilst it is a 3rd party product it fits seamlessly into our automated build and deployment process.

I'm using a simple VBScript file based on this codeproject article to generate drop/create scripts for all database objects. I then put these scripts under version control.
So to check whether a database is up-to-date or has changes which were not yet put into version control, I do this:
get the latest version of the drop/create scripts from version control (subversion in our case)
execute the SqlExtract script for the database to be checked, overwriting the scripts from version control
now I can check with my subversion client (TortoiseSVN) which files don't match with the version under version control
now either update the database or put the modified scripts under version control

You have to restrict access to all databases and only give developers access to a local database (where they develop) and to the dev server where they can do integration. The best thing would be for them to only have access to their dev area locally and perform integration tasks with an automated build. You can use tools like redgates sql compare to do diffs on databases. I suggest that you keep all of your changes under source control (.sql files) so that you will have a running history of who did what when and so that you can revert db changes when needed.
I also like to be able to have the devs run a local build script to re initiate their local dev box. This way they can always roll back. More importantly they can create integration tests that tests the plumbing of their app (repository and data access) and logic stashed away in a stored procedure in an automated way. Initialization is ran (resetting db), integration tests are ran (creating fluff in the db), reinitialization to put db back to clean state, etc.
If you are an SVN/nant style user (or similar) with a single branch concept in your repository then you can read my articles on this topic over at DotNetSlackers: http://dotnetslackers.com/articles/aspnet/Building-a-StackOverflow-inspired-Knowledge-Exchange-Build-automation-with-NAnt.aspx and http://dotnetslackers.com/articles/aspnet/Building-a-StackOverflow-inspired-Knowledge-Exchange-Continuous-integration-with-CruiseControl-NET.aspx.
If you are a perforce multi branch sort of build master then you will have to wait till I write something about that sort of automation and configuration management.
UPDATE
#Sazug: "Yep, we use some sort of multi branch builds when we use base script + additional scripts :) Any basic tips for that sort of automation without full article?" There are most commonly two forms of databases:
you control the db in a new non-production type environment (active dev only)
a production environment where you have live data accumulating as you develop
The first set up is much easier and can be fully automated from dev to prod and to include rolling back prod if need be. For this you simply need a scripts folder where every modification to your database can be maintained in a .sql file. I don't suggest that you keep a tablename.sql file and then version it like you would a .cs file where updates to that sql artifact is actually modified in the same file over time. Given that sql objects are so heavily dependent on each other. When you build up your database from scratch your scripts may encounter a breaking change. For this reason I suggest that you keep a separate and new file for each modification with a sequence number at the front of the file name. For example something like 000024-ModifiedAccountsTable.sql. Then you can use a custom task or something out of NAntContrib or an direct execution of one of the many ??SQL.exe command line tools to run all of your scripts against an empty database from 000001-fileName.sql through to the last file in the updateScripts folder. All of these scripts are then checked in to your version control. And since you always start from a clean db you can always roll back if someones new sql breaks the build.
In the second environment automation is not always the best route given that you might impact production. If you are actively developing against/for a production environment then you really need a multi-branch/environment so that you can test your automation way before you actually push against a prod environment. You can use the same concepts as stated above. However, you can't really start from scratch on a prod db and rolling back is more difficult. For this reason I suggest using RedGate SQL Compare of similar in your build process. The .sql scripts are checked in for updating purposes but you need to automate a diff between your staging db and prod db prior to running the updates. You can then attempt to sync changes and roll back prod if problems occur. Also, some form of a back up should be taken prior to an automated push of sql changes. Be careful when doing anything without a watchful human eye in production! If you do true continuous integration in all of your dev/qual/staging/performance environments and then have a few manual steps when pushing to production...that really isn't that bad!

First point: it's hard to keep things in order without "regulations".
Or for your example - developers changing anything without a notice will bring you to serious problems.
Anyhow - you say "without using triggers".
Any specific reason for this?
If not - check out DDL Triggers. Such triggers are the easiest way to check if something happened.
And you can even log WHAT was going on.

Hopefully someone has a better solution than this, but I do this using a couple methods:
Have a "trunk" database, which is the current development version. All work is done here as it is being prepared to be included in a release.
Every time a release is done:
The last release's "clean" database is copied to the new one, eg, "DB_1.0.4_clean"
SQL-Compare is used to copy the changes from trunk to the 1.0.4_clean - this also allows checking exactly what gets included.
SQL Compare is used again to find the differences between the previous and new releases (changes from DB_1.0.4_clean to DB_1.0.3_clean), which creates a change script "1.0.3 to 1.0.4.sql".
We are still building the tool to automate this part, but the goal is that there is a table to track every version the database has been at, and if the change script was applied. The upgrade tool looks for the latest entry, then applies each upgrade script one-by-one and finally the DB is at the latest version.
I don't have this problem, but it would be trivial to protect the _clean databases from modification by other team members. Additionally, because I use SQL Compare after the fact to generate the change scripts, there is no need for developers to keep track of them as they go.
We actually did this for a while, and it was a HUGE pain. It was easy to forget, and at the same time, there were changes being done that didn't necessarily make it - so the full upgrade script created using the individually-created change scripts would sometimes add a field, then remove it, all in one release. This can obviously be pretty painful if there are index changes, etc.
The nice thing about SQL compare is the script it generates is in a transaction -and it if fails, it rolls the whole thing back. So if the production DB has been modified in some way, the upgrade will fail, and then the deployment team can actually use SQL Compare on the production DB against the _clean db, and manually fix the changes. We've only had to do this once or twice (damn customers).
The .SQL change scripts (generated by SQL Compare) get stored in our version control system (subversion).

If you have Visual Studio (specifically the Database edition), there is a Database Project that you can create and point it to a SQL Server database. The project will load the schema and basically offer you a lot of other features. It behaves just like a code project. It also offers you the advantage to script the entire table and contents so you can keep it under Subversion.
When you build the project, it validates that the database has integrity. It's quite smart.

On one of our projects we had stored database version inside database.
Each change to database structure was scripted into separate sql file which incremented database version besides all other changes. This was done by developer who changed db structure.
Deployment script checked against current db version and latest changes script and applied these sql scripts if necessary.

Firstly, your production database should either not be accessible to developers, or the developers (and everyone else) should be under strict instructions that no changes of any kind are made to production systems outside of a change-control system.
Change-control is vital in any system that you expect to work (Where there is >1 engineer involved in the entire system).
Each developer should have their own test system; if they want to make changes to that, they can, but system tesing should be done on a more controlled, system test system which has the same changes applied as production - if you don't do this, you can't rely on releases working because they're being tested in an incompatible environment.
When a change is made, the appropriate scripts should be created and tested to ensure that they apply cleanly on top of the current version, and that the rollback works*
*you are writing rollback scripts, right?

I agree with other posts that developers should not have permissions to change the production database. Either the developers should be sharing a common development database (and risk treading on each others' toes) or they should have their own individual databases. In the former case you can use a tool like SQL Compare to deploy to production. In the latter case, you need to periodically sync up the developer databases during the development lifecycle before promoting to production.
Here at Red Gate we are shortly going to release a new tool, SQL Source Control, designed to make this process a lot easier. We will integrate into SSMS and enable the adding and retrieving objects to and from source control at the click of a button. If you're interested in finding out more or signing up to our Early Access Program, please visit this page:
http://www.red-gate.com/Products/SQL_Source_Control/index.htm

I have to agree with the rest of the post. Database access restrictions would solve the issue on production. Then using a versioning tool like DBGhost or DVC would help you and the rest of the team to maintain the database versioning

How do you track database changes in source control?

We use SQL Server 2000/2005 and Vault or SVN on most of our projects. I haven't found a decent solution for capturing database schema/proc changes in either source control system.
Our current solution is quite cumbersome and difficult to enforce (script out the object you change and commit it to the database).
We have a lot of ideas of how to tackle this problem with some custom development, but I'd rather install an existing tool (paid tools are fine).
So: how do you track your database code changes? Do you have any recommended tools?
Edit:
Thanks for all the suggestions. Due to time constraints, I'd rather not roll my own here. And most of the suggestions have the flaw that they require the dev to follow some procedure.
Instead, an ideal solution would monitor the SQL Database for changes and commit any detected changes to SCM. For example, if SQL Server had an add-on that could record any DML change with the user that made the change, then commit the script of that object to SCM, I'd be thrilled.
We talked internally about two systems:
1. In SQL 2005, use object permissions to restrict you from altering an object until you did a "checkout". Then, the checkin procedure would script it into the SCM.
2. Run a scheduled job to detect any changes and commit them (anonymously) to SCM.
It'd be nice if I could skip the user-action part and have the system handle all this automatically.

Use Visual studio database edition to script out your database. Works like a charm and you can use any Source control system, of course best if it has VS plugins. This tool has also a number of other useful features. Check them out here in this great blog post
http://www.vitalygorn.com/blog/post/2008/01/Handling-Database-easily-with-Visual-Studio-2008.aspx
or check out MSDN for the official documentation

Tracking database changes directly from SSMS is possible using various 3rd party tools. ApexSQL Source Control automatically scripts any database object that is included in versioning. Commits cannot be automatically performed by the tool. Instead, the user needs to choose which changes will be committed.
When getting changes from a repository, ApexSQL Source Control is aware of a SQL database referential integrity. Thus, it will create a synchronization scripts including all dependent objects that will be wrapped in a transactions so, either all changes will be applied in case no error is encountered, or none of the selected changes is applied. In any case, database integrity remains unaffected.

I have to say I think a visual studio database project is also a reasonable solution to the source control dilemma. If it's set up correctly you can run the scripts against the database from the IDE. If your script is old, get the latest, run it against the DB. Have a script that recreates all the objects as well if you need, new objects must be added to the this script as well by hand, but only once
I like every table, proc and function to be in it's own file.

One poor man's solution would be to add a pre-commit hook script that dumps out the latest db schema into a file and have that file committed to your SVN repository along with your code. Then, you can diff the db schema files from any revision.

I just commit the SQL-alter-Statement additional to the complete SQL-CreateDB-statement.

Rolling your own from scratch would not be very doable, but if you use a sql comparison tool like Redgate SQL Compare SDK to generate your change files for you it would not take very long to half-roll what you want and then just check those files into source control. I rolled something similar for myself to update changes from our development systems to our live systems in just a few hours.

In our environment, we never change the DB manually: all changes are done by scripts at release time, and the scripts are kept in the version control system. One important part of this procedure is to be sure that all scripts can be run again against the same DB the scripts are idempotent?) without loss of data. For example, if you add a column, make sure that you do nothing if the column is already there.
Your comment about "suggestions have the flaw that they require the dev to follow some procedure" is really a tell-tale. It's not a flaw, it's a feature. Version control helps developers in following procedures and makes the procedures less painful. If you don't want to follow procedures, you don't need version control.

In SQL2000 generate each object into it's own file, then check them all into your source control. Let your source control handle the change history.
In SQL 2005, you'll need to write a bit of code to generate all objects into separate files.

In one project I arranged by careful attention in the design that all the important data in the database can be automatically recreated from external places. At startup the application creates the database if it is missing, and populates it from external data sources, using a schema in the application source code (and hence versioned with the application). The database store name (a sqlite filename although most database managers allow multiple databases) includes a schema version, and we increase the schema version whenever we commit a schema change. This means when we restart the application to a new version with a different schema that a new database store is automatically created and populated. Should we have to revert a deployment to an old schema then the new run of the old version will be using the old database store, so we get to do fast downgrades in the event of trouble.
Essentially, the database acts like a traditional application heap, with the advantages of persistence, transaction safety, static typing (handy since we use Python) and uniqueness constraints. However, we don't worry at all about deleting the database and starting over, and people know that if they try some manual hack in the database then it will get reverted on the next deployment, much like hacks of a process state will get reverted on the next restart.
We don't need any migration scripts since we just switch database filename and restart the application and it rebuilds itself. It helps that the application instances are sharded to use one database per client. It also reduces the need for database backups.
This approach won't work if your database build from the external sources takes longer than you will allow the application to be remain down.

If you are using .Net and like the approach Rails takes with Migrations, then I would recommend Migrator.Net.
I found a nice tutorial that walks through setting it up in Visual Studio. He also provides a sample project to reference.

We developed a custom tool that updates our databases. The database schema is stored in a database-neutral XML file which is then read and processed by the tool. The schema gets stored in SVN, and we add appropriate commentary to show what was changed. It works pretty well for us.
While this kind of solution is definitely overkill for most projects, it certainly makes life easier at times.

Our dbas periodically check prod against what is in SVN and delete any objects not under source control. It only takes once before the devlopers never forget to put something in source control again.
We also do not allow anyone to move objects to prod without a script as our devs do not have prod rights this is easy to enforce.

In order to track all the change like insert update and delete there will be a lot of overhead for the SVN.
It is better to track only the ddl changes like (alter, drop, create) which changes the schema.
You can do this Schema tracking easily by creating a table and a trgger to insert data to that table.
Any time you want u can get the change status by querying from that table
There are a lots of example here and here