How to do version control for SQL Server database?

I want to get my databases under version control.
I'll always want to have at least some data in there (as alumb mentions: user types and administrators). I'll also often want a large collection of generated test data for performance measurements.
How would I apply version control to my database?

Martin Fowler wrote my favorite article on the subject, http://martinfowler.com/articles/evodb.html. I chose not to put schema dumps under version control as alumb and others suggest, because I want an easy way to upgrade my production database.
For a web application where I'll have a single production database instance, I use two techniques:
Database Upgrade Scripts
A sequence of database upgrade scripts, each containing the DDL necessary to move the schema from version N to N+1. (These go in your version control system.) A version-history table, something like
create table VersionHistory (
    Version int primary key,
    UpgradeStart datetime not null,
    UpgradeEnd datetime
);
gets a new entry, corresponding to the new version, every time an upgrade script runs.
This ensures that it's easy to see what version of the database schema exists and that database upgrade scripts are run only once. Again, these are not database dumps. Rather, each script represents the changes necessary to move from one version to the next. They're the script that you apply to your production database to "upgrade" it.
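For illustration, a single step's upgrade script might look like this (a minimal sketch; the VersionHistory table is the one above, but the version number and the ALTER are hypothetical):
-- Upgrade script: version 3 -> 4 (sketch)
IF EXISTS (SELECT * FROM VersionHistory WHERE Version = 4)
    RAISERROR ('Version 4 has already been applied.', 16, 1)
ELSE
BEGIN
    INSERT INTO VersionHistory (Version, UpgradeStart) VALUES (4, GETDATE());

    -- The actual DDL for this step (hypothetical example):
    ALTER TABLE Customer ADD MiddleName varchar(50) NULL;

    UPDATE VersionHistory SET UpgradeEnd = GETDATE() WHERE Version = 4;
END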
Developer Sandbox Synchronization
A script to backup, sanitize, and shrink a production database. Run this after each upgrade to the production DB.
A script to restore (and tweak, if necessary) the backup on a developer's workstation. Each developer runs this script after each upgrade to the production DB.
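A minimal sketch of the restore half (all names and paths are placeholders, and the sanitize step depends entirely on your data):
-- Restore the sanitized production backup onto a developer workstation (sketch)
RESTORE DATABASE MyAppDev
FROM DISK = N'C:\Backups\MyApp_sanitized.bak'
WITH MOVE N'MyApp_Data' TO N'C:\Data\MyAppDev.mdf',
     MOVE N'MyApp_Log' TO N'C:\Data\MyAppDev_log.ldf',
     REPLACE;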
A caveat: My automated tests run against a schema-correct but empty database, so this advice will not perfectly suit your needs.

Red Gate's SQL Compare product not only lets you do object-level comparisons and generate change scripts from them, but it also lets you export your database objects into a folder hierarchy organized by object type, with one [objectname].sql creation script per object in these directories. The object-type hierarchy looks like this:
\Functions
\Security
\Security\Roles
\Security\Schemas
\Security\Users
\Stored Procedures
\Tables
If you dump your scripts to the same root directory after you make changes, you can use this to update your SVN repo, and keep a running history of each object individually.

This is one of the "hard problems" surrounding development. As far as I know there are no perfect solutions.
If you only need to store the database structure and not the data, you can export the database as SQL queries. (In Enterprise Manager: right-click on the database -> Generate SQL Script. I recommend setting "create one file per object" on the Options tab.) You can then commit these text files to svn and make use of svn's diff and logging functions.
I have this tied together with a batch script that takes a couple of parameters and sets up the database. I also added some additional queries that enter default data like user types and the admin user. (If you want more info on this, post something and I can put the script somewhere accessible)
If you need to keep all of the data as well, I recommend keeping a backup of the database and using Red Gate (http://www.red-gate.com/) products to do the comparisons. They don't come cheap, but they are worth every penny.

First, you must choose the version control system that is right for you:
Centralized Version Control system - a standard system where users check out/check in before/after they work on files, and the files are kept on a single central server
Distributed Version Control system - a system where the repository is cloned, and each clone is actually a full backup of the repository, so if any server crashes, any cloned repository can be used to restore it
After choosing the right system for your needs, you'll need to set up the repository, which is the core of every version control system
All this is explained in the following article: http://solutioncenter.apexsql.com/sql-server-source-control-part-i-understanding-source-control-basics/
After setting up a repository (and, in the case of a centralized version control system, a working folder), you can read this article. It shows how to set up source control in a development environment using:
SQL Server Management Studio via the MSSCCI provider,
Visual Studio and SQL Server Data Tools
A 3rd-party tool, ApexSQL Source Control

Here at Red Gate we offer a tool, SQL Source Control, which uses SQL Compare technology to link your database with a TFS or SVN repository. This tool integrates into SSMS and lets you work as you would normally, except it now lets you commit the objects.
For a migrations-based approach (more suited for automated deployments), we offer SQL Change Automation (formerly called ReadyRoll), which creates and manages a set of incremental scripts as a Visual Studio project.
In SQL Source Control it is possible to specify static data tables. These are stored in source control as INSERT statements.
If you're talking about test data, we'd recommend that you either generate test data with a tool or via a post-deployment script you define, or you simply restore a production backup to the dev environment.

You might want to look at Liquibase (http://www.liquibase.org/). Even if you don't use the tool itself, it handles the concepts of database change management and refactoring pretty well.
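For example, Liquibase can track plain SQL files through its "formatted SQL" changelog syntax, so each changeset runs exactly once per database, much like the upgrade scripts described above (a minimal sketch; the table and author are hypothetical):
--liquibase formatted sql

--changeset alice:1
CREATE TABLE AppUser (
    UserID int PRIMARY KEY,
    UserName varchar(50) NOT NULL
);
--rollback DROP TABLE AppUser;

--changeset alice:2
ALTER TABLE AppUser ADD Email varchar(100) NULL;
--rollback ALTER TABLE AppUser DROP COLUMN Email;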

+1 for everyone who's recommended the RedGate tools, with an additional recommendation and a caveat.
SqlCompare also has a decently documented API: so you can, for instance, write a console app which syncs your source-controlled scripts folder with a CI integration-testing database on check-in, so that when someone checks in a change to the schema from their scripts folder it's automatically deployed along with the matching application code change. This helps close the gap with developers who are forgetful about propagating changes in their local DB up to a shared development DB (about half of us, I think :) ).
A caveat is that with a scripted solution or otherwise, the RedGate tools are sufficiently smooth that it's easy to forget about SQL realities underlying the abstraction. If you rename all the columns in a table, SqlCompare has no way to map the old columns to the new columns and will drop all the data in the table. It will generate warnings but I've seen people click past that. There's a general point here worth making, I think, that you can only automate DB versioning and upgrade so far - the abstractions are very leaky.

With VS 2010, use the Database project.
Script out your database
Make changes to the scripts, or directly on your DB server
Sync up using Data > Schema Compare
Makes a perfect DB versioning solution, and makes syncing DBs a breeze.

We use DBGhost to manage our SQL database. You put the scripts to build a new database into version control, and it'll either build a new database or upgrade any existing database to the schema in version control. That way you don't have to worry about creating change scripts (although you can still do that if, for example, you want to change the data type of a column and need to convert data).

It is a good approach to save database scripts into version control, with change scripts so that you can upgrade any database you have. You might also want to save schemas for different versions so that you can create a full database without having to apply all the change scripts. Handling the scripts should be automated so that you don't have to do manual work.
I think it's important to have a separate database for every developer, rather than a shared database. That way developers can create test cases and development phases independently of other developers.
The automation tool should have a means of handling database metadata, which tells which databases are in what state of development, which tables contain version-controllable data, and so on.

You could also look at a migrations solution. These allow you to specify your database schema in C# code, and roll your database version up and down using MSBuild.
I'm currently using DbUp, and it's been working well.

You didn't mention any specifics about your target environment or constraints, so this may not be entirely applicable... but if you're looking for a way to effectively track an evolving DB schema and aren't averse to the idea of using Ruby, ActiveRecord's migrations are right up your alley.
Migrations programmatically define database transformations using a Ruby DSL; each transformation can be applied or (usually) rolled back, allowing you to jump to a different version of your DB schema at any given point in time. The file defining these transformations can be checked into version control like any other piece of source code.
Because migrations are a part of ActiveRecord, they typically find use in full-stack Rails apps; however, you can use ActiveRecord independently of Rails with minimal effort. See here for a more detailed treatment of using AR's migrations outside of Rails.

Every database should be under source-code control. What is lacking is a tool to automatically script all database objects - and "configuration data" - to files, which can then be added to any source control system. If you are using SQL Server, then my solution is here: http://dbsourcetools.codeplex.com/ . Have fun.
- Nathan.

It's simple.
When the base project is ready, you create a full database script. This script is committed to SVN; it is the first version.
After that, all developers create change scripts (ALTERs, new tables, sprocs, etc.).
When you need the current version, you execute all the new change scripts.
When the app is released to production, you go back to step 1 (but then it will be a successive version, of course).
NAnt will help you execute those change scripts. :)
And remember: everything works fine when there is discipline. Every time a database change is committed, the corresponding functions in code are committed too.

Because our app has to work across multiple RDBMSs, we store our schema definition in version control using the database-neutral Torque format (XML). We also version-control the reference data for our database in XML format as follows (where "Relationship" is one of the reference tables):
<Relationship RelationshipID="1" InternalName="Manager"/>
<Relationship RelationshipID="2" InternalName="Delegate"/>
etc.
We then use home-grown tools to generate the schema upgrade and reference data upgrade scripts that are required to go from version X of the database to version X + 1.

If you have a small database and you want to version the entire thing, this batch script might help. It detaches, compresses, and checks an MSSQL database MDF file into Subversion.
If you mostly want to version your schema and just have a small amount of reference data, you can possibly use SubSonic Migrations to handle that. The benefit there is that you can easily migrate up or down to any specific version.

We don't store the database schema; we store the changes to the database. What we do is store the schema changes so that we can build a change script for any version of the database and apply it to our customers' databases. I wrote a database utility app that gets distributed with our main application that can read that script and know which updates need to be applied. It also has enough smarts to refresh views and stored procedures as needed.

To make the dump to a source code control system that little bit faster, you can see which objects have changed since last time by using the version information in sysobjects.
Setup: Create a table in each database you want to check incrementally to hold the version information from the last time you checked it (empty on the first run). Clear this table if you want to re-scan your whole data structure.
IF ISNULL(OBJECT_ID('last_run_sysversions'), 0) <> 0 DROP TABLE last_run_sysversions
CREATE TABLE last_run_sysversions (
name varchar(128),
id int, base_schema_ver int,
schema_ver int,
type char(2)
)
Normal running mode: You can take the results from this sql, and generate sql scripts for just the ones you're interested in, and put them into a source control of your choice.
IF ISNULL(OBJECT_ID('tempdb.dbo.#tmp'), 0) <> 0 DROP TABLE #tmp
CREATE TABLE #tmp (
name varchar(128),
id int, base_schema_ver int,
schema_ver int,
type char(2)
)
SET NOCOUNT ON
-- Insert the values from the end of the last run into #tmp
INSERT #tmp (name, id, base_schema_ver, schema_ver, type)
SELECT name, id, base_schema_ver, schema_ver, type FROM last_run_sysversions
DELETE last_run_sysversions
INSERT last_run_sysversions (name, id, base_schema_ver, schema_ver, type)
SELECT name, id, base_schema_ver, schema_ver, type FROM sysobjects
-- This next bit lists all differences to scripts.
SET NOCOUNT OFF
--Renamed.
SELECT 'renamed' AS ChangeType, t.name, o.name AS extra_info, 1 AS Priority
FROM sysobjects o INNER JOIN #tmp t ON o.id = t.id
WHERE o.name <> t.name /*COLLATE*/
AND o.type IN ('TR', 'P' ,'U' ,'V')
UNION
--Changed (using alter)
SELECT 'changed' AS ChangeType, o.name /*COLLATE*/,
'altered' AS extra_info, 2 AS Priority
FROM sysobjects o INNER JOIN #tmp t ON o.id = t.id
WHERE (
o.base_schema_ver <> t.base_schema_ver
OR o.schema_ver <> t.schema_ver
)
AND o.type IN ('TR', 'P' ,'U' ,'V')
AND o.name NOT IN ( SELECT oi.name
FROM sysobjects oi INNER JOIN #tmp ti ON oi.id = ti.id
WHERE oi.name <> ti.name /*COLLATE*/
AND oi.type IN ('TR', 'P' ,'U' ,'V'))
UNION
--Changed (actually dropped and recreated [but not renamed])
SELECT 'changed' AS ChangeType, t.name, 'dropped' AS extra_info, 2 AS Priority
FROM #tmp t
WHERE t.name IN ( SELECT ti.name /*COLLATE*/ FROM #tmp ti
WHERE NOT EXISTS (SELECT * FROM sysobjects oi
WHERE oi.id = ti.id))
AND t.name IN ( SELECT oi.name /*COLLATE*/ FROM sysobjects oi
WHERE NOT EXISTS (SELECT * FROM #tmp ti
WHERE oi.id = ti.id)
AND oi.type IN ('TR', 'P' ,'U' ,'V'))
UNION
--Deleted
SELECT 'deleted' AS ChangeType, t.name, '' AS extra_info, 0 AS Priority
FROM #tmp t
WHERE NOT EXISTS (SELECT * FROM sysobjects o
WHERE o.id = t.id)
AND t.name NOT IN ( SELECT oi.name /*COLLATE*/ FROM sysobjects oi
WHERE NOT EXISTS (SELECT * FROM #tmp ti
WHERE oi.id = ti.id)
AND oi.type IN ('TR', 'P' ,'U' ,'V'))
UNION
--Added
SELECT 'added' AS ChangeType, o.name /*COLLATE*/, '' AS extra_info, 4 AS Priority
FROM sysobjects o
WHERE NOT EXISTS (SELECT * FROM #tmp t
WHERE o.id = t.id)
AND o.type IN ('TR', 'P' ,'U' ,'V')
AND o.name NOT IN ( SELECT ti.name /*COLLATE*/ FROM #tmp ti
WHERE NOT EXISTS (SELECT * FROM sysobjects oi
WHERE oi.id = ti.id))
ORDER BY Priority ASC
Note: If you use a non-standard collation in any of your databases, you will need to replace /*COLLATE*/ with your database collation, e.g. COLLATE Latin1_General_CI_AI

I wrote this app a while ago, http://sqlschemasourcectrl.codeplex.com/, which will scan your MS SQL databases as often as you want and automatically dump your objects (tables, views, procs, functions, SQL settings) into SVN. Works like a charm. I use it with Unfuddle (which allows me to get alerts on check-ins).

The typical solution is to dump the database as necessary and back up those files.
Depending on your development platform, there may be open-source plugins available. Rolling your own code to do it is usually fairly trivial.
Note: You may want to backup the database dump instead of putting it into version control. The files can get huge fast in version control, and cause your entire source control system to become slow (I'm recalling a CVS horror story at the moment).

It's a very old question, however, many people are trying to solve this even now. All they have to do is to research about Visual Studio Database Projects. Without this, any database development looks very feeble. From code organization to deployment to versioning, it simplifies everything.

We needed to version our SQL database after we migrated to an x64 platform and our old version broke with the migration. We wrote a C# application which used SQLDMO to map out all of the SQL objects to a folder:
Root
  ServerName
    DatabaseName
      Schema Objects
        Database Triggers*
          .ddltrigger.sql
        Functions
          ..function.sql
        Security
          Roles
            Application Roles
              .approle.sql
            Database Roles
              .role.sql
          Schemas*
            .schema.sql
          Users
            .user.sql
        Storage
          Full Text Catalogs*
            .fulltext.sql
        Stored Procedures
          ..proc.sql
        Synonyms*
          .synonym.sql
        Tables
          ..table.sql
          Constraints
            ...chkconst.sql
            ...defconst.sql
          Indexes
            ...index.sql
          Keys
            ...fkey.sql
            ...pkey.sql
            ...ukey.sql
          Triggers
            ...trigger.sql
        Types
          User-defined Data Types
            ..uddt.sql
          XML Schema Collections*
            ..xmlschema.sql
        Views
          ..view.sql
          Indexes
            ...index.sql
          Triggers
            ...trigger.sql
The application would then compare the newly written version with the version stored in SVN, and if there were differences it would update SVN.
We determined that running the process once a night was sufficient since we did not make that many changes to SQL. It allows us to track changes to all the objects we care about plus it allows us to rebuild our full schema in the event of a serious problem.

I agree with ESV's answer, and for that exact reason I started a little project a while back to help maintain database updates in a very simple file, which could then be maintained alongside our source code. It allows easy updates for developers as well as UAT and production. The tool works on SQL Server and MySQL.
Some project features:
Allows schema changes
Allows value tree population
Allows separate test data inserts (e.g. for UAT)
Allows option for rollback (not automated)
Maintains support for SQL Server and MySQL
Has the ability to import your existing database into version control with one simple command (SQL Server only ... still working on MySQL)
Please check out the code for some more information.

We just started using Team Foundation Server. If your database is medium-sized, then Visual Studio has some nice project integrations with built-in compare, data compare, database refactoring tools, a database testing framework, and even data generation tools.
But that model doesn't fit very large or third-party databases (that encrypt objects) very well. So what we've done is store only our customized objects. Visual Studio / Team Foundation Server works very well for that.
TFS Database chief arch. blog
MS TFS site

I'm also using a version stamp stored in the database via the extended properties family of procedures. My application has scripts for each version step (i.e. moving from 1.1 to 1.2). When deployed, it looks at the current version and then runs the scripts one by one until it reaches the last app version. There is no script that has the straight 'final' version; even a deployment to a clean DB happens via the series of upgrade steps.
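For reference, a version stamp like that can be kept with the extended property procedures; a minimal sketch (the property name and version values are arbitrary):
-- Stamp the database with a schema version
EXEC sp_addextendedproperty @name = N'SchemaVersion', @value = N'1.1';

-- After the 1.1 -> 1.2 upgrade script runs:
EXEC sp_updateextendedproperty @name = N'SchemaVersion', @value = N'1.2';

-- Read the current version at deploy time:
SELECT value
FROM sys.fn_listextendedproperty(N'SchemaVersion', NULL, NULL, NULL, NULL, NULL, NULL);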
Now what I'd like to add is that two days ago I saw a presentation on the MS campus about the new and upcoming VS DB edition. The presentation was focused specifically on this topic and I was blown out of the water. You should definitely check it out: the new facilities are focused on keeping the schema definition in T-SQL scripts (CREATEs), a runtime delta engine to compare the deployment schema with the defined schema and do the delta ALTERs, and integration with source control, up to and including MSBUILD continuous integration for automated build drops. The drop will contain a new file type, the .dbschema files, that can be taken to the deployment site, and a command-line tool can do the actual 'deltas' and run the deployment.
I have a blog entry on this topic with links to the VSDE downloads, you should check them out: http://rusanu.com/2009/05/15/version-control-and-your-database/

A while ago I found a VB .bas module that used DMO and VSS objects to get an entire DB scripted out and into VSS. I turned it into a VBScript and posted it here. You can easily take out the VSS calls and use the DMO stuff to generate all the scripts, then call SVN from the same batch file that calls the VBScript to check them in.

In my experience the solution is twofold:
You need to handle changes to the development database that are done by multiple developers during development.
You need to handle database upgrades in customers sites.
In order to handle #1 you'll need a strong database diff/merge tool. The best tool should be able to perform automatic merge as much as possible while allowing you to resolve unhandled conflicts manually.
The perfect tool should handle merge operations by using a 3-way merge algorithm that brings into account the changes that were made in the THEIRS database and the MINE database, relative to the BASE database.
I wrote a commercial tool that provides manual merge support for SQLite databases and I'm currently adding support for 3-way merge algorithm for SQLite. Check it out at http://www.sqlitecompare.com
In order to handle #2 you will need an upgrade framework in place.
The basic idea is to develop an automatic upgrade framework that knows how to upgrade from an existing SQL schema to the newer SQL schema and can build an upgrade path for every existing DB installation.
Check out my article on the subject in http://www.codeproject.com/KB/database/sqlite_upgrade.aspx to get a general idea of what I'm talking about.
Good Luck
Liron Levi

Check out DBGhost http://www.innovartis.co.uk/. I have used it in an automated fashion for 2 years now and it works great. It allows our DB builds to happen much like a Java or C build happens, except for the database. You know what I mean.

I would suggest using comparison tools to improvise a version control system for your database. Two good alternatives are xSQL Schema Compare and xSQL Data Compare.
Now, if your goal is to have only the database's schema under version control you can simply use xSQL Schema Compare to generate xSQL Snapshots of the schema and add these files in your version control. Then, to revert or update to a specific version, just compare the current version of the database with the snapshot for the destination version.
Also, if you want to have the data under version control as well, you can use xSQL Data Compare to generate change scripts for your database and add the .sql files to your version control. You could then execute these scripts to revert/update to any version you want. Keep in mind that for the 'revert' functionality you need to generate change scripts that, when executed, will make Version 3 the same as Version 2, and for the 'update' functionality you need change scripts that do the opposite.
Lastly, with some basic batch programming skills you can automate the whole process by using the command-line versions of xSQL Schema Compare and xSQL Data Compare.
Disclaimer: I'm affiliated to xSQL.

An alternative to version controlling your database is to use a version-controlled database, of which there are now several.
https://www.dolthub.com/blog/2021-09-17-database-version-control/
These products don't apply version control on top of another type of database -- they are their own database engines that support version control operations. So you need to migrate to them or start building on them in the first place.
I write one of them, DoltDB, which combines the interfaces of MySQL and Git. Check it out here:
https://github.com/dolthub/dolt

Related

How to run raw SQL to deploy database changes

We intend to create DACPAC files using SQL database projects and distribute them automatically to several environments, DEV/QA/PROD, using Azure Pipelines. I can make changes to the schema for a table, view, function, or procedure, but I'm not sure how to update specific data in a table. I am sure this is a very common use case, but unfortunately I am having a hard time implementing it.
Any idea how can I automate creating/updating/deleting a row for a table?
E.g.: update myTable set myColumn = 5 where someColumn = 'condition'
In your database project you can add a Post Deployment Script
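A common shape for that post-deployment script is an idempotent MERGE, so the data converges to the desired state on every deploy (a sketch using the hypothetical table and values from the question):
-- Post-deployment script (sketch): runs after every DACPAC deploy
MERGE dbo.myTable AS target
USING (VALUES ('condition', 5)) AS source (someColumn, myColumn)
    ON target.someColumn = source.someColumn
WHEN MATCHED THEN
    UPDATE SET myColumn = source.myColumn
WHEN NOT MATCHED THEN
    INSERT (someColumn, myColumn) VALUES (source.someColumn, source.myColumn);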
Do not. Seriously. I found DACPAC always to be WAY too limiting for serious operations. Look how the SQL is generated and - realize how little control you have.
The standard approach is to have deployment scripts that you generate and that make the changes in the database, plus a table in the DB tracking which ones have been executed (possibly with a checksum so you do not need to change the name to update them).
You can easily generate them partially by schema compare (and then generate the change script), but such scripts also allow you to do things like data scrubbing and multi-step transformations that DACPAC by design cannot do efficiently and easily.
There are plenty of frameworks for this around. They generally belong in the category of developer tools.

Is it possible to automatically generate migration scripts by comparing db and code?

I'm seriously confused about how Flyway generally works to maintain the DB as code. Suppose I have the following V0 script:
Create table student(
    Name varchar(25)
)
That would be my initial DB. Now suppose I want to add a new column: why am I forced to write a V1 script like this one?
Alter table student add column surname varchar(25)
What I’d like to do would be to simply update the v0 script like this:
Create table student(
    Name varchar(25),
    Surname varchar(25)
)
Then the tool, by comparing against the actual DB, should be able to figure out that a new column needs to be created!
This is how tools for other code (Java, JavaScript, ...) work, and that's how I'd like DB-as-code tools to behave.
So my question is: is there a way to achieve this behavior without dropping/recreating the db?
I tagged this question with flyway and liquibase tools but feel free to suggest other tools that would fit my needs.
Whatever way you develop the database, there is no way to achieve this behavior without dropping/recreating the DB, because the CREATE TABLE statement assumes that the table you specify isn't already there. You can't use a CREATE OR ALTER statement, because that isn't supported for tables even where the RDBMS you use supports the syntax.
In the early stages of a database project, you can work much quicker with a build script that you use to create a database with tables, views and so on. You can then insert some data, try it out, run a few tests maybe, and then tear it down. Flyway Community supports this: you just have a single migration script starting from an empty database that you repeatedly 'clean' and 'migrate' until you reach your first version. Flyway takes care of the tear-down process and gives you a fresh start by wiping your configured schemas completely clean.
Flyway Teams supports a special type of migration, the 'repeatable', which allows you to use SQL migration files that you can alter. However, you would need to add logic that deletes the table if it already exists before it executes your CREATE TABLE statement. It avoids having to 'flyway clean', but it is a lot of extra work. It also means that you lose the whole advantage of a version representing an exact state of a database.
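In SQL Server syntax, that guard logic is just an existence check ahead of the CREATE (a sketch based on the question's table; note it throws away any data already in the table):
-- Repeatable-migration guard (sketch): drop and recreate the table
IF OBJECT_ID(N'dbo.student', N'U') IS NOT NULL
    DROP TABLE dbo.student;

CREATE TABLE student (
    Name varchar(25),
    Surname varchar(25)
);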
At some point, you are going to use migrations, because you're likely to have copies of the database to keep up to date. Whatever tool you use to update a development or production database, you are going to need a migration for this because of the existing data in tables.
Flyway Enterprise supports the automatic generation of a migration, if you are using Oracle or SQL Server. SQL Compare is provided to compare two versions of a database and produce a migration script from one version to the next. This allows you to use a build script as you suggest, compare it with the current version of the database, and generate a migration script to get from the one to the other.

Python - extracting a SQL Server database schema to a file

Often I need to extract the complete schema of an existing SQL Server DB to a file. I need to cover every object -- tables, views, functions, SPs, UDDTs, triggers, etc. The purpose is so that I can then use a file-diff utility to compare that schema to a baseline reference.
Normally I use Enterprise Manager or Management Studio to script out the DB objects and then concatenate those files to make one big file in a consistent predictable order. I was wondering whether there's a way to accomplish this task in Python? Obviously it'd take an additional package, but having looked at a few (pyodbc, SQLAlchemy, SQLObject), none of them seem really suited to this use case.
If you can connect to SQL Server and run queries in Python then yes, it's possible, but it will take a lot of effort and testing to get it to work correctly.
The idea is to use the system tables to get details about each object and then generate DDL statements based on this. Some, if not all, DDL statements already exist in the sys.syscomments table.
Start off by executing and examining this in SSMS before you start working in Python.
select * from sys.tables
select * from sys.all_columns
select * from sys.views
select * from sys.syscomments
All system tables documentation from MSDN.
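As a starting point, sys.sql_modules (the modern replacement for syscomments) already stores the complete text of procs, views, functions, and triggers, so a Python script could fetch those definitions directly and would only need to generate DDL itself for tables and indexes (a sketch):
-- Pull the stored definition of every proc, view, function and trigger (sketch)
SELECT o.name,
       o.type_desc,
       m.definition
FROM sys.sql_modules m
JOIN sys.objects o ON o.object_id = m.object_id
ORDER BY o.type_desc, o.name;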
I've used this PowerShell strategy in the past. Obviously, that isn't Python, but it is a script you can write then execute from within Python. Give this article a read as it may be your easiest (and cheapest) solution: http://blogs.technet.com/b/heyscriptingguy/archive/2010/11/04/use-powershell-to-script-sql-database-objects.aspx
As a disclaimer, I was only exporting stored procedures, not every single object.

Updating client SQL Server database structure from text file

We have a "master database structure", and need a routine to keep the database structure on client sites up-to-date.
A number of suggestions have been given to a related question, but I am looking for a more specific solution, along these lines:
I would like to generate a text file (XML or other readable format) which describes the entire database structure (this could go into version control). This routine will run in-house, to provide a database schema file to be distributed with the next version of our product.
Then I need a way to update the database structure on the client site so that it corresponds to the master database structure. (In other words, I don't want to have to keep track of numerous change scripts for different versions of the database structure, but a more general routine which can get the client database structure updated to the current master database structure.)
So the main feature I'm looking for could be described as "database structure to text" and "text to database structure".
There are a whole lot of diff tools that can give you the schema, stored procedure, and constraint differences between two databases. You could roll your own, but I think it would be more expensive than one of these tools if you have a complex schema; many offer a free trial so you can test.
The problem is you'd have to have the master database online and accessible from the client database installation (or install it there), which might or might not be feasible.
If you won't do that, the only other sane option I can think of is to use the migration idea: keep a list of SQL script + version pairs, plus the current version on each database. This could be consolidated by a different tool that generates a single script from the client's database version number and the list of changes. And if you don't have the list of changes, you can start with a diff tool run and keep track of them from there.
The route you seem to prefer (comparing text SQL dumps of both schemas) looks very hard to get right automatically; it doesn't look like the right path to take.
Several popular strategies are variants of this:
Add a table to the database:
CREATE TABLE Release (
    release_number int not null,
    applied datetime not null
)
Each release, as part of its release script, inserts a row into this table.
You can now find out with a single query which release each client is running, and run all the releases between that one and the release they want to be running.
In addition, you could check that their schema is correct for each version (correct table names, columns, etc.) by doing something like this:
SELECT so.name,
       sc.name
FROM sysobjects so
     INNER JOIN syscolumns sc ON sc.id = so.id
WHERE so.type = 'U'
ORDER BY 1, 2
then calculate a hash of the result and compare it with a pre-computed hash (generated by running the query on your reference installation) to see if the installation is now correct.
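One way to reduce that result set to a single comparable value without leaving SQL is CHECKSUM_AGG (a sketch; like any checksum it can collide, so treat a match as "very probably correct"):
-- Collapse the table/column listing into one comparable value (sketch)
SELECT CHECKSUM_AGG(CHECKSUM(so.name, sc.name)) AS schema_hash
FROM sysobjects so
     INNER JOIN syscolumns sc ON sc.id = so.id
WHERE so.type = 'U';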

Stored procedures/DB schema in source control

Do you guys keep track of stored procedures and database schema in your source control system of choice?
When you make a change (add a table, update a stored proc), how do you get the changes into source control?
We use SQL Server at work, and I've begun using darcs for versioning, but I'd be curious about general strategies as well as any handy tools.
Edit: Wow, thanks for all the great suggestions, guys! I wish I could select more than one "Accepted Answer"!
We choose to script everything, and that includes all stored procedures and schema changes. No wysiwyg tools, and no fancy 'sync' programs are necessary.
Schema changes are easy: all you need to do is create and maintain a single file for that version, including all schema and data changes. This becomes your conversion script from version x to x+1. You can then run it against a production backup and integrate it into your 'daily build' to verify that it works without errors. Note that it's important not to change or delete already-written schema/data-loading SQL, as you can end up breaking any SQL written later.
-- change #1234
ALTER TABLE asdf ADD COLUMN MyNewID INT
GO
-- change #5678
ALTER TABLE asdf DROP COLUMN SomeOtherID
GO
For stored procedures, we elect for a single file per sproc, and it uses the drop/create form. All stored procedures are recreated at deployment. The downside is that if a change was made outside source control, the change is lost. At the same time, that's true for any code, but your DBAs need to be aware of this. This really stops people outside the team mucking with your stored procedures, as their changes are lost in an upgrade.
Using Sql Server, the syntax looks like this:
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[usp_MyProc]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
    drop procedure [usp_MyProc]
GO
CREATE PROCEDURE [usp_MyProc]
(
    @UserID INT
)
AS
SET NOCOUNT ON
-- stored procedure logic.
SET NOCOUNT OFF
GO
The only thing left to do is write a utility program that collates all the individual files and creates a new file with the entire set of updates (as a single script). Do this by first adding the schema changes then recursing the directory structure and including all the stored procedure files.
As an upside to scripting everything, you'll become much better at reading and writing SQL. You can also make this entire process more elaborate, but this is the basic format of how to source-control all sql without any special software.
Addendum: Rick is correct that you will lose permissions on stored procedures with DROP/CREATE, so you may need to write another script that will re-grant specific permissions. This permission script would be the last to run. Our experience found more issues with ALTER versus DROP/CREATE semantics. YMMV
Create a "Database project" in Visual Studio to write and manage your SQL code, and keep the project under version control together with the rest of your solution.
The solution we used at my last job was to number the scripts as they were added to source control:
01.CreateUserTable.sql
02.PopulateUserTable.sql
03.AlterUserTable.sql
04.CreateOrderTable.sql
The idea was that we always knew which order to run the scripts in, and we could avoid having to manage data integrity issues that might arise if you tried modifying script #1 (which would presumably cause the INSERTs in #2 to fail).
One thing to keep in mind with your drop/create scripts in SQL Server is that object-level permissions will be lost. We changed our standard to use ALTER scripts instead, which maintains those permissions.
There are a few other caveats, like the fact that dropping an object drops the dependency records used by sp_depends, and creating the object only creates the dependencies for that object. So if you drop/create a view, sp_depends will no longer know of any objects referencing that view.
Moral of the story, use ALTER scripts.
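Worth noting: on SQL Server 2016 SP1 and later, CREATE OR ALTER works for procedures, views, functions, and triggers, giving you the idempotence of drop/create while preserving object-level permissions like ALTER does. A sketch, reusing the proc name from the earlier answer:
-- CREATE OR ALTER (SQL Server 2016 SP1+): creates the proc if missing, alters it if present
CREATE OR ALTER PROCEDURE dbo.usp_MyProc
    @UserID INT
AS
BEGIN
    SET NOCOUNT ON
    -- stored procedure logic.
END
GO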
I agree with (and upvote) Robert Paulson's practice. That is assuming you are in control of a development team with the responsibility and discipline to adhere to such a practice.
To "force" that onto my teams, our solutions maintain at least one database project from Visual Studio Team Edition for Database Professionals. As with other projects in the solution, the database project gets version-controlled. It makes it a natural development process to break everything in the database into maintainable chunks, "disciplining" my team along the way.
Of course, being a Visual Studio project, it is nowhere near perfect. There are many quirks you will run into that may frustrate or confuse you. It takes a fair bit of understanding of how the project works before getting it to accomplish your tasks. Examples include:
deploying data from CSV files.
selective deployment of test data based on build type.
Visual Studio crashing when comparing against databases with certain types of CLR assembly embedded within.
no means of differentiation between test/production databases that implement different authentication schemes - SQL users vs Active Directory users.
But for teams who don't have a practice of versioning their database objects, this is a good start. The other famous alternative is of course, Red Gate's suite of SQL Server products, which most people who use them consider superior to Microsoft's offering.
I think you should write a script which automatically sets up your database, including any stored procedures. This script should then be placed in source control.
Couple different perspectives from my experience. In the Oracle world, everything was managed by "create" DDL scripts. As ahockley mentioned, one script for each object. If the object needs to change, its DDL script is modified. There's one wrapper script that invokes all the object scripts so that you can deploy the current DB build to whatever environment you want. This is for the main core create.
Obviously in a live application, whenever you push a new build that requires, say, a new column, you're not going to drop the table and create it new. You're going to do an ALTER script and add the column. So each time this kind of change needs to happen, there are always two things to do: 1) write the alter DDL and 2) update the core create DDL to reflect the change. Both go into source control, but the single alter script is more of a momentary point in time change since it will only be used to apply a delta.
You could also use a tool like ERWin to update the model and forward generate the DDL, but most DBAs I know don't trust a modeling tool to gen the script exactly the way they want. You could also use ERWin to reverse engineer your core DDL script into a model periodically, but that's a lot of fuss to get it to look right (every blasted time you do it).
In the Microsoft world, we employed a similar tactic, but we used the Red Gate product to help manage the scripts and deltas. Still put the scripts in source control. Still one script per object (table, sproc, whatever). In the beginning, some of the DBAs really preferred using the SQL Server GUI to manage the objects rather than use scripts. But that made it very difficult to manage the enterprise consistently as it grew.
If the DDL is in source control, it's trivial to use any build tool (usually ant) to write a deployment script.
I've found that by far, the easiest, fastest and safest way to do this is to just bite the bullet and use SQL Source Control from RedGate. Scripted and stored in the repository in a matter of minutes. I just wish that RedGate would look at the product as a loss leader so that it could get more widespread use.
Similar to Robert Paulson, above, our organization keeps the database under source control. However, our difference is that we try to limit the number of scripts we have.
For any new project, there's a set procedure. We have a schema creation script at version 1, a stored proc creation script and possibly an initial data load creation script. All procs are kept in a single, admittedly massive file. If we're using Enterprise Library, we include a copy of the creation script for logging; if it's an ASP.NET project using the ASP.NET application framework (authentication, personalization, etc.), we include that script as well. (We generated it from Microsoft's tools, then tweaked it until it worked in a replicable fashion across different sites. Not fun, but a valuable time investment.)
We use the magic CTRL+F to find the proc we like. :) (We'd love it if SQL Management Studio had code navigation like VS does. Sigh!)
For subsequent versions, we usually have upgradeSchema, upgradeProc and/or updateDate scripts. For schema updates, we ALTER tables as much as possible, creating new ones as needed. For proc updates, we DROP and CREATE.
One wrinkle does pop up with this approach. It's easy to generate a database, and it's easy to get a new one up to speed on the current DB version. However, care has to be taken with DAL generation (which we currently -- usually -- do with SubSonic), to ensure that DB/schema/proc changes are synchronized cleanly with the code used to access them. However, in our build paths is a batch file which generates the SubSonic DAL, so it's our SOP to checkout the DAL code, re-run that batch file, then check it all back in anytime the schema and/or procs change. (This, of course, triggers a source build, updating shared dependencies to the appropriate DLLs ... )
In past experiences, I've kept database changes source controlled in such a way that for each release of the product any database changes were always scripted out and stored in the release that we're working on. The build process in place would automatically bring the database up to the current version based on a table in the database that stored the current version for each "application". A custom .net utility application we wrote would then run and determine the current version of the database, and run any new scripts against it in order of the prefix numbers of the scripts. Then we'd run unit tests to make sure everything was all good.
We'd store the scripts in source control as follows (folder structure below):
I'm a little rusty on current naming conventions for tables and stored procedures, so bear with my example...
[root]
  [application]
    [version]
      [script]

\scripts
  MyApplication\
    1.2.1\
      001.MyTable.Create.sql
      002.MyOtherTable.Create.sql
      100.dbo.usp.MyTable.GetAllNewStuff.sql
Using a Versions table that took into account the application and version, the application would restore the weekly production backup and run all the scripts needed against the database since the current version. By using .NET we were easily able to package this into a transaction, and if anything failed we would roll back and send emails out, so we knew that the release had bad scripts.
So, all developers would make sure to maintain this in source control so the coordinated release would make sure that all the scripts we plan to run against the database would run successfully.
This is probably more information than you were looking for, but it worked very well for us and given the structure it was easy to get all developers on board.
When release day came around, the operations team would follow the release notes, pick up the scripts from source control, and run the package against the database with the .NET application we used during the nightly build process. It would automatically wrap the scripts in transactions, so if something failed it would automatically roll back and no impact to the database was made.
Stored procedures get one file per sproc, with the standard if-exists drop/create statements at the top. Views and functions also get their own files, so they are easier to version and reuse.
Schema is all 1 script to begin with then we'll do version changes.
All of this is stored in a Visual Studio database project connected to TFS (at work; VisualSVN Server at home for personal stuff) with a folder structure as follows:
- project
-- functions
-- schema
-- stored procedures
-- views
At my company, we tend to store all database items in source control as individual scripts just as you would for individual code files. Any updates are first made in the database and then migrated into the source code repository so a history of changes is maintained.
As a second step, all database changes are migrated to an integration database. This integration database represents exactly what the production database should look like post deployment. We also have a QA database which represents the current state of production (or the last deployment). Once all changes are made in the Integration database, we use a schema diff tool (Red Gate's SQL Diff for SQL Server) to generate a script that will migrate all changes from one database to the other.
We have found this to be fairly effective as it generates a single script that we can integrate with our installers easily. The biggest issue we often have is developers forgetting to migrate their changes into integration.
We keep stored procedures in source control.
Script everything (object creation, etc) and store those scripts in source control. How do the changes get there? It's part of the standard practice of how things are done. Need to add a table? Write a CREATE TABLE script. Update a sproc? Edit the stored procedure script.
I prefer one script per object.
For procs, write the procs with script wrappers into plain files, and apply the changes from those files. If it applied correctly, then you can check in that file, and you'll be able to reproduce it from that file as well.
For schema changes, you may need to check in scripts to incrementally make the changes you've made. Write the script, apply it, and then check it in. You can build a process then, to automatically apply each schema script in series.
We do keep stored procedures in source control. The way we (or at least I) do it is to add a folder to my project, add a file for each SP, and manually copy-paste the code into it. So when I change the SP, I manually need to change the file in source control.
I'd be interested to hear if people can do this automatically.
I highly recommend maintaining schema and stored procedures in source control.
Keeping stored procedures versioned allows them to be rolled back when determined to be problematic.
Schema is a less obvious answer depending on what you mean. It is very useful to maintain the SQL that defines your tables in source control, for duplicating environments (prod/dev/user etc.).
We have been using an alternative approach in my current project - we haven't got the db under source control but instead have been using a database diff tool to script out the changes when we get to each release.
It has been working very well so far.
We store everything related to an application in our SCM. The DB scripts are generally stored in their own project, but are treated just like any other code... design, implement, test, commit.
I run a job to script it out to a formal directory structure.
The following is VS2005 code, a command-line project called from a batch file, that does the work. The app.config keys are at the end of the code.
It is based on other code I found online. It's slightly a pain to set up, but works well once you get it working.
Imports Microsoft.VisualStudio.SourceSafe.Interop
Imports System
Imports System.Configuration

Module Module1

    Dim sourcesafeDataBase As String, sourcesafeUserName As String, sourcesafePassword As String, sourcesafeProjectName As String, fileFolderName As String

    Sub Main()
        If My.Application.CommandLineArgs.Count > 0 Then
            GetSetup()
            For Each thisOption As String In My.Application.CommandLineArgs
                Select Case thisOption.ToUpper
                    Case "CHECKIN"
                        DoCheckIn()
                    Case "CHECKOUT"
                        DoCheckOut()
                    Case Else
                        DisplayUsage()
                End Select
            Next
        Else
            DisplayUsage()
        End If
    End Sub

    Sub DisplayUsage()
        Console.Write(System.Environment.NewLine + "Usage: SourceSafeUpdater option" + System.Environment.NewLine + _
            "CheckIn - Check in ( and adds any new ) files in the directory specified in .config" + System.Environment.NewLine + _
            "CheckOut - Check out all files in the directory specified in .config" + System.Environment.NewLine + System.Environment.NewLine)
    End Sub

    ' Adds the configured folder (recursively) to the VSS project, ignoring
    ' items that are already under source control.
    Sub AddNewItems()
        Dim db As New VSSDatabase
        db.Open(sourcesafeDataBase, sourcesafeUserName, sourcesafePassword)
        Dim Proj As VSSItem
        Dim Flags As Integer = VSSFlags.VSSFLAG_DELTAYES + VSSFlags.VSSFLAG_RECURSYES + VSSFlags.VSSFLAG_DELNO
        Try
            Proj = db.VSSItem(sourcesafeProjectName, False)
            Proj.Add(fileFolderName, "", Flags)
        Catch ex As Exception
            ' "already exists" errors are expected for files already in VSS
            If Not ex.Message.ToString.ToLower.IndexOf("already exists") > 0 Then
                Console.Write(ex.Message)
            End If
        End Try
        Proj = Nothing
        db = Nothing
    End Sub

    ' Checks in everything under the configured folder, adding any new files first.
    Sub DoCheckIn()
        AddNewItems()
        Dim db As New VSSDatabase
        db.Open(sourcesafeDataBase, sourcesafeUserName, sourcesafePassword)
        Dim Proj As VSSItem
        Dim Flags As Integer = VSSFlags.VSSFLAG_DELTAYES + VSSFlags.VSSFLAG_UPDUPDATE + VSSFlags.VSSFLAG_FORCEDIRYES + VSSFlags.VSSFLAG_RECURSYES
        Proj = db.VSSItem(sourcesafeProjectName, False)
        Proj.Checkin("", fileFolderName, Flags)
        Dim File As String
        For Each File In My.Computer.FileSystem.GetFiles(fileFolderName)
            Try
                Proj.Add(fileFolderName + File)
            Catch ex As Exception
                If Not ex.Message.ToString.ToLower.IndexOf("access code") > 0 Then
                    Console.Write(ex.Message)
                End If
            End Try
        Next
        Proj = Nothing
        db = Nothing
    End Sub

    ' Checks out everything under the configured folder.
    Sub DoCheckOut()
        Dim db As New VSSDatabase
        db.Open(sourcesafeDataBase, sourcesafeUserName, sourcesafePassword)
        Dim Proj As VSSItem
        Dim Flags As Integer = VSSFlags.VSSFLAG_REPREPLACE + VSSFlags.VSSFLAG_RECURSYES
        Proj = db.VSSItem(sourcesafeProjectName, False)
        Proj.Checkout("", fileFolderName, Flags)
        Proj = Nothing
        db = Nothing
    End Sub

    ' Reads the VSS connection settings from app.config.
    Sub GetSetup()
        sourcesafeDataBase = ConfigurationManager.AppSettings("sourcesafeDataBase")
        sourcesafeUserName = ConfigurationManager.AppSettings("sourcesafeUserName")
        sourcesafePassword = ConfigurationManager.AppSettings("sourcesafePassword")
        sourcesafeProjectName = ConfigurationManager.AppSettings("sourcesafeProjectName")
        fileFolderName = ConfigurationManager.AppSettings("fileFolderName")
    End Sub

End Module
<add key="sourcesafeDataBase" value="C:\wherever\srcsafe.ini"/>
<add key="sourcesafeUserName" value="vssautomateuserid"/>
<add key="sourcesafePassword" value="pw"/>
<add key="sourcesafeProjectName" value="$/where/you/want/it"/>
<add key="fileFolderName" value="d:\yourdirstructure"/>
If you're looking for an easy, ready-made solution, our Sql Historian system uses a background process to automatically synchronize DDL changes to TFS or SVN, transparently to anyone making changes on the database. In my experience, the big problem is keeping the code in source control in sync with what was changed on your server - and that's because usually you have to rely on people (developers, even!) to change their workflow and remember to check in their changes after they've already made them on the server. Putting that burden on a machine makes everyone's life easier.
