How do you put a large existing database (schema) under source control? - sql-server

My DBA just lost some development work that he did on our development database. Poor fella. So naturally our manager asked him, at our status meeting, how this could happen and how we could avoid this happening in the future. "Source control could alleviate the problem," I suggested... The DBA's response: "No, we just back up the server more often." Now I would like to help my DBA understand what source control is and how it fits together with a database schema and development on that schema.
Previously I've tried to explain to him that there's nothing special about the source code behind tables and stored procedures and that it should be in a source control system (TFS in this case). But he just didn't bite. Now, while this mishap is in recent memory, I would like to take another stab at it.
So my question is, do you know of any good advice I could pass on to my DBA and maybe even a couple of resources explaining how you would go about migrating a DB schema to be under source control and find its proper place in the build and deployment processes?
A couple of facts about the environment:
Source Control on a TFS 2008 Server.
Database is a MS SQL server 2008 with >300 tables and >300 other objects (sprocs, triggers, functions etc.).
Clarification:
We have been using DB Ghost and other change management solutions on other projects with other DBAs in the past. We even have the license for VS DB edition! The problem is getting the DBA to even think about this way of developing for the database. He's really old school (i.e. migrating changes manually from environment to environment), and unfortunately he's the only one who knows anything about this particular DB.

See how to version control sql server databases and Do you source control your databases, among many others. Or use the search page. Basically, your approach seems correct. Good luck persuading the DBA...

If you are using Visual Studio Team System, I recommend having a stab at their Database Edition (I think these days it comes with the Developer Edition if you are an MSDN Subscriber). What this will allow you to do is to script out all your schema, stored procs, views, triggers, etc. and source control these. This should also make the DBA more comfortable since he will be working with a "Database" version of the tool rather than the "Developer" version (naming can go a long way with people). As you make changes from Visual Studio, you can manage script changes as you work, and source control them.

If your company has an MSDN license, they can use the Visual Studio Database edition. There's a video tutorial of it here.
I have no power of purchase, so I don't know what the cost breakdowns are. But it has the capability of source controlling all the parts of a DB schema, and includes creating change-scripts as well as auto-deploying straight from VS if you want (I wouldn't recommend that).
In general though, it's pretty solid as a database source control option.

Source control for databases can be quite contentious. It's different from using source control for something that produces a binary, because you can't lock the source: a stored proc is a row in a table, and there is no single table to read to get a table definition.
Also, version to version is mostly a set of ALTER statements. You script out CREATEs and add them to source control. This makes it harder to use in cases like this.
To me, this is more a procedural error.
Why was the change not done from a script? Forget where the script lives, but why was there no reproducible and re-runnable script? Perhaps linked to the change tracking number? If the database is reset (loaded from prod), then how would the change have been re-applied to prepare for production? And other questions.
I believe in source control and we use it: but it has limits for database work.
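To make the "re-runnable script" point concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for SQL Server. The table and column names (customer, email) are hypothetical, and on SQL Server the same guard would normally be written in T-SQL against the catalog views. The idea is only that the script checks the current state first, so running it twice does no harm.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` already has a column named `column`."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def apply_change(conn: sqlite3.Connection) -> None:
    """Hypothetical change: add customer.email. Safe to run more than once."""
    if not column_exists(conn, "customer", "email"):
        conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
    conn.commit()
```

A script written this way can be re-applied after the development database is reloaded from production, which is exactly the situation described above.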

First you are approaching this incorrectly. If the dba won't bite on Source Control and he is making errors that affect the system, the person you need to persuade is his boss.
If it helps, I'm from the old school too and I love having our database objects in source control. How nice to be able to revert one table without having to restore the whole database backup to a different location and then move the table. How much faster and simpler. How nice to be able to compare two different versions and see what changed. How nice to deploy a change and know exactly which database changes (say, for instance only twelve of the 23 possible ones) go with the part you are deploying and not some other unfinished project. How nice to know exactly which scripts were involved in a particular change you had to rollback. How nice that nobody is making on-the-fly changes on production since we now require all production changes to be from source control scripts. There are so many fewer errors and issues to worry about.
Yes, it was a change in how we did business, but we did it through a policy change from on high, so there was no argument, and the DBAs went through a couple of times and reverted any objects different from source control to the source control version, so now nobody will even think of doing a database change without it being in source control.
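Finding the "objects different from source control" can be automated. The following is only a sketch of the idea, using Python's sqlite3 as a stand-in for SQL Server (where the live definitions would come from the system catalog instead of sqlite_master). The function names and the shape of repo_definitions are my own assumptions, not any particular tool's API.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Collapse whitespace so formatting-only differences are ignored."""
    return " ".join(sql.split())


def find_drift(conn: sqlite3.Connection, repo_definitions: dict) -> dict:
    """Compare live object definitions with the source-controlled ones.

    repo_definitions maps object name -> CREATE statement from the repository.
    Returns the names that are missing, unexpected, or changed in the database.
    """
    live = {
        name: normalize(sql)
        for name, sql in conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL"
        )
    }
    repo = {name: normalize(sql) for name, sql in repo_definitions.items()}
    return {
        "missing_in_db": sorted(set(repo) - set(live)),
        "not_in_repo": sorted(set(live) - set(repo)),
        "changed": sorted(n for n in set(repo) & set(live) if repo[n] != live[n]),
    }
```

Anything reported under "changed" or "not_in_repo" is a candidate for either checking in or reverting.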

As the product manager for SQL Compare I've spoken to many 'traditional' DBAs who are uncomfortable with third party tools mainly because they have a system that works for them and sometimes changing can be difficult. There are many situations where I am convinced that they would benefit from our tools if only they gave them a chance. Frustrating.
One thing you might consider trying is Red Gate's upcoming tool, SQL Source Control. This is designed to build source control into SSMS, in other words it doesn't require DBAs to leave the comfort zone of their management environment. The bad news is that the tool hasn't been released yet. The good news is that we have an Early Access Program. Please visit the following link to find out more about the tool:
http://www.red-gate.com/Products/SQL_Source_Control/index.htm

You can't really put a large database under source control, so your DBA is right.
What you can do practically is to put your schema under source control, and maybe a few smallish 'configuration' tables.

One way to source control a database is to store the data in the database and the data about the database separately.
You can have all the table, procedure and function scripts as SQL files and add them to source control.
Export the database data as insert statements into SQL files, each with a fixed size. This is a cumbersome process as it would involve a lot of files that are to be tracked and controlled.
I am not sure if the VSS/SVN are able to read and keep history of changes to dump files created by the database backup options.
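For small reference tables, exporting rows as INSERT statements is manageable if the output is deterministic, so that the text diff only shows real changes. Below is a rough sketch with Python's sqlite3 standing in for the real database. The helper names are hypothetical, and BLOB columns are deliberately not handled.

```python
import sqlite3


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (no BLOB support in this sketch)."""
    if value is None:
        return "NULL"
    if isinstance(value, bytes):
        raise TypeError("BLOB values are not handled by this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    # Escape single quotes by doubling them, as standard SQL requires.
    return "'" + str(value).replace("'", "''") + "'"


def dump_table_as_inserts(conn: sqlite3.Connection, table: str, order_by: str) -> list:
    """Return one INSERT statement per row, in a stable order for clean diffs."""
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}")
    columns = ", ".join(d[0] for d in cursor.description)
    lines = []
    for row in cursor:
        values = ", ".join(sql_literal(v) for v in row)
        lines.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return lines
```

Sorting by a key column matters more than it looks: without it, the same data can come out in a different order and every commit shows spurious changes.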

It's not clear from your question if you want to protect the data in the DB or the schemas in the DB. If the latter, then you could identify all the important schemas and run a cron job that pulls the schema definitions from the DB and inserts them automatically into a source control system (perhaps even via triggers on the schemas??).
But this still just amounts to backing the system up more often. For what you envision you would need source control integrated with the Db tools and I don't know of any product that does that.
(and I shudder to think of VSS integrated into SQL management studio :-(( )
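The scheduled-export idea could look roughly like the sketch below: pull each object's definition out of the database and write it to its own text file, ready to be committed. It uses Python's sqlite3 as a stand-in (on SQL Server the definitions would come from the system catalog), and the file naming scheme is just an assumption for illustration.

```python
import sqlite3
from pathlib import Path


def export_schema(conn: sqlite3.Connection, out_dir) -> list:
    """Write one .sql file per schema object and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    query = (
        "SELECT type, name, sql FROM sqlite_master "
        "WHERE sql IS NOT NULL ORDER BY type, name"
    )
    for obj_type, name, sql in conn.execute(query):
        # Hypothetical naming convention: <type>.<name>.sql
        path = out / f"{obj_type}.{name}.sql"
        path.write_text(sql.strip() + "\n", encoding="utf-8")
        written.append(path)
    return written
```

One file per object means the source control history shows which table or view changed, rather than one giant dump changing every night.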

My answer to this same problem was to export all DB objects to text form (more than 136,000 of them) and then create the SourceSafe projects to hold them. Any new or changed objects in the DB now go to the SourceSafe structure, while unchanged ones are left alone.

Related

How do you deal with multiple developers and database changes?

I would like to know how you guys deal with development database changes in groups of 2 or more devs? Do you have a global DB everyone accesses, maybe a local copy and manually applied script changes? It would be nice to see pros and cons that you've noticed for each approach and the number of devs in your team.
Start with "Evolutionary Database Design" by Martin Fowler. This sums it up nicely.
There have been other questions about DB development that may be useful too, for example Is RedGate SQL Source Control for me?
Our approach is that everyone has their own DB, the complete DB can be created from create scripts with base data if required. All the scripts required for this are in source control.
All scripts are CREATE scripts and they reflect the current state of the database schema. Upgrades are in separate SQL files which can upgrade existing DBs from a specific version to a newer one (run sequentially). After all the updates have been applied, the schema must be identical to what you would get from running the setup scripts.
We have some tools to do this (we use SQL Server and .NET):
Scripting is done with a tool which also applies a standard formatting so that the changes are well traceable with text diff tools (and by the SCM)
A runtime module takes care of comparing the existing DB objects, running updates if required, automatically applying "non-destructive" changes, then checking the DB objects again to ensure a correct migration before committing the changes
The toolset is available as open-source project (licensed under LGPL), it's called the bsn ModuleStore (note that it is limited to SQL Server 2005/2008/Azure and to .NET for the runtime part).
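The rule that upgraded databases must end up identical to freshly created ones can be checked mechanically. This is a minimal sketch of that check and not how bsn ModuleStore does it. It uses Python's sqlite3 as a stand-in for SQL Server, and the scripts and table names are made up for illustration.

```python
import sqlite3

# Current-state CREATE script, as it would live in source control.
CREATE_SCRIPT = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
"""

# Hypothetical upgrade scripts, keyed by the version they upgrade *to*.
UPGRADES = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);",
    2: "ALTER TABLE customer ADD COLUMN email TEXT;",
}


def schema_shape(conn: sqlite3.Connection) -> dict:
    """Return {table: [(column, type), ...]} so two schemas can be compared."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }


def upgrade(conn: sqlite3.Connection, from_version: int) -> int:
    """Run every upgrade newer than from_version, in order; return new version."""
    for version in sorted(UPGRADES):
        if version > from_version:
            conn.executescript(UPGRADES[version])
    return max(UPGRADES)
```

A build step can create one database from CREATE_SCRIPT and another by running upgrade() from version 0, then fail the build if schema_shape() differs between the two.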
We use what was code named "Data Dude" - the database features in TFS and Visual Studio - to deal with this. When you "get latest" and bring in code that relies on a schema change, you also bring in the revised schemas, stored procedures etc. You right-click the database project and Deploy; that gets your local schema and sp in sync but doesn't overwrite your data. The job of working out the script to get you from your old schema to the new one falls to Visual Studio, not to you or your DBA. We also have "populate" scripts for things like lists of provinces and a deploy runs them for you.
So much better than the old way which always fell apart at high stress times, with people checking in code then going home and nobody knowing what columns to add to make the code work etc.

Version Control / Code repository for SqlServer stored procedures and views

Any recommendations for a database equivalent of SVN or GIT for use to check stored procedures and views out and in, and provide version control?
I am interested in open source / free solutions, but if you have a commercial solution, preferably low cost, please let me know also.
I have looked at answers here that talk about adding an entire database backup to a code repository, or comparing records, but that is not what I am talking about here.
I would like to check out a stored procedure, and check it back in knowing that no one else has touched it in the meanwhile.
I would like to see what changes have been made to the stored procedures and views since the last time I worked on the data access layer (even if there is no impact on my code).
Our company uses Visual Studio Database Edition to manage our database schema (schema, not data). At this point, we would be lost without it. Our entire database schema is managed by Microsoft TFS, and is our "source of truth" with regards to what our schema looks like. It does much more than source control as well, including database validation, test data generation, refactoring etc.
Great tool.
We use liquibase which you can find here:
http://www.liquibase.org/
It is open and extensible. Published under the Apache 2.0 license.
and here is a tutorial on managing database schema changes with liquibase:
http://bytefilia.com/managing-database-schema-changes-liquibase-existing-schema/
You should have the repository in source control be the only source of truth, with a periodic refresh of the database procedures (Continuous Integration) from the source control system. The user with write/create permissions for the stored procedures should be the one the CI service runs under. This is the only way you can ensure no one is adding or changing objects in the DB rather than in the source of truth (source control).
You can set up liquibase with SVN or GIT, and also any JDBC db, so SQL Server or most others as well.
Maybe not entirely what you had in mind - but it's worth reading this article:
http://www.codeproject.com/KB/architecture/Database_CI.aspx
or googling "continuous database integration".
Broadly speaking, as a schema can be defined in text, any source code repository can be used to store it. What you probably really need is to know which version you're currently looking at, whether it's older or newer than the one in production, and what changes have happened when. The TFS solution gives you a lot of that, but you can also roll your own - though you'll probably need to script the way database changes are managed in your various environments.

Sub version for database (I want something for data values in the database, not for the schema)

I am using github for maintaining versions and code synchronization.
We are team of two and we are located at different places.
How can we make sure that our databases are synchronized?
Update:--
I am a Rails developer. But these days I'm working on Drupal projects (where the database is the center of variations). So I want to make sure that the team has a synchronized database, including the values in various tables.
I need something which keep our data values synchronized.
A centralized database is a good solution. But things get disturbed when someone works offline.
If you use Visual Studio, then you can script your database tables, views, stored procedures and functions as .sql files from a database solution and then check those into version control as well - it's what I currently do at my workplace.
If you don't use Visual Studio, then you can still script your SQL as .sql files (but with more work) and then version control them as necessary.
Have a look at Red Gate SQL Source Control - http://www.red-gate.com/products/SQL_Source_Control/
To be honest I've never used it, but their other software is fantastic. And if all you want to do is keep the DB schema in sync (rather than full source control) then I have used their SQL Compare product very successfully in the past.
(ps. I don't work for them!)
You can use SQL Source Control together with SQL Data Compare to source control both schema and data. Here is an article from Red Gate: Source controlling data.
These are some of the possibilities.
Using the same database. Set up a central database that everybody can connect to. This way you are sure everybody uses the same database all the time.
After every change, export the database and commit it to the VCS. This option requires discipline and manual labor.
Use some kind of other definition of the schema. For example, Doctrine for PHP has the ability to build the database from a YAML definition which can be stored in the VCS. This can be automated more easily than the previous option.
Use some other software/script which updates the database.
I feel your pain. I had terrible trouble getting SQL Server to play nice with SVN. In the end I opted for a shared database solution. Every day I run an extensive script to back up all our schema definitions (specifically stored procedures) for version control into text files. Due to the limited number of changes this works well.
I now use this technique for our major project and personal projects too. The only negative is that it relies on being connected all the time. The other answers suggest that full database versioning is very time consuming and I tend to agree. For "live" upgrades we use the Red Gate tools, they do both schema and data compare and it works very well.
http://www.red-gate.com/products/SQL_Data_Compare/. We were using this tool for keeping databases in sync in our company. Later we had some specific demands so we had to write our own code for synchronization. It depends on how complex your database is and how many changes are happening. It is much simpler if you have a time when no one is working and you can lock the database for synchronization.
Check out OffScale DataGrove.
This product tracks changes to the entire DB - schema and data. You can tag versions in any point in time, and return to older states of the DB with a simple command. It also allows you to create virtual, separate, copies of the same database so each team member can have his own separate DB. All the virtual copies are tracked into the same repository so it's super-easy to revert your DB to someone else's version (you simply check-out their version, just like you do with your source control). This means all your DBs can always be synchronized.
Regarding a centralized DB - just like you don't want to work on the same source code, you don't want to be working on the same DB. It means you'll constantly break each other's code and builds each time someone changes something in the DB.
I suggest that you go with a separate DB for each developer, and sync them using DataGrove.
Disclaimer - I work at OffScale :-)
Try Wizardby. This is my personal project, but I've used it in several previous jobs with a great deal of success.
Basically, it's a tool which lets you specify all changes to your database schema in a database-independent manner and then apply these changes to all your databases.

How to keep code base and database schema in synch?

So recently on a project I'm working on, we've been struggling to keep a solution's code base and the associated database schema in synch (Database = SQL Server 2008).
Database changes occur fairly regularly (adding columns, constraints, relationships, etc.) and as a result it's not uncommon for people to do a 'Get Latest' from source control and find that they also need to rebuild the database as well (and sometimes they forget to do the latter).
We're not using VSTS: Database Edition (DataDude) but the standard Visual Studio database project with a script (batch file) which tears down and recreates the database from T-SQL scripts. The solution is a .Net & ASP.net solution with LINQ to SQL underlying as the ORM.
Anyone have ideas on an approach to take (automated or not) which would keep everyone up to date with the latest database schema?
Continuous integration with MSBuild is an option, but only helps pick up any breaking changes committed, it doesn't really help in the scenario I highlighted above.
We are using Team Foundation Server, if that helps.
We try to work forward from the creation scripts.
i.e. a change to the database is not authorised unless the script has been tested and checked into source control.
But this assumes that the database team is integrated with your app team which is usually not the case in a large project...
(I was tempted to answer this "with great difficulty")
EDIT: Tools won't help you if your process isn't right.
OK, although it's not the entire solution, you should include an assertion in the application code that connects to the database to assert that the correct schema is being used. That way at least it becomes obvious, and you avoid silent bugs and people complaining that stuff went crazy all of a sudden.
As for the schema version, you could use some database-specific functionality if available, but I personally prefer to declare a schema version table and keep the version number in there. That way it's portable and can be checked with a simple SELECT statement.
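A bare-bones version of that startup assertion might look like this. It is a sketch only: the table name schema_version, its single version column, and the expected version number are all assumptions, and sqlite3 is used here simply because it ships with Python.

```python
import sqlite3

# Hypothetical: the schema version this build of the application was written for.
EXPECTED_SCHEMA_VERSION = 7


class SchemaVersionError(RuntimeError):
    """Raised when the database schema does not match what the code expects."""


def read_schema_version(conn: sqlite3.Connection):
    """Read the highest version recorded in the schema_version table."""
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0]


def assert_schema_version(conn: sqlite3.Connection,
                          expected: int = EXPECTED_SCHEMA_VERSION) -> None:
    """Fail loudly at startup instead of hitting obscure errors later."""
    actual = read_schema_version(conn)
    if actual != expected:
        raise SchemaVersionError(
            f"database schema is version {actual}, application expects {expected}"
        )
```

Calling assert_schema_version() right after opening the connection turns "I forgot to rebuild my database" into an immediate, readable error.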
Have a look at DB Ghost - you can create a dbp using the scripter in seconds and then manage all your database code with the change manager. www.dbghost.com
This is exactly what DB Ghost was designed to handle.
We basically do things the way you are, with the generation script checked into source control as well. I'm the designated database master so all changes to the script itself are done through me. People send me scripts of the changes they have made, I update my master copy of the schema, run a generate scripts (SSMS) to produce the new DB script, and then check it in. I keep my copy of the code current with any changes that are being made elsewhere. We're a small shop so this works pretty well for us. I realize that it probably doesn't scale.
If you are not using Visual Studio Database Professional Edition, then you will need another tool that can break the database down into its elemental pieces so that they are manageable and changeable in an easier manner.
I'd recommend seriously considering Redgate's SQL tools if you want to maintain sanity over all your database changes and updates.
SQL Packager
SQL Multi Script
SQL Refactor
Use a tool like RedGate SQL Compare to generate the change script between any two given versions of the database. You can then check that file into source control.
Have a look at this question: dynamic patching of databases. I think it's similar enough to your problem to be helpful.
My solution to this problem is simple. Define everything as XML, and make sure that both the database, the ORM and the UI are generated from this XML, no exceptions. That way, you can use code generation tools to quickly regenerate the database creation script, which will alter your schema while (hopefully) preserving some data. It takes some effort to do, but the net result is well worth it.

Testing and Managing database versions against code versions

As you develop an application, database changes inevitably pop up. The trick I find is keeping your database build in step with your code. In the past I have added a build step that executed SQL scripts against the target database, but that is dangerous inasmuch as you could inadvertently add bogus data or worse.
My question is what are the tips and tricks to keep the database in step with the code? What about when you roll back the code? Branching?
Version numbers embedded in the database are helpful. You have two choices: embedding values into a table (allows versioning multiple items) that can be queried, or having an explicitly named object (such as a table or somesuch) you can test for.
When you release to production, do you have a rollback plan in the event of unexpected catastrophe? If you do, is it the application of a schema rollback script? Use your rollback script to rollback the database to a previous code version.
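One way to keep a rollback script next to every forward change is to store them as pairs and record the applied version in the database. The sketch below assumes a hypothetical schema_version table and made-up migrations, and uses Python's sqlite3 as a stand-in for the real server. It is an illustration of the pattern, not a production migration tool.

```python
import sqlite3

# Hypothetical migrations: (version, forward script, rollback script).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE customer"),
    (2, "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER)",
        "DROP TABLE invoice"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied version, or 0 for an empty database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate_to(conn: sqlite3.Connection, target: int) -> int:
    """Move the schema forward or backward until it is at `target`."""
    version = current_version(conn)
    if target > version:
        for number, up, _down in MIGRATIONS:
            if version < number <= target:
                conn.execute(up)
                conn.execute(
                    "INSERT INTO schema_version (version) VALUES (?)", (number,)
                )
    else:
        # Roll back in reverse order, newest change first.
        for number, _up, down in reversed(MIGRATIONS):
            if target < number <= version:
                conn.execute(down)
                conn.execute(
                    "DELETE FROM schema_version WHERE version = ?", (number,)
                )
    conn.commit()
    return current_version(conn)
```

Rolling the code back to an older release then becomes a call like migrate_to(conn, 1), using the same scripts that were tested when the change was first written.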
You should be able to create your database from scratch into a known state.
While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database.
I've definitely been interested in some sort of DB versioning system, but I haven't found anything yet. So, instead of a solution, you'll get my vote. :-P
You really do want to be able to take a clean machine, get the latest version from source control, build in one step, and run all tests in one step. Making this fast makes you produce good software faster.
Just like external libraries, database configuration must also be in source control.
Note that I'm not saying that all your live database content should be in the same source control, just enough to get to a clean state. (Do back up your database content, though!)
Define your schema objects and your reference data in version-controlled text files. For example, you can define the schema in Torque format, and the data in DBUnit format (both use XML). You can then use tools (we wrote our own) to generate the DDL and DML that take you from one version of your app to another. Our tool can take as input either (a) the previous version's schema & data XML files or (b) an existing database, so you are always able to get a database of any state into the correct state.
I like the way that Django does it. You build models and then when you run a syncdb it applies the models that you have created. If you add a model you just need to run syncdb again. This would be easy to have your build script do every time you made a push.
The problem comes when you need to alter a table that is already made. I do not think that syncdb handles that. That would require you to go in and manually alter the table and also add a property to the model. You would probably want to version that ALTER statement. The models would always be under version control though, so if you needed to you could get a db schema up and running on a new box without running the sql scripts. Another problem with this is keeping track of static data that you always want in the db.
Rails migration scripts are pretty nice too.
A DB versioning system would be great, but I don't really know of such a thing.
"While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database."
Backups and compression can help you there. Sorry - there's no excuse not to be able to get a good set of data to develop against. Even if it's just a sub-set.
Put your database development under version control. I recommend having a look at neXtep designer:
http://www.nextep-softwares.com/wiki
It is a free GPL product which offers a brand new approach to database development and deployment by connecting version information with a SQL generation engine which could automatically compute any upgrade script you need to upgrade any version of your database into another. Any existing database could be version controlled by a reverse synchronization.
It currently supports Oracle, MySQL and PostgreSQL. DB2 support is under development. It is a full-featured database development environment where you always work on version-controlled elements from a repository. You can publish your updates by simple synchronization during development and you can generate exportable database deliveries which you will be able to execute on any targeted database through a standalone installer which validates the versions, performs structural checks and applies the upgrade scripts.
The IDE also offers you SQL editors, dependency management, support for modular database model components, data model diagrams, SQL clients and much more.
All the documentation and concepts could be found in the wiki.
