Managing SQL Files in Source Control

Managing SQL Files in Source Control - sql-server

I work on a piece of software that has many tables, views, and stored procedures. Currently, to make it easy for developers to run all of the latest updates on their local databases and for ease of deployment of the software, we have a large Update.sql file. This creates tables and stored procedures that don't exists and adds/updates/removes data that needs to change. It is designed to be run over and over again without messing up someones database and only apply the changes that are needed. This is very convenient for the developers and for deployment.
However, I would really love to be able to split all of the database objects (tables, functions, stored procedures, back-fills/data updates) into separate scripts in source control. This would allow us to track changes to individual database objects instead of just one large SQL file.
Is there a good way to get the best of both worlds? Perhaps a free tool that can run all SQL files in a folder and all of its sub-folders? Or some batch script that can merge all of the individual files together into a single file after every check-in?
EDIT 10/27/2017: After reviewing some of the links that the answers have shared, I think this question comes down to finding a way to take the best parts of State based VS Migration based database update management. Here is an article that I think breaks down the differences and pros/cons pretty well, but I'll summarize the parts that I am focused on below
STATE BASED: This is what is used by Visual Studio SQL Server Projects. It is a snapshot of what the database should look like at the current version. Updates to servers are created by comparing the database to this snapshot and auto-generating scripts that will alter tables/views/SPs/etc. to be what they need to be.
Pros:
Version Control: Each database objects (table, stored procedure, etc.) is a separate script file. This makes tracking changes made to those objects over time very manageable because you can just view the source control history.
Compilation: If you are using Visual Studio SQL Server Projects, you can actually compile them and they will tell you if your references are all good. For instance, if you drop a column in the table and there was a stored procedure that references that column, this will tell you that the SP references a column that no longer exists so you can fix it.
Simple Deployment: You can use these projects that have hundreds of individual database object scripts and have it update a database either in Visual Studio using Publish or by compiling it and taking the DacPac that it made to SQL and updating it that way. So even though there are a bunch of individual files, after compiling it just comes down to one file that you work with in the end.
Cons:
Updating data: In the real-world, State-based updates often aren't viable. For example, let's say your Contacts table used to have a Full Name column. In version 2, you decide to split this into First Name and Last Name and drop the Full Name column. Normally you would write scripts to add the new columns, convert the data, and then drop the old column. However, state-based doesn't work that way, it will just drop the column and add the new ones, but not do anything to convert the data.
MIGRATION BASED: This is pretty much what we are currently doing, except in one really big file instead of several small files. You start with a base-line (which might be an empty database), and then you write one or more files that then alter that base-line to get it to the current version. For instance, Version1.sql might create the Contacts table with the Full Name column, then Version2.sql could create the First Name/Last Name columns, move the data, and then drop the old column. You can either use tools that only runs each script once in the right order or you can do what we've been doing and have a big script that has logic in it to know what things have been run and which haven't and only do what needs to be done.
Pros and Cons: This is basically the reverse of State-based. It gives you a lot of flexibility on how you create your scripts and the power to use real-world logic to update your database the way it needs to be instead of letting it automatically create drop/alter/insert/etc. scripts itself. Much like State-based, as long as you have the right tools, it is easy to deploy. However, it usually isn't very easy to track changes made to database objects overtime. If I want to see the full history of changes to a particular table, who did it, and when, there's not really an easy way to do this, because there is not a single file representing that database object with a Source Control history. Also, I haven't seen any tools that can take a Migration-based strategy and compile it to show you if the changes made have any reference issues.
SO, MY QUESTION IS: How can I keep the power, flexibility, and ease of use of Migration-based that we are currently using, but also get the best parts of State-based (Version Control and Compilation to check dependencies)? I'm up for some hybrid solution as long as it doesn't mean that my developers have to manage two things (like write a Migration script, but also don't forget to update the SQL project so we can track the history). If I could automate a SQL Project to update the database object scripts based on the migration that would be cool, but it would need to know who made the changes that caused the update and preferably what changeset it happened in.
Thoughts?

With sql server mamagement studio you can generate scripts to recreate the db - you could do tha and put those in your source versionning system.
Use "Tasks" , "Generate Scripts" and click through the options. You can use single objects to file.
As for Data ... I think there is some kind of checkbox to export the data as well - not sure though.
f.e. here: Want to create a script to export Data and tables and views to a sql script

I'm not sure of a free tool, but the solution to the below seems interesting...
Run all SQL files in a directory
What I WILL say about that is there are no transactions, so if one of your .sql scripts breaks, it is not going to roll back all of your creations. Other than that though, this should work fine.

Related

How to create a script for SQL Server database create / upgrade from any state

I need to create scripts for creating or updating a database. The scripts are created from my test Database or from my source control.
The script needs to upgrade a database from any version of my application to the current version so it needs to be agnostic to what already exists in the database.
I do not have access to the databases that will be upgraded.
e.g.
If a table does not exist the script should create it.
If the table exists the script should check if all the columns exist (And check their types).
I wrote a lot of this checking code in C# as in i have an SQL create table script and the C# code checks if the table (and columns) exists before running the script.
My code is not production ready and i wanted to know what ready made solutions are out there.

I have no experience with frameworks that can do this.
Such an inquiry is off-topic for SO anyway.
But depending on your demands, it may not be too hard to implement something yourself.
One straightforward approach would be to work with incremental schema changes; basically just a chronological list of SQL scripts.
Never change or delete existing script (unless something really bad is in there).
Instead, just keep adding upgrade scripts for every new version.
Yes, 15 years later you will have accumulated 5,000 scripts.
Trust me, it will be the least of your problems.
To create a new database, just execute the full chain of scripts in chronological order.
For upgrades, there are two possibilities.
Keep a progress list in every database.
That is basically just a table containing the names of all scripts that have already been executed there.
To upgrade, just execute every script that is not in that list already. Add them to the list as you go.
Note: if necessary, this can be done with one or more auto-generated, deployable, static T-SQL scripts.
Make every script itself responsible for recognizing whether or not it needs to do anything.
For example, a 'create table' script checks if the table already exists.
I would recommend a combination of the two:
option #1 for new versions (as it scales a lot better than #2)
option #2 for existing versions (as it may be hard to introduce #1 retroactively on legacy production databases)
Depending on how much effort you will put in your upgrade scripts, the 'option #2' part may be able to fix some schema issues in any given database.
In other words, make sure you start off with scripts that are capable of bringing messy legacy databases back in line with the schema dictated by your application.
Future scripts (the 'option #1' part) have less to worry about; they should trust the work done by those early scripts.
No, this approach is not resistant against outside interference, like a rogue sysadmin.
It will not magically fix a messed-up schema.
It's an illusion to think you can do that automatically, without somebody analyzing the problem.
Even if you have a tool that will recreate every missing column and table, that will not bring back the data that used to be in there.
And if you are not interested in recovering data, then you might as well discard (part of) the database and start from scratch.
On the other hand, I would recommend to make the upgrade scripts 'idempotent'.
Running it once or running it twice should make no difference.
For example, use DROP TABLE IF EXISTS rather than DROP TABLE; the latter will throw an exception when executed again.
That way, in desperate times you may still be able to repair a database semi-automatically, simply by re-running everything.

If you are talking about Schema state, you can look at state-based deployment-tools instead of change-based. (not the official terminology)
You should look at these two tools
SQL Server Data Tools (Dacpac) data-tier-applications which is practically free
RedGate has an entire toolset for this https://www.red-gate.com/solutions/need/automate. which is licensed
The one thing to keep in mind with State based deployments is that you don't control how the database gets from one-state to another, with SSDT
For example a column-rename = drop and recreate that column, same for a table-rename.
In their defence they do have some protections and do tell you what is about to happen.
EDIT (Updating to address comment below)
It should not be a problem that you can't access the TargetDb while in development. You can still use the above tools provided you can use them (Dacpac/Redgate) tooling when you are deploying to the TargetDb.
If you are hoping to have a dynamic TSQL script that can update a target database in an unknown state. Then that is a recipe for failure/disaster. I do have some suggestions at the end for dealing with this.
The way I see it working is
Do your development using Dacpac/Redgate
Build your artefacts Dacpac / Redgate package
Copy artefact to the deployment server with tools
when doing deployments use the tools (Dacpac Powershell) or Redgate manually
If your only choice is a TSQL script, then the only option is extensive-defensive coding covering all possibilities.
Every object must have an existence check
Every property must have a state check
Every object/property must have a roll forward / roll backward script.
For example to sync a table
A Script to check the table exists, if not create it
A script to check each property of the table is in the correct state
check all columns and their data-types and script to update them to match
check defaults
check indexes, partitioning etc
Even with this, you might not be able to handle every scenario.

The work you are trying to do requires you start using a standard change control process.
Given the risk of data loss, and issues related to creation of columns in a specific sequence and the potential for column definitions to change.
I recommend you look at defining a base line version which you will manually have to upgrade each system to.
You can roll your own code, and use a schema version table, or use any one of the tools available such as redgate sql source control, visual studio database projects, dbup, or others.
I do not believe any tool will bring you from 0-1, however, once you baseline, any one of these tools will greatly facilitate your workflow.
Start with this article Get Your Database Under Version Control
Here are some tools that can help you?
Octopus Schema Migrations
Flyway By Redgate
Idera Database Change Management
SQL Server Data Tools

Versioning SQL Server Database with data using SSDT

I have a sql server database already containing data. I want to start versioning it. I know I can use Database project in Visual Studio, and by importing database I can generate sql scripts.
But what about data in the database? I tried to make some Data-Tier Application Files, but when I try to import it in my DB project in Visual Studio I am getting this error:
Import Data-Tier Application File - This operation is not supported for packages containing data
So how do I import data? It has to be some way, because when I am extracting DAC file there is option Extract Schema and Data so there has to be a way to use this data afterwards.
Or maybe post deployement scripts are the only option?
Grettings

Your only option for this at this time is to use post-deploy scripts to populate those tables, taking into account the fact that the scripts need to be able to run multiple times without re-inserting data. A temp table/table variable and a MERGE statement are probably your best bets if you might have changes to the reference data, otherwise a left join might suffice.
Others have tried to include reference data, but it's a pretty hard problem to solve in a manner that works well for everyone. I know others like Ed Elliott have written some stuff that can turn those on/off as needed so you're not always including all reference data every time. You could also look into a post-post-deploy scenario where after your publish and post-deploy, you run a separate script that updates the data from static files. They'd still be in source control, but not necessarily part of your SSDT project. You'd have to remember to run that script in your builds, though.
I know for a while we had a database that solely had the lookup tables populated so we could reference that and do data compares if needed, but that still requires someone to maintain those values in an ongoing manner.

Generating database scripts for SQL Server database versioning

In the scope of responsible programming and versioning, I would like to start to version my database changes especially since I am developing on my database instance then moving it to production. I haven't found any thing that truly makes sense to me on how to do this. I am using Visual Studio 2010 Pro as my IDE. Is there a document that makes this process simple and able to detect changes to the database with relative ease? Or what should I change in my workflow to make this easier?

One way that I've successfully done this sort of thing in the past, is via Sql Source Control. Visual Studio does not offer this functionality for you.
Alternatively, you can use SSMS to generate the Database scripts for you and save it as a file; then you can check in the script. You would chose whether you generate the whole DB script in one file or whether you do it on an object by object basis. The syncing part will have to be done by you by executing your scripts in production. In conclusion a total nightmare.
Redgate also offers Sql Compare, which is great for syncing databases. Take a look at their products if you or your company can afford them.

We use our own DB solution in-house which brings all the tools required for proper DB versioning. While I realize that it may not be a perfect solution for everyone, I invite you to have a look at it (it is open-source): bsn ModuleStore
The versioning aspect is as follows: the tool can script out the SQL semi-automatically, and it does reformat the source code to be in an uniform format. The files will therefore always be identical for the same source, no matter of when and by whom something has been scripted; this therefore works nicely with non-locking source control systems (especially SVN, Git or Mercurial).
The reformat puts all statements in the same form (e.g. optional keywords such as AS, INNER, OUTER etc. are dealt with), scripts everything to the "dbo" schema (even if it was in a different one), puts all identifiers into the square braces ([something]), uppercases all reserved words, does the indentation etc.
Besides versioning, the runtime part of the tool can diff the running DB and the CREATE scripts (DB source code) and apply updates automatically for all non-destructive changes (e.g. updating indexes, constraints, views, stored procedures, triggers, custom types, new tables etc.). Destructuve changes have to be scriped manually (table changes which then usually require data transformations). The runtime will make sure that all updates are performed in a transaction and rollback if the resulting DB doesn't match the CREATE scripts, therefore you get the safety of knowing that the DB is exactly on the version required by the application, even if it has been tampered with manually.
Also, multiple "modules" can be used in a single database. Each module is stored as a schema and independent of other schemas, thereby making it possible to add or remove modules from one single DB, and avoiding the need to create multiple databases for different parts of the application. Also, the use of schemas to do this makes sure that there are no name collisions.
It may be worth noting that the toolset has no dependency to the SMO, it is autonomous.

Save Your Database scripts at SVN. Here is the Refernce How to use SVN Tortoise
OR
Save your database script at VSS. Here is the reference What is VSS ? How can we use that ?
In both cases you can keep track of the changes done so that in future you can check the history which in saved in the form of versions.
You can use Red Gate product also
EDIT
How do you pull out what what has changed?
Use comparison feature to check the changes made in the previous versions.
How do I apply the changes to the live database server?
Download the latest file from server.
I hope you are not using the Drop statements for the Table in your consolidated script. As it will delete all records from the table.
Drop statements will take place for Stored Pro, View, Function etc.
Please note that you have to run the complete latest database script file on the production server with below mentioned action plans
1. Remove Drop Statement for Schema DDL
2. Add Drop/Create Statements for Stored Proc/Views
3. Include Alter statements DML of schema.
Hope this will definitely help you.

Database source control vs. schema change scripts

Building and maintaining a database that is then deplyed/developed further by many devs is something that goes on in software development all the time. We create a build script, and maintain further update scripts that get applied as the database grows over time. There are many ways to manage this, from manual updates to console apps/build scripts that help automate these processes.
Has anyone who has built/managed these processes moved over to a Source Control solution for database schema management? If so, what have they found the best solution to be? Are there any pitfalls that should be avoided?
Red Gate seems to be a big player in the MSSQL world and their DB source control looks very interesting:
http://www.red-gate.com/products/solutions_for_sql/database_version_control.htm
Although it does not look like it replaces the (default) data* management process, so it only replaces half the change management process from my pov.
(when I'm talking about data, I mean lookup values and that sort of thing, data that needs to be deployed by default or in a DR scenario)
We work in a .Net/MSSQL environment, but I'm sure the premise is the same across all languages.

Similar Questions
One or more of these existing questions might be helpful:
The best way to manage database changes
MySQL database change tracking
SQL Server database change workflow best practices
Verify database changes (version-control)
Transferring changes from a dev DB to a production DB
tracking changes made in database structure
Or a search for Database Change

I look after a data warehouse developed in-house by the bank where I work. This requires constant updating, and we have a team of 2-4 devs working on it.
We are fortunate because there is only the one instance of our "product", so we do not have to cater for deploying to multiple instances which may be at different versions.
We keep a creation script file for each object (table, view, index, stored procedure, trigger) in the database.
We avoid the use of ALTER TABLE whenever possible, preferring to rename a table, create the new one and migrate the data over. This means that we don't have to look through a history of ALTER scripts - we can always see the up to date version of every table by looking at its create script. The migration is performed by a separate migration script - this can be partly auto-generated.
Each time we do a release, we have a script which runs the create scripts / migration scripts in the appropriate order.
FYI: We use Visual SourceSafe (yuck!) for source code control.

I've been looking for a SQL Server source control tool - and came across a lot of premium versions that do the job - using SQL Server Management Studio as a plugin.
LiquiBase is a free one but i never quite got it working for my needs.
There is another free product out there though that works stand along from SSMS and scripts out objects and data to flat file.
These objects can then be pumped into a new SQL Server instance which will then re-create the database objects.
See gitSQL

Maybe you're asking for LiquiBase?

How do you track database changes in source control?

We use SQL Server 2000/2005 and Vault or SVN on most of our projects. I haven't found a decent solution for capturing database schema/proc changes in either source control system.
Our current solution is quite cumbersome and difficult to enforce (script out the object you change and commit it to the database).
We have a lot of ideas of how to tackle this problem with some custom development, but I'd rather install an existing tool (paid tools are fine).
So: how do you track your database code changes? Do you have any recommended tools?
Edit:
Thanks for all the suggestions. Due to time constraints, I'd rather not roll my own here. And most of the suggestions have the flaw that they require the dev to follow some procedure.
Instead, an ideal solution would monitor the SQL Database for changes and commit any detected changes to SCM. For example, if SQL Server had an add-on that could record any DML change with the user that made the change, then commit the script of that object to SCM, I'd be thrilled.
We talked internally about two systems:
1. In SQL 2005, use object permissions to restrict you from altering an object until you did a "checkout". Then, the checkin procedure would script it into the SCM.
2. Run a scheduled job to detect any changes and commit them (anonymously) to SCM.
It'd be nice if I could skip the user-action part and have the system handle all this automatically.

Use Visual studio database edition to script out your database. Works like a charm and you can use any Source control system, of course best if it has VS plugins. This tool has also a number of other useful features. Check them out here in this great blog post
http://www.vitalygorn.com/blog/post/2008/01/Handling-Database-easily-with-Visual-Studio-2008.aspx
or check out MSDN for the official documentation

Tracking database changes directly from SSMS is possible using various 3rd party tools. ApexSQL Source Control automatically scripts any database object that is included in versioning. Commits cannot be automatically performed by the tool. Instead, the user needs to choose which changes will be committed.
When getting changes from a repository, ApexSQL Source Control is aware of a SQL database referential integrity. Thus, it will create a synchronization scripts including all dependent objects that will be wrapped in a transactions so, either all changes will be applied in case no error is encountered, or none of the selected changes is applied. In any case, database integrity remains unaffected.

I have to say I think a visual studio database project is also a reasonable solution to the source control dilemma. If it's set up correctly you can run the scripts against the database from the IDE. If your script is old, get the latest, run it against the DB. Have a script that recreates all the objects as well if you need, new objects must be added to the this script as well by hand, but only once
I like every table, proc and function to be in it's own file.

One poor man's solution would be to add a pre-commit hook script that dumps out the latest db schema into a file and have that file committed to your SVN repository along with your code. Then, you can diff the db schema files from any revision.

I just commit the SQL-alter-Statement additional to the complete SQL-CreateDB-statement.

Rolling your own from scratch would not be very doable, but if you use a sql comparison tool like Redgate SQL Compare SDK to generate your change files for you it would not take very long to half-roll what you want and then just check those files into source control. I rolled something similar for myself to update changes from our development systems to our live systems in just a few hours.

In our environment, we never change the DB manually: all changes are done by scripts at release time, and the scripts are kept in the version control system. One important part of this procedure is to be sure that all scripts can be run again against the same DB the scripts are idempotent?) without loss of data. For example, if you add a column, make sure that you do nothing if the column is already there.
Your comment about "suggestions have the flaw that they require the dev to follow some procedure" is really a tell-tale. It's not a flaw, it's a feature. Version control helps developers in following procedures and makes the procedures less painful. If you don't want to follow procedures, you don't need version control.

In SQL2000 generate each object into it's own file, then check them all into your source control. Let your source control handle the change history.
In SQL 2005, you'll need to write a bit of code to generate all objects into separate files.

In one project I arranged by careful attention in the design that all the important data in the database can be automatically recreated from external places. At startup the application creates the database if it is missing, and populates it from external data sources, using a schema in the application source code (and hence versioned with the application). The database store name (a sqlite filename although most database managers allow multiple databases) includes a schema version, and we increase the schema version whenever we commit a schema change. This means when we restart the application to a new version with a different schema that a new database store is automatically created and populated. Should we have to revert a deployment to an old schema then the new run of the old version will be using the old database store, so we get to do fast downgrades in the event of trouble.
Essentially, the database acts like a traditional application heap, with the advantages of persistence, transaction safety, static typing (handy since we use Python) and uniqueness constraints. However, we don't worry at all about deleting the database and starting over, and people know that if they try some manual hack in the database then it will get reverted on the next deployment, much like hacks of a process state will get reverted on the next restart.
We don't need any migration scripts since we just switch database filename and restart the application and it rebuilds itself. It helps that the application instances are sharded to use one database per client. It also reduces the need for database backups.
This approach won't work if your database build from the external sources takes longer than you will allow the application to be remain down.

If you are using .Net and like the approach Rails takes with Migrations, then I would recommend Migrator.Net.
I found a nice tutorial that walks through setting it up in Visual Studio. He also provides a sample project to reference.

We developed a custom tool that updates our databases. The database schema is stored in a database-neutral XML file which is then read and processed by the tool. The schema gets stored in SVN, and we add appropriate commentary to show what was changed. It works pretty well for us.
While this kind of solution is definitely overkill for most projects, it certainly makes life easier at times.

Our dbas periodically check prod against what is in SVN and delete any objects not under source control. It only takes once before the devlopers never forget to put something in source control again.
We also do not allow anyone to move objects to prod without a script as our devs do not have prod rights this is easy to enforce.

In order to track all the change like insert update and delete there will be a lot of overhead for the SVN.
It is better to track only the ddl changes like (alter, drop, create) which changes the schema.
You can do this Schema tracking easily by creating a table and a trgger to insert data to that table.
Any time you want u can get the change status by querying from that table
There are a lots of example here and here

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight