I'm working on an AIR application that uses a local SQLite database, and I was wondering how I could manage database schema updates when I distribute new versions of the application. I also need to handle updates that skip versions, e.g. going from 1.0 straight to 1.5 instead of from 1.0 to 1.1.
What technique would you recommend?
In the case of SQLite, you can make use of the user_version pragma to track the version of the database. To get the version:
PRAGMA user_version
To set the version:
PRAGMA user_version = 5
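Putting the two together, an upgrade step might look like this (the table name and version number are illustrative):

```sql
-- Read the current schema version; a freshly created database reports 0.
PRAGMA user_version;

-- Apply the changes for the next version, then record it.
ALTER TABLE users ADD COLUMN email TEXT;
PRAGMA user_version = 2;
```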
I then keep each group of updates in an SQL file (that's embedded in the app) and run the updates needed to get up to the most recent version. In ActionScript, a fall-through switch handles skipped versions nicely:
switch (currentUserVersion) {
    case 1:
        // upgrade to version 2
    case 2:
        // upgrade to version 3
    // ...one case per version; omitting break lets each case fall
    // through, so every remaining upgrade runs in order
}
This allows the app to update itself to the most recent version regardless of the current version of the DB.
We script every DDL change to the DB, and when we make a "release" we concatenate them into a single "upgrade" script, together with any Stored Procedures that have changed "since last time".
We have a table that stores the version number of the latest patch applied - so upgrade tools can apply any newer patches.
Every Stored Procedure is in a separate file. Each starts with an "insert" statement to a logging table that stores the name of the SProc, the version, and "now". (Actually, an SProc is executed to store this; it's not a raw insert statement.)
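A minimal sketch of what those two tracking tables might look like (the names are illustrative, not our actual schema):

```sql
-- Stores the number of the latest patch applied, so upgrade tools
-- know which newer patches still need to run.
CREATE TABLE SchemaPatchLevel (
    PatchNumber INT      NOT NULL,
    AppliedAt   DATETIME NOT NULL DEFAULT GETDATE()
);

-- Each stored procedure records itself here when deployed.
CREATE TABLE SProcDeploymentLog (
    SProcName  SYSNAME     NOT NULL,
    Version    VARCHAR(20) NOT NULL,
    DeployedAt DATETIME    NOT NULL DEFAULT GETDATE()
);
```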
Sometimes during deployment we manually change an SProc, or roll out odds & ends from DEV, and comparing the log on the client's TEST and PRODUCTION databases enables us to check that everything is at the same version.
We also have a "release" master-database, to which we apply the updates, and we use a restored backup of that for new installations (this saves the time of running the scripts, which obviously takes longer as they accumulate). We update it as & when, because if it is a bit stale the later patch scripts can still be applied.
Our Release database also contains sanitised starter data (which is deleted, or sometimes adopted & modified, before a new installation goes live - so this is not included in any update scripts)
SQL Server has a toolbar button to script a change - so you can use the GUI tools to make all the changes, but rather than saving them, generate a script instead. (Actually, there is a checkbox to always generate a script, so if you forget and just press SAVE it still gives you the script it used after-the-fact, which can be saved as the patch file.)
What I am considering is adding a SchemaVersion table to the database which holds a record for every version that exists. The last version of the SchemaVersion table is the current level of the database.
I am going to create (SQL) scripts that perform the initial setup of 1.0 and thereafter the upgrade from 1.0 to 1.1, 1.1 to 1.2, etc.
Even a fresh install to e.g. 1.2 will run through all these scripts. This might seem a little slow, but is only done once and on an (almost) empty database.
The big advantage of this is that a fresh install will have the same database schema as an upgraded install.
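A minimal sketch of the SchemaVersion table and the "current level" query (T-SQL flavour; MySQL would use LIMIT 1 instead of TOP 1, and the column names are my own choice):

```sql
CREATE TABLE SchemaVersion (
    Version   VARCHAR(10) NOT NULL PRIMARY KEY,  -- e.g. '1.0', '1.1'
    AppliedAt DATETIME    NOT NULL DEFAULT GETDATE()
);

-- The most recently applied version is the current level of the database.
SELECT TOP 1 Version FROM SchemaVersion ORDER BY AppliedAt DESC;
```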
As I said: I am considering this. I will probably start implementing this tomorrow. If you're interested I can share my experiences. I will be implementing this for a c# application that uses LINQ-to-entities with SQL Server and MySQL as DBMSes.
I am interested to hear anybody else's suggestions and ideas, and if somebody can point me to an open source .NET library or classes that implement something like this, that would be great.
EDIT:
In the answer to a different question here on SO I found a reference to Migrator.Net. I started using it today and it looks like it is exactly what I was looking for.
IMO the easiest thing to do is to treat an update from e.g. 1.0 to 1.5 as a succession of updates from 1.0 to 1.1, 1.1 to 1.2, and so forth. For each version change, keep a conversion script/piece of code around.
Then, keep a table with a version field in the database, and compile the required version into the app. On startup, if the version field does not match the compiled-in version, run all the required conversion scripts, one by one.
The conversion scripts should ideally start a transaction and write the new version into the database as the last statement before committing the transaction.
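A minimal sketch of one such conversion script (T-SQL flavour; the table and column names are illustrative):

```sql
-- Conversion script: 1.1 -> 1.2
BEGIN TRANSACTION;

-- The actual schema changes for this step.
ALTER TABLE Customer ADD MiddleName VARCHAR(50) NULL;

-- Writing the new version is the last statement before the commit.
UPDATE SchemaInfo SET Version = '1.2';

COMMIT TRANSACTION;
```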
I need to create scripts for creating or updating a database. The scripts are created from my test Database or from my source control.
The script needs to upgrade a database from any version of my application to the current version so it needs to be agnostic to what already exists in the database.
I do not have access to the databases that will be upgraded.
e.g.
If a table does not exist the script should create it.
If the table exists the script should check if all the columns exist (And check their types).
I wrote a lot of this checking code in C#: I have an SQL create table script, and the C# code checks whether the table (and columns) exist before running the script.
My code is not production ready, and I wanted to know what ready-made solutions are out there.
I have no experience with frameworks that can do this.
Such an inquiry is off-topic for SO anyway.
But depending on your demands, it may not be too hard to implement something yourself.
One straightforward approach would be to work with incremental schema changes; basically just a chronological list of SQL scripts.
Never change or delete an existing script (unless something really bad is in there).
Instead, just keep adding upgrade scripts for every new version.
Yes, 15 years later you will have accumulated 5,000 scripts.
Trust me, it will be the least of your problems.
To create a new database, just execute the full chain of scripts in chronological order.
For upgrades, there are two possibilities.
Option #1: keep a progress list in every database.
That is basically just a table containing the names of all scripts that have already been executed there.
To upgrade, just execute every script that is not in that list already. Add them to the list as you go.
Note: if necessary, this can be done with one or more auto-generated, deployable, static T-SQL scripts.
Option #2: make every script itself responsible for recognizing whether or not it needs to do anything.
For example, a 'create table' script checks if the table already exists.
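A minimal sketch of option #1, in the shape of one of those auto-generated static T-SQL scripts (all names illustrative):

```sql
-- The progress list: every database records which scripts already ran.
IF OBJECT_ID('dbo.ExecutedScripts', 'U') IS NULL
    CREATE TABLE dbo.ExecutedScripts (
        ScriptName VARCHAR(255) NOT NULL PRIMARY KEY,
        ExecutedAt DATETIME     NOT NULL DEFAULT GETDATE()
    );

-- Run a script only if it is not in the list yet, then add it.
IF NOT EXISTS (SELECT 1 FROM dbo.ExecutedScripts
               WHERE ScriptName = '0042_add_customer_email.sql')
BEGIN
    -- ... contents of 0042_add_customer_email.sql go here ...
    INSERT INTO dbo.ExecutedScripts (ScriptName)
    VALUES ('0042_add_customer_email.sql');
END;
```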
I would recommend a combination of the two:
option #1 for new versions (as it scales a lot better than #2)
option #2 for existing versions (as it may be hard to introduce #1 retroactively on legacy production databases)
Depending on how much effort you will put in your upgrade scripts, the 'option #2' part may be able to fix some schema issues in any given database.
In other words, make sure you start off with scripts that are capable of bringing messy legacy databases back in line with the schema dictated by your application.
Future scripts (the 'option #1' part) have less to worry about; they should trust the work done by those early scripts.
No, this approach is not resistant against outside interference, like a rogue sysadmin.
It will not magically fix a messed-up schema.
It's an illusion to think you can do that automatically, without somebody analyzing the problem.
Even if you have a tool that will recreate every missing column and table, that will not bring back the data that used to be in there.
And if you are not interested in recovering data, then you might as well discard (part of) the database and start from scratch.
On the other hand, I would recommend making the upgrade scripts 'idempotent'.
Running a script once or twice should make no difference.
For example, use DROP TABLE IF EXISTS rather than DROP TABLE; the latter will throw an exception when executed again.
That way, in desperate times you may still be able to repair a database semi-automatically, simply by re-running everything.
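For instance, an idempotent 'add column' in T-SQL can be guarded like this (names illustrative):

```sql
-- Safe to run any number of times: the column is only added once.
IF COL_LENGTH('dbo.Customer', 'Email') IS NULL
    ALTER TABLE dbo.Customer ADD Email VARCHAR(255) NULL;
```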
If you are talking about schema state, you can look at state-based deployment tools instead of change-based ones (not the official terminology).
You should look at these two tools:
SQL Server Data Tools (dacpac / data-tier applications), which is practically free
Red Gate's entire toolset for this, https://www.red-gate.com/solutions/need/automate, which is licensed
The one thing to keep in mind with state-based deployments is that you don't control how the database gets from one state to another. With SSDT, for example, a column rename = drop and recreate that column, and the same goes for a table rename.
In their defence they do have some protections and do tell you what is about to happen.
EDIT (Updating to address comment below)
It should not be a problem that you can't access the TargetDb while in development. You can still use the above tools, provided you can use the (dacpac/Red Gate) tooling when you are deploying to the TargetDb.
If you are hoping to have a dynamic T-SQL script that can update a target database in an unknown state, then that is a recipe for failure/disaster. I do have some suggestions at the end for dealing with this.
The way I see it working is:
Do your development using dacpac/Red Gate
Build your artefact (dacpac / Red Gate package)
Copy the artefact to the deployment server, along with the tools
When doing deployments, use the tools (dacpac PowerShell) or Red Gate manually
If your only choice is a T-SQL script, then the only option is extensive, defensive coding covering all possibilities.
Every object must have an existence check
Every property must have a state check
Every object/property must have a roll forward / roll backward script.
For example, to sync a table:
A script to check the table exists and, if not, create it
A script to check each property of the table is in the correct state
check all columns and their data types, with scripts to update them to match
check defaults
check indexes, partitioning etc
Even with this, you might not be able to handle every scenario.
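To give a taste of that defensive style, here is what the checks for one table and one column might look like in T-SQL (illustrative names, and nowhere near covering every property):

```sql
IF OBJECT_ID('dbo.Orders', 'U') IS NULL
BEGIN
    -- The table is missing entirely: create it at the current definition.
    CREATE TABLE dbo.Orders (Id INT PRIMARY KEY, Total DECIMAL(18,2));
END
ELSE IF COL_LENGTH('dbo.Orders', 'Total') IS NULL
BEGIN
    -- The table exists but the column is missing: roll it forward.
    ALTER TABLE dbo.Orders ADD Total DECIMAL(18,2) NULL;
END;
```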
The work you are trying to do requires that you start using a standard change-control process.
Given the risk of data loss, the issues related to creating columns in a specific sequence, and the potential for column definitions to change, I recommend you look at defining a baseline version to which you will manually have to upgrade each system.
You can roll your own code and use a schema version table, or use any one of the tools available, such as Red Gate SQL Source Control, Visual Studio database projects, DbUp, or others.
I do not believe any tool will bring you from 0 to 1; however, once you baseline, any one of these tools will greatly facilitate your workflow.
Start with this article Get Your Database Under Version Control
Here are some tools that can help you:
Octopus Schema Migrations
Flyway By Redgate
Idera Database Change Management
SQL Server Data Tools
I am trying to integrate Liquibase with our Spring/Hibernate web-app to replace our existing home-grown solution. So far Liquibase is great, but there's one use-case that is important to us and I don't know if Liquibase supports it or not, which is this:
We deploy our web app to clients who host the webapp and the database (MySQL) themselves. So, suppose we deploy to our first client (client1) with a new clean DB schema (generated from Hibernate mappings) and no entries in the Liquibase changelog. We then develop some schema changes and redeploy the application to client1, and Liquibase does its stuff and applies the changesets: all great so far.
Now, we deploy to a new client, client2, again with a new database schema generated from Hibernate mappings. But this time there are changesets present (for the changes made between the client1 and client2 deployments), but they don't need to be applied, as they're already in the new schema. However, because the DATABASECHANGELOG table is empty, Liquibase will try to apply the changesets and probably fail with SQL errors.
What we'd like is for new deployments to new clients to 'know' at what changeset they are (relative to the first deployment to client 1), so it only applies subsequent updates.
There seem to be several possibilities for this, probably more I've not thought of:
populate DATABASECHANGELOG with fake entries to fool Liquibase into thinking these have already been applied.
always deploy our first, baseline schema to subsequent clients and run the updates sequentially, so we never deploy a 'new' schema derived from Hibernate mappings after client1.
use our own tracking system (e.g., map a db version to an application version, and a db version to a changeset).
Is this a problem, or am I just not understanding how to use Liquibase properly? I would be grateful for any advice from people who've dealt with this sort of use-case before. We'd really like to avoid deployment-specific changeSets if at all possible; there will be dozens, if not hundreds, of deployments to handle.
Thanks,
Richard
We have a similar setup.
But we are getting Liquibase into the game earlier. Before we officially release the software, we set up the Liquibase changesets and let Liquibase handle the database.
We did not want to lose the advantage of letting Hibernate generate the DB during the development phase, so we are also using Hibernate while developing.
But right before the version is stable we let the liquibase diff tool run on the database and let it create a changeset for the hibernate-generated tables.
Then this changeset is corrected manually since the liquibase diff tool does produce some flaws.
Once the changeset is ready we ship this with the software.
We maintain a reference system that keeps the database version of the last officially released version. For the next release we let the Liquibase diff tool run with the current development version against the reference db, which spits out the difference for the next version. This is also corrected manually, and finally you have a changeset that changes the db to the next version.
Hope this gives you an idea of one way to use liquibase and hibernate together.
I usually suggest always running the same changelog file against all your different databases. That way you don't have to deal with manually marking changeSets as run, using preconditions, or anything else. Most importantly, every database will follow the same upgrade path, so you know they are going to update consistently without any unexpected problems.
You can use the liquibase hibernate extension to automatically append changeSets to your changelog based on your hibernate mapping, but when it comes time to deploy your changes to the databases you just run your liquibase changelog file and not try to use hibernate's schema generation logic at all.
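For what it's worth, the changelog does not have to be XML: Liquibase also accepts a 'formatted SQL' changelog, so the changeSets you append for each release can stay in plain SQL. A sketch with made-up table names:

```sql
--liquibase formatted sql

--changeset richard:1
CREATE TABLE customer (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

--changeset richard:2
ALTER TABLE customer ADD COLUMN email VARCHAR(255);
```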
For option 1 above (populate with fake entries) I've just discovered the changelogSync command which looks like it marks all changeset entries as applied, even if they haven't been.
But is this better or worse than genuinely applying the changes, from a baseline schema?
At the moment we manually push changes from our DEV SQL environment to TEST and production (using Schema Compare in Visual Studio, plus some scripts we create while making changes to DEV), but this is very time consuming and error prone.
We were wondering if there was a better way of doing this and how would we need to implement this.
I've read about maybe using versioning (how would this work?), or maybe using Red Gate's SQL Source Control (but can this be used to push changes to TEST, or is it only used to keep track of local changes?).
We want a reliable way to update our TEST & Production servers so that data won't be corrupted/lost... We use SQL Server 2008 R2 and Visual Studio 2012.
We are starting a new project, so it's time for a change! Thank you for your time!
One simple way to do this would be to have a simple version table in the db, with one row and one column that stores the version number.
Now, every time you push changes to dev, create an incremental SQL script. Have a master script which, based on the current version of the db, calls the necessary incremental SQL scripts to upgrade the schema to the latest version.
Be careful about dropping columns, changing column types, or reducing column sizes (e.g. varchar(100) to varchar(10)) in your incremental scripts, as that could result in data loss if not planned properly.
Your incremental scripts should be idempotent, so that they can be run over and over, to handle the case where the db crashes during an upgrade.
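A minimal sketch of that version table and the master script's dispatch logic (T-SQL, illustrative names):

```sql
-- One row, one column: the current schema version.
CREATE TABLE DbVersion (Version INT NOT NULL);
INSERT INTO DbVersion VALUES (1);

-- Master script: apply each incremental script the db hasn't seen yet.
IF (SELECT Version FROM DbVersion) < 2
BEGIN
    -- ... contents of the v1 -> v2 incremental script ...
    UPDATE DbVersion SET Version = 2;
END;
IF (SELECT Version FROM DbVersion) < 3
BEGIN
    -- ... contents of the v2 -> v3 incremental script ...
    UPDATE DbVersion SET Version = 3;
END;
```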
Although there are many benefits in using SQL Source Control (and I'd love for you to give it a go, as I'm the product manager!), its purpose is limited to versioning and not managing and deploying to your various environments. The correct Red Gate tool for this would be Deployment Manager.
http://www.red-gate.com/delivery/deployment-manager/
There is a blog maintained by the Deployment Manager project team here, which should give you an idea of where the tool is headed:
http://thefutureofdeployment.com/
Does Schema Compare in VS have a CLI? If so, you can probably automate it to run several times during the day. If not, you can try using some other 3rd party tools that support a CLI, such as ApexSQL Diff for schema and ApexSQL Data Diff for synchronizing data.
In an application that I am supporting, lately I have made several changes to the DB structure.
I send the update to the users, but it's very hard to keep them all up-to-date.
Is there any simple way to do this?
Something that enables users to skip versions, but still do the update in the next version they install.
I use BlackFish database.
Thanks
Just store database version number in the database and write migration scripts like this:
database_10.sql - initial db structure
database_10_15.sql - migration script to move from 1.0 to 1.5
database_10_17.sql - migration script to move from 1.5 to 1.7
Check database version number on every application startup and apply needed migration scripts.
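Each migration script would then end by bumping the stored version number, along these lines (the db_info table is illustrative):

```sql
-- Tail of database_10_15.sql: the 1.5 schema changes, then the bump.
ALTER TABLE orders ADD COLUMN discount NUMERIC(10,2);
UPDATE db_info SET version = '1.5';
```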
Side note:
Another appealing alternative, also for small projects, is ComponentAce Absolute Database.
Now, direct to the point:
The Personal Edition (free) comes with a custom utility named DBManager (along with its source code).
It can serve as a starting point for how to manage database structure changes programmatically (the Delphi way!).
Why not port it to BlackFish?
I very rarely change databases, but I do sometimes add a table or a column. When my program starts up, it checks for the existence of said column or table, and if it's not there it just tries to create it.
I have read lots of posts about the importance of database version control. However, I could not find a simple solution how to check if database is in state that it should be.
For example, I have a database with a table called "Version" (the version number is stored there). But the database can be accessed and edited by developers without changing the version number. If, for example, a developer updates a stored procedure and does not update Version, the database state is not in sync with the version value.
How to track those changes? I do not need to track what is changed but only need to check if database tables, views, procedures, etc. are in sync with database version that is saved in Version table.
Why I need this? When doing deployment I need to check that database is "correct". Also, not all tables or other database objects should be tracked. Is it possible to check without using triggers? Is it possible to be done without 3rd party tools? Do databases have checksums?
Let's say that we use SQL Server 2005.
Edited:
I think I should provide a bit more information about our current environment: we have a "baseline" with all the scripts needed to create the base version (it includes data objects and "metadata" for our app). However, there are many installations of this "base" version with some additional database objects (additional tables, views, procedures, etc.). When we make some change in the "base" version, we also have to update some installations (not all); at that time we have to check that the "base" is in a correct state.
Thanks
You seem to be breaking the first and second rule of "Three rules for database work". Using one database per developer and a single authoritative source for your schema would already help a lot. Then, I'm not sure that you have a Baseline for your database and, even more important, that you are using change scripts. Finally, you might find some other answers in Views, Stored Procedures and the Like and in Branching and Merging.
Actually, all these links are mentioned in this great article from Jeff Atwood: Get Your Database Under Version Control. A must read IMHO.
We use DBGhost to version control the database. The scripts to create the current database are stored in TFS (along with the source code) and then DBGhost is used to generate a delta script to upgrade an environment to the current version. DBGhost can also create delta scripts for any static/reference/code data.
It requires a mind shift from the traditional method but is a fantastic solution which I cannot recommend enough. Whilst it is a 3rd party product it fits seamlessly into our automated build and deployment process.
I'm using a simple VBScript file based on this codeproject article to generate drop/create scripts for all database objects. I then put these scripts under version control.
So to check whether a database is up-to-date or has changes which were not yet put into version control, I do this:
get the latest version of the drop/create scripts from version control (subversion in our case)
execute the SqlExtract script for the database to be checked, overwriting the scripts from version control
now I can check with my subversion client (TortoiseSVN) which files don't match with the version under version control
now either update the database or put the modified scripts under version control
You have to restrict access to all databases and only give developers access to a local database (where they develop) and to the dev server where they can do integration. The best thing would be for them to only have access to their dev area locally and perform integration tasks with an automated build. You can use tools like Red Gate's SQL Compare to do diffs on databases. I suggest that you keep all of your changes under source control (.sql files) so that you will have a running history of who did what and when, and so that you can revert db changes when needed.
I also like the devs to be able to run a local build script to reinitialize their local dev box. This way they can always roll back. More importantly, they can create integration tests that test the plumbing of their app (repository and data access) and the logic stashed away in stored procedures, in an automated way. Initialization is run (resetting the db), integration tests are run (creating fluff in the db), then reinitialization puts the db back in a clean state, and so on.
If you are an SVN/nant style user (or similar) with a single branch concept in your repository then you can read my articles on this topic over at DotNetSlackers: http://dotnetslackers.com/articles/aspnet/Building-a-StackOverflow-inspired-Knowledge-Exchange-Build-automation-with-NAnt.aspx and http://dotnetslackers.com/articles/aspnet/Building-a-StackOverflow-inspired-Knowledge-Exchange-Continuous-integration-with-CruiseControl-NET.aspx.
If you are a perforce multi branch sort of build master then you will have to wait till I write something about that sort of automation and configuration management.
UPDATE
@Sazug: "Yep, we use some sort of multi branch builds when we use base script + additional scripts :) Any basic tips for that sort of automation without full article?" There are most commonly two forms of databases:
you control the db in a new non-production type environment (active dev only)
a production environment where you have live data accumulating as you develop
The first setup is much easier and can be fully automated from dev to prod, including rolling back prod if need be. For this you simply need a scripts folder where every modification to your database can be maintained in a .sql file.

I don't suggest that you keep a tablename.sql file and then version it like you would a .cs file, where updates to that sql artifact are made in the same file over time. Given that sql objects are so heavily dependent on each other, when you build up your database from scratch your scripts may encounter a breaking change. For this reason I suggest that you keep a separate, new file for each modification, with a sequence number at the front of the file name, for example something like 000024-ModifiedAccountsTable.sql.

Then you can use a custom task, something out of NAntContrib, or a direct execution of one of the many ??SQL.exe command line tools to run all of your scripts against an empty database, from 000001-fileName.sql through to the last file in the updateScripts folder. All of these scripts are then checked in to your version control. And since you always start from a clean db, you can always roll back if someone's new sql breaks the build.
In the second environment, automation is not always the best route, given that you might impact production. If you are actively developing against/for a production environment then you really need a multi-branch/environment setup, so that you can test your automation well before you actually push against a prod environment.

You can use the same concepts as stated above. However, you can't really start from scratch on a prod db, and rolling back is more difficult. For this reason I suggest using Red Gate SQL Compare or similar in your build process. The .sql scripts are checked in for updating purposes, but you need to automate a diff between your staging db and prod db prior to running the updates. You can then attempt to sync changes and roll back prod if problems occur. Also, some form of backup should be taken prior to an automated push of sql changes.

Be careful when doing anything without a watchful human eye in production! If you do true continuous integration in all of your dev/qual/staging/performance environments and then have a few manual steps when pushing to production... that really isn't that bad!
First point: it's hard to keep things in order without "regulations".
Or, for your example: developers changing anything without notice will bring you serious problems.
Anyhow - you say "without using triggers".
Any specific reason for this?
If not - check out DDL Triggers. Such triggers are the easiest way to check if something happened.
And you can even log WHAT was going on.
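A minimal sketch for SQL Server 2005: a database-level DDL trigger that logs who changed what, and when (the log table and trigger names are my own):

```sql
CREATE TABLE SchemaChangeLog (
    EventTime DATETIME NOT NULL DEFAULT GETDATE(),
    LoginName SYSNAME  NOT NULL,
    EventData XML      NOT NULL
);
GO

CREATE TRIGGER trg_LogSchemaChanges ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE,
    CREATE_PROCEDURE, ALTER_PROCEDURE, DROP_PROCEDURE
AS
    -- EVENTDATA() captures the full DDL statement as XML.
    INSERT INTO SchemaChangeLog (LoginName, EventData)
    VALUES (ORIGINAL_LOGIN(), EVENTDATA());
GO
```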
Hopefully someone has a better solution than this, but I do this using a couple of methods:
Have a "trunk" database, which is the current development version. All work is done here as it is being prepared to be included in a release.
Every time a release is done:
The last release's "clean" database is copied to a new one, e.g. "DB_1.0.4_clean".
SQL Compare is used to copy the changes from trunk to the 1.0.4_clean database; this also allows checking exactly what gets included.
SQL Compare is used again to find the differences between the previous and new releases (the changes needed to bring DB_1.0.3_clean up to DB_1.0.4_clean), which creates a change script "1.0.3 to 1.0.4.sql".
We are still building the tool to automate this part, but the goal is that there is a table to track every version the database has been at, and if the change script was applied. The upgrade tool looks for the latest entry, then applies each upgrade script one-by-one and finally the DB is at the latest version.
I don't have this problem, but it would be trivial to protect the _clean databases from modification by other team members. Additionally, because I use SQL Compare after the fact to generate the change scripts, there is no need for developers to keep track of them as they go.
We actually did this for a while, and it was a HUGE pain. It was easy to forget, and at the same time there were changes being made that didn't necessarily make it in, so the full upgrade script created from the individually-created change scripts would sometimes add a field, then remove it, all in one release. This can obviously be pretty painful if there are index changes, etc.
The nice thing about SQL Compare is that the script it generates is in a transaction, and if it fails, it rolls the whole thing back. So if the production DB has been modified in some way, the upgrade will fail, and then the deployment team can actually run SQL Compare on the production DB against the _clean db and manually fix the changes. We've only had to do this once or twice (damn customers).
The .SQL change scripts (generated by SQL Compare) get stored in our version control system (subversion).
If you have Visual Studio (specifically the Database edition), there is a Database Project that you can create and point it to a SQL Server database. The project will load the schema and basically offer you a lot of other features. It behaves just like a code project. It also offers you the advantage to script the entire table and contents so you can keep it under Subversion.
When you build the project, it validates that the database has integrity. It's quite smart.
On one of our projects we stored the database version inside the database.
Each change to the database structure was scripted into a separate sql file, which incremented the database version along with all the other changes. This was done by the developer who changed the db structure.
The deployment script checked the current db version against the latest change scripts and applied those sql scripts if necessary.
Firstly, your production database should either not be accessible to developers, or the developers (and everyone else) should be under strict instructions that no changes of any kind are made to production systems outside of a change-control system.
Change-control is vital in any system that you expect to work (Where there is >1 engineer involved in the entire system).
Each developer should have their own test system; if they want to make changes to that, they can, but system testing should be done on a more controlled system-test system which has the same changes applied as production. If you don't do this, you can't rely on releases working, because they're being tested in an incompatible environment.
When a change is made, the appropriate scripts should be created and tested to ensure that they apply cleanly on top of the current version, and that the rollback works*
*you are writing rollback scripts, right?
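For what it's worth, a rollback script is usually just the inverse of the forward change (illustrative):

```sql
-- The forward script added the column; the rollback removes it again.
ALTER TABLE dbo.Customer DROP COLUMN Email;
```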
I agree with other posts that developers should not have permissions to change the production database. Either the developers should be sharing a common development database (and risk treading on each other's toes) or they should have their own individual databases. In the former case you can use a tool like SQL Compare to deploy to production. In the latter case, you need to periodically sync up the developer databases during the development lifecycle before promoting to production.
Here at Red Gate we are shortly going to release a new tool, SQL Source Control, designed to make this process a lot easier. It will integrate into SSMS and enable adding objects to and retrieving them from source control at the click of a button. If you're interested in finding out more or signing up to our Early Access Program, please visit this page:
http://www.red-gate.com/Products/SQL_Source_Control/index.htm
I have to agree with the rest of the posts. Database access restrictions would solve the issue on production. Then using a versioning tool like DBGhost or DVC would help you and the rest of the team maintain the database versioning.