Databases and DVCS - database

I've recently asked a question about how suitable a DVCS is for the corporate environment, and that has sparked another question for me.
One of the plus sides to a DVCS seems to be that you can easily branch and try out new things. My problem starts when I begin to think about database changes. I've always found it tricky to get a DB into a VCS and it just sounds like it's going to be even harder with a DVCS.
So, whats the best way to work with databases and a DVCS?
EDIT: I've started looking into Migrator.NET. What do people think of projects like this for easily moving between versions specificaly with experimental branches in your DVCS?

I think the best way to deal with this issue is to work with DB Schemas, not the databases themselves. In this case, each developer would have their own database to develop against.
Here are some of the options available:
Migrations framework within Ruby on Rails.
South for Django, in addition to the schema being defined in the model classes themselves.
Visual Studio 2008 Team System Database Edition for .NET: You define the schema and the tool can do a diff on schema and data to generate scripts to go between different versions of the database.
These may give you some inspiration on how to deal with putting a database in version control. Another benefit that comes when you deal schemas is that you can more readily implement TDD and Continuous Integration (CI). Your TDD/CI environment would be able to build up a new version of the database and then run tests against the newly generated environment.

Version all the scripts you're using to manage your database. If you need to have "in-development" changes to a DB, make them on your personal DB until such time as you "publish" your changes.

Database version control is always the most difficult thing in a multi-developer environment.
Typically each user will have their own DB which is a chimera of some but not all of the DB changes. When they make changes, they'll need to commit their change scripts. This gets really awkward. The core problems seem to stem from database changes affecting many aspects of the system and multiple table changes being dependent on each other - and how to migrate to the new schema from the old schema. Migrating data to a new schema is typically non-trivial. Often you want to default a column when data is copied to the new schema, but NOT default a column in general for INSERT, say. These are typically already difficult in production deployment issues and having to manage the database during development when the database design could be in major flux in the same way as a major deployment is a lot more work than you usually need to be doing in development. Time that could be better spent ensuring that your database is well-designed - constraints, foregin keys, etc.
Because the developers are more likely to step on each other with database changes, we always had a database chokepoint - the developers all developed against the SAME development database and made their changes "live". Then the dev database was version controlled independently. This is not really easy when people are offsite or whatever. Another alternative is to have designated database developers who coordinate changes several developers need to the same table - that doesn't need to be their entire job, but gives you better DB design consistency. Or you can coordinate database revisions so that people become more aware of the DB revs other people are doing and time their changes to wait until a DB rev is available from another developer.

The best way to not put database into VCS in binary form. Period.
If you have text representation of your database and you have special merge tool to resolve conflicts when your database will be changed in different branches -- then you can start thinking about versioning databases. Otherwise it will be constant pain in the ass.

Related

Copy dbo to a new Schema in MSSQL

I got a database in SQL where everything is in the dbo schema. Now we want to copy that schema to have two schemas in the same database with the exact same content. Is there any easy way to do this in an Azure Database?
(We want to separate our Development and UAT but still only use one database)
While the other answer posted here, using the SSIS to transfer the SQL objects, will work, I feel compelled to point out that your approach raises a lot of other concerns.
Using a single database for two environments is not a good practice. The first big issue with is how do you handle deployments? Let's say UAT is in the uat schema and development is in development is the dev schema. You make a change to the Customers table, how do you deploy the table change to both schemas? If you use SSIS, you will need an on-premise SSIS server that handles copying the changes to the various shcemas in the target database. This will create a large maintenance headache and likely lead to important changes being wiped out.
Another issue this results in how does your application target a specific schema? You can have a login defaulted to a specific schema when it runs, but many ORM tools will want to the schema ahead of time. This will force to write the code in way that could potentially force to deploy different code to different environments. This opens up the possibility that parts of the code won't get test until production.
The last concern I have is with this approach, versioning your database becomes difficult and many of the tools that are out there, won't support what you are doing. This means you will likely be creating custom processes and tools to deploy a database instead of leveraging tools built by vendors like Microsoft or Red Gate. This puts you in a position where you need to support not only the application you make for your customers, but also an application to do your job (basically doubling your work).
My suggestion is think about the need to run two environments in a single database. I'm assuming this is likely due to cost, in which case you might find this to be false. Azure has many pricing tiers to support customers with various budgets. Depending on your application workload for both environments, you will likely find you need a large DTU database to support both. You might find that by having two databases, you can leverage smaller DTUs tiers which may end up being cheaper.
Please use the CopySchema option of the Transfer SQL Server Objects Task in SSIS as explained here.

Migration using model first approach in entity framework

I have setup a system where I have taken the model first approach as it made more logical sense for me. Now when even I have some changes in the model currently what I do is -
Use the Generate database from model feature of entity framework. I create a dummy database and apply those scripts. which deletes all my data and tables first and then updates the database with the latest sql file which is generated by entity framework.
Now I use the Visual Studio's schema compare feature and generate migration scripts for my local database and also for the one which is in production.
I go through the scripts manually and verify them. Once that is done I run the migration scripts on the production instances.
Question : The main problem is that is really tedious and since I do it from my local system, connecting to my prod databases is very slow and sometimes my visual studio also crashes. Is there a more cleaner approach to do this? Which is more automated such that my laptop is not really responsible for the database migrations on the production instances?
You can try Database Migration Power Pack - it allows creating change scripts instead of full database scripts but on behind it does the same procedure as you did by hand. The problem is that mentioned tool will not work with EF5.
Unfortunately EF migrations currently don't support models created through EDMX. Migrations support only code first approach at the moment.
In a Schema First design I use ApexSQL Diff (quite likely very similar to RedGate's product, perhaps a bit cheaper) - a good 3rd party tool is much easier to use than a VS Database Project and is easy to apply with a script-application tool like RoundHousE.
Using it in a Model First approach can follow the Schema First approach using a cycle of Model‑Schema‑Diff‑Schema‑Model as described in the post; consider these guidelines/notes below to make for a streamlined process. The schema-diff approach does not need to be tedious, slow, or excessively manual.
The current version of the database schema is obtained by applying a sequence of database patches (or DDL/DML scripts).
A tool (we use RoundHousE) automatically applies the scripts, as needed. It records information to know which scripts have been applied. Applying the same scripts is idempotent.
Diff done against a local database; this local database can be built up from all the previous change scripts in an automated fashion. This latest-local is always the diff target for the latest model changes.
The remote/live database is never used as a diff target. The same scripts can be applied later to the test (and then live) databases. Since everything is done the same way then the process is repeatable on all databases.
The only "issue" is that an update that is not well thought out may lead to data that is invalid under new restrictions/constraints. Of course, this was easy to identify, fix, and re-diff before pushing to the live database.
Once a diff is committed to source control it must be applied on the branch. To "undo" a previously commit change-script requires creating a new diff applying an inverse action. There is no implicit down-version.
We have a [Hg] model branch that affectively acts as a schema lock that that must be unified against; this could be viewed as a weak point, but it has worked well with small-team development.
A tool like Huagati DBML/EDMX is used to synchronize the Schema back to the Model which is really useful when developing. This little gem really pays for itself and is part of the cycle. When this is employed it's easy to also "update to a model" or make Schema changes in SSMS (or whatever) and then bring them back over.
The Code First migrations are "OK" (and definitely better than naught!), but I'm only using them because Azure SQL (aka SQL Database) is not supported by advanced diff tooling due to not exposing various sys information. (The diffs can be done locally as per normal, but ApexSQL Diff generates DDL/DML that is not always friendly with Azure SQL - plus, it's a chance for me to learn a slightly different approach :-)
Some advantages of Code First migrations via the Power Pack: can perform update tasks in C# instead of being limited to the DDL/DML (can be convenient), automatic downgrades (although I question their use), do not need to purchase a 3rd party tool (can be expensive), easier integration/deployment to Azure SQL, less tied to a specific database vendor (in theory), etc.
While Code First migrations (and automation of such) are a good step forward vs. the absolutely horrid Drop-and-Recreate approach, I much prefer dedicate SQL tooling when developing.

What is the State of the Art for deploying database updates to production databases?

Every shop at which I've worked has had their own cobbled-together, haphazard, poorly understood and poorly maintained method for updating production databases.
I've never seen a consistent method for doing this.
So, in the most recent versions of SQL Server, what is the best practice for updating schema changes and migrating data from a development or test server to a production server?
Is there a 3rd party tool which handles this painlessly?
I'd imagine the ultimate tool would be able to
detect schema changes between two DBs and generate DDL to update one to the other.
include the ability to have custom code which performs custom data migration steps
allow versioning so a v1 db could be updated all the way to a v99 database, running all scripts and migration steps in order.
The three things I've used are:
For schemas
Visual Studio Database Projects. Meh. They are okay but you still have to do alot of the work yourself.
Red Gate's SQL Compare and the entire SQL Toolbelt. They've worked pretty hard to make this something you can version control. In practice I've found with databases you are usually trying to get from point A in the version timeline to point B. With binaries, you often just clobber whatever is there with point B (an oversimplification I know, but often true).
http://www.red-gate.com/
xSQL is a good place to start if your system is small and perhaps will remain small:
http://www.xsqlsoftware.com/LiteEdition.aspx
I don't work for or know anyone who works for or get any money from these people. Just telling you what I've done in the past.
For data
Red Gate has SQL Data Compare.
However, if you want something "free" (or included with SQL Server)
I've actually had a lot of success just using BCP and writing a small system that injects and extracts data. Generally when I find myself doing this I ask myself, "Why? If I am changing data, does that mean I am really changing something that is configuration? Can I use a different method here?" But sometimes you can't (maybe it's a legacy system where the original devs thought databases are for everything).
The problem with BCP extracts is they don't version control very well. There are tricks I've used like extracting in character mode and stuffing an order by in the extract query to try and pull rows out in an order that makes them somewhat more palatable for version control.
For small Projects I have used RedGate to manage schema and data migrations with alot of success. Very easy to use works for most cases.
For larger enterprise systems for Schema and data changes normally you save all the SQL scripts as text files and run them. We also include a Rollback script to run incase something goes wrong during the migration. Run this on UAT server then Test/staging/pre prod server then on Production. Saving a copy of all these files plus their roll back scripts should allow you to move from multiple versions of a DB.
There is also http://code.google.com/p/migratordotnet/ if your using .NET it allows you to define these scripts in CODE. Very usesful if you want to deploy across multiple DBs in an automated way. Makes it easy to say set my DB to version 23. Or revert my DB to version 5. etc. Works for schema and data, but I would only really use it for a few lines of data.
First you have to think that the requirements between scenarios vary a lot:
Customers purchase v1 of the product at Costco and install it in they home office or small business. When v2 comes out, customer purchases a box of the product and installs it on a new computer. It exports the data from the v1 installation and imports it into v2 installation. Even though behind the scenes both v1 and v2 use a SQL Express instance there is no supported upgrade. Schema changes on the deployed databases are not expected (hidden database, non technical user) and definitely not supported. The only 'upgrade' path supported is an explicit export/import, which probably uses an XML file or something similar.
A business purchases v1 of the product with a support contract. It installs it on its department SQL Server instance, from where the data is accessed by the purchased product and by many more integration services, reports etc. When v2 is released, the customer runs the prescribed upgrade procedure, if it runs into problems it calls the product vendor customer support line which walks the customer through some specific steps for his deployment. Database schema customizations are expected and often supported, including upgrade scenarios, but the schema changes are done by the customer (not known at v2 design time).
A web startup has database that backs the site. Developers make changes on their personal instances and check in changes. Automated build deployment with contiguous integration picks up the changes and deploys them against a test instance, and run build validation tests. The main branch build can be, at any moment, deployed into production. Production is the one database that backs the site. The structure of the production database is documented and understood 100%, every single change to the production database schema occurs through the build system and QA process. On a side note, this is the scenarios most SO users that ask your question have in mind, minus the part about '100% documented and understood'. I give the example of WWW backing site, but deplyment can really be anything. The gist of it is that there is only one production database (it may include HA/DR copies, and it may consist of multiple actual SQL Server databases), and is the only database that has to be upgraded.
A succesfull web startup. Same as above, but the production database has 5TB of data and 5 minutes of downtime make the CNN headlines. Schema changes may involve setting up replicas and copying data into new schemas with contiguous updates, followed by an online switch of operations to the replica. Schema changes are designed by MCM experts and deployn a schema change can be a multi-week process.
I can go on wit more scenarios. The point is that the requirement of each of these cases are so vastly different, that no 'state of the art' can answer all of them. Some scenarios will be perfectly OK with a schema diff deployment tool like vsdbcmd or SQL Compare. Other scenarios will be much better faced with explicit versioning scripts. Other might have such specific requirements (eg. 0 downtime) that each upgrade is a project on its own and has to be specifically custom tailored.
One thing is clear though across all scenarios: if your shop threats the development database MDF file* as 'source' and makes changes to it using the management tools, that is always a major #fail. All changes should be captured explicitly as some sort of source control artifact, and this is why I favor most the explicit version scripts, as in Version Control and your Database. But I recon that the VSDB project support for compile time schema validation and its ease of refactoring schema objects make a pretty powerful proposition and VSDB schema compare deployment may be OK.
Another important approache that has to be addressed is the code first schema modeling from tools like EF or LinqToSql. It works brilliantly to deploy v1, but fails miserably at any subsequent version. I strongly discourage these approaches.
But to sum up and answer in brief: as today, the state of the art sucks.
At Red Gate we'd recommend one of two approaches depending on your requirements and how formal you need your processes to be. If you have a development database and simply want to push changes to production, SQL Compare is the tool for the job. A level of versioning can be achieved by using the schema snapshots.
However, if you wants full source control benefits, such as team collaboration, sandboxed environments, audit trail, compliance, history, rollback, etc, you should consider SQL Source Control. This links development databases to Team Foundation Server or Subversion.

Sub version for database (I want something for data values in the database, not for the schema)

I am using github for maintaining versions and code synchronization.
We are team of two and we are located at different places.
How can we make sure that our databases are synchronized.
Update:--
I am rails developer. But these days i m working on drupal projects (where database is the center of variations). So i want to make sure that team must have a synchronized database. Also the values in various tables.
I need something which keep our data values synchronized.
Centralized database is a good solution. But things get disturbed when someone works offline
if you use visual studio then you can script your database tables, views, stored procedures and functions as .sql files from a database solution and then check those into version control as well - its what i currently do at my workplace
In you dont use visual studio then you can still script your sql as .sql files [but with more work] and then version control them as necessary
Have a look at Red Gate SQL Source Control - http://www.red-gate.com/products/SQL_Source_Control/
To be honest I've never used it, but their other software is fantastic. And if all you want to do is keep the DB schema in sync (rather than full source control) then I have used their SQL Compare product very succesfully in the past.
(ps. I don't work for them!)
You can use Sql Source Control together with Sql Data Compare to source control both: schema and data. Here is an article from redgate: Source controlling data.
These are some of the possibilities.
Using the same database. Set-up a central database where everybody can connect to. This way you are sure everybody uses the same database all the time.
After every change, export the database and commit it to the VCS. This option requires discipline and manual labor.
Use some kind of other definition of the schema. For example, Doctrine for php has the ability to build the database from a yaml definition which can be stored in the vcs. This can be easier automated then point 2.
Use some other software/script which updates the database.
I feel your pain. I had terrible trouble getting SQL Server to play nice with SVN. In the end I opted for a shared database solution. Every day I run an extensive script to backup all our schema definitions (specifically stored procedures) for version control into text files. Due to the limited number of changes this works well.
I now use this technique for our major project and personal projects too. The only negative is that it relies on being connected all the time. The other answers suggest that full database versioning is very time consuming and I tend to agree. For "live" upgrades we use the Red Gate tools, they do both schema and data compare and it works very well.
http://www.red-gate.com/products/SQL_Data_Compare/. We were using this tool for keeping databases in sync in our company. Later we had some specific demands so we had to write our own code for synchronization. Depends how complex is you database and how much changes is happening. It is much simpler if you have time when no one is working and you can lock database for syncronization.
Check out OffScale DataGrove.
This product tracks changes to the entire DB - schema and data. You can tag versions in any point in time, and return to older states of the DB with a simple command. It also allows you to create virtual, separate, copies of the same database so each team member can have his own separate DB. All the virtual copies are tracked into the same repository so it's super-easy to revert your DB to someone else's version (you simply check-out their version, just like you do with your source control). This means all your DBs can always be synchronized.
Regarding a centralized DB - just like you don't want to work on the same source code, you don't want to be working on the same DB. It means you'll constantly break each other's code and builds each time someone changes something in the DB.
I suggest that you go with a separate DB for each developer, and sync them using DataGrove.
Disclaimer - I work at OffScale :-)
Try Wizardby. This is my personal project, but I've used it in my several previous jobs with great deal of success.
Basically, it's a tool which lets you specify all changes to your database schema in a database-independent manner and then apply these changes to all your databases.

Testing and Managing database versions against code versions

As you develop an application database changes inevitably pop up. The trick I find is keeping your database build in step with your code. In the past I have added a build step that executed SQL scripts against the target database but that is dangerous in so much as you could inadvertanly add bogus data or worse.
My question is what are the tips and tricks to keep the database in step with the code? What about when you roll back the code? Branching?
Version numbers embedded in the database are helpful. You have two choices, embedding values into a table (allows versioning multiple items) that can be queried, or having an explictly named object (such as a table or somesuch) you can test for.
When you release to production, do you have a rollback plan in the event of unexpected catastrophe? If you do, is it the application of a schema rollback script? Use your rollback script to rollback the database to a previous code version.
You should be able to create your database from scratch into a known state.
While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database.
I've definitely been interested in some sort of DB versioning system, but I haven't found anything yet. So, instead of a solution, you'll get my vote. :-P
You really do want to be able to take a clean machine, get the latest version from source control, build in one step, and run all tests in one step. Making this fast makes you produce good software faster.
Just like external libraries, database configuration must also be in source control.
Note that I'm not saying that all your live database content should be in the same source control, just enough to get to a clean state. (Do back up your database content, though!)
Define your schema objects and your reference data in version-controlled text files. For example, you can define the schema in Torque format, and the data in DBUnit format (both use XML). You can then use tools (we wrote our own) to generate the DDL and DML that take you from one version of your app to another. Our tool can take as input either (a) the previous version's schema & data XML files or (b) an existing database, so you are always able to get a database of any state into the correct state.
I like the way that Django does it. You build models and the when you run a syncdb it applies the models that you have created. If you add a model you just need to run syncdb again. This would be easy to have your build script do every time you made a push.
The problem comes when you need to alter a table that is already made. I do not think that syncdb handles that. That would require you to go in and manually add the table and also add a property to the model. You would probably want to version that alter statement. The models would always be under version control though, so if you needed to you could get a db schema up and running on a new box without running the sql scripts. Another problem with this is keeping track of static data that you always want in the db.
Rails migration scripts are pretty nice too.
A DB versioning system would be great, but I don't really know of such a thing.
While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database.
Backups and compression can help you there. Sorry - there's no excuse not to be able to get a a good set of data to develop against. Even if it's just a sub-set.
Put your database developments under version control. I recommend to have a look at neXtep designer :
http://www.nextep-softwares.com/wiki
It is a free GPL product which offers a brand new approach to database development and deployment by connecting version information with a SQL generation engine which could automatically compute any upgrade script you need to upgrade any version of your database into another. Any existing database could be version controlled by a reverse synchronization.
It currently supports Oracle, MySql and PostgreSql. DB2 support is under development. It is a full-featured database development environment where you always work on version-controlled elements from a repository. You can publish your updates by simple synchronization during development and you can generate exportable database deliveries which you will be able to execute on any targetted database through a standalone installer which validates the versions, performs structural checks and applies the upgrade scripts.
The IDE also offers you SQL editors, dependency management, support for modular database model components, data model diagrams, SQL clients and much more.
All the documentation and concepts could be found in the wiki.

Resources