Is it a good idea to keep database migrations inside VCS?

The conventional wisdom seems to be that database migrations should be kept inside the VCS - that way there is a record of all the changes the database went through.
But...
What is the use of having old migrations? I don't really see myself reverting to an old version of the db. Wouldn't it be easier to just keep them out of the VCS and create a migration queue on every machine, one that doesn't have to be kept in sync with everybody else's migration queues?

If you think keeping old migrations around is a waste, then you don't understand migrations. Migrations allow you to roll back a change. You might not imagine doing it, but it can be necessary. Full migration histories allow collaborative teams to remain in sync regardless of the version they start from.
Trashing and starting over is one of the worst things you can do with South. You completely screw up other developers unless you explicitly tell everyone to go in and clear out the south_migrationhistory table, delete all the existing migrations, and create a new initial migration.
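For concreteness, a reset like that means every collaborator has to do something like the following on their machine (a sketch; the app name myapp is illustrative):

```sql
-- Clear South's record of the now-deleted migrations for the app:
DELETE FROM south_migrationhistory WHERE app_name = 'myapp';

-- Then, outside SQL, recreate and fake-apply the new initial migration:
--   ./manage.py schemamigration myapp --initial
--   ./manage.py migrate myapp --fake
```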
In short: leave migrations in VCS. It's where they belong, so that anyone coming into a project at any point can quickly migrate their db to the current version. Don't clean them out; they don't hurt anything, and you create hassles for other collaborators by doing so.

I think you are right: there is little use in keeping old migrations. Even if they do no harm, I also do not like unused code in my projects.
While a migration system like South helps a lot during development, it is true that you are probably not going to need old schema migrations for consolidated features/changes (and in any case the model changes are still available as Python code in the VCS).
From time to time I do trash all migrations, recreate them as one single initial migration, and start collecting new migrations again.
Keeping migrations outside of the VCS is not a good idea: even if it does not introduce outright problems, it will slow down development for sure.
EDIT:
Since the clean-up process requires the whole team to be aware of the reorganization (and to clear the migration history table), it is not advisable when you cannot easily reach all developers/users.

Related

Start over with Laravel's migrations

I've been working on a fairly big Laravel project for a while and currently have over 70 migrations (using MySQL). I've never written tests (shame on me) and would like to catch up on that now.
I used foreign keys at the beginning, renamed some, and then removed them completely with subsequent migrations.
Now a temporary SQLite database should be used for testing. But since I was renaming foreign keys by name, and MySQL's naming convention differs from SQLite's, SQLite cannot find them. There are also a bunch of other errors.
I'm wondering if I could delete all my migrations and create a single migration which holds all the database structure of the current state. Starting from scratch, so to speak.
Is that advisable? What should happen to the migrations table?
While developing, you are usually just continually adding migrations to get the structure you need. If you are at a point where you like the structure, you can definitely condense your migrations down.
Often, someone developing locally will do this once they reach the point where they want to start deploying the application: they condense the migrations down, because by then they know the structure they need and don't need to 'go backwards'. This is the schema you need; you only have to worry about future changes.
On any system where you haven't run these migrations you will be perfectly fine, as the consolidated migration will just run the first time and you will have your complete schema ready to go. If you already have a migrations table and don't want to roll back, you are in a pickle: you would have to manually alter the migrations table, removing all the migrations that no longer exist and changing the remaining one(s) to match what you have after your consolidation (sketched below). That's only if you were hoping to be able to rollback/refresh; in reality, after a certain point you won't be rolling back or refreshing, only moving forward, often because you don't want to lose data.
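For illustration, that manual reconciliation might look something like this (a sketch assuming Laravel's default migrations table; the consolidated filename is made up):

```sql
-- Remove the records of migrations that no longer exist on disk:
DELETE FROM migrations;

-- Record the single consolidated migration as already applied:
INSERT INTO migrations (migration, batch)
VALUES ('2016_01_01_000000_create_full_schema', 1);
```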
There isn't really much to worry about in just condensing your migrations down: they are your schema, and if you know the schema you want, there is no need to step through 20 different migrations that alter the same tables numerous times to get to that final result.

Altering database tables on updating website

This seems to be an issue that keeps coming back in every web application: you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing that manually on the development system, but when you deploy your updated code to production servers, they'll need to alter the database tables automatically too.
I've seen a variety of ways to handle these situations, each with its own benefits and problems. Roughly, I've come to the following two possibilities:
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. The benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction is required, and table alterations may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. The obvious problem is that it checks table properties far more often than it needs to. (A minimal sketch of such a check follows below.)
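As a rough illustration of the second option, a runtime check against information_schema might look like this (a sketch; the table and column names are invented, and some DBMSs also want a table_schema filter):

```sql
-- Does the column we need already exist?
SELECT COUNT(*) AS col_exists
FROM information_schema.columns
WHERE table_name = 'users'
  AND column_name = 'last_login';

-- If col_exists is 0, the application then issues:
-- ALTER TABLE users ADD COLUMN last_login DATETIME NULL;
```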
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
1. Take down the front-end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
2. Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database that you don't even know about!
3. Run the scripts, as you described in your first option. It definitely takes careful planning, and you're right that you need a pre-defined order in which to apply the changes. Also, note that you often need two sets of scripts: one for schema updates and one for data updates. As an example, if you want to add a field that is not nullable, you might add a nullable field first and then run a script to put in a default value (sketched after this list).
4. Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
5. Make sure you have backups ready to go in case (4) goes really bad.
6. Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
7. (Optional) A lot of companies do partial roll-outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application server/1 database server and see how it goes. Then, if it's good, you continue with the rest of the production machines.
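To illustrate the split mentioned in step 3, here is a minimal sketch using PostgreSQL-style syntax (the table and column names are invented):

```sql
-- Schema script, pass 1: add the column as nullable.
ALTER TABLE orders ADD COLUMN status VARCHAR(20);

-- Data script: backfill a default for existing rows.
UPDATE orders SET status = 'unknown' WHERE status IS NULL;

-- Schema script, pass 2: only now enforce the constraint.
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
```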
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).

Would it make sense to use Git for this project?

We have a project where 99% of the code is PL/SQL, including the front end (Oracle Forms). All 10 developers use the same DB instance for development. The project is big (thousands of DB objects), so there is rarely any contention, and any that exists is serialized by locking objects in Subversion before making any changes to them in the DB (this is manual, not automated).
Would it make sense to use Git or some other distributed VCS in this situation?
My current thoughts are that it would not, as all changes affect all other developers immediately, even before they are committed in SVN.
My opinion is no.
I love DVCSs because of their dynamic nature: I can make changes here, commit, there, commit, merge one into another, make changes to that, commit, and only after that finally merge the new changes into the "trunk" so that others will see them and be affected by them.
In your situation developers use the same DB instance for development, so any change affects all the others even before the code is committed. So I see no point in using the D (Distributed) features of a VCS in your case.

Managing the migration of breaking database changes to a database shared by old version of the same application

One of my goals is to be able to deploy a new version of a web application that runs side by side with the old version. The catch is that everything shares a database - a database that in the new version tends to include significant refactoring of database tables. I would like to be able to roll out the new version of the application to users over time and to be able to switch them back to the old version if I need to.
Oren had a good post setting up the issue, but it ended with:
"We are still in somewhat muddy water with regards to deploying to production with regards to changes that affects the entire system, to wit, breaking database changes. I am going to discuss that in the next installment, this one got just a tad out of hand, I am afraid."
The follow-on post never came ;-). How would you go about managing the migration of breaking database changes to a database shared by an old version of the same application? How would you keep the data synced up?
Read Scott Ambler's book "Refactoring Databases"; take with a pinch of salt, but there are quite a lot of good ideas in there.
The details of the solutions available depend on the DBMS you use. However, you can do things like the following (a sketch appears after this list):
create a new table (or several new tables) for the new design
create a view with the old table name that collects data from the new table(s)
create 'instead of' triggers on the view to update the new tables instead of the view
In some circumstances, you don't need a new table - you may just need triggers.
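As a minimal sketch of this pattern, in PostgreSQL-style SQL (all object names are invented, and 'instead of' trigger syntax varies by DBMS):

```sql
-- The new design: "customer" has been redesigned into "client".
CREATE TABLE client (
    id        INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);

-- The old table name lives on as a view over the new table.
CREATE VIEW customer AS
SELECT id, full_name AS name
FROM client;

-- Writes against the old name are redirected into the new table.
CREATE FUNCTION customer_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO client (id, full_name) VALUES (NEW.id, NEW.name);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_insert_trg
    INSTEAD OF INSERT ON customer
    FOR EACH ROW EXECUTE FUNCTION customer_insert();
```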
If the old version has to be maintained, the changes simply can't be breaking. That also helps when deploying a new version of a web app - if you need to roll back, it really helps if you can leave the database as it is.
Obviously this comes with significant architectural handicaps, and you will almost certainly end up with a database which shows its lineage, so to speak - but the deployment benefits are usually worth the headaches, in my experience.
It helps if you have a solid collection of integration tests for each old version involved. You should be able to run them against your migrated test database for every version that is still deemed "possibly live" - which may well be "every version ever" in some cases. If you're able to control deployment reasonably strictly, you may get away with only keeping compatibility for three or four versions - in which case you can plan on phasing out obsolete tables/columns etc. if there's a real need. Just weigh the complexity of such planning against the benefits accrued.
Assuming only 2 versions of your client, I'd only keep one copy of the data in the new tables.
You can maintain the contract between the old and new apps behind views on top of the new tables.
Use before/instead of triggers to handle writes into the "old" views that actually write into the new tables.
You are maintaining 2 versions of the code and must still develop your old app, but that is unavoidable.
This way there are no synchronisation issues; with two copies of the data you would effectively have to deal with replication conflicts between the "old" and "new" schemas.
More than 2 versions becomes complicated as mentioned...
First, I would like to say that this problem is very hard and you might not find a complete answer.
Lately I've been involved in maintaining a legacy line-of-business application, which might soon evolve into a new version. Maintenance includes solving bugs, optimizing old code, and adding new features that sometimes cannot fit easily into the current application architecture. The main problem with our application is that it was poorly documented, there is no trace of changes, and we are basically the 5th rotation of teams working on this project (we are fairly new to it).
Leaving the outer details aside (code, layers, etc.), I will try to explain a little how we are currently managing the database changes.
We have at this moment two rules that we are trying to follow:
First, old code (SQL, stored procs, functions, etc.) works as is and should be kept as is, without modifying it too much unless there is a reason to (a bug or a feature change); and, of course, we try to document it as much as possible (especially the problems, like "WTF! Why did he do that instead of this?").
Second, every new feature that comes in should use the best practices known at this moment and modify the old database structure as little as it can. This introduces some database refactoring options, like using editable views on top of the old structure, introducing new extension tables for already existing ones (a sketch follows), or normalizing the structure and providing the older structure through views.
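As a minimal sketch of the extension-table idea (all names are invented, assuming a legacy orders table keyed by id): new feature columns live in a 1:1 companion table instead of altering the legacy one.

```sql
-- New shipping fields for a feature, without touching the legacy table:
CREATE TABLE order_shipping_ext (
    order_id    INTEGER PRIMARY KEY REFERENCES orders(id),
    carrier     VARCHAR(50),
    tracking_no VARCHAR(50)
);
```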
Also, we are trying to write as many unit tests as we can, with the business analysts working side by side with us and documenting the business rules.
Database refactoring is a field too complex to be covered in a short answer. There are a lot of books that address all of your problems, one of them, http://databaserefactoring.com/, being pointed to in one of the answers.
Later Edit: Hopefully the second rule also answers the handling of breaking changes.

Checklist for Database Schema Upgrades

Having to upgrade a database schema makes installing a new release of software a lot trickier. What are the best practices for doing this?
I'm looking for a checklist or timeline of action items, such as
8:30 shut down apps
8:45 modify schema
9:15 install new apps
9:30 restart db
etc, showing how to minimize risk and downtime. Issues such as
backing out of the upgrade if things go awry
minimizing impact to existing apps
"hot" updates while the database is running
promoting from dev to test to production servers
are especially of interest.
I have a lot of experience with this. My application is highly iterative, and schema changes happen frequently. I do a production release roughly every 2 to 3 weeks, with 50-100 items cleared from my FogBugz list for each one. Every release we've done over the last few years has required schema changes to support new features.
The key to this is to practice the changes several times in a test environment before actually making them on the live servers.
I keep a deployment checklist file that is copied from a template and then heavily edited for each release with anything that is out of the ordinary.
I have two scripts that I run on the database: one for schema changes, one for programmability (procedures, views, etc.). The change script is coded by hand; the one with the procs is scripted via PowerShell. The change script is run when everything is turned off (you have to pick a time that annoys the fewest users), and it is run command by command, manually, just in case anything goes weird. The most common problem I have run into is adding a unique constraint that fails due to duplicate rows.
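For that last case, a pre-flight check is worth putting in the change script before the constraint itself (a sketch; table and column names are invented):

```sql
-- Find the duplicates that would make the constraint fail:
SELECT email, COUNT(*) AS dupes
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Only once that returns no rows:
ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);
```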
When preparing for an integration testing cycle, I go through my checklist on a test server, as if that server was production. Then, in addition to that, I go get an actual copy of the production database (this is a good time to swap out your offsite backups), and I run the scripts on a restored local version (which is also good because it proves my latest backup is sound). I'm killing a lot of birds with one stone here.
So that's 4 databases total:
Dev: all changes must be made in the change script, never with studio.
Test: Integration testing happens here
Copy of production: Last minute deployment practice
Production
You really, really need to get it right when you do it on production. Backing out schema changes is hard.
As far as hotfixes, I will only ever hotfix procedures, never schema, unless it's a very isolated change and crucial for the business.
I guess you have considered the writings of Scott Ambler?
http://www.agiledata.org/essays/databaseRefactoring.html
This is a topic I was just talking about at work. Mainly the problem is that unless database migration is handled nicely for you by your framework (e.g. Rails and its migration scripts), it is left up to you.
The current way that we do it has apparent flaws, and I am open to other suggestions.
Keep a schema dump, with the static data that is required to be there, up to date and in version control.
Every time you do a schema-changing action (ALTER, CREATE, etc.), dump it to a file and throw it in version control (an example file is sketched after this list).
Make sure you update the original SQL db dump.
When doing pushes to live, make sure you or your script applies the SQL files to the db.
Clean up old SQL files that are in version control as they become stale.
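For illustration, one such versioned change file might look like this (a sketch; the naming convention shown, e.g. 2012-03-14_add_users_last_login.sql, is just one possibility):

```sql
-- One schema-changing action per file, applied in filename order on push:
ALTER TABLE users ADD COLUMN last_login DATETIME NULL;
```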
This is by no means optimal and is really not intended as a "backup" db. It's simply to make pushes to live easy and to keep developers on the same page. There is probably something cool you could set up with Capistrano to automate applying the SQL files to the db.
Db-specific version control would be pretty awesome. There is probably something that does that, and if there isn't, there probably should be.
And if the Scott Ambler paper whets your appetite, I can recommend his book with Pramod J. Sadalage called 'Refactoring Databases' - http://www.ambysoft.com/books/refactoringDatabases.html
There is also a lot of useful advice and information at the Agile Database group at Yahoo - http://tech.groups.yahoo.com/group/agileDatabases/
Two quick notes:
It goes without saying... So I'll say it twice.
Verify that you have a valid backup.
Verify that you have a valid backup.
@mk: Check out Jeff's blog post on database version control (if you haven't already).
