We have a project where 99% of the code is PL/SQL, including the front end (Oracle Forms). All 10 developers use the same DB instance for development. The project is big (thousands of DB objects), so there is rarely any contention, and what little exists is serialized by locking objects in Subversion before making any changes to them in the DB (this is manual, not automated).
Would it make sense to use Git or some other distributed VCS in this situation?
My current thought is that it would not, as all changes affect all other developers immediately, even before they are committed in SVN.
My opinion is no.
I love DVCSs because of their dynamic nature: I can make changes here, commit; there, commit; merge one branch into another, change it further, commit; and only after that finally merge the new changes into the “trunk” so that others will see them and be affected by them.
In your situation the developers use the same DB instance for development, so any change affects all the others even before the code is committed. So I see no point in using the D (distributed) features of a VCS in your case.
This seems to be an issue that keeps coming back in every web application: you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing it manually on the development system, but when you deploy your updated code to production servers, they'll need to alter the database tables automatically too.
I've seen a variety of ways to handle these situations; each comes with its own benefits and problems. Roughly, I've come to the following two possibilities:
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction is required, and table alterations may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. The obvious problem is that it checks table properties far more often than necessary (a sketch of this approach follows below).
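As a rough sketch of the second approach, using the standard information_schema views (table and column names are made up, the conditional logic would live in application code, and exact ALTER syntax varies by database):

    -- Does the column already exist?
    SELECT COUNT(*) AS col_exists
    FROM   information_schema.columns
    WHERE  table_name  = 'orders'
    AND    column_name = 'discount_pct';

    -- Issued by the application only when col_exists = 0.
    ALTER TABLE orders ADD discount_pct NUMERIC(5,2) DEFAULT 0;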
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order in which to apply the changes. I would also note that you often need two sets of scripts, one for schema updates and one for data updates. As an example, if you want to add a field that is not nullable, you might add a nullable field first and then run a script to fill in a default value (a sketch of this appears after the list).
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case (4) goes really bad.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
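As a rough illustration of step 3 (and the rollback scripts from step 4), here is a hedged sketch of the two-script pattern; all table, column and value names are made up, and exact ALTER syntax varies by database:

    -- Schema script: add the new column as nullable first.
    ALTER TABLE customers ADD status VARCHAR(20) NULL;

    -- Data script: backfill existing rows with a default value.
    UPDATE customers SET status = 'ACTIVE' WHERE status IS NULL;

    -- Schema script, second pass: tighten to NOT NULL once the data is in place
    -- (SQL Server syntax shown; Oracle would use ALTER TABLE ... MODIFY).
    ALTER TABLE customers ALTER COLUMN status VARCHAR(20) NOT NULL;

    -- Rollback script: undo the change if the release has to be backed out.
    ALTER TABLE customers DROP COLUMN status;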
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).
Right now, the devs all have their own local dev environments with a snapshot of the production database, which they can twist, churn and beat up without affecting anyone but themselves.
These snapshots are starting to get large, and a data import of them is starting to take close to an hour.
Are there any better recommendations for maintaining dev data? The dev data can be ripped apart for potential changes, and then needs to be put back together if a change idea turns out to be bad, etc.
I try to use the following approach:
Developers maintain a baseline script which is in version control and sets up the database schema from scratch. It creates the schema just as it exists in the production database.
They also maintain a 'script' to set up test data. This 'script' actually uses production classes and sometimes a little DSL on top of that. In order to be reasonably fast, the script generates only minimal test data. I recommend making it part of the definition of done to create some test data for any new feature built.
Developers can run these scripts at will on their database (or database schema). The first script is also used as a basis for running automatic database tests.
The result of any work done by the developers is a migration script, i.e. a script that can be applied to the production database to bring it to the new desired state, including updates to data (a sketch follows after this list).
These migrations can be tested on snapshots of the production database. Snapshots of the production database are also used to run load and performance tests.
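For illustration, a minimal migration script in that sense might look like this (table, column and value names are made up; a real script would of course be tested against a production snapshot first):

    -- Hypothetical migration: bring the schema to the new state...
    ALTER TABLE invoices ADD currency_code CHAR(3);

    -- ...including the data update that belongs to the same change.
    UPDATE invoices SET currency_code = 'EUR' WHERE currency_code IS NULL;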
Only for the snapshots do I use database-specific tools. Mostly everything else is written in the main programming language (Java for me), so the developers feel comfortable using it.
I often encounter resistance to this approach ("too many scripts", "too many databases", "I don't want to use version control, because my DB modelling tool doesn't support it"). But apart from loads of manual work, I don't really see an alternative.
In my experience, having a centralized DB+data for each environment: Development, Testing+Integration and Production has been the best approach.
Development: let the developers do whatever they want with it. If production-like data is required, obfuscate/remove sensitive data. The more lightweight this database is, the easier it is for you to move, maintain and back up.
Testing: use it to simulate the production environment and let the testers input/retrieve all the data they want, but only through your application interfaces. This environment also lets you test your deployments before sending them to production; you don't want a bad DB installer to leave the production app in an unusable state. If required, you can load this environment with production data, but obfuscate/remove sensitive data here too. You could use high volumes to spot performance issues before they get to production.
Production: leave your production data/environment alone. You don't want sensitive data to end up in the wrong hands, or a DB configuration error to allow developers to change data accidentally.
Usually, as a developer, you want a few things from the dev database setup.
You want it to be easy to work with - it should be straightforward to make changes, keep those changes versioned, and apply them to other environments.
You want to have representative data - and have that data be predictable. For instance, if you're building an invoicing system, you want clients with known credit limits so you can write test cases to track what happens to them as you issue an invoice, have it paid, etc. (Integration tests, rather than unit tests.)
You want to be able to query against representative data volumes so performance issues arise in dev as well as production.
You never, ever want to be able to affect "real" data - for instance, you want email addresses and names to be anonymous, and you want passwords to be reset (sketched below).
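One hedged sketch of such an anonymisation pass, run after loading production data into dev (table and column names are made up, and string concatenation syntax varies by database):

    -- Replace personal data with predictable, obviously fake values.
    UPDATE users
    SET    email         = 'user' || id || '@example.invalid',
           full_name     = 'Test User ' || id,
           password_hash = 'RESET-ME';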
Continuous Database Integration offers a solution to most of this - and also solves the "it takes an hour to set up a database for a development environment" issue.
I'm in the same situation. I had the idea to move archive data to a read-only filegroup so that I only need to backup and restore it once. The non-archive data would be much smaller and could be copied more frequently to backup storage and to the dev machines.
Of course that only works if it is possible to split a big portion of the database size off to a read-only filegroup.
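If this is SQL Server, the filegroup idea might look roughly like this (database, filegroup, file and path names are all made up):

    -- Put archive data on its own filegroup and mark it read-only,
    -- then back up only the read/write filegroups on the regular schedule.
    ALTER DATABASE AppDb ADD FILEGROUP Archive;
    ALTER DATABASE AppDb ADD FILE
        (NAME = AppDb_Archive, FILENAME = 'D:\data\AppDb_Archive.ndf')
        TO FILEGROUP Archive;

    -- After moving the archive tables onto the new filegroup:
    ALTER DATABASE AppDb MODIFY FILEGROUP Archive READ_ONLY;

    BACKUP DATABASE AppDb READ_WRITE_FILEGROUPS
        TO DISK = 'D:\backup\AppDb_rw.bak';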
A different idea would be to restore once on a dev machine and use a database snapshot for quick restore to a clean state. I found that one particularly useful.
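In SQL Server terms, that snapshot approach might be sketched like this (names and paths are made up, and DevDb_Data is assumed to be the logical data file name of the dev database):

    -- Create a snapshot of the freshly restored dev database...
    CREATE DATABASE DevDb_Clean ON
        (NAME = DevDb_Data, FILENAME = 'D:\snapshots\DevDb_Clean.ss')
    AS SNAPSHOT OF DevDb;

    -- ...and revert to the clean state whenever the data has been beaten up.
    RESTORE DATABASE DevDb FROM DATABASE_SNAPSHOT = 'DevDb_Clean';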
The conventional wisdom seems to be that database migrations should be kept inside the VCS - that way there is a record of all the changes the database went through.
But...
What is the use of having old migrations? I don't really see myself reverting to an old version of the db. Wouldn't it be easier to just keep them out of the VCS and create a migration queue on every machine that doesn't have to be kept in sync with everybody else's migration queues?
If you think keeping old migrations around is waste, then you don't understand migrations. Migrations allow you to roll back a change. You might not imagine doing it, but it can be necessary. A full migration history allows a collaborative team to remain in sync regardless of which version they start from.
Trashing and starting over is one of the worst things you can do with South. You completely screw up other developers unless you explicitly tell everyone to go in and clear out the south_migrationhistory table and delete all the existing migrations and create a new init migration.
In short: leave migrations in the VCS. That's where they belong, so that anyone coming into a project at any point can quickly migrate their db to the current version. Don't clean them out; they don't hurt anything, and you create hassles for other collaborators by doing so.
I think you are right that there is little use in keeping old migrations; even if they do no harm, I also do not like unused code in my projects.
While using a migration system like South helps a lot during development, it is true that you are probably not going to need old schema migrations for consolidated features/changes (and in any case you still have the model changes available as Python code in the VCS).
From time to time I do trash and recreate all migrations as one single initial migration and start again collecting new migrations.
Keeping migrations outside of the VCS is not a good idea: if it does not outright introduce problems, it will certainly slow down development.
EDIT:
Since the clean-up process requires the team to be aware of this reorganization (and to clear the migration history table), it is not advisable when you cannot easily reach all developers/users.
Some of the people on my project seem to think that using a common development database with everyone connecting to it is the best thing. I think that it isn't, and that each developer having his own database (with periodically updated data dumps) is best. Am I right or wrong? Have you encountered any problems with either of these approaches?
Disk space and CPU should be cheap enough that every developer can run their own instance of the database, with an automated build under version control. This is needed to allow developers to be bold in hacking on the database, in isolation from any other developer's concurrent hacking.
The caveat being, of course, that any changes they make to their private instance are useless to anyone else unless it can be automatically applied during the build process. So there needs to be a firm policy that application code can't depend on any database state unless that state is represented by version-controlled, unit-tested changes to the DDL.
For an excellent guide on the theory and practice of treating the database definition as another part of the project code, and coordinating changes and refactorings, see Refactoring Databases: Evolutionary Database Design by Scott W. Ambler and Pramod Sadalage.
I like having my own copy of the database for development, because it gives you the flexibility to rapidly change things without worrying how it will impact others.
However, if all the developers are hacking away on their own copy of the database, it becomes more and more difficult to merge everyone's work together in the end.
I think you can get the best of both worlds by letting developers work on a local copy during day-to-day development, but each developer should probably merge their work into a common copy on a pretty regular basis. Writing a lot of unit tests helps too.
We share a single database amongst all our developers (20-odd), but we've got it structured so that everyone has their own tables.
You don't need a separate database per developer if you structure the application right. It should be configurable which database or table-prefix it uses anyway so you can easily move it between instances (unit test, system test, acceptance test, production, disaster recovery and so on).
The advantage to using a single database is that the cost of maintenance is amortized. You don't have your DBAs trying to handle a lot of databases (or, if you're a small-DB shop, you don't have every developer trying to maintain their own database when they're better utilized in developing).
Having a single point of failure is not a good thing, is it?
I prefer a single, shared database. But it's very dependent on the situation and the applications being developed.
What works for me may not work for you. Go with your gut.
If you are working with Hibernate or any Hibernate-based platform, you can configure your database to be created when you start your server (the create-drop option). This is very useful when you are adding new attributes to your classes. In that case, each developer must have his own copy of the DB.
If you are not changing the DB structure at all then you can use a single shared DB.
Even in this second case it is not a must. I prefer to have my own DB where I can do whatever I want. On the other hand, remember that some queries can take a long time, and that will affect your whole team if you are sharing a DB.
Having to upgrade a database schema makes installing a new release of software a lot trickier. What are the best practices for doing this?
I'm looking for a checklist or timeline of action items, such as
8:30 shut down apps
8:45 modify schema
9:15 install new apps
9:30 restart db
etc, showing how to minimize risk and downtime. Issues such as
backing out of the upgrade if things go awry
minimizing impact to existing apps
"hot" updates while the database is running
promoting from dev to test to production servers
are especially of interest.
I have a lot of experience with this. My application is highly iterative, and schema changes happen frequently. I do a production release roughly every 2 to 3 weeks, with 50-100 items cleared from my FogBugz list for each one. Every release we've done over the last few years has required schema changes to support new features.
The key to this is to practice the changes several times in a test environment before actually making them on the live servers.
I keep a deployment checklist file that is copied from a template and then heavily edited for each release with anything that is out of the ordinary.
I have two scripts that I run on the database, one for schema changes, one for programmability (procedures, views, etc). The changes script is coded by hand, and the one with the procs is scripted via Powershell. The change script is run when everything is turned off (you have to pick a time that annoys the least amount of users for this), and it is run command by command, manually, just in case anything goes weird. The most common problem I have run into is adding a unique constraint that fails due to duplicate rows.
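For example, a hedged pre-check for that particular gotcha (table, column and constraint names are made up):

    -- Find the duplicates that would make the unique constraint fail.
    SELECT email, COUNT(*) AS dup_count
    FROM   customers
    GROUP  BY email
    HAVING COUNT(*) > 1;

    -- Only once the duplicates are resolved:
    ALTER TABLE customers ADD CONSTRAINT uq_customers_email UNIQUE (email);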
When preparing for an integration testing cycle, I go through my checklist on a test server, as if that server was production. Then, in addition to that, I go get an actual copy of the production database (this is a good time to swap out your offsite backups), and I run the scripts on a restored local version (which is also good because it proves my latest backup is sound). I'm killing a lot of birds with one stone here.
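The local restore step might look roughly like this in T-SQL (the backup path and logical file names here are assumptions):

    -- Restore last night's production backup as a local practice copy.
    RESTORE DATABASE AppDb_Copy
    FROM DISK = 'D:\backups\AppDb_prod.bak'
    WITH MOVE 'AppDb_Data' TO 'D:\data\AppDb_Copy.mdf',
         MOVE 'AppDb_Log'  TO 'D:\data\AppDb_Copy.ldf',
         REPLACE;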
So that's 4 databases total:
Dev: all changes must be made in the change script, never with studio.
Test: Integration testing happens here
Copy of production: Last minute deployment practice
Production
You really, really need to get it right when you do it on production. Backing out schema changes is hard.
As far as hotfixes, I will only ever hotfix procedures, never schema, unless it's a very isolated change and crucial for the business.
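A hotfix of that kind usually amounts to redeploying a single piece of programmability, along these lines (Oracle-style syntax for illustration; the procedure, tables and columns are made up):

    CREATE OR REPLACE PROCEDURE recalc_order_total (p_order_id IN NUMBER) AS
    BEGIN
      -- Fix the logic without touching any table definitions.
      UPDATE orders o
      SET    o.total_amount = (SELECT NVL(SUM(l.line_amount), 0)
                               FROM   order_lines l
                               WHERE  l.order_id = p_order_id)
      WHERE  o.order_id = p_order_id;
    END recalc_order_total;
    /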
I guess you have considered the writings of Scott Ambler?
http://www.agiledata.org/essays/databaseRefactoring.html
This is a topic that I was just talking about at work. The main problem is that unless database migrations are handled nicely for you by your framework, e.g. Rails and its migration scripts, it is all left up to you.
The current way that we do it has apparent flaws, and I am open to other suggestions.
Keep a schema dump, with the static data that is required to be there, up to date and in version control.
Every time you do a schema-changing action (ALTER, CREATE, etc.), dump it to a file and put it in version control.
Make sure you update the original SQL dump of the db.
When pushing to live, make sure you or your script applies the SQL files to the db.
Clean up SQL files in version control as they become obsolete.
This is by no means optimal and is really not intended as a "backup" db. It's simply there to make pushes to live easy and to keep the developers on the same page. There is probably something cool you could set up with Capistrano to automate applying the SQL files to the db (a minimal tracking-table sketch follows below).
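One minimal, hypothetical way to keep track of which version-controlled SQL files have already been applied to a given database (table and file names are made up):

    CREATE TABLE applied_sql_files (
        file_name  VARCHAR(255) NOT NULL PRIMARY KEY,
        applied_at TIMESTAMP    NOT NULL
    );

    -- The deployment script runs each pending file and then records it, e.g.
    INSERT INTO applied_sql_files (file_name, applied_at)
    VALUES ('2010-04-12_add_discount_column.sql', CURRENT_TIMESTAMP);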
DB-specific version control would be pretty awesome. There is probably something that does that, and if there isn't, there probably should be.
And if the Scott Ambler paper whets your appetite, I can recommend his book with Pramod J. Sadalage called 'Refactoring Databases' - http://www.ambysoft.com/books/refactoringDatabases.html
There is also a lot of useful advice and information at the Agile Database group at Yahoo - http://tech.groups.yahoo.com/group/agileDatabases/
Two quick notes:
It goes without saying... So I'll say it twice.
Verify that you have a valid backup.
Verify that you have a valid backup.
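In SQL Server, for instance, one quick (though not sufficient) sanity check before the upgrade is to confirm the backup file is at least readable; the path here is made up, and the only real proof is an actual test restore:

    RESTORE VERIFYONLY FROM DISK = 'D:\backup\AppDb_pre_upgrade.bak';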
@mk: Check out Jeff's blog post on database version control (if you haven't already).