How can I get my database under version control with Perl?

I've been looking at the options for getting our database schemas under version control. It seems that Ruby folks have got Rails Migrations, and .NET folks have got a few options (for instance this, this, and this). What about Perl?
I've seen this thread on PerlMonks which doesn't have much, although it mentions DBIx::Migration::Directories. Is anyone actually using this module, or some other module? Or do you roll your own DB migration solutions?
Gratuitous details:
We don't use DBIx::Class for the most part
We use MySQL
We use SVN

At work, we use a modified version of DBIx::Migration (it has some limitations, such as no more than 10 migrations). Then, you have a core schema that you've dumped from your database and when the version number is too low, you upgrade your database using the migrations from the migration schema directory.
I also highly recommend the Database Refactoring book. Amongst other things, it will give you excellent techniques for managing migrations safely in such a way that if you need to roll back, you don't lose data (such as when you drop a column you think you don't need).
To help with the automatic deprecation schedules it suggests, I've written Devel::Deprecate so that you don't need to remember when to do the deprecations. Your code will complain loudly for you (and only in testing, not in production).
Important: You'll periodically find that you're applying so many database migration levels with this technique that you'll sometimes need to "bump up" your minimum base migration because it takes too long to rebuild the database. Just take a new dump of the database at the desired migration level and remove all migrations less than or equal to that level.
Update: Fast forward a few years and today I recommend sqitch. It's designed from the ground up to handle the case of putting a database under version control without tying you to a particular programming language or VCS.

One very interesting project that's probably still a little young to rely on is Adam Kennedy's ORLite::Migrate, which takes its inspiration from Rails migrations. He wrote up a very interesting journal entry over at use.perl.org about his plans and I hope to keep an eye on it for the future.
It does appear that this package only works with SQLite at the moment but I think Adam's planning on building this out to be more database agnostic in the future.

In POPFile we use our own solution. We store a schema version number in the db and if the program detects that there is a newer schema, it will update the db accordingly. This is not exactly the best and most fun part of our code.
To be honest, I fail to see the advantage of using DBIx::Migration::Directories if you aren't already using DBIx::Class. You have to provide the SQL and the version numbers and the database handle. You might as well provide a little more code to find the SQL file and feed it to the database.
Of course, having the schema in version control is a great bonus.
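For what it's worth, the "store a schema version number in the db and upgrade when it is too low" idea can be sketched in a few lines. This is only an illustration, not POPFile's actual code: it uses Python with SQLite so that it runs standalone, and the `schema_version` table, the `MIGRATIONS` list and the `messages` table are all made-up names.

```python
import sqlite3

# Hypothetical migrations: entry i upgrades the schema from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)",
    "ALTER TABLE messages ADD COLUMN bucket TEXT",
]

def current_version(conn):
    """Return the stored schema version, creating the bookkeeping row if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        conn.commit()
        return 0
    return row[0]

def upgrade(conn):
    """Apply every migration newer than the stored version, one at a time."""
    version = current_version(conn)
    for target, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        conn.execute("UPDATE schema_version SET version = ?", (target,))
        conn.commit()
    return current_version(conn)
```

Calling `upgrade(conn)` on a fresh database runs both statements; calling it again does nothing, because the stored version already matches the number of known migrations.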

We use a system similar to what Manni described. The two big disadvantages are:
Can't rollback schema changes (typically this is rare, not well tested and hard anyway so having to do it manually isn't a big deal IMO).
Using a sequential version number is a pain when you develop in multiple branches -- since you are using SVN this isn't as likely to be an issue as if you were using git though. :-)
The script I use is here: database_update and there's a small example data file.

How about sqitch? It advertises itself as a "database change management application".

There is an interesting CPAN module (Database::Migrator). I have used it, and it works fine for handling the migrations of your project.
Each migration goes into its own directory. Migrations are applied in sorted order, typically you name them starting with a number prefix. The migration directory can either contain files with SQL or Perl.

Related

Databases and "branch"

We are currently developing an application which uses a database.
Every time we update the database structure, we have to provide a script to update the database from the previous version to the current one.
So the database currently has a number that gives us its current version, and our software performs an update when we want to use an "old" database.
The issue we are encountering is when we have branches:
When we create a new big feature that will not be available to users (and not included in releases), we create a branch.
The main branch (trunk) will be merged in regularly to ensure that the created branch has the latest bug fixes.
Here is some illustration:
The issue is with our update scripts. They update from the previous version to the current one, then update the version number of the database.
Imagine that we have the DB version 17 when creating the branch.
We then do the branch, and make changes on the Trunk DB. The DB has now the version 18.
Then we make a db change on the branch. Since we know there has already been a new version "18", we create the version 19 and the updater 18->19.
Then the trunk is merged on the branch.
At this very moment we may have some updaters that will never run.
If someone updated his database before the merge, his database will be flagged as having version 19, so the update 17->18 will never be done.
We want to change this behavior but we can't find how:
Our constraints are:
We are unable to make all changes on the same branch
Sometimes we have more than just 2 branches, and we can only merge from the trunk to the feature branch until the feature is finished
What can we do to ensure continuity between our database branches?
I think the easiest way is to use the Ruby-on-rails approach. Every DB change is a separate script file, no matter how small. Each script file is numbered, and when you do an upgrade you simply run each script from the number your DB currently is to the last one.
What this means in practice is that your DB version system stops being v18 to v19, and starts being v18.0 to v18.01, then v18.02 etc. What you release to the customer may get rolled up into a big v19 upgrade script, but as you develop, you will be making many, many small upgrades.
You'll have to modify this slightly to work for your system, each script will either have to be renumbered as it gets merged to the branch or you will have to ensure the upgrade scripts don't simply track the last upgrade number, but track each upgrade number so missing holes will still get filled in as the script gets merged across.
You will also have to roll up these little upgrades into the next major number as you create the release tag (on the trunk first) to keep things sane.
edit: so fundamentally you first have to get rid of the notion of using an upgrade script to go from version to version. For example, if you start with a table, and trunk adds column A and the branch adds column B, then you merge trunk to branch - you cannot realistically "upgrade" to the version with both, unless the branch version number is always greater than the trunk's upgrade script, and that doesn't work if you subsequently merge trunk to the branch. So you must therefore scrap the idea of a "version" that applies to development branches. The only way round that is to update each change independently, and track each change individually. Then you can say you need the "last main release plus colA plus colB" (admittedly if you merge trunk in, you can take the current main release from trunk whether it's v18 or v19, but you still need to apply each branch update individually).
So you start with trunk at DB v18. Branch and make changes. Then you merge trunk later, where the DB is at v19. Your earlier branch changes still need to be applied (or should already be applied, but you may need to write a branch-update script with all branch changes in it, if you re-create your DB). Note the branch does not have a "v20" version number at all, and the branch's changes are not made to a single update script like you have on trunk. You can add the changes you make on the branch as a single script if you like (or 1 script of 'since the last trunk merge' changes) or as many little scripts.
When the branch is complete, the very last task is to take all the DB changes made for the branch and roll them up into a script that can be applied to the master upgrader, and when it is merged onto trunk, that script is merged into the current upgrade script and the DB version number bumped.
There is an alternative that may work for you, but I found it to be a little flaky when you try to update DBs with data, sometimes it just couldn't manage to do the update and the DB had to be wiped and re-created (which, to be fair, is probably what would have had to happen if I used SQL scripts at the time). That's to use Visual Studio Database project. This stores every part of the schema as a file, so you'll have 1 script per table. These will be hidden from you by Visual Studio itself that will show you designers instead of scripts but they're stored as files in version control. VS can deploy the project and will try to upgrade your DB if it already exists. Be careful of the options, many defaults say "drop and create" instead of using alter to update an existing table.
These projects can generate a (largely machine-readable) SQL script for deployment, we used to generate these and deliver them to a DBA team who didn't use VS and only accepted SQL.
And lastly, there's Roundhouse, which is not something I've used, but it might help you as the new upgrader "script". It's a free project and I've read it's more powerful and easier to use than VS DB projects. It's a DB versioning and change management tool, integrates with VS, and uses SQL scripts.
We use the following procedure for about 1.5 years now. I don't know if this is the best solution, but we didn't have any trouble with it (except some human errors in a delta-file like forgetting a USE-statement).
It has some similarities with the answer that Krumia gave, but differs in that in this approach only new change scripts/delta files are executed. This makes it a lot easier to write those files.
Delta files
Write all the DB changes you make for a feature in a delta-file. You can have multiple statements in one delta-file or split them up into multiple files. Once you have committed a file, it's best (and once it is merged, it's necessary) to start a new one and leave the old one untouched.
Put all the delta-files in one directory and give them a name-pattern like YYYY-MM-DD-HH.mm.description.sql. It's essential that you can sort them in time (therefore the timestamp) so you know what file needs to be executed first. Besides that you don't want to have a merge conflict with those files so it should be unique (over all branches).
Merging/pulling
Create a merge-script (for example a bash-script) that performs the following actions:
Note the current commit-hash
Do the actual merge (or pull)
Get a list of all the delta-files that are added with this merge (git diff --stat $old_hash..HEAD -- path/to/delta-files)
Execute those delta-files, in the order specified by the timestamp
By using git to determine what files are new (and thus what database-actions aren't executed yet on the current branch) you are no longer bound to version-numbering.
Alternating delta-files
It might happen that within one merge delta-files from different branches may be 'new to execute' and that those files alternate like this:
2014-08-04-delta-from-feature_A.sql
2014-08-05-delta-from-feature_B.sql
2014-08-06-delta-from-feature_A.sql
As the timestamp determines the execution order, something from feature A will be added first, then something from feature B, then feature A again. When you write proper delta-files that are executable by themselves (stand-alone), that shouldn't be a problem.
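The "execute the new delta-files in timestamp order" step of the merge-script can be sketched as follows. This is a rough Python illustration rather than our actual bash-script: the `db/deltas` directory is an assumed location, and the list of added paths is assumed to come from git (however you choose to list the files added by the merge).

```python
from pathlib import PurePosixPath

def execution_order(added_paths, delta_dir="db/deltas"):
    """Pick the delta-files out of the paths added by a merge and sort them.

    Because the file names start with a zero-padded timestamp, a plain
    lexicographic sort on the file name is also a chronological sort.
    """
    wanted_dir = PurePosixPath(delta_dir)
    deltas = [
        PurePosixPath(path)
        for path in added_paths
        if PurePosixPath(path).parent == wanted_dir and path.endswith(".sql")
    ]
    return [str(path) for path in sorted(deltas, key=lambda p: p.name)]
```

Given the three alternating files from the example in any order (plus unrelated source files), this returns them as 2014-08-04, 2014-08-05, 2014-08-06, regardless of which feature branch each one came from.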
We have recently started using the SQL Server Data Tools (SSDT), which replaced the Visual Studio Database Project type, to version control our SQL databases. It creates a project for each database, with items for views and stored procedures and the ability to create Data-Tier Applications (DACPAC) that can be deployed to SQL Server instances. SSDT also supports Unit Testing and Static Data, and offers developers the option of quick sandbox testing using a LocalDB instance. There is a good TechEd video overview of the SSDT tools and a lot more resources online.
In your situation you would use SSDT to manage your database objects in version control along side your application code, using the same merging process to push features between branches. When it comes time to upgrade an existing install you would create the DACPACs and use the Data-Tier Application upgrade process to apply the changes. Alternatively you could also use database synchronization tools such as DBGhost or RedGate to apply updates to the existing schema.
You want database migrations. Many frameworks have plugins for this. For instance CakePHP uses a plugin from CakeDC to manage. Here are some generic tools: http://en.wikipedia.org/wiki/Schema_migration#Available_Tools.
If you want to roll your own, perhaps instead of keeping the current DB version in the database, you keep a list of which patches have been applied. So instead of version table with one row with value 19, you instead have a patches table with multiple rows:
Patches
1
2
3
4
5
8
Looking at this you need to apply patches 6 and 7.
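A minimal sketch of that lookup, assuming Python with SQLite just to keep it runnable. The `patches` table comes from the example above; the `pending_patches` helper and the `id` column name are made up for illustration.

```python
import sqlite3

def pending_patches(conn, available):
    """Return the available patch numbers that are not recorded as applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS patches (id INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT id FROM patches")}
    return sorted(set(available) - applied)
```

With rows 1, 2, 3, 4, 5 and 8 in the table and patches 1 through 8 available, `pending_patches(conn, range(1, 9))` gives `[6, 7]`. After running a patch you insert its number into `patches`, so holes left by a branch merge get filled in on the next run.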
I just stumbled upon an older article written in 2008 by Jeff Atwood; hopefully it is still relevant to your problem.
Get Your Database Under Version Control
It mentions a five-part series written by K. Scott Allen:
Three rules for database work
The Baseline
Change Scripts
Views, Stored Procedures and the Like
Branching and Merging
There are tools specifically designed to deal with this type of problems.
One is DBSourceTools
DBSourceTools is a GUI utility to help developers bring SQL Server
databases under source control. A powerful database scripter, code
editor, sql generator, and database versioning tool. Compare Schemas,
create diff scripts, edit T-SQL with ease. Better than Management
Studio.
Another one:
neXtep Designer
NeXtep designer is an Integrated Development Environment for database
developers. The main concept behind the product is to take advantage
of versioning in order to compute the incremental SQL scripts you need
to deliver your developments.
This project aims at building a development platform that provides all
tools which a database developer needs while automating the tasks of
generating the deliveries (= SQL resulting from a development).
To learn more about the problematic of delivering database updates, we
invite you to read the Delivering database updates article which will
present you our vision of best and worst practices.
I think an approach which will satisfy most of your requirements is to embrace the "Database Refactoring" concept.
There is a good book on this topic Refactoring Databases: Evolutionary Database Design
A database refactoring is a small change to your database schema which
improves its design without changing its semantics (e.g. you don't add
anything nor do you break anything). The process of database
refactoring is the evolutionary improvement of your database schema so
as to improve your ability to support the new needs of your customers,
support evolutionary software development, and to fix existing legacy
database design problems.
The book describes database refactoring from the point of view of:
Technology. It includes full source code for how to implement each refactoring at the database level and for most refactorings we
show how the application would change to reflect the change in the
database. Our code examples are in Oracle, Java, and Hibernate
meta-data (the refactorings are easy to translate to other
environments, and sometimes we discuss vendor-specific features which
simplify some refactorings).
Process. It describes in detail the process of database refactoring in both the simple situation of a single application
accessing the database as well as the situation of the database being
accessed by many programs, many of which are out of the scope of your
authority. The technical examples assume the latter situation, so if
you're in the simple situation you may find some of our solutions to
be a little more complicated than you need (lucky you!).
Culture. Although it is technically simple to implement individual refactorings, and clearly possible (albeit a little
complicated) to adapt your internal processes to support database
refactoring, the fact is that cultural challenges within your
organization will likely prove to be the most difficult hurdle to
overcome.
This idea may or may not work, but reading about your work so far and the previous answer, it looks like reinventing the wheel. The "wheel" is source control, with its branch, merge and version tracking features.
At the moment, for each DB schema change, you have a SQL file containing the changes from the previous one. You already mention the significant issues you have with this approach.
Replace your method with this one: Maintain ONE (and only ONE!) SQL file, which stores all the DDL commands for creating tables, indexes, and so on from scratch. You need to add a new field? Add an "ALTER TABLE" line to your SQL file. This way your source control tool will in effect manage your database schema, and each branch can have a different one.
All of a sudden, the source code is in sync with the database schema, branching and merging works, and so on.
Note: Just to clarify the purpose of the script mentioned here is to recreate the database from scratch up to a specific version, every single time.
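As a rough sketch of what "recreate from scratch, every single time" could look like (Python with an in-memory SQLite database, purely for illustration; the `SCHEMA_SQL` contents, the `customer` table and the `rebuild` helper are invented, and in practice the text would be read from the one SQL file in your branch):

```python
import sqlite3

# Stand-in for the single schema file kept in source control.
SCHEMA_SQL = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
ALTER TABLE customer ADD COLUMN email TEXT;
CREATE INDEX idx_customer_name ON customer (name);
"""

def rebuild(schema_sql):
    """Build a brand-new database by replaying the whole schema file."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    return conn
```

Each branch carries its own version of the file, so checking out a branch and calling `rebuild` gives you the schema that matches that branch's code.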
EDIT: I spent some time looking for material to support this approach. Here is one that looks particularly good, with a proven track record:
Database Schema Versioning Management 101
Have you seen this situation before?
Your team is writing an enterprise application around a database
Since everyone is building around the same database, the schema of the database is in flux
Everyone has their own "local" copies of the database
Every time someone changes the schema, all of these copies need the latest schema to work with the latest build of the code
Every time you deploy to a staging or production database, the schema needs to work with the latest build of the code
Factors such as schema dependencies, data changes, configuration changes, and remote developers muddy the water
How do you currently address this problem of keeping the database
versions in working order? Do you suspect this is taking more time
than necessary? There are many ways to approach this problem, and the
answer depends on the workflow in your environment. The following
article describes a distilled and simplistic methodology you can use
as a starting point.
Since it can be implemented with ANSI SQL, it is database agnostic
Since it depends on scripting, it requires negligible storage management, and it can fit in your current code version management
program
The database versioning method you are using is certainly wrong, in my opinion. If anything has to have versions, it should be the source code. The source code has versions. Your live environment is only an instance of the source code.
The answer is to apply database changes using redeployable change scripts.
All changes, no matter which branch it is on (even in master/trunk) should be done in a separate script.
Sequence your scripts, so that newer ones will not get executed first. Having a prefix with date in the format YYYYMMDD for filename has worked for us.
When this happens, the change is made to the source code, not the database. You can have as many instances/builds for various tags/branches in the VCS as you like. For example, separate live builds for each branch.
Then you only have to do the build for each instance (probably every day). The build should fetch the files from the relevant branch and perform compiling/deploying. Since the scripts are redeployable, old scripts have no effect on the database. Only the recent changes are deployed to the database.
But, how to make redeployable scripts?
This is a question that is hard to answer, since you have not specified which database you are using. So I will give you an example about how my organization does it.
Let me take a simple example: if we need to add a column to a particular table, we do not just write ALTER TABLE ... ADD COLUMN .... We write code to add a column, if and only if that column does not exist in the given table.
Now, we have separate API to handle all that existence-checking boilerplate code. So our scripts are simply calls to those APIs. You will have to write your own. These API's are not actually that hard (we're using Oracle RDBMS). But they give us a huge gain in version control and deployment.
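To give an idea of what such an existence-checking helper might look like, here is a small sketch. It is not our Oracle API: it uses Python with SQLite so it can run standalone, and `column_exists` and `add_column_if_missing` are hypothetical names. On Oracle you would query the data dictionary instead of `PRAGMA table_info`.

```python
import sqlite3

def column_exists(conn, table, column):
    """Check the table metadata for a column (SQLite-specific lookup)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def add_column_if_missing(conn, table, column, definition):
    """Add the column only if it is not there yet, so the script can be re-run.

    The identifiers are interpolated into the SQL text, so they must come
    from trusted change scripts, never from user input.
    """
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {definition}")
    return True
```

A change script then boils down to a call such as `add_column_if_missing(conn, "orders", "status", "TEXT")`. The first run adds the column and returns `True`; any later run is a no-op and returns `False`.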
But, that's only one scenario, there are gazillion ways a schema definition can change
Yes indeed. Data type of a column can change; A new table can be added; An attribute column can be merged into a primary key (very rare); Sequences can change; Constraints; Foreign keys; They all can change.
But it turns out that all this can be handled by API's with special privileges to read metadata tables. I am not saying it's easy, but I am saying that it is a one time cost.
But, how do you rollback a database change?
My personal experience is, if you put some real effort into designing before banging the keyboard to write ALTER TABLE statements, this scenario is extremely rare. And if there ever is a rollback, you should manually handle it. (e.g. manually remove added column).
Normally, changes to views and stored procedures are rather common, and changes to table definitions are rare.
Building the Database
As I said before, building the database can be done by running all the redeployable scripts. Pre-deployed scripts have no effect.
Your database deployment script should not start with DROP DATABASE. Your database has lots of data which was used for unit tests. Unless you make a really really simple system, these data will be valuable in the future for testing. Your testers will not be too happy about adding ten thousand records to various tables every time a database is upgraded.
Put testers aside, how are you planning to upgrade your client/customers production database without annihilating all their production data? This is why you must use redeployable change scripts.
You can try version number schemes such as 18.1-branchname etc... But they are going to fail utterly, because you can merge your source, not its instances.
I think that the problem as you pose it is impossible to solve, but if you change part of your process there is a solution. Let's start with the first part: why it is impossible to solve using just deltas. In the following I assume you have the main trunk and two branches dev-a and dev-b; both branches stem from the same point in time.
Why it cannot work
Say Alice adds a delta script to dev-a:
ALTER TABLE t1 ALTER COLUMN col5 char(4)
and Bob adds another script in dev-b
ALTER TABLE t1 ALTER COLUMN col5 int
The two scripts are clearly incompatible, and you end up breaking the code in main when you merge back from either of the two. The merge tool cannot help if the script files have different names.
Possible solution
My suggestion is to describe your database in terms of both baseline and deltas: the delta scripts must always refer to a specific baseline, so you are able to compute a new baseline schema resulting from the application of successive deltas to a specific baseline.
An example
dev-a           *--B.A1--D.1#A1--D2#A1--------B.A2--*--B.A3--
               /                                   /
main -- B.0 --*--------------------------*--B.1---*----------
               \                        /
dev-b           *--B.B1--D.1#B1--B.B2--*
Note that after branching you immediately spin off a new baseline, and you do the same before every merge. This way you can check that the baselines are compatible.
Final comment
Managing deltas in version control is kind of reinventing the wheel, as each delta script is functionally equivalent to saving different versions of the baseline script. That said, I agree with you that in practice they convey more value and force people to think about what happens in production when you change the database.
If you opt to store only the baseline, you have plenty of tools to support you.
Another option is to serialize work on the database, as a whole or partitioning the schema in separate areas with unique owners.

Creating database tables programmatically in evolutions kingdom

Imagine a program which operates on large hierarchical datasets. The program stores each new dataset in a dedicated table. The table is created according to what data types the dataset has in it. Well, nothing very unusual. This is a trivial situation. But how do I make this kind of arrangement in Play 2.0, where the evolution paradigm rules? I just cannot start thinking of it.
UPDATE
It turned out, there is no simple way. Ok. The round way.
Is it possible to:
1) Make the program write the evolutions files itself and apply them automatically? Will it cause some distortion with Play's philosophy?
2) Use another DB system in a separate thread and not use Play's built-in database functionality? Would that hurt much?
UPDATE 2
I am reading through the MongoDB Casbah documentation and I like it a lot. I am planning to use this with my Play application. Is there any argument against using MongoDB via Casbah with Play?
That's a good question. And there's no brilliant answer, unfortunately.
Generally evolutions are good and are desired when you work in group. In such case you should switch to manual evolutions (not these generated by Ebean, they are dangerous to your data in current state) and just put your initial DDL as big as possible with create statements.
In next evolutions you can create new tables or alter existing, but for god's sake do not try to create existing table :)
Another approach I was (or still am) thinking about is using Ebean's auto-generated DDLs (which always assume that your DB is empty) to generate differential schemas with some SQL schema migration tool (e.g. MyBatis), but unfortunately this requires additional effort.
The last thing I sometimes use when I'm not sure about the correct evolution syntax is a small test-field app where you can add similar models and watch how Ebean's plugin will treat them. Unfortunately even this solution won't create proper alters, but it's better than testing on the main app.
Well, after some more experiments, I have decided to use MongoDB (actually, I had to choose from a wide variety of document-oriented DBMSs, and decided to start with MongoDB). I have set up a MongoDB server, incorporated its Java driver, Casbah (the driver's Scala wrapper) and all the necessary dependencies into my project, and all works fine. No need for SQL or the evolutions paradigm whatsoever.
And I am not using any parts of Play that work with database (the config file, anorm, and what's else is there), just ignoring that, and doing all Mongo.
All works JUST FINE!

Play, Hibernate and Evolutions

I've no previous experience with tools like Liquibase and similar. Up to now the way I've usually managed deployment into production on apps using Hibernate was using manual SQL to modify the tables, as they were quite simple apps (the complex ones didn't use it...don't ask please :P).
I've wanted to use Evolutions in Play, but I see it clashes heavily with Hibernate in development, making it a pain and not a realistic option. In development Hibernate manages everything easily so there is no point in using Evolutions, but we wanted to keep the structure (files) to make it easier to migrate the app in production mode. But due to the clashes it doesn't seem worth it.
Liquibase had a Play module but it seems to have been discontinued since Evolutions was released (I wonder why, as I believe it would work wonders with Hibernate).
The question(s) would be:
How do you manage database migrations of apps in production?
What's the usual procedure/steps you use when your model changes between releases and you have to deploy to production?
Any specific tool or feature of Hibernate we are overlooking, or just old-faithful SQL Alter table and similar?
Focusing on Play Framework, how do you manage this?
What is often the case is that an application has two phases in its life cycle - initial development and post-production "maintenance". My experience is that often, all the big database changes happen in the first phase. Let yourself be flexible there by relying on Hibernate, then when you go to production, you take a schema dump, roll that on production with Evolutions, and manage your DDL manually from there.
In the second "phase" (I'm an agile guy, I hate the word ;-)), schema changes often include DML as well because you have to calculate initial values for new columns, etcetera. Also, you'll usually be spending more time on coding than on schema changes, so the whole manual experience becomes a bit less painful :).
(Having said that - I'd love a better integration between Evolutions and Play/Hibernate, like having the option to record the DDL that Hibernate spits out to the evolutions directory)
Well you ask a very good question. I struggled with this problem on grails, so I have not really a solution, but some thoughts. I will start with a comparison of Evolutions with Liquibase:
Liquibase is a mature solution; even if the plugin isn't under development any more, the underlying library still is. So I think it's an acceptable solution.
If you use Evolutions you have one big disadvantage compared with Liquibase: you must write your SQL directly, so the scripts depend on your database system. Think about booleans and their representation in different databases. So you lose the benefit Hibernate gives you.
Now to the general problem. I think you have two options:
Let Hibernate handle the database structure for you. Only in cases where Hibernate can't do the job do you use Liquibase or Evolutions. Unfortunately you can run into trouble if you have complicated update scenarios. However, what you gain is faster development.
Ignore all DDL features of Hibernate and do everything with Liquibase or Evolutions. This is the most reliable and robust solution, but obviously you have much more work in development.
So what is my recommendation? I would try the following approach: Develop with a distributed version control system, like bzr or git. Then use feature branches. On feature branches, always use the Hibernate functionality. Before you merge the work into the trunk, create a Liquibase script. These scripts can be generated by Liquibase (with some manual customizing). So you can develop a feature very quickly and always have the robust solution 2 in trunk.
However, be aware that this isn't a proven approach on large projects. I only tested this strategy with Hibernate and Liquibase on a small project - it works fine.
So would be great to get feedback.
Regarding having hibernate spit out the SQL to get you started to use Evolutions or Flyway, take a look at this: http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/toolsetguide.html#toolsetguide-s1-6
EDIT: I actually made a plugin to bootstrap your migration script. I think it might be useful to most of the people that come across this thread:
http://web.ist.utl.pt/~joao.a.p.antunes/2014/08/09/play-2-2-x-jpa-hibernate-database-migration
Cheers!

Database source control with Oracle

I have been looking for hours for a way to check a database into source control. My first idea was a program for calculating database diffs, asking all the developers to implement their changes as new diff scripts. Now, I find that if I can dump a database into a file I could check it in and use it as just another type of file.
The main conditions are:
Works for Oracle 9R2
Human readable so we can use diff to see the differences. (.dmp files don't seem readable)
All tables in a batch. We have more than 200 tables.
It stores BOTH STRUCTURE AND DATA
It supports CLOB and RAW Types.
It stores procedures, packages and their bodies, functions, tables, views, indexes, constraints, sequences and synonyms.
It can be turned into an executable script to rebuild the database on a clean machine.
Not limited to really small databases (supports at least 200,000 rows)
It is not easy. I have downloaded a lot of demos that fail in one way or another.
EDIT: I wouldn't mind alternative approaches provided that they allow us to check a working system against our release DATABASE STRUCTURE AND OBJECTS + DATA in batch mode.
By the way, our project has been in development for years. Some approaches can be easily implemented when you make a fresh start but seem hard at this point.
EDIT: To better understand the problem, let's say that some users can sometimes make changes to the config data in the production environment. Or developers might create a new field or alter a view without notice in the release branch. I need to be aware of these changes or it will be complicated to merge the changes into production.
So many people try to do this sort of thing (diff schemas). My opinion is:
Source code goes into a version control tool (Subversion, CVS, Git, Perforce ...). Treat it as if it were Java or C code; it's really no different. You should have an install process that checks it out and applies it to the database.
DDL IS SOURCE CODE. It goes into the version control tool too.
Data is a grey area - lookup tables maybe should be in a version control tool. Application generated data certainly should not.
The way I do things these days is to create migration scripts similar to Ruby on Rails migrations. Put your DDL into scripts and run them to move the database between versions. Group changes for a release into a single file or set of files. Then you have a script that moves your application from version x to version y.
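A minimal sketch of that kind of migration runner, using Python with SQLite as a stand-in for the real database. The `schema_version` table, the `MIGRATIONS` list and the `customer` table are all hypothetical names for illustration; in practice each migration would be its own script file under version control.

```python
import sqlite3

# Hypothetical migrations as (version, DDL) pairs. In a real project each
# entry would live in its own file in the repository.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customer_email ON customer (email)"),
]


def current_version(conn):
    """Return the highest applied migration number, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn, target=None):
    """Apply pending migrations in order, up to `target` (default: latest)."""
    applied = []
    start = current_version(conn)
    for version, ddl in sorted(MIGRATIONS):
        if version <= start or (target is not None and version > target):
            continue
        conn.execute(ddl)
        # Record the version so the next run skips this migration.
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn, target=2))  # moves the database from version 0 to 2
    print(migrate(conn))            # applies whatever is still pending
```

Running the same script against development, test and production databases is what keeps them in step: each one only receives the migrations it has not seen yet.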
One thing I never ever do anymore (and I used to do it until I learned better) is use any GUI tools to create database objects in my development environment. Write the DDL scripts from day 1 - you will need them anyway to promote the code to test, production etc. I have seen so many people who use the GUIs to create all the objects, and come release time there is a scramble to produce scripts to create/migrate the schema correctly, scripts that are often not tested and fail!
Everyone will have their own preference for how to do this, but I have seen a lot of it done badly over the years, which formed my opinions above.
Oracle SQL Developer has a "Database Export" function. It can produce a single file which contains all DDL and data.
I use PL/SQL developer with a VCS Plug-in that integrates into Team Foundation Server, but it only has support for database objects, and not with the data itself, which usually is left out of source control anyways.
Here is the link: http://www.allroundautomations.com/bodyplsqldev.html
It may not be as slick as detecting the diffs, however we use a simple ant build file. In our current CVS branch, we'll have the "base" database code broken out into the ddl for tables and triggers and such. We'll also have the delta folder, broken out in the same manner. Starting from scratch, you can run "base" + "delta" and get the current state of the database. When you go to production, you'll simply run the "delta" build and be done. This model doesn't work uber-well if you have a huge schema and you are changing it rapidly. (Note: At least among database objects like tables, indexes and the like. For packages, procedures, functions and triggers, it works well.) Here is a sample ant task:
<target name="buildTables" description="Build Tables with primary keys and sequences">
    <sql driver="${conn.jdbc.driver}" password="${conn.user.password}"
         url="${conn.jdbc.url}" userid="${conn.user.name}"
         classpath="${app.base}/lib/${jdbc.jar.name}">
        <fileset dir="${db.dir}/ddl">
            <include name="*.sql"/>
        </fileset>
    </sql>
</target>
I think this is a case of,
You're trying to solve a problem
You've come up with a solution
You don't know how to implement the solution
so now you're asking for help on how to implement the solution
The better way to get help,
Tell us what the problem is
ask for ideas for solving the problem
pick the best solution
I can't tell what problem you're trying to solve. Sometimes it's obvious from the question; this time it certainly isn't. But I can tell you that this 'solution' will turn into its own maintenance nightmare. You may think developing the database and the app that uses it is hard, but this idea of versioning the entire database in a human-readable form is nothing short of insane.
Have you tried Oracle's Workspace Manager? Not that I have any experience with it in a production database, but I found some toy experiments with it promising.
Don't try to diff the data. Just write a trigger to store whatever-you-want-to-get when the data is changed.
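A minimal sketch of the trigger idea, in Python with SQLite standing in for Oracle (Oracle trigger syntax differs). The `setting` and `setting_audit` tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical config table that users may edit in production.
    CREATE TABLE setting (name TEXT PRIMARY KEY, value TEXT);

    -- Audit table: one row per change, written by the trigger below.
    CREATE TABLE setting_audit (
        name       TEXT,
        old_value  TEXT,
        new_value  TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER setting_update_audit AFTER UPDATE ON setting
    BEGIN
        INSERT INTO setting_audit (name, old_value, new_value)
        VALUES (OLD.name, OLD.value, NEW.value);
    END;
""")

conn.execute("INSERT INTO setting (name, value) VALUES ('timeout', '30')")
conn.execute("UPDATE setting SET value = '60' WHERE name = 'timeout'")

rows = conn.execute(
    "SELECT name, old_value, new_value FROM setting_audit"
).fetchall()
print(rows)  # [('timeout', '30', '60')]
```

The audit table then answers "what changed in production?" directly, without comparing two full data dumps.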
Expensive though it may be, a tool like TOAD for Oracle can be ideal for solving this sort of problem.
That said, my preferred solution is to start with all of the DDL (including Stored Procedure definitions) as text, managed under version control, and write scripts that will create a functioning database from source. If someone wants to modify the schema, they must, must, must commit those changes to the repository, not just modify the database directly. No exceptions! That way, if you need to build scripts that reflect updates between versions, it's a matter of taking all of the committed changes, and then adding whatever DML you need to massage any existing data to meet the changes (adding default values for new columns for existing rows, etc.) With all of the DDL (and prepopulated data) as text, collecting differences is as simple as diffing two source trees.
At my last job, I had NAnt scripts that would restore test databases, run all of the upgrade scripts that were needed, based upon the version of the database, and then dump the end result to DDL and DML. I would do the same for an empty database (to create one from scratch) and then compare the results. If the two were significantly different (the dump program wasn't perfect) I could tell immediately what changes needed to be made to the update / creation DDL and DML. While I did use database comparison tools like TOAD, they weren't as useful as hand-written SQL when I needed to produce general scripts for massaging data. (Machine-generated code can be remarkably brittle.)
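The compare-the-two-builds check can be sketched roughly as follows, in Python with SQLite as a stand-in rather than the NAnt setup described. The statement lists and the `customer` table are hypothetical, and the "dump" here is just a normalized column listing, not a full DDL/DML dump.

```python
import difflib
import sqlite3

# Hypothetical scripts: the upgrade path applied to an old database...
UPGRADE_PATH = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]
# ...and the from-scratch creation script, which has fallen out of date.
FROM_SCRATCH = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
]


def build(statements):
    """Create an in-memory database by running the statements in order."""
    conn = sqlite3.connect(":memory:")
    for statement in statements:
        conn.execute(statement)
    return conn


def dump_schema(conn):
    """Return one normalized text line per column, sorted by table name."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        info = conn.execute(f'PRAGMA table_info("{table}")')
        for _cid, name, ctype, notnull, _default, pk in info:
            lines.append(f"{table}.{name} {ctype} notnull={notnull} pk={pk}")
    return lines


def schema_diff(upgraded, fresh):
    """Unified diff between the two schema dumps; empty when they agree."""
    return list(difflib.unified_diff(
        dump_schema(upgraded), dump_schema(fresh),
        fromfile="upgraded", tofile="from_scratch", lineterm="",
    ))


if __name__ == "__main__":
    for line in schema_diff(build(UPGRADE_PATH), build(FROM_SCRATCH)):
        print(line)
```

An empty diff means the upgrade scripts and the creation scripts produce the same structure; any other output points at the script that needs fixing.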
Try RedGate's Source Control for Oracle. I've never tried the Oracle version, but the MSSQL version is really great.

Generating database tables from object definitions

I know that there are a few (automatic) ways to create a data access layer to manipulate an existing database (LINQ to SQL, Hibernate, etc...). But I'm getting kind of tired (and I believe that there should be a better way of doing things) of stuff like:
Creating/altering tables in Visio
Using Visio's "Update Database" to create/alter the database
Importing the tables into a "LINQ to SQL classes" object
Changing the code accordingly
Compiling
What about a way to generate the database schema from the objects/entities definition? I can't seem to find good references for tools like this (and I would expect some kind of built-in support in at least some frameworks).
It would be perfect if I could just:
Change the object definition
Change the code that manipulates the object
Compile (the database changes are done auto-magically)
Check out DataObjects.Net - it is designed to support exactly this case. Code only, and nothing else. Its schema upgrade layer is probably the most full-featured one you can find, and it fully abstracts the schema upgrade SQL.
Check out the product video - you'll notice nothing additional is done to sync the schema. The schema upgrade sample shows the intended usage of this feature.
You may be looking for an Object Database.
I believe this is the problem that the Microsoft Entity Framework is trying to address. Whilst not specifically designed to "Compile (the database changes are done auto-magically)", it does address the issue of handling changes to the domain model without a huge dependence on the underlying data model.
As Jason suggested, object db might be a good choice. Take a look at db4objects.
What you described is GORM. It is part of the Grails framework and is built to work with Hibernate (maybe JPA in the future). When I was first using Grails it seemed backwards. I was more comfortable with a Rails style workflow of making the tables and letting the framework generate scaffolding from the database schema. GORM persists your domain objects for you so you create and change the objects, it manages database create/update. This makes more sense now that I have gotten used to it. Sorry to tease you if you aren't looking for a new framework but it is on the roadmap for release 1.1 to make GORM available standalone.
When we built the first version of our own framework (Inon Datamanager) I had it read pre-existing SQL tables and autogenerate Java objects from them.
When my colleagues who came from a Smalltalkish background built the second version, they started from the objects and then autogenerated the tables.
Actually, they forgot about the SQL part altogether until I came back in and added it. But nowadays we just run a trigger on application startup which iterates over the object model, checks if the tables and all the right columns exist, and creates them if not. Very convenient.
This turned out to be a lot easier than you might expect - if your favourite tool doesn't support a similar process, you could probably write it in a couple of hours - assuming the relational to object mapping is relatively simple.
But the point is, it seems to depend on whether you're culturally an object person or a database person - you can regard either one as the authoritative source.
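The startup check described above (iterate over the object model, create missing tables and columns) can be sketched in a few lines. This is not the Inon Datamanager code, just an illustration in Python with SQLite; the `Customer` class, the `SQL_TYPES` mapping and `ensure_table` are hypothetical, and only additive changes are handled.

```python
import sqlite3
from dataclasses import dataclass, fields

# Assumed mapping from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Customer:
    id: int
    name: str
    email: str


def ensure_table(conn, model):
    """Create the table for `model`, or add any columns it is missing.

    Returns the names of the columns that were created.
    """
    table = model.__name__.lower()
    wanted = {f.name: SQL_TYPES[f.type] for f in fields(model)}
    # PRAGMA table_info yields no rows when the table does not exist.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if not existing:
        columns = ", ".join(f'"{name}" {ctype}' for name, ctype in wanted.items())
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        return list(wanted)
    added = [name for name in wanted if name not in existing]
    for name in added:
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {wanted[name]}')
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simulate an older database that predates the email field.
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    print(ensure_table(conn, Customer))  # ['email']
    print(ensure_table(conn, Customer))  # []
```

Dropped or renamed fields are deliberately ignored here; anything that can lose data is better handled by an explicit, reviewed script.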
Some of the really big dogs, such as ERwin Data Modeler, will go object to DB. You need to have the big bucks to afford the product though.
I kept digging around some of the "major" frameworks and it seems that Django does exactly what I was talking about. Or so it seems from this screencast.
Does anyone have any remark to make about this? Does it work well?
Yes, Django works well.
Yes, it will generate your SQL tables from your data model definitions (written in Python)
It won't always alter existing tables if you update your structure; you might have to run an ALTER TABLE manually
Ruby on Rails has an even more advanced version of these features (Rails migrations), but I don't like the framework as much; I find Ruby and Rails pretty idiosyncratic
Kind of a late answer, but here it goes:
I faced the exact same problem and ended up writing my own solution for it, though it works with .NET and SQL Server only. It basically implements the process you describe:
All DB objects are kept as embedded CREATE scripts as part of the source code
DB Objects are set up automatically (or on request) when using the data access functionality
All non-table changes are also performed automatically (or on request) at the same time
Table changes, which may require special attention to migrate data, are performed via (manually created) change scripts, also upon upgrading the database
Even manual changes made to any database object can be detected, so that schema integrity can be verified and rectified
An optional lightweight ORM can map stored procedures and objects as well as result sets (even multiple)
A command-line application helps keep the SQL source files in sync with a development database
The library, including the database, is free under an LGPL license.
http://code.google.com/p/bsn-modulestore/

Resources