Start over with Laravel's migrations

I've been working on a larger Laravel project for a while and currently have over 70 migrations (using MySQL). I've never written tests (shame on me) and would like to catch up on that now.
I used foreign keys at the beginning; I renamed some and then removed them completely with subsequent migrations.
Now a temporary SQLite database should be used for testing. But since I renamed the foreign keys by name, and MySQL's naming convention differs from SQLite's, SQLite cannot find them. There are also a bunch of other errors.
I'm wondering if I could delete all my migrations and create a single migration that holds the entire database structure in its current state. Starting from scratch, so to speak.
Is that advisable? What should happen to the migrations table?

While developing, you are usually just continually adding migrations to get the structure you need. If you are at a point where you like the structure, you can definitely condense your migrations down.
Often, when someone is developing locally, they do exactly this: once they reach the point where they want to start deploying the application, they condense the migrations down, because they know the structure they need and don't have to 'go backwards'; this is the schema they need, and only future changes matter.
On any system where you haven't run these migrations you will be perfectly fine: the consolidated migration will simply run the first time and you will have your complete schema ready to go. If you already have a migrations table and don't want to roll back, you are in a pickle ... you would have to manually alter the migrations table, removing the entries for migrations that no longer exist and changing the remaining one(s) to match what you have after the consolidation. (That is, if you were hoping to be able to rollback/refresh; in reality you won't be rolling back or refreshing after a certain point, only moving forward, often because you don't want to lose data.)
There isn't really much to worry about in condensing your migrations down ... they are your schema, and if you know the schema you want, there is no need to step through 20 different migrations that might alter the same tables numerous times to get to that final result.
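As a rough sketch of that manual fix, assuming Laravel's default migrations table (which has migration and batch columns) and a consolidated migration whose name here is purely hypothetical, the cleanup could look like this:
-- Remove the history rows for migration files that no longer exist on disk,
-- then record the consolidated migration as already applied so Laravel does
-- not try to run it against the existing schema.
DELETE FROM migrations;
INSERT INTO migrations (migration, batch)
VALUES ('2016_01_01_000000_create_base_schema', 1);
On fresh systems you skip this entirely and just run php artisan migrate as usual.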

Related

Databases and "branch"

We are currently developing an application which uses a database.
Every time we update the database structure, we have to provide a script to update the database from the previous version to the current one.
So the database currently holds a number that gives us its version, and our software runs an update when we want to use an "old" database.
The issue we are encountering is when we have branches:
When we build a big new feature that will not yet be available to users (and not included in releases), we create a branch.
The main branch (trunk) is merged into it regularly to ensure that the created branch has the latest bug fixes.
Here is some illustration:
The issue is with our update scripts. They update from the previous version to the current one, then update the version number of the database.
Imagine that we have the DB version 17 when creating the branch.
We then do the branch, and make changes on the Trunk DB. The DB has now the version 18.
Then we make a db change on the branch. Since we know there has already been a new version "18", we create the version 19 and the updater 18->19.
Then the trunk is merged on the branch.
At this very moment we may have some updaters that will never run.
If someone updated his database before the merge, his database will be flagged as having version 19, so the update 17->18 will never be applied.
We want to change this behavior but we can't figure out how:
Our constraints are:
We are unable to make all changes on the same branch
Sometimes we have more than just 2 branches, and we can only merge from the trunk to the feature branch until the feature is finished
What can we do to ensure continuity between our database branches?
I think the easiest way is to use the Ruby-on-rails approach. Every DB change is a separate script file, no matter how small. Each script file is numbered, and when you do an upgrade you simply run each script from the number your DB currently is to the last one.
What this means in practice is that your DB version system stops being v18 to v19, and starts being v18.0 to v18.01, then v18.02 etc. What you release to the customer may get rolled up into a big v19 upgrade script, but as you develop, you will be making many, many small upgrades.
You'll have to modify this slightly to work for your system: each script will either have to be renumbered as it gets merged to the branch, or you will have to ensure the upgrade scripts don't simply track the last upgrade number but track each applied upgrade number individually, so missing holes still get filled in as scripts get merged across.
You will also have to roll up these little upgrades into the next major number as you create the release tag (on the trunk first) to keep things sane.
edit: so fundamentally you first have to get rid of the notion of using an upgrade script to go from version to version. For example, if you start with a table, and trunk adds column A and the branch adds column B, then you merge trunk to branch - you cannot realistically "upgrade" to the version with both, unless the branch version number is always greater than the trunk's upgrade script, and that doesn't work if you subsequently merge trunk to the branch. So you must therefore scrap the idea of a "version" that applies to development branches. The only way round that is to update each change independently, and track each change individually. Then you can say you need the "last main release plus colA plus colB" (admittedly, if you merge trunk in, you can take the current main release from trunk whether it's v18 or v19, but you still need to apply each branch update individually).
So you start with trunk at DB v18. Branch and make changes. Then you merge trunk later, where the DB is at v19. Your earlier branch changes still need to be applied (or should already be applied, but you may need to write a branch-update script with all branch changes in it, if you re-create your DB). Note the branch does not have a "v20" version number at all, and the branch's changes are not made to a single update script like you have on trunk. You can add the changes you make on the branch as a single script if you like (or one script of 'since the last trunk merge' changes) or as many little scripts.
When the branch is complete, the very last task is to take all the DB changes made for the branch and roll them up into a script that can be applied to the master upgrader; when it is merged onto trunk, that script is merged into the current upgrade script and the DB version number is bumped.
There is an alternative that may work for you: Visual Studio Database Projects. I found them a little flaky when trying to update DBs with data; sometimes the update just couldn't be managed and the DB had to be wiped and re-created (which, to be fair, is probably what would have had to happen if I had used SQL scripts at the time). A project of this type stores every part of the schema as a file, so you'll have one script per table. These are hidden from you by Visual Studio itself, which shows you designers instead of scripts, but they're stored as files in version control. VS can deploy the project and will try to upgrade your DB if it already exists. Be careful with the options: many defaults say "drop and create" instead of using ALTER to update an existing table.
These projects can generate a (largely machine-readable) SQL script for deployment; we used to generate these and deliver them to a DBA team who didn't use VS and only accepted SQL.
And lastly, there's Roundhouse, which is not something I've used, but it might serve as your new upgrader "script". It's a free project and I've read it's more powerful and easier to use than VS DB projects. It's a DB versioning and change management tool, integrates with VS, and uses SQL scripts.
We have been using the following procedure for about 1.5 years now. I don't know if this is the best solution, but we haven't had any trouble with it (except some human errors in a delta-file, like forgetting a USE-statement).
It has some similarities with the answer that Krumia gave, but differs in that in this approach only new change scripts/delta files are executed. This makes it a lot easier to write those files.
Delta files
Write all the DB-changes you make for a feature in a delta-file. You can have multiple statements in one delta-file or split them up into multiple files. Once a file is committed it's best (and once it is merged it's necessary) to start a new one and leave the old one untouched.
Put all the delta-files in one directory and give them a name-pattern like YYYY-MM-DD-HH.mm.description.sql. It's essential that you can sort them in time (hence the timestamp) so you know which file needs to be executed first. Besides that, you don't want merge conflicts with those files, so the names should be unique (across all branches).
Merging/pulling
Create a merge-script (for example a bash script) that performs the following actions:
Note the current commit-hash
Do the actual merge (or pull)
Get a list of all the delta-files that are added with this merge (git diff --stat $old_hash..HEAD -- path/to/delta-files)
Execute those delta-files, in the order specified by the timestamp
By using git to determine which files are new (and thus which database actions haven't been executed yet on the current branch) you are no longer bound to version numbering.
Alternating delta-files
It might happen that within one merge, delta-files from different branches are 'new to execute' and that those files alternate like this:
2014-08-04-delta-from-feature_A.sql
2014-08-05-delta-from-feature_B.sql
2014-08-06-delta-from-feature_A.sql
As the timestamp determines the execution order, something from feature A will be applied first, then something from feature B, then feature A again. When you write proper delta-files that are executable by themselves/stand-alone, that shouldn't be a problem.
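For illustration, a self-contained delta-file following the naming pattern above could look like this (the database, table and column names are invented):
-- 2014-08-07-09.30.add-invoice-status.sql
USE myapp_db;  -- the USE-statement that is easy to forget, as mentioned above
ALTER TABLE invoices ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'open';
UPDATE invoices SET status = 'paid' WHERE paid_at IS NOT NULL;
Because it sets its own context and makes no assumptions about what ran before it, it can be executed on any branch as soon as the merge-script picks it up.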
We recently started using SQL Server Data Tools (SSDT), which replaced the Visual Studio Database Project type, to version control our SQL databases. It creates a project for each database, with items for views and stored procedures and the ability to create Data-Tier Applications (DACPAC) that can be deployed to SQL Server instances. SSDT also supports Unit Testing and Static Data, and offers developers the option of quick sandbox testing using a LocalDB instance. There is a good TechEd video overview of the SSDT tools and a lot more resources online.
In your situation you would use SSDT to manage your database objects in version control alongside your application code, using the same merging process to push features between branches. When it comes time to upgrade an existing install, you would create the DACPACs and use the Data-Tier Application upgrade process to apply the changes. Alternatively, you could use database synchronization tools such as DBGhost or RedGate to apply updates to the existing schema.
You want database migrations. Many frameworks have plugins for this; for instance, CakePHP uses a plugin from CakeDC to manage them. Here are some generic tools: http://en.wikipedia.org/wiki/Schema_migration#Available_Tools.
If you want to roll your own, perhaps instead of keeping the current DB version in the database, you keep a list of which patches have been applied. So instead of a version table with one row containing the value 19, you have a patches table with multiple rows:
Patches
1
2
3
4
5
8
Looking at this you need to apply patches 6 and 7.
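A minimal sketch of that bookkeeping, assuming a known_patches helper table that is populated from the patch files on disk (all names here are illustrative):
CREATE TABLE known_patches   (patch_no INT PRIMARY KEY);                 -- every patch file that exists
CREATE TABLE applied_patches (patch_no INT PRIMARY KEY,
                              applied_at TIMESTAMP NOT NULL);            -- every patch already run

-- Patches that exist but have not been applied yet (6 and 7 in the example above)
SELECT k.patch_no
FROM known_patches k
LEFT JOIN applied_patches a ON a.patch_no = k.patch_no
WHERE a.patch_no IS NULL
ORDER BY k.patch_no;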
I just stumbled upon an older article written in 2008 by Jeff Atwood; hopefully it is still relevant to your problem.
Get Your Database Under Version Control
It mentions a five-part series written by K. Scott Allen:
Three rules for database work
The Baseline
Change Scripts
Views, Stored Procedures and the Like
Branching and Merging
There are tools specifically designed to deal with this type of problem.
One is DBSourceTools
DBSourceTools is a GUI utility to help developers bring SQL Server databases under source control. A powerful database scripter, code editor, sql generator, and database versioning tool. Compare Schemas, create diff scripts, edit T-SQL with ease. Better than Management Studio.
Another one:
neXtep Designer
NeXtep designer is an Integrated Development Environment for database developers. The main concept behind the product is to take advantage of versioning in order to compute the incremental SQL scripts you need to deliver your developments.
This project aims at building a development platform that provides all tools which a database developer needs while automating the tasks of generating the deliveries (= SQL resulting from a development).
To learn more about the problematic of delivering database updates, we invite you to read the Delivering database updates article which will present you our vision of best and worst practices.
I think an approach which will satisfy most of your requirements is to embrace the "Database Refactoring" concept.
There is a good book on this topic, Refactoring Databases: Evolutionary Database Design.
A database refactoring is a small change to your database schema which improves its design without changing its semantics (e.g. you don't add anything nor do you break anything). The process of database refactoring is the evolutionary improvement of your database schema so as to improve your ability to support the new needs of your customers, support evolutionary software development, and to fix existing legacy database design problems.
The book describes database refactoring from the point of view of:
Technology. It includes full source code for how to implement each refactoring at the database level and for most refactorings we show how the application would change to reflect the change in the database. Our code examples are in Oracle, Java, and Hibernate meta-data (the refactorings are easy to translate to other environments, and sometimes we discuss vendor-specific features which simplify some refactorings).
Process. It describes in detail the process of database refactoring in both the simple situation of a single application accessing the database as well as the situation of the database being accessed by many programs, many of which are out of the scope of your authority. The technical examples assume the latter situation, so if you're in the simple situation you may find some of our solutions to be a little more complicated than you need (lucky you!).
Culture. Although it is technically simple to implement individual refactorings, and clearly possible (albeit a little complicated) to adapt your internal processes to support database refactoring, the fact is that cultural challenges within your organization will likely prove to be the most difficult hurdle to overcome.
This idea may or may not work, but reading about your work so far and the previous answer, it looks like reinventing the wheel. The "wheel" is source control, with its branch, merge and version tracking features.
At the moment, for each DB schema change, you have a SQL file containing the changes from the previous one. You already mention the significant issues you have with this approach.
Replace your method with this one: maintain ONE (and only ONE!) SQL file, which stores all DDL commands for creating tables, indexes, and so on from scratch. You need to add a new field? Add an "ALTER TABLE" line to your SQL file. This way your source control tool will in effect manage your database schema, and each branch can have a different one.
All of a sudden, the source code is in sync with the database schema, branching and merging works, and so on.
Note: Just to clarify the purpose of the script mentioned here is to recreate the database from scratch up to a specific version, every single time.
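A tiny sketch of what such a single, evolving schema file might contain (the table and columns are made up):
-- schema.sql: recreates the database from scratch; lives in version control
CREATE TABLE customers (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
-- a later change on this branch is simply appended as another line:
ALTER TABLE customers ADD COLUMN email VARCHAR(255);
Branching, diffing and merging then work on this file exactly as they do for the rest of the source code.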
EDIT: I spent some time looking for material to support this approach. Here is one that looks particularly good, with a proven track record:
Database Schema Versioning Management 101
Have you seen this situation before?
Your team is writing an enterprise application around a database
Since everyone is building around the same database, the schema of the database is in flux
Everyone has their own "local" copies of the database
Every time someone changes the schema, all of these copies need the latest schema to work with the latest build of the code
Every time you deploy to a staging or production database, the schema needs to work with the latest build of the code
Factors such as schema dependencies, data changes, configuration changes, and remote developers muddy the water
How do you currently address this problem of keeping the database versions in working order? Do you suspect this is taking more time than necessary? There are many ways to approach this problem, and the answer depends on the workflow in your environment. The following article describes a distilled and simplistic methodology you can use as a starting point.
Since it can be implemented with ANSI SQL, it is database agnostic
Since it depends on scripting, it requires negligible storage management, and it can fit in your current code version management program
The database versioning method you are using is certainly wrong, in my opinion. If anything has to have versions, it should be the source code. The source code has versions. Your live environment is only an instance of the source code.
The answer is to apply database changes using redeployable change scripts.
All changes, no matter which branch they are on (even in master/trunk), should be made in a separate script.
Sequence your scripts so that newer ones will not get executed first. Having a date prefix in the format YYYYMMDD in the filename has worked for us.
When this happens, the change is made to the source code, not the database. You can have as many instances/builds for various tags/branches in the VCS as you like. For example, separate live builds for each branch.
Then you only have to do the build for each instance (probably every day). The build should fetch the files from the relevant branch and perform the compiling/deploying. Since the scripts are redeployable, old scripts have no effect on the database. Only the recent changes are deployed to the database.
But, how to make redeployable scripts?
This is a question that is hard to answer, since you have not specified which database you are using. So I will give you an example about how my organization does it.
Let me take a simple example: if we need to add a column to a particular table, we do not just write ALTER TABLE ... ADD COLUMN .... We write code to add a column, if and only if that column does not exist in the given table.
Now, we have a separate API to handle all that existence-checking boilerplate code, so our scripts are simply calls to those APIs. You will have to write your own. These APIs are not actually that hard to build (we're using Oracle RDBMS), but they give us a huge gain in version control and deployment.
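As a minimal sketch of the kind of existence check such an API wraps (Oracle-flavoured; the table and column names are made up, and a real API would of course be more general):
DECLARE
  v_count NUMBER;
BEGIN
  -- add the column only if it is not already there, so the script can be re-run safely
  SELECT COUNT(*) INTO v_count
  FROM user_tab_columns
  WHERE table_name = 'ORDERS' AND column_name = 'STATUS';

  IF v_count = 0 THEN
    EXECUTE IMMEDIATE 'ALTER TABLE orders ADD (status VARCHAR2(20))';
  END IF;
END;
/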
But, that's only one scenario; there are a gazillion ways a schema definition can change
Yes indeed. The data type of a column can change; a new table can be added; an attribute column can be merged into a primary key (very rare); sequences can change; constraints; foreign keys; they all can change.
But it turns out that all of this can be handled by APIs with special privileges to read metadata tables. I am not saying it's easy, but I am saying that it is a one-time cost.
But, how do you rollback a database change?
My personal experience is, if you put some real effort into designing before banging the keyboard to write ALTER TABLE statements, this scenario is extremely rare. And if there ever is a rollback, you should manually handle it. (e.g. manually remove added column).
Normally, changes to views and stored procedures are rather common, while changes to table definitions are rare.
Building the Database
As I said before, building the database can be done by running all the redeployable scripts. Previously deployed scripts have no effect.
Your database deployment script should not start with DROP DATABASE. Your database has lots of data which was used for unit tests. Unless you make a really really simple system, these data will be valuable in the future for testing. Your testers will not be too happy about adding ten thousand records to various tables every time a database is upgraded.
Testers aside, how are you planning to upgrade your clients'/customers' production databases without annihilating all their production data? This is why you must use redeployable change scripts.
You can try version number schemes such as 18.1-branchname etc., but they are really going to utterly fail, because you can merge your source, not its instances.
I think that the way you pose the problem makes it impossible to solve, but if you change part of your process there is a solution. Let's start with the first part: why it is impossible to solve using just deltas. In the following I assume you have the main trunk and two branches dev-a and dev-b; both branches stem from the same point in time.
Why it cannot work
Say Alice adds a delta script to dev-a:
ALTER TABLE t1 ALTER COLUMN col5 char(4)
and Bob adds another script in dev-b:
ALTER TABLE t1 ALTER COLUMN col5 int
The two scripts are clearly incompatible and you end up breaking code in main when you merge back from either of the two. The merge tool cannot help if the script files have different names.
Possible solution
My suggestion is to describe your database in terms of both baseline and deltas: the delta scripts must always refer to a specific baseline, so you are able to compute a new baseline schema resulting from the application of successive deltas to a specific baseline.
An example
dev-a *--B.A1--D.1#A1--D2#A1--------B.A2--*--B.A3--
/ /
main -- B.0 --*--------------------------*--B.1---*----------
\ /
dev-b *--B.B1--D.1#B1--B.B2--*
Note that after branching you immediately spin off a new baseline, and likewise before every merge. This way you can check that the baselines are compatible.
Final comment
Managing deltas in version control is kind of reinventing the wheel, as each delta script is functionally equivalent to saving different versions of the baseline script. That said, I agree with you that in practice they convey more value and force people to think about what happens in production when you change the database.
If you opt to store only baselines, you have plenty of tools to support you.
Another option is to serialize work on the database, either as a whole or by partitioning the schema into separate areas with unique owners.

Altering database tables on updating website

This seems to be an issue that keeps coming back in every web application: you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing this manually on the development system, but when you deploy your updated code to production servers, they'll need to alter the database tables automatically too.
I've seen a variety of ways to handle these situations, all of which come with their own benefits and problems. Roughly, I've come to the following two possibilities:
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required and table alters may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. Obvious problem is that it requires checking table properties a lot more than it needs to.
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order in which to apply the changes. I would also note that you often need two sets of scripts: one for schema updates and one for data updates. For example, if you want to add a field that is not nullable, you might add a nullable field first and then run a script to put in a default value (see the sketch after this list).
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case the rollback in the previous step goes really badly.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
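As an example of the two-script pattern from the third step above, adding a non-nullable field might be split like this (the table and column are hypothetical, and the NOT NULL syntax varies by database; this is PostgreSQL-style):
-- Script 1 (schema update): add the column as nullable first
ALTER TABLE orders ADD COLUMN priority INT;

-- Script 2 (data update): backfill existing rows, then tighten the constraint
UPDATE orders SET priority = 0 WHERE priority IS NULL;
ALTER TABLE orders ALTER COLUMN priority SET NOT NULL;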
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).

Is it a good idea to keep database migration inside VCS?

The conventional wisdom seems to be that database migrations should be kept inside the VCS - that way there is a record of all the changes the database went through.
But...
What is the use of having old migrations? I don't really see myself reverting to an old version of the db. Wouldn't it be easier to just keep them out of the VCS and create a migration queue on every machine that doesn't have to be kept in sync with everybody else's migration queues?
If you think keeping old migrations around is waste, then you don't understand migrations. Migrations allow you to roll back a change. You might not imagine doing it, but it can be necessary. Full migration histories allow collaborative teams to remain in sync regardless of which version they start from.
Trashing and starting over is one of the worst things you can do with South. You completely screw up other developers unless you explicitly tell everyone to go in and clear out the south_migrationhistory table and delete all the existing migrations and create a new init migration.
In short. Leave migrations in VCS, it's where they belong so that anyone coming into a project from any point can quickly migrate their db to the current version. Don't clean them out, they don't hurt anything and you create hassles for other collaborators by doing so.
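To make the cost of such a reset concrete: with Django South, every collaborator would have to do something along these lines after the reorganization (the app name is invented, and the management commands are shown only as comments):
-- clear the recorded history for the app whose migrations were thrown away
DELETE FROM south_migrationhistory WHERE app_name = 'myapp';
-- then fake-apply the new initial migration, e.g.
--   ./manage.py migrate myapp 0001 --fake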
I think you are right that there is little use in keeping old migrations; even if they do no harm, I also do not like unused code in my projects.
While using a migration system like South helps a lot during development, it is true that you are probably not going to need old schema migrations for consolidated features/changes (and in any case you still have the model changes in Python code available in the VCS).
From time to time I do trash and recreate all migrations in one single initial migration and start collecting new migrations again.
Keeping migrations outside of the VCS is not a good idea: even if it does not introduce problems, it will certainly slow down development.
EDIT:
Since the clean-up process requires the team to be aware of this reorganization (and to clear the migration history table), it is not advisable to do it when you can't easily reach all developers/users.

YAGNI and database creation scripts

Right now, I have code which creates the database (just a few CREATE queries on a SQLite database) in my main database access class. This seems unnecessary as I have no intention of ever using the code. I would just need it if something went wrong and I needed to recreate the database. Should I...
Leave things as they are, even though the database creation code is about a quarter of my file size.
Move the database-creation code to a separate script. It's likely I'll be running it manually if I ever need to run it again anyway, and that would put it out-of-sight-out-of-mind while working on the main code.
Delete the database-creation code and rely on revision control if I ever find myself needing it again.
I think it is best to keep the code. Even more importantly, you should maintain this code (or generate it) every time the database schema changes.
It is important for the following reasons.
You will probably be surprised how many times you need it. If you need to migrate your server, or setup another environment (e.g. TEST or DEMO), and so on.
I also find that I refer to the DDL SQL quite often when coding, particularly if I have not touched the system for a while.
You have a reference for the decisions you made, such as the indexes you created, unique keys, etc etc.
If you do not have a disciplined approach to this, I have found that the database schema can drift over time as ad-hoc changes are made, and this can cause obscure issues that are not found until you hit the database. Even worse, without a disciplined approach (i.e. a reference definition of the schema) you may find that different databases have subtly different schemas.
I would just need it if something went wrong and I needed to recreate the database.
Recreating the database is absolutely not an exceptional case. That code is part of your deployment process on a new / different system, and it represents the DB structure your code expects to work with. You should actually have integration tests that confirm this. Working indefinitely with a single DB server whose schema was created incrementally via manually dispatched SQL statements during development is not something you should rely on.
But yes, it should be separated from the access code; so option 2 is correct. The separate script can then be used by tests as well as for deployment.
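For instance, the separate script could simply be a SQL file (the table here is only an example) that both the deployment process and the test setup feed to SQLite:
-- create_schema.sql: kept under version control next to the code
CREATE TABLE IF NOT EXISTS users (
    id         INTEGER PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_users_created_at ON users (created_at);
Something like sqlite3 app.db < create_schema.sql can then recreate the structure on any machine, and an integration test can run the same file against a temporary database.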

Database source control with Oracle

I have been looking for hours for a way to check a database into source control. My first idea was a program for calculating database diffs and asking all the developers to implement their changes as new diff scripts. Now, I find that if I can dump a database into a file I could check it in and use it just like any other type of file.
The main conditions are:
Works for Oracle 9R2
Human readable so we can use diff to see the differences (.dmp files don't seem readable).
All tables in a batch. We have more than 200 tables.
It stores BOTH STRUCTURE AND DATA
It supports CLOB and RAW Types.
It stores procedures, packages and their bodies, functions, tables, views, indexes, constraints, sequences and synonyms.
It can be turned into an executable script to rebuild the database into a clean machine.
Not limited to really small databases (supports at least 200,000 rows)
It is not easy. I have downloaded a lot of demos that fail in one way or another.
EDIT: I wouldn't mind alternative approaches provided that they allow us to check a working system against our release DATABASE STRUCTURE AND OBJECTS + DATA in a batch mode.
By the way, our project has been developed for years. Some approaches can be easily implemented when you make a fresh start but seem hard at this point.
EDIT: To understand the problem better, let's say that some users can sometimes make changes to the config data in the production environment, or developers might create a new field or alter a view without notice in the release branch. I need to be aware of these changes or it will be complicated to merge them into production.
So many people try to do this sort of thing (diff schemas). My opinion is
Source code goes into a version control tool (Subversion, CVS, Git, Perforce ...). Treat it as if it were Java or C code; it's really no different. You should have an install process that checks it out and applies it to the database.
DDL IS SOURCE CODE. It goes into the version control tool too.
Data is a grey area - lookup tables maybe should be in a version control tool. Application generated data certainly should not.
The way I do things these days is to create migration scripts similar to Ruby on Rails migrations. Put your DDL into scripts and run them to move the database between versions. Group changes for a release into a single file or set of files. Then you have a script that moves your application from version x to version y.
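For instance, the roll-up for one release might be a single script such as upgrade_v18_to_v19.sql (Oracle-flavoured here, with invented object names and a schema_version bookkeeping table that is an assumption, not a given):
-- upgrade_v18_to_v19.sql: all DDL grouped for this release, applied in order
ALTER TABLE orders ADD (shipped_at DATE);
CREATE INDEX idx_orders_shipped ON orders (shipped_at);
UPDATE schema_version SET version = 19;
COMMIT;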
One thing I never do anymore (and I used to do it until I learned better) is use any GUI tools to create database objects in my development environment. Write the DDL scripts from day 1 - you will need them anyway to promote the code to test, production etc. I have seen so many people who use the GUIs to create all the objects, and come release time there is a scramble to produce scripts that create/migrate the schema correctly, scripts which are often untested and fail!
Everyone will have their own preference to how to do this, but I have seen a lot of it done badly over the years which formed my opinions above.
Oracle SQL Developer has a "Database Export" function. It can produce a single file which contains all DDL and data.
I use PL/SQL Developer with a VCS plug-in that integrates with Team Foundation Server, but it only has support for database objects, not the data itself, which is usually left out of source control anyway.
Here is the link: http://www.allroundautomations.com/bodyplsqldev.html
It may not be as slick as detecting the diffs, however we use a simple ant build file. In our current CVS branch, we'll have the "base" database code broken out into the ddl for tables and triggers and such. We'll also have the delta folder, broken out in the same manner. Starting from scratch, you can run "base" + "delta" and get the current state of the database. When you go to production, you'll simply run the "delta" build and be done. This model doesn't work uber-well if you have a huge schema and you are changing it rapidly. (Note: At least among database objects like tables, indexes and the like. For packages, procedures, functions and triggers, it works well.) Here is a sample ant task:
<target name="buildTables" description="Build Tables with primary keys and sequences">
<sql driver="${conn.jdbc.driver}" password="${conn.user.password}"
url="${conn.jdbc.url}" userid="${conn.user.name}"
classpath="${app.base}/lib/${jdbc.jar.name}">
<fileset dir="${db.dir}/ddl">
<include name="*.sql"/>
</fileset>
</sql>
</target>
I think this is a case of,
You're trying to solve a problem
You've come up with a solution
You don't know how to implement the solution
so now you're asking for help on how to implement the solution
The better way to get help,
Tell us what the problem is
ask for ideas for solving the problem
pick the best solution
I can't tell what the problem you're trying to solve is. Sometimes it's obvious from the question; this one certainly isn't. But I can tell you that this 'solution' will turn into its own maintenance nightmare. If you think developing the database and the app that uses it is hard, this idea of versioning the entire database in a human-readable form is nothing short of insane.
Have you tried Oracle's Workspace Manager? Not that I have any experience with it in a production database, but I found some toy experiments with it promising.
Don't try to diff the data. Just write a trigger to store whatever-you-want-to-get when the data is changed.
Expensive though it may be, a tool like TOAD for Oracle can be ideal for solving this sort of problem.
That said, my preferred solution is to start with all of the DDL (including Stored Procedure definitions) as text, managed under version control, and write scripts that will create a functioning database from source. If someone wants to modify the schema, they must, must, must commit those changes to the repository, not just modify the database directly. No exceptions! That way, if you need to build scripts that reflect updates between versions, it's a matter of taking all of the committed changes, and then adding whatever DML you need to massage any existing data to meet the changes (adding default values for new columns for existing rows, etc.) With all of the DDL (and prepopulated data) as text, collecting differences is as simple as diffing two source trees.
At my last job, I had NAnt scripts that would restore test databases, run all of the upgrade scripts that were needed, based upon the version of the database, and then dump the end result to DDL and DML. I would do the same for an empty database (to create one from scratch) and then compare the results. If the two were significantly different (the dump program wasn't perfect) I could tell immediately what changes needed to be made to the update / creation DDL and DML. While I did use database comparison tools like TOAD, they weren't as useful as hand-written SQL when I needed to produce general scripts for massaging data. (Machine-generated code can be remarkably brittle.)
Try RedGate's Source Control for Oracle. I've never tried the Oracle version, but the MSSQL version is really great.

Resources