Creating database tables programmatically in the evolutions kingdom

Imagine a program which operates on large hierarchical datasets. The program stores each new such dataset in a dedicated table. The table is created according to the data types the dataset contains. Well, nothing very unusual; this is a trivial situation. But how do I make this kind of arrangement in Play 2.0, where the evolution paradigm rules? I just cannot begin to see how.
UPDATE
It turned out there is no simple way. OK, the roundabout way then.
Is it possible to:
1) Make the program write the evolution files itself and apply them automatically? Would that clash with Play's philosophy?
2) Use another DB system in a separate thread and not use Play's built-in database functionality? Would that hurt much?
UPDATE 2
I am reading through the MongoDB Casbah documentation and I like it a lot. I am planning to use it with my Play application. Is there any argument against using MongoDB via Casbah with Play?

That's a good question. And there's no brilliant answer, unfortunately.
Generally evolutions are good and are desirable when you work in a group. In that case you should switch to manual evolutions (not the ones generated by Ebean; in their current state they are dangerous to your data) and put as much of your initial DDL as possible into the first evolution as create statements.
In subsequent evolutions you can create new tables or alter existing ones, but for god's sake do not try to create a table that already exists :)
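For illustration, a minimal hand-written evolution might look like this (it would live in conf/evolutions/default/1.sql in a Play 2.0 project; the table and columns are made up, and the DDL is MySQL/H2-flavoured):

# --- !Ups

CREATE TABLE dataset_meta (
    id          BIGINT       NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    created_at  TIMESTAMP    NOT NULL
);

# --- !Downs

DROP TABLE dataset_meta;

Play records which evolutions have already been applied and runs the !Ups sections in order; the !Downs sections are used to revert.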
Another approach I was (and still am) thinking about is using Ebean's auto-generated DDL (which always assumes your DB is empty) to generate differential schemas with some SQL schema migration tool (e.g. MyBatis migrations), but that unfortunately requires additional effort.
The last thing I sometimes do when I'm not sure about the correct evolution syntax is keep a small scratch app where I can add similar models and watch how Ebean's plugin will treat them. Unfortunately even this won't produce proper alters, but it's better than testing on the main app.

Well, after some more experiments, I have decided to use MongoDB (actually, I had to choose from a wide variety of document-oriented DBMSs, and decided to start with MongoDB). I have set up a MongoDB server, incorporated its Java driver, Casbah (the driver's Scala wrapper) and all the necessary dependencies into my project, and everything works fine. No need for SQL or the evolutions paradigm whatsoever.
And I am not using any of the parts of Play that work with a database (the config file, Anorm, and whatever else there is); I just ignore them and do everything through Mongo.
All works JUST FINE!

Related

Save Database field changes - best practices? Versioning, Loggable?

I am using Symfony 2 with Doctrine as the ORM framework. I am searching for the best way to save changes made to database fields. I will have about 100 tables, each with about 50 fields and some thousand rows. Now I would like to save all changes made to the fields.
Possibilities I have thought about:
Doctrine extension "Loggable" - saves changes in a separate table, but I don't know whether it can cope with this number of entries.
A MySQL trigger for each table that saves changes in a new table?
But what is the best practice to save changes?
You can use either MySQL triggers or the mentioned Doctrine extension's Loggable feature. Both work, and both have pros and cons. A MySQL trigger can write into a separate table (see the MySQL trigger FAQ).
triggers:
++ framework, programming language independent
++ works when you want to modify the data by hand or by a script.
-- You have to write the triggers for every table or figure out some generic solution in SQL (I can't help with that).
-- If you are not familiar with stored procedures and procedural SQL, well, there is a learning curve
doctrine extensions:
++ Just put your annotation on classes and you're done.
++ You can query the history, revert changes through the Repository API
-- you lock yourself in to a vendor; sometimes this is a problem, sometimes it isn't
-- doesn't work when you modify the data by hand or with third-party scripts.
If the chance of switching Doctrine to something else is low, I would start with the Doctrine extension. It's a tool with the exact purpose of helping you deal with SQL, after all.
I'd suggest going with triggers, especially if you want your logging functionality to stay application independent, that is, to keep working even if you decide to rewrite your app in a different framework or a completely different programming language.
P.S. I don't know how good trigger support is in MySQL, since I switched to PostgreSQL before MySQL even had triggers.
Such a thing is commonly called "change data capture". It's been asked about with reference to MySQL before on SO:
Change Data Capture in MySQL
Maybe this answer can help you.
Different vendors make this a built-in feature to varying degrees.
The following article has a step by step explanation + sample code of doing versioning using triggers.
http://www.jasny.net/articles/versioning-mysql-data/
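As a rough sketch of the trigger approach discussed above (the invoice table, the history table and their columns are hypothetical; MySQL syntax):

-- history table: every UPDATE on invoice copies the old row in here
CREATE TABLE invoice_history (
    invoice_id  INT            NOT NULL,
    amount      DECIMAL(10,2),
    changed_at  TIMESTAMP      NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER invoice_before_update
BEFORE UPDATE ON invoice
FOR EACH ROW
    INSERT INTO invoice_history (invoice_id, amount)
    VALUES (OLD.id, OLD.amount);

In a real setup you would add one such trigger per audited table (and usually also for INSERT and DELETE), which is exactly the per-table effort mentioned in the cons above.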

Database source control with Oracle

I have been looking for hours for a way to check a database into source control. My first idea was a program for calculating database diffs and asking all the developers to implement their changes as new diff scripts. Now I find that if I can dump a database into a file, I could check it in and use it as just another type of file.
The main conditions are:
Works for Oracle 9R2
Human readable so we can use diff to see the differences (.dmp files don't seem readable).
All tables in a batch. We have more than 200 tables.
It stores BOTH STRUCTURE AND DATA
It supports CLOB and RAW Types.
It stores procedures, packages and their bodies, functions, tables, views, indexes, constraints, sequences and synonyms.
It can be turned into an executable script to rebuild the database on a clean machine.
Not limited to really small databases (supports at least 200,000 rows).
It is not easy. I have downloaded a lot of demos that fail in one way or another.
EDIT: I wouldn't mind alternative approaches provided that they allow us to check a working system against our release DATABASE STRUCTURE AND OBJECTS + DATA in batch mode.
By the way, our project has been developed for years. Some approaches can be easily implemented when you make a fresh start but seem hard at this point.
EDIT: To understand the problem better, let's say that some users can sometimes make changes to the config data in the production environment, or developers might create a new field or alter a view without notice in the release branch. I need to be aware of these changes or it will be complicated to merge them into production.
So many people try to do this sort of thing (diff schemas). My opinion is
Source code goes into a version control tool (Subversion, CVS, Git, Perforce ...). Treat it as if it were Java or C code; it's really no different. You should have an install process that checks it out and applies it to the database.
DDL IS SOURCE CODE. It goes into the version control tool too.
Data is a grey area - lookup tables maybe should be in a version control tool. Application generated data certainly should not.
The way I do things these days is to create migration scripts similar to Ruby on Rails migrations. Put your DDL into scripts and run them to move the database between versions. Group changes for a release into a single file or set of files. Then you have a script that moves your application from version x to version y.
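For illustration, one such per-release migration script might look like this (the table, the index and the version-tracking table are hypothetical; Oracle syntax):

-- hypothetical upgrade script: moves the schema from release 4 to release 5
ALTER TABLE customer ADD (loyalty_tier VARCHAR2(20) DEFAULT 'NONE' NOT NULL);

CREATE INDEX customer_loyalty_idx ON customer (loyalty_tier);

-- record the schema version the database is now at
UPDATE schema_version SET version = 5;
COMMIT;

The point is that these scripts live in version control next to the application code, and upgrading a database just means running them in order.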
One thing I never ever do any more (and I used to do it until I learned better) is use GUI tools to create database objects in my development environment. Write the DDL scripts from day 1 - you will need them anyway to promote the code to test, production etc. I have seen so many people use the GUIs to create all the objects, and come release time there is a scramble to produce scripts that create/migrate the schema correctly, scripts that are often untested and fail!
Everyone will have their own preference to how to do this, but I have seen a lot of it done badly over the years which formed my opinions above.
Oracle SQL Developer has a "Database Export" function. It can produce a single file which contains all DDL and data.
I use PL/SQL Developer with a VCS plug-in that integrates with Team Foundation Server, but it only supports database objects, not the data itself, which is usually left out of source control anyway.
Here is the link: http://www.allroundautomations.com/bodyplsqldev.html
It may not be as slick as detecting the diffs, but we use a simple ant build file. In our current CVS branch, we have the "base" database code broken out into DDL for tables, triggers and so on. We also have a delta folder, broken out in the same manner. Starting from scratch, you can run "base" + "delta" and get the current state of the database. When you go to production, you simply run the "delta" build and are done. This model doesn't work especially well if you have a huge schema and you are changing it rapidly. (Note: at least for database objects like tables, indexes and the like. For packages, procedures, functions and triggers, it works well.) Here is a sample ant task:
<target name="buildTables" description="Build Tables with primary keys and sequences">
    <sql driver="${conn.jdbc.driver}" password="${conn.user.password}"
         url="${conn.jdbc.url}" userid="${conn.user.name}"
         classpath="${app.base}/lib/${jdbc.jar.name}">
        <fileset dir="${db.dir}/ddl">
            <include name="*.sql"/>
        </fileset>
    </sql>
</target>
I think this is a case of,
You're trying to solve a problem
You've come up with a solution
You don't know how to implement the solution
so now you're asking for help on how to implement the solution
The better way to get help,
Tell us what the problem is
ask for ideas for solving the problem
pick the best solution
I can't tell what the problem you're trying to solve is. Sometimes it's obvious from the question; this one certainly isn't. But I can tell you that this 'solution' will turn into its own maintenance nightmare. If you think developing the database and the app that uses it is hard, this idea of versioning the entire database in a human-readable form is nothing short of insane.
Have you tried Oracle's Workspace Manager? Not that I have any experience with it in a production database, but I found some toy experiments with it promising.
Don't try to diff the data. Just write a trigger to store whatever-you-want-to-get when the data is changed.
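A minimal sketch of such a trigger, assuming a hypothetical config_data table with a matching config_data_history table (Oracle syntax):

CREATE OR REPLACE TRIGGER config_data_audit
BEFORE UPDATE ON config_data
FOR EACH ROW
BEGIN
    -- copy the old values into the history table before they are overwritten
    INSERT INTO config_data_history (config_id, old_value, changed_at)
    VALUES (:OLD.config_id, :OLD.config_value, SYSDATE);
END;
/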
Expensive though it may be, a tool like TOAD for Oracle can be ideal for solving this sort of problem.
That said, my preferred solution is to start with all of the DDL (including Stored Procedure definitions) as text, managed under version control, and write scripts that will create a functioning database from source. If someone wants to modify the schema, they must, must, must commit those changes to the repository, not just modify the database directly. No exceptions! That way, if you need to build scripts that reflect updates between versions, it's a matter of taking all of the committed changes, and then adding whatever DML you need to massage any existing data to meet the changes (adding default values for new columns for existing rows, etc.) With all of the DDL (and prepopulated data) as text, collecting differences is as simple as diffing two source trees.
At my last job, I had NAnt scripts that would restore test databases, run all of the upgrade scripts that were needed, based upon the version of the database, and then dump the end result to DDL and DML. I would do the same for an empty database (to create one from scratch) and then compare the results. If the two were significantly different (the dump program wasn't perfect) I could tell immediately what changes needed to be made to the update / creation DDL and DML. While I did use database comparison tools like TOAD, they weren't as useful as hand-written SQL when I needed to produce general scripts for massaging data. (Machine-generated code can be remarkably brittle.)
Try RedGate's Source Control for Oracle. I've never tried the Oracle version, but the MSSQL version is really great.

Nonrelational Databases for C++

I was thinking of starting a project that very clearly needs a persistent store. I was about to reluctantly decide on an RDBMS, when I came across an article which briefly mentions CouchDB. It seems some advancements in DB technology have happened since I last looked, so I thought I would ask here about databases before I got into it.
Here are my criteria. (I list the criteria again at the end, so if you want to skip the explanations just scroll down.)
The project is open source and I will not be asking anything for it, so preferably the database is open source and free. Furthermore the software has to run on both Linux and Windows.
There are parts of the project that have to be in C++. The project is not large enough code wise to justify using a second language. So basically the whole thing will be C++.
This project will not have anything to do with the web, so preferably the database will not require the detritus of a web library.
The objects I want to store fall into one of two categories: basic objects and container objects. The difference being that container objects contain even more objects, i.e. a parts-of-parts problem. I need a database that can handle such cases cleanly and efficiently.
I also expect the schema to evolve rapidly, at least initially. I also suspect that some of the old data simply will not fit into the new schemas, so I would like to keep different versions of the schema around. When possible, I would like to be able to transform data in one schema into another schema.
For the application to work as intended, people would have to exchange large chunks of the database with each other. So I would want simple ways of importing and exporting data, which I could automate to some degree.
Finally it would be nice if the database could in someway be simulated in unit tests.
Those are my requirements. I have replicated them below to make it easier for people answering.
Thank you
Non-technical requirements
1. Open source, preferably free.
2. Runs on Windows and Linux.
Technical requirements
1. Has a C++ interface.
2. Is able to handle a non-web application, preferably without REST.
3. Can handle a "parts of parts" problem fairly well.
4. Can handle multiple indexes.
5. Has some concept of schema versions, can handle multiple schema versions, and can migrate tables from one schema to another.
6. Should have a simple mechanism for moving data from one instance of the database to another.
7. Preferably has some mechanism for testing.
HDF5 is a binary format which behaves like a hierarchical database. It has bindings and libraries for C++ and Python (I only use the latter) and it is used to store large amounts of data, like those produced in certain physics and astronomy experiments.
http://www.hdfgroup.org/HDF5/
I've looked at a few NoSQL databases some time ago (I had a different requirement than you, though - I needed a standalone server). The ones that I remember as particularly interesting are Redis and Kyoto Cabinet. Have a look.
BTW, you don't mention any performance requirements. If performance isn't critical, have you considered SQLite? Simple, embedded, stable, and with the flexibility of SQL after all. With prepared statements the performance penalty of SQL should not be very high.
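If SQLite were used, the "parts of parts" structure from the question could be modelled with something as simple as this (the table names are made up; plain SQLite DDL):

-- a part may itself act as a container for other parts
CREATE TABLE part (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- adjacency list linking container parts to their children
CREATE TABLE part_child (
    parent_id INTEGER NOT NULL REFERENCES part(id),
    child_id  INTEGER NOT NULL REFERENCES part(id),
    PRIMARY KEY (parent_id, child_id)
);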
EDIT: ooops, just noticed that you asked this more than a year ago... Well, perhaps you can tell us what you've chosen :)

How can I get my database under version control with Perl?

I've been looking at the options for getting our database schemas under version control. It seems that Ruby folks have got Rails Migrations, and .NET folks have got a few options (for instance this, this, and this). What about Perl?
I've seen this thread on PerlMonks which doesn't have much, although it mentions DBIx::Migration::Directories. Is anyone actually using this module, or some other module? Or do you roll your own DB migration solutions?
Gratuitous details:
We don't use DBIx::Class for the most part
We use MySQL
We use SVN
At work, we use a modified version of DBIx::Migration (it has some limitations, such as no more than 10 migrations). Then, you have a core schema that you've dumped from your database and when the version number is too low, you upgrade your database using the migrations from the migration schema directory.
I also highly recommend the Database Refactoring book. Amongst other things, it will give you excellent techniques for managing migrations safely in such a way that if you need to roll back, you don't lose data (such as when you drop a column you think you don't need).
To help with the automatic deprecation schedules it suggests, I've written Devel::Deprecate so that you don't need to remember when to do the deprecations. Your code will complain loudly for you (and only in testing, not in production).
Important: You'll periodically find that you're applying so many database migration levels with this technique that you'll sometimes need to "bump up" your minimum base migration because it takes too long to rebuild the database. Just take a new dump of the database at the desired migration level and remove all migrations less than or equal to that level.
Update: Fast forward a few years and today I recommend sqitch. It's designed from the ground up to handle the case of putting a database under version control without tying you to a particular programming language or VCS.
One very interesting project that's still probably a little young to rely on is Adam Kennedy's ORLite::Migrate, which takes its inspiration from Rails migrations. He wrote up a very interesting journal entry over at use.perl.org about his plans and I hope to keep an eye on it for the future.
It does appear that this package only works with SQLite at the moment but I think Adam's planning on building this out to be more database agnostic in the future.
In POPFile we use our own solution. We store a schema version number in the db and if the program detects that there is a newer schema, it will update the db accordingly. This is not exactly the best and most fun part of our code.
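A minimal sketch of that idea (the table name is made up; the application compares the stored number with the schema version it expects and applies the missing upgrade steps in order):

CREATE TABLE schema_version (
    version INTEGER NOT NULL
);
INSERT INTO schema_version (version) VALUES (1);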
To be honest, I fail to see the advantage of using DBIx::Migration::Directories if you aren't already using DBIx::Class. You have to provide the SQL, the version numbers and the database handle. You might as well provide a little more code to find the SQL file and feed it to the database.
Of course, having the schema in version control is a great bonus.
We use a system similar to what Manni described. The two big disadvantages are:
Can't roll back schema changes (typically this is rare, not well tested and hard anyway, so having to do it manually isn't a big deal IMO).
Using a sequential version number is a pain when you develop in multiple branches -- since you are using SVN this isn't as likely to be an issue as if you were using git though. :-)
The script I use is here: database_update, and there's a small example data file.
How about sqitch? It advertises itself as a "database change management application".
There is an interesting CPAN module (Database::Migrator). I have used it, and it works fine for handling the migrations of your project.
Each migration goes into its own directory. Migrations are applied in sorted order; typically you name them starting with a number prefix. A migration directory can contain either SQL files or Perl files.

Generating database tables from object definitions

I know that there are a few (automatic) ways to create a data access layer to manipulate an existing database (LINQ to SQL, Hibernate, etc...). But I'm getting kind of tired (and I believe that there should be a better way of doing things) of stuff like:
Creating/altering tables in Visio
Using Visio's "Update Database" to create/alter the database
Importing the tables into a "LINQ to SQL classes" object
Changing the code accordingly
Compiling
What about a way to generate the database schema from the objects/entities definition? I can't seem to find good references for tools like this (and I would expect some kind of built-in support in at least some frameworks).
It would be perfect if I could just:
Change the object definition
Change the code that manipulates the object
Compile (the database changes are done auto-magically)
Check out DataObjects.Net - it is designed to support exactly this case. Code only, and nothing else. Its schema upgrade layer is probably the most fully featured one you can find, and it really does fully abstract away the schema upgrade SQL.
Check out the product video - you'll notice that nothing additional needs to be done to sync the schema. The schema upgrade sample shows the intended usage of this feature.
You may be looking for an Object Database.
I believe this is the problem that the Microsoft Entity Framework is trying to address. While not specifically designed to "compile (the database changes are done auto-magically)", it does address the issue of handling changes to the domain model without a huge dependence on the underlying data model.
As Jason suggested, object db might be a good choice. Take a look at db4objects.
What you described is GORM. It is part of the Grails framework and is built to work with Hibernate (maybe JPA in the future). When I was first using Grails it seemed backwards. I was more comfortable with a Rails style workflow of making the tables and letting the framework generate scaffolding from the database schema. GORM persists your domain objects for you so you create and change the objects, it manages database create/update. This makes more sense now that I have gotten used to it. Sorry to tease you if you aren't looking for a new framework but it is on the roadmap for release 1.1 to make GORM available standalone.
When we built the first version of our own framework (Inon Datamanager) I had it read pre-existing SQL tables and autogenerate Java objects from them.
When my colleagues who came from a Smalltalkish background built the second version, they started from the objects and then autogenerated the tables.
Actually, they forgot about the SQL part altogether until I came back in and added it. But nowadays we just run a trigger on application startup which iterates over the object model, checks if the tables and all the right columns exist, and creates them if not. Very convenient.
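A hedged sketch of the kind of check such a startup routine might perform (a standard information_schema query; the table name is hypothetical) - the application compares the result against its object model and issues CREATE TABLE / ALTER TABLE statements for whatever is missing:

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'customer';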
This turned out to be a lot easier than you might expect - if your favourite tool doesn't support a similar process, you could probably write it in a couple of hours - assuming the relational to object mapping is relatively simple.
But the point is, it seems to depend on whether you're culturally an object person or a database person - you can regard either one as the authoritative source.
Some of the really big dogs, such as ERwin Data Modeler, will go object to DB. You need to have the big bucks to afford the product though.
I kept digging around some of the "major" frameworks and it seems that Django does exactly what I was talking about. Or so it seems from this screencast.
Does anyone have any remark to make about this? Does it work well?
Yes, Django works well.
Yes, it will generate your SQL tables from your data model definitions (written in Python).
It won't always alter existing tables if you update your structure; you might have to run an ALTER TABLE manually (a one-line example follows below).
Ruby on Rails has an even more advanced version of these features (Rails migrations), but I don't like the framework as much; I find Ruby and Rails pretty idiosyncratic.
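For example, if a new field is added to a model after the table already exists, the manual step is roughly this (the table and column names are made up):

-- add the column that the updated model now expects
ALTER TABLE blog_entry ADD COLUMN view_count INTEGER NOT NULL DEFAULT 0;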
Kind of a late answer, but here it goes:
I faced the exact same problem and ended up writing my own solution for it, working with .NET and SQL Server only, however. It basically implements the process you describe:
All DB objects are kept as embedded CREATE scripts as part of the source code
DB Objects are set up automatically (or on request) when using the data access functionality
All non-table changes are also performed automatically (or on request) at the same time
Table changes, which may require special attention to migrate data, are performed via (manually created) change scripts when upgrading the database
Even manual changes made to any database object can be detected, so that schema integrity can be verified and rectified
An optional lightweight ORM can map stored procedures and objects as well as result sets (even multiple)
A command-line application helps keeping the SQL source files in sync with a development database
The library, including the database code, is free under an LGPL license.
http://code.google.com/p/bsn-modulestore/
