Version track, automate DB schema changes with django

Version track, automate DB schema changes with django - database

I'm currently looking at the Python framework Django for future db-based web apps as well as for a port of some apps currently written in PHP. One of the nastier issues during my last years was keeping track of database schema changes and deploying these changes to productive systems. I haven't dared asking for being able to undo them too, but of course for testing and debugging that would be a great feature. From other questions here (such as this one or this one), I can see that I'm not alone and that this is not a trivial problem. Also, I found many inspirations in the answers there.
Now, as Django seems to be very powerful, does it have any tools to help with the above? Maybe it's even in their docs and I missed it?

There are at least two third party utilities to handle DB schema migrations, South and Django Evolution. I haven't tried either one, but I have heard some good things about South, though Evolution has been around a little longer.
Also, look at SchemaEvolution on the Django wiki. It is just a wiki page about migrating the db.

Last time I checked (version 0.97), syncdb will be able to add tables to sync your DB schema with your models.py file, but it cannot:
Rename or add a column on a populated DB. You need to do that by hand.
Refactorize your model (like split a table into two) and repopulate your DB accordingly.
It might be possible though to write a Django script to make the migration by playing with the two different managers, but that might take ages if your DB is large.

There was a panel session on DB schema changes at the recent DjangoCon; there is a video of the session (thanks to Google), which should provide some useful information on a number of these utilities.

And now there's also dmigrations. From announcement:
django-evolution attempts to address this problem the clever way, by detecting changes to models that are not yet reflected in the database schema and figuring out what needs to be done to bring the two back in sync. In contrast, dmigrations takes the stupid approach: it requires you to explicitly state the changes in a sequence of migrations, which will be applied in turn to bring a database up to the most recent state that reflects the underlying models.
This means extra work for developers who create migrations, but it also makes the whole process completely transparent—for our projects, we decided to go with the simplest system that could possibly work.
(My bold)

I heard lot of good about Django Schema Evolution Branch and those were opions of actual users. It mostely works out of the box and do what it should do.

U should lookup Dmigrations, it functions a little bit diffrent from django-eveoltions.
It shows you everything it is doing and for compliccated things it asks you for your intervetnions. It should be great.

Related

Good (CMS-based?) platform for simple database apps

I need to implement yet another database website. Let's say roughly 5 tables, 25 columns, and (eventually) thousands to tens of thousands of rows. Easy data entry and maintenance are more important than presentation of the data to non-privileged users. It's a niche site, so performance is not a concern. We'll have no trouble finding somewhere to host it.
So: what's a good platform for this? Intituitively I feel that there ought to be some platform that allows this to be done with no code written - some web version of MS Access. Obviously I'm happy to code business rules, and special logic that distinguishes this from every other database app.
I've looked at Drupal (with Views) and it looks possible, but with quite a bit of effort. Will look at Al Fresco next. A CMS-y platform helps because then you can nicely integrate static content, you get nice styling, plugins, etc etc.
Really good data entry (tracking changes, logging, ability to roll back, mass imports...) would be great. If authorised users could do arbitrary SQL queries (yes, I know...) that would be a big bonus. Image management support a small bonus.

Django is what you are looking for. In fact, you could probably set up what you ask without much coding at all, just configuration.
Once complete, authorised users can add 'rows' with a nice but simple GUI, or, of course, you can batch import via database commands.
I'm a Python newbie, and I've already created 2 Django-based sites. I have created more than a dozen Drupal-based sites, and Django is easier and produces significantly faster sites.

Your need somewhat sits between two chairs : bespoke application and CMS-based. I'd advocate for the CMS approach, if and only if you feel the need for content structure customization will grow in the future, slowly removing the need for direct SQL queries.
I am biased since working with eZ Publish for many years now, but it satisfies the requirements you expressed natively :
Really good data entry (tracking changes, logging, ability to roll back, mass imports...)
[...] Image management support a small bonus.
An idea of the content edition feel can be watched here:
http://ez.no/Demos-Videos/eZ-Publish-Administration-Interface-Video-Tutorial
and you can download and test-drive eZ Publish Community Edition there : http://share.ez.no/latest
It is a PHP-based solution, strong professional community (http://share.ez.no), over 1100 add-ons available on http://projects.ez.no. The underlying libs are mostly relying on Apache Zeta Components, high-quality, robust set of PHP5 libraries.
Last note : the content model is abstracted, meaning you'd not have to create a new table everytime a new type of content should be stored : a simple content class definition from the administration interface, and the rest is taken care of, including the edition interface for the new content type. Might remove the need for hardcore SQL queries ?
Hope it helped,

Drupal can do most of what you need (I don't know of a module that will let you enter arbitrary SQL queries), but you will end up with some overhead of tables and modules you don't really need. It's up to you to decide if that's a problem or not. I don't think the overhead would hurt performance in your case.
The advantages of using Drupal would be the large community, the stability of the platform and the flexibility to add more functionality when needed. Also, the large user base ensures that most code has been tested rather well.

I highly recommend Drupal. It is very simple (also internally codebase is small and clean) it has dosens of possibilities and tremendous support. Once you start with Drupal you will never go to anything else.
Note that I'm not connected with Drupal staff, I've just created dosens of Drupas sites and many of them in just a minutes. My last one took me 2 hrs, see it here http://iPadDevZone.com
UPDATE #1:
It really depends on your DB schema complexity. The best case is that you just use CCK module (part of core now) and create your node type. Node is Drupal name for content. All you do is just web admin your node type fields (text, image, numbers, dates, custom, etc). Then, if user creates content with this node type he/she can enter all the fields which are stored in separate db table fields. This is however hidden for you - if you wish not to know about it - it is just a web gui. Then you choose how the node is presented, which properties as shown and where.
Watch videos in CCK resources section in the bottom of this page: http://drupal.org/project/cck
If you need to do some programming then it is also very easy to use so called PHP code sniplets which are entered as part of your content (node) and executed when the page is displayed.
Drupal has node revisions built in the core. You can see all the versions and roll back if you wish.
You can set the permissions in quite granular level so you can control what your users may or may not.

I would take a look at Symphony. I havn't been using it myself, but it seems like it's really easy to use and to customize!
http://symphony-cms.com/

Seems to me an online database system would be better than a CMS system.
So in addition to what's been posted above:
www.quickbase.com (by Intuit) - think around $150/mo
www.rollbase.com - check on price, full featured
www.rhythmdata.com - easy to set up, but don't think it's got the advanced features you're looking for.
Good luck!
B

I appreciate these answers, but most of them are really platforms that are much better at something else (eg, Drupal really is a CMS, and has some support for custom fields - but it's not at all easy). Since this is a brand new site from scratch, it doesn't really make sense to start with something that does custom database fields as an afterthought, I think.
The closest I've found is Zoho Creator. It really is like "MS Access for Web 2.0" - and even supports importing from Access. The pricing could get expensive though. It feels like it might eventually be quite constraining. I'm still evaluating.
Are there any other products like Zoho Creator?

CMS Database design - Master database or Multi-Db per site

I am in process of designing my CMS that I am about to create. I was thinking about the database and how I want to go by approaching it.
Do you think its best to create 1 master database for all my clients websites? or Should I have 1 database per site?
What is the benefits and negatives on both approaches? I am always thinking about the future so I was thinking about implementing memcache or APC cache to the project, to offer an option to my client.
Just trying to learn the best practices and what other developers apporach would be

I've run both. My business chooses to separate client-specific data into separate tables so that if one happens to go corrupt, not all are taken down. In an ideal world this might never happen, but murphy's law....It does seem very easy to find things with them separated. You will know with 100% certainty that one client's content will never show up on another's page.
If you do go down that route, be prepared to create scripts that build and configure databases for you. There's nothing fun about building a great system and having demand for it, only to spend your time manually setting up DB's and installs all day long. Also, setting db names is one additional step that's not part of using a single db table--it's a headache that will repeat itself seemingly over and over again.

Develop the single master DB. It will take a small amount of additional effort and add a little bit more complexity to the database design, but will give you a few nice features. The biggest is being able to share data between sites.
Designing for a master database means that you have the option to combine sites when it makes sense, but also lets you install a master per site. Best of both worlds.

It depends greatly upon the amount of customization each client will require. If you forsee clients asking for many one-off features specific to their deployment, separate databases based off of a single core structure might make sense. I would highly recommend trying to make any customizations usable by all clients though, and keep all structure defined in one place/database instead of duplicating it across multiple databases. By using one database, you make updating the structure straightforward and the implementation consistent across all sites so they can all use the same CMS code.

How to organize a database in Django with multiple, distinct apps?

I'm new to Django (and databases in general), and I'm not sure how to structure the following. The sources of data I'll have for my site are:
a blog
for a few different games:
a high score list
user-created levels
If I were storing the data in ordinary files, I'd just have one file for each of the above. In Django, ideally (I think) I'd have a separate database for each of these, but apparently multiple database support isn't there for Django yet. I'm worried (unnecessarily?) about keeping everything in one database for two reasons:
If I screw something up in one of the sections, I don't want to mess up the rest of the data.
When I'm working on one of these sections, I'd like the freedom to easily change the model around. Since I've learned that syncdb doesn't, in fact, sync the database, I've decided that the easiest thing to do when messing around with a model is to simply wipe the database and start over. Again, I'm worried about messing up the other sections. I looked at south, and it seems like more trouble than it's worth during the planning stages of an app (but I'll reconsider later when there's actually valuable data).
Part of the problem is that I'm not really comfortable keeping my data in a binary format. I'm used to text, so I can easily diff it, modify it in an editor, etc., without going through some magical database interface (I'm using postgresql, by the way).
Are my fears unfounded? How do people normally handle this problem?

For what it's worth, I totally understand your frustration as I went through a really similar thought process when starting. Unfortunately, there isn't much you can do (easily, anyway) besides get familiar with the tools you'll be using.
First of all, you don't need multiple databases for what you're doing - one will do. Each app will create its own tables in the database which are somewhat isolated from one another. As czarchaic mentioned, you can do python manage.py reset app_name to reset just one of them in case you change your model. You will lose that data, though.
To get data in relatively easy to work with format, you can use the command python manage.py dumpdata > file_name.json, and then to reload it later python manage.py loaddata file_name.json. You can use this for backups, local test data, or as a poor man's migration (hint: South would be easier).
Yet another option is to use a NoSQL database for any data you think will need extra flexibility. Just keep in mind that Django doesn't have support for any of these at the moment. That means no no admin support or ModelForms. Of course, having a model may become unnecessary.

In short, your fears are unfounded. You should "organize" your database by project to use the Django term. Each model in each app will have it's own table, but they will all be in the same database. Putting them in a separate database isn't a good idea for a whole host of reasons, the biggest is that you cannot query across databases.
While I agree that south is probably a bit heavy for your initial design/dev stages it should be considered seriously for anything even resembling a beta version and absolutely necessary in production.
If you're going to be messing with your models a bunch during development the best thing to do is use fixtures to load data in quickly after running sync. Or, if you are going to be changing a bunch of required fields, then write some quick Python to create dummy data for you.
As for not trusting your data to binary, a simple "pg_dump " will get you a textual version of your data. It sounds to me like you're working on your app against live production data, which is a mistake. Your goal should be to get your application built, working, and tested on fake data or at least a copy of your production data and then when you're sure everything is golden migrate it into production. This is where things like south come in handy as you can automate this deployment and it will help you handle any database table/column changes you need to make.
I'm sure all of this sounds like a pain, but all of it is able to be automated and trust me it makes your life down the road much much easier.

I generally just reset the module
>>> python manage.py reset blog
this will reset all tables in INSTALLED_APPS.blog
I'm not sure if this answers your question but it's much lest destructive than wiping the DB.

Syncdb should really only be used for development. That's why it doesn't really matter if you wipe the tables and start again, perhaps exporting look up data into a json file that you can import each time you sync.
When your site reaches production however, you have a little more work. Any changes you make to your models that need to be reflected in the database, need to be emitted as SQL and run manually on the database. There's a django-admin.py function to emit the suggested SQL, which you can use to build up a script to run on the database to migrate it. Like you mention, a migrations app like South can be beneficial here but it's not strictly needed.
As far as your separation of sites goes, run them as separate sites/projects. You can have a separate settings file per project which allows you to run two different databases. This is in contrast to running the two sites as separate apps within the same project. If they're totally separate they probably shouldn't be in the same project unless you need to leverage common code.

How can I get my database under version control with Perl?

I've been looking at the options for getting our database schemas under version control. It seems that Ruby folks have got Rails Migrations, and .NET folks have got a few options (for instance this, this, and this). What about Perl?
I've seen this thread on PerlMonks which doesn't have much, although it mentions DBIX::Migration::Directories. Is anyone actually using this module, or some other module? Or do you roll your own DB migration solutions?
Gratuitous details:
We don't use DBIx::Class for the most part
We use MySQL
We use SVN

At work, we use a modified version of DBIx::Migration (it has some limitations, such as no more than 10 migrations). Then, you have a core schema that you've dumped from your database and when the version number is too low, you upgrade your database using the migrations from the migration schema directory.
I also highly recommend the Database Refactoring book. Amongst other things, it will give you excellent techniques for managing migrations safely in such a way that if you need to roll back, you don't lose data (such as when you drop a column you think you don't need).
To help with the automatic deprecation schedules it suggests, I've written Devel::Deprecate so that you don't need to remember when to do the deprecations. Your code will complain loudly for you (and only in testing, not in production).
Important: You'll periodically find that you're applying so many database migration levels with this technique that you'll sometimes need to "bump up" your minimum base migration because it takes too long to rebuild the database. Just take a new dump of the database at the desired migration level and remove all migrations less than or equal to that level.
Update: Fast forward a few years and today I recommend sqitch. It's designed from the ground up to handle the case of putting a database under version control without tying you to a particular programming language or VCS.

One very interesting project that's still probably a little young to rely on is Adam Kennedy's ORLite::Migrate which takes it's inspiration from Rails migrations. He wrote up a very interesting journal over at use.perl.org about his plans and I hope to keep an eye on it for the future.
It does appear that this package only works with SQLite at the moment but I think Adam's planning on building this out to be more database agnostic in the future.

In POPFile we use our own solution. We store a schema version number in the db and if the program detects that there is a newer schema, it will update the db accordingly. This is not exactly the best and most fun part of our code.
To be honest, I fail to see the advantage of using DBIx::Migration::Directories if you aren't already using DBIx::Class. You have to provide the SQL and the version numbers and the database handle. You might as well provide a little more code to find the sql file and and feed it to the database.
Of course, having the schema in version control is a great bonus.

We use a system similar to what Manni described. The two big disadvantages are:
Can't rollback schema changes (typically this is rare, not well tested and hard anyway so having to do it manually isn't a big deal IMO).
Using a sequential version number is a pain when you develop in multiple branches -- since you are using SVN this isn't as likely to be an issue as if you were using git though. :-)
The script script I use is here: database_update and there's a small example data file.

How about sqitch? It advertises itself as a "database change management application",

There is an interesting CPAN module (Database::Migrator). I have used it, and works fine in order to handle the migrations of your project.
Each migration goes into its own directory. Migrations are applied in sorted order, typically you name them starting with a number prefix. The migration directory can either contain files with SQL or Perl.

Managing the migration of breaking database changes to a database shared by old version of the same application

One of my goals is to be able to deploy a new version of a web application that runs side by side the old version. The catch is that everything shares a database. A database that in the new version tends to include significant refactoring to database tables. I would like to be rollout the new version of the application to users over time and to be able to switch them back to the old version if I need to.
Oren had a good post setting up the issue, but it ended with:
"We are still in somewhat muddy water with regards to deploying to production with regards to changes that affects the entire system, to wit, breaking database changes. I am going to discuss that in the next installment, this one got just a tad out of hand, I am afraid."
The follow-on post never came ;-). How would you go about managing the migration of breaking database changes to a database shared by old version of the same application. How would you keep the data synced up?

Read Scott Ambler's book "Refactoring Databases"; take with a pinch of salt, but there are quite a lot of good ideas in there.
The details of the solutions available depend on the DBMS you use. However, you can do things like:
create a new table (or several new tables) for the new design
create a view with the old table name that collects data from the new table(s)
create 'instead of' triggers on the view to update the new tables instead of the view
In some circumstances, you don't need a new table - you may just need triggers.

If the old version has to be maintained, the changes simply can't be breaking. That also helps when deploying a new version of a web app - if you need to roll back, it really helps if you can leave the database as it is.
Obviously this comes with significant architectural handicaps, and you will almost certainly end up with a database which shows its lineage, so to speak - but the deployment benefits are usually worth the headaches, in my experience.
It helps if you have a solid collection of integration tests for each old version involved . You should be able to run them against your migrated test database for every version which is still deemed to be "possibly live" - which may well be "every version ever" in some cases. If you're able to control deployment reasonably strictly you may get away with only having compatibility for three or four versions - in which case you can plan phasing out obsolete tables/columns etc if there's a real need. Just bear in mind the complexity of such planning against the benefits accrued.

Assuming only 2 versions of your client, I'd only keep one copy of the data in the new tables.
You can maintain the contract between the old and new apps behind views on top of the new tables.
Use before/instead of triggers to handle writes into the "old" views that actually write into the new tables.
You are maintaining 2 versions of code and must still develop your old app but it is unavoidable.
This way, there are no synchronisation issues, effectively you'd have to deal with replication conflicts between "old" and "new" schemas.
More than 2 versions becomes complicated as mentioned...

First, I would like to say that this problem is very hard and you might not find a complete answer.
Lately I've been involved in maintaining a legacy line of business application, which might soon evolve to a new version. Maintenance includes solving bugs, optimization of old code and new features, that sometimes cannot fit easily in the current application architecture. The main problem with our application is that it was poorly documented, there is no trace of changes and we are basically the 5th rotation team working on this project (we are fairly new to it).
Leaving the outer details on the side (code, layers, etc), I will try to explain a little how we are currently managing the database changes.
We have at this moment two rules that we are trying to follow:
First, is that old code (sql, stored procs, function, etc) works as is and should be kept as is, without modifying too much unless there is the case (bug or feature change), and of course, try to document it as much as possible (especially the problems like:
"WTF!, why did he do that instead of that?").
Second is that every new feature that comes in should use the best practices known at this moment, and modify the old database structure as little as it can. This would introduce some database refactoring options like using editable views on top of the old structure, introducing new extension tables for already existing ones, normalizing the structure and providing the older structure through views, etc.
Also, we are trying to write as many unit tests as we can provided the business analysts are working side by side and documenting the business rules.
Database refactoring is a very complex field to be answered in a short answer. There are a lot of books that answer all your problems, one http://databaserefactoring.com/ being pointed in one of the answers.
Later Edit: Hopefully the second rule will also answer the handling of breaking changes.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight