Managing the migration of breaking database changes to a database shared by old version of the same application

Managing the migration of breaking database changes to a database shared by old version of the same application - database

One of my goals is to be able to deploy a new version of a web application that runs side by side the old version. The catch is that everything shares a database. A database that in the new version tends to include significant refactoring to database tables. I would like to be rollout the new version of the application to users over time and to be able to switch them back to the old version if I need to.
Oren had a good post setting up the issue, but it ended with:
"We are still in somewhat muddy water with regards to deploying to production with regards to changes that affects the entire system, to wit, breaking database changes. I am going to discuss that in the next installment, this one got just a tad out of hand, I am afraid."
The follow-on post never came ;-). How would you go about managing the migration of breaking database changes to a database shared by old version of the same application. How would you keep the data synced up?

Read Scott Ambler's book "Refactoring Databases"; take with a pinch of salt, but there are quite a lot of good ideas in there.
The details of the solutions available depend on the DBMS you use. However, you can do things like:
create a new table (or several new tables) for the new design
create a view with the old table name that collects data from the new table(s)
create 'instead of' triggers on the view to update the new tables instead of the view
In some circumstances, you don't need a new table - you may just need triggers.

If the old version has to be maintained, the changes simply can't be breaking. That also helps when deploying a new version of a web app - if you need to roll back, it really helps if you can leave the database as it is.
Obviously this comes with significant architectural handicaps, and you will almost certainly end up with a database which shows its lineage, so to speak - but the deployment benefits are usually worth the headaches, in my experience.
It helps if you have a solid collection of integration tests for each old version involved . You should be able to run them against your migrated test database for every version which is still deemed to be "possibly live" - which may well be "every version ever" in some cases. If you're able to control deployment reasonably strictly you may get away with only having compatibility for three or four versions - in which case you can plan phasing out obsolete tables/columns etc if there's a real need. Just bear in mind the complexity of such planning against the benefits accrued.

Assuming only 2 versions of your client, I'd only keep one copy of the data in the new tables.
You can maintain the contract between the old and new apps behind views on top of the new tables.
Use before/instead of triggers to handle writes into the "old" views that actually write into the new tables.
You are maintaining 2 versions of code and must still develop your old app but it is unavoidable.
This way, there are no synchronisation issues, effectively you'd have to deal with replication conflicts between "old" and "new" schemas.
More than 2 versions becomes complicated as mentioned...

First, I would like to say that this problem is very hard and you might not find a complete answer.
Lately I've been involved in maintaining a legacy line of business application, which might soon evolve to a new version. Maintenance includes solving bugs, optimization of old code and new features, that sometimes cannot fit easily in the current application architecture. The main problem with our application is that it was poorly documented, there is no trace of changes and we are basically the 5th rotation team working on this project (we are fairly new to it).
Leaving the outer details on the side (code, layers, etc), I will try to explain a little how we are currently managing the database changes.
We have at this moment two rules that we are trying to follow:
First, is that old code (sql, stored procs, function, etc) works as is and should be kept as is, without modifying too much unless there is the case (bug or feature change), and of course, try to document it as much as possible (especially the problems like:
"WTF!, why did he do that instead of that?").
Second is that every new feature that comes in should use the best practices known at this moment, and modify the old database structure as little as it can. This would introduce some database refactoring options like using editable views on top of the old structure, introducing new extension tables for already existing ones, normalizing the structure and providing the older structure through views, etc.
Also, we are trying to write as many unit tests as we can provided the business analysts are working side by side and documenting the business rules.
Database refactoring is a very complex field to be answered in a short answer. There are a lot of books that answer all your problems, one http://databaserefactoring.com/ being pointed in one of the answers.
Later Edit: Hopefully the second rule will also answer the handling of breaking changes.

Related

Altering database tables on updating website

This seems to be an issue that keeps coming back in every web application; you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing manually on the development system, but when you deploy your updated code to production servers, they'll need to automatically alter the database tables too.
I've seen a variety of ways to handle these situations, all come with their benefits and own problems. Roughly, I've come to the following two possibilities;
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required and table alters may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. Obvious problem is that it requires checking table properties a lot more than it needs to.
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?

I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order to apply the changes. Also I would note often times you need two sets of scripts, one for schema updates and one for data updates. As an example, if you want to add a field that is not nullable, you might add a nullable field first, and then run a script to put in a default value.
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case (4) goes really bad.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).

Which comes first: database or application logic?

What is the best way or recommended best practice in the flow of database driven asp.net web application? I mean the database first or coding first or side by side?

Your data access code won't compile without an existing database - unless you stub (or Mock) it. So probably the database comes first.
But it is a bad idea to do whole chunks of the application in isolation. Ideally you should design and build slivers of the system - database and application - hand-in-hand. These slivers should be cohesive sub-sets of functionality, probably smaller than sub-systems. Inevitably, the act of coding screens and business rules will throw up problems in the data model. So it is good to have a data modeller or DBA who is happy to work incrementally alongside the developers.
edit
Stephanie makes an extremely pertinent point:
"the core tables which are persisting
your app's data really can't be
piecemealed. Most of the data is known
at project start. It has a form, you
need to find it."
I agree that the core entities are knowable at project start, and the physical data model can be derived from that logical data model. But I don't think it is ever possible to nail down completely the structure of any table, even a core table, at the start. This is because at the start of the design/build phase all we have to go on are the Requirements, and if there's one thing history tells us about the Requirements it is that they will change.
So, new tables will be needed and some existing tables will become obsolete. There will be columns which need to be added, columns which need to be modified, columns which need to be dropped. This is why Nature gave us the ALTER TABLE statement.
I am not suggesting that we don't design our tables, or assemble them piecemeal. I am merely suggesting that when we start designing the HR sub-system we need to worry about the EMPLOYEES table and the SALARIES table. We don't need to concern ourselves with INVENTORY or ORDERS until we commence work on Sales.

We personally start with the Domain and do things side-by-side. The important part is that we implement vertical slices of the application (fully working end-to-end features), not horizontal slices (e.g. first the whole database layer, then the data access, then the services, then the presentation): we build the application incrementally and demonstrate progress with working code after each iteration.

Applications are all about features.
You don't build apps to store data,
but to provide functionality. If we
can't agree on that, the discussion is
moot of course. Software should be
developed to satisfy the needs of its
users and not of its developers.
Well I have really no understanding of the second sentence. If you think my company pays me a good salary to write code that satisfies me and not my users you're crazy. So that argument is a strawman. Back to the first.
This is a common view point of application centric people (they), vs. database centric people (We). They see the entire point of the exercise to "provide features". Those are things the clients know they want and ask for them. To them, the database is just persistence required for these features. And when they are done, that's it, features delivered, database is sufficient for those features. Could be an entire Rube Goldberg inside the database with redundant data, severe violations of normal forms, constraints enforced by the application, what have you.
think overall usability alone outweighs database design
If the design of your database is affecting your usability than the design was bad. I have no doubt that one who strives for features will leave the database in such a state that it severely hampers usability.
Data Centric people, don't look at a system as a place to provide only what's been asked for, but a repository of Intellectual Capital that can be exploited by more than whatever the Application-du-jour is. I can't begin to describe the number of cases where one team has used the database of some other team's app to enhance their apps value. Just look at all the medical research that is nothing more that the meta-analysis of existing studies. None of that is possible if you believe that only the features of your app matter and subsequent uses of your apps data do not.
A good data model isn't inviolate. Sure you'll add to it, change it when requirements change. But if you don't completely understand your data, I don't know how anyone can begin to write code.

I guess you need first to define datamodel and only then going coding. You should plan everything carefully before actually writting the code.

First is a feature list.
Then, detailed spec.
Then test plan and design of all, including databases.
Then, it wouldn't matter which to implement first.

You'll probably end up doing it "side by side".
You need some data to be able to test the application, but you need the application to be able to verify that you're storing the correct data.
Do some modelling first and then build the minimum you can for one or two features. Then when these are working add the next feature and so on.
You'll need to write some database update procedures (both the code and the rules about what and when to update) as you will have to extend your tables, but you'll need those for the final system anyway as it will have to change as new requirements come along.

Having done it quite a few times, I find myself invariably doing it like so:
Define the problem I'm trying to solve.
Write out some use-cases.
Have my significant other or a friend tell me if this is even a problem.
Sketch out a few sample screens.
Write flow diagrams for the use cases.
Ask my Rubber-duck questions.
Use questions to refine 1-6.
Write out the 'nouns'. Those become my data Model.
Write out the actions. Those become application logic.
Code data Model.
Code Application Logic.
Realize I've gotten it a little wrong.
Repeat 10-12 as many times as needed.
Ask, "Have I solved the problem"?
If not, rinse, lather and repeat 1-15.

This is a trick question. IMO, they both come in parallel during your planning and design phase. They are so closely related that it make sense to do them together. Just keep in mind that your database design will be almost fully developed while your code is still in its infancy (though your application logic should be almost fully mapped out in you head or on paper)
The idea is that you're designing your solution in the context of the problem. When you're planning out your solution you will be (or should be) defining your application as a set of things and actions (nouns and verbs).
For example, a very basic helpdesk program has people and tickets. People need to create tickets, update tickets, and close tickets. The nouns that require persistent storage will comprise your database, and the nouns + actions will be contained in your application.
Sometimes your table mappings and the relationship between tables will be obvious (IE people create tickets, ticket.creatorID = people.personID) and other times the relationship doesn't really click in your head until you start working through use cases or until you start writing your code (IE different ppl have different access levels defining what they can do. At a glance this would seem like a simple field in a table, but in practice it is better as a separate table).

Best way to design database tables with interbase?

I am about to start redoing a company database in a proper fashion. Our current database is a mess and has little to no documentation. I was wondering what people recommend to use when designing an Interbase database? Is there some sort of good visual schema designer that will generate the SQL? Is it better to do it all by hand?
Basically, what are the steps people usually take when designing and documenting a database? If it matters, I intend to use Hibernate as an ORM for the database. (Specific tips with Interbase would be appreciated as well).
thanks!

Usually, I use a text editor. Occasionally, I use Database Workbench. Last I heard, Embarcadero was going to add InterBase support to some of their database modeling tools, but I don't know if that has shipped yet.

If this is a brand new database app that has recently been created and is not being relied on in the business, then I say full steam ahead with your re-write / new database. I suspect however that you are dealing with a database that has been around for a few years and is heavily used.
If I am right about the database being a few years old, I strongly recommend against starting from scratch. Almost any production database that has been around a few years will be "messy". This is usually because the real world requirements for programs usually demand the solutions to be somewhat messy. This will be true of your brand new database (should you go this route) a few years from now as well.
Here are some reasons I would not recreate a production database from scratch:
The live database contains years worth of transactions and customer data that is very valuable. It will be very difficult to transfer this data into a completely different database structure. Believe me, even if the company tells you now they will not need to access this old data, they will.
Many business rules have probably been built into the database structure, in the form of defaults, triggers, stored procedures, even the data types of the columns, and without examining these very carefully and documenting them, you are likely to leave them out of your new database and spend much time debugging and adding these in when people start using the system and discover the rules are not being applier properly
You are liable to make mistakes in your new database design, or realise later that the structure needs to change to accommodate a new feature. If you have been making changes to your current database, and learning from that, future changes become easier and more intuitive.
Here is the approach I recommend:
Understand and document the current database, which will give you a really good understanding of the information flows in your business.
When you see what appears to be bad or messy design, look at it carefully. You may be right, and see potential for change, or you might find a trade off was made for performance or other reasons, and you can learn from this.
Make incremental improvements to the database structure, being sure to update documentation, alter the programs that rely on those areas (or work with your programmer if that isn't you).
I know this seems like a very long way around, but take it from someone who has been maintaining and creating databases for 12 years now - your current database is probably messy because the real-world requirements are messy.

Using a common database for collaborative development

Some of the people in my project seem to think that using a common development database with everyone connecting to it is the best thing. I think that it isn't and each developer having his own database (with periodic updated data dumps) is the best. Am I right or wrong? Have you encountered any problems in any of these approaches?

Disk space and CPU should be cheap enough that every developer can run their own instance of the database, with an automated build under version control. This is needed to allow developers to be bold in hacking on the database, in isolation from any other developer's concurrent hacking.
The caveat being, of course, that any changes they make to their private instance are useless to anyone else unless it can be automatically applied during the build process. So there needs to be a firm policy that application code can't depend on any database state unless that state is represented by version-controlled, unit-tested changes to the DDL.
For an excellent guide on the theory and practice of treating the database definition as another part of the project code, and coordinating changes and refactorings, see Refactoring Databases: Evolutionary Database Design by Scott W. Ambler and Pramod Sadalage.

I like having my own copy of the database for development, because it gives you the flexibility to rapidly change things without worrying how it will impact others.
However, if all the developers are hacking away on their own copy of the database, it becomes more and more difficult to merge everyone's work together in the end.
I think you can get the best of both worlds by letting developers work on a local copy during day-to-day development, but each developer should probably merge their work into a common copy on a pretty regular basis. Writing a lot of unit tests helps too.

We share a single database amongst all our developer (20-odd) but we've got it structured so that everyone has their own tables.
You don't need a separate database per developer if you structure the application right. It should be configurable which database or table-prefix it uses anyway so you can easily move it between instances (unit test, system test, acceptance test, production, disaster recovery and so on).
The advantage to using a single database is that the cost of maintenance is amortized. You don't have your DBAs trying to handle a lot of databases (or, if you're a small-DB shop, you don't have every developer trying to maintain their own database when they're better utilized in developing).

Having a single point of Failure is not a good thing isn't it?

I prefer a single, shared database. But it's very dependent on the situation and the applications being developed.
What works for me may not work for you. Go with your gut.

If you are working with Hibernate or any hibernate-based platform you can configure your database to be created when you start your server (create-drop option). This is very useful when you are adding new attributes to your classes. If this is the case each developer must have his own copy of the DB.
If you are not changing the DB structure at all then you can use a single shared DB.
In this second case is not a must. I prefer to have my own DB where I can do whatever I want. On the other hand remember that some queries can take a lot of time and this will affect your whole team if you are sharing a DB.

Version track, automate DB schema changes with django

I'm currently looking at the Python framework Django for future db-based web apps as well as for a port of some apps currently written in PHP. One of the nastier issues during my last years was keeping track of database schema changes and deploying these changes to productive systems. I haven't dared asking for being able to undo them too, but of course for testing and debugging that would be a great feature. From other questions here (such as this one or this one), I can see that I'm not alone and that this is not a trivial problem. Also, I found many inspirations in the answers there.
Now, as Django seems to be very powerful, does it have any tools to help with the above? Maybe it's even in their docs and I missed it?

There are at least two third party utilities to handle DB schema migrations, South and Django Evolution. I haven't tried either one, but I have heard some good things about South, though Evolution has been around a little longer.
Also, look at SchemaEvolution on the Django wiki. It is just a wiki page about migrating the db.

Last time I checked (version 0.97), syncdb will be able to add tables to sync your DB schema with your models.py file, but it cannot:
Rename or add a column on a populated DB. You need to do that by hand.
Refactorize your model (like split a table into two) and repopulate your DB accordingly.
It might be possible though to write a Django script to make the migration by playing with the two different managers, but that might take ages if your DB is large.

There was a panel session on DB schema changes at the recent DjangoCon; there is a video of the session (thanks to Google), which should provide some useful information on a number of these utilities.

And now there's also dmigrations. From announcement:
django-evolution attempts to address this problem the clever way, by detecting changes to models that are not yet reflected in the database schema and figuring out what needs to be done to bring the two back in sync. In contrast, dmigrations takes the stupid approach: it requires you to explicitly state the changes in a sequence of migrations, which will be applied in turn to bring a database up to the most recent state that reflects the underlying models.
This means extra work for developers who create migrations, but it also makes the whole process completely transparent—for our projects, we decided to go with the simplest system that could possibly work.
(My bold)

I heard lot of good about Django Schema Evolution Branch and those were opions of actual users. It mostely works out of the box and do what it should do.

U should lookup Dmigrations, it functions a little bit diffrent from django-eveoltions.
It shows you everything it is doing and for compliccated things it asks you for your intervetnions. It should be great.