Flyway best practice: one large migration script vs many incremental ones - database

I have a task to add some new migration scripts to an existing application using Flyway, roughly as follows:
Update 4 records in one table for one logical task (JIRA ticket #1)
Update another record in the same table for another logical task (JIRA ticket #2)
Update record(s) in another table for (JIRA ticket #2)
There are 2 alternatives:
Keep the 3 changes in separate migration scripts.
Lump all the changes together in one large all-encompassing script.
I would like to keep these changes in 3 separate migration scripts for the sake of logical delineation, and because if something fails during the execution of any of these scripts I don't have to create another large correction script to fix the failure(s), which could get quite messy.
On the other hand, these 3 tasks all belong to the same piece of work (bigger parent JIRA task) and go to production together.
Could someone share their experience and opinion as to what is the best practice - keep 3 migration scripts or put it all together, with pros and cons of each approach if possible please?
I found this article, but it sheds no light on my specific query:
https://dbabulletin.com/index.php/2018/03/29/best-practices-using-flyway-for-database-migrations/

This is going to be partly opinion, so we'll just have to deal with that within SO and the rules.
I would absolutely break it up into smaller scripts. There are two reasons for this. First, when one of those sets of changes goes south (yeah, probably not, but what if), you can divorce that change from all the others much more easily if it's in its own script. Of course you'd still group the changes so they're all 7.1, 7.2, 7.3 (or whatever), but being able to pull stuff apart is easier if it's already pulled apart.
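To make that grouping concrete, here's a minimal sketch of what the three scripts might look like as separate Flyway versioned migrations (the version numbers, file names and SQL below are placeholders, not your actual changes):

    -- V7_1__ticket_1_update_table_a.sql  (hypothetical version/name)
    UPDATE table_a SET status = 'PROCESSED' WHERE id IN (101, 102, 103, 104);

    -- V7_2__ticket_2_update_table_a.sql
    UPDATE table_a SET status = 'PROCESSED' WHERE id = 105;

    -- V7_3__ticket_2_update_table_b.sql
    UPDATE table_b SET category = 'NEW' WHERE category IS NULL;

Flyway applies these in version order and records each one separately in its schema history table, so if V7_2 fails you only have to repair that one script rather than untangle a combined one.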
Second, if you get into a situation where you're doing piecemeal deployments through cherry picking, you're also going to want to break down the scripts into individual sets of changes. Same reason as above.
That's not to say I'd always, ever, only do individual changes. But the kind of disparate changes you're listing, ones that aren't related to one another directly and could, in fact, be pulled apart, those, I'd keep apart.
Partly opinion, partly an understanding of how things fall apart or need to be changed. I hope it's an answer within the SO rules.

This is more opinion than hard fact, but what I've usually come across as "best practice" is to keep scripts scoped to either a feature or a set of dependent commands. This way, if a command breaks the migration, you don't lose too much progress, and it can help pinpoint where your issue is.

Related

Altering database tables on updating website

This seems to be an issue that keeps coming back in every web application: you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing this manually on the development system, but when you deploy your updated code to production servers, they'll need to automatically alter the database tables too.
I've seen a variety of ways to handle these situations, and all come with their own benefits and problems. Roughly, I've come to the following two possibilities:
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. The benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required, and table alterations may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. The obvious problem is that it checks table properties far more often than it needs to.
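For illustration, the runtime check in the second option usually amounts to something like this (a T-SQL flavoured sketch; the table and column names are made up):

    -- Only add the column if it isn't there yet, so the check can run on every deploy.
    IF NOT EXISTS (SELECT 1
                   FROM INFORMATION_SCHEMA.COLUMNS
                   WHERE TABLE_NAME = 'orders'
                     AND COLUMN_NAME = 'shipped_at')
    BEGIN
        ALTER TABLE orders ADD shipped_at DATETIME NULL;
    END;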
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order in which to apply the changes. I would also note that you often need two sets of scripts, one for schema updates and one for data updates. For example, if you want to add a field that is not nullable, you might add a nullable field first and then run a script to put in a default value (see the sketch after this list).
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case (4) goes really bad.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
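As a rough sketch of steps (3) and (4), using T-SQL syntax and invented table/column names (not a recipe for your schema):

    -- Schema update script: add the new column as nullable first.
    ALTER TABLE customer ADD middle_name VARCHAR(50) NULL;

    -- Data update script: backfill a value so the constraint can be tightened.
    UPDATE customer SET middle_name = '' WHERE middle_name IS NULL;
    ALTER TABLE customer ALTER COLUMN middle_name VARCHAR(50) NOT NULL;

    -- Matching rollback script, kept on hand in case the release has to be reverted.
    ALTER TABLE customer DROP COLUMN middle_name;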
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).

Change in database structure

We already have a database structure, but it is not normalized, is very confusing, and needs to change. However, it already holds a large volume of stored data, for example all of the company's financial data, which the finance department staff are afraid of losing.
We are undecided between remodeling the entire structure of the database, salvaging as much as possible, or continuing with the same model along with its problems.
I wonder whether someone has made a change like this, and whether you can actually transfer the data to a new structure.
Thanks
Before you do anything I would BACKUP!!! Next I would create a new database with the ideas that you had in mind. Remember, this is where all the real work should be; once it is created it is hard to go back. Put a lot of thought in and make the design bulletproof for your company. Next, create some procedures to transform your existing data into the new database as you see fit. It would help if you mentioned the platform(s) you are using and maybe provided some generic examples.
I have found SSIS packages work well for projects like this if you are using SQL Server. While you will still need to write out your transforms, the packages make it easier for others to see what is happening.
Anything can be done by you, the developer. However, it might make business sense to check out various 3rd-party tools. There are many out there, and depending on exactly what you are doing you may benefit from doing some research.
Yes, it's called "database conversion". It is a very common practice, but it must be done carefully and methodically, ideally by someone who has done many of them and knows the pitfalls. It is not to be done casually by any means. Moreover, it is not unusual in the financial sector to run the "old system" in parallel with the new system for a couple of months, to reconcile month-end reports, before saying goodbye to the old system. Running parallel is a PITA, and can only be done if all of the conversion programs are in place, but it's better to be safe than sorry when the numbers must be correct to the penny.
I had the same problem; the way I solved it was by designing a new database, then making a script that copies the data from the old schema to the new one. It's not an easy task, because you need to take care of what you are copying from the old model to the new one, but it's doable!
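A bare-bones sketch of one piece of such a copy script (SQL Server style three-part names; the databases, tables and columns here are invented):

    -- Copy customers from the old schema into the new one, cleaning up as we go.
    INSERT INTO new_db.dbo.customer (customer_id, full_name, created_at)
    SELECT o.id,
           LTRIM(RTRIM(o.first_name + ' ' + o.last_name)),
           o.created_date
    FROM   old_db.dbo.tbl_customers AS o
    WHERE  NOT EXISTS (SELECT 1
                       FROM new_db.dbo.customer AS n
                       WHERE n.customer_id = o.id);  -- keeps the script re-runnable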
Absolutely, you can migrate the data to a new structure. The real question is 'how difficult (expensive/time-consuming/reliable) will the migration be?' To answer that question one would have to know:
The accuracy of the existing data - does it have gaps, duplicated records that disagree with each other with no way to resolve them, errors, etc.?
What structure you imagine moving to, and whether this will introduce complexity to the migration.
The skill level of the person/team doing the migration.
How long the migration will take and whether the platforms will be changing (either the live system being modified or the new system design changing).

How many tables/sprocs/functions in a database is too many?

I'm interested in database refactoring. I deal with several databases that don't have a large amount of data, just a few GB with at most a few hundred thousand rows. However, they have hundreds -- sometimes many hundreds -- of tables, views, sprocs and functions. In some places a divide-and-rule strategy using schemas has been implemented, which has helped with some problems of seeing ownership/usage of tables. However, it hasn't really helped with object coupling.
We all read that integration via a shared database isn't A Good Thing, but we also know that it is, at least for a while, a very productive thing, as everything is in the database. We just don't apply the Single Responsibility Principle to databases like we do to objects.
Edit: I should add that I have no database performance issues. The tables are not large, the biggest has only a few hundred thousand rows. There is no real database performance issue; except when the database schema/logic/implementation is grotesquely inefficient (say requiring a cursor to do a sproc execution for each row in a result set in order to pre-process data for a report). Before you say I should change these, that is the whole point: I can't because the database is no longer in a state where the impact of changes can be assessed.
Clearly at some point you say "Enough!" and divide into multiple databases connected by messages, ETL, application tiers, etc.
The question is: how many is too many? What is the absolute upper limit of the number of sprocs/tables/functions that you can have before you go insane?
First, stop trying to think of databases in object oriented terms. Principles of object oriented programming simply do NOT apply to relational databases.
Shared databases are a very good thing from a business perspective. Multiple databases storing information that has to be transferred between them quickly becomes way more complex than your piddly many hundreds of objects. Data that is consistent between enterprise applications is priceless. Trying to reconcile if GE Corp and General Electric Corporation are really the same entity between two databases can be a nightmare.
Refactoring databases is a nice goal, but it is very complex in reality. Don't do it unless you have a major performance issue that needs to be addressed or unless you are willing to commit to a process of identifying all the code that might be affected by a change. Even then, consider whether you can know all the code that might change (this is one reason why database people hate, hate, hate dynamic code!).
Often the best way to refactor is to add your change and start changing over to using your new field, sp, etc. while leaving the old one in place until a set expiration date. Since you are on an annual cycle, you will need to manage those dates over a long period of time. To see if sps are being used, you can identify the ones you aren't sure of and add some code to them to insert into a table every time they are run. If, after your whole year-long cycle, they haven't been run, you can safely eliminate them. The cycle may be shorter depending on the sp.
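Something along these lines is what I mean by logging the runs (a T-SQL sketch; the log table is invented, and usp_attendance_report is just the example proc name used below):

    -- One-off logging table.
    CREATE TABLE dbo.SprocUsageLog (
        SprocName SYSNAME  NOT NULL,
        RunAt     DATETIME NOT NULL DEFAULT (GETDATE())
    );
    GO

    -- Add a single insert at the top of any proc you suspect is dead.
    ALTER PROCEDURE dbo.usp_attendance_report
    AS
    BEGIN
        INSERT INTO dbo.SprocUsageLog (SprocName)
        VALUES (OBJECT_NAME(@@PROCID));

        -- ... the original body of the procedure continues unchanged here ...
        SELECT 1 AS placeholder;  -- stands in for the real logic in this sketch
    END;
    GO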
If I'm writing something that will only be run annually, I would normally put the word annual in the sp name. That may not be true where you are; however, the function of the sp should give you an idea of whether it is something that should only be run periodically. I wouldn't expect a usp_send_email proc to run only once a year, but I might expect that a usp_attendance_report might not be run often. Of course, as I said, I would have named it something more like usp_annual_attendance_report, and you can consider doing that sort of thing moving forward.
But be aware that any refactoring you do will have to take place on a long cycle to ensure that you don't delete something you need. If your code is in a source control system (and all database tables, sp, views, UDFs, triggers, etc should be), you can probably eliminate some things knowing that if they fail you can pretty instantly put them back. Again, I'd examine the object to determine the possible risk eliminating them would have.
Of course if you have good automated tests in place, eliminating something on dev and running the tests can help you find out if something is still being referenced.
If you are looking for an easy way to refactor, I don't know of one. Refactoring databases is a time-consuming, risky activity and one which may not show enough improvement for the powers that be to be willing to pay for it.
A good book on refactoring databases is: http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321293533
I'm not sure there is a magical limit for any of the things you mentioned. I prefer to keep things in one place so I don't have to remember that some records are in one place and other records are in another.
I'd be more interested to know if all this work is impacting your performance? And if it's not then why change it? Unless it's impacting performance in some horrible way your customers won't see any benefit from your work and then what's the point?
Your customers might be better served if you just bought a new machine or upgraded your database server software.

django AuditTrail vs Reversion

I am working on a new web app where I need to store any changes to the database in audit table(s). The purpose of such audit tables is that later, in a real physical audit, we can ascertain what happened in a situation, who edited what, and what the state of the DB was at the time of e.g. a complex calculation.
So mostly the audit tables will be written to and not read. Reports may sometimes be generated from them, though.
I have looked at the available solutions:
AuditTrail - simple, and that is why I am inclining towards it; I can understand its single-file code.
Reversion - looks simple enough to use, but I'm not sure how easy it would be to modify if needed.
rcsField - seems to be very complex and too much for my needs.
I haven't tried any of these, so I wanted to hear some real experiences and which one I should be using. E.g. which one is faster, uses less space, and is easy to extend and maintain?
Personally, I prefer to create audit tables in the database and populate them through triggers, so that any change, even ad hoc queries from the query window, is stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface, but on the back end directly. Far more of this stuff happens from disgruntled or larcenous employees than outside hackers. If you are using an ORM already, your data is at risk because the permissions are at the table level rather than at the sp level where they belong. Therefore it is even more important that you capture any possible change to the data, not just what came from the GUI. We have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables store only the changes and not the whole record, we do not need to change them every time a field is added.
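A bare-bones sketch of that approach in T-SQL (the table, columns and trigger name are invented; our real versions are generated dynamically per table):

    -- Audit table that stores only the changed values, not the whole row.
    CREATE TABLE dbo.Customer_Audit (
        AuditId    INT IDENTITY(1,1) PRIMARY KEY,
        CustomerId INT          NOT NULL,
        OldEmail   VARCHAR(255) NULL,
        NewEmail   VARCHAR(255) NULL,
        ChangedBy  SYSNAME      NOT NULL DEFAULT (SUSER_SNAME()),
        ChangedAt  DATETIME     NOT NULL DEFAULT (GETDATE())
    );
    GO

    -- Fires for any update, whether it comes from the app or an ad hoc query.
    CREATE TRIGGER trg_Customer_Audit
    ON dbo.Customer
    AFTER UPDATE
    AS
    BEGIN
        INSERT INTO dbo.Customer_Audit (CustomerId, OldEmail, NewEmail)
        SELECT d.CustomerId, d.Email, i.Email
        FROM   deleted  AS d
        JOIN   inserted AS i ON i.CustomerId = d.CustomerId
        WHERE  ISNULL(d.Email, '') <> ISNULL(i.Email, '');
    END;
    GO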
Also, when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do with them. Also consider how hard it will be to maintain the information as the database schema changes.
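With an audit table shaped like the sketch above, undoing one specific change becomes a targeted update (the AuditId value is of course made up):

    UPDATE c
    SET    c.Email = a.OldEmail
    FROM   dbo.Customer       AS c
    JOIN   dbo.Customer_Audit AS a ON a.CustomerId = c.CustomerId
    WHERE  a.AuditId = 12345;  -- the audit entry you want to undo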
Choosing a solution because it appears to be the easiest to understand is not generally a good idea. That should be the lowest of your selection criteria, after meeting the requirements, security, etc.
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords developed by the same author (Marty Alchin aka #gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As I stated in my question, rcsField seems to be too much for my needs, which are simple: I want to store any changes to my tables, and maybe come back later to those changes to generate some reports.
So I tested AuditTrail and Reversion.
Reversion seems to be a more full-blown application with many features (which I do not need). Also, as far as I know, it saves data in a single table in XML or YAML format, which I think:
will generate too much data in a single table;
means I may not be able to use already-present DB tools to read that data.
AuditTrail wins in that regard: for each table it generates a corresponding audit table, so changes can be tracked easily, the per-table data is smaller, and it can be easily manipulated and used for report generation.
So I am going with AuditTrail.

Managing the migration of breaking database changes to a database shared by old version of the same application

One of my goals is to be able to deploy a new version of a web application that runs side by side with the old version. The catch is that everything shares a database, a database that in the new version tends to include significant refactoring of database tables. I would like to be able to roll out the new version of the application to users over time and to be able to switch them back to the old version if I need to.
Oren had a good post setting up the issue, but it ended with:
"We are still in somewhat muddy water with regards to deploying to production with regards to changes that affects the entire system, to wit, breaking database changes. I am going to discuss that in the next installment, this one got just a tad out of hand, I am afraid."
The follow-on post never came ;-). How would you go about managing the migration of breaking database changes to a database shared by an old version of the same application? How would you keep the data synced up?
Read Scott Ambler's book "Refactoring Databases"; take with a pinch of salt, but there are quite a lot of good ideas in there.
The details of the solutions available depend on the DBMS you use. However, you can do things like:
create a new table (or several new tables) for the new design
create a view with the old table name that collects data from the new table(s)
create 'instead of' triggers on the view to update the new tables instead of the view
In some circumstances, you don't need a new table - you may just need triggers.
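A rough sketch of the view-plus-trigger idea above, in SQL Server syntax with invented names (other DBMSs spell the details differently):

    -- New, refactored table.
    CREATE TABLE dbo.ProductNew (
        ProductId   INT          NOT NULL PRIMARY KEY,
        Description VARCHAR(500) NOT NULL
    );
    GO

    -- View that keeps the old table name and old column name for the old app
    -- (the original dbo.Product table would be dropped once its data is copied over).
    CREATE VIEW dbo.Product
    AS
    SELECT ProductId,
           Description AS Descr
    FROM   dbo.ProductNew;
    GO

    -- 'Instead of' trigger so inserts through the old name land in the new table.
    CREATE TRIGGER trg_Product_Insert
    ON dbo.Product
    INSTEAD OF INSERT
    AS
    BEGIN
        INSERT INTO dbo.ProductNew (ProductId, Description)
        SELECT ProductId, Descr
        FROM   inserted;
    END;
    GO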
If the old version has to be maintained, the changes simply can't be breaking. That also helps when deploying a new version of a web app - if you need to roll back, it really helps if you can leave the database as it is.
Obviously this comes with significant architectural handicaps, and you will almost certainly end up with a database which shows its lineage, so to speak - but the deployment benefits are usually worth the headaches, in my experience.
It helps if you have a solid collection of integration tests for each old version involved. You should be able to run them against your migrated test database for every version which is still deemed to be "possibly live" - which may well be "every version ever" in some cases. If you're able to control deployment reasonably strictly, you may get away with only having compatibility for three or four versions - in which case you can plan phasing out obsolete tables/columns etc. if there's a real need. Just bear in mind the complexity of such planning against the benefits accrued.
Assuming only 2 versions of your client, I'd only keep one copy of the data in the new tables.
You can maintain the contract between the old and new apps behind views on top of the new tables.
Use before/instead of triggers to handle writes into the "old" views that actually write into the new tables.
You are maintaining 2 versions of code and must still develop your old app but it is unavoidable.
This way there are no synchronisation issues; otherwise you'd effectively have to deal with replication conflicts between the "old" and "new" schemas.
More than 2 versions becomes complicated as mentioned...
First, I would like to say that this problem is very hard and you might not find a complete answer.
Lately I've been involved in maintaining a legacy line-of-business application, which might soon evolve into a new version. Maintenance includes fixing bugs, optimizing old code and adding new features that sometimes cannot fit easily into the current application architecture. The main problem with our application is that it was poorly documented, there is no trace of changes, and we are basically the 5th rotation of the team working on this project (we are fairly new to it).
Leaving the outer details aside (code, layers, etc.), I will try to explain a little how we are currently managing the database changes.
We have at this moment two rules that we are trying to follow:
First, old code (SQL, stored procs, functions, etc.) works as is and should be kept as is, without modifying it too much unless there is a reason to (a bug or feature change), and, of course, we try to document it as much as possible (especially the problems like: "WTF! Why did he do that instead of that?").
Second, every new feature that comes in should use the best practices known at this moment and modify the old database structure as little as it can. This introduces some database refactoring options like using editable views on top of the old structure, introducing new extension tables for already existing ones, normalizing the structure and providing the older structure through views, etc.
Also, we are trying to write as many unit tests as we can, with the business analysts working side by side with us and documenting the business rules.
Database refactoring is too complex a field to be covered in a short answer. There are a lot of books that answer all your problems, one of them, http://databaserefactoring.com/, being pointed out in one of the answers.
Later Edit: Hopefully the second rule will also answer the handling of breaking changes.
