How to update a database schema without losing your data with Hibernate?

Imagine you are developing a Java EE app using Hibernate and JBoss. You have a running server that holds some important data. You release a new version of the app every week or two, and each release brings a bunch of changes in the persistence layer:
New entities
Removed entities
Attribute type changes
Attribute name changes
Relationship changes
How do you effectively set up a system that updates the database schema and preserves the data? As far as I know (I may be mistaken), Hibernate doesn't perform ALTER COLUMN or DROP/ALTER CONSTRAINT operations.
Thank you,
Artem B.

LiquiBase is your best bet. It has a Hibernate integration mode that uses Hibernate's hbm2ddl to compare your database with your Hibernate mappings, but rather than updating the database automatically, it outputs a Liquibase changelog file which can be inspected before actually being run.
While more convenient, any tool that diffs your database against your Hibernate mappings is going to make mistakes. See http://www.liquibase.org/2007/06/the-problem-with-database-diffs.html for examples. With Liquibase you build up a list of database changes as you develop, in a format that can survive code branches and merges.
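Once you've inspected the generated changelog, you can run it from the command line or programmatically at application startup. A minimal sketch using Liquibase's plain Java API (class names as in Liquibase 3.x; the JDBC URL, credentials and changelog path are placeholder assumptions):
import java.sql.Connection;
import java.sql.DriverManager;
import liquibase.Liquibase;
import liquibase.database.Database;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;

public class SchemaMigrator {
    public static void main(String[] args) throws Exception {
        // plain JDBC connection; URL and credentials are placeholders
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "pwd");
        Database db = DatabaseFactory.getInstance()
                .findCorrectDatabaseImplementation(new JdbcConnection(conn));
        // the changelog is loaded from the classpath; the path is an assumption
        Liquibase liquibase = new Liquibase(
                "db/changelog.xml", new ClassLoaderResourceAccessor(), db);
        liquibase.update("");  // empty string = no contexts
        db.close();
    }
}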

I personally keep track of all changes in a migration SQL script.

You can use the https://github.com/Devskiller/jpa2ddl tool, which provides Maven and Gradle plugins and is capable of generating automated schema migrations for Flyway based on JPA entities. It also takes into account all properties, dialects, user types, naming strategies, etc.

For one app I use SchemaUpdate, which is built into Hibernate, straight from a bootstrap class, so the schema is checked every time the app starts up. That takes care of adding new columns or tables, which is most of what happens to a mature app. To handle special cases, like dropping columns, the bootstrap just manually runs the DDL in a try/catch, so if the column has already been dropped once, the error is silently swallowed. I'm not sure I'd do this with mission-critical data in a production app, but in several years and hundreds of deployments, I've never had a problem with it.
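A rough sketch of that bootstrap pattern, assuming the Hibernate 3.x-era SchemaUpdate API and a plain JDBC connection for the manual DDL (the table and column names are made up):
import java.sql.Connection;
import java.sql.Statement;
import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaUpdate;

public class Bootstrap {
    public static void upgradeSchema(Configuration cfg, Connection conn) {
        // adds missing tables and columns; never drops anything
        new SchemaUpdate(cfg).execute(true, true);
        // special cases SchemaUpdate can't handle, e.g. dropping a column;
        // if the DDL has already been applied, the error is swallowed
        try (Statement st = conn.createStatement()) {
            st.executeUpdate("ALTER TABLE customer DROP COLUMN legacy_flag");
        } catch (Exception e) {
            // already dropped on a previous startup - ignore
        }
    }
}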

As a follow-up to what Nathan Voxland said about LiquiBase, here's an example of how to run the migration under Windows for a MySQL database:
Put the MySQL connector in the lib folder of the Liquibase distribution, for example.
Create a liquibase.properties file in the root of the Liquibase distribution and insert these recurring lines:
driver: com.mysql.jdbc.Driver
classpath: lib\\mysql-connector-java-5.1.30.jar
url: jdbc:mysql://localhost:3306/OLDdatabase
username: root
password: pwd
Generate or retrieve an updated database under another name, for example NEWdatabase.
Now extract the differences into a Migration.xml file with the following command line:
liquibase diffChangeLog --referenceUrl="jdbc:mysql://localhost:3306/NEWdatabase"
--referenceUsername=root --referencePassword=pwd > C:\Users\ME\Desktop\Migration.xml
Finally, execute the update using the freshly generated Migration.xml file:
java -jar liquibase.jar --changeLogFile="C:\Users\ME\Desktop\Migration.xml" update
NB: All these command lines should be executed from the Liquibase home directory, where liquibase.bat/.sh and liquibase.jar are located.

I use the hbm2ddl Ant task to generate my DDL. There is an option that will perform ALTERs on tables/columns in your database.
Please see the "update" attribute of the hbm2ddl Ant task:
http://www.hibernate.org/hib_docs/tools/reference/en/html/ant.html#d0e1137
update (default: false): Try and create an update script representing the "delta" between what is in the database and what the mappings specify. Ignores create/update attributes. (Do not use against production databases, no guarantees at all that the proper delta can be generated nor that the underlying database can actually execute the needed operations)

You can also use DBMigrate. It's similar to Liquibase:
Similar to 'rake migrate' for Ruby on Rails, this library lets you manage database upgrades for your Java applications.

Related

Sync database between multiple users

I am looking for a solution to sync the DB between multiple developers (us at the office..).
We use WordPress and MAMP (for now; MAMP/headless WP and NPM/React in the future), and we want to use AppVeyor (or similar) to deploy to a dev server and a live server. We want the DB to be synced everywhere, or at least among us and the dev server, with a secondary (free-standing) one on the live server.
Can this be done with Liquidbase or is there a better option?
Thanks :)
I don't know a whole lot about WordPress and how it uses the database, but in theory this should be possible as long as you are talking about syncing the schema changes. If you are also trying to sync the data, then Liquibase is not the right tool for the job.
To do this with Liquibase, try installing it using the installer and working through some of the examples to get an idea of how the tool works. The examples use a local H2 in-memory database, so it is pretty painless to try things and start over if you mess something up.
After getting a feel for things, you will want to use the Liquibase generateChangeLog command to create the initial changelog that contains all the instructions for creating the schema as it exists on the database you are using when you run generateChangeLog. Then test that you can run liquibase update on a separate database and have WordPress use that database successfully.
Once you have proven that workflow, you can continue by following this pattern:
Before making changes to the WordPress schema, run liquibase snapshot to create a JSON formatted snapshot of the "DEV" schema - the schema you are changing in development mode. You will need additional options to generate the JSON format snapshot.
Make the desired changes to the WordPress "DEV" schema, most likely by using the WordPress app itself.
Use liquibase diffChangeLog to compare the JSON snapshot to the newly-altered "DEV" schema. This will add changesets to the existing changelog file that describe how to alter the schema to create the desired changes.
Use liquibase changelogSync on the "DEV" schema to update the Liquibase tracking tables so that Liquibase knows the changes in the changelog already exist in that database.
Use liquibase update against the "PROD" database to have the new schema changes show up in that environment.
This workflow is described in the Liquibase docs for the snapshot command.
ps - there is no d in Liquibase :-)

Given I have to write the migration scripts myself, what value does Flyway provide?

In my situation, I use a tool that generates the SQL scripts containing all database init/create statements. How does Flyway provide value beyond what my tool provides? Why should I care to write hand-coded migration scripts to use Flyway?
The question above mixes two things that should be separate: the concept of database creation and the concept of migration.
database creation
Given a complete database and an empty database, you can use many tools to generate the scripts needed to recreate the complete database where nothing exists. In Flyway terms, you are just creating a baseline. This isn't the concept of migration at all. Of course, given a V2.0 database, you could take any V1.0 database, blow it away, and install the V2.0 database, but now you've lost your data.
migration
Given a complete V2.0 database and an older V1.0 database, you want the V1.0 database to be "upgraded" to V2.0. In the database world, this is called a migration, because the existing 1.0 data needs to be rearranged so that it works on V2.0. Now you need a script that not only creates/alters tables, but also does some ETL (extract the data, transform it so it can be loaded into the new table structures, alter the old database to the new table structures, then load the data into the database). This may or may not be trivial, depending on the change. You build the script to do it; Flyway will manage executing that script.
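To make the distinction concrete, here is a sketch of such an ETL-style migration written as a Flyway Java-based migration (Flyway 6+ API; the table, columns and MySQL string functions are illustrative assumptions):
import java.sql.Statement;
import org.flywaydb.core.api.migration.BaseJavaMigration;
import org.flywaydb.core.api.migration.Context;

// Flyway derives the version from the class name: V2 -> version 2
public class V2__Split_customer_name extends BaseJavaMigration {
    @Override
    public void migrate(Context context) throws Exception {
        try (Statement st = context.getConnection().createStatement()) {
            // alter the old structure
            st.execute("ALTER TABLE customer ADD COLUMN first_name VARCHAR(100)");
            st.execute("ALTER TABLE customer ADD COLUMN last_name VARCHAR(100)");
            // transform and load the existing data into the new columns (MySQL syntax)
            st.execute("UPDATE customer SET first_name = SUBSTRING_INDEX(name, ' ', 1), "
                     + "last_name = SUBSTRING_INDEX(name, ' ', -1)");
            st.execute("ALTER TABLE customer DROP COLUMN name");
        }
    }
}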
Flyway
Flyway enables the following:
Migration scripts become part of the software asset. They are versioned so that baseline/migration scripts can be maintained in source control in a way that migration becomes a repeatable feature as opposed to "one off" scripting work.
Flyway maintains a meta table in each database it works with, so it knows which scripts have been applied.
Flyway can apply migrations in a completely automated way that removes manual execution errors.
Flyway enables the creation of migration scripts as part of development (like Test Driven Development makes unit test creation an integral part of development), so that all your database development is captured in the form of migration scripts (rather than building migration scripts as needed as part of "one off" migrations).
It's common when using Flyway to update any previous version of your application in seconds via a single command. It becomes so easy that the stress of migrating from an old DB to a new version goes away, and evolution of the DB becomes easy and routine.
To use Flyway well, you have to change your workflow: every time you develop a change in your development DB, put the change into a migration script so you can execute those changes against all the older DB versions that exist in the world. And those scripts are checked into your application's source code, making migration a first-class citizen of your software asset just like any other functionality.
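Running the pending migrations at application startup is then a few lines with the Flyway API; a sketch using the fluent configuration style of Flyway 5+ (URL and credentials are placeholders):
import org.flywaydb.core.Flyway;

public class DbInit {
    public static void main(String[] args) {
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:mysql://localhost:3306/mydb", "user", "pwd")
                .load();
        // applies every migration not yet recorded in Flyway's metadata table
        flyway.migrate();
    }
}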
It depends very much on your use case.
If you plan to write a simple application with a database structure that will remain static over the lifetime of the application, it will add very little value.
If the project is expected to have a dynamic design over its lifetime, with changes taking place on the schema, Flyway provides a formal structure in which the changes may be expressed and viewed. This formal structure can also be very helpful if you end up with a larger team working on the project, as Flyway can then become part of the framework for handling things like multi-schema CI work.
One key thing is that you do not have to start with Flyway; you can add it at a later point, normally with limited retooling, as the schema at that point in time simply becomes your baseline to which all future changes can be added.
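For that late-adoption case, a sketch of how the baseline can be declared with the fluent API (the version number is illustrative):
import org.flywaydb.core.Flyway;

public class AdoptFlyway {
    public static void main(String[] args) {
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:mysql://localhost:3306/mydb", "user", "pwd")
                .baselineVersion("1")     // the existing production schema becomes version 1
                .baselineOnMigrate(true)  // baseline automatically on the first migrate
                .load();
        flyway.migrate();                 // only migrations above version 1 will run
    }
}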

Run Liquibase on multiple databases at different versions

I am trying to integrate Liquibase with our Spring/Hibernate web-app to replace our existing home-grown solution. So far Liquibase is great, but there's one use-case that is important to us and I don't know if Liquibase supports it or not, which is this:
We deploy our web app to clients who host the webapp and the database (MySQL) themselves. So, suppose we deploy to our first client (client1) with a new clean DB schema (generated from Hibernate mappings) and no entries in the Liquibase changelog. We then develop some schema changes and redeploy the application to client1, and Liquibase does its stuff and applies the changesets; all great so far.
Now we deploy to a new client, client2, again with a new database schema generated from Hibernate mappings. But this time there are changesets present (for the changes made between the client1 and client2 deployments), but they don't need to be applied, as they're already reflected in the new schema. However, because the DATABASECHANGELOG table is empty, Liquibase will try to apply the changesets and probably fail with SQL errors.
What we'd like is for new deployments to new clients to 'know' which changeset they are at (relative to the first deployment to client1), so that only subsequent updates are applied.
There seem to be several possibilities, and probably more I've not thought of:
populate DATABASECHANGELOG with fake entries to fool Liquibase into thinking these have already been applied.
always deploy our first, baseline schema to subsequent clients, and run updates sequentially, so we never deploy a 'new' schema derived from Hibernate mappings after client1.
use our own tracking system (e.g., map a db version to an application version, and a db version to a changeset).
Is this a problem, or am I just not understanding how to use Liquibase properly? I'd be grateful for any advice from people who've dealt with this sort of use case before. We'd really like to avoid deployment-specific changesets if at all possible; there will be dozens, if not hundreds, of deployments to handle.
Thanks,
Richard
We have a similar setup.
But we get Liquibase into the game earlier: before we officially release the software, we set up the Liquibase changesets and let Liquibase handle the database.
We did not want to lose the advantage of letting Hibernate generate the DB during the development phase, so we also use Hibernate while developing.
But right before the version is stable, we run the Liquibase diff tool on the database and let it create a changeset for the Hibernate-generated tables.
This changeset is then corrected manually, since the Liquibase diff tool does produce some flaws.
Once the changeset is ready, we ship it with the software.
We maintain a reference system that keeps the database version of the last officially released version. For the next release, we run the Liquibase diff tool with the current development version against the reference DB. That spits out the differences for the next version. These are also corrected manually, and finally you have a changeset that migrates the DB to the next version.
Hope this gives you an idea of one way to use Liquibase and Hibernate together.
I usually suggest always running the same changelog file against all your different databases. That way you don't have to deal with manually marking changeSets as ran, using preconditions, or anything else. Most importantly, every database will follow the same upgrade path so you know they are going to update consistently without any unexpected problems.
You can use the liquibase hibernate extension to automatically append changeSets to your changelog based on your hibernate mapping, but when it comes time to deploy your changes to the databases you just run your liquibase changelog file and not try to use hibernate's schema generation logic at all.
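For a Spring/Hibernate web app like the one described, the usual way to run that single changelog on every deployment is Liquibase's Spring integration; a minimal sketch (the changelog path is an assumption):
import javax.sql.DataSource;
import liquibase.integration.spring.SpringLiquibase;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LiquibaseConfig {
    @Bean
    public SpringLiquibase liquibase(DataSource dataSource) {
        SpringLiquibase lb = new SpringLiquibase();
        lb.setDataSource(dataSource);
        // the same changelog file is run against every client database
        lb.setChangeLog("classpath:db/changelog-master.xml");
        return lb;
    }
}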
For option 1 above (populate with fake entries) I've just discovered the changelogSync command which looks like it marks all changeset entries as applied, even if they haven't been.
But is this better or worse than genuinely applying the changes, from a baseline schema?

How to update database by ant (versioning)?

I have a Java web application that works with a database. I need an Ant script that will deploy or update my application to the latest version. There is no problem with the application part, but I don't know how to do the database update.
My idea is to embed some meta-information (a version number) in the names of the SQL scripts.
For example:
DB_1.0.0.sql
DB_1.0.1.sql
DB_1.2.0.sql
DB_2.0.0.sql
DB_2.1.0.sql
My script detects that the current version is 1.0.1, so it needs to execute DB_1.2.0.sql, DB_2.0.0.sql, and DB_2.1.0.sql with the SQL task. The problem is: how do I find the files that need to be executed with Ant?
Maybe this is not the best way to update a database. Do you have any other ideas?
Flyway works as you've described. It keeps a record of the SQL files already applied to the database, enabling an automatic upgrade. Simple and straightforward to use.
A more powerful solution, IMHO, is Liquibase. It has an XML syntax for recording database changes, enabling the generation of cross-platform SQL. It also has some powerful features such as the ability to roll back changes and perform diffs between databases.
It looks like your filenames follow a strict convention. In that case you can find the files by matching a pattern with a filelist and execute them using the sql task.
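If you would rather keep the selection logic out of Ant, the same idea fits in a few lines of Java. A sketch that picks every DB_x.y.z.sql above the current version and lists them in order (reading the current version from the database and executing the scripts over JDBC are left as placeholders):
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class MigrationRunner {
    // turns "DB_1.2.0.sql" into a comparable version array {1, 2, 0}
    static int[] version(File f) {
        String v = f.getName().replaceAll("^DB_|\\.sql$", "");
        return Arrays.stream(v.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] current = {1, 0, 1};  // in real life, read this from the database
        // assumes the scripts live in a local "sql" directory
        File[] scripts = new File("sql").listFiles(
                (dir, name) -> name.matches("DB_\\d+\\.\\d+\\.\\d+\\.sql"));
        Arrays.sort(scripts, Comparator.comparing(MigrationRunner::version,
                MigrationRunner::compare));
        for (File script : scripts) {
            if (compare(version(script), current) > 0) {
                System.out.println("executing " + script);  // run via JDBC/SQL task here
            }
        }
    }
}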
You can use LiquiBase to write tasks that can help with database schema updates.

Testing and Managing database versions against code versions

As you develop an application, database changes inevitably pop up. The trick, I find, is keeping your database build in step with your code. In the past I have added a build step that executed SQL scripts against the target database, but that is dangerous in so much as you could inadvertently add bogus data or worse.
My question is what are the tips and tricks to keep the database in step with the code? What about when you roll back the code? Branching?
Version numbers embedded in the database are helpful. You have two choices: embedding values into a table (which allows versioning multiple items) that can be queried, or having an explicitly named object (such as a table or some such) you can test for.
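For the table-based variant, the check can be as small as this (the schema_version table and its columns are hypothetical):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class VersionCheck {
    // reads the schema version the database claims to be at
    public static String currentVersion(Connection conn) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT version FROM schema_version WHERE item = 'core'")) {
            return rs.next() ? rs.getString("version") : null;
        }
    }
}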
When you release to production, do you have a rollback plan in the event of an unexpected catastrophe? If you do, is it the application of a schema rollback script? Use your rollback script to roll the database back to the previous code version.
You should be able to create your database from scratch into a known state.
While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database.
I've definitely been interested in some sort of DB versioning system, but I haven't found anything yet. So, instead of a solution, you'll get my vote. :-P
You really do want to be able to take a clean machine, get the latest version from source control, build in one step, and run all tests in one step. Making this fast makes you produce good software faster.
Just like external libraries, database configuration must also be in source control.
Note that I'm not saying that all your live database content should be in the same source control, just enough to get to a clean state. (Do back up your database content, though!)
Define your schema objects and your reference data in version-controlled text files. For example, you can define the schema in Torque format, and the data in DBUnit format (both use XML). You can then use tools (we wrote our own) to generate the DDL and DML that take you from one version of your app to another. Our tool can take as input either (a) the previous version's schema & data XML files or (b) an existing database, so you are always able to get a database of any state into the correct state.
I like the way Django does it. You build models, and when you run syncdb it applies the models that you have created. If you add a model, you just need to run syncdb again. This would be easy to have your build script do every time you make a push.
The problem comes when you need to alter a table that has already been made. I do not think syncdb handles that. It would require you to go in and manually alter the table and also add a property to the model, and you would probably want to version that ALTER statement. The models would always be under version control, though, so if you needed to you could get a DB schema up and running on a new box without running the SQL scripts. Another problem with this is keeping track of static data that you always want in the DB.
Rails migration scripts are pretty nice too.
A DB versioning system would be great, but I don't really know of such a thing.
While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database.
Backups and compression can help you there. Sorry, there's no excuse not to be able to get a good set of data to develop against. Even if it's just a subset.
Put your database development under version control. I recommend having a look at neXtep Designer:
http://www.nextep-softwares.com/wiki
It is a free GPL product which offers a brand-new approach to database development and deployment, by connecting version information with a SQL generation engine which can automatically compute any upgrade script you need to upgrade any version of your database into another. Any existing database can be version controlled via a reverse synchronization.
It currently supports Oracle, MySQL and PostgreSQL; DB2 support is under development. It is a full-featured database development environment where you always work on version-controlled elements from a repository. You can publish your updates by simple synchronization during development, and you can generate exportable database deliveries which you will be able to execute on any targeted database through a standalone installer which validates the versions, performs structural checks and applies the upgrade scripts.
The IDE also offers you SQL editors, dependency management, support for modular database model components, data model diagrams, SQL clients and much more.
All the documentation and concepts can be found in the wiki.
