Flyway/Liquibase for Database Structure and DBUnit for Database Inserts?

I have the following scenario for my application:
1 Production Server
1 Test Server
n Development Computers
For database migration we use Hibernate Schema Update for the schema and DBUnit for filling in all the production data (on all servers/computers). When the schema update is done, I generate a new DTD file for the new schema so I can do a fresh import of the DBUnit XML. The application updates the database at startup with the XML file (only on development and test servers/computers!)
Of course this approach is suboptimal and fragile. So I looked at Liquibase and Flyway. Both seem to be great tools, but what I do not get is: how do I migrate the data? In my case, I dump the data of the production system once a week and add it to the application's source control as a DBUnit XML file, so all developers have "fresh" data and the test server has current production data, too.
The problem I see with Liquibase and Flyway is that neither offers a way to automatically diff the database data and generate the migration changes from it.
So my idea is the following, with these steps:
Set Hibernate to validate instead of update.
When a STRUCTURAL database change is needed, I add it to the migration script for the major version
No database inserts are in the migration script.
Generate a new DTD for DBUnit based on the new database structure
Generate the DBUnit XML from the production database.
Another idea would be to utilize Flyway's JavaMigration and provide an initial database dump based on DBUnit; all other changes to the database data would be handled in migration scripts. But the problem remains: how do I diff the current migration script state against the production database state?
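Roughly, I imagine that Java migration looking something like the sketch below (assuming the BaseJavaMigration API of recent Flyway versions and DBUnit's flat XML format; the class name and dataset path are just placeholders):

    import java.sql.Connection;

    import org.dbunit.database.DatabaseConnection;
    import org.dbunit.database.IDatabaseConnection;
    import org.dbunit.dataset.IDataSet;
    import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
    import org.dbunit.operation.DatabaseOperation;
    import org.flywaydb.core.api.migration.BaseJavaMigration;
    import org.flywaydb.core.api.migration.Context;

    // V1__Initial_data_load: runs exactly once and loads the initial DBUnit dump
    public class V1__Initial_data_load extends BaseJavaMigration {

        @Override
        public void migrate(Context context) throws Exception {
            // Connection is managed by Flyway; do not close it here
            Connection connection = context.getConnection();
            IDatabaseConnection dbUnitConnection = new DatabaseConnection(connection);

            // Placeholder classpath location of the dump exported from production
            IDataSet dataSet = new FlatXmlDataSetBuilder()
                    .build(getClass().getResourceAsStream("/dbunit/initial-data.xml"));

            // Clear the tables contained in the dataset and insert the dump
            DatabaseOperation.CLEAN_INSERT.execute(dbUnitConnection, dataSet);
        }
    }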
It would be awesome if anyone could provide me hints how to handle my scenario :)

If your goal is to use dumps of the PROD database in DEV and TEST environments, I would:
Configure the DB migration tool to run on application startup (both Flyway and Liquibase support this through their respective APIs; a Flyway sketch follows below)
Package all the DB structure migrations together with the app
Dump both data and structure from PROD
This way, when the PROD database is restored to DEV or TEST, the old metadata table of the migration tool is restored as well.
When the app starts, the migration tool will discover that the db structure is outdated and upgrade it to the newest version. Done.
No need to use DBUnit for this.
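For illustration, with Flyway the startup hook can be as small as the following sketch (assuming the fluent API of Flyway 5.1+; the connection settings and script location are placeholders):

    import org.flywaydb.core.Flyway;

    public class DatabaseMigrator {

        // Call once during application startup, before the app touches the schema
        public static void migrateOnStartup() {
            Flyway flyway = Flyway.configure()
                    .dataSource("jdbc:postgresql://localhost/app", "app_user", "secret") // placeholders
                    .locations("classpath:db/migration") // migrations packaged with the app
                    .load();

            // Applies every migration newer than the version recorded in Flyway's
            // metadata table (which was restored together with the PROD dump)
            flyway.migrate();
        }
    }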

The short answer is that all your changes would be done through Liquibase or Flyway.
We use Flyway, with the same prod/test/development setup.
We make all db changes (structure or metadata) using Flyway migration scripts, stored in source control. Each time we do a new deployment to an environment, we first run the migration scripts there (using either the command line tool or the maven plugin). The code first goes to development environment, gets integration tested there and keeps going to test and production.
The main thing to watch out for is that Flyway requires linear versioning of the migration files, so if two developers check in migrations at the same time, one of them will have to rename theirs.

Related

Sync database between multiple users

I am looking for a solution to sync the DB between multiple developers (us at the office).
We use WordPress and MAMP (for now; MAMP/headless WP and NPM/React in the future). We want to use AppVeyor (or a similar service) to deploy to a dev server and a live server, and we want the DB to be synced everywhere, or at least among us and the dev server, with a separate free-standing DB on the live server.
Can this be done with Liquidbase or is there a better option?
Thanks :)
I don't know a whole lot about WordPress and how it uses the database, but in theory this should be possible as long as you are talking about syncing the schema changes. If you are also trying to sync the data, then Liquibase is not the right tool for the job.
To do this with Liquibase, try installing it with the installer and working through some of the examples to get a feel for how the tool works. The examples use a local H2 in-memory database, so it is pretty painless to try things and start over if you mess things up.
After getting a feel for things, you will want to use the Liquibase generateChangeLog command to create the initial changelog that contains all the instructions for creating the schema as it exists on the database you are using when you run generateChangeLog. Then test that you can run liquibase update on a separate database and have WordPress use that database successfully.
Once you have proven that workflow, you can continue by following this pattern:
Before making changes to the WordPress schema, run liquibase snapshot to create a JSON formatted snapshot of the "DEV" schema - the schema you are changing in development mode. You will need additional options to generate the JSON format snapshot.
Make the desired changes to the WordPress "DEV" schema, most likely by using the WordPress app itself.
Use liquibase diffChangeLog to compare the JSON snapshot to the newly-altered "DEV" schema. This will add changesets to the existing changelog file that describe how to alter the schema to create the desired changes.
Use liquibase changelogSync on the "DEV" schema to update the Liquibase tracking tables so that Liquibase knows that the changes in the changelog already exist in that database.
Use liquibase update against the "PROD" database to have the new schema changes show up in that environment.
This workflow is described in the Liquibase docs for the snapshot command.
ps - there is no d in Liquibase :-)

Given I have to write the migration scripts myself, what value does Flyway provide?

In my situation I use a tool that generates SQL statements to contain all database init/create statements. How does Flyway provide value beyond what my tool provides? Why should I care to write hand-coded migration scripts to use Flyway?
The question above mixes two things that should be kept separate: the concept of database creation and the concept of migration.
database creation
Given a complete database and an empty database, you can use many tools to generate the scripts needed to recreate the complete database where nothing exists. In Flyway terms, you are just creating a baseline. This isn't the concept of migration at all. Of course, given a V2.0 database, you could take any V1.0 database, blow it away, and install the V2.0 database, but now you've lost your data.
migration
Given a complete V2.0 database and an older V1.0 database, you want the V1.0 database to be "upgraded" to V2.0. In the database world, this is called a migration because the existing 1.0 data needs to be rearranged so that it works with V2.0. Now you need a script that not only creates/alters tables, but also does some ETL (extract the data, transform it so it can be loaded into the new table structures, alter the old database to the new table structures, then load the data into the database). This may or may not be trivial, depending on the changes. You build the script to do it; Flyway manages executing that script.
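As a rough sketch of such an ETL-style migration step (table and column names are invented, and the string functions depend on your database dialect; in practice this code or the equivalent SQL would live inside a Flyway-managed migration):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CustomerNameMigration {

        // Hypothetical V1 -> V2 change: split a single "name" column into first/last name
        static void migrateV1ToV2(Connection connection) throws SQLException {
            try (Statement stmt = connection.createStatement()) {
                // 1. Alter the structure: add the new columns
                stmt.execute("ALTER TABLE customer ADD COLUMN first_name VARCHAR(100)");
                stmt.execute("ALTER TABLE customer ADD COLUMN last_name VARCHAR(100)");

                // 2. ETL: transform the existing V1 data into the V2 shape
                stmt.execute("UPDATE customer SET "
                        + "first_name = SUBSTR(name, 1, POSITION(' ' IN name) - 1), "
                        + "last_name  = SUBSTR(name, POSITION(' ' IN name) + 1)");

                // 3. Drop what the V2 structure no longer needs
                stmt.execute("ALTER TABLE customer DROP COLUMN name");
            }
        }
    }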
Flyway
Flyway enables the following:
Migration scripts become part of the software asset. They are versioned so that baseline/migration scripts can be maintained in source control in a way that migration becomes a repeatable feature as opposed to "one off" scripting work.
Flyway maintains a meta table in each database it works with so it knows what scripts have been applied
Flyway can apply migrations in a completely automated way that removes manual execution errors
Flyway enables the creation of migration scripts as part of development (like Test Driven Development makes unit test creation an integral part of development) so that all your database development is captured in the form of migration scripts (rather than building migration scripts as needed as part of "one off" migrations)
It's common when using Flyway to update any previous version of your application in seconds via a single command. It becomes so easy that the stress of migrating an old DB to a new version goes away, and evolution of the DB becomes easy and routine.
To use Flyway well, you need to change your workflow: every time you develop a change in your development DB, put the change into a migration script so you can execute those changes against all the older DB versions that exist in the world. And those scripts are checked into your application's source code, making migration a first-class citizen of your software asset, just like any other functionality.
It depends very much on your use case.
If you plan to write a simple application with a database structure that will remain static over the lifetime of the application, it will add very little value.
If the project is expected to have a dynamic design over its lifetime, with changes taking place on the schema, Flyway provides a formal structure in which the changes may be expressed and viewed. This formal structure can also be very helpful if you end up with a larger team working on the project, as Flyway can then become part of the framework for handling things like multi-schema CI work.
One key thing is that you do not have to start with Flyway; you can add it at a later point, normally with limited retooling, as the schema at that point in time simply becomes your baseline to which all future changes are added.
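As a small sketch (Flyway 5.1+'s fluent API; the version number and connection details are placeholders), adopting Flyway on an existing database is little more than:

    import org.flywaydb.core.Flyway;

    public class AdoptFlyway {
        public static void main(String[] args) {
            Flyway flyway = Flyway.configure()
                    .dataSource("jdbc:postgresql://localhost/app", "app_user", "secret") // placeholders
                    .baselineVersion("1")       // the schema as it exists today becomes version 1
                    .baselineOnMigrate(true)    // record that baseline on the first migrate()
                    .load();

            // From here on, only V2__, V3__, ... scripts are needed for future changes
            flyway.migrate();
        }
    }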

Azure continuous deployment from GitHub and database upgrades

I have a Web application that I have usually deployed using Web Deploy directly from Visual Studio (whatever branch I am currently using in VS - normally master). But now I'm introducing a second web app on Azure that will be built from the same repo but a different branch. To make things simpler I will be configuring both web apps on Azure to integrate directly with GitHub and associate each with a specific branch.
I also added two additional web.config files, Web.Primary.config and Web.Secondary.config, and configured the app settings of each web app in the Azure portal by adding an additional value, SCM_BUILD_ARGS, setting it to
SCM_BUILD_ARGS=-p:PublishProfile=Primary // in primary web app
SCM_BUILD_ARGS=-p:PublishProfile=Secondary // in secondary web app
which I understand will transform the correct config file with the specific external services' configurations (DB connection, mail server, etc.).
Now the additional step that I would like to include in continuous deployment is to run a set of SQL scripts from my repo that I used to run manually to upgrade the database during Web Deploy in VS. The individual scripts each perform a specific database upgrade step:
backup current tables - backup creates a set of Backup_OriginalTableName tables that are copied from existing ones and populated with existing data
drop whole DB model - all non-backup objects are dropped: procedures, functions, types, views, tables...
create model - creates all tables, views and indices
create user types
create user functions
create stored procedures
restore data to new tables from backup tables - this step may occasionally break if we introduce new non-nullable columns that don't have defaults defined on them in the new model; I will somehow have to mitigate this by adding an additional script that adds the missing columns to the backup tables and gives them defaults, but that's a completely different issue.
I also used to have a set of batch files (BAT) in my VS solution that simply executed sqlcmd against a specific database instance and ran these scripts in the predefined order (as above). Hence I had these batch files:
Recreate Local.bat - this one used additional SQL scripts to not restore from backup but rather to recreate an empty DB with only lookup tables being populated and some default data for development purposes (like predefined test users)
Restore Local.bat - I used this script to simply restore database from backup tables discarding any invalid data I may have created while debugging/testing since last DB recreate/upgrade/restore
Upgrade Local.bat - upgrade local development DB executing scripts mentioned above
Upgrade Production.bat - upgrade production DB on Azure executing scripts mentioned above
So to support the whole deployment process that I have been doing manually in VS, I would now like to also execute these scripts against the specific Azure SQL DB during continuous deployment. I suppose I should be running them right after code deployment, because if that fails, the DB shouldn't be upgraded either.
I'm a bit confused about where and how to do this. Can I configure this somewhere in the Azure portal? I was looking for resources on the Web but I can't seem to find any relevant information on how to add extra deployment steps that execute these scripts. I would think this is an everyday scenario, as it's hard to imagine web apps not requiring databases these days.
Maybe it's just my DB upgrade/deployment process that is wrong, so let me also know if there is another usual way to do DB upgrades/migrations with continuous deployment on Azure... I may change my process to accommodate this.
Note 1: I'm not using Entity Framework or any other full blown ORM. I'm rather using NPoco and all my DB logic is built in SPs that DAL is using.
Note 2: I'm aware of the recently introduced staging capabilities of Azure, but my apps are on a cheaper plan that doesn't support staging, and I want to keep it this way as I may be introducing additional web apps along the way that will use additional code branches and resources (DB, mail, etc.)
It sounds to me like your db project is a good candidate for SSDT and inclusion in source control. You can create a MyDB.sqlproj that builds your db as a dacpac, and then you can use SqlPackage.exe Publish to accomplish your deployment to Azure.
We recently brought our databases under source control and follow a similar process to build and automatically deploy them (but not to a SQL Azure DB). We've found the source control, SSDT tooling support, and automated deployment options to be worth the effort of setting up and maintaining our project this way.
This SO question has some good notes for Azure deployment of a dacpac using SSDT:
How to publish DACPAC file to a SQL Server database project via SQLPackage.exe of SSDT?

Managing SQL Server database version control in large teams

For the last few years I was the only developer that handled the databases we created for our web projects. That meant that I got full control of version management. I can't keep up with doing all the database work anymore and I want to bring some other developers into the cycle.
We use Tortoise SVN and store all repositories on a dedicated server in-house. Some clients require us not to have their real data on our office servers so we only keep scripts that can generate the structure of their database along with scripts to create useful fake data. Other times our clients want us to have their most up to date information on our development machines.
So what workflow do larger development teams use to handle version management and sharing of databases? Most developers prefer to deploy the database to an instance of SQL Server on their development machine. Should we:
Keep the scripts for each database in SVN and make developers export new scripts if they make even minor changes
Detach databases after changes have been made and commit MDF file to SVN
Put all development copies on a server on the in-house network and force developers to connect via remote desktop to make modifications
Some other option I haven't thought of
Never have an MDF file in the development source tree. MDFs are a result of deploying an application, not part of the application sources. Thinking of the database in terms of development sources is a shortcut to hell.
All the development deliverables should be scripts that deploy or upgrade the database. Any change, no matter how small, takes the form of a script. Some recommend using diff tools, but I think they are a rat hole. I champion versioning the database metadata and having scripts to upgrade from version N to version N+1. At deployment the application checks the currently deployed version and then runs all the upgrade scripts that bring it up to the current version. There is no script that deploys the current version directly: a new deployment first deploys v0 of the database and then goes through all the version upgrades, including dropping objects that are no longer used. While this may sound a bit extreme, it is exactly how SQL Server itself keeps track of the various changes occurring in the database between releases.
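A bare-bones sketch of that mechanism in JDBC (the table name, scripts, and versions are purely illustrative; tools like Flyway implement essentially this loop, plus a lot of robustness, for you):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.List;

    public class SchemaUpgrader {

        // Illustrative upgrade steps; in practice each entry is a .sql file kept in version control
        private static final List<String> UPGRADES = List.of(
                /* v0 -> v1 */ "CREATE TABLE customer (id INT PRIMARY KEY, name VARCHAR(200))",
                /* v1 -> v2 */ "ALTER TABLE customer ADD email VARCHAR(200)"
        );

        public static void upgrade(Connection connection) throws SQLException {
            try (Statement stmt = connection.createStatement()) {
                ensureVersionTable(stmt);
                int current = currentVersion(stmt);

                // Apply every upgrade newer than the deployed version, in order,
                // bumping the recorded version after each step
                for (int v = current; v < UPGRADES.size(); v++) {
                    stmt.execute(UPGRADES.get(v));
                    stmt.executeUpdate("UPDATE schema_version SET version = " + (v + 1));
                }
            }
        }

        private static void ensureVersionTable(Statement stmt) throws SQLException {
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'schema_version'")) {
                rs.next();
                if (rs.getInt(1) == 0) {
                    stmt.execute("CREATE TABLE schema_version (version INT NOT NULL)");
                    stmt.execute("INSERT INTO schema_version (version) VALUES (0)"); // brand-new DB = v0
                }
            }
        }

        private static int currentVersion(Statement stmt) throws SQLException {
            try (ResultSet rs = stmt.executeQuery("SELECT version FROM schema_version")) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }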
As simple text scripts, all the database upgrade scripts are stored in version control just like any other sources, with tracking of changes, diff-ing and check-in reviews.
For a more detailed discussion and some examples, see Version Control and your Database.
Option (1). Each developer can have their own up-to-date local copy of the DB ("up to date" meaning recreated from the latest version-controlled scripts: base + incremental changes + base data + run data). To make this work you should have the ability to 'one-click' deploy any database locally.
You really cannot go wrong with a tool like Visual Studio Database Edition. This is a version of VS that manages database schemas and much more, including deployments (updates) to target server(s).
VSDE integrates with TFS so all your database schema is under TFS version control. This becomes the "source of truth" for your schema management.
Typically developers will work against a local development database, and keep its schema up to date by synchronizing it with the schema in the VSDE project. Then, when the developer is satisfied with his/her changes, they are checked into TFS, and a build and then deployment can be done.
VSDE also supports refactoring, schema compares, data compares, test data generation and more. It's a great tool, and we use it to manage our schemas.
In a previous company (which used Agile in monthly iterations), .sql files were checked into version control, and (an optional) part of the full build process was to rebuild the database from production then apply each .sql file in order.
At the end of the iteration, the .sql instructions were merged into the script that creates the production build of the database, and the script files were moved out. So you're only applying updates from the current iteration, not going all the way back to the beginning of the project.
Have you looked at a product called DB Ghost? I have not personally used it, but it looks comprehensive and may offer an alternative along the lines of point 4 in your question.

How do I manage version control when developing with SQL Server Express?

I am developing a website using SQL Server Express on my development machine. My web hosting company is providing me with SQL Server 2005.
At the moment all I have is a database that I develop with and a database that is on the live server. I do not have the original scripts to generate the schema but I can auto generate the create scripts individually or for the entire database.
I am now putting my code into source control and I would like to know how I manage my database schema. What do I put into it? Create commands? Alter scripts?
The database is very small at the moment and it is not hard to maintain the two databases, but I am concerned that going forward it will get out of hand. Do you have any tips for keeping the live database in sync when deploying new code?
EDIT Any ideas as to what should go into source control? Should the DDL scripts go here?
Deploy schema changes as DDL upgrade scripts and, if you haven't already, add a table to contain the schema version number, which you update at the end of each upgrade script.
EDIT: Yes, all your scripts should go into source control, including DDL scripts.
I typically keep a testing copy of the live database on my local or virtual development box, which I routinely refresh from the prod database. The testing copy is meant for my total exploitation. When I have something that I believe is ready for deployment, I move it to my development dataset, which mirrors the prod DB and is not used for playing around. If the development DB passes all my tests, I deploy the script to the production DB.
