Merge 2 schemas from Postgres DBs with different flyways - database

I currently have 2 databases used by 2 services (let's call them database/service A and database/service B), both of them with their own schemas.
I need to migrate some of the tables from DB A into DB B and, once that's all completed, re-point service A to DB B. I know I could easily do the schema migration using the pg_dump utility, and that seems to be the "easy" bit.
The problem I have is that both services use Flyway for database version control, hence when I re-point service A to DB B there's a bunch of migrations that are clashing on the same version number because of checksum mismatch.
I've seen that there's a "baseline" functionality in Flyway (https://flywaydb.org/documentation/command/baseline), but at first look that doesn't seem to be what I need.
How could I resolve this problem?

On first considering this problem, the immediate answer is that your move from DbB to DbA is done through one migration on top of the existing migrations in DbA. You don't try to modify the database outside of the Flyway process. Instead, you incorporate the Flyway process into your database change. Flyway is very agnostic to the set of changes you introduce. So, you're just adding another change to the existing set. This shouldn't result in a repair or a baseline to get to the required point.
Let's say the last migration for DbA is V6.3__XXX, we just add V6.4__MigratingDbB to our chain of changes. What's in that script is the necessary set of changes. That should do it.
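To make the naming concrete, here is a minimal Python sketch of how Flyway-style versioned file names order, assuming the default V<version>__<description>.sql convention. The earlier file names in the chain and the helper names are made up for illustration; Flyway does this ordering itself, so this is only a way to see why V6.4 lands on top of V6.3.

```python
import re

# Default Flyway naming for versioned migrations: V<version>__<description>.sql
MIGRATION_NAME = re.compile(r"^V(\d+(?:[._]\d+)*)__(.+)\.sql$")


def parse_version(filename):
    """Return the version as a tuple of ints, e.g. 'V6.4__x.sql' -> (6, 4)."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned migration: {filename}")
    return tuple(int(part) for part in re.split(r"[._]", match.group(1)))


def sorts_after_all(existing, candidate):
    """True if candidate has a higher version than every migration in existing."""
    return all(parse_version(candidate) > parse_version(name) for name in existing)


# Hypothetical chain; only V6.3__XXX and V6.4__MigratingDbB come from the answer.
chain = ["V6.1__Init.sql", "V6.2__AddIndexes.sql", "V6.3__XXX.sql"]
new_migration = "V6.4__MigratingDbB.sql"

print(sorts_after_all(chain, new_migration))
```

Versions are compared part by part as numbers, so V6.10 would come after V6.4 rather than before it.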

Grant's answer is definitely the best, but an alternative solution if the database objects for the two services are completely independent, is to have two Flyway configurations which refer to the script collections for each service, and which have distinct history tables. The problem is if there are dependencies between the two services; the migrations from one service would then need to know the current state of play in the other, which could get you in a tangle actually executing them.
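As a rough illustration of why distinct history tables avoid the clash, the sketch below records the same version number with different checksums in two separate tables. It uses an in-memory SQLite database as a stand-in for Postgres, and the table names and two-column layout are simplified assumptions (Flyway's real history table has more columns, and its name is chosen with the flyway.table setting).

```python
import sqlite3


def record_migration(conn, history_table, version, checksum):
    """Record one applied migration in the given history table.

    history_table is a trusted constant here, so it is interpolated directly.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {history_table} "
        "(version TEXT PRIMARY KEY, checksum INTEGER NOT NULL)"
    )
    conn.execute(
        f"INSERT INTO {history_table} (version, checksum) VALUES (?, ?)",
        (version, checksum),
    )


conn = sqlite3.connect(":memory:")

# Both services have a version 1.0 migration, but with different contents.
record_migration(conn, "history_service_a", "1.0", 111)
record_migration(conn, "history_service_b", "1.0", 222)

rows_a = conn.execute("SELECT version, checksum FROM history_service_a").fetchall()
rows_b = conn.execute("SELECT version, checksum FROM history_service_b").fetchall()
print(rows_a, rows_b)
```

Because each service only ever validates against its own table, the two version 1.0 entries never get compared with each other.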

Related

Sub version for database (I want something for data values in the database, not for the schema)

I am using github for maintaining versions and code synchronization.
We are a team of two, located in different places.
How can we make sure that our databases are synchronized?
Update:
I am a Rails developer, but these days I'm working on Drupal projects (where the database is the center of variation). So I want to make sure that the team has a synchronized database, including the values in the various tables.
I need something which keeps our data values synchronized.
A centralized database is a good solution, but things get disrupted when someone works offline.
If you use Visual Studio, you can script your database tables, views, stored procedures and functions as .sql files from a database solution and then check those into version control as well. It's what I currently do at my workplace.
If you don't use Visual Studio, you can still script your SQL as .sql files (with more work) and then version control them as necessary.
Have a look at Red Gate SQL Source Control - http://www.red-gate.com/products/SQL_Source_Control/
To be honest I've never used it, but their other software is fantastic. And if all you want to do is keep the DB schema in sync (rather than full source control) then I have used their SQL Compare product very successfully in the past.
(ps. I don't work for them!)
You can use SQL Source Control together with SQL Data Compare to source control both schema and data. Here is an article from Red Gate: Source controlling data.
These are some of the possibilities:
1. Use the same database. Set up a central database that everybody can connect to. This way you are sure everybody uses the same database all the time.
2. After every change, export the database and commit it to the VCS. This option requires discipline and manual labor.
3. Use some other definition of the schema. For example, Doctrine for PHP can build the database from a YAML definition, which can be stored in the VCS. This can be automated more easily than option 2.
4. Use some other software or script which updates the database.
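The "export and commit" option can be sketched in a few lines of Python. This is only an illustration under assumptions: SQLite stands in for whatever database you actually use (for MySQL or Postgres you would call the vendor's dump tool instead), and the node_type table is invented for the example.

```python
import sqlite3


def dump_database(conn):
    """Return the whole database (schema and rows) as one SQL script."""
    return "\n".join(conn.iterdump()) + "\n"


# Hypothetical lookup table whose values both developers need.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO node_type (name) VALUES ('article')")
conn.commit()

script = dump_database(conn)

# Replaying the script into an empty database reproduces the same rows.
copy = sqlite3.connect(":memory:")
copy.executescript(script)
restored = copy.execute("SELECT id, name FROM node_type").fetchall()
print(restored)
```

Writing the script to a file in the repository and committing it after each change gives the other developer something to replay after a pull. The discipline problem remains: someone still has to remember to run the export.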
I feel your pain. I had terrible trouble getting SQL Server to play nice with SVN. In the end I opted for a shared database solution. Every day I run an extensive script to backup all our schema definitions (specifically stored procedures) for version control into text files. Due to the limited number of changes this works well.
I now use this technique for our major project and personal projects too. The only negative is that it relies on being connected all the time. The other answers suggest that full database versioning is very time consuming and I tend to agree. For "live" upgrades we use the Red Gate tools, they do both schema and data compare and it works very well.
http://www.red-gate.com/products/SQL_Data_Compare/. We were using this tool to keep databases in sync in our company. Later we had some specific demands, so we had to write our own synchronization code. It depends on how complex your database is and how many changes are happening. It is much simpler if there is a time when no one is working and you can lock the database for synchronization.
Check out OffScale DataGrove.
This product tracks changes to the entire DB - schema and data. You can tag versions in any point in time, and return to older states of the DB with a simple command. It also allows you to create virtual, separate, copies of the same database so each team member can have his own separate DB. All the virtual copies are tracked into the same repository so it's super-easy to revert your DB to someone else's version (you simply check-out their version, just like you do with your source control). This means all your DBs can always be synchronized.
Regarding a centralized DB - just like you don't want to work on the same source code, you don't want to be working on the same DB. It means you'll constantly break each other's code and builds each time someone changes something in the DB.
I suggest that you go with a separate DB for each developer, and sync them using DataGrove.
Disclaimer - I work at OffScale :-)
Try Wizardby. This is my personal project, but I've used it at several previous jobs with a great deal of success.
Basically, it's a tool which lets you specify all changes to your database schema in a database-independent manner and then apply these changes to all your databases.

Database source control vs. schema change scripts

Building and maintaining a database that is then deployed/developed further by many devs is something that goes on in software development all the time. We create a build script, and maintain further update scripts that get applied as the database grows over time. There are many ways to manage this, from manual updates to console apps/build scripts that help automate these processes.
Has anyone who has built/managed these processes moved over to a Source Control solution for database schema management? If so, what have they found the best solution to be? Are there any pitfalls that should be avoided?
Red Gate seems to be a big player in the MSSQL world and their DB source control looks very interesting:
http://www.red-gate.com/products/solutions_for_sql/database_version_control.htm
Although it does not look like it replaces the (default) data* management process, so it only replaces half the change management process from my pov.
(when I'm talking about data, I mean lookup values and that sort of thing, data that needs to be deployed by default or in a DR scenario)
We work in a .Net/MSSQL environment, but I'm sure the premise is the same across all languages.
Similar Questions
One or more of these existing questions might be helpful:
The best way to manage database changes
MySQL database change tracking
SQL Server database change workflow best practices
Verify database changes (version-control)
Transferring changes from a dev DB to a production DB
tracking changes made in database structure
Or a search for Database Change
I look after a data warehouse developed in-house by the bank where I work. This requires constant updating, and we have a team of 2-4 devs working on it.
We are fortunate because there is only the one instance of our "product", so we do not have to cater for deploying to multiple instances which may be at different versions.
We keep a creation script file for each object (table, view, index, stored procedure, trigger) in the database.
We avoid the use of ALTER TABLE whenever possible, preferring to rename a table, create the new one and migrate the data over. This means that we don't have to look through a history of ALTER scripts - we can always see the up to date version of every table by looking at its create script. The migration is performed by a separate migration script - this can be partly auto-generated.
Each time we do a release, we have a script which runs the create scripts / migration scripts in the appropriate order.
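A rough Python sketch of the rename / create / migrate pattern described above, using SQLite so it runs anywhere. The customer table and its columns are invented for the example, and note that SQLite happens to spell the rename as ALTER TABLE ... RENAME TO, whereas other databases have their own rename command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# State before the release: the old table with some data in it.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Ada'), ('Grace')")

# The create script always holds the current, complete table definition.
CREATE_CUSTOMER = """
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT
)
"""

# The separate migration script: move the old table aside, build the new
# one from its create script, copy the rows across, then drop the old table.
conn.execute("ALTER TABLE customer RENAME TO customer_old")
conn.execute(CREATE_CUSTOMER)
conn.execute("INSERT INTO customer (id, name) SELECT id, name FROM customer_old")
conn.execute("DROP TABLE customer_old")
conn.commit()

rows = conn.execute("SELECT id, name, email FROM customer ORDER BY id").fetchall()
print(rows)
```

The point of the pattern is that CREATE_CUSTOMER can be read on its own to see the table as it is now, without replaying a history of ALTER scripts.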
FYI: We use Visual SourceSafe (yuck!) for source code control.
I've been looking for a SQL Server source control tool - and came across a lot of premium versions that do the job - using SQL Server Management Studio as a plugin.
LiquiBase is a free one, but I never quite got it working for my needs.
There is another free product out there, though, that works standalone from SSMS and scripts out objects and data to flat files.
These objects can then be pumped into a new SQL Server instance which will then re-create the database objects.
See gitSQL
Maybe you're asking for LiquiBase?

How update in a Multi Tenant app all schema of all tenants?

I am developing a multi-tenant app. I chose the "Shared Database/Separate Schemas" approach.
My idea is to have a default schema (dbo) and when deploying this schema, to do an update on the tenants' schemas (tenantA, tenantB, tenantC); in other words, to make synchronized schemas.
How can I synchronize the schemas of tenants with the default schema?
I am using SQL Server 2008.
First thing you will need is a table or other mechanism to store the version information of the schema. If nothing else so that you can bind your application and schema together. There is nothing more painful than a version of the application against the wrong schema—failing, corrupting data, etc.
The application should reject or shut down if it's not the right version. You might get some blowback when it's not right, but it protects you from the really bad day when the application corrupts the valuable data.
You'll need a way to track changes such as Subversion or something else—from SQL you can export the initial schema. From here you will need a mechanism to track changes using a nice tool like SQL compare and then track the schema changes and match to an update in version number in the target database.
We keep each delta in a separate folder beneath the upgrade utility we built. This utility signs on to the server, reads the version info and then applies the transform scripts, starting from the version after the one in the database, until it can find no more upgrade scripts in its subfolder. This gives us the ability to upgrade a database to the current version no matter how old it is. If there are data transforms unique to the tenant, these are going to get tricky.
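The upgrade utility described above could look roughly like this in Python. It is a sketch under assumptions, not the utility itself: the deltas live in a dictionary instead of folders, SQLite stands in for SQL Server, and the schema_version table and account table are invented names.

```python
import sqlite3

# Hypothetical deltas: key N holds the script that upgrades version N-1 to N.
UPGRADES = {
    1: "CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT)",
    2: "ALTER TABLE account ADD COLUMN email TEXT",
    3: "CREATE INDEX idx_account_name ON account (name)",
}


def current_version(conn):
    """Read the stored schema version; a brand new database counts as 0."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def upgrade(conn):
    """Apply deltas one at a time until no script exists for the next version."""
    version = current_version(conn)
    while version + 1 in UPGRADES:
        version += 1
        conn.execute(UPGRADES[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
    return version


conn = sqlite3.connect(":memory:")
final_version = upgrade(conn)
print(final_version)
```

Running upgrade again on the same database does nothing, which is what makes it safe to point at a database of any age.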
Of course, you should always make a backup of the database that writes to an external file, preferably with a human-identifiable version number, so you can find it and restore it when the script(s) go bad. Eventually they will, so plan on figuring out how to recover and restore.
I saw there is some sort of schema upgrader tool in the new VS 2010 but I haven't used it. That might also be useful to you.
There is no magic command to synchronize the schemas as far as I know. You would need to use a tool - either built in house or bought (Check out Red Gate's SQL Compare and SQL Examiner - you need to tweak them to compare different schemas).
Just synchronizing can often be tricky business though. If you added a column, do you need to also fill that column with data? If you split a column into two new columns there has to be conversion code for something like that.
My suggestion would be to very carefully track any scripts that you run against the dbo schema and make sure that they also get run against the other schemas when appropriate. You can then use a tool like SQL Compare as an occasional sanity check to look for any unexpected differences.
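One way to make sure a tracked script reaches every schema is to keep it as a template and run it once per schema. The Python sketch below shows the idea with SQLite, where attached in-memory databases play the role of the tenant schemas and main plays the role of dbo. The invoice table and the {schema} placeholder convention are assumptions made for the example.

```python
import sqlite3

TENANTS = ["tenantA", "tenantB", "tenantC"]

# Each tracked change is a template; {schema} is filled in per target schema.
TRACKED_SCRIPTS = [
    "CREATE TABLE {schema}.invoice (id INTEGER PRIMARY KEY, total REAL)",
    "ALTER TABLE {schema}.invoice ADD COLUMN currency TEXT",
]


def apply_everywhere(conn, schemas, scripts):
    """Run every tracked script, in order, against every schema."""
    for schema in schemas:
        for script in scripts:
            conn.execute(script.format(schema=schema))


def columns(conn, schema, table):
    """List the column names of schema.table."""
    return [row[1] for row in conn.execute(f"PRAGMA {schema}.table_info({table})")]


# Autocommit mode, because ATTACH cannot run inside a transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
for tenant in TENANTS:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {tenant}")

all_schemas = ["main"] + TENANTS
apply_everywhere(conn, all_schemas, TRACKED_SCRIPTS)

for schema in all_schemas:
    print(schema, columns(conn, schema, "invoice"))
```

A comparison tool then only has to confirm that the schemas really do match, instead of being the thing that makes them match.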

How to have a "master-structure" database with "children-data" databases in SQL SERVER 2005?

I have been googling a lot and I couldn't find if this even exists or I'm asking for some magic =P
Ok, so here's the deal.
I need a way to create a "master-structured" database which will only contain the schemas, structures, tables, stored procedures, UDFs, etc., everything but real data, in SQL SERVER 2005 (if this is available in 2008 let me know, I could try to convince my client to pay for it =P)
Then I want to have several "children" of that master db which implement those schemas, tables, etc but each one has different data.
So when I need to create a new stored procedure or something like that, I just create it on the master database (and of course it's available on its children).
Actually I have several different databases with the same schema and different data. But the problem is to maintain congruency between them. Every time I create a script to create some SP or add some index or whatever, I have to execute it in every database, and sometimes I could miss one =P
So let's say you have a UNIVERSE (would be the master db) and the universe has SPACES (each one represented by a child db). So the application I'm working on needs to dynamically "clone" SPACES. To do that, we have to create a new database. Nowadays I'm creating a backup of the db being cloned, restoring it as a new one and truncating the tables.
I want to be able to create a new "child" of the "master" db, which will maintain the schemas and everything, but will start with empty data.
Hope it's clear... My English is not perfect, sorry about that =P
Thanks to all!
What you really need is to version-control your database schema.
See do-you-source-control-your-databases
If you use SQL Server, I would recommend dbGhost - not expensive and does a great job of:
synchronizing 2 databases
diff-ing 2 databases
creating a database from a set of scripts (I would recommend this version).
batch support, so that you can upgrade all your databases using a single batch
You can use this infrastructure for both:
rolling development versions to test, integration and production systems
rolling your 'updated' system to multiple production deployments (especially in a hosted environment)
I would write my changes as a sql file and use OSQL or SQLCMD via a batchfile to ensure that I repeatedly executed on all the databases without thinking about it.
As an alternative I would use the Visual Studio Database Pro tools or Red Gate SQL Compare tools to compare and propagate the changes.
There are kludges, but the mainstream way to handle this is still to use Source Code Control (with all its other attendant benefits.) And SQL Server is increasingly SCC friendly.
Also, for many (most robust) sites it's a per-server issue as much as a per-database issue.
You can put things in master like SPs and call them from anywhere. As far as other objects like tables, you can put them in model and new databases will get them when you create a new database.
However, there is nothing built in that makes new tables simply pop up in the child databases after being added to the parent.
It would be possible to create something to look through the databases and script them from a template database, and there are also commercial tools which can help discover differences between databases. You could also have a DDL trigger in the "master" database which went out and did this when you created a new table.
If you kept a nice SPACES template, you could script it out (without data) and create the new database - so there would be no need to TRUNCATE. You can script it out from SQL or an external tool.
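Scripting out a template without data can be sketched like this in Python. SQLite is used as a stand-in because it stores each object's CREATE statement in its catalog; on SQL Server you would get the scripts from SSMS or an external tool instead. The space_object table is an invented example.

```python
import sqlite3


def clone_structure(template, target):
    """Recreate the template's tables, views, indexes and triggers in target, without rows."""
    rows = template.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY CASE type WHEN 'table' THEN 0 WHEN 'view' THEN 1 ELSE 2 END"
    ).fetchall()
    for (sql,) in rows:
        target.execute(sql)
    target.commit()


def object_names(conn):
    """Names of all user-defined objects in the database."""
    query = "SELECT name FROM sqlite_master WHERE name NOT LIKE 'sqlite_%'"
    return sorted(row[0] for row in conn.execute(query))


# Hypothetical SPACES template that happens to contain some rows.
template = sqlite3.connect(":memory:")
template.execute("CREATE TABLE space_object (id INTEGER PRIMARY KEY, name TEXT)")
template.execute("CREATE INDEX idx_space_object_name ON space_object (name)")
template.execute("INSERT INTO space_object (name) VALUES ('star'), ('planet')")
template.commit()

new_space = sqlite3.connect(":memory:")
clone_structure(template, new_space)

print(object_names(new_space))
print(new_space.execute("SELECT COUNT(*) FROM space_object").fetchone()[0])
```

The new database starts with the same objects and zero rows, so the backup, restore and TRUNCATE steps are no longer needed.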
A little trivia here. The mssqlsystemresource database works as you describe: it is defined once and 'appears' in every database as the special sys schema. Unfortunately, the special 'magic' needed to get this working is not available to user databases. You'll have to use deployment techniques to keep your schemas in sync, that is, apply the changes to every database as the other answers already suggested.
In theory, you could put a trigger on your UNIVERSE.sysobjects table (assuming SQL Server), and then you could enumerate master.dbo.sysdatabases to find all the child databases. If you have a special table that indicates it's a child database, you can reference child.dbo.sysobjects to find it.
Make no mistake, it would be difficult to implement. But it's one way you could do it.

Testing and Managing database versions against code versions

As you develop an application, database changes inevitably pop up. The trick, I find, is keeping your database build in step with your code. In the past I have added a build step that executed SQL scripts against the target database, but that is dangerous insofar as you could inadvertently add bogus data or worse.
My question is what are the tips and tricks to keep the database in step with the code? What about when you roll back the code? Branching?
Version numbers embedded in the database are helpful. You have two choices: embedding values into a table that can be queried (which allows versioning multiple items), or having an explicitly named object (such as a table or somesuch) you can test for.
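Both choices can be sketched in Python against SQLite. The schema_version table, its component column and the db_version_2_3 marker table are names invented for this example, and the catalog query is SQLite-specific (other databases expose their own catalog views).

```python
import sqlite3


def version_from_table(conn, component):
    """Choice 1: look the version up in a table; one row per versioned item."""
    row = conn.execute(
        "SELECT version FROM schema_version WHERE component = ?", (component,)
    ).fetchone()
    return row[0] if row else None


def has_marker(conn, marker_name):
    """Choice 2: test whether an explicitly named object exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (marker_name,)
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_version (component TEXT PRIMARY KEY, version TEXT NOT NULL)")
conn.execute("INSERT INTO schema_version VALUES ('core', '2.3'), ('reporting', '1.1')")
conn.execute("CREATE TABLE db_version_2_3 (placeholder INTEGER)")

print(version_from_table(conn, "core"))
print(has_marker(conn, "db_version_2_3"))
```

The table approach scales to several independently versioned components, while the marker object only needs an existence check and no query against your own data.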
When you release to production, do you have a rollback plan in the event of unexpected catastrophe? If you do, is it the application of a schema rollback script? Use your rollback script to rollback the database to a previous code version.
You should be able to create your database from scratch into a known state.
While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database.
I've definitely been interested in some sort of DB versioning system, but I haven't found anything yet. So, instead of a solution, you'll get my vote. :-P
You really do want to be able to take a clean machine, get the latest version from source control, build in one step, and run all tests in one step. Making this fast makes you produce good software faster.
Just like external libraries, database configuration must also be in source control.
Note that I'm not saying that all your live database content should be in the same source control, just enough to get to a clean state. (Do back up your database content, though!)
Define your schema objects and your reference data in version-controlled text files. For example, you can define the schema in Torque format, and the data in DBUnit format (both use XML). You can then use tools (we wrote our own) to generate the DDL and DML that take you from one version of your app to another. Our tool can take as input either (a) the previous version's schema & data XML files or (b) an existing database, so you are always able to get a database of any state into the correct state.
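As a very reduced illustration of generating DDL from two versions of a schema definition, here is a Python sketch. It is not the tool described above: the schemas are plain dictionaries instead of Torque XML, the table names are invented, and it only handles new tables, new columns and dropped tables.

```python
def generate_ddl(old, new):
    """Return DDL statements that take schema `old` to schema `new`.

    Each schema maps table name -> {column name: column type}.
    """
    statements = []
    for table, columns in new.items():
        if table not in old:
            cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        for name, ctype in columns.items():
            if name not in old[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {ctype}")
    for table in old:
        if table not in new:
            statements.append(f"DROP TABLE {table}")
    return statements


# Hypothetical schema definitions for two versions of an app.
v1 = {
    "customer": {"id": "INTEGER", "name": "TEXT"},
    "legacy_log": {"id": "INTEGER"},
}
v2 = {
    "customer": {"id": "INTEGER", "name": "TEXT", "email": "TEXT"},
    "orders": {"id": "INTEGER", "customer_id": "INTEGER"},
}

ddl = generate_ddl(v1, v2)
for statement in ddl:
    print(statement)
```

A real tool also has to deal with changed column types, renames and reference data, which is where most of the effort goes.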
I like the way that Django does it. You build models and then when you run a syncdb it applies the models that you have created. If you add a model you just need to run syncdb again. This would be easy to have your build script do every time you made a push.
The problem comes when you need to alter a table that is already made. I do not think that syncdb handles that. That would require you to go in and manually add the table and also add a property to the model. You would probably want to version that alter statement. The models would always be under version control though, so if you needed to you could get a db schema up and running on a new box without running the sql scripts. Another problem with this is keeping track of static data that you always want in the db.
Rails migration scripts are pretty nice too.
A DB versioning system would be great, but I don't really know of such a thing.
While being able to do so is helpful (especially in the early stages of a new project), many (most?) databases will quickly become far too large for that to be possible. Also, if you have any BLOBs then you're going to have problems generating SQL scripts for your entire database.
Backups and compression can help you there. Sorry - there's no excuse not to be able to get a good set of data to develop against. Even if it's just a subset.
Put your database developments under version control. I recommend having a look at neXtep designer:
http://www.nextep-softwares.com/wiki
It is a free GPL product which offers a brand new approach to database development and deployment by connecting version information with a SQL generation engine which could automatically compute any upgrade script you need to upgrade any version of your database into another. Any existing database could be version controlled by a reverse synchronization.
It currently supports Oracle, MySql and PostgreSql. DB2 support is under development. It is a full-featured database development environment where you always work on version-controlled elements from a repository. You can publish your updates by simple synchronization during development and you can generate exportable database deliveries which you will be able to execute on any targeted database through a standalone installer which validates the versions, performs structural checks and applies the upgrade scripts.
The IDE also offers you SQL editors, dependency management, support for modular database model components, data model diagrams, SQL clients and much more.
All the documentation and concepts could be found in the wiki.

Resources