Using Flyway to support multiple database instances with varying data

My team currently has several beta customers using our product. Our current method of upgrading a customer's database to the latest version is to re-initialize the database and re-create the customer's configuration by hand. The configuration isn't large, but the process is tedious and will change as we implement some kind of migration strategy.
My question is: is it possible to use Flyway (or some other tool) to manage database schema migrations of all instances of our product, yet retain independent instance data? What is the best approach to this kind of problem?

Yes, you can use Flyway for this.
You can place the customer-specific reference data in a separate location per customer.
You can then configure flyway.locations like this:
Customer A: flyway.locations=scripts/ddl,scripts/data/customer_a
Customer B: flyway.locations=scripts/ddl,scripts/data/customer_b
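As a sketch, assuming the Flyway command-line tool and a filesystem layout along these lines (all paths and version numbers are illustrative):

scripts/ddl/V1__create_schema.sql            (schema migrations shared by every customer)
scripts/data/customer_a/V1_1__seed_a.sql     (customer A reference data)
scripts/data/customer_b/V1_1__seed_b.sql     (customer B reference data)

each customer's instance is then migrated against its own database with its own locations, for example:

flyway -url=jdbc:postgresql://customer-a-host/app -locations=filesystem:scripts/ddl,filesystem:scripts/data/customer_a migrate

Versions must stay unique across the combined locations, so it helps to reserve a numbering range (or a version suffix, as above) for the per-customer data scripts.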

Related

Is there any way in Spring Boot to commit same data to two different data sources without duplicating the repository?

I want to replicate neo4j data on two different neo4j instances installed on EC2. Is there any way I can commit the same data to two different neo4j instances?
I tried the examples for committing to two different data sources given here, but they create separate repositories for each data source config, which I don't want here. I have only one repository, and when I commit, the data should be written to both data sources. I cannot use the enterprise edition of neo4j since it is costly, so I have to limit myself to the community edition.
In general, I would like to learn about this process irrespective of the type of database. It could be SQL or H2 or Mongo, any DB.
Essentially you are asking for a replication feature, and as far as I remember, replication is not available in the free open-source version of neo4j.
If you create two repositories, you will introduce the problems of a distributed system, for example: what if one write succeeds and the other fails? If you really want this kind of architecture, you are better off using RabbitMQ/Kafka and making it an event-driven system that works on the publish/subscribe pattern. That way you can have multiple listeners, each updating a neo4j instance, though it is still not ideal by any means.
I would suggest looking at neo4j alternatives like https://orientdb.com/ or buying a commercial license of neo4j.
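The original question is about Spring Boot and Java, but since the asker says the process matters more than the particular stack, here is a minimal Python sketch of that publish/subscribe idea, assuming RabbitMQ via the pika client and the official neo4j driver; the exchange name, URIs, credentials and event format are all made up for illustration:

import json
import pika
from neo4j import GraphDatabase

# Publisher: the application emits one write event instead of writing twice.
def publish_write(channel, cypher, params):
    channel.basic_publish(exchange="graph-writes", routing_key="",
                          body=json.dumps({"cypher": cypher, "params": params}))

# Consumer: run one of these per neo4j instance, so each instance applies
# the same stream of writes independently.
def consume(amqp_url, bolt_uri):
    driver = GraphDatabase.driver(bolt_uri, auth=("neo4j", "password"))
    channel = pika.BlockingConnection(pika.URLParameters(amqp_url)).channel()
    channel.exchange_declare(exchange="graph-writes", exchange_type="fanout")
    queue = channel.queue_declare(queue="", exclusive=True).method.queue
    channel.queue_bind(exchange="graph-writes", queue=queue)

    def handle(ch, method, properties, body):
        event = json.loads(body)
        with driver.session() as session:
            session.run(event["cypher"], event["params"])

    channel.basic_consume(queue=queue, on_message_callback=handle, auto_ack=True)
    channel.start_consuming()

Note that this only moves the dual-write problem into the messaging layer: one consumer can still fail after the other has succeeded, so the two instances end up eventually consistent at best.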

How to design a DB for several projects

I'm wondering what the best way to organize my DB would be. Let me explain:
I'm starting a new "big" project. This big project will be composed of a few little ones. In general the little projects are not related to each other; they are just features of the big one.
One thing that all the projects have in common is the users who are going to use them.
So my questions are:
Should I create a different DB for each of the little projects (currently each project will contain 4-5 tables)?
How should I deal with the users? Should I create one DB for all the users, or should I duplicate the users table in every DB? Keep in mind that the information about the users is used a lot in every little project; it's NOT only for identification purposes.
Thanks in advance for your advice.
This greatly depends on the database you choose to use.
If these "sub-projects" are designed to work as one coherent unit, then I strongly recommend you keep it all in the same database. One backup, one restore, one unit.
For organizational purposes, if you are using a database which supports it, select a different Schema per project. PostgreSQL and SQL Server are two databases (among others) which support this effortlessly.
In the case of a database like MySQL, I recommend you pick a short prefix for each subproject and prefix all tables accordingly. "P1_Customer" for example.
Shared data would go in its own schema or prefix, like Global or something like that.
Actually, this was one of the many reasons we switched our main database from MySQL to PostgreSQL. We've been heavy users of both, and I really appreciate the features that PostgreSQL offers. SQL Server, if you are in a Windows environment, is a great database IMO as well.
If the little projects are "features of the big one", then I don't see a reason why you wouldn't want just one user table for the main project. The way you set up the question makes this seem to be the case: "If there is a user A in little project 1, then there must be a user A in the 'big' project." If that is true, you should likely keep the users in the big DB instead of duplicating them, unless you have more qualifying details.
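To make the schema-per-project idea concrete, a minimal PostgreSQL sketch with one shared users table might look like this (all schema, table and column names are illustrative):

CREATE SCHEMA global;
CREATE SCHEMA project1;

CREATE TABLE global.users (
    id   serial PRIMARY KEY,
    name text NOT NULL
);

-- Each sub-project keeps its own tables but references the shared users.
CREATE TABLE project1.orders (
    id      serial PRIMARY KEY,
    user_id integer NOT NULL REFERENCES global.users (id)
);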
I think the proper answer is 'it depends'.
Starting your organization down the path of a single centralized system is good on many levels. In general I would recommend this.
However, if you are going to have dramatically different development schedules, or dramatically different user experiences with the various sub-projects, then you may be better off keeping them separate.
I'd have a look at OpenID or some other single sign-on protocol depending on the nature of your application. OpenID includes a mechanism called "attribute exchange", which allows applications to retrieve profile information from the OpenID provider.
This allows you to create a central user profile repository, with an authentication scheme, and have your individual apps query that repository for profile information.
The question as to how to design your database is hard to answer without more information. In most architectures, "features" within an application tend to be closely linked - "users" are related to "accounts" are related to "organisations" etc.
I'd recommend looking at the foreign key relationships to answer this question. If you have lots of foreign keys, build a single database for all tables. If you have "clusters" of foreign keys, and you want to have a different life cycle for each application (assuming the clusters map neatly to the applications), consider separate databases.
By "life cycle", I mean mostly the development lifecycle - app 1 might deploy weekly, app 2 monthly, app 3 once only and then be frozen.

django manual database migration

I prefer to migrate my tables manually in Django, because using automated tools puts me in a place where I cannot see the impact. By impact, I mean the time it takes the DB to get in sync with my models. Below is a simple example:
class User(models.Model):
    first_name = models.CharField(..)
Let's say I want to add this:
class User(models.Model):
    first_name = models.CharField(..)
    last_name = models.CharField(..)
I will follow these steps on my production server:
Disable site traffic.
Manually connect to the DB server, let's say MySQL, and add a field named last_name to the User table (making sure it is in sync with the SQL generated for the new model, of course; see the example after these steps).
Update your model.
Upload new files, restart traffic.
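For that manual ALTER step, assuming MySQL, an app label of accounts, and Django's default table naming, the statement would look something like this (the column type and default are illustrative and must match what Django would generate for the new field):

ALTER TABLE accounts_user ADD COLUMN last_name varchar(30) NOT NULL DEFAULT '';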
I have two questions for this scenario:
Is this a preferred/acceptable way for manual db migration in Django?
If I just add a field with a specific default value to the User table manually via SQL, but don't update the model, will I still get a DatabaseIntegrity exception?
Thanks in advance,
With all of the schema migration tools, such as South, there are ways of explicitly defining how your models get migrated. The benefits of using a tool such as this are:
Your migrations are stored in your version control system
There's a documented procedure to roll back schema migrations
If another developer joins your project, you can refer that person to the south documentation rather than explaining your own hacky solution to documenting schema migrations.
I think I should just emphasize a point here: though South has automigration tools, you don't have to use automigration if you're using South.
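As a minimal sketch of that, a South schema migration for the last_name example above can be written entirely by hand (the app label, table name and field options are illustrative):

from south.db import db
from south.v2 import SchemaMigration
from django.db import models

class Migration(SchemaMigration):

    def forwards(self, orm):
        # Add the new column explicitly, so the exact DDL impact stays visible.
        db.add_column('accounts_user', 'last_name',
                      models.CharField(max_length=30, default=''),
                      keep_default=False)

    def backwards(self, orm):
        db.delete_column('accounts_user', 'last_name')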
Is this a preferred/acceptable way for manual db migration in Django?
I would answer no. As @Mike said, Django has a reliable and fairly versatile ecosystem of migration tools, the most prominent of which is South. @Mike's answer has the details right.
To answer your second question:
If I just add a field with a specific default value to the User table manually via SQL, but don't update the model, will I still get a DatabaseIntegrity exception?
No. Your models will continue to function normally. Of course if you want to do something with the new fields using Django's ORM you'll be better off adding them to the model class.
A side effect of this is that you can migrate legacy database tables by selectively choosing the fields to use in your models.
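For example, an unmanaged model can expose only the columns you care about from an existing table (the table and column names are made up, and this assumes the table has an integer primary key named id):

from django.db import models

class LegacyCustomer(models.Model):
    name = models.CharField(max_length=100)
    email = models.CharField(max_length=100)

    class Meta:
        db_table = 'legacy_customers'  # existing table, not created by Django
        managed = False                # Django never creates or alters this table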

Interacting with external DB via Django

I'm working on a Django app that interacts with an existing database (think ERP/transaction-type data) to perform analysis. There will be minimal/no updating of the existing database, mainly reading data in. It's just a simple, small setup, so there are no replication etc. issues to think about regarding updates.
The analysis would result in new records created within the Django Model.
Currently the existing DB runs on PostgreSQL.
I am aware of Alex Gaynor's GSoC multi-db code which, from what I gather, is ticket #1142, which has no patch in trunk yet.
So there are three options I can see:
1) Point the Django DB to the same DB as the ERP and let it create the tables it needs within it (all the ERP tables have a prefix, so there would be no collision); however, this strikes me as hacky and a recipe for disaster.
2) Create a new DB for Django and automatically copy over the required tables. Better, but I can't update, though I can probably live with this.
3) Try out the multidb patch.
Are there other better ideas out there? I'm leaning towards at least trying out the multidb patch but I'm a little worried about stability and forwards compatibility.
How about not using Django's ORM layer at all for that DB? If the interaction is minimal, you might do it faster by just using direct SQL with the appropriate PostgreSQL Python library.
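A minimal sketch of that approach with psycopg2 (the connection details, table and query are made up for illustration):

import psycopg2

# Read-only access to the existing ERP database, bypassing the Django ORM entirely.
conn = psycopg2.connect(host="erp-host", dbname="erp", user="readonly", password="secret")
cur = conn.cursor()
cur.execute("SELECT order_id, total FROM erp_orders WHERE created >= %s", ("2010-01-01",))
rows = cur.fetchall()
cur.close()
conn.close()

# The analysis results can then be written back through normal Django models.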

What is a good reverse db engineer tool for NHibernate?

Does anyone know of a good tool to reverse engineer mappings and business classes for NHibernate? NHibernate is best for greenfield development, but we also need to work with large legacy databases. I've tried NGen, which does OK, but it processes the entire DB, you cannot select individual tables or map to sprocs, and it maps a UNIQUEIDENTIFIER to a UNIQUEIDENTIFIER (it should have mapped it to a Guid).
We do have a corporate budget, so the tool doesn't have to be free. I understand that Frans has said the next version of LLBLGen will provide support for NHibernate and other 3rd parties (Is LL to be the one generator to rule them all?), but that's 4th quarter or later.
We use LLBLGen exclusively and LOVE it. Since we utilize legacy databases as well, it was a perfect fit. Maybe an alpha or beta will be available earlier?
I've used MyGeneration with NHibernate before. Unfortunately, I can't say much about the setup/configuration process because I inherited the files from another developer. I do know that you tell it which database to run against and then it comes back with a list of database objects (for sure tables and views, not sure about stored procedures). Then you select which objects you want to generate mappings for and click a button which generates mappings and/or classes with a template engine.
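For reference, the kind of mapping file such a template typically emits, including the Guid mapping the question mentions, looks roughly like this (the class, table and column names are illustrative):

<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" assembly="MyApp" namespace="MyApp.Domain">
  <class name="Customer" table="Customer">
    <id name="Id" column="CustomerId" type="Guid">
      <generator class="guid" />
    </id>
    <property name="Name" column="Name" />
  </class>
</hibernate-mapping>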
