Database synchronization between a new greenfield project database and an old project's database

I am thinking about developing a new greenfield app using DDD/TDD/NHibernate, with a new database schema reflecting the domain, where changes in the DB would need to be synchronized both ways with the old project's database. The requirement is that both projects will run in parallel, and once the new project starts adding more business value than the old one, the old project would be shut down.
One approach I have in mind is to achieve the DB synchronization via database triggers. Whenever you insert/update/delete in the new database, the trigger for the affected table would need to correctly update the old database. The same goes for changes in the old database: its triggers would need to update the new database.
Example:
The old project has a single table, Quote, with columns QuoteId and QuoteVersion. The correct domain model is one Quote object with many QuoteVersion objects, so the new database would have two tables, Quote and QuoteVersion. If you change the Quote table in the new DB, the trigger would need to update either all records with that QuoteId in the old DB or only the latest version. Conversely, if you update a Quote record in the old DB, you either update the corresponding record in the new DB, or perhaps only when the latest version of the Quote in the old DB was the one updated.
So there would need to be some logic in the triggers, and those SQL statements might be non-trivial. To ensure maintainability, the triggers would need thorough tests (save data in one DB, verify the data in the other DB, for the different cases).
The question: do you think this trigger idea for DB synchronization is viable (I'm not sure yet how to ensure one trigger won't set off the other database's trigger)? Has anybody tried it and found out it goes to hell? Do you have a better idea of how to fulfil the requirement of keeping the databases in sync?

This is a non-trivial challenge, and I would not really want to use triggers - you've identified a number of concerns yourself, and I would add concerns about performance and availability, plus the distinct likelihood of horrible infinite-loop bugs: a trigger in the legacy app inserts a record into the greenfield app, which causes a trigger to fire in the greenfield app to insert a record in the legacy app, which causes a trigger to fire in the legacy app...
The cleanest option I've seen is based on a messaging system. Every change in the application fires a message, which is handled by a recipient at the receiving end. The recipient can validate the message, and - ideally - forward it to the "normal" code which handles that particular data item.
For example:
legacy app creates new "quote" record
legacy app sends a message with a representation of the new "quote"
message bus forwards message to greenfield app "newQuoteMessageHandler"
greenfield app "newQuoteMessageHandler" validates data
greenfield "newQuoteMessageHandler" instantiates "quote" domain entity, and populates it with data
greenfield domain entity deals with the remaining persistence and associated business logic (a sketch of such a handler follows below).
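To make the flow concrete, here is a minimal sketch of what the greenfield side might look like in C#. All the names here (NewQuoteMessage, NewQuoteMessageHandler, IQuoteRepository, and the Quote/QuoteVersion entities) are hypothetical, and the actual bus wiring (NServiceBus, MassTransit, a hand-rolled queue reader, ...) is deliberately left out:

using System;
using System.Collections.Generic;

// Hypothetical message contract published by the legacy app.
public class NewQuoteMessage
{
    public int QuoteId { get; set; }
    public int QuoteVersion { get; set; }
    public decimal Amount { get; set; }
}

// Minimal stand-ins for the greenfield domain model and its repository.
public class QuoteVersion
{
    public QuoteVersion(int version, decimal amount) { Version = version; Amount = amount; }
    public int Version { get; private set; }
    public decimal Amount { get; private set; }
}

public class Quote
{
    private readonly List<QuoteVersion> _versions = new List<QuoteVersion>();
    public Quote(int id) { Id = id; }
    public int Id { get; private set; }
    public IEnumerable<QuoteVersion> Versions { get { return _versions; } }
    public void AddVersion(int version, decimal amount)
    {
        _versions.Add(new QuoteVersion(version, amount));
    }
}

public interface IQuoteRepository
{
    Quote GetById(int id);   // returns null when the quote is not known yet
    void Save(Quote quote);  // assumed to be backed by NHibernate in the greenfield app
}

// The handler: validate the message, map it onto the domain, persist through the normal path.
public class NewQuoteMessageHandler
{
    private readonly IQuoteRepository _quotes;

    public NewQuoteMessageHandler(IQuoteRepository quotes) { _quotes = quotes; }

    public void Handle(NewQuoteMessage message)
    {
        if (message.QuoteId <= 0)
            throw new ArgumentException("QuoteId must be positive.");

        var quote = _quotes.GetById(message.QuoteId) ?? new Quote(message.QuoteId);
        quote.AddVersion(message.QuoteVersion, message.Amount);
        _quotes.Save(quote);
    }
}

Because the handler depends only on the message contract and the repository interface, it can be unit-tested without a bus or a database.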
Your message handlers should be relatively easy to test - and you can use them to isolate each app from the crazy in the underlying data layer. It also allows you to deal with evolving data schemas in the greenfield app.
Retro-fitting this into the legacy app could be tricky - and may well need to involve triggers to capture data updates, but the logic inside the trigger should be pretty straightforward - "send new message".
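One way to keep those legacy-side triggers trivial is an outbox: the trigger only INSERTs a row into a small outbox table, and a separate process polls that table and publishes the messages. The sketch below is purely illustrative - the LegacyOutbox table (Id, Payload, Processed) and the publish delegate are assumptions, not part of any particular library:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

// Polls a hypothetical LegacyOutbox table filled by the legacy triggers and
// hands each payload to a publish delegate (whatever bus you end up choosing).
public class OutboxPoller
{
    private readonly string _connectionString;
    private readonly Action<string> _publish;

    public OutboxPoller(string connectionString, Action<string> publish)
    {
        _connectionString = connectionString;
        _publish = publish;
    }

    public void PollOnce()
    {
        using (var connection = new SqlConnection(_connectionString))
        {
            connection.Open();

            // Read the rows the triggers have written but nobody has published yet.
            var pending = new List<KeyValuePair<int, string>>();
            using (var select = new SqlCommand(
                "SELECT Id, Payload FROM LegacyOutbox WHERE Processed = 0", connection))
            using (var reader = select.ExecuteReader())
            {
                while (reader.Read())
                    pending.Add(new KeyValuePair<int, string>(reader.GetInt32(0), reader.GetString(1)));
            }

            foreach (var row in pending)
            {
                _publish(row.Value);

                // Mark the row as handled so it is not published twice.
                using (var update = new SqlCommand(
                    "UPDATE LegacyOutbox SET Processed = 1 WHERE Id = @id", connection))
                {
                    update.Parameters.AddWithValue("@id", row.Key);
                    update.ExecuteNonQuery();
                }
            }
        }
    }
}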
Bi-directional sync is hard! You can expect to spend a significant amount of time getting this up and running, and maintaining it as your greenfield project evolves. If you're working on the Microsoft stack, it's worth looking at the Sync Framework: http://msdn.microsoft.com/en-us/sync/bb736753.

Related

How to handle deleting databases in Couch across multiple clients?

With CouchDB, you can eventually end up with more deleted documents in a database than active ones. After a while this becomes suboptimal, as you're syncing more deleted-document data than anything else.
The official documentation recommends periodically destroying and recreating the database to get around this. However, I've noticed that when I do this, a client with a local copy of the database (e.g. a database named "username" that replicates to a client device via Pouch) sees the blank database and simply refills it, deleted document records and all.
Short of changing the database name every time, is there any way to signal to other Couch instances that they shouldn't repopulate the fresh, clean database, and should instead treat it as an entirely new database? Or, in fact, any other solution at all?
Yes, if you have bidirectional replication then the "other side" will replicate all the deleted docs back to the new DB. The only two options I can think of are to have a new database (with a new name, which is what the docs you linked to probably meant), or to use filtered replication so the client doesn't push up deleted docs (or doesn't push up deleted docs older than a certain point).
The latter of these options is significantly more complex than the former.
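For reference, here is a rough sketch of the filtered-replication route, driven over plain HTTP from C#. The database names, server URL and the sync/no_deleted filter name are all made up for the example; on a PouchDB client you would pass the same filter name to its replication options instead:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class FilteredReplicationSetup
{
    public static async Task RunAsync()
    {
        var couch = new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };

        // 1. Store a filter function on the client-side database that drops deleted docs.
        const string designDoc = @"{
            ""filters"": {
                ""no_deleted"": ""function(doc, req) { return !doc._deleted; }""
            }
        }";
        await couch.PutAsync("username/_design/sync",
            new StringContent(designDoc, Encoding.UTF8, "application/json"));

        // 2. Push replication that only sends documents passing the filter,
        //    so deleted docs never reach the freshly recreated server database.
        const string replication = @"{
            ""source"": ""username"",
            ""target"": ""http://server.example.com:5984/username"",
            ""filter"": ""sync/no_deleted""
        }";
        await couch.PostAsync("_replicate",
            new StringContent(replication, Encoding.UTF8, "application/json"));
    }
}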

Two different MVC sites sharing one database with Entity Framework?

I have a database-driven site developed with MVC and Entity Framework Code First. The database is rather large and contains all the data I would need for an additional web application. What are the implications of setting up a new website, Database First, using the same existing database? What I am really trying to ask is whether it would be a bad idea to share a database between two web applications where both are querying and updating the data. Will this slow down processing on the original site, or possibly lock up data, etc.? Both sites would be running on the same machine...
TIA
If sharing the same data between both applications is important - i.e. you want the data to be shared - then you have to use the same database. It'll slow down processing, but if that's the requirement, then you have to.
There's nothing stopping you from having two applications access the database. Databases are built to handle multiple connections from multiple users, so there aren't many risks involved. You probably won't even notice the speed difference.
The two biggest risks I can think of are:
if both applications edit the same record, the one that submits its data last will win, unless you put business logic in place to prevent that (see the concurrency sketch below)
if the database schema is updated, both applications need to be updated to reflect the new schema, so that each can still access and edit the data successfully
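For the first risk, Entity Framework's optimistic concurrency support can at least turn a silent "last write wins" into a detectable conflict. A minimal sketch, assuming a hypothetical Product entity shared by both sites and a rowversion column mapped with [Timestamp]:

using System.ComponentModel.DataAnnotations;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;

// Hypothetical entity used by both sites; the [Timestamp] column maps to a
// SQL Server rowversion and lets EF detect concurrent edits.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }

    [Timestamp]
    public byte[] RowVersion { get; set; }
}

public class CatalogContext : DbContext
{
    public DbSet<Product> Products { get; set; }
}

public static class ProductUpdater
{
    public static void Rename(int id, string newName)
    {
        using (var db = new CatalogContext())
        {
            var product = db.Products.Find(id);
            if (product == null) return;

            product.Name = newName;
            try
            {
                db.SaveChanges();
            }
            catch (DbUpdateConcurrencyException)
            {
                // The other application changed this row after we read it.
                // Decide what "winning" means here: reload and retry, merge, or report an error.
                throw;
            }
        }
    }
}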

How to implement web applications more efficiently?

I've been developing a website using JSF & PrimeFaces. During development, I noticed that there are two bottlenecks for any web application. Correct me if I'm wrong.
The bottlenecks are:
I've used the Hibernate framework for the persistence layer. Now, if a change occurs in the database, there's no way to reflect that in scoped beans. Hibernate has a dynamic-update attribute which helps to update only the changed columns [at the time of persisting], but I've not found a similar mechanism by which I can always get an up-to-date DAO. Here, the developer has to take responsibility for updating them by calling the session.refresh() method, which just reloads the entire object tree from the database table. So, for each small database change, I think the caching functionality of the DAO [in Hibernate] is lost, since every time the objects are evicted from the session cache. In a word, database updates don't trigger DAO updates.
After updating the DAO, if I want to reflect the changes at the view level, I have to use PrimeFaces sockets [PrimePush], since refreshing the page every time isn't a good implementation and a PrimeFaces socket allows updating of specific component IDs. That means for each DAO field I have to use many different PrimeFaces sockets, each one with a unique channel. Also, sending messages to those different sockets has to be done by the developer in the bean code.
So, the question is: how can these be handled in an efficient way? Are there other technologies/frameworks that handle these issues so that the developer doesn't have to worry about them?
Ideally you should be doing something like this:
Hibernate persistence layer (with DAOs performing the CRUD operations)
Managed beans which access your DAOs
View (PrimeFaces) using the backing beans to update the view.
You don't need PrimePush or anything like it; the view should be refreshed by actions in your views.

ASP.NET Code First automatic database updates

I am creating an application in C# ASP.NET using Code First Entity Framework that will use a different database for each customer (in other words, every customer has their own database, which will be generated on first use).
I am trying to figure out a way to update all these databases automatically whenever I apply changes to my objects. In other words, how would I approach a cleanstep system in Code First EF?
Currently I am using the DropCreateDatabaseIfModelChanges initializer to define a simple database that allows me to test my application whenever a schema change occurs. However, this approach drops the database, which obviously is unacceptable for customer databases.
I must assume hundreds of customers, so updating all databases by hand is not an option.
I do not mind writing code that copies the data into a new database.
I think the best solution would be a way to back up a database somehow and then reinsert all the data into the newly created database. Even better would be a way that automatically updates the schema without dropping the database. However, I have no idea how to approach this. Can anyone point me in the right direction?
The link posted by Joakim was helpful. It requires you to update to EF 4.3.1 (don't forget to update your references in other projects if you have them), after which you can run the command that enables migrations (Enable-Migrations in the Package Manager Console). To automatically update the schema from code, you can use:
// "Configuration" is the migrations configuration class generated by Enable-Migrations
// (it derives from DbMigrationsConfiguration<YourContext>); DbMigrator lives in System.Data.Entity.Migrations.
Configuration configuration = new Configuration();
DbMigrator migrator = new DbMigrator(configuration);
migrator.Update();                          // applies any pending migrations to the database
Database.SetInitializer<DbContext>(null);   // disables the drop/create initializer
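Alternatively, instead of calling DbMigrator yourself, EF 4.3 and later ship a MigrateDatabaseToLatestVersion initializer that applies pending migrations the first time a context is used. A minimal sketch, assuming a hypothetical MyContext and the Configuration class that Enable-Migrations generates (stubbed here so the snippet is complete):

using System.Data.Entity;
using System.Data.Entity.Migrations;

// Hypothetical context; replace with your real DbContext subclass.
public class MyContext : DbContext { }

// Normally generated by Enable-Migrations; shown here only so the sketch compiles.
public class Configuration : DbMigrationsConfiguration<MyContext>
{
    public Configuration()
    {
        AutomaticMigrationsEnabled = true; // or keep explicit, code-based migrations only
    }
}

public static class DatabaseBootstrap
{
    public static void Configure()
    {
        // Applies any pending migrations on first use of MyContext,
        // without ever dropping an existing customer database.
        Database.SetInitializer(new MigrateDatabaseToLatestVersion<MyContext, Configuration>());
    }
}

Called once at application start-up, this brings each customer's database up to the current model the first time that customer's context is opened.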

how to minimize application downtime when updating database and application ORM

We currently run an ecommerce solution for a leisure and travel company. Every time we have a release, we must bring the ecommerce site down while we update the database schema and the data access code. We are using a custom-built ORM where each data entity is responsible for its own CRUD operations. This is accomplished by dynamically generating the SQL based on attributes on the data entity.
For example, the data entity for an address would be...
[tableName("address")]
public class address : dataEntity
{
    [column("address1")]
    public string address1;

    [column("city")]
    public string city;
}
So, if we add a new column to the database, we must update the schema of the database and also update the data entity.
As you can expect, the business people are not too happy about this outage, as it puts a crimp in their cash flow. The operations people are not happy either, as they have to deal with a high-pressure period when the database and applications are upgraded. The programmers are upset because they are constantly getting in trouble over the legacy system they inherited.
Do any of you smart people out there have some suggestions?
The first answer is obviously, don't use an ORM. Only application programmers think they're good. Learn SQL like everyone else :)
OK, so back to reality. What's to stop you restricting all schema changes to additions only? Then you can update the DB schema any time you like, and install the recompiled application at a safe time (6 am works best, I find) after the DB has been updated. If you must remove things, perform the steps the other way round: install the new app leaving the schema unchanged, and then remove the bits from the schema.
You're always going to have a high-pressure time as you roll out changes, but at least you can manage it better by doing it in two easier-to-understand pieces. Your DBAs will be OK with updating the schema for the existing application.
The downside is that you have to be a lot more organised, but that's not a bad thing when dealing with production servers - you should already be seriously organised about it.
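To illustrate the additions-only idea with the question's own style of entity (the tableName/column attributes and the dataEntity base class are stubbed here as hypothetical stand-ins): step one adds a nullable column to the live database at any time; step two ships the updated entity whenever it suits, and the previous build simply never touches the new column.

using System;

// Stand-ins for the custom ORM pieces from the question, so the sketch compiles;
// the lower-case names follow the question's own code.
public class dataEntity { }

[AttributeUsage(AttributeTargets.Class)]
public class tableNameAttribute : Attribute
{
    public tableNameAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

[AttributeUsage(AttributeTargets.Field)]
public class columnAttribute : Attribute
{
    public columnAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

// Step 1 (run while the current build is still live):
//   ALTER TABLE address ADD middlename VARCHAR(100) NULL
// Step 2 (deploy at a quiet time): ship the entity that knows about the new column.
[tableName("address")]
public class address : dataEntity
{
    [column("address1")]
    public string address1;

    [column("city")]
    public string city;

    // New field; the column is nullable, so the previous build keeps working untouched.
    [column("middlename")]
    public string middleName;
}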
Supporting this scenario will add significant complexity to your environment and/or process and/or application.
You can run a complex update process where your application code is smart enough to run correctly on both the old schema and the new schema at the same time. Then you can update the application first and the schema second. A third step may be to migrate any data, which again, the application has to be able to work with. In that case, you only need to "tombstone" the application for the time it takes to upgrade the application, which could just be seconds, depending on how many files and machines are involved in the upgrade.
In most cases, it's best to leave the application/environment/process simple and live with the downtime during a slow time of the day/week/month. Pretty much all applications need to be "taken down" from time to time for "regularly scheduled maintenance".
