Hibernate: how to mirror specific data - database

I'm currently working on a project using Hibernate for persistence on top of databases of various types.
The solution consists of multiple servers with their own databases.
The challenge is now to build a server that receives all data from all other servers to provide monitoring and reporting functionality. If data changes in one of the servers, it shall (almost) instantly be sent to the monitoring server. Network latency and outages shall be handled.
I found two possible ways to monitor the data changes (insert, update, delete):
Hibernate Envers
Appears to be an auditing solution that keeps a log of all modifications in separately created database tables. I could not find information on how to filter the data, which may become necessary in the future.
Hibernate Interceptor
The interceptor functionality (e.g. as described in the Mkyong blog entry). It
does almost the same as Envers, but gives me the possibility to use my own audit table to store the modifications and to filter the data by my own criteria if necessary.
My idea is now to:
store the modifications by serializing the data to the audit table
scan the table (e.g. every 30 seconds) for new entries
transfer the entries (e.g. via HTTP upload) to the monitoring server
import the data into the monitoring database using Hibernate
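For illustration, here is a minimal sketch of the interceptor idea using Hibernate's EmptyInterceptor; the AuditDao and its record method are placeholders for whatever serialization and audit-table schema you choose:

```java
import java.io.Serializable;
import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

// Captures inserts, updates and deletes and hands them to an audit writer.
public class AuditInterceptor extends EmptyInterceptor {

    private final AuditDao auditDao = new AuditDao();

    @Override
    public boolean onSave(Object entity, Serializable id, Object[] state,
                          String[] propertyNames, Type[] types) {
        auditDao.record("INSERT", entity.getClass().getSimpleName(), id, state, propertyNames);
        return false; // we did not modify the entity state
    }

    @Override
    public boolean onFlushDirty(Object entity, Serializable id, Object[] currentState,
                                Object[] previousState, String[] propertyNames, Type[] types) {
        auditDao.record("UPDATE", entity.getClass().getSimpleName(), id, currentState, propertyNames);
        return false;
    }

    @Override
    public void onDelete(Object entity, Serializable id, Object[] state,
                         String[] propertyNames, Type[] types) {
        auditDao.record("DELETE", entity.getClass().getSimpleName(), id, state, propertyNames);
    }

    // Placeholder for your own audit persistence (e.g. serialize the state and
    // insert it into the audit table via JDBC or a separate Hibernate session).
    static class AuditDao {
        void record(String operation, String entityName, Serializable id,
                    Object[] state, String[] propertyNames) {
            // serialize state + propertyNames and write a row to the audit table here
        }
    }
}
```

The interceptor would then be registered when building the Session/SessionFactory (e.g. via Configuration.setInterceptor or the SessionFactory builder), so every insert, update and delete passes through it.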
My question is now:
Is there a better or easier way to solve this?

Related

Load balancer and multiple instance of database design

The current single application server can handle about 5,000 concurrent requests. However, the user base will be in the millions, and I may need two application servers to handle the requests.
So the plan is to add a load balancer in the hope of handling over 10,000 concurrent requests. However, the data of all users is currently stored in one single database. With two or more application servers, shall I do the following?
Have two database instances
Sync the two databases in real time
Is this correct?
However, if so, will the sync process lower the performance of the servers, since database replication seems costly?
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
So – tl;dr – put your DB on its own big server, and spread your application code across many small servers, all connecting to that same DB server.
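To illustrate the caching suggestion above, here is a minimal read-through cache sketch assuming the Jedis client; the key format, the 5-minute TTL, and loadUserFromDatabase are placeholders:

```java
import redis.clients.jedis.Jedis;

public class UserCache {

    private final Jedis jedis = new Jedis("localhost", 6379);

    // Read-through cache: serve from Redis when possible, otherwise fall back
    // to the database and cache the result for subsequent requests.
    public String getUserJson(long userId) {
        String key = "user:" + userId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                            // cache hit: no database round trip
        }
        String fromDb = loadUserFromDatabase(userId); // placeholder for the real DB query
        jedis.setex(key, 300, fromDb);                // cache for 5 minutes
        return fromDb;
    }

    private String loadUserFromDatabase(long userId) {
        return "{\"id\":" + userId + "}";             // stand-in for a real query
    }
}
```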
The best option could be synchronizing a standby node with data from the active node; it is a cost-effective solution since it can be achieved with an open-source relational database (e.g. MariaDB).
Do not store computable results and statistics that can easily be derived at run time; this helps reduce the data size.
If historical data is not needed urgently for queries, it can be written to a text file in a format that is easy to import into the database (e.g. CSV).
Data objects that are updated very often can be kept in an in-memory store as key-value pairs; use a scheduled task to perform batch updates/inserts to the relational database to achieve persistence (see the sketch after this list).
Implement retry logic for the batch update tasks to handle database downtime or network errors.
Consider writing data to the relational database as serialized objects.
Cache configuration data from the database in memory, refreshing the parts that change either periodically or via an API.
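A rough sketch of the in-memory buffer with a scheduled batch flush and simple retry mentioned above; the JDBC URL, table and column names, retry count, and back-off are all assumptions, and the flush is deliberately simplified (increments made while a flush is in progress could be lost):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CounterBuffer {

    // Frequently updated values are kept in memory as key-value pairs...
    private final Map<String, Long> counters = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void increment(String key) {
        counters.merge(key, 1L, Long::sum);
    }

    // ...and a scheduled task batch-writes them to the relational database every minute,
    // retrying a few times to ride out short DB outages or network errors.
    public void start(String jdbcUrl) {
        scheduler.scheduleAtFixedRate(() -> flushWithRetry(jdbcUrl, 3), 1, 1, TimeUnit.MINUTES);
    }

    private void flushWithRetry(String jdbcUrl, int attempts) {
        for (int attempt = 1; attempt <= attempts; attempt++) {
            try (Connection conn = DriverManager.getConnection(jdbcUrl);
                 PreparedStatement ps = conn.prepareStatement(
                         // assumes a row already exists per key; a MERGE/upsert would be used in practice
                         "UPDATE page_counters SET hits = hits + ? WHERE page = ?")) {
                for (Map.Entry<String, Long> entry : counters.entrySet()) {
                    ps.setLong(1, entry.getValue());
                    ps.setString(2, entry.getKey());
                    ps.addBatch();
                }
                ps.executeBatch();
                counters.clear();       // flushed successfully
                return;
            } catch (Exception e) {
                // DB down or network error: back off and retry; the data stays in memory
                try { Thread.sleep(5_000L * attempt); } catch (InterruptedException ignored) { }
            }
        }
    }
}
```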

Synchronize data b/w two data stores

I have two different databases: an old legacy one, which I'll be decommissioning because the old service is no longer used, and a new one for a new service that will eventually replace the old system. Before that happens we need both services running for a while.
Both have two tables for users: one storing the email address and password, and another for simple user-related data (addresses).
I need to synchronize data between these two databases. The old one is an MS SQL Server DB and the new one is a NoSQL DB (DynamoDB).
My strategy would be to copy all the users from the old DB to the new one before going live, and then, once the new system is running, keep the users synchronized between the two DBs.
I'll do this by having a tool run periodically that checks for any users added after the last run by querying the users table with something like WHERE CreationDate >= LastRunTime, and then, for each user, checking whether it exists in the other database. I'll do this both ways, i.e. from old DB -> new DB and from new DB -> old DB.
Is this a good way of doing this? Are there any better or faster solutions to achieve this?
How can I detect changes to existing users' data? Is there any better solution than checking and matching every user's record in both systems' tables, taking the one that was modified last (by checking the LastModifiedDate timestamp on each record), and updating it in the other system's table?
Solution 1 (my recommendation): whenever the system inserts or updates a record in either of the databases, also add that change information to a queue.
A separate reader reads from the queue and replicates the data to the respective database periodically; this way your data stays in sync between the databases. A sketch of this idea follows below.
Note: another advantage of using the queue is that you don't have to provision very high throughput on your DynamoDB table.
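A minimal in-process sketch of the queue-and-reader idea; a production setup would normally use a durable queue (e.g. SQS or Kafka), and the ChangeEvent type and the replicate methods here are placeholders:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical change event captured on every insert/update in either store.
record ChangeEvent(String source, String userId, String payload) {}

public class SyncQueueSketch {

    private final BlockingQueue<ChangeEvent> queue = new LinkedBlockingQueue<>();

    // Called by the application right after it writes to SQL Server or DynamoDB.
    public void publish(ChangeEvent event) {
        queue.offer(event);
    }

    // A separate reader drains the queue and replicates each change to the other store.
    public void startReader() {
        Executors.newSingleThreadExecutor().submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                ChangeEvent event = queue.take();      // blocks until an event arrives
                if ("legacy-sql".equals(event.source())) {
                    replicateToDynamoDb(event);        // placeholder
                } else {
                    replicateToSqlServer(event);       // placeholder
                }
            }
            return null;
        });
    }

    private void replicateToDynamoDb(ChangeEvent e) { /* DynamoDB PutItem call goes here */ }
    private void replicateToSqlServer(ChangeEvent e) { /* JDBC upsert goes here */ }
}
```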
Solution 2: what you suggested in your question; add a cron job that replicates between the databases by checking records based on their timestamps.
I've executed several table migrations from Oracle/MySQL to DynamoDB with no downtime, and the approach I used was a little different from what you described. It ends up requiring more coding, but I would consider it a lower-risk approach than the hard cutover you described.
This approach requires multiple phases as described below:
Phase 1
Create the new DynamoDB table(s) for the data in your legacy system.
Phase 2
Update your application to write/update data in both the legacy database and in DynamoDB. Your application will still read and write to the legacy system, so this should be a low-risk change.
Immediately before deploying this code, load DynamoDB up with all of the old data.
Immediately after deploying, audit the databases to make sure they are in sync.
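A sketch of what the Phase 2 dual write might look like, assuming a JDBC connection to the legacy SQL Server database, the AWS SDK for Java (v1) document API, and illustrative table/column names:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

import java.sql.Connection;
import java.sql.PreparedStatement;

public class DualWriteUserDao {

    private final Connection legacyConnection;   // existing SQL Server connection
    private final Table dynamoUsers;             // hypothetical "Users" table keyed by email

    public DualWriteUserDao(Connection legacyConnection) {
        this.legacyConnection = legacyConnection;
        DynamoDB dynamoDb = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
        this.dynamoUsers = dynamoDb.getTable("Users");
    }

    // Phase 2: every write still goes to the legacy DB first, then is mirrored to DynamoDB.
    // Reads continue to come from the legacy system during this phase.
    public void saveUser(String email, String passwordHash, String address) throws Exception {
        try (PreparedStatement ps = legacyConnection.prepareStatement(
                "UPDATE Users SET PasswordHash = ?, Address = ?, LastModifiedDate = GETDATE() WHERE Email = ?")) {
            ps.setString(1, passwordHash);
            ps.setString(2, address);
            ps.setString(3, email);
            if (ps.executeUpdate() == 0) {
                // insert path omitted for brevity
            }
        }
        // Mirror the same state into DynamoDB.
        dynamoUsers.putItem(new Item()
                .withPrimaryKey("email", email)
                .withString("passwordHash", passwordHash)
                .withString("address", address));
    }
}
```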
Phase 3
Update your application to start reading from DynamoDB. This should be low risk because your application will have been maintaining data in DynamoDB for some time.
Keep your application writing to the legacy database so you can cut back if you identify any problems in the new implementation. This ensures the cutover is low risk and you can easily roll back.
Phase 4
Remove the code from your application that reads and writes to the legacy database and deploy this to production.
You can now decommission the legacy database!
This is definitely more steps and will take more time than just taking the application down, migrating all of the data, and then deploying a new version of the application to read/write from DynamoDB. However, the main benefit to this approach is that it not only requires no downtime but is lower risk as it tests the change in phases and allows for easy rollback if any issues are encountered.
At a high level, a sync job could be (1) cron-job based or (2) notification based.
The cron job could do the sync as well as auditing if you have "creation time" and "last updated time" columns. In this case the master DB (the one the data is synced from) is normally a SQL DB, since it's much easier to do a table scan in SQL than in NoSQL (in DynamoDB you need to use its Scan operation, and it's limited by the table's hash key).
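A sketch of one cron run under the first option, assuming the SQL side is the master, the AWS SDK for Java (v1) document API, and illustrative table/column names:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class CronSyncJob {

    private final Table users =
            new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient()).getTable("Users");

    // One cron run: pull the rows changed in SQL Server since the last run
    // and upsert them into DynamoDB.
    public void run(Connection sqlServer, Timestamp lastRun) throws Exception {
        try (PreparedStatement ps = sqlServer.prepareStatement(
                "SELECT Email, PasswordHash, Address FROM Users WHERE LastModifiedDate > ?")) {
            ps.setTimestamp(1, lastRun);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    users.putItem(new Item()
                            .withPrimaryKey("email", rs.getString("Email"))
                            .withString("passwordHash", rs.getString("PasswordHash"))
                            .withString("address", rs.getString("Address")));
                }
            }
        }
    }
}
```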
The second option is to build a notification mechanism, which could be based on DynamoDB Streams (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html). It's a mature DynamoDB feature: it guarantees event order and can achieve near-real-time event delivery. What you need to do is build a listener for those events.
Lastly, you could take a look at the AWS Database Migration Service (https://aws.amazon.com/dms/) to see if it satisfies your requirements.

Mirroring or caching database views in intermediary layer for web application?

I have an Oracle 11g database with an extremely complex and badly designed schema. This is a legacy system with many dependencies that supports several critical applications, modifying the schema of this database is unfortunately out of the question.
I am developing a web application (ASP.NET MVC 5) that acts as a read-only status dashboard for the information in this database. Currently, I rely on purpose-built database views to get only the information the web application needs. Given the complexity of the schema, many of these views perform very poorly. When the web application is busy with many users, the database struggles to keep up, usually resulting in timeout errors. Also, when the database does fail for whatever reason, the web application cannot show any data. Users would still like the web application to show a snapshot of the data from before the database failed.
The nature of the data is very dynamic, rows are being added/updated/deleted by several external systems and processes constantly, and I have no way of knowing when there is a change to the underlying data, so I have to re-query the view to get fresh data.
Because of this situation, we are considering removing the direct link between this database and the web application and instead creating some sort of intermediary cache/database/magic layer between them. This way, the web application would get its data from this intermediary layer without placing heavy load on the complex database. When the complex database fails, the web application can still query the last snapshot of data from this intermediary layer.
The question is, what should this intermediary layer be? Because I don't know when and how the underlying data changes I can't maintain a live cache of the data. Instead I would need to rely on snapshots of the views in this database.
This is our current idea:
We create a new, intermediary SQL Server/Oracle database. A job runs every 2 minutes for each database view we are currently using, queries it, then dumps the results into a table in our intermediary SQL Server/Oracle database. This would require truncating the intermediary table, then refilling it with the fresh view result data. In the meantime, the web application would be querying these intermediary tables for data. The obvious concern is what happens when the web application is trying to query the intermediary table while it is being truncated and repopulated with fresh data? Another concern is dealing with possible concurrency issues when grabbing data from views that share foreign keys or related data.
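A rough sketch of what one run of such a refresh job could look like, assuming JDBC connections to both databases and illustrative view/table names; it uses DELETE rather than TRUNCATE so the whole refresh can run in a single transaction and readers never observe a half-empty table:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class ViewSnapshotJob {

    // Copies one Oracle view into an intermediary table; scheduled to run every ~2 minutes per view.
    public void refresh(Connection oracle, Connection intermediary) throws Exception {
        intermediary.setAutoCommit(false);
        try (Statement clear = intermediary.createStatement();
             Statement read = oracle.createStatement();
             ResultSet rs = read.executeQuery("SELECT ID, STATUS, UPDATED_AT FROM V_DASHBOARD_STATUS");
             PreparedStatement insert = intermediary.prepareStatement(
                     "INSERT INTO DASHBOARD_STATUS (ID, STATUS, UPDATED_AT) VALUES (?, ?, ?)")) {
            clear.executeUpdate("DELETE FROM DASHBOARD_STATUS");
            while (rs.next()) {
                insert.setLong(1, rs.getLong("ID"));
                insert.setString(2, rs.getString("STATUS"));
                insert.setTimestamp(3, rs.getTimestamp("UPDATED_AT"));
                insert.addBatch();
            }
            insert.executeBatch();
            intermediary.commit();   // readers switch from the old snapshot to the new one
        } catch (Exception e) {
            intermediary.rollback(); // keep the previous snapshot if the source query fails
            throw e;
        }
    }
}
```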
Normally, a web application would simply maintain a cache, but this would require being hooked into the add/update/delete events in the database to maintain the state of the cache. On top of that, the cache wouldn't be a full snapshot of the database. If the database were to fail, the cache would be unable to provide a snapshot of data from before the failure.
Any other suggestions on what this magical intermediary layer should be? We are looking at solutions available on both the database end (either SQL Server or Oracle) as well as solutions on the web application side (ASP.NET MVC 5, IIS).

Calculating data difference between server database and client (embedded) database

Let's have a classic server-side RDBMS (Oracle/MS SQL/MySQL) and a client side embedded database (e.g. sqlite). We want some tables kept in sync between client and server.
Each server-side table to sync has its counterpart in client-side with the same schema (...or at least similar considering the different data types supported by the database engines). Moreover each table has a timestamp column updated by every update operation.
How can we collect the rows that were updated on either the server or the client side since the last sync efficiently? Efficient meaning:
with low bandwidth usage, e.g. not sending entire tables back and forth
in a resource friendly way so that the syncing process can be utilized frequently
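A minimal sketch of the delta collection on either side, assuming JDBC access and an updated_at timestamp column on each synced table (names are illustrative); only the changed rows would then be serialized and exchanged:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

public class DeltaCollector {

    // Collects only the rows whose timestamp column changed since the last successful sync.
    // The same query shape works on both the server RDBMS and the SQLite client,
    // as long as each synced table carries an updated_at column.
    public List<String> changedSince(Connection db, Timestamp lastSync) throws Exception {
        List<String> changedIds = new ArrayList<>();
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT id FROM customers WHERE updated_at > ? ORDER BY updated_at")) {
            ps.setTimestamp(1, lastSync);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    changedIds.add(rs.getString("id"));
                }
            }
        }
        return changedIds;   // only these rows are sent over the wire
    }
}
```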

Copying data from a local database to a remote one

I'm writing a system at the moment that needs to copy data from a client's locally hosted SQL database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of data that actually needs to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possible options, but none of them yet stands out as the prime candidate.
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL we use on the hosted environment.
Linked server, copying data directly - a slow and somewhat insecure method
Webservices to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level, it can be done from .net or other facilities.
More information
The frequency of the updates will depend entirely on how often records are updated. But the basic idea is that if a record is changed, the user can publish it to the live database. Alternatively, we'll record the changes and send them across in a batch at a configurable frequency.
The number of records we're talking about is around 4,000 rows per table for the core tables (product catalog) at the moment, but this is completely variable depending on the client we deploy to, as each has its own product catalog ranging from hundreds to thousands of products. To clarify, each client is on a separate local/hosted database combination; they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.
Well, I'm biased, I have to admit. I'd like to hypnotize you into shelling out for SQL Compare to do this. I've been faced with exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases. It will also do it from the command line once you have got a working project together with all the right knobs and buttons. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.
How much data are you talking about? How many 'client' DBs are there? And how often does it need to happen? The answers to those questions will make a big difference to the path you should take.
There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.
I would recommend using Data Transformation Services (DTS) for this. You could create a DTS package for appending and one for re-creating the data.
It is possible to invoke DTS package operations from your code so you may want to create a wrapper to control the packages that you can call from your application.
In the end I opted for a set of triggers to capture data modifications to a change log table. There is then an application that polls this table and generates XML files for submission to a webservice running at the remote location.
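For illustration, a stripped-down sketch of such a poller; the ChangeLog table layout, the XML format, and the endpoint URL are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChangeLogPoller {

    // Polls a hypothetical ChangeLog table filled by the triggers, serializes each
    // entry as XML, and posts it to the remote web service.
    public void pollOnce(Connection localDb, long lastProcessedId) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        try (PreparedStatement ps = localDb.prepareStatement(
                "SELECT ChangeId, TableName, RecordId, ChangeType FROM ChangeLog WHERE ChangeId > ? ORDER BY ChangeId")) {
            ps.setLong(1, lastProcessedId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String xml = "<change table=\"" + rs.getString("TableName") + "\""
                            + " record=\"" + rs.getLong("RecordId") + "\""
                            + " type=\"" + rs.getString("ChangeType") + "\"/>";
                    HttpRequest request = HttpRequest.newBuilder()
                            .uri(URI.create("https://example.com/sync"))   // placeholder endpoint
                            .header("Content-Type", "application/xml")
                            .POST(HttpRequest.BodyPublishers.ofString(xml))
                            .build();
                    http.send(request, HttpResponse.BodyHandlers.discarding());
                }
            }
        }
    }
}
```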

Resources