How do you handle two databases for a single REST app? I see that this is a cloud app concept, but in cases where something gets deleted in DB1 and not in DB2, how do you handle this and make sure that both are always the same?
There are many ways to do this.
First you need to consider whether the DBs have the same schema and data. In that case, master-slave or master-master replication will solve it.
In the case where the schema and data are similar in some tables and not in others, you can use dblink to replicate just one table (see: Postgresql: master-slave replication of 1 table).
Most of the time such changes are made at the app level, and if you really want to avoid dealing with two-phase commit, you can have a queue and a service processing (retrying and recovering) the updates to both DBs.
Finally, in the case where direct updates to the DB (without passing through the app) must be supported, triggers plus dblink are an option; a rough sketch follows below.
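For example, a minimal sketch of the trigger-plus-dblink approach could look like this; the connection string, table, and column names are assumptions for illustration only:

```sql
-- Hypothetical: replicate inserts on "employee" in DB1 to DB2 through dblink.
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION replicate_employee_insert() RETURNS trigger AS $$
BEGIN
    -- Push the new row to the second database over a dblink connection.
    PERFORM dblink_exec(
        'host=db2.example.com dbname=app user=repl password=secret',
        format('INSERT INTO employee (id, name) VALUES (%s, %L)', NEW.id, NEW.name)
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- PostgreSQL 11+; use EXECUTE PROCEDURE on older versions.
CREATE TRIGGER employee_replicate
AFTER INSERT ON employee
FOR EACH ROW EXECUTE FUNCTION replicate_employee_insert();
```

The same pattern can be repeated for UPDATE and DELETE triggers if those also need to reach the second database.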
Please explore these tools to get more ideas: https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
EDIT: Start here if you're lucky enough to have the same DB/data: https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/
Related
We have two MariaDB databases on different servers, primary and secondary. A master-slave relation is not in place, so we need to update both databases for each user action.
Example: if an "add employee" action is performed by a user from the front end, we need to insert the employee details into the primary database first and then into the secondary.
We designed it so that we insert into each database separately, i.e. two insert calls from the application, one per database.
As we have multiple DB interactions for a single operation, this solution will affect performance.
Is there any way we can achieve this by using procedures or UDFs?
Any better approaches or suggestions would be helpful.
There are multiple ways we can do what you are looking for. Of course, there are pros and cons of each approach.
The simplest one would be to use MaxScale's "Tee" filter, which can automatically execute all your queries on multiple servers. This is, of course, transparent to the application code.
Refer to: https://mariadb.com/kb/en/mariadb-maxscale-24-tee-filter/
The benefit is that it puts the least amount of strain on your application's performance, as the application doesn't have to make two calls to execute your statements on the databases, and your application code remains simple.
The negative is that there are no return values from the "Tee" filter! You won't be able to tell whether the insert, update, or delete was successful or not.
The other method that might be much better is to use the "Spider" storage engine in MariaDB and connect to the remote table on the remote MariaDB server.
Now you can create a trigger on your primary table and, depending on your business logic/requirements, update the Spider table, a.k.a. the remote table, from within the trigger. This will give you more control, and your application is still kept clean without dual connections.
Another benefit of this is that if, for some reason, your trigger fails to update the remote "Spider" table, your primary transaction will also fail! This ensures proper data integrity is maintained :)
There are other engines that can be used similarly, like "CONNECT", but "Spider" is the only officially supported one among those that can connect to a remote database's table. Spider, however, can only connect as long as the remote database is also MariaDB.
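As a rough sketch of the Spider-plus-trigger setup described above (the host, credentials, table, and column names are assumptions, not taken from your question):

```sql
-- On the primary MariaDB server: define the secondary server and a Spider
-- "shadow" table that is backed by the employee table on that server.
CREATE SERVER secondary_srv
    FOREIGN DATA WRAPPER mysql
    OPTIONS (HOST '10.0.0.2', DATABASE 'hr', USER 'repl', PASSWORD 'secret', PORT 3306);

CREATE TABLE employee_remote (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
) ENGINE=SPIDER
  COMMENT='wrapper "mysql", srv "secondary_srv", table "employee"';

-- Trigger on the primary table keeps the remote copy in step; if the remote
-- insert fails, the surrounding transaction on the primary fails as well.
DELIMITER //
CREATE TRIGGER employee_sync AFTER INSERT ON employee
FOR EACH ROW
BEGIN
    INSERT INTO employee_remote (id, name) VALUES (NEW.id, NEW.name);
END//
DELIMITER ;
```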
Hope this answers your question and is what you are looking for :)
Cheers,
Faisal.
I have two different databases: one is an old legacy one which I'll be decommissioning because the old service is not used anymore; the other backs a new service and will eventually replace the old system. Before that happens we need both services running for a while.
Both have two tables for users: one storing the email address and password, and the other for simple user-related data (addresses).
I need to synchronize data between these two databases. The old one is an MS SQL Server DB and the new one is a NoSQL DB (DynamoDB).
My strategy would be to copy all the users from the old DB to the new one before going live, and then, once the new system is running, synchronize the users between the two DBs.
I'll do this by having a tool run periodically to check for any users added after the last run, by querying the users table with something like WHERE CreationDate >= LastRunTime, and then for each user checking whether it exists in the other database. I'll do this both ways, i.e. from old DB -> new DB and from new DB -> old DB.
Is this a good way of doing it? Are there any better, faster solutions to achieve this?
How can I detect changes to existing users' data? Is there any better solution than checking and matching every user's record in both systems' tables, taking the one that was last modified (by checking the LastModifiedDate timestamp for each record), and updating it in the other system's table?
Solution 1 (my recommendation): whenever the system inserts/updates a record in either of the databases, you add/update the record in that database and also add that information to a queue.
A separate reader will read from the queue and replicate the data to the respective database periodically; this way your data stays in sync between the databases.
Note: another advantage of using the queue is that you don't have to provision very high throughput on your DynamoDB table.
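One common way to realize the queue idea on the SQL Server side is an "outbox" table written in the same transaction as the user record. This is only a sketch, and every object and column name below is an assumption:

```sql
-- Outbox table: the application writes the user row and a queue entry atomically;
-- a separate reader drains unprocessed entries into DynamoDB.
CREATE TABLE dbo.SyncOutbox (
    OutboxId    BIGINT IDENTITY PRIMARY KEY,
    UserId      INT           NOT NULL,
    Operation   VARCHAR(10)   NOT NULL,          -- 'INSERT' | 'UPDATE' | 'DELETE'
    Payload     NVARCHAR(MAX) NULL,              -- serialized user record
    CreatedAt   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    ProcessedAt DATETIME2     NULL
);

BEGIN TRANSACTION;
    INSERT INTO dbo.Users (UserId, Email, PasswordHash)
    VALUES (42, 'user@example.com', '...');

    INSERT INTO dbo.SyncOutbox (UserId, Operation, Payload)
    VALUES (42, 'INSERT', N'{"UserId":42,"Email":"user@example.com"}');
COMMIT;

-- The reader polls for work, pushes it to the other database, then marks it done:
SELECT TOP (100) * FROM dbo.SyncOutbox WHERE ProcessedAt IS NULL ORDER BY OutboxId;
-- UPDATE dbo.SyncOutbox SET ProcessedAt = SYSUTCDATETIME() WHERE OutboxId = @Id;
```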
Solution 2: as you suggested in your question, you can add a CRON job that replicates between the databases by checking records based on their timestamps.
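A minimal sketch of the query such a CRON job might run on the SQL Server side, assuming the CreationDate / LastModifiedDate columns from your question plus a small watermark table (an assumption of mine) that records the last successful run:

```sql
-- Watermark table: one row per sync direction, storing the last successful run time.
CREATE TABLE dbo.SyncWatermark (
    Direction   VARCHAR(20) PRIMARY KEY,   -- e.g. 'SQL->Dynamo'
    LastRunTime DATETIME2   NOT NULL
);

-- Users created or modified since the last run (to be upserted into DynamoDB).
DECLARE @LastRunTime DATETIME2 =
    (SELECT LastRunTime FROM dbo.SyncWatermark WHERE Direction = 'SQL->Dynamo');

SELECT UserId, Email, LastModifiedDate
FROM dbo.Users
WHERE CreationDate >= @LastRunTime
   OR LastModifiedDate >= @LastRunTime;

-- After replicating this batch, advance the watermark.
UPDATE dbo.SyncWatermark
SET LastRunTime = SYSUTCDATETIME()
WHERE Direction = 'SQL->Dynamo';
```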
I've executed several table migrations from Oracle / MySQL to DynamoDB with no downtime, and the approach I used was a little different from what you described. This approach ends up requiring more coding, but I would consider it lower risk than the hard cutover you described.
This approach requires multiple phases as described below:
Phase 1
Create the new DynamoDB table(s) for the data in your legacy system.
Phase 2
Update your application to write/update data in both the legacy database and in DynamoDB. Your application will still read and write to the legacy system so this should be a low risk change.
Immediately before deploying this code, load DynamoDB up with all of the old data.
Immediately after deploying, audit the databases to make sure they are in sync.
Phase 3
Update your application to start reading from DynamoDB. This should be low risk because your application will have been maintaining data in DynamoDB for some time.
Keep your application writing to the legacy database so you can cut back if you identify any problems in the new implementation. This ensures the cutover is low risk and you can easily roll back.
Phase 4
Remove the code from your application that reads and writes to the legacy database and deploy this to production.
You can now decommission the legacy database!
This is definitely more steps and will take more time than just taking the application down, migrating all of the data, and then deploying a new version of the application to read/write from DynamoDB. However, the main benefit to this approach is that it not only requires no downtime but is lower risk as it tests the change in phases and allows for easy rollback if any issues are encountered.
At a high level, a sync job could be (1) cron-job based or (2) notification based.
The cron job could do syncing as well as auditing if you have a "creation time" and a "last updated time". In this case the master DB (the one the data should be synced from) is normally a SQL DB, since it's much easier to do a table scan in SQL than in NoSQL (in DynamoDB you need to use its Scan operation, and queries are otherwise limited by the table's hash key).
The second option is to build a notification mechanism, and this could be based on DynamoDB Streams: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html. It's a mature DynamoDB feature; it guarantees event order and can achieve near-real-time event delivery. What you need to do is build a listener for those events.
Lastly, you could take a look at AWS Database Migration Service (https://aws.amazon.com/dms/) to see if it satisfies your requirements.
In my company we have a multiple-database structure hosted in SQL Server.
For example, whenever a new customer signs up with us, we create a new DB in SQL Server to maintain their data.
Right now we already have 2000+ DBs on our database server. We expect more customers to sign up in the near future, which might push the count past 5000.
Having 5000+ DBs, with the count still increasing, might not be advisable; sometimes we run tasks that span all the DBs, and running tasks across 5000+ DBs will surely lead to performance issues.
What would be an alternative solution that avoids creating a separate DB for each and every customer while still keeping their data separate?
I keep hearing about big data and other database solutions but could not get a clear picture.
Can someone shed some light on this?
If the databases have an identical schema you could combine them into one. That way each customer's table becomes a set of rows in the new database. A new customer will probably be just a few new rows in the tables that store the customer's profile.
You can use row-level security to restrict access to each customer's data:
https://msdn.microsoft.com/en-us/library/dn765131.aspx
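As a sketch of what that looks like in SQL Server 2016+ (the table and column names here are assumptions), a filter predicate is bound to the shared table so each session only sees its own customer's rows:

```sql
-- Predicate function: a row is visible only when its CustomerId matches the
-- value the application put into SESSION_CONTEXT after connecting.
CREATE FUNCTION dbo.fn_CustomerAccess(@CustomerId INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_result
       WHERE @CustomerId = CAST(SESSION_CONTEXT(N'CustomerId') AS INT);
GO

-- Bind the predicate to the shared, multi-tenant table.
CREATE SECURITY POLICY dbo.CustomerFilter
ADD FILTER PREDICATE dbo.fn_CustomerAccess(CustomerId) ON dbo.Orders
WITH (STATE = ON);
GO

-- The application sets the tenant once per connection:
EXEC sys.sp_set_session_context @key = N'CustomerId', @value = 42;
```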
For the pros and cons of using this approach over your existing one, see: Pros/Cons Using multiple databases vs using single database and Single or multiple databases.
Other options provide a great learning opportunity but may have a significant transition cost, even if some of them were indeed better.
One solution I would suggest is to use a prefix on the table names for each customer. You can then solve the security issue by limiting each customer to their own set of tables.
The con is that you will have to rewrite your application to add the prefix to each table name whenever it accesses it. If you have a lot of tables, that will be a problem.
I think this is how some multi-site WordPress hosts handle their database issue.
You should consider whether you just store the data and access it with simple queries, or whether you usually run complex queries. If you just store the data, access it with simple queries, and your needs are not 100% relational, maybe you should consider moving part of your data to the HDFS file system:
https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS
To process the data in Hadoop there are many tools, but the rising one for sure is Spark:
https://en.wikipedia.org/wiki/Apache_Spark
Probably the best solution is to start moving your historic data into HDFS just for storage and keep the rest as it is, until you gain confidence with the Hadoop and Spark paradigm.
Hadoop is a distributed, fault-tolerant file system, and Spark is an engine for batch processing huge amounts of unstructured or structured data. Consider that data in Hadoop is usually not structured, so you have to change the way you process your data; if you still want to use SQL, I suggest checking out Impala and Hive as well:
http://impala.io/
https://hive.apache.org/
Take a look at the Cloudera website for a more structured IT solution, instead of a lot of single tools that you would need to organize yourself:
http://www.cloudera.com/content/www/en-us/solutions.html
They have a quick-start VM for trying all the Hadoop ecosystem tools; that's probably the best way to start experimenting:
http://www.cloudera.com/content/www/en-us/downloads/quickstart_vms/5-4.html
We have a situation where it is an unmovable requirement to have two separate databases, but we want to keep the single web-based front end that we currently use to manipulate what used to be a single database. Records need to go into a child database based on the value of a column, let's say employee type: "hourly" vs. "salaried".
There are a lot of synonyms, stored procedures, and other bits of SQL that lie between the web interface and the database, so we figured that instead of doing the split there, we could use the current database as a "master" database and then have something behind it direct the data into either of the two child databases (as in the following diagram).
We seem to be fine to the extent that data flows one way (from the web interface to the child databases); to the extent that data flows back the other way (from the child databases to the master), we seem to get into some hairy situations.
Some of them seem intractable (e.g. if one person on child DB A inserts a record with an autoincrement ID of 1 at the same time a person on child DB B inserts a record with the same Id of 1), but most of them seem to just be a pain in the ass.
My question is: Does there exist a solution that will allow us to sync these databases, but allow us to insert the logic of "only if the employee column has a status of X", rather than just blindly mirroring them?
Here are a few ideas that were floated around (a rough trigger sketch follows the list below): triggers seem to have potential but also seem like a lot of work, and we were wondering whether there are any tools out there that could do the heavy lifting of the sync for us. Does anyone out there have any ideas?
Triggers
Service Broker
SSIS
Microsoft Sync Framework
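For what it's worth, the bare-bones version of the trigger idea we were picturing looks something like this; database, table, and column names are made up, and it assumes the child databases live on the same SQL Server instance:

```sql
-- On the master database: after an insert, copy the row to the child database
-- that matches the employee type.
CREATE TRIGGER dbo.trg_Employee_Route ON dbo.Employee
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO ChildHourly.dbo.Employee (EmployeeId, Name, EmployeeType)
    SELECT EmployeeId, Name, EmployeeType
    FROM inserted
    WHERE EmployeeType = 'hourly';

    INSERT INTO ChildSalaried.dbo.Employee (EmployeeId, Name, EmployeeType)
    SELECT EmployeeId, Name, EmployeeType
    FROM inserted
    WHERE EmployeeType = 'salaried';
END;
```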
So this is really more of an opinion thing, because there are possibilities with each solution you have already identified. There is no easy solution, but a lightweight approach would be to use CDC in combination with SSIS. SSIS has built-in hooks to work with CDC, and CDC will give better performance on your master database: it will not involve the kind of waits that can occur when triggers insert data into another database.
Here is more on CDC
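As a quick, hedged sketch of what enabling CDC looks like (the database, schema, and table names are illustrative assumptions), after which SSIS or any other reader can pull the change rows:

```sql
-- Enable Change Data Capture on the master database, then on the table whose
-- changes should be pushed to the child databases.
USE MasterDb;
GO
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Employee',
    @role_name     = NULL;          -- no gating role for readers
GO

-- Change rows accumulate in a generated change table and can be read with the
-- generated function, e.g.:
-- SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, N'all');
```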
Is it possible to configure multiple database servers (all hosting the same database) to execute a single query simultaneously?
I'm not asking about executing queries using multiple CPUs simultaneously - I know this is possible.
UPDATE
What I mean is something like this:
There are two servers: Server1 and Server2
Both servers host the database Foo, and both instances of Foo are identical
I connect to Server1 and submit a complicated (lots of joins, many calculations) query
Server1 decides that some calculations should be made on Server2 and some data should be read from that server too; the appropriate parts of the query are sent to Server2
Both servers read data and perform necessary calculations
Finally, results from Server1 and Server2 are merged and returned to the client
All this should happen automatically, without any need to explicitly reference Server1 or Server2. That is the kind of parallel query execution I mean; is it possible?
UPDATE 2
Thanks for the tips, John and wuputah.
I am researching alternatives for increasing both the availability and the capacity of a MOSS database backend. So what I'm looking for is some kind of out-of-the-box SQL Server load balancing solution that would be transparent to the application, because I cannot modify the application in any way. I guess SQL Server has no such feature (and Oracle, as far as I understand it, does: it is the RAC mentioned by wuputah).
UPDATE 3
A quote from the Top Tips for SQL Server Clustering article:
Let's start by debunking a common misconception. You use MSCS clustering for high availability, not for load balancing. Also, SQL Server does not have any built-in, automatic load-balancing capability. You have to load balance through your application's physical design.
What you're really talking about is a clustering solution. It looks like SQL Server and Oracle have solutions to this, but I don't know anything about them. I can guess they would be very costly to buy and implement.
Possible alternate suggestions would be as follows:
Use master-slave replication, and do your complex read queries from the slave. All writes must go to the master, which are then sent to the slave, so things stay in sync. This helps things go faster because the slave only has to worry about the writes coming from the master, which are already predetermined on behalf of the slave (no deadlocks etc). If you're looking to utilize multiple servers, this is the first place I would start.
Use master-master replication. This means that all writes from both servers go to each other, so they stay in sync (at least theoretically). This has some of the same benefits as master-slave, but you don't have to worry about writes going to one server instead of the other. The more common use of master-master replication is for failover support; master-slave is really better suited to performance.
Use the feature John Sansom talked about. I don't know much about it, but it seems its basis is splitting your database into tables on different servers, which will have some benefits as well as drawbacks. The big issue is that since the two systems can't share memory, they will have to share a lot of data over the network to compute complex joins.
Hope this helps!
RE Update 1:
If you can't modify the application, there is hope, but it might be a bit complicated. If you were to set up master-slave replication, you could then set up a proxy to send read queries to the slave(s) and write queries to the master(s). I've seen this done with MySQL, but not SQL Server. That's a bit of a problem unless you want to write the proxy yourself.
This has been discussed on SO previously, so you can find more information there.
RE Update 2:
Microsoft's clustering might not be designed for performance, but that's Microsoft's fault. That's still the level of complexity you're talking about here. If they say it won't help, then your options are limited to those above and to what you do with your application (like sharding, splitting into multiple databases, etc.).
Yes, I believe it is possible (well, sort of); let me explain.
You need to look into and research the use of Distributed Queries. A distributed query runs across multiple servers and is typically used to reference data that is not stored locally.
http://msdn.microsoft.com/en-us/library/ms191440.aspx
For example, Server A may hold my Customers table and Server B holds my Orders table. It is possible using distributed queries to run a query that references both Server A and Server B, with each server managing the processing of its local data (which could incorporate the use of parallelism).
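As a hedged sketch of that setup (the server, database, and table names are made up), a linked server plus four-part names is all a basic distributed query needs:

```sql
-- Register Server B as a linked server on Server A.
EXEC sp_addlinkedserver
    @server     = N'ServerB',
    @srvproduct = N'',
    @provider   = N'SQLNCLI',
    @datasrc    = N'serverb.example.com';

-- Distributed query: Customers are local to Server A, Orders live on Server B.
-- Each server processes its own side of the join where it can.
SELECT c.CustomerId, c.Name, o.OrderId, o.Total
FROM dbo.Customers AS c
JOIN ServerB.Sales.dbo.Orders AS o
    ON o.CustomerId = c.CustomerId;
```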
Now, in theory, you could store the exact same data on each server and design your queries specifically so that only certain tables were referenced on certain servers, thereby distributing the query load. This is not true parallel processing, however, in terms of CPU.
If your intended goal is to distribute the processing load of your application then the typical approach with SQL Server is to use Replication to distribute data processing across multiple servers. This method is also not to be confused with parallel processing.
http://databases.about.com/cs/sqlserver/a/aa041303a.htm
I hope this helps but of course please feel free to pose any questions you may have.
Interesting question, but I'm struggling to get my head around this being beneficial for a multi-user system.
If I'm the only user, having half my query done on Server1 and the other half on Server2 sounds cool :)
If there are two concurrent users (let's say with queries of identical difficulty) then I'm struggling to see that this helps :(
I could have identical data on both servers and load-balance (so I get Server1 and my mate gets Server2), or I could have half the data on Server1 and the other half on Server2, so that each server optimises and caches just its own data, spreading the load. But whenever you have to do a merge to complete a query, the limiting factor becomes the pipe size between them.
Which is basically Federated Database Servers. Instead of having all my Customers on one server and all my Orders on the other I could, say, have my USA customers and their orders on one, and my European customers/orders on the other, and only if my query spans both is there any need for a merge step.
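A rough sketch of that federated layout, assuming a linked server named EuropeSrv and made-up table names; each member table carries a CHECK constraint on Region so the optimizer can skip the member a query does not need:

```sql
-- Local member on this server holds USA rows; the European member is reached
-- through the linked server. Both tables have CHECK (Region = ...) constraints.
CREATE VIEW dbo.Customers_All
AS
SELECT CustomerId, Name, Region FROM dbo.Customers_USA
UNION ALL
SELECT CustomerId, Name, Region FROM EuropeSrv.Sales.dbo.Customers_EU;
GO

-- Touches only the local member:
SELECT * FROM dbo.Customers_All WHERE Region = 'USA';

-- Spans both members, so results are merged across the network:
SELECT Region, COUNT(*) AS Customers
FROM dbo.Customers_All
GROUP BY Region;
```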