How to fusion two mongoldb clusters - database

is it possible to move all databases from one mongo cluster to another that contains data? or it should be a new cluster only?
Is is possible to do live migration without putting down the application (I mean , there will be a data writing or reading while doing migration). Is this risky and could involve data loss.
bests,

yes, it is possible to merge more than one cluster in an existing cluster.
However, to avoid losing data, stop the application, change the connection to point the target cluster and cutover.

Related

How to sync two postgresql databases in real time

I have two postgresql databases configured for replication. Master node is sending new data in real time to secondary node and it works just fine. But there is one disadvantage - secondary node is read-only, so no write requests are accepted.
What I need is to be able to perform write/read operations on both of the databases and still be able to have them in perfect sync, so they are identical.
What is the best solution for such requirements?
Just for the context of this question:
I have two web app instances that are deployed in two different locations in the world (very high delay in sending requests is the reason I decided to deploy one instance locally in each location). Both instances are fetching the same data but they are also able to generate some data and input it into the DB. It is impossible to have only one DB due to too big delay when fetching data.
Maybe my solution is not perfect, I'm open for any suggestion really because I'm out of ideas how to make it work smoothly and maybe I'm lacking some knowledge.
Thanks

MongoDB - set replication to DocumentDB

We're setting up a local MongoDB cluster - Locally, we'll have one primary and one node, and we want to have another node in AWS. Is it possible to have that node as the DocumentDB service instead of an EC2 instace?
Also, I know I must have an odd number of total nodes, is it possible to first add one node and then add another one?
Thanks ahaed.
Also, I know I must have an odd number of total nodes
In a MongoDB replica set, you can have any number of nodes you like. It is possible to have a 2-node replica set, although it's not very practically useful since unavailability of a single node (e.g. a restart for maintenance) would make the whole deployment unavailable for writes. A 4-node replica set is a feasible construction if you wanted an additional replica somewhere (e.g. for geographically close querying from a secondary, or for analytics querying), though if you are simply doing this for redundancy you should probably stick with the standard 3-node configuration and configure proper backups.
Is it possible to first add one node and then add another one?
You can reconfigure a replica set at any time.
Is it possible to have that node as the DocumentDB service instead of an EC2 instace?
Unlikely. DocumentDB is not MongoDB. DocumentDB pretends to be like a MongoDB but it 1) pretends to be an old version of MongoDB, 2) even then many features don't work, and 3) it's not anywhere near the same architecture as MongoDB under the hood. So when you ask a genuine MongoDB database to work with a DocumentDB node, this will probably not work.
This assumes you can even configure DocumentDB in the required manner - I suspect this won't be possible to begin with.
If you're only trying to replicate the data to DocumentDB, Database Migration Service is a good tool for the job: https://aws.amazon.com/dms/
But like others have said, this will be a separate cluster from your MongodDB setup.

Writing to many replicas of MongoDB

Let's say I have a distributed application that writes to the database. To decrease latency one of the instances (app + database) is hosted in Australia and another one is hosted in Europe. Both instances of database need to share the same data.
So what we are after here is data locality. The reason for it is obvious: we don't want users in Australia shooting requests to our database in Europe because that would increase latency.
The natural choice would be to deploy both instances of database in a one replica set. But it seems that with MongoDB you can write to only one Mongo instance within replica set.
What are the strategies with MongoDB to have two instances of database, sharing the same data, to which you can write to? Or is the MongoDB just a wrong choice for this requirement?
Huge subject, but i'll try to give you a short and simple answer :
As your two instances must share the same sata, you can't use sharded cluster with zones . But replica set can be your solution :
Create a replica set with at least the following :
a server in a 'neutral' zone. It will be the primary server (set a priority higher). This server, as long as it still primary, will handle your write operations.
your two existing servers with lower priority.
Set in your application Read Preference to 'nearest'. This way, your read operations will be handle by the server having the mower network latency, regardless of the Master/secondary roles of server.
But i highly recommand you to check the documentation, to see how correctly deploy this architecture. Here's a good start
EDIT
Some consideration about this solution :
This use case is one of the rare use case where it's better to read from secondaries. In general, prefer reading your data from MASTER, since replica set is done for high availability, not for scalability.
If some of your data can be 'located' to be accessed faster, consider sharding collections as a better solution

How transparent should losing access to a Postgres-XL datanode be?

I have set-up a testing Postgres-XL cluster with the following architecture:
gtm - vm00
coord1+datanode1 - vm01
coord2+datanode2 - vm02
I created a new database, which contains a table that is distributed by replication. This means that I should have the exact copy of that table in each and every single datanode.
Doing operations on the table works great, I can see the changes replicated when connecting to all coordinator nodes.
However, when I simulate one of the datanodes going down, while I can still read the data in the table just fine, I cannot add or modify anything, and I receive the following error:
ERROR: Failed to get pooled connections
I am considering deploying Postgres-XL as a highly available database backend for a fair number of applications, and I cannot control how those applications interact with the database (it might be big a problem if those applications couldn't write to the database while one datanode is down).
To my understanding, Postgres-XL should achieve high availability for replicated tables in a very transparent way and should be able to support losing one or more datanodes (as long as at least one is still available - again, this is just for replicated tables), but this does not seem the case.
Is this the intended behaviour? What can be done in order to be able to withstand having one or more datanodes down?
So as it turns out not transparent at all. To my jaw dropping surprise at it turns out Postgres-XL has no build in high availably support or recovery. Meaning if you lose one node the database fails. And if you are using the round robbin or hash DISTRIBUTED BY options if you lose a disk in a node you have lost the entire database. I could not believe it, but that is the case.
They do have a "stand by" server option which is just a mirrored node for each node you have, but even this requires manually setting it to recover and doubles the number of nodes you need. For data protection you will have to use the REPLICATION DISTRIBUTED BY option which is MUCH slower and again has no fail over support so you will have to manually restart it and reconfigure it not to use the failing node.
https://sourceforge.net/p/postgres-xl/mailman/message/32776225/
https://sourceforge.net/p/postgres-xl/mailman/message/35456205/

Scaling out SQL Server for the web (Single Writer Multiple Readers)

Has anyone had any experience scaling out SQL Server in a multi reader single writer fashion. If not can anyone suggest a suitable alternative for a read intensive web application, that they have experience with
It depends on probably 2 things:
How big each single write is?
Do readers need real time data?
A write will block readers when writing, but if each write is small and fast then readers won't notice.
If you offload, say, end of day reporting then you batch your load onto a separate server because readers do not require real time data. This makes sense
A write on your primary server must be synched to your offload secondary server... which will block there as part of the synch process anyway + you add an overhead load to manage the synch.
Most apps are 95%+ read anyway all the time. For example, an update or delete is a read followed by a write.
My choice would be (probably, based on the low write volume and it's a web app) to scale up and stuff as much RAM as I could in the DB server with separate disk paths for the data and log files of the database.
I don't have any experience with scaling out SQL Server for your scenario.
However for a Read-Intensive application, I would be looking at reducing the load on the database and employ a Cache Strategy using something like Memcache or MS Velocity
There are two approaches that I'm aware of:
Have the entire database loaded into the Cache and manage Adding and Updating of items in the cache.
Add items to the cache only when they are requested and remove them when a write operation is performed.
Some kind of replication would do the trick.
http://msdn.microsoft.com/en-us/library/ms151827.aspx
You of course need to change your app code.
Some people use partitioned tables, with different row ranges being stored on different servers - united with views. This would be invisible to the app. Federation for this practice, I think.
By designing your database, application and server configuration (SQL particulars - location of data/log/system/sql binaries/tempdb), you should be able to handle a pretty good load. Try not to complicate things if you don't have to.

Resources