I have four servers located in two datacenters. DC1 <= SERVERS A & B, DC2 <= SERVERS C & D.
I need all four servers to be mirrors of each other. I have a load balancer configured to route requests depending on load.
For the moment, circular replication sounds like the best choice out there. I know the pros and cons of this kind of replication, but I would like to know if there is an alternative way of doing this.
I have already created failover scripts to manage the case where a node goes down and the replication circle needs to shrink, and they are working.
Many thanks,
An acceptable alternative to circular replication is a cluster.
However, clusters might not suit everyone: if any of the nodes fails to carry out a query, the query does not get committed. (Scary, ain't it?)
In the end I went with circular replication and wrote a script to maintain it. If a node fails, the circle shrinks automatically. The same script also reintroduces new or recovered nodes into the circle.
MariaDB now supports global transaction IDs (GTIDs). This simplifies circular replication: we can switch masters without having to worry about the replication position.
For more information, read the article below:
https://mariadb.com/kb/en/global-transaction-id/
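As a rough sketch of what a GTID-based master switch looks like on a MariaDB (10.0+) slave - the host name and credentials below are placeholders, not part of the original setup:

```sql
-- Hypothetical master switch on a MariaDB slave; host and credentials
-- are placeholders.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_USE_GTID = slave_pos;  -- resume from the GTID this slave has reached
START SLAVE;
```

With `MASTER_USE_GTID = slave_pos`, the slave asks the new master to resume from the last GTID it has applied, which is exactly what removes the need to track binlog file/position pairs by hand.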
We're setting up a local MongoDB cluster - locally, we'll have one primary and one secondary, and we want to have another node in AWS. Is it possible to have that node be the DocumentDB service instead of an EC2 instance?
Also, I know I must have an odd number of total nodes; is it possible to first add one node and then add another one?
Thanks ahead.
Also, I know I must have an odd number of total nodes
In a MongoDB replica set, you can have any number of nodes you like. It is possible to have a 2-node replica set, although it's not very practically useful since unavailability of a single node (e.g. a restart for maintenance) would make the whole deployment unavailable for writes. A 4-node replica set is a feasible construction if you wanted an additional replica somewhere (e.g. for geographically close querying from a secondary, or for analytics querying), though if you are simply doing this for redundancy you should probably stick with the standard 3-node configuration and configure proper backups.
Is it possible to first add one node and then add another one?
You can reconfigure a replica set at any time.
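For example (the hostnames here are placeholders), adding members one at a time from a mongosh session connected to the primary looks like this:

```javascript
// Hypothetical mongosh session against the primary; hostnames are placeholders.
rs.add("node3.example.com:27017")  // grow the set from 2 to 3 members
// ...later, once the new member has caught up:
rs.add("node4.example.com:27017")  // grow from 3 to 4
rs.status()                        // verify member states
```

Each `rs.add()` is an online reconfiguration, so there is no requirement to add all members in one step.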
Is it possible to have that node as the DocumentDB service instead of an EC2 instace?
Unlikely. DocumentDB is not MongoDB. DocumentDB pretends to be like MongoDB, but 1) it pretends to be an old version of MongoDB, 2) even then many features don't work, and 3) it's nowhere near the same architecture as MongoDB under the hood. So when you ask a genuine MongoDB deployment to work with a DocumentDB node, this will probably not work.
This assumes you can even configure DocumentDB in the required manner - I suspect this won't be possible to begin with.
If you're only trying to replicate the data to DocumentDB, Database Migration Service is a good tool for the job: https://aws.amazon.com/dms/
But like others have said, this will be a separate cluster from your MongoDB setup.
I have two nodes which I want to run as servers in active-active mode, and I also want HA capability: if one is down, the other should start receiving all the requests, but while both are up, both should be taking requests. Since Redis doesn't allow active-active mode for the same hash slots, and I don't have the option of running Sentinel because I can't have a third node, my idea is to run the two nodes with replication and decide myself when the master node is down, promoting the slave to master. Are there any issues with this? When the original master comes back up, is there a way to configure it as a slave?
Does this sound like a good idea? I am open to suggestions other than Redis.
Generally, running two nodes is never a good idea, because it is bound to hit the split-brain problem: when the network between the two nodes goes down for a moment or two, each node will inevitably think the other is offline, promote itself to (or keep itself as) master, and start accepting requests from other services. Then the split brain happens.
If you are OK with this possible situation, then you can look into setting up master-slave replication with the help of a script and an HA service like Pacemaker or Keepalived.
Typically you have to tell the cluster manager, through a predefined rule, which machine is your preferred master when the two machines rejoin after a split brain.
When a master is elected, the cluster manager executes the script, which basically runs SLAVEOF NO ONE on the elected node and SLAVEOF <new-master-ip> <port> on the other node.
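A minimal sketch of such a promotion script, assuming it is invoked by the cluster manager on the newly elected master (all addresses and the port are placeholders):

```shell
#!/bin/sh
# Hypothetical failover script run on the node that has just been
# elected master. Addresses and port are placeholders.
SELF_IP=10.0.0.1
OTHER_IP=10.0.0.2
PORT=6379

# Promote this node to master.
redis-cli -h "$SELF_IP" -p "$PORT" SLAVEOF NO ONE

# Demote the other node, if it is reachable, so it replicates from us.
redis-cli -h "$OTHER_IP" -p "$PORT" SLAVEOF "$SELF_IP" "$PORT"
```

In a real deployment you would also want to handle the case where the other node is unreachable and retry the demotion when it rejoins.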
You could go one step further in your script and try to merge the two data sets together, but whether that's achievable is entirely down to how you have organized your data in Redis and how long you are prepared to wait to have all the data in sync.
I have done it this way myself before with Pacemaker + Corosync.
Ok, partial solution with SLAVEOF:
You can manually promote slave to master by running:
SLAVEOF NO ONE
You can manually transition master to slave by running:
SLAVEOF <HOST> <port>
Clustering should be disabled.
If you brought the replica online manually by changing it to REPLICAOF NO ONE, you need to be careful to bring the failed master back online as a replica of the new node, so you don't overwrite more recent data. I would not recommend doing this manually: you want to minimize downtime, so automated failover is ideal.
You mention being open to other products. Check out KeyDB, which has the exact configuration you are looking for. It is a maintained multi-threaded fork of Redis which offers the active-replica scenario you are looking for. Check out an example of it here.
Run both nodes as replicas of each other, accepting reads and writes simultaneously (depending on the upfront proxy config). If one fails, the other continues to take the full load and is already in sync.
Regarding the split brain concern, KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed.
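For reference, the active-replica setup described above boils down to two configuration lines on each node (the address is a placeholder), with each node pointing `replicaof` at the other:

```
# keydb.conf on node A (node B uses the same two lines, pointing back at A)
active-replica yes
replicaof 10.0.0.2 6379
```

Without `active-replica yes`, the `replicaof` line would make the node a read-only replica, as in stock Redis.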
I would recommend having at least 3 nodes with a Sentinel setup, enabling gossip/quorum for automatic promotion of a slave to master when the current master node goes down.
I believe it is possible to create a cluster with two nodes with the commands below:
$ redis-cli --cluster create <ip-node1>:7000 <ip-node1>:7001 <ip-node2>:7000 <ip-node2>:7001 --cluster-replicas 1
To resolve the split-brain problem, you can add a third node without data and introduce it to the cluster (note that CLUSTER MEET takes the IP and port as separate arguments):
$ redis-cli -p 7000 cluster meet <ip-node3> 7000
$ redis-cli -p 7000 cluster nodes
I think it works.
I have set-up a testing Postgres-XL cluster with the following architecture:
gtm - vm00
coord1+datanode1 - vm01
coord2+datanode2 - vm02
I created a new database, which contains a table that is distributed by replication. This means that I should have the exact copy of that table in each and every single datanode.
Doing operations on the table works great, I can see the changes replicated when connecting to all coordinator nodes.
However, when I simulate one of the datanodes going down, while I can still read the data in the table just fine, I cannot add or modify anything, and I receive the following error:
ERROR: Failed to get pooled connections
I am considering deploying Postgres-XL as a highly available database backend for a fair number of applications, and I cannot control how those applications interact with the database (it might be a big problem if those applications couldn't write to the database while one datanode is down).
To my understanding, Postgres-XL should achieve high availability for replicated tables in a very transparent way and should be able to support losing one or more datanodes (as long as at least one is still available - again, this is just for replicated tables), but this does not seem the case.
Is this the intended behaviour? What can be done in order to be able to withstand having one or more datanodes down?
So, as it turns out, it is not transparent at all. To my jaw-dropping surprise, Postgres-XL has no built-in high-availability support or recovery. Meaning, if you lose one node, the database fails. And if you are using the round-robin or hash DISTRIBUTE BY options, losing a disk in one node means you have lost the entire database. I could not believe it, but that is the case.
They do have a "standby" server option, which is just a mirrored node for each node you have, but even this requires manually triggering recovery and doubles the number of nodes you need. For data protection you will have to use the DISTRIBUTE BY REPLICATION option, which is MUCH slower and again has no failover support, so you will have to manually restart the cluster and reconfigure it not to use the failing node.
https://sourceforge.net/p/postgres-xl/mailman/message/32776225/
https://sourceforge.net/p/postgres-xl/mailman/message/35456205/
Currently we have 2 servers with a load balancer in front of them. We want to be able to turn one machine off and back on later, without users noticing it.
Our application also uses Solr, and now I want to install and configure Solr on both servers. The question is: how do I configure master-master replication?
After my initial research I found out that it's not possible. :(
But what are my options here? I want both indices to stay in sync, and when a document is committed on one server it should also go to the other.
Thanks for your help!
I'm not certain of your specific use case (why turn one server on and off?), but there is no "master-master" replication in Solr. Solr does, however, support distributed indexing and querying via SolrCloud. From the documentation for SolrCloud:
Replication ensures redundancy for your data, and enables you to send
an update request to any node in the shard. If that node is a
replica, it will forward the request to the leader, which then
forwards it to all existing replicas, using versioning to make sure
every replica has the most up-to-date version. This architecture
enables you to be certain that your data can be recovered in the event
of a disaster, even if you are using Near Real Time searching.
It's a bit complex, so I'd suggest you spend some time going through the documentation, as it's not quite as simple as setting up a couple of masters and load balancing between them. It is a big step up from the previous master/slave replication that Solr used, so even if it's not a perfect fit it will be a lot closer to what you need.
https://cwiki.apache.org/confluence/display/solr/SolrCloud
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
You can just create simple master-slave replication as described here:
https://cwiki.apache.org/confluence/display/solr/Index+Replication
But be sure to send your inserts, deletes, and updates directly to the master; selects can go through the load balancer.
The other alternative is to create a third server as a master with two slaves, and put the load balancer in front of the two slaves.
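As a sketch of what that looks like in `solrconfig.xml` (the core name and master URL are placeholders), the master and each slave get a `solr.ReplicationHandler` entry:

```xml
<!-- On the master: publish the index after every commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The `pollInterval` controls how stale a slave can be, so it is the knob to tune against your freshness requirements.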
I am currently looking at CouchDB and I understand that I have to specify all the replications by hand. If I want to use it on 100 nodes how would I do the replication?
Doing 99 "replicate to" and 99 "replicate from" on each node
It feels like overkill, since every node would have to carry replication configurations for all the other nodes.
Doing 1 replicate to the next one to form a circle (like A -> B -> C -> A)
This would work until one node crashes; then all the others wait until it comes back.
The latency would be high for replicating from the first node to the last.
Isn't there a way to say: "here are 3 IPs on the full network. Connect to them and share with everyone as you see fit like an independent P2P" ?
Thanks for your insight
BigCouch won't provide the cross data-center stuff out of the box. Cloudant DBaaS (based on BigCouch) does have this setup already across several data-centers.
BigCouch is a sharded "Dynamo-style" fork of Apache CouchDB--it is to be merged into the "mainline" Apache CouchDB in the future, fwiw. The shards live across nodes (servers) in the same data-center. "Classic" CouchDB-style replication is used (afaik) to keep the BigCouches in the various data-centers in sync.
CouchDB-style replication (n-master) is change-based, so replication only includes the latest changes.
You would need to set up to/from pairs of replication for each node/database combination. However, if all of your servers are intended to be identical, replication won't actually happen that often--it will only happen if needed.
If A gets a change, replication ships it to B and C (etc.). However, if B--having just got that change--replicates it to C before A gets the chance to--due to network latency, etc.--when A does finally try, it will realize the data is already there and not bother sending the change again.
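A toy model of that behaviour (the `Node` class and revision strings are illustrative, not CouchDB's actual API): replication only ships revisions the target does not already have, so the second attempt to deliver the same change is a no-op.

```python
# Toy model of CouchDB-style change-based replication: a change is only
# shipped if the target does not already have that revision.

class Node:
    def __init__(self, name):
        self.name = name
        self.docs = {}   # doc_id -> set of known revisions
        self.pushes = 0  # how many revisions this node actually sent

    def write(self, doc_id, rev):
        self.docs.setdefault(doc_id, set()).add(rev)

    def replicate_to(self, target):
        for doc_id, revs in self.docs.items():
            for rev in revs:
                if rev not in target.docs.get(doc_id, set()):
                    target.write(doc_id, rev)
                    self.pushes += 1

a, b, c = Node("A"), Node("B"), Node("C")
a.write("doc1", "1-abc")

a.replicate_to(b)  # B receives the change
b.replicate_to(c)  # B beats A to it: C receives the change from B
a.replicate_to(c)  # A finds the revision already present, sends nothing

print(a.pushes, b.pushes)  # → 1 1
```

Note how A only pushed once even though it replicates to both B and C, which is why an all-pairs mesh is cheaper in practice than it first appears.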
If this is a standard part of your setup (i.e., every time you make a db you want it replicated everywhere else), then I'd highly recommend automating the setup.
Also, check out the _replicator database. It's much easier to manage what's going on:
https://gist.github.com/fdmanana/832610
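For example, a continuous replication from node A to node B is just a document you PUT into `_replicator` (the hostnames and database name are placeholders):

```json
{
  "_id": "a_to_b_mydb",
  "source": "http://nodeA.example.com:5984/mydb",
  "target": "http://nodeB.example.com:5984/mydb",
  "continuous": true
}
```

Deleting the document cancels the replication, and CouchDB writes the replication state back into the document, so your automation can inspect and manage all replications as ordinary documents.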
Hope something in there is useful. :)