2-Node Redis HA

I have two nodes that I want to run as servers in active-active mode, with HA: if one is down, the other should receive all the requests, but while both are up, both should take requests. Since Redis doesn't allow active-active writes to the same data set, and I don't have the option to run Sentinel because I can't have a third node, my idea is to run the two nodes in replication, detect myself when the master node is down, and promote the slave to master. Are there any issues with this? When the original master comes back up, is there a way to configure it as a slave?
Does this sound like a good idea? I am open to suggestions other than Redis.

Generally, running two nodes is never a good idea, because it is bound to hit the split-brain problem: when the network between the two nodes goes down for a moment or two, each node will inevitably conclude that the other is offline, promote (or keep) itself as master, and start accepting requests from other services. That is the split brain.
If you are OK with that possibility, then you can look into setting up master-slave replication with the help of a script and an HA service like Pacemaker or Keepalived.
Typically you have to tell the cluster manager, through a predefined rule, which machine is your preferred master when the two rejoin after a split brain.
When a master is elected, the cluster manager executes the script, which basically runs SLAVEOF NO ONE on the elected node and SLAVEOF <new-master-ip> <port> on the other node.
You could go one step further in your script and try to merge the two data sets together, but whether that's achievable is entirely down to how you have organized your data in Redis and how long you are prepared to wait for all the data to sync.
I have done it this way myself before, with Pacemaker + Corosync.
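For illustration, a minimal sketch of such a promote script, assuming redis-cli is on the PATH, Redis listens on 6379, and the cluster manager passes in the two addresses:
#!/bin/sh
# promote.sh: run by the cluster manager on the node it elects as master.
set -e
PEER_IP="$1"   # the node to demote
MY_IP="$2"     # this node's address, as seen by the peer
# Promote this node: stop replicating, start accepting writes.
redis-cli -p 6379 SLAVEOF NO ONE
# Demote the peer; tolerate failure, since the peer may be down.
redis-cli -h "$PEER_IP" -p 6379 SLAVEOF "$MY_IP" 6379 || true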

OK, a partial solution with SLAVEOF:
You can manually promote a slave to master by running:
SLAVEOF NO ONE
You can manually demote a master to slave by running:
SLAVEOF <host> <port>
Clustering should be disabled.

If you brought the replica online manually by running REPLICAOF NO ONE, you need to be careful to bring the failed master back online as a REPLICAOF the new master, so you don't overwrite more recent data. I would not recommend doing this manually: you want to minimize downtime, so automated failover is ideal.
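As a sketch, the safe ordering with plain redis-cli looks like this (node names hypothetical):
# Node A (the old master) has failed; promote node B.
$ redis-cli -h node-b -p 6379 REPLICAOF NO ONE
# When node A returns, attach it to B before any client can write to it,
# so its stale data set is replaced by B's newer one.
$ redis-cli -h node-a -p 6379 REPLICAOF node-b 6379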
You mention being open to other products. Check out KeyDB, a maintained multithreaded fork of Redis that supports exactly the active-replica configuration you are describing.
Run both nodes as replicas of each other, accepting reads and writes simultaneously (depending on the proxy configuration in front of them). If one fails, the other continues to take the full load and is already synced.
Regarding the split brain concern, KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed.
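A minimal sketch of that configuration, assuming the two nodes are 10.0.0.1 and 10.0.0.2 on the stock port; put this in keydb.conf on 10.0.0.1 and mirror the replicaof line on the other node:
active-replica yes
replicaof 10.0.0.2 6379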

I would recommend having at least 3 nodes in a Sentinel setup, enabling gossip/quorum for automatic promotion of a slave to master when the current master goes down.
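For reference, a minimal sentinel.conf for such a setup, running on all three nodes (addresses and timeouts are placeholders; the quorum is 2 out of 3):
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000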

I believe it is possible to create a cluster with two nodes with the commands below:
$ redis-cli --cluster create <ip-node1>:7000 <ip-node1>:7001 <ip-node2>:7000 <ip-node2>:7001 --cluster-replicas 1
To resolve the split-brain problem, you can add a third node without data:
$ redis-cli -h <ip-node1> -p 7000 cluster meet <ip-node3> 7000
$ redis-cli -h <ip-node1> -p 7000 cluster nodes
I think it works.

Related

Need to migrate the whole cluster from one DC to another DC

I have a SolrCloud cluster consisting of 5 hosts in one DC.
The collection configuration is 5 shards and 3 replicas, with a maximum of 3 shards per host.
The Solr version used is 5.3.1.
Because of some unforeseen maintenance activity, it needs to be moved to another DC temporarily. To minimize the impact, we need the indexed data to be available in the new setup. All the nodes have roughly 100GB of indexed data.
I have already tried copying the whole setup to the new DC and restarting after updating the host information in the config files. It always complains that some shard or other is not available from the hosts when querying data (error code 503).
Note: the backup was taken from a running setup.
I have also tried creating the whole cluster again with the same configuration and copying only the data directory from the backup. It also results in shards not being available from the hosts.
I want to understand whether there is something wrong in the process I am following. One thing I suspect is that the backup should be taken after stopping a particular node.
Is there a simpler, better way? I am using Solr 5.3.1.
The right way to do it is using the backup and restore feature. This feature is already available in the 5.3 version; check the appropriate doc and follow the steps. It should work just fine.
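As a sketch, the calls go through the ReplicationHandler of each core; the host names, core name, and backup location below are placeholders (restore has been available since 5.2), so check the doc for the exact parameters:
$ curl 'http://old-host:8983/solr/mycore/replication?command=backup&location=/backups&name=snap1'
$ curl 'http://new-host:8983/solr/mycore/replication?command=restore&location=/backups&name=snap1'
$ curl 'http://new-host:8983/solr/mycore/replication?command=restorestatus'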

How transparent should losing access to a Postgres-XL datanode be?

I have set-up a testing Postgres-XL cluster with the following architecture:
gtm - vm00
coord1+datanode1 - vm01
coord2+datanode2 - vm02
I created a new database, which contains a table that is distributed by replication. This means that I should have the exact copy of that table in each and every single datanode.
Doing operations on the table works great; I can see the changes replicated when connecting to all coordinator nodes.
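For context, a replicated table in Postgres-XL is declared with DISTRIBUTE BY REPLICATION; a hypothetical example (my actual schema differs):
$ psql -h vm01 -d testdb -c "CREATE TABLE settings (k text PRIMARY KEY, v text) DISTRIBUTE BY REPLICATION;"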
However, when I simulate one of the datanodes going down, while I can still read the data in the table just fine, I cannot add or modify anything, and I receive the following error:
ERROR: Failed to get pooled connections
I am considering deploying Postgres-XL as a highly available database backend for a fair number of applications, and I cannot control how those applications interact with the database (it might be a big problem if those applications couldn't write to the database while one datanode is down).
To my understanding, Postgres-XL should achieve high availability for replicated tables in a very transparent way and should be able to support losing one or more datanodes (as long as at least one is still available - again, this is just for replicated tables), but this does not seem the case.
Is this the intended behaviour? What can be done in order to be able to withstand having one or more datanodes down?
So, as it turns out, not transparent at all. To my jaw-dropping surprise, Postgres-XL has no built-in high-availability support or recovery, meaning that if you lose one node, the database fails. And if you are using the ROUNDROBIN or HASH DISTRIBUTE BY options, losing a disk in one node loses you the entire database. I could not believe it, but that is the case.
They do have a "standby" server option, which is just a mirrored node for each node you have, but even this requires manually setting it to recover, and it doubles the number of nodes you need. For data protection you will have to use the DISTRIBUTE BY REPLICATION option, which is MUCH slower and again has no failover support, so you will have to manually restart the cluster and reconfigure it not to use the failing node.
https://sourceforge.net/p/postgres-xl/mailman/message/32776225/
https://sourceforge.net/p/postgres-xl/mailman/message/35456205/

Spark: run InputFormat as singleton

I'm trying to integrate a key-value database with Spark and have some questions. I'm a Spark beginner; I have read a lot and run some samples, but nothing too complex.
Scenario:
I'm using a small HDFS cluster to store incoming messages in a database.
The cluster has 5 nodes, and the data is split into 5 partitions. Each
partition is stored in a separate database file. Each node can therefore process
its own partition of the data.
The Problem:
The interface to the database software is based on JNI; the database itself is implemented in C. For technical reasons, the database software can maintain only one active connection at a time: there can be only one JVM process connected to the database.
Because of this limitation, reading from and writing to the database must go
through the same JVM process.
(Background info: the database is embedded into the process. It's file based,
and only one process can open it at a time. I could let it run in a separate
process, but that would be slower because of the IPC overhead. My application
will perform many full table scans. Additional writes will be batched and are
not time-critical.)
The Solution:
I have a few ideas about how to solve this, but I don't know whether they work well with Spark.
Maybe it's possible to magically configure Spark to only have one instance of my
proprietary InputFormat per node.
If my InputFormat is used for the first time, it starts a separate thread
which will create the database connection. This thread will then continue
as a daemon and will live as long as the JVM lives. This will only work
if there's just one JVM per node. If Spark starts multiple JVMs on the
same node then each would start its own database thread, which would not
work.
Move my database connection to a separate JVM process per node, and have my InputFormat use IPC to connect to that process. As I said, I'd like to avoid this.
Or maybe you have another, better idea?
My favourite solution would be #1, followed closely by #2.
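If it helps with #1: as far as I can tell, you can ask the resource manager for executors sized so that only one fits per node. A sketch with spark-submit on YARN (the executor count and sizes are assumptions for my 5 workers with 8 cores each):
$ spark-submit \
    --master yarn \
    --num-executors 5 \
    --executor-cores 8 \
    --executor-memory 16g \
    --class my.app.Main my-job.jar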
Thanks for any comment and answer!
I believe the best option here is to connect to your DB from the driver, not from the executors. That part of the system would be the bottleneck anyway.
Have you thought of queueing (buffering) the writes, then using Spark Streaming to dequeue them and write through your output format?
If the data from your DB fits into the RAM of your Spark driver, you can load it there as a collection and then parallelize it into an RDD: https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#parallelized-collections

Solr master-master replication alternatives?

Currently we have 2 servers behind a load balancer. We want to be able to turn one machine off and back on later without users noticing.
Our application also uses Solr, so I wanted to install and configure Solr on both servers. The question is: how do I configure master-master replication?
After my initial research I found out that it's not possible :(
But what are my options here? I want both indices to stay in sync, and when a document is committed on one server, it should also go to the other.
Thanks for your help!
I'm not certain of your specific use case (why turn one server off and on?), but there is no specific "master-master" replication. Solr does, however, support distributed indexing and querying via SolrCloud. From the documentation for SolrCloud:
Replication ensures redundancy for your data, and enables you to send
an update request to any node in the shard. If that node is a
replica, it will forward the request to the leader, which then
forwards it to all existing replicas, using versioning to make sure
every replica has the most up-to-date version. This architecture
enables you to be certain that your data can be recovered in the event
of a disaster, even if you are using Near Real Time searching.
It's a bit complex, so I'd suggest you spend some time going through the documentation; it's not quite as simple as setting up a couple of masters and load balancing between them. It is a big step up from the previous master/slave replication that Solr used, so even if it's not a perfect fit, it will be a lot closer to what you need.
https://cwiki.apache.org/confluence/display/solr/SolrCloud
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
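To experiment, a minimal SolrCloud collection with one shard and two replicas can be brought up along these lines (the ZooKeeper address and collection name are placeholders):
$ bin/solr start -c -z zk1:2181 -p 8983
$ bin/solr create -c mycollection -shards 1 -replicationFactor 2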
You can just create simple master-slave replication as described here:
https://cwiki.apache.org/confluence/display/solr/Index+Replication
Be sure to send your inserts, deletes, and updates directly to the master; selects can go through the load balancer.
The other alternative is to make a third server the master, with the two existing servers as slaves, and put the load balancer in front of the two slaves.
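In either topology, the wiring lives in each core's solrconfig.xml; roughly like this, with host names and the poll interval as placeholders:
On the master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>
On the slave:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/mycore/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>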

CouchDB replication on a lot of servers

I am currently looking at CouchDB and I understand that I have to specify all the replications by hand. If I want to use it on 100 nodes, how would I do the replication?
Doing 99 "replicate to" and 99 "replicate from" operations on each node?
It feels like overkill, since replicating one node already carries everything the other nodes have replicated to it.
Doing 1 "replicate to" the next node, forming a circle (like A -> B -> C -> A)?
It would work until one node crashes; then everyone waits until it comes back.
The latency would also be high for a change to travel from the first node to the last.
Isn't there a way to say: "here are 3 IPs on the full network; connect to them and share with everyone as you see fit, like an independent P2P network"?
Thanks for your insight
BigCouch won't provide the cross-data-center stuff out of the box. Cloudant's DBaaS (based on BigCouch) does already have this set up across several data centers.
BigCouch is a sharded, "Dynamo-style" fork of Apache CouchDB; it is to be merged into "mainline" Apache CouchDB in the future, fwiw. The shards live across nodes (servers) in the same data center. "Classic" CouchDB-style replication is used (afaik) to keep the BigCouches in the various data centers in sync.
CouchDB-style replication (n-master) is change-based, so replication only ships the latest changes.
You would need to set up to/from pairs of replication for each node/database combination. However, if all of your servers are intended to be identical, replication won't actually happen that often; it will only happen when needed.
If A gets a change, replication ships it to B and C (etc.). However, if B, having just got that change, replicates it to C before A gets the chance to (due to network latency, etc.), then when A finally tries, it will see the data is already there and not bother sending the change again.
If this is a standard part of your setup (i.e., every time you make a db you want it replicated everywhere else), then I'd highly recommend automating the setup.
Also, check out the _replicator database. It makes it much easier to manage what's going on:
https://gist.github.com/fdmanana/832610
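With it, a continuous replication is just a document you PUT; for example (hosts and credentials hypothetical):
$ curl -X PUT http://admin:secret@node-a:5984/_replicator/a_to_b \
    -H 'Content-Type: application/json' \
    -d '{"source": "http://node-a:5984/mydb", "target": "http://node-b:5984/mydb", "continuous": true}'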
Hope something in there is useful. :)
