Keep Solr slaves in sync

We have a master-slave setup running Solr 6.5.0. There is a backend process running 24/7 which pushes its data to the master server. No commit is done on the master. The web frontend accesses the slave. The replication poll interval is 1 hour.
All is fine so far, but now, as traffic grows, the CPU load on the slave is getting really high. I thought the best thing would be to add a second slave to the master and let the web servers connect via the existing load balancers to the two Solr slave machines. I assume the two Solr slaves will handle their replication independently, so each slave will poll the master at a different time.
As the master receives new data 24/7, I'm worried that the two machines will not have the same data set/version. Is there a solution with low administration effort to force both slaves to poll new data from the master at the same time? (I.e. I'm trying to avoid setting up a real Solr cluster, since multiple slaves will fit our needs.)

The problem here is the following: during the poll interval your slaves can potentially be out of sync, and in your case that interval is 1 hour.
The thing that can be done with minimal effort is to force replication on the slaves at the same time by calling the command:
http://slave_host:port/solr/core_name/replication?command=fetchindex
However, I'm not sure how often you can call this command; most likely not every minute or so.
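If you go that route, a minimal sketch (hostnames, port and core name are placeholders) is a tiny script, run from cron in place of the hourly poll, that fires the fetch on all slaves at once:
#!/bin/sh
# Hypothetical helper: trigger a fetch on every slave at (almost) the same moment.
for host in slave1 slave2; do
  curl -s "http://$host:8983/solr/core_name/replication?command=fetchindex" >/dev/null &
done
wait
If cron becomes the only trigger, you can also disable the regular polling on the slaves (the replication handler has a disablepoll command) so the two machines cannot drift between scheduled fetches.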
Another possibility is to have the master make a new index version available for replication whenever a commit is performed on it. You can do this by adding this configuration to the master's replication handler:
<str name="replicateAfter">commit</str>
For more information, take a look at the Solr index replication documentation.
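For context, that line belongs in the replication request handler of the master's solrconfig.xml; a minimal sketch along the lines of the stock replication example (the confFiles list is a placeholder):
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- master side: publish a new replicable index version after every commit (and on startup) -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>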

Traditional master-slave replication is basically doing rsync over HTTP. So maybe you can rsync the index between the slaves yourself (and reload the cores after the rsync).
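A rough sketch of that idea, assuming a standard data directory layout and that the source index is not being modified mid-copy (paths, hostnames and core name are placeholders):
# pull the index directory from the other slave...
rsync -av --delete slave1:/var/solr/data/core_name/data/index/ /var/solr/data/core_name/data/index/
# ...then reload the core so the copied segments are picked up
curl -s "http://localhost:8983/solr/admin/cores?action=RELOAD&core=core_name"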

Related

Replicate index from master at specific time in Solr 7

I have a use case where we have a Solr master that is replicated to three replicas in a cluster, and is also replicated to a separate replica in Hong Kong. We were initially replicating all of them every 00:01:05, but that's too much to do at once for network traffic. For the sake of data continuity on the front end, I still need to replicate the three in the cluster simultaneously, and I want to replicate to the HK index separately so when it replicates, it's not doing it at the same time as the three in the cluster.
My question has to do with setting when this happens. From everything I've read, you can only set pollInterval, which, as its name indicates, is a frequency. What I'd like to do is similar to what can be done with a *nix cron job, where you can set it to run at a specific time after the hour. So for instance, I'd like to have the cluster replicas do their replication at :05, :15, :25, :35, :45, and :55 every hour, and the HK index to replicate at :00, :10, :20, :30, :40, and :50. Is there a way to do that somehow with pollInterval, or perhaps another slave replication handler setting?
I don't think Solr natively supports the kind of scheduling you are looking for. You might be able to approximate it by kicking off the replication via the Solr API from a cron job.
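As a rough sketch of that cron approach (hostnames, port and core name are placeholders), each replica gets its own fetchindex call on the schedule you want:
# (optionally disable the regular poll on each replica first: .../replication?command=disablepoll)
# cluster replicas fetch at :05, :15, :25, :35, :45 and :55
5,15,25,35,45,55 * * * * curl -s "http://replica1:8983/solr/core/replication?command=fetchindex" >/dev/null
5,15,25,35,45,55 * * * * curl -s "http://replica2:8983/solr/core/replication?command=fetchindex" >/dev/null
5,15,25,35,45,55 * * * * curl -s "http://replica3:8983/solr/core/replication?command=fetchindex" >/dev/null
# the Hong Kong replica fetches on the offset schedule at :00, :10, :20, :30, :40 and :50
0,10,20,30,40,50 * * * * curl -s "http://hk-replica:8983/solr/core/replication?command=fetchindex" >/dev/null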

AWS Elasticache Redis failover

I am using Redis on ElastiCache for a Node application, and today the node went down, which meant our app stopped working. It took 20 minutes for a new node to be provisioned.
From reading the documentation it seems I can set up a cluster which automatically promotes a slave to primary in case of a failure. The big gotcha seems to be you have to set your client to write to the primary node and read from the slave nodes.
This means in the case of a failure, you have to reconfigure your app to point to the newly created 'read' nodes. It also takes a few minutes for a slave to be promoted to primary.
Is there no way to set this up so if the primary fails, a slave will automatically take over for read/write operations?
I'm not storing much data in Redis and the read/write volume is low, but it is required to run the app (live video sessions!).
If I can't have a seamless failover in redis, is there something I can use which provides this functionality? I'm hoping I don't have to move to a traditional DBMS as everything works perfectly but I need to be able to handle failure well.
Thanks
A Multi-AZ setup should automatically fail over with minimal downtime. Once you have created one of these instances, you will get an endpoint for the cluster. Amazon will point that DNS entry to the proper failover node and handle the promotion of a slave if the master instance dies.
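As a rough sketch (IDs and node type are placeholders, and the exact flags depend on your CLI version), a replication group with automatic failover enabled can be created like this:
aws elasticache create-replication-group \
  --replication-group-id my-redis \
  --replication-group-description "Redis with automatic failover" \
  --engine redis \
  --cache-node-type cache.t3.micro \
  --num-cache-clusters 2 \
  --automatic-failover-enabled
The application then connects to the replication group's primary endpoint; during a failover ElastiCache repoints that DNS name, so the client does not need to be reconfigured.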

How to setup Solr Cloud with two search servers?

Hi, I'm developing a Rails project with Sunspot/Solr and configuring SolrCloud.
My environment: rails 3.2.1, ruby 2.1.2, sunspot 2.1.0, Solr 4.1.6.
Why SolrCloud: I need a more stable system - oftentimes the search server goes down for maintenance and the web application stops working in production. So I'm thinking about how to run 2 identical search servers instead of one, to make the system more stable: if one server goes down, the other keeps working.
I cannot find any tutorial that is simple, easy to understand and described in detail...
I'm trying to set up SolrCloud on two servers, but I do not fully understand how it works internally:
how data is synchronized between the two servers (is it automatic?)
how search requests are balanced between the two servers
when one server suddenly stops working, the other should become the master (is that automatic?)
are there SolrCloud features other than those listed?
Read more about SolrCloud here: https://wiki.apache.org/solr/SolrCloud
A couple of inputs from my experience.
If your application only reads data from Solr and does not write to it in real time (you index using an ETL process or similar), then you can just go for a master-slave hierarchy.
Define one master: point all writes here. If this master is down, you will no longer be able to index data.
Create 2 (or more) slaves: this is a built-in Solr feature, and each slave takes care of synchronizing data from the master at the interval you specify (say, every 20 seconds).
Create a load balancer in front of the slaves and point your application to read data through it.
Pros:
With the above setup you don't have high availability for the master (data writes), but you will have high availability for reads until the last slave goes down.
Cons:
Assume one slave went down and you brought it back after an hour; this slave will be behind the other slaves by one hour. So it is a manual task to check for data consistency with the other slaves before adding it back to the ELB.
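One low-effort way to make that check (hostnames and core name are placeholders) is to compare what the replication handler reports on the master and on the recovered slave, and only re-enable the slave in the load balancer once the versions match:
# index version/generation currently served by the master
curl -s "http://master:8983/solr/core/replication?command=details"
# index version/generation on the recovered slave
curl -s "http://recovered-slave:8983/solr/core/replication?command=details"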
How about SolrCloud?
There is no master here, so you can achieve high availability for writes too.
No need to worry about the data inconsistency I described above; the SolrCloud architecture takes care of that.
What suits you best:
Define an external ZooKeeper ensemble with a 3-node quorum.
Define at least 2 Solr servers.
Split your current index into 2 shards (by default the shards will reside one each on the 2 Solr nodes defined in step 2).
Set the replication factor to 2 (this creates a replica of each shard on each node).
Define an LB pointing to the Solr nodes above.
Point your Solr input as well as your application to this LB.
With the above setup you can sustain a failover of either node; a brief command-line sketch follows.
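A rough command-line sketch of those steps (hostnames, ports and the collection name are placeholders; newer releases use the bin/solr script, older 4.x installs pass the ZooKeeper ensemble with -DzkHost when starting the example jar):
# start each Solr node against the external ZooKeeper ensemble (steps 1-2)
bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181
# create the collection with 2 shards and a replication factor of 2 (steps 3-4)
curl "http://solr1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=2"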
Let me know if you need more info on this.

Solr master-master replication alternatives?

Currently we have 2 servers with a load balancer in front of them. We want to be able to turn 1 machine off and back on later, without the user noticing it.
Our application also uses Solr, and now I want to install & configure Solr on both servers. The question is: how do I configure master-master replication?
After my initial research I found out that it's not possible :(
But what are my options here? I want both indices to stay in sync: when a document is committed on one server, it should also go to the other.
Thanks for your help!
I'm not certain of your specific use case (why turn 1 server on and off?), but there is no specific "master-master" replication in Solr. Solr does however support distributed indexing and querying via SolrCloud. From the documentation for SolrCloud:
Replication ensures redundancy for your data, and enables you to send an update request to any node in the shard. If that node is a replica, it will forward the request to the leader, which then forwards it to all existing replicas, using versioning to make sure every replica has the most up-to-date version. This architecture enables you to be certain that your data can be recovered in the event of a disaster, even if you are using Near Real Time searching.
It's a bit complex, so I'd suggest you spend some time going through the documentation, as it's not quite as simple as setting up a couple of masters and load balancing between them. It is a big step up from the previous master/slave replication that Solr used, so even if it's not a perfect fit it will be a lot closer to what you need.
https://cwiki.apache.org/confluence/display/solr/SolrCloud
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
You can just create a simple master-slave replication setup as described here:
https://cwiki.apache.org/confluence/display/solr/Index+Replication
But be sure to send your inserts, deletes and updates directly to the master; selects can go through the load balancer.
The other alternative is to create a third server as the master, with 2 slaves, and the load balancer can sit in front of the two slaves.
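Either way, the routing rule stays the same; a quick curl sketch (hostnames and core name are placeholders):
# writes go straight to the master
curl "http://master:8983/solr/core/update?commit=true" -H "Content-Type: application/json" -d '[{"id":"1","title":"example"}]'
# reads go through the load balancer in front of the slaves
curl "http://lb:8983/solr/core/select?q=*:*"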

Solr Master Slave Failover setup for High Availability

While using Solr (we are currently using 3.5), how do we set up the masters for failover?
Let's say in my setup I have two masters and two slaves. The application commits all writes to one active master, and both slaves get their updates from this active master. There is also a repeater which serves the same purpose as the master.
Now my question is: if the master for some reason goes down, how can I make the repeater the master without any manual intervention? How can the slaves start getting their updates from the repeater instead of the broken master? Is there a recommended way to do this? Are there any other recommended master/slave setups to ensure high availability of the Solr systems?
At this time, your best option is probably to investigate the SolrCloud functionality present in the current Solr 4.0 alpha, which at the time of this writing is due for its final release within a few months. The goal of SolrCloud is to handle data distribution and master election, using the ZooKeeper distributed database to maintain consensus within the cluster about which nodes are serving in which roles.
There are other more traditional ways to set up failover for Solr 3's replicated master-slave architecture, but I personally wouldn't want to make that investment with Solr 4.0 so near to release.
Edit: See Linux-HA for one such traditional approach. Personally, I would create a purpose-built daemon that reconfigures your cores and load balancer, using ZooKeeper for presence detection and distributed locks.
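One building block such a daemon could use: the fetchindex command accepts a masterUrl parameter, so a failover script can repoint a slave at the repeater without editing solrconfig.xml. A hypothetical sketch (hostnames and core name are placeholders):
# tell a slave to pull its next index from the repeater instead of the dead master
curl "http://slave1:8983/solr/core/replication?command=fetchindex&masterUrl=http://repeater:8983/solr/core/replication"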
If outsourcing is an option, you might consider a hosted service such as my own humble Websolr. We provide this kind of distribution and hot failover by default, so our customers don't have to worry as much about the mechanics of how it's implemented.
I agree with Nick. The way replication works in Solr 3.x is not always handy, especially for master failover. If you are going to consider Solr 4, you might want to have a look at elasticsearch too, which solves this kind of problem in a really brilliant way!
It uses push replication instead of the pull mechanism used by Solr. That means the document is literally reindexed on all nodes. It might sound strange, but it allows elasticsearch to reduce network load (caused by segment merges, for example). Furthermore, a node is elected as master, and if it crashes another node will automatically replace it, becoming the new master.
