Under what conditions does Solr start replicating from scratch? We have noticed that in our master-slave setup Solr periodically starts replicating the entire index from the beginning.
We have not made any changes to the schema or config files, yet full replication still gets triggered. How can this be avoided?
Regards,
Ayush
Related
I have a cluster with HBASE+Phoenix.
I've installed SOLR on it.
Now I'm trying to set up hbase replication for the cluster, following this manual:
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html
I started the hbase-indexer server, added an hbase-indexer, put data via the hbase shell, and requested a commit via the browser.
But there are no changes in the collection in SOLR - zero new records.
The status 'replication' command in the hbase shell shows sizeOfLogQueue increasing with each PUT to the indexed table.
When grepping the HBase log (hbase-hbase-regionserver-myserver.log) I found lots of records like this:
Indexer_hbaseindexer: Total replicated edits: 0, currently replicating
from:
hdfs://HDP-Test/apps/hbase/data/WALs/myserver,16020,1519204674681/myserver%2C16020%2C1519204674681.default.1519204995372
at position: 45671433
The position here never changes.
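One way to confirm that the position is really stuck, rather than eyeballing the grep output, is to pull the counters out of the log and compare them. The following is only a rough sketch: it assumes the records look like the sample above (possibly wrapped over several lines), and the helper names `replication_progress` and `looks_stuck` are made up for illustration.

```python
import re

# Matches one record of the form shown above. DOTALL lets the record
# span several lines, since the path may be wrapped in the log output.
RECORD_RE = re.compile(
    r"Total replicated edits:\s*(\d+).*?at position:\s*(\d+)",
    re.DOTALL,
)


def replication_progress(log_text):
    """Return a list of (replicated_edits, position) pairs, in log order."""
    return [(int(edits), int(pos)) for edits, pos in RECORD_RE.findall(log_text)]


def looks_stuck(samples):
    """True if there are at least two samples and none of them differ."""
    return len(samples) >= 2 and len(set(samples)) == 1
```

If `looks_stuck(replication_progress(text))` stays true while sizeOfLogQueue keeps growing, the replication source is reading the same WAL offset over and over, which matches the symptom described here.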
The author of the issue at this link says that changing the WAL codec to IndexedWALEditCodec stops HBase replication.
Is it true that IndexedWALEditCodec stops HBase replication from working correctly? That shouldn't be the case.
What might the problem be then? Any hint would be appreciated.
env:
HDFS 2.7.3
HBASE 1.1.2
SOLR 5.5.2
HBASE INDEXER 2.2.8
P.S. After restarting HBase and then requesting a Solr commit, the changes appear. But after that, nothing more happens.
I have an issue with Solr replication.
I have one master and two slaves.
Every so often the replication fails on one of the slaves.
There is no error in the log file, even though I have updated the logging settings to record ALL for replication.
The file replication.properties is not updated on the slave that is failing (it is updated on the other slave), which suggests that the replication did not start. According to the UI, however, replication took place and "Next Run" is counting down to the next replication. At the same time, replication worked for the other slave. Both slaves have a connection to the master.
The command "replication?command=details" shows different index versions on the master and the slave.
If I use the "Replicate now" button to force the replication, it works fine and the next occurrence is also fine, but after a few hours or days it starts to fail again on either of the slaves.
How can I investigate this issue further?
Thank you
Adding extra CPU and increasing RAM resolved this issue; since the upgrade, replication has been working fine.
I have a Solr setup configured as master and slave. Indexing happens on the master, and the slave replicates the index from the master at a 2-minute interval, so there is a delay of up to 2 minutes in getting data from the master to the slave. Let's assume that my master was indexing some data at 10:42, but due to a hardware issue the master went down at 10:43. The data indexed at 10:42 was supposed to replicate to the slave by 10:44 (as we have set a two-minute interval). Since the master is no longer available, how can I identify the last data indexed on the Solr master? Is there a way to track indexing activity in the Solr log?
Thanks in Advance
Solr does log the indexing operations if you have the Solr log level set to INFO. Any commit/add will show up in the log, so you can check the log for when the last addition was made. Depending on the setup, it might be hard to get at the last log when the server is down, though.
You can reduce the time between replications to get more real time replication, or use SolrCloud instead (which should distribute the documents as they're being indexed).
There are also API endpoints (see which connections the Admin interface makes when browsing to the 'replication' status page) for getting the replication status, but those wouldn't help you if the server is gone.
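As a rough illustration of using those endpoints, the sketch below compares the index state reported by a master and a slave through `replication?command=details`. The host names are placeholders, and the response keys `details`, `indexVersion` and `generation` are what I would expect from the replication handler's JSON output; treat them as an assumption and check them against the response from your own Solr version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def details_url(core_url):
    """Build the replication-details URL for a core URL such as
    http://master.example:8983/solr/core1 (a placeholder host)."""
    query = urlencode({"command": "details", "wt": "json"})
    return core_url.rstrip("/") + "/replication?" + query


def fetch_details(core_url, timeout=10):
    """Fetch and parse the details response (needs a reachable server)."""
    with urlopen(details_url(core_url), timeout=timeout) as resp:
        return json.load(resp)


def index_state(details_response):
    """Extract (indexVersion, generation); key names are an assumption."""
    details = details_response["details"]
    return int(details["indexVersion"]), int(details["generation"])


def in_sync(master_response, slave_response):
    """True if both cores report the same index version and generation."""
    return index_state(master_response) == index_state(slave_response)
```

Running `in_sync(fetch_details(master_url), fetch_details(slave_url))` from a cron job and logging the result would at least show when a slave last matched the master.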
In general - if the server isn't available, you'll have a hard time telling when it was last indexed to. You can work around a few of the issues by storing the indexing time outside of Solr from the indexing task, for example updating a value in memcache or MySQL every time you send something to be indexed from your application.
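A minimal sketch of that workaround, with SQLite standing in for memcache or MySQL so the example runs on its own. The table name `index_marker` and the function names are invented for illustration; the idea is simply to call `record_sent` from the indexing task right after each batch is sent to Solr.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) the marker store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_marker ("
        "name TEXT PRIMARY KEY, last_sent_utc TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record_sent(conn, name, when=None):
    """Remember when documents were last sent to Solr for this collection."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT OR REPLACE INTO index_marker (name, last_sent_utc) VALUES (?, ?)",
        (name, when.isoformat()),
    )
    conn.commit()


def last_sent(conn, name):
    """Return the last recorded send time, or None if nothing was recorded."""
    row = conn.execute(
        "SELECT last_sent_utc FROM index_marker WHERE name = ?", (name,)
    ).fetchone()
    return datetime.fromisoformat(row[0]) if row else None
```

Because the marker lives outside Solr, it is still readable when the master is down, and anything sent after the slave's last successful replication is a candidate for re-indexing.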
We have a SolrCloud cluster managed by ZooKeeper. One concern that we have is with updating the schema or dataConfig on the fly. All the changes that we are planning to make are on the indexing server node of the SolrCloud. Once the changes to the schema or dataConfig are made, we do a full dataimport.
The concern is that the replication of the new indexes on the slave nodes in the cloud would not happen immediately, but only after the replication interval. Also for the different slave nodes the replication will happen at different times, which might cause inconsistent results.
For example:
The index replication interval is 5 mins.
Slave node A started at 10:00 => next index replication would be at 10:05.
Slave node B started at 10:03 => next index replication would be at 10:08.
If we make changes to the schema in the indexing server and re-index the results at 10:04, then the results of this change would be available on node A at 10:05, but in node B only at 10:08. Requests made to the SolrCloud between 10:05 and 10:08 would have inconsistent results depending on which slave node the request gets redirected to.
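The timing in this example can be worked through with a few lines of date arithmetic. This is only a sketch of the scenario as described: it assumes each slave polls at a fixed interval counted from its own start time, and that the re-index is visible to the next poll after 10:04. The function name `next_poll` and the date are made up.

```python
from datetime import datetime, timedelta


def next_poll(slave_start, interval, event):
    """Return the first poll time strictly after `event`, assuming polls
    happen at slave_start + interval, slave_start + 2 * interval, ..."""
    # Number of whole intervals that have elapsed by the time of the event.
    polls_done = max((event - slave_start) // interval, 0)
    return slave_start + (polls_done + 1) * interval


interval = timedelta(minutes=5)
reindex = datetime(2018, 1, 1, 10, 4)
node_a = next_poll(datetime(2018, 1, 1, 10, 0), interval, reindex)
node_b = next_poll(datetime(2018, 1, 1, 10, 3), interval, reindex)
window = node_b - node_a  # how long the two nodes can disagree
```

With these inputs node A picks up the change at 10:05 and node B at 10:08, so `window` is three minutes. In the worst case the window approaches one full polling interval.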
Please let me know if there is any way to make the results more consistent.
#Wish, what you are describing is not how SolrCloud behaves.
In SolrCloud, indexing requests are routed to the shard leaders, and each leader sends copies to all of its replicas.
At any point in time, if ZooKeeper identifies that a replica is not in sync with its leader, the replica is put into recovering mode. In this mode it will not serve any requests, including queries.
P.S.: In SolrCloud, configs are maintained in ZooKeeper and not at the node level.
I think you may be confusing SolrCloud with master-slave mode. Please confirm which setup you are using.
I have a cluster of 3 Solr 4.1 shards. There is a replicated cluster, but its data is quite out of sync: polling on those secondary nodes has been stopped for a long time.
Now I want to start the replication again but I'm afraid it would take too long to replicate 400GB index data on each node.
If I manually copy over the index files from the master to the slave node, will it work?
Thanks
Yes, that should work just fine, as long as you don't write to the index while copying it (or you copy it from a snapshot). In fact, that's what the replication does in the background (by replicating the segment files that need replicating).
In older versions of Solr the replication was just shell scripts triggered to copy the index to other servers after an update happened.