solr replication without commit in master - solr

Solr replicates from master to slave on either commit,optimize or startup.However i m setting up an empty core which acts as a slave core and I want it to replicate data from the master core without firing a commit operation on master core.

You can Force Replication on the slave from the Solr Admin dashboard.
You can also fire replication through url fetchindex
http://slave_host:port/solr/replication?command=fetchindex
Forces the specified slave to fetch a copy of the index from its
master.
If you like, you can pass an extra attribute such as masterUrl or
compression (or any other parameter which is specified in the tag) to do a one time replication from a master. This
obviates the need for hard-coding the master in the slave.

Related

How to handle solr replication when master goes down

I have solr setup, which is configured for Master and slave. The indexing is happening in master and slave is replicating the index at every 2 Min interval from master. So there is a delay of 2 Minutes in getting data from master to slave. Lets assume that my master was indexing at 10:42 some data but due to some hardware issue, master went down at 10:43. So now the data which was indexing at 10:42 was suppose to replicate on Slave by 10:44 (as we have set two minutes interval) Since now the master is not available, how to identify what the last indexed data in solr Master server. Is there way in solr log to track the index activity.
Thanks in Advance
Solr does log the indexing operations if you have the Solr log set to INFO. Any commit/add will show up in the log, so you can check the log for when the last addition was made. Depending on the setup, it might be hard to get the last log when then server is down, though.
You can reduce the time between replications to get more real time replication, or use SolrCloud instead (which should distribute the documents as they're being indexed).
There are also API endpoints (see which connections the Admin interface makes when browsing to the 'replication' status page) for getting the replication status, but those wouldn't help you if the server is gone.
In general - if the server isn't available, you'll have a hard time telling when it was last indexed to. You can work around a few of the issues by storing the indexing time outside of Solr from the indexing task, for example updating a value in memcache or MySQL every time you send something to be indexed from your application.

Sitecore 8.1 index rebuild strategy for SOLR search provider

Just read through the index update strategies document below but couldn't get the clear answer on which strategy is best for SOLR search implementation:
https://doc.sitecore.net/sitecore_experience_platform/search_and_indexing/index_update_strategies
We have setup the master and slave Solr endpoints where master will be used for create/update. And slave for reading only.
Appreciate if you could suggest the indexing strategy to be used for:
Content Authoring
Content Delivery
Solution is hosted in azure web apps and content delivery can be scaled up or down from 1-N number at any time.
I'm planning to configure below:
Only CA have a OnPublishEndAsync
All CDs will not have any indexing strategy.
Appreciate if you could suggest a solution that has worked for you. Also how do we disable indexing strategy?
Thanks.
Usually when you use replication in Solr (master + slave Solr servers), it should be configured like that:
Content Authoring (CM server):
connects to Solr master server.
It runs syncMaster strategy for master database, and onPublishEndAsync for web database.
Content Delivery (CD servers):
connects to Solr slave server (or to some load balancer if there are multiple Solr slave servers).
has all the indexing strategies set to manual - they should NEVER update Slave solr servers.
With this solution, CD servers always can get results from Solr, even if there is full index rebuild in progress (this happens on Master Solr server and data is copied to Slaves after it's finished).
You should think about having 2 Solr Slave servers and load balancer for them. If you do this:
If Solr master is down for some reason, slaves still answers to requests from CD boxes. You can safely restart master, reindex, and the only thing you lost is that you didn't have 100% up to date search results on CD for some time.
If one of the Solr slave servers is down, second slave server still answers to the request and load balancer should redirect all the traffic to the slave server which works.

Is any way in solr3 slave create update delete will reflect on master also

Is any way in solr3,
To perform create update delete on slave will also reflect on master?
No. Slave is merely a slave and can't control master. If the slave was able to edit/delete records in master then the whole purpose of master/slave is defeated.
Ideally, there shouldn't be any data import handler (DIH) configured for slave, and it should ONLY be a master to create, update & delete documents. Slave just replicates what master has indexed. End of story!
Shishir

Solr Replication Starts periodically

Under what conditions does solr starts replicating from the start, we have noticed that in our master slave setup solr periodically start replicating the entire index from the beginning.
We have not made any changes to schema or config files, in-spite of that full replication get's triggered. How can this be avoided.
Regards,
Ayush

Database replication

OK when working with table creation, is it always assumed that creating a table on one database (the master) mean that the DBA should create the table on the slave as well? Also, if using a master/slave configuration, shouldn't data always be getting replicated from the master to the slave to synch?
Right now the problem I am having is my database has a lot of stuff in the master, but the slave is missing parts that only exist in the master. Is something not configured correctly here?
Depends how the replication is configured. Real time replication should keep the master and slave in sync at all times. "Poors mans" replication is usually configured to sync upon some time interval expiring. This is whats probably happening in your case.
I prefer to rely on CREATE TABLE statements being replicated to set up the table on the slave, rather than creating the slave's table by hand. That, of course, relies on the DBMS supporting this.
If you have data on the master that isn't on the slave, that's some sort of failure of replication, either in setup or operationally.
Any table creation on master is replication on slave. Same goes with the inserting data.
Go through the replication settings in my.cnf file for mysql and check if any database / table is ignored from replicating.

Resources