Can Solr replication be done with automatic dictionary updates? - solr

I am wondering if Solr replication can be done together with an update of some key dictionary files. I am building an index on a build machine, and the index is then replicated to a few real production Solr machines. One issue I have is that the dictionary files (synonym- and stemming-related) used in index building on the build machine need to be synchronized as part of replication. Does Solr have an inherent mechanism for supporting this, or do I have to program/script something on top of replication (does it have some kind of hook that can be called at the end of replication)?

Solr does support replication of configuration files, but only the ones that are within the conf folder.
Check How_are_configuration_files_replicated in the Solr wiki.
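As a sketch, on the build (master) machine this is configured on the ReplicationHandler in solrconfig.xml via the confFiles parameter; the dictionary file names below are placeholders for your synonym and stemming files, which must live under conf/:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,synonyms.txt,stemdict.txt</str>
  </lst>
</requestHandler>

When a changed configuration file is pulled along with the index, the slave core is reloaded automatically, so the new dictionaries take effect without a manual hook.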

Related

Is it a good idea to stop Solr replication for all search machines till the new Solr configurations are deployed on all nodes?

This question is about a legacy Solr setup (non-cloud mode).
Let's consider one hypothetical example. Say we have one index machine and 2 search machines.
We have some Solr schema and config changes that we want to deploy to all the machines.
We do a round-robin deployment - deploy to the index machine first, then to one search machine at a time. For the whole deployment, we disable replication from the index machine to the search machines. Can we do better, so that replication is not stopped for the entirety of the deployment process?
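For context, legacy replication can be toggled per node over HTTP; a rough sketch, with host and core names as placeholders:

curl 'http://index-host:8983/solr/core1/replication?command=disablereplication'   # on the index (master) machine
curl 'http://search-host:8983/solr/core1/replication?command=disablepoll'         # on a search (slave) machine
curl 'http://search-host:8983/solr/core1/replication?command=enablepoll'          # re-enable after deploying to that slave
curl 'http://index-host:8983/solr/core1/replication?command=enablereplication'    # re-enable on the master

One possible improvement is per-slave disablepoll/enablepoll, so each search machine only stops pulling while it is actually being deployed to, rather than pausing replication globally for the whole rollout.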

Solr indexing on multi schemas (databases)

In our multi-tenant application we have multiple databases, one DB for each company. All users of one company access the same database. I have to implement Solr indexing. Can I implement Solr indexing using a single core with multiple shards, one shard per company? Or do I need multiple cores, one core per company? Basically, I read a table in the DB to fetch the file path on each record, and then access the file system to read the file for indexing.
So, let's put it into an answer. As you describe the problem, I think you should create one core per company/database; it will be easier for you later on to restrict users of one company to access only their data.
Second, on SolrCloud vs. a single instance: a lot depends on the size of the data and the desired performance. A quote from the Solr wiki:
Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. Called SolrCloud, these capabilities provide distributed indexing and search capabilities, supporting the following features:
Central configuration for the entire cluster
Automatic load balancing and fail-over for queries
ZooKeeper integration for cluster coordination and configuration.
So, if you need those things, and I assume you do, I would prefer SolrCloud over a single instance.
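As a sketch of the one-core-per-company layout, cores (or collections in SolrCloud) could be created like this; the company name and configset path are placeholders:

bin/solr create_core -c company_acme -d /path/to/shared_configset      # standalone Solr
bin/solr create -c company_acme -shards 1 -replicationFactor 2         # SolrCloud

All companies can share the same configset, and access control then reduces to routing each company's users to their own core or collection.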

Multiple Solr environments with one Zookeeper ensemble

We have two Solr environments in production.
One Solr environment has the latest two years of data; the other has the last 10 years of archived data.
At the moment, these two Solr environments connect to separate Zookeeper ensembles.
The collections have the same name and configuration in both Solr environments.
We want to reduce the number of servers for Zookeeper.
Is it feasible to have both Solr environments in production connect to one Zookeeper ensemble without overwriting configs for each other?
Or is it mandatory to have separate Zookeeper ensemble for each Solr environment?
You can use the same Zookeeper ensemble to handle more than one Solr or SolrCloud instance.
However, the data must be kept separate. This is (probably) best done by using the "chroot" functionality in Zookeeper.
Essentially, when you create the "space" in Zookeeper for your Solr instance, you append a /some_thing_unique and keep that in the appropriate config files in Solr - then you should have no trouble.
I haven't experienced moving an existing Solr instance from one Zookeeper to another - I'd guess you would have to take Solr down, change the configs, set up the collection etc. in Zookeeper, and restart Solr. For sure I'd get that all worked out in a test environment before doing it live.
Hope that helps...
Oh, here's how I did it when creating a new collection in Zookeeper... You'll note I gave it a name (the name of my collection) as well as noting which version of Solr I was using. This allows me to install later versions of Solr, move my collection to that later version, and keep it all in the same Zookeeper ensemble...
/opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 10.196.12.103,10.196.12.104,10.196.22.103 -cmd makepath /myCollectionName_solr6_2
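Each Solr node is then pointed at that chrooted path by appending it to the ZooKeeper connection string, e.g. in solr.in.sh (ports here are assumed to be the default 2181):

ZK_HOST="10.196.12.103:2181,10.196.12.104:2181,10.196.22.103:2181/myCollectionName_solr6_2"

The chroot suffix applies to the connection string as a whole, so all three hosts serve the same chrooted subtree.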

How to setup Solr Replication with two search servers?

Hi, I'm developing a Rails project with Sunspot Solr and configuring Solr replication.
My environment: rails 3.2.1, ruby 2.1.2, sunspot 2.1.0, Solr 4.1.6.
Why replication: I need a more stable system - oftentimes the search server goes down for maintenance and the web application stops working in production. So, I am thinking about how to set up two identical search servers instead of one to make the system more stable: if one server goes down, the other will keep working.
I cannot find any good tutorial that is simple, easy to understand, and described in detail...
I'm trying to set up replication on two servers, but I do not fully understand how replication works internally:
how data is synchronized between the two servers (is it automatic?)
how search requests are balanced between the two servers
when one server suddenly stops working, the other should become the master (is that automatic?)
are there replication features other than those listed?
The answer to this is similar to:
How to setup Solr Cloud with two search servers?
What is the difference between Solr Replication and Solr Cloud?
Can we close this as a duplicate?
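To the first point ("is synchronization automatic?"): in legacy master/slave replication, yes - each slave polls the master at a fixed interval and pulls changed index files. A minimal slave-side sketch for solrconfig.xml, with the master URL and interval as placeholders:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Note that legacy replication does not balance queries or promote a slave to master automatically; load balancing and fail-over are exactly what SolrCloud (see the linked questions) provides.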

Sync solr documents with database records

I wonder if there is a proper way to keep Solr documents in sync with database records. I usually have this problem: there are Solr documents whose referenced database records no longer exist. It seems some DB records have been deleted, but no trigger ran to update Solr. I want to write a rake task that runs periodically to remove such documents from Solr.
Any suggestions?
Chamnap
Yes, there is one.
You have to use the DataImportHandler with the delta import feature.
Basically, you specify a query that updates only the rows that have been modified, instead of rebuilding the whole index. Here's an example.
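A minimal db-data-config.xml sketch, assuming a hypothetical item table with a last_modified timestamp and a deleted flag (table, columns, and connection details are all placeholders):

<dataConfig>
  <dataSource driver="org.postgresql.Driver" url="jdbc:postgresql://localhost/mydb" user="solr" password="secret"/>
  <document>
    <entity name="item" pk="id"
            query="SELECT id, name FROM item"
            deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name FROM item WHERE id = '${dih.delta.id}'"
            deletedPkQuery="SELECT id FROM item WHERE deleted = 1 AND last_modified &gt; '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>

Running /dataimport?command=delta-import then re-indexes only the modified rows, and deletedPkQuery removes documents whose rows were (soft-)deleted in the DB.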
Otherwise, you can add a feature to your application that simply triggers the removal of the documents via HTTP, so the delete happens in both your DB and your index.
I'm using Java + Java DB + Lucene (which Solr is based on) for my text search and database records. My solution is to back up and then recreate (delete + create) the Lucene database to sync with my records in Java DB. This seems to be the easiest approach; the only problem is that it is not advisable to run often. This also means that your records are not updated in real time. I run my batch job nightly so that all changes are reflected the next day. Hope this helps.
Also read an article about syncing Solr and DB records here, under "No synchronization". It states that it's not easy, but possible in some cases. It would be helpful if you specified your programming language so more people can help you.
In addition to the above, "soft" deletion by setting a deleted or deleted_at column is a great approach. That way you can run a script to periodically clear out deleted records from your Solr index as needed.
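The periodic cleanup itself can be a plain update request; a sketch with the core name and IDs as placeholders:

curl 'http://localhost:8983/solr/core1/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><id>123</id><id>456</id></delete>'

where the IDs are the ones your script collects from the rows marked deleted.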
You mention using a rake task - is this a Rails app you're working with? Most Solr clients for Rails apps should support deleting records via an after_destroy hook.
