Solr Distributed Search vs. Solr Cloud - solr

For Solr 4.3 users, what would be the benefit of using Solr Distributed Search over Solr Cloud?
Or should all Solr deployments after 4.x just use Solr Cloud and forget about Solr Distributed Search?

Benefit:
Distributed search has no benefit over Solr Cloud. Solr Cloud is currently the most efficient way to deploy a Solr cluster. It coordinates all your instances through ZooKeeper and works very well for high availability.
Efficient management:
Solr Cloud decides which of your documents go to which instance, using the cluster state kept in ZooKeeper.
I have also used Solr Cloud in production, and it works wonderfully in high-traffic scenarios.

Solr Cloud itself resembles distributed search via Solr.
No, you can still run any deployment after 4.x as a normal standalone Solr instance. Just omit the zkHost parameter when bootstrapping.

Joins are not supported in Solr Cloud, which is a big drawback.
If you want to control the shards yourself, that is, decide which shard holds which record, go for distributed search; otherwise go for Solr Cloud, which manages all shards itself.
With distributed search we can run multiple Solr instances, so if one fails we can switch to another. In Solr Cloud, ZooKeeper manages all of this, so if ZooKeeper fails, the system goes down.
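To illustrate the manual approach: classic distributed search is driven by the `shards` request parameter, which lists the cores a query should fan out to. The sketch below only builds such a URL; the host names, the core name, and the helper function are made up for the example.

```python
from urllib.parse import urlencode


def distributed_query_url(base_core_url, shard_cores, query):
    """Build a select URL that fans a query out to explicitly listed shards."""
    params = {
        "q": query,
        # Comma-separated shard addresses, written without the http:// scheme.
        "shards": ",".join(shard_cores),
    }
    return f"{base_core_url}/select?{urlencode(params)}"


# Hypothetical hosts and core name, for illustration only.
url = distributed_query_url(
    "http://solr1.example.com:8983/solr/core1",
    ["solr1.example.com:8983/solr/core1", "solr2.example.com:8983/solr/core1"],
    "title:solr",
)
print(url)
```

With this style you decide which documents you index into which core, and every query has to name the shards it wants. Solr Cloud takes both jobs off your hands.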

Related

HBase indexer for Solr

I'm trying to index data from an HBase table using the Lucidworks HBase indexer. I would like to know whether Solr, the HBase indexer, and HBase have to use the same ZooKeeper.
Can my Solr instance be independent, with HBase and the HBase indexer both reporting to zookeeper1 while Solr reports to its own ZooKeeper?
I'm following the URL below:
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html
Whether to use the same ZooKeeper or a separate, independent one is up to you.
For a production HBase setup, a three-node ZooKeeper ensemble is recommended, which means three ZooKeeper servers are required. We can use the same servers for Solr as well.
That helps reduce the number of servers.
ZooKeeper is a lightweight server that is used to monitor the Solr servers, so for production it is better to keep ZooKeeper outside the Solr server.
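If you do share one ensemble, one common way to keep Solr's data apart from HBase's is a chroot suffix on the ZooKeeper connect string that Solr receives as zkHost. Here is a small sketch of how such a string is put together; the host names and the `/solr` chroot are assumptions for the example, not values from your setup.

```python
def zk_connect_string(servers, chroot=""):
    """Join host:port pairs into a ZooKeeper connect string with an optional chroot."""
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    # The chroot is appended once, after the last server in the list.
    return ",".join(servers) + chroot


# Hypothetical three-node ensemble shared by HBase and Solr.
ensemble = ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"]
solr_zk_host = zk_connect_string(ensemble, "/solr")
print(solr_zk_host)
```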

Solr indexing on multi schemas (databases)

In our multi-tenant application we have multiple databases, one DB for each company. All users of one company access the same database. I have to implement Solr indexing. Can I do it with a single core and multiple shards, with one shard per company? Or do I need multiple cores, with one core created for each company? Basically, I read a table in the DB to fetch the file path from each record and then access the file system to read the file for indexing.
So, let's put it into an answer. From the way you describe the problem, I think you should create one core per company/database. It will make it easier later on to restrict the users of one company to their own data.
Second, on SolrCloud vs. a single instance: a lot depends on the size of the data and the desired performance. Quote from the Solr wiki:
Apache Solr includes the ability to set up a cluster of Solr servers
that combines fault tolerance and high availability. Called SolrCloud,
these capabilities provide distributed indexing and search
capabilities, supporting the following features:
Central configuration for the entire cluster
Automatic load balancing and fail-over for queries
ZooKeeper integration for cluster coordination and configuration.
So, if you need those things, and I assume you do, I would prefer SolrCloud over a single instance.
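To make the one-core-per-company idea concrete, here is a minimal sketch of how the application could route each request to the right core. The `company_<id>` naming scheme, the base URL, and both helper functions are assumptions for illustration only.

```python
SOLR_BASE = "http://localhost:8983/solr"  # assumed Solr address


def core_name(company_id):
    """Map a company ID to its dedicated core name (hypothetical naming scheme)."""
    company_id = str(company_id)
    # Accept only plain alphanumeric IDs so a crafted ID cannot point at another path.
    if not company_id.isalnum():
        raise ValueError(f"unexpected company id: {company_id!r}")
    return f"company_{company_id}"


def select_url(company_id):
    """Return the search endpoint for one company's core."""
    return f"{SOLR_BASE}/{core_name(company_id)}/select"


print(select_url(42))
```

Because the core is chosen from the logged-in user's company rather than from anything the user types, a query can never reach another company's documents.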

Solr and Zookeeper with a single node

I have Solr Cloud running on my local machine as a single node with the internal ZooKeeper, i.e. the ZooKeeper that ships with Solr and is used by it internally.
My question is: when I move Solr to the production environment, is it recommended to run ZooKeeper as an isolated/separate/external instance, or is it better to stay with the internal ZooKeeper instance that comes with Solr?
Using Solr's internal ZooKeeper is discouraged for production environments. This is even stated in the SolrCloud documentation:
Although Solr comes bundled with Apache ZooKeeper, you should consider yourself discouraged from using this internal ZooKeeper in production, because shutting down a redundant Solr instance will also shut down its ZooKeeper server, which might not be quite so redundant. Because a ZooKeeper ensemble must have a quorum of more than half its servers running at any given time, this can be a problem.
The solution to this problem is to set up an external ZooKeeper ensemble. You should create this ensemble on different machines, so that if any of the Solr machines goes down it will not affect ZooKeeper or the rest of the Solr instances. I know you are currently going with one Solr instance.
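The quorum rule quoted above ("more than half its servers running") is simple arithmetic, and it shows why ensembles are usually sized at three or five nodes. A small sketch (the function names are mine):

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that is more than half of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a quorum is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(f"{n} server(s): quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A single ZooKeeper server, or even two, tolerates no failures at all; three servers tolerate one failure and five tolerate two.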
As mentioned, for production it is not a good idea to use the internal ZooKeeper inside Solr, but for development it is entirely OK and very practical. For that you just need to add these lines to your /etc/default/solr.in.sh file:
SOLR_MODE=solrcloud
ZK_CREATE_CHROOT=true
As an alternative, you can also start Solr manually with the command $SOLR_HOME_DIR/bin/solr start -c
Tested with Apache Solr 9 on a Debian-based Linux.

How to setup Solr Replication with two search servers?

Hi, I'm developing a Rails project with Sunspot Solr and want to configure Solr replication.
My environment: rails 3.2.1, ruby 2.1.2, sunspot 2.1.0, Solr 4.1.6.
Why replication: I need a more stable system. The search server often goes down for maintenance, and then the web application stops working in production. So I am thinking about running two identical search servers instead of one to make the system more stable: if one server goes down, the other will continue working.
I cannot find a good tutorial that is simple, easy to understand, and described in detail...
I'm trying to set up replication on two servers, but I do not fully understand how replication works internally:
does it synchronize data between the two servers (is this automatic)?
does it balance search requests between the two servers?
when one server suddenly stops working, the other should become the master (is this automatic)?
are there replication features other than those listed?
The answer to this is similar to:
How to setup Solr Cloud with two search servers?
What is the difference between Solr Replication and Solr Cloud?
Can we close this as duplicate?

Remove Shards from SOLR Database

I recently recovered a Solr database that uses Solr Cloud to shard an index. I now have that database running on a single machine, but the data is still sharded, and this is now unnecessary.
How can I stop using Solr Cloud and merge these shards into a single collection?
I ended up using the Lucene Merge Index tool. The Solr approaches did not work for me (obtuse errors).
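For anyone trying the same route, here is a rough sketch of how the command line can be assembled. It assumes the tool meant here is `org.apache.lucene.misc.IndexMergeTool` from Lucene's misc module, which takes the merged index directory followed by the source index directories. The jar names and all paths are placeholders, not the ones I actually used.

```python
import shlex


def merge_command(classpath, merged_dir, shard_index_dirs):
    """Build the java command that merges several Lucene indexes into one directory."""
    if len(shard_index_dirs) < 2:
        raise ValueError("need at least two source indexes to merge")
    return [
        "java", "-cp", classpath,
        "org.apache.lucene.misc.IndexMergeTool",  # assumed tool class
        merged_dir,          # destination index directory comes first
        *shard_index_dirs,   # then every shard's index directory
    ]


# Placeholder jar names and paths; adjust to your Lucene version and data layout.
cmd = merge_command(
    "lucene-core.jar:lucene-misc.jar",
    "/data/merged/index",
    ["/data/shard1/data/index", "/data/shard2/data/index"],
)
print(shlex.join(cmd))
```

Stop Solr before merging, and use Lucene jars that match the version that wrote the indexes.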
