SolrCloud DIH implementation with zookeeper

I am migrating my old DataImportHandler configuration from Solr 4.3 to SolrCloud 5.0.
I have already deployed ZooKeeper on 3 virtual machines, and they all communicate with each other. I have read about nodes, collections, shards and replicas, but I cannot work out how to put my old DIH configurations into ZooKeeper. I currently have 5 different DIH configurations that I need to move to SolrCloud. Does that mean I have to create 5 nodes or 5 collections? I am confused here.
Thanks in advance!

No extra node is needed for the configuration. SolrCloud is built around collections: a collection is sharded across the nodes, and each shard can have replicas.
These are the steps for SolrCloud:
1) Run ZooKeeper
2) Run the Solr nodes with ZooKeeper
3) Upload the configuration to ZooKeeper
4) Create a collection that refers to the configuration
To upload the configuration to ZooKeeper and create the collection:
1) Create a solrlibs directory (/opt/solrlibs in the command below)
2) Copy /opt/solr/server/solr-webapp/webapp/WEB-INF/lib/* to it
3) Copy /opt/solr/server/lib/ext/* to it
4) Run the command:
java -classpath .:/opt/solrlibs/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181 -confdir /opt/solrconfigs/test/conf -confname testconf
5) Create the collection with the following request:
http://192.168.1.4:8080/solr/admin/collections?action=CREATE&name=test&collection.configName=testconf&numShards=2&replicationFactor=2
The values of numShards and replicationFactor depend on the number of nodes you have.
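The upload and create steps above can be sketched as a small dry-run script. It assumes, purely for illustration, that each of the five DIH configurations gets its own config set and its own collection; the names dih1 to dih5, the /opt/solrconfigs/<name>/conf layout and the helper function names are hypothetical. The script only prints the commands so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the upconfig command and the CREATE URL for each
# config set instead of executing them. All names below are placeholders.

ZK_HOSTS="192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181"
SOLR_NODE="192.168.1.4:8080"

# Print the ZkCLI command that uploads one config directory to ZooKeeper.
upconfig_cmd() {
  local name="$1"
  printf 'java -classpath ".:/opt/solrlibs/*" org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost %s -confdir /opt/solrconfigs/%s/conf -confname %sconf\n' \
    "$ZK_HOSTS" "$name" "$name"
}

# Print the Collections API URL that creates a collection from that config.
create_url() {
  local name="$1" shards="$2" replicas="$3"
  printf 'http://%s/solr/admin/collections?action=CREATE&name=%s&collection.configName=%sconf&numShards=%s&replicationFactor=%s\n' \
    "$SOLR_NODE" "$name" "$name" "$shards" "$replicas"
}

# Hypothetical config names, one per DIH configuration.
for name in dih1 dih2 dih3 dih4 dih5; do
  upconfig_cmd "$name"
  create_url "$name" 2 2
done
```

Calling create_url test 2 2 reproduces the CREATE request shown in the answer, which makes it easy to check the generated URLs against a known-good one.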

Related

SolrCloud Zookeeper bootstrap option | Solr 7

I am managing a SolrCloud cluster with 4 Solr nodes and 5 ZooKeeper instances. The Solr version is 4.4.
We are planning an upgrade to the latest version, Solr 7.
Solr node1 is started with bootstrap_conf=true. For any update to a schema or configset, we update the configset on node1 and restart it.
My issue is that I don't see that option in Solr 7. We have around 200 cores, most of them with an individual configset. I read that in Solr 7 ZooKeeper stores the configsets, but I don't see where ZooKeeper loads them from.
If I shut down my entire SolrCloud cluster (including ZooKeeper), how do I reload the configsets into ZooKeeper? Do I need to track which configset is used by each core? Or is there an option like bootstrap_conf in Solr 7 that loads the respective conf into ZooKeeper?

Hbase indexer for Solr

I'm trying to index data from an HBase table using the Lucidworks HBase indexer. I would like to know whether Solr, the HBase indexer and HBase have to use the same ZooKeeper.
Can my Solr instance be independent, with HBase and the HBase indexer both reporting to zookeeper1 while Solr reports to its own ZooKeeper?
I'm following the URL below:
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html

It is up to you whether to use the same ZooKeeper or a separate, independent one.
For a production HBase setup, a 3-node ZooKeeper ensemble is recommended, so 3 ZooKeeper servers are already required for that setup. You can use the same servers for Solr as well.
That helps reduce the number of servers.
ZooKeeper is a lightweight server that keeps track of the Solr servers, so for production it is better to run ZooKeeper outside the Solr servers.

Multiple Solr environments with one Zookeeper ensemble

We have two Solr environments in production.
One Solr environment holds the latest two years of data. The other holds the last 10 years of archived data.
At the moment, these two Solr environments connect to separate ZooKeeper ensembles.
The collections have the same names and configuration in both Solr environments.
We want to reduce the number of ZooKeeper servers.
Is it feasible to have both production Solr environments connect to one ZooKeeper ensemble without overwriting each other's configs?
Or is it mandatory to have a separate ZooKeeper ensemble for each Solr environment?

You can use the same Zookeeper ensemble to handle more than one Solr or SolrCloud instance.
However, the data must be kept separate. This is (probably) best done by using the "chroot" functionality in Zookeeper.
Essentially, when you create the "space" in Zookeeper for your Solr instance, you append a /some_thing_unique path and use that same path in the appropriate config files in Solr. Then you should have no trouble.
I haven't moved an existing Solr instance from one Zookeeper to another myself. I'd guess you would have to take Solr down, change the configs, set up the collection etc. in Zookeeper, and restart Solr. For sure I'd get that all worked out in a test environment before doing it live.
Hope that helps...
Oh, here's how I did it when creating a collection "new" in Zookeeper... You'll note I gave it a name (the name of my collection) as well as noting what version of Solr I was using. This allows me to install later versions of Solr and move my collection to that later version and keep it all in the same Zookeeper ensemble...
/opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 10.196.12.103,10.196.12.104,10.196.22.103 -cmd makepath /myCollectionName_solr6_2
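As a rough sketch of how two Solr environments could share one ensemble, the helper below builds a ZooKeeper connection string with a chroot suffix. The chroot names /solr_recent and /solr_archive and the function name are hypothetical, and port 2181 is assumed because the command above gives no ports. The point it illustrates is that the chroot is appended once, after the last host, not after every host.

```shell
#!/usr/bin/env bash
# Build a ZooKeeper connection string with a chroot suffix.
# Usage: zk_connect_string <chroot> <host:port>...
zk_connect_string() {
  chroot="$1"; shift
  # Join the remaining arguments with commas.
  hosts=$(IFS=","; printf '%s' "$*")
  # The chroot goes once at the very end of the host list.
  printf '%s%s\n' "$hosts" "$chroot"
}

# Two environments on one ensemble, kept apart by (hypothetical) chroots.
zk_connect_string /solr_recent  10.196.12.103:2181 10.196.12.104:2181 10.196.22.103:2181
zk_connect_string /solr_archive 10.196.12.103:2181 10.196.12.104:2181 10.196.22.103:2181
```

Each environment would then use its own string wherever it configures its ZooKeeper host, so identical collection names in the two environments end up under different paths.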

Solr and Zookeeper with a single node

I have SolrCloud running on my local machine as a single node with the internal ZooKeeper, i.e. the ZooKeeper that comes embedded in Solr.
My question is: when I move Solr to the production environment, is it recommended to run ZooKeeper as an isolated, external instance, or is it better to stay with the internal ZooKeeper that comes with Solr?

Using Solr's internal ZooKeeper is discouraged in production environments. This is even stated in the SolrCloud documentation:
Although Solr comes bundled with Apache ZooKeeper, you should consider yourself discouraged from using this internal ZooKeeper in production, because shutting down a redundant Solr instance will also shut down its ZooKeeper server, which might not be quite so redundant. Because a ZooKeeper ensemble must have a quorum of more than half its servers running at any given time, this can be a problem.
The solution to this problem is to set up an external ZooKeeper ensemble. You should create this ensemble on different machines, so that if any of the Solr machines goes down it does not affect ZooKeeper or the rest of the Solr instances. I know that you are currently going with one Solr instance.
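The quorum rule quoted above ("more than half its servers running") can be worked through with a little arithmetic. This is only an illustrative sketch; the function names are made up for the example.

```shell
#!/usr/bin/env bash
# Majority quorum for an ensemble of N servers: floor(N/2) + 1.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# Servers that can fail while a quorum still remains: N - quorum.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A single ZooKeeper, such as the one embedded in a lone Solr node, tolerates no failures, and a 2-server ensemble is no better; 3 servers is the smallest ensemble that survives the loss of one.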

As mentioned, running the internal ZooKeeper inside Solr is not a good idea for production, but for development it is entirely OK and very practical. For that you just need to add these lines to your /etc/default/solr.in.sh file:
SOLR_MODE=solrcloud
ZK_CREATE_CHROOT=true
As an alternative, you can also start Solr manually with the command $SOLR_HOME_DIR/bin/solr start -c
Tested with Apache Solr 9 on a Debian-based Linux.

Solr cloud sharding

Currently I have a ZooKeeper instance controlling replication across 3 servers. It is the ZooKeeper integrated into Solr. It works well in my web-based application.
I have a new requirement that will need sharding in the cloud, and I am not sure how to implement it. Basically I want to separate the data that only I can update (shard 1) from the data that users can update (shard 2). From time to time I will completely replace the data directory in shard 1, but I don't want to disturb the user-created data in shard 2.
Shard 1 does not need replication, since I can copy the new data to each server when I choose to update it; shard 2, however, does need replication.
Currently I run the following command on the server running ZooKeeper:
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
And the following command on the other 2 non-ZooKeeper servers:
java -Djetty-port=8983 -DzkHost=129.**.30.11:9983 -jar start.jar&
This creates a single-shard Solr instance on each of the 3 servers.
I think I just need to add 1 static shard to this configuration, but I am not sure of the sequence of commands to accomplish it.
Many thanks

Firstly, you are using ZooKeeper to maintain your shards and leaders/replicas. So if you want one shard with two instances and another shard with only a leader, you will have to modify your command as follows:
1) Provide -DnumShards=2 so that ZooKeeper knows you need two shards.
2) Specify the -DzkHost parameter for this first Solr instance as well.
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=2 -DzkHost=** -jar start.jar
When you do this you will see some errors on the console, since shard2 has not been created yet.
Now start your other two servers. You should see shard1 with two servers (leader and replica), while shard2 has only one instance, i.e. the leader.
If you want separate indexes and control over those indexes, you will have to create two collections instead of two shards.
Explanation
You have 3 servers, right? So when you start SolrCloud with ZooKeeper, the following happens:
1) Start the first Solr server along with ZooKeeper, and you get 1 shard for SolrCloud, shard1.
2) Start the second Solr server and point it to ZooKeeper. Since you have declared -DnumShards=2, ZooKeeper sees that 1 more shard is needed, so shard2 is created for your collection. By now the admin console will show 2 shards for 1 collection.
3) Now start your 3rd server and point it to ZooKeeper. ZooKeeper sees that both shards already exist, so the server becomes a replica of shard1 instead of a new shard.
So it will look like this:
collection--->shard1--->server1,server3
          --->shard2--->server2
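The placement described above can be modelled in a few lines. This is a simplified sketch of the behaviour as the answer describes it, not Solr's actual assignment code; the function name and the round-robin formula are assumptions chosen to reproduce the 3-server, 2-shard outcome.

```shell
#!/usr/bin/env bash
# Simplified model: servers join in order and are placed on shards
# round-robin. Once every shard has a leader, later servers become replicas.
# Usage: assign_shard <server_index starting at 1> <num_shards>
assign_shard() {
  server_index="$1"; num_shards="$2"
  echo "shard$(( (server_index - 1) % num_shards + 1 ))"
}

NUM_SHARDS=2
for i in 1 2 3; do
  echo "server$i -> $(assign_shard "$i" "$NUM_SHARDS")"
done
```

With 3 servers and 2 shards this places server1 and server3 on shard1 and server2 on shard2, matching the layout above.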
