Add shard replica in SolrCloud - solr

Every time I start a new node in the Solr cluster, a shard or a shard replica is assigned to it automatically.
How can I specify which shard or shards should be replicated on this new node?
I'm trying to get to a configuration with 3 shards and 6 servers (one for each shard leader and 3 for the replicas), where shard1 has 3 replicas, one on each of the replica servers, while shard2 and shard3 have only one.
How can this be achieved?

You can go to Core Admin in the SolrCloud web GUI, unload the core that was automatically assigned to that node, and then create a new core, specifying the collection and the shard you want it assigned to. After you create that core, the Cloud view should show that your node has been added to that specific shard and, after some time, that all documents of that shard have been synchronized to your node.
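On Solr versions that ship the Collections API, the same placement can usually be requested with the ADDREPLICA action instead of going through Core Admin. The sketch below only builds the request URLs; the base URL, collection name and node names are placeholders I made up for illustration, not values from the question.

```python
from urllib.parse import urlencode

# Hypothetical base URL; any live node can receive Collections API calls.
SOLR_URL = "http://localhost:8983/solr"


def add_replica_url(collection, shard, node, base_url=SOLR_URL):
    """Build a Collections API ADDREPLICA request URL for one shard/node pair."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,  # node name as listed under live_nodes, e.g. host:8983_solr
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical layout: shard1 gets a replica on each of the three replica servers.
replica_nodes = ["server4:8983_solr", "server5:8983_solr", "server6:8983_solr"]
urls = [add_replica_url("mycollection", "shard1", n) for n in replica_nodes]
for url in urls:
    print(url)
```

Each URL can then be requested with curl or a browser against a running cluster. Replicas for shard2 and shard3 would be added the same way, one call per replica.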

Related

SolrCloud on different machines

I have set up SolrCloud on two machines. I created a collection, collection1, and split it into two shards with 2 replicas. I then added my other Solr machine to the cloud, and in the Solr admin page under Cloud > Tree > live nodes I can see 4 live nodes, including the last Solr instance launched. However, my shards are all running on the same machine, just on different ports, and even the replica still shows the leader's address.
Now I want to move the replica to the newly launched Solr instance, or just put the entire shard 1 or 2 on the other machine.
I have tried searching about it, but nothing tells me the exact commands.
This question is rather old, but for the sake of completeness:
In the Solr UI, go to Collections
Select your collection
Click on the shards on the right side
Click add replica
Choose your new node as the target node
Wait for the replica to be ready (watch in Cloud > Graph)
Back in the shards list, delete the old replica
If the old replica was the leader, a leader election will be triggered automatically.
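The same move can also be scripted. Below is a minimal sketch that builds the two Collections API calls in the order of the UI steps above (ADDREPLICA, then DELETEREPLICA). The host, node name and replica name (`core_node2`) are hypothetical; the real replica name has to be read from the cluster state.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Return a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


def move_replica_steps(base_url, collection, shard, new_node, old_replica):
    """Return the two calls in order: add the new replica, then drop the old one."""
    add = collections_api_url(base_url, "ADDREPLICA",
                              collection=collection, shard=shard, node=new_node)
    delete = collections_api_url(base_url, "DELETEREPLICA",
                                 collection=collection, shard=shard,
                                 replica=old_replica)
    return [add, delete]


# Hypothetical values for illustration only.
steps = move_replica_steps("http://localhost:8983/solr", "collection1", "shard1",
                           "192.168.0.2:8983_solr", "core_node2")
for step in steps:
    print(step)
```

As in the UI procedure, the second call should only be sent once the new replica shows as active.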

How to index into SolrCloud with ZooKeeper so that data is replicated to all the nodes

Is there any way I can insert data into only one node or shard of Solr 5.x and have it replicated to all the other nodes linked to it via ZooKeeper?
Thanks,
Ravi
This is what Solr does by default when running in SolrCloud mode (which is when it's using ZooKeeper).
As long as you index to one of the nodes, the nodes will figure out where the document should go (which server hosts the right part of the collection) and which other servers it should be replicated to.
You control the number of copies through the replicationFactor setting when creating or changing a collection.
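To make the idea concrete, here is a toy model of that behaviour: a document is hashed to one shard and then copied to every replica of that shard. This is not Solr's actual router or hash function (CRC32 is used here purely for illustration), and the node names are made up.

```python
import zlib


def route(doc_id, shards):
    """Pick a shard for doc_id with a stable hash (CRC32, for illustration only)."""
    names = sorted(shards)
    return names[zlib.crc32(doc_id.encode("utf-8")) % len(names)]


def index(doc_id, shards, stores):
    """Forward the document to its shard and copy it to every replica there."""
    shard = route(doc_id, shards)
    for node in shards[shard]:
        stores.setdefault(node, set()).add(doc_id)
    return shard


# Hypothetical cluster: 2 shards, each with 2 copies (replicationFactor=2).
shards = {"shard1": ["node1", "node2"], "shard2": ["node3", "node4"]}
stores = {}
for doc_id in ["a", "b", "c", "d"]:
    index(doc_id, shards, stores)

for node in sorted(stores):
    print(node, sorted(stores[node]))
```

Whichever node receives the document, every replica of the owning shard ends up with the same set of documents.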

SolrCloud - Updates to schema or dataConfig

We have a SolrCloud managed by ZooKeeper. One concern that we have is with updating the schema or dataConfig on the fly. All the changes that we are planning to make are on the indexing server node in the SolrCloud. Once the changes to the schema or dataConfig are made, we do a full dataimport.
The concern is that the replication of the new indexes to the slave nodes in the cloud would not happen immediately, but only after the replication interval. Also, replication will happen at different times for the different slave nodes, which might cause inconsistent results.
For example:
The index replication interval is 5 mins.
Slave node A started at 10:00 => next index replication would be at 10:05.
Slave node B started at 10:03 => next index replication would be at 10:08.
If we make changes to the schema in the indexing server and re-index the results at 10:04, then the results of this change would be available on node A at 10:05, but in node B only at 10:08. Requests made to the SolrCloud between 10:05 and 10:08 would have inconsistent results depending on which slave node the request gets redirected to.
Please let me know if there is any way to make the results more consistent.
@Wish, what you are describing is not how SolrCloud behaves.
In SolrCloud, indexing requests are routed to the shard leaders, and each leader sends copies to all of its replicas.
At any point in time, if ZooKeeper identifies that a replica is not in sync with its leader, the replica is put into recovering mode. In this mode it will not serve any requests, including queries.
P.S.: In SolrCloud, configs are maintained in ZooKeeper and not at the node level.
I think you may be confusing SolrCloud with master/slave mode. Please confirm which setup you are using.

How to correctly configure SolrCloud replicas on a two-node / two shards cluster

I'm new to SolrCloud (and Solr).
I need your help understanding collection shard and replicas.
I have two SolrCloud instances running on two different servers.
I have a collection, mycol, with two shards. Each SolrCloud instance hosts one shard.
Because I'm running two nodes, I am thinking to add redundancy. I have some questions about it:
First Way:
Add one new core on each SolrCloud instance: assign it to mycol shard2 on the instance hosting mycol shard1, and to mycol shard1 on the instance hosting mycol shard2. The new cores will become replicas, and each node will hold the complete collection in case of a hardware failure.
Second way:
Add two SolrCloud instances on two more servers. They will become replicas automatically.
Third way:
Add two SolrCloud instances, one for each existing server. They will become replicas automatically.
I'm going crazy trying to understand which way is correct.
Can you help me?
Thank you
Regards
Giova
It's a bit hard to dissect what you are looking for based on your question; however, the standard practice is to deploy two or more SolrCloud nodes. Make sure they can talk to each other and to ZooKeeper. Once that is set up, you can configure your collections with the numShards and replicationFactor parameters. These parameters determine how many shards are created and how many replicas are created for each shard. Shards are used to break up the collection into smaller chunks; shards by themselves don't provide any redundancy. Shard replicas are exact copies of your shards, and these are what actually provide redundancy.
Once you send the collection-creation command to any of the nodes in the SolrCloud cluster, your collection will be created. The replicas are created on the second server to provide redundancy if the first one goes down. At this point, you should be able to query any replica, and SolrCloud will automatically route the query internally and return results.
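As a rough illustration of such a creation command, the sketch below builds a Collections API CREATE URL with numShards and replicationFactor. The base URL is a placeholder and the values (2 shards, 2 copies each, collection mycol from the question) are only an assumed example; on some older Solr releases a maxShardsPerNode parameter may also have to be raised when one node must host more than one core of the collection.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, **extra):
    """Build a Collections API CREATE URL; extra holds any additional parameters."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        **extra,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def total_cores(num_shards, replication_factor):
    """Each shard gets replication_factor copies, so cores = shards * copies."""
    return num_shards * replication_factor


# Hypothetical two-node cluster: 2 shards, 2 copies of each.
url = create_collection_url("http://localhost:8983/solr", "mycol", 2, 2)
print(url)
print("cores to place:", total_cores(2, 2))
```

With 2 shards and replicationFactor=2 there are 4 cores to place, so on two servers each one would hold a copy of both shards.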

Solr cloud sharding

Currently I have a zookeeper instance controlling replication on 3 servers. It is the solr integrated zookeeper. It works well in my web based application.
I have a new requirement which will require sharding in the cloud and I am not sure how to implement it. Basically I want to separate the data which can only be updated by me, shard 1, from the data that users can update, shard 2. From time to time I will be completely replacing the data directory in shard 1 - but I don't want to disturb the user created data in shard 2.
Shard 1 does not need replication, since I can copy the new data to each server when I choose to update it; shard 2, however, does need replication.
Currently I run the following command on the server running zookeeper -
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
And the following command on the other 2 non zookeeper servers
java -Djetty-port=8983 -DzkHost=129.**.30.11:9983 -jar start.jar&
This creates a single-shard Solr instance on each of the 3 servers.
I think I just need to add 1 static shard to this configuration; however, I am not sure of the sequence of commands to accomplish it.
Many thanks
Firstly, you are using ZooKeeper to maintain your shards and leaders/replicas. So if you want one shard with two instances and another shard with only a leader, you will have to modify your command as follows:
1) Provide -DnumShards=2 so that ZooKeeper knows that you need two shards.
2) Specify the -DzkHost parameter for this first Solr instance as well.
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=2 -DzkHost=** -jar start.jar
When you do this you will see some errors on the console, since shard2 has not been created yet.
Now start your other two servers; you should see shard1 with two servers (leader and replica) and shard2 with only one instance, i.e. the leader.
If you want separate indexes and independent control over them, you will have to create two collections instead of two shards.
Explanation
You have 3 servers, so when you start SolrCloud with ZooKeeper, the following happens:
1) Start the first Solr server along with ZooKeeper, and you get one shard for the collection, shard1.
2) Start the second Solr server and point it to ZooKeeper. Since you have declared -DnumShards=2, ZooKeeper sees that one more shard is needed, so shard2 is created for your collection. At this point the admin console shows 2 shards for 1 collection.
3) Now start your third server and point it to ZooKeeper. Both shards already exist, so the node becomes a replica of shard1 instead of a new shard.
So the layout will be:
collection--->shard1--->server1,server3
          --->shard2--->server2
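The assignment order described above can be mimicked with a few lines of code. This is only a toy model of the behaviour as explained in this answer (fill each shard once in start order, then hand out replicas round-robin), not Solr's actual assignment logic.

```python
def assign_nodes(nodes, num_shards):
    """Assign nodes in start order: fill each shard once, then add replicas round-robin."""
    layout = {f"shard{i + 1}": [] for i in range(num_shards)}
    for index, node in enumerate(nodes):
        shard = f"shard{index % num_shards + 1}"
        layout[shard].append(node)
    return layout


# Three servers started one after another, with -DnumShards=2.
layout = assign_nodes(["server1", "server2", "server3"], num_shards=2)
for shard, members in layout.items():
    print(shard, "->", ",".join(members))
```

For three servers and two shards this gives shard1 on server1 and server3, and shard2 on server2 only, matching the layout shown above.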

Resources