I have set up a Solr 7.4 cluster with 3 nodes, and one collection with 5 shards and 3 replicas.
I added a collection called posts (with 5 shards and 3 replicas), and by default, the leader of all its shards is 196.209.182.40
Is it appropriate that each shard has a different node as a leader?
For example:
Why does Solr choose the same node as the leader for all shards?
Since shards can be located on completely different servers (and usually are), rather than all living on the same set of three nodes as in your example, yes, each shard can have a different leader.
The election process is described in Shards and indexing in SolrCloud.
In SolrCloud there are no masters or slaves. Instead, every shard consists of at least one physical replica, exactly one of which is a leader. Leaders are automatically elected, initially on a first-come-first-served basis, and then based on the ZooKeeper process described at https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection.
Referenced from the URL above:
A simple way of doing leader election with ZooKeeper is to use the SEQUENCE|EPHEMERAL flags when creating znodes that represent "proposals" of clients. The idea is to have a znode, say "/election", such that each client creates a child znode "/election/n_" with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper automatically appends a sequence number that is greater than any one previously appended to a child of "/election". The process that created the znode with the smallest appended sequence number is the leader.
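To make the recipe concrete, here is a minimal sketch using the ZooKeeper CLI (the /election path and empty data are just placeholders from the quoted recipe; Solr's actual election code does this through its own ZooKeeper client rather than zkCli.sh):

# Inside a zkCli.sh session (one session per candidate; ephemeral znodes
# live only as long as the session that created them):
create /election ""
create -s -e /election/n_ ""
# ZooKeeper appends a monotonically increasing sequence number, so successive
# candidates end up with children like n_0000000000, n_0000000001, ...
ls /election
# The session holding the child with the lowest sequence number is the leader;
# if it dies, its ephemeral znode disappears and the next-lowest takes over.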
In your case the same node was the first to respond for every shard (and was possibly the node you submitted the create request to), and thus was elected the initial leader each time.
Related
I have a six-node Solr cluster and every node has 200 GB of storage. We created one collection with two shards.
I would like to know what will happen if my data reaches 400 GB (node1: 200 GB, node2: 200 GB). Will Solr automatically use another free node from my cluster?
What will happen if my data reaches 400 GB (node1: 200 GB, node2: 200 GB)?
Ans: I am not sure exactly what error you will get, but in production you should try not to reach this situation. To avoid/handle such scenarios there are monitoring and autoscaling trigger APIs.
Will Solr automatically use another free node from my cluster?
Ans: No, extra shards will not be added automatically. However, whenever you observe that search is getting slow, or that Solr is hitting the physical limits of your machines, you should go for SPLITSHARD (see the sketch below).
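For reference, a split can be kicked off manually with the Collections API; a minimal sketch (the host, collection and shard names, and the async request id, are placeholders):

# Split shard1 of collection "mycollection" into two sub-shards
# (shard1_0 and shard1_1); the parent shard1 becomes inactive once the
# split completes and the sub-shards are active.
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split-1'

# Check the progress of the async request:
curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split-1'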
So ultimately you can handle this with autoscaling triggers. That is, you can set autoscaling triggers that detect when a shard crosses specified limits on the number of documents or the size of the index, and once a limit is reached the trigger can call SPLITSHARD.
This link mentions:
This trigger can be used for monitoring the size of collection shards, measured either by the number of documents in a shard or the physical size of the shard's index in bytes.
When either of the upper thresholds is exceeded the trigger will generate an event with a (configurable) requested operation to perform on the offending shards - by default this is a SPLITSHARD operation.
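As an illustration, such a trigger could be registered through the autoscaling API along these lines; this is only a sketch (the trigger name, collection and thresholds are placeholders, and the exact endpoint and supported parameters depend on your Solr version - the indexSize trigger exists in recent 7.x releases):

# Register an index-size trigger that requests SPLITSHARD (the default
# operation) when a shard exceeds either threshold.
curl -X POST 'http://localhost:8983/solr/admin/autoscaling' \
     -H 'Content-Type: application/json' \
     -d '{
  "set-trigger": {
    "name": "index_size_trigger",
    "event": "indexSize",
    "collections": "mycollection",
    "aboveDocs": 50000000,
    "aboveBytes": 50000000000,
    "waitFor": "1m",
    "enabled": true
  }
}'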
I have a 3-node Cassandra cluster with RF=3. When I run nodetool status, the "Owns" value for each node in the cluster is 100%.
But when I have 5 nodes in the cluster with RF=3, the "Owns" value is approximately 60% for each node (as shown in the image below).
Now, as per my understanding, the partitioner will calculate the hash to pick the first replica node, and the data will also be replicated to other nodes as per the RF.
Now we have a 5-node cluster and RF is 3.
Shouldn't 3 nodes own all the data (i.e. 100% each), since the partitioner will pick one node as per the partitioning strategy and the same data will then be replicated to RF-1 other nodes? Instead, it looks like the data is distributed evenly among all 5 nodes even though RF is 3.
Edit1:
As per my understanding, the reason each node owns approximately 60% is that the RF is 3. That means there are 3 replicas of each row, i.e. 300% of the data in total. With 5 nodes in the cluster, the partitioner uses the default random hashing algorithm, which distributes the data evenly across all the nodes in the cluster.
But the issue is that we checked all the nodes of our cluster, and every node contains all the data even though the RF is 3.
Edit2:
@Aaron I did as specified in the comment. I created a new cluster with 3 nodes.
I created a keyspace "test" and set the class to SimpleStrategy and the RF to 2.
Then I created a table "emp" with partition key (id, name).
Now I inserted a single row through the first node.
As per your explanation, it should only be on 2 nodes since RF=2.
But when I logged into all 3 nodes, I could see the row on every node.
I think that since the keyspace is replicated to all the nodes, the data is also getting replicated.
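For reference, the steps from Edit2 look roughly like this in cqlsh (the extra column and the inserted values are made up purely for illustration):

cqlsh <<'EOF'
-- keyspace with SimpleStrategy and RF=2, as described in Edit2
CREATE KEYSPACE IF NOT EXISTS test
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

-- table with the composite partition key (id, name)
CREATE TABLE IF NOT EXISTS test.emp (
  id   int,
  name text,
  city text,                      -- illustrative extra column
  PRIMARY KEY ((id, name))
);

-- a single row, inserted through whichever node cqlsh is connected to
INSERT INTO test.emp (id, name, city) VALUES (1, 'alice', 'pune');
EOF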
Percent ownership is not affected (at all) by actual data being present. You could add a new node to a single node cluster (RF=1) and it would instantly say 50% on each.
Percent ownership is purely about the percentage of token ranges which a node is responsible for. When a node is added, the token ranges are recalculated, but data doesn't actually move until a streaming event happens. Likewise, data isn't actually removed from its original node until cleanup.
For example, if you have a 3-node cluster with an RF of 3, each node will be at 100%. Add one node (with RF=3), and percent ownership drops to about 75%. Add a 5th node (again, keep RF=3) and ownership for each node correctly drops to about 3/5, or 60%. Again, with an RF of 3 it's all about each node being responsible for a set of primary, secondary, and tertiary token ranges.
the default random hashing algorithm which will distribute the data evenly across all the nodes in the cluster.
Actually, the distributed hash with Murmur3 partitioner will evenly distribute the token ranges, not the data. That's an important distinction. If you wrote all of your data to a single partition, I guarantee that you would not get even distribution of data.
The data that was replicated to other nodes isn't cleaned up automatically when you add new ones - you need to run nodetool cleanup on the "old" nodes after you add the new node to the cluster. This will remove the ranges that were moved to other nodes.
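A minimal sketch of that step (run it on each pre-existing node once the new node has finished joining; the keyspace name is a placeholder):

# Drop data for token ranges this node no longer owns (all keyspaces):
nodetool cleanup

# Or restrict it to a single keyspace:
nodetool cleanup my_keyspace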
Brief overview of the setup:
5 x SolrCloud (Solr 4.6.1) node instances (separate machines).
The setup is intended to store the last 48 hours of webapp logs (which are pretty intense... ~3 MB/sec).
The "logs" collection has 5 shards (one per node instance).
One log line is one document in the "logs" collection.
If I keep storing log documents in this "logs" collection, the cores backing the shards get really big, and CPU graphs show the instances spending more and more time waiting for disk I/O.
So my idea is to create a new collection every 15 minutes, named e.g. "logs-201402051400", with its shards spread across the 5 instances. Document writers will start writing to the new collection as soon as it is created. After a while I will have a list of collections like this:
...
logs-201402051400
logs-201402051415
logs-201402051430
logs-201402051445
logs-201402051500
...
Since there will be at most 192 collections (~1000 cores) in the SolrCloud at any given time, it seems that search performance will degrade drastically.
So I would like to merge the collections that are not currently being written to into one large collection (still sharded across the 5 instances). I have found information on how to merge cores, but how can I merge collections?
This might NOT be a complete answer to your query - but something tells me that you need to redo the design of your collection.
This is the classic trade-off between a single collection with multiple shards and multiple collections.
I think you ought to set up a single collection and then use SolrCloud's dynamic sharding capability (the implicit router) to add new shards for newer 15-minute intervals and delete the shards for older intervals.
Managing a single collection means you will have a single endpoint, and it saves you from the complexity of querying multiple collections.
Take a look at one of the answers at the link below, which talks about using the implicit router for dynamic sharding in SolrCloud.
How to add shards dynamically to collection in solr?
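As a rough sketch of that approach with the Collections API (the host, the "timeslot" routing field, and the maxShardsPerNode value are placeholders; the collection and shard names follow your example, and you should check the reference guide for the options available in your Solr version):

# Create one collection that uses the implicit router; documents are routed
# to the shard named in their "timeslot" field (or via the _route_ parameter).
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=logs&router.name=implicit&router.field=timeslot&shards=logs-201402051400,logs-201402051415&maxShardsPerNode=5'

# Every 15 minutes, add a shard for the new interval...
curl 'http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=logs&shard=logs-201402051430'

# ...and delete the shard that has aged out of the 48-hour window.
curl 'http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=logs&shard=logs-201402051400'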
I've set up a SolrCloud cluster with 3 shards. Each shard consists of 2 nodes: one is the leader and the other is a replica. Each Solr instance (node) runs on a separate machine. Now I need to add more machines as my data volume increases. But if I add a new node without creating a new shard, it will simply add more replicas of the existing shards. I want to create more shards on the new machines, with the data distributed among the shards.
For testing purposes, I created a SolrCloud with one shard (2 nodes). I tried SPLITSHARD with Solr 4.5.1. Afterwards, the admin UI shows a total of 3 shards (shard1, shard1_0 and shard1_1) and a total of 6 nodes.
In the background, it created the following folders under each node:
node1 :
solr/collection1
solr/collection1_shard1_0_replica1
solr/collection1_shard1_1_replica1
node2 :
solr/collection1
solr/collection1_shard1_0_replica2
solr/collection1_shard1_1_replica2
That means it created 2 new cores under each instance. But I want to run a single core on each machine.
We have run into the same problem. The only solution I can see for the current version of Solr is to add replicas on the new machines, wait for replication to finish, and then delete the original replicas (sketched below).
In addition, if you split only one shard in the collection, the cluster will not be uniformly distributed, so you have to split every shard by the same factor.
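A sketch of that move-a-replica workflow with the Collections API (the host, new node name and replica/core-node name are placeholders, and the collection/shard names follow your example; ADDREPLICA/DELETEREPLICA may not exist in very old 4.x releases, so check your version's reference guide):

# 1. Add a replica of shard1 on the new machine.
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=newhost:8983_solr'

# 2. Wait until the new replica is reported as "active".
curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection1'

# 3. Remove the replica on the old machine.
curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=collection1&shard=shard1&replica=core_node1'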
Once you set the numShards property when creating a collection, what you intend becomes impossible. Other answers only describe splitting the original shards into more shards, but the data won't be distributed evenly. For example, suppose the data starts in 2 shards, S1 and S2. If you split S1, you end up with S11, S12 and S2, where S2 holds much more data than S11 or S12. What you want, I think, is for the data in S1 and S2 to be re-cut evenly across S11, S12 and S2, each running on a different node on a different machine. That's NOT possible in current Solr (even v6), AFAIK.
What you want is also what I and many other SolrCloud users want, and I think it's a very reasonable expectation. Let's hope a future version of SolrCloud provides this functionality.
I have a problem creating shards with ZooKeeper.
When I create a collection and its shards, I have to specify the maximum number of shards the collection is going to have. If I start a new Solr server, ZooKeeper considers it a new shard. This happens up to the maximum number of shards I specified during collection creation.
My requirement is that, without specifying a maximum number of shards, ZooKeeper should treat any newly started Solr instance as a new shard of the corresponding collection.
How can I achieve this?