I have configured Solr 8.4 on six systems (3+3) across two data centers, using an external ZooKeeper ensemble in each region.
I configured solrconfig.xml for a test collection as per the Solr manual, and followed the instructions on the Solr website to start SolrCloud in sequence.
When I insert a single record (document) on the primary, it doesn't get replicated; replication happens only after I restart SolrCloud on all servers. This is a new collection I defined. I have added records manually through the UI and have not indexed the collection.
Do I have to restart Solr every time I update? Why do records get updated only while restarting?
Please let me know if you have come across this.
Note: I didn't run an index; it was an empty collection using the default configset. I added the record from the UI in the Documents section.
We are currently upgrading our Alfresco 5.x to Alfresco 6.2, but we are having trouble with our integration tests, especially the ones that create and search for nodes.
The integration tests were using the NO INDEX Solr, which makes created nodes searchable immediately, but now, with a separate Solr instance in Alfresco 6, we have to wait until Solr has indexed the new nodes.
Correct me if I'm wrong, but as far as I know Alfresco only triggers Solr indexing for committed transactions, which means either
a #Test has to succeed in order to commit a transaction, or
I have to begin a new transaction with the RetryingTransactionHelper in which I create new nodes
Using the RetryingTransactionHelper works, but afterwards I have to wait until the new data has been indexed.
(tl;dr) How do I check whether new nodes have been indexed in Solr, so that I can use the org.alfresco.service.cmr.search.SearchService in my integration tests?
I have the External File Field configured and working on a non-cloud Solr setup. Now I need to apply the same to a SolrCloud setup. I have 3 shards and a replication factor of 3.
The EFF file needs to go into the data directory of the Solr index. How do I upload/update the EFF file, given that I have 3 shards across 3 Solr servers?
Can ZooKeeper be used to maintain these files too?
The issue is that updating these files manually means going to each shard/replica and updating them by hand.
Any guidance anyone could provide about EFF and SolrCloud would be appreciated.
Thanks,
Brijesh
Is there any way I can insert data into only one node or shard of Solr 5.x and have it replicated to all the other nodes linked to it via ZooKeeper?
Thanks,
Ravi
This is what Solr does by default when running in SolrCloud mode (which is when it's using ZooKeeper).
As long as you index to one of the nodes, the nodes will figure out where the document should go (which server hosts its shard) and which other servers it should be replicated to.
You control this when creating or modifying a collection, through the replicationFactor setting.
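For example (a sketch with hypothetical host and collection names), you could create a collection with replicationFactor=3 via the Collections API and then index through any node; the cluster forwards each document to its shard leader and replicas:

# create a collection replicated to 3 nodes (names and counts are illustrative)
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=3'
# send a document to any node; SolrCloud routes and replicates it
curl 'http://localhost:8983/solr/mycollection/update?commit=true' -H 'Content-Type: application/json' -d '[{"id":"1"}]'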
Currently I have a ZooKeeper instance controlling replication on 3 servers. It is the Solr-embedded ZooKeeper. It works well in my web-based application.
I have a new requirement which will require sharding in the cloud, and I am not sure how to implement it. Basically I want to separate the data which can only be updated by me, shard 1, from the data that users can update, shard 2. From time to time I will be completely replacing the data directory in shard 1 - but I don't want to disturb the user-created data in shard 2.
Shard 1 does not need replication, since I can copy the new data to each server when I choose to update it; however, shard 2 does need replication.
Currently I run the following command on the server running ZooKeeper:
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
And the following command on the other two non-ZooKeeper servers:
java -Djetty.port=8983 -DzkHost=129.**.30.11:9983 -jar start.jar &
This creates a single-shard Solr collection across the 3 instances.
I think I just need to add one static shard to this configuration; however, I am not sure of the sequence of commands to accomplish it.
Many thanks
Firstly, you are using ZooKeeper to maintain your shards and leaders/replicas. So if you want to have one shard with two instances and another shard with only a leader, you will have to modify your command as follows:
1) Provide -DnumShards=2 so that ZooKeeper knows you need two shards.
2) Specify the -DzkHost parameter for this first Solr instance as well.
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=2 -DzkHost=** -jar start.jar
When you do this you will see some errors on the console, since shard2 has not been created yet.
Now start your other two servers, and you should see shard1 with two servers (leader and replica), while shard2 will have only one instance, i.e. the leader.
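For example, each of the other two servers can be started the same way as in the question (the ZooKeeper host is elided above, so a placeholder is used here):

java -Djetty.port=8983 -DzkHost=<zookeeper-host>:9983 -jar start.jar &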
If you want separation of indexes and control over those indexes, you will have to create two collections instead of two shards.
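For instance (a sketch with hypothetical collection names), the Collections API can create the two collections against the same config, so each gets its own index that you can rebuild or replicate independently:

# un-replicated collection you rebuild yourself (names and values are illustrative)
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=myData&numShards=1&replicationFactor=1&collection.configName=myConfig'
# replicated collection holding the user-created data
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=userData&numShards=1&replicationFactor=2&collection.configName=myConfig'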
Explanation
You have 3 servers, right? So when you start SolrCloud using ZooKeeper, the following things will happen:
1) Start the first Solr server along with ZooKeeper, and you will get one shard for the SolrCloud cluster, shard1.
2) Start the second Solr server and point it to ZooKeeper. Since you have declared -DnumShards=2, ZooKeeper will see that it needs to create one more shard, so it creates shard2 for your collection. By now you will be able to see two shards for one collection in your admin console.
3) Now start your third server and point it to ZooKeeper. ZooKeeper now sees that both shards exist, so it will create a replica for shard1 instead of a new shard.
So it will be like:
collection ---> shard1 ---> server1, server3
           ---> shard2 ---> server2
We are using SolrCloud (4.3) for indexing data. We have 2 shards with 2 replicas each in SolrCloud.
We tried executing the query on each individual shard, and it shows correct results.
When we execute the same query (*:*) from the Solr Admin Console, it displays inconsistent results (the number of records found is different each time).
What could be wrong? How can we troubleshoot it?
How is a query executed across the different shards/replicas, and how are the results combined? Is there any document which explains the details of this?
I believe you have to make sure that Solr is doing soft commits, so that new documents become visible on the other replicas. This needs to be set to the frequency with which you need the data to stay "current":
solr.autoSoftCommit.maxDocs=<max number of uncommitted documents before soft commit>
solr.autoSoftCommit.maxTime=<max time in ms before soft commit>
http://wiki.apache.org/solr/SolrConfigXml
SOLR autoCommit vs autoSoftCommit
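In solrconfig.xml the corresponding settings look roughly like this (a sketch; the timing values are illustrative and can be overridden via the properties above):

<!-- hard commit: flushes documents to stable storage without opening a new searcher -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- soft commit: opens a new searcher so recent documents become visible to queries -->
<autoSoftCommit>
  <maxDocs>${solr.autoSoftCommit.maxDocs:-1}</maxDocs>
  <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>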
Do a commit operation on SolrCloud after you index your data, then refresh your results. Once or twice it might show different results, but after that it should be pretty consistent.
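A minimal sketch of such an explicit commit (hypothetical host and collection name):

curl 'http://localhost:8983/solr/mycollection/update?commit=true'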