I am using Solr v7.7.1 in cloud mode. I am facing an issue related to optimistic concurrency:
I have a nested document that can be updated concurrently multiple times before the updates are committed. During indexing, we fetch the document we want to modify along with its _version_, modify it, and then send it back to Solr with the same _version_. If the document is updated more than once before a commit, the following error is thrown:
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://1.2.3.4:8983/solr/mcollection_shard1_replica_n2:
version conflict for 1111 expected=1645085633861910528 actual=1645090791527284737
In the above error, we are basically trying to index a document with id 1111 before a previous version of that document has been indexed and committed. The solution to this problem should be to simply commit all pending updates and then try indexing the new document again. However, Solr gives the same error with the same version codes even after committing. What could possibly be the issue?
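For reference, our update flow is roughly the following (a minimal SolrJ sketch, not our exact code; the client setup and the title_s field are illustrative):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticUpdate {
    public static void main(String[] args) throws SolrServerException, IOException {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://1.2.3.4:8983/solr").build();

        // Real-time get: also sees uncommitted data from the update log.
        SolrDocument current = client.getById("mcollection", "1111");
        Long version = (Long) current.getFieldValue("_version_");

        SolrInputDocument update = new SolrInputDocument();
        update.addField("id", "1111");
        update.addField("title_s", "updated title"); // illustrative field
        update.addField("_version_", version);       // optimistic concurrency check

        try {
            client.add("mcollection", update);
        } catch (SolrException e) {
            // HTTP 409 version conflict: another update won; re-fetch and retry.
        }
        client.close();
    }
}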
A strange observation is that this problem does not occur when Solr is not running in cloud mode. It seems to be a very specific issue with Solr when nested documents are used.
While indexing a document, when _version_ is specified, Solr checks the version of the latest existing document by doing a real-time get. A real-time get reads from the update logs, which means that data not yet opened for searching is also visible. For this, Solr does something like the following:
http://1.2.3.4:8983/solr/mcollection/get?id=1111
Now if you have two nested documents where, in one document (doc1), the parent has id=1111 and, in the other document (doc2), a child has id=1111, then it is possible that Solr checks the version of doc2 when you intended to index doc1. This seems to be because Solr still indexes all documents in a flat structure and does not consider the parent-child relationship when doing the real-time get.
The solution is to make the ids of parent and child documents distinct from each other, for example:
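Here is a sketch of the workaround (reusing the illustrative client from the snippet above; the id scheme and field names are made up):

SolrInputDocument parent = new SolrInputDocument();
parent.addField("id", "parent-1111");
parent.addField("type_s", "parent");

SolrInputDocument child = new SolrInputDocument();
child.addField("id", "parent-1111-child-1"); // can never collide with a parent id
child.addField("type_s", "child");

parent.addChildDocument(child); // indexed as a nested (block) document
client.add("mcollection", parent);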
The bug has been reported: https://issues.apache.org/jira/browse/SOLR-13785
I am using Solr with Sitecore in a production environment and am seeing a lot of errors in the Solr log, although the sites are working fine. I have 32 Solr cores and am running Solr 4.10.3.0 with Sitecore 8.1 update 2. Below is a sample of these errors; can anyone explain them to me?
Most of the errors are self-descriptive, like this one:
undefined field: "Reckless"
tells you that the field in question is not defined in the Solr schema. Try to analyze the queries your system is accepting and track down what is sending them.
The less obvious one:
Overlapping onDeckSearchers=2
is a warning about warming searchers, in this case 2 of them running concurrently. It means that there were commits to the Solr index in quick succession, each of which triggered a warming searcher. The reason this is wasteful is that even though the first searcher has warmed up and is ready to serve queries, it will be thrown away as soon as the new searcher finishes warming and takes over.
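If you see this warning regularly, the usual fix is to commit less often rather than to raise the searcher limit. A sketch of the relevant solrconfig.xml settings (the values are illustrative, not a recommendation):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit at most once a minute -->
  <openSearcher>false</openSearcher>  <!-- don't open a searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>10000</maxTime>            <!-- new data becomes visible every 10s -->
</autoSoftCommit>
<maxWarmingSearchers>2</maxWarmingSearchers>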
Is there a way we can add documents into a specific shard?
For example, documents type A will always get inserted into shard1 and document type B always go to shard2.
I have tried using a custom router, but it does not guarantee that different prefixes will route to different shards.
PS. I am on Solr 5 using cloud mode.
A caveat: I'm using SolrNet to access SolrCloud, and it doesn't integrate with ZooKeeper yet. For Java clients, this might be far easier.
Despite what I read here and here with regard to the CompositeId Router, I could never get it to work. What #jay helped me figure out is a way to use "implicit" routing to achieve this. If you create your collection like this (leave out the numShards parameter):
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCol&maxShardsPerNode=2&router.name=implicit&shards=shard1,shard2&router.field=shard
and then add a field to your schema.xml named "shard" (matching the router.field parameter), you can index to a specific shard simply by adding the shard field to the document being indexed and setting it to the shard name. At query time, you can specify the shards to search -- more here (I was able to simply specify the shard name without a specific address).
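For example (a sketch using the collection created above; the type field and values are illustrative), a document destined for shard1 would simply carry the routing field:

{ "id": "A-1", "type": "A", "shard": "shard1" }

and a query restricted to that shard would look like:

http://localhost:8983/solr/myCol/select?q=type:A&shards=shard1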
I haven't tested this in production yet, but have verified using multiple VirtualBox instances, with ZooKeeper, HAProxy, and several Solr nodes, and it's doing exactly what I expected. Corrections and comments welcome.
I have the following problem with Solr 4.5.1, with a cloud install with 4 shards:
I have updated a document via the Solr console (select a core, then select "Documents"). I used the CSV format to upload the document, including the document ID.
When I query the document id from the Solr console (simple query: id:"the-id-of-the-doc-I-updated"), I alternately obtain the old document (with the values before the update and one version number) or the new document (with the values after the update and a different version number).
No log messages in the Solr console.
Any idea what might be going on, and how to fix that problem?
Thanks in advance,
Yann
This seems to be due to a bug in Solr: the Solr console does not handle document routing properly, so an update made from the console can land on a different shard than the original document, leaving two copies with the same id that are returned alternately. Deleting the documents (via a delete query) and then re-adding them from the console fixed the problem.
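Expressed with SolrJ instead of the console, the workaround amounts to something like this (a sketch; assumes a SolrJ client named client pointed at the collection, and an illustrative id):

// Remove every copy of the document, wherever routing put it...
client.deleteByQuery("id:\"the-id-of-the-doc-I-updated\"");
client.commit();

// ...then re-add it once so only the correctly routed copy exists.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "the-id-of-the-doc-I-updated");
// add the remaining fields here
client.add(doc);
client.commit();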
I have a question regarding Solr document updates. For example, when two requests to update the same document arrive in Solr at the same time, how does Solr behave?
Does it pick one request and lock writes until it is done, before the next request is processed?
Thanks in Advance
There are different locking mechanisms, as described in the Lucene locking factory docs. By default NativeFSLockFactory is used: an OS-native file lock is held on the index directory (the write.lock file) while a writer is open, so only one process can modify the index at a time. Note that this lock protects the index as a whole, not individual documents. The locking mechanism can be changed in solrconfig.xml.
Here is a snippet from solrconfig.xml:
<!-- LockFactory
This option specifies which Lucene LockFactory implementation
to use.
single = SingleInstanceLockFactory - suggested for a
read-only index or when there is no possibility of
another process trying to modify the index.
native = NativeFSLockFactory - uses OS native file locking.
Do not use when multiple solr webapps in the same
JVM are attempting to share a single index.
simple = SimpleFSLockFactory - uses a plain file for locking
Defaults: 'native' is default for Solr3.6 and later, otherwise
'simple' is the default
More details on the nuances of each LockFactory...
http://wiki.apache.org/lucene-java/AvailableLockFactories
-->
<lockType>${solr.lock.type:native}</lockType>
Are you talking about physical locks or logical version control? For logical version control, Solr 4+ supports optimistic concurrency using the _version_ field.
You can read about it:
Official documentation
Detailed writeup
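For illustration, the value you send in _version_ controls the check Solr applies: a value greater than 1 must match the stored version exactly, 1 means the document must already exist, a negative value means it must not exist, and 0 disables the check. A sketch (assumes a SolrJ client named client pointed at your collection):

// Insert-only semantics: fails with HTTP 409 if a document with this id exists.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "42");
doc.addField("_version_", -1);
client.add(doc);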
I have Solr 3.6 powering search on a WordPress site I maintain, and this morning I saw that Solr could not execute a data import. I was attempting to run http://example.com:9393/solr/wordpress/dataimport?command=full-import. Whereas until today the import would chug happily along, now I am getting only the message "Indexing failed. Rolled back all changes."
I'm probably missing something obvious, but where does Solr keep the data import logs? I would like to check them out to see what the problem is, but I have not been able to find the right logs.
Solr does not have an exclusive log file for data import; log statements related to the data-import process are written to the standard log file that Solr writes to. If you are using Tomcat, that should be ../logs/catalina.out .
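If the standard log does not show enough detail, you can usually raise the log level for the DataImportHandler package. A sketch, assuming your Solr logs through log4j (adjust to whatever logging setup your container actually uses), added to log4j.properties:

log4j.logger.org.apache.solr.handler.dataimport=DEBUG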
The error could be caused by any number of problems between Solr, the data source, and perhaps the data itself. You might want to check the following questions as well:
Indexing failed. Rolled back all changes. (Solr DataImport)
solr dataimport error: Indexing failed. Rolled back all changes