Just wondering: if I index a field in Mongoid, is there a special query form I should use to speed up queries on that index, or does Class.where(index: value) use the index automatically?
I quote the creator of the Mongoid ODM from the following bug report in GitHub https://github.com/mongoid/mongoid/issues/1276
If you have fields that are indexed then it's determined on the
database side if the index is to be used - there's nothing special on
the Mongoid side of things when using criteria to provide index hints.
Please remember though if you created the index in Mongoid to run rake
db:create_indexes to ensure it actually got created in the db.
I am using Solr v7.7.1 in cloud mode. I am facing an issue related to optimistic concurrency:
I have a nested document which can be updated concurrently multiple times before the updates are committed. During indexing, we fetch the document we want to modify along with its _version_, modify it, and then send it to Solr with the same _version_. If the update happens more than once before committing, the following error is thrown:
Caused by:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at
http://1.2.3.4:8983/solr/mcollection_shard1_replica_n2: version
conflict for 1111 expected=1645085633861910528
actual=1645090791527284737
In the above error, we are basically trying to index a document with id 1111 before a previous version of the document was indexed and committed. The solution to this problem should be to simply commit all the updates and then try indexing the new document again. However, Solr gives the same error with the same version numbers even after committing. What could possibly be the issue?
A strange observation is that this problem does not occur when Solr is not running in cloud mode.
This seems to be a very specific issue with Solr when nested documents are used.
When a document is indexed with a _version_ specified, Solr checks the version of the latest existing document by doing a real-time get. The real-time get reads from the update logs (which means that data not yet open for search is also accessible). For this, Solr does something like the following:
http://1.2.3.4:8983/solr/mcollection/get?id=1111
Now, if you have two nested documents where in one document (doc1) the parent has id=1111 and in the other document (doc2) a child has id=1111, Solr may check the version of doc2 when you intended to index doc1. This is probably because Solr still indexes all the documents in a flat structure and doesn't consider the parent-child relationship when doing a real-time get.
The solution to this is to make the id of parent and child documents different from each other.
The bug has been reported: https://issues.apache.org/jira/browse/SOLR-13785
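The workaround above (distinct ids for parents and children) can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `find_id_collisions` and `assign_child_ids` and the `-child-` separator are made up for this example, the unique key is assumed to be `id`, and children are assumed to sit under the `_childDocuments_` key as in Solr's JSON update format.

```python
from copy import deepcopy

CHILD_KEY = "_childDocuments_"  # key used for nested docs in Solr's JSON update format


def find_id_collisions(docs):
    """Return ids that are used both as a parent id and as a child id."""
    parent_ids = {doc["id"] for doc in docs}
    child_ids = {child["id"] for doc in docs for child in doc.get(CHILD_KEY, [])}
    return parent_ids & child_ids


def assign_child_ids(parent, separator="-child-"):
    """Return a copy of `parent` whose children get ids derived from the parent id.

    Hypothetical scheme: "<parent id><separator><position>". It only avoids
    collisions as long as no parent id itself contains the separator.
    """
    doc = deepcopy(parent)  # leave the caller's document untouched
    for position, child in enumerate(doc.get(CHILD_KEY, [])):
        child["id"] = f"{doc['id']}{separator}{position}"
    return doc
```

Running `find_id_collisions` over a batch before sending it makes the problematic case (a child in one document sharing an id with the parent of another) visible up front.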
I have indexes from some Solr cores that I converted from Solr 4 to Solr 6, but in standalone mode, so they don't have the "_version_" field that SolrCloud requires.
Now I want to migrate to SolrCloud 6 and put these indexes under a cluster. Because the _version_ field does not exist in them, when I placed them in the data directory of a SolrCloud leader core, the replicas in the shard did not update, as far as I could see. So I decided to read the indexes with Lucene, copy each document's fields into a Solr document, and add them to SolrCloud one by one. But some fields in these indexes are not stored, so not all fields can be moved this way.
In the end it seems I have no option other than re-indexing.
I would appreciate any better ideas or solutions that would make the migration easier.
If there is any chance to reindex, just do so; it's going to be the best option in the end (you have to deal with two separate issues: a) migrating from 4.X to 6.0 and b) moving from standalone to SolrCloud, so it's going to be messy).
If you cannot reindex:
Are all your fields stored OR have docValues=true? If so, you can get the original contents of your docs. Read them and index them with SolrJ or with some script.
If not, and you have a _version_ field: try to manually put the index in SolrCloud. Not straightforward, but possible.
If you don't have a _version_ field, I think it is impossible to put the index as is in SolrCloud (although some posts on the net make you think it is). You could try to write some Lucene code to add a _version_ field to all docs (with values that make sense), but this should be the very last resort.
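For the first case (all fields retrievable), the "read them with some script" step could look roughly like the sketch below. It pages through the old core with Solr's cursorMark deep paging and yields each document so it can be re-sent to the new collection. Assumptions to check before using anything like this: the unique key is named `id`, the old core answers on `/select`, and the function names and the injectable `fetch` parameter are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body (used as the default fetcher)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_stored_docs(core_url, rows=500, fetch=fetch_json):
    """Yield every document of a core, page by page, using cursorMark.

    Assumes the uniqueKey field is `id`; cursorMark requires a sort that
    includes the uniqueKey.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = urlencode({
            "q": "*:*",
            "sort": "id asc",
            "rows": rows,
            "wt": "json",
            "cursorMark": cursor,
        })
        result = fetch(f"{core_url}/select?{params}")
        for doc in result["response"]["docs"]:
            # Drop any existing version so the target assigns its own.
            doc.pop("_version_", None)
            yield doc
        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            return
        cursor = next_cursor
```

Each yielded document would then be posted to the SolrCloud collection's update handler; fields that are copyField targets may need to be removed first so they are not duplicated.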
ElasticSearch has percolator for prospective search. Does SOLR have a similar feature where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing SOLR features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene-only queries. Build them, and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
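The control flow of that approach can be shown with a toy stand-in in plain Python. To be clear, this is not Lucene and not MemoryIndex: queries here are just sets of terms that must all appear in the document, and the class name `ToyPercolator` is invented for the illustration. The point is only the shape of the loop: queries live in memory, and each incoming document is matched against all of them.

```python
import re


class ToyPercolator:
    """Toy prospective search: registered queries are matched against each new doc."""

    def __init__(self):
        self._queries = {}  # query id -> frozenset of required terms

    def register(self, query_id, required_terms):
        """Store a query up front; here a query is an AND over lowercase terms."""
        self._queries[query_id] = frozenset(term.lower() for term in required_terms)

    def percolate(self, text):
        """Return the ids of all registered queries that match this one document."""
        # Stand-in for "build a MemoryIndex containing only that single doc".
        tokens = set(re.findall(r"\w+", text.lower()))
        # Stand-in for "run all your queries on the index".
        return sorted(
            query_id
            for query_id, terms in self._queries.items()
            if terms <= tokens
        )
```

In a real implementation the token set would be replaced by a single-document Lucene MemoryIndex and the subset check by running each stored Lucene query against it.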
It's listed as an open new feature, SOLR-4587, on Solr JIRA but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this.
It's a Solr update processor based on Luwak.
Is there a way to add documents to a specific shard?
For example, documents of type A would always be inserted into shard1 and documents of type B would always go to shard2.
I have tried using a custom router, but it does not guarantee that different prefixes will route to different shards.
PS. I am on Solr 5 using cloud mode.
A caveat: I'm using SolrNet to access SolrCloud, and it doesn't integrate with ZooKeeper yet. For Java clients, this might be far easier.
Despite what I read here and here with regard to the CompositeId Router, I could never get it to work. What #jay helped me figure out is a way to use "implicit" routing to achieve this. If you create your collection like this (leave out the numShards parameter):
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCol&maxShardsPerNode=2&router.name=implicit&shards=shard1,shard2&router.field=shard
then add a field to your schema.xml named "shard" (matching the router.field parameter), you can index to a specific shard simply by adding the shard field to the document being indexed and specifying the shard name. At query time, you can specify shards to search -- more here (I was able to simply specify the shard name w/o a specific address).
I haven't tested this in production yet, but have verified using multiple VirtualBox instances, with ZooKeeper, HAProxy, and several Solr nodes, and it's doing exactly what I expected. Corrections and comments welcome.
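As a rough illustration of the implicit-routing setup described above, the following Python sketch builds the same kind of collection-creation URL and tags each document with its target shard before indexing. The mapping `SHARD_BY_TYPE`, the `type` field on the documents, and both function names are assumptions made up for the example; only the Collections API parameters come from the URL shown above.

```python
from urllib.parse import urlencode

# Hypothetical mapping from document type to shard name.
SHARD_BY_TYPE = {"A": "shard1", "B": "shard2"}


def create_collection_url(solr_base, name, shards, router_field="shard"):
    """Build a Collections API CREATE URL using the implicit router.

    numShards is deliberately left out; the shards are named explicitly.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "maxShardsPerNode": 2,
        "router.name": "implicit",
        "shards": ",".join(shards),
        "router.field": router_field,
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def with_shard(doc, type_field="type", router_field="shard"):
    """Return a copy of `doc` with the router field set from its type."""
    routed = dict(doc)
    routed[router_field] = SHARD_BY_TYPE[doc[type_field]]
    return routed
```

For example, `with_shard({"id": "1", "type": "B"})` yields a document whose `shard` field is `shard2`, which is what the implicit router reads when the collection was created with `router.field=shard`.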
I am relatively new to Apache Solr and have recently been working with DIH, specifically the XPathEntityProcessor. I need a way to periodically index new XML files; however, it appears the delta-import command is only supported by the SqlEntityProcessor [1].
I am working with an increasingly large dataset of XML files and was hoping solr could determine new files and index them...
A potential solution that came to mind is to do a full-import from a staging area containing only documents that have not been previously indexed, before moving the documents to their respective permanent locations.
Is there a workaround for mimicking delta-import using XPathEntityProcessor?
What sort of approaches do people using XPathEntityProcessor use to index newer documents?
[1] http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command-1
I've resorted to using the UpdateRequestHandler; it's perfect for what I want to do.
[1] http://wiki.apache.org/solr/XsltUpdateRequestHandler
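Whichever handler ends up receiving the files, the "only new files" part can be handled outside Solr. A minimal Python sketch, assuming that file modification time is a good enough signal for "new" and that the timestamp of the previous run is stored somewhere by the caller (the function name `new_xml_files` is made up for this example):

```python
from pathlib import Path


def new_xml_files(directory, last_run_epoch):
    """Return XML files under `directory` modified after the previous run.

    `last_run_epoch` is a Unix timestamp the caller persisted after the
    last successful indexing pass (an assumption of this sketch).
    """
    return sorted(
        path
        for path in Path(directory).rglob("*.xml")
        if path.stat().st_mtime > last_run_epoch
    )
```

The returned paths would then be sent to the update handler one by one, and the stored timestamp advanced only after all of them were accepted.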