There is a field named "id" that is used as the unique key in Solr. Although it is not directly used in faceting or sorting queries, it still shows up in the fieldcache and occupies a lot of memory.
Please help me understand how this id field ended up in the fieldcache, and whether there is a way to keep it out of the fieldcache.
Is it possible to create a composite uniqueKey in schema.xml? Or is it better to concatenate the unique fields into one unique string id field in the source data?
If it's possible, and if it's not that big of a difference, I would prefer to do the former because it would save me a bit of time.
As discussed in "How to set multiple fields as uniqueKey in solr?" and at http://lucene.472066.n3.nabble.com/Multiple-uniqueKey-fields-td472939.html, you cannot simply declare multiple fields as the uniqueKey. You would need to combine them into one field, and that field cannot be multivalued.
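A minimal sketch of the "combine them into one field" approach, done on the source data before it is sent to Solr. The function name `make_composite_id`, the `|` separator, and the example field names are assumptions made for illustration; nothing here is a Solr API.

```python
def make_composite_id(*parts, sep="|"):
    """Join several source fields into one single-valued string id."""
    cleaned = []
    for part in parts:
        text = str(part)
        # Escape backslashes and the separator so that ("a|b", "c") and
        # ("a", "b|c") do not collapse into the same id.
        cleaned.append(text.replace("\\", "\\\\").replace(sep, "\\" + sep))
    return sep.join(cleaned)


# Hypothetical source record with two fields that are unique only together.
doc = {"customer_id": 42, "order_no": "A-17"}
doc["id"] = make_composite_id(doc["customer_id"], doc["order_no"])
```

The resulting `id` is a plain string, so it can be used as a single-valued uniqueKey field.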
Is it possible in Solr to update a specific field on an indexed document without storing the other fields?
I am using Apache Lucene, where updating a field internally deletes the original document and re-indexes all of its fields. This forces me to store every field value at index time, and storing all field values degrades indexing performance.
I found a thread that says it is possible to update documents without storing the other fields' values.
Is there any way to find the records that are missing from a Solr index?
I am crawling a SQL database. My primary key is "id".
A few records are missing from the index. Is there a specific way to find all of them?
Does it make any difference whether the primary key is a long value or a string if we are using a range query?
Thanks in advance....!!
If you mean that those records went "missing" during indexing, you can write them to a file while indexing, because at that point you will know more or less which records did not make it through.
If you are talking about comparing the database with Solr, the only way is to crawl the whole database and look up each record in Solr.
If your ids are numeric, for example, you can run a range query over a group of ids and compare the number of results with the database; where the counts do not match, you narrow the range down.
The easiest way, though, is to compare the ids one by one, but it is also the slowest. It depends on your database.
Primary keys in Solr are string only, but nobody says you can't have a numeric unique key alongside.
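The range-query narrowing described above can be sketched as a recursive bisection over a numeric id range. This is only a sketch under stated assumptions: `db_count` and `solr_count` are hypothetical callables that return how many records have an id in `[lo, hi]`. In practice `solr_count` might run a range query such as `id_num:[lo TO hi]` with `rows=0` and read `numFound`, where `id_num` is an assumed numeric field kept alongside the string key. It also assumes Solr holds no documents that are absent from the database, since extra documents could cancel out missing ones in a count.

```python
def find_mismatched_ids(lo, hi, db_count, solr_count):
    """Return the ids in [lo, hi] whose database and Solr counts differ."""
    if db_count(lo, hi) == solr_count(lo, hi):
        return []            # counts agree, nothing to narrow down here
    if lo == hi:
        return [lo]          # range is a single id and the counts differ
    mid = (lo + hi) // 2
    # Split the range and check each half separately.
    return (find_mismatched_ids(lo, mid, db_count, solr_count)
            + find_mismatched_ids(mid + 1, hi, db_count, solr_count))


def counter_for(ids):
    """Build an in-memory stand-in for a count-in-range query."""
    return lambda lo, hi: sum(1 for i in ids if lo <= i <= hi)


# Toy data: the database has ids 1..100, the index lacks three of them.
db_ids = set(range(1, 101))
solr_ids = db_ids - {17, 58, 59}
missing = find_mismatched_ids(1, 100, counter_for(db_ids), counter_for(solr_ids))
```

Each mismatching id costs roughly a logarithmic number of count queries, which is why this tends to be cheaper than comparing ids one by one when only a few records are missing.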
I am using solrcloud-4.3.0 and zookeeper-3.4.5 on a Windows machine. I have a collection whose unique field is "id". I observed duplicate documents in the index with the same unique id value. As I understand it, this should not happen, because the purpose of the unique field is to prevent exactly this situation. Can anyone explain what causes this problem?
In the "/conf/schema.xml" file there is an XML element called "<uniqueKey>", which seems to be set to "id" by default... that is supposed to be your "key".
However, according to the Solr documentation (http://wiki.apache.org/solr/UniqueKey#Use_cases_which_do_not_require_a_unique_key), you do not always need a "unique key" if you do not have to incrementally add new documents to an existing index... maybe that is what is happening in your situation. But I also had the impression that you always needed a unique ID.
It is probably too late to add an answer to this question, but it is also possible to end up with duplicate documents for the same unique key by merging indexes that contain documents with the same key.
Apparently, when indexes are merged via either the Lucene IndexMergeTool or the Solr CoreAdminHandler, any duplicate documents are happily appended to the index (as of Lucene and Solr 4.6.0).
De-duplication seems to happen at retrieval time.
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
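One way to check whether a merged index ended up with duplicate keys is to facet on the key field and keep only the values that occur at least twice. The sketch below only builds the request URL; it assumes the key field is indexed and named `id`, and the helper name and core URL are placeholders. Note that faceting on a high-cardinality id field can itself be memory-hungry.

```python
from urllib.parse import urlencode


def duplicate_id_query_url(core_url, key_field="id", limit=100):
    """Build a select URL that facets on the key field with mincount 2."""
    params = {
        "q": "*:*",
        "rows": 0,               # only the facet counts are of interest
        "facet": "true",
        "facet.field": key_field,
        "facet.mincount": 2,     # keep only values seen at least twice
        "facet.limit": limit,
        "wt": "json",
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


# Placeholder core URL, assumed for illustration.
url = duplicate_id_query_url("http://localhost:8983/solr/core1")
```

Any value returned in the facet list for the key field would be an id that appears on more than one document in that core.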
I have 2 cores on a single Solr instance. The schemas of both cores share the same primary key.
I want to merge the results of a query across both cores. Is this possible using Solr?
I followed the Solr Distributed Search documentation, but the example didn't work for me (I did get results, but they were not unified). I queried the Solr cores using:
localhost:8983/solr/core1/select/?shards=localhost:8983/solr/core1,localhost:8983/solr/core0&q=123_456.
Has anyone tried this approach before?
What do you mean by unified results?
You should be able to get combined results from both cores in a single result set.
However, there are a few limitations.
The schema needs to be the same for both cores, or kept in sync, so that the search runs on the same fields and they are returned consistently.
What do you mean by the cores sharing the same primary key?
The id needs to be unique across cores.
The unique key field must be unique across all shards. If docs with
duplicate unique keys are encountered, Solr will make an attempt to
return valid results, but the behavior may be non-deterministic.
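Before relying on a distributed query, it can help to verify that no id value exists in more than one core. A small sketch, assuming the ids of each core have already been exported into Python collections; the helper name and the sample ids are made up for illustration.

```python
from collections import defaultdict


def ids_in_multiple_cores(ids_by_core):
    """Map each id found in more than one core to the cores that hold it."""
    cores_by_id = defaultdict(set)
    for core, ids in ids_by_core.items():
        for doc_id in ids:
            cores_by_id[doc_id].add(core)
    return {doc_id: sorted(cores)
            for doc_id, cores in cores_by_id.items()
            if len(cores) > 1}


# Hypothetical exported ids for the two cores from the question.
clashes = ids_in_multiple_cores({
    "core0": ["123_456", "111_222"],
    "core1": ["123_456", "333_444"],
})
```

An empty result means the ids are unique across the cores that were checked; any entry points at a key that could make a sharded query behave non-deterministically.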