Is it possible in Solr to update a specific field on an indexed document without storing the other fields?
I am using Apache Lucene, where updating a field internally deletes the original document and reindexes all of its fields. This means every field value has to be stored at index time, and storing all field values degrades indexing performance.
I found a thread which says it is possible to update documents without storing the other field values.
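For context, here is a minimal sketch of the request body a Solr atomic update uses, where only the changed field is sent with a `set` modifier. The helper name `build_atomic_update` and the field names are made up for illustration. Note that, as far as I know, Solr still rebuilds the full document from its stored (or docValues) fields when applying such an update, so this syntax alone does not remove the need to keep the other fields retrievable.

```python
import json


def build_atomic_update(doc_id, field, value, key_field="id"):
    """Build the JSON body for an atomic update that sets a single field.

    key_field is assumed to be the uniqueKey of the schema ("id" here).
    """
    # Only the unique key and the modified field are sent; the {"set": ...}
    # modifier asks Solr to replace that field's value.
    return json.dumps([{key_field: doc_id, field: {"set": value}}])


if __name__ == "__main__":
    # The body would be POSTed to the collection's /update handler.
    print(build_atomic_update("doc-1", "price", 42))
```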
Related
There is a field named "id" which is used as the unique key in Solr. Although it is not directly used in faceting or sorting queries, it still shows up in the field cache and occupies a lot of memory.
Please help me understand how this id field ended up in the field cache, and whether there is a way to keep it out of the field cache.
I have a 3-node Arango cluster (Community edition).
I created a database with writeConcern=3 and replicationFactor=3, and a collection with shards=3 and replicationFactor=3.
I have a hash index on a field of that collection with the unique property set to true. However, I am still able to create different documents with the same field value.
I would like to know whether there are any strategies to ensure uniqueness of a collection field in the cluster.
The section Indexes On Shards in the Arango docs says the following:
Unique indexes (hash, skiplist, persistent) on sharded collections are only allowed if the fields used to determine the shard key are also included in the list of attribute paths for the index
The reason behind this is simple - it would be very expensive to ensure uniqueness of an attribute x if it is not guaranteed that all documents with identical values of x are stored on the same node.
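The rule quoted above can be illustrated with a small sketch. The hash routing below is a stand-in and not ArangoDB's actual sharding function, and the names `shard_for` and `unique_index_allowed` are hypothetical. It only shows why documents with the same value land on the same shard when that attribute is the shard key, and not otherwise.

```python
import hashlib


def shard_for(doc, shard_keys, num_shards):
    """Pick a shard from a hash of the shard-key values (illustrative only)."""
    raw = "|".join(str(doc.get(k)) for k in shard_keys)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def unique_index_allowed(shard_keys, index_fields):
    """A unique index is only allowed if it covers every shard key."""
    return set(shard_keys) <= set(index_fields)


if __name__ == "__main__":
    a = {"_key": "a", "email": "x@example.com"}
    b = {"_key": "b", "email": "x@example.com"}
    # Sharded by email: both documents go to the same shard, so one node
    # can enforce uniqueness of email locally.
    print(shard_for(a, ["email"], 3) == shard_for(b, ["email"], 3))
    # Sharded by _key: a unique index on email alone does not cover the shard key.
    print(unique_index_allowed(["_key"], ["email"]))
    print(unique_index_allowed(["email"], ["email"]))
```

Under this reading, one strategy is to shard the collection by the attribute that must be unique, so the unique index includes the shard key.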
I am new to Solr.
I need to search only a specific set of rows in a table rather than indexing the whole database.
As far as I have read, we have to index the whole document to search it in Solr.
Is there any way to index only a specific set of rows from the database in Solr?
I am using solrcloud-4.3.0 and zookeeper-3.4.5 on a Windows machine. I have a collection whose index has the unique field "id". I observed duplicate documents in the index with the same unique id value. As I understand it, this should not happen, because the purpose of the unique field is to prevent exactly this situation. Can anyone help me understand what causes this problem?
In the "/conf/schema.xml" file there is an XML element called "<uniqueKey>", which seems to be "id" by default... that is supposed to be your "key".
However, according to the Solr documentation (http://wiki.apache.org/solr/UniqueKey#Use_cases_which_do_not_require_a_unique_key), you do not always need a "unique key" if you do not need to incrementally add new documents to an existing index... maybe that is what is happening in your situation. But I also had the impression that you always needed a unique ID.
Probably too late to add an answer to this question, but it is also possible to duplicate documents with unique keys/fields by merging indexes with duplicate documents/fields.
Apparently, when indexes are merged via either the Lucene IndexMergeTool or the Solr CoreAdminHandler, any duplicate documents are happily appended to the index (as of Lucene and Solr 4.6.0).
De-duplication seems to happen at retrieval time.
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
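If the merged index ends up with duplicates, one possible workaround is to de-duplicate on the client before re-indexing. The sketch below is plain Python and not something Solr or Lucene does for you; the helper name `dedupe_by_key` is hypothetical, and it assumes the later document for a given key is the one to keep.

```python
def dedupe_by_key(docs, key="id"):
    """Keep one document per unique key; later documents win."""
    latest = {}
    for doc in docs:
        # A later document with the same key overwrites the earlier one.
        latest[doc[key]] = doc
    return list(latest.values())


if __name__ == "__main__":
    merged = [
        {"id": "1", "title": "old"},
        {"id": "2", "title": "other"},
        {"id": "1", "title": "new"},
    ]
    for doc in dedupe_by_key(merged):
        print(doc)
```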
I have a SOLR instance that is updated using deltaQuery/deltaImportQuery.
There is a row in SOLR that was changed in the source database table since last SOLR update.
During the next update, deltaQuery returns the primary key of this row (because it was changed recently). deltaImportQuery should then select the data for that particular primary key. This query contains an additional filter on some field, like IsSearchableItem=1 (I don't want some rows to be searchable).
So deltaImportQuery does not return any data for the row (this particular row has IsSearchableItem=0). Will this row be removed from the SOLR index in this case?
I believe that if DIH does not generate a replacement document (which I think is what you call a row), the existing document will not get deleted. Instead, you could look at using $deleteDocById when IsSearchableItem is 0. Check the $skipDoc usage in the Wikipedia dump example.
Or use deletedPkQuery.
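If the deletion is done outside DIH instead, a minimal sketch is to collect the primary keys of rows that are no longer searchable and send Solr a delete-by-id request. The helper name `build_delete_payload` and the row layout are assumptions for illustration; only the `{"delete": [...]}` body follows Solr's JSON update syntax as I understand it.

```python
import json


def build_delete_payload(rows, key="id", flag="IsSearchableItem"):
    """Build a delete-by-id JSON body for rows whose flag is 0."""
    ids = [str(row[key]) for row in rows if row[flag] == 0]
    return json.dumps({"delete": ids})


if __name__ == "__main__":
    changed_rows = [
        {"id": 1, "IsSearchableItem": 1},
        {"id": 2, "IsSearchableItem": 0},
    ]
    # The body would be POSTed to the /update handler, followed by a commit.
    print(build_delete_payload(changed_rows))
```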