I'm new to Solr.
I need to search only a specific set of rows in a table rather than indexing the whole database.
As far as I have read, we have to index the whole document in order to search it in Solr.
Is there any way to index only a specific set of rows from a database in Solr?
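One common way to approach this, sketched here under assumptions, is to restrict the rows in the SQL query that feeds the indexer, so that only the matching rows ever become Solr documents. The table `items`, its columns, and the helper `rows_to_solr_docs` are hypothetical names used only for illustration; an in-memory SQLite database stands in for the real one.

```python
import json
import sqlite3


def rows_to_solr_docs(conn, query, params=()):
    """Run a SELECT and turn each returned row into a dict (one Solr document per row)."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    cursor = conn.execute(query, params)
    return [dict(row) for row in cursor]


# Hypothetical table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "red shoe", "shoes"), (2, "blue hat", "hats"), (3, "green shoe", "shoes")],
)

# Only rows matching the WHERE clause are selected for indexing.
docs = rows_to_solr_docs(
    conn,
    "SELECT id, title FROM items WHERE category = ? ORDER BY id",
    ("shoes",),
)

# A JSON array of documents, which could then be sent to Solr's JSON update handler.
payload = json.dumps(docs)
```

The same idea applies when the import is configured inside Solr: the WHERE clause goes into the entity's query, so the filtering happens before anything is indexed.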
Related
Is it possible in Solr to update a specific field on an indexed document without storing the other fields?
I am using Apache Lucene, in which updating a field internally deletes the original document and re-indexes all of its fields. This means every field value has to be stored at index time, and storing all field values degrades indexing performance.
I found a thread which says it is possible to update documents without storing the other fields' values.
Solr indexing reports 4 added/updated requests even though the query returns only 2 records from the database.
The database table has only two records that are retrieved by the select query, but Solr indexing reports 4 added/updated requests. I suspect this might be due to versioning, since we use delta import and the data for a record is sometimes updated in the database.
Please suggest how I can instruct Solr to index only the available records.
You need to check which unique key has been defined in schema.xml.
It seems that you have not defined a unique key, so instead of updating the existing records Solr keeps adding new ones.
Is there any way to find the records missing from a Solr index?
I am crawling a SQL DB. My primaryKey is "id".
A few records are missing from the index. Is there a specific way to find all of them?
Does it make any difference whether the primary key is a long value or a string if we are using a range query?
Thanks in advance.
If you mean that those records went "missing" during indexing, you can log them to a file while indexing, because at that point you will know more or less which records did not make it through.
If you are talking about comparing the database with Solr, the only way is to crawl the whole database and search for each record in Solr.
You can do this with a range query on a group of ids, if your ids are numeric for example; if the result count does not match, you can narrow down the search.
The easiest way, though, is to just compare the ids one by one, but it's also the slowest way. It depends on your database.
Primary keys in Solr are string only, but nobody says you can't have a numeric unique key alongside.
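The range-narrowing idea above can be sketched roughly as follows. This is only an illustration under assumptions: ids are numeric, Solr holds no ids that the database lacks (otherwise equal counts could hide a gap), and the four callbacks are hypothetical stand-ins. In a real setup `solr_count` might issue a range query on a numeric id field with rows=0 and read numFound, while `db_count` would run a COUNT(*) with a BETWEEN clause.

```python
def find_missing_ids(lo, hi, db_count, solr_count, db_ids, solr_ids, leaf_size=4):
    """Return ids in [lo, hi] that are in the database but not in Solr.

    Compares counts per range and only descends into ranges whose counts differ.
    Assumes Solr contains no ids that the database lacks.
    """
    if lo > hi:
        return []
    if db_count(lo, hi) == solr_count(lo, hi):
        return []  # counts agree, so nothing is missing in this range
    if hi - lo + 1 <= leaf_size:
        # Range is small enough: compare the ids one by one.
        return sorted(set(db_ids(lo, hi)) - set(solr_ids(lo, hi)))
    mid = (lo + hi) // 2
    args = (db_count, solr_count, db_ids, solr_ids, leaf_size)
    return find_missing_ids(lo, mid, *args) + find_missing_ids(mid + 1, hi, *args)


# In-memory stand-ins for the database and the Solr index (hypothetical data).
db = set(range(1, 101))
solr = db - {7, 42, 43, 99}


def ids_in(source):
    return lambda lo, hi: [i for i in source if lo <= i <= hi]


def count_in(source):
    return lambda lo, hi: sum(1 for i in source if lo <= i <= hi)


missing = find_missing_ids(1, 100, count_in(db), count_in(solr), ids_in(db), ids_in(solr))
print(missing)  # [7, 42, 43, 99]
```

Most ranges are dismissed with a single pair of count queries, so the one-by-one comparison only runs on the small ranges where something is actually missing.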
I am using solrcloud-4.3.0 and zookeeper-3.4.5 on a Windows machine. I have a collection with the unique field "id". I observed that there were duplicate documents in the index with the same unique id value. As I understand it, this should not happen, because the purpose of the unique field is to avoid such situations. Can anyone help me understand what causes this problem?
In the "/conf/schema.xml" file there is an XML element called "<uniqueKey>", which seems to be "id" by default. That is supposed to be your "key".
However, according to the Solr documentation (http://wiki.apache.org/solr/UniqueKey#Use_cases_which_do_not_require_a_unique_key), you do not always need a "unique key" if you do not need to incrementally add new documents to an existing index. Maybe that is what is happening in your situation, although I also had the impression that you always needed a unique ID.
Probably too late to add an answer to this question, but it is also possible to duplicate documents with unique keys/fields by merging indexes with duplicate documents/fields.
Apparently, when indexes are merged via either the Lucene IndexMergeTool or the Solr CoreAdminHandler, any duplicate documents are simply appended to the index (as of Lucene and Solr 4.6.0).
De-duplication seems to happen at retrieval time.
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
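To check whether an index actually contains such duplicates, one option is to facet on the unique field and keep only the values that occur more than once. The sketch below builds the query string and reads a facet response in Solr's default flat JSON layout; the helper names and the sample response are hypothetical, and it assumes the duplicate documents are visible to faceting on the core being queried. Faceting over every value of a unique field can be expensive on a large index.

```python
from urllib.parse import urlencode


def duplicate_id_query(unique_field="id"):
    """Build select-handler parameters that facet on the unique field."""
    params = {
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed
        "wt": "json",
        "facet": "true",
        "facet.field": unique_field,
        "facet.mincount": 2,     # report only values seen at least twice
        "facet.limit": -1,       # no cap on the number of facet values
    }
    return urlencode(params)


def duplicated_ids(response, unique_field="id"):
    """Extract {value: count} for values occurring more than once.

    Expects the flat facet layout: [value1, count1, value2, count2, ...].
    """
    flat = response["facet_counts"]["facet_fields"][unique_field]
    pairs = zip(flat[0::2], flat[1::2])
    return {value: count for value, count in pairs if count > 1}


query_string = duplicate_id_query()

# Hypothetical response fragment, shaped like Solr's JSON facet output.
sample_response = {"facet_counts": {"facet_fields": {"id": ["doc-17", 2, "doc-3", 1]}}}
duplicates = duplicated_ids(sample_response)
```

The query string would be appended to the core's select URL, and any ids reported this way can then be re-indexed so that only one copy remains.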
I have a SOLR instance that is updated using deltaQuery/deltaImportQuery.
There is a row in SOLR that was changed in the source database table since last SOLR update.
During the next update, deltaQuery returns the primary key of this row (because it was changed recently). deltaImportQuery should then select the data for that particular primary key. This query contains an additional filter on a field, such as IsSearchableItem=1 (I don't want some rows to be searchable).
So deltaImportQuery does not return any data for the row (this particular row has IsSearchableItem=0). Will this row be removed from the SOLR index in this case?
I believe that if DIH does not generate a replacement document (I think that is what you call a row), the existing one will not get deleted. Instead, you could look at using $deleteDocById when IsSearchableItem is 0. See the $skipDoc usage in the Wikipedia dump example.
Or use deletedPkQuery.
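The underlying rule is that a changed row either replaces its document or, if it is no longer searchable, has to be deleted explicitly. The sketch below is not DIH itself. It is a client-side illustration of the same decision, assuming the changed rows are already available as dicts; the helper `plan_delta_update` and the sample rows are hypothetical.

```python
import json


def plan_delta_update(changed_rows, flag="IsSearchableItem", key="id"):
    """Split changed rows into documents to (re)index and ids to delete."""
    to_add, to_delete = [], []
    for row in changed_rows:
        if row.get(flag) == 1:
            # Searchable: re-index the row, leaving the flag column out of the document.
            to_add.append({k: v for k, v in row.items() if k != flag})
        else:
            # Not searchable any more: the old document must be removed explicitly.
            to_delete.append(str(row[key]))
    return to_add, to_delete


# Hypothetical rows returned for the primary keys reported as changed.
changed = [
    {"id": 10, "title": "still visible", "IsSearchableItem": 1},
    {"id": 11, "title": "now hidden", "IsSearchableItem": 0},
]

adds, deletes = plan_delta_update(changed)

# Two JSON bodies for Solr's update handler: documents to add, then ids to delete.
add_body = json.dumps(adds)
delete_body = json.dumps({"delete": deletes})
```

Within DIH, the same outcome is what $deleteDocById or deletedPkQuery is meant to achieve: the non-searchable row yields a delete instead of being silently skipped.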