Solr not returning all indexed documents

I have encountered a problem where Solr does not return all indexed documents.
After a full-import completes, the web UI's Schema Browser shows the correct document count, but a query for all documents is missing about 400 of them. The following two pictures show the Schema Browser and Query views.
Loading Term Info also lists all documents, including those that queries do not return. Clicking on such a term runs a query that returns nothing.
The database IDs are unique and cannot possibly overlap.
Here is a code fragment from data-config:
<entity name="PZ" pk="DOKUMENTA_ID" transformer="TemplateTransformer, ClobTransformer"
query="
SELECT pz.dokumenta_id,
... // selecting other non pk fields
">
<field column="primaryKey" template="${PZ.DOKUMENTA_ID}" />
<field column="DOKUMENTA_ID" name="pz_id" />
...
primaryKey is defined in schema.xml as the unique key.

Related

Reindexing Solr Data with different field type

I am facing an issue while reindexing Solr data.
I indexed some documents with a wrong field type specified in the managed-schema file.
Now, instead of the wrong field definition, I would like to use:
<field name="documentDate" type="date" indexed="true" stored="true"/>
To do this I have:
deleted all the previously indexed documents that used the wrong type;
updated the managed-schema;
reloaded the core.
After these steps I tried to reindex documents, but this fails; looking at logs:
org.apache.solr.common.SolrException: Exception writing document id 2ecde3eb2b5964b2c44362f752f7b90d to the index; possible analysis error: cannot change DocValues type from NUMERIC to SORTED_SET for field "documentDate".
How is this possible? I have removed all the documents that stored the documentDate field. How can I solve this issue?

Maybe try deleting the data folder in your core.
You can add new fields to your schema without deleting the data folder, but in my experience, when you modify an existing field you have to delete the data folder and build a fresh index.

Is there a way to view search document fields that are only indexed but not stored via the solr admin panel using the query tool?

I want to view the indexed but not stored fields of a solr search document in the solr admin query tool, is there any provision for this?
Example Field Configuration:
<field name="product_data" type="string" indexed="true" stored="false" multiValued="false" docValues="true" />

If you're using schema version 1.6, Solr will automagically fetch the values from the stored docValues, even if the field itself is set as stored="false". Include the field name in fl to get the values.
However, if you're looking for the actual tokens indexed for a document / field / value, the Analysis page is usually the preferred way, as it lets you tweak the value and see the response quickly. The Luke Request Handler / Tool is useful if you want to explore the actual indexed tokens.
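To illustrate the fl suggestion above, here is a minimal sketch that builds a select URL asking for the docValues-backed field. The host, the core name "products" and the helper name build_select_url are assumptions made up for this example, not something from the question.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, fields):
    """Build a /select URL that lists the wanted fields in fl.

    base_url and core are hypothetical; adjust them to your installation.
    """
    params = {
        "q": query,
        "fl": ",".join(fields),  # product_data must be named explicitly here
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# "products" is an assumed core name used only for illustration.
url = build_select_url(
    "http://localhost:8983/solr", "products", "*:*", ["id", "product_data"]
)
print(url)
```

The same URL can be pasted into a browser, or the q and fl values can be typed into the admin query tool.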

How to query a specific document by id

From a previous query I already have the document ID (the uniqueKey in this schema is 'track_id') of the document I'm interested in.
Then I would like to query a sequence of words on that document while highlighting the match.
I can't seem to be able to combine the search parameters in a successful way (all my google searches return purple links :\ ), although I've already tried many combinations these past few days. I also know the field where the matches will be if that's any use in terms of improving match speed.
I'm guessing it should be something like this:
/select?q=track_id:{key_i_already_have} AND/&/{part_I_dont_know} word1 word2 word3
Currently, since I can't combine these two search parameters, I'm only querying the words and thus getting several results from several documents.
Thanks in advance.

From Solr 4 you can use the realtime get, which is much faster than searching the index by id.
http://localhost:8983/solr/get?ids=id1,id2,id3
For index updates to be visible (searchable), some kind of commit must reopen a searcher to a new point-in-time view of the index. The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher. This is primarily useful when using Solr as a NoSQL data store and not just a search index.
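As a rough sketch, the realtime get URL shown above could be assembled like this. The helper name build_realtime_get_url is made up for the example, and the ids are placeholders.

```python
from urllib.parse import urlencode


def build_realtime_get_url(base_url, ids):
    """Build a realtime get URL for one or more unique keys.

    safe="," keeps the commas between ids readable instead of
    percent-encoding them.
    """
    query = urlencode({"ids": ",".join(ids)}, safe=",")
    return f"{base_url}/get?{query}"


# Placeholder ids, matching the shape of the URL in the answer.
print(build_realtime_get_url("http://localhost:8983/solr", ["id1", "id2", "id3"]))
```

Note that this only fetches documents by key; it does not run the word search or the highlighting asked about in the question.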

You can try applying a filter query on the id. It restricts the search to that one document, so the keywords are searched, and can be highlighted, within that document only.
Your query will look like this:
/select?fq=track_id:DOC_ID&q=word1 word2 word3
Just make sure your "id" field in schema.xml is defined with the type string so that filter queries can be applied to it.
<field name="id" type="string" indexed="true" stored="true" required="true" />
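A minimal sketch of the filter query approach with highlighting switched on, assuming the standard hl, hl.fl and df request parameters. The field name "lyrics", the document id and the helper name build_highlight_query are hypothetical; the base URL is assumed to already include the core.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, doc_id, words, field):
    """Search for words inside a single document and ask for highlights.

    The fq clause restricts the result to the document whose uniqueKey
    (track_id in the question) equals doc_id. The id is quoted; an id that
    itself contains quotes would need extra escaping.
    """
    params = {
        "q": " ".join(words),
        "fq": f'track_id:"{doc_id}"',
        "df": field,      # default field searched by q
        "hl": "true",     # turn highlighting on
        "hl.fl": field,   # field to build highlight snippets from
    }
    return f"{base_url}/select?{urlencode(params)}"


# "abc123" and "lyrics" are made-up values for illustration.
print(build_highlight_query(
    "http://localhost:8983/solr", "abc123", ["word1", "word2", "word3"], "lyrics"
))
```

Keeping the id in fq rather than in q means it does not affect scoring and can be served from the filter cache.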

How to view non-stored fields per document?

I have a field like this:
<field name="status" type="string" indexed="true" stored="false" required="false" />
Using LukeRequestHandler I can view only statistics of the indexed terms; I can view indexed terms per document only if stored="true". TermsComponent shows only term frequencies, so I cannot view terms per document.
Is it possible to look inside the inverted index without setting stored="true" and reindexing Solr?

In order to view the indexed terms for a single document, you need to use the full Luke application, not the LukeRequestHandler. You would need to copy the index folder from your Solr data directory to another location, then open it in Luke.
There is, however, a workaround within Solr itself: do a search that returns just the one document, and facet on the field you want to examine. Every term in the index for that field on that document will be an entry in the facet output. Here is a full sample URL for this kind of search:
http://localhost:8983/solr/core/select?q=id:1234&facet.field=status&facet.limit=-1&facet.mincount=1&facet=true&facet.method=enum
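To read the terms out of such a facet response, something like the sketch below may help. It assumes the default JSON layout, where each facet field is a flat list alternating between term and count. The sample response is hand-written for illustration, not real Solr output, and the helper name is made up.

```python
def terms_from_facet_response(response, field):
    """Turn Solr's flat [term, count, term, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample shaped like a JSON facet response (illustrative only).
sample = {
    "facet_counts": {
        "facet_fields": {
            "status": ["active", 1, "reviewed", 1],
        }
    }
}

print(terms_from_facet_response(sample, "status"))
```

Because the query matches a single document and facet.mincount=1 is set, every key in the result is a term indexed for that document.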
If you decide to go the Luke route, you can step through your index (or search for an individual document) and view just one document.
The official Luke page is here, but it only supports up through 4.0-ALPHA:
http://code.google.com/p/luke/
You can find Luke for versions beyond 4.0-ALPHA here:
https://java.net/projects/opengrok/downloads
There is an effort underway to absorb Luke into the Lucene/Solr source code as a module, so it will always be current and released at the same time as each Lucene/Solr version.

Solr Faceting Multi-valued vs Tokenizers

I'm trying to set up a subject field in my schema. I'm drawing from a database where a single record can have multiple subjects and the subjects are listed in a comma delimited string. Is there a way to facet on just one of the subjects?
Thanks

Check SolrFacetingOverview for an overview of faceting.
The Facet Indexing section mentions the field type you should choose for the field that you want to facet on.
You can customize the faceting using SimpleFacetParameters.
You can restrict the results to entities that have a particular subject value using a filter query, e.g. fq=subject:"MATH".
The filter returns only the results matching the criteria, and the facet results include only facets from that result set.

If I understand correctly, you want this in the DIH file:
<entity name="entity" pk="id" query="..." transformer="RegexTransformer">
<field column="subjects" splitBy=","/>
</entity>
and the query for faceting:
http://localhost:8983/solr/select?q=...&facet=true&facet.field=subjects&facet.query=subjects:the-one-you-want
Would that work?
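To make the idea concrete, here is a small Python sketch of what splitting the comma-delimited string and then faceting on the resulting values amounts to. It is only an illustration of the concept, not how Solr implements it. The sample records are invented, and trimming whitespace around each subject is an assumption of this sketch rather than something splitBy is claimed to do.

```python
from collections import Counter


def split_subjects(raw, sep=","):
    """Split a delimited subject string into individual values.

    Whitespace trimming is an assumption added for this sketch.
    """
    return [s.strip() for s in raw.split(sep) if s.strip()]


def facet_counts(records):
    """Count, per subject, how many records contain it (like a field facet)."""
    counts = Counter()
    for raw in records:
        # set() so a subject repeated within one record is counted once
        counts.update(set(split_subjects(raw)))
    return counts


# Invented sample rows standing in for the database column.
records = ["MATH,PHYSICS", "MATH", "HISTORY, MATH"]
print(facet_counts(records))
```

Once the field is multi-valued like this, each subject becomes its own facet value, so you can facet or filter on just one of them.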
