Incorrect field reading during ranking - solr

Solr version 5.1.0
Documents contain a DocValues field "ts" holding a timestamp that is used during ranking.
<field name="ts" type="long" docValues="true" indexed="true" stored="true" multiValued="false"/>
If I request the document directly in the Solr Admin UI, I see that it contains the correct value:
"ts": 1575624481951
But when I added logging to the ranking method, I saw that the "ts" value for the same document is 0.
// Read the "ts" DocValues for the current segment and log the value for this document.
LeafReader reader = context.reader();
NumericDocValues timeDV = DocValues.getNumeric(reader, "ts");
long timestamp = timeDV.get(doc);
LOG.info("ts: " + timestamp);
Log:
ts: 0

The problem was that the old version of the document had not been deleted from Solr.
It could be reproduced with the following sequence of actions:
First, the document was added to Solr without the field "ts".
After some actions in the app, the document was added again, this time with the field "ts".
When Solr ranked this document, the version it read did not have the field.
I added more logging and saw that the first version of the document was on one shard and the second version (with the field "ts") was on another shard.
I'm not sure why this happened because, as far as I know, Solr should put the same document on the same shard.
In any case, it was fixed by deleting the document from the index before adding the second version.
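The delete-then-add fix can be sketched as a single JSON update body. This is only a minimal sketch, not the code used in the app: the helper name build_replace_payload and the sample id are hypothetical, and it only builds the body that would be POSTed to the collection's /update handler. It uses a delete-by-query rather than a delete-by-id, on the assumption that the stale copy may sit on a different shard than the one the id routes to.

```python
import json


def build_replace_payload(doc):
    """Build a Solr JSON update body that deletes every copy of a document
    by id and then adds the new version (hypothetical helper)."""
    doc_id = str(doc["id"])
    # Quote the id as a phrase; escape backslashes and double quotes.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    # The delete command is written before the add command in the body.
    commands = {
        "delete": {"query": 'id:"%s"' % escaped},
        "add": {"doc": doc},
    }
    return json.dumps(commands)


# Sample document: the id is made up, "ts" is the field from the question.
print(build_replace_payload({"id": "doc-1", "ts": 1575624481951}))
```

Sending the body and committing are left out on purpose, since the original post does not say how the delete was issued.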

Related

Reindexing Solr Data with different field type

I am facing an issue while reindexing Solr data.
I indexed some documents with a wrong field type specified in the managed-schema file.
Now, instead of the wrong field definition, I would like to use:
<field name="documentDate" type="date" indexed="true" stored="true"/>
To do this I have:
deleted all the previously indexed documents with the wrong type;
updated the managed-schema;
reloaded the core.
After these steps I tried to reindex documents, but this fails; looking at logs:
org.apache.solr.common.SolrException: Exception writing document id 2ecde3eb2b5964b2c44362f752f7b90d to the index; possible analysis error: cannot change DocValues type from NUMERIC to SORTED_SET for field "documentDate".
How is this possible? I have removed all the documents storing the field documentDate. How can I solve this issue?
Maybe try deleting the data folder in your core.
You can add new fields to your schema without deleting the data folder, but when you modify an existing field then (in my experience) you have to delete the data folder and build a fresh index.
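Removing the data folder could be sketched as follows. This is a minimal sketch under stated assumptions: Solr is stopped (or the core unloaded) first, the index lives in a data directory directly under the core's instance directory, and the helper name remove_core_data is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def remove_core_data(instance_dir):
    """Delete <instance_dir>/data if it exists.

    Returns True when a data directory was removed, False otherwise.
    Other directories (for example conf/) are left untouched.
    """
    data_dir = Path(instance_dir) / "data"
    if not data_dir.is_dir():
        return False
    shutil.rmtree(data_dir)
    return True


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real core.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "data" / "index").mkdir(parents=True)
        print(remove_core_data(tmp))
```

After the folder is gone, all documents have to be indexed again from the original source.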

Additional fields in schema.xml are not showing up when I do a query

Solr version information: 6.6.0
The core is named: solr
Instance: /var/solr/data/new_core
In the /var/solr/data/new_core/conf/ directory I have a custom schema.xml file
I have multiple custom fields like this in the schema.xml file
<field name="nid" type="int" indexed="true" stored="true"/>
When I select the 'solr' core and go to query, these custom fields are not showing up in the results. Here's an example of the results:
{
"response":{"numFound":200,"start":0,"docs":[
{
"id":"koe1eh/node/49",
"site":"https://example.com:1881/",
"hash":"koe1eh",
"ss_language":"und",
"url":"https://example.com:1881/node/49",
"ss_name":"tfadmin",
"tos_name":"tfadmin",
"ss_name_formatted":"tfadmin",
"tos_name_formatted":"tfadmin",
"is_uid":1,
"bs_status":true,
"bs_sticky":false,
"bs_promote":false,
"is_tnid":0,
"bs_translate":false,
"ds_created":"2009-03-12T17:46:06Z",
"ds_changed":"2009-06-18T15:25:33Z",
"ds_last_comment_or_change":"2009-06-18T15:25:33Z",
"tos_content_extra":" (Gifts) ",
"sm_field_apptype":["mousepad"],
"_version_":1588589404094464000,
"timestamp":"2018-01-03T16:28:34Z"}]
}}
The query performed is: http://example.com/solr/solr/select?indent=on&q=*:*&rows=1&wt=json
solrconfig.xml is here: https://pastebin.com/iVhZCqTW
schema.xml is here: https://pastebin.com/UBaUN5EK
I have tried restarting solr, reloading the core, and reindexing with no effect.
It turns out that I was just looking at entries that did not have those fields. When I altered my query to start at record 900, I saw the fields I was looking for. I'm not sure what else I may have done to get this working, as I had been trying many different things.
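One way to check for this situation is to ask only for documents that actually have a value in the field. A minimal sketch that builds such a query URL; the helper name field_exists_url is hypothetical, while the base URL and the field nid are taken from the question. The filter nid:[* TO *] matches documents that have any value in nid.

```python
from urllib.parse import urlencode


def field_exists_url(base_url, field, rows=1):
    """Build a /select URL restricted to documents that have a value in `field`."""
    params = {
        "q": "*:*",
        "fq": f"{field}:[* TO *]",  # range query matching any value
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


print(field_exists_url("http://example.com/solr/solr", "nid"))
```

If numFound is 0 for this query, no indexed document carries the field, and the problem is on the indexing side rather than in the query.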

Solr field not visible in query results

I have added a new field in the schemas:
<field indexed="false" stored="true" docValues="true" sortMissingLast="true" name="RankScoreXXX" type="int" />
After all the indexing operations are done, in the solr admin panel while performing queries I do not see that field in any results where the value is actually 0. Results that contain a > 0 value in this specific field are shown.
Using this filter, I can see that no result is missing a value for the field:
fq: -RankScoreXXX:[* TO *]
I can also sort results by this field.
I just do not understand why results with RankScoreXXX = 0 are not shown in the Solr admin panel.
Am I missing something?
Thanks.
I have run into this scenario a few times. Here is what the cause was each time:
The field was added but reindexing did not take place for all documents, only new ones. This is not your case, as you reindexed.
The request handler was not updated in solrconfig.xml. In this case the person added the field but had configured the request handler to return a specific list of fields using fl. The new field was not in the list.
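For the second scenario, a quick check is to pass fl explicitly in the request and name the field. A minimal sketch; the helper name select_url_with_fields and the core name mycore are hypothetical. It assumes the handler sets fl under defaults, since a value set under invariants is not overridden by request parameters.

```python
from urllib.parse import urlencode


def select_url_with_fields(base_url, query, fields):
    """Build a /select URL that requests an explicit list of fields via fl."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # comma-separated field list
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


print(select_url_with_fields(
    "http://localhost:8983/solr/mycore", "*:*", ["id", "RankScoreXXX"]))
```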

Stored fields in Solr are getting displayed in queries , why?

I am new to Solr. I have created a new core and copied the default schema.xml to the conf/ folder. The only change I have made is trivial:
<field name="id" type="string" indexed="true" stored="false" required="true" multiValued="false" />
As you can see, I set the id field to stored=false. As I understand it, the id field should no longer be displayed when I run a query. But that is not what happens. I have tried restarting the Solr instance and indexing the file again:
curl 'http://localhost:8983/solr/TwitterCore/update/json?commit=true' \
  --data-binary @$(echo TwitterData_Core_Conf/TwitterText_en_demo.json) \
  -H 'Content-type:application
According to the Solr Wiki, this should have re-indexed my file. However, when I run my query again, I still see the id.
An example of a returned document (this is not the complete JSON node; I copied only some parts):
"text": [
"RT #FollowTrainTV: Moonseternity just joined #FollowTrainTV - Watch them stream on http://t.co/oMcOGA51kT"
],
"lang": [
"en"
],
"id": "0a8edfea-68f7-4b05-b370-27b5aba640b7", // I dont want to see this
"_version_": 1512067627994841000
Maybe someone can give me detailed steps on re-indexing.
When you change the schema.xml file and restart the Solr server, the changes only apply to new documents. This means you have to clear the index and re-index all documents. (The exception is changes to the query-time tokenizer, which take effect immediately after a server restart, but that is not the case here.) After re-indexing, the id field should no longer be visible.
Another remark: you don't have to test your queries with curl. When you connect to http://localhost:8983/solr with your web browser, you should find an admin interface there. There you can select a core and test your queries.
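Clearing the index before re-indexing can be sketched as a delete-by-query on *:* followed by a commit. This is a minimal sketch, assuming the core from the question (TwitterCore) and its /update handler; the helper name clear_index_request is hypothetical. It only constructs the request and does not send it.

```python
import json
import urllib.request


def clear_index_request(update_url):
    """Build (but do not send) a POST request that deletes all documents and commits."""
    body = json.dumps({"delete": {"query": "*:*"}, "commit": {}}).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = clear_index_request("http://localhost:8983/solr/TwitterCore/update")
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
print(req.get_method(), req.full_url)
```

Once the index is empty, the original documents are posted again so that they are indexed under the changed schema.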
Refer to this document: https://lucene.apache.org/solr/guide/6_6/docvalues.html
"Non-stored docValues fields will be also returned along with other stored fields when all fields are specified to be returned (e.g. “fl=*”) for search queries depending on the effective value of the useDocValuesAsStored parameter for each field. For schema versions >= 1.6, the implicit default is useDocValuesAsStored="true"."
The String field type has docValues="true". That is why the field appears in the search response.
You can either add the useDocValuesAsStored="false" parameter to the field or you can use a different fieldType, say text_general.

Know indexing time for a document in Solr

Is it possible to know the indexing time of a document in Solr? Just as there is an implicit "score" field that is automatically added to a document, is there a field that stores the indexing time?
I need it to know the date when a document got indexed.
Thanks
Solr does not automatically add a creation date to documents. You could certainly index one with the document, though, using Solr's DateField. In earlier versions of Solr (< 4.2), there was a commented timestamp field in the example schema.xml, which looked like:
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
Also, I think it bears noting that there is no implicit "score" field. Scores are calculated at query time, rather than being tied to the document. Different queries will generate different scores for the same document. There are norms stored with the document that are factored into scores, but they aren't really fields.
femtoRgon gives you a correct solution, but you must be careful with partial document updates.
If you do not do partial document updates, you can stop reading now ;-)
If you partially update a document, Solr merges the existing stored values with your partial document, so the timestamp will not be updated. The solution is to not store the timestamp; then Solr cannot merge this value. The drawback is that you cannot retrieve the timestamp with your search results.
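A different approach, not mentioned above and offered only as an assumption, is to keep the timestamp stored and set it explicitly in every partial update. A minimal sketch using the atomic-update set modifier; the helper name build_partial_update and the sample field title are hypothetical, and the timestamp field name follows the schema example above.

```python
import json
from datetime import datetime, timezone


def build_partial_update(doc_id, changes, now=None):
    """Build a Solr atomic-update body that sets the given fields and also
    refreshes the "timestamp" field (hypothetical helper).

    `now` is an optional timezone-aware datetime, mainly useful for testing.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    # Solr date fields expect UTC in the form 2018-01-03T16:28:34Z.
    doc["timestamp"] = {"set": moment.strftime("%Y-%m-%dT%H:%M:%SZ")}
    return json.dumps([doc])


print(build_partial_update("doc-1", {"title": "new title"}))
```

With this variant the timestamp reflects the last update rather than the first indexing, which may or may not be what is wanted.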

Resources