Why does my configuration for Solr uniqueKey with solr.UUIDField generate ERROR SolrException missing required field: id?

I want to use UUIDUpdateProcessorFactory to automatically generate a UUID as the uniqueKey field for all new documents that are created. I followed the given instructions and also tried the solutions already given here and here, but nothing worked. I am using Solr 9.1.1 on Ubuntu 22.04.1 LTS.
I am getting the error: missing required field: id.
Following is my current configuration, which is generating the error:
solr-spec: 9.1.1
lucene-spec: 9.3.0
managed-schema.xml
-----------------------
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_root_" type="uuid" indexed="true" stored="false" docValues="false" />
<uniqueKey>id</uniqueKey>
solrconfig.xml
---------------
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
Error Message:
RequestHandlerBase org.apache.solr.common.SolrException: [doc=null] missing required field: id
Can you please advise what I am missing?
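For reference, this is the kind of update the chain is meant to handle: a document posted without an id field, so that UUIDUpdateProcessorFactory generates one. (A sketch only; the collection name mycollection and the title field are placeholders, not part of my actual schema.)
Example add posted to /solr/mycollection/update:
<!-- sketch: the collection name and the "title" field are placeholders -->
<add>
<doc>
<field name="title">First document, posted without an explicit id</field>
</doc>
</add>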

Related

Solr spellchecking randomly working

I've got a problem with the spell checker integrated in Solr.
I have (for now) two cores, configured with the same solrconfig.xml (with the right settings for the spellchecker) and a slightly different schema XML (with the same configuration for the spellchecker).
The problem is that for one of the cores the spell checker works perfectly, but for the other it does not.
For the non-working one, from the Solr Admin I can see that the field "spelling" (the field the spell check uses) is indexed but not stored.
Any idea?
I don't think I will be able to post the XML files, as they don't belong to me.
Thanks everybody
EDIT:
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
...
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<!-- field to use -->
<str name="field">spelling</str>
<!-- buildOnCommit|buildOnOptimize -->
<str name="buildOnCommit">true</str>
<!-- $solr.solr.home/data/spellchecker-->
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="accuracy">0.7</str>
<float name="thresholdTokenFrequency">.0001</float>
</lst>
</searchComponent>
schema.xml (working)
<schema name="docs" version="1.5">
...
<field name="fooCore1" type="text" indexed="true" stored="true" multiValued="false" />
<!-- Spellcheck -->
<field name="spelling" type="text" indexed="true" stored="true" multiValued="false" />
<copyField source="fooCore1" dest="spelling" />
...
...
<solrQueryParser defaultOperator="OR"/>
</schema>
schema.xml (not working)
<schema name="docs" version="1.5">
...
<field name="fooFoo" type="text" indexed="true" stored="true" multiValued="false" />
<copyField source="fooFoo" dest="fooCore" maxChars="300000" />
<!-- Spellcheck -->
<field name="fooCore2" type="text" indexed="true" stored="true" multiValued="false" />
<copyField source="fooCore2" dest="spelling" maxChars="300000" />
...
</schema>
All fields except spelling in the second schema are stored and indexed with their values.
I even tried creating a third core, but that is not working either.
It seems that a copyField destination cannot be used as the source for another copyField.
I changed the source from a copyField destination to a regular field in the broken schema, and that solved the problem.
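A minimal sketch of that fix, using the placeholder field names from the schemas above (and assuming fooCore2 was itself a copyField destination): copy into spelling directly from the original source field.
<!-- sketch of the corrected schema: "spelling" is copied from the original field, not from a copyField destination -->
<field name="fooFoo" type="text" indexed="true" stored="true" multiValued="false" />
<field name="spelling" type="text" indexed="true" stored="true" multiValued="false" />
<copyField source="fooFoo" dest="spelling" maxChars="300000" />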

Need help deciding which type of spellchecker to use in Solr

I have a list of cities in a MySQL db which is hooked up to a UI for autocompletion purposes. I am currently using solr-5.3.0. Data import happens through scheduled delta imports. I have the following questions:
I want to add a spell checker to this feature. I tried using:
DirectSolrSpellChecker
IndexBasedSpellChecker
FileBasedSpellChecker
Out of these 3, only FileBasedSpellChecker is able to give suggestions that exist solely in the db. For example, while searching for cologne I got results like:
{
  "responseHeader": {
    "status": 0,
    "QTime": 4,
    "params": {
      "q": "searchfield:kolakata",
      "indent": "true",
      "spellcheck": "true",
      "wt": "json"
    }
  },
  "response": { "numFound": 0, "start": 0, "docs": [] },
  "spellcheck": {
    "suggestions": [
      "cologne", {
        "numFound": 4,
        "startOffset": 12,
        "endOffset": 19,
        "suggestion": ["Cologne", "Bologna", "Cogne", "Bastogne"]
      }
    ],
    "collations": ["collation", "searchfield:Cologne"]
  }
}
These cities are pretty accurate and exist in the db/file.
But when I use the other two, I get results like:
{
  "responseHeader": {
    "status": 0,
    "QTime": 4,
    "params": {
      "q": "searchfield:kolakata",
      "indent": "true",
      "spellcheck": "true",
      "wt": "json"
    }
  },
  "response": { "numFound": 0, "start": 0, "docs": [] },
  "spellcheck": {
    "suggestions": [
      "cologne", {
        "numFound": 4,
        "startOffset": 12,
        "endOffset": 19,
        "suggestion": ["Cologne", "Cologn", "Colognei"]
      }
    ],
    "collations": ["collation", "searchfield:Cologne"]
  }
}
These cities are not present in my db.
Though FileBasedSpellChecker is giving satisfactory results, I am a little apprehensive about using it because I would need to keep updating the file manually every time a city gets added or removed. Also, it is generally not advisable to use FileBasedSpellChecker.
I also need to make the suggestions searchable. Currently I am accessing the docs returned in "responseHeader":{"response":{"docs":[<some-format>]}} to search for results in that city, but now I want the suggester to return the results in the same <some-format> instead of just string results, in order to integrate it with the UI properly.
One minor change requested is to sort the suggestions in ascending order of edit/Levenshtein distance. This is not a hard requirement and is negotiable.
EDIT:
My solrconfig looks like this:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">searchfield</str>
<str name="spellcheck">true</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.dictionary">file</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.count">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
and
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_ngram</str>
<lst name="spellchecker">
<str name="name">file</str>
<str name="classname">solr.FileBasedSpellChecker</str>
<str name="sourceLocation">spellings.txt</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
</searchComponent>
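For reference, the sourceLocation file used above (spellings.txt) is just a plain word list, one entry per line. A sketch, using city names from the example responses above as placeholder entries:
spellings.txt
Cologne
Bologna
Cogne
Bastogne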
schema looks like this:
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" multiValued="false" />
<field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>

DocExpirationUpdateProcessorFactory not deleting records

I am trying to use the DocExpirationUpdateProcessorFactory in Solr 4.10.1.
I have included the following in my solrconfig.xml
<updateRequestProcessorChain default="true">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.TimestampUpdateProcessorFactory">
<str name="fieldName">timestamp</str>
</processor>
<processor class="solr.processor.DocExpirationUpdateProcessorFactory">
<int name="autoDeletePeriodSeconds">30</int>
<str name="ttlFieldName">ttl</str>
<str name="expirationFieldName">expire_at</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
And I have included the following in my schema.xml
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<field name="ttl" type="date" indexed="true" stored="true" default="NOW+60SECONDS" multiValued="false"/>
<field name="expire_at" type="date" indexed="true" stored="true" multiValued="false"/>
As you can see, I am setting the time to live to 60 seconds and checking for deletions every 30 seconds. When I insert a document and check after a minute, a couple of minutes, or an hour, it never gets deleted.
This is what I see in the indexed document; can you please let me know what the issue might be here? Please note that the expire_at field is never generated in the Solr document, as can be seen below.
"id": "3888a8ac-fbc4-437a-8248-132384753c00",
"timestamp": "2015-02-04T04:09:21.29Z",
"_version_": 1492147724740460500,
"ttl": "2015-02-04T04:10:21.29Z"
This was because I was setting the ttl field to be of type date; it should instead be of type string, and its value should be "+60SECONDS" rather than "NOW+60SECONDS".
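Based on that, a sketch of the corrected schema.xml field definitions (ttl becomes a string holding a relative duration; expire_at stays a date and is filled in by DocExpirationUpdateProcessorFactory):
<!-- sketch: ttl is now a string with a relative duration, not a date -->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<field name="ttl" type="string" indexed="true" stored="true" default="+60SECONDS" multiValued="false"/>
<field name="expire_at" type="date" indexed="true" stored="true" multiValued="false"/>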

Configuring Solr to use UUID as a key

I am trying to configure Solr 4 to work with UUIDs and so far I have been unsuccessful.
From reading the documentation I have seen two different ways to configure schema.xml to work with UUIDs (neither works).
For both I need to write:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
option 1:
add:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
and make sure to remove the line
<uniqueKey>id</uniqueKey>
option 2
add:
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
Neither option works correctly; both return:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
I also tried adding the following configuration to the solrconfig.xml file:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uniqueKey</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Thanks,
Shimon
After some work, here is the solution:
In schema.xml, add (or edit) the id field:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml, update the chain and add the chain to the handlers (example: for /update/extract):
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
You may want to remove the QueryElevationComponent if you are not using it.
QueryElevationComponent requires a uniqueKey to be defined, and there was a JIRA issue requiring that key to be a string.
However, that was fixed in the Solr 4.0 alpha, so it depends on which Solr version you are using.
This limitation is documented in the Solr wiki.
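If you also index documents through the plain /update handler, the same chain can be attached there in the same way (a sketch mirroring the /update/extract example above):
<!-- sketch: attach the same uuid chain to the plain /update handler -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">uuid</str>
</lst>
</requestHandler>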

How do I get solr to return results from all indices?

I am starting to integrate with Solr and have run across what I perceive as an issue. I uploaded a simple spreadsheet using the Java API (here is an excerpt:
- Document, id, value
- Excel3, name, steelers
- Excel3, subject, pirates
- Excel3, description, penguins
- Excel3, comments, panthers
- Excel3, author, panthers
)
Using this, I used the first column as the "document name", the second column as the field in the document to index, and the third column as the indexed data. All of these fields already existed in schema.xml, but here is how they are set up:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="comments" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
Now here is where my problem comes into play. I run a search for, say, steelers, and it comes back fine, but if I look for penguins, or many of the other values, it does not pull back any results. However, if I search for description:penguins, the result comes back as expected.
Can anyone please help me understand why the part before the : is required for some fields, but not others?
example searches:
solr/select?indent=on&q=penguins&wt=xml ----Doesn't return any results
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
solr/select?indent=on&q=description:penguins&wt=xml
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">18</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">description:penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="author">panthers</str>
<str name="comments">panthers</str>
<str name="description">penguins</str>
<str name="id">Excel3</str>
<str name="name">steelers</str>
<str name="subject">pirates</str>
</doc>
</result>
</response>
The default query parser will query the default field, which can be specified in the schema.xml as seen here: http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field
I think @Frank Famer's comment about using the DisMax parser is a real solution to this problem. That said, here are two work-arounds I've seen in practice:
1. Create an additional field, populated via copyField, that is indexed but not stored and contains the values from all the fields you want to search, and then specify that field as the default. It would look something like this in your schema.xml file:
<field name="myhugedefaultfield" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="myhugedefaultfield"/>
<copyField source="subject" dest="myhugedefaultfield"/>
<copyField source="description" dest="myhugedefaultfield"/>
<defaultSearchField>myhugedefaultfield</defaultSearchField>
2. Alter the user-entered query and turn the query for penguins into a query for (name:penguins) OR (subject:penguins) OR (description:penguins).
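For example, a DisMax-style request along those lines might look like this (a sketch; the qf list simply names the fields from the schema above):
solr/select?indent=on&q=penguins&defType=dismax&qf=name+subject+description+comments+author&wt=xml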
