Solr spellchecking randomly working - solr

I've got a problem with the spell checker integrated in Solr.
I have (for now) two cores, configured with the same solrconfig.xml (with the right settings for the spell checker) and slightly different schema.xml files (with the same configuration for the spell checker).
The problem is that the spell checker works perfectly for one of the cores but not for the other.
For the non-working core, Solr Admin shows that the field "spelling" (the field the spell checker uses) is indexed but not stored.
Any ideas?
I don't think I will be able to post the XML files, as they don't belong to me.
Thanks, everybody.
EDIT:
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
  ...
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- field to use -->
    <str name="field">spelling</str>
    <!-- buildOnCommit|buildOnOptimize -->
    <str name="buildOnCommit">true</str>
    <!-- $solr.solr.home/data/spellchecker-->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
</searchComponent>
schema.xml (working)
<schema name="docs" version="1.5">
  ...
  <field name="fooCore1" type="text" indexed="true" stored="true" multiValued="false" />
  <!-- Spellcheck -->
  <field name="spelling" type="text" indexed="true" stored="true" multiValued="false" />
  <copyField source="fooCore1" dest="spelling" />
  ...
  ...
  <solrQueryParser defaultOperator="OR"/>
</schema>
schema.xml (not working)
<schema name="docs" version="1.5">
  ...
  <field name="fooFoo" type="text" indexed="true" stored="true" multiValued="false" />
  <copyField source="fooFoo" dest="fooCore" maxChars="300000" />
  <!-- Spellcheck -->
  <field name="fooCore2" type="text" indexed="true" stored="true" multiValued="false" />
  <copyField source="fooCore2" dest="spelling" maxChars="300000" />
  ...
</schema>
All fields in the second schema except spelling are stored and indexed with their values.
I even tried creating a third core, but it does not work either.

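It seems that the destination of a copyField cannot be the source of another copyField: Solr does not chain copyField directives, so a field that is filled only by a copy never feeds a second copy.
Changing the source in the non-working schema from a copyField destination to a regular field solved the problem.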
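A minimal sketch of that fix, assuming fooCore2 in the non-working schema was itself filled by a copyField from fooFoo and that spelling is declared as in the working schema (both are assumptions, since parts of that schema are elided). Each copyField reads from the original input field, so neither depends on the output of another copy:

```xml
<!-- Sketch only: field names come from the question; the spelling
     declaration is assumed to match the working schema. -->
<field name="fooFoo"   type="text" indexed="true" stored="true" multiValued="false" />
<field name="fooCore2" type="text" indexed="true" stored="true" multiValued="false" />
<field name="spelling" type="text" indexed="true" stored="true" multiValued="false" />

<!-- Both copies start from the real input field fooFoo. -->
<copyField source="fooFoo" dest="fooCore2" maxChars="300000" />
<copyField source="fooFoo" dest="spelling" maxChars="300000" />
```

copyField is applied at index time, so existing documents would need to be reindexed before spelling is populated and the spellcheck index is rebuilt on commit.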

Related

Why does my configuration for Solr uniqueKey with solr.UUIDField generate ERROR SolrException missing required field: id

I want to use UUIDUpdateProcessorFactory to automatically generate a UUID as the uniqueKey field for every new document that is created. I followed the given instructions and also tried the solutions already given here and here, but nothing worked. I am using Solr version 9.1.1 on Ubuntu 22.04.1 LTS.
I am getting the error: missing required field: id.
The following is my current configuration, which generates the error:
solr-spec: 9.1.1
lucene-spec: 9.3.0
managed-schema.xml
-----------------------
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_root_" type="uuid" indexed="true" stored="false" docValues="false" />
<uniqueKey>id</uniqueKey>
solrconfig.xml
---------------
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
Error Message:
RequestHandlerBase org.apache.solr.common.SolrException: [doc=null] missing required field: id
Please advise: what am I missing?

Does SOLR cell in any way limit the amount of characters imported into a solr.TextField?

I'm indexing a large HTML page with Solr Cell, using a curl command from a Windows command prompt like so:
curl http://localhost:8987/solr/myexample/update/extract -d @test.html -H 'Content-type:html'
I have found that text is missing from my fields when I query them (query?q=*:*&q.op=OR&indent=true) in the Solr admin UI.
Example: I have a bunch of lorem ipsum <p> tags, but near the end of my HTML page I have another paragraph tag containing "Hello world", and this does not show up in the Solr admin UI.
I found the following on the old wiki.
Large individual fields.
It is possible to store megabytes of text in one record. These fields are clumsy to work with. By default the number of characters stored is clipped.
It does not go into any detail on how to prevent the text from being clipped, assuming that is even what's causing the issue, because I can't get even a megabyte of data into a field before it's cut.
schema.xml
<field name="main" type="text_general" indexed="true" stored="true"/>
<field name="div" type="text_general" indexed="true" stored="true"/>
<field name="doc_id" type="string" uninvertible="true" indexed="true" stored="true"/>
<field name="date_pub" type="pdate" uninvertible="true" indexed="true" stored="true"/>
<field name="p" type="text_general" uninvertible="true" indexed="true" stored="true"/>
<field name="_text_" type="text_general" indexed="true" stored="true" multiValued="true"/>
<copyField source="*" dest="_text_"/>
solrconfig.xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="fmap.content">content</str>
    <str name="capture">div</str>
    <str name="fmap.div">div</str>
    <str name="capture">h1</str>
    <str name="fmap.h1">h1</str>
    <str name="capture">h2</str>
    <str name="fmap.h2">h2_t</str>
    <str name="capture">p</str>
    <str name="fmap.p">p</str>
  </lst>
</requestHandler>
Solr Version: 8.10.1
Solr Cell doesn't seem to limit the number of characters. The culprit (and don't ask me why) was the curl command I was using:
curl http://localhost:8987/solr/myexample/update/extract -d @test.html -H 'Content-type:html'
Solution: the following command indexes all of the text without truncating it (replace the paths with wherever your post.jar and HTML file are):
java -jar -Dc=myexample -Dauto example\exampledocs\post.jar example\exampledocs\sample.html
Worth noting: these are Windows commands for the Command Prompt.

Solr edismax relevancy sorting multiple fields

I use the edismax query parser to handle user queries against our Solr 4.4 server.
The query returns the correct results, but I need help with the prioritization.
For example, if I give q=ideapad miix 310:
1) It returns all the exact matches, and this is working fine. If a result contains only ideapad instead of the fully matched phrase, it should be given the lowest priority.
2) Results should be prioritized in this field order: field8, keywords, product, marketing, description. Here too, ideapad should have the lowest priority.
My bq:
bq:text:"ideapad miix 310"^20000 OR (text:"miix"^12000 -text:ideapad^-20 -text:thinkpad^-20 -text:ideacentre^-20 -text:thinkcentre^-20 text:"310"^1000 -text:ideapad^-20 -text:thinkpad^-20 -text:ideacentre^-20 -text:thinkcentre^-20)
URL
http://localhost:8983/solr/collection1/select?q=ideapad+miix+310&defType=edismax&bq=text%3A%22ideapad+miix+310%22%5E20000++OR+(text%3A%22miix%22%5E12000+-text%3Aideapad%5E-20+-text%3Athinkpad%5E-20+-text%3Aideacentre%5E-20+-text%3Athinkcentre%5E-20+text%3A%22310%22%5E1000+-text%3Aideapad%5E-20+-text%3Athinkpad%5E-20+-text%3Aideacentre%5E-20+-text%3Athinkcentre%5E-20)
I use the catch-all field "text" and have copied each of the boosted fields (field8, keywords, etc.) into it:
<field name="field8" type="text_search" indexed="true" stored="true" omitNorms="true"/>
<field name="description" type="text_search" indexed="true" stored="true" omitNorms="true"/>
<field name="keywords" type="commaDelimited" indexed="true" stored="true" omitNorms="true"/>
<field name="product" type="commaDelimited" indexed="true" stored="true" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"/>
<field name="marketing" type="commaDelimited_s" indexed="true" stored="true" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"/>
<copyField source="field8" dest="text"/>
<copyField source="field8" dest="text"/>
In my solrconfig.xml, I have boosted the fields for edismax:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">
      text^100 field8^90 keywords^80 product^70 marketing^60 description^10
    </str>
    <str name="pf">
      text^100 field8^90 keywords^80 product^70 marketing^60 description^10
    </str>
  </lst>
</requestHandler>

Configuring Solr to use UUID as a key

I am trying to configure Solr 4 to work with UUIDs, and so far I have been unsuccessful.
From reading the documentation, I have seen two different ways to configure schema.xml to work with UUIDs (neither works).
For both, I need to write
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
option 1:
add:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
and make sure to remove the line
<uniqueKey>id</uniqueKey>
option 2
add:
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
Neither option works correctly; both return
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
I also tried adding the following configuration to the solrconfig.xml file:
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">uniqueKey</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Thanks,
Shimon
After some work here is the solution:
In schema.xml, add (or edit) the id field:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml, update the chain and add it to the handlers (example: for /update/extract):
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
You may want to remove the QueryElevationComponent if you are not using it.
QueryElevationComponent requires a unique key to be defined, and that key must be a string field; the issue is tracked in JIRA.
However, it was fixed in the Solr 4.0 alpha, so it depends on which Solr version you are using.
This limitation is documented in the Solr wiki.
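If the elevation component is not needed, one way to take it out is to comment out its declaration together with any handler that references it. The sketch below assumes a solrconfig.xml derived from the stock Solr 4 example, where the component is named elevator, reads elevate.xml, and is wired into an /elevate handler; those names are assumptions and may differ in your file:

```xml
<!-- Disabled: QueryElevationComponent is not used by this core.
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>
-->
```

Any other request handler that lists elevator under last-components would need that entry removed as well.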

How do I get Solr to return results from all indices?

I am starting to integrate with Solr and have run across what I perceive as an issue. I uploaded a simple spreadsheet using the Java API (here is an excerpt:
- Document, id, value
- Excel3, name, steelers
- Excel3, subject, pirates
- Excel3, description, penguins
- Excel3, comments, panthers
- Excel3, author, panthers
)
I used the first column as the "document name", the second column as the field in the document to index, and the third column as the indexed data. All of these fields already existed in schema.xml; here is how they are set up:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="comments" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
Now here is where my problem comes into play. I run a search for, say, steelers, and it comes back fine, but if I look for penguins, or for values in many of the other fields, it does not pull back any results. However, if I search for description:penguins, the result comes back as expected.
Can anyone please help me understand why the part before the : is required for some fields but not others?
Example searches:
solr/select?indent=on&q=penguins&wt=xml ----Doesn't return any results
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="q">penguins</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>
solr/select?indent=on&q=description:penguins&wt=xml
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">18</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="q">description:penguins</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="author">panthers</str>
      <str name="comments">panthers</str>
      <str name="description">penguins</str>
      <str name="id">Excel3</str>
      <str name="name">steelers</str>
      <str name="subject">pirates</str>
    </doc>
  </result>
</response>
The default query parser queries the default search field, which can be specified in schema.xml as described here: http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field
I think @Frank Famer's comment about using the DisMax parser is a real solution to this problem. That said, here are two workarounds I've seen in practice:
1. Create an additional field, indexed but not stored, that is filled by copyField with the values from all the fields you want to search, and then specify that field as the default. It would look something like this in your schema.xml file:
<field name="myhugedefaultfield" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="myhugedefaultfield"/>
<copyField source="subject" dest="myhugedefaultfield"/>
<copyField source="description" dest="myhugedefaultfield"/>
<defaultSearchField>myhugedefaultfield</defaultSearchField>
2. Alter the user-entered query syntax and turn the query for penguins into a query for (name:penguins) OR (subject:penguins) OR (description:penguins).
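The DisMax approach mentioned above can be configured once on the request handler instead of rewriting each query. A minimal sketch, assuming the /select handler and the field names from the question's schema; the choice of fields in qf and the absence of boosts are illustrative, not taken from the original answer:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the extended DisMax parser for plain user queries. -->
    <str name="defType">edismax</str>
    <!-- Fields searched when the query names no field; an optional
         boost can be added per field, e.g. name^2. -->
    <str name="qf">name subject description comments author</str>
  </lst>
</requestHandler>
```

With this in place, q=penguins is matched against every field listed in qf, so the description: prefix is no longer needed.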
