I was taking a look at the solrconfig.xml for the dismax parser and found a bunch of values such as sku, manu and cat. What are these?
<requestHandler name="dismax" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="pf">
text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
</str>
<str name="bf">
popularity^0.5 recip(price,1,1000,1000)^0.3
</str>
<str name="fl">
id,name,price,score
</str>
<str name="mm">
2<-1 5<-2 6<90%
</str>
<int name="ps">100</int>
<str name="q.alt">*:*</str>
<!-- example highlighter config, enable per-query with hl=true -->
<str name="hl.fl">text features name</str>
<!-- for this field, we want no fragmenting, just highlighting -->
<str name="f.name.hl.fragsize">0</str>
<!-- instructs Solr to return the field itself if no query terms are
found -->
<str name="f.name.hl.alternateField">name</str>
<str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
</lst>
</requestHandler>
Those are the fields being searched: sku (stock keeping unit), manu (manufacturer) and cat (category).
You are probably looking at the solrconfig.xml that ships as a sample, meant to be used with the documents in the exampledocs/ directory.
These are the field names that the sample documents (and the sample schema) contain. It is simply a sample installation of Solr.
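For illustration, here is a rough sketch of how fields like these might be declared in the sample schema. The field names come from the config above; the field types are assumptions and differ between Solr versions, so check the schema.xml in your own distribution.

```xml
<!-- Sketch only: names are taken from the qf/pf lists above;
     the types are assumed, not copied from a specific Solr release. -->
<field name="sku"  type="string"       indexed="true" stored="true"/>
<field name="manu" type="text_general" indexed="true" stored="true"/>
<field name="cat"  type="string"       indexed="true" stored="true" multiValued="true"/>

<!-- An untokenized copy of manu, usable for exact phrase boosts in pf -->
<field name="manu_exact" type="string" indexed="true" stored="false"/>
<copyField source="manu" dest="manu_exact"/>
```

If your own documents do not have these fields, replace the names in qf, pf and hl.fl with fields from your schema.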
The Solr admin UI has a spellcheck option, but it does not show any suggestions.
How does spellcheck work with the select query?
If I query the spell URL directly, it gives me the expected result:
http://localhost:8983/solr/prashant1/spell?q=blakc&spellcheck=on&wt=json
Result
{
"responseHeader":{
"status":0,
"QTime":8},
"response":{"numFound":0,"start":0,"docs":[]
},
"spellcheck":{
"suggestions":[
"blakc",{
"numFound":10,
"startOffset":0,
"endOffset":5,
"origFreq":0,
"suggestion":[{
"word":"black",
"freq":65146},
{
"word":"blanc",
"freq":151},
{
"word":"blake",
"freq":10},
{
"word":"blac",
"freq":2},
{
"word":"block",
"freq":1863},
{
"word":"blanca",
"freq":32},
{
"word":"blank",
"freq":31},
{
"word":"blade",
"freq":23},
{
"word":"blacks",
"freq":12},
{
"word":"blanco",
"freq":11}]}],
"correctlySpelled":false,
"collations":[]}}
But I need the same result from the select query, which does not work from the Solr admin UI.
solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_general</str>
<!-- Multiple "Spell Checkers" can be declared and used by this
component
-->
<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">Name</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
<str name="distanceMeasure">internal</str>
<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
<float name="accuracy">0.5</float>
<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
<int name="maxEdits">2</int>
<!-- the minimum shared prefix when enumerating terms -->
<int name="minPrefix">1</int>
<!-- maximum number of inspections per result. -->
<int name="maxInspections">5</int>
<!-- minimum length of a query term to be considered for correction -->
<int name="minQueryLength">4</int>
<!-- maximum threshold of documents a query term can appear to be considered for correction -->
<float name="maxQueryFrequency">0.01</float>
<!-- uncomment this to require suggestions to occur in 1% of the documents
<float name="thresholdTokenFrequency">.01</float>
-->
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
It should work with:
http://localhost:8983/solr/prashant1/select?q=Name%3Ablakc&spellcheck.q=blakc&spellcheck=on
Are there any settings or steps I am missing?
Try adding the spellcheck component to the standard query handler, like this:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
You can then call it like this:
http://localhost:8983/solr/select?q=yogik&spellcheck=true
Also don't forget to build the spellcheck dictionary before you use it:
http://localhost:8983/solr/select/?q=*:*&spellcheck=true&spellcheck.build=true
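Note that the build step matters for spellcheckers that keep their own dictionary, such as solr.IndexBasedSpellChecker; solr.DirectSolrSpellChecker reads terms straight from the main index and does not need it. As a sketch, an index-based spellchecker can be told to rebuild on every commit. The field name `Name` is taken from the question; the dictionary name `indexbased` and the directory `./spellchecker` are placeholders.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <!-- "indexbased" and "./spellchecker" are placeholder names -->
    <str name="name">indexbased</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">Name</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuild the dictionary on each commit instead of sending spellcheck.build=true -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

To use it, the request (or the handler defaults) would select it with spellcheck.dictionary=indexbased.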
I am querying an alias that spans 5 collections, and I am getting suggestions even for correctly spelled words.
Example: Collection1 contains "tire policy".
Collection2 contains the word "polite".
When I query "tire policy", it returns "polite" as a suggestion for "policy".
P.S. At query time I am passing
spellcheck.maxResultsForSuggest=0
because without it the spellchecker does not correct misspelled words.
I am using DirectSolrSpellChecker.
I changed
<float name="maxQueryFrequency">0.01</float> to
<float name="maxQueryFrequency">1</float>
but I still get the same issue.
DirectSolrSpellChecker configuration:
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">text</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">org.apache.lucene.search.spell.LevenshteinDistance</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">3</int>
<float name="maxQueryFrequency">1</float>
</lst>
Spellcheck parameters inside the request handler:
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.dictionary">file</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
There should not be any suggestions for "policy", since maxQueryFrequency is set to 1.
Have you tried adjusting the maxResultsForSuggest parameter?
With your current value of 5, if your query
returns 5 or fewer results, the spellchecker will report
"correctlySpelled=false" and also offer suggestions.
(Source: Solr 8.1 documentation)
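As a sketch of where these settings live, the handler defaults below show the two parameters that most often lead to suggestions for terms that already exist in the index. The values are illustrative assumptions, not a confirmed fix for the alias setup described in the question.

```xml
<lst name="defaults">
  <str name="spellcheck">on</str>
  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.count">10</str>
  <!-- Report correctlySpelled=false and offer suggestions only when the
       query returns at most this many hits. Example value; tune as needed. -->
  <str name="spellcheck.maxResultsForSuggest">5</str>
  <!-- alternativeTermCount requests suggestions even for terms that exist
       in the index; leaving it out limits suggestions to unknown terms. -->
  <!-- <str name="spellcheck.alternativeTermCount">5</str> -->
</lst>
```

Parameters passed on the request URL override these defaults, so a request that sends its own maxResultsForSuggest value will not see the one configured here.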
I'm having an annoying issue with the spellcheck component of Solr 6.5.0. If I run a query through the spellcheck request handler, /spell, the query works as expected and I get suggested spellings for the incorrect words.
{
"responseHeader":{
"status":0,
"QTime":42},
"response":{"numFound":0,"start":0,"docs":[]
},
"spellcheck":{
"suggestions":{
"injary":{
"numFound":3,
"startOffset":0,
"endOffset":6,
"origFreq":0,
"suggestion":[{
"word":"injury",
"freq":121},
{
"word":"inward",
"freq":3},
{
"word":"injure",
"freq":1}]}},
"correctlySpelled":false,
"collations":{
"collation":{
"collationQuery":"injury",
"hits":121,
"misspellingsAndCorrections":[
"injary","injury"]},
"collation":{
"collationQuery":"inward",
"hits":3,
"misspellingsAndCorrections":[
"injary","inward"]},
"collation":{
"collationQuery":"injure",
"hits":1,
"misspellingsAndCorrections":[
"injary","injure"]}}}}
But if I run a query through the standard request handler, /select, I get no suggestions.
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"injary",
"indent":"on",
"spellcheck":"on",
"wt":"json",
"_":"1492780436450"}},
"response":{"numFound":0,"start":0,"docs":[]
}}
Any help would be greatly appreciated.
I modified the solrconfig.xml to bring the two request handlers into line as follows, the rest is default:
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">content</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
<float name="thresholdTokenFrequency">.0001</float>
</lst>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<!-- Solr will use suggestions from both the 'default' spellchecker
and from the 'wordbreak' spellchecker and combine them.
collations (re-written queries) can include a combination of
corrections from both spellcheckers -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
<str name="wt">json</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">_text_</str>
<str name="wt">json</str>
<!-- spell check component configuration -->
<str name="spellcheck">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollationTries">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
It appears the issue was related to my managed-schema file.
I am parsing XML files, and Solr automatically adds the fields from the XML files to the managed-schema file with type string. When I changed my dictionary field to type text_general, it started working as expected.
I honestly can't see why this worked, but I made no other changes. I deleted my core and started from scratch to make sure I wasn't mistaken, and it worked again.
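A likely explanation is that a string field is indexed as one untokenized term per value, so the spellchecker only sees whole field values rather than individual words. A sketch of an explicit declaration, assuming the dictionary field is `content` (as in the spellchecker config above) and that the text_general type from the default configset is available:

```xml
<!-- managed-schema: declare the field explicitly so that schemaless mode
     does not guess a string type for it -->
<field name="content" type="text_general" indexed="true" stored="true"/>
```

Existing documents need to be reindexed after the type change, which matches the experience of recreating the core from scratch.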
The following query works well for me:
http://...:8983/solr/vault/select?q=White&defType=edismax&qf=VersionComments+VersionName
It returns all the documents where VersionComments includes White.
I would like to omit the qf parameter that lists the field names.
In solrconfig.xml I wrote:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">PackageName</str>
<str name="df">Tag</str>
<str name="df">VersionComments</str>
<str name="df">VersionTag</str>
<str name="df">VersionName</str>
<str name="df">SKU</str>
<str name="df">SKUDesc</str>
</lst>
I restarted Solr and ran a full import.
Then I tried
http://...:8983/solr/vault/select?q=White&defType=edismax
but I don't get any documents back.
What am I doing wrong?
df is the default field. It only takes effect if qf is not defined, and it accepts a single field in the configuration.
You can try the configuration below with the qt=edismax parameter:
<requestHandler name="edismax" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<str name="qf">PackageName Tag VersionComments ....</str>
</lst>
</requestHandler>
You can use qf (query fields) with a boost for each field.
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<!--
[....]
-->
<str name="qf">PackageName^40.0 Tag^10.0 VersionComments^5.0 VersionTag^4.0</str>
<!--
[....]
-->
</lst>
</requestHandler>
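Putting these suggestions together, here is a sketch of a /select handler that applies edismax and the field list by default, so that a bare q=White searches all the listed fields. The field names come from the question; the boost value is an arbitrary example.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- use edismax unless the request passes another defType -->
    <str name="defType">edismax</str>
    <!-- a single qf entry, fields separated by spaces; the boost is illustrative -->
    <str name="qf">PackageName^2.0 Tag VersionComments VersionTag VersionName SKU SKUDesc</str>
    <!-- df takes exactly one field and only acts as a fallback when qf is absent -->
    <str name="df">PackageName</str>
  </lst>
</requestHandler>
```

With this in place, a request such as /select?q=White should behave like the original query that passed defType and qf explicitly, and a qf sent on the URL still overrides the default.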
In Solr 4.8.1, we can set the defaults as follows by editing solrconfig.xml:
<requestHandler name="/clustering" startup="lazy" enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
<lst name="defaults">
<!-- Configure the remaining request handler parameters. -->
<str name="defType">edismax</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
I need to get snippets from the documents where the query terms match, so that I can output results similar to the snippet Google shows beneath the website URL. For example:
Snippet - Wikipedia, the free encyclopedia
en.wikipedia.org/wiki/Snippet
A snippet is defined as a small piece of something, it may in more specific contexts refer to: Sampling (music), the use of a short phrase of a recording as an ...
I have set hl=true and even hl.fl='*' in the query URL, but no summaries are being output.
Solr FAQs say:
For a field to be summarizable it must be both stored and indexed.
I'm using Nutch and Solr and have set them up using this tutorial. What additional steps do I need to take to be able to do this?
Adding sample query and output:
http://localhost:8983/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on&hl=true
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">57</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">test</str>
<str name="hl">true</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="94" start="0">
<doc>
<arr name="anchor">
<str>User:Sir Lestaty de Lioncourt</str>
</arr>
<float name="boost">0.0</float>
<str name="digest">6c27160d0b08068f3873bb2c063508b3</str>
<str name="id">
http://aa.wikibooks.org/wiki/User:Sir_Lestaty_de_Lioncourt
</str>
<str name="segment">20111029223245</str>
<str name="title">User:Sir Lestaty de Lioncourt - Wikibooks</str>
<date name="tstamp">2011-10-29T21:34:27.055Z</date>
<str name="url">
http://aa.wikibooks.org/wiki/User:Sir_Lestaty_de_Lioncourt
</str>
</doc>
...
</result>
<lst name="highlighting">
<lst name="http://aa.wikibooks.org/wiki/User:Sir_Lestaty_de_Lioncourt"/>
<lst name="http://aa.wikipedia.org/wiki/User:PipepBot"/>
<lst name="http://aa.wikipedia.org/wiki/User:Purodha"/>
...
</lst>
</response>
Looks like you aren't specifying the field to highlight (hl.fl). You should create a text field to use for highlighting (don't use string type) and have it stored/indexed.
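A sketch of both pieces, assuming the page body is indexed in a field named `content` and that a tokenized field type named `text` exists in your schema. Both names are assumptions, so check the schema.xml you copied from Nutch. The sample response above returns no body field at all, which suggests it is currently not stored.

```xml
<!-- schema.xml: the field must be stored, otherwise there is no text
     to build snippets from -->
<field name="content" type="text" indexed="true" stored="true"/>

<!-- solrconfig.xml, inside the search handler's <lst name="defaults">:
     highlight that field by default -->
<str name="hl">true</str>
<str name="hl.fl">content</str>
<str name="hl.snippets">1</str>
<str name="hl.fragsize">150</str>
```

After changing the schema, the pages have to be reindexed before snippets appear. The same parameters can also be passed on the URL, for example by appending &hl=true&hl.fl=content to the query.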