I am working on an application that requires me to use Solr for the first time. I have it set up, indexing the correct data, and answering queries the way I want, but I cannot get the spellcheck component to work properly. No matter what I query, the spellchecker returns no suggestions. I have included the relevant parts of my solrconfig.xml and schema.xml.
schema.xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
</fieldType>
<!-- CUT -->
<field name="spell" type="textSpell" indexed="true" stored="true" />
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.onlyMorePopular">false</str>
<!-- <str name="spellcheck.extendedResults">false</str> -->
<str name="spellcheck.count">3</str>
<str name="qf">
frontlist_flapcopy^0.5 title^2.0 subtitle^1.0 series^1.5 author^3.0 frontlist_ean^6.0
</str>
<str name="pf">
frontlist_flapcopy^0.5 title^2.0 subtitle^1.0 series^1.5 author^3.0 frontlist_ean^6.0
</str>
<str name="fl">
title,subtitle,series,author,eans,formats,prices,frontlist_ean,onsaledate,imprint,frontlist_flapcopy
</str>
<str name="mm">
2<-1 5<-2 6<90%
</str>
<int name="ps">100</int>
<bool name="hl">true</bool>
<str name="q.alt">*:*</str>
<str name="hl.fl">title,subtitle,series,author,frontlist_flapcopy</str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.subtitle.hl.fragsize">0</str>
<str name="f.subtitle.hl.alternateField">url</str>
<str name="f.series.hl.fragsize">0</str>
<str name="f.series.hl.alternateField">url</str>
<str name="f.author.hl.fragsize">0</str>
<str name="f.author.hl.alternateField">url</str>
<str name="f.frontlist_flapcopy.hl.fragsize">0</str>
<str name="f.frontlist_flapcopy.hl.alternateField">url</str>
<str name="echoParams">explicit</str>
<float name="accuracy">0.7</float>
</lst>
<lst name="appends">
<str name="fq">forsaleinusa:true</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<!-- CUT -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">default</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">/path/to/my/spell/index</str>
<str name="accuracy">0.7</str>
<float name="thresholdTokenFrequency">.0001</float>
</lst>
<lst name="spellchecker">
<str name="name">jarowinkler</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="field">spell</str>
<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
<str name="spellcheckIndexDir">/path/to/my/spell/index</str>
</lst>
<str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>
When I go to http://localhost:8983/solr/select/?q=query&spellcheck.build=true then look at the files generated in /path/to/my/spell/index, there is a segments.gen and a segments_1, both of which contain only a few bytes of binary data. Then, when I enter a query and append &spellcheck=true to the query string, I get no suggestions, no matter my query:
<lst name="spellcheck">
<lst name="suggestions"/>
</lst>
Any idea what is going on here?
I had a very similar problem that I was never able to resolve. Someone posted a detailed answer on my question that may be able to help you out:
solr suggester not returning any results
I ended up resolving this issue a while ago, but to my recollection, the issue was that I was using multiple <copyField/> directives to copy data to the "spell" field, but I did not set multiValued="true" on that field. When I made the spellcheck field multivalued, it worked like a charm!
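For anyone hitting the same thing, the fix described above might look roughly like this in schema.xml. This is only a sketch: the `title` and `author` source fields are assumptions borrowed from the `qf` list in the question, not the actual copyField sources.

```xml
<!-- Sketch: a spell field fed by several copyField directives.
     multiValued="true" lets each copied source land as its own value. -->
<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true" />

<!-- Assumed source fields; substitute the fields you actually copy from. -->
<copyField source="title" dest="spell" />
<copyField source="author" dest="spell" />
```

After a schema change like this, the documents need to be reindexed and the spellcheck index rebuilt (for example with spellcheck.build=true) before suggestions can appear.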
Related
I have a Solr core (Solr version 6.4.1) where I'm also using a spellcheck component.
The problem is that as long as I have fewer than 30k documents, my spellchecker works fine. Increasing the number of documents to 30k or more causes the spellchecker to return no results at all.
I'm aware of parameters in the solrconfig.xml file, such as maxQueryFrequency and thresholdTokenFrequency, but altering them did not solve the problem.
I also read these: Apache Solr : Search is not returning result for large document indexed, Solr spellchecker not returning any results, solr suggester not returning any results and Solr spellcheckin randomly working, but they didn't help either.
These are the relevant parts in solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_general</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">_spellcheck_</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.1</float>
<float name="thresholdTokenFrequency">.0000001</float>
</lst>
</searchComponent>
and, in my request handler:
<bool name="spellcheck">true</bool>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">3</str>
_spellcheck_ is a copyField destination (source="*"), indexed as text_general, which is defined as:
<fieldType name="text_general" class="solr.TextField" >
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory" />
<tokenizer class="solr.ClassicTokenizerFactory" />
<filter class="solr.ClassicFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.HyphenatedWordsFilterFactory" />
</analyzer>
<analyzer type="query">
<charFilter class="solr.HTMLStripCharFilterFactory" />
<tokenizer class="solr.ClassicTokenizerFactory" />
<filter class="solr.ClassicFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.HyphenatedWordsFilterFactory" />
</analyzer>
</fieldType>
Any advice?
After some more work I found that the culprit was the spellcheck.maxResultsForSuggest parameter. The value of 5 was too low for the size of my data set; setting it to 100 in my search handler solved my problem:
<str name="spellcheck.maxResultsForSuggest">100</str>
Hope this will help somebody.
I'm having trouble with spellcheck.
If I send a request with "wrd", spellcheck gives me the suggestion I want: "word". But if I send a request with multiple terms, like "wrd black", spellcheck returns correctlySpelled set to true and no suggestion.
I want the spellcheck suggestion "word black".
Note that if I send a request with "wrd blck", spellcheck gives me the suggestion I want ("word black").
I don't think this is normal behaviour, but I can't find where the problem is.
Here is my solrconfig.xml :
<config>
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<lst name="defaults">
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">15</str>
<str name="spellcheck.maxCollations">10</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="buildOnOptimize">true</str>
<str name="buildOnCommit">true</str>
<float name="thresholdTokenFrequency">.01</float>
</lst>
</searchComponent>
</config>
and in my schema.xml :
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true" />
<copyField source="attr_*" dest="spell" />
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
</fieldType>
Does anyone have any ideas?
There seems to be a bug when one of the query terms is spelled correctly and the spellcheck configuration has maxCollationTries > 1. I can't say for sure that it's a bug; I am going through the code to find out.
Remove this setting from the default params of your handler:
<str name="spellcheck.maxCollationTries">15</str>
You can instead pass it as a query parameter, spellcheck.maxCollationTries=15, and try again.
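As a rough illustration of that suggestion, the handler defaults from the question would look like this with the maxCollationTries entry removed. Everything else is copied from the question; whether this actually sidesteps the behaviour is unconfirmed.

```xml
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <!-- spellcheck.maxCollationTries removed from the defaults;
         pass it per request instead if you still want it -->
    <str name="spellcheck.maxCollations">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With this in place, a request can still opt in by appending spellcheck.maxCollationTries=15 to its query string.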
I have search suggestions working pretty well, and I like that I get suggestions even if the original keyword returned results (in case we have documents with misspellings in our collection). However, I often get suggestions that return exactly the same results. For example, if I search for yellow mint tin, I get "Did you mean yellow mint tins?"
Is there a way to remove suggestions that return the same results as the original term?
I'm using solr 4.6.0
Here's the info from solrconfig.xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_general</str>
<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell2</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
<str name="distanceMeasure">internal</str>
<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
<float name="accuracy">0.1</float>
<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
<int name="maxEdits">2</int>
<!-- the minimum shared prefix when enumerating terms -->
<int name="minPrefix">0</int> <!-- if set to 1, must start with same letter -->
<!-- maximum number of inspections per result. -->
<int name="maxInspections">5</int>
<!-- minimum length of a query term to be considered for correction -->
<int name="minQueryLength">4</int>
<!-- maximum threshold of documents a query term can appear to be considered for correction -->
<float name="maxQueryFrequency">0.01</float>
</lst>
<!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">spell2</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
<str name="buildOnCommit">true</str>
<int name="minBreakLength">3</int>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">none</str>
<int name="rows">10</int>
<str name="df">contents</str>
<str name="defType">edismax</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">25</str>
<str name="spellcheck.maxResultsForSuggest">25</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.collateParam.defType">dismax</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
Here's the info from schema.xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="spell2" type="text_general" indexed="true" stored="false" required="false" multiValued="true" />
An example query - http://localhost:8985/solr/(collection)/spell?q=yellow%20buttermints
returns
<str name="collation">yellow (butter mints)</str>
<str name="collation">yellow buttermint</str>
"yellow buttermints" and "yellow buttermint" return the same results.
I don't think there is a definitive way to guarantee this, but the following should help.
Add the EnglishMinimalStemFilterFactory filter at both query and index time:
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-EnglishMinimalStemFilter
I am not sure how SynonymFilterFactory would behave in this case. You could also try it without that filter.
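A minimal sketch of what that could look like, based on the text_general type from the question. Placing the stemmer last in each chain is an assumption, not something the answer specifies, and the SynonymFilterFactory is left in place here even though it may need to be removed or adjusted.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added: light plural stemming, so "tin" and "tins" index as the same term -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added: must mirror the index-time chain so query terms are stemmed the same way -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index-time analysis changes, existing documents would need to be reindexed before the dictionary field reflects the stemmed terms.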
I am working with Solr spellcheck. I got it up and running; however, for certain misspellings it does not give the expected result:
Correct word: Cancer
Incorrect spellings: cacner, cacnar, cancar, cancre, cancere.
I am not getting "cancer" as the suggestion for "cacner"; instead it shows "inner", which, although it sounds more like cacner, is not the correct suggestion. And for cacnar I am getting 'pulmonary' as the suggestion.
Is there any way of configuring it to display cancer instead of the other results?
Alternatively, is there a score for the suggestions that can be checked before showing them to the user?
As requested, here is the configuration:
The field used for the dictionary (in schema.xml):
<copyField source="procname" dest="dtextspell" />
<field name = "dtextspell" stored="false" type="text_small" multiValued="true" indexed="true"/>
Definition of "text_small" (again in schema.xml) :
<fieldType name="text_small" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
<analyzer type ="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
</fieldType>
In solrconfig.xml :
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_small</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="field">dtextspell</str>
<float name="thresholdTokenFrequency">.0001</float>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="field">name</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
Attached it to the select request handler like this :
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="spellcheck.count">10</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
To build the spell check :
http://localhost:8080/solr/select?q=*:*&spellcheck=true&spellcheck.build=true
To search for term :
http://localhost:8080/solr/select?q=procname:%22cacner%22&spellcheck=true&defType=edismax
The response XML :
<lst name="spellcheck"><lst name="suggestions">
<lst name="cacner">
<int name="numFound">1</int>
<int name="startOffset">10</int>
<int name="endOffset">16</int>
<arr name="suggestion">
<str>inner</str> <end tags start from here>
Hope it helps !!
It sounds like you haven't rebuilt the spellchecker's index recently. Request a manual rebuild by making a query with spellcheck=true&spellcheck.build=true appended to the query string (do NOT do this on every request, as the build process can take some time). You should also make sure that you're using the correct field to build your spellchecker's index.
You can also configure the spellchecker component to rebuild the index on every commit or on every optimize, by adding:
<str name="buildOnCommit">true</str>
or
<str name="buildOnOptimize">true</str>
to your spellchecker configuration.
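On the point about using the correct field: the spellchecker block in the question declares `field` twice (once as dtextspell and once as name). A cleaned-up sketch with a single field entry might look like this; it assumes dtextspell, the copyField destination shown in the question, is the intended dictionary source.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_small</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- Assumption: dtextspell is the field the dictionary should be built from;
         the second, conflicting "field" entry has been dropped -->
    <str name="field">dtextspell</str>
    <float name="thresholdTokenFrequency">.0001</float>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Rebuild the spellcheck index automatically after each commit -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```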
I have tried to use phonetic filters for the field that indexes spellings (solr 1.4). Following is the fieldType configuration in schema.xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
</analyzer>
</fieldType>
However, I do not see any difference when the phonetic filter is used (the size of the spellchecker index remains the same and the corrections do not change). Are phonetic filters ignored when used with spellcheckers, or is there an issue with my configuration?
solrconfig.xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">loc_name_texts</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
UPDATE:
I had initially configured the filters incorrectly, so WhitespaceTokenizer was being used all the time. I have corrected that now. However, when phonetic filters are used, Solr returns the transformed data (metaphones). Is there any way to get the original content stored in the field instead?
Phonetic filters in Solr are not used to return a corrected suggestion. They are used to match a document even if the query is misspelled.
The spellcheck component is what returns a corrected suggestion, but it works only on fields that contain whole words, not on phonetic fields.
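One way to act on that distinction is to keep two separate fields: a plain one for the spellchecker dictionary and a phonetic one used only for matching. The sketch below assumes this split; the names textPhonetic, loc_name_spell and loc_name_phonetic are hypothetical, and only loc_name_texts and the filter settings come from the question.

```xml
<!-- Plain type for the spellchecker dictionary: whole words, no phonetic encoding -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical phonetic type, used only to match documents despite misspellings -->
<fieldType name="textPhonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field names; both are populated from the original field -->
<field name="loc_name_spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<field name="loc_name_phonetic" type="textPhonetic" indexed="true" stored="false" multiValued="true"/>
<copyField source="loc_name_texts" dest="loc_name_spell"/>
<copyField source="loc_name_texts" dest="loc_name_phonetic"/>
```

In this arrangement the spellchecker's `field` setting would point at loc_name_spell, while queries that should tolerate misspellings would search loc_name_phonetic.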
Try changing the 'spellcheck' element to this:
<bool name="spellcheck">true</bool>