Solr 4.10 - Suggester is not working with multi-valued field

Hello everyone, I am using Solr 4.10 and I am not getting the results I expect. I want auto-complete suggestions drawn from multiple fields: discountCatName, discountSubName and vendorName. I have created a multi-valued field "suggestions" using copyField, and I use that field in the suggester configuration.
Note: discountSubName and discountCatName are themselves multi-valued fields; vendorName is a string.
This is the suggestions field data from one of my documents:
"suggestions": [
"Budget Car Rental",
"Car Rentals",
"Business Deals",
"Auto",
"Travel",
"Car Rentals" ]
If I type "car", I get "Budget Car Rental" as a suggestion but not "Car Rentals". My configuration is below. Let me know if I need to change the tokenizer or filters. Any help would be appreciated.
Below is the configuration for the scenario explained above.
The suggestion field, fieldType, searchComponent and request handler that I am using for auto-complete suggestions, respectively:
<!--suggestion field -->
<field name="suggestions" type="suggestType" indexed="true" stored="true" multiValued="true"/>
<copyField source="discountCatName" dest="suggestions"/>
<copyField source="discountSubName" dest="suggestions"/>
<copyField source="vendorName" dest="suggestions"/>
<!--suggest fieldType -->
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
<!--suggest searchComponent configuration -->
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">analyzing</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="blenderType">linear</str>
<str name="minPrefixChars">1</str>
<str name="doHighlight">false</str>
<str name="weightField">score</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">suggestions</str>
<str name="buildOnStartup">true</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<!--suggest request handler -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">analyzing</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>

By debugging the Solr 4.10 source code, I discovered a bug in the DocumentDictionaryFactory lookup: for a multi-valued field it only looks at the first value and then stops producing suggestions from that document, which is why I was not getting the expected output from the configuration above.
As a workaround, I created a separate index for all the fields I want to search, such as catName0...catName10 and subName0...subName10, then created a suggestion dictionary for each field. Finally, I parsed the responses from all the suggestion dictionaries, merged them, and sorted them by weight and highlight position.
It is a lengthy approach, but there was no other way since Solr 4.10 was required.
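The merge step described above might look roughly like the following sketch. It assumes the JSON response shape of SuggestComponent (a "suggest" object keyed by dictionary name, then by query, with a "suggestions" list of term/weight entries); the dictionary names in the usage note are hypothetical, and "highlight position" is approximated here as the offset of the query inside the term.

```python
from typing import Any


def merge_suggestions(response: dict[str, Any], query: str) -> list[dict[str, Any]]:
    """Merge suggestions from several suggester dictionaries into one ranked list."""
    best: dict[str, int] = {}
    # Assumed shape: response["suggest"][dictionary][query]["suggestions"]
    for per_query in response.get("suggest", {}).values():
        for result in per_query.values():
            for suggestion in result.get("suggestions", []):
                term = suggestion["term"]
                weight = int(suggestion.get("weight", 0))
                # Keep each term once, with the highest weight seen.
                if term not in best or weight > best[term]:
                    best[term] = weight

    needle = query.lower()

    def match_position(term: str) -> int:
        # Earlier matches rank first; terms without a match go last.
        pos = term.lower().find(needle)
        return pos if pos >= 0 else len(term)

    merged = [{"term": t, "weight": w} for t, w in best.items()]
    merged.sort(key=lambda s: (-s["weight"], match_position(s["term"]), s["term"]))
    return merged
```

Called with a response that contains results from, say, a catName0 dictionary and a vendorName dictionary, it returns a single de-duplicated list ordered by weight and then by where "car" occurs in each term.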

Related

Solr - Suggester custom field not detected & Multiple fields cannot work

Goal
Trying to implement an auto-suggester in Solr. The fields to extract suggestions from are title and content.
Progress thus far
I followed the official Solr guide to implement the feature, but was stuck for a long time because Solr complained that the custom field type suggestType was not defined.
After a long time of trying, I decided to add the field type to managed-schema.xml instead of schema.xml, and it worked!
So far it only works when I base the suggestion field on content; however, we would like to base suggestions on two fields, title and content.
Steps followed
1) Add the custom field type in managed-schema.xml:
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
2) Add custom field which uses custom field type in schema.xml:
<field name="suggestText" type="suggestType" stored="true" indexed="true" />
3) Add the 'suggest' component and handler in solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">fuzzySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="storeDir">fuzzy_suggestions</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">suggestText</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.dictionary">analyzingSuggester</str>
<str name="suggest.onlyMorePopular">true</str>
<str name="suggest.count">10</str>
<str name="suggest.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
4) Copy both fields 'title' and 'content' to 'suggestText' in schema.xml:
<copyField source="title" dest="suggestField"/>
<copyField source="content" dest="suggestField"/>
Questions
Why does it only work when I add the custom field type to managed-schema.xml instead of schema.xml? From my understanding, managed-schema.xml should not be manually edited.
No results appear after I map both the title and content fields to the custom field suggestText. I would like to know what I am missing.
Thanks.
It seems like you have a typo in your copy-field definition. The "dest" attribute is suggestField but the field you created earlier is called suggestText.
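A mismatch like this can be caught mechanically. The sketch below is a hypothetical helper (not part of Solr) that parses a schema fragment and lists copyField destinations that have no matching field declaration. It ignores dynamicField patterns, so treat a hit as a hint rather than proof.

```python
import xml.etree.ElementTree as ET

# Schema fragment reduced to the lines from the question.
SCHEMA = """<schema>
  <field name="suggestText" type="suggestType" stored="true" indexed="true" />
  <copyField source="title" dest="suggestField"/>
  <copyField source="content" dest="suggestField"/>
</schema>"""


def undeclared_copy_dests(schema_xml: str) -> list[str]:
    """Return copyField dest names that are not declared as <field> elements."""
    root = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in root.iter("field")}
    dests = {c.get("dest") for c in root.iter("copyField") if c.get("dest")}
    return sorted(dests - declared)


print(undeclared_copy_dests(SCHEMA))
```

It may also be worth comparing the handler's suggest.dictionary value (analyzingSuggester) with the suggester's name (fuzzySuggester) in the configuration above, since those two do not match either.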

SOLR 6.4.1 Suggester is stubbornly case-sensitive, how to make case-insensitive?

I've tried everything under the sun (well it is called solr after all) to make solr Suggest case-insensitive, but it stubbornly continues to be case-sensitive.
This returns a suggestion of Mexican:
http://localhost:8983/solr/mycollection/autocomplete?suggest.q=Mex
This returns 0 results:
http://localhost:8983/solr/mycollection/autocomplete?suggest.q=mex
To further diagnose I tried a lower case /select search against my suggestions field, which successfully returned docs containing "Mexican":
http://localhost:8983/solr/mycollection/select?q=suggestions:mex*
But no such luck using lowercase with the Suggester. It's as though my <filter class="solr.LowerCaseFilterFactory"/> has no effect when used by the Suggester.
I of course did a full config upload, collection reload, data re-index, and suggester rebuild before testing. I'm on SOLR 6.4.1 running in cloud mode. Any ideas? Diagnostic tips?
schema.xml
<fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="recipe" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="suggestions" type="textSuggest" indexed="true" stored="true" multiValued="true" />
<copyField source="recipe" dest="suggestions"/>
solrconfig.xml
<searchComponent class="solr.SuggestComponent" name="suggest">
<lst name="suggester">
<str name="name">foodsuggester</str>
<str name="lookupImpl">WFSTLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">suggestions</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<str name="storeDir">suggester_wfst_dir</str>
<str name="suggestAnalyzerFieldType">textSuggest</str>
</lst>
</searchComponent>
<requestHandler name="/autocomplete" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.dictionary">foodsuggester</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
WFSTLookupFactory apparently does not take the suggestAnalyzerFieldType parameter, so it is ignored. You could use AnalyzingLookupFactory instead, which analyzes the text according to suggestAnalyzerFieldType. So if you want lowercasing applied in the suggester, point suggestAnalyzerFieldType at your textSuggest field type.
It seems the WFSTLookupFactory lookup implementation is case-sensitive.
You can use FuzzyLookupFactory if you don't have any specific reason for using WFSTLookupFactory.
<str name="lookupImpl">FuzzyLookupFactory</str>
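As a toy model only (this is not how Solr implements its lookups), the following sketch illustrates the behaviour described in the question: a lookup that compares the raw prefix with the raw stored terms misses "mex", while one that runs the same lowercasing analysis on both sides finds "Mexican". The function name and term list are made up for illustration.

```python
from typing import Callable, Optional


def prefix_lookup(
    terms: list[str],
    prefix: str,
    analyze: Optional[Callable[[str], str]] = None,
) -> list[str]:
    """Return the terms whose (optionally analyzed) form starts with the prefix."""
    normalize = analyze or (lambda text: text)
    wanted = normalize(prefix)
    return [term for term in terms if normalize(term).startswith(wanted)]


stored_terms = ["Mexican", "Italian"]
print(prefix_lookup(stored_terms, "mex"))             # no analysis applied
print(prefix_lookup(stored_terms, "mex", str.lower))  # lowercase both sides
```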

Spellcheck Solr: solr.DirectSolrSpellChecker config

I am trying to test the spellchecking functionality with Solr 4.7.2 using solr.DirectSolrSpellChecker (where you don't need to build a dedicated index).
I have a field named "title" in my index; I used a copy field definition to create a field named "title_spell" to be queried for the spellcheck (title_spell is correctly filled). However, in the Solr admin console, I always get empty suggestions.
For example: I have a Solr document with the title "A B automobile"; in the admin console (with spellcheck checked) I enter "atuomobile" in the spellcheck.q input field. I expect to get at least something like "A B automobile" or "automobile", but the spellcheck suggestion remains empty...
My configuration:
schema.xml (only relevant part copied):
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="de_DE/synonyms.txt" ignoreCase="true"
expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
</fieldType>
...
<field name="title_spell" type="textSpell" indexed="true" stored="true" multiValued="false"/>
solrconfig.xml (only relevant part copied):
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">title_spell</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
<float name="thresholdTokenFrequency">.01</float>
</lst>
</searchComponent>
...
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
</lst>
<!-- Attempt to factor the online date into the weighting... -->
<lst name="appends">
<str name="bf">recip(ms(NOW/MONTH,sort_date___d_i_s),3.16e-11,50,1)</str>
<!--<str name="qf">title___td_i_s_gcopy^1e-11</str>-->
<str name="qf">title___td_i_s_gcopy^21</str>
<str name="q.op">AND</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
What did I miss? Thanks for your answers!
How large is your index? For a small index (think less than a few million docs), you're going to have to tune accuracy, maxQueryFrequency, and thresholdTokenFrequency. (Actually, it would probably be worth doing this on larger indices as well.)
For example, my 1.5 million doc index uses the following for these settings:
<float name="maxQueryFrequency">0.01</float>
<float name="thresholdTokenFrequency">.00001</float>
<float name="accuracy">0.5</float>
accuracy tells Solr how accurate a result needs to be before it's considered worth returning as a suggestion.
maxQueryFrequency tells Solr the maximum fraction of documents a query term may appear in and still be treated as a possible misspelling; terms that occur more often than that are assumed to be correct.
thresholdTokenFrequency tells Solr what fraction of documents a term must appear in before it's considered worth returning as a suggestion.
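To get a feel for what a given thresholdTokenFrequency means on a particular index, a back-of-the-envelope calculation helps. The helper below is a hypothetical estimate, not Solr's code; the exact rounding and comparison Solr applies are an assumption here. Decimal is used so that values like 0.00001 do not pick up floating-point error.

```python
import math
from decimal import Decimal


def approx_min_docs(threshold: str, num_docs: int) -> int:
    """Estimate how many documents a term must appear in to pass the threshold."""
    return math.ceil(Decimal(threshold) * num_docs)


# 1.5 million documents with the settings quoted above
print(approx_min_docs("0.00001", 1_500_000))
# The same index with the question's value of .01
print(approx_min_docs("0.01", 1_500_000))
```

On a small test index, a threshold of .01 can exclude a term that occurs in only one or two documents, which would be consistent with the empty suggestions described in the question.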
If you plan to use spellchecking on multiple phrases, you may need to add a ShingleFilter to your title_spell field.
Another thing you might try is setting your queryAnalyzerFieldType to title_spell.
Can you please try editing your requestHandler declaration.
<requestHandler name="/standard" class="solr.SearchHandler" default="true">
and query url as:
http://localhost:8080/solr/service/standard?q=<term>&qf=title_spell
Experiment with short terms first and see how it behaves. One problem here is that it will only return terms that start with the query term. You can use FuzzyLookupFactory, which will match and return fuzzy results. For more information, check the Solr suggester wiki.

Solr Spell Check

I am working with Solr Spell Check. I got it up and running. However, for certain misspellings it is not giving the expected result:
Correct word: Cancer
Incorrect spellings: cacner, cacnar, cancar, cancre, cancere.
I am not getting "cancer" as the suggestion for "cacner"; instead it shows "inner", which, although it sounds somewhat like cacner, is not the correct suggestion. And for cacnar I get "pulmonary" as the suggestion.
Is there any way of configuring it to display cancer instead of the other results?
Alternatively, is there any score for the suggestions that can be checked before showing them to the user?
As per request here is the configuration :
The field used for dictionary (in schema.xml):
<copyField source="procname" dest="dtextspell" />
<field name = "dtextspell" stored="false" type="text_small" multiValued="true" indexed="true"/>
Definition of "text_small" (again in schema.xml) :
<fieldType name="text_small" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
<analyzer type ="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
</fieldType>
In solrconfig.xml :
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_small</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="field">dtextspell</str>
<float name="thresholdTokenFrequency">.0001</float>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="field">name</str>
<str name="buildOnCommit">true</str>
</lst></searchComponent>
Attached it to the select request handler like this :
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="spellcheck.count">10</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr> </requestHandler>
To build the spell check :
http://localhost:8080/solr/select?q=*:*&spellcheck=true&spellcheck.build=true
To search for term :
http://localhost:8080/solr/select?q=procname:%22cacner%22&spellcheck=true&defType=edismax
The response XML :
<lst name="spellcheck"><lst name="suggestions">
<lst name="cacner">
<int name="numFound">1</int>
<int name="startOffset">10</int>
<int name="endOffset">16</int>
<arr name="suggestion">
<str>inner</str> <end tags start from here>
Hope it helps !!
Sounds like you've not rebuilt the spellchecker's index recently. Request a manual rebuild by making a query with spellcheck=true&spellcheck.build=true appended to the query string (do NOT do this on every request, as the build process can take some time). You should also make sure that you're using the correct field to build your spellchecker's index.
You can also configure the spellchecker component to rebuild the index on every commit or on every optimize, by adding:
<str name="buildOnCommit">true</str>
or
<str name="buildOnOptimize">true</str>
to your spellchecker configuration.
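For a one-off manual rebuild, the request URL can be assembled as in this small sketch. The base URL is the one from the question; the helper name is made up, and the code only builds the URL rather than sending the request.

```python
from urllib.parse import urlencode


def spellcheck_build_url(base: str) -> str:
    """Build a URL that asks the spellcheck component to rebuild its index."""
    params = {"q": "*:*", "spellcheck": "true", "spellcheck.build": "true"}
    # urlencode percent-encodes "*:*" so the URL is safe to paste or send.
    return f"{base}?{urlencode(params)}"


print(spellcheck_build_url("http://localhost:8080/solr/select"))
```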

solr spellchecker with phonetic filters

I have tried to use phonetic filters for the field that indexes spellings (solr 1.4). Following is the fieldType configuration in schema.xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
</analyzer>
</fieldType>
However, I do not see any difference when the phonetic filter is used (the size of the spellchecker index remains the same and there is no difference in corrections). Are phonetic filters ignored when used with spellcheckers, or is there an issue with my configuration?
solrConfig.xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">loc_name_texts</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
UPDATE:
I had initially configured the filters wrongly, so WhitespaceTokenizer was being used all the time. I have corrected that now. However, when phonetic filters are used, Solr returns the transformed data (metaphones). Is there any way to get the content stored as part of the field?
Phonetic filters in Solr are not used to return a corrected suggestion. They are used to match a document even if the query is misspelled.
The spellcheck component is used to return a corrected suggestion, but it works only on fields containing whole words, not on phonetic fields.
Try changing the 'spellcheck' element to this:
<bool name="spellcheck">true</bool>

Resources