Solr Suggester: AnalyzingInfixLookupFactory - "Store Lookup build failed"

I have this configuration (with Solr 5.3.1):
<searchComponent class="solr.SuggestComponent" name="suggest">
<lst name="suggester">
<str name="name">suggest</str>
<str name="storeDir">dict_suggest</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="highlight">false</str>
<str name="field">suggestion</str>
<str name="suggestAnalyzerFieldType">suggest</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="payloadField">id</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="suggest">true</str>
<str name="suggest.dictionary">suggest</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The field in schema.xml is defined as <field name="suggestion" type="suggest" indexed="true" stored="true" required="true" multiValued="true" />.
The field type definition is this:
<fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Each time I try to build the suggester index, Solr logs "Store Lookup build failed".
There's no stack trace or further description in the logs.
Am I missing something in the config? The suggester itself seems to work, so the in-memory index is fine; only storing it fails.
Thanks
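A possible cause, stated as an assumption rather than a confirmed diagnosis: in the Solr 5.x source, SolrSuggester logs "Store Lookup build failed" when `storeDir` is set and the lookup's `store()` call returns false, and AnalyzingInfixSuggester keeps its data in its own Lucene index instead of serializing to a file, so its `store()` returns false. If that is the cause, a sketch of the same suggester that drops `storeDir` and uses `indexPath` (the AnalyzingInfixLookupFactory parameter for its on-disk index directory) instead might look like this:

```xml
<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">suggest</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <!-- assumption: indexPath replaces storeDir; the infix suggester
         persists its own Lucene index in this directory -->
    <str name="indexPath">dict_suggest</str>
    <str name="highlight">false</str>
    <str name="field">suggestion</str>
    <str name="suggestAnalyzerFieldType">suggest</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="payloadField">id</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

Everything except the `storeDir` to `indexPath` swap is copied from the configuration above.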

Related

Returning single word from Solr Suggester

I am developing a web application and am using Solr as its search engine. I would like to add autocomplete functionality. To do this, I have added the Suggester component and configured a separate field for it. This works OK.
The problem is that Suggester returns the whole value of the field. For example, if the name of an article is "A newsworthy item" and I search for "new", it will return the whole "A newsworthy item", where I would like it to just return "newsworthy". In other words, return the individual word tokens.
The schema looks like this:
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="term" type="text_autocomplete" indexed="true" stored="true" multiValued="false" />
<field name="weight" type="float" indexed="true" stored="true" />
<copyField source="name" dest="term"/>
The values are copied into the "term" field. The Solr config:
<!-- Search component -->
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggester</str>
<str name="lookupImpl">AnalyzingLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">term</str>
<str name="weightField">weight</str>
<str name="suggestAnalyzerFieldType">text_autocomplete</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
<!-- Search handler -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggester</str>
<str name="suggest.build">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Can anyone suggest a schema and/or configuration that will make the Suggester return a single word?
Instead of solr.SuggestComponent, try using solr.SpellCheckComponent, since SuggestComponent is meant to suggest the full phrase.
You can find the details of solr.SpellCheckComponent here:
http://wiki.apache.org/solr/SpellCheckComponent
For quick reference, you can try this:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
<str name="field">term</str>
<str name="accuracy">0.7</str>
<float name="thresholdTokenFrequency">.0001</float>
</lst>
</searchComponent>
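The component above still needs a request handler that enables it. A sketch, modeled on the spellcheck-based /suggest handlers elsewhere on this page (the handler name and the count of 10 are assumptions):

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- the Suggester lives inside a SpellCheckComponent, so spellcheck.* params apply -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Because `term` is whitespace-tokenized, the suggestions should be single indexed words rather than whole field values. The dictionary may need to be built once first, for example by adding `spellcheck.build=true` to a request.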

Returning an entire Document on Solr Suggestion

I have implemented a basic Solr suggester, and I am able to get the suggested terms.
But is there a way to return the entire Solr document based on the suggestion?
Here are the searchComponent and requestHandler in solrconfig.xml:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
<str name="field">complete_search</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The field and fieldType definitions in schema.xml are as follows:
<field name="complete_search" type="text_auto" indexed="true" stored="true" multiValued="true"/>
<fieldType class="solr.TextField" name="text_auto">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
The result I am getting is as follows:
<arr name="suggestion">
<str>global academy for learning</str>
<str>global art</str>
<str>global institute of fine arts</str>
<str>global kids</str>
<str>global music academy</str>
<str>global residential school</str>
<str>globetrippers</str>
<str>globetrotters</str>
<str>glorious kids</str>
<str>glow tennis academy</str>
</arr>
My query is http://localhost:8983/solr/core_name/suggest?q=glo
So is there a way to get output in the form of a Solr document, like this:
<doc>
<str name="id">35716</str>
<str name="PID">35716</str>
<str name="service_name">Cherubs Montessori</str>
<arr name="complete_search">
<str>Cherubs Montessori</str>
<str>Arts and Crafts</str>
<str>No 173, 9th Main Road, 7th Sector, HSR Layout</str>
<str>Bangalore</str>
<str>HSR Layout</str>
</arr>
<str name="permalink">http://zp.local/extracurricular-activities/cherubs-montessori-at-hsr-layout-in-bangalore</str>
<arr name="categories">
<str>Arts and Crafts</str>
</arr>
<float name="average_ratings">0.0</float>
<str name="lat_lng">12.9102859,77.6450215</str>
<str name="listing_thumbnail">/uploads/2015/09/Cherubs-Montessori-300x122.jpg</str>
<float name="maximum_age">14.0</float>
<float name="minimum_age">5.0</float>
<str name="address">No 173, 9th Main Road, 7th Sector, HSR Layout</str>
<str name="city">Bangalore</str>
<str name="locality">HSR Layout</str>
<long name="_version_">1514279153996660736</long></doc>
It is not possible at the moment. You can return only one field, in the payload, along with each suggestion. You can find more information here.
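One way to work within that limit, sketched under assumptions rather than taken from the answer: switch to solr.SuggestComponent with DocumentDictionaryFactory and use `payloadField` to return each matching document's `id`, then fetch the full documents with a normal query. The field and type names come from the question; the `lookupImpl` choice and the omission of `weightField` mirror the Solr 5.3.1 configuration at the top of this page and may not work on older versions.

```xml
<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">suggest</str>
    <str name="lookupImpl">AnalyzingLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">complete_search</str>
    <!-- only one field can be returned as the payload; the id is enough to fetch the document -->
    <str name="payloadField">id</str>
    <str name="suggestAnalyzerFieldType">text_auto</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

The handler would then use `suggest=true` and `suggest.dictionary=suggest` instead of the `spellcheck.*` parameters. A suggestion whose payload is, say, 35716 can then be expanded with an ordinary request such as `/select?q=id:35716`.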

Solr4 - spellcheck issue with multi terms

I'm having trouble with spellcheck.
If I send a request with "wrd", spellcheck gives me the suggestion I want: "word". But if I send a request with multiple terms, like "wrd black", spellcheck returns correctlySpelled set to true.
I want the spellcheck suggestion "word black".
Note that if I send a request with "wrd blck", spellcheck gives me the suggestion I want ("word black").
I don't think this is normal behaviour, but I can't find where the problem is.
Here is my solrconfig.xml:
<config>
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<lst name="defaults">
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">15</str>
<str name="spellcheck.maxCollations">10</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="buildOnOptimize">true</str>
<str name="buildOnCommit">true</str>
<float name="thresholdTokenFrequency">.01</float>
</lst>
</searchComponent>
</config>
And in my schema.xml:
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true" />
<copyField source="attr_*" dest="spell" />
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
</fieldType>
Does anyone have any ideas?
There seems to be a bug when one of the query terms is spelled correctly and the spellcheck configuration has maxCollationTries > 1. I can't say for sure that it's a bug; I am going through the code to find out.
Remove this setting from your handler's default params:
<str name="spellcheck.maxCollationTries">15</str>
Instead, pass it as a query parameter (spellcheck.maxCollationTries=15) and try again.
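Applied to the question's configuration, the handler would look roughly like this: a sketch that only drops `spellcheck.maxCollationTries` from the defaults and leaves everything else as posted.

```xml
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <!-- spellcheck.maxCollationTries removed here; pass it per request instead -->
    <str name="spellcheck.maxCollations">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

A test request for the "wrd black" case would then add `spellcheck.maxCollationTries=15` explicitly.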

Solr dictionary based suggester won't suggest on whole phrase

When I send a query containing multiple words to my Suggester component, I get separate results for each word. The problem is well explained here: How to have Solr autocomplete on whole phrase when query contains multiple terms?
The only difference is that my suggester is based on a dictionary file, not an index field. The solution explained in the above link and many others didn't work.
Here is the configuration:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
<str name="buildOnCommit">true</str>
<str name="suggestAnalyzerFieldType">text_suggest</str>
<str name="sourceLocation">suggestionsFull.txt</str>
</lst>
<str name="queryAnalyzerFieldType">text_suggest</str>
<!-- <queryConverter name="queryConverter" class="org.apache.solr.spelling.SuggestQueryConverter"/> -->
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">false</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
schema.xml
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.TurkishLowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>
</fieldType>
I also use the spellcheck.q parameter instead of q:
http://localhost:8983/solr/collection1/suggest?spellcheck.q=bu+bir&wt=json&indent=true
What am I doing wrong?
Finally I found the solution:
It looks like even if you build the suggestion dictionary from a file rather than from an index field, you still have to specify an index field in solrconfig.xml. So, in schema.xml, create a dummy field of the text_suggest field type that we already defined:
<field name="text_suggest" type="text_suggest" indexed="false" stored="false" />
Then in the solrconfig.xml add <str name="field">text_suggest</str> line to the searchComponent:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
<str name="buildOnCommit">true</str>
<str name="suggestAnalyzerFieldType">text_suggest</str>
<str name="field">text_suggest</str>
<str name="sourceLocation">suggestionsFull.txt</str>
</lst>
</searchComponent>
Restart Solr and you're done!

SolR : full sentence spellcheck

I'm trying to configure a spellchecker to autocomplete full sentences from my query.
I've already been able to get these results:
"american israel" :
-> "american something"
-> "israel something"
But i want :
"american israel" :
-> "american israel something"
This is my solrconfig.xml :
<searchComponent name="suggest_full" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">suggestTextFull</str>
<lst name="spellchecker">
<str name="name">suggest_full</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">text_suggest_full</str>
<str name="fieldType">suggestTextFull</str>
</lst>
</searchComponent>
<requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest_full</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.onlyMorePopular">true</str>
</lst>
<arr name="last-components">
<str>suggest_full</str>
</arr>
</requestHandler>
And this is my schema.xml:
<fieldType name="suggestTextFull" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
...
<field name="text_suggest_full" type="suggestTextFull" indexed="true" stored="false" multiValued="true"/>
I've read somewhere that I have to use spellcheck.q because q uses the WhitespaceAnalyzer, but when I use spellcheck.q I get a java.lang.NullPointerException.
Any ideas?
If your spellcheck field (text_suggest_full) contains "american something" and "israel something", make sure that there is also a document/entry with the value "american israel something".
Solr will not merge "american something" and "israel something" into one term, and will not apply such a result to your spellchecking for "american israel".
Wouldn't an autocomplete approach be more suitable? See this article, for example.
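To make sure whole phrases such as "american israel something" exist as single entries, one option (an assumption, not part of the answer) is to copy a field that holds complete phrases into the keyword-tokenized text_suggest_full field from the question. The source field name `title` below is hypothetical:

```xml
<!-- suggestTextFull uses KeywordTokenizerFactory, so each value stays one entry -->
<field name="text_suggest_full" type="suggestTextFull" indexed="true" stored="false" multiValued="true"/>
<!-- hypothetical source field holding complete phrases -->
<copyField source="title" dest="text_suggest_full"/>
```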
You can use the Suggester, a flexible "autocomplete" component; you must have Solr version 3.x.
solrconfig.xml:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">name_autocomplete</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
schema.xml:
<field name="name_autocomplete" type="text" indexed="true" stored="true" multiValued="false" />
Add a copyField:
<copyField source="name" dest="name_autocomplete" />
Reload Solr, reindex everything, and test:
http://localhost:8983/solr/suggest?q=amer&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
You should get something like:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="ameri">
<int name="numFound">2</int>
<int name="startOffset">0</int>
<int name="endOffset">2</int>
<arr name="suggestion">
<str>american morocco</str>
<str>american morocco something</str>
</arr>
</lst>
<str name="collation">american morocco something</str>
</lst>
</lst>
</response>
Hope that helps.
Cheers
IMHO, a problem with the spellcheck component is that each word is spell-checked against the full index.
The "collation" of the spell-checked words does not necessarily match a single document in the index; it might be assembled from words in separate documents.
