I've followed the Solr wiki article on the Suggester almost to the letter: http://wiki.apache.org/solr/Suggester. I have the following XML in my solrconfig.xml:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">description</str>
<float name="threshold">0.05</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
However, when I run the following query (or something similar):
../suggest/?q=barbequ
I only get the following result xml back:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">78</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions"/>
</lst>
</response>
As you can see, this isn't very helpful. Any suggestions to help resolve this?
A couple of things I can think of that might cause this problem:
The source field ("description") is incorrect - ensure that this is indeed the field that seeds terms for your spell checker. It could even be that the field is a different case (eg. "Description" instead of "description").
The source field in your schema.xml is not set up correctly or is being processed by filters that cause the source dictionary to be invalid. I use a separate field to seed the dictionary, and use <copyField /> to copy relevant other fields to that.
The term "barbeque" doesn't appear in at least 5% of records (you've indicated this requirement by including <float name="threshold">0.05</float>) and therefore is not included in the lookup dictionary.
In SpellCheckComponent the <str name="spellcheck.onlyMorePopular">true</str> setting means that only terms that would produce more results are returned as suggestions. According to the Suggester documentation this has a different function (sorting suggestions by weight) but it might be worth switching this to false to see if it is causing the issue.
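To illustrate the threshold point: below is a rough Python sketch (plain Python, not Solr code) of how a document-frequency cutoff such as threshold=0.05 can drop a rare term from a dictionary. The function name, the sample documents, and the >= comparison are my own assumptions for illustration; Solr's actual implementation may differ in detail.

```python
from collections import Counter


def terms_above_threshold(docs, threshold):
    """Return the terms that occur in at least `threshold` (a fraction) of docs.

    Hypothetical helper: it only mimics the idea of a document-frequency
    cutoff and is not how Solr is implemented internally.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    min_docs = threshold * len(docs)
    return {term for term, df in doc_freq.items() if df >= min_docs}


# Made-up corpus: "barbeque" occurs in 1 of 40 documents (2.5%).
docs = ["barbeque sauce"] + ["grill tongs"] * 39
kept = terms_above_threshold(docs, 0.05)
print("barbeque" in kept)  # the rare term falls below the 5% cutoff
```

If your real index behaves like this, lowering or removing the threshold and rebuilding the dictionary is a quick way to confirm it.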
Relevant parts of my schema.xml:
<schema>
<types>
<!-- Field type specifically for spell checking -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.StandardFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.StandardFilterFactory" />
</analyzer>
</fieldType>
</types>
<fields>
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true" />
</fields>
<!-- Copy fields which are used to seed the spell checker -->
<copyField source="name" dest="spell" />
<copyField source="description" dest="spell" />
</schema>
Could the problem be that you're querying /suggest instead of /spell?
../suggest/?q=barbequ
In my setup this is the string I pass in:
/solr/spell?q=barbequ&spellcheck=true&spellcheck.collate=true
And the first time you do a spellcheck you need to include
&spellcheck.build=true
I'm running on Solr 4, btw. So perhaps /suggest is an entirely different endpoint that does something else. If so, I apologize.
Please check whether the term vector parameters are set in schema.xml, like:
<field name="TEXT" type="text_en" indexed="true" stored="true" multiValued="true"
termVectors="true"
termPositions="true"
termOffsets="true"/>
Then restart Solr and reindex.
I am new to Solr and trying to provide partial word matching with Solr 8.8.1, but partials are giving no results. I have combed the blogs without luck to fix this.
For example, the text of the document contains the word longer. Index analysis gives lon, long, longe, longer. If I query longer using alltext_en:longer, I get a match. However, if I query (for example) longe using alltext_en:longe, I get no match. explainOther returns 0.0 = No matching clauses.
It seems that I am missing something obvious, since this is not a complex phrase query.
Apologies in advance if I have missed any needed details - I will update the question if you tell me what else is needed to know.
Here are the relevant field specs from my managed-schema:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="3"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<field name="alltext_en" type="text_en" multiValued="true" indexed="true" stored="true"/>
<copyField source="*_txt_en" dest="alltext_en"/>
Here is the relevant part of solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- Query settings -->
<str name="defType">edismax</str>
<str name="q">*:*</str>
<str name="q.alt">*:*</str>
<str name="rows">50</str>
<str name="fl">*,score,[explain]</str>
<str name="ps">10</str>
<!-- Highlighting defaults -->
<str name="hl">on</str>
<str name="hl.fl">_text_</str>
<str name="hl.preserveMulti">true</str>
<str name="hl.encoder">html</str>
<str name="hl.simple.pre">&lt;span class="artica-snippet"&gt;</str>
<str name="hl.simple.post">&lt;/span&gt;</str>
<!-- Spell checking defaults -->
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">3</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
That stemming filter will modify the tokens in ways you might not predict, and since, when querying, it is applied only to the token you are trying to match against the ngrammed tokens, that token might not be what you expect. If you're generating ngrams, stemming filters should usually be removed. I'd also remove the possessive filter. (Also, small note: try to avoid using * when formatting text, since it's hard to know whether you've used it when querying or the formatting is an error; instead use a backtick to indicate that the text is a code keyword/query.) – MatsLindh
That answered it - I removed the stemmer from the index step and everything was fine. Brilliant, thank you, @MatsLindh!
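For anyone reproducing this, here is a small Python sketch (not Solr code) of the edge n-gram expansion with minGramSize=3 and maxGramSize=15, and of why a query token must equal one of the indexed grams exactly. The helper names and the altered token "longi" are made up for illustration; I am not claiming that is what the Porter stemmer actually produces.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the leading-edge n-grams of `token`, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def matches(query_token, indexed_grams):
    """A term query matches only if the query token equals an indexed gram."""
    return query_token in indexed_grams


# Same gram sizes as the index analyzer in the question.
indexed = edge_ngrams("longer", 3, 15)
print(indexed)                    # the grams stored for "longer"
print(matches("longe", indexed))  # an unmodified prefix is one of the grams
print(matches("longi", indexed))  # hypothetical filter-altered token is not
```

Any query-side filter that rewrites the token so that it is no longer a plain prefix of the indexed token will miss in the same way.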
I am using Lucene for indexing and Solr for searching, and I have the following requirements.
Example: “Test Five”
Highest priority - Words having both “Test” and “Five”
Next - Words having only the leftmost word, “Test”
Next - Words having the next word in the list, “Five” (left to right), and so on
Here is my schema:
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="acSearch" type="searchFieldType" required="false" indexed="true" stored="false" multiValued="true" />
<copyField source="name" dest="acSearch" />
<fieldType name="searchFieldType" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[,]+" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[,]+" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="wt">json</str>
<str name="indent">true</str>
<str name="fl">name</str>
<str name="rows">200</str>
<str name="df">dySearch</str>
<str name="sort">score desc</str>
</lst>
<arr name="components">
<str>query</str>
</arr>
</requestHandler>
I am not getting the expected output.
1. If I search for Test Five, the Test Five results come first, but the rest of the data does not follow the left-to-right priority; it comes back in no particular order.
2. If I search for Five Test, the data related to that comes first.
Could you please help?
You can just give a different boost to each of the terms when you build your query string:
q=Test^10 Five^3 last^1
With this, you don't need to mess with ngrams etc.; just use a standard analyzer.
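As a rough sketch of building such a query string on the client side (Python here; the helper name boosted_query and the boost values are only illustrative, pick whatever weights suit your ranking):

```python
from urllib.parse import urlencode


def boosted_query(terms, boosts):
    """Join terms as term^boost, keeping the user's left-to-right order."""
    return " ".join(f"{term}^{boost}" for term, boost in zip(terms, boosts))


# Earlier (leftmost) terms get the larger boost.
q = boosted_query(["Test", "Five", "last"], [10, 3, 1])
print(q)

# URL-encode it before sending it to the /select handler.
encoded = urlencode({"q": q})
print(encoded)
```

The point is only that the boosts decrease from left to right, so a document matching the leftmost term scores higher than one matching only a later term.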
We have the problem that we get spellchecking results that are technically correct but not suitable for the context of the input term.
For example the user searches for "ventilator" and the spellchecker returns "vibrator" as the corrected term.
We could remove the value "vibrator" from the possible results but if someone misspells "vibrator" we should return the corrected term.
Is it possible to exclude specific mappings (e.g. "ventilator" > "vibrator")?
The current config:
solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_spell</str>
<lst name="spellchecker">
<str name="name">de</str>
<str name="field">spellcheck_de</str>
<str name="buildOnCommit">true</str>
<str name="buildOnOptimize">true</str>
</lst>
</searchComponent>
And the Field config from schema.xml:
<fieldType name="text_spell_de" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
You can do stuff like 'exclude terms that are less frequent than X' on the index, and the like. But if you want to 'exclude term X when serving suggestions for term Y only', then no, you can't.
I am using SOLR 4.9.0 with the following configuration (I am including only the part I consider relevant to the question):
<field name="content" type="text" indexed="true" stored="false"
termVectors="true" multiValued="false" />
<fieldType name="text" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
I can do proximity search for a term being close to another term:
content:"very suggestion"~100
I need to add the functionality of being able to search for a term being close to a number token, such as in:
content:"very [0.01 TO 0.99]"~100
content:"very [100 TO 1000000]"~100
Is there a tokenizer that already provides this functionality?
If not, what would roughly be the steps in order to adapt the standard tokenizer to be able to do that?
Any speculations on what the effect on the index structure, size, and indexing/searching speed would be?
EDIT:
I think that the following SOLR configuration is actually also relevant to my question:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">id</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="fl">* score</str>
</lst>
</requestHandler>
More than two years later, I found the answer to my question :)
By using the ComplexPhraseQueryParser (https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser), one can do:
{!complexphrase inOrder=false}content:"fee [100 TO 10000]"~10
I am trying to implement an autocomplete feature using Solr 5.3.0.
My solrconfig.xml looks like this:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">suggest_ngram</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">text_suggest_ngram</str>
<str name="buildOnStartup">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
managed-schema looks like this:
<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="2" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="suggest_ngram" type="text_suggest_ngram" indexed="true" stored="false"/>
<field name="name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="price" type="tlong" multiValued="false" indexed="true" stored="true"/>
<copyField source="name" dest="suggest_ngram"/>
Now when I use the analyzer from the admin panel of Solr, I can see the indexed ngrams. And it successfully points out the match.
However when I use the query:
http://localhost:8983/solr/products/suggest?suggest=true&suggest.build=true&wt=json&suggest.q=Jind
I get 0 suggestions.
The response is here:
https://api.myjson.com/bins/47r3i
There exists a value "Jindal Panther" for the name key in one of the docs.
Moreover, I have found that if I create a dummy copyField destination "suggest" of type "String", with "name" as its source, any suggestion that works fine on "name" does not work on "suggest". Could a misconfigured copyField be what prevents the suggestions?
Any help would be appreciated.
Thanks in advance.
EDIT:
Got the solution. See the accepted answer and its comments below.
There is a blog that I encountered that beautifully explains Suggesters. It is definitely worth reading for a newbie to Solr Search.
https://lucidworks.com/blog/2015/03/04/solr-suggester/
The field on which you configure the suggester must have stored="true"; it does not need to be indexed. The suggester builds its dictionary according to the configuration provided in the SuggestComponent. The name field has stored set to true, whereas suggest_ngram does not. You need to update the schema configuration like this:
<field name="suggest_ngram" type="text_suggest_ngram" indexed="false" stored="true"/>
You also need to provide the suggest.dictionary parameter, which names the dictionary to use for suggestions. In your case it is named default.
http://localhost:8983/solr/products/suggest?suggest=true&
suggest.build=true&
wt=json&
suggest.dictionary=default&
suggest.q=Jind
Or you can set the dictionary in the defaults of the /suggest requestHandler:
<str name="suggest.dictionary">default</str>
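If you are calling the handler from code, a minimal Python sketch for assembling the request URL could look like the following. The function name suggest_url and its keyword arguments are hypothetical; the host, core name and parameters are the ones from the question.

```python
from urllib.parse import urlencode


def suggest_url(base, prefix, dictionary="default", build=False):
    """Assemble a suggester request URL for the given prefix."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,  # must match <str name="name"> in the suggester
        "suggest.q": prefix,
        "wt": "json",
    }
    if build:
        # Only ask for a rebuild when the dictionary actually needs it.
        params["suggest.build"] = "true"
    return f"{base}?{urlencode(params)}"


url = suggest_url("http://localhost:8983/solr/products/suggest", "Jind")
print(url)
```

Passing build=True adds suggest.build=true, which I would only do after reindexing rather than on every lookup.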