Solr spellcheck not returning any suggestions

My spellcheck was originally working fine, but at some point in the past few months it stopped working. I have spent a huge amount of time trying to find what went wrong, without success. Any help is much appreciated.
Below is my handler config:
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.count">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
Below is my spellcheck component config:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">default</str>
<str name="classname">solr.FileBasedSpellChecker</str>
<str name="sourceLocation">mesh1.txt</str>
<str name="characterEncoding">UTF-8</str>
<str name="spellcheckIndexDir">./spellcheckerFile</str>
<str name="spellcheck.build">true</str>
I can see that spellcheck is turned on in the Solr query result, but it is not returning any suggestions:
"spellcheck":{
"suggestions":[]}}

The problem was that the dictionary file had not been built. To build it, setting <str name="spellcheck.build">true</str> in the config file is not enough; we also have to send a request with &spellcheck.build=true in the URL.
This only has to be done once.

Related

Solr Error: QueryComponent.mergeIds(QueryComponent.java:895) in custom request handler

I'm using Solr 8.4.0 and I tried to make a search request handler that only returns a specific set of fields from a collection, without letting anyone change which fields are displayed.
Here is what the request handler looks like:
<requestHandler class="solr.SearchHandler" name="/search">
<arr name="components">
<str>query</str>
<str>facet</str>
</arr>
<lst name="defaults">
<int name="rows">10</int>
<str name="wt">json</str>
<str name="q.alt">*:*</str>
</lst>
<lst name="invariants">
<str name="facet">true</str>
<str name="facet.mincount">1</str>
<str name="fl">_uniqueid</str>
<str name="fl">document_title_t</str>
<str name="fl">document_title_string_s</str>
<str name="fl">document_shortsummary_t</str>
<str name="fl">page_url_s</str>
<str name="fl">topic_path</str>
<str name="fl">itemid_s</str>
<str name="echoParams">none</str>
<str name="omitHeader">true</str>
</lst></requestHandler>
After creating the collection and trying the request handler, I received the error shown in the title.
This issue seems to happen only when we use multiple shards. Changing the collection to a single shard removes the error, but we need multiple shards for this collection in production later on. We are using 2 shards and 3 replicas.
I have managed to solve this issue. By going through Solr's code in the GitHub repository, I found that line 895 of QueryComponent.java tries to access a certain header. After removing the omitHeader invariant, the request handler seems to work perfectly.
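For illustration, here is a sketch of the handler from the question with only the omitHeader invariant dropped, as described above. Everything else is copied unchanged. It has not been verified here against a multi-shard collection, so treat it as a starting point rather than a confirmed fix for every setup.

```xml
<requestHandler class="solr.SearchHandler" name="/search">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
  </arr>
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="wt">json</str>
    <str name="q.alt">*:*</str>
  </lst>
  <lst name="invariants">
    <str name="facet">true</str>
    <str name="facet.mincount">1</str>
    <!-- The returned fields stay fixed, so clients cannot override fl -->
    <str name="fl">_uniqueid</str>
    <str name="fl">document_title_t</str>
    <str name="fl">document_title_string_s</str>
    <str name="fl">document_shortsummary_t</str>
    <str name="fl">page_url_s</str>
    <str name="fl">topic_path</str>
    <str name="fl">itemid_s</str>
    <str name="echoParams">none</str>
    <!-- omitHeader invariant removed: with it set, the multi-shard
         request failed in QueryComponent.mergeIds -->
  </lst>
</requestHandler>
```

The trade-off is that the response header is returned again. Keeping echoParams at none means the header still does not echo the request parameters.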

Solr suggester returning terms from deleted documents

I have a SolrCloud setup and I'm testing the suggestion component. I have several hundred documents in the index. I did not want some of the documents in the index because they contain gibberish (they were binary files that got improperly converted to text). I've removed them from the index, but the gibberish words from them are still showing up in the suggestions.
My suggest configuration looks like this:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">fuzzySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
<str name="storeDir">suggester_fuzzy_dir</str>
<str name="field">dictionary_text</str>
<str name="suggestAnalyzerFieldType">phrase_suggest</str>
<str name="exactMatchFirst">true</str>
<float name="threshold">0.001</float>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.dictionary">fuzzySuggester</str>
<str name="suggest.onlyMorePopular">true</str>
<str name="suggest.count">5</str>
<str name="suggest.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Note that buildOnCommit is set to true. I also tried to remove them using a /suggest query with the suggest.build=true parameter, but that had no effect.
Is there something else required to remove terms from the dictionary?
Despite using expungeDeletes=true in the update, the deleted documents were still hanging around. Optimizing removed them and appears to have removed all the gibberish terms from suggestions.
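As a rough sketch, the two commands mentioned above can be written as Solr XML update messages and posted to the collection's /update handler with an XML content type. Each element below is meant to be the body of its own request. The waitSearcher attribute is optional and is included only as an assumption about the desired behaviour.

```xml
<!-- Request body 1: commit that asks Solr to merge away segments
     containing deleted documents (this was not enough in the case above) -->
<commit expungeDeletes="true"/>

<!-- Request body 2: full optimize, which rewrites the index and drops
     all deleted documents -->
<optimize waitSearcher="true"/>
```

An optimize rewrites the whole index, so it can be slow and I/O heavy on large collections. Once it has finished, rebuilding the suggester (via buildOnCommit or a /suggest request with suggest.build=true, as in the question) should produce a dictionary based on the cleaned index.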

How to optimize documentdictionary build on solr cloud suggester?

I have around 300,000 records to be loaded into a SolrCloud suggester. These records are dynamic, i.e. new documents will be added and some documents will be deleted on a regular basis. Both options I have tried are problematic:
Use FileDictionaryFactory: this method is an operational nightmare. I would need to keep regenerating the file and uploading it to ZooKeeper (I still haven't figured out how to upload a file this large to ZooKeeper), and I might need to build the index on each server in the SolrCloud cluster separately. Doing this frequently does not seem feasible.
Use DocumentDictionaryFactory: this method seems like the obvious choice, but building the index here is a nightmare as well. Every time I try to build the index, I get the "No space left on the device" error. I tried building it on 5K records and it was successful, but it took 40 minutes and consumed all 10GB of memory for the entire 40 minutes.
My question is: can we reduce this index build time if we follow the second approach?
Or, if I follow the first approach, what is the ideal way of dealing with frequent changes to be indexed on SolrCloud?
My configs:
For FileDictionaryFactory:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggestions</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">searchfield</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">text_ngram</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggestions</str>
<str name="suggest.dictionary">results</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
For DocumentDictionaryFactory:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggestions</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">searchfield</str>
<str name="weightField">searchscore</str>
<str name="payloadField">payload</str>
<str name="suggestAnalyzerFieldType">text_ngram</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggestions</str>
<str name="suggest.dictionary">results</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
I think the main issue with DocumentDictionaryFactory (my preferred option) is that you are using text_ngram. Unless your values are very short, this will produce a very large FST (I am guessing, since you didn't share the text_ngram definition), hence the long build time.
Unless I am missing something, you don't need to do that; just use a type that tokenizes with StandardTokenizerFactory and suggestions should work.
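A minimal sketch of what such a type might look like, assuming a hypothetical field type name text_suggest. The lowercase filter is an addition of mine for case-insensitive matching and is not part of the advice above.

```xml
<!-- Schema: hypothetical field type "text_suggest" that tokenizes on
     word boundaries instead of producing n-grams -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumption: lowercasing so that "Solr" and "solr" match alike -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: inside the existing suggester definition, point the
     analyzer at the new type instead of text_ngram -->
<str name="suggestAnalyzerFieldType">text_suggest</str>
```

The suggester would need to be rebuilt after the change for the new analysis to take effect.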

Solr auto suggest (suggester) returns no results

I am quite new to Solr 5.1 and I am playing around to see if I can get autosuggest to work. The background is that I have a core named docs, and I post documents to it using something along the lines of:
/bin/post -c docs /path/to/my/docs/*.pdf
..and I have the documents indexed. Now, I tweak the solrconfig.xml, located within my core directory to include autosuggest. So, borrowing some code from here, I include the following:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="storeDir">suggester_fuzzy_dir</str>
<!-- Substitute these for the two above for another "flavor"
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="indexPath">suggester_infix_dir</str>
-->
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">_text_</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Now, I do:
http://localhost:8983/solr/docs/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=mode
and I always seem to get:
{"responseHeader":
{"status":0,"QTime":19},
"command":"build","suggest":
{"mySuggester":
{"mode":
{"numFound":0,"suggestions":[]}
}
}
}
I am not entirely sure what I am doing wrong. I was wondering whether I should have modified solrconfig.xml before adding any docs, or whether something is wrong with the snippet I inserted into it. Also, I am not sure if this line:
<str name="suggestAnalyzerFieldType">string</str>
is correct. I read that string should not be used here, but then again, I see the Solr docs using it.
I have also tried code from here, but it seems to give me the same results, i.e. no suggested words.
Any help on this would be great.
UPDATE #01:
I have recently been reading that some features of Solr need stored=true to work. I am now wondering if this applies to the Suggester. So, do I need to set the text field in solr.xml to stored=true?

Solr and spellcheck component: spellcheck.q is not taken into account

I use the spellcheck component, and when I query Solr I get results. But if I use spellcheck.q, I get no results.
Does anyone have an idea?
Thanks
<!-- The spell check component can return a list of alternative spelling
suggestions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spellCheck</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="buildOnCommit">true</str>
<str name="accuracy">0.4</str>
<float name="thresholdTokenFrequency">.0004</float>
</lst>
</searchComponent>
<!--<queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>-->
<!-- Default handler -->
<requestHandler name="default" class="solr.SearchHandler" lazy="true" default="true">
<lst name="defaults">
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">10</str>
<str name="hl.usePhraseHighLighter">true</str>
<str name="hl.highlightMultiTerm">true</str>
<str name="hl.mergeContiguous">true</str>
</lst>
<arr name="last-components">
<str>highlight</str>
<str>spellcheck</str>
</arr>
</requestHandler>
Have you added your spellcheck component to the corresponding request handler (in the Solr config), set the spellcheck parameter to true (or on), and configured the correct dictionary to use (if its name is different from "default")?
If you don't use the spellcheck.q parameter, then the default is to use the q parameter (from http://wiki.apache.org/solr/SpellCheckComponent#q_OR_spellcheck.q). From that wiki:
Essentially, if you have a spelling "ready" version in your application, then it is probably better to send spellcheck.q, otherwise, if you just want Solr to do the job, use the q parameter
The reason that it works if you change the definition of the field type is probably due to the new field type being "spelling ready". It would help if you posted the query you are using and the relevant lines in the schema.xml.
