Solr auto suggest (suggester) returns no results

I am quite new to Solr 5.1 and am experimenting to see whether I can get autosuggest to work. As background: I have a core named docs, and I post documents to it with something along the lines of:
bin/post -c docs /path/to/my/docs/*.pdf
The documents are indexed. I then edited the solrconfig.xml in my core directory to add autosuggest. Borrowing some code from here, I included the following:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="storeDir">suggester_fuzzy_dir</str>
    <!-- Substitute these for the two above for another "flavor"
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir</str>
    -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">_text_</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
Now, I do:
http://localhost:8983/solr/docs/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=mode
and I always seem to get:
{
  "responseHeader": {"status": 0, "QTime": 19},
  "command": "build",
  "suggest": {
    "mySuggester": {
      "mode": {"numFound": 0, "suggestions": []}
    }
  }
}
I am not sure what I am doing wrong. Should I have modified solrconfig.xml before adding any documents, or is something wrong with the snippet I inserted into it? I am also not sure whether this line:
<str name="suggestAnalyzerFieldType">string</str>
is correct. I have read that string should not be used here, yet the Solr docs themselves use it.
I have also tried code from here, but it gives me the same result, i.e. no suggested words.
Any help on this would be great.
UPDATE #01:
I have recently read that some Solr features need stored=true to work, and I am now wondering whether this applies to the Suggester. Do I need to set stored=true on the text field in the schema?

Related

solr spellcheck not returning any suggestions

My spellcheck was originally working fine, but I recently noticed that at some point in the past few months it stopped working. I have spent a huge amount of time trying to find what went wrong, with no luck. Any help is much appreciated.
Below is my handler config:
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.count">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
Below is my spellcheck component config:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">mesh1.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
    <str name="spellcheck.build">true</str>
  </lst>
</searchComponent>
I can see in the Solr query result that spellcheck is turned on, but it does not return any suggestions:
"spellcheck":{
"suggestions":[]}}
The problem was that the dictionary was never built. Besides setting spellcheck.build to true in the config file, we also have to send a request with &spellcheck.build=true in the URL.
This has to be done only once.
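As a rough sketch of that one-time build request, the snippet below assembles the URL with Python's standard library. The host, core name (mycore), handler path (/select), and query term are placeholders that the original post does not give; substitute your own.

```python
from urllib.parse import urlencode

# Assumed values: the original post does not name the host, core, or handler.
BASE_URL = "http://localhost:8983/solr/mycore/select"

def spellcheck_url(query, build=False):
    """Return a spellcheck request URL; build=True adds the one-time build command."""
    params = [("q", query), ("spellcheck", "true")]
    if build:
        params.append(("spellcheck.build", "true"))
    return BASE_URL + "?" + urlencode(params)

# Send this once to build the dictionary.
build_request = spellcheck_url("hypertension", build=True)
# Regular queries afterwards do not need the build flag.
normal_request = spellcheck_url("hypertension")
```

Keeping the build flag out of everyday queries avoids rebuilding the dictionary on every request.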

Solr Suggester - dynamic or passed at runtime field

Is it possible to use a dynamic field, or to pass the field for suggestions at runtime (in the query, for example), with SuggestComponent?
Depending on the user's language, I would like to offer different suggestions. I have a dynamic field name_* with concrete fields name_pl, name_de and name_en (there may be more; I want flexibility here), and I would like to look up suggestions by language: for pl in name_pl, for en in name_en, and so on.
So far I have a standard suggester with the field specified:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">name_pl</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
What I actually need is either to use name_* or, preferably, to pass the field name at runtime, for example: http://localhost:8983/solr/services/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=name&suggest.field=name_pl
How would you implement such a mechanism?
This may not be the answer you expect, but I started writing a comment and ended up with this.
With a dynamic field you would have to rebuild the suggester on every query. I suggest ;) that you instead request a specific SuggestComponent dictionary at query time.
The value of field should remain static, because it is read once to build a dictionary index from that field. Otherwise you would have to delete and rebuild that index each time a suggest query needs a dictionary other than the one previously built.
Instead, replicate the suggester definition for each language so that Solr can build one dictionary index per field/language (just name each suggester after the language of its target field):
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest_pl</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">name_pl</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
  <lst name="suggester">
    <str name="name">suggest_en</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">name_en</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
  <!-- etc. -->
</searchComponent>
Now you can select the target dictionary at query time:
.../suggest?suggest=true&suggest.q=name&suggest.dictionary=suggest_pl
There is an easy way to do this, in case you are not aware of it:
Create one dictionary per language (suggester_pl, suggester_en, ...), each using the right field. They are all defined inside a single SuggestComponent.
When calling, select which one to hit with &suggest.dictionary=suggester_en.
Check the docs here.
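On the client side, picking the dictionary per language can be sketched roughly as follows. The host and core come from the question's example URL, the suggester names follow the suggester_<lang> pattern described above, and the fallback to English for unknown languages is purely an assumption of this sketch.

```python
from urllib.parse import urlencode

# Assumed values: host and core are taken from the question's example URL;
# the set of languages and the fallback are illustrative only.
BASE_URL = "http://localhost:8983/solr/services/suggest"
SUPPORTED_LANGUAGES = {"pl", "de", "en"}

def suggest_url(prefix, language, fallback="en"):
    """Build a suggest URL that targets the dictionary for the user's language."""
    lang = language if language in SUPPORTED_LANGUAGES else fallback
    params = {
        "suggest": "true",
        "suggest.dictionary": f"suggester_{lang}",
        "suggest.q": prefix,
    }
    return BASE_URL + "?" + urlencode(params)

polish_request = suggest_url("war", "pl")
unknown_language_request = suggest_url("war", "fr")  # falls back to suggester_en
```

Adding a language then means adding one suggester definition in solrconfig.xml and one entry in the set, with no change to the query logic.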

Solr suggester returning terms from deleted documents

I have a SolrCloud setup and I'm testing the suggestion component. I have several hundred documents in the index. I did not want some of the documents in the index because they contain gibberish (they were binary files that got improperly converted to text). I've removed them from the index, but the gibberish words from them are still showing up in the suggestions.
My suggest configuration looks like this:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">fuzzySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="storeDir">suggester_fuzzy_dir</str>
    <str name="field">dictionary_text</str>
    <str name="suggestAnalyzerFieldType">phrase_suggest</str>
    <str name="exactMatchFirst">true</str>
    <float name="threshold">0.001</float>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">fuzzySuggester</str>
    <str name="suggest.onlyMorePopular">true</str>
    <str name="suggest.count">5</str>
    <str name="suggest.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
Note that buildOnCommit is set to true. I also tried to remove them using a /suggest query with the suggest.build=true parameter, but that had no effect.
Is there something else required to remove terms from the dictionary?
Despite using expungeDeletes=true in the update, the deleted documents were still hanging around. Optimizing removed them and appears to have removed all the gibberish terms from suggestions.
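A sketch of that sequence as two requests: an optimize on the update handler, followed by an explicit suggester rebuild. The host and collection name (mycollection) are placeholders. optimize=true on /update and suggest.build=true are standard Solr request parameters, but whether the explicit rebuild is still needed when buildOnCommit is enabled is not something the post settles, so treat the second step as a precaution.

```python
from urllib.parse import urlencode

# Assumed values: the post does not name the host or collection.
SOLR = "http://localhost:8983/solr/mycollection"

def optimize_url():
    """URL asking the update handler to optimize, which merges away deleted documents."""
    return SOLR + "/update?" + urlencode({"optimize": "true"})

def rebuild_url(dictionary="fuzzySuggester"):
    """URL asking the /suggest handler to rebuild the named dictionary."""
    params = {
        "suggest": "true",
        "suggest.build": "true",
        "suggest.dictionary": dictionary,
    }
    return SOLR + "/suggest?" + urlencode(params)

# Run in this order: purge deleted documents first, then rebuild suggestions.
steps = [optimize_url(), rebuild_url()]
```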

Solr - Suggest Component with 2 different field types

I'm having trouble finding a way to have two differently structured fields in one suggest component (https://cwiki.apache.org/confluence/display/solr/Suggester).
The goal is to have an autocomplete module with these fields:
A field where StandardTokenizer is used
example output: This is a title
A field where a custom tokenizer is used (basically a regex that extracts the base domain of a full URL)
example output: thisisatitle.com
The request handler containing the suggest component should then be able to return both strings in the results array: thisisatitle.com and This is a title.
Things I've tried:
Multiple suggest components
I've googled, and the only solution I've found so far is to use shards, since they allow different schemas to be combined. To my mind that is rather inefficient: running 2 servers would waste resources, and maintainability would suffer.
Any suggestions/workarounds are welcome.
To use multiple suggestion dictionaries (that can have different analyzers applied), you can use the "multiple dictionaries" configuration as shown in the documentation:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
  <lst name="suggester">
    <str name="name">altSuggester</str>
    <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="field">product_name</str>
    <str name="weightExpression">((price * 2) + ln(popularity))</str>
    <str name="sortField">weight</str>
    <str name="sortField">price</str>
    <str name="storeDir">suggest_fuzzy_doc_expr_dict</str>
    <str name="suggestAnalyzerFieldType">text_en</str>
  </lst>
</searchComponent>
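To get results from both dictionaries in a single response, suggest.dictionary can be repeated in the request. A minimal sketch of building such a URL, assuming a /suggest handler that includes this component; the host, core name (mycore), and query text are placeholders:

```python
from urllib.parse import urlencode

# Assumed values: host, core name, and handler path are placeholders.
BASE_URL = "http://localhost:8983/solr/mycore/suggest"

def multi_suggest_url(prefix, dictionaries):
    """Build one suggest request that consults every named dictionary."""
    params = [("suggest", "true")]
    # Repeat suggest.dictionary once per suggester to query.
    params += [("suggest.dictionary", name) for name in dictionaries]
    params.append(("suggest.q", prefix))
    return BASE_URL + "?" + urlencode(params)

url = multi_suggest_url("thisis", ["mySuggester", "altSuggester"])
```

The response groups suggestions under each suggester's name, so the client still has to merge the two lists into a single autocomplete array.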

How to optimize documentdictionary build on solr cloud suggester?

I have around 300,000 records to load into a SolrCloud suggester. The records are dynamic, i.e. new documents will be added and some documents will be deleted on a regular basis. I am stuck choosing between two approaches, each with its own problem:
Use FileDictionaryFactory: this method is an operational nightmare. I would need to keep regenerating the file and uploading it to ZooKeeper (I still haven't figured out how to upload a file this large to ZooKeeper), and I might need to build the index on each server in the SolrCloud cluster separately. Doing this frequently does not seem feasible.
Use DocumentDictionaryFactory: this method seems like the obvious choice, but building the index is a nightmare here as well. Every time I try to build it, I get the "No space left on the device" error. I tried building it on 5K records and it succeeded, but it took 40 minutes and consumed all 10GB of memory for the entire 40 minutes.
My question is: can the index build time be reduced if we follow the second approach?
Or, if I follow the first approach, what is the ideal way to deal with frequent changes that need to be indexed on SolrCloud?
My configs:
For FileDictionaryFactory:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestions</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">FileDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">text_ngram</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">suggestions</str>
    <str name="suggest.dictionary">results</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
For DocumentDictionaryFactory:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestions</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="payloadField">payload</str>
    <str name="suggestAnalyzerFieldType">text_ngram</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">suggestions</str>
    <str name="suggest.dictionary">results</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
I think the main issue with the DocumentDictionaryFactory setup (which is my preferred option) is that you are using text_ngram. Unless your values are very short, this will produce a very large FST (I am guessing, since you didn't share the text_ngram definition), which would explain the build time.
Unless I am missing something, you don't need n-grams here: use a field type that tokenizes with StandardTokenizerFactory and suggestions should work.
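To see why an n-gram field type can inflate the suggester's input, the sketch below counts character n-grams for a single value and compares that with plain word tokens. The gram sizes (2 to 15) and the sample text are assumptions, since the text_ngram definition was not shared. The value is treated as one string, and whitespace splitting is only a rough stand-in for StandardTokenizerFactory.

```python
def char_ngrams(text, min_n, max_n):
    """Return every character n-gram of text with min_n <= n <= max_n."""
    return [
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    ]

value = "wireless bluetooth headphones"  # hypothetical field value
grams = char_ngrams(value, 2, 15)         # assumed gram sizes
words = value.split()                     # rough stand-in for StandardTokenizer

print(len(words), "word tokens vs", len(grams), "n-grams")
```

Multiplied across 300,000 records, that gap in token count gives a feel for how much extra work the n-gram analyzer hands to the FST build.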
