query on suggester while build dictionary - solr

My suggester conf:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">titleSuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="field">name</str>
<str name="suggestAnalyzerFieldType">text_pt</str>
<str name="payloadField">type</str>
<str name="weightField">weightField</str>
<str name="buildOnCommit">false</str>
<str name="buildOnStartup">false</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="indexPath">/home/dev/suggestions</str>
</lst>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">titleSuggester</str>
<str name="suggest.onlyMorePopular">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
It's work! But, i neeed build my dictionary every hour, and this build takes 2 minutes.
Every hour i run:
localhost:8983/solr/AutoComplete/suggest?suggest.q=term&suggest.build=true
During this time i need get results, but when i run a query as:
localhost:8983/solr/AutoComplete/suggest?suggest.q=term
i get this return(because build is running):
<response>
<lst name="responseHeader">
<int name="status">500</int>
<int name="QTime">5</int>
</lst>
<lst name="error">
<str name="msg">suggester was not built</str>
What can I do to get results while the build is running?

This question is quite old, but I have the same problem (my rebuild may run an hour) and I came to this solution:
Configure two components, e.g. suggest_A and suggest_B with different indexPath values.
Configure two request handlers, e.g. suggest and suggest_Rebuild.
Assign suggest_A to suggest and suggest_B to suggest_Rebuild.
Do the rebuild on the suggest_Rebuild handler. After the rebuild is finished, switch the component assignment of both components via the config API (update-requesthandler).
The drawback of this solution is that you need the double disk space.

Related

Solr Suggester taking too long to provide response

I am using Solr Suggester to provide suggestion in the search page of our application. But every suggestion request to Solr is taking too long to send response. I have tried with multiple lookup Impl such as AnalyzingLookupFactory, AnalyzingInfixLookupFactory, FuzzyLookupFactory etc.
Below is my configuration:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">spell_suggest</str>
<str name="weightField">spell_suggest</str>
<str name="suggestAnalyzerFieldType">text_general</str>
<str name="buildOnStartup">false</str>
</lst>
<lst name="suggester">
<str name="name">altSuggester</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="field">spell_suggest</str>
<str name="weightField">spell_suggest</str>
<str name="suggestAnalyzerFieldType">text_general</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<!--<str name="suggest.dictionary">mySuggester</str> -->
<str name="suggest.dictionary">altSuggester</str>
<str name="suggest">true</str>
<str name="suggest.count">6</str>
<str name="spellcheck">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The response, with just 42000 indexed documents, is taking close to 5 to 7 seconds to provide response. This is impacting the functionality badly in the application
Following is my request: http://<myIP>:8983/solr/mycollection/suggest?df=spell_suggest&suggest=true&suggest.build=true&q=Vendor
Please suggest if I need to provide few more configurations or need to modify existing configurations to improve performance.
Thanks!
When you're issuing suggest.build each time, you're effectively asking for the suggestion index to be rebuilt from scratch each time you're querying the suggester.
It should only be rebuilt after changes if necessary (depending on which dictionaryImpl you're using).

Solr Suggest Component and OutOfMemory Error

I am using the Solr Suggester component in Solr 5.5 with a lot of address data. My Machine has allotted 20Gb RAM for solr and the machine has 32GB RAM in total.
I have an address book core with the following vitals -
"numDocs"=153242074
"segmentCount"=34
"size"=30.29 GB
My solrconfig.xml looks something like this -
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester1</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="storeDir">suggester_fuzzy_dir</str>
<!-- Substitute these for the two above for another "flavor"
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name=?indexPath?>suggester_infix_dir</str>
-->
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">site_address</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="payloadField">property_metadata</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
<lst name="suggester">
<str name="name">mySuggester2</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="indexPath">suggester_infix_dir</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">site_address_other</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="payloadField">property_metadata</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>
The handler is defined like so -
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester1</str>
<str name="suggest.dictionary">mySuggester2</str>
<str name="suggest.collate">false</str>
<str name="echoParams">explicit</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Problem Statement
Every time I try to build the suggest index using the suggest.build=true url parameter, I end up with an OutOfMemory error. I have no clue how I can make this work with the current setup. Can anyone explain why this is happening? And how can I fix this issue?
On the mailing list, the critical info is that the OOME is due to java heap space.
This means that you either need to increase your heap size or reduce the amount of heap that Solr needs. It is not always possible to reduce the heap requirements, but here are some ideas:
https://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements

Solr suggester returning terms from deleted documents

I have a SolrCloud setup and I'm testing the suggestion component. I have several hundred documents in the index. I did not want some of the documents in the index because they contain gibberish (they were binary files that got improperly converted to text). I've removed them from the index, but the gibberish words from them are still showing up in the suggestions.
My suggest configuration looks like this:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">fuzzySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
<str name="storeDir">suggester_fuzzy_dir</str>
<str name="field">dictionary_text</str>
<str name="suggestAnalyzerFieldType">phrase_suggest</str>
<str name="exactMatchFirst">true</str>
<float name="threshold">0.001</float>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.dictionary">fuzzySuggester</str>
<str name="suggest.onlyMorePopular">true</str>
<str name="suggest.count">5</str>
<str name="suggest.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Note that buildOnCommit is set to true. I also tried to remove them using a /suggest query with the suggest.build=true parameter, but that had no effect.
Is there something else required to remove terms from the dictionary?
Despite using expungeDeletes=true in the update, the deleted documents were still hanging around. Optimizing removed them and appears to have removed all the gibberish terms from suggestions.

How to optimize documentdictionary build on solr cloud suggester?

I have around 300,000 records to be uploaded on a solr cloud suggester. These records are dynamic i.e. new documents will be added and some document will be deleted in future on a regular basis. The problem I am facing is either:
Use FileDictionaryFactory: this method is an operational nightmare. I would need to keep generating the file and upload it to zookeeper (still haven't figured out how to upload huge file like this to zookeeper). And might need to create index on each server on the solr cloud separately. Doing this frequently does not seems possible.
Use DocumentDictionaryFactory: this method seems like an obvious choice, but building index here is a nightmare as well. Everytime I try to build index, I get the "No space left on the device" error. I tried building it on 5K records and it was successful. But it took 40 minutes and consumed all 10GB of memory during this entire 40 minutes.
My question is, can we optimize this index building time if we follow the second approach.
Or if I follow the first approach what should be the ideal way of dealing with frequent changes to be indexed on solr cloud.
my Configs:
For FileDictionaryFactory:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggestions</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">searchfield</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">text_ngram</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggestions</str>
<str name="suggest.dictionary">results</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
For DocumentDictionaryFactory:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggestions</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">searchfield</str>
<str name="weightField">searchscore</str>
<str name="payloadField">payload</str>
<str name="suggestAnalyzerFieldType">text_ngram</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggestions</str>
<str name="suggest.dictionary">results</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
I think the main issue for the DocumentDictionaryFactory (this is my preferred option) is that you are using text_ngram. If your values are not very short, this will produce (I guess, you didn't share text_ngram definition) a very large FST, thus the time to create it.
Unless I am missing something, you don't need to do that, just use some type that tokenizes with StandardTokenizerFactory and suggestions should work.

solr suggester not working with shard for multiple core

I'm trying to use the suggest component (solr 4.6) with multiple cores. I have added a search component and a request handler in my solrconfig. That works fine for 1 core but querying my solr instance with the shards parameter does not work.
But did you mean' (spell check ) is working fine with multiple cores using shard.
Here is the configuration part of solrconfig file :
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggestDictionary</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
<str name="field">suggest</str>
<float name="threshold">0.0005</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="echoParams">none</str>
<str name="wt">xml</str>
<str name="indent">false</str>
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggestDictionary</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">false</str>
<str name="qt">/suggest</str>
<str name="shards.qt">/suggest</str>
<str name="shards">localhost:8080/cores/core1,localhost:8080/cores/core2</str>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
<shardHandlerFactory class="HttpShardHandlerFactory">
<int name="socketTimeOut">1000</int>
<int name="connTimeOut">5000</int>
</shardHandlerFactory>
</requestHandler>
It works for me..
You can get the suggestions using this RestURL
http://localhost:8983/solr/demo/spell?q=howoo&wt=json&indent=true&qt=spell&shards.qt=/spell&shards=localhost:8983/solr/demo_shard2_replica1,localhost:8983/solr/demo_shard1_replica2
OR Simply use this :
http://localhost:8983/solr/demo/spell?q=hoo&wt=json&indent=true&shards.qt=/spell
shards.qt=/spell : Need to add that allows suggestion on shards
Here, you have make changes and apply for things which requires.
Collection = demo
Shards = demo_shard2_replica1, demo_shard1_replica2
Replace collection and shards names with your names of collection and shards.

Resources