I am new to Solr and trying to provide partial word matching with Solr 8.8.1, but partials are giving no results. I have combed the blogs without luck to fix this.
For example, the text of the document contains the word longer. Index analysis gives lon, long, longe, longer. If I query longer using alltext_en:longer, I get a match. However, if I query (for example) longe using alltext_en:longe, I get no match. explainOther returns 0.0 = No matching clauses.
It seems that I am missing something obvious, since this is not a complex phrase query.
Apologies in advance if I have missed any needed details - I will update the question if you tell me what else is needed to know.
Here are the relevant field specs from my managed-schema:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="3"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<field name="alltext_en" type="text_en" multiValued="true" indexed="true" stored="true"/>
<copyField source="*_txt_en" dest="alltext_en"/>
Here is the relevant part of solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- Query settings -->
<str name="defType">edismax</str>
<str name="q">*:*</str>
<str name="q.alt">*:*</str>
<str name="rows">50</str>
<str name="fl">*,score,[explain]</str>
<str name="ps">10</str>
<!-- Highlighting defaults -->
<str name="hl">on</str>
<str name="hl.fl">_text_</str>
<str name="hl.preserveMulti">true</str>
<str name="hl.encoder">html</str>
<str name="hl.simple.pre"><span class="artica-snippet"></str>
<str name="hl.simple.post"></span></str>
<!-- Spell checking defaults -->
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">3</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
That stemming filter will modify the tokens in ways you don't predict - and since they only happen on the token you try to match agains the ngrammed tokens when querying, the token might not be what you expect). If you're generating ngrams, stemming filters should usually be removed. I'd also remove the possessive filter (Also, small note - try to avoid using * when formatting text, since it's hard to know if you've used it when querying and the formatting is an error - instead use a backtick to indicate that the text is a code keyword/query.) – MatsLindh
That answered it - I removed the stemmer from the index step and everything was fine. Brilliant, thank you, #MatsLindh!
I am trying to test the spellchecking functionality with Solr 4.7.2 using solr.DirectSolrSpellChecker (where you don't need to build a dedicated index).
I have a field named "title" in my index; I used a copy field definition to create a field named "title_spell" to be queried for the spellcheck (title_spell is correctly filled). However, in the admin solr admin console, I always get empty suggesions.
For example: I have a solr document with the title "A B automobile"; I enter in the admin console (spellcheck crossed and under the input field spellcheck.q) "atuomobile". I expect to get at least something like "A B automobile" or "automobile" but the spellcheck suggestion remains empty...
My configuration:
schema.xml (only relevant part copied):
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="de_DE/synonyms.txt" ignoreCase="true"
expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
</fieldType>
...
<field name="title_spell" type="textSpell" indexed="true" stored="true" multiValued="false"/>
solr.xml (only relevant part copied):
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">title_spell</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
<float name="thresholdTokenFrequency">.01</float>
</lst>
</searchComponent>
...
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
</lst>
<!--Versuch, das online datum mit in die Gewichtung zu nehmen...-->
<lst name="appends">
<str name="bf">recip(ms(NOW/MONTH,sort_date___d_i_s),3.16e-11,50,1)</str>
<!--<str name="qf">title___td_i_s_gcopy^1e-11</str>-->
<str name="qf">title___td_i_s_gcopy^21</str>
<str name="q.op">AND</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
What did I miss? Thanks for your answers!
How large is your index? For a small index (think less than a few million docs), you're going to have to tune accuracy, maxQueryFrequency, and thresholdTokenFrequency. (Actually, it would probably be worth doing this on larger indices as well.)
For example, my 1.5 million doc index uses the following for these settings:
<float name="maxQueryFrequency">0.01</float>
<float name="thresholdTokenFrequency">.00001</float>
<float name="accuracy">0.5</float>
accuracy tells Solr how accurate a result needs to be before it's considered worth returning as a suggestion.
maxQueryFrequency tells Solr how frequently the term needs to occur in the index before it's can be considered worth returning as a suggestion.
thresholdTokenFrequency tells Solr what percentage of documents the term must be included in before it's considered worth returning as a suggestion.
If you plan to use spellchecking on multiple phrases, you may need to add a ShingleFilter to your title_spell field.
Another thing you might try is setting your queryAnalyzerFieldType to title_spell.
Can you please try editing your requestHandler declaration.
<requestHandler name="/standard" class="solr.SearchHandler" default="true">
and query url as:
http://localhost:8080/solr/service/standard?q=<term>&qf=title_spell
First experiment with small terms and learn how it is behaving. One problem here is it will only return all the terms starting with the same query term. You can use FuzzyLookupFactory which will match and return fuzzy result. For more information check solr suggester wiki.
Hello everyone i am using solr 4.10 and i am not getting the result as per my expectation. i want to get auto complete suggestion using multiple fields that is discountCatName,discountSubName and vendorName. i have a created multi-valued field "suggestions" using copyfield and using that filed for searching in suggester configuration.
Note: discountSubName & discountCatName are again multi-valued field, vendorName is string.
This is a suggestion field data from one of my document:
"suggestions": [
"Budget Car Rental",
"Car Rentals",
"Business Deals",
"Auto",
"Travel",
"Car Rentals" ]
If i type for a "car" i am getting "Budget Car Rental" in my suggestion but not "Car Rentals", below are my configurations. let me know if i need to change the tokenizer and filters.Any help in this would be appreciate.
Below is my code block as per explained the scenario above.
Suggestion field,fieldType,searchComponent and request handler respectively which i am using for auto complete suggestions
<!--suggestion field -->
<field name="suggestions" type="suggestType" indexed="true" stored="true" multiValued="true"/>
<copyField source="discountCatName" dest="suggestions"/>
<copyField source="discountSubName" dest="suggestions"/>
<copyField source="vendorName" dest="suggestions"/>
<!--suggest fieldType -->
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
<!--suggest searchComponent configuration -->
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">analyzing</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="blenderType">linear</str>
<str name="minPrefixChars">1</str>
<str name="doHighlight">false</str>
<str name="weightField">score</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">suggestions</str>
<str name="buildOnStartup">true</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<!--suggest request handler -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">analyzing</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
I just discovered by debugging Solr 4.10 source code there is a bug in DocumentDictionaryFactory lookup, it's always look in the first string incase of multi-valued field and then stop suggestion from that document hence i am not getting expected output from my above configuration.
I have a created a separate index for all the fields i want to apply search like catName0...catName10, subName0...subName10 and then created multiple suggestion dictionaries for each fields and lastly i parsed the response form all the suggestion dictionary merged them and sorted based on weight and highlight position.
Lengthy approach but no other way as this solr 4.10 was required.
With solr, I try to highlighting some text using hl.formatter option with hl.simple.pre/post.
My problem is that the hl.simple.pre/post code doesn't appear sometime in the highlighting results, I don't understand why.
By example I call this URL :
http://localhost:8080/solr/Employees/select?q=lastName:anthan&fl=lastName&wt=json&indent=true&hl=true&hl.fl=lastName&hl.simple.pre=<em>&hl.simple.post=</em>
I get :
..."highlighting": {
"NB0094418": {
"lastName": [
"Yogan<em>anthan</em>" => OK
]
},
"NB0104046": {
"lastName": [
"Vijayakanthan" => KO, I want Vijayak<em>anthan</em>
]
},
"NB0144981": {
"lastName": [
"Parmananthan" => KO, I want Parman<em>anthan</em>
]
},...
Someone have an idea why I have this behavior ?
My configuration :
schema.xml
<fieldType name="nameType" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="50" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
</analyzer>
</fieldType>
...
<fields>
<field name="lastName" type="nameType" indexed="true" stored="true" required="true" />
</fields>
solrconfig.xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
</requestHandler>
...
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<int name="hl.fragsize">70</int>
<float name="hl.regex.slop">0.5</float>
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<encoder name="html" default="true" class="solr.highlight.HtmlEncoder" />
<fragListBuilder name="simple" default="true" class="solr.highlight.SimpleFragListBuilder" />
<fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder" />
<fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder">
</fragmentsBuilder>
<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[
<b style="background:yellow">,<b style="background:lawgreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
</highlighting>
</searchComponent>
I was dealing with a very similar problem until yesterday. I tried many different solutions, iteratively, so some details I ended up with of this may not be necessary. But I'll describe what I got working eventually. Short answer, I think the highlighter is failing to find the term position information it needs on longer fields.
Firstly, the symptoms I was seeing: sometimes the search term highlight would show up, and sometimes the entire field would show up in the highlighting section, but without the highlight information. The pattern ended up being based on both the length of the field, and the length of the search term. I found that the longer the field (actually, the token that was ngrammed), the shorter the search term that could be highlighted successfully. It wasn't 1-to-1, though. I found that for a field with 11 or fewer characters, highlighting worked fine in all cases. If the field had 12 characters, no ngram longer than 9 characters would be highlighted. For a field with 15 characters, ngrams longer than 7 characters would not be highlighted. For fields longer than 18 characters, ngrams longer than 6 characters would not be highlighted. And for fields longer than 21 characters, ngrams longer than 5 aren't highlighted, and fields longer than 24 characters wouldn't highlight more than 4 characters. (It looks like, from the examples you have above, that the specific sizes you are seeing are not exactly the same, but I do notice that the names in the documents where the highlighting did not work were longer than the one where it did.)
So, here's what ended up working:
I switched from using WhitespaceTokenizer and NGramFilterFactory to using NGramTokenizerFactory instead. (You are already using this, and I'll have more later on a difficulty this raised for me.) This wasn't sufficient to solve the problem, though, because the term positions still weren't being stored.
I started using the FastVectorHighlighter. This forced some changes in how my schema fields were indexed (including storing storing the term vectors, positions and offsets), and I also had to change my pre- and post- indicator tag configuration from hl.simple.pre to hl.tag.pre (and similarly for *post).
Once I had made these changes, the highlighting started working consistently. This had the side-effect, though, of removing the behavior I had been getting from the WhitespaceTokenizer. If I had a field that contained the phrase "this is a test" I was ending up with ngrams that included "s is a", "a tes", etc., and I really just wanted the ngrams of the individual words, not of the whole phrase. There is a note in the NGramTokenizer JavaDocs that you can override NGramTokenizer.isTokenChar() to provide pre-tokenizing, but I couldn't find an example of this on the web. I'll include one below.
End result:
WhitespaceSplittingNGramTokenizer.java:
package info.jwismar.solr.plugin;
import java.io.Reader;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.util.Version;
public class WhitespaceSplittingNGramTokenizer extends NGramTokenizer {
public WhitespaceSplittingNGramTokenizer(Version version, Reader input, int minGram, int maxGram) {
super(version, input, minGram, maxGram);
}
public WhitespaceSplittingNGramTokenizer(Version version, AttributeFactory factory, Reader input, int minGram,
int maxGram) {
super(version, factory, input, minGram, maxGram);
}
public WhitespaceSplittingNGramTokenizer(Version version, Reader input) {
super(version, input);
}
#Override
protected boolean isTokenChar(int chr) {
return !Character.isWhitespace(chr);
}
}
WhitespaceSplittingNGramTokenizerFactory.java:
package info.jwismar.solr.plugin;
import java.io.Reader;
import java.util.Map;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeSource.AttributeFactory;
public class WhitespaceSplittingNGramTokenizerFactory extends TokenizerFactory {
private final int maxGramSize;
private final int minGramSize;
/** Creates a new WhitespaceSplittingNGramTokenizer */
public WhitespaceSplittingNGramTokenizerFactory(Map<String, String> args) {
super(args);
minGramSize = getInt(args, "minGramSize", NGramTokenizer.DEFAULT_MIN_NGRAM_SIZE);
maxGramSize = getInt(args, "maxGramSize", NGramTokenizer.DEFAULT_MAX_NGRAM_SIZE);
if (!args.isEmpty()) {
throw new IllegalArgumentException("Unknown parameters: " + args);
}
}
#Override
public Tokenizer create(AttributeFactory factory, Reader reader) {
return new WhitespaceSplittingNGramTokenizer(luceneMatchVersion, factory, reader, minGramSize, maxGramSize);
}
}
These need to be packaged up into a .jar and installed someplace where SOLR can find it. One option is to add a lib directive in solrconfig.xml to tell SOLR where to look. (I called mine solr-ngram-plugin.jar and installed it in /opt/solr-ngram-plugin/.)
Inside solrconfig.xml:
<lib path="/opt/solr-ngram-plugin/solr-ngram-plugin.jar" />
schema.xml (field type definition):
<fieldType name="any_token_ngram" class="solr.TextField">
<analyzer type="index">
<tokenizer class="info.jwismar.solr.plugin.WhitespaceSplittingNGramTokenizerFactory" maxGramSize="30" minGramSize="2"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory"
pattern="^(.{30})(.*)?" replacement="$1" replace="all" />
</analyzer>
</fieldType>
schema.xml (field definitions):
<fields>
<field name="property_address_full" type="string" indexed="false" stored="true" />
<field name="property_address_full_any_ngram" type="any_token_ngram" indexed="true"
stored="true" omitNorms="true" termVectors="true" termPositions="true"
termOffsets="true"/>
</fields>
<copyField source="property_address_full" dest="property_address_full_any_ngram" />
solrconfig.xml (request handler (you can pass these parameters in the normal select URL, instead, if you prefer)):
<!-- request handler to return typeahead suggestions -->
<requestHandler name="/suggest" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="defType">edismax</str>
<str name="rows">10</str>
<str name="mm">2</str>
<str name="fl">*,score</str>
<str name="qf">
property_address_full^100.0
property_address_full_any_ngram^10.0
</str>
<str name="sort">score desc</str>
<str name="hl">true</str>
<str name="hl.fl">property_address_full_any_ngram</str>
<str name="hl.tag.pre">|-></str>
<str name="hl.tag.post"><-|</str>
<str name="hl.fragsize">1000</str>
<str name="hl.mergeContinuous">true</str>
<str name="hl.useFastVectorHighlighter">true</str>
</lst>
</requestHandler>
If you are asking why your configuration defined hl.tag.pre and hl.tag.post are not appearing in the sample query you gave and is instead showing the <em> and </em> pre/post tags...
This is because you are specifying the hl.tag.pre and hl.tag.post parameters in the query string (at request time). Therefore, they are overriding the defaults settings you have defined for the highlight searchComponent in your solrconfig.xml file.
Either remove those query string parameters or set the searchComponent configuration file to set hl.tag.pre and hl.tag.post in a <lst name="invariant"> to force these to override any request time parameters.
Here is an overview of the various Configuration Settings
I am working with Solr Spell Check . Got it up and running . However for certain misspells it is not giving the expected result :
Correct Word : Cancer
Incorrect Spelling : Cacner ,cacnar , cancar ,cancre,cancere .
I am not getting "cancer" as the suggestion for "cacnar" instead it shows "inner" which although sounds more like cacner is not the correct suggestion . And for cacnar again I am getting a suggestion as 'pulmonary'.
Any way of configuring it to display cancer instead of the other results ?
Alternatively is there any score for the suggestions that can be referred to before showing it to the user ?
As per request here is the configuration :
The field used for dictionary (in schema.xml):
<copyField source="procname" dest="dtextspell" />
<field name = "dtextspell" stored="false" type="text_small" multiValued="true" indexed="true"/>
Definition of "text_small" (again in schema.xml) :
<fieldType name="text_small" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
<analyzer type ="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
</fieldType>
In solrconfig.xml :
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_small</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="field">dtextspell</str>
<float name="thresholdTokenFrequency">.0001</float>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="field">name</str>
<str name="buildOnCommit">true</str>
</lst></searchComponent>
Attached it to the select request handler like this :
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="spellcheck.count">10</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr> </requestHandler>
To build the spell check :
http://localhost:8080/solr/select?q=*:*&spellcheck=true&spellcheck.build=true
To search for term :
http://localhost:8080/solr/select?q=procname:%22cacner%22&spellcheck=true&defType=edismax
The response XML :
<lst name="spellcheck"><lst name="suggestions">
<lst name="cacner">
<int name="numFound">1</int>
<int name="startOffset">10</int>
<int name="endOffset">16</int>
<arr name="suggestion">
<str>inner</str> <end tags start from here>
Hope it helps !!
Sounds like you've not rebuilt the spellchecker's index recently. Request a manual update by make a query with spellcheck=true&spellcheck.build=true appended to the query string (do NOT do this on every request, as the build process can take some time). You should also make sure that you're using the correct field to build your spellchecker's index.
You can also configure the spellchecker component to rebuild the index on every commit or on every optimize, by adding:
<str name="buildOnCommit">true</str>
or
<str name="buildOnOptimize">true</str>
to your spellchecker configuration.