I am currently using Solr 4.8.1 and have set up spellcheck. After indexing, my request does not return any suggestions.
Following the advice of #n0tting, I modified my files slightly.
Here are the steps:
1. solrconfig.xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">phraseText</str>
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="name">default</str>
<str name="field">title_spellcheck</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
Then add the spellcheck settings to the standard requestHandler:
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<!-- default values for query parameters -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
<str name="spellcheck.dictionary">default</str>
<!-- omp = Only More Popular -->
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
2. schema.xml
Define a field for spell check:
<field name="title_spellcheck" type="phraseText" indexed="true" stored="false" multiValued="true" />
<copyField source="title" dest="title_spellcheck"/>
3. Request:
.../select?q=recommend&defType=edismax&qf=title&spellcheck=true&spellcheck.build=true&spellcheck.q=recommend&spellcheck.collate=true
I don't get any suggestions in the result, nor a <lst name="spellcheck"> element. Can anybody give me some advice? Thanks a lot.
References:
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
http://solr.pl/en/2011/05/23/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-spellcheckcomponent-%E2%80%93-did-you-really-mean-that-part-5/
Related
This is a follow-up to this question. I have a list of cities for which I want to implement a spell-checker. I have the priorities/weights of these cities. I tried implementing a Solr suggester based on FileDictionaryFactory, with a dictionary file in the following format:
<city-name> <TAB> <weight> <TAB> <other parameters like citycode,country>
I am passing the other attributes, such as citycode and country, as a pipe-separated payload string.
Here's my solrconfig
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">name</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
and my schema
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="countrycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<uniqueKey>id</uniqueKey>
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>
The problem is that I get 0 results for every search query, even though I can see the storeDir being created and it contains a bin file whose data looks like my payload data.
This is the URL format I am using:
/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=cologne
So, I have the following questions:
What does the creation of storeDir signify? Was the data indexed successfully?
If yes, what is wrong with my query? If no, am I missing something here (indexPath?)?
Is this the right way to supply search parameters in the payload field? If not, is there another way?
Your solrconfig.xml needs a slight change: remove buildOnStartup from the suggester configuration, or set it to true.
[solrconfig.xml]
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">name</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">true</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The problem with the file-based suggester is that it does not build its suggestions just because a query sets suggest=true. You need to build the file-based suggester on startup.
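The point above can be sketched as a stripped-down suggester definition. This is only a sketch under assumptions: it assumes spellings.txt holds one entry per line (term, weight, payload, separated by tabs), and it leaves out field and weightField because, as far as I understand, FileDictionaryFactory takes its entries and weights from the file rather than from the index. Check the parameter names against your Solr version before relying on it.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <!-- Entries, weights and payloads are read from the file below -->
    <str name="dictionaryImpl">FileDictionaryFactory</str>
    <str name="sourceLocation">spellings.txt</str>
    <!-- With a plain string type, suggest.q is matched without lowercasing,
         so "cologne" and "Cologne" are treated as different prefixes -->
    <str name="suggestAnalyzerFieldType">string</str>
    <!-- Build the lookup structure when the core starts -->
    <str name="buildOnStartup">true</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
</searchComponent>
```

A build can also be requested explicitly by adding suggest.build=true to a request, for example /suggest?suggest.build=true&suggest.dictionary=mySuggester&suggest.q=cologne, which avoids paying the build cost on every startup.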
I was using searchfield as the defaultSearchField in the schema, but had configured name as the suggest field. The moment I changed field to searchfield and suggestAnalyzerFieldType to text_ngram, it started working.
Here is the working solrconfig:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggestions</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">searchfield</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">text_ngram</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggestions</str>
<str name="suggest.dictionary">results</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
I am trying to add an autocomplete feature for phrase queries. I have the following configuration.
In the schema.xml file:
<field name="textSpell" type="spell" indexed="true" stored="true"
multiValued="true" termVectors="true" termPositions="true"
termOffsets="true" />
<field name="suggest_phrase" type="suggest_phrase" indexed="true"
stored="false" multivalued="false"/>
In solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory</str> <!-- org.apache.solr.spelling.suggest.fst -->
<str name="dictionaryImpl">DocumentDictionaryFactory</str> <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
<str name="field">textSpell</str>
<float name="thresholdTokenFrequency">.0001</float>
<!-- <str name="weightField">price</str>-->
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnCommit">true</str>
<!--<str name="buildOnOptimize">true</str>-->
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<!-- Suggest Phrase -->
<searchComponent name="suggest_phrase" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest_phrase</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
<str name="field">suggest_phrase</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="solr.SearchHandler"
name="/suggest_phrase" startup="lazy">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest_phrase</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">false</str>
</lst>
<arr name="components">
<str>suggest_phrase</str>
</arr>
</requestHandler>
I am also setting the following parameters on the query sent to Solr using SolrJ:
SolrQuery suggestQuery = new SolrQuery();
suggestQuery.setParam(CommonParams.QT, "/terms");
suggestQuery.setParam(TermsParams.TERMS, true);
suggestQuery.setParam(TermsParams.TERMS_LIMIT, "5");
suggestQuery.setParam(TermsParams.TERMS_FIELD,"content");
suggestQuery.setParam(TermsParams.TERMS_LOWER, query);
suggestQuery.setParam(TermsParams.TERMS_PREFIX_STR, query);
suggestQuery.setParam("spellCheck", "true");
suggestQuery.setParam("spellcheck.q", query);
However, it doesn't yield results for phrase queries; it works only on single terms. Any suggestions? I am using Solr 4.10.2.
You are using two fieldTypes: "spell" and "suggest_phrase". How are they defined? The first thing I would check is whether you are using a WhitespaceTokenizerFactory on them. If so, it won't work over a phrase, because each space in the phrase ends the current token and the phrase is indexed as separate single-word terms.
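To illustrate the tokenizer point, here is a minimal sketch of a field type that keeps a whole phrase as one token. The asker's real definition of suggest_phrase is not shown, so this definition is hypothetical; only the type name is taken from the question.

```xml
<!-- Hypothetical definition; the original suggest_phrase type is not shown in the question -->
<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- KeywordTokenizerFactory emits the entire field value as a single token, spaces included -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase the token so lookups do not depend on the case the user types -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a type like this, a value such as "new york city" is indexed as one term, so a prefix such as "new y" can match the full phrase. The documents need to be reindexed after the field type changes.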
I am having an issue with my Solr settings.
After a lot of investigation today, I found that it is the spellcheck component that causes Core Reload to hang.
If it is turned off, everything runs well and the core reloads easily. However, when spellcheck is on, the core won't reload and instead hangs forever. The only way to bring the project back to life is then to stop Solr, delete the data folder, and start Solr again.
Here are the Solr config settings for spellcheck:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<!-- Spell checking defaults -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">3</str>
<str name="spellcheck.maxCollationTries">3</str>
<str name="spellcheck.collateExtendedResults">true</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_en_splitting</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">location_details</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="buildOnCommit">true</str>
<float name="accuracy">0.5</float>
<float name="thresholdTokenFrequency">.01</float>
<int name="maxEdits">1</int>
<int name="minPrefix">3</int>
<int name="maxInspections">3</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.001</float>
</lst>
</searchComponent>
Here is the field from schema:
<field name="location_details" type="text_en_splitting" indexed="true" stored="false" required="false" />
Basically, it is a bug in Solr. You just need to comment out or remove the following line from your requestHandler:
<!--<str name="spellcheck.maxCollationTries">3</str> here is a bug, put this parameter in the actual query string instead -->
If you really need maxCollationTries, you can pass it as a query parameter in your URL instead.
I need to customize Solr highlighting prefix and suffix like this:
<span class="highlight">text</span>
instead of the default
<em>text</em>
That's why I'm using this configuration within the solrconfig.xml for the HighlightComponent:
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<fragmentsBuilder name="simple" default="true" class="solr.highlight.SimpleFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.tag.post"><![CDATA[</span>]]></str>
</lst>
</fragmentsBuilder>
</highlighting>
</searchComponent>
The following are the default parameters for my standard request handler:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="hl">true</str>
<str name="hl.fl">body,title</str>
<str name="hl.useFastVectorHighlighter">true</str>
</lst>
</requestHandler>
When I search for the word "text", it does get highlighted, but not always with the prefix and suffix I configured:
<lst name="highlighting">
<lst name="document_1">
<arr name="body">
<str>my <em>text</em> highlighted</str>
</arr>
<arr name="title">
<str>my <span class="highlight">text</span> highlighted</str>
</arr>
</lst>
</lst>
Does anybody know why?
I am guessing you are seeing this behavior because you only have the prefix and suffix defined for the SimpleFragmentsBuilder, and the other highlights are coming from a different highlighter component.
I use a custom prefix and suffix for my highlighting. I set them in the formatter element of the highlighting section of solrconfig.xml and have not had any issues, as they apply to all fragment builders.
So maybe try the following:
<highlighting>
<fragmentsBuilder name="simple" default="true"
class="solr.highlight.SimpleFragmentsBuilder"/>
<!-- Configure the standard formatter -->
<formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
default="true">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.simple.post"><![CDATA[</span>]]></str>
</lst>
</formatter>
</highlighting>
I finally found out why! I'm using fastVectorHighlighter to make highlighting faster.
At the beginning I was highlighting only the title field and everything worked fine.
When I added the body field to highlighting I forgot to enable termVectors=true.
Now that my body field looks like this
<field name="body" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
after a full reindex highlighting is working perfectly:
<lst name="highlighting">
<lst name="document_1">
<arr name="body">
<str>my <span class="highlight">text</span> highlighted</str>
</arr>
<arr name="title">
<str>my <span class="highlight">text</span> highlighted</str>
</arr>
</lst>
</lst>
Previously the body field highlighting did work, but without fastVectorHighlighter since the field didn't have the termVectors=true parameter. That's why I got body highlighted with default prefix and suffix. Since fastVectorHighlighter is a completely different highlighting method, the configuration is different as well.
To avoid this kind of mistake, as long as users can choose which fields to highlight with the hl.fl parameter, I'd recommend also including the configuration for the standard highlighter (formatter element, class solr.highlight.HtmlFormatter), like this:
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.simple.post"><![CDATA[</span>]]></str>
</lst>
</formatter>
<fragmentsBuilder name="simple" default="true" class="solr.highlight.SimpleFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[<span class="highlight">]]></str>
<str name="hl.tag.post"><![CDATA[</span>]]></str>
</lst>
</fragmentsBuilder>
</highlighting>
</searchComponent>
This way highlighting will work with the same prefix and suffix even for fields with termVectors disabled.
I am trying to set up the spellchecker according to the Solr documentation, but when I test it I don't get any suggestions. My configuration follows:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="name">default</str>
<str name="field">name</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
<str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>
<requestHandler name="/spellcheck" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
<str name="spellcheck.dictionary">default</str>
<!-- omp = Only More Popular -->
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
The query I send to Solr:
q=%2B%28text%3A%28gasal%29%29&suggestField=contentOriginal&ontologySeed=gasal&spellcheck.build=true&spellcheck.q=gasal&spellcheck=true&spellcheck.collate=true&hl=true&hl.snippets=5&hl.fl=text&hl.fl=text&rows=12&start=0&qt=%2Fsuggestprobabilistic
Does anybody know why? Thanks in advance.
First, don't specify queryAnalyzerFieldType twice in the component configuration.
It is recommended not to use a separate /spellcheck handler, but instead to bind the spellcheck component to the standard query handler (or dismax, if that is what you use) like this:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
...
</lst>
<arr name="last-components">
<str>spellcheck</str>
...
</arr>
</requestHandler>
You can then call it like this:
http://localhost:8983/solr/select?q=komputer&spellcheck=true
Also don't forget to build the spellcheck dictionary before you use it:
http://localhost:8983/solr/select/?q=*:*&spellcheck=true&spellcheck.build=true
You can force the dictionary to build at each commit by configuring it in the component:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="name">default</str>
<str name="field">name</str>
<str name="spellcheckIndexDir">./spellchecker1</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
Finally, make sure that your name field really is an indexed field of type textSpell and that it contains enough content to build a good dictionary. In my case, I have a field named spellchecker that is populated from a couple of other fields of my index (using copyField instructions in the schema).
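The copyField setup mentioned above might look roughly like the following schema.xml fragment. It is a sketch, not my actual schema: the source field names title and description are placeholders, and the textSpell analyzer chain is an assumed, deliberately light one (no stemming) so that dictionary terms stay close to the words users type.

```xml
<!-- Assumed analysis chain: tokenize and lowercase only, no stemming -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Dedicated dictionary field; multiValued because several source fields are copied into it -->
<field name="spellchecker" type="textSpell" indexed="true" stored="false" multiValued="true"/>

<!-- "title" and "description" are placeholder source field names -->
<copyField source="title" dest="spellchecker"/>
<copyField source="description" dest="spellchecker"/>
```

The spellcheck component would then point at this field with <str name="field">spellchecker</str> instead of name. Existing documents have to be reindexed before the copied values show up in the dictionary.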