Solr spellcheck configuration

I am trying to build the spellcheck index with IndexBasedSpellChecker:
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">text</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
And I want to specify the dynamic field "*_text" as the field option:
<dynamicField name="*_text" stored="false" type="text" multiValued="true" indexed="true"/>
How can this be done?

Copy all the text fields to one field:
<copyField source="*_text" dest="textSpell" />
and then build the spellcheck index from the field "textSpell":
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">textSpell</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
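The copyField above only works if its destination exists in schema.xml. A minimal sketch of that field follows, assuming the same "text" type as the dynamic field in the question; the attribute values are illustrative rather than taken from the original answer. The field is multiValued because several "*_text" sources may be copied into it for one document.

```xml
<!-- Assumed definition of the copyField destination; adjust the type to your schema -->
<field name="textSpell" type="text" indexed="true" stored="false" multiValued="true"/>
```

The field does not need to be stored, since the spellcheck index is built from the indexed terms.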

These may be helpful: "Implementation of solr spellchecker" and "spellCheckComponent".

Related

Solr 7.2 suggester contextField filter returning no results

I am trying to filter using a contextField in Solr 7.2. In solrconfig.xml I have the following:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggest_artist</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="dictionaryimpl">DocumentDictionaryFactory</str>
<str name="field">artist</str>
<str name="weightField">monthly_dlds</str>
<str name="contextField">territory</str>
<str name="queryAnalyzerFieldType">phrase_suggest</str>
<str name="suggestAnalyzerFieldType">text_suggest</str>
<str name="buildOnStartup">true</str>
<str name="buildOnCommit">true</str>
<str name="storeDir">suggest_a</str>
<str name="indexPath">suggest_a</str>
<str name="highlight">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">all</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
In my schema, the territory field is configured as follows:
<field name="territory" type="string" indexed="true" stored="true" multiValued="true"/>
The territory field is multivalued, containing territories (['US', 'CA', etc.]).
I run the suggest query as follows:
http://localhost:8983/solr/test_suggester/suggest?suggest.dictionary=suggest_artist&suggest.q=m&suggest.cfq=US
and I get a response with no suggestions found.
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"echoParams":"all",
"indent":"true",
"suggest.q":"m",
"suggest.count":"10",
"suggest":"true",
"suggest.dictionary":"suggest_artist",
"wt":"json",
"suggest.cfq":"US"}},
"suggest":{"suggest_artist":{
"m":{
"numFound":0,
"suggestions":[]}}}}
Without suggest.cfq=US I get a list of suggestions (and I have verified, by searching with fq=territory:US, that there are items that should be returned). I have also tried a single-valued field, with both boolean (e.g. us_terr:true) and string (us_terr:"t") field types, and the results were the same. The suggester is in its own separate collection on SolrCloud, with only one shard.
The issue was that dictionaryImpl was misspelt.
<str name="dictionaryimpl">DocumentDictionaryFactory</str>
should be:
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
This meant that it was using the default dictionary implementation, HighFrequencyDictionaryFactory, which doesn't support context filtering.
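For reference, here is a trimmed sketch of the suggester entry with only the parameters that context filtering depends on. It reuses the names from the question; the selection of parameters is an assumption based on the Solr reference guide, which describes suggest.cfq as supported by the infix lookups (AnalyzingInfixLookupFactory and BlendedInfixLookupFactory) when backed by a document dictionary.

```xml
<lst name="suggester">
  <str name="name">suggest_artist</str>
  <!-- An infix lookup, as required for context filtering -->
  <str name="lookupImpl">BlendedInfixLookupFactory</str>
  <!-- Note the capital "I"; with the misspelt name the default dictionary was used instead -->
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">artist</str>
  <!-- Field whose values suggest.cfq is matched against -->
  <str name="contextField">territory</str>
  <str name="suggestAnalyzerFieldType">text_suggest</str>
</lst>
```

After correcting the name, the suggester has to be rebuilt before suggest.cfq=US can return anything.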

Couldn't get data in suggester even when storeDir getting created by FileDictionaryFactory

This is a follow-up to an earlier question. I have a list of cities for which I want to implement a spell-checker, and I have the priorities/weights of these cities. I tried implementing a Solr suggester based on FileDictionaryFactory, with the following file format:
<city-name> <TAB> <weight> <TAB> <other parameters like citycode,country>
I am passing other attributes, such as citycode and country, as a pipe-separated payload string.
Here's my solrconfig:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">name</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
and my schema
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="countrycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<uniqueKey>id</uniqueKey>
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>
The problem is that I get 0 results for every search query, even though I can see the storeDir being created, and it contains a bin file whose data looks like my payload data.
This is the URL format I am using:
/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=cologne
So, I have the following questions:
What does the creation of storeDir signify? Has the data been indexed successfully?
If yes, then what is wrong with my query? If no, am I missing something here (indexPath?)?
Is this the right way to supply search parameters in the payload field? If not, is there another way?
Your solrconfig.xml needs a slight change: either remove buildOnStartup from the suggester configuration or set it to true.
[solrconfig.xml]
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">name</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">true</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The file-based suggester has a limitation: it does not build its suggestions just because a query sets suggest=true. You need to build the file-based suggestions on startup.
I was using searchfield as the defaultSearchField in the schema, but had configured name as the suggester's field. As soon as I changed field to searchfield and suggestAnalyzerFieldType to text_ngram, it started working.
Here is the working solrconfig:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggestions</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="field">searchfield</str>
<str name="weightField">searchscore</str>
<str name="suggestAnalyzerFieldType">text_ngram</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<str name="sourceLocation">spellings.txt</str>
<str name="storeDir">autosuggest_dict</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">suggestions</str>
<str name="suggest.dictionary">results</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>

Solr Spellcheck request returns nothing

I am using Solr 4.8.1 and have set up spellcheck. After indexing, the request doesn't return any suggestions.
Following @n0tting's advice, I modified my files slightly.
Here are the steps:
1- solrconfig.xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">phraseText</str>
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="spellcheckIndexDir">./spellchecker</str>
<str name="name">default</str>
<str name="field">title_spellcheck</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
Add the spellcheck settings to the standard requestHandler:
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<!-- default values for query parameters -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
<str name="spellcheck.dictionary">default</str>
<!-- omp = Only More Popular -->
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
2- schema.xml
Define a field for spell check:
<field name="title_spellcheck" type="phraseText" indexed="true" stored="false" multiValued="true" />
<copyField source="title" dest="title_spellcheck"/>
3- Request:
.../select?q=recommend&defType=edismax&qf=title&spellcheck=true&spellcheck.build=true&spellcheck.q=recommend&spellcheck.collate=true
I don't get any suggestions in the result, nor a <lst name="spellcheck"> element. Can anybody give me some advice? Thanks a lot.
References:
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
http://solr.pl/en/2011/05/23/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-spellcheckcomponent-%E2%80%93-did-you-really-mean-that-part-5/

Autocomplete for phrases solrj

I am trying to add an autocomplete feature for phrase queries. I have the following configuration.
In schema.xml:
<field name="textSpell" type="spell" indexed="true" stored="true"
multiValued="true" termVectors="true" termPositions="true"
termOffsets="true" />
<field name="suggest_phrase" type="suggest_phrase" indexed="true"
stored="false" multivalued="false"/>
In solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory</str> <!-- org.apache.solr.spelling.suggest.fst -->
<str name="dictionaryImpl">DocumentDictionaryFactory</str> <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
<str name="field">textSpell</str>
<float name="thresholdTokenFrequency">.0001</float>
<!-- <str name="weightField">price</str>-->
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnCommit">true</str>
<!--<str name="buildOnOptimize">true</str>-->
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<!-- Suggest Phrase -->
<searchComponent name="suggest_phrase" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest_phrase</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
<str name="field">suggest_phrase</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="solr.SearchHandler"
name="/suggest_phrase" startup="lazy">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest_phrase</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">false</str>
</lst>
<arr name="components">
<str>suggest_phrase</str>
</arr>
</requestHandler>
I am also setting the following parameters on the query sent to Solr using SolrJ:
SolrQuery suggestQuery = new SolrQuery();
suggestQuery.setParam(CommonParams.QT, "/terms");
suggestQuery.setParam(TermsParams.TERMS, true);
suggestQuery.setParam(TermsParams.TERMS_LIMIT, "5");
suggestQuery.setParam(TermsParams.TERMS_FIELD,"content");
suggestQuery.setParam(TermsParams.TERMS_LOWER, query);
suggestQuery.setParam(TermsParams.TERMS_PREFIX_STR, query);
suggestQuery.setParam("spellCheck", "true");
suggestQuery.setParam("spellcheck.q", query);
However, it doesn't yield results for phrase queries; it works only on single terms. Any suggestions? I am using Solr 4.10.2.
You are using two fieldTypes: "spell" and "suggest_phrase." How are they defined? The first thing I would check is whether or not you are using a WhitespaceTokenizerFactory on them - in which case, it wouldn't work over a phrase because the space in a phrase would terminate the token.
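To illustrate the tokenizer point, here is a minimal sketch of a field type that keeps a whole phrase as a single token. The name "suggest_phrase" is borrowed from the question, but the definition itself is an assumption, since the original field types were not posted.

```xml
<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emit the entire input as one token, so spaces do not split the phrase -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase the token so matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a whitespace-based tokenizer, "new york" would be indexed as two separate terms, and only single-word completions could be offered.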

Basic UIMA with SOLR

I am trying to connect UIMA with Solr. I have downloaded the Solr 3.5 distribution and have it running successfully with Nutch and Tika on Windows 7, using Solr Cell and curl via Cygwin.
To begin, I copied the 6 jars from solr/contrib/uima/lib to the working /lib in solr.
Next, I read the readme.txt file in solr/contrib/uima/lib and edited both my solrconfig.xml and schema.xml to no avail.
I then found this link, which seemed a bit more applicable since I didn't care to use Alchemy or OpenCalais: http://code.google.com/a/apache-extras.org/p/rondhuit-uima/?redir=1
Still, when I run a curl command that imports a PDF via Solr Cell, I get neither the additional UIMA fields nor anything in my logs. The test.pdf is parsed, though, and I can see the PDF in Solr, using:
curl 'http://localhost:8080/solr/update/extract?fmap.content=content&literal.id=doc1&commit=true' -F "file=@test.pdf"
SolrConfig.XML
<updateRequestProcessorChain name="uima">
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
<str name="host">http://localhost</str>
<str name="port">8080</str>
</lst>
<str name="analysisEngine">C:\uima\desc\com\rondhuit\uima\desc\NextAnnotatorDescriptor.xml</str>
<bool name="ignoreErrors">true</bool>
<str name="logField">id</str>
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>content</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="type">
<str name="name">com.rondhuit.uima.next.NamedEntity</str>
<lst name="mapping">
<str name="feature">entity</str>
<str name="fieldNameFeature">uname</str>
<str name="dynamicField">*_sm</str>
</lst>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update/uima" class="solr.XmlUpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">uima</str>
</lst>
</requestHandler>
I also adjusted my requestHandler:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
<lst name="defaults">
<str name="update.processor">uima</str>
</lst>
</requestHandler>
Schema.XML
<!-- fields for UIMA -->
<field name="uname" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<dynamicField name="*_sm" type="string" indexed="true" stored="true"/>
All I am trying to do is have UIMA pull out names from text (just to start, as a demo), and I cannot figure out what I am doing wrong.
Thank you in advance for reading this.
Not sure if this ever got addressed, but in case someone else is looking, I had this same problem yesterday. I figured out that I was calling /update/extract to use Solr Cell, and that handler doesn't run UIMA because the uima chain is integrated into /update.
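One way to act on this is to attach the same chain to the extracting handler. This is a sketch, assuming /update/extract is declared with the stock ExtractingRequestHandler as in the example solrconfig.xml shipped with Solr, and that the chain is named "uima" as in the question.

```xml
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Send documents extracted by Solr Cell through the "uima" chain -->
    <str name="update.chain">uima</str>
    <!-- Map the extracted body to the "content" field that the chain analyzes -->
    <str name="fmap.content">content</str>
  </lst>
</requestHandler>
```

Alternatively, update.chain=uima can be passed as a request parameter on the existing curl call instead of being set as a default.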

Resources