Need help deciding which type of spellchecker to use in Solr

I have a list of cities in a MySQL db that is hooked up to a UI for autocompletion. I am currently using Solr 5.3.0. Data is imported through scheduled delta imports. I have the following questions:
I want to add spell checking to this feature. I tried using:
DirectSolrSpellChecker
IndexBasedSpellChecker
FileBasedSpellChecker
Of these three, only FileBasedSpellChecker gives suggestions
that all exist in the db. For example, when searching for
cologne I got results like
{
"responseHeader":{
"status":0,
"QTime":4,
"params":{
"q":"searchfield:kolakata",
"indent":"true",
"spellcheck":"true",
"wt":"json"}},
"response":{"numFound":0,"start":0,"docs":[]
},
"spellcheck":{
"suggestions":[
"cologne",{
"numFound":4,
"startOffset":12,
"endOffset":19,
"suggestion":["Cologne",
"Bologna",
"Cogne",
"Bastogne"]}],
"collations":[
"collation","searchfield:Cologne"]}}
These suggestions are accurate, and all of them exist in the db/file.
But when I use the other two I get results like
{
"responseHeader":{
"status":0,
"QTime":4,
"params":{
"q":"searchfield:kolakata",
"indent":"true",
"spellcheck":"true",
"wt":"json"}},
"response":{"numFound":0,"start":0,"docs":[]
},
"spellcheck":{
"suggestions":[
"cologne",{
"numFound":4,
"startOffset":12,
"endOffset":19,
"suggestion":["Cologne",
"Cologn",
"Colognei"]}],
"collations":[
"collation","searchfield:Cologne"]}}
Some of these suggestions are not cities present in my db.
Although FileBasedSpellChecker gives satisfactory results, I am
a little apprehensive about using it, because I would need to
update the file manually every time a city is added or removed.
I have also read that FileBasedSpellChecker is generally not
recommended.
I also need the suggestions to be searchable. Currently I use
the docs returned in
"responseHeader":{"response":{"docs":[<some-format>]}}
to search for results in that city, but now I want the suggester to
return its results in the same <some-format> instead of plain
strings, so that it integrates properly with the UI.
One minor request is to sort the suggestions in ascending
order of edit (Levenshtein) distance. This is not a hard requirement
and is negotiable.
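On the ordering point, the Lucene-based spellcheckers (IndexBasedSpellChecker and FileBasedSpellChecker) accept a comparatorClass setting. Its built-in value score, which is also the default, ranks suggestions by string-distance score before frequency, while freq does the reverse. A minimal sketch that makes this explicit on the file dictionary from the config shown in the edit below; the score comes from whichever distance measure the spellchecker is configured with, so it may not equal raw Levenshtein distance:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_ngram</str>
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- "score" ranks by string-distance score first; "freq" ranks by term frequency first -->
    <str name="comparatorClass">score</str>
  </lst>
</searchComponent>
```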
Edit:
My solrconfig.xml looks like this:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">searchfield</str>
<str name="spellcheck">true</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.dictionary">file</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.count">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
and
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_ngram</str>
<lst name="spellchecker">
<str name="name">file</str>
<str name="classname">solr.FileBasedSpellChecker</str>
<str name="sourceLocation">spellings.txt</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
</searchComponent>
schema looks like this:
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" multiValued="false" />
<field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>
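One possible explanation for fragments such as "Cologn" is that the index-based checkers build their dictionary from indexed terms, and searchfield is analyzed as text_ngram, so partial grams can end up as suggestion candidates. A sketch of one way around this, assuming a hypothetical field named spell and the stock text_general field type (neither appears in the configuration above): copy name into a field that keeps whole words and point the spellchecker at that field. Since the field is filled by the same delta imports, no file has to be maintained by hand.

```xml
<!-- schema.xml: hypothetical field that keeps whole city names (no n-grams) -->
<field name="spell" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="spell"/>

<!-- solrconfig.xml: spellchecker that reads terms directly from the "spell" field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- illustrative values, to be tuned against real data -->
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
  </lst>
</searchComponent>
```

With this sketch the request handler would need spellcheck.dictionary set to default instead of file, and the documents would have to be reindexed so that the spell field is populated.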

Related

Solr server Context Filtering in Auto suggester not working

I'm having a problem using Context Filtering with the auto suggester. I want to filter the suggestions based on the url field.
Here is my searchComponent:
<lst name="suggester">
<str name="name">AnalyzingInfixSuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">main_title</str>
<str name="weightField">main_title</str>
<str name="contextField">url</str>
<str name="suggestAnalyzerFieldType">text_general</str>
</lst>
Here are the fields in my schema:
<field name="main_title" type="string" indexed="true" stored="true"/>
<field name="url" type="string" indexed="true" stored="true"/>
Example:
I'm searching for "aacsb" and I have two results, which is correct. One is in English and one in German. I want to filter them out and show only the German result.
My URLs look like this:
https://www.myWebsite.com/aacsb-dog-lion?german
https://www.myWebsite.com/aacsb-dog-lion?english
Here are my queries:
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=-url:english
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=-english
With these I'm receiving both results. It doesn't matter if we have the field name or not.
When I tried these
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=url:english
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=english
I don't receive any results.
I read the documentation several times, but I still can't make it work.
Any help is welcomed.
Thanks!
EDIT:
I pasted the wrong queries; these are the correct ones:
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=url:*english*
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=*english*
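A possible workaround, sketched under the assumption that context filtering matches whole values of the context field rather than substrings of a URL: store the language in its own field and use that as the contextField. The field name lang and the component name suggest are hypothetical and not taken from the configuration above. weightField is left out here because it is meant to point at a numeric field, which main_title is not.

```xml
<!-- schema.xml: hypothetical field with one context value per document, e.g. "german" or "english" -->
<field name="lang" type="string" indexed="true" stored="true"/>

<!-- solrconfig.xml -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">AnalyzingInfixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">main_title</str>
    <!-- context values are read from the stored "lang" field -->
    <str name="contextField">lang</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```

After reindexing and rebuilding the suggester (suggest.build=true), a request with suggest.cfq=german would then be expected to return only the German suggestion.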

Solr edismax relevancy sorting multiple fields

I use the edismax query parser to handle user queries against our Solr 4.4 server.
I'm getting the correct results, but I need help with prioritization.
For example, if I give q=ideapad miix 310:
1) It returns all the exact matches, and this works fine. If a result
contains only ideapad instead of the full phrase, it should get the lowest priority.
2) Results should be prioritized in this order:
field8, keywords, product, marketing, description. Here too, ideapad alone
should have the lowest priority.
My bq:
bq:text:"ideapad miix 310"^20000 OR (text:"miix"^12000 -text:ideapad^-20 -text:thinkpad^-20 -text:ideacentre^-20 -text:thinkcentre^-20 text:"310"^1000 -text:ideapad^-20 -text:thinkpad^-20 -text:ideacentre^-20 -text:thinkcentre^-20)
URL
http://localhost:8983/solr/collection1/select?q=ideapad+miix+310&defType=edismax&bq=text%3A%22ideapad+miix+310%22%5E20000++OR+(text%3A%22miix%22%5E12000+-text%3Aideapad%5E-20+-text%3Athinkpad%5E-20+-text%3Aideacentre%5E-20+-text%3Athinkcentre%5E-20+text%3A%22310%22%5E1000+-text%3Aideapad%5E-20+-text%3Athinkpad%5E-20+-text%3Aideacentre%5E-20+-text%3Athinkcentre%5E-20)
I use the catch-all field "text", into which each field (field8, keywords, etc.) is copied, and I boost the individual fields:
<field name="field8" type="text_search" indexed="true" stored="true" omitNorms="true"/>
<field name="description" type="text_search" indexed="true" stored="true" omitNorms="true"/>
<field name="keywords" type="commaDelimited" indexed="true" stored="true" omitNorms="true"/>
<field name="product" type="commaDelimited" indexed="true" stored="true" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"/>
<field name="marketing" type="commaDelimited_s" indexed="true" stored="true" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"/>
<copyField source="field8" dest="text"/>
<copyField source="field8" dest="text"/>
In my solrconfig.xml, I have boosted the fields for edismax:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="defType">edismax</str>
<str name="qf">
text^100 field8^90 keywords^80 product^70 marketing^60 description^10
</str>
<str name="pf">
text^100 field8^90 keywords^80 product^70 marketing^60 description^10
</str>
</lst>
</requestHandler>
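A sketch of how the boosts might be rearranged, offered as an assumption rather than a tested fix. Because the catch-all text field carries the highest qf boost, its score can dominate and hide the intended field order, so here it is demoted to the lowest weight. The pf list is limited to fields that keep positions, since product and marketing are declared with omitTermFreqAndPositions="true" and phrase matching needs position data. All boost values are illustrative.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <!-- per-field boosts in the requested priority order; catch-all "text" weighted lowest -->
    <str name="qf">field8^90 keywords^80 product^70 marketing^60 description^10 text^1</str>
    <!-- phrase boost for documents containing the full phrase, only on fields indexed with positions -->
    <str name="pf">field8^900 keywords^800 description^100</str>
  </lst>
</requestHandler>
```

With the full phrase rewarded through pf, documents that match only ideapad receive no phrase boost and should fall below full matches without any negative bq clauses.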

integrating solr autosuggest functionality error

I am trying to integrate auto suggest functionality of solr in my project. I use this as my starting point. I changed my searched fields accordingly.
My schema.xml:
<field name="name" type="text_suggest" indexed="true" stored="true"/>
<field name="manu" type="text_suggest" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true" />
<!-- A variant of textsuggest which only matches from the very left edge -->
<copyField source="name" dest="textnge"/>
<field name="textnge" type="autocomplete_edge" indexed="true" stored="false" />
<!-- A variant of name which matches from the left edge of all terms (implicit truncation) -->
<copyField source="name" dest="textng"/>
<field name="textng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true" />
My request handler in solrconfig.xml
<requestHandler class="solr.SearchHandler" name="/ac" default="true" >
<lst name="defaults">
<str name="defType">edismax</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="qf">name^50 manu^20.0 textng^50.0</str>
<str name="pf">textnge^50.0</str>
<str name="bf">product(log(sum(popularity,1)),100)^20</str>
<str name="debugQuery">false</str>
</lst>
</requestHandler>
The problem is that my "/ac" handler behaves like the "/select" handler. When I type "moni" I get nothing, but when I type "monitor" it returns the documents containing monitor.
I have been trying this for a whole day and nothing seems to work. Any help will be deeply appreciated.
When you search for "moni", you are asking for the exact term "moni". Try a wildcard (multi-term) query by appending "*", such as q=moni*.
You can also query the fields that use the other field types, such as autocomplete_edge (q=textnge:moni) or autocomplete_ngram (q=textng:moni), to see whether they match.
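The question does not show how autocomplete_edge is defined. A prefix such as moni can only match without a wildcard if index-time analysis emits edge n-grams for that field. A sketch of what such a type might look like, assuming the stock Solr analysis factories; the gram sizes are illustrative:

```xml
<fieldType name="autocomplete_edge" class="solr.TextField" positionIncrementGap="100">
  <!-- index time: "monitor" becomes m, mo, mon, moni, ... -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <!-- query time: the typed prefix is kept whole so it can match one of the indexed grams -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents need to be reindexed after any change to a field type, and in the handler above textnge appears only in pf, so it would also have to be listed in qf to take part in matching.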
I think you need to specify a search component in solrconfig.xml like the one below:
<searchComponent class="solr.SpellCheckComponent" name="ac">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
<str name="field">yourfieldname</str> <!-- the indexed field to derive suggestions from -->
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>

Configuring Solr to use UUID as a key

I am trying to configure Solr 4 to work with UUIDs, so far without success.
From reading the documentation I have seen two different ways to configure schema.xml to work with UUIDs (neither works).
For both I need to write
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
option 1:
add:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
and make sure to remove the line
<uniqueKey>id</uniqueKey>
option 2
add:
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
Neither option works; both return
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
I also tried adding the following configuration to the solrconfig.xml file:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uniqueKey</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Thanks,
Shimon
After some work here is the solution:
In schema.xml, add (or edit) the id field:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml, update the chain and add the chain to the handlers (example for /update/extract):
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
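The same update.chain default can be attached to other update handlers. A sketch for the plain /update handler, assuming the uuid chain defined above and the stock solr.UpdateRequestHandler class; an existing /update definition in solrconfig.xml would need to be edited rather than duplicated:

```xml
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <!-- route documents through the "uuid" chain so that a missing id is generated -->
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
```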
You may want to remove the Query Elevation component if not using it.
QueryElevationComponent requires a unique key to be defined, and that key must be a string field (this is tracked in a JIRA issue).
However, it was fixed in the Solr 4.0 alpha, so it depends on which Solr version you are using.
This limitation is documented in the Solr wiki.

How do I get Solr to return results from all indices?

I am starting to integrate with Solr and have run into what I perceive as an issue. I uploaded a simple spreadsheet using the Java API (here is an excerpt:
- Document, id, value
- Excel3, name, steelers
- Excel3, subject, pirates
- Excel3, description, penguins
- Excel3, comments, panthers
- Excel3, author, panthers
)
Using this I used the first column as the "document name", second column as the field in the document to index, and the third column as the indexed data. All of these fields already existed in schema.xml, but here is how they are set up:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="comments" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
Now here is where my problem comes into play. If I search for, say, steelers, the result comes back fine, but if I search for penguins, or values from many of the other fields, no results come back. However, if I search for description:penguins, the result comes back as expected.
Can anyone please help me understand why the part before the : is required for some fields but not others?
example searches:
solr/select?indent=on&q=penguins&wt=xml ----Doesn't return any results
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
solr/select?indent=on&q=description:penguins&wt=xml
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">18</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">description:penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="author">panthers</str>
<str name="comments">panthers</str>
<str name="description">penguins</str>
<str name="id">Excel3</str>
<str name="name">steelers</str>
<str name="subject">pirates</str>
</doc>
</result>
</response>
The default query parser will query the default field, which can be specified in the schema.xml as seen here: http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field
I think Frank Famer's comment about using the DisMax parser is the real solution to this problem. That said, here are two workarounds I've seen in practice:
1. Create an additional field that is indexed, not stored, and filled by copyField with the values from all the fields you want to search, then specify that field as the default. It would look something like this in your schema.xml file.
<field name="myhugedefaultfield" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="myhugedefaultfield"/>
<copyField source="subject" dest="myhugedefaultfield"/>
<copyField source="description" dest="myhugedefaultfield"/>
<defaultSearchField>myhugedefaultfield</defaultSearchField>
2. Rewrite the user-entered query, turning the query for penguins into a query for (name:penguins) OR (subject:penguins) OR (description:penguins).
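For comparison, the DisMax route mentioned above can be sketched as handler defaults. This assumes the field names from the question's schema and uses the extended DisMax parser; qf lists every field to search when the query does not name one:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- a bare term such as "penguins" is searched in all of these fields -->
    <str name="qf">name subject description comments author</str>
  </lst>
</requestHandler>
```

With these defaults, q=penguins is matched against all five fields, so no field prefix and no extra catch-all field is needed.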
