How to tune Apache Solr spellcheck for desired suggestions? - solr

Environment: SAP Hybris 6.7.0.0, Apache Solr 7.7.2
I am using Solr to power an indie eCommerce platform. In that context we have product data stored in Solr, for example: productName_text, BrandName_string, etc.
I've created a spellcheck component with the configuration below:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">en</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="field">spellcheck_en</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.7</float>
<int name="maxEdits">2</int>
<int name="minPrefix">0</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">2</int>
</lst>
</searchComponent>
I then turned on spellcheck on the /select request handler:
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
Spellcheck is configured dynamically for a single field, say:
productName_text
which consists of product names of typical electronic gadgets or their cases. For example:
"Apple Watch Series 2 38mm Stainless Steel Case with Midnight Blue Modern Buckle Medium"
"A.O. Smith X4 RO Water Purifier (White)"
If we misspell "watch" as "wath", we get the suggestion "water". Or, typing "suop maker" for "soup maker", we get "shop maker". How can I tune the spellchecker to fit my data? Is there any other solution for misbehaving queries?
I tried playing with all the spellcheck configuration options from the SpellCheckComponent documentation (https://cwiki.apache.org/confluence/display/SOLR/SpellCheckComponent) but couldn't find a solid solution yet.
I tried implementing WordBreakSolrSpellChecker, which doesn't seem to change the outcome.
I played around with "spellcheck.collate" and other attributes, but it returns suggestions that have no search results.
I've observed that spellcheck is deeply affected by multivalued fields(?)
In general, how should I handle terms that produce wrong suggestions, or suggestions that must not appear based on user preferences? Is it possible to use two different spellcheckers, so that if "DirectSolrSpellChecker" doesn't give the desired suggestion I can switch to "FileBasedSpellChecker"? Can I maintain a .txt file to track all the terms that need tuning, or do the same in SAP Hybris?
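For reference, the two-dictionary idea asked about above could be sketched roughly as follows. This is only an illustration, not a verified fix for the "wath" and "suop" cases: the dictionary name file, the file name spellings.txt and the index directory are assumed names, and the existing en dictionary is abbreviated.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Existing index-based dictionary from the question (abbreviated) -->
  <lst name="spellchecker">
    <str name="name">en</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">spellcheck_en</str>
  </lst>
  <!-- Assumed second dictionary, built from a hand-maintained word list
       (one term per line); all names below are placeholders -->
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>
```

A request can then pick a dictionary with spellcheck.dictionary=file. Setting spellcheck.maxCollationTries to a value above 0 asks Solr to test candidate collations against the index before returning them, which may help with collations that return no results.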

Related

Solr SuggestComponent - Building dictionaries based on certain filters?

I am currently using the solr suggest component for an autocomplete feature. Now, according to user permissions and which area of the site I am on, I want to offer the user different suggestions. Now I assumed it would easily be possible to only consider certain explicit entries for building my dictionary (i.e.: dict1 is built only from entries where type=t1 and locale=en, dict 2 where type=t1 and locale=de, dict3 where type=t2 and locale=en, etc...). But I can't figure out where I would do such a thing. The system is running solr 4.6.
Do you know of any solution or have a possible workaround?
I am not currently able to update solr on the system or change the way documents are indexed apart from the solr configuration, so unfortunately context filtering is not available to me. This would only be a last resort if nothing else works.
Since you're using the old Solr 4.6, which doesn't have context filtering or even multiple dictionaries, you would need to define a separate dictionary (i.e. a searchComponent) for each of these combinations:
<searchComponent class="solr.SpellCheckComponent" name="suggest-en">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">name</str> <!-- the indexed field to derive suggestions from -->
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
<!--
<str name="sourceLocation">american-english</str>
-->
</lst>
</searchComponent>
And then define request handlers, like this:
<requestHandler class="org.apache.solr.handler.component.SearchHandler"
name="/suggest-en">
<lst name="defaults">
<str name="spellcheck">true</str>
<!-- must match the "name" of the spellchecker inside the component -->
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<!-- must match the "name" attribute of the searchComponent -->
<str>suggest-en</str>
</arr>
</requestHandler>
The workaround is to define suggest-en, suggest-de, and all the other request handlers in the same way, and then point each client to the correct request handler depending on its profile.
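As a sketch of that workaround, a second locale could be wired up like this. The field name_de is a hypothetical field holding only German names; the answer does not say how the per-locale data is separated, so treat that part as an assumption.

```xml
<searchComponent class="solr.SpellCheckComponent" name="suggest-de">
  <lst name="spellchecker">
    <str name="name">suggest-de</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- hypothetical field containing only German names -->
    <str name="field">name_de</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest-de">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- refers to the spellchecker "name" defined above -->
    <str name="spellcheck.dictionary">suggest-de</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="components">
    <!-- refers to the searchComponent "name" defined above -->
    <str>suggest-de</str>
  </arr>
</requestHandler>
```

The client would then call /suggest-de or /suggest-en depending on the user's locale and permissions.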

Search suggestions in django-oscar using solr

I've set up a django-oscar project and enabled Solr 4.7.2 on it as per the documentation.
Solr seems to be working fine. Testing the suggestions for 'exxample' (localhost:8983/solr/collection1/spell?spellcheck.q=exxample&spellcheck=true) I get:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">10</int>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="exxampl">
<int name="numFound">1</int>
<int name="startOffset">0</int>
<int name="endOffset">8</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">exampl</str>
<int name="freq">2</int>
</lst>
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
<lst name="collation">
<str name="collationQuery">exampl</str>
<int name="hits">2</int>
<lst name="misspellingsAndCorrections">
<str name="exxampl">exampl</str>
</lst>
</lst>
</lst>
</lst>
</response>
I've also enabled OSCAR_SEARCH_FACETS to make sure that Solr has been correctly registered by Django-Oscar, and it seems to be working fine.
However, when I do a test search for a simple misspelling in django-oscar, I get 0 search results and no suggestions. I'm not sure what to do next.
Help would be greatly appreciated!
I've managed to fix this problem. I'll write up my complete solution for setting up Solr with spelling suggestions on Django-Oscar, since the setup requires adjustments to the procedure described in the official documentation. This is also my first time working with Solr (or any search engine), so don't expect expert guidance, just a guide on how to get Solr up and running on Oscar.
I am using Oscar 1.5 with Solr 4.7.2 (the solution also works for 4.10.4; not sure about other versions). Do everything as per the documentation; note that there is a slight difference in the instructions for versions of Oscar < 1.5.
Once you have Solr installed and running, you can test a query on the Solr server at localhost:8983/solr/collection1/spell?spellcheck.q=[your search query goes here; no brackets]&spellcheck=true. The query needs to be a word from your database, either in a product description or a product title.
You will get an error saying that the Analyzer needs to be of the same type. Fix this by editing the solrconfig.xml file located at ./solr-4.7.2/example/solr/collection1/conf/solrconfig.xml. Search for <str name="field"> and change each non-commented instance to <str name="field">text</str>. You can also change each instance to <str name="field">title</str>, but this restricts suggestions to words found in titles only. Restart the Solr server. These changes do away with the Analyzer error and your Solr server will start showing results; however, they won't yet be fed into your Oscar site.
To fix this you need to make another adjustment to the same solrconfig.xml file. Search for <requestHandler name="/select" class="solr.SearchHandler">, and at the bottom of this request handler include the following code:
<arr name="last-components">
<str>spellcheck</str>
</arr>
Restart the server. Now you have spelling suggestions on your Oscar site. I hope others find this helpful. Like I said, this is the first time I'm using Solr. If someone has anything to add, or can extend Solr functionality on Oscar, that would be great.
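Put together, the two edits described above might look roughly like this in solrconfig.xml. This is a trimmed sketch rather than the full stock file: the spellchecker is reduced to the lines being discussed, and the dictionary name default and the df default are assumptions based on the example configuration.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- changed to the "text" field (or "title" for titles only) -->
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
  </lst>
  <!-- run the spellcheck component after the regular query components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```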

Autocomplete term suggestion as per popularity

I have implemented autocomplete term suggestions in my MVC application. Let me explain how I did this. I created a table in the DB with the following columns:
Id SearchTerm CatID ResultCount Clicks Latency TermSearchTime
Whenever a user searches for a term, we store it in this table. The next time a matching term is typed, we display it as a suggestion. Suggestions are also ordered by popularity: the term searched most often is displayed first.
But now I also want to provide suggestions for misspelled terms. For example, Samsung is already in my table. If someone searches for samsng, Samsung should appear among the suggestions.
As I do not know how to do spell checking in SQL Server, I decided to do it using Solr.
How can I do this with Solr while keeping the default behaviour I already have with the SQL DB? Please note that I fetch the search results from Solr and have already indexed all products. Do I need to index the search terms as well?
Any help is appreciated. Thanks.
Check that your solrconfig.xml file contains this spellcheck handler:
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<!-- Solr will use suggestions from both the 'default' spellchecker
and from the 'wordbreak' spellchecker and combine them.
collations (re-written queries) can include a combination of
corrections from both spellcheckers -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
If it is not present, copy and paste it into your file, restart Solr, and try /spell?q=ipad
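The handler above refers to two dictionaries, default and wordbreak, which have to be defined in the spellcheck search component. A minimal sketch, modelled on the stock example configuration; the field name text is an assumption and should be replaced by a field from your own schema.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- index-based dictionary; "text" is an assumed field name -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
  </lst>
  <!-- suggests joining or splitting adjacent words, e.g. "i pad" and "ipad" -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">text</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>
```

Suggestions come from whatever terms are indexed in the chosen field, so a misspelling such as samsng can only be corrected to Samsung if that word occurs there.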

Solr very slow filters

I have a problem with very slow filters in Solr (version 4.9.1); there are ~50k documents. The first query that uses a specific category_id filter value takes ~15 seconds; the second time it is much faster (it takes milliseconds). But I want filters to always be fast :) After googling, I read that I need a filterCache and cache autowarming.
So here is what I've done:
filterCache:
<filterCache
class="solr.FastLRUCache"
size="16384"
initialSize="4096"
autowarmCount="4096" />
firstSearcher:
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst>
<str name="q">*</str>
<str name="fq">category_id:1043</str>
</lst>
</arr>
</listener>
<useColdSearcher>true</useColdSearcher>
<useFilterForSortedQuery>true</useFilterForSortedQuery>
<maxWarmingSearchers>2</maxWarmingSearchers>
It doesn't work ;/ and I have no idea why. The first visit to this category takes 15 s, then it is fast. But I always need a fast response, for categories and for other filters.
I ran an experiment: everything works better if I use the main query instead of filters, but filters should be as fast as the main query (I read that somewhere).
Summary:
What am I doing wrong that autowarming doesn't work?
How do I set up autowarming for each filter/each filter value?
What I'm trying to do:
I have a shop with ~50,000 products, ~1000 categories and a lot of other filters (type, price, etc.). My catalog is based on Solr (filtering). If I use filters, the first visit to a category takes 15 seconds; it must be fast every single time.
My example query:
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="debugQuery">true</str>
<str name="website_id:1"/>
<str name="stats.field">PLN_0_price_decimal</str>
<str name="product_status:1"/>
<str name="q">**</str>
<str name="store_id:1"/>
<str name="fq">category_id:10561</str>
</lst>
</lst>
So, the solution was simple: I have to use * instead of ** in my query.
Part of debug section from response with *:
<str name="parsedquery">MatchAllDocsQuery(*:*)</str>
<str name="parsedquery_toString">*:*</str>
Same part of debug section from response with **:
<str name="parsedquery">textSearch:**</str>
<str name="parsedquery_toString">textSearch:**</str>
The first time you use a filter, every document needs to be looked at, even if the main query will match only a couple. You could disable caching for such a filter or switch to a post-filter (by assigning a filter cost). The fuller explanation is here.
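For illustration, a warming setup that uses an explicit match-all query and covers more than one filter value could look like the sketch below. Both category ids are taken from the question; which values are worth warming is an assumption to adjust to real traffic. A newSearcher listener is included as well, since firstSearcher only fires for the first searcher opened at startup.

```xml
<!-- warms the filterCache when Solr starts -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">category_id:1043</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="fq">category_id:10561</str>
    </lst>
  </arr>
</listener>

<!-- warms again whenever a commit opens a new searcher -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">category_id:1043</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="fq">category_id:10561</str>
    </lst>
  </arr>
</listener>
```

A filter that should bypass the filterCache altogether can be written with the cache local parameter, for example fq={!cache=false}category_id:10561.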

Solr edismax qf and pf defaults not working to boost fields

I am attempting to set up a request handler that will boost certain fields by different amounts. I have the following request handler.
<requestHandler name="/select" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="start">0</str>
<int name="rows">10</int>
<str name="defType">edismax</str>
<str name="qf">
title^50.0 searchTitle^7.0 keywords^5.0 content^1.0 text^1.0
</str>
<str name="pf">
title^50.0 searchTitle^7.0 keywords^5.0 content^1.0 text^1.0
</str>
<str name="df">text</str>
</lst>
</requestHandler>
However, the fields aren't being boosted correctly, if at all. I noticed that documents with the search term in the title field aren't appearing any higher than documents with the search term in the text field. Arbitrarily re-arranging the weights produces the same document order each time.
When I go into the solr web interface/admin UI and do a search I get the same results. However, if I explicitly check the edismax checkbox and enter the field-boost data in the qf and pf boxes I get the results and the weighting I would expect.
In fact, I also just tried changing the rows value to 5 and still received the same result. It looks like my queries aren't being handled by the /select handler, even though that is what I choose both in the solr Admin UI and when I create the HttpSolrServer object to do the queries from the server.
I am using solr v4.8.0.
Any help would be appreciated.
Check the requestDispatcher setting in solrconfig.xml:
<requestDispatcher handleSelect="false" >
If you want to use select as a request handler, this needs to be
<requestDispatcher handleSelect="true" >
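To check whether the handler defaults are actually being applied, one option is to temporarily echo all parameters, so that the response header shows defType, qf and pf as Solr sees them. A sketch based on the handler from the question, with only echoParams changed:

```xml
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- temporarily "all" instead of "explicit": the response header then
         lists every parameter in effect, including these defaults -->
    <str name="echoParams">all</str>
    <str name="defType">edismax</str>
    <str name="qf">title^50.0 searchTitle^7.0 keywords^5.0 content^1.0 text^1.0</str>
    <str name="pf">title^50.0 searchTitle^7.0 keywords^5.0 content^1.0 text^1.0</str>
  </lst>
</requestHandler>
```

If qf and pf are missing from the echoed parameters, the query is not being served by this handler.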

Resources