Solr filter query cache and autowarming - solr

With Solr autowarming, is there any way to autowarm the filter queries that were executed previously?

Yes. Create firstSearcher and newSearcher event listeners as documented here on the Solr wiki: http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners
It will look like this in your solrconfig.xml:
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<!-- seed common sort fields -->
<lst> <str name="q">anything</str> <str name="sort">name desc, price desc, popularity desc</str> </lst>
<!-- seed common facets and filter queries -->
<lst> <str name="q">anything</str>
<str name="facet.field">category</str>
<str name="fq">inStock:true</str>
<str name="fq">price:[0 TO 100]</str>
</lst>
</arr>
</listener>
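The listener above only runs the queries you list by hand. To re-run filter queries that were actually executed against the previous searcher, the cache itself can be autowarmed through its autowarmCount attribute. The following is a minimal sketch, not a tuned configuration: the sizes are illustrative assumptions, and the cache class depends on your Solr version (solr.FastLRUCache in older releases, solr.CaffeineCache in newer ones). The newSearcher listener is included because the answer mentions it; its query is a placeholder.

```xml
<query>
  <!-- On each new searcher, re-execute up to 128 of the most recently used
       filterCache entries from the old searcher (sizes are assumptions) -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>

  <!-- Optional: static warming queries run whenever a new searcher opens -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">*:*</str> <str name="fq">inStock:true</str> </lst>
    </arr>
  </listener>
</query>
```

A larger autowarmCount warms more filters but lengthens the time before a new searcher becomes available after a commit.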

Related

Implement k means clustering in solr

How can I implement k-means clustering in Solr 6.5?
Requirements:
1) I want to cluster the docs at query time on the basis of their score.
2) I have written my own handler, and I want to add the clustering function to that handler only, such that it does not affect the ordering of the docs.
I tried to write the clustering search component as below:
<searchComponent name="clustering" enable="${solr.clustering.enabled:true}" class="solr.clustering.ClusteringComponent">
<lst name="engine">
<str name="name">kmeans</str>
<str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
<str name="BisectingKMeansClusteringAlgorithm.clusterCount">4</str>
<str name="documents">100</str>
<str name="BisectingKMeansClusteringAlgorithm.maxIterations">4</str>
</lst>
</searchComponent>
My request handler is as follows:
<requestHandler name="abc" class="solr.SearchHandler">
<lst name="invariants">
<str name="defType">synonym_edismax</str>
<str name="synonyms">true</str>
<str name="indent">on</str>
</lst>
<lst name="appends">
<str name="fq">search_term</str>
</lst>
<lst name="defaults">
<str name="echoParams">none</str>
<str name="wt">json</str>
<str name="timeAllowed">15000</str>
<str name="qf">Field1</str>
<str name="qf">Field2^0.5</str>
<str name="pf">Field3</str>
<float name="tie">0.2</float>
<str name="fl">Field5,Field6</str>
<str name="facet">false</str>
<str name="mm">2&lt;-1 4&lt;70%</str>
<!-- spellcheck -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">1</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
How can I add clustering to this request handler such that the number of clusters is 4 and the number of iterations is also 4?
Also, what is the difference between
carrot.url
carrot.snippet
carrot.title
I read the definitions in the docs but I am unable to understand them.
To add the clustering component to a request handler, list it in last-components:
<arr name="last-components">
<str>spellcheck</str>
<str>clustering</str>
</arr>
Then map your fields:
<str name="carrot.url">id</str> -> the unique key (identifier) of your document
<str name="carrot.title">doctitle</str> -> the title(s)/label(s) of your document
This is the field, or list of fields, whose content is short and tends to matter most when grouping your documents together.
<str name="carrot.snippet">content</str> -> the content/text/body of your document
From the wiki:
carrot.title
The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document’s title. The clustering algorithms typically give more weight to the content of the title field compared to the content (snippet). For best results, the field should contain concise, noise-free content. If there is no clear title in your data, you can leave this parameter blank.
carrot.snippet
The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document’s main content. If this mapping points to very large content fields the performance of clustering may drop significantly. An alternative then is to use query-context snippets for clustering instead of full field content. See the description of the carrot.produceSummary parameter for details.
carrot.url
The field that should be mapped to the logical document’s content URL. Leave blank if not required.
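Putting the pieces together, the field mappings can be set as defaults on the handler next to the switches that turn clustering on. This is a sketch under assumptions: the field names id, doctitle and content are the placeholders used in the answer above, the engine name kmeans comes from the search component in the question, and the mapped fields are assumed to be stored so the component can read them.

```xml
<lst name="defaults">
  <!-- Enable the clustering component and select the engine defined
       in the searchComponent (name taken from the question) -->
  <bool name="clustering">true</bool>
  <bool name="clustering.results">true</bool>
  <str name="clustering.engine">kmeans</str>

  <!-- Logical document mapping; field names are placeholders -->
  <str name="carrot.url">id</str>
  <str name="carrot.title">doctitle</str>
  <str name="carrot.snippet">content</str>
</lst>
```

Because the component runs in last-components, the clusters are returned as an extra section of the response and the ordering of the main result list is left as it is.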

solr suggester not working with shard for multiple core

I'm trying to use the suggest component (Solr 4.6) with multiple cores. I have added a search component and a request handler in my solrconfig. That works fine for one core, but querying my Solr instance with the shards parameter does not work.
However, "did you mean" (spellcheck) works fine with multiple cores using shards.
Here is the relevant part of the solrconfig file:
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggestDictionary</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
<str name="field">suggest</str>
<float name="threshold">0.0005</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="echoParams">none</str>
<str name="wt">xml</str>
<str name="indent">false</str>
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggestDictionary</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">false</str>
<str name="qt">/suggest</str>
<str name="shards.qt">/suggest</str>
<str name="shards">localhost:8080/cores/core1,localhost:8080/cores/core2</str>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
<shardHandlerFactory class="HttpShardHandlerFactory">
<int name="socketTimeOut">1000</int>
<int name="connTimeOut">5000</int>
</shardHandlerFactory>
</requestHandler>
It works for me.
You can get the suggestions using this REST URL:
http://localhost:8983/solr/demo/spell?q=howoo&wt=json&indent=true&qt=spell&shards.qt=/spell&shards=localhost:8983/solr/demo_shard2_replica1,localhost:8983/solr/demo_shard1_replica2
Or simply use this:
http://localhost:8983/solr/demo/spell?q=hoo&wt=json&indent=true&shards.qt=/spell
shards.qt=/spell: this parameter is required so that the requests sent to each shard are routed to the /spell handler, which is what enables suggestions across shards.
In the URLs above:
Collection = demo
Shards = demo_shard2_replica1, demo_shard1_replica2
Replace the collection and shard names with your own.
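Instead of passing shards.qt on every request, it can be set once as a handler default. A minimal sketch, assuming a handler named /spell and a search component named spellcheck that is already defined elsewhere in solrconfig.xml (both names are assumptions based on the URLs above):

```xml
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- Route the per-shard sub-requests to this same handler -->
    <str name="shards.qt">/spell</str>
  </lst>
  <arr name="last-components">
    <!-- Assumes a searchComponent named "spellcheck" exists -->
    <str>spellcheck</str>
  </arr>
</requestHandler>
```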

What are manu, sku and cat in the qf parameter in SOLR?

I was taking a look at the solrconfig.xml for the dismax parser and found a bunch of values such as sku, manu and cat. What are these?
<requestHandler name="dismax" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="pf">
text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
</str>
<str name="bf">
popularity^0.5 recip(price,1,1000,1000)^0.3
</str>
<str name="fl">
id,name,price,score
</str>
<str name="mm">
2&lt;-1 5&lt;-2 6&lt;90%
</str>
<int name="ps">100</int>
<str name="q.alt">*:*</str>
<!-- example highlighter config, enable per-query with hl=true -->
<str name="hl.fl">text features name</str>
<!-- for this field, we want no fragmenting, just highlighting -->
<str name="f.name.hl.fragsize">0</str>
<!-- instructs Solr to return the field itself if no query terms are
found -->
<str name="f.name.hl.alternateField">name</str>
<str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
</lst>
</requestHandler>
Those are the fields being searched: SKU (stock keeping unit), manufacturer and categories.
You are probably looking at the solrconfig.xml that is provided as a sample, meant to be used with the documents in the exampledocs/ directory.
These are the field names that the sample docs (and schema) contain. It is just a sample installation of Solr.
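For orientation, each name in qf has to correspond to a field declared in the schema. The declarations below are an illustrative sketch only: the field names come from the qf value in the question, while the types and attributes are assumptions and may differ from what the sample schema actually ships with.

```xml
<!-- Hypothetical schema.xml declarations; types are assumptions -->
<field name="sku"  type="string"       indexed="true" stored="true"/>
<field name="manu" type="text_general" indexed="true" stored="true"/>
<field name="cat"  type="string"       indexed="true" stored="true" multiValued="true"/>
```

If your own index has no such fields, replace them in qf and pf with the fields you actually want dismax to search.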

How to make Solr query returning results grouped by field

I would like to group Solr search results by channel at query time. For example, I have three channels: products, faq and other_docs. All are within the same Solr core with the same fields filled. What I would like to achieve is to have Solr group the results by "channel" for me.
Sample database (csv):
id,channel,name,desc
1,product,Some product,This is an very cool product!
2,product,Other product,This is an other product!
3,faq,How to stuff,This time: Simply do it!
4,other_docs,Legal notice,All your base are belong to us!
Wanted query result (xml):
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="grouped">
<lst name="channel">
<int name="matches">3</int>
<arr name="groups">
<lst>
<str name="groupValue">product</str>
<result name="doclist" numFound="2" start="0">
<doc>
<str name="name">Some product</str>
<str name="desc">This is an very cool product!</str></doc>
<doc>
<str name="name">Other product</str>
<str name="desc">This is an other product!</str></doc>
</result>
</lst>
<lst>
<str name="groupValue">faq</str>
<result name="doclist" numFound="1" start="0">
<doc>
<str name="name">How to stuff</str>
<str name="desc">This time: Simply do it!</str></doc>
</result>
</lst>
</arr>
</lst>
</lst>
</response>
How do I achieve this?
Check the field collapsing feature in Solr: Result Grouping / Field Collapsing.
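Result grouping is switched on with the group and group.field request parameters, which can be sent on the query string or set as handler defaults. A minimal sketch, assuming a hypothetical handler name /channels and assuming channel is a single-valued, indexed field:

```xml
<requestHandler name="/channels" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Group the result list by the value of the channel field -->
    <str name="group">true</str>
    <str name="group.field">channel</str>
    <!-- Documents returned per group; the default of 1 would show only
         one product, so it is raised here (value is an assumption) -->
    <str name="group.limit">10</str>
    <str name="fl">name,desc</str>
  </lst>
</requestHandler>
```

The same effect is available without touching the configuration by appending group=true&group.field=channel&group.limit=10 to a normal query.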

Removing unwanted items from solr autosuggester

I am trying to implement autosuggest from a huge set of indexed paragraphs, but I want to filter out certain unwanted words from the suggestions. For example, words like "and", "how" and "when" need to be excluded. How do I go about it?
This is the configuration I have for autosuggest in solrconfig.xml:
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">keywords</str>
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
I would recommend adding the StopFilterFactory to the backing fieldType definition for your keywords field in your schema.xml file. If you need those words ("and", "how", "when") in your keywords field for other searching requirements, I would suggest creating a custom field in your schema.xml just for the suggester and you can use the copyField directive to populate this new field.
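The suggestion above could look roughly like this in schema.xml. It is a sketch under assumptions: the names text_suggest and keywords_suggest are hypothetical, and stopwords.txt is assumed to exist in the core's conf directory and to contain the words you want excluded.

```xml
<!-- Field type whose analysis chain drops stop words before indexing -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- Dedicated field for the suggester, filled from the existing keywords field -->
<field name="keywords_suggest" type="text_suggest" indexed="true" stored="false" multiValued="true"/>
<copyField source="keywords" dest="keywords_suggest"/>
```

The suggester's field setting in solrconfig.xml would then point at keywords_suggest instead of keywords, followed by a reindex and a rebuild of the suggester so the change takes effect.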
