Can we give boost to fields through solr config file? - solr

Currently we have to specify the boost in every query. Is it possible to set the boost for a field in the Solr config itself?

In the requestHandler config:
<requestHandler name="/select" class="solr.SearchHandler">
....
<lst name="appends">
<str name="qf">my_col^1</str>
<!--str name="qf">my_col^boost_val</str-->
<!--str name="bq">my_col2^boost_val</str-->
</lst>
....

Yes, it is possible to boost individual fields in Solr.
The qf (Query Fields) parameter takes a list of fields, each of which can be assigned a boost factor that increases or decreases that field's importance in the query.
Below is a sample solrconfig.xml entry:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">dismax</str>
<str name="qf">title^10 content^5</str>
</lst>
</requestHandler>
Here qf assigns the title field a boost of 10 and the content field a boost of 5.
NOTE: The qf (Query Fields) parameter can't be used with the standard query parser. You can use it with the dismax or edismax query parser.
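As a rough sketch of how this could address the original question, the handler below combines a "defaults" section with an "appends" section. Parameters under "defaults" apply only when the request does not send them, while parameters under "appends" are added to whatever the request sends. The bq clause and its content_type field are hypothetical and only illustrate the shape of a boost query; title and content are the field names from the sample above.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- Applied only when the request does not supply the parameter itself -->
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^10 content^5</str>
  </lst>
  <!-- Always added on top of the request parameters -->
  <lst name="appends">
    <!-- Hypothetical boost query: content_type is an assumed field name -->
    <str name="bq">content_type:article^2</str>
  </lst>
</requestHandler>
```

With a configuration along these lines, a request such as /select?q=solr would pick up the field boosts without mentioning them, and a request that passes its own qf would replace the default one.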

Related

Solr query with repeating parameter

I'm trying to extend one of the examples from the Solr Query Parsing presentation ( http://www.slideshare.net/erikhatcher/sa-22830939 ). I'd like to extend it so that I can retrieve multiple Solr documents at once with a query string such as http://.../solr/docs?id=1&id=2&id=3
The original configuration for the requestHandler looks like this:
<requestHandler name="/docs" class="solr.SearchHandler">
<lst name="defaults">
<str name="q">{!term f=id v=$id}</str>
</lst>
<arr name="components">
<str>query</str>
<str>highlight</str>
<str>debug</str>
</arr>
</requestHandler>
But this works only for a single id parameter ( http://.../solr/docs?id=1 ) - which query parser or configuration would I have to use to match it against multiple id parameters?
Thanks for your help.

How to boost repeated values in a multiValue field on Solr

I have some repeated data (identical strings) in a multiValue field in my Solr index, and I want to boost documents by the number of matches in that field. For example:
doc1 : { locales : ['en_US', 'de_DE', 'fr_FR', 'en_US'] }
doc2 : { locales : ['en_US'] }
When I run the query q=locales:en_US I would like to see doc1 at the top because it has two "en_US" values. What is the proper way to boost this kind of data?
Should I use a special tokenizer?
Solr version: 4.5
Disclaimer
To use either of the following solutions, you will need to make one of the following changes:
Create a copyField for locales:
<field name="locales" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- No need to store(stored="false") locales_text as it will only be used for searching/sorting/boosting -->
<field name="locales_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="locales" dest="locales_text"/>
Or change the type of locales to "text_general" (the type is provided in the standard Solr collection1)
First solution (Ordering):
Results can be ordered by some function. So we can order by number of occurrences (termfreq function) in field:
If copyField is used, then sort query will be: termfreq(locales_text,'en_US') DESC
If locales is of text_general type, then sort query will be: termfreq(locales,'en_US') DESC
Example response for copyField option (the result is the same for text_general type):
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fl">*,score</str>
<str name="sort">termfreq(locales_text,'en_US') DESC</str>
<str name="indent">true</str>
<str name="q">locales:en_US</str>
<str name="_">1383598933337</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="2" start="0" maxScore="0.5945348">
<doc>
<arr name="locales">
<str>en_US</str>
<str>de_DE</str>
<str>fr_FR</str>
<str>en_US</str>
</arr>
<str name="id">4f9f71f6-7811-4c22-b5d6-c62887983d08</str>
<long name="_version_">1450808563062538240</long>
<float name="score">0.4203996</float></doc>
<doc>
<arr name="locales">
<str>en_US</str>
</arr>
<str name="id">7f93e620-cf7b-4b90-b741-f6edc9db77c9</str>
<long name="_version_">1450808391856291840</long>
<float name="score">0.5945348</float></doc>
</result>
</response>
You can also use fl=*,termfreq(locales_text,'en_US') to see the number of matches.
One thing to keep in mind: this is an ordering function, not a boost function. If you would rather boost the score based on multiple matches, you will probably be more interested in the second solution.
I included the score in the results to demonstrate what @arun was talking about. You can see that the scores differ (probably due to field length). It was quite unexpected (for me) that for a multivalued string field the score is the same.
Second solution (Boosting):
If copyField is used, then the query will be : {!boost b=termfreq(locales_text,'en_US')}locales:en_US
If locales is of text_general type, then the query will be: {!boost b=termfreq(locales,'en_US')}locales:en_US
Example response for copyField option (the result is the same for text_general type):
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="lowercaseOperators">true</str>
<str name="fl">*,score</str>
<str name="indent">true</str>
<str name="q">{!boost b=termfreq(locales_text,'en_US')}locales:en_US</str>
<str name="_">1383599910386</str>
<str name="stopwords">true</str>
<str name="wt">xml</str>
<str name="defType">edismax</str>
</lst>
</lst>
<result name="response" numFound="2" start="0" maxScore="1.1890696">
<doc>
<arr name="locales">
<str>en_US</str>
<str>de_DE</str>
<str>fr_FR</str>
<str>en_US</str>
</arr>
<str name="id">4f9f71f6-7811-4c22-b5d6-c62887983d08</str>
<long name="_version_">1450808563062538240</long>
<float name="score">1.1890696</float></doc>
<doc>
<arr name="locales">
<str>en_US</str>
</arr>
<str name="id">7f93e620-cf7b-4b90-b741-f6edc9db77c9</str>
<long name="_version_">1450808391856291840</long>
<float name="score">0.5945348</float></doc>
</result>
</response>
You can see that the score changed significantly. The first document scores twice as high as the second (because there were two matches, each scored as 0.5945348).
Third solution (omitNorms=true)
Based on the answer from @arun I figured that there is also a third option.
If you convert your field to (for example) text_general AND set omitNorms=true for that field, it should have the same result.
The default standard request handler in Solr does not use only the term frequency to compute the scores. Along with term frequency, it also uses the length of the field. See the Lucene scoring algorithm, where it says:
lengthNorm - computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score.
Since doc2 has a shorter field it might have scored higher. Check the score for the results with fl=*,score in your query. To know how Solr arrived at the score, use fl=*,score&wt=xml&debugQuery=on (then right click on your browser and view page-source to see a properly indented score calculation). I believe you will see the lengthNorm contributing to a lower score for doc1.
To have length of the field not contribute to the score, you need to disable it. Set omitNorms=true for that field. (Ref: http://wiki.apache.org/solr/SchemaXml) Then see what the scores are.
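A minimal sketch of what that schema change might look like, assuming the locales field from the question has been switched to text_general as described above (the other attributes are carried over from the earlier field definition and may differ in your schema):

```xml
<!-- omitNorms="true" drops length normalization, so field length no longer affects the score -->
<field name="locales" type="text_general" indexed="true" stored="true"
       multiValued="true" omitNorms="true"/>
```

Since norms are written at index time, the documents would most likely need to be reindexed before the scores change.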

Solr Facet Search-Spell check

I'm using Solr facet search on a database column. It successfully returns the data:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="tags">
<int name="lol">58</int>
<int name="scienc">58</int>
<int name="photo">34</int>
<int name="axiom">27</int>
<int name="geniu">14</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
I want to make sure that only complete words are counted. In the above example you can see counts for 'scienc' and 'geniu' that should be for 'science' and 'genius'. How can I achieve this? Can I incorporate a spell-checking feature?
This probably has to do with the underlying fieldType that you have associated with your tags field. The field value is most likely being stemmed or having other analyzers associated with it. I would suggest one of two things:
Remove the stemming and/or other processing to prevent the words from appearing as partial.
(Recommended) Create a separate field tags_facet with fieldType="string" in your schema.xml and use a copyField directive to copy the values fed into your original tags field. Then facet on this new tags_facet field.
Use the copyField feature of Solr to copy the original field to one with a string fieldType. If the values are a set of words, instead of string, you could use a whitespace tokenised fieldtype (without ngrams of course.)
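The copyField approach suggested in both answers could be sketched roughly as follows. The tags_facet name comes from the answer above; the multiValued and stored settings are assumptions and should match how the tags field is actually used.

```xml
<!-- Unanalyzed copy of tags, used only for faceting, so values are not stemmed -->
<field name="tags_facet" type="string" indexed="true" stored="false" multiValued="true"/>
<!-- Copies the raw input of tags into tags_facet at index time -->
<copyField source="tags" dest="tags_facet"/>
```

After reindexing, faceting with facet.field=tags_facet instead of facet.field=tags should return the complete words, while searches can keep using the analyzed tags field.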

Solr More Like This (MLT) using a different unique identifier than the default one id

I'm trying to use MLT, but my unique identifier is doc_id instead of id. If I do this:
http://localhost:8983/solr/mlt/?q=doc_id:question#11 I get no results,
whereas if I do this:
http://localhost:8983/solr/mlt/?q=id:11 I get results.
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
<lst name="defaults">
<str name="mlt.fl">title,text</str>
<str name="mlt.mintf">1</str>
<str name="mlt.mindf">2</str>
<str name="mlt.minwl">2</str>
<str name="mlt.boost">true</str>
<int name="rows">5</int>
<str name="fl">id,doc_id,title,content_type,user_id,topic_id,score</str>
</lst>
</requestHandler>
How can I use MLT with doc_id as my unique identifier?
What you have looks fine. MLT just uses the query to find a doc and, if one is found, uses that doc as the source document. Are you sure a document is returned by the query "doc_id:question#11"? Put the value in quotes and see if that gets you the document back, e.g. doc_id:"question#11". What is the datatype of doc_id?

Can I restrict the search to a specific date range?

I want to get all results AFTER a given date. Can you do this with Solr?
(http://lucene.apache.org/solr/)
Right now the search covers the entire result set; I want to filter for anything after a given date.
Update
This isn't working for me yet.
I'm trying:
http://www.example.com:8085/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on&indexed_at:2009-08-27T13%3A15%3A27.73Z
My returned doc:
<doc>
<str name="apptype">Forum</str>
<str name="collapse">forum:334</str>
<str name="content"> testing </str>
<str name="contentid">357</str>
<str name="createdby">some_user</str>
<str name="date">20090819</str>
<str name="dummy_id">1</str>
<int name="group">5</int>
<date name="indexed_at">2009-08-25T16:48:45.121Z</date>
<str name="rating">000.0</str>
<str name="rawcontent"><p>testing</p></str>
<arr name="roles">
<str>1</str>
<str>2</str>
<str>3</str>
<str>4</str>
<str>14</str>
<str>15</str>
<str>16</str>
</arr>
<int name="section">79</int>
<int name="thread">334</int>
<str name="title">testing</str>
<str name="titlesort">testing</str>
<str name="type">forum</str>
<str name="unique_id">
BLAHBLAH|357
</str>
<str name="url">/blahey/f/79/p/334/357.aspx#357</str>
<str name="user">21625</str>
<str name="username">some_user</str>
</doc>
Yes, you can. I assume you have a field with the date value you want to filter on. Then you do
yourdatefield:[2008-08-27T23:59:59.999Z TO *]
A sample URL would be localhost:8983/solr/select?q=yourdatefield:[2008-08-27T23:59:59.999Z TO *]
You want to submit the date part as part of the query, that is, in the value of q, like
localhost:8983/solr/select?q=(text:test+AND+indexed_at:[2009-08-27T13:15:27.73Z TO *])
So the entire query is contained within the q query string parameter.
The date format is ISO 8601.
You can add an automatic timestamp to the documents as they are indexed using:
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
in the schema.xml. The default schema has this commented out, so if you copied the default, you just need to uncomment it.
You could add that and use olle's suggested search pattern to find the documents indexed after a certain date. (You'd have to replace yourdatefield with timestamp or whatever you name the field in the XML.)
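If the cut-off is fixed or relative to the current time, one option is to put the range into the handler configuration as a filter query rather than repeating it in every request. The sketch below assumes the timestamp field defined above; the /recent handler name and the seven-day window are hypothetical and chosen only for illustration.

```xml
<requestHandler name="/recent" class="solr.SearchHandler">
  <lst name="appends">
    <!-- Restricts results to documents indexed in the last 7 days (Solr date math) -->
    <str name="fq">timestamp:[NOW-7DAYS TO *]</str>
  </lst>
</requestHandler>
```

Because the range is passed as fq instead of being part of q, it narrows the result set without affecting the relevance score.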
You will need to create a query that compares dates, here is the syntax for queries:
http://wiki.apache.org/solr/SolrQuerySyntax
And here is how you can make date comparisons in the query:
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html
