What is the proper way to boost items with newer dates? - solr

I have a MoreLikeThis query that I would like to update so that it returns newer documents first. According to the documentation, I need to add recip(ms(NOW,mydatefield),3.16e-11,1,1) to my query.
But when I add it to either the mlt.qf or the bf parameter, the results stay exactly the same.
This is my query:
/solr/mlt?
q=id:cms.article.137861
&defType=edismax
&rows=3
&indent=on
&mlt.fl=series_id,tags,title,text
&mlt.qf=show_id text^1.1 title^1.1 tags^90
&wt=json
&fl=url,title,tags,django_id,content_type_id
&bf=recip(ms(NOW,pub_date),3.16e-11,1,1)

This is taken from the Solr wiki (it's down, but I have it cached).
I think this is what you are looking for.
How can I boost the score of newer documents
Do an explicit sort by date (relevancy scores are ignored)
Use an index-time boost that is larger for newer documents
Use a FunctionQuery to influence the score based on a date field.
In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
In Solr 1.4, use something of the form recip(ms(NOW,mydatefield),3.16e-11,1,1)
http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html
http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
A full example of a query for "ipod" with the score boosted higher the newer the product is:
http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod
One can simplify the implementation by decomposing the query into multiple arguments:
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qq=ipod
Now the main "q" argument as well as the "dateboost" argument may be specified as defaults in a search handler in solrconfig.xml, and clients would only need to pass "qq", the user query.
To boost another query type such as a dismax query, the value of the boost query is a full sub-query and hence can use the {!querytype} syntax. Alternately, the defType param can be used in the boost local params to set the default type to dismax. The other dismax parameters may be set as top level parameters.
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qf=text&pf=text&qq=ipod
Consider using reduced precision to prevent excessive memory consumption. You would instead use recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1). See this thread for more information.
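To get a feel for what recip(ms(NOW,mydatefield),3.16e-11,1,1) does to the score, it helps to evaluate it by hand. Solr's recip(x,m,a,b) computes a/(m*x+b), and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, so a one-year-old document ends up with a boost of about 0.5. Here is a minimal Python sketch of that arithmetic; the helper name date_boost and the sample ages are only for illustration and are not part of Solr:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_days, m=3.16e-11, a=1.0, b=1.0):
    """Boost for a document that is age_days old (hypothetical helper)."""
    age_ms = age_days * MS_PER_DAY  # stands in for ms(NOW,mydatefield)
    return recip(age_ms, m, a, b)


for days in (0, 30, 365, 3650):
    print(f"{days:>5} days old -> boost {date_boost(days):.3f}")
```

The boost starts at a/b for a brand-new document and falls off smoothly with age, so raising a or lowering b makes freshness count for more relative to the text score.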

Apparently your date field is not a TrieDate.

Related

SOLR edismax with BF function on non existing fields

I would like to apply a negative boost to documents that do not have a specific field. But it's not working: the boost value is the same for documents with and without that field. Any pointers would be of great help.
bf=if(not(exists('image-small_string')),0,-500)
The answer is to boost those documents that do not match your query, instead of trying to apply a negative boost to those that do.
To boost documents that have a specific field, you can use bq=foo:[* TO *]^5 (and adjust the boost factor to match the behaviour you're looking for).

Solr Query Syntax conversion from boolean expression

I'm attempting to query solr for documents, given a basic schema with the following field names, data types irrelevant:
I'm attempting to match documents that match at least one of the following:
occupation, name, age, gender, but I want to OR them together.
How do you OR together many terms and require the document to match at least one?
This seems to be failing: +(name:Sarah age:24 occupation:doctor gender:male)
How do you convert a boolean expression into solr query syntax? I can't figure out the syntax with + and - and the default operator for OR.
I still don't fully understand your requirement, but you just need a query like:
+(age:24 OR gender:male)
Or, if you want to match multiple values in the same field with an OR condition, i.e. get documents with either age:24 or age:25:
+(age:24 OR age:25 OR gender:male)
Then you can shorten it to (this relies on the default operator being OR):
+(age:(24 25) OR gender:male)
If this isn't what you need, let me know.
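If the query is assembled in client code, writing the OR out explicitly avoids depending on the default operator. Here is a small Python sketch of that; the helper name required_any is made up for illustration, and it assumes the values contain no characters that need escaping in Solr query syntax:

```python
def required_any(clauses):
    """Build '+(f1:v1 OR f2:v2 ...)': at least one clause must match."""
    if not clauses:
        raise ValueError("need at least one clause")
    parts = [f"{field}:{value}" for field, value in clauses]
    return "+(" + " OR ".join(parts) + ")"


query = required_any([
    ("name", "Sarah"),
    ("age", 24),
    ("occupation", "doctor"),
    ("gender", "male"),
])
print(query)  # +(name:Sarah OR age:24 OR occupation:doctor OR gender:male)
```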
If you want to make it as simple as possible for the client, just go for the dismax[1] or edismax[2] query parser.
Specifically you can configure a request parameter called "qf" :
"The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field’s importance in the query. For example, the query below:
qf=fieldOne^2.3 fieldTwo fieldThree^0.4
assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and fieldThree a boost of 0.4.
These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree." from the wiki
Then you can just pass a free text query, and it will be searched in the fields you specified, giving also different importance to each one, if necessary.
[1] https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html
[2] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html

In Solr, how can we use terms external to the search query to bias result ordering?

We're working on a plan to identify content tags our users are interested in. So, for instance, we may determine that User X consumes content tagged with "kermit" and "piggy" more often than other tags. These are their "favored tags."
When the users search, we'd like to favor/bias documents that contain these terms.
This means we can't boost the documents at index time, because every user will have different favored tags. Additionally, they may not be searching for the favored tags themselves. They may search for "gonzo," and so we absolutely want to give them documents with "gonzo," but we want to boost documents that also contain "kermit" or "piggy."
These favored tags are not used to actually query the index, but rather are used to bias the result ordering. The favored tags become something of a tie-breaker -- all else being equal, documents containing these terms will rank higher.
This is new/planned development, so we can use whatever version and parser stack is optimal to solve this problem.
Solution in SolrNet
The question was correctly answered below, but here's the code for SolrNet just in case someone else is using it.
var localParams = new LocalParams();
localParams.Add("bq", "kermit^10000"); // numeric value is the degree of boost
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<MySolrDocumentClass>>();
solr.Query(new SolrQuery("whatever") + localParams);
You didn't specify which query parser you're using, but if you are using the Dismax or Extended Dismax query parser, the bq argument should do exactly what you're looking for. bq adds search criteria to a search solely for the purpose of affecting the relevancy, but not to limit the result set.
From the Dismax documentation:
The bq (Boost Query) Parameter
The bq parameter specifies an additional, optional, query clause that
will be added to the user's main query to influence the score. For
example, if you wanted to add a relevancy boost for recent documents:
q=cheese
bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
You can specify multiple bq parameters. If you want your query to be
parsed as separate clauses with separate boosts, use multiple bq
parameters.
In this case, you may want to add &bq=kermit&bq=piggy to the end of your Solr query. If you aren't using one of these query parsers, this need may be exactly the motivation you need to switch.
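Since the favored tags differ per user, the bq parameters have to be built per request. Here is a rough Python sketch of how a client might do that; the /solr/select path, the edismax parser and the boost value of 2 are assumptions for illustration, and search_params is a made-up helper:

```python
from urllib.parse import urlencode


def search_params(user_query, favored_tags, boost=2):
    """Return request parameters with one bq clause per favored tag."""
    params = [("defType", "edismax"), ("q", user_query)]
    # Each tag gets its own bq so it is parsed as a separate clause.
    params += [("bq", f"{tag}^{boost}") for tag in favored_tags]
    return params


params = search_params("gonzo", ["kermit", "piggy"])
print("/solr/select?" + urlencode(params))
```

The user's own terms stay in q, so only documents matching "gonzo" come back; the favored tags only shift the ordering among them.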

How to boost fields in solr

I already have the boost determined beforehand. I have a field in the Solr index called boost1. This boost field will have a value from 1 to 10, similar to Google PageRank. This is the boost that should be applied to every query run in Solr. Here are the fields in my index:
Id
Title
Text
Boost1
The boost field should be applied to every query. I am trying to implement functionality similar to Google PageRank. Is there a way to do this using Solr?
You can add the boost at query time, e.g.
q={!boost b=boost1}
See "How can I boost the score of newer documents" in the Solr Relevancy FAQ.
However, you may need to add this explicitly to each query.
If you are using dismax or edismax with the request handler, the bf (Boost Functions) parameter can be used to boost the documents.
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
bf=boost1^0.5
This can be added to defaults with the request handler definition, so that they are applied to all the search queries.
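Note that the two approaches combine boost1 with the relevancy score differently: {!boost b=boost1} multiplies the query score by the function value, while bf adds the weighted function value to the score as an extra optional clause. Here is a rough Python sketch of that arithmetic; it ignores query normalization and other scoring details, and the scores and documents are made up:

```python
def boosted_multiplicative(score, boost1):
    """Roughly what q={!boost b=boost1} does: score * boost1."""
    return score * boost1


def boosted_additive(score, boost1, weight=0.5):
    """Roughly what bf=boost1^0.5 does: score + weight * boost1."""
    return score + weight * boost1


# (name, text relevancy score, stored boost1 value) -- illustrative only
docs = [("A", 2.0, 1), ("B", 0.5, 10)]
for name, score, boost1 in docs:
    print(name,
          boosted_multiplicative(score, boost1),
          boosted_additive(score, boost1))
```

With the additive form, the weight after ^ decides how much boost1 can outvote the text match; with the multiplicative form the effect scales with the text score itself.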
You can use function queries (FunctionQuery) to vary the amount of boost.
I think you need to use index time document boosts. See this if you are indexing XML or this if using DataImportHandler.

Adding date boosting to complex SOLR queries

I currently have a SOLR query which uses the query (q), query fields (qf) and phrase fields (pf) to retrieve the results I want. An example is:
/solr/select
?q=superbowl
&qf=title^3+headline^2+intro+fulltext
&pf=title^3+headline^2+intro+fulltext
&fl=id,title,ts_modified,score
&debugQuery=true
The idea is that the title and headline of the "main item" give the best indication of what the result is "about", but the intro and fulltext provide some input too. I.e., imagine a collection of links, where the collection itself has metadata (what it's a collection of), but each link has its own data (title of the link, synopsis, etc). If we search for "superbowl", the most relevant results are the ones with "superbowl" in the collection metadata, the least relevant results are those with "superbowl" in just the synopsis of one of the links... but they're all valid results.
What I'm trying to do is add a boost to the relevancy score so that the most recent results float towards the top, but retaining title,headline,intro,fulltext as part of the formula. A recent result with the search string in the collection metadata would be more relevant than one with it only in the links metadata... but that "links only" recent result might be more relevant than a very old result with the search string in the collection metadata. (I hope that's somewhat clear).
The problem is that I can't figure out how to combine the boost function documented on the SOLR site with the use of the qf/pf fields. Specifically...
From the SOLR site, something like the following works to boost the results by date:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq}
&dateboost=ord(ts_modified)
&qq=superbowl
&fl=ts_modified,score
&debugQuery=true
However, I can't figure out how to combine that query with the use of qf and pf. Any suggestions would be more than welcome.
Thanks to danben's response, I was able to come up with the following:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq%20defType=dismax}
&dateboost=ord(ts_modified)
&qq=superbowl
&qf=title^3+headline^2+intro^2+fulltext
&pf=title^3+headline^2+intro^2+fulltext
&fl=ts_modified,score
&debugQuery=true
It looks like the actual problems I was having were:
I left spaces in the q param instead of escaping them (%20) when copy/pasting
I didn't include the defType=dismax in my q param, so that it would pay attention to the qf/pf parameters
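The escaping problem goes away if the URL is built by a library instead of by hand. Here is a minimal Python sketch using the standard library's urlencode with the parameters from the query above; the /solr/select path is taken from the question, and urlencode writes spaces as "+" rather than %20, which decodes to the same thing in a query string:

```python
from urllib.parse import parse_qs, urlencode

params = {
    # Local params contain spaces, so they must be encoded in the URL.
    "q": "{!boost b=$dateboost v=$qq defType=dismax}",
    "dateboost": "ord(ts_modified)",
    "qq": "superbowl",
    "qf": "title^3 headline^2 intro^2 fulltext",
    "pf": "title^3 headline^2 intro^2 fulltext",
    "fl": "ts_modified,score",
    "debugQuery": "true",
}

url = "/solr/select?" + urlencode(params)
print(url)

# Decoding the query string again gives back the original q value.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```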
Check out http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
This is based on the ms function, which returns the difference in milliseconds between two timestamps / dates, and ReciprocalFloatFunction which increases as the value passed decreases.
Since you are using the DisMaxRequestHandler, you may need to specify your query using the bq/bf parameters. From http://lucene.apache.org/solr/api/org/apache/solr/handler/DisMaxRequestHandler.html:
bq - (Boost Query) a raw Lucene query that will be included in the user's query to influence the score. If this is a BooleanQuery with a default boost (1.0f), then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is. This param can be specified multiple times, and the boosts are additive. NOTE: the behaviour listed above is only in effect if a single bq parameter is specified. Hence you can disable it by specifying an additional, blank, bq parameter.
bf - (Boost Functions) functions (with optional boosts) that will be included in the user's query to influence the score. Format is: "funcA(arg1,arg2)^1.2 funcB(arg3,arg4)^2.2". NOTE: Whitespace is not allowed in the function arguments. This param can be specified multiple times, and the functions are additive.
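Because whitespace separates the functions in a bf value, a stray space inside a function's arguments breaks the parameter. Here is a small Python sketch that assembles a bf string in the documented format and rejects such input; format_bf is a made-up helper, and the two functions are just the ones that appear in this thread:

```python
def format_bf(functions):
    """Join (function, boost) pairs into 'f(a,b)^1.2 g(c)^2.2' form."""
    parts = []
    for func, boost in functions:
        # Whitespace is only allowed between functions, never inside one.
        if any(ch.isspace() for ch in func):
            raise ValueError(f"whitespace not allowed in function: {func!r}")
        parts.append(func if boost is None else f"{func}^{boost}")
    return " ".join(parts)


bf = format_bf([
    ("recip(ms(NOW,ts_modified),3.16e-11,1,1)", 1.2),
    ("ord(ts_modified)", None),  # None means no explicit boost
])
print(bf)
```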
Here is a nice article about Date-boosting Solr search results:
http://www.metaltoad.com/blog/date-boosting-solr-drupal-search-results
In Drupal, using the Apachesolr module, this can be achieved with the following code:
/**
 * Implements hook_apachesolr_query_alter().
 *
 * Replace "mymodule" with the name of your own module.
 */
function mymodule_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  // abs() treats future dates like past ones instead of yielding a negative age.
  $query->addParam('bf', array('freshness' =>
    'recip(abs(ms(NOW/HOUR,dm_field_date)),3.16e-11,1,.1)'
  ));
}
