dismax query parse - mm attribute - solr

"mm":"2",
"q":"IBASEDESCRIPTION:(ankor sunnyvale tokyo^3 london labs)",
"defType":"dismax",
"fl":"score, IBASEDESCRIPTION",
"q.op":"AND",
"rows":"3",
"debug.explain.structured":"true",
"debugQuery":"on"
This is what I see in the response header.
mm=2 it means 2 optional clauses should match.
q.op is AND - I assumed that conditions between the clauses are AND
I expect the following field not to match:
Level 3 Communications-london
akamai-level 3-london
But they are part of the results.
Could anyone help me to understand the behavior here?
I suspect it is because ^3 in the tokyo field. But that's what the boosting factor.

mm=2 does not mean two optional clauses should match - it means that two clauses should match. q.op does not take effect when mm is set.
If no 'mm' parameter is specified in the query, or as a default in solrconfig.xml, the effective value of the q.op parameter (either in the query, as a default in solrconfig.xml, or from the defaultOperator option in the Schema) is used to influence the behavior.
Why those specific entries are included is hard to say without seeing the analysis for the field and the actual value indexed (the debug query will include information about what terms are being hit for each field and how much it contributes to the score).
You should also use qf here instead of having the field name present when using the dismax handler.

Related

Solr Query Syntax conversion from boolean expression

I'm attempting to query solr for documents, given a basic schema with the following field names, data types irrelevant:
I'm attempting to match documents that match at least one of the following:
occupation, name, age, gender but i want to OR them together
How do you OR together many terms, and enforce the document to match at least one?
This seems to be failing: +(name:Sarah age:24 occupation:doctor gender:male)
How do you convert a boolean expression into solr query syntax? I can't figure out the syntax with + and - and the default operator for OR.
Still I don't get your requirement but you just need to query like:
+(age:24 OR gender:male)
Or if you want data for multiple value in same field with OR condition like.
i.e. You get data of age:24 and age:25 both.
+(age:24 OR age:25 OR gender:male)
Then you can:
+(age:(24 25) OR gender:male)
If it is't your requirement, then let me know.
If you want to make it as simple as possible for the client, just go for the dismax[1] or edismax[2] query parser.
Specifically you can configure a request parameter called "qf" :
"The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field’s importance in the query. For example, the query below:
qf=fieldOne^2.3 fieldTwo fieldThree^0.4
assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and fieldThree a boost of 0.4.
These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree." from the wiki
Then you can just pass a free text query, and it will be searched in the fields you specified, giving also different importance to each one, if necessary.
[1] https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html
[2] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html

what's the difference between these two solr queries?

What's the difference between these two solr queries:
NOT name:*
NOT name:[* TO *]
Both of the two can return some results.But I can't discriminate the difference.
Based on reading the documentation for SOLR query.
NOT name:[* TO *]
removes all documents with the name and whatever value name contains, as shown on this documentation: https://wiki.apache.org/solr/SolrQuerySyntax
NOT name:*
removes all member fields belonging to name.
NOT is a keyword reserve in removing results of whatever field + value. They show you different results because if you specify a value to NOT name:[* TO *], you are bound to get results that exclude from the rule you've specified.
Keep in mind that SOLR query employs certain regex rules.

Solr DisMax query equivalent

I am trying to set up elevate handler in SOLR 3.5.0 and I need the equivalent of the below query in dismax format which defines different boost values on the same field based on the match type(exact match gets 200 whereas wildcard match gets 100).
q=name:(foo*^100.0 OR foo^200.0)
This is one way to solve this problem.
Keep a text field with only WhiteSpaceTokenizer (and maybe LowerCaseFilter depending on your case-sensitivity needs). Use this field for the exact match. Let's call this field name_ws.
Instead of using a wild-card query on name_ws, use a text-type copy field with EdgeNGramTokenizer in your analyzer chain, which will output tokens like:
food -> f, fo, foo, food
Let's call this field name_edge.
Then you can issue this dismax query:
q=foo&defType=dismax&qf=name_ws^200+name_edge^100
(Add debugQuery=on to verify if the scoring works the way you want.)

What is the proper way to boost items with newer dates?

I have a more like this query which I would like to update to return newer documents first. According to the documentation, I would need to add recip(ms(NOW,mydatefield),3.16e-11,1,1) to my query.
But when I try to add it to either of mlt.qf or bf parameters. The results stay exactly the same.
This is my query:
/solr/mlt?
q=id:cms.article.137861
&defType=edismax
&rows=3
&indent=on
&mlt.fl=series_id,tags,title,text
&mlt.qf=show_id text^1.1 title^1.1 tags^90
&wt=json
&fl=url,title,tags,django_id,content_type_id
&bf=recip(ms(NOW,pub_date),3.16e-11,1,1)
this is taken from the solr wiki (its down but i have it cached)
i think this is what you are looking for.
How can I boost the score of newer documents
Do an explicit sort by date (relevancy scores are ignored)
Use an index-time boost that is larger for newer documents
Use a FunctionQuery to influence the score based on a date field.
In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
In Solr 1.4, use something of the form recip(ms(NOW,mydatefield),3.16e-11,1,1)
http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
A full example of a query for "ipod" with the score boosted higher the newer the product is:
http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod
One can simplify the implementation by decomposing the query into multiple arguments:
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qq=ipod
Now the main "q" argument as well as the "dateboost" argument may be specified as defaults in a search handler in solrconfig.xml, and clients would only need to pass "qq", the user query.
To boost another query type such as a dismax query, the value of the boost query is a full sub-query and hence can use the {!querytype} syntax. Alternately, the defType param can be used in the boost local params to set the default type to dismax. The other dismax parameters may be set as top level parameters.
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qf=text&pf=text&qq=ipod
Consider using reduced precision to prevent excessive memory consumption. You would instead use recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1). See this thread for more information.
apparently your date field is not a TrieDate

Adding date boosting to complex SOLR queries

I currently have a SOLR query which uses the query (q), query fields (qf) and phrase fields (pf) to retrieve the results I want. An example is:
/solr/select
?q=superbowl
&qf=title^3+headline^2+intro+fulltext
&pf=title^3+headline^2+intro+fulltext
&fl=id,title,ts_modified,score
&debugQuery=true
The idea is that the title and headline of the "main item" give the best indication of what the result is "about", but the intro and fulltext provides some input too. Ie, imagine a collection of links, where the collection itself has metadata (what it's a collection of), but each link has it's own data (title of the link, synopsis, etc). If we search for "superbowl", the most relevant results are the ones with "superbowl" in the collection metadata, the least relevant results are those with "superbowl" in just the synopsis of one of the links... but they're all valid results.
What I'm trying to do is add a boost to the relevancy score so that the most recent results float towards the top, but retaining title,headline,intro,fulltext as part of the formula. A recent result with the search string in the collection metadata would be more relevant than one with it only in the links metadata... but that "links only" recent result might be more relevant than a very old result with the search string in the collection metadata. (I hope that's somewhat clear).
The problem is that I can't figure out how to combine the boost function documented on the SOLR site with the use of the qf/pf fields. Specifically...
From the SOLR site, something like the following works to boost the results by date:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq}
&dateboost=ord(ts_modified)
&qq=superbowl
&fl=ts_modified,score
&debugQuery=true
However, I can't figure out how to combine that query with the use of qf and pf. Any suggestions would be more than welcome.
Thanks to danben's response, I was able to come up with the following:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq%20defType=dismax}
&dateboost=ord(ts_modified)
&qq=superbowl
&qf=title^3+headline^2+intro^2+fulltext
&pf=title^3+headline^2+intro^2+fulltext
&fl=ts_modifieds,score
&debugQuery=true
It looks like the actual problems I was having were:
I left spaces in the q param instead of escaping them (%20) when copy/pasting
I didn't include the defType=dismax in my q param, so that it would pay attention to the qf/pf parameters
Check out http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
This is based on the ms function, which returns the difference in milliseconds between two timestamps / dates, and ReciprocalFloatFunction which increases as the value passed decreases.
Since you are using the DisMaxRequestHandler, you may need to specify your query using the bq/bf parameters. From http://lucene.apache.org/solr/api/org/apache/solr/handler/DisMaxRequestHandler.html:
bq - (Boost Query) a raw lucene query that will be included in the
users query to influence the score. If
this is a BooleanQuery with a default
boost (1.0f), then the individual
clauses will be added directly to the
main query. Otherwise, the query will
be included as is. This param can be
specified multiple times, and the
boosts are are additive. NOTE: the
behaviour listed above is only in
effect if a single bq paramter is
specified. Hence you can disable it by
specifying an additional, blank, bq
parameter.
bf - (Boost Functions) functions (with optional boosts) that will be
included in the users query to
influence the score. Format is:
"funcA(arg1,arg2)^1.2
funcB(arg3,arg4)^2.2". NOTE:
Whitespace is not allowed in the
function arguments. This param can be
specified multiple times, and the
functions are additive.
Here is a nice article about Date-boosting Solr search results:
http://www.metaltoad.com/blog/date-boosting-solr-drupal-search-results
In Drupal this can be simply achieved by the following code:
using Apachesolr module
/**
* Implements hook_apachesolr_query_alter().
*/
function hook_search_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
$query->addParam('bf', array('freshness' =>
'recip(abs(ms(NOW/HOUR,dm_field_date)),3.16e-11,1,.1)'
));
}

Resources