Solr - how to plan field boosting - solr

I query using
qf=Name+Tag
Now I want documents that have the phrase in Tag to come first, so I use
qf=Name+Tag^2
and they do appear first.
What should be the rule of thumb regarding the number that comes after the field?
How do I know what number to set it?

The number is purely a matter of preference and is mostly found by trial and error.
It expresses how much one field weighs in comparison to the other fields.
Scoring takes many factors into account, but some of them can be considered and tested,
e.g. term frequency: if a word appears twice in Name, should it outweigh a single occurrence in the Tag field?
Also, if you are checking for a phrase match, you should use pf (phrase fields) when using the edismax parser.
qf matches individual words, whereas pf matches the whole phrase.
For example, if you have the fields name and tag and you search for ruby rails:
qf would cause scoring name:ruby tag:ruby & name:rails tag:rails
pf would cause scoring name:"ruby rails" tag:"ruby rails"
So it is better to use qf to match the results and boost single-term matches, but to give pf higher boost values.
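A minimal Python sketch of the parameters described in this answer, assuming fields named exactly Name and Tag. The pf boosts (5 and 10) are illustrative starting points for the trial-and-error tuning mentioned above, not recommended values. It also shows that the "+" in the question's qf=Name+Tag is simply a URL-encoded space.

```python
from urllib.parse import urlencode

def fmt_boosts(boosts: dict[str, float]) -> str:
    # "Field^boost" entries separated by spaces; a boost of 1 is left implicit
    return " ".join(f if b == 1 else f"{f}^{b:g}" for f, b in boosts.items())

def edismax_params(q: str, qf: dict[str, float], pf: dict[str, float]) -> dict[str, str]:
    # qf: per-term matching and scoring; pf: extra boost when all terms form a phrase
    return {"defType": "edismax", "q": q, "qf": fmt_boosts(qf), "pf": fmt_boosts(pf)}

params = edismax_params("ruby rails", qf={"Name": 1, "Tag": 2}, pf={"Name": 5, "Tag": 10})
print(params["qf"])        # Name Tag^2
print(params["pf"])        # Name^5 Tag^10
print(urlencode(params))   # spaces become "+", "^" becomes %5E
```

Raising the pf boosts relative to qf is what pushes whole-phrase matches above documents that merely contain the individual words.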

Related

how to use pf(Phrase Fields) and ps(Phrase Slop) of eDisMax Query Parser in solr?

What are Phrase Fields, Phrase Slop and Query Phrase Slop in eDisMax? I have gone through many websites but still don't understand how to use them in practice. I want to know how to pass a query to Solr using these parameters and how the output differs between them, given the following data:
{
"id":"2",
"shipping_firstname":"Sudhanshu",
"address":"H.No. 444, Gali No.2 Jain Nagar",
"date_added":"2017-01-21T14:15:15Z",
"_version_":1562029999829024768
}
Welcome,
Phrase Fields, Phrase Slop and Query Phrase Slop in the eDisMax parser are used to boost a document based on certain criteria.
Based on your use case you can give different boost values to manipulate the overall score of a document.
The pf (Phrase Fields) parameter can be used to boost the score of documents in which all of the terms in the q parameter appear in close proximity. The pf parameter takes a list of fields and optional corresponding boosts. The eDisMax query parser will attempt to make phrase queries out of all the terms in the q parameter, and if it’s able to find the exact phrase in any of the phrase fields, it will apply the specified boost to the match for that document.
The ps (Phrase Slop) parameter:
When using the pf parameter, you may not want to require all terms in the query to appear as an exact phrase. You can make use of the ps (phrase slop) parameter to specify how many term positions the terms in the query can be off by to be considered a match on the phrase fields.
The qs (Query Phrase Slop) parameter:
Just as the ps parameter allows you to define the amount of slop (edit distances) on phrases matching in the phrase fields (pf parameter), the qs parameter allows you to do the same for phrases the user explicitly specifies in the main q parameter. Think of the qs parameter as redefining what an exact match is, allowing you to change the slop from the default of 0 (terms must appear beside each other) to a higher number.
What is your requirement here? These parameters only affect ranking, i.e. boosting some documents to the top; they do not change the search criteria or which documents match.
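A rough illustration with the sample document above, whose address field ends in "Jain Nagar". This is a Python sketch of request parameters only; the boost and slop values are arbitrary. Per the Lucene PhraseQuery documentation, swapping the order of two terms costs two moves, which is why slop 2 is used to tolerate reversed terms.

```python
from urllib.parse import urlencode

# Shared settings: search both fields for the individual terms
base = {"defType": "edismax", "qf": "shipping_firstname address"}

# pf + ps: the unquoted query matches through qf as usual; documents whose
# address also contains the terms as a phrase get the extra address^10 boost.
# ps=2 lets that phrase boost apply even if the two terms are swapped.
phrase_boost = {**base, "q": "Jain Nagar", "pf": "address^10", "ps": "2"}

# qs: only affects phrases the user quotes in q itself; with qs=2 the quoted
# phrase "Nagar Jain" would also match "Jain Nagar". No pf is set here,
# so no separate phrase boost is added.
quoted = {**base, "q": '"Nagar Jain"', "qs": "2"}

for p in (phrase_boost, quoted):
    print(urlencode(p))
```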

How to boost a document if full query text is present in it? - Solr

I need to give priority to documents where the full search term occurs. For example, if the search term is "Georgia Tech", I want documents containing "Georgia Tech" to get a higher boost than those in which the term "Georgia" merely occurs more frequently.
That is almost standard:
Index it in two fields (use copyField): one whitespace-tokenized (or similar), one as a keyword field.
Then use edismax and give the keyword field a higher boost than the other.
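One possible way to set this up is sketched below as a Schema API payload (POSTed to /solr/<collection>/schema) built in Python. The field names title and title_exact and the type name text_keyword_lc are hypothetical, and title is assumed to exist already as a tokenized text field. Keep in mind that a keyword-tokenized field holds the whole value as one token, so title_exact only matches when the entire title equals the query (case-insensitively here); for a phrase inside longer text, pf is the usual complement. Also, edismax only passes the whole query text to each field's analyzer when sow=false (the default from Solr 7.0 onward).

```python
import json

schema_changes = {
    # Keyword tokenizer keeps the entire field value as a single, lowercased token
    "add-field-type": {
        "name": "text_keyword_lc",
        "class": "solr.TextField",
        "analyzer": {
            "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    },
    "add-field": {"name": "title_exact", "type": "text_keyword_lc",
                  "indexed": True, "stored": False},
    # copyField: every value indexed into title is also indexed into title_exact
    "add-copy-field": {"source": "title", "dest": "title_exact"},
}
print(json.dumps(schema_changes, indent=2))

# Query side: edismax over both fields, weighting the keyword field more heavily
query_params = {"defType": "edismax", "q": "Georgia Tech",
                "qf": "title title_exact^10", "sow": "false"}
```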

Solr Query Syntax conversion from boolean expression

I'm attempting to query Solr for documents, given a basic schema with the following field names (data types irrelevant):
occupation, name, age, gender
I'm attempting to match documents that match at least one of these fields, i.e. I want to OR them together.
How do you OR together many terms and enforce that the document matches at least one?
This seems to be failing: +(name:Sarah age:24 occupation:doctor gender:male)
How do you convert a boolean expression into solr query syntax? I can't figure out the syntax with + and - and the default operator for OR.
I still don't fully understand your requirement, but you just need a query like:
+(age:24 OR gender:male)
Or, if you want to match multiple values in the same field with an OR condition,
i.e. get documents with age:24 as well as documents with age:25:
+(age:24 OR age:25 OR gender:male)
then you can shorten it to:
+(age:(24 25) OR gender:male)
If this isn't your requirement, let me know.
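If the query is generated in code, a small helper can build this pattern. The Python sketch below uses a hypothetical function any_of; it joins multiple values for one field with an explicit OR (rather than relying on the default operator, as age:(24 25) does) and performs no escaping, so it assumes simple values without spaces or special characters.

```python
def any_of(criteria: dict[str, list[str]]) -> str:
    """Required group that matches if at least one field:value clause matches."""
    clauses = []
    for field, values in criteria.items():
        if len(values) == 1:
            clauses.append(f"{field}:{values[0]}")
        else:
            # Several values for one field: group them with an explicit OR
            clauses.append(f"{field}:({' OR '.join(values)})")
    # Leading + makes the whole group required; inside it, any clause suffices
    return "+(" + " OR ".join(clauses) + ")"

print(any_of({"age": ["24", "25"], "gender": ["male"]}))
# +(age:(24 OR 25) OR gender:male)
```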
If you want to make it as simple as possible for the client, just go for the dismax[1] or edismax[2] query parser.
Specifically, you can configure a request parameter called "qf":
"The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field’s importance in the query. For example, the query below:
qf=fieldOne^2.3 fieldTwo fieldThree^0.4
assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and fieldThree a boost of 0.4.
These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree." from the wiki
Then you can just pass a free text query, and it will be searched in the fields you specified, giving also different importance to each one, if necessary.
[1] https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html
[2] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
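To make the quoted rule concrete, this hypothetical Python helper reads a qf string the same way the documentation describes it: each whitespace-separated entry is a field, optionally followed by ^boost, and an entry without a boost gets the default of 1.0. It is an illustration of the syntax, not Solr's own parser.

```python
def parse_qf(qf: str) -> dict[str, float]:
    boosts = {}
    for entry in qf.split():
        field, sep, boost = entry.partition("^")
        # No "^" means no explicit boost factor, i.e. the default boost of 1.0
        boosts[field] = float(boost) if sep else 1.0
    return boosts

print(parse_qf("fieldOne^2.3 fieldTwo fieldThree^0.4"))
# {'fieldOne': 2.3, 'fieldTwo': 1.0, 'fieldThree': 0.4}
```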

Solr negative boost

I'm looking into the possibility of de-boosting a set of documents at query time. In my application, when I search for e.g. "preferences", I want to de-boost content tagged with ContentGroup:"Developer", or in other words, push that content back in the order. Here's the catch. I have the following weights on the query fields, plus a boost query on Source:
qf=text^6 title^15 IndexTerm^8
As you can see, title has a higher weight.
Now, a bunch of content tagged with ContentGroup:"Developer" has titles like "Preferences.material", "Preferences Property" or "Preferences.graphics". The boost on title pushes these documents to the top.
What I'm looking for is a way to de-boost all documents tagged with ContentGroup:"Developer", irrespective of whether the term occurs in text or title. I tried something like the following, but it didn't make any difference:
Source:simplecontent^10 Source:Help^20 (-ContentGroup-local:("Developer"))^99
I'm using edismax query parser.
Any pointers will be appreciated.
Thanks,
Shamik
You're onto something with your last attempt, but you have to start with *:*, so that you actually have something to subtract the documents from. The resulting set of documents (those not matching your query) can then be boosted.
From the Solr Relevancy FAQ
How do I give a negative (or very low) boost to documents that match a query?
True negative boosts are not supported, but you can use a very "low" numeric boost value on query clauses. In general the problem that confuses people is that a "low" boost is still a boost, it can only improve the score of documents that match. For example, if you want to find all docs matching "foo" or "bar" but penalize the scores of documents matching "xxx" you might be tempted to try...
q = foo^100 bar^100 xxx^0.00001 # NOT WHAT YOU WANT
...but this will still help a document matching all three clauses score higher than a document matching only the first two. One way to fake a "negative boost" is to give a large boost to everything that does not match. For example...
q = foo^100 bar^100 (*:* -xxx)^999
NOTE: When using (e)dismax, people sometimes expect that specifying a pure negative query with a large boost in the "bq" param will work (since Solr automatically makes top-level purely negative queries positive by adding an implicit "*:*"), but this doesn't work with "bq", because of how queries specified via "bq" are added directly to the main query. You need to be explicit...
?defType=dismax&q=foo bar&bq=(*:* -xxx)^999
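Applied to the question above, the fix is to subtract the Developer content from *:* inside bq. This is a minimal Python sketch of the request parameters, assuming the field is named ContentGroup as in the question text (the attempted query used ContentGroup-local, so use whichever name the schema actually has); the boost values are the ones from the question and are illustrative.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "preferences",
    "qf": "text^6 title^15 IndexTerm^8",
    # bq is added directly to the main query, so the pure negative clause needs
    # an explicit *:* to subtract from; everything NOT tagged Developer gets ^99
    "bq": 'Source:simplecontent^10 Source:Help^20 (*:* -ContentGroup:"Developer")^99',
}
print(urlencode(params))
```

Developer documents still match through qf; they simply miss the large bq boost that every other document receives, which pushes them down.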

Solr Custom Boosting if a specific field matches the query

We are trying to implement a very interesting search logic with custom boosting and I am wondering if Solr can support this.
We have the following fields in our index:
Name
Description
Keywords (array)
Each keyword has an amount (an int value) paired with it.
A search is run across the Name, Description and Keywords fields. If a keyword matches the search text, the corresponding document must be boosted based on the amount of the matching keyword only.
I've read through Solr DisMax and they can only boost a field using a fixed amount.
My scenario will be to boost the result by X amount based on matching keywords only.
Thanks in advance
The only viable solution I see to this problem (assuming, of course, you DO NOT know the number of keywords in advance) would be to run the query as a filter query (to skip the scoring stage), get all matching documents (a bit problematic), and then sort them on your side, using the matched terms to build a Java Comparator.
Problems may arise when you get a particularly large number of documents, but you could probably sidestep this issue with pagination.
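The answer suggests a Java Comparator; in Python the equivalent is a sort key. The sketch below assumes a hypothetical document shape in which each fetched document carries its keywords as a keyword-to-amount mapping, and it sums the amounts of all matching keywords (an assumption; taking the maximum would be an equally reasonable rule).

```python
def rank_by_keyword_amount(docs, search_terms):
    terms = {t.lower() for t in search_terms}

    def matched_amount(doc):
        # Total amount of this document's keywords that appear in the search text
        return sum(amt for kw, amt in doc["keywords"].items() if kw.lower() in terms)

    # Highest matched amount first; sorted() is stable, so ties keep Solr's order
    return sorted(docs, key=matched_amount, reverse=True)

docs = [
    {"id": "a", "keywords": {"solr": 5, "search": 1}},
    {"id": "b", "keywords": {"solr": 50}},
    {"id": "c", "keywords": {"lucene": 100}},
]
print([d["id"] for d in rank_by_keyword_amount(docs, ["solr", "search"])])
# ['b', 'a', 'c']
```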
If you don't have too many different amounts, you could try this at index time:
Store the keywords in different fields (dynamic fields matching boost-*) based on their amount:
boost-1 = keyword1,keyword4,keyword6
boost-10 = keyword2
boost-100 = keyword5
You can then search across all your boost fields with edismax, boosting every dynamic field by its amount in your (e)dismax configuration (boost-1^1,boost-10^10,boost-100^100).
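A Python sketch of the indexing side of this idea, with hypothetical helper names: group each document's keywords into boost-<amount> fields and build the matching qf string. It assumes the schema declares a multiValued dynamic field for boost-*; note that the Solr Reference Guide recommends field names made of alphanumeric and underscore characters only, so a boost_* pattern may be the safer choice.

```python
from collections import defaultdict

def to_boost_fields(keywords: dict[str, int]) -> dict[str, list[str]]:
    # Put every keyword into the dynamic field named after its amount
    fields = defaultdict(list)
    for kw, amount in keywords.items():
        fields[f"boost-{amount}"].append(kw)
    return dict(fields)

def boost_qf(amounts) -> str:
    # Each boost field is weighted by the amount it represents
    return " ".join(f"boost-{a}^{a}" for a in sorted(amounts))

doc_fields = to_boost_fields({"keyword1": 1, "keyword2": 10, "keyword4": 1,
                              "keyword5": 100, "keyword6": 1})
print(doc_fields)
# {'boost-1': ['keyword1', 'keyword4', 'keyword6'], 'boost-10': ['keyword2'], 'boost-100': ['keyword5']}
print(boost_qf({1, 10, 100}))
# boost-1^1 boost-10^10 boost-100^100
```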
