SOLR Down boosting on field Value - solr

I have this query in Solr. The problem is that the search results contain a set of items named "PRD DELETED".
I want all the items that have "PRD DELETED" to be displayed at the end.
For example, if there are 100 records and one page contains 25 records, the "PRD DELETED" records should appear on the last page.
Please note that "PRD DELETED" is a field value and not a category. I think down-boosting is needed here, but I have been unable to find an exact solution.
Any suggestion would be a big help.

The solution is usually to do the opposite: boost all documents that aren't deleted, instead of trying to negatively boost those that are. Boosts are either multiplicative or additive, and while a multiplicative boost can reduce the score, an additive one can't. bq and bf are additive, while boost is multiplicative.
The Relevancy FAQ has an example for this case:
When using (e)dismax, people sometimes expect that specifying a pure negative query with a large boost in the "bq" param will work (since Solr automatically makes top-level purely negative queries positive by adding an implicit "*:*"), but this doesn't work with "bq", because of how queries specified via "bq" are added directly to the main query. You need to be explicit...
?defType=dismax
&q=foo bar
&bq=(*:* -xxx)^999
Implementing it as a multiplicative boost would probably involve using if and then returning either 1 or a lower value depending on whether the field has the given value.
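As a rough sketch of the multiplicative variant, the snippet below builds edismax request parameters whose boost function returns a small factor for deleted items and 1 for everything else. The field name category_name and the factor 0.01 are assumptions for illustration, and termfreq() only matches the whole value "PRD DELETED" if the field is a non-tokenized string field.

```python
from urllib.parse import urlencode

# Hypothetical field name: replace with the field that holds "PRD DELETED".
STATUS_FIELD = "category_name"

def deleted_last_params(user_query: str) -> dict:
    """Build edismax parameters that scale down the score of deleted items."""
    # if(test, a, b) returns a when test is non-zero; termfreq() is greater
    # than zero only when the field contains the exact indexed term.
    boost = f"if(termfreq({STATUS_FIELD},'PRD DELETED'),0.01,1)"
    return {"defType": "edismax", "q": user_query, "boost": boost}

params = deleted_last_params("foo bar")
query_string = urlencode(params)  # append to the /select URL of your core
print(query_string)
```

Because the boost is multiplied into the score, deleted items keep their relative order among themselves but should sink below comparable non-deleted items. The factor may need tuning against real scores.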

Related

Solr Query Syntax conversion from boolean expression

I'm attempting to query solr for documents, given a basic schema with the following field names, data types irrelevant:
I'm attempting to match documents that match at least one of the following:
occupation, name, age, gender, but I want to OR them together.
How do you OR together many terms and require the document to match at least one?
This seems to be failing: +(name:Sarah age:24 occupation:doctor gender:male)
How do you convert a boolean expression into Solr query syntax? I can't figure out the syntax with + and - and the default operator for OR.
I don't fully understand your requirement, but you can simply query like this:
+(age:24 OR gender:male)
If you want to match multiple values in the same field with an OR condition, for example documents with either age:24 or age:25, you can write:
+(age:24 OR age:25 OR gender:male)
or, in shorter form:
+(age:(24 25) OR gender:male)
If this isn't what you need, let me know.
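For illustration, a clause with explicit OR operators like the ones above can be assembled on the client. The helper name any_of is made up for this sketch, and it does no escaping of special characters in the values, which a real client would need.

```python
def any_of(conditions: dict) -> str:
    """Return a Solr query that requires at least one field:value pair to match."""
    clauses = [f"{field}:{value}" for field, value in conditions.items()]
    # Spelling out OR makes the query independent of the default operator.
    return "+(" + " OR ".join(clauses) + ")"

query = any_of({"name": "Sarah", "age": 24, "occupation": "doctor", "gender": "male"})
print(query)  # +(name:Sarah OR age:24 OR occupation:doctor OR gender:male)
```

Writing the OR explicitly avoids surprises when the request handler is configured with q.op=AND, in which case the bare space-separated form would require every clause to match.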
If you want to make it as simple as possible for the client, just go for the dismax[1] or edismax[2] query parser.
Specifically you can configure a request parameter called "qf" :
"The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field’s importance in the query. For example, the query below:
qf=fieldOne^2.3 fieldTwo fieldThree^0.4
assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and fieldThree a boost of 0.4.
These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree." from the wiki
Then you can just pass a free text query, and it will be searched in the fields you specified, giving also different importance to each one, if necessary.
[1] https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html
[2] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
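A minimal sketch of such a request, assuming the free-text query should be searched in the name and occupation fields. The boost values are arbitrary placeholders, not recommendations.

```python
from urllib.parse import urlencode

# Assumed setup: name and occupation are text fields in the schema.
params = {
    "defType": "edismax",
    "q": "Sarah doctor",              # free text typed by the user
    "qf": "name^2.0 occupation^1.5",  # fields to search, with per-field weights
}

query_string = urlencode(params)
print(query_string)
```

The client then only sends plain text in q, while the field list and weights can also be set as defaults in the request handler configuration.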

Solr negative boost

I'm looking into the possibility of de-boosting a set of documents during
query time. In my application, when I search for e.g. "preferences", I want
to de-boost content tagged with ContentGroup:"Developer" or, in other words,
push that content back in the order. Here's the catch: I have the following
weights on query fields and boost query on source
qf=text^6 title^15 IndexTerm^8
As you can see, title has a higher weight.
Now, a bunch of content tagged with ContentGroup:"Developer" consists of a
title like "Preferences.material" or "Preferences Property" or
"Preferences.graphics". The boost on title pushes these documents at the
top.
What I'm looking for is a way to de-boost all documents that are
tagged with ContentGroup:"Developer", irrespective of whether the term occurs in
text or title. I tried something like the following, but it didn't make any difference.
Source:simplecontent^10 Source:Help^20 (-ContentGroup-local:("Developer"))^99
I'm using edismax query parser.
Any pointers will be appreciated.
Thanks,
Shamik
You're onto something with your last attempt, but you have to start with *:*, so that you actually have something to subtract the documents from. The resulting set of documents (those not matching your query) can then be boosted.
From the Solr Relevancy FAQ
How do I give a negative (or very low) boost to documents that match a query?
True negative boosts are not supported, but you can use a very "low" numeric boost value on query clauses. In general the problem that confuses people is that a "low" boost is still a boost, it can only improve the score of documents that match. For example, if you want to find all docs matching "foo" or "bar" but penalize the scores of documents matching "xxx" you might be tempted to try...
q = foo^100 bar^100 xxx^0.00001 # NOT WHAT YOU WANT
...but this will still help a document matching all three clauses score higher than a document matching only the first two. One way to fake a "negative boost" is to give a large boost to everything that does not match. For example...
q = foo^100 bar^100 (*:* -xxx)^999
NOTE: When using (e)dismax, people sometimes expect that specifying a pure negative query with a large boost in the "bq" param will work (since Solr automatically makes top-level purely negative queries positive by adding an implicit "*:*"), but this doesn't work with "bq", because of how queries specified via "bq" are added directly to the main query. You need to be explicit...
?defType=dismax&q=foo bar&bq=(*:* -xxx)^999
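Applied to the question's setup, the request might look like the sketch below. It assumes the field is called ContentGroup (the attempt in the question used ContentGroup-local) and passes each boost query as its own bq parameter. The value 99 is carried over from the question and would likely need tuning against the title^15 weight.

```python
from urllib.parse import urlencode

# A list of tuples lets the bq parameter be repeated.
params = [
    ("defType", "edismax"),
    ("q", "preferences"),
    ("qf", "text^6 title^15 IndexTerm^8"),
    ("bq", "Source:simplecontent^10"),
    ("bq", "Source:Help^20"),
    # Boost everything that is NOT tagged Developer; *:* supplies the
    # set of documents to subtract from.
    ("bq", '(*:* -ContentGroup:"Developer")^99'),
]

query_string = urlencode(params)
print(query_string)
```

Since bq is additive, this raises the non-Developer documents rather than lowering the Developer ones. Whether that is enough to overcome a strong title match depends on the actual scores.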

How to perform an exact search in Solr

I'm implementing Solr search using an API. When I call it with the parameter "Chillout Lounge", it returns the collections that are the same as or similar to the string "Chillout Lounge".
But when I search for "Chillout Lounge Box", it returns results that don't contain any of these three words (the DB has values that contain all three words, but they are not returned).
As I understand it, Solr uses fuzzy search, but in that case it should return some values that contain at least one of these words.
Otherwise, what changes should I make to my schema.xml so that it gives me proper values?
First of all - "Fuzzy search" is a feature you'll have to ask for (by using ~ in standard Lucene query syntax).
If you're talking about regular searches, you can use q.op to select which operator to use. q.op=AND will make sure that all the terms match, while q.op=OR will return any document that contains at least one of the terms. As long as you aren't using fq for this, the documents that match more terms should be scored higher (as the score adds up across multiple terms), and thus be shown higher in the result set.
You can use the debug query feature in the web interface to see scores for each term for a document, and find out why the document was returned at all. If the document doesn't match any terms, it shouldn't be returned, unless you're asking for all documents to be returned.
Be aware that the analyzer chain defined for the field you're searching might affect what's considered a match and not.
You'll have to add a proper example to get a more detailed answer.
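In the meantime, here is a small sketch of how the two operator settings could be compared from a client. The helper name search_params is invented for this example. debugQuery=true asks Solr to include the score explanation in the response.

```python
from urllib.parse import urlencode

def search_params(text: str, require_all: bool) -> dict:
    """Build query parameters with an explicit default operator."""
    return {
        "q": text,
        "q.op": "AND" if require_all else "OR",
        "debugQuery": "true",  # include per-document score explanations
    }

strict = search_params("Chillout Lounge Box", require_all=True)
loose = search_params("Chillout Lounge Box", require_all=False)
print(urlencode(strict))
print(urlencode(loose))
```

Running both variants and comparing the debug output should show whether the unexpected documents match through one of the terms or through the way the analyzer chain rewrites them.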

Haystack/Solr boosting results if the query is found in a specific field

We're having issues with non relevant results being returned as the highest results in our search and we're trying to improve that behavior, but not really sure how.
We have SearchIndex with about a dozen fields. The document=True field is a template backed field that we have placed the majority of the content into. Some of the stuff found in there is much less relevant than other stuff, even if it's still useful.
To give a concrete example: if a user searches for "red rose", we want to return red roses as the top results...even better if lower results are just roses or just red, or even are described as being "rose red" in color.
The issue is our document=True field has a ton of items that are described as being "rose red". Worse, the actual red roses don't have "red" and "rose" particularly close to each other, as those values come from disparate fields. As a result, the top few hundred results are completely irrelevant.
What we would like to do is either:
A. Search the primary document and then search each of our other fields and boost (but not hard filter) accordingly. If the term "rose" appears in one of the item's names and "red" appears as one of its attribute values, then that result should have a higher score. This gives us the optimal results in theory, sorted by relevancy.
B. Search all fields at once and boost if the value is any of the "boosted" fields.
It seems like using field boost should be the answer, but we can't figure out how to express it since filtering based on a field is a harsh exclude and we want it to only impact the relevance scoring.
The result of both of these is effectively the same. We just can't figure out how to do either of them with Haystack. Or if we'd have to fall back to raw queries how to write a solr query that accomplishes this.
I can only give you some pointers, as I did not fully understand the use case.
You can look at the Solr edismax query parser, which lets you configure:
Fields you want to search on - mainly to select the results
Variable boost on fields for relevancy - to determine the importance of each field
Variable boost for different word combinations, e.g. single words, phrase match, shingle match with slop - to determine relevancy
Additional boost on other fields
This will help you filter the results and order them according to the field and word combination matches.
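A hedged sketch of what such an edismax request could look like for the "red rose" example. The field names text, name and attributes are assumptions about the index (text standing in for the document=True field), and all weights are placeholders to be tuned.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "red rose",
    # Fields that select results, with per-field weights.
    "qf": "text^1 name^5 attributes^3",
    # Extra boost when the whole query appears as a phrase in the name.
    "pf": "name^10",
    # Extra boost when adjacent word pairs appear together in the main text.
    "pf2": "text^2",
    # Phrase slop: how far apart the phrase terms may be for pf to apply.
    "ps": "2",
}

query_string = urlencode(params)
print(query_string)
```

None of these parameters filter anything out. qf decides which documents match, while pf, pf2 and the field weights only change the ordering. From Haystack this would probably have to be sent as a raw query or configured as handler defaults on the Solr side.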

SOLR index time boost depending on the field value

Is it possible to boost a document on the indexing stage depending on the field value?
I'm indexing a text field pulled from the database. I would like to boost results that are shorter over the longer ones. So the value of boost should depend on the length of the text field.
This is needed to alter the standard SOLR behavior that in my case tends to return documents with multiple matches first.
Considering I have a field that stores the length of the document, the equivalent in the query of what I need at indexing would be:
q={!boost b=sqrt(length)}text:abcd
Example:
I have two items in the DB:
ABCDEBCE
ABCD
I always want to get ABCD first for the 'BC' query even though the other item contains the search query twice.
The other solution to the problem would be ability to 'switch off' the feature that scores multiple matches higher at query time. Don't know if that is possible either...
Doing this at index time is important because the hardware I run Solr on is not very powerful, and boosting at query time fails with an OutOfMemory exception. (Even if I could work around that by increasing the memory available to Java, I prefer to be on the safe side and build the index in the most efficient way possible.)
Yes and no - but how you do it depends on how you're indexing your documents.
As far as I know there's no way of resolving this only on the solr server side at the moment.
If you're using the regular XML based interface to submit documents, let the code that generates the submitted XML add boost=".." values to the field or to the document depending on the length of the text field.
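A minimal sketch of that idea, generating the update XML on the client with a document-level boost derived from the text length. The formula 1/sqrt(length) and the field names id and text are assumptions made for the example. As far as I know, index-time boosts were removed in Solr 7, so this only applies to older versions.

```python
import math
import xml.etree.ElementTree as ET

def length_boost(text: str) -> float:
    """Hypothetical boost formula: shorter text gets a larger boost."""
    return round(1.0 / math.sqrt(max(len(text), 1)), 4)

def build_add_xml(texts: list[str]) -> str:
    """Build a Solr <add> message with a per-document boost attribute."""
    add = ET.Element("add")
    for doc_id, text in enumerate(texts, start=1):
        doc = ET.SubElement(add, "doc", boost=str(length_boost(text)))
        ET.SubElement(doc, "field", name="id").text = str(doc_id)
        ET.SubElement(doc, "field", name="text").text = text
    return ET.tostring(add, encoding="unicode")

xml_payload = build_add_xml(["ABCDEBCE", "ABCD"])
print(xml_payload)
```

With the two items from the question, ABCD gets a boost of 0.5 and ABCDEBCE a lower one, so the shorter item starts with an advantage before term frequency is taken into account.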
You can check the DIH Special Commands, which include a $docBoost command:
$docBoost : Boost the current doc. The value can be a number or the toString of a number
However, there seems to be no $fieldBoost command.
For your case, though, if you are using DefaultSimilarity, shorter fields are already boosted higher than longer fields in the score calculation.
You can also implement your own Similarity class with a changed TF (term frequency) and lengthNorm calculation to suit your needs.

Resources