Boost evenly across field of varying length - solr

I've got a text field that can potentially have multiple values.
doc 1:
field a:"X Y"
doc 2:
field a:"X"
I want to be able to query:
a:X^5
And have both doc 1 and 2 get an identical score.
I've been messing around with all the field options, but I always end up with doc 2 getting double the score of doc 1.
I've tried setting multiValued="true", but get the same result.
Is there some way to set up my search or the field definition so that the boost depends only on the presence of the search term and is not affected by the rest of the field's contents?

Disable norms by setting omitNorms=true in your schema and reindex - it should disable the length normalization for the field and give you the desired results.
For more details of what omitNorms does, see this.
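As a rough illustration of why doc 2 wins and why omitNorms removes the difference, the sketch below models only the length-normalization factor of the classic Lucene TF-IDF score (DefaultSimilarity's 1/sqrt(numTerms)). It deliberately ignores idf, queryNorm and the lossy one-byte encoding Lucene applies to stored norms, so the numbers are relative values, not real Solr scores.

```python
import math

def field_norm(num_terms: int, omit_norms: bool = False) -> float:
    """Length-normalization factor for one field (classic Lucene TF-IDF model)."""
    if omit_norms:
        # With omitNorms="true" no norm is stored, so every field behaves as if its norm were 1.0.
        return 1.0
    return 1.0 / math.sqrt(num_terms)

def relative_score(num_terms: int, omit_norms: bool = False, query_boost: float = 5.0) -> float:
    """Contribution of a single matching term (tf = 1), up to factors shared by all documents."""
    return query_boost * field_norm(num_terms, omit_norms)

# doc 1: a:"X Y" (2 terms), doc 2: a:"X" (1 term), query a:X^5
for omit in (False, True):
    doc1 = relative_score(2, omit)
    doc2 = relative_score(1, omit)
    print(f"omitNorms={omit}: doc1={doc1:.3f} doc2={doc2:.3f}")
```

With norms enabled doc 2 outscores doc 1; with omitNorms both documents get the same value, which is the behavior the question asks for.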

Field a in doc 2 has only one term, whereas in doc 1 it has two.
Solr's DefaultSimilarity implementation takes the length norm (based on the number of terms in the field) into account when calculating the score.
lengthNorm is 1.0 / Math.sqrt(numTerms)
lengthNorm makes shorter fields score higher.
You can provide your own implementation of the Similarity class that doesn't take lengthNorm into account.
Check the computeNorm method implementation.
You can turn off norms using omitNorms=true.
Norms allow for index-time boosts and field length normalization: they let you add boosts to fields at index time and they make shorter fields score higher.
So you would lose both of the above if you omit norms.
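The trade-off between omitNorms and a custom Similarity can be sketched numerically. The function below is a hypothetical model, not Lucene code: in the classic model the stored norm is the index-time boost times lengthNorm, omitNorms drops the whole norm, and a custom norm computation (the approach suggested above) could keep the boost while dropping only the length factor.

```python
import math

def stored_norm(index_boost: float, num_terms: int, strategy: str) -> float:
    """Per-field norm under three strategies (classic TF-IDF model, illustrative only).

    "default"    : index-time boost * lengthNorm, as DefaultSimilarity stores it
    "omit_norms" : omitNorms="true", nothing is stored, behaves like 1.0
    "no_length"  : hypothetical custom Similarity that keeps the boost but drops lengthNorm
    """
    length_norm = 1.0 / math.sqrt(num_terms)
    if strategy == "default":
        return index_boost * length_norm
    if strategy == "omit_norms":
        return 1.0
    if strategy == "no_length":
        return index_boost
    raise ValueError(f"unknown strategy: {strategy}")

for s in ("default", "omit_norms", "no_length"):
    # Same index-time boost (2.0), fields with 1 and 2 terms.
    print(s, stored_norm(2.0, 1, s), stored_norm(2.0, 2, s))
```

Only the "no_length" variant keeps the index-time boost while treating fields of different lengths equally.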

Related

Document size adjusting Search.Score - virtually reducing Scoring profile score

We use a scoring profile to drive relevance and adjust scores, e.g. boosting documents whose isActive attribute is 1 by 50 using a function in the scoring profile, while searching specific fields of the index by passing &searchFields=******
However, Search.Score seems to be heavily influenced by the size of the document: the smaller the document, the higher the score, probably due to TF-IDF.
This defeats the purpose of the scoring profile. Since we are passing searchFields, we don't want the score to be affected by document size.
When searchFields is not passed (i.e. free-form search across all searchable fields), we do want scores to be adjusted by size.
Example search query:
agency temps&$count=true&$top=30&$skip=0&searchMode=All&$filter=(CompanyCode eq '13453' and VNumber eq '00023232312016') &scoringProfile=BusinessProfile1&searchFields=VCategory
I wonder if the new featuresMode preview capability would be helpful for you? With it, you can get much more information back from the search query, such as uniqueTokenMatches and termFrequency on a field-by-field basis, and use that to adjust the ordering as needed on the client side.
Also, you are correct that the default is TF-IDF-like scoring. You might also be interested in trying BM25, which, although it does not solve exactly what you are asking for, could be more effective in getting the scores you are looking for.
For now I have adopted the approach of adjusting the BM25 parameters, as advised by Liam: I set b to 0.0 in the index creation JSON so that document length is not used when calculating the score for a document:
"similarity": {
  "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
  "b": 0.0,
  "k1": 1.3
}
At the same time, I identified another field in the index that correlates with the size of the record (the larger the record, the higher its value), and I use that field in the scoring profile for the cases where document size should be considered in scoring.
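Why b = 0.0 removes the length effect can be seen in the textbook BM25 term weight, sketched below. This is the standard formula, not the service's exact implementation (real implementations may differ in constant factors), and the idf and length values are made up for illustration.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.3, b=0.75):
    """Textbook BM25 weight of one query term in one document field."""
    # b controls how strongly the field length (relative to the average) penalizes the score;
    # with b = 0.0 the length_factor is always 1.0, so doc_len has no influence.
    length_factor = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_factor)

idf = 2.0  # made-up value
for b in (0.75, 0.0):
    short = bm25_term_score(tf=1, doc_len=5, avg_doc_len=50, idf=idf, b=b)
    long_ = bm25_term_score(tf=1, doc_len=500, avg_doc_len=50, idf=idf, b=b)
    print(f"b={b}: short={short:.3f} long={long_:.3f}")
```

With the default-style b = 0.75 the short document scores higher; with b = 0.0 both documents get the same term weight.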

How can I change the score of a document based on the *Price* of a field (say, "popularity") in solr

How do I boost records depending on the value of a field in Solr?
Reference link: https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents
But I don't clearly understand how to apply it in my case.
I get some records back from a search.
How can I move Ids 5, 8, 17 and 1 up a few positions (not to the top of the list, just boosted somewhat) because their price is higher?
Here is my raw query:
select?mm=100%25&version=2.2&q=(book)&defType=edismax&spellcheck.q=(book)&qf=Price^10+Name^1+nGramContent&spellcheck=true&stats=true&facet.mincount=1&facet=true&spellcheck.collate=true&stats.field=Price&rows=50&indent=on&wt=json&fl=Id,score,Price
Please help me.
Thanks!
The qf parameters are for hits in the field and will not affect the ranking unless the query produces a hit in the field. Your example would require you to search for the price (and not book) for anything to be boosted by the qf=Price^10 argument.
The FAQ you've linked to does answer your question, just in a different entry than the one you referenced: "How can I change the score of a document based on the value of a field". From its example (replace popularity with price in your case):
# simple boosts by popularity
defType=dismax&qf=text&q=supervillians&bf=popularity
q={!boost b=popularity}text:supervillians
# boosts based on complex functions of the popularity field
defType=dismax&qf=text&q=supervillians&bf=sqrt(popularity)
q={!boost b=sqrt(popularity)}text:supervillians
edismax makes the {!boost} (multiplicative boost) available as the boost= parameter as well, so you can reference it directly instead of having it in your query.
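Applied to the query from the question, the multiplicative boost could look like the request built below. This is a sketch under assumptions: the field names (Price, Name, nGramContent) come from the question, Price^10 is dropped from qf because, as explained above, it only matters when the query text matches the Price field, and sqrt(Price) is simply the FAQ's example function.

```python
from urllib.parse import urlencode, parse_qs

def build_price_boost_params(user_query: str, boost_fn: str = "sqrt(Price)", rows: int = 50) -> str:
    """Build an edismax request that multiplies the relevance score by a function of Price."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # Price^10 removed: qf only affects ranking when the query text matches that field.
        "qf": "Name^1 nGramContent",
        "mm": "100%",
        # Multiplicative boost, the edismax equivalent of q={!boost b=...}.
        # Note: a document whose Price is 0 or missing gets a multiplier of 0.
        "boost": boost_fn,
        "fl": "Id,score,Price",
        "rows": rows,
        "wt": "json",
    }
    # urlencode percent-encodes characters such as ^, %, ( and ) for the query string.
    return "select?" + urlencode(params)

url = build_price_boost_params("book")
print(url)
print(parse_qs(url.split("?", 1)[1])["boost"])  # decoded back for inspection
```

Passing a different function string as boost_fn changes how strongly price influences the order without touching the text-matching part of the query.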

SOLR Down boosting on field Value

I have this query in Solr. The problem is that I am getting search results that contain a category of items named "PRD DELETED".
I want all items that have "PRD DELETED" to be displayed at the end.
For example, if there are 100 records and each page contains 25, the "PRD DELETED" records should appear on the last page.
Please note that "PRD DELETED" is a field value and not a category. I think down-boosting is needed here, but I am unable to find the exact solution.
Any suggestion here would be a big help.
The solution is usually to do the opposite: boost all documents that aren't deleted, instead of trying to negatively boost those that are. Boosts are either multiplicative or additive, and while multiplicative boosts can reduce the score, additive ones can't. bq and bf are additive, while boost is multiplicative.
The Relevancy FAQ has an example for this case:
When using (e)dismax, people sometimes expect that specifying a pure negative query with a large boost in the "bq" param will work (since Solr automatically makes top level purely negative positive queries by adding an implicit "*:*" -- but this doesn't work with "bq", because of how queries specified via "bq" are added directly to the main query. You need to be explicit...
?defType=dismax
&q=foo bar
&bq=(*:* -xxx)^999
Implementing it as a multiplicative boost would probably involve using if and then returning either 1 or a lower value depending on whether the field has the given value.
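Both variants can be sketched as request parameters. The field name status is hypothetical (the question doesn't say which field holds "PRD DELETED"), the multiplicative version assumes a non-tokenized string field so that termfreq() sees the whole value, and the penalty value is arbitrary.

```python
from urllib.parse import urlencode

# Hypothetical field name: the question does not say which field holds the value.
STATUS_FIELD = "status"
DELETED_VALUE = "PRD DELETED"

def additive_params(user_query: str) -> dict:
    """(e)dismax params that add a large boost to every document that is NOT deleted (FAQ approach)."""
    return {
        "defType": "edismax",
        "q": user_query,
        "bq": f'(*:* -{STATUS_FIELD}:"{DELETED_VALUE}")^999',
    }

def multiplicative_params(user_query: str, penalty: float = 0.01) -> dict:
    """edismax params that multiply the score of deleted documents by a small factor."""
    # if() treats a termfreq() of 0 as false: non-deleted documents keep a multiplier of 1.
    boost = f"if(termfreq({STATUS_FIELD},'{DELETED_VALUE}'),{penalty},1)"
    return {"defType": "edismax", "q": user_query, "boost": boost}

print("select?" + urlencode(additive_params("foo bar")))
print("select?" + urlencode(multiplicative_params("foo bar")))
```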

Adding Boost to Score According to Payload of Multivalued Field at Solr

Here is my case:
I have a field in my schema named elmo_field, and I want it to hold payloaded values, e.g.:
dorothy|0.46 sesame|0.37 big bird|0.19 bird|0.22
When a user searches for a keyword, e.g. dorothy, I want to add 0.46 to the usual score. If the user searches for big bird, 0.19 should be added, and if the user searches for bird, 0.22 should be added (either the payload itself or the payload times a normalization coefficient).
In other words, I will search the other fields of my Solr schema and, at the same time, run an exact-match search on elmo_field; if it matches something, I will increase the score by the payload.
Any ideas?
I've implemented a custom similarity wrapper. For ordinary fields I use DefaultSimilarity; for payloaded fields, a similarity I implemented myself is used. That similarity class just ignores the payload value. I've also implemented a query parser that is a customized version of edismax. With this approach I could add the payload value to the document score.
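The scoring rule the question describes can be stated outside Solr to make the target behavior concrete. The sketch below models only the intended arithmetic (exact match on a payloaded entry adds coefficient times payload to the base score); it is not Solr's payload API, and the parsing assumes entries may contain spaces, as "big bird|0.19" does.

```python
import re

def parse_payloads(raw: str) -> dict[str, float]:
    """Parse 'term|payload' entries; a term may contain spaces (e.g. 'big bird|0.19')."""
    pattern = re.compile(r"\s*([^|]+?)\|(\d+(?:\.\d+)?)")
    return {term: float(value) for term, value in pattern.findall(raw)}

def boosted_score(base_score: float, user_query: str, payloads: dict[str, float], coeff: float = 1.0) -> float:
    """Add coeff * payload to the base score when the whole query exactly matches an entry."""
    return base_score + coeff * payloads.get(user_query.strip(), 0.0)

elmo_field = "dorothy|0.46 sesame|0.37 big bird|0.19 bird|0.22"
payloads = parse_payloads(elmo_field)
print(payloads)
print(boosted_score(1.0, "big bird", payloads))
```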
Have you looked at CustomScoreQuery?
There's an example with some explanation how to do this at http://dev.fernandobrito.com/2012/10/building-your-own-lucene-scorer/
You could do a boost on a query as this question suggests: How to assign a weight to a term query in Lucene/Solr
Or you could try using payloads as described here:
http://searchhub.org/2009/08/05/getting-started-with-payloads/

SOLR index time boost depending on the field value

Is it possible to boost a document on the indexing stage depending on the field value?
I'm indexing a text field pulled from the database. I would like to boost shorter results over longer ones, so the boost value should depend on the length of the text field.
This is needed to alter the standard Solr behavior, which in my case tends to return documents with multiple matches first.
Assuming I have a field that stores the length of the document, the query-time equivalent of what I need at indexing would be:
q={!boost b=sqrt(length)}text:abcd
Example:
I have two items in the DB:
ABCDEBCE
ABCD
I always want to get ABCD first for the 'BC' query even though the other item contains the search query twice.
Another solution would be the ability to 'switch off', at query time, the feature that scores multiple matches higher. I don't know if that is possible either...
Doing this at index time is important because the hardware I run Solr on is not very powerful, and boosting at query time fails with an OutOfMemory exception. (Even if I could work around that by increasing Java's memory, I prefer to be on the safe side and build the index as efficiently as possible.)
Yes and no - but how you do it depends on how you're indexing your documents.
As far as I know there's no way of resolving this only on the solr server side at the moment.
If you're using the regular XML based interface to submit documents, let the code that generates the submitted XML add boost=".." values to the field or to the document depending on the length of the text field.
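For the XML route, the client code could compute the boost while building the update message, as in the sketch below. Assumptions: the field names id and text are hypothetical, 1/sqrt(character length) is an arbitrary choice that favors shorter texts, index-time boosts only exist in older Solr versions (they were removed in Solr 7), and the boost is stored in the field norms, so it has no effect on fields with omitNorms="true".

```python
import math
import xml.etree.ElementTree as ET

def length_boost(text: str) -> float:
    """Hypothetical boost: shorter texts get a larger boost (1 / sqrt(character length))."""
    return 1.0 / math.sqrt(max(len(text), 1))

def build_add_xml(docs: list[tuple[str, str]]) -> str:
    """Build a Solr XML <add> message with a document-level boost derived from the text length."""
    add = ET.Element("add")
    for doc_id, text in docs:
        doc = ET.SubElement(add, "doc", boost=f"{length_boost(text):.4f}")
        ET.SubElement(doc, "field", name="id").text = doc_id
        ET.SubElement(doc, "field", name="text").text = text
    return ET.tostring(add, encoding="unicode")

print(build_add_xml([("1", "ABCDEBCE"), ("2", "ABCD")]))
```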
You can check the DIH Special Commands, which include a $docBoost command:
$docBoost : Boost the current doc. The value can be a number or the toString of a number
However, there seems to be no $fieldBoost command.
For your case, though, if you are using DefaultSimilarity, shorter fields already score higher than longer fields in the score calculation.
You can also implement your own Similarity class with the TF (term frequency) and lengthNorm calculations changed to suit your needs.
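The effect of changing TF in such a Similarity can be illustrated with the classic factors tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms). The binary TF below is a hypothetical replacement that counts a match once however often it repeats, and the token counts are made up; idf and query-level factors are ignored.

```python
import math

def classic_tf(freq: int) -> float:
    """Classic Lucene TF-IDF term frequency factor: sqrt(freq)."""
    return math.sqrt(freq)

def binary_tf(freq: int) -> float:
    """Hypothetical replacement: a match counts once, however often the term repeats."""
    return 1.0 if freq > 0 else 0.0

def relative_score(freq: int, num_terms: int, tf=classic_tf) -> float:
    """tf * lengthNorm for one term in one field, ignoring idf and query-level factors."""
    return tf(freq) * (1.0 / math.sqrt(num_terms))

# Made-up token counts: the longer item matches twice in 3 tokens, the shorter once in 2 tokens.
for tf in (classic_tf, binary_tf):
    longer = relative_score(freq=2, num_terms=3, tf=tf)
    shorter = relative_score(freq=1, num_terms=2, tf=tf)
    print(f"{tf.__name__}: longer={longer:.3f} shorter={shorter:.3f}")
```

With the classic TF the repeated match wins; with the binary TF the shorter field wins, which is the ordering the question asks for.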
