Tweaking Solr scoring function - solr

In Solr 6.0.0, I want to tweak the scoring based on which field matches (in indexing and searching xml files), do I have to override my own similarity class ?

Although you can use a custom Similarity, that's more useful for changing things like how term popularity effects score. Simply ranking different fields differently is typically what you use score boosts for.
q=foo:something^100 OR bar:something^10
I recommend reading https://wiki.apache.org/solr/SolrRelevancyFAQ and maybe take a look at the dismax query parser, which has a convenient notation for that kind of thing. See https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser (and the "qf" parameter)

Related

SOLR edismax with BF function on non existing fields

I would like to apply negative boost on the documents which does not have specific fields. But its not working and results the same boost value for the document with and without that field.Any pointers would be of great help.
bf=if(not(exists('image-small_string')),0,-500)
The answer is to boost those documents that do not match your query, instead of trying to apply a negative boost to those that do.
To boost documents that has a specific field, you can use bq=foo:[* TO *]^5 (and adjust the boost factory to match the behaviour you're looking for).

In Solr, how can we use terms external to the search query to bias result ordering?

We're working on a plan to identify content tags our users are interested in. So, for instance, we may determine that User X consumes content tagged with "kermit" and "piggy" more often than other tags. These are their "favored tags."
When the users search, we'd like to favor/bias documents that contain these terms.
This means we can't boost the documents at index time, because every user will have different favored tags. Additionally, they may not be searching for the favored tags themselves. They may search for "gonzo," and so we absolutely want to give them documents with "gonzo," but we want to boost documents that also contain "kermit" or "piggy."
These favored tags are not used to actually query the index, but rather are used to bias the result ordering. The favored tags become something of a tie-breaker -- all else being equal, documents containing these terms will rank higher.
This is new/planned development, so we can use whatever version and parser stack is optimal to solve this problem.
Solution in SolrNet
The question was correctly answered below, but here's the code for SolrNet just in case someone else is using it.
var localParams = new LocalParams();
localParams.Add("bq", "kermit^10000); //numeric value is the degree of boost
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<MySolrDocumentClass>>();
solr.Query(new SolrQuery("whatever") + localParams);
You didn't specify which query parser you're using, but if you are using the Dismax or Extended Dismax query parser, the bq argument should do exactly what you're looking for. bq adds search criteria to a search solely for the purpose of affecting the relevancy, but not to limit the result set.
From the Dismax documentation:
The bq (Boost Query) Parameter
The bq parameter specifies an additional, optional, query clause that
will be added to the user's main query to influence the score. For
example, if you wanted to add a relevancy boost for recent documents:
q=cheese
bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
You can specify multiple bq parameters. If you want your query to be
parsed as separate clauses with separate boosts, use multiple bq
parameters.
In this case, you may want to add &bq=kermit&bq=piggy to the end of your Solr query. If you aren't using one of these query parsers, this need may be exactly the motivation you need to switch.

Terms Prevalence in SolR searches

Is there a way to specify a set of terms that are more important when performing a search?
For example, in the following question:
"This morning my printer ran out of paper"
Terms such as "printer" or "paper" are far more important than the rest, and I don't know if there is a way to list these terms to indicate that, in the global knowledge, they'd have more weight than the rest of words.
For specific documents you can use QueryElevationComponent, which uses special XML file in which you place your specific terms for which you want specific doc ids.
Not exactly what you need, I know.
And regarding your comment about users not caring what's underneath, you control the final query. Or, in the worst case, you can modify it after you receive it at Solr server side.
Similar: Lucene term boosting with sunspot-rails
When you build the query you can define what are the values and how much these fields have weight on the search.
This can be done in many ways:
Setting the boost
The boost can be set by using "^ "
Using plus operator
If you define + operator in your query, if there is a exact result for that filed value it is shown in the result.
For a better understanding of solr, it is best to get familiar with lucene query syntax. Refer to this link to get more info.

Is it possible to have SOLR MoreLikeThis use different fields for model and matches?

Let's say I have documents with two fields, A and B.
I'd like to use SOLR's MoreLikeThis, but with a twist: I'm most interested in boosting documents whose A field is like my model document's B field. (That is, extract MLT's 'interesting terms' from the model B field, but only collect MLT results based on the A field.)
I don't see a way to use the mlt.fl fields or mlt.qf boosts to achieve this effect in a single query. (It seems mlt.fl specifies fields used for both discovery of 'interesting terms' and matching to those terms.) Am I missing some option?
Or will I have to extract the 'interesting terms' myself and swap the 'field:term' details?
(Other ideas in this same vein appreciated as well.)
Two options I see are:
Use a copyField - index your original document with a copy of field A named B, and then query using B.
Extend MoreLikeThisHandler and change the fields you query.
The first option costs a bit of programming (mostly configuration changes) and some memory consumption. The second involves more programming but no memory footprint increase. Hope one of them suits your needs.
I now think there are two ways to achieve the desired effect (without customizing the MLT source code).
First option: Do an initial MLT query with the MLT handler, adding the parameter &mlt.interestingTerms=details. This includes the list of terms that were deemed interesting, ranked with their relative boosts. The usual behavior uses those discovered terms against the same mlt.fl fields to find similar documents. For example, the response will include something like:
"interestingTerms":
["field_b:foo",5.0,"field_b:bar",2.9085307,"field_b:baz",1.67070794]
(Since the only thing about this initial query that's interesting is the interestingTerms, throwing in an fq that rules out all docs could help it skip unnecessary scoring work.)
Explicitly re-composing that interestingTerms info into a new OR query field_a:foo^5.0 field_a:bar^2.9085307 field_a:baz^1.67070794 amounts to using the B field example text to find documents that are similar in field A, and may be mimicking exactly the kind of query default MLT does on its usual model field.
Second option: Grab the model document's actual field B text, and feed it directly as a ContentStream body, to be used in lieu of a query, for specifying the model document. Then target mlt.fl at field A for the sake of collecting similar results. For example, a fragment of the parameters might be …&stream.body=foo bar baz&mlt.fl=field_a&…. Again, the net effect being that model text originally from field_b is finding documents similar only in field_a.

Use different Solr Similarity algo for every search

Is possible in Solr 1.4 to specify which similarity class to use for every search within a single index?
Let's say, I got 2 type of search (keyword and brand). For keyword search, I want to use the DefaultSimilarity class. But, for brand search, I want to use my CustomSimilarity class.
I've been modifying the schema.xml to specify a single similarity class to use. But, I came to this requirement that I have to use 2 different similarity classes.
I'll be glad to here your thoughts on this.
Thanks in advance.
AFAIK the Similarity can only be defined at the schema/index level and can't be overriden per fieldType or per query. (see this and this).
However you can customize your result ordering using other methods: boosting, function queries, a custom analyzer per field, or even sorting.
The Solr Relevancy Cookbook wiki is a good reference.

Resources