In my Solr schema, I have a numeric field that stores a color value (out of, say, 65535). How can I make it so that when I search for a particular color, the relevance of each result is boosted according to how close (in absolute difference) its stored value is to the searched one?
You can use function queries to compute how close the stored value is to the searched one and boost the score accordingly.
For example, abs(sub(x,c)) gives the absolute distance between the stored field x and the searched color c: 0 on an exact match, growing as the values drift apart. You can factor in other function queries the same way to shape the boost.
Together with that, the recip function turns the color distance into a boost factor that is 1 on an exact match and falls off with distance: http://wiki.apache.org/solr/FunctionQuery#recip
Example, searching for the color 32768:
recip(abs(sub(x,32768)),1,1000,1000)
And boost the results: q={!boost b=recip(abs(sub(x,32768)),1,1000,1000)}text:supervillians
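To see how the numbers work out (32768 and the 1000/1000 constants are just illustrative choices): a document with x=32768 gets 1000/(0+1000) = 1.0, while one at x=42768, a distance of 10000 away, gets 1000/(10000+1000) ≈ 0.09. Larger constants, e.g. recip(abs(sub(x,32768)),1,10000,10000), flatten the curve so nearby colors are penalized less.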
We're having some relevance issues with Solr results. In this particular example, we have product A showing up above product B. Product A's title contains the search term. Product B's title also contains the search term, along with its description and category name. So logically, product B should be more relevant and appear above product A, but it does not.
The schema is configured to take all of these extra fields into account. After analyzing the debug info of the query with ...&debugQuery=true&debug.explain.structured=true, it appears that both products achieve the same score. Looking further, I can see scores being calculated for these extra fields, but for some reason the parser only takes the maximum of these scores instead of the sum, which causes the totals to be the same.
Is there a reason that Solr behaves this way? Is there any way to change this behavior to use the sum instead of the max (just like in the parent element in the images)?
You can control how the score is calculated using the tie parameter, provided that you are using the DisMax/eDisMax query parser.
The Solr documentation explains it very well:
tie (Tie Breaker) parameter:
The tie parameter specifies a float value (which should be something much less than 1) to use as a tiebreaker in DisMax queries.
When a term from the user’s input is tested against multiple fields, more than one field may match. If so, each field will generate a different score based on how common that word is in that field (for each document relative to all other documents). The tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.
A value of "0.0" - the default - makes the query a pure "disjunction max query": that is, only the maximum scoring subquery contributes to the final score. A value of "1.0" makes the query a pure "disjunction sum query" where it doesn’t matter what the maximum scoring subquery is, because the final score will be the sum of the subquery scores.
Typically a low value, such as 0.1, is useful.
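A sketch of how this applies to the product example above (the field names are illustrative):
q=widget&defType=edismax&qf=title+description+categoryName&tie=1.0
With the default tie=0.0, only the best-matching field contributes; with tie=1.0, the final score is max + 1.0 * (sum of the other field scores), i.e. effectively the sum, which is the behavior the question asks for. Values in between blend the two.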
I can see from the documentation that I can use referencePointParameter and tagsParameter to pass parameters into the distance and tags scoring functions respectively.
I'd like to do the same with the magnitude scoring function, but I can't see from the documentation how to do this (or whether it's even possible).
For example, if a product costs £100, I'd like to get similar products with a similar price. I think I could do this with two magnitude functions (e.g. one boosting from £80 up to £100 and another from £120 down to £100, so that products closest to the original product's £100 price get the strongest boost).
Is this possible?
No, it is not possible to do magnitude boosting based on relative values of a field across documents. This feature is intended for situations where you statically know the ranges that you want to boost (for example, when boosting based on a rating field with a fixed scale).
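For contrast, a minimal sketch of the static form the magnitude function does support in a scoring profile (the field name, boost, and ranges here are illustrative, not from the question):
"scoringProfiles": [
  {
    "name": "boostByRating",
    "functions": [
      {
        "type": "magnitude",
        "fieldName": "rating",
        "boost": 2,
        "interpolation": "linear",
        "magnitude": {
          "boostingRangeStart": 3,
          "boostingRangeEnd": 5,
          "constantBoostBeyondRange": false
        }
      }
    ]
  }
]
Unlike the distance and tag functions, the magnitude object has no per-query parameter hook, so the range is fixed at index-definition time.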
For a specific facet field of our Solr documents, it would make much more sense to be able to sort facets by their relative "interesting-ness", i.e. their tf-idf score, rather than by popularity. This would make it easy to automatically get rid of unwanted common English words, as both their TF and DF would be high.
When a query is made, the TF should be calculated using all the documents that participate in the results list.
I assume that the only problem with this approach would be when no query is made, i.e. when one searches for *:*. Then no term would prevail over the others in terms of interestingness. Please correct me if I am wrong here.
Anyway, is this possible? What other relative measurements of "interesting-ness" would you suggest?
facet.sort
This param determines the ordering of the facet field constraints.
count - sort the constraints by count (highest count first)
index - return the constraints sorted in their index order (lexicographic by indexed term); for terms in the ASCII range, this will be alphabetically sorted
The default is count if facet.limit is greater than 0, index otherwise.
Prior to Solr 1.4, one needed to use true instead of count and false instead of index.
This parameter can be specified on a per-field basis.
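For example (category is an illustrative field name):
facet=true&facet.field=category&facet.sort=index
or, per field:
facet=true&facet.field=category&f.category.facet.sort=index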
So it looks like you can't do it out of the box without some serious changes on the client side or in Solr.
This is a very interesting idea and I have been searching around for some time to find a solution. Anything new in this area?
I assume that for facets with a limited number of possible values, an interestingness score can be computed on the client side: for a given result set based on a filter, we can exclude that filter for the facet using the local-params syntax (!tag and !ex, see Local Params). On the client side, we can then compute the frequency relative to the complete index (or to another filtered subset). This would probably not work for result sets built by a query parameter, though.
However, for an indexed text field with many potential values, such as a fulltext field, one would have to retrieve df counts for all terms. I imagine this could be done efficiently using the Terms component, and the counts should probably be cached on the client side / in memory for efficiency. This is a cumbersome method, however, and it doesn't give the flexibility to exclude only certain filters.
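A sketch of such a df lookup with the Terms component (assuming a fulltext field named text and the default /terms handler):
terms=true&terms.fl=text&terms.limit=100&terms.sort=count
Each returned term comes with its index-wide document frequency, which could then be cached and used as the idf side of the ratio.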
For these cases, it would probably be better to implement this within Solr as a new option for facet.sort, because the information needed is readily available at the time facet counts are computed.
There was a discussion about this way back in 2009.
Currently, with the greater flexibility of the JSON Facet API, e.g. sorting on stats facets such as avg(price) of another field, I guess this could be implemented as an additional sort option. At least for facets of type terms, the result count (the df for the current result set) would only need to be divided by the df of that term across the whole index (docfreq). If the current result set is the complete index, facets should simply be sorted by count.
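A sketch with the JSON Facet API of sorting buckets by a computed aggregation rather than by raw count (field names are illustrative):
json.facet={
  categories: {
    type: terms,
    field: cat,
    sort: "avgPrice desc",
    facet: { avgPrice: "avg(price)" }
  }
}
An interestingness sort would work analogously, if an aggregation exposing the per-term index-wide docfreq were available to divide by.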
I will probably implement a workaround in the client for fields with a fixed and rather small vocabulary, e.g. based on a second, cached query against the complete index. However, for term fields and the like this might not scale.
Is it possible to do a custom multiplication of the score returned by Solr? We have a factor in the range of 1.00-1.30, based on our own formula, and I wish to simply multiply the "final" Solr score by it, without having it normalized.
I've tried using various boosts in DisMax, but none of them produce the desired result, because 1) the custom value is added (not multiplied) to the score, and 2) it is normalized (queryNorm) before the addition.
I found a way to do this: the Extended DisMax query parser, introduced in Solr 3.1. It offers all the same features as the normal DisMax, plus a few useful enhancements.
The one I needed was the boost parameter. It acts the same way as the bf parameter from DisMax, but instead of adding a normalized value to the score, it multiplies the boost into the score (without any normalization).
For more info, see the Solr wiki page on ExtendedDisMax.
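A minimal sketch, assuming a hypothetical numeric field ourFactor that holds the 1.00-1.30 value:
q=phone&defType=edismax&qf=title&boost=field(ourFactor)
The boost parameter accepts any function query, so the factor could also be computed on the fly instead of being stored in a field.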
I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number?
Something like: the document with the highest count gets a boost of 10, the one with the lowest gets 0.5, and the values in between are interpolated automatically.
What I do now is this, but it doesn't give the desired results:
recip(rord(vote_count),1,1000,1000)^10.0
Thanks.
I tend to build my indexes using raw Lucene, in which case it is extremely easy:
doc.setBoost(boost_val); // index-time document boost, set while building the index
I'm just starting on this, and it looks like either a linear boost or a log-based boost will help most, e.g. log(vote_count)^10 (don't forget that ^10 means the boost is multiplied by 10, not raised to the tenth power).
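A sketch of wiring that up with the (e)DisMax parsers, using the vote_count field from the question:
bf=log(sum(vote_count,1))^10     (DisMax: additive boost function)
boost=sum(1,log(sum(vote_count,1)))     (eDisMax: multiplied into the score)
Wrapping the field in sum(vote_count,1) guards against log(0) for documents with no votes, and adding 1 in the multiplicative case keeps zero-vote documents from having their score zeroed out.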