I would like to do something like Solr relevancy and sort by the score.
But Solr relevancy uses its own predefined scoring (tf-idf).
For example:
I apply boosts on the field "NAME", and the records below are returned (with the score from Solr and my expected score).
AA boost score 3
BB boost score 2
AB boost score 1
ID, Name, Score, Expected_Score
1, AA, 8, 3
2, AA, 6, 3
3, AA, 6, 3
4, AABB, 7, 6
5, BB, 9, 2
You can see that Solr relevancy gives different scores to records with the same NAME "AA", because the score is based on the default query score instead of only the field NAME. More importantly, "BB" is given a higher score than "AA" even though its boost score is lower.
I would like the score to be based only on the customized field and boost data, without being affected by other fields. This is similar to MySQL weightage (ordering results based on the combined weightage of fields), but in Solr.
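To make the expected ordering concrete, here is a minimal Python sketch of the scoring described above. It is not Solr code. It assumes that a boost applies whenever its term occurs as a substring of NAME, which is inferred from the Expected_Score column ("AABB" contains "AA", "AB" and "BB", hence 3 + 1 + 2 = 6). The names BOOSTS and expected_score are made up for illustration.

```python
# Boost table from the question (term -> boost score).
BOOSTS = {"AA": 3, "BB": 2, "AB": 1}


def expected_score(name, boosts=BOOSTS):
    """Sum the boost of every term that occurs in the NAME value.

    Assumption: substring matching, inferred from the sample data.
    """
    return sum(weight for term, weight in boosts.items() if term in name)


# (ID, Name) pairs from the sample records.
rows = [(1, "AA"), (2, "AA"), (3, "AA"), (4, "AABB"), (5, "BB")]

# Sort by the boost-only score, highest first; ties keep their original order.
ranked = sorted(rows, key=lambda row: expected_score(row[1]), reverse=True)

for doc_id, name in ranked:
    print(doc_id, name, expected_score(name))
```

With this scoring every "AA" record gets the same score, and "BB" always ranks below "AA", which is the behaviour asked for.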
I am trying a range facet on a field called popularity.
facet=true&facet.range=popularity&facet.range.gap=1&facet.range.start=1&facet.range.end=5.001
I am getting the below result
"popularity":{
"counts":[
"1.0",3,
"2.0",0,
"3.0",8,
"4.0",21,
"5.0",23],
But when I add the filter query fq=popularity:[3 TO 4}, I get 11 results, and all of them have a popularity between 3 and 4. Why does the facet on popularity undercount these items (it reports 8 for "3.0")?
Thanks
I have data indexed into Solr with fields like:
name:Apples weight:5kg
name:Grapes weight:2kg
name:papaya weight:7kg
name:Apples weight:3kg
name:Grapes weight:3kg
I want all results except Apples to be shown as usual, followed at the end by the results for Apples, and those only within a weight range of 4-8 kg.
i.e. the results for Apples are shown last, and only for a particular weight range.
First you'll have to limit the documents to those matching your criteria, i.e. you want all documents except those that are apples and outside of 4-8kg (this assumes that your weight field is an integer; if it isn't, make it an integer field so that you can do proper range searches):
q=(*:* NOT name:Apples) OR (name:Apples AND weight:[4 TO 8])
Then you can apply a negative boost to Apples (which you do by boosting everything that doesn't match by a large factor):
bq=(*:* -name:Apples)^1000
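As a rough sketch of how the two parameters could be sent together, the Python snippet below URL-encodes them with the standard library. It assumes the edismax parser is selected through defType (bq is a dismax/edismax parameter) and writes the range clause as weight:[4 TO 8]. The host, core name and request handler are left out because they depend on your setup.

```python
from urllib.parse import parse_qs, urlencode

params = {
    # Assumption: edismax is used so that the bq parameter is honoured.
    "defType": "edismax",
    # Everything that is not Apples, plus Apples weighing 4 to 8.
    "q": "(*:* NOT name:Apples) OR (name:Apples AND weight:[4 TO 8])",
    # Boost every non-Apples document so that Apples sink to the end.
    "bq": "(*:* -name:Apples)^1000",
}

# urlencode escapes spaces, brackets, colons and the caret for us.
query_string = urlencode(params)
print(query_string)

# Decoding the string again gives back the original parameter values.
decoded = parse_qs(query_string)
```

The resulting string can be appended after the "?" of the select URL of your core.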
Let me try to explain my problem. Let's assume I have a multi-valued field called "enrolment" in each document that contains the names of students.
While searching Solr, let's say I search for the names of three students: Manish, Amit, Navin. Solr returns all documents containing any one of these names (which is what I want). Some documents may contain all 3 of them, others 2 or 1. I want the results sorted so that documents with the most matches are at the top, followed by those with fewer matches.
I tried adding sort: score desc for this, but it doesn't work as desired because the score is "1" for all matching documents.
How can I achieve the sort order by maximum number of matches for my multi-valued field?
Given a multivalued integer field where you want to rank the documents based on the number of matches, apply a boost query for each match. For example, if you have a series of monitors that come in different sizes, you can apply a boost for each size that is valid (I hacked this together and tested it with the example docs from the tech core, so that's my example and I'm sticking with it). I have two relevant documents, one named VA902B with sizes given as a multi valued field with values 23, 28, and 32, and one named 3007WFP with values 23, 29, 36 in the same field.
Here I'm asking for any document, but give me those that have both size 28 and size 23 at the top, and then those that have either size 28 or size 23, and then any other document:
?bq=sizes:28&bq=sizes:23&defType=edismax&q=*:*
If I want to limit the set of documents to only those that match either of the sizes, I can use that as my main query:
?defType=edismax&q=sizes:(23%2028)
.. and this is where I discover that your presumption that the score is the same regardless of the number of matches is false. Adding &debugQuery=true to the URL gives us detailed scoring information for each document:
"explain": {
"VA902B": "\n2.0 = sum of:\n 1.0 = sizes:[23 TO 23]\n 1.0 = sizes:[28 TO 28]\n",
"3007WFP": "\n1.0 = sum of:\n 1.0 = sizes:[23 TO 23]\n"
},
.. which means that there is no need for applying a boost - the behaviour you want is the standard behaviour for Solr. This was my initial thought, but that should have given you the correct answer with the queries you gave in the comments.
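The explain output above can be mimicked with a small Python model. This is a sketch of the arithmetic only, not of Solr internals: it assumes every matching clause contributes exactly 1.0, as it does for the integer field in this example (text fields scored with tf-idf or BM25 would not give such round numbers). The helper name match_score is made up.

```python
# The two example documents and their multivalued "sizes" field.
docs = {
    "VA902B": {23, 28, 32},
    "3007WFP": {23, 29, 36},
}


def match_score(sizes, wanted):
    """One point per queried value found in the field (assumed constant score)."""
    return float(len(sizes & wanted))


wanted = {23, 28}  # corresponds to q=sizes:(23 28)
scores = {name: match_score(sizes, wanted) for name, sizes in docs.items()}

# Highest number of matches first, like sort=score desc.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

VA902B matches both values and 3007WFP only one, so the totals agree with the 2.0 and 1.0 shown in the debug output.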
But I'll show you how my strategy with applying boosts would have worked as well:
?bq=sizes:28&bq=sizes:23&defType=edismax&q=sizes:(23%2028)&debugQuery=true
.. which now tells us that the score for each document has effectively doubled, since it gets scored 1.0 (from the query) + 1.0 (from the boost) for each match.
"explain": {
"VA902B": "\n4.0 = sum of:\n 2.0 = sum of:\n 1.0 = sizes:[23 TO 23]\n 1.0 = sizes:[28 TO 28]\n 1.0 = sizes:[28 TO 28]\n 1.0 = sizes:[23 TO 23]\n",
"3007WFP": "\n2.0 = sum of:\n 1.0 = sum of:\n 1.0 = sizes:[23 TO 23]\n 1.0 = sizes:[23 TO 23]\n"
},
I also tested the q=sizes:(23 28) query with the standard lucene query parser (and not dismax/edismax, which support bq), and the behaviour was the same.
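If you build the request from code rather than by hand, the repeated bq parameter needs a little care. The sketch below uses Python's standard library and passes the parameters as a list of pairs so that bq can appear twice. Using quote_via=quote is a choice made here so that spaces become %20, as in the URLs above; the default would emit "+" instead, which also works.

```python
from urllib.parse import parse_qs, quote, urlencode

# A list of pairs (not a dict) so the same key can be repeated.
params = [
    ("defType", "edismax"),
    ("q", "sizes:(23 28)"),
    ("bq", "sizes:28"),
    ("bq", "sizes:23"),
    ("debugQuery", "true"),
]

# quote_via=quote encodes the space as %20 instead of "+".
query_string = urlencode(params, quote_via=quote)
print(query_string)

# Round trip: both bq values survive, in order.
decoded = parse_qs(query_string)
```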
I have received a spec to add a relevance score to search results, based on which column the match is in. As an example, I have a product table with, amongst other fields, keywords, productNames and brands.
I currently find matching products by joining to
JOIN CONTAINSTABLE(Products, (keywords, productNames, brands), '"NIKE*"')
This finds the records that contain the search term, but I need to weight the results by column, e.g. keywords scores 1, productNames scores 2, brands scores 4, etc. I can then add the scores together to give the relevancy of the result, i.e. if "Nike" is in all three columns it would score 7, if it is just in brands 4, etc.
To facilitate this I need to know which columns CONTAINSTABLE matched on, but I haven't found any details on that.
I've looked at the ISABOUT option, but that's for weighting multiple search terms in a single column.
At the moment I have a case statement
CASE WHEN CONTAINS (Keywords, '"Nike*"') THEN 1 ELSE 0 END +
CASE WHEN CONTAINS (productNames, '"Nike*"') THEN 2 ELSE 0 END +
CASE WHEN CONTAINS (brands, '"Nike*"') THEN 4 ELSE 0 END
AS Relevance
This does work, but it seems very wasteful since CONTAINSTABLE must already be doing the same work.
If anyone has any ideas then they'll be gratefully received.
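For reference, the arithmetic the spec asks for can be written down in a few lines of Python. This only illustrates the weighted sum. It is not a replacement for the full-text query, and the prefix test is a crude stand-in for the '"Nike*"' prefix term. The names COLUMN_WEIGHTS, has_prefix and relevance are hypothetical.

```python
# Column weights from the spec: keywords 1, productNames 2, brands 4.
COLUMN_WEIGHTS = {"keywords": 1, "productNames": 2, "brands": 4}


def has_prefix(text, prefix):
    """True if any whitespace-separated word starts with the prefix (case-insensitive)."""
    prefix = prefix.lower()
    return any(word.lower().startswith(prefix) for word in text.split())


def relevance(product, prefix):
    """Add up the weight of every column that contains a matching word."""
    return sum(
        weight
        for column, weight in COLUMN_WEIGHTS.items()
        if has_prefix(product.get(column, ""), prefix)
    )


all_three = {"keywords": "nike shoes", "productNames": "Nike Air", "brands": "Nike"}
brand_only = {"keywords": "running", "productNames": "Air Max", "brands": "Nike"}
print(relevance(all_three, "nike"), relevance(brand_only, "nike"))
```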
I work with Solr and can't fix a problem with the accuracy of my results (q vs bf, taking accents into account).
I have a Solr index with 2 indexed fields (this is simplified):
town, population
Félines, 100
Ferrand, 10000
When I query: q=Fé&qf=town town_ascii&bf=population^2&defType=dismax
I'd like this order in my results: Félines > Ferrand.
When I query: q=Fe&qf=town town_ascii&bf=population^2&defType=dismax I'd like this order in my results: Ferrand > Félines.
The trouble is that Ferrand beats Félines every time because its population is bigger. How can I solve that? I couldn't find a way to use the score of the query in bf to balance it against population.
You didn't post your schema.xml but I suppose you're using the ASCIIFoldingFilterFactory for the town_ascii field. It means that if you're indexing the word Félines the following are the indexed terms:
town: Félines
town_ascii: Felines
So what you're asking for is that a match on the town field counts for more than a match on town_ascii. You should change the qf parameter to something like qf=town^3 town_ascii to give more weight to the town field. Then you can tune that weight until town has the desired influence relative to population.
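A small Python sketch may help to see the two pieces involved. The fold_ascii helper only approximates what ASCIIFoldingFilterFactory does for accented Latin letters (it is not the Solr implementation), and the parameter set shows the suggested qf with the boost on town. The value 3 is just the starting point from above, to be tuned.

```python
import unicodedata
from urllib.parse import urlencode


def fold_ascii(text):
    """Approximate ASCII folding: decompose characters and drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# What ends up in each field for the two example towns.
for town in ("Félines", "Ferrand"):
    print("town:", town, "| town_ascii:", fold_ascii(town))

# Suggested request parameters: town outweighs town_ascii in qf.
params = {
    "defType": "dismax",
    "q": "Fé",
    "qf": "town^3 town_ascii",
    "bf": "population^2",
}
query_string = urlencode(params)
print(query_string)
```

Because "Fé" can only match the accented form in town, raising the town boost pushes Félines up for that query, while "Fe" matches only through town_ascii for both documents and leaves population to decide.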