Confused about tie and qf in eDisMax - Solr

I am confused about the qf and tie parameters in eDisMax.
According to the documentation:
qf specifies which fields to search, while tie specifies how much all the other fields (every field except the highest-scoring one) affect the total score.
My confusion is this: since we have already specified which fields to search (suppose we specify only one field), how can other fields still affect the total score? (I guess I am misunderstanding how eDisMax works, but that is exactly what confuses me.)
Or does it mean that eDisMax always calculates the scores across all fields and applies tie to them in the final score, even if we specify only one field?

No, the tie parameter is not about which fields are searched. Let me explain the basics of what eDisMax does: when it searches multiple fields, it does not sum the scores across fields (as a boolean query would, for example); instead it takes the maximum.
E.g. if we have fields A and B, and the score for field A is 3.0 and for field B is 5.0, then eDisMax (with the default tie of 0.0) will use 5.0 and completely ignore the other score.
The "tie" param lets you configure how much the final score of the query is influenced by the scores of the lower-scoring fields compared to the highest-scoring field.
So, if tie = 0.1, the final score in the previous example will be 5.0 + 0.1 * 3.0 = 5.3.
More information about tie param: https://wiki.apache.org/solr/ExtendedDisMax#tie_.28Tie_breaker.29
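The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "maximum plus tie times the rest" combination, not Solr code; the function name dismax_score is made up for this example.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does:
    the best score, plus tie times the sum of the remaining scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Fields A and B from the example above.
print(dismax_score([3.0, 5.0]))           # tie = 0.0: only the best field counts
print(dismax_score([3.0, 5.0], tie=0.1))  # the lower-scoring field contributes 10%
print(dismax_score([3.0, 5.0], tie=1.0))  # tie = 1.0: plain sum of both fields
```

Note that with a single field in qf there is no "other" score to add, so in this sketch tie makes no difference in that case.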

Related

Solr Query Syntax conversion from boolean expression

I'm attempting to query solr for documents, given a basic schema with the following field names, data types irrelevant:
I'm attempting to match documents that match at least one of the following:
occupation, name, age, gender, but I want to OR them together.
How do you OR together many terms and require the document to match at least one of them?
This seems to be failing: +(name:Sarah age:24 occupation:doctor gender:male)
How do you convert a boolean expression into solr query syntax? I can't figure out the syntax with + and - and the default operator for OR.
I still don't fully understand your requirement, but you just need to query like this:
+(age:24 OR gender:male)
Or, if you want to match multiple values in the same field with an OR condition,
i.e. get documents with age:24 as well as documents with age:25:
+(age:24 OR age:25 OR gender:male)
then you can shorten it (assuming the default operator is OR) to:
+(age:(24 25) OR gender:male)
If that isn't your requirement, let me know.
If you want to make it as simple as possible for the client, just go for the dismax[1] or edismax[2] query parser.
Specifically, you can configure a request parameter called "qf":
"The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field’s importance in the query. For example, the query below:
qf=fieldOne^2.3 fieldTwo fieldThree^0.4
assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and fieldThree a boost of 0.4.
These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree." from the wiki
Then you can just pass a free text query, and it will be searched in the fields you specified, giving also different importance to each one, if necessary.
[1] https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html
[2] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
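As a rough sketch of what the client side could look like, the following Python builds the request parameters for an eDisMax search. The helper name edismax_params and the boost values are assumptions made for this example; the field names are taken from the question, and the resulting string would be appended to whatever select URL your Solr core exposes.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_query, field_boosts):
    """Build request parameters for an eDisMax search.

    field_boosts maps a field name to a boost, or to None for the default boost.
    """
    qf = " ".join(
        name if boost is None else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    )
    return {"defType": "edismax", "q": user_query, "qf": qf}


# Hypothetical boosts: name matters most, gender least.
params = edismax_params(
    "Sarah doctor",
    {"name": 2.3, "occupation": None, "age": None, "gender": 0.4},
)
query_string = urlencode(params)  # URL-encoded, ready to append after "select?"
print(query_string)
```

The client only passes free text in q; the list of fields and their relative importance lives in qf.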

Adding Boost to Score According to Payload of Multivalued Field at Solr

Here is my case:
I have a field in my schema named elmo_field. I want elmo_field to hold values with payloads, e.g.
dorothy|0.46 sesame|0.37 big bird|0.19 bird|0.22
When a user searches for a keyword, e.g. dorothy, I want to add 0.46 to the usual score. If the user searches for big bird, 0.19 should be added, and if the user searches for bird, 0.22 should be added (either the payload itself or the payload multiplied by a normalization coefficient is added).
In other words, I will search the other fields of my Solr schema as usual. At the same time I will run another search (an exact-match search) against elmo_field, and if it matches something I will increase the score by the payload.
Any ideas?
I've implemented a custom similarity wrapper. For the usual fields I use DefaultSimilarity. If a field carries payloads, another similarity that I implemented is used instead. That similarity class just ignores payload value. I've also implemented a query parser that is a customized version of edismax. With that approach I could add the payload value to the document score.
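The scoring rule asked for in the question can be written down outside Solr as a small Python sketch. It only shows the intended arithmetic (usual score plus payload of the exactly matching term); it is not the custom similarity or query parser described above, and the names parse_payloads and boosted_score are made up for this example.

```python
import re


def parse_payloads(field_value):
    """Parse 'term|weight' pairs; a term may contain spaces, e.g. 'big bird'."""
    pairs = re.findall(r"(.+?)\|([0-9]*\.?[0-9]+)\s*", field_value)
    return {term.strip(): float(weight) for term, weight in pairs}


def boosted_score(base_score, keyword, payloads, coefficient=1.0):
    """Add the payload of an exactly matching term to the usual score."""
    return base_score + coefficient * payloads.get(keyword, 0.0)


payloads = parse_payloads("dorothy|0.46 sesame|0.37 big bird|0.19 bird|0.22")
print(payloads)
print(boosted_score(1.0, "big bird", payloads))  # base score 1.0 is a placeholder
```

A keyword without an entry in elmo_field simply leaves the base score unchanged.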
Have you looked at CustomScoreQuery?
There's an example with some explanation how to do this at http://dev.fernandobrito.com/2012/10/building-your-own-lucene-scorer/
You could do a boost on a query as this question suggests: How to assign a weight to a term query in Lucene/Solr
Or you could try using payloads as described here:
http://searchhub.org/2009/08/05/getting-started-with-payloads/

Solr - how to plan field boosting

I query using
qf=Name+Tag
Now I want documents that have the phrase in Tag to come first, so I use
qf=Name+Tag^2
and they do appear first.
What is the rule of thumb for the number that comes after the field?
How do I know what number to set?
The number is purely a matter of preference and is mostly found by trial and error.
It expresses how much the field weighs in comparison to the other fields.
Scoring takes various factors into account; some of them can be considered and tested,
e.g. term frequency: if a word appears twice in Name, should that override a single occurrence in the Tag field?
Also, if you are checking for a phrase match, you should use pf if you are using the edismax parser.
qf matches individual words, whereas pf matches the whole phrase.
For example, if you have the fields name and tag and you search for ruby rails,
qf would score name:ruby tag:ruby and name:rails tag:rails
pf would score name:"ruby rails" tag:"ruby rails"
So it is better to use qf to match the results and boost single-word matches, and to give pf higher boost values.
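To make the difference between qf and pf concrete, here is a small Python sketch that spells out which clauses each parameter conceptually scores for a given search. It is an illustration only and does not reproduce the parser's actual query tree; the function names are made up for this example.

```python
def qf_clauses(query, fields):
    """One clause per (term, field) pair, as qf does for individual words."""
    return [f"{field}:{term}" for term in query.split() for field in fields]


def pf_clauses(query, fields):
    """One phrase clause per field, as pf does for the whole query."""
    return [f'{field}:"{query}"' for field in fields]


fields = ["name", "tag"]
print(qf_clauses("ruby rails", fields))
print(pf_clauses("ruby rails", fields))
```

A document containing both words far apart only collects the qf clauses, while a document containing the exact phrase additionally collects the pf clauses, which is why a higher pf boost pushes phrase matches to the top.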

If WildcardQuery doesn't affect the scoring of documents, why does it return 0.5 constantly?

I am using a WildcardQuery on documents and I see that all of the result documents have a score of 0.5. I read that queries like WildcardQuery do not affect the scoring of documents, so now I am wondering why the score is 0.5.
I am using this simple query:
WildcardQuery wq = new WildcardQuery(new Term("filed_name", "book"));
WildcardQuery certainly does affect scoring. It uses a CONSTANT_SCORE_AUTO_REWRITE, which may be what you are referring to. That means each document matching the WildcardQuery gets an equal boost added to its score by that match. None of the typical Similarity logic (tf-idf, for instance) is applied to the WildcardQuery's matches, however.

Solr Fuzzy search in multiValued field with max distance between terms

Hello Stack Overflowers,
I have a Solr document collection with a field called
names_txt - this is a multiValued="true" field.
This field contains the names of all persons associated with a document.
I want to be able to do a fuzzy search and at the same time limit the number of terms between the two matching terms.
The query
names_txt:("markus foss"~2)
will return all documents containing the terms markus and foss with at most 2 terms between them.
But when I search in a fuzzy way AND also want to specify the maximum number of terms between the matches, I can't get the syntax right.
The query:
names_txt:(markus~0.7 foss~0.7)
This does work, but it returns false positives, since it will return a document with "markus something" in one value and "foss somethingElse" in another.
What I would like to write is:
(markus~0.7 foss~0.7)~2
but this syntax is illegal in Solr.
Does anyone out there have a solution for my problem?
Since a single query term in Solr can carry either a word-distance constraint or a fuzzy-search constraint, we need two terms for this:
names_txt:("markus foss"~2) AND names_txt:(markus~0.7 foss~0.7)
Note that quantifying fuzziness with a float is deprecated. Internally, Lucene converts the float to an int between 0 and 2 anyway, so we should use this integer (Damerau-Levenshtein) edit distance in our search terms right from the start. So my final proposal is:
names_txt:("markus foss"~2) AND names_txt:(markus~1 foss~1)
(For those who are interested: the deprecated, somewhat quirky function that converts the similarity float to an edit distance int can be found at the end of this code file.)
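If the query has to be assembled from user input, the proposal above can be generated with a small helper. This is a minimal Python sketch; the function name fuzzy_proximity_query and its parameters are made up for this example, and it does not escape special query characters in the terms.

```python
def fuzzy_proximity_query(field, terms, slop=2, max_edits=1):
    """Combine a proximity phrase clause with a fuzzy clause on the same field."""
    if not 0 <= max_edits <= 2:
        raise ValueError("max_edits must be 0, 1 or 2")
    phrase = " ".join(terms)
    fuzzy = " ".join(f"{term}~{max_edits}" for term in terms)
    return f'{field}:("{phrase}"~{slop}) AND {field}:({fuzzy})'


print(fuzzy_proximity_query("names_txt", ["markus", "foss"]))
```

The edit distance is restricted to 0 through 2 because that is the integer range mentioned above.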
I think you could do that using SpanQuery. The issue is that the usual query parsers in Solr don't support span queries. Look at this article, which mentions the parsers that do support spans: Surround, Xml-Query-Parser and Qsol. But check the status of each in your current Solr version.
