Cloudant Lucene index with different relevance per field - cloudant

How can I specify during the index creation that one field should receive more relevance than another field?
Example: I have documents with a title and a description field and want the content of the title field to be more important during query time.
doc1: title:"Hello, world", description:"Just a greeting"
doc2: title:"Greetings", description:"Hello, everybody. Hello, hello"
index("default", doc.title);
index("default", doc.description);
A search for the term "hello" should return doc1 with a higher relevance than doc2 because the word "hello" is present in the title field, even though doc2 contains the word 3 times.
How can this be accomplished?

You can specify a boost at query time. For example, if you index the fields separately:
index("title", doc.title);
index("description", doc.description);
Then at query time you can specify that the title gets more weight than the description field:
q=(title:hello)^100 OR (description:hello)
where ^100 boosts the preceding clause (here, the match on title). See https://docs.cloudant.com/search.html#query-syntax
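If you send the query from a client, the boost operator and the parentheses need to be URL-encoded. Here is a minimal sketch in Python of how that might look. The account, database, design document and index names are placeholders made up for this example; only the q syntax comes from the answer above.

```python
from urllib.parse import urlencode

def cloudant_search_url(base_url, db, ddoc, index, query):
    """Build a search URL; urlencode escapes '^', ':', parentheses and spaces."""
    path = f"{base_url}/{db}/_design/{ddoc}/_search/{index}"
    return f"{path}?{urlencode({'q': query})}"

# Placeholder names, not taken from the original post.
url = cloudant_search_url(
    "https://ACCOUNT.cloudant.com", "mydb", "search", "bytext",
    "(title:hello)^100 OR (description:hello)",
)
print(url)
```

The boost value of 100 is just the one used above; what matters is the ratio between the title and description boosts.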

Related

Solr search relevancy

I use Solr and I have trouble with the result scores. For example,
I have these docs with one field (for example "content"):
content = car
content = cars
content = carable awesome
content = awful for carable
And I run a search query with these parameters:
"mm":"1",
"q":"car",
"tie":"0.1",
"defType":"dismax",
"fl":"*, score"
I expect to see results like this:
car: 5 score
cars: 4.8 score
carable awesome: 3
awful for carable: 3
The word without "s" should rank higher, but I get strange results. How can I boost an exact match (like "car")?
This happens because the field type you're using for the field has a stemming filter (or an ngramfilter) attached (which makes cars and car generate hits against each other). You can't boost "exact hits" inside such a field, since for Lucene they are the same value. What's stored in the index is the same for both car and cars - the latter is processed down to car as well.
To implement this and get exact hits higher, you add a second field without that filter present that only tokenizes (splits) your content on whitespace and lowercases the token. That way you have a field where cars and car are stored as different tokens, and tokens won't contribute to the score if they're not being matched.
You can use qf in Solr to tell Solr which fields you want to search against, and you can give a boost at the same time - so in your case you'd have qf=exact_field^10 text_field where hits in exact_field would be valued ten times higher than hits in the regular field (the exact boost values will depend on your use case and how you want the query profile to behave).
You can also use the different boost arguments (bq and boost) to apply boosts outside of your regular query (i.e. add a query to bq that replicates your original query), but the previous suggestion will probably work just fine.
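To see why the second field helps, here is a toy model in Python. It is not Lucene's scoring formula: the stemmer only strips a trailing "s", each field contributes a flat boost when it matches, and the names exact_boost and text_boost are invented for this sketch.

```python
def naive_stem(token):
    """Toy stemmer: strips one trailing 's'. Real Solr stemmers do much more."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def score(doc, query, exact_boost=10.0, text_boost=1.0):
    tokens = doc.lower().split()
    q = query.lower()
    # exact_field: whitespace tokenization and lowercasing only
    exact_hit = q in tokens
    # text_field: the same tokens after stemming
    stemmed_hit = naive_stem(q) in {naive_stem(t) for t in tokens}
    return exact_boost * exact_hit + text_boost * stemmed_hit

docs = ["cars", "car", "carable awesome"]
ranked = sorted(docs, key=lambda d: score(d, "car"), reverse=True)
print([(d, score(d, "car")) for d in ranked])
```

In this model "car" and "cars" both match the stemmed field, but only "car" also matches the exact field, so it ends up on top.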

Sorting of solr documents based on search term in solr

I would like to sort solr documents based on searched term. For example the search term is "stringABC"
Then the order of the results should be
stringABC,
stringABCxxxx,
xxxxstringABCxxxx
The Solr document will contain a lot of fields, e.g. title, description, path, article No, Product code etc.
The default field will be made up of more than one field, e.g. title, description and path.
So a Solr doc will only be returned when the search term matches one of the fields that make up the default field.
Use three fields: one with the exact string, one with an EdgeNGramTokenizer and one with an NGramTokenizer. You can then use qf=field1^10 field2^5 field3 to score hits in these fields according to how you want to prioritize them relative to each other.
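A rough Python illustration of why the three fields produce the requested order. This only mimics what the tokenizers emit (prefixes for edge n-grams, all substrings for n-grams) and sums flat weights per matching field. Real tokenizers have minimum and maximum gram sizes, and Solr's actual scores will differ.

```python
def edge_ngrams(text, min_len=1):
    """Prefixes of the text, roughly what an edge n-gram tokenizer emits."""
    return {text[:i] for i in range(min_len, len(text) + 1)}

def ngrams(text, min_len=1):
    """All substrings of the text, roughly what an n-gram tokenizer emits."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + min_len, len(text) + 1)
    }

def field_score(value, term, weights=(10, 5, 1)):
    """Sum the weight of every field (exact, edge n-gram, n-gram) the term hits."""
    exact, edge, ngram = weights
    return (
        exact * (term == value)
        + edge * (term in edge_ngrams(value))
        + ngram * (term in ngrams(value))
    )

values = ["xxxxstringABCxxxx", "stringABCxxxx", "stringABC"]
ranked = sorted(values, key=lambda v: field_score(v, "stringABC"), reverse=True)
print(ranked)
```

The exact value hits all three fields, the prefix match hits two, and the infix match hits only the n-gram field, which gives the order asked for.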

Boosting search results for numbers in solr

Suppose I have two documents with just one field as follows:
Document 1: foo bar 1
Document 2: foo baz 2
And a user searches for "foo baz 1"
Document 1 matches "foo" and "1" and Document 2 matches "baz" and "foo", so they would ordinarily be tied. Is there any way to weight a match on a number higher than a match on text, so that Document 1's match is preferred over Document 2's?
I don't want to boost by the number that matched, I want all numbers to be boosted by the same amount.
Your question is about boosting numbers in a query.
At query time you can boost a term, or you could use payloads at index time; see: Adding Boost to Score According to Payload of Multivalued Field at Solr
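For the query-time option, one possible approach is to rewrite the user's query before sending it to Solr, so that every numeric term gets the same boost. A minimal Python sketch follows; the function name and the boost value of 5 are arbitrary choices for illustration, and it assumes the query is plain whitespace-separated terms.

```python
import re

def boost_numbers(query, boost=5):
    """Append ^boost to every purely numeric whitespace-separated term."""
    return " ".join(
        f"{term}^{boost}" if re.fullmatch(r"\d+", term) else term
        for term in query.split()
    )

print(boost_numbers("foo baz 1"))  # foo baz 1^5
```

With this rewrite a match on "1" counts for more than a match on "foo" or "baz", regardless of which number matched.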

Solr docs must match one field

I have two fields:
text field: all important fields like category, product name and brand are copied into it.
attributes field: all attributes are copied into this field.
I have a single search query e.g. "50 mm diameter drill"
I want to search this string in both fields. I am assuming that this will match all products that have drill in the text field.
I want to narrow down the results if any attributes match any of "50 mm diameter".
If nothing matches in the attributes field, I want to return all documents that match the text field.
Edit: I don't want any docs that don't match the text field.
If the search also matches the attributes field and docs are found, I want to return only those docs.
If none are found, I want to return all docs that match the text field.
This is getting a bit tricky and a lot of things depend on your field processing requirements.
You will need to combine field weighting, to rank the attributes field higher, with the edismax minimum match parameter (mm).
Minimum match lets you configure how many terms in the query must be hit for a document to be returned. This helps weed out documents that only hit on one term in one field.
Lastly, if you really want to have your own logic in here, you can prefix a clause with + to make it mandatory. For example, +attributes:drill will only return items that have drill in the attributes field.
Whether "drill" will match depends on how your fields are processed, but probably, yes. The easiest way to do this is to not limit by "if not matched here, do this ..", but to score matches in the attributes field higher. You can do this by using qf (if using (e)dismax) together with their weights, such as attributes^20 text which will score any match in attributes 20 times more than a match in text. Any search matching documents with the correct term in attributes will then be scored higher than those just matching in text.
You can also do something similar in the q parameter, where you can weight each term separately: text:drill OR attributes:drill^20.
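As a sketch of what the weighted request could look like from Python: the core name "products" and the mm value are assumptions made for this example, and the qf weights are the ones suggested above.

```python
from urllib.parse import urlencode

def solr_edismax_params(user_query, qf, mm=None):
    """Collect edismax parameters; mm is only sent when a value is given."""
    params = {"q": user_query, "defType": "edismax", "qf": qf}
    if mm is not None:
        params["mm"] = mm
    return params

# "products" and mm="1" are placeholders, not values from the question.
params = solr_edismax_params("50 mm diameter drill", "attributes^20 text", mm="1")
query_string = urlencode(params)
print("http://localhost:8983/solr/products/select?" + query_string)
```

Documents that match in attributes then sort above those that only match in text, without needing a separate fallback query.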

Solr indexing Title and Text together

I am indexing certain documents in Solr which have a Title and Text. I don't want to create a separate field called Title in the document schema. Instead, I want to index the title by putting it inside the text itself in some way, so that title words are given more importance while scoring.
e.g. Title : Olympics 2012, Text : In December 2012, Olympics were held in......
I want to put the Title words in the Text itself, so the document above would have just one field called Text with the Title words inside it.
e.g. Text : Olympics 2012 In December 2012, Olympics were held in......
In the above, title words will not be given any special importance. Is there a way I can give title words a little more importance than other words in the Text field while indexing/scoring?
"giving title words a little extra importance than other words in Text field while indexing/scoring"
I think there is no need to copy the title field into the text field to boost the title over the text field. Assuming you have indexed both fields as full text, consider using an edismax query and providing the qf (Query Fields) parameter as
qf=title^10 text
which indicates that matches in title are 10 times more significant than matches in text.
The following is an example query in case it helps:
http://localhost:8983/solr/select/?q=Olympics&defType=edismax&qf=title^10.0+text

Resources