Azure Search different scores for exact match - azure-cognitive-search

I have a table, User, with fields such as firstName and lastName.
If I search for Amit and restrict searchFields to firstName, I get different scores.
$count=true&search=Amit^2&searchFields=firstName&$select=firstName&queryType=full
"value": [
{
"#search.score": 7.986226,
"firstName": "Amit"
},
{
"#search.score": 7.986226,
"firstName": "Amit"
},
...
...
...
{
"#search.score": 7.986226,
"firstName": "Amit"
},
{
"#search.score": 7.9655724,
"firstName": "Amit"
},
The above is a small excerpt of the result set, but I can see the score change after 15-20 results.
I expected the same score whenever firstName is the same, so that a more complex query could sort on score and then on lastName.

The search score for a document combines how well the document matches the query with term statistics (such as how many documents contain the term) gathered from the other documents in the same shard. Depending on how documents are partitioned into shards, exact matches may therefore get slightly different scores, but they will always score higher than non-exact matches.
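To illustrate why identical matches can score differently, here is a minimal Python sketch of a single-term score using a Lucene-style BM25 formula. The formula, its k1/b parameters, and the shard statistics are assumptions made for illustration (the service's actual similarity function may differ, and the ^2 boost from the query is not modeled). The point is only that the same firstName value gets a different IDF, and therefore a different score, on two shards whose document frequencies differ.

```python
import math

def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Lucene-style BM25 inverse document frequency for one term."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    num_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Score contribution of one query term in one document."""
    if tf == 0:
        return 0.0
    idf = bm25_idf(num_docs, doc_freq)
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

# Hypothetical statistics: each shard counts its own documents and its own
# document frequency for the term "amit".
shard_a = {"num_docs": 1000, "doc_freq": 40}
shard_b = {"num_docs": 1000, "doc_freq": 55}

# Identical document content (firstName = "Amit", one token) on each shard.
score_a = bm25_term_score(tf=1, doc_len=1, avg_doc_len=1.0, **shard_a)
score_b = bm25_term_score(tf=1, doc_len=1, avg_doc_len=1.0, **shard_b)
# A document on shard A whose firstName does not contain "amit".
score_miss = bm25_term_score(tf=0, doc_len=1, avg_doc_len=1.0, **shard_a)

print(f"shard A:   {score_a:.6f}")
print(f"shard B:   {score_b:.6f}")
print(f"non-match: {score_miss:.6f}")
```

The term is slightly rarer on shard A, so the same exact match scores a little higher there, while both still outscore the non-matching document.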

Related

How can I rank results lower in SOLR if two fields match at the same time?

I have records with a "title" and a "brand" fields and i query both fields.
Sometimes a record has the brand in the title, which results in a higher score, but I want such records to score the same as the others.
How can I rank records lower where both fields match?
Your solution (removing the brand from the title at index time) is not ideal.
In Solr, there is the Dismax query parser that allows you to search for individual terms across several fields, using some other parameters to influence the final score.
The q parameter defines the main query while the qf parameter can be used to specify a list of fields with which to search.
In addition, the tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower-scoring fields compared to the highest-scoring field.
Here is a simple example.
Using the standard query parser, this is what you get when you search both fields for adidas:
http://localhost:8983/solr/indexName/select?q=title:adidas%20OR%20brand:adidas&fl=id,title,brand,score
"docs": [
{
"id": "2",
"title": "Shoes Adidas",
"brand": "Adidas",
"score": 0.9623127
},
{
"id": "1",
"title": "Shoes",
"brand": "Adidas",
"score": 0.31506687
},
{
"id": "6",
"title": "Shirt",
"brand": "Adidas",
"score": 0.31506687
}
]
The doc with id 2 has a higher score than the others because the score is the sum of two clauses ('adidas' in title + 'adidas' in brand).
If you perform a Dismax query with tie=0 (a pure "disjunction max query"):
http://localhost:8983/solr/indexName/select?defType=dismax&q=adidas&qf=brand%20title&fl=id,title,brand,score&tie=0
You will obtain:
"docs": [
{
"id": "2",
"title": "Shoes Adidas",
"brand": "Adidas",
"score": 0.6472458
},
{
"id": "1",
"title": "Shoes",
"brand": "Adidas",
"score": 0.31506687
},
{
"id": "6",
"title": "Shirt",
"brand": "Adidas",
"score": 0.31506687
}
]
The doc with id 2 has a lower score than before because only the maximum-scoring subquery contributes to the final score, i.e. it takes the maximum of 0.6472458 (title) and 0.31506687 (brand) instead of their sum (0.9623127).
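The way tie combines per-field scores can be written as: final score = max(field scores) + tie × (sum of the other field scores). Below is a minimal Python sketch of just that combination step (the function name is hypothetical, and it does not compute the per-field scores themselves). It uses the per-field scores implied by the responses above: document 1 shows that a brand-only match scores 0.31506687, so the title clause of document 2 accounts for the remaining 0.6472458.

```python
def dismax_score(field_scores: list[float], tie: float) -> float:
    """Combine per-field scores like a disjunction max query:
    the best field counts fully, the others are scaled by `tie`."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others

# Per-field scores taken from the example responses above.
doc2 = [0.6472458, 0.31506687]  # [title, brand]: "Shoes Adidas" / "Adidas"
doc1 = [0.31506687]             # brand only; title "Shoes" does not match

for tie in (0.0, 0.1, 1.0):
    print(tie, round(dismax_score(doc2, tie), 7), round(dismax_score(doc1, tie), 7))
```

With tie=0.0 the sketch reproduces the DisMax score of document 2 (0.6472458), and with tie=1.0 it gives back the summed score of the standard query (about 0.9623127). Documents 1 and 6 match in only one field, so tie does not change their scores; document 2 can still outrank them at tie=0 because its title match on its own scores higher than a brand match.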
With the qf parameter, it is also possible to assign a boost factor to increase or decrease the importance of a particular field in the query, for example:
&qf=brand^3 title
It makes matches in brand much more significant than matches in title.
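As a rough illustration of qf=brand^3 title, the sketch below treats a field boost as a plain multiplier on that field's clause score before the maximum is taken. This is a simplification for illustration only: it reuses the unboosted per-field scores derived from the example above, and the helper name is hypothetical; the scores Solr actually reports may differ.

```python
def boosted_dismax(field_scores: dict[str, float],
                   boosts: dict[str, float],
                   tie: float = 0.0) -> tuple[str, float]:
    """Multiply each field's score by its boost (default 1.0), then combine
    with the DisMax rule. Returns the winning field and the final score."""
    boosted = {f: s * boosts.get(f, 1.0) for f, s in field_scores.items()}
    best_field = max(boosted, key=boosted.get)
    best = boosted[best_field]
    others = sum(boosted.values()) - best
    return best_field, best + tie * others

# Per-field scores for document 2, derived from the example responses above.
doc2 = {"title": 0.6472458, "brand": 0.31506687}

print(boosted_dismax(doc2, {}))              # qf=brand title
print(boosted_dismax(doc2, {"brand": 3.0}))  # qf=brand^3 title
```

Without boosts the title clause wins for document 2; with brand tripled, the brand clause (about 0.945) becomes the maximum instead, which is the shift in importance the boost is meant to produce.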
In any case, boosting should be used with caution because it may lead to unexpected results. Every decision with boosting should be supported by an online and offline search relevance evaluation.
Can this help you?
I solved it by removing all occurrences of the brand in the title (and other fields) when writing the index.

Include fields other than count in azure facet results?

When faceting, Azure Search returns a count for each facet value by default. How do I also get other searchable fields for each facet value?
For example, when I facet on area, I want something like this (description is a searchable field):
{
"area": [
{
"count": 1,
"description": "Acrylics",
"value": "ACR"
},
{
"count": 1,
"description": "Power",
"value": "POW"
}
]
}
Can someone please help with the extra parameters I need to send in the query?
Unfortunately, there is no good way to do this, as Azure Search has no direct support for nested faceting (there is a feature request you can upvote). To get the result you want, you would need to store the related data together as a composite value, as described in a known workaround.
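A minimal sketch of the composite-value idea, assuming a hypothetical facetable field named areaFacet whose values are written at indexing time as "CODE|Description" (for example "ACR|Acrylics"). The field name, the "|" delimiter, and both helper functions are illustrative assumptions, not part of Azure Search; the client simply splits each returned facet value back into its parts.

```python
from typing import Any

DELIMITER = "|"  # assumption: never occurs inside a code

def make_composite(code: str, description: str) -> str:
    """Build the value stored in the hypothetical facetable field at index time."""
    return f"{code}{DELIMITER}{description}"

def split_facets(buckets: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn composite facet buckets back into count/description/value objects."""
    result = []
    for bucket in buckets:
        code, _, description = bucket["value"].partition(DELIMITER)
        result.append({"count": bucket["count"],
                       "description": description,
                       "value": code})
    return result

# Shape of a facet response for the hypothetical field "areaFacet".
response = {
    "@search.facets": {
        "areaFacet": [
            {"count": 1, "value": make_composite("ACR", "Acrylics")},
            {"count": 1, "value": make_composite("POW", "Power")},
        ]
    }
}

print({"area": split_facets(response["@search.facets"]["areaFacet"])})
```

Because the split uses str.partition on the first delimiter, codes must not contain the delimiter, while descriptions may.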

Why is it possible to get duplicate results from Azure Search when paging?

Sometimes when using Azure Search's paging there may be duplicate documents in the results. Here is an example of a paging request:
GET /indexes/myindex/docs?search=*&$top=15&$skip=15&$orderby=rating desc
Why is this possible? How can it happen? Are there any consistency guarantees when paging?
The results of paginated queries are not guaranteed to be stable if the underlying index is changing, or if you are relying on sorting by relevance score. Paging simply changes the value of $skip for each page, but each query is independent and operates on the current view of the data (i.e., there is no snapshotting or other consistency mechanism like you’d find in a general-purpose database).
Here is an example of how you might get duplicates. Assume an index with four documents:
{ "id": "1", "rating": 5 }
{ "id": "2", "rating": 3 }
{ "id": "3", "rating": 2 }
{ "id": "4", "rating": 1 }
Now assume you want to page through the results with a page size of two, ordered by rating. You’d execute this query to get the first page:
$top=2&$skip=0&$orderby=rating desc
And get these results:
{ "id": "1", "rating": 5 }
{ "id": "2", "rating": 3 }
Now you insert a fifth document into the index:
{ "id": "5", "rating": 4 }
Shortly thereafter, you execute a query to fetch the second page of results:
$top=2&$skip=2&$orderby=rating desc
And get these results:
{ "id": "2", "rating": 3 }
{ "id": "3", "rating": 2 }
Notice that you’ve fetched document 2 twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page.
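The scenario above can be reproduced with a minimal Python sketch that treats each page request as an independent query over the index's current contents. The page helper is hypothetical: it only emulates $top, $skip, and $orderby=rating desc in memory and does not call Azure Search.

```python
def page(index: list[dict], top: int, skip: int) -> list[dict]:
    """Emulate $top/$skip with $orderby=rating desc over the current data."""
    ordered = sorted(index, key=lambda d: d["rating"], reverse=True)
    return ordered[skip:skip + top]

index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

first_page = page(index, top=2, skip=0)

# A new document is indexed between the two page requests.
index.append({"id": "5", "rating": 4})

second_page = page(index, top=2, skip=2)

seen = {d["id"] for d in first_page}
print([d["id"] for d in first_page])
print([d["id"] for d in second_page])
print("duplicates:", [d["id"] for d in second_page if d["id"] in seen])
```

The first page contains documents 1 and 2, and the second page contains documents 2 and 3, so document 2 is returned twice, exactly as in the walkthrough.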
In situations where you're relying on document score (either you don't use $orderby or you're using $orderby=search.score()), paging can return duplicate results because each query might be handled by a different replica, and that replica may have different term and document frequency statistics, which can be enough to change the relative ordering of documents at page boundaries.
For these reasons, it’s important to think of Azure Search as a search engine (because it is), and not a general-purpose database.

Cloudant search with grouping on array-type field

I have a database with documents like these:
{_id: "1", module:["m1"]}
{_id: "2", module:["m1", "m2"]}
{_id: "3", module:["m3"]}
There is a search index created for these documents with the following index function:
function (doc) {
doc.module && doc.module.forEach &&
doc.module.forEach(function(module){
index("module", module, {"store":true, "facet": true});
});
}
The index uses "keyword" analyzer on module field.
The sample data is quite small (11 documents, 3 different module values)
I have two issues with queries that are using group_field=module parameter:
Not all groups are returned: I get 2 of the 3 groups I expect. It seems that a document with ["m1", "m2"] is returned in the "m1" group, but there is no "m2" group. When I use counts=["module"], I get the complete list of distinct values.
I'd like to be able to get something like:
{
"total_rows": 3,
"groups": [
{ "by": "m1",
"total_rows": 1,
"rows": [ {_id: "1", module: "m1"},
{_id: "2", module: "m2"}
]
},
{ "by": "m2",
"total_rows": 1,
"rows": [ {_id: "2", module: "m2"} ]
},
....
]
}
When using group_field, bookmark is not returned, so there is no way to get the next chunk of the data beyond 200 groups or 200 rows in a group.
Cloudant Search is based on Apache Lucene and therefore inherits its properties and limitations.
One limitation of grouping is that "the group field must be a single-valued indexed field" (Lucene Grouping), hence a document can be only in one group.
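To see why the "m2" group goes missing, here is a minimal Python sketch contrasting faceting (every value of a multi-valued field is counted) with single-valued grouping (each document lands in exactly one group), using the three sample documents from the question. Which of a document's values Lucene actually groups on is an implementation detail; the sketch simply assumes the first value, which is enough to reproduce the symptom.

```python
from collections import Counter, defaultdict

docs = [
    {"_id": "1", "module": ["m1"]},
    {"_id": "2", "module": ["m1", "m2"]},
    {"_id": "3", "module": ["m3"]},
]

# Faceting (counts=["module"]): every value of the array is counted.
facet_counts = Counter(value for doc in docs for value in doc["module"])

# Grouping (group_field=module): the group field is treated as single-valued,
# so each document joins only one group. Assumption: its first value.
groups: dict[str, list[str]] = defaultdict(list)
for doc in docs:
    groups[doc["module"][0]].append(doc["_id"])

print(dict(facet_counts))
print(dict(groups))
```

The facet counts include m2, but no m2 group is formed, which matches the behavior described in the question.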
Another limitation of grouping is that topNGroups and maxDocsPerGroup must be provided in advance; in Cloudant's case the maximum for each is 200 (they can be set lower with the group_limit and limit parameters, respectively).

Solr facet substring search

Imagine I have the following facets:
Speakers: [Mike Thompson, Thomas Wilkinson, Sally Jeffers]
Venues: [Weill Thomas Medical Center, BB&R Medical Associates, KLR Thompson]
Solr seems to allow a &facet.prefix=Thom where I can get the facets that START with "Thom" and that will return "Speaker: Thomas Wilkinson" but no others.
How can I do the equivalent of &facet.substring=Thom which will return Mike Thompson and Weill Thomas....
I tried &facet.query=Thom but that doesn't seem to work at all.
Thanks
It is not possible to be sure, as you did not provide your full query string, but the facet may not be returning Weill Thomas because you are only specifying facet.field=speakers in your query, and Weill Thomas is actually in the venues field. You would need a second facet.field=venues parameter in your query to retrieve those.
Facet prefix is only used to filter results once the search is already done, so don't use that parameter for searching purposes. Check this question: SOLR facet search by prefix with results highlighting
Edit based on comment:
You don't necessarily need to filter the facet results after the fact; just make sure that only the facets you want match the original query. If you return all facets, the ones that were not part of the search query will have 0 occurrences. You can then set facet.mincount=1 to get only the facets found within the search results. Here's an example that I mocked up with test data:
q=*Thom*&rows=0&df=speakers&wt=json&indent=true&facet=true&facet.field=speakers&facet.field=venues&facet.mincount=1&json.nl=map
And the response from Solr:
"responseHeader": {
"status": 0,
"QTime": 3,
"params": {
"q": "*Thom*",
"df": "speakers",
"facet.field": [
"speakers",
"venues"
],
"json.nl": "map",
"indent": "true",
"facet.mincount": "1",
"rows": "0",
"wt": "json",
"facet": "true",
"_": "1431772681445"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": []
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"speakers": {
"Mark Thomas": 1,
"Thomas Moore": 1
},
"venues": {
"Weill Thomas": 1
}
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
}
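For reference, the same request can be assembled programmatically. This minimal sketch uses Python's urllib.parse.urlencode with a list of pairs so that facet.field can appear twice; the host, port, and core name in the base URL are placeholders (assumptions), and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode

# Placeholder Solr endpoint; replace host, port, and core name with your own.
BASE_URL = "http://localhost:8983/solr/indexName/select"

params = [
    ("q", "*Thom*"),
    ("df", "speakers"),
    ("rows", "0"),
    ("wt", "json"),
    ("facet", "true"),
    ("facet.field", "speakers"),  # repeated key: one entry per facet field
    ("facet.field", "venues"),
    ("facet.mincount", "1"),      # hide facet values with no matching documents
    ("json.nl", "map"),
]

url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```

A dict could not hold the two facet.field entries, which is why the parameters are kept as a list of pairs.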
Just wanted to point out a caveat of the proposed solution (which is basically to run your facet substring query as the main Solr query, so that the facet values are the ones you want). This won't work correctly for multi-valued fields. For example, if a document had three speaker values, "Mark Thomas", "Fred Jones", and "John Doe", then the query q=*Thom* would return "Fred Jones" and "John Doe" as facets in addition to "Mark Thomas", which is not the desired result. So this solution can work for single-valued fields, but for multi-valued fields you would probably have to write an intermediary web service that filters out the non-matches (like "Fred Jones" and "John Doe"). Solr should really add a facet.substring parameter that works like facet.prefix but filters facet values by substring instead of by prefix.
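The intermediary filtering step can be sketched in a few lines of Python. It assumes the facet_fields section has already been parsed from a response requested with json.nl=map (so each field maps a facet value to its count) and keeps only the values that contain the substring, ignoring case. The function name is hypothetical.

```python
def filter_facets_by_substring(facet_fields: dict[str, dict[str, int]],
                               substring: str) -> dict[str, dict[str, int]]:
    """Keep only facet values that contain `substring` (case-insensitive)."""
    needle = substring.casefold()
    return {
        field: {value: count for value, count in values.items()
                if needle in value.casefold()}
        for field, values in facet_fields.items()
    }

# A multi-valued document matching q=*Thom* drags its other speakers along.
facet_fields = {
    "speakers": {"Mark Thomas": 1, "Fred Jones": 1, "John Doe": 1},
    "venues": {"Weill Thomas": 1},
}

print(filter_facets_by_substring(facet_fields, "thom"))
```

Applied to the example, only "Mark Thomas" survives in speakers and "Weill Thomas" in venues.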
