Include fields other than count in azure facet results? - azure-cognitive-search

While faceting, Azure Search returns the count for each facet field by default. How do I also get other searchable fields for every facet?
For example, when I facet on area, I want something like this (description is a searchable field):
{
"area": [
{
"count": 1,
"description": "Acrylics",
"value": "ACR"
},
{
"count": 1,
"description": "Power",
"value": "POW"
}
]
}
Can someone please help with the extra parameters I need to send in the query?

Unfortunately, there is no good way to do this, as Azure Search has no direct support for nested faceting (you can upvote it here). To achieve the result you want, you would need to store the data together as a composite value, as described by this workaround.
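As a rough illustration of the composite-value idea, the sketch below assumes a hypothetical facetable field (called areaComposite here) whose values join the code and the description with a "|" delimiter, for example "ACR|Acrylics". Neither the field name nor the delimiter comes from Azure Search; both are assumptions made for the example. The client splits each facet value back into the shape asked for in the question.

```python
# Hypothetical facet response for a composite field; the field name
# "areaComposite" and the "|" delimiter are assumptions for illustration.
facet_response = {
    "areaComposite": [
        {"value": "ACR|Acrylics", "count": 1},
        {"value": "POW|Power", "count": 1},
    ]
}


def split_composite_facets(facets, delimiter="|"):
    """Turn composite facet values back into value/description pairs."""
    result = []
    for facet in facets:
        # Split only on the first delimiter so descriptions may contain it.
        value, _, description = facet["value"].partition(delimiter)
        result.append(
            {"count": facet["count"], "description": description, "value": value}
        )
    return result


area = split_composite_facets(facet_response["areaComposite"])
print({"area": area})
```

The delimiter should be a character that cannot appear in the code part of the value, since the split happens on its first occurrence.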

Related

Azure Search highlighting doesn't work for wildcards with scoring profiles

Azure Search supports highlighting with full text search which facilitates clients to locate the matched term in a returned document. I have provided a simple index schema below to illustrate the issue.
{
"name": "simple-index",
"fields": [
{
"name": "key",
"type": "Edm.String"
},
{
"name": "simplefield",
"type": "Edm.String"
}
],
"scoringProfiles": [
{
"name": "boostedprofile",
"functionAggregation": null,
"text": {
"weights": {
"simplefield": 5
}
},
"functions": []
}
],
"corsOptions": null,
"suggesters": [],
"analyzers": [],
"tokenizers": [],
"tokenFilters": [],
"charFilters": []
}
For a normal search query like below, it works as expected and gives back the expected result.
search=foobar&highlight=simplefield
On extending the above query to use a wildcard query, things are again as expected with the response containing highlights on the terms matching the prefix. So far so good.
search=foo*&highlight=simplefield&querytype=full
After this when I apply a scoring profile on top of the previous query, the results are unexpected and no highlights are returned.
search=foo*&highlight=simplefield&querytype=full&scoringprofile=boostedprofile
How do I make highlights work for wildcard queries when using a scoring profile?
At the time of answering, this is a known limitation in Azure Search: highlighting doesn't work for wildcard queries when they are combined with scoring profiles. Internally, Azure Search uses a highlighter, which runs the highlighting flow as a separate process after the search itself.
For a wildcard query, highlighting involves looking up all terms in the index that match the provided prefix and then using them to compose the highlighted text. Scoring profiles affect the way terms are looked up in the index for highlighting, and because of that the result doesn't include any highlights.
As this limitation is specific to wildcard queries, one workaround is to pre-process the index so that you don't need to issue wildcard/prefix queries at all. Please take a look at custom analysis (https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search). You can, for example, use the edgeNGram token filter to store prefixes of words in the index and then issue a regular term query with the prefix (without the '*' operator).
I hope this is useful. Please vote on the feedback item to help us prioritize our development efforts to support other modes of highlighting that will support the above use-case. https://feedback.azure.com/forums/263029-azure-search/suggestions/32661961-implement-other-highlighters
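To make the edge n-gram workaround more concrete, here is a small pure-Python sketch of the idea. It is a toy model, not Azure Search's implementation: the function name edge_ngrams, the min_gram/max_gram arguments and the sample documents are all made up for illustration. It shows why, once prefixes are stored at indexing time, a plain term lookup for "foo" finds the same documents as the prefix query foo*.

```python
def edge_ngrams(term, min_gram=1, max_gram=20):
    """Return the leading-edge n-grams (prefixes) of a term."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


# Build a toy inverted index: each prefix points at the documents whose
# field contains a word starting with that prefix.
documents = {"1": "foobar", "2": "food", "3": "bar"}
index = {}
for key, text in documents.items():
    for word in text.lower().split():
        for gram in edge_ngrams(word, min_gram=2):
            index.setdefault(gram, set()).add(key)

# A plain term lookup for "foo" now behaves like the prefix query foo*.
matches = sorted(index.get("foo", set()))
print(matches)
```

In a real index the n-gram filter would normally be applied only at indexing time, with a different analyzer at query time, so that the query term itself is not split into prefixes.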

Solr contextual query boosting

I am new to Solr and I would like to boost my results with a contextual parameter.
For example, in a simple case, suppose my documents are:
{
"id": "1",
"list":[]
}
{
"id": "2",
"list": ["3"]
}
...
If I have a query like this, for example:
http://localhost:8983/solr/MyCore/select?q=*:*&contextParam=3
I would like to boost all documents whose list contains the contextParam.
I already use the edismax parser to boost documents on other fields such as dates, but right now I can't access this optional parameter.
Thank you!

Why is it possible to get duplicate results from Azure Search when paging?

Sometimes when using Azure Search's paging there may be duplicate documents in the results. Here is an example of a paging request:
GET /indexes/myindex/docs?search=*&$top=15&$skip=15&$orderby=rating desc
Why is this possible? How can it happen? Are there any consistency guarantees when paging?
The results of paginated queries are not guaranteed to be stable if the underlying index is changing, or if you are relying on sorting by relevance score. Paging simply changes the value of $skip for each page, but each query is independent and operates on the current view of the data (i.e., there is no snapshotting or other consistency mechanism like you’d find in a general-purpose database).
Here is an example of how you might get duplicates. Assume an index with four documents:
{ "id": "1", "rating": 5 }
{ "id": "2", "rating": 3 }
{ "id": "3", "rating": 2 }
{ "id": "4", "rating": 1 }
Now assume you want to page through the results with a page size of two, ordered by rating. You’d execute this query to get the first page:
$top=2&$skip=0&$orderby=rating desc
And get these results:
{ "id": "1", "rating": 5 }
{ "id": "2", "rating": 3 }
Now you insert a fifth document into the index:
{ "id": "5", "rating": 4 }
Shortly thereafter, you execute a query to fetch the second page of results:
$top=2&$skip=2&$orderby=rating desc
And get these results:
{ "id": "2", "rating": 3 }
{ "id": "3", "rating": 2 }
Notice that you’ve fetched document 2 twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page.
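The same sequence can be reproduced with a small in-memory simulation. This is only a sketch: fetch_page is a made-up helper that mimics $orderby=rating desc with $top and $skip over a Python list, and it does not call Azure Search.

```python
def fetch_page(docs, top, skip):
    """Mimic $orderby=rating desc with $top and $skip on the current data."""
    ordered = sorted(docs, key=lambda d: d["rating"], reverse=True)
    return ordered[skip:skip + top]


index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

page1 = fetch_page(index, top=2, skip=0)

# A new document arrives between the two page requests.
index.append({"id": "5", "rating": 4})

page2 = fetch_page(index, top=2, skip=2)

page1_ids = [d["id"] for d in page1]
page2_ids = [d["id"] for d in page2]
duplicates = sorted(set(page1_ids) & set(page2_ids))
print(page1_ids, page2_ids, duplicates)
```

Each call re-sorts the current contents of the list, just as each paged query runs independently against the current state of the index, so document 2 shows up on both pages.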
In situations where you're relying on document score (either you don't use $orderby or you're using $orderby=search.score()), paging can return duplicate results because each query might be handled by a different replica, and that replica may have different term and document frequency statistics -- enough to change the relative ordering of documents at page boundaries.
For these reasons, it’s important to think of Azure Search as a search engine (because it is), and not a general-purpose database.

Cloudant using $nin: There is no index available for this selector

I created a JSON index in Cloudant on _id like so:
{
"index": {
"fields": [ "_id"]
},
"ddoc": "mydesigndoc",
"type": "json",
"name": "myindex"
}
First off, unless I specified the index name, Cloudant somehow could not differentiate between the index I created and the default text-based index for _id (if that is truly the case, then I believe this is a bug).
I ran the following query against the _find endpoint of my db:
{
"selector": {
"_id": {
"$nin":["v1","v2"]
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
The result was this error:
{"error":"no_usable_index","reason":"There is no index available for this selector."}
If I change "$nin":["v1","v2"] to "$eq":"v1" then it works fine, but that is not the query I am after.
So in order to get what I want, I had to add "_id": {"$gt":null} to my selector, which now looks like:
{
"selector": {
"_id": {
"$nin":["v1","v2"],
"$gt":null
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
Why does this happen? It seems to occur only if I use the _id field in the selector.
What are the ramifications of adding "_id": {"$gt":null} to my selector? Is this going to scan the entire table rather than use the index?
I would appreciate any help, thank you.
Cloudant Query can use Cloudant's pre-existing primary index for selection and range querying without you having to create your own index on the _id field.
Unfortunately, the index doesn't really help when using the $nin operator - Cloudant would have to scan the entire database to check for documents which are not in your list - the index doesn't really get it any further forward.
By changing the operator to $eq you are playing to the strengths of the index which can be used to locate the record you need quickly and efficiently.
In short, the query you are attempting is inefficient. If your query was more complex e.g. the equivalent of WHERE colour='red' AND _id NOT IN ['a','b'] then a Cloudant index on colour could be used to reduce the data set to a reasonable level before doing the $nin operation on the remaining data.
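For illustration, the combined query described above might be expressed as a Cloudant Query body along these lines. This is a sketch under assumptions: it presumes a JSON index on a colour field already exists, and build_find_body is a made-up helper that only assembles the request body (it does not talk to Cloudant).

```python
import json


def build_find_body(colour, excluded_ids, fields):
    """Build a _find body pairing an equality on an indexed field
    with a $nin filter on _id."""
    return {
        "selector": {
            # Equality on the (assumed) indexed field narrows the data set.
            "colour": {"$eq": colour},
            # $nin is then applied to the documents that remain.
            "_id": {"$nin": list(excluded_ids)},
        },
        "fields": list(fields),
    }


body = build_find_body("red", ["a", "b"], ["_id", "colour"])
print(json.dumps(body, indent=2))
```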

Solr facet substring search

Imagine I have the following facets:
Speakers: [Mike Thompson, Thomas Wilkinson, Sally Jeffers]
Venues: [Weill Thomas Medical Center, BB&R Medical Associates, KLR Thompson]
Solr seems to allow a &facet.prefix=Thom where I can get the facets that START with "Thom" and that will return "Speaker: Thomas Wilkinson" but no others.
How can I do the equivalent of &facet.substring=Thom, which would return Mike Thompson and Weill Thomas....
I tried &facet.query=Thom but that doesn't seem to work at all.
Thanks
It is not possible to be sure, as you did not provide your full query string, but the facet may not be returning Weill Thomas in the facet results because you are only specifying facet.field=speakers in your query, and Weill Thomas is actually in the venues field. You would need a second facet.field=venues parameter in your search query to retrieve those.
Facet prefix is only used to filter results once the search is already done, so don't use that parameter for searching purposes. Check this question: SOLR facet search by prefix with results highlighting
Edit based on comment:
You don't necessarily need to filter the results returned by faceting after the fact; just make sure that only the facets you want match the original query. If you return all facets, the ones that were not part of the search query will have 0 occurrences. You can then set facet.mincount=1 to get only the facets that are found within the search results. Here's an example that I mocked up with test data:
q=*Thom*&rows=0&df=speakers&wt=json&indent=true&facet=true&facet.field=speakers&facet.field=venues&facet.mincount=1&json.nl=map
And the response from Solr:
"responseHeader": {
"status": 0,
"QTime": 3,
"params": {
"q": "*Thom*",
"df": "speakers",
"facet.field": [
"speakers",
"venues"
],
"json.nl": "map",
"indent": "true",
"facet.mincount": "1",
"rows": "0",
"wt": "json",
"facet": "true",
"_": "1431772681445"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": []
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"speakers": {
"Mark Thomas": 1,
"Thomas Moore": 1
},
"venues": {
"Weill Thomas": 1
}
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
}
Just wanted to point out a caveat of the proposed solution (which is basically to run your facet substring search as the main Solr query, so that the facet values come back already filtered). This won't work correctly for multi-valued fields. For example, if a document had three values for speaker, "Mark Thomas", "Fred Jones" and "John Doe", then the query q=*Thom* would return "Fred Jones" and "John Doe" as facets in addition to "Mark Thomas", which is not the desired result. So for single-valued fields this solution could work, but for multi-valued fields you would probably have to write an intermediary web service that filters out the non-matches (like "Fred Jones" and "John Doe"). Solr should really add a facet.substring parameter that works like the facet.prefix parameter but does substring filtering on the facet values instead of prefix filtering.
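The filtering step such an intermediary service would perform could look roughly like this. It is a sketch only: filter_facets_by_substring is a made-up helper, and the input assumes facet counts in the map form produced by json.nl=map, using the sample values from the example above.

```python
def filter_facets_by_substring(facet_fields, substring):
    """Keep only facet values that contain the substring (case-insensitive)."""
    needle = substring.lower()
    filtered = {}
    for field, counts in facet_fields.items():
        kept = {v: c for v, c in counts.items() if needle in v.lower()}
        if kept:
            filtered[field] = kept
    return filtered


# Facet counts as Solr might return them for a multi-valued speakers field
# with json.nl=map; the values mirror the example in the text.
facet_fields = {
    "speakers": {"Mark Thomas": 1, "Fred Jones": 1, "John Doe": 1},
    "venues": {"Weill Thomas": 1},
}

result = filter_facets_by_substring(facet_fields, "Thom")
print(result)
```

The counts are passed through unchanged; only the facet values that do not contain the substring are dropped.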
