Group by/faceting by multiple fields in Azure Search - azure-cognitive-search

I want to group by/facet on multiple fields, say the "name" and "type" fields in the search index. Is this possible in Azure Search? If so, how can it be done?

It is not possible to facet by the combined values of multiple fields. You'd have to denormalize the fields yourself when you populate the index, then facet by the denormalized field. For example, if you have 'name' and 'type' fields, you'd have to create a combined 'nametype' field containing the combination of 'name' and 'type'. Then you would refer to the 'nametype' field in the 'facet' parameter of the Search request.
If before you had a document like this:
{ "id": "1", "name": "John", "type": "Customer" }
Now you will have a document like this:
{ "id": "1", "name": "John", "type": "Customer", "nametype": "John; Customer" }
(You can use whatever separator you like between the name part and type part of nametype.)
Now, when you search, include facet=nametype in the request, and you'll get a count of all combinations of 'name' and 'type' that exist in the index.
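A minimal sketch of what that faceted query could look like against the Azure Search REST API (the service name, index name, key, and api-version below are placeholders, not values from the original question):

// Query with a facet on the denormalized field
const response = await fetch(
  "https://<service>.search.windows.net/indexes/<index>/docs/search?api-version=2020-06-30",
  {
    method: "POST",
    headers: { "Content-Type": "application/json", "api-key": "<query-key>" },
    body: JSON.stringify({
      search: "*",            // match all documents
      facets: ["nametype"]    // facet on the combined field
    })
  }
);
const results = await response.json();
// results["@search.facets"].nametype is an array of { value, count } pairs,
// e.g. { "value": "John; Customer", "count": 1 }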

Related

Include fields other than count in azure facet results?

While faceting, Azure Search returns the count for each facet field by default. How do I also get other searchable fields for every facet?
For example, when I facet on area, I want something like this (description is a searchable field):
{
  "area": [
    {
      "count": 1,
      "description": "Acrylics",
      "value": "ACR"
    },
    {
      "count": 1,
      "description": "Power",
      "value": "POW"
    }
  ]
}
Can someone please help with the extra parameters I need to send in the query?
Unfortunately there is no good way to do this, as there is no direct support for nested faceting in Azure search (you can upvote it here). To achieve the result you want, you would need to store the data together as a composite value, as described in the workaround above.
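A minimal sketch of that composite-value workaround, assuming each area is indexed as "value|description" (the separator and field shape are illustrative, not required by Azure Search):

// Facet buckets as they would come back under results["@search.facets"].area
const facetBuckets = [
  { value: "ACR|Acrylics", count: 1 },
  { value: "POW|Power", count: 1 }
];
// Split each bucket client-side to recover both the value and the description
const areas = facetBuckets.map(b => {
  const [value, description] = b.value.split("|");
  return { value, description, count: b.count };
});
// areas => [ { value: "ACR", description: "Acrylics", count: 1 }, ... ]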

MongoDB unique index for all array elements

I'm trying to create a unique index on an array field in a document. The index should work like this: if one document's array contains two elements, then adding a new document whose array field contains both of those elements should produce a duplicate error, but not when only one of the elements is duplicated in another array.
Let me show an example of what I mean:
First I create a simple document:
{
  "name" : "Just a name",
  "users" : [
    "user1",
    "user2"
  ]
}
And I want to create a unique index on the 'users' array field. What I want is for it to still be possible to create other documents like this:
{
  "name" : "Just a name",
  "users" : [
    "user1",
    "user3"
  ]
}
or
{
  "name" : "Just a name",
  "users" : [
    "user2",
    "user5"
  ]
}
BUT it should be impossible to create this second document:
{
  "name" : "Just a name",
  "users" : [
    "user1",
    "user2"
  ]
}
Or reversed:
{
  "name" : "Just a name",
  "users" : [
    "user2",
    "user1"
  ]
}
But this doesn't work, because Mongo gives me an error that "user1" is duplicated.
Is it possible to create a unique index over all array elements, as shown above?
As per the official MongoDB documentation:
For unique indexes, the unique constraint applies across separate documents in the collection rather than within a single document.
Because the unique constraint applies to separate documents, for a unique multikey index, a document may have array elements that result in repeating index key values as long as the index key values for that document do not duplicate those of another document.
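For reference, the unique multikey index in question would be created like this (the collection name st1 matches the shell output below):

db.st1.createIndex({ "users": 1 }, { "unique": true })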
So you can't insert a second document such as:
{
  "name" : "Just a name",
  "users" : [
    "user1",
    "user3"
  ]
}
You will get a duplicate key error from the unique constraint:
> db.st1.insert({"name":"Just a name","users":["user1","user3"]})
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 11000,
    "errmsg" : "E11000 duplicate key error collection: test.st1 index: users_1 dup key: { : \"user1\" }"
  }
})
This is because "user1" already exists in the users index for the first document.
Currently your only option is to manage this in the application code that inserts into the collection: before each save or update, run validation logic that enforces the constraint you want to impose.
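One possible sketch of such logic, assuming you drop the unique index from the array itself and instead keep it on a hypothetical canonical-key field (here called usersKey):

// Unique index on the canonical key rather than on the array elements
db.st1.createIndex({ "usersKey": 1 }, { "unique": true });

function insertWithUniqueUsers(doc) {
  // Sort a copy so ["user2","user1"] and ["user1","user2"] produce the same key
  doc.usersKey = doc.users.slice().sort().join(";");
  return db.st1.insert(doc); // a duplicate combination now raises E11000
}

insertWithUniqueUsers({ name: "Just a name", users: ["user1", "user3"] }); // ok
insertWithUniqueUsers({ name: "Just a name", users: ["user2", "user1"] }); // ok
insertWithUniqueUsers({ name: "Just a name", users: ["user1", "user2"] }); // fails

This enforces uniqueness of the exact combination, which is what the question asks for, while still allowing individual elements to repeat across documents.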
I have a very similar problem, and sadly it seems it's not possible. Not because the unique constraint applies only across separate documents, but because:
To index a field that holds an array value, MongoDB creates an index key for each element in the array
i.e., each individual array element has to be unique across all other documents.

How to "join" 2 indices and search in ElasticSearch?

Suppose I have an index called "posts" with the following properties:
{
  "uid": "<user id>",
  "date": "<some date>",
  "message": "<some message>"
}
And another index called "users" with the following properties:
{
  "uid": "<user id>",
  "gender": "Male"
}
Now, I'm searching for posts posted by people who are males. How can I do that?
I definitely don't want to have a "user" property in a post and store the gender of the user in there. Because when a user updates his/her gender, I'd have to go to every single post that he/she has ever posted to update the gender.
Elasticsearch does not support relations across indices as of now. There is a 'join' datatype, but it only supports relations between documents within the same index.
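If restructuring into a single index is acceptable, here is a sketch of what the join datatype approach could look like (the index name user_post, the field layout, and the localhost endpoint are illustrative assumptions):

// Create one index where a user document is the parent of its post documents
await fetch("http://localhost:9200/user_post", {
  method: "PUT",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    mappings: {
      properties: {
        relation: { type: "join", relations: { user: "post" } },
        uid: { type: "keyword" },
        gender: { type: "keyword" },
        message: { type: "text" }
      }
    }
  })
});
// (Each post must be indexed with relation: { name: "post", parent: <user id> }
// and a routing parameter equal to the parent id.)

// Posts by male users can then be fetched with a has_parent query:
await fetch("http://localhost:9200/user_post/_search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    query: {
      has_parent: {
        parent_type: "user",
        query: { term: { gender: "Male" } }
      }
    }
  })
});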

How to filter an array in Azure Search

I have the following data in my index:
{
  "name" : "The 100",
  "lists" : [
    "2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
    "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
    "2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
  ]
}
I have to get all the documents where lists has "Pike" in it.
A filter with any works when I match the full string, but I couldn't get a 'contains'-style match to work:
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However, I am not sure how to search with just 'Pike':
$filter=lists/any(t: t eq 'Pike')
I guess eq looks for an exact match. Is there any way to make this query work with the given data structure?
Currently the lists field only has the filterable property, not searchable.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field uses the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax looks like this: $filter=search.ismatch('Pike', 'lists'). (If looking for "Pike" is all your filter does, you can use the search and searchFields parameters of the Search API instead of $filter; both shapes are sketched below.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values into it, or re-create your index from scratch with the new field definition.
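A sketch of both request bodies (same REST call shape as a regular search; service and index names and the api-version are placeholders):

// Option A: keep the full query and express the match as a filter
const withFilter = {
  search: "*",
  filter: "search.ismatch('Pike', 'lists')"
};
// Option B: if matching "Pike" is all you need, use search/searchFields directly
const withSearchFields = {
  search: "Pike",
  searchFields: "lists"
};
// POST either body to:
// https://<service>.search.windows.net/indexes/<index>/docs/search?api-version=2020-06-30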
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
  "name" : "The 100",
  "lists" : [
    { "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
    { "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
    { "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
  ]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
The documentation for complex types is here.
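For illustration, the "lists" field in the index definition would then look something like this (only the relevant field is shown; the attribute choices are an assumption, not taken from the original answer):

{
  "name": "lists",
  "type": "Collection(Edm.ComplexType)",
  "fields": [
    { "name": "id", "type": "Edm.String", "filterable": true },
    { "name": "name", "type": "Edm.String", "filterable": true }
  ]
}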

Cloudant search with grouping on array-type field

I have a database with documents like these:
{_id: "1", module:["m1"]}
{_id: "2", module:["m1", "m2"]}
{_id: "3", module:["m3"]}
There is an search index created for these documents with the following index function:
function (doc) {
  doc.module && doc.module.forEach &&
    doc.module.forEach(function(module) {
      index("module", module, {"store": true, "facet": true});
    });
}
The index uses "keyword" analyzer on module field.
The sample data is quite small (11 documents, 3 different module values)
I have two issues with queries that are using group_field=module parameter:
Not all groups are returned: I get 2 out of the 3 groups that I expect. It seems that if a document with ["m1", "m2"] is returned in the "m1" group, then there is no "m2" group. When I use counts=["module"] I do get the complete list of distinct values.
I'd like to be able to get something like:
{
  "total_rows": 3,
  "groups": [
    { "by": "m1",
      "total_rows": 1,
      "rows": [ {_id: "1", module: "m1"},
                {_id: "2", module: "m2"} ]
    },
    { "by": "m2",
      "total_rows": 1,
      "rows": [ {_id: "2", module: "m2"} ]
    },
    ....
  ]
}
When using group_field, bookmark is not returned, so there is no way to get the next chunk of the data beyond 200 groups or 200 rows in a group.
Cloudant Search is based on Apache Lucene, and hence inherits its properties and limitations.
One limitation of grouping is that "the group field must be a single-valued indexed field" (Lucene grouping), hence a document can only be in one group.
Another limitation/property of grouping is that topNGroups and maxDocsPerGroup need to be provided in advance; in Cloudant's case the maximums are 200 and 200 (they can be set lower via the group_limit and limit parameters).
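A sketch of such a grouped request (account, database, design document, and index names are placeholders):

// Group search results by module, with both caps set to their 200 maximum
const url = "https://<account>.cloudant.com/<db>/_design/<ddoc>/_search/<index>"
  + "?q=*:*&group_field=module&group_limit=200&limit=200";
const res = await fetch(url, { headers: { "Authorization": "Basic <credentials>" } });
const body = await res.json();
// body.groups is an array of { by, total_rows, rows } objects; note that with
// group_field set, no bookmark is returned, so paging past these caps is not possible.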
