How to ensure all items in a collection match a filter in Azure Cognitive Search - azure-cognitive-search

I have Azure Cognitive Search running, and my index is working as expected.
We are trying to add a security filter to the search, based on the current user's permissions.
The user's permissions come to me as an IEnumerable, but I currently select just a string[], pass that into my filter, and then do a string.Join, which looks like this.
permission1, permission2, permission3, permission4
In our SQL database, we have a view from which the index gets its data. The view has a column called RequiredPermissions, which is a Collection(Edm.String) in the index, and the data looks like this.
[ 'permission1', 'permission2', 'permission3' ]
The requirement is that for a record to return in the results, a user's permissions must contain all of the RequiredPermissions for that record.
So if we have a user with the following permissions
permission1, permission3, permission5
And we have the following records
Id, SearchText, Type, Permissions
1, abc, User, [ 'permission1', 'permission2' ]
2, abc.pdf, Document, [ 'permission1' ]
3, abc, Thing, [ 'permission1', 'permission3' ]
4, abc, Stuff, [ 'permission3', 'permission4' ]
If the user searched for 'abc' and these four results came back, I need to use $filter to exclude the results for which the user does not have the proper permissions. So I would expect the following results
Id, Returned, Reason
1, no, the user does not have permission2
2, yes, the user has permission1 and nothing else is needed
3, yes, the user has both permission1 and permission3
4, no, the user does not have permission4
If I run the following filter, then I get back anything that has permission1 or permission3, which is not acceptable, since the user should not see items Id 1 or 4
RequiredPermissions/any(role: search.in(role, 'permission1, permission3', ','))
If I run this filter, then I get nothing back, everything is rejected, because no records have permission5, and the user has it
RequiredPermissions/all(role: not search.in(role, 'permission1, permission3', ','))
If I try to run the search using 'all' and without the 'not' I get the following error
RequiredPermissions/all(role: search.in(role, 'permission1, permission3', ','))
Invalid expression: Invalid lambda expression. Found a test for equality or inequality where the opposite was expected in a lambda expression that iterates over a field of type Collection(Edm.String). For 'any', please use expressions of the form 'x eq y' or 'search.in(...)'. For 'all', please use expressions of the form 'x ne y', 'not (x eq y)', or 'not search.in(...)'.\r\nParameter name: $filter
So it seems that I cannot use the 'not' with 'any', and I must use the 'not' with 'all'
What I wish for is a way to say that a user has all the permissions in their list that are in the RequiredPermissions column.
I am currently just working in Postman using the REST API to solve this, but I will eventually move this into .NET.

Your scenario can't be implemented with Collection(Edm.String) due to the limitations on how all and any work on such collections (documented here).
Fortunately, there is an alternative. You can model permissions as a collection of complex types, which allows you to use all the way that you need to implement your permissions model. Here is a JSON example of how the field would be defined:
{
  "name": "test",
  "fields": [
    { "name": "Id", "type": "Edm.String", "key": true },
    { "name": "RequiredPermissions", "type": "Collection(Edm.ComplexType)", "fields": [{ "name": "Name", "type": "Edm.String" }] }
  ]
}
Here is a JSON example of what a document would look like with its permissions defined:
{ "@search.action": "upload", "Id": "1", "RequiredPermissions": [{"Name": "permission1"}, {"Name": "permission2"}] }
Here is how you could construct a filter that has the desired effect:
RequiredPermissions/all(perm: search.in(perm/Name, 'permission1,permission3,permission5'))
While this works, you are strongly advised to test the performance of this solution with a realistic set of data. Under the hood, all is executed as a negated any, and negated queries can sometimes perform poorly with the type of inverted indexes used by a search engine.
Also, please be aware that there is currently a limit on the number of elements in all complex collections across a document. This limit is currently 3000. So if RequiredPermissions were the only complex collection in your index, this means you could have at most 3000 permissions defined per document.
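As a rough illustration, here is a minimal Python sketch of how the filter string above might be assembled from a user's permission list, together with a local model of the intended semantics checked against the four sample records from the question. The helper names build_permission_filter and is_visible are hypothetical, and the sketch assumes permission names contain no commas or spaces, since those are the default delimiters of search.in.

```python
def build_permission_filter(user_permissions):
    """Build an OData $filter requiring every RequiredPermissions entry
    to be one of the user's permissions (hypothetical helper)."""
    # OData string literals escape a single quote by doubling it.
    escaped = sorted({p.replace("'", "''") for p in user_permissions})
    return (
        "RequiredPermissions/all(perm: search.in(perm/Name, '"
        + ",".join(escaped)
        + "'))"
    )


def is_visible(required_permissions, user_permissions):
    """Local model of the 'all' filter: every required permission is held."""
    held = set(user_permissions)
    return all(name in held for name in required_permissions)


# Sample user and records taken from the question.
user = ["permission1", "permission3", "permission5"]
records = {
    1: ["permission1", "permission2"],
    2: ["permission1"],
    3: ["permission1", "permission3"],
    4: ["permission3", "permission4"],
}

filter_text = build_permission_filter(user)
visible_ids = [doc_id for doc_id, required in records.items()
               if is_visible(required, user)]
print(filter_text)
print(visible_ids)
```

In this model only records 2 and 3 are visible, which matches the expected results in the question. Note that all is true for an empty collection, so a record with no required permissions should be returned to every user.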

Related

Get member emails when using delta on group

I'm trying to use the graph API to get a list of users which have been added to a specific group using the delta feature so I reduce the amount of data that passes through.
However, when I $expand on the members property, I'm only getting the id, and not the specific properties I need (mail and some other details) - despite the fact I'm $selecting it.
The url I'm using for the query is
https://graph.microsoft.com/v1.0/groups/delta?$expand=members($select=id,mail)&$select=members&$filter=id eq '<myGroupId>'
And the data I'm getting returned is:
[...]
"value": [
{
"id": "xxxxx-xxxx-xxx-xxx-xxxxx",
"members@delta": [
{
"@odata.type": "#microsoft.graph.user",
"id": "xxxxxxxxxxxxxxxxxxxx"
},
{
"@odata.type": "#microsoft.graph.user",
"id": "xxxxxxxxxxxxxxxxxxxxx"
},
I want members@delta to include the details of each member so I don't have to query for them separately.
According to this, the members@delta property contains only the IDs of member objects in the group.
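Since the delta response carries only member IDs, one option (not part of the original answer) is to collect those IDs and look the users up in a follow-up request. The Python sketch below only prepares the data for such a request and makes no network calls. It assumes the Microsoft Graph directoryObjects/getByIds action as the lookup endpoint and a batch size of 1000 IDs per call; both should be checked against the current Graph documentation. The helper names are hypothetical.

```python
import json


def member_ids_from_delta(group_delta):
    """Collect IDs of members listed in one groups/delta entry,
    skipping members that the delta marks as removed."""
    return [
        member["id"]
        for member in group_delta.get("members@delta", [])
        if "@removed" not in member
    ]


def get_by_ids_bodies(ids, batch_size=1000):
    """Yield JSON request bodies for POST /directoryObjects/getByIds.
    The batch size is an assumption, not taken from the source."""
    for start in range(0, len(ids), batch_size):
        yield json.dumps({"ids": ids[start:start + batch_size], "types": ["user"]})


# Illustrative delta entry with made-up IDs.
sample = {
    "id": "group-id",
    "members@delta": [
        {"@odata.type": "#microsoft.graph.user", "id": "user-a"},
        {"@odata.type": "#microsoft.graph.user", "id": "user-b",
         "@removed": {"reason": "deleted"}},
        {"@odata.type": "#microsoft.graph.user", "id": "user-c"},
    ],
}

ids = member_ids_from_delta(sample)
bodies = list(get_by_ids_bodies(ids))
print(ids)
print(bodies)
```

Each body would then be posted with the caller's own HTTP client and access token, and the mail property read from the returned user objects.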

How Do I Select A Sub-Field For A User?

I am trying to get information about users from Microsoft Graph Explorer. The default query gives me way too much info per user https://graph.microsoft.com/beta/users
Therefore, I am getting a subset of data using $select https://graph.microsoft.com/beta/users?&$select=id,displayName,identities which looks like this:
{
  "@odata.context": "https://graph.microsoft.com/beta/$metadata#users(id,displayName,identities)",
  "value": [
    {
      "id": "06609x07-2b89-5l92-8egg-5666a36uu26w",
      "displayName": "Joe",
      "identities": [
        {
          "signInType": "emailAddress",
          "issuer": "contoso.onmicrosoft.com",
          "issuerAssignedId": "joe.bloggs@contoso.com"
        },
        {
          "signInType": "userPrincipalName",
          "issuer": "contoso.onmicrosoft.com",
          "issuerAssignedId": "06609x07-2b89-5l92-8egg-5666a36uu26w@contoso.onmicrosoft.com"
        }
      ]
    }
  ]
}
Is it possible to modify the select so that it doesn't return the whole identities block but just returns the issuerAssignedId field instead (ideally just the first one)?
It does not seem possible to return only the issuerAssignedId field with $select.
The documentation describes how $select operates, and the property descriptions make no mention of $select:
Use the $select query parameter to return a set of properties that are different than the default set for an individual resource or a collection of resources. With $select, you can specify a subset or a superset of the default properties.
On the other hand, issuerAssignedId supports only $filter, see here.
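If the trimming cannot be done server-side, it can be done on the client after the response arrives. The following Python sketch is one possible way to do that, assuming the response has already been parsed into a dictionary shaped like the sample in the question. The function name first_issuer_assigned_ids is hypothetical.

```python
def first_issuer_assigned_ids(response):
    """Map each user's id to its first issuerAssignedId,
    or to None when the user has no identities."""
    result = {}
    for user in response.get("value", []):
        identities = user.get("identities") or []
        first = identities[0].get("issuerAssignedId") if identities else None
        result[user["id"]] = first
    return result


# Parsed response shaped like the sample in the question.
sample_response = {
    "value": [
        {
            "id": "06609x07-2b89-5l92-8egg-5666a36uu26w",
            "displayName": "Joe",
            "identities": [
                {
                    "signInType": "emailAddress",
                    "issuer": "contoso.onmicrosoft.com",
                    "issuerAssignedId": "joe.bloggs@contoso.com",
                },
                {
                    "signInType": "userPrincipalName",
                    "issuer": "contoso.onmicrosoft.com",
                    "issuerAssignedId": "06609x07-2b89-5l92-8egg-5666a36uu26w@contoso.onmicrosoft.com",
                },
            ],
        }
    ]
}

first_ids = first_issuer_assigned_ids(sample_response)
print(first_ids)
```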

Characters to split the user-query in Vespa engine

We split the user query on ASCII spaces to create a weakAnd(...).
The user input "Watch【Docudrama】" does not contain whitespace, but it throws an error.
Question: Which code points besides whitespace should be used to split the query?
YQL (fails):
select * from post where text contains "Watch【Docudrama】" limit 1;
YQL (works):
select * from post where weakAnd(text contains "Watch",text contains "【Docudrama】") limit 1;
Error message:
{
"root": {
"id": "toplevel",
"relevance": 1,
"fields": {
"totalCount": 0
},
"errors": [
{
"code": 4,
"summary": "Invalid query parameter",
"source": "content",
"message": "Can not add WORD_ALTERNATIVES text:[ Watch【Docudrama】(1.0) watch(0.7) ] to a segment phrase"
}
]
}
}
Are you sure you need to use WAND for this? Try setting the user query grammar to "any" (default is "all"), which will use the "OR" operator for user supplied terms. There is an example here: https://docs.vespa.ai/documentation/reference/query-language-reference.html#userinput
The process of splitting up the query is known as tokenization. This is a complex and language-dependent process. Vespa uses Apache OpenNLP to do this (and more): https://docs.vespa.ai/documentation/linguistics.html has more information and also references to the code which performs this operation.
If you really want to use WAND, instead of reimplementing the query parsing logic outside Vespa, I suggest you create a Java searcher which descends the query tree and modifies it by replacing the created AndItem with WeakAndItem. See https://docs.vespa.ai/documentation/searcher-development.html and the code example here: https://docs.vespa.ai/documentation/advanced-ranking.html

Using search.in with all

The following statement finds all profiles that have Facebook or Twitter, and this works:
$filter=SocialAccounts/any(x: search.in(x, 'Facebook,Twitter'))
But I can't find any samples for finding all profiles that have both Facebook and Twitter. I tried:
$filter=SocialAccounts/all(x: search.in(x, 'Facebook,Twitter'))
But this is not a valid query.
Azure Search does not support the type of ‘all’ filter that you’re looking for. Using search.in with ‘all’ would be equivalent to using OR, but Azure Search can only handle AND in the body of an ‘all’ lambda (which is equivalent to OR in the body of an ‘any’ lambda).
You might try a workaround like this:
$filter=tags/any(t: t eq 'Facebook') and tags/any(t: t eq 'Twitter')
However, this isn't actually equivalent to using all with search.in. The query as expressed using all is matching documents where every social account is strictly either Facebook or Twitter. If any other social account is present, the document won’t match. The workaround doesn’t have this property. A document must have at least Facebook and Twitter in order to match, but not exclusively those. This is certainly a valid scenario; it just isn't the same as using all with search.in, which was the original question.
No matter how you try to rewrite the query, you won’t be able to express an equivalent to the all query. This is a limitation due to the way Azure Search stores collections of strings and other primitive types in the inverted index.
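To make the difference between the two filters concrete, here is a small Python model of their semantics. It runs entirely locally and does not call Azure Search. The function names and the sample profiles are made up for illustration.

```python
def all_in(accounts, allowed):
    """Model of SocialAccounts/all(x: search.in(x, ...)):
    no account falls outside the allowed list."""
    return all(account in allowed for account in accounts)


def has_each(accounts, wanted):
    """Model of the any(...) and any(...) workaround:
    every wanted account is present, and others are tolerated."""
    return all(name in accounts for name in wanted)


targets = {"Facebook", "Twitter"}
profiles = {
    "a": ["Facebook", "Twitter"],
    "b": ["Facebook", "Twitter", "LinkedIn"],
    "c": ["Facebook"],
}

all_matches = [key for key, accounts in profiles.items() if all_in(accounts, targets)]
workaround_matches = [key for key, accounts in profiles.items() if has_each(accounts, targets)]
print(all_matches)
print(workaround_matches)
```

In this model the all filter matches profiles a and c (nothing outside the list), while the workaround matches a and b (both accounts present), so the two filters agree only on profiles that hold exactly Facebook and Twitter.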
Please vote on user voice to help prioritize:
https://feedback.azure.com/forums/263029-azure-search/suggestions/37166749-efficient-way-to-express-a-true-all
A possible workaround is to use the new Complex Types feature, which does allow more expressive filters inside lambda expressions. For example, if you model tags as objects with a single value property instead of as a collection of strings, you should be able to execute a filter like this:
$filter=tags/all(t: search.in(t/value, 'Facebook,Twitter'))
In the REST API, you'd define tags like this:
{
"name": "myindex",
"fields": [
...
{
"name": "tags",
"type": "Collection(Edm.ComplexType)",
"fields": [
{ "name": "value", "type": "Edm.String", "filterable": true }
]
}
]
}
Note that this feature is in preview at the time of this writing, but will be generally available (and publicly documented) soon.

jsonschema: Verifying that an array contains an element, without erroring on other elements

I recently found jsonschema and I've been loving using it, however recently I've come across something that I want to do that I just haven't been able to figure out.
What I want to do is to validate that an array must contain an element that matches a schema, but I don't want to have validation fail on other elements that would be in the list.
Say that I have an array like the following:
arr = [
{"some object": True},
False,
{"AnotherObj": "a string this time"},
"test"
]
I want to be able to do something like "validate that arr contains an object that has a property 'some object' that is a boolean, and error if it doesn't, but don't care about other elements."
I don't want it to validate the other items in the list. I just want to make sure that the list contains an element that matches the schema at least once. I also do not know the order in which the elements will arrive in the array.
I've tried this already with a schema like:
{"type": "array",
 "items": {
     "type": "object",
     "properties": {
         "tool": {
             # A schema here to validate tool
         }
     },
     "required": ["tool"]
 }
}
The problem is that it requires every item in the array to have the property "tool", and not what I actually want.
Any help anyone can give me with this would be much appreciated! I've been stumped on this for a really long time with no forward progress.
Thanks!
I've gotten an answer to this question:
The schema used is (where ... B ... is the schema to require):
{
"type": "array",
"not": {
"items": {
"not": {... B ...}
}
}
}
It basically works out to be something like "Ensure that not (items don't match B)". I'm not 100% clear on why this works the way it does, but it does so I figured I'd share it for posterity.
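One way to see why the double negation works: items means "every element matches the subschema", so "not (every element fails B)" is logically the same as "at least one element matches B". The Python sketch below models that reasoning with plain predicates rather than a real validator. The function matches_b is a hypothetical stand-in for schema B.

```python
def matches_b(item):
    """Stand-in for schema B: an object whose 'some object'
    property is a boolean."""
    return isinstance(item, dict) and isinstance(item.get("some object"), bool)


def items_all_match(arr, predicate):
    """Model of the 'items' keyword: every element satisfies the subschema."""
    return all(predicate(element) for element in arr)


def not_items_not(arr, predicate):
    """Model of {"not": {"items": {"not": B}}}."""
    return not items_all_match(arr, lambda element: not predicate(element))


arr = [
    {"some object": True},
    False,
    {"AnotherObj": "a string this time"},
    "test",
]
without_match = [False, "test"]

print(not_items_not(arr, matches_b))
print(not_items_not(without_match, matches_b))
```

For any list, not_items_not gives the same answer as any(matches_b(x) for x in arr), including the empty list, which fails. JSON Schema draft-06 and later also define a contains keyword that expresses this requirement directly.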

Resources