Permissions in ElasticSearch

Permissions in ElasticSearch - arrays

Given a set of documents similar to the following:
{
"value": "Some random string here",
"permissions": ["job.view", "special.permission"]
}
We want to create a search that lets us pass in an array of permissions to match against. For example, we might pass in
["job.view", "foo.bar", "pineapple.eat"]
as the permissions.
The document should only be returned by the search if all the permissions listed in the document exist in the set passed in as part of the query.
We don't mind whether we have to change the document layout or the query, but we currently cannot use the Scripting API (due to AWS).

There is a convoluted way to do this which requires that you also index the number of permissions in the document, i.e. if the document contains two values in the permissions array (like in your example) then you'd also include the field permissions_count: 2.
Then your query would contain as many bool/should clauses as there are non-empty combinations (subsets) of the permissions in your search array. For instance, if your search array has 3 permissions ["job.view", "foo.bar", "pineapple.eat"], then you need to check the following conditions:
permissions contains all three permissions and permissions_count: 3
or permissions contains two of the three permissions (three combinations) and permissions_count: 2
or permissions contains only one of the three permissions (three possibilities) and permissions_count: 1
So when checking for 1 permission, you'll have 1 clause in your bool/should. For 2 permissions you'll have 3, for 3 permissions you'll have 7, and so on (2^n - 1 clauses for n permissions).
The full query is shown below:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "permissions": "job.view" } },
{ "term": { "permissions": "special.permission" } },
{ "term": { "permissions": "pineapple.eat" } },
{
"term": {
"permissions_count": 3
}
}
]
}
},
{
"bool": {
"must": [
{ "term": { "permissions": "special.permission" } },
{ "term": { "permissions": "pineapple.eat" } },
{
"term": {
"permissions_count": 2
}
}
]
}
},
{
"bool": {
"must": [
{ "term": { "permissions": "job.view" } },
{ "term": { "permissions": "pineapple.eat" } },
{
"term": {
"permissions_count": 2
}
}
]
}
},
{
"bool": {
"must": [
{ "term": { "permissions": "special.permission" } },
{ "term": { "permissions": "job.view" } },
{
"term": {
"permissions_count": 2
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"permissions": "special.permission"
}
},
{
"term": {
"permissions_count": 1
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"permissions": "job.view"
}
},
{
"term": {
"permissions_count": 1
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"permissions": "pineapple.eat"
}
},
{
"term": {
"permissions_count": 1
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
As you can see, it's quite convoluted just to check three permissions...
I would maybe approach this problem from a different perspective and come up with another field (or set of fields) that would contain a bitset of permissions or cleverly chosen integers for each permission, but I haven't fully thought this one through yet.
Another solution would be to leverage Shield and document-access control instead of storing the permissions within the documents themselves.
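Since the number of clauses grows quickly, the query described above is easier to generate than to write by hand. Here is a minimal Python sketch of that enumeration. The function name build_subset_query is made up for illustration, and it assumes the permissions and permissions_count fields from this answer, with one term clause per permission so that every member of a subset has to be present:

```python
from itertools import combinations


def build_subset_query(allowed):
    """Build one bool/must clause per non-empty subset of `allowed`.

    Assumes `allowed` has no duplicates and that each document stores
    its number of permissions in a `permissions_count` field.
    """
    should = []
    for size in range(len(allowed), 0, -1):
        for subset in combinations(allowed, size):
            # Every permission in the subset must be present...
            must = [{"term": {"permissions": p}} for p in subset]
            # ...and the document must hold exactly `size` permissions.
            must.append({"term": {"permissions_count": size}})
            should.append({"bool": {"must": must}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = build_subset_query(["job.view", "foo.bar", "pineapple.eat"])
clauses = query["query"]["bool"]["should"]
print(len(clauses))  # number of bool/should clauses generated
```

The resulting dict can be serialized with json.dumps and sent as the search body. Keep in mind that the clause count doubles with every extra permission, so this only stays practical for short permission lists.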

Related

How to store nested structure in graph database

I am analyzing how to store a nested/hierarchical structure in a graph database. I want to store it as a tree, where the settings vertex has two children, DigitalInput and Input2, and likewise for the subsequent parameters. Any input on which approach I should choose, and how?
"properties": {
"A": {
"value": "prop1 new value"
},
"settings": {
"DigitalInput": {
"Input1": {
"nTransIn1": {
"tagName": {
"value": ""
}
}
},
"Input2": {
"nTransIn2": {
"tagName": {
"value": ""
}
}
}
}
}
}

MongoDB Track data changes

I want to track changes to MongoDB documents. The big challenge is that MongoDB has nested documents.
Example
[
{
"_id": "60f7a86c0e979362a25245eb",
"email": "walltownsend#delphide.com",
"friends": [
{
"name": "Hancock Nelson"
},
{
"name": "Owen Dotson"
},
{
"name": "Cathy Jarvis"
}
]
}
]
after the update/change
[
{
"_id": "60f7a86c0e979362a25245eb",
"email": "walltownsend#delphide.com",
"friends": [
{
"name": "Daphne Kline" //<------
},
{
"name": "Owen Dotson"
},
{
"name": "Cathy Jarvis"
}
]
}
]
This is a very basic example of a highly expandable real-world use case.
On a SQL-based database, I would suggest something like the following solution.
The SQL way
users
_id                      | email
60f7a8b28db7c78b57bbc217 | cathyjarvis#delphide.com

friends
_id | user_id                  | name
0   | 60f7a8b28db7c78b57bbc217 | Hancock Nelson
1   | 60f7a8b28db7c78b57bbc217 | Suarez Burt
2   | 60f7a8b28db7c78b57bbc217 | Mejia Elliott
after the update/change
users
_id                      | email
60f7a8b28db7c78b57bbc217 | cathyjarvis#delphide.com

friends
_id | user_id                  | name
0   | 60f7a8b28db7c78b57bbc217 | Daphne Kline
1   | 60f7a8b28db7c78b57bbc217 | Suarez Burt
2   | 60f7a8b28db7c78b57bbc217 | Mejia Elliott

history
_id | friends_id | field | preUpdate      | postUpdate
0   | 0          | name  | Hancock Nelson | Daphne Kline
If there is an update and the change has to be tracked before the next update, this would work for NoSQL as well. If there is a second update, we get a second row in the SQL history table and it's very clear. On NoSQL, you can keep a list/array of full document versions and compare the entries at successive indexes to find the changes, but that stores a lot of redundant information which hasn't changed.

Have a look at Set Expression Operators
$setDifference
$setEquals
$setIntersection
Beware: these operators perform set operations on arrays, treating the arrays as sets. If an array contains duplicate entries, the duplicates are ignored, and so is the order of the elements.
In your example the update would result in
removed: [ {name: "Hancock Nelson" } ],
added: [ {name: "Daphne Kline" } ]
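To make the set semantics concrete, here is a small Python sketch that mimics what two $setDifference calls (old minus new, and new minus old) would report. It is only an illustration of the behaviour, not MongoDB code, and the helper name diff_as_sets is made up:

```python
def diff_as_sets(old, new):
    """Return what was removed from and added to an array.

    Order and duplicate entries are ignored, as with the set operators.
    """
    removed, added = [], []
    for item in old:
        if item not in new and item not in removed:
            removed.append(item)
    for item in new:
        if item not in old and item not in added:
            added.append(item)
    return {"removed": removed, "added": added}


before = [{"name": "Hancock Nelson"}, {"name": "Owen Dotson"}, {"name": "Cathy Jarvis"}]
after = [{"name": "Daphne Kline"}, {"name": "Owen Dotson"}, {"name": "Cathy Jarvis"}]
result = diff_as_sets(before, after)
print(result)
```

Note that this tells you which elements disappeared and which appeared, but not which old element was turned into which new one.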
If the number of elements is always the same before and after the update, then you could use this one:
db.collection.insertOne({
friends: [
{ "name": "Hancock Nelson" },
{ "name": "Owen Dotson" },
{ "name": "Cathy Jarvis" }
],
updated_friends: [
{ "name": "Daphne Kline" },
{ "name": "Owen Dotson" },
{ "name": "Cathy Jarvis" }
]
})
db.collection.aggregate([
{
$set: {
difference: {
$map: {
input: { $range: [0, { $size: "$friends" }] },
as: "i",
in: {
$cond: {
if: {
$eq: [
{ $arrayElemAt: ["$friends", "$$i"] },
{ $arrayElemAt: ["$updated_friends", "$$i"] }
]
},
then: null,
else: {
old: { $arrayElemAt: ["$friends", "$$i"] },
new: { $arrayElemAt: ["$updated_friends", "$$i"] }
}
}
}
}
}
}
},
{
$set: {
difference: {
$filter: {
input: "$difference",
cond: { $ne: ["$$this", null] }
}
}
}
}
])

Multiple match_phrase conditions with another bool in a single ElasticSearch query?

I am trying to write an Elasticsearch query that searches a text field ("body") and returns items that match at least one of two multi-word phrases I provide (e.g. "stack overflow" OR "the stackoverflow"). I would also like the query to return only results that occur after a given timestamp, ordered by time.
My current solution is below. I believe the MUST is working correctly (gte a timestamp), but the BOOL + SHOULD with two match_phrases is not correct. I am getting the following error:
Unexpected character ('{' (code 123)): was expecting double-quote to start field name
Which I think is because I have two match_phrases in there?
The ES mapping and the details of the ES API I am using are here.
{"query":
{"bool":
{"should":
[{"match_phrase":
{"body":"a+phrase"}
},
{"match_phrase":
{"body":"another+phrase"}
}
]
},
{"bool":
{"must":
[{"range":
{"created_at:
{"gte":"thispage"}
}
}
]}
}
},"size":10000,
"sort":"created_at"
}

I think you were just missing a single " after created_at.
{
"query": {
"bool": {
"must": [
{
"range": {
"created_at": {
"gte": "1534004694"
}
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"body": "a+phrase"
}
},
{
"match_phrase": {
"body": "another+phrase"
}
}
]
}
}
]
}
},
"size": 10,
"sort": "created_at"
}
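Also, you are allowed to have both must and should as properties of a bool object, so this is also worth trying. Be aware that when must is present the should clauses become optional by default, so set minimum_should_match to 1 if at least one of the phrases has to match.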
{
"query": {
"bool": {
"must": {
"range": {
"created_at": {
"gte": "1534004694"
}
}
},
"should": [
{
"match_phrase": {
"body": "a+phrase"
}
},
{
"match_phrase": {
"body": "another+phrase"
}
}
],
"minimum_should_match": 1
}
},
"size": 10,
"sort": "created_at"
}
On a side note, Postman or any JSON formatter/validator would really help in determining where the error is.
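If you don't have a validator at hand, any JSON parser will do the same job. Here is a minimal Python sketch using the standard json module. The helper name find_json_error is made up, and the broken string is a shortened version of the range clause from the question:

```python
import json


def find_json_error(text):
    """Return (line, column, message) for the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return (err.lineno, err.colno, err.msg)
    return None


# Same mistake as in the question: no closing quote after created_at.
broken = '{"range": {"created_at: {"gte": "1534004694"}}}'
fixed = '{"range": {"created_at": {"gte": "1534004694"}}}'

print(find_json_error(broken))  # position and message of the first problem
print(find_json_error(fixed))   # None when the body parses
```

The reported position is where the parser gave up, which is usually slightly after the actual mistake, so look a few characters to the left of it.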

Remove elements/objects From Array in ElasticSearch Followed by Matching Query

I'm having issues trying to remove elements/objects from an array in elasticsearch.
This is the mapping for the index:
{
"example1": {
"mappings": {
"doc": {
"properties": {
"locations": {
"type": "geo_point"
},
"postDate": {
"type": "date"
},
"status": {
"type": "long"
},
"user": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
And this is an example document.
{
"_index": "example1",
"_type": "doc",
"_id": "8036",
"_score": 1,
"_source": {
"user": "kimchy8036",
"postDate": "2009-11-15T13:12:00",
"locations": [
[
72.79887719999999,
21.193036000000003
],
[
-1.8262150000000001,
51.178881999999994
]
]
}
}
Using the query below, I can add multiple locations.
POST /example1/_update_by_query
{
"query": {
"match": {
"_id": "3"
}
},
"script": {
"lang": "painless",
"inline": "ctx._source.locations.add(params.newsupp)",
"params": {
"newsupp": [
-74.00,
41.12121
]
}
}
}
But I'm not able to remove array objects from locations. I have tried the query below but it's not working.
POST example1/doc/3/_update
{
"script": {
"lang": "painless",
"inline": "ctx._source.locations.remove(params.tag)",
"params": {
"tag": [
-74.00,
41.12121
]
}
}
}
Kindly let me know what I am doing wrong here. I am using Elasticsearch version 5.5.2.

In Painless scripts, the remove() method on an array field removes by index, not by value.
Here's a working example that removes array elements by value in an Elasticsearch script:
POST objects/_update_by_query
{
"query": {
... // use regular ES query to remove only in relevant documents
},
"script": {
"source": """
if (ctx._source[params.array_attribute] != null) {
for (int i=ctx._source[params.array_attribute].length-1; i>=0; i--) {
if (ctx._source[params.array_attribute][i] == params.value_to_remove) {
ctx._source[params.array_attribute].remove(i);
}
}
}
""",
"params": {
"array_attribute": "<NAME_OF_ARRAY_PROPERTY_TO_REMOVE_VALUE_IN>",
"value_to_remove": "<VALUE_TO_REMOVE_FROM_ARRAY>"
}
}
}
You might want to simplify the script if it only ever needs to remove values from one specific array attribute. For example, removing "green" from a document's color_list array:
_doc/001 = {
"color_list": ["red", "blue", "green"]
}
Script to remove "green":
POST objects/_update_by_query
{
"query": {
... // use regular ES query to remove only in relevant documents
},
"script": {
"source": """
for (int i=ctx._source.color_list.length-1; i>=0; i--) {
if (ctx._source.color_list[i] == params.color_to_remove) {
ctx._source.color_list.remove(i);
}
}
""",
"params": {
"color_to_remove": "green"
}
}
}
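The scripts above walk the array from the last index down to 0. The reason is that removing an element shifts everything after it one position to the left, so a forward loop would skip the element that follows each removal. Here is the same loop as a plain Python sketch, outside Elasticsearch, with a made-up helper name remove_all:

```python
def remove_all(values, target):
    """Remove every element equal to `target` in place, walking from the end."""
    for i in range(len(values) - 1, -1, -1):
        if values[i] == target:
            # Deleting at index i only shifts elements we have already visited.
            del values[i]
    return values


colors = ["red", "green", "blue", "green"]
remove_all(colors, "green")
print(colors)

# The same comparison by value works for coordinate pairs like those in the question.
locations = [[72.79, 21.19], [-74.0, 41.12121]]
remove_all(locations, [-74.0, 41.12121])
print(locations)
```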

Unlike add(), remove() takes the index of the element and removes it.
Your ctx._source.locations in painless is an ArrayList. It has List's remove() method:
E remove(int index)
Removes the element at the specified position in this list (optional operation). ...
See Painless API - List for other methods.
See this answer for example code.

"script" : {
"lang":"painless",
"inline":"ctx._source.locations.remove(params.tag)",
"params":{
"tag":indexToRemove
}
}
Whereas ctx._source.locations.add(elt) adds the element itself, ctx._source.locations.remove(indexToRemove) removes by the index of the element in the array.

Build json builder with arrayJson in groovy

I am new to Groovy and I want to construct the following JSON object with the builder:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{ "match": { "content": "scontent" } },
{ "match": { "title":"stitle" } }
]
}
},
{
"bool": {
"should": [
{ "match": { "a1": "v1" } },
{ "match": { "a2":"v2" } },
... and so on ...
{ "match": { "an":"vn" } }
]
}
}
]
}
},
"highlight": {
"fields": {
"content":{}
}
}
}
I searched a lot of other posts on Stack Overflow and wrote the code below, but I can't get what I want:
JsonBuilder builder = new JsonBuilder()
def body = builder {
from Lib.or(qQuery.start, 0)
size Lib.or(qQuery.num, 10)
query {
bool {
must [
{
bool {
should [
{ match { content 'scontent' } },
{ match { title 'stitle' } }
]
}
},
{
bool {
should myVals.collect {[
'match' : { it.key it.value }
]}
}
}
]
}
}
highlight {
fields {
content {}
}
}
}
Thanks for any help !

I think you can make this work with the JsonBuilder as is, but it is usually easier to create the data structure using maps and lists (which is what the builder outputs) in Groovy, as you have more control there.
Example code:
import groovy.json.*
def data = [
query: [
bool: [
must: [
[bool:
[should: [
[match: [ content: 'scontent']],
[match: [ title: 'stitle']]
]]
],
[bool:
[should: [
[match: [ a1: 'v1']],
[match: [ a2: 'v2']],
[match: [ vn: 'vn']]
]]
]
]
]
]
]
println JsonOutput.prettyPrint(JsonOutput.toJson(data))
produces:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"content": "scontent"
}
},
{
"match": {
"title": "stitle"
}
}
]
}
},
{
"bool": {
"should": [
{
"match": {
"a1": "v1"
}
},
{
"match": {
"a2": "v2"
}
},
{
"match": {
"vn": "vn"
}
}
]
}
}
]
}
}
}
I did not include your full json as it takes up some space, but the structure is there. Note the use of lists ([valueA, valueB]) vs maps ([someKey: someValue]) in the data structure.
Granted this makes the JsonBuilder less than 100% useful but I haven't seen any concise ways of including lists of large json objects in a list within the structure. You can do:
def json = new JsonBuilder()
json.query {
bool('list', 'of', 'values')
}
but for larger structures as list elements I would say go with the lists and maps approach.