ElasticSearch Painless script: How to iterate over an array of nested objects

I am trying to create a script using the script_score of the function_score.
I have several documents whose rankings field is type="nested".
The mapping for the field is:
"rankings": {
"type": "nested",
"properties": {
"rank1": {
"type": "long"
},
"rank2": {
"type": "float"
},
"subject": {
"type": "text"
}
}
}
A sample document is:
"rankings": [
{
"rank1": 1051,
"rank2": 78.5,
"subject": "s1"
},
{
"rank1": 45,
"rank2": 34.7,
"subject": "s2"
}]
What I want to achieve is to iterate over the nested objects of rankings. Specifically, I need to use something like a for loop to find a particular subject and use its rank1 and rank2 to compute something.
So far, I have been using something like this, but it does not seem to work (it throws a compile error):
"function_score": {
"script_score": {
"script": {
"lang": "painless",
"inline":
"sum = 0;"
"for (item in doc['rankings_cug']) {"
"sum = sum + doc['rankings_cug.rank1'].value;"
"}"
}
}
}
I have also tried the following options:
A for loop using : instead of in: for (item:doc['rankings']), with no success.
A for loop using in, but iterating over a specific field of the objects, i.e. rank1: for (item in doc['rankings.rank1'].values). This actually compiles, but it seems to find a zero-length array of rank1.
I have read that the _source element is the one that can return JSON-like objects, but as far as I can tell it is not supported in search queries.
Can you please give me some ideas of how to proceed with that?
Thanks a lot.

You can access _source via params._source. This one will work:
PUT /rankings/result/1?refresh
{
"rankings": [
{
"rank1": 1051,
"rank2": 78.5,
"subject": "s1"
},
{
"rank1": 45,
"rank2": 34.7,
"subject": "s2"
}
]
}
POST rankings/_search
{
"query": {
"match": {
"_id": "1"
}
},
"script_fields": {
"script_score": {
"script": {
"lang": "painless",
"inline": "double sum = 0.0; for (item in params._source.rankings) { sum += item.rank2; } return sum;"
}
}
}
}
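The Painless script runs inside Elasticsearch, but its logic is plain iteration over the source map. The Python sketch below mirrors that loop locally, and also shows the lookup the question actually asks for: find one subject, then combine its rank1 and rank2. The helper names and the rank1 * rank2 formula are hypothetical placeholders, not part of the answer.

```python
from typing import Any, Dict, Optional

# Sample _source from the PUT request above.
DOC_SOURCE: Dict[str, Any] = {
    "rankings": [
        {"rank1": 1051, "rank2": 78.5, "subject": "s1"},
        {"rank1": 45, "rank2": 34.7, "subject": "s2"},
    ]
}


def sum_rank2(source: Dict[str, Any]) -> float:
    """Mirror of the Painless loop: sum rank2 over params._source.rankings."""
    total = 0.0
    for item in source.get("rankings", []):
        total += item["rank2"]
    return total


def score_for_subject(source: Dict[str, Any], subject: str) -> Optional[float]:
    """Return rank1 * rank2 for the matching subject, or None if it is absent.

    The rank1 * rank2 formula is a hypothetical stand-in for
    "compute something" in the question.
    """
    for item in source.get("rankings", []):
        if item["subject"] == subject:
            return item["rank1"] * item["rank2"]
    return None
```

In Painless the same subject lookup would be a for loop over params._source.rankings with an if on item.subject, returning early once the subject is found.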
DELETE rankings

Unfortunately, Elasticsearch scripting in general (including Painless) does not support accessing nested documents in this way. If you need to iterate across rankings like this, perhaps consider a different mapping structure in which they are stored in multi-valued fields. Ultimately, the nested data will need to be denormalized and put into the parent documents to be able to compute scores in the way described here.
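As a sketch of that denormalization, assuming subjects are short, known identifiers: copy each nested entry into top-level fields before indexing, so a script could read something like doc['rank1_s1'].value from doc values. The "rank1_<subject>" / "rank2_<subject>" naming is a hypothetical scheme, and each generated field would still need a suitable numeric mapping.

```python
from typing import Any, Dict


def denormalize_rankings(source: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of source with each ranking copied to parent-level fields.

    The "rank1_<subject>" / "rank2_<subject>" names are hypothetical; the
    original "rankings" array is kept unchanged in the copy.
    """
    flat = dict(source)  # shallow copy: the input dict is not modified
    for item in source.get("rankings", []):
        subject = item["subject"]
        flat[f"rank1_{subject}"] = item["rank1"]
        flat[f"rank2_{subject}"] = item["rank2"]
    return flat
```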

For nested objects in an array, I iterated over the items and it worked.
Following is my sample data in elasticsearch index:
{
"_index": "activity_index",
"_type": "log",
"_id": "AVjx0UTvgHp45Y_tQP6z",
"_version": 4,
"found": true,
"_source": {
"updated": "2016-12-11T22:56:13.548641",
"task_log": [
{
"week_end_date": "2016-12-11",
"log_hours": 16,
"week_start_date": "2016-12-05"
},
{
"week_start_date": "2016-03-21",
"log_hours": 0,
"week_end_date": "2016-03-27"
},
{
"week_start_date": "2016-04-24",
"log_hours": 0,
"week_end_date": "2016-04-30"
}
],
"created": "2016-12-11T22:56:13.548635",
"userid": 895,
"misc": {
},
"current": false,
"taskid": 1023829
}
}
Here is the "Painless" script to iterate over nested objects:
{
"script": {
"lang": "painless",
"inline": """
boolean contains(def x, def y) {
for (item in x) {
if (item['week_start_date'] == y) {
return true;
}
}
return false;
}
if (!contains(ctx._source.task_log, params.start_time_param)) {
ctx._source.task_log.add(params.week_object);
}
""",
"params": {
"start_time_param": "2016-04-24",
"week_object": {
"week_start_date": "2016-04-24",
"week_end_date": "2016-04-30",
"log_hours": 0
}
}
}
}
I used the above script with the update API: POST /activity_index/log/AVjx0UTvgHp45Y_tQP6z/_update
In the script, I defined a function called contains that takes two arguments, and then called it.
The old Groovy-style ctx._source.task_log.contains() will not work, since ES 5.x stores nested objects as separate documents. Hope this helps!
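The update script is idempotent: running it twice with the same start date adds the week only once. A local Python mirror of the check-then-add logic makes that easy to verify. contains follows the Painless function; add_week_if_missing is a hypothetical name, and unlike the Painless script (which assumes task_log already exists) this sketch creates the list if it is missing.

```python
from typing import Any, Dict, List


def contains(task_log: List[Dict[str, Any]], start_date: str) -> bool:
    """Mirror of the Painless contains(): match on week_start_date only."""
    for item in task_log:
        if item["week_start_date"] == start_date:
            return True
    return False


def add_week_if_missing(source: Dict[str, Any], start_time_param: str,
                        week_object: Dict[str, Any]) -> Dict[str, Any]:
    """Append week_object to task_log unless a week with that start date exists."""
    task_log = source.setdefault("task_log", [])
    if not contains(task_log, start_time_param):
        task_log.append(week_object)
    return source
```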

Related

ElasticSearch: search multiple elements in array of object

I'm on Elasticsearch 6.8.22.
I have multiple users and each one has multiple papers ("valid" or not):
{"name":"Amy",
"papers":[
{"type":"idcard", "country":"fr", "valid":"no"},
{"type":"idcard", "country":"us", "valid":"yes"}
]}
{"name":"Brittany",
"papers":[
{"type":"idcard", "country":"fr", "valid":"no"},
{"type":"idcard", "country":"us", "valid":"no"}
]}
{"name":"Chloe",
"papers":[
{"type":"idcard", "country":"fr", "valid":"yes"},
{"type":"idcard", "country":"us", "valid":"no"}
]}
I'm trying to find only users with a paper that is "valid" for "fr":
{"query": {
"bool": {
"filter": [
{"match":{"papers.valid": "yes"}},
{"match":{"papers.country": "fr"}}
]}}}
It returns Chloe, which is fine (she has a paper which is both "valid" and "fr").
But it also returns Amy; because she has one "valid" paper and another one which is "fr".
This is because ES doesn't understand arrays of objects and flattens everything into arrays of values (as far as I understand).
I've tried using "combined term queries" from this link, but I guess that only works for arrays of primitives (not complex objects).
I've seen that I can transform arrays into nested objects to do what I need, but it seems to be overcomplicated and would slow down the queries (because of hidden joins).
My question is:
Is there any way to search for documents whose array of objects contains one object that matches multiple criteria at the same time?
(Originally, I wanted a query that checks whether every paper in the array matches the criteria, e.g. all papers of type "idcard" must be "valid", but that seems impossible.)
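A simplified Python model of the behaviour described above shows why Amy matches: once the array is flattened into one bag of values per field, "yes" and "fr" each appear somewhere, even though no single paper has both. This ignores text analysis and scoring, and the function names are hypothetical; it only contrasts "each criterion matches some element" with "all criteria match the same element".

```python
from typing import Dict, List, Set

Paper = Dict[str, str]


def flatten(papers: List[Paper]) -> Dict[str, Set[str]]:
    """Approximate how a plain (non-nested) object array is indexed:
    one set of values per field path, with element boundaries lost."""
    flat: Dict[str, Set[str]] = {}
    for paper in papers:
        for field, value in paper.items():
            flat.setdefault(f"papers.{field}", set()).add(value)
    return flat


def matches_flattened(papers: List[Paper], criteria: Paper) -> bool:
    """Object mapping: each criterion only needs to match *some* element."""
    flat = flatten(papers)
    return all(value in flat.get(f"papers.{field}", set())
               for field, value in criteria.items())


def matches_nested(papers: List[Paper], criteria: Paper) -> bool:
    """Nested mapping: all criteria must match the *same* element."""
    return any(all(paper.get(field) == value for field, value in criteria.items())
               for paper in papers)
```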
You need to define papers as a nested field in the mapping. Then you can run a nested search on it:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
For example, if your mapping is this:
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"papers": {
"type": "nested",
"properties": {
"type": {
"type": "keyword"
},
"country": {
"type": "keyword"
},
"valid": {
"type": "keyword"
}
}
}
}
}
}
then this query will work:
{
"query": {
"nested": {
"path": "papers",
"query": {
"bool": {
"filter": [
{
"term": {
"papers.valid": "yes"
}
},
{
"term": {
"papers.country": "fr"
}
}
]
}
}
}
}
}
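If the criteria vary at runtime, the same request body can be generated programmatically. A minimal Python sketch, assuming the keyword mapping above so that term queries match exact values; nested_papers_query is a hypothetical helper whose output would be sent as the _search request body.

```python
from typing import Any, Dict


def nested_papers_query(**criteria: str) -> Dict[str, Any]:
    """Build a nested query requiring one papers element to match every criterion."""
    return {
        "query": {
            "nested": {
                "path": "papers",
                "query": {
                    "bool": {
                        # one exact-match term filter per field, all applied
                        # to the same nested papers element
                        "filter": [
                            {"term": {f"papers.{field}": value}}
                            for field, value in criteria.items()
                        ]
                    }
                },
            }
        }
    }
```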

Multikey partial index not used with elemMatch

Consider the following document format, which has an array field tasks holding embedded documents:
{
"foo": "bar",
"tasks": [
{
"status": "sleep",
"id": "1"
},
{
"status": "active",
"id": "2"
}
]
}
There is a partial index on the key tasks.id:
{
"v": 2,
"unique": true,
"key": {
"tasks.id": 1
},
"name": "tasks.id_1",
"partialFilterExpression": {
"tasks.id": {
"$exists": true
}
},
"ns": "zardb.quxcollection"
}
The following $elemMatch query with multiple conditions on the same array element
db.quxcollection.find(
{
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
does not seem to use the index
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"tasks": {
"$elemMatch": {
"$and": [{
"id": {
"$eq": "1"
}
},
{
"status": {
"$not": {
"$eq": "active"
}
}
}
]
}
}
},
"direction": "forward"
}
How can I make the above query use the index? The index does seem to be used with dot notation:
db.quxcollection.find({"tasks.id": "1"})
However, I need the same array element to match multiple conditions, including the status field, and the following does not seem to be equivalent to the above $elemMatch-based query:
db.quxcollection.find({
"tasks.id": "1",
"tasks.status": { "$nin": ["active"] }
})
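The two queries do differ. A simplified Python model of the matching rules (ignoring type coercion and other edge cases; the function names are hypothetical) shows that, for $nin on an array path, the dot-notation form rejects the sample document because *some* task is "active", while $elemMatch accepts it because task "1" itself is not.

```python
from typing import Any, Dict, Iterable


def matches_dot_notation(doc: Dict[str, Any], task_id: str,
                         excluded: Iterable[str]) -> bool:
    """{"tasks.id": task_id, "tasks.status": {"$nin": excluded}}:
    each condition is checked against the whole array independently."""
    excluded_set = set(excluded)
    tasks = doc.get("tasks", [])
    id_ok = any(task.get("id") == task_id for task in tasks)
    # $nin on an array path: no element's status may be in the excluded list
    status_ok = all(task.get("status") not in excluded_set for task in tasks)
    return id_ok and status_ok


def matches_elem_match(doc: Dict[str, Any], task_id: str,
                       excluded: Iterable[str]) -> bool:
    """$elemMatch: a single task must satisfy both conditions."""
    excluded_set = set(excluded)
    return any(task.get("id") == task_id and task.get("status") not in excluded_set
               for task in doc.get("tasks", []))
```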
A partial index uses the path as its key. With $elemMatch, the path is not explicitly in the query; if you check with .explain("allPlansExecution"), you will see that the index is not even considered by the query planner.
To benefit from the index you can specify the path in the query:
db.quxcollection.find(
{
"tasks.id": "1",
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
It duplicates part of the $elemMatch condition, so the index is used to get all documents containing a task with the specific id, and the full $elemMatch condition then filters out, at the fetch stage, documents where that task is "active". I must admit the query doesn't look nice, so maybe add some comments to the code explaining it.
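If the filter is built in application code, a small helper keeps the duplicated condition in sync with the $elemMatch. A minimal Python sketch (indexed_task_query is a hypothetical name); the resulting dict could be passed to a driver call such as pymongo's collection.find().

```python
from typing import Any, Dict, Iterable


def indexed_task_query(task_id: str, excluded_statuses: Iterable[str]) -> Dict[str, Any]:
    """Filter for an index-friendly version of the $elemMatch query.

    The top-level "tasks.id" condition duplicates part of the $elemMatch so
    the planner can use the partial index on tasks.id; the $elemMatch still
    requires one array element to satisfy both conditions.
    """
    return {
        "tasks.id": task_id,
        "tasks": {
            "$elemMatch": {
                "id": {"$eq": task_id},
                "status": {"$nin": list(excluded_statuses)},
            }
        },
    }
```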

Remove elements/objects From Array in ElasticSearch Followed by Matching Query

I'm having issues trying to remove elements/objects from an array in elasticsearch.
This is the mapping for the index:
{
"example1": {
"mappings": {
"doc": {
"properties": {
"locations": {
"type": "geo_point"
},
"postDate": {
"type": "date"
},
"status": {
"type": "long"
},
"user": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
And this is an example document.
{
"_index": "example1",
"_type": "doc",
"_id": "8036",
"_score": 1,
"_source": {
"user": "kimchy8036",
"postDate": "2009-11-15T13:12:00",
"locations": [
[
72.79887719999999,
21.193036000000003
],
[
-1.8262150000000001,
51.178881999999994
]
]
}
}
Using the query below, I can add multiple locations.
POST /example1/_update_by_query
{
"query": {
"match": {
"_id": "3"
}
},
"script": {
"lang": "painless",
"inline": "ctx._source.locations.add(params.newsupp)",
"params": {
"newsupp": [
-74.00,
41.12121
]
}
}
}
But I'm not able to remove array objects from locations. I have tried the query below but it's not working.
POST example1/doc/3/_update
{
"script": {
"lang": "painless",
"inline": "ctx._source.locations.remove(params.tag)",
"params": {
"tag": [
-74.00,
41.12121
]
}
}
}
Kindly let me know what I am doing wrong here. I am using Elasticsearch version 5.5.2.
In Painless scripts, the List remove() method removes by index, not by value.
Here's a working example that removes array elements by value in Elasticsearch script:
POST objects/_update_by_query
{
"query": {
... // use regular ES query to remove only in relevant documents
},
"script": {
"source": """
if (ctx._source[params.array_attribute] != null) {
for (int i=ctx._source[params.array_attribute].length-1; i>=0; i--) {
if (ctx._source[params.array_attribute][i] == params.value_to_remove) {
ctx._source[params.array_attribute].remove(i);
}
}
}
""",
"params": {
"array_attribute": "<NAME_OF_ARRAY_PROPERTY_TO_REMOVE_VALUE_IN>",
"value_to_remove": "<VALUE_TO_REMOVE_FROM_ARRAY>"
}
}
}
You might want to simplify the script if it only needs to remove values from one specific array attribute. For example, to remove "green" from the document's color_list array:
_doc/001 = {
"color_list": ["red", "blue", "green"]
}
Script to remove "green":
POST objects/_update_by_query
{
"query": {
... // use regular ES query to remove only in relevant documents
},
"script": {
"source": """
for (int i=ctx._source.color_list.length-1; i>=0; i--) {
if (ctx._source.color_list[i] == params.color_to_remove) {
ctx._source.color_list.remove(i);
}
}
""",
"params": {
"color_to_remove": "green"
}
}
}
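The backwards loop matters: removing by index shifts every later element one position left, so a forward loop would skip the element right after each removal. A Python mirror of the same loop (remove_all_by_value is a hypothetical name; del values[i] plays the role of Painless remove(i)):

```python
from typing import Any, List


def remove_all_by_value(values: List[Any], target: Any) -> List[Any]:
    """Remove every element equal to target, deleting by index.

    Walking from the last index down means deleting element i never
    shifts an element that has not been visited yet.
    """
    for i in range(len(values) - 1, -1, -1):
        if values[i] == target:
            del values[i]
    return values
```

For the question's geo points, each location is itself a list, so the equality check compares coordinates element by element.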
Unlike add(), remove() takes the index of the element and removes it.
Your ctx._source.locations in painless is an ArrayList. It has List's remove() method:
E remove(int index)
Removes the element at the specified position in this list (optional operation). ...
See Painless API - List for other methods.
See this answer for example code.
"script" : {
"lang":"painless",
"inline":"ctx._source.locations.remove(params.tag)",
"params":{
"tag":indexToRemove
}
}
Just as ctx._source.locations.add(elt) adds an element, ctx._source.locations.remove(indexToRemove) removes the element at that index in the array.
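If the client only knows the value, one option is to look up its index first and send that as params.tag. A minimal Python sketch, assuming the document's current locations array has just been fetched; the helper name is hypothetical, and "inline" matches the 5.5.2 syntax used in the question. Because the index is computed client-side, it is only correct if the document has not changed between the read and the update.

```python
from typing import Any, Dict, List


def build_remove_location_body(locations: List[List[float]],
                               point: List[float]) -> Dict[str, Any]:
    """Build an _update body that removes `point` by its index in `locations`.

    list.index raises ValueError if the point is not present.
    """
    index = locations.index(point)
    return {
        "script": {
            "lang": "painless",
            "inline": "ctx._source.locations.remove(params.tag)",
            "params": {"tag": index},
        }
    }
```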

Add new object inside array of objects, inside array of objects in mongodb

Consider the model below (probably a bad one, as I am totally new to this):
{
"uid": "some-id",
"database": {
"name": "nameOfDatabase",
"collection": [
{
"name": "nameOfCollection",
"fields": {
"0": "field_1",
"1": "field_2"
}
},
{
"name": "nameOfAnotherCollection",
"fields": {
"0": "field_1"
}
}
]
}
}
I have the collection name (i.e. database.collection.name) and a few fields to add to it or delete from it (some already exist under database.collection.fields; I want to add new ones or delete existing ones).
In short, how do I update or delete "fields" when I have the database name and the collection name?
I cannot figure out how to use positional operator $ in this context.
I am using Mongoose's update as:
Model.update(conditions, updates, options, callback);
I don't know what the correct conditions and updates parameters are.
So far I have unsuccessfully tried the following with Model.update:
conditions = {
"uid": req.body.uid,
"database.name": "test",
"database.collection":{ $elemMatch:{"name":req.body.collection.name}}
};
updates = {
$set: {
"fields": req.body.collection.fields
}
};
---------------------------------------------------------
conditions = {
"uid": req.body.uid,
"database.name": "test",
"database.collection.$.name":req.body.collection.name
};
updates = {
$addToSet: {
"fields": req.body.collection.fields
}
};
I tried a lot more, but none of it worked, as I am totally new to this.
I am getting confused between $push, $set and $addToSet: which to use, which not to, and how?
The original schema is supposed to be as shown below, but running queries on it is getting harder and harder.
{
"uid": "some-id",
"database": [
{ //array of database objects
"name": "nameOfDatabase",
"collection": [ //array of collection objects inside respective databases
{
"name": "nameOfCollection",
"fields": { //fields inside a this particular collection
"0": "field_1",
"1": "field_2"
}
}
]
}
]
}
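One possible approach, not part of the original thread, sketched as Python dicts in pymongo style (helper names are hypothetical). For the first model, where database is an object, the positional $ refers to the collection element matched by "database.collection.name" in the filter. For the original nested-array schema, $ only covers one array level, so the sketch uses filtered positional operators with arrayFilters, which require MongoDB 3.6 or later.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def set_fields_single_db(uid: str, db_name: str, coll_name: str,
                         fields: Dict[str, str]) -> Tuple[Doc, Doc]:
    """Filter/update pair for the first model ("database" is an object).

    $set replaces the whole fields object of the matched collection element.
    """
    flt = {"uid": uid, "database.name": db_name, "database.collection.name": coll_name}
    update = {"$set": {"database.collection.$.fields": fields}}
    return flt, update


def set_fields_nested_arrays(uid: str, db_name: str, coll_name: str,
                             fields: Dict[str, str]) -> Tuple[Doc, Doc, List[Doc]]:
    """Filter/update/arrayFilters for the schema with an array of databases."""
    flt = {"uid": uid}
    update = {"$set": {"database.$[db].collection.$[col].fields": fields}}
    array_filters = [{"db.name": db_name}, {"col.name": coll_name}]
    return flt, update, array_filters
```

With pymongo these would be passed as collection.update_one(flt, update) and collection.update_one(flt, update, array_filters=array_filters). To add or delete a single key instead of replacing the whole object, $set or $unset a path such as "database.collection.$.fields.2".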

Cloudant find Query with $and and $or elements

I'm using the following JSON to find results in a Cloudant database:
{
"selector": {
"$and": [
{
"type": {
"$eq": "sensor"
}
},
{
"v": {
"$eq": 2355
}
},
{
"$or": [
{
"p": "#401000103"
},
{
"p": "#401000114"
}
]
},
{
"t_max": {
"$gte": 1459554894
}
},
{
"t_min": {
"$lte": 1459509591
}
}
]
},
"fields": [
"_id",
"p"
],
"limit": 200
}
If I run this against my Cloudant database, I get the following error:
{
"error": "unknown_error",
"reason": "function_clause",
"ref": 3379914628
}
If I remove one of the $or elements (,{"p":"#401000114"}), I get results for the query.
I also get a result if I replace #401000114 with #401000114.
But when I want to use both elements, I get the error above.
Can anybody tell me what this error reason, function_clause, means?
error_reason: function_clause means there was a problem on the server. You should probably reach out to Cloudant Support and see if they can help you with your issue.
I was in contact with Cloudant support.
This is their answer:
The issue affects Cloudant generally.
It affects both multi-tenant and dedicated clusters.
They are working on a solution.
A workaround: if the array to which the $or operator applies has two elements, you can get the correct result by repeating one of the items in the array.
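The workaround can be applied mechanically before sending the query. A Python sketch (apply_or_workaround is a hypothetical helper) that deep-copies a selector and pads every two-element $or by repeating its first item; repeating an alternative does not change the logical meaning of the $or.

```python
import copy
from typing import Any, Dict


def apply_or_workaround(selector: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of selector where each two-element $or list is padded
    to three elements by repeating its first item."""
    fixed = copy.deepcopy(selector)

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "$or" and isinstance(value, list) and len(value) == 2:
                    value.append(copy.deepcopy(value[0]))
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(fixed)
    return fixed
```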
