How to prevent Elasticsearch from flattening 2D arrays in "fields"-containing query

Nested arrays get flattened when they are returned through "fields". I expected values from the same path to be merged, but not the internal data structure to be modified.
Could someone explain whether I am doing something incorrectly, or whether this should be reported as an Elasticsearch issue?
Steps to reproduce:
Create the 2D data
curl -XPOST localhost:9200/test/5 -d '{ "data": [ [100],[2,3],[6,7] ] }'
Query the data, specifying fields
curl -XGET localhost:9200/test/5/_search -d '{"query":{"query_string":{"query":"*"} }, "fields":["data"] }'
Result:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"test","_type":"5","_id":"AVdsHJrepOClGTFyoGqo","_score":1.0,"fields":{"data":[100,2,3,6,7]}}]}}
Repeat without the use of "fields":
curl -XGET localhost:9200/test/5/_search -d '{"query":{"query_string":{"query":"*"} } }'
Result:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"test","_type":"5","_id":"AVdsHJrepOClGTFyoGqo","_score":1.0,"_source":{ "data": [ [100],[2,3],[6,7] ] }}]}}
Notice that _source and fields differ, in that "fields" decomposes the 2D array into a 1D array.

When you specify nothing else in your request, what you get back for each hit is the "_source" object, that is, exactly the JSON you sent to ES during indexing (even including whitespace!).
When you use source filtering, as Andrey suggests, it's the same except that you can include or exclude certain fields.
When you use the "fields" directive in your query, the return values are not taken from the _source but read directly from the Lucene index (see docs). The key in your search response then switches from "_source" to "fields" to reflect this change.
As alkis said:
https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
These docs say up front that, yes, Elasticsearch does flatten arrays.
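To make the difference concrete, here is a minimal Python sketch of the flattening effect. The helper name flatten_values is hypothetical; it only imitates what the "fields" response in the question looks like and is not Elasticsearch code.

```python
def flatten_values(value):
    """Recursively flatten nested lists into one flat list (illustration only)."""
    if not isinstance(value, list):
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten_values(item))
    return flat


# "_source" keeps the JSON exactly as it was sent.
source = {"data": [[100], [2, 3], [6, 7]]}
# "fields" behaves as if every array level had been merged into one.
fields = {"data": flatten_values(source["data"])}
print(fields)  # {'data': [100, 2, 3, 6, 7]}
```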

Instead of specifying "fields", I usually use source filtering.
Your query would change to something like:
curl -XGET <IPADDRESS>:9200/test/5/_search -d '{"_source":{"include": ["data"]}, "query":{"query_string":{"query":"*"} }}'
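If you build the request from code, the same body can be assembled as a plain dictionary. This is only a sketch: the function name source_filtered_body is made up for illustration, and the "include" key is copied from the curl command above rather than checked against any particular Elasticsearch version.

```python
import json


def source_filtered_body(includes, query="*"):
    """Build a search body that restricts _source to the given field names."""
    return {
        "_source": {"include": list(includes)},
        "query": {"query_string": {"query": query}},
    }


# Serialize the body so it can be passed to curl -d or an HTTP client.
body = json.dumps(source_filtered_body(["data"]))
print(body)
```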

From here https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
it seems that elasticsearch considers them the same.
In Elasticsearch, there is no dedicated array type. Any field can contain zero or more values by default, however, all values in the array must be of the same datatype. For instance:
an array of strings: [ "one", "two" ]
an array of integers: [ 1, 2 ]
an array of arrays: [ 1, [ 2, 3 ]] which is the equivalent of [ 1, 2, 3 ]
an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]
You could use an array of JSON objects with the nested data type and a nested query.
The nested data type may be helpful here:
PUT /my_index
PUT /my_index/_mapping/my_type
{
"properties" : {
"data" : {
"type" : "nested",
"properties": {
"value" : {"type": "long" }
}
}
}
}
POST /my_index/my_type
{
"data": [
{ "value": [1, 2] },
{ "value": [3, 4] }
]
}
POST /my_index/my_type
{
"data": [
{ "value": [1, 5] }
]
}
GET /my_index/my_type/_search
{
"query": {
"nested": {
"path": "data",
"query": {
"bool": {
"must": [
{
"match": {
"data.value": 1
}
},
{
"match": {
"data.value": 2
}
}
]
}
}
}
}
}
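A 2D array has to be reshaped before it fits this mapping. The following Python sketch shows one way to do that conversion and to undo it on the client; the helper names are hypothetical, and the "data" and "value" keys are taken from the mapping above.

```python
def rows_to_nested(rows, key="value"):
    """Wrap each inner array in an object so the row boundaries survive indexing."""
    return {"data": [{key: list(row)} for row in rows]}


def nested_to_rows(doc, key="value"):
    """Recover the original 2D array from a document shaped by rows_to_nested."""
    return [obj[key] for obj in doc["data"]]


doc = rows_to_nested([[100], [2, 3], [6, 7]])
print(doc)  # {'data': [{'value': [100]}, {'value': [2, 3]}, {'value': [6, 7]}]}
```

The trade-off is that every row becomes its own nested object, so queries have to go through the nested query shown above.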

Related

Encapsulate a JSON Array inside an object with JOLT?

I work on a project where the output of one of our APIs is a JSON array. I'd like to encapsulate this array inside an object.
I am trying to use a JOLT transformation (this is my first time using the tool) to achieve this. I've already searched through a lot of examples, but I still can't figure out what my JOLT specification needs to be to perform the transformation.
For example, if my input is like this:
[
{
"id": 1,
"name": "foo"
},
{
"id": 2,
"name": "bar"
}
]
I'd like the output to be:
{
"list":
[
{
"id": 1,
"name": "foo"
},
{
"id": 2,
"name": "bar"
}
]
}
In short, I just want to put my array inside a field of another object.
You can use a shift transformation spec such as
[
{
"operation": "shift",
"spec": {
"*": "list[]"
}
}
]
where the "*" wildcard matches each index of the top-level array, and "list[]" appends the matched element to an array named "list".
You can try the spec in the demo at http://jolt-demo.appspot.com/
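For comparison, the effect of this shift spec on the sample input can be written out in a few lines of Python. This is not JOLT code, just a sketch of the expected result; the function name wrap_in_field is hypothetical.

```python
import json


def wrap_in_field(items, field="list"):
    """Place every element of the input array, in order, under a single field."""
    return {field: list(items)}


input_array = json.loads('[{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}]')
output = wrap_in_field(input_array)
print(json.dumps(output, indent=2))
```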

PyMongo - How to compare the given array exactly matches with the document

I have a MongoDB document with the following attributes:
{
"label": [
"ibc",
"ibd",
"ibe"
],
"location": "vochelle st"
}
and I have to return the document only if the document's label exactly matches the given array, i.e. ["ibc","ibd"]. For this, I am using the query:
db.collection.find({"location":"vochelle st","label":{"$all":["ibc", "ibd"]}})
Actual Response:
{
"label": [
"ibc",
"ibd",
"ibe"
],
"location": "vochelle st"
}
Expected Response:
{}
Since the label "ibe" doesn't exist in the given array, the expected result has to be the empty dictionary.
Add $size to your query, so that the array must also have exactly as many elements as the input:
db.collection.find({
location: "vochelle st",
label: {
$all: [
"ibc",
"ibd"
],
$size: 2
}
})
1. Use $setIntersection to intersect the label array with the input array.
2. Use $eq to check that the intersected array (from step 1) equals the label array.
db.collection.find({
"location": "vochelle st",
$expr: {
$eq: [
{
$setIntersection: [
"$label",
[
"ibc",
"ibd"
]
]
},
"$label"
]
}
})
If you want to check whether the array exactly matches your input, including the order of its elements, you don't need any operator; just compare it with your value:
db.collection.find({"location":"vochelle st","label": ["ibc", "ibd"]})
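The $all plus $size filter and the plain equality filter differ in how they treat element order. The Python sketch below builds the first filter as a dictionary of the kind PyMongo's find() accepts, and adds two simplified pure-Python predicates that mimic the matching rules so the difference can be seen without a database. All function names are hypothetical, and the predicates only cover flat arrays of strings.

```python
def all_size_filter(location, wanted):
    """Filter document: label contains every wanted value and has the same length."""
    return {
        "location": location,
        "label": {"$all": list(wanted), "$size": len(wanted)},
    }


def matches_all_and_size(label, wanted):
    """Simplified mirror of $all + $size: element order is ignored."""
    return len(label) == len(wanted) and all(w in label for w in wanted)


def matches_exact(label, wanted):
    """Simplified mirror of {"label": wanted}: same elements in the same order."""
    return list(label) == list(wanted)


wanted = ["ibc", "ibd"]
print(all_size_filter("vochelle st", wanted))
print(matches_all_and_size(["ibc", "ibd", "ibe"], wanted))  # False
print(matches_all_and_size(["ibd", "ibc"], wanted))         # True
print(matches_exact(["ibd", "ibc"], wanted))                # False
```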

Mongodb - Take only one element in nested array

I'm using MongoDB to store my data. My collection consists of a list of objects, each identified by a type and each holding a list of other objects.
An example of my collection is:
[
{
"type": "a",
"properties": [
{
"value": "value_a",
"date": "my_date_a"
},
{
"value": "value_b",
"date": "my_date_b"
},
...
]
},
...
]
Based on the above data structure, I want to retrieve all documents of a given type, taking for each of them only one element of the nested array (reducing the nested list to a list with a single element).
So, given a type "a", an example of the result may be:
[
{
"type": "a",
"properties": [
{
"value": "value_a",
"date": "my_date_a"
}
]
},
...
]
I started with the query { "type": "a" } to filter the documents. But how can I take only one "properties" element? I cannot use the "slice" operator.
Thanks a lot.
I'm assuming from your reference to slice that you're not interested in matching a particular nested element, but rather in getting the value at a fixed index (e.g. 0).
If you're willing to use the aggregation pipeline, you can use $arrayElemAt within a projection. Note that $arrayElemAt yields the element itself, so properties becomes a single object rather than a one-element array:
db.collection.aggregate([
// matches documents with type 'a'
{ $match: { type: 'a' } },
// creates a new document for each
{ $project: {
// that contains the original value for type
type: 1,
// and the first element from the original properties for properties
properties: { $arrayElemAt: [ "$properties", 0 ] }
} }
])
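If the result must keep properties as a one-element list, as in the desired output, one option is to trim the list on the client after fetching the documents. The sketch below does this in plain Python on already-loaded documents; the function name first_property_only and the sample data are hypothetical, and it is a client-side alternative, not a replacement for the pipeline above.

```python
def first_property_only(docs, wanted_type):
    """Keep documents of the given type and trim properties to its first element."""
    result = []
    for doc in docs:
        if doc.get("type") != wanted_type:
            continue
        props = doc.get("properties", [])
        # props[:1] is an empty list when there are no properties at all.
        result.append({"type": doc["type"], "properties": props[:1]})
    return result


docs = [
    {"type": "a", "properties": [
        {"value": "value_a", "date": "my_date_a"},
        {"value": "value_b", "date": "my_date_b"},
    ]},
    {"type": "b", "properties": [{"value": "value_c", "date": "my_date_c"}]},
]
trimmed = first_property_only(docs, "a")
print(trimmed)
```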

Store key-value pair in JSON array of Objects

I am interested in storing a key-value pair of metadata inside a JSON array containing multiple JSON objects. This will tell a generic parser what to do with the list of JSON objects when it processes the array. Below is a sample JSON showing where I am hoping to have some sort of metadata field.
{
"Data": [
<< "metadata":"instructions" here >>
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
What is the proper way to structure this mixed data JSON array?
I would add a meta key as a peer of data like below. This would separate your data from the meta data.
{
"Meta": {
"metadata":"instructions"
},
"Data": [
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
If you can modify the structure of the data, why not add a property meta with your instructions (i.e. Data.meta) and another property content (for want of a better word...) (i.e. Data.content), where the latter is the original array of objects.
That way, it is still valid JSON, and other implementations can read the meta-field as well without much ado.
Edit: just realized, you would also have to make Data an object rather than array. Then your JSON-schema should become this:
{
"Data": {
"metadata": "instructions here",
"content": [
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
}
This will probably be the most stable, maintainable and portable solution.
For reference, something similar has already been asked before.
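With Data as an object, a generic parser can read the instructions and the records separately without inspecting the array contents. A minimal Python sketch, assuming the "metadata" and "content" keys from the example above; the function name read_payload is hypothetical.

```python
import json


def read_payload(text):
    """Return (metadata, records) from a payload shaped like the example above."""
    data = json.loads(text)["Data"]
    return data["metadata"], data["content"]


payload = """
{
  "Data": {
    "metadata": "instructions here",
    "content": [
      {"foo": 1, "bar": "barString"},
      {"foo": 3, "bar": "fooString"}
    ]
  }
}
"""
metadata, records = read_payload(payload)
print(metadata, len(records))
```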
After some additional discussion with another developer, we thought of one way to include the metadata instructions in the data JSON array.
{
"Data": [
{
"metadata": "Instructions"
},
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
This approach does come with the limitation that index 0 of the data JSON array MUST contain a JSON Object containing the metadata and associated instructions for the generic parser. Failure to include this metadata object as index 0 would trigger an error case that the generic parser would need to handle. So it does have its trade-offs.
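The error case described here could be handled along the following lines. This is a sketch under the stated convention only (index 0 holds an object with a "metadata" key); the function name split_metadata and the choice of ValueError are assumptions.

```python
def split_metadata(data):
    """Split a Data array into (metadata, records), enforcing metadata at index 0."""
    if not data or not isinstance(data[0], dict) or "metadata" not in data[0]:
        raise ValueError("index 0 of Data must be the metadata object")
    return data[0]["metadata"], data[1:]


data = [
    {"metadata": "Instructions"},
    {"foo": 1, "bar": "barString"},
    {"foo": 3, "bar": "fooString"},
]
instructions, records = split_metadata(data)
print(instructions, records)
```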
I'll try to help. You could name the array itself "metadata":
"metadata" : [
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]

Restheart query for an nested array subdocument

I'm working with MongoDB and RESTHeart.
In my NoSQL db I have a single document with this structure:
{
"_id": "docID",
"users": [
{
"userID": "12",
"elements": [
{
"elementID": "1492446877599",
"events": [
{
"event1": "one"
},
{
"event2": "two"
}
]
}
]
},
{
"userID": "11",
"elements": [
{
"elementID": "14924",
"events": [
{
"event1": "one"
},
{
"event2": "two"
}
]
}
]
}
]
}
How can I build a URL query in order to get the user with id 11?
Using mongo shell it should be something like this one:
db.getCollection('collection').find({},{'users':{'$elemMatch':{'userID':'12'}}}).pretty()
I cannot find anything similar on restheart.
Could someone help me?
Using this
http://myHost:port/documents/docID?filter={%27users%27:{%27$elemMatch%27:{%27userID%27:%2712%27}}}
RESTHeart returns the whole document, with both users (userID 11 and 12).
Your request is against a document resource, i.e. the URL is http://myHost:port/documents/docID
The filter query parameter applies for collection requests, i.e. URLs such as http://myHost:port/documents
In any case you need a projection (the keys query parameter) to limit the returned properties.
You should be able to achieve it with the following request (I haven't tried it) using the $elemMatch projection operator:
http://myHost:port/documents?keys={"users":{"$elemMatch":{"userID":"12"}}}
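When the URL is built from code, the JSON in the keys parameter has to be percent-encoded. The Python sketch below uses only the standard library to do that; the function name build_keys_url is hypothetical, and http://myHost:port/documents is the placeholder from the question, not a real endpoint.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_keys_url(base_url, user_id):
    """Append a percent-encoded keys projection that selects one user by userID."""
    projection = {"users": {"$elemMatch": {"userID": user_id}}}
    query = urlencode({"keys": json.dumps(projection, separators=(",", ":"))})
    return f"{base_url}?{query}"


url = build_keys_url("http://myHost:port/documents", "11")
print(url)

# Decoding the query string again gives back the original projection JSON.
decoded = parse_qs(url.split("?", 1)[1])["keys"][0]
print(decoded)
```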
