Encapsulate a JSON Array inside an object with JOLT?

I'm working on a project where the output of one of our APIs is a JSON array. I'd like to encapsulate this array inside an object.
I'm trying to use a JOLT transformation to achieve this (this is the first time I've used this tool). I've already searched through a lot of examples, but I still can't figure out what my JOLT specification should be to perform the transformation.
For example, if my input is like this:
[
{
"id": 1,
"name": "foo"
},
{
"id": 2,
"name": "bar"
}
]
I'd like the output to be:
{
"list":
[
{
"id": 1,
"name": "foo"
},
{
"id": 2,
"name": "bar"
}
]
}
In short, I just want to put my array inside a field of another object.

You can use a shift transformation spec such as
[
{
"operation": "shift",
"spec": {
"*": "list[]"
}
}
]
where the "*" wildcard matches each index of the top-level array, and "list[]" appends each matched element to an array under the key list.
You can try the spec on the JOLT demo site: http://jolt-demo.appspot.com/
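To make the effect of that spec concrete, here is a plain-JavaScript sketch of what it does for this one input shape. It is not JOLT itself and only mirrors the single rule `"*": "list[]"`; the function name `wrapInList` is hypothetical.

```javascript
// Mimics, for this one case only, the JOLT shift spec { "*": "list[]" }:
// every element matched by "*" (each index of the top-level array) is
// appended to an output array stored under the key "list".
function wrapInList(input) {
  const output = { list: [] };
  for (const element of input) {
    output.list.push(element); // "list[]" means: append to the "list" array
  }
  return output;
}

const input = [
  { id: 1, name: "foo" },
  { id: 2, name: "bar" },
];
console.log(JSON.stringify(wrapInList(input)));
// {"list":[{"id":1,"name":"foo"},{"id":2,"name":"bar"}]}
```

The order of the elements is preserved because each one is appended in turn, which matches the desired output in the question.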

Related

MongoDB Array Query - Single out an array element

I am having trouble querying a MongoDB collection whose documents contain an array.
Here is the structure of my collection that I am querying. This is one record:
{
"_id": "abc123def4567890",
"profile_id": "abc123def4567890",
"image_count": 2,
"images": [
{
"image_id": "ABC123456789",
"image_url": "images/something.jpg",
"geo_loc": "-0.1234,11.234567890",
"title": "A Title",
"shot_time": "01:23:33",
"shot_date": "11/22/2222",
"shot_type": "scenery",
"conditions": "cloudy",
"iso": 16,
"f": 2.4,
"ss": "1/545",
"focal": 6.0,
"equipment": "",
"instructions": "",
"upload_date": 1234567890,
"update_date": 1234567890
},
{
"image_id": "ABC123456789",
"image_url": "images/something.jpg",
"geo_loc": "-0.1234,11.234567890",
"title": "A Title",
"shot_time": "01:23:33",
"shot_date": "11/22/2222",
"shot_type": "portrait",
"conditions": "cloudy",
"iso": "16",
"f": "2.4",
"ss": "1/545",
"focal": "6.0",
"equipment": "",
"instructions": "",
"upload_date": 1234567890,
"update_date": 1234567890
}
]
}
Forgive the formatting, I didn't know how else to show this.
As you can see, it's a profile with a series of images in an array called 'images', and there are 2 images. Each item of the 'images' array is an object holding the attributes of one image (url, title, type, etc.).
All I want to do is to return the object element whose attributes match certain criteria:
Select object from images which has shot_type = "scenery"
I tried to make it as simple as possible, so I started with:
find( { "images.shot_type": "scenery" } )
This returns the entire record, including both images. So I tried a projection, but I could not isolate the single matching object within the array (in this case the object at position 0) and return it.
I think the answer lies with projection but I am unsure.
I have gone through the MongoDB documentation for hours now and can't find inspiration. I have read about $elemMatch, $, and the other array operators, but nothing seems to let you single out an array item based on the data within it. I have been through this page too, https://docs.mongodb.com/manual/tutorial/query-arrays/, and still can't work it out.
Can anyone provide help?
Have I made an error by using '$push' to populate my images field (making it an array) instead of '$set', which would have made it an embedded document? Would this have made a difference?
Using aggregation:
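db.collection.aggregate([
  // keep only documents that have at least one scenery image
  { $match: { "images.shot_type": "scenery" } },
  {
    $project: {
      _id: 0,
      // keep only the array elements whose shot_type is "scenery"
      "result": {
        $filter: {
          input: "$images",
          as: "img",
          cond: { $eq: [ "$$img.shot_type", "scenery" ] }
        }
      }
    }
  }
])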
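For intuition, the following plain-JavaScript sketch computes what the $project/$filter stage produces for each document (it does not model the $match stage). The function name `filterImages` is hypothetical, and the document is a trimmed-down version of the one in the question with most image fields omitted.

```javascript
// Plain-JS sketch of the $project + $filter stage: for each document, keep
// only the images whose shot_type equals the requested value. Like _id: 0,
// the output contains nothing but the "result" field.
function filterImages(docs, shotType) {
  return docs.map((doc) => ({
    result: doc.images.filter((img) => img.shot_type === shotType),
  }));
}

// Trimmed-down version of the question's document (most fields omitted).
const docs = [
  {
    _id: "abc123def4567890",
    images: [
      { image_id: "ABC123456789", shot_type: "scenery" },
      { image_id: "ABC123456789", shot_type: "portrait" },
    ],
  },
];
console.log(JSON.stringify(filterImages(docs, "scenery")));
// [{"result":[{"image_id":"ABC123456789","shot_type":"scenery"}]}]
```

Unlike a find() projection, $filter keeps every matching element, so a profile with several scenery images would get all of them back.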
You can use $elemMatch in this way (simplified query):
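db.collection.find(
  // query filter: only documents with this profile_id
  { "profile_id": "1" },
  // projection: only the first element of images whose shot_type is "scenery"
  {
    "images": {
      "$elemMatch": {
        "shot_type": "scenery"
      }
    }
  }
)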
You can pass two objects to find(). The first is the query filter: it returns only the documents whose profile_id is "1". You can drop that condition and pass an empty { } if you want to search the entire collection.
The second object is the projection: it uses $elemMatch to return only the first element of images whose shot_type is "scenery".
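The key property of $elemMatch in a projection is that it returns at most one element: the first one that matches. The plain-JavaScript sketch below models that behaviour. `projectElemMatch` and the three-image document are hypothetical, and leaving the field out entirely when nothing matches is our understanding of MongoDB's behaviour, stated here as an assumption.

```javascript
// Sketch of $elemMatch used in a find() projection: only the FIRST array
// element satisfying the predicate is returned, wrapped in a one-element array.
// Assumption: when no element matches, the field is omitted entirely.
function projectElemMatch(doc, field, predicate) {
  const first = doc[field].find(predicate);
  const projected = { _id: doc._id }; // _id is included unless excluded
  if (first !== undefined) {
    projected[field] = [first];
  }
  return projected;
}

// Hypothetical document with two scenery images, to show only one comes back.
const doc = {
  _id: "abc123def4567890",
  profile_id: "1",
  images: [
    { image_id: "A", shot_type: "portrait" },
    { image_id: "B", shot_type: "scenery" },
    { image_id: "C", shot_type: "scenery" },
  ],
};
const isScenery = (img) => img.shot_type === "scenery";
console.log(JSON.stringify(projectElemMatch(doc, "images", isScenery)));
// {"_id":"abc123def4567890","images":[{"image_id":"B","shot_type":"scenery"}]}
```

If you need every matching element rather than just the first, the $filter aggregation from the previous answer is the better fit.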

Mongodb - Take only one element in nested array

I'm using MongoDB to store my data. My collection consists of documents, each identified by a type and holding a list of other objects (properties).
An example of my collection is:
[
{
"type": "a",
"properties": [
{
"value": "value_a",
"date": "my_date_a"
},
{
"value": "value_b",
"date": "my_date_b"
},
...
]
},
...
]
Based on the above data structure, I want to retrieve all documents of a given type, keeping for each of them only one element of the nested array (reducing the nested list to a list of a single element).
So, given a type "a", an example of the result may be:
[
{
"type": "a",
"properties": [
{
"value": "value_a",
"date": "my_date_a"
}
]
},
...
]
I started with the query { "type": "a" } to filter the documents. But how can I take only one "properties" element? I cannot use the "slice" operator.
Thanks a lot.
I'm assuming from your reference to slice that you're not interested in matching a particular nested element, but rather in getting the value at a fixed index (e.g., 0).
If you're willing to use the aggregation pipeline, you can use $arrayElemAt within a projection:
db.collection.aggregate([
  // match documents with type 'a'
  { $match: { type: 'a' } },
  // reshape each matched document
  { $project: {
    // keep the original value of type
    type: 1,
    // replace properties with its first element
    // (the element itself, not a one-element array)
    properties: { $arrayElemAt: [ "$properties", 0 ] }
  } }
])
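The two pipeline stages can be mimicked in plain JavaScript to see the resulting shape. This sketch ignores _id, and `firstPropertyPerDoc` and the type "b" document are hypothetical.

```javascript
// Plain-JS sketch of the pipeline: $match on type, then $project keeping
// type and replacing properties with the element at index 0.
// _id handling is ignored here for brevity.
function firstPropertyPerDoc(docs, type) {
  return docs
    .filter((doc) => doc.type === type) // { $match: { type: 'a' } }
    .map((doc) => ({
      type: doc.type,                   // type: 1
      properties: doc.properties[0],    // { $arrayElemAt: [ "$properties", 0 ] }
    }));
}

const docs = [
  {
    type: "a",
    properties: [
      { value: "value_a", date: "my_date_a" },
      { value: "value_b", date: "my_date_b" },
    ],
  },
  { type: "b", properties: [{ value: "value_c", date: "my_date_c" }] }, // hypothetical
];
console.log(JSON.stringify(firstPropertyPerDoc(docs, "a")));
// [{"type":"a","properties":{"value":"value_a","date":"my_date_a"}}]
```

Note that properties comes back as a single object, not the one-element array shown in the question. Wrapping the $arrayElemAt expression in square brackets inside $project should produce the array form, but that is worth verifying against your MongoDB version.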

Store key-value pair in JSON array of Objects

I am interested in storing key-value pairs of metadata inside a JSON array containing multiple JSON objects. The metadata will tell a generic parser what to do with the list of JSON objects when it processes the array. Below is a sample JSON with a placeholder where I am hoping to have some sort of metadata field.
{
"Data": [
<< "metadata":"instructions" here >>
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
What is the proper way to structure this mixed data JSON array?
I would add a Meta key as a sibling of Data, as below. This separates your data from the metadata.
{
"Meta": {
"metadata":"instructions"
},
"Data": [
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
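With Meta as a sibling of Data, a parser can read each part by key without inspecting the array contents. The sketch below is a hypothetical parser entry point; the name `readPayload` and the returned shape are illustrative, not part of any library.

```javascript
// Hypothetical generic-parser entry point for the layout above, where Meta
// sits next to Data. Metadata and records are read from separate keys.
function readPayload(text) {
  const payload = JSON.parse(text);
  if (!Array.isArray(payload.Data)) {
    throw new Error("expected Data to be an array");
  }
  const meta = payload.Meta ?? {}; // Meta is optional in this sketch
  return { instructions: meta.metadata, records: payload.Data };
}

const text = JSON.stringify({
  Meta: { metadata: "instructions" },
  Data: [
    { foo: 1, bar: "barString" },
    { foo: 3, bar: "fooString" },
  ],
});
const { instructions, records } = readPayload(text);
console.log(instructions, records.length); // instructions 2
```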
If you can modify the structure of the data, why not add a property metadata with your instructions (i.e. Data.metadata) and another property content (for want of a better word...) (i.e. Data.content), where the latter is the original array of objects.
That way, it is still valid JSON, and other implementations can read the metadata field as well without much ado.
Edit: I just realized you would also have to make Data an object rather than an array. Your JSON structure would then become:
{
"Data": {
"metadata": "instructions here",
"content": [
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
}
This will probably be the most stable, maintainable and portable solution.
For reference, something similar has been asked before.
After some additional discussion with another developer, we thought of one way to include the metadata instructions in the data JSON array.
{
"Data": [
{
"metadata": "Instructions"
},
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}
This approach does come with the limitation that index 0 of the data JSON array MUST contain a JSON Object containing the metadata and associated instructions for the generic parser. Failure to include this metadata object as index 0 would trigger an error case that the generic parser would need to handle. So it does have its trade-offs.
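The index-0 convention, and the error case it creates, can be sketched as follows. `readPayloadWithHeader` is a hypothetical name, and treating "an object with a metadata key" as a valid header is an assumption made for illustration.

```javascript
// Sketch of the index-0 convention: Data[0] must be the metadata object and
// the remaining elements are the records. A missing or malformed header
// raises an error, which is the case the generic parser has to handle.
function readPayloadWithHeader(text) {
  const data = JSON.parse(text).Data;
  if (!Array.isArray(data)) {
    throw new Error("expected Data to be an array");
  }
  const [header, ...records] = data;
  if (header === null || typeof header !== "object" || !("metadata" in header)) {
    throw new Error("Data[0] must be an object with a metadata field");
  }
  return { instructions: header.metadata, records };
}

const ok = readPayloadWithHeader(JSON.stringify({
  Data: [
    { metadata: "Instructions" },
    { foo: 1, bar: "barString" },
    { foo: 3, bar: "fooString" },
  ],
}));
console.log(ok.instructions, ok.records.length); // Instructions 2
```

Compared with a separate Meta key, this puts the burden of validating the header on every consumer of the array.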
I will try to help you:
{
"metadata": [
{
"foo": 1,
"bar": "barString"
},
{
"foo": 3,
"bar": "fooString"
}
]
}

How to sort in array of object - CouchBase

I want to sort an array of objects, so I use the ARRAY_SORT function. It appears to sort by the first field of each object, and it works well if every object has the same JSON structure. If one element in the array has a different structure, the result is incorrect.
The query I use:
SELECT ARRAY_SORT(c.student) as student FROM Class c
Result:
"student": [
{
"id": 3,
"name": "Kenny35"
},
{
"id": 6,
"name": "Kenny35"
},
{
"id": 7,
"name": "Kenny35"
},
{
"id": 8,
"name": "Kenny35"
},
{
"hobby": "video game",
"id": 5,
"name": "Kenny35"
}
]
How can I specify which property of the objects in the array ARRAY_SORT should sort by?
dev,
Objects are compared first by their length (number of fields), then by the fields in the object.
http://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/comparisonops.html
That is the only collation supported now.
-Prasad
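The following rough sketch of that collation reproduces the ordering in the question's result: the object with the extra hobby field sorts last, even though its id is 5. Beyond "size first, then fields", the details here are assumptions: equal-sized objects are compared field by field in sorted key order (key name, then value), and only numbers and strings are handled, whereas N1QL's real collation covers every JSON type.

```javascript
// Rough sketch of the object collation described above (assumptions noted):
// 1. objects with fewer fields sort first;
// 2. equal-sized objects are compared field by field in sorted key order,
//    first by key name, then by value.
function compareValues(a, b) {
  if (typeof a !== typeof b) {
    return typeof a === "number" ? -1 : 1; // numbers before strings
  }
  return a < b ? -1 : a > b ? 1 : 0;
}

function compareObjects(x, y) {
  const kx = Object.keys(x).sort();
  const ky = Object.keys(y).sort();
  if (kx.length !== ky.length) return kx.length - ky.length;
  for (let i = 0; i < kx.length; i++) {
    if (kx[i] !== ky[i]) return kx[i] < ky[i] ? -1 : 1;
    const c = compareValues(x[kx[i]], y[ky[i]]);
    if (c !== 0) return c;
  }
  return 0;
}

const students = [
  { id: 8, name: "Kenny35" },
  { hobby: "video game", id: 5, name: "Kenny35" },
  { id: 3, name: "Kenny35" },
  { id: 7, name: "Kenny35" },
  { id: 6, name: "Kenny35" },
];
const sorted = [...students].sort(compareObjects);
console.log(sorted.map((s) => s.id).join(",")); // 3,6,7,8,5
```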
Alternatively, you can UNNEST the array and use ORDER BY:
SELECT *
FROM Class c
UNNEST c.student s
ORDER BY ...
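In plain-JavaScript terms, ordering the unnested rows by one chosen property looks like the sketch below. The answer leaves the ORDER BY expression open, so `id` is an assumed sort key, and `sortByKey` is a hypothetical helper. Elements missing the key are not handled.

```javascript
// Sketch of ordering array elements by one chosen property, as UNNEST plus
// ORDER BY would, instead of by whole-object collation.
// "id" below is an assumed sort key; elements lacking it are not handled.
function sortByKey(items, key) {
  return [...items].sort((a, b) => (a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0));
}

const students = [
  { id: 8, name: "Kenny35" },
  { hobby: "video game", id: 5, name: "Kenny35" },
  { id: 3, name: "Kenny35" },
  { id: 7, name: "Kenny35" },
  { id: 6, name: "Kenny35" },
];
console.log(sortByKey(students, "id").map((s) => s.id).join(",")); // 3,5,6,7,8
```

Here the object with the extra hobby field lands in id order, because only the chosen property is compared.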

How to prevent Elasticsearch from flattening 2D arrays in "fields"-containing query

Nested arrays get flattened when represented in "fields". I expect values from the same path to be merged, but the internal data structure not to be modified.
Could someone explain whether I am doing something incorrectly, or whether this belongs as an Elasticsearch issue?
Steps to reproduce:
Create the 2D data
curl -XPOST localhost:9200/test/5 -d '{ "data": [ [100],[2,3],[6,7] ] }'
Query the data, specifying fields
curl -XGET localhost:9200/test/5/_search -d '{"query":{"query_string":{"query":"*"} }, "fields":["data"] }'
Result:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"test","_type":"5","_id":"AVdsHJrepOClGTFyoGqo","_score":1.0,"fields":{"data":[100,2,3,6,7]}}]}}
Repeat without the use of "fields":
curl -XGET localhost:9200/test/5/_search -d '{"query":{"query_string":{"query":"*"} } }'
Result:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"test","_type":"5","_id":"AVdsHJrepOClGTFyoGqo","_score":1.0,"_source":{ "data": [ [100],[2,3],[6,7] ] }}]}}
Notice that _source and fields differ: "fields" decomposes the 2D array into a 1D array.
When you specify nothing else in your request, what you get back for each hit is the "_source" object, that is, exactly the JSON you sent to ES during indexing (even including whitespace!).
When you use source filtering, as Andrey suggests, it's the same except you can include or exclude certain fields.
When you use the "fields" directive in your query, the returned values are not taken from _source but read directly from the Lucene index (see docs). The key in your search response switches from "_source" to "fields" to reflect this change.
As alkis said:
https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
These docs say up front that, yes, Elasticsearch does flatten arrays.
Instead of specifying "fields", I usually use source filtering.
Your query would change to something like:
curl -XGET <IPADDRESS>:9200/test/5/_search -d '{"_source":{"include": ["data"]}, "query":{"query_string":{"query":"*"} }}'
According to https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html, Elasticsearch considers them the same:
In Elasticsearch, there is no dedicated array type. Any field can contain zero or more values by default, however, all values in the array must be of the same datatype. For instance:
an array of strings: [ "one", "two" ]
an array of integers: [ 1, 2 ]
an array of arrays: [ 1, [ 2, 3 ]] which is the equivalent of [ 1, 2, 3 ]
an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]
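The flattening described in that quote can be pictured with a small plain-JavaScript sketch. It models only the shape of the indexed values (what "fields" returns in the question), not how Elasticsearch stores them; `indexedValues` is a hypothetical name. _source, by contrast, is the original JSON and keeps the nesting.

```javascript
// Sketch of the flattening described in the docs: however deeply arrays are
// nested, the values of a field form one flat list.
function indexedValues(value) {
  return Array.isArray(value) ? value.flat(Infinity) : [value];
}

console.log(JSON.stringify(indexedValues([1, [2, 3]])));              // [1,2,3]
console.log(JSON.stringify(indexedValues([[100], [2, 3], [6, 7]]))); // [100,2,3,6,7]
```

The second call matches the "fields" output in the question, which is why the nested structure cannot be recovered from "fields".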
You could use an array of JSON objects with the nested data type and a nested query.
Maybe the nested data type could be helpful:
PUT /my_index
PUT /my_index/_mapping/my_type
{
"properties" : {
"data" : {
"type" : "nested",
"properties": {
"value" : {"type": "long" }
}
}
}
}
POST /my_index/my_type
{
"data": [
{ "value": [1, 2] },
{ "value": [3, 4] }
]
}
POST /my_index/my_type
{
"data": [
{ "value": [1, 5] }
]
}
GET /my_index/my_type/_search
{
"query": {
"nested": {
"path": "data",
"query": {
"bool": {
"must": [
{
"match": {
"data.value": 1
}
},
{
"match": {
"data.value": 2
}
}
]
}
}
}
}
}
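To see why the nested mapping helps, here is a plain-JavaScript sketch contrasting nested and flattened matching for "data.value contains 1 AND contains 2". It models the semantics only, not Elasticsearch internals. The first two documents are the ones indexed above; the third is hypothetical, added to show where the two behaviours differ.

```javascript
// Plain-JS sketch of the matching semantics for "contains 1 AND contains 2".
const docs = [
  { data: [{ value: [1, 2] }, { value: [3, 4] }] },
  { data: [{ value: [1, 5] }] },
  { data: [{ value: [1] }, { value: [2] }] }, // hypothetical
];
const wanted = [1, 2];

// nested: all conditions must hold within the SAME nested object
function nestedMatch(doc) {
  return doc.data.some((obj) => wanted.every((v) => obj.value.includes(v)));
}

// flattened (plain object mapping): values are merged across objects first
function flattenedMatch(doc) {
  const all = doc.data.flatMap((obj) => obj.value);
  return wanted.every((v) => all.includes(v));
}

console.log(JSON.stringify(docs.map(nestedMatch)));    // [true,false,false]
console.log(JSON.stringify(docs.map(flattenedMatch))); // [true,false,true]
```

The third document matches only under flattening, because 1 and 2 come from different objects; the nested query keeps each object's values together.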
