Jolt to Transform JSON Array - arrays

I want to use Jolt to transform a JSON dataset. The problem is that my entire dataset is treated like an array because it is originally transformed from XML. Here is an example of the first 3 records:
{
"XMLSOCCER.COM" : { "Team" :[{
"Team_Id" : "45",
"Name" : "Aberdeen",
"Country" : "Scotland",
"Stadium" : "Pittodrie Stadium",
"HomePageURL" : "http://www.afc.co.uk",
"WIKILink" : "http://en.wikipedia.org/wiki/Aberdeen_F.C.",
"Capacity" : "20866",
"Manager" : "Derek McInnes"
},{
"Team_Id" : "46",
"Name" : "St Johnstone",
"Country" : "Scotland",
"Stadium" : "McDiarmid Park",
"HomePageURL" : "http://www.perthstjohnstonefc.co.uk",
"WIKILink" : "http://en.wikipedia.org/wiki/St._Johnstone_F.C."
},{
"Team_Id" : "47",
"Name" : "Motherwell",
"Country" : "Scotland",
"Stadium" : "Fir Park Stadium",
"HomePageURL" : "http://www.motherwellfc.co.uk",
"WIKILink" : "http://en.wikipedia.org/wiki/Motherwell_F.C."
}]}}
For a single record-set, I can use this spec which gives me the correct output:
[
{
"operation": "shift",
"spec": {
"XMLSOCCER.COM": {
"Team": {
"Team_Id": "Team_Id",
"Name": "Name",
"Country": "Country",
"Stadium": "Stadium",
"Capacity": "Capacity",
"Manager": "Manager"
}
}
}}]
But because my entire dataset is treated as a JSON array (an array under "Team"), I cannot figure out how to write a spec that works with this structure. I appreciate any input. Thanks!

Spec: Match every element of the Team array with "*", then use &1 to reference that element's index in the Team array for each key in the output.
[
{
"operation": "shift",
"spec": {
"XMLSOCCER.COM": {
"Team": {
"*": {
"Team_Id": "soccer[&1].Team_Id",
"Name": "soccer[&1].Name",
"Country": "soccer[&1].Country",
"Stadium": "soccer[&1].Stadium",
"Capacity": "soccer[&1].Capacity",
"Manager": "soccer[&1].Manager"
}
}
}
}
}
]
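To make the effect of "*" and &1 concrete, here is a minimal Python sketch that reproduces the same reshaping by hand. It is not Jolt itself: the function name shift_teams and the trimmed sample input are assumptions made purely for illustration.

```python
# Hand-written emulation of the shift spec above (not Jolt itself).
# "*" visits every element of the Team array; &1 is that element's index.
FIELDS = ["Team_Id", "Name", "Country", "Stadium", "Capacity", "Manager"]

def shift_teams(doc):
    teams = doc["XMLSOCCER.COM"]["Team"]
    soccer = []
    for team in teams:  # the position in the list plays the role of &1
        # Copy only the keys named in the spec; keys missing from the input are skipped.
        soccer.append({key: team[key] for key in FIELDS if key in team})
    return {"soccer": soccer}

# Trimmed, hypothetical sample based on the data in the question.
sample = {
    "XMLSOCCER.COM": {
        "Team": [
            {"Team_Id": "45", "Name": "Aberdeen", "Capacity": "20866",
             "HomePageURL": "http://www.afc.co.uk"},
            {"Team_Id": "46", "Name": "St Johnstone"},
        ]
    }
}

result = shift_teams(sample)
print(result)
```

Each input team ends up at the same index under "soccer", and fields that the spec does not list (such as HomePageURL) are dropped.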

How do I create a MongoDB aggregate to lookup and add fields using ObjectIds in array objects

Using Mongo 4.4
I'm looking to do lookups across collections and add a human-readable value from the target collection to the source collection using an aggregate.
This works fine for individual values, but for some lookups the ObjectIds are in objects in arrays, and I can't get that to work. I can pull all the values back, but not place the individual values in the array objects.
In this test case, I have a library database with a books collection and a subscribers collection. The subscribers have a checkouts entry, which is an array of objects, each containing a reference to a book and the checkout date. I want to add the book title to each object in the array.
Test Database:
books collection:
[
{
"_id" : ObjectId("63208c9f0d97eff0cfbefde6"),
"title" : "There and back again",
"author" : "Bilbo Baggins",
"publisher" : "Middle Earth Books"
},
{
"_id" : ObjectId("63208cd10d97eff0cfbeff02"),
"title" : "Two Towers",
"author" : "JRR Tolkin",
"publisher" : "Dude Books"
},
{
"_id" : ObjectId("63208cf10d97eff0cfbeffa3"),
"title" : "Dune",
"author" : "Frank Herbert",
"publisher" : "Classic Books"
},
{
"_id" : ObjectId("63208d1d0d97eff0cfbf0087"),
"title" : "Old Man's War",
"author" : "John Scalzi",
"publisher" : "Old Man Books"
}
]
subscribers collection:
[
{
"_id" : ObjectId("63208c2e0d97eff0cfbefb46"),
"name" : "Tom",
"checkouts" : [
{
"bookId" : ObjectId("63208cd10d97eff0cfbeff02"),
"checkoutDate" : ISODate("2022-01-01T21:21:20.202Z")
},
{
"bookId" : ObjectId("63208d1d0d97eff0cfbf0087"),
"checkoutDate" : ISODate("2022-01-02T21:22:20.202Z")
}
],
"address" : "123 Somewhere"
},
{
"_id" : ObjectId("63208c4e0d97eff0cfbefc1f"),
"name" : "Bob",
"checkouts" : [],
"address" : "123 Somewhere"
},
{
"_id" : ObjectId("63208c640d97eff0cfbefc9a"),
"name" : "Mary",
"checkouts" : [],
"address" : "123 Somewhere Else"
}
]
Desired Output for user Tom:
{
"_id" : ObjectId("63208c2e0d97eff0cfbefb46"),
"name" : "Tom",
"checkouts" : [
{
"bookId" : ObjectId("63208cd10d97eff0cfbeff02"),
"checkoutDate" : ISODate("2022-01-01T21:21:20.202Z"),
"title" : "Two Towers"
},
{
"bookId" : ObjectId("63208d1d0d97eff0cfbf0087"),
"checkoutDate" : ISODate("2022-01-02T21:22:20.202Z"),
"title" : "Old Man's War"
}
],
"address" : "123 Somewhere"
}
Using this aggregate:
db.getCollection('subscribers').aggregate([
{$match: {_id: ObjectId("63208c2e0d97eff0cfbefb46") } },
{$lookup: {from: "books", localField: "checkouts.bookId", foreignField: "_id", as: "book_tmp_field" }},
{$addFields: { "checkouts.title": "$book_tmp_field.title"}},
{$project: { book_tmp_field: 0}}
])
This is the closest I can get:
{
"_id" : ObjectId("63208c2e0d97eff0cfbefb46"),
"name" : "Tom",
"checkouts" : [
{
"bookId" : ObjectId("63208cd10d97eff0cfbeff02"),
"checkoutDate" : ISODate("2022-01-01T21:21:20.202Z"),
"title" : [
"Two Towers",
"Old Man's War"
]
},
{
"bookId" : ObjectId("63208d1d0d97eff0cfbf0087"),
"checkoutDate" : ISODate("2022-01-02T21:22:20.202Z"),
"title" : [
"Two Towers",
"Old Man's War"
]
}
],
"address" : "123 Somewhere"
}
Before performing the lookup, you should $unwind the checkouts array so that each checkout is looked up on its own. After all the processing is done, $group the documents back together to rebuild the checkouts array. Finally, reshape the result into your desired output document. Like this:
db.subscribers.aggregate([
{
$match: {
_id: ObjectId("63208c2e0d97eff0cfbefb46")
}
},
{
"$unwind": "$checkouts"
},
{
$lookup: {
from: "books",
localField: "checkouts.bookId",
foreignField: "_id",
as: "book_tmp_field"
}
},
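{
$addFields: {
// $lookup always returns an array, so take its first (only) match to get a plain string
"checkouts.title": { $arrayElemAt: ["$book_tmp_field.title", 0] }
}
},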
{
$project: {
book_tmp_field: 0
}
},
{
"$group": {
"_id": {
_id: "$_id",
address: "$address",
name: "$name"
},
"checkouts": {
"$push": "$checkouts"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$_id",
{
checkouts: "$checkouts"
}
]
}
}
}
])
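The unwind, lookup and regroup idea can also be sketched in plain Python, which may help when reasoning about what each stage contributes. This is an in-memory emulation, not MongoDB: the function name add_titles and the short string ids standing in for ObjectId values are assumptions.

```python
# In-memory emulation of $unwind -> $lookup -> $addFields -> $group (not MongoDB itself).
# Plain strings stand in for ObjectId/ISODate values; the data is a trimmed copy of the test case.
books = [
    {"_id": "b2", "title": "Two Towers"},
    {"_id": "b4", "title": "Old Man's War"},
]
subscriber = {
    "_id": "s1",
    "name": "Tom",
    "address": "123 Somewhere",
    "checkouts": [
        {"bookId": "b2", "checkoutDate": "2022-01-01"},
        {"bookId": "b4", "checkoutDate": "2022-01-02"},
    ],
}

def add_titles(sub, all_books):
    titles_by_id = {book["_id"]: book["title"] for book in all_books}  # the "lookup" side
    enriched = []
    for checkout in sub["checkouts"]:  # "$unwind": handle one checkout at a time
        item = dict(checkout)  # copy so the input document is left untouched
        if checkout["bookId"] in titles_by_id:
            item["title"] = titles_by_id[checkout["bookId"]]  # one title per checkout
        enriched.append(item)  # "$group" with "$push": rebuild the array
    return {**sub, "checkouts": enriched}

result = add_titles(subscriber, books)
print(result["checkouts"])
```

Because each checkout is matched on its own, every entry receives exactly one title instead of the full list of titles.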

JSON schema different for first row and different for remaining rows

My problem statement is:
Consider a list of 15 rows. All rows should have 5 keys, except the 0th row, which will have only 4 keys.
I want to validate my response against this. Do the "first" and "other" keywords really exist?
I found them here: Correct JSON Schema for an array of items of different type
Example schema
{
"type": "array",
"items": {
"oneOf": [
{
"first": [{
"type": "object",
"required": ["state"],
"properties":{
"state":{
"type":"string"
}
}
}]
},
{
"other": [{
"type": "object",
"required": ["state", "zip"],
"properties":{
"state":{
"type":"string"
},
"zip":{
"type":"string"
}
}
}]
}
]
}
}
First things first: what do you want to achieve with the following schema definition?
"first" : [ { ...schema... } ]
As for your problem statement, I am not sure which of these you want to achieve:
A schema that allows the first array item to be an object with 4 keys, while all other items must have 5 keys?
A schema that allows only array items that are objects with 5 keys, and rejects a JSON document whose first item has 4 keys?
Could you please rephrase your question to make it clearer? I worked out a solution based on assumptions, but it would be good if you could confirm my understanding.
Required reading
Please read first through:
http://json-schema.org/latest/json-schema-validation.html#rfc.section.6.4.1
If "items" is an array of schemas, validation succeeds if each element
of the instance validates against the schema at the same position, if
any.
plus https://stackoverflow.com/a/52758108/2811843 on above topic
https://json-schema.org/understanding-json-schema/reference/array.html#length
https://json-schema.org/understanding-json-schema/reference/array.html#tuple-validation
and https://json-schema.org/understanding-json-schema/reference/array.html in general
as well as
https://json-schema.org/understanding-json-schema/reference/object.html#property-names
https://json-schema.org/understanding-json-schema/reference/object.html#size
and https://json-schema.org/understanding-json-schema/reference/object.html in general.
Possible solution
After looking at the sample schema, I will rephrase the problem statement, making some wild assumptions: you want a schema that allows an array of items, where each item is an object. The first item could have 4 keys, while all other items must have 5 keys.
I need a JSON schema that will describe an array of objects, where
first object always has 4 keys/properties, while all remaining objects
do have 5 keys/properties.
Additionally, there is always at least first item in array (containing 4 keys) and there can be up to X other
objects (containing 5 keys) in array.
Go for tuple typing with an array of objects. That way you can check the first item (an object) against its own schema, requiring at least 4 properties, and define a separate schema for the rest of the items.
First, the full working schema (with comments inside). The "examples" section contains sample arrays to illustrate the logic; only the last 3 are valid against the schema.
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "array",
"$comment" : "This is an array, where first item must be an object with at least 4 properties and one property named \"state\" and can contain minimum 1 and maximum of 3 items",
"minItems" : 1,
"maxItems" : 3,
"items": [
{
"type": "object",
"minProperties" : 4,
"required" : ["state"]
}
],
"additionalItems" : {
"$comment" : "Any additional item in this array must be an object with at least 5 keys and two of them must be \"state\" and \"zip\".",
"type" : "object",
"minProperties" : 5,
"required" : ["state", "zip"]
},
"examples" : [
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{},
{}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "54321"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "54321"
}
],
[],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "54321"
}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
}
],
[
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state"
},
{
"key1" : "1",
"key2" : "2",
"key3" : "3",
"state" : "some state",
"zip" : "12345"
}
]
]
}
So, step by step:
"type": "array",
"minItems" : 1,
"maxItems" : 3,
A JSON document that is an array with at least 1 and at most 3 items will be OK. If you don't define a "minItems" value, an empty array would pass validation against the schema.
"items": [
{
"type": "object",
"minProperties" : 4,
"required" : ["state"]
}
],
This is the tuple magic: a finite, ordered list of elements (a sequence). Yep, maths has its say here. By using "items" : [ ... ] instead of { ... }, you fall under the section of the JSON Schema Validation spec quoted above (http://json-schema.org/latest/json-schema-validation.html#rfc.section.6.4.1 ).
Above basically says: This is an array, where first item must be an object with at least 4 keys and one of those keys must be "state".
Ok, last but not least:
"additionalItems" : {
"$comment" : "Any additional item in this array must be an object with at least 5 keys and two of them must be \"state\" and \"zip\".",
"type" : "object",
"minProperties" : 5,
"required" : ["state", "zip"]
}
By this I said:
In this array (whose first item must be an object with at least 4 keys, one of which is "state", and which, by the way, must have at least 1 item and at most 3 items), you can have additional items on top of the ones already defined in the "items" section. Each such additional item must be an object with at least 5 keys, two of which must be "state" and "zip".
Does it solve your issue?
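To double-check the logic of the rules described above, here is a small hand-rolled Python check that mirrors them one by one. It is only a sketch of the same conditions, not a JSON Schema validator; the function names is_object and validate are made up for this illustration.

```python
# Hand-rolled check mirroring the schema above (not a JSON Schema validator).
def is_object(item, min_props, required):
    return (
        isinstance(item, dict)
        and len(item) >= min_props                # "minProperties"
        and all(key in item for key in required)  # "required"
    )

def validate(rows):
    if not isinstance(rows, list) or not 1 <= len(rows) <= 3:  # "minItems" / "maxItems"
        return False
    if not is_object(rows[0], 4, ["state"]):  # "items": [ ... ] applies to position 0 only
        return False
    # "additionalItems" applies to every position after the tuple part
    return all(is_object(row, 5, ["state", "zip"]) for row in rows[1:])

first = {"key1": "1", "key2": "2", "key3": "3", "state": "some state"}
other = {**first, "zip": "12345"}

print(validate([first]))                # True
print(validate([first, other, other]))  # True
print(validate([first, first]))         # False: second item lacks "zip"
print(validate([]))                     # False: fewer than 1 item
```

The split between rows[0] and rows[1:] is the whole point of tuple typing: position 0 gets its own rules, and everything after it shares a second set.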

Druid - descending timestamps with groupBy query

What I'm asking for should be very simple but the Druid docs have little to no info about this.
I am making a groupBy query, and the data is very large so I'm "paging" it by increasing limitSpec.limit on each subsequent query.
By default, the returned array starts from the beginning timestamp and moves forward in time. I want the results to start from the end timestamp and move backwards in time from there.
Does anyone know how to do that?
So in other words, by default a groupBy query would look like this:
[
{
"version" : "v1",
"timestamp" : "2012-01-01T00:00:00.000Z",
"event" : {
"total_usage" : <some_value_one>
}
},
{
"version" : "v1",
"timestamp" : "2012-01-02T00:00:00.000Z",
"event" : {
"total_usage" : <some_value_two>
}
}
]
Whereas I want it to look like this:
[
{
"version" : "v1",
"timestamp" : "2012-01-02T00:00:00.000Z",
"event" : {
"total_usage" : <some_value_two>
}
},
{
"version" : "v1",
"timestamp" : "2012-01-01T00:00:00.000Z",
"event" : {
"total_usage" : <some_value_one>
}
}
]
You can achieve this ordering by using the "columns" attribute in the limit spec. See the example below.
{
"type" : "default",
"limit" : <integer_value>,
"columns" : [list of OrderByColumnSpec]
}
For more details, refer to the Druid documentation:
http://druid.io/docs/latest/querying/limitspec.html
You can add the timestamp as a dimension, truncated to the date (assuming you use day granularity in your query), and force Druid to sort the results first by dimension values and then by timestamp.
Example Query:
{
"dataSource": "your_datasource",
"queryType": "groupBy",
"dimensions": [
{
"type": "default",
"dimension": "some_dimension_in",
"outputName": "some_dimension_out",
"outputType": "STRING"
},
{
"type": "extraction",
"dimension": "__time",
"outputName": "__timestamp",
"extractionFn": {
"type": "timeFormat",
"format" : "yyyy-MM-dd"
}
}
],
"aggregations": [
{
"type": "doubleSum",
"name": "some_metric",
"fieldName": "some_metric_field"
}
],
"limitSpec": {
"type": "default",
"limit": 1000,
"columns": [
{
"dimension": "__timestamp",
"direction": "descending",
"dimensionOrder": "numeric"
},
{
"dimension": "some_metric",
"direction": "descending",
"dimensionOrder": "numeric"
}
]
},
"intervals": [
"2019-09-01/2019-10-01"
],
"granularity": "day",
"context": {
"sortByDimsFirst": "true"
}
}
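Since the question pages through results by raising limitSpec.limit, it may be convenient to generate the query body from code. The Python sketch below builds a trimmed version of the example query with only the limit varying. The function name build_query and the page sizes are assumptions, and actually sending the request to Druid is left out.

```python
import json

def build_query(limit):
    # All field values are copied from the example query above; only "limit" varies.
    return {
        "dataSource": "your_datasource",
        "queryType": "groupBy",
        "dimensions": [
            {
                "type": "extraction",
                "dimension": "__time",
                "outputName": "__timestamp",
                "extractionFn": {"type": "timeFormat", "format": "yyyy-MM-dd"},
            }
        ],
        "aggregations": [
            {"type": "doubleSum", "name": "some_metric", "fieldName": "some_metric_field"}
        ],
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [
                # Newest day first
                {"dimension": "__timestamp", "direction": "descending", "dimensionOrder": "numeric"}
            ],
        },
        "intervals": ["2019-09-01/2019-10-01"],
        "granularity": "day",
        "context": {"sortByDimsFirst": "true"},
    }

# Grow the limit on each request, as described in the question (page sizes are hypothetical).
pages = [build_query(limit) for limit in (1000, 2000, 3000)]
body = json.dumps(pages[0], indent=2)
print(body)
```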

JSON Path for Array Of Array using Jayway JSON Path

I have a JSON Structured like this :
[{
"firstName": "John",
"age" : 26,
"phoneNumbers": [
{
"type" : "iPhone",
"number": "0123-4567-8888"
},
{
"type" : "home",
"number": "0123-4567-8910"
}
]
},
{
"firstName": "Johny",
"lastName" : "doe",
"age" : 26,
"address" : {
"streetAddress": "naist street",
"city" : "Nara",
"postalCode" : "630-0192"
},
"phoneNumbers": [
{
"type" : "iPhone",
"number": "0123-4567-8888"
},
{
"type" : "home",
"number": "0123-4567-8910"
}
]
}]
I want to extract the users who have an iPhone and whose first name is John.
I have used the expression below:
$[?(@.firstName=='John')].phoneNumbers[?(@.type=='iPhone')]
But I want to extract the complete user information. I have tried the Filter Criteria API as well, but I could not find a way to access the phone type attribute with it.
As mentioned in this post, the Jayway implementation supports inlined AND and OR criteria. The following JSON Path should meet your requirements.
$[?(@.firstName=='John' && 'iPhone' in @.phoneNumbers[*].type)]
Also, please be informed that the syntax may vary depending on the implementation used.
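// s is the parsed JSON array of users
for(var i=0;i<s.length;i++){
// keep only users whose first name is John
if(s[i].firstName != 'John'){ continue; }
for(var j=0;j<s[i].phoneNumbers.length;j++){
if(s[i].phoneNumbers[j].type == 'iPhone'){
alert(s[i].firstName+" "+s[i].age+" "+s[i].phoneNumbers[j].type+"\n"+s[i].phoneNumbers[j].number);
}
}
}
Here, s is your parsed JSON array.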
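If the filtering is done in application code rather than with a JSON Path expression, the same condition can be written so that it returns the complete user records. A minimal Python sketch follows; the function name find_users is an assumption, and the data is a trimmed copy of the sample above.

```python
users = [
    {"firstName": "John", "age": 26,
     "phoneNumbers": [{"type": "iPhone", "number": "0123-4567-8888"},
                      {"type": "home", "number": "0123-4567-8910"}]},
    {"firstName": "Johny", "lastName": "doe", "age": 26,
     "phoneNumbers": [{"type": "iPhone", "number": "0123-4567-8888"},
                      {"type": "home", "number": "0123-4567-8910"}]},
]

def find_users(all_users, first_name, phone_type):
    # Keep the whole user record when the name matches and any phone has the given type.
    return [
        user for user in all_users
        if user.get("firstName") == first_name
        and any(p.get("type") == phone_type for p in user.get("phoneNumbers", []))
    ]

matches = find_users(users, "John", "iPhone")
print(matches)
```

Unlike the original expression, which descends into phoneNumbers and returns only the matching phone entries, this keeps every field of each matching user.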

mongodb, update array element in array

I have a problem.
I need to update a value in a nested array (an array inside an array).
For example, I have a document like this:
{
"_id" : ObjectId("59eccf5ea7f6ff30be74d8ce"),
"name" : "some name",
"description" : "some description",
"users" : [
{
"id" : ObjectId("59d1549f4f5c6f6e0f1d6576"),
"technologies" : [
{"id": ObjectId("59450bc718fda360fdf4a719")}
]
},
{
"id": ObjectId("59d1549e4f5c6f6e0f1d6571"),
"technologies": [
{"id": ObjectId("59450f8318fda360fdf4a78b")},
{"id": ObjectId("59450bc718fda360fdf4a719")},
{"id": ObjectId("59450e3f18fda360fdf4a767")}
]
},
{
"id": ObjectId("59d154a44f5c6f6e0f1d65af"),
"technologies": [
ObjectId("59450f8318fda360fdf4a78b")
]
}
]
}
I need to delete a specific technology from a specific user. I know only:
_id - the global document id
userId - the 'users.id' element
technologyId - 'users.$.technologies.$.id', the id of the technology item that should be deleted
The MongoDB documentation says that I can't use two $ positional operators in an update statement. Is there some way to work around this?
Try the following:
db.yourColl.update(
{
"_id": ObjectId("59eccf5ea7f6ff30be74d8ce"),
"users.id": ObjectId("59d1549e4f5c6f6e0f1d6571")
},
{
"$pull": {
"users.$.technologies": {
"id": ObjectId("59450bc718fda360fdf4a719")
}
}
}
)
The result should be:
{
"_id" : ObjectId("59eccf5ea7f6ff30be74d8ce"),
"name" : "some name",
"description" : "some description",
"users" : [
{
"id" : ObjectId("59d1549f4f5c6f6e0f1d6576"),
"technologies" : [
{
"id" : ObjectId("59450bc718fda360fdf4a719")
}
]
},
{
"id" : ObjectId("59d1549e4f5c6f6e0f1d6571"),
"technologies" : [
{
"id" : ObjectId("59450f8318fda360fdf4a78b")
},
{
"id" : ObjectId("59450e3f18fda360fdf4a767")
}
]
},
{
"id" : ObjectId("59d154a44f5c6f6e0f1d65af"),
"technologies" : [
ObjectId("59450f8318fda360fdf4a78b")
]
}
]
}
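To see why a single positional $ is enough here, the update can be emulated on a plain Python structure: the filter picks the user, and the pull condition picks the technologies to drop. This is an in-memory sketch, not MongoDB; the function name pull_technology and the short string ids standing in for ObjectId values are assumptions.

```python
# In-memory emulation of the update above (not MongoDB itself).
# Short strings stand in for the ObjectId values.
doc = {
    "_id": "doc1",
    "users": [
        {"id": "u1", "technologies": [{"id": "t1"}]},
        {"id": "u2", "technologies": [{"id": "t2"}, {"id": "t1"}, {"id": "t3"}]},
    ],
}

def pull_technology(document, user_id, technology_id):
    for user in document["users"]:
        if user["id"] == user_id:  # "users.$" points at the first user matched by the filter
            # "$pull" removes every element matching {"id": technology_id}
            user["technologies"] = [
                tech for tech in user["technologies"]
                if not (isinstance(tech, dict) and tech.get("id") == technology_id)
            ]
            break  # the positional operator only touches the first match
    return document

pull_technology(doc, "u2", "t1")
print(doc["users"])
```

The other user's technologies are left untouched, even though they contain the same technology id.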
