JOLT - JSON Array converted to Nested Objects with Removal of Duplicates

Currently I'm receiving a JSON "prefix soup" array, and it would make things much easier if I could transform it with JOLT to get nested objects and remove duplicates. This is the JSON currently coming back from the source:
[
{
"License_idlicense": 1,
"License_StartDate": "2022-11-15 00:00:00.0",
"License_EndDate": "2022-11-29 00:00:00.0",
"License_MonthlySearchMax": 500,
"CustomerId": 0,
"Guid": "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Name": "Usuário Trial",
"CustStartDate": "2022-11-15 00:00:00.0",
"Connector_ConnectorId": 0,
"Connector_Name": "BigDataCorp",
"Connector_Version": "1.01"
},
{
"License_idlicense": 2,
"License_StartDate": "2022-11-15 00:00:00.0",
"License_EndDate": "2022-11-30 00:00:00.0",
"License_MonthlySearchMax": 500,
"CustomerId": 0,
"Guid": "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Name": "Usuário Trial",
"CustStartDate": "2022-11-15 00:00:00.0",
"Connector_ConnectorId": 1,
"Connector_Name": "Credilink",
"Connector_Version": "1.01"
}
]
In another topic, @Barbaros helped me by suggesting the following transform:
JOLT Transform from Prefix Soup to Nested
[
{
"operation": "shift",
"spec": {
"*": {
"CustomerId": "ECOLicense.Customers[&1].&",
"Guid": "ECOLicense.Customers[&1].&",
"Name": "ECOLicense.Customers[&1].&",
"*_*": "ECOLicense.Customers[&1].&(0,1).&(0,2)"
}
}
}
]
which results in the following:
{
"ECOLicense" : {
"Customers" : [ {
"CustomerId" : 0,
"Guid" : "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Name" : "Usuário Trial",
"License" : {
"idlicense" : 1,
"StartDate" : "2022-11-15 00:00:00.0",
"EndDate" : "2022-11-29 00:00:00.0",
"MonthlySearchMax" : 500
},
"Connector" : {
"ConnectorId" : 0,
"Name" : "BigDataCorp",
"Version" : "1.01"
}
}, {
"CustomerId" : 0,
"Guid" : "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Name" : "Usuário Trial",
"License" : {
"idlicense" : 2,
"StartDate" : "2022-11-15 00:00:00.0",
"EndDate" : "2022-11-30 00:00:00.0",
"MonthlySearchMax" : 500
},
"Connector" : {
"ConnectorId" : 1,
"Name" : "Credilink",
"Version" : "1.01"
}
} ]
}
}
Unfortunately, as you can see, the main Customers object is duplicated, with each copy carrying different Connector and License objects. I would like to have the following result:
{
"ECOLicense": {
"Customers": [
{
"CustomerId": 0,
"Guid": "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Name": "Usuário Trial",
"Licenses": [
{
"License": {
"idlicense": 1,
"StartDate": "2022-11-15 00:00:00.0",
"EndDate": "2022-11-29 00:00:00.0",
"MonthlySearchMax": 500
},
"Connector": {
"ConnectorId": 0,
"Name": "BigDataCorp",
"Version": "1.01"
}
},
{
"License": {
"idlicense": 2,
"StartDate": "2022-11-15 00:00:00.0",
"EndDate": "2022-11-30 00:00:00.0",
"MonthlySearchMax": 500
},
"Connector": {
"ConnectorId": 1,
"Name": "Credilink",
"Version": "1.01"
}
}
]
}
]
}
}
It seems to be a big challenge requiring several transformations. Thank you for your support.

You can add an extra node, Licenses[&1], while converting Customers[&1] to Customers[0], and then apply a cardinality transformation, with a spec such as
[
{
"operation": "shift",
"spec": {
"*": {
"CustomerId|Guid|Name": "ECOLicense.Customers[0].&",
"*_*": "ECOLicense.Customers[0].Licenses[&1].&(0,1).&(0,2)"
}
}
},
{
"operation": "cardinality",
"spec": {
"*": {
"*": {
"*": {
"*": "ONE",
"Licenses": "MANY"
}
}
}
}
}
]
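To see why the cardinality step is needed: after the shift alone, the repeated scalar fields pile up into arrays, while Licenses is already built one entry per input index. The intermediate result should look roughly like this (a sketch, not verbatim tool output):
{
  "ECOLicense" : {
    "Customers" : [ {
      "CustomerId" : [ 0, 0 ],
      "Guid" : [ "c24c1fa3-0388-4c08-b431-8d0f05fe263a", "c24c1fa3-0388-4c08-b431-8d0f05fe263a" ],
      "Name" : [ "Usuário Trial", "Usuário Trial" ],
      "Licenses" : [ {
        "License" : {
          "idlicense" : 1,
          "StartDate" : "2022-11-15 00:00:00.0",
          "EndDate" : "2022-11-29 00:00:00.0",
          "MonthlySearchMax" : 500
        },
        "Connector" : {
          "ConnectorId" : 0,
          "Name" : "BigDataCorp",
          "Version" : "1.01"
        }
      }, {
        "License" : {
          "idlicense" : 2,
          "StartDate" : "2022-11-15 00:00:00.0",
          "EndDate" : "2022-11-30 00:00:00.0",
          "MonthlySearchMax" : 500
        },
        "Connector" : {
          "ConnectorId" : 1,
          "Name" : "Credilink",
          "Version" : "1.01"
        }
      } ]
    } ]
  }
}
The cardinality transformation then collapses CustomerId, Guid and Name ("*": "ONE") to a single value each and keeps Licenses ("MANY") as an array, which gives the desired output.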

Related

Creating 1-1 mapping of Array results

I am trying to use data coming back from an Elasticsearch query and get it into an array in JSONata in a certain format.
Essentially, I need my result set to be like this:
{
"userName": [
"david2#david2.com",
"david2#david2.com",
"david2#david2.com",
"david2#david2.com"
],
"label": [
"Dealer",
"Inquiry",
"DD Test Skill1",
"_11DavidTest"
],
"value": [
3,
5,
2,
1
]
}
However, what I am getting is this:
{
"userName": "david2#david2.com",
"label": [
"Dealer",
"Inquiry",
"DD Test Skill1",
"_11DavidTest"
],
"value": [
3,
5,
2,
1
]
}
I am using the following to map the results:
(
$data := $map(data.hits.hits._source.item."Prod::User", function($v) {
{
"userName": $v.userName,
"label": $v.userSkillLevels.skill.name,
"value": $v.userSkillLevels.level
}
});
)
And my overall dataset returned from Elasticsearch is as follows:
{
"data": {
"took": 3,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.002851,
"hits": [
{
"_index": "items_latest_production1_user",
"_type": "_doc",
"_id": "63d000766f67d40a73073d5d_f6144acf2b3ff31209ef9f6d461cd849",
"_score": 1.002851,
"_source": {
"item": {
"Prod::User": {
"userSkillLevels": [
{
"level": 3,
"skill": {
"name": "Dealer"
}
},
{
"level": 5,
"skill": {
"name": "Inquiry"
}
},
{
"level": 2,
"skill": {
"name": "DD Test Skill1"
}
},
{
"level": 1,
"skill": {
"name": "_11DavidTest"
}
}
],
"userName": "david2#david2.com"
}
}
}
}
]
}
}
}
I can see that each user that comes back from Elastic has all the skills/levels in an array associated with one username.
I need to have the same number of userNames as there are skills, and I am struggling to get it just right.
Thoughts? I appreciate any help; I'm sure I'm overlooking something simple.
Actually, I need this format:
[
{
"userName" : "david2#david2.com",
"label" : "Dealer",
"value" : 3
},
{
"userName" : "david2#david2.com",
"label" : "Inquiry",
"value" : 5
},
{
"userName" : "david2#david2.com",
"label" : "DD Test Skill1",
"value" : 2
},
{
"userName" : "david2#david2.com",
"label" : "_11DavidTest",
"value" : 1
}
]
One way you could solve this is to create the userName array yourself, making sure it has the same length as userSkillLevels.skill; here's how you can do this:
(
$data := $map(data.hits.hits._source.item."Prod::User", function($v, $i, $a) {
{
"userName": $map([1..$count($v.userSkillLevels.skill)], function() {$v.userName}),
"label": $v.userSkillLevels.skill.name,
"value": $v.userSkillLevels.level
}
});
)
Feel free to check out this response in Stedi JSONata Playground: https://stedi.link/PVwGacu
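The key trick is that mapping over the range [1..$count($v.userSkillLevels.skill)] emits one copy of userName per skill, so userName becomes an array of the same length as label and value. A minimal sketch of that idiom (with a hard-coded count and value purely for illustration):
(
  /* map over a range whose length matches the skills array; the range values themselves are ignored */
  $map([1..3], function() { "david2#david2.com" })
  /* => ["david2#david2.com", "david2#david2.com", "david2#david2.com"] */
)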
Actually, I need this format:
For the updated format from your question, this solution can produce it using the parent operator:
data.hits.hits._source.item."Prod::User".userSkillLevels.{
"userName": %.userName,
"label": skill.name,
"value": level
}
Check it out on the playground: https://stedi.link/gexB3Cb
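Applied to the sample Elasticsearch document above, this expression should yield one object per skill, along the lines of:
[
  { "userName": "david2#david2.com", "label": "Dealer", "value": 3 },
  { "userName": "david2#david2.com", "label": "Inquiry", "value": 5 },
  { "userName": "david2#david2.com", "label": "DD Test Skill1", "value": 2 },
  { "userName": "david2#david2.com", "label": "_11DavidTest", "value": 1 }
]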

JOLT: Make a nested array part of the main array

I am trying to transform the JSON into the expected output below, and I'm stuck with the spec I tried. Can someone help with this?
There is an inner array named "content" inside the "results" array which I want to make part of the main array.
Input JSON
{
"total": 100,
"start": 1,
"page-length": 10,
"results": [
{
"index": 1,
"uri": "uri1/uri2",
"extracted": {
"kind": "object",
"content": [
{
"code": "A1",
"region": "APAC"
}
]
}
},
{
"index": 2,
"uri": "uri1/uri2",
"extracted": {
"kind": "object",
"content": [
{
"code": "B1",
"region": "AMER"
}
]
}
},
{
"index": 3,
"uri": "uri1/uri2",
"extracted": {
"kind": "object",
"content": [
{
"code": "C1",
"region": "APAC"
}
]
}
}
]
}
Expected JSON output
[
{
"code": "A1",
"region": "APAC"
},
{
"code": "B1",
"region": "AMER"
},
{
"code": "C1",
"region": "APAC"
}
]
Spec Tried
[
{
"operation": "shift",
"spec": {
"results": {
"*": {
"extracted": { "content": { "#": "&" } }
}
}
}
},
{
"operation": "shift",
"spec": {
"content": {
"#": "&"
}
}
}
]
Below is the output I am getting from the JOLT tool.
You can use the # wildcard nested within square brackets in order to reach the level of the indexes of the "results" array, such as
[
{
"operation": "shift",
"spec": {
"results": {
"*": {//the indexes of the "results" array
"extracted": {
"content": {
"*": {//the indexes of the "content" array
"*": "[#5].&"
}
}
}
}
}
}
}
]
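Run against the sample input, this spec should yield exactly the expected flat array from the question:
[
  { "code": "A1", "region": "APAC" },
  { "code": "B1", "region": "AMER" },
  { "code": "C1", "region": "APAC" }
]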
Also the following spec, which repeats the content of the inner array without keys, will give the same result:
[
{
"operation": "shift",
"spec": {
"results": {
"*": {
"extracted": {
"content": {
"*": "[]"// [] seems like redundant but kept for the case the array has a single object.
}
}
}
}
}
}
]
which is quite similar to the one you tried.

How to partially replace a string in an array in Apache NiFi using JOLT

I want to always replace
abcd with abcd.india
xyxvv with ind.hello
india.gateway.url/time/123/v1 with india.ios.gw.url/time/123/v2
'someText' with 'diffText'
present in the website field under the information array, without modifying the prefix and suffix.
Input :
{
"requestId": 1122344,
"Name": "testing",
"information": [
{
"website": "abcd/122/ty",
"city": "pune",
"pincode": false,
"client_name": 5
},
{
"website": "http://xyxvv/122/ty",
"city": "delhi",
"pincode": false,
"client_name": 5
},
{
"website": "http://someText",
"city": "delhi",
"pincode": false,
"client_name": 5
},
{
"website": "http://india.gateway.url/time/123/v1",
"city": "maharashtra",
"pincode": false,
"client_name": 6
}
],
"ReasonText": "something",
"Code": "ABCD"
}
Desired Output :
{
"requestId" : 1122344,
"Name" : "testing",
"information" : [ {
"website" : "abcd.india/122/ty",
"city" : "pune",
"pincode" : false,
"client_name" : 5
}, {
"website" : "http://ind.hello/122/ty",
"city" : "delhi",
"pincode" : false,
"client_name" : 5
}, {
"website" : "http://diffText",
"city" : "delhi",
"pincode" : false,
"client_name" : 5
},
{
"website" : "http://india.ios.gw.url/time/123/v2",
"city" : "delhi",
"pincode" : false,
"client_name" : 6
} ],
"ReasonText" : "something",
"Code" : "ABCD"
}
You can consecutively use the split, join and concat functions to split the string, conditionally convert substrings, and then concatenate all the pieces back, such as
[
{
"operation": "modify-overwrite-beta",
"spec": {
"information": {
"*": {
"ht": "=split('://',#(1,website))",
"ws": "=split('/',#(1,ht[1]))",
"last_ws": "=lastElement(#(1,ws))",
"size_ws": "=size(#(1,ws))"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"information": {
"*": {
"ws": {
"0": { // for the first omponent of the array composed splitting the sting by / characters
"abcd": {
"#abcd.india": "&5[&4].ws"
},
"xyxvv": {
"#ind.hello": "&5[&4].ws"
},
"india.gateway.url": {
"#india.ios.gw.hello": "&5[&4].ws"
}
},
"*": {
"v1": {
"#(3,last_ws)": {//whenever match occurs with v1, set it to v2(occurence assumed only at this leaf level)
"#v2": "&6[&5].ws"
}
},
"*": {
"#ind.hello": "&5[&4].ws"
}
}
},
"*": "&2[&1].&"
}
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"information": {
"*": {
"ws": "=join('/',#(1,ws))",
"website": "=concat(#(1,ht[0]),'://',#(1,ws))"
}
}
}
},
{
"operation": "remove",
"spec": {
"information": {
"*": {
"ht": "",
"ws": "",
"*ws": ""
}
}
}
}
]
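As a worked sketch of what the first modify-overwrite-beta step builds for the last information entry (only that element shown):
{
  "website" : "http://india.gateway.url/time/123/v1",
  "city" : "maharashtra",
  "pincode" : false,
  "client_name" : 6,
  "ht" : [ "http", "india.gateway.url/time/123/v1" ],
  "ws" : [ "india.gateway.url", "time", "123", "v1" ],
  "last_ws" : "v1",
  "size_ws" : 4
}
The shift step then rewrites the recognized pieces of ws (the first component and the trailing v1), the second modify-overwrite-beta joins ws back with '/' and rebuilds website via concat from ht[0], '://' and the joined ws, and the final remove drops the helper fields.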

JOLT Transform from Prefix Soup to Nested

I'm receiving this JSON as the result of a SQL query, and I want to create nested objects out of the composed properties. This could be done easily if it were a single element, but the MySQL query returns an array right at the root:
[
{
"idlicense": 1,
"StartDate": "2022-11-15 00:00:00.0",
"EndDate": "2022-11-29 00:00:00.0",
"MonthlySearchMax": 500,
"Customer_CustomerId": 0,
"Customer_Guid": "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Customer_Name": "User",
"Customer_CustStartDate": "2022-11-15 00:00:00.0",
"Connector_ConnectorId": 0,
"Connector_Name": "connector0",
"Connector_Version": "1.01"
},
{
"idlicense": 2,
"StartDate": "2022-11-15 00:00:00.0",
"EndDate": "2022-11-29 00:00:00.0",
"MonthlySearchMax": 500,
"Customer_CustomerId": 0,
"Customer_Guid": "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Customer_Name": "User",
"Customer_CustStartDate": "2022-11-15 00:00:00.0",
"Connector_ConnectorId": 1,
"Connector_Name": "connector1",
"Connector_Version": "1.01"
}
]
I'm trying to create nested JSON with JOLT on NiFi, but I can't find the right format from the available examples, even though it seems like it should be simple. I need my final JSON to look like this:
{
"Licenses": [
{
"idLicense": "1",
"Customer": {
"CustomerId": "0",
"Guid": "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Name": "User",
"CustStartDate": "2022-11-15 00:00:00.0"
},
"Connector": {
"ConnectorId": "0",
"Name": "Connector0",
"Version": "1.01"
}
},
{
"idLicense": "2",
"Customers": {
"CustomerId": "1",
"Guid": "c24c1fa3-0388-4c08-b431-8d0f05fe263a",
"Name": "User",
"CustStartDate": "2022-11-15 00:00:00.0"
},
"Connector": {
"ConnectorId": "1",
"Name": "Connector1",
"Version": "1.01"
}
}
]
}
So far I've done this JOLT transform:
{
"idlicense": [1,2],
"Customer": {
"CustomerId": [0,0],
"Guid": ["c24c1fa3-0388-4c08-b431-8d0f05fe263a","c24c1fa3-0388-4c08-b431-8d0f05fe263a"],
"Name": ["User","User"],
"CustStartDate": ["2022-11-15 00:00:00.0","2022-11-15 00:00:00.0"]
},
"Connector": {
"ConnectorId": [0,1],
"Name": ["Conector0","Connector1"],
"Version": ["1.01","1.01"]
}
}
Thank you for your support!
You can use this shift transformation spec
[
{
"operation": "shift",
"spec": {
"*": {
"idlicense": "Licenses[&1].&",
"*_*": "Licenses[&1].&(0,1).&(0,2)"
}
}
}
]
where &(0,1) represents before, &(0,2) after underscore character

Elasticsearch date histogram aggregation with embedded array carries values over when there are multiple entries

When using the date histogram with an embedded array to create a histogram of used discounts per day, I have the problem that when a customer has multiple subscriptions, the discount is counted for all subscriptions, even when the createdAt values differ.
{
"aggs": {
"histogram": {
"aggs": {
"counter": {
"terms": {
"field": "subscriptions.discounts.couponCodeKey.keyword",
"missing": "0",
"size": 1000
}
}
},
"date_histogram": {
"extended_bounds": {
"max": "2022-07-19T21:38:55.3506091Z",
"min": "2020-12-31T23:00:00Z"
},
"field": "subscriptions.createdAt",
"calendar_interval": "year",
"time_zone": "Europe/Zurich"
}
}
},
"query": {
"range": {
"subscriptions.createdAt": {
"gte": "2020-12-31T23:00:00Z"
}
}
},
"size": 0,
"sort": [
{
"subscriptions.createdAt": {
"order": "asc"
}
}
]
}
The sample object looks as follows:
{
"Subscriptions": [
{
"Id": "c211ff01-3720-4ad6-99a3-b923696e4f1c",
"CreatedAt": "2022-06-20T18:38:31.403Z",
"Discounts": [
{
"CouponCodeKey": "cash"
}
]
},
{
"Id": "df7fd661-b07a-4001-b9a6-6c784deca706",
"CreatedAt": "2022-07-05T08:00:00Z",
"Discounts": null
}
],
"CreatedAt": "2022-06-20T18:37:38.362Z"
}
This produces the following wrong result:
"aggregations" : {
"date_histogram#histogram" : {
"buckets" : [
{
"key_as_string" : "2022-06-20T00:00:00.000+02:00",
"key" : 1655676000000,
"doc_count" : 1,
"sterms#counter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "cash",
"doc_count" : 1
}
]
}
},
{
"key_as_string" : "2022-07-05T00:00:00.000+02:00",
"key" : 1656972000000,
"doc_count" : 1,
"sterms#counter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "cash",
"doc_count" : 1
}
]
}
}
]
}
}
The discount from the first subscription on 20.06.2022 was also counted towards the 05.07.2022 subscription; this should not happen.
Thanks in advance for any help!
