How to use jq to convert 2 objects into CSV?

I'm trying to convert objects that look like this:
{
"metricId": "metric1",
"data": [
{
"dimensions": [
"DEVICE-a1b2c3",
"queue1"
],
"dimensionMap": {
"Queue": "queue1",
"enitity": "DEVICE-a1b2c3"
},
"timestamps": [
1626286800000
],
"values": [
1
]
},
{
"dimensions": [
"DEVICE-a1b2c3",
"queue2"
],
"dimensionMap": {
"Queue": "queue2",
"entity": "DEVICE-a1b2c3"
},
"timestamps": [
1626286800000
],
"values": [
2
]
}
]
}
{
"metricId": "metric2",
"data": [
{
"dimensions": [
"DEVICE-a1b2c3",
"queue1"
],
"dimensionMap": {
"Queue": "queue1",
"entity": "DEVICE-a1b2c3"
},
"timestamps": [
1626286800000
],
"values": [
11
]
},
{
"dimensions": [
"DEVICE-a1b2c3",
"queue2"
],
"dimensionMap": {
"Queue": "queue2",
"entity": "DEVICE-a1b2c3"
},
"timestamps": [
1626286800000
],
"values": [
22
]
}
]
}
To CSV that looks like this:
"metric1","queue1",1626286800000,1
"metric1","queue1",1626286800000,2
"metric2","queue1",1626286800000,11
"metric2","queue1",1626286800000,22
I was somewhat successful but I'm getting duplicates in my results.
Command: jq -r '. | {id:.metricId, queue: .data[].dimensionMap.Queue, time: .data[].timestamps[0], value: .data[].values[0]} | [.id, .queue, .time, .value] | @csv'
Output:
"metric1","queue1",1626286800000,1
"metric1","queue1",1626286800000,2
"metric1","queue1",1626286800000,1
"metric1","queue1",1626286800000,2
"metric1","queue2",1626286800000,1
"metric1","queue2",1626286800000,2
"metric1","queue2",1626286800000,1
"metric1","queue2",1626286800000,2
"metric2","queue1",1626286800000,11
"metric2","queue1",1626286800000,22
"metric2","queue1",1626286800000,11
"metric2","queue1",1626286800000,22
"metric2","queue2",1626286800000,11
"metric2","queue2",1626286800000,22
"metric2","queue2",1626286800000,11
"metric2","queue2",1626286800000,2
I've looked over the documentation and several blog posts/videos but I haven't been able to find a solution so far. Thank you for your help.

One way to tackle the problem is to use jq "$-variables":
.metricId as $metricId
| .data[]
| .dimensionMap.Queue as $q
| [.timestamps, .values] | transpose[]
| [$metricId, $q, .[]]
| @csv
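Assembled into a complete invocation (the input file name is an assumption; the two objects from the question are compacted to the fields the filter reads):

```shell
# Hypothetical file holding the two JSON objects back to back.
cat > input.json <<'EOF'
{"metricId":"metric1","data":[{"dimensionMap":{"Queue":"queue1"},"timestamps":[1626286800000],"values":[1]},{"dimensionMap":{"Queue":"queue2"},"timestamps":[1626286800000],"values":[2]}]}
{"metricId":"metric2","data":[{"dimensionMap":{"Queue":"queue1"},"timestamps":[1626286800000],"values":[11]},{"dimensionMap":{"Queue":"queue2"},"timestamps":[1626286800000],"values":[22]}]}
EOF

# Each top-level object is processed independently; transpose pairs up
# timestamps[i] with values[i], so no cartesian product is produced.
jq -r '
  .metricId as $metricId
  | .data[]
  | .dimensionMap.Queue as $q
  | [.timestamps, .values] | transpose[]
  | [$metricId, $q, .[]]
  | @csv
' input.json
# "metric1","queue1",1626286800000,1
# "metric1","queue2",1626286800000,2
# "metric2","queue1",1626286800000,11
# "metric2","queue2",1626286800000,22
```

The duplicates in the original attempt came from writing `.data[]` three times, which iterates the array once per occurrence and multiplies the results.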

Related

How to filter jCal with jq?

I have a jCal JSON array which I'd like to filter with jq. JSON arrays are somewhat new to me and I have been banging my head against the wall on this for hours...
The file looks like this:
[
"vcalendar",
[
[
"calscale",
{},
"text",
"GREGORIAN"
],
[
"version",
{},
"text",
"2.0"
],
[
"prodid",
{},
"text",
"-//SabreDAV//SabreDAV//EN"
],
[
"x-wr-calname",
{},
"unknown",
"Call log private"
],
[
"x-apple-calendar-color",
{},
"unknown",
"#ffaa00"
],
[
"refresh-interval",
{},
"duration",
"PT4H"
],
[
"x-published-ttl",
{},
"unknown",
"PT4H"
]
],
[
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-02-18T16:44:04Z"
],
[
"uid",
{},
"text",
"9b23142b-8d86-3e17-2f44-2bed65b2e471"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +49xxxxxxxxxx lasted for 0 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +49xxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-02-18T10:58:12Z"
],
[
"dtend",
{},
"date-time",
"2015-02-18T10:58:44Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
],
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-01-09T19:12:05Z"
],
[
"uid",
{},
"text",
"c337e092-a012-5f5a-497f-932fbc6159e5"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +1xxxxxxxxxx lasted for 39 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +1xxxxxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-01-09T17:23:16Z"
],
[
"dtend",
{},
"date-time",
"2015-01-09T17:24:19Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
]
]
]
I would like to filter out dtstart, dtend, the target phone number and the connection duration from the description for each vevent which was created e.g. in January 2015 ("2015-01.*") and output them as a CSV.
This JSON is a bit strange because the information is stored position-based in an array instead of an object.
Using the first element of an array ("vevent") to identify its contents is not the best practice.
But anyway ... if this is the data source you are dealing with, this code should help you.
jq -r '..
| arrays
| select(.[0] == "vevent")[1]
| [
(.[] | select(.[0] == "dtstart") | .[3]),
(.[] | select(.[0] == "dtend") | .[3]),
(.[] | select(.[0] == "description") | .[3])
]
| @csv
'
Alternatively, the repeating code can be factored out into a function:
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
..
| arrays
| select(.[0] == "vevent")[1]
| [ getField("dtstart"; 3), getField("dtend"; 3), getField("description"; 3) ]
| @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","Phone call to +49xxxxxxxxxx lasted for 0 seconds."
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to +1xxxxxxxxxx lasted for 39 seconds."
You can also extract phone number and duration with the help of regular expressions in jq:
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
..
| arrays
| select(.[0] == "vevent")[1]
| [
getField("dtstart"; 3),
getField("dtend"; 3),
(getField("description"; 3) | match("call to ([^ ]*)") | .captures[0].string),
(getField("description"; 3) | match("(\\d+) seconds") | .captures[0].string)
]
| @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","+49xxxxxxxxxx","0"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","+1xxxxxxxxxx","39"
Not the most efficient solution, but quite understandable: first build an object out of key-value pairs, then filter and transform those objects.
.[2][][1] is a stream of events encoded as arrays.
Which means that:
.[2][][1]
| map({key:.[0], value:.[3]})
| from_entries
the above gives you a stream of objects; one object per event:
{
"dtstamp": "2015-04-05T16:42:10Z",
"created": "2015-02-18T16:44:04Z",
"uid": "9b23142b-8d86-3e17-2f44-2bed65b2e471",
"last-modified": "2015-04-05T16:42:10Z",
"description": "Phone call to +49xxxxxxxxxx lasted for 0 seconds.",
"summary": "Outgoing: +49xxxxxxx",
"dtstart": "2015-02-18T10:58:12Z",
"dtend": "2015-02-18T10:58:44Z",
"transp": "OPAQUE"
}
{
"dtstamp": "2015-04-05T16:42:10Z",
"created": "2015-01-09T19:12:05Z",
"uid": "c337e092-a012-5f5a-497f-932fbc6159e5",
"last-modified": "2015-04-05T16:42:10Z",
"description": "Phone call to +1xxxxxxxxxx lasted for 39 seconds.",
"summary": "Outgoing: +1xxxxxxxxxx",
"dtstart": "2015-01-09T17:23:16Z",
"dtend": "2015-01-09T17:24:19Z",
"transp": "OPAQUE"
}
Now plug that into the final program: select the wanted objects, add CSV headers, build the rows and ultimately convert to CSV:
["start", "end", "description"],
(
.[2][][1]
| map({key:.[0], value:.[3]})
| from_entries
| select(.created | startswith("2015-01"))
| [.dtstart, .dtend, .description]
)
| @csv
Raw output (-r):
"start","end","description"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to +1xxxxxxxxxx lasted for 39 seconds."
If you need to further transform .description, you can use split or capture. Or use a different property, such as .summary, in your CSV rows. Only a single line needs to be changed.
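As a minimal sketch of the capture route on one description string (the regexes here are assumptions; adjust them to the real wording of your descriptions):

```shell
# Extract the phone number and the duration from a single description value.
echo '"Phone call to +1xxxxxxxxxx lasted for 39 seconds."' |
jq -c '{number: capture("call to (?<number>\\S+) lasted").number,
        secs:   capture("(?<secs>\\d+) seconds").secs}'
# {"number":"+1xxxxxxxxxx","secs":"39"}
```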

Sort array by two fields in different levels

My input:
[
{
"nfStatusNotificationUri": "http://172.19.0.2:32672/callback/nnrf-nfm/v1/onNFStatusEventPost/4e0becf9-c3ec-4002-a32b-2e35b76469b2",
"subscrCond": {
"serviceName": "namf-evts"
},
"subscriptionId": "36bc52dfdbdd4044b97ef15684706205",
"validityTime": "2022-04-30T16:40:48.274Z",
"reqNotifEvents": [
"NF_DEREGISTERED",
"NF_PROFILE_CHANGED",
"NF_REGISTERED"
]
},
{
"nfStatusNotificationUri": "http://172.19.0.2:32672/callback/nnrf-nfm/v1/onNFStatusEventPost/5319def1-af0b-4b7b-a94e-b787e614c065",
"subscrCond": {
"serviceName": "nbsf-management"
},
"subscriptionId": "e2e904bb52ca4fd6b048841c83a4c38e",
"validityTime": "2022-04-30T16:40:48.26Z",
"reqNotifEvents": [
"NF_DEREGISTERED",
"NF_PROFILE_CHANGED",
"NF_REGISTERED"
]
},
{
"nfStatusNotificationUri": "http://172.19.0.2:32672/callback/nnrf-nfm/v1/onNFStatusEventPost/31dfe10b-4020-47bd-943e-a3e293086b29",
"subscrCond": {
"serviceName": "namf-comm"
},
"subscriptionId": "e508077fab4f4b8d9dd732176a3777b9",
"validityTime": "2022-04-30T16:40:48.273Z",
"reqNotifEvents": [
"NF_DEREGISTERED",
"NF_PROFILE_CHANGED",
"NF_REGISTERED"
]
}
]
I would like to sort it by "subscriptionId" and "serviceName".
I can sort by subscriptionId, but I don't know how to add serviceName to the following expression.
jq -S '.|=sort_by(.subscriptionId)|.[].reqNotifEvents|=sort |del(.[].subscriptionId, .[].validityTime, .[].nfStatusNotificationUri)'
You can parameterize sort_by by a list of keys like so:
sort_by(.subscriptionId, .subscrCond.serviceName)
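A minimal sketch with made-up entries, where two items tie on subscriptionId and the nested serviceName breaks the tie:

```shell
echo '[{"subscriptionId":"a","subscrCond":{"serviceName":"z"}},
       {"subscriptionId":"b","subscrCond":{"serviceName":"x"}},
       {"subscriptionId":"a","subscrCond":{"serviceName":"y"}}]' |
jq -c 'sort_by(.subscriptionId, .subscrCond.serviceName)'
# [{"subscriptionId":"a","subscrCond":{"serviceName":"y"}},{"subscriptionId":"a","subscrCond":{"serviceName":"z"}},{"subscriptionId":"b","subscrCond":{"serviceName":"x"}}]
```

This works because sort_by collects all outputs of its argument filter into an array per element and compares those arrays, so earlier keys take precedence.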

Loop on JSON array in bash script and pull data through JQ

I want to pull data from a JSON file with jq in a bash script. I have 50-plus SG objects in the JSON.
Here is an example of one SG. I want to print the VPC id and group id on one line, and likewise for the other objects.
Solution I tried:
jq -c . $old |
while IFS= read -r obj; do
vpcidold=$( printf '%s' "$obj" | jq '.SecurityGroups[].VpcId')
securityidold=$( printf '%s' "$obj" | jq '.SecurityGroups[].GroupId')
echo "${vpcidold}||${securityidold}"
done > oldtest.json
It works, but it processes the data line by line, and I would like to optimize it with a loop.
How can I create a loop over the JSON array to get the desired output?
"SG": [
{
"Description": "des",
"GroupName": "Gpname",
"IpPermissions": [
{
"FromPort": 80,
"IpProtocol": "tcp",
"IpRanges": [
{
"CidrIp": "0.0.0.0/0"
}
],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 80,
"UserIdGroupPairs": []
}
],
"OwnerId": "123",
"GroupId": "sg",
"IpPermissionsEgress": [
{
"IpProtocol": "-1",
"IpRanges": [
{
"CidrIp": "0.0.0.0/0"
}
],
"Ipv6Ranges": [],
"PrefixListIds": [],
"UserIdGroupPairs": []
}
],
"Tags": [
{
"Key": "projectcode",
"Value": "none"
},
{
"Key": "sgid",
"Value": "sg-123"
}
],
"VpcId": "vpc-123"
}
]
},
If the JSON file is just an array of the objects, you don't need to loop over them in bash. jq will loop over them implicitly:
jq -r '.[][][] | (.VpcId + "||" + .GroupId)' file.json
Tested on the following input:
[
{ "SG": [
{
"Description": "des",
"GroupName": "Gpname",
"GroupId": "sg",
"VpcId": "vpc-123"
}
] },
{ "SG": [
{
"Description": "des",
"GroupName": "xyz",
"GroupId": "sg-12345",
"VpcId": "vpc-12345"
}
] }
]
Output:
vpc-123||sg
vpc-12345||sg-12345
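If you prefer to spell out the path instead of the generic .[][][], an equivalent sketch over the same test input (file name assumed):

```shell
cat > file.json <<'EOF'
[
  { "SG": [ { "GroupId": "sg",       "VpcId": "vpc-123"   } ] },
  { "SG": [ { "GroupId": "sg-12345", "VpcId": "vpc-12345" } ] }
]
EOF

# String interpolation builds each output line; -r strips the JSON quotes.
jq -r '.[].SG[] | "\(.VpcId)||\(.GroupId)"' file.json
# vpc-123||sg
# vpc-12345||sg-12345
```

Naming the path is slightly more robust: it won't pick up stray values if the objects ever gain keys other than "SG".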

JQ - return one array for multiple nested JSON arrays

I have a JSON structure that has repeated keys per message. I would like to combine these into one array per message.
[
{
"id": 1,
"PolicyItems": [
{
"accesses": [
{
"isAllowed": true,
"type": "drop"
},
{
"isAllowed": true,
"type": "select"
}
],
"groups": [],
"users": ["admin"]
}
]
},
{
"id": 2,
"PolicyItems": [
{
"accesses": [
{
"isAllowed": true,
"type": "drop"
},
{
"isAllowed": true,
"type": "update"
}
],
"groups": [],
"users": [
"admin",
"admin2"
]
}
]
}]
I have this:
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":(.PolicyItems[].accesses[] | .type)}]'
But this outputs:
[
{
"id": 1,
"access_type": "drop"
},
{
"id": 1,
"access_type": "select"
},
{
"id": 2,
"access_type": "drop"
},
{
"id": 2,
"access_type": "update"
}
]
However, what I want is to output:
[{
"id": 1,
"access_type": ["drop|select"]
},
{
"id": 2,
"access_type": ["drop|update"]
}]
Any ideas how I could do this? I'm a bit stumped!
The values could be 'drop' and 'select', but equally could be anything, so I don't want to hard code these.
Let's start by observing that with your input, the filter:
.[]
| {id, access_type: [.PolicyItems[].accesses[].type]}
produces the two objects:
{
"id": 1,
"access_type": [
"drop",
"select"
]
}
{
"id": 2,
"access_type": [
"drop",
"update"
]
}
Now it's a simple matter to tweak the above filter so as to produce the desired format:
[.[]
| {id, access_type: [.PolicyItems[].accesses[].type]}
| .access_type |= [join("|")] ]
Or equivalently, the one-liner:
map({id, access_type: [[.PolicyItems[].accesses[].type] | join("|")]})
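A quick check of that one-liner against a trimmed copy of the question's input (only the fields it reads):

```shell
echo '[{"id":1,"PolicyItems":[{"accesses":[{"type":"drop"},{"type":"select"}]}]},
       {"id":2,"PolicyItems":[{"accesses":[{"type":"drop"},{"type":"update"}]}]}]' |
jq -c 'map({id, access_type: [[.PolicyItems[].accesses[].type] | join("|")]})'
# [{"id":1,"access_type":["drop|select"]},{"id":2,"access_type":["drop|update"]}]
```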
I found something that I can work with.
If I wrap the query with []...
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":([.PolicyItems[].accesses[] | .type])}]'
... it produces this type of output:
[
{
"id": 1,
"access_type": ["drop","select"]
},
{
"id": 2,
"access_type": ["drop","update"]
}
]
I can then use the following:
(if (."access_type" | length > 0 ) then . else ."access_type" = [""] end )]
and
(."access_type" | #tsv)
before I convert to @csv and use sed to replace the tab with a pipe:
@csv' | sed -e "s/[\t]\+/|/g"
It may not be the most economical way of getting what I need, but it works for me. (Please let me know if there's a better way of doing it.)
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":([.PolicyItems[].accesses[] | .type])}] | .[] | [(if (."access_type" | length > 0 ) then . else ."access_type" = [""] end )] | .[] | [.id, (."access_type" | @tsv)] | @csv' | sed -e "s/[\t]\+/|/g"

How to add an element to a list only if it does not already exist, creating the list if it is null?

input
{
"apps": [
{
"name": "whatever1",
"id": "ID1"
},
{
"name": "whatever2",
"id": "ID2",
"dep": [
"a.jar"
]
},
{
"name": "whatever3",
"id": "ID3",
"dep": [
"a.jar",
"b.jar"
]
}
]
}
output
{
"apps": [
{
"name": "whatever1",
"id": "ID1",
"dep": [
"b.jar"
]
},
{
"name": "whatever2",
"id": "ID2",
"dep": [
"a.jar",
"b.jar"
]
},
{
"name": "whatever3",
"id": "ID3",
"dep": [
"a.jar",
"b.jar"
]
}
]
}
In the above example:
whatever1 does not have dep, so create it.
whatever2 has dep but does not have b.jar, so add b.jar.
whatever3 already has dep and b.jar is there, so it is untouched.
What I have tried:
# add blindly, whatever3 is not right
cat dep.json | jq '.apps[].dep += ["b.jar"]'
# missed one level and whatever3 is gone.
cat dep.json | jq '.apps | map(select(.dep == null or (.dep | contains(["b.jar"]) | not)))[] | .dep += ["b.jar"]'
For the sake of clarity, let's define a helper function for performing the core task:
# It is assumed that the input is an object
# that either does not have the specified key or
# that it is array-valued
def ensure_has($key; $value):
if has($key) and (.[$key] | index($value)) then .
else .[$key] += [$value]
end ;
The task can now be accomplished in a straightforward way:
.apps |= map(ensure_has("dep"; "b.jar"))
Alternatively ...
.apps[] |= ensure_has("dep"; "b.jar")
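Put together as one runnable sketch (the file name dep.json follows the question; the input is the question's, compacted):

```shell
cat > dep.json <<'EOF'
{"apps":[{"name":"whatever1","id":"ID1"},{"name":"whatever2","id":"ID2","dep":["a.jar"]},{"name":"whatever3","id":"ID3","dep":["a.jar","b.jar"]}]}
EOF

jq '
  # Append $value under $key unless it is already present;
  # += creates the array when the key is missing (null + [x] == [x]).
  def ensure_has($key; $value):
    if has($key) and (.[$key] | index($value)) then .
    else .[$key] += [$value]
    end;
  .apps |= map(ensure_has("dep"; "b.jar"))
' dep.json
```

Note that index returns 0 for a match at the first position, and 0 is truthy in jq (only false and null are falsy), so the membership test is safe.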
After some trial and error, it looks like this is one way to do it:
cat dep.json | jq '.apps[].dep |= (. + ["b.jar"] | unique)'
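One caveat worth knowing, shown on a hypothetical one-app input: unique also sorts, so the existing order of the dep entries may change, which the ensure_has approach avoids.

```shell
echo '{"apps":[{"name":"w","dep":["b.jar","a.jar"]}]}' |
jq -c '.apps[].dep |= (. + ["b.jar"] | unique)'
# {"apps":[{"name":"w","dep":["a.jar","b.jar"]}]}   (note: entries reordered)
```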
