I have a jCal JSON array which I'd like to filter with jq. JSON arrays are somewhat new to me and I have been banging my head against the wall on this for hours...
The file looks like this:
[
"vcalendar",
[
[
"calscale",
{},
"text",
"GREGORIAN"
],
[
"version",
{},
"text",
"2.0"
],
[
"prodid",
{},
"text",
"-//SabreDAV//SabreDAV//EN"
],
[
"x-wr-calname",
{},
"unknown",
"Call log private"
],
[
"x-apple-calendar-color",
{},
"unknown",
"#ffaa00"
],
[
"refresh-interval",
{},
"duration",
"PT4H"
],
[
"x-published-ttl",
{},
"unknown",
"PT4H"
]
],
[
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-02-18T16:44:04Z"
],
[
"uid",
{},
"text",
"9b23142b-8d86-3e17-2f44-2bed65b2e471"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +49xxxxxxxxxx lasted for 0 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +49xxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-02-18T10:58:12Z"
],
[
"dtend",
{},
"date-time",
"2015-02-18T10:58:44Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
],
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-01-09T19:12:05Z"
],
[
"uid",
{},
"text",
"c337e092-a012-5f5a-497f-932fbc6159e5"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +1xxxxxxxxxx lasted for 39 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +1xxxxxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-01-09T17:23:16Z"
],
[
"dtend",
{},
"date-time",
"2015-01-09T17:24:19Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
]
]
]
I would like to extract dtstart, dtend, the target phone number, and the call duration from the description for each vevent that was created in a given month, e.g. January 2019 ("2019-01.*"), and output them as CSV.
This JSON is a bit strange because the information is stored position-based in arrays instead of objects.
Using the first element of an array ("vevent") to identify its contents is not best practice.
But anyway, if this is the data source you are dealing with, this code should help you.
jq -r '..
  | arrays
  | select(.[0] == "vevent")[1]
  | [
      (.[] | select(.[0] == "dtstart") | .[3]),
      (.[] | select(.[0] == "dtend") | .[3]),
      (.[] | select(.[0] == "description") | .[3])
    ]
  | @csv
'
Alternatively, the repeated code can be factored out into a function:
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
  ..
  | arrays
  | select(.[0] == "vevent")[1]
  | [ getField("dtstart"; 3), getField("dtend"; 3), getField("description"; 3) ]
  | @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","Phone call to +49xxxxxxxxxx lasted for 0 seconds."
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to +1xxxxxxxxxx lasted for 39 seconds."
You can also extract phone number and duration with the help of regular expressions in jq:
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
  ..
  | arrays
  | select(.[0] == "vevent")[1]
  | [
      getField("dtstart"; 3),
      getField("dtend"; 3),
      (getField("description"; 3) | match("call to ([^ ]*)") | .captures[0].string),
      (getField("description"; 3) | match("(\\d+) seconds") | .captures[0].string)
    ]
  | @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","+49xxxxxxxxxx","0"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","+1xxxxxxxxxx","39"
Not the most efficient solution, but quite understandable: first build an object out of each event's key-value pairs, then filter and transform those objects.
.[2][][1] selects the property list of each event, i.e. a stream of arrays of [name, parameters, type, value] entries.
Which means that:
.[2][][1]
| map({key:.[0], value:.[3]})
| from_entries
the above gives you a stream of objects; one object per event:
{
"dtstamp": "2015-04-05T16:42:10Z",
"created": "2015-02-18T16:44:04Z",
"uid": "9b23142b-8d86-3e17-2f44-2bed65b2e471",
"last-modified": "2015-04-05T16:42:10Z",
"description": "Phone call to +49xxxxxxxxxx lasted for 0 seconds.",
"summary": "Outgoing: +49xxxxxxx",
"dtstart": "2015-02-18T10:58:12Z",
"dtend": "2015-02-18T10:58:44Z",
"transp": "OPAQUE"
}
{
"dtstamp": "2015-04-05T16:42:10Z",
"created": "2015-01-09T19:12:05Z",
"uid": "c337e092-a012-5f5a-497f-932fbc6159e5",
"last-modified": "2015-04-05T16:42:10Z",
"description": "Phone call to +1xxxxxxxxxx lasted for 39 seconds.",
"summary": "Outgoing: +1xxxxxxxxxx",
"dtstart": "2015-01-09T17:23:16Z",
"dtend": "2015-01-09T17:24:19Z",
"transp": "OPAQUE"
}
Now plug that into the final program: select the wanted objects, add CSV headers, build the rows and ultimately convert to CSV:
["start", "end", "description"],
(
.[2][][1]
| map({key:.[0], value:.[3]})
| from_entries
| select(.created | startswith("2015-01"))
| [.dtstart, .dtend, .description]
)
| #csv
Raw output (-r):
"start","end","description"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to +1xxxxxxxxxx lasted for 39 seconds."
If you need to further transform .description, you can use split or capture. Or use a different property, such as .summary, in your CSV rows. Only a single line needs to be changed.
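For example, to pull the phone number and duration out of .description with capture (a sketch; the group names number and secs are just illustrative):
.[2][][1]
| map({key:.[0], value:.[3]})
| from_entries
| select(.created | startswith("2015-01"))
| [.dtstart, .dtend,
   (.description | capture("call to (?<number>\\S+) lasted for (?<secs>\\d+) seconds") | .number, .secs)]
| @csv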
My input:
[
{
"nfStatusNotificationUri": "http://172.19.0.2:32672/callback/nnrf-nfm/v1/onNFStatusEventPost/4e0becf9-c3ec-4002-a32b-2e35b76469b2",
"subscrCond": {
"serviceName": "namf-evts"
},
"subscriptionId": "36bc52dfdbdd4044b97ef15684706205",
"validityTime": "2022-04-30T16:40:48.274Z",
"reqNotifEvents": [
"NF_DEREGISTERED",
"NF_PROFILE_CHANGED",
"NF_REGISTERED"
]
},
{
"nfStatusNotificationUri": "http://172.19.0.2:32672/callback/nnrf-nfm/v1/onNFStatusEventPost/5319def1-af0b-4b7b-a94e-b787e614c065",
"subscrCond": {
"serviceName": "nbsf-management"
},
"subscriptionId": "e2e904bb52ca4fd6b048841c83a4c38e",
"validityTime": "2022-04-30T16:40:48.26Z",
"reqNotifEvents": [
"NF_DEREGISTERED",
"NF_PROFILE_CHANGED",
"NF_REGISTERED"
]
},
{
"nfStatusNotificationUri": "http://172.19.0.2:32672/callback/nnrf-nfm/v1/onNFStatusEventPost/31dfe10b-4020-47bd-943e-a3e293086b29",
"subscrCond": {
"serviceName": "namf-comm"
},
"subscriptionId": "e508077fab4f4b8d9dd732176a3777b9",
"validityTime": "2022-04-30T16:40:48.273Z",
"reqNotifEvents": [
"NF_DEREGISTERED",
"NF_PROFILE_CHANGED",
"NF_REGISTERED"
]
}
]
I would like to sort it by "subscriptionId" and "serviceName".
I can sort by subscriptionId, but I don't know how to add serviceName to the following expression.
jq -S '.|=sort_by(.subscriptionId)|.[].reqNotifEvents|=sort |del(.[].subscriptionId, .[].validityTime, .[].nfStatusNotificationUri)'
You can give sort_by multiple sort keys, like so:
sort_by(.subscriptionId, .subscrCond.serviceName)
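Plugged into your original command, the whole pipeline might look like this (a sketch; subscriptionId still drives the primary sort even though it is deleted afterwards):
jq -S 'sort_by(.subscriptionId, .subscrCond.serviceName)
  | .[].reqNotifEvents |= sort
  | del(.[].subscriptionId, .[].validityTime, .[].nfStatusNotificationUri)'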
I want to pull data from a JSON file with jq in a bash script. I have 50-plus SG objects in the JSON.
Here is an example of one SG. I want to print the VPC id and group id on one line, and so on for the other objects.
The solution I tried:
jq -c . $old |
while IFS= read -r obj; do
vpcidold=$( printf '%s' "$obj" | jq '.SecurityGroups[].VpcId')
securityidold=$( printf '%s' "$obj" | jq '.SecurityGroups[].GroupId')
echo "${vpcidold}||${securityidold}"
done > oldtest.json
It is working fine, but it processes the data line by line, and I want to optimise this with a loop.
How can I loop over the JSON array to get the desired output?
"SG": [
{
"Description": "des",
"GroupName": "Gpname",
"IpPermissions": [
{
"FromPort": 80,
"IpProtocol": "tcp",
"IpRanges": [
{
"CidrIp": "0.0.0.0/0"
}
],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 80,
"UserIdGroupPairs": []
}
],
"OwnerId": "123",
"GroupId": "sg",
"IpPermissionsEgress": [
{
"IpProtocol": "-1",
"IpRanges": [
{
"CidrIp": "0.0.0.0/0"
}
],
"Ipv6Ranges": [],
"PrefixListIds": [],
"UserIdGroupPairs": []
}
],
"Tags": [
{
"Key": "projectcode",
"Value": "none"
},
{
"Key": "sgid",
"Value": "sg-123"
}
],
"VpcId": "vpc-123"
}
]
},
If the JSON file is just an array of the objects, you don't need to loop over them in bash. jq will loop over them implicitly:
jq -r '.[][][] | (.VpcId + "||" + .GroupId)' file.json
Tested on the following input:
[
{ "SG": [
{
"Description": "des",
"GroupName": "Gpname",
"GroupId": "sg",
"VpcId": "vpc-123"
}
] },
{ "SG": [
{
"Description": "des",
"GroupName": "xyz",
"GroupId": "sg-12345",
"VpcId": "vpc-12345"
}
] }
]
Output:
vpc-123||sg
vpc-12345||sg-12345
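If the wrapping key is always SG, a slightly more explicit variant spells out the path and builds each line with string interpolation (same output; just a sketch):
jq -r '.[].SG[] | "\(.VpcId)||\(.GroupId)"' file.json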
I have a JSON structure that has repeated keys per message. I would like to combine these into one array per message.
[
{
"id": 1,
"PolicyItems": [
{
"accesses": [
{
"isAllowed": true,
"type": "drop"
},
{
"isAllowed": true,
"type": "select"
}
],
"groups": [],
"users": ["admin"]
}
]
},
{
"id": 2,
"PolicyItems": [
{
"accesses": [
{
"isAllowed": true,
"type": "drop"
},
{
"isAllowed": true,
"type": "update"
}
],
"groups": [],
"users": [
"admin",
"admin2"
]
}
]
}]
I have this:
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":(.PolicyItems[].accesses[] | .type)}]'
But this outputs:
[
{
"id": 1,
"access_type": "drop"
},
{
"id": 1,
"access_type": "select"
},
{
"id": 2,
"access_type": "drop"
},
{
"id": 2,
"access_type": "update"
}
]
However, what I want is to output:
[{
"id": 1,
"access_type": ["drop|select"]
},
{
"id": 2,
"access_type": ["drop|update"]
}]
Any ideas how I could do this? I'm a bit stumped!
The values could be 'drop' and 'select', but could equally be anything, so I don't want to hard-code them.
Let's start by observing that with your input, the filter:
.[]
| {id, access_type: [.PolicyItems[].accesses[].type]}
produces the two objects:
{
"id": 1,
"access_type": [
"drop",
"select"
]
}
{
"id": 2,
"access_type": [
"drop",
"update"
]
}
Now it's a simple matter to tweak the above filter so as to produce the desired format:
[.[]
| {id, access_type: [.PolicyItems[].accesses[].type]}
| .access_type |= [join("|")] ]
Or equivalently, the one-liner:
map({id, access_type: [[.PolicyItems[].accesses[].type] | join("|")]})
I found something that I can work with.
If I wrap the query with []...
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":([.PolicyItems[].accesses[] | .type])}]'
... it produces this type of output:
[
{
"id": 1,
"access_type": ["drop","select"]
},
{
"id": 2,
"access_type": ["drop","update"]
}
]
I can then use the following:
(if (."access_type" | length > 0 ) then . else ."access_type" = [""] end )]
and
(."access_type" | #tsv)
Before I can convert to @csv and use sed to replace the tab with a pipe:
@csv' | sed -e "s/[\t]\+/|/g"
It may not be the most economical way of getting what I need, but it works for me. (Please let me know if there's a better way of doing it.)
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":([.PolicyItems[].accesses[] | .type])}] | .[] | [(if (."access_type" | length > 0) then . else ."access_type" = [""] end)] | .[] | [.id, (."access_type" | @tsv)] | @csv' | sed -e "s/[\t]\+/|/g"
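A simpler equivalent does the joining in jq itself, which makes the @tsv/sed round trip (and the empty-array guard) unnecessary, since join on an empty array yields "" (a sketch against the sample input above):
jq -r '.[] | [.id, ([.PolicyItems[].accesses[].type] | join("|"))] | @csv' ranger_v2.json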
input
{
"apps": [
{
"name": "whatever1",
"id": "ID1"
},
{
"name": "whatever2",
"id": "ID2",
"dep": [
"a.jar"
]
},
{
"name": "whatever3",
"id": "ID3",
"dep": [
"a.jar",
"b.jar"
]
}
]
}
output
{
"apps": [
{
"name": "whatever1",
"id": "ID1",
"dep": [
"b.jar"
]
},
{
"name": "whatever2",
"id": "ID2",
"dep": [
"a.jar",
"b.jar"
]
},
{
"name": "whatever3",
"id": "ID3",
"dep": [
"a.jar",
"b.jar"
]
}
]
}
In the above example:
whatever1 does not have dep, so create one.
whatever2 has dep but does not contain b.jar, so add b.jar.
whatever3 already has dep with b.jar in it, so it is left untouched.
What I have tried:
# adds blindly; whatever3 ends up with b.jar twice
cat dep.json | jq '.apps[].dep += ["b.jar"]'
# misses one level, and whatever3 disappears from the output
cat dep.json | jq '.apps | map(select(.dep == null or (.dep | contains(["b.jar"]) | not)))[] | .dep += ["b.jar"]'
For the sake of clarity, let's define a helper function for performing the core task:
# It is assumed that the input is an object that either
# does not have the specified key at all, or whose value
# at that key is an array
def ensure_has($key; $value):
if has($key) and (.[$key] | index($value)) then .
else .[$key] += [$value]
end ;
The task can now be accomplished in a straightforward way:
.apps |= map(ensure_has("dep"; "b.jar"))
Alternatively ...
.apps[] |= ensure_has("dep"; "b.jar")
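For completeness, a full invocation against the file from the question could look like this:
jq 'def ensure_has($key; $value):
      if has($key) and (.[$key] | index($value)) then .
      else .[$key] += [$value]
      end;
    .apps |= map(ensure_has("dep"; "b.jar"))' dep.json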
After some trial and error, it looks like this is one way to do it:
cat dep.json | jq '.apps[].dep |= (. + ["b.jar"] | unique)'
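Note that unique also sorts the array, so existing entries may be reordered. If the original order should be preserved, a variant that only appends when the entry is missing (this also handles a missing dep, since both index and + treat null gracefully):
jq '.apps[].dep |= (if index("b.jar") then . else . + ["b.jar"] end)' dep.json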