JQ - Groupby and join using mapping - arrays

Submitted my first question and had hoped to apply it to the bigger JSON file but I am just not getting it.
Using JQ I am trying to turn this JSON:
[{"field": "F1","results": [{"details": [
{"name": "P1","matches": [
{"displayName": "User1","smtpAddress": "user1@foo.bar"},
{"displayName": "User2","smtpAddress": "user2@foo.bar"}
]
},
{"name": "P2","matches": [
{"displayName": "User3","smtpAddress": "user3@foo.bar"},
{"displayName": "User4","smtpAddress": "user4@foo.bar"}
]
}]}]},
{"field": "F2","results": [{"details": [
{"name": "P3","matches": [
{"displayName": "User1","smtpAddress": "user1@foo.bar"},
{"displayName": "User5","smtpAddress": "user5@foo.bar"}
]
},
{"name": "P4","matches": [
{"displayName": "User6","smtpAddress": "user6@foo.bar"},
{"displayName": "User7","smtpAddress": "user7@foo.bar"}
]
}]}]}]
into CSV like this:
"F1","P1","User1 <user1@foo.bar>;User2 <user2@foo.bar>"
"F1","P2","User3 <user3@foo.bar>;User4 <user4@foo.bar>"
"F2","P3","User1 <user1@foo.bar>;User5 <user5@foo.bar>"
"F2","P4","User6 <user6@foo.bar>;User7 <user7@foo.bar>"
I cannot get map to respect the nested sub-array. Any explanation is appreciated.

jq -r '.[]
| .field as $field
| (.results[].details[]
| [$field, .name] +
[([.matches[] | "\(.displayName) <\(.smtpAddress)>"] | join(";")) ])
| @csv'
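For comparison, here is a minimal sketch of the same transformation that uses map on the nested matches array. The sample is trimmed from the input above, and the shell variable names data and csv are made up for this example. The key point is that .matches is mapped inside the row array, so each details entry yields exactly one CSV row.

```shell
# Trimmed sample shaped like the input above (one field, one details entry).
data='[{"field":"F1","results":[{"details":[{"name":"P1","matches":[{"displayName":"User1","smtpAddress":"user1@foo.bar"},{"displayName":"User2","smtpAddress":"user2@foo.bar"}]}]}]}]'

csv=$(printf '%s' "$data" | jq -r '
  .[]
  | .field as $field            # remember the outer field before descending
  | .results[].details[]        # one output row per details entry
  | [ $field,
      .name,
      (.matches | map("\(.displayName) <\(.smtpAddress)>") | join(";")) ]
  | @csv')

printf '%s\n' "$csv"
```

Because map(...) returns an array, join(";") collapses it into a single string before @csv sees it.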

How to filter jCal with jq?

I have a jCal JSON array which I'd like to filter with jq. JSON arrays are somewhat new to me and I have been banging my head against the wall on this for hours...
The file looks like this:
[
"vcalendar",
[
[
"calscale",
{},
"text",
"GREGORIAN"
],
[
"version",
{},
"text",
"2.0"
],
[
"prodid",
{},
"text",
"-//SabreDAV//SabreDAV//EN"
],
[
"x-wr-calname",
{},
"unknown",
"Call log private"
],
[
"x-apple-calendar-color",
{},
"unknown",
"#ffaa00"
],
[
"refresh-interval",
{},
"duration",
"PT4H"
],
[
"x-published-ttl",
{},
"unknown",
"PT4H"
]
],
[
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-02-18T16:44:04Z"
],
[
"uid",
{},
"text",
"9b23142b-8d86-3e17-2f44-2bed65b2e471"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +49xxxxxxxxxx lasted for 0 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +49xxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-02-18T10:58:12Z"
],
[
"dtend",
{},
"date-time",
"2015-02-18T10:58:44Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
],
[
"vevent",
[
[
"dtstamp",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"created",
{},
"date-time",
"2015-01-09T19:12:05Z"
],
[
"uid",
{},
"text",
"c337e092-a012-5f5a-497f-932fbc6159e5"
],
[
"last-modified",
{},
"date-time",
"2015-04-05T16:42:10Z"
],
[
"description",
{},
"text",
"Phone call to +1xxxxxxxxxx lasted for 39 seconds."
],
[
"summary",
{},
"text",
"Outgoing: +1xxxxxxxxxx"
],
[
"dtstart",
{},
"date-time",
"2015-01-09T17:23:16Z"
],
[
"dtend",
{},
"date-time",
"2015-01-09T17:24:19Z"
],
[
"transp",
{},
"text",
"OPAQUE"
]
],
[]
]
]
]
I would like to filter out dtstart, dtend, the target phone number and the connection duration from the description for each vevent which was created e.g. in January 2019 ("2019-01.*") and output them as a CSV.
This JSON is a bit strange because the information is stored position-based in an array instead of an object.
Using the first element of an array ("vevent") to identify its contents is not the best practice.
But anyway ... if this is the data source you are dealing with, this code should help you.
jq -r '..
| arrays
| select(.[0] == "vevent")[1]
| [
(.[] | select(.[0] == "dtstart") | .[3]),
(.[] | select(.[0] == "dtend") | .[3]),
(.[] | select(.[0] == "description") | .[3])
]
| @csv
'
Alternatively, the repeated code can be moved into a function:
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
..
| arrays
| select(.[0] == "vevent")[1]
| [ getField("dtstart"; 3), getField("dtend"; 3), getField("description"; 3) ]
| @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","Phone call to +49xxxxxxxxxx lasted for 0 seconds."
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to +1xxxxxxxxxx lasted for 39 seconds."
You can also extract phone number and duration with the help of regular expressions in jq:
jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
..
| arrays
| select(.[0] == "vevent")[1]
| [
getField("dtstart"; 3),
getField("dtend"; 3),
(getField("description"; 3) | match("call to ([^ ]*)") | .captures[0].string),
(getField("description"; 3) | match("(\\d+) seconds") | .captures[0].string)
]
| @csv
'
Output
"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","+49xxxxxxxxxx","0"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","+1xxxxxxxxxx","39"
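The question also asks to keep only events created in a given month. A hedged sketch of how that could be added to the getField approach: put a select on the "created" field before building the row. The calendar below is a cut-down, hypothetical version of the file from the question (only created and dtstart are kept), and the variable names cal and rows are assumptions for this example.

```shell
# Cut-down calendar: two vevents, each with only "created" and "dtstart".
cal='["vcalendar",[],[["vevent",[["created",{},"date-time","2015-02-18T16:44:04Z"],["dtstart",{},"date-time","2015-02-18T10:58:12Z"]],[]],["vevent",[["created",{},"date-time","2015-01-09T19:12:05Z"],["dtstart",{},"date-time","2015-01-09T17:23:16Z"]],[]]]]'

rows=$(printf '%s' "$cal" | jq -r '
  def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
  ..
  | arrays
  | select(.[0] == "vevent")[1]
  # keep only events whose "created" timestamp starts with the wanted month
  | select(getField("created"; 3) | startswith("2015-01"))
  | [ getField("created"; 3), getField("dtstart"; 3) ]
  | @csv')

printf '%s\n' "$rows"
```

An event without a "created" property produces no value from getField, so the select drops it as well.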
This is not the most efficient solution, but it is easy to follow: first build an object out of the key-value pairs, then filter and transform those objects.
.[2][][1] is a stream of events encoded as arrays.
This means that the filter:
.[2][][1]
| map({key:.[0], value:.[3]})
| from_entries
gives you a stream of objects, one object per event:
{
"dtstamp": "2015-04-05T16:42:10Z",
"created": "2015-02-18T16:44:04Z",
"uid": "9b23142b-8d86-3e17-2f44-2bed65b2e471",
"last-modified": "2015-04-05T16:42:10Z",
"description": "Phone call to +49xxxxxxxxxx lasted for 0 seconds.",
"summary": "Outgoing: +49xxxxxxx",
"dtstart": "2015-02-18T10:58:12Z",
"dtend": "2015-02-18T10:58:44Z",
"transp": "OPAQUE"
}
{
"dtstamp": "2015-04-05T16:42:10Z",
"created": "2015-01-09T19:12:05Z",
"uid": "c337e092-a012-5f5a-497f-932fbc6159e5",
"last-modified": "2015-04-05T16:42:10Z",
"description": "Phone call to +1xxxxxxxxxx lasted for 39 seconds.",
"summary": "Outgoing: +1xxxxxxxxxx",
"dtstart": "2015-01-09T17:23:16Z",
"dtend": "2015-01-09T17:24:19Z",
"transp": "OPAQUE"
}
Now plug that into the final program: select the wanted objects, add CSV headers, build the rows and ultimately convert to CSV:
["start", "end", "description"],
(
.[2][][1]
| map({key:.[0], value:.[3]})
| from_entries
| select(.created | startswith("2015-01"))
| [.dtstart, .dtend, .description]
)
| @csv
Raw output (-r):
"start","end","description"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to +1xxxxxxxxxx lasted for 39 seconds."
If you need to further transform .description, you can use split or capture. Or use a different property, such as .summary, in your CSV rows. Only a single line needs to be changed.
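As a sketch of the capture route, assuming the description always follows the "Phone call to X lasted for N seconds." pattern seen in the sample data: capture with named groups returns an object, whose fields can then go straight into the CSV row. The event object and the names event, row, number and seconds are made up for this example.

```shell
# Hypothetical event object, shaped like one result of from_entries.
event='{"dtstart":"2015-01-09T17:23:16Z","dtend":"2015-01-09T17:24:19Z","description":"Phone call to +1xxxxxxxxxx lasted for 39 seconds."}'

row=$(printf '%s' "$event" | jq -r '
  # capture returns an object keyed by the named groups
  (.description
   | capture("call to (?<number>[^ ]+) lasted for (?<seconds>\\d+) seconds")) as $c
  | [ .dtstart, .dtend, $c.number, ($c.seconds | tonumber) ]
  | @csv')

printf '%s\n' "$row"
```

Converting the duration with tonumber makes @csv emit it unquoted; drop that step if a quoted string is preferred.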

JQ - access nested square brackets with fields with no names

I am trying to access a field in the list array via jq. The fields don't have names that I could use to access and extract them. Could you please assist?
Trying to extract John and Smith.
$ cat test.txt
{
"content": {
"list": [
[
[
"name",
"John",
123
],
[
"surname",
"Smith",
345
],
1
]
]
}
}
$ jq -r '.content | {name: ."list"}' test.txt
{
"name": [
[
[
"name",
"John",
123
],
[
"surname",
"Smith",
345
],
1
]
]
}
You could do something as naive as:
$ jq -r '.content.list[][][1]?' test.txt
John
Smith
This extracts the second element of each third-level nested array; the trailing ? suppresses the error that indexing the bare number 1 would otherwise raise.
Alternatively, you could reshape the data beforehand to make it easier to work with:
$ jq '.content.list | map(map({ (.[0]): .[1] }?) | add)'
[
{
"name": "John",
"surname": "Smith"
}
]
Extracting the name(s) would then be as simple as appending | .[].name:
$ jq '.content.list | map(map({ (.[0]): .[1] }?) | add) | .[].name'
"John"
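A small variation, sketched under the assumption that every entry of interest is a two-or-more-element array: instead of relying on ? to swallow the error from the bare number, the non-array entries can be filtered out explicitly, and both values read from the resulting object. The variable names json and full are made up for this example.

```shell
# Same document as test.txt above, inlined for the example.
json='{"content":{"list":[[["name","John",123],["surname","Smith",345],1]]}}'

full=$(printf '%s' "$json" | jq -r '
  .content.list[]
  | map(select(type == "array") | {(.[0]): .[1]})   # skip the bare number explicitly
  | add                                             # merge into one object
  | "\(.name) \(.surname)"')

printf '%s\n' "$full"
```

Filtering by type makes the intent visible, whereas ? would also hide unrelated errors.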

Flatten a hierarchical JSON array using JQ

Can anyone help me get the correct jq command to flatten the below example? I've seen a few other posts and I'm hacking away at it but can't seem to get it. I'd greatly appreciate any help.
Input:
[
{
"name": "level1",
"children": [
{
"name": "level2",
"children": [
{
"name": "level3-1",
"children": []
},
{
"name": "level3-2",
"children": []
}
]
}
]
}
]
Output:
[
{
"displayName": "level1",
"parent": ""
},
{
"displayName": "level2",
"parent": "level1"
},
{
"displayName": "level3-1",
"parent": "level2"
},
{
"displayName": "level3-2",
"parent": "level2"
}
]
Here's a straightforward solution that does not involve a helper function and actually solves a more general problem. It is based on the idea of beginning by adding a "parent" key to each child, and then using .. to collect all the name/parent pairs.
So first consider:
[ walk(if type=="object" and has("children")
then .name as $n | .children |= map(.parent = $n)
else . end)
| ..
| select(type=="object" and has("name"))
| {displayName: .name, parent}
]
This meets the requirements except that for the top-level (parentless) object, it produces a .parent value of null. That would generally be more JSON-esque than "", but if the empty string is really required, one has simply to replace the last non-trivial line above by:
| {displayName: .name, parent: (.parent // "")}
With a simple recursive function:
def f: .name as $parent | .children[] | {$parent, displayName: .name}, f;
[ {name: "", children: .} | f ]
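To see the recursive version at work, here is a sketch that runs it against the input from the question (inlined in a shell variable named input, a name chosen only for this example). The parentheses around the object and the recursive call are added for readability; jq parses the original the same way, because the comma binds tighter than the pipe.

```shell
input='[{"name":"level1","children":[{"name":"level2","children":[{"name":"level3-1","children":[]},{"name":"level3-2","children":[]}]}]}]'

flat=$(printf '%s' "$input" | jq -c '
  # For each child: emit its name/parent pair, then recurse into that child.
  def f: .name as $parent
       | .children[]
       | ({parent: $parent, displayName: .name}, f);
  # Wrap the top-level array in a synthetic root whose name is "".
  [ {name: "", children: .} | f ]')

printf '%s\n' "$flat"
```

The synthetic root is what gives the top-level entries the empty-string parent without any special casing.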

JQ - return one array for multiple nested JSON arrays

I have a JSON structure that has repeated keys per message. I would like to combine these into one array per message.
[
{
"id": 1,
"PolicyItems": [
{
"accesses": [
{
"isAllowed": true,
"type": "drop"
},
{
"isAllowed": true,
"type": "select"
}
],
"groups": [],
"users": ["admin"]
}
]
},
{
"id": 2,
"PolicyItems": [
{
"accesses": [
{
"isAllowed": true,
"type": "drop"
},
{
"isAllowed": true,
"type": "update"
}
],
"groups": [],
"users": [
"admin",
"admin2"
]
}
]
}]
I have this:
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":(.PolicyItems[].accesses[] | .type)}]'
But this outputs:
[
{
"id": 1,
"access_type": "drop"
},
{
"id": 1,
"access_type": "select"
},
{
"id": 2,
"access_type": "drop"
},
{
"id": 2,
"access_type": "update"
}
]
However, what I want is to output:
[{
"id": 1,
"access_type": ["drop|select"]
},
{
"id": 2,
"access_type": ["drop|update"]
}]
Any ideas how I could do this? I'm a bit stumped!
The values could be 'drop' and 'select', but equally could be anything, so I don't want to hard code these.
Let's start by observing that with your input, the filter:
.[]
| {id, access_type: [.PolicyItems[].accesses[].type]}
produces the two objects:
{
"id": 1,
"access_type": [
"drop",
"select"
]
}
{
"id": 2,
"access_type": [
"drop",
"update"
]
}
Now it's a simple matter to tweak the above filter so as to produce the desired format:
[.[]
| {id, access_type: [.PolicyItems[].accesses[].type]}
| .access_type |= [join("|")] ]
Or equivalently, the one-liner:
map({id, access_type: [[.PolicyItems[].accesses[].type] | join("|")]})
I found something that I can work with.
If I wrap the query with []...
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":([.PolicyItems[].accesses[] | .type])}]'
... it produces this type of output:
[
{
"id": 1,
"access_type": ["drop","select"]
},
{
"id": 2,
"access_type": ["drop","update"]
}
]
I can then use the following:
[(if (."access_type" | length > 0 ) then . else ."access_type" = [""] end )]
and
(."access_type" | @tsv)
before converting to @csv and using sed to replace the tab with a pipe:
@csv' | sed -e "s/[\t]\+/|/g"
It may not be the most economical way of getting what I need, but it works for me. (Please let me know if there's a better way of doing it.)
cat ranger_v2.json | jq -r '[.[] | {"id", "access_type":([.PolicyItems[].accesses[] | .type])}] | .[] | [(if (."access_type" | length > 0 ) then . else ."access_type" = [""] end )] | .[] | [.id, (."access_type" | @tsv)] | @csv' | sed -e "s/[\t]\+/|/g"
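A possibly simpler route, sketched here under the assumption that the goal is one CSV line per policy with the types joined by a pipe: join("|") inside jq removes the need for @tsv and sed, and it returns an empty string for an empty list, so the length check is not needed either. The sample is a reduced, hypothetical version of the input, and the variable names policies and csv are made up for this example.

```shell
# Reduced sample: policy 2 has no accesses at all.
policies='[{"id":1,"PolicyItems":[{"accesses":[{"isAllowed":true,"type":"drop"},{"isAllowed":true,"type":"select"}]}]},{"id":2,"PolicyItems":[{"accesses":[]}]}]'

csv=$(printf '%s' "$policies" | jq -r '
  .[]
  | [ .id,
      ([.PolicyItems[].accesses[].type] | join("|")) ]   # "" when the list is empty
  | @csv')

printf '%s\n' "$csv"
```

Doing the join inside jq also avoids depending on how a particular sed treats \t and \+.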

How to add an element to a list only if it does not already exist, creating the list if it is null?

input
{
"apps": [
{
"name": "whatever1",
"id": "ID1"
},
{
"name": "whatever2",
"id": "ID2",
"dep": [
"a.jar"
]
},
{
"name": "whatever3",
"id": "ID3",
"dep": [
"a.jar",
"b.jar"
]
}
]
}
output
{
"apps": [
{
"name": "whatever1",
"id": "ID1",
"dep": [
"b.jar"
]
},
{
"name": "whatever2",
"id": "ID2",
"dep": [
"a.jar",
"b.jar"
]
},
{
"name": "whatever3",
"id": "ID3",
"dep": [
"a.jar",
"b.jar"
]
}
]
}
In the above example:
whatever1 does not have dep, so create one.
whatever2 has dep but does not have b.jar, so add b.jar.
whatever3 already has dep and b.jar is there, so it is left untouched.
What I have tried:
# add blindly, whatever3 is not right
cat dep.json | jq '.apps[].dep += ["b.jar"]'
# missed one level and whatever3 is gone.
cat dep.json | jq '.apps | map(select(.dep == null or (.dep | contains(["b.jar"]) | not)))[] | .dep += ["b.jar"]'
For the sake of clarity, let's define a helper function for performing the core task:
# It is assumed that the input is an object
# that either does not have the specified key or
# whose value at that key is an array
def ensure_has($key; $value):
if has($key) and (.[$key] | index($value)) then .
else .[$key] += [$value]
end ;
The task can now be accomplished in a straightforward way:
.apps |= map(ensure_has("dep"; "b.jar"))
Alternatively ...
.apps[] |= ensure_has("dep"; "b.jar")
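Here is a sketch of the helper run end to end against a reduced version of the input (only id and dep are kept; the variable names apps and result are made up for this example). The final projection to the dep arrays is there only to keep the output short.

```shell
apps='{"apps":[{"id":"ID1"},{"id":"ID2","dep":["a.jar"]},{"id":"ID3","dep":["a.jar","b.jar"]}]}'

result=$(printf '%s' "$apps" | jq -c '
  def ensure_has($key; $value):
    if has($key) and (.[$key] | index($value)) then .
    else .[$key] += [$value]      # null + [x] yields [x], so a missing key is created
    end;
  .apps[] |= ensure_has("dep"; "b.jar")
  | [.apps[].dep]')               # show only the dep arrays

printf '%s\n' "$result"
```

Note that index returns a position, and position 0 still counts as true in jq, since only false and null are falsy.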
After some trial and error, it looks like this is one way to do it:
cat dep.json | jq '.apps[].dep |= (. + ["b.jar"] | unique)'
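One thing to be aware of with the unique approach: unique sorts the array and removes any duplicates that were already there, so existing entries can change order. If order matters, a sketch of an order-preserving variant is shown next to it below. The sample document and the variable names doc, sorted and kept are assumptions for this example.

```shell
doc='{"apps":[{"id":"ID1","dep":["z.jar","a.jar"]},{"id":"ID2"}]}'

# unique sorts, so z.jar moves to the end.
sorted=$(printf '%s' "$doc" | jq -c '
  .apps[].dep |= (. + ["b.jar"] | unique)
  | [.apps[].dep]')

# Order-preserving: append only when b.jar is not already present.
kept=$(printf '%s' "$doc" | jq -c '
  .apps[].dep |= ((. // []) | if index("b.jar") then . else . + ["b.jar"] end)
  | [.apps[].dep]')

printf '%s\n%s\n' "$sorted" "$kept"
```

Both variants create the dep array when it is missing; they differ only in how they treat the existing entries.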
