I'm trying to filter and output from JSON with jq.
The API will sometime return an object and sometime an array, I want to catch the result using an if statement and return empty string when the object/array is not available.
{
"result":
{
"entry": {
"id": "207579",
"title": "Realtek Bluetooth Mesh SDK on Linux\/Android Segmented Packet reference buffer overflow",
"summary": "A vulnerability, which was classified as critical, was found in Realtek Bluetooth Mesh SDK on Linux\/Android (the affected version unknown). This affects an unknown functionality of the component Segmented Packet Handler. There is no information about possible countermeasures known. It may be suggested to replace the affected object with an alternative product.",
"details": {
"affected": "A vulnerability, which was classified as critical, was found in Realtek Bluetooth Mesh SDK on Linux\/Android (the affected version unknown).",
"vulnerability": "The manipulation of the argument reference with an unknown input leads to a unknown weakness. CWE is classifying the issue as CWE-120. The program copies an input buffer to an output buffer without verifying that the size of the input buffer is less than the size of the output buffer, leading to a buffer overflow.",
"impact": "This is going to have an impact on confidentiality, integrity, and availability.",
"countermeasure": "There is no information about possible countermeasures known. It may be suggested to replace the affected object with an alternative product."
},
"timestamp": {
"create": "1661860801",
"change": "1661861110"
},
"changelog": [
"software_argument"
]
},
"software": {
"vendor": "Realtek",
"name": "Bluetooth Mesh SDK",
"platform": [
"Linux",
"Android"
],
"component": "Segmented Packet Handler",
"argument": "reference",
"cpe": [
"cpe:\/a:realtek:bluetooth_mesh_sdk"
],
"cpe23": [
"cpe:2.3:a:realtek:bluetooth_mesh_sdk:*:*:*:*:*:*:*:*"
]
}
}
}
Would also like to to use the statement globally for the whole array output so I can parse it to .csv and escape the null, since sofware name , can also contain an array or an object. Having a global if statement with simplify the syntax result and suppress the error with ?
The error i received from bash
jq -r '.result [] | [ "https://vuldb.com/?id." + .entry.id ,.software.vendor // "empty",(.software.name | if type!="array" then [.] | join (",") else . //"empty" end )?,.software.type // "empty",(.software.platform | if type!="array" then [] else . | join (",") //"empty" end )?] | #csv' > tst.csv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7452 0 7393 100 59 4892 39 0:00:01 0:00:01 --:--:-- 4935
jq: error (at <stdin>:182): Cannot iterate over null (null)
What I have tried is the following code which i tried to demo https://jqplay.org/ which is incorrect syntax
.result [] |( if .[] == null then // "empty" else . end
| ,software.name // "empty" ,.software.platform |if type!="array" then [.] // "empty" else . | join (",") end)
Current output
[
[
"Bluetooth Mesh SDK"
],
"Linux,Android"
]
Desired outcome
[
"Bluetooth Mesh SDK",
"empty"
]
After fixing your input JSON, I think you can get the desired output by using the following JQ filter:
if (.result | type) == "array" then . else (.result |= [.]) end \
| .result[].software | [ .name, (.platform // [ "Empty" ] | join(",")) ]
Where
if (.result | type) == "array" then . else (.result |= [.]) end
Wraps .result in an array if type isn't array
.result[].software
Loops though the software in each .result obj
[ .name, (.platform // [ "Empty" ] | join(",")) ]
Create an array with .name and .platform (which is replaced by [ "Empty" ] when it doesn't exist. Then it's join()'d to a string
Outcome:
[
"Bluetooth Mesh SDK",
"Linux,Android"
]
Online demo
Or
[
"Bluetooth Mesh SDK",
"Empty
]
Online demo
Related
I would like in JQ to substitute the values in the first array with the names in the second array based on the key "size" while keeping the length of the first array unchanged.
[
[{\"size\":0},{\"size\":1},{\"size\":2},{\"size\":3},{\"size\":4},{\"size\":5},{\"size\":6},{\"size\":7},{\"size\":8},{\"size\":9},{\"size\":0},{\"size\":1},{\"size\":2},{\"size\":3},{\"size\":4},{\"size\":5},{\"size\":6},{\"size\":7},{\"size\":8},{\"size\":0},{\"size\":1},{\"size\":2},{\"size\":3},{\"size\":4},{\"size\":5},{\"size\":6},{\"size\":7},{\"size\":8}]
[{\"size\":0,\"name\":\"6M\"},{\"size\":1,\"name\":\"6.5M\"},{\"size\":2,\"name\":\"7M\"},{\"size\":3,\"name\":\"7.5M\"},{\"size\":4,\"name\":\"8M\"},{\"size\":5,\"name\":\"8.5M\"},{\"size\":6,\"name\":\"9M\"},{\"size\":7,\"name\":\"9.5M\"},{\"size\":8,\"name\":\"10M\"},{\"size\":9,\"name\":\"11M\"}]
]
So the wanted result would look like this:
[{\"size\":6M},{\"size\":6.5M},{\"size\":7M},{\"size\":7.5M},{\"size\":8M},{\"size\":8.5M},{\"size\":9M},{\"size\":9.5M},{\"size\":10M},{\"size\":11M},{\"size\":6M},{\"size\":6.5M},{\"size\":7M},{\"size\":7.5M},{\"size\":8M},{\"size\":8.5M},{\"size\":9M},{\"size\":9.5M},{\"size\":10M},{\"size\":6M},{\"size\":6.5M},{\"size\":7M},{\"size\":7.5M},{\"size\":8M},{\"size\":8.5M},{\"size\":9M},{\"size\":9.5M},{\"size\":10M}]
I used this input to obtain the above arrays jq "[ .DATA.product.traits|([.colors.colorMap[]| {size:.sizes[]}]), ( [ .sizes.sizeMap[]| {size:.id, name}])]" and then tried using if-then-else statement (|if (.[0:-1][][].size == .[-1][].size) then [.[-1][].name ] else null end") but I was unable to reach the desired output..
Given the input
[
[
{"size":0}, {"size":1}, {"size":2}, {"size":3}, {"size":4},
{"size":5}, {"size":6}, {"size":7}, {"size":8}, {"size":9},
{"size":0}, {"size":1}, {"size":2}, {"size":3}, {"size":4},
{"size":5}, {"size":6}, {"size":7}, {"size":8}, {"size":0},
{"size":1}, {"size":2}, {"size":3}, {"size":4}, {"size":5},
{"size":6}, {"size":7}, {"size":8}
],
[
{"size":0,"name":"6M"}, {"size":1,"name":"6.5M"},
{"size":2,"name":"7M"}, {"size":3,"name":"7.5M"},
{"size":4,"name":"8M"}, {"size":5,"name":"8.5M"},
{"size":6,"name":"9M"}, {"size":7,"name":"9.5M"},
{"size":8,"name":"10M"}, {"size":9,"name":"11M"}
]
]
you can generate an INDEX over the second item, and use it for reference in a map on the first
(.[1] | INDEX(.size)) as $map | .[0] | map($map[.size | tostring])
to get
[
{"size":0,"name":"6M"}, {"size":1,"name":"6.5M"},
{"size":2,"name":"7M"}, {"size":3,"name":"7.5M"},
{"size":4,"name":"8M"}, {"size":5,"name":"8.5M"},
{"size":6,"name":"9M"}, {"size":7,"name":"9.5M"},
{"size":8,"name":"10M"}, {"size":9,"name":"11M"},
{"size":0,"name":"6M"}, {"size":1,"name":"6.5M"},
{"size":2,"name":"7M"}, {"size":3,"name":"7.5M"},
{"size":4,"name":"8M"}, {"size":5,"name":"8.5M"},
{"size":6,"name":"9M"}, {"size":7,"name":"9.5M"},
{"size":8,"name":"10M"}, {"size":0,"name":"6M"},
{"size":1,"name":"6.5M"}, {"size":2,"name":"7M"},
{"size":3,"name":"7.5M"}, {"size":4,"name":"8M"},
{"size":5,"name":"8.5M"}, {"size":6,"name":"9M"},
{"size":7,"name":"9.5M"}, {"size":8,"name":"10M"}
]
Demo
I am trying to flatten a JSON file to be able to load it into PostgreSQL all in AWS Glue. I am using PySpark. Using a crawler I crawl the S3 JSON and produce a table. I then use an ETL Glue script to:
read the crawled table
use the 'Relationalize' function to flatten the file
convert the dynamic frame to a dataframe
try to 'explode' the request.data field
Script so far:
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = glue_source_database, table_name = glue_source_table, transformation_ctx = "datasource0")
df0 = Relationalize.apply(frame = datasource0, staging_path = glue_temp_storage, name = dfc_root_table_name, transformation_ctx = "dfc")
df1 = df0.select(dfc_root_table_name)
df2 = df1.toDF()
df2 = df1.select(explode(col('`request.data`')).alias("request_data"))
<then i write df1 to a PostgreSQL database which works fine>
Issues I face:
The 'Relationalize' function works well except the request.data field which becomes a bigint and therefore 'explode' doesn't work.
Explode cannot be done without using 'Relationalize' on the JSON first due to the structure of the data. Specifically the error is: "org.apache.spark.sql.AnalysisException: cannot resolve 'explode(request.data)' due to data type mismatch: input to function explode should be array or map type, not bigint"
If I try to make the dynamic frame a dataframe first then I get this issue: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.jdbc.
: java.lang.IllegalArgumentException: Can't get JDBC type for struct..."
I tried to also upload a classifier so that the data would flatten in the crawl itself but AWS confirmed this wouldn't work.
The JSON format of the original file is as follows, that I an trying to normalise:
- field1
- field2
- {}
- field3
- {}
- field4
- field5
- []
- {}
- field6
- {}
- field7
- field8
- {}
- field9
- {}
- field10
# Flatten nested df
def flatten_df(nested_df):
for col in nested_df.columns:
array_cols = [c[0] for c in nested_df.dtypes if c[1][:5] == 'array']
for col in array_cols:
nested_df =nested_df.withColumn(col, F.explode_outer(nested_df[col]))
nested_cols = [c[0] for c in nested_df.dtypes if c[1][:6] == 'struct']
if len(nested_cols) == 0:
return nested_df
flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct']
flat_df = nested_df.select(flat_cols +
[F.col(nc+'.'+c).alias(nc+'_'+c)
for nc in nested_cols
for c in nested_df.select(nc+'.*').columns])
return flatten_df(flat_df)
df=flatten_df(df)
It will replace all dots with underscore. Note that it uses explode_outer and not explode to include Null value in case array itself is null. This function is available in spark v2.4+ only.
Also remember, exploding array will add more duplicates and overall row size will increase. Flattening struct will increase column size. In short, your original df will explode horizontally and vertically. It may slow down processing data later.
Therefore my recommendation would be to identify feature related data and store only those data in postgresql and original json files in s3.
Once you have rationalized the json column, you don't need to explode it. Relationalize transforms the nested JSON into key-value pairs at the outermost level of the JSON document. The transformed data maintains a list of the original keys from the nested JSON separated by periods.
Example :
Nested json :
{
"player": {
"username": "user1",
"characteristics": {
"race": "Human",
"class": "Warlock",
"subclass": "Dawnblade",
"power": 300,
"playercountry": "USA"
},
"arsenal": {
"kinetic": {
"name": "Sweet Business",
"type": "Auto Rifle",
"power": 300,
"element": "Kinetic"
},
"energy": {
"name": "MIDA Mini-Tool",
"type": "Submachine Gun",
"power": 300,
"element": "Solar"
},
"power": {
"name": "Play of the Game",
"type": "Grenade Launcher",
"power": 300,
"element": "Arc"
}
},
"armor": {
"head": "Eye of Another World",
"arms": "Philomath Gloves",
"chest": "Philomath Robes",
"leg": "Philomath Boots",
"classitem": "Philomath Bond"
},
"location": {
"map": "Titan",
"waypoint": "The Rig"
}
}
}
Flattened out json after rationalize :
{
"player.username": "user1",
"player.characteristics.race": "Human",
"player.characteristics.class": "Warlock",
"player.characteristics.subclass": "Dawnblade",
"player.characteristics.power": 300,
"player.characteristics.playercountry": "USA",
"player.arsenal.kinetic.name": "Sweet Business",
"player.arsenal.kinetic.type": "Auto Rifle",
"player.arsenal.kinetic.power": 300,
"player.arsenal.kinetic.element": "Kinetic",
"player.arsenal.energy.name": "MIDA Mini-Tool",
"player.arsenal.energy.type": "Submachine Gun",
"player.arsenal.energy.power": 300,
"player.arsenal.energy.element": "Solar",
"player.arsenal.power.name": "Play of the Game",
"player.arsenal.power.type": "Grenade Launcher",
"player.arsenal.power.power": 300,
"player.arsenal.power.element": "Arc",
"player.armor.head": "Eye of Another World",
"player.armor.arms": "Philomath Gloves",
"player.armor.chest": "Philomath Robes",
"player.armor.leg": "Philomath Boots",
"player.armor.classitem": "Philomath Bond",
"player.location.map": "Titan",
"player.location.waypoint": "The Rig"
}
Thus in your case, request.data is already a new column flattened out from request column and its type is interpreted as bigint by spark.
Reference : Simplify/querying nested json with the aws glue relationalize transform
I've some data in a file called myfile.json. I need to format using jq - in JSON it looks like this ;
{
"result": [
{
"service": "ebsvolume",
"name": "gtest",
"resourceIdentifier": "vol-999999999999",
"accountName": "g-test-acct",
"vendorAccountId": "12345678912",
"availabilityZone": "ap-southeast-2c",
"region": "ap-southeast-2",
"effectiveHourly": 998.56,
"totalSpend": 167.7,
"idle": 0,
"lastSeen": "2018-08-16T22:00:00Z",
"volumeType": "io1",
"state": "in-use",
"volumeSize": 180,
"iops": 2000,
"throughput": 500,
"lastAttachedTime": "2018-08-08T22:00:00Z",
"lastAttachedId": "i-086f957ee",
"recommendations": [
{
"action": "Rightsize",
"preferenceOrder": 2,
"risk": 0,
"savingsPct": 91,
"savings": 189.05,
"volumeType": "gp2",
"volumeSize": 120,
},
{
"action": "Rightsize",
"preferenceOrder": 4,
"risk": 0,
"savingsPct": 97,
"savings": 166.23,
"volumeType": "gp2",
"volumeSize": 167,
},
{
"action": "Rightsize",
"preferenceOrder": 6,
"risk": 0,
"savingsPct": 91,
"savings": 111.77,
"volumeType": "gp2",
"volumeSize": 169,
}
]
}
}
I have it formatted better with the following
jq '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId] |#csv' ./myfile.json
This nets the following output ;
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\""
I figured out this but its not exactly what I am trying to achieve. I want to have each recommendation listed underneath on a seperate line, and not at the end of the same line.
jq '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId,.recommendations[].action] |#csv' ./myfile.json
This nets :
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",\"Rightsize\",\"Rightsize\",\"Rightsize\""
What I want is
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",
\"Rightsize\",
\"Rightsize\",
\"Rightsize\""
So not entirely sure how to deal with the array inside the "recommendations" section in jq, I think it might be called unflattening?
You can try this:
jq '.result[] | [ flatten[] | try(.action) // . ] | #csv' file
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",\"Rightsize\",\"Rightsize\",\"Rightsize\""
flatten does what it says.
try tests if .action is neither null nor false. If so, it emits its value, otherwise jq emits the other value (operator //).
The filtered values are put into an array in order to get them converted with the #csv operator.
That didn't overly work for me actually it omitted all the data in the previous array - but thanks!
I ended up with the following, granted it doesn't put the Rightsize details on a seperate line but it will have to do:
jq -r '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId,.recommendations[][]] |#csv' ./myfile.json
So, I'm trying to set up check_json.pl in NagiosXI to monitor some statistics. https://github.com/c-kr/check_json
I'm using the code with the modification I submitted in pull request #32, so line numbers reflect that code.
The json query returns something like this:
[
{
"total_bytes": 123456,
"customer_name": "customer1",
"customer_id": "1",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
},
],
"total": "765.43gb"
},
{
"total_bytes": 123456,
"customer_name": "customer2",
"customer_id": "2",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
},
],
"total": "765.43gb"
}
]
I'm trying to monitor the sized of specific files. so a check should look something like:
/path/to/check_json.pl -u https://path/to/my/json -a "SOMETHING" -p "SOMETHING"
...where I'm trying to figure out the SOMETHINGs so that I can monitor the total_bytes of filename1 in customer2 where I know the customer_id and index but not their position in the respective arrays.
I can monitor customer1's total bytes by using the string "[0]->{'total_bytes'}" but I need to be able to specify which customer and dig deeper into file name (known) and file size (stat to monitor) AND the working query only gives me the status (OK,WARNING, or CRITICAL). Adding -p all I get are errors....
The error with -p no matter how I've been able to phrase it is always:
Not a HASH reference at ./check_json.pl line 235.
Even when I can get a valid OK from the example "[0]->{'total_bytes'}", using that in -p still gives the same error.
Links pointing to documentation on the format to use would be very helpful. Examples in the README for the script or in the -h output are failing me here. Any ideas?
I really have no idea what your question is. I'm sure I'm not alone, hence the downvotes.
Once you have the decoded json, if you have a customer_id to search for, you can do:
my ($customer_info) = grep {$_->{customer_id} eq $customer_id} #$json_response;
Regarding the error on line 235, this looks odd:
foreach my $key ($np->opts->perfvars eq '*' ? map { "{$_}"} sort keys %$json_response : split(',', $np->opts->perfvars)) {
# ....................................... ^^^^^^^^^^^^^
$perf_value = $json_response->{$key};
if perfvars eq "*", you appear to be looking for $json_reponse->{"{total}"} for example. You might want to validate the user's input:
die "no such key in json data: '$key'\n" unless exists $json_response->{$key};
This entire business of stringifying the hash ref lookups just smells bad.
A better question would look like:
I have this JSON data. How do I get the sum of total_bytes for the customer with id 1?
See https://stackoverflow.com/help/mcve
I am trying to do a parsing of a long file like this (the output of the command play in Linux):
File :1.mp3
In:0.00% 00:00:00.00 [00:03:14.51] Out:0 [ | ]
In:0.19% 00:00:00.37 [00:03:14.14] Out:16.4k [ | ]
In:0.29% 00:00:00.56 [00:03:13.95] Out:24.6k [======|======]
In:0.33% 00:00:00.65 [00:03:13.86] Out:28.7k [ =====|===== ]
In:0.43% 00:00:00.84 [00:03:13.67] Out:36.9k [ =====|===== ]
In:0.53% 00:00:01.02 [00:03:13.49] Out:45.1k [ -====|===== ]
In:0.62% 00:00:01.21 [00:03:13.30] Out:53.2k [ =====|===== ]
In:0.72% 00:00:01.39 [00:03:13.11] Out:61.4k [-=====|======]
In:0.81% 00:00:01.58 [00:03:12.93] Out:69.6k [-=====|=====-]
In:0.91% 00:00:01.76 [00:03:12.74] Out:77.8k [-=====|=====-]
In:0.96% 00:00:01.86 [00:03:12.65] Out:81.9k [ =====|===== ]
And so on
I would like to parse the percentage number.
How can i do it without saving the file into(because is too large ~ 100KB) a String.
i thought with this regular expression :"In:(\d{1,2}\.\d{2})"
how to do it?
Try this regex:
/^In:([0-9]{1,3}\.[0-9]{1,2})\%/gm
Explanation:
/
^ Matches start of string.
In: Matches "In:".
( ) Groups percentage (excl. sign).
[0-9]{1,3} Matches 1-3 (incl.) numbers.
\. Matches a dot.
[0-9]{1,2} Matches 1-2 (incl.) numbers.
\% Matches a percent sign.
/gm Allows multiple matches and makes ^ match beginning of line (not beginning of string), respectively.