Snowflake Retrieve value from Semi Structured Data - arrays

I'm trying to retrieve the health value from Snowflake semi-structured data in a VARIANT column called extra in table X.
An example of the data can be seen below:
[
{
"party":
"[{\"class\":\"Farmer\",\"gender\":\"Female\",\"ethnicity\":\"NativeAmerican\",\"health\":2},
{\"class\":\"Adventurer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":3},
{\"class\":\"Farmer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":0},
{\"class\":\"Banker\",\"gender\":\"Female\",\"ethnicity\":\"White\",\"health\":0}
}
]
I have read the Snowflake documentation at https://community.snowflake.com/s/article/querying-semi-structured-data
I have also tried the following queries to flatten the data:
SELECT result.value:health AS PartyHealth
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PartyHealth > 0,
TABLE(FLATTEN(X, 'party')) result
and:
SELECT [0]['party'][0]['health'] AS Health
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PH > 0;
I am trying to retrieve the health value from the extra column of table X. The column contains the variant party, which has 4 repeating elements [0-3]. I'm not sure how to do this. Can someone tell me how to query semi-structured data in Snowflake? The documentation didn't make much sense to me.

First, the JSON value you posted seems to be malformed (it might be a copy-paste issue).
Here's an example that works.
First, your JSON, correctly formatted:
[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]
Create a table to test:
CREATE OR REPLACE TABLE myvariant (v variant);
Insert the JSON value into this table:
INSERT INTO myvariant SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]');
Now, to select a value, start from the column name (v in my case). Because your JSON is an array at the top level, specify the first element with [0] and drill down from there, like this:
SELECT v[0]:party[0].health FROM myvariant;
The query above returns 2.
For the other array elements you can simply do:
SELECT v[0]:party[1].health FROM myvariant;
SELECT v[0]:party[2].health FROM myvariant;
SELECT v[0]:party[3].health FROM myvariant;
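The escaped quotes in the question's sample suggest that party may be stored as a string that itself contains JSON, in which case the inner value has to be parsed a second time before it can be indexed (in Snowflake, presumably by wrapping the inner value in PARSE_JSON). The Python sketch below only illustrates that two-step structure outside the database; the trimmed sample data and the variable names are assumptions, not the asker's real table.

```python
import json

# Hypothetical sample: "party" is stored as a JSON-encoded string
# (double-encoded), as the escaped quotes in the question suggest.
raw = json.dumps([{"party": json.dumps([
    {"class": "Farmer", "health": 2},
    {"class": "Adventurer", "health": 3},
    {"class": "Farmer", "health": 0},
    {"class": "Banker", "health": 0},
])}])

outer = json.loads(raw)                # first parse: the outer array
party = json.loads(outer[0]["party"])  # second parse: the embedded string

# Equivalent of reading party[0..3].health, then keeping values > 0.
health_values = [member["health"] for member in party]
positive = [h for h in health_values if h > 0]

print(health_values)
print(positive)
```

If party is already a real nested array, the second parse is unnecessary and the path expressions above apply directly.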

Another option might be to make the data more like a table ... I find it easier to work with than the JSON :-)
The code below can be copied and pasted and runs in Snowflake as-is.
The key documentation topic is LATERAL FLATTEN.
SELECT d4.path, d4.value
from
lateral flatten(input=>PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]') ) as d ,
lateral flatten(input=> value) as d2 ,
lateral flatten(input=> d2.value) as d3 ,
lateral flatten(input=> d3.value) as d4
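To make the four chained flatten levels easier to follow, here is a rough Python analogue in which each nested loop plays the role of one flatten step. It is a sketch on a shortened, hypothetical sample, not a reproduction of Snowflake's exact output columns.

```python
import json

# Shortened, hypothetical version of the sample document.
doc = json.loads(
    '[{"party": [{"class": "Farmer", "health": 2},'
    ' {"class": "Adventurer", "health": 3}]}]'
)

def flatten_rows(document):
    """Yield one (key, value) row per attribute of every party member."""
    for entry in document:                      # d:  elements of the outer array
        for _name, members in entry.items():    # d2: keys of each object ("party")
            for member in members:              # d3: elements of the party array
                for key, value in member.items():  # d4: attributes of one member
                    yield key, value

rows = list(flatten_rows(doc))
for row in rows:
    print(row)
```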

Related

Extracting data from JSON column defined as String

A table has a ports column (defined as VARCHAR) which has the following data:
[{u'position': 1, u'macAddress': u'00:8C:FA:C1:7C:88'}, {u'position': 2, u'macAddress': u'00:8C:FA:5E:98:81'}]
I want to extract the data from just the macAddress fields into separate rows. I tried to flatten the data in Snowflake, but it is not working because the column is not defined as VARIANT and the fields have a 'u' in front of them (this is my guess). The desired output looks like this:
00:8C:FA:C3:7C:84
00:5C:FA:7E:98:87
Could someone please help with the requirement.
The provided string is not valid JSON, but it can be converted into valid JSON with text operations and then passed to PARSE_JSON:
SELECT s.value:macAddress::TEXT AS macAddress
FROM t
,LATERAL FLATTEN(INPUT => PARSE_JSON(REPLACE(REPLACE(col, 'u''', ''''), '''', '"')))
AS s;
For input:
CREATE OR REPLACE TABLE t(col TEXT)
AS
SELECT $$[{u'position': 1, u'macAddress': u'00:8C:FA:C1:7C:88'}, {u'position': 2, u'macAddress': u'00:8C:FA:5E:98:81'}]$$;
Output:
MACADDRESS
00:8C:FA:C1:7C:88
00:8C:FA:5E:98:81
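The nested REPLACE calls can be checked outside Snowflake with a small Python sketch that applies the same two substitutions in the same order. The helper name repr_to_json is hypothetical; the input string is the one from the question.

```python
import json

col = ("[{u'position': 1, u'macAddress': u'00:8C:FA:C1:7C:88'}, "
       "{u'position': 2, u'macAddress': u'00:8C:FA:5E:98:81'}]")

def repr_to_json(text):
    # Mirrors REPLACE(REPLACE(col, 'u''', ''''), '''', '"'):
    # first drop the u prefix, then turn single quotes into double quotes.
    return text.replace("u'", "'").replace("'", '"')

macs = [item["macAddress"] for item in json.loads(repr_to_json(col))]
print(macs)

# Caveat: the substitution is purely textual, so a value that ends in "u"
# loses that letter (u'menu' becomes "men").
damaged = json.loads(repr_to_json("[{u'name': u'menu'}]"))
print(damaged)
```

The second call shows the limit of the approach: it is safe for MAC addresses, but values containing apostrophes or ending in the letter u would be corrupted or fail to parse.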

Parsing string with multiple delimiters into columns

I want to split strings into columns.
My columns should be:
account_id, resource_type, resource_name
I have a JSON file source that I have been trying to parse via ADF data flow. That hasn't worked for me, hence I flattened the data and brought it into SQL Server (I am open to parsing values via ADF or SQL if anyone can show me how). Please check the JSON file at the bottom.
Use this code to reproduce the data I am working with.
CREATE TABLE test.test2
(
resource_type nvarchar(max) NULL
)
INSERT INTO test.test2 ([resource_type])
VALUES
('account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory'),
('account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups'),
('account_id:369798452057,resource_type:buckets,resource_name:369798452057-s3-manifests'),
('account_id:460085747812,resource_type:buckets,resource_name:vessel-incident-report-nonprod-accesslogs')
The output that I should be able to query in SQL Server should look like this:
account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups
and so forth.
Please help me out and ask for clarification if needed. Thanks in advance.
EDIT:
Source JSON Format:
{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fh40evn5'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-ffxgf6kh'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-fg5j5v6o'",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fgvfo1ej'"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgvj25vt'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgbw5gs0'",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:'dxvif-ffnosohr'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fg18bdhl'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffmf6h64'",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:'dxvif-fgkxjhcj'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffp6kl3f'"
]
}
]
}
Since you don't have a valid JSON string, and to avoid getting into the business of heavy string manipulation, perhaps this will help:
Select B.*
From test.test2 A
Cross Apply ( Select account_id = max(case when value like 'account_id:%' then stuff(value,1,11,'') end )
,resource_type = max(case when value like 'resource_type:%' then stuff(value,1,14,'') end )
,resource_name = max(case when value like 'resource_name:%' then stuff(value,1,14,'') end )
from string_split(resource_type,',')
)B
Results
account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups
369798452057 buckets 369798452057-s3-manifests
460085747812 buckets vessel-incident-report-nonprod-accesslogs
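The logic of the query (split on commas, then pick each value by its key prefix) can be sketched in Python as follows. The function name parse_record is hypothetical, and the sketch assumes that neither keys nor values contain a comma.

```python
def parse_record(line):
    """Split 'k1:v1,k2:v2,...' into a dict, cutting each part at its first colon."""
    fields = {}
    for part in line.split(","):
        key, _sep, value = part.partition(":")
        fields[key] = value
    return fields

rows = [
    "account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory",
    "account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups",
]

parsed = [parse_record(row) for row in rows]
for record in parsed:
    print(record["account_id"], record["resource_type"], record["resource_name"])
```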
Unfortunately, the values inside the arrays are not valid JSON. You can patch them up by adding {} to the beginning/end, and adding " on either side of : and ,.
DECLARE @json nvarchar(max) = N'{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fh40evn5''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-ffxgf6kh''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-fg5j5v6o''",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fgvfo1ej''"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgbw5gs0''",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:''dxvif-ffnosohr''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fg18bdhl''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffmf6h64''",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:''dxvif-fgkxjhcj''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffp6kl3f''"
]
}
]
}';
SELECT
j4.account_id,
j4.resource_type,
TRIM('''' FROM j4.resource_name) resource_name
FROM OPENJSON(@json, '$.records') j1
CROSS APPLY OPENJSON(j1.value) j2
CROSS APPLY OPENJSON(j2.value) j3
CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(j3.value, ':', '":"'), ',', '","') + '"}')
WITH (
account_id bigint,
resource_type varchar(50),
resource_name varchar(100)
) j4;
The first three calls to OPENJSON have no schema, so the result set has three columns: key, value, and type. For arrays (j1 and j3), key is the index into the array. For single objects (j2), key is each property name.
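The patch-up step itself (wrapping each array element in {" and "} and quoting around every : and ,) is easy to verify in isolation. The Python sketch below applies the same substitutions to one element from the sample; patch_to_json is a hypothetical helper name, and the approach assumes the values contain no colons or commas of their own.

```python
import json

def patch_to_json(item):
    # Same idea as '{"' + REPLACE(REPLACE(value, ':', '":"'), ',', '","') + '"}'
    return '{"' + item.replace(":", '":"').replace(",", '","') + '"}'

item = ("account_id:227148359287,"
        "resource_type:directconnect_connections,"
        "resource_name:'dxcon-fh40evn5'")

patched = patch_to_json(item)
record = json.loads(patched)
# Equivalent of TRIM('''' FROM resource_name): drop the surrounding quotes.
record["resource_name"] = record["resource_name"].strip("'")

print(patched)
print(record)
```

Note that every value comes back as a string here; the SQL version converts account_id to bigint through the WITH clause.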

How Can I Calculate the Average of Floats in a Nested Array in a Variant Column

I have a VARIANT column that contains a JSON response from a web service. It contains a nested array with a float value that I would like to aggregate and return as an average. Here is an example SnowSQL command that I am using:
select
value:disambiguated.id,
value:mentions
from TABLE(
FLATTEN(input =>
PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}'):entities
)
)
Which returns:
VALUE:DISAMBIGUATED.ID VALUE:MENTIONS
123 [ { "confidence": 0.5928 }, { "confidence": 0.5445 } ]
What I would like to return is something with the two "confidence" values averaged to 0.56865. I was able to add a second FLATTEN statement, which isolated the "mentions" array and allowed me to extract each "confidence" value. I cannot figure out how to group the records to calculate the average. I would love to use the built-in AVG() function if possible. Thank you in advance for any help you can provide.
Using your example, you can use LATERAL FLATTEN to create your required flattened fields, and then aggregate as you normally would. In this example, I'm grouping on the ID that is in the data, but you could also use y.index or z.index depending on which of those you wanted to group on for your AVG().
WITH x AS (
SELECT PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}') as json_str
)
SELECT
y.value:disambiguated.id as id,
avg(z.value:confidence)
from x,
LATERAL FLATTEN(input => json_str:entities) y,
LATERAL FLATTEN(input => y.value:mentions) z
GROUP BY id
;
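As a cross-check of the grouping logic, the same flatten-then-aggregate pattern can be sketched in Python: collect every confidence value under its entity id, then average each group. The JSON below is the sample with the trailing commas removed, since Python's json module rejects them; the variable names are assumptions.

```python
import json
from collections import defaultdict
from statistics import mean

doc = json.loads(
    '{"entities": [{"count": 2, "disambiguated": {"id": 123},'
    ' "label": "Coronavirus Disease 2019",'
    ' "mentions": [{"confidence": 0.5928}, {"confidence": 0.5445}],'
    ' "type": "MEDICAL"}]}'
)

# One (id, confidence) pair per mention, like the two LATERAL FLATTEN steps.
confidences_by_id = defaultdict(list)
for entity in doc["entities"]:
    entity_id = entity["disambiguated"]["id"]
    for mention in entity["mentions"]:
        confidences_by_id[entity_id].append(mention["confidence"])

# GROUP BY id with AVG(confidence).
averages = {key: mean(values) for key, values in confidences_by_id.items()}
print(averages)
```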

MSSQL JSON_VALUE to match ANY Object in Array

I have a table with a JSON text field:
create table breaches(breach_id int, detail text);
insert into breaches values
( 1,'[{"breachedState": null},
{"breachedState": "PROCESS_APPLICATION",}]')
I'm trying to use MSSQL's built-in JSON parsing functions to test whether ANY object in a JSON array has a matching member value.
If the detail field was a single JSON object, I could use:
select * from breaches
where JSON_VALUE(detail,'$.breachedState') = 'PROCESS_APPLICATION'
but it's an Array, and I want to know if ANY Object has breachedState = 'PROCESS_APPLICATION'
Is this possible using MSSQL's JSON functions?
You can use the OPENJSON function to check each object. Try this query:
select * from breaches
where exists
(
select *
from
OPENJSON (detail) d
where JSON_VALUE(value,'$.breachedState') = 'PROCESS_APPLICATION'
)
By the way, there is an extra "," in your insert statement. It should be:
insert into breaches values
( 1,'[{"breachedState": null},
{"breachedState": "PROCESS_APPLICATION"}]')
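The EXISTS-over-OPENJSON pattern amounts to an "any element matches" test. A minimal Python sketch of that test, using the corrected JSON and a hypothetical helper named any_breached:

```python
import json

detail = '[{"breachedState": null}, {"breachedState": "PROCESS_APPLICATION"}]'

def any_breached(detail_json, state):
    """Return True if any object in the JSON array has the given breachedState."""
    return any(
        obj.get("breachedState") == state
        for obj in json.loads(detail_json)
    )

print(any_breached(detail, "PROCESS_APPLICATION"))
print(any_breached(detail, "SOMETHING_ELSE"))
```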

How to delete array element in JSONB column based on nested key value?

How can I remove an object from an array, based on the value of one of the object's keys?
The array is nested within a parent object.
Here's a sample structure:
{
"foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
"foo2": [ "some other stuff" ]
}
Can I remove an array element based on the value of bar1?
I can query based on the bar1 value using: columnname @> '{ "foo1": [ { "bar1": 123 } ]}', but I've had no luck finding a way to remove { "bar1": 123, "bar2": 456 } from foo1 while keeping everything else intact.
Thanks
Running PostgreSQL 9.6
Assuming that you want to search for a specific object with an inner object of a certain value, and that this specific object can appear anywhere in the array, you need to unpack the document and each of the arrays, test the inner sub-documents for containment and delete as appropriate, then re-assemble the array and the JSON document (untested):
SELECT id, jsonb_object_agg(key, jarray)
FROM (
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM ( SELECT id, key, value
FROM my_table, jsonb_each(jdoc) ) foo,
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2 ) x
GROUP BY 1;
Now, this may seem a little dense, so picked apart you get:
SELECT id, key, value
FROM my_table, jsonb_each(jdoc)
This uses a lateral join on your table to take the JSON document jdoc and turn it into a set of rows foo(id, key, value) where the value contains the array. The id is the primary key of your table.
Then we get:
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM foo, -- abbreviated from above
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2
This uses another lateral join to unpack the arrays into bar(value) rows. These objects can now be searched with the containment operator to remove the objects from the result set: WHERE NOT bar.value @> '{"bar1": 123}'::jsonb. In the select list the arrays are re-assembled by id and key, but now without the offending sub-documents.
Finally, in the main query the JSON documents are re-assembled:
SELECT id, jsonb_object_agg(key, jarray)
FROM x -- from above
GROUP BY 1;
The important thing to understand is that PostgreSQL JSON functions only operate on the level of the JSON document that you can explicitly indicate. Usually that is the top level of the document, unless you have an explicit path to some level in the document (like {foo1, 0, bar1}, but you don't have that). At that level of operation you can then unpack to do your processing such as removing objects.
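The unpack, filter, and re-assemble sequence described above can be sketched in Python on the sample document. This is only an illustration of the steps, not PostgreSQL code; remove_matching is a hypothetical helper, and it assumes every top-level value is an array, as in the sample.

```python
doc = {
    "foo1": [{"bar1": 123, "bar2": 456}, {"bar1": 789, "bar2": 42}],
    "foo2": ["some other stuff"],
}

def remove_matching(document, key, value):
    """Rebuild the document, dropping array elements whose `key` equals `value`."""
    cleaned = {}
    for name, items in document.items():        # unpack the top-level keys
        cleaned[name] = [                       # re-assemble each array
            item for item in items
            if not (isinstance(item, dict) and item.get(key) == value)
        ]
    return cleaned

result = remove_matching(doc, "bar1", 123)
print(result)
```

Unlike the SQL version, this sketch keeps a key even when all of its array elements are removed; the query drops such keys because no rows remain to aggregate.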

Resources