How to retrieve all child nodes from JSON file - snowflake-cloud-data-platform

I have the JSON file below in an external stage, and I'm trying to write a COPY query that loads it into a table. The query below fetches only a single record from the "values" node, whereas I need to insert all child elements of that node. I have loaded the file into a table with a VARIANT column.
The query I'm using:
select record:batchId batchId, record:results[0].pageInfo.numberOfPages NoofPages, record:results[0].pageInfo.pageNumber pageNo,
record:results[0].pageInfo.pageSize PgSz, record:results[0].requestId requestId,record:results[0].showPopup showPopup,
record:results[0].values[0][0].columnId columnId,record:results[0].values[0][0].value val
from lease;
{
"batchId": "",
"results": [
{
"pageInfo": {
"numberOfPages": ,
"pageNumber": ,
"pageSize":
},
"requestId": "",
"showPopup": false,
"values": [
[
{
"columnId": ,
"value": ""
},
{
"columnId": ,
"value":
}
]
]
}
]
}

You need to use LATERAL FLATTEN, something like this:
I created this table:
create table json_test (seq_no integer, json_text variant);
and then populated it with this JSON string:
insert into json_test(seq_no, json_text)
select 1, parse_json($${
"batchId": "a",
"results": [
{
"pageInfo": {
"numberOfPages": "1",
"pageNumber": "1",
"pageSize": "100000"
},
"requestId": "a",
"showPopup": false,
"values": [
[
{
"columnId": "4567",
"value": "2020-10-09T07:24:29.000Z"
},
{
"columnId": "4568",
"value": "2020-10-10T10:24:29.000Z"
}
]
]
}
]
}$$);
Then the following query:
select
json_text:batchId batchId
,json_text:results[0].pageInfo.numberOfPages numberOfPages
,json_text:results[0].pageInfo.pageNumber pageNumber
,json_text:results[0].pageInfo.pageSize pageSize
,json_text:results[0].requestId requestId
,json_text:results[0].showPopup showPopup
,f.value:columnId columnId
,f.value:value value
from json_test t
,lateral flatten(input => t.json_text:results[0]:values[0]) f;
gives these results - which I think is roughly what you are looking for:
BATCHID NUMBEROFPAGES PAGENUMBER PAGESIZE REQUESTID SHOWPOPUP COLUMNID VALUE
"a" "1" "1" "100000" "a" false "4567" "2020-10-09T07:24:29.000Z"
"a" "1" "1" "100000" "a" false "4568" "2020-10-10T10:24:29.000Z"
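To make the row-multiplying behaviour of LATERAL FLATTEN concrete, here is a rough Python analogue (not Snowflake code). The helper name flatten_values is made up for illustration. It assumes the document shape from the answer above and, unlike that query, walks every entry under results and values rather than only results[0] and values[0].

```python
import json

# Same sample document as in the answer above.
doc = json.loads("""
{
  "batchId": "a",
  "results": [
    {
      "pageInfo": {"numberOfPages": "1", "pageNumber": "1", "pageSize": "100000"},
      "requestId": "a",
      "showPopup": false,
      "values": [
        [
          {"columnId": "4567", "value": "2020-10-09T07:24:29.000Z"},
          {"columnId": "4568", "value": "2020-10-10T10:24:29.000Z"}
        ]
      ]
    }
  ]
}
""")


def flatten_values(record):
    """Hypothetical helper: one output row per leaf object under values.

    Parent-level fields are repeated on every row, which is what a
    lateral flatten join does on the SQL side.
    """
    rows = []
    for result in record["results"]:
        for group in result["values"]:      # outer array
            for cell in group:              # inner array
                rows.append({
                    "batchId": record["batchId"],
                    "requestId": result["requestId"],
                    "columnId": cell["columnId"],
                    "value": cell["value"],
                })
    return rows


rows = flatten_values(doc)
for row in rows:
    print(row)
```

In SQL, each extra loop level here would correspond to one more chained LATERAL FLATTEN over the next nested array.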

Related

PostgreSQL jsonb_set multiple elements in array

I have following jsonb structure in column recipients in a table called mailing:
[
{
"text": "Text1",
"smsId": 1,
"value": "123456",
"status": "Sent"
},
{
"text": "Text1",
"smsId": 2,
"value": "23456",
"status": "Sent"
},
{
"text": "Text1",
"smsId": 3,
"value": "345678",
"status": "Sent"
}]
I need to update one field in multiple elements, so the outcome should look like this:
[
{
"text": "Text1",
"smsId": 1,
"value": "123456",
"status": "Delivered"
},
{
"text": "Text1",
"smsId": 2,
"value": "23456",
"status": "Delivered"
},
{
"text": "Text1",
"smsId": 3,
"value": "345678",
"status": "Delivered"
}]
The closest I got to a solution is this:
WITH item AS (SELECT mailing_id, ('{' || INDEX-1 || ',status}')::text[] AS PATH
FROM mailing, jsonb_array_elements(recipients) WITH ORDINALITY arr(recipient, INDEX)
WHERE recipient->>'smsId' = any(array['1', '2', '3']))
UPDATE mailing m
SET recipients = jsonb_set(recipients, item.path, '"Delivered"',FALSE)
FROM item
WHERE m.mailing_id = item.mailing_id;
But this solution updates only the first row, and I am not sure whether I should somehow loop this or try a different approach.

You need to aggregate modified array elements with jsonb_agg():
with new_data as (
select
mailing_id,
jsonb_agg(
case when value->>'smsId' = any('{1,2,3}') then value || '{"status": "Delivered"}'
else value
end) as recipients
from mailing
cross join jsonb_array_elements(recipients)
group by mailing_id
)
update mailing m
set recipients = n.recipients
from new_data n
where m.mailing_id = n.mailing_id;
Test it in db<>fiddle.
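The idea behind the jsonb_agg() answer is to rebuild the whole array in one pass instead of patching one path at a time. In PostgreSQL, UPDATE ... FROM uses only one of the joined rows when several match the same target row, which is consistent with only one element being changed in the question. A minimal Python sketch of the same rebuild-the-array logic follows; mark_delivered is a hypothetical name, and the data is the sample from the question.

```python
def mark_delivered(recipients, sms_ids):
    """Hypothetical helper: return a new list where matching elements
    get status "Delivered" and all other elements are kept unchanged."""
    wanted = set(sms_ids)
    return [
        {**r, "status": "Delivered"} if r["smsId"] in wanted else r
        for r in recipients
    ]


recipients = [
    {"text": "Text1", "smsId": 1, "value": "123456", "status": "Sent"},
    {"text": "Text1", "smsId": 2, "value": "23456", "status": "Sent"},
    {"text": "Text1", "smsId": 3, "value": "345678", "status": "Sent"},
]

# Update only smsId 1 and 3 to show that non-matching elements survive.
updated = mark_delivered(recipients, [1, 3])
print([r["status"] for r in updated])
```

The dict merge `{**r, "status": "Delivered"}` plays the role of `value || '{"status": "Delivered"}'` in the SQL, and the list comprehension plays the role of jsonb_agg().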

Is it possible to get key value pairs from snowflake api instead rowType?

I'm working with the Snowflake API, and to deal with the JSON data I need to receive results as key-value pairs instead of rowType.
I've been searching for a way to do this but haven't found one.
e.g. a table user with Name and Email attributes:
Name | Email
Kelly | kelly#email.com
Fisher | fisher#email.com
I would send this request body:
{
"statement": "SELECT * FROM user",
"timeout": 60,
"database": "DEV",
"schema": "PLACE",
"warehouse": "WH",
"role": "DEV_READER",
"bindings": {
"1": {
"type": "FIXED",
"value": "123"
}
}
}
The results come back like this:
{
"resultSetMetaData": {
...
"rowType": [
{ "name": "Name",
...},
{ "name": "Email",
...}
],
},
"data": [
[
"Kelly",
"kelly#email.com"
],
[
"Fisher",
"fisher#email.com"
]
]
}
And the results needed would be:
{
"resultSetMetaData": {
...
"data": [
[
"Name":"Kelly",
"Email":"kelly#email.com"
],
[
"Name":"Fisher",
"Email":"fisher#email.com"
]
]
}
Thank you for any input.

The output you describe is not valid JSON (an array cannot hold bare key-value pairs), but the result can arrive in a slightly different format, with each row as an object:
{
"resultSetMetaData": {
...
"data":
[
{
"Name": "Kelly",
"Email": "kelly#email.com"
},
{
"Name": "Fisher",
"Email": "fisher#email.com"
}
]
}
}
To get the API to send it that way, you can change the SQL from select * to:
select object_construct(*) as KVP from "USER";
You can also specify the names of the keys using:
select object_construct('NAME', "NAME", 'EMAIL', EMAIL) from "USER";
The object_construct function takes an arbitrary number of parameters, as long as the count is even (alternating keys and values), so:
object_construct('KEY1', VALUE1, 'KEY2', VALUE2, <'KEY_N'>, <VALUE_N>)
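If changing the SQL is not an option, the same key-value shape can also be built on the client by pairing the column names in rowType with each row in data. This is not part of the answer above, just a minimal Python sketch; rows_to_dicts is a hypothetical name, and the response dict mirrors the shape shown in the question with the elided fields left out.

```python
def rows_to_dicts(response):
    """Hypothetical helper: pair rowType column names with each data row."""
    names = [col["name"] for col in response["resultSetMetaData"]["rowType"]]
    return [dict(zip(names, row)) for row in response["data"]]


# Trimmed-down response in the shape shown in the question.
response = {
    "resultSetMetaData": {
        "rowType": [{"name": "Name"}, {"name": "Email"}],
    },
    "data": [
        ["Kelly", "kelly#email.com"],
        ["Fisher", "fisher#email.com"],
    ],
}

records = rows_to_dicts(response)
print(records)
```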

I'm attempting to parse json data from zendesk using v: structure

With standard fields, like id, this works perfectly. But I cannot find a way to parse the custom fields, where the structure is:
"custom_fields": [
{
"id": 57852188,
"value": ""
},
{
"id": 57522467,
"value": ""
},
{
"id": 57522487,
"value": ""
}
]
The general format that I have been using is:
Select v:id,v:updatedat
from zd_tickets
Updated data:
{
"id":151693,
"brand_id": 36000,
"created_at": "2022-0523T19:26:35Z",
"custom_fields": [
{ "id": 57866008, "value": false },
{ "id": 360022282754, "value": "" },
{ "id": 80814087, "value": "NC" } ],
"group_id": 36000770
}

If you want to select all repeating elements you will need to use FLATTEN, otherwise you can use standard notation. This is all documented here: https://docs.snowflake.com/en/user-guide/querying-semistructured.html#retrieving-a-single-instance-of-a-repeating-element

So, using this CTE to make the data look like a table:
with data(json) as (
select parse_json(column1) from values
('{
"id":151693,
"brand_id": 36000,
"created_at": "2022-0523T19:26:35Z",
"custom_fields": [
{ "id": 57866008, "value": false },
{ "id": 360022282754, "value": "" },
{ "id": 80814087, "value": "NC" } ],
"group_id": 36000770
} ')
)
SQL to unpack the top-level items, which you have shown you already have working:
select
json:id::number as id
,json:brand_id::number as brand_id
,try_to_timestamp(json:created_at::text, 'yyyy-mmddThh:mi:ssZ') as created_at
,json:custom_fields as custom_fields
from data;
gives:
ID | BRAND_ID | CREATED_AT | CUSTOM_FIELDS
151693 | 36000 | 2022-05-23 19:26:35.000 | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
So now, how to tackle that JSON array of custom_fields?
Well, if you only ever have 3 values, and the order is always the same:
select
to_array(json:custom_fields) as custom_fields_a
,custom_fields_a[0] as field_0
,custom_fields_a[1] as field_1
,custom_fields_a[2] as field_2
from data;
gives:
CUSTOM_FIELDS_A | FIELD_0 | FIELD_1 | FIELD_2
[ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ] | { "id": 57866008, "value": false } | { "id": 360022282754, "value": "" } | { "id": 80814087, "value": "NC" }
Alternatively, we can use FLATTEN to access those objects, which produces one row per array element:
select
d.json:id::number as id
,d.json:brand_id::number as brand_id
,try_to_timestamp(d.json:created_at::text, 'yyyy-mmddThh:mi:ssZ') as created_at
,f.*
from data as d
,table(flatten(input=>json:custom_fields)) f
ID | BRAND_ID | CREATED_AT | SEQ | KEY | PATH | INDEX | VALUE | THIS
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [0] | 0 | { "id": 57866008, "value": false } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [1] | 1 | { "id": 360022282754, "value": "" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
151693 | 36000 | 2022-05-23 19:26:35.000 | 1 | | [2] | 2 | { "id": 80814087, "value": "NC" } | [ { "id": 57866008, "value": false }, { "id": 360022282754, "value": "" }, { "id": 80814087, "value": "NC" } ]
So we can pull out known values (a manual PIVOT):
select
d.json:id::number as id
,d.json:brand_id::number as brand_id
,try_to_timestamp(d.json:created_at::text, 'yyyy-mmddThh:mi:ssZ') as created_at
,max(iff(f.value:id=80814087, f.value:value::text, null)) as v80814087
,max(iff(f.value:id=360022282754, f.value:value::text, null)) as v360022282754
,max(iff(f.value:id=57866008, f.value:value::text, null)) as v57866008
from data as d
,table(flatten(input=>json:custom_fields)) f
group by 1,2,3, f.seq
Grouping by f.seq means that if you have many "rows" of input, they will be kept apart even if they share common values for columns 1, 2 and 3.
gives:
ID | BRAND_ID | CREATED_AT | V80814087 | V360022282754 | V57866008
151693 | 36000 | 2022-05-23 19:26:35.000 | NC | <empty string> | false
Now, if you do not know the names of the values in advance, there is no way short of dynamic SQL and double parsing to turn rows into columns.
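When the field ids are not known in advance, one alternative to dynamic SQL is to do the pivot on the client after fetching the JSON. The following is only a Python sketch of that idea, not something the answer above proposes; pivot_custom_fields is a hypothetical name, the v-prefixed keys imitate the column names used in the manual PIVOT, and the ticket is the sample from the question.

```python
def pivot_custom_fields(ticket):
    """Hypothetical helper: turn the custom_fields array into one flat row,
    with one key per field id, whatever ids happen to be present."""
    row = {"id": ticket["id"], "brand_id": ticket["brand_id"]}
    for field in ticket["custom_fields"]:
        row[f"v{field['id']}"] = field["value"]
    return row


ticket = {
    "id": 151693,
    "brand_id": 36000,
    "custom_fields": [
        {"id": 57866008, "value": False},
        {"id": 360022282754, "value": ""},
        {"id": 80814087, "value": "NC"},
    ],
    "group_id": 36000770,
}

row = pivot_custom_fields(ticket)
print(row)
```

Unlike the SQL version, the values keep their original JSON types here (a boolean stays a boolean) because nothing casts them to text.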

I ended up doing the following, with two CTEs (CTE and UCF):
Used to_array to gather my custom fields.
Unioned the custom fields together into two fields, one for the id of the field and one for the value, using combinations of substring, position and replace to clean up the data as needed (same setup for all fields).
Joined the resulting data to a Custom Fields table (which contains the id and a name) to include the name of the custom field in my result set.
WITH UCF AS (--Union Gathered Array into 2 fields (an id field and a value field)
WITH CTE AS( ---Gather array of custom fields
SELECT v:id as id,
to_array(v:custom_fields) as cf
,cf[0] as f0,cf[1] as f1,cf[2] as f2
FROM ZD_TICKETS)
SELECT id,
substring(f0,7,position(',',f0)-7) AS cf_id, REPLACE(substring(f0,position('value":',f0)+8,position('"',f0,position('value":',f0)+8)),'"}') AS cf_value
FROM CTE c
WHERE f0 not like '%null%'
UNION
SELECT id,
substring(f1,7,position(',',f1)-7) AS cf_id,
REPLACE(substring(f1,position('value":',f1)+8,position('"',f1,position('value":',f1)+8)),'"}') AS cf_value
FROM CTE c
WHERE f1 not like '%null%'
-- field 3
UNION
SELECT id,
substring(f2,7,position(',',f2)-7) AS cf_id,
REPLACE(substring(f2,position('value":',f2)+8,position('"',f2,position('value":',f2)+8)),'"}') AS cf_value
FROM CTE c
WHERE f2 not like '%null%' --this removes records where the value is null
)
SELECT UCF.*,CFD.name FROM UCF
LEFT OUTER JOIN "FLBUSINESS_DB"."STAGING"."FILE_ZD_CUSTOM_FIELD_IDS" CFD
ON CFD.id=UCF.cf_id
WHERE cf_value<>'' --this removes records where the value is blank

Loading JSON data into snowpipe

We have the valid JSON data below, which resides in S3, and we are trying to load it into a Snowflake table via Snowpipe.
"Vendor": {
"string": "ABC"
},
"vmAddresses": [{
"Address": {
"string": "addr1"
},
"Category": {
"string": "order"
}
]
SELECT $1:Vendor.string::varchar,
$1:vmAddresses[0].Address.string,
object_keys($1:vmAddresses[0]),
object_pick($1:vmAddresses[0],'Address', 'Category')
FROM #S3://20210310194308.json
With OBJECT_KEYS we are able to get the keys, but we are unable to get the corresponding values. The format below is what we are trying to get:
{
"Address": "addr1",
"Category": "order"
}
Any help would be appreciated.

When I tried to validate your sample text through parse_json and an online JSON formatter, both of them complained about invalid JSON. I corrected it and ran your SQL:
with json_data as (
select parse_json( '{ "Vendor": {"string": "ABC" }, "vmAddresses": [ { "Address": { "string": "addr1" }, "Category": { "string": "order" } } ] }' ) j)
select j:Vendor.string,
j:vmAddresses[0].Address.string,
object_keys(j:vmAddresses[0]),
object_pick(j:vmAddresses[0],'Address', 'Category')
from json_data;
And it worked as expected:
j:vmAddresses[0].Address.string <-- returns "addr1"
object_keys(j:vmAddresses[0]) <-- returns [ "Address", "Category" ]
j:vmAddresses[0] or object_pick(j:vmAddresses[0],'Address', 'Category') <-- returns
{"Address": { "string": "addr1" }, "Category": { "string": "order" } }
Which value are you trying to parse? Everything seems to be working.
Additional answer based on a comment:
You can use object_construct to build the JSON after reading the values with the vmAddresses[0].Address.string notation:
with json_data as (
select parse_json( '{ "Vendor": {"string": "ABC" }, "vmAddresses": [ { "Address": { "string": "addr1" }, "Category": { "string": "order" } } ] }' ) j)
select OBJECT_CONSTRUCT( 'Address', j:vmAddresses[0].Address.string, 'Category', j:vmAddresses[0].Category.string )
from json_data;
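The object_construct call above names each key by hand. To show the reshaping it performs, here is a small Python sketch that unwraps every {"string": ...} wrapper in one object. The helper name unwrap_strings is hypothetical, and the input is the first vmAddresses element from the corrected sample.

```python
def unwrap_strings(obj):
    """Hypothetical helper: replace each {"string": x} wrapper with x,
    leaving any other values untouched."""
    return {
        key: val["string"] if isinstance(val, dict) and set(val) == {"string"} else val
        for key, val in obj.items()
    }


address = {"Address": {"string": "addr1"}, "Category": {"string": "order"}}

flat = unwrap_strings(address)
print(flat)
```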

Array within Element within Array in Variant

How can I get the data out of this array stored in a variant column in Snowflake? I don't care if it's a new table, a view or a query. There is a second column of type varchar(256) that contains a unique ID.
If you can just help me read the "confirmed" data and the "editorIds" data I can probably take it from there. Many thanks!
Output example would be
UniqueID ConfirmationID EditorID
u3kd9 xxxx-436a-a2d7 nupd
u3kd9 xxxx-436a-a2d7 9l34c
R3nDo xxxx-436a-a3e4 5rnj
yP48a xxxx-436a-a477 jTpz8
yP48a xxxx-436a-a477 nupd
[
{
"confirmed": {
"Confirmation": "Entry ID=xxxx-436a-a2d7-3525158332f0: Confirmed order submitted.",
"ConfirmationID": "xxxx-436a-a2d7-3525158332f0",
"ConfirmedOrders": 1,
"Received": "8/29/2019 4:31:11 PM Central Time"
},
"editorIds": [
"xxsJYgWDENLoX",
"JR9bWcGwbaymm3a8v",
"JxncJrdpeFJeWsTbT"
] ,
"id": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"messages": [],
"orderJson": {
"EntryID": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"Orders": [
{
"DropShipFlag": 1,
"FromAddressValue": 1,
"OrderAttributes": [
{
"AttributeUID": 548
},
{
"AttributeUID": 553
},
{
"AttributeUID": 2418
}
],
"OrderItems": [
{
"EditorId": "aC3f5HsJYgWDENLoX",
"ItemAssets": [
{
"AssetPath": "https://xxxx573043eac521.png",
"DP2NodeID": "10000",
"ImageHash": "000000000000000FFFFFFFFFFFFFFFFF",
"ImageRotation": 0,
"OffsetX": 50,
"OffsetY": 50,
"PrintedFileName": "aC3f5HsJYgWDENLoX-10000",
"X": 50,
"Y": 52.03909266409266,
"ZoomX": 100,
"ZoomY": 93.75
}
],
"ItemAttributes": [
{
"AttributeUID": 2105
},
{
"AttributeUID": 125
}
],
"ItemBookAttribute": null,
"ProductUID": 52,
"Quantity": 1
}
],
"SendNotificationEmailToAccount": true,
"SequenceNumber": 1,
"ShipToAddress": {
"Addr1": "Addr1",
"Addr2": "0",
"City": "City",
"Country": "US",
"Name": "Name",
"State": "ST",
"Zip": "00000"
}
}
]
},
"orderNumber": null,
"status": "order_placed",
"submitted": {
"Account": "350000",
"ConfirmationID": "xxxxx-436a-a2d7-3525158332f0",
"EntryID": "xxxxx-5AvGgeSHy8Ms6Ytyc-1",
"Key": "D83590AFF0CC0000B54B",
"NumberOfOrders": 1,
"Orders": [
{
"LineItems": [],
"Note": "",
"Products": [
{
"Price": "00.30",
"ProductDescription": "xxxxxint 8x10",
"Quantity": 1
},
{
"Price": "00.40",
"ProductDescription": "xxxxxut Black 8x10",
"Quantity": 1
},
{
"Price": "00.50",
"ProductDescription": "xxxxx"
},
{
"Price": "00.50",
"ProductDescription": "xxxscount",
"Quantity": 1
}
],
"SequenceNumber": "1",
"SubTotal": "00.70",
"Tax": "1.01",
"Total": "00.71"
}
],
"Received": "8/29/2019 4:31:10 PM Central Time"
},
"tracking": null,
"updatedOn": 1.598736670503000e+12
}
]

So, this is how I'd query that exact JSON assuming the data is in column var in table x:
SELECT x.var[0]:confirmed:ConfirmationID::varchar as ConfirmationID,
f.value::varchar as EditorID
FROM x,
LATERAL FLATTEN(input => var[0]:editorIds) f
;
Since your sample output doesn't match the JSON that you provided, I will assume that this is what you need.
Also, as a note, your JSON includes outer [ ] which indicates that the entire JSON string is inside an array. This is the reason for var[0] in my query. If you have multiple records inside that array, then you should remove that. In general, you should exclude those and instead load each record into the table separately. I wasn't sure whether you could make that change, so I just wanted to make note.
