Querying variant data in Snowflake

Here is the variant source table I am using in my example. I want to write a query that parses this data from the variant column src into a relational table in Snowflake.
{
"col1": bool,
"col2": null,
"col3": "datetime",
"col4": int,
"col5": "string",
"col6": "string",
"array": [
{
"x": bool,
"y": null,
"v": "datetime",
"z": int,
"w": "string",
"q": "string",
"obj": {
"a": "bool",
"b": "float"
},
"col7": "datetime"
}
]
}
-- Here is what I tried
SELECT
src:col1::string as col1,
src:col2::string as col2,
src:col3::string as col3,
src:col4::string as col4,
src:col5::string as col5,
src:col6::string as col6,
s.value:x::string as S_x,
s.value:y::string as s_y,
s.value:v::string as s_v,
s.value:z::string as s_z,
s.value:w::string as s_w,
s.value:q::string as s_q,
s.value:obj.value:a::string as s_obj_a,
s.value:obj.value:b::string as s_obj_b,
src:col7::string as col7
FROM tblvariant
, table(flatten(src:s)) s
;
Everything works except that these two columns (a, b) come back as NULL when they should contain their data.
Any suggestions?
Many thanks!

Your sample JSON does not match your SQL. Where are "stages" and "metadata"? In any case, the problem seems to be the extra "value" keyword.
create or replace table tblvariant ( src variant )
as select parse_json ('
{
"col1": "bool",
"col2": null,
"col3": "datetime",
"col4": "int",
"col5": "string",
"col6": "string",
"stages": [
{
"x": "bool",
"y": null,
"v": "datetime",
"z": "int",
"w": "string",
"q": "string",
"obj": {
"a": "bool",
"b": "float"
},
"col7": "datetime"
}
]
}' );
As you can see, I modified your sample JSON and renamed "array" to "stages" (to match your SQL). This SQL retrieves the values of a and b:
SELECT
src:col1::string as col1,
src:col2::string as col2,
src:col3::string as col3,
src:col4::string as col4,
src:col5::string as col5,
src:col6::string as col6,
s.value:x::string as S_x,
s.value:y::string as s_y,
s.value:v::string as s_v,
s.value:z::string as s_z,
s.value:w::string as s_w,
s.value:q::string as s_q,
s.value:obj.a::string as s_obj_a,
s.value:obj.b::string as s_obj_b,
src:col7::string as col7
FROM tblvariant
, table(flatten(src:stages)) s
-- , table(flatten(s.value:metadata)) m
;
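With the sample row above, s_obj_a returns "bool" and s_obj_b returns "float". Note that the last column still comes back NULL: src:col7 does not exist at the top level of the JSON, because col7 lives inside each stages element (it would be s.value:col7::string).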

The problem is the extra .value in these two lines:
s.value:obj.value:a::string as s_obj_a,
s.value:obj.value:b::string as s_obj_b,
Accessing an object's keys can be done with dot (.) notation; you do not need the GET_PATH (:) operator to access those fields:
s.value:metadata.a::string as s_m_a,
s.value:metadata.b::string as s_m_b,
You also do not need a second FLATTEN over the metadata object within your stages array, unless you truly need one row per metadata key (assuming metadata is an object and not a nested array). If you just want to extract the values to the same level as each array row, the above should suffice. For the case where you do want one row per key, see the sketch below.
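A minimal sketch of that second-level FLATTEN, using the tblvariant table created in the first answer (where the nested object sits under the key obj); FLATTEN over an object emits one row per key/value pair:
SELECT
    s.value:x::string AS s_x,
    o.key             AS obj_key,    -- "a" or "b"
    o.value::string   AS obj_value   -- "bool" or "float"
FROM tblvariant,
     TABLE(FLATTEN(src:stages)) s,
     TABLE(FLATTEN(s.value:obj)) o;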

Related

Postgres Jsonb — Add key and value to objects in a jsonb array if they don't exist based on a condition

I have a table, let's call it myTable, with the following structure: a column org of type text and a column data of type jsonb.
The data in the jsonb field is an array structured in the following way:
[
{
"type": "XYZ",
"valueA": "500",
"valueB": "ABC",
},
{
"type": "ABC",
"valueA": "300",
"valueB": "CDE",
}
]
What I want to do is transform that data by checking if an object in the array has a "type" key with a value of "XYZ" and if true, adding a valueC key.
valueC's value will be an array of strings. The values of the array will depend on the value of the org column.
I want to do this for all rows such that if a specific org is present, and the jsonb array in the data column contains an object with "type": "XYZ", then I get this result:
[
{
"type": "XYZ",
"valueA": "500",
"valueB": "ABC",
"valueC": ["SOMETHING"],
},
{
"type": "ABC",
"valueA": "300",
"valueB": "CDE",
}
]
I also want to ensure this script only runs if valueC is not present in the object that matches the conditions, so it is not re-run during a migration/rollback unless needed.
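One way to approach this (a minimal sketch, untested; it assumes the myTable, org, and data names above, a hypothetical target org value 'some_org', and Postgres 9.5+ for the jsonb || operator) is to rebuild the array with jsonb_agg, adding valueC only to "XYZ" objects that lack it, and to guard the UPDATE with an EXISTS check so that re-running it is a no-op:
UPDATE myTable t
SET data = (
    SELECT jsonb_agg(
               CASE
                   WHEN a.elem ->> 'type' = 'XYZ' AND NOT a.elem ? 'valueC'
                       THEN a.elem || jsonb_build_object('valueC', jsonb_build_array('SOMETHING'))
                   ELSE a.elem
               END
               ORDER BY a.idx)  -- preserve the original array order
    FROM jsonb_array_elements(t.data) WITH ORDINALITY AS a(elem, idx)
)
WHERE t.org = 'some_org'  -- hypothetical org filter
  AND EXISTS (            -- only touch rows that still need the change
      SELECT 1
      FROM jsonb_array_elements(t.data) AS e
      WHERE e ->> 'type' = 'XYZ' AND NOT e ? 'valueC'
  );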

Importing JSON file and discover key names with OUTER APPLY

I'm trying to bulk import a JSON file with exercise data from Fitbit. Before I insert the data into a table, I want to find all the distinct key names used across the entire file.
TL;DR: How do I "collapse" the OUTER APPLY results below into a single set of distinct keys?
declare
@json nvarchar(max) = '[
{
"logId": 5687739287,
"activityName": "Walk",
"activityTypeId": 90013,
"averageHeartRate": 100,
"calories": 140,
"duration": 1178000,
"activeDuration": 1178000,
"steps": 1584,
"logType": "auto_detected",
"manualValuesSpecified": {
"calories": false,
"distance": false,
"steps": false
},
"lastModified": "01/21/17 15:14:05",
"startTime": "01/20/17 20:07:43",
"originalStartTime": "01/20/17 20:07:43",
"originalDuration": 1178000,
"elevationGain": 0.0,
"hasGps": false,
"shouldFetchDetails": false,
"hasActiveZoneMinutes": false
},
{
"logId": 8704352278,
"activityName": "Bike",
"activityTypeId": 90001,
"averageHeartRate": 147,
"calories": 742,
"distance": 10.955718,
"distanceUnit": "Mile",
"duration": 3823000,
"activeDuration": 3579000,
"source": {
"type": "tracker",
"name": "Charge 2",
"id": "86599831",
"url": "https://www.fitbit.com/",
"trackerFeatures": [
"HEARTRATE",
"GPS",
"DISTANCE",
"CALORIES",
"SPEED",
"ELEVATION"
]
},
"logType": "tracker",
"manualValuesSpecified": {
"calories": false,
"distance": false,
"steps": false
},
"tcxLink": "REDACTED",
"speed": 11.020001341156748,
"lastModified": "07/10/17 01:05:32",
"startTime": "07/09/17 23:53:39",
"originalStartTime": "07/09/17 23:53:39",
"originalDuration": 3823000,
"elevationGain": 497.998688,
"hasGps": true,
"shouldFetchDetails": true,
"hasActiveZoneMinutes": false
}
]';
IF OBJECT_ID('tempdb..#exercise') IS NOT NULL DROP TABLE #exercise
SELECT activity.*
FROM OPENJSON (@json)
WITH(
logId bigint
,activityName varchar(max)
,activityTypeId int
,source nvarchar(max) as JSON
,averageHeartRate int
/*
????
not all keys are known
????
*/
) AS activity
/*
I cannot take credit for this trick.
It shows me all the keys, BUT
it's for EACH record, and there are hundreds of records!
How do I collapse these results to see a single set of distinct keys?
*/
SELECT L1.[key], L2.[key], L2.[value]
FROM openjson(@json,'$') AS L1
OUTER APPLY openjson(L1.[value]) AS L2
The source file is relatively consistent, but not all entries have the same keys. The "Bike" activity has more content than the "Walk" activity: source: {}, speed, tcxLink, distanceUnit, etc.
Although I can target and grab data with FROM OPENJSON, I simply don't know what keys to expect throughout the entire file.
...
FROM OPENJSON (@json)
WITH(
logId bigint
,activityName varchar(max)
,activityTypeId int
,source nvarchar(max) as JSON
,averageHeartRate int
/*
????
not all keys are known
????
*/
)
So... this OUTER APPLY is helpful, but is there any way to "collapse" it so that I see a single set of all used keys (not repeated for every single activity)?
You can use the DISTINCT keyword to condense the results:
SELECT distinct L2.[key]
FROM openjson(@json,'$') AS L1
OUTER APPLY openjson(L1.[value]) AS L2
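If you also want the keys nested inside object-valued properties such as source or manualValuesSpecified, one possible extension (a sketch, not part of the original answer) adds a second APPLY level. OPENJSON reports [type] = 5 for objects, and OPENJSON(NULL) returns no rows, so non-object values are skipped:
SELECT L2.[key]
FROM openjson(@json, '$') AS L1
OUTER APPLY openjson(L1.[value]) AS L2
UNION  -- UNION de-duplicates across both branches
SELECT L2.[key] + '.' + L3.[key]
FROM openjson(@json, '$') AS L1
OUTER APPLY openjson(L1.[value]) AS L2
CROSS APPLY openjson(CASE WHEN L2.[type] = 5 THEN L2.[value] END) AS L3;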

How to retrieve all child nodes from JSON file

I have the JSON file below in an external stage, and I'm trying to write a COPY query to load it into a table. But my query fetches only a single record from the "values" node, whereas I need to insert all child elements of the "values" node. I have loaded this file into a table with the variant data type.
The query I'm using:
select record:batchId batchId, record:results[0].pageInfo.numberOfPages NoofPages, record:results[0].pageInfo.pageNumber pageNo,
record:results[0].pageInfo.pageSize PgSz, record:results[0].requestId requestId,record:results[0].showPopup showPopup,
record:results[0].values[0][0].columnId columnId,record:results[0].values[0][0].value val
from lease;
{
"batchId": "",
"results": [
{
"pageInfo": {
"numberOfPages": ,
"pageNumber": ,
"pageSize":
},
"requestId": "",
"showPopup": false,
"values": [
[
{
"columnId": ,
"value": ""
},
{
"columnId": ,
"value":
}
]
]
}
]
}
You need to use the LATERAL FLATTEN function, something like this:
I created this table:
create table json_test (seq_no integer, json_text variant);
and then populated it with this JSON string:
insert into json_test(seq_no, json_text)
select 1, parse_json($${
"batchId": "a",
"results": [
{
"pageInfo": {
"numberOfPages": "1",
"pageNumber": "1",
"pageSize": "100000"
},
"requestId": "a",
"showPopup": false,
"values": [
[
{
"columnId": "4567",
"value": "2020-10-09T07:24:29.000Z"
},
{
"columnId": "4568",
"value": "2020-10-10T10:24:29.000Z"
}
]
]
}
]
}$$);
Then the following query:
select
json_text:batchId batchId
,json_text:results[0].pageInfo.numberOfPages numberOfPages
,json_text:results[0].pageInfo.pageNumber pageNumber
,json_text:results[0].pageInfo.pageSize pageSize
,json_text:results[0].requestId requestId
,json_text:results[0].showPopup showPopup
,f.value:columnId columnId
,f.value:value value
from json_test t
,lateral flatten(input => t.json_text:results[0]:values[0]) f;
gives these results, which I think are roughly what you are looking for:
BATCHID NUMBEROFPAGES PAGENUMBER PAGESIZE REQUESTID SHOWPOPUP COLUMNID VALUE
"a" "1" "1" "100000" "a" false "4567" "2020-10-09T07:24:29.000Z"
"a" "1" "1" "100000" "a" false "4568" "2020-10-10T10:24:29.000Z"

ambiguous column name 'VALUE'

Any idea how to overcome the ambiguous-column error raised by Snowflake's LATERAL FLATTEN function with the logic below would be much appreciated.
I'm trying to flatten nested JSON data with the query below, selecting from a variant column, but I get an ambiguous column name 'VALUE' error from the lateral flatten. The issue is that one of the JSON keys is itself named "value", and I can't get at that data through LATERAL FLATTEN. Can someone help me achieve the desired output?
Sample JSON Data
{"issues": [
{
"expand": "a,b,c,d",
"fields": {
"customfield_10000": null,
"customfield_10001": null,
"customfield_10002": [
{
"id": "1234",
"self": "xxx",
"value": "Test"
}
]
},
"id": "123456",
"key": "K-123"
}
]}
select
a.value:id::number as ISSUE_ID,
a.value:key::varchar as ISSUE_KEY,
b.value:id::varchar as ROOT_CAUSE_ID,
b.value:value::varchar as ROOT_CAUSE_VALUE
from
abc.table_variant,
lateral flatten( input => payload_json:issues) as a,
lateral flatten( input => a.value:fields.customfield_10002) as b;
Try quoting the key name:
b.value:"value"::varchar
Here is a full example:
WITH CTE AS
(select parse_json('{"issues": [
{
"expand": "a,b,c,d",
"fields": {
"customfield_10000": null,
"customfield_10001": null,
"customfield_10002": [
{
"id": "1234",
"self": "xxx",
"value": "Test"
}
]
},
"id": "123456",
"key": "K-123"
}
]}')
as col)
select
a.value:id::number as ID,
a.value:key::varchar as KEY,
b.value:id::INT as customfield_10002,
b.value:"value"::varchar as customfield_10002_value
from cte,
lateral flatten(input => cte.col, path => 'issues') a,
lateral flatten(input => a.value:fields.customfield_10002) b;
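With the sample JSON above, this returns a single row: ID = 123456, KEY = K-123, customfield_10002 = 1234, customfield_10002_value = Test.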

Postgresql get elements of a JSON array

Let's say that we have the following JSON in Postgresql:
{ "name": "John", "items": [ { "item_name": "lettuce", "price": 2.65, "units": "no" }, { "item_name": "ketchup", "price": 1.51, "units": "litres" } ] }
The JSONs are stored in the following table:
create table testy_response_p (
ID serial NOT NULL PRIMARY KEY,
content_json json NOT NULL
);
insert into testy_response_p (content_json) values (
'{ "name": "John", "items": [ { "item_name": "lettuce", "price": 2.65, "units": "no" }, { "item_name": "ketchup", "price": 1.51, "units": "litres" } ] }'
)
The following can return either JSON or text (with -> and ->> respectively):
select content_json ->> 'items' from testy_response_p
Because of that, I want to use a subquery to get the elements of the array under items:
select *
from json_array_elements(
select content_json ->> 'items' from testy_response_p
)
All I get is an error, but I don't know what I'm doing wrong. The output of the subquery is text. The desired final output is:
{ "item_name": "lettuce", "price": 2.65, "units": "no" }
{ "item_name": "ketchup", "price": 1.51, "units": "litres" }
You need to join to the function's result. You can't use the ->> operator because it returns text, not json, and json_array_elements() only works with a JSON value as its input.
select p.id, e.*
from testy_response_p p
cross join lateral json_array_elements(p.content_json -> 'items') as e;
Online example: https://rextester.com/MFGEA29396
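If you then want typed columns out of each element, a sketch along the same lines (the numeric cast is an assumption based on the sample prices):
select p.id,
       e.value ->> 'item_name' as item_name,
       (e.value ->> 'price')::numeric as price,  -- ->> returns text, so cast
       e.value ->> 'units' as units
from testy_response_p p
cross join lateral json_array_elements(p.content_json -> 'items') as e(value);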
