Extracting data from JSON column defined as String - snowflake-cloud-data-platform

A table has a ports column (defined as VARCHAR) which has the following data:
[{u'position': 1, u'macAddress': u'00:8C:FA:C1:7C:88'}, {u'position':
2, u'macAddress': u'00:8C:FA:5E:98:81'}]
I want to extract the data from just the macAddress fields into separate rows. I tried to flatten the data in Snowflake but it is not working as the column is not defined as VARIANT and the the fields have a 'u' in front of them (this is my guess).
00:8C:FA:C3:7C:84
00:5C:FA:7E:98:87
Could someone please help with the requirement.

The provided JSON is not a valid JSON but it is possible to treat it as one with text operations and PARSE_JSON:
SELECT s.value:macAddress::TEXT AS macAddress
FROM t
,LATERAL FLATTEN(INPUT => PARSE_JSON(REPLACE(REPLACE(col, 'u''', ''''), '''', '"')))
AS s;
For input:
CREATE OR REPLACE TABLE t(col TEXT)
AS
SELECT $$[{u'position': 1, u'macAddress': u'00:8C:FA:C1:7C:88'}, {u'position': 2, u'macAddress': u'00:8C:FA:5E:98:81'}]$$;
Output:

Related

Return Parts of an Array in Postgres

I have a column (text) in my Postgres DB (v.10) with a JSON format.
As far as i now it's has an array format.
Here is an fiddle example: Fiddle
If table1 = persons and change_type = create then i only want to return the name and firstname concatenated as one field and clear the rest of the text.
Output should be like this:
id table1 did execution_date change_type attr context_data
1 Persons 1 2021-01-01 Create Name [["+","name","Leon Bill"]]
1 Persons 2 2021-01-01 Update Firt_name [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
1 Users 3 2021-01-01 Create Street [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
Disassemble json array into SETOF using json_array_elements function, then assemble it back into structure you want.
select m.*
, case
when m.table1 = 'Persons' and m.change_type = 'Create'
then (
select '[["+","name",' || to_json(string_agg(a.value->>2,' ' order by a.value->>1 desc))::text || ']]'
from json_array_elements(m.context_data::json) a
where a.value->>1 in ('name','firstname')
)
else m.context_data
end as context_data
from mutations m
modified fiddle
(Note:
utilization of alphabetical ordering of names of required fields is little bit dirty, explicit order by case could improve readability
resulting json is assembled from string literals as much as possible since you didn't specified if "+" should be taken from any of original array elements
the to_json()::text is just for safety against injection
)

Snowflake Retrieve value from Semi Structured Data

I'm trying to retrieve the health value from Snowflake semi structured data in a variant column called extra from table X.
An example of the code can be seen below:
[
{
"party":
"[{\"class\":\"Farmer\",\"gender\":\"Female\",\"ethnicity\":\"NativeAmerican\",\"health\":2},
{\"class\":\"Adventurer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":3},
{\"class\":\"Farmer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":0},
{\"class\":\"Banker\",\"gender\":\"Female\",\"ethnicity\":\"White\",\"health\":0}
}
]
I have tried reading the Snowflake documentation from https://community.snowflake.com/s/article/querying-semi-structured-data
I have also tried the following queries to flatten the query:
SELECT result.value:health AS PartyHealth
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PartyHealth > 0,
TABLE(FLATTEN(X, 'party')) result
AND
SELECT [0]['party'][0]['health'] AS Health
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PH > 0;
I am trying to retrieve the health value from table X from column extra which contains the the variant party, which has 4 repeating values [0-3]. Im not sure how to do this is someone able to tell me how to query semi structured data in Snowflake, considering the documentation doesn't make much sense?
First, the JSON value you posted seems wrong formatted (might be a copy paste issue).
Here's an example that works:
first your JSON formatted:
[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]
create a table to test:
CREATE OR REPLACE TABLE myvariant (v variant);
insert the JSON value into this table:
INSERT INTO myvariant SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]');
now, to select a value you start from column name, in my case v, and as your JSON is an array inside, I specify first value [0], and from there expand, so something like this:
SELECT v[0]:party[0].health FROM myvariant;
Above gives me:
For the other rows you can simply do:
SELECT v[0]:party[1].health FROM myvariant;
SELECT v[0]:party[2].health FROM myvariant;
SELECT v[0]:party[3].health FROM myvariant;
Another option might be to make the data more like a table ... I find it easier to work with than the JSON :-)
Code at bottom - just copy/paste and it runs in Snowflake returning screenshot below.
Key Doco is Lateral Flatten
SELECT d4.path, d4.value
from
lateral flatten(input=>PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]') ) as d ,
lateral flatten(input=> value) as d2 ,
lateral flatten(input=> d2.value) as d3 ,
lateral flatten(input=> d3.value) as d4

SQL Server: How to remove a key from a Json object

I have a query like (simplified):
SELECT
JSON_QUERY(r.SerializedData, '$.Values') AS [Values]
FROM
<TABLE> r
WHERE ...
The result is like this:
{ "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }
How can I remove the entries with a key length less then 6.
Result:
{ "201902":121, "201903":134, "201904":513 }
One possible solution is to parse the JSON and generate it again using string manipulations for keys with desired length:
Table:
CREATE TABLE Data (SerializedData nvarchar(max))
INSERT INTO Data (SerializedData)
VALUES (N'{"Values": { "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }}')
Statement (for SQL Server 2017+):
UPDATE Data
SET SerializedData = JSON_MODIFY(
SerializedData,
'$.Values',
JSON_QUERY(
(
SELECT CONCAT('{', STRING_AGG(CONCAT('"', [key] ,'":', [value]), ','), '}')
FROM OPENJSON(SerializedData, '$.Values') j
WHERE LEN([key]) >= 6
)
)
)
SELECT JSON_QUERY(d.SerializedData, '$.Values') AS [Values]
FROM Data d
Result:
Values
{"201902":121,"201903":134,"201904":513}
Notes:
It's important to note, that JSON_MODIFY() in lax mode deletes the specified key if the new value is NULL and the path points to a JSON object. But, in this specific case (JSON object with variable key names), I prefer the above solution.

How to load .jsonl into a snowflake table variant?

How to load .jsonl into a table variant as json of snowflake
create or replace table sampleColors (v variant);
insert into
sampleColors
select
parse_json(column1) as v
from
values
( '{r:255,g:12,b:0} {r:0,g:255,b:0} {r:0,g:0,b:255}')
v;
select * from sampleColors;
Error parsing JSON: more than one document in the input
If you want each RGB value in its own row, you need to split the JSONL to a table with one row per JSON using a table function like this:
insert into
sampleColors
select parse_json(VALUE)
from table(split_to_table( '{r:255,g:12,b:0} {r:0,g:255,b:0} {r:0,g:0,b:255} {c:0,m:1,y:1,k:0} {c:1,m:0,y:1,k:0} {c:1,m:1,y:0,k:0}', ' '));

replace not working after specific number

I have a table with a column vouchn. The recod of this column eg-if it receipt voucher it will record like RV103 AND LIKE payment it stores like PV99. I also use this sql for gettin max records.
SELECT MAX(REPLACE(vouchn, 'RV', '')) AS vcno
FROM dbo.dayb
WHERE (vouchn LIKE '%RV%')
it is ok until i reach RV999. After then even record RV1000 is there the above sql retrieve RV999. What is the error of the above code.
If REPLACE(vouchn, 'RV', '') always return numeric result, you can as the below:
SELECT MAX(REPLACE(vouchn, 'RV', '') * 1) AS vcno
FROM dbo.dayb
WHERE (vouchn LIKE '%RV%')
Add CONVERT to numeric type:
SELECT MAX(CONVERT(BIGINT,REPLACE(vouchn, 'RV', ''))) AS vcno
FROM dbo.dayb
WHERE (vouchn LIKE '%RV%')

Resources