Insert into clickhouse JSONEachRow nested - arrays

I have the following table:
SET flatten_nested = 0;
CREATE TABLE test.hm
(
customDimensions Array(Nested(index Nullable(Int64), value Nullable(String)))
)
engine = Memory;
I am trying to insert into it with the following query:
INSERT INTO test.hm FORMAT JSONEachRow {"customDimensions": [{"index": 1, "value": 2}]}
But it fails with:
Code: 130, e.displayText() = DB::Exception: Array does not start with '[' character: (while reading the value of key customDimensions): (at row 1) (version 21.8.4.51 (official build))
How can I fix this and insert JSON with a multi-level nested hierarchy when flatten_nested = 0?

Are you sure you need Array(Nested(...))? It is effectively a two-dimensional array.
You can run a SELECT to see what layout ClickHouse expects for JSONEachRow:
insert into test.hm values([[(1,'test1'), (2,'test2')]]);
select * from test.hm format JSONEachRow;
{"customDimensions":[[["1","test1"],["2","test2"]]]}
I guess what you really need is Array(Tuple(index Nullable(Int64), value Nullable(String))).
You can also use JSONExtract:
https://kb.altinity.com/altinity-kb-schema-design/altinity-kb-jsonasstring-and-mat.-view-as-json-parser/
https://kb.altinity.com/altinity-kb-queries-and-syntax/jsonextract-to-parse-many-attributes-at-a-time/
Or use the JSON object type: https://clickhouse.com/docs/en/guides/developer/working-with-json/json-semi-structured/#json-object-type
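If the column is changed to Array(Tuple(...)) as suggested, the incoming objects would need to be reshaped before being sent as JSONEachRow. The following is only a client-side sketch in Python. It assumes, as the SELECT output above suggests, that the server expects each tuple as a positional JSON array in column order. The helper name to_positional_row is hypothetical and not part of any ClickHouse client.

```python
import json

def to_positional_row(row):
    """Reshape [{"index": ..., "value": ...}] objects into [index, value] lists.

    Assumption: the target column is Array(Tuple(index, value)) and tuples
    are written as positional arrays in JSONEachRow.
    """
    dims = []
    for item in row["customDimensions"]:
        value = item.get("value")
        # The value column is Nullable(String), so non-null values become strings.
        dims.append([item.get("index"), None if value is None else str(value)])
    return {"customDimensions": dims}

source = {"customDimensions": [{"index": 1, "value": 2}]}
line = json.dumps(to_positional_row(source))
print(line)
```

Whether the server accepts this exact shape depends on the ClickHouse version and settings, so it is worth checking against the output of SELECT ... FORMAT JSONEachRow on your own table first.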

Related

Snowflake Retrieve value from Semi Structured Data

I'm trying to retrieve the health value from Snowflake semi structured data in a variant column called extra from table X.
An example of the data can be seen below:
[
{
"party":
"[{\"class\":\"Farmer\",\"gender\":\"Female\",\"ethnicity\":\"NativeAmerican\",\"health\":2},
{\"class\":\"Adventurer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":3},
{\"class\":\"Farmer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":0},
{\"class\":\"Banker\",\"gender\":\"Female\",\"ethnicity\":\"White\",\"health\":0}
}
]
I have tried reading the Snowflake documentation from https://community.snowflake.com/s/article/querying-semi-structured-data
I have also tried the following queries to flatten the query:
SELECT result.value:health AS PartyHealth
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PartyHealth > 0,
TABLE(FLATTEN(X, 'party')) result
and:
SELECT [0]['party'][0]['health'] AS Health
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PH > 0;
I am trying to retrieve the health value from the extra column of table X. That column is a variant containing party, an array with 4 repeating elements [0-3]. I'm not sure how to do this. Can someone tell me how to query semi-structured data in Snowflake, considering the documentation doesn't make much sense?
First, the JSON value you posted appears to be malformed (possibly a copy-paste issue).
Here's an example that works.
First, your JSON, properly formatted:
[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]
Create a table to test:
CREATE OR REPLACE TABLE myvariant (v variant);
Insert the JSON value into this table:
INSERT INTO myvariant SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]');
Now, to select a value, start from the column name (v in my case). Because the outermost JSON value is an array, first pick its first element with [0], then drill down from there, like this:
SELECT v[0]:party[0].health FROM myvariant;
The query above returns 2, the health of the first party member.
For the other array elements, you can simply do:
SELECT v[0]:party[1].health FROM myvariant;
SELECT v[0]:party[2].health FROM myvariant;
SELECT v[0]:party[3].health FROM myvariant;
Another option might be to make the data more like a table. I find that easier to work with than the JSON :-)
The code below can be copied and pasted as-is, and it runs in Snowflake.
The key documentation to read is LATERAL FLATTEN.
SELECT d4.path, d4.value
from
lateral flatten(input=>PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]') ) as d ,
lateral flatten(input=> value) as d2 ,
lateral flatten(input=> d2.value) as d3 ,
lateral flatten(input=> d3.value) as d4
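To make the path logic concrete outside Snowflake, here is a minimal Python sketch of the same two ideas: indexing into the array as v[0]:party[0].health does, and producing one row per party member as a flatten would. It uses the corrected JSON from the answer above. The variable names are illustrative only, and this is not how Snowflake evaluates the query internally.

```python
import json

raw = (
    '[{"party": ['
    '{"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2},'
    '{"class":"Adventurer","gender":"Male","ethnicity":"White","health":3},'
    '{"class":"Farmer","gender":"Male","ethnicity":"White","health":0},'
    '{"class":"Banker","gender":"Female","ethnicity":"White","health":0}'
    ']}]'
)
v = json.loads(raw)

# Same path as v[0]:party[0].health
first_health = v[0]["party"][0]["health"]

# One (index, class, health) row per array element, similar in spirit to
# flattening v[0]:party
rows = [
    (i, member["class"], member["health"])
    for i, member in enumerate(v[0]["party"])
]

for row in rows:
    print(row)
```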

SQL Server: How to remove a key from a Json object

I have a query like (simplified):
SELECT
JSON_QUERY(r.SerializedData, '$.Values') AS [Values]
FROM
<TABLE> r
WHERE ...
The result is like this:
{ "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }
How can I remove the entries with a key length of less than 6?
Expected result:
{ "201902":121, "201903":134, "201904":513 }
One possible solution is to parse the JSON and generate it again using string manipulations for keys with desired length:
Table:
CREATE TABLE Data (SerializedData nvarchar(max))
INSERT INTO Data (SerializedData)
VALUES (N'{"Values": { "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }}')
Statement (for SQL Server 2017+):
UPDATE Data
SET SerializedData = JSON_MODIFY(
SerializedData,
'$.Values',
JSON_QUERY(
(
SELECT CONCAT('{', STRING_AGG(CONCAT('"', [key] ,'":', [value]), ','), '}')
FROM OPENJSON(SerializedData, '$.Values') j
WHERE LEN([key]) >= 6
)
)
)
SELECT JSON_QUERY(d.SerializedData, '$.Values') AS [Values]
FROM Data d
Result:
Values
{"201902":121,"201903":134,"201904":513}
Notes:
Note that JSON_MODIFY() in lax mode deletes the specified key if the new value is NULL and the path points to a JSON object. In this specific case (a JSON object with variable key names), however, I prefer the above solution.
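For comparison, the same parse, filter, and rebuild idea can be sketched in Python. This is only an illustration of the logic on the sample data from the answer, not a replacement for the T-SQL statement.

```python
import json

serialized = (
    '{"Values": { "2019":120, "20191":120, "201902":121, '
    '"201903":134, "201904":513 }}'
)
doc = json.loads(serialized)

# Keep only entries whose key has at least 6 characters,
# mirroring WHERE LEN([key]) >= 6
doc["Values"] = {k: val for k, val in doc["Values"].items() if len(k) >= 6}

# Compact separators reproduce the spacing of the T-SQL result
result = json.dumps(doc["Values"], separators=(",", ":"))
print(result)
```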

How to load .jsonl into a snowflake table variant?

How do I load a .jsonl file into a Snowflake table with a VARIANT column, as JSON?
create or replace table sampleColors (v variant);
insert into
sampleColors
select
parse_json(column1) as v
from
values
( '{r:255,g:12,b:0} {r:0,g:255,b:0} {r:0,g:0,b:255}')
v;
select * from sampleColors;
This fails with: Error parsing JSON: more than one document in the input
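If you want each RGB value in its own row, you need to split the JSONL into a table with one row per JSON document, using a table function like this. Note that splitting on a space only works as long as the individual JSON documents contain no spaces themselves.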
insert into
sampleColors
select parse_json(VALUE)
from table(split_to_table( '{r:255,g:12,b:0} {r:0,g:255,b:0} {r:0,g:0,b:255} {c:0,m:1,y:1,k:0} {c:1,m:0,y:1,k:0} {c:1,m:1,y:0,k:0}', ' '));
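If the file is pre-processed before loading, the documents can also be separated without relying on a delimiter character. A minimal Python sketch follows, assuming strictly valid JSON with quoted keys (the unquoted keys in the sample above would not parse with Python's json module). The function name iter_json_documents is made up for this example.

```python
import json

def iter_json_documents(text):
    """Yield each JSON document from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode expects the document to start exactly at pos,
        # so skip spaces and newlines between documents first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = (
    '{"r": 255, "g": 12, "b": 0} {"r": 0, "g": 255, "b": 0}\n'
    '{"r": 0, "g": 0, "b": 255}'
)
docs = list(iter_json_documents(sample))
for doc in docs:
    print(doc)
```

Because the decoder tracks where each document ends, spaces inside a document do not break the split.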

MSSQL JSON_VALUE to match ANY Object in Array

I have a table with a JSON text field:
create table breaches(breach_id int, detail text);
insert into breaches values
( 1,'[{"breachedState": null},
{"breachedState": "PROCESS_APPLICATION",}]')
I'm trying to use MSSQL's built-in JSON parsing functions to test whether ANY object in a JSON array has a matching member value.
If the detail field was a single JSON object, I could use:
select * from breaches
where JSON_VALUE(detail,'$.breachedState') = 'PROCESS_APPLICATION'
But it's an array, and I want to know whether ANY object has breachedState = 'PROCESS_APPLICATION'.
Is this possible using MSSQL's JSON functions?
You can use the OPENJSON function to check each object. Try this query:
select * from breaches
where exists
(
select *
from
OPENJSON (detail) d
where JSON_VALUE(value,'$.breachedState') = 'PROCESS_APPLICATION'
)
By the way, there is an extra "," in your insert query. It should be:
insert into breaches values
( 1,'[{"breachedState": null},
{"breachedState": "PROCESS_APPLICATION"}]')
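The EXISTS over OPENJSON is an "any element matches" test. The same logic, sketched in Python on the corrected sample value (has_breached_state is a name invented for this illustration):

```python
import json

def has_breached_state(detail_json, wanted):
    """Return True if any object in the JSON array has breachedState == wanted."""
    return any(
        isinstance(obj, dict) and obj.get("breachedState") == wanted
        for obj in json.loads(detail_json)
    )

detail = '[{"breachedState": null}, {"breachedState": "PROCESS_APPLICATION"}]'
print(has_breached_state(detail, "PROCESS_APPLICATION"))
```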

Number 0 is not saving to database as a prefix in SQL Server of CHAR data type column

I am trying to insert a value such as '019393' into a table with a CHAR(10) column.
Only '19393' is inserted into the database.
I am implementing this in a stored procedure that does some manipulation, such as incrementing the number by 15 and saving it back with '0' as the prefix.
I am using a SQL Server database.
Note: I tried casting the value to VARCHAR before saving it to the database, but that did not solve it either.
Code:
SELECT
@fromBSB = fromBSB, @toBSB = toBSB, @type = Type
FROM
[dbo].[tbl_REF_SpecialBSBRanges]
WHERE
CAST(@inputFromBSB AS INT) BETWEEN fromBSB AND toBSB
SET @RETURNVALUE = @fromBSB
IF(@fromBSB = @inputFromBSB)
BEGIN
PRINT 'Starting Number is Equal';
DELETE FROM tbl_REF_SpecialBSBRanges
WHERE Type = @type AND fromBSB = @fromBSB AND toBSB = @toBSB
INSERT INTO [tbl_REF_SpecialBSBRanges] ([Type], [fromBSB], [toBSB])
VALUES(@type, CAST('0' + @fromBSB + 1 AS CHAR), @toBSB)
INSERT INTO [tbl_REF_SpecialBSBRanges] ([Type], [fromBSB], [toBSB])
VALUES(@inputBSBName, @inputFromBSB, @inputToBSB)
END
Okay, without knowing the column datatypes, I would suggest trying this:
Change from
CAST('0'+@fromBSB+1 AS CHAR)
to
'0'+CAST(@fromBSB+1 AS CHAR(10))
But if the columns are integers this won't make a difference.
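The underlying effect is that any arithmetic step turns the text into a number, and a number has no leading zeros. The padding therefore has to be re-applied after the increment. A small Python sketch of that order of operations, assuming the padded width should stay the same as the input (increment_padded is a hypothetical helper, not part of the stored procedure):

```python
def increment_padded(code, step):
    """Add step to a zero-padded numeric string and restore the padding."""
    width = len(code)
    # int() drops the leading zero; zfill() puts it back after the arithmetic.
    return str(int(code) + step).zfill(width)

print(str(int("019393")))              # the zero is lost once the value is numeric
print(increment_padded("019393", 15))  # padding restored after the increment
```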
