Not able to transform data into the expected format in Snowflake - arrays

I have data in rows for a column like this:
[
  {
    "value": "A",
    "path": "nth-child(1)"
  },
  {
    "value": "K",
    "path": "nth-child(2)"
  },
  {
    "value": "C",
    "path": "nth-child(3)"
  }
]
I want to get the data from that column in this format, one output row per source row:
{
  "A",
  "K",
  "C"
}
I have tried this, but it combines all the rows of the table:
SELECT LISTAGG(f.value:value::STRING, ',') AS col
FROM tablename,
     LATERAL FLATTEN(input => parse_json(column_name)) f

I have used a CTE just to provide fake data for the example:
WITH data(json) AS (
    SELECT parse_json(column1) FROM VALUES
    ('[{"value":"A","path":"nth-child(1)"},{"value":"K","path":"nth-child(2)"},{"value":"C","path":"nth-child(3)"}]'),
    ('[{"value":"B","path":"nth-child(1)"},{"value":"L","path":"nth-child(2)"},{"value":"D","path":"nth-child(3)"}]'),
    ('[{"value":"C","path":"nth-child(1)"},{"value":"M","path":"nth-child(2)"},{"value":"E","path":"nth-child(3)"}]')
)
-- WITHIN GROUP (ORDER BY f.index) pins the elements to their original array order
SELECT LISTAGG(f.value:value::text, ',') WITHIN GROUP (ORDER BY f.index) AS l1
FROM data AS d,
     TABLE(FLATTEN(input => d.json)) f
GROUP BY f.seq
ORDER BY f.seq;
gives:
L1
A,K,C
B,L,D
C,M,E
Thus, with some string concatenation via ||:
SELECT '{' || LISTAGG('"' || f.value:value::text || '"', ',') WITHIN GROUP (ORDER BY f.index) || '}' AS l1
FROM data AS d,
     TABLE(FLATTEN(input => d.json)) f
GROUP BY f.seq
ORDER BY f.seq;
gives:
L1
{"A","K","C"}
{"B","L","D"}
{"C","M","E"}
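Note that {"A","K","C"} is not valid JSON, only a string that looks like a set. If a downstream consumer needs a parseable value, the same pattern can aggregate into a real array instead; a minimal sketch against the same data CTE as above:
SELECT TO_JSON(ARRAY_AGG(f.value:value::text) WITHIN GROUP (ORDER BY f.index)) AS l1
FROM data AS d,
     TABLE(FLATTEN(input => d.json)) f
GROUP BY f.seq
ORDER BY f.seq;
Each row then comes back as a valid JSON array such as ["A","K","C"].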

Related

SQL Server table data to JSON Path result

I am looking for a solution to convert table results to JSON.
I have a table with two columns, as below. Column1 will always have normal values, but Column2 will have up to 15 values separated by ';' (semicolon).
ID  Column1  Column2
--  -------  ----------
1   T1       Re;BoRe;Va
I want to convert the above column data into the JSON format below:
{
  "services":
  [
    { "service": "T1" }
  ],
  "additional_services":
  [
    { "service": "Re" },
    { "service": "BoRe" },
    { "service": "Va" }
  ]
}
I have tried creating something like the below, but cannot get to the exact format I am looking for:
SELECT
    REPLACE((SELECT d.Column1 AS services, d.column2 AS additional_services
             FROM Table1 w (nolock)
             INNER JOIN Table2 d (nolock) ON w.Id = d.Id
             WHERE ID = 1
             FOR JSON PATH), '\/', '/')
Please let me know if this is something we can achieve using T-SQL.
As I mention in the comments, I strongly recommend you fix and normalise your design. Don't store delimited data in your database; Re;BoRe;Va should be 3 rows, not 1 delimited one. That doesn't mean you can't achieve what you want with your denormalised data, just that your design is flawed and thus needs to be brought up.
One way to achieve what you're after is with some nested FOR JSON calls:
SELECT (SELECT V.Column1 AS service
        FOR JSON PATH) AS services,
       (SELECT SS.[value] AS service
        FROM STRING_SPLIT(V.Column2, ';') SS
        FOR JSON PATH) AS additional_services
FROM (VALUES(1, 'T1', 'Re;BoRe;Va')) V (ID, Column1, Column2)
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
This results in the following JSON:
{
  "services": [
    {
      "service": "T1"
    }
  ],
  "additional_services": [
    {
      "service": "Re"
    },
    {
      "service": "BoRe"
    },
    {
      "service": "Va"
    }
  ]
}
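Applied to the real tables from the question (Table1, Table2, and the join are taken from the original attempt; treat this as a sketch, since the actual schema isn't shown), the same nested FOR JSON pattern would look like:
SELECT (SELECT d.Column1 AS service
        FOR JSON PATH) AS services,
       (SELECT SS.[value] AS service
        FROM STRING_SPLIT(d.Column2, ';') SS
        FOR JSON PATH) AS additional_services
FROM Table1 w
INNER JOIN Table2 d ON w.Id = d.Id
WHERE d.ID = 1
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
Note that STRING_SPLIT requires a database compatibility level of 130 or higher.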

How do I do an aggregate query against a Couchbase array index?

I have documents in my database that contain a "flags" array. Each of those has a "flag" value that contains a string. I'm trying to get the count of how many of each flag string there are across all documents. So for example, if I had two documents:
{
  "flags": [
    {
      "flag": "flag1",
      ...
    },
    {
      "flag": "flag2",
      ...
    }
  ],
  ...
},
{
  "flags": [
    {
      "flag": "flag1",
      ...
    },
    {
      "flag": "flag3",
      ...
    }
  ],
  ...
}
I would expect a result back like:
{
  {
    "flag": "flag1",
    "flag_count": 2
  },
  {
    "flag": "flag2",
    "flag_count": 1
  },
  {
    "flag": "flag3",
    "flag_count": 1
  }
}
I've created an index that looks like this:
CREATE INDEX `indexname` ON `dbname`((all (array (`f`.`flag`) for `f` in `flags` end)),`flags`) WHERE (`type` in ["type1", "type2"])
So far, the only way I've been able to get this to work is with a query like this:
SELECT f1.flag, COUNT(*) AS flag_count
FROM dbname s
UNNEST flags AS f1
WHERE s.type IN ["type1", "type2"]
  AND ANY f IN s.flags SATISFIES f.flag LIKE '%' END
GROUP BY f1.flag
This all makes sense to me, except that it requires something along the lines of that AND ANY f IN s.flags SATISFIES f.flag LIKE '%' END part to run at all; if I leave it out, it tells me it can't find an index that works. Is there a way to structure this so that I can leave that out? It seems unnecessary to me, but I guess I'm missing something.
An array index can only be selected when the query has a predicate on its leading index key, here the flag values themselves, which is why a match-everything filter such as f.flag LIKE "%" has to stay in the query. With the array expression as the sole index key, the UNNEST form qualifies directly:
CREATE INDEX ix1 ON dbname( ALL ARRAY f.flag FOR f IN flags END)
WHERE type IN ["type1", "type2"];
SELECT f.flag, COUNT(1) AS flag_count
FROM dbname AS d
UNNEST d.flags AS f
WHERE d.type IN ["type1", "type2"] AND f.flag LIKE "%"
GROUP BY f.flag;
If the array can contain duplicate flag values and you want to count each one only once per document:
SELECT f.flag, COUNT( DISTINCT META(d).id) AS flag_count
FROM dbname AS d
UNNEST d.flags AS f
WHERE d.type IN ["type1", "type2"] AND f.flag LIKE "%"
GROUP BY f.flag;
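To confirm the index is actually selected, prepend EXPLAIN to the query; the resulting plan should show an index scan on ix1 rather than a primary scan:
EXPLAIN SELECT f.flag, COUNT(1) AS flag_count
FROM dbname AS d
UNNEST d.flags AS f
WHERE d.type IN ["type1", "type2"] AND f.flag LIKE "%"
GROUP BY f.flag;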
See UNNEST and array indexing: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/indexing-arrays.html

Loading JSON into BigQuery: Field is sometimes an array and sometimes a string

I'm trying to load JSON data to BigQuery. The excerpt of my data causing problems looks like this:
[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]
{"Value":"456","Code":"A"}
[{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]
{"Value":"Z","Code":"A"}
I have defined the schema for this field to be:
{
  "fields": [
    {
      "mode": "NULLABLE",
      "name": "Code",
      "type": "STRING"
    },
    {
      "mode": "NULLABLE",
      "name": "Value",
      "type": "STRING"
    }
  ],
  "mode": "REPEATED",
  "name": "Properties",
  "type": "RECORD"
}
But I'm having trouble extracting the string and array values into one repeated field. This SQL successfully extracts the string values:
JSON_EXTRACT_SCALAR(json_string,'$.Properties.Code') as Code,
JSON_EXTRACT_SCALAR(json_string,'$.Properties.Value') as Value
And this SQL successfully extracts the array values:
ARRAY(
  SELECT STRUCT(
    JSON_EXTRACT_SCALAR(Properties_Array, '$.Code') AS Code,
    JSON_EXTRACT_SCALAR(Properties_Array, '$.Value') AS Value
  )
  FROM UNNEST(JSON_EXTRACT_ARRAY(json_string, '$.Properties')) Properties_Array
) AS Properties
I am trying to find a way to have BigQuery read this string as a one-element array instead of preprocessing the data. Is this possible in #StandardSQL?
The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]}' json_string UNION ALL
  SELECT '{"Properties":{"Value":"456","Code":"A"}}' UNION ALL
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]}' UNION ALL
  SELECT '{"Properties": {"Value":"Z","Code":"A"}}'
)
SELECT json_string,
  ARRAY(
    SELECT STRUCT(
      JSON_EXTRACT_SCALAR(Properties, '$.Code') AS Code,
      JSON_EXTRACT_SCALAR(Properties, '$.Value') AS Value
    )
    FROM UNNEST(IFNULL(
      JSON_EXTRACT_ARRAY(json_string, '$.Properties'),
      [JSON_EXTRACT(json_string, '$.Properties')])) Properties
  ) AS Properties
FROM `project.dataset.table`
Each output row now has a Properties array: the IFNULL falls back to wrapping the single object in a one-element array whenever JSON_EXTRACT_ARRAY returns NULL (i.e. when Properties is a bare object rather than an array), so both shapes land in the same REPEATED field.
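The same IFNULL fallback also works with the newer function names (JSON_QUERY_ARRAY, JSON_QUERY, JSON_VALUE) if you prefer them; a sketch of the equivalent query:
#standardSQL
SELECT json_string,
  ARRAY(
    SELECT STRUCT(
      JSON_VALUE(Properties, '$.Code') AS Code,
      JSON_VALUE(Properties, '$.Value') AS Value
    )
    FROM UNNEST(IFNULL(
      JSON_QUERY_ARRAY(json_string, '$.Properties'),
      [JSON_QUERY(json_string, '$.Properties')])) Properties
  ) AS Properties
FROM `project.dataset.table`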

How can I count the number of top level json keys in a Snowflake database variant field?

I'm looking for the number 2 here. ARRAY_SIZE works on a variant list but not on this JSON object. Is there a clever way to do this? I don't know (and probably can't trust) that the structure will only go this deep, and I'm hoping to use this as just another field in a query where I'm pulling a bunch of other values out of the JSON, so ideally a solution allows that as well.
select dict, array_size(dict)
from (select parse_json('{
    "partition": [
        "partition_col"
    ],
    "sample_weight": [
        "sample_weight"
    ]
}') as dict)
You can create a small JavaScript UDF:
create or replace function count_keys(MYVAR variant)
returns float
language javascript
as '
  return Object.entries(MYVAR).length;
';
To call it:
select count_keys(parse_json('{
    "partition": [
        "partition_col"
    ],
    "sample_weight": [
        "sample_weight"
    ]
}'));
Alternatively, use FLATTEN:
with dict as (
    select parse_json('{
        "partition": [
            "partition_col"
        ],
        "sample_weight": [
            "sample_weight"
        ]
    }') val
)
select val, count(*)
from dict,
    lateral flatten(input => val)
group by val;
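Either version composes with other extractions in the same query, which is what the question asks for. A sketch assuming a table my_table with a VARIANT column dict (both names hypothetical):
select
    dict:partition[0]::string as first_partition_col,  -- any other value you are already pulling out
    count_keys(dict) as n_keys                         -- the JavaScript UDF defined above
from my_table;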

PostgreSQL JSON array query

I tried to query my json array using the example here: How do I query using fields inside the new PostgreSQL JSON datatype?
They use the example:
SELECT *
FROM json_array_elements(
'[{"name": "Toby", "occupation": "Software Engineer"},
{"name": "Zaphod", "occupation": "Galactic President"} ]'
) AS elem
WHERE elem->>'name' = 'Toby';
But my JSON array looks more like this (using the same example):
{
  "people": [
    {
      "name": "Toby",
      "occupation": "Software Engineer"
    },
    {
      "name": "Zaphod",
      "occupation": "Galactic President"
    }
  ]
}
When I query it, I get an error: ERROR: cannot call json_array_elements on a non-array
Is my JSON "array" not really an array? I have to use this JSON string because it's contained in a database, so I would have to tell them to fix it if it's not an array.
Or is there another way to query it? I read the documentation, but nothing I tried worked; I kept getting errors.
The JSON array is nested under the key people, so pass my_json->'people' to the function:
with my_table(my_json) as (
    values(
        '{
            "people": [
                {
                    "name": "Toby",
                    "occupation": "Software Engineer"
                },
                {
                    "name": "Zaphod",
                    "occupation": "Galactic President"
                }
            ]
        }'::json)
)
select t.*
from my_table t
cross join json_array_elements(my_json->'people') elem
where elem->>'name' = 'Toby';
The function json_array_elements() unnests the JSON array and generates all its elements as rows:
select elem->>'name' as name, elem->>'occupation' as occupation
from my_table
cross join json_array_elements(my_json->'people') elem
name | occupation
--------+--------------------
Toby | Software Engineer
Zaphod | Galactic President
(2 rows)
If you are interested in Toby's occupation:
select elem->>'occupation' as occupation
from my_table
cross join json_array_elements(my_json->'people') elem
where elem->>'name' = 'Toby'
occupation
-------------------
Software Engineer
(1 row)
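If the column were jsonb rather than json, the same queries work with the jsonb_* counterpart of the function:
select elem->>'occupation' as occupation
from my_table
cross join jsonb_array_elements(my_json::jsonb->'people') elem
where elem->>'name' = 'Toby';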
