Snowflake - Lateral Flatten creating duplicate rows

I'm creating a new table (my_new_table) from another table (my_existing_table) that has 4 columns; the product and monthly_budgets columns hold nested values that I'm trying to extract.
The product column is a dictionary like this:
{"name": "Display", "full_name": "Ad Bundle"}
The monthly_budgets column is a list of several dictionaries and looks like this:
[{"id": 123, "quantity_booked": "23", "budget_booked": "0.0", "budget_booked_loc": "0.0"} ,
{"id": 234, "quantity_booked": "34", "budget_booked": "0.0", "budget_booked_loc": "0.0"},
{"id": 455, "quantity_booked": "44", "budget_booked": "0.0", "budget_booked_loc": "0.0"}]
This is what I'm doing to create the new table and unnest the columns from the existing table:
CREATE OR REPLACE TABLE my_new_table as (
with og_table as (
select
id,
parse_json(product) as PRODUCT,
IO_NAME,
parse_json(MONTHLY_BUDGETS) as MONTHLY_BUDGETS
from my_existing_table
)
select
id,
PRODUCT:name::string as product_name,
PRODUCT:full_name::string as product_full_name,
IO_NAME,
MONTHLY_BUDGETS:id::integer as monthly_budgets_id,
MONTHLY_BUDGETS:quantity_booked::float as monthly_budgets_quantity_booked,
MONTHLY_BUDGETS:budget_booked_loc::float as monthly_budgets_budget_booked_loc
from og_table,
lateral flatten( input => PRODUCT) as PRODUCT,
lateral flatten( input => MONTHLY_BUDGETS) as MONTHLY_BUDGETS);
However, once the new table is created, I run this:
select distinct id, count(*)
from my_new_table
where id = '123'
group by 1;
I see 18 under the count(*) column when I should only have 1, so it looks like there are a lot of duplicates. Why does this happen, and how do I prevent it?

LATERAL FLATTEN produces a CROSS JOIN between the input row and the flatten results.
So if we have this data
Id, Array
1, [10,20,30]
2, [40,50,60]
and you do a flatten on Array via something like:
SELECT d.id,
       d.array,
       f.value as val
FROM data d,
     LATERAL FLATTEN(input => d.array) f
you get:
Id, Array, val
1, [10,20,30], 10
1, [10,20,30], 20
1, [10,20,30], 30
2, [40,50,60], 40
2, [40,50,60], 50
2, [40,50,60], 60
For your case, since you are doing two flattens, each ID gets one output row per combination of the two flatten results, so you end up with many duplicate rows per ID.
Just like above, if I ran SELECT ID, count(*) FROM output GROUP BY 1 on my output, I would get the rows (1, 3) and (2, 3).
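As a hedged sketch (untested, column names taken from the question), the CREATE TABLE could be rewritten so that only the MONTHLY_BUDGETS array is flattened: PRODUCT is a single object, so its fields can be read directly, and the array elements are reached through the flatten alias's value column.
CREATE OR REPLACE TABLE my_new_table AS (
    with og_table as (
        select
            id,
            parse_json(product) as product,
            io_name,
            parse_json(monthly_budgets) as monthly_budgets
        from my_existing_table
    )
    select
        og.id,
        og.product:name::string           as product_name,
        og.product:full_name::string      as product_full_name,
        og.io_name,
        -- fields of each array element come from mb.value, not from the raw column
        mb.value:id::integer              as monthly_budgets_id,
        mb.value:quantity_booked::float   as monthly_budgets_quantity_booked,
        mb.value:budget_booked_loc::float as monthly_budgets_budget_booked_loc
    from og_table og,
         -- single flatten: one output row per budget entry, no cross join of two flattens
         lateral flatten(input => og.monthly_budgets) mb
);
This yields one row per id per MONTHLY_BUDGETS element instead of one row per combination of PRODUCT keys and budget entries.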

Related

PostgreSQL aggregate over json arrays

I have seen a lot of references to using json_array_elements for extracting the elements of a JSON array. However, this appears to only work on exactly one array; if I use it in a generic query, I get the error:
ERROR: cannot call json_array_elements on a scalar
Given something like this:
orders
{ "order_id":"2", "items": [{"name": "apple","price": 1.10}]}
{ "order_id": "3","items": [{"name": "apple","price": 1.10},{"name": "banana","price": 0.99}]}
I would like to extract:
item     count
apple    2
banana   1
Or:
item     total_value_sold
apple    2.20
banana   0.99
Is it possible to aggregate over json arrays like this using json_array_elements?
Use jsonb_array_elements() on orders->'items' to flatten the data:
select elem->>'name' as name, (elem->>'price')::numeric as price
from my_table
cross join jsonb_array_elements(orders->'items') as elem;
It is easy to get the aggregates you want from the flattened data:
select name, count(*), sum(price) as total_value_sold
from (
select elem->>'name' as name, (elem->>'price')::numeric as price
from my_table
cross join jsonb_array_elements(orders->'items') as elem
) s
group by name;
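For reference, a self-contained sketch (assuming the orders column is jsonb, as the queries above require); the counts and sums follow directly from the two sample orders in the question:
-- hypothetical setup mirroring the sample data
create table my_table (orders jsonb);
insert into my_table (orders) values
  ('{"order_id": "2", "items": [{"name": "apple", "price": 1.10}]}'),
  ('{"order_id": "3", "items": [{"name": "apple", "price": 1.10}, {"name": "banana", "price": 0.99}]}');
-- the grouped query above should then return:
--  name   | count | total_value_sold
--  apple  |     2 |             2.20
--  banana |     1 |             0.99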

Querying json that starts with an array

I have a JSON document that starts with an array and I can't manage to query it.
The JSON is in this format:
[
{"#id":1,
"field1":"qwerty",
"#field2":{"name":"my_name", "name2":"my_name_2"},
"field3":{"event":[{"event_type":"OP",...}]}
},
{"#id":2..
}
]
Any suggestions on how to query this?
If I try to use lateral flatten I don't know what key to use:
select
'???'.Value:#id::string as id
from tabl1
,lateral flatten (tabl1_GB_RECORD:???) as gb_record
Your SQL was close but not complete; the following will give you the #id values:
with tbl1 (v) as (
select parse_json('
[
{"#id":1,
"field1":"qwerty",
"#field2":{"name":"my_name", "name2":"my_name_2"},
"field3":{"event":[{"event_type":"OP"}]}
},
{"#id":2
}
]')
)
select t.value:"#id" id from tbl1
, lateral flatten (input => v) as t
Result:
id
___
1
2
Let me know if you have any other questions
When the JSON begins with an array, you flatten the document itself and then pull the fields you want from each element's value. Something along these lines:
WITH x AS (
SELECT parse_json('[
{"#id":1,
"field1":"qwerty",
"#field2":{"name":"my_name", "name2":"my_name_2"},
"field3":{"event":[{"event_type":"OP"}]}
},
{"#id":2,
"field1":"qwerty",
"#field2":{"name":"my_name", "name2":"my_name_2"},
"field3":{"event":[{"event_type":"OP"}]}
}
]') as json_data
)
SELECT y.value,
y.value:"#id"::number as id,
y.value:field1::string as field1,
y.value:"#field2"::variant as field2,
y.value:field3::variant as field3,
y.value:"#field2":name::varchar as name
FROM x,
LATERAL FLATTEN (input=>json_data) y;
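If the nested event array inside field3 is also needed, a second flatten can be chained off the first one's value; a minimal sketch reusing the x CTE from the example above:
SELECT y.value:"#id"::number      AS id,
       e.value:event_type::string AS event_type
FROM x,
     LATERAL FLATTEN (input => json_data) y,
     -- the second flatten walks the event array inside each element's field3
     LATERAL FLATTEN (input => y.value:field3:event) e;
With the sample data this returns one row per event, i.e. (1, 'OP') and (2, 'OP').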

how to combine two columns to one column like a map in hive?

In Hive I have a table with the following columns:
user_id   product_id   score
1         1, 2, 3      0.7, 0.2, 0.1
2         2, 3, 1      0.5, 0.25, 0.25
The types of product_id and score are both string. Now I wish to generate a new column that combines product_id and score like this:
user_id   product_score
1         1:0.7, 2:0.2, 3:0.1
2         2:0.5, 3:0.25, 1:0.25
In the new table, the product_score column works like a map (product_id is the key and score is the value) but is actually still a string. Each product_id is joined to its score with ':', the pairs are joined with ',', and they keep the original order of product_id from the initial table. How can I achieve this?
Use split() to get arrays and map() to convert them to a map:
select user_id,
       map(product_id[0], score[0],
           product_id[1], score[1],
           product_id[2], score[2]
       ) as product_score
from
(
    select user_id, split(product_id,',') as product_id, split(score,',') as score
    from ...
) s;
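Note that map() returns a Hive MAP rather than the string shown in the question. If the literal "1:0.7, 2:0.2, 3:0.1" form is wanted, a sketch along the same lines (still assuming exactly three elements per row, as above; trim() handles the spaces after the commas in the source strings) could concatenate the pairs instead:
select user_id,
       concat_ws(', ',
                 concat(trim(product_id[0]), ':', trim(score[0])),
                 concat(trim(product_id[1]), ':', trim(score[1])),
                 concat(trim(product_id[2]), ':', trim(score[2]))
       ) as product_score
from
(
    select user_id, split(product_id,',') as product_id, split(score,',') as score
    from ...
) s;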
Solved - merge two array columns into key:value pairs while preserving order.
Approach - explode each array with posexplode and keep the rows where the positions from the two columns are equal.
SQL Query -
with rowidcol as
(
select user_id, split(product_id, ',') prod_arr, split(score, ',') score_arr, row_number() over() as row_id
from prod
),
coltorows as
(
select row_id, user_id, prod_arr[prd_index] product, score_arr[score_index] score, prd_index, score_index
from rowidcol
LATERAL view posexplode(prod_arr) ptable as prd_index, pdid
LATERAL view posexplode(score_arr) prtable as score_index, sid
),
colselect as
(
select row_id, user_id, collect_list(concat(product, ':', score)) product_score
from coltorows
where prd_index = score_index
group by row_id, user_id
)
select user_id, concat_ws(',', product_score) as product_score
from colselect
order by user_id;
Input - table Prod:
user_id   product_id   score
1         A,B,C,D      10,20,30,40
2         X,Y,Z        1,2,3
3         K,F,G        100,200,300
Output:
user_id   product_score
1         A:10,B:20,C:30,D:40
2         X:1,Y:2,Z:3
3         K:100,F:200,G:300

How can I apply a function to each element of an array column?

I have a dataset where a column contains an ARRAY of OBJECTs like this:
ID TAGS
1 {"tags": [{"tag": "a"}, {"tag": "b"}]}
2 {"tags": [{"tag": "c"}, {"tag": "d"}]}
I want to extract the tag field of each element of the array, so the end result would be:
ID TAGS
1 ["a","b"]
2 ["c","d"]
Assuming the following table t1:
CREATE OR REPLACE TEMPORARY TABLE t1 AS (
select 1 as ID , PARSE_JSON('{"tags": [{"tag":"a"}, {"tag":"b"}]}') AS PAYLOAD
UNION ALL
select 2, PARSE_JSON('{"tags": [{"tag":"c"}, {"tag":"d"}]}')
);
One possible solution is to create a JavaScript UDF and use JavaScript's .map() to apply a function to each element of the array:
create or replace function extract_tags(a array)
returns array
language javascript
strict
as '
return A.map(function(d) {return d.tag});
';
SELECT ID, EXTRACT_TAGS(PAYLOAD:tags) AS tags from t1;
this gives the desired result:
ID TAGS
1 [ "a", "b" ]
2 [ "c", "d" ]
A pure SQL approach would be to combine LATERAL FLATTEN and ARRAY_AGG like this:
with t2 as (
select ID, t2.value:tag as tag
from t1, LATERAL FLATTEN(input => payload:tags) t2
)
select t2.id, ARRAY_AGG(t2.tag) as tags from t2
group by ID
order by ID ASC;
t2 itself will become:
ID TAG
1 "a"
1 "b"
2 "c"
2 "d"
and after the GROUP BY ID it becomes:
ID TAGS
1 [ "a", "b" ]
2 [ "c", "d" ]

Using union to do a crosstab query

I have a table with the following structure:
id   key   data
1    A     10
1    B     20
1    C     30
I need to write a query so that I get these keys as columns and their data values in a single row.
Eg:
id   A    B    C
1    10   20   30
I have tried using union and case but I get 3 rows instead of one.
Any suggestions?
The most straightforward way to do this is:
SELECT DISTINCT "id",
(SELECT "data" FROM Table1 WHERE "key" = 'A') AS "A",
(SELECT "data" FROM Table1 WHERE "key" = 'B') AS "B",
(SELECT "data" FROM Table1 WHERE "key" = 'C') AS "C"
FROM Table1
Or you can use a PIVOT:
SELECT * FROM
(SELECT "id", "key", "data" FROM Table1)
PIVOT (
MAX("data")
FOR ("key") IN ('A', 'B', 'C'));
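Since the question mentions already trying CASE and getting three rows, the missing piece there is aggregation; a minimal sketch using conditional aggregation (the GROUP BY collapses the three rows into one):
SELECT "id",
       MAX(CASE WHEN "key" = 'A' THEN "data" END) AS "A",
       MAX(CASE WHEN "key" = 'B' THEN "data" END) AS "B",
       MAX(CASE WHEN "key" = 'C' THEN "data" END) AS "C"
FROM Table1
GROUP BY "id";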
