Flatten and aggregate two columns of arrays via distinct in Snowflake

Flatten and aggregate two columns of arrays via distinct in Snowflake - snowflake-cloud-data-platform

Table structure is
+------------+---------+
| Animals | Herbs |
+------------+---------+
| [Cat, Dog] | [Basil] |
| [Dog, Lion]| [] |
+------------+---------+
Desired output (don't care about sorting of this list):
unique_things
+------------+
[Cat, Dog, Lion, Basil]
First attempt was something like
SELECT ARRAY_CAT(ARRAY_AGG(DISTINCT(animals)), ARRAY_AGG(herbs))
But this produces
[[Cat, Dog], [Dog, Lion], [Basil], []]
Since the distinct is operating on each array, not looking at distinct components within all arrays

If I understand your requirements right and assuming the source tables of
insert into tabarray select array_construct('cat', 'dog'), array_construct('basil');
insert into tabarray select array_construct('lion', 'dog'), null;
I would say the result would look like this:
select array_agg(distinct value) from
(
select
value from tabarray
, lateral flatten( input => col1 )
union all
select
value from tabarray
, lateral flatten( input => col2 ))
;

UPDATE
It is possible without using FLATTEN, by using ARRAY_UNION_AGG:
Returns an ARRAY that contains the union of the distinct values from the input ARRAYs in a column.
For sample data:
CREATE OR REPLACE TABLE t AS
SELECT ['Cat', 'Dog'] AS Animals, ['Basil'] AS Herbs
UNION SELECT ['Dog', 'Lion'], [];
Query:
SELECT ARRAY_UNION_AGG(ARRAY_CAT(Animals, Herbs)) AS Result
FROM t
or:
SELECT ARRAY_UNION_AGG(Animals) AS Result
FROM (SELECT Animals FROM t
UNION ALL
SELECT Herbs FROM t);
Output:
You could flatten the combined array and then aggregate back:
SELECT ARRAY_AGG(DISTINCT F."VALUE") AS unique_things
FROM tab, TABLE(FLATTEN(ARRAY_CAT(tab.Animals, tab.Herbs))) f

Here is another variation to handle NULLs in case they appear in data set.
SELECT ARRAY_AGG(DISTINCT a.VALUE) unique_things from tab, TABLE (FLATTEN(array_compact(array_append(tab.Animals, tab.Herbs)))) a

Related

Renaming a JSON column for a UNION

Remark: my example is overly simplified. In reality, I am dealing with a huge query. But to illustrate the issue/errors, let us resort to apples and oranges.
My original query looked like this:
SELECT 'FruitsCount' AS "Type", (SELECT count(id) as Counter, [Name] FROM Fruits group by name FOR JSON PATH) AS "Value"
Which would result in something like. Let's refer to this as Format A
|---------------------|------------------------------------------------------------------------------|
| Type | Value |
|---------------------|------------------------------------------------------------------------------|
| FruitCount | [{"Counter":2, "Name":"Apple"},{"Counter":3, "Name":"Orange"}] |
|---------------------|------------------------------------------------------------------------------|
However, now I want to create a union of Fruit and Vegetable counts. My query now looks like this
(SELECT count(id) as Counter, [Name] FROM Fruits group by name
UNION
SELECT count(id) as Counter, [Name] FROM Vegetables group by name)
FOR JSON PATH
|---------------------|------------------------------------------------------------------------------|
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
|---------------------|------------------------------------------------------------------------------|
| [{"Counter":2, "Name":"Apple"},{"Counter":3, "Name":"Orange"},{"Counter":7, "Name":"Tomato"}] |
|---------------------|------------------------------------------------------------------------------|
However, I want it in the format as before, where I have a Type and Value columns (Format A).
I tried doing the following:
SELECT 'FruitsCount' AS "Type", ((SELECT count(id) as Counter, [Name] FROM Fruits group by name
UNION
SELECT count(id) as Counter, [Name] FROM Vegetables group by name) FOR JSON PATH) as "Value"
However, I am presented with Error 156: Incorrect syntax near the keyword 'FOR'.
Then I tried the following:
SELECT 'FruitsAndVegCount' AS "Type", (SELECT count(id) as Counter, [Name] FROM Fruits group by name
UNION
SELECT count(id) as Counter, [Name] FROM Vegetables group by name FOR JSON PATH) as "Value"
However, I am presented with Error 1086: The FOR XML and FOR JSON clauses are invalid in views, inline functions, derived tables, and subqueries when they contain a set operator.
I'm stuck in trying to get my "union-ized" query to be in Format A.
Update 1: Here is the desired output
|---------------------|------------------------------------------------------------------------------------------------|
| Type | Value |
|---------------------|------------------------------------------------------------------------------------------------|
| FruitAndVegCount | [{"Counter":2, "Name":"Apple"},{"Counter":3, "Name":"Orange"},{"Counter":7, "Name":"Tomato"}] |
|---------------------|------------------------------------------------------------------------------------------------|
The goal is to only have a single row, with 2 columns (Type, Value) where Type is whatever I specify (i.e. FruitAndVegCount) and Value is a JSON of the ResultSet that is created by the union query.

If I understand the question correctly, the following statement is an option:
SELECT
[Type] = 'FruitAndVegCount',
[Value] = (
SELECT Counter, Name
FROM (
SELECT count(id) as Counter, [Name] FROM Fruits group by name
UNION ALL
SELECT count(id) as Counter, [Name] FROM Vegetables group by name
) t
FOR JSON PATH
)

You could do it with two columns, Type and Value, as follows. Something like this
select 'FruitAndVegCount' as [Type],
(select [Counter], [Name]
from (select count(id) as Counter, [Name] from #Fruits group by [name]
union all
select count(id) as Counter, [Name] from #Vegetables group by [name]) u
for json path) [Value];
Output
Type Value
FruitAndVegCount [{"Counter":2,"Name":"apple"},{"Counter":1,"Name":"pear"},{"Counter":2,"Name":"carrot"},{"Counter":1,"Name":"kale"},{"Counter":2,"Name":"lettuce"}]

In PostgreSQL, how can I extract matching items from a list?

I have a query in PostgreSQL that returns results like this, records with a string and a json array:
id | property_list
-----+-------------------------------------------------------------------------------
"i1" | [{"a":{"b":"no"}}, {"a":{"b":"yes"}}, {"a":{"b":"true"}}, {"a":{"b":"false"}}]
"i2" | [{"a":{"b":"yes"}}, {"a":{"b":"no"}}, {"a":{"b":"no"}}]
What I need is something like this:
id | yes_or_true
-----+------------
"i1" | 2
"i2" | 1
I need to count the properties in property_list where a.b equals "yes" or "true".
There are more properties, but there is always an a.b property with a string as its value.
I can solve this using a PL/pgSQL function, but for some reason, I'm in a situation where I can't use a PL/pgSQL function. How can I solve this in the query?

You can do this using jsonb_array_elements and a subquery:
SELECT
id,
(SELECT count(*)
FROM json_array_elements(property_list) el
WHERE el->'a'->>'b' IN ('true','yes')
) AS yes_or_true
FROM the_table

A lateral join to jsonb_array_elements() will solve this:
with indat (id, property_list) as (
values
('i1', '[{"a":{"b":"no"}}, {"a":{"b":"yes"}}, {"a":{"b":"true"}}, {"a":{"b":"false"}}]'::jsonb),
('i2', '[{"a":{"b":"yes"}}, {"a":{"b":"no"}}, {"a":{"b":"no"}}]'::jsonb)
)
select id, count(*) filter (where jdat->'a'->>'b' in ('yes', 'true'))
from indat
cross join lateral jsonb_array_elements(property_list) as j(jdat)
group by id;
id | count
----+-------
i1 | 2
i2 | 1
(2 rows)

PostgreSQL: Efficiently aggregate array columns as part of a group by

We wish to perform a GROUP BY operation on a table. The original table contains an ARRAY column. Within a group, the content of these arrays should be transformed into a single array with unique elements. No ordering of these elements is required. contain
Newest PostgreSQL versions are available.
Example original table:
id | fruit | flavors
---: | :----- | :---------------------
| apple | {sweet,sour,delicious}
| apple | {sweet,tasty}
| banana | {sweet,delicious}
Exampled desired result:
count_total | aggregated_flavors
----------: | :---------------------------
1 | {delicious,sweet}
2 | {sour,tasty,delicious,sweet}
SQL toy code to create the original table:
CREATE TABLE example(id int, fruit text, flavors text ARRAY);
INSERT INTO example (fruit, flavors)
VALUES ('apple', ARRAY [ 'sweet','sour', 'delicious']),
('apple', ARRAY [ 'sweet','tasty' ]),
('banana', ARRAY [ 'sweet', 'delicious']);
We have come up with a solution requiring transforming the array to s
SELECT COUNT(*) AS count_total,
array
(SELECT DISTINCT unnest(string_to_array(replace(replace(string_agg(flavors::text, ','), '{', ''), '}', ''), ','))) AS aggregated_flavors
FROM example
GROUP BY fruit
However we think this is not optimal, and may be problematic as we assume that the string does neither contain "{", "}", nor ",". It feels like there must be functions to combine arrays in the way we need, but we weren't able to find them.
Thanks a lot everyone!

demo:db<>fiddle
Assuming each record contains a unique id value:
SELECT
fruit,
array_agg(DISTINCT flavor), -- 2
COUNT(DISTINCT id) -- 3
FROM
example,
unnest(flavors) AS flavor -- 1
GROUP BY fruit
unnest() array elements
Group by fruit value: array_agg() for distinct flavors
Group by fruit value: COUNT() for distinct ids with each fruit group.
if the id column is really empty, you could generate the id values for example with the row_number() window function:
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*, row_number() OVER () as id
FROM example
) s,
unnest(flavors) AS flavor

How to extract elements of a JSONB array?

Running PostgresSQL v10.5.
In my table table_a that has a column metadata which is of type jsonb.
It has a JSON array as one of it's keys array_key with value something like this:
[{"key1":"value11", "key2":"value21", "key3":"value31"},
{"key1":"value21", "key2":"value22", "key3":"value23"}]
This is how I can query this key
SELECT metadata->>'array_key' from table_a
This gives me the entire array. Is there any way I can query only selected keys and perhaps format them?
The type of the array is text i.e pg_typeof(metadata->>'array_key') is text
An ideal output would be
"value11, value13", "value21, value23"

Use jsonb_array_elements() to get elements of the array (as value) which can be filtered by keys:
select value->>'key1' as key1, value->>'key3' as key3
from table_a
cross join jsonb_array_elements(metadata->'array_key');
key1 | key3
---------+---------
value11 | value31
value21 | value23
(2 rows)
Use an aggregate to get the output as a single value for each row, e.g.:
select string_agg(concat_ws(', ', value->>'key1', value->>'key3'), '; ')
from table_a
cross join jsonb_array_elements(metadata->'array_key')
group by id;
string_agg
------------------------------------
value11, value31; value21, value23
(1 row)
Working example in rextester.

Returning Field names as part of a SQL Query

I need to write a Sql Satement that gets passed any valid SQL subquery, and return the the resultset, WITH HEADERS.
Somehow i need to interrogate the resultset, get the fieldnames and return them as part of a "Union" with the origional data, then pass the result onwards for exporting.
Below my attempt: I have a Sub-Query Callled "A", wich returns a dataset and i need to query it for its fieldnames. ?ordinally maybe?
select A.fields[0].name, A.fields[1].name, A.fields[2].name from
(
Select 'xxx1' as [Complaint Mechanism] , 'xxx2' as [Actual Achievements]
union ALL
Select 'xxx3' as [Complaint Mechanism] , 'xxx4' as [Actual Achievements]
union ALL
Select 'xxx5' as [Complaint Mechanism] , 'xxx6' as [Actual Achievements] ) as A
Any pointers would be appreciated (maybe i am just missing the obvious...)
The Resultset should look like the table below:
F1 F2
--------------------- ---------------------
[Complaint Mechanism] [Actual Achievements]
xxx1 xxx2
xxx3 xxx4
xxx5 xxx6

If you have a static number of columns, you can put your data into a temp table and then query tempdb.sys.columns to get the column names, which you can then union on top of your data. If you will have a dynamic number of columns, you will need to use dynamic SQL to build your pivot statement but I'll leave that up to you to figure out.
The one caveat here is that all data under your column names will need to be converted to strings:
select 1 a, 2 b
into #a;
select [1] as FirstColumn
,[2] as SecondColumn
from (
select column_id
,name
from tempdb.sys.columns
where object_id = object_id('tempdb..#a')
) d
pivot (max(name)
for column_id in([1],[2])
) pvt
union all
select cast(a as nvarchar(100))
,cast(b as nvarchar(100))
from #a;
Query Results:
| FirstColumn | SecondColumn |
|-------------|--------------|
| a | b |
| 1 | 2 |

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Flatten and aggregate two columns of arrays via distinct in Snowflake - snowflake-cloud-data-platform

Here is another variation to handle NULLs in case they appear in data set. SELECT ARRAY_AGG(DISTINCT a.VALUE) unique_things from tab, TABLE (FLATTEN(array_compact(array_append(tab.Animals, tab.Herbs)))) a

Related

Renaming a JSON column for a UNION

In PostgreSQL, how can I extract matching items from a list?

PostgreSQL: Efficiently aggregate array columns as part of a group by

How to extract elements of a JSONB array?

Returning Field names as part of a SQL Query

Categories

Resources