SNOWFLAKE - Query that gets a column sum and aggregates array column - arrays

I have a table like:
|------|-------|----------------------|
| id | qty | collection |
|------|-------|----------------------|
| foo | 2 | ['foo', 'bar'] |
|------|-------|----------------------|
| foo | 4 | ['baz', 'qux'] |
|------|-------|----------------------|
| bar | 8 | ['beep', 'boop'] |
|------|-------|----------------------|
and I want an output like:
|------|-------|------------------------------------|
| id | qty | collection |
|------|-------|------------------------------------|
| foo | 6 | ['foo', 'bar', 'baz', 'qux'] |
|------|-------|------------------------------------|
| bar | 8 | ['beep', 'boop'] |
|------|-------|------------------------------------|
My first attempt was to do something like
SELECT
id, SUM(qty), ARRAY_AGG(collection)
GROUP BY id
which gives me the correct qty sum, but the ARRAY_AGG result is a multidimensional (nested) array.
Using a lateral flatten gives me the correct output array, but the sum is off because flattening the array creates extra rows, each carrying the same qty.

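Here is one way to do what you ask. Flatten the array, but add qty only for the first element of each array (index = 0), so the extra rows do not inflate the sum:
with tbl as (
select $1 id, $2 qty, parse_json($3) collection
from values ('foo',2,'["foo", "bar"]'), ('foo',4,'["baz", "qux"]'), ('bar',8,'["beep", "boop"]')
)
select
id,
-- count qty only once per source row (on the first flattened element)
sum(iff(_collection.index = 0, qty, 0)) as qty,
array_agg(_collection.value) as collection
from tbl, lateral flatten(collection) _collection
group by id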
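To see why the index = 0 condition keeps the sum correct, here is a minimal sketch in plain Python (not Snowflake) that models the flattened rows. The function name and data layout are illustrative assumptions, not part of the answer above. Note that in this model, as with a default flatten, a row whose collection is empty produces no flattened rows, so its qty would not be counted.

```python
# Rows of (id, qty, collection), mirroring the sample table in the question.
rows = [
    ("foo", 2, ["foo", "bar"]),
    ("foo", 4, ["baz", "qux"]),
    ("bar", 8, ["beep", "boop"]),
]


def sum_and_merge(rows):
    """Model of 'flatten, then group by id' (hypothetical helper)."""
    result = {}
    for row_id, qty, collection in rows:
        # Each array element becomes its own "flattened" row carrying qty.
        for index, value in enumerate(collection):
            total, merged = result.get(row_id, (0, []))
            # Add qty only for the first element of each source row.
            if index == 0:
                total += qty
            result[row_id] = (total, merged + [value])
    return result


result = sum_and_merge(rows)
```

Here result maps each id to a pair of the summed qty and the merged list of elements.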

Related

SQL - split data into two columns based on identifier

I have a table that contains the product categories/headers and product names but all in one column. I need to split it out into separate Category and Product columns. I also have a helper column Header_Flag which can be used to determine if the row contains the header or product name. The input table looks like this:
|---------------------|------------------|
| Product | Header_Flag |
|---------------------|------------------|
| Furniture | Y |
| Bed | N |
| Table | N |
| Chair | N |
| Cosmetics | Y |
| Lip balm | N |
| Lip stick | N |
| Eye liner | N |
| Apparel | Y |
| Shirt | N |
| Trouser | N |
|---------------------|------------------|
The output format I'm looking for would be like this:
|---------------------|------------------|
| Category | Product |
|---------------------|------------------|
| Furniture | Bed |
| Furniture | Table |
| Furniture | Chair |
| Cosmetics | Lip balm |
| Cosmetics | Lip stick |
| Cosmetics | Eye liner |
| Apparel | Shirt |
| Apparel | Trouser |
|---------------------|------------------|
With the data as it stands, you cannot get the results you are after. To achieve this, you need to be able to order your data with an ORDER BY clause, and ordering on either of these columns does not reproduce the order of the sample data:
CREATE TABLE dbo.YourTable (Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable
ORDER BY Product;
GO
SELECT *
FROM dbo.YourTable
ORDER BY HeaderFlag
GO
DROP TABLE dbo.YourTable;
As you can see, neither ordering matches the order of the sample data.
If you add a column you can order on, though (I'm going to use an IDENTITY), then you can achieve this:
CREATE TABLE dbo.YourTable (I int IDENTITY, Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable YT
ORDER BY I;
Then you can use a cumulative COUNT to put the values into groups and get the header:
WITH Grps AS(
SELECT YT.I,
YT.Product,
YT.HeaderFlag,
COUNT(CASE YT.HeaderFlag WHEN 'Y' THEN 1 END) OVER (ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM dbo.YourTable YT),
Split AS(
SELECT G.I,
MAX(CASE G.HeaderFlag WHEN 'Y' THEN Product END) OVER (PARTITION BY G.Grp) AS Category,
G.Product,
G.HeaderFlag
FROM Grps G)
SELECT S.Category,
S.Product
FROM Split S
WHERE HeaderFlag = 'N'
ORDER BY S.I;
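The grouping idea behind the cumulative COUNT can be sketched in plain Python. This is only an illustrative model, assuming the list order plays the role of the IDENTITY column I; the function name is hypothetical.

```python
# (Product, Header_Flag) pairs in their stored order; list position stands in
# for the IDENTITY column used in the query above.
rows = [
    ("Furniture", "Y"), ("Bed", "N"), ("Table", "N"), ("Chair", "N"),
    ("Cosmetics", "Y"), ("Lip balm", "N"), ("Lip stick", "N"), ("Eye liner", "N"),
    ("Apparel", "Y"), ("Shirt", "N"), ("Trouser", "N"),
]


def split_categories(rows):
    """Pair each product with the most recent header row (hypothetical helper)."""
    grp = 0        # running count of header rows seen so far
    headers = {}   # group number -> header (category) name
    pairs = []
    for product, header_flag in rows:
        if header_flag == "Y":
            grp += 1
            headers[grp] = product
        else:
            # A product belongs to the group opened by the latest header.
            pairs.append((headers.get(grp), product))
    return pairs


pairs = split_categories(rows)
```

Each header row increments the group number, so every following product row falls into that header's group until the next header appears.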

Extract into multiple columns from JSON with PostgreSQL

I have a column item_id that contains data in JSON (like?) structure.
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| id | item_id |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| 56711 | {itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}" |
| 56712 | {itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}" |
| 56721 | {itemID":["2704\/1#1#1356"]}" |
| 56722 | {itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}" |
| 57638 | {itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}" |
| 57638 | {itemID":["109#1#3365","110\/1#1#3365"]}" |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
I need the last four digits of every array element (the four digits before each comma, if there is one, plus the last four digits of the final element), de-duplicated and split into individual columns.
The de-duplication should happen across id as well, so only one result row with id 57638 is permitted.
Here is a fiddle with a code draft that is not giving the right answer.
The desired result should look like this:
+----------+-----------+-----------+
| id | item_id_1 | item_id_2 |
+----------+-----------+-----------+
| 56711 | 1974 | |
| 56712 | 4220 | 4221 |
| 56721 | 1356 | |
| 56722 | 3349 | |
| 57638 | 3364 | 3365 |
+----------+-----------+-----------+
There can be quite a lot of 'item_id_%' columns in the results.
with the_table (id, item_id) as (
values
(56711, '{"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}'),
(56712, '{"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}'),
(56721, '{"itemID":["2704\/1#1#1356"]}'),
(56722, '{"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}'),
(57638, '{"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}'),
(57638, '{"itemID":["109#1#3365","110\/1#1#3365"]}')
)
select id
,(array_agg(itemid)) [1] itemid_1
,(array_agg(itemid)) [2] itemid_2
from (
select distinct id
,split_part(replace(json_array_elements(item_id::json -> 'itemID')::text, '"', ''), '#', 3)::int itemid
from the_table
order by 1
,2
) t
group by id
DEMO
You can unnest the json array, get the last 4 characters of each element as a number, then do conditional aggregation:
select
id,
max(val) filter(where rn = 1) item_id_1,
max(val) filter(where rn = 2) item_id_2
from (
select
id,
right(val, 4)::int val,
dense_rank() over(partition by id order by right(val, 4)::int) rn
from mytable t
cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id
You can add more conditional max()s to the outer query to handle more possible values.
Demo on DB Fiddle:
id | item_id_1 | item_id_2
----: | --------: | --------:
56711 | 1974 | null
56712 | 4220 | 4221
56721 | 1356 | null
56722 | 3349 | null
57638 | 3364 | 3365
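As a rough cross-check of the dense_rank approach, here is a minimal plain-Python sketch of the same steps: unnest, take the last four characters as a number, de-duplicate per id, and sort. The sample rows are a shortened subset of the data above, and the function name is an illustrative assumption.

```python
import json

# Shortened subset of the sample data; raw strings keep the JSON "\/" escapes.
rows = [
    (56712, r'{"itemID":["0138528#2#4221","0138529#2#4221","0118623\/2#2#4220"]}'),
    (56721, r'{"itemID":["2704\/1#1#1356"]}'),
    (57638, r'{"itemID":["0161\/1#2#3364","0162\/1#2#3364"]}'),
    (57638, r'{"itemID":["109#1#3365","110\/1#1#3365"]}'),
]


def pivot_suffixes(rows):
    """Collect the distinct 4-digit suffixes per id, sorted (hypothetical helper)."""
    per_id = {}
    for row_id, item_json in rows:
        for element in json.loads(item_json)["itemID"]:
            # The last four characters correspond to right(val, 4)::int.
            per_id.setdefault(row_id, set()).add(int(element[-4:]))
    # Position k in each sorted list corresponds to dense rank k + 1,
    # that is, to column item_id_<k + 1>.
    return {row_id: sorted(values) for row_id, values in per_id.items()}


suffixes = pivot_suffixes(rows)
```

An id with only one distinct suffix simply has a one-element list, which corresponds to a null item_id_2 in the SQL result.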

How to get counts of CSV values in SQL Server column

I have a table that contains CSV strings for some of the values.
I'd like to get a count of each time an entry in the CSV exists.
However, the count is comparing strings instead of substrings.
Sample Data
| Category | Items |
|----------|---------------------------------|
| Basket 1 | Apples, Bananas, Oranges, Plums |
| Basket 2 | Oranges |
| Basket 3 | Oranges, Plums |
| Basket 4 | Apples, Bananas, Oranges, Plums |
Sample Select
select distinct
[key] = 'Items',
[value] = [items],
[count] = count([items])
from someTable
group by [items]
Current Output
| key | value | count |
|----------|---------------------------------|-------|
| Items | Apples, Bananas, Oranges, Plums | 2 |
| Items | Oranges | 1 |
| Items | Oranges, Plums | 1 |
Expected Output
| key | value | count |
|-------|---------|-------|
| Items | Apples | 2 |
| Items | Bananas | 2 |
| Items | Oranges | 4 |
| Items | Plums | 3 |
How can I get the count for each CSV entry in a column?
You can use the STRING_SPLIT table-valued function to turn the comma-separated values into rows and then count them. The spaces have to be removed first because STRING_SPLIT accepts only a single-character separator, so it cannot split on ', ' directly.
create table data
(
Category varchar(25)
, Items varchar(100)
)
insert into data
values
('Basket 1' ,'Apples, Bananas, Oranges, Plums')
, ('Basket 2', 'Oranges')
, ('Basket 3', 'Oranges, Plums')
, ('Basket 4', 'Apples, Bananas, Oranges, Plums')
select
'Items' as [key]
, value
, count(*) as [count]
from data
cross apply string_split(replace(Items, ' ', ''), ',')
group by value
Here is the demo.
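The split-then-count logic can be modelled in a few lines of plain Python. This is a sketch with illustrative names, not SQL Server code. One deliberate difference from the query above: it trims whitespace around each item instead of removing every space, so an item that contains a space would stay intact.

```python
from collections import Counter

# Category -> CSV string, mirroring the sample data in the question.
baskets = {
    "Basket 1": "Apples, Bananas, Oranges, Plums",
    "Basket 2": "Oranges",
    "Basket 3": "Oranges, Plums",
    "Basket 4": "Apples, Bananas, Oranges, Plums",
}


def count_items(csv_values):
    """Count how often each CSV entry occurs across all rows (hypothetical helper)."""
    counts = Counter()
    for csv_value in csv_values:
        # Split on the comma, then trim the surrounding whitespace of each part.
        parts = (part.strip() for part in csv_value.split(","))
        counts.update(part for part in parts if part)
    return counts


counts = count_items(baskets.values())
```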

How to Group by and concatenate arrays in PostgreSQL

I have a table in PostgreSQL. I want to concatenate all the arrays (i.e. col) after grouping them by time. The arrays are of varying lengths.
| time | col |
|------ |------------------ |
| 1 | {1,2} |
| 1 | {3,4,5,6} |
| 2 | {} |
| 2 | {7} |
| 2 | {8,9,10} |
| 3 | {11,12,13,14,15} |
The result should be as follows:
| time | col |
|------ |------------------ |
| 1 | {1,2,3,4,5,6} |
| 2 | {7,8,9,10} |
| 3 | {11,12,13,14,15} |
What I have come up with so far is as follows:
SELECT ARRAY(SELECT elem FROM tab, unnest(col) elem);
But this does not do the grouping. It just takes the entire table and concatenates it.
To keep the result a one-dimensional array, you can't use array_agg() directly on col, so first unnest the arrays and apply distinct to remove duplicates (1). Then aggregate in the outer query. To preserve the ordering of the values, include an order by inside the aggregate function:
select time, array_agg(col order by col) as col
from (
select distinct time, unnest(col) as col
from yourtable
) t
group by time
order by time
(1) If you don't need duplicate removal just remove distinct word.
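The unnest-then-aggregate pattern can be sketched in plain Python as follows. This is an illustrative model with hypothetical names. Like the query, it sorts the values within each group and optionally removes duplicates, and a time whose arrays are all empty produces no output entry.

```python
# (time, col) rows mirroring the sample table in the question.
rows = [
    (1, [1, 2]),
    (1, [3, 4, 5, 6]),
    (2, []),
    (2, [7]),
    (2, [8, 9, 10]),
    (3, [11, 12, 13, 14, 15]),
]


def concat_by_time(rows, distinct=True):
    """Unnest every array, then re-aggregate per time (hypothetical helper)."""
    grouped = {}
    for time, col in rows:
        # "Unnest": every element becomes its own (time, value) row.
        for value in col:
            grouped.setdefault(time, []).append(value)
    return {
        time: sorted(set(values)) if distinct else sorted(values)
        for time, values in grouped.items()
    }


concatenated = concat_by_time(rows)
```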
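You can also unnest with a lateral join and re-aggregate per time (the cross join drops empty arrays, so they add no NULL elements):
SELECT
my_table.time,
array_agg(_unnested.item) as array_coll
from my_table
cross join LATERAL (SELECT unnest(my_table.array_coll) as item) _unnested
group by my_table.time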

Selecting grouped rows after first two rows SQL Server

This is a bit of a tricky question/situation, and my search-fu failed me.
Let's say I have the following data:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 3 | 1 | foo | 2/8/2016 |
| 4 | 1 | foo | 2/11/2016 |
| 5 | 2 | bar | 1/11/2016 |
| 6 | 2 | bar | 2/11/2016 |
| 7 | 3 | baz | 2/1/2016 |
| 8 | 3 | baz | 2/3/2016 |
| 9 | 3 | baz | 2/11/2016 |
I would like to omit a variable number of leading rows per SharedID (the most recent dates in this case); let's say that number is 2 in this example. The resulting table would be something like this:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 7 | 3 | baz | 2/1/2016 |
Is this possible in SQL? Essentially I want to filter out an unknown number of rows, using the date column for the ordering. The goal is to get the oldest rows of each type and get a list of UIDs in the process.
Sure, it's possible. Use a ROW_NUMBER function to assign a value to each row, partitioning by the SharedID column so that the count restarts every time that ID changes, and select those rows with a value greater than your limit.
WITH cteNumberedRows AS (
SELECT UID, SharedID, Type, Date,
ROW_NUMBER() OVER(PARTITION BY SharedID ORDER BY Date DESC) AS RowNum
FROM YourTable
)
SELECT UID, SharedID, Type, Date
FROM cteNumberedRows
WHERE RowNum > 2;
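The ROW_NUMBER logic can be checked against the sample data with a small plain-Python sketch. The function name and tuple layout are illustrative assumptions. Rows are grouped by SharedID, sorted newest first, and the first n rows of each group are skipped.

```python
from datetime import date

# (UID, SharedID, Type, Date) rows mirroring the sample table (month/day/2016).
rows = [
    (1, 1, "foo", date(2016, 2, 4)),
    (2, 1, "foo", date(2016, 2, 5)),
    (3, 1, "foo", date(2016, 2, 8)),
    (4, 1, "foo", date(2016, 2, 11)),
    (5, 2, "bar", date(2016, 1, 11)),
    (6, 2, "bar", date(2016, 2, 11)),
    (7, 3, "baz", date(2016, 2, 1)),
    (8, 3, "baz", date(2016, 2, 3)),
    (9, 3, "baz", date(2016, 2, 11)),
]


def skip_most_recent(rows, n):
    """Drop the n most recent rows of each SharedID (hypothetical helper)."""
    by_shared_id = {}
    for row in rows:
        by_shared_id.setdefault(row[1], []).append(row)
    kept = []
    for group in by_shared_id.values():
        # Newest first, like ORDER BY Date DESC inside each partition.
        newest_first = sorted(group, key=lambda r: r[3], reverse=True)
        # Rows at positions n and beyond are those with RowNum > n.
        kept.extend(newest_first[n:])
    return sorted(kept)  # order the final result by UID


kept_uids = [row[0] for row in skip_most_recent(rows, 2)]
```

Because n is an ordinary parameter here, the same idea carries over to the SQL by replacing the literal 2 with a variable.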
Not sure if I understand what you mean, but something like this?
SELECT t1.* FROM MyTable t1
WHERE t1.UID NOT IN (
SELECT TOP 2 t2.UID FROM MyTable t2
WHERE t2.SharedID = t1.SharedID
ORDER BY t2.[Date] DESC
)
