How to get counts of CSV values in a SQL Server column

I have a table where some of the values are comma-separated (CSV) strings.
I'd like a count of how many times each individual entry appears across those strings.
However, my query compares the whole strings instead of the substrings within them.
Sample Data
| Category | Items |
|----------|---------------------------------|
| Basket 1 | Apples, Bananas, Oranges, Plums |
| Basket 2 | Oranges |
| Basket 3 | Oranges, Plums |
| Basket 4 | Apples, Bananas, Oranges, Plums |
Sample Select
select distinct
    [key] = 'Items',
    [value] = [items],
    [count] = count([items])
from someTable
group by [items]
Current Output
| key | value | count |
|----------|---------------------------------|-------|
| Items | Apples, Bananas, Oranges, Plums | 2 |
| Items | Oranges | 1 |
| Items | Oranges, Plums | 1 |
Expected Output
| key | value | count |
|-------|---------|-------|
| Items | Apples | 2 |
| Items | Bananas | 2 |
| Items | Oranges | 4 |
| Items | Plums | 3 |
How can I get the count for each CSV entry in a column?

You want to use the STRING_SPLIT table-valued function to turn the comma-separated values into rows and then count them. The spaces have to be removed first because STRING_SPLIT only accepts a single-character separator, so it can't split on ', '.
create table data
(
    Category varchar(25)
    , Items varchar(100)
)

insert into data
values
      ('Basket 1', 'Apples, Bananas, Oranges, Plums')
    , ('Basket 2', 'Oranges')
    , ('Basket 3', 'Oranges, Plums')
    , ('Basket 4', 'Apples, Bananas, Oranges, Plums')

select
      'Items' as [key]
    , value
    , count(*) as [count]
from data
cross apply string_split(replace(Items, ' ', ''), ',')
group by value
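If you're on SQL Server 2017 or later, a small variation avoids rewriting the whole string: split on the bare comma and TRIM each value instead (TRIM, added in SQL Server 2017, is the only assumption beyond the answer above):

select
      'Items' as [key]
    , trim(value) as [value]
    , count(*) as [count]
from data
cross apply string_split(Items, ',')   -- split on the comma only
group by trim(value)                   -- strip the stray space after each comma

This also keeps multi-word items intact; REPLACE(Items, ' ', '') would squash a legitimate space inside a value like 'Red Apples'.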

Related

SQL - split data into two columns based on identifier

I have a table that contains the product categories/headers and product names but all in one column. I need to split it out into separate Category and Product columns. I also have a helper column Header_Flag which can be used to determine if the row contains the header or product name. The input table looks like this:
|---------------------|------------------|
| Product | Header_Flag |
|---------------------|------------------|
| Furniture | Y |
| Bed | N |
| Table | N |
| Chair | N |
| Cosmetics | Y |
| Lip balm | N |
| Lip stick | N |
| Eye liner | N |
| Apparel | Y |
| Shirt | N |
| Trouser | N |
|---------------------|------------------|
The output format I'm looking for would be like this:
|---------------------|------------------|
| Category | Product |
|---------------------|------------------|
| Furniture | Bed |
| Furniture | Table |
| Furniture | Chair |
| Cosmetics | Lip balm |
| Cosmetics | Lip stick |
| Cosmetics | Eye liner |
| Apparel | Shirt |
| Apparel | Trouser |
|---------------------|------------------|
With the data as it stands, you cannot get the results you are after. To achieve this, you need to be able to order your data using an ORDER BY clause, and ordering on either of these columns does not reproduce the sample data's order:
CREATE TABLE dbo.YourTable (Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable
ORDER BY Product;
GO
SELECT *
FROM dbo.YourTable
ORDER BY HeaderFlag
GO
DROP TABLE dbo.YourTable;
As you can see, both orderings differ from the original insert order.
If you add a column you can order on though (I'm going to use an IDENTITY) then you can achieve this:
CREATE TABLE dbo.YourTable (I int IDENTITY, Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable YT
ORDER BY I;
Then you can use a cumulative COUNT to put the values into groups and pick up the header: every 'Y' row increments the running count, so each product row carries the group number of the most recent header above it:
WITH Grps AS(
SELECT YT.I,
YT.Product,
YT.HeaderFlag,
COUNT(CASE YT.HeaderFlag WHEN 'Y' THEN 1 END) OVER (ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM dbo.YourTable YT),
Split AS(
SELECT G.I,
MAX(CASE G.HeaderFlag WHEN 'Y' THEN Product END) OVER (PARTITION BY G.Grp) AS Category,
G.Product,
G.HeaderFlag
FROM Grps G)
SELECT S.Category,
S.Product
FROM Split S
WHERE HeaderFlag = 'N'
ORDER BY S.I;
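As an aside, on SQL Server 2022 and later the two CTEs can be collapsed into a single pass with LAST_VALUE ... IGNORE NULLS. This is only a sketch on the same table and I identity column; IGNORE NULLS support is the 2022+ assumption:

SELECT S.Category,
       S.Product
FROM (SELECT YT.I,
             -- carry the most recent header row's Product down to every row below it
             LAST_VALUE(CASE YT.HeaderFlag WHEN 'Y' THEN YT.Product END) IGNORE NULLS
                 OVER (ORDER BY YT.I ROWS UNBOUNDED PRECEDING) AS Category,
             YT.Product,
             YT.HeaderFlag
      FROM dbo.YourTable YT) S
WHERE S.HeaderFlag = 'N'
ORDER BY S.I;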

SNOWFLAKE - Query that gets a column sum and aggregates array column

I have a table like:
|------|-------|----------------------|
| id | qty | collection |
|------|-------|----------------------|
| foo | 2 | ['foo', 'bar'] |
|------|-------|----------------------|
| foo | 4 | ['baz', 'qux'] |
|------|-------|----------------------|
| bar | 8 | ['beep', 'boop'] |
|------|-------|----------------------|
and I want an output like:
|------|-------|------------------------------------|
| id | qty | collection |
|------|-------|------------------------------------|
| foo | 6 | ['foo', 'bar', 'baz', 'qux'] |
|------|-------|------------------------------------|
| bar | 8 | ['beep', 'boop'] |
|------|-------|------------------------------------|
My first attempt was to do something like
SELECT
id, SUM(qty), ARRAY_AGG(collection)
GROUP BY id
which gives me the correct qty sum, but the array agg produces a multidimensional array.
Doing something like a lateral flatten gives me the correct output array, but the sum is off because the flatten creates extra rows that each carry qty.
Here is one way to do what you ask. The iff(_collection.index = 0, qty, 0) counts qty only on the first element of each flattened array, so the extra rows FLATTEN produces don't inflate the sum:
with tbl as (
    select $1 id, $2 qty, parse_json($3) collection
    from values
        ('foo', 2, '["foo", "bar"]'),
        ('foo', 4, '["baz", "qux"]'),
        ('bar', 8, '["beep", "boop"]')
)
select
    id,
    sum(iff(_collection.index = 0, qty, 0)) as qty,
    array_agg(_collection.value) as collection
from tbl, lateral flatten(collection) _collection
group by id
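An alternative sketch (same sample data; the sums CTE and alias names are mine, not from the answer above) is to pre-aggregate the quantities before flattening, so there is no row multiplication to correct for:

with tbl as (
    select $1 id, $2 qty, parse_json($3) collection
    from values
        ('foo', 2, '["foo", "bar"]'),
        ('foo', 4, '["baz", "qux"]'),
        ('bar', 8, '["beep", "boop"]')
), sums as (
    -- qty is summed once per id, before any flattening happens
    select id, sum(qty) as qty
    from tbl
    group by id
)
select t.id,
       max(s.qty)         as qty,        -- one value per id; max() just satisfies the group by
       array_agg(f.value) as collection
from tbl t
join sums s on s.id = t.id,
     lateral flatten(input => t.collection) f
group by t.id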

Extract into multiple columns from JSON with PostgreSQL

I have a column item_id that contains data in JSON (like?) structure.
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| id | item_id |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| 56711 | {itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}" |
| 56712 | {itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}" |
| 56721 | {itemID":["2704\/1#1#1356"]}" |
| 56722 | {itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}" |
| 57638 | {itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}" |
| 57638 | {itemID":["109#1#3365","110\/1#1#3365"]}" |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
I need the last four digits of every element (the four digits before each comma and at the end of the array), deduplicated and separated into individual columns.
The distinct should apply across id as well, so only one result row with id 57638 is permitted.
Here is a fiddle with a code draft that is not giving the right answer.
The desired result should look like this:
+----------+-----------+-----------+
| id | item_id_1 | item_id_2 |
+----------+-----------+-----------+
| 56711 | 1974 | |
| 56712 | 4220 | 4221 |
| 56721 | 1356 | |
| 56722 | 3349 | |
| 57638 | 3364 | 3365 |
+----------+-----------+-----------+
There can be quite a lot of item_id_% columns in the results.
with the_table (id, item_id) as (
values
(56711, '{"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}'),
(56712, '{"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}'),
(56721, '{"itemID":["2704\/1#1#1356"]}'),
(56722, '{"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}'),
(57638, '{"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}'),
(57638, '{"itemID":["109#1#3365","110\/1#1#3365"]}')
)
select id
     , (array_agg(itemid))[1] as itemid_1
     , (array_agg(itemid))[2] as itemid_2
from (
    select distinct id
         , split_part(replace(json_array_elements(item_id::json -> 'itemID')::text, '"', ''), '#', 3)::int as itemid
    from the_table
    order by 1, 2
) t
group by id
You can unnest the json array, get the last 4 characters of each element as a number, then do conditional aggregation:
select
id,
max(val) filter(where rn = 1) item_id_1,
max(val) filter(where rn = 2) item_id_2
from (
select
id,
right(val, 4)::int val,
dense_rank() over(partition by id order by right(val, 4)::int) rn
from mytable t
cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id
You can add more conditional max()s to the outer query to handle more possible values.
Result:
id | item_id_1 | item_id_2
----: | --------: | --------:
56711 | 1974 | null
56712 | 4220 | 4221
56721 | 1356 | null
56722 | 3349 | null
57638 | 3364 | 3365

SQL Group By a Partition By

This must be accomplished in MS SQL Server. I believe OVER (PARTITION BY) must be used, but all my tries have failed and I end up counting the records for each ID, or something else entirely...
I have this table:
| ID | COLOR |
+------+--------+
| 1 | Red |
| 1 | Green |
| 1 | Blue |
| 2 | Red |
| 2 | Green |
| 2 | Blue |
| 3 | Red |
| 3 | Brown |
| 3 | Orange |
Notice that ID = 1 and ID = 2 have precisely the same values for COLOR, however ID = 3 only shares the value COLOR = Red.
I would like to group the table as follows:
| COLOR | COUNT | GROUPING |
+--------+-------+----------+
| Red | 2 | Type 1 |
| Green | 2 | Type 1 |
| Blue | 2 | Type 1 |
| Red | 1 | Type 2 |
| Brown | 1 | Type 2 |
| Orange | 1 | Type 2 |
This would mean that ID = 1 and ID = 2 share the same 3 values for color, so they are aggregated together as Type 1. Although ID = 3 shares one color value with ID = 1 and ID = 2 ('Red'), the rest of its values are not shared, so it is considered Type 2 (a different grouping).
The tables used are simple examples but generalise to the entire dataset; each ID can in theory have hundreds of records, with a different color in each row. The colors are unique per ID, though: one ID can't have the same color in different rows.
My best attempt:
SELECT
ID,
COLOR,
CONCAT ('TYPE ', COUNT(8) OVER( PARTITION by ID)) AS COLOR_GROUP
FROM
{TABLE};
Result:
| ID | COLOR | GROUPING |
+------+--------+----------+
| 1 | Green | Type 3 |
| 1 | Blue | Type 3 |
| 1 | Red | Type 3 |
| 2 | Green | Type 3 |
| 2 | Blue | Type 3 |
| 2 | Red | Type 3 |
| 3 | Red | Type 3 |
| 3 | Brown | Type 3 |
| 3 | Orange | Type 3 |
Although these results are clearly wrong, I've tried different methods and none of them is any better.
Hope I was clear enough.
Thank you for the help!
Try the following: build each ID's complete color list as a single string, then number the distinct lists:
declare @t table (ID int, COLOR varchar(100))
insert into @t select 1, 'Red'
insert into @t select 1, 'Green'
insert into @t select 1, 'Blue'
insert into @t select 2, 'Red'
insert into @t select 2, 'Green'
insert into @t select 2, 'Blue'
insert into @t select 3, 'Red'
insert into @t select 3, 'Brown'
insert into @t select 3, 'Orange'

select *, STUFF((SELECT CHAR(10) + ' ' + COLOR
                 FROM @t t_in
                 WHERE t_in.ID = t.ID
                 ORDER BY COLOR
                 FOR XML PATH ('')), 1, 1, '') COLOR_Combined
into #temp
from @t t

select COLOR, count(COLOR) [COUNT],
       'TYPE ' + convert(varchar(10), dense_rank() over (order by [grouping])) [GROUPING]
from
(
    select id, COLOR, COLOR_Combined,
           row_number() over (order by id)
               - row_number() over (partition by COLOR_Combined order by id) [grouping]
    from #temp
) t
group by COLOR, [grouping]

drop table if exists #temp
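On SQL Server 2017 or later, STRING_AGG shortens the signature-building step. A sketch against the same @t data (the signatures/ranked CTE names are illustrative, not from the answer above):

with signatures as (
    -- one row per ID with its full, alphabetically ordered color list
    select ID,
           string_agg(COLOR, ',') within group (order by COLOR) as Color_Combined
    from @t
    group by ID
), ranked as (
    -- identical color lists share a dense_rank, which becomes the type number
    select ID, dense_rank() over (order by Color_Combined) as GrpNo
    from signatures
)
select t.COLOR,
       count(*) as [COUNT],
       'Type ' + convert(varchar(10), r.GrpNo) as [GROUPING]
from @t t
join ranked r on r.ID = t.ID
group by t.COLOR, r.GrpNo
order by r.GrpNo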

How do you create a query which returns dynamic column names in Postgresql?

I have two tables in a reporting database, one for orders, and one for order items. Each order can have multiple order items, along with a quantity for each:
Orders
+----------+---------+
| order_id | email |
+----------+---------+
| 1 | 1@1.com |
+----------+---------+
| 2 | 2@2.com |
+----------+---------+
| 3 | 3@3.com |
+----------+---------+
Order Items
+---------------+----------+----------+--------------+
| order_item_id | order_id | quantity | product_name |
+---------------+----------+----------+--------------+
| 1 | 1 | 1 | Tee Shirt |
+---------------+----------+----------+--------------+
| 2 | 1 | 3 | Jeans |
+---------------+----------+----------+--------------+
| 3 | 1 | 1 | Hat |
+---------------+----------+----------+--------------+
| 4 | 2 | 2 | Tee Shirt |
+---------------+----------+----------+--------------+
| 5 | 3 | 3 | Tee Shirt |
+---------------+----------+----------+--------------+
| 6 | 3 | 1 | Jeans |
+---------------+----------+----------+--------------+
For reporting purposes, I'd love to denormalise this data into a separate PostgreSQL view (or just run a query) that turns the data above into something like this:
+----------+---------+-----------+-------+-----+
| order_id | email | Tee Shirt | Jeans | Hat |
+----------+---------+-----------+-------+-----+
| 1 | 1#1.com | 1 | 3 | 1 |
+----------+---------+-----------+-------+-----+
| 2 | 2#2.com | 2 | 0 | 0 |
+----------+---------+-----------+-------+-----+
| 3 | 3#3.com | 3 | 1 | 0 |
+----------+---------+-----------+-------+-----+
ie, it's a sum of the quantity of each item within the order, with the product names set as the column titles. Do I need to use something like crosstab to do this, or is there a clever way using subqueries, even if I don't know the list of distinct product names before the query runs?
This is one possible answer:
create table orders
(
orders_id int PRIMARY KEY,
email text NOT NULL
);
create table orders_items
(
order_item_id int PRIMARY KEY,
orders_id int REFERENCES orders(orders_id) NOT NULL,
quantity int NOT NULL,
product_name text NOT NULL
);
insert into orders VALUES (1, '1@1.com');
insert into orders VALUES (2, '2@2.com');
insert into orders VALUES (3, '3@3.com');
insert into orders_items VALUES (1,1,1,'T-Shirt');
insert into orders_items VALUES (2,1,3,'Jeans');
insert into orders_items VALUES (3,1,1,'Hat');
insert into orders_items VALUES (4,2,2,'T-Shirt');
insert into orders_items VALUES (5,3,3,'T-Shirt');
insert into orders_items VALUES (6,3,1,'Jeans');
select
orders.orders_id,
email,
COALESCE(tshirt.quantity, 0) as "T-Shirts",
COALESCE(jeans.quantity,0) as "Jeans",
COALESCE(hat.quantity, 0) as "Hats"
from
orders
left join (select orders_id, quantity from orders_items where product_name = 'T-Shirt')
as tshirt ON (tshirt.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Jeans')
as jeans ON (jeans.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Hat')
as hat ON (hat.orders_id = orders.orders_id)
;
Tested with postgresql. Result:
orders_id | email | T-Shirts | Jeans | Hats
-----------+---------+----------+-------+------
1 | 1@1.com | 1 | 3 | 1
2 | 2@2.com | 2 | 0 | 0
3 | 3@3.com | 3 | 1 | 0
(3 rows)
Based on your comment, you can try to use tablefunc like this:
CREATE EXTENSION tablefunc;
SELECT * FROM crosstab
(
    'SELECT orders_id, product_name, quantity FROM orders_items ORDER BY 1',
    'SELECT DISTINCT product_name FROM orders_items ORDER BY 1'
)
AS
(
    orders_id int,  -- the declared types must match the source query (orders_id and quantity are int)
    Hat int,        -- columns after the row id must follow the category query's order:
    Jeans int,      -- 'Hat', 'Jeans', 'T-Shirt'
    TShirt int
);
But I think you are approaching this the wrong way around for SQL: you usually know which columns you want and tell SQL up front. Rotating a table 90 degrees is not part of SQL and should generally be avoided.
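If you genuinely cannot know the product names up front, the usual workaround is to generate the crosstab statement dynamically and execute it. A hedged sketch (the order_pivot view name and the DO block are illustrative, not from the answer above); note the column set is still frozen at the moment the statement is built, which is the point above:

DO $$
DECLARE
    col_list text;
BEGIN
    -- build a column list like: "Hat" int, "Jeans" int, "T-Shirt" int
    SELECT string_agg(format('%I int', product_name), ', ' ORDER BY product_name)
    INTO col_list
    FROM (SELECT DISTINCT product_name FROM orders_items) p;

    EXECUTE format(
        'CREATE OR REPLACE VIEW order_pivot AS
         SELECT * FROM crosstab(
             ''SELECT orders_id, product_name, quantity FROM orders_items ORDER BY 1'',
             ''SELECT DISTINCT product_name FROM orders_items ORDER BY 1''
         ) AS ct(orders_id int, %s)', col_list);
END $$;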
