This must be accomplished in MS SQL Server. I believe OVER( PARTITION BY) must be used, but I've failed at all my tries and I end up counting the records to each ID or something else...
I have this table:
| ID | COLOR |
+------+--------+
| 1 | Red |
| 1 | Green |
| 1 | Blue |
| 2 | Red |
| 2 | Green |
| 2 | Blue |
| 3 | Red |
| 3 | Brown |
| 3 | Orange |
Notice that ID = 1 and ID = 2 have precisely the same values for COLOR, however ID = 3 only shares the value COLOR = Red.
I would like to group the table as follows:
| COLOR | COUNT | GROUPING |
+--------+-------+----------+
| Red | 2 | Type 1 |
| Green | 2 | Type 1 |
| Blue | 2 | Type 1 |
| Red | 1 | Type 2 |
| Brown | 1 | Type 2 |
| Orange | 1 | Type 2 |
This would mean that ID = 1 and ID = 2 share the same 3 values for color and they are aggregated together as Type 1. Although ID = 3 shares one value for color to ID = 1 and ID = 2 (which is 'Red') the rest of the values are not shared, as such it is considered of Type 2 (different grouping).
The tables used are simple examples and are enough to replicate to the entire dateset, however each ID can have in theory hundreds of records with different values for colors in each row. However they are unique, one ID can't have the the same color in different rows.
My best attempt:
SELECT
ID,
COLOR,
CONCAT ('TYPE ', COUNT(8) OVER( PARTITION by ID)) AS COLOR_GROUP
FROM
{TABLE};
Result:
| ID | COLOR | GROUPING |
+------+--------+----------+
| 1 | Green | Type 3 |
| 1 | Blue | Type 3 |
| 1 | Red | Type 3 |
| 2 | Green | Type 3 |
| 2 | Blue | Type 3 |
| 2 | Red | Type 3 |
| 3 | Red | Type 3 |
| 3 | Brown | Type 3 |
| 3 | Orange | Type 3 |
Although the results are terrible I've tried different methods, none of them is better.
Hope I was clear enough.
Thank you for the help!
try the following:
declare #t table ( ID int,COLOR varchar(100))
insert into #t select 1 ,'Red'
insert into #t select 1 ,'Green'
insert into #t select 1 ,'Blue'
insert into #t select 2 ,'Red'
insert into #t select 2 ,'Green'
insert into #t select 2 ,'Blue'
insert into #t select 3 ,'Red'
insert into #t select 3 ,'Brown'
insert into #t select 3 ,'Orange'
select *, STUFF((SELECT CHAR(10) + ' '+COLOR
FROM #t t_in where t_in.ID=t.ID
order by COLOR
FOR XML PATH ('')) , 1, 1, '') COLOR_Combined
into #temp
from #t t
select COLOR, count(color) [COUNT], 'TYPE ' + convert(varchar(10), dense_rank() OVER (order by [grouping])) [GROUPING]
from
(
select id, COLOR, COLOR_Combined, (row_number() over (order by id) - row_number() over (partition by Color_Combined order by id)) [grouping]
from #temp
)t
group by COLOR, [grouping]
drop table if exists #temp
Please find the db<>fiddle here.
Related
I have two tables in a reporting database, one for orders, and one for order items. Each order can have multiple order items, along with a quantity for each:
Orders
+----------+---------+
| order_id | email |
+----------+---------+
| 1 | 1#1.com |
+----------+---------+
| 2 | 2#2.com |
+----------+---------+
| 3 | 3#3.com |
+----------+---------+
Order Items
+---------------+----------+----------+--------------+
| order_item_id | order_id | quantity | product_name |
+---------------+----------+----------+--------------+
| 1 | 1 | 1 | Tee Shirt |
+---------------+----------+----------+--------------+
| 2 | 1 | 3 | Jeans |
+---------------+----------+----------+--------------+
| 3 | 1 | 1 | Hat |
+---------------+----------+----------+--------------+
| 4 | 2 | 2 | Tee Shirt |
+---------------+----------+----------+--------------+
| 5 | 3 | 3 | Tee Shirt |
+---------------+----------+----------+--------------+
| 6 | 3 | 1 | Jeans |
+---------------+----------+----------+--------------+
For reporting purposes, I'd love to denormalise this data into a separate PostgreSQL view (or just run a query) that turns the data above into something like this:
+----------+---------+-----------+-------+-----+
| order_id | email | Tee Shirt | Jeans | Hat |
+----------+---------+-----------+-------+-----+
| 1 | 1#1.com | 1 | 3 | 1 |
+----------+---------+-----------+-------+-----+
| 2 | 2#2.com | 2 | 0 | 0 |
+----------+---------+-----------+-------+-----+
| 3 | 3#3.com | 3 | 1 | 0 |
+----------+---------+-----------+-------+-----+
ie, it's a sum of the quantity of each item within the order with the product name; and the product names set as the column titles. Do I need to use something like crosstab to do this, or is there a clever way using subqueries even if I don't know the list of distinct product names at before the query runs.
This is one possible answer:
create table orders
(
orders_id int PRIMARY KEY,
email text NOT NULL
);
create table orders_items
(
order_item_id int PRIMARY KEY,
orders_id int REFERENCES orders(orders_id) NOT NULL,
quantity int NOT NULL,
product_name text NOT NULL
);
insert into orders VALUES (1, '1#1.com');
insert into orders VALUES (2, '2#2.com');
insert into orders VALUES (3, '3#3.com');
insert into orders_items VALUES (1,1,1,'T-Shirt');
insert into orders_items VALUES (2,1,3,'Jeans');
insert into orders_items VALUES (3,1,1,'Hat');
insert into orders_items VALUES (4,2,2,'T-Shirt');
insert into orders_items VALUES (5,3,3,'T-Shirt');
insert into orders_items VALUES (6,3,1,'Jeans');
select
orders.orders_id,
email,
COALESCE(tshirt.quantity, 0) as "T-Shirts",
COALESCE(jeans.quantity,0) as "Jeans",
COALESCE(hat.quantity, 0) as "Hats"
from
orders
left join (select orders_id, quantity from orders_items where product_name = 'T-Shirt')
as tshirt ON (tshirt.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Jeans')
as jeans ON (jeans.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Hat')
as hat ON (hat.orders_id = orders.orders_id)
;
Tested with postgresql. Result:
orders_id | email | T-Shirts | Jeans | Hats
-----------+---------+----------+-------+------
1 | 1#1.com | 1 | 3 | 1
2 | 2#2.com | 2 | 0 | 0
3 | 3#3.com | 3 | 1 | 0
(3 rows)
Based on your comment, you can try to use tablefunc like this:
CREATE EXTENSION tablefunc;
SELECT * FROM crosstab
(
'SELECT orders_id, product_name, quantity FROM orders_items ORDER BY 1',
'SELECT DISTINCT product_name FROM orders_items ORDER BY 1'
)
AS
(
orders_id text,
TShirt text,
Jeans text,
Hat text
);
But I think you are thinking the wrong way about SQL. You usually know which rows you want and have to tell it SQL. "Rotating tables" 90 degrees is not part of SQL and should be avoided.
Working on pretty large table in SQL-Server. Table has some identical rows. I need to remove duplicate rows. Problem is I cannot alter this table i.e. to create an ID column.
I could update one column value of the other row on pairs of duplicates. Then delete afterwards using this value.
How to update only one these rows?
For example: Firstly / lastly inserted, First occurrence, newest / oldest..
Thanks!
table Structure
NrValue | Comment | Value1 | Value2 | Value3 |
--------|-----------|-----------|-----------|---------------|
00000 | data0 | zz | top | vivalasvegas|
00100 | NULL | N/A | sex | no |
00100 | NULL | N/A | sex | no |
00200 | NULL | female | sex | yes |
00200 | NULL | female | sex | yes |
00300 | NULL | male | sex | yesplease |
00300 | NULL | male | sex | yesplease |
00400 | data21 | M | -- | na |
00500 | NULL | F | ezig | na |
So, I could use 'Comment' -column to update but I cannot touch other than duplicate rows. I know by NrValue which rows can be updated.
Result would be:
NrValue | Comment | Value1 | Value2 | Value3 |
--------|-----------|-----------|-----------|---------------|
00000 | data0 | zz | top | vivalasvegas|
00100 | 1 | N/A | sex | no |
00100 | 2 | N/A | sex | no |
00200 | 3 | female | sex | yes |
00200 | 4 | female | sex | yes |
00300 | 5 | male | sex | yesplease |
00300 | 6 | male | sex | yesplease |
00400 | data21 | M | -- | na |
00500 | NULL | F | ezig | na |
Lastly I delete rows where NrValue = 00100, 00200 or 00300 AND Comment = 2, 4 or 6.
Use something like
ROW_NUMBER() OVER(PARTITION BY AllRelevantColumns ORDER BY SomeOrderCriteria)
This will generate a 1 for all rows, but duplicates get a 2 (or a 3 ...)
You might place this value in a new column or use this for cleaning...
UPDATE Following your test data...
DECLARE #mockup TABLE(NrValue INT,Comment VARCHAR(100),Value1 VARCHAR(100),Value2 VARCHAR(100),Value3 VARCHAR(100));
INSERT INTO #mockup VALUES
(00000,'data0','zz','top','vivalasvegas')
,(00100,'NULL','N/A','sex','no')
,(00100,'NULL','N/A','sex','no')
,(00200,'NULL','female','sex','yes')
,(00200,'NULL','female','sex','yes')
,(00300,'NULL','male','sex','yesplease')
,(00300,'NULL','male','sex','yesplease')
,(00400,'data21','M','--','na')
,(00500,'NULL','F','ezig','na');
WITH Numbered AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY NrValue ORDER BY (SELECT NULL)) AS DupNr
,*
FROM #mockup
)
DELETE FROM Numbered
WHERE DupNr>1;
SELECT * FROM #mockup;
This concept is called updateable CTE. The DELETE FROM Numbered ... will affect the underlying table actually...
If the NrValue is not enough to detect a row as duplicate, just add more columns to the PARTITION BY
You don't need update, you want to delete duplicates so why do you want that intermediate step?
Yor code should look like this:
declare #t table (col1 int, col2 int);
insert into #t values
(1, 1), (1, 1),
(1, 2), (1, 2),(1, 2), (1, 2),
(3, 2), (3, 2),(3, 2);
with cte as
(
select *, row_number() over (partition by col1, col2 order by 1/0) rn
from #t
)
delete cte
where rn > 1;
select *
from #t;
Sorry for not posting it in comment (rows limit and code formatting lost there)
I have 5 columns in SQL that I need to turn into a cross tab in Crystal.
This is what I have:
Key | RELATIONSHIP | DISABLED | LIMITED | RURAL | IMMIGRANT
-----------------------------------------------------------------
1 | Other Dependent | Yes | No | No | No
2 | Victim/Survivor | No | No | No | No
3 | Victim/Survivor | Yes | No | No | No
4 | Child | No | No | No | No
5 | Victim/Survivor | No | No | No | No
6 | Victim/Survivor | No | No | No | No
7 | Child | No | No | No | No
8 | Victim/Survivor | No | Yes | Yes | Yes
9 | Child | No | Yes | Yes | Yes
10 | Child | No | Yes | Yes | Yes
This is what I want the cross tab to look like (Distinct count on Key):
| Victim/Survivor | Child | Other Dependent | Total |
--------------------------------------------------------------
| DISABLED | 1 | 0 | 1 | 2 |
--------------------------------------------------------------
| LIMITED | 1 | 2 | 0 | 3 |
--------------------------------------------------------------
| RURAL | 1 | 2 | 0 | 3 |
--------------------------------------------------------------
| IMMIGRANT | 1 | 2 | 0 | 3 |
--------------------------------------------------------------
| TOTAL | 4 | 6 | 1 | 11 |
--------------------------------------------------------------
I used this formula in Crystal in an effort to combine 4 columns (Field name = {#OTHERDEMO})...
IF {TABLE.DISABLED} = "YES" THEN "DISABLED" ELSE
IF {TABLE.LIMITED} = "YES" THEN "LIMITED" ELSE
IF {TABLE.IMMIGRANT} = "YES" THEN "IMMIGRANT" ELSE
IF {TABLE.RURAL} = "YES" THEN "RURAL"
...then made the cross-tab with #OTHERDEMO as the rows, RELATIONSHIP as the Columns with a distinct count on KEY:
Problem is, once crystal hits the first "Yes" it stops counting thus not categorizing correctly in the cross-tab. So I get a table that counts the DISABILITY first and gives the correct display, then counts the Limited and gives some info, but then dumps everything else.
In the past, I have done mutiple conditional formulas...
IF {TABLE.DISABLED} = "YES" AND {TABLE.RELATIONSHIP} = "Victim/Survivor" THEN {TABLE.KEY} ELSE {#NULL}
(the #null formula is because Crystal, notoriously, gets confused with nulls.)
...then did a distinct count on Key, and finally summed it in the footer.
I am convinced there is another way to do this. Any help/ideas would be greatly appreciated.
If you unpivot those "DEMO" columns into rows it will make the crosstab super easy...
select
u.[Key],
u.[RELATIONSHIP],
u.[DEMO]
from
Table1
unpivot (
[b] for [DEMO] in ([DISABLED], [LIMITED], [RURAL], [IMMIGRANT])
) u
where
u.[b] = 'Yes'
SqlFiddle
or if you were stuck on SQL2000 compatibility level you could manually unpivot the Yes values...
select [Key], [REALTIONSHIP], [DEMO] = cast('DISABLED' as varchar(20))
from Table1
where [DISABLED] = 'Yes'
union
select [Key], [REALTIONSHIP], [DEMO] = cast('LIMITED' as varchar(20))
from Table1
where [LIMITED] = 'Yes'
union
select [Key], [REALTIONSHIP], [DEMO] = cast('RURAL' as varchar(20))
from Table1
where [RURAL] = 'Yes'
union
select [Key], [REALTIONSHIP], [DEMO] = cast('IMMIGRANT' as varchar(20))
from Table1
where [IMMIGRANT] = 'Yes'
For the crosstab, use a count on the Key column (aka row count), [DEMO] on rows, and [RELATIONSHIP] on columns.
This is a bit of a tricky question/situation and my search fu failed me.
Lets say i have the following data
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 3 | 1 | foo | 2/8/2016 |
| 4 | 1 | foo | 2/11/2016 |
| 5 | 2 | bar | 1/11/2016 |
| 6 | 2 | bar | 2/11/2016 |
| 7 | 3 | baz | 2/1/2016 |
| 8 | 3 | baz | 2/3/2016 |
| 9 | 3 | baz | 2/11/2016 |
And I would like to ommit a variable number of leading rows (most recent date in this case) and lets say that number is 2 in this example. The resulting table would be something like this:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 7 | 3 | baz | 2/1/2016 |
Is this possible in SQL? Essentially I want to filter on an unknown number of rows which uses the date column as the order by. The goal is to get the oldest types and get a list of UID's in the process.
Sure, it's possible. Use a ROW_NUMBER function to assign a value to each row, partitioning by the SharedID column so that the count restarts every time that ID changes, and select those rows with a value greater than your limit.
WITH cteNumberedRows AS (
SELECT UID, SharedID, Type, Date,
ROW_NUMBER() OVER(PARTITION BY SharedID ORDER BY Date DESC) AS RowNum
FROM YourTable
)
SELECT UID, SharedID, Type, Date
FROM cteNumberedRows
WHERE RowNum > 2;
Not sure if I understand what you mean but something like this?
SELECT * FROM MyTable t1 JOIN MyTable T2 ON t2.id NOT IN (
SELECT TOP 2 UID FROM myTable
WHERE SharedID = t1.sharedID
ORDER BY [Date] DESC
)
can i SELECT distinct 2 fields (provfrom, provto) on table AS one column
with condition :
- values of 2 fields is never same in one row
- values in field provfrom can be inside field provto but in different row
- values in field provto can be inside field provfrom but in different row
example :
i have 2 column as below
-------------------------
| provfrom | provto |
-------------------------
| 2 | 4 |
| 3 | 7 |
| 3 | 7 |
| 5 | 2 |
| 5 | 2 |
| 7 | 2 |
| 7 | 2 |
| 1 | 5 |
| 2 | 5 |
| 2 | 8 |
| 5 | 8 |
-------------------------
the result that i want by disticnt is as below
-------------
| prov |
-------------
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 7 |
| 8 |
-------------
Can i do this in sql server?
i try to found out by explore google, but not found it
Thanks
You can use Union keyword which will give distinct elements from Both Tables
select provfrom from mytable
union
select provTo from mytable
You can either do this with a union or by using apply, the apply has less IO so I would go with the apply query.
create table #temp
(
provfrom tinyint,
provto tinyint
);
insert into #temp (provfrom, provto)
values (2,4),(3,7),(3,7),
(5,2),(5,2),(7,2),
(7,2),(1,5),(2,5),
(2,8),(5,8);
set statistics io on;
select distinct
a.provfromto
from #temp as t
cross apply (values (t.provfrom),(t.provto)) as a(provfromto);
select provfrom from #temp
union
select provTo from #temp
set statistics io off;
drop table #temp;
Try this:
select t.prov
from
(select provfrom as prov
from yourtable
union
select provto
from yourtable) as t
order by t.prov
UNION function apply a distinct clause, so you'll get all value per one occurence.
The external query about ordering your result set