Pairs of duplicate, identical rows. Update just one of them - sql-server

I'm working on a pretty large table in SQL Server. The table has some identical rows, and I need to remove the duplicates. The problem is that I cannot alter this table, i.e. I cannot add an ID column.
What I could do is update one column value in one row of each pair of duplicates, and then delete afterwards using this value.
How can I update only one of these rows? For example: the first/last inserted, the first occurrence, the newest/oldest...
Thanks!
Table structure:
NrValue | Comment | Value1 | Value2 | Value3 |
--------|-----------|-----------|-----------|---------------|
00000 | data0 | zz | top | vivalasvegas|
00100 | NULL | N/A | sex | no |
00100 | NULL | N/A | sex | no |
00200 | NULL | female | sex | yes |
00200 | NULL | female | sex | yes |
00300 | NULL | male | sex | yesplease |
00300 | NULL | male | sex | yesplease |
00400 | data21 | M | -- | na |
00500 | NULL | F | ezig | na |
So I could use the Comment column for the update, but I must not touch anything other than the duplicate rows. I know by NrValue which rows can be updated.
Result would be:
NrValue | Comment | Value1 | Value2 | Value3 |
--------|-----------|-----------|-----------|---------------|
00000 | data0 | zz | top | vivalasvegas|
00100 | 1 | N/A | sex | no |
00100 | 2 | N/A | sex | no |
00200 | 3 | female | sex | yes |
00200 | 4 | female | sex | yes |
00300 | 5 | male | sex | yesplease |
00300 | 6 | male | sex | yesplease |
00400 | data21 | M | -- | na |
00500 | NULL | F | ezig | na |
Lastly, I would delete the rows where NrValue = 00100, 00200 or 00300 AND Comment = 2, 4 or 6.

Use something like
ROW_NUMBER() OVER(PARTITION BY AllRelevantColumns ORDER BY SomeOrderCriteria)
This will generate a 1 for each row, but duplicates get a 2 (or a 3 ...).
You might place this value in a new column or use it for the clean-up directly.
UPDATE: Following your test data:
DECLARE @mockup TABLE(NrValue INT, Comment VARCHAR(100), Value1 VARCHAR(100), Value2 VARCHAR(100), Value3 VARCHAR(100));
INSERT INTO @mockup VALUES
 (00000,'data0','zz','top','vivalasvegas')
,(00100,NULL,'N/A','sex','no')
,(00100,NULL,'N/A','sex','no')
,(00200,NULL,'female','sex','yes')
,(00200,NULL,'female','sex','yes')
,(00300,NULL,'male','sex','yesplease')
,(00300,NULL,'male','sex','yesplease')
,(00400,'data21','M','--','na')
,(00500,NULL,'F','ezig','na');
WITH Numbered AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY NrValue ORDER BY (SELECT NULL)) AS DupNr
          ,*
    FROM @mockup
)
DELETE FROM Numbered
WHERE DupNr > 1;
SELECT * FROM @mockup;
This concept is called an updatable CTE: the DELETE FROM Numbered ... actually affects the underlying table.
If NrValue alone is not enough to identify a row as a duplicate, just add more columns to the PARTITION BY.
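If you do want the intermediate step from the question (first numbering the duplicates in Comment, then deleting by that value), the same updatable-CTE idea works with UPDATE. This is only a sketch against the @mockup table above; it partitions by all non-Comment columns and numbers each group 1, 2, ... rather than producing the global 1..6 from the question, which is enough for the delete:
WITH Numbered AS
(
    SELECT *
          ,ROW_NUMBER() OVER(PARTITION BY NrValue, Value1, Value2, Value3 ORDER BY (SELECT NULL)) AS DupNr
          ,COUNT(*)    OVER(PARTITION BY NrValue, Value1, Value2, Value3) AS Cnt
    FROM @mockup
)
UPDATE Numbered
SET Comment = CAST(DupNr AS VARCHAR(100)) -- mark each copy with its position in the group
WHERE Cnt > 1;                            -- leave the non-duplicated rows untouched

DELETE FROM @mockup
WHERE Comment = '2';                      -- then remove the second copy of each pair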

You don't need the update; you want to delete the duplicates, so why the intermediate step?
Your code should look like this:
declare @t table (col1 int, col2 int);
insert into @t values
(1, 1), (1, 1),
(1, 2), (1, 2), (1, 2), (1, 2),
(3, 2), (3, 2), (3, 2);
with cte as
(
    -- order by 1/0 is a "no particular order" trick; the expression is never evaluated
    select *, row_number() over (partition by col1, col2 order by 1/0) rn
    from @t
)
delete cte
where rn > 1;
select *
from @t;
Sorry for not posting this as a comment (row limit, and code formatting is lost there).

Related

Extract into multiple columns from JSON with PostgreSQL

I have a column item_id that contains data in a JSON-like structure.
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| id    | item_id                                                                                                                               |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| 56711 | {"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}                                          |
| 56712 | {"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}                                  |
| 56721 | {"itemID":["2704\/1#1#1356"]}                                                                                                         |
| 56722 | {"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]} |
| 57638 | {"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}                                                                       |
| 57638 | {"itemID":["109#1#3365","110\/1#1#3365"]}                                                                                             |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
I need the last four digits before every comma (where there is one), i.e. the last four digits of each array element, de-duplicated and separated into individual columns.
The de-duplication should happen across id as well, so only one result row with id 57638 is permitted.
Here is a fiddle with a code draft that is not giving the right answer.
The desired result should look like this:
+----------+-----------+-----------+
| id | item_id_1 | item_id_2 |
+----------+-----------+-----------+
| 56711 | 1974 | |
| 56712 | 4220 | 4221 |
| 56721 | 1356 | |
| 56722 | 3349 | |
| 57638 | 3364 | 3365 |
+----------+-----------+-----------+
There can be quite a lot of item_id_% columns in the results.
with the_table (id, item_id) as (
values
(56711, '{"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}'),
(56712, '{"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}'),
(56721, '{"itemID":["2704\/1#1#1356"]}'),
(56722, '{"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}'),
(57638, '{"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}'),
(57638, '{"itemID":["109#1#3365","110\/1#1#3365"]}')
)
select id
      ,(array_agg(itemid))[1] itemid_1
      ,(array_agg(itemid))[2] itemid_2
from (
    select distinct id
          ,split_part(replace(json_array_elements(item_id::json -> 'itemID')::text, '"', ''), '#', 3)::int itemid
    from the_table
    order by 1, 2
) t
group by id
DEMO
You can unnest the json array, get the last 4 characters of each element as a number, then do conditional aggregation:
select
id,
max(val) filter(where rn = 1) item_id_1,
max(val) filter(where rn = 2) item_id_2
from (
select
id,
right(val, 4)::int val,
dense_rank() over(partition by id order by right(val, 4)::int) rn
from mytable t
cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id
You can add more conditional max()s to the outer query to handle more possible values.
Demo on DB Fiddle:
id | item_id_1 | item_id_2
----: | --------: | --------:
56711 | 1974 | null
56712 | 4220 | 4221
56721 | 1356 | null
56722 | 3349 | null
57638 | 3364 | 3365
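For instance, extended to three columns under the same assumptions as above (a table mytable with a jsonb item_id column), the outer query just repeats the pattern:
select
    id,
    max(val) filter(where rn = 1) item_id_1,
    max(val) filter(where rn = 2) item_id_2,
    max(val) filter(where rn = 3) item_id_3  -- one extra conditional max per extra column
from (
    select
        id,
        right(val, 4)::int val,
        dense_rank() over(partition by id order by right(val, 4)::int) rn
    from mytable t
    cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id;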

SQL Group By a Partition By

This must be accomplished in MS SQL Server. I believe OVER (PARTITION BY) must be used, but all my tries have failed; I end up counting the records for each ID, or something else entirely...
I have this table:
| ID | COLOR |
+------+--------+
| 1 | Red |
| 1 | Green |
| 1 | Blue |
| 2 | Red |
| 2 | Green |
| 2 | Blue |
| 3 | Red |
| 3 | Brown |
| 3 | Orange |
Notice that ID = 1 and ID = 2 have precisely the same values for COLOR; ID = 3, however, only shares the value COLOR = Red.
I would like to group the table as follows:
| COLOR | COUNT | GROUPING |
+--------+-------+----------+
| Red | 2 | Type 1 |
| Green | 2 | Type 1 |
| Blue | 2 | Type 1 |
| Red | 1 | Type 2 |
| Brown | 1 | Type 2 |
| Orange | 1 | Type 2 |
This would mean that ID = 1 and ID = 2 share the same 3 values for color, so they are aggregated together as Type 1. Although ID = 3 shares one color value with ID = 1 and ID = 2 (which is 'Red'), the rest of its values are not shared, so it is considered Type 2 (a different grouping).
The tables used are simple examples but generalize to the entire dataset; each ID can in theory have hundreds of records, with a different color value in each row. They are unique, though: one ID can't have the same color in different rows.
My best attempt:
SELECT
ID,
COLOR,
CONCAT ('TYPE ', COUNT(8) OVER( PARTITION by ID)) AS COLOR_GROUP
FROM
{TABLE};
Result:
| ID | COLOR | GROUPING |
+------+--------+----------+
| 1 | Green | Type 3 |
| 1 | Blue | Type 3 |
| 1 | Red | Type 3 |
| 2 | Green | Type 3 |
| 2 | Blue | Type 3 |
| 2 | Red | Type 3 |
| 3 | Red | Type 3 |
| 3 | Brown | Type 3 |
| 3 | Orange | Type 3 |
The results are terrible; I've tried different methods, and none of them is better.
I hope I was clear enough.
Thank you for the help!
Try the following:
declare @t table (ID int, COLOR varchar(100))
insert into @t select 1, 'Red'
insert into @t select 1, 'Green'
insert into @t select 1, 'Blue'
insert into @t select 2, 'Red'
insert into @t select 2, 'Green'
insert into @t select 2, 'Blue'
insert into @t select 3, 'Red'
insert into @t select 3, 'Brown'
insert into @t select 3, 'Orange'
-- build each ID's full, ordered color list as a single string (its "signature")
select *, STUFF((SELECT CHAR(10) + ' ' + COLOR
                 FROM @t t_in where t_in.ID = t.ID
                 order by COLOR
                 FOR XML PATH ('')), 1, 1, '') COLOR_Combined
into #temp
from @t t
-- the row_number difference gives every run of rows sharing a signature the same [grouping] value,
-- and dense_rank turns those values into TYPE 1, TYPE 2, ...
select COLOR, count(color) [COUNT], 'TYPE ' + convert(varchar(10), dense_rank() OVER (order by [grouping])) [GROUPING]
from
(
    select id, COLOR, COLOR_Combined, (row_number() over (order by id) - row_number() over (partition by Color_Combined order by id)) [grouping]
    from #temp
) t
group by COLOR, [grouping]
drop table if exists #temp
Please find the db<>fiddle here.
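On SQL Server 2017 or later (an assumption; the answer above deliberately uses the older STUFF ... FOR XML PATH pattern), the signature can be built more directly with STRING_AGG. A sketch against the same @t table variable:
select COLOR, count(*) [COUNT],
       'TYPE ' + convert(varchar(10), dense_rank() over (order by COLOR_Combined)) [GROUPING]
from
(
    -- attach each ID's alphabetically ordered color list as its signature
    select t.ID, t.COLOR, s.COLOR_Combined
    from @t t
    cross apply (select string_agg(COLOR, ',') within group (order by COLOR) COLOR_Combined
                 from @t t_in
                 where t_in.ID = t.ID) s
) x
group by COLOR, COLOR_Combined
Here the TYPE numbers follow the alphabetical order of the signatures rather than the order of the IDs, which still yields Type 1 for IDs 1 and 2 and Type 2 for ID 3 in this example.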

T-SQL Limited Cross Join

I want to join 2 tables such that I get the NAR for every combination of Type and BillingID where it exists.
Where a BillingID doesn't have a certain Type, either NULL or 0 should be returned for the NAR, along with the Type and BillingID.
Is something like this even possible in SQL?
A simplified version of my data is shown below:
Type list:
+----------+
| Type |
+----------+
| NEW |
| CHNG |
| LAP |
+----------+
Data:
+----------+-----------+-----+
| Type | BillingID | NAR |
+----------+-----------+-----+
| NEW | ABC | 5 |
| CHNG | ABC | 15 |
| LAP | ABC | 10 |
| CHNG | DEF | 20 |
+----------+-----------+-----+
Desired result:
+----------+-----------+-----+
| Type | BillingID | NAR |
+----------+-----------+-----+
| NEW | ABC | 5 |
| CHNG | ABC | 15 |
| LAP | ABC | 10 |
| CHNG | DEF | 20 |
| NEW | DEF | 0 |
| LAP | DEF | 0 |
+----------+-----------+-----+
The last two rows are the ones causing me problems.
I think you can do it like this:
declare @table table (type1 varchar(5))
insert into @table
values
('new'),
('chng'),
('lap')
declare @table2 table (typeid varchar(5), billingid varchar(5), nar int)
insert into @table2
values
('NEW',  'ABC', 5),
('CHNG', 'ABC', 15),
('LAP',  'ABC', 10),
('CHNG', 'DEF', 20)
-- build every (type, billingid) combination, then look up the NAR where it exists;
-- note: matching 'new' to 'NEW' relies on a case-insensitive collation (the SQL Server default)
select Z.*, case when c.nar is null then 0 else c.nar end as nar
from (
    select * from @table a
    outer apply (select distinct billingid from @table2 b) p
) Z
left join @table2 c on Z.type1 = c.typeid and Z.billingid = c.billingid
order by billingid
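The derived table plus OUTER APPLY is really just building a Cartesian product, so the same result can also be written with an explicit CROSS JOIN; a sketch against the same table variables:
select t.type1, b.billingid, coalesce(c.nar, 0) as nar
from @table t
cross join (select distinct billingid from @table2) b  -- every (type, billingid) combination
left join @table2 c on c.typeid = t.type1 and c.billingid = b.billingid
order by b.billingid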

Update table combining rows based on the same column value

I have a table called table such that:
| id | name | city |
|----|-------|---------|
| 0 | Rose | Madrid |
| 1 | Alex | Lima |
| 2 | Rose | Sidney |
| 3 | Mario | Glasgow |
And I need to UPDATE the table so that rows sharing the same name are combined into a new row and then deleted.
| id | name | city |
|----|-------|----------------|
| 1 | Alex | Lima |
| 3 | Mario | Glasgow |
| 4 | Rose | Madrid, Sidney |
I don't care if it has to be done in several SQL statements.
So far, all I've done is list the rows that are affected:
SELECT *
FROM table
WHERE name IN (
SELECT name
FROM table
GROUP BY name
HAVING COUNT(*) > 1
);
Assuming that id is an auto-increment primary key, you need an INSERT and a DELETE statement (the syntax here — group_concat, instr, || — is SQLite):
insert into tablename(name, city)
select name, group_concat(city, ',')
from tablename
group by name
having count(*) > 1;
delete from tablename
where instr(city, ',') = 0
and exists (
    select 1 from tablename t
    where t.id <> tablename.id and t.name = tablename.name
    and ',' || t.city || ',' like '%,' || tablename.city || ',%'
);
See the demo.
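If the combined row is always the newest one, a simpler delete also works. This is a sketch under the assumption that id is auto-increment, so the freshly inserted combined row has the highest id for its name:
delete from tablename
where id not in (
    select max(id)   -- keep only the newest row per name
    from tablename
    group by name
);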
Results:
| id | name | city |
| --- | ----- | ------------- |
| 1 | Alex | Lima |
| 3 | Mario | Glasgow |
| 4 | Rose | Madrid,Sidney |

How do you create a query which returns dynamic column names in Postgresql?

I have two tables in a reporting database, one for orders, and one for order items. Each order can have multiple order items, along with a quantity for each:
Orders
+----------+---------+
| order_id | email   |
+----------+---------+
| 1        | 1@1.com |
| 2        | 2@2.com |
| 3        | 3@3.com |
+----------+---------+
Order Items
+---------------+----------+----------+--------------+
| order_item_id | order_id | quantity | product_name |
+---------------+----------+----------+--------------+
| 1             | 1        | 1        | Tee Shirt    |
| 2             | 1        | 3        | Jeans        |
| 3             | 1        | 1        | Hat          |
| 4             | 2        | 2        | Tee Shirt    |
| 5             | 3        | 3        | Tee Shirt    |
| 6             | 3        | 1        | Jeans        |
+---------------+----------+----------+--------------+
For reporting purposes, I'd love to denormalise this data into a separate PostgreSQL view (or just run a query) that turns the data above into something like this:
+----------+---------+-----------+-------+-----+
| order_id | email   | Tee Shirt | Jeans | Hat |
+----------+---------+-----------+-------+-----+
| 1        | 1@1.com | 1         | 3     | 1   |
| 2        | 2@2.com | 2         | 0     | 0   |
| 3        | 3@3.com | 3         | 1     | 0   |
+----------+---------+-----------+-------+-----+
i.e., it's a sum of the quantity of each item within the order, with the product names as the column titles. Do I need to use something like crosstab to do this, or is there a clever way using subqueries, even if I don't know the list of distinct product names before the query runs?
This is one possible answer:
create table orders
(
orders_id int PRIMARY KEY,
email text NOT NULL
);
create table orders_items
(
order_item_id int PRIMARY KEY,
orders_id int REFERENCES orders(orders_id) NOT NULL,
quantity int NOT NULL,
product_name text NOT NULL
);
insert into orders VALUES (1, '1@1.com');
insert into orders VALUES (2, '2@2.com');
insert into orders VALUES (3, '3@3.com');
insert into orders_items VALUES (1,1,1,'T-Shirt');
insert into orders_items VALUES (2,1,3,'Jeans');
insert into orders_items VALUES (3,1,1,'Hat');
insert into orders_items VALUES (4,2,2,'T-Shirt');
insert into orders_items VALUES (5,3,3,'T-Shirt');
insert into orders_items VALUES (6,3,1,'Jeans');
select
orders.orders_id,
email,
COALESCE(tshirt.quantity, 0) as "T-Shirts",
COALESCE(jeans.quantity,0) as "Jeans",
COALESCE(hat.quantity, 0) as "Hats"
from
orders
left join (select orders_id, quantity from orders_items where product_name = 'T-Shirt')
as tshirt ON (tshirt.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Jeans')
as jeans ON (jeans.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Hat')
as hat ON (hat.orders_id = orders.orders_id)
;
Tested with postgresql. Result:
orders_id | email   | T-Shirts | Jeans | Hats
-----------+---------+----------+-------+------
1 | 1@1.com | 1 | 3 | 1
2 | 2@2.com | 2 | 0 | 0
3 | 3@3.com | 3 | 1 | 0
(3 rows)
Based on your comment, you can try to use tablefunc like this:
CREATE EXTENSION tablefunc;
SELECT * FROM crosstab
(
    'SELECT orders_id, product_name, quantity FROM orders_items ORDER BY 1',
    'SELECT DISTINCT product_name FROM orders_items ORDER BY 1'
)
AS
(
    orders_id int,
    Hat int,
    Jeans int,
    TShirt int
);
Note that the column definition list has to match the actual types (int here, not text), and that the value columns must appear in the same order the category query returns them (alphabetical: Hat, Jeans, T-Shirt).
But I think you are approaching SQL the wrong way. You usually know which columns you want and have to tell SQL so; "rotating" a table by 90 degrees is not part of SQL and should be avoided.
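That said, if the pivot is only needed for ad-hoc inspection, the psql command-line client (since PostgreSQL 9.6) can rotate a result set client-side without declaring any columns, via its \crosstabview meta-command; terminate the query with the meta-command instead of a semicolon:
SELECT orders_id, product_name, quantity
FROM orders_items
ORDER BY 1
\crosstabview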
