Related
This is the Scenario : I have a duplicate rows in my table with the same Id , Name and so on .
1) I have to find the duplicate row matching all the criteria ( this is done)
2) Delete them only if the criteria match
3) Use the id of the deleted record and update the existing row in the table
For this i have created a 2 temp table. Temp1 is the table with all the record. Temp2 consist of duplicated row.
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
IF OBJECT_ID('tempdb..#Temp2') IS NOT NULL
DROP TABLE #Temp2
IF OBJECT_ID('tempdb..#Temp3') IS NOT NULL
DROP TABLE #Temp3
CREATE Table #Temp1 (
Id int,
Name NVARCHAR(64),
StudentNo INT NULL,
ClassCode NVARCHAR(8) NULL,
Section NVARCHAR(8) NULL,
)
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(1,'Joe',123,'A1', 'I')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(1,'Joe',123,'A1', 'I')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(2,'Harry',113,'X2', 'H')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(2,'Harry',113,'X2', 'H')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(3,'Elle',121,'J1', 'E1')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(3,'Elle',121,'J1', 'E')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(8,'Jane',191,'A1', 'E')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(5,'Silva',811,'S1', 'SE')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(6,'Juan',411,'S2', 'SE')
INSERT INTO #Temp1 (Id, Name,StudentNo,ClassCode,Section) Values(7,'Carla',431,'S2', 'SE')
;WITH CTE AS (
select
ROW_NUMBER() over (partition by Id
, StudentNo
order by Id, StudentNo)as Duplicate_RowNumber
, * from #Temp1 )
select t1.Id,t1.Name,t1.StudentNo,t1.Section,t1.ClassCode
INTO #Temp2
from CTE as c INNER JOIN #Temp1 as t1 ON t1.Id = c.Id
and t1.StudentNo = t1.StudentNo
and c.Duplicate_RowNumber >1
-- this will have 6 rows all the duplicates are included
--select * from #Temp2
-- this is for output clause
DECLARE #inserted Table (Id int,
Name NVARCHAR(64),
StudentNo INT NULL,
ClassCode NVARCHAR(8) NULL,
Section NVARCHAR(8) NULL)
DELETE FROM #temp1
OUTPUT deleted.Id , deleted.Name ,deleted.StudentNo ,deleted.ClassCode ,deleted.Section into #inserted
WHERE EXISTS ( SELECT * FROM #Temp2 as t2
where #temp1.Id = t2.Id
and #temp1.Name = t2.Name
and #temp1.StudentNo = t2.StudentNo
and #temp1.ClassCode = t2.ClassCode
and #temp1.Section = t2.Section)
-- this is to check what is delete so that i can join it and update the table temp1
select * from #inserted
You can see below the query should not delete the last two highlighted column because the Section does not match. It should only delete matching criteria from Temp1 and Temp2.
Scenario 2 : Delete the duplicate record in Temp1 and use the key in order to update the data to NULL for Section and Classcode . This is what i expect with the highlighted to be NULLs .
You can run this query yourself - just copy and paste.
Yes, for scenario #1 it is going to delete the rows because the problem is in this section.
I added this table for references.
Added this #temp2 table to clarify for later use.
CREATE Table #Temp2 (
Id int,
Name Varchar(64),
StudentNo INT NULL,
ClassCode Varchar(8) NULL,
Section Varchar(8) NULL,
)
IF OBJECT_ID('tempdb..#tmp4') IS NOT NULL
DROP TABLE #tmp4
select t1.Id,t1.Name,t1.StudentNo,t1.Section,t1.ClassCode,
Duplicate_RowNumber
INTO #Duplicatedata
from CTE as c INNER JOIN #Temp1 as t1 ON t1.Id = c.Id
and t1.StudentNo = t1.StudentNo
and c.Duplicate_RowNumber >1
select * from #Duplicatedata
This is going to satisfy both condition as #temp 1 will have both rows for Elle as your join condition is only on ID and Student No.
I added row number column for clarity.
Id Name StudentNo Section ClassCode Duplicate_RowNumber
1 Joe 123 I A1 2
1 Joe 123 I A1 2
2 Harry 113 H X2 2
2 Harry 113 H X2 2
3 Elle 121 E1 J1 2
3 Elle 121 E J1 2
As your Partition is based by Student No and ID, every duplicate row will have 2 or more row numbers.
You can use this approach to delete.
select
ROW_NUMBER() over (partition by Id
, StudentNo
order by Id, StudentNo, section)as Duplicate_RowNumber
, * into #tmp4 from #Temp1
--You can add section in your order as well for consistency purpose.
delete
from #tmp4
output deleted.id, deleted.Name, deleted.StudentNo, deleted.ClassCode,
deleted.Section into #Temp2
where Duplicate_RowNumber > 1
After that it seems like you want to keep one row in your final table and put the other one in you deleted table. For Elle it will delete one of the rows from Final table and keep only one since your partition is not based on section.
To make sure that you delete 1 row from your final table you can use this.
DELETE t
OUTPUT deleted.Id , deleted.Name ,deleted.StudentNo ,deleted.ClassCode
,deleted.Section into #inserted FROM
(select *, row_number() over (Partition by tm.name, tm.studentNo Order by ID,
StudentNo, section ) rownum from #temp1 tm) t
join #Temp2 t2 on t.Id = t2.Id
and t.Name = t2.Name
and t.StudentNo = t2.StudentNo
and t.ClassCode = t2.ClassCode
and t.Section = t2.Section
where t.rownum > 1
If you notice I added this row number, so that it will not two delete the rows from final table, since Joe and Harry has all the matching attributes, and it will delete two rows.
select * from #inserted
Output you get:
Id Name StudentNo ClassCode Section
3 Elle 121 J1 E1
2 Harry 113 X2 H
1 Joe 123 A1 I
Finally you can update final table in this way. #Scenario 2
update TMP
SET ClassCode = NULL, SECTION = NULL
FROM
#Temp1 TMP
JOIN #INSERTED I ON TMP.Id = I.Id
AND TMP.StudentNo = I.StudentNo
SELECT * FROM #Temp1
Final Output:
Id Name StudentNo ClassCode Section
1 Joe 123 NULL NULL
2 Harry 113 NULL NULL
3 Elle 121 NULL NULL
8 Jane 191 A1 E
5 Silva 811 S1 SE
6 Juan 411 S2 SE
7 Carla 431 S2 SE
Please note that I have added scripts and output only for the parts where it required change, and rest part is same script provided by you.
I was not able to quote correct tile for the question.
Below is my table.
Expected Output : With ID and SequenceNo column. Was unable to upload image.
ID Act SequenceNo1
1 1 1
2 2 NULL
3 3 NULL
4 4 NULL
5 1 5
6 2 NULL
7 3 NULL
8 4 NULL
9 5 NULL
10 6 NULL
11 5 11
12 6 NULL
13 1 13
14 2 NULL
15 3 NULL
16 4 NULL
17 5 NULL
18 6 NULL
19 1 19
20 2 NULL
21 3 NULL
22 4 NULL
23 5 NULL
24 6 NULL
At the start last column SequenceNo is NULL.`
My requirement is to update value of ID column to SequenceNo column whenever new series of Act column is started.
Act column has value of 1 to 6. There can be case where any number from 1 to 6 is missing from Act.
Example1 : ID 1 to 4 - Act is correct but, in next row (ID=5) Act is restarted. Hence need to update SequenceNo column.
Example2 : ID 5 to 10 are correct. But next row (ID=11;Act=5) has new sequence hence need to update SequenceNo column.
CREATE TABLE #tmp
(
ID int
, ScheduleID varchar(50)
,DCNumber VARCHAR(50)
, BuildingID varchar(10)
, StoreNumber int
, [DayOfWeek] int
, [Tm] varchar(10)
,[Act] int
, SequenceNo int
)
INSERT INTO #tmp SELECT 1,'WAS',9003,900301,254,1,'00:00',1,NULL
INSERT INTO #tmp SELECT 2,'WAS',9003,900301,254,1,'00:00',2,NULL
INSERT INTO #tmp SELECT 3,'WAS',9003,900301,254,1,'00:00',3,NULL
INSERT INTO #tmp SELECT 4,'WAS',9003,900301,254,1,'00:00',4,NULL
INSERT INTO #tmp SELECT 5,'WAS',9003,900301,254,2,'00:00',1,NULL
INSERT INTO #tmp SELECT 6,'WAS',9003,900301,254,2,'00:00',2,NULL
INSERT INTO #tmp SELECT 7,'WAS',9003,900301,254,2,'00:00',3,NULL
INSERT INTO #tmp SELECT 8,'WAS',9003,900301,254,2,'00:00',4,NULL
INSERT INTO #tmp SELECT 9,'WAS',9003,900301,254,2,'00:00',5,NULL
INSERT INTO #tmp SELECT 10,'WAS',9003,900301,254,2,'00:00',6,NULL
INSERT INTO #tmp SELECT 11,'WAS',9003,900301,254,3,'00:00',5,NULL
INSERT INTO #tmp SELECT 12,'WAS',9003,900301,254,3,'00:00',6,NULL
INSERT INTO #tmp SELECT 13,'WAS',9003,900301,254,4,'00:00',1,NULL
INSERT INTO #tmp SELECT 14,'WAS',9003,900301,254,4,'00:00',2,NULL
INSERT INTO #tmp SELECT 15,'WAS',9003,900301,254,4,'00:00',3,NULL
INSERT INTO #tmp SELECT 16,'WAS',9003,900301,254,4,'00:00',4,NULL
INSERT INTO #tmp SELECT 17,'WAS',9003,900301,254,5,'00:00',5,NULL
INSERT INTO #tmp SELECT 18,'WAS',9003,900301,254,5,'00:00',6,NULL
INSERT INTO #tmp SELECT 19,'WAS',9003,900301,254,6,'00:00',1,NULL
INSERT INTO #tmp SELECT 20,'WAS',9003,900301,254,6,'00:00',2,NULL
INSERT INTO #tmp SELECT 21,'WAS',9003,900301,254,6,'00:00',3,NULL
INSERT INTO #tmp SELECT 22,'WAS',9003,900301,254,6,'00:00',4,NULL
INSERT INTO #tmp SELECT 23,'WAS',9003,900301,254,7,'00:00',5,NULL
INSERT INTO #tmp SELECT 24,'WAS',9003,900301,254,7,'00:00',6,NULL
I have build one logic but that is time consuming.
DECLARE #Act INT, #iStart INT, #iMax INT, #iFirst INT
SET #iFirst = 7
SET #iMax = (SELECT MAX(ID) FROM #tmp)
SET #iStart = 1
WHILE(#iStart <= #iMax)
BEGIN
SET #Act = (SELECT Act FROM #tmp WHERE ID = #iStart)
IF(#iFirst > #Act )
BEGIN
UPDATE #tmp SET SequenceNo = #iStart WHERE ID = #iStart
END
SET #iFirst = #Act
SET #iStart = #iStart + 1
END
I am looking out for any alternative optimized solution.
Is this what you wanted?
update tupd
set
SequenceNo = tupd.ID
from (
select
*
, LAG(t.Act, 1, null) over(order by t.ID) as LagAct
from #tmp t
) tt
inner join #tmp tupd on tt.ID = tupd.ID
where tt.Act < tt.LagAct
select
*
from #tmp tt
order by tt.ID
You can find breaks in the Act sequence if you compare the current row with the previous. In current SQL Server versions, ie 2012+ you can do that with the LAG() method. In previous versions, you have to perform a self join on the current ID value and its previous value. If the difference between the current and previous Act values isn't 1, there is a break in the sequence. This only works if there are no gaps in ID.
This query will return 1 in the SeqBreak column for each row where Act restarts
select
t1.ID,
t1.SequenceNo ,
case when t1.act = t2.act+1 then '0' else '1' end as SeqBreak
from #tmp t1 left join #tmp t2 on t1.id=t2.id+1
You can use this in a CTE to select only the rows where Act breaks. You can update the CTE directly, at least in SQL Server 2014.
with x as (
select
t1.ID,
t1.SequenceNo ,
case when t1.act = t2.act+1 then '0' else '1' end as SeqBreak
from #tmp t1 left join #tmp t2 on t1.id=t2.id+1 )
update x
set SequenceNo=x.ID
where seqbreak=1 and id>1
If that doesn't work with SQL Server 2008, you'll have to join between the cte and the table:
with x as (
select
t1.ID,
t1.SequenceNo ,
case when t1.act = t2.act+1 then '0' else '1' end as SeqBreak
from #tmp t1 left join #tmp t2 on t1.id=t2.id+1 )
update #tmp
set SequenceNo=x.ID
from #tmp inner join x on x.ID=#tmp.ID
where seqbreak=1 and #tmp.id>1
The equivalent query using LAG(), one of the windowing functions in SQL Server 2012 is simpler and twice as fast, since it avoids the self join:
with x as (
select
ID,
SequenceNo ,
case when act = 1 +LAG(act,1) OVER (ORDER BY ID) then '0' else '1' end as SeqBreak
from #tmp)
update x
set SequenceNo=x.ID
where seqbreak=1 and id>1
;WITH CTE
AS
(
SELECT ID,Act,RANK() OVER (PARTITION BY [DayOfWeek] ORDER BY Act) RN
FROM #tmp
)
UPDATE t
SET t.SequenceNo = t.id
FROM #tmp t
inner join CTE C
ON t.ID = C.ID
WHERE C.RN = 1
If I'm getting your question correctly below query should fetch the desired result.
update t3 set sequenceno = t3.id
from
#temp t1
cross apply (select id+1 as idsum
from #temp
tmp where t1.id = tmp.id)t2
inner join #temp t3
on t3.id = t2.idsum
where t1.act>t3.act
I have two tables like below:
table1:
StoreId SKU
------------
1 abc
2 abc
3 abc
1 xyz
4 xyz
table2:
StoreId
--------
1
2
3
4
5
I want to select missing storeid from the table1 which are in table 2. But condition is that in above example for SKU abc storeid 4 and 5 are missing and for sku xyz 2,3,5 are missing. So I want below table as output
SKU,ID
------
abc 4
abc 5
xyz 2
xyz 3
xyz 5
I am able to pull only the overall missing store which is 5 using below query.
SELECT
SKU, t2.StoreId
FROM
#table1 t1
FULL OUTER JOIN
#table2 t2 ON t1.StoreId = t2.StoreId
WHERE
t1.StoreId IS NULL
Below is test create and insert query.
Declare #table1 As table
(
StoreId varchar(4),
SKU varchar(5)
)
Declare #table2 As table
(
StoreId int
)
BEGIN
Insert Into #table1(SKU,StoreId) values('abc',1)
Insert Into #table1(SKU,StoreId) values('abc',2)
Insert Into #table1(SKU,StoreId) values('abc',3)
Insert Into #table1(SKU,StoreId) values('xyz',1)
Insert Into #table1(SKU,StoreId) values('xyz',4)
Insert Into #table2(StoreId) values(1)
Insert Into #table2(StoreId) values(2)
Insert Into #table2(StoreId) values(3)
Insert Into #table2(StoreId) values(4)
Insert Into #table2(StoreId) values(5)
END
Thank you
You need to get a list of all skus and tables, and then show only rows which do not appear in table1:
select SKU, StoreID
from #table2 t2
cross join (select distinct sku from #table1) t1
where not exists (select 1 from #table1 table1
where table1.SKU = t1.SKU
and table1.StoreId = t2.StoreId)
Here is an alternative solution with the same result.
Syntax is very similar to the answer from #BeanFrog:
SELECT
t3.SKU, t2.StoreID
FROM
#table2 t2
CROSS JOIN
(SELECT distinct SKU
FROM #table1) t3
LEFT JOIN
#table1 t1
ON
t1.SKU = t3.SKU
and t1.StoreId = t2.StoreId
WHERE
t1.sku is null
I have two tables in SQL Server,
Declare #Table1 Table ( TID1 INT, TP1 INT)
Declare #Table2 Table ( TID2 INT, TP2 INT)
INSERT INTO #Table1 (TID1,TP1) VALUES (100,1)
INSERT INTO #Table1 (TID1,TP1) VALUES (100,2)
INSERT INTO #Table1 (TID1,TP1) VALUES (100,3)
INSERT INTO #Table2 (TID2,TP2) VALUES (101,1)
INSERT INTO #Table2 (TID2,TP2) VALUES (101,2)
INSERT INTO #Table2 (TID2,TP2) VALUES (101,3)
INSERT INTO #Table2 (TID2,TP2) VALUES (102,1)
INSERT INTO #Table2 (TID2,TP2) VALUES (102,2)
INSERT INTO #Table2 (TID2,TP2) VALUES (103,1)
INSERT INTO #Table2 (TID2,TP2) VALUES (103,2)
INSERT INTO #Table2 (TID2,TP2) VALUES (103,3)
INSERT INTO #Table2 (TID2,TP2) VALUES (103,4)
INSERT INTO #Table2 (TID2,TP2) VALUES (104,2)
INSERT INTO #Table2 (TID2,TP2) VALUES (105,3)
Having Data as :
TID1 TP1
----------- -----------
100 1
100 2
100 3
TID2 TP2
----------- -----------
101 1
101 2
101 3
102 1
102 2
103 1
103 2
103 3
103 4
104 2
105 3
I want to select those records which having exact matching of TP1 column in Table2 TP2 column. EX TID2 having ID 101 will be only in result set
SELECT t2.TID2
FROM #Table2 t2
LEFT JOIN #Table1 t1
ON t2.TP2 = t1.TP1
GROUP BY t2.TID2
HAVING SUM(CASE WHEN t1.TP1 IS NULL THEN 1 ELSE 0 END) = 0 AND
COUNT(*) = (SELECT COUNT(*) FROM #Table1)
Try Like below.
SELECT TID2
FROM #TABLE1 T
RIGHT JOIN #TABLE2 T2
ON T.TP1 = T2.TP2
GROUP BY TID2
HAVING COUNT(T2.TP2) = (SELECT COUNT(*) FROM #TABLE1)
-- you can calculated this in CTEW or sub-query if you do not like to be in variable
DECLARE #MaxRowsCount INT = (SELECT COUNT(*) FROM #Table1);
SELECT T2.[TID2]
FROM #Table2 T2
LEFT JOIN #Table1 T1
ON T2.[TP2] = T1.[TP1]
GROUP BY T2.[TID2]
HAVING
(
-- current count of rows should be the same as the row count from the first table
COUNT(T2.[TP2]) = #MaxRowsCount
)
I have two tables as below:
Table1:
Col1 Col2
834 2
834 3
834 5
877 1
877 2
877 5
900 10
900 2
900 3
Table2:
Col3
1
10
Expected Output:
Col1 Col3
834 1
834 10
877 10
900 1
This is what I have done so far:
select distinct t1.Col1,A.Col3
from Table1
cross apply
(select * from Table2 left join Table1 on Col2 = Col3) A
order by t1.Col1,A.Col3
I have been trying lot using join, cross apply and other sql function to get me the expected output. Any help is appreciated.
Logic:
I want to get the values of Col1 of Table1 for which Col3 does not matches with col2 of Table1.
For Eg: Values 1 and 10 of Col3 of table2 is not there for value 834 of col1 of table1. Hence there are two rows for 834, one with value 1 and other with 10.
Please try with the below code snippet.
DECLARE #Table1 TABLE(
Col1 INT NOT NULL,
Col2 INT NOT NULL
);
INSERT INTO #Table1 VALUES(834 , 2)
INSERT INTO #Table1 VALUES(834 , 3)
INSERT INTO #Table1 VALUES(834 , 5)
INSERT INTO #Table1 VALUES(877 , 1)
INSERT INTO #Table1 VALUES(877 , 2)
INSERT INTO #Table1 VALUES(877 , 5)
INSERT INTO #Table1 VALUES(900 , 10)
INSERT INTO #Table1 VALUES(900 , 2)
INSERT INTO #Table1 VALUES(900 , 3)
DECLARE #Table2 TABLE(
Col3 INT NOT NULL
);
INSERT INTO #Table2 VALUES(1)
INSERT INTO #Table2 VALUES(10)
SELECT a.Col1,a.Col3 FROM (SELECT DISTINCT a.Col1,b.Col3 FROM #Table1 a,#Table2 b
WHERE a.Col2 <> b.Col3) a
LEFT JOIN #Table1 c on a.Col1 = c.Col1 and a.Col3 = c.Col2
WHERE c.Col1 is null
I doubt you need any join in here. All you want to select from table1 are the rows with matching value in table2. Something like this :
Select distinct col1, col2 from table1
Where col2 not in (select col3 from table2)
select a.col1, a.col2
from Table1 a
where not exists(select * from Table2 b where a.col1 = b.col3)
This may work...