T- SQL Duplicate Records - sql-server

I am trying to delete every other record which are duplicate my select query returns every other record duplicate (tblPoints.ptUser_ID) is the unique id
SELECT *, u.usMembershipID
FROM [ABCRewards].[dbo].[tblPoints]
inner join tblUsers u on u.User_ID = tblPoints.ptUser_ID
where ptUser_ID in (select user_id from tblusers where Client_ID = 8)
and ptCreateDate >= '3/9/2016'
and ptDesc = 'December Anniversary'

Usually duplicates getting returned by an INNER JOIN suggests an issue with the query but if you are certain that your join is correct then this would do it:
;WITH CTE
AS (SELECT *
, ROW_NUMBER() OVER(PARTITION BY t.ptUser_ID ORDER BY t.ptUser_ID) AS rn
FROM [ABCRewards].[dbo].[tblPoints] AS t)
/*Uncomment below to Review duplicates*/
--SELECT *
--FROM CTE
--WHERE rn > 1;
/*Uncomment below to Delete duplicates*/
--DELETE
--FROM CTE
--WHERE rn > 1;

When cleaning up data duplication, I have always used the same query pattern to delete all the duplicate and keep the wanted one(original, most recent, whatever). The below query pattern delete all duplicates and keep the one you wish to keep.
Just replace all [] with your table and fields.
[Field(s)ToDetectDuplications] : Put here the field(s) that allow you to say that they are dupplicate when they have the same values.
[Field(s)ToChooseWhichDupplicationIsKept ] : Put here a fields to choose which dupplicate will be kept. For exemple, the one with the
biggest value or the less old one.
.
DELETE [YourTableName]
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
AND T.I > 1
I recommend to have a look to what will be deleted before. To do so, just replace the "delete" statement with a "select" instead just like below.
SELECT T.I,
[YourTableName].*
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
AND T.I > 1
Explanation :
Here we use "row_number()", "Partition by" and "Order by" to detect duplicates. "Partition" group together all rows. Set your partitions fields in order to have one row per partition when the data is right. That way bad data come out with partition that have more than one row. Row_number assign them a number. When a number is greater then 1, then this mean there is a duplicate with this partition. The "order by" is use to tell "row_number" in what order to assign them a number. Number 1 is kept, all others are deleted.
Exemple with OP's schema and specification
Here I attempted to fill the patern with guess I have made on your database schema.
DECLARE #userID INT
SELECT #userID = 8
SELECT T.I,
[ABCRewards].[dbo].[tblPoints].*
FROM [ABCRewards].[dbo].[tblPoints]
INNER JOIN (SELECT [YourTablePrimaryKey],
I = ROW_NUMBER() OVER(PARTITION BY T.ptDesc, T.ptUser_ID ORDER BY ptCreateDate DESC)
FROM [ABCRewards].[dbo].[tblPoints]
WHERE T.ptCreateDate >= '3/9/2016'
AND T.ptDesc = 'December Anniversary'
AND T.ptUser_ID = #userID
) AS T ON [ABCRewards].[dbo].[tblPoints].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
AND T.I > 1

Related

Print results side by side SQL Server

I have following result set,
Now with above results i want to print the records via select query as below attached image
Please note, I will have only two types of columns in output Present Employee & Absent Employees.
I tried using pivot tables, temporary table but cant achieve what I want.
One method would be to ROW_NUMBER each the the "statuses" and then use a FULL OUTER JOIN to get the 2 datasets into the appropriate columns. I use a FULL OUTER JOIN as I assume you could have a different amount of employees who were present/absent.
CREATE TABLE dbo.YourTable (Name varchar(10), --Using a name that doesn't require delimit identification
Status varchar(7), --Using a name that doesn't require delimit identification
Days int);
GO
INSERT INTO dbo.YourTable(Name, Status, Days)
VALUES('Mal','Present',30),
('Jess','Present',20),
('Rick','Absent',30),
('Jerry','Absent',10);
GO
WITH RNs AS(
SELECT Name,
Status,
Days,
ROW_NUMBER() OVER (PARTITION BY Status ORDER BY Days DESC) AS RN
FROM dbo.YourTable)
SELECT P.Name AS PresentName,
P.Days AS PresentDays,
A.Name AS AbsentName,
A.Days AS AbsentDays
FROM (SELECT R.Name,
R.Days,
R.Status,
R.RN
FROM RNs R
WHERE R.Status = 'Present') P
FULL OUTER JOIN (SELECT R.Name,
R.Days,
R.Status,
R.RN
FROM RNs R
WHERE R.Status = 'Absent') A ON P.RN = A.RN;
GO
DROP TABLE dbo.YourTable;
db<>fiddle
2 CTE's is actually far neater:
WITH Absents AS(
SELECT Name,
Status,
Days,
ROW_NUMBER() OVER (ORDER BY Days DESC) AS RN
FROM dbo.YourTable
WHERE Status = 'Absent'),
Presents AS(
SELECT Name,
Status,
Days,
ROW_NUMBER() OVER (ORDER BY Days DESC) AS RN
FROM dbo.YourTable
WHERE Status = 'Present')
SELECT P.Name AS PresentName,
P.Days AS PresentDays,
A.Name AS AbsentName,
A.Days AS AbsentDays
FROM Absents A
FULL OUTER JOIN Presents P ON A.RN = P.RN;

Is there any way to sum duplicate rows when deleting duplicates using CTE?

I have a table that contains duplicated ItemId. I am using CTE to remove the duplicate records and keep only single record for each item. I am able to successfully achieve this milestone using following Query:
Create procedure sp_SumSameItems
as
begin
with cte as (select a.Id,a.ItemId,Qty, QtyPrice,
ROW_NUMBER() OVER(PARTITION by ItemId ORDER BY Id) AS rn from tblTest a)
delete x from tblTest x Join cte On x.Id = cte.Id where cte.rn > 1
end
The actual problem is I want to Sum the Qty and QtyPrice before deleting duplicate records. Where should I add Sum function ?
Problem Illustration:
You can't use update with delete statement, you need to update before :
update t
set t.qty = (select sum(t1.qty) from table t1 where t1.itemid = t.itemid);
A CTE is valid for only one statement, so you will need to either run the cte twice, once summing and then deleting or you could put the result of CTE in a temp table and then use the temp table to sum and then delete records in the original table.
At first level, you have to update Qty and QtyPrice after that remove duplicate records.
Given Example:
CREATE PROCEDURE Sp_sumsameitems
AS
BEGIN
WITH cte1
AS (SELECT a.id,
a.itemid,
Sum(qty) Qty,
Sum(qtyprice)QtyPrice,
FROM tbltest a
GROUP BY a.id)
UPDATE x
SET x.qty = c.qty,
x.qtyprice = c.qtyprice
FROM tbltest x
JOIN cte1 c
ON x.id = cte.id
WITH cte
AS (SELECT a.id,
a.itemid,
qty,
qtyprice,
Row_number()
OVER(
partition BY itemid
ORDER BY id) AS rn
FROM tbltest a)
DELETE x
FROM tbltest x
JOIN cte
ON x.id = cte.id
WHERE cte.rn > 1
END

Unique data based on name,email from sql server

I want to get unique rows based on FirstName,EmailID. I tried few things by adding DISTINCT to all row that still get duplicate rows. tried Group By that failed with error. I can do a subquery but that will be slow. WHat is the best solution for below query
SELECT FirstName,LastName,FamilyName, EmailID,Phone,City,Country,CreatedOn,t.Type , ID
FROM Forms C JOIN Form_Type T
ON c.Form_TypeID = t.Form_TypeID
WHERE c.Form_TypeID = 1 AND DATEDIFF( "d", CreatedOn, GETDATE()) < 31
ORDER BY CreatedOn DESC
See if this works for you:
SELECT *
FROM (
SELECT FirstName,LastName,FamilyName, EmailID,Phone,City,Country,CreatedOn,t.Type , ID,
ROW_NUMBER() OVER (PARTITION BY FirstName ,EmailID ORDER BY CreatedOn DESC ) NewCol
FROM Forms C
JOIN Form_Type T ON c.Form_TypeID = t.Form_TypeID
WHERE c.Form_TypeID = 1
AND DATEDIFF("d", CreatedOn, GETDATE()) < 31
) t
WHERE NewCol = 1
I have added an extra column (i.e. NewCol) in the inner table. I am assuming that you wanted to display recent record (using CREATEDON) for each combination of "FirstName, Email"
DISTINCT will not work in your case, as you want all the fields from the table. So you need to use a sub-query to create a list of distinct names/emails.
You should be able to adapt the following example to your needs:
SELECT User, EMail, Address1, Address2
FROM Table1 t1
INNER JOIN (SELECT DISTINCT(User, EMail) FROM Table1) tmp ON t1.User = tmp.User AND t1.EMail = tmp.EMail
Using an INNER JOIN this returns only rows from Table1 that are in table tmp. Table tmp is defined as the distinct combinations of User and EMail from Table1.
So what happens is: You create a distinct list of User and EMail from Table1. Then you select all the entries from Table1 where User and EMail are in that list.

Delete duplicate row with having

I want to remove my duplicate value in SQL Server. So I found a query that can find them, how can I append delete statement to this query to delete them?
(SELECT
DocumentNumber, LineNumber, SheetNumber, Unit, COUNT(*)
FROM
Lines
GROUP BY
DocumentNumber, LineNumber, SheetNumber, Unit
HAVING
COUNT(*) > 1)
I would use a CTE and a ranking function like ROW_NUMBER:
;WITH CTE AS
(
SELECT Id, DocumentNumber, LineNumber, SheetNumber, Unit,
RN = ROW_NUMBER() OVER (PARTITION BY DocumentNumber, LineNumber, SheetNumber, Unit
ORDER BY Id DESC) -- keeps the newest Id-row
FROM dbo.Lines
)
DELETE FROM CTE WHERE RN > 1
The great benefit, it's easier to read and to maintain and it's easy to change it to a SELECT * FROM CTE to see what you're going to delete.
Modify the Order By part to implement a custom logic which row to keep and which rows to delete. It can contain multiple columns(either ASC or DESC) or even conditional statements (f.e. with CASE).
Here are a few ways to do that:
DELETE FROM Lines L
INNER JOIN (SELECT
DocumentNumber,LineNumber,SheetNumber,Unit, COUNT(*)
FROM
Lines
GROUP BY
DocumentNumber,LineNumber,SheetNumber,Unit
HAVING
COUNT(*) > 1) D ON L.DocumentNumber = D.DocumentNumber AND L.LineNumber = D.LineNumber AND L.SheetNumber = D.SheetNumber AND L.Unit = D.Unit
or you can also use table variable or CTE or IN and subquery if you have a column like ID
Much appropriate approach using EXISTS
DELETE a
FROM lines a
WHERE EXISTS (SELECT 1
FROM lines b
WHERE a.documentnumber = b.documentnumber
AND a.linenumber = b.linenumber
AND a.sheetnumber = b.sheetnumber
AND a.unit = b.unit
HAVING Count(*) > 1)
Another approach using COUNT() OVER()
;WITH cte
AS (SELECT id,
documentnumber,
linenumber,
sheetnumber,
unit,
CNT = Count(1) OVER(partition BY documentnumber, linenumber, sheetnumber, unit)
FROM dbo.lines)
DELETE FROM cte
WHERE cnt > 1
Note : Both my approaches deletes all the records for this combination documentnumber, linenumber, sheetnumber, unit if it is duplicated

SQL Update with row_number()

I want to update my column CODE_DEST with an incremental number. I have:
CODE_DEST RS_NOM
null qsdf
null sdfqsdfqsdf
null qsdfqsdf
I would like to update it to be:
CODE_DEST RS_NOM
1 qsdf
2 sdfqsdfqsdf
3 qsdfqsdf
I have tried this code:
UPDATE DESTINATAIRE_TEMP
SET CODE_DEST = TheId
FROM (SELECT Row_Number() OVER (ORDER BY [RS_NOM]) AS TheId FROM DESTINATAIRE_TEMP)
This does not work because of the )
I have also tried:
WITH DESTINATAIRE_TEMP AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST=RN
But this also does not work because of union.
How can I update a column using the ROW_NUMBER() function in SQL Server 2008 R2?
One more option
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
DECLARE #id INT
SET #id = 0
UPDATE DESTINATAIRE_TEMP
SET #id = CODE_DEST = #id + 1
GO
try this
http://www.mssqltips.com/sqlservertip/1467/populate-a-sql-server-column-with-a-sequential-number-not-using-an-identity/
With UpdateData As
(
SELECT RS_NOM,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST = RN
FROM DESTINATAIRE_TEMP
INNER JOIN UpdateData ON DESTINATAIRE_TEMP.RS_NOM = UpdateData.RS_NOM
Your second attempt failed primarily because you named the CTE same as the underlying table and made the CTE look as if it was a recursive CTE, because it essentially referenced itself. A recursive CTE must have a specific structure which requires the use of the UNION ALL set operator.
Instead, you could just have given the CTE a different name as well as added the target column to it:
With SomeName As
(
SELECT
CODE_DEST,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE SomeName SET CODE_DEST=RN
This is a modified version of #Aleksandr Fedorenko's answer adding a WHERE clause:
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
WHERE x.CODE_DEST <> x.New_CODE_DEST AND x.CODE_DEST IS NOT NULL
By adding a WHERE clause I found the performance improved massively for subsequent updates. Sql Server seems to update the row even if the value already exists and it takes time to do so, so adding the where clause makes it just skip over rows where the value hasn't changed. I have to say I was astonished as to how fast it could run my query.
Disclaimer: I'm no DB expert, and I'm using PARTITION BY for my clause so it may not be exactly the same results for this query. For me the column in question is a customer's paid order, so the value generally doesn't change once it is set.
Also make sure you have indexes, especially if you have a WHERE clause on the SELECT statement. A filtered index worked great for me as I was filtering based on payment statuses.
My query using PARTITION by
UPDATE UpdateTarget
SET PaidOrderIndex = New_PaidOrderIndex
FROM
(
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
) AS UpdateTarget
WHERE UpdateTarget.PaidOrderIndex <> UpdateTarget.New_PaidOrderIndex AND UpdateTarget.PaidOrderIndex IS NOT NULL
-- test to 'break' some of the rows, and then run the UPDATE again
update [order] set PaidOrderIndex = 2 where PaidOrderIndex=3
The 'IS NOT NULL' part isn't required if the column isn't nullable.
When I say the performance increase was massive I mean it was essentially instantaneous when updating a small number of rows. With the right indexes I was able to achieve an update that took the same amount of time as the 'inner' query does by itself:
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
I did this for my situation and worked
WITH myUpdate (id, myRowNumber )
AS
(
SELECT id, ROW_NUMBER() over (order by ID) As myRowNumber
FROM AspNetUsers
WHERE UserType='Customer'
)
update AspNetUsers set EmployeeCode = FORMAT(myRowNumber,'00000#')
FROM myUpdate
left join AspNetUsers u on u.Id=myUpdate.id
Simple and easy way to update the cursor
UPDATE Cursor
SET Cursor.CODE = Cursor.New_CODE
FROM (
SELECT CODE, ROW_NUMBER() OVER (ORDER BY [CODE]) AS New_CODE
FROM Table Where CODE BETWEEN 1000 AND 1999
) Cursor
If table does not have relation, just copy all in new table with row number and remove old and rename new one with old one.
Select RowNum = ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) , * INTO cdm.dbo.SALES2018 from
(
select * from SALE2018) as SalesSource
In my case I added a new column and wanted to update it with the equevilat record number for the whole table
id name new_column (ORDER_NUM)
1 Ali null
2 Ahmad null
3 Mohammad null
4 Nour null
5 Hasan null
6 Omar null
I wrote this query to have the new column populated with the row number
UPDATE My_Table
SET My_Table.ORDER_NUM = SubQuery.rowNumber
FROM (
SELECT id ,ROW_NUMBER() OVER (ORDER BY [id]) AS rowNumber
FROM My_Table
) SubQuery
INNER JOIN My_Table ON
SubQuery.id = My_Table.id
after executing this query I had 1,2,3,... numbers in my new column
I update a temp table with the first occurrence of part where multiple parts can be associated with a sequence number. RowId=1 returns the first occurence which I join the tmp table and data using part and sequence number.
update #Tmp
set
#Tmp.Amount=#Amount
from
(SELECT Part, Row_Number() OVER (ORDER BY [Part]) AS RowId FROM #Tmp
where Sequence_Num=#Sequence_Num
)data
where data.Part=#Tmp.Part
and data.RowId=1
and #Tmp.Sequence_Num=#Sequence_Num
I don't have a running ID in order to do what "Basheer AL-MOMANI" suggested.
I did something like this: (joined my table on myself, just to get the Row Number)
update T1 set inID = T2.RN
from (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T1
inner join (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T2 on T2.RN = T1.RN

Resources