How to remove duplicates where id is on different columns?

I have a problem removing duplicates. What makes rows duplicates here is shown in the example below.
EmployeeID IDnr1 IDnr2
123456 111111 222222
123456 222222 111111
I want to remove one of these rows. It does not matter which one.
I have several thousand such duplicate lines.
Thanks in advance

Use a CASE expression in the GROUP BY clause.
Query
select [EmployeeID], min([IDnr1]) [IDnr1], max([IDnr2]) [IDnr2]
from [your_table_name]
group by [EmployeeID],
case when [IDnr1] > [IDnr2] then [IDnr1] else [IDnr2] end,
case when [IDnr1] > [IDnr2] then [IDnr2] else [IDnr1] end;

One way to do it is to use a CTE with ROW_NUMBER().
Create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
    EmployeeID int,
    IDnr1 int,
    IDnr2 int
)
INSERT INTO @T VALUES
(123456, 111111, 222222),
(123456, 222222, 111111),
(123456, 111112, 222222),
(123457, 222222, 111111)
The CTE - note the use of CASE to put the smaller value first, so that both orderings of a pair land in the same partition:
;WITH CTE AS
(
    SELECT EmployeeID,
           ROW_NUMBER() OVER(PARTITION BY EmployeeID,
                             CASE WHEN IDnr1 < IDnr2 THEN IDnr1 ELSE IDnr2 END,
                             CASE WHEN IDnr1 < IDnr2 THEN IDnr2 ELSE IDnr1 END
                             ORDER BY (SELECT NULL)) rn
    FROM @T
)
The delete statement:
DELETE
FROM CTE
WHERE rn > 1
However, deleting the duplicates is only part of the work. You also want to make sure no new duplicates can be inserted into the table. To do that, you need to add a check constraint to your table, but first, update the table.
This step will make sure you can add the check constraint:
UPDATE TableName
SET Idnr1 = Idnr2,
Idnr2 = Idnr1
WHERE Idnr1 >= Idnr2
Then, add the check constraint:
ALTER TABLE TableName
ADD CONSTRAINT CK_TableNamePreventDups CHECK(Idnr1 < Idnr2)
GO
This ensures that a pair can only be stored with the smaller ID first, so the reversed form of an existing row can no longer be inserted into your table.
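The check constraint only fixes the order of the two IDs; on its own it does not stop the same ordered pair from being inserted twice. A minimal sketch of closing that gap with a unique constraint follows. It assumes the table is called TableName as in the statements above, that the duplicates have already been deleted, and that no row has Idnr1 equal to Idnr2 (such a row would violate the check constraint). The constraint name UQ_TableNamePair is made up for illustration.

```sql
-- Hypothetical constraint name; TableName and the column names follow the answer above.
-- Run this only after the duplicate rows have been deleted and the pairs reordered,
-- otherwise the ALTER TABLE fails on the existing duplicates.
ALTER TABLE TableName
ADD CONSTRAINT UQ_TableNamePair UNIQUE (EmployeeID, Idnr1, Idnr2);
```

With both constraints in place, every pair has exactly one legal form and that form can appear only once per EmployeeID.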

Related

Compare row with other rows in the same table in sql server

I have the below records in my table,
If a HoleNumber does not have both an 'A' and a 'B' variant for a particular datetime, we need to remove the letter from the number.
That is, remove the 'A' from the third and sixth records, because they have no 'B' counterpart for that datetime.

delete from myTable
where id in
(
select id from myTable t1
inner join
(
select [date], left([holeNumber], len(holeNumber)-1) as hNumber
from myTable
group by [date], left([holeNumber], len(holeNumber)-1)
having count(holeNumber) = 1
) tmp
on t1.[date] = tmp.[date] and left(t1.holeNumber, len(holeNumber)-1) = tmp.hNumber);
would do it, provided your requirement is strictly to remove the rows whose hole number appears with only one letter for that date.
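If the goal is to strip the letter and keep the row, as the question describes, the same subquery can drive an UPDATE instead of a DELETE. This is only a sketch: it reuses the table and column names from the query above and assumes every affected holeNumber ends in exactly one letter. The LIKE filters are an added safeguard so that values without a trailing letter are left alone.

```sql
-- Sketch only: myTable, [date] and holeNumber are taken from the query above.
-- Assumption: the suffix is a single trailing letter (e.g. '12A', '12B').
UPDATE t1
SET holeNumber = LEFT(t1.holeNumber, LEN(t1.holeNumber) - 1)
FROM myTable AS t1
INNER JOIN
(
    -- Hole numbers that occur with only one letter for a given date
    SELECT [date], LEFT(holeNumber, LEN(holeNumber) - 1) AS hNumber
    FROM myTable
    WHERE holeNumber LIKE '%[A-Za-z]'
    GROUP BY [date], LEFT(holeNumber, LEN(holeNumber) - 1)
    HAVING COUNT(holeNumber) = 1
) AS tmp
    ON t1.[date] = tmp.[date]
   AND LEFT(t1.holeNumber, LEN(t1.holeNumber) - 1) = tmp.hNumber
WHERE t1.holeNumber LIKE '%[A-Za-z]';
```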

T-SQL : Cleaning up data, merging rows into columns

I'm trying to clean up some Active Directory data in SQL Server. I have managed to read the raw LFD file into a table. Now I need to clean up some attributes where values are spread out over multiple rows. I can identify records that need to be appended to the prior record by the fact that they have a leading space.
Example:
ID Record IsPartOfPrior
3114 memberOf: 0
3115 CN=Sharepoint-Members-home 1
3116 memberOf: 0
3117 This is 1
3118 part of the 1
3119 next line. 1
Ultimately, I would like to have the following table generated:
ID Record
3114 memberOf:CN=Sharepoint-Members-home
3116 memberOf:This is part of the next line
I could write it with a cursor, setting variables, working with temp tables and populating a table. But there has to be a set-based (maybe recursive?) approach to this.
I could use the STUFF method to combine various rows together, but how would I go about grouping the various sets together? I'm thinking that I first have to define a groupID per record, and then STUFF the rows together per groupID.
Thanks for any help.

Batch with comments below. Should work starting with SQL Server 2008.
--Here I emulate your table
DECLARE @yourtable TABLE (ID int, Record nvarchar(max), IsPartOfPrior bit)
INSERT INTO @yourtable VALUES
(3114,'memberOf:',0),(3115,'CN=Sharepoint-Members-home',1),(3116,'memberOf:',0),(3117,'This is',1),(3118,'part of the',1),(3119,'next line.',1)
--Getting max ID (plus one, used as the upper bound for the last group)
DECLARE @max_id int
SELECT @max_id = MAX(ID)+1
FROM @yourtable
--For each starting record (IsPartOfPrior = 0) we find the ID of the next starting record
--Then we use STUFF and FOR XML PATH to concatenate the rows in between into the new Record
SELECT y.ID,
       y.Record + b.Record as Record
FROM @yourtable y
OUTER APPLY (
    SELECT TOP 1 ID as NextPrior
    FROM @yourtable
    WHERE IsPartOfPrior = 0 and y.ID < ID
    ORDER BY ID ASC
) as t
OUTER APPLY (
    SELECT STUFF((
        SELECT ' '+Record
        FROM @yourtable
        WHERE ID > y.ID and ID < ISNULL(t.NextPrior,@max_id)
        ORDER BY id ASC
        FOR XML PATH('')
    ),1,1,'') as Record
) as b
WHERE y.IsPartOfPrior = 0
The output:
ID Record
----------- -----------------------------------------
3114 memberOf:CN=Sharepoint-Members-home
3116 memberOf:This is part of the next line.
This works as long as the IDs are numeric and ascending.

Yet another option if you are on SQL Server 2012+
Example
Declare @YourTable Table ([ID] int,[Record] varchar(50),[IsPartOfPrior] int)
Insert Into @YourTable Values
 (3114,'memberOf:',0)
,(3115,'CN=Sharepoint-Members-home',1)
,(3116,'memberOf:',0)
,(3117,'This is',1)
,(3118,'part of the',1)
,(3119,'next line.',1)
;with cte as (
    Select *,Grp = sum(IIF([IsPartOfPrior]=0,1,0)) over (Order By ID)
    From @YourTable
)
Select ID
      ,Record = Stuff((Select ' ' +Record From cte Where Grp=A.Grp Order by ID For XML Path ('')),1,1,'')
From (Select Grp,ID=min(ID) from cte Group By Grp ) A
Returns
ID Record
3114 memberOf: CN=Sharepoint-Members-home
3116 memberOf: This is part of the next line.
If it helps with the visualization, the CTE produces:
ID Record IsPartOfPrior Grp << Notice Grp Values
3114 memberOf: 0 1
3115 CN=Sharepoint-Members-home 1 1
3116 memberOf: 0 2
3117 This is 1 2
3118 part of the 1 2
3119 next line. 1 2
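On SQL Server 2017 or later, the same running-sum grouping can be combined with STRING_AGG instead of the STUFF / FOR XML PATH idiom. The sketch below reuses the sample data from the answer above; it is an alternative formulation, not part of the original answer. One practical difference is that FOR XML PATH entity-encodes characters such as & and <, while STRING_AGG returns the text unchanged.

```sql
-- Requires SQL Server 2017+ (STRING_AGG). Sample data copied from the answer above.
DECLARE @YourTable TABLE ([ID] int, [Record] varchar(50), [IsPartOfPrior] int);
INSERT INTO @YourTable VALUES
 (3114,'memberOf:',0)
,(3115,'CN=Sharepoint-Members-home',1)
,(3116,'memberOf:',0)
,(3117,'This is',1)
,(3118,'part of the',1)
,(3119,'next line.',1);

;WITH cte AS (
    -- Each row with IsPartOfPrior = 0 starts a new group
    SELECT *, Grp = SUM(IIF([IsPartOfPrior] = 0, 1, 0)) OVER (ORDER BY ID)
    FROM @YourTable
)
SELECT ID     = MIN(ID),
       Record = STRING_AGG(Record, ' ') WITHIN GROUP (ORDER BY ID)
FROM cte
GROUP BY Grp
ORDER BY MIN(ID);
```

This should return the same two rows as the FOR XML PATH version, with a space between the attribute name and its value.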

How can I ignore duplicate rows where columns do not contain data

I have a table with duplicate rows; however, in some of the duplicate rows a column is blank while the same column contains data in the other rows. How can I remove/ignore only those rows where the column is blank? In some instances:
Name Employee# Location City
-----------------------------------------
BowerT 48999 NJ Foods
BowerT 48999 NJ Foods Pearl
BowerT 48999 NJ Foods Johns
BowerT 48999 NJ Foods Johns
I'm using a CTE to delete the duplicates; however, if the 2nd, 3rd, or 4th row has the data I need for that column, I lose it because those rows have a row number greater than 1.
;With hrEmployee as
(
Select
*,
Row_Number () Over (Partition BY Employee_Number order by Employee_Number) As RowNumber
From
[dbo].[hrEmployee]
Where
Employee_Number = '48999'
)
Delete hrEmployees
where RowNumber > 1
What am I missing?

Here is a complete example; the relevant code change is:
ROW_NUMBER() OVER (PARTITION BY Employee_Number ORDER BY
CASE WHEN ISNULL(City,'') = '' THEN 1 ELSE 0 END
) as RowNumber
This simply orders the rows within each partition so that the ones you want to keep come first: a row whose City is NULL or '' (blank) sorts last. You can rank the rows any way you want by specifying a different ORDER BY.
DECLARE @Table AS TABLE (Name VARCHAR(10), Employee_Number INT, Location VARCHAR(20), City VARCHAR(20))
INSERT INTO @Table VALUES ('BowerT',48999,'NJ Foods',NULL)
,('BowerT',48999,'NJ Foods','Pearl')
,('BowerT',48999,'NJ Foods','Johns')
,('BowerT',48999,'NJ Foods','Johns')
--Rows before the delete
SELECT *
FROM
    @Table
;WITH hrEmployee AS (
    SELECT
        *
        ,ROW_NUMBER() OVER (PARTITION BY Employee_Number ORDER BY
            CASE WHEN ISNULL(City,'') = '' THEN 1 ELSE 0 END
        ) as RowNumber
    FROM
        @Table
    WHERE Employee_Number = '48999'
)
DELETE
FROM
    hrEmployee
WHERE
    RowNumber > 1
--Rows after the delete
SELECT *
FROM
    @Table

Clone a void refund transaction record to cancel each other out

Account_ID Amount
123 200
Result
Account_ID Amount
123 200
123 -200
Typically, our database has two transactions for a void refund payment, but somehow a few records have only one transaction.
I know I can manually insert a matching record into the table.
Is there any other way to clone a record and set the amount to negative without using an insert statement?

As M.Ali said, cloning the record is not a good idea, but it can be done. I don't know exactly whether this suits your requirement:
DECLARE @T TABLE
    ([Account_ID] int, [Amount] int)
;
INSERT INTO @T
    ([Account_ID], [Amount])
VALUES
    (123, 200)
;
;WITH CTE AS (
    SELECT Account_ID, Amount,
           ROW_NUMBER() OVER(PARTITION BY Amount ORDER BY (SELECT NULL)) RN
    FROM @T
    --The two-row VALUES list duplicates every source row
    CROSS APPLY (VALUES ('Account_ID',Account_ID),('Amount',Amount)) M(v,s))
SELECT Account_ID,
       CASE WHEN RN = 1 THEN CAST(Amount AS varchar) ELSE
       '-' + CAST(Amount AS varchar) END
FROM CTE
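A simpler way to get the same paired result at query time is a UNION ALL that returns each row once as stored and once with the sign flipped. The sketch below uses the same sample table variable as above. It keeps Amount numeric instead of converting it to a string, and it assumes every row in the source needs a cancelling counterpart.

```sql
-- Sample data as in the answer above.
DECLARE @T TABLE ([Account_ID] int, [Amount] int);
INSERT INTO @T ([Account_ID], [Amount]) VALUES (123, 200);

-- Each source row appears twice: once unchanged, once negated.
SELECT Account_ID, Amount
FROM @T
UNION ALL
SELECT Account_ID, -Amount
FROM @T
ORDER BY Account_ID, Amount DESC;
```

Like the CTE version, this only shapes the query result; it does not add anything to the table.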

How can i update a column in a SQL table if the entry is a duplicate and keep the newest entry?

I have a database table with about 90 thousand entries. I need to update any older entries so I can move them to an archive table. I was able to find the duplicates where the count is greater than 1 and also find out how many times each one was duplicated.
This is the query I used to find them.
SELECT DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last,
COUNT(*) as Number_of_Duplicates
FROM dbo.REGISTRAR
WHERE first IS NOT NULL
GROUP BY DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last
HAVING COUNT(*) > 1
ORDER by Number_of_Duplicates desc
I now need to update anything that is on the table more than once and leave the newest entry intact and update a status column to duplicate.
How can I do this?
Thanks in advance for the help.

;WITH x AS
(
SELECT DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last,
[STATUS],
rn = ROW_NUMBER() OVER
(
PARTITION BY
DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last
ORDER BY
STOREDATETIME DESC
)
FROM dbo.REGISTRAR
)
UPDATE x SET [STATUS] = CASE WHEN rn > 1 THEN 'DUP' ELSE 'NOT DUP' END;
