How to remove duplicate rows from a table in SQL Server

I have a table called table1 which has duplicate values. It looks like this:
new
pen
book
pen
like
book
book
pen
I want to remove the duplicate rows and insert the remaining unique values into another table called table2.
table2 should look like this:
new
pen
book
like
How can I do this in SQL Server?

Let's assume the field was named name:
INSERT INTO table2 (name)
SELECT name FROM table1 GROUP BY name
That query gets you all the unique names.
You could even put them into a table variable if you wanted:
DECLARE @Table2 TABLE (name VARCHAR(50))
INSERT INTO @Table2 (name)
SELECT name FROM table1 GROUP BY name
or you could use a temp table:
CREATE TABLE #Table2 (name VARCHAR(50))
INSERT INTO #Table2 (name)
SELECT name FROM table1 GROUP BY name
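If table2 does not exist yet, SELECT DISTINCT ... INTO may be a shorter alternative, since it creates the target table and fills it in one statement. This is only a sketch: it assumes the column is called name (as above) and that table2 has not been created, neither of which the question confirms.

```sql
-- Assumption: table2 does not exist yet; SELECT ... INTO creates it
-- with the same column definition as table1.name.
SELECT DISTINCT name
INTO table2
FROM table1;

-- Inspect the result; add ORDER BY if a particular order is needed.
SELECT name FROM table2;
```

If table2 already exists, SELECT ... INTO fails, and the INSERT ... SELECT form shown above is the one to use.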

You can do this easily with an INSERT that SELECTs from a CTE where you use ROW_NUMBER(), like:
DECLARE @YourTable table (YourColumn varchar(10))
DECLARE @YourTable2 table (YourColumn varchar(10))
INSERT INTO @YourTable VALUES ('new')
INSERT INTO @YourTable VALUES ('pen')
INSERT INTO @YourTable VALUES ('book')
INSERT INTO @YourTable VALUES ('pen')
INSERT INTO @YourTable VALUES ('like')
INSERT INTO @YourTable VALUES ('book')
INSERT INTO @YourTable VALUES ('book')
INSERT INTO @YourTable VALUES ('pen')
;WITH OrderedResults AS
(
SELECT
YourColumn, ROW_NUMBER() OVER (PARTITION BY YourColumn ORDER BY YourColumn) AS RowNumber
FROM @YourTable
)
INSERT INTO @YourTable2
(YourColumn)
SELECT YourColumn FROM OrderedResults
WHERE RowNumber=1
SELECT * FROM @YourTable2
OUTPUT:
YourColumn
----------
book
like
new
pen
(4 row(s) affected)
Alternatively, you can remove the duplicates from the original table with a DELETE on a CTE where you use ROW_NUMBER(), like:
--this will just remove them from your original table
DECLARE @YourTable table (YourColumn varchar(10))
INSERT INTO @YourTable VALUES ('new')
INSERT INTO @YourTable VALUES ('pen')
INSERT INTO @YourTable VALUES ('book')
INSERT INTO @YourTable VALUES ('pen')
INSERT INTO @YourTable VALUES ('like')
INSERT INTO @YourTable VALUES ('book')
INSERT INTO @YourTable VALUES ('book')
INSERT INTO @YourTable VALUES ('pen')
;WITH OrderedResults AS
(
SELECT
YourColumn, ROW_NUMBER() OVER (PARTITION BY YourColumn ORDER BY YourColumn) AS RowNumber
FROM @YourTable
)
DELETE OrderedResults
WHERE RowNumber!=1
SELECT * FROM @YourTable
OUTPUT:
YourColumn
----------
new
pen
book
like
(4 row(s) affected)

I posted something a couple of weeks ago on deleting duplicates by using DELETE TOP X, which obviously only handles a single set of duplicates. In the comments, however, I was given this little jewel by Joshua Patchak.
;WITH cte(rowNumber) AS
(SELECT ROW_NUMBER() OVER (PARTITION BY [List of Natural Key Fields]
ORDER BY [List of Order By Fields])
FROM dbo.TableName)
DELETE FROM cte WHERE rowNumber>1
This will get rid of all of the duplicates in the table.
Here is the original post if you want to read the discussion. Duplicate rows in a table.

Related

Calculating statistics of interactions between football players

I have a table of kicks between a number of football players. Most interactions have both a kicker and receiver, but sometimes the pass is made but never received. The table contains 3 columns. For purposes of the example, I have added a "PassID" column to assist with the description of the problem.
The table looks as follows:
create table #T (Player1 varchar(25),Action varchar(25),Player2 varchar(25),PassID int)
insert into #T select 'Jamie','Kicked to','Pierre',1
insert into #T select 'Pierre','Received from ','Jamie',1
insert into #T select 'Jamie','Kicked to ','Mohamed',2
insert into #T select 'Jamie','Received from ','Kun',3
insert into #T select 'Kun ','Kicked to','Jamie',3
insert into #T select 'Mohamed','Received from ','Pierre',4
insert into #T select 'Pierre','Kicked to','Mohamed',4
insert into #T select 'Mohamed','Kicked to','Kun',5
insert into #T select 'Jamie ','Kicked to ','Kun',6
insert into #T select 'Kun ','Received from ','Jamie',6
insert into #T select 'Jamie','Received from ','Kun',7
insert into #T select 'Kun ','Kicked to','Jamie',7
I have to answer the following question using SQL Server:
How many unique interactions exist, where a unique interaction is defined as a kick between two players, whether completed or not, and where the direction of the interaction does not matter?
In this simple example, I know the answer is 5, being:
Jamie/Pierre
Jamie/Mohamed
Jamie/Kun
Mohamed/Pierre
Mohamed/Kun
How do I extract this answer from the table using a T-SQL statement?
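-- RTRIM strips the trailing spaces present in some names ('Kun ', 'Jamie '),
-- so that 'Kun +Jamie' and 'Kun+Jamie' are not counted as different pairs
SELECT COUNT(DISTINCT CASE
WHEN RTRIM(Player1) > RTRIM(Player2) THEN CONCAT(RTRIM(Player1),'+',RTRIM(Player2))
ELSE CONCAT(RTRIM(Player2),'+',RTRIM(Player1))
END )
FROM #T
WHERE Action = 'Kicked to';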
Try the code below.
Select CONCAT(x.Player1,'/',x.Player2)Title from (
Select *,ROW_NUMBER() over (PARTITION by passid order by passid)Row from #T
)X
where Row=1

Display few specific rows always at the top

I want to display a few specific rows always at the top of the query results.
For example: Cities table. Columns: City ID, City Name
I want the query result to always show Mumbai, Bangalore, Chennai and Hyderabad at the top.
1st way:
I can insert these records into the table first so that they always get displayed at the top.
But this will not work if, a few months later, another city gets added that I also want to display at the top.
Use IIF in your ORDER BY clause (IIF is available from SQL Server 2012 onward):
SELECT CityId, CityName
FROM Cities
ORDER BY IIF(CityName IN ('Mumbai', 'Bangalore', 'Chennai', 'Hyderabad'), 0, 1), CityName
You can't rely on the order in which you've entered the records to the table, because database tables are unsorted by nature, and without an order by clause, the order of the result set will be arbitrary.
For more information, read The “Natural order” misconception on my blog.
Try this:
Declare @t table (cityID int,cityname nvarchar(50))
insert into @t values (2,'Gujrat')
insert into @t values (4,'Surat')
insert into @t values (6,'Mumbai')
insert into @t values (3,'Bangalore')
insert into @t values (5,'Chennai')
insert into @t values (1,'Hyderabad')
select * from @t
order by case when cityname in ('Mumbai','Bangalore','Chennai','Hyderabad') then 0 else 1 END
A cleaner way of doing this:
Declare @t table (cityID int,cityname nvarchar(50))
Declare @DesireOrder table (id int identity,CityID int) -- instead of cityname
insert into @DesireOrder values (6),(3),(5),(1)
insert into @t values (2,'Gujrat')
insert into @t values (4,'Surat')
insert into @t values (6,'Mumbai')
insert into @t values (3,'Bangalore')
insert into @t values (5,'Chennai')
insert into @t values (1,'Hyderabad')
insert into @t values (8,'Delhi')
insert into @t values (7,'New Delhi')
select t.* from @t t
left join @DesireOrder O on t.cityid=O.cityid
-- cities without a match have a NULL o.id, and NULLs sort first in ascending order,
-- so push them to the end explicitly
order by case when o.id is null then 1 else 0 end,o.id,t.cityID
The main idea is @DesireOrder; the rest you can implement as per your requirement.

How to autoincrement the id without identity?

I'm trying to do a bulk insert from another table in SQL Server. My query currently looks like this:
INSERT INTO Table1(Id, Value)
SELECT ??, Value
FROM Table2;
Now, my problem is obviously what to replace ?? with. Id is an integer column without an identity property. For each inserted row, I would like Id to take the current max(Id) + 1.
Can I do that directly in my insert command?
If you were using a newer version of SQL Server (2005+) you could try ROW_NUMBER():
DECLARE @BASE INT
SET @BASE = (SELECT IsNull(MAX(ID),0) FROM Table1)
INSERT INTO Table1(Id, Value)
SELECT
@BASE + ROW_NUMBER() OVER (ORDER BY Value) ID,
Value
FROM Table2;
Since you are using SQL Server 2000, you could try something like the following:
DECLARE @BASE INT
SET @BASE = (SELECT IsNull(MAX(ID),0) FROM Table1)
INSERT INTO Table1(Id, Value)
SELECT
@BASE + (SELECT COUNT(*) FROM Table2 AS i2 WHERE i2.Value <= a.Value),
a.Value
FROM Table2 a
But this only works if Value in Table2 is unique; duplicate values would be assigned the same Id.
If Table2 has a primary key (field PK), then you could use:
INSERT INTO Table1(Id, Value)
SELECT
@BASE + (SELECT COUNT(*) FROM Table2 AS i2 WHERE i2.PK <= a.PK),
a.Value
FROM Table2 a
Here is one wicked way.
We create a temp table with an identity column to generate the new ids. This way we avoid a while loop.
DECLARE @FirstNextID INT
--TODO : Acquire table lock here on table1
SELECT @FirstNextID = ISNULL(MAX(Id), 0)
FROM Table1 --WITH(TABLOCK)
CREATE TABLE #TempTableWithID( Table2Id INT,
Table1FuturId INT IDENTITY(1, 1))
INSERT INTO #TempTableWithID(Table2Id)
SELECT Id --Here we use identity to generate table1 future id
FROM Table2
INSERT INTO Table1(Id, value)
SELECT Temp.Table1FuturId + @FirstNextID,
Table2.Value
FROM Table2
INNER JOIN #TempTableWithID AS Temp ON Table2.Id = Temp.Table2Id
--TODO : release table lock here on table1
DROP TABLE #TempTableWithID
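The locking TODOs in this answer matter for any max(Id) + 1 scheme: if two sessions read the same maximum at the same time, they will generate the same ids. One possible way to handle this, sketched here under the assumption that briefly blocking other sessions on Table1 is acceptable, is to read the maximum and insert inside a single transaction while holding an exclusive table lock. The variable name @Base is illustrative, and ROW_NUMBER() requires SQL Server 2005 or later.

```sql
BEGIN TRANSACTION;

DECLARE @Base INT;

-- TABLOCKX takes an exclusive table lock; HOLDLOCK keeps it until the
-- transaction ends, so no other session can insert in between.
SELECT @Base = ISNULL(MAX(Id), 0)
FROM Table1 WITH (TABLOCKX, HOLDLOCK);

INSERT INTO Table1 (Id, Value)
SELECT @Base + ROW_NUMBER() OVER (ORDER BY Value),
       Value
FROM Table2;

COMMIT TRANSACTION;
```

The lock is released at COMMIT, so the transaction should be kept as short as possible.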
If I'm understanding you correctly, this should work.
CREATE TABLE #tbl1 (ID int, Value float)
CREATE TABLE #tbl2 (ID int, Value float)
INSERT INTO #tbl2 values (4, 2.0)
INSERT INTO #tbl2 values (8, 3.0)
INSERT INTO #tbl2 values (6, 4.0)
INSERT INTO #tbl1 values (1,1.0)
INSERT INTO #tbl1 values (3,3)
INSERT INTO #tbl1 values (9,3)
/*meat and potatoes start*/
INSERT INTO #tbl1(Id, Value)
SELECT (SELECT MAX(ID) FROM #tbl1) + ROW_NUMBER() OVER (ORDER BY Value) ID, Value
FROM #tbl2;
/*meat and potatoes end*/
Select * From #tbl1
drop table #tbl1
drop table #tbl2
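Why not IDENT_CURRENT()?
SELECT IDENT_CURRENT('yourtablename')
It returns the last identity value generated for the table, so the next ID is that value plus the identity increment. But this only works if the ID column has IDENTITY turned on.
Or you can try a SEQUENCE and NEXT VALUE FOR (SQL Server 2012 and later), i.e.:
-- The sequence must exist before NEXT VALUE FOR can use it;
-- this assumes the Test schema already exists
CREATE SEQUENCE Test.CountBy1
START WITH 1
INCREMENT BY 1 ;
GO
CREATE TABLE Test.TestTable
(CounterColumn int PRIMARY KEY,
Name nvarchar(25) NOT NULL) ;
GO
INSERT Test.TestTable (CounterColumn,Name)
VALUES (NEXT VALUE FOR Test.CountBy1, 'Syed') ;
GO
SELECT * FROM Test.TestTable;
GO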

SQL Server: Multiple Output Clauses

I have two tables, Table_1 and Table_2.
Table_1 has columns PK (autoincrementing int) and Value (nchar(10)).
Table_2 has FK (int), Key (nchar(10)) and Value (nchar(10)).
That is to say, Table_1 is a table of data and Table_2 is a key-value store where one row in Table_1 may correspond to 0, 1 or more keys and values in Table_2.
I'd like to write code that programmatically builds up a query that inserts one row into Table_1 and a variable number of rows into Table_2 using the primary key from Table_1.
I can do it easily with one row:
INSERT INTO Table_1 ([Value])
OUTPUT INSERTED.PK, 'Test1Key', 'Test1Val' INTO Table_2 (FK, [Key], [Value])
VALUES ('Test')
But SQL doesn't seem to like the idea of having multiple rows. This fails:
INSERT INTO Table_1 ([Value])
OUTPUT INSERTED.PK, 'Test1Key', 'Test1Val' INTO Table_2 (FK, [Key], [Value])
OUTPUT INSERTED.PK, 'Test2Key', 'Test2Val' INTO Table_2 (FK, [Key], [Value])
OUTPUT INSERTED.PK, 'Test3Key', 'Test3Val' INTO Table_2 (FK, [Key], [Value])
VALUES ('Test')
Is there any way to make this work?
I had to put the code in an answer; in a comment it looks ugly...
CREATE TABLE #Tmp(PK int, value nchar(10))
INSERT INTO Table_1 ([Value])
OUTPUT INSERTED.PK, inserted.[Value] INTO #Tmp
SELECT 'Test'
INSERT INTO Table_2 (FK, [Key], Value)
SELECT PK, 'Test1Key', 'Test1Val' FROM #Tmp
UNION ALL SELECT PK, 'Test2Key', 'Test2Val' FROM #Tmp
UNION ALL SELECT PK, 'Test3Key', 'Test3Val' FROM #Tmp
Btw, SQL Server won't let you do it all in one query without some ugly hack...
Try putting the INSERTED.PK value into a parameter, then inserting into table 2 with 3 INSERT..VALUES or 1 INSERT..SELECT statement.
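A minimal sketch of that suggestion, assuming PK is an IDENTITY column and the insert into Table_1 adds exactly one row. SCOPE_IDENTITY() is used here to capture the new key, which is one way to do it and not necessarily what the answer had in mind; the variable name @NewPK is hypothetical.

```sql
DECLARE @NewPK int;

INSERT INTO Table_1 ([Value])
VALUES ('Test');

-- SCOPE_IDENTITY() returns the last identity value generated
-- in the current scope, i.e. the PK of the row just inserted.
SET @NewPK = SCOPE_IDENTITY();

-- Multi-row VALUES requires SQL Server 2008 or later; on older
-- versions use three separate INSERT statements instead.
INSERT INTO Table_2 (FK, [Key], [Value])
VALUES (@NewPK, 'Test1Key', 'Test1Val'),
       (@NewPK, 'Test2Key', 'Test2Val'),
       (@NewPK, 'Test3Key', 'Test3Val');
```

If several rows are inserted into Table_1 at once, a single variable is not enough, and the OUTPUT ... INTO temp table approach from the other answer is the better fit.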

SQL Server Another simple question

I have two temp tables, one with a [Description] column and one with an [Institution] column, and I want to combine them into one table.
They are both tables that look like this:
Table1; #T1
|Description|
blabla
blahblah
blagblag
Table2; #T2
|Institution|
Inst1
Inst2
Inst3
I want to get it like this:
Table3; #T3
|Description| |Institution|
blabla Inst1
blahblah Inst2
blagblag Inst3
They are already in sort order.
I just need to get them next to each other..
Last time I asked was something almost the same.
I used this query
Create Table #T3
(
[From] Datetime
,[To] Datetime
)
INSERT INTO #T3
SELECT #T1.[From]
, MIN(#T2.[To])
FROM #T1
JOIN #T2 ON #T1.[From] < #T2.[To]
GROUP BY #T1.[From]
Select * from #T3
It did work for the date values, but it won't work here ? :s
Thank you.
One thing that concerns me is that you say that the values "are already in sort order". There really is no default sort order -- if you don't specify a sort order, you are at the mercy of SQL Server to determine the order in which the data is returned. The solution below assumes that there is some way to sort the data such that the records "match up" (using the ORDER BY clauses).
Hope this helps,
John
-- Table 1 test data
Create Table #T1
(
[Description] nvarchar(30)
)
INSERT INTO #T1 ([Description]) VALUES ('desc1')
INSERT INTO #T1 ([Description]) VALUES ('desc2')
INSERT INTO #T1 ([Description]) VALUES ('desc3')
-- Table 2 test data
Create Table #T2
(
[Institution] nvarchar(30)
)
INSERT INTO #T2 (Institution) VALUES ('Inst1')
INSERT INTO #T2 (Institution) VALUES ('Inst2')
INSERT INTO #T2 (Institution) VALUES ('Inst3')
-- Create table 3
Create Table #T3
(
[Description] nvarchar(30),
[Institution] nvarchar(30)
);
-- Use CTEs to add row numbers to the data; use the row numbers to join the tables
-- you must specify the sort order for the data in the tables
WITH CTE1 (Description, RowNum) AS
(
SELECT [Description], ROW_NUMBER() OVER(ORDER BY [Description]) as RowNum
FROM #T1
),
CTE2 (Institution, RowNum) AS
(
SELECT Institution, ROW_NUMBER() OVER(ORDER BY Institution) as RowNum
FROM #T2
)
INSERT INTO #T3
SELECT CTE1.Description, CTE2.Institution
FROM CTE1
LEFT JOIN CTE2 ON CTE1.RowNum = CTE2.RowNum
Select * from #T3

Resources