Select Duplicate Records - sql-server

I want to retrieve only Duplicated records not unique records.
Suppose I have data which consists of as below
Ids Names
1 A
2 B
1 A
I want like output like the following:
Sno Id Name
1 1 A
2 1 A

Try this:
DECLARE #DataSource TABLE
(
[ID] INT
,[name] CHAR(1)
,[value] CHAR(2)
);
INSERT INTO #DataSource ([ID], [name], [value])
VALUES (1, 'A', 'x1')
,(2, 'B', 'x2')
,(1, 'A', 'x3');
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY [ID], [name]) AS [Count]
FROM #DataSource
)
SELECT *
FROM Datasource
WHERE [Count] > 1;
The grouping part is done in the PARTITION BY part of the window function. So, basically, we are counting records for each unique ID - name pairs. Of couse, you are able to add more columns columns here.

SELECT Id, Names
FROM T
GROUP BY Id,Name
HAVING COUNT(*) >1

like your request, you need to create a new column [SNo] that is partitioned on the orignal columns (Names, Id). Those with [SNo] >1 are duplicates. To Filter, just get RCount>1.
See a mockup below:
DECLARE #Records TABLE (Id int, Names VARCHAR(10))
INSERT INTO #Records
SELECT 1, 'A' UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 1, 'A'
----To Get Duplicates -----
SELECT *
FROM
(
SELECT
SNo=ROW_NUMBER()over(PARTITION BY Names,Id order by Id),
RCount=COUNT(*) OVER (PARTITION BY [ID], Names),
*
FROM
#Records
)M
WHERE
RCount>1

Related

T-SQL - Customer Linking

Please run the below code, these are all the same Customer because 2 of them have the same TaxNumber while another one matches one based on CompanyName. I need to link them all and set the ParentCompanyID based on who was created first. I am struggling to get them linked.
CREATE TABLE #Temp
(
CustomerID INT,
CustomerName VARCHAR(20),
CustomerTaxNumber INT,
CreatedDate DATE
)
INSERT INTO #Temp
VALUES (8, 'Company PTY',1234, '2019-09-20'),
(2, 'Company PT', 1234, '2019-09-24'),
(3, 'Company PTY',NULL, '2019-09-29')
SELECT * FROM #Temp
Below is the result that I require....
Any help will be appreciated.
Using case expression with first_value can give you the desired results:
SELECT CustomerID, CustomerName, CustomerTaxNumber, CreatedDate,
CASE WHEN CustomerTaxNumber IS NULL THEN
FIRST_VALUE(CustomerID) OVER(PARTITION BY CustomerName ORDER BY CreatedDate)
ELSE
FIRST_VALUE(CustomerID) OVER(PARTITION BY CustomerTaxNumber ORDER BY CreatedDate)
END As ParentCompanyID
FROM #Temp
Try this:
CREATE TABLE #Temp
(
CustomerID INT,
CustomerName VARCHAR(20),
CustomerTaxNumber INT,
CreatedDate DATE
)
INSERT INTO #Temp
VALUES (8, 'Company PTY',1234, '2019-09-20'),
(2, 'Company PT', 1234, '2019-09-24'),
(3, 'Company PTY',NULL, '2019-09-29')
SELECT DS.[CreatedDate] AS [FirstEntry]
,DS.[CustomerID] AS [ParentCompanyID]
,#Temp.*
FROM #Temp
CROSS APPLY
(
SELECT TOP 1 *
FROM #Temp
ORDER BY CreatedDate
) DS
DROP TABLE #Temp
You are condition is pretty simple - get the first record. If you need to group the records in some way, you can add additional filtering in the CROSS APPLY clause.

How to get records which has more than one entries on another table

An example scenario for my question would be:
How to get all persons who has multiple address types?
Now here's my sample data:
CREATE TABLE #tmp_1 (
ID uniqueidentifier PRIMARY KEY
, FirstName nvarchar(max)
, LastName nvarchar(max)
)
CREATE TABLE #tmp_2 (
SeedID uniqueidentifier PRIMARY KEY
, SomeIrrelevantCol nvarchar(max)
)
CREATE TABLE #tmp_3 (
KeyID uniqueidentifier PRIMARY KEY
, ID uniqueidentifier REFERENCES #tmp_1(ID)
, SeedID uniqueidentifier REFERENCES #tmp_2(SeedID)
, SomeIrrelevantCol nvarchar(max)
)
INSERT INTO #tmp_1
VALUES
('08781F73-A06B-4316-B6A5-802ED58E54BE', 'AAAAAAA', 'aaaaaaa'),
('4EC71FCE-997C-46AA-B119-6C5A2545DDC2', 'BBBBBBB', 'bbbbbbb'),
('B0726ABF-738E-48BC-95CB-091C9D731A0E', 'CCCCCCC', 'ccccccc'),
('6C6CE284-A63C-49D2-B2CC-F25C9CBC8FB8', 'DDDDDDD', 'ddddddd')
INSERT INTO #tmp_2
VALUES
('4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'Value1'),
('4C891FE9-60B6-41BE-A64B-11A9A8B58AB2', 'Value2'),
('6F6EFED6-8EA0-4F70-A63F-6A103D0A71BD', 'Value3')
INSERT INTO #tmp_3
VALUES
(NEWID(), '08781F73-A06B-4316-B6A5-802ED58E54BE', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'sdfsdgdfbgcv'),
(NEWID(), '08781F73-A06B-4316-B6A5-802ED58E54BE', '4C891FE9-60B6-41BE-A64B-11A9A8B58AB2', 'asdfadsas'),
(NEWID(), '08781F73-A06B-4316-B6A5-802ED58E54BE', '4C891FE9-60B6-41BE-A64B-11A9A8B58AB2', 'xxxxxeeeeee'),
(NEWID(), '4EC71FCE-997C-46AA-B119-6C5A2545DDC2', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'sdfsdfsd'),
(NEWID(), 'B0726ABF-738E-48BC-95CB-091C9D731A0E', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'zxczxcz'),
(NEWID(), 'B0726ABF-738E-48BC-95CB-091C9D731A0E', '6F6EFED6-8EA0-4F70-A63F-6A103D0A71BD', 'eerwerwe'),
(NEWID(), '6C6CE284-A63C-49D2-B2CC-F25C9CBC8FB8', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'vbcvbcvbcv')
Which gives you:
This is my attempt:
SELECT
t1.*
, Cnt -- not really needed. Just added for visual purposes
FROM #tmp_1 t1
LEFT JOIN (
SELECT
xt.ID
, COUNT(1) Cnt
FROM (
SELECT
#tmp_3.ID
, COUNT(1) as Cnt
FROM #tmp_3
GROUP BY ID, SeedID
) xt
GROUP BY ID
) t2
ON t1.ID = t2.ID
WHERE t2.Cnt > 1
Which gives:
ID FirstName LastName Cnt
B0726ABF-738E-48BC-95CB-091C9D731A0E CCCCCCC ccccccc 2
08781F73-A06B-4316-B6A5-802ED58E54BE AAAAAAA aaaaaaa 2
Although this gives me the correct results, I'm afraid that this query is not the right way to do this performance-wise because of the inner queries. Any input is very much appreciated.
NOTE:
A person can have multiple address of the same address types.
"Person-Address" is not the exact use-case. This is just an example.
The Cnt column is not really needed in the result set.
The way you have named your sample tables and data help little in understanding the problem.
I think you want all IDs which have 2 or more SomeIrrelevantCol values in the last table?
This can be done by:
select * from #tmp_1
where ID in
(
select ID
from #tmp_3
group by ID
having count(distinct SomeIrrelevantCol)>=2
)

How to write a SQL script that deletes duplicate posts

I have a table with these columns:
id (pk, int identity), imei (varchar), name (varchar), lastconnected (datetime)
Some of the entries in this table have the same name and imei, but different id and different lastconnected date.
How can I effectively filter out all entries that have duplicates (with a SQL script), and then delete the one with the latest lastconnected date?
A simple ROW_NUMBER and DELETE should do the trick:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY imei, [name] ORDER BY lastconnected DESC)
FROM dbo.YourTable
)
DELETE FROM CTE
WHERE RN = 1;
This is easy and will solve your problem
DECLARE #table TABLE
(
id int,
name varchar(10),
imei varchar(10)
)
insert into #table select 1, 'a','a'
insert into #table select 2, 'b','a'
insert into #table select 3, 'c','a'
insert into #table select 4, 'a','a'
insert into #table select 5, 'c','a'
insert into #table select 6, 'a','a'
insert into #table select 7, 'c','a'
insert into #table select 8, 'a','a'
WHILE (exists (select '' from #table group by name , imei having count(*) > 1))
BEGIN
delete from #table where id in (
select max(id) from #table group by imei , name having count(*) > 1)
End
select * from #table
My first instinct is to use RANK(). This will delete all duplicates, not just the most recent, in cases where things are duplicated multiple times.
delete a
from (
select id, imei, name, lastconnected, RANK() over(partition by imei, name order by lastconnected) as [rank] from #temp
) as a
where a.rank>1
It selects the maximum of the date for each combination of name and iemi and then deletes that particular row.
DELETE FROM yourtablee
WHERE (lastconnecteddate,name,imei) in
(SELECT max(lastconnecteddate), name,imei
FROM yourtable
GROUP BY name,imei)

SQLServer : Grouping and Replacing a COLUMN value with the DATA from other table, without UDF

I would like to replace the numbers in #CommentsTable column "Comments" with the equivalent text from #ModTable table, without using UDF in a single SELECT. May with a CTE. Tried STUFF with REPLACE, but no luck.
Any suggestions would be a great help!
Sample:
DECLARE #ModTable TABLE
(
ID INT,
ModName VARCHAR(10),
ModPos VARCHAR(10)
)
DECLARE #CommentsTable TABLE
(
ID INT,
Comments VARCHAR(100)
)
INSERT INTO #CommentsTable
VALUES (1, 'MyFirst 5 Comments with 6'),
(2, 'MySecond comments'),
(3, 'MyThird comments 5')
INSERT INTO #ModTABLE
VALUES (1, '[FIVE]', '5'),
(1, '[SIX]', '6'),
(1, '[ONE]', '1'),
(1, '[TWO]', '2')
SELECT T1.ID, <<REPLACED COMMENTS>>
FROM #CommentsTable T1
GROUP BY T1.ID, T1.Comments
**Expected Result:**
ID Comments
1 MyFirst [FIVE] Comments with [SIX]
2 MySecond comments
3 MyThird comments [FIVE]
Create a cursor, span over the #ModTable and do each replacement a time
DECLARE replcursor FOR SELECT ModPos, ModName FROM #ModTable;
OPEN replcursor;
DECLARE modpos varchar(100) DEFAULT "";
DECLARE modname varchar(100) DEFAULT "";
get_loop: LOOP
FETCH replcursor INTO #modpos, #modname
SELECT T1.ID, REPLACE(T1.Comments, #modpos, #modname)
FROM #CommentsTable T1
GROUP BY T1.ID, T1.Comments
END LOOP get_loop;
Of course, you can store the results in a temp table and get the results altogether in the end of loop.
You can use a while loop to iterate over the records and the mods. I slightly modified your #ModTable to have unique values for ID. If this is not your data structure, then you can use a window function like ROW_NUMBER() to get a unique value over which you can iterate.
Revised script example:
DECLARE #ModTable TABLE
(
ID INT,
ModName VARCHAR(10),
ModPos VARCHAR(10)
)
DECLARE #CommentsTable TABLE
(
ID INT,
Comments VARCHAR(100)
)
INSERT INTO #CommentsTable
VALUES (1, 'MyFirst 5 Comments with 6'),
(2, 'MySecond comments'),
(3, 'MyThird comments 5')
INSERT INTO #ModTABLE
VALUES (1, '[FIVE]', '5'),
(2, '[SIX]', '6'),
(3, '[ONE]', '1'),
(4, '[TWO]', '2')
declare #revisedTable table (id int, comments varchar(100))
declare #modcount int = (select count(*) from #ModTable)
declare #commentcount int = (select count(*) from #CommentsTable)
declare #currentcomment varchar(100) = ''
while #commentcount > 0
begin
set #modcount = (select count(*) from #ModTable)
set #currentcomment = (select Comments from #CommentsTable where ID = #commentcount)
while #modcount > 0
begin
set #currentcomment = REPLACE( #currentcomment,
(SELECT TOP 1 ModPos FROM #ModTable WHERE ID = #modcount),
(SELECT TOP 1 ModName FROM #ModTable WHERE ID = #modcount))
set #modcount = #modcount - 1
end
INSERT INTO #revisedTable (id, comments)
SELECT #commentcount, #currentcomment
set #commentcount = #commentcount - 1
end
SELECT *
FROM #revisedTable
order by id
I think the will work even though I generally avoid recursive queries. It assumes that you have consecutive ids though:
with Comments as
(
select ID, Comments, 0 as ConnectID
from #CommentsTable
union all
select ID, replace(c.Comments, m.ModPos, m.ModName), m.ConnectID
from Comments c inner join #ModTable m on m.ConnectID = c.ConnectID + 1
)
select * from Comments
where ConnectID = (select max(ID) from #ModTable)
=> CLR Function()
As I have lot of records in "CommentsTable" and the "ModTable" would have multiple ModName for each comments, finally decided to go with CLR Function. Thanks all of you for the suggestions and pointers.

SQL Server Another simple question

I have 2 temp Tables [Description] and [Institution], I want to have these two in one table.
They are both tables that look like this:
Table1; #T1
|Description|
blabla
blahblah
blagblag
Table2; #T2
|Institution|
Inst1
Inst2
Inst3
I want to get it like this:
Table3; #T3
|Description| |Institution|
blabla Inst1
blahblah Inst2
blagblag Inst3
They are already in sort order.
I just need to get them next to each other..
Last time I asked was something almost the same.
I used this query
Create Table #T3
(
[From] Datetime
,[To] Datetime
)
INSERT INTO #T3
SELECT #T1.[From]
, MIN(#T2.[To])
FROM #T1
JOIN #T2 ON #T1.[From] < #T2.[To]
GROUP BY #T1.[From]
Select * from #T3
It did work for the date values, but it won't work here ? :s
Thank you.
One thing that concerns me is that you say that the values "are already in sort order". There really is no default sort order -- if you don't specify a sort order, you are at the mercy of SQL Server to determine the order in which the data is returned. The solution below assumes that there is some way to sort the data such that the records "match up" (using the ORDER BY clauses).
Hope this helps,
John
-- Table 1 test data
Create Table #T1
(
[Description] nvarchar(30)
)
INSERT INTO #T1 ([Description]) VALUES ('desc1')
INSERT INTO #T1 ([Description]) VALUES ('desc2')
INSERT INTO #T1 ([Description]) VALUES ('desc3')
-- Table 2 test data
Create Table #T2
(
[Institution] nvarchar(30)
)
INSERT INTO #T2 (Institution) VALUES ('Inst1')
INSERT INTO #T2 (Institution) VALUES ('Inst2')
INSERT INTO #T2 (Institution) VALUES ('Inst3')
-- Create table 3
Create Table #T3
(
[Description] nvarchar(30),
[Institution] nvarchar(30)
);
-- Use CTE2 to add row numbers to the data; use the row numbers to join the tables
-- you must specify the sort order for the data in the tables
WITH CTE1 (Description, RowNum) AS
(
SELECT [Description], ROW_NUMBER() OVER(ORDER BY [Description]) as RowNum
FROM #T1
),
CTE2 (Institution, RowNum) AS
(
SELECT Institution, ROW_NUMBER() OVER(ORDER BY Institution) as RowNum
FROM #T2
)
INSERT INTO #T3
SELECT CTE1.Description, CTE2.Institution
FROM CTE1
LEFT JOIN CTE2 ON CTE1.RowNum = CTE2.RowNum
Select * from #T3

Resources