How to Remove Duplicate Statement - sql-server

How to delete duplicate data row in SQL Server where there are not any unique value differences? I remain only one statement from my sales table (dbo.Sales)
ID DESCRIPTIONS QTY RATE AMOUNT
--------------------------------
1 APPLE 50 100 1000
1 APPLE 50 100 1000
1 APPLE 50 100 1000
1 APPLE 50 100 1000

We can try using a CTE here to arbitrarily delete all but one of the duplicates:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID, DESCRIPTIONS, QTY, RATE, AMOUNT
ORDER BY (SELECT NULL)) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;

You can delete like following.
DELETE A
FROM (SELECT Row_number()
OVER (
partition BY id, descriptions, qty, rate, amount
ORDER BY (SELECT 1)) AS rn
FROM table1) A
WHERE a.rn > 1
If you want to use CTE, you can try like following.
;WITH cte
AS (SELECT Row_number()
OVER(
partition BY id, descriptions, qty, rate, amount
ORDER BY (SELECT 1)) RN
FROM table1)
DELETE FROM cte
WHERE rn > 1

you can use this:
select distinct * into temp from tableName
delete from tableName
insert into tableName
select * from temp
drop table temp

I suggest to add a column like rn and feed it by row_number() over (Partition by ID, DESCRIPTIONS ,QTY, RATE, AMOUNT order by Id)
Now delete the data having rn not equal to 1
after completion drop that column... this is a one time solution if it is frequent that add a unique key in your table

Related

Getting Top 3 values for each id and status

I have data something like this,
ID Time Status
--- ---- ------
1 10 B
1 20 B
1 30 C
1 70 C
1 100 B
1 490 D
The desired result should be,
ID Time Status
1 490 D
1 100 B
1 70 C
This is how,I should get top 3 Time vales for ID and distinct status.
For this I Tried:-
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY TIME DESC) AS rn
FROM MyTable
)
SELECT id,TIME,Status
FROM cte
where rn<=3
But it doesn't meet my requirement iam gettng top 3 duplicates staus values,How can i solve this.Help!
Partition by status as well:
WITH cte AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id, status
ORDER BY TIME DESC
) AS rn
FROM MyTable t
)
SELECT id, TIME, Status
FROM t
WHERE rn <= 3;
The with ties argument of the top function will return all the of the rows which match the top values:
select top (3) with ties id, Time, Status from table1 order by Time desc
Alternatively, if you wanted to return 3 values only, but make sure they are always the same 3 values, then you will need to use something else as a tie-breaker. In this case, it looks like your id column could be unique.
select top (3) id, Time, Status from table1 order by Time desc, id
Try this:
select distinct id,max(time) over (partition by id,status) as time ,status
from mytable t order by time desc
Output -
id time status
1 490 D
1 100 B
1 70 C
EDIT:
select distinct TOP 3 id,max(time) over (partition by id,status) as time,status
from mytable t order by time desc
Try this:
SELECT TOP 3 * FROM [MyTable] WHERE [Id] = 1 ORDER BY [Time] DESC
This will give you top three records for ID = 1. For any other ID, just change the number in WHERE clause.
Additionally you can make some stored procedure to UNION all top three records for each ID - this can be done using looping through all distinct IDs in your table :)
Try using RANK.
You may use the below query to get your desired result.
select * from
(select *, RANK() over(partition by status order by time desc) as rn from myTable)T
where rn = 1
FIDDLE

How to test against a list of items in an if statement

I have a large table (130 columns). It is a monthly dataset that is separated by month (jan,feb,mar,...). every month I get a small set of duplicate rows. I would like to remove one of the rows, it does not matter which row to be deleted.
This query seems to work ok when I only select the ID that I want to filter the dups on, but when I select everything "*" from the table I end up with all of the rows, dups included. My goal is to filter out the dups and insert the result set into a new table.
SELECT DISTINCT a.[ID]
FROM MonthlyLoan a
JOIN (SELECT COUNT(*) as Count, b.[ID]
FROM MonthlyLoan b
GROUP BY b.[ID])
AS b ON a.[ID] = b.[ID]
WHERE b.Count > 1
and effectiveDate = '01/31/2017'
Any help will be appreciated.
This will show you all duplicates per ID:
;WITH Duplicates AS
(
SELECT ID
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM MonthlyLoan
)
SELECT ID,
rn
FROM Duplicates
WHERE rn > 1
Alternatively, you can set rn = 2 to find the immediate duplicate per ID.
Since your ID is dupped (A DUPPED ID!!!!)
all you need it to use the HAVING clause in your aggregate.
See the below example.
declare #tableA as table
(
ID int not null
)
insert into #tableA
values
(1),(2),(2),(3),(3),(3),(4),(5)
select ID, COUNT(*) as [Count]
from #tableA
group by ID
having COUNT(*) > 1
Result:
ID Count
----------- -----------
2 2
3 3
To insert the result into a #Temporary Table:
select ID, COUNT(*) as [Count]
into #temp
from #tableA
group by ID
having COUNT(*) > 1
select * from #temp

tsq - take the highest int in duplicate row

I have the following rows in a table
name, tagid
-------
test1,1
test1,100
test2,2
test2,200
test3,3
test3,300
There are duplicates in the name.
Is there a way to select unique names by taking the highest tagid of each group?
select name,max(tagid) as highest_tagid
from tbl
group by name
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY tagid DESC) AS rn
FROM table_1
)
SELECT *
FROM cte
WHERE rn = 1

Subtract top two rows from one column using one id

does anyone know how can I subtract top two rows from one column only using one id? Here's my sample query:
SELECT top 2 a.consumption,
coalesce(a.consumption -
(SELECT b.consumption
FROM tbl_t_billing b
WHERE b.id = a.id + 1), a.consumption) AS diff
FROM tbl_t_billing a
WHERE a.customerId = '5'
ORDER BY a.dateCreated DESC
I want to know how to get the difference between the top 2 rows using one id from the consumption column using the customerId #5. I've tried but I can't get the right query for that. Can somebody help me please? Thanks!
try this:
;with cte as
(
select consumption, customerId,
row_number() over (partiton by customerid order by datecreated desc) rn
from tbl_t_billing where customerId = '5'
)
select a.customerId, a.consumption,
coalesce((a.consumption - b.consumption), a.consumption) consumption_diff
from cte a left outer join cte b on a.rn + 1 = b.rn
where a.rn = 1
declare #tbl_t_billing table(consumption int, customerId int, datecreated datetime)
insert into #tbl_t_billing
values
(10,5,'20100101'),
(7,5,'20000101'),
(9,4,'20100101'),
(5,4,'20000101'),
(8,3,'20100101'),
(3,3,'20000101'),
(7,2,'20100101'),
(3,2,'20000101'),
(4,1,'20100101'),
(2,1,'20000101')
-- get the difference between the last two consumption values for each customerId
select
customerId,
sum(consumption) diff
from(
select
customerId,
consumption *
case row_number() over(partition by customerId order by datecreated desc)
when 1 then 1 when 2 then -1
end consumption
from #tbl_t_billing
) t
group by customerId

Filter first then select page

How to first filter the result based on params then to apply where-between?
Some thing like
With Results as
(
Select colName,Title, Row_Number(Over...) as row from a table where colName=5
)
Select * from Results
where
row between #first and #last
But it does not works. I need to move my where colName=5 from with clause to outside then I got wrong data as It first get rows between #first n #last then search for colName=5.
Also I want count of Results.
Any idea?
You can use COUNT(*) OVER() to get the count of the unfiltered results
WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN,
count(*) over() AS [Count]
from master..spt_values
)
SELECT name, number,[Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Returns
name number Count
----------------------------------- ----------- -----------
VIEW 8278 2506
VIEW 8278 2506
view 2 2506
varchar 3 2506
varbinary 1 2506
This has performance implications though. You might want to just calculate the COUNT up front and cache it somewhere rather than recalculating it for every page request.
Your ROW_NUMBER syntax is incorrect. It should be this:
With Results as
(
SELECT colName, Title, ROW_NUMBER() OVER (ORDER BY ...) AS RN
FROM your_table
WHERE colName = 5
)
SELECT * FROM Results
WHERE rn BETWEEN #first AND #last
ORDER BY rn
See the documentation for more information.
I use approach very similar to Martin Smiths (currently selected answer) and at least in the tests I've made it gives better performance results.
; WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN
from master..spt_values
)
SELECT name, number, (SELECT COUNT(*) FROM cte) AS [Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Run this and his queries side by side and compare execution plans.

Resources