Remove duplicate records from views in SQL Server

How can I remove duplicate records from a view? I need to keep them in the physical table, but I don't want the duplicates to appear in the view.
Here is the query I used:
CREATE VIEW myview
AS
SELECT DISTINCT *
FROM [roug].[dbo].[Table_1]
ORDER BY id
For the table:
id name age
----------
c1 ann 12
u2 joe 15
c1 ann 12
c1 ann 12
u5 dev 13
u3 Jim 16
u3 Jim 16

You can use either DISTINCT or ROW_NUMBER(), like this:
create view myview as
WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY [Id],[Name],[Age] ORDER BY ID),
*
FROM [roug].[dbo].[Table_1]
)
SELECT
[Id],[Name],[Age]
FROM CTE
WHERE RN = 1
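
To check that such a view hides the duplicates while the base table keeps them, here is a minimal sketch run against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server. That is an assumption made only so the example is runnable (it needs SQLite 3.25 or later for window functions), and the [roug].[dbo] prefix is dropped.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (id TEXT, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Table_1 VALUES (?, ?, ?)",
    [("c1", "ann", 12), ("u2", "joe", 15), ("c1", "ann", 12), ("c1", "ann", 12),
     ("u5", "dev", 13), ("u3", "Jim", 16), ("u3", "Jim", 16)],
)

# Number the rows inside each group of identical (id, name, age) values
# and expose only the first row of every group through the view.
conn.execute("""
    CREATE VIEW myview AS
    WITH cte AS (
        SELECT id, name, age,
               ROW_NUMBER() OVER (PARTITION BY id, name, age ORDER BY id) AS rn
        FROM Table_1
    )
    SELECT id, name, age FROM cte WHERE rn = 1
""")

view_rows = conn.execute("SELECT id, name, age FROM myview ORDER BY id").fetchall()
table_count = conn.execute("SELECT COUNT(*) FROM Table_1").fetchone()[0]

print(view_rows)    # one row per distinct (id, name, age)
print(table_count)  # the base table still holds all 7 rows
```

Note that the ordering is applied by the query that reads the view, not inside the view definition.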

If you want to delete data, then you should do it in the source table, not the view. A standard approach for de-duping is via a CTE. Try:
;
WITH cte
AS (SELECT id
, name
, age
, ROW_NUMBER() OVER (PARTITION BY id, name, age ORDER BY id) RN
FROM Table_1
)
DELETE FROM cte
WHERE RN > 1
It depends on whether you want to delete the actual data or just not display it in the view.

Related

How to Remove Duplicate Statement

How can I delete duplicate rows in SQL Server when there is no unique value to tell them apart? I want to keep only one row in my sales table (dbo.Sales):
ID DESCRIPTIONS QTY RATE AMOUNT
--------------------------------
1 APPLE 50 100 1000
1 APPLE 50 100 1000
1 APPLE 50 100 1000
1 APPLE 50 100 1000

We can try using a CTE here to arbitrarily delete all but one of the duplicates:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID, DESCRIPTIONS, QTY, RATE, AMOUNT
ORDER BY (SELECT NULL)) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;

You can delete like this:
DELETE A
FROM (SELECT Row_number()
OVER (
partition BY id, descriptions, qty, rate, amount
ORDER BY (SELECT 1)) AS rn
FROM table1) A
WHERE a.rn > 1
If you want to use a CTE, you can try the following:
;WITH cte
AS (SELECT Row_number()
OVER(
partition BY id, descriptions, qty, rate, amount
ORDER BY (SELECT 1)) RN
FROM table1)
DELETE FROM cte
WHERE rn > 1

You can use this:
select distinct * into temp from tableName
delete from tableName
insert into tableName
select * from temp
drop table temp
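
The copy, clear, and reload steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory database instead of SQL Server, so SELECT ... INTO is replaced by SQLite's CREATE TABLE ... AS, and the staging table name sales_dedup is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (ID INTEGER, DESCRIPTIONS TEXT, QTY INTEGER, "
    "RATE INTEGER, AMOUNT INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [(1, "APPLE", 50, 100, 1000)] * 4,  # four identical rows, as in the question
)
conn.commit()

# The connection context manager commits if every statement succeeds
# and rolls back the pending changes if one of them raises.
with conn:
    # 1. Copy the distinct rows into a staging table (hypothetical name).
    conn.execute("CREATE TEMP TABLE sales_dedup AS SELECT DISTINCT * FROM Sales")
    # 2. Empty the original table.
    conn.execute("DELETE FROM Sales")
    # 3. Reload it from the staging table.
    conn.execute("INSERT INTO Sales SELECT * FROM sales_dedup")
    # 4. Drop the staging table.
    conn.execute("DROP TABLE sales_dedup")

remaining = conn.execute("SELECT * FROM Sales").fetchall()
print(remaining)
```

Because the table is briefly empty between steps 2 and 3, it seems safer to run the steps as one unit of work, which is why they are grouped here.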

I suggest adding a column such as rn and populating it with row_number() over (Partition by ID, DESCRIPTIONS, QTY, RATE, AMOUNT order by Id).
Then delete the rows where rn is not equal to 1.
After that, drop the column. This is a one-time solution; if duplicates appear frequently, add a unique key to your table.

Update table but skipping some rows with specific condition

I have a table called body_scan that looks like this:
body_no tag
--------------------
1 noscan
2 noscan
3 missing
4 noscan
5 missing
I also have a list that I can load into a temp table like so
tag_no
------
aaa
bbb
ccc
What I need to be able to do is to update the body_scan table with the tag numbers in the temporary table.
You will notice that there are only 3 tags in the temp table but 5 rows in the body_scan table. I need to replace the tag value "noscan" with values from the temp table and leave the "missing" rows as they are.
The order of the tags in the temporary table is the same as the order of body_no in the body_scan table.
I did consider the row_number() function, but I'm not sure how to define the join correctly.
How do I achieve this?
The desired result is :
body_no tag
-------------------
1 aaa
2 bbb
3 missing
4 ccc
5 missing

First, you need to preserve the input file order of the data by adding an identity field to temp_table. (Note that some ETL tools insert data in parallel, which can scramble the order, so you might even need to add this column to the file.)
Once you've done that, you need to generate a key in body_scan that you can join to. This is simply ROW_NUMBER() over the existing table, excluding the missing rows.
This returns each row and the number it should be matched to in temp_table:
SELECT
body_no,
ROW_NUMBER() OVER (ORDER BY body_no) RN
FROM body_scan
WHERE tag<> 'missing';
This joins in the temp table (assumes your ordinal column is called RowID)
SELECT T1.body_no, T1.tag, T1.RN, T2.tag_no
FROM
(
SELECT
body_no,tag,
ROW_NUMBER() OVER (ORDER BY body_no) RN
FROM body_scan
WHERE tag<> 'missing'
) T1
INNER JOIN
temp_table T2
ON T1.RN=T2.RowID;
This updates it back to the table:
UPDATE TGT
SET tag=SRC.tag_no
FROM body_scan TGT
INNER JOIN
(
SELECT T1.body_no, T2.tag_no
FROM
(
SELECT
body_no,tag,
ROW_NUMBER() OVER (ORDER BY body_no) RN
FROM body_scan
WHERE tag<> 'missing'
) T1
INNER JOIN
temp_table T2
ON T1.RN=T2.RowID
) SRC
ON SRC.body_no=TGT.body_no;
(There's half a dozen ways to write that final statement but I prefer this way as you can see the dataset you're updating from in the subselect)
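
As a rough check of the pairing logic, the sketch below rebuilds both tables from the question and applies the same idea: number the non-missing rows by body_no, join that number to the ordinal column of the temp table, and write the matched tags back. It uses Python's sqlite3 module with an in-memory database rather than SQL Server (an assumption, requiring SQLite 3.25 or later). The RowID column is filled automatically in insertion order, and the write-back is done with a parameterized UPDATE from Python instead of UPDATE ... FROM.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE body_scan (body_no INTEGER PRIMARY KEY, tag TEXT)")
conn.executemany(
    "INSERT INTO body_scan VALUES (?, ?)",
    [(1, "noscan"), (2, "noscan"), (3, "missing"), (4, "noscan"), (5, "missing")],
)

# RowID plays the role of the identity column that preserves the file order.
conn.execute("CREATE TABLE temp_table (RowID INTEGER PRIMARY KEY, tag_no TEXT)")
conn.executemany(
    "INSERT INTO temp_table (tag_no) VALUES (?)", [("aaa",), ("bbb",), ("ccc",)]
)

# Pair the n-th non-missing body with the n-th tag.
pairs = conn.execute("""
    SELECT t.tag_no, b.body_no
    FROM (
        SELECT body_no,
               ROW_NUMBER() OVER (ORDER BY body_no) AS rn
        FROM body_scan
        WHERE tag <> 'missing'
    ) AS b
    JOIN temp_table AS t ON t.RowID = b.rn
""").fetchall()

# Write each matched tag back to its body_no.
conn.executemany("UPDATE body_scan SET tag = ? WHERE body_no = ?", pairs)
conn.commit()

result = conn.execute("SELECT body_no, tag FROM body_scan ORDER BY body_no").fetchall()
print(result)
```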

I couldn't follow your explanation, so I worked out my own query (in SQL Server 2012) to produce your output table:
update a
set a.tag = t.tag_no
from (
select m.*, ROW_NUMBER() over(partition by m.tag order by m.rn)trn from(
select *, row_number() over(partition by (select null) order by (select null)) rn from body_scan --set order what the order of actual table's order
) m --set row number for noscan rows
) a
join(
select *, ROW_NUMBER() over(order by (select null)) rn from #temp --set order what the order of actual table order
) t
on a.trn = t.rn and a.tag <> 'missing' -- join to noscan rows using row numbers
OUTPUT:
body_no tag
--------------
1 aaa
2 bbb
3 missing
4 ccc
5 missing

Getting Top 3 values for each id and status

I have data something like this,
ID Time Status
--- ---- ------
1 10 B
1 20 B
1 30 C
1 70 C
1 100 B
1 490 D
The desired result should be,
ID Time Status
1 490 D
1 100 B
1 70 C
That is, I should get the top 3 Time values for each ID, with distinct statuses.
For this I tried:
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY TIME DESC) AS rn
FROM MyTable
)
SELECT id,TIME,Status
FROM cte
where rn<=3
But it doesn't meet my requirement: I am getting duplicate status values in the top 3. How can I solve this?

Partition by status as well:
WITH cte AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id, status
ORDER BY TIME DESC
) AS rn
FROM MyTable t
)
SELECT id, TIME, Status
FROM cte
WHERE rn <= 3;

The WITH TIES argument of the TOP clause returns all of the rows that match the top values:
select top (3) with ties id, Time, Status from table1 order by Time desc
Alternatively, if you wanted to return 3 values only, but make sure they are always the same 3 values, then you will need to use something else as a tie-breaker. In this case, it looks like your id column could be unique.
select top (3) id, Time, Status from table1 order by Time desc, id

Try this:
select distinct id,max(time) over (partition by id,status) as time ,status
from mytable t order by time desc
Output -
id time status
1 490 D
1 100 B
1 70 C
EDIT:
select distinct TOP 3 id,max(time) over (partition by id,status) as time,status
from mytable t order by time desc
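
One reading of the requirement is: keep the latest Time for each (ID, Status) pair, then take the three latest of those per ID. The sketch below spells that out in two steps, with GROUP BY for the first and ROW_NUMBER() for the second, so that it also works when the table holds more than one ID. It is a variation on the query above, not a replacement for it. It runs on Python's sqlite3 module with an in-memory database instead of SQL Server (an assumption, requiring SQLite 3.25 or later), and the names latest, ranked, and latest_time are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, time INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 10, "B"), (1, 20, "B"), (1, 30, "C"), (1, 70, "C"), (1, 100, "B"), (1, 490, "D")],
)

top3 = conn.execute("""
    WITH latest AS (
        -- Step 1: one row per (id, status), carrying its latest time.
        SELECT id, status, MAX(time) AS latest_time
        FROM MyTable
        GROUP BY id, status
    ),
    ranked AS (
        -- Step 2: rank those rows per id, newest first.
        SELECT id, latest_time, status,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY latest_time DESC) AS rn
        FROM latest
    )
    SELECT id, latest_time, status
    FROM ranked
    WHERE rn <= 3
    ORDER BY id, latest_time DESC
""").fetchall()

print(top3)
```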

Try this:
SELECT TOP 3 * FROM [MyTable] WHERE [Id] = 1 ORDER BY [Time] DESC
This will give you top three records for ID = 1. For any other ID, just change the number in WHERE clause.
Additionally, you can write a stored procedure that UNIONs the top three records for each ID by looping through all distinct IDs in your table.

Try using RANK.
You may use the below query to get your desired result.
select * from
(select *, RANK() over(partition by status order by time desc) as rn from myTable)T
where rn = 1

Join two tables with conditions depending on multiples columns

In SQL Server 2008, I want to join two tables on a key that might have duplicates, but the match is unique when combined with information from other columns.
For a simplified purchase record example,
Table A:
UserId PayDate Amount
1 2015 100
1 2010 200
2 2014 150
Table B:
UserId OrderDate Count
1 2009 4
1 2014 2
2 2013 5
Desired Result:
UserId OrderDate PayDate Amount Count
1 2009 2010 200 4
1 2014 2015 100 2
2 2013 2014 150 5
It's guaranteed that:
Table A and Table B have the same number of rows, and UserId in both tables covers the same set of numbers.
For any UserId, PayDate is always later than OrderDate.
Rows with the same UserId are matched by sorted sequence of date. For example, Row 1 in Table A should match Row 2 in Table B.
My idea is to sort both tables by date, add another Id column, and then join on this Id column. But I am not authorized to write anything into the database. How can I do this task?

Row_Number() will be your friend here. It allows you to add a virtual sequencing to your resultset.
Run this and study the output:
SELECT UserID
, OrderDate
, "Count" As do_not_use_reserved_words_for_column_names
, Row_Number() OVER (PARTITION BY UserID ORDER BY OrderDate) As sequence
FROM table_b
The PARTITION BY determines when the counter should be "reset" i.e. it should restart after a change of UserID
The ORDER BY, well, you've guessed it - determines the order of the sequence!
Pull this all together:
; WITH payments AS (
SELECT UserID
, PayDate
, Amount
, Row_Number() OVER (PARTITION BY UserID ORDER BY PayDate) As sequence
FROM table_a
)
, orders AS (
SELECT UserID
, OrderDate
, "Count" As do_not_use_reserved_words_for_column_names
, Row_Number() OVER (PARTITION BY UserID ORDER BY OrderDate) As sequence
FROM table_b
)
SELECT orders.UserID
, orders.OrderDate
, orders.do_not_use_reserved_words_for_column_names
, payments.PayDate
, payments.Amount
FROM orders
LEFT
JOIN payments
ON payments.UserID = orders.UserID
AND payments.sequence = orders.sequence
P.S. I've opted for an outer join because I assumed that there's not always going to be a payment for every order.

Try:
;WITH t1
AS
(
SELECT UserId, PayDate, Amount,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY PayDate) AS RN
FROM TableA
),
t2
AS
(
SELECT UserId, OrderDate, [Count],
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY OrderDate) AS RN
FROM TableB
)
SELECT t1.UserId, t2.OrderDate, t1.PayDate, t1.Amount, t2.[Count]
FROM t1
INNER JOIN t2
ON t1.UserId = t2.UserId AND t1.RN = t2.RN
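
To see the sequence join produce the desired result, here is a small sketch that loads the sample rows and runs the same pattern. The join itself is a plain SELECT, so it writes nothing to the source tables. The setup is an assumption made for illustration: Python's sqlite3 module with an in-memory database instead of SQL Server 2008 (SQLite 3.25 or later is needed for window functions), and the alias order_count for the Count column is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (UserId INTEGER, PayDate INTEGER, Amount INTEGER)")
conn.execute('CREATE TABLE TableB (UserId INTEGER, OrderDate INTEGER, "Count" INTEGER)')
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)",
                 [(1, 2015, 100), (1, 2010, 200), (2, 2014, 150)])
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)",
                 [(1, 2009, 4), (1, 2014, 2), (2, 2013, 5)])

# Number payments and orders per user by date, then join on
# (UserId, sequence number). Only a SELECT is involved.
joined = conn.execute("""
    WITH payments AS (
        SELECT UserId, PayDate, Amount,
               ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY PayDate) AS rn
        FROM TableA
    ),
    orders AS (
        SELECT UserId, OrderDate, "Count" AS order_count,
               ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY OrderDate) AS rn
        FROM TableB
    )
    SELECT o.UserId, o.OrderDate, p.PayDate, p.Amount, o.order_count
    FROM orders AS o
    JOIN payments AS p
      ON p.UserId = o.UserId AND p.rn = o.rn
    ORDER BY o.UserId, o.OrderDate
""").fetchall()

print(joined)
```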

How do you select nth row for each ID?

How do you select all nth rows for each ID?
My table looks somewhat like this :
ID fName data
1 Hari 20
1 Hari 30
2 John 89
2 John 38
2 John 55
In this case, how do you select all 2nd rows for each ID?
The result would look like this :
ID fName data
1 Hari 30
2 John 38

This will help in SQL SERVER 2012:
SELECT ID, FNAME, DATA FROM
(
SELECT TEST_DATA.*,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ORDER_BY_CONDITION) AS RANK
FROM TEST_DATA
) T
WHERE T.RANK=2
Change the ORDER BY condition (ORDER_BY_CONDITION) accordingly.
Fiddle for SQL SERVER 2012 here: http://sqlfiddle.com/#!6/f59a1/3
EDIT: For multiple tables, you can try with CTE as in the fiddle : http://sqlfiddle.com/#!6/8a5b1d/10

Does the following query work for you? (Replace the <tablename> and the <datecolumn> names)
SELECT tbl.*
FROM <tablename> tbl
INNER JOIN
(
SELECT
ID,
<datecolumn>,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY <datecolumn>) rn
FROM <tablename>
) row_numbers
ON tbl.ID = row_numbers.ID AND tbl.<datecolumn> = row_numbers.<datecolumn> AND row_numbers.rn = 2;
Reference:
ROW_NUMBER function on MSDN

Using a CTE and ROW_NUMBER()...
;
WITH cteData ( ID, fName, data, rowNum )
AS ( SELECT ID ,
fName ,
data ,
ROW_NUMBER()
OVER ( PARTITION BY ID ORDER BY DateField ) AS 'rowNum'
FROM tblName
)
SELECT ID ,
fName ,
data
FROM cteData
WHERE rowNum = 2--Im assuming 2 from the data presented in question
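
The sample table in the question has no column that defines which row is "2nd", so each query above assumes one. The sketch below makes that explicit with a hypothetical seq column that records insertion order, and wraps the ROW_NUMBER() pattern in a small helper that takes n as a parameter. It is only an illustration: it uses Python's sqlite3 module with an in-memory database instead of SQL Server (SQLite 3.25 or later is needed), and the helper name nth_row_per_id is made up.

```python
import sqlite3


def nth_row_per_id(conn, n):
    """Return the n-th row of each ID, ordered by the hypothetical seq column."""
    sql = """
        SELECT ID, fName, data
        FROM (
            SELECT ID, fName, data,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY seq) AS rn
            FROM TEST_DATA
        ) AS numbered
        WHERE rn = ?
        ORDER BY ID
    """
    return conn.execute(sql, (n,)).fetchall()


# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# seq is filled automatically in insertion order and defines "n-th".
conn.execute(
    "CREATE TABLE TEST_DATA (seq INTEGER PRIMARY KEY, ID INTEGER, fName TEXT, data INTEGER)"
)
conn.executemany(
    "INSERT INTO TEST_DATA (ID, fName, data) VALUES (?, ?, ?)",
    [(1, "Hari", 20), (1, "Hari", 30), (2, "John", 89), (2, "John", 38), (2, "John", 55)],
)

second_rows = nth_row_per_id(conn, 2)
third_rows = nth_row_per_id(conn, 3)
print(second_rows)
print(third_rows)  # IDs with fewer than three rows simply drop out
```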
