Cleaning dupes and keeping max data


I have a table that contains duplicates, and I need to keep the row with the max date (the latest updated_on) and delete the rest. Due to requirements I cannot change the date field's format, and I am getting a conversion error. Any ideas?
DELETE from MAIN_TBL
WHERE ID NOT IN
(
select * from
(SELECT MAX(updated_on)
FROM MAIN_TBL
GROUP BY widget_tag, ID) as TEMP
)
ERROR = Conversion failed when converting date and/or time from character string.

If you want to delete everything except the newest row per widget_tag, you could use:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY widget_tag ORDER BY updated_on DESC) rn
FROM MAIN_TBL
)
DELETE FROM cte WHERE rn <> 1;
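As a minimal, runnable sketch of the same keep-the-newest delete, here it is in Python's built-in sqlite3 module, so no server is needed. SQLite cannot DELETE through a CTE the way SQL Server can, so the ROW_NUMBER() result is mapped back to rowids instead. The table and column names follow the question and the rows are made up. It assumes updated_on is text in ISO-8601 form (yyyy-mm-dd), which sorts chronologically as a string; if the real column uses another text format, as the conversion error hints, order by a converted value instead (in SQL Server 2012+, for example, TRY_CONVERT(datetime, updated_on, <style>)).

```python
import sqlite3


def keep_newest_per_tag(conn: sqlite3.Connection) -> None:
    """Delete every row except the newest one per widget_tag."""
    # Number rows per widget_tag, newest first, then delete all but row 1 by rowid.
    conn.execute("""
        DELETE FROM main_tbl
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY widget_tag
                           ORDER BY updated_on DESC) AS rn
                FROM main_tbl)
            WHERE rn > 1)
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_tbl (id INTEGER, widget_tag TEXT, updated_on TEXT)")
# Assumption: updated_on holds ISO-8601 text, which sorts chronologically as a string.
conn.executemany("INSERT INTO main_tbl VALUES (?, ?, ?)", [
    (1, "A", "2021-01-05"),
    (1, "A", "2021-03-01"),
    (2, "B", "2020-12-31"),
    (2, "B", "2021-02-14"),
    (2, "B", "2021-01-01"),
])
keep_newest_per_tag(conn)
rows = conn.execute(
    "SELECT widget_tag, updated_on FROM main_tbl ORDER BY widget_tag"
).fetchall()
print(rows)  # [('A', '2021-03-01'), ('B', '2021-02-14')]
```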

You can use EXISTS to delete every row for which a newer row with the same id and widget_tag exists:
DELETE t
from MAIN_TBL t
WHERE EXISTS (
SELECT 1 FROM MAIN_TBL
WHERE id = t.id and widget_tag = t.widget_tag and updated_on > t.updated_on
)
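A similar sketch of the EXISTS version, again in Python's sqlite3 with invented rows. The inner table is aliased, so the bare name main_tbl inside the subquery refers to the row being considered for deletion. One behavioural difference from ROW_NUMBER(): rows that tie on the newest updated_on are all kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_tbl (id INTEGER, widget_tag TEXT, updated_on TEXT)")
conn.executemany("INSERT INTO main_tbl VALUES (?, ?, ?)", [
    (1, "A", "2021-01-05"),
    (1, "A", "2021-03-01"),
    (1, "A", "2021-03-01"),  # exact tie on the newest date
    (2, "B", "2021-01-01"),
    (2, "B", "2021-02-14"),
])

# Delete a row only if a strictly newer row exists for the same (id, widget_tag).
conn.execute("""
    DELETE FROM main_tbl
    WHERE EXISTS (
        SELECT 1 FROM main_tbl AS newer
        WHERE newer.id = main_tbl.id
          AND newer.widget_tag = main_tbl.widget_tag
          AND newer.updated_on > main_tbl.updated_on)
""")
rows = conn.execute(
    "SELECT id, widget_tag, updated_on FROM main_tbl ORDER BY id, updated_on"
).fetchall()
print(rows)
```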

Related

Update multiple records with a duplicate column value

I have a query that identifies how many times a ChassisNo was used:
Query:
SELECT
ROW_NUMBER() OVER (
PARTITION BY ChassisNo
ORDER BY datecreated ASC
) row_num,
CollateralType,
LoanID,
ClientID,
CollateralID,
PlateNo,
ChassisNo,
EngineNo,
datecreated,
PreparedBy
FROM
TestAllLoanWithCollaterals
Result:
I highlighted an example of a ChassisNo that is duplicated three times; some ChassisNo values are duplicated five times or so. The main question is: how can I update all records that share a ChassisNo with the details of the latest record for that ChassisNo?
Expected result
Based on the highlighted example above:
The yellow highlight is the latest record based on the datecreated column, which is always the last row_num of each ChassisNo. The blue highlight marks the columns that should be updated.
I am thinking of using a database cursor, but I don't think it is possible.

You may use an update join involving your original table and the logic you have already defined:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ChassisNo ORDER BY datecreated DESC) rn
FROM TestAllLoanWithCollaterals
)
UPDATE a
SET
CollateralType = b.CollateralType,
LoanID = b.LoanID,
ClientID = b.ClientID,
CollateralID = b.CollateralID,
PlateNo = b.PlateNo,
EngineNo = b.EngineNo,
datecreated = b.datecreated,
PreparedBy = b.PreparedBy
FROM TestAllLoanWithCollaterals a
INNER JOIN cte b
ON a.ChassisNo = b.ChassisNo
WHERE
b.rn = 1;
Note that the above update simply overwrites every field of the duplicate rows (grouped by ChassisNo) with the values from the most recently created record in the group.
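A runnable sketch of the same overwrite in Python's sqlite3. SQLite only gained UPDATE ... FROM in version 3.33, so this version reads the latest row per ChassisNo first and then updates every row sharing that ChassisNo. It uses a hypothetical, trimmed subset of the question's columns and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE TestAllLoanWithCollaterals (
    LoanID TEXT, CollateralType TEXT, PlateNo TEXT,
    ChassisNo TEXT, datecreated TEXT)""")
conn.executemany(
    "INSERT INTO TestAllLoanWithCollaterals VALUES (?, ?, ?, ?, ?)",
    [
        ("L1", "Car", "P-100", "CH1", "2021-01-01"),
        ("L2", "Car", "P-200", "CH1", "2021-02-01"),
        ("L3", "Truck", "P-300", "CH1", "2021-03-01"),
        ("L4", "Car", "P-400", "CH2", "2021-01-15"),
    ],
)

# Step 1: the latest row per ChassisNo (rn = 1 when ordered newest first).
latest = conn.execute("""
    SELECT ChassisNo, LoanID, CollateralType, PlateNo, datecreated
    FROM (SELECT *, ROW_NUMBER() OVER (
                        PARTITION BY ChassisNo ORDER BY datecreated DESC) AS rn
          FROM TestAllLoanWithCollaterals)
    WHERE rn = 1
""").fetchall()

# Step 2: copy those values onto every row that shares the ChassisNo.
conn.executemany("""
    UPDATE TestAllLoanWithCollaterals
    SET LoanID = ?, CollateralType = ?, PlateNo = ?, datecreated = ?
    WHERE ChassisNo = ?
""", [(loan, ctype, plate, created, chassis)
      for chassis, loan, ctype, plate, created in latest])

result = conn.execute("""
    SELECT LoanID, CollateralType, PlateNo, ChassisNo, datecreated
    FROM TestAllLoanWithCollaterals ORDER BY ChassisNo, LoanID
""").fetchall()
```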

SELECT only the most recent upload of records by rundate

I am dealing with a scenario where I upload records on a daily and weekly basis.
What is the best way to write a SQL statement that selects only the most recent rows of those records, based on the rundate?
This is NOT what I am seeking:
SELECT *
FROM dbo.Table
WHERE rundate = (SELECT MAX(rundate) from dbo.Table)
I also need the rows from past rundates, but only the most recent of those. The problem I am having is that they could be making changes to the hour amounts, pay codes, etc. I just need the most recent records, plus the most recent version of past records, based on the paydate and rundate, if that makes sense. Please see the example below:
A nice addition would be to also DELETE the older records based on the same criteria. Can someone please shed some light on this?

@iCosmin,
This should get you what you are after:
SELECT *
FROM (
SELECT
MostRecent = ROW_NUMBER() OVER (PARTITION BY Last_Name,First_Name,Position_ID,PayDate ORDER BY RunDate DESC), *
FROM dbo.table
) AS A
WHERE A.MostRecent = 1
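To see why partitioning by PayDate keeps past periods, here is a small sketch in Python's sqlite3. The table name payroll and the Hours column are hypothetical stand-ins for dbo.table and its pay columns; the rows are invented. The 2021-01-15 pay period was run twice and only its later run survives, while the 2021-01-29 period is still returned even though it is not the latest RunDate overall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: payroll stands in for dbo.table, Hours for the pay columns.
conn.execute("""CREATE TABLE payroll (
    Last_Name TEXT, First_Name TEXT, Position_ID INTEGER,
    PayDate TEXT, RunDate TEXT, Hours REAL)""")
conn.executemany("INSERT INTO payroll VALUES (?, ?, ?, ?, ?, ?)", [
    ("Doe", "Jane", 10, "2021-01-15", "2021-01-14", 40.0),
    ("Doe", "Jane", 10, "2021-01-15", "2021-01-20", 38.5),  # re-run with corrected hours
    ("Doe", "Jane", 10, "2021-01-29", "2021-02-01", 40.0),
])

# Newest run per person, position and pay period.
latest = conn.execute("""
    SELECT Last_Name, First_Name, Position_ID, PayDate, RunDate, Hours
    FROM (SELECT *, ROW_NUMBER() OVER (
                        PARTITION BY Last_Name, First_Name, Position_ID, PayDate
                        ORDER BY RunDate DESC) AS MostRecent
          FROM payroll)
    WHERE MostRecent = 1
    ORDER BY PayDate
""").fetchall()
```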
Bonus Points Query:
DELETE t
FROM dbo.table t
JOIN ( SELECT
MostRecent = ROW_NUMBER() OVER (PARTITION BY Last_Name,First_Name,Position_ID,PayDate ORDER BY RunDate DESC), *
FROM dbo.table
) AS a ON t.Last_Name = a.Last_Name AND t.First_Name = a.First_Name and t.Position_ID = a.Position_ID AND t.PayDate = a.PayDate AND t.RunDate = a.RunDate
WHERE a.MostRecent <> 1
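The join back on the five key columns has one edge case: if two rows are exact duplicates on Last_Name, First_Name, Position_ID, PayDate and RunDate, both physical rows match the derived row numbered 2, so both are deleted and that pay period loses its newest row. A sketch in Python's sqlite3 that sidesteps this by deleting on rowid instead (in SQL Server, a DELETE through a CTE, as in the first answer on this page, has the same effect); the table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payroll (
    Last_Name TEXT, First_Name TEXT, Position_ID INTEGER,
    PayDate TEXT, RunDate TEXT, Hours REAL)""")
conn.executemany("INSERT INTO payroll VALUES (?, ?, ?, ?, ?, ?)", [
    ("Doe", "Jane", 10, "2021-01-15", "2021-01-14", 40.0),
    ("Doe", "Jane", 10, "2021-01-15", "2021-01-20", 38.5),
    ("Doe", "Jane", 10, "2021-01-15", "2021-01-20", 38.5),  # exact duplicate run
    ("Doe", "Jane", 10, "2021-01-29", "2021-02-01", 40.0),
])

# Number rows inside each pay period, newest RunDate first, and delete all but
# row 1 by rowid, so exact duplicates cannot take the keeper down with them.
conn.execute("""
    DELETE FROM payroll
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY Last_Name, First_Name, Position_ID, PayDate
                       ORDER BY RunDate DESC) AS MostRecent
            FROM payroll)
        WHERE MostRecent <> 1)
""")
remaining = conn.execute(
    "SELECT PayDate, RunDate, Hours FROM payroll ORDER BY PayDate"
).fetchall()
```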

unique chat records sql

I have a table with 5 columns, as follows:
message_id
user_id_send
user_id_rec
message_date
message_details
I am looking for a SQL Server query that filters results on two columns (user_id_send, user_id_rec) for a given user ID, based on the following constraints:
Get the latest record (filtered on date or message_id)
Only unique records (1 - 2 and 2 - 1 are the same, so only one record should be returned, whichever is the latest)
Ordered descending by message_id
SQL Query
The main purpose of this query is to get the records for a user_id, to find out to whom he has sent messages and from whom he has received messages.
I have also attached the sheet for your reference.
Here is my attempt:
WITH t
AS (SELECT *
FROM messages
WHERE user_id_sender = 1)
SELECT DISTINCT user_id_reciever,
*
FROM t;
WITH h
AS (SELECT *
FROM messages
WHERE user_id_reciever = 1)
SELECT DISTINCT user_id_sender,
*
FROM h;

;WITH tmpMsg AS (
SELECT M2.message_id
,M2.user_id_receiver
,M2.user_id_sender
,M2.message_date
,M2.message_details
,ROW_NUMBER() OVER (PARTITION BY user_id_receiver+user_id_sender ORDER BY message_date DESC) AS RowNum
FROM messages M2
WHERE M2.user_id_receiver = 1
OR M2.user_id_sender = 1
)
SELECT T.message_id
,T.user_id_receiver
,T.user_id_sender
,T.message_date
,T.message_details
FROM tmpMsg T
WHERE RowNum = 1
ORDER BY T.message_id DESC
The above should fetch the results you are looking for when you query for a particular user_id (replace the 1 with a parameter, e.g. @p_user_id). Because every row involves that user, the sum user_id_receiver+user_id_sender in the PARTITION BY clause (assuming integer IDs) is the same for 1 - 2 and 2 - 1, so each conversation is returned only once.
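The sum in PARTITION BY only identifies a conversation when the IDs are integers and every row involves the queried user; with character IDs, + concatenates, so '1'+'2' and '2'+'1' differ. A variant (a swapped-in technique, not the answer's own) partitions on the other participant explicitly and also applies the requested ORDER BY message_id DESC. It is sketched here in Python's sqlite3 with the answer's column names and invented messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE messages (
    message_id INTEGER, user_id_sender INTEGER, user_id_receiver INTEGER,
    message_date TEXT, message_details TEXT)""")
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "2021-01-01 10:00", "hi 2"),
    (2, 2, 1, "2021-01-01 10:05", "hi 1"),
    (3, 1, 3, "2021-01-02 09:00", "hello 3"),
    (4, 3, 2, "2021-01-03 08:00", "not involving 1"),
    (5, 4, 1, "2021-01-04 12:00", "from 4"),
])


def latest_per_partner(conn, user_id):
    """Latest message per conversation partner of user_id, newest message_id first."""
    # The CASE picks "the other user", so 1 -> 2 and 2 -> 1 land in the same partition.
    return conn.execute("""
        SELECT message_id, user_id_sender, user_id_receiver, message_date, message_details
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY CASE WHEN user_id_sender = :uid
                                         THEN user_id_receiver
                                         ELSE user_id_sender END
                       ORDER BY message_date DESC, message_id DESC) AS rn
            FROM messages
            WHERE user_id_sender = :uid OR user_id_receiver = :uid)
        WHERE rn = 1
        ORDER BY message_id DESC
    """, {"uid": user_id}).fetchall()
```

For user 1, this returns messages 5, 3 and 2: one row each for partners 4, 3 and 2.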
Hope this helps.

select * from
(
select ROW_NUMBER() over (partition by user_id_sender order by message_date DESC) as rowno,
* from messages
where user_id_receiver = 1
--order by message_date DESC
) T where T.rowno = 1
UNION ALL
select * from
(
select ROW_NUMBER() over (partition by user_id_receiver order by message_date DESC) as rowno,
* from messages
where user_id_sender = 1
-- order by message_date DESC
) T where T.rowno = 1
Explanation: for each user_id_sender who sent messages to the user, it orders that sender's messages by message_date descending, numbers them, and keeps only the first one (the chronologically last). It then does the same for each user_id_receiver the user sent messages to, and UNION ALL combines the two into one result set. Note that a conversation in both directions yields two rows, one from each half of the union. You can then add your own ORDER BY clause and additional WHERE conditions at the end as required.
Of course, this only works for one user_id at a time (replace = 1 with @user_id).
Getting results for all user_ids at once would need a totally different query, but I hope this helps.

How to avoid duplicate rows while inserting a set of rows from a flat file in SQL Server by considering existing column values

I have a table with a set of rows that share the same RecordtypeCode.
Then a single row or a set of rows comes from a flat file or another source, like below.
In the end, I need one unique row per RecordtypeCode in my table, eliminating the duplicate RecordtypeCode rows and taking the max of the other fields.
My table should finally look like this:
What have I tried so far?
I fetched all the rows from my table and combined them with the new set of records using UNION, then wrote a stored procedure (using GROUP BY and MAX) to get the desired output into a temp table, and finally truncated my table and inserted the temp table data into it.
Is there a better way that avoids performance issues? I am going to be working with millions of records here.

Difficult to answer without more details, but you could try something like this to get grouped results:
SELECT RecordTypeCode,
Max(AgeGroupFemale60_64),
Max(AgeGroupFemale65_69),
Max(AgeGroupFemale70_74)
FROM [TempTable]
GROUP BY RecordTypeCode
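For illustration, a sketch of this grouping step in Python's sqlite3, with the existing and incoming rows already combined in one table. It assumes the AgeGroup columns are 0/1 flags (the rows are invented), and it aliases each MAX so the result could be inserted back under the same column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE TempTable (
    RecordTypeCode TEXT,
    AgeGroupFemale60_64 INTEGER,
    AgeGroupFemale65_69 INTEGER,
    AgeGroupFemale70_74 INTEGER)""")
conn.executemany("INSERT INTO TempTable VALUES (?, ?, ?, ?)", [
    ("R1", 1, 0, 0),  # existing row
    ("R1", 0, 1, 0),  # incoming row from the flat file (assumed 0/1 flags)
    ("R2", 0, 0, 1),
])

# One row per RecordTypeCode, each flag set if any row in the group had it set.
grouped = conn.execute("""
    SELECT RecordTypeCode,
           MAX(AgeGroupFemale60_64) AS AgeGroupFemale60_64,
           MAX(AgeGroupFemale65_69) AS AgeGroupFemale65_69,
           MAX(AgeGroupFemale70_74) AS AgeGroupFemale70_74
    FROM TempTable
    GROUP BY RecordTypeCode
    ORDER BY RecordTypeCode
""").fetchall()
```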

Assuming you are using SQL Server 2005+, you could use MAX() OVER to determine maximum flag values within every Recordtypecode group:
SELECT
Recordtypecode,
AgeGroupFemale60_64,
AgeGroupFemale65_69,
AgeGroupFemale70_74,
MAX(AgeGroupFemale60_64) OVER (PARTITION BY Recordtypecode),
MAX(AgeGroupFemale65_69) OVER (PARTITION BY Recordtypecode),
MAX(AgeGroupFemale70_74) OVER (PARTITION BY Recordtypecode)
FROM
dbo.TempTable
and update all the flags with those values:
WITH maximums AS (
SELECT
Recordtypecode,
AgeGroupFemale60_64,
AgeGroupFemale65_69,
AgeGroupFemale70_74,
MaxFemale60_64 = MAX(AgeGroupFemale60_64) OVER (PARTITION BY Recordtypecode),
MaxFemale65_69 = MAX(AgeGroupFemale65_69) OVER (PARTITION BY Recordtypecode),
MaxFemale70_74 = MAX(AgeGroupFemale70_74) OVER (PARTITION BY Recordtypecode)
FROM
dbo.TempTable
)
UPDATE
maximums
SET
AgeGroupFemale60_64 = MaxFemale60_64,
AgeGroupFemale65_69 = MaxFemale65_69,
AgeGroupFemale70_74 = MaxFemale70_74
;
Next, you could use ROW_NUMBER() to enumerate all the rows within the groups:
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
and delete all the rows with rn > 1:
WITH enumerated AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
)
DELETE FROM
enumerated
WHERE
rn > 1
;
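The UPDATE-then-DELETE sequence above, sketched in Python's sqlite3 (hypothetical rows, 0/1 flags assumed). SQLite cannot UPDATE or DELETE through a CTE, so the maximums come from correlated subqueries and the extra rows are deleted by rowid. Setting a row to its group's maximum never changes that maximum, so updating row by row is safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE TempTable (
    Recordtypecode TEXT,
    AgeGroupFemale60_64 INTEGER,
    AgeGroupFemale65_69 INTEGER,
    AgeGroupFemale70_74 INTEGER)""")
conn.executemany("INSERT INTO TempTable VALUES (?, ?, ?, ?)", [
    ("R1", 1, 0, 0),
    ("R1", 0, 1, 0),
    ("R2", 0, 0, 1),
])

flag_columns = ["AgeGroupFemale60_64", "AgeGroupFemale65_69", "AgeGroupFemale70_74"]

# Step 1: give every row its group's maximum for each flag column.
set_clause = ",\n    ".join(
    f"{col} = (SELECT MAX(t.{col}) FROM TempTable AS t "
    f"WHERE t.Recordtypecode = TempTable.Recordtypecode)"
    for col in flag_columns
)
conn.execute(f"UPDATE TempTable SET {set_clause}")

# Step 2: keep one row per Recordtypecode; the others are now identical copies.
conn.execute("""
    DELETE FROM TempTable
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY Recordtypecode ORDER BY rowid) AS rn
            FROM TempTable)
        WHERE rn > 1)
""")
result = conn.execute("SELECT * FROM TempTable ORDER BY Recordtypecode").fetchall()
```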
Alternatively, instead of the two statements, UPDATE and DELETE, you could use one, MERGE (which now assumes SQL Server 2008+), like this:
WITH enumerated AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
),
maximums AS (
SELECT
Recordtypecode,
MaxFemale60_64 = MAX(AgeGroupFemale60_64),
MaxFemale65_69 = MAX(AgeGroupFemale65_69),
MaxFemale70_74 = MAX(AgeGroupFemale70_74),
rn = 1
FROM
dbo.TempTable
GROUP BY
Recordtypecode
)
MERGE INTO
enumerated AS tgt
USING
maximums AS src
ON
tgt.Recordtypecode = src.Recordtypecode AND tgt.rn = src.rn
WHEN MATCHED THEN
UPDATE SET
tgt.AgeGroupFemale60_64 = src.MaxFemale60_64,
tgt.AgeGroupFemale65_69 = src.MaxFemale65_69,
tgt.AgeGroupFemale70_74 = src.MaxFemale70_74
WHEN NOT MATCHED BY SOURCE THEN
DELETE
;
More information:
OVER Clause (Transact-SQL)
MERGE (Transact-SQL)
Note that there are known issues with the MERGE statement that you need to be aware of before deciding to use it. You can start with this article to learn more about them and see whether any of them apply to your situation:
Use Caution with SQL Server's MERGE Statement

How can I update a column in a SQL table if the entry is a duplicate and keep the newest entry?

I have a database table with about 90 thousand entries. I need to update any older entries so I can move them to an archive table. I was able to find the duplicates where the count is greater than 1, and also find out how many times each one was duplicated.
This is the query I used:
SELECT DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last,
COUNT(*) as Number_of_Duplicates
FROM dbo.REGISTRAR
WHERE first IS NOT NULL
GROUP BY DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last
HAVING COUNT(*) > 1
ORDER by Number_of_Duplicates desc
I now need to update every entry that appears in the table more than once, leaving the newest entry intact, and set a status column on the others to mark them as duplicates.
How can I do this?
Thanks in advance for the help.

;WITH x AS
(
SELECT DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last,
[STATUS],
rn = ROW_NUMBER() OVER
(
PARTITION BY
DWPAGECOUNT, DOCTYPE, FILENAME, First, middleinitial, last
ORDER BY
STOREDATETIME DESC
)
FROM dbo.REGISTRAR
)
UPDATE x SET [STATUS] = CASE WHEN rn > 1 THEN 'DUP' ELSE 'NOT DUP' END;
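A sketch of the same marking in Python's sqlite3, using a hypothetical subset of the columns (DOCTYPE, FILENAME, STOREDATETIME, STATUS) and invented rows. As in the answer, it assumes STOREDATETIME tells which entry is newest; since SQLite cannot UPDATE through a CTE, the duplicates are picked out by rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of dbo.REGISTRAR's columns.
conn.execute("""CREATE TABLE REGISTRAR (
    DOCTYPE TEXT, FILENAME TEXT, STOREDATETIME TEXT, STATUS TEXT)""")
conn.executemany("INSERT INTO REGISTRAR VALUES (?, ?, ?, NULL)", [
    ("DEED", "a.pdf", "2021-01-01 09:00"),
    ("DEED", "a.pdf", "2021-03-01 09:00"),
    ("DEED", "b.pdf", "2021-02-01 09:00"),
])

# Every row except the newest in its duplicate group becomes 'DUP'.
conn.execute("""
    UPDATE REGISTRAR
    SET STATUS = CASE
        WHEN rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY DOCTYPE, FILENAME
                           ORDER BY STOREDATETIME DESC) AS rn
                FROM REGISTRAR)
            WHERE rn > 1)
        THEN 'DUP' ELSE 'NOT DUP' END
""")
statuses = conn.execute(
    "SELECT FILENAME, STOREDATETIME, STATUS FROM REGISTRAR "
    "ORDER BY FILENAME, STOREDATETIME"
).fetchall()
```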
