I have a query that identifies how many times a ChassisNo was used:
Query:
SELECT
ROW_NUMBER() OVER (
PARTITION BY ChassisNo
ORDER BY datecreated ASC
) row_num,
CollateralType,
LoanID,
ClientID,
CollateralID,
PlateNo,
ChassisNo,
EngineNo,
datecreated,
PreparedBy
FROM
TestAllLoanWithCollaterals
Result:
I highlighted an example of a ChassisNo that is duplicated three times; some are duplicated five times or more. The main question is: how can I update all records that share a ChassisNo so that they carry the details of the latest record for that ChassisNo?
Expected result
based on the highlighted example above:
The yellow highlight is the latest record based on the datecreated column, which is always the last row_num of each ChassisNo. The blue highlight marks the columns that should be updated.
I am thinking of using the Database Cursor but I don't think it is possible.
You may use an update join involving your original table and the logic you have already defined:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ChassisNo ORDER BY datecreated DESC) rn
FROM TestAllLoanWithCollaterals
)
UPDATE a
SET
CollateralType = b.CollateralType,
LoanID = b.LoanID,
ClientID = b.ClientID,
CollateralID = b.CollateralID,
PlateNo = b.PlateNo,
EngineNo = b.EngineNo,
datecreated = b.datecreated,
PreparedBy = b.PreparedBy
FROM TestAllLoanWithCollaterals a
INNER JOIN cte b
ON a.ChassisNo = b.ChassisNo
WHERE
b.rn = 1;
Note that the above update simply overwrites every field of each row in a group of duplicate chassis numbers with the values from the most recently created record in that group (the row with rn = 1 when ordered by datecreated descending).
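To sanity-check what the update does before running it against real data, the same logic can be modelled outside the database. The following is a minimal Python sketch, not T-SQL; the sample rows and the helper name `overwrite_with_latest` are made up for illustration, and only a few of the columns are shown.

```python
from datetime import date


def overwrite_with_latest(rows):
    """Return copies of rows where every row takes the values of the
    newest row (by datecreated) that shares its ChassisNo."""
    latest = {}
    for row in rows:
        key = row["ChassisNo"]
        if key not in latest or row["datecreated"] > latest[key]["datecreated"]:
            latest[key] = row
    # Each output row is a copy of the newest row in its group
    return [dict(latest[row["ChassisNo"]]) for row in rows]


# Hypothetical sample data, not taken from the question
rows = [
    {"ChassisNo": "C1", "LoanID": 10, "datecreated": date(2020, 1, 1)},
    {"ChassisNo": "C1", "LoanID": 11, "datecreated": date(2020, 3, 1)},
    {"ChassisNo": "C2", "LoanID": 20, "datecreated": date(2020, 2, 1)},
    {"ChassisNo": "C1", "LoanID": 12, "datecreated": date(2020, 2, 1)},
]
updated = overwrite_with_latest(rows)
```

The row count stays the same; only the contents change, which mirrors an UPDATE as opposed to a DELETE.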
Related
I am dealing with a scenario where I upload records on a daily and weekly basis.
What is the best way to write a SQL statement that selects only the most recent rows of those records based on the rundate?
This is NOT what I am seeking:
SELECT *
FROM dbo.Table
WHERE rundate = (SELECT MAX(rundate) from dbo.Table)
I also need the rows from past rundates but only the most recent of those. The problem that I am having is that they could be making changes to the hour amounts/pay codes etc. I just need the most recent records and the past most recent based on the paydate and rundate, if that makes sense. Please see example below:
A nice addition to this would be to also DELETE those older records based on the same criteria. Can someone please shed some light on this?
@iCosmin,
This should get you what you are after:
SELECT *
FROM (
SELECT
MostRecent = ROW_NUMBER() OVER (PARTITION BY Last_Name,First_Name,Position_ID,PayDate ORDER BY RunDate DESC), *
FROM dbo.table
) AS A
WHERE A.MostRecent = 1
Bonus Points Query:
DELETE t
FROM dbo.table t
JOIN ( SELECT
MostRecent = ROW_NUMBER() OVER (PARTITION BY Last_Name,First_Name,Position_ID,PayDate ORDER BY RunDate DESC), *
FROM dbo.table
) AS a ON t.Last_Name = a.Last_Name AND t.First_Name = a.First_Name and t.Position_ID = a.Position_ID AND t.PayDate = a.PayDate AND t.RunDate = a.RunDate
WHERE a.MostRecent <> 1
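As a rough model of what the two queries above do, here is a small Python sketch that splits rows into "most recent per group" and "everything else". It is an illustration only: the helper name `split_most_recent`, the `Hours` column, and the sample rows are assumptions, and dates are kept as ISO strings so that plain string comparison orders them correctly.

```python
def split_most_recent(rows, key_cols, order_col):
    """Split rows into (keep, delete): keep holds the row with the
    highest order_col in each key group, delete holds the rest."""
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in best or row[order_col] > best[key][order_col]:
            best[key] = row
    keep = list(best.values())
    delete = [
        row for row in rows
        if row is not best[tuple(row[c] for c in key_cols)]
    ]
    return keep, delete


# Hypothetical sample data
rows = [
    {"Last_Name": "Doe", "PayDate": "2019-01-04", "RunDate": "2019-01-02", "Hours": 40},
    {"Last_Name": "Doe", "PayDate": "2019-01-04", "RunDate": "2019-01-03", "Hours": 38},
    {"Last_Name": "Doe", "PayDate": "2019-01-11", "RunDate": "2019-01-09", "Hours": 40},
]
keep, delete = split_most_recent(rows, ["Last_Name", "PayDate"], "RunDate")
```

One thing worth checking on real data: the DELETE above joins back on RunDate, so if two rows in the same group share the same RunDate, both of them match the row numbered 2 and both are deleted. The sketch keeps the first of such tied rows instead.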
I found some answers to ways to update using over order by, but not anything that solved my issue. In SQL Server 2014, I have a column of DATES (with inconsistent intervals down to the millisecond) and a column of PRICE, and I would like to update the column of OFFSETPRICE with the value of PRICE from 50 rows hence (ordered by DATES). The solutions I found have the over order by in either the query or the subquery, but I think I need it in both. Or maybe I'm making it more complicated than it is.
In this simplified example, if the offset was 3 rows hence then I need to turn this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, null
2018-01-03, 8.52, null
2018-02-15, 3.17, null
2018-02-24, 4.67, null
2018-03-18, 2.54, null
2018-04-09, 7.37, null
into this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, 3.17
2018-01-03, 8.52, 4.67
2018-02-15, 3.17, 2.54
2018-02-24, 4.67, 7.37
2018-03-18, 2.54, null
2018-04-09, 7.37, null
This post was helpful, and so far I have this code which works as far as it goes:
select dates, price, row_number() over (order by dates asc) as row_num
from pricetable;
I haven't yet figured out how to point the update value to the future ordered row. Thanks in advance for any assistance.
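LEAD is a useful window function for getting values from subsequent rows. (There is also LAG, which looks at preceding rows.) Here's a direct answer to your question, using an offset of 2 because that is what your sample output shows; change it to 50 for the real data: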
;WITH cte AS (
SELECT dates, LEAD(price, 2) OVER (ORDER BY dates) AS offsetprice
FROM pricetable
)
UPDATE pricetable SET offsetprice = cte.offsetprice
FROM pricetable
INNER JOIN cte ON pricetable.dates = cte.dates
Since you asked about ROW_NUMBER, the following does the same thing:
;WITH cte AS (
SELECT dates, price, ROW_NUMBER() OVER (ORDER BY dates ASC) AS row_num
FROM pricetable
),
cte2 AS (
SELECT dates, price, (SELECT sq_cte.price FROM cte AS sq_cte WHERE sq_cte.row_num = cte.row_num + 2) AS offsetprice
FROM cte
)
UPDATE pricetable SET offsetprice = cte2.offsetprice
FROM pricetable
INNER JOIN cte2 ON pricetable.dates = cte2.dates
So, you could use ROW_NUMBER to sort the rows and then use that result to select a value 2 rows ahead. LEAD just does that very thing directly.
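If it helps to see what LEAD computes, the following Python sketch mimics it on the sample prices from the question. The function `lead` is a hypothetical helper written for this illustration, not a library call, and it assumes the values are already sorted by dates and that the offset is not negative.

```python
def lead(values, offset=1, default=None):
    """Mimic LEAD(value, offset): for each position, return the value
    `offset` rows later, or `default` when that row does not exist."""
    return [
        values[i + offset] if i + offset < len(values) else default
        for i in range(len(values))
    ]


# Prices from the question, already sorted by dates
prices = [5.01, 8.52, 3.17, 4.67, 2.54, 7.37]
offsetprice = lead(prices, 2)
```

The last two entries come back as `None`, which corresponds to the NULLs in the desired output.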
I have a table in SQL Server where users are allowed to change an employee's details. Every time a change is made, a new record is placed in the EMPLOYEE_HIST table. Only the EMP_ID is kept constant for the employee; all other details are modifiable.
There is also a SEQ_NO column, which maintains the sequence of entries made.
EMPLOYEE_HIST:
SEQ_NO  EMP_ID  SOME_VAL1  SOME_VAL2
1       E1      V11        V21        (initial value of this employee)
2       E2      V12        V22        (initial value of this employee)
3       E3      V13        V23        (initial value of this employee)
4       E2      V00        V22
5       E1      V01        V21
6       E2      V02        V22
7       E4      V00        V00        (initial value of this employee)
I want a query which will give me changes made to particular employees, something like
EMP_ID  SOME_VAL1_OLD  SOME_VAL1_NEW  SOME_VAL2_OLD  SOME_VAL2_NEW
E1      V11            V01            V21            V21
E2      V12            V00            V22            V22
E2      V00            V02            V22            V22
UPDATE
Also employee details may be modified by user n number of times and for each change, a row should be present in the result set.
Please help.
EDIT:
I finally settled on using the LAG function. It works like this:
SELECT *, ROW_NUMBER() OVER(PARTITION BY EMP_ID, CHANGE_NO ORDER BY EMP_ID, CHANGE_NO, SEQ_NO) AS RN
FROM (
SELECT * FROM (SELECT EMP_ID, SEQ_NO, LAG(SOME_VAL1)
OVER(PARTITION BY EMP_ID ORDER BY EMP_ID, SEQ_NO) AS OLD_VAL, SOME_VAL1 AS NEW_VAL, '1' AS CHANGE_NO FROM EMPLOYEE_HIST) T
WHERE OLD_VAL <> NEW_VAL UNION ALL
SELECT * FROM (SELECT EMP_ID, SEQ_NO, LAG(SOME_VAL2) OVER(PARTITION BY EMP_ID ORDER BY EMP_ID, SEQ_NO) AS OLD_VAL, SOME_VAL2 AS NEW_VAL, '2' AS CHANGE_NO FROM EMPLOYEE_HIST) T
WHERE OLD_VAL <> NEW_VAL) TEMP
But the performance is terribly slow when fetching a total of 500 rows from a table containing 3 million records. Please suggest how to reduce the sorting cost.
You can use a CTE with a Window function if you're using 2008 or newer:
;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
, EMP_ID
, SOME_VAL1
, SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT e.EMP_ID
, s2.SOME_VAL1 [SOME_VAL1_OLD]
, s1.SOME_VAL1 [SOME_VAL1_NEW]
, s2.SOME_VAL2 [SOME_VAL2_OLD]
, s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN r AS s1 ON e.EMP_ID = s1.EMP_ID and s1.rank = 1 --the last change
LEFT JOIN r AS s2 ON e.EMP_ID = s2.EMP_ID and s2.rank = 2 --the second to last change
If you want all of the changes, not just the top two, then you should be able to do something like this:
;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
, EMP_ID
, SOME_VAL1
, SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT e.EMP_ID
, s2.SOME_VAL1 [SOME_VAL1_OLD]
, s1.SOME_VAL1 [SOME_VAL1_NEW]
, s2.SOME_VAL2 [SOME_VAL2_OLD]
, s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN (r AS s1 --the change
INNER JOIN r AS s2 ON s1.EMP_ID = s2.EMP_ID and s2.rank = s1.rank + 1) --previous value
ON e.EMP_ID = s1.EMP_ID
This should enumerate all changes until it encounters the original value.
You could use a CTE to get a partitioned row number, by EMP_ID. Then join that against itself where the row number is offset by 1.
;WITH PartitionedRows
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY EMP_ID ORDER BY SEQ_NO) AS RowID, EMP_ID, SOME_VAL1,SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT a.EMP_ID,b.SOME_VAL1 AS SOME_VAL1_OLD,a.SOME_VAL1 AS SOME_VAL1_NEW,b.SOME_VAL2 AS SOME_VAL2_OLD,a.SOME_VAL2 AS SOME_VAL2_NEW
FROM PartitionedRows a
LEFT JOIN PartitionedRows b ON a.EMP_ID = b.EMP_ID AND a.RowID = (b.RowID + 1)
WHERE b.RowID IS NOT NULL
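To illustrate the "number the rows per employee, then pair each row with the one before it" idea, here is a minimal Python sketch run against the sample data from the question. The function name `change_pairs` and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict


def change_pairs(history):
    """history: iterable of (seq_no, emp_id, val1, val2) tuples.
    Returns (emp_id, val1_old, val1_new, val2_old, val2_new) for each
    consecutive pair of rows belonging to the same employee."""
    by_emp = defaultdict(list)
    # Sorting the tuples orders them by seq_no, like ORDER BY SEQ_NO
    for seq_no, emp_id, val1, val2 in sorted(history):
        by_emp[emp_id].append((val1, val2))
    pairs = []
    for emp_id in sorted(by_emp):
        versions = by_emp[emp_id]
        # zip pairs row n with row n + 1, like the RowID + 1 self-join
        for old, new in zip(versions, versions[1:]):
            pairs.append((emp_id, old[0], new[0], old[1], new[1]))
    return pairs


# Sample rows from the question
history = [
    (1, "E1", "V11", "V21"), (2, "E2", "V12", "V22"), (3, "E3", "V13", "V23"),
    (4, "E2", "V00", "V22"), (5, "E1", "V01", "V21"), (6, "E2", "V02", "V22"),
    (7, "E4", "V00", "V00"),
]
changes = change_pairs(history)
```

Employees with a single row (E3 and E4 here) produce no pairs, which matches the effect of the `WHERE b.RowID IS NOT NULL` filter.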
You may be better off with a different data model. You could have a table EMPLOYEE_HIST_OLD with an identical structure. This would let you archive the former data (even with a timestamp and/or sequence number) and keep the EMPLOYEE_HIST table smaller and free of data you would not reference regularly. It would also allow a basic join between the two tables.
I would then suggest you use the timestamp of the EMPLOYEE_HIST_OLD records to find the most recent modifications, then join those records back to the current records. This will only present to you the changed records. You could limit the query on EMPLOYEE_HIST_OLD to simply return one record (most recent) if you like. See: SQL query to get most recent row for each instance of a given key
If you must stay within the same EMPLOYEE_HIST table for everything and use the sequence number approach, you may wish to use a count() to find changed records for a particular employee ID and return the values ordered by sequence number. You could also limit the query to employees with count > 1. You would then view the data vertically in the table, though. Parsing the values into separate columns like SOME_VAL1_OLD and SOME_VAL1_NEW essentially requires you to read only the last two values and make one record out of two. You lose the visibility of all the changes when trying to view the data horizontally, since there could be more than one historical change. Viewing all of them horizontally would require some array manipulation outside of SQL after the data is returned from the query.
For info on counting:
SQL query for finding records where count > 1
I have a table with 5 columns, as follows:
message_id
user_id_send
user_id_rec
message_date
message_details
I am looking for a SQL Server query that filters results on two columns (user_id_send, user_id_rec) for a given user ID, subject to the following constraints:
Get the latest record (by date or message_id)
Only unique records (1 - 2 and 2 - 1 are the same conversation, so only one record is returned, whichever is the latest)
Ordered by message_id, descending
SQL Query
The main purpose of this query is to get records of user_id to find out to whom he has sent messages and from whom he had received messages.
I have also attached the sheet for your reference.
Here is my try
WITH t
AS (SELECT *
FROM messages
WHERE user_id_sender = 1)
SELECT DISTINCT user_id_reciever,
*
FROM t;
WITH h
AS (SELECT *
FROM messages
WHERE user_id_reciever = 1)
SELECT DISTINCT user_id_sender,
*
FROM h;
;WITH tmpMsg AS (
SELECT M2.message_id
,M2.user_id_receiver
,M2.user_id_sender
,M2.message_date
,M2.message_details
,ROW_NUMBER() OVER (PARTITION BY user_id_receiver+user_id_sender ORDER BY message_date DESC) AS 'RowNum'
FROM messages M2
WHERE M2.user_id_receiver = 1
OR M2.user_id_sender = 1
)
SELECT T.message_id
,T.user_id_receiver
,T.user_id_sender
,T.message_date
,T.message_details
FROM tmpMsg T
WHERE RowNum <= 1
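The above should fetch the results you are looking for when you query for a particular user ID (replace the 1 with a parameter, e.g. @p_user_id). Partitioning by user_id_receiver+user_id_sender ensures that combinations such as 1 - 2 and 2 - 1 fall into the same group, so only the latest message of each conversation is returned. This relies on the IDs being numeric and on the WHERE clause fixing one side of the pair to the given user, so that the sum identifies the other user uniquely.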
Hope this helps.
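For readers who want to see the grouping idea without relying on the sum of the two IDs, here is a small Python sketch that keys each conversation on the unordered pair of users. It is an illustration under assumptions: the function name `latest_per_conversation` and the sample messages are invented, the column names follow the answer above, and message_id is used as the recency measure (the question allows either date or message_id).

```python
def latest_per_conversation(messages, user_id):
    """Return the newest message (highest message_id) for each user
    that `user_id` has exchanged messages with, newest first."""
    latest = {}
    for msg in messages:
        sender, receiver = msg["user_id_sender"], msg["user_id_receiver"]
        if user_id not in (sender, receiver):
            continue  # conversation does not involve this user
        # Order-independent key: (1, 2) and (2, 1) map to the same pair
        pair = (min(sender, receiver), max(sender, receiver))
        if pair not in latest or msg["message_id"] > latest[pair]["message_id"]:
            latest[pair] = msg
    return sorted(latest.values(), key=lambda m: m["message_id"], reverse=True)


# Hypothetical sample data
messages = [
    {"message_id": 1, "user_id_sender": 1, "user_id_receiver": 2},
    {"message_id": 2, "user_id_sender": 2, "user_id_receiver": 1},
    {"message_id": 3, "user_id_sender": 3, "user_id_receiver": 1},
    {"message_id": 4, "user_id_sender": 2, "user_id_receiver": 3},
    {"message_id": 5, "user_id_sender": 1, "user_id_receiver": 2},
]
result = latest_per_conversation(messages, 1)
```

Message 4 is ignored because user 1 is on neither side of it, and the three messages between users 1 and 2 collapse into the newest one.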
select * from
(
select ROW_NUMBER() over (partition by user_id_sender order by message_date DESC) as rowno,
* from messages
where user_id_receiver = 1
--order by message_date DESC
) T where T.rowno = 1
UNION ALL
select * from
(
select ROW_NUMBER() over (partition by user_id_receiver order by message_date DESC) as rowno,
* from messages
where user_id_sender = 1
-- order by message_date DESC
) T where T.rowno = 1
Explanation: For each group of user_id_sender, it orders internally by message_date desc, and then adds row numbers, and we only want the first one (chronologically last). Then do the same for user_id_receiver, and union the results together to get 1 result set with all the desired rows. You can then add your own order by clause and additional where conditions at the end as required.
Of course, this only works for one user_id at a time (replace =1 with @user_id).
Getting a result for all user_ids at once is a totally different query, but I hope this helps.
I have a table with a set of rows that share the same RecordtypeCode,
then a single row or a set of rows arrives from a flat file or another source, like below,
and finally I need a unique row in my table, eliminating the duplicate RecordtypeCode and taking the max of the other fields.
Finally, my table should look like this:
What have I tried so far?
I fetch all the rows from my table, union them with the new set of records, then run a stored procedure (using GROUP BY and MAX) to get the desired output into a temp table, and finally truncate my table and insert the temp table data into it.
Is there a better way that avoids performance issues? I am going to be working with millions of records here.
Difficult to answer without more details, but you could try something like this to get grouped results:
SELECT RecordTypeCode,
Max(AgeGroupFemale60_64),
Max(AgeGroupFemale65_69),
Max(AgeGroupFemale70_74)
FROM [TempTable]
GROUP BY RecordTypeCode
Assuming you are using SQL Server 2005+, you could use MAX() OVER to determine maximum flag values within every Recordtypecode group:
SELECT
Recordtypecode,
AgeGroupFemale60_64,
AgeGroupFemale65_69,
AgeGroupFemale70_74,
MAX(AgeGroupFemale60_64) OVER (PARTITION BY Recordtypecode),
MAX(AgeGroupFemale65_69) OVER (PARTITION BY Recordtypecode),
MAX(AgeGroupFemale70_74) OVER (PARTITION BY Recordtypecode)
FROM
dbo.TempTable
and update all the flags with those values:
WITH maximums AS (
SELECT
Recordtypecode,
AgeGroupFemale60_64,
AgeGroupFemale65_69,
AgeGroupFemale70_74,
MaxFemale60_64 = MAX(AgeGroupFemale60_64) OVER (PARTITION BY Recordtypecode),
MaxFemale65_69 = MAX(AgeGroupFemale65_69) OVER (PARTITION BY Recordtypecode),
MaxFemale70_74 = MAX(AgeGroupFemale70_74) OVER (PARTITION BY Recordtypecode)
FROM
dbo.TempTable
)
UPDATE
maximums
SET
AgeGroupFemale60_64 = MaxFemale60_64,
AgeGroupFemale65_69 = MaxFemale65_69,
AgeGroupFemale70_74 = MaxFemale70_74
;
Next, you could use ROW_NUMBER() to enumerate all the rows within the groups:
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
and delete all the rows with rn > 1:
WITH enumerated AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
)
DELETE FROM
enumerated
WHERE
rn > 1
;
Alternatively, instead of the two statements, UPDATE and DELETE, you could use one, MERGE (which now assumes SQL Server 2008+), like this:
WITH enumerated AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Recordtypecode ORDER BY Recordtypecode)
FROM
dbo.TempTable
),
maximums AS (
SELECT
Recordtypecode,
MaxFemale60_64 = MAX(AgeGroupFemale60_64),
MaxFemale65_69 = MAX(AgeGroupFemale65_69),
MaxFemale70_74 = MAX(AgeGroupFemale70_74),
rn = 1
FROM
dbo.TempTable
GROUP BY
Recordtypecode
)
MERGE INTO
enumerated AS tgt
USING
maximums AS src
ON
tgt.Recordtypecode = src.Recordtypecode AND tgt.rn = src.rn
WHEN MATCHED THEN
UPDATE SET
tgt.AgeGroupFemale60_64 = src.MaxFemale60_64,
tgt.AgeGroupFemale65_69 = src.MaxFemale65_69,
tgt.AgeGroupFemale70_74 = src.MaxFemale70_74
WHEN NOT MATCHED BY SOURCE THEN
DELETE
;
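The net effect of the UPDATE plus DELETE pair, or of the single MERGE, is one row per Recordtypecode holding the per-column maximum. A minimal Python sketch of that end result follows; the helper name `collapse_by_max` and the flag values are hypothetical (the question's real data was not preserved), and it assumes no missing values, whereas SQL's MAX simply ignores NULLs.

```python
def collapse_by_max(rows, key_col):
    """Collapse rows that share key_col into one row, taking the
    maximum of every other column."""
    merged = {}
    for row in rows:
        key = row[key_col]
        if key not in merged:
            merged[key] = dict(row)  # first row of the group
        else:
            for col, value in row.items():
                if col != key_col:
                    merged[key][col] = max(merged[key][col], value)
    return list(merged.values())


# Hypothetical flag values
rows = [
    {"Recordtypecode": "A", "AgeGroupFemale60_64": 0, "AgeGroupFemale65_69": 1},
    {"Recordtypecode": "A", "AgeGroupFemale60_64": 1, "AgeGroupFemale65_69": 0},
    {"Recordtypecode": "B", "AgeGroupFemale60_64": 0, "AgeGroupFemale65_69": 0},
]
collapsed = collapse_by_max(rows, "Recordtypecode")
```

Comparing a result like this against the table after running the statements on a small copy of the data is a cheap way to confirm they behave as intended.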
More information:
OVER Clause (Transact-SQL)
MERGE (Transact-SQL)
Note that there are known issues with the MERGE statement that you need to be aware of before deciding to use it. You can start with this article to learn more about them and see whether any of them would apply to your situation:
Use Caution with SQL Server's MERGE Statement