I have some almost duplicate data in my database (duplicates based on these 5 columns: Date, Code, Expiry, TheType, Strike, there are many more columns but they won't be counted towards labeling a record a duplicate). I want to keep only one record in each case and the one I want to keep is the one whose mtm column is closest to its checkprice column (i.e. minimize abs(mtm-checkprice)). So I think the CTE below gets pretty close if I can just order the partition by that expression. The way I tried gives me the error Invalid column name 'diff'.
WITH CTE AS(
SELECT *, ABS(Mtm - checkprice) as diff,
RN = ROW_NUMBER()OVER(PARTITION BY Date, Strike, Mtm, /* ALL THE OTHER COLUMN NAMES */
ORDER BY diff DESC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
Any ideas on how to rectify this?
Use the ABS(mtm-checkprice) in the ORDER BY of the ROW_NUMBER:
WITH CTE AS(
SELECT *, Diff = ABS(mtm-checkprice),
RN = ROW_NUMBER()OVER(PARTITION BY Date, Code, Expiry, TheType, Strike
ORDER BY ABS(mtm-checkprice) ASC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
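A column alias such as Diff cannot be referenced elsewhere in the same SELECT list, so ROW_NUMBER cannot see it; the alias is only available outside of the CTE. That is why the expression is repeated in the ORDER BY. Note also the ASC ordering: row number 1 is the row with the smallest difference, so deleting RN > 1 keeps the closest match.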
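To see the ranking in action, here is a minimal sketch that runs the same window expression against a tiny in-memory table. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable anywhere), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FullStats (Date TEXT, Code TEXT, Expiry TEXT, TheType TEXT,"
    " Strike REAL, mtm REAL, checkprice REAL)"
)
# Hypothetical sample data: the first two rows are near-duplicates.
conn.executemany(
    "INSERT INTO FullStats VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2019-01-02", "ABC", "2019-03", "C", 100.0, 5.0, 5.5),   # diff 0.5
        ("2019-01-02", "ABC", "2019-03", "C", 100.0, 5.25, 5.5),  # diff 0.25
        ("2019-01-02", "ABC", "2019-03", "P", 100.0, 3.0, 4.0),   # no duplicate
    ],
)

# The expression is repeated in ORDER BY because the alias Diff
# is not visible inside the same SELECT list.
query = """
WITH CTE AS (
    SELECT *,
           ABS(mtm - checkprice) AS Diff,
           ROW_NUMBER() OVER (
               PARTITION BY Date, Code, Expiry, TheType, Strike
               ORDER BY ABS(mtm - checkprice) ASC
           ) AS RN
    FROM FullStats
)
SELECT TheType, mtm, Diff, RN FROM CTE ORDER BY TheType, RN
"""
rows = conn.execute(query).fetchall()

# Rows with RN = 1 are the ones that would survive DELETE ... WHERE RN > 1.
kept = [(the_type, mtm) for (the_type, mtm, diff, rn) in rows if rn == 1]
print(kept)
```

In this toy data the surviving row for the duplicated key is the one with mtm 5.25, since it is closest to its checkprice.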
Related
I am trying to query a table to find the record prior to the last one. So far I have tried various solutions, such as the code below, but I cannot make it work.
The fields I am trying to query are [ProjectName] and [TimeStamp], from the table [CMI_Industry_Workload].[dbo].[Gantt_Value].
The first field is a string field and the second one is a timestamp field.
select t.*
from (Select t.*
row_number() over (partition by [ProjectName] order by [Timestamp]) as seqnum,
count(*) over (partition by [ProjectName]) as cnt
from [CMI_Industry_Workload].[dbo].[Gantt_Value] t
) t
where seqnum in (1, cnt - 1, cnt);
So I expect to get not the most recent record but the one before it.
Thanks a lot
Gary
This should find the prior-to-last record in your table (assuming the data is sorted by the Timestamp field). It sorts the rows by Timestamp in descending order, skips (offsets) one row (the very last), and fetches only one row, which is the one prior to the last.
SELECT T.*
FROM [CMI_Industry_Workload].[dbo].[Gantt_Value] AS T
ORDER BY [Timestamp] DESC
OFFSET 1 ROWS
FETCH NEXT 1 ROWS ONLY;
The query above works on SQL Server 2012 and later.
This is a version with ROW_NUMBER:
SELECT *
FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY [Timestamp] DESC) AS Seq
FROM [CMI_Industry_Workload].[dbo].[Gantt_Value]) AS T
WHERE T.Seq = 2;
This again orders by Timestamp descending and picks the second row (the one prior to the last).
Keep in mind that MSSQL 2008 is (or soon will be) out of support. I'd strongly encourage upgrading.
Update
Based on OP's comment, this must be the answer then:
SELECT *
FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY [ProjectName] ORDER BY [Timestamp] DESC) AS Seq
FROM [CMI_Industry_Workload].[dbo].[Gantt_Value]) AS T
WHERE T.Seq > 1;
Second update
If Timestamp values happen to be duplicated and you want to treat them as the same, you might want to use DENSE_RANK() instead of ROW_NUMBER(). It assigns the same sequence number to a row whose value matches the previous value within the sequence.
SELECT *
FROM ( SELECT *, DENSE_RANK() OVER (PARTITION BY [ProjectName] ORDER BY [Timestamp] DESC) AS Seq
FROM [CMI_Industry_Workload].[dbo].[Gantt_Value]) AS T
WHERE T.Seq > 1;
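The difference between the two functions is easiest to see with tied timestamps. The following is a rough sketch using Python's sqlite3 module in place of SQL Server (an assumption for runnability); the table contents and the Note column are made up, and the filter is Seq = 2 to pick only the prior-to-last row per project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE Gantt_Value (ProjectName TEXT, Timestamp TEXT, Note TEXT)")
# Hypothetical rows; project A has two rows sharing the latest Timestamp.
conn.executemany(
    "INSERT INTO Gantt_Value VALUES (?, ?, ?)",
    [
        ("A", "2019-01-01", "a1"),
        ("A", "2019-01-02", "a2"),
        ("A", "2019-01-03", "a3"),
        ("A", "2019-01-03", "a4"),  # same Timestamp as a3
        ("B", "2019-01-01", "b1"),
        ("B", "2019-01-05", "b2"),
    ],
)

def prior_to_last(rank_function):
    """Return the rows ranked second per project by the given window function."""
    sql = f"""
    SELECT ProjectName, Note
    FROM (SELECT *, {rank_function}() OVER (
              PARTITION BY ProjectName ORDER BY Timestamp DESC) AS Seq
          FROM Gantt_Value) AS T
    WHERE T.Seq = 2
    ORDER BY ProjectName, Note
    """
    return conn.execute(sql).fetchall()

by_row_number = prior_to_last("ROW_NUMBER")
by_dense_rank = prior_to_last("DENSE_RANK")
print(by_row_number)
print(by_dense_rank)
```

With ROW_NUMBER the two tied rows of project A get sequence numbers 1 and 2 in an unspecified order, so the "second" row is one of the ties. With DENSE_RANK both ties share rank 1 and the row with the earlier timestamp (a2) is ranked second.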
I have a sql-server table like this:
date : date
symbol : string
open : money
...
In the act of collecting historical data, I may have accidentally added the same data for a given date more than once. I need to keep one of the rows. But any more than one entry for the given symbol on a given date needs to be deleted. For example, this is wrong (two entries for INTC on 2/2/2019):
1/31/2019 INTC 48.32
2/2/2019 INTC 49.51
2/2/2019 INTC 49.51
How do I delete, per each symbol, duplicate rows automatically through a sql script and leave the rest of the data that does not contain duplicates alone?
You can use some CTE "magic":
WITH CTE AS(
SELECT [date], [Symbol], [open],
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN
FROM YourTable
WHERE [date] = '20190202'
AND [Symbol] = 'INTC'
AND [open] = 49.51)
DELETE FROM CTE
WHERE RN > 1;
If you want to DELETE any duplicates you've created and assuming that a duplicate denotes 2 or more rows that share the same values for date, symbol and open, then you can do:
WITH CTE AS(
SELECT [date], [Symbol], [open],
ROW_NUMBER() OVER (PARTITION BY [date], [Symbol], [open] ORDER BY (SELECT NULL)) AS RN
FROM YourTable)
DELETE FROM CTE
WHERE RN > 1;
If you should only have one entry per day (or day and symbol perhaps), then create it as a UNIQUE constraint:
ALTER TABLE YourTable ADD CONSTRAINT UK_date_symbol UNIQUE ([date],symbol);
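For anyone who wants to try the idea without a SQL Server instance, here is a hedged sketch in Python with sqlite3. SQLite cannot delete through a CTE or add a constraint with ALTER TABLE, so this swaps in two substitutes: a DELETE keyed on SQLite's implicit rowid, and a unique index in place of the UNIQUE constraint. The data mirrors the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute('CREATE TABLE YourTable ("date" TEXT, Symbol TEXT, "open" REAL)')
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("2019-01-31", "INTC", 48.32),
        ("2019-02-02", "INTC", 49.51),
        ("2019-02-02", "INTC", 49.51),  # accidental duplicate
    ],
)

# Number the rows inside each (date, Symbol, open) group and delete all but
# the first. SQLite cannot DELETE FROM a CTE, so the rows are matched by rowid.
conn.execute(
    """
    DELETE FROM YourTable
    WHERE rowid IN (
        SELECT rid
        FROM (SELECT rowid AS rid,
                     ROW_NUMBER() OVER (
                         PARTITION BY "date", Symbol, "open" ORDER BY rowid
                     ) AS RN
              FROM YourTable)
        WHERE RN > 1
    )
    """
)

# SQLite equivalent of the UNIQUE constraint: a unique index.
conn.execute('CREATE UNIQUE INDEX UK_date_symbol ON YourTable ("date", Symbol)')

try:
    conn.execute("INSERT INTO YourTable VALUES ('2019-02-02', 'INTC', 49.51)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

remaining = conn.execute(
    'SELECT "date", Symbol, "open" FROM YourTable ORDER BY "date"'
).fetchall()
print(remaining, duplicate_rejected)
```

The uniqueness rule can only be created after the existing duplicates are gone, which is why the cleanup runs first.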
I have the following query. I am trying to get the Row # to increment whenever the value in Value1 field changes. The SensorData table has 2800 records and the Value1 is either 0 or 3 and changes throughout the day.
SELECT
ROW_NUMBER() OVER(PARTITION BY Value1 ORDER BY Block ASC) AS Row#,
GatewayDetailID, Block, Value1
FROM
SensorData
ORDER BY
Row#
I get the following results:
It seems like it creates only 2 partitions, 0 and 3. It is not restarting the row number every time Value1 changes.
First, instead of creating a permanent table, I just used a temp table.
Given your example, here is what I came up with:
WITH CTE as(
select ROW_NUMBER() OVER(ORDER BY BLOCK) RN, LAG(Value1,1,VALUE1) OVER (ORDER BY BLOCK) LG,
GatewayDetailID, Block, Value1,Value2,Vaule3
from #tmp
),
CTE2 as (
select *, CASE WHEN LG <> VALUE1 THEN RN ELSE 0 END RowMark
from cte
),
CTE3 AS (
select MIN(Block) BL, RowMark from CTE2
GROUP BY ROwMark
),
CTE4 AS (
SELECT GatewayDetailID,Block,Value1,Value2,Vaule3,RMM from cte2 t1
CROSS APPLY (SELECT MAX(ROWMark) RMM FROM CTE3 t9 where t1.Block >= t9.ROwMark and t1.RN >= t9.RowMark) t2
)
SELECT GateWayDetailID,Block,Value1,Value2,Vaule3, ROW_NUMBER() OVER(Partition by RMM ORDER BY BLOCK) RN
FROM CTE4
ORDER BY BLOCK
I first had to get a row number for all the rows; then, wherever Value1 changed, I marked that row as the start of a new group. From that I created a CTE with the block and row boundary for each group. Lastly, I cross applied that back to the table to find which group each row belongs to.
From that last CTE I simply applied a ROW_NUMBER() function partitioned by each RowMark group and poof... row numbers.
There may be a better way to do this, but this was how I logically worked through the problem.
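One commonly used alternative, which is a different technique from the CTE chain above, is to flag each row where Value1 differs from the previous row with LAG, turn the flags into a group id with a running SUM, and then number the rows inside each group. The sketch below is only an illustration: it runs on Python's sqlite3 module rather than SQL Server, uses a cut-down SensorData table with invented values, and the names IsNewRun, RunId and RowNum are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE SensorData (Block INTEGER, Value1 INTEGER)")
# Hypothetical readings: Value1 flips between 0 and 3 several times.
conn.executemany(
    "INSERT INTO SensorData VALUES (?, ?)",
    [(1, 0), (2, 0), (3, 3), (4, 3), (5, 3), (6, 0), (7, 3)],
)

query = """
WITH Marked AS (
    -- 1 when the value differs from the previous row (or there is no
    -- previous row), otherwise 0.
    SELECT Block, Value1,
           CASE WHEN Value1 = LAG(Value1) OVER (ORDER BY Block)
                THEN 0 ELSE 1 END AS IsNewRun
    FROM SensorData
),
Grouped AS (
    -- The running total of the flags gives every consecutive run its own id.
    SELECT Block, Value1,
           SUM(IsNewRun) OVER (ORDER BY Block) AS RunId
    FROM Marked
)
SELECT Block, Value1,
       ROW_NUMBER() OVER (PARTITION BY RunId ORDER BY Block) AS RowNum
FROM Grouped
ORDER BY Block
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The row number restarts at 1 at blocks 3, 6 and 7, which are the points where Value1 changes. This relies on LAG, so on SQL Server it would need 2012 or later.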
Struggling with what's probably a very simple problem. I have a query like this:
;WITH rankedData
AS ( /* a big, complex subquery */ )
SELECT UserId,
AttributeId,
ItemId
FROM rankedData
WHERE rank = 1
ORDER BY datEventDate DESC
The sub-query is designed to grab a big chunk of interlinked data and rank it by ItemId and date, so that the rank = 1 filter in the above query ensures we only get unique ItemIds, ordered by date. The partition is:
Rank() OVER (partition BY ItemId ORDER BY datEventDate DESC) AS rk
The problem is that what I want is the top 75 records for each UserID, ordered by date. Seeing as I've already got a rank inside my sub-query to sort out item duplicates by date, I can't see a straightforward way of doing this.
Cheers,
Matt
I think your query should look like
SELECT t.UserId, t.AttributeId, t.ItemId
FROM (
SELECT UserId, AttributeId, ItemId, rowid = ROW_NUMBER() OVER (
PARTITION BY UserId ORDER BY datEventDate
)
FROM rankedData
) t
WHERE t.rowid <= 75
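As a small runnable illustration of the top-N-per-group pattern, the following sketch uses Python's sqlite3 module instead of SQL Server, a simplified rankedData table with invented rows, and a limit of 2 rather than 75. It orders by datEventDate DESC on the assumption that "top" means the most recent rows; drop the DESC if the oldest rows are wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute(
    "CREATE TABLE rankedData (UserId INTEGER, ItemId INTEGER, datEventDate TEXT)"
)
# Hypothetical rows: user 1 has three items, user 2 has one.
conn.executemany(
    "INSERT INTO rankedData VALUES (?, ?, ?)",
    [
        (1, 10, "2019-01-01"),
        (1, 11, "2019-01-03"),
        (1, 12, "2019-01-02"),
        (2, 20, "2019-01-05"),
    ],
)

TOP_N = 2  # stands in for the 75 from the question

query = """
SELECT t.UserId, t.ItemId
FROM (SELECT UserId, ItemId,
             ROW_NUMBER() OVER (
                 PARTITION BY UserId ORDER BY datEventDate DESC
             ) AS rn
      FROM rankedData) AS t
WHERE t.rn <= ?
ORDER BY t.UserId, t.rn
"""
rows = conn.execute(query, (TOP_N,)).fetchall()
print(rows)
```

Each user gets at most TOP_N rows, and a user with fewer rows simply returns all of them.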
How to first filter the result based on params then to apply where-between?
Something like:
With Results as
(
Select colName,Title, Row_Number(Over...) as row from a table where colName=5
)
Select * from Results
where
row between #first and #last
But it does not work. If I move the where colName=5 from the with clause to the outer query, I get wrong data, because it first takes the rows between #first and #last and only then searches for colName=5.
I also want the count of Results.
Any ideas?
You can use COUNT(*) OVER() to get the count of the unfiltered results
WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN,
count(*) over() AS [Count]
from master..spt_values
)
SELECT name, number,[Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Returns
name                                number      Count
----------------------------------- ----------- -----------
VIEW                                8278        2506
VIEW                                8278        2506
view                                2           2506
varchar                             3           2506
varbinary                           1           2506
This has performance implications though. You might want to just calculate the COUNT up front and cache it somewhere rather than recalculating it for every page request.
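Here is a minimal sketch of the filter-then-page pattern with the total attached to every row. It uses Python's sqlite3 module as a stand-in for SQL Server, and the items table, its contents and the page bounds are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE items (name TEXT, colName INTEGER)")
# Hypothetical data: ten rows match colName = 5, three do not.
rows_in = [(f"item{i:02d}", 5) for i in range(1, 11)]
rows_in += [("other1", 7), ("other2", 7), ("other3", 7)]
conn.executemany("INSERT INTO items VALUES (?, ?)", rows_in)

# The WHERE inside the CTE runs before the window functions, so both the
# row numbers and the total describe the filtered set, not the whole table.
query = """
WITH cte AS (
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY name DESC) AS RN,
           COUNT(*) OVER () AS total
    FROM items
    WHERE colName = 5
)
SELECT name, RN, total
FROM cte
WHERE RN BETWEEN ? AND ?
ORDER BY RN
"""
page = conn.execute(query, (3, 5)).fetchall()
print(page)
```

The page holds rows 3 to 5 of the filtered set, and each row carries the total of 10 matching rows, so the page filter does not shrink the count.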
Your ROW_NUMBER syntax is incorrect. It should be this:
With Results as
(
SELECT colName, Title, ROW_NUMBER() OVER (ORDER BY ...) AS RN
FROM your_table
WHERE colName = 5
)
SELECT * FROM Results
WHERE rn BETWEEN #first AND #last
ORDER BY rn
See the documentation for more information.
I use an approach very similar to Martin Smith's (currently the selected answer), and at least in the tests I've made it gives better performance.
; WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN
from master..spt_values
)
SELECT name, number, (SELECT COUNT(*) FROM cte) AS [Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Run this and his queries side by side and compare execution plans.