SQL Newbie - Over Partition? - sql-server

I have the following query. I am trying to get the Row # to increment whenever the value in Value1 field changes. The SensorData table has 2800 records and the Value1 is either 0 or 3 and changes throughout the day.
SELECT
ROW_NUMBER() OVER(PARTITION BY Value1 ORDER BY Block ASC) AS Row#,
GatewayDetailID, Block, Value1
FROM
SensorData
ORDER BY
Row#
I get the following results:
It seems like it creates only 2 partitions 0 and 3. It is not restarting the row number every time the value 1 changes.?

First instead of creating a permanent table I just changed it to a Temp table.
So, Given your example here is what I came up with:
WITH CTE as(
select ROW_NUMBER() OVER(ORDER BY BLOCK) RN, LAG(Value1,1,VALUE1) OVER (ORDER BY BLOCK) LG,
GatewayDetailID, Block, Value1,Value2,Vaule3
from #tmp
),
CTE2 as (
select *, CASE WHEN LG <> VALUE1 THEN RN ELSE 0 END RowMark
from cte
),
CTE3 AS (
select MIN(Block) BL, RowMark from CTE2
GROUP BY ROwMark
),
CTE4 AS (
SELECT GatewayDetailID,Block,Value1,Value2,Vaule3,RMM from cte2 t1
CROSS APPLY (SELECT MAX(ROWMark) RMM FROM CTE3 t9 where t1.Block >= t9.ROwMark and t1.RN >= t9.RowMark) t2
)
SELECT GateWayDetailID,Block,Value1,Value2,Vaule3, ROW_NUMBER() OVER(Partition by RMM ORDER BY BLOCK) RN
FROM CTE4
ORDER BY BLOCK
I first had to get a Row number for all the rows, then depending on when the Value1 changed I marked that as a new group. From that I created a CTE with the date and row boundry for each group. And then lastly I cross applied that back to the table to find each row in each group.
From that last CTE I merely just applied a simple ROW_NUMBER() function portioned by each RowMarker group and poof....row numbers.
There may be a better way to do this, but this was how I logically worked through the problem.

Related

SQL select the row with max value using row_number() or rank()

I have data of following kind:
RowId Name Value
1 s1 12
22 s1 3
13 s1 4
10 s2 14
22 s2 5
3 s2 100
I want to have the following output:
RowId Name Value
1 s1 12
3 s2 100
I am currently using temp tables to get this in two step. I have been trying to use row_number() and rank() functions but have not been successful.
Can someone please help me with syntax as I feel row_number() and rank() will make it cleaner?
Edit:
I changed the rowId to make it a general case
Edit:
I am open to ideas better than row_number() and rank() if there are any.
If you use rank() you can get multiple results when a name has more than 1 row with the same max value. If that is what you are wanting, then switch row_number() to rank() in the following examples.
For the highest value per name (top 1 per group), using row_number()
select sub.RowId, sub.Name, sub.Value
from (
select *
, rn = row_number() over (
partition by Name
order by Value desc
)
from t
) as sub
where sub.rn = 1
I can not say that there are any 'better' alternatives, but there are alternatives. Performance may vary.
cross apply version:
select distinct
x.RowId
, t.Name
, x.Value
from t
cross apply (
select top 1
*
from t as i
where i.Name = t.Name
order by i.Value desc
) as x;
top with ties using row_number() version:
select top 1 with ties
*
from t
order by
row_number() over (
partition by Name
order by Value desc
)
This inner join version has the same issue as using rank() instead of row_number() in that you can get multiple results for the same name if a name has more than one row with the same max value.
inner join version:
select t.*
from t
inner join (
select MaxValue = max(value), Name
from t
group by Name
) as m
on t.Name = m.Name
and t.Value = m.MaxValue;
If you really want to use ROW_NUMBER() you can do it this way:
With Cte As
(
Select *,
Row_Number() Over (Partition By Name Order By Value Desc) RN
From YourTable
)
Select RowId, Name, Value
From Cte
Where RN = 1;
Unless I'm missing something... Why use row_number() or rank?
select rowid, name, max(value) as value
from table
group by rowid, name

Deleting duplicates in a time series

I have a large set of measurements taken every 1 millisecond stored in a SQL Server 2012 table. Whenever there are 3 or more duplicate values in some rows that I would like to delete the middle duplicates. Highlighted values in this image of sample data are the ones that I want to delete. Is there a way to do this with a SQL query?
You can do this using a CTE and ROW_NUMBER:
SQL Fiddle
WITH CteGroup AS(
SELECT *,
grp = ROW_NUMBER() OVER(ORDER BY MS) - ROW_NUMBER() OVER(PARTITION BY Value ORDER BY MS)
FROM YourTable
),
CteFinal AS(
SELECT *,
RN_FIRST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS),
RN_LAST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS DESC)
FROM CteGroup
)
DELETE
FROM CteFinal
WHERE
RN_FIRST > 1
AND RN_LAST > 1
I'm sure there must be a more efficient way to do this, but you could join the table to itself twice to find the previous and next value in the list, and then delete all of the entries where all three values are the same.
DELETE FROM tbl
WHERE ms IN
(
SELECT T.ms
FROM tbl T
INNER JOIN tbl T1 ON T.ms = T1.ms + 1
INNER JOIN tbl T2 ON T.ms = T2.ms - 1
WHERE T.value = T1.value AND T.value = T2.value
)
If the table is really big, I can see this blowing tempdb though.
Yes there is
select * from table group by table.field ->value

Ordering by an expression in a partition by

I have some almost duplicate data in my database (duplicates based on these 5 columns: Date, Code, Expiry, TheType, Strike, there are many more columns but they won't be counted towards labeling a record a duplicate). I want to keep only one record in each case and the one I want to keep is the one whose mtm column is closest to its checkprice column (i.e. minimize abs(mtm-checkprice)). So I think the CTE below gets pretty close if I can just order the partition by that expression. The way I tried gives me the error Invalid column name 'diff'.
WITH CTE AS(
SELECT *, ABS(Mtm - checkprice) as diff,
RN = ROW_NUMBER()OVER(PARTITION BY Date, Strike, Mtm, /* ALL THE OTHER COLUMN NAMES */
ORDER BY diff DESC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
Any ideas on how to rectify this?
Use the ABS(mtm-checkprice) in the ORDER BY of the ROW_NUMBER:
WITH CTE AS(
SELECT *, Diff = ABS(mtm-checkprice),
RN = ROW_NUMBER()OVER(PARTITION BY Date, Code, Expiry, TheType, Strike
ORDER BY ABS(mtm-checkprice) ASC)
FROM FullStats
)
--DELETE FROM CTE WHERE RN > 1
SELECT * FROM CTE WHERE RN > 1
ORDER BY Date, Code, Expiry, TheType, Strike
You cannot access Diff in the ROW_NUMBER, only outside of the CTE.

SQL Update with row_number()

I want to update my column CODE_DEST with an incremental number. I have:
CODE_DEST RS_NOM
null qsdf
null sdfqsdfqsdf
null qsdfqsdf
I would like to update it to be:
CODE_DEST RS_NOM
1 qsdf
2 sdfqsdfqsdf
3 qsdfqsdf
I have tried this code:
UPDATE DESTINATAIRE_TEMP
SET CODE_DEST = TheId
FROM (SELECT Row_Number() OVER (ORDER BY [RS_NOM]) AS TheId FROM DESTINATAIRE_TEMP)
This does not work because of the )
I have also tried:
WITH DESTINATAIRE_TEMP AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST=RN
But this also does not work because of union.
How can I update a column using the ROW_NUMBER() function in SQL Server 2008 R2?
One more option
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
DECLARE #id INT
SET #id = 0
UPDATE DESTINATAIRE_TEMP
SET #id = CODE_DEST = #id + 1
GO
try this
http://www.mssqltips.com/sqlservertip/1467/populate-a-sql-server-column-with-a-sequential-number-not-using-an-identity/
With UpdateData As
(
SELECT RS_NOM,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST = RN
FROM DESTINATAIRE_TEMP
INNER JOIN UpdateData ON DESTINATAIRE_TEMP.RS_NOM = UpdateData.RS_NOM
Your second attempt failed primarily because you named the CTE same as the underlying table and made the CTE look as if it was a recursive CTE, because it essentially referenced itself. A recursive CTE must have a specific structure which requires the use of the UNION ALL set operator.
Instead, you could just have given the CTE a different name as well as added the target column to it:
With SomeName As
(
SELECT
CODE_DEST,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE SomeName SET CODE_DEST=RN
This is a modified version of #Aleksandr Fedorenko's answer adding a WHERE clause:
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
WHERE x.CODE_DEST <> x.New_CODE_DEST AND x.CODE_DEST IS NOT NULL
By adding a WHERE clause I found the performance improved massively for subsequent updates. Sql Server seems to update the row even if the value already exists and it takes time to do so, so adding the where clause makes it just skip over rows where the value hasn't changed. I have to say I was astonished as to how fast it could run my query.
Disclaimer: I'm no DB expert, and I'm using PARTITION BY for my clause so it may not be exactly the same results for this query. For me the column in question is a customer's paid order, so the value generally doesn't change once it is set.
Also make sure you have indexes, especially if you have a WHERE clause on the SELECT statement. A filtered index worked great for me as I was filtering based on payment statuses.
My query using PARTITION by
UPDATE UpdateTarget
SET PaidOrderIndex = New_PaidOrderIndex
FROM
(
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
) AS UpdateTarget
WHERE UpdateTarget.PaidOrderIndex <> UpdateTarget.New_PaidOrderIndex AND UpdateTarget.PaidOrderIndex IS NOT NULL
-- test to 'break' some of the rows, and then run the UPDATE again
update [order] set PaidOrderIndex = 2 where PaidOrderIndex=3
The 'IS NOT NULL' part isn't required if the column isn't nullable.
When I say the performance increase was massive I mean it was essentially instantaneous when updating a small number of rows. With the right indexes I was able to achieve an update that took the same amount of time as the 'inner' query does by itself:
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
I did this for my situation and worked
WITH myUpdate (id, myRowNumber )
AS
(
SELECT id, ROW_NUMBER() over (order by ID) As myRowNumber
FROM AspNetUsers
WHERE UserType='Customer'
)
update AspNetUsers set EmployeeCode = FORMAT(myRowNumber,'00000#')
FROM myUpdate
left join AspNetUsers u on u.Id=myUpdate.id
Simple and easy way to update the cursor
UPDATE Cursor
SET Cursor.CODE = Cursor.New_CODE
FROM (
SELECT CODE, ROW_NUMBER() OVER (ORDER BY [CODE]) AS New_CODE
FROM Table Where CODE BETWEEN 1000 AND 1999
) Cursor
If table does not have relation, just copy all in new table with row number and remove old and rename new one with old one.
Select RowNum = ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) , * INTO cdm.dbo.SALES2018 from
(
select * from SALE2018) as SalesSource
In my case I added a new column and wanted to update it with the equevilat record number for the whole table
id name new_column (ORDER_NUM)
1 Ali null
2 Ahmad null
3 Mohammad null
4 Nour null
5 Hasan null
6 Omar null
I wrote this query to have the new column populated with the row number
UPDATE My_Table
SET My_Table.ORDER_NUM = SubQuery.rowNumber
FROM (
SELECT id ,ROW_NUMBER() OVER (ORDER BY [id]) AS rowNumber
FROM My_Table
) SubQuery
INNER JOIN My_Table ON
SubQuery.id = My_Table.id
after executing this query I had 1,2,3,... numbers in my new column
I update a temp table with the first occurrence of part where multiple parts can be associated with a sequence number. RowId=1 returns the first occurence which I join the tmp table and data using part and sequence number.
update #Tmp
set
#Tmp.Amount=#Amount
from
(SELECT Part, Row_Number() OVER (ORDER BY [Part]) AS RowId FROM #Tmp
where Sequence_Num=#Sequence_Num
)data
where data.Part=#Tmp.Part
and data.RowId=1
and #Tmp.Sequence_Num=#Sequence_Num
I don't have a running ID in order to do what "Basheer AL-MOMANI" suggested.
I did something like this: (joined my table on myself, just to get the Row Number)
update T1 set inID = T2.RN
from (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T1
inner join (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T2 on T2.RN = T1.RN

Filter first then select page

How to first filter the result based on params then to apply where-between?
Some thing like
With Results as
(
Select colName,Title, Row_Number(Over...) as row from a table where colName=5
)
Select * from Results
where
row between #first and #last
But it does not works. I need to move my where colName=5 from with clause to outside then I got wrong data as It first get rows between #first n #last then search for colName=5.
Also I want count of Results.
Any idea?
You can use COUNT(*) OVER() to get the count of the unfiltered results
WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN,
count(*) over() AS [Count]
from master..spt_values
)
SELECT name, number,[Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Returns
name number Count
----------------------------------- ----------- -----------
VIEW 8278 2506
VIEW 8278 2506
view 2 2506
varchar 3 2506
varbinary 1 2506
This has performance implications though. You might want to just calculate the COUNT up front and cache it somewhere rather than recalculating it for every page request.
Your ROW_NUMBER syntax is incorrect. It should be this:
With Results as
(
SELECT colName, Title, ROW_NUMBER() OVER (ORDER BY ...) AS RN
FROM your_table
WHERE colName = 5
)
SELECT * FROM Results
WHERE rn BETWEEN #first AND #last
ORDER BY rn
See the documentation for more information.
I use approach very similar to Martin Smiths (currently selected answer) and at least in the tests I've made it gives better performance results.
; WITH cte as
(
select *,
ROW_NUMBER() over (order by name desc) AS RN
from master..spt_values
)
SELECT name, number, (SELECT COUNT(*) FROM cte) AS [Count]
FROM cte
WHERE RN BETWEEN 20 AND 24
Run this and his queries side by side and compare execution plans.

Resources