T-SQL - get only latest row for selected condition

I have a table of measurements with the columns SERIAL_NBR, DATE_TIME, and VALUE.
There is a lot of data, so fetching the last 48 hours for 2000 devices with
Select * from MY_TABLE where [TIME] >= DATEADD(hh, -48, @TimeNow)
takes a very long time.
Is there a way to get only the latest entry for each device instead of all of its rows? Would that speed up the query?

Assuming there is a column named deviceId (change as per your needs), you can use top 1 with ties together with the window function row_number:
Select top 1 with ties *
from MY_TABLE
where [TIME] >= DATEADD(hh, -48, @TimeNow)
Order by row_number() over (
partition by deviceId
order by [TIME] desc
);

You can simply create a Common Table Expression that numbers the entries of each device from newest to oldest, and then pick the latest one from there.
;WITH numbered
AS ( SELECT [SERIAL_NBR], [TIME], [VALUE], row_nr = ROW_NUMBER() OVER (PARTITION BY [SERIAL_NBR] ORDER BY [TIME] DESC)
FROM MY_TABLE
WHERE [TIME] >= DATEADD(hh, -48, @TimeNow) )
SELECT [SERIAL_NBR], [TIME], [VALUE]
FROM numbered
WHERE row_nr = 1 -- we want the latest record only
Depending on the amount of data and the indexes available, this might or might not be faster than Anthony Hancock's answer.
Similar to his answer, you might also try the following
(from MSSQL's point of view, the query below and Anthony's query are pretty much identical, and they'll probably end up with the same query plan):
SELECT M.[SERIAL_NBR], M.[TIME], M.[VALUE]
FROM MY_TABLE AS M
JOIN (SELECT [SERIAL_NBR], max_time = MAX([TIME])
FROM MY_TABLE
GROUP BY [SERIAL_NBR]) AS L -- latest
ON L.[SERIAL_NBR] = M.[SERIAL_NBR]
AND L.max_time = M.[TIME]
WHERE M.[TIME] >= DATEADD(hh, -48, @TimeNow)

Your listed column values and your code don't quite match up so you'll probably have to change this code a little, but it sounds like for each SERIAL_NBR you want the record with the highest DATE_TIME in the last 48 hours. This should achieve that result for you.
SELECT SERIAL_NBR,DATE_TIME,VALUE
FROM MY_TABLE AS M
WHERE M.DATE_TIME >= DATEADD(hh, -48, @TimeNow)
AND M.DATE_TIME = (SELECT MAX(_M.DATE_TIME) FROM MY_TABLE AS _M WHERE M.SERIAL_NBR = _M.SERIAL_NBR)

This will get you details of the latest record per serial number:
Select t.SERIAL_NBR, q.FieldsYouWant
from MY_TABLE t
outer apply
(
select top 1 t2.FieldsYouWant
from MY_TABLE t2
where t2.SERIAL_NBR = t.SERIAL_NBR
order by t2.[TIME] desc
) q
where t.[TIME] >= DATEADD(hh, -48, @TimeNow)
Also, it is worth putting DATEADD(hh, -48, @TimeNow) into a variable rather than calculating it inline.
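The advice about computing the cutoff once could look roughly like the following. This is only a sketch: the variable names @TimeNow and @Cutoff, the datetime type, and the use of [TIME] as the timestamp column are assumptions, so adjust them to the real schema.

```sql
-- Assumed names and types: @TimeNow, @Cutoff, datetime, and the [TIME] column.
DECLARE @TimeNow datetime = GETDATE();
DECLARE @Cutoff  datetime = DATEADD(hour, -48, @TimeNow);  -- computed once

;WITH numbered AS (
    SELECT SERIAL_NBR, [TIME], [VALUE],
           -- 1 = newest row of each device within the 48-hour window
           ROW_NUMBER() OVER (PARTITION BY SERIAL_NBR ORDER BY [TIME] DESC) AS row_nr
    FROM MY_TABLE
    WHERE [TIME] >= @Cutoff
)
SELECT SERIAL_NBR, [TIME], [VALUE]
FROM numbered
WHERE row_nr = 1;
```

Whether this is actually faster depends on the data volume and the indexes available, so it is worth comparing the execution plans.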

Related

How can I get, and re-use, the next 6 dates from today in SQL Server?

I need to create a CTE I can re-use that will hold seven dates. That is today and the next six days.
So, output for today (4/22/2022) should be:
2022-04-22
2022-04-23
2022-04-24
2022-04-25
2022-04-26
2022-04-27
2022-04-28
So far, I have this:
WITH seq AS
(
SELECT 0 AS [idx]
UNION ALL
SELECT [idx] + 1
FROM seq
WHERE [idx] < 6
)
SELECT DATEADD(dd, [idx], CONVERT(date, GETDATE()))
FROM seq;
The problem is my SELECT is outside the WITH, so I would need to wrap this whole thing with another WITH to re-use it, for example to JOIN on it as a list of dates, and I'm not having luck getting that nested WITH to work. How else could I accomplish this?
To be clear: I'm not trying to find records in a specific table full of dates that are from the next seven days. There are plenty of easy solutions for that. I need a list of dates for today and the next six days, that I can re-use in other queries as a CTE.

You're close. Here's an example:
with cte as (
select
1 as n
,GETDATE() as dt
union all
select
n+1
,DATEADD(dd,n,GETDATE()) as dt
from cte
where n <= 6
)
select * from cte
You can create a view for reusability and simply query the view rather than using the same CTE over and over again.

You can do this by adding a second column for the date to the CTE:
WITH seq AS (
SELECT 0 AS [idx], cast(current_timestamp as date) as date
UNION ALL
SELECT [idx] + 1, dateadd(dd, idx+1, cast(current_timestamp as date))
FROM seq
WHERE [idx] < 6
)
SELECT *
FROM seq;
See it here:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=208ecd76be2071529078f38b1735b0cd
Another option is you can "stack" CTEs, rather than nest, to avoid the second column:
WITH seq0 AS (
SELECT 0 AS [idx]
UNION ALL
SELECT [idx] + 1
FROM seq0
WHERE [idx] < 6
),
seq As (
SELECT dateadd(dd, idx, cast(current_timestamp as date)) as idx
FROM seq0
)
SELECT *
FROM seq;
Note how the final query only needed to reference the 2nd CTE.
See it here:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=22315438e4710792f368009cc6ff6451

I'd never recommend using recursion when you don't need it: it's more complex and slower. I'd just use a hardcoded list of numbers, which you could encapsulate in a TVF if you wanted to reuse it across different stored procedures/functions. If you need to reuse it in multiple places within one stored procedure, I'd just put it in a temp table.
CTE Version without Recursion
WITH cte_7days AS (
SELECT theDate = CAST(DATEADD(dd,num,GETDATE()) AS DATE)
FROM (VALUES (0),(1),(2),(3),(4),(5),(6)) AS A(num)
)
SELECT *
FROM cte_7days
CROSS APPLY Version to Remove Need for CTE
You could use something like this as your base query, and then add more joins below it depending on your query.
SELECT theDate
FROM (VALUES (0),(1),(2),(3),(4),(5),(6)) AS A(num)
CROSS APPLY (SELECT theDate = CAST(DATEADD(DAY,num,GETDATE()) AS DATE)) AS B
TVF Version
CREATE FUNCTION dbo.uf_7days()
RETURNS TABLE AS
RETURN
(
SELECT theDate
FROM (VALUES (0),(1),(2),(3),(4),(5),(6)) AS A(num)
CROSS APPLY (SELECT theDate = CAST(DATEADD(DAY,num,GETDATE()) AS DATE)) AS B
)
GO
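Once the function exists, it can be joined like any other table. A small usage sketch follows; dbo.Orders and its OrderDate column are hypothetical and stand in for whatever table needs the list of dates.

```sql
-- dbo.Orders and OrderDate are hypothetical names used only for illustration.
SELECT d.theDate,
       COUNT(o.OrderDate) AS ordersThatDay   -- 0 for days without matching rows
FROM dbo.uf_7days() AS d
LEFT JOIN dbo.Orders AS o
    ON CAST(o.OrderDate AS date) = d.theDate
GROUP BY d.theDate
ORDER BY d.theDate;
```

The LEFT JOIN keeps all seven dates in the result even when a day has no matching rows.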

update using over order by row_number()

I found some answers to ways to update using over order by, but not anything that solved my issue. In SQL Server 2014, I have a column of DATES (with inconsistent intervals down to the millisecond) and a column of PRICE, and I would like to update the column of OFFSETPRICE with the value of PRICE from 50 rows hence (ordered by DATES). The solutions I found have the over order by in either the query or the subquery, but I think I need it in both. Or maybe I'm making it more complicated than it is.
In this simplified example, if the offset was 3 rows hence then I need to turn this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, null
2018-01-03, 8.52, null
2018-02-15, 3.17, null
2018-02-24, 4.67, null
2018-03-18, 2.54, null
2018-04-09, 7.37, null
into this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, 3.17
2018-01-03, 8.52, 4.67
2018-02-15, 3.17, 2.54
2018-02-24, 4.67, 7.37
2018-03-18, 2.54, null
2018-04-09, 7.37, null
This post was helpful, and so far I have this code which works as far as it goes:
select dates, price, row_number() over (order by dates asc) as row_num
from pricetable;
I haven't yet figured out how to point the update value to the future ordered row. Thanks in advance for any assistance.

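LEAD is a useful window function for getting values from subsequent rows. (There is also LAG, which looks at preceding rows.) Here's a direct answer to your question, using an offset of 2 rows to match your sample output; for your real data, use LEAD(price, 50):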
;WITH cte AS (
SELECT dates, LEAD(price, 2) OVER (ORDER BY dates) AS offsetprice
FROM pricetable
)
UPDATE pricetable SET offsetprice = cte.offsetprice
FROM pricetable
INNER JOIN cte ON pricetable.dates = cte.dates
Since you asked about ROW_NUMBER, the following does the same thing:
;WITH cte AS (
SELECT dates, price, ROW_NUMBER() OVER (ORDER BY dates ASC) AS row_num
FROM pricetable
),
cte2 AS (
SELECT dates, price, (SELECT price FROM cte AS sq_cte WHERE row_num = cte.row_num + 2) AS offsetprice
FROM cte
)
UPDATE pricetable SET offsetprice = cte2.offsetprice
FROM pricetable
INNER JOIN cte2 ON pricetable.dates = cte2.dates
So, you could use ROW_NUMBER to sort the rows and then use that result to select a value 2 rows ahead. LEAD just does that very thing directly.
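As a possible shortcut, SQL Server also allows an UPDATE to target a CTE when the updated column maps directly to a column of a single base table, so the join back to pricetable can be dropped. A minimal sketch, assuming the table and column names from the question and the 50-row offset mentioned there (new_offsetprice is a name made up for this example):

```sql
;WITH cte AS (
    SELECT offsetprice,
           -- price from 50 rows ahead in date order; NULL for the last 50 rows
           LEAD(price, 50) OVER (ORDER BY dates) AS new_offsetprice
    FROM pricetable
)
UPDATE cte
SET offsetprice = new_offsetprice;
```

This variant does not rely on dates being unique for a join, although duplicate dates would still make the row order itself ambiguous.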

T-SQL CTE self-reference CROSS APPLY previous row by date with gaps

I have an updatable table holding a date-value sequence (say dbo.sequence) in SQL Server 2014. Dates are unique.
When new updates arrive, I want to distribute the values into different columns of a separate table (say dbo.distributed_values) according to certain conditions. For example, depending on whether the previous value from dbo.sequence is less or greater than the current one, the current value is either inserted into a specific column of dbo.distributed_values or that column becomes NULL.
Here is the main idea:
;WITH
CTE_tbl (date, value, val_1, val_2, val_3)
AS (
SELECT ... FROM dbo.distributed_values -- get latest values from database
UNION ALL
SELECT
SEQ.date,
SEQ.value,
CASE
WHEN ABS (SEQ.value - prev.value) >= 0.5
THEN SEQ.value
ELSE NULL
END AS val_1,
...
FROM dbo.sequence AS SEQ
CROSS APPLY (SELECT * FROM CTE_tbl WHERE date = DATEADD(DAY, -1, SEQ.date)) AS prev
)
INSERT INTO dbo.distributed_values (...)
SELECT *
FROM CTE_tbl
ORDER BY date ASC
OPTION (MAXRECURSION 1000)
It mostly seems to work, but dbo.sequence contains gaps, so I cannot use something like date = DATEADD(DAY, -1, SEQ.date) to bind to the previous row properly:
2012-01-04
2012-01-05
2012-01-06
2012-01-09
2012-01-10
2012-01-11
How can I bind to the previous value correctly when there are date gaps?
UPD:
By the way, I tried LAG ... OVER in the WHERE clause and it is not allowed there. Could it be used here somehow?

Add another CTE and use that in your recursive CTE, something like this:
;WITH
SequenceWithPrevious AS(
SELECT *
,PrevValue = LAG(value,1,NULL) OVER (ORDER BY SEQ.date)
,PrevDate = LAG(date,1,NULL) OVER (ORDER BY SEQ.date)
FROM dbo.sequence AS SEQ
),
CTE_tbl (date, value, val_1, val_2, val_3)
AS (
SELECT ... FROM dbo.distributed_values -- get latest values from database
UNION ALL
SELECT ...
FROM SequenceWithPrevious AS SEQ
CROSS APPLY (SELECT * FROM CTE_tbl WHERE date = SEQ.PrevDate) AS prev
)
INSERT INTO dbo.distributed_values (...)
SELECT *
FROM CTE_tbl
ORDER BY date ASC
OPTION (MAXRECURSION 1000)
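To see why LAG handles the gaps, here is a small self-contained sketch that runs it over the sample dates from the question. The inline VALUES list stands in for dbo.sequence and is only for illustration.

```sql
;WITH s AS (
    -- Stand-in for dbo.sequence, built from the sample dates in the question
    SELECT CAST(v.d AS date) AS [date]
    FROM (VALUES ('2012-01-04'), ('2012-01-05'), ('2012-01-06'),
                 ('2012-01-09'), ('2012-01-10'), ('2012-01-11')) AS v(d)
)
SELECT [date],
       LAG([date]) OVER (ORDER BY [date]) AS PrevDate  -- previous existing row
FROM s;
```

For 2012-01-09 the PrevDate column is 2012-01-06, the previous existing row rather than the previous calendar day, and the first row gets NULL.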

How can I order by count with pagination?

I have to migrate some SQL from PostgreSQL to SQL Server (2005+). On PostgreSQL I had:
select count(id) as count, date
from table
group by date
order by count
limit 10 offset 25
Now I need the same SQL for SQL Server. I tried the following, but I get the error: Invalid column name 'count'. How can I solve it?
select * from (
select row_number() over (order by count) as row, count(id) as count, date
from table
group by date
) a where a.row >= 25 and a.row < 35

You can't reference an alias by name at the same scope, except in a final ORDER BY (it is an invalid reference inside a windowing function at the same scope).
To get the same rows as LIMIT 10 OFFSET 25 (rows 26 through 35), it may need to be extended to (nesting scope for clarity):
SELECT c, d FROM
(
SELECT c, d, ROW_NUMBER() OVER (ORDER BY c) AS row FROM
(
SELECT d = [date], c = COUNT(id) FROM dbo.[table] GROUP BY [date]
) AS x
) AS y WHERE row > 25 AND row <= 35;
This can be shortened a little bit as per mohan's answer:
SELECT c, d FROM
(
SELECT COUNT(id), [date], ROW_NUMBER() OVER (ORDER BY COUNT(id))
FROM dbo.[table] GROUP BY [date]
) AS y(c, d, row)
WHERE row > 25 AND row <= 35;
In SQL Server 2012, it's much easier with OFFSET / FETCH - closer to the syntax you're used to, but actually using ANSI-compatible syntax rather than proprietary voodoo.
SELECT c = COUNT(id), d = [date]
FROM dbo.[table] GROUP BY [date]
ORDER BY COUNT(id)
OFFSET 25 ROWS FETCH NEXT 10 ROWS ONLY;
I blogged about this functionality in 2010 (lots of good comments there too) and should probably invest some time doing some serious performance tests.
And I agree with @ajon - I hope your real tables, columns and queries don't abuse reserved words like this.

This works:
DECLARE @startrow int = 26, @endrow int = 35 -- rows 26-35, i.e. LIMIT 10 OFFSET 25
;with CTE AS (
select row_number() over (order by count(id)) as row, count(id) AS count, date
from [table]
group by date
)
SELECT * FROM CTE
WHERE row between @startrow and @endrow

I think this will do it:
select * from (
select row_number() over (order by count(id)) as row, count(id) as count, date
from [table]
group by date
) a where a.row > 25 and a.row <= 35
Also, I don't know what version of SQL Server you are using, but SQL Server 2012 has a new paging feature (OFFSET / FETCH).

SQL Calculate (time) gap between occurrences in a log

I have tables that record when certain items were sent or returned to a particular location, and I want to work out the intervals between each time a particular item is returned.
Sample data:
Item ReturnDate:
Item1, 20120101
Item1, 20120201
Item1, 20120301
Item2, 20120401
Item2, 20120601
So in this case, we can see that there was a month's gap until Item 1 was returned the first time, and another month before it was returned the second time. Item 2 came back after 2 months.
My starting point is:
Select r1.Item, r1.ReturnDate, r2.Item, r2.ReturnDate, DateDiff(m, r1.ReturnDate, r2.ReturnDate)
from Returns r1
inner join Returns r2 on r2.Item = r1.Item
However, in this sample, each return is compared with every other return of the same item, not just the next one. So I need to limit this query so that it only compares adjacent returns.
One solution is to tag each return with a count (of the number of times that item has been returned):
Item ReturnDate, ReturnNo:
Item1, 20120101, 1
Item1, 20120201, 2
Item1, 20120301, 3
Item2, 20120401, 1
Item2, 20120601, 2
This would enable me to use the following T-SQL (or similar):
Select r1.Item, r1.ReturnDate, r2.Item, r2.ReturnDate, DateDiff(m, r1.ReturnDate, r2.ReturnDate)
from Returns r1
inner join Returns r2 on r2.Item = r1.Item
and (r1.ReturnNo + 1 = r2.ReturnNo)
My first question is whether this is a sensible/optimal approach, or whether there is a better one.
Secondly, what is the easiest/slickest means of calculating the ReturnNo?

If you are using SQL Server 2005+, use ROW_NUMBER() to do exactly what you want:
WITH RankedReturn AS
(
SELECT Item, ReturnDate,
ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate DESC) AS ReturnNo
FROM Returns
)
SELECT * FROM RankedReturn
Obviously, now that you have your CTE you can put whatever you need in the outer SELECT. I would use an OUTER APPLY for this:
WITH RankedReturn AS
(
SELECT Item, ReturnDate,
ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate DESC) AS ReturnNo
FROM Returns
)
SELECT rOuter.Item, rOuter.ReturnDate, DATEDIFF(month, prev.PrevDate, ReturnDate) AS Months
FROM RankedReturn rOuter
OUTER APPLY
(
SELECT ReturnDate AS PrevDate
FROM RankedReturn rInner
WHERE rOuter.Item = rInner.Item AND rOuter.ReturnNo = rInner.ReturnNo - 1
) prev
Edited because the month difference calculation was backwards; fixed now

Easiest way of calculating the ReturnNo would be to use OVER:
SELECT [Item], [ReturnDate],
ROW_NUMBER() OVER (PARTITION BY [Item] ORDER BY [ReturnDate]) AS ReturnNumber
FROM Returns
http://sqlfiddle.com/#!3/e18ad/1/0
You could also attempt to make use of the techniques for calculating a running total to work out the difference between two rows.

This is how I would do it:
select itemNo,
dt,
DATEDIFF(day, previousDt, dt) as daysSince
from (select itemNo,
dt,
(select top 1 dt from testTable where itemNo = outerTbl.itemNo and dt < outerTbl.dt order by dt desc) as previousDt
from testTable as outerTbl
) as x
... and here's a bit of setup code for anybody else testing a solution to this
create table testTable(
itemNo nvarchar(20),
dt datetime)
go
insert into testTable values('Item1', '2012-01-01');
insert into testTable values('Item1', '2012-02-01');
insert into testTable values('Item1', '2012-03-01');
insert into testTable values('Item2', '2012-04-01');
insert into testTable values('Item2', '2012-06-01');
go
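On SQL Server 2012 or later, LAG is another option that none of the answers above use. It reads the previous return date directly, without a self-join or correlated subquery. A minimal sketch against the testTable setup above:

```sql
SELECT itemNo,
       dt,
       DATEDIFF(day,
                LAG(dt) OVER (PARTITION BY itemNo ORDER BY dt),  -- previous return of the same item
                dt) AS daysSince
FROM testTable;
```

The first return of each item has no previous row, so its daysSince is NULL.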