Find the date when a bit column toggled state - sql-server

I have this requirement.
My table contains a series of rows with serial numbers (SNo), several bit columns, and a date-time.
To simplify, I will focus on one bit column. In essence, I need to know the most recent date that this bit was toggled.
Example: The following table depicts the bit values for 7 serials for the latest 6 days (10 to 5).
SQL Fiddle schema + query
I have successfully managed to get the result on a sample, but it takes ages on the real table containing over 30 million records and approximately 300K serial numbers.
Pseudocode:
For each serial:
Get the bit value at the max date as A (the latest bit value, e.g. 1)
Get the max date where the bit is NOT A as B (the most recent date the bit was, e.g., 0)
Get the min date > B
Group by SNo
I am sure an optimised approach exists.
For completeness, the dataset contains rows that I need to filter out, etc. However, I can add those filters later once the basic query executes more efficiently.
Thanks for your time!

with cte as
(
    -- number each serial's readings in date order so consecutive readings can be paired
    select *, rn = ROW_NUMBER() OVER (ORDER BY SNo, Device_date)
    from dbo.TestCape2
)
select MAX(y.Device_date) as MaxDate,
       y.SNo
from cte x
inner join cte as y
    on x.rn = y.rn + 1
   and x.SNo = y.SNo
   and x.Cape <> y.Cape
group by y.SNo
order by SNo;
And if you're using SQL Server 2012 and up, you can make use of LAG, which looks at the previous row.
select max(Device_date) as MaxDate,
       SNo
from (
    select SNo
         , Device_date
         , Cape
         , LAG(Cape, 1, 0) OVER (PARTITION BY SNo ORDER BY Device_date) AS PrevCape
         , LAG(SNo, 1, 0)  OVER (PARTITION BY SNo ORDER BY Device_date) AS PrevSno
    from dbo.TestCape2
) t
where SNo = PrevSno
  and t.Cape <> t.PrevCape
group by SNo
order by SNo;
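Given the 30 million rows and roughly 300K serial numbers mentioned in the question, both of the queries above lean heavily on reading each serial's rows in date order, so an index shaped for that access pattern may help. A minimal sketch, assuming the column names used above (the index name is made up):

CREATE NONCLUSTERED INDEX IX_TestCape2_SNo_DeviceDate
    ON dbo.TestCape2 (SNo, Device_date)   -- matches PARTITION BY SNo ORDER BY Device_date
    INCLUDE (Cape);                       -- covers the query so no key lookups are needed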

Related

T-SQL - get only latest row for selected condition

I have a table with measurements, with columns SERIAL_NBR, DATE_TIME, VALUE.
There is a lot of data, so when I need to get the last 48 hours for 2000 devices,
Select * from MY_TABLE where [TIME] >= DATEADD(hh, -48, @TimeNow)
takes a very long time.
Is there a way not to receive all the rows for each device, but only the latest entry? Would this speed up the query execution time?
Assuming that there is a column named deviceId (change as per your needs), you can use TOP 1 WITH TIES with the window function ROW_NUMBER:
Select top 1 with ties *
from MY_TABLE
where [TIME] >= DATEADD(hh, -48, @TimeNow)
Order by row_number() over (
    partition by deviceId
    order by [Time] desc
);
You can simply create a Common Table Expression that sorts and numbers the entries per device and then pick the latest one from there.
;WITH numbered
AS ( SELECT [SERIAL_NBR], [TIME], [VALUE],
            row_nr = ROW_NUMBER() OVER (PARTITION BY [SERIAL_NBR] ORDER BY [TIME] DESC)
     FROM MY_TABLE
     WHERE [TIME] >= DATEADD(hh, -48, @TimeNow) )
SELECT [SERIAL_NBR], [TIME], [VALUE]
FROM numbered
WHERE row_nr = 1 -- we want the latest record only
Depending on the amount of data and the indexes available this might or might not be faster than Anthony Hancock's answer.
Similar to his answer you might also try the following:
(from MSSQL's point of view, the below query and Anthony's query are pretty much identical and they'll probably end up with the same query plan)
SELECT [SERIAL_NBR], [TIME], [VALUE]
FROM MY_TABLE AS M
JOIN (SELECT [SERIAL_NBR], max_time = MAX([TIME])
      FROM MY_TABLE
      GROUP BY [SERIAL_NBR]) AS L -- latest
    ON L.[SERIAL_NBR] = M.[SERIAL_NBR]
   AND L.max_time = M.[TIME]
WHERE M.DATE_TIME >= DATEADD(hh, -48, @TimeNow)
Your listed column values and your code don't quite match up so you'll probably have to change this code a little, but it sounds like for each SERIAL_NBR you want the record with the highest DATE_TIME in the last 48 hours. This should achieve that result for you.
SELECT SERIAL_NBR,DATE_TIME,VALUE
FROM MY_TABLE AS M
WHERE M.DATE_TIME >= DATEADD(hh, -48, @TimeNow)
AND M.DATE_TIME = (SELECT MAX(_M.DATE_TIME) FROM MY_TABLE AS _M WHERE M.SERIAL_NBR = _M.SERIAL_NBR)
This will get you details of the latest record per serial number:
Select t.SERIAL_NBR, q.FieldsYouWant
from MY_TABLE t
outer apply
(
    select top 1 t2.FieldsYouWant
    from MY_TABLE t2
    where t2.SERIAL_NBR = t.SERIAL_NBR
    order by t2.[TIME] desc
) q
where t.[TIME] >= DATEADD(hh, -48, @TimeNow)
Also, it's worth sticking DATEADD(hh, -48, @TimeNow) into a variable rather than calculating it inline.
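A minimal sketch of that suggestion (assuming @TimeNow is already declared; the variable name @CutoffTime is just an example):

DECLARE @CutoffTime datetime = DATEADD(hh, -48, @TimeNow);  -- compute the cutoff once

SELECT *
FROM MY_TABLE
WHERE [TIME] >= @CutoffTime;  -- reuse the precomputed value in the filter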

T-SQL - Get last as-at date SUM(Quantity) was not negative

I am trying to find a way to get the last date, by location and product, that a sum was positive. The only way I can think to do it is with a cursor, and if that's the case I may as well just do it in code. Before I go down that route, I was hoping someone might have a better idea.
Table:
Product, Date, Location, Quantity
The scenario is: I find the quantity by location and product at a particular date; if it is negative, I need to get the sum and date when the group was last positive.
select
Product,
Location,
SUM(Quantity) Qty,
SUM(Value) Value
from
ProductTransactions PT
where
Date <= @AsAtDate
group by
Product,
Location
I am looking for the last date where the sum of the transactions previous to and including it is positive.
Based on your revised question and your comment, here is another solution that I hope answers your question.
select Product, Location, max(Date) as Date
from (
    select a.Product, a.Location, a.Date
    from ProductTransactions as a
    join ProductTransactions as b
        on a.Product = b.Product and a.Location = b.Location
    where b.Date <= a.Date
    group by a.Product, a.Location, a.Date
    having sum(b.Value) >= 0
) as T
group by Product, Location
The subquery (table T) produces a list of {product, location, date} rows for which the sum of the values prior (and inclusive) is positive. From that set, we select the last date for each {product, location} pair.
This can be done in a set based way using windowed aggregates in order to construct the running total. Depending on the number of rows in the table this could be a bit slow but you can't really limit the time range going backwards as the last positive date is an unknown quantity.
I've used a CTE for convenience to construct the aggregated data set but converting that to a temp table should be faster. (CTEs get executed each time they are called whereas a temp table will only execute once.)
The basic theory is to construct the running totals for all of the previous days using the OVER clause to partition and order the SUM aggregates. This data set is then used and filtered to the expected date. When a row in that table has a quantity less than zero it is joined back to the aggregate data set for all previous days for that product and location where the quantity was greater than zero.
Since this may return multiple positive date rows the ROW_NUMBER() function is used to order the rows based on the date of the positive quantity day. This is done in descending order so that row number 1 is the most recent positive day. It isn't possible to use a simple MIN() here because the MIN([Date]) may not correspond to the MIN(Quantity).
WITH x AS (
    SELECT [Date],
           Product,
           [Location],
           SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS Quantity,
           SUM([Value]) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS [Value]
    FROM ProductTransactions
    WHERE [Date] <= @AsAtDate
)
SELECT [Date], Product, [Location], Quantity, [Value], Positive_date, Positive_date_quantity
FROM (
    SELECT x1.[Date], x1.Product, x1.[Location], x1.Quantity, x1.[Value],
           x2.[Date] AS Positive_date, x2.[Quantity] AS Positive_date_quantity,
           ROW_NUMBER() OVER (PARTITION BY x1.Product, x1.[Location] ORDER BY x2.[Date] DESC) AS Positive_date_row
    FROM x AS x1
    LEFT JOIN x AS x2 ON x1.Product = x2.Product AND x1.[Location] = x2.[Location]
        AND x2.[Date] < x1.[Date] AND x1.Quantity < 0 AND x2.Quantity > 0
    WHERE x1.[Date] = @AsAtDate
) AS y
WHERE Positive_date_row = 1
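As mentioned above, materializing the aggregated set into a temp table rather than a CTE may be faster, since the CTE is re-evaluated for each reference. A rough sketch of that variant (same columns and @AsAtDate as above; the temp table name is made up):

SELECT [Date],
       Product,
       [Location],
       SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS Quantity,
       SUM([Value]) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS [Value]
INTO #RunningTotals                 -- materialize the running totals once
FROM ProductTransactions
WHERE [Date] <= @AsAtDate;

-- the outer SELECT is then run against #RunningTotals instead of the CTE x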
Do you mean that you want to get the last date on which the quantity sum for the group became positive?
For example, if you are using SQL Server 2012+:
In the following scenario, when the date reaches 01/03/2017 the running sum of quantity comes to 1 (-10+5+6).
Is it possible for the quantity on a following date to become negative again?
;WITH tb(Product, Location,[Date],Quantity) AS(
SELECT 'A','B',CONVERT(DATETIME,'01/01/2017'),-10 UNION ALL
SELECT 'A','B','01/02/2017',5 UNION ALL
SELECT 'A','B','01/03/2017',6 UNION ALL
SELECT 'A','B','01/04/2017',2
)
SELECT t.Product,t.Location,SUM(t.Quantity) AS Qty,MIN(CASE WHEN t.CurrentSum>0 THEN t.Date ELSE NULL END ) AS LastPositiveDate
FROM (
SELECT *, SUM(tb.Quantity) OVER (ORDER BY [Date]) AS CurrentSum FROM tb
) AS t GROUP BY t.Product,t.Location
Product Location Qty LastPositiveDate
------- -------- ----------- -----------------------
A B 3 2017-01-03 00:00:00.000

Identify sub-set of records based on date and rules in SQL Server

I have a dataset that looks like this:
I need to identify the rows that have Linked set to 1 but ONLY where they are together when sorted by ToDate descending as in the picture.
In other words I want to be able to identify these records (EDITED):
This is a simplified dataset, in fact there will be many more records...
The logic that defines whether a record is linked is that the FromDate of a record is within 8 weeks of the ToDate of the preceding record... but this is test data so it may not be perfect.
What's the best way to do that please?
You can use LAG() and LEAD() analytic functions:
SELECT * FROM (
SELECT t.*,
LAG(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_1, --Next one
LEAD(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_2, -- Last one,
LEAD(t.linked,2,0) OVER(ORDER BY t.FromDate DESC) as rnk_3 -- Last two,
FROM YourTable t) s
WHERE ((s.rnk_1 = 1 OR s.rnk_2 = 1) AND s.linked = 1) OR
(s.rnk_2 = 1 and s.rnk_3 = 1 and s.linked = 0)
ORDER BY s.FromDate DESC
This will result in records that have linked = 1 and the previous/next record is also 1.
Using LAG and LEAD functions you can examine the previous/next row values given a sort criteria.
You can achieve your required dataset using the following query:
;
WITH CTE_LagLead
AS (
SELECT FromDate,
ToDate,
NoOfDays,
Weeks,
Linked,
LAG(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLag,
LEAD(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLead
FROM #table
)
SELECT FromDate,
ToDate,
NoOfDays,
Weeks,
Linked
FROM CTE_LagLead
WHERE Linked = 1 AND
(LinkedLag = 1 OR
LinkedLead = 1)
ORDER BY ToDate DESC;
See working example
Here is the answer I came up with:
Select *
from #tmpAbsences
where idcol between 1 AND (Select TOP 1 idcol
                           from #tmpAbsences
                           where Linked = 0
                           order by idcol)  -- order by makes TOP 1 deterministic: the first non-linked row
this includes the row 7 in the below picture:
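As an aside, the question's rule that a record is linked when its FromDate falls within 8 weeks of the preceding record's ToDate could itself be computed with LAG. This is a sketch only, under the assumptions that "preceding" means the chronologically earlier record and that 8 weeks is 56 days (LinkedCalc is a made-up column name):

SELECT t.*,
       CASE WHEN DATEDIFF(day,
                          LAG(ToDate) OVER (ORDER BY ToDate),  -- ToDate of the previous record in date order
                          FromDate) <= 56                      -- within 8 weeks
            THEN 1 ELSE 0
       END AS LinkedCalc
FROM #tmpAbsences t
ORDER BY ToDate DESC;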

MS SQL Server Can Not Get A Select Sum Column Correct

I am using MS SQL Server Management Studio. What I am trying to do is get a sum as one of my columns for each record but that sum would only sum up values based on the values from the first two columns.
The query looks like this so far:
SELECT DISTINCT
BeginPeriod,
EndPeriod,
(
SUM((select FO_NumPages from tbl_Folder where FO_StatisticDateTime > BeginPeriod AND FO_StatisticDateTime < EndPeriod))
) AS PageCount
FROM
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime),0),101) AS BeginPeriod,
tbl_Folder.FO_PK_ID AS COL1ID
FROM
tbl_Folder
)AS ProcMonth1
INNER JOIN
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime)+1,0),101) AS EndPeriod,
tbl_Folder.FO_PK_ID AS COL2ID
FROM
tbl_Folder
)AS ProcNextMonth1
ON ProcMonth1.COL1ID = ProcNextMonth1.COL2ID
ORDER BY BeginPeriod DESC;
The table I am getting the data from would look something like this:
FO_StatisticsDateTime | FO_PK_ID | FO_NumPages
-------------------------------------------------
03/21/2013 | 24 | 5
04/02/2013 | 22 | 6
I want the sum to count the number of pages for each record that is between the beginning period and the end period for each record.
I understand that the SUM with the nested SELECT statement causes an aggregate error for those column values. But is there a way I can get that sum for each record?
I'm trusting that everything in the FROM clause works as you expect, and would suggest that this change to the top part of your query should get what you want:
SELECT DISTINCT
BeginPeriod,
EndPeriod,
(Select SUM(FO_NumPages)
from tbl_Folder f1
where f1.FO_StatisticDateTime >= ProcMonth1.BeginPeriod
AND f1.FO_StatisticDateTime <= ProcNextMonth1.EndPeriod
) AS PageCount
FROM
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime),0),101) AS BeginPeriod,
tbl_Folder.FO_PK_ID AS COL1ID
FROM
tbl_Folder
)AS ProcMonth1
INNER JOIN
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime)+1,0),101) AS EndPeriod,
tbl_Folder.FO_PK_ID AS COL2ID
FROM
tbl_Folder
)AS ProcNextMonth1
ON ProcMonth1.COL1ID = ProcNextMonth1.COL2ID
ORDER BY BeginPeriod DESC;
This should work:
select BeginDate,
EndDate,
SUM(tbl_Folder.FO_NumPages) AS PageCount
from (select distinct dateadd(month,datediff(month,0,FO_StatisticDateTime),0) BeginDate from tbl_Folder) begindates
join (select distinct dateadd(month,datediff(month,0,FO_StatisticDateTime)+1,0) EndDate from tbl_Folder) enddates
on BeginDate < EndDate
join tbl_Folder
on tbl_Folder.FO_StatisticDateTime >= BeginDate
and tbl_Folder.FO_StatisticDateTime < EndDate
group by BeginDate, EndDate
order by 1, 2
I changed your expressions that converted the dates, because the string comparisons won't work as expected.
It joins two sub-queries of distinct beginning and ending dates to get all the possible date combinations. Then it joins that with your data that falls between the dates so that you can come up with your sum.
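To illustrate why the varchar comparison in the original query misbehaves (a made-up example): CONVERT(varchar(12), ..., 101) produces mm/dd/yyyy strings, which compare character by character rather than chronologically.

-- April 2013 is later than December 2012, but as mm/dd/yyyy strings it sorts first:
SELECT CASE WHEN '04/01/2013' > '12/31/2012'
            THEN 'compares correctly'
            ELSE 'compares incorrectly'   -- this is what you get
       END AS StringDateComparison;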

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement, based on a column value from the n previous rows of the same result set.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
. . .
. . .
The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.
This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
SELECT
    id, number,
    1.0 * SUM(number) OVER (ORDER BY ID) /
    ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]  -- 1.0 * avoids integer division
FROM myTable
ORDER BY ID
A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query; you can UNION ALL those back in if you really want them in the row set. Self-join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds; the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table):
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.
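For completeness, on SQL Server 2012 and later the same trailing three-row average can be written directly with a window frame, avoiding both the self join and the correlated subquery. A sketch against the RowsToAverage test table defined earlier in this thread (not part of the original answers):

SELECT ID,
       Number,
       CASE WHEN ROW_NUMBER() OVER (ORDER BY ID) <= 3
            THEN NULL                                        -- first three rows have no full trailing window
            ELSE AVG(1.0 * Number) OVER (ORDER BY ID
                                         ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS MovingAverage
FROM RowsToAverage
ORDER BY ID;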
Check out some solutions here. I'm sure that you could adapt one of them easily enough.
If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.
Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Performance of selects is as fast as it goes. Of course, modifications are slower.
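Roughly, the denormalized approach keeps the precalculated value on the row itself, so reads become trivial while writes take on the maintenance cost. An illustrative sketch only (the column name is made up; maintaining the value on INSERT/UPDATE is the hard part the linked article covers):

ALTER TABLE RowsToAverage ADD RunningTotal int NULL;  -- precalculated running value stored per row

-- reading the running value is then a plain column access
SELECT ID, Number, RunningTotal
FROM RowsToAverage;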
