Calculate a Recursive Rolling Average in SQL Server

Calculate a Recursive Rolling Average in SQL Server - sql-server

We are attempting to calculate a rolling average and have tried to convert numerous SO answers to solve the problem. To this point we are still unsuccessful.
What we've tried:
Here are some of the SO answers we have considered.
SQL Server: How to get a rolling sum over 3 days for different customers within same table
SQL Query for 7 Day Rolling Average in SQL Server
T-SQL calculate moving average
Our latest attempt has been to modify one of the solutions (#4) found here.
https://www.red-gate.com/simple-talk/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/
Example:
Here is an example in SQL Fiddle: http://sqlfiddle.com/#!6/4570a/17
In the fiddle, we are still trying to get the SUM to work right but ultimately we are trying to get the average.
The end goal
Using the Fiddle example, we need to find the difference between Value1 and ComparisonValue1 and present it as Diff1. When a row has no Value1 available, we need to estimate it by taking the average of the last two Diff1 values and then add it to the ComparisonValue1 for that row.
With the correct query, the result would look like this:
GroupID Number ComparisonValue1 Diff1 Value1
5 10 54.78 2.41 57.19
5 11 55.91 2.62 58.53
5 12 55.93 2.78 58.71
5 13 56.54 2.7 59.24
5 14 56.14 2.74 58.88
5 15 55.57 2.72 58.29
5 16 55.26 2.73 57.99
Question: is it possible to calculate this average when it could potentially factor into the average of the following rows?
Update:
Added a VIEW to the Fiddle schema to simplify the final query.
Updated the query to include the new rolling average for Diff1 (column Diff1Last2Avg). This rolling average works great until we run into nulls in the Value1 column. This is where we need to insert the estimate.
Updated the query to include the estimate that should be used when there is no Value1 (column Value1Estimate). This is working great and would be perfect if we could use the estimate in place of NULL in the Value1 column. Since the Diff1 column reflects the difference between Value1 (or its estimate) and ComparisonValue1, including the Estimate would fill in all the NULL values in Diff1. This in turn would continue to allow the Estimates of future rows to be calculated. It gets confusing at this point, but still hacking away at it. Any ideas?

Credit for the idea goes to this answer: https://stackoverflow.com/a/35152131/6305294 from #JesúsLópez
I have included comments in the code to explain it.
UPDATE
I have corrected the query based on comments.
I have swapped numbers in minuend and subtrahend to get difference as a positive number.
Removed Diff2Ago column.
Results of the query now exactly match your sample output.
;WITH cte AS
(
-- This is similar to your ItemWithComparison view
SELECT i.Number, i.Value1, i2.Value1 AS ComparisonValue1,
-- Calculated Differences; NULL will be returned when i.Value1 is NULL
CONVERT( DECIMAL( 10, 3 ), i.Value1 - i2.Value1 ) AS Diff
FROM Item AS i
LEFT JOIN [Group] AS G ON g.ID = i.GroupID
LEFT JOIN Item AS i2 ON i2.GroupID = g.ComparisonGroupID AND i2.Number = i.Number
WHERE NOT i2.Id IS NULL
),
cte2 AS(
/*
Start with the first number
Note if you do not have at least 2 consecutive numbers (in cte) with non-NULL Diff value and therefore Diff1Ago or Diff2Ago are NULL then everything else will not work;
You may need to add additional logic to handle these cases */
SELECT TOP 1 -- start with the 1st number (see ORDER BY)
a.Number, a.Value1, a.ComparisonValue1, a.Diff, b.Diff AS Diff1Ago
FROM cte AS a
-- "1 number ago"
LEFT JOIN cte AS b ON a.Number - 1 = b.Number
WHERE NOT a.Value1 IS NULL
ORDER BY a.Number
UNION ALL
SELECT b.Number, b.Value1, b.ComparisonValue1,
( CASE
WHEN NOT b.Value1 IS NULL THEN b.Diff
ELSE CONVERT( DECIMAL( 10, 3 ), ( a.Diff + a.Diff1Ago ) / 2.0 )
END ) AS Diff,
a.Diff AS Diff1Ago
FROM cte2 AS a
INNER JOIN cte AS b ON a.Number + 1 = b.Number
)
SELECT *, ( CASE WHEN Value1 IS NULL THEN ComparisonValue1 + Diff ELSE Value1 END ) AS NewValue1
FROM cte2 OPTION( MAXRECURSION 0 );
Limitations:
this solution works well only when you need to consider small number of preceding values.

Related

Is there a way you can produce an output like this in T-SQL

I have a column which I translate the values using a case statements and I get numbers like this below. There are multiple columns I need to produce the result like this and this is just one column.
How do you produce the output as a whole like this below.
The 12 is the total numbers counting from top to bottom
49 is the Average.
4.08 is the division 49/12.
1 is how many 1's are there in the output list above. As you can see there is only one 1 in the output above
8.33% is the division and percentage comes from 1/12 * 100
and so on. Is there a way to produce this output below?
drop table test111
create table test111
(
Q1 nvarchar(max)
);
INSERT INTO TEST111(Q1)
VALUES('Strongly Agree')
,('Agree')
,('Disagree')
,('Strongly Disagree')
,('Strongly Agree')
,('Agree')
,('Disagree')
,('Neutral');
SELECT
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS 'Test Q1'
FROM test111

I have to make a few assumptions here, but it looks like you want to treat an output column like a column in a spreadsheet. You have 12 numbers. You then have a blank "separator" row. Then a row with the number 12 (which is the count of how many numbers you have). Then a row with the number 49, which is the sum of those 12 numbers. Then the 4.08 row, which is rougly the average, and so on.
Some of these outputs can be provided by cube or rollup, but neither is a complete solution.
If you wanted to produce this output directly from TSQL, you would need to have multiple select statements and combine the results of all of those statements using union all. First you would have a select just to get the numbers. Then you would have a second select which outputs a "blank". Then another select which is providing a count. Then another select which is providing a sum. And so on.
You would also no longer be able to output actual numbers, since a "blank" is not a number. Visually it's best represented as an empty string. But now your output column has to be of datatype char or varchar.
You also have to make sure rows come out in the correct order for presentation. So you need a column to order by. You would have to add some kind of ordering column "manually" to each of the select statements, so when you union them all together you can tell SQL in what order the output should be provided.
So the answer to "can it be done?" is technically "yes". But if you think seems like a whole lot of laborious and inefficient TSQL work, you'd be right.
The real solution here is to change your approach. SQL should not be concerned with "output formatting". What you should do is just return the actual data (your 12 numbers) from SQL, and then do all of the additional presentation (like adding a blank row, adding a count row, etc), in the code of the program that is calling SQL to get that data.

I must say, this is one of the strangest T-SQL requirements I've seen, and is really best left to the presentation layer.
It is possible using GROUPING SETS though. We can use it to get an extra rollup row that aggregates the whole table.
Once you have the rollup, you need to unpivot the totalled row (identified by GROUPING() = 1) to get your final result. We can do this using CROSS APPLY.
This is impossible without a row-identifier. I have added ROW_NUMBER, but any primary or unique key will do.
WITH YourTable AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn,
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS TestQ1
FROM test111
),
RolledUp AS (
SELECT
rn,
TestQ1,
grouping = GROUPING(TestQ1),
count = COUNT(*),
sum = SUM(TestQ1),
avg = AVG(TestQ1 * 1.0),
one = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END),
onePct = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END) * 1.0 / COUNT(*)
FROM YourTable
GROUP BY GROUPING SETS(
(rn, TestQ1),
()
)
)
SELECT v.TestQ1
FROM RolledUp r
CROSS APPLY (
SELECT r.TestQ1, 0 AS ordering
WHERE r.grouping = 0
UNION ALL
SELECT v.value, v.ordering
FROM (VALUES
(NULL , 1),
(r.count , 2),
(r.sum , 3),
(r.avg , 4),
(r.one , 5),
(r.onePct, 6)
) v(value, ordering)
WHERE r.grouping = 1
) v
ORDER BY
v.ordering,
r.rn;
db<>fiddle

How to select Second Last Row in mySql?

I want to retrieve the 2nd last row result and I have seen this question:
How can I retrieve second last row?
but it uses order by which in my case does not work because the Emp_Number Column contains number of rows and date time stamp that mixes data if I use order by .
The rows 22 and 23 contain the total number of rows (excluding row 21 and 22) and the time and day it got entered respectively.
I used this query which returns the required result 21 but if this number increases it will cause an error.
SELECT TOP 1 *
FROM(
SELECT TOP 2 *
FROM DAT_History
ORDER BY Emp_Number ASC
) t
ORDER BY Emp_Number desc
Is there any way to get the 2nd last row value without using the Order By function?

There is no guarantee that the count will be returned in the one-but-last row, as there is no definite order defined. Even if those records were written in the correct order, the engine is free to return the records in any order, unless you specify an order by clause. But apparently you don't have a column to put in that clause to reproduce the intended order.
I propose these solutions:
1. Return the minimum of those values that represent positive integers
select min(Emp_Number * 1)
from DAT_history
where Emp_Number not regexp '[^0-9]'
See SQL Fiddle
This will obviously fail when the count is larger then the smallest employee number. But seeing the sample data, that would represent a number of records that is maybe not expected...
2. Count the records, ignoring the 2 aggregated records
select count(*)-2
from DAT_history
See SQL Fiddle
3. Relying on correct order without order by
As explained at the start, you cannot rely on the order, but if for some reason you still want to rely on this, you can use a variable to number the rows in a sub query, and then pick out the one that has been attributed the one-but-last number:
select Emp_Number * 1
from (select Emp_Number,
#rn := #rn + 1 rn
from DAT_history,
(select #rn := 0) init
) numbered
where rn = #rn - 1
See SQL Fiddle
The * 1 is added to convert the text to a number data type.

This is not a perfect solution. I am making some assumptions for this. Check if this could work for you.
;WITH cte
AS (SELECT emp_number,
Row_number()
OVER (
ORDER BY emp_number ASC) AS rn
FROM dat_history
WHERE Isdate(emp_number) = 0) --Omit date entries
SELECT emp_number
FROM cte
WHERE rn = 1 -- select the minimum entry, assuming it would be the count and assuming count might not exceed the emp number range of 9888000

Summation in SQL over a computed column

I have a Trans-SQL related question, concerning summations over a computed column.
I am having a problem with double-counting of these computed values.
Usually I would extract all the raw data and post-process it in Perl, but I can't do that on this occasion due to the particular reporting system we need to use. I'm relatively inexperienced with the intricacies of SQL, so I thought I'd refer this to the experts.
My data is arranged in the following tables (highly simplified and reduced for the purposes of clarity):
Patient table:
PatientId
PatientSer
Course table
PatientSer
CourseSer
CourseId
Diagnosis table
PatientSer
DiagnosisId
Plan table
PlanSer
CourseSer
PlanId
Field table
PlanSer
FieldId
FractionNumber
FieldDateTime
What I would like to do is find the difference between the maximum fraction number and the minimum fraction number over a range of dates in the FieldDateTime in the FieldTable. I would like to then sum these values over the possible plan ids associated with a course, but I do not want to double count over the two particular diagnosis ids (A or B or both) that I may encounter for a patient.
So, for a patient with two diagnosis codes (A and B) and two plans in the same course of treatment (Plan1 and Plan2), with a difference in fraction numbers of 24 for the first plan and 5 for the second what I would like to get out is something like this:
- **PatientId CourseId PlanId DiagnosisId FractionNumberDiff Sum
- AB1234 1 Plan1 A 24 29
- AB1234 1 Plan1 B * *
- AB1234 1 Plan2 A 5 *
- AB1234 1 Plan2 B * *
I've racked my brains about how to do this, and I've tried the following:
SELECT
Patient.PatientId,
Course.CourseId,
Plan.PlanId,
MAX(fractionnumber OVER PARTITION(Plan.PlanSer)) - MIN(fractionnumber OVER PARTITION(Plan.PlanSer)) AS FractionNumberDiff,
SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)
FROM
Patient P
INNER JOIN
Course C ON (P.PatientSer = C.PatientSer)
INNER JOIN
Plan Pl ON (Pl.CourseSer = C.CourseSer)
INNER JOIN
Diagnosis D ON (D.PatientSer = P.PatientSer)
INNER JOIN
Field F ON (F.PlanSer = Pl.PlanSer)
WHERE
FieldDateTime > [Start Date]
AND FieldDateTime < [End Date]
But this just double-counts over the diagnosis codes, meaning that I end up with 58 instead of 29.
Any ideas about what I can do?

change the FractionNumberDiff to
MAX(fractionnumber) OVER (PARTITION BY Plan.PlanSer) -
MIN(fractionnumber) OVER (PARTITION BY Plan.PlanSer) AS FractionNumberDiff
and remove the "SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)"
make the exisitng query as a derived table and calcualte the SUM(FractionNumberDiff) there
SELECT *, SUM(FractionNumberDiff) OVER ( PARTITION BYCourse.CourseSer)
FROM
(
< the modified existing query here>
) AS d
as for the double counting issue, please post some sample data and the expected result

How Can I Detect and Bound Changes Between Row Values in a SQL Table?

I have a table which records values over time, similar to the following:
RecordId Time Name
========================
1 10 Running
2 18 Running
3 21 Running
4 29 Walking
5 33 Walking
6 57 Running
7 66 Running
After querying this table, I need a result similar to the following:
FromTime ToTime Name
=========================
10 29 Running
29 57 Walking
57 NULL Running
I've toyed around with some of the aggregate functions (e.g. MIN, MAX, etc.), PARTITION and CTEs, but I can't seem to hit upon the right solution. I'm hoping a SQL guru can give me a hand, or at least point me in the right direction. Is there a fairly straightforward way to query this (preferrably without a cursor?)

Finding "ToTime" By Aggregates Instead of a Join
I would like to share a really wild query that only takes 1 scan of the table with 1 logical read. By comparison, the best other answer on the page, Simon Kingston's query, takes 2 scans.
On a very large set of data (17,408 input rows, producing 8,193 result rows) it takes CPU 574 and time 2645, while Simon Kingston's query takes CPU 63,820 and time 37,108.
It's possible that with indexes the other queries on the page could perform many times better, but it is interesting to me to achieve 111x CPU improvement and 14x speed improvement just by rewriting the query.
(Please note: I mean no disrespect at all to Simon Kingston or anyone else; I am simply excited about my idea for this query panning out so well. His query is better than mine as its performance is plenty and it actually is understandable and maintainable, unlike mine.)
Here is the impossible query. It is hard to understand. It was hard to write. But it is awesome. :)
WITH Ranks AS (
SELECT
T = Dense_Rank() OVER (ORDER BY Time, Num),
N = Dense_Rank() OVER (PARTITION BY Name ORDER BY Time, Num),
*
FROM
#Data D
CROSS JOIN (
VALUES (1), (2)
) X (Num)
), Items AS (
SELECT
FromTime = Min(Time),
ToTime = Max(Time),
Name = IsNull(Min(CASE WHEN Num = 2 THEN Name END), Min(Name)),
I = IsNull(Min(CASE WHEN Num = 2 THEN T - N END), Min(T - N)),
MinNum = Min(Num)
FROM
Ranks
GROUP BY
T / 2
)
SELECT
FromTime = Min(FromTime),
ToTime = CASE WHEN MinNum = 2 THEN NULL ELSE Max(ToTime) END,
Name
FROM Items
GROUP BY
I, Name, MinNum
ORDER BY
FromTime
Note: This requires SQL 2008 or up. To make it work in SQL 2005, change the VALUES clause to SELECT 1 UNION ALL SELECT 2.
Updated Query
After thinking about this a bit, I realized that I was accomplishing two separate logical tasks at the same time, and this made the query unnecessarily complicated: 1) prune out intermediate rows that have no bearing on the final solution (rows that do not begin a new task) and 2) pull the "ToTime" value from the next row. By performing #1 before #2, the query is simpler and performs with approximately half the CPU!
So here is the simplified query that first, trims out the rows we don't care about, then gets the ToTime value using aggregates rather than a JOIN. Yes, it does have 3 windowing functions instead of 2, but ultimately because of the fewer rows (after pruning those we don't care about) it has less work to do:
WITH Ranks AS (
SELECT
Grp =
Row_Number() OVER (ORDER BY Time)
- Row_Number() OVER (PARTITION BY Name ORDER BY Time),
[Time], Name
FROM #Data D
), Ranges AS (
SELECT
Result = Row_Number() OVER (ORDER BY Min(R.[Time]), X.Num) / 2,
[Time] = Min(R.[Time]),
R.Name, X.Num
FROM
Ranks R
CROSS JOIN (VALUES (1), (2)) X (Num)
GROUP BY
R.Name, R.Grp, X.Num
)
SELECT
FromTime = Min([Time]),
ToTime = CASE WHEN Count(*) = 1 THEN NULL ELSE Max([Time]) END,
Name = IsNull(Min(CASE WHEN Num = 2 THEN Name ELSE NULL END), Min(Name))
FROM Ranges R
WHERE Result > 0
GROUP BY Result
ORDER BY FromTime;
This updated query has all the same issues as I presented in my explanation, however, they are easier to solve because I am not dealing with the extra unneeded rows. I also see that the Row_Number() / 2 value of 0 I had to exclude, and I am not sure why I didn't exclude it from the prior query, but in any case this works perfectly and is amazingly fast!
Outer Apply Tidies Things Up
Last, here is a version basically identical to Simon Kingston's query that I think is an easier to understand syntax.
SELECT
FromTime = Min(D.Time),
X.ToTime,
D.Name
FROM
#Data D
OUTER APPLY (
SELECT TOP 1 ToTime = D2.[Time]
FROM #Data D2
WHERE
D.[Time] < D2.[Time]
AND D.[Name] <> D2.[Name]
ORDER BY D2.[Time]
) X
GROUP BY
X.ToTime,
D.Name
ORDER BY
FromTime;
Here's the setup script if you want to do performance comparison on a larger data set:
CREATE TABLE #Data (
RecordId int,
[Time] int,
Name varchar(10)
);
INSERT #Data VALUES
(1, 10, 'Running'),
(2, 18, 'Running'),
(3, 21, 'Running'),
(4, 29, 'Walking'),
(5, 33, 'Walking'),
(6, 57, 'Running'),
(7, 66, 'Running'),
(8, 77, 'Running'),
(9, 81, 'Walking'),
(10, 89, 'Running'),
(11, 93, 'Walking'),
(12, 99, 'Running'),
(13, 107, 'Running'),
(14, 113, 'Walking'),
(15, 124, 'Walking'),
(16, 155, 'Walking'),
(17, 178, 'Running');
GO
insert #data select recordid + (select max(recordid) from #data), time + (select max(time) +25 from #data), name from #data
GO 10
Explanation
Here is the basic idea behind my query.
The times that represent a switch have to appear in two adjacent rows, one to end the prior activity, and one to begin the next activity. The natural solution to this is a join so that an output row can pull from its own row (for the start time) and the next changed row (for the end time).
However, my query accomplishes the need to make end times appear in two different rows by repeating the row twice, with CROSS JOIN (VALUES (1), (2)). We now have all our rows duplicated. The idea is that instead of using a JOIN to do calculation across columns, we'll use some form of aggregation to collapse each desired pair of rows into one.
The next task is to make each duplicate row split properly so that one instance goes with the prior pair and one with the next pair. This is accomplished with the T column, a ROW_NUMBER() ordered by Time, and then divided by 2 (though I changed it do a DENSE_RANK() for symmetry as in this case it returns the same value as ROW_NUMBER). For efficiency I performed the division in the next step so that the row number could be reused in another calculation (keep reading). Since row number starts at 1, and dividing by 2 implicitly converts to int, this has the effect of producing the sequence 0 1 1 2 2 3 3 4 4 ... which has the desired result: by grouping by this calculated value, since we also ordered by Num in the row number, we've now accomplished that all sets after the first one are comprised of a Num = 2 from the "prior" row, and a Num = 1 from the "next" row.
The next difficult task is figuring out a way to eliminate the rows we don't care about and somehow collapse the start time of a block into the same row as the end time of a block. What we want is a way to get each discrete set of Running or Walking to be given its own number so we can group by it. DENSE_RANK() is a natural solution, but a problem is that it pays attention to each value in the ORDER BY clause--we don't have syntax to do DENSE_RANK() OVER (PREORDER BY Time ORDER BY Name) so that the Time does not cause the RANK calculation to change except on each change in Name. After some thought I realized I could crib a bit from the logic behind Itzik Ben-Gan's grouped islands solution, and I figured out that the rank of the rows ordered by Time, subtracted from the rank of the rows partitioned by Name and ordered by Time, would yield a value that was the same for each row in the same group but different from other groups. The generic grouped islands technique is to create two calculated values that both ascend in lockstep with the rows such as 4 5 6 and 1 2 3, that when subtracted will yield the same value (in this example case 3 3 3 as the result of 4 - 1, 5 - 2, and 6 - 3). Note: I initially started with ROW_NUMBER() for my N calculation but it wasn't working. The correct answer was DENSE_RANK() though I am sorry to say I don't remember why I concluded this at the time, and I would have to dive in again to figure it out. But anyway, that is what T-N calculates: a number that can be grouped on to isolate each "island" of one status (either Running or Walking).
But this was not the end because there are some wrinkles. First of all, the "next" row in each group contains the incorrect values for Name, N, and T. We get around this by selecting, from each group, the value from the Num = 2 row when it exists (but if it doesn't, then we use the remaining value). This yields the expressions like CASE WHEN NUM = 2 THEN x END: this will properly weed out the incorrect "next" row values.
After some experimentation, I realized that it was not enough to group by T - N by itself, because both the Walking groups and the Running groups can have the same calculated value (in the case of my sample data provided up to 17, there are two T - N values of 6). But simply grouping by Name as well solves this problem. No group of either "Running" or "Walking" will have the same number of intervening values from the opposite type. That is, since the first group starts with "Running", and there are two "Walking" rows intervening before the next "Running" group, then the value for N will be 2 less than the value for T in that next "Running" group. I just realized that one way to think about this is that the T - N calculation counts the number of rows before the current row that do NOT belong to the same value "Running" or "Walking". Some thought will show that this is true: if we move on to the third "Running" group, it is only the third group by virtue of having a "Walking" group separating them, so it has a different number of intervening rows coming in before it, and due to it starting at a higher position, it is high enough so that the values cannot be duplicated.
Finally, since our final group consists of only one row (there is no end time and we need to display a NULL instead) I had to throw in a calculation that could be used to determine whether we had an end time or not. This is accomplished with the Min(Num) expression and then finally detecting that when the Min(Num) was 2 (meaning we did not have a "next" row) then display a NULL instead of the Max(ToTime) value.
I hope this explanation is of some use to people. I don't know if my "row-multiplying" technique will be generally useful and applicable to most SQL query writers in production environments because of the difficulty understanding it and and the difficulty of maintenance it will most certainly present to the next person visiting the code (the reaction is probably "What on earth is it doing!?!" followed by a quick "Time to rewrite!").
If you have made it this far then I thank you for your time and for indulging me in my little excursion into incredibly-fun-sql-puzzle-land.
See it For Yourself
A.k.a. simulating a "PREORDER BY":
One last note. To see how T - N does the job--and noting that using this part of my method may not be generally applicable to the SQL community--run the following query against the first 17 rows of the sample data:
WITH Ranks AS (
SELECT
T = Dense_Rank() OVER (ORDER BY Time),
N = Dense_Rank() OVER (PARTITION BY Name ORDER BY Time),
*
FROM
#Data D
)
SELECT
*,
T - N
FROM Ranks
ORDER BY
[Time];
This yields:
RecordId Time Name T N T - N
----------- ---- ---------- ---- ---- -----
1 10 Running 1 1 0
2 18 Running 2 2 0
3 21 Running 3 3 0
4 29 Walking 4 1 3
5 33 Walking 5 2 3
6 57 Running 6 4 2
7 66 Running 7 5 2
8 77 Running 8 6 2
9 81 Walking 9 3 6
10 89 Running 10 7 3
11 93 Walking 11 4 7
12 99 Running 12 8 4
13 107 Running 13 9 4
14 113 Walking 14 5 9
15 124 Walking 15 6 9
16 155 Walking 16 7 9
17 178 Running 17 10 7
The important part being that each group of "Walking" or "Running" has the same value for T - N that is distinct from any other group with the same name.
Performance
I don't want to belabor the point about my query being faster than other people's. However, given how striking the difference is (when there are no indexes) I wanted to show the numbers in a table format. This is a good technique when high performance of this kind of row-to-row correlation is needed.
Before each query ran, I used DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;. I set MAXDOP to 1 for each query to remove the time-collapsing effects of parallelism. I selected each result set into variables instead of returning them to the client so as to measure only performance and not client data transmission. All queries were given the same ORDER BY clauses. All tests used 17,408 input rows yielding 8,193 result rows.
No results are displayed for the following people/reasons:
RichardTheKiwi *Could not test--query needs updating*
ypercube *No SQL 2012 environment yet :)*
Tim S *Did not complete tests within 5 minutes*
With no index:
CPU Duration Reads Writes
----------- ----------- ----------- -----------
ErikE 344 344 99 0
Simon Kingston 68672 69582 549203 49
With index CREATE UNIQUE CLUSTERED INDEX CI_#Data ON #Data (Time);:
CPU Duration Reads Writes
----------- ----------- ----------- -----------
ErikE 328 336 99 0
Simon Kingston 70391 71291 549203 49 * basically not worse
With index CREATE UNIQUE CLUSTERED INDEX CI_#Data ON #Data (Time, Name);:
CPU Duration Reads Writes
----------- ----------- ----------- -----------
ErikE 375 414 359 0 * IO WINNER
Simon Kingston 172 189 38273 0 * CPU WINNER
So the moral of the story is:
Appropriate Indexes Are More Important Than Query Wizardry
With the appropriate index, Simon Kingston's version wins overall, especially when including query complexity/maintainability.
Heed this lesson well! 38k reads is not really that many, and Simon Kingston's version ran in half the time as mine. The speed increase of my query was entirely due to there being no index on the table, and the concomitant catastrophic cost this gave to any query needing a join (which mine didn't): a full table scan Hash Match killing its performance. With an index, his query was able to do a Nested Loop with a clustered index seek (a.k.a. a bookmark lookup) which made things really fast.
It is interesting that a clustered index on Time alone was not enough. Even though Times were unique, meaning only one Name occurred per time, it still needed Name to be part of the index in order to utilize it properly.
Adding the clustered index to the table when full of data took under 1 second! Don't neglect your indexes.

This will not work in SQL Server 2008, only in SQL Server 2012 version that has the LAG() and LEAD() analytic functions, but I'll leave it here for anyone with newer versions:
SELECT Time AS FromTime
, LEAD(Time) OVER (ORDER BY Time) AS ToTime
, Name
FROM
( SELECT Time
, LAG(Name) OVER (ORDER BY Time) AS PreviousName
, Name
FROM Data
) AS tmp
WHERE PreviousName <> Name
OR PreviousName IS NULL ;
Tested in SQL-Fiddle
With an index on (Time, Name) it will need an index scan.
Edit:
If NULL is a valid value for Name that needs to be taken as a valid entry, use the following WHERE clause:
WHERE PreviousName <> Name
OR (PreviousName IS NULL AND Name IS NOT NULL)
OR (PreviousName IS NOT NULL AND Name IS NULL) ;

I think you're essentially interested in where the 'Name' changes from one record to the next (in order of 'Time'). If you can identify where this happens you can generate your desired output.
Since you mentioned CTEs I'm going to assume you're on SQL Server 2005+ and can therefore use the ROW_NUMBER() function. You can use ROW_NUMBER() as a handy way to identify consecutive pairs of records and then to find those where the 'Name' changes.
How about this:
WITH OrderedTable AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Time) AS Ordinal
FROM
[YourTable]
),
NameChange AS
(
SELECT
after.Time AS Time,
after.Name AS Name,
ROW_NUMBER() OVER (ORDER BY after.Time) AS Ordinal
FROM
OrderedTable before
RIGHT JOIN OrderedTable after ON after.Ordinal = before.Ordinal + 1
WHERE
ISNULL(before.Name, '') <> after.Name
)
SELECT
before.Time AS FromTime,
after.Time AS ToTime,
before.Name
FROM
NameChange before
LEFT JOIN NameChange after ON after.Ordinal = before.Ordinal + 1

I assume that the RecordIDs are not always sequential, hence the CTE to create a non-breaking sequential number.
SQLFiddle
;with SequentiallyNumbered as (
select *, N = row_number() over (order by RecordId)
from Data)
, Tmp as (
select A.*, RN=row_number() over (order by A.Time)
from SequentiallyNumbered A
left join SequentiallyNumbered B on B.N = A.N-1 and A.name = B.name
where B.name is null)
select A.Time FromTime, B.Time ToTime, A.Name
from Tmp A
left join Tmp B on B.RN = A.RN + 1;
The dataset I used to test
create table Data (
RecordId int,
Time int,
Name varchar(10));
insert Data values
(1 ,10 ,'Running'),
(2 ,18 ,'Running'),
(3 ,21 ,'Running'),
(4 ,29 ,'Walking'),
(5 ,33 ,'Walking'),
(6 ,57 ,'Running'),
(7 ,66 ,'Running');

Here's a CTE solution that gets the results you're seeking:
;WITH TheRecords (FirstTime,SecondTime,[Name])
AS
(
SELECT [Time],
(
SELECT MIN([Time])
FROM ActivityTable at2
WHERE at2.[Time]>at.[Time]
AND at2.[Name]<>at.[Name]
),
[Name]
FROM ActivityTable at
)
SELECT MIN(FirstTime) AS FromTime,SecondTime AS ToTime,MIN([Name]) AS [Name]
FROM TheRecords
GROUP BY SecondTime
ORDER BY FromTime,ToTime

Gaps in recurring series of a group with datetime [duplicate]

We have a table with following data
Id,ItemId,SeqNumber;DateTimeTrx
1,100,254,2011-12-01 09:00:00
2,100,1,2011-12-01 09:10:00
3,200,7,2011-12-02 11:00:00
4,200,5,2011-12-02 10:00:00
5,100,255,2011-12-01 09:05:00
6,200,3,2011-12-02 09:00:00
7,300,0,2011-12-03 10:00:00
8,300,255,2011-12-03 11:00:00
9,300,1,2011-12-03 10:30:00
Id is an identity column.
The sequence for an ItemId starts from 0 and goes till 255 and then resets to 0. All this information is stored in a table called Item. The order of sequence number is determined by the DateTimeTrx but such data can enter any time into the system. The expected output is as shown below-
ItemId,PrevorNext,SeqNumber,DateTimeTrx,MissingNumber
100,Previous,255,2011-12-01 09:05:00,0
100,Next,1,2011-12-01 09:10:00,0
200,Previous,3,2011-12-02 09:00:00,4
200,Next,5,2011-12-02 10:00:00,4
200,Previous,5,2011-12-02 10:00:00,6
200,Next,7,2011-12-02 11:00:00,6
300,Previous,1,2011-12-03 10:30:00,2
300,Next,255,2011-12-03 16:30:00,2
We need to get those rows one before and one after the missing sequence. In the above example for ItemId 300 - the record with sequence 1 has entered first (2011-12-03 10:30:00) and then 255(2011-12-03 16:30:00), hence the missing number here is 2. So 1 is previous and 255 is next and 2 is the first missing number. Coming to ItemId 100, the record with sequence 255 has entered first (2011-12-02 09:05:00) and then 1 (2011-12-02 09:10:00), hence 255 is previous and then 1, hence 0 is the first missing number.
In the above expected result, MissingNumber column is the first occuring missing number just to illustrate the example.
We will not have a case where we would have a complete series reset at one time i.e. it can be either a series rundown from 255 to 0 as in for itemid 100 or 0 to 255 as in ItemId 300. Hence we need to identify sequence missing when in ascending order (0,1,...255) or either in descending order (254,254,0,2) etc.
How can we accomplish this in a t-sql?

Could work like this:
;WITH b AS (
SELECT *
,row_number() OVER (ORDER BY ItemId, DateTimeTrx, SeqNumber) AS rn
FROM tbl
), x AS (
SELECT
b.Id
,b.ItemId AS prev_Itm
,b.SeqNumber AS prev_Seq
,c.ItemId AS next_Itm
,c.SeqNumber AS next_Seq
FROM b
JOIN b c ON c.rn = b.rn + 1 -- next row
WHERE c.ItemId = b.ItemId -- only with same ItemId
AND c.SeqNumber <> (b.SeqNumber + 1)%256 -- Seq cycles modulo 256
)
SELECT Id, prev_Itm, 'Previous' AS PrevNext, prev_Seq
FROM x
UNION ALL
SELECT Id, next_Itm ,'Next', next_Seq
FROM x
ORDER BY Id, PrevNext DESC
Produces exactly the requested result.
See a complete working demo on data.SE.
This solution takes gaps in the Id column into consideration, as there is no mention of a gapless sequence of Ids in the question.
Edit2: Answer to updated question:
I updated the CTE in the query above to match your latest verstion - or so I think.
Use those columns that define the sequence of rows. Add as many columns to your ORDER BY clause as necessary to break ties.
The explanation to your latest update is not entirely clear to me, but I think you only need to squeeze in DateTimeTrx to achieve what you want. I have SeqNumber in the ORDER BY additionally to break ties left by identical DateTimeTrx. I edited the query above.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight