How do I exclude rows when an incremental value starts over? - sql-server

I am a newbie poster but have spent a lot of time researching answers here. I can't quite figure out how to create a SQL result set using SQL Server 2008 R2 that should probably be using lead/lag from more modern versions. I am trying to aggregate data based on sequencing of one column, but there can be varying numbers of instances in each sequence. The only way I know a sequence has ended is when the next row has a lower sequence number. So it may go 1-2, 1-2-3-4, 1-2-3, and I have to figure out how to make 3 aggregates out of that.
Source data is joined tables that look like this (please help me format):
recordID instanceDate moduleID iResult interactionNum
1356 10/6/15 16:14 1 68 1
1357 10/7/15 16:22 1 100 2
1434 10/9/15 16:58 1 52 1
1435 10/11/15 17:00 1 60 2
1436 10/15/15 16:57 1 100 3
1437 10/15/15 16:59 1 100 4
I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.
What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:
instanceDate moduleID iResult
10/6/15 1 84
10/9/15 1 78
I can aggregate to get this result using MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here) then interactionNum and the group separation should happen when the query finds a row where the interactionNum is <= than the previous row (will usually start over with '1' but not always, so prefer just to separate on a lower or equal integer value).
Here is the query I have so far (includes the joins that give the above data set):
SELECT
X.*
FROM
(SELECT TOP 100 PERCENT
instanceDate, b.ModuleID, iResult, b.interactionNum
FROM
(firstTable a
INNER JOIN
secondTable b ON b.someID = a.someID)
WHERE
a.someID = 2
AND b.otherID LIKE 'xyz'
AND a.ModuleID = 1
ORDER BY
instanceDate) AS X
OUTER APPLY
(SELECT TOP 1
*
FROM
(SELECT
instanceDate, d.ModuleID, iResult, d.interactionNum
FROM
(firstTable c
INNER JOIN
secondTable d ON d.someID = c.someID)
WHERE
c.someID = 2
AND d.otherID LIKE 'xyz'
AND c.ModuleID = 1
AND d.interactionNum = X.interactionNum
AND c.instanceDate < X.instanceDate) X2
ORDER BY
instanceDate DESC) Y
WHERE
NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)
But this is returning an interim result set like this:
instanceDate ModuleID iResult interactionNum
10/6/15 16:10 1 68 1
10/6/15 16:14 1 100 2
10/15/15 16:57 1 100 3
10/15/15 16:59 1 100 4
and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.

Not sure what ModuleID was supposed to be used, but I guess you're looking for something like this:
select min (instanceDate), [moduleID], avg([iResult])
from (
select *,row_number() over (partition by [moduleID] order by instanceDate) as RN
from Table1
) X
group by [moduleID], RN - [interactionNum]
The idea here is to create a running number with row_number for each moduleid, and then use the difference between that and InteractionNum as grouping criteria.
Example in SQL Fiddle

Here is my solution, although it should be said, I think #JamesZ answer is cleaner.
I created a new field called newinstance which is 1 wherever your instanceNumber is 1. I then created a rolling sum(newinstance) called rollinginstance to group on.
Change the last select to SELECT * FROM cte2 to show all the fields I added.
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)
INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4
;WITH cte1 AS
(
SELECT *,
CASE WHEN interactionNum=1 THEN 1 ELSE 0 END AS newinstance,
ROW_NUMBER() OVER(ORDER BY recordID) as rowid
FROM #tmpData
), cte2 AS
(
SELECT *,
(select SUM(newinstance) from cte1 b where b.rowid<=a.rowid) as rollinginstance
FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance

Related

How to calculate LCM value using SQL Server?

I have below data and I need to calculate LCM (Lowest calculate multiply) value based on group id using a T-SQL query. Your help would be appreciated.
Groupid GroupValue
------------------
1 2
1 4
1 6
2 5
2 5
2 10
3 3
3 12
3 6
3 9
Expected result is below.
Groupid GroupLCM
------------------
1 12
2 10
3 36
One possible way is to use tally tables like below
See working demo
; with detailedSet as
(
select
Groupid,
GroupValue=abs(GroupValue),
biggest=max(GroupValue) over (partition by Groupid),
totalNumbers= count(1) over (partition by Groupid)
from num
)
,
possibleLCMValues as
(
select Groupid, counter
from detailedSet b
cross apply
(
select counter= row_number() over ( order by (select null)) * biggest
from sys.objects o1 cross join sys.objects o2
)c
where c.counter%GroupValue =0
group by Groupid, counter
having count(1)=max(totalNumbers)
)
,
LCMValues as
(
select
Groupid,
LCM=min(counter)
from possibleLCMValues
group by Groupid
)
select * from LCMValues
I found the solution which I post on below stack flow question. In Final result table we just use max value again group id and we get LCM value.
Just note Like, I post this question for more optimize solution to remove for loop otherwise it working properly using for loop as well.
How to update the column without loop in SQL Server?

How do I fill in missing dates as rows and give other values? (exceptional case)

I have a lot of explaining to do for the context of this question so bear with me.
At my company, we have a SQL Server database and I'm working in the Management studio 2014.
We have a table that's called Jobstatistics, which displays how many Jobs are done during Intervals of one hour each.
The table looks like this
The station field is basically different areas jobs can be done at.
As you can see, some rows are missing for certain intervals and this is because of the way this table gets filled with data. To fill this table we have a script running that looks at another table and aggregates the amount of jobs for all dates between this interval. In other words, if there aren't any jobs, there won't be a row inserted because there will be nothing to insert (no rows from the other table to aggregate any jobs on).
What I want to do here is fill in these extra intervals with 0 as the amount of Jobs. So there will always be the 24 intervals (hours) for each day and for each station. On top of that we have set targets which we would like to achieve and I declared these in another table, called JobstatisticsTargets, which you could call a calendar table to join the Jobstatistics table on.
The calender table looks like this
I have tried doing a left or right join so the missing intervals would get filled in and the Jobs would at least get NULL values, but the join clause doesn't do what I expect it to.
This is my tried attempt
SELECT a.[Station], a.[Interval], a.[Jobs], b.[28JPH], b.[35JPH]
FROM [JobStatistics] a
RIGHT JOIN [JobStatisticsTargets] b
ON CONVERT(VARCHAR(10),a.Interval,108) = b.Interval
WHERE DATEDIFF(DAY, a.Interval, GETDATE()) < 12
AND Station LIKE '138010'
ORDER BY a.Station, a.Interval
The LEFT JOIN does exactly the same as I would expect a normal join to do and it doesn't append any intervals with NULL values. (the query is just for one station and a few days so I could test easily)
Any help is much appreciated. I will check this topic regularly so be sure to ask any questions regarding the context if you have any and I will try to explain it as good as I can!
EDIT
With some help the query now looks like this
SELECT a.[Station], b.[Interval], a.[Jobs], b.[28JPH], b.[35JPH]
FROM [JobStatistics] a
RIGHT JOIN [JobStatisticsTargets] b
ON CONVERT(VARCHAR(10),a.Interval,108) = b.Interval
AND CONVERT(VARCHAR(10),a.Interval,110) = CONVERT(VARCHAR(10),GETDATE(),110)
AND Station LIKE '138010'
ORDER BY b.Interval
I filter on today's date now because otherwise the extra rows aren't what I want them to be at all. The problem is that I don't know an easy way of filling in my stations. I suppose I need a subquery for those or is there another way?
The problem now as well is that I can't do this query for different stations. I would expect 24 rows for each station representing all the intervals, but I get this as a result:
Station Interval Jobs 28JPH 35JPH
NULL 00:30:00 NULL 0 0
NULL 01:30:00 NULL 0 0
NULL 02:30:00 NULL 0 0
NULL 03:30:00 NULL 0 0
134040 04:30:00 2 0 0
136060 04:30:00 2 0 0
131080 04:30:00 2 0 0
138010 05:30:00 2 0 0
NULL 06:30:00 NULL 0 0
NULL 07:30:00 NULL 28 35
NULL 08:30:00 NULL 28 35
...
You filter on a field from the table which rows may not be presented in the join result: >>>AND Station LIKE '138010'
You should change your query and put this condition in ON CLAUSE, not in WHERE
check this script and let me know,
declare #t table(interval datetime,jobs int)
insert into #t VALUES('2017-04-28 05:30',1),('2017-04-28 06:30',5),('2017-04-29 06:30',5)
--select * from #t
;With CTE as
(
select cast('00:00' as time) as IntervalTime
union ALL
select DATEADD(MINUTE,30,IntervalTime)
from cte
where IntervalTime<'23:30'
)
,CTE1 AS(
select interval,jobs
,dense_rank()over( order by cast(interval as date))rn
from #t
)
select * FROM
(
select distinct case when t.interval is null then
DATEADD(day, DATEDIFF(day, 0,
(select top 1 interval from cte1 where rn=n.number)), cast(c.IntervalTime as datetime))
else t.interval end Newinterval,isnull(t.jobs,0) Jobs
from CTE c
left join cte1 t
on c.IntervalTime=cast(t.interval as time)
cross apply(select number from master.dbo.spt_values
where name is null and number<=(select max(rn) from cte1))n
)t4
where Newinterval is not null

SQL Join one-to-many tables, selecting only most recent entries

This is my first post - so I apologise if it's in the wrong seciton!
I'm joining two tables with a one-to-many relationship using their respective ID numbers: but I only want to return the most recent record for the joined table and I'm not entirely sure where to even start!
My original code for returning everything is shown below:
SELECT table_DATES.[date-ID], *
FROM table_CORE LEFT JOIN table_DATES ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE table_CORE.[core-ID] Like '*'
ORDER BY [table_CORE].[core-ID], [table_DATES].[iteration];
This returns a group of records: showing every matching ID between table_CORE and table_DATES:
table_CORE date-ID iteration
1 1 1
1 1 2
1 1 3
2 2 1
2 2 2
3 3 1
4 4 1
But I need to return only the date with the maximum value in the "iteration" field as shown below
table_CORE date-ID iteration Additional data
1 1 3 MoreInfo
2 2 2 MoreInfo
3 3 1 MoreInfo
4 4 1 MoreInfo
I really don't even know where to start - obviously it's going to be a JOIN query of some sort - but I'm not sure how to get the subquery to return only the highest iteration for each item in table 2's ID field?
Hope that makes sense - I'll reword if it comes to it!
--edit--
I'm wondering how to integrate that when I'm needing all the fields from table 1 (table_CORE in this case) and all the fields from table2 (table_DATES) joined as well?
Both tables have additional fields that will need to be merged.
I'm pretty sure I can just add the fields into the "SELECT" and "GROUP BY" clauses, but there are around 40 fields altogether (and typing all of them will be tedious!)
Try using the MAX aggregate function like this with a GROUP BY clause.
SELECT
[ID1],
[ID2],
MAX([iteration])
FROM
table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE
table_CORE.[core-ID] Like '*' --LIKE '%something%' ??
GROUP BY
[ID1],
[ID2]
Your example field names don't match your sample query so I'm guessing a little bit.
Just to make sure that I have everything you’re asking for right, I am going to restate some of your question and then answer it.
Your source tables look like this:
table_core:
table_dates:
And your outputs are like this:
Current:
Desired:
In order to make that happen all you need to do is use a subquery (or a CTE) as a “cross-reference” table. (I used temp tables to recreate your data example and _ in place of the - in your column names).
--Loading the example data
create table #table_core
(
core_id int not null
)
create table #table_dates
(
date_id int not null
, iteration int not null
, additional_data varchar(25) null
)
insert into #table_core values (1), (2), (3), (4)
insert into #table_dates values (1,1, 'More Info 1'),(1,2, 'More Info 2'),(1,3, 'More Info 3'),(2,1, 'More Info 4'),(2,2, 'More Info 5'),(3,1, 'More Info 6'),(4,1, 'More Info 7')
--select query needed for desired output (using a CTE)
; with iter_max as
(
select td.date_id
, max(td.iteration) as iteration_max
from #table_dates as td
group by td.date_id
)
select tc.*
, td.*
from #table_core as tc
left join iter_max as im on tc.core_id = im.date_id
inner join #table_dates as td on im.date_id = td.date_id
and im.iteration_max = td.iteration
select *
from
(
SELECT table_DATES.[date-ID], *
, row_number() over (partition by table_CORE date-ID order by iteration desc) as rn
FROM table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE table_CORE.[core-ID] Like '*'
) tt
where tt.rn = 1
ORDER BY [core-ID]

Get the missing value in a sequence of numbers

I made the following query for the SQL Server backend
SELECT TOP(1) (v.rownum + 99)
FROM
(
SELECT incrementNo-99 as id, ROW_NUMBER() OVER (ORDER BY incrementNo) as rownum
FROM proposals
WHERE [year] = '12'
) as v
WHERE v.rownum <> v.id
ORDER BY v.rownum
to find the first unused proposal number.
(It's not about the lastrecord +1)
But I realized ROW_NUMBER is not supported in access.
I looked and I can't find something similar.
Does anyone know how to get the same result as a ROW_NUMBER in access?
Maybe there's a better way of doing this.
Actually people insert their proposal No (incrementID) with no constraint. This number looks like this 13-152. xx- is for the current year and the -xxx is the proposal number. The last 3 digits are supposed to be incremental but in some case maybe 10 times a year they have to skip some numbers. That's why I can't have the auto increment.
So I do this query so when they open the form, the default number is the first unused.
How it works:
Because the number starts at 100, I do -99 so it starts at 1.
Then I compare the row number with the id so it looks like this
ROW NUMBER | ID
1 1 (100)
2 2 (101)
3 3 (102)
4 5 (104)<--------- WRONG
5 6 (105)
So now I know that we skip 4. So I return (4 - 99) = 103
If there's a better way, I don't mind changing but I really like this query.
If there's really no other way and I can't simulate a row number in access, i will use the pass through query.
Thank you
From your question it appears that you are looking for a gap in a sequence of numbers, so:
SELECT b.akey, (
SELECT Top 1 akey
FROM table1 a
WHERE a.akey > b.akey) AS [next]
FROM table1 AS b
WHERE (
SELECT Top 1 akey
FROM table1 a
WHERE a.akey > b.akey) <> [b].[akey]+1
ORDER BY b.akey
Where table1 is the table and akey is the sequenced number.
SELECT T.Value, T.next -1 FROM (
SELECT b.Value , (
SELECT Top 1 Value
FROM tblSequence a
WHERE a.Value > b.Value) AS [next]
FROM tblSequence b
) T WHERE T.next <> T.Value +1

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
. . .
. . .
The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.
This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.Number > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.
Want to improve this post? Provide detailed answers to this question, including citations and an explanation of why your answer is correct. Answers without enough detail may be edited or deleted.
Check out some solutions here. I'm sure that you could adapt one of them easily enough.
If you want this to be truly performant, and arn't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.
Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Performance of selects is as fast as it goes. Of course, modifications are slower.

Resources