SQL Select Statement For Calculating A Running Average Column - sql-server

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
. . .
. . .
The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.

This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr

Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
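On SQL Server 2012 or later, a window frame can express "the previous three rows" directly, so the Id values no longer need to be sequential. This is only a sketch under that version assumption; the table variable @RowsToAverage and its rows are made up for illustration and mirror the numbers in the question:

```sql
-- Hypothetical sample data matching the question's example.
DECLARE @RowsToAverage TABLE (ID int NOT NULL PRIMARY KEY, Number int NOT NULL);
INSERT INTO @RowsToAverage (ID, Number)
VALUES (1, 1), (2, 3), (3, 2), (4, 4), (5, 6);

SELECT
    ID,
    Number,
    CASE
        -- Only report an average once three earlier rows exist.
        WHEN ROW_NUMBER() OVER (ORDER BY ID) > 3
        THEN AVG(Number) OVER (ORDER BY ID
                               ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
    END AS Average
FROM @RowsToAverage
ORDER BY ID;
-- Rows 1-3 return NULL; row 4 returns 2 and row 5 returns 3.
```

Because Number is an int, AVG uses integer arithmetic here; cast the column to a decimal type if fractional averages are needed.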

Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
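Note that SUM(number) / ROW_NUMBER() divides two integers, so the result is truncated. A possible alternative, assuming SQL Server 2012 or later and using a made-up table variable @myTable, is to cast to a decimal and let AVG run over an explicit frame:

```sql
-- Hypothetical sample data for illustration only.
DECLARE @myTable TABLE (id int NOT NULL PRIMARY KEY, number int NOT NULL);
INSERT INTO @myTable (id, number) VALUES (1, 1), (2, 2), (3, 4);

SELECT
    id,
    number,
    -- Average of every row from the first one up to the current one.
    AVG(CAST(number AS decimal(18, 4)))
        OVER (ORDER BY id
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningAverage
FROM @myTable
ORDER BY id;
-- Running averages are roughly 1, 1.5 and 2.33; the integer version gives 1, 1, 2.
```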

A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query; you can UNION ALL those back in if you really want them in the row set. Self join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.
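If the first three rows should stay in the result with a NULL average, one option other than UNION ALL is a LEFT JOIN plus a count check. This is a sketch on a small made-up table variable (@Sample), not a benchmarked replacement for the queries above:

```sql
-- Hypothetical sample data; swap in test10k to try it on the larger set.
DECLARE @Sample TABLE (Id int NOT NULL PRIMARY KEY, Number int NOT NULL);
INSERT INTO @Sample (Id, Number) VALUES (1, 1), (2, 3), (3, 2), (4, 4), (5, 6);

WITH NumberedRows AS
(
    SELECT s.Id, s.Number, ROW_NUMBER() OVER (ORDER BY s.Id) AS RowNumber
    FROM @Sample AS s
)
SELECT nr.Id, nr.Number,
       -- Emit an average only when all three trailing rows were found.
       CASE WHEN COUNT(trailing.Number) = 3 THEN AVG(trailing.Number) END AS MovingAverage
FROM NumberedRows AS nr
LEFT JOIN NumberedRows AS trailing
       ON trailing.RowNumber BETWEEN nr.RowNumber - 3 AND nr.RowNumber - 1
GROUP BY nr.Id, nr.Number
ORDER BY nr.Id;
-- Ids 1-3 return NULL; Id 4 returns 2 and Id 5 returns 3.
```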

Check out some solutions here. I'm sure that you could adapt one of them easily enough.

If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.

Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Performance of selects is as fast as it goes. Of course, modifications are slower.

Related

Turn quarterly data into monthly by repeating the quarterly rows by 3

I'm wondering how to repeat each of these rows 3 times to get them from Quarters into months.
I need to repeat the same values in the first 2 columns but depending on the quarter in the third column I would need the other months in that quarter, i.e for the first row '31/01/2021' and '28/02/2021'
So desired output would look like:
Another option is via a CROSS APPLY
Select A.Code
,A.Value
,B.Date
From YourTable A
Cross Apply ( values (EOMonth(dateadd(MONTH,-2,A.Date)))
,(EOMonth(dateadd(MONTH,-1,A.Date)))
,(EOMonth(dateadd(MONTH,-0,A.Date)))
) B(Date)
Results
WITH TABLE_DATA(CODE,VAL,DATED)AS
(
SELECT 'R01',777,'2021-03-31' UNION ALL
SELECT 'R01',833,'2021-06-30' UNION ALL
SELECT 'R01',882,'2021-09-30'
)
SELECT D.CODE,D.VAL,EOMONTH(D.DATED,-X.PLACEHOLDER)AS DATED,X.PLACEHOLDER
FROM TABLE_DATA AS D
CROSS JOIN
(
SELECT 0 AS PLACEHOLDER
UNION ALL
SELECT 1
UNION ALL
SELECT 2
)X
ORDER BY D.CODE,DATED;
Could you please check if this query is suitable for you. TABLE_DATA is an example of data you have provided

Find the date when a bit column toggled state

I have this requirement.
My table contains a series of rows with serialnos and several bit columns and date-time.
To simplify, I will focus on one bit column. In essence, I need to know the most recent date on which this bit was toggled.
Ex: The following table depicts the bit values for 7 serials for the latest 6 days (10 to 5).
SQL Fiddle schema + query
I have successfully managed to get the result on a sample, but it is taking ages on the real table, which contains over 30 million records and approx 300K serial nos.
Pseudo -->
For each Serial:
Get (max Date) bit value as A (latest bit value ex 1)
Get (max Date) NOT A as B ( Find most recent date that was ex 0)
Get the (Min Date) > B
Group by SNO
I am sure an optimised approach exists.
For completeness the dataset contains rows that I need to filter out etc. However I can build and add these later when getting the basic executing more efficiently.
Tks for your time!
with cte as
(
select *, rn = ROW_NUMBER() OVER (ORDER BY SNo, Device_date)
from dbo.TestCape2
)
select MAX(y.Device_date) as MaxDate,
y.SNo
from cte x
inner join cte as y
on x.rn = y.rn + 1
and x.SNo = y.SNo
and x.Cape <> y.Cape
group by y.SNo
order by SNo;
And if you're using SQL-Server 2012 and up you can make use of LAG, which will take a look at the previous row.
select max(Device_date) as MaxDate,
SNo
from (
select SNo
,Device_date
,Cape
,LAG (Cape, 1, 0) OVER (PARTITION BY Sno ORDER BY Device_date) AS PrevCape
,LAG (Sno, 1, 0) OVER (PARTITION BY Sno ORDER BY Device_date) AS PrevSno
from dbo.TestCape2) t
where sno = PrevSno
and t.Cape <> t.PrevCape
group by sno
order by sno;
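A slightly shorter variant of the LAG approach is sketched below. It leaves out the default value so that the first row of each serial gets a NULL previous value and drops out of the comparison on its own. The table variable @TestCape and its rows are made up for illustration; SQL Server 2012 or later is assumed:

```sql
-- Hypothetical sample data: two serials, four days each.
DECLARE @TestCape TABLE (SNo int, Device_date date, Cape bit);
INSERT INTO @TestCape (SNo, Device_date, Cape) VALUES
    (1, '2015-01-05', 0), (1, '2015-01-06', 0), (1, '2015-01-07', 1), (1, '2015-01-08', 1),
    (2, '2015-01-05', 1), (2, '2015-01-06', 0), (2, '2015-01-07', 1), (2, '2015-01-08', 0);

SELECT SNo, MAX(Device_date) AS LastToggleDate
FROM (
    SELECT SNo, Device_date, Cape,
           LAG(Cape) OVER (PARTITION BY SNo ORDER BY Device_date) AS PrevCape
    FROM @TestCape
) AS t
WHERE Cape <> PrevCape   -- a NULL PrevCape (first row per serial) never matches
GROUP BY SNo
ORDER BY SNo;
-- Serial 1 last toggled on 2015-01-07, serial 2 on 2015-01-08.
```

On a large table, an index on (SNo, Device_date) that includes the bit column would probably help the window function avoid a sort, though that is worth checking against the actual plan.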

selecting random rows with normal distribution based on a column in SQL Server 2012

FULL DETAILS:
Let me explain more clearly. This is a table containing about 100 questions. Every question has a BookRange property that shows which part of the book the question was taken from, with values 1, 2, 3, 4. There is another property called Level that shows the difficulty of the question, with values 1, 2, 3, 4, 5. Now I need to randomly select 20 questions that include all four book ranges and all five levels with a normal distribution.
Please consider that I need to select distinct rows.
Thank you very much.
edit: added the table
CREATE TABLE [dbo].[Question] (
[QuesID] INT IDENTITY (1, 1) NOT NULL,
[BookRange] NVARCHAR (50) NULL,
[Level] NVARCHAR (50) NULL,
PRIMARY KEY CLUSTERED ([QuesID] ASC)
);
You can do this query (assuming a uniform distribution) without doing a union. You just need to specify the ordering correctly.
If you want to select 5 questions from each of the ranges, then you can do so by assigning a sequential number to the questions in each range. If these are assigned randomly, then you should meet the requirement of randomness for the levels:
with q as (
select q.*,
row_number() over (partition by [range] order by newid()) as seqnum
from Question q
)
select *
from q
where seqnum <= 5;
If you want to ensure that there is exactly one question for each level and range, but want the questions random, then do:
with q as (
select q.*,
row_number() over (partition by [range], [level] order by newid()) as seqnum
from Question q
)
select *
from q
where seqnum = 1;
By the way, range and level are reserved words in SQL Server. In general, it is good practice to avoid using reserved words for the names of things like tables, columns, stored procedures, and so on.
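Using the column names from the Question table posted in the question (BookRange and Level rather than range and level), the same idea can be sketched as below. It takes 20 rows and prefers one random question per existing (BookRange, Level) pair before any pair contributes a second one. This gives a uniform spread, not a normal distribution, and it only covers every pair if there are at most 20 distinct pairs:

```sql
WITH q AS (
    SELECT QuesID, BookRange, [Level],
           -- Random position of each question inside its (BookRange, Level) group.
           ROW_NUMBER() OVER (PARTITION BY BookRange, [Level]
                              ORDER BY NEWID()) AS seqnum
    FROM dbo.Question
)
SELECT TOP (20) QuesID, BookRange, [Level]
FROM q
ORDER BY seqnum, NEWID();   -- all seqnum = 1 rows first, then random fill
```

Each row of dbo.Question appears at most once in q, so the 20 selected questions are distinct.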
Select distinct id from table where level=1 order by rand() limit 5 union Select distinct id from table where level=2 order by rand() limit 5 union Select distinct id from table where level=3 order by rand() limit 5 union Select distinct id from table where level=4 order by rand() limit 5
Since you haven't provided any table schema, assuming we have a table dbo.Number with one column holding values from 1 to 30, you could do something like this:
;With NthGroups
AS
(
SELECT * , NTILE(4) OVER (ORDER BY Nums) Np
FROM dbo.Number
),
Top25Perc
AS
(
SELECT TOP 5 * FROM NthGroups
WHERE NP = 1
ORDER BY NEWID()
UNION ALL
SELECT TOP 5 * FROM NthGroups
WHERE NP = 2
ORDER BY NEWID()
UNION ALL
SELECT TOP 5 * FROM NthGroups
WHERE NP = 3
ORDER BY NEWID()
UNION ALL
SELECT TOP 5 * FROM NthGroups
WHERE NP = 4
ORDER BY NEWID()
)
SELECT * FROM Top25Perc
Update
Just read your comment in other answer and you have mentioned you have a column Range with values (1,2,3,4) , this makes query even simpler , you can do something like this
;With
RandTop5
AS
(
SELECT TOP 5 * FROM TableName
WHERE [Range] = 1
ORDER BY NEWID()
UNION ALL
SELECT TOP 5 * FROM TableName
WHERE [Range] = 2
ORDER BY NEWID()
UNION ALL
SELECT TOP 5 * FROM TableName
WHERE [Range] = 3
ORDER BY NEWID()
UNION ALL
SELECT TOP 5 * FROM TableName
WHERE [Range] = 4
ORDER BY NEWID()
)
SELECT * FROM RandTop5

How to select Top % in T-SQL without using Top clause?

How to select Top 40% from a table without using the Top clause (or Top percent, the assignment is a little ambiguous) ? This question is for T-SQL, SQL Server 2008. I am not allowed to use Top for my assignment.
Thanks.
This is what I've tried but seems complicated. Isn't there an easier way ?
select top (convert (int, (select round (0.4*COUNT(*), 0) from MyTable))) * from MyTable
Try the NTILE function:
;WITH YourCTE AS
(
SELECT
(some columns),
percentile = NTILE(10) OVER(ORDER BY SomeColumn DESC)
FROM
dbo.YourTable
)
SELECT *
FROM YourCTE
WHERE percentile <= 4
The NTILE(10) OVER(....) creates 10 groups of percentages over your data - and thus, the top 40% are the groups no. 1, 2, 3, 4 of that result
Use NTILE
CREATE TABLE #temp(StudentID CHAR(3), Score INT)
INSERT #temp VALUES('S1',75 )
INSERT #temp VALUES('S2',83)
INSERT #temp VALUES('S3',91)
INSERT #temp VALUES('S4',83)
INSERT #temp VALUES('S5',93 )
INSERT #temp VALUES('S6',75 )
INSERT #temp VALUES('S7',83)
INSERT #temp VALUES('S8',91)
INSERT #temp VALUES('S9',83)
INSERT #temp VALUES('S10',93 )
SELECT * FROM (
SELECT NTILE(10) OVER(ORDER BY Score) AS NtileValue,*
FROM #temp) x
WHERE NtileValue <= 4
ORDER BY 1
Interesting enough I blogged about NTILE today: Does anyone use the NTILE() windowing function?
A problem with the NTILE(10) answers given so far is that if the table has 15 rows they will return 8 rows (53%) rather than the correct number to make up 40% (6).
If the number of rows is not evenly divisible by number of buckets the extra rows all go into the first buckets rather than being evenly distributed.
This alternative (borrows SQL Menace's table) avoids that issue.
WITH CTE
AS (SELECT *,
ROW_NUMBER() OVER ( ORDER BY Score) AS RN,
COUNT(*) OVER() AS Cnt
FROM #temp)
SELECT StudentID,
Score
FROM CTE
WHERE RN <= CEILING(0.4 * Cnt )
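The uneven bucket sizes mentioned above can be checked with a small sketch; the 15 inline values are made up and stand in for a 15-row table:

```sql
SELECT bucket, COUNT(*) AS rows_in_bucket
FROM (
    SELECT i, NTILE(10) OVER (ORDER BY i) AS bucket
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
                 (11),(12),(13),(14),(15)) AS v(i)
) AS t
GROUP BY bucket
ORDER BY bucket;
-- Buckets 1-5 hold 2 rows each and buckets 6-10 hold 1 row each,
-- so "bucket <= 4" selects 8 of the 15 rows.
```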
Using Top t-sql command:
select top 10 [Column_1],
[Column_2] from [Table]
order by [Column_1]
Using Paging method:
select
[Column_1],
[Column_2]
from
(Select ROW_NUMBER() Over (ORDER BY [Column_1]) AS Row,
[Column_1],
[Column_2]
FROM [Table]) as [alias]
WHERE (Row between 0 and 10)
This finds the top 10 rows ordered by [Column_1]. Note that the bracketed names are placeholders.
If you could provide column names and table names, I could write more useful T-SQL. For example, to find the top 40% you would need another subquery to get the count of all rows and then do the division; I would likely run that as a separate query before the main query.
Calculate and set ROWCOUNT for whatever number of records.
Then execute your query for the limited set.
declare @rc as integer
select @rc = count(*)*0.40 from CTE
Set ROWCOUNT @rc
select * from CTE
ROWCOUNT is not deprecated yet - see http://msdn.microsoft.com/en-us/library/ms188774.aspx

How do I select last 5 rows in a table without sorting?

I want to select the last 5 records from a table in SQL Server without arranging the table in ascending or descending order.
This is just about the most bizarre query I've ever written, but I'm pretty sure it gets the "last 5" rows from a table without ordering:
select *
from issues
where issueid not in (
select top (
(select count(*) from issues) - 5
) issueid
from issues
)
Note that this makes use of SQL Server 2005's ability to pass a value into the "top" clause - it doesn't work on SQL Server 2000.
Suppose you have an index on id, this will be lightning fast:
SELECT * FROM [MyTable] WHERE [id] > (SELECT MAX([id]) - 5 FROM [MyTable])
The way your question is phrased makes it sound like you think you have to physically resort the data in the table in order to get it back in the order you want. If so, this is not the case, the ORDER BY clause exists for this purpose. The physical order in which the records are stored remains unchanged when using ORDER BY. The records are sorted in memory (or in temporary disk space) before they are returned.
Note that the order in which records are returned is not guaranteed without an ORDER BY clause. So, while any of the suggestions here may work, there is no reason to think they will continue to work, nor can you prove that they work in all cases with your current database. This is by design - I am assuming it is to give the database engine the freedom to do as it will with the records in order to obtain the best performance when no explicit order is specified.
Assuming you wanted the last 5 records sorted by the field Name in ascending order, you could do something like this, which should work in either SQL 2000 or 2005:
select Name
from (
select top 5 Name
from MyTable
order by Name desc
) a
order by Name asc
You need to count number of rows inside table ( say we have 12 rows )
then subtract 5 rows from them ( we are now in 7 )
select * where index_column > 7
select * from users
where user_id >
( (select COUNT(*) from users) - 5)
you can order them ASC or DESC
But when using this code
select TOP 5 * from users order by user_id DESC
it will not be ordered easily.
select * from table limit 5 offset (select count(*) from table) - 5;
Without an order, this is impossible. What defines the "bottom"? The following will select 5 rows according to how they are stored in the database.
SELECT TOP 5 * FROM [TableName]
Well, the "last five rows" are actually the last five rows depending on your clustered index. Your clustered index, by definition, is the way that the rows are ordered. So you really can't get the "last five rows" without some order. You can, however, get the last five rows as it pertains to the clustered index.
SELECT TOP 5 * FROM MyTable
ORDER BY MyCLusteredIndexColumn1 DESC, MyCLusteredIndexColumn2 DESC, ..., MyCLusteredIndexColumnN DESC
To fetch the last 5 records by identity value, you can use this:
SELECT *
FROM [Table Name]
WHERE ID <= IDENT_CURRENT('Table Name')
AND ID > IDENT_CURRENT('Table Name') - 5
If you know how many rows there will be in total you can use the ROW_NUMBER() function.
Here's an example from MSDN (http://msdn.microsoft.com/en-us/library/ms186734.aspx)
USE AdventureWorks;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
In SQL Server 2012 you can do this :
Declare @Count1 int;
Select @Count1 = Count(*)
FROM [Log] AS L;
SELECT
*
FROM [Log] AS L
ORDER BY L.id
OFFSET @Count1 - 5 ROWS
FETCH NEXT 5 ROWS ONLY;
Try this, if you don't have a primary key or identity column:
select [Stu_Id],[Student_Name] ,[City] ,[Registered],
RowNum = row_number() OVER (ORDER BY (SELECT 0))
from student
ORDER BY RowNum desc
You can retrieve them from memory.
So first you get the rows in a DataSet, and then get the last 5 out of the DataSet.
There is a handy trick that works in some databases for ordering in database order,
SELECT * FROM TableName ORDER BY true
Apparently, this can work in conjunction with any of the other suggestions posted here to leave the results in "order they came out of the database" order, which in some databases, is the order they were last modified in.
select *
from table
order by empno(primary key) desc
fetch first 5 rows only
Last 5 rows retrieve in mysql
This query working perfectly
SELECT * FROM (SELECT * FROM recharge ORDER BY sno DESC LIMIT 5)sub ORDER BY sno ASC
or
select sno from(select sno from recharge order by sno desc limit 5) as t where t.sno order by t.sno asc
When the number of rows in the table is less than 5, the answers of Matt Hamilton and msuvajac are incorrect,
because a TOP N rowcount value may not be negative.
A great example can be found Here.
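One way to guard against that case is to clamp the TOP argument at zero before using it. The sketch below reuses the issues table and issueid column from the earlier answer, and assumes issueid is non-NULL and defines what "last" means:

```sql
DECLARE @Skip int;

-- Never let the TOP argument go negative when the table has fewer than 5 rows.
SELECT @Skip = CASE WHEN COUNT(*) > 5 THEN COUNT(*) - 5 ELSE 0 END
FROM issues;

SELECT *
FROM issues
WHERE issueid NOT IN (
    SELECT TOP (@Skip) issueid
    FROM issues
    ORDER BY issueid   -- assumption: issueid defines what "last" means
);
```

With fewer than 5 rows, @Skip is 0, the subquery returns nothing, and every row is returned.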
i am using this code:
select * from tweets where placeID = '$placeID' and id > (
(select count(*) from tweets where placeID = '$placeID')-2)
In SQL Server, it does not seem possible without using ordering in the query.
This is what I have used.
SELECT *
FROM
(
SELECT TOP 5 *
FROM [MyTable]
ORDER BY Id DESC /*Primary Key*/
) AS T
ORDER BY T.Id ASC; /*Primary Key*/
DECLARE @MYVAR NVARCHAR(100)
DECLARE @step int
SET @step = 0;
DECLARE MYTESTCURSOR CURSOR
DYNAMIC
FOR
SELECT col FROM [dbo].[table]
OPEN MYTESTCURSOR
FETCH LAST FROM MYTESTCURSOR INTO @MYVAR
print @MYVAR;
WHILE @step < 10
BEGIN
FETCH PRIOR FROM MYTESTCURSOR INTO @MYVAR
print @MYVAR;
SET @step = @step + 1;
END
CLOSE MYTESTCURSOR
DEALLOCATE MYTESTCURSOR
Thanks to @Apps Tawale. Based on his answer, here's my slightly different version.
To select last 5 records without an identity column,
select top 5 *,
RowNum = row_number() OVER (ORDER BY (SELECT 0))
from [dbo].[ViewEmployeeMaster]
ORDER BY RowNum desc
Nevertheless, it has an order by, but on RowNum :)
Note(1): The above query will reverse the order of what we get when we run the main select query.
So to maintain the order, we can slightly go like:
select *, RowNum2 = row_number() OVER (ORDER BY (SELECT 0))
from (
select top 5 *, RowNum = row_number() OVER (ORDER BY (SELECT 0))
from [dbo].[ViewEmployeeMaster]
ORDER BY RowNum desc
) as t1
order by RowNum2 desc
Note(2): Without an identity column, the query takes a bit of time in case of large data
Get the count of that table
select count(*) from TABLE
select top count * from TABLE where 'primary key row' NOT IN (select top (count-5) 'primary key row' from TABLE)
If you do not want to arrange the table in ascending or descending order. Use this.
select * from table limit 5 offset (select count(*) from table) - 5;
