T-SQL Multi-layered CTE query with Aggregates - sql-server

I have a long Common Table Expression (CTE) query that tries to calculate the percent difference between each user's average score and the group average score.
I would like my multi-layered CTE query to filter the bulk of the records down to the following table:
UserID Tag UserAvg GroupAvg PercentDifference
1 Cat 72.50 73 -0.68
2 Cat 75.50 73 3.36
3 Cat 75 73 2.70
4 Cat 73.25 73 0.34
5 Cat 52.3333 73 -32.97
6 Cat 86.25 73 16.64
My problem is getting the GroupAvg column so that I can perform the percent difference calculation.
To illustrate my current approach, here is a summary of my CTE query:
WITH
-- select 1st 3 columns
UserScores AS (select UserID, Tag, Score FROM {multiple-table} WHERE Tag = 'Cat'),
-- add UserAvg column by grouping records
ScoreAverages AS (select UserID, Tag, AVG(Score) AS UserAvg FROM UserScores GROUP BY UserID, Tag),
-- calculate GroupAvg
GroupAverage AS (select AVG(UserAvg) AS GroupAvg FROM ScoreAverages),
-- calculate % difference
PercentDiff AS (select UserID, Tag, UserAvg, 73 AS GroupAvg, (((UserAvg-73)/((UserAvg+73)/2))*100) AS PercentDifference FROM ScoreAverages )
-- do something with results
select * from PercentDiff
Simple enough, right?
Notice that I have hard-coded 73 as my GroupAvg value. I am unsure how to construct the SQL query that would take me from ScoreAverages to the PercentDiff table.
Is it possible to perform a SELECT within a SELECT statement? I am not looking for something like the following:
select * from X where Id in (select Id from Y where Name like '%abc%')
Or am I simply trying to do too much in one go?

Yes, it's called a sub-select:
SELECT Column1, Column2, (SELECT QUERY THAT GETS GROUP AVERAGE) AS GroupAverage, Column3
FROM ...
To use the result of the sub-select in another column's calculation, you can either repeat the sub-select:
SELECT Column1, Column2, (SELECT QUERY THAT GETS GROUP AVERAGE) AS GroupAverage, (Column3 - (SELECT QUERY THAT GETS GROUP AVERAGE)) AS Column4
FROM ...
Or you can reference it the same as you would any other column in the outer query or a subsequent CTE:
WITH CTE1 AS (SELECT Column1, Column2, Column3, (SELECT QUERY THAT GETS GROUP AVERAGE) AS GroupAverage
FROM ...)
, CTE2 AS (SELECT *, (Column3 - GroupAverage) AS Column4
FROM CTE1)
SELECT ...
FROM CTE2
JOIN ...

It is possible, as shown in Tab Alleman's answer, but in your case it's not necessary. Since you already calculate GroupAvg in the CTE chain, you can use it in the final query, and since GroupAverage contains only one row, you can simply CROSS JOIN to it:
;WITH
-- select 1st 3 columns
UserScores AS (
select UserID, Tag, Score
FROM {multiple-table}
WHERE Tag = 'Cat'),
-- add UserAvg column by grouping records
ScoreAverages AS (
select UserID, Tag, AVG(Score) AS UserAvg
FROM UserScores
GROUP BY UserID, Tag),
-- calculate GroupAvg
GroupAverage AS (
select AVG(UserAvg) AS GroupAvg
FROM ScoreAverages),
-- calculate % difference
PercentDiff AS (
select UserID, Tag, UserAvg, GroupAvg,
(((UserAvg-GroupAvg)/((UserAvg+GroupAvg)/2))*100) AS PercentDifference
FROM ScoreAverages
CROSS JOIN GroupAverage)
-- do something with results
select * from PercentDiff
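To try the CROSS JOIN approach without the real tables, here is a minimal sketch. The table variable @Scores and its rows are made up for illustration and stand in for {multiple-table}; everything else follows the query above. The cross join is safe because an aggregate with no GROUP BY always returns exactly one row, so each user row is paired with that single group average and no rows are multiplied.

```sql
-- Hypothetical sample data standing in for {multiple-table}.
DECLARE @Scores TABLE (UserID int, Tag varchar(10), Score decimal(9, 4));

INSERT INTO @Scores (UserID, Tag, Score)
VALUES (1, 'Cat', 70), (1, 'Cat', 75),
       (2, 'Cat', 80), (2, 'Cat', 90),
       (3, 'Dog', 55);   -- filtered out by the Tag predicate

WITH ScoreAverages AS (
    -- one row per user
    SELECT UserID, Tag, AVG(Score) AS UserAvg
    FROM @Scores
    WHERE Tag = 'Cat'
    GROUP BY UserID, Tag
),
GroupAverage AS (
    -- no GROUP BY, so this CTE returns exactly one row
    SELECT AVG(UserAvg) AS GroupAvg
    FROM ScoreAverages
)
SELECT s.UserID, s.Tag, s.UserAvg, g.GroupAvg,
       (s.UserAvg - g.GroupAvg) / ((s.UserAvg + g.GroupAvg) / 2) * 100 AS PercentDifference
FROM ScoreAverages AS s
CROSS JOIN GroupAverage AS g;
```

Score is declared as decimal here on purpose: in SQL Server, AVG over an int column returns an int, which would truncate the averages.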

I just thought you could do this with a single CTE, like so:
;WITH UserAverages AS
(
SELECT UserID,
Tag,
AVG(Score) AS UserAvg,
AVG(AVG(Score)) OVER () AS GroupAvg
FROM {multiple-table}
WHERE Tag = 'Cat'
GROUP BY UserID, Tag
)
SELECT UserID,
Tag,
UserAvg,
GroupAvg,
(((UserAvg-GroupAvg)/((UserAvg+GroupAvg)/2))*100) AS PercentDifference
FROM UserAverages
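The nested AVG(AVG(Score)) OVER () works because window functions are evaluated after GROUP BY: the inner AVG is the ordinary per-user aggregate, and the outer AVG ... OVER () then averages those per-user results across all grouped rows. A small sketch with made-up rows (the table variable @Scores is hypothetical) shows the two levels side by side:

```sql
-- Hypothetical sample data; Score is an int here, so it is cast before averaging.
DECLARE @Scores TABLE (UserID int, Score int);

INSERT INTO @Scores (UserID, Score)
VALUES (1, 70), (1, 75), (2, 80), (2, 90);

SELECT UserID,
       -- per-user average (ordinary aggregate)
       AVG(CAST(Score AS decimal(9, 4))) AS UserAvg,
       -- average of the per-user averages (window aggregate over the grouped rows)
       AVG(AVG(CAST(Score AS decimal(9, 4)))) OVER () AS GroupAvg
FROM @Scores
GROUP BY UserID;
```

With these rows the per-user averages are 72.5 and 85, and GroupAvg is their mean, 78.75, on both rows. Note that this is an average of averages, which differs from the average over all individual scores whenever users have different numbers of scores.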

Related

select statement with "Group by" on specific columns but displaying other columns along with group by columns

I want to get all the data, grouping on only the encounter and medicationname columns.
select encounter,medicationname,count(*) as freq,labdate,result
from Medications where (labdate between @admitdate and DATEDIFF(dd,24,@admitdate))
group by encounter,medicationname having count(*)>2
I have records like
encounter medicationname freq
8604261 ACC 3
Based on this data, this is my desired output:
encounter medicationname labtime result
8604261 ACC 2015-05-22 18
8604261 ACC 2015-07-23 23
8604261 ACC 2015-09-09 27
You can use COUNT() as a window function, something like this:
;With Counted as (
SELECT encounter,medicationname,labdate,result,
COUNT(*) OVER (PARTITION BY encounter,medicationname) as cnt
from Medications
where (labdate between @admitdate
and DATEDIFF(dd,24,@admitdate))
)
select encounter,medicationname,labdate,result
from Counted
where cnt > 2
I would note that I think DATEDIFF [1] is probably wrong as well, but since I don't have your data, inputs, or an actual spec, I've left it as is for now.
[1] DATEDIFF returns an int, but you're using it in a comparison against a column that is apparently a date. DATEADD is more probably the desired function here, but as I say, I don't have full information to go on.
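To make the footnote concrete, here is a small sketch comparing the two functions. The variable value is made up, and the reading that the intended range is "from the admit date to 24 days later" is an assumption, not something the question confirms.

```sql
DECLARE @admitdate datetime = '20150501';   -- hypothetical admit date

SELECT DATEDIFF(dd, 24, @admitdate) AS DateDiffResult,  -- an int: the literal 24 is read as a date, and the day count between the two is returned
       DATEADD(dd, 24, @admitdate)  AS DateAddResult;   -- a datetime: 24 days after @admitdate, i.e. 2015-05-25

-- Under that assumption, the range predicate would read:
--   WHERE labdate BETWEEN @admitdate AND DATEADD(dd, 24, @admitdate)
```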
If I understand your question correctly, what you need is this:
;WITH CTE AS
(
select encounter,medicationname,count(*) as freq
from Medications where (labdate between @admitdate and DATEDIFF(dd,24,@admitdate))
group by encounter,medicationname having count(*) > 2
)
select M.encounter,M.medicationname,M.labdate,M.result
from Medications M
INNER JOIN CTE C
ON M.encounter = C.encounter
AND M.medicationname = C.medicationname
where (M.labdate between @admitdate and DATEDIFF(dd,24,@admitdate))
or, better yet, using COUNT() OVER():
;WITH CTE AS
(
SELECT encounter,medicationname,COUNT(*) OVER(PARTITION BY encounter,medicationname)as freq,labdate,result
FROM Medications
WHERE (labdate between @admitdate and DATEDIFF(dd,24,@admitdate))
)
SELECT * FROM CTE
WHERE freq > 2
select encounter,medicationname,count(*) as freq,labdate,result
from Medications
where (labdate between @admitdate and DATEDIFF(dd,24,@admitdate))
group by encounter,medicationname having count(*) > 2

T-SQL order by, based on other column value

I'm stuck with a query which should be pretty simple but, for reasons unknown, my brain is not playing ball here ...
Table:
id(int) | strategy (varchar) | value (whatever)
1 "ABC" whatevs
2 "ABC" yeah
3 "DEF" hello
4 "DEF" kitty
5 "QQQ" hurrr
The query should group the rows on strategy and return only one row per strategy: the one with the highest id.
In the case above, it should return the rows with id 2, 4 and 5.
SELECT id, strategy , value
FROM (
SELECT id, strategy , value
,ROW_NUMBER() OVER (PARTITION BY strategy ORDER BY ID DESC) rn
FROM Table_Name
) Sub
WHERE rn = 1
You can use a window function to get the result you want:
with cte as
(
select
rank()over(partition by strategy order by id desc) as rnk,
id, strategy, value from myT
)
select id, strategy, value from
cte where rnk = 1;
Try this:
SELECT T2.id,T1.strategy,T1.value
FROM TableName T1
INNER JOIN
(SELECT MAX(id) as id,strategy
FROM TableName
GROUP BY strategy) T2
ON T1.id=T2.id
Result:
ID STRATEGY VALUE
2 ABC yeah
4 DEF kitty
5 QQQ hurrr
SELECT id, strategy , value
FROM (
SELECT id, strategy , value
,MAX(id) OVER (PARTITION BY strategy) MaxId
FROM YourTable
) Sub
WHERE id=MaxId
You may try this one as well:
SELECT id, strategy, value FROM TableName WHERE id IN (
SELECT MAX(id) FROM TableName GROUP BY strategy
)
How much this helps depends on your data. It might return results faster because it does no sorting, but on the other hand it uses IN, which can slow you down if there are many strategies.
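To compare these approaches against the question's sample rows, here is a self-contained sketch. The table variable @T is a stand-in for the real table (its name is made up); the rows are the ones listed in the question.

```sql
-- Sample rows from the question, loaded into a hypothetical table variable.
DECLARE @T TABLE (id int PRIMARY KEY, strategy varchar(10), value varchar(20));

INSERT INTO @T (id, strategy, value)
VALUES (1, 'ABC', 'whatevs'),
       (2, 'ABC', 'yeah'),
       (3, 'DEF', 'hello'),
       (4, 'DEF', 'kitty'),
       (5, 'QQQ', 'hurrr');

-- Number the rows within each strategy, highest id first, and keep row 1.
SELECT id, strategy, value
FROM (
    SELECT id, strategy, value,
           ROW_NUMBER() OVER (PARTITION BY strategy ORDER BY id DESC) AS rn
    FROM @T
) AS ranked
WHERE rn = 1
ORDER BY id;
```

This returns the rows with id 2, 4 and 5, as the question asks. Because id is unique, RANK() and ROW_NUMBER() give the same result here; they would differ only if the ORDER BY column could contain ties within a strategy.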

How to select data top x data after y rows from SQL Server

For example, I have a table that contains 10,000 rows. I want to select the top 100 rows after the top 500th row. How can I do this most efficiently?
The query is needed for SQL Server 2008.
I already have this query, but I wonder whether there is a more efficient solution:
SELECT TOP 100 xx
FROM nn
WHERE cc NOT IN
(SELECT TOP 500 cc
FROM nn ORDER BY cc ASC)
Tutorial 25: Efficiently Paging Through Large Amounts of Data
with cte as (
SELECT ...,
ROW_NUMBER () OVER (ORDER BY ...) as rn
FROM ...)
SELECT ... FROM cte
WHERE rn BETWEEN 501 and 600;
select top 600 *
from mytable
where --whatever condition you want
except
select top 500 *
from mytable
where --whatever condition you want
SELECT
col1,
col2
FROM (
SELECT ROW_NUMBER() OVER (
ORDER BY [t0].someColumn) as ROW_NUMBER,
col1,
col2
FROM [dbo].[someTable] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN 501 and 600
ORDER BY [t1].[ROW_NUMBER]
Select the TOP 500, then concatenate the TOP 100 to the result set.
Normally, for this to be worth doing, you need some criteria that determine why you need 500 records under one condition and only 100 under another. I assume these are condition1 for the TOP 500 and condition2 for the TOP 100. Because the conditions differ, the records returned by the TOP 100 might not be the same as those in the TOP 500.
select TOP 500 *
from MyTable
where -- condition1 -- Retrieving the first 500 rows meeting condition1
union
select TOP 100 *
from MyTable
where -- condition2 -- Retrieving the first 100 rows meeting condition2
-- The complete result set of the two queries will be combined (UNIONed) into only one result set.
EDIT #1
This is not what I meant. I want to select the top 100 rows coming after the top 500th row, that is, rows 501-600.
After your comment, I better understood what you want to achieve. Try this:
WITH Results AS (
select TOP 600 f.*, ROW_NUMBER() OVER (ORDER BY f.[type]) as RowNumber
from MyTable f
order by f.[type] -- so TOP 600 keeps the same rows that ROW_NUMBER numbers 1 to 600
) select *
from Results
where RowNumber between 501 and 600
Does this help?
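The same ROW_NUMBER pattern can be parameterized so the page boundaries are not hard-coded. This is only a sketch: @skip and @take are names made up for illustration, and MyTable and [type] are taken from the query above.

```sql
DECLARE @skip int = 500,   -- rows to skip
        @take int = 100;   -- rows to return

WITH Numbered AS (
    SELECT f.*, ROW_NUMBER() OVER (ORDER BY f.[type]) AS RowNumber
    FROM MyTable AS f
)
SELECT *
FROM Numbered
WHERE RowNumber BETWEEN @skip + 1 AND @skip + @take   -- rows 501 to 600 with the values above
ORDER BY RowNumber;
```

If [type] is not unique, rows with equal values can be numbered in any order, so a page is only stable from one run to the next when a unique column (for example the primary key) is added to the ORDER BY as a tie-breaker.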

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
. . .
. . .
The first 3 rows of the Average column are NULL because fewer than three previous rows exist. Row 4 of the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.
This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
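Windowed SUM with ORDER BY, as used above, needs SQL Server 2012 or later. On those versions a window frame can also express the "previous 3 rows" average from the question directly. The following is a sketch of that alternative, not one of the approaches posted in this thread; it uses the sample numbers from the question in a hypothetical table variable.

```sql
-- Sample rows from the question, in a hypothetical table variable.
DECLARE @RowsToAverage TABLE (ID int NOT NULL PRIMARY KEY, Number int NOT NULL);

INSERT INTO @RowsToAverage (ID, Number)
VALUES (1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10);

SELECT ID, Number,
       CASE
           -- only report an average when three previous rows actually exist
           WHEN COUNT(*) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) = 3
           THEN AVG(Number) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS MovingAverage
FROM @RowsToAverage
ORDER BY ID;
```

The frame ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING excludes the current row, so row 4 averages (1, 3, 2) and row 5 averages (3, 2, 4), matching the table in the question. Number is an int, so AVG returns an int; cast it to a decimal first if fractional averages are needed.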
A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.
Check out some solutions here. I'm sure that you could adapt one of them easily enough.
If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running total aggregate would be by far the most efficient way to calculate a running average like this.
Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Performance of selects is as fast as it goes. Of course, modifications are slower.

How to retrieve the total row count of a query with TOP

I have a SQL Server 2008 query
SELECT TOP 10 *
FROM T
WHERE ...
ORDER BY ...
I'd also like to get the total number of rows. The obvious way is to run a second query:
SELECT COUNT(*)
FROM T
WHERE ...
ORDER BY ...
Is there an efficient method?
Thanks
Do you want a second query?
SELECT TOP 10
*, foo.bar
FROM
T
CROSS JOIN
(SELECT COUNT(*) AS bar FROM T WHERE ...) foo
WHERE
...
ORDER BY
...
OR
DECLARE @bar int
SELECT @bar = COUNT(*) FROM T WHERE ...
SELECT TOP 10
*, @bar AS bar
FROM
T
WHERE
...
ORDER BY
...
Or (Edit: using WITH)
WITH cTotal AS
(
SELECT COUNT(*) AS bar FROM T WHERE ...
)
SELECT TOP 10
*, cTotal.bar
FROM
T
CROSS JOIN
cTotal
WHERE
...
ORDER BY
...
What is in this answer seems to work:
https://stackoverflow.com/a/19125458/16241
Basically you do a:
SELECT top 100 YourColumns, TotalCount = Count(*) Over()
From YourTable
Where SomeValue = 32
TotalCount will have the total number of rows. It is listed on each row though.
When I tested this the query plan showed the table only being hit once.
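The reason this gives the full count is that COUNT(*) OVER () is computed over every row that passes the WHERE clause, before TOP trims the result. A small sketch with made-up rows (the table variable @T and its contents are hypothetical):

```sql
DECLARE @T TABLE (id int PRIMARY KEY, SomeValue int);

INSERT INTO @T (id, SomeValue)
VALUES (1, 32), (2, 32), (3, 32), (4, 7), (5, 32);

SELECT TOP (2) id, SomeValue,
       COUNT(*) OVER () AS TotalCount   -- counts all rows matching the WHERE, not just the TOP rows
FROM @T
WHERE SomeValue = 32
ORDER BY id;
```

Only the rows with id 1 and 2 come back, and TotalCount is 4 on both, since four rows match the filter.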
Remove the ORDER BY clause from the 2nd query as well.
No.
SQL Server doesn't keep COUNT(*) in metadata like MyISAM, it calculates it every time.
UPDATE: If you need an estimate, you can use statistics metadata:
SELECT rows
FROM dbo.sysindexes
WHERE name = @primary_key
where @primary_key is the name of your table's primary key.
This will return the COUNT(*) from last statistics update.
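dbo.sysindexes is kept only as a compatibility view on SQL Server 2005 and later. On those versions the catalog view sys.partitions exposes a similar approximate row count. A sketch, assuming the table is dbo.T as in the question:

```sql
SELECT SUM(p.rows) AS ApproxRowCount     -- summed across partitions
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.T')
  AND p.index_id IN (0, 1);              -- 0 = heap, 1 = clustered index
```

Like the sysindexes query, this counts the whole table and ignores any WHERE filter, so it only helps when an approximate total for the unfiltered table is enough.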
SELECT TOP (2) *,
(SELECT COUNT(*) AS Expr1 FROM T) AS C
FROM T
