Using a CTE to provide a cumulative total - sql-server

I have a table with some forenames in:
SELECT * FROM d;
Forename
--------------------------------
Robert
Susan
Frances
Kate
May
Alex
Anna
I want to pull a cumulative total of name lengths alphabetically. So far I have:
WITH Names ( RowNum, Forename, ForenameLength )
AS ( SELECT ROW_NUMBER() OVER ( ORDER BY forename ) AS RowNum ,
Forename ,
LEN(forename) AS ForenameLength
FROM d
)
SELECT RowNum ,
Forename ,
ForenameLength ,
ISNULL(ForenameLength + ( SELECT ISNULL(SUM(ForenameLength),0)
FROM Names
WHERE RowNum < n.RowNum
), 0) AS CumLen
FROM NAMES n;
RowNum Forename ForenameLength CumLen
-------------------- -------------------------------- -------------- -----------
1 Alex 4 4
2 Anna 4 8
3 Frances 7 15
4 Kate 4 19
5 May 3 22
6 Robert 6 28
7 Susan 5 33
But I understand that it should be possible to do this (recursively) within the CTE. Anyone know how this could be achieved?
N.B. whilst we are developing on 2012, the current live system is 2008 so any solution would need to be backwards compatible at least in the short term.

You are on SQL Server 2012 and should use sum() over() instead.
select row_number() over(order by d.Forename) as RowNum,
d.Forename,
len(d.Forename) as ForenameLength,
sum(len(d.Forename)) over(order by d.Forename rows unbounded preceding) as CumLen
from d
order by d.Forename;
Result:
RowNum Forename ForenameLength CumLen
-------- ------------ -------------- -----------
1 Alex 4 4
2 Anna 4 8
3 Frances 7 15
4 Kate 4 19
5 May 3 22
6 Robert 6 28
7 Susan 5 33
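The `rows unbounded preceding` frame matters once the ordering column contains duplicates. When an ORDER BY is given without a frame, the default is RANGE UNBOUNDED PRECEDING AND CURRENT ROW, which treats tied rows as peers and gives them the same running total. A minimal sketch of the difference, assuming a hypothetical table variable `@d` in which one forename appears twice:

```sql
-- Hypothetical sample data (not from the question): 'Anna' appears twice.
DECLARE @d TABLE (Forename varchar(32));
INSERT INTO @d (Forename) VALUES ('Alex'), ('Anna'), ('Anna'), ('Kate');

SELECT Forename,
       -- ROWS: each physical row adds its own length to the running total.
       SUM(LEN(Forename)) OVER (ORDER BY Forename
                                ROWS UNBOUNDED PRECEDING)  AS CumLenRows,
       -- RANGE: tied rows are peers, so both 'Anna' rows get the same total.
       SUM(LEN(Forename)) OVER (ORDER BY Forename
                                RANGE UNBOUNDED PRECEDING) AS CumLenRange
FROM @d
ORDER BY Forename;
```

With ROWS, one 'Anna' row gets 8 and the other 12. With RANGE, both get 12. Both forms require SQL Server 2012 or later.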
Update:
If for some reason you absolutely want a recursive version, it could look something like this:
with C as
(
select top(1)
1 as RowNum,
d.Forename,
len(d.Forename) as ForenameLength,
len(d.Forename) as CumLen
from d
order by d.Forename
union all
select d.RowNum,
d.Forename,
d.ForenameLength,
d.CumLen
from (
select C.RowNum + 1 as RowNum,
d.Forename,
len(d.Forename) as ForenameLength,
C.CumLen + len(d.Forename) as CumLen,
row_number() over(order by d.ForeName) as rn
from d
inner join C
on C.Forename < d.Forename
) as d
where d.rn = 1
)
select C.RowNum,
C.Forename,
C.ForenameLength,
C.CumLen
from C;
Adapted from Performance Tuning the Whole Query Plan by Paul White.
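Since the question asks for something that also runs on SQL Server 2008, where SUM() OVER (ORDER BY ...) is not available, here is a simpler recursive sketch that walks a ROW_NUMBER() sequence one row at a time. This is a different formulation from the query above, not the technique from the article it cites. The table variable `@d` and its varchar(32) column are assumptions standing in for the real table, and this form is likely to be slow on large inputs.

```sql
-- Hypothetical stand-in for table d from the question.
DECLARE @d TABLE (Forename varchar(32));
INSERT INTO @d (Forename)
VALUES ('Robert'), ('Susan'), ('Frances'), ('Kate'), ('May'), ('Alex'), ('Anna');

WITH Names AS
(
    -- Number the names alphabetically, as in the question's CTE.
    SELECT ROW_NUMBER() OVER (ORDER BY Forename) AS RowNum,
           Forename,
           LEN(Forename) AS ForenameLength
    FROM @d
),
C AS
(
    -- Anchor: the first name starts the running total.
    SELECT RowNum, Forename, ForenameLength, ForenameLength AS CumLen
    FROM Names
    WHERE RowNum = 1
    UNION ALL
    -- Recursive step: add the next name's length to the previous total.
    SELECT n.RowNum, n.Forename, n.ForenameLength, C.CumLen + n.ForenameLength
    FROM C
    INNER JOIN Names AS n
        ON n.RowNum = C.RowNum + 1
)
SELECT RowNum, Forename, ForenameLength, CumLen
FROM C
ORDER BY RowNum
OPTION (MAXRECURSION 0); -- lifts the default limit of 100 recursion levels
```

For the seven names in the question this yields the same CumLen values as the result table above (4, 8, 15, 19, 22, 28, 33). Without the MAXRECURSION hint, a recursive CTE fails once it exceeds 100 levels, which here means more than about 100 names.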

Related

Finding A Time When A Value Changed

I am still learning many new things about SQL, such as PARTITION BY and CTEs. I am currently working on a query which I have cobbled together from a similar question I found online. However, I cannot seem to get it to work as intended.
The problem is as follows: I have been tasked with showing rank promotions in an organization from the beginning of 2022 to today. I am working with 2 primary tables, an EMPLOYEES table and a PERIODS table. The PERIODS table captures a snapshot of each employee every month, including their rank at the time. Each of these months is also assigned a PeriodID (e.g. Jan 2022 = PeriodID 131). The EMPLOYEES table holds the employee's current rank. These ranks are stored as an int (e.g. 1, 2, 3, with 1 being the lowest rank). It is possible for an employee to rank up more than once in any given month.
I have simplified the query as much as I can for the sake of this problem. The query is as follows:
;WITH x AS
(
SELECT
e.EmployeeID, p.PeriodID, p.RankID,
rn = ROW_NUMBER() OVER (PARTITION BY e.EmployeeID ORDER BY p.PeriodID DESC)
FROM employees e
LEFT JOIN periods p on p.EmployeeID= e.EmployeeID
WHERE p.PeriodID <= 131 AND p.PeriodID >=118 --This is the time range mentioned above
),
rest AS (SELECT * FROM x WHERE rn > 1)
SELECT
main.EmployeeID,
PeriodID = MIN(
CASE
WHEN main.CurrentRankID = Rest.RankID
THEN rest.PeriodID ELSE main.PeriodID
END),
main.RankID, rest.RankID
FROM x AS main LEFT OUTER JOIN rest ON main.EmployeeID = rest.EmployeeID
AND rest.rn >1
LEFT JOIN periods p on p.EmployeeID = e.EmployeeID
WHERE main.rn = 1
AND NOT EXISTS
(
SELECT 1 FROM rest AS rest2
WHERE EmployeeID = rest.EmployeeID
AND rn < rest.rn
AND main.RankID <> rest.RankID
)
and p.PeriodID <= 131 AND p.PeriodID >=118
GROUP BY main.EmployeeID, main.PeriodID, main.RankID, rest.RankID
As mentioned before, this query was borrowed from a similar question and modified for my own use. I imagine the bones of the query are good and I have simply messed up a variable somewhere, but I cannot seem to locate the problem line. The end goal is for the query to return a table showing the EmployeeID, the PeriodID, the rank they are being promoted from, and the rank they are being promoted to, in the month the promotion was earned, similar to the table below.
EmployeeID PeriodID PreviousRankID NewRank
---------- -------- -------------- -------
123        131      1              2
123        133      2              3
Instead, my query is spitting out repeating previous/current ranks and the PeriodIDs seem to be static (such as what is shown below).
EmployeeID PeriodID PreviousRankID NewRank
---------- -------- -------------- -------
123        131      1              1
123        131      1              1
I am hoping someone with a greater knowledge base on these functions is able to quickly notice my mistake.
If we assume some example DML/DDL (it's really helpful to provide this with your question):
DECLARE @Employees TABLE (EmployeeID INT IDENTITY, Name VARCHAR(20), RankID INT);
DECLARE @Periods TABLE (PeriodID INT, EmployeeID INT, RankID INT);
INSERT INTO @Employees (Name, RankID) VALUES ('Jonathan', 10),('Christopher', 10),('James', 10),('Jean-Luc', 8);
INSERT INTO @Periods (PeriodID, EmployeeID, RankID) VALUES
(1,1,1),(2,1,1),(3,1,1),(4,1,8 ),(5,1,10),(6,1,10),
(1,2,1),(2,2,1),(3,2,1),(4,2,8 ),(5,2,8 ),(6,2,10),
(1,3,1),(2,3,1),(3,3,7),(4,3,10),(5,3,10),(6,3,10),
(1,4,1),(2,4,1),(3,4,1),(4,4,8 ),(5,4,9 ),(6,4,8 );
Then we can accomplish what I think you're looking for using an OUTER APPLY that aggregates the values based on the current-row values:
SELECT e.EmployeeID, e.Name, e.RankID AS CurrentRank, ap.PeriodID AS ThisPeriod, p.PeriodID AS LastRankChangePeriodID, p.RankID AS LastRankChangedFrom, ap.RankID - p.RankID AS LastRankChanged
FROM @Employees e
LEFT OUTER JOIN @Periods ap
ON e.EmployeeID = ap.EmployeeID
OUTER APPLY (
-- Latest earlier period in which this employee held a different rank
SELECT EmployeeID, MAX(PeriodID) AS PeriodID
FROM @Periods
WHERE EmployeeID = e.EmployeeID
AND RankID <> ap.RankID
AND PeriodID < ap.PeriodID
GROUP BY EmployeeID
) a
LEFT OUTER JOIN @Periods p
ON a.EmployeeID = p.EmployeeID
AND a.PeriodID = p.PeriodID
ORDER BY e.EmployeeID, ap.PeriodID DESC
Using the correlated subquery we get a view of the data which we can filter using the current-row values, and we aggregate that to return the period we're looking for (where it's before this period, and it's not the same rank). Then it's just a join back to the Periods table to get the values.
You used a LEFT JOIN, so I've preserved that using an OUTER APPLY. If you wanted to filter using it, it would be a CROSS APPLY instead.
EmployeeID Name        CurrentRank ThisPeriod LastRankChangePeriodID LastRankChangedFrom LastRankChanged
---------- ----------- ----------- ---------- ---------------------- ------------------- ---------------
1          Jonathan    10          6          4                      8                   2
1          Jonathan    10          5          4                      8                   2
1          Jonathan    10          4          3                      1                   7
1          Jonathan    10          3          NULL                   NULL                NULL
1          Jonathan    10          2          NULL                   NULL                NULL
1          Jonathan    10          1          NULL                   NULL                NULL
2          Christopher 10          6          5                      8                   2
2          Christopher 10          5          3                      1                   7
2          Christopher 10          4          3                      1                   7
2          Christopher 10          3          NULL                   NULL                NULL
2          Christopher 10          2          NULL                   NULL                NULL
2          Christopher 10          1          NULL                   NULL                NULL
3          James       10          6          3                      7                   3
3          James       10          5          3                      7                   3
3          James       10          4          3                      7                   3
3          James       10          3          2                      1                   6
3          James       10          2          NULL                   NULL                NULL
3          James       10          1          NULL                   NULL                NULL
4          Jean-Luc    8           6          5                      9                   -1
4          Jean-Luc    8           5          4                      8                   1
4          Jean-Luc    8           4          3                      1                   7
4          Jean-Luc    8           3          NULL                   NULL                NULL
4          Jean-Luc    8           2          NULL                   NULL                NULL
4          Jean-Luc    8           1          NULL                   NULL                NULL
Now we can see what the previous change looked like for each period. Currently Jonathan has RankID 10. The last time that was different was in PeriodID 4, when it was 8. The same was true for PeriodID 5. In PeriodID 4 he had RankID 8, and prior to that he had RankID 1. Before that his rank hadn't changed.
Jean-Luc was actually demoted as his last change. I don't know if this is possible within your model.
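If only the rows where a rank actually changed are wanted (EmployeeID, PeriodID, previous rank, new rank), one alternative is to compare each monthly snapshot with the one before it using LAG(). This is a different technique from the OUTER APPLY query above and needs SQL Server 2012 or later. A minimal sketch, assuming a hypothetical `@Periods` table variable holding a subset of the sample rows:

```sql
-- Hypothetical sample data: employees 1 and 3 from the example above.
DECLARE @Periods TABLE (PeriodID INT, EmployeeID INT, RankID INT);
INSERT INTO @Periods (PeriodID, EmployeeID, RankID) VALUES
(1,1,1),(2,1,1),(3,1,1),(4,1,8),(5,1,10),(6,1,10),
(1,3,1),(2,3,1),(3,3,7),(4,3,10),(5,3,10),(6,3,10);

WITH Snapshots AS
(
    SELECT EmployeeID,
           PeriodID,
           RankID,
           -- Rank held in the employee's previous snapshot (NULL for the first one).
           LAG(RankID) OVER (PARTITION BY EmployeeID ORDER BY PeriodID) AS PreviousRankID
    FROM @Periods
)
SELECT EmployeeID, PeriodID, PreviousRankID, RankID AS NewRank
FROM Snapshots
WHERE RankID <> PreviousRankID   -- use RankID > PreviousRankID to keep promotions only
ORDER BY EmployeeID, PeriodID;
```

For this data the query returns four rows: employee 1 changes in period 4 (1 to 8) and period 5 (8 to 10), and employee 3 changes in period 3 (1 to 7) and period 4 (7 to 10). Because the source is one snapshot per month, two promotions inside the same month would show up as a single change.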

SQL Server - select column using in having count()

This is my first question (and sorry for my English)
I have this table in SQL Server:
id_patient | date | id_drug
----------------------------------------------------
1 20200101 A
1 20200102 A
1 20200103 A
1 20200104 A
1 20200105 A
1 20200110 A
2 20200101 A
2 20200105 B
2 20200106 C
2 20200107 D
2 20200108 E
2 20200110 L
3 20200101 A
3 20200102 A
3 20200103 A
3 20200104 A
3 20200105 C
3 20200106 C
4 20200105 A
4 20200106 D
4 20200107 D
5 20200105 A
5 20200106 A
5 20200107 C
5 20200108 D
I would like to extract the patient and drug for every patient who has taken at least 3 different drugs in a given period.
I have tried:
select id_patient, count(distinct ID_drug)
from table
where date between XXX and YYY
group by id_patient
having count(Distinct ID_drug) >= 3
This way I do get all patients with 3 or more different id_drug values in the date range, but I can't get the ID_drug itself because it is inside the count().
For example, I'd like to obtain:
Can anyone help me?
Thanks
You can use string_agg(), which is available in SQL Server 2017 and later:
select id_patient, count(distinct ID_drug),
string_agg(id_drug, ',')
from table
where date between XXX and YYY
group by id_patient
having count(Distinct ID_drug) >= 3;
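One thing to watch with this approach is that string_agg() concatenates every row, so a drug taken on several days appears several times in the list. A possible variation, sketched here under the assumption of a hypothetical table variable `@t` and an arbitrary January 2020 date range, de-duplicates in a derived table first:

```sql
-- Hypothetical sample data: patients 3 and 5 from the question.
DECLARE @t TABLE (id_patient INT, [date] DATE, id_drug CHAR(1));
INSERT INTO @t (id_patient, [date], id_drug) VALUES
(3,'20200101','A'),(3,'20200102','A'),(3,'20200103','A'),
(3,'20200104','A'),(3,'20200105','C'),(3,'20200106','C'),
(5,'20200105','A'),(5,'20200106','A'),(5,'20200107','C'),(5,'20200108','D');

SELECT d.id_patient,
       COUNT(*) AS num_drugs,   -- rows are already distinct per patient and drug
       STRING_AGG(d.id_drug, ',') WITHIN GROUP (ORDER BY d.id_drug) AS drugs
FROM (
    -- One row per patient and drug inside the date range.
    SELECT DISTINCT id_patient, id_drug
    FROM @t
    WHERE [date] BETWEEN '20200101' AND '20200131'
) AS d
GROUP BY d.id_patient
HAVING COUNT(*) >= 3;
```

With this data only patient 5 qualifies, with num_drugs 3 and drugs 'A,C,D'. Patient 3 has taken only two different drugs.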
If you want the original rows, you can use window functions. Unfortunately, SQL Server does not support count(distinct) as a window function, but there is an easy work-around using dense_rank():
select t.*
from (select t.*,
(dense_rank() over (partition by id_patient order by id_drug) +
dense_rank() over (partition by id_patient order by id_drug desc) - 1
) as num_drugs
from t
where . . .
) t
where num_drugs >= 3;
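To see why the two dense_rank() values give the distinct count: for any row, the ascending rank counts the distinct drugs up to and including that row's drug, and the descending rank counts the distinct drugs from that drug onward. The row's own drug is counted in both, so the sum minus 1 is the number of distinct drugs for the patient. A small sketch, assuming a hypothetical table variable `@t`:

```sql
-- Hypothetical sample data: patient 5 from the question (drugs A, A, C, D).
DECLARE @t TABLE (id_patient INT, id_drug CHAR(1));
INSERT INTO @t (id_patient, id_drug) VALUES (5,'A'),(5,'A'),(5,'C'),(5,'D');

SELECT id_patient,
       id_drug,
       DENSE_RANK() OVER (PARTITION BY id_patient ORDER BY id_drug)      AS rank_asc,
       DENSE_RANK() OVER (PARTITION BY id_patient ORDER BY id_drug DESC) AS rank_desc,
       DENSE_RANK() OVER (PARTITION BY id_patient ORDER BY id_drug)
     + DENSE_RANK() OVER (PARTITION BY id_patient ORDER BY id_drug DESC)
     - 1                                                                 AS num_drugs
FROM @t
ORDER BY id_drug;
```

The 'A' rows get rank_asc 1 and rank_desc 3, the 'C' row gets 2 and 2, and the 'D' row gets 3 and 1, so every row reports num_drugs = 3. Note that this only works when id_drug is never NULL, since dense_rank() would count NULL as a value while count(distinct) would not.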
SELECT id_patient,
ID_drug
FROM table
WHERE id_patient IN (
SELECT id_patient
FROM table
WHERE date
BETWEEN XXX
AND YYY
GROUP BY id_patient
HAVING COUNT(DISTINCT ID_drug) >= 3
)
GROUP BY id_patient,
ID_drug;

How to get desired number of rows for each group / category in SQL Server

I have this query for retrieving rows from a SQL Server table:
SELECT
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100)) [research_area],
COUNT(*) [Paper_Count]
FROM
sub_aminer_paper
GROUP BY
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100))
HAVING
aid IN (SELECT
aid
FROM
sub_aminer_paper
GROUP BY
aid
HAVING
MIN(p_year) = 1990 AND MAX(p_year) = 2014 AND COUNT(pid) BETWEEN 10 AND 40
)
ORDER BY aid ASC, Paper_Count DESC
which returns this output:
aid research_area_category_id research_area Paper_Count
2937 33 markov chain 3
2937 33 markov decision process 1
2937 1 optimization problem 1
2937 27 real time application 1
2937 32 software product lines 1
11120 29 aspect oriented programming 4
11120 1 graph cut 2
11120 1 optimization problem 2
11120 32 uml class diagrams 1
11120 25 chinese word segmentation 1
11120 29 dynamic programming 1
11120 19 face recognition 1
11120 1 approximation algorithm 1
12403 2 differential equation 7
12403 1 data structure 2
12403 34 design analysis 1
12403 9 object detection 1
12403 27 operating system 1
12403 1 problem solving 1
12403 21 archiving system 1
12403 2 calculus 1
This returns every row for each aid, whereas I need only the first 3 rows for each aid ordered by Paper_Count DESC, i.e. the rows with Paper_Count 3, 1, 1 for aid 2937; 4, 2, 2 for 11120; and 7, 2, 1 for 12403.
Please help! Thanks.
One way is to apply row_number() over(partition by aid order by Paper_Count desc) as rn to your result set and then select all records with rn<=3:
with cte
as
(
SELECT
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100)) [research_area],
COUNT(*) [Paper_Count]
FROM
sub_aminer_paper
GROUP BY
aid,
research_area_category_id,
CAST(research_area as VARCHAR(100))
HAVING
aid IN (SELECT
aid
FROM
sub_aminer_paper
GROUP BY
aid
HAVING
MIN(p_year) = 1990 AND MAX(p_year) = 2014 AND COUNT(pid) BETWEEN 10 AND 40
)
)
,
cte1
AS
(
SELECT * ,
ROW_NUMBER() OVER (PARTITION BY aid ORDER BY Paper_Count DESC) AS rn
FROM cte
)
SELECT * FROM cte1 WHERE rn<=3
ORDER BY aid ASC, Paper_Count DESC -- ORDER BY is not allowed inside a CTE without TOP/OFFSET, so it goes on the final SELECT
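When several rows in a group share the same Paper_Count (as with the four rows that have a count of 1 for aid 2937), ROW_NUMBER() numbers the tied rows in no guaranteed order, so which of them land in the top 3 can vary between runs. A minimal sketch of adding a tie-breaker, assuming a hypothetical table variable `@counts` that holds already-aggregated counts and choosing research_area as an arbitrary second sort key:

```sql
-- Hypothetical pre-aggregated data taken from the sample output for aid 2937.
DECLARE @counts TABLE (aid INT, research_area VARCHAR(100), Paper_Count INT);
INSERT INTO @counts (aid, research_area, Paper_Count) VALUES
(2937, 'markov chain', 3),
(2937, 'markov decision process', 1),
(2937, 'optimization problem', 1),
(2937, 'real time application', 1),
(2937, 'software product lines', 1);

WITH ranked AS
(
    SELECT aid, research_area, Paper_Count,
           -- research_area breaks ties so the numbering is repeatable.
           ROW_NUMBER() OVER (PARTITION BY aid
                              ORDER BY Paper_Count DESC, research_area) AS rn
    FROM @counts
)
SELECT aid, research_area, Paper_Count
FROM ranked
WHERE rn <= 3
ORDER BY aid, rn;
```

This keeps 'markov chain', 'markov decision process' and 'optimization problem' every time. Any column (or combination) that is unique within the group would serve equally well as the tie-breaker.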

Max row count (top) customer per year

While building a stats page for my skydiving club, I ended up with two SQL queries that I would like to merge.
One shows the top three jumpers with the most jumps on the first load this year:
select top 3 ROW_NUMBER() OVER(ORDER BY count(1) desc) AS Nr
, vc.sCust as Name
, count(1) as JumpsCount
from dbo.ViewInvoice vi with(nolock)
join dbo.viewCust vc
on vi.wCustId = vc.wCustId
where year(vi.dtProcess) = year(getdate())
and vi.nMani = 1
group by vc.sCust
order by count(1) desc
OUTPUT:
-- Nr Name JumpsCount
-- 1 Tom Awesome 17
-- 2 Alan Jackson 16
-- 3 John Thebest 13
The next query shows all the years of jumping in the DB:
select distinct year(vi.dtProcess) As Datum
from dbo.ViewInvoice vi
order by 1
OUTPUT:
-- Datum
-- 2010
-- ...
-- 2014
What I would like is a query that merges the results, with an output like this:
-- Datum Nr Name Jumpscount
-- 2010 1 Some OldVeteran 100
-- 2010 2 Alan Jackson 96
-- 2010 3 Gordon McGann 89
-- ...
-- 2014 1 Tom Awesome 17
-- 2014 2 Alan Jackson 16
-- 2014 3 John Thebest 13
Changed to RANK so that jumpers with the same number of jumps get the same position.
WITH JumpsPerYear AS (
SELECT YEAR(dtProcess) AS [Year]
,wCustId
,COUNT(1) AS JumpsCount
FROM ViewInvoice
WHERE nMani = 1 -- first load/manifest number
GROUP BY YEAR(dtProcess)
,wCustId
)
,RankPerYear AS (
SELECT [Year]
,wCustId
,RANK() OVER (PARTITION BY [Year] ORDER BY JumpsCount DESC) AS [Rank]
,JumpsCount
FROM JumpsPerYear
)
SELECT RankPerYear.[Year] AS Datum
,RankPerYear.[Rank] AS Nr
,ViewCust.sCust AS Name
,RankPerYear.JumpsCount
FROM RankPerYear
INNER JOIN ViewCust
ON ViewCust.wCustId = RankPerYear.wCustId
WHERE RankPerYear.[Rank] <= 3
ORDER BY RankPerYear.[Year]
,RankPerYear.[Rank]

Applying grouped ranking using ROW_NUMBER

I'm looking for a way to assign row numbers to the table below:
Roll No Name Score
1 ABC 10
1 ABC 10
1 DEF 8
2 ASC 9
2 YHN 4
3 IOP 5
3 YHN 4
I'm looking for a way to number the rows within each Roll No using ROW_NUMBER():
Roll No Name Score Row_Number
1 ABC 10 1
1 ABC 10 2
1 DEF 8 3
2 ASC 9 1
2 YHN 4 2
3 IOP 5 1
3 YHN 4 2
I have been trying to do this with ROW_NUMBER(), but it isn't working. Any input on this would be great :)
Thanks!
SELECT [Roll No], Name, Score, [ROW_NUMBER] =
ROW_NUMBER() OVER (PARTITION BY [Roll No] ORDER BY Score DESC)
FROM dbo.table
ORDER BY [Roll No], [ROW_NUMBER];
If you later decide that you want to handle ties in a different way, play with using RANK() or DENSE_RANK() in place of ROW_NUMBER()...
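To make the difference between the three functions concrete, here is a small side-by-side sketch. It assumes a hypothetical table variable `@scores` loaded with the sample rows from the question:

```sql
-- Hypothetical table variable holding the question's sample rows.
DECLARE @scores TABLE ([Roll No] INT, Name VARCHAR(10), Score INT);
INSERT INTO @scores ([Roll No], Name, Score) VALUES
(1,'ABC',10),(1,'ABC',10),(1,'DEF',8),
(2,'ASC',9),(2,'YHN',4),
(3,'IOP',5),(3,'YHN',4);

SELECT [Roll No], Name, Score,
       ROW_NUMBER() OVER (PARTITION BY [Roll No] ORDER BY Score DESC) AS RowNum,    -- always unique
       RANK()       OVER (PARTITION BY [Roll No] ORDER BY Score DESC) AS Rnk,       -- ties share a rank, then a gap
       DENSE_RANK() OVER (PARTITION BY [Roll No] ORDER BY Score DESC) AS DenseRnk   -- ties share a rank, no gap
FROM @scores
ORDER BY [Roll No], RowNum;
```

For Roll No 1, where two rows tie on a score of 10, ROW_NUMBER() gives 1, 2, 3, RANK() gives 1, 1, 3 and DENSE_RANK() gives 1, 1, 2. For Roll No 2 and 3 there are no ties, so all three functions return 1, 2.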
