SQL Server: Sum of Calculated Row within N Months - sql-server

I am new to SQL Server and have a question regarding summing over a calculated row with a conditional statement.
My data is organized as follows:
ID S_DATE END_DATE MNum CHG DateCHG
---------------------------------------------
1 1/26/2001 2/26/2001 7 NULL 1
1 2/27/2001 3/27/2001 8 1 1
1 3/28/2001 1/9/2003 9 1 21
1 1/10/2003 3/2/2004 11 2 14
1 3/3/2004 10/14/2004 10 -1 7
1 10/15/2004 6/22/2005 9 -1 8
1 6/23/2005 3/9/2008 8 -1 33
1 3/10/2008 1899-12-30 0 NULL -1299
2 9/23/1993 9/11/2000 3 NULL 84
2 1/1/1999 12/31/1998 3 0 -1
2 9/12/2000 11/13/2001 2 -1 14
2 11/14/2001 1899-12-30 0 NULL -1223
DateCHG is the number of months between S_DATE and END_DATE. I would like to find the SUM of CHG for each ID where the CHG occurs within 3 months of the previous date.
Here is my current code (NOTE: the column names differ from the data above for formatting purposes. Also, I cannot write to this database, so the solution has to be a plain query.)
SELECT
*,
CASE
WHEN MratingNum = 0 OR
LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) = 0 OR
MAST_ISSU_NUM <> LAG(MAST_ISSU_NUM) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) --OR
--LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) < 12 OR --By Credit Rating
--LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) < 18
THEN NULL
ELSE CAST(MratingNum AS INT) - LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME)
END AS CHG,
DATEDIFF(month, RATG_DATETIME, RATG_END_DATETIME) AS DateCHG
FROM
MOODYS_DRD.dbo.DEBT_RATG AS t1
LEFT JOIN
sandbox.dbo.RatingMap AS t2 ON t1.RATG_TXT = t2.MratingValue
WHERE
RATG_TYP_CD = 'LT'
ORDER BY
MAST_ISSU_NUM, RATG_DATETIME
So for example the output would look something like this:
ID S_DATE .... SumCHG
1 1/26/2001.... NULL
1 2/27/2001.... NULL
1 3/28/2001.... 2
1 1/10/2003.... NULL
1 3/3/2004 .... NULL
I'm assuming the best approach is to calculate a rolling sum of DateCHG where it is less than 3 and then SUM the CHG column? Thanks all!
EDIT: This is fairly complex so let me try another way of asking the question. For each record I want to look back and find the SUM of CHG within 3 months of the S_DATE. For 3/28/2001, this would include 2/01 and 1/01. The MNum went from 7 to 9 so the SUM of CHG would be 2. However from 3/04, there were no changes in the past 3 months so return NULL. I obviously want to do this per ID so don't want to overlap 3 months from ID 2 to 1. Hope this makes more sense now?

t0 and t are used for setting up the data.
with t0
as ( select *
from ( values ( 1, '1/26/2001', '2/26/2001', 7, null, 1),
( 1, '2/27/2001', '3/27/2001', 8, 1, 1),
( 1, '3/28/2001', '1/9/2003', 9, 1, 21),
( 1, '1/10/2003', '3/2/2004', 11, 2, 14),
( 1, '3/3/2004', '10/14/2004', 10, -1, 7),
( 1, '10/15/2004', '6/22/2005', 9, -1, 8),
( 1, '6/23/2005', '3/9/2008', 8, -1, 33),
( 1, '3/10/2008', '1899-12-30', 0, null, -1299),
( 2, '9/23/1993', '9/11/2000', 3, null, 84),
( 2, '1/1/1999', '12/31/1998', 3, 0, -1),
( 2, '9/12/2000', '11/13/2001', 2, -1, 14),
( 2, '11/14/2001', '1899-12-30', 0, null, -1223) ) t ( ID, S_DATE, END_DATE, MNum, CHG, DateCHG )
),
t as ( select t0.ID ,
cast(t0.S_DATE as date) S_DATE ,
cast(t0.END_DATE as date) END_DATE ,
t0.MNum ,
t0.CHG ,
t0.DateCHG
from t0
)
select case when Cnt >= 3 then p.CHG
end SumCHG,
*
from t
outer apply ( select sum(u.CHG) CHG ,
count(*) Cnt
from t u
where u.ID = t.ID
and u.S_DATE between dateadd(month, -3,
t.S_DATE)
and t.S_DATE
) p
order by t.ID ,
t.S_DATE;
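To see why that query gates on Cnt, it can help to look at the per-row window before the CASE is applied. The following is a minimal sketch, assuming only the first four ID 1 rows from the sample data; the inline VALUES list and the aliases v and p are illustrative and not part of the original tables.

```sql
-- Illustrative subset of the question's ID 1 rows (dates as YYYYMMDD literals).
WITH t AS (
    SELECT ID, CAST(S_DATE AS date) AS S_DATE, CHG
    FROM (VALUES (1, '20010126', NULL),
                 (1, '20010227', 1),
                 (1, '20010328', 1),
                 (1, '20030110', 2)) v (ID, S_DATE, CHG)
)
SELECT t.ID, t.S_DATE, t.CHG, p.SumCHG, p.Cnt
FROM t
OUTER APPLY (
    -- All rows of the same ID whose S_DATE falls in the 3 months up to this row.
    SELECT SUM(u.CHG) AS SumCHG, COUNT(*) AS Cnt
    FROM t AS u
    WHERE u.ID = t.ID
      AND u.S_DATE BETWEEN DATEADD(month, -3, t.S_DATE) AND t.S_DATE
) AS p
ORDER BY t.ID, t.S_DATE;
-- S_DATE      SumCHG  Cnt
-- 2001-01-26  NULL    1
-- 2001-02-27  1       2
-- 2001-03-28  2       3
-- 2003-01-10  2       1
```

Only the 3/28/2001 row has three rows in its window, which is why the Cnt >= 3 test returns a sum there and NULL for the others. Whether a row count is the right gate for the real data is a judgment call; it fits the sample because the ratings there change roughly monthly.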
To apply this to your own tables, wrap your existing query in a CTE:
;with t as (
SELECT
*,
CASE
WHEN MratingNum = 0 OR
LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) = 0 OR
MAST_ISSU_NUM <> LAG(MAST_ISSU_NUM) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) --OR
--LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) < 12 OR --By Credit Rating
--LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME) < 18
THEN NULL
ELSE CAST(MratingNum AS INT) - LAG(MratingNum) OVER (ORDER BY MAST_ISSU_NUM, RATG_DATETIME)
END AS CHG,
DATEDIFF(month, RATG_DATETIME, RATG_END_DATETIME) AS DateCHG
FROM
MOODYS_DRD.dbo.DEBT_RATG AS t1
LEFT JOIN
sandbox.dbo.RatingMap AS t2 ON t1.RATG_TXT = t2.MratingValue
WHERE
RATG_TYP_CD = 'LT'
)
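select case when Cnt >= 3 then p.CHG
end SumCHG,
*
from t
outer apply ( select sum(u.CHG) CHG ,
count(*) Cnt
from t u
where u.MAST_ISSU_NUM = t.MAST_ISSU_NUM -- ID in the sample data
and u.RATG_DATETIME between dateadd(month, -3,
t.RATG_DATETIME) -- S_DATE in the sample data
and t.RATG_DATETIME
) p
order by t.MAST_ISSU_NUM ,
t.RATG_DATETIME;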

Related

Select the IDs using group by condition

I have a dataset where I need to find the patients whose disease status stays the same across consecutive rows.
Here is my sample dataset, followed by an explanation.
ID Normal Des1 Des2 Des3 Des4
12 0 1 0 0 0
12 1 0 1 0 0
12 1 0 1 0 0
12 1 0 1 0 0
14 0 1 0 1 0
18 1 0 0 0 0
18 1 0 0 0 0
18 1 0 0 0 0
11 0 1 0 0 0
11 0 1 0 0 0
11 0 1 0 0 0
22 1 0 0 0 0
The columns above flag the disease status of each patient per row. I need the IDs of patients who stay in the same disease category across all of their rows.
Suppose I need the patients who never fall under any disease criterion (IDs 18 and 22); I store those as a new set (Undiseased). Later I need the same for patients who only ever have Des1 (ID 11). I tried the code below to fetch the data, but it returns partial output.
select ID from tablename where
(normal = '1' and Des1 = '0' and Des2 = '0' and Des3 = '0' and Des4 = '0')
group by ID
You can try the query below, which uses the COUNT (Transact-SQL) function.
Create table MySampleTable (Id int, Des1 int, Des2 int, Des3 int)
insert into MySampleTable Values
(12, 0, 1, 0),
(12, 1, 0, 1),
(12, 1, 0, 1),
(18, 1, 0, 0),
(18, 1, 0, 0),
(11, 0, 1, 0),
(11, 0, 1, 0)
; with cte as (Select Id
, Count(distinct Des1) as TotDes1
, Count(distinct Des2) as TotDes2
, Count(distinct Des3) as TotDes3
from MySampleTable
group by Id
)
Select Id from cte where TotDes1 = 1
and TotDes2 = 1 and TotDes3 = 1
You can also use the having clause as shown in the query below.
Select Id
/*
, Count(distinct Des1) as TotDes1
, Count(distinct Des2) as TotDes2
, Count(distinct Des3) as TotDes3
*/
from MySampleTable
group by Id
having Count(distinct Des1) = 1 and Count(distinct Des2) = 1
and Count(distinct Des3) = 1
You can achieve it in this simple way
;WITH cte_TempTable AS(
Select DISTINCT Id, Des1, Des2, Des3
from MySampleTable
)
SELECT Id
FROM cte_TempTable
GROUP BY Id
HAVING COUNT(Id) = 1
You can use apply:
select t.id
from tablename t cross apply
( values (Des1, 'Des1'), (Des2, 'Des2'), (Des3, 'Des3'), (Des4, 'Des4')
) tt(DiseasFlag, DiseasName)
where DiseasFlag = 1
group by t.id
having count(distinct DiseasName) = 1;
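The COUNT(DISTINCT ...) answers find IDs whose flags never change, but they do not say which category an ID stays in. If the goal is specifically the two sets from the question (never diseased, and only ever Des1), a sketch along these lines may work. The temp table name #Patients is an assumption, the rows are copied from the question, and the flags are assumed to be stored as 0/1 integers.

```sql
-- Hypothetical temp table holding the question's sample rows.
CREATE TABLE #Patients (ID int, Normal int, Des1 int, Des2 int, Des3 int, Des4 int);
INSERT INTO #Patients VALUES
(12,0,1,0,0,0),(12,1,0,1,0,0),(12,1,0,1,0,0),(12,1,0,1,0,0),
(14,0,1,0,1,0),
(18,1,0,0,0,0),(18,1,0,0,0,0),(18,1,0,0,0,0),
(11,0,1,0,0,0),(11,0,1,0,0,0),(11,0,1,0,0,0),
(22,1,0,0,0,0);

-- Never diseased: Normal on every row and no disease flag on any row.
SELECT ID
FROM #Patients
GROUP BY ID
HAVING MIN(Normal) = 1
   AND MAX(Des1) = 0 AND MAX(Des2) = 0 AND MAX(Des3) = 0 AND MAX(Des4) = 0;
-- returns IDs 18 and 22

-- Only ever Des1: Des1 on every row and no other flag on any row.
SELECT ID
FROM #Patients
GROUP BY ID
HAVING MIN(Des1) = 1
   AND MAX(Normal) = 0 AND MAX(Des2) = 0 AND MAX(Des3) = 0 AND MAX(Des4) = 0;
-- returns ID 11
```

MIN(flag) = 1 reads as "set on every row" and MAX(flag) = 0 as "never set", so the conditions are evaluated per ID rather than per row. That per-row filtering is what made the original WHERE-based attempt return partial output.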

Query to identify contiguous ranges

I'm trying to write a query on the below data set to add a new column which has some sort of "period_id_group".
contiguous new_period row_nr new_period_starting_id
0 0 1 0
1 1 2 2
1 0 3 0
1 0 4 0
1 1 5 5
1 0 6 0
What I'm trying to get is:
contiguous new_period row_nr new_period_starting_id period_id_group
0 0 1 0 0
1 1 2 2 2
1 0 3 0 2
1 0 4 0 2
1 1 5 5 5
1 0 6 0 5
The logic is that for each 0 value in the new_period_starting_id, it has to get the >0 value from the row above.
So, for row_nr = 1 since there is no row before it, period_id_group is 0.
For row_nr = 2 since this is a new period (marked by new_period = 1), the period_id_group is 2 (the id of this row).
For row_nr = 3 since it's part of a contiguous range (because contiguous = 1), but is not the start of the range, because it's not a new_period (new_period = 0), its period_id_group should inherit the value from the previous row (which is the start of the contiguous range) - in this case period_id_group = 2 also.
I've tried multiple versions but couldn't get a good solution for SQL Server 2008R2, since I can't use LAG().
What I have, so far, is a shameful:
select *
from #temp2 t1
left join (select distinct new_period_starting_id from #temp2) t2
on t1.new_period_starting_id >= t2.new_period_starting_id
where 1 = case
when contiguous = 0
then 1
when contiguous = 1 and t2.new_period_starting_id > 0
then 1
else 1
end
order by t1.rn
Sample data script:
create table #tmp2 (contiguous int
, new_period int
, row_nr int
, new_period_starting_id int);
insert into #tmp2 values (0, 0, 1, 0)
, (1, 1, 2, 2)
, (1, 0, 3, 0)
, (1, 0, 4, 0)
, (1, 1, 5, 5)
, (1, 0, 6, 0);
Any help is appreciated.
So, if I'm understanding you correctly, you just need one additional column.
SELECT t1.contiguous, t1.new_period, t1.row_nr, t1.new_period_starting_id,
ISNULL((SELECT TOP 1 t2.new_period_starting_id
FROM YourTable t2
WHERE t2.row_nr <= t1.row_nr
AND t2.new_period_starting_id > 0 /* only rows that start a period */
ORDER BY t2.row_nr DESC /* nearest such row at or above the current one */), 0) AS period_id_group
FROM YourTable t1
Here is yet another option for this.
select t1.contiguous
, t1.new_period
, t1.row_nr
, t1.new_period_starting_id
, x.new_period_starting_id as period_id_group
from #tmp2 t1
outer apply
(
select top 1 *
from #tmp2 t2
where (t2.row_nr = 1
or t2.new_period_starting_id > 0)
and t1.row_nr >= t2.row_nr
order by t2.row_nr desc
) x
Found the solution:
select *
, case
when contiguous = 0
then f1
when contiguous = 1 and new_periods = 1
then f1
when contiguous = 1 and new_periods = 0
then v
else NULL
end [period_group]
from (
select *
, (select max(f1) from #temp2 where new_period_starting_id > 0 and rn < t1.rn) [v]
from #temp2 t1
) rs
order by rn
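If the starting ids only ever increase with row_nr (as in the sample, where each id equals the row number that starts the period), the lookup reduces to a running maximum. Here is a minimal sketch under that assumption, using a table variable @t as a hypothetical stand-in for the real table. It avoids LAG() and window frames, so it should also run on SQL Server 2008 R2.

```sql
-- Hypothetical table variable loaded with the question's sample rows.
DECLARE @t TABLE (contiguous int, new_period int, row_nr int PRIMARY KEY, new_period_starting_id int);
INSERT INTO @t VALUES (0,0,1,0),(1,1,2,2),(1,0,3,0),(1,0,4,0),(1,1,5,5),(1,0,6,0);

SELECT t1.contiguous, t1.new_period, t1.row_nr, t1.new_period_starting_id,
       -- Largest starting id seen at or before this row.
       (SELECT MAX(t2.new_period_starting_id)
        FROM @t AS t2
        WHERE t2.row_nr <= t1.row_nr) AS period_id_group
FROM @t AS t1
ORDER BY t1.row_nr;
-- period_id_group for rows 1..6: 0, 2, 2, 2, 5, 5
```

This sketch ignores the contiguous flag: a row with contiguous = 0 that appears after a period has started would inherit the earlier id, so it only fits data shaped like the sample.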

Moving Median, Mode in T-SQL

I am using SQL Server 2012 and I know it is quite simple to calculate moving averages.
But what I need is to get the mode and the median for a defined window frame like so (with a window of 2 preceding to current row; month unique):
MONTH | CODE | MEDIAN | MODE
1 0 0 0
2 3 1.5 0
3 2 2 0
4 2 2 2
5 2 2 2
6 5 2 2
7 3 3 2
If several values qualify as mode, then pick the first.
I commented my code thoroughly. Read my comments on the Mode calculation and let me know if it needs tweaking. Overall, it's a relatively simple query. It just has a lot of ugly subqueries and a lot of comments. Check it out:
CREATE TABLE #Table ([Month] INT,[Code] INT);
INSERT INTO #Table
VALUES (1,0),
(2,3),
(3,2),
(4,2), --Try commenting this out to test my special mode thingymajig
(5,2),
(6,5),
(7,3);
WITH CTE
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [Month]) row_num,
[Month],
CAST(Code AS FLOAT) Code
FROM #Table
)
SELECT [Month],
Code,
ISNULL((
SELECT CASE
--When there is only one previous value at row_num = 2, find Mean of first two codes
WHEN A.row_num = 2 THEN (LAG(B.code,1) OVER (ORDER BY [Code]) + B.Code)/2.0
--Else find middle code value of current and previous two rows
ELSE B.Code
END
FROM CTE B
--How subquery relates to outer query
WHERE B.row_num BETWEEN A.row_num - 2 AND A.row_num
ORDER BY B.[Code]
--Order by code and offset by 1 so don't select the lowest value, but fetch the one above the lowest value
OFFSET 1 ROW FETCH NEXT 1 ROW ONLY),
0) AS Median,
--I did mode a little different
--Instead of Avg(D.Code) you could list the values because with mode,
--If there's a tie with more than one of each number, you have multiple modes
--Instead of doing that, I simply return the mean of the tied modes
--When there's one, it doesn't change anything.
--If you were to delete the month 4, then your number of Codes 2 and number of Codes 3 would be the same in the last row.
--Proper mode would be 2,3. I instead average them out to be 2.5.
ISNULL((
SELECT AVG(D.Code)
FROM (
SELECT C.Code,
COUNT(*) cnt,
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) dnse_rank
FROM CTE C
WHERE C.row_num <= A.row_num
GROUP BY C.Code
HAVING COUNT(*) > 1) D
WHERE D.dnse_rank = 1),
0) AS Mode
FROM CTE A
Results:
Month Code Median Mode
----------- ---------------------- ---------------------- ----------------------
1 0 0 0
2 3 1.5 0
3 2 2 0
4 2 2 2
5 2 2 2
6 5 2 2
7 3 3 2
If I understood your requirements correctly, your source table contains MONTH and CODE columns, and you want to calculate MEDIAN and MODE.
The query below calculates MEDIAN and MODE over a moving window of at most 3 months ("2 preceding to current row") and returns results matching your example.
-----------------------------------------------------
--Demo data
-----------------------------------------------------
CREATE TABLE #Data(
[Month] INT NOT NULL,
[Code] INT NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED
(
[Month] ASC
));
INSERT #Data
([Month],[Code])
VALUES
(1,0),
(2,3),
(3,2),
(4,2),
(5,2),
(6,5),
(7,3);
-----------------------------------------------------
--Query
-----------------------------------------------------
DECLARE @PrecedingRowsLimit INT = 2;
WITH [MPos] AS
(
SELECT [R].[Month]
, [RB].[Month] AS [SubId]
, [RB].[Code]
, ROW_NUMBER() OVER(PARTITION BY [R].[Month] ORDER BY [RB].[Code]) AS [RowNumberInPartition]
, CASE
WHEN [R].[Count] % 2 = 1 THEN ([R].[Count] + 1) / 2
ELSE NULL
END AS [MedianPosition]
, CASE
WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2
ELSE NULL
END AS [MedianPosition1]
, CASE
WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2 + 1
ELSE NULL
END AS [MedianPosition2]
FROM
(
SELECT [RC].[Month]
, [RC].[RowNumber]
, CASE WHEN [RC].[Count] > @PrecedingRowsLimit + 1 THEN @PrecedingRowsLimit + 1 ELSE [RC].[Count] END AS [Count]
FROM
(
SELECT [Month]
, ROW_NUMBER() OVER(ORDER BY [Month]) AS [RowNumber]
, ROW_NUMBER() OVER(ORDER BY [Month]) AS [Count]
FROM #Data
) [RC]
) [R]
INNER JOIN #Data [RB]
ON [R].[Month] >= [RB].[Month]
AND [RB].[Month] >= [R].[RowNumber] - @PrecedingRowsLimit
)
SELECT DISTINCT [M].[Month]
, [ORIG].[Code]
, COALESCE([ME].[Code],([M1].[Code] + [M2].[Code]) / 2.0) AS [Median]
, [MOD].[Mode]
FROM [MPos] [M]
LEFT JOIN [MPOS] [ME]
ON [M].[Month] = [ME].[Month]
AND [M].[MedianPosition] = [ME].[RowNumberInPartition]
LEFT JOIN [MPOS] [M1]
ON [M].[Month] = [M1].[Month]
AND [M].[MedianPosition1] = [M1].[RowNumberInPartition]
LEFT JOIN [MPOS] [M2]
ON [M].[Month] = [M2].[Month]
AND [M].[MedianPosition2] = [M2].[RowNumberInPartition]
INNER JOIN
(
SELECT [MG].[Month]
, FIRST_VALUE([MG].[Code]) OVER (PARTITION BY [MG].[Month] ORDER BY [MG].[Count] DESC , [MG].[SubId] ASC) AS [Mode]
FROM
(
SELECT [Month] , MIN([SubId]) AS [SubId], [Code] , COUNT(1) AS [Count]
FROM [MPOS]
GROUP BY [Month] , [Code]
) [MG]
) [MOD]
ON [M].[Month] = [MOD].[Month]
INNER JOIN #Data [ORIG]
ON [ORIG].[Month] = [M].[Month]
ORDER BY [M].[Month];
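On SQL Server 2012 and later, the median part can arguably be written more compactly with PERCENTILE_CONT. The sketch below is a swapped-in technique, not the method used in the answers above, and it covers the median only, not the mode. It assumes the Month values are consecutive integers, so that the range Month - 2 to Month matches "2 preceding to current row", and it uses a hypothetical table variable @Data loaded with the question's rows.

```sql
-- Hypothetical table variable holding the question's rows.
DECLARE @Data TABLE ([Month] int PRIMARY KEY, Code int NOT NULL);
INSERT INTO @Data VALUES (1,0),(2,3),(3,2),(4,2),(5,2),(6,5),(7,3);

-- Self-join pairs every month (a) with the rows in its window (b);
-- PERCENTILE_CONT then interpolates the median of each window.
SELECT DISTINCT
       a.[Month],
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY b.Code)
           OVER (PARTITION BY a.[Month]) AS Median
FROM @Data AS a
JOIN @Data AS b
  ON b.[Month] BETWEEN a.[Month] - 2 AND a.[Month]
ORDER BY a.[Month];
-- Median for months 1..7: 0, 1.5, 2, 2, 2, 2, 3
```

PERCENTILE_CONT in T-SQL accepts only PARTITION BY in its OVER clause, not a ROWS frame, which is why the window is built with a self-join and the repeated per-row results are collapsed with DISTINCT.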

how to display different data sequentially date of status 1 first in sql server

how to display different data sequentially date of status 1 first
result like this image
First creating the test data:
CREATE TABLE #mydate
(
date_time DATETIME,
statusid int
)
INSERT INTO #mydate
( date_time, statusid )
VALUES
('02/25/2015 12:09:00', 0),
('02/25/2015 12:10:00', 0),
('02/25/2015 12:11:00', 0),
('02/25/2015 12:12:00', 1),
('02/25/2015 12:13:00', 1),
('02/25/2015 12:14:00', 0),
('02/25/2015 12:15:00', 0),
('02/25/2015 12:16:00', 1),
('02/25/2015 12:17:00', 1),
('02/25/2015 12:18:00', 1),
('02/25/2015 12:19:00', 1),
('02/25/2015 12:20:00', 0),
('02/25/2015 12:21:00', 0);
Let's find out what the status is before each record:
; WITH StatusRecs AS
(
SELECT
m.date_time,
m.statusid,
LAG(m.statusid) OVER (ORDER BY date_time) AS prev_status
FROM #mydate m
)
Next, pull all the status 1 records that differ from the previous status to find the beginning of each set. We also pull the date of the next status 1 start into next_start_date:
,StartStatus AS
(
SELECT
s.date_time,
s.statusid,
LEAD(s.date_time) OVER (ORDER BY s.date_time) AS next_start_date
FROM StatusRecs s
WHERE s.statusid != ISNULL(s.prev_status, -1)
AND s.statusid = 1
)
Now let's pull it all together to get the last status 0 record before the next status 1 record:
,MyRecs AS
(
SELECT * FROM StartStatus ss
OUTER APPLY
(
SELECT TOP 1 sr.date_time AS date_time2, sr.statusid AS statusid2
FROM StatusRecs sr
WHERE sr.date_time > ss.date_time
AND (sr.date_time < ss.next_start_date OR ss.next_start_date IS NULL)
ORDER BY sr.date_time DESC
) m
)
Now we format and output the table
SELECT m.date_time, m.statusid, m.date_time2, m.statusid2, DATEDIFF(MINUTE, m.date_time, m.date_time2) AS duration FROM MyRecs m
date_time statusid date_time2 statusid2 duration
2015-02-25 12:12:00.000 1 2015-02-25 12:15:00.000 0 3
2015-02-25 12:16:00.000 1 2015-02-25 12:21:00.000 0 5
This is an ugly example because of all the table reads on the CTE, but it might help in some way. I didn't have a SQL Fiddle to test it on; try setting one up for your questions, it always helps.
with myTableNumbered (myOrder,date_time,status) AS (
select
ROW_NUMBER() OVER (ORDER BY date_time) myOrder,
date_time,
status
from myTable
)
select
m.date_time,
m.status,
next.date_time datetime_2,
next.status status2,
CONVERT(VARCHAR(10), DATEDIFF(minute, m.date_time, next.date_time)) + ' minutes' AS duration
from myTableNumbered m
OUTER APPLY (
select TOP 1
next.date_time,
next.status
from myTableNumbered next
where
next.myOrder > m.myOrder and
next.status = 0 and
ISNULL((select status from myTableNumbered prev where prev.myOrder-1 = next.myOrder),1) = 1
order by next.date_time asc
) next
where
m.status = 1 and
ISNULL((select status from myTableNumbered prev where prev.myOrder+1 = m.myOrder),0) = 0

regarding updating rows in table without using loop

I have a table with a column called quantity, and 10 rows that all have the same value in that column, 200 (it could be any value).
The requirement: given an input value x = 500 (or any number), compare it against the quantity column as follows:
if the first row's quantity is 200, subtract it from 500 so that x becomes 300, set that row's quantity to 0, and then move on to the next row until x is 0.
Could you please help me write a SQL query for this?
Loops must not be used.
Thanks,
What is the version of SQL Server? If it's 2012 or 2014, you can use the following:
DECLARE @x int = 500
;WITH cte_sum AS
(
SELECT quantity,
ISNULL(SUM(quantity) OVER (ORDER BY (SELECT NULL) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) sum_running_before,
SUM(quantity) OVER (ORDER BY (SELECT NULL) ROWS UNBOUNDED PRECEDING) sum_running_total
FROM YourTable
)
UPDATE cte_sum
SET quantity = CASE
WHEN quantity >= @x - sum_running_before THEN
quantity - (@x - sum_running_before)
ELSE 0
END
WHERE (@x >= sum_running_total OR (@x < sum_running_total AND sum_running_before < @x))
It's a bit more tricky to get running totals in earlier versions but I think you got the main idea.
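The query above orders by (SELECT NULL), so which rows get drained first is not defined. If the table has a key to order by, the same idea can be made deterministic. The following is a minimal sketch for SQL Server 2012+, assuming a hypothetical table variable @Stock with a hypothetical key column RowId; neither name comes from the question.

```sql
DECLARE @x int = 500;

-- Hypothetical stand-in for the real table; RowId fixes the processing order.
DECLARE @Stock TABLE (RowId int PRIMARY KEY, Quantity int NOT NULL);
INSERT INTO @Stock VALUES (1, 200), (2, 200), (3, 200), (4, 200);

WITH running AS (
    -- Quantity held by all earlier rows, i.e. what is consumed before this row.
    SELECT RowId,
           SUM(Quantity) OVER (ORDER BY RowId ROWS UNBOUNDED PRECEDING) - Quantity AS TakenBefore
    FROM @Stock
)
UPDATE s
SET Quantity = CASE WHEN @x - r.TakenBefore >= s.Quantity THEN 0   -- row fully consumed
                    ELSE s.Quantity - (@x - r.TakenBefore)         -- row partly consumed
               END
FROM @Stock AS s
JOIN running AS r ON r.RowId = s.RowId
WHERE r.TakenBefore < @x;                                          -- skip rows x never reaches

SELECT RowId, Quantity FROM @Stock ORDER BY RowId;
-- Quantity for rows 1..4: 0, 0, 100, 200
```

The first two rows absorb 400 of the 500, the third row gives up the remaining 100, and the fourth row is left untouched.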
CREATE TABLE #YourTable
(
CustId INT,
Quantity INT
)
INSERT INTO #YourTable
( CustId, Quantity )
VALUES
( 1, 10 ),
( 1, 10 ),
( 1, 10 ),
( 1, 10 ),
( 2, 20 ),
( 2, 20 ),
( 2, 20 ),
( 2, 20 );
;WITH cte_sum AS
(
SELECT
y.CustId,
y.Quantity,
ROW_NUMBER() OVER (PARTITION BY CustId ORDER BY Quantity) RN
FROM #YourTable y
)
SELECT s1.CustId, s1.Quantity, s2.Qty, s1.Quantity + ISNULL(s2.Qty, 0) RunningTotal, s1.RN
FROM cte_sum s1
OUTER APPLY
(
SELECT SUM(Quantity) Qty FROM cte_sum s2
WHERE s2.CustId = s1.CustId
AND s2.RN < s1.RN
) s2
ORDER BY s1.CustId, s1.RN
The query above is an example of a running total that works on SQL Server 2005+ (the multi-row VALUES in the setup needs 2008+).
This is the output:
CustId Quantity Qty RunningTotal RN
1 10 NULL 10 1
1 10 10 20 2
1 10 20 30 3
1 10 30 40 4
2 20 NULL 20 1
2 20 20 40 2
2 20 40 60 3
2 20 60 80 4
