How to calculate weighted average balance in sql between two months - sql-server

How can I calculate WtdAvgBal in SQL for two months that are not quarter-end months (March, June, September, December) and append the values to the dataset?
Dataset:
Date       ID  AvgBal  Period
7/31/2020  1   50      M
7/31/2020  2   75      M
7/31/2020  3   50      M
7/31/2020  4   50      M
7/31/2020  5   50      M
8/31/2020  1   55      M
8/31/2020  2   99      M
8/31/2020  3   80      M
8/31/2020  4   70      M
8/31/2020  5   90      M
Total days in the two periods: 62 days (31 in July, 31 in August)
WtdAvgBal calculation steps, as done in Excel:
Create a new column holding Bal x Day (the balance times the number of days in that row's month) for each row, e.g.:
Date       ID  Bal  Period  Bal x Day
7/31/2020  1   50   M       1550
Add up the total days in the two months observed (e.g. 62 days for July and August).
Total Days: 62
Calculate WtdAvg with Excel's SUMIFS formula over the Bal x Day column, per ID, and assign the result to the most recent month in the dataset:
=SUMIFS(<Bal x Day column>, <ID column>, <ID of the current row>) / 62
AvgBal output:
Date       ID  AvgBal  Period
8/31/2020  1   52.5    X
8/31/2020  2   87      X
8/31/2020  3   65      X
8/31/2020  4   60      X
8/31/2020  5   70      X
Append the new period X to the dataset.
Final output:
Date       ID  AvgBal  Period
7/31/2020  1   50      M
7/31/2020  2   75      M
7/31/2020  3   50      M
7/31/2020  4   50      M
7/31/2020  5   50      M
8/31/2020  1   55      M
8/31/2020  2   99      M
8/31/2020  3   80      M
8/31/2020  4   70      M
8/31/2020  5   90      M
8/31/2020  1   52.5    X
8/31/2020  2   87      X
8/31/2020  3   65      X
8/31/2020  4   60      X
8/31/2020  5   70      X

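The following GROUP BY query gives you the required values for 'X':
-- to append the rows, prefix the select with: insert into Dataset (Date, ID, AvgBal, Period)
select Date = max(Date),
       ID,
       -- every Date is a month-end, so datepart(day, Date) is the number of days in that month
       AvgBal = sum(AvgBal * datepart(day, Date) * 1.0) / sum(datepart(day, Date)),
       Period = 'X'
from Dataset
where Period = 'M'  -- skip 'X' rows already appended; add a Date filter if the table holds more months
group by ID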
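For checking the numbers outside the database, here is a minimal Python sketch of the same day-weighted average (an illustration, not the answer's code). It assumes, as the query does, that every Date is a month-end, so the day of the month equals the number of days in that month; `weighted_avg_rows` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Monthly rows from the question: (month-end date, ID, AvgBal, Period)
rows = [
    (date(2020, 7, 31), 1, 50, "M"), (date(2020, 7, 31), 2, 75, "M"),
    (date(2020, 7, 31), 3, 50, "M"), (date(2020, 7, 31), 4, 50, "M"),
    (date(2020, 7, 31), 5, 50, "M"), (date(2020, 8, 31), 1, 55, "M"),
    (date(2020, 8, 31), 2, 99, "M"), (date(2020, 8, 31), 3, 80, "M"),
    (date(2020, 8, 31), 4, 70, "M"), (date(2020, 8, 31), 5, 90, "M"),
]


def weighted_avg_rows(rows):
    """Return one Period 'X' row per ID, dated at the latest month-end."""
    weighted = defaultdict(float)  # ID -> sum of AvgBal * days
    days = defaultdict(int)        # ID -> total days
    latest = max(r[0] for r in rows if r[3] == "M")
    for d, id_, bal, period in rows:
        if period != "M":
            continue
        # d is a month-end date, so its day number equals the days in that month
        weighted[id_] += bal * d.day
        days[id_] += d.day
    return [(latest, id_, weighted[id_] / days[id_], "X") for id_ in sorted(weighted)]


new_rows = weighted_avg_rows(rows)
rows.extend(new_rows)  # append period X to the dataset
for _, id_, avg, _ in new_rows:
    print(id_, avg)
```

Filtering on Period 'M' mirrors the WHERE clause above, so running the helper again after appending does not fold the X rows back into the average.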

Related

Update previous rows based on another row

I need to update the empty rows based on the rows that have values.
In this case, I need to fill in the hours, minutes and seconds based on every 4th row.
For example, rownum 4 has 8 hours, 1 minute, 9 seconds.
So the previous rows should count up to it one second per row, starting from 8 h, 1 min, 6 s at rownum 1; then the same procedure continues from rownum 5.
Rownum 8 has 8 hours, 1 minute, 13 seconds, so the three rows before it should start from 8 h, 1 min, 10 s at rownum 5.
How can I do this with a loop, with PARTITION BY, or any other approach in SQL Server?
You can do this with window functions, converting your hours, minutes and seconds to a time value. Converting to a time is important to make sure you wrap around the appropriate time boundaries and don't end up with 61 seconds in a minute, etc.
Depending on the data and your real-world environment, you will probably need to add the Flight column and maybe some other columns to the PARTITION BY clauses to ensure you are working with correctly scoped windows of data.
Query
declare @t table(rn int, timeframe int, h int, m int, s int);
insert into @t values
 (1,1,null,null,null)
,(2,1,null,null,null)
,(3,1,null,null,null)
,(4,1,23,59,45)
,(5,2,null,null,null)
,(6,2,null,null,null)
,(7,2,null,null,null)
,(8,2,23,59,49)
,(9,3,null,null,null)
,(10,3,null,null,null)
,(11,3,null,null,null)
,(12,3,23,59,53)
,(13,4,null,null,null)
,(14,4,null,null,null)
,(15,4,null,null,null)
,(16,4,23,59,57)
,(17,5,null,null,null)
,(18,5,null,null,null)
,(19,5,null,null,null)
,(20,5,0,0,1)
,(21,6,null,null,null)
,(22,6,null,null,null)
,(23,6,null,null,null)
,(24,6,0,0,5)
;
with d as
(
    select rn
          ,timeframe
          -- shift the timeframe's known time by this row's distance (in rows) from the timeframe's last row
          ,dateadd(second
                  ,rn - max(rn) over (partition by timeframe)
                  ,max(timefromparts(h,m,s,0,0)) over (partition by timeframe)
                  ) as t
    from @t
)
select rn
      ,timeframe
      ,datepart(hour,t) as h
      ,datepart(minute,t) as m
      ,datepart(second,t) as s
from d
order by rn;
Output
rn  timeframe  h   m   s
1   1          23  59  42
2   1          23  59  43
3   1          23  59  44
4   1          23  59  45
5   2          23  59  46
6   2          23  59  47
7   2          23  59  48
8   2          23  59  49
9   3          23  59  50
10  3          23  59  51
11  3          23  59  52
12  3          23  59  53
13  4          23  59  54
14  4          23  59  55
15  4          23  59  56
16  4          23  59  57
17  5          23  59  58
18  5          23  59  59
19  5          0   0   0
20  5          0   0   1
21  6          0   0   2
22  6          0   0   3
23  6          0   0   4
24  6          0   0   5
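The answer's idea (shift the one known time in each timeframe back by one second per row, wrapping around midnight) can be sketched in Python to check the table above. This is an illustration, not SQL Server code; `fill_times` is a hypothetical helper, and it offsets from the rn of the row that carries the time, which coincides with `max(rn)` in the query only when that row is the last one in its timeframe.

```python
# Known times sit on the 4th row of each timeframe (data from the answer's example)
KNOWN = {4: (23, 59, 45), 8: (23, 59, 49), 12: (23, 59, 53),
         16: (23, 59, 57), 20: (0, 0, 1), 24: (0, 0, 5)}
rows = [(rn, (rn - 1) // 4 + 1, *KNOWN.get(rn, (None, None, None))) for rn in range(1, 25)]

SECONDS_PER_DAY = 24 * 60 * 60


def fill_times(rows):
    """Fill missing h/m/s by offsetting each timeframe's known time by the row distance."""
    anchors = {}  # timeframe -> (rn of the row with a time, seconds since midnight)
    for rn, tf, h, m, s in rows:
        if h is not None:
            anchors[tf] = (rn, h * 3600 + m * 60 + s)
    filled = []
    for rn, tf, _, _, _ in rows:
        known_rn, known_secs = anchors[tf]
        # one second per row; the modulo wraps around midnight like the SQL time type
        t = (known_secs + rn - known_rn) % SECONDS_PER_DAY
        filled.append((rn, tf, t // 3600, t % 3600 // 60, t % 60))
    return filled


for row in fill_times(rows):
    print(*row)
```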

First-In-First-Out Stock trading - calculate cumulative P/L

I want a SQL Server query to calculate the cumulative P/L of stock trades (FIFO-based calculation).
Input table:
EXECTIME          share_name  Quantity  Price  Buy/Sell
2013-01-01 12:25  abc         100       100    B
2013-01-01 12:26  abc         10        102    S
2013-01-01 12:27  abc         10        102    S
2013-01-01 12:28  abc         10        95     S
2013-01-01 12:29  abc         10        99     S
2013-01-01 12:30  abc         10        105    S
2013-01-01 12:31  abc         100       102    B
2013-01-01 12:32  abc         150       101    S
Output:
EXECTIME          Cumulative P/L  Winning Streak  Losing Streak
2013-01-01 12:26  20              1               0
2013-01-01 12:27  40              1               0
2013-01-01 12:28  -10             0               1
2013-01-01 12:29  -20             0               2
2013-01-01 12:30  30              1               0
2013-01-01 12:32  -20             0               1
Explanation:
1st output row: 10 shares sold at 102 that were purchased at 100, so profit = (102 - 100) * 10 = 20.
6th output row: 150 shares sold at 101:
50 were purchased at 100 (1st input row; 50 of those 100 were already sold above, so 50 are left)
100 were purchased at 102 (7th input row)
150 * 101 - [(50 * 100) + (100 * 102)] = -50
cumulative P/L = 30 + (-50) = -20
Winning streak: 1 for a profitable sale.
Losing streak: 1, 2, ... for consecutive losses; resets after a profit.
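No answer is included here; as a starting point, the FIFO matching described in the explanation can be sketched in Python: each sell consumes the oldest open buy lots first. Assumptions: trades are already ordered by EXECTIME, there is no short selling, Winning Streak is a 1/0 flag for a profitable sell, and a sell with zero P/L resets the losing streak (the question does not say). `fifo_pnl` is a hypothetical name.

```python
from collections import deque

TRADES = [
    ("2013-01-01 12:25", "abc", 100, 100, "B"),
    ("2013-01-01 12:26", "abc", 10, 102, "S"),
    ("2013-01-01 12:27", "abc", 10, 102, "S"),
    ("2013-01-01 12:28", "abc", 10, 95, "S"),
    ("2013-01-01 12:29", "abc", 10, 99, "S"),
    ("2013-01-01 12:30", "abc", 10, 105, "S"),
    ("2013-01-01 12:31", "abc", 100, 102, "B"),
    ("2013-01-01 12:32", "abc", 150, 101, "S"),
]


def fifo_pnl(trades):
    """Return (exectime, cumulative P/L, winning flag, losing streak) for every sell."""
    lots = {}  # share_name -> deque of [remaining quantity, buy price], oldest first
    cumulative = 0
    losing = 0
    out = []
    for exectime, name, qty, price, side in trades:
        book = lots.setdefault(name, deque())
        if side == "B":
            book.append([qty, price])
            continue
        pnl = 0
        remaining = qty
        while remaining > 0:
            lot = book[0]  # raises IndexError on a sell larger than the open position
            take = min(remaining, lot[0])
            pnl += take * (price - lot[1])
            lot[0] -= take
            remaining -= take
            if lot[0] == 0:
                book.popleft()  # lot fully consumed, move on to the next oldest
        cumulative += pnl
        winning = 1 if pnl > 0 else 0
        losing = losing + 1 if pnl < 0 else 0  # assumption: a zero-P/L sell also resets
        out.append((exectime, cumulative, winning, losing))
    return out


for row in fifo_pnl(TRADES):
    print(*row)
```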

Creating calculated variables in one datastep

Any help is much appreciated. Thanks
I would like to create a couple of variables from my transactional data.
I am trying to create the variables 'act_bal' and 'penalty' using amount, type and Op_Bal. The rules I have are:
- For the first record, the id will have op_bal; the amount is subtracted from it for type=C and added to it for type=D to calculate act_bal.
- From the second record onwards, act_bal is act_bal + amount for type=C and act_bal - amount for type=D.
- A penalty of 10 is added only if the amount is > 4 and type=D.
- Each id can have only two penalties.
- The total penalty should be subtracted from the act_bal of the last record, which then becomes the op_bal for the next day (e.g. for id 101, -178 - 20 = -198 becomes the op_bal for 4/2/2019).
This is the data I have for two customer IDs, 101 and 102, on two different dates (my actual dataset has data for all 30 days).
id   date      amount  type  Op_Bal
101  4/1/2019  50      C     100
101  4/1/2019  25      D
101  4/1/2019  75      D
101  4/1/2019  3       D
101  4/1/2019  75      D
101  4/1/2019  75      D
101  4/2/2019  100     C
101  4/2/2019  125     D
101  4/2/2019  150     D
102  4/1/2019  50      C     125
102  4/1/2019  125     C
102  4/2/2019  250     D
102  4/2/2019  10      D
The code I wrote is like this
data want;
set have;
by id date;
if first.id or first.date then do;
if first.id then do;
if type='C' then act_bal=Op_Bal - amount;
if type='D' then act_bal=Op_Bal + amount;
end;
else do;
retain act_bal;
if type='C' then act_bal=act_bal + amount;
if type='D' then act_bal=act_bal - amount;
if amount>4 and type='D' then do;
penalty=10;
end;
run;
I couldn't create a counter to limit the penalties to 2, and I could not subtract the total penalty amount from the act_bal of the last row. Could someone help me get the desired result? Thanks
Desired output:
id   date      amount  type  Op_Bal  act_bal  penalty
101  4/1/2019  50      C     200     150      0
101  4/1/2019  25      D             125      0
101  4/1/2019  150     D             -25      10
101  4/1/2019  75      D             -100     10
101  4/1/2019  3       D             -103     0
101  4/1/2019  75      D             -178     0
101  4/2/2019  100     C     -198    -98      0
101  4/2/2019  125     D             -223     10
101  4/2/2019  150     D             -373     10
102  4/1/2019  50      C     125     175      0
102  4/1/2019  125     C             300      0
102  4/2/2019  250     D             50       0
102  4/2/2019  10      D             40       0
A few tips:
You have the same code for incrementing act_bal in both the if and else blocks, so factor it out. Don't repeat yourself.
You can skip the retain statement if you use a sum statement.
Use a separate variable to keep track of the number of penalties triggered per day, but only apply the first two of them.
So, putting it all together:
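data want;
    set have;
    by id date;
    /* new day for the same id: yesterday's closing balance becomes today's opening balance */
    if first.date and not first.id then op_bal = act_bal;
    if first.date then do;
        act_bal = op_bal;
        penalties = 0;
    end;
    /* sum statements: act_bal and penalties are retained automatically */
    if type='C' then act_bal + amount;
    if type='D' then act_bal + (-amount);
    if amount > 4 and type='D' then penalties + 1;
    /* charge at most two penalties of 10 on the day's last record */
    if last.date then act_bal + (-min(penalties,2) * 10);
run;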
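To check the answer's logic row by row, here is a minimal Python re-implementation of that data step (an illustration only; `run_balances` is a hypothetical name). It follows the answer's conventions: type C adds, type D subtracts, and at most two penalties of 10 are charged per id and day on that day's last record. Because of that, it will not necessarily reproduce every number in the question's desired-output table.

```python
from itertools import groupby

# Input rows from the question: (id, date, amount, type, Op_Bal or None)
HAVE = [
    (101, "4/1/2019", 50, "C", 100),
    (101, "4/1/2019", 25, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/1/2019", 3, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/2/2019", 100, "C", None),
    (101, "4/2/2019", 125, "D", None),
    (101, "4/2/2019", 150, "D", None),
    (102, "4/1/2019", 50, "C", 125),
    (102, "4/1/2019", 125, "C", None),
    (102, "4/2/2019", 250, "D", None),
    (102, "4/2/2019", 10, "D", None),
]


def run_balances(rows):
    """Mirror the data step; rows must already be sorted by id and date (like BY id date)."""
    out = []
    act_bal = None
    prev_id = None
    for (id_, day), group in groupby(rows, key=lambda r: (r[0], r[1])):
        group = list(group)
        if id_ != prev_id:
            act_bal = group[0][4]  # first.id: start from the Op_Bal in the data
        # otherwise act_bal still holds the previous day's closing balance
        prev_id = id_
        penalties = 0
        for i, (_, _, amount, typ, _) in enumerate(group):
            if typ == "C":
                act_bal += amount
            elif typ == "D":
                act_bal -= amount
                if amount > 4:
                    penalties += 1
            if i == len(group) - 1:  # last.date: charge at most two penalties of 10
                act_bal -= min(penalties, 2) * 10
            out.append((id_, day, amount, typ, act_bal))
    return out


for row in run_balances(HAVE):
    print(*row)
```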

Calculating a column in SQL using the column's own output as input

I have a problem that I find very hard to solve:
I need to calculate a column R_t in SQL where, for each row, the sum of the "previous" calculated values, SUM(R_{t-1}), is required as input. The calculation is grouped by a ProjectID column. I have no clue how to proceed.
The formula I am trying to implement is
$$R_t = \frac{\text{ContractValue}_t - \sum_{k=0}^{t-1} R_k}{\text{RemainingHours}_t} \times \text{HoursRegistered}_t$$
where t denotes time and $\sum_{k=0}^{t-1} R_k$ is the sum of R from 0 to t-1 (zero for t = 0).
Time is always consecutive and always begins at t = 0, but the number of time periods may differ across [ProjectID], e.g. one project has t = {0,1,2} and another t = {0,1,2,3,4,5}. The time period never "jumps", e.g. from 5 to 7.
The expected output (using the data below) for ProjectID 101 is:
R_0 = (500,000 - 0) / 500 * 65 = 65,000
R_1 = (500,000 - (65,000)) / 435 * 100 = 100,000
R_2 = (500,000 - (65,000 + 100,000)) / 335 * 85 = 85,000
R_3 = (500,000 - (65,000 + 100,000 + 85,000)) / 250 * 69 = 69,000
etc...
This calculation is done for each ProjectID.
My question is how to formulate this as a SQL query. My first thought was a recursive CTE, but I am not sure it is the right way to proceed: from my understanding, recursive CTEs are meant for hierarchical structures, which this isn't really.
My other thought was to calculate SUM(R_{t-1}) with window functions, i.e. SUM() OVER (PARTITION BY ... ORDER BY ...) with a LAG, but the recursion really gives me trouble and I keep running my head against the wall.
Below is a query that creates the input data:
CREATE TABLE [dbo].[InputForRecursiveCalculation]
(
[Time] int NULL,
ProjectID [int],
ContractValue float,
ContractHours float,
HoursRegistered float,
RemainingHours float
)
GO
INSERT INTO [dbo].[InputForRecursiveCalculation]
(
[Time]
,[ProjectID]
,[ContractValue]
,[ContractHours]
,[HoursRegistered]
,[RemainingHours]
)
VALUES
(0,101,500000,500,65,500),
(1,101,500000,500,100,435),
(2,101,500000,500,85,335),
(3,101,500000,500,69,250),
(4,101,450000,650,100,331),
(5,101,450000,650,80,231),
(6,101,450000,650,90,151),
(7,101,450000,650,45,61),
(8,101,450000,650,16,16),
(0,110,120000,90,10,90),
(1,110,120000,90,10,80),
(2,110,130000,90,10,70),
(3,110,130000,90,10,60),
(4,110,130000,90,10,50),
(5,110,130000,90,10,40),
(6,110,130000,90,10,30),
(7,110,130000,90,10,20),
(8,110,130000,90,10,10)
GO
For those of you who dare to download something from a complete stranger, I have created an Excel file demonstrating the calculation (please download the file, as you will not be able to see the actual formulas in the HTML preview shown when first clicking the link):
https://www.dropbox.com/s/3rxz72lbvooyc4y/Calculation%20example.xlsx?dl=0
Best regards,
Victor
I think this will be useful for you. There is an additional column, SumR, that holds the running sum of R for the ProjectID up to and including the current row, so prev.SumR is the SUM(R_{t-1}) you need.
;with recu as
(
select
Time,
ProjectId,
ContractValue,
ContractHours,
HoursRegistered,
RemainingHours,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as R,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as SumR
from
InputForRecursiveCalculation
where
Time=0
union all
select
input.Time,
input.ProjectId,
input.ContractValue,
input.ContractHours,
input.HoursRegistered,
input.RemainingHours,
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours as numeric(15,0)),
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours + prev.SumR as numeric(15,0))
from
recu prev
inner join
InputForRecursiveCalculation input
on input.ProjectId = prev.ProjectId
and input.Time = prev.Time + 1
)
select
*
from
recu
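order by
ProjectID,
Time
-- recursion goes one level per period and SQL Server's default limit is 100 levels,
-- so append OPTION (MAXRECURSION 0) here if a project can have more periods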
RESULTS:
Time ProjectId ContractValue ContractHours HoursRegistered RemainingHours R SumR
----------- ----------- ---------------------- ---------------------- ---------------------- ---------------------- --------------------------------------- ---------------------------------------
0 101 500000 500 65 500 65000 65000
1 101 500000 500 100 435 100000 165000
2 101 500000 500 85 335 85000 250000
3 101 500000 500 69 250 69000 319000
4 101 450000 650 100 331 39577 358577
5 101 450000 650 80 231 31662 390239
6 101 450000 650 90 151 35619 425858
7 101 450000 650 45 61 17810 443668
8 101 450000 650 16 16 6332 450000
0 110 120000 90 10 90 13333 13333
1 110 120000 90 10 80 13333 26666
2 110 130000 90 10 70 14762 41428
3 110 130000 90 10 60 14762 56190
4 110 130000 90 10 50 14762 70952
5 110 130000 90 10 40 14762 85714
6 110 130000 90 10 30 14762 100476
7 110 130000 90 10 20 14762 115238
8 110 130000 90 10 10 14762 130000
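Because R_t needs the running sum of all earlier R values, the calculation is essentially a loop per ProjectID, which is why the recursive CTE works. A minimal Python sketch of the same loop, useful for checking the results (`recursive_r` is a hypothetical name). Unlike the CTE, it keeps full floating-point precision instead of rounding to numeric(15,0) at each step, so intermediate values may differ slightly from the table above.

```python
from collections import defaultdict

# (Time, ProjectID, ContractValue, ContractHours, HoursRegistered, RemainingHours)
DATA = [
    (0, 101, 500000, 500, 65, 500), (1, 101, 500000, 500, 100, 435),
    (2, 101, 500000, 500, 85, 335), (3, 101, 500000, 500, 69, 250),
    (4, 101, 450000, 650, 100, 331), (5, 101, 450000, 650, 80, 231),
    (6, 101, 450000, 650, 90, 151), (7, 101, 450000, 650, 45, 61),
    (8, 101, 450000, 650, 16, 16),
    (0, 110, 120000, 90, 10, 90), (1, 110, 120000, 90, 10, 80),
    (2, 110, 130000, 90, 10, 70), (3, 110, 130000, 90, 10, 60),
    (4, 110, 130000, 90, 10, 50), (5, 110, 130000, 90, 10, 40),
    (6, 110, 130000, 90, 10, 30), (7, 110, 130000, 90, 10, 20),
    (8, 110, 130000, 90, 10, 10),
]


def recursive_r(rows):
    """Return {ProjectID: [(Time, R_t, SumR_t), ...]} with SumR_t = R_0 + ... + R_t."""
    by_project = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r[1], r[0])):
        by_project[row[1]].append(row)
    result = {}
    for project_id, project_rows in by_project.items():
        running = 0.0  # SUM(R_0 .. R_{t-1}); zero at t = 0
        out = []
        for time, _, contract_value, _, hours_registered, remaining in project_rows:
            r = (contract_value - running) / remaining * hours_registered
            running += r
            out.append((time, r, running))
        result[project_id] = out
    return result


for project_id, values in recursive_r(DATA).items():
    for time, r, sum_r in values:
        print(project_id, time, round(r), round(sum_r))
```

Since the last period of each project has RemainingHours equal to HoursRegistered, the final running sum lands on that period's ContractValue, matching the last SumR of each project in the results above.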

Find first and last of a unique element in a column

In a data table with the following format:
id  Time1  Time2  V1  V2
1   1      10     30  40
1   2      20     31  41
1   3      30     32  42
1   4      40     33  43
2   1      10     40  50
2   2      20     41  51
2   3      30     42  52
2   4      40     43  53
3   1      10     50  60
3   2      20     51  61
3   3      30     52  62
3   4      40     53  63
I want to select the readings with the two smallest and the two largest values of Time1 and Time2.
I want to do a regression and correlation analysis of V1 and V2 using the first two and last two time readings for each unique id.
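A minimal Python sketch of the selection step, assuming "first" and "last" mean the earliest and latest readings by Time1 within each id (`first_and_last` is a hypothetical helper). With the sample data every id has exactly four readings, so all rows are kept; the selection only matters for ids with more readings.

```python
from collections import defaultdict

# (id, Time1, Time2, V1, V2) from the question
ROWS = [
    (1, 1, 10, 30, 40), (1, 2, 20, 31, 41), (1, 3, 30, 32, 42), (1, 4, 40, 33, 43),
    (2, 1, 10, 40, 50), (2, 2, 20, 41, 51), (2, 3, 30, 42, 52), (2, 4, 40, 43, 53),
    (3, 1, 10, 50, 60), (3, 2, 20, 51, 61), (3, 3, 30, 52, 62), (3, 4, 40, 53, 63),
]


def first_and_last(rows, n=2):
    """Keep the n earliest and n latest rows (by Time1) of every id, without duplicates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    picked = []
    for id_ in sorted(groups):
        ordered = sorted(groups[id_], key=lambda r: r[1])
        # start the tail at index n at the earliest so short groups are not double-counted
        picked.extend(ordered[:n] + ordered[max(n, len(ordered) - n):])
    return picked


subset = first_and_last(ROWS)
v1 = [r[3] for r in subset]
v2 = [r[4] for r in subset]
```

The resulting `v1` and `v2` lists could then be passed to a regression or correlation routine, for example `statistics.linear_regression` and `statistics.correlation` (Python 3.10+).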
Thanks
