Creating calculated variables in one datastep - loops

Any help is much appreciated. Thanks!
I would like to create a couple of variables from my transactional data.
I am trying to create the variables 'act_bal' and 'penalty' using amount, type and Op_Bal. The rules I have are:
For the first record of an id, act_bal is calculated from op_bal: the amount is subtracted for type=C and added for type=D.
From the second record onwards, act_bal is act_bal + amount for type=C and act_bal - amount for type=D.
A penalty of 10 is added only if the amount is > 4 and type=D.
An id can have at most two penalties.
The total penalty should be subtracted from the act_bal of the last record, and the result becomes the op_bal for the next day (e.g. for id 101, -178 - 20 = -198 becomes the op_bal for 4/2/2019).
This is the data I have for two customer IDs, 101 and 102, on two different dates (my actual dataset has data for all 30 days):
id date amount type Op_Bal
101 4/1/2019 50 C 100
101 4/1/2019 25 D
101 4/1/2019 75 D
101 4/1/2019 3 D
101 4/1/2019 75 D
101 4/1/2019 75 D
101 4/2/2019 100 C
101 4/2/2019 125 D
101 4/2/2019 150 D
102 4/1/2019 50 C 125
102 4/1/2019 125 C
102 4/2/2019 250 D
102 4/2/2019 10 D
The code I wrote is like this
data want;
set have;
by id date;
if first.id or first.date then do;
if first.id then do;
if type='C' then act_bal=Op_Bal - amount;
if type='D' then act_bal=Op_Bal + amount;
end;
else do;
retain act_bal;
if type='C' then act_bal=act_bal + amount;
if type='D' then act_bal=act_bal - amount;
if amount>4 and type='D' then do;
penalty=10;
end;
end;
end;
run;
I couldn't create a counter to limit the penalties to 2, and I could not subtract the total penalty amount from the act_bal of the last row. Could someone help me get the desired result? Thanks
The desired result is:
id date amount type Op_Bal act_bal penalty
101 4/1/2019 50 C 200 150 0
101 4/1/2019 25 D 125 0
101 4/1/2019 150 D -25 10
101 4/1/2019 75 D -100 10
101 4/1/2019 3 D -103 0
101 4/1/2019 75 D -178 0
101 4/2/2019 100 C -198 -98 0
101 4/2/2019 125 D -223 10
101 4/2/2019 150 D -373 10
102 4/1/2019 50 C 125 175 0
102 4/1/2019 125 C 300 0
102 4/2/2019 250 D 50 0
102 4/2/2019 10 D 40 0

A few tips:
You have the same code for incrementing act_bal in both the if and else blocks, so factor it out. Don't repeat yourself.
You can skip the retain statement if you use a sum statement.
Use a separate variable to keep track of the number of penalties triggered per day, but only apply the first two of them.
So, putting it all together:
data want;
set have;
by id date;
if first.date and not first.id then op_bal = act_bal;
if first.date then do;
act_bal = op_bal;
penalties = 0;
end;
if type='C' then act_bal + amount;
if type='D' then act_bal + (-amount);
if amount > 4 and type='D' then penalties + 1;
if last.date then act_bal + (-min(penalties,2) * 10);
run;
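To check the logic outside SAS, here is a minimal Python sketch of the same rules as the data step above. It assumes rows arrive sorted by id and date (as BY requires) and that op_bal is supplied only on each id's first row. The function name closing_balances and the tuple layout are illustrative choices, not part of the original answer, and only the closing balance per id and day is returned.

```python
from itertools import groupby

# Input rows from the question: (id, date, amount, type, op_bal).
# op_bal is only supplied on each id's first row (None elsewhere).
HAVE = [
    (101, "4/1/2019", 50, "C", 100),
    (101, "4/1/2019", 25, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/1/2019", 3, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/2/2019", 100, "C", None),
    (101, "4/2/2019", 125, "D", None),
    (101, "4/2/2019", 150, "D", None),
    (102, "4/1/2019", 50, "C", 125),
    (102, "4/1/2019", 125, "C", None),
    (102, "4/2/2019", 250, "D", None),
    (102, "4/2/2019", 10, "D", None),
]


def closing_balances(rows):
    """Return {(id, date): act_bal after that day's penalties}.

    Hypothetical helper mirroring the SAS answer: C adds, D subtracts,
    every D with amount > 4 counts as a penalty, and at the last row of
    each day at most two penalties of 10 are subtracted.
    """
    closing = {}
    for cust_id, cust_rows in groupby(rows, key=lambda r: r[0]):
        act_bal = None  # nothing carried over yet for this id
        for date, day in groupby(cust_rows, key=lambda r: r[1]):
            day = list(day)
            # first.id: use the supplied op_bal; later days: carry act_bal over
            if act_bal is None:
                act_bal = day[0][4]
            penalties = 0  # reset at first.date
            for _, _, amount, typ, _ in day:
                if typ == "C":
                    act_bal += amount
                elif typ == "D":
                    act_bal -= amount
                    if amount > 4:
                        penalties += 1
            # last.date: subtract at most two penalties of 10
            act_bal -= min(penalties, 2) * 10
            closing[(cust_id, date)] = act_bal
    return closing


result = closing_balances(HAVE)
print(result)
```

These figures follow from the input table and the answer's rules; the question's desired-output table uses somewhat different amounts, so it will not line up row for row.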


How would you rewrite this MySQL syntax for Snowflake?

ifnull(min(r.sold_date),IF('xxx' in ('xx','xxx','xxxx') and r.refill_status = 'Sold',r.refill_request_date,p.latest_sold_date)) as sold_date
The part where I am having the most trouble is the min(r.sold_date). I need sold_date as part of the output, so I can't put it in the main SELECT statement, because you can't use an aggregate such as MIN or MAX in your GROUP BY.
If you have
a b c
1 10 100
2 10 101
3 20 102
4 20 103
and you want MIN(c) for each value of b, but you also want all four rows in the output, i.e. you want
a b c min_c
1 10 100 100
2 10 101 100
3 20 102 102
4 20 103 102
Then you want the window-function version of MIN, which looks like:
SELECT
a,
b,
c,
MIN(c) OVER (partition by b) as min_c
FROM VALUES
(1,10,100),
(2,10,101),
(3,20,102),
(4,20,103)
v(a,b,c);
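To make the difference from GROUP BY concrete, here is a small Python sketch (an illustration only, not Snowflake code) of what MIN(c) OVER (PARTITION BY b) computes. The minimum is calculated once per partition and then attached to every row, so no rows are collapsed. The function name is hypothetical.

```python
# Rows from the example: (a, b, c)
ROWS = [(1, 10, 100), (2, 10, 101), (3, 20, 102), (4, 20, 103)]


def with_min_over_partition(rows):
    """Append MIN(c) OVER (PARTITION BY b) to every row, keeping all rows."""
    # Pass 1: smallest c per partition key b (what GROUP BY b would return).
    min_c = {}
    for _, b, c in rows:
        min_c[b] = c if b not in min_c else min(min_c[b], c)
    # Pass 2: attach the per-partition minimum to each original row.
    return [(a, b, c, min_c[b]) for a, b, c in rows]


result = with_min_over_partition(ROWS)
for row in result:
    print(row)
```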

First-In-First-Out Stock trading - calculate cumulative P/L

I want a SQL Server query to calculate cumulative P/L on stock trading (FIFO-based calculation).
Input table:
EXECTIME | share_name | Quantity | Price | Buy/Sell
2013-01-01 12:25 | abc | 100 | 100 | B
2013-01-01 12:26 | abc | 10 | 102 | S
2013-01-01 12:27 | abc | 10 | 102 | S
2013-01-01 12:28 | abc | 10 | 95 | S
2013-01-01 12:29 | abc | 10 | 99 | S
2013-01-01 12:30 | abc | 10 | 105 | S
2013-01-01 12:31 | abc | 100 | 102 | B
2013-01-01 12:32 | abc | 150 | 101 | S
Output:
EXECTIME | Cumulative P/L | Winning Streak | Losing Streak
2013-01-01 12:26 | 20 | 1 | 0
2013-01-01 12:27 | 40 | 1 | 0
2013-01-01 12:28 | -10 | 0 | 1
2013-01-01 12:29 | -20 | 0 | 2
2013-01-01 12:30 | 30 | 1 | 0
2013-01-01 12:32 | -20 | 0 | 1
Explanation:
1st row: 10 shares sold at 102 that were purchased at 100, so profit = (102 - 100) * 10 = 20.
6th row: 150 shares sold at 101, of which
50 were purchased at 100 (1st row; 50 were already sold above, 50 left)
100 were purchased at 102 (7th row)
150 * 101 - [(50 * 100) + (100 * 102)] = -50
Cumulative P/L = 30 + (-50) = -20
Winning streak: 1 for a positive P/L.
Losing streak: 1, 2, ... for consecutive losses; it resets after a profit.
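The FIFO matching itself can be sketched in a few lines. The Python below is illustrative only (not a SQL Server query): it keeps open buy lots in a queue, consumes the oldest lot first on each sell, and applies the streak rules as stated above. It assumes a single share_name, rows in time order and no short selling, and it treats a zero P/L as a loss, which the question does not specify. The function name fifo_pnl is hypothetical.

```python
from collections import deque

# (EXECTIME, Quantity, Price, Buy/Sell) for share "abc", in time order
TRADES = [
    ("2013-01-01 12:25", 100, 100, "B"),
    ("2013-01-01 12:26", 10, 102, "S"),
    ("2013-01-01 12:27", 10, 102, "S"),
    ("2013-01-01 12:28", 10, 95, "S"),
    ("2013-01-01 12:29", 10, 99, "S"),
    ("2013-01-01 12:30", 10, 105, "S"),
    ("2013-01-01 12:31", 100, 102, "B"),
    ("2013-01-01 12:32", 150, 101, "S"),
]


def fifo_pnl(trades):
    """Return (exectime, cumulative P/L, winning, losing streak) per sell."""
    lots = deque()  # open buy lots as [remaining_qty, buy_price], oldest first
    cumulative = 0
    losing_streak = 0
    out = []
    for exectime, qty, price, side in trades:
        if side == "B":
            lots.append([qty, price])
            continue
        pnl = 0
        remaining = qty
        while remaining > 0:
            if not lots:
                raise ValueError(f"sell at {exectime} exceeds open position")
            lot = lots[0]  # FIFO: always match against the oldest open lot
            used = min(remaining, lot[0])
            pnl += used * (price - lot[1])
            lot[0] -= used
            remaining -= used
            if lot[0] == 0:
                lots.popleft()
        cumulative += pnl
        if pnl > 0:
            winning, losing_streak = 1, 0  # profit: flag a win, reset losses
        else:
            # assumption: a zero P/L counts as a loss
            winning, losing_streak = 0, losing_streak + 1
        out.append((exectime, cumulative, winning, losing_streak))
    return out


result = fifo_pnl(TRADES)
for row in result:
    print(row)
```

For the sample data this reproduces the expected output table, including the split of the 12:32 sell across the lots bought at 100 and 102.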

Calculating a column in SQL using the column's own output as input

I have a problem that I find very hard to solve:
I need to calculate a column R_t in SQL where, for each row, the sum of the "previous" calculated values, SUM(R_{t-1}), is required as input. The calculation is grouped by the ProjectID column. I have no clue how to proceed.
The formula for the calculation I am trying to achieve is
$$R_t = \frac{\text{ContractValue}_t - \sum_{k=0}^{t-1} R_k}{\text{RemainingHours}_t} \times \text{HoursRegistered}_t$$
where t denotes time and $\sum_{k=0}^{t-1} R_k$ is the sum of R from t = 0 to t-1 (zero when t = 0).
Time is always consecutive and always begins at t = 0, but the number of time periods may differ across ProjectIDs, e.g. one project has t = {0,1,2} and another t = {0,1,2,3,4,5}. The time period will never "jump" from 5 to 7.
The expected output for ProjectID 101 (using the data below) is:
R_0 = (500,000 - 0) / 500 * 65 = 65,000
R_1 = (500,000 - (65,000)) / 435 * 100 = 100,000
R_2 = (500,000 - (65,000 + 100,000)) / 335 * 85 = 85,000
R_3 = (500,000 - (65,000 + 100,000 + 85,000)) / 250 * 69 = 69,000
etc...
This calculation is done for each ProjectID.
My question is how to formulate this in a SQL query. My first thought was a recursive CTE, but I am not sure it is the right way to proceed: as I understand it, a recursive CTE is designed for hierarchical structures, which this isn't really.
My other thought was to calculate SUM(R_{t-1}) using window functions, i.e. SUM() OVER (PARTITION BY ... ORDER BY ...) with LAG, but the recursion really gives me trouble and I keep running my head against the wall.
Below is a query for creating the input data:
CREATE TABLE [dbo].[InputForRecursiveCalculation]
(
[Time] int NULL,
ProjectID [int],
ContractValue float,
ContractHours float,
HoursRegistered float,
RemainingHours float
)
GO
INSERT INTO [dbo].[InputForRecursiveCalculation]
(
[Time]
,[ProjectID]
,[ContractValue]
,[ContractHours]
,[HoursRegistered]
,[RemainingHours]
)
VALUES
(0,101,500000,500,65,500),
(1,101,500000,500,100,435),
(2,101,500000,500,85,335),
(3,101,500000,500,69,250),
(4,101,450000,650,100,331),
(5,101,450000,650,80,231),
(6,101,450000,650,90,151),
(7,101,450000,650,45,61),
(8,101,450000,650,16,16),
(0,110,120000,90,10,90),
(1,110,120000,90,10,80),
(2,110,130000,90,10,70),
(3,110,130000,90,10,60),
(4,110,130000,90,10,50),
(5,110,130000,90,10,40),
(6,110,130000,90,10,30),
(7,110,130000,90,10,20),
(8,110,130000,90,10,10)
GO
For those of you who dare to download something from a complete stranger, I have created an Excel file demonstrating the calculation (please download the file, as you will not be able to see the actual formulas in the HTML preview shown when you first click the link):
https://www.dropbox.com/s/3rxz72lbvooyc4y/Calculation%20example.xlsx?dl=0
Best regards,
Victor
I think this will be useful for you. There is an additional column, SumR, that holds the running total of R per ProjectID (the previous rows plus the current one).
;with recu as
(
select
Time,
ProjectId,
ContractValue,
ContractHours,
HoursRegistered,
RemainingHours,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as R,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as SumR
from
InputForRecursiveCalculation
where
Time=0
union all
select
input.Time,
input.ProjectId,
input.ContractValue,
input.ContractHours,
input.HoursRegistered,
input.RemainingHours,
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours as numeric(15,0)),
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours + prev.SumR as numeric(15,0))
from
recu prev
inner join
InputForRecursiveCalculation input
on input.ProjectId = prev.ProjectId
and input.Time = prev.Time + 1
)
select
*
from
recu
order by
ProjectID,
Time
RESULTS:
Time ProjectId ContractValue ContractHours HoursRegistered RemainingHours R SumR
----------- ----------- ---------------------- ---------------------- ---------------------- ---------------------- --------------------------------------- ---------------------------------------
0 101 500000 500 65 500 65000 65000
1 101 500000 500 100 435 100000 165000
2 101 500000 500 85 335 85000 250000
3 101 500000 500 69 250 69000 319000
4 101 450000 650 100 331 39577 358577
5 101 450000 650 80 231 31662 390239
6 101 450000 650 90 151 35619 425858
7 101 450000 650 45 61 17810 443668
8 101 450000 650 16 16 6332 450000
0 110 120000 90 10 90 13333 13333
1 110 120000 90 10 80 13333 26666
2 110 130000 90 10 70 14762 41428
3 110 130000 90 10 60 14762 56190
4 110 130000 90 10 50 14762 70952
5 110 130000 90 10 40 14762 85714
6 110 130000 90 10 30 14762 100476
7 110 130000 90 10 20 14762 115238
8 110 130000 90 10 10 14762 130000
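The recursion is easier to see procedurally: for each project, walk t upward while carrying the running sum. The Python sketch below (hypothetical names, not SQL) follows the same steps as the recursive CTE, except that it keeps full floating-point precision instead of casting to numeric(15,0) at each step, so later rows can differ slightly from the RESULTS table. ContractHours is omitted because the formula does not use it.

```python
from itertools import groupby

# (Time, ProjectID, ContractValue, HoursRegistered, RemainingHours)
ROWS = [
    (0, 101, 500000, 65, 500), (1, 101, 500000, 100, 435),
    (2, 101, 500000, 85, 335), (3, 101, 500000, 69, 250),
    (4, 101, 450000, 100, 331), (5, 101, 450000, 80, 231),
    (6, 101, 450000, 90, 151), (7, 101, 450000, 45, 61),
    (8, 101, 450000, 16, 16),
    (0, 110, 120000, 10, 90), (1, 110, 120000, 10, 80),
    (2, 110, 130000, 10, 70), (3, 110, 130000, 10, 60),
    (4, 110, 130000, 10, 50), (5, 110, 130000, 10, 40),
    (6, 110, 130000, 10, 30), (7, 110, 130000, 10, 20),
    (8, 110, 130000, 10, 10),
]


def allocate(rows):
    """Return {(project_id, t): (R_t, running sum R_0 + ... + R_t)}."""
    out = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by ProjectID, then Time
    for pid, proj in groupby(ordered, key=lambda r: r[1]):
        prev_sum = 0.0  # SUM(R_0 .. R_{t-1}); zero at t = 0
        for t, _, contract_value, hours_registered, remaining_hours in proj:
            r = (contract_value - prev_sum) / remaining_hours * hours_registered
            prev_sum += r  # now includes R_t, like SumR in the CTE
            out[(pid, t)] = (r, prev_sum)
    return out


result = allocate(ROWS)
for key in sorted(result):
    r, sum_r = result[key]
    print(key, round(r), round(sum_r))
```

Because RemainingHours equals HoursRegistered in the last period of each project, the final running sum lands on that period's ContractValue, matching the last SumR of each project in the RESULTS table.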

In SAS, how to do a loop INSIDE a group using the following characteristics

I have the following problem:
data example;
input channel $ program $ item1 item2 GOAL1 GOAL2;
datalines;
CS A 100 10 100 10
CS A 101 9 100 9
CS B 102 11 102 11
CS B 101 14 101 11
BD A 200 210 200 210
BD A 201 209 200 209
BD B 202 211 202 211
BD B 201 214 201 214
;
run;
First, note that the operations are performed at the channel-program level.
Second, a new variable called THIRD equals item1 in the first entry of each group. However, in the second entry THIRD will vary: if item1_entry1
data poli;
set poli;
by channel program;
array prog{*} A B; /*IN my original data I have 3 programs, so the solution has to be general*/
third=item1; /*So the first entry of item1 will be equal in third*/
do k=1 to dim(prog);
if program=prog{k} then do;
if lag(item1)<lag(item1) then THIRD=lag(item1);
else THIRD=item1;
end;
end;
run;
As expected the code does not give me what I want.
Specifically, THIRD and FOURTH should equal the variables GOAL1 and GOAL2.
NOTE: The idea behind the comparison is that the higher levels are always greater than or equal to the lower levels, and a lower level cannot be greater than a higher one: I can't have 100 and then 101; within a group it should be 100 and 100.
Your description is a bit unclear. I think you want to compute a new variable, want1, which is set to the value of item1 at the beginning of each BY group and, within a BY group, decreases if item1 decreases and otherwise stays the same. I would try (untested):
data want;
set example;
by channel program notsorted; /* groups are contiguous but not sorted (CS comes before BD) */
retain want1;
if first.program then want1=item1;
else want1=min(want1,item1);
run;
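The same retain-plus-min idea as a short Python sketch, useful for checking the sample data (the function name is hypothetical). itertools.groupby groups consecutive rows with equal keys, which mirrors BY ... NOTSORTED.

```python
from itertools import groupby

# (channel, program, item1, item2, GOAL1, GOAL2) from the example data step
EXAMPLE = [
    ("CS", "A", 100, 10, 100, 10),
    ("CS", "A", 101, 9, 100, 9),
    ("CS", "B", 102, 11, 102, 11),
    ("CS", "B", 101, 14, 101, 11),
    ("BD", "A", 200, 210, 200, 210),
    ("BD", "A", 201, 209, 200, 209),
    ("BD", "B", 202, 211, 202, 211),
    ("BD", "B", 201, 214, 201, 214),
]


def running_min_by_group(rows, value_index):
    """Running minimum of row[value_index], restarted for each
    consecutive (channel, program) group."""
    out = []
    for _, grp in groupby(rows, key=lambda r: (r[0], r[1])):
        current = None  # like want1 being reset at first.program
        for row in grp:
            v = row[value_index]
            current = v if current is None else min(current, v)
            out.append(current)
    return out


want1 = running_min_by_group(EXAMPLE, 2)
print(want1)
```

For this sample, want1 reproduces GOAL1 row for row. Applied to item2, the same rule matches GOAL2 except in the last BD B row, where it gives 211 rather than the 214 listed.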

How to populate binary matrices with all the combinations?

I want to have 2^n matrices containing all the combinations of 0 and 1. For example, for n=6 (n = #rows x #columns): array{1}=[0 0 0; 0 0 0], array{2}=[0 0 0; 0 0 1], ..., array{64}=[1 1 1; 1 1 1]. I am using MATLAB and came across combn.m (M = COMBN(V,N) returns all combinations of N elements of the elements in vector V; M has size (length(V).^N)-by-N) and dec2bin(), but I can't get it quite right. Another idea of mine was to create a large matrix and then split it into 2^n matrices. For instance, for n=6 (2 x 3), I did M=combn([0 1],3), which gives me:
M =
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Then I used this M to create a larger matrix with M2=combn(M,2), but this produces the wrong results. However, if I concatenate the rows of M like this:
M=combn([000;010;100;001;110;011;101;111],2)', I get something closer to what I expect, i.e.
M =
Columns 1 through 21
0 0 0 0 0 0 0 0 10 10 10 10 10 10 10 10 100 100 100 100 100
0 10 100 1 110 11 101 111 0 10 100 1 110 11 101 111 0 10 100 1 110
Columns 22 through 42
100 100 100 1 1 1 1 1 1 1 1 110 110 110 110 110 110 110 110 11 11
11 101 111 0 10 100 1 110 11 101 111 0 10 100 1 110 11 101 111 0 10
Columns 43 through 63
11 11 11 11 11 11 101 101 101 101 101 101 101 101 111 111 111 111 111 111 111
100 1 110 11 101 111 0 10 100 1 110 11 101 111 0 10 100 1 110 11 101
Column 64
111
111
where I can take each column and convert it separately into one of the 64 matrices. For example, column 1 would be converted from [0;0] to [0 0 0;0 0 0], etc. However, I believe there is a much easier and more elegant solution that also runs faster.
Using dec2bin:
r = 2; %// number of rows
c = 3; %// number of columns
M = dec2bin(0:2^(r*c)-1)-'0'; %// Or: M = de2bi(0:2^(r*c)-1);
M = reshape(M.',r,c,[]);
M is a 3D-array of size r x c x 2^(r*c), such that M(:,:,1) is the first matrix, M(:,:,2) is the second etc.
How it works:
dec2bin gives the binary string representation of a number, so dec2bin(0:2^(r*c)-1) gives all numbers from 0 to 2^(r*c)-1 in binary, one per row. The -'0' part turns the characters into a numeric array of 0 and 1 values. Then reshape puts each of those rows into r x c form (filling column by column, since MATLAB arrays are column-major), making up each of the desired matrices.
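For comparison, the same idea in Python as a stdlib-only sketch: itertools.product((0, 1), repeat=r*c) yields the bit patterns in the same order as dec2bin(0:2^(r*c)-1) (most significant bit first), and each pattern is written into an r x c matrix column by column to mimic MATLAB's column-major reshape. The function name all_binary_matrices is illustrative.

```python
from itertools import product


def all_binary_matrices(r, c):
    """Return all 2**(r*c) r-by-c 0/1 matrices as nested lists.

    Bits are taken most significant first (like dec2bin) and written
    column by column (like MATLAB's column-major reshape).
    """
    mats = []
    for bits in product((0, 1), repeat=r * c):  # 00..0, 00..1, ..., 11..1
        m = [[0] * c for _ in range(r)]
        for k, b in enumerate(bits):
            row, col = k % r, k // r  # column-major position of bit k
            m[row][col] = b
        mats.append(m)
    return mats


mats = all_binary_matrices(2, 3)
print(len(mats))  # one matrix per combination
print(mats[1])
```

Here mats[0] corresponds to M(:,:,1), mats[1] to M(:,:,2), and so on.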
