Embedded Implicit Loops in SAS with Two SET Statements

I have the code below to track specific manufacturer coupon issues. Each issue can have dollars in the redeemed, expired and remaining fields. I have an ISSUE table that lists every issue and its amount, and a REDEMPTIONS table that accounts for the amounts redeemed or expired. The objective is to track all coupon dollars for each issue by putting the appropriate amounts in each category.
This code is supposed to loop through the ISSUE table and tie matching records from the REDEMPTIONS table to each record in the ISSUE table.
/*create sample tables*/
data ISSUE; input
Coupon_NBR $ AMOUNT REDEEMED EXPIRED REMAINING; datalines;
A 500 0 0 500
A 500 0 0 500
B 500 0 0 500
B 500 0 0 500
B 500 0 0 500
B 1250 0 0 1250
B 750 0 0 750
C 500 0 0 500
C 500 0 0 500
C 500 0 0 500
C 500 0 0 500
C 500 0 0 500
run;
data REDEMPTIONS; input
Redemp_coupon_NBR $ TRANS_AMOUNT TYPE $16.; datalines;
A -150 REDEMPTION
A -350 REDEMPTION
A -200 EXPIRATION
B -300 REDEMPTION
B -200 EXPIRATION
B -1000 REDEMPTION
C -1500 REDEMPTION
C -500 EXPIRATION
run;
/*begin looping code*/
data Tracking;
if _n_ = 1 then Link get_redemptions;
set issue;
if (remaining > 0) and (Coupon_NBR = Redemp_coupon_NBR) then do;
if trans_amount = 0 then
link get_redemptions;
if trans_amount + remaining >=0 then do;
remaining = remaining + trans_amount;
if type = 'EXPIRATION' then;
expired = expired - trans_amount;
if type = 'REDEMPTION' then;
redeemed = redeemed - trans_amount;
link get_redemptions;
end;
else do;
remaining = 0;
if type = 'EXPIRATION' then do;
expired = expired - trans_amount;
end;
else do;
redeemed = redeemed - trans_amount;
trans_amount = trans_amount + remaining;
remaining = 0;
end;
end;
end;
else do;
link get_redemptions;
end;
return;
get_redemptions:
set redemptions;
return;
run;
This is the output I'm getting:
Coupon_NBR AMOUNT REDEEMED EXPIRED REMAINING redemp_coupon_nbr trans_amount type
A 500 150 150 350 A -350 REDEMPTION
A 500 350 350 150 A -200 EXPIRATION
B 500 0 0 500 B -300 REDEMPTION
B 500 300 300 200 B -200 EXPIRATION
B 500 200 200 300 B -1000 REDEMPTION
B 1250 1000 1000 250 C -1500 REDEMPTION
B 750 0 0 750 C -500 EXPIRATION
In this example, the correct output is:
redemp_coupon_nbr AMOUNT REDEEMED EXPIRED REMAINING
A 500 500 0 0
A 500 0 200 300
B 500 300 200 0
B 500 500 0 0
B 500 500 0 0
B 1250 0 0 1250
B 750 0 0 750
C 500 500 0 0
C 500 500 0 0
C 500 500 0 0
C 500 0 500 0
C 500 0 0 500
Obviously my result is far from where I want it to be. My main concern, however, is that the output has only seven rows, when I want it to track every coupon issue, which means it needs 12 rows (one for each row in the ISSUE table). I think there is a problem with my loop, specifically in the get_redemptions routine. I've been debugging for a while without success.

Jarom:
A robust solution requires a transactional ledgering approach in order to properly deal with the alternatives of fetching multiple redemptions per coupon and tracking overages, and applying overages to a coupon before fetching additional redemptions.
The following sample code has numerous put statements so you can observe the algorithm decision points in the log. The balance goal is to approach zero (from above or below) at each transaction reconciliation, and track any portions that go beyond the goal.
For the sample, these variable name substitutions were made relative to your data:
coupon_nbr -> G
redemp_coupon_nbr -> XG
trans_amount -> XAMOUNT
You were correct in needing a LINK to fetch the redemptions.
The group-wise processing is further facilitated by adding BY XG and END=.
Tests of the redemptions END= variable prevent the data step from halting prematurely (which would happen if an unconditional SET were reached after the data set's last record had been read).
data reconciliation (keep=G AMOUNT REDEEMED EXPIRED REMAINING EXCESS APP_COUNT _redem_bal _expir_bal fetch_sum)
; * / debug;
set issue;
by G;
REDEEMED = 0;
EXPIRED = 0;
REMAINING = 0;
EXCESS = 0;
if 0 then set redemptions; %* prep pdv;
retain _balance 0;
retain _redem_sum 0;
retain _expir_sum 0;
retain _redem_bal 0;
retain _expir_bal 0;
if first.g then put / '----------- ' G= '-------------';
put '#set ' _N_=;
put 'balance: ' _balance _redem_bal= _expir_bal=;
put 'coupon : ' amount first.g= /;
if first.G then do;
put #3 '#first in group';
_balance = amount;
_redem_sum = 0;
_redem_bal = 0;
_expir_sum = 0;
_expir_bal = 0;
put #3 'balance: ' _balance _redem_bal= _expir_bal= G= XG=;
* spin to first matching redemption or first redemption in a higher by-group;
if (XG ne G) then
do while (not EOT);
link fetch;
if XG >= G then leave;
end;
if (G = XG) then
link apply_redemp;
put #6 'spin: ' G= XG=;
end;
else do; * additional couponage;
put #3 '#next in group';
if (G = XG) then
_balance + amount;
else
_balance = amount;
put #3 'balance: ' _balance _redem_bal= _expir_bal=;
link apply_excess_to_balance;
put #3 'balance: ' _balance _redem_bal= _expir_bal= xg= last.xg=;
if (_balance > 0 and G = XG and not last.XG) then
link fetch_apply;
end;
if (G = XG) then
do while (not EOT and not last.XG and _balance > 0);
link fetch_apply;
end;
redeemed = _redem_sum;
expired = _expir_sum;
remaining = min (_balance, amount);
excess = sum (_redem_bal, _expir_bal, max (0, _balance - amount));
output;
put #4 'output: ' amount= redeemed= expired= remaining= excess= /;
_redem_sum = 0;
_expir_sum = 0;
return;
apply_excess_to_balance:
if (_redem_bal > 0 and _balance > 0) then do;
apply = min ( _balance, _redem_bal );
_redem_sum + apply;
_redem_bal + -apply;
_balance + -apply;
app_count = sum(app_count,1);
put #4 'excess: ' apply= _redem_bal= _redem_sum= _balance= 'reduced amount by excess redemption';
end;
if (_expir_bal > 0 and _balance > 0) then do;
apply = min ( _balance, _expir_bal );
_expir_sum + apply;
_expir_bal + -apply;
_balance + -apply;
app_count = sum(app_count,1);
put #4 'excess: ' apply= _expir_bal= _expir_sum= _balance= 'reduced amount by excess expiration';
end;
return;
fetch:
set redemptions end=EOT;
by XG;
put #5 'fetch: ' xg= xamount= type= first.xg= last.xg= EOT=;
return;
fetch_apply:
link fetch;
if (G = XG) then
link apply_redemp;
return;
apply_redemp:
if type in: ('RED' 'EXP') then do;
apply = min (_balance, -XAMOUNT);
excess = max (0, -XAMOUNT - _balance);
if type =: 'RED' then do;
_redem_sum + apply;
_redem_bal + excess;
end;
else
if type =: 'EXP' then do;
_expir_sum + apply;
_expir_bal + excess;
end;
_balance + -apply;
app_count = sum(app_count,1);
fetch_sum = sum(fetch_sum, -xamount);
put #5 'apply: ' apply= _balance= _redem_sum= _expir_sum= _redem_bal= _expir_bal=;
end;
return;
run;
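As a cross-check on the expected results in the question, the first-in-first-out allocation it describes can be sketched outside SAS. The Python below is a minimal sketch, not a translation of the data step above: the function name `allocate` and the tuple layouts are hypothetical, it assumes both inputs are already in ledger order, and it leaves any redemption amount beyond the issued total unallocated instead of reporting it as excess.

```python
from collections import deque

def allocate(issues, transactions):
    """FIFO-allocate redemption/expiration amounts to coupon issues.

    issues:       list of (coupon, amount) in issue order
    transactions: list of (coupon, amount, kind); amount is negative,
                  kind is "REDEMPTION" or "EXPIRATION"
    Returns one dict per issue row.
    """
    # One queue of [unapplied_amount, kind] per coupon, in ledger order.
    queues = {}
    for coupon, amount, kind in transactions:
        queues.setdefault(coupon, deque()).append([-amount, kind])

    rows = []
    for coupon, amount in issues:
        row = {"coupon": coupon, "amount": amount,
               "redeemed": 0, "expired": 0, "remaining": amount}
        queue = queues.get(coupon, deque())
        # Consume transactions until this issue is used up or none are left.
        while row["remaining"] > 0 and queue:
            available, kind = queue[0]
            applied = min(available, row["remaining"])
            key = "redeemed" if kind == "REDEMPTION" else "expired"
            row[key] += applied
            row["remaining"] -= applied
            queue[0][0] -= applied
            if queue[0][0] == 0:
                queue.popleft()  # transaction fully applied
        rows.append(row)
    return rows
```

Because a transaction stays at the head of its queue until it is fully applied, a large redemption carries over to the next issue of the same coupon, which is the behaviour the retained balances in the data step are there to provide.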
Here is some additional, trickier sample data with:
a non-matching redemption
a B group whose first redemption is a large excess that spans two coupons
a coupon group with no redemptions
data ISSUE; input
G $ AMOUNT; datalines;
A 500
A 500
B 500
B 500
B 500
B 1250
B 750
B2 100
B2 200
C 500
C 500
C 500
C 500
C 500
run;
data REDEMPTIONS; input
XG $ XAMOUNT TYPE $16.; datalines;
! -1000 REDEMPTION
A -150 REDEMPTION
A -350 REDEMPTION
A -200 EXPIRATION
B -1100 REDEMPTION was -300
B -200 EXPIRATION
B -1000 REDEMPTION
C -1500 REDEMPTION
C -500 EXPIRATION
run;

Related

Creating Dataset with random Values in SAS

I want to create a random dataset. Something like this-
ptno visits sex race
1 1 1 0
1 2 1 0
1 3 1 0
2 1 2 1
2 2 2 1
2 3 2 1
3 1 1 0
3 2 1 0
3 3 1 0
The values should be randomly generated. I want to know if I can do this dynamically using do loops. Thanks in advance for helping.
data want ;
length ptno visits sex race 8. ;
do ptno = 1 to 100 ;
_visits = ceil(ranuni(0)*5) ; /* between 1 & 5 */
sex = ceil(ranuni(0)*2) ; /* between 1 & 2 */
race = floor(ranuni(0)*2) ; /* between 0 & 1 */
do visits = 1 to _visits ;
output ;
end ;
end ;
drop _visits ;
run ;
CALL RANUNI returns a random variate from the uniform distribution on (0,1); the comparisons with 0.5 turn it into a two-level value. Because the seeds are reset from i on every iteration, each row for the same ptno gets the same sex and race.
data want;
do i=100 to 110;
do j=1 to 5;
seed1=i+4567;
call ranuni(seed1,x);
seed2=i+1234;
call ranuni(seed2,y);
ptno=i;
visit=j;
sex=(x>0.5)+1;
race=(y<0.5);
output;
end;
end;
keep ptno--race;
run;
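The shape of both answers (draw the per-patient attributes once, then emit one row per visit) can be sketched in Python as well. This is a minimal sketch under assumptions: `make_visits` and its parameters are hypothetical names, and Python's `random` module stands in for RANUNI, so the actual values will differ from what SAS produces.

```python
import random

def make_visits(n_patients, max_visits=5, seed=None):
    """Return (ptno, visit, sex, race) rows with attributes fixed per patient."""
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    rows = []
    for ptno in range(1, n_patients + 1):
        n_visits = rng.randint(1, max_visits)  # between 1 and max_visits
        sex = rng.randint(1, 2)                # 1 or 2
        race = rng.randint(0, 1)               # 0 or 1
        # One output row per visit; sex and race are drawn only once.
        for visit in range(1, n_visits + 1):
            rows.append((ptno, visit, sex, race))
    return rows
```

Drawing `sex` and `race` outside the inner loop plays the same role as the outer DO loop in the first answer and the per-patient seed in the second.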

How to get the count of values greater than zero from a subset of an array in SAS

I want to get a data set with an array that saves the count of values greater than zero in a subset of an array.
My code:
%Macro Test(input_array, window);
array initial{*} &input_array;
array position[&window];
array cumulative[&window];
/* Fill array indicating position with value zero, previous value greater than zero */
do i = 1 to dim(initial) - 1;
if initial(i) gt 0 and initial(i+1) eq 0 then
position(i) = i + 1;
end;
/* Fill array indicating the count of values greater than zero until the index in the position array*/
%let j = 1;
%do %while (&j lt &window);
end_ = coalesce(of position&j - position&window);
if not missing(end_) then do;
gt_0_cnt = 0;
do k = &j to end_ - 1;
gt_0_cnt + ifn(initial(k) > 0,1,0);
end;
cumulative(end_ - 1) = gt_0_cnt;
end;
%let j = %eval(&j + end_);
%end;
%Mend;
DATA HAVE;
INPUT ID $ FM1-FM18;
DATALINES;
A 1 2 0 0 1 0 0 0 0 2 2 2 3 3 4 4 4 0
B 0 0 1 2 3 4 5 1 2 3 4 0 0 0 1 2 0 0
;
RUN;
DATA WANT;
SET HAVE;
%Test(FM: 18);
RUN;
The output I need:
But I have a problem when trying to evaluate this expression
%let j = %eval(&j + end_)
I get the message ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was:
1 + end_
I don't know of any other way to get the desired result.
If someone can help me I will be grateful.
Doesn't seem like you need the macro language for this.
data want;
set have;
array fm fm:;
array cum cum_1-cum_18;
do _i = 1 to dim(fm);
if fm[_i] eq 0 then call missing(cum[_i]);
else do;
do count = 1 by 1 until (fm[_i+count] eq 0 or (count+_i eq dim(fm)));
end;
put _i= count=;
cum[_i+count-1] = count;
_i = _i + count - 1;
end;
end;
run;
Obviously you can specify the 18 max on the cum array through a macro parameter, or what the variable names are, but all of the stuff you're doing is perfectly doable through the data step language or simple macro variable parameters.
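The same run-length idea can be written without any index juggling. The Python below is a minimal sketch of that logic, not of the SAS step itself; `run_end_counts` is a hypothetical name, and `None` stands in for a SAS missing value.

```python
def run_end_counts(values):
    """For each run of values > 0, store the run length at the run's last position.

    All other positions are None.
    """
    counts = [None] * len(values)
    run = 0
    for i, v in enumerate(values):
        if v > 0:
            run += 1
            # A run closes at the last element or just before a non-positive value.
            if i == len(values) - 1 or values[i + 1] <= 0:
                counts[i] = run
                run = 0
    return counts
```

For row A of the sample data this puts 2 at position 2, 1 at position 5 and 8 at position 17 (counting from 1). Checking the array bound before looking at the next element also covers a row that ends in a non-zero value.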

Creating calculated variables in one datastep

Any help is much appreciated. Thanks
I would like to create a couple of variables with my transactional data
I am trying to create variables 'act_bal' and 'penalty' using amount, type and Op_Bal. The rules I have are:
For the first record, the id will have op_bal and it will be subtracted from the 'amount' for type=C and added if type=D to calculate act_bal.
For the second record onwards it is act_bal + amount for type=C and act_bal-amount for type=D.
I will add the penalty 10 only if the amount is >4 and the type=D.
The id can have only two penalties.
Total Penalty should be subtracted from the act_bal of the last record which would become op_bal for the next day. (e.g. for id 101, -178-20=-198 will become the op_bal for 4/2/2019)
This is the data I have for two customers IDs 101 and 102 for two different dates (My actual dataset has the data for all the 30 days).
id date amount type Op_Bal
101 4/1/2019 50 C 100
101 4/1/2019 25 D
101 4/1/2019 75 D
101 4/1/2019 3 D
101 4/1/2019 75 D
101 4/1/2019 75 D
101 4/2/2019 100 C
101 4/2/2019 125 D
101 4/2/2019 150 D
102 4/1/2019 50 C 125
102 4/1/2019 125 C
102 4/2/2019 250 D
102 4/2/2019 10 D
The code I wrote is like this
data want;
set have;
by id date;
if first.id or first.date then do;
if first.id then do;
if type='C' then act_bal=Op_Bal - amount;
if type='D' then act_bal=Op_Bal + amount;
end;
else do;
retain act_bal;
if type='C' then act_bal=act_bal + amount;
if type='D' then act_bal=act_bal - amount;
if amount>4 and type='D' then do;
penalty=10;
end;
run;
I couldn't create a counter to limit the penalties to 2, and I could not subtract the total penalty amount from the amount of the last row. Could someone help me get the desired result, shown below? Thanks
id date amount type Op_Bal act_bal penalty
101 4/1/2019 50 C 200 150 0
101 4/1/2019 25 D 125 0
101 4/1/2019 150 D -25 10
101 4/1/2019 75 D -100 10
101 4/1/2019 3 D -103 0
101 4/1/2019 75 D -178 0
101 4/2/2019 100 C -198 -98 0
101 4/2/2019 125 D -223 10
101 4/2/2019 150 D -373 10
102 4/1/2019 50 C 125 175 0
102 4/1/2019 125 C 300 0
102 4/2/2019 250 D 50 0
102 4/2/2019 10 D 40 0
A few tips:
You have the same code for incrementing act_bal in both the if and else blocks, so factor it out. Don't repeat yourself.
You can skip the retain statement if you use a sum statement.
Use a separate variable to keep track of the number of penalties triggered per day, but only apply the first two of them.
So, putting it all together:
data want;
set have;
by id date;
if first.date and not first.id then op_bal = act_bal;
if first.date then do;
act_bal = op_bal;
penalties = 0;
end;
if type='C' then act_bal + amount;
if type='D' then act_bal + (-amount);
if amount > 4 and type='D' then penalties + 1;
if last.date then act_bal + (-min(penalties,2) * 10);
run;
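To make the carry-over and the penalty cap easy to check by hand, here is the same logic as a minimal Python sketch. It follows the sign convention of the data step above (C adds, D subtracts), assumes the rows are sorted by id and date, and uses hypothetical names (`running_balances` and the three constants) that do not appear in the original code.

```python
from itertools import groupby

PENALTY = 10        # charge per qualifying debit (from the question)
MAX_PENALTIES = 2   # at most two penalties per id and date
THRESHOLD = 4       # debits above this amount trigger a penalty

def running_balances(rows):
    """rows: (id, date, amount, type, op_bal) tuples sorted by id, date.

    op_bal is only read from the first row of each id; later days
    start from the previous day's closing balance.
    Returns (id, date, act_bal) for every input row.
    """
    out = []
    for cust_id, cust_rows in groupby(rows, key=lambda r: r[0]):
        closing = None
        for date, day in groupby(cust_rows, key=lambda r: r[1]):
            day = list(day)
            act_bal = day[0][4] if closing is None else closing
            penalties = 0
            for n, (_, _, amount, kind, _) in enumerate(day, start=1):
                if kind == "C":
                    act_bal += amount
                elif kind == "D":
                    act_bal -= amount
                    if amount > THRESHOLD:
                        penalties += 1
                # Penalties are charged once, on the last row of the day.
                if n == len(day):
                    act_bal -= min(penalties, MAX_PENALTIES) * PENALTY
                out.append((cust_id, date, act_bal))
            closing = act_bal
    return out
```

With the id 101 rows from the input table (opening balance 100), the first day has four qualifying debits but only two penalties are charged, so the closing balance already includes the capped total of 20 before it becomes the next day's opening balance.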

Loops and output

I am trying to get a bit handy with my loop and output statements, currently I have a loan which amortizes like such:
data have;
input Payment2017 Payment2018 Payment2019 Payment2020;
datalines;
100 10 10 10
;
run;
I'm trying to create a maturity and re-issuance profile that looks like this, I will explain the logic when I submit my current code:
data want;
input P2017 P2018 P2019 P2020 F1 F2 F3 MP2017 MP2018 MP2019 MP2020 NI2017 NI2018 NI2019 NI2020;
datalines;
100 10 10 10 0.1 0.1 0.1 100 10 10 10 0 0 0 0
100 10 10 10 0.1 0.1 0.1 0 10 1 1 0 10 0 0
100 10 10 10 0.1 0.1 0.1 0 0 11 1.1 0 0 11 0
100 10 10 10 0.1 0.1 0.1 0 0 0 12.1 0 0 0 12.1
;
run;
so the logic is that:
Payment2017 = the balance at the start of the year
Payment2018 - 2020 = the amount paid each period
F1-F3 is the fraction of the loan that is being paid each period.
MP2017-MP2020 is the amount of the loan that is paid back - essentially it is
mp(i) = p(i) *f(i)
NI2017-NI2020 is the amount that is newly issued, assuming that each time I pay off part of the loan, that amount is added back onto the loan.
The code I am currently using looks like this, but I'm having some issues with the output and loops.
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
newissuance(i) = MaturityProfile(i);
end;
output;
iter=iter+1;
end;
output;
*MaturityProfile(4) = ( P(3) + MaturityProfile(2) ) * Fraction(1);
*output;
drop i;
drop j;
drop iter;
run;
For the first two rows I want to keep the output as it is now, but the third row needs the sum of the column from the second row (that is, newissuance2019) multiplied by fraction 1,
so that the output looks like the table I've put in the data want step.
TIA.
I managed to fix this by doing:
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
array Total(4) Total1-Total4;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
Total(i)=MaturityProfile(i) + P(i);
end;
output;
iter=iter+1;
end;
MaturityProfile(4) = Total(3) * Fraction(1);
output;
drop i;
drop j;
drop iter;
run;

How to split an ordered array of durations in one-hour groups

I have an array with some durations (in seconds), and I'd like to split it in MATLAB into groups whose accumulated duration does not exceed 3600 seconds. The durations are in order.
Input:
Duration (s)             | 2010 1000  500 | 1030   80 2030 | 1090
Accumulated duration (s) |      3510      |      3140      | 1090
Group                    |   1st group    |   2nd group    | 3rd
Output:
Group index              |    1    1    1 |    2    2    2 |    3
I've tried with some scripts, but these take so long, and I have to process a lot of data.
Here is a vectorized way using bsxfun and cumsum:
durations = [2010 1000 500 1030 80 2030 1090]
stepsize = 3600;
idx = sum(bsxfun(@ge, cumsum(durations), (0:stepsize:sum(durations)).'),1)
idx =
1 1 1 2 2 2 3
The accumulated durations you can then get with:
accDuration = accumarray(idx(:),durations(:),[],@sum).'
accDuration =
3510 3140 1090
Explanation:
%// cumulative sum of all durations
csum = cumsum(durations);
%// thresholds
threshs = 0:stepsize:sum(durations);
%// comparison
comp = bsxfun(@ge, csum(:).',threshs(:)) %'
comp =
1 1 1 1 1 1 1
0 0 0 1 1 1 1
0 0 0 0 0 0 1
%// get index
idx = sum(comp,1)
This will get you close . . .
durs = [2010 1000 500 1030 80 2030 1090];
cums = cumsum(durs);
t = 3600;
idx = zeros(size(durs));
while ~all(idx)
idx = idx + (cums <= t);
cums = cums - max(cums(cums <= t));
end
You can then get the output into your preferred format with a simple . .
idx = -(idx-max(idx)-1)
and just in case you don't have enough, yet another way to do it:
durations = [2010 1000 500 1030 80 2030 1090] ;
stepsize = 3600;
cs = cumsum(durations) ;
idxbeg = [1 find(sign([1 diff(mod(cs,stepsize))])==-1)] ; %// first index of each group
idxend = [idxbeg(2:end)-1 numel(durations)] ; %// last index of each group
groupDuration = [cs(idxend(1)) diff(cs(idxend))]
groupIndex = cell2mat( arrayfun(@(x,y) repmat(x,1,y), 1:numel(idxbeg) , idxend-idxbeg+1 , 'uni',0) )
groupDuration =
3510 3140 1090
groupIndex =
1 1 1 2 2 2 3
although if you ask me I find the bsxfun solution more elegant
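For comparison, the grouping rule as stated in the question (start a new group as soon as adding the next duration would exceed the limit) can be sketched as a single pass. The Python below is a minimal sketch with a hypothetical function name, `group_by_limit`. It restarts the accumulation for every group, whereas the vectorized answers compare the running total against multiples of 3600; the two agree on the sample data but can differ on other inputs, for example three durations of 2000.

```python
def group_by_limit(durations, limit=3600):
    """Greedy grouping: return (group index per duration, total per group)."""
    index, totals = [], []
    for d in durations:
        if totals and totals[-1] + d <= limit:
            totals[-1] += d   # still fits in the current group
        else:
            totals.append(d)  # start a new group with this duration
        index.append(len(totals))
    return index, totals
```

On the sample input this yields group indices 1 1 1 2 2 2 3 and group totals 3510, 3140 and 1090. A single duration longer than the limit simply forms a group of its own.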

Resources