Loops and output - arrays

I am trying to get a bit handy with my loop and output statements, currently I have a loan which amortizes like such:
data have;
input Payment2017 Payment2018 Payment2019 Payment2020;
datalines;
100 10 10 10;
run;
I'm trying to create a maturity and re-issuance profile that looks like this, I will explain the logic when I submit my current code:
data want;
input;
P2017 P2018 P2019 P2020 F1 F2 F3 MP2017 MP2018 MP2019 MP2020 NI2017 NI2018 NI2019 NI2020;
datalines;
100 10 10 10 0.1 0.1 0.1 100 10 10 10 0 0 0 0
100 10 10 10 0.1 0.1 0.1 0 10 1 1 0 10 0 0
100 10 10 10 0.1 0.1 0.1 0 0 11 1.1 0 0 11 0
100 10 10 10 0.1 0.1 0.1 0 0 0 12.1 0 0 0 12.1
;
run;
so the logic is that:
Payment2017 = the balance at the start of the year
Payment2018 - 2020 = the amount paid each period
F1-F3 is the fraction of the loan that is being paid each period.
MP2017-MP2020 is the amount of the loan that is paid back - essentially it is
mp(i) = p(i) *f(i)
NI2017-NI2020 is the amount that is newly issued if you assume that each time I pay off a bit of the loan , it is added back onto the loan. so the current code which I am using looks like this but i'm having some issues with the ouput and loops.
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
newissuance(i) = MaturityProfile(i);
end;
output;
iter=iter+1;
end;
output;
*MaturityProfile(4) = ( P(3) + MaturityProfile(2) ) * Fraction(1);
*output;
drop i;
drop j;
drop iter;
run;
I'm trying to find a way of for the first two rows, keeping it how it outputs currently but the third row needs the sum of the column for the second row ( or the newissuance2019) and then multiply that by fraction 1
so basically the output to look like the table I've put in the data want step.
TIA.

I managed to fix this by doing:
data want;
set have;
array MaturityProfile(4) MaturityProfile&StartDate-MaturityProfile&EndDate;
array NewIssuance(4) NewIssuance&StartDate - NewIssuance&EndDate;
array p(4) payment&StartDate-payment&EndDate;
array fraction(3); * track constant fraction determined at start of profile;
array Total(4) Total1-Total4;
MaturityProfile(1) = P(1);
do i = 1 to 3;
fraction(i) = p(i+1) / p(1);
end;
iter=2;
do j = 1 to 2;
do i = iter to 4;
MaturityProfile(i) = P(j) * Fraction(i-j);
Total(i)=MaturityProfile(i) + P(i);
end;
output;
iter=iter+1;
end;
MaturityProfile(4) = Total(3) * Fraction(1);
output;
drop i;
drop j;
drop iter;
run;

Related

How to get the count of values greater than zero from a subset of an array in SAS

I want to get a data set with an array that saves the count of values greater than zero in a subset of an array.
My code:
%Macro Test(input_array, window);
array initial{*} &input_array;
array position[&window];
array cumulative[&window];
/* Fill array indicating position with value zero, previous value greater than zero */
do i = 1 to dim(initial) - 1;
if initial(i) gt 0 and initial(i+1) eq 0 then
position(i) = i + 1;
end;
/* Fill array indicating the count of values greater than zero until the index in the position array*/
%let j = 1;
%do %while (&j lt &window);
end_ = coalesce(of position&j - position&window);
if not missing(end_) then do;
gt_0_cnt = 0;
do k = &j to end_ - 1;
gt_0_cnt + ifn(initial(k) > 0,1,0);
end;
cumulative(end_ - 1) = gt_0_cnt;
end;
%let j = %eval(&j + end_);
%end;
%Mend;
DATA HAVE;
INPUT ID FM1-FM18;
DATALINES;
A 1 2 0 0 1 0 0 0 0 2 2 2 3 3 4 4 4 0
B 0 0 1 2 3 4 5 1 2 3 4 0 0 0 1 2 0 0
;
RUN;
DATA WANT;
SET HAVE;
%Test(FM: 18);
RUN;
The output I need:
But I have a problem when trying to evaluate this expression
%let j = %eval(&j + end_)
I get the messaje ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was:
1 + end_
I don't know of any other way to get the desired result.
If someone can help me I will be grateful.
Doesn't seem like you need the macro language for this.
data want;
set have;
array fm fm:;
array cum cum_1-cum_18;
do _i = 1 to dim(fm);
if fm[_i] eq 0 then call missing(cum[_i]);
else do;
do count = 1 by 1 until (fm[_i+count] eq 0 or (count+_i eq dim(fm)));
end;
put _i= count=;
cum[_i+count-1] = count;
_i = _i + count - 1;
end;
end;
run;
Obviously you can specify the 18 max on the cum array through a macro parameter, or what the variable names are, but all of the stuff you're doing is perfectly doable through the data step language or simple macro variable parameters.

Bound my result to [-1,1] in SAS

I am calculating a fraction based on the payments made in four years and I wish to put a cap on my fraction such that it can only be between -1 and 1. Subsequently I'd like to make the following fractions 0 if the cap is maxxed out - an example would be:
data want;
input payment1 payment2 payment3 payment4 fraction1 fraction2 fraction3;
datalines;
100 25 25 25 0.25 0.25 0.25
150 50 50 50 0.33 0.33 0.33
50 10 10 10 0.2 0.2 0.2
10 50 60 70 1 0 0
;
run;
I've been looking at the ceiling function with the following code
data want2;
set want;
array fraction(3) fraction1 - fraction3;
array payment(4) payment1 - payment4;
do i = 2 to 4;
fraction(i-1) = payment(i)/payment(1);
end;
run;
data want3;
set want2;
array fraction(3) fraction1 - fraction3;
array fract(3) fract1-fract3;
do i = 1 to 3;
fract = ceil (fraction,1);
end;
drop i;
run;
but I am getting this error
ERROR 72-185: The CEIL function call has too many arguments.
So in all i'm looking for a way to calculate the fraction of the payments and then make a ceiling at one, then once the ceiling is hit, the subsequent fractions must be zero (which could be done I suppose by just doing an IF-THEN)
The ceil function is a type of rounding. You need min and max:
do i = 1 to 3;
fract = min(max(fraction, -1) ,1);
end;

Working with an upper-triangular array in SAS (challenge +2 Points)

I'm looking to improve my code efficiency by turning my code into arrays and loops. The data i'm working with starts off like this:
ID Mapping Asset Fixed Performing Payment 2017 Payment2018 Payment2019 Payment2020
1 Loan1 1 1 1 90 30 30 30
2 Loan1 1 1 0 80 20 40 20
3 Loan1 1 0 1 60 40 10 10
4 Loan1 1 0 0 120 60 30 30
5 Loan2 ... ... ... ... ... ... ...
So For each ID (essentially the data sorted by Mapping, Asset, Fixed and then Performing) I'm looking to build a profile for the Payment Scheme.
The Payment Vector for the first ID looks like this:
PaymentVector1 PaymentVector2 PaymentVector3 PaymentVector4
1 0.33 0.33 0.33
It is represented by the formula
PaymentVector(I)=Payment(I)/Payment(1)
The above is fine to create in an array, example code can be given if you wish.
Next, under the assumption that every payment made is replaced i.e. when 30 is paid in 2018, it must be replaced, and so on.
I'm looking to make a profile that shows the outflows (and for illustration, but not required in code, in brackets inflows) for the movement of the payments as such - For ID=1:
Payment2017 Payment2018 Payment2019 Payment2020
17 (+90) -30 -30 -30
18 N/A (+30) -10 -10
19 N/A N/A (+40) -13.3
20 N/A N/A N/A (+53.3)
so if you're looking forwards, the rows can be thought of what year it is and the columns representing what years are coming up.
Hence, in year 2019, looking at what is to be paid in 2017 and 2018 is N/A because those payments are in the past / cannot be paid now.
As for in year 2018, looking at what has to be paid in 2019, you have to pay one-third of the money you have now, so -10.
I've been working to turn this dataset row by row into the array but there surely has to be a quicker way using an array:
The Code I've used so far looks like:
Data Want;
Set Have;
Array Vintage(2017:2020) Vintage2017-Vintage2020;
Array PaymentSchedule(2017:2020) PaymentSchedule2017-PaymentSchedule2020;
Array PaymentVector(2017:2020) PaymentVector2017-PaymentVector2020;
Array PaymentVolume(2017:2020) PaymentVolume2017-PaymentVolume2020;
do i=1 to 4;
PaymentVector(i)=PaymentSchedule(i)/PaymentSchedule(1);
end;
I'll add code tomorrow... but the code doesn't work regardless.
data have;
input
ID Mapping $ Asset Fixed Performing Payment2017 Payment2018 Payment2019 Payment2020; datalines;
1 Loan1 1 1 1 90 30 30 30
2 Loan1 1 1 0 80 20 40 20
3 Loan1 1 0 1 60 40 10 10
4 Loan1 1 0 0 120 60 30 30
data want(keep=id payment: fraction:);
set have;
array p payment:;
array fraction(4); * track constant fraction determined at start of profile;
array out(4); * track outlay for ith iteration;
* compute constant (over iterations) fraction for row;
do i = dim(p) to 1 by -1;
fraction(i) = p(i) / p(1);
end;
* reset to missing to allow for sum statement, which is <variable> + <expression>;
call missing(of out(*));
out(1) = p(1);
do iter = 1 to 4;
p(iter) = out(iter);
do i = iter+1 to dim(p);
p(i) = -fraction(i) * p(iter);
out(i) + (-p(i)); * <--- compute next iteration outlay with ye olde sum statement ;
end;
output;
p(iter) = .;
end;
format fract: best4. payment: 7.2;
run;
You've indexed your arrays with 2017:2020 but then try and use them using the 1 to 4 index. That won't work, you need to be consistent.
Array PaymentSchedule(2017:2020) PaymentSchedule2017-PaymentSchedule2020;
Array PaymentVector(2017:2020) PaymentVector2017-PaymentVector2020;
do i=2017 to 2020;
PaymentVector(i)=PaymentSchedule(i)/PaymentSchedule(2017);
end;

Find timeline for duration values in Matlab

I have the following time-series:
b = [2 5 110 113 55 115 80 90 120 35 123];
Each number in b is one data point at a time instant. I computed the duration values from b. Duration is represented by all numbers within b larger or equal to 100 and arranged consecutively (all other numbers are discarded). A maximum gap of one number smaller than 100 is allowed. This is how the code for duration looks like:
N = 2; % maximum allowed gap
duration = cellfun(#numel, regexp(char((b>=100)+'0'), [repmat('0',1,N) '+'], 'split'));
giving the following duration values for b:
duration = [4 3];
I want to find the positions (time-lines) within b for each value in duration. Next, I want to replace the other positions located outside duration with zeros. The result would look like this:
result = [0 0 3 4 5 6 0 0 9 10 11];
If anyone could help, it would be great.
Answer to original question: pattern with at most one value below 100
Here's an approach using a regular expression to detect the desired pattern. I'm assuming that one value <100 is allowed only between (not after) values >=100. So the pattern is: one or more values >=100 with a possible value <100 in between .
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(#ge, y, s(:)) & bsxfun(#le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
This gives
y =
0 0 3 4 5 6 0 0 9 10 11
Answer to edited question: pattern with at most n values in a row below 100
The regexp needs to be modified, and it has to be dynamically built as a function of n:
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
n = 2;
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
r = sprintf('1+(.{1,%i}1+)*', n); %// build the regular expression from n
[s, e] = regexp(B, r, 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(#ge, y, s(:)) & bsxfun(#le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
Here is another solution, not using regexp. It naturally generalizes to arbitrary gap sizes and thresholds. Not sure whether there is a better way to fill the gaps. Explanation in comments:
% maximum step size and threshold
N = 2;
threshold = 100;
% data
b = [2 5 110 113 55 115 80 90 120 35 123];
% find valid data
B = b >= threshold;
B_ind = find(B);
% find lengths of gaps
step_size = diff(B_ind);
% find acceptable steps (and ignore step size 1)
permissible_steps = 1 < step_size & step_size <= N;
% find beginning and end of runs
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);
% fill gaps in B
for ii = 1:numel(good_begin)
B(good_begin(ii):good_end(ii)) = true;
end
% find durations of runs in B. This finds points where we switch from 0 to
% 1 and vice versa. Due to padding the first match is always a start of a
% run, the last one always an end. There will be an even number of matches,
% so we can reshape and diff and thus fidn the durations
durations = diff(reshape(find(diff([false, B, false])), 2, []));
% get positions of 'good' data
outpos = zeros(size(b));
outpos(B) = find(B);

Calculating moving average using do loop in SAS

I am trying to find a way to calculate a moving average using SAS do loops. I am having difficulty. I essentially want to calculate a 4 unit moving average.
DATA data;
INPUT a b;
CARDS;
1 2
3 4
5 6
7 8
9 10
11 12
13 14
15 16
17 18
;
run;
data test(drop = i);
set data;
retain c 0;
do i = 1 to _n_-4;
c = (c+a)/4;
end;
run;
proc print data = test;
run;
One option is to use the merge-ahead:
DATA have;
INPUT a b;
CARDS;
1 2
3 4
5 6
7 8
9 10
11 12
13 14
15 16
17 18
;
run;
data want;
merge have have(firstobs=2 rename=a=a_1) have(firstobs=3 rename=a=a_2) have(firstobs=4 rename=a=a_3);
c = mean(of a:);
run;
Merge the data to itself, each time the merged dataset advancing one - so the 2nd starts with 2, third starts with 3, etc. That gives you all 4 'a' on one line.
SAS has a lag() function. What this does is create the lag of the variable it is applied to. SO for example, if your data looked like this:
DATA data;
INPUT a ;
CARDS;
1
2
3
4
5
;
Then the following would create a lag one, two, three etc variable;
data data2;
set data;
a_1=lag(a);
a_2=lag2(a);
a_3=lag3(a);
drop b;
run;
would create the following dataset
a a_1 a_2 a_3
1 . . .
2 1 . .
3 2 1 .
4 3 2 1
etc.
Moving averages can be easily calculated from these.
Check out http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm
(Please note, I did not get a chance to run the codes, so they may have errors.)
Straight from Cody's Collection of Popular Programming Tasks and How to Tackle them.
*Presenting a macro to compute a moving average;
%macro Moving_ave(In_dsn=, /*Input data set name */
Out_dsn=, /*Output data set name */
Var=, /*Variable on which to compute
the average */
Moving=, /* Variable for moving average */
n= /* Number of observations on which
to compute the average */);
data &Out_dsn;
set &In_dsn;
***compute the lags;
_x1 = &Var;
%do i = 1 %to &n - 1;
%let Num = %eval(&i + 1);
_x&Num = lag&i(&Var);
%end;
***if the observation number is greater than or equal to the
number of values needed for the moving average, output;
if _n_ ge &n then do;
&Moving = mean (of _x1 - _x&n);
output;
end;
drop _x:;
run;
%mend Moving_ave;
*Testing the macro;
%moving_Ave(In_dsn=data,
Out_dsn=test,
Var=a,
Moving=Average,
n=4)

Resources