Recursion in Array SAS

I have an existing collection of variables a_0,...,a_45 where a_i represents the amount of stuff I have on day i. I'd like to create a new collection of variables b_0,...,b_45 to represent the incremental change in stuff I have on day i (i.e. b_k=a_k-a_(k-1) ). My approach:
data test;
set dataset;
array a a_0-a_45;
array b b_0-b_45;
b(1)=a(1);
do i=2 to 45;
b(i)=a(i)-a(i-1);
end;
run;
However my b variables just come out missing.

What values do a_0 to a_45 hold before the loop starts? The step never initialises them, so unless they are populated in your input dataset, every b(i) is a difference of two a terms of which at least one is missing, and the result is missing as well. (Note that a(1) refers to a_0, because array indexes start at 1 by default.)
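One quick way to check this, as a sketch: count the non-missing and missing values of each a_ variable in the input. It assumes the input dataset is called dataset, as in the question. If the variables do not exist under those names, PROC MEANS stops with a "variable not found" error, which is just as informative.

```sas
/* Sketch: report how many values of each a_ variable are present or missing. */
/* "dataset" is the input dataset name used in the question.                  */
proc means data=dataset n nmiss;
  var a_0-a_45;
run;
```

If NMISS equals the number of observations for every variable, the b values cannot come out as anything but missing.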

Here is some sample code showing that the delta computation is correct when the variable names in the data set align with the variables named in the array statement in the data step.
Sample data
data have(keep=product_id note a_:);
do product_id = 1 to 100;
length note $15;
array amount a_0-a_45;
call missing(of amount(*));
if (ranuni(123) < 0.5) then do;
note = 'static deltas';
static_delta = ceil(5 * ranuni(123));
amount(1) = static_delta;
do inventory_day = 2 to dim(amount);
amount(inventory_day) = amount(inventory_day-1) + static_delta;
end;
end;
else do;
note = 'random deltas';
amount(1) = ceil(5 * ranuni(123));
do inventory_day = 2 to dim(amount);
amount(inventory_day) = max ( 0, amount(inventory_day-1) + floor(10 * ranuni(123)) - 5 );
end;
end;
OUTPUT;
end;
run;
Compute deltas
data want;
set have;
array amount a_0-a_45;
array delta b_0-b_45;
delta(1) = amount(1);
do i=2 to dim(amount);
delta(i) = amount(i) - amount(i-1);
end;
drop i;
format a_: b_: 4.;
run;

As Richard already suggested in his comment while I was writing this: the only error in your code is the loop bound. The arrays have 46 elements (a_0 to a_45), so the loop should run from 2 to 46. The code below should work for you.
%macro f();
data dataset;
%do i = 0 %to 45;
a_&i. = ranuni(2);
%end;
run;
%mend;
%f();
data test;
set dataset;
array a1 a_0-a_45;
array b1 b_0-b_45;
/* This line will help in avoiding b_0 to have a missing value */
b1(1)=a1(1);
do i=2 to 46;
b1(i)=a1(i)-a1(i-1);
end;
run;

Related

How to use call dynamic_array right?

I have tried to write a user-defined function MinDis() and apply it in a data step. The function computes the minimum distance from one point to each element of a numeric array. The code follows:
proc fcmp outlib = work.funcs.Math;
function MinDis(Exp,Arr[*]);
array dis[2] /symbols;
call dynamic_array(dis,dim(Arr));
do i = 1 to dim(Arr);
dis[i] = abs((Exp - Arr[i]));
end;
return(min(of dis[*]));
endsub;
quit;
option cmplib=work.funcs ;
data MinDis;
input LamdazLower LamdazUpper #;
cards;
2.50 10.0
2.51 10.8
2.49 9.97
2.75 9.50
;
run;
data _null_;
set ;
array _PTN_ [14] _temporary_ (0.5,1,1.5,2,2.5,3,4,5,6,7,8,9,10,12);
StdLamZLow = MinDis(LamdazLower,_PTN_);
StdLamZUpp = MinDis(LamdazUpper,_PTN_);
put _all_;
run;
It compiled without errors but gave wrong results: StdLamZLow is only the minimum distance from LamdazLower to the first two elements of the array _PTN_.
When I declare dis with a very large dimension such as 999 and drop the call dynamic_array statement, I get the right answer. But I would like to know why min(of dis[*]) treats dis as a two-element array.
By the way, how can I use an implied DO loop (do over ...) instead of an explicit DO loop? I have tried several times without success.
Thanks for any hints.
I think this happens because of the dynamic array: the MIN function sees only the static length of the dis array (2 in your case). So try computing the minimum of the array without calling the MIN function:
proc fcmp outlib = work.funcs.Math;
function MinDis(Exp,Arr[*]);
length=dim(Arr);
array dis[2] /nosymbols;
call dynamic_array(dis,length);
dis[1]=abs((Exp - Arr[1]));
min=dis[1];
do i = 1 to length;
dis[i] = abs((Exp - Arr[i]));
if dis[i] < min then min=dis[i];
end;
return(min);
endsub;
quit;
Output:
LamdazLower=2.5 LamdazUpper=10 StdLamZLow=0 StdLamZUpp=0
LamdazLower=2.51 LamdazUpper=10.8 StdLamZLow=0.01 StdLamZUpp=0.8
LamdazLower=2.49 LamdazUpper=9.97 StdLamZLow=0.01 StdLamZUpp=0.03
LamdazLower=2.75 LamdazUpper=9.5 StdLamZLow=0.25 StdLamZUpp=0.5
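For comparison, here is a minimal sketch of the same running-minimum idea in a plain DATA step, without PROC FCMP or a dynamic array. The dataset name mindis_check, the array name ptn, and the loop variable k are made up for illustration. It relies on the MIN function ignoring missing arguments, so starting from a missing value is safe. With the sample data from the question it should give the same distances as the output above.

```sas
/* Sketch: minimum distance to a fixed grid, computed with a running minimum. */
data mindis_check;
  input LamdazLower LamdazUpper;
  /* same grid of points as _PTN_ in the question */
  array ptn[14] _temporary_ (0.5,1,1.5,2,2.5,3,4,5,6,7,8,9,10,12);
  StdLamZLow = .;
  StdLamZUpp = .;
  do k = 1 to dim(ptn);
    /* MIN ignores missing values, so the first pass just takes the distance */
    StdLamZLow = min(StdLamZLow, abs(LamdazLower - ptn[k]));
    StdLamZUpp = min(StdLamZUpp, abs(LamdazUpper - ptn[k]));
  end;
  drop k;
  put LamdazLower= LamdazUpper= StdLamZLow= StdLamZUpp=;
  datalines;
2.50 10.0
2.51 10.8
2.49 9.97
2.75 9.50
;
run;
```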
As for implied DO loops (DO OVER), if I understood the question correctly:
data temp;
array dis dis1-dis4;
do over dis;
dis=2;
put _all_;
end;
run;
Output:
_I_=1 dis1=2 dis2=. dis3=. dis4=. _ERROR_=0 _N_=1
_I_=2 dis1=2 dis2=2 dis3=. dis4=. _ERROR_=0 _N_=1
_I_=3 dis1=2 dis2=2 dis3=2 dis4=. _ERROR_=0 _N_=1
_I_=4 dis1=2 dis2=2 dis3=2 dis4=2 _ERROR_=0 _N_=1

Transpose a correlation matrix into one long vector in SAS

I'm trying to turn a correlation matrix into one long column vector such that I have the following structure
data want;
input _name1_$ _name2_$ _corr_;
datalines;
var1 var2 0.54
;
run;
I have the following code, which outputs name1 and corr; however, I'm struggling to get name2!
DATA TEMP_1
(DROP=I J);
ARRAY VAR[*] VAR1-VAR10;
DO I = 1 TO 10;
DO J = 1 TO 10;
VAR(J) = RANUNI(0);
END;
OUTPUT;
END;
RUN;
PROC CORR
DATA=TEMP_1
OUT=TEMP_CORR
(WHERE=(_NAME_ NE " ")
DROP=_TYPE_)
;
RUN;
PROC SORT DATA=TEMP_CORR; BY _NAME_; RUN;
PROC TRANSPOSE
DATA=TEMP_CORR
OUT=TEMP_CORR_T
;
BY _NAME_;
RUN;
Help is appreciated
You're close. You're running into a weird issue with the _NAME_ variable, because PROC TRANSPOSE creates its own _NAME_ variable in the output as well. If you rename it on input, you get what you want. I also list the variables explicitly and add some RENAME= data set options to get what you likely want.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr))
;
by name1;
var var1-var10;
RUN;
Edit: If you don’t want duplicates you can add a WHERE to the OUT dataset.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr) where = (name1 > name2))
;
by name1;
var var1-var10;
RUN;
You can do this with just an ARRAY and the VNAME() function. To output only the upper triangle, set the lower bound of the DO loop to _N_.
data want ;
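/* Declare only the character variables here: a numeric _corr_ defined before the ARRAY statement would be picked up by _numeric_ */
length _name1_ _name2_ $32 ;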
keep _name1_ _name2_ _corr_;
set corr;
where _type_ = 'CORR';
array x _numeric_;
_name1_=_name_;
do i=_n_ to dim(x);
_name2_ = vname(x(i));
_corr_ = x(i);
output;
end;
run;

SAS Summing a dynamic range in array

I have an array with totals for 210 days. I need to find the sum of all 90 day ranges. The new array is med_sum. So med_sum(1) =sum(of Total(32)-total(121)), then med_sum(2)=sum(of total(33)-total(122)), and so on, 90 different times all the way to med_sum(90)=sum(of total(121)-total(210)).
Below is the syntax, but the sum(of) function isn't allowing me to do this and errors out. I have tried quite a few different options but have been unable to find anything that works.
Thank you in advance!!
data work.total_base_3;
set work.total_base_2;
array med_total(*) total1-total210;
array med_sum(*) avg1-avg90;
do i = 1 to 90;
med_sum(i)=sum(of med_total(i+31)-med_total(i+120));
end;
run;
You cannot use array references in variable lists, just actual variable names. So you want to generate 90 sums of 90 values with the window sliding. In essence you want
avg1 = sum(of total32 - total121);
avg2 = sum(of total33 - total122);
avg3 = sum(of total34 - total123);
You could use macro logic to just generate that series of statements. But if you look at the relationship between the variables you can see that
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
So your loop will look something like:
med_sum(1) = sum(of total32-total121);
do n=1 to dim(med_sum)-1;
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
end;
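As a sanity check on the recurrence, here is a small sketch that fills total1-total210 with made-up values (total_k = k) and compares the incremental sums against a brute-force double loop. The dataset name window_check, the temporary array brute, and the variables k and max_diff are hypothetical names used only for this check.

```sas
/* Sketch: verify the sliding-window recurrence against a brute-force sum. */
data window_check;
  array med_total(*) total1-total210;
  array med_sum(*) avg1-avg90;
  array brute(90) _temporary_;

  /* made-up test data: total_k = k */
  do k = 1 to dim(med_total);
    med_total(k) = k;
  end;

  /* incremental version: add the value entering the window, subtract the one leaving */
  med_sum(1) = sum(of total32-total121);
  do n = 1 to dim(med_sum)-1;
    med_sum(n+1) = sum(med_sum(n), med_total(n+1+120), -1*med_total(n+31));
  end;

  /* brute-force version: re-add all 90 values for every window */
  max_diff = 0;
  do n = 1 to dim(med_sum);
    brute(n) = 0;
    do k = n+31 to n+120;
      brute(n) = brute(n) + med_total(k);
    end;
    max_diff = max(max_diff, abs(brute(n) - med_sum(n)));
  end;

  put avg1= avg90= max_diff=;
run;
```

With this test data the log should show avg1=6885 (the sum of 32 to 121), avg90=14895 (the sum of 121 to 210) and max_diff=0.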
Here's a sample that you should be able to extend to your data (in your case you would change the 3's to 90's):
Case 1: data in rows:
data test;
keep obs;
do i=1 to 10;
obs = i;
output;
end;
run;
data test1;
set test;
keep obs sum;
array x[3];
retain x;
x[mod(_n_ -1,3)+1] = obs;
if (_n_ >= 3)then do;
sum = 0;
do i = 1 to 3;
sum= sum + x[i];
end;
end;
run;
Case 2: data in columns (use the test dataset from above):
proc transpose data=test out=testrow;
var obs;
run;
data test2;
set testrow;
array med_total(*) col1-col10;
array med_sum(8) ;
do i = 3 to 10;
med_sum[i-2]=0;
do j = 1 +(i-3) to i;
med_sum(i-2)=med_sum[i-2] + med_total(j);
end;
end;
run;

SAS: Looping through Logistic Regression Score table, calculating profit with cut-off, and output cut-off along with profit into separate table

So to begin, I have completed a binary logistic regression model, and output several tables. I have a 'scored' data set that contains the actual default of a customer (GOODBAD) which is binary. I then have a predicted probability of default (p_1) that ranges from 0 to 1. I then must decide a cut-off value, generate a new variable that is a predicted-default that is now binary.
What I'm attempting to do is loop through potential cut-off values (lets say from .1 to .5 by a step of .1), and then calculate 'profit' from each of these 5 steps. I then want both the cut-off value and the 'profit' value in a separate data set to generate a graph of this relationship so that I may find my maximal profit.
Below is the code I currently have for generating a specific cut-off and its associated profit. (The proc report shouldn't change at all, as these are pre-determined values for accounts/situations.)
%MACRO PROFIT;
%DO I=1 %TO 5;
DATA TEST&i;
SET TRANS.SCORE;
IF P_1 >= .&i THEN preds = 1;
ELSE preds = 0;
RUN;
Data probs&i;
format outcometype $6.;
Set TEST&i (keep=preds goodbad crelim);
crelim2 = crelim/2;
if (preds=1 and goodbad=0) then do;
outcometype="error2";
profit =0;
end;
else if (preds=0 and goodbad=1) then do;
outcometype ="error1";
profit =-crelim2;
end;
else if (preds=1 and goodbad=1) then do;
outcometype="valid1";
profit=0;
end;
else do;
outcometype="valid2";
profit=250;
end;
run;
PROC REPORT DATA= probs&i nowd out=table&i;
COLUMN outcometype pct n profit pper1000;
DEFINE outcometype /group width = 8 ;
DEFINE profit /format=dollar15.2;
define pper1000 / computed format=dollar15.2;
/*get the overall number of obs*/
compute before;
overall=n;
endcomp;
compute pper1000;
pper1000 = (profit.sum/n)*1000;
endcomp;
compute before outcometype;
totaln=n;
endcomp;
compute pct;
pct = (totaln/overall);
if _break_ = '_RBREAK_' then pct= (overall/overall);
endcomp;
rbreak after/summarize dol;
RUN;
quit;
Data out;
set table&i;
CALL symput('profitAT', PUT(pper1000));
run;
Data new;
CutOFF = .&i;
profit = &profitAT;
run;
%END;
*proc print data = new; Run;
%MEND PROFIT;
%PROFIT;
The problem I cannot resolve is that each pass of the loop overwrites the previous values of both i and pper1000 (that is, the macro variable profitAT) with the most recent ones.
I don't know how to get these written out as separate observations. Should I loop through my macro variables? Since I am placing them in a data step, should I have a separate loop that checks _N_ and writes an observation as _N_ increases? Or is there an alternative I have yet to discover?
Let's ignore the logic that is generating the dataset TABLE&i and the variable PPER1000 and concentrate on the looping aspects.
You can use PROC APPEND as a method to aggregate values in your loop.
%macro profit ;
%if %sysfunc(exist(new)) %then %do;
* Remove existing NEW dataset on first pass;
proc delete data=new;
run;
%end;
%do i=1 %to 5;
* ... generate TABLE&i ... ;
* Get value of PPER1000 from last observation of TABLE&i ;
data add;
set table&i end=eof ;
if eof;
cutoff = .&i;
profit = pper1000;
keep cutoff profit;
run;
* Accumulate results in NEW dataset. ;
proc append base=new data=add force ;
run;
%end;
proc print data = new; run;
%mend profit ;
%profit;
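Once NEW holds one row per cut-off, the relationship the question wants to graph can be plotted directly. A minimal sketch, assuming NEW contains the variables cutoff and profit created above; the axis labels are only suggestions:

```sas
/* Sketch: plot profit against cut-off from the accumulated NEW dataset. */
proc sgplot data=new;
  series x=cutoff y=profit / markers;
  xaxis label="Cut-off";
  yaxis label="Profit per 1000";
run;
```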

Count the number of times a value occurs

I have 7 variables, 489 observations with variable values of 0-4.
What I need is the count percentage of use.
Answers 0,1 stand for non usage, and answers 2,3,4 stand for usage.
I created 7 additional vars and turned all the values above to:
1=usage - 0=non-usage.
Now, I don't know how to count and present how many "1" I have for each var and divide it by 489.
data LAB7;
set LAB3;
array v{*} v21-v27;
array VU{7};
DO i=1 to dim(v);
if v[i] = 1|0 THEN VU[i]=0;
else VU[i]=1;
END;
run;
You can do this:
data usage;
set lab3 end=eof;
array v{*} v21-v27;
array n{7};
retain n: 0;
do i = 1 to dim(v);
if v[i] in (2, 3, 4) then n[i] + 1;
end;
if eof then do j = 1 to dim(v);
variable = vname(v[j]);
pct_usage = 100 * n[j] / _n_;
output;
end;
keep variable pct_usage;
run;
This creates an array of counters, one per variable, that are incremented by one whenever the corresponding variable is equal to 2, 3, or 4.
At the end of the data step, we output a record for each variable and record the percentage as the counter divided by the number of observations (_n_ when eof is true).
An alternative would be to use proc freq.
data indicators;
set lab3;
array v{*} v21-v27;
array ind{7};
do i = 1 to dim(v);
ind[i] = (v[i] in (2, 3, 4));
end;
run;
proc freq data = indicators;
tables ind: / out = usage;
run;
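This creates binary indicator variables, one for each of the input variables, that are 1 when the input is 2, 3, or 4, and 0 otherwise. Counts and percentages are then obtained using proc freq. Note that when a TABLES statement requests several tables, the OUT= dataset contains only the last one, so read the other tables from the printed output or request them one at a time.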
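If the goal is one percentage per variable in a single dataset, another option is to take the mean of each 0/1 indicator, since the mean of an indicator is the proportion of ones. A sketch, assuming the indicators dataset and ind1-ind7 from the step above; the output dataset name usage_pct is made up, and the AUTONAME option names the results ind1_Mean to ind7_Mean:

```sas
/* Sketch: the mean of a 0/1 indicator is the share of observations equal to 1. */
proc means data=indicators noprint;
  var ind1-ind7;
  output out=usage_pct(drop=_type_ _freq_) mean= / autoname;
run;

/* Convert proportions to percentages. */
data usage_pct;
  set usage_pct;
  array m{*} _numeric_;   /* only the mean columns exist at this point */
  do i = 1 to dim(m);
    m[i] = 100 * m[i];
  end;
  drop i;
run;
```

Because the indicator is 0 when the input is missing, the denominator here is the full number of observations (489 in the question).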
