How to use call dynamic_array correctly?

I have tried to write a user-defined function MinDis() and apply it in a data step. The function is meant to compute the minimum distance from one point to the elements of a numeric array. The code follows:
proc fcmp outlib = work.funcs.Math;
function MinDis(Exp,Arr[*]);
array dis[2] /symbols;
call dynamic_array(dis,dim(Arr));
do i = 1 to dim(Arr);
dis[i] = abs((Exp - Arr[i]));
end;
return(min(of dis[*]));
endsub;
quit;
option cmplib=work.funcs ;
data MinDis;
input LamdazLower LamdazUpper;
cards;
2.50 10.0
2.51 10.8
2.49 9.97
2.75 9.50
;
run;
data _null_;
set ;
array _PTN_ [14] _temporary_ (0.5,1,1.5,2,2.5,3,4,5,6,7,8,9,10,12);
StdLamZLow = MinDis(LamdazLower,_PTN_);
StdLamZUpp = MinDis(LamdazUpper,_PTN_);
put _all_;
run;
It compiled without errors but gave wrong results. StdLamZLow only gets the minimum distance from LamdazLower to the first two elements of the array _PTN_.
When I declare dis with a dimension of 999 (or something else very big) and remove the call dynamic_array statement, I get the right result. But I would like to know why min(of dis[*]) treats dis as a 2-element array.
By the way, how can I use an implied DO loop (do over ...) instead of explicit DO loops? I have tried several times but haven't succeeded yet.
Thanks for any hints.

I think this happens because of the dynamic array. The MIN function sees only the static length of the dis array (2 in your case). So you should try to compute the minimum of the array without calling the MIN function:
proc fcmp outlib = work.funcs.Math;
function MinDis(Exp,Arr[*]);
length=dim(Arr);
array dis[2] /nosymbols;
call dynamic_array(dis,length);
dis[1]=abs((Exp - Arr[1]));
min=dis[1];
do i = 1 to length;
dis[i] = abs((Exp - Arr[i]));
if dis[i] < min then min=dis[i];
end;
return(min);
endsub;
quit;
Output:
LamdazLower=2.5 LamdazUpper=10 StdLamZLow=0 StdLamZUpp=0
LamdazLower=2.51 LamdazUpper=10.8 StdLamZLow=0.01 StdLamZUpp=0.8
LamdazLower=2.49 LamdazUpper=9.97 StdLamZLow=0.01 StdLamZUpp=0.03
LamdazLower=2.75 LamdazUpper=9.5 StdLamZLow=0.25 StdLamZUpp=0.5
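Since the distances are only needed to find their minimum, the work array (and with it call dynamic_array) can be avoided altogether by keeping a running minimum. The following is a minimal sketch under that assumption; the function name MinDis2 and the local variables best and d are hypothetical, not part of the original code.

```sas
proc fcmp outlib = work.funcs.Math;
   /* Hypothetical variant of MinDis that needs no work array */
   function MinDis2(Exp, Arr[*]);
      best = .;                          /* running minimum, missing until first value */
      do i = 1 to dim(Arr);
         d = abs(Exp - Arr[i]);          /* distance to the i-th element */
         /* a missing value sorts below every number, so test for it explicitly */
         if best = . or d < best then best = d;
      end;
      return(best);
   endsub;
quit;
```

It would be called the same way as the original, for example StdLamZLow = MinDis2(LamdazLower, _PTN_); with option cmplib=work.funcs in effect.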
In addition, regarding implied DO loops, if I understood the question correctly:
data temp;
array dis dis1-dis4;
do over dis;
dis=2;
put _all_;
end;
run;
Output:
_I_=1 dis1=2 dis2=. dis3=. dis4=. _ERROR_=0 _N_=1
_I_=2 dis1=2 dis2=2 dis3=. dis4=. _ERROR_=0 _N_=1
_I_=3 dis1=2 dis2=2 dis3=2 dis4=. _ERROR_=0 _N_=1
_I_=4 dis1=2 dis2=2 dis3=2 dis4=2 _ERROR_=0 _N_=1

Related

SAS array unable to process long list of variables

I am trying to apply log, square, cubic and log-odds transformations to my input data to get an exhaustive overview of the best-performing transformation in univariate regression.
I have tried the following code on a dataset with 1,000 variables. It returns an error, runs out of memory, or simply cannot execute. Are there any limitations on transforming variables en masse in this way using arrays?
/*Create a table for reference*/
DATA input_data;
ARRAY var_[*] var_1-var_1000;
DO i = 1 to 1000;
DO i = 1 to 1000;
var_(i)= i*j;
output;
END;
END;
RUN;
/*Log, square, cubic, logit transform all variables*/
DATA input_transform;
SET input_data;
ARRAY var[*] var_1-var_1000;
ARRAY log[*] log_1-log_1000;
ARRAY logit[*] logit_1-logit_1000;
ARRAY sq[*] sq_1-sq_1000;
ARRAY cubic[*] cubic_1-cubic_1000;
DO i = 1 to 1000;
log(i) = log(var(i));
logit(i) = log((var(i))/(1-var(i)));
sq(i) = var(i)**2;
cubic(i) = var(i)**3;
END;
RUN;
Expected result: a new dataset with 5,000 variables, holding each variable together with its respective transformations.
You are using I as the index variable for both of your nested DO loops. That is probably messing them up.
Also your first data step is writing 1,000,000 observations of 1,002 variables with only the lower left triangle of the "array" filled in. Do you really want the OUTPUT statement inside the loop?
Hypothetically there are no issues with this, as long as your code is correct. Here's an example and the log.
option notes;
%let size=1000;
/*Create a table for reference*/
DATA input_data;
ARRAY var_[*] var_1-var_&size.;
DO i = 1 to &size.;
DO j = 1 to &size.;
var_(j)= i*j;
END;
output;
END;
RUN;
/*Log, square, cubic, logit transform all variables*/
DATA input_transform;
SET input_data;
ARRAY _var[*] var_1-var_&size.;
ARRAY _log[*] log_1-log_&size.;
ARRAY _logit[*] logit_1-logit_&size.;
ARRAY _sq[*] sq_1-sq_&size.;
ARRAY _cubic[*] cubic_1-cubic_&size.;
DO i = 1 to &size.;
_log(i) = log(_var(i));
_logit(i) = sqrt(_var(i));
_sq(i) = _var(i)**2;
_cubic(i) = _var(i)**3;
END;
RUN;
and the log:
1576 option notes;
1577 %let size=1000;
1578
1579 /*Create a table for reference*/
1580 DATA input_data;
1581 ARRAY var_[*] var_1-var_&size.;
1582
1583 DO i = 1 to &size.;
1584 DO j = 1 to &size.;
1585 var_(j)= i*j;
1586 END;
1587 output;
1588 END;
1589 RUN;
NOTE: The data set WORK.INPUT_DATA has 1000 observations and 1002
variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
1590
1591 /*Log, square, cubic, logit transform all variables*/
1592 DATA input_transform;
1593 SET input_data;
1594 ARRAY _var[*] var_1-var_&size.;
1595 ARRAY _log[*] log_1-log_&size.;
1596 ARRAY _logit[*] logit_1-logit_&size.;
1597 ARRAY _sq[*] sq_1-sq_&size.;
1598 ARRAY _cubic[*] cubic_1-cubic_&size.;
1599
1600 DO i = 1 to &size.;
1601 _log(i) = log(_var(i));
1602 _logit(i) = sqrt(_var(i));
1603 _sq(i) = _var(i)**2;
1604 _cubic(i) = _var(i)**3;
1605 END;
1606 RUN;
NOTE: There were 1000 observations read from the data set
WORK.INPUT_DATA.
NOTE: The data set WORK.INPUT_TRANSFORM has 1000 observations and 5002
variables.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.10 seconds
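Note that the example above assigns sqrt() to the _logit array rather than the log-odds formula from the question. The log-odds log(p/(1-p)) is only defined for values strictly between 0 and 1, and the generated test values i*j fall outside that range. If the real data are proportions, a guarded version might look like the sketch below; this is an assumption about the intended transformation, not part of the answer's tested code.

```sas
/* Sketch: log-odds transform, guarded so out-of-range values become missing */
data input_transform;
   set input_data;
   array _var[*]   var_1-var_&size.;
   array _logit[*] logit_1-logit_&size.;
   do i = 1 to dim(_var);
      /* log-odds is only defined for 0 < p < 1 */
      if 0 < _var[i] < 1 then _logit[i] = log(_var[i] / (1 - _var[i]));
      else _logit[i] = .;
   end;
   drop i;
run;
```

Setting the result to missing explicitly avoids the invalid-argument notes that log() writes to the log for non-positive arguments.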

Transpose a correlation matrix into one long vector in SAS

I'm trying to turn a correlation matrix into one long column vector such that I have the following structure
data want;
input _name1_$ _name2_$ _corr_;
datalines;
var1 var2 0.54
;
run;
I have the following code, which outputs name1 and corr; however, I'm struggling to get name2!
DATA TEMP_1
(DROP=I J);
ARRAY VAR[*] VAR1-VAR10;
DO I = 1 TO 10;
DO J = 1 TO 10;
VAR(J) = RANUNI(0);
END;
OUTPUT;
END;
RUN;
PROC CORR
DATA=TEMP_1
OUT=TEMP_CORR
(WHERE=(_NAME_ NE " ")
DROP=_TYPE_)
;
RUN;
PROC SORT DATA=TEMP_CORR; BY _NAME_; RUN;
PROC TRANSPOSE
DATA=TEMP_CORR
OUT=TEMP_CORR_T
;
BY _NAME_;
RUN;
Help is appreciated
You're close. You're running into a weird issue with the _NAME_ variable because PROC TRANSPOSE also creates a variable with that name in its output. If you rename it, you get what you want. I also list the variables explicitly and add some RENAME= data set options to get what you likely want.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr))
;
by name1;
var var1-var10;
RUN;
Edit: If you don’t want duplicates you can add a WHERE to the OUT dataset.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename=(_name_ = Name2 col1=corr) where=(name1 > name2))
;
by name1;
var var1-var10;
RUN;
You just need an ARRAY and the VNAME() function. To output only the upper triangle, set the lower bound of the DO loop to _N_.
data want ;
length _name1_ _name2_ $32 _corr_ 8 ;
keep _name1_ _name2_ _corr_;
set corr;
where _type_ = 'CORR';
array x _numeric_;
_name1_=_name_;
do i=_n_ to dim(x);
_name2_ = vname(x(i));
_corr_ = x(i);
output;
end;
run;

Recursion in Array SAS

I have an existing collection of variables a_0,...,a_45 where a_i represents the amount of stuff I have on day i. I'd like to create a new collection of variables b_0,...,b_45 to represent the incremental change in stuff I have on day i (i.e. b_k=a_k-a_(k-1) ). My approach:
data test;
set dataset;
array a a_0-a_45;
array b b_0-b_45;
b(1)=a(1);
do i=2 to 45;
b(i)=a(i)-a(i-1);
end;
run;
However my b variables just come out missing.
What initial values do you have for a_1 to a_45 before you start the loop? As you are not initialising them (except for a_0 ≡ a(1)), every b(i) term will be a difference of two a terms, of which at least one will be missing, unless these variables are populated in your input dataset.
Here is some sample code showing that the delta computation is correct when the variable names in the data set align with the variables named in the array statement in the data step.
Sample data
data have(keep=product_id note a_:);
do product_id = 1 to 100;
length note $15;
array amount a_0-a_45;
call missing(of amount(*));
if (ranuni(123) < 0.5) then do;
note = 'static deltas';
static_delta = ceil(5 * ranuni(123));
amount(1) = static_delta;
do inventory_day = 2 to dim(amount);
amount(inventory_day) = amount(inventory_day-1) + static_delta;
end;
end;
else do;
note = 'random deltas';
amount(1) = ceil(5 * ranuni(123));
do inventory_day = 2 to dim(amount);
amount(inventory_day) = max ( 0, amount(inventory_day-1) + floor(10 * ranuni(123)) - 5 );
end;
end;
OUTPUT;
end;
run;
Compute deltas
data want;
set have;
array amount a_0-a_45;
array delta b_0-b_45;
delta(1) = amount(1);
do i=2 to dim(amount);
delta(i) = amount(i) - amount(i-1);
end;
drop i;
format a_: b_: 4.;
run;
As Richard already suggested in his comment while I was writing this code, basically the only error in your code is the loop bound: it should run from 2 to 46, because there are 46 elements in the array. The code below should work for you.
%macro f();
data dataset;
%do i = 0 %to 45;
a_&i. = ranuni(2);
%end;
run;
%mend;
%f();
data test;
set dataset;
array a1 a_0-a_45;
array b1 b_0-b_45;
/* This line will help in avoiding b_0 to have a missing value */
b1(1)=a1(1);
do i=2 to 46;
b1(i)=a1(i)-a1(i-1);
end;
run;
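The off-by-one arises because a SAS array is indexed from 1 by default, so a(1) refers to a_0. One way to sidestep this, sketched here on the assumption that the variables are named exactly a_0 to a_45 as in the question, is to declare explicit bounds so that the subscript matches the variable suffix:

```sas
data test;
   set dataset;
   /* explicit bounds 0:45, so a[k] is a_k and b[k] is b_k */
   array a[0:45] a_0-a_45;
   array b[0:45] b_0-b_45;
   b[0] = a[0];                 /* no previous day, so the increment is the level itself */
   do i = 1 to 45;
      b[i] = a[i] - a[i-1];     /* b_k = a_k - a_(k-1) */
   end;
   drop i;
run;
```

With these bounds, the loop limits read the same as the formula in the question, which makes a mismatch between 45 and 46 harder to introduce.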

SAS Summing a dynamic range in array

I have an array with totals for 210 days. I need to find the sum of all 90 day ranges. The new array is med_sum. So med_sum(1) =sum(of Total(32)-total(121)), then med_sum(2)=sum(of total(33)-total(122)), and so on, 90 different times all the way to med_sum(90)=sum(of total(121)-total(210)).
Below is the syntax, but the sum(of) function isn't allowing me to do this and errors out. I have tried quite a few different options but have been unable to find anything that works.
Thank you in advance!!
data work.total_base_3;
set work.total_base_2;
array med_total(*) total1-total210;
array med_sum(*) avg1-avg90;
do i = 1 to 90;
med_sum(i)=sum(of med_total(i+31)-med_total(i+120));
end;
run;
You cannot use array references in variable lists, just actual variable names. So you want to generate 90 sums of 90 values with the window sliding. In essence you want
avg1 = sum(of total32 - total121);
avg2 = sum(of total33 - total122);
avg3 = sum(of total34 - total123);
You could use macro logic to just generate that series of statements. But if you look at the relationship between the variables you can see that
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
So your loop will look something like:
med_sum(1) = sum(of total32-total121);
do n=1 to dim(med_sum)-1;
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
end;
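To cross-check the recurrence, the same 90 sums can also be computed directly with a nested loop. This is only a sketch using the data set and array names from the question; it does 90 x 90 additions per observation instead of roughly 270, so it is mainly useful for verification.

```sas
data work.total_base_3;
   set work.total_base_2;
   array med_total(*) total1-total210;
   array med_sum(*) avg1-avg90;
   do i = 1 to dim(med_sum);
      med_sum(i) = .;                        /* start missing, as sum(of ...) would for an all-missing window */
      do j = i + 31 to i + 120;              /* window total(i+31) .. total(i+120) */
         med_sum(i) = sum(med_sum(i), med_total(j));
      end;
   end;
   drop i j;
run;
```

Both versions use the SUM function, which ignores missing arguments, so they should agree window by window, apart from possible floating-point rounding in the running sum when the totals are not integers.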
Here's a sample that you should be able to extend to your data (in your case you would change the 3s to 90s):
Case 1: data in rows:
data test;
keep obs;
do i=1 to 10;
obs = i;
output;
end;
run;
data test1;
set test;
keep obs sum;
array x[3];
retain x;
x[mod(_n_ -1,3)+1] = obs;
if (_n_ >= 3)then do;
sum = 0;
do i = 1 to 3;
sum= sum + x[i];
end;
end;
run;
Case 2: data in columns (use the test dataset from above):
proc transpose data=test out=testrow;
var obs;
run;
data test2;
set testrow;
array med_total(*) col1-col10;
array med_sum(8) ;
do i = 3 to 10;
med_sum[i-2]=0;
do j = 1 +(i-3) to i;
med_sum(i-2)=med_sum[i-2] + med_total(j);
end;
end;
run;

reading a data set multiple times in SAS

I am new here. I am trying to read in a data set multiple times. For example, assume that I have 3 observations in a data set (called tempfile) for a variable called temp. The three observations are 4, 6, and 5. I want to read in the set x number of times, so the 4th observation would be 4, the fifth would be 6 and the sixth would be 5. The 7th would be 4, and so on. I have tried this literally a few dozen ways, by doing something like
data new;
do i=1 to 100;
set tempfile;
end;
output;
run;
I have tried this by moving the do statement, moving the output statement, omitting the output statement... every which way, trying macros also. Can somebody help? Thanks, John
followup....
Hello:
Thanks for response. That did work. I would like to now do several things involving some “if then” statements inside the loop (more than just reading in the data set).
I want to read in a data set n number of times, and each time, there will be two if then statements
So, assume I read in 3 numbers any number of times; 7, 15, and 12
As each number is read, it will ask if it is less than 10. And each time it will create a random number.
If less than 10, then
If rand(uniform) < .4 then 1 is added to counter1, else 1 is added to counter2
And if >= 10,
Then
If rand(uniform) < .2 then 1 is added to counter1, else 1 is added to counter2
Any help is much appreciated.
Thanks
John
The way that most data steps actually stop is when SAS reads past the end of the input. So you need a method that prevents SAS from doing that.
The easiest way to replicate the data is to just execute multiple output statements. So the first record is repeated three times, then the second record is repeated three times, etc.
data want;
set tempfile ;
do i=1 to 3;
output;
end;
run;
Another method is to just list the dataset multiple times on the SET statement. So to read it in 3 times just use
data want;
set tempfile tempfile tempfile;
run;
You could probably use macro logic or even just a macro variable to make the number of repetitions variable.
data _null_; call symputx('list',repeat('tempfile ',3-1)); run;
data want; set &list; run;
Another method is to use the POINT= and NOBS= options on the SET statement so that SAS never reads past the end and you can jump back to the beginning. But since it never reads past the end of the input data, you will need to tell it manually when to stop.
data want ;
do i=1 to 3;
do p=1 to nobs ;
set tempfile point=p nobs=nobs;
output;
end;
end;
stop;
run;
Or more in the spirit of your original post you might want to use the MOD() function to figure out which observation to read next.
data want;
if _n_ > 100 then stop;
p=1+mod(_n_-1,nobs);
set tempfile point=p nobs=nobs;
run;
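For the follow-up question about counters, the POINT= loop can be extended so that each pass over an observation draws a random number and increments one of two counters. This is a sketch under assumptions: the variable is called temp as in the question, 100 passes and the seed 12345 are arbitrary choices, and cutoff, rep, and counts are hypothetical names.

```sas
data counts(keep=counter1 counter2);
   call streaminit(12345);                  /* arbitrary seed, for reproducibility */
   do rep = 1 to 100;                       /* number of passes over tempfile */
      do p = 1 to nobs;
         set tempfile point=p nobs=nobs;
         /* note: a missing temp also satisfies temp < 10 */
         if temp < 10 then cutoff = 0.4;
         else cutoff = 0.2;
         /* sum statements initialise the counters to 0 and retain them */
         if rand('uniform') < cutoff then counter1 + 1;
         else counter2 + 1;
      end;
   end;
   output;                                  /* one row holding the final counts */
   stop;                                    /* required because POINT= never hits end of file */
run;
```

Moving the output statement inside the inner loop and removing the keep= option would instead write one row per read, with the running counts alongside temp.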
If you have SAS/STAT software, you can use PROC SURVEYSELECT.
data have;
do temp=4,6,5;
output;
end;
run;
proc surveyselect reps=10 rate=1 out=temp2 noprint;
run;
The data step is designed for serial processing. In this case, you need to "remember" previous observations. You can do it using only the data step, but for that use case, there are other solutions in the SAS environment that are simpler. The one I suggest is a macro that appends the original file n times:
%macro replicate( data=, out=, n=)/des='&out is &data repeated &n times.';
data &out;
set
%do i=1 %to &n;
&data
%end;
; /* This ; ends the data step `set` statement */
run;
%mend;
You could test your example with this helper:
%macro test;
data have; /* create the example data set */
temp = 4; output;
temp = 6; output;
temp = 5; output;
run;
%replicate( data=have, out=want, n=4 );
proc print; quit;
%mend;
Here is a portion of the SAS doc that adds lots of detail with many examples.
