Parsing through SAS array, macro variables - arrays

I want to create an array that stores the names of variables, then index into the array and pass each element to a function. So far I have the following:
%let variables = cat dog lion sheep;
data _null_;
array a_vars[*] &variables;
do i = 1 to dim(a_vars);
some_function(a_vars[i],i);
end;
run;
I'm running into a problem with assigning the variables to the array and then indexing the array so that the function is called as some_function(cat, 1), some_function(dog, 2), and so on.

I'm not sure I understand exactly what you want to do. As mentioned, you can use the VNAME function to find the name of the i-th array element. Is that really what you need?
data _null_;
array a_vars[*] &variables;
length name $32;
do i = 1 to dim(a_vars);
name = vname(a_vars[i]);
put (i name) (=);
end;
run;
i=1 name=cat
i=2 name=dog
i=3 name=lion
i=4 name=sheep
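If the underlying goal is to run something once per name, passing the name and its position (some_function(cat, 1), some_function(dog, 2), ...), a macro loop over &variables may be simpler than an array, because the names already exist as text in the macro variable. This is a swapped-in technique rather than the VNAME approach above, and it assumes some_function can be written as a macro. %some_function and %call_each are hypothetical names; %some_function only writes its arguments to the log.

```sas
%let variables = cat dog lion sheep;

/* Hypothetical stand-in for some_function: it only writes its arguments to the log. */
%macro some_function(name, index);
  %put some_function called: name=&name index=&index;
%mend some_function;

/* Call %some_function once per word in LIST, passing the word and its position. */
%macro call_each(list);
  %local i n;
  %let n = %sysfunc(countw(&list));
  %do i = 1 %to &n;
    %some_function(%scan(&list, &i), &i)
  %end;
%mend call_each;

%call_each(&variables)
```

Because the loop runs in the macro processor, each call is generated as text before any DATA step executes, which is what a call like some_function(cat, 1) with a literal name needs.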

Related

Problem: Referencing Array Value, but Returning Zero

I have been working on randomly selecting items within an array. Below, I have outlined my process. I have successfully made it to step 6 (with many data checks), but for some reason, when I reference the array, I receive a value of zero. This has been confusing because even when I check the raw sorted data and note a certain value, the value retrieved is zero. Additionally, I ran VNAME to see which variable it was pulling, and it corresponded to the correct place within the array. Does anyone know why the array returns a zero value?
*STEP 1: Set all non-codes to zero;
ARRAY CEREAL [337] ha_DTQ02_1-ha_DTQ02_337;
DO i=1 to 337;
if CEREAL[i]=88888.00 THEN CEREAL[i]=0;
END;
*STEP 2: Sort so that all zero values come first and food codes come last;
call SORTN(ha_DTQ02_1-ha_DTQ02_337);
*STEP 3: Rename array in reverse order so that zeros come last and codes are first. Sort function above only works in ascending order;
RENAME ha_DTQ02_1- ha_DTQ02_337=ha_DTQ02_337-ha_DTQ02_1;
*STEP 4: Count number of cereals selected;
ARRAY CEREALS[337]ha_DTQ02_1-ha_DTQ02_337;
NUMCEREALS=0;
DO i=1 to 337;
IF CEREALS[i] NOT IN (.,0) THEN NUMCEREALS+1;
END;
*STEP 5: get a random number between those two numbers- this works just fine;
IF NUMCEREALS NE 0 THEN rand1 = rand('integer', 1, numCereals);
*ensure that your second random number isn't the same as the first random number;
if NUMCEREALS ge 2 then do until(rand2 ne rand1);
rand2 = rand('integer', 1, numCereals);
end;
*STEP 6: Pull value from array using random number.;
Note: This is where I am stuck. I have tried alternative code where I created a new array and tried to pull the values from that new array. I have also tried placing the code directly below, before closing the DO loop. When the code does run, the value of these variables is zero. After many data checks, steps 1-5 work well and achieve their goals.
dtd020Af = CEREALS (rand1);
dtd020Bf = CEREALS (rand2);
OPTIONS NOFMTERR;
run;
The SORTN call routine needs the OF operator in order to accept a variable list. Without OF, ha_DTQ02_1-ha_DTQ02_337 is read as an expression: ha_DTQ02_1 minus ha_DTQ02_337.
call SORTN(of ha_DTQ02_1-ha_DTQ02_337);
A close look at the LOG window would have shown you this WARNING:
call SORTN(ha_DTQ02_1-ha_DTQ02_337);
           -----
           134
WARNING 134-185: Argument #1 is an expression, which cannot be updated by the SORTN subroutine call.
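You can't rename variables at run time and then reference their values by the new names. The RENAME statement only changes the names in the output data set; inside the step, the variables keep their original names.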
You have declared an ARRAY listing the variables in 1..337 order. Check, that's good.
You CAN declare a second ARRAY listing the variables in reverse 337..1 order!
You also do not want to use a variable that might be missing, such as rand2, as an index value.
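A small sketch of the two fixes on five variables: CALL SORTN with OF sorts the values in place, and a second array declared in reverse order (v5-v1) reads them largest-first without any RENAME. The data set name sorted_demo and the values are made up for illustration.

```sas
data sorted_demo;
  array v[5] v1-v5;
  array v_rev[5] v5-v1;                     /* same variables, listed in reverse order */
  v1 = 3; v2 = 0; v3 = 7; v4 = .; v5 = 5;   /* made-up values */
  call sortn(of v[*]);                      /* OF turns v[*] into a variable list; missing sorts first */
  put 'Ascending:  ' v[*]=;
  put 'Descending: ' v_rev[*]=;
  output;
run;
```

After the sort, v_rev[1] refers to v5, the largest value, so v_rev[1] through v_rev[k] are the k largest values.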
Suggested code:
data have;
call streaminit(123);
do id = 1 to 100;
array X X1-X337;
do over X;
if rand('uniform') < 0.75 then X = 88888;
else
X = rand('integer',1,10);
if id=50 then if _I_ ne 10 then X=88888; else X=5;
end;
OUTPUT;
end;
run;
data want;
set have;
ARRAY CEREAL X1-X337;
DO i=1 to DIM(CEREAL);
if CEREAL[i]=88888.00 THEN CEREAL[i]=0;
END;
* sort the variables that comprise the CEREAL array;
call SORTN(of CEREAL(*));
* second array to reference variables in reverse order;
array CEREAL_REVERSE x337-x1;
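* count how many non-missing/non-zero values sit at the end of the sorted variables;
* start at zero so a row with no cereals does not reach RAND with a missing upper bound;
NUMCEREALS = 0;
DO i=1 to DIM(CEREAL);
IF CEREAL_REVERSE[i] IN (.,0) then leave;
NUMCEREALS = i;
END;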
IF NUMCEREALS NE 0 THEN rand1 = rand('integer', 1, numCereals);
if NUMCEREALS ge 2 then
do until(rand2 ne rand1);
rand2 = rand('integer', 1, numCereals);
end;
* assign random selection if warranted;
if NUMCEREALS > 0 then dtd020Af = CEREAL_REVERSE (rand1);
if NUMCEREALS > 1 then dtd020Bf = CEREAL_REVERSE (rand2);
run;

Naming collection of variables at once in SAS

I am trying to create 46 variables, indexed 0-45, that depend on three other sets of variables, each also indexed 0-45. It seems as though the array approach would be the most straightforward, but I can't get it to work. I have variables a_0, ..., a_45, b_0, ..., b_45, c_0, ..., c_45, and I want to create d_i = a_i + b_i + c_i, but I'm having some difficulty.
Attempt:
data test;
set test;
array d [0:45];
array a [0:45] a_0-a_45;
array b [0:45] b_0-b_45;
array c [0:45] c_0-c_45;
do i=0 to 45;
d[i]=a[i]+b[i]+c[i];
end;
run;
1) I can't seem to get the index to start from 0.
2) Whenever I run checks, the variables never add up in the intended way.
Try changing the array d definition from
array d [0:45];
to
array d [0:45] d_0-d_45;
With array d [0:45] alone, SAS creates variables named d1 to d46, whereas with array d [0:45] d_0-d_45 you name the variables explicitly and index them from 0 to 45.
If you don't tell SAS what variable names to use for the array it will just create names using the array name and adding a numeric suffix. So when you wrote
array d [0:45];
You told it to create 46 variables named d1 to d46.
You could tell it what names to use.
array d [0:45] d_0 - d_45 ;
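Keeping the 0-based subscripts works once the names are explicit. LBOUND and HBOUND return an array's declared lower and upper bounds, so the loop does not need to hard-code 0 and 45. A sketch with a made-up one-row input (a_i = i, b_i = 10*i, c_i = 100*i), so each d_i should equal 111*i:

```sas
/* Hypothetical one-row input data. */
data have;
  array a[0:45] a_0-a_45;
  array b[0:45] b_0-b_45;
  array c[0:45] c_0-c_45;
  do i = lbound(a) to hbound(a);
    a[i] = i;
    b[i] = 10 * i;
    c[i] = 100 * i;
  end;
  drop i;
run;

data want;
  set have;
  array a[0:45] a_0-a_45;
  array b[0:45] b_0-b_45;
  array c[0:45] c_0-c_45;
  array d[0:45] d_0-d_45;          /* explicit names, so d[5] is d_5 */
  do i = lbound(d) to hbound(d);   /* runs from 0 to 45 */
    d[i] = a[i] + b[i] + c[i];
  end;
  drop i;
run;
```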
Also, in the code you posted, it doesn't really matter whether your index variable's value matches the numeric suffix on the variable names. So why not make it much simpler?
array a a_0-a_45;
array b b_0-b_45;
array c c_0-c_45;
array d d_0-d_45;
do i=1 to dim(a);
d(i)=a(i)+b(i)+c(i);
end;
You could also just number your variables starting with 1 instead of zero and save a lot of headaches.

Multiple Dynamic Array with conditional in SAS

I have two arrays and I would like to make one conditional on the other. ARRAY1 contains binary flags (0 or 1), and I would like the corresponding element of the second array to be blank if ARRAY1[i] is 0. ARRAY1 and ARRAY2 have the same number of elements.
data test;
set test_data;
array ARRAY1 &variable_flags;
array ARRAY2 $ &variable_list &variable_list_initial_values;
do i=1 to &variable_count;
if ARRAY1[i]=0 then ARRAY2[i]="";
end;
run;
My output works until it hits a 0 in ARRAY1[i]. After that, the column stays blank for every following row. I end up with something like the attached image. Why is this happening?
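Variables that get initial values in an ARRAY statement are automatically retained, and the initial values are set only once. They are not re-applied at the start of each iteration of the DATA step, so once an element is set to blank it stays blank. You could change your logic to use another array that holds the initial values. Let's make some test data.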
data test_data;
input matt_flg @@ ;
cards;
1 1 0 0 1 1
;
Now let's set the value to either the default value or empty based on the value of the FLAG variable.
%let variable_flags=matt_flg;
%let variable_list=matt;
%let variable_list_initial_values="MATT";
%let variable_count=%sysfunc(countw(&variable_list));
%let maxlength=20 ;
data test;
set test_data;
array flags &variable_flags;
array vars $&maxlength. &variable_list ;
array default (&variable_count) $&maxlength. _temporary_ (&variable_list_initial_values);
do i=1 to dim(vars);
if flags(i) then vars(i)=default(i);
else vars(i)=' ';
end;
run;
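The same pattern extends to several variables. A sketch with two hypothetical flag/value pairs (a_flg/a and b_flg/b, with made-up defaults "APPLE" and "BANANA"); the defaults sit in a temporary array that is never modified, so every row starts from the original values:

```sas
/* Hypothetical test data: two flag variables. */
data test_data;
  input a_flg b_flg;
  cards;
1 0
0 1
1 1
;

%let variable_flags = a_flg b_flg;
%let variable_list = a b;
%let variable_list_initial_values = "APPLE" "BANANA";
%let maxlength = 20;

data test;
  set test_data;
  array flags &variable_flags;
  array vars $&maxlength. &variable_list;
  /* Defaults live in a separate temporary array that is never overwritten. */
  array default[%sysfunc(countw(&variable_list))] $&maxlength. _temporary_
        (&variable_list_initial_values);
  do i = 1 to dim(vars);
    if flags[i] then vars[i] = default[i];
    else vars[i] = ' ';
  end;
  drop i;
run;
```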

SAS use an array to search through variables and then populate new variables

I have a data set with diagnosis codes, and each observation has multiple diagnosis codes, up to 95 (variables dx1-dx95). Some of the dx codes are numeric, but some are E codes (they have an E before the number, so the variables are character). I need to write code that will look in all 95 dx code variables, pull out every E code, and store them in new variables called ecode1-ecode# (however many E codes there are in that observation).
For example, one observation might have dx1=999 dx2=E100 dx3=878 and dx4=E202; I need to make new variables ecode1=E100 ecode2=E202. The code I wrote yesterday got me close, but for the above example it produces ecode2=E100 ecode4=E202. The ecode variable number ends up being the same as the dx number instead of starting at 1 and counting up.
Here’s what I wrote yesterday:
**//array to pull out ecodes from dx1-dx95//**;
data ecodes;
set injurycodes;
*array to create new ecode variables;
array ecode{95}$ ecode1-ecode95;
*array to pull out ecodes;
array dxcode{95} dx1-dx95;
do i=1 to 95;
if 'E0000' le dxcode{i} le 'E9999' then ecode{i}=dxcode{i};
end;
drop i;
run;
I know the problem right now is the ecode{i}=dxcode{i} piece. This is pulling out the Ecodes, but they aren't starting with ecode1, ecode2, etc.
Updated code:
data ecodes;
set injurycodes;
array ecode{95}$ ecode1-ecode95;
array dxcode{95} dx1-dx95;
j=0;
DO i=1 TO 95;
IF SUBSTR(CATT(dxcode{i}),1,1)="E" THEN DO;
ecode{j}=dxcode{i};
j=j+1;
END;
END;
run;
Now I'm getting "invalid second argument to function SUBSTR"
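Just check the first character of dxcode with SUBSTR, and use a separate counter j to index ecode. Increment j before assigning, so the first E code goes into ecode{1} (ecode{0} would be out of range):
j=0;
DO i=1 TO 95;
IF SUBSTR(CATT(dxcode{i}),1,1)="E" THEN DO;
j=j+1;
ecode{j}=dxcode{i};
END;
END;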
Your main problem is that you need a separate counter variable to index into the output array.
data ecodes;
set injurycodes;
array ecode(95) $5;
array dx (95) ;
j=1;
do i=1 to dim(dx);
if dx(i)=:'E' then do;
ecode(j) = dx(i);
j=j+1;
end;
end;
drop i j;
run;
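A scaled-down sketch of that approach with made-up data, using four dx variables instead of 95 and assuming all dx variables are character (one array cannot mix numeric and character variables). It increments the counter before assigning and keeps the final count as n_ecodes, a hypothetical extra variable:

```sas
/* Hypothetical scaled-down input: 4 character dx variables instead of 95. */
data injurycodes;
  length dx1-dx4 $5;
  input dx1-dx4;
  cards;
999 E100 878 E202
E850 123 456 E999
;

data ecodes;
  set injurycodes;
  array dx[4] dx1-dx4;
  array ecode[4] $5 ecode1-ecode4;
  j = 0;                        /* number of E codes found so far in this row */
  do i = 1 to dim(dx);
    if dx[i] =: 'E' then do;    /* =: truncates to the shorter value, so this tests the first character */
      j = j + 1;                /* move to the next free ecode slot first */
      ecode[j] = dx[i];
    end;
  end;
  n_ecodes = j;
  drop i j;
run;
```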

Bring the length of an array to another array in SAS?

I have a big SAS table. Columns A and B are character, and all other columns are numeric variables (each with a different name); the number of numeric columns, N, is not known in advance. For example:
A B Name1 Name2 Name3 .... NameN
-------------------------------------------------
Char Char Number1 Number2 Number3 ..... NumberN
.................................................
.................................................
The goal is to sum the numeric columns Name1-NameN down the rows within each group of B (BY B),
So the final table will look like this:
A B Name1 Name2 Name3 .... NameN
----------------------------------------
Char Char Sum1 Sum2 Sum3 ..... SumN
........................................
........................................
To do this sum, I declared two arrays. The first one is:
array Varr {*} _numeric_; /* it reads only numerical columns */
Then I declared another array with the same dimension (Summ1-SummN) to accumulate the sums.
The problem is that I can only specify the dimension of this new array manually. For example, if there are 80 numeric variables, I have to write:
array summ {80} Summ1-Summ80;
The code works when I write it manually, but instead I want to write something like
array summ {&N} Summ1-Summ&N; /* &N is the dimension of the array Varr */
I tried using a DO loop and dim(Varr) after the array statement in many different ways, like:
data want;
array Varr {*} _numeric_;
do i=1 to dim(Varr);
N+1 ;
end;
%put &N;
array Summ {&N} Summ1-Summ&N;
retain Summ;
if first.B then do i=1 to dim(varr); summ(i)=varr(i) ;end;
else do i =1 to dim(varr); summ(i) = summ(i) + varr(i) ; varr(i)=summ(i); end;
drop Summ1-Summ&N;
run;
But it doesn't work. Any idea about how to bring the length of the first array to the second array?
You need to calculate and store the number of numeric variables in a previous step, because a macro variable reference such as &N is resolved when the DATA step is compiled, before any of its statements run. The easiest way is to use the dictionary.columns metadata table, available in PROC SQL. It contains the column details for every data set, including the type (num or char), so you just need to count the columns where the type is 'num'.
The code below does just that and stores the result in the macro variable &N using the INTO : clause. I've also used the LEFT and PUT functions to remove leading blanks from the macro variable; otherwise you'll run into problems when writing summ1-summ&N.
I've also added a second solution based on your answer, which should be more efficient because it doesn't read any records, only the column details.
proc sql noprint;
select left(put(count(*),best12.)) into :N
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='num';
quit;
%put Numeric variables = &N.;
/*****************************************/
/* alternative solution */
data _null_;
set sashelp.class (obs=0);
array temp{*} _numeric_;
call symputx('N',dim(temp));
run;
%put Numeric variables = &N.;
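I have now found another solution, a small modification of the solution from #kl78.
Earlier, when I tried call symput('N',dim(varr));, I forgot to convert the number to text and remove the unnecessary spaces. CALL SYMPUT converts a numeric value with the BEST12. format, which pads it with leading blanks, so the code tried to find Summ_____87 (each _ standing for a blank) and gave an error.
Now I run it with a format, call symput('N',put(dim(varr),2.));, and the code finds Summ87, so it works. (The 2. format only holds numbers up to 99; CALL SYMPUTX, used in the alternative solution above, strips the blanks without needing a format.)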
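Putting the pieces together: a sketch that counts the numeric columns in one step with CALL SYMPUTX, then uses &N in a second step, where it is already resolved at compile time. The input data set have (columns A, B, Name1-Name3) is made up, and the running sum follows the logic of the code in the question. The sums live in a _temporary_ array, whose elements are retained automatically and never written to the output, so there is nothing to drop; a _temporary_ array cannot use {*}, which is why the dimension still has to come from &N. If only one total row per B is wanted, adding if last.B; at the end of the step would keep just the last, fully accumulated row of each group.

```sas
/* Hypothetical input: character columns A and B plus three numeric columns. */
data have;
  input A $ B $ Name1 Name2 Name3;
  cards;
x g1 1 2 3
y g1 4 5 6
z g2 7 8 9
w g2 1 1 1
;

/* Step 1: count the numeric columns without reading any rows. */
data _null_;
  set have (obs=0);
  array temp{*} _numeric_;
  call symputx('N', dim(temp));   /* SYMPUTX strips leading and trailing blanks */
run;

/* Step 2: &N is known when this step compiles. */
proc sort data=have;
  by B;
run;

data want;
  set have;
  by B;
  array Varr{*} _numeric_;        /* Name1-Name3; i is not defined yet at this point */
  array Summ{&N} _temporary_;     /* running totals, retained across rows */
  do i = 1 to dim(Varr);
    if first.B then Summ[i] = Varr[i];
    else Summ[i] = Summ[i] + Varr[i];
    Varr[i] = Summ[i];
  end;
  drop i;
run;
```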

Resources