I have been working on randomly selecting items within an array. Below, I have outlined my process. I have successfully made it to step 6 (with many data checks), but for some reason, when I reference the array, I receive a value of zero. This is confusing because even when I check the raw sorted data and note a certain value, the value retrieved is zero. Additionally, I ran VNAME to see which variable it was pulling, and it corresponded to the correct place within the array. Does anyone know why I am getting a zero value back from the array?
*STEP 1: Set all non-codes to zero;
ARRAY CEREAL [337] ha_DTQ02_1-ha_DTQ02_337;
DO i=1 to 337;
if CEREAL[i]=88888.00 THEN CEREAL[i]=0;
END;
*STEP 2: Sort so that all zero values come first and food codes come last;
call SORTN(ha_DTQ02_1-ha_DTQ02_337);
*STEP 3: Rename array in reverse order so that zeros come last and codes are first. Sort function above only works in ascending order;
RENAME ha_DTQ02_1- ha_DTQ02_337=ha_DTQ02_337-ha_DTQ02_1;
*STEP 4: Count number of cereals selected;
ARRAY CEREALS[337]ha_DTQ02_1-ha_DTQ02_337;
NUMCEREALS=0;
DO i=1 to 337;
IF CEREALS[i] NOT IN (.,0) THEN NUMCEREALS+1;
END;
*STEP 5: get a random number between those two numbers- this works just fine;
IF NUMCEREALS NE 0 THEN rand1 = rand('integer', 1, numCereals);
*ensure that your second random number isn't the same as the first random number;
if NUMCEREALS ge 2 then do until(rand2 ne rand1);
rand2 = rand('integer', 1, numCereals);
end;
*STEP 6: Pull value from array using random number.;
Note: This is where I am stuck. I have tried alternative code where I recreated a new array and tried to pull the values from that new array. I have also tried placing the code directly below before closing the do loop. When the code does run, the value for these variables is zero. After many data checks, steps 1-5 work well and achieve their goals.
dtd020Af = CEREALS (rand1);
dtd020Bf = CEREALS (rand2);
OPTIONS NOFMTERR;
run;
The SORTN call routine needs the OF operator in order to utilize a name list.
call SORTN(of ha_DTQ02_1-ha_DTQ02_337);
A keen eye on the LOG window should have shown you the WARNING
3214 call SORTN(ha_DTQ02_1-ha_DTQ02_337);
-----
134
WARNING 134-185: Argument #1 is an expression, which cannot be updated by the SORTN subroutine call.
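You can't rename variables during run-time and then reference the values by the new names. RENAME is not an executable statement: it only changes the names written to the output data set, so inside the step the variables keep their original names and the array still sees them in their original order.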
You have declared an ARRAY listing the variables in 1..337 order. Check, that's good.
You CAN declare a second ARRAY listing the variables in reverse 337..1 order!
You also do not want to use a variable that might be missing, such as rand2, as an index value.
Suggested code:
data have;
call streaminit(123);
do id = 1 to 100;
array X X1-X337;
do over X;
if rand('uniform') < 0.75 then X = 88888;
else
X = rand('integer',1,10);
if id=50 then if _I_ ne 10 then X=88888; else X=5;
end;
OUTPUT;
end;
run;
data want;
set have;
ARRAY CEREAL X1-X337;
DO i=1 to DIM(CEREAL);
if CEREAL[i]=88888.00 THEN CEREAL[i]=0;
END;
* sort the variables that comprise the CEREAL array;
call SORTN(of CEREAL(*));
* second array to reference variables in reverse order;
array CEREAL_REVERSE x337-x1;
* count how many non-missing/non-zero values at the end of the sorted variables;
DO i=1 to DIM(CEREAL);
IF CEREAL_REVERSE[i] IN (.,0) then leave;
NUMCEREALS = i;
END;
IF NUMCEREALS NE 0 THEN rand1 = rand('integer', 1, numCereals);
if NUMCEREALS ge 2 then
do until(rand2 ne rand1);
rand2 = rand('integer', 1, numCereals);
end;
* assign random selection if warranted;
if NUMCEREALS > 0 then dtd020Af = CEREAL_REVERSE (rand1);
if NUMCEREALS > 1 then dtd020Bf = CEREAL_REVERSE (rand2);
run;
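As a minimal sketch of why the RENAME in step 3 could not reorder the array, the step below (with the hypothetical names demo, a, z and inside) renames a variable and then reads it inside the same step. The old name is still the one in force while the step runs; only the output data set carries the new name.

```sas
data demo;
   a = 1;
   rename a = z;   /* declarative: affects only the names in the output data set */
   inside = a;     /* inside the step the variable is still called A */
run;

/* The output data set holds Z and INSIDE; there is no variable named A */
proc print data=demo;
run;
```

This is why declaring a second array over the same variables in reverse order is the simpler route.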
Related
I would like to set the length of an array from a value read from a data set, number, which has one variable, num, holding a single numeric value. But I get an error message saying that I cannot initialize the probs array. Can I get any suggestions on how to solve this? (I really don't want to hardcode the length of the probs array.)
data test;
if _n_=1 then do;
set work.number;
i = num +1;
end;
array probs{i} _temporary_ .....
SAS DATA step arrays cannot be dynamically sized during step run-time.
One common approach is to place the computed number of rows of the data set into a macro variable before the data step.
I'm not sure what you are doing with probs.
What values will be going into the array elements ?
Do you need all prob data while iterating through each row of the data set ?
Is a single result computed from the probs data ?
Example - Compute row count in a data null using nobs set option:
data _null_;
if 0 then set work.number nobs=row_count;
call symputx ('ROW_COUNT', row_count);
stop;
run;
data test;
array probs (&ROW_COUNT.) _temporary_;
* something with probs[index] ;
* maybe fill it ?;
do index = 1 by 1 until (last_row);
set work.number end=last_row;
probs[index] = prob; * prob from ith row;
end;
* do some computation that a procedure isn't able to;
…
result_over_all_data = my_magic; * one result row from processing all prob;
output;
stop;
run;
Of course your actual use of the array will vary.
The many other ways to get row_count include the dictionary.tables view, PROC SQL's select count(*) into, and a variety of ATTRN invocations.
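Two of those alternatives might look like the sketch below. It assumes the data set is work.number, as in the question; the macro variable names ROW_COUNT, PROBS_DIM, dsid and rc are only illustrative. The last part sizes the array from the value of num plus one, which is what the question originally asked for.

```sas
/* Option 1: PROC SQL puts the row count into a macro variable */
proc sql noprint;
   select count(*) into :ROW_COUNT trimmed
   from work.number;
quit;

/* Option 2: ATTRN through %SYSFUNC, no DATA step or procedure needed */
%let dsid      = %sysfunc(open(work.number));
%let ROW_COUNT = %sysfunc(attrn(&dsid, nlobs));
%let rc        = %sysfunc(close(&dsid));

/* Sizing from the value of NUM rather than from the row count */
data _null_;
   set work.number(obs=1);
   call symputx('PROBS_DIM', num + 1);
run;

data test;
   array probs{&PROBS_DIM.} _temporary_;
   /* work with probs[...] here */
run;
```

In every variant the size is fixed before the second step compiles, which is the only point at which an array dimension can be supplied.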
I have two arrays and I would like to make one conditional on the other. ARRAY1 contains binary flags (0 or 1) and I would like to make the second array be blank if the contents in ARRAY1[i] is 0. ARRAY1 and ARRAY2 have the same number of elements.
data test;
set test_data;
array ARRAY1 &variable_flags;
array ARRAY2 $ &variable_list &variable_list_initial_values;
do i=1 to &variable_count;
if ARRAY1[i]=0 then ARRAY2[i]="";
end;
run;
My output works until it hits a 0 in ARRAY1[i]. When that happens, the column is blank afterwards. I end up with something like the attached image. Why is this happening?
The initial values for an array are set only once. They are not re-applied at the start of each iteration of the data step. You could change your logic to have another array with the initial values. Let's make some test data.
data test_data;
input matt_flg @@ ;
cards;
1 1 0 0 1 1
;
Now let's set the value to either the default value or empty based on the value of the FLAG variable.
%let variable_flags=matt_flg;
%let variable_list=matt;
%let variable_list_initial_values="MATT";
%let variable_count=%sysfunc(countw(&variable_list));
%let maxlength=20 ;
data test;
set test_data;
array flags &variable_flags;
array vars $&maxlength. &variable_list ;
array default (&variable_count) $&maxlength. _temporary_ (&variable_list_initial_values);
do i=1 to dim(vars);
if flags(i) then vars(i)=default(i);
else vars(i)=' ';
end;
run;
I have a data set with diagnosis codes, and each observation has multiple diagnosis codes, up to 95 (variables dx1-dx95). Some of the dx codes are numeric, but some are e codes (they have an E before the number, so they are character values). I need to write code that will look in all 95 dx code variables, pull out each e code, and make new variables called ecode1-ecode# (however many e codes there are in that observation).
For example, one observation might have dx1=999 dx2=E100 dx3=878 and dx4=E202; I need to make new variables ecode1=E100 ecode2=E202. The code I wrote yesterday got me close, but for the above example it produces ecode2=E100 ecode4=E202. The ecode variable # ends up being the same as the dx # instead of starting at 1 and counting up.
Here’s what I wrote yesterday:
**//array to pull out ecodes from dx1-dx95//**;
data ecodes;
set injurycodes;
*array to create new ecode variables;
array ecode{95}$ ecode1-ecode95;
*array to pull out ecodes;
array dxcode{95} dx1-dx95;
do i=1 to 95;
if 'E0000' le dxcode{i} le 'E9999' then ecode{i}=dxcode{i};
end;
drop i;
run;
I know the problem right now is the ecode{i}=dxcode{i} piece. This is pulling out the Ecodes, but they aren't starting with ecode1, ecode2, etc.
Updated code:
data ecodes;
set injurycodes;
array ecode{95}$ ecode1-ecode95;
array dxcode{95} dx1-dx95;
j=0;
DO i=1 TO 95;
IF SUBSTR(CATT(dxcode{i}),1,1)="E" THEN DO;
ecode{j}=dxcode{i};
j=j+1;
END;
END;
run;
Now I'm getting "invalid second argument to function SUBSTR"
Just check the first character of dxcode with SUBSTR, and use j to loop ecode.
j=0;
DO i=1 TO 95;
IF SUBSTR(CATT(dxcode{i}),1,1)="E" THEN DO;
j=j+1; * advance the counter first so the first e code lands in ecode{1};
ecode{j}=dxcode{i};
END;
END;
Your main problem is you need to keep a separate counter variable to use to index into the output array.
data ecodes;
set injurycodes;
array ecode(95) $5;
array dx (95) ;
j=1;
do i=1 to dim(dx);
if dx(i)=:'E' then do;
ecode(j) = dx(i);
j=j+1;
end;
end;
drop i j;
run;
In SAS if I have a string or an Array like the following,
array x[4] $1 ('A' 'B' 'C' 'D');
I need to generate all "Unique" permutations of the elements like the following,
[ABCD]
[ABC]
[BCD]
[ACD]
[ABD]
[AB]
[AC]
[AD]
[BC]
[BD]
[CD]
[A]
[B]
[C]
[D]
Is there a function in SAS for generating all possible combinations of the array?
Assumption: I believe you are looking for combinations and not permutations, so the order does not matter; BA and AB are the same thing.
Use the CALL ALLCOMB routine and the COMB function to enumerate the possible combinations.
Read more about allcomb and comb here
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0yx35py6pk47nn1vyrczffzrw25.htm
and here
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001009658.htm
In short:
the CALL ALLCOMB routine steps through the possible combinations of n elements taken r at a time, and
the COMB function returns how many such combinations there are.
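As a quick check of the counting side, the sketch below prints COMB(4, k) for each subset size of the four-element array in the question. The variable names k, n_comb and total are only illustrative.

```sas
data _null_;
   total = 0;
   do k = 1 to 4;
      n_comb = comb(4, k);   /* number of ways to choose k items out of 4 */
      total + n_comb;
      put k= n_comb=;
   end;
   put total=;               /* 4 + 6 + 4 + 1 = 15, matching the 15 rows listed in the question */
run;
```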
data test(keep=my_string);
length my_string $50.;
array a[4] $ ('A' 'B' 'C' 'D');
n = dim(a);
do k=1 to n;
do j=1 to comb(n,k);
call allcomb(j,k,of a[*]);
do i = 1 to k;
if i=1 then do; my_string="";counter=0;end;
counter=counter+1;
my_string=cat(compress(my_string),compress(a[i]));
if counter=k then output;
end;
end;
end;
run;
A slightly different take on this is to use proc summary.
Create a dummy dataset. Assign each element of the array to a variable so we can feed it into proc summary:
data tmp;
array arr[*] a b c d (1 1 1 1);
run;
Run proc summary.
proc summary data=tmp noprint missing;
class a b c d;
output out=combinations;
run;
You can also use the ways or types statements in proc summary to limit any combinations you may want.
Now the interesting side effect of doing this is that you get the _type_ column in the output dataset as well. In the example above the following values would be assigned:
D = 1
C = 2
B = 4
A = 8
So if the _type_ value in the output dataset is 13, then we know that the row was generated by combining A, B and D (8 + 4 + 1).
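A minimal sketch of decoding _type_ back into the participating elements, assuming the combinations data set produced above with class variables a, b, c and d. The names decoded and members are hypothetical; BAND is the bitwise AND function.

```sas
data decoded;
   set combinations;
   length members $4;
   members = '';
   /* test each bit of _type_; the leftmost class variable holds the highest bit */
   if band(_type_, 8) then members = cats(members, 'A');
   if band(_type_, 4) then members = cats(members, 'B');
   if band(_type_, 2) then members = cats(members, 'C');
   if band(_type_, 1) then members = cats(members, 'D');
run;
```

The row with _type_ = 0 is the overall total and keeps a blank members value.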
Here is a quick script that will find the combinations of the individual characters within a string. This could easily be adapted to work with arrays if you prefer. Rather than using the combinatorial functions and call routines (all*, lex*, ran*), this approach builds the combinations from the binary representation of the integers below 2**len. I prefer this approach because I think it demonstrates what is happening, is more transparent, and doesn't change the order of the array assignments.
data have;
str = "123456"; output;
str = "ABCD"; output;
run;
data want;
set have;
length comb $20.;
len = length(str);
/* Loop through all possible subsets (_i = 0 gives the empty one) */
do _i = 0 to 2**len - 1;
/* Store the current iteration number in binary */
_bin = putn(_i, cats("binary", len, "."));
/* Initialise an empty output variable */
comb = "";
/* Loop through each value in the input string */
do _k = 1 to len;
/* Check if the kth digit of the binary representation is 1 */
/* And if so add the kth input character to the output */
if substr(_bin, _k, 1) = "1" then
comb = cats(comb, substr(str, _k, 1));
end;
output;
end;
/* Clean up temporary variables, commented so you can see what's happening */
/* drop _:; */
run;
If you do want permutations then a similar approach is possible using factoradic representations of the numbers. But, I would recommend that you use a combinational function instead as the conversions would be much more involved. It's probably quite a nice coding exercise for learning though.
It would be great if SAS had a function for reducing strings by boolean patterns, but it probably wouldn't get much use.
bsubstr("ABCD", "1010") --> "AC"
bsubstr("ABCD", "1110") --> "ABC"
bsubstr("ABCD", "0001") --> "D"
SAS has in-built functions to calculate combinations & permutations, allcomb and allperm.
SAS Documentation for ALLCOMB function : http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003112305.htm
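For the permutation side, a minimal sketch with CALL ALLPERM might look like this. The three-element array and the names perms and perm are only illustrative; the loop runs FACT(n) times and each call rearranges the array into the next permutation.

```sas
data perms(keep=perm);
   array x[3] $1 ('A' 'B' 'C');
   length perm $3;
   do j = 1 to fact(dim(x));    /* n! permutations */
      call allperm(j, of x[*]); /* reorder x into the j-th permutation */
      perm = cats(of x[*]);
      output;
   end;
run;
```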
I am trying to create a macro that computes the inner (dot) product of a vector and a matrix:
Y*X*t(Y), which is equivalent to the sum over i and j of y_i * X_ij * y_j
I don't have IML, so I try to do it using array manipulation.
How to create a multidimensional array from the data to avoid index translation within a single array.
How to debug my loop, or at least print some variable to control my program?
How to delete temporary variables?
I am a SAS newbie, but this is what I have tried so far:
%macro dot_product(X = ,y=, value= );
/* read number of rows */
%let X_id=%sysfunc(open(&X));
%let nrows=%sysfunc(attrn(&X_id,nobs));
%let rc=%sysfunc(close(&X_id));
data &X.;
set &X.;
array arr_X{*} _numeric_;
set &y.;
array arr_y{*} _numeric_;
do i = 1 to &nrows;
do j = 1 to &nrows;
value + arr_y[i]*arr_X[j + &nrows*(i-1)]*arr_y[j];
end;
end;
run;
%mend;
When I run this :
%dot_product(X=X,y=y,value=val);
I get this error :
ERROR: Array subscript out of range at line 314 column 158.
I am using this to generate data :
data X;
array myCols{*} col1-col5;
do i = 1 to 5;
do j = 1 to dim(myCols);
myCols{j}=ranuni(123);
end;
output;
end;
drop i j;
run;
/* create a vector y */
data y;
array myCols{*} col1-col5;
do j = 1 to dim(myCols);
myCols{j}=ranuni(123);
end;
output;
drop j;
run;
Thanks in advance for your help or any idea to debug my data.
Edit: The following relates to the description of the question, how to evaluate a quadratic form using dot, inner or scalar products. The actual code is nearly fine. end edit
If you want to reduce it to dot products, then your value is the dot product of the linearization of X_ij and the same linearization applied to Z_ij=Y_i*Y_j.
The other way is to partition X_ij into its rows or columns, depending on the linearization of the matrix, and compute separate dot products of Y with, say, each row. You then compute the dot product of the resulting vector with Y again.
Edit added: The length nrows of the nested loops in the code should be determined from the length of the vector y, perhaps with a check that the length of x is nrows*nrows.
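One way to put this together without IML is sketched below. It assumes the X and y data sets generated in the question (columns col1-col5, five rows in X, one row in y) and a hard-coded dimension n; the names quad_form, mat, vec and col are hypothetical. The matrix is loaded into a two-dimensional temporary array, so no index translation is needed and no helper variables reach the output.

```sas
%let n = 5;   /* assumed dimension; could be derived from y as suggested above */

data quad_form(keep=value);
   array mat{&n, &n} _temporary_;   /* the matrix X */
   array vec{&n}     _temporary_;   /* the vector y */
   array col{&n}     col1-col&n;    /* columns shared by both input data sets */

   /* read the single row of y and copy it before X overwrites col1-col&n */
   set y;
   do j = 1 to &n;
      vec{j} = col{j};
   end;

   /* read X one row at a time into the two-dimensional array */
   do i = 1 to &n;
      set X;
      do j = 1 to &n;
         mat{i, j} = col{j};
      end;
   end;

   /* quadratic form: sum over i and j of y_i * X_ij * y_j */
   value = 0;
   do i = 1 to &n;
      do j = 1 to &n;
         value + vec{i} * mat{i, j} * vec{j};
      end;
   end;

   put value=;   /* written to the log, handy while debugging */
   output;
   stop;
run;
```

Temporary arrays never appear in the output data set, and the KEEP= option drops the loop counters and the col variables, which covers the question about deleting temporary variables. PUT statements like the one above are a simple way to inspect values in the log while the loops run.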