I have a big SAS table. Columns A and B are character, and all other columns are numeric variables (each with a different name); the number of numeric columns, N, is not known in advance. For example:
A B Name1 Name2 Name3 .... NameN
-------------------------------------------------
Char Char Number1 Number2 Number3 ..... NumberN
.................................................
.................................................
The goal is to accumulate the numeric columns Name1-NameN down the rows within each group of B (BY B),
So the final table will look like this:
A B Name1 Name2 Name3 .... NameN
----------------------------------------
Char Char Sum1 Sum2 Sum3 ..... SumN
........................................
........................................
To do this cumulative sum, I declared two arrays. The first one is:
array Varr {*} _numeric_; /* it reads only numerical columns */
Then I declared another array of the same length (Summ1-SummN) to hold the running sums.
The problem is that I can only set the length of this new array manually. For example, if there are 80 numeric variables, I have to write:
array summ {80} Summ1-Summ80;
The code works when I write it manually. But instead I want to write something like
array summ {&N} Summ1-Summ&N; /* &N is the dimension of the array Varr */
I tried with do-loop and dim(Varr) under the array in many different ways like:
data want;
array Varr {*} _numeric_;
do i=1 to dim(Varr);
N+1 ;
end;
%put &N;
array Summ {&N} Summ1-Summ&N;
retain Summ;
if first.B then do i=1 to dim(varr); summ(i)=varr(i) ;end;
else do i =1 to dim(varr); summ(i) = summ(i) + varr(i) ; varr(i)=summ(i); end;
drop Summ1-Summ&N;
run;
But it doesn't work. Any idea about how to bring the length of the first array to the second array?
You need to calculate and store the number of numeric variables in a previous step. The easiest way is to use the dictionary.columns metadata table, available in proc sql. It contains the details of every column in a given dataset, including the type (num or char), so you just need to count the columns where the type is 'num'.
The code below does just that and stores the result in a macro variable, &N, using the into : functionality. I've also used the functions left and put to remove leading blanks from the macro variable; otherwise you'll encounter problems when writing summ1-summ&N.
I've also added a second solution based on your answer. It is more efficient because it doesn't read in any records, only the column details.
proc sql noprint;
select left(put(count(*),best12.)) into :N
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='num';
quit;
%put Numeric variables = &N.;
/*****************************************/
/* alternative solution */
data _null_;
set sashelp.class (obs=0);
array temp{*} _numeric_;
call symputx('N',dim(temp));
run;
%put Numeric variables = &N.;
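To connect this back to the question, here is a minimal sketch of the whole flow. It assumes the input dataset is called have (a hypothetical name) and is already sorted by B. It also swaps the Summ1-Summ&N variables for a _temporary_ array, which is retained automatically and needs no drop statement; that is a design choice of this sketch, not something the original code did.

```sas
/* Step 1: store the number of numeric columns in &N without reading any rows */
data _null_;
  set have (obs=0);                 /* HAVE is an assumed dataset name */
  array temp{*} _numeric_;
  call symputx('N', dim(temp));     /* symputx strips leading blanks */
run;

/* Step 2: running sums within each BY group of B */
data want;
  set have;
  by B;                             /* HAVE must be sorted by B */
  array Varr {*} _numeric_;         /* all numeric columns read from HAVE */
  array Summ {&N} _temporary_;      /* retained across rows automatically */
  if first.B then do i = 1 to dim(Varr);
    Summ(i) = Varr(i);              /* restart the sums at each new group */
  end;
  else do i = 1 to dim(Varr);
    Summ(i) = Summ(i) + Varr(i);
    Varr(i) = Summ(i);              /* overwrite the column with the running sum */
  end;
  drop i;
run;
```

Note that a missing value in any column propagates through the plain + operator; the sum() function could be used instead if missing values should be ignored.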
I have now found another solution, with a small modification of the solution from #kl78.
When I first tried call symput ('N',dim(varr)); I forgot to apply a numeric format and remove the unnecessary spaces. Without the format, the code looked for Summ_____87, so it gave an error.
Now I run it with a format, call symput ('N',put(dim(varr),2.)); so the code finds Summ87 and runs successfully.
I've got a character variable which holds a delimited list of strings, like so:
data lists;
format list_val $75.;
list_val = "PDC; QRS; OLN; ABC";
run;
I need to alphabetize the elements of each list (so the desired result when applied to the above string is "ABC; OLN; PDC; QRS;").
I adapted the solution here for my purposes as follows:
data lists_sorted;
set lists;
array_size = count(list_val,";") + 1; /* Cannot be used as array length must be specified at creation */
array t(50) $ 8 _TEMPORARY_;
call missing(of t(*));
do _n_=1 to array_size;
t(_n_)=scan(list_val,_n_,";");
end;
call sortc(of t(*));
new_list_val =catx("; ", of t(*));
put "original: " list_val " new: " new_list_val;
run;
When I run this code I get the following output:
original: PDC; QRS; OLN; ABC new: ABC; OLN; QRS; PDC
Which was not expected or desired. In general, the result of the above code applied to any list is a new list which is sorted alphabetically, except that the first element of the original list becomes the last element of the new list, regardless of its alphabetical ordering.
I can't find anything in the documentation of sortc which would explain this behavior, so I'm wondering if the issue is somehow the way I've set up the temporary array (I don't have much experience with these).
Does anyone know why sortc behaves this way? Side question: is there any way I can dynamically determine the size of the array, rather than hard-coding a value such as 50?
It is because you included the leading spaces when assigning the values to the array elements. Remove those.
t[_n_]=left(scan(list_val,_n_,";"));
If you want to know the minimum array size you could use in your data step, you would need to process the dataset twice.
proc sql ;
select max(count(list_val,";") + 1) into :max_size trimmed from have;
quit;
....
array t[&max_size] $ 8 _temporary_;
But there is probably not much harm in just using some large constant value.
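Putting the two fixes together, a sketch of the full step might look like the following. It assumes the input dataset is the lists dataset from the question (the snippet above calls it have) and that no element is longer than 8 characters.

```sas
/* Pass 1: find the largest number of elements in any list */
proc sql noprint;
  select max(count(list_val, ";") + 1) into :max_size trimmed
  from lists;                        /* dataset name taken from the question */
quit;

/* Pass 2: split, strip leading blanks, sort, and rebuild each list */
data lists_sorted;
  set lists;
  length new_list_val $75;
  array t[&max_size] $ 8 _temporary_;
  call missing(of t[*]);             /* clear values left over from the previous row */
  do _n_ = 1 to count(list_val, ";") + 1;
    t[_n_] = left(scan(list_val, _n_, ";"));   /* left() removes the leading space */
  end;
  call sortc(of t[*]);               /* blank elements sort first */
  new_list_val = catx("; ", of t[*]);          /* catx skips blank elements */
  put "original: " list_val " new: " new_list_val;
run;
```

With the leading blanks removed, the sample list should come out as ABC; OLN; PDC; QRS.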
I would like to set the length of an array based on a value read from a dataset: work.number has a single variable, num, holding one numeric value. But I get an error message saying that the probs array cannot be initialized. Any suggestions on how to solve this? (I really don't want to hardcode the length of the probs array.)
data test;
if _n_=1 then do;
set work.number;
i = num +1;
end;
array probs{i} _temporary_ .....
SAS Data step arrays can not be dynamically sized during step run-time.
One common approach is to place the computed number of rows of the data set into a macro variable before the data step.
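For the case described in the question, where the size comes from a value stored in work.number rather than from a row count, the same idea can be sketched as below. The macro variable name ARRAY_SIZE and the values written into probs are placeholders chosen for illustration.

```sas
/* Step 1: read NUM from the one-row dataset and store num + 1 in a macro variable */
data _null_;
  set work.number;
  call symputx('ARRAY_SIZE', num + 1);   /* ARRAY_SIZE is a hypothetical name */
run;

/* Step 2: the macro variable resolves before compilation, so the size is a constant */
data test;
  array probs{&ARRAY_SIZE} _temporary_;
  do i = 1 to dim(probs);
    probs{i} = 1 / dim(probs);           /* placeholder values only */
  end;
  n_probs = dim(probs);                  /* keep the size to confirm it worked */
  drop i;
run;
```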
I'm not sure what you are doing with probs.
What values will be going into the array elements?
Do you need all prob data while iterating through each row of the data set?
Is a single result computed from the probs data?
Example: compute the row count in a data _null_ step using the nobs= option of set:
data _null_;
if 0 then set work.number nobs=row_count;
call symputx ('ROW_COUNT', row_count);
stop;
run;
data test;
array probs (&ROW_COUNT.) _temporary_;
* something with probs[index] ;
* maybe fill it ? ;
do index = 1 by 1 until (last_row);
set work.number end=last_row;
probs[index] = prob; * prob from ith row;
end;
* do some computation that a procedure isn't able to;
…
result_over_all_data = my_magic; * one result row from processing all prob;
output;
stop;
run;
Of course your actual use of the array will vary.
Other ways to get row_count include the dictionary.tables view, sql select count(*) into :macro_variable, and a variety of ATTRN invocations.
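Two of those alternatives, sketched against the same work.number dataset. The macro variable names are illustrative only.

```sas
/* Alternative 1: count the rows with proc sql */
proc sql noprint;
  select count(*) into :ROW_COUNT trimmed
  from work.number;
quit;

/* Alternative 2: open the dataset in macro code and query its attributes */
%let dsid = %sysfunc(open(work.number));
%let ROW_COUNT2 = %sysfunc(attrn(&dsid, nlobs));   /* nlobs = rows not marked for deletion */
%let rc = %sysfunc(close(&dsid));

%put ROW_COUNT=&ROW_COUNT ROW_COUNT2=&ROW_COUNT2;
```

The sql version reads the data, while attrn reads the count from the dataset header, so the second is usually cheaper on large tables.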
In SAS if I have a string or an Array like the following,
array x[4] $1 ('A' 'B' 'C' 'D');
I need to generate all "Unique" permutations of the elements like the following,
[ABCD]
[ABC]
[BCD]
[ACD]
[ABD]
[AB]
[AC]
[AD]
[BC]
[BD]
[CD]
[A]
[B]
[C]
[D]
Is there a function in SAS for generating all possible combinations of the array?
Assumption: I believe you are looking for combinations rather than permutations, so order does not matter and BA and AB are the same thing.
Use the call allcomb routine and the comb function to enumerate the possible combinations.
Read more about allcomb and comb here
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0yx35py6pk47nn1vyrczffzrw25.htm
and here
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001009658.htm
In short:
the call allcomb routine generates the combinations of n elements taken r at a time, and
the comb function returns how many such combinations there are.
data test(keep=my_string);
length my_string $50.;
array a[4] $ ('A' 'B' 'C' 'D');
n = dim(a);
do k=1 to n;
do j=1 to comb(n,k);
call allcomb(j,k,of a[*]);
do i = 1 to k;
if i=1 then do; my_string="";counter=0;end;
counter=counter+1;
my_string=cat(compress(my_string),compress(a[i]));
if counter=k then output;
end;
end;
end;
run;
A slightly different take on this is to use proc summary.
Create a dummy dataset. Assign each element of the array to a variable so we can feed it into proc summary:
data tmp;
array arr[*] a b c d (1 1 1 1);
run;
Run proc summary.
proc summary data=tmp noprint missing;
class a b c d;
output out=combinations;
run;
You can also use the ways or types statements in proc summary to limit any combinations you may want.
Now the interesting side effect of doing this is that you get the _type_ column in the output dataset as well. In the example above the following values would be assigned:
D = 1
C = 2
B = 4
A = 8
So if the _type_ value in the output dataset is 13, then we know that the row was generated by combining A, B and D (8 + 4 + 1).
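As a sketch of how that decoding could be automated, the step below turns each _type_ value back into the letters it represents using the band (bitwise AND) function. The output dataset name combos_named and the variable combo are made up for this example.

```sas
data combos_named;
  set combinations;
  length combo $4;
  /* each class variable owns one bit of _type_: A=8, B=4, C=2, D=1 */
  if band(_type_, 8) then combo = cats(combo, 'A');
  if band(_type_, 4) then combo = cats(combo, 'B');
  if band(_type_, 2) then combo = cats(combo, 'C');
  if band(_type_, 1) then combo = cats(combo, 'D');
  keep _type_ combo;
run;
```

The row with _type_ = 0 is the overall total, so its combo stays blank.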
Here is a quick script that finds the combinations of the individual characters within a string. It could easily be adapted to work with arrays if you prefer. Rather than using the combinatorial functions and call routines (all*, lex*, ran*), this approach creates the combinations from the binary representation of the integers below 2**len: each bit says whether the character at that position is included. I prefer this approach because I think it shows what is happening, is more transparent, and doesn't change the order of the array assignments.
data have;
str = "123456"; output;
str = "ABCD"; output;
run;
data want;
set have;
length comb $20.;
len = length(str);
/* Loop through all possible combinations (0 gives the empty one) */
do _i = 0 to 2**len - 1;
/* Store the current iteration number in binary, len digits wide */
_bin = putn(_i, cats("binary", len, "."));
/* Initialise an empty output variable */
comb = "";
/* Loop through each value in the input string */
do _k = 1 to len;
/* Check if the kth digit of the binary representation is 1 */
/* And if so add the kth input character to the output */
if substr(_bin, _k, 1) = "1" then
comb = cats(comb, substr(str, _k, 1));
end;
output;
end;
/* Clean up temporary variables, commented so you can see what's happening */
/* drop _:; */
run;
If you do want permutations, a similar approach is possible using factoradic representations of the numbers. However, I would recommend using a combinatorial function instead, as the conversions would be much more involved. It's probably quite a nice coding exercise for learning, though.
It would be great if SAS had a function for reducing strings by boolean patterns, but it probably wouldn't get much use.
bsubstr("ABCD", "1010") --> "AC"
bsubstr("ABCD", "1110") --> "ABC"
bsubstr("ABCD", "0001") --> "D"
SAS has in-built functions to calculate combinations & permutations, allcomb and allperm.
SAS Documentation for ALLCOMB function : http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003112305.htm
I am trying to create a macro that computes the inner (dot) product of a vector and a matrix.
Y*X*t(Y) ## equivalent to the Sum(yi*Xij*yj)
I don't have IML, so I try to do it using array manipulation.
How do I create a multidimensional array from the data, to avoid index translation within a single array?
How do I debug my loop, or at least print some variables to check my program?
How do I delete temporary variables?
I am a SAS newbie, but this is what I have tried so far:
%macro dot_product(X = ,y=, value= );
/* read number of rows */
%let X_id=%sysfunc(open(&X));
%let nrows=%sysfunc(attrn(&X_id,nobs));
%let rc=%sysfunc(close(&X_id));
data &X.;
set &X.;
array arr_X{*} _numeric_;
set &y.;
array arr_y{*} _numeric_;
do i = 1 to &nrows;
do j = 1 to &nrows;
value + arr_y[i]*arr_X[j + &nrows*(i-1)]*arr_y[j];
end;
end;
run;
%mend;
When I run this :
%dot_product(X=X,y=y,value=val);
I get this error :
ERROR: Array subscript out of range at line 314 column 158.
I am using this to generate data :
data X;
array myCols{*} col1-col5;
do i = 1 to 5;
do j = 1 to dim(myCols);
myCols{j}=ranuni(123);
end;
output;
end;
drop i j;
run;
/* create a vector y */
data y;
array myCols{*} col1-col5;
do j = 1 to dim(myCols);
myCols{j}=ranuni(123);
end;
output;
drop j;
run;
Thanks in advance for your help or any idea to debug my data.
Edit: The following relates to the description of the question, how to evaluate a quadratic form using dot, inner or scalar products. The actual code is nearly fine. end edit
If you want to reduce it to dot products, then your value is the dot product of the linearization of X_ij and the same linearization applied to Z_ij=Y_i*Y_j.
The other way is to split X_ij into its rows or columns, depending on the linearization of the matrix, and compute a separate dot product of Y with, say, each row. You then compute the dot product of the resulting vector with Y again.
Edit added: The length nrows of the nested loops in the code should be determined from the length of the vector y, perhaps with a check that the length of x is nrows*nrows.
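A minimal sketch of the row-by-row approach, written against the X and y datasets generated in the question. It assumes X is square with columns col1-col5 and that y has a single row with the same column names; the names y1-y5, row_dot and quad_form are introduced here for illustration.

```sas
data quad_form (keep=value);
  /* read y once; rename its columns so they do not collide with X's, */
  /* and rely on variables read by SET being retained across rows     */
  if _n_ = 1 then set y (rename=(col1-col5 = y1-y5));
  set X end=last;
  array xrow{*} col1-col5;          /* current row i of X */
  array yv{*} y1-y5;                /* the vector y */

  /* dot product of row i of X with y */
  row_dot = 0;
  do j = 1 to dim(xrow);
    row_dot = row_dot + xrow{j} * yv{j};
  end;

  /* _n_ is the row index i; accumulate y_i * (X_i . y) */
  value + yv{_n_} * row_dot;

  if last then output;              /* one row holding the quadratic form */
run;
```

Because each iteration of the data step reads one row of X, no index translation into a flattened array is needed. If X had more rows than y has elements, yv{_n_} would go out of range, so a check on the dimensions would be worth adding.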
I am trying to loop through a column with 50000 rows. I would like to compare the value in row i with the value in row i+1. The only way I know how to do this is by defining an array. However, there is only one variable, i.e. the column name (e.g. Col), with 50000 observations in it. When I use:
array Transform {50000} Col
where Transform is the name of the array and Col is the column name in my dataset, I receive a subscript error because there are too few variables, i.e. only 1 instead of 50000. I have tried replacing {50000} with {50000,1} (and even {*}) so the compiler recognizes that there are 50k observations and only one column. I have also attempted to transpose the dataset, but this seems difficult because I later need to add another variable to the dataset that depends on the values in rows i and i+1.
Is there a method to loop through the column to compare i and (i+1) using any method (not necessarily an array)? Thanks for the help :)
Example of using LAG:
data input;
infile cards;
input transform;
cards;
3
5
8
12
16
;
run;
data comp;
set input;
transform_change = transform - lag1(transform);
run;
For reversed order of rows:
data input_rownum / view=input_rownum;
set input;
rownum = _N_;
run;
proc sort data=input_rownum out=input_reversed;
by descending rownum;
run;
data comp_reverse;
set input_reversed;
transform_change = transform - lag1(transform);
run;
LAG1 returns the previous value of the variable, LAG2 the value two rows back, and so on. Consult the documentation for more.
Arrays work across variables, so they aren't suitable for your task here. There are a couple of options. Given the small number of rows, the easiest is probably to merge the dataset with itself, with the row number offset by one. You can then do your comparison.
data want;
merge have have (firstobs=2 rename=(col=col_plus1));
run;
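If you only want to compare row i with row i+1, you could use the lag function. It returns the value from the previous time the function was executed, which is the previous row only if lag is called on every row (beware of calling it inside conditional logic or loops, where it may not run for every row).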
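A short sketch of the safe pattern, using the input dataset from the LAG example above: call lag unconditionally on every row, store the result, and apply any conditions to the stored value afterwards. The names lag_demo, prev and change are illustrative.

```sas
data lag_demo;
  set input;
  prev = lag(transform);            /* executed on every row, so prev is the previous row */
  if _n_ > 1 then change = transform - prev;   /* condition applied after the lag call */
run;
```

Writing if _n_ > 1 then change = transform - lag(transform); instead would skip the lag call on the first row, so the queue would be one row behind from then on.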