So I have a string, for example "Jonathan Bob Thomas Smith", and I have split the words into 4 variables (OLDVAR1-OLDVAR4), so OLDVAR1 would be Jonathan, OLDVAR2 would be Bob, etc. What I want to do is rewrite the following code with a DO loop:
NewVar1 = Index(String,OldVar1);
NewVar2 = Index(String,OldVar2);
NewVar3 = Index(String,OldVar3);
NewVar4 = Index(String,OldVar4);
I have tried:
Array NewVar[i];
Do i = 1 to 4;
NewVar[i] = Index(String,OldVar[i]);
end;
but I get the error message "Undeclared array referenced OldVar" and I can't seem to do multiple references in arrays.
Any help is appreciated.
You just need to do what SAS tells you: declare the array OldVar. Your code will look like:
Array NewVar[4];
Array OldVar[4];
Do i = 1 to 4;
NewVar[i] = Index(String,OldVar[i]);
end;
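BTW, you can't give the array dimension as a DATA step variable such as i, even one that already holds a value: the dimension is fixed when the step is compiled, so it must be a number, a range, or * (a macro variable reference such as &n also works, because it resolves before compilation).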
You'll have to specify the actual number of elements when declaring the array (or * if you list the variables explicitly). The basic syntax is: Array arrayName(number of elements) variableList;
e.g.
data test1;
string='Jonathan Bob Thomas Smith';
Oldvar1='Bob';
Oldvar2='Smith';
Oldvar3='mas';
Oldvar4='tha';
;
run;
data test2;
set test1;
Array NewVar(4) Newvar1-Newvar4;
Array Oldvar(4) Oldvar1-Oldvar4; /* Additional array used in the DO loop (a *...; comment must not contain an unmatched quote) */
do i=1 to 4;
NewVar[i] = Index(String,OldVar[i]);
end;
drop i;
run;
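Because array dimensions are fixed when the step is compiled, a macro variable (not a DATA step variable) is one way to keep the word count in a single place. A minimal sketch, assuming the words are pulled out with SCAN and that $20 is long enough for any word; the macro variable NWORDS, the data set TEST3 and that length are illustrative choices, not part of the question:

```sas
/* Hypothetical: the word count lives in a macro variable, which   */
/* resolves before compilation, so it can size the arrays.         */
%let nwords = 4;

data test3;
  string = 'Jonathan Bob Thomas Smith';
  /* $20 is an assumed length; shorter words are padded with blanks */
  array OldVar[&nwords] $ 20 OldVar1-OldVar&nwords;
  array NewVar[&nwords] NewVar1-NewVar&nwords;
  do i = 1 to dim(OldVar);
    OldVar[i] = scan(string, i, ' ');
    /* STRIP removes the padding; otherwise INDEX also looks for the trailing blanks */
    NewVar[i] = index(string, strip(OldVar[i]));
  end;
  drop i;
run;
```

The STRIP call matters whenever the OldVar variables are longer than the words they hold, which is typical when they are created by SCAN into a fixed-length variable.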
I'm trying to turn a correlation matrix into one long column vector with the following structure:
data want;
input _name1_$ _name2_$ _corr_;
datalines;
var1 var2 0.54
;
run;
I have the following code, which outputs name1 and corr; however, I'm struggling to get name2!
DATA TEMP_1
(DROP=I J);
ARRAY VAR[*] VAR1-VAR10;
DO I = 1 TO 10;
DO J = 1 TO 10;
VAR(J) = RANUNI(0);
END;
OUTPUT;
END;
RUN;
PROC CORR
DATA=TEMP_1
OUT=TEMP_CORR
(WHERE=(_NAME_ NE " ")
DROP=_TYPE_)
;
RUN;
PROC SORT DATA=TEMP_CORR; BY _NAME_; RUN;
PROC TRANSPOSE
DATA=TEMP_CORR
OUT=TEMP_CORR_T
;
BY _NAME_;
RUN;
Help is appreciated
You're close. You're running into a conflict with the _NAME_ variable, because PROC TRANSPOSE also creates a _NAME_ variable in its output. If you rename it, you get what you want. I also list the variables explicitly and add some RENAME= data set options to get what you likely want.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr))
;
by name1;
var var1-var10;
RUN;
Edit: If you don't want duplicates (or the diagonal), you can add a WHERE= data set option to the OUT= data set.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr) where = (name1 > name2))
;
by name1;
var var1-var10;
RUN;
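You can also do this with just an ARRAY and the VNAME() function. To output only the upper triangle (including the diagonal), set the lower bound of the DO loop to _N_. This reads the OUTP= data set from PROC CORR (called CORR here) with _TYPE_ still in it.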
data want ;
set corr;
where _type_ = 'CORR';
array x _numeric_; /* declared before _corr_ exists, so it holds only the correlation columns */
length _name1_ _name2_ $32 _corr_ 8 ;
keep _name1_ _name2_ _corr_;
_name1_=_name_;
do i=_n_ to dim(x);
_name2_ = vname(x(i));
_corr_ = x(i);
output;
end;
run;
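For an end-to-end check, here is a sketch that reruns the question's random data through PROC CORR with OUTP= (so _TYPE_ is kept) and, as a variation on the loop above, starts one column past the diagonal so only the 45 distinct pairs are written. The data set names CORR and PAIRS are illustrative, and listing var1-var10 explicitly in the ARRAY statement avoids any question of which numeric variables _NUMERIC_ picks up:

```sas
/* Same idea as TEMP_1 in the question: 10 rows of 10 random columns */
data temp_1(drop=i j);
  array var[*] var1-var10;
  do i = 1 to 10;
    do j = 1 to 10;
      var(j) = ranuni(0);
    end;
    output;
  end;
run;

/* Pearson correlations; OUTP= keeps _TYPE_ and _NAME_ */
proc corr data=temp_1 outp=corr noprint;
  var var1-var10;
run;

/* Strict upper triangle: start one column to the right of the diagonal */
data pairs;
  set corr;
  where _type_ = 'CORR';
  array x var1-var10;
  length _name1_ _name2_ $32 _corr_ 8;
  keep _name1_ _name2_ _corr_;
  _name1_ = _name_;
  do i = _n_ + 1 to dim(x);
    _name2_ = vname(x(i));
    _corr_ = x(i);
    output;
  end;
run;
```

With the WHERE statement, _N_ counts only the CORR rows, so on row k the loop starts at column k+1 and the last row writes nothing.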
I've been working on a complicated program and am stuck at the end, where I need to use one array's value as the subscript of another array. A snapshot from my code:
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3;
ARRAY buracc_repay(3) buracc_repay1 - buracc_repay3;
ARRAY ocs_repay(3) ocs_repay1 - ocs_repay3;
jj = 0;
do until (jj>=3);
jj=jj+1;
If length(strip(match_flag(jj))) = 1 then do;
temp_match_fl(jj) = match_flag(jj);
end;
Else If length(strip(match_flag(jj))) > 1 then do;
j1 = 0;
min_diff = 99999999;
do until (j1>=length(strip(match_class(jj))));
j1=j1+1;
retain min_diff;
n=substr(strip(match_flag(jj)),j1,1);
If (min_diff > abs(buracc_repay(jj)-ocs_repay(n))) then do;
min_diff = abs(buracc_repay(jj)-ocs_repay(n));
temp_match_fl(jj) = n;
end;
end;
end;
kk=temp_match_fl(jj);
/* buracc_repay(jj) = ocs_repay(kk);*/
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
end;
run;
Now I need to use the value stored in temp_match_fl(jj) as the subscript of another array. How can I achieve that? Neither of the last two statements works:
buracc_repay(jj) = ocs_repay(kk);
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
Can someone please suggest a fix?
Thanks!
Actually your last two statements as written do work. Are you getting an error, or unexpected results? Can you make a simple example like below that shows the problem?
Note that for this to work, it's essential that the value of temp_match_fl(jj) is 1, 2, or 3, because your OCS_REPAY array has three elements; any other value stops the step with an "Array subscript out of range" error. From the code you've shown, it's not clear if that is always true. You don't show the match_flag array.
data want ;
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3 (1 2 3) ;
array buracc_repay(3) buracc_repay1 - buracc_repay3 (10 20 30) ;
array ocs_repay(3) ocs_repay1 - ocs_repay3 (100 200 300) ;
jj=1 ;
kk=2 ;
*buracc_repay(jj) = ocs_repay(kk); *this works ;
put temp_match_fl(jj)= ; *debug to confirm value is 1 2 or 3 ;
buracc_repay(jj) = ocs_repay(temp_match_fl(jj)); *this also works;
put (buracc_repay:)(=) temp_match_fl1=; *check output ;
run ;
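If some flags can produce an index outside 1 to 3, a range check before the lookup keeps the step running. A minimal sketch, assuming (hypothetically) that the index is the first digit of each match_flag value; the flag values '2', '13' and '9' are made up. Since SUBSTR returns a character value, INPUT converts it explicitly instead of relying on SAS's automatic character-to-numeric conversion:

```sas
data guarded;
  /* hypothetical flag values; '9' deliberately points outside the array */
  array match_flag(3) $ 3 match_flag1-match_flag3 ('2' '13' '9');
  array buracc_repay(3) buracc_repay1-buracc_repay3 (10 20 30);
  array ocs_repay(3) ocs_repay1-ocs_repay3 (100 200 300);
  do jj = 1 to dim(buracc_repay);
    /* first digit of the flag, converted explicitly to a number */
    kk = input(substr(strip(match_flag(jj)), 1, 1), 1.);
    /* only use kk as a subscript when it is a valid element number */
    if 1 <= kk <= dim(ocs_repay) then buracc_repay(jj) = ocs_repay(kk);
    else put 'Subscript out of range: ' jj= kk=;
  end;
  drop jj kk;
run;
```

Here buracc_repay1 becomes 200, buracc_repay2 becomes 100, and buracc_repay3 keeps its value of 30, with a line in the log for the out-of-range flag.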
I have a question on the SAS code below. I am new to arrays and am not sure what the code is doing exactly. My understanding is that there are two index variables (clm_idx and txn_idx), and I believe the step is deduplicating the data set by them, but I am not sure. Thanks for your help!
data unix.txn_match_part_four_01;
set unix.txn_match_part_four_00;
format id_one1-id_one95000 BEST12. id_two1-id_two95000 BEST12.;
array id_one{95000} id_one1-id_one95000;
array id_two{95000} id_two1-id_two95000;
retain id_one1-id_one95000;
retain id_two1-id_two95000;
if _n_ = 1 then i = 1;
else i + 1;
do j = 1 to i;
if clm_idx = id_one{j} then delete;
end;
do k = 1 to i;
if txn_idx = id_two{k} then delete;
end;
id_one{i}=clm_idx;
id_two{i}=txn_idx;
run;
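One way to see what the step does is to run a scaled-down copy (10 slots instead of 95000) on a few made-up rows. Tracing it by hand, a row survives only if neither its CLM_IDX nor its TXN_IDX appeared in an earlier row that was kept, so it behaves more like a greedy one-to-one match in input order than a plain dedupe. Two side effects worth checking against the real data: I increases for every row read, including deleted ones, so 95000 caps the rows read rather than the rows kept; and a row with a missing CLM_IDX or TXN_IDX matches an empty slot and is deleted. The data set names PAIRS_IN and PAIRS_OUT are hypothetical:

```sas
/* Hypothetical claim / transaction index pairs */
data pairs_in;
  input clm_idx txn_idx;
  datalines;
1 10
1 11
2 10
2 12
3 13
;
run;

/* Same logic as the question, with 10 slots instead of 95000 */
data pairs_out;
  set pairs_in;
  array id_one{10} id_one1-id_one10;
  array id_two{10} id_two1-id_two10;
  retain id_one1-id_one10 id_two1-id_two10;
  if _n_ = 1 then i = 1;
  else i + 1;                       /* counts every row read, kept or not */
  do j = 1 to i;
    if clm_idx = id_one{j} then delete;   /* claim already used */
  end;
  do k = 1 to i;
    if txn_idx = id_two{k} then delete;   /* transaction already used */
  end;
  id_one{i} = clm_idx;              /* remember the pair that was kept */
  id_two{i} = txn_idx;
  keep clm_idx txn_idx;
run;
```

With these rows the step keeps (1,10), (2,12) and (3,13): (1,11) is dropped because claim 1 is taken, and (2,10) because transaction 10 is taken.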
I have an existing collection of variables a_0,...,a_45 where a_i represents the amount of stuff I have on day i. I'd like to create a new collection of variables b_0,...,b_45 to represent the incremental change in stuff I have on day i (i.e. b_k=a_k-a_(k-1) ). My approach:
data test;
set dataset;
array a a_0-a_45;
array b b_0-b_45;
b(1)=a(1);
do i=2 to 45;
b(i)=a(i)-a(i-1);
end;
run;
However my b variables just come out missing.
What initial values do you have for a_1 to a_45 before you start the loop? As you are not initialising them (except for a_0 ≡ a(1)), every b(i) term will be a difference of two a terms, at least one of which will be missing, unless these variables are populated in your input dataset.
Here is some sample code showing that the delta computation is correct when the variable names in the data set align with the variables named in the array statement in the data step.
Sample data
data have(keep=product_id note a_:);
do product_id = 1 to 100;
length note $15;
array amount a_0-a_45;
call missing(of amount(*));
if (ranuni(123) < 0.5) then do;
note = 'static deltas';
static_delta = ceil(5 * ranuni(123));
amount(1) = static_delta;
do inventory_day = 2 to dim(amount);
amount(inventory_day) = amount(inventory_day-1) + static_delta;
end;
end;
else do;
note = 'random deltas';
amount(1) = ceil(5 * ranuni(123));
do inventory_day = 2 to dim(amount);
amount(inventory_day) = max ( 0, amount(inventory_day-1) + floor(10 * ranuni(123)) - 5 );
end;
end;
OUTPUT;
end;
run;
Compute deltas
data want;
set have;
array amount a_0-a_45;
array delta b_0-b_45;
delta(1) = amount(1);
do i=2 to dim(amount);
delta(i) = amount(i) - amount(i-1);
end;
drop i;
format a_: b_: 4.;
run;
As Richard already suggested in his comment while I was writing this: basically, the only error in your code is that the loop should run from 2 to 46, because there are 46 elements in the array (a_0 through a_45). The code below should work for you.
%macro f();
data dataset;
%do i = 0 %to 45;
a_&i. = ranuni(2);
%end;
run;
%mend;
%f();
data test;
set dataset;
array a1 a_0-a_45;
array b1 b_0-b_45;
/* Without this line, b_0 would be missing */
b1(1)=a1(1);
do i=2 to 46;
b1(i)=a1(i)-a1(i-1);
end;
run;
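The test data doesn't need a macro: an array can fill a_0-a_45 directly, and using DIM() as the loop bound sidesteps the 45 versus 46 off-by-one. A sketch with made-up values a_k = (k+1)**2, chosen so that every delta b_k equals 2k+1 and is easy to check by hand (the data set names SQUARES and DELTAS are illustrative):

```sas
/* Hypothetical data: a_k = (k+1)**2 */
data squares;
  array a a_0-a_45;
  do i = 1 to dim(a);
    a(i) = i * i;
  end;
  drop i;
run;

data deltas;
  set squares;
  array a a_0-a_45;
  array b b_0-b_45;
  b(1) = a(1);               /* b_0 has no previous day */
  do i = 2 to dim(a);        /* DIM(a) is 46, so b_45 is filled too */
    b(i) = a(i) - a(i-1);
  end;
  drop i;
run;
```

For example b_45 = 46*46 - 45*45 = 91, which a loop ending at 45 would leave missing.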
In this DATA step I am reading a table called TEST_Table, which was created by an SQL query. The table contains multiple columns, including a block of columns named PREFIX_1 to PREFIX_20: each name starts with PREFIX_ followed by a number from 1 to 20.
What I would like to do is iteratively cycle through each column and analyze the value of that column.
Below is an example of what I am going for. As you can see, I would like to create a variable that increases on each iteration and then use that count as part of the variable name I am checking.
data TEST_Data;
set TEST_Table;
retain changing_number;
changing_number=1;
do while(changing_number<=20);
if PREFIX_changing_number='BAD_IDENTIFIER' then do;
PREFIX_changing_number='This is a bad part';
end;
end;
run;
What would be the best way to do this in SAS? I know I can do it by simply checking each value individually from 1 to 20.
if PREFIX_1 = 'BAD_IDENTIFIER' then do;
PREFIX_1 = 'This is a bad part';
end;
if PREFIX_2 = ...
But that would be really obnoxious as later I will be doing the same thing with a set of over 40 columns.
Ideas?
SOLUTION
data TEST_Data;
set TEST_Table;
array SC $ SC1-SC20;
do i=1 to dim(SC);
if SC{i}='xxx' then do;
SC{i}="bad part";
end;
end;
run;
Thank you for suggesting Arrays :)
You need to look up Array processing in SAS. Simply put, you can do something like this:
data TEST_Data;
set TEST_Table;
/* retain changing_number;  Remove this: even in your code it does nothing useful */
array prefixes prefix:; *one of a number of ways to do this;
changing_number=1;
do while(changing_number<=20);
if prefixes[changing_number]='BAD_IDENTIFIER' then do;
prefixes[changing_number]='This is a bad part';
end;
changing_number+1; *advance to the next element, otherwise the loop never ends;
end;
run;
A slightly better loop is:
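do changing_number = 1 to dim(prefixes);
if prefixes[changing_number]='BAD_IDENTIFIER' then prefixes[changing_number]='This is a bad part';
end;
That's better because the initialization, test, and increment are all in one statement, and it adapts to the number of array elements (DIM returns the number of elements in the array).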
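Here is a self-contained version of the DO loop above, run against a small stand-in for TEST_Table (five hypothetical PREFIX_ columns instead of twenty). The same step works unchanged for 20 or 40+ columns, since DIM picks up however many variables the prefix_: list matches. One thing to watch: if the PREFIX_ columns are shorter than 18 characters, the length of 'This is a bad part', the replacement is truncated to fit; the $20 length here is an assumption.

```sas
/* Hypothetical stand-in for TEST_Table */
data TEST_Table;
  length prefix_1-prefix_5 $20;
  input (prefix_1-prefix_5) (:$20.);
  datalines;
ok BAD_IDENTIFIER ok ok BAD_IDENTIFIER
BAD_IDENTIFIER ok ok ok ok
;
run;

data TEST_Data;
  set TEST_Table;
  array prefixes prefix_:;    /* every variable whose name starts with prefix_ */
  do changing_number = 1 to dim(prefixes);
    if prefixes[changing_number] = 'BAD_IDENTIFIER' then
      prefixes[changing_number] = 'This is a bad part';
  end;
  drop changing_number;
run;
```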