SAS: Summing a dynamic range in an array

I have an array with totals for 210 days, and I need the sum of each 90-day window. The new array is med_sum, so med_sum(1) = sum(of total(32)-total(121)), med_sum(2) = sum(of total(33)-total(122)), and so on, 90 times, ending with med_sum(90) = sum(of total(121)-total(210)).
Below is my code, but the sum(of ...) call doesn't accept this and the step errors out. I have tried quite a few different options but haven't found anything that works.
Thank you in advance!!
data work.total_base_3;
set work.total_base_2;
array med_total(*) total1-total210;
array med_sum(*) avg1-avg90;
do i = 1 to 90;
med_sum(i)=sum(of med_total(i+31)-med_total(i+120));
end;
run;

You cannot use array references in variable lists, only actual variable names. So you want to generate 90 sums of 90 values each, with the window sliding forward one day at a time. In essence you want:
avg1 = sum(of total32 - total121);
avg2 = sum(of total33 - total122);
avg3 = sum(of total34 - total123);
You could use macro logic to generate that series of statements. But if you look at the relationship between consecutive sums, you can see that:
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
So your loop will look something like:
med_sum(1) = sum(of total32-total121);
do n=1 to dim(med_sum)-1;
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
end;
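A self-contained sketch of the sliding-window recurrence, run against a hypothetical HAVE data set in which total_k = k, so the results are easy to check by hand (avg1 = 32 + ... + 121 = 6885, and each later window is 90 larger). HAVE and WANT are placeholder names; the TOTAL and AVG variables match the question.

```sas
/* hypothetical sample: one row where total_k = k */
data have;
  array t(210) total1-total210;
  do i = 1 to 210;
    t(i) = i;
  end;
  drop i;
run;

data want;
  set have;
  array med_total(*) total1-total210;
  array med_sum(*) avg1-avg90;
  /* first window (days 32-121) uses a real variable list, which SUM(OF ...) accepts */
  med_sum(1) = sum(of total32-total121);
  /* each later window = previous window + day entering - day leaving */
  do n = 1 to dim(med_sum) - 1;
    med_sum(n+1) = sum(med_sum(n), med_total(n+121), -med_total(n+31));
  end;
  drop n;
run;
```

With non-integer totals, the repeated add/subtract can accumulate tiny floating-point differences relative to summing each window directly; over 90 steps this is usually negligible, but if exact agreement with SUM(OF ...) matters, generating the 90 explicit statements with macro code avoids it.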

Here's a sample that you should be able to extend to your data (in your case you would change the 3s to 90s):
Case 1: data in rows
data test;
keep obs;
do i=1 to 10;
obs = i;
output;
end;
run;
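data test1;
set test;
keep obs sum;
array x[3];                 /* x1-x3 hold the three most recent values of OBS */
retain x;                   /* keep the window values from one observation to the next */
x[mod(_n_ -1,3)+1] = obs;   /* circular buffer: overwrite the slot holding the oldest value */
if (_n_ >= 3) then do;      /* sum only once three values have been read */
sum = 0;
do i = 1 to 3;
sum= sum + x[i];
end;
end;
run;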
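The same circular-buffer idea can be parameterised with a macro variable, so only one value changes when going from a window of 3 to 90. This sketch swaps the RETAINed array for a _TEMPORARY_ array (retained automatically and not written to the output) and the inner loop for SUM(OF x[*]); the macro variable W and the variable WIN_SUM are illustrative names, not from the original. Note that SUM(OF ...) ignores missing values, whereas the `sum + x[i]` loop above returns missing if any value in the window is missing.

```sas
/* hypothetical window width; use 90 for the original question */
%let w = 3;

data test;
  do obs = 1 to 10;
    output;
  end;
run;

data test1;
  set test;
  keep obs win_sum;
  /* _TEMPORARY_ array: values persist across observations */
  array x[&w] _temporary_;
  /* overwrite the slot holding the oldest value */
  x[mod(_n_ - 1, &w) + 1] = obs;
  /* sum only once the buffer holds a full window */
  if _n_ >= &w then win_sum = sum(of x[*]);
run;
```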
Case 2: data in columns (use the TEST data set from above)
proc transpose data=test out=testrow;
var obs;
run;
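data test2;
set testrow;
array med_total(*) col1-col10;   /* the 10 values, now in a single row */
array med_sum(8) ;               /* med_sum1-med_sum8: one sum per full window */
do i = 3 to 10;                  /* i is the last column of the current window */
med_sum[i-2]=0;
do j = 1 +(i-3) to i;            /* j runs over columns i-2 to i */
med_sum(i-2)=med_sum[i-2] + med_total(j);
end;
end;
run;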

Related

Transpose a correlation matrix into one long vector in SAS

I'm trying to turn a correlation matrix into one long column vector, so that I have the following structure:
data want;
input _name1_$ _name2_$ _corr_;
datalines;
var1 var2 0.54
;
run;
I have the following code, which outputs name1 and corr; however, I'm struggling to get name2!
DATA TEMP_1
(DROP=I J);
ARRAY VAR[*] VAR1-VAR10;
DO I = 1 TO 10;
DO J = 1 TO 10;
VAR(J) = RANUNI(0);
END;
OUTPUT;
END;
RUN;
PROC CORR
DATA=TEMP_1
OUT=TEMP_CORR
(WHERE=(_NAME_ NE " ")
DROP=_TYPE_)
;
RUN;
PROC SORT DATA=TEMP_CORR; BY _NAME_; RUN;
PROC TRANSPOSE
DATA=TEMP_CORR
OUT=TEMP_CORR_T
;
BY _NAME_;
RUN;
Help is appreciated
You're close. You're running into an issue with the _NAME_ variable, because PROC TRANSPOSE also creates a variable called _NAME_ in its output data set (holding the names of the transposed variables). If you rename the input one, you get what you want. I also list the variables explicitly and add some RENAME= data set options to get what you likely want.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr))
;
by name1;
var var1-var10;
RUN;
Edit: If you don't want duplicates, you can add a WHERE= data set option to the OUT= data set.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr) where = (name1 > name2))
;
by name1;
var var1-var10;
RUN;
You only need an ARRAY and the VNAME() function. To output just the upper triangle, set the lower bound of the DO loop to _N_.
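data want ;
set corr;   /* CORR: the OUTP= data set from PROC CORR, with _TYPE_ kept */
where _type_ = 'CORR';
/* define the array before LENGTH: otherwise _numeric_ would also pick up _corr_ */
array x _numeric_;
length _name1_ _name2_ $32 _corr_ 8 ;
keep _name1_ _name2_ _corr_;
_name1_=_name_;
do i=_n_ to dim(x);
_name2_ = vname(x(i));
_corr_ = x(i);
output;
end;
run;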
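A self-contained sketch of the ARRAY/VNAME approach that also leaves out the diagonal (each variable's correlation of 1 with itself) by starting the loop at _N_ + 1, and that lists VAR1-VAR10 explicitly instead of using _NUMERIC_. It assumes the CORR rows of the OUTP= data set appear in the same order as the array variables, which is how PROC CORR writes them for a single VAR list. The array name V is used instead of VAR because VAR is also a SAS function name; the data set names CORR and WANT are placeholders.

```sas
/* random sample data, as in the question */
data temp_1 (drop=i j);
  array v[*] var1-var10;
  do i = 1 to 10;
    do j = 1 to 10;
      v(j) = ranuni(0);
    end;
    output;
  end;
run;

proc corr data=temp_1 outp=corr noprint;
  var var1-var10;
run;

data want;
  set corr;
  where _type_ = 'CORR';
  /* explicit list: only the correlation columns */
  array x var1-var10;
  length _name1_ _name2_ $32;
  keep _name1_ _name2_ _corr_;
  _name1_ = _name_;
  /* row _N_ describes variable number _N_; start one past the diagonal */
  do i = _n_ + 1 to dim(x);
    _name2_ = vname(x(i));
    _corr_ = x(i);
    output;
  end;
run;
```

For 10 variables this yields 9 + 8 + ... + 0 = 45 rows, one per unordered pair.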

Assign an array's value as a dimension to another array in SAS

I've been working on a complicated program and am stuck at the end, where I need to use the value of one array element as the subscript (what I call the dimension parameter) of another array. A snapshot of my code:
For example:
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3;
ARRAY buracc_repay(3) buracc_repay1 - buracc_repay3;
ARRAY ocs_repay(3) ocs_repay1 - ocs_repay3;
jj = 0;
do until (jj>=3);
jj=jj+1;
If length(strip(match_flag(jj))) = 1 then do;
temp_match_fl(jj) = match_flag(jj);
end;
Else If length(strip(match_flag(jj))) > 1 then do;
j1 = 0;
min_diff = 99999999;
do until (j1>=length(strip(match_class(jj))));
j1=j1+1;
retain min_diff;
n=substr(strip(match_flag(jj)),j1,1);
If (min_diff > abs(buracc_repay(jj)-ocs_repay(n))) then do;
min_diff = abs(buracc_repay(jj)-ocs_repay(n));
temp_match_fl(jj) = n;
end;
end;
end;
kk=temp_match_fl(jj);
/* buracc_repay(jj) = ocs_repay(kk);*/
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
end;
run;
Now I need to use the value stored in temp_match_fl(jj) as the subscript of another array. How can I achieve that? Neither of the last two statements works:
buracc_repay(jj) = ocs_repay(kk);
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
Can someone please suggest a fix?
Thanks!
Actually, your last two statements do work as written. Are you getting an error, or unexpected results? Can you make a simple example, like the one below, that shows the problem?
Note that for this to work, it's essential that the value of temp_match_fl(jj) is 1, 2, or 3, because your OCS_REPAY array has three elements. From the code you've shown, it's not clear if that is always true. You don't show the match_flag array.
data want ;
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3 (1 2 3) ;
array buracc_repay(3) buracc_repay1 - buracc_repay3 (10 20 30) ;
array ocs_repay(3) ocs_repay1 - ocs_repay3 (100 200 300) ;
jj=1 ;
kk=2 ;
*buracc_repay(jj) = ocs_repay(kk); *this works ;
put temp_match_fl(jj)= ; *debug to confirm value is 1 2 or 3 ;
buracc_repay(jj) = ocs_repay(temp_match_fl(jj)); *this also works;
put (buracc_repay:)(=) temp_match_fl1=; *check output ;
run ;
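Following the note that the subscript must be 1, 2, or 3, here is a minimal sketch of guarding the lookup so an out-of-range or missing value is logged instead of stopping the step with an "Array subscript out of range" error. The initial values (2 0 5) are hypothetical: one valid subscript and two invalid ones.

```sas
data want;
  array ocs_repay(3) ocs_repay1-ocs_repay3 (100 200 300);
  array buracc_repay(3) buracc_repay1-buracc_repay3 (10 20 30);
  /* hypothetical subscripts: 2 is valid, 0 and 5 are out of range */
  array temp_match_fl(3) temp_match_fl1-temp_match_fl3 (2 0 5);
  do jj = 1 to dim(buracc_repay);
    kk = temp_match_fl(jj);
    /* only index OCS_REPAY with a whole number between 1 and its dimension */
    if 1 <= kk <= dim(ocs_repay) and kk = int(kk) then
      buracc_repay(jj) = ocs_repay(kk);
    else putlog 'NOTE: skipped invalid subscript ' jj= kk=;
  end;
run;
```

Here buracc_repay1 becomes 200 (ocs_repay2), while buracc_repay2 and buracc_repay3 keep their original values and two notes appear in the log.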

SAS Array Dedupe

I have a question about the SAS code below. I am new to arrays and am not sure exactly what this code is doing. My understanding is that there are two indices, and I believe the code dedupes the data set by those two indices, but I am not certain. Thanks for your help!
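data unix.txn_match_part_four_01;
set unix.txn_match_part_four_00;
format id_one1-id_one95000 BEST12. id_two1-id_two95000 BEST12.;
/* lookup lists of the CLM_IDX and TXN_IDX values already kept */
array id_one{95000} id_one1-id_one95000;
array id_two{95000} id_two1-id_two95000;
/* keep the lists across observations */
retain id_one1-id_one95000;
retain id_two1-id_two95000;
/* i counts every observation read, kept or deleted (sum statement retains i) */
if _n_ = 1 then i = 1;
else i + 1;
/* delete the row if its CLM_IDX was already kept on an earlier row */
do j = 1 to i;
if clm_idx = id_one{j} then delete;
end;
/* delete the row if its TXN_IDX was already kept on an earlier row */
do k = 1 to i;
if txn_idx = id_two{k} then delete;
end;
/* the row survives: remember both IDs */
id_one{i}=clm_idx;
id_two{i}=txn_idx;
run;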
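The code keeps a row only when neither its CLM_IDX nor its TXN_IDX appeared on an earlier kept row (a greedy one-to-one match). As an alternative technique, a minimal sketch of the same rule with two hash objects instead of 95,000-element arrays: it removes the row limit and the repeated scans, and it does not write the 190,000 lookup columns to the output. HAVE, WANT and the sample values are hypothetical. One behavioural difference: the array version always deletes a row whose CLM_IDX or TXN_IDX is missing (it compares against a still-empty slot), while this version keeps the first such row.

```sas
/* hypothetical sample data */
data have;
  input clm_idx txn_idx;
  datalines;
1 10
1 11
2 10
2 12
3 13
;
run;

data want;
  set have;
  if _n_ = 1 then do;
    /* one hash per ID type; key only, so the key doubles as the data */
    declare hash seen_clm();
    seen_clm.defineKey('clm_idx');
    seen_clm.defineDone();
    declare hash seen_txn();
    seen_txn.defineKey('txn_idx');
    seen_txn.defineDone();
  end;
  /* CHECK() returns 0 when the key is already stored */
  if seen_clm.check() = 0 or seen_txn.check() = 0 then delete;
  /* row survives: remember both IDs */
  rc = seen_clm.add();
  rc = seen_txn.add();
  drop rc;
run;
```

On the sample data this keeps (1, 10), (2, 12) and (3, 13): row 2 reuses claim 1 and row 3 reuses transaction 10.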

Recursion in Array SAS

I have an existing collection of variables a_0, ..., a_45, where a_i represents the amount of stuff I have on day i. I'd like to create a new collection of variables b_0, ..., b_45 to represent the incremental change in stuff I have on day i (i.e., b_k = a_k - a_(k-1)). My approach:
data test;
set dataset;
array a a_0-a_45;
array b b_0-b_45;
b(1)=a(1);
do i=2 to 45;
b(i)=a(i)-a(i-1);
end;
run;
However my b variables just come out missing.
What initial values do you have for a_1 to a_45 before you start the loop? As you are not initialising them (except for a_0 ≡ a(1)), every b(i) term will be a difference of two a terms, of which at least one will be missing, unless these variables are populated in your input data set.
Here is some sample code showing that the delta computation is correct when the variable names in the data set align with the variables named in the array statement in the data step.
Sample data
data have(keep=product_id note a_:);
do product_id = 1 to 100;
length note $15;
array amount a_0-a_45;
call missing(of amount(*));
if (ranuni(123) < 0.5) then do;
note = 'static deltas';
static_delta = ceil(5 * ranuni(123));
amount(1) = static_delta;
do inventory_day = 2 to dim(amount);
amount(inventory_day) = amount(inventory_day-1) + static_delta;
end;
end;
else do;
note = 'random deltas';
amount(1) = ceil(5 * ranuni(123));
do inventory_day = 2 to dim(amount);
amount(inventory_day) = max ( 0, amount(inventory_day-1) + floor(10 * ranuni(123)) - 5 );
end;
end;
OUTPUT;
end;
run;
Compute deltas
data want;
set have;
array amount a_0-a_45;
array delta b_0-b_45;
delta(1) = amount(1);
do i=2 to dim(amount);
delta(i) = amount(i) - amount(i-1);
end;
drop i;
format a_: b_: 4.;
run;
As Richard already suggested in his comment while I was writing this, the error in your loop is its upper bound: it should run from 2 to 46, because the array has 46 elements (a_0 through a_45). The code below should work for you.
%macro f();
data dataset;
%do i = 0 %to 45;
a_&i. = ranuni(2);
%end;
run;
%mend;
%f();
data test;
set dataset;
array a1 a_0-a_45;
array b1 b_0-b_45;
/* This line keeps b_0 from being missing: day 0 has no previous day */
b1(1)=a1(1);
do i=2 to 46;
b1(i)=a1(i)-a1(i-1);
end;
run;
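Taking the loop bound from DIM() instead of hard-coding 46 keeps it correct if the range of days changes. A small worked check on a hypothetical three-day data set (a_0 = 5, a_1 = 8, a_2 = 6), where the deltas should come out as b_0 = 5, b_1 = 3, b_2 = -2:

```sas
/* hypothetical three-day sample */
data have;
  input a_0 a_1 a_2;
  datalines;
5 8 6
;
run;

data want;
  set have;
  array a a_0-a_2;
  array b b_0-b_2;
  /* day 0 has no previous day, so its delta is the amount itself */
  b(1) = a(1);
  /* DIM(a) is 3 here and would be 46 for a_0-a_45 */
  do i = 2 to dim(a);
    b(i) = a(i) - a(i-1);
  end;
  drop i;
run;
```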

SAS - How to let array do loop output only once?

The following code, with its DO loop, outputs a record repeatedly if '3320' or '3321' appears in more than one of the DX columns. How do I use a DO loop without it outputting the same record multiple times? The commented-out OR statements solve this problem, but that approach is not practical for a longer list of variables.
options obs = 1000;
data NIS_2013.PARKINSONS;
set NIS_2013.NIS_2013_CORE;
array DX (25) $ dx1--dx25;
do i = 1 to 25;
if DX(i) IN ('3320', '3321') then output;
end;
run;
/* if DX1 IN ('3320', '3321')
OR DX2 IN ('3320', '3321')
OR DX3 IN ('3320', '3321')
OR DX4 IN ('3320', '3321')
... */
Remove the OUTPUT from the loop. Instead, create a flag that you then use to output the record.
If you're only searching for those two codes, I would suggest using WHICHC instead to search the array. I've included it in the code below but left it commented out.
options obs = 1000;
data NIS_2013.PARKINSONS;
set NIS_2013.NIS_2013_CORE;
array DX (25) $ dx1--dx25;
flag_parkinson=0;
do i = 1 to 25;
if DX(i) IN ('3320', '3321') then flag_parkinson=1;
end;
if flag_parkinson=1 then output;
*x = whichc('3320', of dx(*)) + whichc('3321', of dx(*));
*if x>0 then output;
run;
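A minimal sketch of the WHICHC approach with the check uncommented, run on a hypothetical three-column HAVE data set (the diagnosis codes other than 3320 and 3321 are arbitrary sample values). WHICHC returns the position of the first matching element or 0, so each record is tested once and output at most once, however many columns match.

```sas
/* hypothetical sample: record 3 contains 3321 twice */
data have;
  length dx1-dx3 $5;
  input dx1 $ dx2 $ dx3 $;
  datalines;
3320 4019 3320
2500 4019 2724
4019 3321 3321
;
run;

data parkinsons;
  set have;
  array dx(*) $ dx1-dx3;
  /* nonzero position means the code was found somewhere in the array */
  if whichc('3320', of dx(*)) or whichc('3321', of dx(*)) then output;
run;
```

Records 1 and 3 are each output once. Also note that the OBS= system option set in the question stays in effect for later steps in the same session; `options obs=max;` resets it.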
