SAS Array Dedupe - arrays

I have a question about the SAS code below. I am new to arrays and am not sure exactly what the code is doing. My understanding is that there are two indices. I believe it dedupes the SAS data set by those two indices, but I am not sure. Thanks for your help!
data unix.txn_match_part_four_01;
set unix.txn_match_part_four_00;
format id_one1-id_one95000 BEST12. id_two1-id_two95000 BEST12.;
array id_one{95000} id_one1-id_one95000;
array id_two{95000} id_two1-id_two95000;
retain id_one1-id_one95000;
retain id_two1-id_two95000;
if _n_ = 1 then i = 1;
else i + 1;
do j = 1 to i;
if clm_idx = id_one{j} then delete;
end;
do k = 1 to i;
if txn_idx = id_two{k} then delete;
end;
id_one{i}=clm_idx;
id_two{i}=txn_idx;
run;

Related

Assign an array's value as a dimension to another array in SAS

I've been working on some complicated code and am stuck at the end, where I need to use one array's value as the dimension parameter (index) of another array. A snippet from my code:
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3;
ARRAY buracc_repay(3) buracc_repay1 - buracc_repay3;
ARRAY ocs_repay(3) ocs_repay1 - ocs_repay3;
jj = 0;
do until (jj>=3);
jj=jj+1;
If length(strip(match_flag(jj))) = 1 then do;
temp_match_fl(jj) = match_flag(jj);
end;
Else If length(strip(match_flag(jj))) > 1 then do;
j1 = 0;
min_diff = 99999999;
do until (j1>=length(strip(match_class(jj))));
j1=j1+1;
retain min_diff;
n=substr(strip(match_flag(jj)),j1,1);
If (min_diff > abs(buracc_repay(jj)-ocs_repay(n))) then do;
min_diff = abs(buracc_repay(jj)-ocs_repay(n));
temp_match_fl(jj) = n;
end;
end;
end;
kk=temp_match_fl(jj);
/* buracc_repay(jj) = ocs_repay(kk);*/
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
end;
run;
Now I need to use the value stored in temp_match_fl(jj) as the dimension parameter of another array. How can I achieve that? Neither of the last two statements works:
buracc_repay(jj) = ocs_repay(kk);
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
Can someone please suggest a fix?
Thanks!
Actually your last two statements as written do work. Are you getting an error, or unexpected results? Can you make a simple example like below that shows the problem?
Note that for this to work, it's essential that the value of temp_match_fl(jj) is 1, 2, or 3, because your OCS_REPAY array has three elements. From the code you've shown, it's not clear if that is always true. You don't show the match_flag array.
data want ;
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3 (1 2 3) ;
array buracc_repay(3) buracc_repay1 - buracc_repay3 (10 20 30) ;
array ocs_repay(3) ocs_repay1 - ocs_repay3 (100 200 300) ;
jj=1 ;
kk=2 ;
*buracc_repay(jj) = ocs_repay(kk); *this works ;
put temp_match_fl(jj)= ; *debug to confirm value is 1 2 or 3 ;
buracc_repay(jj) = ocs_repay(temp_match_fl(jj)); *this also works;
put (buracc_repay:)(=) temp_match_fl1=; *check output ;
run ;
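The range requirement mentioned above can also be checked explicitly before indexing. Here is a minimal sketch, assuming made-up initial values (the 5 and 0 are deliberately out of range) and a data _null_ step used purely for illustration:

```sas
data _null_;
  /* Hypothetical values: 1 is a valid index, 5 and 0 are not */
  array temp_match_fl(3) temp_match_fl1-temp_match_fl3 (1 5 0);
  array buracc_repay(3) buracc_repay1-buracc_repay3 (10 20 30);
  array ocs_repay(3) ocs_repay1-ocs_repay3 (100 200 300);

  do jj = 1 to dim(temp_match_fl);
    kk = temp_match_fl(jj);
    /* Only index ocs_repay when kk is within its bounds */
    if 1 <= kk <= dim(ocs_repay) then buracc_repay(jj) = ocs_repay(kk);
    else put 'Skipping out-of-range index: ' jj= kk=;
  end;

  put (buracc_repay:)(=);
run;
```

With a guard like this, an unexpected value in temp_match_fl writes a note to the log instead of stopping the step with a subscript-out-of-range error.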

Recursion in Array SAS

I have an existing collection of variables a_0,...,a_45 where a_i represents the amount of stuff I have on day i. I'd like to create a new collection of variables b_0,...,b_45 to represent the incremental change in stuff I have on day i (i.e. b_k=a_k-a_(k-1) ). My approach:
data test;
set dataset;
array a a_0-a_45;
array b b_0-b_45;
b(1)=a(1);
do i=2 to 45;
b(i)=a(i)-a(i-1);
end;
run;
However my b variables just come out missing.
What initial values do you have for a_1 to a_45 before you start the loop? As you are not initialising them (except for a_0 ≡ a(1)), every b(i) term will be a difference of two a terms, of which at least one will be missing, unless these variables are populated in your input dataset.
Here is some sample code showing that the delta computation is correct when the variable names in the data set align with the variables named in the array statement in the data step.
Sample data
data have(keep=product_id note a_:);
do product_id = 1 to 100;
length note $15;
array amount a_0-a_45;
call missing(of amount(*));
if (ranuni(123) < 0.5) then do;
note = 'static deltas';
static_delta = ceil(5 * ranuni(123));
amount(1) = static_delta;
do inventory_day = 2 to dim(amount);
amount(inventory_day) = amount(inventory_day-1) + static_delta;
end;
end;
else do;
note = 'random deltas';
amount(1) = ceil(5 * ranuni(123));
do inventory_day = 2 to dim(amount);
amount(inventory_day) = max ( 0, amount(inventory_day-1) + floor(10 * ranuni(123)) - 5 );
end;
end;
OUTPUT;
end;
run;
Compute deltas
data want;
set have;
array amount a_0-a_45;
array delta b_0-b_45;
delta(1) = amount(1);
do i=2 to dim(amount);
delta(i) = amount(i) - amount(i-1);
end;
drop i;
format a_: b_: 4.;
run;
As Richard already suggested in his comment while I was writing this, the only error in your code is that the loop should run from 2 to 46, because the array has 46 elements (a_0 through a_45). The code below should work for you.
%macro f();
data dataset;
%do i = 0 %to 45;
a_&i. = ranuni(2);
%end;
run;
%mend;
%f();
data test;
set dataset;
array a1 a_0-a_45;
array b1 b_0-b_45;
/* This line will help in avoiding b_0 to have a missing value */
b1(1)=a1(1);
do i=2 to 46;
b1(i)=a1(i)-a1(i-1);
end;
run;
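Another option, not taken from the answers above, is to give the arrays explicit bounds so that the subscript matches the variable suffix and the 45-versus-46 confusion does not arise. A minimal sketch, assuming a one-row input data set named dataset (as in the question) filled here with random values just for illustration:

```sas
/* Hypothetical input: one row with a_0..a_45 filled with random values */
data dataset;
  array a{0:45} a_0-a_45;
  do i = lbound(a) to hbound(a);
    a{i} = ranuni(2);
  end;
  drop i;
run;

data test;
  set dataset;
  /* Bounds 0:45 make a{k} refer to a_k and b{k} refer to b_k */
  array a{0:45} a_0-a_45;
  array b{0:45} b_0-b_45;
  b{0} = a{0};                       /* no previous day for day 0 */
  do i = lbound(a) + 1 to hbound(a);
    b{i} = a{i} - a{i-1};            /* b_k = a_k - a_(k-1) */
  end;
  drop i;
run;
```

Using lbound and hbound rather than hard-coded limits keeps the loop correct if the number of days changes later.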

SAS Summing a dynamic range in array

I have an array with totals for 210 days. I need to find the sum of all 90 day ranges. The new array is med_sum. So med_sum(1) =sum(of Total(32)-total(121)), then med_sum(2)=sum(of total(33)-total(122)), and so on, 90 different times all the way to med_sum(90)=sum(of total(121)-total(210)).
Below is the syntax, but the sum(of) function isn't allowing me to do this and errors out. I have tried quite a few different options but have been unable to find anything that works.
Thank you in advance!!
data work.total_base_3;
set work.total_base_2;
array med_total(*) total1-total210;
array med_sum(*) avg1-avg90;
do i = 1 to 90;
med_sum(i)=sum(of med_total(i+31)-med_total(i+120));
end;
run;
You cannot use array references in variable lists, just actual variable names. So you want to generate 90 sums of 90 values with the window sliding. In essence you want
avg1 = sum(of total32 - total121);
avg2 = sum(of total33 - total122);
avg3 = sum(of total34 - total123);
You could use macro logic to just generate that series of statements. But if you look at the relationship between the variables you can see that
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
So your loop will look something like:
med_sum(1) = sum(of total32-total121);
do n=1 to dim(med_sum)-1;
med_sum(n+1) = sum(med_sum(n),med_total(n+1+120),-1*med_total(n+31));
end;
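The same recurrence can be tried on a small scale first. This is only a sketch, assuming 10 totals and a window of 3 instead of 210 and 90, with made-up values total1=1 through total10=10:

```sas
data demo;
  array med_total(*) total1-total10;
  array med_sum(*) avg1-avg8;

  /* Hypothetical test values: total1=1, total2=2, ..., total10=10 */
  do i = 1 to dim(med_total);
    med_total(i) = i;
  end;

  /* First window is written with real variable names */
  med_sum(1) = sum(of total1-total3);

  /* Slide the window: add the value entering, subtract the value leaving */
  do n = 1 to dim(med_sum) - 1;
    med_sum(n+1) = sum(med_sum(n), med_total(n+3), -1*med_total(n));
  end;

  drop i n;
run;
```

Here avg1 should come out as 6 (1+2+3) and avg8 as 27 (8+9+10), which makes it easy to confirm the offsets before scaling up to the 90-day window.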
Here's a sample that you should be able to extend to your data (in your case you would change the 3s to 90s):
Case 1: data in rows:
data test;
keep obs;
do i=1 to 10;
obs = i;
output;
end;
run;
data test1;
set test;
keep obs sum;
array x[3];
retain x;
x[mod(_n_ -1,3)+1] = obs;
if (_n_ >= 3)then do;
sum = 0;
do i = 1 to 3;
sum= sum + x[i];
end;
end;
run;
Case 2: data in columns (use the test dataset from above):
proc transpose data=test out=testrow;
var obs;
run;
data test2;
set testrow;
array med_total(*) col1-col10;
array med_sum(8) ;
do i = 3 to 10;
med_sum[i-2]=0;
do j = 1 +(i-3) to i;
med_sum(i-2)=med_sum[i-2] + med_total(j);
end;
end;
run;

Reset Retained Array Values at the end of each observation in SAS

I'm running the array code below:
DATA Want;
SET Have;
ARRAY Dates{2562} (&Start_Date:&End_Date);
DO i = 1 TO DIM(Dates);
IF Dates[i] >= ObStartDate AND Dates[i] <= ObEndDate THEN Dates[i] = 1;
END;
RUN;
I have found the minimum date (i.e. the first ObStartDate in my dataset) and the maximum date (i.e. the last ObEndDate in my dataset), and those values are stored in &Start_Date and &End_Date. The array is created correctly and holds unformatted SAS date values for each observation. I also want to go through each observation and, if a value in the Dates array falls between that observation's start and end dates, replace that value with 1.
Here's where it starts to go wrong. It retains the ObStartDate and ObEndDate from observation to observation and only replaces different Dates[i] when it picks up a lower ObStartDate or higher ObEndDate.
Is there a way I can reset ObStartDate and ObEndDate to each observation's own values when the array's DO loop moves on to the next observation?
I've tried creating the array and running the DO loop in a different data step. I've also tried putting loops inside loops and arrays inside loops, and so on. I may have been close to success, but this is the code that I thought would work and the first code that I wrote.
Any help will be greatly appreciated.
Cheers.
Here is some code to see what I mean
DATA Haveyay;
ATTRIB Ob LENGTH=3
ObStartDate Length=3
ObEndDate Length=3;
INFILE datalines DELIMITER='~';
INPUT Ob ObStartDate ObEndDate ;
DATALINES;
1~1~8
2~2~5
3~5~10
4~1~4
5~2~3
6~4~7
7~7~10
8~3~4
9~3~9
10~2~9
;
RUN;
PROC SQL Noprint;
SELECT min(ObStartDate), max(ObEndDate) into :Start_Date, :End_Date
FROM Haveyay;
QUIT;
DATA Wantyay;
SET Haveyay;
ARRAY Dates{10} (&Start_Date:&End_Date);
DO i = 1 to DIM(Dates);
IF Dates[i] >= ObStartDate AND Dates[i] <= ObEndDate THEN Dates[i] = 1;
END;
RUN;
It looks like your problem may be that you are expecting the values in the dates array to be reset to their original values with each observation. In reality, the array statement initialises the values in the array only once, before any data is loaded. Because array elements that are given initial values are automatically retained, each change you make to a member of the array will be carried forward into later observations.
You can use a second loop to reset the date values after outputting:
do i = 1 to dim(dates);
if obstartdate <= dates[i] <= obenddate then dates[i] = 1;
end;
output;
do i = 1 to dim(dates);
dates[i] = &start_date. + i - 1;
end;
Or, more compactly, calculate the date from i and the macro variable rather than from the array:
do i = 1 to dim(dates);
_date = &start_date + i - 1;
dates[i] = ifn(ObStartDate <= _date <= ObEndDate , 1, _date);
end;
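Put together as a full step, the compact version might look like the sketch below. It assumes a cut-down copy of the Haveyay data and a %let standing in for the value that the PROC SQL step would normally supply; the array is declared without initial values because every element is recomputed on each observation:

```sas
/* Stand-in for the macro variable created by PROC SQL in the question */
%let Start_Date = 1;

/* Cut-down, hypothetical copy of the sample data */
data Haveyay;
  input Ob ObStartDate ObEndDate;
  datalines;
1 1 8
2 2 5
3 5 10
;
run;

data Wantyay;
  set Haveyay;
  array Dates{10};                 /* creates Dates1-Dates10 */
  do i = 1 to dim(Dates);
    _date = &Start_Date + i - 1;   /* date represented by slot i */
    /* 1 if the date falls in this observation's range, else the date itself */
    Dates[i] = ifn(ObStartDate <= _date <= ObEndDate, 1, _date);
  end;
  drop i _date;
run;
```

Since nothing is carried over between observations, no reset loop or explicit OUTPUT statement is needed in this variant.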

SAS Do loops with multiple variables

So I have a string with text in it for example "Jonathan Bob Thomas Smith" and I have partitioned the words into 4 variables (OLDVAR1-4) so OLDVAR1 would be Jonathan and OLDVAR2 would be Bob etc. What I want to do is rewrite the following code with a do loop:
NewVar1 = Index(String,OldVar1);
NewVar2 = Index(String,OldVar2);
NewVar3 = Index(String,OldVar3);
NewVar4 = Index(String,OldVar4);
I have tried:
Array NewVar[i];
Do i = 1 to 4;
NewVar[i] = Index(String,OldVar[i]);
end;
but I get the error message "Undeclared array referenced OldVar" and I can't seem to do multiple references in arrays.
Any help is appreciated.
You just need to do what SAS tells you to do: declare the array OldVar. So your code will look like:
Array NewVar[4];
Array OldVar[4];
Do i = 1 to 4;
NewVar[i] = Index(String,OldVar[i]);
end;
BTW, you can't declare an array using i as its dimension. The dimension has to be a numeric constant (or * when you supply a variable list), not a data step variable.
You'll have to specify an actual number of elements when declaring the array. Basic syntax is like: Array arrayName(no. of elements) variableList;
e.g.
data test1;
string='Jonathan Bob Thomas Smith';
Oldvar1='Bob';
Oldvar2='Smith';
Oldvar3='mas';
Oldvar4='tha';
run;
data test2;
set test1;
Array NewVar(4) Newvar1-Newvar4;
Array Oldvar(4) Oldvar1-Oldvar4; *Additional array that's used in do loop;
do i=1 to 4;
NewVar[i] = Index(String,OldVar[i]);
end;
drop i;
run;
