Reset Retained Array Values at the end of each observation in SAS - arrays

Im running the array code below
DATA Want;
SET Have;
ARRAY Dates{2562} (&Start_Date:&End_Date);
DO i = 1 TO DIM(Dates);
IF Dates[i] >= ObStartDate AND Dates[i] <= ObEndDate THEN Dates[i] = 1;
END;
RUN;
I have found the minimum date (ie first Obstartdate date of my dataset) and the maximum date (ie last ObEndDate date of my dataset) and those values are set to &Start_Date and &End_Date. The array creates itself correctly and enters unformatted SAS date values for each observation. I want to also run through each observation and say if the value in each of the array Dates columns are between the Observations individual Start and End date then replace that value with 1.
Heres where it starts to go wrong. It retains the ObStartDate and ObEndDate from observation to observation and only replaces different Dates[i] when it picks up a lower ObStartDate or higher ObEndDate.
Is there a way I can reset ObStartDate and ObEndDate to the value of each observations ObStartDate and ObEndDate when the Arrays Do Loop gets to each consecutive observation
Ive tried creating the array and doing a Do Loop in a different datastep. Ive also tried putting loops inside loops inside loops and arrays inside loops etc etc. I may have been close to success but this is the code that I thought would work and the first code that i wrote.
Any help will be greatly appreciated.
Cheers.
Here is some code to see what I mean
DATA Haveyay;
ATTRIB Ob LENGTH=3
ObStartDate Length=3
ObEndDate Length=3;
INFILE datalines DELIMITER='~';
INPUT Ob ObStartDate ObEndDate ;
DATALINES;
1~1~8
2~2~5
3~5~10
4~1~4
5~2~3
6~4~7
7~7~10
8~3~4
9~3~9
10~2~9
;
RUN;
PROC SQL Noprint;
SELECT min(ObStartDate), max(ObEndDate) into :Start_Date, :End_Date
FROM Haveyay;
QUIT;
DATA Wantyay;
SET Haveyay;
ARRAY Dates{10} (&Start_Date:&End_Date);
DO i = 1 to DIM(Dates);
IF Dates[i] >= ObStartDate AND Dates[i] <= ObEndDate THEN Dates[i] = 1;
END;
RUN;

It looks like your problem may be that you are expecting the values in the dates array to be reset to their original values with each observation. In reality the array statement initialises the value in the array only once, before any data is loaded. As the array variables are automatically retained each change you make to a member of the array will be carried forward into later observations.
You can use a second loop to reset the date values after outputting:
do i = 1 to dim(dates);
if obstartdate <= dates[i] <= obenddate then dates[i] = 1;
end;
output;
do i = 1 to dim(dates);
dates[i] = &start_date. + i - 1;
end;
Or more compactly calculate the date from i and the macro variable rather than the array:
do i = 1 to dim(dates);
_date = &start_date + i - 1;
dates[i] = ifn(ObStartDate <= _date <= ObEndDate , 1, _date);
end;

Related

array processing with different indices and missing values in SAS

have is a sas data set with 4 variables: an id and variables storing info on all the activities a respondent shares with 3 different members of a team they're on. There are 4 different activity types, identified by the numbers populating in the :_activities vars for each player (p1 to p3). Below are the first 5 obs:
id p1_activities p2_activities p3_activities
A 1,2,3,4 1,3
B 1,3 1,2,3 1,2,3
C 1,2,3 1,2,3
D 1,2,3
E 1,2,3 1
Consider respondent A: they share all 4 activities with player 1 on their team, and activities 1 and 3 with player 2 on their team. I need to create flags for each player position and each activity. For example, a new numeric variable p1_act2_flag should equal 1 for all respondents who have a value of 2 appearing in the p1_activities character variable. Here are the first 6 variables I need to create out of the 12 total for the data shown:
p1_act1_flag p1_act2_flag p1_act3_flag p1_act4_flag p2_act1_flag p2_act2_flag …
1 1 1 1 1 0 …
1 0 1 0 1 1 …
. . . . 1 1 …
. . . . 1 1 …
1 1 1 0 . . …
I do this now by initializing all of the variable names in a length statement, then writing a ton if-then statements. I want to use far fewer lines of code, but my array logic is incorrect. Here's how I try to create the flags for player 1:
data want;
length p1_act1_flg p1_act2_flg p1_act3_flg p1_act4_flg
p2_act1_flg p2_act2_flg p2_act3_flg p2_act4_flg
p3_act1_flg p3_act2_flg p3_act3_flg p3_act4_flg
p4_act1_flg p4_act2_flg p4_act3_flg p4_act4_flg 8.0;
set have;
array plracts {*} p1_activities p2_activities p3_activities;
array p1actflg {*} p1_act1_flg p1_act2_flg p1_act3_flg p1_act4_flg;
array p2actflg {*} p2_act1_flg p2_act2_flg p2_act3_flg p2_act4_flg;
array p3actflg {*} p3_act1_flg p3_act2_flg p3_act3_flg p3_act4_flg;
array p4actflg {*} p4_act1_flg p4_act2_flg p4_act3_flg p4_act4_flg;
do i=1 to dim(plracts);
do j=1 to dim(p1actflg);
if find(plracts{i}, cats(put(j, $12.))) then p1actflg{j}=1;
else if missing(plracts{i}) then p1actflg{j}=.;
else p1actflg{j}=0;
end;
end;
*do this again for the other p#actflg arrays;
run;
My "array subscript is out of range" because of the different lengths of the player and activity arrays, but nesting in different do-loops would result in me writing many more lines of code than a wallpaper solution.
How would you do this more systematically, and/or in far fewer lines of code?
Not sure why you are processing 4 activities for flags when there are only 3.
Some ideas:
Refactoring the column names to numbered suffixes would reduce some of the wallpaper effect.
activities_p1-activities_p3
Refactoring the flag column names to number suffixes
flag_p1_1-flag_p1_4
flag_p2_1-flag_p2_4
flag_p3_1-flag_p3_4
Use DIM to stay within array bounds.
Use two dimensional array for flags
Use direct addressing of items to be flagged
Add error checking
Not fewer, but perhaps more robust ?
This code examines each item in the activities list as opposed to seeking presence of a specific items (1..4):
data want;
set have;
array activities
activities_p1-activities_p3
;
array flags(3,4)
flag_p1_1-flag_p1_4
flag_p2_1-flag_p2_4
flag_p3_1-flag_p3_4
;
do i = 1 to dim(activites);
if missing(activities[i]) then continue; %* skip;
do j = 1 by 1;
item = scan ( activities[i], j, ',' );
if missing(item) then leave; %* no more items in csv list;
item_num = input (item,?1.);
if missing(item_num) then continue; %* skip, csv item is not a number;
if item_num > hbound(flags,2) or item_num < lbound(flags,2) then do;
put 'WARNING:' item_num 'is invalid for flagging';
continue; %* skip, csv item is missing, 0, negative or exceeds 4;
end;
flags (i, item_num) = 1;
end;
* backfill zeroes where flag not assigned;
do j = 1 to hbound(flags,2);
flags (i, item_num) = sum (0, flags (i, item_num)); %* sum() handles missing values;
end;
end;
Here is the same processing, but only searching for specific items to be flagged:
data have; length id activities_p1-activities_p3 $20;input
id activities_p1-activities_p3 ; datalines;
A 1,2,3,4 1,3 .
B 1,3 1,2,3 1,2,3
C . 1,2,3 1,2,3
D . 1,2,3 .
E 1,2,3 . 1
;
data want;
set have;
array activities
activities_p1-activities_p3
;
array flags(3,4)
flag_p1_1-flag_p1_4
flag_p2_1-flag_p2_4
flag_p3_1-flag_p3_4
;
do i = 1 to dim(activities);
if not missing(activities[i]) then
do j = 1 to hbound(flags,2);
flags (i,j) = sum (flags(i,j), findw(trim(activities[i]),cats(j),',') > 0) > 0;
end;
end;
run;
What's going on ?
flags variables are reset to missing at top of step
hbound return 4 as upper limit of second dimension
findw(trim(activities[i]),cats(j),',') find position of j in csv string
trim needed to remove trailing spaces which are not part of findw word delimiter list
cats converts j number to character representation
findw returns position of j in csv string.
might want to also compress out spaces and other junk if activity data values are not reliable.
first > 0 evaluates position to 0 j not present and 1 present
second > 0 is a another logic evaluation that ensures j present flag remains 0 or 1. Otherwise flags would be a frequency count (imagine activity data 1,1,2,3)
flags(i,j) covers the 3 x 4 slots available for flagging.
Consider converting into a hierarchical view and doing the logic there. The real stickler here is the fact that there can be missing positions within each list. Because of this, a simple do loop will be difficult. A faster way would be multi-step:
Create a template of all possible players and positions
Create an actual list of all players and positions
Merge the template with the actual list and flag all matches
It's not as elegant as a single data step like could be done, but it is somewhat easy to work with.
data have;
infile datalines dlm='|';
input id$ p1_activities$ p2_activities$ p3_activities$;
datalines;
A|1,2,3,4|1,3|
B|1,3|1,2,3|1,2,3|
C| |1,2,3|1,2,3|
D| |1,2,3|
E|1,2,3| |1
;
run;
/* Make a template of all possible players and positions */
data template;
set have;
array players p1_activities--p3_activities;
length varname $15.;
do player = 1 to dim(players);
do activity = 1 to 4;
/* Generate a variable name for later */
varname = cats('p', player, '_act', activity, '_flg');
output;
end;
end;
keep ID player activity varname;
run;
/* Create a list of actual players and their positions */
data actual;
set have;
array players p1_activities--p3_activities;
do player = 1 to dim(players);
do i = 1 to countw(players[player], ',');
activity = input(scan(players[player], i, ','), 8.);
/* Do not output missing positions */
if(NOT missing(activity)) then output;
end;
end;
keep ID player activity;
run;
/* Merge the template with actual values and create a flag when an
an id, player, and activity matches with the template
*/
data want_long;
merge template(in=all)
actual(in=act);
by id player activity;
flag_activity = (all=act);
run;
/* Transpose it back to wide */
proc transpose data=want_long
out=want_wide;
id varname;
by id;
var flag_activity;
run;
Following Stu's example, a DS2 DATA step can perform his 'merge' using a hash lookup. The hash lookup depends on creating a data set that maps CSV item lists to flags.
* Create data for hash;
data share_flags(where=(not missing(key)));
length key $7 f1-f4 8;
array k[4] $1 _temporary_;
do f1 = 0 to 1; k[1] = ifc(f1,'1','');
do f2 = 0 to 1; k[2] = ifc(f2,'2','');
do f3 = 0 to 1; k[3] = ifc(f3,'3','');
do f4 = 0 to 1; k[4] = ifc(f4,'4','');
key = catx(',', of k[*]);
output;
end;end;end;end;
run;
proc ds2;
data want2 / overwrite=yes;
declare char(20) id;
vararray char(7) pact[*] activities_p1-activities_p3;
vararray double fp1[*] flag_p1_1-flag_p1_4;
vararray double fp2[*] flag_p2_1-flag_p2_4;
vararray double fp3[*] flag_p3_1-flag_p3_4;
declare char(1) sentinel;
keep id--sentinel;
drop sentinel;
declare char(7) key;
vararray double flags[*] f1-f4;
declare package hash shares([key],[f1-f4],4,'share_flags'); %* load lookup data;
method run();
declare int rc;
set have;
rc = shares.find([activities_p1],[flag_p1:]); %* find() will fill-in the flag variables;
rc = shares.find([activities_p2],[flag_p2:]);
rc = shares.find([activities_p3],[flag_p3:]);
end;
enddata;
run;
quit;
%let syslast = want2;
share_flags
result

Assign an array's value as a dimension to another array in SAS

I've been working on a complicated code and am stuck in the end, where I need to assign one array's value as a dimension parameter to another array in the code. A snapshot from my code :
For example:
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3;
ARRAY buracc_repay(3) buracc_repay1 - buracc_repay3;
ARRAY ocs_repay(3) ocs_repay1 - ocs_repay3;
jj = 0;
do until (jj>=3);
jj=jj+1;
If length(strip(match_flag(jj))) = 1 then do;
temp_match_fl(jj) = match_flag(jj);
end;
Else If length(strip(match_flag(jj))) > 1 then do;
j1 = 0;
min_diff = 99999999;
do until (j1>=length(strip(match_class(jj))));
j1=j1+1;
retain min_diff;
n=substr(strip(match_flag(jj)),j1,1);
If (min_diff > abs(buracc_repay(jj)-ocs_repay(n))) then do;
min_diff = abs(buracc_repay(jj)-ocs_repay(n));
temp_match_fl(jj) = n;
end;
end;
end;
kk=temp_match_fl(jj);
/* buracc_repay(jj) = ocs_repay(kk);*/
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
end;
run;
Now, I need to be able to assign the value stored in temp_match_fl(jj) array as dimension parameter to another array, how can I achieve that?? None of the last two statements work:
buracc_repay(jj) = ocs_repay(kk);
buracc_repay(jj) = ocs_repay(temp_match_fl(jj));
Can someone please suggest.
Thanks!
Actually your last two statements as written do work. Are you getting an error, or unexpected results? Can you make a simple example like below that shows the problem?
Note that for this to work, it's essential that the value of temp_match_fl(jj) is 1, 2, or 3, because your OCS_REPAY array has three elements. From the code you've shown, it's not clear if that is always true. You don't show the match_flag array.
data want ;
array temp_match_fl(3) temp_match_fl1 - temp_match_fl3 (1 2 3) ;
array buracc_repay(3) buracc_repay1 - buracc_repay3 (10 20 30) ;
array ocs_repay(3) ocs_repay1 - ocs_repay3 (100 200 300) ;
jj=1 ;
kk=2 ;
*buracc_repay(jj) = ocs_repay(kk); *this works ;
put temp_match_fl(jj)= ; *debug to confirm value is 1 2 or 3 ;
buracc_repay(jj) = ocs_repay(temp_match_fl(jj)); *this also works;
put (buracc_repay:)(=) temp_match_fl1=; *check output ;
run ;

Recursion in Array SAS

I have an existing collection of variables a_0,...,a_45 where a_i represents the amount of stuff I have on day i. I'd like to create a new collection of variables b_0,...,b_45 to represent the incremental change in stuff I have on day i (i.e. b_k=a_k-a_(k-1) ). My approach:
data test;
set dataset;
array a a_0-a_45;
array b b_0-b_45;
b(1)=a(1);
do i=2 to 45;
b(i)=a(i)-a(i-1);
end;
run;
However my b variables just come out missing.
What initial values do you have for a_1 to a_45 before you start the loop? As you are not intialising them (except for a_0 ≡ a(1)), every b(i) term will be a difference of 2 a terms, of which at least one will be missing, unless these variables are populated in your input dataset.
Here is some sample code showing that the delta computation is correct when the variable names in the data set align with the variables named in the array statement in the data step.
Sample data
data have(keep=product_id note a_:);
do product_id = 1 to 100;
length note $15;
array amount a_0-a_45;
call missing(of amount(*));
if (ranuni(123) < 0.5) then do;
note = 'static deltas';
static_delta = ceil(5 * ranuni(123));
amount(1) = static_delta;
do inventory_day = 2 to dim(amount);
amount(inventory_day) = amount(inventory_day-1) + static_delta;
end;
end;
else do;
note = 'random deltas';
amount(1) = ceil(5 * ranuni(123));
do inventory_day = 2 to dim(amount);
amount(inventory_day) = max ( 0, amount(inventory_day-1) + floor(10 * ranuni(123)) - 5 );
end;
end;
OUTPUT;
end;
run;
Compute deltas
data want;
set have;
array amount a_0-a_45;
array delta b_0-b_45;
delta(1) = amount(1);
do i=2 to dim(amount);
delta(i) = amount(i) - amount(i-1);
end;
drop i;
format a_: b_: 4.;
run;
As Richard has already suggested in his comment while I was working on writing the code...Basically the only error that you have in your code is that your code should loop from 2 to 46 because there are 46 elements in the array. below code should work for you.
%macro f();
data dataset;
%do i = 0 %to 45;
a_&i. = ranuni(2);
%end;
run;
%mend;
%f();
data test;
set dataset;
array a1 a_0-a_45;
array b1 b_0-b_45;
/* This line will help in avoiding b_0 to have a missing value */
b1(1)=a1(1);
do i=2 to 46;
b1(i)=a1(i)-a1(i-1);
end;
run;

Putting individual customers transactions into an array

I have a dataset consisting of some transactions done by customers
I want to put these transactions into an array from 1 to 50. So 1 customers can have 50 transactions or more. I want my output dataset to be 1 row per customer with the value of each transaction being put into a column.
Finally what I am trying to do is to have these transaction values in an array. With the array being reset back to 0 upon first.cust_id. Any idea how I can go about doing this. This is what I have so far but it is giving me errors. Let's assume the initial dataset has just cust_id and transaction_amount fields.
The code represents just array initiation. I am doing a few calculations with the arrays afterwards.
data check;
set transactions;
by cust_id;
array trans[*] trans1-trans50;
retain array_counter;
if first.cust_id then do;
do i=1 to dim(trans);
trans[i]=0;
end;
array_counter=1;
end;
trans[array_counter] = transaction_amount;
array_counter=sum(array_counter,1);
if last.cust_id;
run;
First off, you need a retain. Without that, your values of the trans array will get cleared out every data step loop.
Second, if you say "50 or more" transactions; well, your array bounds only allow 50, so what does it do for 51?
This works, and I set it to 30 per customer so you can see the 0's. If you set that initial 1 to 30 loop to 1 to 60, you'll get out of bounds errors.
data transactions;
call streaminit(7);
do cust_id = 1 to 10;
do transaction = 1 to 30;
transaction_amount = rand('uniform');
output;
end;
end;
run;
data check;
set transactions;
by cust_id;
retain trans1-trans50;
array trans[*] trans1-trans50;
retain array_counter;
if first.cust_id then do;
do i=1 to dim(trans);
trans[i]=0;
end;
array_counter=1;
end;
trans[array_counter] = transaction_amount;
array_counter=sum(array_counter,1);
if last.cust_id;
run;
A couple things are wrong in your code. For starters, in order to use first/last processing, you need a by statement. Then, in order to get only one row per customer, you need to also retain your array variables and output only on last.cust_id:
data check;
set transactions;
by cust_id;
array trans[*] trans1-trans50;
retain array_counter trans1-trans50;
if first.cust_id then do;
do i=1 to dim(trans);
trans[i]=0;
end;
array_counter=1;
end;
trans[array_counter] = transaction_amount;
array_counter+1;
if last.cust_id then output;
run;

Count the number of times a value occurs

I have 7 variables, 489 observations with variable values of 0-4.
What I need is the count percentage of use.
Answers 0,1 stand for non usage, and answers 2,3,4 stand for usage.
I created 7 additional vars and turned all the values above to:
1=usage - 0=non-usage.
Now, I don't know how to count and present how many "1" I have for each var and divide it by 489.
data LAB7;
set LAB3;
array v{*} v21-v27;
array VU{7};
DO i=1 to dim(v);
if v[i] = 1|0 THEN VU[i]=0;
else VU[i]=1;
END;
run;
You can do this:
data usage;
set lab3 end=eof;
array v{*} v21-v27
array n{7};
retain n: 0;
do i = 1 to dim(v);
if v[i] in (2, 3, 4) then n[i] + 1;
end;
if eof then do j = 1 to dim(v);
variable = vname(v[j]);
pct_usage = 100 * n[j] / _n_;
output;
end;
keep variable pct_usage;
run;
This creates an array of counters, one per variable, that are incremented by one whenever the corresponding variable is equal to 2, 3, or 4.
At the end of the data step, we output a record for each variable and record the percentage as the counter divided by the number of observations (_n_ when eof is true).
An alternative would be to use proc freq.
data indicators;
set lab3;
array v{*} v21-v27;
array ind{7};
do i = 1 to dim(v);
ind[i] = (v[i] in (2, 3, 4));
end;
run;
proc freq data = indicators;
tables ind: / out = usage;
run;
This creates binary indicator variables, one for each of the input variables, that are 1 when the input is 2, 3, or 4, and 0 otherwise. Counts and percentages are then obtained using proc freq.

Resources