I want to create a variable Var2 that is equal to 1 starting at the first observation Var1 is equal to 1 and Var2 is equal to 1 until the end of the by group defined by ID.
Here is the minimal working example:
ID Year Var1
1 1 .
1 2 0
1 3 .
1 4 1
1 5 .
And I want to create the following output:
ID Year Var1 Var2
1 1 . .
1 2 0 0
1 3 . 0
1 4 1 1
1 5 . 1
My current code is as follows:
DATA data1;
SET data0;
BY ID YEAR ;
IF LAST.ID THEN END = _N_;
IF Var1 > 0 THEN CNT=_N_;
RUN;
DATA data2;
SET data1;
BY ID YEAR ;
Var2 = 0;
IF Var1 = 1 THEN DO;
DO I = CNT TO END;
Var[I] = 1;
END;
END;
RUN;
However, SAS does not loop along observations.
I'm not sure what your example is doing, but this is fairly straightforward.
data want;
set have;
by id;
retain var2;
if first.id then var2=0;
if var1=1 then var2=1;
run;
Retain var2 to keep its value across observations, and then set it to 1 when you see a 1 in var1; finally, set it to 0 when you see a first.id row.
Related
I want to create a random dataset. Something like this-
ptno visits sex race
1 1 1 0
1 2 1 0
1 3 1 0
2 1 2 1
2 2 2 1
2 3 2 1
3 1 1 0
3 2 1 0
3 3 1 0
The values should be randomly generated. I want to know if I can do this dynamically using do loops. Thanks in advance for helping.
data want ;
length ptno visits sex race 8. ;
do ptno = 1 to 100 ;
_visits = ceil(ranuni(0)*5) ; /* between 1 & 5 */
sex = ceil(ranuni(0)*2) ; /* between 1 & 2 */
race = floor(ranuni(0)*2) ; /* between 0 & 1 */
do visits = 1 to _visits ;
output ;
end ;
end ;
drop _visits ;
run ;
SAS call ranuni() produce a random variate from a uniform distribution, if value is greater than 0.5 then 1, otherwise 0. Here, the same ptno (i) + seed get the same sex or race.
data want;
do i=100 to 110;
do j=1 to 5;
seed1=i+4567;
call ranuni(seed1,x);
seed2=i+1234;
call ranuni(seed2,y);
ptno=i;
visit=j;
sex=(x>0.5)+1;
race=(y<0.5);
output;
end;
end;
keep ptno--race;
run;
I want to get a data set with an array that saves the count of values greater than zero in a subset of an array.
My code:
%Macro Test(input_array, window);
array initial{*} &input_array;
array position[&window];
array cumulative[&window];
/* Fill array indicating position with value zero, previous value greater than zero */
do i = 1 to dim(initial) - 1;
if initial(i) gt 0 and initial(i+1) eq 0 then
position(i) = i + 1;
end;
/* Fill array indicating the count of values greater than zero until the index in the position array*/
%let j = 1;
%do %while (&j lt &window);
end_ = coalesce(of position&j - position&window);
if not missing(end_) then do;
gt_0_cnt = 0;
do k = &j to end_ - 1;
gt_0_cnt + ifn(initial(k) > 0,1,0);
end;
cumulative(end_ - 1) = gt_0_cnt;
end;
%let j = %eval(&j + end_);
%end;
%Mend;
DATA HAVE;
INPUT ID FM1-FM18;
DATALINES;
A 1 2 0 0 1 0 0 0 0 2 2 2 3 3 4 4 4 0
B 0 0 1 2 3 4 5 1 2 3 4 0 0 0 1 2 0 0
;
RUN;
DATA WANT;
SET HAVE;
%Test(FM: 18);
RUN;
The output I need:
But I have a problem when trying to evaluate this expression
%let j = %eval(&j + end_)
I get the messaje ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was:
1 + end_
I don't know of any other way to get the desired result.
If someone can help me I will be grateful.
Doesn't seem like you need the macro language for this.
data want;
set have;
array fm fm:;
array cum cum_1-cum_18;
do _i = 1 to dim(fm);
if fm[_i] eq 0 then call missing(cum[_i]);
else do;
do count = 1 by 1 until (fm[_i+count] eq 0 or (count+_i eq dim(fm)));
end;
put _i= count=;
cum[_i+count-1] = count;
_i = _i + count - 1;
end;
end;
run;
Obviously you can specify the 18 max on the cum array through a macro parameter, or what the variable names are, but all of the stuff you're doing is perfectly doable through the data step language or simple macro variable parameters.
So my dataset looks like this:
ABC1 ABC2 ABC3 ABC4 ABC5 DEF1 DEF2 DEF3 DEF4 DEF5
1 0 0 1 . 0 1 1 0 .
I want my output to be:
XYZ1 XYZ2 XYZ3 XYZ4 XYZ5
0 1 1 0 .
Basically if DEF2 = 1 and count of ABC3 and ABC4 and ABC5 of 1 is > 0 then XYZ2 is 1.
I have tried the following code but it doesnt work
data want;
set have;
array ABC ABC:;
array DEF DEF:;
array XYZ [5] $1;
do i = 1 to dim(ABC)-5;
if ABC(i) = . then XYZ(i) = '';
else if (DEF(i) = 1 and sum(ABC(i+1), ABC(i+3)) > 0) then XYZ(i) = 1;
else XYZ(i) = 0;
end;
drop i;
run;
Lets pivot things for a better understanding
index ABC DEF XYZ (wanted)
----- --- --- ---
1 1 0 0 (because DEF=0)
2 0 1 1 (sum ABC index 2..5 because DEF=1 # index 2)
3 0 1 1 (sum ABC index 3..5 because DEF=1 # index 3)
4 1 0 0 (because DEF=0)
5 . . . (because DEF=.)
Now apply that understanding to processing variables of the row when arrayed. The items will be processed from 5 to 1, so a running_sum can be computed and applied when necessary.
data want;
set have;
array abc abc:;
array def def:;
array xyz(5);
running_sum = .;
do index = dim(abc) to 1 by -1;
if not missing(abc(index)) then running_sum + abc(index);
if def(index) in (., 0)
then xyz(index) = def(index);
else xyz(index) = running_sum;
end;
run;
Not all processing rules are stated in the question, such as
the case abc(j) = . and abc(k) ne . and k > j
Such a case may never happen.
How do I reshape this data in SAS?
id q1a q2a q1b q2b q1c q2c
1 3 0 1 1 1 9
2 4 9 1 2 2 0
3 5 9 1 2 4 0
into this:
id q1 q2 type
1 3 0 a
1 1 1 b
1 1 9 c
.............
The simplest way is to transpose the data, split out the last letter from the q variables, then re-transpose.
data have;
input id q1a q2a q1b q2b q1c q2c;
datalines;
1 3 0 1 1 1 9
2 4 9 1 2 2 0
3 5 9 1 2 4 0
;
run;
proc transpose data=have out=temp1;
by id;
run;
data temp2;
set temp1;
length type $1;
type=substr(_NAME_,3);
_NAME_=substr(_NAME_,1,2);
run;
proc transpose data=temp2 out=want (drop=_:) ;
by id type;
id _NAME_;
var COL1;
run;
Seems simpler to me to just do it in one bit.
data want;
set have;
array qs q:;
do _t = 1 to dim(qs) by 2;
q1=qs[_t];
q2=qs[_t+1];
type = substr(vname(qs[_t]),3,1);
output;
end;
keep id q1 q2 type;
run;
That works for exactly your data; there are some assumptions (the substr and the relationship between q1/q2) that might need to be changed for a real-world example.
I have a dataset like this
data ID;
input num;
datalines;
1
1
2
3
4
4
;
I want to create another variable to group them in increment of 2:
num id
1 1
1 1
2 1
3 2
4 2
4 2
Thanks!
If they're truly 1-2-3-4, then you can use MOD:
data want;
set have;
id=mod(num,2)+1;
run;
*or subtract 2-mod(num,2) if you want to start with 1;
If they're not truly 1-2-3-4, then you can use MOD(iter) instead of mod(num) and if first.num then iter+1; to make the iterator that is 1-2-3-4.
You can use binary variable that changes its value from 1 to 0 or from 0 to 1 for every new num;
data result;
set id;
by num;
retain incr 0;
if FIRST.num then do;
incr=not incr;
id+incr;
end;
drop incr;
run;