After I declare an array, I'd like to reset its values for the rest of the code.
array cutoffs[4] _temporary_ (1 2 3 4); /*works well*/
... use of the array
array cutoffs[3] _temporary_ (3.5 5 7.5); /*Error*/
... use of the updated array
The error is as follows:
ERROR 124-185: The variable cutoffs has already been defined.
The error is very clear, but I wonder how I could redefine the array without changing its name (which would be most tedious).
I tried a few syntaxes but couldn't work it out myself, and I found no resources on Google or on Stack Overflow.
How can I do it?
EDIT: the main purpose is that I created a function (with proc fcmp) that takes arrays as parameters and cuts the value (like R's cut function). The function is to be used on a lot of columns, each with different cutoffs, and I don't want to tediously create an array for each and every column.
Here is a macro version of your FCMP function:
%macro cut2string(var,cutoffs,values);
%local i;
%if &var. lt %scan(&cutoffs.,1,%str( )) %then "%scan(&values.,1,%str( ))";
%else %if &var. ge %scan(&cutoffs.,-1,%str( )) %then "%scan(&values.,-1,%str( ))";
/* stop one short of the last cutoff: each pass compares cutoff i with cutoff i+1 */
%else %do i=1 %to %eval(%sysfunc(countw(&cutoffs.,%str( )))-1);
%if &var. ge %scan(&cutoffs.,&i.,%str( )) and &var. lt %scan(&cutoffs.,%eval(&i.+1),%str( )) %then "%scan(&values.,%eval(&i.+1),%str( ))";
%end;
%mend;
And here is how you would call it, using the same example as you used in your linked page:
data Work.nonsales2;
/*set Work.nonsales;*/
salary_string = %cut2string(30000, 20000 100000 500000, <20k 20k-100k 100k-500k >500k);
run;
You could use keyword parameters instead of positional ones to make your calls clearer:
%macro cut2string(var=,cutoffs=,values=);
...
salary_string = %cut2string(var=30000,cutoffs=20000 100000 500000,values=<20k 20k-100k 100k-500k >500k);
HOWEVER now that I see the code, this should really be a format in SAS:
proc format;
value cutoffs
low-<20000='<20k'
20000-<100000='20k-100k'
100000-<500000='100k-500k'
500000-high='>500k'
;
run;
data work.nonsales2;
salarystrings=put(30000,cutoffs.);
run;
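Because the goal is to cut many columns with different cutoffs, one format per set of cutoffs can be defined once and then applied with PUT wherever it is needed. The following is only a sketch: the format names salband and ageband, the age cutoffs, and the dataset work.banded are hypothetical and do not come from the original post.

```sas
/* Hypothetical formats: one per set of cutoffs */
proc format;
  value salband
    low    -< 20000  = '<20k'
    20000  -< 100000 = '20k-100k'
    100000 -< 500000 = '100k-500k'
    500000 -  high   = '>500k'
  ;
  value ageband               /* illustrative cutoffs, not from the post */
    low -< 18   = 'minor'
    18  -< 65   = 'adult'
    65  -  high = 'senior'
  ;
run;

data work.banded;
  salary = 30000;
  age    = 42;
  /* each column is cut with its own format; no arrays are involved */
  salary_band = put(salary, salband.);
  age_band    = put(age, ageband.);
run;
```

Adding a column with new cutoffs then only means adding one more VALUE statement.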
You can change the values of the cutoffs array one by one.
array cutoffs{4} _temporary_ (1 2 3 4); /*works well*/
... use of the array
cutoffs[1]=3.5;
cutoffs{2}=5;
cutoffs{3}=7.5;
cutoffs{4}=.;
or you could just use another name for the array the second time.
With that said, the way you are using this seems a bit strange.
EDIT: you could consider rewriting your proc fcmp function to expect the list of values as a character string (e.g. '3.5,5,7.5') instead of an array and do away with arrays entirely.
Your proc fcmp would change from something like
do i=1 to dim(array);
val=array{i};
...
end;
to something like:
do i=1 to countw(array,',');
val=input(scan(array,i,','),best32.);
...
end;
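The string-based approach described above could look roughly like this. It is a sketch under assumptions: the function name cut_index, the library path work.funcs.demo, and the choice to return the interval index (rather than a label) are all made up for illustration and are not the OP's actual function.

```sas
/* Hypothetical FCMP function: returns the 1-based index of the interval */
/* that x falls into, given a comma-separated list of ascending cutoffs. */
proc fcmp outlib=work.funcs.demo;
  function cut_index(x, cutoffs $);
    n = countw(cutoffs, ',');
    do i = 1 to n;
      /* first cutoff that x is strictly below decides the interval */
      if x < input(scan(cutoffs, i, ','), best32.) then return (i);
    end;
    return (n + 1);   /* x is at or above the last cutoff */
  endsub;
run;

options cmplib=work.funcs;

data _null_;
  a = cut_index(4, '3.5,5,7.5');   /* different cutoffs per call, */
  b = cut_index(4, '1,2,3,4');     /* no array declaration needed */
  put a= b=;
run;
```

Each call passes its own cutoff list as a literal, so the "already defined" array error never comes up.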
Why not use a macro instead of function?
%macro cut(invar,outvar,cutoffs,categories,dlm=%str( ));
/*
"CUT" continuous variable into categories
by generating SELECT code that can be used in
a data step.
The list of CATEGORIES must have one more entry than the list of CUTOFFS
*/
%local i ;
select ;
%do i=1 %to %sysfunc(countw(&cutoffs,&dlm));
when (&invar <= %scan(&cutoffs,&i,&dlm)) &outvar=%scan(&categories,&i,&dlm) ;
%end;
otherwise &outvar= %scan(&categories,-1,&dlm);
end;
%mend ;
Here is an example that creates both a numeric and a character output variable. For character variables, either define the variable before using the macro or make sure the value for the first category is long enough to hold every value.
Let's test it.
data test ;
input x @@;
%cut(invar=x,outvar=y,cutoffs=3.5 5 7,categories=1 2 3 4)
%cut(invar=x,outvar=z,cutoffs=3.5|5|7,categories="One "|"Two"|"Three"|"Four",dlm=|)
cards;
2 3.5 4 5 6 7.4 8
;
If you turn on the MPRINT option you can see the generated code in the SAS log.
2275 %cut(invar=x,outvar=y,cutoffs=3.5 5 7,categories=1 2 3 4)
MPRINT(CUT): select ;
MPRINT(CUT): when (x <= 3.5) y=1 ;
MPRINT(CUT): when (x <= 5) y=2 ;
MPRINT(CUT): when (x <= 7) y=3 ;
MPRINT(CUT): otherwise y= 4;
MPRINT(CUT): end;
2276 %cut(invar=x,outvar=z,cutoffs=3.5|5|7,categories="One "|"Two "|"Three"|"Four ",dlm=|)
MPRINT(CUT): select ;
MPRINT(CUT): when (x <= 3.5) z="One " ;
MPRINT(CUT): when (x <= 5) z="Two" ;
MPRINT(CUT): when (x <= 7) z="Three" ;
MPRINT(CUT): otherwise z= "Four";
MPRINT(CUT): end;
Results
Obs x y z
1 2.0 1 One
2 3.5 1 One
3 4.0 2 Two
4 5.0 2 Two
5 6.0 3 Three
6 7.4 4 Four
7 8.0 4 Four
Is there a way to build the list of variables to keep with a DO loop in a data step?
It would be something like:
data test;
input id aper_f_201501 aper_f_201502 aper_f_201503 aper_f_201504
aper_f_201505 aper_f_201506;
datalines;
1 0 1 2 3 5 7
2 -1 5 4 8 7 9
;
run;
%macro test;
%let date = '01Jul2015'd;
data test2;
set test(keep=do i = 1 to 3;
aper_f_%sysfunc(intnx(month,&date,-i,begin),yymmn6.);
end;)
run;
%mend;
%test;
I need to iterate over several dates.
Thank you very much.
You need to use macro %do loop instead of the data step do loop, which is not going to be valid in the middle of a dataset option. Also do not generate those extra semi-colons into the middle of your dataset options. And do include a semi-colon to end your SET statement.
%macro test;
%local i date;
%let date = '01Jul2015'd;
data test2;
set test(keep=
%do i = 1 %to 3;
aper_f_%sysfunc(intnx(month,&date,-&i,begin),yymmn6.)
%end;
);
run;
%mend;
%test;
You can use the colon shortcut to reference variables with the same prefix: every variable whose name starts with the text in front of the colon will be kept.
keep ID aper_f_2015: ;
There's also the hyphen, for when the names end in consecutive numbers:
keep ID aper_f_201501-aper_f_201512;
You can use a macro but not sure it adds a lot of value here.
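Both shortcuts also work inside the KEEP= dataset option, which is where the question wanted the list. A minimal sketch using the test data from the question; the output dataset names last3 and all2015 are made up for illustration.

```sas
data test;
  input id aper_f_201501 aper_f_201502 aper_f_201503 aper_f_201504
        aper_f_201505 aper_f_201506;
  datalines;
1 0 1 2 3 5 7
2 -1 5 4 8 7 9
;
run;

/* numbered range list: the three months before July 2015 */
data last3;
  set test(keep=id aper_f_201504-aper_f_201506);
run;

/* prefix list: every variable whose name starts with aper_f_2015 */
data all2015;
  set test(keep=id aper_f_2015:);
run;
```

The macro %do loop remains the better fit when the window of months has to move with a date parameter.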
I have a bunch of character variables which I need to sort out from a large dataset. The unwanted variables all have entries that are the same or are all missing (meaning I want to drop these from the dataset before processing the data further). The data sets are very large so this cannot be done manually, and I will be doing it a lot of times so I am trying to create a macro which will do just this. I have created a list macro variable with all character variables using the following code (The data for my part is different but I use the same sort of code):
data test;
input Obs ID Age;
datalines;
1 2 3
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
run;
proc contents
data = test
noprint
out = test_info(keep=name);
run;
proc sql noprint;
select name into : testvarlist separated by ' ' from test_info;
quit;
My idea is then to just use a data step to drop this list of variables from the original dataset. Now, the problem is that I need to loop over each variable, and determine if the observations for that variable are all the same or not. My idea is to create a macro that loops over all variables, and for each variable counts the occurrences of the entries. Since the length of this table is equal to the number of unique entries I know that the variable should be dropped if the table is of length 1. My attempt so far is the following code:
%macro ListScanner (org_list);
%local i next_name name_list;
%let name_list = &org_list;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
%put &next_name;
proc sql;
create table char_occurrences as
select &next_name, count(*) as numberofoccurrences
from &name_list group by &next_name;
select count(*) as countrec from char_occurrences;
quit;
%if countrec = 1 %then %do;
proc sql;
delete &next_name from &org_list;
quit;
%end;
%let i = %eval(&i + 1);
%end;
%mend;
%ListScanner(org_list = &testvarlist);
However, I get syntax errors, and with my real data I get other problems with the data not being read correctly, but I am taking one step at a time. I suspect I am overcomplicating things, so if anyone has an easier solution or can see what might be wrong, I would be very grateful.
There are many ways to do this posted around.
But let's just look at the issues you are having.
First for looping through your space delimited list of names it is easier to let the %do loop increment the index variable for you. Use the countw() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
...
%end;
Second, where is your input dataset in your SQL code? Add another parameter to your macro definition. Where do you want to write the dataset without the empty columns? That suggests one more parameter.
%macro ListScanner (dsname , out, name_list);
%local i next_name sep drop_list ;
Third, you can use a single query to count all of the variables at once. Just use count( distinct xxxx ) instead of group by.
proc sql noprint;
create table counts as
select
%let sep=;
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
&sep. count(distinct &next_name) as &next_name
%let sep=,;
%end;
from &dsname
;
quit;
So this will get a dataset with one observation. You can use PROC TRANSPOSE to turn it into one observation per variable instead.
proc transpose data=counts out=counts_tall ;
var _all_;
run;
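Now you can just query that table to find the names of the columns with 0 non-missing values. Since count(distinct ...) ignores missing values, a result of 1 means the column holds a single non-missing value, so change the condition to col1 <= 1 if you also want to drop those.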
proc sql noprint ;
select _name_ into :drop_list separated by ' '
from counts_tall
where col1=0
;
quit;
Now you can use the new DROP_LIST macro variable.
data &out ;
set &dsname ;
drop &drop_list;
run;
So now all that is left is to clean up after yourself.
proc delete data=counts counts_tall ;
run;
%mend;
As far as your specific initial question, this is fairly straightforward. Assuming &testvarlist is your macro variable containing the variables you are interested in, and creating some test data in have:
%let testvarlist=x y z;
data have;
call streaminit(7);
do id = 1 to 1e6;
x = floor(rand('Uniform')*10);
y = floor(rand('Uniform')*10);
z = floor(rand('Uniform')*10);
if x=0 and y=4 and z=7 then call missing(of x y z);
output;
end;
run;
data want fordel;
set have;
if min(of &testvarlist.) = max(of &testvarlist.)
and (cmiss(of &testvarlist.)=0 or missing(min(of &testvarlist.)))
then output fordel;
else output want;
run;
This isn't particularly inefficient, but there are certainly better ways to do this, as referenced in comments.
Let's say I have a bunch of variables (call it eq3_xxxxx where the xxx are the variations) that all have 5 possible levels (1,2,3,4,5) and I want to create a dummy variable for each level of each variable.
I thought I could do something like:
%macro eq_levels(eq3:);
data mydata;
%do i = 1 %to 5;
x=cats(%eq3:,%i);
%end;
%mend;
But this doesn't seem to work. I'd rather not use SQL or anything like that, as I think the array and do-loop solutions should suffice, but I am open to it if the explanation can be made straightforward enough.
It may just be some minor syntax problems you have. Is this what you want?
%macro eq_levels(eq3);
data;
do i = 1 to 5;
x = cats(&eq3,i);
output;
end;
run;
%mend;
%eq_levels("eq3_");
Output:
i x
1 1 eq3_1
2 2 eq3_2
3 3 eq3_3
4 4 eq3_4
5 5 eq3_5
I may be misreading this, but I understand that the OP wants multiple variables:
%macro macro_evn;
data mydata;
%do i=1 %to 5;
var&i=&i.;
output;
%end;
run;
%mend;
%macro_evn;
Resulting in
var1 var2 var3 var4 var5
1 . . . .
1 2 . . .
1 2 3
....
Which would be easy to fill as needed. Then again, maybe I misread the question.
I have a data set that needs to be blown out a certain number of rows according to a dynamic value. Take the dataset below for example:
DATA HAVE;
LENGTH ID $3 COUNT 3;
INPUT ID $ COUNT;
DATALINES;
A 4
B 3
C 1
D 2
;
RUN;
ID=A needs to be blown out 4 rows, ID=B needs to be blown out 3 rows, etc. The resulting dataset would look as such (minus a bunch of other variables I have):
A 1
A 2
A 3
A 4
B 1
B 2
B 3
C 1
D 1
D 2
The following code works to an extent, but I'm having trouble dynamically setting the &COUNT. macro variable. I tried to insert a CALL SYMPUTX("COUNT",COUNT) statement so that, as it loops over each row, the count is placed into the macro variable and the row is blown out to that number of rows.
** THIS CODE ONLY WORKS IF YOU SET COUNT= TO SOME VALUE **;
%MACRO LOOPOVER();
DATA WANT; SET HAVE;
DO UNTIL(LAST.ID);
BY ID;
%DO I=1 %TO &COUNT.;
COUNT = &I.; OUTPUT;
%END;
END;
RUN;
%MEND;
%LOOPOVER;
** THIS CODE DOESN'T WORK BUT I'M NOT SURE WHY?? **;
%MACRO LOOPOVER();
DATA WANT; SET HAVE;
DO UNTIL(LAST.ID);
BY ID;
CALL SYMPUTX("COUNT",COUNT); /* NEW LINE HERE */
%DO I=1 %TO &COUNT.;
COUNT = &I.; OUTPUT;
%END;
END;
RUN;
%MEND;
%LOOPOVER;
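You do not need a macro for this. The second version fails because the macro processor resolves %DO I=1 %TO &COUNT. while the data step is still being compiled, before CALL SYMPUTX has executed for any row, so &COUNT. can never reflect the current observation. A plain data step DO loop reads COUNT at run time: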
data want(rename=(_count=count));
set have;
do i=1 to count;
_count=i;
output;
end;
drop count i;
run;
I have a dataset with names in one column, and the names can be duplicated. My task is to compare each name with the rest of the names in the column. For example, if I take name 1, "Vishal", I have to compare it with all the names from 2 to 13. If there is a matching name in rows 2 to 13, a separate column "flag" is set to Y for a duplicate and N for no duplicate. I have to perform this operation for all the names in the group.
I have written a code which looks like this:
data Name;
input counter name $50.;
cards;
1 vishal
2 swati
3 sahil
4 suman
5 bindu
6 bindu
7 vishal
8 tushar
9 sahil
10 swati
11 gudia
12 priyansh
13 priyansh
;
proc sql;
select count(name) into: n from swati;
quit;
proc sql;
select name into: name1 -:name13 from swati;
quit;
options mlogic mprint symbolgen;
%macro swati;
data name1;
set swati;
%do i = 1 %to 1;
%do j= %eval(&i.+1) %to &n.;
if &&name&i. =&&name&j. then flag="N";
else flag="Y";
%end;
%end;
run;
%mend;
%swati;
The code gives me the value N for all the names even if there is a matching name, and it also creates a separate variable for each of the names.
The desired output is shown below
Name Flag
vishal N
swati N
sahil N
suman Y
bindu N
bindu Y
vishal Y
tushar Y
sahil Y
swati Y
gudia Y
priyansh N
priyansh Y
So basically we started finding vishal (the first name) from 2 to 13 and see if there is a duplicate, if there is the flag is N i.e. there is a duplicate. Let us see the name "Suman" which is the fourth name in the list, and we start searching for its matching from 5 to 13. Since there isn't any duplicate for that we have flagged it as "Y".
WE HAVE TO DO THIS USING A DO LOOP
Sort data by Name
Use a data step with BY to identify duplicates
Re-sort by counter (the original order) if desired
proc sort data=name;
by name;
run;
data want;
set name;
by name;
if first.name and last.name then unique='Y';
else unique='N';
run;
proc sort data=want;
by counter;
run;
Your answer for the last observation does not look right. Is there another condition such that if it is the last record the flag should be 'N' instead of 'Y'?
I really see no reason why you have to use a DO loop. But you could place a DO loop around a SET statement with the POINT= option to look for matching names.
data want ;
set name nobs=nobs ;
length next $50;
next=' ';
do p=_n_+1 to nobs until (next=name) ;
set name(keep=name rename=(name=next)) point=p;
end;
if next=name then flag='N'; else flag='Y';
drop next;
run;
You could also take advantage of the COUNTER variable and do it using GROUP BY in a SELECT statement in PROC SQL.
proc sql ;
create table want2 as
select *
, case when (counter = max(counter)) then 'Y' else 'N' end as flag
from name
group by name
order by counter
;
quit;