Trouble looping through variables in a SAS loop - arrays

Very simple: I'm trying to convert many character variables to numeric. The following code gives the error "Syntax error, expecting one of the following: a name, -, :, ;" on the DROP and RENAME lines.
data ex; set ex;
array numeric{3} var1 var2 var3;
do i=1 to 8;
temp = input(strip(numeric(i)),10.);
drop numeric(i);
rename temp = numeric(i);
end;
run;
Can you not use DROP or RENAME statements in DO loops?

The structure of the output dataset is fixed when the data step is compiled, before any DO loop runs. So there is no way you could use an array reference in a DROP or RENAME statement.
If you really have simple numerically suffixed variable names then you could use a simple RENAME statement.
rename new1-new3=var1-var3;
So your program might be as simple as this:
data want;
set have;
array ch var1-var3;
array new new1-new3;
do index=1 to dim(ch);
new[index]=input(left(ch[index]),32.);
end;
drop index var1-var3;
rename new1-new3=var1-var3;
run;
If the list of names is more complex, like AGE HEIGHT WEIGHT for example, then you will need to use a more complex RENAME statement like:
rename new1=AGE new2=HEIGHT new3=WEIGHT ;
So use some type of code generation, such as macro code, or a data step that writes lines of code to a file that is then included into the program with a %INCLUDE statement.
For example you could make a macro like this:
%macro rename(varlist);
%local i;
rename
%do i=1 %to %sysfunc(countw(&varlist));
new&i=%scan(&varlist,&i)
%end;
;
%mend ;
And use it like this:
%let charvars=AGE HEIGHT WEIGHT;
data want;
set have;
array ch &charvars;
array new [%sysfunc(countw(&charvars))];
do index=1 to dim(ch);
new[index]=input(left(ch[index]),32.);
end;
drop index &charvars;
%rename(&charvars);
run;
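For the other option mentioned above, a data step that writes the RENAME statement to a file which %INCLUDE then pulls into the program, a minimal sketch could look like the following. The HAVE data are made up, and the fileref RENCODE is an arbitrary name chosen for this example; the numeric conversion is the same as in the macro version.

```sas
/* Made-up sample data: three character variables holding numbers */
data have;
  length age height weight $ 8;
  input age height weight;
  datalines;
12 59 95
13 62 110
;
run;

%let charvars=AGE HEIGHT WEIGHT;

/* Write "rename new1=AGE new2=HEIGHT new3=WEIGHT ;" to a temporary file */
filename rencode temp;
data _null_;
  file rencode;
  length pair $ 64;
  put 'rename';
  do i = 1 to countw("&charvars", ' ');
    pair = cats('new', i, '=', scan("&charvars", i, ' '));
    put pair;
  end;
  put ';';
run;

data want;
  set have;
  array ch &charvars;
  array new [%sysfunc(countw(&charvars))];
  do index = 1 to dim(ch);
    new[index] = input(left(ch[index]), 32.);
  end;
  drop index &charvars;
  %include rencode;   /* pulls in the generated RENAME statement */
run;
```

The file-based route is handy when the generated code is long or when you want to inspect it before running it; the macro version avoids the extra file.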

You're doing a couple of things incorrectly for SAS.
Don't use the same data set name in the DATA/SET statements. It's bad practice and makes it much harder to debug your code.
You cannot change the type of a variable and keep the same name within one data step. Often you can rename the variables ahead of time to make this slightly easier.
Ideally, especially if the file was read from a text file, you fix these issues at the data import stage, not after the fact.
I don't know if the DROP/RENAME statements will take the array variable appropriately; that's something that would need to be tested.
data ex1;
set ex;
*original character variables;
array _chars(3) var1-var3;
*new numeric variables;
array _nums{3} new_var1-new_var3;
do i=1 to dim(_chars); *loop over every element of the array;
_nums(i) = input(strip(_chars(i)), 10.);
end;
drop i var1-var3;
*DROP is applied before RENAME, so var1-var3 can be reused as the new names;
rename new_var1-new_var3 = var1-var3;
run;
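Here is a sketch of the "rename ahead of time" idea using the RENAME= dataset option on the SET statement, so the original names are free for the new numeric variables and no separate RENAME statement is needed. The sample data and the _C1-_C3 names are made up for illustration.

```sas
/* Made-up sample data with character VAR1-VAR3 */
data ex;
  length var1-var3 $ 10;
  input var1 var2 var3;
  datalines;
1 2.5 300
4 5 6
;
run;

data ex1;
  /* Rename the character variables as they are read in */
  set ex (rename=(var1=_c1 var2=_c2 var3=_c3));
  array _chars{3} _c1-_c3;
  array _nums{3} var1-var3;   /* new numeric variables reuse the old names */
  do i = 1 to dim(_chars);
    _nums{i} = input(strip(_chars{i}), 10.);
  end;
  drop i _c1-_c3;
run;
```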

Related

SAS Macro Do Loop Issues

I have a very simple request: loop through a dataset, turning each observation into macro variables, and then do a comparison using those macro variables. Here's what my code looks like:
%do n = 1 %to &i2.;
data want;
set have;
%if _N_ = &n. %then %do;
call symputx("Var1",var1);
call symputx("var2",var2);
%end;
run;
data want;
retain FinalCount;
set have;
where Variable1="&var1.";
by SomeVariable;
if first.SomeVariable then FinalCount=0;
if final="FINAL" then FinalCount+1;
if Finalcount=&var2. then Final_Samples=1;
finalCount=FinalCount;
run;
%end;
The part that is failing is the _N_ = &n. section. I keep getting the error "Variable N has been defined as both character and numeric." Basically I just need to set each observation as a macro variable once to do the next comparison, and then move on to the next one. So if there's a better way of doing that, please let me know. Otherwise, could you help me figure out why that comparison is not working?
If you could explain your larger problem then you might get a better answer that does not require you to convert your data values into macro variables. Converting values to strings and then trying to compare them again introduces a number of sources of errors.
To your question of how to set macro variables based on the Nth observation in a dataset, try one of these.
If the dataset supports the FIRSTOBS= and OBS= dataset options:
data _null_;
set have (firstobs=&n obs=&n);
call symputx("Var1",var1);
call symputx("var2",var2);
run;
If the dataset supports direct access then use that.
data _null_;
p = &n;
set have point=p;
call symputx("Var1",var1);
call symputx("var2",var2);
stop;
run;
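If not, then use an IF (not a macro %IF). A macro %IF is evaluated before the data step runs, so it compares the literal text _N_ with &n and never sees the data step's _N_ value.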
data _null_;
set have ;
if _n_ = &n then do;
call symputx("Var1",var1);
call symputx("var2",var2);
stop;
end;
run;
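If you do need to repeat a step once per observation, one way to wrap the POINT= version in a macro %DO loop is sketched below. The macro name LOOP_OBS and the sample data are hypothetical, and the %PUT line only stands in for whatever per-observation step you actually run (such as the second DATA step in the question).

```sas
/* Made-up sample data */
data have;
  length var1 $ 8;
  input var1 var2;
  datalines;
alpha 2
beta 5
;
run;

%macro loop_obs(ds=have);
  %local n nobs var1 var2;

  /* How many observations are there? */
  proc sql noprint;
    select count(*) into :nobs trimmed from &ds;
  quit;

  %do n = 1 %to &nobs;
    /* Copy observation &n into the local macro variables VAR1 and VAR2 */
    data _null_;
      p = &n;
      set &ds point=p;
      call symputx('var1', var1);
      call symputx('var2', var2);
      stop;   /* POINT= never hits end of file, so stop explicitly */
    run;

    /* Placeholder for the real per-observation step using &var1 and &var2 */
    %put NOTE: Observation &n: var1=&var1 var2=&var2;
  %end;
%mend loop_obs;

%loop_obs(ds=have)
```

Because VAR1 and VAR2 are declared %LOCAL, CALL SYMPUTX updates those local copies rather than creating global ones.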

How to resolve a macro variable in a loop in SAS

I am trying to figure out how to use a macro variable in a loop within a data step in SAS, but I am lost. I have 14 macro variables and I have to compare each of them to the values of a variable. I tried:
data work.calendrier;
set projet.calendrier;
do i=1 to 3;
if date= "&vv&i"D then savinglight = 1;
end;
run;
But it is not working. The macro variables vv1 up to vv3 hold date values. For instance, this code works:
data work.calendrier;
set projet.calendrier;
*do i=1 to 3;
if date= "&vv1"D then savinglight = 1;
*end;
run;
But with the loop it cannot resolve the macro variable.
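If you want to reference a macro variable with a numeric suffix like vv1, vv2, vv3, you need to resolve &i first. This only works when i is itself a macro variable, for example the index of a %DO loop; a DATA step DO loop variable does not exist yet when the macro references are resolved.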
SAS has a separate macro processor that resolves values before they reach the data step processor.
Essentially, you need to add an extra ampersand at the beginning of your macro variable reference:
&&vv&i (i=1) -> &vv1 -> "Value of vv1"
&&vv&i (i=2) -> &vv2 -> "Value of vv2"
&&vv&i (i=3) -> &vv3 -> "Value of vv3"
What happens here is that the macro processor reads the text after the ampersands until it finds a break. It resolves && to a single & and continues reading across until it resolves &i to its value. That leaves you with the reference you need, such as &vv1, which is scanned again and resolved to its value.
A couple of sources about this interesting topic:
http://www2.sas.com/proceedings/sugi29/063-29.pdf
http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf
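A small way to watch the double-ampersand resolution happen is to write it to the log from inside a %DO loop, where i really is a macro variable. The values of VV1 and VV2 below are made up.

```sas
%let vv1 = 01JAN2020;
%let vv2 = 14MAR2020;

%macro show_resolution;
  %local i;
  %do i = 1 %to 2;
    /* Pass 1: && -> &, vv stays as text, &i -> 1, giving &vv1 */
    /* Pass 2: &vv1 -> 01JAN2020                                */
    %put i=&i resolves to &&vv&i;
  %end;
%mend show_resolution;

%show_resolution
```

Running it writes "i=1 resolves to 01JAN2020" and "i=2 resolves to 14MAR2020" to the log.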
Macro variable references are resolved before SAS compiles and runs your data step. You need to first figure out how to do what you want using SAS statements then, if necessary, you can use macro code to help you generate those statements.
If you want to test if a variable's value matches one of a list of values then consider using the IN operator.
data work.calendrier;
set projet.calendrier;
savinglight = date in ("&vv1"d,"&vv2"d,"&vv3"d);
run;
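With 14 macro variables, typing the IN list by hand gets tedious. One option, sketched here under the assumption that the macro variables are named VV1, VV2, ... and hold date text such as 27MAR2016, is a small macro that generates the list. DATELIST is a hypothetical helper name, and the calendar data are made up.

```sas
/* Made-up stand-ins for the date macro variables (only 3 here) */
%let vv1 = 29MAR2015;
%let vv2 = 27MAR2016;
%let vv3 = 26MAR2017;

/* Generates "<value of vv1>"d, "<value of vv2>"d, ... for PREFIX1-PREFIXn */
%macro datelist(prefix, n);
  %local i;
  %do i = 1 %to &n;
    %if &i > 1 %then ,;
    "&&&prefix&i"d
  %end;
%mend datelist;

/* Made-up calendar */
data calendrier;
  do date = '25MAR2016'd to '29MAR2016'd;
    output;
  end;
  format date date9.;
run;

data work.calendrier2;
  set calendrier;
  savinglight = date in (%datelist(vv, 3));
run;
```

Here &&&prefix&i needs three ampersands because the prefix itself comes from a macro parameter: the first pass yields &vv1, and the second pass yields its value.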
You need to use a macro. Here's the basic approach:
%let vv1 = 9;
%let vv2 = 2;
%let vv3 = 10;
data have;
drop i;
do i = 1 to 5;
date = i;
output;
end;
run;
%macro test;
data test;
set have;
%do i=1 %to 3;
if date= &&vv&i then savinglight = 1;
%end;
run;
%mend test;
%test;

SAS Performing Loops in a Proc Step

I can't seem to find information about this online...
I have a list of variables I want to run PROC SUMMARY on. As these summaries are performed individually per variable, it would be faster if I could loop through a numbered list of variables, then create an output to Excel, or simply a combined table of results that clearly indicates which results belong to which variable.
The problem is I only know DO loops work in a data step; how would I get this to work for proc steps? Could I write a macro for the proc step, then nest it within a data step? Would this cause it to run appropriately? I.e.:
data _NULL_;
set table_of_vars;
do i=1 to (number of vars in the table);
_n_ = i;
%let var = _n_;
%macro_proc_summ(&var.);
end;
And another piece of code afterwards that merges the individual outputs, or perhaps the macro could even generate output that always appends information.
Obviously the code is very sketchy, but conceptually could this work?
EDIT: To give a bit more clarity, this is how the code would look without a loop in place.
%macro Analysis(var); %macro _; %mend _;
proc summary data=masterdata nway missing;
class &var.;
output out = &var._summ (drop = _type_);
run;
%mend;
endrsubmit;
%Analysis(var1);
%Analysis(var2);
%Analysis(var3);
.
.
.
.
%Analysis(var100);
From here we could either:
Export var1_summ, var2_summ to Excel in cells A1, D1, etc.
Or first combine our individual summaries into a large table, then export to some graphing application to look at the trend.
Either way, you can see how these are individual proc steps, which could be done a lot quicker in a loop.
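Rather than looping, you can use PROC TABULATE, which summarizes all of the class variables in one step and, with OUT=, writes the results to a single output table.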
proc tabulate data=sashelp.class out=summary1;
class sex age;
var weight;
table sex age, weight*(n mean min max median);
run;
data summary2;
set summary1;
Var=coalescec(sex, put(age, 2.));
drop age sex _:;
run;
EDIT: Rather than proc tabulate, if you only want N, use PROC FREQ
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
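For the question as literally asked, looping a proc step, the loop has to be a macro %DO loop rather than a DATA step DO loop. A minimal sketch, assuming the class variables are passed as a space-separated list; ANALYSIS_ALL, ALL_SUMM and _STACK are hypothetical names and the data are made up. Each pass runs the same PROC SUMMARY as the question's Analysis macro and appends the result, tagged with the variable name, to one combined table.

```sas
/* Made-up stand-in for MASTERDATA with three character class variables */
data masterdata;
  input var1 $ var2 $ var3 $;
  datalines;
a x p
a y p
b x q
;
run;

%macro analysis_all(ds=, vars=);
  %local i var;

  /* Start from an empty combined table */
  %if %sysfunc(exist(work.all_summ)) %then %do;
    proc delete data=work.all_summ;
    run;
  %end;

  %do i = 1 %to %sysfunc(countw(&vars, %str( )));
    %let var = %scan(&vars, &i, %str( ));

    /* Same PROC SUMMARY as the Analysis macro in the question */
    proc summary data=&ds nway missing;
      class &var;
      output out=&var._summ (drop=_type_);
    run;

    /* Label the rows with the variable name so they can be stacked */
    data work._stack;
      length variable $ 32 value $ 200;
      set &var._summ;
      variable = "&var";
      value = vvalue(&var);
      keep variable value _freq_;
    run;

    proc append base=work.all_summ data=work._stack;
    run;
  %end;
%mend analysis_all;

%analysis_all(ds=masterdata, vars=var1 var2 var3)
```

ALL_SUMM then holds one row per level of each variable, with VARIABLE saying which variable it came from, ready to export or plot.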

SAS Put Certain Row as New Variable Names After Manipulation

After importing my CSV data with GETNAMES=NO, I have 59 columns with variable names VAR1, VAR2, . . . VAR59. My first row contains the names I need for the new variables, but they first needed to be manipulated by removing special characters and turning spaces into underscores, since SAS doesn't like spaces in variable names. This is the array I used for that piece:
DATA DATA1; SET DATA (FIRSTOBS=7);
ARRAY VAR(59) VAR1-VAR59;
IF _N_ = 1 THEN DO;
DO I = 1 TO 59;
VAR[I] = COMPRESS(TRANSLATE(TRIM(VAR[I]),'_',' '),'?()');
PUT VAR[I]=;
END;
END;
DROP I;
RUN;
This worked perfectly, but now I need to get this first row up to the new variable names. I tried a similar array to perform this:
DATA DATA2; SET DATA1;
ARRAY V(59) VAR1-VAR59;
DO I = 1 TO 59;
IF _N_ = 1 AND V[I] NE "" THEN CALL SYMPUT("NEWNAME",V[I]);
RENAME VAR[I] = &NEWNAME;
END;
DROP I;
RUN;
This only uses the name from VAR59, since there is no [i] connected to &NEWNAME, and it still isn't working quite right. Any suggestions for moving a row up to variable names AFTER manipulation?
Your primary problem is that you are trying to use a macro variable in the same data step that creates it. You can't. You're also trying to create RENAME statements in the data step; RENAME, like other similar statements (KEEP, DROP), must be defined before the data step is compiled.
You need to write code somewhere - either in a text file, a macro variable, whatever - with this information. For example:
filename renamef temp;
data _null_;
set myfile (obs=1);
file renamef;
array var[59] var1-var59;
do _i = 1 to dim(var);
*your code to clean the names goes here;
strput = catx(' ', 'rename', vname(var[_i]), '=', var[_i], ';');
put strput;
end;
run;
data want;
set myfile (firstobs=2);
%include renamef;
run;
There are lots of other examples of this on the site and on the web; "list processing" is the term for this.
Joe -- using your suggestions and another one of your posts, the following worked flawlessly:
Step 1: Put the row of needed variable names into long format (in my case, the first row, so n = 1).
DATA NEWVARS; SET DATA;
IF _N_ = 1 THEN OUTPUT NEWVARS;
RUN;
PROC TRANSPOSE DATA = NEWVARS OUT=NEWVARS1;
VAR _ALL_;
RUN;
Step 2: Create a list of rename macro calls.
PROC SQL;
SELECT CATS('%RENAME(VAR=',_NAME_,',NEWVAR=',COL1,')')
INTO :RENAMELIST SEPARATED BY ' '
FROM NEWVARS1;
QUIT;
%MACRO RENAME(VAR=,NEWVAR=);
RENAME &VAR.=&NEWVAR.;
%MEND RENAME;
Step 3: Call the list created in Step 2 to rename all variables.
PROC DATASETS LIB=WORK NOLIST;
MODIFY DATA;
&RENAMELIST.;
QUIT;
I had to perform a few additional checks to make sure that the variable names were not longer than 32 characters, which was easy to check while the data was in long format after transposing. If certain words make the names too long, the TRANWRD function can easily replace them with abbreviations.
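For comparison, a variant of the same list-processing idea builds plain old=new pairs with CATX and skips the helper %RENAME macro. This sketch uses made-up data and renames while copying the data (a DATA step RENAME statement) rather than with PROC DATASETS; RAW, NAMES and WANT are hypothetical dataset names.

```sas
/* Made-up stand-in for the imported CSV: row 1 holds the raw column names */
data raw;
  infile datalines dsd truncover;
  length var1-var3 $ 20;
  input var1 var2 var3;
  datalines;
Age,Height (in),Weight?
12,59,95
13,62,110
;
run;

/* Step 1: put the header row into long format (_NAME_ = old name, COL1 = raw name) */
proc transpose data=raw(obs=1) out=names;
  var var1-var3;
run;

/* Clean the names as in the question; the $32 length caps them at 32 characters */
data names;
  set names;
  length newname $ 32;
  newname = compress(translate(strip(col1), '_', ' '), '?()');
run;

/* Step 2: collect "old=new" pairs into one macro variable */
proc sql noprint;
  select catx('=', _name_, newname)
    into :renames separated by ' '
    from names;
quit;

/* Step 3: drop the header row and apply the renames */
data want;
  set raw(firstobs=2);
  rename &renames;
run;
```

Here &RENAMES resolves to var1=Age var2=Height_in var3=Weight, so a single RENAME statement handles every column.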

SAS: use a value from one observation to overwrite a different one

I have a data set with two main variables of interest: Major and Major_Code. These should match up 1 to 1, but there are some errors I need to fix. What I've found is that for 14 Major_Code values, there are two different Majors. This is only due to a change in spelling or punctuation, like "ed." and "education". They are supposed to have the same value here but don't.
So I have a table with 7 pairs. Each pair has the same Major_Code and a different Major. How can I select one of the Major values to use for each code? My only idea was an IF-THEN statement, but that seems horribly inefficient.
I found the doubled values like this:
proc freq data=majorslist;
tables Major_Code/out=majorcodedups;
run;
proc print data=majorcodedups;
where COUNT > 1;
run;
So I can easily find these observations but can't extract certain values to overwrite onto another observation. I've looked into arrays, macros, sql and transpose but it's all a bit over my head right now.
Logically it would work like this:
from obs i to n, find value for variable x at obs i, output value onto variable y at obs i, go to obs(i+1) and repeat.
Assuming you have some rule for determining which MAJOR is correct for a MAJOR_CODE, you should do this:
This assumes majorslist is a dataset of every major/major_code pair, whether unique or not, but with only one row per major/major_code pair.
proc sort data=majorslist;
by major_code major;
run;
data majorslist_unique;
set majorslist;
by major_code major;
if first.major_code and last.major_code then output;
else do;
*rule to determine whether to output it or not;
end;
run;
So you now have the major_code/major relationship. Let's say you picked if first.major_code then output; as your rule (i.e., for each major_code, keep the alphabetically first major value).
Now you need to apply this to your larger dataset. There are a lot of ways to do that; merging it on is one, a format is another, for starters. A format works like this:
Create a dataset with FMTNAME, START, LABEL defined. For each value of MAJOR_CODE, construct one row like that, where START is MAJOR_CODE and LABEL is MAJOR. We'll also add an extra line that says what to do with non-matches (in case you get new values of major_code).
data for_fmt;
set majorslist_unique;
fmtname='MAJORF'; *add a $ if MAJOR_CODE is a character variable;
start=major_code;
label=major;
output;
if _n_=1 then do;
hlo='o';
call missing(start);
label='NONMATCHED';
output;
end;
keep fmtname start label hlo;
run;
proc format cntlin=for_fmt;
quit;
Now you have a format, MAJORF. (or $MAJORF. if MAJOR_CODE is character), that you can use with the PUT function.
data my_bigdata2;
set my_bigdata;
major = put(major_code,MAJORF.);
run;
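The answer also mentions merging as an alternative to a format. A sketch of that route with a PROC SQL left join is below; all data are made up. Unlike the format, which returns NONMATCHED for unknown codes, this version keeps the original MAJOR when a code has no entry in the lookup, which is a design choice you may want to change.

```sas
/* Made-up lookup: one MAJOR per MAJOR_CODE (e.g. the result of the rule step) */
data majorslist_unique;
  length major $ 20;
  input major_code major $;
  datalines;
101 Education
102 Biology
;
run;

/* Made-up big dataset with inconsistent spellings */
data my_bigdata;
  length major $ 20;
  input id major_code major $;
  datalines;
1 101 Ed.
2 101 Education
3 102 Biology
4 999 Unknown
;
run;

/* Replace MAJOR with the chosen spelling; unmatched codes keep their value */
proc sql;
  create table my_bigdata2 as
  select b.id,
         b.major_code,
         coalesce(u.major, b.major) as major length=20
  from my_bigdata as b
       left join majorslist_unique as u
       on b.major_code = u.major_code
  order by b.id;
quit;
```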

Resources