I'm trying to find a sensible way to clean dates stored as character strings, convert them to proper SAS dates with the INPUT function, and keep sensible variable names (ideally the original names) once the character-to-numeric conversion is done.
The dates are cleaned through an array (replacing '..' with '01', or '....' with '0101'), since about 75 variables hold dates as strings.
Example:
data sample;
input d1 $ d2 $ d3 $ d4 $ d5 $;
cards;
200103.. 20070905 20060222 2007.... 199801..
;
run;
data clean;
set sample;
array dt_cln(5) d1-d5;
array fl_dt (5) f1-f5;
*clean out '..'/'....', replace with '01'/'0101';
do i=1 to 5;
if substr(dt_cln(i),5,4) = '....' then do;
dt_cln(i) = substr(dt_cln(i),1,4) || '0101';
end;
else if substr(dt_cln(i),7,2) = '..' then do;
dt_cln(i) = substr(dt_cln(i),1,6) || '01';
end;
end;
*change to number;
do i=1 to 5;
fl_dt(i)=input(dt_cln(i),yymmdd8.);
end;
format f: date9.;
drop i d:;
run;
What would be the best way to approach this?
You cannot preserve the original names and convert from character to numeric directly - however, with a bit of macro code you could drop all the old character variables and rename the numeric versions you've created. E.g.
%macro rename_loop();
%local i;
%do i = 1 %to 5;
f&i = d&i
%end;
%mend;
Then in your data step add a rename statement at the end, after your drop statement:
rename %rename_loop;
Otherwise, your existing approach is already pretty good. You could perhaps simplify the cleaning process a bit, e.g. remove your first do-loop and do the following within the second one:
fl_dt(i)=input(tranwrd(dt_cln(i),'..','01'),yymmdd8.);
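Putting the two suggestions together, a minimal sketch of the whole step might look like the following. It assumes the sample data set from the question and uses a plain RENAME with numbered range lists instead of the macro, which is possible here only because the names are simple numeric suffixes.

```sas
data clean;
  set sample;
  array dt_cln(5) d1-d5;   /* original character dates */
  array fl_dt(5) f1-f5;    /* new numeric dates */
  do i = 1 to dim(dt_cln);
    /* '..' becomes '01', so '....' becomes '0101' */
    fl_dt(i) = input(tranwrd(dt_cln(i), '..', '01'), yymmdd8.);
  end;
  format f1-f5 date9.;
  drop i d1-d5;
  /* DROP is applied before RENAME, so the old names are free again */
  rename f1-f5 = d1-d5;
run;
```

The output data set then carries numeric d1-d5 formatted as dates, with the character versions gone.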
An alternative that does the replacement with a regular expression inside a single loop:
data want;
set sample;
array var1 newd1-newd5;
array var2 d:;
do over var2;
var1=input(ifc(index(var2,'.')^=0,put(prxchange('s/((\.){1,})/0101/',-1,var2),8.),var2),yymmdd8.);
end;
format newd1-newd5 yymmddn8.;
drop d:;
run;
Related
I currently have the macro below:
%macro sqlloop (event_id);
...lots of code, mostly proc sql segments ...
%mend;
It generates an output table (named export_table2). I need to be able to run this code dozens of times, once for every row in another table (named vars). My trial code, testing what I want it to do, is below (basically manually typing in the first two rows of this 68-row table):
data ;
%let empl_nbr_var = '222';
%let fleet = '7ER';
%let position = 'A';
%let base = 'BWI';
%sqlloop(event_id = 1);
run;
data summary_pilots;
set work.export_table2;
run;
data;
%let empl_nbr_var = '111';
%let fleet = '320';
%let position = 'B';
%let base = 'CHS';
%sqlloop(event_id = 2);
run;
data summary_pilots;
set summary_pilots work.export_table2;
run;
This produces the final output of each execution stacked into one table called summary_pilots. How can I do this in a loop, perhaps using call execute to iterate through each row of vars? The columns of vars are exactly what I need for the macro variables, and I want to iterate through every single row to assign those macro variables and run my %sqlloop again. Thanks for the help!
EDIT:
I'm currently figuring out how call execute works and can see how it's helpful here, but I'm still a bit stuck. The code below works exactly as you'd think, printing all the variables in the table vars to the log.
data ;
set work.vars;
call execute( '%put='|| strip(empl_nbr_var) || ';
%put = ' || strip(fleet) ||';
%put = '|| strip(position) ||';
%put = ' || strip(base) ||';' );
run;
I am trying to use the below code, but am getting a crazy amount of errors due to the macros being assigned weirdly. The types in the columns of vars match exactly what I want them to be in the macros, but it still looks like that might be the issue here?
data ;
set work.vars;
call execute( '
%let empl_nbr_var =' || strip(empl_nbr_var) || ';
%let fleet = ' || strip(fleet) ||';
%let position = '|| strip(position) ||';
%let base = ' || strip(base) ||';
%sqlloop(event_id = 17);' );
run;
The event ID doesn't actually matter here, so I just left it as an arbitrary number for now.
Assuming your work.Vars contains data like this:
empl_nbr_var  fleet  position  base
222           7ER    A         BWI
111           320    B         CHS
...           ...    ...       ...
Consider extending your macro to receive such input parameters:
%macro sqlloop(event_id, empl_nbr_var, fleet, position, base);
...lots of code, mostly proc sql segments ...
%mend;
Then build the macro call from the concatenated data values and run it via call execute. The step below passes 17 as the event_id parameter.
data _null_;
set Work.Vars;
args = catx("', '", empl_nbr_var, fleet, position, base);
args = '%sqlloop(17,'''|| strip(args) || ''');';
put args $char.; /* VIEW CALL COMMAND */
call execute(args); /* RUN CALL COMMAND */
run;
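One caveat with call execute: when the text it receives is a macro call, the macro itself executes immediately, while the data step is still running, and only the SAS code it generates is queued for later. If %sqlloop creates macro variables from its own proc sql steps (for example with INTO :), later macro logic would then see stale values. A common workaround, sketched here under the assumption that %sqlloop takes the five parameters shown above, is to wrap the call in %nrstr so the whole invocation is deferred until after the data step. The variable name cmd is only illustrative.

```sas
data _null_;
  set work.vars;
  length args $ 200 cmd $ 400;
  /* same quoted argument list as in the step above */
  args = catx("', '", empl_nbr_var, fleet, position, base);
  /* %nrstr masks the macro trigger, so the call runs after this step ends */
  cmd = cats('%nrstr(%sqlloop(17,''', args, '''))');
  put cmd=;            /* view the generated call in the log */
  call execute(cmd);   /* queue the call */
run;
```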
It makes no sense to code %LET statements in the middle of a data step. The macro processor will evaluate them before it passes the text of the data step code to SAS to process. Avoid confusing yourself by moving the %LET statements before the data step.
If the macro needs the values of macro variables, like FLEET, as input, then make them parameters of the macro. Don't create a macro that references "magic" macro variables: macro variables that are neither input parameters nor created by the macro, whose references just appear in the middle of the macro definition as if their values will show up by magic.
%macro sqlloop(empl_nbr_var,fleet,position,base);
... code that uses &fleet.
%mend;
If you have a lot of combinations of parameters you want run through your macro then collect them into a dataset first.
data inputs ;
input empl_nbr_var fleet $ position $ base $ ;
cards;
222 7ER A BWI
111 320 B CHS
;
Then you can use those dataset variables to generate the calls to the macro. You could try using call execute() to do this, but personally I find it a lot easier to use a data step to write the code to a file. Then you can examine the file and make sure the code generation logic is correct. Plus you can use the power of the PUT statement to make the code generation easier. For example if the variable names match the parameter names you can use named output.
filename code temp;
data _null_;
set inputs;
file code ;
put '%sqlloop(' empl_nbr_var= ',' fleet= ',' position= ',' base= ')';
run;
Which will generate code like:
%sqlloop(empl_nbr_var=222 ,fleet=7ER ,position=A ,base=BWI )
%sqlloop(empl_nbr_var=111 ,fleet=320 ,position=B ,base=CHS )
Once you are confident that it is generating the right code use the %INCLUDE command to run the code it generates.
%include code / source2;
If the macro does not have its own step for aggregating the results you could include that step in the code generation.
filename code temp;
data _null_;
set inputs;
file code ;
put '%sqlloop(' empl_nbr_var= ',' fleet= ',' position= ',' base= ')';
put 'proc append base=summary_pilots data=export_table2 force; run;' ;
run;
%include code / source2;
Very simple: I'm trying to convert many character variables into numeric. The following code gives "syntax error, expecting one of the following: a name, -, :, ;" for the drop and rename lines.
data ex; set ex;
array numeric{3} var1 var2 var3;
do i=1 to 8;
temp = input(strip(numeric(i)),10.);
drop numeric(i);
rename temp = numeric(i);
end;
run;
Can you not use drop or rename statements in do loops?
The dataset structure has to be decided when the data step is compiled. So there is no way you could use an array reference in a rename statement.
If you really have simple numerically suffixed variable names then you could use a simple RENAME statement.
rename new1-new3=var1-var3;
So your program might be as simple as this:
data want;
set have;
array ch var1-var3;
array new new1-new3;
do index=1 to dim(ch);
new[index]=input(left(ch[index]),32.);
end;
drop index var1-var3;
rename new1-new3=var1-var3;
run;
If the list of names is more complex, like AGE HEIGHT WEIGHT for example, then you will need to use a more complex RENAME statement like:
rename new1=AGE new2=HEIGHT new3=WEIGHT ;
So use some type of code generation method, such as macro code or a data step that writes lines of code to a file that can be included into the program with a %include statement.
For example you could make a macro like this:
%macro rename(varlist);
%local i;
rename
%do i=1 %to %sysfunc(countw(&varlist));
new&i=%scan(&varlist,&i)
%end;
;
%mend ;
And use it like this:
%let charvars=AGE HEIGHT WEIGHT;
data want;
set have;
array ch &charvars;
array new [%sysfunc(countw(&charvars))];
do index=1 to dim(ch);
new[index]=input(left(ch[index]),32.);
end;
drop index &charvars;
%rename(&charvars);
run;
You're doing a couple of things incorrectly for SAS.
Don't use the same data set name in the DATA and SET statements. It's bad practice and makes it much harder to debug your code.
You cannot change the type of a variable while keeping the same name within one data step. Often you can rename ahead of time to make this slightly easier.
Ideally, especially if the data was read from a text file, you fix these issues at the data import stage, not after the fact.
DROP and RENAME do not accept array references, so list the variables by name (or as a numbered range) instead.
data ex1;
set ex;
*original character variables;
array _chars(3) var1-var3;
*new numeric variables;
array _nums{3} new_var1-new_var3;
do i=1 to dim(_chars); *loop bound follows the array size;
_nums(i) = input(strip(_chars(i)), 10.);
end;
drop i var1-var3;
*DROP is applied before RENAME, so both can appear in the same step;
rename new_var1-new_var3 = var1-var3;
run;
I have a bunch of character variables which I need to sort out from a large dataset. The unwanted variables all have entries that are the same or are all missing (meaning I want to drop these from the dataset before processing the data further). The data sets are very large so this cannot be done manually, and I will be doing it a lot of times so I am trying to create a macro which will do just this. I have created a list macro variable with all character variables using the following code (The data for my part is different but I use the same sort of code):
data test;
input Obs ID Age;
datalines;
1 2 3
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
run;
proc contents
data = test
noprint
out = test_info(keep=name);
run;
proc sql noprint;
select name into : testvarlist separated by ' ' from test_info;
quit;
My idea is then to just use a data step to drop this list of variables from the original dataset. Now, the problem is that I need to loop over each variable, and determine if the observations for that variable are all the same or not. My idea is to create a macro that loops over all variables, and for each variable counts the occurrences of the entries. Since the length of this table is equal to the number of unique entries I know that the variable should be dropped if the table is of length 1. My attempt so far is the following code:
%macro ListScanner (org_list);
%local i next_name name_list;
%let name_list = &org_list;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
%put &next_name;
proc sql;
create table char_occurrences as
select &next_name, count(*) as numberofoccurrences
from &name_list group by &next_name;
select count(*) as countrec from char_occurrences;
quit;
%if countrec = 1 %then %do;
proc sql;
delete &next_name from &org_list;
quit;
%end;
%let i = %eval(&i + 1);
%end;
%mend;
%ListScanner(org_list = &testvarlist);
I get syntax errors with this, and with my real data I get other problems with the data not being read correctly, but I am taking one step at a time. I may be overcomplicating things, so if anyone has an easier solution or can see what is wrong I would be very grateful.
There are many ways to do this posted around.
But let's just look at the issues you are having.
First for looping through your space delimited list of names it is easier to let the %do loop increment the index variable for you. Use the countw() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
...
%end;
Second, where is your input dataset in your SQL code? Add another parameter to your macro definition. Where do you want to write the dataset without the empty columns? So perhaps another parameter.
%macro ListScanner (dsname , out, name_list);
%local i next_name sep drop_list ;
Third, you can use a single query to count all of the variables at once. Just use count(distinct xxxx) instead of group by.
proc sql noprint;
create table counts as
select
%let sep=;
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
&sep. count(distinct &next_name) as &next_name
%let sep=,;
%end;
from &dsname
;
quit;
So this will get a dataset with one observation. You can use PROC TRANSPOSE to turn it into one observation per variable instead.
proc transpose data=counts out=counts_tall ;
var _all_;
run;
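Now you can just query that table to find the names of the columns with 0 non-missing values. Since count(distinct ...) ignores missing values, changing the condition to col1<=1 would also pick up columns that hold a single repeated value.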
proc sql noprint ;
select _name_ into :drop_list separated by ' '
from counts_tall
where col1=0
;
quit;
Now you can use the new DROP_LIST macro variable.
data &out ;
set &dsname ;
drop &drop_list;
run;
So now all that is left is to clean up after yourself.
proc delete data=counts counts_tall ;
run;
%mend;
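A quick way to try the finished macro, assuming the fragments above are assembled in order into one %ListScanner definition. The dataset demo and its columns are hypothetical; the column empty is missing on every row, so it should end up in the drop list.

```sas
/* hypothetical test data: "empty" is missing on every row */
data demo;
  input id grp $ empty $ score;
  datalines;
1 A . 10
2 B . 20
3 A . 30
;
run;

/* positional arguments: input dataset, output dataset, variables to check */
%ListScanner(demo, demo_trimmed, grp empty score);

/* list the variables that survived */
proc contents data=demo_trimmed short;
run;
```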
As far as your specific initial question, this is fairly straightforward. Assuming &testvarlist is your macro variable containing the variables you are interested in, and creating some test data in have:
%let testvarlist=x y z;
data have;
call streaminit(7);
do id = 1 to 1e6;
x = floor(rand('Uniform')*10);
y = floor(rand('Uniform')*10);
z = floor(rand('Uniform')*10);
if x=0 and y=4 and z=7 then call missing(of x y z);
output;
end;
run;
data want fordel;
set have;
if min(of &testvarlist.) = max(of &testvarlist.)
and (cmiss(of &testvarlist.)=0 or missing(min(of &testvarlist.)))
then output fordel;
else output want;
run;
This isn't particularly inefficient, but there are certainly better ways to do this, as referenced in comments.
I am trying to figure out how to reference a macro variable in a loop within a data step in SAS, but I am lost. I have 14 macro variables and I have to compare each of them to the entries of a vector. I tried:
data work.calendrier;
set projet.calendrier;
do i=1 to 3;
if date= "&vv&i"D then savinglight = 1;
end;
run;
But it is not working. The macro variables vv1 through vv3 hold date values. For instance, this code works:
data work.calendrier;
set projet.calendrier;
*do i=1 to 3;
if date= "&vv1"D then savinglight = 1;
*end;
run;
But with the loop it can not resolve the macro variable.
If you want to reference a macro variable with a number index like vv1,vv2,vv3 you need to resolve &i first.
SAS has a separate macro processor that resolves values before they reach the data step processor.
Essentially, you need to add extra ampersands at the beginning of your macro variable:
&&vv&i -> &vv1 -> "Value of vv1"
&&vv&i -> &vv2 -> "Value of vv2"
&&vv&i -> &vv3 -> "Value of vv3"
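What happens here is that the macro processor scans the reference from left to right. It resolves && to a single &, keeps the text vv, and resolves &i to its current value (1, say). That leaves &vv1, which is resolved on a second pass. Note that &i here must be a macro variable, for example the index of a macro %DO loop; the data step variable i from a DO loop is not visible to the macro processor.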
A couple of sources about this interesting topic:
http://www2.sas.com/proceedings/sugi29/063-29.pdf
http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf
Macro variable references are resolved before SAS compiles and runs your data step. You need to first figure out how to do what you want using SAS statements then, if necessary, you can use macro code to help you generate those statements.
If you want to test if a variable's value matches one of a list of values then consider using the IN operator.
data work.calendrier;
set projet.calendrier;
savinglight = date in ("&vv1"d,"&vv2"d,"&vv3"d);
run;
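With 14 macro variables, typing the IN list by hand gets tedious. One possible sketch is a small macro that generates the list; the macro name date_list is hypothetical, and it assumes vv1 through vv14 all exist and hold values that are valid inside a date literal (such as 01JAN2020).

```sas
%macro date_list(n);
  %local i;
  %do i = 1 %to &n;
    "&&vv&i"d   /* emits "value of vvN"d; IN lists accept blank-separated items */
  %end;
%mend date_list;

data work.calendrier;
  set projet.calendrier;
  /* 1 when date matches any of the 14 values, 0 otherwise */
  savinglight = date in (%date_list(14));
run;
```

The macro only writes text; the data step that SAS finally compiles contains an ordinary IN list of date literals.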
You need to use a macro. Here's the basic approach:
%let vv1 = 9;
%let vv2 = 2;
%let vv3 = 10;
data have;
drop i;
do i = 1 to 5;
date = i;
output;
end;
run;
%macro test;
data test;
set have;
%do i=1 %to 3;
if date= &&vv&i then savinglight = 1;
%end;
run;
%mend test;
%test;
After importing my CSV data with GETNAMES=NO, I have 59 columns with variable names VAR1, VAR2, . . . VAR59. My first row contains the names I need for the new variables, but they first needed to be cleaned up by removing special characters and turning spaces into underscores, since SAS doesn't like spaces in variable names. This is the data step I used for that piece:
DATA DATA1; SET DATA (FIRSTOBS=7);
ARRAY VAR(59) VAR1-VAR59;
IF _N_ = 1 THEN DO;
DO I = 1 TO 59;
VAR[I] = COMPRESS(TRANSLATE(TRIM(VAR[I]),'_',' '),'?()');
PUT VAR[I]=;
END;
END;
DROP I;
RUN;
This worked perfectly, but now I need to get this first row up to the new variable names. I tried a similar array to perform this:
DATA DATA2; SET DATA1;
ARRAY V(59) VAR1-VAR59;
DO I = 1 TO 59;
IF _N_ = 1 AND V[I] NE "" THEN CALL SYMPUT("NEWNAME",V[I]);
RENAME VAR[I] = &NEWNAME;
END;
DROP I;
RUN;
This only puts the name of VAR59 since there is no [i] connected to the &NEWNAME, and it still isn't working quite right. Any suggestions for moving a row up to variable names AFTER manipulation?
Your primary problem is you are trying to use a macro variable in the data step it's created in. You can't. You're also trying to create rename statements in the data step; rename, as with other similar statements (keep, drop), must be defined before the data step is compiled.
You need to write code somewhere - either in a text file, a macro variable, whatever - with this information. For example:
filename renamef temp;
data _null_;
set myfile (obs=1);
file renamef;
array var[59];
do _i = 1 to dim(var);
/* [your code to clean it out] */
strput = catx(' ','rename',vname(var[_i]),'=',var[_i],';');
put strput;
end;
run;
data want;
set myfile (firstobs=2);
%include renamef;
run;
There are lots of other examples of this on the site and on the web; "list processing" is the term for it.
Joe -- using your suggestions and another one of your posts, the following worked flawlessly:
1. Put the row of needed variables into long format (in my case, first row so n = 1)
DATA NEWVARS; SET DATA;
IF _N_ = 1 THEN OUTPUT NEWVARS;
RUN;
PROC TRANSPOSE DATA = NEWVARS OUT=NEWVARS1;
VAR _ALL_;
RUN;
2. Create a list of rename macro calls.
PROC SQL;
SELECT CATS('%RENAME(VAR=',_NAME_,',NEWVAR=',COL1,')')
INTO :RENAMELIST SEPARATED BY ' '
FROM NEWVARS1;
QUIT;
%MACRO RENAME(VAR=,NEWVAR=);
RENAME &VAR.=&NEWVAR.;
%MEND RENAME;
3. Call in the list created in Step 2 to rename all variables.
PROC DATASETS LIB=WORK NOLIST;
MODIFY DATA;
&RENAMELIST.;
QUIT;
I had to perform a few additional checks to make sure that the variable names were not longer than 32 characters, and this was easy to check once the data was in long format after transposing. If certain words make the names too long, the TRANWRD function can easily replace them with abbreviations.
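The length check mentioned above can be sketched as follows, assuming the transposed table NEWVARS1 from the earlier step, with the proposed names in the character column COL1. The names name_check and name_len are only illustrative.

```sas
/* keep only the proposed names that exceed the 32-character limit */
data name_check;
  set newvars1;
  name_len = length(strip(col1));
  if name_len > 32;
run;

proc print data=name_check;
  var _name_ col1 name_len;
run;
```

Any rows that show up here need shortening before the rename step is run.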