I can't seem to find information about this online...
I have a list of variables I want to do a proc summary on. As these proc summaries are performed individually per variable, it would be faster for me if I can find some way to loop through a numbered list of variables, then create an output to excel or simply a combined table of results that clearly indicates what results belong to what variable.
The problem is I only know do loops work in a datastep, how would I get this to work for proc steps? Could I write a macro for the proc step, then nest it within a datastep? Would this cause it to run appropriately? i.e.
data _NULL_;
set table_of_vars;
do i=1 to (number of vars in the table);
_n_ = i;
%let var = _n_;
%macro_proc_summ(&var.);
end;
and another code subsequently that merges the individual output, or perhaps the macro could even generate output that always appends information.
Obviously the code is very sketchy, but conceptually could this work?
EDIT: To give a bit more clarity, this is how the code would look like without a loop in place.
%macro Analysis(var); %macro _; %mend _;
proc summary data=masterdata nway missing;
class &var.;
output out = &var._summ (drop = _type_);
run;
%mend;
endrsubmit;
%Analysis(var1);
%Analysis(var2);
%Analysis(var3);
.
.
.
.
%Analysis(var100);
From here we could either:
Export var1_summ, var2_summ to Excel in cells A1, D1, etc.
Or first combine our individual summaries into a large table then export to
some graphing application to look at the trend.
Either way you can see how these are individual Proc steps, which could be done a lot quicker in a loop.
If you don't want an output table use Proc Tabulate.
proc tabulate data=sashelp.class out=summary1;
class sex age;
var weight;
table sex age, weight*(n mean min max median);
run;
data summary2;
set summary1;
Var=coalescec(sex, put(age, 2.));
drop age sex _:;
run;
EDIT: Rather than proc tabulate, if you only want N, use PROC FREQ
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
Related
Very simple: i'm trying to convert many character variables into numeric. The following code gives the "syntax error, expecting on of the following: a name, -, :, ;" for the drop and rename line.
data ex; set ex;
array numeric{3} var1 var2 var3;
do i=1 to 8;
temp = input(strip(numeric(i)),10.);
drop numeric(i);
rename temp = numeric(i);
end;
run;
can you not use drop or rename statements in do loops??
The dataset structure has to be decided when the data step is compiled. So there is no way you could use an array reference in a rename statement.
If you really have simple numerically suffixed variable names then you could use a simple RENAME statement.
rename new1-new3=var1-var3;
So your program might be as simple as this:
data want;
set have;
array ch var1-var3;
array new new1-new3;
do index=1 to dim(ch);
new[index]=input(left(ch[index]),32.);
end;
drop index var1-var3;
rename new1-new3=var1-var3;
run;
If the list of names is more complex, like AGE HEIGHT WEIGHT for example, then you will need to use a more complex RENAME statement like:
rename new1=AGE new2=HEIGHT new3=WEIGHT ;
So use some type of code generation method. Like macro code or using a data step to write lines of code to a file that can be included into the program using %include statement.
For example you could make a macro like this:
%macro rename(varlist);
%local i;
rename
%do i=1 %to %sysfunc(countw(&varlist));
new&i=%scan(&varlist,&i)
%end;
;
%mend ;
And use it like this:
%let charvars=AGE HEIGHT WEIGHT;
data want;
set have;
array ch &charvars;
array new [%sysfunc(countw(&charvars))];
do index=1 to dim(ch);
new[index]=input(left(ch[index]),32.);
end;
drop index &charvars;
%rename(&charvars);
run;
You're doing a couple of things incorrectly for SAS.
Don't use the same data set name in the DATA/SET statements. It's bad practice and makes it much harder to debug your code.
You cannot change the type to the same variable name in the same data step. Often you can rename ahead of time to make this slightly easier.
Ideally, especially if the file was read from a text file you fix these issues at the data import stage, not after the fact.
I don't know if the DROP/RENAME statements will take the array variable appropriately, that's something that would need to be tested.
data ex1;
set ex;
*original character variables;
array _chars(3) var1-var3;
*new numeric variables;
array _nums{3} new_var1-new_var3;
do i=1 to 3; *should match size of array;
_nums(i) = input(strip(_chars(i)), 10.);
end;
drop var1-var3;
*not sure if this will work in the same step;
rename new_var1-new_var3 = var1-var3;
run;
So I have a dataset with one primary key: unique_id and 1200 variables. This dataset is generated from a macro so the number of columns will not be fixed. I need to split this dataset into 4 or more datasets of 250 variables each, and each of these smaller datasets should contain the primary key so that I can merge them back later. Can somebody help me with either a sas function or a macro to solve this?
Thanks in advance.
A simple way to split a datasets in the way you request is to use a single data step with multiple output datasets where each one has a KEEP= dataset option listing the variables to keep. For example:
data split1(keep=Name Age Height) split2(keep=Name Sex Weight);
set sashelp.class;
run;
So you need to get the list of variables and group then into sets of 250 or less. Then you can use those groupings to generate code like above. Here is one method using PROC CONTENTS to get the list of variables and CALL EXECUTE() to generate the code.
I will use macro variables to hold the name of the input dataset, the key variable that needs to be kept on each dataset and maximum number of variables to keep in each dataset.
So for the example above those macro variable values would be:
%let ds=sashelp.class;
%let key=name;
%let nvars=2;
So use PROC CONTENTS to get the list of variable names:
proc contents data=&ds noprint out=contents; run;
Now run a data step to split them into groups and generate a member name to use for the new split dataset. Make sure not to include the KEY variable in the list of variables when counting.
data groups;
length group 8 memname $41 varnum 8 name $32 ;
group +1;
memname=cats('split',group);
do varnum=1 to &nvars while (not eof);
set contents(keep=name where=(upcase(name) ne %upcase("&key"))) end=eof;
output;
end;
run;
Now you can use that dataset to drive the generation of the code:
data _null_;
set groups end=eof;
by group;
if _n_=1 then call execute('data ');
if first.group then call execute(cats(memname,'(keep=&key'));
call execute(' '||trim(name));
if last.group then call execute(') ');
if eof then call execute(';set &ds;run;');
run;
Here are results from the SAS log:
NOTE: CALL EXECUTE generated line.
1 + data
2 + split1(keep=name
3 + Age
4 + Height
5 + )
6 + split2(keep=name
7 + Sex
8 + Weight
9 + )
10 + ;set sashelp.class;run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SPLIT1 has 19 observations and 3 variables.
NOTE: The data set WORK.SPLIT2 has 19 observations and 3 variables.
Just another way of doing it using macro variables:
/* Number of columns you want in each chunk */
%let vars_per_part = 250;
/* Get all the column names into a dataset */
proc contents data = have out=cols noprint;
run;
%macro split(part);
/* Split the columns into 250 chunks for each part and put it into a macro variable */
%let fobs = %eval((&part - 1)* &vars_per_part + 1);
%let obs = %eval(&part * &vars_per_part);
proc sql noprint;
select name into :cols separated by " " from cols (firstobs = &fobs obs = &obs) where name ~= "uniq_id";
quit;
/* Chunk up the data only keeping those varaibles and the uniq_id */
data want_part∂
set have (keep = &cols uniq_id);
run;
%mend;
/* Run this from 1 to whatever the increment required to cover all the columnns */
%split(1);
%split(2);
%split(3);
this is not a complete solution but some help to give you another insight into how to solve this. The previous solutions have relied much on proc contents and data step, but I would solve this using proc sql and dictionary.columns. And I would create a macro that would split the original file into as many parts as needed, 250 cols each. The steps roughly:
proc sql; create table as _colstemp as select * from dictionary.columns where library='your library' and memname = 'your table' and name ne 'your primary key'; quit;
Count the number of files needed somewhere along:
proc sql;
select ceil(count(*)/249) into :num_of_datasets from _colstemp;
select count(*) into :num_of_cols from _colstemp;
quit;
Then just loop over the original dataset like:
%do &_i = 1 %to &num_of_datasets
proc sql;
select name into :vars separated by ','
from _colstemp(firstobs=%eval((&_i. - 1)*249 + 1) obs = %eval(min(249,&num_of_cols. - &_i. * 249)) ;
quit;
proc sql;
create table split_&_i. as
select YOUR_PRIMARY_KEY, &vars from YOUR_ORIGINAL_TABLE;
quit;
%end;
Hopefully this gives you another idea. The solution is not tested, and may contain some pseudocode elements as it's written from my memory of doing things. Also this is void of macro declaration and much of parametrization one could do.. This would make the solution more general (parametrize your number of variables for each dataset, your primary key name, and your dataset names for example.
I am new here. I am trying to read in a data set multiple times. so for example, assume that I have 3 observations in a data set (called tempfile) for a variable called temp. the three observations are 4,6, and 5.. so I want to read in the set x number of times so the 4th observation would be 4, fifth would be 6 and sixth, would be 5. the 7th would be 4, etc etc. I have tried this literally a few dozen ways, by doing something like
data new;
do i=1 to 100;
set tempfile;
end;
output;
run;
I have tried this by moving the do statement, moving the output statement, omitting the output statement..... every which way, trying macros also. can somebody help? thanks John
followup....
Hello:
Thanks for response. That did work. I would like to now do several things involving some “if then” statements inside the loop (more than just reading in the data set).
I want to read in a data set n number of times, and each time, there will be two if then statements
So, assume I read in 3 numbers any number of times; 7, 15, and 12
As each number is read, it will ask if it is less than 10. And each time it will create a random number.
If less than 10, then
If rand(uniform) < .4 then 1 is added to counter1, else 1 is added to counter2
And if >= 10,
Then
If rand(uniform) < .2 then 1 is added to counter1, else 1 is added to counter2
Any help is much appreciated.
Thanks
John
The way that most data steps actually stop is when SAS reads past the end of the input. So you need a method that prevents SAS from doing that.
The easiest way to replicate the data is to just execute multiple output statements. So the first record is repeated three times, then the second record is repeated three times, etc.
data want;
set tempfile ;
do i=1 to 3;
output;
end;
run;
Another method is to just list the dataset multiple times on the SET statement. So to read it in 3 times just use
data want;
set tempfile tempfile tempfile;
run;
You could probably use macro logic or even just a macro variable to make the number of repetitions variable.
data _null_; call symputx('list',repeat('tempfile ',3-1)); run;
data want; set &list; run;
Other method is to use the POINT= and NOBS= options on the SET statement so that SAS never reads past the end and you can jump back to the beginning. But since it never reads past the end of the input data you will need to manually tell it when to stop.
data want ;
do i=1 to 3;
do p=1 to nobs ;
set tempfile point=p nobs=nobs;
output;
end;
end;
stop;
run;
Or more in the spirit of your original post you might want to use the MOD() function to figure out which observation to read next.
data want;
if _n_ > 100 then stop;
p=1+mod(_n_-1,nobs);
set tempfile point=p nobs=nobs;
run;
If you have SAS/STAT software SURVEYSELECT.
data have;
do temp=4,6,5;
output;
end;
run;
proc surveyselect reps=10 rate=1 out=temp2 noprint;
run;
The data step is designed for serial processing. In this case, you need to "remember" previous observations. You can do it using only the data step, but for that use case, there are other solutions in the SAS environment that are simpler. The one I suggest is a macro that appends the original file n times:
%macro replicate( data=, out=, n=)/des='&out is &data repeated &n times.';
data &out;
set
%do i=1 %to &n;
&data
%end;
; /* This ; ends the data step `set` statement */
run;
%mend;
You could test your example with this helper:
%macro test;
data have; /* create the example data set */
temp = 4; output;
temp = 6; output;
temp = 5; output;
run;
%replicate( data=have, out=want, n=4 );
proc print; quit;
%mend;
Here is a portion of the SAS doc that adds lots of detail with many examples.
My source data contains 200,000+ observations, one of the many variables in the data set is "county." My goal is to write a macro that will take this one data set as an input, and split them into 58 different temporary data sets for each of the California counties.
First question is if it is possible to specify the 58 counties on the data statement using something like a global reference array defined beforehand.
Second question is, assuming the output data sets have been properly specified on the data statement, is it possible to use a do loop to choose the right data set to write to?
I can get the comparison to work properly, but cannot seem to use a array reference to specify a output data set. This is most likely because I need more experience with the macro environment!
Please see below for the simplistic skeleton framework I have written so far. c_long array contains the names of each of the counties, c_short array contains a 3 letter abbreviation for each of the counties. Thanks in advance!
data splitraw;
length county_name $15;
infile "&path/random.csv" dsd firstobs=2;
input county_name $ number;
run;
%macro _58countysplit(dxtosplit,countycol);
data <need to specify 58 data sets here named something like &dxtosplit_ALA, &dxtosplit_ALP, etc..>;
set &dxtosplit;
do i=1 to 58;
if c_long{i}=&countycol then output &dxtosplit._&c_short{i};
end;
run;
%mend _58countysplit;
%_58countysplit(splitraw,county_name);
The code you provided will need to run through the large dataset 58 times, each time writing a small one. I have done it a bit different.
First I create a sample dataset with a variable "county" this will contain ten different values:
data large;
attrib county length=$12;
do i=1 to 10000;
county=put(mod(i,10)+1,ROMAN.);
output;
end;
run;
First, I start with finding all the unique values and constructing the names of all the different tables I would like to create:
proc sql noprint;
select distinct compbl("large_"!!county) into :counties separated by " "
from large;
quit;
Now I have a macrovariable "counties" that containes all the different datasets I want to create.
Here I am writing the IF-statements to a file:
filename x temp;
data _null_;
attrib county length=$12 ds length=$18;
file x;
i=1;
do while(scan("&counties",i," ") ne "");
ds=scan("&counties",i," ");
county=scan(ds,-1,"_");
put "if county=""" county +(-1) """ then output " ds ";";
i+1;
end;
run;
Now I have what I need to create the small datasets:
data &counties;
set large;
%inc x;
run;
I agree with user667489, there is almost always a better way then splitting one large data set into many small data sets. However, if you want to proceed along these lines there is a table in sashelp called vcolumn which lists all your libraries, their tables, and each column (in each table) that should help you. Also if you want
if c_long{i}=&countycol then output &dxtosplit._&c_short{i};
to resolve you might mean:
if c_long{i}=&countycol then output &&dxtosplit._&c_short{i};
It's likely, depending upon what you're actually trying to do, that BY processing is all you need. Nevertheless, here is a simple solution:
%macro split_by(data=, splitvar=);
%local dslist iflist;
proc sql noprint;
select distinct cats("&splitvar._", &splitvar)
into :dslist separated by ' '
from &data;
select distinct
catt("if &splitvar='", &splitvar, "' then output &splitvar._", &splitvar, ";", '0A'x)
into :iflist separated by "else "
from &data;
quit;
data &dslist;
set &data;
&iflist
run;
%mend split_by;
Here is some test data to illustrate:
options mprint;
data test;
length county $1 val $1;
input county val;
infile cards;
datalines;
A 2
B 3
A 5
C 8
C 9
D 10
run;
%split_by(data=test, splitvar=county)
And you can view the log to see how the macro generates the DATA step you want:
MPRINT(SPLIT_BY): proc sql noprint;
MPRINT(SPLIT_BY): select distinct cats("county_", county) into :dslist separated by ' ' from test;
MPRINT(SPLIT_BY): select distinct catt("if county='", county, "' then output county_", county, ";", '0A'x) into :iflist separated
by "else " from test;
MPRINT(SPLIT_BY): quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
MPRINT(SPLIT_BY): data county_A county_B county_C county_D;
MPRINT(SPLIT_BY): set test;
MPRINT(SPLIT_BY): if county='A' then output county_A;
MPRINT(SPLIT_BY): else if county='B' then output county_B;
MPRINT(SPLIT_BY): else if county='C' then output county_C;
MPRINT(SPLIT_BY): else if county='D' then output county_D;
MPRINT(SPLIT_BY): run;
NOTE: There were 6 observations read from the data set WORK.TEST.
NOTE: The data set WORK.COUNTY_A has 2 observations and 2 variables.
NOTE: The data set WORK.COUNTY_B has 1 observations and 2 variables.
NOTE: The data set WORK.COUNTY_C has 2 observations and 2 variables.
NOTE: The data set WORK.COUNTY_D has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.05 seconds
User PomPazz post this answer for creating a list from input variables:
"You need to use a macro to "write" the SAS code for you.
This should do what you are looking for. It takes a space delimited list of values, and loops over them doing what your code specifies. Post a comment if you have a question on it.
%macro doit(list);
proc sql noprint;
%let n=%sysfunc(countw(&list));
%do i=1 %to &n;
%let val = %scan(&list,&i);
create table somlib._&val as
select * from somlib.somtable
where item=&val;
%end;
quit;
%mend;
%doit(100 101 102);
Note, data sets cannot start with a number so I have these starting with '_'"
My questions is, how can this be applied to creating a list from a variable in a dataset that can be used in an IF statement such as "IF Telephone in(List) then Invalid=1"
This is needed to validate a list of telephone numbers from a predetermined list of invalid numbers.
The basic answer to your question is that you need to pull it into a macro variable or an include file.
proc sql;
select distinct telephone into :tellist separated by ','
from invalid_phones;
quit;
data want;
set have;
if telephone in (&tellist.) then invalid=1;
run;
This is limited to about 32k characters, so if you have more than ~3000 phone numbers it may not work. If telephone is character you need to select distinct quote(telephone) instead.
The more detailed answer is that this, usually, is inefficient. Better is to use a format.
data for_fmt;
set invalid_phones;
start=telephone;
label='INVALID';
fmtname='TELCHECKF'; *add $ if telephone is a character field;
output;
if _n_=1 then do; *this block adds a line to deal with non-matching records (hlo=o means other);
start=.;
label='VALID';
hlo='o';
output;
end;
run;
proc format cntlin=for_fmt;
quit;
data want;
set have;
if put(telephone,TELCHECKF.)='INVALID' then invalidflag=1;
run;
This can be faster than the list method, and doesn't have the length issues.
Joe's answer is great, but I just wanted to add that you can do this in one SQL step
proc sql noprint;
create table want as
select *, case
when telephone in (select distinct telephone from invalid_phones) then 1
else 0
end as invalid
from have
;
quit;