Macro loop in SAS - passing value to condition - database

I'm a beginner in SAS and I am struggling a bit with the macro loop in SAS. The problem is illustrated by the code below. The task here is to create separate subsets and save them as libraries for later post-processing. Additionally I added graphs for visualization. I am operating on a huge database but for this post I create a sample at the beginning of the code for simplification.
However, it seems that the internal condition (IF ID = i ) doesn't filter out the data. Instead the internal loop creates empty tables (but with correct names: "SUB1", "SUB2", "SUB3") with a column (variabale) called "i".
DATA EXAMPLE;
INPUT ID DATE DDMMYY8. VALUE;
FORMAT DATE DDMMYY8.;
DATALINES;
1 01012011 100
1 01022011 400
1 01032011 678
2 01012011 678
2 01022011 333
2 01032011 333
3 01012011 733
3 01022011 899
3 01032011 999
;
%MACRO filter(number);
%DO i=1 %TO &number;
DATA SUB&i;
SET WORK.EXAMPLE;
IF ID = i;
PROC SGPLOT DATA=SUB&i;
reg x=DATE y=VALUE;
RUN;
%END;
%mend filter;
%filter(3);
If I manually copy and paste the part inside macro and manually change i to numbers 1 to 3 it creates correct graphs. What is wrong in this code? How can I pass the value from the DO statement inside the code?
I am using SAS Studio.

The macro is creating empty data sets because the code that the macro eventually writes contains the subsetting if statement
if ID = i;
Because the data set does not contain a variable i a new variable named i is added to the PDV and the output data sets SUB1, SUB2, SUB3. The default value for i is missing and no ID value is missing, thus no rows pass the test and you get empty data sets. The log will also provide clues to the situation:
NOTE: Variable i is uninitialized.
When abstracting a code segment for 'macroization' be sure to use & in front of the macro variables. Thus, when the macro contains
if ID = &i;
The eventual code written by the macro system will have your 3 similar code operations with the different values of the macro variable.
...
if ID = 1;
...
...
if ID = 2;
...
...
if ID = 3;
...

Right now you are producing the same graph three times because the datasets SUB1, SUB2, SUB3 all use the same set of data. That is because the only thing in your data step that depends on the value of the macro variable I is the name.
You are currently selecting the observations where the variable ID matches the variable I. Perhaps you meant to select the observations where the variable ID matches the macro variable used in the %DO loop?
IF ID = &i;

One tip for debugging your macro code is to add the statement
options mprint;
This will show the code that SAS is actually using.
For example in the log:
70 options mprint;
71 %MACRO filter(number);
72 %DO i=1 %TO &number;
73 DATA SUB&i;
74 SET WORK.EXAMPLE;
75 IF ID = &i;
76 PROC SGPLOT DATA=SUB&i;
77 reg x=DATE y=VALUE;
78 RUN;
79 %END;
80 %mend filter;
81
82 %filter(2);
MPRINT(FILTER): DATA SUB1;
MPRINT(FILTER): SET WORK.EXAMPLE;
MPRINT(FILTER): IF ID = 1;
NOTE: There were 9 observations read from the data set WORK.EXAMPLE.
NOTE: The data set WORK.SUB1 has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
MPRINT(FILTER): PROC SGPLOT DATA=SUB1;
MPRINT(FILTER): reg x=DATE y=VALUE;
MPRINT(FILTER): RUN;

Related

call execute inside a loop pulling from one table to execute a macro

i have bellow currently
%macro sqlloop (event_id);
...lots of code, mostly proc sql segments ...
%mend;
that generates an output table (named export_table2). I need to be able to run this code dozens of time for every value in another table (named vars). my trial code testing what I want it to do is below (basically manually typing in the first two values of this 68 row table)
data ;
%let empl_nbr_var = '222';
%let fleet = '7ER';
%let position = 'A';
%let base = 'BWI';
%sqlloop(event_id = 1);
run;
data summary_pilots;
set work.export_table2;
run;
data;
%let empl_nbr_var = '111';
%let fleet = '320';
%let position = 'B';
%let base = 'CHS';
%sqlloop(event_id = 2);
run;
data summary_pilots;
set summary_pilots work.export_table2;
run;
This produces the final output of each execution stacked into one table called summary_pilots. How can I do this in a loop, prehaps using call execute to iterate through each row of vars? The columns of vars are exactly what I need for the macro variables, and I want to iterate through every single row to assign those macro variable and run my %sqlloop again. Thanks for the help!
EDIT:
currently figuring out how call execute works and see how its helpful here but still a bit stuck... code below works exactly as youd think, printing out all the variables in the table vars into the log.
data ;
set work.vars;
call execute( '%put='|| strip(empl_nbr_var) || ';
%put = ' || strip(fleet) ||';
%put = '|| strip(position) ||';
%put = ' || strip(base) ||';' );
run;
I am trying to use the below code, but am getting a crazy amount of errors due to the macros being assigned weirdly. The types in the columns of vars match exactly what I want them to be in the macros, but it still looks like that might be the issue here?
data ;
set work.vars;
call execute( '
%let empl_nbr_var =' || strip(empl_nbr_var) || ';
%let fleet = ' || strip(fleet) ||';
%let position = '|| strip(position) ||';
%let base = ' || strip(base) ||';
%sqlloop(event_id = 17);' );
run;
and the event ID doesnt actually matter here so i just left that as a random number for now.
Assuming your work.Vars contain data like this:
empl_nbr_var
fleet
position
base
222
7ER
A
BWI
111
320
B
CHS
...
...
...
...
Consider extending your macro to receive such input parameters:
%macro sqlloop(event_id, empl_nbr_var, fleet, position, base);
...lots of code, mostly proc sql segments ...
%mend;
Then, build run macro with concatenated data values via call execute. Below passes 17 into event_id parameter.
data _null_;
set Work.Vars;
args = catx("', '", empl_nbr_var, fleet, position, base);
args = '%sqlloop(17,'''|| strip(args) || ''');';
put args $char.; /* VIEW CALL COMMAND */
call execute(args); /* RUN CALL COMMAND */
run;
It makes no sense to code %LET statements in the middle of a data step. The macro processor will evaluate them before it passes the text of the data step code to SAS to process. Avoid confusing yourself by moving the %LET statements before the data step.
If the macro needs values of macros variables, like FLEET, as input then make those things parameters to the macro. Don't create a macro that references "magic" macro variables, macro variables that are neither input parameters nor created by the macro. Instead the reference to them just appears in the middle of the macro definition as if their values will appear by magic somehow.
%macro sqlloop(empl_nbr_var,fleet,position,base);
... code that uses &fleet.
%mend;
If you have a lot of combinations of parameters you want run through your macro then collect them into a dataset first.
data inputs ;
input empl_nbr_var fleet $ position $ base $ ;
cards;
222 7ER A BWI
111 320 B CHS
;
Then you can use those dataset variables to generate the calls to the macro. You could try using call execute() to do this, but personally I find it a lot easier to use a data step to write the code to a file. Then you can examine the file and make sure the code generation logic is correct. Plus you can use the power of the PUT statement to make the code generation easier. For example if the variable names match the parameter names you can use named output.
filename code temp;
data _null_;
set inputs;
file code ;
put '%sqlloop(' empl_nbr_var= ',' fleet= ',' position= ',' base= ')';
run;
Which will generate code like:
%sqlloop(empl_nbr_var=222 ,fleet=7ER ,position=A ,base=BWI )
%sqlloop(empl_nbr_var=111 ,fleet=320 ,position=B ,base=CHS )
Once you are confident that it is generating the right code use the %INCLUDE command to run the code it generates.
%include code / source2;
If the macro does not have its own step for aggregating the results you could include that step in the code generation.
filename code temp;
data _null_;
set inputs;
file code ;
put '%sqlloop(' empl_nbr_var= ',' fleet= ',' position= ',' base= ')';
put 'proc append base=summary_pilots data=export_table force; run;' ;
run;
%include code / source2;

Split SAS datasets by column with primary key

So I have a dataset with one primary key: unique_id and 1200 variables. This dataset is generated from a macro so the number of columns will not be fixed. I need to split this dataset into 4 or more datasets of 250 variables each, and each of these smaller datasets should contain the primary key so that I can merge them back later. Can somebody help me with either a sas function or a macro to solve this?
Thanks in advance.
A simple way to split a datasets in the way you request is to use a single data step with multiple output datasets where each one has a KEEP= dataset option listing the variables to keep. For example:
data split1(keep=Name Age Height) split2(keep=Name Sex Weight);
set sashelp.class;
run;
So you need to get the list of variables and group then into sets of 250 or less. Then you can use those groupings to generate code like above. Here is one method using PROC CONTENTS to get the list of variables and CALL EXECUTE() to generate the code.
I will use macro variables to hold the name of the input dataset, the key variable that needs to be kept on each dataset and maximum number of variables to keep in each dataset.
So for the example above those macro variable values would be:
%let ds=sashelp.class;
%let key=name;
%let nvars=2;
So use PROC CONTENTS to get the list of variable names:
proc contents data=&ds noprint out=contents; run;
Now run a data step to split them into groups and generate a member name to use for the new split dataset. Make sure not to include the KEY variable in the list of variables when counting.
data groups;
length group 8 memname $41 varnum 8 name $32 ;
group +1;
memname=cats('split',group);
do varnum=1 to &nvars while (not eof);
set contents(keep=name where=(upcase(name) ne %upcase("&key"))) end=eof;
output;
end;
run;
Now you can use that dataset to drive the generation of the code:
data _null_;
set groups end=eof;
by group;
if _n_=1 then call execute('data ');
if first.group then call execute(cats(memname,'(keep=&key'));
call execute(' '||trim(name));
if last.group then call execute(') ');
if eof then call execute(';set &ds;run;');
run;
Here are results from the SAS log:
NOTE: CALL EXECUTE generated line.
1 + data
2 + split1(keep=name
3 + Age
4 + Height
5 + )
6 + split2(keep=name
7 + Sex
8 + Weight
9 + )
10 + ;set sashelp.class;run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SPLIT1 has 19 observations and 3 variables.
NOTE: The data set WORK.SPLIT2 has 19 observations and 3 variables.
Just another way of doing it using macro variables:
/* Number of columns you want in each chunk */
%let vars_per_part = 250;
/* Get all the column names into a dataset */
proc contents data = have out=cols noprint;
run;
%macro split(part);
/* Split the columns into 250 chunks for each part and put it into a macro variable */
%let fobs = %eval((&part - 1)* &vars_per_part + 1);
%let obs = %eval(&part * &vars_per_part);
proc sql noprint;
select name into :cols separated by " " from cols (firstobs = &fobs obs = &obs) where name ~= "uniq_id";
quit;
/* Chunk up the data only keeping those varaibles and the uniq_id */
data want_part∂
set have (keep = &cols uniq_id);
run;
%mend;
/* Run this from 1 to whatever the increment required to cover all the columnns */
%split(1);
%split(2);
%split(3);
this is not a complete solution but some help to give you another insight into how to solve this. The previous solutions have relied much on proc contents and data step, but I would solve this using proc sql and dictionary.columns. And I would create a macro that would split the original file into as many parts as needed, 250 cols each. The steps roughly:
proc sql; create table as _colstemp as select * from dictionary.columns where library='your library' and memname = 'your table' and name ne 'your primary key'; quit;
Count the number of files needed somewhere along:
proc sql;
select ceil(count(*)/249) into :num_of_datasets from _colstemp;
select count(*) into :num_of_cols from _colstemp;
quit;
Then just loop over the original dataset like:
%do &_i = 1 %to &num_of_datasets
proc sql;
select name into :vars separated by ','
from _colstemp(firstobs=%eval((&_i. - 1)*249 + 1) obs = %eval(min(249,&num_of_cols. - &_i. * 249)) ;
quit;
proc sql;
create table split_&_i. as
select YOUR_PRIMARY_KEY, &vars from YOUR_ORIGINAL_TABLE;
quit;
%end;
Hopefully this gives you another idea. The solution is not tested, and may contain some pseudocode elements as it's written from my memory of doing things. Also this is void of macro declaration and much of parametrization one could do.. This would make the solution more general (parametrize your number of variables for each dataset, your primary key name, and your dataset names for example.

SAS pass array in SAS macro

I try to do some calculations based on existing columns and add the results back to the datasets. Could anyone help?
Here is what I try to write in SAS:
%macro ColumnCal1(m,prefix);
data _null_;
attr_&prefix. = sum(of &m.1-&m.3);
call symput("attr_&prefix.",attr_&prefix.);
run;
%mend ColumnCal1;
data c2;
set c1;
array mth{12} m1-m12;
%ColumnCal1(m=mth, prefix=ttl);
attr_ttl =&attr_ttl.;
run;
If I understood cthe problem you have a dataset and you want to calculate sum of different columns using a macro. I have shared an example below. Try it out.
Suppose you have a dataset with array like this:
data have;
array mnth{*} m1-m4;
input mnth{*};
cards;
1 3 6 9
2 4 8 10
;
run;
Code:
To calculate sum of different columns macro columncal1 is created with parameters
1) Input: Input file which contains the variables
2) Start: First column from which sum needs to be calculated
3) End: Last column till which sum needs to be calculated
4) Prefix: Prefix of the computed column name
5) Output: Output file which gives the result
%macro ColumnCal1(input=,start=,end=,prefix=,output=);
data &output.;
set &input.;
attr_&prefix = sum(of &start.-&end.);
run;
%mend ColumnCal1;
%ColumnCal1(input=have,start=m1,end=m2,prefix=ttl,output=want1);
/* Dataset want1 having all the initial columns plus sum of m1 and m2 stored in a variable attr_ttl has been created from have dataset*/
%ColumnCal1(input=want1,start=m2,end=m3,prefix=ttl1,output=want2);
/* Dataset want2 having all the initial columns plus sum of m1 and m2 stored in a variable attr_ttl has been created from want1 dataset*/
My Output (want2) :
m1 |m2 |m3 |m4 |attr_ttl |attr_ttl1
1 |3 |6 |9 |4 |9
2 |4 |8 |10 |6 |12
If you have any different requirement please do let me know.
You cannot nest data steps. Once SAS sees the new data step starting it stops compiling the first one and runs it. Also how does your current macro know how to find the data since there is no SET statement in the data step? Also you cannot reference a macro variable that has not been created yet. So if you generate a macro variable using then CALL SYMPUTX() function you cannot reference its value to modify the code of the current data step since the data steps needs to already have been compiled before the call symputx() can execute.
Something like this could work.
%macro ColumnCal1(m,prefix);
attr_&prefix. = sum(of &m.1-&m.3);
call symput("attr_&prefix.",attr_&prefix.);
%mend ColumnCal1;
data c2;
set c1;
%ColumnCal1(m=mth, prefix=ttl);
run;

dynamically creating variables based on the condition, SAS

My problem is following:
I have a dataset with two columns
number_of_years payment
4 100
5 123
2 52
and I would like to create new variable (or set of variables and then to sum them) and add value based on the value in the column number_of_years.
New variable should get following value:
number_of_years payment new_variable
4 100 100*1.01**4 + 100*1.01**3 + 100*1.01**2 + 100*1.01**1
5 123 123*1.01**5 + 123*1.01**4 + 123*1.01**3 + 123*1.01**2 + 123*1.01*1
2 52 52*1.01**2 + 52*1.01**1
e.t.c.
My original idea was to put a value from the column number_of_years into macro variable, loop with its value creating additional columns and then sum it, but it does not work.
data uprava;
set work.data_diskontace;
%let value1=number_of_years;
%macro spocti(n);
%do i=1 %to &n;
new_variable&i = payment*1.01**&i;
%end;
%mend doit;
%spocti(value1);
run;
Thank you for any suggestion which way to go.
You should use regular loops instead of macro loops, because number of iteration is dynamic depends on number_of_years variable.
data uprava;
set work.data_diskontace;
new_variable = 0;
do i = 1 to number_of_years;
new_variable = new_variable + payment*1.01**i;
end;
run;
No need for macros this is a geometric series that converge to
Simplest solution is:
data have;
input years payment;
cards;
4 100
5 123
2 52
;
run;
data want;
set have;
new_variable = (1.01*(1-1.01**years)/(1-1.01))*payment;
run;

How can I "define" SAS data sets using macro variable and write to them using an array

My source data contains 200,000+ observations, one of the many variables in the data set is "county." My goal is to write a macro that will take this one data set as an input, and split them into 58 different temporary data sets for each of the California counties.
First question is if it is possible to specify the 58 counties on the data statement using something like a global reference array defined beforehand.
Second question is, assuming the output data sets have been properly specified on the data statement, is it possible to use a do loop to choose the right data set to write to?
I can get the comparison to work properly, but cannot seem to use a array reference to specify a output data set. This is most likely because I need more experience with the macro environment!
Please see below for the simplistic skeleton framework I have written so far. c_long array contains the names of each of the counties, c_short array contains a 3 letter abbreviation for each of the counties. Thanks in advance!
data splitraw;
length county_name $15;
infile "&path/random.csv" dsd firstobs=2;
input county_name $ number;
run;
%macro _58countysplit(dxtosplit,countycol);
data <need to specify 58 data sets here named something like &dxtosplit_ALA, &dxtosplit_ALP, etc..>;
set &dxtosplit;
do i=1 to 58;
if c_long{i}=&countycol then output &dxtosplit._&c_short{i};
end;
run;
%mend _58countysplit;
%_58countysplit(splitraw,county_name);
The code you provided will need to run through the large dataset 58 times, each time writing a small one. I have done it a bit different.
First I create a sample dataset with a variable "county" this will contain ten different values:
data large;
attrib county length=$12;
do i=1 to 10000;
county=put(mod(i,10)+1,ROMAN.);
output;
end;
run;
First, I start with finding all the unique values and constructing the names of all the different tables I would like to create:
proc sql noprint;
select distinct compbl("large_"!!county) into :counties separated by " "
from large;
quit;
Now I have a macrovariable "counties" that containes all the different datasets I want to create.
Here I am writing the IF-statements to a file:
filename x temp;
data _null_;
attrib county length=$12 ds length=$18;
file x;
i=1;
do while(scan("&counties",i," ") ne "");
ds=scan("&counties",i," ");
county=scan(ds,-1,"_");
put "if county=""" county +(-1) """ then output " ds ";";
i+1;
end;
run;
Now I have what I need to create the small datasets:
data &counties;
set large;
%inc x;
run;
I agree with user667489, there is almost always a better way then splitting one large data set into many small data sets. However, if you want to proceed along these lines there is a table in sashelp called vcolumn which lists all your libraries, their tables, and each column (in each table) that should help you. Also if you want
if c_long{i}=&countycol then output &dxtosplit._&c_short{i};
to resolve you might mean:
if c_long{i}=&countycol then output &&dxtosplit._&c_short{i};
It's likely, depending upon what you're actually trying to do, that BY processing is all you need. Nevertheless, here is a simple solution:
%macro split_by(data=, splitvar=);
%local dslist iflist;
proc sql noprint;
select distinct cats("&splitvar._", &splitvar)
into :dslist separated by ' '
from &data;
select distinct
catt("if &splitvar='", &splitvar, "' then output &splitvar._", &splitvar, ";", '0A'x)
into :iflist separated by "else "
from &data;
quit;
data &dslist;
set &data;
&iflist
run;
%mend split_by;
Here is some test data to illustrate:
options mprint;
data test;
length county $1 val $1;
input county val;
infile cards;
datalines;
A 2
B 3
A 5
C 8
C 9
D 10
run;
%split_by(data=test, splitvar=county)
And you can view the log to see how the macro generates the DATA step you want:
MPRINT(SPLIT_BY): proc sql noprint;
MPRINT(SPLIT_BY): select distinct cats("county_", county) into :dslist separated by ' ' from test;
MPRINT(SPLIT_BY): select distinct catt("if county='", county, "' then output county_", county, ";", '0A'x) into :iflist separated
by "else " from test;
MPRINT(SPLIT_BY): quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
MPRINT(SPLIT_BY): data county_A county_B county_C county_D;
MPRINT(SPLIT_BY): set test;
MPRINT(SPLIT_BY): if county='A' then output county_A;
MPRINT(SPLIT_BY): else if county='B' then output county_B;
MPRINT(SPLIT_BY): else if county='C' then output county_C;
MPRINT(SPLIT_BY): else if county='D' then output county_D;
MPRINT(SPLIT_BY): run;
NOTE: There were 6 observations read from the data set WORK.TEST.
NOTE: The data set WORK.COUNTY_A has 2 observations and 2 variables.
NOTE: The data set WORK.COUNTY_B has 1 observations and 2 variables.
NOTE: The data set WORK.COUNTY_C has 2 observations and 2 variables.
NOTE: The data set WORK.COUNTY_D has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.05 seconds

Resources