How do I make an inner loop in SAS? - loops

I am creating multiple datasets named "Taxes&i.(&i notes each new dataset according to the counter I. ) I then append all of the tables at the end with the table " want". When I finish going through the first macro loop i would like to go through another loop that changes the dates. So the line
"DATE BETWEEN '14Feb2016:0:0:0'dt AND '16Feb2016:0:0:0'dt);
would look like
date between 'date1' and 'date2';
I don't know how to create that loop though, so it goes back into the first loop. Then that loop finishes and it goes into the second loop changes the dates and back into the first loop finishes...into the second loop.
Also there may be a way to make this less bulky and some how when the first loop is done executing maybe the dates can automatically increase by one day without them being declared. That will work also. I am just not sure which is best and possible.
%macro loop(list1, list2);
%let n=%sysfunc(countw(&list1, %str('')));
%do i=1 %to &n;
%let O_list1 = %scan(&list1, &i, %str('');
%let O_list2 = %scan(&list2, &i, %str('');
/* another macro here called date_loop(date1, date2); */
proc sql;
create table taxes&i;
select t1.tax_info
FROM work.taxes&1 as t1
WHERE (t1.O_LIST1 = &O_List2) AND
(DATE BETWEEN '14Feb2016:0:0:0'dt AND '16Feb2016:0:0:0'dt);
%end;
%mend;
run;
%list('1' '2', '3' '4') /*( this is "O_List1", "O_List2") */
data want;
set abc.taxes: ;
run;
Thanks for help!

Related

SAS: How to loop a macro over rows of data to change to missing

Can anyone help with this issue I'm having where the macro is only taking the final row value of the data?
I have some data that looks like this:
data data1 ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|4|5
11|2000|5|5|6
12|2001|1|2|3
13|2002|5|5|6
14|2001|2|2|2
15|2000|3|3|4
;;;;
However I need it to use the year variable to determine which data to keep, and then change all the values for the years after that value to missing, like so:
data data1 ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|.|.
11|2000|5|.|.
12|2001|1|2|.
13|2002|5|5|6
14|2001|2|2|.
15|2000|3|.|.
;;;;
I've been trying to get this macro to work, but it only works intermittently and works just for the final row of the data rather than looping through the rows.
%macro macro2 (output=, input=);
data &output;
set &input;
%DO I = 1 %TO 6;
%do; call symput('value2',trim(left(put(year_age_15,8.))));
temp_col=&value2.;
%let year_end=&value2.;
%put YEAR END IS: &year_end.;
%put EDU YEAR IS: EDU_&year_end.;
%do year = &year_end. %TO 2002;
%put &year.;
EDU_&year.=.;
%end;
%end;
%end;
run;
%MEND macro2;
%macro1(input=testset, output=output_testset);
In R it could be something simple like :
for(i in 1:6){.
do this
}
Any advice? I can't figure out which bit is going wrong, thanks!
So, I think the issue here is your data is at the wrong level. You certainly can do what Reeza suggests, and I think it's probably reasonable to do so, but the reason why this is a bit complicated is that you have data in your variable name. That's not a best practice - your variable name should be "education" and your data should have a row for each year. Then this would be a simple WHERE statement!
Here's a simple PROC TRANSPOSE that turns it to the right structure, and then if you really need it the other way, a second one will turn it back. The where statement can be in the proc transpose or could be used somewhere else.
proc transpose data=data1 out=data_t (where=(year_Age_15 ge input(scan(_NAME_,2,'_'),4.)));
by id year_Age_15;
var edu_:;
run;
proc transpose data=data_t out=want;
by id year_age_15;
id _name_;
var col1;
run;
Create an array and index it by years rather than default 1:n
Loop through your array starting at year+1 and set to missing
data want;
set data1;
array educ(2000:2002) edu_2000-edu_2002;
if (year_age_15 +1) <= hbound(educ) then do i= (year_age_15 +1) to hbound(educ);
call missing(educ(i));
end;
run;
As #Joe mentions, the year to match is part of a variable name, which is tremor inducing 'data in the metadata'
You can use the VNAME to retrieve the variable name of an index accessed array element. Use that feature to compare to expected variable name whilst looping over a variable array based on variables named EDU*.
Example:
data have ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|4|5
11|2000|5|5|6
12|2001|1|2|3
13|2002|5|5|6
14|2001|2|2|2
15|2000|3|3|4
;;;;
data want;
set have;
array edus edu_:;
* find index of element corresponding to variable name having year;
do _n_ = 1 to dim(edus) until (upcase(vname(edus(_n_))) = cats('EDU_',year_age_15));
end;
* fill in elements at indices post the found one with missing values;
do _n_ = _n_+1 to dim(edus);
call missing(edus(_n_));
end;
run;

SAS Looping through macro variable and processing the data

I have a bunch of character variables which I need to sort out from a large dataset. The unwanted variables all have entries that are the same or are all missing (meaning I want to drop these from the dataset before processing the data further). The data sets are very large so this cannot be done manually, and I will be doing it a lot of times so I am trying to create a macro which will do just this. I have created a list macro variable with all character variables using the following code (The data for my part is different but I use the same sort of code):
data test;
input Obs ID Age;
datalines;
1 2 3
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
run;
proc contents
data = test
noprint
out = test_info(keep=name);
run;
proc sql noprint;
select name into : testvarlist separated by ' ' from test_info;
quit;
My idea is then to just use a data step to drop this list of variables from the original dataset. Now, the problem is that I need to loop over each variable, and determine if the observations for that variable are all the same or not. My idea is to create a macro that loops over all variables, and for each variable counts the occurrences of the entries. Since the length of this table is equal to the number of unique entries I know that the variable should be dropped if the table is of length 1. My attempt so far is the following code:
%macro ListScanner (org_list);
%local i next_name name_list;
%let name_list = &org_list;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
%put &next_name;
proc sql;
create table char_occurrences as
select &next_name, count(*) as numberofoccurrences
from &name_list group by &next_name;
select count(*) as countrec from char_occurrences;
quit;
%if countrec = 1 %then %do;
proc sql;
delete &next_name from &org_list;
quit;
%end;
%let i = %eval(&i + 1);
%end;
%mend;
%ListScanner(org_list = &testvarlist);
Though I get syntax errors, and with my real data I get other kinds of problems with not being able to read the data correctly but I am taking one step at a time. I am thinking that I might overcomplicate things so if anyone has an easier solution or can see what might be wrong to I would be very grateful.
There are many ways to do this posted around.
But let's just look at the issues you are having.
First for looping through your space delimited list of names it is easier to let the %do loop increment the index variable for you. Use the countw() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
...
%end;
Second where is your input dataset in your SQL code? Add another parameter to your macro definition. Where to you want to write the dataset without the empty columns? So perhaps another parameter.
%macro ListScanner (dsname , out, name_list);
%local i next_name sep drop_list ;
Third you can use a single query to count all of variables at once. Just use count( distinct xxxx ) instead of group by.
proc sql noprint;
create table counts as
select
%let sep=;
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
&sep. count(distinct &next_name) as &next_name
%let sep=,;
%end;
from &dsname
;
quit;
So this will get a dataset with one observation. You can use PROC TRANSPOSE to turn it into one observation per variable instead.
proc transpose data=counts out=counts_tall ;
var _all_;
run;
Now you can just query that table to find the names of the columns with 0 non-missing values.
proc sql noprint ;
select _name_ into :drop_list separated by ' '
from counts_tall
where col1=0
;
quit;
Now you can use the new DROP_LIST macro variable.
data &out ;
set &dsname ;
drop &drop_list;
run;
So now all that is left is to clean up after your self.
proc delete data=counts counts_tall ;
run;
%mend;
As far as your specific initial question, this is fairly straightforward. Assuming &testvarlist is your macro variable containing the variables you are interested in, and creating some test data in have:
%let testvarlist=x y z;
data have;
call streaminit(7);
do id = 1 to 1e6;
x = floor(rand('Uniform')*10);
y = floor(rand('Uniform')*10);
z = floor(rand('Uniform')*10);
if x=0 and y=4 and z=7 then call missing(of x y z);
output;
end;
run;
data want fordel;
set have;
if min(of &testvarlist.) = max(of &testvarlist.)
and (cmiss(of &testvarlist.)=0 or missing(min(of &testvarlist.)))
then output fordel;
else output want;
run;
This isn't particularly inefficient, but there are certainly better ways to do this, as referenced in comments.

reading a data set multiple times in SAS

I am new here. I am trying to read in a data set multiple times. so for example, assume that I have 3 observations in a data set (called tempfile) for a variable called temp. the three observations are 4,6, and 5.. so I want to read in the set x number of times so the 4th observation would be 4, fifth would be 6 and sixth, would be 5. the 7th would be 4, etc etc. I have tried this literally a few dozen ways, by doing something like
data new;
do i=1 to 100;
set tempfile;
end;
output;
run;
I have tried this by moving the do statement, moving the output statement, omitting the output statement..... every which way, trying macros also. can somebody help? thanks John
followup....
Hello:
Thanks for response. That did work. I would like to now do several things involving some “if then” statements inside the loop (more than just reading in the data set).
I want to read in a data set n number of times, and each time, there will be two if then statements
So, assume I read in 3 numbers any number of times; 7, 15, and 12
As each number is read, it will ask if it is less than 10. And each time it will create a random number.
If less than 10, then
If rand(uniform) < .4 then 1 is added to counter1, else 1 is added to counter2
And if >= 10,
Then
If rand(uniform) < .2 then 1 is added to counter1, else 1 is added to counter2
Any help is much appreciated.
Thanks
John
The way that most data steps actually stop is when SAS reads past the end of the input. So you need a method that prevents SAS from doing that.
The easiest way to replicate the data is to just execute multiple output statements. So the first record is repeated three times, then the second record is repeated three times, etc.
data want;
set tempfile ;
do i=1 to 3;
output;
end;
run;
Another method is to just list the dataset multiple times on the SET statement. So to read it in 3 times just use
data want;
set tempfile tempfile tempfile;
run;
You could probably use macro logic or even just a macro variable to make the number of repetitions variable.
data _null_; call symputx('list',repeat('tempfile ',3-1)); run;
data want; set &list; run;
Other method is to use the POINT= and NOBS= options on the SET statement so that SAS never reads past the end and you can jump back to the beginning. But since it never reads past the end of the input data you will need to manually tell it when to stop.
data want ;
do i=1 to 3;
do p=1 to nobs ;
set tempfile point=p nobs=nobs;
output;
end;
end;
stop;
run;
Or more in the spirit of your original post you might want to use the MOD() function to figure out which observation to read next.
data want;
if _n_ > 100 then stop;
p=1+mod(_n_-1,nobs);
set tempfile point=p nobs=nobs;
run;
If you have SAS/STAT software SURVEYSELECT.
data have;
do temp=4,6,5;
output;
end;
run;
proc surveyselect reps=10 rate=1 out=temp2 noprint;
run;
The data step is designed for serial processing. In this case, you need to "remember" previous observations. You can do it using only the data step, but for that use case, there are other solutions in the SAS environment that are simpler. The one I suggest is a macro that appends the original file n times:
%macro replicate( data=, out=, n=)/des='&out is &data repeated &n times.';
data &out;
set
%do i=1 %to &n;
&data
%end;
; /* This ; ends the data step `set` statement */
run;
%mend;
You could test your example with this helper:
%macro test;
data have; /* create the example data set */
temp = 4; output;
temp = 6; output;
temp = 5; output;
run;
%replicate( data=have, out=want, n=4 );
proc print; quit;
%mend;
Here is a portion of the SAS doc that adds lots of detail with many examples.

How to resolve macro variable in a loop in SAS

I am trying to figure out how to call a macro variable in a loop within a data step in SAS, but I am lost; so I have 14 macro variables and I have to compare each of them to the entries of a vector. I tried:
data work.calendrier;
set projet.calendrier;
do i=1 to 3;
if date= "&vv&i"D then savinglight = 1;
end;
run;
But it is not working. The variable vv1 up to vv3 are date variables. For instance this code works:
data work.calendrier;
set projet.calendrier;
*do i=1 to 3;
if date= "&vv1"D then savinglight = 1;
*end;
run;
But with the loop it can not resolve the macro variable.
If you want to reference a macro variable with a number index like vv1,vv2,vv3 you need to resolve &i first.
SAS has a separate macro processor that resolves values before they reach the data step processor.
Essentially, you need to add extra ampersands at the beginning of your macro variable:
&&vv&i -> &vv1 -> "Value of vv1"
&&vv&i -> &vv2 -> "Value of vv2"
&&vv&i -> &vv3 -> "Value of vv3"
What happens here is that SAS reads in the information after the ampersand until it finds a break. SAS then resolves && as a single &, it then continues reading across until it resolves &i as a numeric value. You're then left with your required &vvi variable.
A couple of sources about this interesting topic:
http://www2.sas.com/proceedings/sugi29/063-29.pdf
http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf
Macro variable references are resolved before SAS compiles and runs your data step. You need to first figure out how to do what you want using SAS statements then, if necessary, you can use macro code to help you generate those statements.
If you want to test if a variable's value matches one of a list of values then consider using the IN operator.
data work.calendrier;
set projet.calendrier;
savinglight = date in ("&vv1"d,"&vv2"d,"&vv3"d);
run;
you need to use a macro. Here's the basic approach:
%let vv1 = 9;
%let vv2 = 2;
%let vv3 = 10;
data have;
drop i;
do i = 1 to 5;
date = i;
output;
end;
run;
%macro test;
data test;
set have;
%do i=1 %to 3;
if date= &&vv&i then savinglight = 1;
%end;
run;
%mend test;
%test;

SAS - How to modify several datasets using a macro?

I'm trying to modify a number of datasets (their names follow a certain order, like data_AXPM061203900_20120104 , data_AXPM061203900_20120105, data_AXPA061204100_20120103, data_AXPA061204100_20120104) under work library. For example, I want to delete variable "price=0" in all datasets.
I am using the following to create a table to identify the datasets:
proc sql ;
create table data.mytables as
select *
from dictionary.tables
where libname = 'WORK'
order by memname ;
quit ;
For the next step, I'm trying to use a macro:
%macro test;
proc sql ;
select count(memname) into: obs from data.mytables;
%let obs=&obs.;
select catx("_", "data", substr(memname, 6, 13), substr(memname,20,27))
into :setname1-:setname&obs.
from data.mytables;
quit;
%do i=1 %to &obs.;
data &&setname&i
set &&setname&i
if bid_price= '.' then delete;
%end;
%mend test;
However, it completely failed. Could anyone give me some suggestions? I'm really not good at macros. Errors include:
56: LINE and COLUMN cannot be determined.
ERROR 56-185: SET is not allowed in the DATA statement when option DATASTMTCHK=COREKEYWORDS. Check for a missing semicolon in
the DATA statement, or use DATASTMTCHK=NONE.
You are missing semicolons on the DATA statement and SET statement, and should probably add a RUN statement. Suggest you try:
%do i=1 %to &obs.;
data &&setname&i ;
set &&setname&i ;
if bid_price= '.' then delete;
run;
%end;
Note that the DELETE statement is deleting records, not deleting variables. The code above expects that bid_price is a character variable, and that you want to delete records when the value is '.'.

Resources