SAS: Dynamically copy a certain number of rows - loops

I have a data set that needs to be blown out a certain number of rows according to a dynamic value. Take the dataset below for example:
DATA HAVE;
LENGTH ID $3 COUNT 3;
INPUT ID $ COUNT;
DATALINES;
A 4
B 3
C 1
D 2
;
RUN;
ID=A needs to be blown out 4 rows, ID=B needs to be blown out 3 rows, etc. The resulting dataset would look as such (minus a bunch of other variables I have):
A 1
A 2
A 3
A 4
B 1
B 2
B 3
C 1
D 1
D 2
The following code works to an extent, but I'm having trouble setting the &COUNT. macro variable dynamically. I tried to insert a CALL SYMPUTX("COUNT",COUNT) statement so that, as it loops over each row, the count is placed into the macro variable and the row is blown out that number of times.
** THIS CODE ONLY WORKS IF YOU SET COUNT= TO SOME VALUE **;
%MACRO LOOPOVER();
DATA WANT; SET HAVE;
DO UNTIL(LAST.ID);
BY ID;
%DO I=1 %TO &COUNT.;
COUNT = &I.; OUTPUT;
%END;
END;
RUN;
%MEND;
%LOOPOVER;
** THIS CODE DOESN'T WORK BUT I'M NOT SURE WHY?? **;
%MACRO LOOPOVER();
DATA WANT; SET HAVE;
DO UNTIL(LAST.ID);
BY ID;
CALL SYMPUTX("COUNT",COUNT); /* NEW LINE HERE */
%DO I=1 %TO &COUNT.;
COUNT = &I.; OUTPUT;
%END;
END;
RUN;
%MEND;
%LOOPOVER;

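It is unnecessary to use a macro. Your second version cannot work because macro statements such as %DO are resolved before the DATA step is compiled and run, so &COUNT. is resolved before CALL SYMPUTX ever executes. A plain DATA step DO loop can use each row's COUNT value directly: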
data want(rename=(_count=count));
set have;
do i=1 to count;
_count=i;
output;
end;
drop count i;   /* drop the original COUNT and the loop index I */
run;
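If you would rather keep COUNT unchanged and add a separate counter, a minimal sketch follows. The variable name SEQ and the dataset name WANT_SEQ are our own choices, not from the question; the HAVE step just recreates the example data.

```sas
/* Recreate the example data from the question */
data have;
  length id $3 count 3;
  input id $ count;
  datalines;
A 4
B 3
C 1
D 2
;
run;

/* Keep COUNT as-is and add a row counter SEQ (hypothetical name) */
data want_seq;
  set have;
  do seq = 1 to count;
    output;   /* one output row per value of SEQ */
  end;
run;
```

Because OUTPUT is called inside the loop, each input row is written COUNT times, and SEQ runs from 1 to COUNT within each ID.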

Related

Creating loop for proc freq in SAS

I have the following data
DATA HAVE;
input yr_2001 yr_2002 yr_2003 area;
cards;
1 1 1 3
0 1 0 4
0 0 1 3
1 0 1 6
0 0 1 4
;
run;
I want to do the following proc freq for variable yr_2001 to yr_2003.
proc freq data=have;
table yr_2001*area;
where yr_2001=1;
run;
Is there a way to do it without repeating it for each year, maybe using a loop around PROC FREQ?
Two ways:
1. Transpose it
Add a counter variable, n, to your data and transpose it by n and area, then keep only the rows where the year flag equals 1. Because we create an index on the transposed variable year, we do not need to re-sort the data before doing BY-group processing.
data have2;
set have;
n = _N_;
run;
proc transpose data=have2
name=year
out=have2_tpose(rename = (COL1 = year_flag)
where = (year_flag = 1)
index = (year)
drop = n
);
by n area;
var yr_:;
run;
proc freq data=have2_tpose;
by year;
table area;
run;
2. Macro loop
Since they all start with yr_, it will be easy to get all the variable names from dictionary.columns and loop over all the variables. We'll use SQL to read the names into a |-separated list and loop over that list.
proc sql noprint;
select name
, count(*)
into :varnames separated by '|'
, :nVarnames
from dictionary.columns
where memname = 'HAVE'
AND libname = 'WORK'
AND name LIKE "yr_%"
;
quit;
/* Take a look at the variable names we found */
%put &varnames.;
/* Loop over all words in &varnames */
%macro freqLoop;
%do i = 1 %to &nVarnames.;
%let varname = %scan(&varnames., &i., |);
title "&varname.";
proc freq data=have;
where &varname. = 1;
table &varname.*area;
run;
title;
%end;
%mend;
%freqLoop;
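Because the variable names here differ only by a consecutive year suffix, a simpler sketch can loop over the years directly instead of querying DICTIONARY.COLUMNS. The macro name FREQ_YEARS, its FIRST=/LAST= parameters and the FREQ_<year> output datasets are hypothetical, and the approach assumes the suffixes really are consecutive years.

```sas
/* Example data from the question */
data have;
  input yr_2001 yr_2002 yr_2003 area;
  cards;
1 1 1 3
0 1 0 4
0 0 1 3
1 0 1 6
0 0 1 4
;
run;

/* Hypothetical macro: one PROC FREQ per year suffix from FIRST to LAST */
%macro freq_years(first=2001, last=2003);
  %local yr;
  %do yr = &first %to &last;
    title "yr_&yr";
    proc freq data=have;
      where yr_&yr = 1;
      /* OUT= also saves each crosstab as dataset FREQ_<year> */
      table yr_&yr*area / out=freq_&yr;
    run;
  %end;
  title;
%mend freq_years;

%freq_years()
```

The DICTIONARY.COLUMNS version is more robust when years are missing from the sequence; this one is shorter when they are not.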

keep variables with a do loop sas

Is there any way to keep variables with a do loop in a data step?
It would be something like:
data test;
input id aper_f_201501 aper_f_201502 aper_f_201503 aper_f_201504
aper_f_201505 aper_f_201506;
datalines;
1 0 1 2 3 5 7
2 -1 5 4 8 7 9
;
run;
%macro test;
%let date = '01Jul2015'd;
data test2;
set test(keep=do i = 1 to 3;
aper_f_%sysfunc(intnx(month,&date,-i,begin),yymmn6.);
end;)
run;
%mend;
%test;
I need to iterate several dates.
Thank you very much.
You need to use a macro %DO loop instead of a DATA step DO loop; a DATA step DO loop is not valid in the middle of a dataset option. Also, do not generate those extra semicolons into the middle of your dataset options, and do include a semicolon to end your SET statement.
%macro test;
%local i date;
%let date = '01Jul2015'd;
data test2;
set test(keep=
%do i = 1 %to 3;
aper_f_%sysfunc(intnx(month,&date,-&i,begin),yymmn6.)
%end;
);
run;
%mend;
%test;
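If you need to reuse the list, one option is to build it in a function-style macro that contains only macro statements and returns the variable names as text. KEEP_MONTHS and its DATE= and N= parameters are hypothetical names; the sketch assumes the aper_f_yyyymm naming from the question and also keeps ID.

```sas
data test;
  input id aper_f_201501 aper_f_201502 aper_f_201503 aper_f_201504
        aper_f_201505 aper_f_201506;
  datalines;
1 0 1 2 3 5 7
2 -1 5 4 8 7 9
;
run;

/* Hypothetical helper: returns the names for the N months before DATE */
%macro keep_months(date=, n=);
  %local i list;
  %do i = 1 %to &n;
    %let list = &list aper_f_%sysfunc(intnx(month,&date,-&i,begin),yymmn6.);
  %end;
  &list
%mend keep_months;

/* Resolves to keep=id aper_f_201506 aper_f_201505 aper_f_201504 */
data test2;
  set test(keep=id %keep_months(date='01Jul2015'd, n=3));
run;
```

Since the macro generates no SAS statements of its own, it can be called anywhere a list of names is expected, including inside a dataset option.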
You can use the colon shortcut to reference variables with the same prefix; every variable whose name starts with the text before the colon will be kept.
keep ID aper_f_2015: ;
There's also the hyphen for numbered range lists:
keep ID aper_f_201501-aper_f_201512;
You can use a macro, but I'm not sure it adds much value here.

SAS Looping through macro variable and processing the data

I have a bunch of character variables which I need to sort out from a large dataset. The unwanted variables all have entries that are the same or are all missing (meaning I want to drop these from the dataset before processing the data further). The data sets are very large so this cannot be done manually, and I will be doing it a lot of times so I am trying to create a macro which will do just this. I have created a list macro variable with all character variables using the following code (The data for my part is different but I use the same sort of code):
data test;
input Obs ID Age;
datalines;
1 2 3
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
run;
proc contents
data = test
noprint
out = test_info(keep=name);
run;
proc sql noprint;
select name into : testvarlist separated by ' ' from test_info;
quit;
My idea is then to just use a data step to drop this list of variables from the original dataset. Now, the problem is that I need to loop over each variable, and determine if the observations for that variable are all the same or not. My idea is to create a macro that loops over all variables, and for each variable counts the occurrences of the entries. Since the length of this table is equal to the number of unique entries I know that the variable should be dropped if the table is of length 1. My attempt so far is the following code:
%macro ListScanner (org_list);
%local i next_name name_list;
%let name_list = &org_list;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
%put &next_name;
proc sql;
create table char_occurrences as
select &next_name, count(*) as numberofoccurrences
from &name_list group by &next_name;
select count(*) as countrec from char_occurrences;
quit;
%if countrec = 1 %then %do;
proc sql;
delete &next_name from &org_list;
quit;
%end;
%let i = %eval(&i + 1);
%end;
%mend;
%ListScanner(org_list = &testvarlist);
I get syntax errors, and with my real data I also have problems reading the data correctly, but I am taking one step at a time. I suspect I am overcomplicating things, so if anyone has an easier solution or can see what might be wrong, I would be very grateful.
There are many ways to do this posted around.
But let's just look at the issues you are having.
First, for looping through your space-delimited list of names, it is easier to let the %do loop increment the index variable for you. Use the countw() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
...
%end;
Second, where is your input dataset in your SQL code? Add another parameter to your macro definition for it. And where do you want to write the dataset without the empty columns? That calls for another parameter.
%macro ListScanner (dsname , out, name_list);
%local i next_name sep drop_list ;
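Third, you can use a single query to count all of the variables at once. Just use count(distinct xxxx) instead of GROUP BY. Note that count(distinct) counts only non-missing distinct values, so an all-missing column gets a count of 0.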
proc sql noprint;
create table counts as
select
%let sep=;
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
&sep. count(distinct &next_name) as &next_name
%let sep=,;
%end;
from &dsname
;
quit;
So this will get a dataset with one observation. You can use PROC TRANSPOSE to turn it into one observation per variable instead.
proc transpose data=counts out=counts_tall ;
var _all_;
run;
Now you can just query that table to find the names of the columns with 0 non-missing values.
proc sql noprint ;
select _name_ into :drop_list separated by ' '
from counts_tall
where col1=0
;
quit;
Now you can use the new DROP_LIST macro variable.
data &out ;
set &dsname ;
drop &drop_list;
run;
So now all that is left is to clean up after yourself.
proc delete data=counts counts_tall ;
run;
%mend;
As far as your specific initial question, this is fairly straightforward. Assuming &testvarlist is your macro variable containing the variables you are interested in, and creating some test data in have:
%let testvarlist=x y z;
data have;
call streaminit(7);
do id = 1 to 1e6;
x = floor(rand('Uniform')*10);
y = floor(rand('Uniform')*10);
z = floor(rand('Uniform')*10);
if x=0 and y=4 and z=7 then call missing(of x y z);
output;
end;
run;
data want fordel;
set have;
if min(of &testvarlist.) = max(of &testvarlist.)
and (cmiss(of &testvarlist.)=0 or missing(min(of &testvarlist.)))
then output fordel;
else output want;
run;
This isn't particularly inefficient, but there are certainly better ways to do this, as referenced in comments.

SAS nested loop syntax

I have SAS code that reads two datasets and merges them, and it works fine. The data sets are named according to the quarter and year of the data, e.g. "data1_Q11999" and "data2_Q11999". The code I use to do this is below.
Now I want to loop over several of this datasets by increasing the year from 1999 to 2014 and the quarter from 1 to 4 (i.e. two loops).
My understanding is that I need to create a macro to do this, but I am having some issues with the syntax.
The code is below. I tried to wrap the code around a %macro statement with do loops but keep getting a bunch of syntax errors. Is there a straightforward way of achieving this?
data origfile;
infile "D:/data1_Q11999.txt" dlm= '|' MISSOVER DSD lrecl=32767 firstobs=1 ;
input
fico : 8.
dt_first_pi : 8.
id : $16.
;
run;
data svcgfile;
infile "D:/data2_Q11999.txt" dlm= '|' MISSOVER DSD lrecl=32767 firstobs=1 ;
input
id : $12.
Period : 8.
actual_loss : 12.
;
run;
PROC SORT DATA=origfile OUT=origfile;
BY id;
RUN;
PROC SORT DATA=svcgfile OUT=svcgfile;
BY id;
RUN;
DATA mergedata;
MERGE origfile svcgfile;
BY id;
RUN;
Assuming that you want to generate a separate merged dataset for each year and quarter, you could use a macro like this.
%macro read(first_yr,last_yr);
%local year qtr;
%do year=&first_yr %to &last_yr ;
%do qtr=1 %to 4 ;
data data1;
infile "D:\data1_Q&qtr.&year..txt" dsd dlm= '|' truncover ;
length id $16 fico dt_first_pi 8 ;
input fico dt_first_pi id ;
run;
proc sort data=data1; by id; run;
data data2;
infile "D:\data2_Q&qtr.&year..txt" dsd dlm= '|' truncover ;
length id $16 period actual_loss 8 ;
input id period actual_loss ;
run;
proc sort data=data2; by id; run;
data result_q&qtr.&year. ;
merge data1 data2 ;
by id;
run;
%end;
%end;
%mend read ;
Then you could call it like this to generate 64 separate datasets.
%read(1999,2014)
But you will probably want those 64 datasets combined into one so that you can use it more easily in your next steps. You could probably fix the process that reads the data to generate it all at once, but here is a simple data step that combines every dataset whose name starts with RESULT_ (as the macro above generates) into a single dataset.
data want ;
length year qtr 8 dsname $41 ;
set result_: indsname=dsname ;
year = input(substr(scan(dsname,-1,'.'),10),4.);  /* RESULT_Q11999: year starts at position 10 */
qtr = input(substr(scan(dsname,-1,'.'),9),1.);    /* quarter digit is at position 9 */
run;

dynamically creating variables based on the condition, SAS

My problem is following:
I have a dataset with two columns
number_of_years payment
4 100
5 123
2 52
and I would like to create a new variable (or a set of variables and then sum them) whose value depends on the value in the column number_of_years.
New variable should get following value:
number_of_years payment new_variable
4 100 100*1.01**4 + 100*1.01**3 + 100*1.01**2 + 100*1.01**1
5 123 123*1.01**5 + 123*1.01**4 + 123*1.01**3 + 123*1.01**2 + 123*1.01**1
2 52 52*1.01**2 + 52*1.01**1
etc.
My original idea was to put a value from the column number_of_years into macro variable, loop with its value creating additional columns and then sum it, but it does not work.
data uprava;
set work.data_diskontace;
%let value1=number_of_years;
%macro spocti(n);
%do i=1 %to &n;
new_variable&i = payment*1.01**&i;
%end;
%mend doit;
%spocti(value1);
run;
Thank you for any suggestion which way to go.
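You should use a regular DATA step DO loop instead of a macro loop, because the number of iterations is dynamic: it depends on the number_of_years value of each row. A macro %DO loop is resolved once, before the DATA step runs, so it cannot see that value.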
data uprava;
set work.data_diskontace;
new_variable = 0;
do i = 1 to number_of_years;
new_variable = new_variable + payment*1.01**i;
end;
run;
No need for macros: this is a finite geometric series, which has a closed-form sum.
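Written out, with p = payment and n = number_of_years (symbols introduced here for illustration), the sum and its closed form are:

$$\sum_{i=1}^{n} p \cdot 1.01^{i} = p \cdot 1.01 \cdot \frac{1 - 1.01^{n}}{1 - 1.01} = p \cdot 1.01 \cdot \frac{1.01^{n} - 1}{0.01}$$

For example, with n = 2 and p = 52 both sides give 52*1.01 + 52*1.0201 = 105.5652.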
Simplest solution is:
data have;
input years payment;
cards;
4 100
5 123
2 52
;
run;
data want;
set have;
new_variable = (1.01*(1-1.01**years)/(1-1.01))*payment;
run;
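To confirm that the closed form matches the explicit loop from the other answer, a quick side-by-side check could look like this. The dataset COMPARE and the variables LOOP_SUM, CLOSED_FORM and ABS_DIFF are our own names for illustration.

```sas
data have;
  input years payment;
  cards;
4 100
5 123
2 52
;
run;

data compare;
  set have;
  /* explicit term-by-term sum, as in the DO-loop answer */
  loop_sum = 0;
  do i = 1 to years;
    loop_sum = loop_sum + payment*1.01**i;
  end;
  /* closed-form geometric series, as in the answer above */
  closed_form = (1.01*(1-1.01**years)/(1-1.01))*payment;
  abs_diff = abs(loop_sum - closed_form);
  drop i;
run;
```

ABS_DIFF should be zero up to floating-point rounding on every row.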
