I am new here. I am trying to read in a data set multiple times. For example, assume that I have 3 observations in a data set (called tempfile) for a variable called temp. The three observations are 4, 6, and 5. I want to read in the set x number of times, so the 4th observation would be 4, the fifth would be 6, and the sixth would be 5. The 7th would be 4, and so on. I have tried this literally a few dozen ways, by doing something like
data new;
do i=1 to 100;
set tempfile;
end;
output;
run;
I have tried this by moving the DO statement, moving the OUTPUT statement, omitting the OUTPUT statement... every which way, trying macros also. Can somebody help? Thanks, John
followup....
Hello:
Thanks for response. That did work. I would like to now do several things involving some “if then” statements inside the loop (more than just reading in the data set).
I want to read in a data set n number of times, and each time there will be two if-then statements.
So, assume I read in 3 numbers any number of times: 7, 15, and 12.
As each number is read, the step checks whether it is less than 10, and each time it creates a random number.
If less than 10, then
If rand('uniform') < .4 then 1 is added to counter1, else 1 is added to counter2
And if >= 10,
Then
If rand('uniform') < .2 then 1 is added to counter1, else 1 is added to counter2
Any help is much appreciated.
Thanks
John
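Most data steps stop when a SET statement tries to read past the end of the input. In your code that happens on the fourth pass through the DO loop, before the OUTPUT statement is ever reached, so the step ends without writing anything. So you need a method that prevents SAS from reading past the end.
The easiest way to replicate the data is to execute the OUTPUT statement multiple times per observation. Note that this changes the order: the first record is repeated three times, then the second record is repeated three times, etc.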
data want;
set tempfile ;
do i=1 to 3;
output;
end;
run;
Another method is to just list the dataset multiple times on the SET statement. So to read it in 3 times just use
data want;
set tempfile tempfile tempfile;
run;
You could probably use macro logic or even just a macro variable to make the number of repetitions variable.
data _null_; call symputx('list',repeat('tempfile ',3-1)); run;
data want; set &list; run;
Another method is to use the POINT= and NOBS= options on the SET statement so that SAS never reads past the end and you can jump back to the beginning. But since it never reads past the end of the input data, you will need to tell it manually when to stop.
data want ;
do i=1 to 3;
do p=1 to nobs ;
set tempfile point=p nobs=nobs;
output;
end;
end;
stop;
run;
Or more in the spirit of your original post you might want to use the MOD() function to figure out which observation to read next.
data want;
if _n_ > 100 then stop;
p=1+mod(_n_-1,nobs);
set tempfile point=p nobs=nobs;
run;
If you have SAS/STAT software, you can use PROC SURVEYSELECT.
data have;
do temp=4,6,5;
output;
end;
run;
proc surveyselect reps=10 rate=1 out=temp2 noprint;
run;
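For the follow-up in the question (two counters driven by random draws), the POINT=/NOBS= loop above can be extended. This is only a sketch under stated assumptions: the seed, the repetition count of 100, and the names counts, rep, and cutoff are hypothetical and not from the original post.

```sas
data counts;
  call streaminit(12345);            /* hypothetical seed, for repeatable draws */
  counter1 = 0;
  counter2 = 0;
  do rep = 1 to 100;                 /* hypothetical number of passes over tempfile */
    do p = 1 to nobs;
      set tempfile point=p nobs=nobs;
      /* note: a missing temp compares as less than 10 in SAS */
      if temp < 10 then cutoff = 0.4;
      else cutoff = 0.2;
      if rand('uniform') < cutoff then counter1 + 1;
      else counter2 + 1;
    end;
  end;
  output;                            /* one row holding the final totals */
  stop;                              /* required because POINT= never hits end of file */
  keep counter1 counter2;
run;
```

The totals are written once, after all passes, so the output data set has a single observation.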
The data step is designed for serial processing. In this case, you need to "remember" previous observations. You can do it using only the data step, but for that use case, there are other solutions in the SAS environment that are simpler. The one I suggest is a macro that appends the original file n times:
%macro replicate( data=, out=, n=)/des='&out is &data repeated &n times.';
data &out;
set
%do i=1 %to &n;
&data
%end;
; /* This ; ends the data step `set` statement */
run;
%mend;
You could test your example with this helper:
%macro test;
data have; /* create the example data set */
temp = 4; output;
temp = 6; output;
temp = 5; output;
run;
%replicate( data=have, out=want, n=4 );
proc print data=want; run;
%mend;
Here is a portion of the SAS doc that adds lots of detail with many examples.
Related
I have a very simple request. Loop through a dataset, turning each observation into a macro variable, and then doing a comparison on that macro variable. here's what my code looks like:
%do n = 1 %to &i2.;
data want;
set have;
%if _N_ = &n. %then %do;
call symputx("Var1",var1);
call symputx("var2",var2);
%end;
run;
data want;
retain FinalCount
set have;
where Variable1="&var1.";
by SomeVariable
if first.SomeVariable then FinalCount=0;
if final="FINAL" then FinalCount+1;
if Finalcount=&var2. then Final_Samples=1;
finalCount=FinalCount;
run;
%end
The part that is failing is the _N_ = &n. section. I keep getting the error "Variable N has been defined as both character and numeric." Basically I just need to set each observation as a macro variable once to do the next comparison, and then move on to the next one. So, if there's a better way of doing that, please let me know. Otherwise, could you help me figure out why that comparison is not working?
If you could explain your larger problem then you might get a better answer that does not require you to convert your data values into macro variables. Converting values to strings and then trying to compare them again introduces a number of sources of errors.
To your question of how to set macro variables based on the Nth observation in a dataset, try one of these.
If the dataset supports the FIRSTOBS= and OBS= dataset options, use those.
data _null_;
set have (firstobs=&n obs=&n);
call symputx("Var1",var1);
call symputx("var2",var2);
run;
If the dataset supports direct access then use that.
data _null_;
p = &n;
set have point=p;
call symputx("Var1",var1);
call symputx("var2",var2);
stop;
run;
If not then use an IF (not a macro %if).
data _null_;
set have ;
if _n_ = &n then do;
call symputx("Var1",var1);
call symputx("var2",var2);
stop;
end;
run;
I am creating multiple datasets named "Taxes&i" (&i labels each new dataset according to the counter i). I then append all of the tables at the end into the table "want". When I finish going through the first macro loop, I would like to go through another loop that changes the dates. So the line
DATE BETWEEN '14Feb2016:0:0:0'dt AND '16Feb2016:0:0:0'dt);
would look like
date between 'date1' and 'date2';
I don't know how to create that loop, though, so that it goes back into the first loop: the first loop finishes, the second loop changes the dates, the first loop runs again, and so on.
Also, there may be a way to make this less bulky: when the first loop is done executing, maybe the dates could automatically increase by one day without being declared. That would work too. I am just not sure which is best and possible.
%macro loop(list1, list2);
%let n=%sysfunc(countw(&list1, %str('')));
%do i=1 %to &n;
%let O_list1 = %scan(&list1, &i, %str('');
%let O_list2 = %scan(&list2, &i, %str('');
/* another macro here called date_loop(date1, date2); */
proc sql;
create table taxes&i;
select t1.tax_info
FROM work.taxes&1 as t1
WHERE (t1.O_LIST1 = &O_List2) AND
(DATE BETWEEN '14Feb2016:0:0:0'dt AND '16Feb2016:0:0:0'dt);
%end;
%mend;
run;
%list('1' '2', '3' '4') /*( this is "O_List1", "O_List2") */
data want;
set abc.taxes: ;
run;
Thanks for help!
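One possible way to advance the dates automatically is a macro %do loop that computes both boundaries with INTNX. This is a minimal sketch only: the macro name date_loop comes from the comment in the code above, but its parameters (start=, ndays=, width=) are hypothetical, and the %put merely writes the generated clause to the log instead of running a query.

```sas
%macro date_loop(start=, ndays=, width=2);
  %local d date1 date2;
  %do d = 0 %to %eval(&ndays - 1);
    /* shift the window start by &d days, and the end by &d + &width days */
    %let date1 = %sysfunc(intnx(day, "&start"d, &d), date9.);
    %let date2 = %sysfunc(intnx(day, "&start"d, %eval(&d + &width)), date9.);
    /* double quotes are needed so the macro variables resolve inside the literal */
    %put DATE BETWEEN "&date1:0:0:0"dt AND "&date2:0:0:0"dt;
  %end;
%mend date_loop;

%date_loop(start=14Feb2016, ndays=3)
```

In real use, the %put line would be replaced by the PROC SQL step (or a call to the inner loop macro), with the two datetime literals placed in the WHERE clause.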
I am trying to figure out how to call a macro variable in a loop within a data step in SAS, but I am lost; so I have 14 macro variables and I have to compare each of them to the entries of a vector. I tried:
data work.calendrier;
set projet.calendrier;
do i=1 to 3;
if date= "&vv&i"D then savinglight = 1;
end;
run;
But it is not working. The macro variables vv1 through vv3 hold dates. For instance, this code works:
data work.calendrier;
set projet.calendrier;
*do i=1 to 3;
if date= "&vv1"D then savinglight = 1;
*end;
run;
But with the loop it can not resolve the macro variable.
If you want to reference a macro variable with a number index like vv1,vv2,vv3 you need to resolve &i first.
SAS has a separate macro processor that resolves values before they reach the data step processor.
Essentially, you need to add extra ampersands at the beginning of your macro variable:
&&vv&i -> &vv1 -> "Value of vv1"
&&vv&i -> &vv2 -> "Value of vv2"
&&vv&i -> &vv3 -> "Value of vv3"
What happens here is that the macro processor scans the reference more than once. On the first pass it resolves && to a single & and resolves &i to its current value (1, say), which leaves &vv1. On the second pass it resolves &vv1 to its value. Note that &i here must be a macro variable (for example, the index of a macro %do loop), not the data step variable i.
A couple of sources about this interesting topic:
http://www2.sas.com/proceedings/sugi29/063-29.pdf
http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf
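The two-pass resolution can be seen in the log with a small test. This is a sketch only: the date values and the macro name show_resolution are made up for illustration.

```sas
%let vv1 = 09MAR2014;   /* hypothetical values */
%let vv2 = 02NOV2014;
%let vv3 = 08MAR2015;

%macro show_resolution;
  %local i;
  %do i = 1 %to 3;
    /* &&vv&i -> &vv1, &vv2, &vv3 -> the stored value */
    %put NOTE: i=&i resolves to &&vv&i;
  %end;
%mend show_resolution;

%show_resolution
```

Each pass of the %do loop writes one of the three stored values to the log, which confirms that &i is resolved before the outer reference.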
Macro variable references are resolved before SAS compiles and runs your data step. You need to first figure out how to do what you want using SAS statements then, if necessary, you can use macro code to help you generate those statements.
If you want to test if a variable's value matches one of a list of values then consider using the IN operator.
data work.calendrier;
set projet.calendrier;
savinglight = date in ("&vv1"d,"&vv2"d,"&vv3"d);
run;
You need to use a macro. Here's the basic approach:
%let vv1 = 9;
%let vv2 = 2;
%let vv3 = 10;
data have;
drop i;
do i = 1 to 5;
date = i;
output;
end;
run;
%macro test;
data test;
set have;
%do i=1 %to 3;
if date= &&vv&i then savinglight = 1;
%end;
run;
%mend test;
%test;
I can't seem to find information about this online...
I have a list of variables I want to do a proc summary on. As these proc summaries are performed individually per variable, it would be faster for me if I can find some way to loop through a numbered list of variables, then create an output to excel or simply a combined table of results that clearly indicates what results belong to what variable.
The problem is I only know do loops work in a datastep, how would I get this to work for proc steps? Could I write a macro for the proc step, then nest it within a datastep? Would this cause it to run appropriately? i.e.
data _NULL_;
set table_of_vars;
do i=1 to (number of vars in the table);
_n_ = i;
%let var = _n_;
%macro_proc_summ(&var.);
end;
and another code subsequently that merges the individual output, or perhaps the macro could even generate output that always appends information.
Obviously the code is very sketchy, but conceptually could this work?
EDIT: To give a bit more clarity, this is how the code would look like without a loop in place.
%macro Analysis(var); %macro _; %mend _;
proc summary data=masterdata nway missing;
class &var.;
output out = &var._summ (drop = _type_);
run;
%mend;
endrsubmit;
%Analysis(var1);
%Analysis(var2);
%Analysis(var3);
.
.
.
.
%Analysis(var100);
From here we could either:
Export var1_summ, var2_summ to Excel in cells A1, D1, etc.
Or first combine our individual summaries into a large table then export to
some graphing application to look at the trend.
Either way you can see how these are individual Proc steps, which could be done a lot quicker in a loop.
If you want the results in a single output table rather than one table per variable, use PROC TABULATE.
proc tabulate data=sashelp.class out=summary1;
class sex age;
var weight;
table sex age, weight*(n mean min max median);
run;
data summary2;
set summary1;
Var=coalescec(sex, put(age, 2.));
drop age sex _:;
run;
EDIT: Rather than proc tabulate, if you only want N, use PROC FREQ
*Run frequency for tables;
ods output onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
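If separate PROC steps per variable are still wanted, a macro %do loop can generate the calls, since macro loops are not limited to the data step. A minimal sketch, assuming the %Analysis macro from the question is already defined and the variables are literally named var1, var2, and so on; the macro name run_all and its parameter n are hypothetical.

```sas
%macro run_all(n);
  %local k;
  %do k = 1 %to &n;
    /* generates %Analysis(var1), %Analysis(var2), ... */
    %Analysis(var&k)
  %end;
%mend run_all;

%run_all(100)
```

Each call still runs its own PROC SUMMARY, so this shortens the code rather than the run time.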
After importing my CSV data with GETNAMES=NO, I have 59 columns with variable names VAR1, VAR2, . . . VAR59. My first row contains the names I need for the new variables, but they first needed to be cleaned up by removing special characters and turning spaces into underscores, since SAS doesn't like spaces in variable names. This is the array I used for that piece:
DATA DATA1; SET DATA (FIRSTOBS=7);
ARRAY VAR(59) VAR1-VAR59;
IF _N_ = 1 THEN DO;
DO I = 1 TO 59;
VAR[I] = COMPRESS(TRANSLATE(TRIM(VAR[I]),'_',' '),'?()');
PUT VAR[I]=;
END;
END;
DROP I;
RUN;
This worked perfectly, but now I need to get this first row up to the new variable names. I tried a similar array to perform this:
DATA DATA2; SET DATA1;
ARRAY V(59) VAR1-VAR59;
DO I = 1 TO 59;
IF _N_ = 1 AND V[I] NE "" THEN CALL SYMPUT("NEWNAME",V[I]);
RENAME VAR[I] = &NEWNAME;
END;
DROP I;
RUN;
This only puts the name of VAR59, since there is no [i] connected to the &NEWNAME, and it still isn't working quite right. Any suggestions for moving a row up to variable names AFTER manipulation?
Your primary problem is you are trying to use a macro variable in the data step it's created in. You can't. You're also trying to create rename statements in the data step; rename, as with other similar statements (keep, drop), must be defined before the data step is compiled.
You need to write code somewhere - either in a text file, a macro variable, whatever - with this information. For example:
filename renamef temp;
data _null_;
set myfile (obs=1);
file renamef;
array var[59];
do _i = 1 to dim(Var);
[your code to clean it out];
strput = cat("rename ",vname(var[_i]),'=',strip(var[_i]),';');
put strput;
end;
run;
data want;
set myfile (firstobs=2);
%include renamef;
run;
There are lots of other examples to this on the site and on the web, "list processing" is the term for this.
Joe -- using your suggestions and another one of your posts, the following worked flawlessly:
Put the row of needed variables into long format (in my case, first row so n = 1)
DATA NEWVARS; SET DATA;
IF _N_ = 1 THEN OUTPUT NEWVARS;
RUN;
PROC TRANSPOSE DATA = NEWVARS OUT=NEWVARS1;
VAR _ALL_;
RUN;
Create a list of rename macro calls.
PROC SQL;
SELECT CATS('%RENAME(VAR=',_NAME_,',NEWVAR=',COL1,')')
INTO :RENAMELIST SEPARATED BY ' '
FROM NEWVARS1;
QUIT;
%MACRO RENAME(VAR=,NEWVAR=);
RENAME &VAR.=&NEWVAR.;
%MEND RENAME;
Call in the list created in Step 2 to rename all variables.
PROC DATASETS LIB=WORK NOLIST;
MODIFY DATA;
&RENAMELIST.;
QUIT;
I had to perform a few additional checks to make sure that the variable names were not longer than 32 characters, and this was easy to check for when the data was in long format after transposing. If there are certain words that make the names too long, the TRANWRD function can easily replace them with abbreviations.
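The length check could look something like the following. It is a sketch only, assuming the transposed data set NEWVARS1 from above with the candidate names in COL1; the data set name newvars_checked, the variables name_length and too_long, and the abbreviation are hypothetical.

```sas
data newvars_checked;
  set newvars1;
  /* hypothetical abbreviation; TRANWRD is case-sensitive */
  col1 = tranwrd(col1, 'Temperature', 'Temp');
  name_length = lengthn(strip(col1));
  too_long = (name_length > 32);     /* 1 if the name exceeds the SAS limit */
run;

proc print data=newvars_checked;
  where too_long = 1;
  var _name_ col1 name_length;
run;
```

Any rows printed here would need further shortening before the rename list is built.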