Do while loop in SAS - loops

I am reading the SAS documentation page to understand the difference between DO WHILE and DO UNTIL loops in SAS. From it, I came to the conclusion that whenever I use a DO WHILE loop, the comparison in the WHILE condition must always be less than (or less than or equal to). It cannot be greater than (or greater than or equal to).
The reason is as follows. When I ran this code:
data _null_;
n=0;
do while(n<5);
put n=;
n+1;
end;
run;
I got the output 0, 1, 2, 3, 4.
But when I ran this code, I got nothing:
data _null_;
n=0;
do while(n>5);
put n=;
n+1;
end;
run;
Is my conclusion correct, or am I missing something?
Thank you
1:

It is not really about whether you use > or <; what differs is the logic of the test. With DO WHILE() the loop keeps executing while the condition is TRUE. With DO UNTIL() it keeps executing while the condition is FALSE. Your second example prints nothing because n>5 is already false when n=0, so the loop body never runs.
data _null_;
do while(n<5);
put n @;
n+1;
end;
run;
0 1 2 3 4
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
data _null_;
do until(n>=5);
put n @;
n+1;
end;
run;
0 1 2 3 4
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
The other big difference is when the test is made. With DO WHILE() the condition is tested at the top of the loop, before each iteration. With DO UNTIL() it is tested at the bottom, after each iteration. So with DO UNTIL() the body always executes at least once.
Note your example is more easily done using an iterative DO loop instead.
data _null_;
do n=0 to 4;
put n @;
end;
run;
0 1 2 3 4
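The two loop forms can also be mimicked outside SAS. The sketch below is Python, not SAS, and is only an illustration of the control flow described above: `do_while` tests before each pass, `do_until` tests after each pass. The function names and the `cap` safety limit are assumptions added for this example.

```python
def do_while(cond, n=0, cap=20):
    """Mimic DO WHILE(cond): test at the top, before each pass."""
    seen = []
    while cond(n) and len(seen) < cap:  # cap only guards against endless loops
        seen.append(n)
        n += 1
    return seen


def do_until(cond, n=0, cap=20):
    """Mimic DO UNTIL(cond): test at the bottom, after each pass."""
    seen = []
    while True:
        seen.append(n)
        n += 1
        if cond(n) or len(seen) >= cap:
            break
    return seen


print(do_while(lambda n: n < 5))        # [0, 1, 2, 3, 4]
print(do_while(lambda n: n > 5))        # [] because the test fails before the first pass
print(do_until(lambda n: n >= 5))       # [0, 1, 2, 3, 4]
print(do_until(lambda n: n >= 5, n=7))  # [7] because the body runs once before the test
```

Note that a greater-than condition is perfectly legal in DO WHILE(); it simply has to be true at the start for the body to run, and something in the loop has to make it false eventually.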

Related

Dynamically set size of array without hardcoding

So I have datasets with variables and values like so:
A1 A2 A3 A4 A5 A6
1 3 5 6 10 2
The variables can go up to A2000 in certain cases. I want to perform the same operation on each variable using an array. Is there a way to dynamically set the size of the array without manually typing it?
Example code of what I am striving for is below
data A;
input A1-A6;
datalines;
1 3 5 6 10 2
;
run;
data A;
set A;
array a[*] a1-a&size;
do i=1 to &size;
{perform some operation here}
end;
run;
My question is: how can I write code to get the parameter &size, which represents the size of the array? In this example, &size = 6.
Sure, use the : wildcard. This only works if a1-a6 (or however many there are) are already defined in the dataset, though.
data have;
input a1-a6;
datalines;
1 2 3 4 5 6
7 8 9 10 11 12
;;;;
run;
data want;
set have;
array a a:;
do i=1 to dim(a);
sum = sum(sum ,a[i]);
end;
run;
Otherwise, what you put above would absolutely work. You don't need the [*] bit, though, and I prefer to keep dim(a) instead of &size on the loop control in case you change how the array is defined in the future. Of course you need a way to determine &size, which will depend on your data.
%let size=6;
data want;
set have;
array a a1-a&size.;
do i=1 to dim(a);
sum = sum(sum ,a[i]);
end;
run;
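One thing to keep in mind with the : wildcard is that a: matches every variable whose name begins with "a", not just a1, a2, and so on. The Python sketch below (not SAS) illustrates the difference between prefix matching and a numbered range; the row contents and the variable `age` are hypothetical, and the regular expression only approximates a numbered range list.

```python
import re

# Hypothetical row: variable names mapped to values (illustrative only).
row = {"id": 7, "a1": 1, "a2": 3, "a3": 5, "age": 40}

# Analogue of the SAS list a: (every name that starts with "a").
by_prefix = [name for name in row if name.startswith("a")]

# Rough analogue of a numbered range such as a1-a3 ("a" followed by digits only).
numbered = [name for name in row if re.fullmatch(r"a\d+", name)]

print(by_prefix)  # ['a1', 'a2', 'a3', 'age']
print(numbered)   # ['a1', 'a2', 'a3']

# The size is taken from the list itself, like dim(a), rather than hardcoded.
print(len(numbered), sum(row[name] for name in numbered))  # 3 9
```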

Loop each variable into SAS macro

I have several variables in the dataset survey. I want to write a loop that passes each variable to a SAS macro.
The code is below.
%let var= r1 r2 r3 ;
DATA survey;
INPUT id sex $ age inc r1 r2 r3 ;
DATALINES;
1 F 35 17 7 2 2
17 M 50 14 5 5 3
33 F 45 6 7 2 7
49 M 24 14 7 5 7
65 F 52 9 4 7 7
81 M 44 11 7 7 7
2 F 34 17 6 5 3
18 M 40 14 7 5 2
34 F 47 6 6 5 6
50 M 35 17 5 7 5
;
%MACRO bvars(input);
proc univariate data = "D:\hsb2" plots;
var &input.;
run;
%MEND bvars;
I want each variable listed in &var to be passed to the macro bvars one at a time, instead of writing the following:
%bvars(r1)
%bvars(r2)
%bvars(r3)
.....
This is time consuming when there are more than 100 variables.
This will run proc univariate for all the variables in survey whose names start with "r" (so r1, r2, etc.). Procedures with a var statement usually accept multiple variables.
proc univariate data = survey;
var r:;
run;
If you wish to run it for all numeric variables, replace r: with _NUMERIC_.
If you want to loop through the variables and call a macro separately for each one, there are several approaches. They usually involve a macro %do loop (which must be inside a macro), like so:
%macro looper(inData);
/* List all the variable names */
proc contents data = &inData. out = _colNames noprint;
run;
proc sql noprint;
select name
/* Put the variable names in a macro variable list */
into :colNames separated by " "
from _colNames
/* Get only numeric variables */
where type = 1
order by varnum;
quit;
/* Loop through the variable names */
%do i = 1 %to %sysfunc(countw(&colNames.));
%let colName = %scan(&colNAmes., &i.);
%put &colName.;
/* Your macro call or code here */
/* %bvars(&inData., &colName.) */
%end;
%mend looper;
%looper(sashelp.cars);
It might prove useful for you to become familiar with macro %do loops, proc contents (or better yet proc datasets), the %scan() function and the different ways to assign macro variables. The SAS documentation online is a great place to start.
Updated answer.
You can use the SASHELP.VCOLUMN view, which SAS maintains automatically. It contains one row for each variable of each dataset in every assigned library, including the Work library.
So you would do the following. I am assuming your survey dataset is in the Work library.
The code does the following:
1. Looks up your dataset in the VCOLUMN view, keeps only the name of each variable (that's all we need) and stores it in the dataset temp.
2. For every variable, runs the bvars macro via the call execute statement.
data temp(keep=name);
set Sashelp.Vcolumn;
where libname = 'WORK' and memname = 'SURVEY';
run;
*Call macro using call execute;
data _null_;
set temp;
/* Single quotes stop the macro call from resolving when this step is compiled */
call execute('%bvars(' || strip(name) || ')');
run;
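The idea behind the call execute step is simply "build one macro call per row of metadata". The Python sketch below (not SAS) shows that string-building logic under some assumptions: the `vcolumn` rows are a hypothetical stand-in for SASHELP.VCOLUMN, and the optional `prefix` filter is an addition for this example, useful because the unfiltered list would also include id, sex, age and inc.

```python
# Hypothetical stand-in for rows of SASHELP.VCOLUMN (names padded like SAS character values).
vcolumn = [
    {"libname": "WORK", "memname": "SURVEY", "name": "id      "},
    {"libname": "WORK", "memname": "SURVEY", "name": "sex     "},
    {"libname": "WORK", "memname": "SURVEY", "name": "r1      "},
    {"libname": "WORK", "memname": "SURVEY", "name": "r2      "},
    {"libname": "WORK", "memname": "SURVEY", "name": "r3      "},
    {"libname": "WORK", "memname": "OTHER", "name": "r1      "},
]


def macro_calls(rows, libname, memname, prefix=""):
    """Return one macro-call string per matching variable name."""
    calls = []
    for row in rows:
        name = row["name"].strip()  # drop the padding, like strip(name)
        same_table = row["libname"] == libname and row["memname"] == memname
        if same_table and name.lower().startswith(prefix):
            calls.append(f"%bvars({name})")
    return calls


print(macro_calls(vcolumn, "WORK", "SURVEY", prefix="r"))
# ['%bvars(r1)', '%bvars(r2)', '%bvars(r3)']
```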

Matlab Dataset Array Calculating delta t

I have a very large dataset array with over a million values that looks like this:
Month Day Year Hour Min Second Line1 Line2 Power Dt
7 8 2013 0 1 54 1.91 4.98 826.8 0
7 8 2013 0 0 9 1.93 3.71 676.8 0
7 8 2013 0 1 15 1.92 5.02 832.8 0
7 8 2013 0 1 21 1.91 5.01 830.4 0
and so on.
When the seconds measurement reached 60 it started over at 0, which is why the first number is bigger. I need to fill the delta t column (Dt) by taking the current row's seconds value, subtracting the previous row's seconds value, and correcting for negative values. I cannot perform this operation in a loop, as it would take ages to complete; it needs to be done as a simple, one-shot vector subtraction.
You can try the diff command to generate such results. It's very fast and should work without any for loop.
HTH
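Dt=diff(datenum(A(:,[3 1 2 4 5 6])))*60*60*24; % datenum expects [Y M D H MN S]; the columns are stored as Month Day Year Hour Min Second
This gives the delta in seconds, but I'm not sure what you want your correction for negative differences to be. Could you give an example of the expected output?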
Note that Dt will be one entry shorter than A, so you may have to pad it.
You can remove the negative values (I think) with the command
Dt(Dt<0)=Dt(Dt<0)+60;
If you need to pad the Dt vector so that it is the same length as the data set, try
Dt=[Dt;0];
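For readers who want to check the wrap-around arithmetic by hand, here is a small Python sketch (not MATLAB) of the same three steps applied to the seconds column only: difference, add 60 to negative differences, pad with a trailing 0. The function name is hypothetical, and the correction assumes consecutive readings are less than 60 seconds apart.

```python
def delta_seconds(seconds, wrap=60):
    """Differences between consecutive readings, corrected for wrap-around."""
    deltas = []
    for prev, cur in zip(seconds, seconds[1:]):
        d = cur - prev
        if d < 0:          # the counter rolled over from 59 back to 0
            d += wrap
        deltas.append(d)
    return deltas + [0]    # pad to the original length, like Dt=[Dt;0]


# Seconds column from the sample rows in the question.
print(delta_seconds([54, 9, 15, 21]))  # [15, 6, 6, 0]
```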

Calculating moving average using do loop in SAS

I am trying to find a way to calculate a moving average using SAS do loops. I am having difficulty. I essentially want to calculate a 4 unit moving average.
DATA data;
INPUT a b;
CARDS;
1 2
3 4
5 6
7 8
9 10
11 12
13 14
15 16
17 18
;
run;
data test(drop = i);
set data;
retain c 0;
do i = 1 to _n_-4;
c = (c+a)/4;
end;
run;
proc print data = test;
run;
One option is to use the merge-ahead:
DATA have;
INPUT a b;
CARDS;
1 2
3 4
5 6
7 8
9 10
11 12
13 14
15 16
17 18
;
run;
data want;
merge have have(firstobs=2 keep=a rename=(a=a_1)) have(firstobs=3 keep=a rename=(a=a_2)) have(firstobs=4 keep=a rename=(a=a_3));
c = mean(of a:);
run;
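Merge the dataset with itself several times, each copy starting one observation later: the second copy starts at observation 2, the third at observation 3, and so on. That puts four consecutive values of a on one line. Note that this averages the current value and the next three, so it is a look-ahead average.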
SAS has a lag() function, which returns the value of a variable from a previous observation. So, for example, if your data looked like this:
DATA data;
INPUT a ;
CARDS;
1
2
3
4
5
;
Then the following would create lag-one, lag-two and lag-three variables:
data data2;
set data;
a_1=lag(a);
a_2=lag2(a);
a_3=lag3(a);
drop b;
run;
would create the following dataset
a a_1 a_2 a_3
1 . . .
2 1 . .
3 2 1 .
4 3 2 1
etc.
Moving averages can be easily calculated from these.
Check out http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm
(Please note, I did not get a chance to run the codes, so they may have errors.)
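To make the arithmetic concrete, here is a Python sketch (not SAS) of a trailing 4-unit moving average: each output is the mean of the current value and the three before it, which is what the lagged columns give you. Returning None until four values are available is an assumption of this example, and the function name is hypothetical.

```python
def trailing_mean(values, window=4):
    """Mean of the current value and the (window - 1) values before it."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i - window + 1 : i + 1]
            out.append(sum(chunk) / window)
    return out


a = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # column a from the question
print(trailing_mean(a))
# [None, None, None, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```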
Straight from Cody's Collection of Popular Programming Tasks and How to Tackle Them:
*Presenting a macro to compute a moving average;
%macro Moving_ave(In_dsn=, /*Input data set name */
Out_dsn=, /*Output data set name */
Var=, /*Variable on which to compute
the average */
Moving=, /* Variable for moving average */
n= /* Number of observations on which
to compute the average */);
data &Out_dsn;
set &In_dsn;
***compute the lags;
_x1 = &Var;
%do i = 1 %to &n - 1;
%let Num = %eval(&i + 1);
_x&Num = lag&i(&Var);
%end;
***if the observation number is greater than or equal to the
number of values needed for the moving average, output;
if _n_ ge &n then do;
&Moving = mean (of _x1 - _x&n);
output;
end;
drop _x:;
run;
%mend Moving_ave;
*Testing the macro;
%moving_Ave(In_dsn=data,
Out_dsn=test,
Var=a,
Moving=Average,
n=4)

Appending in a for loop / recursion / strange error

I have a MATLAB/Octave for loop which gives me an Inf error message along with incorrect data.
I'm trying to get 240, 120, 60, 30, 15, ...: every number is divided by two, then the result is also divided by two.
But the code below gives me wrong values: when the number hits 30, 5 and a couple of others, it isn't divided by two.
ang=240;
for aa=2:2:10
ang=[ang;ang/aa];
end
240
120
60
30
40
20
10
5
30
15
7.5
3.75
5
2.5
1.25
0.625
24
12
6
3
4
2
1
0.5
3
1.5
0.75
0.375
0.5
0.25
0.125
0.0625
PS: I will be accessing these values from different arrays, that's why I used a for loop so I can access the values using their indexes
In addition to the divide-by-zero error you started with (fixed in the edit), the approach you're taking isn't doing what you think it is. If you print out each step, you'll see why: ang/aa divides the whole vector accumulated so far, so each pass doubles its length.
Instead of that approach, I suggest taking more of a "matlab way": avoid the loop by making use of vectorized operations.
orig = 240;
divisor = 2.^(0:5); % vector of 2 to the power of [0 1 2 3 4 5]
ans = orig./divisor;
output:
ans = [240 120 60 30 15 7.5]
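The behaviour of the original loop can be traced step by step. The Python sketch below (not MATLAB/Octave) mirrors it with lists: the first part reproduces the loop from the question, where the whole accumulated list is divided by aa and appended; the second part halves only the most recent value. Variable names are illustrative.

```python
# Part 1: mirror of ang=[ang;ang/aa] for aa=2:2:10.
ang = [240.0]
for aa in range(2, 11, 2):
    ang = ang + [x / aa for x in ang]  # appends a divided copy of everything so far

print(len(ang))  # 32
print(ang[:8])   # [240.0, 120.0, 60.0, 30.0, 40.0, 20.0, 10.0, 5.0]

# Part 2: halve only the last element on each pass.
halves = [240.0]
for _ in range(5):
    halves.append(halves[-1] / 2)

print(halves)    # [240.0, 120.0, 60.0, 30.0, 15.0, 7.5]
```

The first eight values match the start of the output shown in the question, which is where the unexpected 40 after 30 comes from.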
Try the following:
ang=240;
for aa=1:5
% sz=size(ang,1);
% ang=[ang;ang(sz)/2];
ang=[ang;ang(end)/2];
end
You should be getting warning: division by zero if you're running it in Octave. That says pretty much everything.
When you divide by zero, you get Inf. Because of your recursion... you see the problem.
You can simultaneously generalise and vectorise by using logic:
ang=240; %Replace 240 with any positive integer you like
ang=ang*2.^-(0:log2(ang));
ang=ang(1:sum(ang==floor(ang)));
This will work for any positive integer (to make it work for negatives as well, replace log2(ang) with log2(abs(ang))), and will produce the vector down to the point at which it goes odd, at which point the vector ends. It's also faster than jitendra's solution:
octave:26> tic; for i=1:100000 ang=240; ang=ang*2.^-(0:log2(ang)); ang=ang(1:sum(ang==floor(ang))); end; toc;
Elapsed time is 3.308 seconds.
octave:27> tic; for i=1:100000 ang=240; for aa=1:5 ang=[ang;ang(end)/2]; end; end; toc;
Elapsed time is 5.818 seconds.
