I have the following data:
DATA HAVE;
input yr_2001 yr_2002 yr_2003 area;
cards;
1 1 1 3
0 1 0 4
0 0 1 3
1 0 1 6
0 0 1 4
;
run;
I want to run the following PROC FREQ for each of the variables yr_2001 to yr_2003:
proc freq data=have;
table yr_2001*area;
where yr_2001=1;
run;
Is there a way to do this without having to repeat it for each year, maybe using a loop around PROC FREQ?
Two ways:
1. Transpose it
Add a counter variable, n, to your data and transpose it by n and area, then keep only the rows where the year flag is equal to 1. Because we set an index on the transposed variable year, we do not need to re-sort the data before doing BY-group processing.
data have2;
set have;
n = _N_;
run;
proc transpose data=have2
name=year
out=have2_tpose(rename = (COL1 = year_flag)
where = (year_flag = 1)
index = (year)
drop = n
);
by n area;
var yr_:;
run;
proc freq data=have2_tpose;
by year;
table area;
run;
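The note about the index is easiest to see against the more common alternative: without INDEX=, the transposed data set comes out in n order, so it has to be sorted by YEAR before BY-group processing. A sketch of that alternative, which rebuilds HAVE2 so it runs on its own (TALL and TALL_SORTED are hypothetical names); here the flag filter is moved into a WHERE statement in the sort step:

```sas
/* Rebuild the example data with the counter n so this block runs on its own */
data have2;
  input yr_2001 yr_2002 yr_2003 area;
  n = _n_;
  datalines;
1 1 1 3
0 1 0 4
0 0 1 3
1 0 1 6
0 0 1 4
;
run;

/* Transpose without INDEX=: the output stays in n order, not year order */
proc transpose data=have2 name=year
               out=tall(rename=(col1=year_flag) drop=n);
  by n area;
  var yr_:;
run;

/* Keep only flagged rows and sort by YEAR so that BY YEAR works below */
proc sort data=tall out=tall_sorted;
  where year_flag = 1;
  by year;
run;

proc freq data=tall_sorted;
  by year;
  table area;
run;
```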
2. Macro loop
Since they all start with yr_, it is easy to get all the variable names from dictionary.columns and loop over them. We'll use SQL to read the names into a |-separated list, along with the number of names, and loop over that list.
proc sql noprint;
select name
, count(*)
into :varnames separated by '|'
, :nVarnames
from dictionary.columns
where memname = 'HAVE'
AND libname = 'WORK'
AND name LIKE "yr_%"
;
quit;
/* Take a look at the variable names we found */
%put &varnames.;
/* Loop over all words in &varnames */
%macro freqLoop;
%do i = 1 %to &nVarnames.;
%let varname = %scan(&varnames., &i., |);
title "&varname.";
proc freq data=have;
where &varname. = 1;
table &varname.*area;
run;
title;
%end;
%mend;
%freqLoop;
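A slightly simpler variant of the same macro loop, sketched under the assumption that HAVE is in WORK: PROC SQL stores the number of selected rows in the automatic macro variable SQLOBS, so the count(*) column (which makes SQL remerge the summary with the detail rows) is not needed. The ESCAPE clause makes the underscore in yr_ literal, since _ is otherwise a single-character wildcard in LIKE.

```sas
data have;
  input yr_2001 yr_2002 yr_2003 area;
  datalines;
1 1 1 3
0 1 0 4
0 0 1 3
1 0 1 6
0 0 1 4
;
run;

/* Names into a |-separated list, in column order */
proc sql noprint;
  select name
    into :varnames separated by '|'
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'HAVE'
      and upcase(name) like 'YR^_%' escape '^'
    order by varnum;
quit;
/* SQLOBS holds the number of rows the query selected */
%let nVarnames = &sqlobs;

%macro freqLoop;
  %local i varname;
  %do i = 1 %to &nVarnames;
    %let varname = %scan(&varnames, &i, |);
    title "&varname";
    proc freq data=have;
      where &varname = 1;
      table &varname*area;
    run;
    title;
  %end;
%mend freqLoop;

%freqLoop
```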
I have the following dataset in SAS:
City grade1 grade2 grade3
NY A A A
CA B A C
CO A B B
I would like to "combine" the three grade variables and get a PROC FREQ that tells me the number of each grade for each city; the expected output should therefore be:
City A B C
NY 3 0 0
CA 1 1 1
CO 1 2 0
How could I do that in SAS?
It takes quite a few steps, but it gives the expected result.
*-- Creating sample data --*;
data have;
infile datalines delimiter="|";
input City $ grade1 $ grade2 $ grade3 $;
datalines;
NY|A|A|A
CA|B|A|C
CO|A|B|B
;
*-- Sorting in order to use the transpose procedure --*;
proc sort data=have; by city; run;
*-- Transposing from wide to tall format --*;
proc transpose data=have out=stage1(rename=(col1=grade) drop= _name_);
by city;
var grade:;
run;
*-- Assigning a value to 1 for each record for later sum --*;
data stage2;
set stage1;
val = 1;
run;
*-- Tabulate to create val_sum --*;
ods exclude all; *turn off default tabulate print;
proc tabulate data=stage2 out=stage3;
class city grade;
var val;
table city,grade*sum=''*val='';
run;
ods select all; *turn on;
*-- Transpose back using val_sum --*;
proc transpose data=stage3 out=stage4(drop=_name_);
by city;
id grade;
var val_sum;
run;
*-- Replace missing values by 0 to achieve desired output --*;
proc stdize data=stage4 out=want reponly missing=0;run;
City A B C
CA 1 1 1
CO 1 2 0
NY 3 0 0
In general:
1. Transpose the data to a long format.
2. Use PROC FREQ with the SPARSE option to generate the counts.
3. Save the output from PROC FREQ to a data set.
4. Transpose the output from PROC FREQ to the desired output format.
*create sample data;
data have;
input City $ grade1 $ grade2 $ grade3 $;
cards;
NY A A A
CA B A C
CO A B B
;;;;
*sort;
proc sort data=have; by City;run;
*transpose to long format;
proc transpose data=have out=want1 prefix=Grade;
by City;
var grade1-grade3;
run;
*displayed output and counts;
proc freq data=want1;
table City*Grade1 / sparse out=freq norow nopercent nocol;
run;
*output table in desired format;
proc transpose data=freq out=want2;
by city;
id Grade1;
var count;
run;
Here is a way to do it in two steps: a sort step and a data step.
proc sort data=have; by city; run;
data count (drop=grade1-grade3 i);
set have;
* create an array of all your grades;
array grade(3) $ grade1-grade3;
by city;
*set the count to zero for each city;
if first.city then do;
A = 0;
B = 0;
C = 0;
end;
* use a do loop to count the grades;
do i = 1 to 3;
if grade(i) = 'A' then A + 1;
else if grade(i) = 'B' then B + 1;
else if grade(i) = 'C' then C + 1;
end;
* write one row per city, after its last input row;
if last.city then output;
run;
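A self-contained sketch of the same counting idea, with hypothetical data in which NY spans two input rows: the counters are reset at FIRST.CITY and only the last row of each city is written out, so each city ends up with one row of totals. HAVE_MULTI and COUNT_MULTI are hypothetical names.

```sas
data have_multi;
  infile datalines dlm='|';
  input City $ grade1 $ grade2 $ grade3 $;
  datalines;
NY|A|A|A
NY|B|A|C
CA|B|A|C
;
run;

proc sort data=have_multi;
  by city;
run;

data count_multi(drop=grade1-grade3 i);
  set have_multi;
  by city;
  array grade{3} $ grade1-grade3;
  /* reset the running counts at the start of each city */
  if first.city then do;
    A = 0; B = 0; C = 0;
  end;
  /* sum statements (A + 1) retain their values across rows */
  do i = 1 to dim(grade);
    if grade{i} = 'A' then A + 1;
    else if grade{i} = 'B' then B + 1;
    else if grade{i} = 'C' then C + 1;
  end;
  /* one output row per city, holding the totals */
  if last.city then output;
run;
```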
I am working with a table with more than 50 columns. I am trying to replace the value of multiple columns using a lookup table.
Table:
data have;
infile datalines delimiter=",";
input ID $1. SUB_ID :$2. COUNTRY :$2. A $1. B $1.;
datalines;
1,A,FR,A,B
2,B,CH,,B
3,C,DE,B,A
4,D,CZ,,B
5,E,GE,A,
6,F,EN,B,
7,G,US,,A
;
run;
Lookup table:
data lookup;
infile datalines delimiter=",";
input value_before $1. value_after :$2.;
datalines;
A,1
B,2
C,3
;
run;
Current code:
data want;
if 0 then set lookup;
if _n_ = 1 then do;
declare hash lookup(dataset:'lookup');
lookup.defineKey('value_before');
lookup.defineData('value_after');
lookup.defineDone();
end;
set have;
if (lookup.find(key:A) = 0) then
A = value_after;
if (lookup.find(key:B) = 0) then
B = value_after;
/* ... */
/* if (lookup.find(key:Z) = 0) then
Z = value_after; */
drop value_before value_after;
run;
I guess this code would do the job if I hardcoded all 50 columns.
I wonder if there is a way to "apply" hash.find() to all variables except the first three (ID, SUB_ID and COUNTRY), maybe by indexing, without having to hardcode them or use macros. For the sake of the example I only included 2 variables to recode (A and B), but there are more than 50 (with very different names and no pattern like var1, var2, ..., varn).
In cases like this, I like to use PROC SQL and the dictionary tables to fill in the column names for me and build an array from them. The code below pulls the variable names from dictionary.columns and saves them, space-delimited, into the macro variable varnames. We can feed this into an array and then use array logic to do the rest.
proc sql noprint;
select name
into :varnames separated by ' '
from dictionary.columns
where libname = 'WORK'
AND memname = 'HAVE'
AND name NOT IN('ID', 'SUB_ID', 'COUNTRY')
;
quit;
data want;
if 0 then set lookup;
if _n_ = 1 then do;
declare hash lookup(dataset:'lookup');
lookup.defineKey('value_before');
lookup.defineData('value_after');
lookup.defineDone();
end;
set have;
array vars[*] &varnames.;
do i = 1 to dim(vars);
if lookup.Find(key:vars[i])=0 then vars[i] = value_after;
end;
drop value_before value_after i;
run;
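If the columns to recode are contiguous in HAVE, a name range list (double dash) avoids the macro variable altogether: A -- B means every variable from A through B in the order they appear in the program data vector. In this sketch A and B stand in for the first and last of the 50 columns; the hash object is named h only for readability, and DSD/TRUNCOVER list input replaces the original informats so that empty fields read as missing.

```sas
data have;
  infile datalines dsd truncover;
  input ID :$1. SUB_ID :$2. COUNTRY :$2. A :$1. B :$1.;
  datalines;
1,A,FR,A,B
2,B,CH,,B
3,C,DE,B,A
4,D,CZ,,B
5,E,GE,A,
6,F,EN,B,
7,G,US,,A
;
run;

data lookup;
  infile datalines dsd;
  input value_before :$1. value_after :$2.;
  datalines;
A,1
B,2
C,3
;
run;

data want;
  if 0 then set lookup;           /* defines value_before/value_after */
  if _n_ = 1 then do;
    declare hash h(dataset: 'lookup');
    h.defineKey('value_before');
    h.defineData('value_after');
    h.defineDone();
  end;
  set have;
  /* name range list: every variable from A through B, no macro needed */
  array vars[*] A -- B;
  do i = 1 to dim(vars);
    if h.find(key: vars[i]) = 0 then vars[i] = value_after;
  end;
  drop value_before value_after i;
run;
```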
Is there any way to keep variables with a DO loop in a DATA step?
It would be something like:
data test;
input id aper_f_201501 aper_f_201502 aper_f_201503 aper_f_201504
aper_f_201505 aper_f_201506;
datalines;
1 0 1 2 3 5 7
2 -1 5 4 8 7 9
;
run;
%macro test;
%let date = '01Jul2015'd;
data test2;
set test(keep=do i = 1 to 3;
aper_f_%sysfunc(intnx(month,&date,-i,begin),yymmn6.);
end;)
run;
%mend;
%test;
I need to iterate several dates.
Thank you very much.
You need to use a macro %DO loop instead of a DATA step DO loop, which is not valid in the middle of a data set option. Also, do not generate those extra semicolons in the middle of your data set options, and do include a semicolon to end your SET statement. Inside %SYSFUNC, refer to the loop counter as &i (without the ampersand, INTNX would receive the text -i rather than a number).
%macro test;
%local i date;
%let date = '01Jul2015'd;
data test2;
set test(keep=
%do i = 1 %to 3;
aper_f_%sysfunc(intnx(month,&date,-&i,begin),yymmn6.)
%end;
);
run;
%mend;
%test;
You can use the colon shortcut to reference variables with the same prefix; every variable whose name starts with the text before the colon will be kept.
keep ID aper_f_2015: ;
There is also the hyphen for numbered range lists, when the names share a prefix and end in consecutive numbers:
keep ID aper_f_201501-aper_f_201512;
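Both shortcuts also work inside a KEEP= data set option, which is where the question tried to use a loop. A self-contained sketch on the question's TEST data (the output names KEEP_PREFIX and KEEP_RANGE are hypothetical):

```sas
data test;
  input id aper_f_201501-aper_f_201506;
  datalines;
1 0 1 2 3 5 7
2 -1 5 4 8 7 9
;
run;

/* Colon: keeps id plus every variable whose name starts with aper_f_2015 */
data keep_prefix;
  set test(keep=id aper_f_2015:);
run;

/* Hyphen: numbered range list, keeps id and aper_f_201504 to aper_f_201506 */
data keep_range;
  set test(keep=id aper_f_201504-aper_f_201506);
run;
```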
You could use a macro, but I'm not sure it adds much value here.
I have a bunch of character variables which I need to sort out from a large dataset. The unwanted variables all have entries that are the same or are all missing (meaning I want to drop these from the dataset before processing the data further). The data sets are very large, so this cannot be done manually, and I will be doing it many times, so I am trying to create a macro that does just this. I have created a list macro variable with all character variables using the following code (my real data is different, but I use the same kind of code):
data test;
input Obs ID Age;
datalines;
1 2 3
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
run;
proc contents
data = test
noprint
out = test_info(keep=name);
run;
proc sql noprint;
select name into : testvarlist separated by ' ' from test_info;
quit;
My idea is then to just use a data step to drop this list of variables from the original dataset. Now, the problem is that I need to loop over each variable, and determine if the observations for that variable are all the same or not. My idea is to create a macro that loops over all variables, and for each variable counts the occurrences of the entries. Since the length of this table is equal to the number of unique entries I know that the variable should be dropped if the table is of length 1. My attempt so far is the following code:
%macro ListScanner (org_list);
%local i next_name name_list;
%let name_list = &org_list;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
%put &next_name;
proc sql;
create table char_occurrences as
select &next_name, count(*) as numberofoccurrences
from &name_list group by &next_name;
select count(*) as countrec from char_occurrences;
quit;
%if countrec = 1 %then %do;
proc sql;
delete &next_name from &org_list;
quit;
%end;
%let i = %eval(&i + 1);
%end;
%mend;
%ListScanner(org_list = &testvarlist);
However, I get syntax errors, and with my real data I get other kinds of problems with not being able to read the data correctly, but I am taking one step at a time. I think I might be overcomplicating things, so if anyone has an easier solution or can see what might be wrong, I would be very grateful.
There are many ways to do this posted elsewhere.
But let's just look at the issues you are having.
First, for looping through your space-delimited list of names, it is easier to let the %DO loop increment the index variable for you. Use the COUNTW() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
...
%end;
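A runnable version of that loop on its own, assuming a hypothetical macro name SHOW_NAMES that only writes each name to the log:

```sas
%macro show_names(name_list);
  %local i next_name;
  /* COUNTW gives the number of blank-delimited words: the loop's upper bound */
  %do i = 1 %to %sysfunc(countw(&name_list, %str( )));
    %let next_name = %scan(&name_list, &i, %str( ));
    %put NOTE: variable &i is &next_name;
  %end;
%mend show_names;

%show_names(Obs ID Age)
```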
Second, where is your input dataset in your SQL code? Add another parameter to your macro definition for it. And where do you want to write the dataset without the empty columns? That is perhaps another parameter.
%macro ListScanner (dsname , out, name_list);
%local i next_name sep drop_list ;
Third, you can use a single query to count all of the variables at once. Just use count(distinct xxxx) instead of GROUP BY. COUNT(DISTINCT ...) ignores missing values, so an all-missing column gets a count of 0.
proc sql noprint;
create table counts as
select
%let sep=;
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
&sep. count(distinct &next_name) as &next_name
%let sep=,;
%end;
from &dsname
;
quit;
So this will get a dataset with one observation. You can use PROC TRANSPOSE to turn it into one observation per variable instead.
proc transpose data=counts out=counts_tall ;
var _all_;
run;
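Now you can just query that table to find the names of the columns with 0 non-missing values. (If you also want to drop columns where every non-missing value is the same, use where col1 <= 1 instead; note that this also drops columns that mix a single value with missing values.)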
proc sql noprint ;
select _name_ into :drop_list separated by ' '
from counts_tall
where col1=0
;
quit;
Now you can use the new DROP_LIST macro variable.
data &out ;
set &dsname ;
drop &drop_list;
run;
So now all that is left is to clean up after yourself.
proc delete data=counts counts_tall ;
run;
%mend;
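Put together, the pieces above form one macro. This is a sketch: the %IF guard around DROP is an addition so that no empty DROP statement is generated when nothing qualifies, and the small TEST data set with an all-missing column EMPTY and the output name TEST_CLEAN are hypothetical.

```sas
%macro ListScanner(dsname, out, name_list);
  %local i next_name sep drop_list;

  /* One pass: count distinct non-missing values of every listed column */
  proc sql noprint;
    create table counts as
    select
    %let sep=;
    %do i = 1 %to %sysfunc(countw(&name_list, %str( )));
      %let next_name = %scan(&name_list, &i, %str( ));
      &sep. count(distinct &next_name) as &next_name
      %let sep=,;
    %end;
    from &dsname;
  quit;

  /* One row per variable: _NAME_ and COL1 (the distinct count) */
  proc transpose data=counts out=counts_tall;
    var _all_;
  run;

  /* Columns with no non-missing values */
  proc sql noprint;
    select _name_ into :drop_list separated by ' '
      from counts_tall
      where col1 = 0;
  quit;

  data &out;
    set &dsname;
    %if %length(&drop_list) %then %do;
      drop &drop_list;
    %end;
  run;

  proc delete data=counts counts_tall;
  run;
%mend ListScanner;

data test;
  input Obs ID Age Empty;
  datalines;
1 2 3 .
2 2 1 .
3 3 2 .
;
run;

%ListScanner(test, test_clean, Obs ID Age Empty)
```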
As for your specific initial question, this is fairly straightforward. Assuming &testvarlist is your macro variable containing the variables you are interested in, and creating some test data in have:
%let testvarlist=x y z;
data have;
call streaminit(7);
do id = 1 to 1e6;
x = floor(rand('Uniform')*10);
y = floor(rand('Uniform')*10);
z = floor(rand('Uniform')*10);
if x=0 and y=4 and z=7 then call missing(of x y z);
output;
end;
run;
data want fordel;
set have;
if min(of &testvarlist.) = max(of &testvarlist.)
and (cmiss(of &testvarlist.)=0 or missing(min(of &testvarlist.)))
then output fordel;
else output want;
run;
This isn't particularly inefficient, but there are certainly better ways to do this, as referenced in comments.
I have a data set in which each row needs to be blown out into a certain number of rows according to a dynamic value. Take the dataset below for example:
DATA HAVE;
LENGTH ID $3 COUNT 3;
INPUT ID $ COUNT;
DATALINES;
A 4
B 3
C 1
D 2
;
RUN;
ID=A needs to be blown out to 4 rows, ID=B to 3 rows, etc. The resulting dataset would look like this (minus a bunch of other variables I have):
A 1
A 2
A 3
A 4
B 1
B 2
B 3
C 1
D 1
D 2
The following code works to an extent, but I'm having trouble dynamically setting the &COUNT. macro variable. I tried to insert a CALL SYMPUTX("COUNT",COUNT) statement so that, as the step loops over each row, the count is placed into the macro variable and the row is blown out to that number of rows.
** THIS CODE ONLY WORKS IF YOU SET COUNT= TO SOME VALUE **;
%MACRO LOOPOVER();
DATA WANT; SET HAVE;
DO UNTIL(LAST.ID);
BY ID;
%DO I=1 %TO &COUNT.;
COUNT = &I.; OUTPUT;
%END;
END;
RUN;
%MEND;
%LOOPOVER;
** THIS CODE DOESN'T WORK BUT I'M NOT SURE WHY?? **;
%MACRO LOOPOVER();
DATA WANT; SET HAVE;
DO UNTIL(LAST.ID);
BY ID;
CALL SYMPUTX("COUNT",COUNT); /* NEW LINE HERE */
%DO I=1 %TO &COUNT.;
COUNT = &I.; OUTPUT;
%END;
END;
RUN;
%MEND;
%LOOPOVER;
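It is unnecessary to use a macro. The macro %DO loop is resolved when the DATA step is compiled, before any row is read, so a value assigned with CALL SYMPUTX at run time cannot change how many times it iterates. A DATA step DO loop can use COUNT from each row directly: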
data want(rename=(_count=count));
set have;
do i=1 to count;
_count=i;
output;
end;
drop count;
run;
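If the original COUNT value should be kept as well, a separate loop counter avoids the rename/drop step. A minimal sketch, where SEQ is a hypothetical name for the row number within each ID:

```sas
data have;
  length ID $3 COUNT 3;
  input ID $ COUNT;
  datalines;
A 4
B 3
C 1
D 2
;
run;

data want;
  set have;
  /* SEQ runs from 1 to this row's COUNT; one output row per iteration */
  do seq = 1 to count;
    output;
  end;
run;
```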