First of all, apologies for my poor English; I'm not a native speaker. I'm also new to SAS programming, and I need help with a problem I've been struggling with.
I have a dataset A containing a numeric field YM representing year and month (e.g., 200902) that I'm using to filter the dataset. In particular, I want to get N filtered datasets using N different values of YM.
A_filtered_200901 = A.filter(YM == 200901)
A_filtered_200902 = A.filter(YM == 200902)
A_filtered_200903 = A.filter(YM == 200903)
...
My idea was to generate the sequence of YM values used for filtering and then pass it as an argument to a %macro containing a PROC SQL. In code/pseudocode:
data ym_dataset;
date = input(put(20090201, 8.), yymmdd8.);
do i = 1 to 3;
aux1 = intnx('MONTH', date, i);
aux2 = put(aux1, yymmddn8.);
list_of_ym_values = substr(aux2 , 1, 6);
output;
end;
run;
%macro my_macro(list_of_ym_values);
proc sql;
%do i = 1 %to dim(&list_of_ym_values)
select *
from A
where YM = &list_of_ym_values(i)
%end
quit;
%mend my_macro;
%my_macro(ym_dataset[list_of_ym_values])
I know that this is not the correct approach, but I hope someone can shed some light on how to do it properly.
Thank you!!
You need to loop through the list of values, and those values can be stored in a macro variable. But, as @Richard suggested in the comments, splitting datasets is usually not a great idea.
/* create a macro variable holding all values */
proc sql;
select list_of_ym_values into :list
separated by "|" from ym_dataset;
quit;
%put &list;
/* scan through each value and create a new dataset */
%macro one;
%local i val;
/* the delimiter belongs inside the COUNTW call, unquoted */
%do i=1 %to %sysfunc(countw(&list,|));
%let val=%scan(&list,&i,|);
proc sql;
create table want_&val as
select * from ym_dataset
where list_of_ym_values = "&val";
quit;
%end;
%mend;
%one;
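The same looping pattern can be pointed at the original table. The following is only a sketch: it assumes the dataset A and its numeric column YM from the question, and the names ym_list and split_by_ym are made up for illustration.

```sas
/* Sketch only: assumes dataset A has a numeric column YM (e.g. 200901). */
proc sql noprint;
  /* FORMAT=6. keeps the values as plain digits in the macro variable */
  select distinct YM format=6. into :ym_list separated by ' '
  from A;
quit;

%macro split_by_ym;
  %local i ym;
  %do i = 1 %to %sysfunc(countw(&ym_list, %str( )));
    %let ym = %scan(&ym_list, &i, %str( ));
    /* one output dataset per distinct YM value */
    data A_filtered_&ym;
      set A;
      where YM = &ym;
    run;
  %end;
%mend split_by_ym;

%split_by_ym
```

If A has no rows, ym_list is never created, so a real version would need to guard against that case.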
I want to add five target-year columns to work.komposit_prod through a loop.
I have the following code:
proc sql noprint;
select distinct year into :targetyears1 - FROM work.QE_Target
ORDER by year;
quit;
proc sql noprint;
select distinct Count(Jahr) into :Count_targetyears
FROM
(select distinct year FROM work.QE_Target);
quit;
%let max = &Count_targetyears;
data test ;
set work.komposit_prod;
Do i=1 to &max;
"ZZ_&&targetyears&i"n = .;
end;
run;
Somehow the reference "ZZ_&&targetyears&i"n cannot be resolved.
Can someone give me a hint?
Thank you.
Kind regards,
Ben
Your do loop references a data step variable rather than a macro variable:
Do i=1 to &max;
"ZZ_&&targetyears&i"n = .;
end;
You will need to convert this to a macro to run it:
%macro target_years;
data test ;
set work.komposit_prod;
%do i=1 %to &max;
"ZZ_&&targetyears&i"n = .;
%end;
run;
%mend;
%target_years
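If the goal is only to add empty columns, a macro loop may not be needed at all. As a sketch, assuming each value of year in work.QE_Target yields a valid SAS name when prefixed with ZZ_ (the macro variable name zz_vars is made up here), the names can be collected into one macro variable and declared in a single ARRAY statement:

```sas
/* Sketch only: build the list of new column names in one macro variable. */
proc sql noprint;
  select distinct cats('ZZ_', year) into :zz_vars separated by ' '
  from work.QE_Target;
quit;

data test;
  set work.komposit_prod;
  /* An ARRAY statement creates any listed variables that do not exist yet; */
  /* new numeric variables start out missing.                               */
  array zz {*} &zz_vars;
run;
```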
Can anyone help with this issue I'm having where the macro is only taking the final row value of the data?
I have some data that looks like this:
data data1 ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|4|5
11|2000|5|5|6
12|2001|1|2|3
13|2002|5|5|6
14|2001|2|2|2
15|2000|3|3|4
;;;;
However I need it to use the year variable to determine which data to keep, and then change all the values for the years after that value to missing, like so:
data data1 ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|.|.
11|2000|5|.|.
12|2001|1|2|.
13|2002|5|5|6
14|2001|2|2|.
15|2000|3|.|.
;;;;
I've been trying to get this macro to work, but it only works intermittently and works just for the final row of the data rather than looping through the rows.
%macro macro2 (output=, input=);
data &output;
set &input;
%DO I = 1 %TO 6;
%do; call symput('value2',trim(left(put(year_age_15,8.))));
temp_col=&value2.;
%let year_end=&value2.;
%put YEAR END IS: &year_end.;
%put EDU YEAR IS: EDU_&year_end.;
%do year = &year_end. %TO 2002;
%put &year.;
EDU_&year.=.;
%end;
%end;
%end;
run;
%MEND macro2;
%macro1(input=testset, output=output_testset);
In R it could be something simple like :
for(i in 1:6){
do this
}
Any advice? I can't figure out which bit is going wrong, thanks!
So, I think the issue here is your data is at the wrong level. You certainly can do what Reeza suggests, and I think it's probably reasonable to do so, but the reason why this is a bit complicated is that you have data in your variable name. That's not a best practice - your variable name should be "education" and your data should have a row for each year. Then this would be a simple WHERE statement!
Here's a simple PROC TRANSPOSE that turns it to the right structure, and then if you really need it the other way, a second one will turn it back. The where statement can be in the proc transpose or could be used somewhere else.
proc transpose data=data1 out=data_t (where=(year_Age_15 ge input(scan(_NAME_,2,'_'),4.)));
by id year_Age_15;
var edu_:;
run;
proc transpose data=data_t out=want;
by id year_age_15;
id _name_;
var col1;
run;
Create an array and index it by years rather than the default 1:n.
Loop through your array starting at year+1 and set the elements to missing.
data want;
set data1;
array educ(2000:2002) edu_2000-edu_2002;
if (year_age_15 +1) <= hbound(educ) then do i= (year_age_15 +1) to hbound(educ);
call missing(educ(i));
end;
run;
As @Joe mentions, the year to match is part of a variable name, which is tremor-inducing 'data in the metadata'.
You can use the VNAME function to retrieve the variable name of an array element accessed by index. Use that feature to compare against the expected variable name whilst looping over an array built from the variables named EDU*.
Example:
data have ;
infile datalines dsd dlm='|' truncover;
input id :$2. year_age_15 EDU_2000 EDU_2001 EDU_2002 ;
datalines4;
10|2000|3|4|5
11|2000|5|5|6
12|2001|1|2|3
13|2002|5|5|6
14|2001|2|2|2
15|2000|3|3|4
;;;;
data want;
set have;
array edus edu_:;
* find index of element corresponding to variable name having year;
do _n_ = 1 to dim(edus) until (upcase(vname(edus(_n_))) = cats('EDU_',year_age_15));
end;
* fill in elements at indices post the found one with missing values;
do _n_ = _n_+1 to dim(edus);
call missing(edus(_n_));
end;
run;
I have a bunch of character variables which I need to sort out from a large dataset. The unwanted variables all have entries that are the same or are all missing (meaning I want to drop these from the dataset before processing the data further). The data sets are very large so this cannot be done manually, and I will be doing it a lot of times so I am trying to create a macro which will do just this. I have created a list macro variable with all character variables using the following code (The data for my part is different but I use the same sort of code):
data test;
input Obs ID Age;
datalines;
1 2 3
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
run;
proc contents
data = test
noprint
out = test_info(keep=name);
run;
proc sql noprint;
select name into : testvarlist separated by ' ' from test_info;
quit;
My idea is then to just use a data step to drop this list of variables from the original dataset. Now, the problem is that I need to loop over each variable, and determine if the observations for that variable are all the same or not. My idea is to create a macro that loops over all variables, and for each variable counts the occurrences of the entries. Since the length of this table is equal to the number of unique entries I know that the variable should be dropped if the table is of length 1. My attempt so far is the following code:
%macro ListScanner (org_list);
%local i next_name name_list;
%let name_list = &org_list;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
%put &next_name;
proc sql;
create table char_occurrences as
select &next_name, count(*) as numberofoccurrences
from &name_list group by &next_name;
select count(*) as countrec from char_occurrences;
quit;
%if countrec = 1 %then %do;
proc sql;
delete &next_name from &org_list;
quit;
%end;
%let i = %eval(&i + 1);
%end;
%mend;
%ListScanner(org_list = &testvarlist);
However, I get syntax errors, and with my real data I get other problems with the data not being read correctly, but I am taking one step at a time. I suspect I might be overcomplicating things, so if anyone has an easier solution or can see what might be wrong, I would be very grateful.
There are many ways to do this posted around.
But let's just look at the issues you are having.
First for looping through your space delimited list of names it is easier to let the %do loop increment the index variable for you. Use the countw() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
...
%end;
Second, where is your input dataset in your SQL code? Add another parameter to your macro definition. And where do you want to write the dataset without the empty columns? So perhaps another parameter for that.
%macro ListScanner (dsname , out, name_list);
%local i next_name sep drop_list ;
Third, you can use a single query to count all of the variables at once. Just use count( distinct xxxx ) instead of group by.
proc sql noprint;
create table counts as
select
%let sep=;
%do i=1 %to %sysfunc(countw(&name_list,%str( )));
%let next_name = %scan(&name_list,&i,%str( ));
&sep. count(distinct &next_name) as &next_name
%let sep=,;
%end;
from &dsname
;
quit;
So this will get a dataset with one observation. You can use PROC TRANSPOSE to turn it into one observation per variable instead.
proc transpose data=counts out=counts_tall ;
var _all_;
run;
Now you can just query that table to find the names of the columns with 0 non-missing values.
proc sql noprint ;
select _name_ into :drop_list separated by ' '
from counts_tall
where col1=0
;
quit;
Now you can use the new DROP_LIST macro variable.
data &out ;
set &dsname ;
drop &drop_list;
run;
So now all that is left is to clean up after yourself.
proc delete data=counts counts_tall ;
run;
%mend;
As far as your specific initial question, this is fairly straightforward. Assuming &testvarlist is your macro variable containing the variables you are interested in, and creating some test data in have:
%let testvarlist=x y z;
data have;
call streaminit(7);
do id = 1 to 1e6;
x = floor(rand('Uniform')*10);
y = floor(rand('Uniform')*10);
z = floor(rand('Uniform')*10);
if x=0 and y=4 and z=7 then call missing(of x y z);
output;
end;
run;
data want fordel;
set have;
if min(of &testvarlist.) = max(of &testvarlist.)
and (cmiss(of &testvarlist.)=0 or missing(min(of &testvarlist.)))
then output fordel;
else output want;
run;
This isn't particularly inefficient, but there are certainly better ways to do this, as referenced in comments.
I have lots of tables which I would like to sort with Proc Sort. (The names of the tables are written in a text file.) To avoid repeating the same code all over again I have tried creating a macro that would import the text file, create an array consisting of those table names and finally sort all the tables. However, I came across a few problems. In Python, I would easily be able to loop through an array. But in SAS, I am not sure how to do it.
%MACRO SORT_TABLES();
PROC IMPORT
DATAFILE = 'TABLES_LIST.txt'
OUT = WORK.TABLES_LIST (RENAME = VAR1 = TABLE_NAME)
DBMS = TAB
REPLACE;
GETNAMES = NO;
QUIT;
/* GET THE LIST OF TABLE NAMES: */
PROC SQL NOPRINT;
SELECT
DISTINCT TABLE_NAME
INTO :TABLEVAR1 - :TABLEVAR&SYSMAXLONG
FROM
WORK.TABLES_LIST;
QUIT;
DATA _NULL_;
ARRAY TABLE_NAMES $ &TABLEVAR1 - &TABLEVAR&SYSMAXLONG;
RUN;
%DO %OVER TABLE_NAMES
PROC SORT
DATA = &TABLEVAR1 /* how can I iterate here???? */
OUT = 'WORK.'||&TABLEVAR1;
BY A B C;
QUIT;
%END;
%MEND;
Just use an iterative %DO loop inside your macro to loop over your "array" of macro variables.
proc sql noprint ;
select distinct table_name
into :tablevar1 -
from tables_list
;
quit;
%do i=1 %to &sqlobs ;
proc sort data=&&tablevar&i ; by _all_ ; run;
%end;
But you don't need a macro for this. There are easier ways to generate code.
filename code temp;
data _null_;
set tables_list ;
/* write the generated statements to the temp file, not the log */
file code ;
put 'PROC SORT DATA = ' table_name '; BY _all_; run;' ;
run;
%include code / source2 ;
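Another way to generate the code without a macro is CALL EXECUTE, which queues each generated statement to run after the data step finishes. This is a sketch that assumes the tables_list dataset and its table_name column from the question:

```sas
/* Sketch only: one PROC SORT is queued per row of tables_list. */
data _null_;
  set tables_list;
  /* CATS strips trailing blanks from table_name before concatenating */
  call execute(cats('proc sort data=', table_name, '; by _all_; run;'));
run;
```

The %INCLUDE version has the advantage that the generated file can be inspected before it runs.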
I want to repeat some code, varying one of the parameters and test whether a condition is met. If the condition is met, I want to leave the loop, if not I want to proceed to the next value of the parameter. I am using the below code, which works fine except that it does not leave the loop when I expect it to. Despite the summary showing that the condition should have been met it always seems to resolve to False.
%macro set_downward_caps(year, in_year_tolerance, large, small, start, end, increment);
%do c = &start. %to &end. %by &increment.;
%let nominal_down_large_&year. = %sysevalf(&large. + (&c. / 1000));
%let nominal_down_small_&year. = %sysevalf(&small. + (&c. / 100));
%let real_down_large_&year. = %sysevalf((1 - &&nominal_down_large_&year.) * &&rpi&year.);
%let real_down_small_&year. = %sysevalf((1 - &&nominal_down_small_&year.) * &&rpi&year.);
%rates(&year.);
proc means data = output.s_&scenario. noprint nway;
var transbill&year.;
output out = temporary (drop = _type_ _freq_) sum=cost;
run;
data _null_;
set temporary;
call symput('cost', cost);
run;
data temp;
length scenario $ 30;
scenario = "&scenario.";
large = &&real_down_large_&year.;
small = &&real_down_small_&year.;
cost = &cost.;
run;
data output.summary_of_caps;
set output.summary_of_caps temp;
run;
%if %sysfunc(abs(&cost.)) le &in_year_tolerance. %then leave;
%end;
%mend set_downward_caps;
So the summary_of_caps table contains values that suggest that the following condition should have resolved to true:
%if %sysfunc(abs(&cost.)) le &in_year_tolerance. %then leave;
I've tried sticking it in sysevalf but to no avail.
I don't think there is a LEAVE equivalent for macro code. Why not just use a %GOTO? Or since you seem to want to totally leave the macro you could use %RETURN.
Also if you are comparing floating point numbers you need to use %SYSEVALF(). The implied %EVAL() call of the %IF statement will only handle integer arithmetic.
%if %sysevalf(%sysfunc(abs(&cost)) le &in_year_tolerance) %then %return;
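To see why this matters, here is a small sketch with made-up values (cost and tolerance are hypothetical macro variables, not the ones from the question). When an operand contains a decimal point, %EVAL cannot treat it as an integer and falls back to comparing the operands as text, so the two lines below can be expected to disagree:

```sas
/* Hypothetical values, chosen so text order and numeric order differ. */
%let cost = 10.5;
%let tolerance = 9.5;

/* %EVAL finds non-integer operands and compares them as character strings. */
%put NOTE: EVAL result is %eval(&cost le &tolerance);

/* %SYSEVALF converts both operands to floating point before comparing. */
%put NOTE: SYSEVALF result is %sysevalf(&cost le &tolerance);
```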
Why do the calculations in macro logic at all? I am not sure why you have all of those macro variables, unless the %RATES() macro is referencing them? But if it needs them why aren't they parameters to the macro like you are passing in &YEAR?
You have plenty of data steps in your current code where you could do the calculation there and just set a flag variable that you can use to control whether to exit the loop.
%macro set_downward_caps
(year
,in_year_tolerance
,large
,small
,start
,end
,increment
);
%local c leave ;
%do c = &start %to &end %by &increment;
%local nominal_down_large_&year ;
%local nominal_down_small_&year ;
%local real_down_large_&year ;
%local real_down_small_&year ;
%let nominal_down_large_&year. = %sysevalf(&large. + (&c. / 1000));
%let nominal_down_small_&year. = %sysevalf(&small. + (&c. / 100));
%let real_down_large_&year. = %sysevalf((1 - &&nominal_down_large_&year.) * &&rpi&year.);
%let real_down_small_&year. = %sysevalf((1 - &&nominal_down_small_&year.) * &&rpi&year.);
%rates(&year.);
proc means data = output.s_&scenario. noprint nway;
var transbill&year.;
output out = temporary sum=cost;
run;
data temp;
length scenario $ 30;
scenario = "&scenario.";
large = &&real_down_large_&year.;
small = &&real_down_small_&year.;
set temporary (keep=cost);
call symputx('leave',abs(cost) le &in_year_tolerance);
put (_all_) (=);
run;
data output.summary_of_caps;
set output.summary_of_caps temp;
run;
%if (&leave) %then %goto quit;
%end;
%quit:
%mend set_downward_caps;