I have the following macro:
rsubmit;
data indexsecid;
input secid 1-6;
datalines;
108105
109764
102456
102480
101499
102434
107880
run;
%let endyear = 2014;
%macro getvols1;
* First I extract the secids for all the options given a date and
an expiry date;
%do yearnum = 1996 %to &endyear;
proc sql;
create table volsurface1&yearnum as
select a.secid, a.date, a.days, a.delta, a.impl_volatility,
a.impl_strike, a.cp_flag
from optionm.vsurfd&yearnum as a, indexsecid as b
where a.secid=b
and a.impl_strike NE -99.99
order by a.date, a.secid, a.impl_strike;
quit;
%if &yearnum > 1996 %then %do;
proc append base= volsurface11996 data=volsurface1&yearnum;
run;
%end;
%end;
%mend;
%getvols1;
proc download data=volsurface11996;
run;
endrsubmit;
data _null_;
set work.volsurface11996;
length fv $ 200;
fv = "C:\Users\user\Desktop\" || TRIM(put(indexsecid,4.)) || ".csv";
file write filevar=fv dsd dlm=',' lrecl=32000 ;
put (_all_) (:);
run;
On the code above I have: where a.secid=108105. Now I have a list with several secid and I need to run the macro once for each secid. I am looking to run it once and generate a new dataset for each secid.
How can I do that? Thanks
Here is an approach that uses
A single data step set statement to combine all the input datasets
A data set list so you don't have to call each input by name
A hash table to limit the output to your list of secids
proc sort to order the output
Rezza/DWal's approach to output separate csvs with file filevar =
%let startyear = 1996;
%let endyear = 2014;
data volsurface1;
/* Read in all the input tables */
set optionm.vsurfd&startyear.-optionm.vsurfd&endyear.;
where impl_strike ~= -99.99;
/* Set up a hash table containing all the wanted secids */
if _N_ = 1 then do;
declare hash h(dataset: "indexsecid");
_rc = h.defineKey("secid");
_rc = h.defineDone();
end;
/* Only keep observations where secid is matched in the hash table */
if not h.find();
/* Select which variables to output */
keep secid date days delta impl_volatility impl_strike cp_flag;
run;
/* Sort the data */
proc sort data = volsurface1;
by secid date secid impl_strike;
run;
/* Write out a CSV for each secid */
data _null_;
set volsurface1;
length fv $200;
fv = "\path\to\output\" || trim(put(secid, 6.)) || ".csv";
file write filevar = fv dsd dlm = ',' lrecl = 32000;
put (_all_) (:);
run;
As I don't have your data this is untested. The only constraint I can see is that the contents of indexsecid must fit in memory. If you were not concerned with the order this could be all done in one data step.
SRSwift thank you for your comprehensive answer. It run smoothly with no errors. The only issue is that I am running it on a remote server (wharton) using:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=wrds;
signon username=_prompt_;
rsubmit;
and on the log it says it wrote the file to my folder on the server but I can t see any file on the server. The log says:
NOTE: The file WRITE is:
Filename=/home/uni/user/108505.csv,
Owner Name=user,Group Name=uni,
Access Permission=rw-r--r--,
Last Modified=Wed Apr 1 20:11:20 2015
Related
Thanks in advance for any and all suggestions.
I am working in SAS for the first time to complete a (theoretically) simple task. I have a parent folder in a Windows directory which contains several sub-folders. The sub-folders are not systematically named. For example, if the parent folder is called "W:/Documents/ParentFolder/", then the sub-folders might be "W:/Documents/ParentFolder/ABC1D26/" and "W:/Documents/ParentFolder/HG34A/".
Each sub-folder contains several SAS datasets. In any particular sub-folder, some of the SAS datasets have the .sas7bdat extension and others have the .sd2 extension. Furthermore, no two sub-folders necessarily have the same number of datasets, and the datasets are not systematically named either.
I would like to write a program in SAS which looks inside each sub-folder, loads any .sas7bdat or .sd2 datasets it finds, and exports the dataset into a different folder as a .dta file.
There are too many SAS datasets in each sub-folder to do this task manually for each dataset, but there are not so many sub-folders that I cannot feed the sub-folder names to SAS manually. Below is a commented version of my attempt at a program which completes this task. Unfortunately, I encounter many errors, no doubt due to my inexperience with SAS.
For example, SAS gives the following errors: "ERROR: Invalid logical name;" "ERROR: Error in the FILENAME statement;" and "ERROR: Invalid DO loop control information;" among others.
Can anyone offer any advice?
%macro sas_file_converter();
/* List the sub-folders containing SAS files in the parent folder */
%let folder1 = W:\Documents\ParentFolder\ABC1D26;
%let folder2 = W:\Documents\ParentFolder\HG34A;
/* Start loop over the sub-folders. In each sub-folder, identify all the files, extract the file names, import the files, and export the files. */
%do folder_iter = 1 %to 2;
/* Define the sub-folder that is the focus of this iteration of the loop */
filename workingFolder "&&folder&folder_iter..";
/* Extract a list of datasets in this sub-folder */
data datasetlist;
length Line 8 dataset_name $300;
List = dopen('workingFolder');
do Line = 1 to dnum(List);
dataset_name = tranwrd(tranwrd(lowcase(trim(dread(List,Line))),".sas7bdat",""),".sd2","");
output;
end;
drop List Line;
run;
/* Get number of datasets in this sub-folder */
proc sql nprint;
select count(*)
into :datasetCount
from WORK.datasetlist;
quit;
/* Loop over datasets in the sub-folder. In each iteration of the loop, load the dataset and export the dataset. */
%do dataset_iter = 1 %to &datasetCount.;
/* Get the name of the dataset which is the focus of this iteration */
data _NULL_;
set WORK.DATASETLIST (firstobs=&dataset_iter. obs=&dataset_iter.);
call symput("inMember",strip(dataset_name));
end;
/* Set the libname */
LIBNAME library '&folder&folder_iter..';
/* Load the dataset */
data new;
set library.&inMember.;
run;
/* Export the dataset */
proc export data=library.&inMember.
file = "W:\Documents\OutputFolder\&inMember..dta"
dbms = stata replace;
run;
%end;
%end;
%mend;
Thanks very much for your helpful suggestions. I used the following program to perform this task. It is largely based on Richard's example. I'm posting it here for the benefit of future readers; Richard's example includes additional code that may help you understand what this program does.
Additional files/folders can be accommodated by adding them to the "%let folders" line. (I write many file/folder names here.)
Note that I separate the sub-folders with three dashes ("---") because some of the files and sub-sub-folders have spaces in their names. Note also that for the .sd2 files, I was able to simply replace the instances of "sas7bdat" with "sd2" and the program worked fine.
Thanks again.
%let inputfolder = W:\Documents\ParentFolder;
%let folders = ABC1D26---HG34A---Sub Folder\ZH323;
%let exportfolder = W:\Documents\ExportFolder;
data _null_;
do findex = 1 to countw("&folders.","---");
folder = scan("&folders", findex, "---");
path = catx("/", "&dataroot.", folder);
call execute ('libname user ' || quote(trim(path)) || ';');
length fileref $8;
call missing(fileref);
rc = filename(fileref, path);
did = dopen(fileref);
do dindex = 1 to dnum(did);
filename = lowcase(dread(did,dindex));
if scan(filename,-1) ne 'sas7bdat' then continue;
xptfilename = tranwrd(filename, '.sas7bdat', '.dta')
xptfilepath = catx("\", "&exportpath", folder, xptfilename);
datasetname = tranwrd(filename, '.sas7bdat', '');
sascode = 'PROC EXPORT data=' || trim(datasetname) || " replace file=" || quote(trim(xptfilepath)) || " dbms=stata; run;";
call execute (trim(sascode));
end;
did = dclose(did);
call execute ('libname user clear;');
rc = filename(fileref);
end;
run;
You can perform all the code generation in a DATA Step and submit via CALL EXECUTE. The only part of the program that would be macro related is specifying the sas data root folder, the names of the sub-folders to search and the export path.
The program could be very similarly macro coded, but could be tougher to debug, and would require %sysfunc wrappers around the function calls.
Example:
/* Create some sample data in some example folders */
%let workpath = %sysfunc(pathname(WORK));
%let name = %sysfunc(dcreate(ABC, &workpath));
%let name = %sysfunc(dcreate(DEF, &workpath));
libname user "&workpath./ABC";
data one two three four five;
set sashelp.class;
run;
libname user "&workpath./DEF";
data six seven eight nine ten;
set sashelp.class;
run;
libname user clear;
/* export all data sets in folders to liked named export files */
%let dataroot = &workpath;
%let folders = ABC DEF;
%let exportpath = c:\temp;
data _null_;
do findex = 1 to countw("&folders");
folder = scan("&folders", findex);
path = catx("/", "&dataroot.", folder);
call execute ('libname user ' || quote(trim(path)) || ';');
length fileref $8;
call missing(fileref);
rc = filename(fileref, path);
did = dopen(fileref);
do dindex = 1 to dnum(did);
filename = dread(did,dindex);
if scan(filename,-1) ne 'sas7bdat' then continue;
xptfilename = tranwrd(filename, '.sas7bdat', '.dta');
xptfilepath = catx('/', "&exportpath", xptfilename);
datasetname = tranwrd(filename, '.sas7bdat', '');
sascode = 'PROC EXPORT data=' || trim(datasetname)
|| " replace file=" || quote(trim(xptfilepath))
|| " dbms=stata;"
;
call execute (trim(sascode));
end;
did = dclose(did);
call execute ('run; libname user clear;');
rc = filename(fileref);
end;
run;
I need to do inner join with a dataset which has date and month in its name, i.e.
Account_2019_10 (as in Oct 2019).
I need to perform this inner join in a loop for each month from a specific month-year till today's month-year.(i.e. from Sept 2019 till July 2020). Considering the dataset has the month & year in the above format (2019_10 for Oct 2019), how would i perform this loop and append all the results in a group for that month-year?
To change the name of the dataset being referenced you will need to use some code generation. Typically just by using macro variables in place of the dataset name(s).
To loop over dates using an offset value and the INTNX() function. You can use INTCK() to determine how many months to generate.
data _null_;
start = '01OCT2019'd ;
end = '01JUL2020'd ;
length name $32 names $1000;
do offset=0 to intck('month',start,end);
date=intnx('month',start,offset);
name='account_'||translate(put(date,yymm7.),'_','M');
names=catx(' ',names,name);
end;
call symputx('names',names);
run;
Now that you have this list of dataset names you can use it in your code to combine the datasets.
data all;
set &names ;
run;
If your monthly tables do not actually have a variable already that indicates the month you can add one by using the INDSNAME= option of the SET statement. Note that the variable created by that option is not saved so you need to copy the value.
data all;
length dsname $41 ;
set &names indsname=dsname;
month = dsname;
run;
Use the SQL data dictionary to identify the data sets containing a yyyy_mm construct.
Select them into a macro variable that will be used to combine all the data sets in a SET statement with INDSNAME option.
Example:
%macro makefakedata();
%local year month amount;
%do year = 2019 %to 2020;
%do month = 1 %to 12;
data work.account_&year._%sysfunc(putn(&month,z2.));
do id = 1 to 10;
amount = 100 * %sysfunc(monotonic()) + id;
output;
end;
run;
%end;
%end;
%mend;
data work.foo_bar;
set sashelp.class;
run;
%makefakedata;
ods listing;
proc sql noprint;
select
catx('.', libname, memname) as dataset
, input (
cats ( substr ( memname, index(memname,'_')+1 ) , '_01' )
, ? YYMMDD10.
) as month
into
:datasets separated by ' '
, :months separated by ' '
from
dictionary.tables
where libname = 'WORK'
and index(memname,'_')
having
month
;
%put &=datasets;
%put &=months;
data all_month_named_data;
set &datasets indsname=from;
source = from;
month = input (
cats ( substr ( source, index(source,'_')+1 ) , '_01' )
, YYMMDD10.);
format month yymm7.;
run;
How do I get the dates from the file name to populate the date column?
I have 23 data files:
price_20070131
price_20070228
price_20070331
.
.
.
price_20081130
In the data file, price_20070131, it currently looks like this:
ID Product
001 A
002 B
003 C
I want my output to look like this:
ID Product Date
001 A 31Jan2007
002 B 31Jan2007
003 C 31Jan2007
The same will be repeated to all the 23 data files. And final result would merge all 23 files to look like this:
ID Product Date
001 A 31Jan2007
002 B 31Jan2007
003 C 31Jan2007
001 A 28Feb2007
002 B 28Feb2007
003 C 28Feb2007
.
.
.
.
001 A 30Nov2007
002 B 30Nov2007
003 C 30Nov2007
Use the INDSNAME option to add the file name and then use SCAN/SUBSTR() to extract the date portion. This would append all data sets starting with price_2007 and price_2008 and add a date field.
data want;
set price_2007: price_2008: indsname=source;
date=input(scan(source, 2, '_'), yymmdd10.);
format date date9.;
run;
EDIT: SAS 9.1 is about 15 years old so you should really upgrade. Upgrades are included with your license. This means you don't have data set lists or the ability to use the INDSNAME option and means you either need a macro solution of some sort. 4 lines of code becomes 47...
Making an assumption that your data sets are named PRICE_LAST_DAY_MONTH consistently.
*sample data sets for demonstration;
data price_20080131;
set sashelp.class;
test=1;
run;
data price_20080229;
set sashelp.class;
test=2;
run;
%macro stack_data_add_date(start_date=, end_date=, outData=, debug=);
%*get parameters for looping, primarily the number of intervals;
data _null_;
start_date= input("&start_date", yymmdd10.);
end_date = input("&end_date", yymmdd10.);
n_intervals = intck('month', start_date, end_date);
call symputx('start_date', start_date, 'l');
call symputx('end_date', end_date, 'l');
call symputx('n_intervals', n_intervals, 'l');
run;
%*loop from 0 - starting time to end;
%do i=0 %to &n_intervals;
%*determine end of month date for dataset name;
%let date = %sysfunc(intnx(month, &start_date, &i., e));
%*output statistics for testing;
%if &debug=Y %then %do;
%put &n_intervals;
%put &start_date;
%put &end_date;
%end;
%*create a view with the data and date added in;
data _temp / view=_temp;
set price_%sysfunc(putn(&date, yymmddn8.));
date = &date.;
format date date9.;
run;
%*insert into master table;
proc append base=&outData data=_temp;
run;
%*delete view so it doesn't exist for next loop;
proc datasets lib=work nodetails nolist;
delete _temp / memtype=view;
run;quit;
%end;
%mend;
*test;
%stack_data_add_date(start_date=20080131, end_date=20080229, outData=want, debug=Y);
I am defined four variables here and each of the variables with different number of ICD10 codes:
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
now I want to use create an array that can easy mapping those variables that with my icd 10 table columns so that I could assign flags variables with it.
the regular way would be:
data test; set input;
if (dx1 in ( &DX_27800_CODE) or dx2 in (&DX_27800_CODE) or dx3 in (&DX_27800_CODE))
then dx_27800 = 1; else dx_27800 =0;
run;
in the regular way I would need to do this procedure four times to get all four flags variable. So I'm wondering if it could be done by using array.
data test; set input;
array dx_code10 [4] &DX_27800_CODE &DX_27801_CODE &DX_2859_CODE &DX_6202_CODE;
ARRAY DX_VARIABLE[4] DX_27800 DX_27801 DX_2859 DX_6202;
DO I = 1 TO DIM(dx_code10);
IF (DX1 IN (DX_CODE10[I]) OR DX2 IN (DX_CODE10[I]) OR DX3 IN (DX_CODE10[I]))
THEN DX_VARIABLE[I] = 1;
ELSE DX_VARIABLE[I] = 0;
END;
END;
RUN;
But seems like it can't be done by this way. Please help me to solve this problem. thanks.
I think a better approach is to use formats. I'd rather have those DX codes in a spreadsheet or a text file or something, and then input that to make the formats, but even with the not-best-practice %LETs, you can still use a format solution.
Approach is to make a format that turns each of those DX code pairs into a value that returns the dx value (the 27800, 27801, etc.); then use that to drive how you assign the followup array.
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
proc format;
value $dxcode
&dx_27800_code = '27800'
&dx_27801_code = '27801'
&dx_2859_code = '2859'
&dx_6202_code = '6202'
other=' '
;
quit;
data input;
input dx1 $;
datalines;
E6601
E6609
E6608
E661
E668
D649
D650
N8320
E669
N8329
;;;;
run;
data want;
set input;
array dx_codes[4] dx_27800 dx_27801 dx_2859 dx_6202;
dx_code_val = put(dx1,$dxcode5.);
do _i = 1 to dim(dx_codes);
if dx_code_val = scan(vname(dx_codes[_i]),2,'_') then dx_codes[_i]=1;
else dx_codes[_i]=0;
end;
run;
For your specific example you could use FINDW() function instead of the IN operator. Turn your code lists into delimited strings instead.
%LET DX_27800_CODE = E6609,E661,E668,E669;
%LET DX_27801_CODE = E6601 ;
%LET DX_2859_CODE = D649 ;
%LET DX_6202_CODE = N8320,N8329;
data test;
set input;
array dx_code_list (4) $200 _temporary_ ("&dx_27800_code" "&dx_27801_code" "&dx_2859_code" "&dx_6202_code");
array dx_variable (4) dx_27800 dx_27801 dx_2859 dx_6202;
array dx dx1-dx3 ;
do i = 1 to dim(dx_variable);
dx_variable(i)=0;
do j=1 to dim(dx) while (dx_variable(i)=0);
if findw(dx_code_list(i),dx(j),',','it') then dx_variable(i)=1;
end;
end;
drop i j;
run;
So if I make some sample data.
data input ;
length dx1-dx3 $7 ;
input dx1 - dx3 ;
cards;
E6609 E661 .
E668 E669 .
E6601 . .
D649 N8320 N8329
. . .
;
I get this result:
How can I update the below code so that I don't get the 'log window full' message? Most of the lines generated in the log window are due to proc corr. I tried adding the proc printto line at the beginning of the code but log window still gets filled up for some reason. Thanks.
PROC PRINTTO PRINT='C:\Users\test\auto.lst' NEW;
RUN;
%MACRO RunProgram(month, year, n);
data sourceh.group2;
set sourceh.group_&month.&year.;
int1=int;
int2 = ceil(int/2);
int3 = ceil(int/3);
int4 = ceil(int/4);
int5 = ceil(int/5);
int6 = ceil(int/15);
int7 = ceil(int/30);
proc sort data=sourceh.group2;
by symbol day month year int&n.;
run;
proc corr data=sourceh.group2; by symbol day;
var zone ztwo;
ods output pearsoncorr=sourceh.zcorr;
run;
%MEND ;
%macro l;
%do n=1 %to 7;
%RunProgram(Dec, 2014, &n);
%RunProgram(Nov, 2014, &n);
%end;
%mend;
%l;
Redirect your log using proc
Printto.
Proc printto log='templog.log' new;
Run;
You can reset if afterwards using
Proc printto log=log; run;
Alternatively you can set the option nonotes on so that the log doesn't get output unless there's an error. This can make it hard to debug.
Option nonotes;
Turn notes option back on:
Option notes;
Generally, I think you should output your log so that you can check to see if something's gone wrong; Reeza's answer addresses this approach. However, you can also just clear the log by using the command
dm 'clear log';
If you insert this as the first or last line of your RunProgram macro, your log will clear on each iteration of the macro. This will get rid of your problem so long as one iteration of the macro doesn't fill up your log.