SAS: creating multiple files from multiple data sets

I have 24 datasets that are structured in the same way. By that I mean the same column headers (time, date, price, stock symbol), data set structure etc. I don't wish to append all 24 files since one data set is too big to handle. I named my data sets "file1 file2 file3 file4 ... up to file24".
What I want to do is the following:
1. Change the date format in all 24 files at once.
2. Extract a specific stock symbol, such as 'Dell', from each file and append all the extracted 'Dell' data.
3. Create a loop that lets me change the stock symbol from 'Dell' to another symbol in my list, such as 'Goog', so that step (2) runs for all of my stock symbols.

To change the date format in a dataset, it might be a bad idea to loop through all the observations. The standard syntax is -
proc datasets library = your_libname nolist;
modify dataset_name;
format variable_name format_name;
quit;
Given that the modify statement does not take multiple SAS files, you will have to wrap it in a macro for all 24 files
%macro modformats();
proc datasets library = <your libname> nolist;
%do i = 1 %to 24;
modify file&i;
format <variable name> <format name>;
%end;
quit;
%mend modformats;
To extract and append all 'Dell' related data, it is best to use views.
For example, you first define a view (note that there is no physical dataset called 'all_files' created here) -
data all_files / view = all_files;
set file1-file24; /* numbered range list: file1 through file24 */
run;
and you can then write -
data dell;
set all_files;
where ticker = 'DELL';
run;
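For the third part of the question, one possible sketch loops over a list of symbols and reads the view once per symbol. The macro name `extract_symbols`, its `symbols=` parameter and the output dataset names are illustrative assumptions, and `ticker` is simply the column name used in the example above. The sketch also assumes each symbol is a valid SAS dataset name.

```sas
/* Sketch only: one output dataset per symbol, read from the all_files view. */
/* extract_symbols and symbols= are hypothetical names.                      */
%macro extract_symbols(symbols=);
  %local i sym;
  %do i = 1 %to %sysfunc(countw(&symbols, %str( )));
    %let sym = %scan(&symbols, &i, %str( ));
    data &sym;                          /* e.g. WORK.DELL, WORK.GOOG */
      set all_files;
      where upcase(ticker) = "%upcase(&sym)";
    run;
  %end;
%mend extract_symbols;

%extract_symbols(symbols=DELL GOOG)
```

Each iteration re-reads all 24 files through the view, so with a long symbol list a single pass that writes to several output datasets may be preferable.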

This is a prototype of a solution. I don't know whether you need to do many symbol changes; I will modify the code upon request. I haven't tested it, but it should work.
%macro test();
%do i=1 %to 24;
data file&i;
set file&i;
format date [dateformat]; /*replace with the format you want */
proc append base=unions data=file&i(where=(stock_symbol='Dell'));
data unions;
set unions;
stock_symbol='Goog';
%end;
%mend;
%test(); run;

Related

The SAS way to loop over a table outside a data step

I am wondering what the cleanest way is to perform a macro loop over a data table outside a data step, e.g. to read in the files listed in the table have and do some complex analysis for each of them.
Assume we have a table have containing a set of file names and other meta data:
N  filename  purpose
1  foo.xls   Blue team data
2  bar.xls   Red team data
I was thinking of something like
%local lines current_file current_purpose;
proc sql noprint;
select count(*) into: lines from have;
quit;
%do I=1 %to &lines.;
%put --- Process file number &I. ---;
data _null_;
set have;
if _n_=&I. then do;
call symput('current_file',filename);
call symput('current_purpose',purpose);
end;
run;
%put --- &current_file. contains &current_purpose.;
/* Here comes the actual analysis */
%end;
Is this the right way to do it? To me, this does not look like the simplest way.
Related questions:
SAS loop through datasets
SAS let statement: refer to a cell value?
Suppose you have defined a macro named ANALYSIS with input parameters FILENAME and PURPOSE.
%macro analysis(filename,purpose);
/* Here comes the actual analysis */
title &purpose ;
proc import datafile="&filename" ....
%mend;
Then you can use a data step to generate one call to the macro for each observation. You can use CALL EXECUTE, but I find it clearer and easier to debug to just write the code to a file and then %INCLUDE it, especially when the parameter name matches the variable name in the metadata being used to drive the code generation.
So this step :
filename code temp;
data _null_;
set have;
file code;
put '%analysis(' filename= ',' purpose= :$quote. ')' ;
run;
Will generate a program like:
%analysis(filename=foo.xls,purpose="Blue team data")
%analysis(filename=bar.xls,purpose="Red team data")
Which you can then run using
%include code / source2;
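For comparison, the CALL EXECUTE route mentioned above could look roughly like the following. This is a minimal sketch, assuming the same `have` table and `%analysis` macro; wrapping the macro name in `%nrstr` is a common idiom that delays the macro's execution until after the generating data step has finished.

```sas
/* Sketch: generate one %analysis call per observation with CALL EXECUTE. */
data _null_;
  set have;
  /* %nrstr() keeps the macro from executing while this step is still running */
  call execute(cats('%nrstr(%analysis)(filename=', filename,
                    ',purpose=', quote(strip(purpose)), ')'));
run;
```

The generated calls appear in the log as they are pushed onto the input stack, but unlike the %INCLUDE approach there is no file to inspect when something goes wrong.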

SAS: Having a macro loop over an array iteratively

In my data I define an array as all the variables starting with rev_:
data df;
set def;
array vnames rev:;
run;
And now I want to repeat the means procedure over this array. For example, let's say each element in vnames is a different class variable I'd like to be part of my command.
Let's say rev: actually expands to rev1 rev2 rev3 revolution
So I want sas to do this:
proc means data=df;
var rev1;
run;
proc means data=df;
var rev2;
run;
proc means data=df;
var rev3;
run;
proc means data=df;
var revolution;
run;
Now the function I end up calling might be more complex. I thought I should set up a macro and then run the array and macro together, but I have no idea how to do this.
I don't really have any sample data, but the idea is to run the same command (or series of commands, ie a macro) over a named array.
Tom's answer is right if it solves your actual problem; generally, SAS provides a lot of ways to do things that don't require macros to brute force. One PROC step will undoubtedly be faster than multiple.
But if you do need to, the answer is to look at dictionary.columns, sashelp.vcolumn, or even PROC CONTENTS output. Particularly since your list of rev variables is not just a numeric sequence (revolution), you can't simply iterate numerically. Don't forget that the array you define doesn't persist past that data step: arrays are data step programming tools and have no use in the macro language or in procs. rev: is still available in the proc, of course, but vnames[1] is not.
Say your macro is:
%macro runmeans(data=, var=, out=);
proc means data=&data.;
var &var.;
output out=&out. mean(&var.)=;
run;
%mend runmeans;
Then you can do something like this:
proc sql;
select cats('%runmeans(data=SASHELP.CLASS, var=',name,',out=M_',name,')')
into :runmeanslist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and upcase(name) like '%EIGHT'; *weight height;
quit;
&runmeanslist.
If you don't feel comfortable in SQL, you can do the same thing in a data step using call execute and sashelp.vcolumn dataset, or proc contents output written to a file.
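A data step version of the same idea might look like this. It is only a sketch, assuming the `%runmeans` macro defined above and the same SASHELP.CLASS example; it reads `sashelp.vcolumn` instead of `dictionary.columns` and issues each call with CALL EXECUTE.

```sas
/* Sketch: one %runmeans call per matching column, driven by sashelp.vcolumn. */
data _null_;
  set sashelp.vcolumn;
  where libname = 'SASHELP' and memname = 'CLASS'
        and upcase(name) like '%EIGHT';      /* weight, height */
  /* %nrstr() defers macro execution until this data step has ended */
  call execute(cats('%nrstr(%runmeans)(data=SASHELP.CLASS, var=', name,
                    ', out=M_', name, ')'));
run;
```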
It sounds like you just want the WAYS statement in PROC MEANS. But your sample code doesn't match your description of what you want. If you really want to find the means for all numeric variables and run it separately for many different class variables then this is the code you want.
proc means data=have ;
class rev: ;
ways 1;
run;
Here's an example of using the ways in proc means to do this only once (rather than looping or multiply executing) without wildcard.
proc sql;
select name into :varlist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
and not (upcase(name) like '%EIGHT');
quit;
proc means data=sashelp.class;
class &varlist.;
ways 1;
run;
Something like that. (I turn the CLASS statement around here: height and weight would be the VAR variables and the remaining variables the class variables, as that makes more sense.)
As the other answers suggest, there's probably a better way to get what you need in the larger context of your project. That being said, I thought call execute() was worth a mention as it comes close to what you were particularly looking for.
%macro SomeProc(dataset,variable);
proc univariate data=&dataset.;
var &variable.;
run;
%mend SomeProc;
data _null_;
set sashelp.Cars end=lastObs;
array vnames[*] MPG_:;
if lastObs then do i=1 to dim(vnames);
call execute('%SomeProc(sashelp.Cars,'||vname(vnames[i])||')');
end;
run;

SAS-Variable has been defined as both character and numeric

I have 100 datasets in a library (called DATA). When I try to merge them into one dataset, SAS says some variables are defined as both character and numeric. So I used the following code to fix the problem by changing the variables' formats. However, SAS still reported the error "Variable has been defined as both character and numeric" when I attempted to change their formats.
Edit: I changed my code to use the INPUT function to solve this problem. However, it reports the same error:
%macro step1(sourcelib=,source=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;
select count(memname)
into:obs
from mytables;
%let obs=&obs.;
select memname
into : memname1-:memname&obs.
from mytables;
quit;
data
%do i=1 %to &obs.;
&source.&&memname&i
%end;
;
set
%do i=1 %to &obs.;
&source.&&memname&i
%end;
;
price1=input(price,best12.);
volume1=input(volume,best12.);
bid_imp__vol1=input(bid_imp__vol,best12.);
Ask_Imp__Vol1=input(Ask_Imp__Vol,best12.);
drop price volume bid_imp__vol ask_Imp__Vol;
run;
data
%do i=1 %to &obs.;
&source.&&memname&i
%end;
;
set
%do i=1 %to &obs.;
&source.&&memname&i
%end;
(rename=(price1=price volume1=volume bid_imp__vol1=bid_imp__vol Ask_Imp__Vol1=Ask_Imp__Vol));
run;
%mend;
%step1(sourcelib='DATA',source=DATA.);
FORMAT has nothing whatsoever to do with the issue. Type is the issue. Format is "how do you want me to print this out in human readable format", but Type is "what is fundamentally in this block of memory".
You'll need to do one of three things here.
Drop the conflicting variables.
Rename the conflicting variables, either in a previous step or in the SET dataset options.
Change the type of the conflicting variables, either in a previous step or through renaming them in the SET dataset options and then converting to the proper type.
In a macro like the above, one thing you can do is to use dictionary.columns to determine what variables are the wrong type. Since you have a limited set of variables it looks like (i.e., this doesn't need to be totally generic), you can just pick one type (sounds like numeric) and query dictionary.columns for any variables that are not of that type. Then apply the conversion for those that meet the criteria (of being character).
The way you constructed your macro is going to make this a bit more complex though; I think you may want to have a separate macro that goes through each dataset and converts its type to the consistent type, one at a time, before you run this macro at all. Otherwise it's going to be a headache to manage the macro variable lists.
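A per-dataset conversion step along those lines could be sketched as follows. The macro name `fix_type` and its parameters are hypothetical, the numeric target type is an assumption taken from the discussion above, and `FILE1` in the example call is a placeholder member name.

```sas
/* Sketch: convert one variable to numeric if dictionary.columns says it is character. */
/* fix_type, lib=, mem= and var= are hypothetical names.                               */
%macro fix_type(lib=, mem=, var=);
  %local vtype;
  %let vtype=;
  proc sql noprint;
    select type into :vtype
    from dictionary.columns
    where libname = upcase("&lib") and memname = upcase("&mem")
          and upcase(name) = upcase("&var");
  quit;
  %let vtype = &vtype;                        /* trim padding from INTO */
  %if &vtype = char %then %do;
    data &lib..&mem;
      set &lib..&mem(rename=(&var=_&var));    /* move the character version aside */
      &var = input(_&var, ?? best32.);        /* ?? suppresses invalid-data notes */
      drop _&var;
    run;
  %end;
%mend fix_type;

%fix_type(lib=DATA, mem=FILE1, var=price)
```

Running this for each dataset and each conflicting variable before the combining step means the SET statement then sees one consistent type.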

Showing specific columns loaded from a CSV file in SAS

I have a question to ask.
I'm working with a small CSV data file on which I need to perform some calculations in SAS. I exported an Excel file to CSV format and want to load some of its columns into SAS. The problem I have encountered is that the columns are mismatched after loading. Here are the data and the code:
cars6.txt
AMC,Concord,22,2930,4099
AMC,Concord,22,2930,4099
AMC,Pacer,17,3350,4749
AMC,Spirit,22,2640,3799
Buick,Century,20,3250,4816
Buick,Electra,15,4080,7827
code to output data:
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" delimiter=',';
INPUT make $ model $ mpg $ weight price;
RUN;
TITLE "cars6 data";
PROC PRINT DATA=cars6(OBS=5);
RUN;
But I want to show only the columns make, weight and price.
How do I print selected columns?
And how do I do that if the file has named columns (that example differs from this one only by a header row of column names at the beginning)? I tried to refer to the columns by name; it printed them, but with the wrong data (SAS reads the columns in file order and ignores the column names):
input make $ model $ price $;
Thank you.
If you are writing your own program to read a CSV file then you probably want to use the DSD and FIRSTOBS=2 options on the INFILE statement. This will treat missing values properly and skip the line with the variable names. You also probably want to add the TRUNCOVER option to properly handle lines that have only some of the columns. It is worth the extra work to properly define your variables by including LENGTH or ATTRIB statements. Otherwise SAS will have to guess whether you want numeric or character variables and how long to make the character variables from the way that you first reference them.
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" DSD DLM=',' FIRSTOBS=2 TRUNCOVER;
LENGTH make model $20 mpg weight price 8 ;
INPUT make model mpg weight price;
RUN;
But your program will need to know the order of the variables in the file. If your data files are inconsistent then you can try using PROC IMPORT to read the CSV file. It can take the names from the first row and make an educated guess at what the variable types are.
proc import datafile='/folders/myfolders/hbv1/cars6.txt' out=car6 replace dbms=dlm ;
delimiter=',';
getnames=yes;
run;
When using the data from the SAS dataset you have created you can use the SAS language to select the columns of interest. Syntax will depend on the procedure you are using. So for PROC PRINT use the VAR statement.
proc print data=car6 ;
var price make model;
run;
And for PROC FREQ use the TABLES statement.
proc freq data=car6;
tables make model;
run;
Consider using PROC IMPORT and selecting the columns you need in PROC PRINT. PROC IMPORT can handle comma-separated files saved as either .txt or .csv. Below is a demonstration for either file type:
%Let fpath = /folders/myfolders/hbv1;
** READING IN TXT;
proc import
datafile = "&fpath/cars6.txt"
out = Cars6
dbms = csv replace;
run;
** READING IN CSV;
proc import
datafile = "&fpath/cars6.csv"
out = Cars6
dbms = csv replace;
run;
title "cars6 data";
proc print data=Cars6(obs=5);
var make model price;
run;
Alternatively, you can drop variables and re-order needed columns for report with retain:
data CarsReport;
retain make model price;
set Cars6;
keep make model price;
run;
title "cars6 data";
proc print data=CarsReport(obs=5);
run;
Try the VAR statement in PROC PRINT.
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" delimiter=',' firstobs=2;
INPUT make $ model $ mpg $ weight price;
RUN;
proc print data=cars6 noobs;
var make weight price;
run;

Looping over libraries in SAS

I have thousands of files named CLICK, and they all reside in different folders on my Linux machine. I have assigned every folder to a libref, and I am trying to extract each click file (and eventually append them, though that is not in the code shown below). This is what I have done:
`%let listlib=
A B C;
%macro char_loop();
%let i=1;
%let v=%scan(&listlib,&i);
%do %while(&v ne ) ;
data click&v;
set &v.click;
type = &v;
run;
%let i=%eval(&i+1);
%let v=%scan(&listlib,&i);
%end;
%mend;
%char_loop;`
However, it seems that SAS is not able to loop through "set &v.click;" and so does not change the library. The log says "WORK.ACLICK.DATA does not exist". What am I missing here?
&v. is the macro variable - & starts and . terminates. The . is not necessary if something else like a space or semicolon makes it obvious where the termination occurs, but it is technically a component. So you need
set &v..click
to get the actual period.
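The difference is easy to see with %PUT. A small illustration, using a throwaway value for v:

```sas
%let v = A;
%put &v.click;    /* the period ends the macro variable name: Aclick */
%put &v..click;   /* first period ends the name, second is literal: A.click */
```

The first form is why the log complains about WORK.ACLICK: with no literal period left, SAS sees a one-level dataset name and looks for it in WORK.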
On a side note, SAS isn't really very good at this sort of thing. You'd be better off getting perl or something similar to collect the click files into one directory, or better yet combine them into one file (I've actually done this before with clickstream files). SAS isn't very efficient at opening and closing lots of individual files and will take a lot longer to do it.
Adding to the other answer here, if you're going to set them in one pass (which is a good idea), the best way is probably not to macro loop. You can more easily do it like this:
*macro to define an element of the set statement;
%macro set(lib=);
&lib..click
%mend set;
*proc sql to generate a list of these calls from dictionary.tables - make sure you do not have any tables you need excluded from this, and if so use WHERE to do so;
proc sql;
select cats('%set(lib=',libname,')')
into :setlist separated by ' '
from dictionary.tables
where memname='CLICK';
quit;
*set them;
data want;
set &setlist. indsname=indsn;
type = scan(indsn,1,'.');
run;
Usually, macro looping is more complicated, and slower, than doing it through regular old data steps and SQL. indsname works in 9.3+.
As Joe pointed out, the main issue was the missing second period (&v..).
That said, it should be a little quicker if you set them on the initial read rather than making lots of work files and subsequently concatenating them.
Something like this should work:
*******************************************************************;
*** a few test datasets.
*** note: I prefixed that dataset with the libname because
*** they are all technically in the same directory
*** and to highlight the difference between &v. and &v..
*******************************************************************;
libname a (work);
libname b (work);
libname c (work);
data a.aclick;
do i = 1 to 10;
output;
end;
run;
data b.bclick;
do j = 1 to 10;
output;
end;
run;
data c.cclick;
do k = 1 to 10;
output;
end;
run;
*** modified macro ****;
%macro char_loop(listlib=);
%let ListN=%eval(%length(&listlib)-%length(%sysfunc(compress(&listlib)))+1);
data click(drop=_i);
set
%do i=1 %to &listN ;
%let v=%scan(&listlib,&i);
&v..&v.click (in=&v)
%end;
;
array _t &listlib;
do _i= 1 to &listN;
if _t(_i)=1 then type=vname(_t(_i));
end;
run;
%mend;
%char_loop(listlib=a b c );
proc print data=click;
run;
