I have a question.
I'm working with a small CSV database on which I need to perform some calculations in SAS. I exported an Excel file to CSV format and want to load some of its columns into SAS. The problem I have run into is that the column order is mismatched after loading. Here are the data and the code:
cars6.txt
AMC,Concord,22,2930,4099
AMC,Concord,22,2930,4099
AMC,Pacer,17,3350,4749
AMC,Spirit,22,2640,3799
Buick,Century,20,3250,4816
Buick,Electra,15,4080,7827
code to output data:
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" delimiter=',';
INPUT make $ model $ mpg $ weight price;
RUN;
TITLE "cars5 data";
PROC PRINT DATA=cars5(OBS=5);
RUN;
But I want to show only the columns make, weight and price.
So how do I print selected columns?
And how do I do that if the file has named columns (that example differs from this one only by a first row of column names, i.e. variables)? I tried to refer to the columns by name; SAS printed them, but with bad data (SAS reads the columns in file order and ignores the column names):
input make $ model $ price $;
Thank you.
If you are writing your own program to read a CSV file then you probably want to use the DSD and FIRSTOBS=2 options on the INFILE statement. This will treat missing values properly and skip the line with the variable names. You also probably want to add the TRUNCOVER option to properly handle lines that have only some of the columns. It is worth the extra work to properly define your variables by including LENGTH or ATTRIB statements. Otherwise SAS will have to guess whether you want numeric or character variables, and how long to make the character variables, from the way that you first reference them.
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" DSD DLM=',' FIRSTOBS=2 TRUNCOVER;
LENGTH make model $20 mpg weight price 8 ;
INPUT make model mpg weight price;
RUN;
But your program will need to know the order of the variables in the file. If your data files are inconsistent then you can try using PROC IMPORT to read the CSV file. It can take the names from the first row and make an educated guess at what the variable types are.
proc import datafile='/folders/myfolders/hbv1/cars6.txt' out=car6 replace dbms=dlm ;
delimiter=',';
getnames=yes;
run;
When using the data from the SAS dataset you have created you can use the SAS language to select the columns of interest. Syntax will depend on the procedure you are using. So for PROC PRINT use the VAR statement.
proc print data=car6 ;
var price make model;
run;
And for PROC FREQ use the TABLES statement.
proc freq data=car6;
tables make model;
run;
Consider using proc import and selecting columns as needed in proc print. Proc import can handle comma-separated files saved as either .txt or .csv. Below is a demonstration of either text file type:
%Let fpath = /folders/myfolders/hbv1;
** READING IN TXT;
proc import
datafile = "&fpath/cars6.txt"
out = Cars6
dbms = csv replace;
run;
** READING IN CSV;
proc import
datafile = "&fpath/cars6.csv"
out = Cars6
dbms = csv replace;
run;
title "cars6 data";
proc print data=cars6(obs=5);
var make model price;
run;
Alternatively, you can drop variables and re-order needed columns for report with retain:
data CarsReport;
retain make model price;
set Cars6;
keep make model price;
run;
title "cars6 data";
proc print data=CarsReport(obs=5);
run;
Try the VAR statement in PROC PRINT.
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" delimiter=',' firstobs=2;
INPUT make $ model $ mpg $ weight price;
RUN;
proc print data=cars6 noobs;
var make weight price;
run;
I'm working with a database in SAS that updates every so often. I want the macro to automatically load the most recent dataset of a given year. The datasets cover the years 2015-2018, and each year has a different updated version which is stated in the name of the dataset, e.g. 2015_version9. With my current code you need to update the macro manually every time a dataset changes its version and name.
You can scan through each library and find the max version number, then save those to a single macro variable string that you can supply to a set statement. Here are the assumptions of this solution:
Your libraries are named lib_2015, lib_2016, etc. and follow 8-char libname requirements
Your libraries are static for years 2015-2018
Your datasets are named _version1, _version2, etc.
Here's how we'll do it.
%let libraries = "LIB_2015", "LIB_2016", "LIB_2017", "LIB_2018";
proc sql noprint;
select cats(libname, '.', memname)
, input(compress(memname,,'KD'), 8.) as version
into :data separated by ' '
from dictionary.members
where upcase(libname) IN(&libraries.)
AND upcase(memname) LIKE "^_VERSION%" escape '^'
group by libname
having version = max(version)
;
quit;
data want;
set &data. indsname=name;
dsn = name;
run;
This code does the following:
Gets all dataset names from each library that start with _VERSION. The ^ in the like clause is an escape character that we defined so that we can match _ literally.
Removes all non-digits from the dataset name and converts it to a version number, version. The KD option in the compress() function says to keep only digits from the string.
Keeps only names in each library where version is the highest value
Saves all the dataset names to a single macro variable, &data
&data will store a string of all the relevant datasets you want with the highest version number for each library. For example:
%put &data.;
LIB_2015._VERSION9 LIB_2016._VERSION19 LIB_2017._VERSION12 LIB_2018._VERSION8
The indsname option in the data step will store the full dataset name of each observation. We're saving that to a variable named dsn. This shows where each observation comes from so you can split them out to individual datasets as needed.
An interview question that I could not answer; please help me solve it.
In an Excel file, a variable name has a space in it (e.g. Shop Name). If we bring the Excel data into SAS, how do we keep the name as it is, given that a space is not allowed in a variable name in a SAS dataset?
Code:
proc import datafile='/home/roshnigupta16020/test (2).xlsx' out=testexcel dbms=xlsx replace; getnames=yes; run;
Your PROC IMPORT syntax is good.
proc import datafile='/home/roshnigupta16020/test (2).xlsx'
out=testexcel dbms=xlsx replace
;
getnames=yes;
run;
Depending on the setting of the VALIDVARNAME option PROC IMPORT will create different names for the variables.
With VALIDVARNAME=ANY the names will include the spaces, which means that to use the name in your SAS code you will need to use name literals, like 'Column 1'n.
With other settings, like VALIDVARNAME=V7, then PROC IMPORT will replace invalid characters, like spaces, with underscores. Then the name will not exactly match the column header in the spreadsheet. But the name will be something like Column_1 which is easier to include in your SAS code.
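As a rough illustration of the two settings, the sketch below imports the same workbook twice. It assumes the sheet has a column headed Shop Name (taken from the question); the output dataset name testexcel_v7 is made up for the example.

```sas
/* Sketch only: assumes the workbook has a column headed "Shop Name". */

/* 1. Keep the header as it is; refer to it with a name literal. */
options validvarname=any;
proc import datafile='/home/roshnigupta16020/test (2).xlsx'
  out=testexcel dbms=xlsx replace;
  getnames=yes;
run;

proc print data=testexcel;
  var 'Shop Name'n;   /* name literal: quoted string followed by n */
run;

/* 2. Let PROC IMPORT replace the space with an underscore. */
options validvarname=v7;
proc import datafile='/home/roshnigupta16020/test (2).xlsx'
  out=testexcel_v7 dbms=xlsx replace;   /* testexcel_v7 is a hypothetical name */
  getnames=yes;
run;

proc print data=testexcel_v7;
  var Shop_Name;      /* the space became an underscore */
run;
```

In the V7 case the original header text is normally still available as the variable label, so reports can show Shop Name even though the code uses Shop_Name.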
I am wondering what the cleanest way is to perform a macro loop over a data table outside a data step, e.g. to read in the files listed in the table have and do some complex analysis for each of them.
Assume we have a table have containing a set of file names and other meta data:
N filename purpose
1 foo.xls Blue team data
2 bar.xls Red team data
I was thinking of something like
%local lines current_file current_purpose;
proc sql noprint;
select count(*) into: lines from have;
quit;
%do I=1 %to &lines.;
%put --- Process file number &I. ---;
data _null_;
set have;
if _n_=&I. then do;
call symput('current_file',filename);
call symput('current_purpose',purpose);
end;
run;
%put --- &current_file. contains &current_purpose.;
/* Here comes the actual analysis */
%end;
Is this the right way to do it? To me, it does not look like the simplest way.
Related questions:
SAS loop through datasets
SAS let statement: refer to a cell value?
Suppose you define a macro named ANALYSIS with input parameters FILENAME and PURPOSE.
%macro analysis(filename,purpose);
/* Here comes the actual analysis */
title &purpose ;
proc import datafile="&filename" ....
%mend;
Then you can use a data step to generate one call to the macro for each observation. You can use CALL EXECUTE, but I find it clearer and easier to debug to just write the code to a file and then %INCLUDE it. Especially when the parameter name matches the variable name in the metadata being used to drive the code generation.
So this step :
filename code temp;
data _null_;
set have;
file code;
put '%analysis(' filename= ',' purpose= :$quote. ')' ;
run;
Will generate a program like:
%analysis(filename=foo.xls,purpose="Blue team data")
%analysis(filename=bar.xls,purpose="Red team data")
Which you can then run using
%include code / source2;
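For comparison, the CALL EXECUTE variant mentioned above might look roughly like the sketch below. It assumes the same table have and the same %analysis macro. Wrapping the macro name in %NRSTR is a common way to delay the macro's execution until after the data step has finished, instead of letting it run while the calls are still being generated.

```sas
/* Sketch only: generates one %analysis call per row of HAVE. */
data _null_;
  set have;
  /* Builds e.g. %nrstr(%analysis)(filename=foo.xls,purpose="Blue team data") */
  call execute(cats('%nrstr(%analysis)(filename=', filename,
                    ',purpose=', quote(trim(purpose)), ')'));
run;
```

The generated calls run after the data step ends and are echoed to the log, but unlike the %INCLUDE approach there is no file you can inspect before running them.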
I have a permanent data set called Branch(Branch code, Branch description)
I want to create a format from that dataset (a permanent one)
I can see that this gives me more or less what I want, but now to put it into a permanent dataset?
proc format library = Home.Branch fmtlib;
Run;
What I've tried
proc print data=Home.DataSetToApply;
format B_Code $B_CODE_FORMAT.;
RUN;
This works if I manually create the format. I can't seem to create a permanent format directly from a data set.
Could you point me in the right direction?
Resources
Creating a Format from Raw Data or a SAS® Dataset
SAS has an autoexec.sas file which executes when you start SAS.
Of course, whether this is a valid option depends on your access rights + the OS you're running.
Have a look here: http://support.sas.com/documentation/cdl/en/hostwin/63285/HTML/default/viewer.htm#win-sysop-autoexec.htm
You could just drop the format code in the auto-executing script then to have your format always available when using SAS.
This will write the formats stored in the format catalog of the library mylibname out to a dataset.
proc format cntlout=myfmtdataset lib=mylibname;
select myformatname; *if you want to just pick one or some - leave out select for all;
quit;
This will import that back into formats (later):
proc format cntlin=myfmtdataset lib=myotherlibname;
quit;
That could of course be in your autoexec, or in your regular code.
If you are trying to take a dataset to make a permanent format, you need to set it up like this:
Required:
fmtname = name of format
start = starting value (or, single value)
end = ending value (this can be missing if only single values)
label = formatted value
Optional:
type = type of format (n=numeric, c=character, i=informat, j=character informat)
hlo = various options (h=end is highest value, l = start is lowest value,
o=other, m=multilabel, etc.)
Then use the CNTLIN option to load it. SAS documentation has more detail if you need it.
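Put together for the Branch dataset from the question, the steps might look like the sketch below. The variable names branch_code and branch_description are assumptions (the question only gives descriptive column names), branch_code is assumed to be a character variable with no duplicate values, and the format name is the one used in the question's PROC PRINT.

```sas
/* Sketch only: variable names in Home.Branch are assumed. */
data work.branch_fmt;
  set Home.Branch(rename=(branch_code=start branch_description=label));
  retain fmtname 'B_CODE_FORMAT'   /* name of the format to create   */
         type    'C';              /* C = character format           */
  keep fmtname type start label;
run;

/* Store the format permanently in the catalog Home.FORMATS. */
proc format cntlin=work.branch_fmt library=Home;
run;

/* Tell SAS to search the Home library when it looks up formats. */
options fmtsearch=(Home work);

proc print data=Home.DataSetToApply;
  format B_Code $B_CODE_FORMAT.;
run;
```

The FMTSEARCH option has to be set in every session that uses the format, which is one reason to put that line in the autoexec.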
I have 24 datasets that are structured in the same way, by which I mean the same column headers (time, date, price, stock symbol), the same dataset structure, etc. I don't wish to append all 24 files, since a single combined dataset is too big to handle. I named my datasets file1, file2, file3, file4, ... up to file24.
What I want to do is the following:
For example change the date format in all of my 24 files at once;
Be able to extract from each file# a specific stock symbol like 'Dell' and append all the extracted 'Dell' data;
and finally, how can I create a loop that allows me to change the stock symbol from 'Dell' to another stock symbol in my list, like 'Goog'? I would like that loop to do (2) for all of my stock symbols.
To change the date format in a dataset, it might be a bad idea to loop through all the observations. The standard syntax is -
proc datasets library = your_libname nolist;
modify dataset_name;
format variable_name format_name;
quit;
Given that the modify statement does not take multiple SAS files, you will have to wrap it in a macro for all 24 files
%macro modformats();
proc datasets library = <your libname> nolist;
%do i = 1 %to 24;
modify file&i;
format <variable name> <format name>;
%end;
quit;
%mend modformats;
To extract and append all 'Dell' related data, it is best to use views.
For example, you first define a view (note that there is no physical dataset called 'all_files' created here) -
data all_files / view = all_files;
set file1-file24;
run;
and you can then write -
data dell;
set all_files;
where ticker = 'DELL';
run;
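To repeat the extraction for several symbols, one option is a small macro loop over a space-separated list. This is a minimal sketch: the macro name extract_symbols and the symbol list are made up for illustration, and it assumes the view all_files and the variable ticker from above.

```sas
/* Sketch only: creates one dataset per symbol from the view ALL_FILES. */
%macro extract_symbols(symbols);
  %local i sym;
  %do i = 1 %to %sysfunc(countw(&symbols, %str( )));
    %let sym = %scan(&symbols, &i, %str( ));
    data &sym;                              /* e.g. work.Dell, work.Goog */
      set all_files;
      where upcase(ticker) = "%upcase(&sym)";
    run;
  %end;
%mend extract_symbols;

%extract_symbols(Dell Goog)
```

Each symbol must be a valid SAS dataset name for this to work, and every pass re-reads all 24 files through the view, so a long symbol list will take correspondingly longer.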
This is a prototype of a solution. I don't know whether you need to do many symbol changes; I will modify the code upon request. I haven't tested it, but it should work.
%macro test();
%do i=1 %to 24;
data file&i;
set file&i;
format date [dateformat]; /*replace with the format you want */
proc append base=unions data=file&i(where=(stock_symbol='Dell'));
data unions;
set unions;
stock_symbol='Goog';
%end;
%mend;
%test(); run;