Looping exports from SAS to .dta format using year month name - loops

I am a STATA user and am therefore not familiar with using SAS. However, all of the files that I require for my current project are stored in SAS format, so I would like to convert them from SAS to .dta format, using SAS code.
The files are stored as monthly sets like so:
1976 - x1976M1, x1976M2, x1976M3.... x1976M12
where 1976 is the folder, and each month, eg. x1976M1, is a file containing the observations for that month and year.
I would like to export those files to .dta format, with the same file structure so that I can easily read them into STATA.
I am not picky about whether or not I can loop over each folder, or will have to loop each folder individually--there are forty folders with 12 files in each.
Therefore, I will need to at least create a loop that goes from m1 to m12, with the month suffix appended to the end of the filename, e.g. filename1976 + my, where y runs from 1 to 12. Ideally, I will be able to create a loop that goes from one folder to the next, executing this process via a nested loop.
I hope this is satisfactorily clear! If not, please comment and I will adjust my question accordingly.

Some code given to me by a coworker. Hope this helps anybody with the same issue. This will need to be updated for each individual folder, as it does not loop.
Cheers!
libname name 'G:\folder\';
%macro subset1976(month=);
data subset1976_&month;
set name.file1976_&month;
keep xyz /*varnames*/
;
if age>=15;
noc2011 = soc4+0;
run;
%mend;
%subset1976(month=jan);
%subset1976(month=feb);
....
%macro export1976(month=);
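/* double quotes so &month resolves; the first dot ends the macro variable, the second is the file extension */
proc export data=subset1976_&month outfile="G:\lfs\subset1976_&month..dta" replace dbms=stata; run;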
%mend;
%export1976(month=jan);
%export1976(month=feb);
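The per-folder macros above could be folded into a single nested loop. The following is only a sketch under stated assumptions: it uses the x1976M1 ... x1976M12 naming from the question, assumes the year folders sit under G:\folder and run from 1976 to 2015 (forty folders), and writes to G:\lfs\<year>. The macro name export_all and its parameters are made up for illustration. DBMS=STATA also needs SAS/ACCESS Interface to PC Files to be licensed.

```sas
/* Sketch only: root paths, year range and macro name are assumptions */
%macro export_all(first=1976, last=2015);
  %local yr mo rc;
  %do yr = &first %to &last;
    /* point a library at this year's folder */
    libname src "G:\folder\&yr";
    /* try to create the output folder; returns blank if it already exists */
    %let rc = %sysfunc(dcreate(&yr, G:\lfs));
    %do mo = 1 %to 12;
      /* skip months whose dataset is missing */
      %if %sysfunc(exist(src.x&yr.M&mo)) %then %do;
        proc export data=src.x&yr.M&mo
                    outfile="G:\lfs\&yr\x&yr.M&mo..dta"
                    dbms=stata replace;
        run;
      %end;
    %end;
    libname src clear;
  %end;
%mend export_all;

%export_all(first=1976, last=2015);
```

In `x&yr.M&mo..dta` the first dot after `&yr` and after `&mo` only marks the end of the macro variable name, which is why the file extension needs a second dot.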

Related

Adding observations and variables to a dataset with .csv files in Stata

I am using Stata 17. I want to add observations and variables in a dataset, I'll name it dataset1.
Dataset1 has the following structure
Date Year urbanname urbancode etc..
2010m1 2010 Beijing 1029 ...
2010m2 2010 Beijing 1029 ...
2010m3 2010 Beijing 1029 ...
...
2015m1 2015 Paris 1030 etc
For different cities and different time periods.
I would like to add observations of other cities (that are not in the rows of dataset1), that I have in different .csv files (dataset2.csv, dataset3.csv, and so on..). Each city has its own dataset.
In each .csv dataset I want to add I have the following variables
the dates
the urbanname
the urbancode
other variables which I do not yet have in dataset1 but that I want to add
What would be your advice on how to proceed ? I thought of doing it with R but dataset1 does not open well in RStudio and the variable Date is not well imported.
You do not describe what you have tried so far and what issues you are encountering but you can do something like this:
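use dataset1, clear
* Store the data in a temporary file
tempfile appendfile
save `appendfile'
foreach dataset in dataset2.csv dataset3.csv {
* clear is needed because data are already in memory
import delimited `dataset', clear
append using `appendfile'
* replace is needed because the temporary file already exists
save `appendfile', replace
}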

Macro to open, recode and stack several .csv files in SPSS

I am trying to code a macro that:
Import the columns year, month, id, value and motive from several .csv sequential files to SPSS. The files are named like: DATA_JAN_2010, DATA_FEB_2010 [...], until DATA_DEC_2019. These are the first variables of the csv files (the code I am using to import this variables is provided in the end).
Alter type of columns id to (a11), motive to (a32), if necessary (needed to stack all files).
Stack all these datasets in a new dataset named: DATA_2010_2019.
For now, I import each file separately, then stack and save them two at a time. This is repetitive and inefficient. Moreover, if I need to import additional variables in the future, I would have to rewrite the code for each file. That is why I believe a loop or a macro would be the smartest way of dealing with this repetitive code. Any help is really appreciated.
A sample of my code so far:
GET DATA /TYPE=TXT
/FILE="C:\Users\luizz\DATA\DATA_JAN_2010.csv"
/ENCODING='Locale'
/DELCASE=LINE
/DELIMITERS=";"
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
YEAR F4.0
MONTH F1.0
ID A11
VALUE F4.0
MOTIVE A8.
CACHE.
EXECUTE.
DATASET NAME JAN_2010 WINDOW=FRONT.
ALTER TYPE MOTIVE (a32).
GET DATA /TYPE=TXT
/FILE="C:\Users\luizz\DATA\DATA_FEB_2010.csv"
/ENCODING='Locale'
/DELCASE=LINE
/DELIMITERS=";"
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
YEAR F4.0
MONTH F1.0
ID A11
VALUE F4.0
MOTIVE A8.
CACHE.
EXECUTE.
DATASET NAME FEB_2010 WINDOW=FRONT.
DATASET ACTIVATE FEB_2010.
ALTER TYPE MOTIVE (a32).
DATASET ACTIVATE JAN_2010.
ADD FILES /FILE=*
/FILE='FEB_2010'.
EXECUTE.
SAVE OUTFILE='C:\Users\luizz\DATA\DATA_JAN_FEV_2010.sav'
/COMPRESSED.
Assuming the parameters for all the files are the same, you can use a macro like this:
define !getfiles ()
!do !yr=2010 !to 2019
!do !mn !in("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
GET DATA
/TYPE=TXT /FILE=!concat('"C:\Users\luizz\DATA\DATA_', !mn, '_', !yr, '.csv"')
/ENCODING='Locale' /DELCASE=LINE /DELIMITERS=";" /ARRANGEMENT=DELIMITED
/FIRSTCASE=2 /IMPORTCASE=ALL /VARIABLES=
YEAR F4.0
MONTH F1.0
ID A11
VALUE F4.0
MOTIVE A8.
CACHE.
EXECUTE.
ALTER TYPE id (a11) MOTIVE (a32).
dataset name tmp.
dataset activate gen.
add files /file=* /file=tmp.
exe.
!doend !doend
!enddefine.
The macro as defined will read each of the files and add it to a main file. Before we call the macro we will create the main file:
data list list/YEAR (F4) MONTH (F1) ID (A11) VALUE (F4) MOTIVE (A8).
begin data
end data.
exe.
dataset name gen.
* now we can call the macro.
!getfiles .
* now the data is all combined and we can save it.
SAVE OUTFILE='C:\Users\luizz\DATA\DATA_2010_2019.sav' /COMPRESSED.
NOTE: I used your code from the original post in the macro. Please make sure all the definitions are right.

SAS DO LOOP with specific dates

I want to create a data set where I only want to keep 5 specific dates.
So my &date is 31mar2020 and &enddate is 31mar2025 and I only want to keep 31mar every year until 2025.
The code below creates a row for every day up to 31mar2025, which is too much; I only want to keep the 5 specific dates.
How can I do that?
Thank you
DATA LOOP;
FORMAT ROLL_BASE_DT DATE9.;
DO ROLL_BASE_DT = &DATE TO &ENDdate;
OUTPUT;
END;
RUN;
You can use commas in the DO statement to list multiple values.
do date='31mar2021'd,'31mar2022'd,'31mar2023'd,'31mar2024'd,'31mar2025'd;
...
end;
You could loop over the YEAR value instead.
do year=2021 to 2025;
date=mdy(3,31,year);
...
end;
You could use INTNX() to increment the date by YEAR. You can use INTCK() to figure out how many times to run the loop.
do index=0 to intck('year',&DATE,&ENDdate);
date=intnx('year',&date,index,'s');
...
end;
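Put together as a complete step, the INTNX/INTCK variant might look like the sketch below. It assumes &DATE and &ENDdate hold SAS date literals (the question does not show how they are defined) and starts the index at 1 so that the start date itself is left out. With the values shown, this should yield five rows, 31MAR2021 through 31MAR2025.

```sas
/* Assumption: the macro variables hold date literals */
%let date    = '31mar2020'd;
%let enddate = '31mar2025'd;

data loop;
  format roll_base_dt date9.;
  /* intck counts the year boundaries between the two dates */
  do index = 1 to intck('year', &date, &enddate);
    /* 'same' keeps the day and month of the start date */
    roll_base_dt = intnx('year', &date, index, 'same');
    output;
  end;
  drop index;
run;
```

Starting the index at 0 instead would also output the start date, 31MAR2020.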
If you only want those 5 dates, you could also read them in as inline data with the CARDS (DATALINES) statement, though I know of it but have never used it personally.
Alternatively, skip the loop: assign each date in turn and put an OUTPUT statement after each assignment. That should do it.

How many Observations in SAS Sample Data Output

I have this sample SAS output and need to find out how many observations and compute the t-statistic for Beta1. Can anyone help me get started on how to go about doing this?
Sample SAS Output
To get the number of obs in a dataset you can run this code.
data _null_;
put nobs=;
stop;
set sashelp.class nobs=nobs;
run;
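PROC REG will give you the summary statistics you need: the estimates of beta0 (the intercept) and beta1 (the slope) of your regression equation, each with its standard error and t-value. The t-value for beta1 is its estimate divided by its standard error.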
Here is a simple easy to follow example http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_reg_sect003.htm
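As a minimal illustration, here is a simple regression on a dataset that ships with SAS. The variables weight and height are only stand-ins for the y and x in your own output.

```sas
/* Simple linear regression; weight and height stand in for your own y and x */
proc reg data=sashelp.class;
  model weight = height;
run;
quit;
```

In the output, the "Number of Observations Used" line gives n directly. If you only have the ANOVA table, the Corrected Total DF equals n - 1 for a model with an intercept. In the Parameter Estimates table, the t Value in the height row is the Parameter Estimate divided by the Standard Error in that row.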

Auto-create folders from excel file

I want to create about 1000 folders, and I don't think "create new folder, rename it" is the best way. I have an Excel file (.xls) with a column full of the preferred names.
eg
names.xls
column 1 = name1, name2, name3 , name4 ,name5 , name6 etc
column 2 = amount1 , amount2, amount 3, amount 4. etc
So I want a folder named test, and inside it I want a folder for each name in column 1. Is this possible, and how can it be done? I think the best language for this job is C, or not? Is the Excel file a problem? Should I copy the first column from the .xls file to a .txt file?
The fastest way to do this might be the following:
Insert a column before column 1 with the command to create a directory (md)
Fill it downwards for all the rows with values in the name column (now the 2nd column)
Copy the two columns to a textfile
Search and remove whitespaces if needed
Save the textfile as .bat or .cmd and execute it in the correct directory
If this is a one time thing writing a program could be overkill, but if you need to do it often go with VBS as suggested above.
