Macro to open, recode and stack several .csv files in SPSS - loops

I am trying to code a macro that:
Imports the columns year, month, id, value and motive from several sequential .csv files into SPSS. The files are named DATA_JAN_2010, DATA_FEB_2010 [...], up to DATA_DEC_2019. These are the first variables of the csv files (the code I am using to import these variables is provided at the end).
Alters the type of column id to (a11) and motive to (a32), if necessary (needed to stack all files).
Stacks all these datasets into a new dataset named DATA_2010_2019.
For now, what I am doing is importing each file separately, then stacking and saving them two by two. But this is repetitive and inefficient. Moreover, if in the future I need to import additional variables, I would have to rewrite the code for each file. That is why I believe a loop or a macro would be the smartest way of dealing with this repetitive code. Any help is really appreciated.
A sample of my code so far:
GET DATA /TYPE=TXT
/FILE="C:\Users\luizz\DATA\DATA_JAN_2010.csv"
/ENCODING='Locale'
/DELCASE=LINE
/DELIMITERS=";"
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
YEAR F4.0
MONTH F1.0
ID A11
VALUE F4.0
MOTIVE A8.
CACHE.
EXECUTE.
DATASET NAME JAN_2010 WINDOW=FRONT.
ALTER TYPE MOTIVE (a32).
GET DATA /TYPE=TXT
/FILE="C:\Users\luizz\DATA\DATA_FEB_2010.csv"
/ENCODING='Locale'
/DELCASE=LINE
/DELIMITERS=";"
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
YEAR F4.0
MONTH F1.0
ID A11
VALUE F4.0
MOTIVE A8.
CACHE.
EXECUTE.
DATASET NAME FEB_2010 WINDOW=FRONT.
DATASET ACTIVATE FEB_2010.
ALTER TYPE MOTIVE (a32).
DATASET ACTIVATE JAN_2010.
ADD FILES /FILE=*
/FILE='FEB_2010'.
EXECUTE.
SAVE OUTFILE='C:\Users\luizz\DATA\DATA_JAN_FEV_2010.sav'
/COMPRESSED.

Assuming the parameters for all the files are the same, you can use a macro like this:
define !getfiles ()
!do !yr=2010 !to 2019
!do !mn !in("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
GET DATA
/TYPE=TXT /FILE=!concat('"C:\Users\luizz\DATA\DATA_', !mn, '_', !yr, '.csv"')
/ENCODING='Locale' /DELCASE=LINE /DELIMITERS=";" /ARRANGEMENT=DELIMITED
/FIRSTCASE=2 /IMPORTCASE=ALL /VARIABLES=
YEAR F4.0
MONTH F1.0
ID A11
VALUE F4.0
MOTIVE A8.
CACHE.
EXECUTE.
ALTER TYPE id (a11) MOTIVE (a32).
dataset name tmp.
dataset activate gen.
add files /file=* /file=tmp.
exe.
!doend !doend
!enddefine.
The macro as defined will read each of the files and add it to a main file. Before we call the macro we will create the main file:
data list list/YEAR (F4) MONTH (F1) ID (A11) VALUE (F4) MOTIVE (A8).
begin data
end data.
exe.
dataset name gen.
* now we can call the macro.
!getfiles .
* now the data is all combined and we can save it.
SAVE OUTFILE='C:\Users\luizz\DATA\DATA_2010_2019.sav' /COMPRESSED.
NOTE: I used your code from the original post in the macro. Please make sure all the definitions are right.
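Outside SPSS, the same stacking loop can be sketched in plain Python, which may help check the logic before wiring it into the macro. The file-name pattern and the semicolon delimiter come from the question; the folder path is whatever you pass in:

```python
# Hypothetical pure-Python equivalent of the macro's loop: read every
# DATA_<MON>_<YEAR>.csv from 2010 to 2019 and stack the rows into one list.
import csv
import os

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def stack_csv_files(folder, first_year=2010, last_year=2019):
    """Read each DATA_<month>_<year>.csv (semicolon-delimited, header row)
    and return all rows as a single list of dicts."""
    rows = []
    for year in range(first_year, last_year + 1):
        for month in MONTHS:
            path = os.path.join(folder, f"DATA_{month}_{year}.csv")
            if not os.path.exists(path):  # skip months that are missing
                continue
            with open(path, newline="", encoding="utf-8") as fh:
                rows.extend(csv.DictReader(fh, delimiter=";"))
    return rows
```

Because every file is read with the same column specification, this is also where you would add a new variable once instead of editing 120 separate import blocks.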

Related

Adding observations and variables to a dataset with .csv files in Stata

I am using Stata 17. I want to add observations and variables in a dataset, I'll name it dataset1.
Dataset1 has the following structure
Date Year urbanname urbancode etc..
2010m1 2010 Beijing 1029 ...
2010m2 2010 Beijing 1029 ...
2010m3 2010 Beijing 1029 ...
...
2015m1 2015 Paris 1030 etc
For different cities and different time periods.
I would like to add observations of other cities (that are not in the rows of dataset1), that I have in different .csv files (dataset2.csv, dataset3.csv, and so on..). Each city has its own dataset.
In each .csv dataset I want to add I have the following variables
the dates
the urbanname
the urbancode
other variables which I do not yet have in dataset1 but that I want to add
What would be your advice on how to proceed? I thought of doing it with R, but dataset1 does not open well in RStudio and the variable Date is not imported correctly.
You do not describe what you have tried so far or what issues you are encountering, but you can do something like this:
use dataset1, clear
* Store the data in a temporary file
tempfile appendfile
save `appendfile'
foreach dataset in dataset2.csv dataset3.csv {
    import delimited `dataset', clear
    append using `appendfile'
    save `appendfile', replace
}
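For reference, the effect of Stata's append on files with different columns (variables absent from a file are left missing) can be sketched in Python; the file names and columns here are illustrative:

```python
# Rough analogue of repeatedly appending CSVs whose columns differ:
# the result carries the union of all columns, with missing values
# (empty strings here) where a file did not have that variable.
import csv

def append_csvs(paths):
    """Return (fieldnames, rows): fieldnames is the union of all headers
    in order of first appearance; every row has every field."""
    fieldnames, rows = [], []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    # fill missing variables so every row has the full set of columns
    rows = [{name: row.get(name, "") for name in fieldnames} for row in rows]
    return fieldnames, rows
```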

how to create a Salesforce formula that can calculate the highest figure for last month?

In Salesforce, how do I create a formula that calculates the highest figure for last month? For example, if I have an object that keeps records created in September, I would like to calculate the max value for the previous month, August (in this case it should be 20, on 3/8/2019). If it is currently July, then it needs to calculate for June. How do I construct the right formula expression? Thanks very much!
Date Value
1/9/2019 10
1/8/2019 14
2/8/2019 15
3/8/2019 20
....
30/8/2019 15
You can't do this with normal formulas on records because they "see" only the current record (and some related via lookup), not other rows in the same table.
You could make another object called "accounting periods" or something like that. Link all these entries to periods (months) in a master-detail relationship. You'll then be able to use a rollup summary with MAX(). It's still not great, because you need a lookup to the previous month to pull it, but it should give you an idea.
You could make a report that achieves something like that. PREVGROUPVAL will let you do some amazing & scary stuff. https://trailhead.salesforce.com/en/content/learn/projects/rd-summary-formulas/rd-compare-groups Then... if all you need is a report - great. If you really need it saved somewhere - you could look into reporting snapshots & save results in helper object...
If you want to do it without any data model changes like that master-detail or helper object - you could also write some code. Nightly batch job (running daily? only on 1st day of month?) should be pretty simple.
Without code - in a pinch you could make a Flow that queries records from the previous month. It's a bit expensive to run such a thing for August every time you add a September record, but if you've discarded the other options...
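The previous-month maximum that such a batch job or Flow would compute is simple in itself; here is a small Python sketch of just that calculation (the record layout is illustrative, not a Salesforce API):

```python
# Find the maximum value among records dated in the month before `today`.
# `records` is a list of (datetime.date, value) pairs.

def prev_month_max(records, today):
    """Return the max value in the month before `today`,
    or None if there are no records for that month."""
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:  # January wraps to December of the previous year
        year, month = today.year - 1, 12
    vals = [v for d, v in records if (d.year, d.month) == (year, month)]
    return max(vals) if vals else None
```

With the example data from the question (dates in dd/mm/yyyy), a September run would return 20, the 3/8/2019 value.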

Looping exports from SAS to .dta format using year month name

I am a Stata user and am therefore not familiar with SAS. However, all of the files that I require for my current project are stored in SAS format, so I would like to convert them from SAS to .dta format, using SAS code.
The files are stored as monthly sets like so:
1976 - x1976M1, x1976M2, x1976M3.... x1976M12
where 1976 is the folder, and each month, eg. x1976M1, is a file containing the observations for that month and year.
I would like to export those files to .dta format, with the same file structure, so that I can easily read them into Stata.
I am not picky about whether I can loop over all the folders at once or will have to loop within each folder individually; there are forty folders with 12 files in each.
Therefore, I will need to at least create a loop that goes from m1 to m12, appended to the end of the filename, eg. filename1976 + my, where y = [1, 12]. Ideally, I will be able to create a loop that goes from one folder to the next, executing this process via a nested loop.
I hope this is satisfactorily clear! If not, please comment and I will adjust my question accordingly.
Some code given to me by a coworker. Hope this helps anybody with the same issue. This will need to be updated for each individual folder, as it does not loop.
Cheers!
libname name 'G:\folder\';
%macro subset1976(month=);
data subset1976_&month;
set name.file1976_&month;
keep xyz /*varnames*/
;
if age>=15;
noc2011 = soc4+0;
run;
%mend;
%subset1976(month=jan);
%subset1976(month=feb);
....
%macro export1976(month=);
proc export data=subset1976_&month outfile="G:\lfs\subset1976_&month..dta" replace dbms=stata; run;
%mend;
%export1976(month=jan);
%export1976(month=feb);
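The nested loop the question asks for is mostly about generating the file names. As a language-neutral sketch (in Python, since the looping logic is what matters), iterating forty year folders with twelve monthly files each looks like this; the x<year>M<month> pattern comes from the question:

```python
# Generate the monthly dataset names x<year>M1 .. x<year>M12 for each
# year folder; the same two nested loops would drive the SAS macro calls.

def monthly_datasets(first_year=1976, last_year=2015):
    names = []
    for year in range(first_year, last_year + 1):      # one pass per folder
        for month in range(1, 13):                     # twelve files per folder
            names.append(f"x{year}M{month}")
    return names
```

In SAS the same structure would be two nested %do loops inside a macro, calling the subset and export steps once per generated name.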

Loop to create multiple reports dependent on a date

I need to create multiple reports at once taking into consideration the date column.
For example:
INVOICE COMMENT DATE
------------------------------------
1111 example1 14/04/2018
2222 example2 14/04/2018
3333 example3 15/04/2018
4444 example4 18/04/2018
For day 14/04/2018 I would need to generate two PDF with this data:
1111-example1-14/04/2018
2222-example2-14/04/2018
So basically one for each row with today's date. On 15/04/2018 only one report would be created.
I need SSRS "to loop" through the dates and create a PDF file for each one. Obviously the real query would be larger, but this is just an example.
Is this even possible with SSRS or maybe there are other ways to do it?
You can do this with a data-driven subscription. You would need to write a small query that returns all the parameter values you want to use. When it runs, it will create a copy of the report for each value you specify. You can have the resulting PDF emailed or stored in a directory.

Inserting deleted data back into dataset using temp table. Qlikview

I am having trouble working out the logic for this little scenario. Basically I have a data set stored by week of the year, and each week the previous week's data is deleted from it. What I need to do is copy the previous week's data before it's removed from the data set and then add it back after it's removed. So for example, if today is week 33, I need to save it and then add it back in next week. Then next week I need to take week 34 and save that, to add in during week 35. A picture explains better than a thousand words, so here it is.
As you can see, I need the minimum week from the data set before I add the previous week's data. The real issue I'm finding is that the data set can be rerun more than once each week, so I would need to keep the temp data until the next week while extracting the minimum week's data.
It's more logic I'm after here...Hope it makes sense and thanks in Advance.
QVDs are the way forward! Although maybe not in the way another (very good) answer states.
// Load of data from the system
Test:
LOAD *,
     today() as RunDate
FROM SourceData;
// Load of the previous data from the QVD (same fields, so it concatenates onto Test)
Test:
LOAD *
FROM Test.qvd (qvd);
// Store the combined table back into the QVD
STORE Test INTO Test.qvd (qvd);
This way you only have one QVD of data that continually expands.
Some warnings
You will need to bear in mind that the report runs multiple times a week, so you will need to cater for duplication in the data load.
QVD files aren't encrypted, so put your data somewhere safe.
When loading from a QVD and then overwriting it, if something goes wrong (the load fails) you will need to recover your QVD, so make sure your backup solution is up to the task.
I also added the RunDate field so that it is easier to take the data apart when reviewing, as this gives you the same split that storing in separate QVDs would.
Sounds like you should store the data out into weekly QVD files as part of an Extract process and then load the resulting files in.
The logic would be something like the below...
First run (week 34 for week 33 data):
Get data for previous week
Store into file correctly dated - e.g. 2016-33 for week 33 of 2016
Drop this table
Load all QVDs (in this case just 1)
Next week run (week 35 for week 33 & 34 data):
Get data for previous week
Store into file correctly dated - e.g. 2016-34 for week 34 of 2016
Drop this table
Load all QVDs (in this case 2)
Repeat run in the same week (week 35 again, for week 33 & 34 data):
Get data for previous week
Store into file correctly dated - e.g. 2016-34 for week 34 of 2016 (this time overwrite it)
Drop this table
Load all QVDs (in this case 2)
Sensible file naming solves the problem, but if you really need to inspect the data to check the week number, you would probably need to first load all existing QVDs, query the minimum week number, and take it from there.
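The rerun-safe week-by-week store described above can be sketched in Python, using plain CSV files in place of QVDs (file names follow the answer's 2016-33 pattern); re-running in the same week simply overwrites that week's file, so duplication is impossible:

```python
# Store each week's snapshot under a year-week file name, then rebuild
# the full data set by loading every stored snapshot back in.
import csv
import glob
import os

def store_week(folder, year, week, rows, fieldnames):
    """Write (or overwrite) the snapshot file for one week."""
    path = os.path.join(folder, f"{year}-{week:02d}.csv")
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

def load_all_weeks(folder):
    """Load every stored weekly snapshot into one combined list of rows."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as fh:
            rows.extend(csv.DictReader(fh))
    return rows
```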
