How to create multiple datasets in SAS using loops

proc iml;
use rdata3;
read all var _all_ into pp;
close rdata3;
do i = 1 to 1050;
perms = allperm(pp[i, ]);
create pp&i from perms[colname= {"Best" "NA1" "NA2" "Worst"}];
append from perms;
close pp&i;
end;
I would like to create multiple datasets in SAS with a DO loop, using the code above. However, I can't seem to change the name of each dataset using &i. Can anyone help me change my code so that it creates multiple datasets? Or are there other ways to create multiple datasets from a matrix in a loop? Thanks in advance.

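You don't want to use macro variables here; you want to use the features of IML. A reference like &i is handled by the macro processor before the IML statements execute, so it never sees the IML loop variable i. Instead, build each name as a character value and pass it to CREATE and CLOSE in parentheses. Be aware, though, that you will be creating an awful lot of data sets.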
data rdata3;
x = 1;
y = 2;
a = 4;
b = 5;
output;
output;
run;
proc iml;
use rdata3;
read all var _all_ into pp;
close rdata3;
do i = 1 to nrow(pp);
outname = cats('pp',putn(i,'z5.'));
perms = allperm(pp[i, ]);
create (outname) from perms[colname= {"Best" "NA1" "NA2" "Worst"}];
append from perms;
close (outname);
end;
quit;
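To make the logic of this loop concrete outside SAS, here is a minimal Python sketch (an illustration, not SAS code): one output per input row, each named with a zero-padded index the way `cats('pp', putn(i, 'z5.'))` does. The function name `per_row_tables` is hypothetical, and `itertools.permutations` stands in for IML's ALLPERM.

```python
from itertools import permutations


def per_row_tables(rows, prefix="pp", width=5):
    """Return {name: permutations}, one entry per input row.

    The name mimics cats('pp', putn(i, 'z5.')): the prefix followed by
    the 1-based row index, zero-padded to `width` digits.
    """
    tables = {}
    for i, row in enumerate(rows, start=1):
        name = f"{prefix}{i:0{width}d}"
        # Like ALLPERM on a 1x4 row: every ordering of the row's values.
        tables[name] = list(permutations(row))
    return tables


# The two identical rows of RDATA3 (x, y, a, b).
tables = per_row_tables([(1, 2, 4, 5), (1, 2, 4, 5)])
for name, perms in tables.items():
    print(name, len(perms))
```

With 1050 input rows this yields 1050 separate tables, which is the "awful lot of data sets" mentioned above.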
Alternatively, you can add an ID variable to PERMS and append every version of PERMS to one data set. I'm not sure I used the best IML technique; I know just enough IML to be dangerous.
proc iml;
use rdata3;
read all var _all_ into pp;
close rdata3;
perms = j(1,5,0);
create PP_out from perms[colname= {'ID' "Best" "NA1" "NA2" "Worst"}];
do i = 1 to nrow(pp);
perms = allperm(pp[i, ]);
perms = j(nrow(perms),1,i)||perms;
append from perms;
end;
close PP_out;
quit;
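For comparison, here is a Python sketch of this single-table layout (an illustration only; `stacked_with_id` is a hypothetical name). Each record carries its source row number as ID, so one table holds everything, and the permutations for any one row are just a filter on ID.

```python
from itertools import permutations


def stacked_with_id(rows):
    """Return one list of (ID, Best, NA1, NA2, Worst) records.

    Mirrors the program above: prepend the row number to each of that
    row's permutations and append them all to the same output.
    """
    out = []
    for i, row in enumerate(rows, start=1):
        for perm in permutations(row):
            out.append((i,) + perm)
    return out


records = stacked_with_id([(1, 2, 4, 5), (1, 2, 4, 5)])
print(len(records))
print(records[24])  # first permutation of row 2
```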

Related

SAS: Looping Through Folders to Import and Export Multiple Files

Thanks in advance for any and all suggestions.
I am working in SAS for the first time to complete a (theoretically) simple task. I have a parent folder in a Windows directory which contains several sub-folders. The sub-folders are not systematically named. For example, if the parent folder is called "W:/Documents/ParentFolder/", then the sub-folders might be "W:/Documents/ParentFolder/ABC1D26/" and "W:/Documents/ParentFolder/HG34A/".
Each sub-folder contains several SAS datasets. In any particular sub-folder, some of the SAS datasets have the .sas7bdat extension and others have the .sd2 extension. Furthermore, no two sub-folders necessarily have the same number of datasets, and the datasets are not systematically named either.
I would like to write a program in SAS which looks inside each sub-folder, loads any .sas7bdat or .sd2 datasets it finds, and exports each one to a different folder as a .dta file.
There are too many SAS datasets in each sub-folder to do this task manually for each dataset, but there are not so many sub-folders that I cannot feed the sub-folder names to SAS manually. Below is a commented version of my attempt at a program which completes this task. Unfortunately, I encounter many errors, no doubt due to my inexperience with SAS.
For example, SAS gives the following errors: "ERROR: Invalid logical name;" "ERROR: Error in the FILENAME statement;" and "ERROR: Invalid DO loop control information;" among others.
Can anyone offer any advice?
%macro sas_file_converter();
/* List the sub-folders containing SAS files in the parent folder */
%let folder1 = W:\Documents\ParentFolder\ABC1D26;
%let folder2 = W:\Documents\ParentFolder\HG34A;
/* Start loop over the sub-folders. In each sub-folder, identify all the files, extract the file names, import the files, and export the files. */
%do folder_iter = 1 %to 2;
/* Define the sub-folder that is the focus of this iteration of the loop */
filename workingFolder "&&folder&folder_iter..";
/* Extract a list of datasets in this sub-folder */
data datasetlist;
length Line 8 dataset_name $300;
List = dopen('workingFolder');
do Line = 1 to dnum(List);
dataset_name = tranwrd(tranwrd(lowcase(trim(dread(List,Line))),".sas7bdat",""),".sd2","");
output;
end;
drop List Line;
run;
/* Get number of datasets in this sub-folder */
proc sql nprint;
select count(*)
into :datasetCount
from WORK.datasetlist;
quit;
/* Loop over datasets in the sub-folder. In each iteration of the loop, load the dataset and export the dataset. */
%do dataset_iter = 1 %to &datasetCount.;
/* Get the name of the dataset which is the focus of this iteration */
data _NULL_;
set WORK.DATASETLIST (firstobs=&dataset_iter. obs=&dataset_iter.);
call symput("inMember",strip(dataset_name));
end;
/* Set the libname */
LIBNAME library '&folder&folder_iter..';
/* Load the dataset */
data new;
set library.&inMember.;
run;
/* Export the dataset */
proc export data=library.&inMember.
file = "W:\Documents\OutputFolder\&inMember..dta"
dbms = stata replace;
run;
%end;
%end;
%mend;
Thanks very much for your helpful suggestions. I used the following program to perform this task. It is largely based on Richard's example. I'm posting it here for the benefit of future readers; Richard's example includes additional code that may help you understand what this program does.
Additional files/folders can be accommodated by adding them to the "%let folders" line. (In my actual program I list many file/folder names there.)
Note that I separate the sub-folders with three dashes ("---") because some of the files and sub-sub-folders have spaces in their names. Note also that for the .sd2 files, I was able to simply replace the instances of "sas7bdat" with "sd2" and the program worked fine.
Thanks again.
%let inputfolder = W:\Documents\ParentFolder;
%let folders = ABC1D26---HG34A---Sub Folder\ZH323;
%let exportfolder = W:\Documents\ExportFolder;
data _null_;
do findex = 1 to countw("&folders.","---");
folder = scan("&folders", findex, "---");
path = catx("/", "&inputfolder.", folder);
call execute ('libname user ' || quote(trim(path)) || ';');
length fileref $8;
call missing(fileref);
rc = filename(fileref, path);
did = dopen(fileref);
do dindex = 1 to dnum(did);
filename = lowcase(dread(did,dindex));
if scan(filename,-1) ne 'sas7bdat' then continue;
xptfilename = tranwrd(filename, '.sas7bdat', '.dta');
xptfilepath = catx("\", "&exportfolder", folder, xptfilename);
datasetname = tranwrd(filename, '.sas7bdat', '');
sascode = 'PROC EXPORT data=' || trim(datasetname) || " replace file=" || quote(trim(xptfilepath)) || " dbms=stata; run;";
call execute (trim(sascode));
end;
did = dclose(did);
call execute ('libname user clear;');
rc = filename(fileref);
end;
run;
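To check the folder-walking logic before running it in SAS, here is a Python sketch of the same plan (not part of the original program; `plan_exports` is a hypothetical name). It splits the folder list on the whole `---` token, keeps files whose extension is `.sas7bdat` or `.sd2` (case-insensitively), and maps each one to a like-named `.dta` file under a matching sub-folder of the export folder. One difference: SAS's COUNTW and SCAN treat each character of `"---"` as a separate delimiter, so the SAS version also splits on a single dash, which only matters if a folder name contains one.

```python
import tempfile
from pathlib import Path

SAS_EXTENSIONS = {".sas7bdat", ".sd2"}


def plan_exports(input_folder, folders, export_folder, sep="---"):
    """Return (dataset_name, source_file_name, target_dta_path) tuples."""
    plan = []
    for folder in folders.split(sep):
        src_dir = Path(input_folder) / folder
        for f in sorted(src_dir.iterdir()):
            # Skip sub-folders and anything that is not a SAS data set.
            if not f.is_file() or f.suffix.lower() not in SAS_EXTENSIONS:
                continue
            name = f.stem.lower()
            target = Path(export_folder) / folder / (name + ".dta")
            plan.append((name, f.name, target.as_posix()))
    return plan


# Usage: build a throwaway folder tree and print the export plan.
with tempfile.TemporaryDirectory() as root:
    for sub, fname in [("ABC1D26", "one.sas7bdat"),
                       ("ABC1D26", "notes.txt"),
                       ("HG34A", "TWO.sd2")]:
        (Path(root) / sub).mkdir(exist_ok=True)
        (Path(root) / sub / fname).touch()
    results = plan_exports(root, "ABC1D26---HG34A", "/export")

for row in results:
    print(row)
```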
You can perform all of the code generation in a DATA step and submit it with CALL EXECUTE, which queues the generated statements to run after the DATA step finishes. The only macro-related parts of the program are the macro variables that specify the SAS data root folder, the names of the sub-folders to search, and the export path.
The program could be coded very similarly as a macro, but it would be harder to debug and would require %sysfunc wrappers around the function calls.
Example:
/* Create some sample data in some example folders */
%let workpath = %sysfunc(pathname(WORK));
%let name = %sysfunc(dcreate(ABC, &workpath));
%let name = %sysfunc(dcreate(DEF, &workpath));
libname user "&workpath./ABC";
data one two three four five;
set sashelp.class;
run;
libname user "&workpath./DEF";
data six seven eight nine ten;
set sashelp.class;
run;
libname user clear;
/* export all data sets in folders to like-named export files */
%let dataroot = &workpath;
%let folders = ABC DEF;
%let exportpath = c:\temp;
data _null_;
do findex = 1 to countw("&folders");
folder = scan("&folders", findex);
path = catx("/", "&dataroot.", folder);
call execute ('libname user ' || quote(trim(path)) || ';');
length fileref $8;
call missing(fileref);
rc = filename(fileref, path);
did = dopen(fileref);
do dindex = 1 to dnum(did);
filename = dread(did,dindex);
if scan(filename,-1) ne 'sas7bdat' then continue;
xptfilename = tranwrd(filename, '.sas7bdat', '.dta');
xptfilepath = catx('/', "&exportpath", xptfilename);
datasetname = tranwrd(filename, '.sas7bdat', '');
sascode = 'PROC EXPORT data=' || trim(datasetname)
|| " replace file=" || quote(trim(xptfilepath))
|| " dbms=stata;"
;
call execute (trim(sascode));
end;
did = dclose(did);
call execute ('run; libname user clear;');
rc = filename(fileref);
end;
run;
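The key idea is that the DATA step only writes program text; CALL EXECUTE queues it and SAS runs it after the step ends. Here is a Python sketch of that code-generation pattern (not SAS; `generate_export_code` and `sas_quote` are hypothetical helper names, and the folder contents are passed in rather than read from disk), which makes it easy to inspect the generated statements before submitting anything.

```python
def sas_quote(text):
    # Like SAS QUOTE(): wrap in double quotes, doubling any embedded ones.
    return '"' + text.replace('"', '""') + '"'


def generate_export_code(folders, export_path):
    """Return the SAS program text that the DATA step above would queue.

    `folders` maps each folder path to the data set names found there.
    Nothing is executed; the statements are only collected in order.
    """
    lines = []
    for path, datasets in folders.items():
        lines.append(f"libname user {sas_quote(path)};")
        for ds in datasets:
            target = f"{export_path}/{ds}.dta"
            lines.append(
                f"PROC EXPORT data={ds} replace file={sas_quote(target)} dbms=stata;"
            )
        # Each PROC EXPORT is ended by the next PROC statement; this RUN
        # ends the last one before the libref is cleared.
        lines.append("run; libname user clear;")
    return "\n".join(lines)


code = generate_export_code({"/work/ABC": ["one", "two"], "/work/DEF": ["six"]}, "c:/temp")
print(code)
```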

Dynamic file paths for Snowflake stages

I am copying data from a Snowflake table into an S3 external stage:
COPY INTO '@my_stage/my_folder/my_file.csv.gz' FROM (
SELECT *
FROM my_table
)
However, this code runs daily, and I don't want to overwrite my_file.csv.gz; I want to keep all of the historical versions. I haven't found a way to create dynamic paths:
SET stage_name=CONCAT('@my_stage/my_folder/my_file', '_date.csv.gz');
COPY INTO $stage_name FROM (
SELECT *
FROM my_table
);
COPY INTO IDENTIFIER($stage_name) FROM (
SELECT *
FROM my_table
);
Neither of the last two queries works!
My question: How can I create dynamic Stage paths in Snowflake? Thanks
Here's a stored procedure you can use and modify. Note that the COPY INTO line, marked with the "Change your copy into statement here" comment, uses backticks instead of single or double quotes. In JavaScript, a backtick (template literal) string can contain single and double quotes, span multiple lines, and include replacement tokens of the form ${variable_name}.
create or replace procedure COPY_TO_STAGE(PATH string)
returns variant
language javascript
as
$$
class Query{
constructor(statement){
this.statement = statement;
}
}
// Start of main function
var out = {};
// Change your copy into statement here.
var q = getQuery(`copy into '${PATH}' from (select * from my_table);`);
if (q.resultSet.next()) {
out["rows_unloaded"] = q.resultSet.getColumnValue("rows_unloaded");
out["input_bytes"] = q.resultSet.getColumnValue("input_bytes");
out["output_bytes"] = q.resultSet.getColumnValue("output_bytes");
} else {
out["Error"] = "Unknown error";
}
return out;
// End of main function
function getQuery(sql){
var cmd1 = {sqlText: sql};
var query = new Query(snowflake.createStatement(cmd1));
query.resultSet = query.statement.execute();
return query;
}
$$;
Once you define it, you can use SQL variables as the input if you want:
SET stage_name=CONCAT('@my_stage/my_folder/my_file', '_date.csv.gz');
call copy_to_stage($stage_name);
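Note that the CONCAT above appends the literal text `_date`, so every run would still produce the same file name; the date itself has to be formatted into the path. As a minimal Python sketch of the string you would pass as PATH (the function name and defaults are hypothetical, and it assumes the stage is referenced with Snowflake's usual `@` prefix):

```python
from datetime import date


def dated_stage_path(day=None, base="@my_stage/my_folder/my_file", ext=".csv.gz"):
    """Return the stage path with the date (YYYYMMDD) embedded in the file name."""
    day = day or date.today()
    return f"{base}_{day:%Y%m%d}{ext}"


print(dated_stage_path(date(2024, 1, 31)))
```

In Snowflake itself, the same string could be built before the CALL, for example with `TO_CHAR(CURRENT_DATE, 'YYYYMMDD')` inside the CONCAT.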
Your attempts won't work: unfortunately, using variables as identifiers does not work for stages. You might need to create a stored procedure with dynamic SQL:
https://docs.snowflake.com/en/sql-reference/stored-procedures-usage.html#label-example-of-dynamic-sql-in-stored-procedure
You can then call this procedure every day, or write a stored procedure with separate parameters for the stage path, the query to execute, and the target filename.
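As a sketch of how such a multi-parameter procedure might assemble its dynamic SQL, here is the string-building step in Python (illustration only; `build_copy_statement` is a hypothetical helper, not a Snowflake API). It joins the stage path and file name and doubles any single quotes so the quoted location literal stays valid.

```python
def build_copy_statement(stage_path, query, file_name):
    """Assemble a COPY INTO <location> statement from separate parts."""
    location = stage_path.rstrip("/") + "/" + file_name
    # Double single quotes so the location stays one valid SQL string literal.
    location = location.replace("'", "''")
    return f"copy into '{location}' from ({query});"


stmt = build_copy_statement(
    "@my_stage/my_folder/", "select * from my_table", "my_file_20240131.csv.gz"
)
print(stmt)
```

The query text is inserted as-is, so it should come from trusted code rather than user input.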

how to do IN with an array in SAS

I have defined four macro variables here, each holding a different number of ICD-10 codes:
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
Now I want to create an array that makes it easy to map those variables to the ICD-10 columns in my table, so that I can assign flag variables with it.
The regular way would be:
data test; set input;
if (dx1 in ( &DX_27800_CODE) or dx2 in (&DX_27800_CODE) or dx3 in (&DX_27800_CODE))
then dx_27800 = 1; else dx_27800 =0;
run;
The regular way means repeating this step four times to get all four flag variables, so I'm wondering whether it could be done with an array.
data test; set input;
array dx_code10 [4] &DX_27800_CODE &DX_27801_CODE &DX_2859_CODE &DX_6202_CODE;
ARRAY DX_VARIABLE[4] DX_27800 DX_27801 DX_2859 DX_6202;
DO I = 1 TO DIM(dx_code10);
IF (DX1 IN (DX_CODE10[I]) OR DX2 IN (DX_CODE10[I]) OR DX3 IN (DX_CODE10[I]))
THEN DX_VARIABLE[I] = 1;
ELSE DX_VARIABLE[I] = 0;
END;
END;
RUN;
But it seems it can't be done this way. Please help me solve this problem. Thanks.
I think a better approach is to use formats. I'd rather have those DX codes in a spreadsheet or a text file or something, and then input that to make the formats, but even with the not-best-practice %LETs, you can still use a format solution.
The approach is to make a format that maps each of those DX codes to its group value (27800, 27801, etc.), and then use that value to decide which element of the flag array gets set.
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
proc format;
value $dxcode
&dx_27800_code = '27800'
&dx_27801_code = '27801'
&dx_2859_code = '2859'
&dx_6202_code = '6202'
other=' '
;
quit;
data input;
input dx1 $;
datalines;
E6601
E6609
E6608
E661
E668
D649
D650
N8320
E669
N8329
;;;;
run;
data want;
set input;
array dx_codes[4] dx_27800 dx_27801 dx_2859 dx_6202;
dx_code_val = put(dx1,$dxcode5.);
do _i = 1 to dim(dx_codes);
if dx_code_val = scan(vname(dx_codes[_i]),2,'_') then dx_codes[_i]=1;
else dx_codes[_i]=0;
end;
run;
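Here is the same lookup idea as a Python sketch, for illustration only: a dictionary plays the role of the `$dxcode.` format (unlisted codes map to a blank, like `other=' '`), and each flag is 1 only when its name matches the looked-up group, as the `scan(vname(...), 2, '_')` test does. The names `DX_GROUPS`, `CODE_TO_GROUP`, and `flags_for` are hypothetical.

```python
DX_GROUPS = {
    "27800": ["E6609", "E661", "E668", "E669"],
    "27801": ["E6601"],
    "2859": ["D649"],
    "6202": ["N8320", "N8329"],
}
# Invert the groups into a code -> group lookup, like the format's value list.
CODE_TO_GROUP = {code: group for group, codes in DX_GROUPS.items() for code in codes}


def flags_for(dx1):
    group = CODE_TO_GROUP.get(dx1, "")  # like put(dx1, $dxcode5.)
    # The flag dx_<group> is 1 only for the matching group.
    return {f"dx_{g}": int(g == group) for g in DX_GROUPS}


for code in ["E661", "D650", "N8329"]:
    print(code, flags_for(code))
```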
For your specific example, you could use the FINDW() function instead of the IN operator. Turn your code lists into comma-delimited strings instead of lists of quoted values.
%LET DX_27800_CODE = E6609,E661,E668,E669;
%LET DX_27801_CODE = E6601 ;
%LET DX_2859_CODE = D649 ;
%LET DX_6202_CODE = N8320,N8329;
data test;
set input;
array dx_code_list (4) $200 _temporary_ ("&dx_27800_code" "&dx_27801_code" "&dx_2859_code" "&dx_6202_code");
array dx_variable (4) dx_27800 dx_27801 dx_2859 dx_6202;
array dx dx1-dx3 ;
do i = 1 to dim(dx_variable);
dx_variable(i)=0;
do j=1 to dim(dx) while (dx_variable(i)=0);
if findw(dx_code_list(i),dx(j),',','it') then dx_variable(i)=1;
end;
end;
drop i j;
run;
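The FINDW logic can be sketched in Python as follows (an illustration with hypothetical names, not a drop-in replacement). `in_word_list` approximates `findw(list, dx, ',', 'it')`: it splits on commas, ignores case (`i`), and trims blanks (`t`). The sketch also treats a blank DX value as no match; that is a choice made here, not a statement about FINDW. `any()` stops at the first hit, much like the `WHILE (dx_variable(i)=0)` clause.

```python
DX_CODE_LISTS = {
    "dx_27800": "E6609,E661,E668,E669",
    "dx_27801": "E6601",
    "dx_2859": "D649",
    "dx_6202": "N8320,N8329",
}


def in_word_list(word_list, word):
    """True if `word` is one of the comma-delimited entries (case-insensitive)."""
    word = word.strip().upper()
    if not word:
        return False  # this sketch never matches a blank value
    return word in (w.strip().upper() for w in word_list.split(","))


def flag_row(dx_values):
    # 1 if any DX column's code is in that flag's list, else 0.
    return {name: int(any(in_word_list(codes, dx) for dx in dx_values))
            for name, codes in DX_CODE_LISTS.items()}


print(flag_row(["D649", "N8320", "N8329"]))
```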
So if I make some sample data:
data input ;
length dx1-dx3 $7 ;
input dx1 - dx3 ;
cards;
E6609 E661 .
E668 E669 .
E6601 . .
D649 N8320 N8329
. . .
;
I get this result:

Output ZIP dataset

I am looking for some help creating a macro that uses an array together with DO and IF statements for subsetting. Within my macro I am trying to look across columns for variables: if a variable has a specific diagnosis code, I want to create a new variable set to 1, and set it to 0 for all others. I then want to create one new data set based on that new variable, sort it, and append it to the other data sets (the input data sets are broken down quarterly). That would give one final data set that I can export, preferably as a newly created ZIP file to keep storage space down. I am using SAS 9.4 / Enterprise Guide 7.1.
Code:
OPTIONS MERROR SERROR SOURCE MLOGIC SYMBOLGEN MINOPERATOR OBS=MAX;
%MACRO DIAGXX(a,b);
DATA NEW;
SET x.&a(KEEP= PATID DIAG1-DIAG5);
ARRAY &b{5} $ DIAG1-DIAG5;
DO I = 1 TO 5;
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
ELSE MESO = 0;
END;
DROP I;
RUN;
PROC SORT DATA=NEW NODUPKEY;
BY PATID;
WHERE MESO=1;
RUN;
PROC APPEND BASE=ALLDATA1 DATA=NEW FORCE;
RUN;
PROC EXPORT
DATA=ALLDATA1
OUTFILE= "C:\x\x\DIAGNOSIS EXPORT\MACRO DIAGXX MESO.CSV"
REPLACE
DBMS=CSV;
RUN;
%MEND DIAGXX;
%DIAGXX(Q1,MESTH);
%DIAGXX(Q2,MESTH);
As written, each pass through your loop overwrites MESO, so the flag only reflects the last variable, DIAG5. You probably want to create your MESO flag this way instead, so that any of the codes appearing in any of the array variables sets MESO to 1, and MESO stays 0 when none of the codes appears in any of them.
MESO = 0;
DO I = 1 TO 5;
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
END;
If you want to get fancy, you might save a little time by stopping the loop once the code is found:
DO I = 1 TO 5 WHILE (MESO=0);
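The difference between the two loops can be sketched in Python (an illustration; the function names are hypothetical). The first version reassigns the flag on every pass, so only the last column counts; the second starts at 0, sets 1 on a match, and stops at the first hit, like the WHILE (MESO=0) form.

```python
MESO_CODES = {"1630", "1631", "1638", "1639"}


def meso_overwriting(diags):
    # Original loop: every pass reassigns MESO, so the last column decides.
    meso = 0
    for d in diags:
        meso = 1 if d in MESO_CODES else 0
    return meso


def meso_any(diags):
    # Corrected loop: start at 0 and stop as soon as a code is found.
    meso = 0
    for d in diags:
        if d in MESO_CODES:
            meso = 1
            break
    return meso


row = ["1631", "", "", "", ""]  # code in DIAG1 only
print(meso_overwriting(row), meso_any(row))
```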

Query to fetch data between two characters in informix

I have a value in Informix like this:
value AMOUNT: <15000000.00> USD
I need to fetch 15000000.00 from the above.
I am using this query to fetch the data between <> as a workaround:
select substring (value[15,40]
from 1 for length (value[15,40]) -5 )
from tablename p where value like 'AMOUNT%';
But this is not generic, as the length may vary.
Please help me with a generic query that fetches the data between <>.
The database I am using is Informix version 9.4.
It's a diabolical problem, created by whoever chose to break one of the fundamental rules of database design: that the content of a column should be a single, indivisible value.
The best solution would be to modify the table to contain a value_descr = "AMOUNT", a value = 15000000.00, and a value_type = "USD", and ensure that the incoming data is stored in that fashion. Easier said than done, I know.
Failing that, you'll have to write a UDR that parses the string and returns the numeric portion of it. This would be feasible in SPL, but probably very slow. Something along the lines of:
CREATE PROCEDURE extract_value (inp VARCHAR(255)) RETURNING DECIMAL;
DEFINE s SMALLINT;
DEFINE l SMALLINT;
DEFINE i SMALLINT;
FOR i = 1 TO LENGTH(inp)
IF SUBSTR(inp, i, 1) = "<" THEN
LET s = i + 1;
ELIF SUBSTR(inp, i, 1) = ">" THEN
LET l = i - s;   -- length of the text between < and >
RETURN SUBSTR(inp, s, l)::DECIMAL;
END IF;
END FOR;
RETURN NULL::DECIMAL; -- could not parse out number
END PROCEDURE;
... which you would execute thus:
SELECT extract_value(p.value)
FROM tablename AS p
WHERE p.value LIKE 'AMOUNT%'
NB: that procedure compiles and produces output in my limited testing on version 11.5. There is no validation to ensure that the string between the <> parses as a number. I don't have an instance of 9.4 handy, but to the best of my knowledge I haven't used any features not available in 9.4.
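The same parsing idea, sketched in Python for checking the index arithmetic (illustration only; `extract_value` here is a Python stand-in, not the Informix UDR). The number starts just after `<` and ends just before `>`, so its length is the position of `>` minus the position just after `<`. Unlike the SPL version, this sketch returns None instead of failing when the content is not numeric.

```python
from decimal import Decimal, InvalidOperation


def extract_value(text):
    """Return the Decimal between the first '<' and the next '>', or None."""
    start = text.find("<")
    if start == -1:
        return None
    end = text.find(">", start + 1)
    if end == -1:
        return None
    try:
        return Decimal(text[start + 1:end].strip())
    except InvalidOperation:
        return None  # content between the brackets is not a number


print(extract_value("AMOUNT: <15000000.00> USD"))
```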
