I have defined four macro variables here, each holding a different number of ICD10 codes:
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
Now I want to create an array that can easily map those variables to the ICD10 columns of my table, so that I can assign flag variables with it.
The regular way would be:
data test; set input;
if (dx1 in ( &DX_27800_CODE) or dx2 in (&DX_27800_CODE) or dx3 in (&DX_27800_CODE))
then dx_27800 = 1; else dx_27800 =0;
run;
With the regular way I would need to repeat this step four times to get all four flag variables, so I'm wondering whether it could be done using an array.
data test; set input;
array dx_code10 [4] &DX_27800_CODE &DX_27801_CODE &DX_2859_CODE &DX_6202_CODE;
ARRAY DX_VARIABLE[4] DX_27800 DX_27801 DX_2859 DX_6202;
DO I = 1 TO DIM(dx_code10);
IF (DX1 IN (DX_CODE10[I]) OR DX2 IN (DX_CODE10[I]) OR DX3 IN (DX_CODE10[I]))
THEN DX_VARIABLE[I] = 1;
ELSE DX_VARIABLE[I] = 0;
END;
END;
RUN;
But it seems it can't be done this way. Please help me solve this problem. Thanks.
I think a better approach is to use formats. I'd rather have those DX codes in a spreadsheet or a text file or something, and then input that to make the formats, but even with the not-best-practice %LETs, you can still use a format solution.
The approach is to make a format that maps each of those lists of DX codes to the corresponding dx value (27800, 27801, etc.), then use that value to drive how you assign the flag array.
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
proc format;
value $dxcode
&dx_27800_code = '27800'
&dx_27801_code = '27801'
&dx_2859_code = '2859'
&dx_6202_code = '6202'
other=' '
;
quit;
data input;
input dx1 $;
datalines;
E6601
E6609
E6608
E661
E668
D649
D650
N8320
E669
N8329
;;;;
run;
data want;
set input;
array dx_codes[4] dx_27800 dx_27801 dx_2859 dx_6202;
dx_code_val = put(dx1,$dxcode5.);
do _i = 1 to dim(dx_codes);
if dx_code_val = scan(vname(dx_codes[_i]),2,'_') then dx_codes[_i]=1;
else dx_codes[_i]=0;
end;
run;
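Since the codes would ideally live in a data file rather than in %LETs, here is a minimal sketch of that route. It assumes a hypothetical lookup table WORK.DX_MAP with one row per ICD10 code (columns CODE and FLAG); PROC FORMAT's CNTLIN= option then builds the same $DXCODE format from the table, and the DATA WANT step above can stay unchanged.

```sas
/* Hypothetical lookup table: one row per ICD10 code and the flag it maps to.
   In practice this could come from PROC IMPORT of a spreadsheet. */
data dx_map;
  length code $7 flag $5;
  input code $ flag $;
  datalines;
E6609 27800
E661  27800
E668  27800
E669  27800
E6601 27801
D649  2859
N8320 6202
N8329 6202
;
run;

/* CNTLIN= reads FMTNAME, START, LABEL and TYPE ('C' = character format) */
data dxfmt;
  set dx_map end=last;
  length fmtname $8 type $1 hlo $1;
  retain fmtname 'dxcode' type 'C';
  start = code;
  label = flag;
  output;
  /* Extra OTHER range (HLO='O') so unlisted codes format to blank */
  if last then do;
    hlo = 'O';
    start = ' ';
    label = ' ';
    output;
  end;
  keep fmtname type start label hlo;
run;

proc format cntlin=dxfmt;
run;
```

Adding or removing a code then only means editing the lookup table, not the program.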
For your specific example you could use the FINDW() function instead of the IN operator. To do that, turn your code lists into comma-delimited strings:
%LET DX_27800_CODE = E6609,E661,E668,E669;
%LET DX_27801_CODE = E6601 ;
%LET DX_2859_CODE = D649 ;
%LET DX_6202_CODE = N8320,N8329;
data test;
set input;
array dx_code_list (4) $200 _temporary_ ("&dx_27800_code" "&dx_27801_code" "&dx_2859_code" "&dx_6202_code");
array dx_variable (4) dx_27800 dx_27801 dx_2859 dx_6202;
array dx dx1-dx3 ;
do i = 1 to dim(dx_variable);
dx_variable(i)=0;
do j=1 to dim(dx) while (dx_variable(i)=0);
if findw(dx_code_list(i),dx(j),',','it') then dx_variable(i)=1;
end;
end;
drop i j;
run;
So if I make some sample data:
data input ;
length dx1-dx3 $7 ;
input dx1 - dx3 ;
cards;
E6609 E661 .
E668 E669 .
E6601 . .
D649 N8320 N8329
. . .
;
I get this result:
Related
I need to do an inner join with a dataset that has the year and month in its name, e.g. Account_2019_10 (for Oct 2019).
I need to perform this inner join in a loop for each month from a specific month-year until the current month-year (e.g. from Sept 2019 to July 2020). Given that the dataset names carry the year and month in the above format (2019_10 for Oct 2019), how would I perform this loop and append all the results, grouped by month-year?
To change the name of the dataset being referenced you will need to use some code generation. Typically just by using macro variables in place of the dataset name(s).
To loop over dates, use an offset value and the INTNX() function. You can use INTCK() to determine how many months to generate.
data _null_;
start = '01OCT2019'd ;
end = '01JUL2020'd ;
length name $32 names $1000;
do offset=0 to intck('month',start,end);
date=intnx('month',start,offset);
name='account_'||translate(put(date,yymm7.),'_','M');
names=catx(' ',names,name);
end;
call symputx('names',names);
run;
Now that you have this list of dataset names you can use it in your code to combine the datasets.
data all;
set &names ;
run;
If your monthly tables do not already have a variable that indicates the month, you can add one by using the INDSNAME= option of the SET statement. Note that the variable created by that option is not saved, so you need to copy its value to another variable.
data all;
length dsname $41 ;
set &names indsname=dsname;
month = dsname;
run;
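The question also asks for an inner join per month before appending. A minimal sketch of that loop follows; the table WORK.LOOKUP and the join key ID are hypothetical placeholders for your own table and join condition. The macro derives each ACCOUNT_yyyy_mm name with INTNX(), joins it, tags the rows with the source name and appends them.

```sas
/* Hypothetical: WORK.LOOKUP is the table to join to; ID is the join key. */
%macro join_months(start=, end=, out=work.joined_all);
  %local first last n offset d dsn;
  %let first = %sysfunc(inputn(&start, date9.));
  %let last  = %sysfunc(inputn(&end, date9.));
  %let n     = %sysfunc(intck(month, &first, &last));

  /* Start from an empty result so a rerun does not double-append */
  %if %sysfunc(exist(&out)) %then %do;
    proc delete data=&out; run;
  %end;

  %do offset = 0 %to &n;
    %let d   = %sysfunc(intnx(month, &first, &offset));
    /* Builds names such as account_2019_09 */
    %let dsn = account_%sysfunc(year(&d))_%sysfunc(month(&d), z2.);

    proc sql;
      create table work._joined as
      select "&dsn" as source length=32, a.*
      from &dsn as a
      inner join work.lookup as b
        on a.id = b.id;
    quit;

    proc append base=&out data=work._joined;
    run;
  %end;
%mend join_months;

%join_months(start=01SEP2019, end=01JUL2020);
```

The SOURCE column keeps each row's month-year, so the combined table can be grouped by it afterwards.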
Use the SQL data dictionary to identify the data sets containing a yyyy_mm construct.
Select them into a macro variable that will be used to combine all the data sets in a SET statement with INDSNAME option.
Example:
%macro makefakedata();
%local year month amount;
%do year = 2019 %to 2020;
%do month = 1 %to 12;
data work.account_&year._%sysfunc(putn(&month,z2.));
do id = 1 to 10;
amount = 100 * %sysfunc(monotonic()) + id;
output;
end;
run;
%end;
%end;
%mend;
data work.foo_bar;
set sashelp.class;
run;
%makefakedata;
ods listing;
proc sql noprint;
select
catx('.', libname, memname) as dataset
, input (
cats ( substr ( memname, index(memname,'_')+1 ) , '_01' )
, ? YYMMDD10.
) as month
into
:datasets separated by ' '
, :months separated by ' '
from
dictionary.tables
where libname = 'WORK'
and index(memname,'_')
having
month
;
%put &=datasets;
%put &=months;
data all_month_named_data;
set &datasets indsname=from;
source = from;
month = input (
cats ( substr ( source, index(source,'_')+1 ) , '_01' )
, YYMMDD10.);
format month yymm7.;
run;
I have a SAS data step statement –
Data work.CABGothers2;
set work.CABGothers1;
IF proc_p in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
IF proc2 in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
IF proc3 in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
...
run;
This IF-THEN section goes on 21 times, so you can imagine how huge and cumbersome this SAS code file gets, especially when the ICD10 code list has to be modified: it would have to be changed individually for every column (proc1, proc2, ...).
Also, the ICD10 lists are huge, with over 7000 codes. I was wondering if someone could show me better SAS code that takes as input a column of ICD10 codes from a file.
I would like a PROC SQL or DATA step solution, whichever is more efficient.
Current code-
Data work.CABGothers2;
set work.CABGothers1;
IF proc_p in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
run;
UPDATE--
I got this to work when the list is small. However, I have a column with 8000 unique ICD10 codes, so I get the error message shown below.
proc sql;
select quote(icd10) into :cabgvalexcl separated by ','
from newlink.cabgvalexcl2019;
quit;
Data work.test1;
set WORK.cabgpddcol;
IF proc_p in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
IF oproc1 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1 ;
IF oproc2 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
IF oproc3 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1 ;
IF oproc4 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
run;
Error message:
ERROR: The length of the value of the macro variable CABGVALEXCL (65540) exceeds the maximum length (65534). The value has been truncated to 65534 characters.
UPDATE --
Example (just a few rows) of the single column containing the ICD10 codes (I do not have multiple columns; I split them in the macro example only because the macro variable was running out of space), and of the data file in which I have to tag rows that have any of the ICD10 codes:
Output table:
Logic: if any of the ICD10 codes listed in cabgvalexcl2019 (shown here in red) is found in the table CABGOTHERS1, create a column called EXCLUDE and set it to 1 for that record.
Here's a hash-based example. It doesn't use macro variables, so it should work for any number of ICD10 codes:
data cabgvalexcl2019;
input (icd1-icd3) (:$2.);
datalines;
1 2 3
4 5 6
7 8 9
;
run;
/*Generate some dummy data*/
data cabgpddcol;
array keys[*] $2 proc_p oproc1-oproc20;
call streaminit(1); /*Set random number seed*/
do i = 1 to 20;
do j = 1 to dim(keys);
keys[j] = strip(put(int(rand('uniform') * 11 + 9), 2.)); /*Chosen so we get a few rows with no exclusion codes; STRIP removes the leading blank PUT adds to one-digit values*/
end;
PDDCABG = rand('uniform') < 0.75;
output;
end;
drop i j;
run;
/* CABGval_Excl = Identify CABG+VALVE exclusions which are "CABG OTHERS". This is the 2019 CABG+VALVE exclusion list. */
/* If the RECORD IN following table has CABGVAL_Excl = 1 then it is a CABG+valve WITH EXCLUSION*/
Data work.CABGval_Excl; /* CABG OTHERS prior to refinement into non-iso CABG WITH Valve and non-iso CABG WITHOUT Valve */
/*Create hash object to hold list of ICD codes*/
length icd $ 2;
if _n_ = 1 then do;
declare hash h();
rc = h.definekey('icd');
rc = h.definedone();
do until(eof);
set cabgvalexcl2019 end = eof;
/*Consider using an array here if you have lots of ICD columns*/
do icd = icd1, icd2, icd3;
rc = h.add();
end;
end;
end;
set cabgpddcol;
/*Loop through all the keys and stop if we find one in the hash*/
array keys[*] proc_p oproc1-oproc20;
rc = -1;
do i = 1 to dim(keys) until(rc = 0);
rc = h.find(key:keys[i]); /*This sets rc = 0 if a match is found*/
end;
drop i rc icd:;
CABGVAL_Excl = (rc = 0) and PDDCABG = 1; /*rc = 0 means one of the codes was found in the exclusion list*/
run;
Constructing the hash object is a little bit fiddly if you have multiple columns holding all the distinct ICD10 codes you care about - if they're all in one column then there's a simpler way of doing this:
declare hash h(dataset:'cabgvalexcl2019');
rc = h.definekey('icd');
rc = h.definedone();
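A complete sketch of the single-column version follows. It assumes the code table is NEWLINK.CABGVALEXCL2019 with one character column ICD10 (the name used in the question's PROC SQL), and that PROC_P and OPROC1-OPROC20 in WORK.CABGOTHERS1 are character variables storing codes the same way (same case, same length, no dots); adjust the names to your tables. Because the hash is loaded straight from the table, the 65534-character macro variable limit never comes into play.

```sas
data work.cabgothers2;
  /* Put ICD10 in the PDV with the lookup table's type and length (never executes) */
  if 0 then set newlink.cabgvalexcl2019(keep=icd10);

  /* Load every exclusion code into a hash once, on the first iteration */
  if _n_ = 1 then do;
    declare hash h(dataset: 'newlink.cabgvalexcl2019');
    h.definekey('icd10');
    h.definedone();
  end;

  set work.cabgothers1;
  /* Assumed column names; list your own procedure columns here */
  array procs[*] proc_p oproc1-oproc20;

  EXCLUDE = 0;
  if PDDCABG = 1 then do;
    /* Stop at the first procedure code found in the hash */
    do _i = 1 to dim(procs) until (EXCLUDE = 1);
      /* CHECK() returns 0 when the key exists and does not overwrite ICD10 */
      if h.check(key: procs[_i]) = 0 then EXCLUDE = 1;
    end;
  end;

  drop _i icd10;
run;
```

Updating the exclusion list then only means replacing the table; the DATA step itself does not change.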
Does anybody know how to compress this long SAS code with some sort of looping technique?
DATA CDS; SET CDS;
retain find131 find132 find133 find134 find135 find136 find137 find138 find139 find140;
if _n_=1
THEN DO;
find131 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find132 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find133 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find134 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find135 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find136 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find137 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find138 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find139 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
find140 = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
END;
Thank you very much
Marco
Replace each series of find# variables with a loop. Also, you forgot a run statement in your original code block.
%macro simplify;
DATA CDS;
SET CDS;
retain %do i = 131 %to 140; find&i. %end;;
if _n_=1 THEN DO;
%do i = 131 %to 140;
find&i. = prxPARSE('/\d\d\d\d\d\d\d\.\d\d/');
%end;
END;
RUN;
%mend simplify;
%simplify;
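If a macro is not needed, an ARRAY with explicit bounds 131:140 lets an ordinary DO loop fill the same variables. A minimal sketch, assuming the same CDS data set (the array name RE is arbitrary; naming it FIND would clash with the FIND function). Since every pattern here is identical, a single retained regex ID would also work; the loop only pays off if the patterns differ.

```sas
data cds;
  set cds;
  /* Explicit bounds: re{131} is find131, ..., re{140} is find140 */
  array re{131:140} find131-find140;
  retain find131-find140;
  if _n_ = 1 then do;
    do _i = 131 to 140;
      re{_i} = prxparse('/\d\d\d\d\d\d\d\.\d\d/');
    end;
  end;
  drop _i;
run;
```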
I am looking for some help creating a macro that uses an array together with DO and IF statements for subsetting. Within my macro I am trying to look across columns, and if a column has a specific diagnosis code, create a new variable set to 1 (and set it to 0 for all other rows). I then want to create one new data set based on that variable, sort it, and append it to the other data sets (the input data sets are split by quarter), producing one final data set that I can export, preferably as a newly created ZIP file to save storage space. I am using SAS 9.4 / Enterprise Guide 7.1.
Code:
OPTIONS MERROR SERROR SOURCE MLOGIC SYMBOLGEN MINOPERATOR OBS=MAX;
%MACRO DIAGXX(a,b);
DATA NEW;
SET x.&a(KEEP= PATID DIAG1-DIAG5);
ARRAY &b{5} $ DIAG1-DIAG5;
DO I = 1 TO 5;
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
ELSE MESO = 0;
END;
DROP I;
RUN;
PROC SORT DATA=NEW NODUPKEY;
BY PATID;
WHERE MESO=1;
RUN;
PROC APPEND BASE=ALLDATA1 DATA=NEW FORCE;
RUN;
PROC EXPORT
DATA=ALLDATA1
OUTFILE= "C:\x\x\DIAGNOSIS EXPORT\MACRO DIAGXX MESO.CSV"
REPLACE
DBMS=CSV;
RUN;
%MEND DIAGXX;
%DIAGXX(Q1,MESTH);
%DIAGXX(Q2,MESTH);
You probably want to create your MESO flag this way so that the presence of any of the codes in any of the variables in the array will set MESO to true and it will be false when the codes never appear in any of the variables.
MESO = 0;
DO I = 1 TO 5;
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
END;
If you want to get fancy you might save a little time by stopping the loop once the code is found.
MESO = 0;
DO I = 1 TO 5 WHILE (MESO=0);
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
END;
I have the following macro:
rsubmit;
data indexsecid;
input secid 1-6;
datalines;
108105
109764
102456
102480
101499
102434
107880
run;
%let endyear = 2014;
%macro getvols1;
* First I extract the secids for all the options given a date and
an expiry date;
%do yearnum = 1996 %to &endyear;
proc sql;
create table volsurface1&yearnum as
select a.secid, a.date, a.days, a.delta, a.impl_volatility,
a.impl_strike, a.cp_flag
from optionm.vsurfd&yearnum as a, indexsecid as b
where a.secid=b.secid
and a.impl_strike NE -99.99
order by a.date, a.secid, a.impl_strike;
quit;
%if &yearnum > 1996 %then %do;
proc append base= volsurface11996 data=volsurface1&yearnum;
run;
%end;
%end;
%mend;
%getvols1;
proc download data=volsurface11996;
run;
endrsubmit;
data _null_;
set work.volsurface11996;
length fv $ 200;
fv = "C:\Users\user\Desktop\" || TRIM(put(indexsecid,4.)) || ".csv";
file write filevar=fv dsd dlm=',' lrecl=32000 ;
put (_all_) (:);
run;
In the code above I have: where a.secid=108105. Now I have a list of several secids and I need to run the macro once for each secid. I would like to run it once and generate a new dataset for each secid.
How can I do that? Thanks
Here is an approach that uses
A single data step set statement to combine all the input datasets
A data set list so you don't have to call each input by name
A hash table to limit the output to your list of secids
proc sort to order the output
Rezza/DWal's approach to output separate csvs with file filevar =
%let startyear = 1996;
%let endyear = 2014;
data volsurface1;
/* Read in all the input tables */
set optionm.vsurfd&startyear.-optionm.vsurfd&endyear.;
where impl_strike ~= -99.99;
/* Set up a hash table containing all the wanted secids */
if _N_ = 1 then do;
declare hash h(dataset: "indexsecid");
_rc = h.defineKey("secid");
_rc = h.defineDone();
end;
/* Only keep observations where secid is matched in the hash table */
if not h.find();
/* Select which variables to output */
keep secid date days delta impl_volatility impl_strike cp_flag;
run;
/* Sort the data */
proc sort data = volsurface1;
by secid date impl_strike;
run;
/* Write out a CSV for each secid */
data _null_;
set volsurface1;
length fv $200;
fv = "\path\to\output\" || trim(put(secid, 6.)) || ".csv";
file write filevar = fv dsd dlm = ',' lrecl = 32000;
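put (secid date days delta impl_volatility impl_strike cp_flag) (:); /* listed explicitly so the FILEVAR variable fv is not written */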
run;
As I don't have your data, this is untested. The only constraint I can see is that the contents of indexsecid must fit in memory. If you were not concerned with the order, this could all be done in one data step.
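The question also asks for a separate data set per secid rather than only a CSV. A minimal sketch, assuming VOLSURFACE1 from the steps above and the INDEXSECID list: CALL EXECUTE queues one DATA step per secid, using a hypothetical VOL_ prefix for the output names.

```sas
/* Queue one DATA step per secid; the queued steps run after this step ends */
data _null_;
  set indexsecid;
  call execute(cats('data work.vol_', secid,
                    '; set work.volsurface1; where secid = ', secid,
                    '; run;'));
run;
```

For secid 108105 this generates `data work.vol_108105; set work.volsurface1; where secid = 108105; run;`, so adding a secid to INDEXSECID is enough to get another data set.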
SRSwift, thank you for your comprehensive answer. It ran smoothly with no errors. The only issue is that I am running it on a remote server (Wharton) using:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=wrds;
signon username=_prompt_;
rsubmit;
In the log it says it wrote the file to my folder on the server, but I can't see any file on the server. The log says:
NOTE: The file WRITE is:
Filename=/home/uni/user/108505.csv,
Owner Name=user,Group Name=uni,
Access Permission=rw-r--r--,
Last Modified=Wed Apr 1 20:11:20 2015