Iterate through two datasets to create distinct results dataset - loops

In SAS, I have the following two datasets:
Dataset #1: Data on people's meal preferences
ID | Meal    | Meal_rank
1  | Lobster | 1
1  | Cake    | 2
1  | Hot Dog | 3
1  | Salad   | 4
1  | Fries   | 5
2  | Burger  | 1
2  | Hot Dog | 2
2  | Pizza   | 3
2  | Fries   | 4
3  | Hot Dog | 1
3  | Salad   | 2
3  | Soup    | 3
4  | Lobster | 1
4  | Hot Dog | 2
4  | Burger  | 3
Dataset #2: Data on meal availability
Meal    | Units_available
Hot Dog | 2
Burger  | 1
Pizza   | 2
In SAS, I'd like to find a way to derive a result dataset that looks as follows (without changing anything in Dataset #1 or #2):
ID | Assigned_Meal
1  | Hot Dog
2  | Burger
3  | Hot Dog
4  | Meal cannot be assigned (out of stock/unavailable)
The results are driven by a process that iterates through the meals of each person (identified by their 'ID' values) until either:
A meal is found where there are enough units available.
All meals have been checked against the availability data.
Notably:
There are cases where the person lists a meal that isn't available.
The dataset I'm working with is much larger than in this example (thousands of rows).
Here is SAS code for creating the two sample datasets:
proc sql;
create table work.ppl_meal_pref
(ID char(4),
Meal char(20),
Meal_rank num);
insert into work.ppl_meal_pref
values('1','Lobster',1)
values('1','Cake',2)
values('1','Hot Dog',3)
values('1','Salad',4)
values('1','Fries',5)
values('2','Burger',1)
values('2','Hot Dog',2)
values('2','Pizza',3)
values('2','Fries',4)
values('3','Hot Dog',1)
values('3','Salad',2)
values('3','Soup',3)
values('4','Lobster',1)
values('4','Hot Dog',2)
values('4','Burger',3)
;
quit;
run;
proc sql;
create table work.lunch_menu
(FoodName char(14),
Units_available num);
insert into work.lunch_menu
values('Hot Dog',2)
values('Burger',1)
values('Pizza',1)
;
quit;
run;
I've tried to implement loops to perform this task, but to no avail (see below).
data work.assign_meals;
length FoodName $ 14 Units_available 8;
if (_n_ = 1) then do;
declare hash lookup(dataset:'work.lunch_menu', duplicate: 'error', ordered: 'ascending', multidata: 'NO');
lookup.defineKey('FoodName');
lookup.defineData('Units_available');
lookup.defineDone();
end;
do until (eof_pref);
set work.ppl_meal_pref END = eof_pref;
rc = lookup.FIND();
IF rc ne 0 THEN DO;
Units_available = 0;
end;
output;
end;
stop;
run;

Here is working hash-based code using the sample data from ealfons1. Having different variable names for the key (Meal versus FoodName) means you have to use extra syntax in the FIND() (or you could rename the variable in the SET or DATASET specifiers).
It will also output an updated stock-level dataset. Tracking the not-assigned condition, i.e. which preferences were out of stock or not stocked for each ID that did not get a meal assignment, would require extra code and output data.
data meal_assignments;
if 0 then set meals_stock; * prep PDV;
declare hash stock (dataset:'meals_stock');
stock.defineKey('FoodName');
stock.defineData('FoodName', 'Units_available');
stock.defineDone();
do until (lastrow_flag);
assigned = 0;
stocked = 0;
do until (last.ID);
set ppl_meal_pref end=lastrow_flag;
by ID Meal_rank; * error will happen if meal_rank is not monotonic;
if assigned then continue; * already assigned;
if stock.find(key:Meal) ne 0 then continue; * off the menu;
stocked = 1;
if Units_available < 1 then continue; * out of stock or missing count;
Units_available + (-1);
if stock.replace() = 0 then do; * hash replace worked;
assigned = 1;
OUTPUT;
end;
else put 'WARNING: Problem with stock hash ' Meal=;
end;
if not assigned then do;
if stocked then Meal = 'Ran out'; else Meal = 'Not stocked';
OUTPUT;
end;
end;
keep ID Meal;
stock.output(dataset:'meals_stock_after_assignments');
stop;
run;
options nocenter;
title "Meals report";
proc print noobs data=meal_assignments; title2 "Assignments";
proc print noobs data=meals_stock_after_assignments; title2 "New stock levels";
proc sql;
title2 "Usage summary";
select A.Meal, A.have_count, B.had_count, B.had_count - A.have_count as use_count
from
(select FoodName as Meal, Units_available as have_count from meals_stock_after_assignments) as A
join
(select FoodName as Meal, Units_available as had_count from meals_stock) as B
on A.Meal = B.Meal
;
quit;
The 'want' here is queue based:
a first come, first served by preference rank solution.
A random queue order over ID could deliver a modicum of perceived 'fairness' (see the sketch below).
More difficult solutions would be based on global planning, such as:
serve the most people, at the highest preference rank
serve the most people, at the lowest cost
etc ...
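Here is a minimal, untested sketch of that random queue order, reusing the sample data above. The dataset name pref_ids, the variable queue_order, the seed 12345 and the output name ppl_meal_pref_shuffled are made up for illustration:
proc sort data=work.ppl_meal_pref out=work.pref_ids(keep=ID) nodupkey;
by ID; * one row per person;
run;
data work.pref_ids;
set work.pref_ids;
if _n_ = 1 then call streaminit(12345); * fixed seed for a repeatable shuffle;
queue_order = rand('uniform'); * random position in the queue;
run;
proc sql;
create table work.ppl_meal_pref_shuffled as
select p.*, q.queue_order
from work.ppl_meal_pref as p
inner join work.pref_ids as q
on p.ID = q.ID
order by q.queue_order, p.ID, p.Meal_rank;
quit;
The assignment step could then read ppl_meal_pref_shuffled instead of ppl_meal_pref; since ID is no longer in ascending order, its BY statement would need the NOTSORTED option (for example, by ID Meal_rank notsorted;).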

Another approach: MODIFY-ing the meal availability dataset as you go along. This is slightly more concise than the hash approach but might not perform quite as well. On the other hand, it will still work even if your lunch_menu dataset is too large to fit conveniently into memory, and you get a record of what meals are left over afterwards. I have renamed variables for consistency between the input datasets:
proc sql;
create table work.ppl_meal_pref
(ID char(4),
Food char(20),
Meal_rank num);
insert into work.ppl_meal_pref
values('1','Lobster',1)
values('1','Cake',2)
values('1','Hot Dog',3)
values('1','Salad',4)
values('1','Fries',5)
values('2','Burger',1)
values('2','Hot Dog',2)
values('2','Pizza',3)
values('2','Fries',4)
values('3','Hot Dog',1)
values('3','Salad',2)
values('3','Soup',3)
values('4','Lobster',1)
values('4','Hot Dog',2)
values('4','Burger',3)
;
quit;
run;
proc sql;
create table work.lunch_menu
(Food char(20),
Units_available num);
insert into work.lunch_menu
values('Hot Dog',2)
values('Burger',1)
values('Pizza',1)
;
quit;
run;
proc datasets lib = work nolist nowarn nodetails;
modify lunch_menu;
index create Food /unique;
run;
quit;
/*Output to assigned_meals and update lunch_menu*/
data assigned_meals(keep = id AssignedFood AssignedFoodRank) lunch_menu;
length AssignedFood $ 20;
do until(last.ID);
set ppl_meal_pref;
by ID;
if missing(AssignedFood) then do;
modify lunch_menu key = Food;
if _iorc_ then _error_ = 0;
else if units_available > 0 then do;
AssignedFood = Food;
AssignedFoodRank = Meal_Rank;
units_available + -1;
replace lunch_menu;
end;
end;
end;
output assigned_meals;
run;

I have never used the REPLACE method of hash tables before and I did not test this code, but to my understanding, it should do the job:
/* build a dataset assign_meals with variables ID and Assigned_Meal */
data work.assign_meals (keep=ID Assigned_Meal);
/* Do that while reading ppl_meal_pref */
set work.ppl_meal_pref;
/* Take care can use first.ID to know you start a new ID */
by ID;
/* Remember if someone is served (without retain, SAS forgets all values when reading a new observation) */
retain served;
if first.ID then served = 0;
/* but first read lunch_menu into memory */
length FoodName $ 14 Units_available 8;
if (_n_ = 1) then do;
declare hash lookup(dataset:'work.lunch_menu',
duplicate: 'error',
ordered: 'ascending',
multidata: 'NO');
lookup.defineKey('FoodName');
lookup.defineData('Units_available');
lookup.defineDone();
end;
if not served then do;
/* Look up if the desired meal is available */
rc = lookup.FIND(key: Meal); /* the hash key variable is FoodName, so pass Meal explicitly */
IF rc eq 0 THEN DO;
if Units_available gt 0 then do;
/* Serve this customer */
Assigned_Meal = Meal;
output;
served = 1;
/* Remember that a meal is used */
Units_available = Units_available - 1;
lookup.REPLACE();
end;
end;
end;
run;
I currently don't have the time to test it. If it does not work, tell me, so I can do that later.
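One gap worth noting: the step above never outputs a row for an ID that cannot be served (ID 4 in the sample data). A hypothetical, untested fragment that could be added inside the same data step, after the if not served then do; ... end; block, is shown below; a LENGTH statement for Assigned_Meal near the top of the step would also be needed so the message is not truncated to the length of Meal:
/* Hypothetical addition: emit a row for IDs that could not be served. */
/* Assumes an earlier statement such as: length Assigned_Meal $ 60;   */
if last.ID and not served then do;
Assigned_Meal = 'Meal cannot be assigned (out of stock/unavailable)';
output;
end;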

Related

How to generate Stackoverflow table markdown from Snowflake

Stackoverflow supports table markdown. For example, to display a table like this:
N_NATIONKEY | N_NAME    | N_REGIONKEY
0           | ALGERIA   | 0
1           | ARGENTINA | 1
2           | BRAZIL    | 1
3           | CANADA    | 1
4           | EGYPT     | 4
You can write code like this:
|N_NATIONKEY|N_NAME|N_REGIONKEY|
|---:|:---|---:|
|0|ALGERIA|0|
|1|ARGENTINA|1|
|2|BRAZIL|1|
|3|CANADA|1|
|4|EGYPT|4|
It would save a lot of time to generate the Stackoverflow table markdown automatically when running Snowflake queries.
The following stored procedure accepts either a query string or a query ID (it will auto-detect which it is) and returns the table results as Stackoverflow table markdown. It will automatically align numbers and dates to the right, strings, arrays, and objects to the left, and other types default to centered. It supports any query you can pass to it. It may be a good idea to use $$ to terminate the string passed into the procedure in case the SQL contains single quotes. You can create the procedure and test it using this script:
create or replace procedure MARKDOWN("queryOrQueryId" string)
returns string
language javascript
execute as caller
as
$$
const MAX_ROWS = 50; // Set the maximum row count to fetch. Tables in markdown larger than this become hard to read.
var [rs, i, c, row, props] = [null, 0, 0, 0, {}];
if (!queryOrQueryId || queryOrQueryId == 0){
queryOrQueryId = `select * from table(result_scan(last_query_id())) limit ${MAX_ROWS}`;
}
queryOrQueryId = queryOrQueryId.trim();
if (isUUID(queryOrQueryId)){
rs = snowflake.execute({sqlText:`select * from table(result_scan('${queryOrQueryId}')) limit ${MAX_ROWS}`});
} else {
rs = snowflake.execute({sqlText:`${queryOrQueryId}`});
}
props.columnCount = rs.getColumnCount();
for(i = 1; i <= props.columnCount; i++){
props["col" + i + "Name"] = rs.getColumnName(i);
props["col" + i + "Type"] = rs.getColumnType(i);
}
var table = getHeader(props);
while(rs.next()){
row = "|";
for(c = 1; c <= props.columnCount; c++){
row += escapeMarkup(rs.getColumnValueAsString(c)) + "|";
}
table += "\n" + row;
}
return table;
//------ End main function. Start of helper functions.
function escapeMarkup(s){
s = s.replace(/\\/g, "\\\\");
s = s.replaceAll('|', '\\|');
s = s.replace(/\s+/g, " ");
return s;
}
function getHeader(props){
s = "|";
for (var i = 1; i <= props.columnCount; i++){
s += props["col" + i + "Name"] + "|";
}
s += "\n";
for (var i = 1; i <= props.columnCount; i++){
switch(props["col" + i + "Type"]) {
case 'number':
s += '|---:';
break;
case 'string':
s += '|:---';
break;
case 'date':
s += '|---:';
break;
case 'json':
s += '|:---';
break;
default:
s += '|:---:';
}
}
return s + "|";
}
function isUUID(str){
const regexExp = /^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$/gi;
return regexExp.test(str);
}
$$;
-- Usage type 1, a simple query:
call markdown($$ select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5 $$);
-- Usage type 2, a query ID:
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
call markdown($quid);
Edit: Based on Fieldy's helpful feedback, I modified the procedure code to allow passing null or 0 or a blank string '' as the parameter. This will use the last query ID and is a helpful shortcut. It also adds a constant to the code that will limit the returns to a set number of rows. This limit will be applied when using query IDs (or sending null, '', or 0, which uses the last query ID). The limit is not applied when the input parameter is the text of a query to run to avoid syntax errors if there's already a limit applied, etc.
Greg Pavlik's Javascript Stored Procedure solution made me wonder if this would be any easier with the new Python language support in Stored Procedures. This is currently a public-preview feature.
The Python Snowpark API supports returning a result as a Pandas dataframe, and Pandas supports returning a dataframe in Markdown format, via the tabulate package. Here's the stored procedure.
CREATE OR REPLACE PROCEDURE markdown_table(query_id VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python','pandas','tabulate', 'regex')
HANDLER = 'markdown_table'
EXECUTE AS CALLER
AS $$
import pandas as pd
import tabulate
import regex
def markdown_table(session, queryOrQueryId = None):
# Validate UUID
if(queryOrQueryId is None):
pandas_result = session.sql("""Select * from table(result_scan(last_query_id()))""").to_pandas()
elif(bool(regex.match("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", queryOrQueryId))):
pandas_result = session.sql(f"""select * from table(result_scan('{queryOrQueryId}'))""").to_pandas()
else:
pandas_result = session.sql(queryOrQueryId).to_pandas()
return pandas_result.to_markdown()
$$;
Which you can use as follows:
-- Usage type 1, use the result from the query run immediately preceding the stored procedure call
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
call markdown_table(NULL);
-- Usage type 2, pass in a query_id
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
select $quid;
call markdown_table($quid);
-- Usage type 3, provide a query string to the stored procedure call
call markdown_table('select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5');
The table markdown can also be written without the leading and trailing pipes:
N_NATIONKEY|N_NAME|N_REGIONKEY
--|--|--
0|ALGERIA|0
1|ARGENTINA|1
2|BRAZIL|1
3|CANADA|1
4|EGYPT|4
This renders the same table as above, so it can be a simpler solution.
I grab the result table, use Notepad++ to replace each tab (\t) with a pipe (|), and then insert the header marker line by hand. I sometimes replace empty null results with the text null to make the results make more sense. The form you use with the start/end pipes gets around the need for that.
DBeaver IDE supports "data export as markdown" and "advanced copy as markdown" out-of-the-box:
Output:
|R_REGIONKEY|R_NAME|R_COMMENT|
|-----------|------|---------|
|0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
|1|AMERICA|hs use ironic, even requests. s|
|2|ASIA|ges. thinly even pinto beans ca|
|3|EUROPE|ly final courts cajole furiously final excuse|
|4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
It is rendered as:
R_REGIONKEY | R_NAME      | R_COMMENT
0           | AFRICA      | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1           | AMERICA     | hs use ironic, even requests. s
2           | ASIA        | ges. thinly even pinto beans ca
3           | EUROPE      | ly final courts cajole furiously final excuse
4           | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl

Working with nested loops in pl/sql but not displaying the proper output

SET SERVEROUTPUT ON SIZE 4000;
DECLARE
call_id COURSE.CALL_ID%type;
sec_num COURSE_SECTION.SEC_NUM%type;
fname STUDENT.S_FIRST%TYPE ;
lname STUDENT.S_LAST%TYPE;
CURSOR c_info is
SELECT CALL_ID , SEC_NUM
FROM COURSE_SECTION ,COURSE,TERM
WHERE COURSE_SECTION.COURSE_ID = COURSE.COURSE_ID
AND TERM.TERM_ID = COURSE_SECTION.TERM_ID
AND TERM.TERM_DESC = 'Summer 2007' ;
CURSOR S_NAME IS
SELECT DISTINCT S_FIRST, S_LAST
FROM STUDENT,COURSE_SECTION,TERM,ENROLLMENT
WHERE TERM.TERM_ID = COURSE_SECTION.TERM_ID
AND COURSE_SECTION.C_SEC_ID = ENROLLMENT.C_SEC_ID
AND COURSE_SECTION.TERM_ID=TERM.TERM_ID
AND ENROLLMENT.S_ID = STUDENT.S_ID
AND TERM.TERM_DESC LIKE 'Summer 2007';
BEGIN
OPEN c_info;
LOOP
FETCH c_info INTO call_id , sec_num ;
EXIT WHEN c_info%notfound;
DBMS_OUTPUT.PUT_LINE('==================================');
DBMS_OUTPUT.PUT_LINE(call_id || ' ' || 'Sec. ' || sec_num);
DBMS_OUTPUT.PUT_LINE('==================================');
OPEN S_NAME;
LOOP
FETCH S_NAME INTO fname , lname ;
EXIT WHEN S_NAME%notfound;
DBMS_OUTPUT.PUT_LINE(fname || ' ' || lname );
END LOOP;
CLOSE S_NAME ;
END LOOP;
CLOSE c_info;
END;
-- The output expected
I am having some issues: I am unable to display the proper output. I am trying to use a nested loop but I made some mistakes when implementing it. Plus I think an explicit cursor is much better to be used.
Make use of the Northwood university database. The script:
https://drive.google.com/file/d/1M_g7FbgOUahoFtE943OK28UxIFbUFgRk/view?usp=sharing
I'm making a lot of assumptions here - I'm guessing you are getting all students for all courses in your inner loop, but you really just want to get students for the particular course section you are dealing with in your outer loop.
So your second query will need to reference the right course section ID to limit the students to just that section.
You don't need to explicitly define cursors unless you need them for some reason - if you are just iterating through them, it's better to reference them directly in the FOR loop.
So that brings me to the following
set serveroutput on size 4000;
begin
for c_info in (
select call_id,
sec_num,
C_SEC_ID -- PK to link to enrollment later
from course_section,
course,
term
where course_section.course_id = course.course_id
and term.term_id = course_section.term_id
and term.term_desc = 'Summer 2007'
)
loop
dbms_output.put_line('==================================');
dbms_output.put_line(c_info.call_id || ' ' || 'Sec. ' || c_info.sec_num);
dbms_output.put_line('==================================');
for s_name in (
select distinct s_first, s_last
from student,
course_section,
term,
enrollment
where term.term_id = course_section.term_id
and course_section.c_sec_id = enrollment.c_sec_id
and course_section.term_id=term.term_id
and enrollment.s_id = student.s_id
and term.term_desc like 'Summer 2007'
AND ENROLLMENT.C_SEC_ID = C_INFO.C_SEC_ID -- get students just for THIS course section
)
loop
dbms_output.put_line(s_name.s_first || ' ' || s_name.s_last );
end loop;
end loop;
end;
where I've put query alterations in CAPS.
Since I cut/pasted your SQL there are no aliases in the first query - I'd recommend you correct that, as aliasing columns is always good practice.
Similarly, I retained the DISTINCT in the second query, but I suspect it's redundant, because I imagine a student won't enroll more than once for a single course section. (And in reality, if you had two different students named Sue Smith, you would probably want to print them out twice, no?)

How to read in a column of data in an IF-THEN statement or in a PROC SQL statement, SAS

I have a SAS data step statement –
Data work.CABGothers2;
set work.CABGothers1;
IF proc_p in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
IF proc2 in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
IF proc3 in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
...
run;
This IF-THEN section repeats 21 times, so you can imagine how HUGE and cumbersome this SAS code file gets, especially when it comes to any modifications to the ICD10 code list: it would have to be changed individually for all the proc1, proc2, ... columns.
Also, the ICD10 lists are very large, with over 7000 codes. I was wondering if someone could show me better SAS code that takes as input a column of data (ICD10 codes) from a file.
I would like a PROC SQL or DATA step procedure, whichever is more efficient.
Current code-
Data work.CABGothers2;
set work.CABGothers1;
IF proc_p in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
run;
UPDATE--
I got this to work if the list is small. However, I have a column with 8000 unique ICD10 codes, so I get the error message shown below.
proc sql;
select quote(icd10) into :cabgvalexcl separated by ','
from newlink.cabgvalexcl2019;
quit;
Data work.test1;
set WORK.cabgpddcol;
IF proc_p in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
IF oproc1 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1 ;
IF oproc2 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
IF oproc3 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1 ;
IF oproc4 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
run;
ERROR message:
ERROR: The length of the value of the macro variable CABGVALEXCL (65540) exceeds the maximum length (65534). The value has been truncated to 65534 characters.
UPDATE --
Example (just a few rows) of the single column containing ICD10 codes, and of the data file in which I have to tag rows that contain any of those codes. (I do not have multiple columns; I only split the list in the macro example because the macro variable was running out of space.)
OUTPUT table -
Logic - If any of the ICD10 codes listed in cabgvalexcl2019 is found in the table CABGOTHERS1, create a column called EXCLUDE and put a value of 1 for that record.
Here's a hash-based example. It doesn't use macro variables, so it should work for any number of ICD10 codes:
data cabgvalexcl2019;
input (icd1-icd3) (:$2.);
datalines;
1 2 3
4 5 6
7 8 9
;
run;
/*Generate some dummy data*/
data cabgpddcol;
array keys[*] $2 proc_p oproc1-oproc20;
call streaminit(1); /*Set random number seed*/
do i = 1 to 20;
do j = 1 to dim(keys);
keys[j] = put(int(rand('uniform') * 11 + 9), 2.); /*Chosen so we get a few rows with no exclusion codes*/
end;
PDDCABG = rand('uniform') < 0.75;
output;
end;
drop i j;
run;
/* CABGval_Excl = Identify CABG+VALVE exclusions which are "CABG OTHERS". This is the 2019 CABG+VALVE exclusion list. */
/* If the RECORD IN following table has CABGVAL_Excl = 1 then it is a CABG+valve WITH EXCLUSION*/
Data work.CABGval_Excl; /* CABG OTHERS prior to refinement into non-iso CABG WITH Valve and non-iso CABG WITHOUT Valve */
/*Create hash object to hold list of ICD codes*/
length icd $ 2;
if _n_ = 1 then do;
declare hash h();
rc = h.definekey('icd');
rc = h.definedone();
do until(eof);
set cabgvalexcl2019 end = eof;
/*Consider using an array here if you have lots of ICD columns*/
do icd = icd1, icd2, icd3;
rc = h.add();
end;
end;
end;
set cabgpddcol;
/*Loop through all the keys and stop if we find one in the hash*/
array keys[*] proc_p oproc1-oproc20;
rc = -1;
do i = 1 to dim(keys) until(rc = 0);
rc = h.find(key:keys[i]); /*This sets rc = 0 if a match is found*/
end;
drop i rc icd:;
CABGVAL_Excl = (rc = 0 and PDDCABG = 1); /*Flag = 1 when one of the listed codes was found*/
run;
Constructing the hash object is a little bit fiddly if you have multiple columns holding all the distinct ICD10 codes you care about - if they're all in one column then there's a simpler way of doing this:
declare hash h(dataset:'cabgvalexcl2019'); /*Load the codes straight from the one-column dataset*/
rc = h.definekey('icd');
rc = h.definedone();

Output ZIP dataset

I am looking for some help with creating a macro that uses an array along with DO and IF statements for subsetting. Within my macro I am trying to look across columns, and if a variable contains a specific diagnosis code, create a new variable set to 1 (and 0 otherwise). I then want to subset on that new variable, sort the resulting data set, and append it to the other quarterly data sets, creating one final data set that I can export (preferably as a newly created ZIP file to keep storage space down). I am using SAS 9.4 / Enterprise Guide 7.1.
Code:
OPTIONS MERROR SERROR SOURCE MLOGIC SYMBOLGEN MINOPERATOR OBS=MAX;
%MACRO DIAGXX(a,b);
DATA NEW;
SET x.&a(KEEP= PATID DIAG1-DIAG5);
ARRAY &b{5} $ DIAG1-DIAG5;
DO I = 1 TO 5;
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
ELSE MESO = 0;
END;
DROP I;
RUN;
PROC SORT DATA=NEW NODUPKEY;
BY PATID;
WHERE MESO=1;
RUN;
PROC APPEND BASE=ALLDATA1 DATA=NEW FORCE;
RUN;
PROC EXPORT
DATA=ALLDATA1
OUTFILE= "C:\x\x\DIAGNOSIS EXPORT\MACRO DIAGXX MESO.CSV"
REPLACE
DBMS=CSV;
RUN;
%MEND DIAGXX;
%DIAGXX(Q1,MESTH);
%DIAGXX(Q2,MESTH);
You probably want to create your MESO flag this way so that the presence of any of the codes in any of the variables in the array will set MESO to true and it will be false when the codes never appear in any of the variables.
MESO = 0;
DO I = 1 TO 5;
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
END;
If you want to get fancy you might save a little time by stopping the loop once the code is found.
DO I = 1 TO 5 WHILE (MESO=0);
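Putting the two suggestions together, here is an untested sketch of how the DATA NEW step inside the %DIAGXX macro could look; it only rearranges the statements already shown above:
DATA NEW;
SET x.&a(KEEP= PATID DIAG1-DIAG5);
ARRAY &b{5} $ DIAG1-DIAG5;
MESO = 0; /* default: no matching diagnosis code found */
DO I = 1 TO 5 WHILE (MESO=0); /* stop as soon as a code is found */
IF &b{I} IN ("1630" "1631" "1638" "1639") THEN MESO = 1;
END;
DROP I;
RUN;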

Loop through list

I have the following macro:
rsubmit;
data indexsecid;
input secid 1-6;
datalines;
108105
109764
102456
102480
101499
102434
107880
run;
%let endyear = 2014;
%macro getvols1;
* First I extract the secids for all the options given a date and
an expiry date;
%do yearnum = 1996 %to &endyear;
proc sql;
create table volsurface1&yearnum as
select a.secid, a.date, a.days, a.delta, a.impl_volatility,
a.impl_strike, a.cp_flag
from optionm.vsurfd&yearnum as a, indexsecid as b
where a.secid=b
and a.impl_strike NE -99.99
order by a.date, a.secid, a.impl_strike;
quit;
%if &yearnum > 1996 %then %do;
proc append base= volsurface11996 data=volsurface1&yearnum;
run;
%end;
%end;
%mend;
%getvols1;
proc download data=volsurface11996;
run;
endrsubmit;
data _null_;
set work.volsurface11996;
length fv $ 200;
fv = "C:\Users\user\Desktop\" || TRIM(put(indexsecid,4.)) || ".csv";
file write filevar=fv dsd dlm=',' lrecl=32000 ;
put (_all_) (:);
run;
On the code above I have: where a.secid=108105. Now I have a list with several secid and I need to run the macro once for each secid. I am looking to run it once and generate a new dataset for each secid.
How can I do that? Thanks
Here is an approach that uses
A single data step set statement to combine all the input datasets
A data set list so you don't have to call each input by name
A hash table to limit the output to your list of secids
proc sort to order the output
Rezza/DWal's approach to output separate csvs with file filevar =
%let startyear = 1996;
%let endyear = 2014;
data volsurface1;
/* Read in all the input tables */
set optionm.vsurfd&startyear.-optionm.vsurfd&endyear.;
where impl_strike ~= -99.99;
/* Set up a hash table containing all the wanted secids */
if _N_ = 1 then do;
declare hash h(dataset: "indexsecid");
_rc = h.defineKey("secid");
_rc = h.defineDone();
end;
/* Only keep observations where secid is matched in the hash table */
if not h.find();
/* Select which variables to output */
keep secid date days delta impl_volatility impl_strike cp_flag;
run;
/* Sort the data */
proc sort data = volsurface1;
by secid date impl_strike;
run;
/* Write out a CSV for each secid */
data _null_;
set volsurface1;
length fv $200;
fv = "\path\to\output\" || trim(put(secid, 6.)) || ".csv";
file write filevar = fv dsd dlm = ',' lrecl = 32000;
put (_all_) (:);
run;
As I don't have your data this is untested. The only constraint I can see is that the contents of indexsecid must fit in memory. If you were not concerned with the order this could be all done in one data step.
SRSwift, thank you for your comprehensive answer. It ran smoothly with no errors. The only issue is that I am running it on a remote server (Wharton) using:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=wrds;
signon username=_prompt_;
rsubmit;
and the log says it wrote the file to my folder on the server, but I can't see any file on the server. The log says:
NOTE: The file WRITE is:
Filename=/home/uni/user/108505.csv,
Owner Name=user,Group Name=uni,
Access Permission=rw-r--r--,
Last Modified=Wed Apr 1 20:11:20 2015
