I need to do an inner join with a dataset which has the year and month in its name, e.g.
Account_2019_10 (as in Oct 2019).
I need to perform this inner join in a loop for each month from a specific month-year until today's month-year (e.g. from Sept 2019 until July 2020). Given that the dataset names carry the year and month in the above format (2019_10 for Oct 2019), how would I perform this loop and append all the results, grouped by month-year?
To change the name of the dataset being referenced you will need to use some code generation, typically just by using macro variables in place of the dataset name(s).
To loop over dates, use an offset value with the INTNX() function. You can use INTCK() to determine how many months to generate.
data _null_;
start = '01OCT2019'd ;
end = '01JUL2020'd ;
length name $32 names $1000;
do offset=0 to intck('month',start,end);
date=intnx('month',start,offset);
name='account_'||translate(put(date,yymm7.),'_','M');
names=catx(' ',names,name);
end;
call symputx('names',names);
run;
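As a quick sanity check on the loop bounds, the following minimal sketch prints what INTCK() and INTNX() return for the same start and end dates. The variable names n_months and last are only illustrative.

```sas
data _null_;
  start = '01OCT2019'd;
  end   = '01JUL2020'd;
  /* INTCK counts the month boundaries crossed between the two dates */
  n_months = intck('month', start, end);
  /* INTNX moves forward that many months, landing on the first day of the month */
  last = intnx('month', start, n_months);
  put n_months= last= date9.;   /* log: n_months=9 last=01JUL2020 */
run;
```

Because the loop runs from offset 0 to 9 inclusive, it generates ten dataset names, account_2019_10 through account_2020_07.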
Now that you have this list of dataset names you can use it in your code to combine the datasets.
data all;
set &names ;
run;
If your monthly tables do not actually have a variable already that indicates the month you can add one by using the INDSNAME= option of the SET statement. Note that the variable created by that option is not saved so you need to copy the value.
data all;
length dsname $41 ;
set &names indsname=dsname;
month = dsname;
run;
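The step above stores the full dataset name in MONTH. If a real SAS date is preferred, one option is to parse the year and month out of the name. This is a minimal sketch, assuming the value returned by INDSNAME= looks like WORK.ACCOUNT_2019_10; the two sample tables are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical sample tables, one row each */
data work.account_2019_10 work.account_2019_11;
  id = 1;
  output;   /* OUTPUT with no arguments writes to every dataset on the DATA statement */
run;

data all;
  length dsname $41;
  set work.account_2019_10 work.account_2019_11 indsname=dsname;
  /* Take the last two '_'-delimited pieces (year, month) and append day 01 */
  month = input(cats(scan(dsname, -2, '_'), scan(dsname, -1, '_'), '01'), yymmdd8.);
  format month yymm7.;
run;
```

Scanning from the right with negative word counts keeps the parsing independent of any underscores earlier in the libref or table name.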
Use the SQL data dictionary to identify the data sets containing a yyyy_mm construct.
Select them into a macro variable that will be used to combine all the data sets in a SET statement with INDSNAME option.
Example:
%macro makefakedata();
%local year month amount;
%do year = 2019 %to 2020;
%do month = 1 %to 12;
data work.account_&year._%sysfunc(putn(&month,z2.));
do id = 1 to 10;
amount = 100 * %sysfunc(monotonic()) + id;
output;
end;
run;
%end;
%end;
%mend;
data work.foo_bar;
set sashelp.class;
run;
%makefakedata;
ods listing;
proc sql noprint;
select
catx('.', libname, memname) as dataset
, input (
cats ( substr ( memname, index(memname,'_')+1 ) , '_01' )
, ? YYMMDD10.
) as month
into
:datasets separated by ' '
, :months separated by ' '
from
dictionary.tables
where libname = 'WORK'
and index(memname,'_')
having
month
;
quit;
%put &=datasets;
%put &=months;
data all_month_named_data;
set &datasets indsname=from;
source = from;
month = input (
cats ( substr ( source, index(source,'_')+1 ) , '_01' )
, YYMMDD10.);
format month yymm7.;
run;
How do I get the dates from the file name to populate the date column?
I have 23 data files:
price_20070131
price_20070228
price_20070331
.
.
.
price_20081130
In the data file, price_20070131, it currently looks like this:
ID Product
001 A
002 B
003 C
I want my output to look like this:
ID Product Date
001 A 31Jan2007
002 B 31Jan2007
003 C 31Jan2007
The same will be repeated to all the 23 data files. And final result would merge all 23 files to look like this:
ID Product Date
001 A 31Jan2007
002 B 31Jan2007
003 C 31Jan2007
001 A 28Feb2007
002 B 28Feb2007
003 C 28Feb2007
.
.
.
.
001 A 30Nov2008
002 B 30Nov2008
003 C 30Nov2008
Use the INDSNAME option to add the file name and then use SCAN/SUBSTR() to extract the date portion. This would append all data sets starting with price_2007 and price_2008 and add a date field.
data want;
set price_2007: price_2008: indsname=source;
date=input(scan(source, 2, '_'), yymmdd10.);
format date date9.;
run;
EDIT: SAS 9.1 is about 15 years old, so you should really upgrade; upgrades are included with your license. On 9.1 you don't have data set lists or the INDSNAME option, which means you need a macro solution of some sort. 4 lines of code becomes 47...
This assumes your data sets are consistently named PRICE_ followed by the last day of the month (yyyymmdd).
*sample data sets for demonstration;
data price_20080131;
set sashelp.class;
test=1;
run;
data price_20080229;
set sashelp.class;
test=2;
run;
%macro stack_data_add_date(start_date=, end_date=, outData=, debug=);
%*keep the loop variables local to the macro;
%local i date;
%*get parameters for looping, primarily the number of intervals;
data _null_;
start_date= input("&start_date", yymmdd10.);
end_date = input("&end_date", yymmdd10.);
n_intervals = intck('month', start_date, end_date);
call symputx('start_date', start_date, 'l');
call symputx('end_date', end_date, 'l');
call symputx('n_intervals', n_intervals, 'l');
run;
%*loop from 0 - starting time to end;
%do i=0 %to &n_intervals;
%*determine end of month date for dataset name;
%let date = %sysfunc(intnx(month, &start_date, &i., e));
%*output statistics for testing;
%if &debug=Y %then %do;
%put &n_intervals;
%put &start_date;
%put &end_date;
%end;
%*create a view with the data and date added in;
data _temp / view=_temp;
set price_%sysfunc(putn(&date, yymmddn8.));
date = &date.;
format date date9.;
run;
%*insert into master table;
proc append base=&outData data=_temp;
run;
%*delete view so it doesn't exist for next loop;
proc datasets lib=work nodetails nolist;
delete _temp / memtype=view;
run;quit;
%end;
%mend;
*test;
%stack_data_add_date(start_date=20080131, end_date=20080229, outData=want, debug=Y);
I have defined four macro variables here, each holding a different number of ICD-10 codes:
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
Now I want to create an array that makes it easy to map those macro variables to my ICD-10 table columns, so that I can assign flag variables from it.
The regular way would be:
data test; set input;
if (dx1 in ( &DX_27800_CODE) or dx2 in (&DX_27800_CODE) or dx3 in (&DX_27800_CODE))
then dx_27800 = 1; else dx_27800 =0;
run;
Done the regular way, I would need to repeat this procedure four times to get all four flag variables. So I'm wondering if it could be done using an array.
data test; set input;
array dx_code10 [4] &DX_27800_CODE &DX_27801_CODE &DX_2859_CODE &DX_6202_CODE;
ARRAY DX_VARIABLE[4] DX_27800 DX_27801 DX_2859 DX_6202;
DO I = 1 TO DIM(dx_code10);
IF (DX1 IN (DX_CODE10[I]) OR DX2 IN (DX_CODE10[I]) OR DX3 IN (DX_CODE10[I]))
THEN DX_VARIABLE[I] = 1;
ELSE DX_VARIABLE[I] = 0;
END;
END;
RUN;
But it seems it can't be done this way. Please help me solve this problem. Thanks.
I think a better approach is to use formats. I'd rather have those DX codes in a spreadsheet or a text file or something, and then input that to make the formats, but even with the not-best-practice %LETs, you can still use a format solution.
The approach is to make a format that maps each of those DX code lists to the corresponding dx value (27800, 27801, etc.), then use that to drive how you assign the follow-up array.
%LET DX_27800_CODE = 'E6609', 'E661', 'E668', 'E669';
%LET DX_27801_CODE = 'E6601';
%LET DX_2859_CODE = 'D649';
%LET DX_6202_CODE = 'N8320', 'N8329';
proc format;
value $dxcode
&dx_27800_code = '27800'
&dx_27801_code = '27801'
&dx_2859_code = '2859'
&dx_6202_code = '6202'
other=' '
;
run;
data input;
input dx1 $;
datalines;
E6601
E6609
E6608
E661
E668
D649
D650
N8320
E669
N8329
;;;;
run;
data want;
set input;
array dx_codes[4] dx_27800 dx_27801 dx_2859 dx_6202;
dx_code_val = put(dx1,$dxcode5.);
do _i = 1 to dim(dx_codes);
if dx_code_val = scan(vname(dx_codes[_i]),2,'_') then dx_codes[_i]=1;
else dx_codes[_i]=0;
end;
run;
For your specific example you could use the FINDW() function instead of the IN operator. Turn your code lists into delimited strings instead.
%LET DX_27800_CODE = E6609,E661,E668,E669;
%LET DX_27801_CODE = E6601 ;
%LET DX_2859_CODE = D649 ;
%LET DX_6202_CODE = N8320,N8329;
data test;
set input;
array dx_code_list (4) $200 _temporary_ ("&dx_27800_code" "&dx_27801_code" "&dx_2859_code" "&dx_6202_code");
array dx_variable (4) dx_27800 dx_27801 dx_2859 dx_6202;
array dx dx1-dx3 ;
do i = 1 to dim(dx_variable);
dx_variable(i)=0;
do j=1 to dim(dx) while (dx_variable(i)=0);
if findw(dx_code_list(i),dx(j),',','it') then dx_variable(i)=1;
end;
end;
drop i j;
run;
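To see why FINDW() is a reasonable substitute for IN here, this minimal sketch (variable names are illustrative) checks one code list for a full code and for a partial code. FINDW() only matches whole delimited words, so a prefix such as E66 does not count as a hit.

```sas
data _null_;
  list = 'E6609,E661,E668,E669';
  /* 'i' ignores case, 't' trims trailing blanks from both arguments */
  hit  = findw(list, 'E661', ',', 'it');   /* nonzero: position where the word starts */
  miss = findw(list, 'E66',  ',', 'it');   /* 0: E66 is not a complete word in the list */
  put hit= miss=;
run;
```

Unlike IN, which needs a list of constants, FINDW() accepts variables and array elements for both the list and the search word, which is what makes the array loop above possible.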
So if I make some sample data:
data input ;
length dx1-dx3 $7 ;
input dx1 - dx3 ;
cards;
E6609 E661 .
E668 E669 .
E6601 . .
D649 N8320 N8329
. . .
;
I get this result:
In SAS, I have the following two datasets:
Dataset #1: Data on people's meal preferences
ID | Meal | Meal_rank
1 Lobster 1
1 Cake 2
1 Hot Dog 3
1 Salad 4
1 Fries 5
2 Burger 1
2 Hot Dog 2
2 Pizza 3
2 Fries 4
3 Hot Dog 1
3 Salad 2
3 Soup 3
4 Lobster 1
4 Hot Dog 2
4 Burger 3
Dataset #2: Data on meal availability
Meal | Units_available
Hot Dog 2
Burger 1
Pizza 2
In SAS, I'd like to find a way to derive a result dataset that looks as follows (without changing anything in Dataset #1 or #2):
ID | Assigned_Meal
1 Hot Dog
2 Burger
3 Hot Dog
4 Meal cannot be assigned (out of stock/unavailable)
The results are driven by a process that iterates through the meals of each person (identified by their 'ID' values) until either:
A meal is found where there are enough units available.
All meals have been checked against the availability data.
Notably:
There are cases where the person lists a meal that isn't available.
The dataset I'm working with is much larger than in this example (thousands of rows).
Here is SAS code for creating the two sample datasets:
proc sql;
create table work.ppl_meal_pref
(ID char(4),
Meal char(20),
Meal_rank num);
insert into work.ppl_meal_pref
values('1','Lobster',1)
values('1','Cake',2)
values('1','Hot Dog',3)
values('1','Salad',4)
values('1','Fries',5)
values('2','Burger',1)
values('2','Hot Dog',2)
values('2','Pizza',3)
values('2','Fries',4)
values('3','Hot Dog',1)
values('3','Salad',2)
values('3','Soup',3)
values('4','Lobster',1)
values('4','Hot Dog',2)
values('4','Burger',3)
;
quit;
run;
proc sql;
create table work.lunch_menu
(FoodName char(14),
Units_available num);
insert into work.lunch_menu
values('Hot Dog',2)
values('Burger',1)
values('Pizza',1)
;
quit;
run;
I've tried to implement loops to perform this task, but to no avail (see below).
data work.assign_meals;
length FoodName $ 14 Units_available 8;
if (_n_ = 1) then do;
declare hash lookup(dataset:'work.lunch_menu', duplicate: 'error', ordered: 'ascending', multidata: 'NO');
lookup.defineKey('FoodName');
lookup.defineData('Units_available');
lookup.defineDone();
end;
do until (eof_pref);
set work.ppl_meal_pref END = eof_pref;
rc = lookup.FIND();
IF rc ne 0 THEN DO;
Units_available = 0;
end;
output;
end;
stop;
run;
Here is working hash-based code using the sample data from ealfons1; it assumes the availability data is stored in a dataset named meals_stock (lunch_menu in the question). Having different variable names for the key (Meal versus FoodName) means you have to use extra syntax in the FIND() call (or you could rename in the SET or DATASET specifiers).
It will also output an updated stock level dataset. Tracking the not-assigned condition in more detail, i.e. which preferences had run out or were not stocked for each ID who did not get a meal assignment, would require extra code and output data.
data meal_assignments;
if 0 then set meals_stock; * prep PDV;
declare hash stock (dataset:'meals_stock');
stock.defineKey('FoodName');
stock.defineData('FoodName', 'Units_available');
stock.defineDone();
do until (lastrow_flag);
assigned = 0;
stocked = 0;
do until (last.ID);
set ppl_meal_pref end=lastrow_flag;
by ID Meal_rank; * error will happen if meal_rank is not monotonic;
if assigned then continue; * already assigned;
if stock.find(key:Meal) ne 0 then continue; * off the menu;
stocked = 1;
if Units_available < 1 then continue; * out of stock or missing count;
Units_available + (-1);
if stock.replace() = 0 then do; * hash replace worked;
assigned = 1;
OUTPUT;
end;
else put 'WARNING: Problem with stock hash ' Meal=;
end;
if not assigned then do;
if stocked then Meal = 'Ran out'; else Meal = 'Not stocked';
OUTPUT;
end;
end;
keep ID Meal;
stock.output(dataset:'meals_stock_after_assignments');
stop;
run;
options nocenter;
title "Meals report";
proc print noobs data=meal_assignments; title2 "Assignments";
proc print noobs data=meals_stock_after_assignments; title2 "New stock levels";
proc sql;
title2 "Usage summary";
select A.Meal, A.have_count, B.had_count, B.had_count - A.have_count as use_count
from
(select FoodName as Meal, Units_available as have_count from meals_stock_after_assignments) as A
join
(select FoodName as Meal, Units_available as had_count from meals_stock) as B
on A.Meal = B.Meal
;
quit;
The 'want' here is queue based:
first come, first served by preference rank solution.
a random queue order over ID could deliver a modicum of perceived 'fairness'
More difficult solutions would be based on global planning, such as:
serve most people, highest preference rank
serve most people, lowest cost
etc ...
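The random queue order mentioned above could be sketched as follows. This is only an illustration, assuming the work.ppl_meal_pref table from the question; the names ids, queue_key and pref_random_queue are hypothetical, and the seed is arbitrary.

```sas
/* One row per ID */
proc sort data=work.ppl_meal_pref(keep=ID) out=ids nodupkey;
  by ID;
run;

/* Give each ID a random position in the queue */
data ids;
  set ids;
  call streaminit(20200701);      /* arbitrary seed so the order is repeatable */
  queue_key = rand('uniform');
run;

/* Order the preferences by queue position, then by rank within each ID */
proc sql;
  create table pref_random_queue as
  select p.*, i.queue_key
  from work.ppl_meal_pref as p
       inner join ids as i
       on p.ID = i.ID
  order by i.queue_key, p.ID, p.Meal_rank;
quit;
```

The assignment step would then read pref_random_queue with a BY statement of queue_key ID Meal_rank, so that LAST.ID still marks the end of each person's preferences.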
Another approach: modifying the meal availability dataset as you go along. This is slightly more concise than the hash approach but might not perform quite as well. On the other hand, it will still work even if your lunch_menu dataset is too large to fit conveniently into memory, and you have a record of what meals are left over afterwards. I have renamed variables for consistency between the input datasets:
proc sql;
create table work.ppl_meal_pref
(ID char(4),
Food char(20),
Meal_rank num);
insert into work.ppl_meal_pref
values('1','Lobster',1)
values('1','Cake',2)
values('1','Hot Dog',3)
values('1','Salad',4)
values('1','Fries',5)
values('2','Burger',1)
values('2','Hot Dog',2)
values('2','Pizza',3)
values('2','Fries',4)
values('3','Hot Dog',1)
values('3','Salad',2)
values('3','Soup',3)
values('4','Lobster',1)
values('4','Hot Dog',2)
values('4','Burger',3)
;
quit;
run;
proc sql;
create table work.lunch_menu
(Food char(20),
Units_available num);
insert into work.lunch_menu
values('Hot Dog',2)
values('Burger',1)
values('Pizza',1)
;
quit;
run;
proc datasets lib = work nolist nowarn nodetails;
modify lunch_menu;
index create Food /unique;
run;
quit;
/*Output to assigned_meals and update lunch_menu*/
data assigned_meals(keep = id AssignedFood AssignedFoodRank) lunch_menu;
length AssignedFood $ 20;
do until(last.ID);
set ppl_meal_pref;
by ID;
if missing(AssignedFood) then do;
modify lunch_menu key = Food;
if _iorc_ then _error_ = 0;
else if units_available > 0 then do;
AssignedFood = Food;
AssignedFoodRank = Meal_Rank;
units_available + -1;
replace lunch_menu;
end;
end;
end;
output assigned_meals;
run;
I never used the replace function of hashtables before and I did not test this code, but to my understanding, this should do the job:
/* build a dataset assign_meals with variables ID and Assigned_Meal */
data work.assign_meals (keep=ID Assigned_Meal);
/* Do that while reading ppl_meal_pref */
set work.ppl_meal_pref;
/* The BY statement lets us use first.ID to know when a new ID starts */
by ID;
/* Remember if someone is served (without retain, SAS forgets all values when reading a new observation) */
retain served;
if first.ID then served = 0;
/* but first read lunch_menu into memory */
length FoodName $ 14 Units_available 8;
if (_n_ = 1) then do;
declare hash lookup(dataset:'work.lunch_menu',
duplicate: 'error',
ordered: 'ascending',
multidata: 'NO');
lookup.defineKey('FoodName');
lookup.defineData('Units_available');
lookup.defineDone();
end;
if not served then do;
/* Look up if the desired meal is available; the hash key is FoodName, so pass Meal explicitly */
rc = lookup.FIND(key: Meal);
IF rc eq 0 THEN DO;
if Units_available gt 0 then do;
/* Serve this customer: set Assigned_Meal before writing the row */
Assigned_Meal = Meal;
served = 1;
output;
/* Remember that a meal is used */
Units_available = Units_available - 1;
lookup.REPLACE(key: Meal, data: Units_available);
end;
end;
end;
run;
I currently don't have the time to test it. If it does not work, tell me, so I can do that later.
I have the following macro:
rsubmit;
data indexsecid;
input secid 1-6;
datalines;
108105
109764
102456
102480
101499
102434
107880
run;
%let endyear = 2014;
%macro getvols1;
* First I extract the secids for all the options given a date and
an expiry date;
%do yearnum = 1996 %to &endyear;
proc sql;
create table volsurface1&yearnum as
select a.secid, a.date, a.days, a.delta, a.impl_volatility,
a.impl_strike, a.cp_flag
from optionm.vsurfd&yearnum as a, indexsecid as b
where a.secid=b
and a.impl_strike NE -99.99
order by a.date, a.secid, a.impl_strike;
quit;
%if &yearnum > 1996 %then %do;
proc append base= volsurface11996 data=volsurface1&yearnum;
run;
%end;
%end;
%mend;
%getvols1;
proc download data=volsurface11996;
run;
endrsubmit;
data _null_;
set work.volsurface11996;
length fv $ 200;
fv = "C:\Users\user\Desktop\" || TRIM(put(indexsecid,4.)) || ".csv";
file write filevar=fv dsd dlm=',' lrecl=32000 ;
put (_all_) (:);
run;
In the code above I had: where a.secid=108105. Now I have a list of several secids and I need to run the macro once for each secid. I would like to run it once and generate a new dataset for each secid.
How can I do that? Thanks
Here is an approach that uses
A single data step set statement to combine all the input datasets
A data set list so you don't have to call each input by name
A hash table to limit the output to your list of secids
proc sort to order the output
Rezza/DWal's approach to output separate csvs with file filevar =
%let startyear = 1996;
%let endyear = 2014;
data volsurface1;
/* Read in all the input tables */
set optionm.vsurfd&startyear.-optionm.vsurfd&endyear.;
where impl_strike ~= -99.99;
/* Set up a hash table containing all the wanted secids */
if _N_ = 1 then do;
declare hash h(dataset: "indexsecid");
_rc = h.defineKey("secid");
_rc = h.defineDone();
end;
/* Only keep observations where secid is matched in the hash table */
if not h.find();
/* Select which variables to output */
keep secid date days delta impl_volatility impl_strike cp_flag;
run;
/* Sort the data */
proc sort data = volsurface1;
by secid date impl_strike;
run;
/* Write out a CSV for each secid */
data _null_;
set volsurface1;
length fv $200;
fv = "\path\to\output\" || trim(put(secid, 6.)) || ".csv";
file write filevar = fv dsd dlm = ',' lrecl = 32000;
put (_all_) (:);
run;
As I don't have your data this is untested. The only constraint I can see is that the contents of indexsecid must fit in memory. If you were not concerned with the order this could be all done in one data step.
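The single data step variant mentioned above might look roughly like this. It is an untested sketch under the same assumptions as the code above (the optionm.vsurfd tables, the indexsecid list, and the startyear and endyear macro variables). The MOD option is added because, without sorting, the step can return to a file it has already written, and reopening without MOD would overwrite it.

```sas
data _null_;
  /* Read in all the input tables */
  set optionm.vsurfd&startyear.-optionm.vsurfd&endyear.;
  where impl_strike ~= -99.99;
  /* Load the wanted secids once */
  if _N_ = 1 then do;
    declare hash h(dataset: "indexsecid");
    _rc = h.defineKey("secid");
    _rc = h.defineDone();
  end;
  /* Keep only observations whose secid is in the hash table */
  if h.check() = 0;
  length fv $200;
  fv = cats("\path\to\output\", secid, ".csv");
  /* MOD appends, so rows for one secid can arrive in separate runs of rows */
  file write filevar = fv dsd dlm = ',' lrecl = 32000 mod;
  /* List the variables explicitly so helper variables such as _rc are not written */
  put secid date days delta impl_strike impl_volatility cp_flag;
run;
```

Because MOD appends, any CSV files left over from an earlier run should be deleted first, otherwise the new rows are added to the old ones.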
SRSwift, thank you for your comprehensive answer. It ran smoothly with no errors. The only issue is that I am running it on a remote server (wharton) using:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=wrds;
signon username=_prompt_;
rsubmit;
and the log says it wrote the file to my folder on the server, but I can't see any file on the server. The log says:
NOTE: The file WRITE is:
Filename=/home/uni/user/108505.csv,
Owner Name=user,Group Name=uni,
Access Permission=rw-r--r--,
Last Modified=Wed Apr 1 20:11:20 2015
I need to create a summary dataset/report which tracks the flow of these purchases over time. I have a dataset which gives a signup date for an overall service and 9 variables which give the purchase dates for different add-on products. If an add-on date matches the signup date, that add-on product was included with the signup package. Any add-on purchase date that comes after the signup date is a product purchased during the history of the active account. This is what it looks like:
data have ;
length ID 8
signup_DT 8 preferredhd_tv_estbd_dt 8
ultimate_estbd_dt 8 quant_estbd_dt 8
FullyLoaded_estbd_dt 8 HB_estbd_dt Cin_estbd_dt 8
time_estbd_dt 8 router_estbd_dt internet_estbd_dt 8;
INPUT ID
signup_DT : anydtdte9. preferredhd_tv_estbd_dt : anydtdte9.
ultimate_estbd_dt : anydtdte9. quant_estbd_dt : anydtdte9.
FullyLoaded_estbd_dt : anydtdte9. HB_estbd_dt : anydtdte9. Cin_estbd_dt : anydtdte9.
time_estbd_dt : anydtdte9. router_estbd_dt : anydtdte9. internet_estbd_dt : anydtdte9. ;
format signup_DT preferredhd_tv_estbd_dt
ultimate_estbd_dt quant_estbd_dt
FullyLoaded_estbd_dt HB_estbd_dt Cin_estbd_dt
time_estbd_dt router_estbd_dt internet_estbd_dt date9.;
datalines;
98663699 4/7/14 4/9/14 4/7/14 9/12/14 10/15/14 7/7/14 4/7/14 4/7/14 4/12/14 .
33663798 4/11/14 . 4/11/14 . 4/11/14 4/11/14 4/11/14 4/11/14 6/11/14 7/15/14
43663463 5/12/14 5/12/14 5/12/14 9/5/14 9/17/14 . . . . .
77661437 5/16/14 . 5/16/14 . 10/31/14 . 5/16/14 5/16/14 11/16/14 .
85662295 5/29/14 . . 5/29/14 . 6/12/14 . . 11/16/14 .
36656756 6/4/14 . . . 6/4/14 6/4/14 6/12/14 6/4/14 6/4/14 12/4/14
67662646 6/14/14 . 6/14/14 8/31/14 . . 6/17/14 6/14/14 . 6/22/14
55663786 6/26/14 . . . 8/14/14 6/26/14 7/8/14 6/26/14 11/30/14 .
44663191 8/21/14 . 9/30/14 . . . . 1/12/15 . 10/31/14
;
The variables I’m trying to produce are:
1. Signup month (easy to do)
2. A count of the total number of signups for that month (easy to do)
3. An overall count of additional products which were included with signup
4. A variable which has all add-on product values (transposed from the original dataset)
5. A count of the different products purchased on the startup date
6. A count of add-on products purchased after the signup date but in the same month as the signup date
7. Then month variables which count the additional add-on products by month
If I take just April, the output I'm looking for is something like this:
data want ;
length
Sign_up_Month $5
Sign_up_count 8
Initial_Products_total 8
Products $25
Prod_Purchased_on_Signup 8
AddPro_ April_After_SU 8
May 8 June 8 July 8 August 8 September 8 October 8;
INPUT Sign_up_Month $
Sign_up_count
Initial_Products_total
Products $
Prod_Purchased_on_Signup
AddPro_ April_After_SU
May June July August September October;
datalines;
April 2 8 preferredhd_tv_estbd_dt 1
April 2 8 ultimate_estbd_dt 2
April 2 8 quant_estbd_dt 1
April 2 8 FullyLoaded_estbd_dt 1 1
April 2 8 HB_estbd_dt 1
April 2 8 Cin_estbd_dt 2
April 2 8 time_estbd_dt 2
April 2 8 router_estbd_dt 1 1
April 2 8 internet_estbd_dt 1
;
Below is the code I have for the first three vars in the output data set: signup_month, Sign_up_count, Initial_Products_total.
proc sort data=have;
by ID signup_DT; run;
proc transpose data=have out=have (drop=_LABEL_);
by ID signup_DT; run;
data have;
set have;
if signup_DT=COL1 then Initial_flag=1;run;
proc sql;
create table have as
select distinct
count( distinct ID) as Sign_up_count ,
month (signup_DT) as signup_month,
sum (Initial_flag) as Initial_Products
from have
group by month ( signup_DT) ; quit;
I'm having trouble creating the remaining vars: Prod_Purchased_on_Signup, AddPro_ April_After_SU and the counts by month.
I have been experimenting with arrays to try to accomplish this, but without success.
I'm not certain from your question what level of aggregation you want your counts to be at. But here is a solution if you are looking for summary for each distinct ID and sign-up date. This requires your original input sorted by ID signup_DT.
proc transpose
data = have
out = trans;
by ID signup_DT;
run;
/* Sort for by group processing and regular name order */
proc sort data = trans;
by ID signup_DT _NAME_;
run;
data products (drop = _NAME_ COL1 i);
set trans;
/* For by group processing */
by ID signup_DT;
/* Get the signup month as a word */
signup_month = put(signup_DT, monname.);
/* Give the product list variable enough length to prevent truncation */
length Products $400;
/* Retain so we can add to the variables as we go down through the group */
retain Products Sign_up_count signups_month0-signups_month4;
/* Set up array reference for later month counts so we can loop */
array som[5] signups_month0-signups_month4;
/* Reset our new variables */
if first.signup_DT then do;
Products = "";
Sign_up_count = 0;
do i = 1 to 5;
som[i] = 0;
end;
end;
/* Add to the list and count of sign up products */
if signup_DT = COL1 then do;
Sign_up_count + 1;
Products = catx(" ", Products, _NAME_);
end;
/* Otherwise add to the later month counts by checking the months separating the dates */
else do i = 1 to 5;
if intck("month", signup_DT, COL1) = i - 1 then som[i] + 1;
end;
/* Only output once we have completed a group */
if last.signup_DT and Sign_up_count then output;
run;
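The month offsets used above come from INTCK(), which counts calendar month boundaries rather than elapsed 30-day periods. This minimal sketch uses dates from the first row of the sample data plus one illustrative month-end pair; the variable names are hypothetical.

```sas
data _null_;
  signup = '07APR2014'd;
  /* Same calendar month as signup: offset 0 */
  same_month = intck('month', signup, '09APR2014'd);
  /* One boundary crossed, although the dates are only a day apart: offset 1 */
  next_month = intck('month', '30APR2014'd, '01MAY2014'd);
  /* April to September: offset 5 */
  sept = intck('month', signup, '12SEP2014'd);
  put same_month= next_month= sept=;   /* log: same_month=0 next_month=1 sept=5 */
run;
```

Since the five-element array covers offsets 0 to 4 only, a purchase five or more months after signup (such as the September date above) is not counted; widening the array and the loop bound would be needed to include it.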