Base SAS: Transposing Counts By Date Var - arrays

I have a dataset which gives a signup date for an overall service and 9 variables which give the purchase dates for different add-on products. I need to create a summary dataset/report which tracks the flow of these purchases over time. If an add-on date matches the signup date, that product was included with the signup package. Any add-on purchase date that comes after the signup date is a product purchased during the history of the active account. This is what it looks like:
data have ;
length ID 8
signup_DT 8 preferredhd_tv_estbd_dt 8
ultimate_estbd_dt 8 quant_estbd_dt 8
FullyLoaded_estbd_dt 8 HB_estbd_dt Cin_estbd_dt 8
time_estbd_dt 8 router_estbd_dt internet_estbd_dt 8;
INPUT ID
signup_DT : anydtdte9. preferredhd_tv_estbd_dt : anydtdte9.
ultimate_estbd_dt : anydtdte9. quant_estbd_dt : anydtdte9.
FullyLoaded_estbd_dt : anydtdte9. HB_estbd_dt : anydtdte9. Cin_estbd_dt : anydtdte9.
time_estbd_dt : anydtdte9. router_estbd_dt : anydtdte9. internet_estbd_dt : anydtdte9.;
format signup_DT preferredhd_tv_estbd_dt
ultimate_estbd_dt quant_estbd_dt
FullyLoaded_estbd_dt HB_estbd_dt Cin_estbd_dt
time_estbd_dt router_estbd_dt internet_estbd_dt date9.;
datalines;
98663699 4/7/14 4/9/14 4/7/14 9/12/14 10/15/14 7/7/14 4/7/14 4/7/14 4/12/14 .
33663798 4/11/14 . 4/11/14 . 4/11/14 4/11/14 4/11/14 4/11/14 6/11/14 7/15/14
43663463 5/12/14 5/12/14 5/12/14 9/5/14 9/17/14 . . . . .
77661437 5/16/14 . 5/16/14 . 10/31/14 . 5/16/14 5/16/14 11/16/14 .
85662295 5/29/14 . . 5/29/14 . 6/12/14 . . 11/16/14 .
36656756 6/4/14 . . . 6/4/14 6/4/14 6/12/14 6/4/14 6/4/14 12/4/14
67662646 6/14/14 . 6/14/14 8/31/14 . . 6/17/14 6/14/14 . 6/22/14
55663786 6/26/14 . . . 8/14/14 6/26/14 7/8/14 6/26/14 11/30/14 .
44663191 8/21/14 . 9/30/14 . . . . 1/12/15 . 10/31/14
;
The variables I'm trying to produce are:
1. Signup month (easy to do)
2. A count of the total number of signups for that month (easy to do)
3. An overall count of additional products which were included with signup
4. A variable which has all add-on product values (transposed from the original dataset)
5. A count of the different products purchased on the signup date
6. A count of add-on products purchased after the signup date but in the same month as the signup date
7. Month variables which count the additional add-on products by month
If I take just April, the output I'm looking for is something like this:
data want ;
length
Sign_up_Month $5
Sign_up_count 8
Initial_Products_total 8
Products $25
Prod_Purchased_on_Signup 8
AddPro_April_After_SU 8
May 8 June 8 July 8 August 8 September 8 October 8;
INPUT Sign_up_Month $
Sign_up_count
Initial_Products_total
Products $
Prod_Purchased_on_Signup
AddPro_April_After_SU
May June July August September October;
datalines;
April 2 8 preferredhd_tv_estbd_dt 1
April 2 8 ultimate_estbd_dt 2
April 2 8 quant_estbd_dt 1
April 2 8 FullyLoaded_estbd_dt 1 1
April 2 8 HB_estbd_dt 1
April 2 8 Cin_estbd_dt 2
April 2 8 time_estbd_dt 2
April 2 8 router_estbd_dt 1 1
April 2 8 internet_estbd_dt 1
;
Below is the code I have for the first three vars in the output data set: signup_month, Sign_up_count, Initial_Products_total.
proc sort data=have;
by ID signup_DT; run;
proc transpose data=have out=have (drop=_LABEL_);
by ID signup_DT; run;
data have;
set have;
if signup_DT=COL1 then Initial_flag=1;run;
proc sql;
create table have as
select distinct
count( distinct ID) as Sign_up_count ,
month (signup_DT) as signup_month,
sum (Initial_flag) as Initial_Products
from have
group by month ( signup_DT) ; quit;
I'm having trouble creating the remaining vars: Prod_Purchased_on_Signup, AddPro_April_After_SU and the counts by month.
I have been experimenting with arrays to try to accomplish this, without success so far.

I'm not certain from your question what level of aggregation you want your counts to be at. But here is a solution if you are looking for a summary for each distinct ID and signup date. This requires your original input to be sorted by ID signup_DT.
proc transpose
data = have
out = trans;
by ID signup_DT;
run;
/* Sort for by group processing and regular name order */
proc sort data = trans;
by ID signup_DT _NAME_;
run;
data products (drop = _NAME_ COL1 i);
set trans;
/* For by group processing */
by ID signup_DT;
/* Get the signup month as a word */
signup_month = put(signup_DT, monname.);
/* Make the product list variable to prevent truncation */
length Products $400.;
/* Retain so we can add to the variables as we go down through the group */
retain Products Sign_up_count signups_month0-signups_month4;
/* Set up array reference for later month counts so we can loop */
array som[5] signups_month0-signups_month4;
/* Reset our new variables */
if first.signup_DT then do;
Products = "";
Sign_up_count = 0;
do i = 1 to 5;
som[i] = 0;
end;
end;
/* Add to the list and count of signup products */
if signup_DT = COL1 then do;
Sign_up_count + 1;
Products = catx(" ", Products, _NAME_);
end;
/* Otherwise add to the later month counts by checking the number of months separating the dates */
else do i = 1 to 5;
if intck("month", signup_DT, COL1) = i - 1 then som[i] + 1;
end;
/* Only output once we have completed a group */
if last.signup_DT and Sign_up_count then output;
run;
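The bucketing logic in that data step can be checked outside SAS. Below is a minimal sketch in Python (not SAS), assuming one row is held as a dict of product name to purchase date; `months_between` and `summarize` are hypothetical helper names, and `months_between` mimics what INTCK('month', ...) returns for its default method (a count of month boundaries crossed).

```python
from datetime import date

def months_between(start, end):
    """Month boundaries crossed between two dates (like INTCK('month', start, end))."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def summarize(signup, purchases, n_months=5):
    """Split add-ons into those bought at signup and counts by month offset."""
    included = []            # products whose date equals the signup date
    later = [0] * n_months   # later[k] = add-ons bought k months after signup
    for name in sorted(purchases):
        bought = purchases[name]
        if bought is None:   # missing date: product never purchased
            continue
        if bought == signup:
            included.append(name)
        else:
            offset = months_between(signup, bought)
            if 0 <= offset < n_months:
                later[offset] += 1
    return included, later

# First row of the sample data (ID 98663699), dates transcribed by hand.
row = {
    "preferredhd_tv_estbd_dt": date(2014, 4, 9),
    "ultimate_estbd_dt": date(2014, 4, 7),
    "quant_estbd_dt": date(2014, 9, 12),
    "FullyLoaded_estbd_dt": date(2014, 10, 15),
    "HB_estbd_dt": date(2014, 7, 7),
    "Cin_estbd_dt": date(2014, 4, 7),
    "time_estbd_dt": date(2014, 4, 7),
    "router_estbd_dt": date(2014, 4, 12),
    "internet_estbd_dt": None,
}
included, later = summarize(date(2014, 4, 7), row)
print(included, later)
```

With five month buckets, purchases five or more months after signup (September and October for this row) fall outside the array, just as they do with signups_month0-signups_month4 in the data step.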

Related

SAS : How to use loop with date

I need to do inner join with a dataset which has date and month in its name, i.e.
Account_2019_10 (as in Oct 2019).
I need to perform this inner join in a loop for each month from a specific month-year until today's month-year (i.e. from Sept 2019 until July 2020). Considering the dataset has the month and year in the above format (2019_10 for Oct 2019), how would I perform this loop and append all the results in a group for that month-year?
To change the name of the dataset being referenced you will need to use some code generation. Typically just by using macro variables in place of the dataset name(s).
To loop over dates, use an offset value and the INTNX() function. You can use INTCK() to determine how many months to generate.
data _null_;
start = '01OCT2019'd ;
end = '01JUL2020'd ;
length name $32 names $1000;
do offset=0 to intck('month',start,end);
date=intnx('month',start,offset);
name='account_'||translate(put(date,yymm7.),'_','M');
names=catx(' ',names,name);
end;
call symputx('names',names);
run;
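For readers who want to check the expected list of names before running the step, here is a minimal sketch of the same month loop in Python (not SAS). The function name `monthly_names` and the `prefix` parameter are illustrative assumptions; only the account_yyyy_mm naming pattern comes from the question.

```python
from datetime import date

def monthly_names(start, end, prefix="account_"):
    """Build one dataset name per calendar month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}_{month:02d}")
        month += 1
        if month > 12:       # roll over into January of the next year
            year, month = year + 1, 1
    return names

names = monthly_names(date(2019, 10, 1), date(2020, 7, 1))
print(" ".join(names))
```

The space-separated string printed at the end corresponds to what the data step stores in the names macro variable.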
Now that you have this list of dataset names you can use it in your code to combine the datasets.
data all;
set &names ;
run;
If your monthly tables do not actually have a variable already that indicates the month you can add one by using the INDSNAME= option of the SET statement. Note that the variable created by that option is not saved so you need to copy the value.
data all;
length dsname $41 ;
set &names indsname=dsname;
month = dsname;
run;
Use the SQL data dictionary to identify the data sets containing a yyyy_mm construct.
Select them into a macro variable that will be used to combine all the data sets in a SET statement with INDSNAME option.
Example:
%macro makefakedata();
%local year month amount;
%do year = 2019 %to 2020;
%do month = 1 %to 12;
data work.account_&year._%sysfunc(putn(&month,z2.));
do id = 1 to 10;
amount = 100 * %sysfunc(monotonic()) + id;
output;
end;
run;
%end;
%end;
%mend;
data work.foo_bar;
set sashelp.class;
run;
%makefakedata;
ods listing;
proc sql noprint;
select
catx('.', libname, memname) as dataset
, input (
cats ( substr ( memname, index(memname,'_')+1 ) , '_01' )
, ? YYMMDD10.
) as month
into
:datasets separated by ' '
, :months separated by ' '
from
dictionary.tables
where libname = 'WORK'
and index(memname,'_')
having
month
;
%put &=datasets;
%put &=months;
data all_month_named_data;
set &datasets indsname=from;
source = from;
month = input (
cats ( substr ( source, index(source,'_')+1 ) , '_01' )
, YYMMDD10.);
format month yymm7.;
run;

Oracle database is returning date as 1951 instead of 2051

Hi, I am using an input field in the UI to enter a date, which is entered as 02/01/2051. It is saved into the database as 01-FEB-51; I am using DATE as the datatype. When I fetch it, it is returned as 01-feb-1951. Below is the query I am using:
select to_date(LN_MAT_DT,'dd/mm/YYYY') from Emp ;
Can someone please help with this?
It is about the format mask you use and the differences between RRRR and YYYY. Have a look at the following example:
SQL> select to_date('01.02.51', 'dd.mm.yy') date_yy,
2 to_date('01.02.51', 'dd.mm.rr') date_rr,
3 --
4 to_date('01.02.1951', 'dd.mm.yyyy') date_19_yyyy,
5 to_date('01.02.1951', 'dd.mm.rrrr') date_19_rrrr,
6 --
7 to_date('01.02.2051', 'dd.mm.yyyy') date_20_yyyy,
8 to_date('01.02.2051', 'dd.mm.rrrr') date_20_rrrr
9 from dual;
DATE_YY DATE_RR DATE_19_YY DATE_19_RR DATE_20_YY DATE_20_RR
---------- ---------- ---------- ---------- ---------- ----------
01.02.2051 01.02.1951 01.02.1951 01.02.1951 01.02.2051 01.02.2051
SQL>
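The difference between the two-digit masks in the first two columns can be sketched as a small function. This is a Python illustration of the documented RR windowing rule, not Oracle code; `expand_yy` and `expand_rr` are hypothetical names, and `current_year` stands in for the session's current date.

```python
def expand_yy(two_digit, current_year):
    """YY: always use the current century."""
    return current_year // 100 * 100 + two_digit

def expand_rr(two_digit, current_year):
    """RR: pick the century from a 50-year window around the current year."""
    century = current_year // 100 * 100
    current_two = current_year % 100
    if two_digit < 50 and current_two >= 50:
        century += 100       # e.g. '10' entered in 1999 means 2010
    elif two_digit >= 50 and current_two < 50:
        century -= 100       # e.g. '51' entered in 2024 means 1951
    return century + two_digit

print(expand_yy(51, 2024), expand_rr(51, 2024))
```

Run with any current year from 2000 to 2049, a two-digit year of 51 expands to 2051 under YY and 1951 under RR, which matches the DATE_YY and DATE_RR columns above.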
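What you should do is use a 4-digit year with the YYYY format mask.
Since LN_MAT_DT is already a DATE, you need to_char, not to_date. Calling to_date on a DATE first converts it implicitly to a string using the session's NLS_DATE_FORMAT, and a two-digit-year format there loses the century. That is: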
select to_char(LN_MAT_DT,'dd/mm/YYYY') from Emp ;

How to create date field using the date from data filename in SAS?

How do I get the dates from the file name to populate the date column?
I have 23 data files:
price_20070131
price_20070228
price_20070331
.
.
.
price_20081130
In the data file, price_20070131, it currently looks like this:
ID Product
001 A
002 B
003 C
I want my output to look like this:
ID Product Date
001 A 31Jan2007
002 B 31Jan2007
003 C 31Jan2007
The same will be repeated to all the 23 data files. And final result would merge all 23 files to look like this:
ID Product Date
001 A 31Jan2007
002 B 31Jan2007
003 C 31Jan2007
001 A 28Feb2007
002 B 28Feb2007
003 C 28Feb2007
.
.
.
.
001 A 30Nov2008
002 B 30Nov2008
003 C 30Nov2008
Use the INDSNAME option to add the file name and then use SCAN/SUBSTR() to extract the date portion. This would append all data sets starting with price_2007 and price_2008 and add a date field.
data want;
set price_2007: price_2008: indsname=source;
date=input(scan(source, 2, '_'), yymmdd10.);
format date date9.;
run;
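The extraction step (take the part after the underscore and read it as yyyymmdd) can be sketched as follows. This is a Python illustration rather than SAS, assuming the source name arrives in LIBREF.MEMBER form such as WORK.PRICE_20070131; `date_from_name` is a hypothetical helper and the three sample rows are taken from the question.

```python
from datetime import date, datetime

def date_from_name(dataset_name):
    """Read the yyyymmdd suffix after the last underscore as a date."""
    suffix = dataset_name.rsplit("_", 1)[-1]
    return datetime.strptime(suffix, "%Y%m%d").date()

# Stack two of the monthly files, tagging every row with its file's date.
sample = [("001", "A"), ("002", "B"), ("003", "C")]
rows = []
for name in ["WORK.PRICE_20070131", "WORK.PRICE_20070228"]:
    for id_, product in sample:
        rows.append((id_, product, date_from_name(name)))

for id_, product, d in rows:
    print(id_, product, d.strftime("%d%b%Y"))
```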
EDIT: SAS 9.1 is about 15 years old so you should really upgrade. Upgrades are included with your license. On 9.1 you don't have data set lists or the INDSNAME option, which means you need a macro solution of some sort. 4 lines of code becomes 47...
This assumes your data sets are consistently named price_ followed by the last day of the month in yyyymmdd form.
*sample data sets for demonstration;
data price_20080131;
set sashelp.class;
test=1;
run;
data price_20080229;
set sashelp.class;
test=2;
run;
%macro stack_data_add_date(start_date=, end_date=, outData=, debug=);
%*get parameters for looping, primarily the number of intervals;
data _null_;
start_date= input("&start_date", yymmdd10.);
end_date = input("&end_date", yymmdd10.);
n_intervals = intck('month', start_date, end_date);
call symputx('start_date', start_date, 'l');
call symputx('end_date', end_date, 'l');
call symputx('n_intervals', n_intervals, 'l');
run;
%*loop from 0 - starting time to end;
%do i=0 %to &n_intervals;
%*determine end of month date for dataset name;
%let date = %sysfunc(intnx(month, &start_date, &i., e));
%*output statistics for testing;
%if &debug=Y %then %do;
%put &n_intervals;
%put &start_date;
%put &end_date;
%end;
%*create a view with the data and date added in;
data _temp / view=_temp;
set price_%sysfunc(putn(&date, yymmddn8.));
date = &date.;
format date date9.;
run;
%*insert into master table;
proc append base=&outData data=_temp;
run;
%*delete view so it doesn't exist for next loop;
proc datasets lib=work nodetails nolist;
delete _temp / memtype=view;
run;quit;
%end;
%mend;
*test;
%stack_data_add_date(start_date=20080131, end_date=20080229, outData=want, debug=Y);

Nested If do statements SAS

I have a dataset which looks like this:
ID 2017 2018 2019 2020
2017 30 24 20 18
2018 30 24 20 18
2019 30 24 20 18
2020 30 24 20 18
I am looking to create an array based on a few inputs:
%let FixedorFloating = '1 or 0';
%let Repricingfrequency = n Years;
%let LastRepricingDate = 'Date'n;
So far my code looks like this:
data ReferenceRateContract;
set refratecontract;
*arrays for years and flags;
array _year(2017:2020) year2017-year2020;
array _flag(2017:2020) flag2017-flag2020;
*loop over array;
if &FixedorFloating=1;
do i=&dateoflastrepricing to hbound(_year);
/*check if year matches year in variable name*/
if put(ID, 4.) = compress(vname(_year(i)),, 'kd')
then _flag(i)=1;
else _flag(i)=0;
end;
else if &fixedorfloating=0;
do i=&dateoflastrepricing to hbound(_year);
if put (ID,4.)<=compress(vname(_year(i)),,'kd')
then _flag(i)=1;
else if put (ID, 4.) = compress(vname(_year(i-2*i)),, 'kd')
then _flag(i)=1;
else _flag(i)=0;
end;
drop i;
run;
The code works for the original if function but I'd like to make this more dynamic by introducing the else if FixedorFloating=0.
I'm also looking to make my function able to decipher whether the ID is on a year +2i year from the ID, i.e. if ID=2017, I'd like a 1 for years 2017, 2019. For ID=2018, I'd like a 1 for 2018, 2020 and so on, hence the year(I-2*I).
I'm unsure if this is reasonable or incorrect.
The error of the log looks like this:
82 else if &fixedorfloating=0;
____
160
ERROR 160-185: No matching IF-THEN clause.
84 then do i=&dateoflastrepricing to hbound(_year);
____
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
91 else _flag(i)=0;
92 end;
___
161
ERROR 161-185: No matching DO/SELECT statement.
I'm assuming the if do followed by an else-if do isn't structured properly.
the issue is here:
if &FixedorFloating=1;
do i=&dateoflastrepricing to hbound(_year);
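The first IF is a "gating" (subsetting) IF: with no THEN, only records matching the condition continue to be processed, and the DO that follows is a separate statement. The later ELSE therefore has no IF-THEN to attach to, which is what ERROR 160-185 reports.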
Try changing to:
if &FixedorFloating=1 then
do i=&dateoflastrepricing to hbound(_year);
data ReferenceRateContract;
set refratecontract;
*arrays for years and flags;
array _year(2017:2020) year2017-year2020;
array _flag(2017:2020) flag2017-flag2020;
*loop over array;
if &FixedorFloating=1
then do i=&dateoflastrepricing to hbound(_year);
/*check if year matches year in variable name*/
if put(ID, 4.) = compress(vname(_year(i)),, 'kd')
then _flag(i)=1;
else _flag(i)=0;
end;
else if &fixedorfloating=0
then do i=&dateoflastrepricing to hbound(_year);
if put (ID,4.)<=compress(vname(_year(i)),,'kd')
then _flag(i)=1;
else if put (ID, 4.) = compress(vname(_year(i-2*i)),, 'kd')
then _flag(i)=1;
else _flag(i)=0;
end;
drop i;
run;
by KurtBremser
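The corrected step fixes the syntax only. The subscript i-2*i simplifies to -i, which lies outside the declared bounds 2017:2020, so that branch would still need rethinking. As a hedged sketch of the "every second year starting at ID" rule described in the question, here it is in Python (not SAS); `flags_for`, `years` and `step` are hypothetical names standing in for the flag array and the repricing frequency.

```python
def flags_for(id_year, years=range(2017, 2021), step=2):
    """Flag id_year and every step-th year after it within the given range."""
    return {
        y: int(y >= id_year and (y - id_year) % step == 0)
        for y in years
    }

print(flags_for(2017))
print(flags_for(2018))
```

In a data step the same test could be written with MOD on the difference between the loop index and ID, which avoids computing a second subscript at all.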

Iterate through two datasets to create distinct results dataset

In SAS, I have the following two datasets:
Dataset #1: Data on people's meal preferences
ID | Meal | Meal_rank
1 Lobster 1
1 Cake 2
1 Hot Dog 3
1 Salad 4
1 Fries 5
2 Burger 1
2 Hot Dog 2
2 Pizza 3
2 Fries 4
3 Hot Dog 1
3 Salad 2
3 Soup 3
4 Lobster 1
4 Hot Dog 2
4 Burger 3
Dataset #2: Data on meal availability
Meal | Units_available
Hot Dog 2
Burger 1
Pizza 2
In SAS, I'd like to find a way to derive a result dataset that looks as follows (without changing anything in Dataset #1 or #2):
ID | Assigned_Meal
1 Hot Dog
2 Burger
3 Hot Dog
4 Meal cannot be assigned (out of stock/unavailable)
The results are driven by a process that iterates through the meals of each person (identified by their 'ID' values) until either:
A meal is found where there are enough units available.
All meals have been checked against the availability data.
Notably:
There are cases where the person lists a meal that isn't available.
The dataset I'm working with is much larger than in this example (thousands of rows).
Here is SAS code for creating the two sample datasets:
proc sql;
create table work.ppl_meal_pref
(ID char(4),
Meal char(20),
Meal_rank num);
insert into work.ppl_meal_pref
values('1','Lobster',1)
values('1','Cake',2)
values('1','Hot Dog',3)
values('1','Salad',4)
values('1','Fries',5)
values('2','Burger',1)
values('2','Hot Dog',2)
values('2','Pizza',3)
values('2','Fries',4)
values('3','Hot Dog',1)
values('3','Salad',2)
values('3','Soup',3)
values('4','Lobster',1)
values('4','Hot Dog',2)
values('4','Burger',3)
;
quit;
run;
proc sql;
create table work.lunch_menu
(FoodName char(14),
Units_available num);
insert into work.lunch_menu
values('Hot Dog',2)
values('Burger',1)
values('Pizza',1)
;
quit;
run;
I've tried to implement loops to perform this task, but to no avail (see below).
data work.assign_meals;
length FoodName $ 14 Units_available 8;
if (_n_ = 1) then do;
declare hash lookup(dataset:'work.lunch_menu', duplicate: 'error', ordered: 'ascending', multidata: 'NO');
lookup.defineKey('FoodName');
lookup.defineData('Units_available');
lookup.defineDone();
end;
do until (eof_pref);
set work.ppl_meal_pref END = eof_pref;
rc = lookup.FIND();
IF rc ne 0 THEN DO;
Units_available = 0;
end;
output;
end;
stop;
run;
Here is working hash-based code using the sample data from ealfons1. It assumes the availability data (lunch_menu in the question) is in a dataset named meals_stock. Having different variable names for the key (Meal versus FoodName) means you have to use extra syntax in the FIND() (or you could rename in the SET or DATASET specifiers).
It will also output an updated stock level dataset. Tracking the not assigned condition, i.e. what preferences were run out / not stocked for each ID who did not get a meal assignment, would require extra code and output data.
data meal_assignments;
if 0 then set meals_stock; * prep PDV;
declare hash stock (dataset:'meals_stock');
stock.defineKey('FoodName');
stock.defineData('FoodName', 'Units_available');
stock.defineDone();
do until (lastrow_flag);
assigned = 0;
stocked = 0;
do until (last.ID);
set ppl_meal_pref end=lastrow_flag;
by ID Meal_rank; * error will happen if meal_rank is not monotonic;
if assigned then continue; * already assigned;
if stock.find(key:Meal) ne 0 then continue; * off the menu;
stocked = 1;
if Units_available < 1 then continue; * out of stock or missing count;
Units_available + (-1);
if stock.replace() = 0 then do; * hash replace worked;
assigned = 1;
OUTPUT;
end;
else put 'WARNING: Problem with stock hash ' Meal=;
end;
if not assigned then do;
if stocked then Meal = 'Ran out'; else Meal = 'Not stocked';
OUTPUT;
end;
end;
keep ID Meal;
stock.output(dataset:'meals_stock_after_assignments');
stop;
run;
options nocenter;
title "Meals report";
proc print noobs data=meal_assignments; title2 "Assignments";
proc print noobs data=meals_stock_after_assignments; title2 "New stock levels";
proc sql;
title2 "Usage summary";
select A.Meal, A.have_count, B.had_count, B.had_count - A.have_count as use_count
from
(select FoodName as Meal, Units_available as have_count from meals_stock_after_assignments) as A
join
(select FoodName as Meal, Units_available as had_count from meals_stock) as B
on A.Meal = B.Meal
;
quit;
The 'want' here is queue based:
first come, first served by preference rank solution.
a random queue order over ID could deliver a modicum of perceived 'fairness'
More difficult solutions would be based on global planning, such as:
serve most people, highest preference rank
serve most people, lowest cost
etc ...
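The first come, first served rule can be written down compactly to check expected results against the SAS output. This is a minimal Python sketch (not SAS), assuming people are served in the order their IDs first appear; `assign_meals` is a hypothetical function name, and the stock figures are the ones inserted by the question's PROC SQL code.

```python
def assign_meals(preferences, stock):
    """Greedy assignment: each person gets their best-ranked meal still in stock."""
    remaining = dict(stock)              # copy so the input is not changed
    by_person = {}
    for person, meal, rank in preferences:
        by_person.setdefault(person, []).append((rank, meal))
    result = {}
    for person in by_person:             # dict keeps first-appearance order
        result[person] = "Not stocked"
        for rank, meal in sorted(by_person[person]):
            if meal not in remaining:    # off the menu
                continue
            if remaining[meal] < 1:      # on the menu but none left
                result[person] = "Ran out"
                continue
            remaining[meal] -= 1
            result[person] = meal
            break
    return result, remaining

prefs = [
    ("1", "Lobster", 1), ("1", "Cake", 2), ("1", "Hot Dog", 3),
    ("1", "Salad", 4), ("1", "Fries", 5),
    ("2", "Burger", 1), ("2", "Hot Dog", 2), ("2", "Pizza", 3), ("2", "Fries", 4),
    ("3", "Hot Dog", 1), ("3", "Salad", 2), ("3", "Soup", 3),
    ("4", "Lobster", 1), ("4", "Hot Dog", 2), ("4", "Burger", 3),
]
stock = {"Hot Dog": 2, "Burger": 1, "Pizza": 1}
result, remaining = assign_meals(prefs, stock)
print(result)
print(remaining)
```

Changing the order in which people are visited (for example, shuffling the IDs) is all it takes to try the random queue variant mentioned above.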
Another approach: modifying the meal availability dataset as you go along. This is slightly more concise than the hash approach but might not perform quite as well. On the other hand, it will still work even if your lunch_menu dataset is too large to fit conveniently into memory, and you have a record of what meals are left over afterwards. I have renamed variables for consistency between the input datasets:
proc sql;
create table work.ppl_meal_pref
(ID char(4),
Food char(20),
Meal_rank num);
insert into work.ppl_meal_pref
values('1','Lobster',1)
values('1','Cake',2)
values('1','Hot Dog',3)
values('1','Salad',4)
values('1','Fries',5)
values('2','Burger',1)
values('2','Hot Dog',2)
values('2','Pizza',3)
values('2','Fries',4)
values('3','Hot Dog',1)
values('3','Salad',2)
values('3','Soup',3)
values('4','Lobster',1)
values('4','Hot Dog',2)
values('4','Burger',3)
;
quit;
run;
proc sql;
create table work.lunch_menu
(Food char(20),
Units_available num);
insert into work.lunch_menu
values('Hot Dog',2)
values('Burger',1)
values('Pizza',1)
;
quit;
run;
proc datasets lib = work nolist nowarn nodetails;
modify lunch_menu;
index create Food /unique;
run;
quit;
/*Output to assigned_meals and update lunch_menu*/
data assigned_meals(keep = id AssignedFood AssignedFoodRank) lunch_menu;
length AssignedFood $ 20;
do until(last.ID);
set ppl_meal_pref;
by ID;
if missing(AssignedFood) then do;
modify lunch_menu key = Food;
if _iorc_ then _error_ = 0;
else if units_available > 0 then do;
AssignedFood = Food;
AssignedFoodRank = Meal_Rank;
units_available + -1;
replace lunch_menu;
end;
end;
end;
output assigned_meals;
run;
I never used the replace function of hashtables before and I did not test this code, but to my understanding, this should do the job:
/* build a dataset assign_meals with variables ID and Assigned_Meal */
data work.assign_meals (keep=ID Assigned_Meal);
/* Do that while reading ppl_meal_pref */
set work.ppl_meal_pref;
/* The BY statement lets us use first.ID to know when a new ID starts */
by ID;
/* Remember if someone is served (without retain, SAS forgets all values when reading a new observation) */
retain served;
if first.ID then served = 0;
/* but first read lunch_menu into memory */
length FoodName $ 14 Units_available 8;
if (_n_ = 1) then do;
declare hash lookup(dataset:'work.lunch_menu',
duplicate: 'error',
ordered: 'ascending',
multidata: 'NO');
lookup.defineKey('FoodName');
lookup.defineData('Units_available');
lookup.defineDone();
end;
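/* Long enough for a meal name or the "cannot be assigned" message */
length Assigned_Meal $ 60;
if not served then do;
/* Look up if the desired meal is available (the hash key is FoodName, so pass Meal explicitly) */
rc = lookup.FIND(key: Meal);
IF rc eq 0 THEN DO;
if Units_available gt 0 then do;
/* Serve this customer: set the result before writing the observation */
Assigned_Meal = Meal;
served = 1;
output;
/* Remember that a meal is used */
Units_available = Units_available - 1;
rc = lookup.REPLACE(key: Meal, data: Units_available);
end;
end;
end;
/* Nobody was served for this ID: write one row saying so */
if last.ID and not served then do;
Assigned_Meal = 'Meal cannot be assigned (out of stock/unavailable)';
output;
end;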
run;
I currently don't have the time to test it. If it does not work, tell me, so I can do that later.
