Create a sequence index in SAS using a DO loop

Say I have a set of data in this format:
ID  Product  Account_open_date
1   A        20100101
1   B        20100103
2   C        20100104
2   A        20100205
2   D        20100605
3   A        20100101
I want to capture the order in which the products were opened, so that the table looks like this:
ID  First  Second  Third
1   A      B
2   C      A       D
3   A
I know I need to create a sequence number within each ID so that I can transpose the data afterwards:
ID  Product  Account_open_date  Sequence
1   A        20100101           1
1   B        20100103           2
2   C        20100104           1
2   A        20100205           2
2   D        20100605           3
3   A        20100101           1
From my limited knowledge of DO loops, I think I need to write something like this:
if first.ID and not last.ID then n=1 do while ID not last n+1
Something like that. Can anyone help me with the exact syntax? I have tried googling for similar code and haven't had much luck.
Thanks!

I'd sort by ID and then date and use proc transpose for simplicity. Here's an example:
data prod;
input ID Product $ Open_DT :yymmdd8.;
format open_dt date9.;
datalines;
1 A 20100101
1 B 20100103
2 C 20100104
2 A 20100205
2 D 20100605
3 A 20100101
;
run;
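/* order each customer's accounts by open date */
proc sort data=prod;
   by ID Open_DT;
run;
/* one row per ID; Product values become ITEM1, ITEM2, ... in date order */
proc transpose data=prod
   out=prod_trans(drop=_name_)
   prefix=ITEM;
   by id;
   var Product;
run;
proc print data=prod_trans noobs;
run;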
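If you do want the explicit sequence column described in the question (for example, to check the order before transposing), BY-group processing produces it without an explicit DO loop. A minimal sketch, assuming PROD has already been sorted by ID and Open_DT as above; PROD_SEQ and PROD_WIDE are hypothetical data set names.

```sas
/* Sketch: number each account within ID (assumes PROD is sorted by ID Open_DT) */
data prod_seq;
   set prod;
   by ID;
   if first.ID then sequence = 0;  /* reset the counter at the start of each ID */
   sequence + 1;                   /* sum statement: retained and incremented on every row */
run;

/* the counter can drive the column names directly through the ID statement */
proc transpose data=prod_seq out=prod_wide(drop=_name_) prefix=ITEM;
   by ID;
   id sequence;                    /* columns become ITEM1, ITEM2, ... */
   var Product;
run;
```

The sum statement (`sequence + 1;`) is what replaces the `n+1` loop from the question: the variable is retained across observations, and FIRST.ID resets it for each new customer.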

Related

Merge or join unequal datasets and repeat the single value of one of them in SAS

I am trying to merge two datasets, df1 and df2, where df2 has only one observation. I want its value repeated on every row of df1 when I merge them in SAS.
I know I could add the value manually, but I want an automated approach, because this is just one step in a long program that runs on large data.
Here is a reproducible example and datasets:
data df1;
input a b c;
datalines;
1 2 3
6 7 8
5 6 9
;
run;
data df2;
input d ;
datalines;
4
;
run;
data df3;
merge df1 df2;
run;
/*I need the resulting df3 to be */;
a b c d
1 2 3 4
6 7 8 4
5 6 9 4
Any help will be greatly appreciated.
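Then you don't want to MERGE the datasets, since there are no common variables for a BY statement, and a merge without BY pairs observations by position, so d would only be filled on the first row.
Instead, just SET both datasets, taking care not to read past the end of the single-observation dataset. Variables read with SET are retained, so the value carries down to every row.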
data want;
set long_dataset;
if _n_=1 then set short_dataset;
run;
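Applied to the question's df1 and df2, the pattern looks like this. The PROC SQL cross join is a different technique, shown only as an alternative; note that SQL does not guarantee row order without an ORDER BY.

```sas
/* Sketch: DF2 is read only on the first iteration; D is retained for every DF1 row */
data df3;
   set df1;
   if _n_ = 1 then set df2;
run;

/* Alternative technique: a Cartesian (cross) join also repeats D on every row */
proc sql;
   create table df3_sql as
   select a.*, b.d
   from df1 as a, df2 as b;
quit;
```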

Retain last 5 visits by Person in SAS

I have the following data, which contains dates, the visit number, and a specific variable of interest. I would like to keep the last five available visits for each person in SAS. I am familiar with keeping the first and last visits. The data for a single subject are listed below:
Person Date VisitNumber VariableOfInterest
001 10/10/2001 1 6
001 11/12/2001 3 8
001 01/05/2002 5 12
001 03/10/2002 6 5
001 05/03/2002 8 3
001 07/29/2002 10 11
Any insight would be appreciated.
A double DOW loop lets you measure the group in the first loop and select from the group based on your per-group criteria in the second loop. This is useful when HAVE is large and already sorted, and you want to avoid additional sorting.
data want;
* measure the group size;
do _n_ = 1 by 1 until (last.person);
set have;
by person visitnumber; * visitnumber in by only to enforce expectation of orderness;
end;
_i_ = _n_; * _n_ now holds the number of rows for this person;
* apply the criteria "last 5 rows in group";
do _n_ = 1 to _n_;
set have;
if _i_ - _n_ < 5 then output;
end;
drop _i_;
run;
It is easier if you sort by descending VisitNumber, so that the problem becomes: take the first 5 observations for a person. Then just generate a counter of which observation this is for the person and subset on that.
proc sort data=have out=have_desc;
by person descending visitnumber;
run;
data want;
set have_desc;
by person descending visitnumber;
if first.person then rowno=0;
rowno+1;
if rowno <= 5;
run;
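To try either answer, the single subject from the question can be read into HAVE. A minimal sketch, assuming the dates are month/day/year and that Person should stay character so the leading zeros survive.

```sas
/* Sketch: the question's sample rows as a SAS data set named HAVE */
data have;
   input Person $ Date :mmddyy10. VisitNumber VariableOfInterest;
   format Date mmddyy10.;
   datalines;
001 10/10/2001 1 6
001 11/12/2001 3 8
001 01/05/2002 5 12
001 03/10/2002 6 5
001 05/03/2002 8 3
001 07/29/2002 10 11
;
run;
```

With six visits, both answers should keep visits 3, 5, 6, 8 and 10 and drop visit 1; the double DOW version keeps the original ascending order, while the descending-sort version returns them from newest to oldest.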

SAS Function that can create every possible combination

I have a dataset that looks like this.
data test;
input cat1 $ cat2 $ score;
datalines;
A D 1
A D 2
A E 3
A E 4
A F 4
B D 3
B D 2
B E 6
B E 5
B F 6
C D 8
C D 5
C E 4
C E 12
C E 2
C F 7
;
run;
I want to create summary tables from this data. For example, I want one table that sums every score across all values of cat1 and cat2, like so:
proc sql;
create table all as select
'all' as cat1
,'all' as cat2
,sum(score) as score
from test
group by 1,2
;quit;
I want a table that sums all the scores for cat1='A', regardless of cat2, like so:
proc sql;
create table a_all as select
cat1
,'all' as cat2
,sum(score) as score
from test
where
cat1='A'
group by 1,2
;quit;
I want a table that sums the score for cat1='A' and cat2='E', like so:
proc sql;
create table a_e as select
cat1
,cat2
,sum(score) as score
from test
where
cat1='A'
and
cat2='E'
group by 1,2
;quit;
And so on and so forth. I want a comprehensive set of tables that covers every possible combination. I can use loops if they are efficient. The problem is that the real data set I'm using has 8 categories (as opposed to the 2 here), and within those categories there are as many as 98 levels. So the loops I've been writing are nested 8 levels deep and take a lot of time. They are a pain to debug, too.
Is there some kind of function or a special array I can apply that will create this series of tables I'm talking about? Thanks!
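I think you want what PROC SUMMARY does by default: with a CLASS statement (and no NWAY option) it computes the sum for every combination of the class variables, including the grand total, in a single output data set. The _TYPE_ variable identifies which combination each row belongs to, and class variables that are not part of that combination are missing rather than 'all'.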
data test;
input cat1 $ cat2 $ score;
datalines;
A D 1
A D 2
A E 3
A E 4
A F 4
B D 3
B D 2
B E 6
B E 5
B F 6
C D 8
C D 5
C E 4
C E 12
C E 2
C F 7
;
run;
proc print;
run;
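/* CHARTYPE writes _TYPE_ as a 0/1 string, one digit per CLASS variable */
proc summary data=test chartype;
class cat:;                     /* every variable whose name starts with CAT */
output out=summary sum(score)=; /* keep the name SCORE for the sum */
run;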
proc print;
run;
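If separate tables are still needed, they can be split out of the single SUMMARY data set by _TYPE_. A minimal sketch, assuming CHARTYPE output with CAT1 and CAT2 as the class variables in that order; ALL, A_ALL and A_E mirror the table names in the question.

```sas
/* Sketch: with CHARTYPE, _TYPE_ has one digit per CLASS variable (CAT1, CAT2):
   '00' = grand total, '10' = CAT1 only, '01' = CAT2 only, '11' = CAT1 and CAT2 */
data all a_all a_e;
   set summary;
   if _type_ = '00' then output all;                                     /* every score */
   else if _type_ = '10' and cat1 = 'A' then output a_all;               /* CAT1='A', any CAT2 */
   else if _type_ = '11' and cat1 = 'A' and cat2 = 'E' then output a_e;  /* CAT1='A', CAT2='E' */
run;
```

For the sample data, ALL should hold a score of 74, A_ALL 14 and A_E 7. With 8 class variables there are 2**8 = 256 combinations; the TYPES or WAYS statement of PROC SUMMARY can restrict which of them are computed.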

Condense multiple rows to a single row with counts based on unique values in SQLite

I am trying to condense a table which contains multiple rows per event to a smaller table which contains counts of key sub-events within each event. Events are defined based on unique combinations across columns.
As a specific example, say I have the following data involving customer visits to various stores on different dates with different items purchased:
cust date store item_type
a 1 Main St 1
a 1 Main St 2
a 1 Main St 2
a 1 Main St 2
b 1 Main St 1
b 1 Main St 2
b 1 Main St 2
c 1 Main St 1
d 2 Elm St 1
d 2 Elm St 3
e 2 Main St 1
e 2 Main St 1
a 3 Main St 1
a 3 Main St 2
I would like to restructure the data to a table that contains a single line per customer visit on a given day, with appropriate counts. I am trying to understand how to use SQLite to condense this to:
Index cust date store n_items item1 item2 item3 item4
1 a 1 Main St 4 1 3 0 0
2 b 1 Main St 3 1 2 0 0
3 c 1 Main St 1 1 0 0 0
4 d 2 Elm St 2 1 0 1 0
5 e 2 Main St 2 2 0 0 0
6 a 3 Main St 2 1 1 0 0
I can do this in Excel for this trivial example (begin with sumproduct(customer * date) as suggested here, followed by a cumulative sum on this column to generate Index, then COUNTIF and COUNTIFS to generate the desired counts).
Excel is poorly suited to doing this for thousands of rows, so I am looking for a solution using SQLite.
Sadly, my SQLite kung-fu is weak.
I think this is the closest I have found, but I am having trouble understanding exactly how to adapt it.
When I tried a more basic first step, generating a unique index:
CREATE UNIQUE INDEX ui ON t(cust, date);
I get:
Error: indexed columns are not unique
I would greatly appreciate any help with where to start. Many thanks in advance!
To create one result record for each unique combination of column values, use GROUP BY.
The number of records in the group is available with COUNT.
To count specific item types, use a boolean expression like item_type=x, which returns 0 or 1, and sum this over all records in the group:
SELECT cust,
date,
store,
COUNT(*) AS n_items,
SUM(item_type = 1) AS item1,
SUM(item_type = 2) AS item2,
SUM(item_type = 3) AS item3,
SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
date,
store

SAS: use a lookup dataset like an array in another dataset

I have one data set with content descriptions for a school:
contents:
num description
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
In another data set (students) I have the variables content1-content5, and I use a flag of 1 to indicate each content a student takes.
students
name  age  content1  content2  content3  content4  content5
BOB   15   1         1         1         .         1
BRYA  16   .         .         .         .         .
CARL  15   .         1         .         .         1
SUE   17   .         .         1         1         1
LOU   15   .         .         .         .         1
If I use code like this:
data students1;
set students;
array content[5];
format allcontents $100.;
do i=1 to dim(content);
if content[i]=1 then do;
allcontents=cat(vname(content[i]),',',allcontents);
end;
end;
run;
the result is:
name age content1 content2 content3 content4 content5 allcontents
BOB 15 1 1 1 1 content1,content2,content3,content5,
BRYA 16
CARL 15 1 1 content2,content5,
SUE 17 1 1 1 content3,content4,content5,
LOU 15 1 content5
1) In the variable allcontents, I want to use the content descriptions from the lookup data set (contents) instead of the array variable names content1-content5. How can I do that?
2) Later, I want the result organized by content description rather than by student, like this:
description name age
math BOB 15
spanish BOB 15
geography BOB 15
history BOB 15
spanish CARL 15
history CARL 15
spanish SUE 17
chemistry SUE 17
history SUE 17
history LOU 15
Is it possible?
Thanks.
First, grab the %create_hash() macro from this post.
Use the hash table to look up the values.
data students1;
set students;
array content[5];
format num $32. description $16.;
if _n_ = 1 then do;
%create_hash(cnt,num,description,"contents");
end;
do i=1 to 5;
if content[i]=1 then do;
num = vname(content[i]);
rc = cnt.find();
output;
end;
end;
keep description name age;
run;
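If the %create_hash() macro is not available, the same lookup can be written with the hash object declared inline, and one pass can answer both questions. A minimal sketch, assuming the lookup data set is WORK.CONTENTS with character variables NUM and DESCRIPTION, and that the values of NUM match the STUDENTS variable names exactly (hash keys are case-sensitive); STUDENTS_ALL and STUDENTS_LONG are hypothetical output names.

```sas
data students_all(keep=name age allcontents)     /* question 1: one row per student */
     students_long(keep=description name age);   /* question 2: one row per content */
   if 0 then set contents;            /* compile time only: defines NUM and DESCRIPTION */
   if _n_ = 1 then do;                /* load the lookup table once */
      declare hash cnt(dataset: "contents");
      cnt.defineKey("num");
      cnt.defineData("description");
      cnt.defineDone();
   end;
   set students;
   array content[5];                  /* refers to the existing content1-content5 */
   length allcontents $100;
   do i = 1 to dim(content);
      if content[i] = 1 then do;
         num = vname(content[i]);     /* e.g. 'content3' */
         if cnt.find() = 0 then do;   /* 0 = key found, DESCRIPTION filled in */
            allcontents = catx(',', allcontents, description);
            output students_long;
         end;
      end;
   end;
   output students_all;
run;
```

CATX inserts the comma only between non-blank values, so no leading comma has to be removed afterwards; for BOB the expected value is math,spanish,geography,history.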
I find PROC TRANSPOSE suitable here. Transposing once is enough for question 2), and transposing a second time renames the variables content1-content5 (question 1). The key is the ID statement in PROC TRANSPOSE, which names the transposed variables after the values of the ID variable.
The code below should give you the desired answers (although the names are ordered alphabetically, which may not match your original ordering).
/* original data sets */
data names;
input num $ description :$16.; * default $8 would truncate geography and chemistry;
cards;
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
;run;
data students;
input name $ age content1 content2 content3 content4 content5;
cards;
BOB 15 1 1 1 . 1
BRYA 16 . . . . .
CARL 15 . 1 . . 1
SUE 17 . . 1 1 1
LOU 15 . . . . 1
;run;
/* transpose */
proc sort data=students out=tmp_sorted;
by name age;
run;
proc transpose data=tmp_sorted out=tmp_transposed;
by name age;
run;
/* merge the names of content1-5 */
* If you want to preserve ordering from contents1-contents5
* instead of alphabetical ordering of "description" column
* from a-z, do not drop the "num" column for further use.;
proc sql;
create table tmp_merged as
select B.description, A.name, A.age, B.num, A.COL1
from tmp_transposed as A
left join names as B
on A._NAME_=B.num
order by A.name, B.num;
quit;
/* transpose again */
proc transpose data=tmp_merged(drop=num) out=tmp_renamed(drop=_name_);
by name age;
ID description; *name the transposed variables;
run;
/* answer (1) */
data ans1;
set tmp_renamed;
array content[5] math--history;
format allcontents $100.;
do i=1 to dim(content);
* use CATS, not CAT: CAT keeps the trailing blanks of ALLCONTENTS, so anything appended is truncated;
if content[i]=1 then allcontents=cats(allcontents,',',vname(content[i]));
end;
*kill the leading comma;
allcontents=substr(allcontents,2,99);
run;
/* answer (2) */
data ans2(drop=num col1);
set tmp_merged;
where col1=1;
run;
*cleanup;
proc datasets lib=work nolist;
delete tmp_:;
quit;
