SAS: use a lookup dataset like an array in another dataset

I have one data set that holds the content descriptions for a school.
contents:
num description
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
In another data set (students) I have the array content1-content5, and I use a flag to indicate which contents each student has.
students
name age content1 content2 content3 content4 content5
BOB 15 1 1 1 . 1
BRYA 16 . . . . .
CARL 15 . 1 . . 1
SUE 17 . . 1 1 1
LOU 15 . . . . 1
If I use code like this:
data students1;
set students;
array content[5];
format allcontents $100.;
do i=1 to dim(content);
if content[i]=1 then do;
allcontents=cat(vname(content[i]),',',allcontents);
end;
end;
run;
the result is:
name age content1 content2 content3 content4 content5 allcontents
BOB 15 1 1 1 . 1 content1,content2,content3,content5,
BRYA 16 . . . . .
CARL 15 . 1 . . 1 content2,content5,
SUE 17 . . 1 1 1 content3,content4,content5,
LOU 15 . . . . 1 content5
1) I want allcontents to hold the content names from the lookup table (data set contents) rather than the array variable names content1-content5. How can I do that?
2) Later I want the result listed by content description, not by student, like this:
description name age
math BOB 15
spanish BOB 15
geography BOB 15
history BOB 15
spanish CARL 15
history CARL 15
spanish SUE 17
chemistry SUE 17
history SUE 17
history LOU 15
is it possible?
thanks.

First, grab the %create_hash() macro from this post.
Use the hash table to look up the values.
data students1;
set students;
array content[5];
format num $32. description $16.;
if _n_ = 1 then do;
%create_hash(cnt,num,description,"contents");
end;
do i=1 to 5;
if content[i]=1 then do;
num = vname(content[i]);
rc = cnt.find();
output;
end;
end;
keep description name age;
run;
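The %create_hash() macro is not shown here, so what it expands to is unknown. As a rough sketch of the same lookup, assuming the macro only wraps a hash object keyed on num with description as its data, the step could be written with the native DECLARE HASH syntax. It also assumes contents holds the character variables num and description exactly as in the question.

```sas
data students1;
  /* define num and description in the PDV with the lengths used in contents */
  if 0 then set contents;
  if _n_ = 1 then do;
    /* load the lookup table once: key = num, data = description */
    declare hash cnt(dataset: "contents");
    cnt.defineKey("num");
    cnt.defineData("description");
    cnt.defineDone();
  end;
  set students;
  array content[5];
  do i = 1 to dim(content);
    if content[i] = 1 then do;
      /* vname gives the variable name, e.g. content1, which is the lookup key */
      num = vname(content[i]);
      /* find() returns 0 when the key exists and fills description */
      if cnt.find() = 0 then output;
    end;
  end;
  keep description name age;
run;
```

This writes one row per student and flagged content, in student order, which matches the layout asked for in question 2).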

I find proc transpose suitable here. Transposing once is enough for question 2); transposing a second time renames the variables content1-content5 to their descriptions (hence question 1). The key is the ID statement in proc transpose, which names the transposed variables after the values of the ID variable.
The code below should give you the desired answers (although the rows are ordered alphabetically by name, which may not match your original ordering).
/* original data sets */
data names;
input num $ description $;
cards;
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
;run;
data students;
input name $ age content1 content2 content3 content4 content5;
cards;
BOB 15 1 1 1 . 1
BRYA 16 . . . . .
CARL 15 . 1 . . 1
SUE 17 . . 1 1 1
LOU 15 . . . . 1
;run;
/* transpose */
proc sort data=students out=tmp_sorted;
by name age;
run;
proc transpose data=tmp_sorted out=tmp_transposed;
by name age;
run;
/* merge the names of content1-5 */
* If you want to preserve ordering from contents1-contents5
* instead of alphabetical ordering of "description" column
* from a-z, do not drop the "num" column for further use.;
proc sql;
create table tmp_merged as
select B.description, A.name, A.age, B.num, A.COL1
from tmp_transposed as A
left join names as B
on A._NAME_=B.num
order by A.name, B.num;
quit;
/* transpose again */
proc transpose data=tmp_merged(drop=num) out=tmp_renamed(drop=_name_);
by name age;
ID description; *name the transposed variables;
run;
/* answer (1) */
data ans1;
set tmp_renamed;
array content[5] math--history;
format allcontents $100.;
do i=1 to dim(content);
* use cats rather than cat because cat keeps the trailing blanks of allcontents and the appended text is then truncated away;
if content[i]=1 then allcontents=cats(allcontents,',',vname(content[i]));
end;
*kill the leading comma;
allcontents=substr(allcontents,2,99);
run;
/* answer (2) */
data ans2(drop=num col1);
set tmp_merged;
where col1=1;
run;
*cleanup;
proc datasets lib=work nolist;
delete tmp_:;
quit;
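To see the effect of the ID statement on its own, here is a small self-contained sketch. The data set names long and wide and the variable flag are made up for illustration and are not part of the answer above.

```sas
/* one row per student and content, already sorted by name */
data long;
  input name $ description $ flag;
  datalines;
BOB math 1
BOB spanish 1
SUE spanish 1
SUE history 1
;
run;

/* ID description turns each description value into a variable name */
proc transpose data=long out=wide(drop=_name_);
  by name;
  id description;
  var flag;
run;

proc print data=wide noobs;
run;
```

The output data set wide has one row per name and one column per description value (math, spanish, history), with missing values where a student has no row for that content.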

Related

Modifying final column value in SAS by group

I have the following data set:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
002 1 4
002 5 9
002 10 14
I would like to make the last 'TestDayEnd' the final value for 'TestDayStart' for each Student.
So the data should look like this:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
001 15 15
002 1 4
002 5 9
002 10 14
002 14 14
I'm not quite sure how I can do this in SAS. Any insight would be appreciated.
After sorting the dataset you can do this within a data step.
proc sort data=have;
by student testdaystart testdayend;
run;
Now you can use the by and retain statements in the data step. The by statement lets you detect the last record for each student (last.student), and the retain statement keeps a value from one iteration of the data step to the next.
data want;
set have;
retain last_testdayend;
by student testdaystart testdayend;
output;
last_testdayend = testdayend;
if last.student then do;
if testdaystart ne testdayend then do;
testdaystart = last_testdayend;
testdayend = last_testdayend;
output; * this second output statement creates a new record in the dataset;
end;
end;
drop last_testdayend;
run;
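Since the value copied into the extra row comes from the same record that last.student flags, a shorter variant without the retained helper variable is also possible. This is only a sketch under the same assumption that have is sorted by student.

```sas
data want;
  set have;
  by student;
  /* write the original record unchanged */
  output;
  /* on the last record of each student, add a closing row */
  if last.student and testdaystart ne testdayend then do;
    testdaystart = testdayend;
    output;
  end;
run;
```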

Replacing null value in SAS with next available value by group

I am trying to replace missing values that occur before the first non-null entry in SAS. I have the following data:
StudentID Day TestScore
Student001 0 .
Student001 1 78
Student001 2 89
Student002 3 .
Student002 4 .
Student002 5 .
Student002 6 95
I'd like to modify the data so the null values are replaced with the next available non-null entry:
StudentID Day TestScore
Student001 0 78
Student001 1 78
Student001 2 89
Student002 3 95
Student002 4 95
Student002 5 95
Student002 6 95
data scores;
length StudentID $ 10;
input StudentID $ Day TestScore;
datalines;
Student001 0 .
Student001 1 78
Student001 2 89
Student002 3 .
Student002 4 .
Student002 5 .
Student002 6 95
;
run;
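proc sort data = scores;
by StudentID descending Day;
run;
data scores;
drop addscore;
retain addscore;
set scores;
by StudentID;
/* reset the carried score so it never leaks from one student to the next */
if first.StudentID then addscore = .;
if testscore ne . then addscore = testscore;
if testscore eq . then testscore = addscore;
run;
proc sort data = scores;
by StudentID Day;
run;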
proc sort data = have;
by id descending day ;
run;
data want;
set have;
by id;
retain last_score;
if first.id then call missing(last_score);
if not missing(score) then last_score = score;
else score = last_score;
run;
proc sort data=want;
by id day;
run;
FYI, this will NOT set the missing values if there are any after the last known score for a given ID. i.e. if you had something like:
Student002 5 95
Student002 6 .
Then only the records before day 5 for ID 002 will get a value of 95, and day 6 stays missing. Is that a possible condition for you? If so, this solution will require a slight modification.
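One way such a modification could look, as a sketch only: after the backward fill above, run a second, forward pass that carries the last known score into any rows that are still missing. It assumes want is sorted by id and day as left by the final proc sort; the names want2 and prev_score are made up here.

```sas
data want2;
  set want;
  by id;
  retain prev_score;
  /* start each id with no carried value */
  if first.id then call missing(prev_score);
  /* fill trailing gaps with the most recent non-missing score */
  if missing(score) then score = prev_score;
  else prev_score = score;
  drop prev_score;
run;
```

An id whose scores are all missing still ends up with missing values, since there is nothing to carry in either direction.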
You can use a DOW loop to identify the next non-missing score, and a subsequent DOW loop to apply the non-missing score. The DOW approach does not require sorting and maintains the original row order.
data want;
do _n_ = 1 by 1 until (last.id or not missing(score));
set have;
by id;
end;
_score = score;
do _n_ = 1 to _n_;
set have;
score = _score;
output;
end;
drop _score;
run;
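The steps that read have expect the variables id, day and score rather than the names used in the question. A minimal sketch of a matching test data set, built from the question's sample values:

```sas
data have;
  length id $ 10;
  input id $ day score;
  datalines;
Student001 0 .
Student001 1 78
Student001 2 89
Student002 3 .
Student002 4 .
Student002 5 .
Student002 6 95
;
run;
```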
In SQL, presuming day ordering, the imputed value can be looked up in a correlated sub-query.
proc sql;
create table want as
select
id, day,
case
when not missing(score) then score
else (select score from have as inner
where inner.id = outer.id
and inner.day > outer.day
and not missing(score)
having inner.day = min(inner.day)
)
end as score
from have as outer;
quit;

SAS combining datasets, binary search, indices

In SAS, for the two test datasets below - for every value of "amount" that falls within "y" and "z", I need to extract the corresponding "x". There could be multiple values of "x" that fit into the criteria.
The final result should look something like this:
/*
4 banana eggs
15 .
31 .
7 banana
22 fig
1 eggs
11 coconut
17 date
41 apple
*/
I realize this relies on using indices or binary searches, but I can't figure out a workable solution! Any help would be appreciated! Thanks!
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
;
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
;
run;
Join the two datasets so amount falls between y and z.
proc sql;
create table join as
select a.amount
,b.*
from test2 a
left join
test1 b
on a.amount between b.y and b.z;
quit;
Sort the result by amount for transpose.
proc sort data=join; by amount; run;
Transpose it.
proc transpose data=join out=trans;
by amount;
var x;
run;
Now you have your fruits each in its own variable named col1, col2, ....
If you want them all in one variable separated by a blank, just concatenate them.
data trans2(keep= amount text);
set trans(drop=_name_);
array v{*} _character_;
text = catx(' ', of v{*});
run;
Here is a possible solution using "old-fashioned" data step code plus PROC TRANSPOSE:
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
run;
data want(keep=amount x);
set test2;
found = 0;
do _i_=1 to nobs;
set test1 point=_i_ nobs=nobs;
if y <= amount <= z then do;
found = 1;
output;
end;
end;
if not found then do;
x = ' ';
output;
end;
run;
proc transpose data=want out=want2(drop=_name_);
by amount notsorted;
var x;
run;
Note that my results do not match those in your example: amount 31 falls in the range for "apple".

parsing a text file in sas

So I have a rather messy text file I'm trying to convert to a sas data set. It looks something like this (though much bigger):
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
I'm trying to set 5 separate variables (ID, name, job, time, hours) but clearly only 3 of the variables appear after the first line. I tried this:
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover;
input ID $ name $ job $ time hours;
and didn't get the right output, then I tried to parse it
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover; input
allData $; id = substr(allData, find(allData,"305")-2, 7);
but I'm still not getting the right output. Any ideas?
EDIT: I'm now trying to use scan() and substr() to take apart the larger data set. How do I subset a single line from the data?
Your data might not be all that messy; it just might be in a hierarchical format where the first row contains all five variables and subsequent rows contain values for variables 3-5. In other words, ID and NAME should be retained as you read through the file.
If that is correct (a hierarchical layout), here is a possible solution:
data have;
retain ID NAME;
informat ID 7. JOB $6. TIME 3. HOURS 1.;
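/* read the first 7 columns and hold the line with a trailing @ */
input @1 test_string $7. @;
/* all digits means the line starts with an ID, so it carries all five variables */
if notdigit(test_string) = 0
then input @1 ID NAME $12. JOB time hours;
else input @1 JOB time hours;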
drop test_string;
datalines;
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
0305680 JONES, MARY ARCH06 002 4
ARCH06 005 3
ARCH07 001 7
run;
The key thing is to really understand how your raw file is organized. Once you know the rules, using SAS to read it is a snap!
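To read the external file instead of in-stream data, the same logic could be combined with an INFILE statement. This is a sketch that assumes the file at the path given in the question has the layout shown in the example; TRUNCOVER is added here so that lines shorter than the 7-column test field do not make SAS move on to the next line.

```sas
data have;
  /* path taken from the question; adjust to the real location */
  infile "C:\Users\Desktop\jobs.txt" truncover;
  retain ID NAME;
  informat ID 7. JOB $6. TIME 3. HOURS 1.;
  /* peek at the first 7 columns and hold the line */
  input @1 test_string $7. @;
  /* header lines start with a numeric ID; detail lines do not */
  if notdigit(test_string) = 0
    then input @1 ID NAME $12. JOB time hours;
    else input @1 JOB time hours;
  drop test_string;
run;
```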
A list input solution could be the following:
data have;
array all(6) $20 ID LNAME FNAME JOB TIME HOURS;
retain Id Lname Fname;
drop i;
/* load the current line into _infile_ and hold it */
input @;
nitems = countw(_infile_,', ');
/* a line that starts with a numeric ID carries all six items */
if notdigit(scan(_infile_,1)) = 0 then
do i = 1 to nitems;
all(i) = Scan(_infile_,i);
end;
else
do i = 1 to 3;
/* detail lines hold only JOB, TIME and HOURS */
all(i+3) = Scan(_infile_,i);
end;
datalines;
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
0305680 JONES, MARY ARCH06 002 4
ARCH06 005 3
ARCH07 001 7
run;
proc print; run;

create index in SAS using do loop

Say I have a set of data in this format:
ID Product account open date
1 A 20100101
1 B 20100103
2 C 20100104
2 A 20100205
2 D 20100605
3 A 20100101
And I want to create a column to capture the sequence of the products opened so the table will look like this:
ID First Second third
1 A B
2 C A D
3 A
I know I need to create an index for each ID so I can transpose the data afterwards:
ID Product account open date sequence
1 A 20100101 1
1 B 20100103 2
2 C 20100104 1
2 A 20100205 2
2 D 20100605 3
3 A 20100101 1
From my limited knowledge in do loop, I think I need to write something like this:
if first.ID and not last.ID then n=1 do while ID not last n+1
Something like that. Can anyone help me with the exact syntax? I have tried googling for similar codes and haven't had much luck.
Thanks!
I'd sort by ID and then date and use proc transpose for simplicity. Here's an example:
data prod;
input ID Product $ Open_DT :yymmdd8.;
format open_dt date9.;
datalines;
1 A 20100101
1 B 20100103
2 C 20100104
2 A 20100205
2 D 20100605
3 A 20100101
;
run;
proc sort data=prod;
by ID Open_DT;run;
proc transpose data=prod
out=prod_trans(drop=_name_)
prefix=ITEM;
by id;
var Product;
run;
proc print data=prod_trans noobs;
run;
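If the sequence column itself is wanted, as in the intermediate table from the question, a counter that restarts for each ID can be built with BY-group processing instead of an explicit do loop. A minimal sketch, assuming prod is already sorted by ID and Open_DT as above; the names prod_seq and sequence are illustrative.

```sas
data prod_seq;
  set prod;
  by ID;
  /* restart the counter at the first record of each ID */
  if first.ID then sequence = 0;
  /* the sum statement retains sequence across rows automatically */
  sequence + 1;
run;
```

The sequence variable can then be used in an ID statement in proc transpose if the output columns should be numbered explicitly.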
