Modifying final column value in SAS by group - arrays

I have the following data set:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
002 1 4
002 5 9
002 10 14
For each Student, I would like to append a final row in which both 'TestDayStart' and 'TestDayEnd' equal that student's last 'TestDayEnd'.
So the data should look like this:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
001 15 15
002 1 4
002 5 9
002 10 14
002 14 14
I'm not quite sure how I can do this in SAS. Any insight would be appreciated.

After sorting the dataset you can do this within a data step.
proc sort data=have;
by student testdaystart testdayend;
run;
Now you can use the by and retain statements in the data step. The by statement creates the automatic variable last.student, which flags the last record for each student, and the retain statement keeps the value of last_testdayend from one iteration of the data step to the next.
data want;
set have;
retain last_testdayend;
by student testdaystart testdayend;
output;
last_testdayend = testdayend;
if last.student then do;
if testdaystart ne testdayend then do;
testdaystart = last_testdayend;
testdayend = last_testdayend;
output; * this second output statement creates a new record in the dataset;
end;
end;
drop last_testdayend;
run;
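To try the steps above, the sample rows from the question can be loaded first. This is only a sketch: it assumes the input data set is named have (as in the answer) and reads Student as a character variable so that the leading zeros survive. Run it before the proc sort.

```sas
/* Sample data copied from the question.                      */
/* Student is read as character ($) to keep the leading zeros. */
data have;
  input Student $ TestDayStart TestDayEnd;
  datalines;
001 1 5
001 6 10
001 11 15
002 1 4
002 5 9
002 10 14
;
run;
```

After the sort and the data step have run, proc print data=want; run; lists the result for comparison with the desired output in the question.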

Related

SAS: Adding value after last entry in column

I have the following data set:
Student TestDay Score
001 1 85
001 6 76
001 7 89
002 1 92
002 5 82
002 7 93
I'd like to add a '100' value after the last non-empty value in the column 'Score', as well as add one to the value of TestDay. So the new data would look like the following:
Student TestDay Score
001 1 85
001 6 76
001 7 89
001 8 100
002 1 92
002 5 82
002 7 93
002 8 100
No need for arrays or loops.
data want;
set have;
by student;
output;
if last.student then do;
score=100;
testday=testday+1;
output;
end;
run;

SAS macro to print out change of baseline scores

I'm looking for a way to print out the change in test scores for each subject with a SAS macro. Here is a sample of the data:
Subject Visit Date Test Score
001 Baseline 01/01/99 Jump 5
001 Baseline 01/01/99 Reach 3
001 Week 6 02/12/99 Jump 7
001 Week 6 02/12/99 Reach 6
002 Baseline 03/01/99 Jump 2
002 Baseline 03/01/99 Reach 4
002 Week 6 04/12/99 Jump 5
002 Week 6 04/12/99 Reach 9
I would like to create a macro that generates the following for each subject:
Subject Visit Date (Days from Baseline) Test Score Change from Baseline Score
001 Baseline 01/01/99 Jump 5
01/01/99 Reach 3
001 Week 6 02/12/99 (42) Jump 7 +2
02/12/99 (42) Reach 6 +3
002 Baseline 03/01/99 Jump 2
03/01/99 Reach 4
002 Week 6 04/12/99 (42) Jump 5 +3
04/12/99 (42) Reach 9 +5
I believe I can just use the INTCK function for the Days from Baseline, but I'm not sure how to print out each test without retaining the 'Subject' and 'Visit' values in each row. Any help would be much appreciated.
You can sort by subject, test, and date, then use a retain to carry the baseline date and score forward for computing deltas. The printout can be done with Proc REPORT, formatting the delta values appropriately.
Example:
* Visit and Date are read with the & modifier, so each value must be followed by at least two blanks;
data have; input
Subject Visit& $8. Date& mmddyy8. Test $ Score; format date mmddyy8.; datalines;
001 Baseline  01/01/99  Jump 5
001 Baseline  01/01/99  Reach 3
001 Week 6  02/12/99  Jump 7
001 Week 6  02/12/99  Reach 6
002 Baseline  03/01/99  Jump 2
002 Baseline  03/01/99  Reach 4
002 Week 6  04/12/99  Jump 5
002 Week 6  04/12/99  Reach 9
run;
proc sort data=have;
by subject test date;
run;
data for_report;
set have;
by subject test;
retain base_date base_score;
if first.test then do; * reset for every subject/test group so a baseline never carries over;
base_date = .;
base_score = .;
end;
if first.test and visit='Baseline' then do;
base_date = date;
base_score = score;
end;
if not first.test then do;
delta_days = intck('days', date, base_date);
delta_score = score - base_score;
end;
run;
proc format;
picture plus low-0 = [best12.] other = '000000009' (prefix='+');
options missing=' ';
proc report data=for_report;
columns subject visit date delta_days test score delta_score;
define subject / order;
define visit / order order=data;
format delta_days negparen.;
format delta_score plus.;
run;
options missing='.';
An alternate report can be more subject-centric:
proc report data=for_report
style(lines) = [just=left fontweight=bold]
;
columns subject visit date delta_days test score delta_score;
define subject / order noprint;
define visit / order order=data;
format delta_days negparen.;
format delta_score plus.;
compute before subject;
subj = catx(' ', "Subject:", subject);
line subj $200.;
endcomp;
run;
Here is one way of doing it. The SQL-step calculates changes from baseline. The case-when-construct is only there to suppress zeroes in the output.
Printing using group-variables in proc report means Subject- and Visit-values are not retained on every line (but note that subject is not repeated each week).
I put the code in a macro, as that was the question. It doesn't really do much, however.
/* Creating test data*/
data testdata;
input Subject $3. @5 Visit $8. @17 Date mmddyy10. @28 Test $5. Score;
format date mmddyy10.;
datalines;
001 Baseline    01/01/99   Jump  5
001 Baseline    01/01/99   Reach 3
001 Week 6      02/12/99   Jump  7
001 Week 6      02/12/99   Reach 6
002 Baseline    03/01/99   Jump  2
002 Baseline    03/01/99   Reach 4
002 Week 6      04/12/99   Jump  5
002 Week 6      04/12/99   Reach 9
;
%macro baselines(dataset=);
/* Adding days from baseline and change from baseline. Please note that the first visit
must be denoted as exactly "Baseline"*/
proc sql;
create table changes as
select t1.*, case when t1.date-t2.date>0 then t1.date-t2.date else . end as days
"Days from baseline", case when t1.score-t2.score ne 0 then t1.score-t2.score else .
end as score_change "Change from Baseline"
from &dataset as t1 left join (select * from &dataset where visit="Baseline") as t2
on t1.subject=t2.subject and t1.test=t2.test
order by subject, visit, test;
/* Printing the dataset. The use of subject and visit as group variables keeps SAS from repeating the values*/
title "Changes based on the dataset &dataset";
proc report data=changes;
column subject visit days test score score_change;
define subject / group;
define visit / group;
run;
%mend;
%baselines(dataset=testdata)

Looping a proc transpose through multiple data ranges SAS

I am trying to transpose a sequence of ranges from an Excel file into SAS. The Excel file looks something like this:
31 Dec 01Jan 02Jan 03Jan 04Jan
Book id1 23 24 35 43 98
Book id2 3 4 5 4 1
(few blank rows in between)
05Jan 06Jan 07Jan 08Jan 09Jan
Book id1 14 100 30 23 58
Book id2 2 7 3 8 6
(and it repeats..)
My final output should have a first column for the date and then two additional columns for the book Ids:
Date Book id1 Book id2
31 Dec 23 3
01Jan 24 4
02Jan 35 5
03Jan 43 4
04Jan 98 1
05Jan 14 2
06Jan 100 7
07Jan 30 3
08Jan 23 8
09Jan 58 6
In this particular case I am asking for a simpler method to:
Either import and transpose each range of data separately, using macro variables for the data range
Or import the whole data file first and then create a loop that transposes each range of data
Code I used for a simple import and transpose of a specific data range:
proc import datafile="&input./have.xlsx"
out=want
dbms=xlsx replace;
range="Data$A3:F5" ;
run;
proc transpose data=want
out=want_transposed
name=date;
id A;
run;
Thanks!
A data row that is split over several segments or blocks of rows in an Excel file can be imported raw into SAS and then processed into a categorical form using a DATA step.
In this example, sample data is put into a text file and imported such that the column names are generic: VAR1 ... VARn. The generic import is then processed row by row, outputting one SAS data set row per imported cell.
The column names in each segment are retained in a temporary array and updated whenever a blank book id is encountered.
* mock data;
filename demo "%sysfunc(pathname(WORK))\demo.txt";
data _null_;
input;
file demo;
put _infile_;
datalines;
., 31Dec, 01Jan, 02Jan, 03Jan, 04Jan
Book_id1, 23 , 24 , 35 , 43 , 98
Book_id2, 3 , 4 , 5 , 4 , 1
., 05Jan, 06Jan, 07Jan, 08Jan, 09Jan
Book_id1, 14 , 100 , 30 , 23 , 58
Book_id2, 2 , 7 , 3 , 8 , 6
run;
* mock import;
proc import replace out=work.haveraw file=demo dbms=csv;
getnames = no;
datarow = 1;
run;
ods listing;
proc print data=haveraw;
run;
When the Excel import is made to look like this:
Obs VAR1 VAR2 VAR3 VAR4 VAR5 VAR6
1 31Dec 01Jan 02Jan 03Jan 04Jan
2 Book_id1 23 24 35 43 98
3 Book_id2 3 4 5 4 1
4
5 05Jan 06Jan 07Jan 08Jan 09Jan
6 Book_id1 14 100 30 23 58
7 Book_id2 2 7 3 8 6
it can be processed in a transposing way, outputting only the name-value pairs that correspond to an original cell.
data have (keep=bookid date value);
set haveraw;
array dates(1000) $12 _temporary_ ;
array vars var:;
if missing(var1) then do;
do index = 2 by 1 while (index <= dim(vars));
if not missing(vars(index)) then
dates(index) = put(index-1,z3.) || '_' || vars(index); * adjust as you see fit;
else
dates(index) = '';
end;
end;
else do;
bookid = var1;
do index = 2 by 1 while (index <= dim(vars));
date = dates(index);
value = input(vars(index),??best12.);
output;
end;
end;
run;
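The data step above leaves the data in long form (one row per book id and date), while the question asks for one row per date with a column per book id. A minimal sketch of that last step follows, assuming the long data set is the have produced above with the variables bookid, date and value. The output names have_sorted and want are placeholders chosen for this illustration.

```sas
/* BY-group processing in proc transpose needs sorted input */
proc sort data=have out=have_sorted;
  by date bookid;
run;

/* One output row per date; each bookid value becomes a column name */
proc transpose data=have_sorted out=want(drop=_name_);
  by date;
  id bookid;
  var value;
run;
```

Because date is still the text key built in the data step (column position prefix plus the header text), the rows come out in the sort order of that text rather than in calendar order. Converting the header text to a real SAS date before sorting would be a separate step.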

SAS use a lookup dataset like array in another dataset

I have one data set with the content descriptions for a school
contents:
num description
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
In another data set (students) I have the array content1-content5, and I use a flag to indicate which contents each student has.
students
name age content1 content2 content3 content4 content5
BOB  15  1        1        1        .        1
BRYA 16  .        .        .        .        .
CARL 15  .        1        .        .        1
SUE  17  .        .        1        1        1
LOU  15  .        .        .        .        1
If I use code like this:
data students1;
set students;
array content[5];
format allcontents $100.;
do i=1 to dim(content);
if content[i]=1 then do;
allcontents=cat(vname(content[i]),',',allcontents);
end;
end;
run;
the result is:
name age content1 content2 content3 content4 content5 allcontents
BOB 15 1 1 1 1 content1,content2,content3,content5,
BRYA 16
CARL 15 1 1 content2,content5,
SUE 17 1 1 1 content3,content4,content5,
LOU 15 1 content5
1) I want allcontents to hold the descriptions from the lookup table (data set contents) instead of the array variable names content1-content5. How can I do that?
2) Later I want the result listed by content description, not by student, like this:
description name age
math BOB 15
spanish BOB 15
geography BOB 15
history BOB 15
spanish CARL 15
history CARL 15
spanish SUE 17
chemistry SUE 17
history SUE 17
history LOU 15
Is it possible?
Thanks.
First, grab the %create_hash() macro from this post.
Use the hash table to look up the values.
data students1;
set students;
array content[5];
format num $32. description $16.;
if _n_ = 1 then do;
%create_hash(cnt,num,description,"contents");
end;
do i=1 to 5;
if content[i]=1 then do;
num = vname(content[i]);
rc = cnt.find();
output;
end;
end;
keep description name age;
run;
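The %create_hash() macro is not reproduced here, so the following is a rough sketch of the same lookup written directly against the hash object. It assumes the lookup data set is named contents and has the character variables num and description, as in the question. It also assumes that num is long enough to hold the variable names returned by vname().

```sas
data students1;
  /* Never executed; only copies the attributes of num and description */
  if 0 then set contents;
  set students;
  array content[5];
  if _n_ = 1 then do;
    /* Load the lookup table once, keyed on the variable name */
    declare hash cnt(dataset: "contents");
    cnt.defineKey("num");
    cnt.defineData("description");
    cnt.defineDone();
  end;
  do i = 1 to dim(content);
    if content[i] = 1 then do;
      num = vname(content[i]);
      /* find() returns 0 on a match and fills in description */
      if cnt.find() = 0 then output;
    end;
  end;
  keep description name age;
run;
```

Unlike the version above, this sketch writes a row only when the lookup succeeds, so a flagged content with no entry in contents is skipped rather than written with a stale description.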
I find proc transpose suitable. Transposing once is enough for question 2), and transposing twice renames the variables content1-content5 (question 1). The key is the ID statement in proc transpose, which names the transposed variables after the values of the ID variable.
The code below should give you the desired answers (although the names are ordered alphabetically, which may not match your original ordering).
/* original data sets */
data names;
input num $ description $;
cards;
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
;run;
data students;
input name $ age content1 content2 content3 content4 content5;
cards;
BOB 15 1 1 1 . 1
BRYA 16 . . . . .
CARL 15 . 1 . . 1
SUE 17 . . 1 1 1
LOU 15 . . . . 1
;run;
/* transpose */
proc sort data=students out=tmp_sorted;
by name age;
run;
proc transpose data=tmp_sorted out=tmp_transposed;
by name age;
run;
/* merge the names of content1-5 */
* If you want to preserve ordering from contents1-contents5
* instead of alphabetical ordering of "description" column
* from a-z, do not drop the "num" column for further use.;
proc sql;
create table tmp_merged as
select B.description, A.name, A.age, B.num, A.COL1
from tmp_transposed as A
left join names as B
on A._NAME_=B.num
order by A.name, B.num;
quit;
/* transpose again */
proc transpose data=tmp_merged(drop=num) out=tmp_renamed(drop=_name_);
by name age;
ID description; *name the transposed variables;
run;
/* answer (1) */
data ans1;
set tmp_renamed;
array content[5] math--history;
format allcontents $100.;
do i=1 to dim(content);
* better use cats (cat does not seem to work);
if content[i]=1 then allcontents=cats(allcontents,',',vname(content[i]));
end;
*kill the leading comma;
allcontents=substr(allcontents,2,99);
run;
/* answer (2) */
data ans2(drop=num col1);
set tmp_merged;
where col1=1;
run;
*cleanup;
proc datasets lib=work nolist;
delete tmp_:;
quit;

parsing a text file in sas

So I have a rather messy text file I'm trying to convert to a sas data set. It looks something like this (though much bigger):
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
I'm trying to set 5 separate variables (ID, name, job, time, hours) but clearly only 3 of the variables appear after the first line. I tried this:
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover;
input ID $ name $ job $ time hours;
and didn't get the right output, then I tried to parse it
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover; input
allData $; id = substr(allData, find(allData,"305")-2, 7);
but I'm still not getting the right output. Any ideas?
EDIT: I'm now trying to use scan() and substr() to pull apart the larger data set. How do I subset a single line from the data?
Your data might not be all that messy; it just might be in a hierarchical format where the first row contains all five variables and subsequent rows contain values for variables 3-5. In other words, ID and NAME should be retained as you read through the file.
If that is correct (it's a hierarchical layout), here is a possible solution:
data have;
retain ID NAME;
informat ID 7. JOB $6. TIME 3. HOURS 1.;
* the trailing @ holds the current line so it can be re-read below;
input @1 test_string $7. @;
if notdigit(test_string) = 0
then input @1 ID NAME $12. JOB time hours;
else input @1 JOB time hours;
drop test_string;
datalines;
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
0305680 JONES, MARY ARCH06 002 4
ARCH06 005 3
ARCH07 001 7
run;
The key thing is to really understand how your raw file is organized. Once you know the rules, using SAS to read it is a snap!
A list input solution could be the following:
data have;
array all(6) $20. ID LNAME FNAME JOB TIME HOURS;
retain Id Lname Fname;
drop i;
input @; * load the line into _infile_ and hold it;
nitems = countw(_infile_,', ');
if notdigit(scan(_infile_,1)) = 0 then
do i = 1 to nitems;
all(i) = Scan(_infile_,i);
end;
else
do i = 1 to 3;
all(i+3) = Scan(_infile_,i);
if i = 6 then all(i) = all(i)*1;
end;
datalines;
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
0305680 JONES, MARY ARCH06 002 4
ARCH06 005 3
ARCH07 001 7
run;
proc print; run;

Resources