Suppose I have a data set as below
data have;
input subject$ day$ arm$ var1 var2;
datalines;
100-01 Day1 left 40 30
100-01 Day1 right 35 25
200-01 Day28 left 45 22
200-01 Day28 right 38 15
;
run;
In this data set each subject has two rows, but I would like one row per subject. The expected data set is:
data want;
input subject$ day$ arm_left$ arm_right$ var1_left var1_right var2_left var2_right;
datalines;
100-01 Day1 left right 40 35 30 25
200-01 Day28 left right 45 38 22 15
;
run;
Any help is appreciated.
This gets you almost there, except for the arm portion. Do you actually need those variables (arm_left/arm_right)? They seem redundant.
data have;
input subject$ day$ arm$ var1 var2;
datalines;
100-01 Day1 left 40 30
100-01 Day1 right 35 25
200-01 Day28 left 45 22
200-01 Day28 right 38 15
;
run;
proc transpose data=have out=long;
by subject day arm;
var var1 var2;
run;
proc transpose data=long out=wide delimiter=_;
by subject day;
id _name_ arm;
var col1;
run;
Another way, though it may not scale:
data have_left;
set have;
by subject day;
where arm='left';
rename arm = arm_left var1=var1_left var2=var2_left;
run;
data have_right;
set have;
where arm='right';
rename arm = arm_right var1=var1_right var2=var2_right;
run;
data want;
merge have_left have_right;
by subject day;
run;
I am trying to group my dataset into three-month groups, or quarters, but as I'm starting from an arbitrary date, I cannot use the quarter function in SAS.
Example data below of what I have and quarter is the column I need to create in SAS.
The start date is always the same, so my initial quarter will be 3rd Sep 2018 - 3rd Dec 2018 and any active date falling in that quarter will be 1, then quarter 2 will be 3rd Dec 2018 - 3rd Mar 2019 and so on. This cannot be coded manually as the start date will change depending on the data, and the number of quarters could be up to 20+.
The code I have attempted so far is below
data test_Data_op;
set test_data end=eof;
%let j = 0;
%let start_date = start_Date;
if &start_Date. <= effective_dt < (&start_date. + 90) then quarter = &j.+1;
run;
This works and gives the first quarter correctly, but I can't figure out how to loop this for every following quarter? Any help will be greatly appreciated!
No need for a DO loop if you already have the start_date and actual event dates. Just count the number of months and divide by three. Use the continuous method of the INTCK() function to handle start dates that are not the first day of a month.
month_number=intck('month',&start_date,mydate,'cont')+1;
qtr_number=floor((month_number-1)/3)+1;
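As a quick check, here is a minimal sketch that applies the two assignments above to three sample dates. It assumes the start date 03SEP2018 from the question; the data set names `sample` and `check` and the variable name `mydate` are placeholders, not names from the original data.

```sas
%let start_date = '03sep2018'd;   /* start date taken from the question */

data sample;                      /* placeholder data set name */
  input mydate :yymmdd10.;
  format mydate yymmdd10.;
  datalines;
2018-09-13
2018-12-12
2019-05-11
;
run;

data check;                       /* placeholder data set name */
  set sample;
  /* whole months elapsed since the start date, counted continuously */
  month_number = intck('month', &start_date, mydate, 'cont') + 1;
  /* every three elapsed months begin a new quarter */
  qtr_number = floor((month_number - 1) / 3) + 1;
run;

proc print data=check;
run;
```

With the quarter boundaries described in the question (3 Sep to 3 Dec is quarter 1, 3 Dec to 3 Mar is quarter 2, and so on), these three dates would be expected to land in quarters 1, 2, and 3.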
Based on the comment by @Lee. Edited to match the data from the screenshot.
The example shows that May 11 would be in the 3rd quarter since the seed date is September 3.
data have;
input mydate :yymmdd10.;
format mydate yymmddd10.;
datalines;
2018-09-13
2018-12-12
2019-05-11
;
run;
%let start_date='03sep2018'd;
data want;
set have;
quarter=floor(mod((yrdif(&start_date,mydate)*4),4))+1;
run;
If you want the number of quarters to extend beyond 4 (e.g. September 4, 2019 would be in quarter 5 rather than cycle back to 1), then remove the "mod" from the function:
quarter=floor(yrdif(&start_date,mydate)*4)+1;
The traditional use of quarter means a 3 month time period relative to Jan 1. Make sure your audience understands the phrase quarter in your data presentation actually means 3 months relative to some arbitrary starting point.
The funky quarter can be computed from the number of months between the two dates: use INTCK for the baseline month count, then subtract one (via a logical expression) when the day of the month of the effective date is earlier than that of the start date. No loops required.
For example:
data have;
do startDate = '11feb2019'd ;
do effectiveDate = startDate to startDate + 21*90;
output;
end;
end;
format startDate effectiveDate yymmdd10.;
run;
data want;
set have;
qtr = 1
+ floor(
( intck ('month', startDate, effectiveDate)
-
(day(effectiveDate) < day(startDate))
)
/ 3
);
format qtr 4.;
run;
Extra
Comparing my method (qtr) to @Tom's (qtr_number) for a range of startDates:
data have;
retain seq 0;
do startDate = '01jan1999'd to '15jan2001'd;
seq + 1;
do effectiveDate = startDate to startDate + 21*90;
output;
end;
end;
format startDate effectiveDate yymmdd10.;
run;
data want;
set have;
qtr = 1
+ floor( ( intck ('month', startDate, effectiveDate)
- (day(effectiveDate) < day(startDate))
) / 3 );
month_number=intck('month',startDate,effectiveDate,'cont')+1;
qtr_number=floor((month_number-1)/3)+1;
format qtr: month: 4.;
run;
options nocenter nodate nonumber;title;
ods listing;
proc print data=want;
where qtr ne qtr_number;
run;
dm 'output';
-------- OUTPUT ---------
                                effective           month_      qtr_
   Obs    seq     startDate          Date    qtr    number    number
 56820     31    1999-01-31    1999-04-30      1         4         2
 57186     31    1999-01-31    2000-04-30      5        16         6
 57551     31    1999-01-31    2001-04-30      9        28        10
 57916     31    1999-01-31    2002-04-30     13        40        14
 58281     31    1999-01-31    2003-04-30     17        52        18
168391     90    1999-03-31    1999-06-30      1         4         2
168483     90    1999-03-31    1999-09-30      2         7         3
168757     90    1999-03-31    2000-06-30      5        16         6
168849     90    1999-03-31    2000-09-30      6        19         7
169122     90    1999-03-31    2001-06-30      9        28        10
169214     90    1999-03-31    2001-09-30     10        31        11
169487     90    1999-03-31    2002-06-30     13        40        14
169579     90    1999-03-31    2002-09-30     14        43        15
169852     90    1999-03-31    2003-06-30     17        52        18
169944     90    1999-03-31    2003-09-30     18        55        19
280510    149    1999-05-29    2001-02-28      7        22         8
280875    149    1999-05-29    2002-02-28     11        34        12
281240    149    1999-05-29    2003-02-28     15        46        16
282035    150    1999-05-30    2000-02-29      3        10         4
282400    150    1999-05-30    2001-02-28      7        22         8
282765    150    1999-05-30    2002-02-28     11        34        12
I am trying to replace missing values that occur before the first non-null entry in SAS. I have the following data:
StudentID Day TestScore
Student001 0 .
Student001 1 78
Student001 2 89
Student002 3 .
Student002 4 .
Student002 5 .
Student002 6 95
I'd like to modify the data so the null values are replaced with the next available non-null entry:
StudentID Day TestScore
Student001 0 78
Student001 1 78
Student001 2 89
Student002 3 95
Student002 4 95
Student002 5 95
Student002 6 95
data scores;
length StudentID $ 10;
input StudentID $ Day TestScore;
datalines;
Student001 0 .
Student001 1 78
Student001 2 89
Student002 3 .
Student002 4 .
Student002 5 .
Student002 6 95
;
run;
proc sort data = scores;
by descending day;
run;
data scores;
drop addscore;
retain addscore;
set scores;
if testscore ne . then addscore = testscore;
if testscore eq . then testscore = addscore;
run;
proc sort data = scores;
by day;
run;
proc sort data = have;
by id descending day ;
run;
data want;
set have;
by id;
retain last_score;
if first.id then call missing(last_score);
if not missing(score) then last_score = score;
else score = last_score;
run;
proc sort data=want;
by id day;
run;
FYI, this will NOT set the missing values if there are any after the last known score for a given ID. i.e. if you had something like:
Student002 5 95
Student002 6 .
Then only records before day 5 for ID 002 would get a value of 95. Is that a possible condition for you? If so, this solution will require a slight modification.
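One possible form of that modification, as a hedged sketch: after the backward fill, run a second pass that carries the last known score forward into any trailing missing rows. It assumes `want` from the steps above is sorted by id and day; the names `want_filled` and `prev_score` are made up for illustration.

```sas
/* Second pass (forward fill) for scores that are still missing   */
/* after the backward fill, i.e. rows after the last known score. */
data want_filled;                        /* hypothetical output name */
  set want;
  by id;
  retain prev_score;                     /* hypothetical helper variable */
  if first.id then call missing(prev_score);
  if missing(score) then score = prev_score;   /* carry last known score forward */
  else prev_score = score;                     /* remember the latest known score */
  drop prev_score;
run;
```

An ID with no scores at all would still be left missing, since there is nothing to carry in either direction.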
You can use a DOW loop to identify the next non-missing score, and a subsequent DOW loop to apply the non-missing score. The DOW approach does not require sorting and maintains the original row order.
data want;
do _n_ = 1 by 1 until (last.id or not missing(score));
set have;
by id;
end;
_score = score;
do _n_ = 1 to _n_;
set have;
score = _score;
output;
end;
drop _score;
run;
In SQL, presuming day ordering, the imputed value can be looked up in a correlated sub-query.
proc sql;
create table want as
select
id, day,
case
when not missing(score) then score
else (select score from have as inner
where inner.id = outer.id
and inner.day > outer.day
and not missing(score)
having inner.day = min(inner.day)
)
end as score
from have as outer;
quit;
I have a dataset that looks like this:
data have;
input ID P1 P2 P3 P4;
datalines;
12 10 15 20 30
12 . 20 5 3
12 . . 25 33
12 . . . 30
19 10 15 20 30
19 . 10 17 30
19 . . 5 30
19 . . . 30
;
run;
I am trying to add a variable called Year, so that for each ID the rows of P1-P4 can be identified as one row per year. The dataset would then look like this:
data want;
input ID P1 P2 P3 P4 Year;
datalines;
12 10 15 20 30 2017
12 . 20 5 3 2018
12 . . 25 33 2019
12 . . . 30 2020
19 10 15 20 30 2017
19 . 10 17 30 2018
19 . . 5 30 2019
19 . . . 30 2020
;
run;
I originally used this code:
Data Year;
do ID = 1 to 8;
do Year = 2017 to 2020;
output;
end;
end;
run;
data Final;
set have;
Merge Year;
run;
But now that I am working with a different dataset each time and I don't know the structure of the ID, I can't keep changing ID=1 to 8 to suit the dataset each time.
My question: Is there a way to do this through the dataset, possibly a count?
Count ID = 2017;
Year = count + 1;
There is no need to create a second data set that will be merged with the first.
You do need to make assumptions about the grouping in the have data set: that the data is already sorted, or at least arranged so that a monotonically increasing year value can be assigned to each sequential row within each group.
data want;
set have;
by id;
if first.id
then year = 2017; %* initial year for a group;
else year + 1; %* increment year for subsequent rows of a group;
run;
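If the rows of each ID are contiguous but the IDs themselves are not in sorted order, the NOTSORTED option on the BY statement lets the same step run without a prior sort. The sketch below also moves the starting year into a macro variable; `first_year` is a hypothetical parameter name, not something from the question.

```sas
%let first_year = 2017;   /* hypothetical parameter: year for the first row of each ID */

data want;
  set have;
  by id notsorted;        /* rows of an ID must be adjacent; IDs need not be sorted */
  if first.id then year = &first_year;
  else year + 1;          /* sum statement: year is retained from row to row */
run;
```

Because the sum statement retains `year` automatically, no explicit RETAIN statement is needed.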
I am trying to transpose a sequence of ranges from an Excel file into SAS. The Excel file looks something like this:
31 Dec 01Jan 02Jan 03Jan 04Jan
Book id1 23 24 35 43 98
Book id2 3 4 5 4 1
(few blank rows in between)
05Jan 06Jan 07Jan 08Jan 09Jan
Book id1 14 100 30 23 58
Book id2 2 7 3 8 6
(and it repeats..)
My final output should have a first column for the date and then two additional columns for the book Ids:
Date Book id1 Book id2
31 Dec 23 3
01Jan 24 4
02Jan 35 5
03Jan 43 4
04Jan 98 1
05Jan 14 2
06Jan 100 7
07Jan 30 3
08Jan 23 8
09Jan 58 6
In this particular case I am asking for a simpler method to:
Either import and transpose each range of data separately, replacing the data range with macro variables,
Or import the whole data file first and then create a loop that transposes each range of data.
Code I used for a simple import and transpose of a specific data range:
proc import datafile="&input./have.xlsx"
out=want
dbms=xlsx replace;
range="Data$A3:F5" ;
run;
proc transpose data=want
out=want_transposed
name=date;
id A;
run;
Thanks!
A data row that is split over several segments or blocks of rows in an Excel file can be imported raw into SAS and then processed into a categorical form using a DATA Step.
In this example sample data is put into a text file and imported such that the column names are generic VAR-1 ... VAR-n. The generic import is then processed across each row, outputting one SAS data set row per import cell.
The column names in each segment are retained within a temporary array and updated whenever a blank book id is encountered.
* mock data;
filename demo "%sysfunc(pathname(WORK))\demo.txt";
data _null_;
input;
file demo;
put _infile_;
datalines;
., 31Dec, 01Jan, 02Jan, 03Jan, 04Jan
Book_id1, 23 , 24 , 35 , 43 , 98
Book_id2, 3 , 4 , 5 , 4 , 1
., 05Jan, 06Jan, 07Jan, 08Jan, 09Jan
Book_id1, 14 , 100 , 30 , 23 , 58
Book_id2, 2 , 7 , 3 , 8 , 6
run;
* mock import;
proc import replace out=work.haveraw file=demo dbms=csv;
getnames = no;
datarow = 1;
run;
ods listing;
proc print data=haveraw;
run;
When the Excel import is made to look like this:
Obs    VAR1        VAR2     VAR3     VAR4     VAR5     VAR6
  1                31Dec    01Jan    02Jan    03Jan    04Jan
  2    Book_id1    23       24       35       43       98
  3    Book_id2    3        4        5        4        1
  4
  5                05Jan    06Jan    07Jan    08Jan    09Jan
  6    Book_id1    14       100      30       23       58
  7    Book_id2    2        7        3        8        6
It can be processed in a transposing way, outputting only the name-value pairs corresponding to an original cell.
data have (keep=bookid date value);
set haveraw;
array dates(1000) $12 _temporary_ ;
array vars var:;
if missing(var1) then do;
do index = 2 by 1 while (index <= dim(vars));
if not missing(vars(index)) then
dates(index) = put(index-1,z3.) || '_' || vars(index); * adjust as you see fit;
else
dates(index) = '';
end;
end;
else do;
bookid = var1;
do index = 2 by 1 while (index <= dim(vars));
date = dates(index);
value = input(vars(index),??best12.);
output;
end;
end;
run;
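To get from this long form to the layout asked for in the question (one row per date, one column per book), a sort followed by PROC TRANSPOSE is one option. This is a sketch under assumptions: `have_sorted` and `want` are illustrative names, and each date label is assumed to occur only once per book, since a repeated ID value within a BY group would make PROC TRANSPOSE fail.

```sas
/* Sort the long data by date, dropping rows that have no date label */
proc sort data=have out=have_sorted;     /* hypothetical intermediate name */
  by date;
  where not missing(date);
run;

/* One row per date, one column per book id (Book_id1, Book_id2, ...) */
proc transpose data=have_sorted out=want(drop=_name_);
  by date;
  id bookid;
  var value;
run;
```

Because `date` is a character label here, the rows come out in alphabetical rather than calendar order; converting the label to a real SAS date before sorting would be needed for chronological output.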
In SAS, for the two test datasets below - for every value of "amount" that falls within "y" and "z", I need to extract the corresponding "x". There could be multiple values of "x" that fit into the criteria.
The final result should look something like this:
/*
4 banana eggs
15 .
31 .
7 banana
22 fig
1 eggs
11 coconut
17 date
41 apple
*/
I realize this relies on using indices or binary searches but I can't figure out a workable solution! Any help would be appreciated! Thanks!
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
;
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
;
run;
Join the two datasets so amount falls between y and z.
proc sql;
create table join as
select a.amount
,b.*
from test2 a
left join
test1 b
on a.amount between b.y and b.z;
quit;
Sort the result by amount for transpose.
proc sort data=join; by amount; run;
Transpose it.
proc transpose data=join out=trans;
by amount;
var x;
run;
Now you have your fruits each in its own variable named col1, col2, ....
If you want them all in one variable separated by a blank, just concatenate them.
data trans2(keep= amount text);
set trans(drop=_name_);
array v{*} _character_;
text = catx(' ', of v{*});
run;
Here is a possible solution using "old-fashioned" data step code plus PROC TRANSPOSE:
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
run;
data want(keep=amount x);
set test2;
found = 0;
do _i_=1 to nobs;
set test1 point=_i_ nobs=nobs;
if y <= amount <= z then do;
found = 1;
output;
end;
end;
if not found then do;
x = ' ';
output;
end;
run;
proc transpose data=want out=want2(drop=_name_);
by amount notsorted;
var x;
run;
Note that my results do not match those in your example: amount 31 is an "apple", since 31 falls between 29 and 43.