SAS: Adding value after last entry in column - arrays

I have the following data set:
Student TestDay Score
001 1 85
001 6 76
001 7 89
002 1 92
002 5 82
002 7 93
I'd like to add a '100' value after the last non-empty value in the column 'Score', as well as add one to the value of TestDay. So the new data would look like the following:
Student TestDay Score
001 1 85
001 6 76
001 7 89
001 8 100
002 1 92
002 5 82
002 7 93
002 8 100

No need for arrays or loops.
data want;
set have;
by student;
output;
if last.student then do;
score=100;
testday=testday+1;
output;
end;
run;

Related

Do Loops (with multiple rows of id's) with conditional statements?

Please see my data below;
data finance;
input id loan1 loan2 loan3 assets home$ type;
datalines;
1 93000 98000 45666 new 1
1 98000 45678 98765 67 old 2
1 55000 56764 435371 54 new 1
2 7000 6000 7547 57 new 1
4 67333 87444 98666 34 old 1
4 98000 68777 986465 23 new 1
5 4555 334 652 12 new 1
5 78999 98999 80000 34 new 1
5 889 989 676 3 new 1
;
data finance1;
set finance;
if loan1<80000 then conc'level1';
if loan2 <80000 and home='new' then borrowcap = 'high';
run;
I would like the following dataset, as you can see although there are multiple rows for each ID initially, if there was a level1 or high in any of those rows, I would like to capture that in the same row.
data finance;
input id conc$ borrowcap$;
datalines;
1 level1 high
2 level1 high
4 level1
5 level1 high
;
Any help is appreciated!
Use retain statement, you can keep value from any row for each ID. Use by statement + if last.var statement, you can keep only one row for each ID.
data finance;
input id loan1 loan2 loan3 assets home$ type;
datalines;
1 93000 98000 45666 . new 1
1 98000 45678 98765 67 old 2
1 55000 56764 435371 54 new 1
2 7000 6000 7547 57 new 1
4 67333 87444 98666 34 old 1
4 98000 68777 986465 23 new 1
5 4555 334 652 12 new 1
5 78999 98999 80000 34 new 1
5 889 989 676 3 new 1
;
data finance1;
set finance;
by id;
retain conc borrowcap;
length conc borrowcap $8.;
if first.id then call missing(conc,borrowcap);
if loan1<80000 then conc='level1';
if loan2<80000 and home='new' then borrowcap = 'high';
if last.id;
run;

Do loop SAS programming

I have table as below:
id comp1 comp2 comp3 comp4 comp5
1 12 14 45 78 74
2 25 45 36 88 45
I want to create a table if any comp1-comp10 has 36?
You can use WHICHN()
data want;
set have;
array comp(5) comp1-comp5;
if whichn(36, of comp(*))>0;
run;

Modifying final column value in SAS by group

I have the following data set:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
002 1 4
002 5 9
002 10 14
I would like to make the last 'TestDayEnd' the final value for 'TestDayStart' for each Student.
So the data should look like this:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
001 15 15
002 1 4
002 5 9
002 10 14
002 14 14
I'm not quite sure how I can do this in SAS. Any insight would be appreciated.
After sorting the dataset you can do this within a data step.
proc sort data=have;
by student testdaystart testdayend;
run;
Now you can use the by and retain statements in the data step. The by statement allows you to find the last student, and the retain statement lets you keep the previous value in the dataset.
data want;
set have;
retain last_testdayend;
by student testdaystart testdayend;
output;
last_testdayend = testdayend;
if last.student then do;
if testdaystart ne testdayend then do;
testdaystart = last_testdayend;
testdayend = last_testdayend;
output; * this second output statement creates a new record in the dataset;
end;
end;
drop last_testdayend;
run;

Creating calculated variables in one datastep

Any help is much appreciated. Thanks
I would like to create a couple of variables with my transactional data
I am trying to create variables 'act_bal' and 'penalty' using amount, type and Op_Bal. The rules I have are:
For the first record, the id will have op_bal and it will be
subtracted from the 'amount' for type=C and added if type=D to
calculate act_bal
For the second record onwards it is act_bal + amount for type=C and
act_bal-amount for type=D
I will add the penalty 10 only if the amount is >4 and the type=D.
The id can have only two penalties.
Total Penalty should be subtracted from the act_bal of the last
record which would become op_bal for the next day. (e.g. for id
101, -178-20=-198 will become the op_bal for 4/2/2019)
This is the data I have for two customers IDs 101 and 102 for two different dates (My actual dataset has the data for all the 30 days).
id date amount type Op_Bal
101 4/1/2019 50 C 100
101 4/1/2019 25 D
101 4/1/2019 75 D
101 4/1/2019 3 D
101 4/1/2019 75 D
101 4/1/2019 75 D
101 4/2/2019 100 C
101 4/2/2019 125 D
101 4/2/2019 150 D
102 4/1/2019 50 C 125
102 4/1/2019 125 C
102 4/2/2019 250 D
102 4/2/2019 10 D
The code I wrote is like this
data want;
set have;
by id date;
if first.id or first.date then do;
if first.id then do;
if type='C' then act_bal=Op_Bal - amount;
if type='D' then act_bal=Op_Bal + amount;
end;
else do;
retain act_bal;
if type='C' then act_bal=act_bal + amount;
if type='D' then act_bal=act_bal - amount;
if amount>4 and type='D' then do;
penalty=10;
end;
run;
I couldn't create a counter to control the penalties to 2 and could not subtract the total penalty amount from the amount of the last row. Could someone help me in receiving the desired result? Thanks
id date amount type Op_Bal act_bal penalty
101 4/1/2019 50 C 200 150 0
101 4/1/2019 25 D 125 0
101 4/1/2019 150 D -25 10
101 4/1/2019 75 D -100 10
101 4/1/2019 3 D -103 0
101 4/1/2019 75 D -178 0
101 4/2/2019 100 C -198 -98 0
101 4/2/2019 125 D -223 10
101 4/2/2019 150 D -373 10
102 4/1/2019 50 C 125 175 0
102 4/1/2019 125 C 300 0
102 4/2/2019 250 D 50 0
102 4/2/2019 10 D 40 0
A few tips:
You have the same code for incrementing act_bal in both the if and else blocks, so factor it out. Don't repeat yourself.
You can skip the retain statement if you use a sum statement.
Use a separate variable to keep track of the number of penalties triggered per day, but only apply the first two of them.
So, putting it all together:
data want;
set have;
by id date;
if first.date and not first.id then op_bal = act_bal;
if first.date then do;
act_bal = op_bal;
penalties = 0;
end;
if type='C' then act_bal + amount;
if type='D' then act_bal + (-amount);
if amount > 4 and type='D' then penalties + 1;
if last.date then act_bal + (-min(penalties,2) * 10);
run;

Sum of multiple variables by group

I have a dataset with over 900 observations, each observation represents the population of a sub-geographical area for a given year by gender (male, female, all) and 20 different age groups.
I have dropped the variable for the sub-geographical area and I want to collape into the greater geographical area (called Geo).
I am having a difficult time doing a SUM or PROC MEANS because I have so many age groups to sum up and I am trying to avoid writing them all out. I want to collapse across the group year, geo, sex so that I only have 3 observations per Geo (my raw data could have as many as 54 observations).
This is an example of what a tiny section of the raw data looks like:
Year Geo Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
This is how I want it to look:
Year Group Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 133 111 118
2010 1 2 109 122 08
2010 1 3 252 233 226
2010 2 1 103 101 102
2010 2 2 92 95 106
2010 2 3 195 196 208
Any ideas? Please help!
You don't have to write out each variable name individually - there are ways of getting around that. E.g. if all of the age group variables that need to be summed up start with age then you can use a : wildcard to match them:
proc summary nway data = have;
var age:;
class year geo sex;
output out = want sum=;
run;
If your variables don't have a common prefix, but are all next to each other in one big horizontal group in your dataset, you can use a double dash list instead:
proc summary nway data = have;
var age005--age1115; /*Includes all variables between these two*/
class year geo sex;
output out = want sum=;
run;
Note also the use of sum= - this means that each summarised variable is reproduced with its original name in the output dataset.
I personally like to use proc sql for this, since it makes it very clear what you're summing and grouping by.
data old ;
input Year Geo Sex Age0005 Age0610 Age1115 ;
datalines;
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
;
run;
proc sql ;
create table new as select
year
, geo label = 'Group'
, sex
, sum(age0005) as age0005
, sum(age0610) as age0610
, sum(age1115) as age1115
from old
group by geo, year, sex ;
quit;

Resources