I have a table as below:
id comp1 comp2 comp3 comp4 comp5
1 12 14 45 78 74
2 25 45 36 88 45
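I want to create a table containing only the rows where any of comp1-comp10 equals 36. How can I do that?
You can use the WHICHN() function, which returns the position of the first argument that equals the value you are searching for, or 0 if none does: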
data want;
set have;
array comp(5) comp1-comp5;
/* Subsetting IF: keep only rows where 36 appears in any comp variable */
if whichn(36, of comp(*))>0;
run;
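For readers who want to check the logic outside SAS, here is a minimal Python sketch of the same row filter. It is an illustration only, not SAS: `have` is a hypothetical list of dicts copied from the sample table, and `whichn` is a hand-written stand-in that mimics WHICHN's documented return value (1-based position of the first match, 0 if none).

```python
# Hypothetical sample rows mirroring the question's table (id, comp1-comp5).
have = [
    {"id": 1, "comp1": 12, "comp2": 14, "comp3": 45, "comp4": 78, "comp5": 74},
    {"id": 2, "comp1": 25, "comp2": 45, "comp3": 36, "comp4": 88, "comp5": 45},
]


def whichn(value, *args):
    """Return the 1-based position of the first arg equal to value, or 0 if none match."""
    for position, item in enumerate(args, start=1):
        if item == value:
            return position
    return 0


# Equivalent of ARRAY comp(5) comp1-comp5.
comp_names = [f"comp{i}" for i in range(1, 6)]

# Equivalent of the subsetting IF: keep a row only if 36 appears in any comp column.
want = [row for row in have if whichn(36, *(row[name] for name in comp_names)) > 0]

print([row["id"] for row in want])  # [2]
```

Only id 2 survives, because its comp3 is 36.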
Say I have an Mx4 array A where each value in the first column is a number from 1 to 12. Now I want to gather the remaining three columns into 12 separate arrays (each with three columns), depending on which number is in column 1.
How would I go about doing that?
You can use unique and splitapply as follows. The result is a cell array of arrays.
M = [2 11 41 51;
1 10 20 30;
1 62 83 22;
4 73 53 53;
2 84 94 14]; % example data
L = 5; % Group labels are 1:L (L=12 in your case)
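[u,~,w] = unique(M(:,1)); % u: labels present, w: group index of each row
result = cell(L,1); % one cell per possible label; absent labels stay []
result(u) = splitapply(@(x){x}, M(:,2:end), w); % wrap each group's rows in a cell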
This gives
>> celldisp(result)
result{1} =
10 20 30
62 83 22
result{2} =
11 41 51
84 94 14
result{3} =
[]
result{4} =
73 53 53
result{5} =
[]
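As a cross-check of what the MATLAB code produces, here is a minimal Python sketch of the same grouping (not MATLAB). It assumes, as the answer does, that labels are integers from 1 to L; groups whose label never appears stay empty, like the [] cells above.

```python
# Same example data as the MATLAB answer.
M = [
    [2, 11, 41, 51],
    [1, 10, 20, 30],
    [1, 62, 83, 22],
    [4, 73, 53, 53],
    [2, 84, 94, 14],
]
L = 5  # group labels are 1..L


def split_by_label(rows, n_labels):
    """Return n_labels groups; group k-1 holds columns 2..end of the rows labelled k."""
    groups = [[] for _ in range(n_labels)]
    for label, *rest in rows:
        groups[label - 1].append(rest)  # labels are 1-based, Python lists are 0-based
    return groups


result = split_by_label(M, L)
for k, group in enumerate(result, start=1):
    print(f"result{{{k}}} =", group)
```

Rows keep their original order within each group, matching the celldisp output.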
I have the following data set:
Student TestDay Score
001 1 85
001 6 76
001 7 89
002 1 92
002 5 82
002 7 93
For each student, I'd like to add a row after their last observation with a Score of 100 and a TestDay one greater than their last TestDay. So the new data would look like the following:
Student TestDay Score
001 1 85
001 6 76
001 7 89
001 8 100
002 1 92
002 5 82
002 7 93
002 8 100
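No need for arrays or loops: use BY-group processing and write an extra row at the last observation of each student.
data want;
set have;
by student; /* requires HAVE to be sorted by student */
output; /* write the original row */
if last.student then do;
score=100;
testday=testday+1;
output; /* write the extra row */
end;
run;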
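The BY-group logic (output every row, then one extra row at LAST.student) can be sketched in Python as below. This is an illustration, not SAS: `have` is a hypothetical list of tuples copied from the question, and like the SAS step it assumes the rows are already sorted by student.

```python
from itertools import groupby

# Hypothetical sample rows (Student, TestDay, Score), sorted by student.
have = [
    ("001", 1, 85), ("001", 6, 76), ("001", 7, 89),
    ("002", 1, 92), ("002", 5, 82), ("002", 7, 93),
]


def append_final_test(rows, score=100):
    """After each student's last row, add a row with TestDay + 1 and the given score."""
    out = []
    for student, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        out.extend(group)                    # the unconditional OUTPUT
        _, last_day, _ = group[-1]           # the LAST.student row
        out.append((student, last_day + 1, score))
    return out


for row in append_final_test(have):
    print(*row)
```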
I have a dataset with over 900 observations. Each observation represents the population of a sub-geographical area for a given year, by gender (male, female, all) and 20 different age groups.
I have dropped the variable for the sub-geographical area and I want to collapse the data into the larger geographical area (called Geo).
I am having a hard time using SUM or PROC MEANS because there are so many age-group variables to sum, and I am trying to avoid writing them all out. I want to collapse by year, geo and sex so that I only have 3 observations per Geo (my raw data could have as many as 54 observations).
This is an example of what a tiny section of the raw data looks like:
Year Geo Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
This is how I want it to look:
Year Group Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 133 111 118
2010 1 2 109 122 108
2010 1 3 252 233 226
2010 2 1 103 101 102
2010 2 2 92 95 106
2010 2 3 195 196 208
Any ideas? Please help!
You don't have to write out each variable name individually; there are ways around that. For example, if all of the age-group variables that need to be summed start with age, you can use the : prefix wildcard to match them all:
proc summary nway data = have;
var age:;
class year geo sex;
output out = want sum=;
run;
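To make the effect of NWAY, CLASS and sum= concrete, here is a minimal Python sketch of the same collapse (not SAS). The column list and rows are copied from the question's sample; the prefix match is done case-insensitively because SAS variable names are case-insensitive. The result reproduces the desired table.

```python
from collections import defaultdict

columns = ["Year", "Geo", "Sex", "Age0005", "Age0610", "Age1115"]
rows = [
    (2010, 1, 1, 92, 73, 75),
    (2010, 1, 2, 57, 81, 69),
    (2010, 1, 3, 159, 154, 144),
    (2010, 1, 1, 41, 38, 43),
    (2010, 1, 2, 52, 41, 39),
    (2010, 1, 3, 93, 79, 82),
    (2010, 2, 1, 71, 66, 68),
    (2010, 2, 2, 63, 64, 70),
    (2010, 2, 3, 134, 130, 138),
    (2010, 2, 1, 32, 35, 34),
    (2010, 2, 2, 29, 31, 36),
    (2010, 2, 3, 61, 66, 70),
]


def summarise(columns, rows, class_vars, prefix):
    """Sum every column whose name starts with prefix, one result per class combination."""
    # Like VAR age:; compare in lower case since SAS names are case-insensitive.
    var_idx = [i for i, name in enumerate(columns) if name.lower().startswith(prefix.lower())]
    key_idx = [columns.index(name) for name in class_vars]
    totals = defaultdict(lambda: [0] * len(var_idx))
    for row in rows:
        acc = totals[tuple(row[i] for i in key_idx)]
        for j, i in enumerate(var_idx):
            acc[j] += row[i]
    # Only full Year*Geo*Sex combinations are produced (as with NWAY), sorted by key.
    return {key: tuple(sums) for key, sums in sorted(totals.items())}


want = summarise(columns, rows, ["Year", "Geo", "Sex"], "age")
for key, sums in want.items():
    print(*key, *sums)
```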
If your variables don't have a common prefix but are all stored next to each other in your dataset, you can use a double-dash name range list instead:
proc summary nway data = have;
var age0005--age1115; /*Includes all variables between these two, in dataset order*/
class year geo sex;
output out = want sum=;
run;
Note also the use of sum= with no names after it: each summarised variable keeps its original name in the output dataset.
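One point worth stressing about the double-dash list is that it is positional: it takes every variable stored between the two names, whatever they are called. The Python sketch below illustrates that rule with a hypothetical `dash_range` helper (not a SAS function); treating a reversed range as an error is this sketch's choice.

```python
def dash_range(columns, first, last):
    """Return the columns from first to last inclusive, in stored (positional) order."""
    names = [c.lower() for c in columns]  # SAS names are case-insensitive
    start, end = names.index(first.lower()), names.index(last.lower())
    if start > end:
        raise ValueError(f"{first} is stored after {last}")
    return columns[start:end + 1]


columns = ["Year", "Geo", "Sex", "Age0005", "Age0610", "Age1115"]
print(dash_range(columns, "age0005", "age1115"))  # ['Age0005', 'Age0610', 'Age1115']

# Position is what matters: a non-age column stored in between would be included too.
reordered = ["Year", "Age0005", "Sex", "Age0610"]
print(dash_range(reordered, "age0005", "age0610"))  # ['Age0005', 'Sex', 'Age0610']
```

So before relying on a name range, it is worth checking the variable order (for example with PROC CONTENTS using the VARNUM option).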
I personally like to use proc sql for this, since it makes it very clear what you're summing and grouping by.
data old ;
input Year Geo Sex Age0005 Age0610 Age1115 ;
datalines;
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
;
run;
proc sql ;
create table new as select
year
, geo label = 'Group'
, sex
, sum(age0005) as age0005
, sum(age0610) as age0610
, sum(age1115) as age1115
from old
group by geo, year, sex ;
quit;
I have raw data like this:
time ID01 ID02 ID03 ~ IDxx
0 10 11 xx
0.5 20 12 xx
1 29 25 xx
1.5 41 30 xx
2 50 40 xx
3 30 50 xx
4 40 42 xx
. . .
. . .
. . .
I want to convert it to this form:
x time temp.
01 0 10
01 0.5 20
01 1 29
01 1.5 41
01 2 50
01 3 30
01 4 40
02 0 11
02 0.5 12
02 1 25
02 1.5 30
02 2 40
02 3 50
02 4 42
I tried an ARRAY statement and PROC TRANSPOSE, but I can't get the time variable repeated next to temp.
It works using arrays. Put an OUTPUT statement inside the loop so that one row, including time, is written per ID column, then sort:
data output;
set input;
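array ID(*) ID01-ID03;
do i=1 to dim(ID); /* loop over however many columns the array holds */
X=put(i,z2.); /* 1 -> "01", 2 -> "02", ... */
temp=ID(i);
output; /* one row per ID column; time is carried along */
end;
keep time X temp;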
run;
proc sort data=output;
by X time;
run;
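The wide-to-long reshape (one output per array element inside the loop, then a sort by X and time) can be sketched in Python as below. This is an illustration, not SAS: `wide` is hypothetical sample data holding only the ID01 and ID02 values shown in the question, since the ID03 values are not given.

```python
# Hypothetical sample: the ID01 and ID02 columns from the question (ID03..IDxx omitted).
wide = [
    {"time": 0, "ID01": 10, "ID02": 11},
    {"time": 0.5, "ID01": 20, "ID02": 12},
    {"time": 1, "ID01": 29, "ID02": 25},
    {"time": 1.5, "ID01": 41, "ID02": 30},
    {"time": 2, "ID01": 50, "ID02": 40},
    {"time": 3, "ID01": 30, "ID02": 50},
    {"time": 4, "ID01": 40, "ID02": 42},
]
id_cols = ["ID01", "ID02"]

long_rows = []
for row in wide:
    for i, col in enumerate(id_cols, start=1):
        x = f"{i:02d}"  # like PUT(i, Z2.)
        long_rows.append((x, row["time"], row[col]))  # one output per ID column

# Like PROC SORT BY X TIME.
long_rows.sort(key=lambda r: (r[0], r[1]))

for x, time, temp in long_rows:
    print(x, time, temp)
```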
Let's say I have an array:
@time = qw(
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
);
but the values 1..50 depend on the size of an array @arr.
So instead of declaring @time manually, how can I populate @time with 1 .. @arr, and possibly have other TYPES of elements like TIME in seconds, etc.?
This will initialise @time with the values from 1 to $#arr ($#arr is the index of the last element, one less than the number of elements):
@time = (1..$#arr);
I suspect you probably want 0 .. $#arr rather than 1 .. $#arr?
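The off-by-one behind that question can be checked with a short Python analogue (not Perl). The 5-element `arr` is hypothetical; `len(arr) - 1` plays the role of Perl's $#arr.

```python
arr = ["a", "b", "c", "d", "e"]  # hypothetical 5-element array
last_index = len(arr) - 1       # what Perl calls $#arr

one_to_last = list(range(1, last_index + 1))   # 1 .. $#arr
zero_to_last = list(range(0, last_index + 1))  # 0 .. $#arr

print(one_to_last)   # [1, 2, 3, 4]  (one element short)
print(zero_to_last)  # [0, 1, 2, 3, 4]  (one value per index)
```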
and possibly have other TYPES of elements like TIME in seconds, etc.
I'm not quite sure what you mean here, but you should have a look at map for one convenient way of generating a list of values by transforming another list. That might be what you're after.
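Alternatively, since an array used as an operand of .. is evaluated in scalar context and gives its number of elements, you can write:
@time = 1 .. @arr;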
If you want to do something with each number, like multiply them by 2, you can use map:
@time = map { 2 * $_ } 1 .. @arr;
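For comparison, the same two ideas in Python (an analogue, not Perl): a range sized by the array's length, and a transformation applied to each element, which is what Perl's map does. The 5-element `arr` is hypothetical.

```python
arr = ["a", "b", "c", "d", "e"]  # hypothetical 5-element array

times = list(range(1, len(arr) + 1))               # 1 .. @arr
doubled = [2 * n for n in range(1, len(arr) + 1)]  # map { 2 * $_ } 1 .. @arr

print(times)    # [1, 2, 3, 4, 5]
print(doubled)  # [2, 4, 6, 8, 10]
```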