SAS - Find and print first non-zero value from a dataset in columns - arrays

I have a data set with ID in rows and months in columns, as the one shown below.
I want to create an auxiliary column that records the first value that is not zero of each line.
ID M1 M2 M3 M4 M5 Auxiliary column
1 0 0 8 8 7 8
2 7 7 7 . . 7
3 0 0 0 0 9 9
4 0 9 9 9 8 9
5 1 1 1 1 1 1
6 0 2 2 1 1 2
Currently l am using this code, but I haven't been able to get the results I am looking for. Any ideas?
data new_ops04;
set new_ops03;
array MONTHS (24) M1-M24;
RETAIN AUXILIARY_COLUMN 0;
do i=1 to 24;
IF MONTHS(i) ne 0 and AUXILIARY_COLUMN = 0 THEN
AUXILIARY_COLUMN = MONTHS(i);
end;
drop i;
run;
Thanks a lot!

You're very close. Just drop the retain statement:
data new_ops04;
set new_ops03;
array MONTHS (24) M1-M24;
AUXILIARY_COLUMN = 0;
do i=1 to 24;
IF MONTHS(i) ne 0 and AUXILIARY_COLUMN = 0 THEN
AUXILIARY_COLUMN = MONTHS(i);
end;
drop i;
run;
you need to consider what happens if the first observation(s) are missing

I would do this use case in proc sql. But your problem is that you are not stopping when you reach the first value. So:
flag = 0;
do i=1 to 24 until (flag)
if MONTHS(i) ne 0 and AUXILIARY_COLUMN = 0 THEN
AUXILIARY_COLUMN = MONTHS(i);
flag = 1;
end;
drop i, flag;

Related

Creating Dataset with random Values in SAS

I want to create a random dataset. Something like this-
ptno visits sex race
1 1 1 0
1 2 1 0
1 3 1 0
2 1 2 1
2 2 2 1
2 3 2 1
3 1 1 0
3 2 1 0
3 3 1 0
The values should be randomly generated. I want to know if I can do this dynamically using do loops. Thanks in advance for helping.
data want ;
length ptno visits sex race 8. ;
do ptno = 1 to 100 ;
_visits = ceil(ranuni(0)*5) ; /* between 1 & 5 */
sex = ceil(ranuni(0)*2) ; /* between 1 & 2 */
race = floor(ranuni(0)*2) ; /* between 0 & 1 */
do visits = 1 to _visits ;
output ;
end ;
end ;
drop _visits ;
run ;
SAS call ranuni() produce a random variate from a uniform distribution, if value is greater than 0.5 then 1, otherwise 0. Here, the same ptno (i) + seed get the same sex or race.
data want;
do i=100 to 110;
do j=1 to 5;
seed1=i+4567;
call ranuni(seed1,x);
seed2=i+1234;
call ranuni(seed2,y);
ptno=i;
visit=j;
sex=(x>0.5)+1;
race=(y<0.5);
output;
end;
end;
keep ptno--race;
run;

Keep n largest values per column and set the rest to zero in MATLAB without a loop

I have a matrix of unsorted numbers and I want to keep the n largest (not necessarily unique) values per column and set the rest to zero.
I figured out how to do it with a loop:
a = [4 8 12 5; 9 2 6 18; 11 3 9 7; 8 9 12 4]
k = 2
for n = 1:4
[y, ind] = sort(a(:,n), 'descend');
a(ind(k+1:end),n) = 0;
end
a
which gives me:
a =
4 8 12 5
9 2 6 18
11 3 9 7
8 9 12 4
k =
2
a =
0 8 12 0
9 0 0 18
11 0 0 7
0 9 12 0
However when I try to eliminate the loop, I can't seem to get the indexing right, because this:
a = [4 8 12 5; 9 2 6 18; 11 3 9 7; 8 9 12 4]
k = 2
[y, ind] = sort(a, 'descend');
b = ind(k+1:end,:)
a(b) = 0
which gives me this: (which is not what I wanted to do)
a =
4 8 12 5
9 2 6 18
11 3 9 7
8 9 12 4
k =
2
b =
4 3 3 1
1 2 2 4
a =
0 8 12 5
0 2 6 18
0 3 9 7
0 9 12 4
Am I indexing this wrong? Do I have to use the loop?
I referenced this question to get started but it wasn't exactly what I was trying to do: How to find n largest elements in an array and make the other elements zero in matlab?
You're very close. ind in the sort function gives you the row locations for each column where that particular value would appear in the sorted output. You need to do some additional work if you want to index into the matrix properly and eliminate the entries. You know that for each column of I, that tells you that we need to eliminate those entries from that particular column. Therefore, what I would do is generate column-major linear indices using each column of I to be the rows we need to eliminate.
Try doing this:
a = [4 8 12 5; 9 2 6 18; 11 3 9 7; 8 9 12 4];
k = 2;
[y, ind] = sort(a, 'descend');
%// Change here
b = sub2ind(size(a), ind(k+1:end,:), repmat(1:size(a,2), size(a,1)-k, 1));
a(b) = 0;
We use sub2ind to help us generate our column major indices where the rows are denoted by the values in ind after the kth element and the columns we need are for each column in this matrix. There are size(a,1)-k rows remaining after you truncate out the k values after sorting, and so we generate column values that go from 1 up to as many columns as we have in a and as many rows as there are remaining.
We get this output:
>> a
a =
0 8 12 0
9 0 0 18
11 0 0 7
0 9 12 0
Here's one using bsxfun -
%// Get descending sorting indices per column
[~, ind] = sort(a,1, 'descend')
%// Get linear indices that are to be set to zeros and set those in a to 0s
rem_ind = bsxfun(#plus,ind(n+1:end,:),[0:size(a,2)-1]*size(a,1))
a(rem_ind) = 0
Sample run -
a =
4 8 12 5
9 2 6 18
11 3 9 7
8 9 12 4
n =
2
ind =
3 4 1 2
2 1 4 3
4 3 3 1
1 2 2 4
rem_ind =
4 7 11 13
1 6 10 16
a =
0 8 12 0
9 0 0 18
11 0 0 7
0 9 12 0

How can I find all the cells that have the same values in a multi-dimensional array in octave / matlab

How can I find all the cells that have the same values in a multi-dimensional array?
I can get it partially to work with result=A(:,:,1)==A(:,:,2) but I'm not sure how to also include A(:,:,3)
I tried result=A(:,:,1)==A(:,:,2)==A(:,:,3) but the results come back as all 0 when there should be 1 correct answer
which is where the number 8 is located in the same cell on all the pages of the array. Note: this is just a test the repeating number could be found multiple times and as different numbers.
PS: I'm using octave 3.8.1 which is like matlab
See code below:
clear all, tic
%graphics_toolkit gnuplot %use this for now it's older but allows zoom
A(:,:,1)=[1 2 3; 4 5 6; 7 9 8]; A(:,:,2)=[9 1 7; 6 5 4; 7 2 8]; A(:,:,3)=[2 4 6; 8 9 1; 3 5 8]
[i j k]=size(A)
for ii=1:k
maxamp(ii)=max(max(A(:,:,ii)))
Ainv(:,:,ii)=abs(A(:,:,ii)-maxamp(ii));%the extra max will get the max value of all values in array
end
%result=A(:,:,1)==A(:,:,2)==A(:,:,3)
result=A(:,:,1)==A(:,:,2)
result=double(result); %turns logical index into double to do find
[row col page] = find(result) %gives me the col, row, page
This is the output it gives me:
>>>A =
ans(:,:,1) =
1 2 3
4 5 6
7 9 8
ans(:,:,2) =
9 1 7
6 5 4
7 2 8
ans(:,:,3) =
2 4 6
8 9 1
3 5 8
i = 3
j = 3
k = 3
maxamp = 9
maxamp =
9 9
maxamp =
9 9 9
result =
0 0 0
0 1 0
1 0 1
row =
3
2
3
col =
1
2
3
page =
1
1
1
Use bsxfun(MATLAB doc, Octave doc) and check to see if broadcasting the first slice is equal across all slices with a call to all(MATLAB doc, Octave doc):
B = bsxfun(#eq, A, A(:,:,1));
result = all(B, 3);
If we're playing code golf, a one liner could be:
result = all(bsxfun(#eq, A, A(:,:,1)), 3);
The beauty of the above approach is that you can have as many slices as you want in the third dimension, other than just three.
Example
%// Your data
A(:,:,1)=[1 2 3; 4 5 6; 7 9 8];
A(:,:,2)=[9 1 7; 6 5 4; 7 2 8];
A(:,:,3)=[2 4 6; 8 9 1; 3 5 8];
B = bsxfun(#eq, A, A(:,:,1));
result = all(B, 3);
... gives us:
>> result
result =
0 0 0
0 0 0
0 0 1
The above makes sense since the third row and third column for all slices is the only value where every slice shares this same value (i.e. 8).
Here's another approach: compute differences along third dimension and detect when all those differences are zero:
result = ~any(diff(A,[],3),3);
You can do
result = A(:,:,1) == A(:,:,2) & A(:,:,1) == A(:,:,3);
sum the elements along the third dimension and divide it with the number of dimensions. We get back the original value if the values are the same in all dimension. Otherwise a different (e.g. a decimal) value. Then find the location where A and the summation are equal over the third dimension.
all( A == sum(A,3)./size(A,3),3)
ans =
0 0 0
0 0 0
0 0 1
or
You could also do
all(A==repmat(sum(A,3)./size(A,3),[1 1 size(A,3)]),3)
where repmat(sum(A,3)./size(A,3),[1 1 size(A,3)]) would highlight the implicit broadcasting of this when compared with A.
or
you skip the broadcasting altogether and just compare it with the first slice of A
A(:,:,1) == sum(A,3)./size(A,3)
Explanation
3 represents the third dimension .
sum(A,3) means that we are taking the sum over the third dimension.
Then we divide that sum by the number of dimensions.
It's basically the average value for that position in the third dimension.
If you add three values and then divide it by three then you get the original value back.
For example, A(3,3,:) is [8 8 8]. (8+8+8)/3 = 8.
If you take another example, i.e. the value above, A(2,3,:) = [6 4 1].
Then (6+4+1)/3=3.667. This is not equal to A(2,3,:).
sum(A,3)./size(A,3)
ans =
4.0000 2.3333 5.3333
6.0000 6.3333 3.6667
5.6667 5.3333 8.0000
Therefore, we know that the elements are not the same
throughout the third dimension. This is just a trick I use
to determine that. You also have to remember that
sum(A,3)./size(A,3) is originally a 3x3x1 matrix
that will be automatically expanded (i.e. broadcasted) to a
3x3x3 matrix when we do the comparison with A (A == sum(A,3)./size(A,3)).
The result of that comparison will be a logical array with 1 for the positions that are the same throughout the third dimension.
A == sum(A,3)./size(A,3)
ans =
ans(:,:,1) =
0 0 0
0 0 0
0 0 1
ans(:,:,2) =
0 0 0
1 0 0
0 0 1
ans(:,:,3) =
0 0 0
0 0 0
0 0 1
Then use all(....,3) to get those. The result is a 3x3x1
matrix where a 1 indicates that the value is the same in the
third dimension.
all( A == sum(A,3)./size(A,3),3)
ans =
0 0 0
0 0 0
0 0 1

Removing zeros and then vertically collapse the matrix

In MATLAB, say I have a set of square matrices, say A, with trace(A)=0 as follows:
For example,
A = [0 1 2; 3 0 4; 5 6 0]
How can I remove the zeros and then vertically collapse the matrix to become as follow:
A_reduced = [1 2; 3 4; 5 6]
More generally, what if the zeroes can appear anywhere in the column (i.e., not necessarily at the long diagonal)? Assuming, of course, that the total number of zeros for all columns are the same.
The matrix can be quite big (hundreds x hundreds in dimension). So, an efficient way will be appreciated.
To compress the matrix vertically (assuming every column has the same number of zeros):
A_reduced_v = reshape(nonzeros(A), nnz(A(:,1)), []);
To compress the matrix horizontally (assuming every row has the same number of zeros):
A_reduced_h = reshape(nonzeros(A.'), nnz(A(1,:)), []).';
Case #1
Assuming that A has equal number of zeros across all rows, you can compress it horizontally (i.e. per row) with this -
At = A' %//'# transpose input array
out = reshape(At(At~=0),size(A,2)-sum(A(1,:)==0),[]).' %//'# final output
Sample code run -
>> A
A =
0 3 0 2
3 0 0 1
7 0 6 0
1 0 6 0
0 16 0 9
>> out
out =
3 2
3 1
7 6
1 6
16 9
Case #2
If A has equal number of zeros across all columns, you can compress it vertically (i.e. per column) with something like this -
out = reshape(A(A~=0),size(A,1)-sum(A(:,1)==0),[]) %//'# final output
Sample code run -
>> A
A =
0 3 7 1 0
3 0 0 0 16
0 0 6 6 0
2 1 0 0 9
>> out
out =
3 3 7 1 16
2 1 6 6 9
This seems to work, quite fiddly to get the behaviour right with transposing:
>> B = A';
>> C = B(:);
>> reshape(C(~C==0), size(A) - [1, 0])'
ans =
1 2
3 4
5 6
As your zeros are always in the main diagonal you can do the following:
l = tril(A, -1);
u = triu(A, 1);
out = l(:, 1:end-1) + u(:, 2:end)
A correct and very simple way to do what you want is:
A = [0 1 2; 3 0 4; 5 6 0]
A =
0 1 2
3 0 4
5 6 0
A = sort((A(find(A))))
A =
1
2
3
4
5
6
A = reshape(A, 2, 3)
A =
1 3 5
2 4 6
I came up with almost the same solution as Mr E's though with another reshape command. This solution is more universal, as it uses the number of rows in A to create the final matrix, instead of counting the number of zeros or assuming a fixed number of zeros..
B = A.';
B = B(:);
C = reshape(B(B~=0),[],size(A,1)).'

SAS: reshape data (stacking)

How do I reshape this data in SAS?
id q1a q2a q1b q2b q1c q2c
1 3 0 1 1 1 9
2 4 9 1 2 2 0
3 5 9 1 2 4 0
into this:
id q1 q2 type
1 3 0 a
1 1 1 b
1 1 9 c
.............
The simplest way is to transpose the data, split out the last letter from the q variables, then re-transpose.
data have;
input id q1a q2a q1b q2b q1c q2c;
datalines;
1 3 0 1 1 1 9
2 4 9 1 2 2 0
3 5 9 1 2 4 0
;
run;
proc transpose data=have out=temp1;
by id;
run;
data temp2;
set temp1;
length type $1;
type=substr(_NAME_,3);
_NAME_=substr(_NAME_,1,2);
run;
proc transpose data=temp2 out=want (drop=_:) ;
by id type;
id _NAME_;
var COL1;
run;
Seems simpler to me to just do it in one bit.
data want;
set have;
array qs q:;
do _t = 1 to dim(qs) by 2;
q1=qs[_t];
q2=qs[_t+1];
type = substr(vname(qs[_t]),3,1);
output;
end;
keep id q1 q2 type;
run;
That works for exactly your data; there are some assumptions (the substr and the relationship between q1/q2) that might need to be changed for a real-world example.

Resources