How to efficently create arrays with runs of equal elements in SAS - arrays

Case One
Sample output:
`1,1,1,1,2,2,2,2......9,9,9,9,0,0,0,0'
A more general case would be:
The array starts with n1 elements valued x1, then followed by n2 elements valued x2...
In the sample output, n1 = n2 = n3 = .. = 4, x1=1, x2=2 ...
But I don't want to create it based on the element's position in array using if-else statement.
Here's what I have done:
%let nd = 80;
data _t(drop = i);
array ap{&nd};
do i = 1 to &nd;
if i le 4 then a[i] = 1;
else ....;
end;
'other codes'
run;
Case Two
What if the order in the array doesn't matter as long as it contains all the elements I need (n1 x1, n2 x2 ...) ? In this scenario, is it easier to build up the array?

Provided that you know in advance an upper bound for how many elements you want to create, you can do this in the array statement - e.g.
data _null_;
array t{10} (1*1 2*2 3*3 4*4);
put t{*};
run;
Output:
1 2 2 3 3 3 4 4 4 4
N.B. This sort of assignment implicitly causes your array variables to be retained.
You can also nest brackets when creating runs of elements, e.g,
data _null_;
array t{10} (2*(1 2 3 4 5));
put t{*};
run;
Output:
1 2 3 4 5 1 2 3 4 5
However, * signs need to be separated by brackets, e.g.
data _null_;
array t{10} (2*(1 2) 2*(3*3));
put t{*};
run;
Output:
1 2 1 2 3 3 3 3 3 3

Related

How to find a sum 3 elements in array?

I have an array A=[a1,a2,a3, ..., aN] I would like to take a product of each 3 elements:
s1=a1+a2+a3
s2=a4+a5+a6
...
sM=a(N-2)+a(N-1)+aN
My solution:
k=size(A);
s=0;
for n=1:k
s(n)=s(n-2)+s(n-1)+s(n);
end
Error: Attempted to access s(2); index out of bounds because numel(s)=1.
Hoe to fix it?
If you want to sum in blocks, for the general case when the number of elements of A is not necessarily a multiple of the block size, you can use accumarray:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = accumarray(ceil((1:numel(A))/s).', A(:));
If you want a sliding sum with a given block size, you can use conv:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = conv(A(:).', ones(1,s), 'valid');
You try to calculate sby using values from s. Dont you mean s(n)=A(n-2)+A(n-1)+A(n);? Also size returns more than one dimension on its own.
That being said, getting the 2 privous values n-2 and n-1 doenst work for n=1;2 (because you must have positive indices). You have to explain how the first two values should be handeled. I assume either 0 for elements not yet exisiting
k=size(A,2); %only the second dimension when A 1xn, or length(A)
s=zeros(1,k); %get empty values instead of appending each value for better performance
s(1)=A(1);
s(2)=A(2)+A(1);
for n=3:k %start at 3
s(n)=A(n-2)+A(n-1)+A(n);
end
or sshoult be 2 values shorter than A.
k=size(A,2);
s=zeros(1,k-2);
for n=1:k-2
s(n)=A(n)+A(n+1)+A(n+2);
end
You initialise s as a scalar with s = 0. Then you try and index it like an array, but it only has a single element.
Your current logic (if fixed) will calculate this:
s(1) = a(1)+a(2)+a(3)
s(2) = a(2)+a(3)+a(4)
...
% 's' will be 2 elements shorter than 'a'
So we need to be a bit wiser with the indexing to get what you describe, which is
s(1) = a(1)+a(2)+a(3)
s(2) = a(4)+a(5)+a(6)
...
% 's' will be a third as big as 'a'
You should pre-allocate s to the right size, like so:
k = numel(A); % Number of elements in 'A'
s = zeros( 1, k/3 ); % Output array, assuming 'k' is divisible by 3
for n = 0:3:k-3
s(n/3+1) = a(n+1) + a(n+2) + a(n+3);
end
You could do this in one line by reshaping the array to have 3 rows, then summing down each column, this assumes that the number of elements in a is divisible by 3, and that a is a row vector...
s = sum( reshape( a, 3, [] ) );

Collapsing matrix into columns

I have a 2D matrix where the № of columns is always a multiple of 3 (e.g. 250×27) - due to a repeating organisation of the results (A,B,C, A,B,C, A,B,C, and so forth). I wish to reshape this matrix to create a new matrix with 3 columns - each containing the aggregated data for each type (A,B,C) (e.g. 2250×3).
So in a matrix of 250×27, all the data in columns 1,4,7,10,13,16,19,22,25 would be merged to form the first column of the resulting reshaped matrix.
The second column in the resulting reshaped matrix would contain all the data from columns 2,5,8,11,14,17,20,23,26 - and so forth.
Is there a simple way to do this in MATLAB? I only know how to use reshape if the columns I wanted to merge were adjacent (1,2,3,4,5,6) rather than non-adjacent (1,4,7,10,13,16) etc.
Shameless steal from #Divakar:
B = reshape( permute( reshape(A,size(A,1),3,[]), [1,3,2]), [], 3 );
Let A be your matrix. You can save every third column in one matrix like:
(Note that you don't have to save them as matrices separately but it makes this example easier to read).
A = rand(27); %as test
B = A(:,1:3:end);
C = A(:,2:3:end);
D = A(:,3:3:end);
Then you use reshape:
B = reshape(B,[],1);
C = reshape(C,[],1);
D = reshape(D,[],1);
And finally put it all together:
A = [B C D];
You can just treat every set of columns as a single item and do three reshapes together. This should do the trick:
[save as "reshape3.m" file in your Matlab folder to call it as a function]
function out = reshape3(in)
[~,C]=size(in); % determine number of columns
if mod(C,3) ~=0
error('ERROR: Number of rows must be a multiple of 3')
end
R_out=numel(in)/3; % number of rows in output
% Reshape columns 1,4,7 together as new column 1, column 2,5,8 as new col 2 and so on
out=[reshape(in(:,1:3:end),R_out,1), ...
reshape(in(:,2:3:end),R_out,1), ...
reshape(in(:,3:3:end),R_out,1)];
end
Lets suppose you have a 3x6 matrix A
A = [1 2 3 4 5 6;6 5 4 3 2 1;2 3 4 5 6 7]
A =
1 2 3 4 5 6
6 5 4 3 2 1
2 3 4 5 6 7
you extract the size of the matrix
b =size(A)
and then extract each third column for a single row
c1 = A((1:b(1)),[1:3:b(2)])
c2 = A((1:b(1)),[2:3:b(2)])
c3 = A((1:b(1)),[3:3:b(2)])
and put them in one matrix
A_result = [c1(:) c2(:) c3(:)]
A_result =
1 2 3
6 5 4
2 3 4
4 5 6
3 2 1
5 6 7
My 2 cents:
nRows = size(matrix, 1);
nBlocks = size(matrix, 2) / 3;
matrix = reshape(matrix, [nRows 3 nBlocks]);
matrix = permute(matrix, [1 3 2]);
matrix = reshape(matrix, [nRows * nBlocks 1 3]);
matrix = reshape(matrix(:), [nRows * nBlocks 3]);
Here's my 2 minute take on it:
rv = #(x) x(:);
ind = 1:3:size(A,2);
B = [rv(A(:,ind)) rv(A(:,ind+1)) rv(A(:,ind+2))];
saves a few ugly reshapes, may be a bit slower though.
If you have the Image Processing Toolbox, im2col is a very handy solution:
out = im2col(A,[1 4], 'distinct').'
Try Matlab function mat2cell, I think this form is allowed.
X is the "start matrix"
C = mat2cell(X, [n], [3, 3, 3]); %n is the number of rows, repeat "3" as many times as you nedd
%extract every matrix
C1 = C{1,1}; %first group of 3 columns
C2 = C{1,2}; %second group of 3 columns
%repeat for all your groups
%join the matrix with vertcat
Cnew = vertcat(C1,C2,C3); %join as many matrix n-by-3 as you have

merge two matrix and its attributes in matlab

I've two matrix a and b and I'd like to combine the rows in a way that in the first row I got no duplicate value and in the second value, columns in a & b which have the same row value get added together in new matrix. i.e.
a =
1 2 3
8 2 5
b =
1 2 5 7
2 4 6 1
Desired outputc =
1 2 3 5 7
10 6 5 6 1
Any help is welcomed,please.
For two-row matrices
You want to add second-row values corresponding to the same first-row value. This is a typical use of unique and accumarray:
[ii, ~, kk] = unique([a(1,:) b(1,:)]);
result = [ ii; accumarray(kk(:), [a(2,:) b(2,:)].').'];
General case
If you need to accumulate columns with an arbitrary number of columns (based on the first-row value), you can use sparse as follows:
[ii, ~, kk] = unique([a(1,:) b(1,:)]);
r = repmat((1:size(a,1)-1).', 1, numel(kk));
c = repmat(kk.', size(a,1)-1, 1);
result = [ii; full(sparse(r,c,[a(2:end,:) b(2:end,:)]))];

Vectorization- Matlab

Given a vector
X = [1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3]
I would like to generate a vector such
Y = [1 2 3 4 5 1 2 3 4 5 6 1 2 3 4 5]
So far what I have got is
idx = find(diff(X))
Y = [1:idx(1) 1:idx(2)-idx(1) 1:length(X)-idx(2)]
But I was wondering if there is a more elegant(robust) solution?
One approach with diff, find & cumsum for a generic case -
%// Initialize array of 1s with the same size as input array and an
%// intention of using cumsum on it after placing "appropriate" values
%// at "strategic" places for getting the final output.
out = ones(size(X))
%// Find starting indices of each "group", except the first group, and
%// by group here we mean run of identical numbers.
idx = find(diff(X))+1
%// Place differentiated and subtracted values of indices at starting locations
out(idx) = 1-diff([1 idx])
%// Perform cumulative summation for the final output
Y = cumsum(out)
Sample run -
X =
1 1 1 1 2 2 3 3 3 3 3 4 4 5
Y =
1 2 3 4 1 2 1 2 3 4 5 1 2 1
Just for fun, but customary bsxfun based alternative solution -
%// Logical mask with each column of ones for presence of each group elements
mask = bsxfun(#eq,X(:),unique(X(:).')) %//'
%// Cumulative summation along columns and use masked values for final output
vals = cumsum(mask,1)
Y = vals(mask)
Here's another approach:
Y = sum(triu(bsxfun(#eq, X, X.')), 1);
This works as follows:
Compare each element with all others (bsxfun(...)).
Keep only comparisons with current or previous elements (triu(...)).
Count, for each element, how many comparisons are true (sum(..., 1)); that is, how many elements, up to and including the current one, are equal to the current one.
Another method is using the function unique
like this:
[unqX ind Xout] = unique(X)
Y = [ind(1):ind(2) 1:ind(3)-ind(2) 1:length(X)-ind(3)]
Whether this is more elegant is up to you.
A more robust method will be:
[unqX ind Xout] = unique(X)
for ii = 1:length(unqX)-1
Y(ind(ii):ind(ii+1)-1) = 1:(ind(ii+1)-ind(ii));
end

matlab: eliminate elements from array

I have quite big array. To make things simple lets simplify it to:
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
So, there is a group of 1's (4 elements), 2's (2 elements), 3's (4 elements), 4's (2 elements) and 5's (8 elements). Now, I want to keep only columns, which belong to group of 3 or more elements. So it will be like:
B = [1 1 1 1 3 3 3 3 5 5 5 5 5 5 5 5];
I was doing it using for loop, scanning separately 1's, 2's, 3's and so on, but its extremely slow with big arrays...
Thanks for any suggestions how to do it in more efficient way :)
Art.
A general approach
If your vector is not necessarily sorted, then you need to run to count the number of occurrences of each element in the vector. You have histc just for that:
elem = unique(A);
counts = histc(A, elem);
B = A;
B(ismember(A, elem(counts < 3))) = []
The last line picks the elements that have less than 3 occurrences and deletes them.
An approach for a grouped vector
If your vector is "semi-sorted", that is if similar elements in the vector are grouped together (as in your example), you can speed things up a little by doing the following:
start_idx = find(diff([0, A]))
counts = diff([start_idx, numel(A) + 1]);
B = A;
B(ismember(A, A(start_idx(counts < 3)))) = []
Again, note that the vector need not to be entirely sorted, just that similar elements are adjacent to each other.
Here is my two-liner
counts = accumarray(A', 1);
B = A(ismember(A, find(counts>=3)));
accumarray is used to count the individual members of A. find extracts the ones that meet your '3 or more elements' criterion. Finally, ismember tells you where they are in A. Note that A needs not be sorted. Of course, accumarray only works for integer values in A.
What you are describing is called run-length encoding.
There is software for this in Matlab on the FileExchange. Or you can do it directly as follows:
len = diff([ 0 find(A(1:end-1) ~= A(2:end)) length(A) ]);
val = A(logical([ A(1:end-1) ~= A(2:end) 1 ]));
Once you have your run-length encoding you can remove elements based on the length. i.e.
idx = (len>=3)
len = len(idx);
val = val(idx);
And then decode to get the array you want:
i = cumsum(len);
j = zeros(1, i(end));
j(i(1:end-1)+1) = 1;
j(1) = 1;
B = val(cumsum(j));
Here's another way to do it using matlab built-ins.
% Set up
A=[1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5];
threshold=2;
% Get the unique elements of the array
uniqueElements=unique(A);
% Count haw many times each unique element occurs
counts=histc(A,uniqueElements);
% Write which elements should be kept
toKeep=uniqueElements(counts>threshold);
% Make a logical index
indexer=false(size(A));
for i=1:length(toKeep)
% For every unique element we want to keep select the indices in A that
% are equal
indexer=indexer|(toKeep(i)==A);
end
% Apply index
B=A(indexer);

Resources