Filter matrix by multiple column values w/o loops (Matlab)? - arrays

Say I have the following:
Data matrix M (m-by-n);
Matching row V (1-by-n);
Matching positions I (1-by-n logical);
I want to filter all rows of M that have the same values as V at the matching positions I. I believe that Matlab indexing if powerful enough to do that without loops. But how?
Current solution: run though all the columns and update the filtered row positions F (m-by-1 logical).
F = true(m,1);
for k = 1:n;
if I(k);
F = F & (M(:,k)==V(k));
end;
end;
M = M(F,:);

Here's one way:
result = M(all(bsxfun(#eq, M(:,I), V(I)), 2), :);
How it works
Each row of M(:,I) is compared element-wise with the row vector V(I) using bsxfun. Rows for which all columns match are selected. The resulting logical vector is used to index the rows of M.
Example
M = [ 8 3 6 9
5 4 9 8
8 9 6 9 ];
I = [ true false true true ];
V = [ 8 1 6 9 ];
>> result = M(all(bsxfun(#eq, M(:,I), V(I)), 2), :)
result =
8 3 6 9
8 9 6 9

Related

Shuffle array while spacing repeating elements

I'm trying to write a function that shuffles an array, which contains repeating elements, but ensures that repeating elements are not too close to one another.
This code works but seems inefficient to me:
function shuffledArr = distShuffle(myArr, myDist)
% this function takes an array myArr and shuffles it, while ensuring that repeating
% elements are at least myDist elements away from on another
% flag to indicate whether there are repetitions within myDist
reps = 1;
while reps
% set to 0 to break while-loop, will be set to 1 if it doesn't meet condition
reps = 0;
% randomly shuffle array
shuffledArr = Shuffle(myArr);
% loop through each unique value, find its position, and calculate the distance to the next occurence
for x = 1:length(unique(myArr))
% check if there are any repetitions that are separated by myDist or less
if any(diff(find(shuffledArr == x)) <= myDist)
reps = 1;
break;
end
end
end
This seems suboptimal to me for three reasons:
1) It may not be necessary to repeatedly shuffle until a solution has been found.
2) This while loop will go on forever if there is no possible solution (i.e. setting myDist to be too high to find a configuration that fits). Any ideas on how to catch this in advance?
3) There must be an easier way to determine the distance between repeating elements in an array than what I did by looping through each unique value.
I would be grateful for answers to points 2 and 3, even if point 1 is correct and it is possible to do this in a single shuffle.
I think it is sufficient to check the following condition to prevent infinite loops:
[~,num, C] = mode(myArr);
N = numel(C);
assert( (myDist<=N) || (myDist-N+1) * (num-1) +N*num <= numel(myArr),...
'Shuffling impossible!');
Assume that myDist is 2 and we have the following data:
[4 6 5 1 6 7 4 6]
We can find the the mode , 6, with its occurence, 3. We arrange 6s separating them by 2 = myDist blanks:
6 _ _ 6 _ _6
There must be (3-1) * myDist = 4 numbers to fill the blanks. Now we have five more numbers so the array can be shuffled.
The problem becomes more complicated if we have multiple modes. For example for this array [4 6 5 1 6 7 4 6 4] we have N=2 modes: 6 and 4. They can be arranged as:
6 4 _ 6 4 _ 6 4
We have 2 blanks and three more numbers [ 5 1 7] that can be used to fill the blanks. If for example we had only one number [ 5] it was impossible to fill the blanks and we couldn't shuffle the array.
For the third point you can use sparse matrix to accelerate the computation (My initial testing in Octave shows that it is more efficient):
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
reps = any (diff(find(shuffledBin)) <= myDist);
end
shuffledArr = U(S);
end
Alternatively you can use sub2ind and sort instead of sparse matrix:
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
f = sub2ind ( [numel(idx) + myDist, numel(U)] , 1:numel(idx), S );
reps = any (diff(sort(f)) <= myDist);
end
shuffledArr = U(S);
end
If you just want to find one possible solution you could use something like that:
x = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
n = numel(x);
dist = 3; %minimal distance
uni = unique(x); %get the unique value
his = histc(x,uni); %count the occurence of each element
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];
xr = []; %the vector that will contains the solution
%the for loop that will maximize the distance of each element
for ii = 1:n
s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;
s(1,3) = s(1,3)-dist;
s(1,2) = s(1,2)-1;
xr = [xr s(1,1)];
s = sortrows(s,[3,2],{'descend','descend'})
end
if any(s(:,2)~=0)
fprintf('failed, dist is too big')
end
Result:
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3]
Explaination:
I create a vector s and at the beggining s is equal to:
s =
3 5 0
1 3 0
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
%col1 = unique element; col2 = occurence of each element, col3 = penalities
At each iteration of our for-loop we choose the element with the maximum occurence since this element will be harder to place in our array.
Then after the first iteration s is equal to:
s =
1 3 0 %1 is the next element that will be placed in our array.
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
3 4 -3 %3 has now 5-1 = 4 occurence and a penalities of -3 so it won't show up the next 3 iterations.
at the end every number of the second column should be equal to 0, if it's not the minimal distance was too big.

Remove one element from each row of a matrix, each in a different column

How can I remove elements in a matrix, that aren't all in a straight line, without going through a row at a time in a for loop?
Example:
[1 7 3 4;
1 4 4 6;
2 7 8 9]
Given a vector (e.g. [2,4,3]) How could I remove the elements in each row (where each number in the vector corresponds to the column number) without going through each row at a time and removing each element?
The example output would be:
[1 3 4;
1 4 4;
2 7 9]
It can be done using linear indexing at follows. Note that it's better to work down columns (because of Matlab's column-major order), which implies transposing at the beginning and at the end:
A = [ 1 7 3 4
1 4 4 6
2 7 8 9 ];
v = [2 4 3]; %// the number of elements of v must equal the number of rows of A
B = A.'; %'// transpose to work down columns
[m, n] = size(B);
ind = v + (0:n-1)*m; %// linear index of elements to be removed
B(ind) = []; %// remove those elements. Returns a vector
B = reshape(B, m-1, []).'; %'// reshape that vector into a matrix, and transpose back
Here's one approach using bsxfun and permute to solve for a 3D array case, assuming you want to remove indexed elements per row across all 3D slices -
%// Inputs
A = randi(9,3,4,3)
idx = [2 4 3]
%// Get size of input array, A
[M,N,P] = size(A)
%// Permute A to bring the columns as the first dimension
Ap = permute(A,[2 1 3])
%// Per 3D slice offset linear indices
offset = bsxfun(#plus,[0:M-1]'*N,[0:P-1]*M*N) %//'
%// Get 3D array linear indices and remove those from permuted array
Ap(bsxfun(#plus,idx(:),offset)) = []
%// Permute back to get the desired output
out = permute(reshape(Ap,3,3,3),[2 1 3])
Sample run -
>> A
A(:,:,1) =
4 4 1 4
2 9 7 5
5 9 3 9
A(:,:,2) =
4 7 7 2
9 6 6 9
3 5 2 2
A(:,:,3) =
1 7 5 8
6 2 9 6
8 4 2 4
>> out
out(:,:,1) =
4 1 4
2 9 7
5 9 9
out(:,:,2) =
4 7 2
9 6 6
3 5 2
out(:,:,3) =
1 5 8
6 2 9
8 4 4

merge two matrix and its attributes in matlab

I've two matrix a and b and I'd like to combine the rows in a way that in the first row I got no duplicate value and in the second value, columns in a & b which have the same row value get added together in new matrix. i.e.
a =
1 2 3
8 2 5
b =
1 2 5 7
2 4 6 1
Desired outputc =
1 2 3 5 7
10 6 5 6 1
Any help is welcomed,please.
For two-row matrices
You want to add second-row values corresponding to the same first-row value. This is a typical use of unique and accumarray:
[ii, ~, kk] = unique([a(1,:) b(1,:)]);
result = [ ii; accumarray(kk(:), [a(2,:) b(2,:)].').'];
General case
If you need to accumulate columns with an arbitrary number of columns (based on the first-row value), you can use sparse as follows:
[ii, ~, kk] = unique([a(1,:) b(1,:)]);
r = repmat((1:size(a,1)-1).', 1, numel(kk));
c = repmat(kk.', size(a,1)-1, 1);
result = [ii; full(sparse(r,c,[a(2:end,:) b(2:end,:)]))];

finding the first value of one matrix after satisfying condition of another matrix

Please help me solve this problem...
a = [1 2 3 4 5 6 7 8 9 10]
b = [12 4 13 7 5 7 8 10 3 12]
c = [4 5 3 2 6 7 5 3 4 5]
I have to find the first value on a, if the value on b is less than 10 for more than 3 consecutive places and index for the starting of satisfying the condition. Also the value of c after finding the value of b for same index.
Ans should be index for b=4, index for a=4 and value for a =4 and c=2
Thank you in advance
You may use strfind as one approach -
str1 = num2str(b <10,'%1d') %%// String of binary numbers
indx = strfind(['0' str1],'0111') %%// Indices where the condition is met
ind = indx(1) %%// Choose the first occurance
a_out = a(ind) %%// Index into a
c_out = c(ind) %%// Index into c
Output -
ind =
4
a_out =
4
c_out =
2
To find a given number of consecutive values lower than a threshold, you can apply conv to a vector of 0-1 values resulting from the comparison:
threshold = 10; %// values "should" be smaller than this
number = 4; %// at least 4 consecutive occurrences
ind = find(conv(double(b<threshold), ones(1,number), 'valid')==number, 1);
%// double(b<threshold) gives 0-1.
%// conv(...)==... gives 1 when the sought number of consecutive 1's is reached
%// find(... ,1) gives the first index where that happens
a_out = a(ind);
c_out = c(ind);

Element-wise array replication according to a count [duplicate]

This question already has answers here:
Repeat copies of array elements: Run-length decoding in MATLAB
(5 answers)
Closed 8 years ago.
My question is similar to this one, but I would like to replicate each element according to a count specified in a second array of the same size.
An example of this, say I had an array v = [3 1 9 4], I want to use rep = [2 3 1 5] to replicate the first element 2 times, the second three times, and so on to get [3 3 1 1 1 9 4 4 4 4 4].
So far I'm using a simple loop to get the job done. This is what I started with:
vv = [];
for i=1:numel(v)
vv = [vv repmat(v(i),1,rep(i))];
end
I managed to improve by preallocating space:
vv = zeros(1,sum(rep));
c = cumsum([1 rep]);
for i=1:numel(v)
vv(c(i):c(i)+rep(i)-1) = repmat(v(i),1,rep(i));
end
However I still feel there has to be a more clever way to do this... Thanks
Here's one way I like to accomplish this:
>> index = zeros(1,sum(rep));
>> index(cumsum([1 rep(1:end-1)])) = 1;
index =
1 0 1 0 0 1 1 0 0 0 0
>> index = cumsum(index)
index =
1 1 2 2 2 3 4 4 4 4 4
>> vv = v(index)
vv =
3 3 1 1 1 9 4 4 4 4 4
This works by first creating an index vector of zeroes the same length as the final count of all the values. By performing a cumulative sum of the rep vector with the last element removed and a 1 placed at the start, I get a vector of indices into index showing where the groups of replicated values will begin. These points are marked with ones. When a cumulative sum is performed on index, I get a final index vector that I can use to index into v to create the vector of heterogeneously-replicated values.
To add to the list of possible solutions, consider this one:
vv = cellfun(#(a,b)repmat(a,1,b), num2cell(v), num2cell(rep), 'UniformOutput',0);
vv = [vv{:}];
This is much slower than the one by gnovice..
What you are trying to do is to run-length decode. A high level reliable/vectorized utility is the FEX submission rude():
% example inputs
counts = [2, 3, 1];
values = [24,3,30];
the result
rude(counts, values)
ans =
24 24 3 3 3 30
Note that this function performs the opposite operation as well, i.e. run-length encodes a vector or in other words returns values and the corresponding counts.
accumarray function can be used to make the code work if zeros exit in rep array
function vv = repeatElements(v, rep)
index = accumarray(cumsum(rep)'+1, 1);
vv = v(cumsum(index(1:end-1))+1);
end
This works similar to solution of gnovice, except that indices are accumulated instead being assigned to 1. This allows to skip some indices (3 and 6 in the example below) and remove corresponding elements from the output.
>> v = [3 1 42 9 4 42];
>> rep = [2 3 0 1 5 0];
>> index = accumarray(cumsum(rep)'+1, 1)'
index =
0 0 1 0 0 2 1 0 0 0 0 2
>> cumsum(index(1:end-1))+1
ans =
1 1 2 2 2 4 5 5 5 5 5
>> vv = v(cumsum(index(1:end-1))+1)
vv =
3 3 1 1 1 9 4 4 4 4 4

Resources