Easiest way to create arrays based on repeating character positions - arrays

I want to group my elements using the repeated segments in the array. The breaking is basically depend on where the repeated segments are, in my real data contains ~10000 elements and I want to know if there is a easier way to do that.
Here is a short example to clarify what I want:
Let's say I have an array,
A=[1 5 3 4 4 4 6 9 8 8 9 5 2];
What I want is to break A into [1 5 3],[6 9], and [9 5 2];
What is the easiest to code this using matlab??
Thanks.

For a vectorized solution, you can find out the places where either forward or backward differences to the neighbor are zero, and then use bwlabel (from the Image Processing Toolbox) and accumarray to gather the data.
A=[1 5 3 4 4 4 6 9 8 8 9 5 2];
d = diff(A)==0;
%# combine forward and backward difference
%# and invert to identify non-repeating elments
goodIdx = ~([d,false]|[false,d]);
%# create list of group labels using bwlabel
groupIdx = bwlabel(goodIdx);
%# distribute the data into cell arrays
%# note that the first to inputs should be n-by-1
B = accumarray(groupIdx(goodIdx)',A(goodIdx)',[],#(x){x})
EDIT
Replace the last two lines of code with the following if you want the repeating elements to appear in the cell array as well
groupIdx = cumsum([1,abs(diff(goodIdx))]);
B = accumarray(groupIdx',A',[],#(x){x})
EDIT2
If you want to be able to split consecutive groups of identical numbers as well, you need to calculate groupIdx as follows:
groupIdx = cumsum([1,abs(diff(goodIdx))|~d.*~goodIdx(2:end)])

Here is a solution that works if I understand the question correctly. It can probably be optimised further.
A=[1 5 3 4 4 4 6 9 8 8 9 5 2];
% //First get logical array of non consecutive numbers
x = [1 (diff(A)~=0)];
for nn=1:numel(A)
if ~x(nn)
if x(nn-1)
x(nn-1)=0;
end
end
end
% //Make a cell array using the logical array
y = 1+[0 cumsum(diff(find(x))~=1)];
x(x~=0) = y;
for kk = unique(y)
B{kk} = A(x==kk);
end
B{:}

Related

Mean values of a matrix in a matrix on Matlab

This is about matlab.
Let's say I have a matrix like this
A = [1,2,3,4,5;6,7,8,9,10;11,12,13,14,15]‍
Now I want to know how to get a mean value of a small matrix in A.
Like a mean of the matrix located upper left side [1,2;6,7]
The only way I could think of is cut out the part I want to get a value from like this
X = A(1:2,:);
XY = X(:,1:2);
and mean the values column wise Mcol = mean(XY);.
and finally get a mean of the part by meaning Mcol row-wise.
Mrow = mean(Mcol,2);
I don't think this is a smart way to do this so it would be great if someone helps me make it smarter and faster.
Your procedure is correct. Some small improvements are:
Get XY using indexing in a single step: XY = A(1:2, 1:2)
Replace the two calls to mean by a single one on the linearized submatrix: mean(XY(:)).
Avoid creating XY. In this case you can linearize using reshape as follows: mean(reshape(A(1:2, 1:2), 1, [])).
If you want to do this for all overlapping submatrices, im2col from the Image Processing Toolbox may be handy:
submatrix_size = [2 2];
A_sub = im2col(A, submatrix_size);
gives
A_sub =
1 6 2 7 3 8 4 9
6 11 7 12 8 13 9 14
2 7 3 8 4 9 5 10
7 12 8 13 9 14 10 15
that is, each column is one of the submatrices linearized. So now you only need mean(A_sub, 1) to get the means of all submatrices.

MATLAB - extract array values based on conditions

I have 4x4 matrix A
[1 2 3 4;
2 2 2 3;
5 5 5 5;
4 4 4 4]
I know how to locate all values less than 4. A<4. But I'm not sure how to write an 'if' statement for; three or more values, all which are less than 4, contained in the same row. For instance; see above A(1,:) and A(2,:) satisfies my conditions.
You can basically do A<4 to know which ones are smaller. If you want to know which rows contain N values smaller than 4 then you can do
rows=find(sum(A<4,2)>=3)
This basically does:
find smaller than 4
Count how many of them in each row (sum(_,2))
find if they are 3 or more
give the row index of those find()

Find elements in array those have come two times in array

Given A = [3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 8]
Output B = [3 4 5 6 8]
Is there a Matlab function or command to get this result? I am new to Matlab. Just now I am doing it going through for each element and keeping a counter for it. I have very big array so this is taking too much time.
Use a combination of unique and histc:
uA = unique(A); %// find unique values
B = uA(histc(A, uA)>=2); %// select those that appear at least twice
The above code gives the values that appear at least twice. If you want values that appear exactly twice, replace >= by ==.

Trying to compare elements of on array with every element of another array in matlab

I'm using Matlab, and I'm trying to come up with a vectorized solution for comparing the elements of one array to every element of another array. Specifically I want to find the difference and see if this difference is below a certain threshold.
Ex: a = [1 5 10 15] and b=[12 13 14 15], threshold = 6
so the elements in a that would satisfy the threshold would be 10 and 15 since each value comes within 6 of any of the values in b while 1 and 5 do not. Currently I have a for loop going through the elements of a and subtracting an equivalently sized matrix from b (for 5 it would be a = [5 5 5 5]). This obviously takes a long time so I'm trying to find a vectorized solution. Additionally, the current format I have my data in is actually cells where each cell element has size [1 2], and I have been using the cellfun function to perform my subtraction. I'm not sure if this complicates the solution of each [1 2] block with the [1 2] block of the second cell. A vectorized solution response is fine, there is no need to do the threshold analysis. I just added it in for a little more background.
Thanks in advance,
Manwei Chan
Use bsxfun:
>> ind = any(abs(bsxfun(#minus,a(:).',b(:)))<threshold)
ind =
0 0 1 1
>> a(ind)
ans =
10 15

matlab: eliminate elements from array

I have quite big array. To make things simple lets simplify it to:
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
So, there is a group of 1's (4 elements), 2's (2 elements), 3's (4 elements), 4's (2 elements) and 5's (8 elements). Now, I want to keep only columns, which belong to group of 3 or more elements. So it will be like:
B = [1 1 1 1 3 3 3 3 5 5 5 5 5 5 5 5];
I was doing it using for loop, scanning separately 1's, 2's, 3's and so on, but its extremely slow with big arrays...
Thanks for any suggestions how to do it in more efficient way :)
Art.
A general approach
If your vector is not necessarily sorted, then you need to run to count the number of occurrences of each element in the vector. You have histc just for that:
elem = unique(A);
counts = histc(A, elem);
B = A;
B(ismember(A, elem(counts < 3))) = []
The last line picks the elements that have less than 3 occurrences and deletes them.
An approach for a grouped vector
If your vector is "semi-sorted", that is if similar elements in the vector are grouped together (as in your example), you can speed things up a little by doing the following:
start_idx = find(diff([0, A]))
counts = diff([start_idx, numel(A) + 1]);
B = A;
B(ismember(A, A(start_idx(counts < 3)))) = []
Again, note that the vector need not to be entirely sorted, just that similar elements are adjacent to each other.
Here is my two-liner
counts = accumarray(A', 1);
B = A(ismember(A, find(counts>=3)));
accumarray is used to count the individual members of A. find extracts the ones that meet your '3 or more elements' criterion. Finally, ismember tells you where they are in A. Note that A needs not be sorted. Of course, accumarray only works for integer values in A.
What you are describing is called run-length encoding.
There is software for this in Matlab on the FileExchange. Or you can do it directly as follows:
len = diff([ 0 find(A(1:end-1) ~= A(2:end)) length(A) ]);
val = A(logical([ A(1:end-1) ~= A(2:end) 1 ]));
Once you have your run-length encoding you can remove elements based on the length. i.e.
idx = (len>=3)
len = len(idx);
val = val(idx);
And then decode to get the array you want:
i = cumsum(len);
j = zeros(1, i(end));
j(i(1:end-1)+1) = 1;
j(1) = 1;
B = val(cumsum(j));
Here's another way to do it using matlab built-ins.
% Set up
A=[1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5];
threshold=2;
% Get the unique elements of the array
uniqueElements=unique(A);
% Count haw many times each unique element occurs
counts=histc(A,uniqueElements);
% Write which elements should be kept
toKeep=uniqueElements(counts>threshold);
% Make a logical index
indexer=false(size(A));
for i=1:length(toKeep)
% For every unique element we want to keep select the indices in A that
% are equal
indexer=indexer|(toKeep(i)==A);
end
% Apply index
B=A(indexer);

Resources