Merge multiple arrays of unique occurrences - arrays

I want to merge multiple arrays of unique occurrences to a single array. To get the arrays in the first place I use this code, where image series is a slice from a tiff image imported using imread:
a = unique(img_series);
occu = [a,histc(img_series(:),a)];
I do that multiple times, because the tiff image I'm using has multiple hundred images stacked, which my RAM will not support to import at once. So each 'occu' looks something like this (first number is the unique value, second number is the number of occurrences):
occu1 occu2 .....
0 1 1 2
12 1 10 1
14 1 12 1
15 1 14 2
.. .. .. .. .....
Now I want to merge them all together, or better merge them in each iteration, when I'm reading another stacked image.
The merged results should be a 2D matrix similar to the one above. The number of occurrences of the same values should be added to one another, as this is the whole point of counting them. So the result of the above example should be this:
occu_total
0 1
1 2
10 1
12 2
14 3
15 1
.. ..
I found the join command, but that one does not seem to work here. I guess I could do it the long way of searching the matching number and add the occurrences together and so on, but there must be a quicker way of doing it.

A = [0 1;12 1; 14 1;15 1];B = [1 2;10 1;12 1;14 2];
tmp = [A;B]; %// merge arrays into a single one
tmp(:,1) = tmp(:,1)+1;%// remove zero occurrences by adding 1 to everything
C = accumarray(tmp(:,1),tmp(:,2)); %// add occurrences all up
D = [1:numel(C)].'; %// create numbered array
E = [D C];
E((C==0),:)=[]; %// get output
E(:,1) = E(:,1)-1;%// subtract the 1 again
E =
0 1
1 2
10 1
12 2
14 3
15 1
Job for accumarray. This takes the first argument as your dictionary key, and adds the values of the each key together. The addition and subtraction of 1 is done because 0 cannot be an index in MATLAB. To circumvent this (assuming you have no negative numbers), you can simply add 1 and remove that afterwards, shifting all your indices to positive integers. If you hit negative numbers, subtract tmp(:,1) = min(tmp(:,1)+1 and add E(:,1) = min(tmp(:,1)-1

Related

How to compare and add elements of an array

I'd like to take a single array lets say 3x5 as follows
1 3 5
1 3 5
2 8 6
4 5 7
4 5 8
and write a code that will create a new array that adds the third column together with the previous number if the numbers in the first and second columns equal the numbers in the row below it.
since the first two values in row 1 and 2, then add the third elements in row 1 and 2 together
so the output from the array above should look like this
1 3 10
2 8 6
4 5 15
The function accumarray(subs,val) accumulate elements of vector val using the subscripts subs. So we can use this function to sum the elements in the third column having the same value in the first and second column. We can use unique(...,'rows') to determine which pairs of value are unique.
%Example data
A = [1 3 5,
1 3 5,
2 3 6,
4 5 7,
4 5 7]
%Get the unique pair of value based on the first two column (uni) and the associated index.
[uni,~,sub] = unique(A(:,1:2),'rows');
%Create the result using accumarray.
R = [uni, accumarray(sub,A(:,3))]
If the orders matters the script would be a little bit more complex:
%Get the unique pair of value based on the first two column (uni) and the associated index.
[uni,~,sub] = unique(A(:,1:2),'rows');
%Locate the consecutive similar row with this small tricks
dsub = diff([0;sub])~=0;
%Create the adjusted index
subo = cumsum(dsub);
%Create the new array
R = [uni(sub(dsub),:), accumarray(subo,A(:,3))]
Or you can get an identical result with a for loop:
R = A(1,:)
for ii = 2:length(A)
if all(A(ii-1,1:2)==A(ii,1:2))
R(end,3) = R(end,3)+A(ii,3)
else
R(end+1,:) = A(ii,:)
end
end
Benchmark:
With an array A of size 100000x3 on the mathworks live editor:
The for loop take about 5.5s (no pre-allocation, so it's pretty slow)
The vectorized method take about 0.012s

Find 10 most repeated elements in a vector in MATLAB

I'm suppose to find 10 most repeated elements in a vector with n elements,
(the elements are from 1-100)
does anyone know how to do that?
I know how to find the one that is most repeated element in a vector but I don't know how to find 10 most repeated elements with n being unknown.
a = randi(10,1,100);
y = hist(a,1:max(a));
[~,ind] = sort(y,'descend');
out = ind(1:10);
for number of occurrences use y(ind(1:10)).
I had some doubts so I tested it many times, it seems to work.
You can use unique for that case. In my example, I have 4 numbers and I want to grep the 2 with the most occurances.
A = [1 1 3 3 1 1 2 2 1 1 1 2 3 3 3 4 4 4 4];
B = sort(A); % Required for the usage of unique below
[~,i1] = unique(B,'first');
[val,i2] = unique(B,'last');
[~,pos] = sort(i2-i1,'descend');
val(pos(1:2))
1 3
Replace val(pos(1:2)) by val(pos(1:10)) in your case to get the 10 most values. The get the number of elements you can use i1 and i2.
num = i2-i1+1;
num(1:2)
ans =
7 3
Since you already know how to find the most repeated element, you could use the following algorithm:
Find the most repeated element of the vector
Remove the most repeated element from the vector
Repeat the process on the new vector to find the 2nd most repeated element
Continue until you have the 10 most repeated elements
The code would look something like:
count = 0;
values = [];
while count < 10
r = Mode(Vector);
values = [values r]; % store most repeated values
Vector = Vector(find(Vector~=r));
count = count + 1;
end
Not efficient, but it'll get the job done

Matlab : Matrix indexing Logic

i am doing very simple Matrix indexing examples . where code is as give below
>> A=[ 1 2 3 4 ; 5 6 7 8 ; 9 10 11 12 ]
A =
1 2 3 4
5 6 7 8
9 10 11 12
>> A(end, end-2)
ans =
10
>> A(2:end, end:-2:1)
ans =
8 6
12 10
here i am bit confused . when i use A(end, end-2) it takes difference of two till first column and when there is just one column left there is no further processing , but when i use A(2:end, end:-2:1) it takes 6 10 but how does it print 8 12 while there is just one column left and we have to take difference of two from right to left , Pleas someone explain this simple point
The selection A(end, end-2) reads: take elements in last row of A that appear in column 4(end)-2=2.
The selection A(2:end, end:-2:1) similarly reads: take elements in rows 2 to 4(end) and starting from last column going backwards in jumps of two, i.e. 4 then 2.
To check the indexing, simply substitute the end with size(A,1) or size(A,2) if respectively found in the row and col position.
First the general stuff: end is just a placeholder for an index, namely the last position in a given array dimension. For instance, for an arbitrary array A(end,1) will pick the last element in column 1, and A(1,end) will pick the last element in the first row.
In your example, A(end, end-2) picks an element in the last row two columns before the last one.
To interpret a statement such as
A(2:end, end:-2:1)
it might help to substitute end with the actual index of the last row/column elements, so this is equivalent to
A(2:3, 4:-2:1)
Furthermore 4:-2:1 is equivalent to the list 4,2 since we are instructing to make the list starting at 4, decreasing in steps of 2, up to (minimum) 1. So this is equivalent to
A([2 3],[4 2])
Finally, the following combination of indices is implied by A([2 3],[4 2]):
A(2,4) A(2,2)
A(3,4) A(3,2)

matlab: eliminate elements from array

I have quite big array. To make things simple lets simplify it to:
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
So, there is a group of 1's (4 elements), 2's (2 elements), 3's (4 elements), 4's (2 elements) and 5's (8 elements). Now, I want to keep only columns, which belong to group of 3 or more elements. So it will be like:
B = [1 1 1 1 3 3 3 3 5 5 5 5 5 5 5 5];
I was doing it using for loop, scanning separately 1's, 2's, 3's and so on, but its extremely slow with big arrays...
Thanks for any suggestions how to do it in more efficient way :)
Art.
A general approach
If your vector is not necessarily sorted, then you need to run to count the number of occurrences of each element in the vector. You have histc just for that:
elem = unique(A);
counts = histc(A, elem);
B = A;
B(ismember(A, elem(counts < 3))) = []
The last line picks the elements that have less than 3 occurrences and deletes them.
An approach for a grouped vector
If your vector is "semi-sorted", that is if similar elements in the vector are grouped together (as in your example), you can speed things up a little by doing the following:
start_idx = find(diff([0, A]))
counts = diff([start_idx, numel(A) + 1]);
B = A;
B(ismember(A, A(start_idx(counts < 3)))) = []
Again, note that the vector need not to be entirely sorted, just that similar elements are adjacent to each other.
Here is my two-liner
counts = accumarray(A', 1);
B = A(ismember(A, find(counts>=3)));
accumarray is used to count the individual members of A. find extracts the ones that meet your '3 or more elements' criterion. Finally, ismember tells you where they are in A. Note that A needs not be sorted. Of course, accumarray only works for integer values in A.
What you are describing is called run-length encoding.
There is software for this in Matlab on the FileExchange. Or you can do it directly as follows:
len = diff([ 0 find(A(1:end-1) ~= A(2:end)) length(A) ]);
val = A(logical([ A(1:end-1) ~= A(2:end) 1 ]));
Once you have your run-length encoding you can remove elements based on the length. i.e.
idx = (len>=3)
len = len(idx);
val = val(idx);
And then decode to get the array you want:
i = cumsum(len);
j = zeros(1, i(end));
j(i(1:end-1)+1) = 1;
j(1) = 1;
B = val(cumsum(j));
Here's another way to do it using matlab built-ins.
% Set up
A=[1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5];
threshold=2;
% Get the unique elements of the array
uniqueElements=unique(A);
% Count haw many times each unique element occurs
counts=histc(A,uniqueElements);
% Write which elements should be kept
toKeep=uniqueElements(counts>threshold);
% Make a logical index
indexer=false(size(A));
for i=1:length(toKeep)
% For every unique element we want to keep select the indices in A that
% are equal
indexer=indexer|(toKeep(i)==A);
end
% Apply index
B=A(indexer);

How can I concatenate ranges of numbers into an array in MATLAB?

For example, I want to combine two ranges of numbers like this:
1 2 3 4 5 11 12 13 14 15 16 17 18 19 20
So, I tried:
a = 1:5,11:20
but that didn't work.
I also want to do this in a non-hardcoded way so that the missing 5 elements can start at any index.
For your example, you need to use square brackets to concatenate your two row vectors:
a = [1:5 11:20];
Or to make it less hardcoded:
startIndex = 6; %# The starting index of the 5 elements to remove
a = [1:startIndex-1 startIndex+5:20];
You may also want to check out these related functions: HORZCAT, VERTCAT, CAT.
There are a few other ways you could do this as well. For one, you could first make the entire vector, then index the elements you don't want and remove them (i.e. set them to the empty vector []):
a = 1:20; %# The entire vector
a(6:10) = []; %# Remove the elements in indices 6 through 10
You could also use set operations to do this, such as the function SETDIFF:
a = setdiff(1:20,6:10); %# Get the values from 1 to 20 not including 6 to 10

Resources