Comparing two arrays of pixel values, and store any matches - arrays

I want to compare the pixel values of two images, which I have stored in arrays.
Suppose the arrays are A and B. I want to compare the elements one by one, and if A[l] == B[k], then I want to store the match as a key value-pair in a third array, C, like so: C[l] = k.
Since the arrays are naturally quite large, the solution needs to finish within a reasonable amount of time (minutes) on a Core 2 Duo system.

This seems to work in under a second for 1024*720 matrices:
A = randi(255,737280,1);
B = randi(255,737280,1);
C = zeros(size(A));
[b_vals, b_inds] = unique(B,'first');
for l = 1:numel(b_vals)
C(A == b_vals(l)) = b_inds(l);
end
First we find the unique values of B and the indices of the first occurrences of these values.
[b_vals, b_inds] = unique(B,'first');
We know that there can be no more than 256 unique values in a uint8 array, so we've reduced our loop from 1024*720 iterations to just 256 iterations.
We also know that for each occurrence of a particular value, say 209, in A, those locations in C will all have the same value: the location of the first occurrence of 209 in B, so we can set all of them at once. First we get locations of all of the occurrences of b_vals(l) in A:
A == b_vals(l)
then use that mask as a logical index into C.
C(A == b_vals(l))
All of these values will be equal to the corresponding index in B:
C(A == b_vals(l)) = b_inds(l);
Here is the updated code to consider all of the indices of a value in B (or at least as many as are necessary). If there are more occurrences of a value in A than in B, the indices wrap.
A = randi(255,737280,1);
B = randi(255,737280,1);
C = zeros(size(A));
b_vals = unique(B);
for l = 1:numel(b_vals)
b_inds = find(B==b_vals(l)); %// find the indices of each unique value in B
a_inds = find(A==b_vals(l)); %// find the indices of each unique value in A
%// in case the length of a_inds is greater than the length of b_inds
%// duplicate b_inds until it is larger (or equal)
b_inds = repmat(b_inds,[ceil(numel(a_inds)/numel(b_inds)),1]);
%// truncate b_inds to be the same length as a_inds (if necessary) and
%// put b_inds into the proper places in C
C(a_inds) = b_inds(1:numel(a_inds));
end
I haven't fully tested this code, but from my small samples it seems to work properly and on the full-size case, it only takes about twice as long as the previous code, or less than 2 seconds on my machine.

So, if I understand your question correctly, you want for each value of l=1:length(A) the (first) index k into B so that A(l) == B(k). Then:
C = arrayfun(#(val) find(B==val, 1, 'first'), A)
could give you your solution, as long as you're sure that every element will have a match. The above solution would fail otherwise, complaning that the function returned a non-scalar (because find would return [] if no match is found). You have two options:
Using a cell array to store the result instead of a numeric array. You would need to call arrayfun with 'UniformOutput', false at the end. Then, the values of A without matches in B would be those for which isempty(C{i}) is true.
Providing a default value for an index into A with no matches in B (e.g. 0 or NaN). I'm not sure about this one, but I think that you would need to add 'ErrorHandler', #(~,~) NaN to the arrayfun call. The error handler is a function that gets called when the function passed to arrayfun fails, and may either rethrow the error or compute a substitute value. Thus the #(~,~) NaN. I am not sure that it would work, however, since in this case the error is in arrayfun and not in the passed function, but you can try it.

If you have the images in arrays A & B
idx = A == B;
C = zeros(size(A));
C(idx) = A(idx);

Related

Sorting an array of integers in nlog(n) time without using comparison operators

Imagine there's have an array of integers but you aren't allowed to access any of the values (so no Arr[i] > Arr[i+1] or whatever). The only way to discern the integers from one another is by using a query() function: this function takes a subset of elements as inputs and returns the number of unique integers in this subset. The goal is to partition the integers into groups based on their values — integers in the same group should have the same value, while integers in different groups have different values.
The catch - the code has to be O(nlog(n)), or in other words the query() function can only be called O(nlog(n)) times.
I've spent hours optimizing different algorithms in Python, but all of them have been O(n^2). For reference, here's the code I start out with:
n = 100
querycalls = 0
secretarray = [random.randint(0, n-1) for i in range(n)]
def query(items):
global querycalls
querycalls += 1
return len(set(items))
groups = []
secretarray generates a giant random list of numbers of length n. querycalls keeps track of how much the function is called. groups are where the results go.
The first thing I did was try to create an algorithm based off of merge sort (split the arrays down and then merge based on the query() value) but I could never get it below O(n^2).
Let's say you have an element x and an array of distinct elements, A = [x0, x1, ..., x_{k-1}] and want to know if the x is equivalent to some element in the array and if yes, to which element.
What you can do is a simple recursion (let's call it check-eq):
Check if query([x, A]) == k + 1. If yes, then you know that x is distinct from every element in A.
Otherwise, you know that x is equivalent to some element of A. Let A1 = A[:k/2], A2 = A[k/2+1:]. If query([x, A1]) == len(A1), then you know that x is equivalent to some element in A1, so recurse in A1. Otherwise recurse in A2.
This recursion takes at most O(logk) steps. Now, let our initial array be T = [x0, x1, ..., x_{n-1}]. A will be an array of "representative" of the groups of elements. What you do is first take A = [x0] and x = x1. Now use check-eq to see if x1 is in the same group as x0. If no, then let A = [x0, x1]. Otherwise do nothing. Proceed with x = x2. You can see how it goes.
Complexity is of course O(nlogn), because check-eq is called exactly n-1 times and each call take O(logn) time.

Python: Finding the row index of a value in 2D array when a condition is met

I have a 2D array PointAndTangent of dimension 8500 x 5. The data is row-wise with 8500 data rows and 5 data values for each row. I need to extract the row index of an element in 4th column when this condition is met, for any s:
abs(PointAndTangent[:,3] - s) <= 0.005
I just need the row index of the first match for the above condition. I tried using the following:
index = np.all([[abs(s - PointAndTangent[:, 3])<= 0.005], [abs(s - PointAndTangent[:, 3]) <= 0.005]], axis=0)
i = int(np.where(np.squeeze(index))[0])
which doesn't work. I get the follwing error:
i = int(np.where(np.squeeze(index))[0])
TypeError: only size-1 arrays can be converted to Python scalars
I am not so proficient with NumPy in Python. Any suggestions would be great. I am trying to avoid using for loop as this is small part of a huge simulation that I am trying.
Thanks!
Possible Solution
I used the following
idx = (np.abs(PointAndTangent[:,3] - s)).argmin()
It seems to work. It returns the row index of the nearest value to s in the 4th column.
You were almost there. np.where is one of the most abused functions in numpy. Half the time, you really want np.nonzero, and the other half, you want to use the boolean mask directly. In your case, you want np.flatnonzero or np.argmax:
mask = abs(PointAndTangent[:,3] - s) <= 0.005
mask is a 1D array with ones where the condition is met, and zeros elsewhere. You can get the indices of all the ones with flatnonzero and select the first one:
index = np.flatnonzero(mask)[0]
Alternatively, you can select the first one directly with argmax:
index = np.argmax(mask)
The solutions behave differently in the case when there are no rows meeting your condition. Three former does indexing, so will raise an error. The latter will return zero, which can also be a real result.
Both can be written as a one-liner by replacing mask with the expression that was assigned to it.

Array of sets in Matlab

Is there a way to create an array of sets in Matlab.
Eg: I have:
a = ones(10,1);
b = zeros(10,1);
I need c such that c = [(1,0); (1,0); ...], i.e. each set in c has first element from a and 2nd element from b with the corresponding index.
Also is there some way I can check if an unknown set (x,y) is in c.
Can you all please help me out? I am a Matlab noob. Thanks!
There are not sets in your understanding in MATLAB (I assume that you are thinking of tuples on Python...) But there are cells in MATLAB. That is a data type that can store pretty much anything (you may think of pointers if you are familiar with the concept). It is indicated by using { }.
Knowing this, you could come up with a cell of arrays and check them using cellfun
% create a cell of numeric arrays
C = {[1,0],[0,2],[99,-1]}
% check which input is equal to the array [1,0]
lg = cellfun(#(x)isequal(x,[1,0]),C)
Note that you access the address of a cell with () and the content of a cell with {}. [] always indicate arrays of something. We come to this in a moment.
OK, this was the part that you asked for; now there is a bonus:
That you use the term set makes me feel that they always have the same size. So why not create an array of arrays (or better an array of vectors, which is a matrix) and check this matrix column-wise?
% array of vectors (there is a way with less brackets but this way is clearer):
M = [[1;0],[0;2],[99;-1]]
% check at which column (direction ",1") all rows are equal to the proposed vector:
lg = all(M == [0;2],1)
This way is a bit clearer, better in terms of memory and faster.
Note that both variables lg are arrays of logicals. You can use them directly to index the original variable, i.e. M(:,lg) and C{lg} returns the set that you are looking for.
If you would like to get logical value regarding if p is in C, maybe you can try the code below
any(sum((C-p).^2,2)==0)
or
any(all(C-p==0,2))
Example
C = [1,2;
3,-1;
1,1;
-2,5];
p1 = [1,2];
p2 = [1,-2];
>> any(sum((C-p1).^2,2)==0) # indicating p1 is in C
ans = 1
>> any(sum((C-p2).^2,2)==0) # indicating p2 is not in C
ans = 0
>> any(all(C-p1==0,2))
ans = 1
>> any(all(C-p2==0,2))
ans = 0

Substitute a vector value with two values in MATLAB

I have to create a function that takes as input a vector v and three scalars a, b and c. The function replaces every element of v that is equal to a with a two element array [b,c].
For example, given v = [1,2,3,4] and a = 2, b = 5, c = 5, the output would be:
out = [1,5,5,3,4]
My first attempt was to try this:
v = [1,2,3,4];
v(2) = [5,5];
However, I get an error, so I do not understand how to put two values in the place of one in a vector, i.e. shift all the following values one position to the right so that the new two values fit in the vector and, therefore, the size of the vector will increase in one. In addition, if there are several values of a that exist in v, I'm not sure how to replace them all at once.
How can I do this in MATLAB?
Here's a solution using cell arrays:
% remember the indices where a occurs
ind = (v == a);
% split array such that each element of a cell array contains one element
v = mat2cell(v, 1, ones(1, numel(v)));
% replace appropriate cells with two-element array
v(ind) = {[b c]};
% concatenate
v = cell2mat(v);
Like rayryeng's solution, it can replace multiple occurrences of a.
The problem mentioned by siliconwafer, that the array changes size, is here solved by intermediately keeping the partial arrays in cells of a cell array. Converting back to an array concenates these parts.
Something I would do is to first find the values of v that are equal to a which we will call ind. Then, create a new output vector that has the output size equal to numel(v) + numel(ind), as we are replacing each value of a that is in v with an additional value, then use indexing to place our new values in.
Assuming that you have created a row vector v, do the following:
%// Find all locations that are equal to a
ind = find(v == a);
%// Allocate output vector
out = zeros(1, numel(v) + numel(ind));
%// Determine locations in output vector that we need to
%// modify to place the value b in
indx = ind + (0:numel(ind)-1);
%// Determine locations in output vector that we need to
%// modify to place the value c in
indy = indx + 1;
%// Place values of b and c into the output
out(indx) = b;
out(indy) = c;
%// Get the rest of the values in v that are not equal to a
%// and place them in their corresponding spots.
rest = true(1,numel(out));
rest([indx,indy]) = false;
out(rest) = v(v ~= a);
The indx and indy statements are rather tricky, but certainly not hard to understand. For each index in v that is equal to a, what happens is that we need to shift the vector over by 1 for each index / location of v that is equal to a. The first value requires that we shift the vector over to the right by 1, then the next value requires that we shift to the right by 1 with respect to the previous shift, which means that we actually need to take the second index and shift by the right by 2 as this is with respect to the original index.
The next value requires that we shift to the right by 1 with respect to the second shift, or shifting to the right by 3 with respect to the original index and so on. These shifts define where we're going to place b. To place c, we simply take the indices generated for placing b and move them over to the right by 1.
What's left is to populate the output vector with those values that are not equal to a. We simply define a logical mask where the indices used to populate the output array have their locations set to false while the rest are set to true. We use this to index into the output and find those locations that are not equal to a to complete the assignment.
Example:
v = [1,2,3,4,5,4,4,5];
a = 4;
b = 10;
c = 11;
Using the above code, we get:
out =
1 2 3 10 11 5 10 11 10 11 5
This successfully replaces every value that is 4 in v with the tuple of [10,11].
I think that strrep deserves a mention here.
Although it's called string replacement and warns for non-char input, it still works perfectly fine for other numbers as well (including integers, doubles and even complex numbers).
v = [1,2,3,4]
a = 2, b = 5, c = 5
out = strrep(v, a, [b c])
Warning: Inputs must be character arrays or cell arrays of strings.
out =
1 5 5 3 4
You are not attempting to overwrite an existing value in the vector. You're attempting to change the size of the vector (meaning the number of rows or columns in the vector) because you're adding an element. This will always result in the vector being reallocated in memory.
Create a new vector, using the first and last half of v.
Let's say your index is stored in the variable index.
index = 2;
newValues = [5, 5];
x = [ v(1:index), newValues, v(index+1:end) ]
x =
1 2 5 5 3 4

Indicies of zero ranges in a zero-one matrix

I am using Matlab for one of my projects. I am actually stuck at a point since some time now. Tried searching on google, but, not much success.
I have an array of 0s and 1s. Something like:
A = [0,0,0,1,1,1,1,1,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,0,0,0,0];
I want to extract an array of indicies: [x_1, x_2, x_3, x_4, x_5, ..]
Such that x_1 is the index of start of first range of zeros. x_2 is the index of end of first range of zeros.
x_3 is the index of start of second range of zeros. x_4 is the index of end of second range of zeros.
For the above example:
x_1 = 1, x_2 = 3
x_3 = 9, x_4 = 10
and so on.
Of course, I can do it by writing a simple loop. I am wondering if there is a more elegant (vectorized) way to solve this problem. I was thinking about something like prefix some, but, no luck as of now.
Thanks,
Anil.
The diff function is great for this sort of stuff and pretty quick.
temp = diff(A);
Starts = find([A(1) == 0, temp==-1]);
Ends = find([temp == 1,A(end)==0])
Edit: Fixed the error in the Ends calculation caught by gnovice.
Zeros not preceded by other zeros: A==0 & [true A(1:(end-1))~=0]
Zeros not followed by other zeros: A==0 & [A(2:end)~=0 true]
Use each of these plus find to get starts and ends of runs of zeros. Then, if you really want them in a single vector as you described, interleave them.
If you want to get your results in a single vector like you described above (i.e. x = [x_1 x_2 x_3 x_4 x_5 ...]), then you can perform a second-order difference using the function DIFF and find the points greater than 0:
x = find(diff([1 A 1],2) > 0);
EDIT:
The above will work for the case when there are at least 2 zeroes in every string of zeroes. If you will have single zeroes appearing in A, the above can be modified to handle them like so:
diffA = diff([1 A 1],2);
[~,x] = find([diffA > 0; diffA == 2]);
In this case, a single zero value will create repeated indices in x (i.e. if A starts with a single zero, then x(1) and x(2) will both be 1).

Resources