Replicate scipy rankdata in matlab - arrays

I cannot find a clean way to get the rank of each element (its position in the sorted array) in MATLAB.
SciPy (in Python) has rankdata, which does what I need, while MATLAB's sort only returns the sort indices.
For example, the array [0 -3 -1 1] sorted in ascending order is [-3 -1 0 1].
I want to retrieve the new indices, i.e. [3 1 2 4], but MATLAB seems to offer no built-in solution.

You can use:
x = [0 -3 -1 1];
[~,ind] = sort(x);
[~,ind] = sort(ind) % sorting the sort indices inverts the permutation: ind = [3 1 2 4]

unique also sorts in ascending order, and it returns index vectors in both directions.
A = [0 -3 -1 1];
[B,I,C] = unique(A);
B =
-3 -1 0 1
C =
3
1
2
4
Note that if A contains repeated values (unlike your example), this method breaks down, because duplicates all receive the same index:
A = [0 -3 -1 1 1];
[B,I,C] = unique(A);
B =
-3 -1 0 1
C =
3
1
2
4
4

If I am interpreting your question correctly, I think this should work.
x = [0 -3 -1 1];
[s,r]=sort(x);
[~,rank] = sort(r);
The output I get is
s = -3 -1 0 1
rank = 3 1 2 4
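If ties need to be handled the way scipy's rankdata does by default (tied values share the average of their ranks), the double-sort result can be averaged within groups of equal values. The following is a minimal sketch of that idea, not a drop-in replacement for rankdata. The variable names (order, ordinal, grp, avgRank) are illustrative, and the input with a repeated 1 is the one used in the unique answer above.

```matlab
x = [0 -3 -1 1 1];                      % example with a tie

% Ordinal ranks: invert the sort permutation
[~, order] = sort(x);
ordinal = zeros(size(x));
ordinal(order) = 1:numel(x);            % [3 1 2 4 5]

% Average the ordinal ranks within each group of equal values
[~, ~, grp] = unique(x);                % group index of every element
avgPerGroup = accumarray(grp(:), ordinal(:), [], @mean);
avgRank = reshape(avgPerGroup(grp), size(x));   % [3 1 2 4.5 4.5]
```

If the Statistics and Machine Learning Toolbox is available, its tiedrank function should give the same average ranks without the manual grouping.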

Related

Cluster analysis on a 1D vector

Consider the following data:
A = [-1 -1 -1 0 1 -1 -1 0 0 1 1 1 1 -1 1 0 1];
How can the size and frequency of appearance of clusters in A (runs of equal neighbors) be calculated, preferably using MATLAB built-in commands?
The result should read something like
s_plus = [1 2 3 4 5 ; 3 0 0 1 0]'; % accounts for (1,1,1,1) and (1),(1),(1), which appear in A
s_zero = [1 2 3 4 5 ; 2 1 0 0 0]'; % accounts for (0,0) and (0),(0), which appear in A
s_mins = [1 2 3 4 5 ; 1 1 1 0 0]'; % accounts for (-1), (-1,-1), and (-1,-1,-1), which appear in A
In the above, the first column is the cluster size and the second column is the frequency of appearance.
You can use run-length encoding to transform your input array into two arrays:
1. The value of each group (or "run" of equal values)
2. The number of elements in that group
Then you can convert these into your desired output by checking where two conditions are both true:
1. The value matches the one you want (-1, 0 or 1)
2. The group size matches 1..5
This might sound a bit tricky, but it is only a few lines of code, and it should be relatively fast even for large arrays, because the outputs are calculated from the "encoded" arrays, which are never larger than the input array.
Here is the code; see the comments for details:
A = [-1 -1 -1 0 1 -1 -1 0 0 1 1 1 1 -1 1 0 1]; % Example input
% Run length encoding step
idx = [ find( A(1:end-1) ~= A(2:end) ), numel(A) ]; % Find group end points (last index of each run)
count = diff([0, idx]); % Find number of elements in each group
val = A( idx ); % Get value of each group
% Helper function to go from "val" and "count" to desired output format
% by checking value = target and group size matches 1 to 5, counting matching groups.
% (Comparing the row "count" with the column (1:5).' relies on implicit expansion, R2016b or later.)
f = @(v) sum(val==v & count==(1:5).',2).';
% Create outputs
s_plus = f(1); % = [3 0 0 1 0]
s_zero = f(0); % = [2 1 0 0 0]
s_mins = f(-1); % = [1 1 1 0 0]
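The question asks for two-column tables (cluster size, frequency) rather than bare count rows. One way to get that shape, sketched here under the assumption that no cluster is longer than maxSize (a name introduced for this example), is to tabulate the run lengths with accumarray:

```matlab
A = [-1 -1 -1 0 1 -1 -1 0 0 1 1 1 1 -1 1 0 1];
maxSize = 5;                                        % assumed upper bound on cluster size

% Run-length encoding, as in the answer above
idx   = [find(A(1:end-1) ~= A(2:end)), numel(A)];   % last index of each run
count = diff([0, idx]);                             % length of each run
val   = A(idx);                                     % value of each run

% Column 1: cluster size; column 2: number of runs of value v with that size
runTable = @(v) [(1:maxSize).', accumarray(count(val == v).', 1, [maxSize 1])];

s_plus = runTable(1);     % [1 3; 2 0; 3 0; 4 1; 5 0]
s_zero = runTable(0);     % [1 2; 2 1; 3 0; 4 0; 5 0]
s_mins = runTable(-1);    % [1 1; 2 1; 3 1; 4 0; 5 0]
```

accumarray raises an error if a run is longer than maxSize, so the bound should be chosen generously (numel(A) is always safe).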

How to find out consecutive sequence of numbers in two different vectors in MATLAB?

Suppose one vector is x = [-2 -2 -1 -1 -1 -2 -1 0 5 -1 0 5 -1 0] and the other is y = [2 3 4 5 -1 0 5 -1 0 5 -1]. The two vectors need not be the same length. How can I find the longest sequence of consecutive numbers that appears in both vectors using MATLAB? The result should be the starting and ending indices of the matched pattern in each vector. For this example: ix=[7 13] and iy=[5 11].
This requires the Image Processing Toolbox and the Statistics Toolbox. It uses a loop over chunk size:
x = [-2 -2 -1 -1 -1 -2 -1 0 5 -1 0 5 -1 0];
y = [ 2 3 4 5 -1 0 5 -1 0 5 -1];
for n = min(numel(x), numel(y)):-1:1 % try sizes in decreasing order
    x_sliding = reshape(im2col(x,[1 n],'sliding'),n,[]).'; % reshape needed for n=1
    y_sliding = reshape(im2col(y,[1 n],'sliding'),n,[]).'; % reshape needed for n=1
    [ind_x, ind_y] = find(pdist2(x_sliding, y_sliding) == 0);
    if ~isempty(ind_x)
        ix_start = ind_x;
        iy_start = ind_y;
        ix_end = ind_x+n-1;
        iy_end = ind_y+n-1;
        break
    end
end
The solutions, if they exist, are given in ix_start, ix_end, iy_start, iy_end. If there are several solutions of the maximal possible size, the indices of all of them are produced.
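Where those toolboxes are not available, the same question can be answered with the classic longest-common-substring dynamic programme. This is a swapped-in technique, not the method of the answer above, and the names L, best, ix and iy are chosen for this sketch. It reports only one maximal match (the first found by max over the table) rather than all of them.

```matlab
x = [-2 -2 -1 -1 -1 -2 -1 0 5 -1 0 5 -1 0];
y = [ 2 3 4 5 -1 0 5 -1 0 5 -1];

% L(i+1, j+1) = length of the common run that ends at x(i) and y(j)
L = zeros(numel(x) + 1, numel(y) + 1);
for i = 1:numel(x)
    for j = 1:numel(y)
        if x(i) == y(j)
            L(i+1, j+1) = L(i, j) + 1;
        end
    end
end

[best, k] = max(L(:));                 % longest run and the cell where it ends
[iEnd, jEnd] = ind2sub(size(L), k);    % positions in L are shifted by one
if best > 0
    ix = [iEnd - best, iEnd - 1];      % start and end indices in x
    iy = [jEnd - best, jEnd - 1];      % start and end indices in y
else
    ix = []; iy = [];                  % the vectors share no element
end
```

For the example vectors this gives ix = [7 13] and iy = [5 11]: both x(7:13) and y(5:11) equal [-1 0 5 -1 0 5 -1]. The table needs numel(x)*numel(y) memory, which is fine for short vectors but worth keeping in mind for long ones.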

Number 0's and 1's blocks in a binary vector

In MATLAB there is the bwlabel function, which, given a binary vector such as x=[1 1 0 0 0 1 1 0 0 1 1 1 0], returns (bwlabel(x)):
[1 1 0 0 0 2 2 0 0 3 3 3 0]
but what I want to obtain is
[1 1 2 2 2 3 3 4 4 5 5 5 6]
I know I can negate x to obtain (bwlabel(~x))
[0 0 1 1 1 0 0 2 2 0 0 0 3]
But how can I combine them?
All in one line:
y = cumsum([1,abs(diff(x))])
Namely, abs(diff(x)) marks each position where the binary vector changes value, and the cumulative sum of those marks, seeded with a leading 1, gives the block number of every element.
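To see the intermediate values, the one-liner can be unrolled as below. Writing the change detector as diff(x) ~= 0 instead of abs(diff(x)) is a small variation assumed here so that the same sketch also numbers blocks in non-binary vectors; the variable names are illustrative.

```matlab
x = [1 1 0 0 0 1 1 0 0 1 1 1 0];
changes = diff(x) ~= 0;      % [0 1 0 0 1 0 1 0 1 0 0 1], true where the next element starts a new block
y = cumsum([1, changes]);    % [1 1 2 2 2 3 3 4 4 5 5 5 6]

% The same two steps on a non-binary vector
z = [7 7 2 2 2 9];
labels = cumsum([1, diff(z) ~= 0]);   % [1 1 2 2 2 3]
```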
You can still do it using bwlabel by vertically concatenating x and ~x, using 4-connected components for the labeling, then taking the maximum down each column:
>> max(bwlabel([x; ~x], 4))
ans =
1 1 2 2 2 3 3 4 4 5 5 5 6
However, the solution from Bentoy13 is probably a bit faster.
x=[1 1 0 0 0 1 1 0 0 1 1 1 0];
A = bwlabel(x);
B = bwlabel(~x);
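if x(1)==1
    % blocks of ones come first: ones get odd labels, zeros get even labels
    tmp = A>0;
    A(tmp) = 2*A(tmp)-1;
    tmp = B>0;
    B(tmp) = 2*B(tmp);
    C = A+B
elseif x(1)==0
    % blocks of zeros come first: zeros get odd labels, ones get even labels
    tmp = A>0;
    A(tmp) = 2*A(tmp);
    tmp = B>0;
    B(tmp) = 2*B(tmp)-1;
    C = A+B
end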
C =
1 1 2 2 2 3 3 4 4 5 5 5 6
The first label should remain 1, the first label of the other array should become 2, the second label of the first array should become 3, and so on. So whichever of A and B contains the first block must end up with the odd labels (k becomes 2k-1), and the other with the even labels (k becomes 2k). When x starts with 1 this is A+A-1 and B+B on the nonzero entries; when x starts with 0 the roles swap. A simple check of x(1) is therefore sufficient, after which you just add the two arrays.
I found this function that does exactly what I wanted:
https://github.com/davidstutz/matlab-multi-label-connected-components
So, clone the repository and compile it in MATLAB using mex:
mex sp_fast_connected_relabel.cpp
Then,
labels = sp_fast_connected_relabel(x);

Syntax understanding in task with matlab

Can someone help me understand this in MATLAB:
k=2
n = (0:-1:-4)+k
The result is 2 1 0 -1 -2.
How does it work?
You are dealing with a colon operator and a vectorized sum at the same time. Let's split the problem into smaller, stand-alone problems:
In MATLAB, if you add a scalar to a matrix or subtract a scalar from it, the operation is applied to every element of the matrix, in a vectorized way. Example:
A = [1 2; 3 4]; % 2-by-2 matrix
S1 = A + 2 % output: S1 = [3 4; 5 6]
B = [1 2 3 4] % 1-by-4 matrix, also called a row vector
S2 = B - 5 % output: S2 = [-4 -3 -2 -1]
The colon operator in MATLAB can be used in many situations: indexing, for-loop iteration and vector creation. In your case, its purpose is the third one, and its syntax is START:STEP:END, where STEP is optional and defaults to 1. The generated values start at START and never go past END. Example:
A = 1:5 % output: A = [1 2 3 4 5]
B = -2.5:2.5:6 % output: B = [-2.5 0 2.5 5]
C = 1:-1:-5 % output: C = [1 0 -1 -2 -3 -4 -5]
D = -4:-2:0 % output: D = []
Like most programming languages, MATLAB defines operator precedence rules, so that a one-line calculation with several operators is broken down into smaller calculations in a fixed order, unless parentheses are used to override the default order, just like in ordinary maths. Example:
A = 2 * 5 + 3 % output: A = 13
B = 2 * (5 + 3) % output: B = 16
Let's put all this together to give you an explanation:
n = (0:-1:-4) + k
% vector creation has parentheses, so it's executed first
% then, the addition is executed on the result of the first operation
Let's subdivide the calculation into intermediate steps:
n_1 = 0:-1:-4 % output: n_1 = [0 -1 -2 -3 -4]
n_2 = n_1 + k % output: n_2 = [2 1 0 -1 -2]
n = n_2
Want to see what happens without parentheses?
n = 0:-1:-4+k % output: n = [0 -1 -2]
Why? Because the addition has priority over the colon operator. It's like writing n = 0:-1:(-4+k) and adding k to the END parameter of the colon operator. Let's subdivide the calculation into intermediate steps:
n_1 = -4 + k % output: n_1 = -2
n_2 = 0:-1:n_1 % output: n_2 = [0 -1 -2]
n = n_2
This is basic MATLAB syntax: you're dealing with the range (colon) operator. There are two patterns:
[start]:[end]
and
[start]:[step]:[end]
Patterns like this result in arrays / vectors / "1D matrices".
In your example, you will get a vector first, stepping through the numbers 0 to -4 (step == -1). Then, you are adding k == 2 to all numbers in this vector.
octave:1> k = 2
k = 2
octave:2> n = (0:-1:-4)+k
n =
2 1 0 -1 -2
octave:3> 0:-1:-4
ans =
0 -1 -2 -3 -4
The parenthesized expression creates an array. The first number is the first element, the second is the step, and the last is the final element. So the parentheses return 0 -1 -2 -3 -4. Next, we add k=2 to each element, which results in 2 1 0 -1 -2.
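One consequence worth checking yourself: adding k to every element of a range gives the same vector as shifting both endpoints of the range by k. The short sketch below compares the two forms; n_shifted and same are names introduced for this illustration.

```matlab
k = 2;
n         = (0:-1:-4) + k;      % build the range first, then add k to each element
n_shifted = k:-1:(k-4);         % shift both endpoints by k instead
same = isequal(n, n_shifted);   % true: both are [2 1 0 -1 -2]
```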

Find number of consecutive ones in binary array

I want to find the lengths of all series of ones and zeros in a logical array in MATLAB. This is what I did:
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
%// Find series of ones:
csA = cumsum(A);
csOnes = csA(diff([A 0]) == -1);
seriesOnes = [csOnes(1) diff(csOnes)];
%// Find series of zeros (same way, using ~A)
csNegA = cumsum(~A);
csZeros = csNegA(diff([~A 0]) == -1);
seriesZeros = [csZeros(1) diff(csZeros)];
This works, and gives seriesOnes = [4 2 5] and seriesZeros = [3 1 6]. However it is rather ugly in my opinion.
I want to know if there is a better way to do this. Performance is not an issue as this is inexpensive (A is no longer than a few thousand elements). I am looking for code clarity and elegance.
If nothing better can be done, I'll just put this in a little helper function so I don't have to look at it.
You could use existing code for run-length encoding, which does the (ugly) work for you, and then pick out your vectors yourself. This way your helper function is fairly general, and its functionality is evident from the name runLengthEncode.
Reusing code from this answer:
function [lengths, values] = runLengthEncode(data)
startPos = find(diff([data(1)-1, data]));
lengths = diff([startPos, numel(data)+1]);
values = data(startPos);
You would then filter out your vectors using:
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
[lengths, values] = runLengthEncode(A);
seriesOnes = lengths(values==1);
seriesZeros = lengths(values==0);
You can try this:
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
B = [~A(1) A ~A(end)]; %// Add edges at start/end
edges_indexes = find(diff(B)); %// find edges
lengths = diff(edges_indexes); %// length between edges
%// Separate zeros and ones, to a cell array
s(1+A(1)) = {lengths(1:2:end)};
s(1+~A(1)) = {lengths(2:2:end)};
This strfind-based approach (strfind works with numeric arrays as well as character arrays) could be easier to follow:
%// Find start and stop indices for ones and zeros with strfind by using
%// "opposite" sentinels (0 for ones and 1 for zeros)
start_ones = strfind([0 A],[0 1]) %// 0 is the sentinel here, and so on
start_zeros = strfind([1 A],[1 0])
stop_ones = strfind([A 0],[1 0])
stop_zeros = strfind([A 1],[0 1])
%// Get lengths of islands of ones and zeros using those start-stop indices
length_ones = stop_ones - start_ones + 1
length_zeros = stop_zeros - start_zeros + 1
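The start/stop idea can also be written with diff alone, which avoids relying on strfind accepting numeric input. A minimal sketch, in which the helper name runLengthsOfTrue is hypothetical:

```matlab
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);

% Pad with zeros so every run of true has a rising and a falling edge.
% The falling edge lands one past the end of the run, so stop - start = length.
runLengthsOfTrue = @(v) find(diff([0 v 0]) == -1) - find(diff([0 v 0]) == 1);

seriesOnes  = runLengthsOfTrue(A);     % [4 2 5]
seriesZeros = runLengthsOfTrue(~A);    % [3 1 6]
```

The helper expects a row vector, since it concatenates the padding horizontally.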

Resources