Finding the position of all sets of consecutive ones in an Array (MATLAB)


I have an array of the following values:
X=[1 1 1 2 3 4 1 1 1 1 5 4 2 1 1 2 3 4 1 1 1 1 1 2 2 1]
I want to get the positions (indices) of all runs of consecutive ones in the array, and construct an array that holds the start and end positions of each run:
idx= [1 3; 7 10; 14 15; 19 23; 26 26];
I tried to use the following functions, but I am not sure how to implement it:
positionofoness= find(X==1);
find(diff(X==1));
How can I construct the idx array?

You were almost there with your find and diff solution. To find every position where X changes to or from 1, pad X with a NaN at the beginning and the end:
tmp = find(diff([NaN X NaN] == 1)) % NaN to identify 1st and last elements as start and end
tmp =
1 4 7 11 14 16 19 24 26 27
%start|end start|end
Notice that every even-indexed element of tmp is the index + 1 of where a run of consecutive 1s ends.
idx = [reshape(tmp,2,[])]'; % reshape in desired form
idx = [idx(:,1) idx(:,2)-1]; % subtract 1 from second column
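As a quick check, the steps above can be combined into one short script. This is only a sketch: it pads the logical mask with zeros, which is a small variation on the NaN padding, and the names mask, d, starts and stops are my own.

```matlab
X = [1 1 1 2 3 4 1 1 1 1 5 4 2 1 1 2 3 4 1 1 1 1 1 2 2 1];
mask = (X == 1);               % true wherever X is 1
d = diff([0 mask 0]);          % +1 where a run starts, -1 one past where it ends
starts = find(d == 1);
stops = find(d == -1) - 1;     % shift back onto the last 1 of each run
idx = [starts(:) stops(:)]
% idx = [1 3; 7 10; 14 15; 19 23; 26 26]
```

The same pattern works for any other value by changing the comparison in mask.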

Related

Shuffle array while spacing repeating elements

I'm trying to write a function that shuffles an array, which contains repeating elements, but ensures that repeating elements are not too close to one another.
This code works but seems inefficient to me:
function shuffledArr = distShuffle(myArr, myDist)
% this function takes an array myArr and shuffles it, while ensuring that repeating
% elements are at least myDist elements away from one another
% flag to indicate whether there are repetitions within myDist
reps = 1;
while reps
% set to 0 to break while-loop, will be set to 1 if it doesn't meet condition
reps = 0;
% randomly shuffle array
shuffledArr = Shuffle(myArr);
% loop through each unique value, find its positions, and calculate the distance to the next occurrence
for x = 1:length(unique(myArr))
% check if there are any repetitions that are separated by myDist or less
if any(diff(find(shuffledArr == x)) <= myDist)
reps = 1;
break;
end
end
end
This seems suboptimal to me for three reasons:
1) It may not be necessary to repeatedly shuffle until a solution has been found.
2) This while loop will go on forever if there is no possible solution (i.e. setting myDist to be too high to find a configuration that fits). Any ideas on how to catch this in advance?
3) There must be an easier way to determine the distance between repeating elements in an array than what I did by looping through each unique value.
I would be grateful for answers to points 2 and 3, even if point 1 is correct and it is possible to do this in a single shuffle.

I think it is sufficient to check the following condition to prevent infinite loops:
[~,num, C] = mode(myArr);
N = numel(C{1}); % mode returns its third output as a cell; C{1} holds all modes
assert( (myDist<=N) || (myDist-N+1) * (num-1) +N*num <= numel(myArr),...
'Shuffling impossible!');
Assume that myDist is 2 and we have the following data:
[4 6 5 1 6 7 4 6]
We can find the mode, 6, and its number of occurrences, 3. We arrange the 6s, separating them by myDist = 2 blanks:
6 _ _ 6 _ _ 6
There must be (3-1) * myDist = 4 numbers to fill the blanks. We have five other numbers, so the array can be shuffled.
The problem becomes more complicated if we have multiple modes. For example for this array [4 6 5 1 6 7 4 6 4] we have N=2 modes: 6 and 4. They can be arranged as:
6 4 _ 6 4 _ 6 4
We have 2 blanks and three more numbers [5 1 7] that can be used to fill them. If, for example, we had only one other number [5], it would be impossible to fill the blanks and the array could not be shuffled.
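To make the condition concrete, here is a sketch that evaluates it for the two arrays above plus a made-up infeasible one, [6 6 6 5]. It assumes that mode returns its third output as a 1x1 cell for vector input, so the number of modes is numel(C{1}); needed and feasible are names I introduced.

```matlab
myDist = 2;
inputs = {[4 6 5 1 6 7 4 6], [4 6 5 1 6 7 4 6 4], [6 6 6 5]};  % last one is made up
feasible = false(1, numel(inputs));
for k = 1:numel(inputs)
    myArr = inputs{k};
    [~, num, C] = mode(myArr);   % num: count of the mode
    N = numel(C{1});             % assumption: C{1} lists every mode
    needed = (myDist - N + 1) * (num - 1) + N * num;
    feasible(k) = (myDist <= N) || (needed <= numel(myArr));
end
feasible
% feasible = 1 1 0
```

For [6 6 6 5] the three 6s need 4 fillers but only one other number is available, so the check fails.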
For the third point you can use a sparse matrix to speed up the computation (my initial testing in Octave shows that it is more efficient):
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
reps = any (diff(find(shuffledBin)) <= myDist);
end
shuffledArr = U(S);
end
Alternatively you can use sub2ind and sort instead of sparse matrix:
function shuffledArr = distShuffleSort(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
% force both subscript vectors to columns: sub2ind needs equally sized inputs
f = sub2ind ( [numel(idx) + myDist, numel(U)] , (1:numel(idx)).', S(:) );
reps = any (diff(sort(f)) <= myDist);
end
shuffledArr = U(S);
end
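Regarding point 2, one defensive option (independent of the condition above) is to cap the number of attempts. A minimal sketch, assuming randperm in place of Shuffle, made-up sample data and a hypothetical limit maxTries; the distance test here uses a stable sort rather than sparse or sub2ind:

```matlab
myArr = [1 1 1 2 2 3 4 5 6];   % made-up sample data
myDist = 2;
maxTries = 1000;               % hypothetical cap on the number of shuffles
found = false;
for attempt = 1:maxTries
    shuffledArr = myArr(randperm(numel(myArr)));
    [vals, pos] = sort(shuffledArr);   % stable sort: equal values keep ascending positions
    gaps = diff(pos);
    sameVal = diff(vals) == 0;         % neighbouring sorted entries with the same value
    if ~any(gaps(sameVal) <= myDist)
        found = true;
        break;
    end
end
if ~found
    warning('No valid shuffle found in %d attempts.', maxTries);
end
```

A cap like this does not prove that no solution exists, it only guarantees that the loop terminates.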

If you just want to find one possible solution, you could use something like this:
x = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
n = numel(x);
dist = 3; %minimal distance
uni = unique(x); %get the unique values
his = histc(x,uni); %count the occurrences of each element
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];
xr = []; %the vector that will contain the solution
%the for loop that will maximize the distance between equal elements
for ii = 1:n
s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;
s(1,3) = s(1,3)-dist;
s(1,2) = s(1,2)-1;
xr = [xr s(1,1)];
s = sortrows(s,[3,2],{'descend','descend'});
end
if any(s(:,2)~=0)
fprintf('failed, dist is too big')
end
Result:
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3]
Explanation:
I create a matrix s, and at the beginning s is equal to:
s =
3 5 0
1 3 0
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
%col1 = unique element; col2 = occurrences of each element; col3 = penalties
At each iteration of the for-loop we choose the element with the most occurrences, since this element will be the hardest to place in our array.
Then after the first iteration s is equal to:
s =
1 3 0 %1 is the next element that will be placed in our array.
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
3 4 -3 %3 now has 5-1 = 4 occurrences and a penalty of -3, so it won't show up in the next 3 iterations.
At the end, every number in the second column should be equal to 0; if not, the minimal distance was too big.
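To check a result like the one above, you can confirm that xr is a rearrangement of x and measure the smallest gap between equal elements. A small sketch; samePool and minGap are names I made up:

```matlab
x  = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3];
samePool = isequal(sort(x), sort(xr));   % xr must contain exactly the elements of x
minGap = inf;
for v = unique(xr)
    p = find(xr == v);                   % positions of value v in xr
    if numel(p) > 1
        minGap = min(minGap, min(diff(p)));
    end
end
minGap
% minGap = 4
```

For the xr shown, equal elements are never fewer than 4 positions apart.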

circshift using index values

I was looking for a way to circshift using index values.
I know I can shift all values using the circshift command (see below):
a=[1:9]
b=circshift(a,[0,1])
But how do I shift every 3rd value over 1?
Example (note that variable a can be any length):
a=[1,2,30,4,5,60,7,8,90]
I'm trying to get b to be
b=[1,30,2,4,60,5,7,90,8] % so the values at indices 3, 6 and 9 are shifted over by 1.

You're not going to be able to do this with the standard usage of circshift. There are a couple of other ways that you could approach this. Here are just a few.
Using mod to create index values
You could use mod to subtract 1 from the index values at locations 3:3:end and add 1 to the index values at locations 2:3:end.
b = a((1:numel(a)) + mod(1:numel(a), 3) - 1);
Explanation
Calling mod 3 on 1:numel(a) yields the following sequence
mod(1:numel(a), 3)
% 1 2 0 1 2 0 1 2 0
If we subtract 1 from this sequence we get the "shift" for a given index
mod(1:numel(a), 3) - 1
% 0 1 -1 0 1 -1 0 1 -1
Then we can add this shift to the original index
(1:numel(a)) + mod(1:numel(a), 3) - 1
% 1 3 2 4 6 5 7 9 8
And then assign the values in a to these positions in b.
b = a((1:numel(a)) + mod(1:numel(a), 3) - 1);
% 1 30 2 4 60 5 7 90 8
Using reshape
Another option is to reshape your data into a 3 x N array and flip the 2nd and 3rd rows, then reshape back to the original size. This option will only work if numel(a) is divisible by 3.
tmp = reshape(a, 3, []);
% Grab the 2nd and 3rd rows in reverse order to flip them
b = reshape(tmp([1 3 2],:), size(a));
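If numel(a) is not a multiple of 3, the reshape version does not apply, and the mod-based index runs past the end of a whenever numel(a) leaves a remainder of 2. One possible alternative, sketched here with a made-up 8-element input, swaps each third element with its left neighbour directly:

```matlab
a = [1 2 30 4 5 60 7 8];   % made-up input whose length is not a multiple of 3
b = a;
k = 3:3:numel(a);          % positions of every 3rd element
b(k-1) = a(k);             % move each 3rd element one place to the left
b(k)   = a(k-1);           % and its left neighbour into the 3rd slot
b
% b = 1 30 2 4 60 5 7 8
```

Trailing elements that do not complete a group of three are left where they are.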

Find consecutive values in 3D array

Say I have an array of size 100x150x30 (a geographical grid of 100x150 with 30 values for each grid point) and want to find runs of consecutive elements along the third dimension with a contiguous length of at least 3.
I would like to find the maximum length of these blocks of consecutive elements, as well as the number of occurrences.
I have tried this on a simple vector:
var=[20 21 50 70 90 91 92 93];
a=diff(var);
q = diff([0 a 0] == 1);
v = find(q == -1) - find(q == 1);
v = v+1;
v2 = v(v>3);
v3 = max(v2); % maximum length: 4
z = numel(v2); % number: 1
Now I'd like to apply this to the 3rd dimension of my array.
With A being my 100x150x30 array, I've come this far:
aa = diff(A, 1, 3);
b1 = diff((aa == 1),1,3);
b2 = zeros(100,150,1);
qq = cat(3,b2,b1,b2);
But I'm stuck on the next step, which would be: find(qq == -1) - find(qq == 1);. I can't make it work.
Is there a way to put it in a loop, or do I have to find the consecutive values another way?
Thanks for any help!

A = randi(25,100,150,30); %// generate random array
tmpsize = size(A); %// get its size
B = diff(A,1,3); %// difference
v3 = zeros(tmpsize([1 2])); %//initialise
z = zeros(tmpsize([1 2]));
for ii = 1:100 %// double loop over all entries
for jj = 1:150
q = diff([0 squeeze(B(ii,jj,:)).' 0] == 1); %// pad with zeros; +1 marks a run start, -1 a run end
v = find(q == -1) - find(q == 1);
v=v+1;
v2=v(v>3);
try %// if v2 is empty, set to nan
v3(ii,jj)=max(v2);
catch
v3(ii,jj)=nan;
end
z(ii,jj)=numel(v2);
end
end
The above seems to work. It simply loops over both grid dimensions and processes the vector along the third dimension at each grid point.
The part where I think you were stuck was using squeeze to extract that vector for your variable q.
The try/catch is there solely to prevent an empty v2 from throwing an error in the assignment to v3. It now simply sets the entry to nan, though you can switch that to 0 of course.
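To see what squeeze contributes, here is the loop body applied to a single grid point, using the test vector from the question stored as a 1x1x8 array. minLen is a name I introduced, and using >= 3 (rather than the question's v>3) is my reading of "minimum 3":

```matlab
A = reshape([20 21 50 70 90 91 92 93], 1, 1, []);  % one grid point, 8 values along dim 3
minLen = 3;                                        % assumed minimum run length
B = diff(A, 1, 3);                                 % 1x1x7 array of differences
row = squeeze(B(1,1,:)).';                         % squeeze gives 7x1, transpose to 1x7
q = diff(double([0 row 0] == 1));                  % +1 at run starts, -1 just after run ends
v = find(q == -1) - find(q == 1) + 1;              % run lengths in elements
v2 = v(v >= minLen);
maxLen = max(v2)
numRuns = numel(v2)
% v = [2 4], maxLen = 4, numRuns = 1
```

This reproduces the maximum length of 4 and the count of 1 from the question's vector example.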

Here's one vectorized approach -
%// Parameters
[m,n,r] = size(var);
max_occ_thresh = 2 %// Threshold for consecutive occurrences
% Get indices of start and stop of consecutive number islands
df = diff(var,[],3)==1;
A = reshape(df,[],size(df,3));
dfA = diff([zeros(size(A,1),1) A zeros(size(A,1),1)],[],2).'; %//'
[R1,C1] = find(dfA==1);
[R2,C2] = find(dfA==-1);
%// Get interval lengths
interval_lens = R2 - R1+1;
%// Get max consecutive occurrences across dim-3
max_len = zeros(m,n);
maxIDs = accumarray(C1,interval_lens,[],@max);
max_len(1:numel(maxIDs)) = maxIDs
%// Get number of consecutive occurrences that are above max_occ_thresh
num_occ = zeros(m,n);
counts = accumarray(C1,interval_lens>max_occ_thresh);
num_occ(1:numel(counts)) = counts
Sample run -
var(:,:,1) =
2 3 1 4 1
1 4 1 5 2
var(:,:,2) =
2 2 3 1 2
1 3 5 1 4
var(:,:,3) =
5 2 4 1 2
1 5 1 5 1
var(:,:,4) =
3 5 5 1 5
5 1 3 4 3
var(:,:,5) =
5 5 4 4 4
3 4 5 2 2
var(:,:,6) =
3 4 4 5 3
2 5 4 2 2
max_occ_thresh =
2
max_len =
0 0 3 2 2
0 2 0 0 0
num_occ =
0 0 1 0 0
0 0 0 0 0

Calculating difference between both adjacent and non-adjacent pairs using multiple index vectors

I have three numerical vectors containing position values (pos), a category (type), and an index (ind), in these general forms:
pos =
2 4 5 11 1 5 8 11 12 20
type =
1 2 1 2 1 1 2 1 2 3
ind =
1 1 1 1 2 2 2 2 2 2
I want to calculate the difference between values held within pos but only between the same types, and confined to each index. Using the above example:
When ind = 1
The difference(s) between type 1 positions = 3 (5-2).
The difference(s) between type 2 positions = 7 (11-4).
In the case where more than two instances of any given type exist within any index, the differences are calculated sequentially from left to right as shown here:
When ind = 2
The difference(s) between type 1 positions = 4 (5-1), 6 (11-5).
The difference(s) between type 2 positions = 4 (12-8).
Even though index 2 contains type '3', no difference is calculated as only 1 instance of this type is present.
Types are not always only 1, 2 or 3.
Ideally, the desired output would be a matrix with as many columns as length(unique(type)), with rows containing all differences calculated for that type. The output does not need to be separated by index; only the calculation itself does. In this case there are three unique types, so the output would be (labels added for clarity only):
Type 1 Type 2 Type 3
3 7 0
4 4 0
6 0 0
Any empty entries can be padded with zeroes.
Is there a concise or fast manner to do this?
EDIT 2:
Additional input/output example.
pos = [1 15 89 120 204 209 8 43 190 304]
type = [1 1 1 2 2 1 2 3 2 3]
ind = [1 1 1 1 1 1 2 2 2 2]
Desired output:
Type 1 Type 2 Type 3
14 84 261
74 182 0
120 0 0
In this case, the script works perfectly:

At least for creating the output matrix a loop is required:
pos = [2 4 5 11 1 5 8 11 12 20]
type = [1 2 1 2 1 1 2 1 2 3]
ind = [1 1 1 1 2 2 2 2 2 2]
%// get unique combinations of type and ind
[a,~,subs] = unique( [type(:) ind(:)] , 'rows')
%// create differences
%// output is cell array according to a
temp = accumarray(subs,1:numel(subs),[],@(x) {abs(diff(pos(x(end:-1:1))))} )
%// creating output matrix
for ii = 1:max(a(:,1)) %// iterating over types
vals = [temp{ a(:,1) == ii }]; %// differences for each type
out(1:numel(vals),ii) = vals;
end
out =
3 7 0
4 4 0
6 0 0
In case it doesn't work for your real data, you may need unique(...,'rows','stable') and a 'stable' accumarray.
It turned out that the above solution gives different results depending on the system.
The only reason why the code could give different results on different machines is that accumarray is not "stable", as mentioned above, so in some rare cases it can pass the values in an unpredictable order. So please try the following:
pos = [2 4 5 11 1 5 8 11 12 20]
type = [1 2 1 2 1 1 2 1 2 3]
ind = [1 1 1 1 2 2 2 2 2 2]
%// get unique combinations of type and ind
[a,~,subs] = unique( [type(:) ind(:)] , 'rows')
%// take care of unstable accumarray
[~, I] = sort(subs);
pos = pos(I);
subs = subs(I,:);
%// create differences; sort(x) puts the indices of each group in ascending order
%// output is cell array according to a
temp = accumarray(subs,1:numel(subs),[],@(x) {abs(diff(pos(sort(x))))} )
%// creating output matrix
for ii = 1:max(a(:,1)) %// iterating over types
vals = [temp{ a(:,1) == ii }]; %// differences for each type
out(1:numel(vals),ii) = vals;
end
out =
3 7 0
4 4 0
6 0 0
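As a cross-check that does not depend on the order in which accumarray hands values to the function, the same table can be built with plain loops. This is just a sketch with my own variable names (types, cols, d), not the approach above:

```matlab
pos  = [2 4 5 11 1 5 8 11 12 20];
type = [1 2 1 2 1 1 2 1 2 3];
ind  = [1 1 1 1 2 2 2 2 2 2];
types = unique(type);
cols = cell(1, numel(types));
for t = 1:numel(types)
    d = [];
    for g = unique(ind)                          % one pass per index group
        p = pos(type == types(t) & ind == g);    % positions of this type in this group
        d = [d, diff(p)];                        % left-to-right differences
    end
    cols{t} = d;
end
out = zeros(max(cellfun(@numel, cols)), numel(types));   % zero padding
for t = 1:numel(types)
    if ~isempty(cols{t})
        out(1:numel(cols{t}), t) = cols{t};
    end
end
out
% out = [3 7 0; 4 4 0; 6 0 0]
```

It is slower than accumarray on large inputs, but the ordering of the differences is fully determined by the data.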

Vectorize the sum of unique columns

There are multiple occurrences of the same combination of values in different rows of a MATLAB matrix, for example 1 1 in the first and second rows. I want to remove all those duplicates while summing the values in the third column; in the case of 1 1 the sum is 7. Finally, I want to create a similarity matrix as shown in the answer below. I don't mind doubled values on the diagonal because I will not be considering diagonal elements in further work. The code below does this, but it is not vectorized. Can it be vectorized somehow? An example is given below.
datain = [ 1 1 3;
1 1 4;
1 2 5;
1 2 4;
1 2 3;
1 3 8;
1 3 7;
1 3 12;
2 2 22;
2 2 77;
2 3 111;
2 3 113;
3 3 456;
3 3 568];
cmp1=unique(datain(:,1));
cmp1sz=size(cmp1,1);
cmp2=unique(datain(:,2));
cmp2sz=size(cmp2,1);
thetotal=zeros(cmp1sz,cmp2sz);
for i=1:size(datain,1)
for j=1:cmp1sz
for k=1:cmp2sz
if datain(i,1)==cmp1(j,1) && datain(i,2)== cmp2(k,1)
thetotal(j,k)=thetotal(j,k)+datain(i,3);
thetotal(k,j)=thetotal(k,j)+datain(i,3);
end
end
end
end
The answer is
14 12 27
12 198 224
27 224 2048

This is a poster case for using ACCUMARRAY.
thetotal = accumarray(datain(:,1:2),datain(:,3),[],@sum,0);
%# to make the array symmetric, you simply add its transpose
thetotal = thetotal + thetotal'
thetotal =
14 12 27
12 198 224
27 224 2048
EDIT
So what if datain does not contain only integer values? In this case, you can still construct a table, but e.g. thetotal(1,1) will not correspond to datain(1,1:2) == [1 1], but to the smallest entry in the first two columns of datain.
[uniqueVals,~,tmp] = unique(reshape(datain(:,1:2),[],1));
correspondingIndices = reshape(tmp,size(datain(:,1:2)));
thetotal = accumarray(correspondingIndices,datain(:,3),[],@sum,0);
The value at [1 1] now corresponds to the row [uniqueVals(1) uniqueVals(1)] in the first two cols of datain.
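A small illustration of the non-integer case, using made-up keys 0.5 and 2.5 (this data is not from the question; keys is a name I introduced):

```matlab
datain = [0.5 0.5 3;
          0.5 2.5 5;
          2.5 2.5 22;
          0.5 0.5 4];                      % made-up sample with non-integer keys
keys = datain(:,1:2);
[uniqueVals,~,tmp] = unique(keys(:));      % uniqueVals = [0.5; 2.5]
correspondingIndices = reshape(tmp, size(keys));
thetotal = accumarray(correspondingIndices, datain(:,3), [], @sum, 0);
thetotal = thetotal + thetotal.'           % symmetric, diagonal doubled
% thetotal = [14 5; 5 44]
```

Row and column k of thetotal belong to the key uniqueVals(k), so the lookup table is needed to read the result.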