Find consecutive values in 3D array

Say I have an array of size 100x150x30, i.e. a 100x150 geographical grid with 30 values for each grid point, and I want to find runs of consecutive values (each one larger than the previous) along the third dimension, with a length of at least 3.
I would like to find the maximum length of such blocks of consecutive values, as well as their number of occurrences.
I have tried this on a simple vector:
var=[20 21 50 70 90 91 92 93];
a=diff(var);
q = diff([0 a 0] == 1);
v = find(q == -1) - find(q == 1);
v = v+1;
v2 = v(v>3);
v3 = max(v2); % maximum length: 4
z = numel(v2); % number: 1
Now I'd like to apply this to the 3rd dimension of my array.
With A being my 100x150x30 array, I've come this far:
aa = diff(A, 1, 3);
b1 = diff((aa == 1),1,3);
b2 = zeros(100,150,1);
qq = cat(3,b2,b1,b2);
But I'm stuck on the next step, which would be find(qq == -1) - find(qq == 1). I can't make it work along the third dimension.
Is there a way to put it in a loop, or do I have to find the consecutive values another way?
Thanks for any help!
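A minimal sketch of the vector case with the threshold made explicit. It assumes that "a length of minimum 3" means runs of at least 3 values, so it filters with >= 3, whereas the attempt above uses v > 3 and therefore keeps only runs of 4 or more; for this particular input both give the same result. The variable names (minLen, runLen, maxLen, numRuns) are illustrative.

```matlab
% Runs of consecutive integers (each value one larger than the previous)
vals = [20 21 50 70 90 91 92 93];
minLen = 3;                                 % assumed: keep runs of at least 3 values
isStep = diff(vals) == 1;                   % true where the next value is one larger
edges = diff([0 isStep 0]);                 % +1 where a run of steps starts, -1 where it ends
runLen = find(edges == -1) - find(edges == 1) + 1;   % number of values in each run
longRuns = runLen(runLen >= minLen);
if isempty(longRuns)
    maxLen = 0;                             % no qualifying run
else
    maxLen = max(longRuns);
end
numRuns = numel(longRuns);
```

Here runLen is [2 4], so maxLen is 4 and numRuns is 1.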

A = randi(25,100,150,30); %// generate random array
tmpsize = size(A); %// get its size
B = diff(A,1,3); %// difference along the third dimension
v3 = zeros(tmpsize([1 2])); %// initialise
z = zeros(tmpsize([1 2]));
for ii = 1:tmpsize(1) %// double loop over all grid points
for jj = 1:tmpsize(2)
q = diff([0 squeeze(B(ii,jj,:)).' 0] == 1); %// squeeze gives a column; transpose it to a row
v = find(q == -1) - find(q == 1);
v=v+1;
v2=v(v>3);
try %// if v2 is empty, set to nan
v3(ii,jj)=max(v2);
catch
v3(ii,jj)=nan;
end
z(ii,jj)=numel(v2);
end
end
The above seems to work. It simply loops over both grid dimensions and applies your vector method at every grid point.
The part where I think you were stuck is getting a vector out of the 3D array: squeeze(B(ii,jj,:)) turns the 1x1x29 slice into a column vector, which is then transposed to a row so it can be used in q.
The try/catch is there only because max of an empty v2 returns [], and assigning [] to v3(ii,jj) throws an error. Now such points are simply set to nan, though you can of course switch that to 0.
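To check the per-point logic on something small, here is a sketch of the same double loop on a hypothetical 1x2x6 array. It replaces try/catch with an explicit isempty check and uses reshape(A(ii,jj,:),1,[]) in place of squeeze(...).', which yields the same row vector; minLen = 3 (runs of at least three values) is an assumption.

```matlab
% Hypothetical 1x2x6 array: point (1,1) holds 1 2 3 4 9 9, point (1,2) holds six 5s
A = cat(3, [1 5], [2 5], [3 5], [4 5], [9 5], [9 5]);
minLen = 3;                                   % assumed threshold
[m, n, ~] = size(A);
maxLen = zeros(m, n);
numRuns = zeros(m, n);
for ii = 1:m
    for jj = 1:n
        series = reshape(A(ii, jj, :), 1, []);        % values along dim 3 as a row
        edges = diff([0, diff(series) == 1, 0]);      % +1 run start, -1 run end
        runLen = find(edges == -1) - find(edges == 1) + 1;
        runLen = runLen(runLen >= minLen);
        if isempty(runLen)
            maxLen(ii, jj) = NaN;                     % no qualifying run (use 0 if preferred)
        else
            maxLen(ii, jj) = max(runLen);
        end
        numRuns(ii, jj) = numel(runLen);
    end
end
```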

Here's one vectorized approach -
%// Parameters
[m,n,r] = size(var);
max_occ_thresh = 2 %// Threshold for consecutive occurrences
% Get indices of start and stop of consecutive number islands
df = diff(var,[],3)==1;
A = reshape(df,[],size(df,3));
dfA = diff([zeros(size(A,1),1) A zeros(size(A,1),1)],[],2).';
[R1,C1] = find(dfA==1);
[R2,C2] = find(dfA==-1);
%// Get interval lengths
interval_lens = R2 - R1+1;
%// Get max consecutive occurrences across dim-3
max_len = zeros(m,n);
maxIDs = accumarray(C1,interval_lens,[],@max);
max_len(1:numel(maxIDs)) = maxIDs
%// Get number of consecutive occurrences that are above max_occ_thresh
num_occ = zeros(m,n);
counts = accumarray(C1,double(interval_lens>max_occ_thresh));
num_occ(1:numel(counts)) = counts
Sample run -
var(:,:,1) =
2 3 1 4 1
1 4 1 5 2
var(:,:,2) =
2 2 3 1 2
1 3 5 1 4
var(:,:,3) =
5 2 4 1 2
1 5 1 5 1
var(:,:,4) =
3 5 5 1 5
5 1 3 4 3
var(:,:,5) =
5 5 4 4 4
3 4 5 2 2
var(:,:,6) =
3 4 4 5 3
2 5 4 2 2
max_occ_thresh =
2
max_len =
0 0 3 2 2
0 2 0 0 0
num_occ =
0 0 1 0 0
0 0 0 0 0
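As an alternative (a different technique from the answer above), the run lengths can be tracked with a loop over the third dimension only, so each iteration stays vectorized over the whole grid (30 iterations for a 100x150x30 array). The sketch below reuses the sample data and should reproduce max_len and num_occ; V, minLen, runCount and numRuns are illustrative names, and minLen = 3 corresponds to max_occ_thresh = 2 with >.

```matlab
% Sample data from the run above (2 x 5 x 6)
V = cat(3, [2 3 1 4 1; 1 4 1 5 2], ...
           [2 2 3 1 2; 1 3 5 1 4], ...
           [5 2 4 1 2; 1 5 1 5 1], ...
           [3 5 5 1 5; 5 1 3 4 3], ...
           [5 5 4 4 4; 3 4 5 2 2], ...
           [3 4 4 5 3; 2 5 4 2 2]);
minLen = 3;                         % assumed: count runs of at least 3 values
[m, n, ~] = size(V);
step = diff(V, 1, 3) == 1;          % m x n x (p-1): next value is one larger
runCount = zeros(size(step));       % +1 steps in the run ending at each layer
runCount(:, :, 1) = step(:, :, 1);
for k = 2:size(step, 3)
    % extend the run where the step continues, reset to 0 where it breaks
    runCount(:, :, k) = (runCount(:, :, k-1) + 1) .* step(:, :, k);
end
maxLen = max(runCount, [], 3) + 1;  % a run of s steps spans s+1 values
maxLen(maxLen == 1) = 0;            % no consecutive pair at all
% a run ends where a step is followed by a non-step (or by the last layer)
isEnd = step & cat(3, ~step(:, :, 2:end), true(m, n));
numRuns = sum(isEnd & (runCount + 1 >= minLen), 3);
```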

Related

How to find the number of times a group of a specific value is present in an array?

I have a 1 by 1000 (1 row by 1000 columns) matrix that contains only 0s and 1s. How can I find how many times 1 is repeated 3 times consecutively?
If there are more than 3 ones in a row, the counting has to reset. So a run of 4 is 3+1 and counts as only one instance of 3 consecutive 1s, while a run of 6 is 3+3 and counts as two instances.
This approach finds the distance between each point where A goes from 0 to 1 (rising edge) and the next point where it goes from 1 to 0 (falling edge). This gives the length of each block of consecutive 1s. Dividing these lengths by 3 and rounding down gives the number of runs of 3 in each block.
Padding A with a 0 at the start and end just ensures we have a rising edge at the start if A starts with a 1, and we have a falling edge at the end if A ends with a 1.
A = round(rand(1,1000));
% padding with a 0 at the start and end will make this simpler
B = [0,A,0];
rising_edges = ~B(1:end-1) & B(2:end);
falling_edges = B(1:end-1) & ~B(2:end);
lengths_of_ones = find(falling_edges) - find(rising_edges);
N = sum(floor(lengths_of_ones / 3));
Or the same thing, much less readably, with N computed in one line:
A = round(rand(1,1000));
B = [0,A,0];
N = sum(floor((find(B(1:end-1) & ~B(2:end)) - find(~B(1:end-1) & B(2:end))) / 3));
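A small deterministic check of the edge-based method, on a hypothetical vector whose runs of ones have lengths 4, 6 and 2, confirms the counting rule from the question (a run of 4 counts once, a run of 6 counts twice):

```matlab
A = [1 1 1 1 0 1 1 1 1 1 1 0 1 1];      % hypothetical input: runs of ones of length 4, 6, 2
B = [0, A, 0];                          % pad so every run has both edges
rising  = find(~B(1:end-1) & B(2:end)); % 0 -> 1 transitions
falling = find(B(1:end-1) & ~B(2:end)); % 1 -> 0 transitions
runLengths = falling - rising;          % length of each run of ones
N = sum(floor(runLengths / 3));         % 1 + 2 + 0
```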
You can also define custom helper functions, as below. In a MATLAB script, local functions must come after the rest of the code:
v = randi([0,1],1,1000);
sum(cellfun(@count,runs(v)))
% get runs in cell array
function C = runs(v)
C{1} = v(1);
for k = 2:length(v)
if v(k) == C{end}(end)
C{end} = [C{end},v(k)];
else
C{end+1} = v(k);
end
end
end
% count times of 3 consecutive 1s
function y = count(x)
if all(x)
y = floor(length(x)/3);
else
y = 0;
end
end
Here is another vectorized way:
% input
n = 3;
a = [1 1 1 1 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1]
% runs of ones: 4, 3, 5, 7 -> 1 + 1 + 1 + 2 = 5 groups of three
% output
a0 = [a 0];
b = cumsum( a0 ) % cumsum
c = diff( [0 b( ~( diff(a0) + 1 ) ) ] ) % number of ones within group
countsOf3 = sum( floor( c/n ) ) % groups of 3
You like it messy? Here is a one-liner:
countsOf3 = sum(floor(diff([0 getfield(cumsum([a 0]),{~(diff([a 0])+1)})])/n))
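To see what the cumsum version computes, here it is unrolled on the same input. The mask diff(a0) == -1 is an equivalent way of writing ~(diff(a0)+1); it marks the last 1 of each run, and differencing the running count of ones at those positions gives the run lengths:

```matlab
n = 3;
a = [1 1 1 1 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1];
a0 = [a 0];                      % trailing 0 so the last run also ends
b = cumsum(a0);                  % running count of ones
runEnds = diff(a0) == -1;        % last 1 of each run (a 1 followed by a 0)
c = diff([0 b(runEnds)]);        % ones per run
countsOf3 = sum(floor(c / n));   % groups of n per run, summed
```

For this input c is [4 3 5 7], giving 1 + 1 + 1 + 2 = 5.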

Generate square matrix for vector with diagonals in MATLAB

I have a vector where each value corresponds to a diagonal, and I want to create a matrix from this vector. This is my code:
x = [1:5];
N = numel(x);
diagM = diag(repmat(x(1),N,1),0);
for iD = 2:N
d = repmat(x(iD),N-iD+1,1);
d_pos = diag(d,iD-1);
d_neg = diag(d,-iD+1);
d_join = d_pos+d_neg;
diagM = diagM+d_join;
end
It gives me what I want:
diagM =
1 2 3 4 5
2 1 2 3 4
3 2 1 2 3
4 3 2 1 2
5 4 3 2 1
But it becomes really slow, for example for x = 1:10^4.
Could you help me with a faster way to generate such a matrix?
Use toeplitz:
x = 1:5;
diagM = toeplitz(x);
Or do it manually, vectorized:
x = 1:5;
t = 1:numel(x);
diagM = x(abs(t-t.')+1); % x(abs(bsxfun(@minus, t, t.'))+1) in MATLAB versions before R2016b (no implicit expansion)
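A quick sketch to confirm that the two versions agree, using a hypothetical vector whose entries are not simply 1:N so that an indexing mistake would show up. bsxfun is used so it also runs on releases without implicit expansion:

```matlab
x = [10 20 30 40];                           % hypothetical diagonal values
t = 1:numel(x);
M1 = toeplitz(x);                            % symmetric Toeplitz matrix, first column x
M2 = x(abs(bsxfun(@minus, t, t.')) + 1);     % entry (i,j) is x(|i-j|+1)
```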

Shuffle array while spacing repeating elements

I'm trying to write a function that shuffles an array, which contains repeating elements, but ensures that repeating elements are not too close to one another.
This code works but seems inefficient to me:
function shuffledArr = distShuffle(myArr, myDist)
% this function takes an array myArr and shuffles it, while ensuring that repeating
% elements are at least myDist elements away from one another
% flag to indicate whether there are repetitions within myDist
reps = 1;
while reps
% set to 0 to break while-loop, will be set to 1 if it doesn't meet condition
reps = 0;
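% randomly shuffle array (Shuffle is from Psychtoolbox; myArr(randperm(numel(myArr))) is a built-in alternative)
shuffledArr = Shuffle(myArr);
% loop through each unique value, find its positions, and calculate the distance to the next occurrence
% (note: this assumes the values are the integers 1..numel(unique(myArr)))
for x = 1:length(unique(myArr))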
% check if there are any repetitions that are separated by myDist or less
if any(diff(find(shuffledArr == x)) <= myDist)
reps = 1;
break;
end
end
end
This seems suboptimal to me for three reasons:
1) It may not be necessary to repeatedly shuffle until a solution has been found.
2) This while loop will go on forever if there is no possible solution (i.e. setting myDist to be too high to find a configuration that fits). Any ideas on how to catch this in advance?
3) There must be an easier way to determine the distance between repeating elements in an array than what I did by looping through each unique value.
I would be grateful for answers to points 2 and 3, even if point 1 is correct and it is possible to do this in a single shuffle.
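For point 3, one loop-free way to measure the spacing between repeated values (a sketch, and a different technique from the answers that follow) is to sort the values together with their positions using sortrows: equal values then sit next to each other, ordered by position, and the position differences between equal neighbours are exactly the distances between repeats. Unlike the loop in distShuffle, it does not assume the values are the integers 1..numel(unique(myArr)):

```matlab
arr = [4 6 5 1 6 7 4 6];                     % example data
myDist = 2;
S = sortrows([arr(:), (1:numel(arr)).']);    % sort by value, then by position
sameAsNext = diff(S(:, 1)) == 0;             % consecutive rows hold the same value
gaps = diff(S(:, 2));                        % position difference between those rows
minGap = min(gaps(sameAsNext));              % smallest distance between repeats
tooClose = any(gaps(sameAsNext) <= myDist);  % same test as the loop in distShuffle
```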
I think it is sufficient to check the following condition to prevent infinite loops:
[~,num, C] = mode(myArr); % num: count of the mode; C{1}: all values with that count
N = numel(C{1}); % number of modes
assert( (myDist<N) || (myDist-N+1) * (num-1) +N*num <= numel(myArr),...
'Shuffling impossible!');
Assume that myDist is 2 and we have the following data:
[4 6 5 1 6 7 4 6]
We find the mode, 6, which occurs 3 times. We arrange the 6s, separating them by myDist = 2 blanks:
6 _ _ 6 _ _ 6
There must be (3-1) * myDist = 4 other numbers to fill the blanks. We have five other numbers, so the array can be shuffled.
The problem becomes more complicated if we have multiple modes. For example, for the array [4 6 5 1 6 7 4 6 4] we have N = 2 modes: 6 and 4. They can be arranged as:
6 4 _ 6 4 _ 6 4
We have 2 blanks and three other numbers [5 1 7] that can be used to fill them. If, for example, we had only one other number [5], it would be impossible to fill the blanks and the array could not be shuffled.
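The feasibility test can be checked against the three situations just described: one mode, two modes with enough fillers, and two modes with only [5] as filler (the third array below is a hypothetical instance of that case). This sketch writes the number of blanks per gap as max(myDist-N+1, 0), which covers the case where no blanks are needed without a separate short-circuit; C{1} holds all values that share the highest count, so numel(C{1}) is the number of modes N:

```matlab
examples = {[4 6 5 1 6 7 4 6], [4 6 5 1 6 7 4 6 4], [4 6 6 4 6 4 5]};
myDist = 2;
feasible = false(1, numel(examples));
for k = 1:numel(examples)
    myArr = examples{k};
    [~, num, C] = mode(myArr);              % num: highest count
    N = numel(C{1});                        % how many values reach that count
    blanksPerGap = max(myDist - N + 1, 0);  % fillers needed between mode blocks
    needed = blanksPerGap * (num - 1) + N * num;
    feasible(k) = needed <= numel(myArr);
end
```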
For the third point, you can use a sparse matrix to speed up the computation (my initial testing in Octave suggests it is more efficient):
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
reps = any (diff(find(shuffledBin)) <= myDist);
end
shuffledArr = U(S);
end
Alternatively, you can use sub2ind and sort instead of a sparse matrix:
function shuffledArr = distShuffleSub2ind(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
f = sub2ind ( [numel(idx) + myDist, numel(U)] , (1:numel(idx)).', S(:) ); % subscripts must have the same size
reps = any (diff(sort(f)) <= myDist);
end
shuffledArr = U(S);
end
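A sketch of the padded-index trick from the sub2ind approach above, with two assumptions: randperm replaces the Psychtoolbox Shuffle, and the loop is capped at a hypothetical maxTries so it cannot spin forever when no valid arrangement exists. Each value gets its own column of n + myDist rows, so indices from different columns are always more than myDist apart, and only repeats of the same value can trigger the test:

```matlab
myArr = [1 1 2 2 3 3 4 5];           % example data
myDist = 1;                          % here: no equal neighbours allowed
[U, ~, idx] = unique(myArr);
idx = idx(:);                        % column vector in both MATLAB and Octave
n = numel(idx);
maxTries = 1000;                     % hypothetical cap on the number of reshuffles
ok = false;
for attempt = 1:maxTries
    S = idx(randperm(n));            % shuffled group labels
    lin = sub2ind([n + myDist, numel(U)], (1:n).', S);   % one padded column per value
    if ~any(diff(sort(lin)) <= myDist)
        ok = true;
        break
    end
end
shuffledArr = reshape(U(S), size(myArr));
```

If ok is still false after the loop, no valid arrangement was found within maxTries.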
If you just want to find one possible solution, you could use something like this:
x = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
n = numel(x);
dist = 3; %minimal distance
uni = unique(x); %get the unique value
his = histc(x,uni); %count the occurence of each element
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];
xr = []; %the vector that will contains the solution
%the for loop that will maximize the distance of each element
for ii = 1:n
s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;
s(1,3) = s(1,3)-dist;
s(1,2) = s(1,2)-1;
xr = [xr s(1,1)];
s = sortrows(s,[3,2],{'descend','descend'});
end
if any(s(:,2)~=0)
fprintf('failed, dist is too big\n')
end
Result:
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3]
Explanation:
I create a matrix s; at the beginning, s is:
s =
3 5 0
1 3 0
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
%col1 = unique element; col2 = remaining occurrences of each element; col3 = penalties
At each iteration of the for loop we take the first row of s, i.e. the element with the least negative penalty and, among those, the most remaining occurrences, since that element will be the hardest to place in the array.
Then after the first iteration s is equal to:
s =
1 3 0 %1 is the next element that will be placed in our array.
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
3 4 -3 %3 now has 5-1 = 4 remaining occurrences and a penalty of -3, so it won't be chosen for the next 3 iterations.
At the end, every entry in the second column should be 0; if not, the minimal distance was too big.

Extract pattern and subsequent n elements from array and count number of occurences

I have an array of doubles like this:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30]
I want to find the pattern [1 2 3 4] within the array and store it together with the 2 values that follow it, like:
A = [1 2 3 4 0 3]
B = [1 2 3 4 150 30]
I can find the pattern like this, but I don't know how to get the 2 values after it and store them together with the pattern:
indices = cellfun(@(c) strfind(c,pattern), C, 'UniformOutput', false);
And after finding A and B, how can I count the number of occurrences of each of these arrays within C?
Thanks!
Assuming you're fine with a cell array output, this works fine:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30 42 1 2 3 4 0 3]
p = [1 2 3 4]
n = 2
% full pattern length - 1
dn = numel(p) + n - 1
%// find indices
ind = strfind(C,p)
%// pre check if pattern at end of array
if ind(end)+ dn > numel(C), k = -1; else k = 0; end
%// extracting
temp = arrayfun(@(x) C(x:x+dn), ind(1:end+k) , 'uni', 0)
%// post processing
[out, ~, idx] = unique(vertcat(temp{:}),'rows','stable')
occ = accumarray(idx,1)
If the array C ends with at least n elements after the last occurrence of the pattern p, you can use the short form:
out = arrayfun(@(x) C(x:x+n+numel(p)-1), strfind(C,p) , 'uni', 0)
out =
1 2 3 4 0 3
1 2 3 4 150 30
occ =
2
1
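A strfind-free sketch of the same extraction (an alternative technique: a sliding-window comparison, similar to the buffer-matrix idea used elsewhere on this page). Windows that would run past the end of C are never generated, so no end-of-array check is needed. It drops 'stable', so the rows of out come back in sorted order, and it counts with accumarray:

```matlab
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30 42 1 2 3 4 0 3];
p = [1 2 3 4];
n = 2;
L = numel(p) + n;                                        % window: pattern plus n extra values
win = bsxfun(@plus, (1:numel(C) - L + 1).', 0:L - 1);    % indices of every full window
W = C(win);                                              % one candidate window per row
hit = all(bsxfun(@eq, W(:, 1:numel(p)), p), 2);          % rows that start with the pattern
rows = W(hit, :);
[out, ~, idx] = unique(rows, 'rows');                    % distinct windows, sorted
occ = accumarray(idx(:), 1);                             % occurrences of each window
```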
A simple loop-based solution:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30];
pattern = [1 2 3 4];
numberOfAddition = 2;
outputs = zeros(length(C),length(pattern)+ numberOfAddition); % preallocation
numberOfFoundPattern = 1;
lengthOfConsider = length(C) - length(pattern) - numberOfAddition + 1; % last valid start index
for i = 1:lengthOfConsider
if isequal(C(i:i+length(pattern)-1), pattern) % find pattern
outputs(numberOfFoundPattern,:) = C(i:i+length(pattern)+numberOfAddition-1);
numberOfFoundPattern = numberOfFoundPattern + 1;
end
end
outputs = outputs(1:numberOfFoundPattern - 1,:);

Find where condition is true n times consecutively

I have an array (say of 1s and 0s) and I want to find the index, i, for the first location where 1 appears n times in a row.
For example,
x = [0 0 1 0 1 1 1 0 0 0] ;
i = 5, for n = 3, as this is the first time '1' appears three times in a row.
Note: I want to find where 1 appears n times in a row so
i = find(x,n,'first');
is incorrect as this would give me the index of the first n 1s.
Is this essentially a string search, e.g. findstr but on a numeric vector?
You can do it with convolution as follows:
x = [0 0 1 0 1 1 1 0 0 0];
N = 3;
result = find(conv(x, ones(1,N), 'valid')==N, 1)
How it works
Convolve x with a vector of N ones and find the first time the result equals N. Convolution is computed with the 'valid' flag to avoid edge effects and thus obtain the correct value for the index.
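To make the convolution step concrete, here are the intermediate window sums for the example. Each entry is the sum of N consecutive elements, so it equals N exactly where a window is all ones; find with 'last' (or with a count larger than 1) then gives the last occurrence, or several occurrences, from the same vector:

```matlab
x = [0 0 1 0 1 1 1 0 0 0];
N = 3;
windowSums = conv(x, ones(1, N), 'valid');   % windowSums(k) = sum(x(k:k+N-1))
first = find(windowSums == N, 1);            % first window that is all ones
last = find(windowSums == N, 1, 'last');     % last such window
```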
Another approach is to generate a buffer matrix where each row is an overlapping neighbourhood of n elements of the array. Once you have this, index into your array and find the first row that is all 1s:
x = [0 0 1 0 1 1 1 0 0 0]; %// Example data
n = 3; %// How many times we look for duplication
%// Solution
ind = bsxfun(@plus, (1:numel(x)-n+1).', 0:n-1);
out = find(all(x(ind),2), 1);
The first line is a bit tricky. We use bsxfun to generate a matrix of size m x n where m is the total number of overlapping neighbourhoods while n is the size of the window you are searching for. This generates a matrix where the first row is enumerated from 1 to n, the second row is enumerated from 2 to n+1, up until the very end which is from numel(x)-n+1 to numel(x). Given n = 3, we have:
>> ind
ind =
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
These are indices which we will use to index into our array x, and for your example it generates the following buffer matrix when we directly index into x:
>> x = [0 0 1 0 1 1 1 0 0 0];
>> x(ind)
ans =
0 0 1
0 1 0
1 0 1
0 1 1
1 1 1
1 1 0
1 0 0
0 0 0
Each row is an overlapping neighbourhood of n elements. Finally, we search for the first row that is all 1s. This is done with all along the second dimension (the 2 passed as its second argument), so every row is checked independently: all returns true for a row only if every element in it is non-zero, which here means all 1s. We then use find with a second argument of 1 to get the first row that satisfies this constraint, and so:
>> out = find(all(x(ind), 2), 1)
out =
5
This tells us that the first run of n consecutive 1s starts at the fifth element of x.
Based on Rayryeng's approach, you can also use a loop. This will be slower for short arrays, but for very large arrays it does not evaluate every window; it stops as soon as the first match is found and can therefore be faster. You could even choose between the bsxfun version and the for loop with an if statement based on the array length. Note also that for loops have become considerably faster since the redesigned MATLAB execution engine (R2015b).
x = [0 0 1 0 1 1 1 0 0 0]; %// Example data
n = 3; %// How many times we look for duplication
for idx = 1:numel(x)-n+1 %// the last full window starts at numel(x)-n+1
if all(x(idx:idx+n-1))
break
end
end
Additionally, this can be used to find the first a occurrences:
x = [0 0 1 0 1 1 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 1 0 1 1 1 0 0 0]; %// Example data
n = 3; %// How many times we look for duplication
a = 2; %// number of desired matches
collect(1,a)=0; %// initialise output
kk = 1; %// initialise counter
for idx = 1:numel(x)-n+1
if all(x(idx:idx+n-1))
collect(kk) = idx;
if kk == a
break
end
kk = kk+1;
end
end
This does the same but stops after a matches have been found. Again, this approach is only useful if your array is large.
You asked in a comment whether you can also find the last occurrence: yes. Use the same trick as before, but run the loop backwards:
for idx = numel(x)-n+1:-1:1
if all(x(idx:idx+n-1))
break
end
end
One possibility with looping:
i = 0;
n = 3;
for idx = n : length(x)
idx_true = 1;
for sub_idx = (idx - n + 1) : idx
idx_true = idx_true & (x(sub_idx));
end
if(idx_true)
i = idx - n + 1;
break
end
end
if (i == 0)
disp('No index found.')
else
disp(i)
end
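A single-pass sketch (a different technique from the window checks above): keep a running count of consecutive ones, reset it at every zero, and stop as soon as it reaches n. Each element is visited once, and first stays empty when no such run exists:

```matlab
x = [0 0 1 0 1 1 1 0 0 0];
n = 3;
runLength = 0;        % length of the current run of ones
first = [];           % stays empty if there is no run of n ones
for k = 1:numel(x)
    if x(k)
        runLength = runLength + 1;
        if runLength == n
            first = k - n + 1;   % start index of the run
            break
        end
    else
        runLength = 0;           % a zero breaks the run
    end
end
```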
