Find location of a pattern of bits in a binary array in MATLAB

I have an array of binary data with long stretches of ones and zeros, and I want to find the indices where it changes.
a = [ 1 1 1 1 1 0 0 0 0 0 0 1 1]
I want to search for [1 0] and [0 1] to find the transition points. I'd like to avoid long loops to find these if possible. Any ideas?

Something like this should do the job:
b = diff(a); % (assuming 'a' is a vector)
oneFromZero = find(b == 1) + 1; % vector of indices of a '1' preceded by a '0'
zeroFromOne = find(b == -1) + 1; % vector of indices of a '0' preceded by a '1'
Depending on what you want exactly, you may or may not want to add 1 to the resulting arrays of indices.
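For the example vector from the question, this is what each expression returns. A minimal sketch; lastBeforeRise and lastBeforeFall are illustrative names, not part of the answer above:

```matlab
a = [1 1 1 1 1 0 0 0 0 0 0 1 1];
b = diff(a);                      % b(k) = a(k+1) - a(k)
oneFromZero = find(b == 1) + 1;   % 12: first '1' after the run of zeros
zeroFromOne = find(b == -1) + 1;  % 6: first '0' after the run of ones
% Without the +1 you get the last element before each change instead
lastBeforeRise = find(b == 1);    % 11: last '0' of the run of zeros
lastBeforeFall = find(b == -1);   % 5: last '1' of the run of ones
```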

I'd go with
d = a(1:end-1) - a(2:end);
ind = find(d);
Here, d will be 1 where you have a ... 1 0 ... in your bit string and it will be -1 where you have a ... 0 1 .... All the other elements in d will be 0, since, at those positions, the bits are equal to their neighbour.
With this in place, you can use find to get the indices where these two patterns occur. The whole procedure is of O(n) complexity, where n=length(a), since it requires two passes through a.
For a = [ 1 1 1 1 1 0 0 0 0 0 0 1 1] the above code computes ind = [5 11].
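find(d) on its own mixes both kinds of transition. Looking up d at those indices tells them apart; a short sketch using the example vector (fallIdx and riseIdx are hypothetical names):

```matlab
a = [1 1 1 1 1 0 0 0 0 0 0 1 1];
d = a(1:end-1) - a(2:end);   % +1 at "1 0", -1 at "0 1", 0 elsewhere
ind = find(d);               % [5 11]: last index before each transition
kind = d(ind);               % [1 -1]: +1 means 1->0, -1 means 0->1
fallIdx = ind(kind == 1);    % 5:  a "1 0" pair starts here
riseIdx = ind(kind == -1);   % 11: a "0 1" pair starts here
```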

To search for an arbitrary pattern of zeros and ones:
You can compute a convolution (conv) of the two sequences in bipolar (±1) form and then find where it reaches its maximum possible value, numel(pattern). Since convolution flips one of its inputs, the pattern has to be flipped beforehand to undo that:
a = [ 1 1 1 1 1 0 0 0 0 0 0 1 1];
pattern = [0 1 1];
result = find(conv(2*a-1, 2*pattern(end:-1:1)-1, 'valid')==numel(pattern));
In this example
result =
11
which means that [0 1 1] appears in a only once, namely at index 11.
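To see why the test is ==numel(pattern): in bipolar form every matching position contributes +1 and every mismatch contributes -1, so a window scores numel(pattern) only if all positions agree, and (numel(pattern) - score)/2 is the number of differing bits. A small sketch printing the intermediate scores for the example (score and mismatches are illustrative names):

```matlab
a = [1 1 1 1 1 0 0 0 0 0 0 1 1];
pattern = [0 1 1];
% Correlation score for each window a(k:k+numel(pattern)-1)
score = conv(2*a-1, 2*pattern(end:-1:1)-1, 'valid');
% score is [1 1 1 -1 -3 -1 -1 -1 -1 1 3]
mismatches = (numel(pattern) - score) / 2;   % bits that differ per window
result = find(mismatches == 0);              % 11, same as before
```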
A simpler approach is to use strfind, exploiting the undocumented fact that this function can be applied to numeric vectors:
result = strfind(a, pattern);

Related

How to see if an array is contained (in the same order) in another array in MATLAB?

I have an array A of 1s and 0s and want to see whether the larger bit array B contains those bits in that exact order.
Example: A= [0 1 1 0 0 0 0 1]
B= [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1]
would be true as A is contained in B
Most solutions I have found only determine whether a single value is contained in another matrix. That is no good here, since it is already certain that both matrices contain only 1s and 0s.
One (albeit unusual) option, since you're dealing with integer values, is to convert A and B to character arrays and use the contains function:
isWithin = contains(char(B), char(A));
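If you also need the positions rather than just a true/false answer, the same character-conversion idea works with strfind on ordinary text. Adding '0' maps the bits to the printable characters '0' and '1', which keeps this within strfind's documented use on char arrays (a sketch; asText is an illustrative helper name):

```matlab
A = [0 1 1 0 0 0 0 1];
B = [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1];
asText = @(bits) char(bits + '0');       % [0 1 1] -> '011'
locs = strfind(asText(B), asText(A));    % start index of every match: 4
isWithin = ~isempty(locs);               % true
```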
There are some obtuse vectorized ways to do this, but by far the easiest, and likely just as efficient, is to use a loop with a sliding window:
A = [0 1 1 0 0 0 0 1];
B = [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1];
vec = 0:(numel(A)-1);
for idx = 1:(numel(B)-numel(A)+1) % every window that fits inside B
    if all(A==B(idx+vec))
        fprintf('A is contained in B\n');
        break; % exit the loop as soon as 1 match is found
    end
end
Or, if you want to know the location(s) of potentially multiple matches in B:
A = [0 1 1 0 0 0 0 1];
B = [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1];
C = false(1,numel(B)-numel(A)+1); % one entry per window that fits inside B
vec = 0:(numel(A)-1);
for idx = 1:numel(C)
    C(idx) = all(A==B(idx+vec));
end
if any(C)
    fprintf('A is contained in B\n');
end
In this case
>> C
C =
1×8 logical array
0 0 0 1 0 0 0 0
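A loop-free variant of the same sliding window, assuming it's acceptable to build the full window-index matrix in memory: hankel produces the matrix whose row k is k:k+numel(A)-1, and bsxfun then compares every window with A at once. This is an alternative sketch, not part of the answer above:

```matlab
A = [0 1 1 0 0 0 0 1];
B = [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1];
nWin = numel(B) - numel(A) + 1;        % number of windows (8 here)
idx = hankel(1:nWin, nWin:numel(B));   % idx(k,j) = k + j - 1
C = all(bsxfun(@eq, B(idx), A), 2).';  % one logical per window
locs = find(C);                        % 4
```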
You can use the cross-correlation between two signals for this, as a measure of local similarity.
To get good results, you need to shift A and B (here by 0.5) so that they no longer contain the value 0; otherwise zeros contribute nothing to the products, and mismatching zeros are never penalized. Then compute the correlation between the two of them with conv (keeping in mind that convolution is cross-correlation with one signal flipped), and normalize by the energy of A so that a perfect match gives exactly the value 1:
conv(B-0.5, flip(A)-0.5, 'valid')/sum((A-0.5).^2)
In the normalization term, flipping is removed as it does not change the value.
It gives:
[0 -0.5 0.25 1 0 0 -0.25 0]
The 4th element is 1, so there is a perfect match starting at index 4.
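Because every entry of A-0.5 and B-0.5 is ±0.5, each window's normalized value works out to 1 - 2*m/numel(A), where m is the number of mismatching bits. A short sketch that recovers m and the exact-match positions from that relation (r, m and exactMatches are illustrative names):

```matlab
A = [0 1 1 0 0 0 0 1];
B = [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1];
r = conv(B-0.5, flip(A)-0.5, 'valid') / sum((A-0.5).^2);
m = round(numel(A) * (1 - r) / 2);   % mismatching bits per window
% m is [4 6 3 0 4 4 5 4]
exactMatches = find(m == 0);         % 4
```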

Find where condition is true n times consecutively

I have an array (say of 1s and 0s) and I want to find the index, i, for the first location where 1 appears n times in a row.
For example,
x = [0 0 1 0 1 1 1 0 0 0] ;
i = 5, for n = 3, as this is the first time '1' appears three times in a row.
Note: I want to find where 1 appears n times in a row so
i = find(x,n,'first');
is incorrect as this would give me the index of the first n 1s.
Is it essentially a string search, e.g. findstr but with a vector?
You can do it with convolution as follows:
x = [0 0 1 0 1 1 1 0 0 0];
N = 3;
result = find(conv(x, ones(1,N), 'valid')==N, 1)
How it works
Convolve x with a vector of N ones and find the first time the result equals N. Convolution is computed with the 'valid' flag to avoid edge effects and thus obtain the correct value for the index.
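To make the mechanism concrete, here are the window sums for the example. This assumes x holds only 0s and 1s, since otherwise a window could sum to N without being all ones; dropping the second argument of find returns every window of N ones instead of just the first:

```matlab
x = [0 0 1 0 1 1 1 0 0 0];
N = 3;
s = conv(x, ones(1,N), 'valid');   % s(k) = sum(x(k:k+N-1)) = [1 1 2 2 3 2 1 0]
firstRun = find(s == N, 1);        % 5
allRuns  = find(s == N);           % every window of N ones (here only 5)
```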
Another answer that I have is to generate a buffer matrix where each row of this matrix is a neighbourhood of overlapping n elements of the array. Once you create this, index into your array and find the first row that has all 1s:
x = [0 0 1 0 1 1 1 0 0 0]; %// Example data
n = 3; %// How many times we look for duplication
%// Solution
ind = bsxfun(@plus, (1:numel(x)-n+1).', 0:n-1);
out = find(all(x(ind),2), 1);
The first line is a bit tricky. We use bsxfun to generate a matrix of size m x n where m is the total number of overlapping neighbourhoods while n is the size of the window you are searching for. This generates a matrix where the first row is enumerated from 1 to n, the second row is enumerated from 2 to n+1, up until the very end which is from numel(x)-n+1 to numel(x). Given n = 3, we have:
>> ind
ind =
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
These are indices which we will use to index into our array x, and for your example it generates the following buffer matrix when we directly index into x:
>> x = [0 0 1 0 1 1 1 0 0 0];
>> x(ind)
ans =
0 0 1
0 1 0
1 0 1
0 1 1
1 1 1
1 1 0
1 0 0
0 0 0
Each row is an overlapping neighbourhood of n elements. We finally search for the first row that contains only 1s. This is done by calling all with 2 as the second parameter, so that it works on every row independently: all produces true if every element in a row is non-zero, i.e. 1 in our case. Combining this with find(..., 1) then gives the first row that satisfies this constraint:
>> out = find(all(x(ind), 2), 1)
out =
5
This tells us that the first run of n consecutive 1s starts at the fifth element of x.
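On MATLAB R2016b or later, implicit expansion builds the same index matrix without bsxfun. An equivalent sketch, assuming such a version:

```matlab
x = [0 0 1 0 1 1 1 0 0 0];
n = 3;
ind = (1:numel(x)-n+1).' + (0:n-1);   % same 8-by-3 matrix as the bsxfun call
out = find(all(x(ind), 2), 1);        % 5
```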
Based on Rayryeng's approach, you can loop this as well. This will definitely be slower for short arrays, but for very large arrays it doesn't compute every window; it stops as soon as the first match is found and will thus be faster. You could even use an if statement on the array length to choose between the bsxfun version and the for loop. Note also that for loops have become rather fast since the MATLAB execution engine was redesigned in R2015b.
x = [0 0 1 0 1 1 1 0 0 0]; %// Example data
n = 3; %// How many times we look for duplication
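out = []; %// stays empty if there is no run of n ones
for idx = 1:numel(x)-n+1
    if all(x(idx:idx+n-1))
        out = idx; %// start of the first run of n ones
        break
    end
end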
Additionally, this can be used to find the first a occurrences:
x = [0 0 1 0 1 1 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 1 0 1 1 1 0 0 0]; %// Example data
n = 3; %// How many times we look for duplication
a = 2; %// number of desired matches
collect = zeros(1,a); %// initialise output (entries stay 0 if there are fewer than a matches)
kk = 1; %// initialise counter
for idx = 1:numel(x)-n+1
    if all(x(idx:idx+n-1))
        collect(kk) = idx;
        if kk == a
            break
        end
        kk = kk+1;
    end
end
This does the same, but stops after a matches have been found. Again, this approach only pays off if your array is large.
Since you asked in a comment whether you can find the last occurrence: yes. Use the same trick, just run the loop backwards:
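out = []; %// stays empty if there is no run of n ones
for idx = numel(x)-n+1:-1:1
    if all(x(idx:idx+n-1))
        out = idx; %// start of the last window of n ones
        break
    end
end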
One possibility with looping:
i = 0;
n = 3;
for idx = n : length(x)
    idx_true = 1;
    for sub_idx = (idx - n + 1) : idx
        idx_true = idx_true & (x(sub_idx));
    end
    if (idx_true)
        i = idx - n + 1;
        break
    end
end
if (i == 0)
    disp('No index found.')
else
    disp(i)
end

Find number of consecutive ones in binary array

I want to find the lengths of all series of ones and zeros in a logical array in MATLAB. This is what I did:
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
%// Find series of ones:
csA = cumsum(A);
csOnes = csA(diff([A 0]) == -1);
seriesOnes = [csOnes(1) diff(csOnes)];
%// Find series of zeros (same way, using ~A)
csNegA = cumsum(~A);
csZeros = csNegA(diff([~A 0]) == -1);
seriesZeros = [csZeros(1) diff(csZeros)];
This works, and gives seriesOnes = [4 2 5] and seriesZeros = [3 1 6]. However it is rather ugly in my opinion.
I want to know if there is a better way to do this. Performance is not an issue as this is inexpensive (A is no longer than a few thousand elements). I am looking for code clarity and elegance.
If nothing better can be done, I'll just put this in a little helper function so I don't have to look at it.
You could use existing run-length-encoding code, which does the (ugly) work for you, and then filter out your vectors yourself. This way your helper function is rather general, and its functionality is evident from its name, runLengthEncode.
Reusing code from this answer:
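function [lengths, values] = runLengthEncode(data)
% Run-length encoding of a row vector: value values(k) repeats lengths(k) times
startPos = find(diff([data(1)-1, data]));   % run starts (the data(1)-1 prefix forces index 1)
lengths = diff([startPos, numel(data)+1]);  % distance to the next run start
values = data(startPos);                    % value of each run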
You would then filter out your vectors using:
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
[lengths, values] = runLengthEncode(A);
seriesOnes = lengths(values==1);
seriesZeros = lengths(values==0);
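For the example A, the intermediate outputs look like this. The sketch inlines the three lines of runLengthEncode so that it runs as a plain script:

```matlab
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
startPos = find(diff([A(1)-1, A]));       % [1 4 8 9 11 17]: start of each run
lengths = diff([startPos, numel(A)+1]);   % [3 4 1 2 6 5]
values = A(startPos);                     % [0 1 0 1 0 1] (logical)
seriesOnes = lengths(values == 1);        % [4 2 5]
seriesZeros = lengths(values == 0);       % [3 1 6]
```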
You can try this:
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
B = [~A(1) A ~A(end)]; %// Add edges at start/end
edges_indexes = find(diff(B)); %// find edges
lengths = diff(edges_indexes); %// length between edges
%// Separate into a cell array: s{1} holds the lengths of the runs of zeros, s{2} those of the runs of ones
s(1+A(1)) = {lengths(1:2:end)};
s(1+~A(1)) = {lengths(2:2:end)};
This strfind-based approach (strfind works with numeric arrays as well as with strings) could be easier to follow -
%// Find start and stop indices for ones and zeros with strfind by using
%// "opposite (0 for 1 and 1 for 0) sentinels"
start_ones = strfind([0 A],[0 1]) %// 0 is the sentinel here, and so on
start_zeros = strfind([1 A],[1 0])
stop_ones = strfind([A 0],[1 0])
stop_zeros = strfind([A 1],[0 1])
%// Get lengths of islands of ones and zeros using those start-stop indices
length_ones = stop_ones - start_ones + 1
length_zeros = stop_zeros - start_zeros + 1
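Since strfind's support for numeric input is undocumented, here is a sketch of the same sentinel idea with diff, which yields identical start and stop indices for this example:

```matlab
A = logical([0 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1]);
start_ones  = find(diff([0 A]) == 1);    % 0 -> 1 steps: [4 9 17]
start_zeros = find(diff([1 A]) == -1);   % 1 -> 0 steps: [1 8 11]
stop_ones   = find(diff([A 0]) == -1);   % 1 -> 0 steps: [7 10 21]
stop_zeros  = find(diff([A 1]) == 1);    % 0 -> 1 steps: [3 8 16]
length_ones  = stop_ones - start_ones + 1;     % [4 2 5]
length_zeros = stop_zeros - start_zeros + 1;   % [3 1 6]
```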

Find specific zeros grouped in an array in MATLAB

I have an array like this one
a=[0 0 0 1 1 1 1 0 0];
and I would like an elegant way to find the last occurrence of 0 in the first group of zeros and the first occurrence of 0 in the last group of zeros. Note that there are always ones between the two groups of zeros. The answer should look like this:
b=[0 0 1 0 0 0 0 1 0];
A strfind-based approach, which works well for finding patterns like these in numeric arrays, seems a good fit here. Here's the implementation -
%// Find indices where we have matches of [0 1] and [1 0] corresponding to
%// the two cases as listed in the question
case1_idx = strfind(a,[0 1])
case2_idx = strfind(a,[1 0])
%// Initialize output array; set those required positions in it as ones
b = zeros(size(a))
b([case1_idx(1) case2_idx(end)+1]) = 1
Sample run -
a =
0 0 0 1 1 1 1 0 0
b =
0 0 1 0 0 0 0 1 0
How about this descriptive solution:
afterA = [a(2:end),nan]
beforeA = [nan,a(1:end-1)]
b = (a==0 & afterA==1) | (a==0 & beforeA==1)
d = diff(a)
res = zeros(size(a))
res(find(d==1)) = 1
res(find(d==-1)+1) = 1
Or, assuming that a always starts and ends with a 0, you don't even need to search the entire array:
res = zeros(size(a))
res(find(a, 1, 'first')-1) = 1
res(find(a, 1, 'last')+1) = 1
With convolution:
b = zeros(size(a)); %// initialize
x = conv(2*a-1,[1 -1],'same'); %// convolution
b(find(x==2)) = 1; %// last zero in a run
b(find(x==-2)+1) = 1; %// first zero in a run
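To see what the convolution computes here: with the kernel [1 -1] and the 'same' option, x(k) equals (2*a(k+1)-1) - (2*a(k)-1), so it is +2 at a 0-to-1 step, -2 at a 1-to-0 step and 0 inside a run. The last entry only sees the implicit zero padding and is always ±1, so it can never be mistaken for a step. For the example a:

```matlab
a = [0 0 0 1 1 1 1 0 0];
x = conv(2*a-1, [1 -1], 'same');   % [0 0 2 0 0 0 -2 0 1]
b = zeros(size(a));
b(x == 2) = 1;                     % last zero before the ones (index 3)
b(find(x == -2) + 1) = 1;          % first zero after the ones (index 8)
```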
Or you could use the same approach with diff instead of conv:
b = zeros(size(a)); %// initialize
c = diff(a); %// compute differences
b(find(c==1)) = 1; %// last zero in a run
b(find(c==-1)+1) = 1; %// first zero in a run

Efficiently picking surrounded elements

If I have the sequence 1 0 0 0 1 0 1 0 1 1 1,
how can I efficiently locate the zeros that have a 1 on both sides?
In this sequence, these are the zeros at positions 6 and 8:
1 0 0 0 1 0 1 0 1 1 1
I can imagine an algorithm that loops through the array and looks one element back and one element ahead. I guess that is O(n), so there is probably no smoother way.
If you can find another way, I am interested.
Use strfind:
pos = strfind(X(:)', [1 0 1]) + 1
Note that this will work only when X is a vector.
Example
X = [1 0 0 0 1 0 1 0 1 1 1 ];
pos = strfind(X(:)', [1 0 1]) + 1
The result:
pos =
6 8
The strfind method that @EitanT suggested is quite nice. Another way to do this is to use find and element-wise logical operations:
% let A be a logical ROW array
B = ~A & [A(2:end),false] & [false,A(1:end-1)];
elements = find(B);
This assumes, based on your example, that you want to exclude boundary elements. The concatenations [A(2:end),false] and [false,A(1:end-1)] are required to keep the array length the same. If memory is a concern, these can be eliminated:
% NB: this will work for both ROW and COLUMN vectors
B = ~A(2:end-1) & A(3:end) & A(1:end-2);
elements = 1 + find(B); % need the 1+ because we cut off the first element above
...and to elaborate on @EitanT's answer, you can use strfind on a matrix if you loop over its rows:
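% let x = some matrix of 1's and 0's (any size)
[m, n] = size(x);
pos = cell(m, 1); % rows can contain different numbers of matches
for r = 1:m
    pos{r} = strfind(x(r,:), [1 0 1]) + 1; % x(r,:) is already a row vector
end
pos is an m-by-1 cell array: pos{r} holds the positions found in row r and is empty if that row has no zero between two ones. A cell array is used because different rows can contain different numbers of matches, which could not be stacked into a numeric matrix.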
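The shifted-comparison idea from the find-based answer above also extends to a whole matrix without any loop, by shifting columns instead of elements. A sketch on a small made-up matrix (x here is illustrative data):

```matlab
x = [1 0 1 0 1;
     0 0 1 0 0;
     1 0 0 1 1];
% Interior columns that are 0 with a 1 on both sides, row by row
B = ~x(:,2:end-1) & x(:,1:end-2) & x(:,3:end);
[rows, cols] = find(B);
cols = cols + 1;   % shift back: column j of B is column j+1 of x
% rows = [1; 1], cols = [2; 4]
```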
