Any efficient way to identify a set of 1s in a big array?

Any efficient way to identify a set of 1s in a big array? - arrays

I have an array called link_slots of 800 elements made of 1, 0 and -1.
E.g. [1 1 1 -1 0 0 0 0 1 1 -1 0 0 1 1 1 1 -1 0 0 ...]. So, 1 denotes occupied, 0 for unoccupied and -1 just to mark the end of a set of 1s.
I want to know the start and end indices of each set of 1s. E.g.,here, start as [1,9,14] and end as [3,10,17]. My code works but found out through Profiler that it takes a lot of time. Is there any efficient way to solve this? Considering that I have to do this thing for multiple arrays for size 800 elements.
i=1;
while(i<numel(link_slots(1,:)) ) %to cycle through whole array
j=i
if(link_slots(1,i)==1) %i.e. if occupied
startt(i)=i %store it in the start array
j=i
while(link_slots(index,j+1)~=-1)
j=j+1
end
endd(i)=j %store the end index upon encountering -1
end
i=j+1
end

data= [1 1 1 -1 0 0 0 0 1 1 -1 0 0 1 1 1 1 -1 0 0]';
the end indices are very easy to find :
I=find(data==-1);
end_indices=I-1;
to find the start indices you want the indices of '1' s which previous value is zero or '-1'
so for example:
temp=[0;data]; % i added a zero to the start of data to use diff function
I=find(diff(temp)>0 & data==1) % here diff function calculates difference between subsequent values of array. so in case of your question if we had ..0 1..diff is 1 and ...

Related

2D array grouping 1's in C

2D array of 1s and 0s. How to label every group of 1s with a unique number?
I’m stuck on this problem for a while now. 1s can be grouped vertically, horizontally and diagonally. How can you go about solving this? For example,
0 0 1 1 0
0 1 1 0 0
0 0 0 0 1
0 0 0 1 0
Should be transformed to
0 0 x x 0
0 x x 0 0
0 0 0 0 y
0 0 0 y 0
x, y can be any unique numbers.
Appreciate it.
Here is what I have so far for iterative: https://i.imgur.com/oCmYC02.png
But the result is a bit off because it only checks for immediate adjacent 1's: https://i.imgur.com/DAtTBmM.png
Anyone have any idea how to fix this?

I'd do it like this:
Scan 2D array sequentially, row by row, column by column
If 1 found, use variation of the flood fill algorithm, which moves in 8 directions instead of 4, from that starting point (see normal 4-direction algorithm at https://en.wikipedia.org/wiki/Flood_fill), since you have diagonal example with "y", each time using new filler number.
Repeat 1 and 2 until no more ones left.

Finding array elements close to another array in space?

I basically want to use the function ismember, but for a range. For example, I want to know what data points in array1 are within n distance to array2, for each element in array2.
I have the following:
array1 = [1,2,3,4,5]
array2 = [2,2,3,10,20,40,50]
I want to know what values in array2 are <= 2 away from array1:
indices(1,:) (where array1(1) = 1) = [1 1 1 0 0 0 0]
indices(2,:) (where array1(2) = 2) = [1 1 1 0 0 0 0]
indices(3,:) (where array1(3) = 3) = [1 1 1 0 0 0 0]
indices(4,:) (where array1(4) = 4) = [1 1 1 0 0 0 0]
indices(5,:) (where array1(5) = 5) = [0 0 1 0 0 0 0]
Drawbacks:
My array1 is 496736 elements, my array2 is 9268 elements, so aI would rather not use a loop.

Looping is a valid option here. Intialise an array output to the size of array1 X array2, then loop over all elements in array1 an subtract array2 from that, then check whether the absolute value is less than or equal to 2:
array1 = [1,2,3,4,5];
array2 = [2,2,3,10,20,40,50];
output = zeros(numel(array1), numel(array2),'logical');
for ii = 1:numel(array1)
output(ii,:) = abs(array1(ii)-array2)<=2;
end
output =
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
0 0 1 0 0 0 0
i.e. loops are not the problem.
Thanks to Rahnema1's suggestion, you can initialise output directly as a logical matrix:
output = zeros(numel(array1),numel(array2),'logical');
whose size is just 4.3GB.
On timings: Hans' code runs in a matter of seconds for array1 = 5*rand(496736,1); array2 = 25*rand(9286,1);, the looped solution takes about 15 times longer. Both solutions are equal to one another. obcahrdon's ismembertol solution is somewhere in between on my machine.
On RAM usage:
Both implicit expansion, as per Hans' answer, as well as the loop suggested in mine work with just 4.3GB RAM on your expanded problem size (496736*9286)
pdist2 as per Luis' answer and bsxfun as per Hans' on the other hand try to create an intermediate double matrix of 34GB (which doesn't even fit in my RAM, so I cannot compare timings).
obchardon's ismembertol solution outputs a different form of the solution, and takes ~5.04GB (highly dependent on the amount of matches found, the more, the larger this number will be).
In general this leads me to the conclusion that implicit expansion should be your option of choice, but if you have R2016a or earlier, ismembertol or a loop is the way to go.

Using implicit expansion, introduced in MATLAB R2016b, you can simply write:
abs(array1.' - array2) <= 2
ans =
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
0 0 1 0 0 0 0
For earlier MATLAB versions, you can get this using the bsxfun function:
abs(bsxfun(#minus, array1.', array2)) <= 2
ans =
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
0 0 1 0 0 0 0
Hope that helps!
P.S. On the "MATLAB is slow for loops" myth, please have a look at that blog post for example.
EDIT: Please read Adriaan's answer on the RAM consumption using this and/or his approach!

If you have the Statistics Toolbox, you can use pdist2 to compute the distance matrix:
result = pdist2(array1(:), array2(:)) <= 2;
As noted by #Adriaan, this is not efficient for large input sizes. In that case a better approach is a loop with preallocation of the logical matrix output, as in his answer.

You can also use ismembertol with some specific option:
A = [1,2,3,4,5];
B = [2,2,3,10,20,40,5000];
tol = 2;
[~,ind] = ismembertol([A-tol;A+tol].',[B-tol;B+tol].',tol, 'ByRows', true, ...
'OutputAllIndices', true, 'DataScale', [1,Inf])
It will create a 5x1 cell array containing the corresponding linear indice
ind{1} = [1,2,3]
ind{2} = [1,2,3]
...
ind{5} = [3]
In this case using linear indices instead of logical indices will greatly reduce the memory consumption.

How to see if an array is contained (in the same order) of another array in matlab?

I have an array A of 1s and 0s and want to see if the larger array of bits B contains those bits in that exact order?
Example: A= [0 1 1 0 0 0 0 1]
B= [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1]
would be true as A is contained in B
Most solutions I have found only determine if a value IS contained in another matrix, this is no good here as it is already certain that both matrices will be 1s and 0s
Thanks

One (albeit unusual) option, since you're dealing with integer values, is to convert A and B to character arrays and use the contains function:
isWithin = contains(char(B), char(A));

There are some obtuse vectorized ways to to do this, but by far the easiest, and likely just as efficient, is to use a loop with a sliding window,
A = [0 1 1 0 0 0 0 1];
B = [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1];
vec = 0:(numel(A)-1);
for idx = 1:(numel(B)-numel(A)-1)
if all(A==B(idx+vec))
fprintf('A is contained in B\n');
break; % exit the loop as soon as 1 match is found
end
end
Or if you want to know the location(s) in B (of potentially multiple matches) then,
A = [0 1 1 0 0 0 0 1];
B = [0 1 0 0 1 1 0 0 0 0 1 0 1 0 1];
C = false(1,numel(B)-numel(A)-1);
vec = 0:(numel(A)-1);
for idx = 1:numel(C)
C(idx) = all(A==B(idx+vec));
end
if any(C)
fprintf('A is contained in B\n');
end
In this case
>> C
C =
1×6 logical array
0 0 0 1 0 0

You can use the cross-correlation between two signals for this, as a measure of local similarity.
For achieving good results, you need to shift A and B so that you don't have the value 0 any more. Then compute the correlation between the two of them with conv (keeping in mind that the convolution is the cross-correlation with one signal flipped), and normalize with the energy of A so that you get a perfect match whenever you get the value 1:
conv(B-0.5, flip(A)-0.5, 'valid')/sum((A-0.5).^2)
In the normalization term, flipping is removed as it does not change the value.
It gives:
[0 -0.5 0.25 1 0 0 -0.25 0]
4th element is 1, so starting from index equal to 4 you get a perfect match.

Find indices of blocks of 0s that are continuous [duplicate]

This question already has answers here:
Finding islands of zeros in a sequence
(6 answers)
Closed 6 years ago.
I have a vector and I want to find the indices of blocks of 0s that are continuous for at least 3 times.
y = [1 1 1 0 1 1 0 0 0 1 1 1 0 1 0 1 0 0 1 0 0 0 0 1 1];
So in this case, the blocks should be [0 0 0] from 7-9 and [0 0 0 0] from 20-23. The output should give me the indices, something like [7, 9] and [20,23], or even better, change these blocks of 0s to a single NAN to become:
[1 1 1 0 1 1 NAN 1 1 1 0 1 0 1 0 0 1 NAN 1 1]
Thanks!

What you can do is:
Pad the vector with 1 on each side.
Use find and diff to find where the vector changes from 1 to 0 (diff = -1)
Use find and diff to find where the vector changes from 0 to 1 (diff = 1)
Find the duration of each interval by subtracting the values in 3 by the values in 2 (and add 1)
Create a logical vector with true where the duration is >= 3, and use that vector to find the start indices (from the values found in point 2).
Set the value of each of the start indices to NaN
Set the value of start indices + 1 : end indices to [].
And you're set to go!
It actually took a lot more time writing the explanation than it took to write the code. It's quite a nice exercise to learn some basic MATLAB so I'll leave it to you. Good luck!

MATLAB removing rows which has duplicates in sequence

I'm trying to remove the rows which has duplicates in sequence. I have only 2 possible values which are 0 and 1. I have nXm which n shows possible number of bits and m is not important for my question. My goal is to find an matrix which is nX(m-a). The rows a which has the property which includes duplicates in sequence. For example:
My matrix is :
A=[0 1 0 1 0 1;
0 0 0 1 1 1;
0 0 1 0 0 1;
0 1 0 0 1 0;
1 0 0 0 1 0]
I want to remove the rows has t duplicates in sequence for 0. In this question let's assume t is 3. So I want the matrix which:
B=[0 1 0 1 0 1;
0 0 1 0 0 1;
0 1 0 0 1 0]
2nd and 5th rows are removed.
I probably need to use diff.

So you want to remove rows of A that contain at least t zeros in sequence.
How about a single line?
B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==-t, 2),:);
How this works:
Transform A to bipolar form (2*A-1)
Convolve each row with a sequence of t ones (conv2(...))
Keep only rows for which the convolution does not contain -t (~any(...)). The presence of -t indicates a sequence of t zeros in the corresponding row of A.
To remove rows that contain at least t ones, just change -t to t:
B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==t, 2),:);

Here is a generalized approach which removes any rows which has given number of consecutive duplicates (not just zero. could be any number).
t = 3;
row_mask = ~any(all(~diff(reshape(im2col(A,[1 t],'sliding'),t,size(A,1),[]))),3);
out = A(row_mask,:)
Sample Run:
>> A
A =
0 1 0 1 0 1
0 0 1 5 5 5 %// consecutive 3 5's
0 0 1 0 0 1
0 1 0 0 1 0
1 1 1 0 0 1 %// consecutive 3 1's
>> out
out =
0 1 0 1 0 1
0 0 1 0 0 1
0 1 0 0 1 0

How about an approach using strings? This is certainly not as fast as Luis Mendo's method where you work directly with the numerical array, but it's thinking a bit outside of the box. The basis of this approach is that I consider each row of A to be a unique string, and I can search each string for occurrences of a string of 0s by regular expressions.
A=[0 1 0 1 0 1;
0 0 0 1 1 1;
0 0 1 0 0 1;
0 1 0 0 1 0;
1 0 0 0 1 0];
t = 3;
B = sprintfc('%s', char('0' + A));
ind = cellfun('isempty', regexp(B, repmat('0', [1 t])));
B(~ind) = [];
B = double(char(B) - '0');
We get:
B =
0 1 0 1 0 1
0 0 1 0 0 1
0 1 0 0 1 0
Explanation
Line 1: Convert each line of the matrix A into a string consisting of 0s and 1s. Each line becomes a cell in a cell array. This uses the undocumented function sprintfc to facilitate this cell array conversion.
Line 2: I use regular expressions to find any occurrences of a string of 0s that is t long. I first use repmat to create a search string that is full of 0s and is t long. After, I determine if each line in this cell array contains this sequence of characters (i.e. 000....). The function regexp helps us perform regular expressions and returns the locations of any matches for each cell in the cell array. Alternatively, you can use the function strfind for more recent versions of MATLAB to speed up the computation, but I chose regexp so that the solution is compatible with most MATLAB distributions out there.
Continuing on, the output of regexp/strfind is a cell array of elements where each cell reports the locations of where we found the particular string. If we have a match, there should be at least one location that is reported at the output, so I check to see if any matches are empty, meaning that these are the rows we don't want to remove. I want to turn this into a logical array for the purposes of removing rows from A, and so this is wrapped with a cellfun call to determine the cells that are empty. Therefore, this line returns a logical array where a 0 means that remove this row and a 1 means that we don't.
Line 3: I take the logical array from Line 2 and invert it because that's what we really want. We use this inverted array to index into the cell array and remove those strings.
Line 4: The output is still a cell array, so I convert it back into a character array, and finally back into a numerical array.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Any efficient way to identify a set of 1s in a big array? - arrays

Related

2D array grouping 1's in C

Finding array elements close to another array in space?

How to see if an array is contained (in the same order) of another array in matlab?

Find indices of blocks of 0s that are continuous [duplicate]

MATLAB removing rows which has duplicates in sequence

Categories

Resources