Detect exact blocks in a matrix (two-dimensional array)

I'm looking for an efficient algorithm to identify a block structure in a matrix with many 0 entries.
For example, the 6×7 matrix
0.0975 0.9575 0 0 0 0 0
0.2785 0.9649 0 0 0 0 0
0.5469 0.1576 0 0 0 0 0
0 0 0.9706 0.9572 0 0 0
0 0 0 0 0.8235 0.3171 0.0344
0 0 0 0 0.6948 0.9502 0.4387
consists of three blocks of sizes 3×2, 1×2, and 2×3, respectively.
A block is defined by a set of rows and a set of columns. A block structure is characterized by the fact that all entries that do not belong to a block are exactly 0. However, there may also be exact-0 entries within the blocks.
A trivial solution is to always declare the whole matrix a block; therefore, a solution is sought such that the number of within-block entries is as small as possible.
To make things harder (or maybe easier?), the blocks do not have to be contiguous. A permuted version of the above matrix,
0 0.9572 0 0 0 0 0.9706
0 0 0.0975 0 0 0.9575 0
0.4387 0 0 0.9502 0.6948 0 0
0.0344 0 0 0.3171 0.8235 0 0
0 0 0.2785 0 0 0.9649 0
0 0 0.5469 0 0 0.1576 0
therefore also has a three-block structure, which can be described as:
a block containing rows 3, 4 and columns 1, 4, 5,
a block containing row 1 and columns 2, 7,
a block containing rows 2, 5, 6 and columns 3, 6.
Solutions I have thought of are:
Use a connection-weight-based clustering algorithm. However, the matrix does not have to be symmetric or even square, and there is no correspondence between a specific row and a specific column.
Initially define a block to consist of one (non-0) entry (described by its row and its column), look for non-0 entries in its row and in its column, and add the respective columns and rows; keep growing like that iteratively until no rows or columns are added, which identifies one block. Do the same starting from a non-0 entry that is not contained in the block. Repeat until no unassigned non-0 entries are left. However, I doubt that this algorithm scales efficiently to a large matrix with many blocks.
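The growing procedure in the second idea is a breadth-first search over a bipartite graph. The nodes are the rows and the columns, and row i is connected to column j whenever entry (i, j) is non-zero. Each exact block is one connected component of that graph. Every row and column connected through non-zeros must end up in the same block, so this decomposition is the finest exact block structure, which also minimizes the number of within-block entries. Each row and column is visited once and each non-zero entry is looked at a bounded number of times, so the search is linear in rows + columns + non-zeros. Reading a dense matrix to build the adjacency lists still costs one pass over all entries.

Below is a minimal Python sketch. It assumes the matrix is a list of lists and uses 0-based indices. The function name exact_blocks is hypothetical.

```python
from collections import deque


def exact_blocks(M):
    """Return one (rows, cols) pair per exact block of M, 0-based.

    Rows and columns are nodes of a bipartite graph; M[i][j] != 0 is an edge
    between row i and column j. Each block is one connected component.
    All-zero rows and columns belong to no block.
    """
    n_rows, n_cols = len(M), len(M[0])
    row_to_cols = [[j for j in range(n_cols) if M[i][j] != 0] for i in range(n_rows)]
    col_to_rows = [[i for i in range(n_rows) if M[i][j] != 0] for j in range(n_cols)]
    row_seen = [False] * n_rows
    col_seen = [False] * n_cols
    blocks = []
    for start in range(n_rows):
        if row_seen[start] or not row_to_cols[start]:
            continue  # row already in a block, or an all-zero row
        rows, cols = [], []
        row_seen[start] = True
        queue = deque([("row", start)])
        while queue:  # grow the block until no rows or columns are added
            kind, idx = queue.popleft()
            if kind == "row":
                rows.append(idx)
                for j in row_to_cols[idx]:
                    if not col_seen[j]:
                        col_seen[j] = True
                        queue.append(("col", j))
            else:
                cols.append(idx)
                for i in col_to_rows[idx]:
                    if not row_seen[i]:
                        row_seen[i] = True
                        queue.append(("row", i))
        blocks.append((sorted(rows), sorted(cols)))
    return blocks


# The permuted matrix from the question
P = [
    [0, 0.9572, 0, 0, 0, 0, 0.9706],
    [0, 0, 0.0975, 0, 0, 0.9575, 0],
    [0.4387, 0, 0, 0.9502, 0.6948, 0, 0],
    [0.0344, 0, 0, 0.3171, 0.8235, 0, 0],
    [0, 0, 0.2785, 0, 0, 0.9649, 0],
    [0, 0, 0.5469, 0, 0, 0.1576, 0],
]
print(exact_blocks(P))
# [([0], [1, 6]), ([1, 4, 5], [2, 5]), ([2, 3], [0, 3, 4])]
```

Shifted to 1-based indices, this is the three-block structure listed in the question: row 1 with columns 2 and 7, rows 2, 5, 6 with columns 3 and 6, and rows 3, 4 with columns 1, 4, 5.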
I'm looking for an algorithm, or other ideas for an algorithm, not for an implementation. However, an implementation e.g. in Matlab or Python would be welcome.

This is a standard scenario in gene expression analysis.
The algorithms for this are known as biclustering (because they cluster rows and columns at the same time). An early method is due to Cheng and Church.

Related

2D array grouping 1's in C

2D array of 1s and 0s. How to label every group of 1s with a unique number?
I've been stuck on this problem for a while now. 1s can be grouped vertically, horizontally and diagonally. How would you go about solving this? For example,
0 0 1 1 0
0 1 1 0 0
0 0 0 0 1
0 0 0 1 0
Should be transformed to
0 0 x x 0
0 x x 0 0
0 0 0 0 y
0 0 0 y 0
x, y can be any unique numbers.
Appreciate it.
Here is what I have so far for the iterative version: https://i.imgur.com/oCmYC02.png
But the result is a bit off because it only checks for immediately adjacent 1s: https://i.imgur.com/DAtTBmM.png
Anyone have any idea how to fix this?
I'd do it like this:
1. Scan the 2D array sequentially, row by row and column by column.
2. When you find a 1 that has not been labeled yet, run a variation of the flood fill algorithm from that starting point that moves in 8 directions instead of 4 (see the normal 4-direction algorithm at https://en.wikipedia.org/wiki/Flood_fill), since your "y" example is connected diagonally. Use a new filler number each time.
3. Repeat steps 1 and 2 until no more unlabeled 1s are left.
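The three steps can be sketched in Python. The question is about C, but the logic carries over directly. This version uses an explicit stack instead of recursion, so a large group cannot overflow the call stack. It writes labels 1, 2, ... into a separate array. The function name label_groups is hypothetical.

```python
def label_groups(grid):
    """Label 8-connected groups of 1s with 1, 2, ...; background stays 0."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):              # step 1: scan row by row, column by column
        for c in range(cols):
            if grid[r][c] != 1 or labels[r][c]:
                continue               # background, or already part of a group
            next_label += 1            # new filler number for a new group
            labels[r][c] = next_label
            stack = [(r, c)]           # step 2: iterative 8-direction flood fill
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
    return labels


example = [[0, 0, 1, 1, 0],
           [0, 1, 1, 0, 0],
           [0, 0, 0, 0, 1],
           [0, 0, 0, 1, 0]]
print(label_groups(example))
# [[0, 0, 1, 1, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 2], [0, 0, 0, 2, 0]]
```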

MATLAB: removing rows which have duplicates in sequence

I'm trying to remove the rows which have duplicates in sequence. I have only 2 possible values, 0 and 1. Each row of my matrix is a sequence of n bits, and the number of rows, m, is not important for my question. My goal is to obtain the matrix with m-a rows that remains after removing the a rows which contain duplicates in sequence. For example:
My matrix is:
A=[0 1 0 1 0 1;
0 0 0 1 1 1;
0 0 1 0 0 1;
0 1 0 0 1 0;
1 0 0 0 1 0]
I want to remove the rows that have t consecutive 0s. For this question, let's assume t is 3. So I want the matrix:
B=[0 1 0 1 0 1;
0 0 1 0 0 1;
0 1 0 0 1 0]
The 2nd and 5th rows are removed.
I probably need to use diff.
So you want to remove rows of A that contain at least t zeros in sequence.
How about a single line?
B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==-t, 2),:);
How this works:
Transform A to bipolar form (2*A-1)
Convolve each row with a sequence of t ones (conv2(...))
Keep only rows for which the convolution does not contain -t (~any(...)). The presence of -t indicates a sequence of t zeros in the corresponding row of A.
To remove rows that contain at least t ones, just change -t to t:
B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==t, 2),:);
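For readers without MATLAB, the bipolar-convolution idea can be mirrored in plain Python. After mapping 0 to -1 and 1 to +1, a window of t entries sums to -t exactly when all t entries were 0. The 'valid' convolution with ones(1,t) is simply the list of those window sums. This sketch assumes A contains only 0s and 1s. The function name is hypothetical.

```python
def drop_rows_with_zero_run(A, t):
    """Keep the rows of A (0/1 entries) that have no run of t consecutive zeros."""
    kept = []
    for row in A:
        bipolar = [2 * v - 1 for v in row]  # 0 -> -1, 1 -> +1, like 2*A-1
        # sums over every length-t window, like conv2(1, ones(1,t), ..., 'valid')
        window_sums = [sum(bipolar[k:k + t]) for k in range(len(row) - t + 1)]
        if -t not in window_sums:  # a sum of -t means t zeros in sequence
            kept.append(row)
    return kept


A = [[0, 1, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 1, 0, 0, 1],
     [0, 1, 0, 0, 1, 0],
     [1, 0, 0, 0, 1, 0]]
print(drop_rows_with_zero_run(A, 3))
# [[0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 0, 1], [0, 1, 0, 0, 1, 0]]
```

As in the MATLAB version, testing for t instead of -t would find runs of t ones.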
Here is a generalized approach that removes any rows which have a given number of consecutive duplicates (not just zeros; the repeated value can be any number).
t = 3;
row_mask = ~any(all(~diff(reshape(im2col(A,[1 t],'sliding'),t,size(A,1),[]))),3);
out = A(row_mask,:)
Sample Run:
>> A
A =
0 1 0 1 0 1
0 0 1 5 5 5 %// consecutive 3 5's
0 0 1 0 0 1
0 1 0 0 1 0
1 1 1 0 0 1 %// consecutive 3 1's
>> out
out =
0 1 0 1 0 1
0 0 1 0 0 1
0 1 0 0 1 0
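The same generalized filter can be written without im2col (an Image Processing Toolbox function) by tracking the length of the current run of equal values in each row. This is a minimal Python sketch, assuming t is at least 1. The function names are hypothetical.

```python
def has_run(row, t):
    """True if some value repeats at least t times consecutively in row."""
    run = 0
    prev = object()  # sentinel that compares unequal to every entry
    for value in row:
        run = run + 1 if value == prev else 1
        prev = value
        if run >= t:
            return True
    return False


def drop_rows_with_any_run(A, t):
    return [row for row in A if not has_run(row, t)]


A = [[0, 1, 0, 1, 0, 1],
     [0, 0, 1, 5, 5, 5],  # three consecutive 5s
     [0, 0, 1, 0, 0, 1],
     [0, 1, 0, 0, 1, 0],
     [1, 1, 1, 0, 0, 1]]  # three consecutive 1s
print(drop_rows_with_any_run(A, 3))
# [[0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 0, 1], [0, 1, 0, 0, 1, 0]]
```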
How about an approach using strings? This is certainly not as fast as Luis Mendo's method where you work directly with the numerical array, but it's thinking a bit outside of the box. The basis of this approach is that I consider each row of A to be a unique string, and I can search each string for occurrences of a string of 0s by regular expressions.
A=[0 1 0 1 0 1;
0 0 0 1 1 1;
0 0 1 0 0 1;
0 1 0 0 1 0;
1 0 0 0 1 0];
t = 3;
B = sprintfc('%s', char('0' + A));
ind = cellfun('isempty', regexp(B, repmat('0', [1 t])));
B(~ind) = [];
B = double(char(B) - '0');
We get:
B =
0 1 0 1 0 1
0 0 1 0 0 1
0 1 0 0 1 0
Explanation
Line 1: Convert each line of the matrix A into a string consisting of 0s and 1s. Each line becomes a cell in a cell array. This uses the undocumented function sprintfc to facilitate this cell array conversion.
Line 2: I use regular expressions to find any occurrences of a string of 0s that is t long. I first use repmat to create a search string that is full of 0s and is t long. After, I determine if each line in this cell array contains this sequence of characters (i.e. 000....). The function regexp helps us perform regular expressions and returns the locations of any matches for each cell in the cell array. Alternatively, you can use the function strfind for more recent versions of MATLAB to speed up the computation, but I chose regexp so that the solution is compatible with most MATLAB distributions out there.
Continuing on, the output of regexp/strfind is a cell array where each cell reports the locations at which the particular string was found. If a row matches, at least one location is reported, so I check which cells are empty: those are the rows we don't want to remove. I want a logical array for removing rows from A, so this is wrapped in a cellfun call that determines which cells are empty. Therefore, this line returns a logical array where 0 means remove this row and 1 means keep it.
Line 3: I take the logical array from Line 2 and invert it, so that it marks the rows to remove. We use this inverted array to index into the cell array and remove those strings.
Line 4: The output is still a cell array, so I convert it back into a character array, and finally back into a numerical array.
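In Python the string-based idea is shorter, because each row can be joined into a string directly and there is no cell-array round trip. This sketch assumes single-digit 0/1 entries, since multi-digit values would make the joined string ambiguous. The function name is hypothetical. For a fixed pattern like this, a plain substring test ("0" * t in text) would work as well, much as strfind can replace regexp in MATLAB.

```python
import re


def drop_rows_by_regex(A, t):
    """Keep the rows of A (0/1 entries) whose string form has no run of t '0's."""
    pattern = re.compile("0" * t)  # like repmat('0', [1 t])
    kept = []
    for row in A:
        text = "".join(str(v) for v in row)  # [0, 1, 0] -> "010", like char('0' + A)
        if pattern.search(text) is None:  # no match: keep the row
            kept.append(row)
    return kept


A = [[0, 1, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 1, 0, 0, 1],
     [0, 1, 0, 0, 1, 0],
     [1, 0, 0, 0, 1, 0]]
print(drop_rows_by_regex(A, 3))
# [[0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 0, 1], [0, 1, 0, 0, 1, 0]]
```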

find largest rectangle not (necessarily) aligned with image boundary in binary matrix

I am using this solution to find rectangles aligned with the image border in a binary matrix. Suppose now I want to find a rectangle that is not aligned with the image border, and I don't know its orientation; what would be the fastest way to find it?
For the sake of the example, let's look for a rectangle containing only 1's. For example:
1 1 1 1 0 0 0 0 0 1 0 0 1 1 1
0 1 1 1 1 1 0 0 0 1 0 0 1 1 0
0 0 0 1 1 1 1 1 0 1 0 0 1 0 0
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 1 1 0
The algorithm from the solution I linked above would only find a rectangle of size 6 (3x2). I would like to find a bigger, tilted rectangle; we can clearly see a rectangle of size 10 or more...
I am working in C/C++ but an algorithm description in any language or pseudo-code would help me a lot.
Some more details:
there can be more than one rectangle in the image: I need the biggest only
the rectangle is not a perfect rectangle in the image (I adapted my example above a little bit)
I work on large images (1280x1024) so I'm looking for the fastest solution (a brute-force O(n³) algorithm will be very slow)
(optional) if the solution can be parallelized, that is a plus (then I can boost it more using GPU, SIMD, ...)
I only have a partial answer for this question, and only a few thoughts on complexity or speed for what I propose.
Brute Force
The first idea that I see is to use the fact that your problem is discrete: rotate the image around its center and rerun the algorithm you already use to find the axis-aligned solution.
This has the downside of checking a whole lot of candidate rotations. However, the checks can be done in parallel, since they are independent of one another. This is still probably very slow, but implementing it shouldn't be too hard, and once parallelized it would give a more definite answer to the question of speed.
Note that, since your workspace is a discrete matrix, there is only a finite number of rotations to browse through.
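Here is a Python sketch of the brute-force idea, under two assumptions. First, the axis-aligned search you already use is stood in for by the common row-histogram plus monotonic-stack maximal-rectangle algorithm; the linked solution may differ. Second, rotation uses nearest-neighbour sampling onto a square canvas large enough to hold any rotation of the image. Only angles in [0°, 90°) are tried, because a rectangle rotated by 90° is again axis-aligned. The function names and the 1° default step are hypothetical. Areas are counted in pixels of the rotated grid, so they are approximate.

```python
import math


def max_rectangle_area(grid):
    """Largest axis-aligned all-ones rectangle (row histograms + monotonic stack)."""
    if not grid:
        return 0
    heights = [0] * len(grid[0])
    best = 0
    for row in grid:
        # heights[c] = number of consecutive 1s ending at this row in column c
        heights = [h + 1 if v else 0 for h, v in zip(heights, row)]
        stack = []  # (start column, height) pairs with increasing heights
        for i, h in enumerate(heights + [0]):  # trailing 0 flushes the stack
            start = i
            while stack and stack[-1][1] >= h:
                start, top = stack.pop()
                best = max(best, top * (i - start))
            stack.append((start, h))
    return best


def rotate_nearest(grid, angle_deg):
    """Rotate a 0/1 grid about its centre by nearest-neighbour inverse mapping."""
    h, w = len(grid), len(grid[0])
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    size = math.ceil(math.hypot(h, w))  # square canvas that fits any rotation
    cy, cx = (h - 1) / 2, (w - 1) / 2
    oc = (size - 1) / 2
    out = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            dy, dx = y - oc, x - oc
            sy = cy + dy * cos_a - dx * sin_a  # source coordinates of this pixel
            sx = cx + dy * sin_a + dx * cos_a
            iy, ix = math.floor(sy + 0.5), math.floor(sx + 0.5)  # round half up
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = grid[iy][ix]
    return out


def largest_rotated_rectangle(grid, step_deg=1):
    """Try every angle in [0, 90) and return (best area, angle in degrees)."""
    best_area, best_angle = 0, 0
    for k in range(int(90 / step_deg)):
        angle = k * step_deg  # each angle is independent: easy to parallelize
        area = max_rectangle_area(rotate_nearest(grid, angle))
        if area > best_area:
            best_area, best_angle = area, angle
    return best_area, best_angle


rows = ["111100000100111", "011111000100110", "000111110100100",
        "000001111100000", "000000011111000", "000000000111110"]
grid = [[int(ch) for ch in s] for s in rows]
print(max_rectangle_area(grid))  # 6, the axis-aligned result from the question
print(largest_rotated_rectangle(grid))
```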
Other Approach
The second solution I see is:
Cut down your base matrix so as to separate the connected components [1] (corresponding to the value set you're interested in).
For each one of those smaller matrices -- note that they may be overlapping depending on the distribution -- find the minimum oriented bounding box for the value set you're interested in.
Still for each one of those, rotate your matrix so that the minimum oriented bounding box is now axis-aligned.
Launch the algorithm you already have to find the maximum axis-aligned rectangle containing only values from your value set.
The solution found by this algorithm would be the largest rectangle obtained from all the connected components.
This second solution would probably give you an approximation of the solution, but I believe it might prove worth trying.
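For the bounding-box step, one standard way to get the minimum-area oriented bounding box of a set of pixels is this: take their convex hull and try each hull edge as the direction of one side of the box. This works because an optimal box always has a side collinear with some hull edge. The Python sketch below makes two simplifications. It uses pixel centres as points, which approximates the pixel squares. It also uses a simple O(h²) loop over the h hull vertices rather than full rotating calipers. The function names are hypothetical. Rotating a component by minus the returned angle makes its box axis-aligned, which is what the following rotation step needs.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_box(points):
    """Return (area, angle in radians) of the minimum-area enclosing rectangle."""
    hull = convex_hull(points)
    if not hull:
        return None
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)  # direction of this hull edge
        c, s = math.cos(theta), math.sin(theta)
        u = [x * c + y * s for x, y in hull]   # coordinates along the edge
        v = [-x * s + y * c for x, y in hull]  # coordinates perpendicular to it
        area = (max(u) - min(u)) * (max(v) - min(v))
        if best is None or area < best[0]:
            best = (area, theta)
    return best


# A diamond: its axis-aligned box has area 4, but the 45-degree box has area 2
print(min_area_box([(1, 0), (0, 1), (-1, 0), (0, -1)]))
```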
For reference
The only solutions that I have found for the problem of the maximum/largest empty rectangle are axis-aligned. I have seen many unanswered questions corresponding to the oriented version of this problem in 2D continuous space.
EDIT:
[1] Since what we want is to separate the connected components, if there is some overlap, you should proceed as in the following example:
0 1 0 0
0 1 0 1
0 0 0 1
should be divided into:
0 0 0 0
0 0 0 1
0 0 0 1
and
0 1 0 0
0 1 0 0
0 0 0 0
Note that I kept the original dimensions of the matrix. I did that because I'm guessing from your post it has some importance and that a rectangle expanding further away from the boundaries would not be found as a solution (i.e. that we can't just assume there are zero values beyond the border).
EDIT #2:
The choice of whether or not to keep the matrix dimensions is debatable since it will not directly influence the algorithm.
However, it is worth noting that if the matrices corresponding to connected components do not overlap on non-zero values, you may choose to store those matrices "in-place".
You also need to consider that if you want to return the coordinates of the rectangle and you create a matrix with different dimensions for each connected component, you will have to store the position of each newly created matrix within the original one (one point, for instance the upper-left one, should be enough).

Radial basis network character recognition

I want to develop a simple character recognition program by implementing a given kind of neural network; a simple command-line program is enough.
The radial basis function neural network was assigned to me. I have already studied the weight training and the input-to-hidden-to-output procedures, but I am still unsure how to implement it. My references are (1) and (2).
A simple one-dimensional array of a 10 by 10 binary object (that represents a character) is the input. For example, the array below
input = array(
0,0,0,1,1,1,1,0,0,0,
0,0,1,0,0,0,0,1,0,0,
0,1,0,0,0,0,0,0,1,0,
1,0,0,0,0,0,0,0,0,1,
1,1,1,1,1,1,1,1,1,1,
1,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1 )
is the representation of the character "A":
0 0 0 1 1 1 1 0 0 0
0 0 1 0 0 0 0 1 0 0
0 1 0 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 0 1
1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 1
I plan to compute the total weight of the input and compare it with the training set, i.e. the saved 1-D arrays of the other characters of the alphabet; the closest one is the prediction.
The problem is that I tend to understand algorithms better when they are presented in a CLRS-like manner rather than as mathematical formulas. I find it hard to understand the explanations in those two papers, which were the easiest to read among the results of my Google search.
Can someone point me to a friendly algorithm for an RBFNN that takes in an array and produces an output of weights? If not, a paper that explains this in layman's terms would be appreciated.
Training
From what I could find, there is no "one right way" to train them.
The simplest training I could find is a composition of two algorithms:
(Clustering) Take the left part of the network (input weights and RBFs) and do unsupervised clustering. There are a few things you can try out: hard vs. soft clustering, and the number of clusters/RBFs.
Each cluster represents a single RBF with the weights connecting to it.
How you go from clusters to RBFs and RBF weights depends on the clustering you use. (I can extend this if it's unclear.)
(Neural Network) Solve the remaining part of the original RBFNN by using the output of the clustering step as the input to an ordinary single-layer neural network.
These more primitive algorithms are probably easier to find clearly explained.
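To make the two-step recipe concrete, here is a small Python sketch under explicit assumptions. It uses hard k-means for the clustering step and Gaussian RBFs exp(-gamma * ||x - centre||²) with one shared, hand-picked width gamma. The output is a single linear layer per class trained with the delta (LMS) rule. None of these choices come from the answer or the linked references; they are one common setup. The function names, gamma, learning rate and epoch count are hypothetical. Inputs would be the flattened 10×10 arrays from the question, and gamma has to be tuned to the typical squared distances between characters.

```python
import math
import random


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(X, k, iters=20, seed=0):
    """Step 1 (clustering): hard k-means; each centre becomes one RBF."""
    centres = [list(x) for x in random.Random(seed).sample(X, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda j: sq_dist(x, centres[j]))
            groups[nearest].append(x)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if its cluster is empty
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres


def features(x, centres, gamma):
    """RBF layer output: one Gaussian per centre, plus a constant bias input."""
    return [math.exp(-gamma * sq_dist(x, c)) for c in centres] + [1.0]


def train_rbf(X, labels, n_classes, k, gamma=1.0, lr=0.1, epochs=200):
    """Step 2: single-layer network on the RBF outputs, trained with the delta rule."""
    centres = kmeans(X, k)
    W = [[0.0] * (k + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(X, labels):
            phi = features(x, centres, gamma)
            for c in range(n_classes):
                target = 1.0 if c == y else 0.0  # one-hot target
                err = target - sum(w * p for w, p in zip(W[c], phi))
                W[c] = [w + lr * err * p for w, p in zip(W[c], phi)]
    return centres, W


def predict(x, centres, W, gamma=1.0):
    """Return the class whose output unit scores highest."""
    phi = features(x, centres, gamma)
    scores = [sum(w * p for w, p in zip(row, phi)) for row in W]
    return max(range(len(W)), key=lambda c: scores[c])


# Toy usage with two 4-pixel "characters" (class 0 and class 1)
X = [[1, 1, 0, 0], [0, 0, 1, 1]]
centres, W = train_rbf(X, [0, 1], n_classes=2, k=2)
print(predict([1, 1, 0, 1], centres, W))  # noisy version of the first pattern
```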
EDIT
Found some "pseudo"-code with explanations that might explain it all better (written in C#):
http://msdn.microsoft.com/en-us/magazine/dn532201.aspx
(Supposedly) working Python code:
https://github.com/andrewdyates/Radial-Basis-Function-Neural-Network

Efficient way to "fill" a binary matrix to create smooth, connected zones

I have a large matrix of 1's and 0's, and am looking for a way to "fill" up areas that are locally dense with 1's.
I first did this task for a 1-D array, counting the number of 1s within a certain radius of the element in question. If the radius was 5, for example, and my threshold was 4, then a point that had 4 elements marked "1" within 5 elements to the left or right would be changed to a 1.
Basically, I would like to generalize this to a two-dimensional array and get a resulting matrix that has "smooth" and "connected" regions of 1s and no "patchy" spots.
As an example, the matrix
1 0 0 1 0 0 0
0 0 1 0 1 0 0
0 1 0 1 0 0 0
0 0 1 1 1 0 0
would ideally be changed to
1 0 0 1 1 0 0
0 0 1 1 1 0 0
0 1 1 1 1 0 0
0 0 1 1 1 0 0
or something similar
For binary images, the morphological operations implemented in MATLAB are perfect for manipulating the shape and size of connected regions. Specifically, image closing is designed to fill holes in connected regions. In MATLAB, the function is imclose. It takes the image and a structuring element, which is similar to a filter kernel and determines how neighboring pixels affect the filling of holes and gaps. A simple invocation of imclose is:
IM2 = imclose(IM,strel(ones(3)));
Larger gaps can be filled by increasing the area of influence of neighboring pixels via larger structuring elements. For example, we can use a disk of radius 10 pixels:
IM2 = imclose(IM,strel('disk',10));
While imclose supports grayscale and binary (0 and 1) images, the function bwmorph is designed for binary images only. It provides a generic interface to many morphological operations and various neat combinations of operations (e.g. 'bothat', 'tophat', etc.). The syntax for closing is simpler with bwmorph:
BW2 = bwmorph(BW,'close');
Here the structuring element is the standard ones(3).
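For readers without the Image Processing Toolbox, closing is a dilation followed by an erosion with the same structuring element. Below is a minimal pure-Python sketch with the 3×3 square (the ones(3) case). It treats pixels outside the image as 0 during dilation and as 1 during erosion, so the border is not eroded away. This is an assumption, though as far as I know it matches MATLAB's padding convention for binary images. The function names are hypothetical.

```python
def _filter3x3(img, pick, outside):
    """Apply pick (min or max) over each 3x3 neighbourhood; off-image pixels read as `outside`."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            vals = [img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else outside
                    for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)]
            out_row.append(pick(vals))
        out.append(out_row)
    return out


def dilate(img):
    return _filter3x3(img, max, outside=0)


def erode(img):
    return _filter3x3(img, min, outside=1)


def closing(img):
    """Morphological closing with a 3x3 square: dilation, then erosion."""
    return erode(dilate(img))


img = [[1, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0]]
for row in closing(img):
    print(row)  # each row: [1, 1, 1, 1, 1, 0, 0]
```

On the example from the question, the 3×3 closing merges everything into one solid region spanning the first five columns. A smaller or differently shaped structuring element would fill less aggressively.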
A simple filter such as the following might do the trick:
h = [ 0 1 0
1 0 1
0 1 0];
img2=(imfilter(img,h)>2) | img;
For instance:
img =
1 0 0 1 0 0 0
0 0 1 0 1 0 0
0 1 0 1 0 0 0
0 0 1 1 1 0 0
img2 =
1 0 0 1 0 0 0
0 0 1 1 1 0 0
0 1 1 1 1 0 0
0 0 1 1 1 0 0
You can try different filters to modify the output img2.
This uses the Image Processing Toolbox. If you don't have it, you may want to look up equivalent routines on the MATLAB File Exchange.
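The filter answer can also be reproduced without the toolbox. imfilter pads with zeros by default, so with the cross-shaped kernel h, each pixel's filter value is the number of its up, down, left and right neighbours that equal 1. A 0 then becomes 1 when that count exceeds 2. Below is a minimal pure-Python sketch; the function name is hypothetical. On img it reproduces the img2 shown above.

```python
def fill_by_neighbour_count(img, threshold=2):
    """Set a pixel to 1 if it is already 1 or more than `threshold` of its 4 neighbours are 1."""
    rows, cols = len(img), len(img[0])

    def at(r, c):
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0  # zero padding

    return [[1 if img[r][c] or (at(r - 1, c) + at(r + 1, c) + at(r, c - 1) + at(r, c + 1)) > threshold
             else 0
             for c in range(cols)]
            for r in range(rows)]


img = [[1, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0]]
for row in fill_by_neighbour_count(img):
    print(row)
```

Changing the kernel (for example, also counting diagonal neighbours) or the threshold changes how aggressively gaps are filled, as the answer suggests.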
