Efficient way to "fill" a binary matrix to create smooth, connected zones - arrays

I have a large matrix of 1's and 0's, and am looking for a way to "fill" up areas that are locally dense with 1's.
I first did this task for a one-dimensional array by counting the number of 1's within a certain radius of the element in question. If the radius was 5, for example, and my threshold was 4, then a point that had 4 elements marked "1" within 5 elements to the left or right would be changed to a 1.
Basically I would like to generalize this to a two-dimensional array and have a resulting matrix that has "smooth" and "connected" regions of 1's and no "patchy" spots.
As an example, the matrix
1 0 0 1 0 0 0
0 0 1 0 1 0 0
0 1 0 1 0 0 0
0 0 1 1 1 0 0
would ideally be changed to
1 0 0 1 1 0 0
0 0 1 1 1 0 0
0 1 1 1 1 0 0
0 0 1 1 1 0 0
or something similar

For binary images, the morphological operations implemented in MATLAB are well suited to manipulating the shape and size of connected regions. Specifically, image closing is designed to fill holes in connected regions. In MATLAB the function is imclose, which takes the image and a structuring element (similar to a filter kernel) that determines how neighboring pixels affect the filling of holes and gaps. A simple invocation of imclose is:
IM2 = imclose(IM,strel(ones(3)));
Larger gaps can be filled by increasing the area of influence of neighboring pixels via larger structuring elements. For example, we can use a disk with a radius of 10 pixels:
IM2 = imclose(IM,strel('disk',10));
While imclose supports grayscale and binary (0 and 1) images, the function bwmorph is designed for binary images only, but provides a generic interface to all of the morphological operations and various neat combinations of operations (e.g. 'bothat', 'tophat', etc.). The syntax for closing is simpler with bwmorph:
BW2 = bwmorph(BW,'close');
Here the structuring element is the standard ones(3).
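For readers without MATLAB, the idea behind closing with a 3x3 structuring element can be sketched in plain Python as a dilation followed by an erosion. This is only an illustration, not how imclose is implemented: the names dilate, erode and close_binary are made up for this example, and pixels outside the image are simply ignored.

```python
def _window(img, r, c):
    """Yield the in-bounds pixel values of the 3x3 window centred on (r, c)."""
    rows, cols = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield img[nr][nc]


def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[int(any(_window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def erode(img):
    """A pixel stays 1 only if every in-bounds pixel in its 3x3 window is 1."""
    return [[int(all(_window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def close_binary(img):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img))


ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
filled = close_binary(ring)  # the one-pixel hole in the middle is filled
```

Closing never removes pixels that were already set; it only fills gaps that are small relative to the structuring element.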

A simple filter such as the following might do the trick:
h = [ 0 1 0
1 0 1
0 1 0];
img2=(imfilter(img,h)>2) | img;
For instance:
img =
1 0 0 1 0 0 0
0 0 1 0 1 0 0
0 1 0 1 0 0 0
0 0 1 1 1 0 0
img2 =
1 0 0 1 0 0 0
0 0 1 1 1 0 0
0 1 1 1 1 0 0
0 0 1 1 1 0 0
You can try different filters to modify the output img2.
This uses the Image Processing Toolbox. If you don't have it, you may want to look for equivalent routines on the MATLAB File Exchange.
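The same neighbor-count rule can be sketched in plain Python for anyone who wants to experiment without the toolbox. The function name fill_by_neighbors and the threshold parameter are assumptions made for this example; cells outside the matrix count as 0, which is what zero padding would give.

```python
def fill_by_neighbors(img, threshold=2):
    """Set a cell to 1 if more than `threshold` of its 4 direct neighbors are 1.

    Cells that are already 1 are kept, mirroring `(imfilter(img,h)>2) | img`.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            count = 0
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    count += img[nr][nc]
            if count > threshold:
                out[r][c] = 1
    return out


img = [[1, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0]]
img2 = fill_by_neighbors(img)
```

Lowering the threshold, or counting the diagonal neighbors as well, fills more aggressively.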

Related

Matlab finding the center of cluster of a few pixels and counting the clusters

So I have this matrix A, which is made of ones and zeros. It contains about 10 to 14 white spots of many pixels each, but I want only one white pixel (the center coordinate) for every cluster of white. How do I calculate how many clusters there are and where their centers are?
Try to imagine the matrix A as the night sky with white stars on a black background: how do I count the stars and find their centers, given that each star is made of a cluster of white pixels?
Also, the clusters are not all exactly the same size.
Here is some code using bwlabel and/or regionprops, which are used to identify connected components in a matrix and to compute a bunch of their properties, respectively. I think it suits your problem quite well; however, you might want to adapt my code a bit, as it's more of a starting point.
clear
clc
%// Create dummy matrix.
BW = logical ([ 1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 0 0 0]);
%// Identify clusters.
L = bwlabel(BW,4)
Matrix L looks like this:
L =
1 1 1 0 3 3 3 0
1 1 1 0 3 3 3 0
1 1 1 0 3 3 3 0
0 0 0 0 0 0 0 0
0 0 0 0 0 4 4 0
2 2 2 2 0 4 4 0
2 2 2 2 0 4 4 0
2 2 2 2 0 0 0 0
Here you have several ways to locate the centers of the clusters. The first one uses the output of bwlabel to find each cluster and calculates the coordinates in a loop. It works and it's didactic, but it's a bit long and not so efficient. The second method, as mentioned by @nkjt, uses regionprops, which does exactly what you want through the 'Centroid' property. So here are the two methods:
Method 1: a bit complicated
So bwlabel identified 4 clusters, which makes sense. Now we need to identify the center of each of those clusters. My method could probably be simplified, but I'm a bit out of time, so feel free to modify it as you see fit.
%// Get number of clusters
NumClusters = numel(unique(L)) -1;
Centers = zeros(NumClusters,2);
CenterLinIdices = zeros(NumClusters,1);
for k = 1:NumClusters
    %// Find indices for elements forming each cluster.
    [r, c] = find(L==k);
    %// Sort the elements to know how many rows and columns the cluster is spanning.
    [~,y] = sort(r);
    c = c(y);
    r = r(y);
    NumRow = numel(unique(r));
    NumCol = numel(unique(c));
    %// Calculate the approximate center of the cluster.
    CenterCoord = [r(1)+floor(NumRow/2) c(1)+floor(NumCol/2)];
    %// Actually this array is not used here but you might want to keep it for future reference.
    Centers(k,:) = [CenterCoord(1) CenterCoord(2)];
    %// Convert the subscripts indices to linear indices for easy reference.
    CenterLinIdices(k) = sub2ind(size(BW),CenterCoord(1),CenterCoord(2));
end
%// Create output matrix full of 0s, except at the center of the clusters.
BW2 = false(size(BW));
BW2(CenterLinIdices) = 1
BW2 =
0 0 0 0 0 0 0 0
0 1 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
Method 2: using regionprops and the 'Centroid' property
Once you have matrix L, apply regionprops and concatenate the output to get an array containing the coordinates directly. Much simpler!
%// Create dummy matrix.
BW = logical ([ 1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 0 0 0]);
%// Identify clusters.
L = bwlabel(BW,4)
s = regionprops(L,'Centroid');
CentroidCoord = vertcat(s.Centroid)
which gives this:
CentroidCoord =
2.0000 2.0000
2.5000 7.0000
6.0000 2.0000
6.5000 6.0000
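This is much simpler. Note that regionprops returns each centroid as [x y], i.e. [column row], so the columns are swapped relative to method 1. For this example the values match the centers from method 1 once you round the half-pixel coordinates up with ceil (floor gives columns 2 and 6 instead of 3 and 7 for the two clusters with an even width).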
Hope that helps!
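As a rough illustration of what bwlabel and the 'Centroid' property compute, here is a plain Python sketch using a breadth-first flood fill with 4-connectivity. The function names label_components and centroids are invented for this example. Coordinates are zero-based (row, column) pairs, so they differ from MATLAB's one-based [x y] output by an offset and a swap, and the label numbering follows a row-by-row scan rather than MATLAB's column-wise order.

```python
from collections import deque


def label_components(grid):
    """Label 4-connected regions of 1s; returns (labels, number_of_regions)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not grid[r0][c0] or labels[r0][c0]:
                continue
            count += 1
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and not labels[nr][nc]):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count


def centroids(labels, count):
    """Mean (row, col) of every labelled region, zero-based."""
    sums = [[0, 0, 0] for _ in range(count)]  # row total, col total, pixel count
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                entry = sums[lab - 1]
                entry[0] += r
                entry[1] += c
                entry[2] += 1
    return [(sr / n, sc / n) for sr, sc, n in sums]


BW = [[1, 1, 1, 0, 1, 1, 1, 0],
      [1, 1, 1, 0, 1, 1, 1, 0],
      [1, 1, 1, 0, 1, 1, 1, 0],
      [0, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 1, 1, 0],
      [1, 1, 1, 1, 0, 1, 1, 0],
      [1, 1, 1, 1, 0, 1, 1, 0],
      [1, 1, 1, 1, 0, 0, 0, 0]]
labels, num_clusters = label_components(BW)
centers = centroids(labels, num_clusters)
```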

Two dimensional array median filtering

I'm trying to write code that implements median filtering on a two-dimensional array.
Here's an image to illustrate:
The program starts at the beginning of the array. The maximum array size is 100. I know that I can use an array like:
int a[100][100];
to store the input, and that I can iterate over a part of this array using two for loops like this:
for(i=0;i<size_filter;i++)
for(j=0;j<size_filter;j++)
temp[i][j]=a[i][j] // not so sure
But how can I make this code loop over the neighbors of every element in the array, calculate their median, and replace the center element with the median?
For some examples of what I'm trying to do, let's say that the input is a 5x5 matrix, so the input size is 5. And I want to run a 3x3 median filter on it, i.e. each element should be replaced by the median of the 3x3 elements surrounding it.
The program starts at the corner index (0,0). For this index, it scans the 3x3 region surrounding it (of which only four indexes actually lie within the input array), which contains the values 0, 0, 1, and 0. The median of these values is 0, so that's what the code should output for this array index.
In the picture below, the number in bold italics is the center cell, and the plain bold numbers are its neighbors within the 3x3 region surrounding it:
0 0 0 0 0
1 0 0 1 0
1 1 0 0 0
0 1 1 0 0
0 0 0 0 0
Here's another example, this time with the center index (0,1):
0 0 0 0 0
1 0 0 1 0
1 1 0 0 0
0 1 1 0 0
0 0 0 0 0
This time, the elements in the 3x3 region (excluding those outside the input array) have the values 0, 0, 0, 1, 0, and 0, and again, their median is therefore 0.
Here's yet another example, this time from the middle of the input, at center index (2,1) (zero-based row and column, as in the previous examples):
0 0 0 0 0
1 0 0 1 0
1 1 0 0 0
0 1 1 0 0
0 0 0 0 0
This time, the elements within the 3x3 region have the values 1, 0, 0, 1, 1, 0, 0, 1, and 1, and their median is therefore 1.
Final example:
<size of array><size filter> <data>
8
3
0 0 0 0 0 0 0 0
0 5 0 0 6 0 0 0
0 0 0 0 0 7 0 0
0 0 0 0 5 0 0 0
0 0 0 5 6 0 0 0
0 0 8 5 5 0 0 0
0 0 0 7 0 0 9 0
0 0 0 0 0 0 0 0
Output:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 5 5 0 0 0
0 0 0 5 5 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
It looks like you're trying to implement a two-dimensional median filter. The straightforward way to implement such a filter is to have four nested loops: two outer loops over the x and y coordinates of the whole image, and two inner loops over the neighborhood of the center pixel.
It's perhaps easier to describe this in code than in text, so here's some Python-esque pseudocode to illustrate:
# assumptions:
# * image is a height x width array containing source pixel values
# * filtered is a height x width array to store result pixel values in
# * size is an odd number giving the diameter of the filter region
radius = (size - 1) / 2 # size = 3 -> radius = 1
for y from 0 to height-1:
    top = max(y - radius, 0)
    bottom = min(y + radius, height-1)
    for x from 0 to width-1:
        left = max(x - radius, 0)
        right = min(x + radius, width-1)
        values = new list
        for v from top to bottom:
            for u from left to right:
                add image[v][u] to values
        filtered[y][x] = median(values)
Translating this code into C is left as an exercise.
It's also possible to optimize this code by noting that the neighborhoods of adjacent array cells overlap significantly, so that the values of those neighboring cells can be reused across successive iterations of the outer loops. Since the performance of this algorithm on modern CPUs is essentially limited by RAM access latency, such reuse can provide a significant speedup, especially for large filter sizes.
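For reference, the straightforward (unoptimized) version of the pseudocode above can be written as runnable Python along these lines. One detail is an assumption on my part: when the window is clipped at the border and holds an even number of values, this sketch takes the lower of the two middle values, which agrees with the corner example in the question (the median of 0, 0, 1, 0 is taken to be 0).

```python
def median_filter(image, size):
    """Median-filter a 2D list with a size x size window clipped at the borders."""
    radius = (size - 1) // 2
    height, width = len(image), len(image[0])
    filtered = [[0] * width for _ in range(height)]
    for y in range(height):
        top = max(y - radius, 0)
        bottom = min(y + radius, height - 1)
        for x in range(width):
            left = max(x - radius, 0)
            right = min(x + radius, width - 1)
            values = sorted(image[v][u]
                            for v in range(top, bottom + 1)
                            for u in range(left, right + 1))
            # Lower median for even-sized (border) windows: an assumption.
            filtered[y][x] = values[(len(values) - 1) // 2]
    return filtered


data = [[0, 0, 0, 0, 0, 0, 0, 0],
        [0, 5, 0, 0, 6, 0, 0, 0],
        [0, 0, 0, 0, 0, 7, 0, 0],
        [0, 0, 0, 0, 5, 0, 0, 0],
        [0, 0, 0, 5, 6, 0, 0, 0],
        [0, 0, 8, 5, 5, 0, 0, 0],
        [0, 0, 0, 7, 0, 0, 9, 0],
        [0, 0, 0, 0, 0, 0, 0, 0]]
result = median_filter(data, 3)
```

Sorting every window costs O(size^2 log size) per pixel, which is fine for a 100x100 input; the overlap-reuse optimization mentioned above only starts to matter for large images or large windows.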
This:
for(i=0;i<size_filter;i++)
    for(j=0;j<size_filter;j++)
        temp[i][j]=a[i][j];
is a good starting point.
You just iterate over every pixel of your input array, determine the median of its neighborhood, and write it to an output array.
So instead of temp[i][j]=a[i][j]; you need a function such as WhatEverType calcMedianAt(const WhatEverType a[100][100], int r, int c, int size);
so that you can call temp[i][j]=calcMedianAt(a, i, j, 3);
The function itself has to extract the neighborhood values into a list (with proper border handling), find the median of that list (for example by calling some median function WhatEverType calcMedian(const WhatEverType* data, int len);), and return it.

Navigation of matrix from left top to right bottom, only moving to the right or downwards?

The actual problem, which I got from an online competition, is as follows. I solved it, but my solution, which is in C, couldn't produce an answer in time for large inputs. I need to solve it in C.
Given below is a word from the English dictionary arranged as a matrix:
MATHE
ATHEM
THEMA
HEMAT
EMATI
MATIC
ATICS
Tracing the matrix means starting from the top left position and at each step moving either RIGHT or DOWN until the bottom right of the matrix is reached. It is assured that any such tracing generates the same word. How many such tracings are possible for a given word of length m+n-1 written as a matrix of size m * n?
1 ≤ m,n ≤ 10^6
I have to print the number of ways S the word can be traced as explained in the problem statement. If the number is larger than 10^9+7, I have to print S mod (10^9 + 7).
In the testcases, m and n can be very large.
Imagine traversing the matrix: whatever path you choose, you need to take exactly n+m-2 steps to spell the word, of which n-1 are down and m-1 are to the right. Their order may change, but the counts n-1 and m-1 stay the same. So the problem reduces to choosing n-1 positions out of n+m-2, and the answer is
C(n+m-2,n-1)=C(n+m-2,m-1)
How to calculate C(n,r) for this problem:
You probably know how to multiply two numbers in modular arithmetic, i.e.
(a*b)%mod=((a%mod)*(b%mod))%mod,
Now, to calculate C(n,r) you also need to divide, but division in modular arithmetic is performed by multiplying with the modular multiplicative inverse of the number, i.e. the a^-1 for which
((a)*(a^-1))%mod=1
Of course a^-1 in modular arithmetic is not the same as 1/a. In general it can be computed using the Extended Euclidean Algorithm, but since in your case mod is a prime number, Fermat's little theorem gives
(a^(-1))=a^(mod-2)%mod
a^(mod-2) can be computed efficiently using the repeated squaring method.
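Put together, the calculation might look like the following Python sketch (the question asks for C, but the structure carries over directly). The function name count_tracings is made up for this example. Python's built-in three-argument pow performs the modular exponentiation by repeated squaring, and the inverse is valid because the denominator is a product of numbers smaller than the prime modulus.

```python
MOD = 10**9 + 7


def count_tracings(m, n):
    """Return C(m+n-2, min(m, n)-1) modulo 10^9 + 7."""
    total = m + n - 2
    k = min(m, n) - 1
    numerator = 1
    denominator = 1
    for i in range(1, k + 1):
        # Multiply in one factor of total*(total-1)*...*(total-k+1) and of k!.
        numerator = numerator * (total - k + i) % MOD
        denominator = denominator * i % MOD
    # Fermat's little theorem: denominator^(MOD-2) is its inverse modulo MOD.
    return numerator * pow(denominator, MOD - 2, MOD) % MOD


ways = count_tracings(7, 5)  # the 7 x 5 MATHEMATICS grid from the question
```

With several queries, it would be cheaper to precompute all factorials and inverse factorials up to 2*10^6 once and answer each query with three multiplications.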
I would suggest a dynamic programming approach for this problem, since calculating factorials of large numbers takes a lot of time, especially when you have multiple queries.
Starting from a small matrix (say 2x1), keep finding solutions for bigger matrices. This works because, when finding the solution for a bigger matrix, you can reuse the values already calculated for smaller matrices and so speed up your calculation.
The complexity of the above solution is, in my opinion, polynomial in M and N for an MxN matrix.
Use Pascal's triangle (the triangle of binomial coefficients), filling the grid in one diagonal at a time:
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 1 0 0
1 2 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 1 1 0
1 2 3 0 0
1 3 0 0 0
1 0 0 0 0
0 0 0 0 0
1 1 1 1 1
1 2 3 4 0
1 3 6 0 0
1 4 0 0 0
1 0 0 0 0
1 1 1 1 1
1 2 3 4 5
1 3 6 10 0
1 4 10 0 0
1 5 0 0 0
1 1 1 1 1
1 2 3 4 5
1 3 6 10 15
1 4 10 20 0
1 5 15 0 0
1 1 1 1 1
1 2 3 4 5
1 3 6 10 15
1 4 10 20 35
1 5 15 35 0
1 1 1 1 1
1 2 3 4 5
1 3 6 10 15
1 4 10 20 35
1 5 15 35 70
Got it? Notice that every entry is a binomial coefficient. The entries on the main diagonal are C(2,1), C(4,2), C(6,3), C(8,4) (that is, 2, 6, 20, 70), and so on. Pick the one you need.
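The table can be built with a few lines of Python: each cell is the sum of the cell above it and the cell to its left, reduced modulo 10^9 + 7. This is a sketch that fills the grid row by row instead of diagonal by diagonal and keeps only one row in memory; the name count_paths_dp is made up. It takes on the order of m*n additions, so it is useful for checking small cases rather than for m and n near 10^6.

```python
MOD = 10**9 + 7


def count_paths_dp(m, n):
    """Number of right/down paths through an m x n grid, modulo 10^9 + 7."""
    row = [1] * n  # first row: exactly one way to reach each cell
    for _ in range(1, m):
        for j in range(1, n):
            # row[j] still holds the value from the row above,
            # row[j - 1] already holds the value to the left.
            row[j] = (row[j] + row[j - 1]) % MOD
    return row[-1]


corner = count_paths_dp(5, 5)  # bottom-right entry of the 5 x 5 table
```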

What is better to fed into ANN for OCR: character's border or character's 'filling'?

I am having a hard time deciding what is better (in terms of performance) to feed into an ANN for OCR purposes. I have found rectangular areas which contain characters, and now I would like to know which is better to use:
character's border
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0 1 0
0 0 1 1 1 1 1 1 1 1 0
character's filling
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0
I am asking before doing the testing myself because preparing the samples will take me a lot of time.
Sorry for formatting but I couldn't set the proper code blocks.
I think you will have a hard time figuring out the optimal method before you actually try it: you cannot predict whether a given input representation will give you a decent result at all, even if it means less input data.
This is a classical problem that has been discussed in classic texts; there is an example in Java here:
http://www.heatonresearch.com/articles/7
You haven't explained the structure of your intended ANN. This can be implemented in so many ways that you need to decide, and explain, what type of ANN you intend to use. You could use auto-associator networks, a network with a hidden layer trained by back propagation, etc.

Gaussian elimination - Linear equation matrices, algorithm

Let's assume that we have a simple matrix with 3 rows and 7 columns.
The matrix contains only zeros (0) and ones (1), like:
1 0 1 1 1 0 0
0 0 1 1 0 0 0
0 0 1 0 1 1 0
Scenario:
If we know the sum of non-zeros in each row
(4 in the first row, 2 in the second row, 3 in the third row) (blue line),
additionally, if we know the sum of each column (1, 0, 3, 2, 2, 1, 0) (green line),
also if we know the sum of each diagonal from the top-left to bottom-right (1,0,1,2,3,0,1,1,0) (red lines), anti-clockwise,
and finally if we know the sum of each diagonal from the bottom-left to top-right (0,0,2,1,3,2,1,0,0) (yellow lines),
My question is:
With these values as input (and the dimensions of the matrix, 3x7),
4, 2, 3
1, 0, 3, 2, 2, 1, 0
1, 0, 1, 2, 3, 0, 1, 1, 0
0, 0, 2, 1, 3, 2, 1, 0, 0
How can we reconstruct the first matrix?
After a lot of thought I came to the conclusion that this is a system of linear equations with 3x7 unknown values and some equations.
Right?
How can I write an algorithm in C, or whatever, to solve these equations?
Should I use a method like Gaussian elimination?
Any help would be greatly appreciated!
Start with the first column. You know the top and bottom values (from the first values of the red & yellow lists). Subtract the sum of these two from the first in the green list, and now you have the middle value as well.
Now just work to the right.
Subtract the first column's middle value from the next value in the red list, and you have the second column's top value. Subtract that same middle value from the next value in the yellow list, and you have the second column's bottom value. Subtract the sum of these two from the next value in the green list, and now you have the middle value for the second column.
et cetera
If you're going to code this up, you can see that the first two columns are a special case, and that'll make the code ugly. I'd suggest using two "ghost" columns of all zeros to the left so that you can use a single method for determining the top, bottom, and middle values for each column.
This is also easily generalizable. You'll just have to use (#rows)-1 ghost columns.
Enjoy.
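The column-by-column sweep for the three-row case can be sketched in Python as follows. The function name reconstruct_three_rows is made up, and the indexing of the diagonal lists is an assumption inferred from the question's numbers: red[k] is taken to sum the cells with row + column = k, and yellow[k] the cells with column + (2 - row) = k, both zero-based. Two ghost columns of zeros on the left remove the special cases.

```python
def reconstruct_three_rows(col_sums, red, yellow):
    """Rebuild a 3-row matrix from its column sums and both sets of diagonal sums."""
    # Two ghost columns of zeros so that every real column uses the same formula.
    top, mid, bot = [0, 0], [0, 0], [0, 0]
    for j, col_total in enumerate(col_sums):
        i = j + 2  # position of column j once the ghost columns are counted
        t = red[j] - mid[i - 1] - bot[i - 2]     # red diagonal ending at the top cell
        b = yellow[j] - mid[i - 1] - top[i - 2]  # yellow diagonal ending at the bottom cell
        m = col_total - t - b                    # the column sum gives the middle cell
        top.append(t)
        mid.append(m)
        bot.append(b)
    return [top[2:], mid[2:], bot[2:]]


matrix = reconstruct_three_rows(
    [1, 0, 3, 2, 2, 1, 0],
    [1, 0, 1, 2, 3, 0, 1, 1, 0],
    [0, 0, 2, 1, 3, 2, 1, 0, 0],
)
```

The row sums are not needed by this sweep; they can serve as a consistency check on the result.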
You can use singular value decomposition to compute a non zero least squares solution to a system of linear homogeneous (and non homogeneous) equations in matrix form.
For a quick overview see:
http://campar.in.tum.de/twiki/pub/Chair/TeachingWs05ComputerVision/3DCV_svd_000.pdf
You should first write out your system as a matrix equation of the form Ax = b, where x holds the 21 unknowns as a column vector and A is the 28 x 21 matrix that forms the linear system when multiplied out. You essentially need to build the matrix A of linear equations, compute the singular value decomposition of A, and plug the results into the equation as shown in equation 9.17.
There are plenty of libraries that will compute the SVD for you in C, so you only need to formulate the matrix and perform the computations in 9.17. The most difficult part is probably understanding how it all works; with a library SVD function there is relatively little code needed.
To get you started on how to form the equation of linear systems, consider a simple 3 x 3 case.
Suppose that our system is a matrix of the form
1 0 1
0 1 0
1 0 1
We would have the following inputs to the linear system:
2 1 2 (sum of rows - row)
2 1 2 (sum of columns - col)
1 0 3 0 1 (sum of first diagonal sets - t2b)
1 0 3 0 1 (sum of second diagonal sets - b2t)
so now we create a matrix for the linear system
A             a1 a2 a3 b1 b2 b3 c1 c2 c3     unknowns (x) = result (b)
sum of row 1 [ 1  1  1  0  0  0  0  0  0 ]   [a1]           [2]
sum of row 2 [ 0  0  0  1  1  1  0  0  0 ]   [a2]           [1]
sum of row 3 [ 0  0  0  0  0  0  1  1  1 ]   [a3]           [2]
sum of col 1 [ 1  0  0  1  0  0  1  0  0 ]   [b1]           [2]
sum of col 2 [ 0  1  0  0  1  0  0  1  0 ]   [b2]           [1]
sum of col 3 [ 0  0  1  0  0  1  0  0  1 ]   [b3]           [2]
sum of t2b 1 [ 1  0  0  0  0  0  0  0  0 ]   [c1]           [1]
sum of t2b 2 [ 0  1  0  1  0  0  0  0  0 ]   [c2]           [0]
sum of t2b 3 [ 0  0  1  0  1  0  1  0  0 ]   [c3]           [3]
sum of t2b 4 [ 0  0  0  0  0  1  0  1  0 ]                  [0]
sum of t2b 5 [ 0  0  0  0  0  0  0  0  1 ]                  [1]
sum of b2t 1 [ 0  0  0  0  0  0  1  0  0 ]                  [1]
sum of b2t 2 [ 0  0  0  1  0  0  0  1  0 ]                  [0]
sum of b2t 3 [ 1  0  0  0  1  0  0  0  1 ]                  [3]
sum of b2t 4 [ 0  1  0  0  0  1  0  0  0 ]                  [0]
sum of b2t 5 [ 0  0  1  0  0  0  0  0  0 ]                  [1]
When you multiply out Ax, you see that you get the linear system of equations. For example, if you multiply the first row by the column of unknowns, you get
a1 + a2 + a3 = 2
All you have to do is put a 1 in every column whose unknown appears in the equation and a 0 elsewhere.
Now all you have to do is compute the SVD of A and plug the result into equation 9.17 to compute the unknowns.
I recommend SVD because it can be computed efficiently. If you would prefer, you can augment the matrix A with the result vector b (A|b) and put A in reduced row echelon form to obtain the result.
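Forming A and b is mechanical, and a small Python sketch may make the bookkeeping clearer. The function name build_system is made up for this example. It only builds the system in the same row order as the table (rows, columns, t2b diagonals, b2t diagonals); solving it by SVD or row reduction is left to a linear algebra library.

```python
def build_system(matrix):
    """Return (A, b) for the row, column and diagonal sums of `matrix`.

    Unknowns are numbered row by row: cell (r, c) is column r * cols + c of A.
    """
    rows, cols = len(matrix), len(matrix[0])
    n_diag = rows + cols - 1
    groups = []
    groups += [[(r, c) for c in range(cols)] for r in range(rows)]  # row sums
    groups += [[(r, c) for r in range(rows)] for c in range(cols)]  # column sums
    # "t2b" diagonals: cells with r + c == k
    groups += [[(r, k - r) for r in range(rows) if 0 <= k - r < cols]
               for k in range(n_diag)]
    # "b2t" diagonals, starting at the bottom-left corner: c + (rows - 1 - r) == k
    groups += [[(r, k - (rows - 1 - r)) for r in range(rows)
                if 0 <= k - (rows - 1 - r) < cols]
               for k in range(n_diag)]
    A, b = [], []
    for cells in groups:
        equation = [0] * (rows * cols)
        for r, c in cells:
            equation[r * cols + c] = 1
        A.append(equation)
        b.append(sum(matrix[r][c] for r, c in cells))
    return A, b


A, b = build_system([[1, 0, 1],
                     [0, 1, 0],
                     [1, 0, 1]])
```

In practice b comes from the given sums rather than from a known matrix; the matrix is used here only so the example can produce the right-hand side of the table.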
For an array of 10x15 ones and zeros, you would be trying to find 150 unknowns and have 10+15+2*(10+15-1) = 73 equations if you ignore that the values are limited to being either one or zero. Obviously you can't create a linear system on that basis which has a unique solution.
So is that constraint enough to give a unique solution?
For a 4x4 matrix with the following sums there are two solutions:
- 1 1 1 1
| 1 1 1 1
\ 0 1 1 0 1 1 0
/ 0 1 1 0 1 1 0
0 0 1 0
1 0 0 0
0 0 0 1
0 1 0 0

0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
So I wouldn't expect there to be a unique solution for larger matrices - the same symmetry would exist in many places:
- 1 1 0 0 1 1
| 1 1 0 0 1 1
\ 0 1 0 0 1 0 1 0 0 1 0
/ 0 1 0 0 1 0 1 0 0 1 0
0 0 0 0 1 0
1 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 1
0 1 0 0 0 0

0 1 0 0 0 0
0 0 0 0 0 1
0 0 0 0 0 0
0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 1 0
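The non-uniqueness is easy to confirm mechanically. The following Python sketch computes all four families of sums for the two 4x4 matrices and shows that they coincide even though the matrices differ. The helper name all_line_sums is made up, and the diagonals are listed in an order chosen for this example.

```python
def all_line_sums(matrix):
    """Return (row sums, column sums, '\\' diagonal sums, '/' diagonal sums)."""
    rows, cols = len(matrix), len(matrix[0])
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[r][c] for r in range(rows)) for c in range(cols)]
    diag = [0] * (rows + cols - 1)  # cells with the same c - r
    anti = [0] * (rows + cols - 1)  # cells with the same r + c
    for r in range(rows):
        for c in range(cols):
            diag[c - r + rows - 1] += matrix[r][c]
            anti[r + c] += matrix[r][c]
    return row_sums, col_sums, diag, anti


first = [[0, 0, 1, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 1, 0, 0]]
second = [[0, 1, 0, 0],
          [0, 0, 0, 1],
          [1, 0, 0, 0],
          [0, 0, 1, 0]]
same_sums = all_line_sums(first) == all_line_sums(second)
```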
How about this as another variation
Count the number of unknown squares each sum passes through
While there are unsolved cells
    Solve all the cells which are passed through by a sum with only one unknown square
    Cells are solved by simply subtracting off all the known cells from the sum
    Update the number of unknown squares each sum passes through
No boundary cases but very similar to the previous answer. This would first solve all the corners, then those adjacent to the corners, then those one step more interior from that, and so on...
Edit: Also zero out any paths that have a sum of zero, that should solve any that are solvable (I think)
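A minimal Python sketch of this propagation, including the zero-sum rule from the edit, could look like the following. The names line_cells, line_sums and propagate are made up, and the sums are keyed by line type and index rather than given as four separate lists. The loop stops as soon as a full pass makes no progress, so cells that cannot be deduced stay None instead of being guessed.

```python
def line_cells(rows, cols):
    """Map every row, column and diagonal to the list of cells it passes through."""
    lines = {}
    for r in range(rows):
        for c in range(cols):
            for key in (("row", r), ("col", c), ("anti", r + c), ("diag", c - r)):
                lines.setdefault(key, []).append((r, c))
    return lines


def line_sums(matrix):
    """Compute the sum along every line of a known matrix (used to make test input)."""
    lines = line_cells(len(matrix), len(matrix[0]))
    return {key: sum(matrix[r][c] for r, c in cells) for key, cells in lines.items()}


def propagate(rows, cols, sums):
    """Fill in every cell that follows from a line with one unknown or nothing left."""
    lines = line_cells(rows, cols)
    grid = [[None] * cols for _ in range(rows)]
    progress = True
    while progress:
        progress = False
        for key, cells in lines.items():
            unknown = [(r, c) for r, c in cells if grid[r][c] is None]
            if not unknown:
                continue
            known_total = sum(grid[r][c] for r, c in cells if grid[r][c] is not None)
            remaining = sums[key] - known_total
            if remaining == 0:
                # Nothing left to distribute: every unknown cell on this line is 0.
                for r, c in unknown:
                    grid[r][c] = 0
                progress = True
            elif len(unknown) == 1:
                r, c = unknown[0]
                grid[r][c] = remaining
                progress = True
    return grid


original = [[1, 0, 1, 1, 1, 0, 0],
            [0, 0, 1, 1, 0, 0, 0],
            [0, 0, 1, 0, 1, 1, 0]]
solved = propagate(3, 7, line_sums(original))
```

For the 3x7 matrix from the question this recovers every cell. For inputs with more than one solution, such as the 4x4 example, some cells remain None.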

Resources