What is better to feed into an ANN for OCR: the character's border or the character's 'filling'? - artificial-intelligence

I am having a hard time deciding what is better (in terms of performance) to feed into an ANN for OCR purposes. I have found the rectangular areas which contain characters, and now I would like to know which is better to use:
character's border
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0 1 0
0 0 1 1 1 1 1 1 1 1 0
character's filling
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0
I am asking before doing the testing myself because preparing the samples will take me a lot of time.
Sorry for formatting but I couldn't set the proper code blocks.

I think you will have a hard time working out the optimal method before you actually try it: you cannot predict whether either representation will give a decent result, even if one of them means less input data.
This is a classic problem that has been discussed in the standard texts. There is an example in Java here:
http://www.heatonresearch.com/articles/7
You haven't explained the structure of your intended ANN. It can be implemented in many ways, so you need to decide on, and explain, what type of ANN you intend to use: for example an auto-associator network, or a network with a hidden layer trained by backpropagation.

Related

Find smallest enclosed area of a grid

There is a grid, the edges of which are always wall.
The internal area of the grid is also divided by walls into several sub-areas, like this
1 = Wall,
0 = Empty.
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
1 1 1 1 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1 1 1 0 0 0 1
1 0 0 0 0 0 0 0 1 0 1 1 1 1 1
1 0 0 0 0 0 0 0 1 0 0 0 0 0 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I need to find the smallest empty sub-area.
How do I do it?
Telescope's suggestion is correct. Here's a slightly more detailed illustration of how you might approach this.
Given an M x N grid, you will loop over the interior (M - 2) x (N - 2) subgrid, ignoring the outer walls. When looking at a given grid cell:
if the grid cell is 0, you have not seen this area yet; begin a flood fill here that counts the number of adjacent 0s and changes them to 2 to mark them as having been seen already in some enclosed area, to avoid having to re-flood this area again later
if a grid cell is 1, it's an interior wall and should be skipped
if the grid cell is 2, you have seen this area already and can skip it
At the end, you'll have counted the area of each distinct enclosed section and can choose the biggest, smallest, or whichever you need to know.
This algorithm will visit each cell at most a few times (the worst case is a 1 surrounded by 0s, which the flood fill bumps into up to four times and the subgrid scan checks once). Therefore, the time complexity is O(MN). The algorithm uses the grid itself to keep track of what it has done so far, so no extra bookkeeping array is needed, although the flood fill itself needs a stack or queue that can grow to O(MN) in the worst case. If the grid must not be modified in place, an extra O(MN) memory can be allocated for a working copy.
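The scan-and-flood-fill steps above can be sketched as follows. This is only an illustration in Python, not code from the original answer: the function name `smallest_enclosed_area`, the use of a queue-based (breadth-first) fill, and the choice to work on a copy of the grid are all assumptions.

```python
from collections import deque

# The example grid from the question (1 = wall, 0 = empty).
EXAMPLE = [list(map(int, row.split())) for row in """
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
1 1 1 1 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1 1 1 0 0 0 1
1 0 0 0 0 0 0 0 1 0 1 1 1 1 1
1 0 0 0 0 0 0 0 1 0 0 0 0 0 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
""".strip().splitlines()]


def smallest_enclosed_area(grid):
    """Return the size of the smallest 4-connected region of 0s, or None if there is none.

    Assumes, as the question states, that the outer edge of the grid is all wall,
    so neighbours of an empty cell never fall outside the grid.
    """
    g = [row[:] for row in grid]  # working copy, so the caller's grid is untouched
    rows, cols = len(g), len(g[0])
    best = None
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if g[r][c] != 0:
                continue  # 1 = wall, 2 = already counted
            # Flood fill from this unseen empty cell, marking cells as 2.
            g[r][c] = 2
            size = 0
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                size += 1
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if g[nr][nc] == 0:
                        g[nr][nc] = 2
                        queue.append((nr, nc))
            best = size if best is None else min(best, size)
    return best


print(smallest_enclosed_area(EXAMPLE))  # 6
```

For the grid in the question, the smallest region is the six-cell pocket near the bottom right; the 3 x 3 room in the top left has nine cells.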

How to isolate specific boolean island given a coordinate on it?

Given a 2D array of Boolean islands, where 1 is land and 0 is water, let's say I want only the island that I point to with a coordinate. How would I transfer it to a new array in which everything beyond the borders of that island is water?
Here is a simple example.
I am given this 2d array
1 0 0 1 1 0
0 1 0 0 0 1
1 1 1 0 0 0
0 1 0 1 0 1
1 1 1 1 1 0
and the coordinate [1][2] (that would be the 2nd column 3rd row)
Then the final result in the new array should be something like
0 0 0 0 0 0
0 1 0 0 0 0
1 1 1 0 0 0
0 1 0 1 0 0
1 1 1 1 1 0
The pixels can only be connected either up, down, left or right to each other (no diagonals)
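One possible approach, sketched here in Python, is a flood fill from the given coordinate that copies only the reachable land cells into a fresh array. The function name `isolate_island` and the argument order (column first, then row, to mirror the [1][2] notation in the question) are assumptions made for illustration.

```python
def isolate_island(grid, col, row):
    """Return a new grid containing only the island that includes (col, row).

    Cells are connected up, down, left and right only (no diagonals).
    If the starting cell is water, the result is all water.
    """
    rows, cols = len(grid), len(grid[0])
    result = [[0] * cols for _ in range(rows)]
    if grid[row][col] != 1:
        return result

    # Depth-first flood fill; `result` doubles as the "visited" marker.
    result[row][col] = 1
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                if grid[nr][nc] == 1 and result[nr][nc] == 0:
                    result[nr][nc] = 1
                    stack.append((nr, nc))
    return result


example = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 0],
]
for line in isolate_island(example, 1, 2):
    print(*line)
```

Run on the array from the question with column 1 and row 2, this reproduces the expected result shown above.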

Divide-by-N binary clock sequence algorithm?

I'm not quite sure how to describe what I mean, so let me try to explain by example (bear with me).
When you simply increment an integer you get a binary sequence like so (let's assume 8 bits for this question):
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 1
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 1
0 0 0 0 0 1 1 0
0 0 0 0 0 1 1 1
0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 1
0 0 0 0 1 0 1 0
0 0 0 0 1 0 1 1
0 0 0 0 1 1 0 0
0 0 0 0 1 1 0 1
0 0 0 0 1 1 1 0
0 0 0 0 1 1 1 1
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 1
0 0 0 1 0 0 1 0
0 0 0 1 0 0 1 1
0 0 0 1 0 1 0 0
0 0 0 1 0 1 0 1
0 0 0 1 0 1 1 0
0 0 0 1 0 1 1 1
0 0 0 1 1 0 0 0
0 0 0 1 1 0 0 1
0 0 0 1 1 0 1 0
0 0 0 1 1 0 1 1
0 0 0 1 1 1 0 0
0 0 0 1 1 1 0 1
0 0 0 1 1 1 1 0
0 0 0 1 1 1 1 1
[ ... etc ... ]
One way to visualize this is that each column represents a "clock". Each clock/column is half the frequency of its right neighbor.
So the right-most clock has one 0 followed by one 1, repeating. The next clock has two 0s followed by two 1s, and so on.
I'm interested in a sequence of binary strings in which each clock is an integer division of its neighbor.
So the right-most clock is still one 0, one 1, the next clock is still two 0s, two 1s, but the third clock is three 0s and three 1s, etc.
Instead of /1 /2 /4 /8 /16 ... it's now /1 /2 /3 /4 /5 ....
The sequence now looks like this:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0
0 0 0 0 0 1 1 1
0 0 0 0 1 1 0 0
0 0 0 1 1 1 0 1
0 0 1 1 1 0 1 0
0 1 1 1 1 0 1 1
1 1 1 1 0 0 0 0
1 1 1 1 0 1 0 1
1 1 1 0 0 1 1 0
1 1 1 0 0 1 1 1
1 1 0 0 1 0 0 0
1 1 0 0 1 0 0 1
1 0 0 0 1 0 1 0
1 0 0 1 1 1 1 1
0 0 0 1 0 1 0 0
0 0 0 1 0 1 0 1
0 0 1 1 0 0 1 0
0 0 1 1 0 0 1 1
[ ... etc ... ]
Question: Is there an operation/algorithm which can give me the value at i given the value at i-1?
In other words, let's say I'm at the 4th step (0 0 0 0 0 1 1 1). Is there some operation I can perform on this number to get the value at the 5th step (0 0 0 0 1 1 0 0), and similarly for any other step?
In the divide-by-2 case you simply increment the number (i++) but in the divide-by-N case I can't seem to figure out a similar way to go from one to the next. Am I missing something obvious?
I've tried translating the sequence into decimal, but the pattern is 0, 1, 2, 7, 12, 29, 58, etc., which doesn't stand out to me as anything obvious.
The brute-force way that I'm doing it now is that I have an array of counters (one for each column/clock) and I independently reset each count when the respective column's "period" is reached (so 2 for the first column, 3 for the next, etc). But that feels ugly.
I'd love to do this directly on the number without requiring an array of counters. Is this even possible? Is this a known sequence? I'm not even sure what to Google to be honest. I'd appreciate any kind of leads on this. I'm happy to go down the rabbit hole with some guidance.
UPDATE
As per #borrible's observation, a given value can have more than one successor, so my original question is ambiguous as posed. I will therefore expand the question to allow i as an input (in addition to the value at step i-1).
Without knowing i, you are only going to be able to generate the successor to a given sequence if that sequence uniquely implies i (modulo the number of bit sequences). If this is not the case, the successor to a given sequence is ambiguous.
Let's consider the first few sequences for 3 bits:
0 0 0
0 0 1
0 1 0
1 1 1
1 0 0
1 0 1
0 1 0
0 1 1
Note that 0 1 0 is succeeded by both 1 1 1 and 0 1 1; i.e. it is ambiguous. Given 0 1 0 but not i you cannot deduce the next sequence. You can see a similar ambiguity in 4 bit sequences for 0 1 1 1 etc...
In other words, without knowing i, your problem is not generally solvable.
This sequence can be considered as a set of state machines, each with 2,4,6,...,16 states. The least common multiple of 2,4,6,...,16, i.e. the length of the sequence, is 1680. Eight bits only lets us represent 256 values, so even if we were allowed to select the state encoding (which we aren't!), we wouldn't be able to uniquely identify all possible states.
If we know the index i (or, as the sequence length is 1680, it is sufficient to know the index modulo 1680), digit j, counting from j = 1 at the right-most column, is given by (i mod (2 * j)) / j, using integer division.
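As a quick check of that formula, here is a small Python sketch. The function name `clock_bits` and the `width` parameter are made up for illustration; the only thing taken from the answer is the expression (i mod 2j) / j with integer division.

```python
from functools import reduce
from math import gcd


def clock_bits(i, width=8):
    """Return the bits of step i, left-most (slowest) clock first.

    Column j (1 = right-most) holds j zeros followed by j ones, repeating,
    so its value at step i is (i mod 2j) // j.
    """
    return [(i % (2 * j)) // j for j in range(width, 0, -1)]


def sequence_length(width=8):
    """Least common multiple of the column periods 2, 4, ..., 2 * width."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(2, 2 * width + 1, 2))


for i in range(6):
    print(*clock_bits(i))
# 0 0 0 0 0 0 0 0
# 0 0 0 0 0 0 0 1
# 0 0 0 0 0 0 1 0
# 0 0 0 0 0 1 1 1
# 0 0 0 0 1 1 0 0
# 0 0 0 1 1 1 0 1

print(sequence_length())  # 1680
```

The ambiguity described above also shows up directly: `clock_bits(2, 3)` and `clock_bits(6, 3)` are both `[0, 1, 0]`, yet their successors differ.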

Algorithm for 'Pogo Painter' minigame

I am working on a minigame called 'Pogo Painter', and I need some mathematical solutions. Below is an image (made with Paint) to illustrate a bit what it's all about.
Four players, each of different color, must claim squares to gain points. The minigame will be similar to this: http://www.youtube.com/watch?v=rKCQfAlaRrc, but slightly different. The players will be allowed to run around the playground and claim any of the squares, and points are gathered when a pattern is closed. For example, claiming blue square on A3 will create a closed blue pattern.
What kind of variables should I declare and how do I check if the pattern is closed?
Please answer if you have a solution :)
Here’s another (Discrete Optimization) way to model your problem.
Notation
View your grid as a ‘graph’ with n^2 nodes, and edges of length 1 (Edges connect two neighboring nodes.) Let the nodes be numbered 1:n^2. (For ease of notation, you can use a double array (x,y) to denote each node if you prefer.)
Decision Variables
There are k colors, one for each player (1 through 4). 0 is an unclaimed cell (white)
X_ik = 1 if player k has claimed node i. 0 otherwise.
To start out
X_i0 = 1 for all nodes i.
All nodes start out as white (0).
Neighboring sets: Two nodes i and j are ‘neighbors’ if they are adjacent to each other. (Any given node i can have at most 4 neighbors: Up down right and left.)
Edge variables:
We can now define a new set of edge variables Y_ijk that connect two adjacent nodes (i and j) with a common color k.
Y_ijk = 1 if neighboring nodes i and j are both of color k. 0 Otherwise.
(That is, X_ik = X_jk = 1, for non-zero k.)
We now have an undirected graph. Checking for ‘closed patterns’ is the same as detecting cycles.
Detecting Cycles:
A simple DFS will do, since we have undirected cycles. Start with each colored node i, and check for cycles. If a path leads you back to an already visited node other than the one you just came from, a cycle exists. You can award points accordingly.
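A minimal sketch of that cycle check in Python, assuming the board is stored as a 2D list of color numbers (0 for unclaimed) rather than as explicit X and Y variables. The function name `has_cycle` is hypothetical, and the sketch only reports whether a cycle exists; it does not measure its length.

```python
def has_cycle(grid, color):
    """Return True if the cells of `color` contain a cycle.

    Two cells are adjacent when they share an edge (up, down, left, right).
    """
    rows, cols = len(grid), len(grid[0])
    visited = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != color or (r, c) in visited:
                continue
            # Iterative DFS; each stack entry is (cell, cell we arrived from).
            visited.add((r, c))
            stack = [((r, c), None)]
            while stack:
                cell, parent = stack.pop()
                cr, cc = cell
                for nxt in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    nr, nc = nxt
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if grid[nr][nc] != color or nxt == parent:
                        continue
                    if nxt in visited:
                        return True  # reached a visited cell by a second route
                    visited.add(nxt)
                    stack.append((nxt, cell))
    return False


ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
print(has_cycle(ring, 1))  # True
```

Note that under this definition a solid 2 x 2 block already counts as a cycle of four nodes, which matches the four-point example in the scoring suggestion.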
Finally, one suggestion as you design the game. You can reward points according to the “longest cycle” you detect. The shortest cycle gets 4 points, one point for each edge (or one point for each node in the cycle) whichever works best for you.
1 1
1 1 scores 4 points
1 1 1
1 1 1 scores 6 points
1 1 1
1 1 1
1 1 scores 8 points
Hope that helps.
Okay, this is plenty of text, but it's simple.
An N-by-N square will serve as the game board.
Each time a player claims a square:
  If the square is not attached to any square of that player, give that square a unique ID.
  If the square is attached:
    Count how many neighbours of each ID it has (see the demos below for what this means).
    For each group:
      patterns_count += group_size - 1
    If the number of groups is more than 1:
      Change the ID of each group, as well as every other square connected to it, so they all share the same ID.
You must remember which IDs belong to which players.
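The steps above might look like this in Python. This is a rough sketch under several assumptions that are not in the original answer: the board is a 2D list `ids` with 0 for unclaimed squares, `owner` is a dict mapping each group ID to a player, the claimed square is currently unclaimed, and merging is done by a simple relabel of the whole board. The function name `claim` is hypothetical.

```python
from collections import Counter


def claim(ids, owner, r, c, player):
    """Claim square (r, c) for `player` and return how many patterns this closes.

    ids   -- 2D list of group IDs, 0 meaning unclaimed (modified in place)
    owner -- dict mapping group ID -> player (modified in place)
    """
    rows, cols = len(ids), len(ids[0])

    # Count how many neighbours of each of the player's group IDs the square has.
    touching = Counter()
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < rows and 0 <= nc < cols:
            gid = ids[nr][nc]
            if gid and owner.get(gid) == player:
                touching[gid] += 1

    if not touching:
        # Not attached to any of the player's squares: start a new group.
        new_id = max(owner, default=0) + 1
        owner[new_id] = player
        ids[r][c] = new_id
        return 0

    # Touching the same group n times closes n - 1 patterns.
    closed = sum(n - 1 for n in touching.values())

    # Merge every touched group into one ID (whole-board relabel, O(N^2)).
    keep = min(touching)
    for row in ids:
        for j, gid in enumerate(row):
            if gid in touching:
                row[j] = keep
    for gid in touching:
        if gid != keep:
            del owner[gid]
    ids[r][c] = keep
    return closed


board = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 2, 2],
    [0, 0, 2, 0, 2],
    [0, 0, 2, 2, 2],
]
owners = {1: "blue", 2: "blue"}
print(claim(board, owners, 2, 2, "blue"))  # 2
```

The relabel pass keeps the sketch short; a union-find structure would avoid rescanning the board on every merge.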
This is what you have in your example
1 1 1 0 0 0 0 2 2
1 0 0 0 1 3 3 0 0
1 1 0 0 3 3 0 0 0
0 1 0 0 4 5 0 0 0
0 0 0 6 4 0 0 0 0
7 7 0 0 0 0 8 8 8
0 7 7 0 9 8 8 0 8
A A 7 0 9 8 0 0 8
A 0 7 0 0 0 8 8 8
And this is what it would turn out like after blue grabs A-3
1 1 1 0 0 0 0 2 2
1 0 0 0 1 3 3 0 0
1 1 0 0 3 3 0 0 0
0 1 0 0 4 5 0 0 0
0 0 0 6 4 0 0 0 0
7 7 0 0 0 0 8 8 8
0 7 7 0 9 8 8 0 8
A A 7 0 9 8 0 0 8
A 0 7 0 0 8 8 8 8
More examples of the algorithm in use
1 1 1 0
1 0 1 0
1 1 0
0 0 0 0
2 neighbours. 2x'1'
1x closed pattern.
1 1 1 0
1 0 1 0
1 1 1 0
0 0 0 0
--
1 1 1 0 0
1 0 1 0 0
1 1 0 0
1 0 1 0 0
1 1 1 0 0
3 neighbours: 3x'1'
2x closed patterns
1 1 1 0 0
1 0 1 0 0
1 1 1 0 0
1 0 1 0 0
1 1 1 0 0
--
1 1 1 0 0
1 0 1 0 0
1 1 2 2
0 0 2 0 2
0 0 2 2 2
4 neighbours: 2x'1', 2x'2'
2 Closed patterns
1 1 1 0 0
1 0 1 0 0
1 1 1 1 1
0 0 1 0 1
0 0 1 1 1
But I also consider these a closed pattern. You haven't given any description as to what should be considered one and what shouldn't be.
1 1 0
1 1 0
0 0 0
1 1 1
1 1 1
0 0 0
1 1 1
1 1 1
1 1

Binary Matrix SQL

I want to store a binary matrix into a database.
Matrix example:
0 1 0 0 1 1 0 1
0 1 0 0 0 1 0 0
0 1 1 0 0 1 1 1
0 1 0 0 1 1 0 0
0 1 0 0 0 1 0 0
0 1 1 0 0 1 0 0
0 1 0 1 0 0 0 0
0 1 0 0 1 1 1 0
Which is the best way to do that?
Note: I don't know much about databases.
Thanks.
You can store it as varchar(n^2). If you don't need to work on it in the DB, you can also store each row of the matrix as its integer value; for example, your first row (0 1 0 0 1 1 0 1) will be 77.
Or, if you know the number of digits in each row and column, you can save the whole matrix as one decimal (11135377233198751900) or hexadecimal (9A88CE9888C8A09C) number.
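To make the two encodings concrete, here is a small Python sketch. The function names are hypothetical, and the bit order is an assumption: rows are read left to right with the left-most cell as the most significant bit, and rows are concatenated top to bottom. With that convention the first row gives 77, as in the answer, and the whole matrix packs to 4D44674C4464504E in hexadecimal. The single-number figures quoted above equal this value shifted left by one bit, so whichever convention you pick has to be used consistently for both writing and reading.

```python
MATRIX = [
    [0, 1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1, 0],
]


def rows_to_ints(matrix):
    """Encode each row as one integer (left-most cell = most significant bit)."""
    return [int("".join(str(bit) for bit in row), 2) for row in matrix]


def matrix_to_int(matrix):
    """Encode the whole matrix as one integer, rows concatenated top to bottom."""
    return int("".join(str(bit) for row in matrix for bit in row), 2)


def int_to_matrix(value, n_rows, n_cols):
    """Decode an integer produced by matrix_to_int back into a matrix."""
    bits = format(value, "b").zfill(n_rows * n_cols)
    return [[int(b) for b in bits[r * n_cols:(r + 1) * n_cols]]
            for r in range(n_rows)]


print(rows_to_ints(MATRIX))               # [77, 68, 103, 76, 68, 100, 80, 78]
print(format(matrix_to_int(MATRIX), "X"))  # 4D44674C4464504E
```

Either form needs the matrix dimensions stored alongside it (or fixed by convention), because leading zeros are lost when the bits are turned into a number.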
