I need to find the largest square of 1's in a giant file full of 1's and 0's. I know I have to use dynamic programming. I am storing it in a 2D array. Any help with the algorithm to find the largest square would be great, thanks!
example input:
1 0 1 0 1 0
1 0 1 1 1 1
0 1 1 1 1 1
0 0 1 1 1 1
1 1 1 1 1 1
answer:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
My code so far:
int Square (Sq[int x][int y]) {
    if (Sq[x][y]) == 0) {
        return 0;
    }
    else {
        return 1+MIN( Sq(X-1,Y), Sq(X,Y-1), Sq(X-1,Y-1) );
    }
}
(assuming values already entered into the array)
int main() {
    int Sq[5][6]; //5,6 = bottom right corner
    int X = Square(Sq[5][6]);
}
How do I go on from there?
Here is a sketch of the solution:
For each of the cells we will keep a counter of how big a square can be made using that cell as top left. Clearly all cells with 0 will have 0 as the count.
Start iterating from bottom right cell and go to bottom left, then go to one row up and repeat.
At each scan do this:
If the cell has 0 assign count=0
If the cell has 1 and is an edge cell (bottom or right edge only), assign count=1
For all other cells, check the count of the cell on its right, right-below, and below. Take the min of them and add 1 and assign that to the count. Keep a global max_count variable to keep track of the max count so far.
At the end of traversing the matrix, max_count will have the desired value.
Complexity is no more than the cost of traversing the matrix.
This is what the matrix will look like after the traversal. Values in parentheses are the counts, i.e. the biggest square that can be made using the cell as top left.
1(1) 0(0) 1(1) 0(0) 1(1) 0(0)
1(1) 0(0) 1(4) 1(3) 1(2) 1(1)
0(0) 1(1) 1(3) 1(3) 1(2) 1(1)
0(0) 0(0) 1(2) 1(2) 1(2) 1(1)
1(1) 1(1) 1(1) 1(1) 1(1) 1(1)
Implementation in Python
def max_size(mat, ZERO=0):
    """Find the side of the largest square of non-ZERO cells in the matrix `mat`."""
    nrows, ncols = len(mat), (len(mat[0]) if mat else 0)
    if not (nrows and ncols):
        return 0  # empty matrix or rows
    counts = [[0] * ncols for _ in range(nrows)]
    for i in reversed(range(nrows)):  # for each row
        assert len(mat[i]) == ncols  # matrix must be rectangular
        for j in reversed(range(ncols)):  # for each element in the row
            if mat[i][j] != ZERO:
                counts[i][j] = (1 + min(
                    counts[i][j+1],    # east
                    counts[i+1][j],    # south
                    counts[i+1][j+1]   # south-east
                )) if i < (nrows - 1) and j < (ncols - 1) else 1  # edges
    return max(c for rows in counts for c in rows)
LSBRA(X,Y) means "Largest Square with Bottom-Right At X,Y"
Pseudocode:
LSBRA(X,Y):
    if (x,y) == 0:
        0
    else:
        1+MIN( LSBRA(X-1,Y), LSBRA(X,Y-1), LSBRA(X-1,Y-1) )
(For edge cells, you can skip the MIN part and just return 1 if (x,y) is not 0.)
Work diagonally through the grid in "waves", like the following:
    0 1 2 3 4
  +----------
0 | 1 2 3 4 5
1 | 2 3 4 5 6
2 | 3 4 5 6 7
3 | 4 5 6 7 8
or alternatively, work through left-to-right, top-to-bottom, as long as you fill in edge cells.
    0 1 2 3 4
  +----------
0 | 1 2 3 4 5
1 | 6 7 8 9 .
2 | . . . . .
3 | . . . . .
That way you'll never run into a computation where you haven't previously computed the necessary data - so all of the LSBRA() "calls" are actually just table lookups of your previous computation results (hence the dynamic programming aspect).
Why it works
In order to have a square with a bottom-right at X,Y - it must contain the overlapping squares of one less dimension that touch each of the other 3 corners. In other words, to have
XXXX
XXXX
XXXX
XXXX
you must also have...
XXX. .XXX .... ....
XXX. .XXX XXX. ....
XXX. .XXX XXX. ....
.... .... XXX. ...X
As long as you have those 3 (each of the LSBRA checks) N-size squares plus the current square is also "occupied", you will have an (N+1)-size square.
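A minimal Python sketch of the left-to-right, top-to-bottom fill described above, assuming the grid is a rectangular list of lists holding 0/1 values. The function name `largest_square_side` and the variable names are my own, not from the answer.

```python
def largest_square_side(grid):
    """Side length of the largest all-1 square, using the LSBRA recurrence."""
    if not grid or not grid[0]:
        return 0
    nrows, ncols = len(grid), len(grid[0])
    # lsbra[y][x] = side of the largest square whose bottom-right cell is (y, x)
    lsbra = [[0] * ncols for _ in range(nrows)]
    best = 0
    for y in range(nrows):        # top to bottom
        for x in range(ncols):    # left to right
            if grid[y][x] == 0:
                continue          # stays 0
            if y == 0 or x == 0:
                lsbra[y][x] = 1   # edge cell: no neighbours to look up
            else:
                lsbra[y][x] = 1 + min(lsbra[y - 1][x],
                                      lsbra[y][x - 1],
                                      lsbra[y - 1][x - 1])
            best = max(best, lsbra[y][x])
    return best


grid = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
print(largest_square_side(grid))  # 4
```

Every lookup on the right-hand side refers to a cell that was filled earlier in the scan, which is what makes the plain nested loop sufficient.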
The first algorithm that comes to my mind is:
'&&' column/row 1 with column/row 2; that is to say, do an '&&' operation between each entry and its corresponding entry in the other column/row.
Check the resulting column; if it contains a run of two consecutive 1's, we have hit a 2x2 square.
'&&' the next column with the result of the first two. If the result contains a run of three consecutive 1's, we have hit a 3x3 square.
Repeat until all columns have been used.
Repeat 1-4 starting at column 2.
I won't show you the implementation as it's quite straightforward and your problem sounds like homework. Additionally, there are likely much more efficient ways to do this, as this will become slow if the input is very large.
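For readers who want to experiment with the idea anyway, here is one possible reading of the steps above as a Python sketch. It assumes a rectangular list of lists of 0/1 values and works column by column; the name `largest_square_by_anding` is hypothetical.

```python
def largest_square_by_anding(mat):
    """Side of the largest all-1 square, found by AND-ing adjacent columns."""
    nrows = len(mat)
    ncols = len(mat[0]) if mat else 0
    best = 0
    for start in range(ncols):
        acc = [1] * nrows                      # running AND of the columns so far
        for width in range(1, ncols - start + 1):
            col = start + width - 1
            acc = [a and mat[r][col] for r, a in enumerate(acc)]
            # longest vertical run of 1's in the AND-ed column
            run = longest = 0
            for value in acc:
                run = run + 1 if value else 0
                longest = max(longest, run)
            if longest < width:
                break                          # runs only shrink as more columns are AND-ed
            best = max(best, width)
    return best
```

A run of `width` consecutive 1's in the AND of `width` adjacent columns is exactly a `width` x `width` square, which is why the early `break` is safe.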
Let the input matrix be M: n x m.
T[i][j] is the DP matrix that holds the side of the largest square whose bottom-right corner is at (i,j).
General rule to fill the table:
if (M[i][j] == 1) {
    int v = min(T[i][j-1], T[i-1][j]);
    v = min(v, T[i-1][j-1]);
    T[i][j] = v + 1;
}
else
    T[i][j] = 0;
The resulting square size is the max value in T.
Filling T[i][0] and T[0][j] is trivial.
I am not sure if this algorithm can be used for your huge file,
but you don't need to store the entire matrix T, only the current and previous rows.
The following notes can help to understand the general idea:
all squares of size s with bottom-right corners at (i-1, j), (i, j-1), (i-1, j-1) are inside the square of size s+1 with bottom-right corner at (i, j).
if there is a square of size s+1 with bottom-right corner at (i, j), then the size of the maximal square with bottom-right corner at each of (i-1, j), (i, j-1), (i-1, j-1) is at least s.
The opposite is also true. If the size of at least one of the maximal squares with bottom-right corner at (i-1, j), (i, j-1), (i-1, j-1) is less than s, then the size of the square with bottom-right corner at (i, j) cannot reach s+1.
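The remark about keeping only the current and previous rows can be sketched as follows. This is a hedged illustration, assuming the rows arrive one at a time (for example, parsed from a file) and all have the same length; `largest_square_streaming` is a name I made up.

```python
def largest_square_streaming(rows):
    """Largest all-1 square side, keeping only two rows of the DP table."""
    prev = []   # DP row for the previous input row (empty before the first row)
    best = 0
    for row in rows:
        cur = [0] * len(row)
        for j, cell in enumerate(row):
            if cell != 1:
                continue
            if j == 0 or not prev:
                cur[j] = 1      # first column or first row
            else:
                cur[j] = 1 + min(cur[j - 1], prev[j], prev[j - 1])
            best = max(best, cur[j])
        prev = cur
    return best


# Rows can be produced lazily, e.g. from lines of whitespace-separated digits.
lines = [
    "1 0 1 0 1 0",
    "1 0 1 1 1 1",
    "0 1 1 1 1 1",
    "0 0 1 1 1 1",
    "1 1 1 1 1 1",
]
print(largest_square_streaming([int(t) for t in line.split()] for line in lines))  # 4
```

Memory use is proportional to the row width rather than to the whole file, which is the point of the observation above.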
OK, the simplest, though most inefficient, way would be:
select first item. check if 1, if so you have a 1x1 square.
check one below and one to right, if 1, then check row 2 col 2, if 1, 2x2 square.
check row 3 col 1, col 2 and col 3, plus row 1 col 3, row 2 col 3, if 1, 3x3.
So basically you keep expanding the row and col together and check all the cells inside their boundaries. As soon as you hit a 0, it's broken, so you move along 1 point in a row, and start again.
At end of row, move to next row.
until the end.
You can probably see how those fit into while loops etc, and how &&s can be used to check for the 0s, and as you look at it, you'll perhaps also notice how it can be sped up. But as the other answer just mentioned, it does sound a little like homework so we'll leave the actual code up to you.
Good luck!
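For comparison with the dynamic programming answers, the expanding check described above might look like this in Python. It is only a sketch under the assumption of a rectangular list of lists of 0/1 values, and `largest_square_brute` is a hypothetical name; it is useful mainly as a slow reference to test faster versions against.

```python
def largest_square_brute(mat):
    """Grow a square from every cell until a 0 (or the border) stops it."""
    nrows = len(mat)
    ncols = len(mat[0]) if mat else 0
    best = 0
    for r in range(nrows):
        for c in range(ncols):
            size = 0
            # try to extend the square with top-left (r, c) by one row and one column
            while r + size < nrows and c + size < ncols:
                k = size
                new_row_ok = all(mat[r + k][c + j] for j in range(k + 1))
                new_col_ok = all(mat[r + i][c + k] for i in range(k + 1))
                if not (new_row_ok and new_col_ok):
                    break
                size += 1
            best = max(best, size)
    return best
```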
The key here is that you can keep track of the root of the area instead of the actual area, using dynamic programming.
The algorithm is as follows:
Store a 2D array of ints called max-square, where the element at index i,j represents the side length of the largest square that has i,j as its bottom right corner. (If max-square[i, j] = 2, it means that index i,j is the bottom right corner of a square of size 2^2 = 4.)
For each index i,j:
if at i,j the element is 0, then set max-square i,j to 0.
else:
Find the minimum of max-square[i - 1, j], max-square[i, j - 1] and max-square[i - 1, j - 1]. Set max-square[i, j] to 1 + the minimum of the 3. Inductively, you'll end up filling in the max-square array. Find, or keep track of, the maximum value in the process and return that value^2.
Take a look at these solutions people have proposed:
https://leetcode.com/discuss/questions/oj/maximal-square?sort=votes
Let N be the number of cells in the 2D array. There exists a very efficient algorithm to list all the maximum empty rectangles. The largest empty square is inside one of these empty rectangles, and finding it is trivial once the list of the maximum empty rectangles has been computed. A paper presenting an O(N) algorithm to create such a list can be found at www.ulg.ac.be/telecom/rectangles as well as source code (not optimized). Note that a proof exists (see the paper) that the number of largest empty rectangles is bounded by N. Therefore, selecting the largest empty square can be done in O(N), and the overall method is also O(N). In practice, this method is very fast. The implementation is very easy to do, since the whole code should not be more than 40 lines of C (the algorithm to list all the maximum empty rectangles takes about 30 lines of C).
Say I have a n by n matrix that has each cell as either a 0 or a 1. There is a list of commands that feed into a program. These commands specify an operation (shift, rotate, flip) and a value x, to specify the magnitude of the operation. The operation will only move cells with a 1 in it. A "UP 3" operation would cause all cells with "1" to shift up by 3.
In the case where there are multiple operations that must be sequentially applied to the matrix, for optimization, what I can do is combine contiguous operations of the same type. As in (up, down, left, right) would all be the same type (shift). CW and CCW rotations would be the same type. And Flip in the x or y directions would be the same type. I can combine multiple operations of the same type if they happen one after another. (i.e. UP 3, DOWN 2 -> results in a net UP 1). I want to know if there is a way to do a single "net" operation by combining operations of different types.
So I want to know, in my operations list, if I have for example, 1 UP, 1 CW, 3 RIGHT, 1 Y FLIP, 2 DOWN.
Instead of doing the above 5 instructions in 5 "moves", is there a mathematical/programming way to combine those into a single instruction for a square n by n matrix?
Must be do-able in C.
In the comments, there is some confusion about the terminology, so let's clarify the terms.
Let's say you have a set of nine unique operations: UP, DOWN, LEFT, RIGHT, ROTATE+90, ROTATE+180, ROTATE+270, HFLIP, and VFLIP. You have implemented each of these as a separate function, copying the contents of an N-by-N matrix to a new matrix.
(Also note that you cannot shift, rotate, or flip/mirror "only ones" in a matrix with only ones and zeros in it. If you try, the effect is the same as when you apply the operation to both ones and zeros. Try it, and you'll see.)
If I've understood correctly, the question can be restated as
Is there a way to combine any sequence of those operations, so that I only need one double loop, rather than a double loop for each individual operation?
The answer is Yes, but the implementation is tricky, and there are two different ways the shifts (UP, DOWN, LEFT, RIGHT) can operate, depending on whether elements that are "shifted out" appear from the other edge, or are lost. As an example, consider a RIGHT operation on a 4×4 matrix:
   Horizontally periodic               Nonperiodic
0 0 1 0       0 0 0 1       0 0 1 0       0 0 0 1
0 1 1 1   ⇒   1 0 1 1       0 1 1 1   ⇒   0 0 1 1
0 0 1 0       0 0 0 1       0 0 1 0       0 0 0 1
0 0 0 0       0 0 0 0       0 0 0 0       0 0 0 0
The matrix can be nonperiodic, horizontally periodic, vertically periodic, or fully periodic (i.e., both horizontally and vertically periodic).
For simplicity, let's assume nonperiodic boundaries: The ones are lost, if they are shifted outside the matrix. Zeros are always "shifted in". (This is the easiest case.)
Any combination of the nine unique operations, in any order, even repeated, can be implemented using a single double loop:
For newrow = 1 to N:
    row = oldrow
    col = oldcol
    For newcol = 1 to N:
        If (row >= 1) and (row <= N) and
           (col >= 1) and (col <= N) and
           (oldmatrix[row][col] == 1):
            newmatrix[newrow][newcol] = 1
        Else:
            newmatrix[newrow][newcol] = 0
        End if
        row = row + colsteprow
        col = col + colstepcol
    End For
    oldrow = oldrow + rowsteprow
    oldcol = oldcol + rowstepcol
End For
As I said, this is a bit tricky. Think of superimposing the old and new matrices, but with the sequence of operations applied, and extending the old matrix (row and column numbers) to infinity. (Thus, the row and column numbers in the old matrix are no longer limited to 1..N, but can be any integer.)
You need to identify the row and column numbers (in the old matrix), that correspond to three elements in the new matrix: the first column in the first row, the second column in the first row, and the first column in the second row.
Let row11 and col11 identify the element in the old, infinite matrix, that coincides with the first column, first row in the new matrix; row12 and col12 the element that coincides with the second column, first row in the new matrix; and row21, col21 the element that coincides with the first column, second row in the new matrix.
(In other words, you only need to track how your operations relocate/rotate/mirror these three cells in the old matrix. Hint: reverse.)
Then, at the beginning of the above pseudocode, set oldrow = row11, oldcol = col11, colsteprow = row12 - row11, colstepcol = col12 - col11, rowsteprow = row21 - row11, and rowstepcol = col21 - col11.
Essentially, colsteprow and colstepcol form an integer vector that tells us how an increment in the column number in the new matrix, corresponds to a change of row and/or column in the old matrix. (One of them will always be zero, and the other either +1 or -1.)
Similarly, rowsteprow and rowstepcol tells us how an increment in the row number in the new matrix corresponds to a change in row and/or column numbers in the old matrix.
If your matrix is horizontally periodic, then you wrap oldcol == 0 to oldcol = N, and oldcol == N+1 to oldcol = 1, and drop the If check for oldcol. Similarly, if your matrix is vertically periodic, then you wrap oldrow == 0 to oldrow = N, and oldrow == N+1 to oldrow = 1, and you drop the If check for oldrow.
If you find it hard to fathom how this works, get a transparent sheet of something, and draw a small grid on it, labeling each cell with the row and column numbers. On a paper, draw a small matrix, with same sized cells. The operations shift, rotate, and mirror the transparent sheet on top of the paper. Experiment and observe.
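The single double loop can be tried out with a small Python sketch. It assumes nonperiodic boundaries and takes the three anchor positions (row11, col11), (row12, col12), (row21, col21) as 1-based coordinates in the old matrix, as in the description above; the function name `apply_combined` is my own. The usage example reproduces the nonperiodic RIGHT shift shown earlier.

```python
def apply_combined(old, row11, col11, row12, col12, row21, col21):
    """Build the new matrix in one double loop from three anchor cells.

    Anchors are 1-based (row, col) positions in the old matrix and may lie
    outside 1..N, since the old matrix is treated as extending to infinity.
    """
    n = len(old)
    colsteprow, colstepcol = row12 - row11, col12 - col11   # step per new column
    rowsteprow, rowstepcol = row21 - row11, col21 - col11   # step per new row
    new = [[0] * n for _ in range(n)]
    oldrow, oldcol = row11, col11
    for newrow in range(n):
        row, col = oldrow, oldcol
        for newcol in range(n):
            # cells outside the old matrix read as 0 (nonperiodic boundaries)
            if 1 <= row <= n and 1 <= col <= n and old[row - 1][col - 1] == 1:
                new[newrow][newcol] = 1
            row += colsteprow
            col += colstepcol
        oldrow += rowsteprow
        oldcol += rowstepcol
    return new


old = [
    [0, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
# RIGHT 1: new cell (1,1) reads old (1,0), new (1,2) reads old (1,1), new (2,1) reads old (2,0)
for line in apply_combined(old, 1, 0, 1, 1, 2, 0):
    print(line)
```

Working out the three anchors for a whole sequence of operations (by tracking the operations in reverse, as hinted above) is left out of this sketch.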
I have a square 2D array,
such as:
(3x3)        (4x4)
1 2 3   or   1  2  3  4
8 9 4        12 13 14 5
7 6 5        11 16 15 6
             10 9  8  7
Given a value and the array size, I am trying to find the Y, X position of that value in the 2D array.
Example:
>> find_y_x_in_snail(3, 4)
1, 2
# in a 3x3 array, search value 4
return y=1 x=2
The only idea I have is to create the snail in a 2D array and return the position, which is not that great.
I found the opposite algorithm here (first example).
Any ideas?
You could use this function:
def find_y_x_in_snail(n, v):
    r = 0
    span = n
    while v > span:
        v -= span
        r += 1
        span -= r%2
    d, m = divmod(r,4)
    c = n-1-d
    return [d, d+v, c, c-v][m], [d+v-1, c, c-v, d][m] # y, x
Explanation
r is the number of corners the "snake" needs to take to get to the targeted value.
span is the number of values in the current, straight segment of the snake, i.e. it starts with n, and decreases after the next corner, and again every second next corner.
d is the distance the snake has from the nearest side of the matrix, i.e. the "winding" level.
m indicates which of the 4 sides the segment -- containing the targeted value -- is at:
0: up
1: right
2: down
3: left
Depending on m a value is taken from a list with 4 expressions, each one tailored for the corresponding side: it defines the y coordinate. A similar method is applied (but with different expressions) for x.
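To check a closed-form function like the one above on small sizes, the brute-force approach mentioned in the question (actually walking the snail) can serve as a reference. The sketch below is an illustration only; `snail_positions` is a hypothetical helper that returns a dict mapping each value to its 0-based (y, x) position.

```python
def snail_positions(n):
    """Walk a clockwise snail over an n x n grid; map value -> (y, x)."""
    pos = {}
    seen = set()
    y, x = 0, 0
    dy, dx = 0, 1                     # start by moving right along the top row
    for value in range(1, n * n + 1):
        pos[value] = (y, x)
        seen.add((y, x))
        ny, nx = y + dy, x + dx
        if not (0 <= ny < n and 0 <= nx < n) or (ny, nx) in seen:
            dy, dx = dx, -dy          # turn clockwise: right, down, left, up
            ny, nx = y + dy, x + dx
        y, x = ny, nx
    return pos


print(snail_positions(3)[4])  # (1, 2)
```

Comparing this table against the formula for every value and a range of sizes is a cheap way to catch off-by-one mistakes.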
Let us assume that we have a two dimensional array A (n X n). All elements of A are either 0 or 1. We also have a given integer K. Our task is to find the number of all possible "rectangles" in A, which contain elements with total sum K.
To give an example, if A =
0 0 1 0
1 0 0 1
1 1 1 1
1 0 0 1
and k=3, then

0 0 1 0
1 0 0 1
holds the property,

1 1 1
holds the property,

1 1 1
holds the property,

0 0
1 0
1 1
holds the property,

1 1
1 0
holds the property,

1 1
0 1
holds the property,

1
1
1
holds the property,

1
1
1
holds the property.
So unless I missed something, the answer should be 8 for this example.
In other words, we need to check all possible rectangles in A to see if the sum of their elements is K. Is there a way to do it faster than O(n^2 * k^2) ?
You could do this in O(n^3).
First note that a summed area table allows you to compute the sum of any rectangle in O(1) time given O(n^2) preprocessing time.
In this problem we only need to sum the columns, but the general technique is worth knowing.
Then for each start row and end row combination you can do a linear scan across the matrix to count the solutions either with a two pointers approach or simply by storing the previous sums.
Example Python code (finds 14 solutions to your example):
from collections import defaultdict

A = [[0, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 1, 1, 1],
     [1, 0, 0, 1]]
k = 3
h = len(A)
w = len(A[0])
C = [[0] * w for i in range(h + 1)]
for x in range(w):
    for y in range(1, h + 1):
        C[y][x] = C[y-1][x] + A[y-1][x]
# C[y][x] contains sum of all values A[y2][x] with y2<y
count = 0
for start_row in range(h):
    for end_row in range(start_row, h):
        D = defaultdict(int)  # Key is sum of columns from start to here, value is count
        D[0] = 1
        t = 0  # Sum of all A[y][x] for x <= col, start_row<=y<=end_row
        for x in range(w):
            t += C[end_row+1][x] - C[start_row][x]
            count += D[t-k]
            D[t] += 1
print(count)
I think it's worse than you calculated. I found a total of 14 rectangles with three 1's (green squares). The method I used was to take each {row,column} position in the array as the upper-left of a rectangle, and then consider every possible combination of width and height.
Since the width and height are not constrained by k (at least not directly), the search time is O(n^4). Of course, for any given {row,column,width}, the search ends when the height is such that the sum is greater than k. But that doesn't change the worst case time.
The three starting points in the lower-right need not be considered because it's not possible to construct a rectangle containing k 1's starting from those positions. But again, that doesn't change the time complexity.
Note: I'm aware that this is more of a comment than an answer. However, it doesn't fit in a comment, and I believe it's still useful to the OP. You can't solve a problem until you fully understand it.
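The exhaustive O(n^4) enumeration can be written compactly with a summed area table, which also gives an independent check of the count of 14. This is a sketch only; the function name `count_rectangles_with_sum` and the table name `S` are my own choices.

```python
def count_rectangles_with_sum(A, k):
    """Count axis-aligned sub-rectangles of A whose elements sum to k."""
    h, w = len(A), len(A[0])
    # S[y][x] = sum of A[y2][x2] for all y2 < y and x2 < x (summed area table)
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            S[y + 1][x + 1] = A[y][x] + S[y][x + 1] + S[y + 1][x] - S[y][x]
    count = 0
    for y1 in range(h):
        for y2 in range(y1 + 1, h + 1):
            for x1 in range(w):
                for x2 in range(x1 + 1, w + 1):
                    # O(1) sum of rows y1..y2-1, columns x1..x2-1
                    total = S[y2][x2] - S[y1][x2] - S[y2][x1] + S[y1][x1]
                    if total == k:
                        count += 1
    return count


A = [[0, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 1, 1, 1],
     [1, 0, 0, 1]]
print(count_rectangles_with_sum(A, 3))  # 14
```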
I have two arrays of decimal numbers, colSum and rowSum, and I want to build a matrix of binary values from them. rowSum holds the number of 1's in each row, and colSum holds the number of 1's in each column.
So, if you have
rowSum = [0,1,2]
colSum = [1,1,1]
you have to build the following array
matrix = [
[0,0,0],
[0,0,1],
[1,1,0]
]
I'm using this method in PHP; it works for a 3x3 matrix, but not for a bigger one, like 8x8.
First, fill in the 1's in each row using the rowSum value.
Then, look for two columns with a wrong sum and, using a pivot, swap a 1 with a zero between them in the same row, until I get the correct colSum values.
But this does not work, because I need some criterion to control which 1 and 0 to swap in the same row for the two columns...
This is the method I'm using.
Let's say we have this Matrix (N=3 -> NxN):
0 0 0
0 0 1
1 1 0
then we have the following arrays
R0 = {0,1,2} //--> result of sums of each rows: ( 0+0+0, 0+0+1 , 1+1+0 )
C0 = {1,1,1} // ->sums of each columns
Step 1
Create and fill a NxN array using as many 1's as R0(i) in each row:
0 0 0
1 0 0
1 1 0
compute sums of this new matrix now:
R1 = {0,1,2}
C1 = {2,1,0}
Step 2
Check whether every column sum of the created matrix has the same value as in C0 (the original):
for ( i=0, N-1) do
    if C0(i)!=C1(i) then
        ReplaceColumn(i)
    end
end
To replace a column we have to dig into the conditions.
C0(0) = 1 != C1(0) = 2
the first column sum meets the condition to call the replace, so
Step 3
Choose criteria for applying the branch & bound method and find the best row in which to change the column so that the global condition (all column sums) is satisfied.
The number of changes needed for a difference between column sums is:
|C0(i)-C1(i)|
for this example, |C0(0)-C1(0)| = 1 change.
The condition for going back should be that the change generates a greater total difference between the column sums:
Σ_{i=0}^{N-1} |C0(i)-C1(i)|
So, could this method really work?
Is the goal to construct the matrix that satisfies the row and column sums, or a matrix that satisfies them? It's not clear from the question, but if it's the former ("the" case) then it's not going to be possible.
Suppose it were the case that you could uniquely represent any m × m matrix of bits in this way. Then consider the following hypothetical compression algorithm.
Take 2^(2n) bits of data
Treat it as 2^n × 2^n bits
To describe the data, use 2 × 2^n row and column sums, each using about log2(2^n) = n bits
The data is compressed to 2 × n × 2^n bits
Since 2 × n × 2^n << 2^(2n) and this process could just keep being repeated, the supposition that you can uniquely represent any m × m matrix of bits by only its row and column sums is false.
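The counting argument can also be seen on the smallest possible case. The Python sketch below (the helper name `row_col_sums` is mine) shows two different 2 × 2 matrices that share the same row and column sums, so the sums alone cannot identify "the" matrix.

```python
def row_col_sums(matrix):
    """Return (row sums, column sums) of a rectangular 0/1 matrix."""
    rows = [sum(r) for r in matrix]
    cols = [sum(c) for c in zip(*matrix)]
    return rows, cols


a = [[1, 0],
     [0, 1]]
b = [[0, 1],
     [1, 0]]
print(row_col_sums(a))                      # ([1, 1], [1, 1])
print(row_col_sums(a) == row_col_sums(b))   # True, although a != b
```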
Hey guys, I have a question.
If given a four-dimensional array in FORTRAN and told to find the location of a certain element of it (with a starting location of 200 and 4 bytes per integer), is there a formula to find the location if it is stored in row-major or column-major order?
Basically, given array A(x:X, y:Y, z:Z, q:Q) and told to find the location of A(a,b,c,d), what is the formula for finding the location?
This comes up all the time when using C libraries with Fortran -- eg, calling MPI routines trying to send particular subsets of Fortran arrays.
Fortran is column-major, or more usefully, the first index moves fastest. That is, the item after A(1,2,3,4) in linear order in memory is A(2,2,3,4). So in your example above, an increase in a by one is a jump of 1 index in the array; a jump in b by one corresponds to a jump of (X-x+1); a jump in c by one corresponds to a jump of (X-x+1)×(Y-y+1), and a jump in d by one is a jump of (X-x+1)×(Y-y+1)×(Z-z+1). In C-based languages, it would be just the opposite; a jump of 1 in the d index would move you 1 index in memory; a jump in c would be a jump of (Q-q+1), etc.
If you have m indices, and n_i is the (zero-based) value of the ith index from the left, and that index has a range of N_i, then the (zero-based) index from the starting position, for Fortran's ordering, is something like this:
$$\text{offset} = \sum_{i=1}^{m} n_i \prod_{j=1}^{i-1} N_j$$
where the product is 1 if the upper index is less than the lower index. To find the number of bytes from the start of the array, you'd multiply that by the size of the object, e.g. 4 bytes for 32-bit integers.
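The stride description above can be sketched in Python. This is an illustration under stated assumptions: the "starting location of 200" from the question is treated as the byte address of the first element, elements are 4 bytes, and the function name `element_address` and its parameters are hypothetical.

```python
def element_address(index, lower, upper, base=200, elem_size=4, column_major=True):
    """Byte address of A(index) for an array declared A(lower[0]:upper[0], ...)."""
    extents = [hi - lo + 1 for lo, hi in zip(lower, upper)]   # N_i
    zero_based = [i - lo for i, lo in zip(index, lower)]      # n_i
    if not column_major:
        # row-major: the last index moves fastest, so walk the dimensions in reverse
        extents.reverse()
        zero_based.reverse()
    offset, stride = 0, 1
    for n_i, N_i in zip(zero_based, extents):
        offset += n_i * stride
        stride *= N_i
    return base + offset * elem_size


lower, upper = (1, 1, 1, 1), (2, 3, 4, 5)
print(element_address((2, 1, 1, 1), lower, upper))                      # 204
print(element_address((1, 1, 1, 2), lower, upper))                      # 296
print(element_address((1, 1, 1, 2), lower, upper, column_major=False))  # 204
```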
Been over 25 years since I did any FORTRAN.
I believe FORTRAN, unlike many other languages, lays arrays out in
column major order. That means the leftmost index is the
one that changes most frequently when processing a multi
dimensional array in linear order. Once
the maximum dimension of the leftmost index is reached, set it back to 1, assuming 1 based
indexing, and increment the next level index by 1 and start the process over again.
To calculate the index configuration for any given address offset
you need to know the value of each of the 4 array dimensions. Without this
you can't do it.
Example:
Suppose your array has dimensions 2 by 3 by 4 by 5. This implies a
total of 2 * 3 * 4 * 5 = 120 cells in the matrix. You want the index corresponding
to the 200th byte.
This would be the (200 / 4) - 1 = 49th cell (this assumes 4 bytes per cell and offset zero
is the first cell).
First observe how specific indices translate into offsets...
What cell number does the element X(1,1,1,1) occur at? Simple answer: 1
What cell number does element X(1, 2, 1, 1) occur at? Since we cycled through
the leftmost dimension it must be that dimension plus 1. In other words,
2 + 1 = 3. How about X(1, 1, 2, 1)? We cycled through the first two dimensions
which is 2 * 3 = 6 plus 1 to give us 7. Finally X(1, 1, 1, 2) must be:
2 * 3 * 4 = 24 plus 1 gives the 25th cell.
Notice that the next rightmost index does not increment until the cell number
exceeds the product of the dimensions to its left. Using this observation you can
calculate the indices for any given cell number by working from the rightmost
index to the leftmost as follows:
The rightmost index increments every (2 * 3 * 4 = 24) cells. 24 goes into 49 (the cell number
we want to find the indexing for) twice,
leaving 1 left over. Add 1 (for 1-based indexing) and that gives us a rightmost
index value of 2 + 1 = 3. The next index (moving left) changes every (2 * 3 = 6) cells. 6 goes into 1
zero times, which gives us index 0 + 1 = 1. The next index changes every 2 cells. 2 goes into 1 zero
times, giving an index value of 1. For the last (leftmost) index just add 1 to whatever is
left over: 1 + 1 = 2. This gives us the following reference: X(2, 1, 1, 3).
Double check by working it back to an offset:
(2 - 1) + ((1 - 1) * 2) + ((1 - 1) * 2 * 3) + ((3 - 1) * 2 * 3 * 4) = 49.
Just change the numbers and use the same process for any number of dimensions
and/or offsets.
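The same procedure can be written as a short Python sketch, assuming column-major layout, 1-based indices and a zero-based cell number as in the example. The function names `cell_to_indices` and `indices_to_cell` are hypothetical. Working from the leftmost (fastest) index with repeated division gives the same result as the right-to-left description.

```python
def cell_to_indices(cell, dims):
    """Convert a zero-based cell number to 1-based column-major indices."""
    indices = []
    for extent in dims:                 # leftmost dimension varies fastest
        cell, remainder = divmod(cell, extent)
        indices.append(remainder + 1)   # +1 for 1-based indexing
    return tuple(indices)


def indices_to_cell(indices, dims):
    """Inverse mapping: 1-based column-major indices to a zero-based cell number."""
    cell, stride = 0, 1
    for index, extent in zip(indices, dims):
        cell += (index - 1) * stride
        stride *= extent
    return cell


dims = (2, 3, 4, 5)
print(cell_to_indices(49, dims))            # (2, 1, 1, 3)
print(indices_to_cell((2, 1, 1, 3), dims))  # 49
```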
Fortran has column-major order for arrays. This is described at http://en.wikipedia.org/wiki/Row-major_order#Column-major_order. Further down in that article there is the equation for the memory offset of a higher dimensional array.