I have a sparse matrix (a 2D array) in which I need to find the largest empty region around a given element. The region should be a rectangle or square, and no other element may be present inside it. An algorithm description is enough for me to develop the code. Is there an algorithm to achieve this?
Since you mentioned no requirements as to the efficiency of a solution, here's a brute force approach.
Let M denote the matrix
Let n be the number of rows
Let m be the number of columns
Let maxRowSize be 0, initially
Let maxColSize be 0, initially
Let maxRowStart be 0, initially
Let maxColStart be 0, initially
for top from 0 to n:
    for left from 0 to m:
        numNonEmptyElements = 0
        if M[top][left] is non-empty:
            numNonEmptyElements = 1
        for bottom from top to n:
            if M[bottom][left] is non-empty AND numNonEmptyElements == 1:
                break
            for right from left to m:
                if M[bottom][right] is non-empty:
                    numNonEmptyElements += 1
                    if numNonEmptyElements > 1:
                        break
                if (right - left + 1) * (bottom - top + 1) > maxRowSize * maxColSize:
                    maxRowSize = bottom - top + 1
                    maxColSize = right - left + 1
                    maxRowStart = top
                    maxColStart = left
return any of maxRowSize, maxColSize, maxRowStart, maxColStart that you need
As you can observe from the nested loops, the time complexity of the pseudocode is O(N²M²), N and M being the row and column sizes of the matrix, so it is really inefficient.
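Since only an algorithm was asked for, the pseudocode should suffice, but here is a runnable Python sketch of the same brute force for reference. It is not a literal transcription: it takes the element's position explicitly, uses None to mark empty cells (an assumption), and counts non-empty cells per candidate rectangle in O(1) via 2D prefix sums, which keeps the overall O(N²M²) bound.

def largest_empty_rect_around(M, ei, ej):
    # Largest-area rectangle containing (ei, ej) and no other non-empty
    # cell; M[i][j] is None when empty. Returns (area, top, left, bottom, right).
    n, m = len(M), len(M[0])
    # S[i][j] = number of non-empty cells in M[0..i-1][0..j-1]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            S[i + 1][j + 1] = (S[i][j + 1] + S[i + 1][j] - S[i][j]
                               + (M[i][j] is not None))
    def count(top, left, bottom, right):  # non-empty cells, inclusive bounds
        return (S[bottom + 1][right + 1] - S[top][right + 1]
                - S[bottom + 1][left] + S[top][left])
    best = None
    for top in range(ei + 1):
        for left in range(ej + 1):
            for bottom in range(ei, n):
                for right in range(ej, m):
                    # allow only the element itself inside the rectangle
                    if count(top, left, bottom, right) <= 1:
                        area = (bottom - top + 1) * (right - left + 1)
                        if best is None or area > best[0]:
                            best = (area, top, left, bottom, right)
    return best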
Here is an algorithm to solve this problem. First you have to calculate the size of the maximum empty rectangle possible in each of the four directions around the element: top-left, top-right, bottom-left, and bottom-right. (In the accompanying figure, the four rectangles are indicated in different colors.)
For example, suppose we want to calculate the region for index (3,4), which is 5 in the example matrix. We then calculate the dimensions of all four rectangles (top-left, top-right, bottom-left, bottom-right), shown in the figure in red, green, yellow, and blue.
After finding the dimensions of all the rectangles, we can find the dimensions of the largest empty region (the shaded region in the figure) as:
Length:
Min(Top-Left_Length + Top-Right_Length, Bottom-Left_Length + Bottom-Right_Length)
Width:
Min(Top-Left_Breadth + Top-Right_Breadth, Bottom-Left_Breadth + Bottom-Right_Breadth)
This looks like a very good application for a modified flood-fill algorithm.
Considering an NxM matrix and the position of your element (i,j), with 0 <= i < N and 0 <= j < M:
getLargestArea(i,j):
    a0 = floodFill_markArea(i+1, j)
    a1 = floodFill_markArea(i-1, j)
    a2 = floodFill_markArea(i, j+1)
    a3 = floodFill_markArea(i, j-1)
    return max(a0, a1, a2, a3)
As for floodFill_markArea, it starts from a corner and fills a rectangular area, keeping track of the surface area; it should be easy to add a few constraints to the classic flood-fill algorithm to achieve this.
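For concreteness, here is what the plain flood-fill building block might look like in Python. It only counts the empty cells reachable from a starting cell; the rectangularity constraint the answer alludes to would still have to be layered on top. The None-means-empty convention and all names are assumptions.

from collections import deque

def floodFill_markArea(grid, i, j):
    # Classic 4-directional flood fill returning the number of empty
    # (None) cells reachable from (i, j); returns 0 for invalid starts.
    n, m = len(grid), len(grid[0])
    if not (0 <= i < n and 0 <= j < m) or grid[i][j] is not None:
        return 0
    seen = {(i, j)}
    q = deque([(i, j)])
    area = 0
    while q:
        x, y = q.popleft()
        area += 1
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = x + dx, y + dy
            if 0 <= a < n and 0 <= b < m and (a, b) not in seen and grid[a][b] is None:
                seen.add((a, b))
                q.append((a, b))
    return area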
Related
I have a set of data which contains a list of arrays. Each array contains 4 coordinate values which form a rectangle:
i.e. the start and end points (Xs, Xe) on the x-axis, and the start and end points (Ys, Ye) on the y-axis. For example:
Xs | Xe | Ys | Ye
-----------------
[[10, 15,  5,  8],
 [ 9, 12,  5,  8],
 [ 1, 20,  1, 20]]
Now I have a point (x, y).
To get all the rectangles that cover the point, I simply loop through the whole list and compare the coordinates one by one:
related_rectangles = []
for rectangle in dataset:
    if x > rectangle[0] and x < rectangle[1] and y > rectangle[2] and y < rectangle[3]:
        related_rectangles.append(rectangle)
The complexity should be O(n).
The question is: is there any algorithm or data structure that could reduce the complexity of this search? Especially for the case where all rectangles are squares.
Thanks.
I have an array with 600x600 integer values.
The values represent a circle.
For example: 0 is outside, >=1 is inside the circle.
Here is a short example:
0000110000000
0001111000000
0011111200000
0112112110000
1111111111000
0111411110000
0011131100000
0000110000000
0000000000000
0000000000000
The position and the size of the circle in the array differs.
Now, I am looking for a fast algorithm to find the center and the radius of the circle.
Fast because I have to process many arrays.
Superimpose a grid and walk it (for small matrices: every row and column; for large matrices: every Xth row and column), finding the points where the change (0 -> >=1 or vice versa) happens.
If your grid is symmetrical and dense enough, the average of these points is the center.
The average distance, sqrt((x-xm)^2 + (y-ym)^2), of the found points from the center is a measure of the radius.
Walking rows alone might be enough for larger datasets, and you scan only every Xth line. If you work with real images, you might have to cater for noise and variations in brightness.
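A minimal Python sketch of this grid-walking idea (my own transcription; it assumes a list-of-lists array where values > 0 are inside the circle, and takes each transition point midway between the two cells of the change):

import math

def estimate_circle(a, step=1):
    # Walk every `step`-th row and column, collect 0 <-> >=1 transition
    # points, then average them for the center and the mean distance
    # for the radius.
    n, m = len(a), len(a[0])
    pts = []
    for i in range(0, n, step):            # walk rows
        for j in range(1, m):
            if (a[i][j - 1] > 0) != (a[i][j] > 0):
                pts.append((i, j - 0.5))   # transition between columns
    for j in range(0, m, step):            # walk columns
        for i in range(1, n):
            if (a[i - 1][j] > 0) != (a[i][j] > 0):
                pts.append((i - 0.5, j))   # transition between rows
    if not pts:
        return None
    ym = sum(p[0] for p in pts) / len(pts)  # center = average of the points
    xm = sum(p[1] for p in pts) / len(pts)
    r = sum(math.hypot(p[0] - ym, p[1] - xm) for p in pts) / len(pts)
    return (xm, ym), r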
It is possible to do it faster. Actually, going by your sample, it does not matter (and we do not need to determine) whether it is a circle or a square.
Assuming a two-dimensional 0-based array, the radius is simply the count of non-zero (non-empty) rows (or columns) divided by 2.
Algorithm:
Find the number of non-zero rows and the index of the first (topmost) one.
Divide this number by two and add the number of all-zero rows above the first non-zero row. Now you know the radius and the y coordinate.
Find the first (leftmost) non-zero column and add the radius to its index. Now you have the x coordinate.
In the provided example:
The number of non-zero rows is 8, so the radius is 8 / 2 = 4. There are 0 zero rows above the non-zero ones (in other words, the index of the first non-zero row is 0), so the y coordinate is 0 + 4 = 4.
There are 0 empty columns to the left of the first non-zero column (the index of the first non-zero column is 0), so the x coordinate is 0 + 4 = 4.
To know whether a column is all-zero, you can use a function like this:
IsEmpty := true;
for i := 0 to High(Column) do
  if Column[i] > 0 then
  begin
    IsEmpty := false;
    Break;
  end;
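For comparison, the whole counting idea also fits in a few lines of Python (my own sketch; it assumes the circle lies entirely inside the array, as in the example):

def circle_from_extent(a):
    # Center and radius from the extent of non-zero rows/columns.
    nonzero_rows = [i for i, row in enumerate(a) if any(row)]
    nonzero_cols = [j for j in range(len(a[0])) if any(row[j] for row in a)]
    radius = len(nonzero_rows) / 2.0
    y = nonzero_rows[0] + radius   # index of first non-zero row + radius
    x = nonzero_cols[0] + radius   # index of first non-zero column + radius
    return x, y, radius

On the sample array this returns x = 4, y = 4, radius = 4, matching the worked example.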
You have a matrix of 0s and 1s, for example:
1 1 1 0
1 1 1 0
1 1 1 0
1 1 1 1
0 1 1 1
1 0 1 1
A square is placed at position (0,0). Find the size of the largest square of all 1s that can be moved from the upper left corner to the lower right corner. The square can only move down and to the right, and only over elements that are 1.
In this example the size of the largest square is 2. The indexes of elements in the square are (0,0), (0,1), (1,0), (1,1).
I'm not sure how to solve this problem. I think that first you need to find all the squares in the upper left corner and all the squares in the lower right corner. If the move is possible, then there must be squares at these two positions that are the same size. Then you would only attempt to move squares in the left corner that are equal in size to a square in the right corner. But I'm not sure how to go about finding the squares and checking whether they can be moved.
You can use dynamic programming.
Let's assume that max_size(i, j) is the size of the largest square that can stay in cell (i, j), possibly 0 ("stay in" meaning that its top left corner is located in this cell). We can compute this value naively (by iteratively increasing the size of the square and checking that it does not touch any 0). If the naive solution is not feasible, we can use binary search over the answer together with prefix sums to get O(log n) time per cell.
Let's say that f(i, j) is the size of the largest square that can reach cell (i, j). The base case: f(0, 0) = max_size(0, 0) (we can always reach the top left corner). For the other cells, it can be computed in the following way (I omit corner cases here):
for i <- 0 ... n - 1:
    for j <- 0 ... m - 1:
        f(i, j) = min(max_size(i, j), max(f(i - 1, j), f(i, j - 1)))
The answer is the largest f(i, j) such that i + f(i, j) - 1 = n - 1 and j + f(i, j) - 1 = m - 1.
The time complexity is O(n * m * log(min(n, m))).
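A Python sketch of this DP (my own transcription). One liberty taken: max_size is computed with the classic min-of-three-neighbors square DP, which gives all cells in O(n*m) total, instead of the naive or binary-search computation described above; the f recurrence is exactly the one given.

def largest_movable_square(grid):
    n, m = len(grid), len(grid[0])
    # max_size[i][j]: side of the largest all-1s square with top-left (i, j)
    max_size = [[0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if grid[i][j]:
                right = max_size[i][j + 1] if j + 1 < m else 0
                down = max_size[i + 1][j] if i + 1 < n else 0
                diag = max_size[i + 1][j + 1] if i + 1 < n and j + 1 < m else 0
                max_size[i][j] = 1 + min(right, down, diag)
    # f[i][j]: size of the largest square that can reach cell (i, j)
    f = [[0] * m for _ in range(n)]
    f[0][0] = max_size[0][0]
    for i in range(n):
        for j in range(m):
            if i or j:
                best = max(f[i - 1][j] if i else 0, f[i][j - 1] if j else 0)
                f[i][j] = min(max_size[i][j], best)
    # answer: largest f(i, j) whose square touches the bottom-right corner
    sizes = [f[i][j] for i in range(n) for j in range(m)
             if f[i][j] and i + f[i][j] == n and j + f[i][j] == m]
    return max(sizes) if sizes else 0

On the example matrix this returns 2, matching the expected answer.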
Your first step is right: find the largest square that fits into both the lower right and the upper left.
Then, start with a square of that size in the upper left. Try the paths to the lower right breadth-first. After each step, you check whether the square still fits, otherwise reduce its size. You will then often arrive at a given place from two directions (from above and from the left). Continue from there with the bigger size (this unification step is why you should go breadth-first, this is also called "dynamic programming").
In your example, you start with an array of size 2 at (0 0), but let's say that you start with size 3 for demonstration purposes. So, layer 0 is a single square of size 3 at (0 0). Moving down is no problem, but moving to the right steps on some zeroes, so you have to reduce the size. Layer 1 is thus a list of two squares: one of size 3 at (1 0) and one of size 2 at (0 1). Next layer: move right from (0 1), need to reduce to size 1 at (0 2); move down from (0 1) as well as right from (1 0), both arrive at (1 1), from above with size 2, from left needs to reduce to size 2, so size 2; move down from (1 0), reduce to size 2. So, Layer 2 is a list of three squares: one of size 2 at (2 0), one of size 2 at (1 1), and one of size 1 at (0 2). I think this suffices as a demonstration.
You have reached a solution when you complete a layer that contains a square that touches the lower right. You can tell that there is no solution when all your squares are size 0.
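For completeness, the same idea as a breadth-first search with shrinking, in Python (a sketch of my own; fits is a naive all-1s check that could be replaced by a precomputed max_size table):

from collections import deque

def largest_square_bfs(grid):
    n, m = len(grid), len(grid[0])
    def fits(i, j, s):  # does an s x s all-1s square fit with top-left (i, j)?
        return (i + s <= n and j + s <= m and
                all(grid[i + a][j + b] for a in range(s) for b in range(s)))
    start = 0
    while fits(0, 0, start + 1):  # largest square that fits in the upper left
        start += 1
    best = [[0] * m for _ in range(n)]  # best size seen per top-left position
    best[0][0] = start
    q = deque([(0, 0)])
    while q:
        i, j = q.popleft()
        for ni, nj in ((i + 1, j), (i, j + 1)):  # move down or right
            if ni < n and nj < m:
                s = best[i][j]
                while s and not fits(ni, nj, s):
                    s -= 1  # reduce the size until the square fits again
                if s > best[ni][nj]:  # unification step: keep the bigger size
                    best[ni][nj] = s
                    q.append((ni, nj))
    sizes = [best[i][j] for i in range(n) for j in range(m)
             if best[i][j] and i + best[i][j] == n and j + best[i][j] == m]
    return max(sizes) if sizes else 0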
Brute force solution
If your matrix has size N x M, every path of a square of size S x S from top left to bottom right is N + M - 2S 'steps' (movements down or to the right) long, where N - S steps go downwards and M - S steps go rightwards. So if we disregard the values in the matrix, there are (N + M - 2S) choose (N - S) possible paths.
So if the matrix isn't too large, it might be feasible to just try all these paths for a given square size S and test them for compliance with the square placement rules (whether in each step, the square only covers 1s and no 0s.)
Do this for each S from 1 to min(N,M), and keep track of the values of S for which you can find at least one valid path.
Then, just take the maximum of these S values and you've got the wanted result.
So brute-forcing will (eventually) give you the correct value.
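To make the brute force concrete, here is a small Python illustration (before any of the optimizations below; helper names are made up, and the number of paths tried per size is exponential, so this is only for tiny matrices):

from itertools import combinations

def square_ok(grid, i, j, s):
    # True if the s x s square with top-left corner (i, j) covers only 1s
    return all(grid[i + a][j + b] for a in range(s) for b in range(s))

def exists_valid_path(grid, s):
    # Try all (N + M - 2S) choose (N - S) orders of down/right steps
    n, m = len(grid), len(grid[0])
    if not square_ok(grid, 0, 0, s):
        return False
    steps = (n - s) + (m - s)
    for downs in combinations(range(steps), n - s):
        i = j = 0
        ok = True
        for t in range(steps):
            if t in downs:
                i += 1  # this step goes down
            else:
                j += 1  # this step goes right
            if not square_ok(grid, i, j, s):
                ok = False
                break
        if ok:
            return True
    return False

def largest_square_size(grid):
    n, m = len(grid), len(grid[0])
    valid = [s for s in range(1, min(n, m) + 1) if exists_valid_path(grid, s)]
    return max(valid) if valid else 0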
Optimizations
Of course, this isn't as efficient as it could be, and it will lead to enormous runtimes for large matrices. But we can improve it step by step, by looking at which steps are unnecessary.
One valid path for a given square size suffices
If you've found a valid path for a given square size S, you don't have to look for more paths for the same square size and can skip to testing other, yet untested square sizes.
One invalid step kills the whole path
If any step of a path leads to an invalid placement, you don't have to check the other steps. You already know you can't use the whole path.
Don't do double work
Each of the (N-S) * (M-S) possible placements of a square with given size S will be part of several paths. Instead of checking each placement's validity for each path it is part of, do it just once for each placement and store the result in a (N-S) x (M-S) matrix.
Bigger won't fit better
If you've tested all paths for a given square size S, and none of them was valid, you don't have to test larger S at all, as you know there won't be valid paths for them.
Smaller won't fit worse
If you've found a path for a given square size S, you can be sure that all smaller square sizes will have at least one valid path, too, so you don't have to test them. (You wouldn't have to, anyway, as you're looking for the maximum S with at least one valid path.)
Bisection?
Combining the two realizations above, you'll come to the conclusion that it won't be optimal to test sizes S in order, be it increasing or decreasing. Rather, you could start somewhere in-between and —depending on the result for that S— rule out all smaller or all larger values for S. Whether it is optimal to start exactly in the middle (at min(N,M) / 2), I'm not sure, though, as the number of paths to search for a given S (remember the binomial coefficient formula in the "Brute force" section above) depends on the size of S.
Parallelization
Almost every level of the brute force algorithm has several steps that are independent of each other and could be executed in parallel.
More?
I'm sure even with all of the above implemented, there's still room for more optimization, even if I can't think of any right now.
We are given a weighted N*N grid W. Two robots r1, r2 start from the top left and top right corners respectively. R1 has to reach the bottom right corner and R2 the bottom left. Here are the valid moves:
R1 moves one square down, R2 moves one square left.
R1 moves one square right, R2 moves one square down.
They must move in such a way that the sum of the weights of the squares they visit (including the starting and ending square) is maximized.
For example, if the grid is:
 6   0   3  -1
 7   4   2   4
-3   3  -2   8
13  10  -1  -4
In this example, if R1 follows the path marked by * and R2 follows the path marked
by ^, as shown below, their combined score is 56.
6* 0 3^ -1^
7* 4*^ 2^ 4
-3 3*^ -2* 8*
13^ 10^ -1 -4*
It can be verified that this is the best combined score achievable for this grid.
We cannot solve this with plain recursion, as N <= 2500 and the time limit is 2 seconds.
If the problem had only one robot, we could solve this by using dynamic programming.
I tried using a similar approach for this problem;
We have two extra N*N grids, G1 and G2:
for i from 1 to N:
    for j from 1 to N and k from N down to 1:
        if (G1[i-1][j] + G2[i][k+1]) > (G1[i][j-1] + G2[i-1][k]):
            G1[i][j] = G1[i-1][j] + W[i][j]
            G2[i][k] = G2[i][k+1] + W[i][k]
        else:
            G1[i][j] = G1[i][j-1] + W[i][j]
            G2[i][k] = G2[i-1][k] + W[i][k]
return G1[N][N] + G2[N][1]
but this gives a wrong answer.
I am not able to understand what is wrong with my algorithm, because for each square it calculates the maximum-weight path to reach it.
Can anyone tell me what is wrong with my method, and how I can correct it to get the right answer?
It is a graph problem, and the graph is G=(V,E) where:
V = Squares x Squares (The Cartesian product of all the squares)
(You might want to exclude points where (x1,y1)=(x2,y2), it can be easily done).
E = { all possible ways to move in a turn }, or formally:
{ (((x1,y1),(x2,y2)), ((x1+1,y1),(x2,y2-1))), (((x1,y1),(x2,y2)), ((x1,y1+1),(x2+1,y2))) | per x1,y1,x2,y2 }
(here x is the row and y the column; the first edge type is "R1 down, R2 left" and the second is "R1 right, R2 down").
Now, once we have the graph, we can see it is actually a DAG, and this is a good thing, because for a general graph the longest path problem is NP-hard, but not for a DAG.
Now, given this DAG G, we want to find the longest path from ((0,0),(0,n)) to ((n,n),(n,0)), and it can be done with the following approach:
For simplicity define weight((x1,y1),(x2,y2)) = weight(x1,y1) + weight(x2,y2):
The algorithm:
Use topological sort on the graph
Init D((n,n),(n,0)) = weight((n,n),(n,0)) (the target node)
Iterate over the vertices in reverse topological order, and for each vertex v do:
D(v) = max { D(u) | for each (v,u) in E as described above } + weight(v)
When you are done, D((0,0),(0,n)) will hold the optimal result.
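One way to make this concrete (and tractable, since the full product graph has on the order of n^4 vertices, too many for N = 2500): with the paired moves, every down-move of R1 is a left-move of R2 and every right-move of R1 is a down-move of R2, so after any number of turns R2's position is fully determined by R1's, namely (r2, c2) = (c1, n-1-r1) in row/column coordinates. That observation is mine, not part of the answer above, but it collapses the DAG longest-path computation to an O(n^2) DP over R1's position, with the turn number as the topological order. A Python sketch:

def best_combined_score(W):
    # dp[i][j]: best combined score with R1 at (i, j) and R2 at (j, n-1-i).
    # Cells visited by both robots count twice, matching the worked example.
    n = len(W)
    NEG = float("-inf")
    dp = [[NEG] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pair = W[i][j] + W[j][n - 1 - i]  # weight of the paired state
            if i == 0 and j == 0:
                dp[i][j] = pair  # R1 at (0,0), R2 at (0,n-1)
            else:
                best = max(dp[i - 1][j] if i > 0 else NEG,  # R1 down, R2 left
                           dp[i][j - 1] if j > 0 else NEG)  # R1 right, R2 down
                dp[i][j] = best + pair
    return dp[n - 1][n - 1]  # R1 at (n-1,n-1), R2 at (n-1,0)

W = [[6, 0, 3, -1], [7, 4, 2, 4], [-3, 3, -2, 8], [13, 10, -1, -4]]
print(best_combined_score(W))  # prints 56, the score from the example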
I could see a typo in the 2nd valid scenario
The valid 2nd scenario should be
R1 moves one square right, R2 moves one square down
but was given as
R2 moves one square right, R2 moves one square down
I've been trying to find a solution to my problem for more than a week and I couldn't find anything better than a million-iteration program, so I think it's time to ask someone for help.
I've got a 3D array. Let's say we're talking about the ground, and the first layer is the surface.
The other layers are floors below the ground. I have to find the deepest path's length, the count of isolated caves underground, and the size of the biggest cave.
Here's the visualisation of my problem.
Input:
5 5 5 // x, y, z
xxxxx
oxxxx
xxxxx
xoxxo
ooxxx
xxxxx
xxoxx
and so on...
Output:
5 // deepest path - starting from the surface
22 // size of the biggest cave
3 // number of isolated caves (the red ones; isolated = a cave that doesn't reach the surface)
Note that even though the red cell on the 2nd floor is placed next to a green one, it's not the same cave, because it's placed diagonally, and that doesn't count.
I've been told that the best way to do this might be a recursive divide-and-conquer algorithm, but I don't really know what it would look like.
I think you should be able to do it in O(N).
When you parse your input, assign each node a 'caveNumber' initialized to 0. Set it to a valid number whenever you visit a cave:
CaveCount = 0, IsolatedCaveCount = 0
AllSizes = new Vector
For each node,
    ProcessNode(size:0, depth:0)

ProcessNode(size, depth):
    If node.isCave and !node.caveNumber
        if (size==0) ++CaveCount
        if (size==0 and depth!=0) IsolatedCaveCount++
        node.caveNumber = CaveCount
        AllSizes[CaveCount]++
        For each neighbor of node,
            if (goingDeeper) depth++
            ProcessNode(size+1, depth)
You will visit each node 7 times in the worst case: once from the outer loop, and possibly once from each of its six neighbors. But you'll only do the work on each node once, since after that the caveNumber is set and you ignore it.
You can do the depth tracking by adding a depth parameter to the recursive ProcessNode call, and only incrementing it when visiting a lower neighbor.
The solution shown below (as a python program) runs in time O(n lg*(n)), where lg*(n) is the nearly-constant iterated-log function often associated with union operations in disjoint-set forests.
In the first pass through all cells, the program creates a disjoint-set forest, using routines called makeset(), findset(), link(), and union(), just as explained in section 22.3 (Disjoint-set forests) of edition 1 of Cormen/Leiserson/Rivest. In later passes through the cells, it counts the number of members of each disjoint forest, checks the depth, etc. The first pass runs in time O(n lg*(n)) and later passes run in time O(n) but by simple program changes some of the passes could run in O(c) or O(b) for c caves with a total of b cells.
Note that the code shown below is not subject to the error contained in a previous answer, where the previous answer's pseudo-code contains the line
if (size==0 and depth!=0) IsolatedCaveCount++
The error in that line is that a cave with a connection to the surface might have underground rising branches, which the other answer would erroneously add to its total of isolated caves.
The code shown below produces the following output:
Deepest: 5 Largest: 22 Isolated: 3
(Note that the count of 24 shown in your diagram should be 22, from 4+9+9.)
v=[0b0000010000000000100111000, # Cave map
   0b0000000100000110001100000,
   0b0000000000000001100111000,
   0b0000000000111001110111100,
   0b0000100000111001110111101]
nx, ny, nz = 5, 5, 5
inlay, ncells = (nx+1) * ny, (nx+1) * ny * nz
masks = []
for r in range(ny):
    masks += [2**j for j in range(nx*ny)][nx*r:nx*r+nx] + [0]
p = [-1 for i in range(ncells)]  # parent links
r = [ 0 for i in range(ncells)]  # rank
c = [ 0 for i in range(ncells)]  # forest-size counts
d = [-1 for i in range(ncells)]  # depths

def makeset(x):  # Ref: CLR 22.3, Disjoint-set forests
    p[x] = x
    r[x] = 0

def findset(x):
    if x != p[x]:
        p[x] = findset(p[x])
    return p[x]

def link(x, y):
    if r[x] > r[y]:
        p[y] = x
    else:
        p[x] = y
        if r[x] == r[y]:
            r[y] += 1

def union(x, y):
    link(findset(x), findset(y))

fa = 0  # fa = floor above
bc = 0  # bc = floor's base cell #
for f in v:  # f = current-floor map
    cn = bc - 1  # cn = cell #
    ml = 0
    for m in masks:
        cn += 1
        if m & f:
            makeset(cn)
            if ml & f:
                union(cn, cn-1)
            mr = m >> nx
            if mr and mr & f:
                union(cn, cn-nx-1)
            if m & fa:
                union(cn, cn-inlay)
        ml = m
    bc += inlay
    fa = f

for i in range(inlay):
    findset(i)
    if p[i] > -1:
        d[p[i]] = 0

for i in range(ncells):
    if p[i] > -1:
        c[findset(i)] += 1
        if d[p[i]] > -1:
            d[p[i]] = max(d[p[i]], i//inlay)

isola = len([i for i in range(ncells) if c[i] > 0 and d[p[i]] < 0])
print("Deepest:", 1+max(d), " Largest:", max(c), " Isolated:", isola)
It sounds like you're solving a "connected components" problem. If your 3D array can be converted to a bit array (e.g. 0 = bedrock, 1 = cave, or vice versa) then you can apply a technique used in image processing to find the number and dimensions of either the foreground or background.
Typically this algorithm is applied in 2D images to find "connected components" or "blobs" of the same color. If possible, find a "single pass" algorithm:
http://en.wikipedia.org/wiki/Connected-component_labeling
The same technique can be applied to 3D data. Googling "connected components 3D" will yield links like this one:
http://www.ecse.rpi.edu/Homepages/wrf/pmwiki/pmwiki.php/Research/ConnectedComponents
Once the algorithm has finished processing your 3D array, you'll have a list of labeled, connected regions, and each region will be a list of voxels (volume elements analogous to image pixels). You can then analyze each labeled region to determine volume, closeness to the surface, height, etc.
Implementing these algorithms can be a little tricky, and you might want to try a 2D implementation first. Though it might not be as efficient as you'd like, you could create a 3D connected component labeling algorithm by applying a 2D algorithm iteratively to each layer and then relabeling the connected regions from the top layer to the bottom layer:
For layer 0, find all connected regions using the 2D connected component algorithm
For layer 1, find all connected regions.
If any labeled pixel in layer 0 sits directly over a labeled pixel in layer 1, change all the labels in layer 1 to the label in layer 0.
Apply this labeling technique iteratively through the stack until you reach layer N.
One important consideration in connected component labeling is how one decides that regions are connected. In a 2D image (or 2D array) of bits, we can consider either the "4-connected" region of neighbor elements
X 1 X
1 C 1
X 1 X
where "C" is the center element, "1" indicates neighbors that would be considered connected, and "X" are adjacent neighbors that we do not consider connected. Another option is to consider "8-connected neighbors":
1 1 1
1 C 1
1 1 1
That is, every element adjacent to the central pixel is considered connected. At first this may sound like the better option, but in real-world 2D image data a chessboard pattern of noise or a diagonal string of single noise pixels would be detected as one connected region, so we typically test for 4-connectivity.
For 3D data you can consider either 6-connectivity or 26-connectivity: 6-connectivity considers only the neighbor pixels that share a full cube face with the center voxel, and 26-connectivity considers every adjacent pixel around the center voxel. You mention that "diagonally placed" doesn't count, so 6-connectivity should suffice.
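If a library isn't available, a 6-connectivity labeling pass is short to write by hand. Below is a BFS-based Python sketch rather than the single-pass or union-find variants linked above; the vol[z][y][x] layout, the 1-means-cave convention, and all names are my own assumptions.

from collections import deque

def label_3d_6conn(vol):
    # 3D connected-component labeling with 6-connectivity.
    # vol[z][y][x] is 1 for cave, 0 for rock; returns (labels, count),
    # where labels is a parallel array with 0 meaning rock.
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if vol[z][y][x] and not labels[z][y][x]:
                    count += 1                      # new component found
                    labels[z][y][x] = count
                    q = deque([(z, y, x)])
                    while q:
                        cz, cy, cx = q.popleft()
                        for dz, dy, dx in ((1,0,0), (-1,0,0), (0,1,0),
                                           (0,-1,0), (0,0,1), (0,0,-1)):
                            az, ay, ax = cz + dz, cy + dy, cx + dx
                            if (0 <= az < nz and 0 <= ay < ny and 0 <= ax < nx
                                    and vol[az][ay][ax]
                                    and not labels[az][ay][ax]):
                                labels[az][ay][ax] = count
                                q.append((az, ay, ax))
    return labels, count

From the labels you can then read off everything the question asks for: component sizes (count voxels per label), which caves reach the surface (labels present in layer 0), and the deepest layer index per label.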
You can view it as a graph where (non-diagonally) adjacent elements are connected if they are both empty (part of a cave). Note that you don't have to convert it to a graph; you can use the normal 3D array representation.
Finding the caves is then the same task as finding the connected components in a graph (O(N)), and the size of a cave is the number of nodes in its component.