What's the fastest way to find the deepest path in a 3D array?

I've been trying to find a solution to my problem for more than a week, and I couldn't find anything better than a million-iteration program, so I think it's time to ask someone to help me.
I've got a 3D array. Let's say we're talking about the ground, and the first layer is the surface.
The other layers are floors below the ground. I have to find the deepest path's length, the count of isolated caves underground, and the size of the biggest cave.
Here's the visualisation of my problem.
Input:
5 5 5 // x, y, z
xxxxx
oxxxx
xxxxx
xoxxo
ooxxx
xxxxx
xxoxx
and so on...
Output:
5 // deepest path - starting from the surface
22 // size of the biggest cave
3 // number of isolated caves (red ones) (isolated - a cave that doesn't reach the surface)
Note that even though the red cell on the 2nd floor is placed next to the green one, it's not the same cave, because it's placed diagonally and that doesn't count.
I've been told that the best way to do this might be a recursive "divide and conquer" algorithm, but I don't really know what it could look like.

I think you should be able to do it in O(N).
When you parse your input, assign each node a 'caveNumber' initialized to 0. Set it to a valid number whenever you visit a cave:
CaveCount = 0, IsolatedCaveCount = 0
AllSizes = new Vector

For each node:
    ProcessNode(size:0, depth:0)

ProcessNode(size, depth):
    If node.isCave and !node.caveNumber
        if (size==0) ++CaveCount
        if (size==0 and depth!=0) IsolatedCaveCount++
        node.caveNumber = CaveCount
        AllSizes[CaveCount]++
        For each neighbor of node:
            if (goingDeeper) depth++
            ProcessNode(size+1, depth)
You will visit each node 7 times in the worst case: once from the outer loop, and possibly once from each of its six neighbors. But you'll only do real work on each one once, since after that its caveNumber is set and you ignore it.
You can do the depth tracking by adding a depth parameter to the recursive ProcessNode call, and only incrementing it when visiting a lower neighbor.
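For concreteness, here is a minimal iterative Python sketch of this labeling idea (the naming is my own; an explicit stack replaces the recursion to avoid recursion-depth limits, and a cave counts as isolated only if none of its cells lies on the surface, which also sidesteps the pitfall discussed in the next answer):

def analyze(grid, nx, ny, nz):
    # grid[z][y][x] is True for a cave cell; z == 0 is the surface layer.
    cave_number = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    sizes, depths, touches_surface = [], [], []
    caves = 0
    for z0 in range(nz):
        for y0 in range(ny):
            for x0 in range(nx):
                if not grid[z0][y0][x0] or cave_number[z0][y0][x0]:
                    continue
                caves += 1
                size, depth, surface = 0, z0, (z0 == 0)
                cave_number[z0][y0][x0] = caves
                stack = [(z0, y0, x0)]
                while stack:
                    z, y, x = stack.pop()
                    size += 1
                    depth = max(depth, z)
                    surface = surface or z == 0
                    # 6-connectivity: the six face-sharing neighbors.
                    for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                        z2, y2, x2 = z + dz, y + dy, x + dx
                        if (0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx
                                and grid[z2][y2][x2] and not cave_number[z2][y2][x2]):
                            cave_number[z2][y2][x2] = caves
                            stack.append((z2, y2, x2))
                sizes.append(size)
                depths.append(depth)
                touches_surface.append(surface)
    reachable = [d for d, s in zip(depths, touches_surface) if s]
    deepest = 1 + max(reachable) if reachable else 0
    largest = max(sizes) if sizes else 0
    isolated = sum(1 for s in touches_surface if not s)
    return deepest, largest, isolated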

The solution shown below (as a python program) runs in time O(n lg*(n)), where lg*(n) is the nearly-constant iterated-log function often associated with union operations in disjoint-set forests.
In the first pass through all cells, the program creates a disjoint-set forest, using routines called makeset(), findset(), link(), and union(), just as explained in Section 22.3 (Disjoint-set forests) of the first edition of Cormen/Leiserson/Rivest. In later passes through the cells, it counts the number of members of each disjoint set, checks the depth, etc. The first pass runs in time O(n lg*(n)) and later passes run in time O(n), but with simple program changes some of the passes could run in O(c) or O(b), for c caves with a total of b cells.
Note that the code shown below is not subject to the error contained in a previous answer, where the previous answer's pseudo-code contains the line
if (size==0 and depth!=0) IsolatedCaveCount++
The error in that line is that a cave with a connection to the surface might have underground rising branches, which the other answer would erroneously add to its total of isolated caves.
The code shown below produces the following output:
Deepest: 5 Largest: 22 Isolated: 3
(Note that the count of 24 shown in your diagram should be 22, from 4+9+9.)
v=[0b0000010000000000100111000, # Cave map
   0b0000000100000110001100000,
   0b0000000000000001100111000,
   0b0000000000111001110111100,
   0b0000100000111001110111101]
nx, ny, nz = 5, 5, 5
inlay, ncells = (nx+1) * ny, (nx+1) * ny * nz
masks = []
for r in range(ny):
    masks += [2**j for j in range(nx*ny)][nx*r:nx*r+nx] + [0]
p = [-1 for i in range(ncells)] # parent links
r = [ 0 for i in range(ncells)] # rank
c = [ 0 for i in range(ncells)] # forest-size counts
d = [-1 for i in range(ncells)] # depths
def makeset(x): # Ref: CLR 22.3, Disjoint-set forests
    p[x] = x
    r[x] = 0
def findset(x):
    if x != p[x]:
        p[x] = findset(p[x])
    return p[x]
def link(x,y):
    if r[x] > r[y]:
        p[y] = x
    else:
        p[x] = y
        if r[x] == r[y]:
            r[y] += 1
def union(x,y):
    link(findset(x), findset(y))
fa = 0 # fa = floor above
bc = 0 # bc = floor's base cell #
for f in v: # f = current-floor map
    cn = bc-1 # cn = cell#
    ml = 0
    for m in masks:
        cn += 1
        if m & f:
            makeset(cn)
            if ml & f:
                union(cn, cn-1)
            mr = m>>nx
            if mr and mr & f:
                union(cn, cn-nx-1)
            if m & fa:
                union(cn, cn-inlay)
        ml = m
    bc += inlay
    fa = f
for i in range(inlay):
    findset(i)
    if p[i] > -1:
        d[p[i]] = 0
for i in range(ncells):
    if p[i] > -1:
        c[findset(i)] += 1
        if d[p[i]] > -1:
            d[p[i]] = max(d[p[i]], i//inlay)
isola = len([i for i in range(ncells) if c[i] > 0 and d[p[i]] < 0])
print "Deepest:", 1+max(d), " Largest:", max(c), " Isolated:", isola

It sounds like you're solving a "connected components" problem. If your 3D array can be converted to a bit array (e.g. 0 = bedrock, 1 = cave, or vice versa) then you can apply a technique used in image processing to find the number and dimensions of either the foreground or background.
Typically this algorithm is applied in 2D images to find "connected components" or "blobs" of the same color. If possible, find a "single pass" algorithm:
http://en.wikipedia.org/wiki/Connected-component_labeling
The same technique can be applied to 3D data. Googling "connected components 3D" will yield links like this one:
http://www.ecse.rpi.edu/Homepages/wrf/pmwiki/pmwiki.php/Research/ConnectedComponents
Once the algorithm has finished processing your 3D array, you'll have a list of labeled, connected regions, and each region will be a list of voxels (volume elements analogous to image pixels). You can then analyze each labeled region to determine volume, closeness to the surface, height, etc.
Implementing these algorithms can be a little tricky, and you might want to try a 2D implementation first. Though it might not be as efficient as you'd like, you could create a 3D connected component labeling algorithm by applying a 2D algorithm iteratively to each layer and then relabeling the connected regions from the top layer to the bottom layer (a rough sketch follows the list below):
For layer 0, find all connected regions using the 2D connected component algorithm
For layer 1, find all connected regions.
If any labeled pixel in layer 0 sits directly over a labeled pixel in layer 1, change all the labels in layer 1 to the label in layer 0.
Apply this labeling technique iteratively through the stack until you reach layer N.
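A rough sketch of that layer-by-layer merge (label_2d is a hypothetical helper, e.g. a wrapper around scipy.ndimage.label, assumed to return a 2D label image with 0 for background and labels unique across all layers; the union-find here is my own minimal version):

def label_3d_by_layers(layers, label_2d):
    labeled = [label_2d(layer) for layer in layers]
    parent = {}  # union-find over the per-layer labels

    def find(a):
        while parent.setdefault(a, a) != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Merge each label with the label directly above it in the previous layer.
    for k in range(1, len(labeled)):
        upper, lower = labeled[k - 1], labeled[k]
        for y in range(len(lower)):
            for x in range(len(lower[0])):
                if upper[y][x] and lower[y][x]:
                    ra, rb = find(upper[y][x]), find(lower[y][x])
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)

    # Second pass: replace every label by its representative.
    for lab in labeled:
        for y in range(len(lab)):
            for x in range(len(lab[0])):
                if lab[y][x]:
                    lab[y][x] = find(lab[y][x])
    return labeled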
One important consideration in connected component labeling is how one decides that regions are connected. In a 2D image (or 2D array) of bits, we can consider the "4-connected" region of neighbor elements
X 1 X
1 C 1
X 1 X
where "C" is the center element, "1" indicates neighbors that would be considered connected, and "X" are adjacent neighbors that we do not consider connected. Another option is to consider "8-connected neighbors":
1 1 1
1 C 1
1 1 1
That is, every element adjacent to the central pixel is considered connected. At first this may sound like the better option, but in real-world 2D image data a chessboard pattern of noise or a diagonal string of single noise pixels would be detected as one connected region, so we typically test for 4-connectivity.
For 3D data you can consider either 6-connectivity or 26-connectivity: 6-connectivity considers only the neighbor voxels that share a full cube face with the center voxel, and 26-connectivity considers every adjacent voxel around the center voxel. You mention that "diagonally placed" doesn't count, so 6-connectivity should suffice.
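As a quick reference, here are the two 3D neighborhoods written as offset lists (a small sketch; the (z, y, x) encoding is my own):

N6 = [(dz, dy, dx)
      for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
      if abs(dz) + abs(dy) + abs(dx) == 1]   # the 6 face-sharing neighbors
N26 = [(dz, dy, dx)
       for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
       if (dz, dy, dx) != (0, 0, 0)]         # all 26 surrounding cells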

You can view this as a graph in which (non-diagonally) adjacent elements are connected if they are both empty (part of a cave). Note that you don't have to convert it to a graph; you can use the normal 3D array representation.
Finding caves is then the same task as finding the connected components of a graph (O(N)), and the size of a cave is the number of nodes in its component.

Related

FiPy: Can I directly change faceVariables depending on neighboring cells?

I am working with a biological model of the distribution of microbial biomass (b1) on a 2D grid. From the biomass a protein (p1) is produced. The biomass diffuses over the grid, while the protein does not. Only if a certain amount of protein is produced (p > p_lim), the biomass is supposed to diffuse.
I try to implement this by using a dummy cell variable z, multiplied with the diffusion coefficient, and setting it from 0 to 1 only in cells where p > p_lim.
The condition works fine, and when the critical amount of p is reached in a cell, z is set to 1 and diffusion happens. However, the diffusion still does not work at the rate I would like, because the diffusion calculation uses the face variable, not the value of the cell itself. The faces of z are always a mean of the cell with z=1 and its neighboring cells with z=0. I, however, would like the diffusion to work at its original rate even if the neighbouring cell is still at p < p_lim.
So, my question is: can I somehow access a faceVariable and change it? For example, set a face to 1 if any neighboring cell has reached p1 > p_lim? I guess this is not a proper mathematical thing to do, but I couldn't think of another way to simulate this problem.
I will show a very reduced form of my model below. In any case, I thank you very much for your time!
from fipy import (Grid2D, CellVariable, TransientTerm, DiffusionTerm,
                  ImplicitSourceTerm, Viewer)
from builtins import range

##### produce mesh
nx = 5
ny = nx
dx = 1.
dy = dx
L = nx * dx
mesh = Grid2D(nx=nx, ny=ny, dx=dx, dy=dy)
# parameters
h1 = 0.5    # production rate of p
Db = 10.    # diffusion coeff of b
p_lim = 0.1
# cell variables
z = CellVariable(name="z", mesh=mesh, value=0.)
b1 = CellVariable(name="b1", mesh=mesh, hasOld=True, value=0.)
p1 = CellVariable(name="p1", mesh=mesh, hasOld=True, value=0.)
# equations
eqb1 = (TransientTerm(var=b1) == DiffusionTerm(var=b1, coeff=Db*z.arithmeticFaceValue)
        - ImplicitSourceTerm(var=b1, coeff=h1))
eqp1 = (TransientTerm(var=p1) == ImplicitSourceTerm(var=b1, coeff=h1))
# set b1 to 10. in the center of the grid
x, y = mesh.cellCenters
b1.setValue(10., where=((x > 2.) & (x < 3.) & (y > 2.) & (y < 3.)))
vi = Viewer(vars=(b1, p1))  # select a backend via the FIPY_VIEWER environment variable
eq = eqb1 & eqp1
for t in range(10):
    b1.updateOld()
    p1.updateOld()
    z.setValue(z + 0.1, where=((p1 >= p_lim) & (z < 1.)))
    eq.solve(dt=0.1)
    vi.plot()
In addition to .arithmeticFaceValue, FiPy provides other interpolators between cell and face values, such as .harmonicFaceValue and .minmodFaceValue.
These properties are implemented using subclasses of _CellToFaceVariable, specifically _ArithmeticCellToFaceVariable, _HarmonicCellToFaceVariable, and _MinmodCellToFaceVariable.
You can also make a custom interpolator by subclassing _CellToFaceVariable. Two such examples are _LevelSetDiffusionVariable and ScharfetterGummelFaceVariable (neither is well documented, I'm afraid).
You need to override the _calc_() method to provide your custom calculation. This method takes three arguments:
alpha: an array of the ratio (0-1) of the distance from the face to the cell on one side, relative to the distance from the cell on the other side to the cell on the first side
id1: an array of indices of the cells on one side of the face
id2: an array of indices of the cells on the other side of the face
Note: you can ignore any "if inline.doInline:" clause and look at the _calc_() method defined under the "else:" clause.
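An illustrative (untested) sketch of that subclassing idea: a face value that takes the maximum of the two adjoining cells, so a face becomes 1 as soon as either neighboring cell has z = 1. The import path, constructor, and _calc_() signature follow FiPy's internals as described above; treat them as assumptions to check against your FiPy version.

from fipy.tools import numerix
from fipy.variables.cellToFaceVariable import _CellToFaceVariable

class _MaxCellToFaceVariable(_CellToFaceVariable):
    def _calc_(self, alpha, id1, id2):
        # id1/id2 index the cells on either side of each face; alpha is
        # ignored because max() is not a weighted interpolation.
        return numerix.maximum(numerix.take(self.var, id1, axis=-1),
                               numerix.take(self.var, id2, axis=-1))

You would then use it in place of z.arithmeticFaceValue, e.g. DiffusionTerm(var=b1, coeff=Db * _MaxCellToFaceVariable(z)).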

How to calculate distance between 2 points in a 2D matrix

I am both new to this website and new to C. I need a program to find the average number of 'jumps' it takes from each point to all the others.
The idea is this: Find "jump" distance from 1 to 2, 1 to 3, 1 to 4 ... 1 to 9, or find 2 to 1, 2 to 3, 2 to 4 2 to 5 etc.
Doing them on the first row is simple: just (2-1) or (3-1) and you get the correct number. But if I want to find the distance between 1 and 4, or 1 and 8, then I have absolutely no idea.
The dimensions of the matrix should potentially be changeable. But I just want help with a 3x3 matrix.
Could anyone show me how to find it?
A jump means a vertical or horizontal move from one point to another: from 1 to 2 = 1, from 1 to 9 = 4 (shortest path only).
The definition of "distance" on this kind of problems is always tricky.
Imagine that the points are marks on a field, and you can freely walk all over it. Then you could take any path from one point to the other. The shortest route would be a straight line; its length would be the length of the vector that joins the points, which happens to be the difference vector between the two points' positions. This length can be computed with the help of Pythagoras' theorem: dist = sqrt((x2-x1)^2 + (y2-y1)^2). This is known as the Euclidean distance between the points.
Now imagine that you are in a city, and each point is a building. You can't walk over a building, so the only options are to go either up/down or left/right. Then, the shortest distance is given by the sum of the components of the difference vector; which is the mathematical way of saying that "go down 2 blocks and then one block to the left" means walking 3 blocks' distance: dist = abs(x2-x1) + abs(y2-y1). This is known as the Manhattan distance between the points.
In your problem, however, it looks like the only possible move is to jump to an adjacent point in a single step, diagonals allowed. Then the problem gets a bit trickier, because the path is very irregular. You need some graph theory here, which is very useful when modeling problems with linked elements, or "nodes". Each point would be a node, connected to its neighbors, and the problem would be to find the shortest path to another given point. If jumps had different weights (for instance, if jumping diagonally were harder), an easy way to solve this would be Dijkstra's algorithm; more details on implementation at Wikipedia.
If the cost is always the same, then the problem is reduced to counting the number of jumps in a Breadth-First Search of the destination point from the source.
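For the uniform-cost case, here is a small sketch of that BFS count (Python for brevity, since the translation to the question's C is mechanical; the names are my own):

from collections import deque

def jump_distance(rows, cols, start, goal):
    # Count 4-directional moves from start to goal on a rows x cols grid.
    dist = {start: 0}
    q = deque([start])
    while q:
        r, c = q.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    return -1  # unreachable

# jump_distance(3, 3, (0, 0), (2, 2)) == 4, matching "1 to 9 = 4".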
Let's define the 'jump' distance: "the number of hops required to reach Point B [Bx,By] from Point A [Ax,Ay]."
Now there can be two ways in which the hops are allowed :
1. Horizontally/Vertically: In this case, you can go up/down or left/right. As you have to travel the X axis and Y axis independently, your answer is: jumpDistance = abs(Bx - Ax) + abs(By - Ay);
2. Horizontally/Vertically and also Diagonally: In this case, you can go up/down or left/right and diagonally as well. How it differs from Case 1 is that now you have the ability to change your X axis and Y axis together at the cost of only one jump. Your answer now is: jumpDistance = max(abs(Bx - Ax), abs(By - Ay));
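Both closed forms side by side as a small Python sketch (the max-based metric in Case 2 is known as the Chebyshev distance):

def manhattan(a, b):   # Case 1: horizontal/vertical moves only
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def chebyshev(a, b):   # Case 2: diagonal moves allowed as well
    return max(abs(b[0] - a[0]), abs(b[1] - a[1]))

# In the 3x3 example, 1 is at (0, 0) and 9 is at (2, 2):
# manhattan((0, 0), (2, 2)) == 4 and chebyshev((0, 0), (2, 2)) == 2.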
What is the definition of "jump distance"?
If you mean how many jumps a man needs to go from square M to N, and he can only jump vertically and horizontally, one possibility is:
dist = abs(x2 - x1) + abs(y2 - y1);
For example, the jump distance between 1 and 9 is: |3-1| + |3-1| = 4
There are two ways to calculate jump distance.
1) When only horizontal and vertical movements are allowed, all you need to do is form a rectangle between the two points and calculate the lengths of two adjacent sides. For example, to move from 1 to 9, first move from 1 to 3 and then from 3 to 9. (Convert it to code.)
2) When movements in all eight directions are allowed, things get tricky. Say you want to move from 1 to 6. You'll have to move from 1 to 5, and then from 5 to 6. The way to do this in code is to find the maximum of the differences in the x and y coordinates. In this example, the difference in the x coordinate is 2 (3-1) and in the y coordinate it is 1 (2-1). The maximum of these is 2, so that's the answer. (Convert to code.)

How do we find two most-weighted routes in a grid?

We are given a weighted N*N grid W. Two robots r1, r2 start from the top-left and top-right corners respectively. R1 has to reach the bottom-right corner and R2 the bottom-left corner. Here are the valid moves:
R1 moves one square down, R2 moves one square left.
R1 moves one square right, R2 moves one square down.
They must move in such a way that the sum of the weights of the squares they visit (including the starting and ending square) is maximized.
For example, if the grid is:
 6   0   3  -1
 7   4   2   4
-3   3  -2   8
13  10  -1  -4
In this example, if R1 follows the path marked by * and R2 follows the path marked
by ^, as shown below, their combined score is 56.
 6*   0    3^  -1^
 7*   4*^  2^   4
-3    3*^ -2*   8*
13^  10^  -1   -4*
It can be verified that this is the best combined score that can be achieved for this grid.
We cannot solve this by plain recursion, as N <= 2500 and the time limit is 2 seconds.
If the problem had only one robot, we could solve this by using dynamic programming.
I tried using a similar approach for this problem. We have two extra N*N grids G1 and G2:

for i from 1 to N:
    for j from 1 to N and k from N down to 1:
        if (G1[i-1][j] + G2[i][k+1]) > (G1[i][j-1] + G2[i-1][k]):
            G1[i][j] = G1[i-1][j] + W[i][j]
            G2[i][k] = G2[i][k+1] + W[i][k]
        else:
            G1[i][j] = G1[i][j-1] + W[i][j]
            G2[i][k] = G2[i-1][k] + W[i][k]
return G1[N][N] + G2[N][1]
but this gives a wrong answer.
I am not able to understand what is wrong with my algorithm, because for each square it calculates the max-weighted path to reach there.
Can anyone tell me what is wrong with my method, and how I can correct it to get the right answer?
It is a graph problem, and the graph is G=(V,E) where:
V = Squares x Squares (the Cartesian product of all the squares)
(You might want to exclude points where (x1,y1) = (x2,y2); that is easily done.)
E = { all possible ways to move in a turn } = (or formally) =
{ (((x1,y1),(x2,y2)), ((x1+1,y1),(x2,y2-1))), (((x1,y1),(x2,y2)), ((x1,y1+1),(x2+1,y2))) | for all x1,y1,x2,y2 }
(i.e., in one turn either R1 moves down while R2 moves left, or R1 moves right while R2 moves down; coordinates here are (row, column))
Now, once we have the graph, we can see it is actually a DAG. This is a good thing, because for a general graph the longest path problem is NP-hard, but not for a DAG.
Now, given this DAG, we want to find the longest path from ((0,0),(0,n)) to ((n,n),(n,0)), and that can be done with the following approach:
For simplicity define weight((x1,y1),(x2,y2)) = weight(x1,y1) + weight(x2,y2):
The algorithm:
Use topological sort on the graph
Init D((n,n),(n,0)) = weight((n,n),(n,0)) (the target node)
Iterate the vertices in reverse topological order, and for each vertex v do:
D(v) = max { D(u) | for each (v,u) in E as described above } + weight(v)
When you are done, D((0,0),(0,n)) will hold the optimal result.
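For completeness, here is a Python sketch of this longest-path DP processed in topological order by turn number (the formulation is mine: i counts the "R1 down, R2 left" turns taken so far, which pins both robots' positions, so the state collapses to (k, i)). It runs in O(N^2) time and O(N) memory, comfortably inside the limits for N = 2500, and on the 4x4 example above it returns 56.

def best_combined(W):
    # After k turns, R1 is at (i, k - i) and R2 is at (k - i, n - 1 - i).
    n = len(W)
    NEG = float("-inf")
    dp = [NEG] * n
    dp[0] = W[0][0] + W[0][n - 1]           # k = 0: the two starting squares
    for k in range(1, 2 * n - 1):
        new = [NEG] * n
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            prev = dp[i]                    # last turn was (right, down)
            if i > 0 and dp[i - 1] > prev:  # last turn was (down, left)
                prev = dp[i - 1]
            if prev > NEG:
                new[i] = prev + W[i][k - i] + W[k - i][n - 1 - i]
        dp = new
    return dp[n - 1]                        # R1 at (n-1, n-1), R2 at (n-1, 0)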
I can see a typo in the 2nd valid-move scenario.
The valid 2nd scenario should be
R1 moves one square right, R2 moves one square down
but was given as
R2 moves one square right, R2 moves one square down

2D Array neighboring algorithm

I have a 2D array like this:
0,1,0,0,1
1,0,1,0,1
0,1,1,0,1
0,1,0,1,1
1,1,0,0,1
If we extract the coordinates of all the 1's we get:
(height,width)
1,2
1,5
2,1
...
So now I want to find the areas formed by neighboring 1's (not diagonal ones).
In order to do this, I need a way to check the neighbours of neighbours. I've been thinking about using two arrays and swapping the neighbours of one neighbour to one array and then to the other, but it isn't a very efficient way, especially when it comes to processing a big array. Is there a better solution to this problem?
Thanks
There are many such methods; they are referred to as connected-component labeling. Here are some of them that are not so old (in no particular order):
Light Speed Labeling For RISC Architectures, 2009
Optimizing Two-pass Connected-Component Labeling Algorithms, 2009
A Linear-time Component-Labeling Algorithm Using Contour Tracing Technique, 2004
The second method is referred to as "Wu's algorithm" in the literature (they actually cite an older paper, but the algorithm presented there is the same), and it is regarded as one of the fastest for this task. Using flood fill is certainly one of the last methods you would want to use, as it is very slow compared to any of these. Wu's algorithm is a two-pass labeling based on the union-find data structure with path compression, and it is relatively easy to implement. Since the paper deals with 8-connectivity, I'm including sample code for handling 4-connectivity (which your question is about).
The code for the union-find structure is taken as is from the paper, but you will find similar code in about every text you read about this data structure.
def set_root(e, index, root):
    # Set all nodes to point to a new root.
    while e[index] < index:
        e[index], index = root, e[index]
    e[index] = root

def find_root(e, index):
    # Find the root of the tree from node index.
    root = index
    while e[root] < root:
        root = e[root]
    return root

def union(e, i, j):
    # Combine two trees containing node i and j.
    # Return the root of the union.
    root = find_root(e, i)
    if i != j:
        root_j = find_root(e, j)
        if root > root_j:
            root = root_j
        set_root(e, j, root)
    set_root(e, i, root)
    return root

def flatten_label(e):
    # Flatten the Union-Find tree and relabel the components.
    label = 1
    for i in xrange(1, len(e)):
        if e[i] < i:
            e[i] = e[e[i]]
        else:
            e[i] = label
            label += 1
For simplicity I assume the array is padded with zeroes at the top and left sides.
def scan(a, width, height): # 4-connected
    l = [[0 for _ in xrange(width)] for _ in xrange(height)]
    p = [0] # Parent array.
    label = 1
    # Assumption: 'a' has been padded with zeroes (the bottom and right parts
    # do not require padding).
    for y in xrange(1, height):
        for x in xrange(1, width):
            if a[y][x] == 0:
                continue
            # Decision tree for 4-connectivity.
            if a[y - 1][x]: # b
                if a[y][x - 1]: # d
                    l[y][x] = union(p, l[y - 1][x], l[y][x - 1])
                else:
                    l[y][x] = l[y - 1][x]
            elif a[y][x - 1]: # d
                l[y][x] = l[y][x - 1]
            else:
                # New label.
                l[y][x] = label
                p.append(label)
                label += 1
    return l, p
So initially you have an array a which you pass to this function scan. This is the first labeling pass. To resolve the labels, you simply call flatten_label(p). Then the second labeling pass is a trivial one:
for y in xrange(height):
    for x in xrange(width):
        l[y][x] = p[l[y][x]]
Now your 4-connected components have been labeled, and max(p) gives how many of those you have. If you read the paper along this code you should have no trouble understanding it. The syntax is from Python, if you have any doubt about its meaning, feel free to ask.
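For concreteness, a small driver for the array in the question (my own glue code, following the zero-padding assumption above):

data = [[0,1,0,0,1],
        [1,0,1,0,1],
        [0,1,1,0,1],
        [0,1,0,1,1],
        [1,1,0,0,1]]
a = [[0] * 6] + [[0] + row for row in data]  # pad top row and left column
l, p = scan(a, 6, 6)
flatten_label(p)
for y in xrange(6):
    for x in xrange(6):
        l[y][x] = p[l[y][x]]
print max(p)  # number of 4-connected regions (4 for this array)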
If my understanding of your question is right, you can use flood fill to solve the problem.

Uniformly sampling on hyperplanes

Given the vector size N, I want to generate a vector <s1, s2, ..., sn> such that s1 + s2 + ... + sn = S.
S is known, with 0 < S < 1, and each si < S. The vectors generated should also be uniformly distributed.
Any code in C that helps explain would be great!
The code here seems to do the trick, though it's rather complex.
I would probably settle for a simpler rejection-based algorithm, namely: pick an orthonormal basis in n-dimensional space starting with the hyperplane's normal vector. Transform each of the points (S,0,0,...,0), (0,S,0,...,0), etc. into that basis and store the minimum and maximum along each of the basis vectors. Sample each component in the new basis uniformly, except for the first one (along the normal vector), which is constant on the hyperplane, then transform back to the original space and check whether the constraints are satisfied. If they are not, sample again.
P.S. I think this is more of a maths question, actually, could be a good idea to ask at http://maths.stackexchange.com or http://stats.stackexchange.com
[I'll skip the "hyper-" prefix for simplicity]
One possible idea: generate many uniformly distributed points in some enclosing volume and project them onto the target part of the plane.
To get a uniform distribution, the volume must be shaped like the part of the plane but with margins added along the plane's normal.
To uniformly generate points in such a volume we can enclose it in a cube and reject everything outside of the volume.
1. Select a margin; let's take margin M = S for simplicity (as long as the margin is positive it affects only performance).
2. Generate a point in the cube [-M,S+M]x[-M,S+M]x[-M,S+M].
3. If the distance from the point to the plane is more than M, reject the point and go to #2.
4. Project the point onto the plane.
5. Check that the projection falls into [0,S]x[0,S]x[0,S]; if not, reject it and go to #2.
6. Add this point to the resulting set, and go to #2 if you need more points.
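A small Python sketch of these steps for the 3D case (the names are mine, and the margin is M = S as suggested; the plane is x + y + z = S with unit normal (1,1,1)/sqrt(3)):

import math
import random

def sample_on_plane(S, M=None):
    M = S if M is None else M
    n = (1 / math.sqrt(3.0),) * 3                 # unit normal of x+y+z = S
    while True:
        p = [random.uniform(-M, S + M) for _ in range(3)]  # point in the cube
        dist = (sum(p) - S) / math.sqrt(3.0)      # signed distance to the plane
        if abs(dist) > M:
            continue                              # outside the slab: reject
        q = [pi - dist * ni for pi, ni in zip(p, n)]  # project onto the plane
        if all(0 <= qi <= S for qi in q):
            return q                              # falls in [0,S]^3: accept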
The problem can be mapped to that of sampling on linear polytopes, for which the common approaches are Monte Carlo methods, random walks, and hit-and-run methods (see https://www.jmlr.org/papers/volume19/18-158/18-158.pdf for examples and a short comparison). It is related to linear programming and can be extended to manifolds.
There is also the analysis of polytopes in compositional data analysis, e.g. https://link.springer.com/content/pdf/10.1023/A:1023818214614.pdf, which provides an invertible transformation between the plane and the polytope that can be used for sampling.
If you are working in low dimensions, you can also use rejection sampling. This means you first sample on the plane containing the polytope (defined by your inequalities). This latter method is easy to implement (and wasteful, of course); the GNU Octave code below is an example (I leave it to the author of the question to re-implement it in C).
The first requirement is to get a vector orthogonal to the hyperplane. For a sum of N variables this is n = (1,...,1). The second requirement is a point on the plane. For your example that could be p = (S,...,S)/N.
Now any point x on the plane satisfies n^T * (x - p) = 0.
We also assume that x_i >= 0.
With these given, you compute an orthonormal basis on the plane (the null space of the vector n) and then create random combinations of that basis. Finally you map back to the original space and apply your constraints to the generated samples.
# Example in 3D
dim = 3;
S = 1;
n = ones(dim, 1); # perpendicular vector
p = S * ones(dim, 1) / dim;
# null-space of the perpendicular vector (transposed, i.e. row vector)
# this generates a basis in the plane
V = null (n.');
# These steps are just to reduce the amount of samples that are rejected
# we build a tight bounding box
bb = S * eye(dim); # each column is a corner of the constrained region
# project on the null-space
w_bb = V \ (bb - repmat(p, 1, dim));
wmin = min (w_bb(:));
wmax = max (w_bb(:));
# random combinations and map back
nsamples = 1e3;
w = wmin + (wmax - wmin) * rand(dim - 1, nsamples);
x = V * w + p;
# mask the points inside the polytope
msk = true(1, nsamples);
for i = 1:dim
  msk &= (x(i,:) >= 0);
endfor
x_in = x(:, msk); # inside the polytope (your samples)
x_out = x(:, !msk); # outside the polytope
# plot the results
scatter3 (x(1,:), x(2,:), x(3,:), 8, double(msk), 'filled');
hold on
plot3(bb(1,:), bb(2,:), bb(3,:), 'xr')
axis image
