Check that smaller cubes fill bigger cube - c

Given one large cube (axis-aligned and on integer coordinates) and many smaller cubes (also axis-aligned and on integer coordinates), how can we check that the large cube is perfectly filled by the smaller cubes?
Currently we check that:
1. Each small cube is fully contained by the large cube.
2. It doesn't intersect any other small cube.
3. The sum of the volumes of the small cubes equals the volume of the large cube.
This is ok for small numbers of cubes but we need to support this test of cubes with dimensions greater than 2^32. Even at 2^16 the number of small cubes required to fill the large cube is large enough that step 2 takes a while (O(n^2) checking each cube intersects no other).
Is there a better algorithm?
EDIT:
There seems to be some confusion over this. I am not trying to split a cube into smaller cubes. That's already done. Part of our program splits large OpenCL ranges (axis aligned cubes on integer coordinates) into lots of smaller ranges that fit into a hardware job.
What I'm doing is hooking into this system and checking that the jobs it produces correctly cover the large initial range. My algorithm above works, but it's slow and given the amount of tests we have to run I'd like to keep these tests as fast as possible.
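For reference, the three checks described in the question can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the `Box` type, the half-open coordinate convention, and all function names are assumptions made for illustration, and the volume is assumed to fit in 64 bits (ranges beyond 2^32 per axis would need wider arithmetic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical box type: half-open, lo[i] <= coordinate < hi[i]. */
typedef struct {
    int64_t lo[3];
    int64_t hi[3];
} Box;

/* Check 1: true if inner lies entirely inside outer. */
static bool box_contains(const Box *outer, const Box *inner)
{
    for (int i = 0; i < 3; i++)
        if (inner->lo[i] < outer->lo[i] || inner->hi[i] > outer->hi[i])
            return false;
    return true;
}

/* Check 2: true if the boxes share volume (touching faces do not count). */
static bool box_overlaps(const Box *a, const Box *b)
{
    for (int i = 0; i < 3; i++)
        if (a->hi[i] <= b->lo[i] || b->hi[i] <= a->lo[i])
            return false;
    return true;
}

/* Assumes the product fits in 64 bits. */
static uint64_t box_volume(const Box *b)
{
    uint64_t v = 1;
    for (int i = 0; i < 3; i++)
        v *= (uint64_t)(b->hi[i] - b->lo[i]);
    return v;
}

/* Runs all three checks; the inner loop is the O(n^2) part. */
static bool fills_exactly(const Box *big, const Box *small, size_t n)
{
    assert(big != NULL);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (!box_contains(big, &small[i]))
            return false;
        for (size_t j = i + 1; j < n; j++)
            if (box_overlaps(&small[i], &small[j]))
                return false;
        total += box_volume(&small[i]);
    }
    return total == box_volume(big);   /* check 3 */
}
```

Two half-cubes that split the large cube along x pass this test, while a single half-cube or two copies of the same half-cube fail it.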

We are talking about 3D right?
For 2D one can do a similar (but simpler) process (with, I believe, an O(n log n) running time algorithm).
The basic idea of the below is the sweep-line algorithm.
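Note that rectangle intersection can mostly be done by checking whether any corner of either rectangle is contained in the other. This misses cross-shaped overlaps, where no corner of either rectangle lies inside the other, so comparing the intervals on both axes is safer.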
You can improve on (2) as follows:
Split each cube into 2 rectangles on the y-z plane (so you'd have 2 rectangles defined by the same set of 4 (y,z) coordinates, but the x coordinates will be different between the rectangles).
Define the rectangle with the smaller x-coordinate as the start of a cube and the other rectangle as the end of a cube.
Sort the rectangles by x-coordinate
Have an initially empty interval tree
(each interval should also store a reference to the rectangle to which it belongs)
For each rectangle:
Look up the y-coordinate of each point of the rectangle in the interval tree.
For each matching interval, look up its rectangle and check whether the point is also contained within the z-coordinates (this is all that's required because the tree only contains x-coordinates in the correct range and we check the y-coordinates by doing the interval lookup).
If it is, we have overlap.
If the rectangle is the start of a cube, insert the 2 y-coordinates of the rectangle as an interval into the interval tree.
Otherwise, remove the interval defined by the 2 y-coordinates from the tree.
The running time is between O(n log n) (best case, dominated by the sort) and O(n^2) (worst case), depending on how much overlap there is in the x- and y-coordinates (more overlap is worse).
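A simplified version of the sweep above can be sketched as follows. Note that this sketch swaps the interval tree for a plain scan over the boxes that are still "open" in x, so it keeps the sort-by-x idea but not the y-interval lookup. The `Box` type and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical box type: lo is inclusive, hi is exclusive, order (x, y, z). */
typedef struct {
    int64_t lo[3];
    int64_t hi[3];
} Box;

/* qsort comparator: order boxes by the x-coordinate at which they start. */
static int by_x_start(const void *pa, const void *pb)
{
    const Box *a = pa, *b = pb;
    return (a->lo[0] > b->lo[0]) - (a->lo[0] < b->lo[0]);
}

/* True if both the y extents and the z extents of a and b overlap. */
static bool yz_overlap(const Box *a, const Box *b)
{
    return a->lo[1] < b->hi[1] && b->lo[1] < a->hi[1] &&
           a->lo[2] < b->hi[2] && b->lo[2] < a->hi[2];
}

/* Sorts boxes in place, then reports whether any two of them share volume. */
static bool any_overlap(Box *boxes, size_t n)
{
    assert(boxes != NULL || n == 0);
    if (n == 0)
        return false;
    qsort(boxes, n, sizeof *boxes, by_x_start);
    for (size_t i = 0; i < n; i++) {
        /* Only boxes that start before boxes[i] ends can overlap it in x. */
        for (size_t j = i + 1; j < n && boxes[j].lo[0] < boxes[i].hi[0]; j++)
            if (yz_overlap(&boxes[i], &boxes[j]))
                return true;
    }
    return false;
}
```

When the small cubes are thin in x relative to the large cube, the inner loop stops early and most pairs are never compared; when every cube spans the full x range, this degrades to the original O(n^2) behaviour.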

Sort the small cubes by size, largest first.
Insert the biggest small cube into one of the corners of the large cube and split the remaining space into subcubes.
Insert the second biggest small cube into the first subcube that it fits in, and add what remains of that subcube to the set of subcubes.
Continue in the same way until all small cubes are placed.

Another go, again only addressing step 2 in the original question:
Define a space-filling curve with good spatial locality, such as a 3D Hilbert Curve.
For each cube, calculate the pair of coordinates on the curve for the points at which the curve enters and leaves the cube. The space-filling curve will enter and leave some cubes more than once; calculate more than one pair of coordinates for these cases.
You've now got I don't know how many pairs of coordinates, but I'd guess no more than 2^18. These coordinates define intervals along the space-filling curve, so sort them and look for overlaps.
Time complexity is probably dominated by the sort, space complexity is probably quite big.

Related

Generating a set of N random convex disjoint 2D polygons with at most V vertices, and two additional points?

I want to create a set of N random convex disjoint polygons on a plane where each polygon must have at most V vertices, where N and V are parameters for my function, and I'd like to obtain a distribution as close as possible to uniform (every possible set being equally probable). Also I need to randomly select two points on the plane that either match with one of the vertices in the scene or are in empty space (not inside a polygon).
For other reasons, I have already implemented (in the same programming language) an AABB tree and Separating Axis Theorem-based collision detection between convex polygons, and I can generate a random convex polygon with an arbitrary number of vertices inside a circle of given radius. My best bet thus far:
1. Generate a random polygon using the function I have available.
2. Query the AABB tree to test for intersection with existing polygons.
3. If the AABB tree query returns an empty set, I push the generated polygon into it; otherwise I test with SAT against all the other polygons whose AABB overlaps with the generated one's. If SAT returns "no intersection" I push the polygon, otherwise I discard it.
4. Repeat from 1 until N polygons are generated.
5. Generate a random number in {0,1}.
6. If the generated number is 1, I pick a random polygon and a random vertex on it as a point.
7. If the generated number is 0, I generate a random position (x,y) and test whether it falls within some polygon (I might create a tiny AABB around it and exploit the AABB tree to reduce the required number of PiP tests). If it's not inside any polygon I approve it as a valid point, otherwise I repeat from 5.
8. Repeat from 5 once more to get the second point.
I think the solution would possibly work, but unfortunately there's no way to guarantee that I can generate N such polygons for very large N, or find two good points, in an acceptable time, and I'm programming in React, where long operations run on the main thread and block the UI until they end. I could circumvent the issue by ejecting from create-react-app and learning Web Workers, which would probably require more time than it's worth for me.
This is definitely a non-uniform distribution, but perhaps you could begin by generating N points in the plane and then computing the Voronoi diagram for those points. The Voronoi diagram can be computed in O(n log n) time with Fortune's algorithm. The cells of the Voronoi diagram are convex, so you can then construct a random polygon of the desired number of vertices that lies within each cell of the diagram.
[Figure: Voronoi diagram. By Balu Ertl, own work, CC BY-SA 4.0]
Ok, here is another proposal. I have little knowledge of js, but could cook up something in Python.
1. Use Poisson disk sampling with distance parameter d to generate N samples of the centers.
2. For a given center, make a circle with radius R ≤ d/2, so that neighbouring circles cannot overlap.
3. Generate V angles using a Dirichlet distribution such that the sum of the angles is equal to 2π. Sort them.
4. Place vertices on the circle using the angles generated in step 3 and connect them. This would be your polygon.
UPDATE
Instead of using Poisson disk sampling for step 1, one could use Sobol quasi-random sequences. For N points sampled in the 1x1 box (well, you have to scale it afterwards), the least distance between points would be
d = 0.5 * sqrt(D) / N,
where D is the dimension of the problem, 2 in your case. So the radius of the circle for step 2 would be 0.25 * sqrt(2) / N. This ties N and d together nicely.
https://www.sciencedirect.com/science/article/abs/pii/S0378475406002382

Robustly finding the local maximum of an image patch with sub-pixel accuracy

I am developing a SLAM algorithm in C, and I have implemented the FAST corner finding method which gives me some strong keypoints in the image. The next step is to get the center of the keypoints with a sub-pixel accuracy, therefore I extract a 3x3 patch around each of them, and do a Least Squares fit of a two dimensional quadratic:
Where f(x,y) is the corner saliency measure of each pixel, similar to the FAST score proposed on the original paper, but modified to also provide a saliency measure in non corner pixels.
And the least squares:
With being the estimated parameters.
I can now calculate the location of the peak of the fitted quadratic, by taking the gradient equal to zero, achieving my original goal.
The issue arises on some corner cases, where the local peak is closer to the edge of the window, resulting in a fit with low residuals but a peak of the quadratic way outside the window.
An example:
The corner saliency and a contour of the fitted quadratic:
The saliency (blue) and fit (red) as 3D meshes:
Numeric values of this example are (row-major ordering):
[336, 522, 483, 423, 539, 153, 221, 412, 234]
The resulting sub-pixel center, (2.6, -17.1), is wrong.
How can I constrain the fit so the center is within the window?
I'm open to alternative methods for finding the sub pixel peak.
The obvious answer is to reject 3x3 (or 5x5, whatever you use) boxes whose discrete maximum is not at the center. In other words, to use a quadratic approximation only to refine the location of a maximum that must be located inside the box.
More generally, in such cases the first question to ask is not "How do I constrain my model-fitting procedure to shoehorn a solution for this edge case?", but rather "Does my model apply to this edge case?" and "Is this edge case even worth spending time on, or can I just ignore it?"
I tried my own code to fit a 2D quadratic function to the 3x3 values, using a stable least-squares solving algorithm, and also found a maximum outside of the domain. The 3x3 patch of data does not match a quadratic function, and therefore the fit is not useful.
Fitting a 2D quadratic to a 3x3 neighborhood requires a degree of smoothness in the data that you don't seem to have in your FAST output.
There are many other methods to find the sub-pixel location of the maximum. One that I like because it is more stable and less computationally intensive is the fitting of a "separable" quadratic function. In short, you fit a quadratic function to the three values around the local maximum in one dimension, and then another in the other dimension. Instead of solving 6 parameters with 9 values, this solves 3 parameters with 3 values, twice. The solution is guaranteed stable, as long as the center pixel is larger than or equal to all pixels in the 4-connected neighborhood.
z1 = [f(-1,0), f(0,0), f(1,0)]^T
    [1, -1, 1]
X = [0,  0, 1]
    [1,  1, 1]
solve: X b1 = z1
and
z2 = [f(0,-1), f(0,0), f(0,1)]^T
    [1, -1, 1]
X = [0,  0, 1]
    [1,  1, 1]
solve: X b2 = z2
Each row of X is [x^2, x, 1] evaluated at x = -1, 0, 1, so each solution vector holds the coefficients [a2, a1, a0] of a2*x^2 + a1*x + a0, whose peak lies at -a1/(2*a2). You get the x-coordinate of the peak from b1 and the y-coordinate from b2.
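Because the three sample positions are fixed at -1, 0 and 1, each 3x3 system has a closed-form solution, so no linear solver is needed. A minimal sketch, assuming a row-major 3x3 patch with x running along rows (function names are made up for illustration):

```c
#include <assert.h>

/*
 * Offset of the vertex of the parabola through (-1, fm), (0, f0), (1, fp).
 * Returns 0 when the three samples are collinear (no curvature).
 */
static double parabola_peak(double fm, double f0, double fp)
{
    double curvature = fm - 2.0 * f0 + fp;   /* equals 2*a2 */
    if (curvature == 0.0)
        return 0.0;
    return 0.5 * (fm - fp) / curvature;      /* equals -a1 / (2*a2) */
}

/*
 * patch is 3x3, row-major. The centre must be >= its 4-connected
 * neighbours, which keeps both offsets within [-0.5, 0.5].
 */
static void subpixel_peak(const double patch[9], double *dx, double *dy)
{
    assert(patch[4] >= patch[1] && patch[4] >= patch[3] &&
           patch[4] >= patch[5] && patch[4] >= patch[7]);
    *dx = parabola_peak(patch[3], patch[4], patch[5]);   /* middle row */
    *dy = parabola_peak(patch[1], patch[4], patch[7]);   /* middle column */
}
```

For the patch in the question, [336, 522, 483, 423, 539, 153, 221, 412, 234], this gives an offset of roughly (-0.27, -0.38) from the centre pixel, which stays inside the window.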

Algorithm for best fit rectangle

I'm looking for an algorithm to do a best fit of an arbitrary rectangle to an unordered set of points. Specifically, I'm looking for a rectangle where the sum of the distances of the points to any one of the rectangle edges is minimised. I've found plenty of best fit line, circle and ellipse algorithms, but none for a rectangle. Ideally, I'd like something in C, C++ or Java, but not really that fussy on the language.
The input data will typically be comprised of most points lying on or close to the rectangle, with a few outliers. The distribution of data will be uneven, and unlikely to include all four corners.
Here are some ideas that might help you.
We can estimate if a point is on an edge or on a corner as follows:
Collect the point's n nearest neighbours
Calculate the points' centroid
Calculate the points' covariance matrix as follows:
Start with Covariance = ((0, 0), (0, 0))
For each point calculate d = point - centroid
Covariance += outer_product(d, d)
Calculate the covariance's eigenvalues. (e.g. with SVD)
Classify point:
if one eigenvalue is large and the other very small, we are probably on an edge
otherwise we should be on a corner
Extract all corner points and do a segmentation. Choose the four segments with most entries. The centroid of those segments are candidates for the rectangle's corners.
Calculate the normalized direction vectors of two opposite sides and calculate their mean. Calculate the mean of the other two opposite sides. These are the direction vectors of a parallelogram. If you want a rectangle, calculate a perpendicular vector to one of those directions and calculate the mean with the other direction vector. Then the rectangle's directions are the mean vector and a perpendicular vector.
In order to calculate the corners, you can project the candidates on their directions and move them so that they form the corners of a rectangle.
The idea of a line of best fit is to compute the vertical distances between your points and the line y=ax+b. Then you can use calculus to find the values of a and b that minimize the sum of the squares of the distances. The reason squaring is chosen over absolute value is because the former is differentiable at 0.
If you were to try the same approach with a rectangle, you would run into the problem that the square of the distance to the side of a rectangle is a piecewise defined function with 8 different pieces and is not differentiable when the pieces meet up inside the rectangle.
In order to proceed, you'll need to decide on a function that measures how far a point is from a rectangle that is everywhere differentiable.
Here's a general idea. Make a grid with smallish cells; calculate the best fit line for each not-too-empty cell (the calculation is immediate, see footnote 1; there's no search involved). Join adjacent cells while making sure the standard deviation is improving/not worsening much. Thus we detect the four sides and the four corners, and divide our points into four groups, each belonging to one of the four sides.
Next, we throw away the corner cells, put the true rectangle in place of the four approximate lines and do a bit of hill climbing (or whatever). The calculation of the best fit line may be augmented for this case, since the two lines are parallel, and we've already separated our points into the four groups (for a given rectangle, we know the delta-y between the two opposing sides (taking horizontal-ish sides for a moment), so we just add this delta-y to the ys of the lower group of points and make the calculation).
The initial rectangular grid may be replaced with working by stripes (say, vertical). Then, at least half of the stripes will have two pronounced groupings of points (find them by dividing each stripe by horizontal division lines into cells).
Footnote 1: For a line Y = a*X+b, minimize the sum of squares of perpendicular distances of data points {xi,yi} to that line. This is directly solvable for a and b. For more vertical lines, flip the Xs and the Ys.
P.S. I interpret the problem as minimizing the sum of squares of perpendicular distances of each point to its nearest side of the rectangle, not to all the rectangle's sides.
I am not completely sure, but you might try working with the first 2 (or 3?) principal components from a PCA of your points. It should be reasonably fast in most cases.

About curse of dimensionality

My question is about this topic I've been reading about a bit. Basically my understanding is that in higher dimensions all points end up being very close to each other.
The doubt I have is whether this means that calculating distances the usual way (euclidean for instance) is valid or not. If it were still valid, this would mean that when comparing vectors in high dimensions, the two most similar wouldn't differ much from a third one even when this third one could be completely unrelated.
Is this correct? Then in this case, how would you be able to tell whether you have a match or not?
Basically the distance measurement is still correct, however, it becomes meaningless when you have "real world" data, which is noisy.
The effect we talk about here is that a high distance between two points in one dimension gets quickly overshadowed by small distances in all the other dimensions. That's why in the end, all points somewhat end up with the same distance. There exists a good illustration for this:
Say we want to classify data based on their value in each dimension. We just say we divide each dimension once (which has a range of 0..1). Values in [0, 0.5) are positive, values in [0.5, 1] are negative. With this rule, in 3 dimensions, 12.5% of the space is covered. In 5 dimensions, it is only 3.1%. In 10 dimensions, it is less than 0.1%.
So in each dimension we still allow half of the overall value range, which is quite a lot. But all of it ends up in 0.1% of the total space: the differences between these data points are huge in each dimension, but negligible over the whole space.
You can go further and say in each dimension you cut only 10% of the range. So you allow values in [0, 0.9). You still end up with less than 35% of the whole space covered in 10 dimensions. In 50 dimensions, it is 0.5%. So you see, wide ranges of data in each dimension are crammed into a very small portion of your search space.
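The percentages above are just the kept fraction per axis raised to the number of dimensions. A tiny helper (the name `covered_fraction` is made up for illustration) makes the arithmetic easy to reproduce:

```c
#include <assert.h>

/* Fraction of the unit hypercube that remains when each of `dims` axes
 * keeps the fraction `kept` of its range, i.e. kept^dims. */
static double covered_fraction(double kept, int dims)
{
    assert(kept >= 0.0 && kept <= 1.0 && dims >= 0);
    double fraction = 1.0;
    for (int d = 0; d < dims; d++)
        fraction *= kept;
    return fraction;
}
```

For example, `covered_fraction(0.5, 3)` is 0.125, `covered_fraction(0.5, 10)` is about 0.00098, `covered_fraction(0.9, 10)` is about 0.349, and `covered_fraction(0.9, 50)` is about 0.0052.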
That's why you need dimensionality reduction, where you basically disregard differences on less informative axes.
Here is a simple explanation in layman terms.
I tried to illustrate this with a simple illustration shown below.
Suppose you have some data features x1 and x2 (you can assume they are blood pressure and blood sugar levels) and you want to perform K-nearest neighbor classification. If we plot the data in 2D, we can easily see that the data nicely group together, each point has some close neighbors that we can use for our calculations.
Now let's say we decide to consider a new third feature x3 (say age) for our analysis.
Case (b) shows a situation where all of our previous data comes from people the same age. You can see that they are all located at the same level along the age (x3) axis.
Now we can quickly see that if we want to consider age for our classification, there is a lot of empty space along the age(x3) axis.
The data that we currently have only cover a single level for the age. What happens if we want to make a prediction for someone of a different age (red dot)?
As you can see, there are not enough data points close to this point to calculate the distance and find some neighbors. So, if we want to have good predictions with this new third feature, we have to go and gather more data from people of different ages to fill the empty space along the age axis.
Case (c) essentially shows the same concept. Here, assume our initial data were gathered from people of different ages (i.e. we did not care about age in our previous two-feature classification task and might have assumed that this feature does not have an effect on our classification).
In this case, assume our 2D data come from people of different ages (the third feature). Now, what happens to our relatively closely located 2D data if we plot them in 3D? We can see that they are now more distant from each other (more sparse) in our new higher-dimensional space. As a result, finding the neighbors becomes harder, since we don't have enough data for different values along our new third feature.
You can imagine that as we add more dimensions the data become more and more spread apart. (In other words, we need more and more data if we want to avoid sparsity in our data.)

Spatial Data Structures in C

I do work in theoretical chemistry on a high performance cluster, often involving molecular dynamics simulations. One of the problems my work addresses involves a static field of N-dimensional (typically N = 2-5) hyper-spheres that a test particle may collide with. I'm looking to optimize (read: overhaul) the data structure I use for representing the field of spheres so I can do rapid collision detection. Currently I use a dead simple array of pointers to an N-membered struct (doubles for each coordinate of the center) and a nearest-neighbor list. I've heard of oct- and quad-trees but haven't found a clear explanation of how they work, how to efficiently implement one, or how to then do fast collision detection with one. Given the size of my simulations, memory is (almost) no object, but cycles are.
How best to approach this for your problem depends on several factors that you have not described:
- Will the same hypersphere arrangement be used for many particle collision calculations?
- Are the hyperspheres uniform size?
- What is the movement of the particle (e.g. straight line/curve) and is that movement affected by the spheres?
- Do you consider the particle to have zero volume?
I assume that the particle does not have simple straight line movement as that would be the relatively fast calculation of finding the closest point between a line and a point, which is likely going to be about the same speed as finding which of the boxes the line intersects with (to determine where in the n-tree to examine).
If your hypersphere positions are fixed for a lot of particle collisions then computing a voronoi decomposition/Dirichlet tessellation would give you a fast way of later finding exactly which sphere is closest to your particle for any given point in the space.
However to answer your original question about octrees/quadtrees/2^n-trees, in n dimensions you start with a (hyper)-cube that contains the area of space that you are interested in. This will be subdivided into 2^n hypercubes if you deem the contents to be too complicated. This continues recursively until you have only simple elements (e.g. one hypersphere centroid) in the leaf nodes.
Now that the n-tree is built you use it for collision detection by taking the path of your particle and intersecting it with the outer hypercube. The intersection position will tell you which hypercube in the next level down of the tree to visit next, and you determine the position of intersection with all 2^n hypercubes at that level, following downwards until you reach a leaf node. Once you reach the leaf you can examine interactions between your particle path and the hypersphere stored at that leaf. If you have collision you have finished, otherwise you have to find the exit point of the particle path from the current hypercube leaf and determine which hypercube it moves to next. Continue until you find a collision or entirely leave the overall bounding hypercube.
Efficiently finding the neighbouring hypercube when exiting a hypercube is one of the most challenging parts of this approach. For 2^n trees Samet's approaches {1, 2} can be adapted. For kd-trees (binary trees) an approach is suggested in {3} section 4.3.3.
Efficient implementation can be as simple as storing a list of 2^n pointers (8 for an octree) from each hypercube to its child hypercubes, and marking the hypercube in a special way if it is a leaf (e.g. make all pointers NULL).
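A node layout along those lines might look like the sketch below. Everything here is an assumption for illustration: the dimension is fixed at compile time via `DIMS`, the field and function names are made up, and the children of a node are allocated as one block so that a single `free(node->child[0])` releases them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DIMS 3                       /* assumption: fixed at compile time */
#define CHILDREN (1u << DIMS)        /* 2^n children per node */

typedef struct Node {
    double centre[DIMS];             /* centre of this hypercube */
    double half;                     /* half of its edge length */
    struct Node *child[CHILDREN];    /* all NULL marks a leaf */
    const double *sphere;            /* sphere centre stored in a leaf, or NULL */
} Node;

/* Bit d of the result is set when p lies on the upper side of axis d. */
static unsigned child_index(const Node *node, const double *p)
{
    unsigned index = 0;
    for (int d = 0; d < DIMS; d++)
        if (p[d] >= node->centre[d])
            index |= 1u << d;
    return index;
}

/* Splits a leaf into 2^n children; returns 0 on success, -1 on failure.
 * calloc's zero fill is assumed to yield NULL pointers, as it does on
 * mainstream platforms. */
static int subdivide(Node *node)
{
    assert(node != NULL && node->child[0] == NULL);
    Node *kids = calloc(CHILDREN, sizeof *kids);
    if (kids == NULL)
        return -1;
    for (unsigned c = 0; c < CHILDREN; c++) {
        for (int d = 0; d < DIMS; d++) {
            double step = ((c >> d) & 1u) ? 0.5 * node->half
                                          : -0.5 * node->half;
            kids[c].centre[d] = node->centre[d] + step;
        }
        kids[c].half = 0.5 * node->half;
        node->child[c] = &kids[c];
    }
    return 0;
}

/* Descends from node to the leaf whose hypercube contains p. */
static const Node *find_leaf(const Node *node, const double *p)
{
    assert(node != NULL && p != NULL);
    while (node->child[0] != NULL)   /* children are created all at once */
        node = node->child[child_index(node, p)];
    return node;
}
```

Encoding the child slot as one bit per axis keeps the descent branch-light, which matters when cycles rather than memory are the constraint.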
A description of dividing space to create a quadtree (which you can generalise to n-tree) can be found in Klinger & Dyer {4}
As others have mentioned, kd-trees may be more suited than 2^n-trees, as extension to an arbitrary number of dimensions is more straightforward; however, they will result in a deeper tree. It is also easier to adapt the split positions to match the geometry of your hyperspheres with a kd-tree. The description above of collision detection in a 2^n tree is equally applicable to a kd-tree.
{1} Connected Component Labeling Using Quadtrees, Hanan Samet, Journal of the ACM, Volume 28, Issue 3 (July 1981)
{2} Neighbor finding in images represented by octrees, Hanan Samet, Computer Vision, Graphics, and Image Processing, Volume 46, Issue 3 (June 1989)
{3} Convex hull generation, connected component labelling, and minimum distance calculation for set-theoretically defined models, Dan Pidcock, 2000
{4} Experiments in picture representation using regular decomposition, Klinger, A., and Dyer, C.R., Computer Graphics and Image Processing 5 (1976), 68-105.
It sounds like you'd want to implement a kd-tree, which would allow you to more quickly search the N-dimensional space. There's some more information and links to implementations at the Stony Brook Algorithm Repository.
Since your field is static (by which I'm assuming you mean that the hyper spheres don't move), then the fastest solution I know of is a Kdtree.
You can either make your own, or use someone else's, like this one:
http://libkdtree.alioth.debian.org/
A Quad tree is a 2 dimensional tree, in which at each level a node has 4 children, each of which covers 1/4 of the area of the parent node.
An Oct tree is a 3 dimensional tree, in which at each level a node has 8 children, each of which contains 1/8th of the volume of the parent node. Here is a picture to help you visualize it: http://en.wikipedia.org/wiki/Octree
If you're doing N dimensional intersection tests, you could generalize this to an N tree.
Intersection algorithms work by starting at the top of the tree and recursively traversing into any child nodes that intersect the object being tested. At some point you get to leaf nodes, which contain the actual objects.
An octree will work as long as you can specify the spheres by their centres - it hierarchically bins points into cubic regions with eight children. Working out neighbours in an octree data structure will require you to do sphere-intersecting-cube calculations (to some extent easier than they look) to work out which cubic regions in an octree are within the sphere.
Finding the nearest neighbours means walking back up the tree until you get a node with more than one populated child and all surrounding nodes included (this ensures the query gets all sides).
From memory, this is the (somewhat naive) basic algorithm for sphere-cube intersection:
i. Is the centre within the cube (this gets the eponymous situation)
ii. Are any of the corners of the cube within radius r of the centre (corners within the sphere)
iii. For each surface of the cube (you can eliminate some of the surfaces by working out which side of the surface the centre lies on) work out (this is all first-year vector arithmetic):
a. A normal of the surface that goes to the centre of the sphere
b. The distance from the centre of the sphere to the intersection of the normal with the plane of the surface (the chord intersects the plane of the cube's surface)
c. Intersection of the plane lies within the side of the cube (one condition of chord intersection to the cube)
iv. Calculate the size of the chord (Sin of Cos^-1 of ratio of normal length to radius of sphere)
v. If the nearest point on the line is less than the distance of the chord and the point lies between the ends of the line the chord intersects one of the edges of the cube (chord intersects cube surface somewhere along one of the edges).
Slightly dimly remembered, but this is something I did for a situation involving spherical regions using an octree data structure (many years ago). You may also wish to check out KD-trees as some of the other posters suggest, but your initial question sounds very similar to what I did.
