I'm assuming the best answer to this question requires using VoronoiDelaunay.jl, but I'm also open to other packages/approaches.
If I have a set of points in 2D (or 3D, though I'm not sure that case is supported by VoronoiDelaunay.jl), what is the fastest way to get each point's nearest neighbors in the Voronoi-tessellation sense (i.e. the neighbors within the 'first Voronoi shell')? I am also not entirely confident about the mathematics behind this or how it relates to Delaunay triangulation.
The data structure doesn't matter too much to me, but let's assume the data is stored in a 2D array of type Array{Float64,2} called my_points, whose size is (nDims, nPoints), where nDims is 2 or 3 and nPoints is the number of points. Let's say I want the output to be an edge list of some kind, e.g. an array of arrays called edge_list (Array{Array{Int64,1}}) where element i of edge_list gives me the indices of the points that are Voronoi neighbors of focal point i (whose coordinates are stored in my_points[:,i]).
For every point (the point whose cell is shown in red), I want the coordinates / identities of the points that are its Voronoi neighbors (the points whose cells are shown in orange). The image I have in mind is Figure 1b of this paper: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002678
Related
I am currently working on an implementation of a Local Search algorithm over an array of complex numbers in C. As the figure shows, the points over which I run my algorithm are spread in a circular shape over the complex plane.
Now, the first implementation of my algorithm takes a random point in this space and tries to find its neighbours by computing the modulus of the difference against every other point in the complex plane, running over a 1D complex array.
Once the neighbours have been found, the cost function is evaluated on each of them, only the one that gives the best value is kept, and the process is repeated until the maximum of the cost function has been reached. This method is not really a Local Search, because I could just as well compute the cost function on all the points directly instead of having to find the neighbours first.
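To make that concrete, here is a minimal sketch of that kind of scan (the names are made up, and it keeps only the single nearest point; the same loop can instead collect every point within some radius):

#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Sketch only: find the index of the point closest to pts[current] by
 * scanning the whole 1D array and comparing the modulus of the difference. */
size_t nearest_neighbour(const double complex *pts, size_t n, size_t current)
{
    size_t best = current;
    double best_mod = INFINITY;
    for (size_t i = 0; i < n; i++) {
        if (i == current)
            continue;
        double m = cabs(pts[i] - pts[current]); /* modulus of the difference */
        if (m < best_mod) {
            best_mod = m;
            best = i;
        }
    }
    return best;
}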
As you can see, the current way of doing it is not efficient at all, and therefore I thought of moving to a 2D implementation.
The problem is that I don't know how to sort my complex numbers into a 2D array in which I could look up the neighbours directly instead of running through all the points.
I would like to fill a plane with randomly placed points, check whether any of them overlap (and if they do, move one of them to an empty place) and then calculate the average distance between them. Later I plan on extending that to 3D, so that it is somewhat like having particles in a box.
I know there must be better ways of doing it but here's what I came up with. For placing random points in a plane:
int pos[NUMBER][2]; /* Creates an array of NUMBER amount of points with x and y coordinate */
int a, b;

srand( time(NULL) );
for(a=0;a<NUMBER;a++)
    for(b=0;b<2;b++)
        pos[a][b]=rand()%11; /* Using modulus is random enough for now */
The next stage is finding points that overlap:
for(a=0;a<NUMBER-1;a++)
    for(b=a+1;b<NUMBER;b++)
        if( pos[a][0] == pos[b][0] && pos[a][1] == pos[b][1] )
            printf("These points overlap:\t(%d, %d)\n", pos[a][0], pos[a][1]);
Now when I identify which points overlap I have to move one of them, but when I do, the point in its new position might overlap with one of the earlier ones. Is there an accepted way of solving this problem? One way is an infinite while(true) loop with a breaking condition, but that seems very inefficient, especially when the system gets dense.
Thank you!
Here's a sketch of a solution that I think could work:
Your point generation algorithm is good and can be left as is.
The right time to check for overlap is when the point is generated: simply generate new points until you get one that doesn't overlap with any previous one.
To find overlaps quickly, use a hash table such as the one from glib. The key can be two int32_t packed into an int64_t via a union:
typedef union _Point {
    struct {
        int32_t x;
        int32_t y;
    };
    int64_t hashkey;
} Point;
Use the "iterate over all keys" functionality of your hash table to build the output array.
I haven't been able to test this but it should work. This assumes that the plane is large in relation to the number of points, so that overlaps are less likely. If the opposite is true, you can invert the logic: start with a full plane and add holes randomly.
Average complexity of this algorithm is O(n).
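Here is an untested sketch of the whole thing (assuming glib 2.32 or later for g_hash_table_contains and g_hash_table_add; the constants are placeholders):

#include <glib.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define NUMBER 50
#define RANGE  11   /* coordinates 0..10, as in the question */

typedef union _Point {
    struct {
        int32_t x;
        int32_t y;
    };
    int64_t hashkey;
} Point;

int main(void)
{
    static Point pos[NUMBER];
    /* Keys are pointers to the 64-bit hashkey field of each stored point. */
    GHashTable *seen = g_hash_table_new(g_int64_hash, g_int64_equal);

    srand(time(NULL));
    for (int i = 0; i < NUMBER; i++) {
        do {    /* re-generate until the packed key has not been seen before */
            pos[i].x = rand() % RANGE;
            pos[i].y = rand() % RANGE;
        } while (g_hash_table_contains(seen, &pos[i].hashkey));
        g_hash_table_add(seen, &pos[i].hashkey);
    }

    /* pos[] now holds NUMBER distinct points; alternatively, walk the table's
     * keys with g_hash_table_foreach to build the output array. */
    g_hash_table_destroy(seen);
    return 0;
}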
As you hinted that it should work for high densities as well, the best course of action is to create a 2D array of booleans (or bit vectors if you want to save space), where all elements are set to false initially. Then you loop NUMBER times, generating a random coordinate, and check whether the value in the array is true or not. If true, you generate another random coordinate. If false, you add the coordinate to the list, and set the corresponding element in the array to true.
The above assumes you want exactly NUMBER points, and a completely uniform chance of placing them. If either of those constraints are not necessary, there are other algorithms possible that use much less memory.
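A minimal sketch of that approach (the plane size and NUMBER are placeholder values; NUMBER must not exceed the number of cells or the retry loop never terminates):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE   11   /* plane is SIZE x SIZE, coordinates 0..10 as in the question */
#define NUMBER 40   /* must be <= SIZE * SIZE */

int main(void)
{
    static bool occupied[SIZE][SIZE];   /* all false initially */
    int pos[NUMBER][2];

    srand(time(NULL));
    for (int i = 0; i < NUMBER; i++) {
        int x, y;
        do {                            /* re-try until a free cell is found */
            x = rand() % SIZE;
            y = rand() % SIZE;
        } while (occupied[x][y]);
        occupied[x][y] = true;
        pos[i][0] = x;
        pos[i][1] = y;
    }

    for (int i = 0; i < NUMBER; i++)
        printf("(%d, %d)\n", pos[i][0], pos[i][1]);
    return 0;
}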
One solution is to place points at random, see if they overlap, and re-try on overlap. To avoid testing every point, you need to set up a spatial index: if you have a 100*100 plane and a cut-off of 3-4, you could use 10*10 grid squares. Then you only have to search four grid squares to check that you don't have a hit.
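A rough, untested sketch of that grid-square index (the plane size, cut-off, capacities and names are assumptions of mine; for simplicity it searches the full 3x3 block of squares around the candidate rather than working out exactly which four can be reached):

#include <math.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

#define PLANE   100.0   /* plane is PLANE x PLANE */
#define CELLS   10      /* CELLS x CELLS grid squares */
#define CUTOFF  3.5     /* minimum allowed separation */
#define MAXPTS  200

typedef struct { double x, y; } Pt;

static Pt  pts[MAXPTS];
static int bucket[CELLS][CELLS][MAXPTS];  /* indices of points per square */
static int count[CELLS][CELLS];

static int cell(double v) {
    int c = (int)(v / (PLANE / CELLS));
    return c < 0 ? 0 : (c >= CELLS ? CELLS - 1 : c);
}

/* Does p lie within CUTOFF of any already-placed point? */
static bool overlaps(Pt p) {
    int cx = cell(p.x), cy = cell(p.y);
    for (int gx = cx - 1; gx <= cx + 1; gx++)
        for (int gy = cy - 1; gy <= cy + 1; gy++) {
            if (gx < 0 || gy < 0 || gx >= CELLS || gy >= CELLS) continue;
            for (int k = 0; k < count[gx][gy]; k++) {
                Pt q = pts[bucket[gx][gy][k]];
                if (hypot(p.x - q.x, p.y - q.y) < CUTOFF) return true;
            }
        }
    return false;
}

int main(void) {
    srand(time(NULL));
    int n = 0;
    while (n < MAXPTS) {
        Pt p = { PLANE * rand() / RAND_MAX, PLANE * rand() / RAND_MAX };
        if (overlaps(p)) continue;        /* re-try on overlap */
        int cx = cell(p.x), cy = cell(p.y);
        pts[n] = p;
        bucket[cx][cy][count[cx][cy]++] = n;
        n++;
    }
    return 0;
}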
But there are other ways of doing it. Placing points uniformly on a grid will create a Poisson distribution of counts per slot, so for each slot you can draw a random count from the Poisson distribution. What happens when you get 2 or more? This method forces you to answer that question: maybe you artificially clamp the count to one, maybe you move the extra point into a neighbouring slot. This method won't create exactly N points, so if you must have N, you can put in a fudge (randomly add/remove the last few points).
I'm using 2D arrays to handle objects in a game. The dimensions of the array act like coordinates on a Cartesian grid. When the player selects a point, I want to collect the N nearest grid cells from the array, even if the selected point is not a valid array index.
Example:
An array that goes from [0,0] to [10,10]
If the player selects (-1,-1), N=3, the nearest points would be [(0,0),(0,1),(1,0)]
The brute force approach would be to calculate the Euclidean distance between the selected point and every grid cell, put the results in a list, and then sort it. With very large arrays this might be prohibitively expensive. Is there a greedier way to do this if, for example, I know what N is?
I know that if the selected point is INSIDE the grid, we can use the formula for a circle to get a rough area to check, e.g. N/Pi = R^2. We can then check the square that this R value spans in the x and y dimensions, which is much faster.
But what about when the selected point is near the edge of the grid, or off it entirely? Or if you want to ignore certain grid cells?
I would start by finding the point with fractional co-ordinates (that is, not necessarily integers) which is closest to the given point. If only one co-ordinate is outside the range of the grid then just set this co-ordinate to the nearest in-range co-ordinate. If both co-ordinates are outside the range of the grid, then the closest point will be one of the corners.
Given that starting point you need to find N points with integer co-ordinates. If the closest point was a corner, it already has integer co-ordinates. Otherwise, the closest grid point is one of the grid points on either side of that fractional point.
You know the other N-1 points will be connected to the closest grid point, because you are working out the intersection of two convex shapes - a circle and a rectangle. I would keep a heap of points ordered by distance. Start it off with the neighbours of the closest grid point. Repeatedly remove the point in the heap nearest to the original point outside the grid and put into the heap its neighbours, if they are not there already, until you have extracted the other N-1 points.
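Here is a rough, untested C sketch of that procedure (the grid bounds, the 4-neighbour adjacency and all names are assumptions of mine; the search is seeded with the closest grid cell itself, and out must have room for n cells):

#include <stdbool.h>
#include <string.h>

#define GRID_MAX 10                     /* cells run from (0,0) to (10,10) */

typedef struct { int x, y; double d2; } Cell;

/* Tiny binary min-heap of cells ordered by squared distance d2. */
static Cell heap[(GRID_MAX + 1) * (GRID_MAX + 1)];
static int heap_size;

static void heap_push(Cell c)
{
    int i = heap_size++;
    heap[i] = c;
    while (i > 0 && heap[(i - 1) / 2].d2 > heap[i].d2) {
        Cell t = heap[i]; heap[i] = heap[(i - 1) / 2]; heap[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

static Cell heap_pop(void)
{
    Cell top = heap[0];
    heap[0] = heap[--heap_size];
    for (int i = 0;;) {
        int l = 2 * i + 1, r = 2 * i + 2, m = i;
        if (l < heap_size && heap[l].d2 < heap[m].d2) m = l;
        if (r < heap_size && heap[r].d2 < heap[m].d2) m = r;
        if (m == i) break;
        Cell t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
    return top;
}

static double dist2(double px, double py, int x, int y)
{
    return (px - x) * (px - x) + (py - y) * (py - y);
}

/* Collect the n grid cells nearest to (px, py), which may lie off the grid. */
void nearest_cells(double px, double py, int n, Cell *out)
{
    static bool seen[GRID_MAX + 1][GRID_MAX + 1];
    memset(seen, 0, sizeof seen);
    heap_size = 0;

    /* Clamp to the nearest in-range position, then round to the nearest
     * grid cell: this is the closest cell and seeds the search. */
    double cx = px < 0 ? 0 : (px > GRID_MAX ? GRID_MAX : px);
    double cy = py < 0 ? 0 : (py > GRID_MAX ? GRID_MAX : py);
    int sx = (int)(cx + 0.5), sy = (int)(cy + 0.5);
    seen[sx][sy] = true;
    heap_push((Cell){ sx, sy, dist2(px, py, sx, sy) });

    /* Repeatedly extract the nearest cell and push its unvisited neighbours. */
    int found = 0;
    while (found < n && heap_size > 0) {
        Cell c = heap_pop();
        out[found++] = c;
        const int dx[] = { 1, -1, 0, 0 }, dy[] = { 0, 0, 1, -1 };
        for (int k = 0; k < 4; k++) {
            int nx = c.x + dx[k], ny = c.y + dy[k];
            if (nx < 0 || ny < 0 || nx > GRID_MAX || ny > GRID_MAX) continue;
            if (seen[nx][ny]) continue;
            seen[nx][ny] = true;
            heap_push((Cell){ nx, ny, dist2(px, py, nx, ny) });
        }
    }
}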
I have a two-dimensional array of doubles that implicitly defines values on a two-dimensional bounded integer lattice. Separately, I have n 2D seed points (possibly with non-integer coordinates). I'd like to identify each grid point with its closest seed point, and then sum up the values of the grid points identified with each seed point.
What's the most efficient way to do this with JTS/GeoTools? I've gotten as far as building a Voronoi diagram with VoronoiDiagramBuilder, but I'm not sure how to efficiently assign all the grid points based on it.
The best way to do this depends on the size of n and the number of polygons in your Voronoi diagram, but basically you need to iterate over one of the sets and find the element in the other set that intersects with it.
So assuming that n is less than the number of polygons, I'd do something like:
// features is the collection of Voronoi polygons
// points is the collection of N seed points
Expression propertyName = filterFactory.property(features.getSchema()
        .getGeometryDescriptor()
        .getName());

for (Point p : points) {
    Filter filter = filterFactory.contains(propertyName,
            filterFactory.literal(p));
    SimpleFeatureCollection sub = features.subCollection(filter);
    // sub now contains your polygon
    // do some processing or save its ID
}
If n is larger than the number of polygons, reverse the loops and use within instead of contains to find all the points in each polygon.
Say I need to find the Euclidean distance from one (x,y) coordinate to every coordinate in an array of a million coordinates and then select the coordinate with the smallest distance.
At present I loop through the million-element array, calculating each distance and keeping track of the minimum. Is there a way I could do it differently and faster?
Thanks
You can improve your algorithm significantly by using a more sophisticated data structure, for instance a k-d tree. Still, if all you expect to do is a single nearest-neighbour search, you cannot do better than iterating over all the points.
What you can do, though, is optimize the function that computes the distance, and (as mentioned in the comments) you may omit the square root, since comparing the squares of two non-negative numbers gives the same result as comparing the values themselves.
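As a sketch of such a loop with the square root omitted (the function and parameter names are mine, and n is assumed to be at least 1):

#include <stddef.h>

/* Return the index of the point nearest to (px, py); only squared
 * distances are compared, so no square root is ever taken. */
size_t nearest_index(const double *xs, const double *ys, size_t n,
                     double px, double py)
{
    size_t best = 0;
    double best_d2 = (xs[0] - px) * (xs[0] - px) + (ys[0] - py) * (ys[0] - py);
    for (size_t i = 1; i < n; i++) {
        double dx = xs[i] - px, dy = ys[i] - py;
        double d2 = dx * dx + dy * dy;
        if (d2 < best_d2) {
            best_d2 = d2;
            best = i;
        }
    }
    return best;
}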
What I understand from the question is that you want to find the closest pair of points. There is a classic algorithm for the closest pair of points problem that solves this.
Closest Pair of a set of points:
Step 1: Divide the set into two equal-sized parts by a line l, and recursively compute the minimal distance in each part.
Step 2: Let d be the minimum of the two minimal distances.
Step 3: Eliminate points that lie farther than d from l.
Step 4: Sort the remaining points according to their y-coordinates.
Step 5: Scan the remaining points in y order and compute the distance of each point to its five neighbors.
Step 6: If any of these distances is less than d, update d.
The whole Closest Pair algorithm takes O(log n * n log n) = O(n log^2 n) time.
We can improve on this algorithm slightly by reducing the time it takes to achieve the y-coordinate sorting in Step 4. This is done by requiring that the recursive calls in Step 1 return the points in sorted order by their y-coordinates. This yields two sorted lists of points which need only be merged (a linear-time operation) in Step 4 to produce a complete sorted list. Hence the revised algorithm involves the following changes:
Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate.
Step 4: Merge the two sorted lists into one sorted list in O(n) time.
Hence the work at each level of the recursion is now linear, yielding an O(n log n) algorithm for finding the closest pair of a set of points in the plane.
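For reference, here is an untested C sketch of the simpler O(n log^2 n) variant described first (the strip is re-sorted with qsort at each level instead of being merged; all names are mine):

#include <float.h>
#include <math.h>
#include <stdlib.h>

typedef struct { double x, y; } Pt;

static int cmp_x(const void *a, const void *b) {
    double d = ((const Pt *)a)->x - ((const Pt *)b)->x;
    return (d > 0) - (d < 0);
}
static int cmp_y(const void *a, const void *b) {
    double d = ((const Pt *)a)->y - ((const Pt *)b)->y;
    return (d > 0) - (d < 0);
}
static double dist(Pt a, Pt b) { return hypot(a.x - b.x, a.y - b.y); }

/* pts must be sorted by x; strip is scratch space of at least n elements. */
static double closest(Pt *pts, int n, Pt *strip)
{
    if (n <= 3) {                       /* brute force the base case */
        double best = DBL_MAX;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                double d = dist(pts[i], pts[j]);
                if (d < best) best = d;
            }
        return best;
    }

    int mid = n / 2;
    double midx = pts[mid].x;
    double dl = closest(pts, mid, strip);           /* Step 1: recurse left   */
    double dr = closest(pts + mid, n - mid, strip); /*         and right      */
    double d  = dl < dr ? dl : dr;                  /* Step 2                 */

    /* Step 3: keep only points within d of the dividing line l. */
    int m = 0;
    for (int i = 0; i < n; i++)
        if (fabs(pts[i].x - midx) < d) strip[m++] = pts[i];

    /* Step 4: sort the strip by y (this re-sort is what costs the extra log n). */
    qsort(strip, m, sizeof(Pt), cmp_y);

    /* Steps 5 and 6: each strip point only has to be checked against the
     * few points above it that are within d in the y direction. */
    for (int i = 0; i < m; i++)
        for (int j = i + 1; j < m && strip[j].y - strip[i].y < d; j++) {
            double dd = dist(strip[i], strip[j]);
            if (dd < d) d = dd;
        }
    return d;
}

double closest_pair(Pt *pts, int n)
{
    Pt *strip = malloc(n * sizeof(Pt));
    qsort(pts, n, sizeof(Pt), cmp_x);   /* pre-sort by x once */
    double d = closest(pts, n, strip);
    free(strip);
    return d;
}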
You could save quite a chunk of time by first checking whether both the distance along x and the distance along y are within the best distance found so far (compare their squares against the stored squared distance), and only then computing the full squared distance. Of course, the amount of time you save depends on how the points are distributed.
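In code, a linear scan with that early rejection added might look like this (names are mine; best_d2 is the smallest squared distance found so far, so no square roots are needed either):

#include <stddef.h>

/* Same scan as a plain linear search, but each axis is tested against the
 * best squared distance so far before the full distance is computed. */
size_t nearest_index_filtered(const double *xs, const double *ys, size_t n,
                              double px, double py)
{
    size_t best = 0;
    double best_d2 = (xs[0] - px) * (xs[0] - px) + (ys[0] - py) * (ys[0] - py);
    for (size_t i = 1; i < n; i++) {
        double dx = xs[i] - px;
        double dx2 = dx * dx;
        if (dx2 > best_d2) continue;    /* too far along x alone */
        double dy = ys[i] - py;
        double dy2 = dy * dy;
        if (dy2 > best_d2) continue;    /* too far along y alone */
        if (dx2 + dy2 < best_d2) {
            best_d2 = dx2 + dy2;
            best = i;
        }
    }
    return best;
}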