Algorithm to find intersections between polylines - WPF

The Bentley-Ottmann algorithm works for finding the intersections of a set of straight line segments, but I have a lot of polylines:
Is there a way to find the intersections of a set of polylines?
I'm still working it out, but in the meantime, if someone can give some pointers or ideas, that would be helpful. Thanks for reading. By the way, I'm using WPF/C#, and all the polylines are PathGeometry objects.
Source of the Image: http://www.sitepen.com/blog/wp-content/uploads/2007/07/gfx-curve-1.png

The sweep-line algorithm is nice in theory but hard to implement robustly. You need to handle vertical segments, and there can be cases where more than two line segments intersect at a single point (or even share a common sub-segment).
I'd use an R-Tree to store the bounding boxes of the polyline's line segments, and then use the R-Tree to find possibly intersecting elements; only those need to be tested for intersection. The advantage is that you can use a separate R-Tree for each polyline and thus avoid detecting self-intersections, if needed.
Consider using CGAL's exact predicates kernel to get reliable results.
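For reference, a minimal sketch of the per-pair test the R-Tree would feed (the R-Tree only narrows down candidate pairs; each candidate still gets a cheap bounding-box rejection and then the exact test). The method names are just illustrative, and the exact comparisons against zero below are precisely where the robustness problems mentioned above creep in, which is what CGAL's exact predicates address:

    using System;
    using System.Windows; // Point, Rect (WPF)

    static class SegmentIntersection
    {
        // Cheap rejection: compare axis-aligned bounding boxes first.
        public static bool BoxesOverlap(Point a1, Point a2, Point b1, Point b2)
        {
            return new Rect(a1, a2).IntersectsWith(new Rect(b1, b2));
        }

        // Exact test via orientation (cross product) checks.
        public static bool SegmentsIntersect(Point a1, Point a2, Point b1, Point b2)
        {
            double d1 = Cross(b1, b2, a1);
            double d2 = Cross(b1, b2, a2);
            double d3 = Cross(a1, a2, b1);
            double d4 = Cross(a1, a2, b2);

            if (((d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0)) &&
                ((d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0)))
                return true;

            // Collinear / touching cases (fragile with floating point).
            return (d1 == 0 && OnSegment(b1, b2, a1)) ||
                   (d2 == 0 && OnSegment(b1, b2, a2)) ||
                   (d3 == 0 && OnSegment(a1, a2, b1)) ||
                   (d4 == 0 && OnSegment(a1, a2, b2));
        }

        static double Cross(Point o, Point a, Point b) =>
            (a.X - o.X) * (b.Y - o.Y) - (a.Y - o.Y) * (b.X - o.X);

        static bool OnSegment(Point p, Point q, Point r) =>
            Math.Min(p.X, q.X) <= r.X && r.X <= Math.Max(p.X, q.X) &&
            Math.Min(p.Y, q.Y) <= r.Y && r.Y <= Math.Max(p.Y, q.Y);
    }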

Adding to Geom's suggestion (an R-Tree is the way to go), further performance improvements can be gained by doing the following:
1. Simplify the polyline - The number of points in the polyline can be reduced while keeping the polyline's general shape. This can be done by processing each point with an angle threshold or by using the Ramer-Douglas-Peucker algorithm (a sketch is given after this list). Depending on what you're doing, you may need to keep track of which points from the original polyline were used as the start/end points for each segment of the simplified polyline (the indices of the original polyline's points will need to be stored somewhere).
In this example you can see how a polyline's number of points can be reduced. The red points indicate points which were not used from the original polyline, and the green points indicate points which were kept to build the simplified polyline.
2. Store the simplified polylines in an R-Tree, and determine the intersections between each segment of each polyline (comparing bounds of segments to reduce calculations is beneficial to performance). As this is being done, the old indices from the original polyline's segments are stored as information relating to each detected intersection, along with which polylines intersected (some sort of identifier can be used). This essentially gives you the start and end bounds of each segment in the original polylines where the intersections exist with each other polyline.
3. This step is performed only if the intersection location must match the exact location of the intersections of the original polylines. You will need to go back to the original, non-simplified polylines, along with the intersection information obtained in step 2. Each intersection should have a start and end index associated with it, and these indices tell you which specific segments of the original polyline need to be processed, so you only process the necessary segments. An alternative would be to take the intersection point itself, extend a bounding box outward, and process the segments of the original polylines which intersect that bounding box (although this will likely take longer).
4. It may be necessary to take an additional step to check the endpoints of each polyline against every other polyline's segments, since the simplification process can knock out some endpoint intersections. (This is generally pretty fast.)
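To illustrate step 1, here is a rough Ramer-Douglas-Peucker sketch that returns the indices of the kept points, so the mapping back to the original polyline (needed in step 3) comes for free. The tolerance is something you would tune for your data:

    using System;
    using System.Collections.Generic;
    using System.Windows; // Point

    static class Simplify
    {
        // Returns the indices (into 'points') kept by Ramer-Douglas-Peucker.
        // Assumes the polyline has at least two points.
        public static List<int> DouglasPeucker(IList<Point> points, double tolerance)
        {
            var keep = new List<int> { 0, points.Count - 1 };
            Recurse(points, 0, points.Count - 1, tolerance, keep);
            keep.Sort();
            return keep;
        }

        static void Recurse(IList<Point> pts, int first, int last, double tol, List<int> keep)
        {
            double maxDist = 0;
            int index = -1;
            for (int i = first + 1; i < last; i++)
            {
                double d = PerpendicularDistance(pts[i], pts[first], pts[last]);
                if (d > maxDist) { maxDist = d; index = i; }
            }
            if (index != -1 && maxDist > tol)
            {
                keep.Add(index);
                Recurse(pts, first, index, tol, keep);
                Recurse(pts, index, last, tol, keep);
            }
        }

        // Distance from p to the infinite line through a and b.
        static double PerpendicularDistance(Point p, Point a, Point b)
        {
            double dx = b.X - a.X, dy = b.Y - a.Y;
            double len = Math.Sqrt(dx * dx + dy * dy);
            if (len == 0)
                return Math.Sqrt((p.X - a.X) * (p.X - a.X) + (p.Y - a.Y) * (p.Y - a.Y));
            return Math.Abs(dy * p.X - dx * p.Y + b.X * a.Y - b.Y * a.X) / len;
        }
    }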
This algorithm can be further improved by using the Bentley-Ottmann algorithm (this is the sweep-line algorithm Geom was referring to). Also note that depending on the simplification algorithm used and the parameters used for such algorithms (angular tolerance for example), there may be a trade-off between performance and accuracy (some intersection results can be lost depending on how simple the polylines are).
Obviously, there are libraries out there which may be viable, but if you're limited by license terms due to the company you work for or the product you're working on, third party libraries may not be an option. Additionally, other libraries may not perform as well as may be required.

Obtaining meaningful velocity information from noisy position data

If this question is posted on the wrong stackexchange site - please suggest where I can migrate it to!
I'm studying the velocity of an object that undergoes multiple collisions, both with walls and with other objects. The raw position data for the object is slightly noisy, for two reasons: first, the resolution of the video is limited, and second, my tracking software has some error in tracking the object (as the image of the object changes slightly over time).
If the velocity of the object is calculated simply by differencing the raw position data, there is significant error (much larger, relatively, than the error in the position itself), because the object is tracked at a high frame rate and differencing amplifies the noise.
I am most interested in the velocity of the object during the time right before and after collisions, and this is thus a significant problem.
Possible options I've considered / attempted:
Applying a discrete Kalman filter to the position data: this is a solution that comes up often in posts about related questions. However, given that we have all the data before we begin smoothing, is the Kalman filter the best way to make use of it? My understanding is that the filter is designed for data that arrives over time (e.g. position data received in real time rather than a complete set of position data).
Applying Savitzky-Golay smoothing to the position data: when I tried this on my data, I found that significant artifacts were introduced in the region of ±10 data points around each collision. I suspect this has something to do with the large acceleration at the collision, but after trying a range of parameters for the SG smoothing, I have been unable to eliminate the artifacts.
Separating the data at collisions, then smoothing velocities using a moving average: to overcome the problem introduced by the acceleration at each collision, I separated the data into multiple series at each collision point. For example, if there were three collisions, the data would be separated into four series. The velocity for each series was then calculated and smoothed using a moving average (a rough sketch of this appears below).
In addition, some of my colleagues suggested passing the velocity information through a low-pass filter, which I have not attempted.
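For what it's worth, a rough sketch of option 3 as I understand it, for a single coordinate (run it per axis). The collision indices and the averaging window are assumed inputs; the right window size depends on your frame rate and noise level:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class VelocitySmoothing
    {
        // positions[i] sampled at a constant interval dt; collisionIndices mark where the
        // series is split so that neither differencing nor averaging spans a collision.
        public static double[] SmoothedVelocity(double[] positions, double dt,
                                                IEnumerable<int> collisionIndices, int window)
        {
            var cuts = new List<int> { 0 };
            cuts.AddRange(collisionIndices);
            cuts.Add(positions.Length);

            var v = new double[positions.Length];
            var result = new double[positions.Length];

            for (int s = 0; s + 1 < cuts.Count; s++)
            {
                int start = cuts[s], end = cuts[s + 1];          // segment is [start, end)

                // Central differences inside the segment, one-sided at its ends.
                for (int i = start; i < end; i++)
                {
                    int lo = Math.Max(start, i - 1), hi = Math.Min(end - 1, i + 1);
                    v[i] = hi > lo ? (positions[hi] - positions[lo]) / ((hi - lo) * dt) : 0;
                }

                // Moving average confined to the segment.
                for (int i = start; i < end; i++)
                {
                    int a = Math.Max(start, i - window), b = Math.Min(end - 1, i + window);
                    result[i] = v.Skip(a).Take(b - a + 1).Average();
                }
            }
            return result;
        }
    }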
The two questions below are related to mine, and are provided as a reference.
Smooth of series data
Smooth GPS data
In addition, the paper below also seems to provide a good suggestion of how to implement the Kalman Filter, albeit for real-time data.
http://transportation.ce.gatech.edu/sites/default/files/files/smoothing_methods_designed_to_minimize_the_impact_of_gps_random_error_on_travel_distance_speed_and_acceleration_profile_estimates-trr.pdf
Choosing an appropriate filtering algorithm depends mainly on the behavior of your object and your measurement errors (or noise). So I can only give some generic tips:
Differentiation, i.e. calculating the velocity from position data, amplifies noise considerably, so you probably do need some kind of smoothing. My ad-hoc approach would be: Fourier-transform your position data, take the derivative in Fourier space, and experiment to find appropriate cut-offs for low-pass filtering. Applying other transfer functions to your transformed position data can be interpreted as kernel smoothing (though some mathematical insight into kernel methods is needed to do that properly).
The Kalman filter is a state estimator that works recursively. If you have a proper (discrete-time) motion model and measurement model, it will yield good results and give you a direct estimate of the velocity. Rules of thumb for such an approach:
Model in 3D space (or 6D space if your object has rotational degrees of freedom), not in image space (the noise behaves differently there)
Carefully investigate your projection errors (camera calibration) & carefully choose the noise parameters
Use an Unscented Kalman filter (it is much better than an Extended Kalman filter) if non-linearities occur
Kalman filtering and low-pass filtering are closely related. For many simple applications the Kalman filter can be thought of as an adaptive low-pass filter that does the smoothing.
The non-recursive Kalman filter is called a Gaussian process, though I only see an advantage over the Kalman filter if your trajectories have a small number of data points. Its application is not as straightforward as the KF's.
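To make the recursive idea concrete, here is a bare-bones discrete constant-velocity Kalman filter for one coordinate (state = position and velocity). The process noise q and measurement noise r are exactly the parameters the tips above say you must choose carefully, and this sketch ignores the projection/calibration issues:

    using System;

    // Minimal 1D constant-velocity Kalman filter: state x = [position, velocity].
    class ConstantVelocityKalman
    {
        double px, vx;                 // state estimate
        double p00 = 1, p01 = 0,       // estimate covariance P (2x2)
               p10 = 0, p11 = 1;
        readonly double q;             // process noise (acceleration variance)
        readonly double r;             // measurement noise (position variance)

        public ConstantVelocityKalman(double initialPosition, double q, double r)
        {
            px = initialPosition; vx = 0; this.q = q; this.r = r;
        }

        public (double position, double velocity) Update(double measuredPosition, double dt)
        {
            // Predict: x = F x, P = F P F' + Q, with F = [[1, dt], [0, 1]].
            px += vx * dt;
            double n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt * dt * dt * dt / 4;
            double n01 = p01 + dt * p11 + q * dt * dt * dt / 2;
            double n10 = p10 + dt * p11 + q * dt * dt * dt / 2;
            double n11 = p11 + q * dt * dt;
            p00 = n00; p01 = n01; p10 = n10; p11 = n11;

            // Update with measurement z = position, H = [1, 0].
            double y = measuredPosition - px;    // innovation
            double s = p00 + r;                  // innovation variance
            double k0 = p00 / s, k1 = p10 / s;   // Kalman gain
            px += k0 * y;
            vx += k1 * y;
            double u00 = (1 - k0) * p00, u01 = (1 - k0) * p01;
            double u10 = p10 - k1 * p00, u11 = p11 - k1 * p01;
            p00 = u00; p01 = u01; p10 = u10; p11 = u11;

            return (px, vx);
        }
    }

Since all the data is already available, you can also run a filter like this forward and then backward and combine the two passes (a Rauch-Tung-Striebel smoother), which addresses the "real-time only" objection raised in the question.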

How to best do server-side geo clustering?

I want to do pre-clustering for a set of approx. 500,000 points.
I haven't started yet but this is what I had thought I would do:
store all points in a localSOLR index
determine "natural cluster positions" according to some administrative information (big cities for example)
and then calculate a cluster for each city:
for each city
for each zoom level
query the index to get the points contained in a radius around the city (the length of the radius depends on the zoom level)
This should be quite efficient because there are only 100 major cities and SOLR queries are very fast. But a little more thinking revealed it was wrong:
there may be clusters of points that are more "near" one another than near a city: they should get their own cluster
at some zoom levels, some points will not be within the acceptable distance of any city, and so they will not be counted
some cities are near one another and therefore, some points will be counted twice (added to both clusters)
There are other approaches:
examine each point and determine to which cluster it belongs; this eliminates problems 2 and 3 above, but not 1, and is also extremely inefficient
make a (rectangular) grid (for each zoom level); this works but will result in crazy / arbitrary clusters that don't "mean" anything
I guess I'm looking for a general purpose geo-clustering algorithm (or idea) and can't seem to find any.
Edit to answer comment from Geert-Jan
I'd like to build "natural" clusters, yes, and yes, I'm afraid that if I use an arbitrary grid, it will not reflect the reality of the data. For example, if there are many events that occur around a point at or near the boundary between two rectangles, I should get just one cluster but will in fact build two (one in each rectangle).
Originally I wanted to use localSOLR for performance reasons (and because I know it, and have better experience indexing a lot of data into SOLR than loading it in a conventional database); but since we're talking of pre-clustering, maybe performance is not that important (although it should not take days to visualize a result of a new clustering experiment). My first approach of querying lots of points according to a predefined set of "big points" is clearly flawed anyway, the first reason I mentioned being the strongest: clusters should reflect the reality of the data, and not some other bureaucratic definition (they will clearly overlap, sure, but data should come first).
There is a great clusterer for live clustering that has been added to the core Google Maps API: Marker Clusterer. I wonder if anyone has tried to run it "offline": run it for whatever amount of time it needs, and then store the results?
Or is there a clusterer that examines each point, point after point, and outputs clusters with their coordinates and number of points included, and which does this in a reasonable amount of time?
You may want to look into advanced clustering algorithms such as OPTICS.
With a good database index, it should be fairly fast.
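OPTICS is a generalization of DBSCAN, so as a rough illustration of the density-based idea (not of OPTICS itself), here is a naive DBSCAN sketch. The O(n²) neighbourhood lookup is what the good database index would replace, and it treats coordinates as planar, so eps has to be chosen accordingly:

    using System;
    using System.Collections.Generic;

    static class Dbscan
    {
        // Naive density-based clustering. Returns a cluster id per point
        // (-1 = noise, >0 = cluster id); 0 means "unvisited" during the run.
        public static int[] Cluster((double x, double y)[] pts, double eps, int minPts)
        {
            var label = new int[pts.Length];   // defaults to 0 = unvisited
            int clusterId = 0;

            for (int i = 0; i < pts.Length; i++)
            {
                if (label[i] != 0) continue;
                var neighbors = RegionQuery(pts, i, eps);
                if (neighbors.Count < minPts) { label[i] = -1; continue; }   // noise for now

                clusterId++;
                label[i] = clusterId;
                var queue = new Queue<int>(neighbors);
                while (queue.Count > 0)
                {
                    int j = queue.Dequeue();
                    if (label[j] == -1) label[j] = clusterId;   // noise becomes a border point
                    if (label[j] != 0) continue;
                    label[j] = clusterId;
                    var jn = RegionQuery(pts, j, eps);
                    if (jn.Count >= minPts)
                        foreach (int k in jn) queue.Enqueue(k);  // core point: keep expanding
                }
            }
            return label;
        }

        // Brute-force neighbourhood query; replace with a spatial index for 500,000 points.
        static List<int> RegionQuery((double x, double y)[] pts, int i, double eps)
        {
            var result = new List<int>();
            for (int j = 0; j < pts.Length; j++)
            {
                double dx = pts[i].x - pts[j].x, dy = pts[i].y - pts[j].y;
                if (dx * dx + dy * dy <= eps * eps) result.Add(j);
            }
            return result;
        }
    }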

How do you actually MAKE an octree (for voxels)?

I have seen code that creates an octree and adds and removes data from it, but how does one actually build the octree? Is there 3D voxel software that will save to an array of some sort that can then be converted to an octree? Or can you save directly to an octree?
Depends on your implementation -
If you are using octrees to subdivide space, then typically you'll throw a bunch of points (Vector3s) at it, and once a node has more than a certain number of points in it, you subdivide and redistribute them.
If you are looking for a way to store Minecraft-style voxels, then you'll subdivide until you reach 1:1 with your voxel size, and store your data in the leaf nodes.
Where the data comes from is up to you - octrees are a way of storing, manipulating and searching data, not a file format as such.
Each node in an octree holds one point. Nodes are broken up into (you guessed it) eight child nodes, and each of those in turn holds a single point.
In general, you don't add all of your vertices to an octree unless you are doing some ungodly collision detection where every single vertex counts... Not that you can't make it fast then, but it's still slower than the approximation given by a smaller number of nodes. (This is true for nearly everything: approximation is faster.)
Then again, if you are rendering at high quality, the octree should probably have as many nodes as you have points.
Now on to the answer:
Create a root node with a bounding box, defined from its center, that is large enough to enclose everything.
Insert each point. This should subdivide the octree in the relevant directions.
As you insert these points, the data will move further down into leaf nodes that more closely encapsulate your model.
Also, as you subdivide, the bounding box is halved for each node further down.
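A skeleton of that insertion process (following the subdivide-and-redistribute approach described above) might look like this; the bucket capacity and the minimum node size are arbitrary choices here:

    using System.Collections.Generic;
    using System.Numerics;   // Vector3

    // Minimal point octree: a leaf holds points until it exceeds a capacity, then it splits.
    class OctreeNode
    {
        const int Capacity = 8;

        readonly Vector3 center;
        readonly float halfSize;           // half the edge length of this node's cube
        List<Vector3> points = new List<Vector3>();
        OctreeNode[] children;             // null while this node is a leaf

        public OctreeNode(Vector3 center, float halfSize)
        {
            this.center = center;
            this.halfSize = halfSize;
        }

        public void Insert(Vector3 p)
        {
            if (children == null)
            {
                points.Add(p);
                if (points.Count > Capacity && halfSize > 0.01f)   // stop splitting at a minimum size
                    Split();
                return;
            }
            children[ChildIndex(p)].Insert(p);
        }

        void Split()
        {
            children = new OctreeNode[8];
            float h = halfSize / 2;                                // bounding box halves each level down
            for (int i = 0; i < 8; i++)
            {
                var offset = new Vector3(
                    (i & 1) == 0 ? -h : h,
                    (i & 2) == 0 ? -h : h,
                    (i & 4) == 0 ? -h : h);
                children[i] = new OctreeNode(center + offset, h);
            }
            foreach (var p in points) children[ChildIndex(p)].Insert(p);   // redistribute
            points = null;
        }

        int ChildIndex(Vector3 p) =>
            (p.X >= center.X ? 1 : 0) | (p.Y >= center.Y ? 2 : 0) | (p.Z >= center.Z ? 4 : 0);
    }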
If you really want to save it, you can save the vertices (numbered), and then have your program write the various connections between nodes and vertices out to disk, from which they can be loaded in roughly the same time it takes to build the octree from scratch in the first place.
Anyway, I hope I answered your question.

Distance between two points across land using SQL Server

I am looking to calculate the shortest distance between two points inside SQL Server 2008 taking into account land mass only.
I have used the geography data type along with STDistance() to work out the distance from point x to point y as the crow flies; however, this sometimes crosses the sea, which I am trying to avoid.
I have also created a polygon around the land mass boundary I am interested in.
I believe that I need to combine these two methods to ensure that the measured distance always remains within the polygon - unless there is a simpler solution.
Thanks for any advice.
Use STIntersects - http://msdn.microsoft.com/en-us/library/bb933899%28v=SQL.105%29.aspx to find out what part of the line is over land.
After reading your comment your requirement makes sense. However I'm pretty sure there are no inbuilt techniques to do this in SQL Server. I'm assuming you are ignoring roads, and taking an as-the-crow-flies approach but over land only.
The only way I can think of to do this would be to convert your area into a raster (grid cells) and perform a cost-path analysis. You would give the area of sea a prohibitively high cost, so the algorithm routes around it. See this link for a description of the technique:
http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=cost_path
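Outside of SQL Server, a toy version of that cost-path idea could look like the sketch below: rasterize the land polygon into a boolean grid yourself, treat sea cells as impassable (rather than merely expensive), and run Dijkstra over the grid. It assumes you do the rasterization elsewhere and uses .NET 6's PriorityQueue:

    using System;
    using System.Collections.Generic;

    static class LandPath
    {
        // Shortest over-land distance on a raster: land[r, c] == true means passable.
        // cellSize is the real-world width of one cell; returns -1 if no land route exists.
        public static double ShortestLandDistance(bool[,] land, (int r, int c) start,
                                                  (int r, int c) goal, double cellSize)
        {
            int rows = land.GetLength(0), cols = land.GetLength(1);
            var dist = new double[rows, cols];
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    dist[r, c] = double.PositiveInfinity;

            var queue = new PriorityQueue<(int r, int c), double>();
            dist[start.r, start.c] = 0;
            queue.Enqueue(start, 0);

            // 8-connected moves; diagonals cost sqrt(2).
            var moves = new (int dr, int dc, double cost)[]
            {
                (1, 0, 1), (-1, 0, 1), (0, 1, 1), (0, -1, 1),
                (1, 1, Math.Sqrt(2)), (1, -1, Math.Sqrt(2)),
                (-1, 1, Math.Sqrt(2)), (-1, -1, Math.Sqrt(2))
            };

            while (queue.TryDequeue(out var cell, out var d))
            {
                if (d > dist[cell.r, cell.c]) continue;            // stale queue entry
                if (cell == goal) return d * cellSize;
                foreach (var (dr, dc, cost) in moves)
                {
                    int nr = cell.r + dr, nc = cell.c + dc;
                    if (nr < 0 || nc < 0 || nr >= rows || nc >= cols || !land[nr, nc]) continue;
                    double nd = d + cost;
                    if (nd < dist[nr, nc])
                    {
                        dist[nr, nc] = nd;
                        queue.Enqueue((nr, nc), nd);
                    }
                }
            }
            return -1;   // destination not reachable over land
        }
    }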
Otherwise try implementing the algorithm below!
http://bit.ly/ckvciz
There may be other libraries that do this. Alternatively, how about using the new Google Directions API between the two cities - you'd get actual road distances then.
http://code.google.com/apis/maps/documentation/directions/

2D Game: Fast(est) way to find the x closest entities to another entity - huge number of entities, highly dynamic

I'm working on a 2D game that has a huge amount of dynamic entities.
For fun's sake, let's call them soldiers, and let's say there are 50000 of them (which I just randomly thought up, it might be much more or much less :)).
All these soldiers are moving every frame according to rules - think boids / flocking / steering behaviour.
For each soldier, to update its movement I need the X soldiers that are closest to the one I'm processing.
What would be the best spatial hierarchy to store them in to facilitate calculations like this without too much overhead?
(All entities are updated/moved every frame, so it has to handle dynamic entities very well)
The simplest approach is to use a grid. It has several advantages:
simple
fast
easy to add and remove objects
easy to change the grid to a finer detail if you are still doing too many distance checks
Also, make sure you don't do a square root for every distance check. Since you are only comparing distances, you can compare squared distances instead.
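A hedged sketch of the grid idea (all names are made up; the cell size and how far the ring search may expand are the knobs you would tune). It gathers candidates from rings of cells around the query point, then picks the k nearest by squared distance. Strictly speaking you should expand one ring further after finding k candidates to be exact, but for flocking the approximation is usually acceptable:

    using System;
    using System.Collections.Generic;

    // Uniform grid for 2D neighbour queries. Rebuild it every frame; insertion is O(1)
    // per entity, so rebuilding 50,000 entities per frame is cheap.
    class UniformGrid<T>
    {
        readonly float cellSize;
        readonly Dictionary<(int, int), List<(float x, float y, T item)>> cells =
            new Dictionary<(int, int), List<(float x, float y, T item)>>();

        public UniformGrid(float cellSize) { this.cellSize = cellSize; }

        (int, int) CellOf(float x, float y) =>
            ((int)MathF.Floor(x / cellSize), (int)MathF.Floor(y / cellSize));

        public void Insert(float x, float y, T item)
        {
            var key = CellOf(x, y);
            if (!cells.TryGetValue(key, out var list))
                cells[key] = list = new List<(float x, float y, T item)>();
            list.Add((x, y, item));
        }

        // Approximate k-nearest: expand ring by ring until enough candidates are found.
        public List<T> Nearest(float x, float y, int k)
        {
            var (cx, cy) = CellOf(x, y);
            var candidates = new List<(float d2, T item)>();
            for (int ring = 0; candidates.Count < k && ring < 64; ring++)
            {
                for (int gx = cx - ring; gx <= cx + ring; gx++)
                    for (int gy = cy - ring; gy <= cy + ring; gy++)
                    {
                        // Visit only the cells on the current ring, not ones already seen.
                        if (Math.Max(Math.Abs(gx - cx), Math.Abs(gy - cy)) != ring) continue;
                        if (!cells.TryGetValue((gx, gy), out var list)) continue;
                        foreach (var (px, py, item) in list)
                            candidates.Add(((px - x) * (px - x) + (py - y) * (py - y), item));
                    }
            }
            candidates.Sort((a, b) => a.d2.CompareTo(b.d2));        // squared distances, no square roots
            return candidates.GetRange(0, Math.Min(k, candidates.Count)).ConvertAll(c => c.item);
        }
    }

In practice you would also skip the querying soldier itself, since it will come back as its own nearest neighbour at distance zero.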
For broad-phase collision detection, a spatial index like a quad-tree (since it's 2D) or a grid will do. I've linked to Metanet Software's tutorial before; it outlines a grid-based scheme. Of course, your game doesn't even need to use grids so extensively. Just store each actor in a hidden grid and collide it with objects in the same and neighboring cells.
The whole point of selecting a good spatial hierarchy is to be able to quickly select only the objects that need testing.
(Once you've found that small subset, the square root is probably not going to hurt that much anymore.)
I'm also interested in what the best / most optimal method is for 2d spatial indexing of highly dynamic objects.
Quadtrees / kd-trees seem nice, but they're not too good with dynamic insertions.
KD-trees can become unbalanced.
Just a random thought: if your entities are points, keeping them in two insertion-sorted lists (one by X, one by Y) in combination with a binary search might be something to try?
