Get all coordinates closer than x - how to implement - database

I have multiple locations with coordinates. I'd like to ask: what is the best (fastest) method to get all points that are closer than, for example, 50 km to a particular point?
I did it this way: I saved all the coordinates in a database, and I run a query to get the points. Is this a good solution? Is MySQL suitable for this?

The professional solution would be PostGIS, a "spatial database extender" for PostgreSQL.
For simple purposes this related answer works with basic Postgres, too:
How can I get results from a JPA entity ordered by distance?
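In PostGIS the whole thing is one indexed ST_DWithin call. Here is a minimal sketch, assuming a hypothetical places table with a geography column (all names are illustrative):

CREATE TABLE places (
    id   serial PRIMARY KEY,
    name text,
    geom geography(Point, 4326)   -- WGS-84 long/lat stored as geography
);
CREATE INDEX places_geom_idx ON places USING gist (geom);

-- All places within 50 km (50,000 m) of a given point;
-- the GiST index keeps this fast even on large tables.
SELECT id, name
FROM places
WHERE ST_DWithin(
    geom,
    ST_SetSRID(ST_MakePoint(16.3738, 48.2082), 4326)::geography,  -- long, lat
    50000
);

In plain MySQL you would typically compute a haversine distance in the query (or use ST_Distance_Sphere in 5.7+); that works, but PostGIS gives you the spatial index and a much richer function set.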

Related

Storing 3D points into a database

I am actually working on a database in which I have to store meteorological 3D points. My partner and I have different points of view on how to store them. He would prefer to create a Points table with one point per row, while to me it seems more appropriate to store all the points for a given date/peripheral in a BLOB.
What do you think?
Thanks
PS: we are also arguing about which DB we should use (SQLite, MySQL, ...); if you have ideas, you are welcome!
It all depends on whether you need to write queries to get particular points, i.e. "find me all the thingies with points where X < 300". If you don't have much relational work to do and you are going down the BLOB route, then you should consider one of the NoSQL variants.
It's a decision you should make based on your functional needs, not on a purist approach to relational DB design.
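To make the trade-off concrete, here is a minimal sketch (table and column names are invented) of the one-point-per-row layout, which is what makes queries like the X < 300 example possible in the first place:

CREATE TABLE points (
    reading_date date    NOT NULL,
    peripheral   integer NOT NULL,
    x real NOT NULL,
    y real NOT NULL,
    z real NOT NULL,
    value real            -- the measured quantity at this point
);
CREATE INDEX points_x_idx ON points (x);

-- The kind of query the per-row layout makes easy:
SELECT * FROM points WHERE x < 300;

With the BLOB layout you would store one opaque row per date/peripheral, which is compact and fast to load whole, but the WHERE clause above can no longer run inside the database.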

Database solution for route matching

I'm working on an application that lets users search for trips from point A to point B.
It needs to solve the following use cases:
find trips that go from point A to point B
find trips that start at some other point, but go through point A to point B
I'm now looking for the database solution best suited to support these use cases.
For now we are using MongoDB, but I had to figure out a workaround for the first use case, and I have a feeling that it's not possible to solve the second one with it at all.
It seems to me that all the available NoSQL DBs that support spatial features allow only one geospatial index per document, node, etc. This is fine for queries like "show me all shops within a radius of 5 km from this point" and the like.
So I'm looking for a solution that can cover both use cases. Is there something like that available?
pgRouting could be used, indeed. First solution that pops into mind: when the first user has entered New York and Columbus as the source and destination of his trip, perform a routing query and store the path as a PostGIS linestring geometry.
When a second user enters From: Pittsburgh, To: Columbus into the search form, geocode the city names to locations and use PostGIS queries to see how far those points (or city boundaries) are from the first user's route path. If they are close enough and the first user is driving in a suitable direction, they could share a car.
Second idea: after the first user has entered the trip details, perform a routing query and store all the place names the route passes into the database.
Both solutions can be implemented easily with Postgres + PostGIS + pgRouting. The biggest disadvantage of pgRouting is low speed (it's possible to improve performance by reducing the data in the routing graph; exact routing speed is not that important here, etc.). It's also possible to export the road data to external files, use a high-speed routing engine (like OSRM, MoNav, etc.) and, if necessary, write the result back to PostGIS, but that definitely requires much more effort.
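As a rough sketch of the first idea (Postgres + PostGIS syntax; the table, columns and coordinates are assumptions for illustration): store each routed trip as a linestring, then ask how far the new user's geocoded points are from it:

CREATE TABLE trip_routes (
    trip_id integer PRIMARY KEY,
    path    geography(LineString, 4326)   -- the stored routing result
);

-- Trips whose stored path passes within 10 km of both the new user's
-- origin (Pittsburgh) and destination (Columbus); long/lat literals:
SELECT trip_id
FROM trip_routes
WHERE ST_DWithin(path, ST_SetSRID(ST_MakePoint(-79.9959, 40.4406), 4326)::geography, 10000)
  AND ST_DWithin(path, ST_SetSRID(ST_MakePoint(-82.9988, 39.9612), 4326)::geography, 10000);

Checking the direction of travel (so the driver isn't matched to someone going backwards along the route) would still need extra logic, e.g. comparing the positions of the two matches along the line with ST_LineLocatePoint.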
Also, if you choose to avoid the Database route (no pun intended), you could use GeoTools graphing Java library.
http://docs.geotools.org/latest/userguide/extension/graph/index.html
Here is some example code and data I produced myself to demonstrate how it can be used.
http://usefulpracticalgeoblog.blogspot.ch/2012/09/geotools-routing.html
It is pretty flexible in terms of the spatial data formats that can be used to build the street network graph, and in how the results can be output.
Then, to find out whether the starting point of trip B is close to the pre-calculated route for trip A, you could use JTS (the Java Topology Suite), which is part of the GeoTools library. Here is an example of the kind of analysis you might use.
https://gis.stackexchange.com/questions/7699/for-a-given-feature-find-the-closest-point-along-a-given-path
PostgreSQL with PostGIS and pgRouting. You need nothing else.

database design performance issue asking for advice

I am designing a database to store geo-positions. I want to implement functionality similar to Google Maps. The usage scenario is: I have a large number of points with associated X, Y positions. The database is seldom updated (e.g. adding new points or modifying the X, Y position of existing points), but queried frequently. The query scenario is: for a given square (whose four corner X, Y positions are known), find all the points and their X, Y positions inside the square.
I am wondering how to design the database so that query performance is optimized. My design issue is very similar to map database design. I am also wondering how Google Maps or traditional map databases are implemented to achieve the best performance.
I am new to the area of map database design, and would appreciate it if anyone could point me to some tutorials for newbies.
Thanks in advance,
George
George,
you'll find your positions with
where (geoX between %x1 and %x2) and (geoY between %y1 and %y2)
Regarding indexing, since you will probably always query for both X and Y, a single index will do:
idx_XY (geoX, geoY)
If you ever need to search on just Y, add a second index
idx_Y (geoY)
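Spelled out as DDL, a minimal sketch (the positions table and the corner values are invented for illustration):

CREATE TABLE positions (
    id   serial PRIMARY KEY,
    geoX double precision NOT NULL,
    geoY double precision NOT NULL
);
CREATE INDEX idx_XY ON positions (geoX, geoY);
CREATE INDEX idx_Y  ON positions (geoY);   -- only if you also filter on Y alone

-- The bounding-box query, with the square's corners as literals:
SELECT id, geoX, geoY
FROM positions
WHERE geoX BETWEEN 10.0 AND 20.0
  AND geoY BETWEEN 45.0 AND 55.0;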
(And I'd rename X / Y to long/lat, but that's more or less a personal matter ;-)
regards,
/t
Are you tied to some existing database backend? Some databases (like MongoDB and PostgreSQL) have this feature already built in.

Is this data appropriate for keeping in a database?

In relation to my previous question, where I was asking for database suggestions, it just occurred to me that I don't even know whether what I'm trying to store is appropriate for a database at all, or whether some other data storage method should be used.
I have some physical model testing data (let's say wind tunnel data or something similar) where for every model (M-1234) I have:
name (M-1234)
length L
breadth B
height H
L/B ratio
L/H ratio
...
lot of other ratios and dimensions ...
force versus speed curve given in the form of a lot of points for x-y plotting
...
few other similar curves (all of them of type x-y).
Now, what I'm trying to accomplish is to store all that in some reasonable way, so that a user of the database can come and find the ten models closest to, say, L/B = 2.5 (or some similar request), and then somehow get all the data for those models, including the curve data (in a plain text file format).
Is an SQL database (or any other, for that matter) an appropriate way of handling something like this? Or should I take some other approach?
I have about a month to finish this, and in that time I have to learn enough about databases as well, so please give your suggestions bearing that in mind. Assume no previous knowledge of the subject whatsoever.
I think what you're looking for is possible. I'm using PostgreSQL here, but any database should work. This is my test database:
CREATE TABLE test (
    id serial primary key,
    ratio double precision
);
COPY test (id, ratio) FROM stdin;
1	0.29999999999999999
2	0.40000000000000002
3	0.59999999999999998
4	0.69999999999999996
\.
Then, to find the nearest values to a particular ratio
select id,ratio,abs(ratio-0.5) as score from test order by score asc limit 2;
In this case, I'm looking for the 2 nearest to 0.5
I'd probably go for a data model where you have one table for the main data, the ratios and so on, and then a second table which holds the curve points, as I'm assuming the curves aren't always the same size.
Yes, a database is probably the best approach for this.
A relational database (which usually uses SQL for data access) is suitable for data that is more or less structured as tables.
To give you an idea:
You could have a main table model with fields name, width, etc. Then one or more subtables for any values which can appear more than once, each referring back to model (look up "foreign key").
Then a subtable for your actual curves, again referring back to model.
How exactly to model the curves in the DB I don't know, since I don't know how you model them, but if it's lots of numbers, it can go into the DB.
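A minimal sketch of that layout (all names invented for illustration):

CREATE TABLE model (
    id       serial PRIMARY KEY,
    name     text NOT NULL,           -- e.g. 'M-1234'
    length   double precision,
    breadth  double precision,
    height   double precision,
    lb_ratio double precision,        -- L/B
    lh_ratio double precision         -- L/H
);

CREATE TABLE curve_point (
    model_id integer NOT NULL REFERENCES model(id),  -- the foreign key
    curve    text    NOT NULL,        -- which curve, e.g. 'force_vs_speed'
    x        double precision NOT NULL,
    y        double precision NOT NULL
);

-- The ten models closest to L/B = 2.5, using the same trick
-- as the earlier answer's nearest-ratio query:
SELECT id, name, abs(lb_ratio - 2.5) AS score
FROM model
ORDER BY score
LIMIT 10;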
It seems you know little about relational DBMSs. Consider reading something on Wikipedia, or doing a few simple DBMS tutorials (PostgreSQL has some: http://www.postgresql.org/docs/8.4/interactive/tutorial.html , but there are many others). Then pick a DBMS to try out (PostgreSQL is probably not a bad choice, but again there are many others).
Then try implementing a simple table schema, and get back to us with any detailed questions (which you'll probably have).
One more thing: those questions are probably more appropriate for serverfault.com.
This is arguably scientific data, so you might find libraries/formats intended for arbitrary scientific data useful, e.g. HDF5: http://www.hdfgroup.org/ (note: I am not an expert).

Clustering Lat/Longs in a Database

I'm trying to see if anyone knows how to cluster Lat/Long results, using a database, to reduce the number of results sent over the wire to the application.
There are a number of resources about how to cluster, either on the client side OR on the server (application) side ... but not on the database side :(
This is a similar question, asked by a fellow S.O. member. The solutions there are server-side based (i.e. C# code-behind).
Has anyone had any luck or experience solving this in a database? Are there any database gurus out there who are after a hawt and sexy DB challenge?
please help :)
EDIT 1: Clarification - by clustering, I'm hoping to group X number of points into a single point for an area. So, if I say "cluster everything in a 1 mile / 1 km square", then all the results in that 'square' are grouped into a single result (say ... the middle of the square).
EDIT 2: I'm using MS SQL 2008, but I'm open to hearing about solutions in other DBs.
I'd probably use a modified* version of k-means clustering on the cartesian (e.g. WGS-84 ECF) coordinates of your points. It's easy to implement, converges quickly, and adapts to your data no matter what it looks like. Plus, you can pick k to suit your bandwidth requirements.
I'd make a table of cluster centroids, and add a field to the original data table to indicate which cluster each point belongs to. You'd obviously want to update the clustering periodically if your data is at all dynamic. I don't know if you could do that with a stored procedure and trigger, but perhaps.
*The "modification" would be to adjust the length of the computed centroid vectors so they end up on the surface of the earth. Otherwise you'd get a bunch of points with negative altitude (when converted back to LLH).
If you're clustering on geographic location (and I can't imagine it being anything else :-), you could store a "cluster ID" in the database along with the lat/long co-ordinates.
What I mean by that is to divide the world map into (for example) a 100x100 matrix (10,000 clusters), with each co-ordinate assigned to one of those clusters.
Then, you can detect very close co-ordinates by selecting those in the same square, and moderately close ones by selecting those in adjacent squares.
The size of your squares (and therefore the number of them) is decided by how accurate you need the clustering to be. Obviously, if you only have a 2x2 matrix, you could get clusters containing co-ordinates that are a long way apart.
You will always have edge cases, such as two points close together but in different clusters (one northernmost in one cluster, the other southernmost in another), but you can adjust the cluster size OR post-process the results on the client side, as in the sketch below.
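That fixed-grid scheme fits in a single GROUP BY; a sketch in Postgres syntax (the coordinates table and the 0.1-degree cell size are assumptions):

-- Collapse all points in each 0.1 x 0.1 degree cell
-- into one result at the cell's average position.
SELECT floor(lat / 0.1) AS cell_lat,
       floor(lon / 0.1) AS cell_lon,
       avg(lat) AS cluster_lat,
       avg(lon) AS cluster_lon,
       count(*) AS points_in_cluster
FROM coordinates
GROUP BY floor(lat / 0.1), floor(lon / 0.1);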
I did a similar thing for a geographic application where I wanted to ensure I could cache point sets easily. My geohashing code looks like this:
# Pack a longitude/latitude pair into a single integer chunk ID.
def compute_chunk(latitude, longitude)
  (floor_lon(longitude) * 0x1000) | floor_lat(latitude)
end

# Bucket longitude into 0.1-degree steps, shifted to be non-negative.
def floor_lon(longitude)
  ((longitude + 180) * 10).to_i
end

# Bucket latitude into 0.1-degree steps, shifted to be non-negative.
def floor_lat(latitude)
  ((latitude + 90) * 10).to_i
end
Everything got really easy from there. I had some code for grabbing all of the chunks from a given point to a given radius that would translate into a single memcache multiget (and some code to backfill that when it was missing).
For movielandmarks.com I used the clustering code from Mike Purvis, one of the authors of Beginning Google Maps Applications with PHP and AJAX. It builds trees of clusters/points for different zoom levels using PHP and MySQL, storing it in the database so that recall is very fast. Some of it may be useful to you even if you are using a different database.
Why not test multiple approaches?
Translate the Weka library to the .NET CLI with IKVM.NET
Add an assembly built from your code plus weka.dll (use ilmerge) into your database
That is, make some tests. No single clustering algorithm works better than all the others.
I believe you can use MSSQL's spatial data types. If they are similar to other spatial data types I know, they will store your points in a tree of rectangles, and then you can go to the lower-resolution rectangles to get implicit clusters.
If you end up wanting to explore Geohashes (which were invented at exactly the same time you posted this question), here's a more fleshed-out implementation of Geohash-related functions for SQL Server's T-SQL, in which you might be interested:
QalGeohash-TSQL
I have used the integer version of the Geohash extensively to cluster results, to reduce the data sent to a client for a limited viewport.
