database design performance issue asking for advice


I am designing a database to store geo-positions, and I want to implement functionality similar to Google Maps. The usage scenario: I have a large number of points, each with an X, Y position. The database is seldom updated (e.g. adding new points or modifying the X, Y position of existing points) but is queried frequently. The typical query is: for a given square (the X, Y positions of its 4 corner points are known), find all points inside the square along with their X, Y positions.
I am wondering how to design the database so that query performance is optimized. My design problem is very similar to map database design. I am also wondering how Google Maps or a traditional map database is implemented to achieve the best performance.
I am new to map database design and would appreciate it if anyone could point me to some tutorials for beginners.
Thanks in advance,
George

George,
You'll find your positions with
where (geoX between %x1 and %x2) and (geoY between %y1 and %y2)
Regarding indexing, since you will probably always query on both X and Y, a single composite index will do:
idx_XY (geoX, geoY)
If you ever need to search on just Y, add a second index:
idx_Y (geoY)
(And I'd rename X / Y to long/lat, but that's more or less a personal matter ;-)
regards,
/t
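The bounding-box query and composite index described above can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for whatever database you pick, and the table name `points` and the helper names `build_db` and `points_in_box` are made up for illustration.

```python
import sqlite3

def build_db(points):
    """Create an in-memory table of (id, geoX, geoY) rows with a composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, geoX REAL, geoY REAL)")
    conn.executemany("INSERT INTO points (id, geoX, geoY) VALUES (?, ?, ?)", points)
    # Composite index on both coordinates, as suggested in the answer.
    conn.execute("CREATE INDEX idx_XY ON points (geoX, geoY)")
    conn.commit()
    return conn

def points_in_box(conn, x1, y1, x2, y2):
    """Return all points inside the axis-aligned box, borders included."""
    # Sort the corners so the caller may pass them in any order.
    lo_x, hi_x = sorted((x1, x2))
    lo_y, hi_y = sorted((y1, y2))
    cur = conn.execute(
        "SELECT id, geoX, geoY FROM points "
        "WHERE geoX BETWEEN ? AND ? AND geoY BETWEEN ? AND ? "
        "ORDER BY id",
        (lo_x, hi_x, lo_y, hi_y),
    )
    return cur.fetchall()

conn = build_db([(1, 1.0, 1.0), (2, 5.0, 5.0), (3, 9.0, 2.0), (4, 4.0, 6.5)])
print(points_in_box(conn, 3.0, 4.0, 6.0, 7.0))
```

A plain B-tree index like this can only narrow the search on the leading column efficiently, which is one reason dedicated spatial indexes exist; for a seldom-updated table it is still a reasonable first thing to measure.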

Are you tied to some existing database backend? Some databases (like MongoDB and PostgreSQL) already have geospatial queries and indexes built in.

Related

Storing 3D points into a database

I am currently working on a database in which I have to store meteorological 3D points. My partner and I have different views on how to store them. He would prefer to create a Points table with one point per row, while I think it is more appropriate to store all the points for a given date/peripheral in a BLOB.
What do you think?
Thanks
PS: we are also arguing about which DB we should use (SQLite, MySQL, ...), so if you have ideas, you are welcome!

It all depends on whether you need to write queries that select particular points, e.g. "find me all the thingies with Points with an X < 300". If you don't have much relational work to do and you are going down the BLOB route, then you should consider one of the NoSQL variants.
It's a decision you should make based on your functional needs, not on a purist approach to relational DB design.
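To make the trade-off concrete, here is a rough sketch of both layouts side by side, using Python's sqlite3 and struct modules. The table names `point` and `batch`, the packing format (consecutive little-endian doubles) and the sample values are all assumptions for illustration, not anything from the question.

```python
import sqlite3
import struct

# Made-up sample data: (x, y, z) triples for one date/peripheral batch.
points = [(120.0, 45.5, 1000.0), (310.0, 47.0, 1500.0), (250.0, 46.0, 500.0)]

conn = sqlite3.connect(":memory:")

# Option A: one row per point, so the database itself can filter on coordinates.
conn.execute("CREATE TABLE point (batch_id INTEGER, x REAL, y REAL, z REAL)")
conn.executemany("INSERT INTO point VALUES (1, ?, ?, ?)", points)

# Option B: one BLOB per batch, packed as consecutive little-endian doubles.
conn.execute("CREATE TABLE batch (batch_id INTEGER PRIMARY KEY, data BLOB)")
flat = [value for p in points for value in p]
conn.execute("INSERT INTO batch VALUES (1, ?)", (struct.pack(f"<{len(flat)}d", *flat),))

def rows_with_x_below(limit):
    """Option A: the filter runs inside the database."""
    cur = conn.execute("SELECT x, y, z FROM point WHERE x < ? ORDER BY x", (limit,))
    return cur.fetchall()

def blob_with_x_below(batch_id, limit):
    """Option B: the whole batch is fetched and decoded, then filtered in the application."""
    (data,) = conn.execute(
        "SELECT data FROM batch WHERE batch_id = ?", (batch_id,)
    ).fetchone()
    values = struct.unpack(f"<{len(data) // 8}d", data)
    decoded = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return sorted(p for p in decoded if p[0] < limit)

print(rows_with_x_below(300))
print(blob_with_x_below(1, 300))
```

Both calls give the same points; the difference is where the filtering work happens. If you only ever load whole batches, the BLOB is simpler and more compact. If you need queries like "X < 300" across batches, rows win.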

Get all coordinates closer than x - how to implement

I have multiple locations with coordinates. I'd like to ask what the best (fastest) way is to get all points that are closer than, for example, 50 km to a particular point.
I did it this way: I saved all the coordinates in a database, and I run a query to get the points. Is this a good solution? Is MySQL suitable for this?

The professional solution would be PostGIS, a "spatial database extender" for PostgreSQL.
For simple purposes this related answer works with basic Postgres, too:
How can I get results from a JPA entity ordered by distance?
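If you want to see what such a radius search boils down to without any spatial extension, here is a minimal sketch: a cheap latitude pre-filter followed by an exact great-circle check using the haversine formula. The function names and the sample cities are illustrative, and the Earth is treated as a sphere of radius 6371 km, which is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(points, lat, lon, radius_km):
    """Return names of points at most radius_km from (lat, lon)."""
    # A point within radius_km can never differ in latitude by more than this,
    # so the comparison below is a safe pre-filter (in SQL it would be a BETWEEN).
    max_dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    hits = []
    for name, plat, plon in points:
        if abs(plat - lat) > max_dlat:
            continue
        if haversine_km(lat, lon, plat, plon) <= radius_km:
            hits.append(name)
    return hits

places = [
    ("Versailles", 48.8049, 2.1204),
    ("Orleans", 47.9030, 1.9093),
    ("London", 51.5074, -0.1278),
]
print(within_radius(places, 48.8566, 2.3522, 50))  # search around Paris
```

The same two-step idea (an indexable bounding range first, the exact distance second) is what you would express in MySQL or plain Postgres; PostGIS does the equivalent for you with a real spatial index.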

Find all overlapping circles in a spatial index

Positions reported by phones are approximate - they contain a point (long, lat) and a radius - that is, a phone doesn't know where it is but does know it is within some distance of a certain point.
How can I store this in a database? How can I retrieve all those phones within a certain radius of some other point?
(I have looked at MySQL's point-type but MySQL doesn't seem to like circles and doesn't seem to have even a DISTANCE function; are there other databases that do this well and fast?)

I recommend you store the phones in a Quadtree. Then when you want to query a point, you can do an exhaustive search of only the phones nearby, and save time by not considering the ones too far away. I don't know of any normal database application that will do this for you, but it shouldn't be too difficult to implement yourself.
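A minimal sketch of that idea follows. It assumes the positions have already been projected to planar coordinates (e.g. metres), so plain Euclidean distance applies; the class, the `phones_overlapping` helper and the sample phones are all made up for illustration. Because each phone carries its own uncertainty radius, the search box is widened by the largest phone radius before the exact overlap test.

```python
import math

class Quadtree:
    """Point quadtree over a square region; a leaf splits once it exceeds capacity."""

    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.items = []        # (x, y, radius, label) tuples stored in a leaf
        self.children = None   # four sub-quadrants after a split

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def insert(self, x, y, radius, label):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, radius, label)
            return
        self.items.append((x, y, radius, label))
        x0, y0, x1, y1 = self.bounds
        # Stop splitting tiny cells so duplicate points cannot recurse forever.
        if len(self.items) > self.capacity and (x1 - x0) > 1e-9:
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            c = self.capacity
            self.children = [Quadtree(x0, y0, mx, my, c), Quadtree(mx, y0, x1, my, c),
                             Quadtree(x0, my, mx, y1, c), Quadtree(mx, my, x1, y1, c)]
            items, self.items = self.items, []
            for item in items:
                self._child_for(item[0], item[1]).insert(*item)

    def query_box(self, qx0, qy0, qx1, qy1, out):
        """Append every stored item whose centre lies in the query box to out."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return out  # this quadrant cannot contain a match
        if self.children is None:
            out.extend(i for i in self.items
                       if qx0 <= i[0] <= qx1 and qy0 <= i[1] <= qy1)
        else:
            for child in self.children:
                child.query_box(qx0, qy0, qx1, qy1, out)
        return out

def phones_overlapping(tree, max_phone_radius, cx, cy, r):
    """Labels of phones whose uncertainty circle overlaps the circle (cx, cy, r)."""
    reach = r + max_phone_radius
    nearby = tree.query_box(cx - reach, cy - reach, cx + reach, cy + reach, [])
    return sorted(label for x, y, pr, label in nearby
                  if math.hypot(x - cx, y - cy) <= r + pr)

phones = [("a", 10, 10, 2), ("b", 20, 10, 1), ("c", 50, 50, 30),
          ("d", 90, 90, 1), ("e", 12, 11, 0.5), ("f", 80, 20, 1)]
tree = Quadtree(0, 0, 100, 100, capacity=2)
for label, x, y, radius in phones:
    tree.insert(x, y, radius, label)
MAX_RADIUS = max(p[3] for p in phones)
print(phones_overlapping(tree, MAX_RADIUS, 11, 10, 3))
```

One design caveat: a single phone with a huge radius inflates the search box for every query. If radii vary a lot, bucketing phones by radius into separate trees is one way around that.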

distance between two points across land using sql server

I am looking to calculate the shortest distance between two points inside SQL Server 2008 taking into account land mass only.
I have used the geography data type along with STDistance() to work out the distance from point x to point y as the crow flies; however, this sometimes crosses the sea, which I am trying to avoid.
I have also created a polygon around the land mass boundary I am interested in.
I believe that I need to combine these two methods to ensure that STDistance always remains within polygon - unless there is a simpler solution.
Thanks for any advice

Use STIntersects - http://msdn.microsoft.com/en-us/library/bb933899%28v=SQL.105%29.aspx to find out what part of the line is over land.
After reading your comment your requirement makes sense. However I'm pretty sure there are no inbuilt techniques to do this in SQL Server. I'm assuming you are ignoring roads, and taking an as-the-crow-flies approach but over land only.
The only way I can think to do this would be to convert your area into a raster (grid cells) and perform a cost path analysis. You would set the area of sea to have a prohibitively high cost so the algorithm would route around the sea. See this link for description of technique:
http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=cost_path
Otherwise try implementing the algorithm below!
http://bit.ly/ckvciz
There may be other libraries that do this. Alternatively, how about using the new Google Directions API between the two cities? You'd get actual road distances then.
http://code.google.com/apis/maps/documentation/directions/
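The raster cost-path idea can be sketched outside the database in a few lines. This is not the ArcGIS tool or a SQL Server feature, just Dijkstra's algorithm on a grid where 1 means land and 0 means sea. Treating sea cells as impassable is the limiting case of the "prohibitively high cost" mentioned above; the grid, the 4-neighbour movement rule and the `cell_size_km` parameter are assumptions for illustration.

```python
import heapq
import math

def land_distance(grid, start, goal, cell_size_km=1.0):
    """Shortest over-land path length between two land cells; math.inf if unreachable.

    grid is a list of rows with 1 = land and 0 = sea; start and goal are (row, col).
    Movement is between edge-adjacent land cells only.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        dist, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return dist * cell_size_km
        if dist > best.get((r, c), math.inf):
            continue  # stale heap entry
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                nd = dist + 1
                if nd < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return math.inf

# A bay cuts into the land from the south; the crow-flies path would cross it.
bay = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
]
print(land_distance(bay, (2, 0), (2, 2)))
```

Here the straight-line distance is 2 cells but the land route is 6. The finer the raster, the closer the result gets to the true over-land distance, at the price of a larger grid to search.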

Clustering Lat/Longs in a Database

I'm trying to see if anyone knows how to cluster some Lat/Long results, using a database, to reduce the number of results sent over the wire to the application.
There are a number of resources about how to cluster, either on the client side OR in the server (application) side .. but not in the database side :(
This is a similar question, asked by a fellow S.O. member. The solutions are server side based (ie. C# code behind).
Has anyone had any luck or experience with solving this, but in a database? Are there any database gurus out there who are after a hawt and sexy DB challenge?
Please help :)
EDIT 1: Clarification - by clustering, I'm hoping to group x number of points into a single point for an area. So, if I say cluster everything in a 1 mile / 1 km square, then all the results in that 'square' are grouped into a single result (say ... the middle of the square).
EDIT 2: I'm using MS SQL 2008, but I'm open to hearing about solutions in other DBs.

I'd probably use a modified* version of k-means clustering using the Cartesian (e.g. WGS-84 ECF) coordinates for your points. It's easy to implement, usually converges quickly, and adapts to your data no matter what it looks like. Plus, you can pick k to suit your bandwidth requirements. (Note that plain k-means does not guarantee clusters of equal size; cluster sizes follow the density of your data.)
I'd make a table of cluster centroids, and add a field to the original data table to indicate which cluster each point belongs to. You'd obviously want to update the clustering periodically if your data is at all dynamic. I don't know if you could do that with a stored procedure & trigger, but perhaps.
*The "modification" would be to adjust the length of the computed centroid vectors so they'd be on the surface of the earth. Otherwise you'd end up with a bunch of points with negative altitude (when converted back to LLH).
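Here is a rough sketch of that modified k-means, run in application code rather than in the database. To keep it short it treats the Earth as a unit sphere instead of the WGS-84 ellipsoid, seeds the centroids naively with the first k points, and runs a fixed number of iterations; all of those are simplifying assumptions, as are the function names.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert degrees of latitude/longitude to a Cartesian point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

def to_lat_lon(v):
    """Convert a Cartesian vector back to (lat, lon) in degrees."""
    x, y, z = v
    return (math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x)))

def kmeans_on_sphere(points, k, iterations=20):
    """Cluster (lat, lon) points; returns (labels, centroids as (lat, lon))."""
    vectors = [to_unit_vector(lat, lon) for lat, lon in points]
    centroids = list(vectors[:k])  # naive seeding: the first k points
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by straight-line Cartesian distance.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        # Update step: sum the members, then rescale to length 1 so the
        # centroid sits on the surface instead of inside the sphere.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if not members:
                continue
            total = [sum(axis) for axis in zip(*members)]
            norm = math.sqrt(sum(c * c for c in total))
            if norm > 0:
                centroids[j] = tuple(c / norm for c in total)
    return labels, [to_lat_lon(c) for c in centroids]

sample = [(48.85, 2.35), (40.71, -74.00), (48.90, 2.40), (40.75, -73.95)]
labels, centroids = kmeans_on_sphere(sample, k=2)
print(labels, centroids)
```

The resulting labels are what you would write back into the "cluster" field of the data table, and the centroids into the centroid table.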

If you're clustering on geographic location, and I can't imagine it being anything else :-), you could store the "cluster ID" in the database along with the lat/long co-ordinates.
What I mean by that is to divide the world map into (for example) a 100x100 matrix (10,000 clusters) and each co-ordinate gets assigned to one of those clusters.
Then, you can detect very close coordinates by selecting those in the same square and moderately close ones by selecting those in adjacent squares.
The size of your squares (and therefore the number of them) will be decided by how accurate you need the clustering to be. Obviously, if you only have a 2x2 matrix, you could get some clustering of co-ordinates that are a long way apart.
You will always have the edge cases such as two points close together but in different clusters (one northernmost in one cluster, the other southernmost in another) but you could adjust the cluster size OR post-process the results on the client side.
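The grid approach maps nicely onto a GROUP BY, which keeps the clustering on the database side. Below is a sketch using Python's sqlite3 module as a stand-in for your database; the `place` table, its column names and the `cluster_by_grid` helper are invented for the example. The cell row and column are computed once at insert time and stored next to the coordinates.

```python
import sqlite3

def cluster_by_grid(points, grid=100):
    """Group (lat, lon) points into a grid x grid matrix of cells covering the world.

    Returns one (cell_row, cell_col, count, avg_lat, avg_lon) tuple per occupied cell.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE place (lat REAL, lon REAL, cell_row INTEGER, cell_col INTEGER)"
    )
    rows = []
    for lat, lon in points:
        # min() keeps lat = 90 and lon = 180 inside the last cell.
        cell_row = min(grid - 1, int((lat + 90) / 180 * grid))
        cell_col = min(grid - 1, int((lon + 180) / 360 * grid))
        rows.append((lat, lon, cell_row, cell_col))
    conn.executemany("INSERT INTO place VALUES (?, ?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT cell_row, cell_col, COUNT(*), AVG(lat), AVG(lon) "
        "FROM place GROUP BY cell_row, cell_col ORDER BY cell_row, cell_col"
    )
    return cur.fetchall()

clusters = cluster_by_grid([(47.0, 2.0), (48.0, 3.0), (40.5, -74.5)])
print(clusters)
```

With a 100x100 matrix each cell is 1.8 degrees of latitude by 3.6 degrees of longitude, so the first two points collapse into one result. Returning the average position rather than the middle of the square is a choice; either works for drawing a single marker.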

I did a similar thing for a geographic application where I wanted to ensure I could cache point sets easily. My geohashing code looks like this:
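# Pack the 0.1-degree longitude and latitude cell indices into one integer key.
# The latitude index is at most 1800, so it fits below 0x1000 (4096).
def compute_chunk(latitude, longitude)
  (floor_lon(longitude) * 0x1000) | floor_lat(latitude)
end

# Longitude shifted to 0..360, in tenths of a degree.
def floor_lon(longitude)
  ((longitude + 180) * 10).to_i
end

# Latitude shifted to 0..180, in tenths of a degree.
def floor_lat(latitude)
  ((latitude + 90) * 10).to_i
end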
Everything got really easy from there. I had some code for grabbing all of the chunks from a given point to a given radius that would translate into a single memcache multiget (and some code to backfill that when it was missing).
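The "all chunks from a given point to a given radius" step is not shown in the answer, so the following is only a guess at how it might look, written in Python. `compute_chunk` is a port of the Ruby function above; `chunks_in_radius` is hypothetical, uses a rough 111.32 km per degree, and ignores wrap-around at the antimeridian and the distortion near the poles.

```python
import math

KM_PER_DEGREE = 111.32  # rough length of one degree of latitude

def compute_chunk(latitude, longitude):
    """Python port of the Ruby compute_chunk: 0.1-degree cells packed into one integer."""
    return (int((longitude + 180) * 10) * 0x1000) | int((latitude + 90) * 10)

def chunks_in_radius(latitude, longitude, radius_km):
    """Chunk keys covering the bounding box of a circle around the given point."""
    dlat = radius_km / KM_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(latitude)))
    # Clamp the cell indices to the valid ranges (no antimeridian wrap-around).
    lat_lo = max(0, int((latitude - dlat + 90) * 10))
    lat_hi = min(1800, int((latitude + dlat + 90) * 10))
    lon_lo = max(0, int((longitude - dlon + 180) * 10))
    lon_hi = min(3600, int((longitude + dlon + 180) * 10))
    return [(lon_i * 0x1000) | lat_i
            for lon_i in range(lon_lo, lon_hi + 1)
            for lat_i in range(lat_lo, lat_hi + 1)]

keys = chunks_in_radius(48.85, 2.35, 5)
print(keys)
```

The list of keys is what would be handed to a cache multiget. It covers the bounding box of the circle, so points fetched from the corner chunks still need an exact distance check afterwards.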

For movielandmarks.com I used the clustering code from Mike Purvis, one of the authors of Beginning Google Maps Applications with PHP and AJAX. It builds trees of clusters/points for different zoom levels using PHP and MySQL, storing it in the database so that recall is very fast. Some of it may be useful to you even if you are using a different database.

Why not test multiple approaches?
Translate the Weka library to .NET CLI with IKVM.NET.
Add an assembly built from your code and weka.dll (use ilmerge) to your database.
Then run some tests. No single clustering method works better than all the others.

I believe you can use MSSQL's spatial data types. If they are similar to other spatial data types I know, they will store your points in a tree of rectangles, and then you can go to the lower-resolution rectangles to get implicit clusters.

If you end up wanting to explore Geohashes (which were invented at exactly the same time you posted this question), here's a more fleshed-out implementation of Geohash-related functions for SQL Server's T-SQL in which you might be interested.
QalGeohash-TSQL
I have used the Integer version of the Geohash extensively to cluster results to reduce data sent to a client for a limited viewport.
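For anyone curious what an integer Geohash looks like, here is a small sketch of the standard bit-interleaving scheme in Python. It is not the QalGeohash-TSQL API, just an illustration of the idea; the function name `geohash_int` and the default of 32 bits are my own choices.

```python
def geohash_int(lat, lon, bits=32):
    """Integer geohash: alternately halve the longitude and latitude ranges.

    Each step appends one bit (1 = upper half, 0 = lower half), starting with
    longitude, so a shorter hash is always a prefix of a longer one.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    value = 0
    for i in range(bits):
        if i % 2 == 0:  # even steps refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:           # odd steps refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
    return value

# Two nearby points share the same 16-bit cell, so they would cluster together.
print(geohash_int(48.85, 2.35, 16), geohash_int(48.86, 2.36, 16))
```

Clustering for a viewport then becomes a GROUP BY on the hash truncated to however many bits suit the zoom level: fewer bits give bigger cells and fewer results sent to the client.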
