Imagine I have a map shapefile (.shp) or OSM XML. I'm able to see different kinds of data from different layers in GIS-oriented programs, e.g. ArcGIS, QGIS, etc. But how can I get this info programmatically? Is there a specific library for that?
What I'm really looking for is some kind of method getMapData(longitude, latitude) that returns landscape/terrain info (e.g. forest, river, city, highway) at the specified location.
Thanks in advance for your answers!
What you want to achieve still determines whether you are better off using raster or vector data.
If you are using your grid to subdivide an area as an array of containers for geographic features, then stick with vector data. To do this, I would create a polygon grid file and intersect it with each of your data layers. You can then add an ID field that represents the cell's location in the array (and hence its relative position to a known lat/long coordinate - let's say the lower left). Alternatively, you can use spatial queries to access your data by selecting a polygon in your vector grid file and then finding all the features in your other file that are contained by it.
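Here's a minimal sketch of what I mean, assuming GeoPandas (my choice, not a requirement) and placeholder file and field names:

    import geopandas as gpd

    grid = gpd.read_file("grid.shp")        # polygon grid with a cell_id field
    rivers = gpd.read_file("rivers.shp")    # one of your data layers

    # Intersect the layer with the grid: each output feature keeps the cell_id of
    # the grid polygon it falls in, i.e. its "container" in the array.
    rivers_by_cell = gpd.overlay(rivers, grid[["cell_id", "geometry"]], how="intersection")

    # Spatial-query alternative: select one grid polygon, then find the features it contains.
    cell = grid.loc[grid["cell_id"] == 42, "geometry"].iloc[0]
    features_in_cell = rivers[rivers.within(cell)]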
OTOH, if you want to do some multi-feature analysis based on presence/absence, then you may be better off going down the route of raster analysis. My gut feeling from what you have said is that this is what you are trying to achieve, but I am still not 100% sure. You would handle this by creating a set of boolean rasters of a suitable resolution and then performing maths operations on the set (add, subtract, average etc. - depending on what questions you are asking).
Let's say you are looking at animal migration, and your model assumes that streams, hedges and towns are all obstacles to migration, but roads only reduce the chance of an area being crossed. You convert your obstacles to a value of '1' and NoData to '0' in each case, except roads, where you decide to set the value to 0.5. You can then add all your rasters together in one big stack and predict migration routes.
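For illustration, here is a tiny NumPy sketch of that kind of overlay (the toy arrays and the 0.5 road weight just mirror the example; in practice you would rasterize your vector layers first, e.g. with gdal_rasterize):

    import numpy as np

    # Toy 4x4 boolean rasters: 1 = obstacle present, 0 = absent.
    streams = np.array([[0, 1, 0, 0],
                        [0, 1, 0, 0],
                        [0, 1, 1, 0],
                        [0, 0, 1, 0]], dtype=float)
    hedges = np.zeros((4, 4)); hedges[0, :] = 1.0
    towns  = np.zeros((4, 4)); towns[3, 3] = 1.0
    roads  = np.zeros((4, 4)); roads[:, 2] = 0.5   # roads only reduce the chance of crossing

    # Add the whole stack together: high values are hard to cross, 0 is open ground.
    cost = streams + hedges + towns + roads
    print(cost)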
OK, that's a simplistic example, but perhaps you can see why we need EVEN more information on what you are trying to do.
Shapefiles or OSM XML files are just containers that hold geometric shapes. There are plenty of software libraries out there that let you read these files and extract the data. I would recommend looking at GDAL/OGR as a starting point.
A method like getMapData(longitude, latitude) is essentially a search/query function. You need to be a little more specific, too: do you want geometries that contain the point, geometries within a certain distance of the point, etc.?
You could find the map data using a brute-force algorithm:
    for shape in shapefile:                  # linear scan over every geometry, O(n)
        if shape.contains(query_point):      # point-in-polygon test
            return shape
Or you can use more advanced algorithms/data structures such as R-trees, KD-trees, quadtrees, etc. The easiest way to get started with querying map data is to load it into a spatial database. I would recommend investigating PostgreSQL+PostGIS and SpatiaLite.
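Here's a minimal sketch of the indexed route, assuming Shapely 2.x and Fiona (the file name and query point are placeholders):

    import fiona
    from shapely.geometry import Point, shape
    from shapely.strtree import STRtree

    features = list(fiona.open("map.shp"))                  # placeholder file name
    geoms = [shape(f["geometry"]) for f in features]
    tree = STRtree(geoms)                                    # bulk-loaded R-tree

    query_point = Point(13.4, 52.5)
    # Shapely 2.x returns indices of geometries whose bounding boxes hit the point;
    # refine the candidates with an exact contains() test.
    hits = [features[i] for i in tree.query(query_point) if geoms[i].contains(query_point)]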
You may also like to look at SpatiaLite and/or PostGIS, which are two spatially-enabled databases that you could use separately or in conjunction with GDAL/OGR.
I must echo Charles' request that you explain your use case in more detail, because the actual implementation will depend greatly on exactly what you are trying to achieve. My reading of this is that you may want to convert your data into a series of aligned rasters which you can overlay and treat as a 3-dimensional array.
I'm new to writing PostGIS queries and I still get confused when writing queries where I have geom vs. a projected version of geom. My queries end up being a mess and I run into an error that says something like "conflicting SRIDs".
I feel like, mentally, I don't know when to solve it with ST_Transform and when to use ST_SetSRID.
It seems I'm not the only one confused by this. The PostGIS website has this article from many years ago:
ST_Transform and ST_SetSRID: To project or not to project?
People often get confused between the ST_Transform and ST_SetSRID functions.
ST_SetSRID doesn't change the coordinates but adds metadata stating what spatial reference system the coordinates actually are in. If you stamped your WGS 84 long lat data as a meter-based projection, guess what? It's still long lat. A spade by any other name is still a spade, so don't use ST_SetSRID and expect to magically get meter coordinates.
ST_Transform is used to change the underlying coordinates from a known spatial reference system to another known spatial reference system.
source: https://postgis.net/2013/08/30/st_transform-and-st_setsrid-to-project-or-not-to-project/
If the coordinates of the geometry already represent the correct spatial reference system, but it is not yet classified as such, use ST_SetSRID.
If you need to change the actual geometry to reflect a certain spatial reference system, use ST_Transform.
From the docs: "ST_Transform actually changes the coordinates of a geometry from one spatial reference system to another, while ST_SetSRID() simply changes the SRID identifier of the geometry." (https://postgis.net/docs/ST_Transform.html)
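A quick sketch of the difference, assuming a hypothetical places table whose geom column holds WGS 84 lon/lat coordinates (psycopg2 is used here only to keep the example self-contained):

    import psycopg2

    conn = psycopg2.connect("dbname=gis")   # placeholder connection string
    cur = conn.cursor()

    # The numbers are already lon/lat but the geometry has no SRID yet:
    # label it as EPSG:4326 without touching the coordinates.
    cur.execute("UPDATE places SET geom = ST_SetSRID(geom, 4326) WHERE ST_SRID(geom) = 0;")
    conn.commit()

    # Actually reproject the coordinates, e.g. to EPSG:3857 for metre-based work.
    cur.execute("SELECT ST_AsText(ST_Transform(geom, 3857)) FROM places;")
    print(cur.fetchone())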
I am working on an Android application that uses the Overpass API at [1]. My goal is to get all circular ways that enclose a certain lat-long point.
In order to do so, I build a request for a rectangle that contains my location, then parse the response XML and run a ray-casting algorithm to filter the ways that enclose the given lat-long position. This is too slow for the purposes of my application because the response is sometimes tens or hundreds of MB.
Is there any OSM API that I can call to get all ways that enclose a certain location? Otherwise, how could I optimize the process?
Thanks!
[1] http://overpass-api.de/
To my knowledge, there is no standard API in OSM to do this (it is indeed a very uncommon use case).
I assume you define "enclose" to mean that the point representing the current location lies inside the inner area of the polygon. Furthermore, I assume that optimizing the process may include changing the entire concept of the algorithm.
First of all, you need to define the rectangle for which to fetch data. Consider that querying too large a rectangle would yield too much data. As far as I know there is no specific API to query circular ways only, and even if there were, querying too large a rectangle would probably be denied by the server, because the server load would be enormous.
Server-side precomputation / prefiltering
Therefore I suggest the first optimization: instead of querying an API that is not specifically suited to your purpose, use an offline database saved on the Android device. OsmAnd and others save the whole database for a country offline, but in your specific use case you only need to save a pre-filtered database of circular ways.
As far as I know, only a small fraction of the ways in OSM are circular. Therefore I suggest writing a script that regularly downloads OSM dumps, e.g. from Geofabrik, and removes non-circular ways (e.g. you could check whether the last node ID in a way equals the first node ID, but you'd need to verify that this captures every way you would consider circular). How often you run it depends on your use case.
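A rough sketch of such a pre-filter using pyosmium (my choice of library, not a requirement; the file name is a placeholder and the test is the simple first-node-equals-last-node check mentioned above):

    import osmium

    class ClosedWayFilter(osmium.SimpleHandler):
        """Collect the IDs of ways whose first and last node are the same."""
        def __init__(self):
            super().__init__()
            self.closed_way_ids = []

        def way(self, w):
            nodes = w.nodes
            if len(nodes) > 3 and nodes[0].ref == nodes[len(nodes) - 1].ref:
                self.closed_way_ids.append(w.id)

    handler = ClosedWayFilter()
    handler.apply_file("country-latest.osm.pbf")   # e.g. a Geofabrik extract
    print(len(handler.closed_way_ids), "closed ways found")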
This optimization solves:
The issue of downloading a large amount of data
The issue of overloading the API with large requests
The issue of not being able to request large chunks of data
If that is not suitable for your use case, I suggest building a simple API for that on your own server.
Re-chunking the data into appropriate grids
However, you would still need to filter a large amount of data. To partially solve this, I suggest the second optimization: re-chunk your data. For example, if your current location is in Virginia, there is no need to test circular ways whose extent lies entirely within Texas. Because filtering by state etc. would be highly country-dependent and difficult (CPU-intensive), I suggest choosing a grid of, say, 0.05 lat/lon degrees (I'd choose an equirectangular projection because it's easy to calculate with if you already have lat/lon coordinates).
The script that preprocesses the data shall then create one chunk of data (that could be a file, but we don't know enough about your use case to talk about specific data structures) for each rectangle in the area you want to cover. A circular way is included in a chunk if and only if it has at least one node inside the chunk's area.
You would then only request/filter the specific chunk your position is currently in. Choose the chunk size appropriately for your application (preferably rather small, but that depends on numerous factors!).
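A minimal sketch of mapping a position to its chunk key under such a grid (0.05 degrees is just the example value from above):

    GRID_DEG = 0.05   # chunk size in degrees

    def chunk_key(lat, lon, grid=GRID_DEG):
        """Return a (row, col) pair identifying the chunk a position falls in."""
        return (int((lat + 90.0) // grid), int((lon + 180.0) // grid))

    # Example: which chunk to load for a position in Virginia.
    print(chunk_key(37.54, -77.44))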
This optimization solves:
Assuming most of the circular ways are quite small in terms of their bounding rectangles, you only need to filter a tiny fraction of the overall ways
IO is minimized, especially if you only load the single chunk that contains your current position
Hysteretic heuristics
If the aforementioned optimizations do not sufficiently reduce your computation time, I'd suggest a third optimization whose value depends on how many circular ways you want to find (if you really need to find all of them, it won't help at all): use hysteresis. Save the circular ways you were inside of during the last computation (assuming the new current location is near the last one) and check them first. If your location hasn't changed much, you have a high chance of hitting a way you're inside of within the first few raycasts.
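A tiny sketch of that hysteresis, assuming you already have a contains(way, point) point-in-polygon test (e.g. your existing ray-casting code):

    last_hits = []   # ways the previous position was found inside of

    def first_enclosing_way(point, candidate_ways, contains):
        """Check last round's hits first; return as soon as one way encloses the point."""
        global last_hits
        ordered = last_hits + [w for w in candidate_ways if w not in last_hits]
        for way in ordered:
            if contains(way, point):
                last_hits = [way]
                return way
        last_hits = []
        return None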
Leveraging relations between different circular ways
Also, a fourth optimization is possible: there will be some circular ways that are fully enclosed in other circular ways. You could code your program so that it knows about that relation and checks the inner circular way first. If this check succeeds, you automatically know that the current position is also contained in the outer circular way. I think computing this information (server-side) could be incredibly CPU-intensive and implementing it might also be a hard task, so I'd suggest using this optimization only if unavoidable.
Tuning the parameters of these optimizations should be sufficient to decrease the CPU time needed for your computation significantly. Please feel free to comment/ask if you have further questions regarding these suggestions.
I need to create an SSRS report with many measures on a map.
For example, how do I display Total sales in colour and Prices in bullets?
There is a map component in SSRS, but you'll need a spatial data set that defines the points or shapes you want your measures mapped to. There are a bunch of US-centric ones it ships with, and maybe a world one at the country level.
I'm not sure about how much information you could put on the annotations, but you can also combine traditional reporting techniques with the map and have a side bar with your metrics.
One caveat: I think the maximum size of the map component is 36 inches (at least the last time I used it); if you're looking for something larger, you'll likely need to use GIS software instead.
There are a lot of different algorithms for computing the similarity between two images, but I can't find anything on how you would store this information in a database such that you can find similar images quickly.
By "similar" I mean exact duplicates that have been rotated (90 degree increments), color-adjusted, and/or re-saved (lossy jpeg compression).
I'm trying to come up with a "fingerprint" of the images such that I can look them up quickly.
The best I've come up with so far is to generate a grayscale histogram. With 16 bins and 256 shades of gray, I can easily create a 16-byte fingerprint. This works reasonably well, but it's not quite as robust as I'd like.
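For reference, this is roughly what I'm doing (a simplified sketch using Pillow and NumPy, not my exact code):

    import numpy as np
    from PIL import Image

    def histogram_fingerprint(path, bins=16):
        """16-byte fingerprint: normalized grayscale histogram, one byte per bin."""
        gray = np.asarray(Image.open(path).convert("L"))        # 0..255 grayscale
        hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
        hist = hist / hist.sum()                                 # normalize for image size
        return bytes((hist * 255).astype(np.uint8))              # pack into 16 bytes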
Another solution I tried was to resize the images, rotate them so they're all oriented the same way, convert them to grayscale, normalize the histograms, shrink them down to about 8x8, and reduce the colors to 16 shades of gray. Although the miniature images were very similar, they were usually off by a pixel or two, which means that exact matching can't work.
Without exact-matching, I don't believe there's any efficient way to group similar photos (without comparing every photo to every other photo, i.e., O(n^2)).
So, (1) how can I create a fingerprint/signature that is invariant to the requirements mentioned above? Or, (2) if that's not possible, what other metric can I use such that, given a single image, I can find its best matches in a database of thousands?
There's one little confusing thing in your question: the "fingerprint" you linked to is explicitly not meant to find similar images (quote):
TinEye does not typically find similar images (i.e. a different image with the same subject matter); it finds exact matches including those that have been cropped, edited or resized.
Now, that said, I'm just going to assume you know what you are asking, and that you actually want to be able to find all similar images, not just edited exact copies.
If you want to try and get into it in detail, I would suggest looking up the papers by Sivic & Zisserman and by Nister & Stewenius. The idea these two papers (as well as quite a few others lately) have been using is to apply text-searching techniques to image databases, and to search the image database in the same manner Google searches its document (web-page) database.
The first paper I have linked to is a good starting point for this kind of approach, since it addresses the main question: what are the "words" in images? Text-searching techniques all focus on words, and base their similarity measures on calculations involving word counts. Successfully representing images as collections of visual words is thus the first step in applying text-searching techniques to image databases.
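To make the idea concrete, here is a very rough sketch of a bag-of-visual-words pipeline (not the exact method from either paper; it assumes OpenCV SIFT features and scikit-learn's k-means, which is one common way to build the vocabulary):

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def sift_descriptors(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = cv2.SIFT_create().detectAndCompute(img, None)
        return desc                                    # N x 128 local descriptors

    # 1. Build a visual vocabulary by clustering descriptors from a training set
    #    (use many images in practice; the paths here are placeholders).
    training = np.vstack([sift_descriptors(p) for p in ["a.jpg", "b.jpg"]])
    vocab = KMeans(n_clusters=100, n_init=10).fit(training)

    # 2. Describe any image as a histogram of visual-word counts ("bag of words").
    def bag_of_words(path):
        words = vocab.predict(sift_descriptors(path))
        return np.bincount(words, minlength=vocab.n_clusters)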
The second paper then expands on the idea of using text-techniques, presenting a more suitable search structure. With this, they allow for a faster image retrieval and larger image databases. They also propose how to construct an image descriptor based on the underlying search structure.
The features used as visual words in both papers should satisfy your invariance constraints, and the second one definitely should be able to work with your required database size (maybe even the approach from the 1st paper would work).
Finally, I recommend looking up newer papers by the same authors (I'm positive Nister has done something new; it's just that the approach from the linked paper has been enough for me until now), looking up some of their references, and generally searching for papers on content-based image (indexing and) retrieval (CBIR) - it is a very popular subject right now, so there should be plenty.
I'm trying to see if anyone knows how to cluster some Lat/Long results, using a database, to reduce the number of results sent over the wire to the application.
There are a number of resources about how to cluster, either on the client side OR on the server (application) side ... but not on the database side :(
This is a similar question, asked by a fellow S.O. member. The solutions there are server-side (i.e. C# code-behind).
Has anyone had any luck or experience with solving this, but in a database? Are there any database gurus out there who are after a hawt and sexy DB challenge?
please help :)
EDIT 1: Clarification - by clustering, I'm hoping to group x number of points into a single point for an area. So if I say "cluster everything in a 1 mile / 1 km square", then all the results in that square are grouped into a single result (say ... the middle of the square).
EDIT 2: I'm using MS SQL 2008, but I'm open to hearing about solutions in other DBs.
I'd probably use a modified* version of k-means clustering using the cartesian (e.g. WGS-84 ECF) coordinates for your points. It's easy to implement & converges quickly, and adapts to your data no matter what it looks like. Plus, you can pick k to suit your bandwidth requirements, and each cluster will have the same number of associated points (mod k).
I'd make a table of cluster centroids and add a field to the original data table to indicate which cluster each point belongs to. You'd obviously want to update the clustering periodically if your data is at all dynamic. I don't know whether you could do that with a stored procedure and trigger, but perhaps.
*The "modification" would be to adjust the length of the computed centroid vectors so they'd be on the surface of the earth. Otherwise you'd end up with a bunch of points with negative altitude (when converted back to LLH).
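Here's a small sketch of that approach; note it uses a spherical-Earth approximation for the Cartesian conversion rather than the full WGS-84 ellipsoid, and scikit-learn's KMeans stands in for whatever implementation you end up with:

    import numpy as np
    from sklearn.cluster import KMeans

    R = 6371000.0   # mean Earth radius in metres (spherical approximation)

    def to_cartesian(lat_deg, lon_deg):
        lat, lon = np.radians(lat_deg), np.radians(lon_deg)
        return np.column_stack((R * np.cos(lat) * np.cos(lon),
                                R * np.cos(lat) * np.sin(lon),
                                R * np.sin(lat)))

    def cluster_points(lat, lon, k):
        xyz = to_cartesian(np.asarray(lat), np.asarray(lon))
        km = KMeans(n_clusters=k, n_init=10).fit(xyz)
        # The "modification": push centroids back onto the sphere so they
        # don't end up with negative altitude.
        c = km.cluster_centers_
        c = c * (R / np.linalg.norm(c, axis=1, keepdims=True))
        return km.labels_, c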
If you're clustering on geographic location, and I can't imagine it being anything else :-), you could store the "cluster ID" in the database along with the lat/long co-ordinates.
What I mean by that is to divide the world map into (for example) a 100x100 matrix (10,000 clusters) and assign each co-ordinate to one of those clusters.
Then, you can detect very close coordinates by selecting those in the same square and moderately close ones by selecting those in adjacent squares.
The size of your squares (and therefore the number of them) will be decided by how accurate you need the clustering to be. Obviously, if you only have a 2x2 matrix, you could get some clustering of co-ordinates that are a long way apart.
You will always have edge cases, such as two points close together but in different clusters (one the northernmost point of one cluster, the other the southernmost of another), but you could adjust the cluster size or post-process the results on the client side.
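Since you mentioned MS SQL 2008, here's a hedged sketch of doing that grouping in the database itself (table and column names are placeholders, the 0.01-degree cell size is arbitrary, and pyodbc is only there to keep the example self-contained):

    import pyodbc

    conn = pyodbc.connect("DSN=geo")   # placeholder connection string
    cur = conn.cursor()

    # Snap each point to a grid cell and return one averaged point per cell.
    cur.execute("""
        SELECT FLOOR(Latitude  / 0.01) AS cell_y,
               FLOOR(Longitude / 0.01) AS cell_x,
               AVG(Latitude)  AS cluster_lat,
               AVG(Longitude) AS cluster_lon,
               COUNT(*)       AS point_count
        FROM Points
        GROUP BY FLOOR(Latitude / 0.01), FLOOR(Longitude / 0.01)
    """)
    for row in cur.fetchall():
        print(row.cluster_lat, row.cluster_lon, row.point_count)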
I did a similar thing for a geographic application where I wanted to ensure I could cache point sets easily. My geohashing code looks like this:
    def compute_chunk(latitude, longitude)
      # Pack the two grid indices into a single integer chunk ID.
      (floor_lon(longitude) * 0x1000) | floor_lat(latitude)
    end

    def floor_lon(longitude)
      # 0.1-degree columns, offset by 180 so the index is always non-negative.
      ((longitude + 180) * 10).to_i
    end

    def floor_lat(latitude)
      # 0.1-degree rows, offset by 90 so the index is always non-negative.
      ((latitude + 90) * 10).to_i
    end
Everything got really easy from there. I had some code for grabbing all of the chunks within a given radius of a given point, which would translate into a single memcache multiget (and some code to backfill that when it was missing).
For movielandmarks.com I used the clustering code from Mike Purvis, one of the authors of Beginning Google Maps Applications with PHP and AJAX. It builds trees of clusters/points for different zoom levels using PHP and MySQL, storing it in the database so that recall is very fast. Some of it may be useful to you even if you are using a different database.
Why not test multiple approaches?
Translate the Weka library to .NET (CIL) with IKVM.NET
Add an assembly built from your code and weka.dll (use ILMerge) into your database
Then run some tests. No single clustering algorithm works better than all the others in every situation.
I believe you can use MSSQL's spatial data types. If they are similar to other spatial data types I know, they will store your points in a tree of rectangles, and then you can go to the lower-resolution rectangles to get implicit clusters.
If you end up wanting to explore Geohashes (which were invented at exactly the same time you posted this question), here's a more fleshed-out implementation of Geohash-related functions for SQL Server's T-SQL that you might be interested in.
QalGeohash-TSQL
I have used the Integer version of the Geohash extensively to cluster results to reduce data sent to a client for a limited viewport.