Combining multiple geo sources in Tableau - maps

I want to create a filled map of administrative regions (districts, counties etc.) for a particular country where the intensity of color depends on some measure of these closed areas.
I have 2 data sources.
The first contains the lat-long values and the name of each area; this table holds all the points required to define a closed polygon for an administrative area.
The second contains a numeric value and the lat-long position where that value was recorded.
Using Tableau and the first data source, I could create a filled map of the administrative regions and assign arbitrary colors to them.
I want to use the second data source, apply some aggregate (say AVG) to the numeric values recorded at lat-long positions that fall inside one of the polygons defined earlier, and use the result to color each polygon.
Is there some way to do this?
(I have not used Tableau before, but I love the way it handles basic plotting tasks. I have a feeling this should be possible in Tableau, but I could not find a tutorial on how to do it.)
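If pre-processing the data outside Tableau is an option, the point-in-polygon aggregation itself can be sketched in PostGIS terms; the table and column names below are assumptions, not something from the question. The idea is to join the measurement points to the region polygons, average per region, and feed the result to Tableau for coloring.

    -- Hypothetical tables: regions(name, geom) holding polygons,
    -- readings(value, geom) holding points in the same SRID.
    SELECT r.name,
           AVG(p.value) AS avg_value
    FROM regions r
    JOIN readings p
      ON ST_Contains(r.geom, p.geom)   -- the point falls inside the region polygon
    GROUP BY r.name;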

Related

How do I design a database schema to compute IOU for two regions (coordinates)

Short Question
I want to store the coordinates of multiple polygons (regions) in a Postgres database. What I need is to get all region pairs whose IOU (Intersection over Union) exceeds some threshold, say 0.5. If one region has more than one matching region, pick the one with the highest IOU.
It would be very helpful if someone could outline what the schema should look like and what kind of SQL queries would be needed to achieve this.
Long Question
Context: Users and AI models add annotations on files (images) in our platform. Let's say AI draws 2 boxes on an image. Box1 with label l1 and Box2 with label l2. User draws 1 box on the same image named Box3 with label l1.
There could be millions of such files and we want to compute various detection and classification metrics from the above information.
Detection metrics would be based on if the box detected by AI matches the user's box or not. We rely on IOU to understand if 2 boxes match or not.
Classification metrics would be computed on top of the boxes determined to be correct matches by IOU, by checking whether the label given by the user is among the labels given by the AI.
I want an approach: what kind of DB schema should be used for this kind of problem, and how complex would the SQL queries be in terms of performance?
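One possible sketch, assuming PostGIS is available (the table and column names here are illustrative, not prescribed): store each box as a polygon geometry, compute pairwise IOU as intersection area over union area, and keep only the best match per user box.

    -- Hypothetical schema: one row per box, tagged with its source and label.
    CREATE TABLE annotations (
        id      bigserial PRIMARY KEY,
        file_id bigint NOT NULL,
        source  text   NOT NULL,   -- 'ai' or 'user'
        label   text   NOT NULL,
        geom    geometry(Polygon) NOT NULL
    );
    CREATE INDEX annotations_geom_idx ON annotations USING GIST (geom);

    -- Best-matching AI box for each user box on the same file, keeping only IOU > 0.5.
    SELECT DISTINCT ON (u.id)
           u.id AS user_box,
           a.id AS ai_box,
           ST_Area(ST_Intersection(u.geom, a.geom))
             / ST_Area(ST_Union(u.geom, a.geom)) AS iou
    FROM annotations u
    JOIN annotations a
      ON a.file_id = u.file_id
     AND u.source = 'user'
     AND a.source = 'ai'
     AND ST_Intersects(u.geom, a.geom)
    WHERE ST_Area(ST_Intersection(u.geom, a.geom))
            / ST_Area(ST_Union(u.geom, a.geom)) > 0.5
    ORDER BY u.id, iou DESC;

With millions of files, the file_id equality plus the GiST index should keep each candidate join small; the IOU arithmetic itself is cheap compared to finding candidate pairs.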

Pre-Process polygons and linestrings into grid areas to partition data

Due to the size and number of my polygons and the performance of my polygon-in-polygon queries, I would like to pre-process my data and separate the polygons into grid cells. My data is fairly uniform in my area of interest, so roughly 12 even grid cells would work well; I may adjust this number later based on performance. Basically I am going to create 12 tables with associated spatial indexes, or possibly a single table with my grid as a partition key. This will reduce my total index size 12x and hopefully increase performance. On the query side I will direct each query to the appropriate table.
The key is figuring out how to assign polygons to these grid cells. If a polygon falls within multiple cells, I would likely create a record in each and de-duplicate at query time. I wouldn't expect this to happen very often.
Essentially I will have a "grid" that I want to intersect with my polygons to figure out which grid cells each polygon falls in.
Thanks
My process would be something like this:
Find the MIN/MAX ordinate values for your whole data set (both axes)
Extend those values by a margin that seems appropriate (in case the ordinates when combined don't form a regular rectangular shape)
Write a small loop that generates polygons at a set interval within those MIN/MAX ordinates - i.e. create one polygon per grid square
Use SDO_COVERS to see which grid squares cover each polygon (see the sketch after this list). If multiple grid squares cover a polygon, you will see multiple matches, as you describe.
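A rough sketch of those steps; SDO_COVERS is Oracle Spatial, so this uses the approximate PostGIS equivalents, and the extent, cell size, and table names are assumptions:

    -- 4 x 3 grid of 100-unit cells over an assumed extent (0,0) to (400,300).
    CREATE TABLE grid AS
    SELECT row_number() OVER () AS cell_id,
           ST_MakeEnvelope(x, y, x + 100, y + 100, 3857) AS geom
    FROM generate_series(0, 300, 100) AS x,
         generate_series(0, 200, 100) AS y;

    -- Record which cell(s) each polygon touches; a polygon spanning
    -- several cells gets several rows, to be de-duplicated at query time.
    CREATE TABLE polygon_cells AS
    SELECT p.id AS polygon_id, g.cell_id
    FROM polygons p
    JOIN grid g ON ST_Intersects(g.geom, p.geom);

ST_Intersects is used rather than a strict covers test so that polygons straddling a cell boundary are assigned to every cell they touch, matching the de-duplication strategy described in the question.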
I also agree with your strategy of partitioning the data within a single table. I have heard positive comments about this, but I have never personally tried it. The overhead of going to multiple tables seems like something you'll want to avoid though.

Map displaying many measures

I need to create an SSRS report with many measures on a map.
For example, how do I display Total sales in colour and Prices in bullets?
There is a map component in SSRS, but you'll need a spatial data set that defines the points or shapes you want your measures mapped to. There are a bunch of US-centric ones it comes with, and maybe a world one at the country level.
I'm not sure how much information you can put in the annotations, but you can also combine traditional reporting techniques with the map and have a sidebar with your metrics.
One caveat: I think the maximum size of the map component is 36 inches (last time I used it, anyway); if you're looking for something larger you'll likely need to use GIS software instead.

Get info from map data file by coordinate

Imagine I have a map shapefile (.shp) or OSM XML. I'm able to see different kinds of data from different layers in GIS-oriented programs, e.g. ArcGIS, QGIS, etc. But how can I get this info programmatically? Is there a specific library for that?
What I'm really looking for is some kind of method getMapData(longitude, latitude) to get landscape/terrain info (e.g. forest, river, city, highway) at a specified location.
Thanks in advance for your answers!
Whether you are better off using raster or vector data still depends on what you want to achieve.
If you are using your grid to subdivide an area as an array of containers for geographic features, then stick with vector data. To do this, I would create a polygon grid file and intersect it with each of your data layers. You can then add an ID field that represents the cell's location in the array (and hence its relative position to a known lat/long coordinate, say the lower left). Alternatively you can use spatial queries to access your data, by selecting a polygon in your vector grid file and then finding all the features in your other file that are contained by it.
On the other hand, if you want to do some multi-feature analysis based on presence/absence, then you may be better off going down the route of raster analysis. My gut feeling from what you have said is that this is what you are trying to achieve, but I am still not 100% sure. You would handle this by creating a set of boolean rasters of a suitable resolution and then performing maths operations on the set (add, subtract, average, etc., depending on what questions you are asking).
Let's say you are looking at animal migration. Let's say your model assumes that streams, hedges and towns are all obstacles to migration but roads only reduce the chance of an area being crossed. So you convert your obstacles to a value of '1' and NoData to '0' in each case, except roads where you decide to set the value to 0.5. You can then add all your rasters together in one big stack and predict migration routes.
OK, that's a simplistic example, but perhaps you can see why we need even more information on what you are trying to do.
Shapefiles and OSM XML files are just containers that hold geometric shapes. There are plenty of software libraries out there that let you read these files and extract the data. I would recommend looking at GDAL/OGR as a starting point.
A method like getMapData(longitude, latitude) is essentially a search/query function. You need to be a little more specific too: do you want geometries that contain the point, geometries within a certain distance of the point, etc.?
You could find the map data using a brute-force scan, for example (assuming the shapefile has already been read into a collection of geometry objects):

    # linear scan: test every geometry in the file for containment of the point
    for shape in shapefile:
        if shape.contains(query_point):
            return shape
Or you can use more advanced algorithms/data structures such as R-trees, KD-trees, quadtrees, etc. The easiest way to get started with querying map data is to load it into a spatial database; I would recommend investigating PostgreSQL+PostGIS and SpatiaLite.
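Once the data is loaded into such a database, the getMapData(longitude, latitude) lookup essentially becomes a single query. A sketch in PostGIS terms, where the table and column names are assumptions:

    -- Return the attributes of every feature whose geometry contains the query point.
    SELECT *
    FROM map_features
    WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(:longitude, :latitude), 4326));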
You may also like to look at SpatiaLite and/or PostGIS, which are two spatially enabled databases that you could use separately or in conjunction with GDAL/OGR.
I must echo Charles' request that you explain your use case in more detail, because the actual implementation will depend greatly on exactly what you are trying to achieve. My reading of this is that you may want to convert your data into a series of aligned rasters which you can overlay and treat as a 3-dimensional array.

How to efficiently query large multi-polygons with PostGIS

I am working with radio maps that seem to be too fragmented to query efficiently. The response time is 20-40 seconds when I ask if a single point is within the multipolygon (I have tested "within"/"contains"/"overlaps"). I use PostGIS, with GeoDjango to abstract the queries.
The multi-polygon column has a GiST index, and I have tried VACUUM ANALYZE. I use PostgreSQL 8.3.7 and Django 1.2.
The maps stretch over large geographical areas. They were originally generated by a topography-aware radio tool, and the radio cells/polygons are therefore fragmented.
My goal is to query for points within the multipolygons (i.e. houses that may or may not be covered by the signals).
All the radio maps are made up of between 100,000 and 300,000 vertices (total), with a wildly varying number of polygons. Some of the maps have fewer than 10 polygons; from there it jumps to between 10,000 and 30,000 polygons. The ratio of polygons to vertices does not seem to affect the time the queries take to complete very much.
I use a projected coordinate system, and use the same system for both houses and radio sectors. QGIS shows that the radio sectors and maps are correctly placed in the terrain.
My test queries are with only one house at a time within a single radio map. I have tested queries like "within"/"contains"/"overlaps", and the results are the same:
Sub-second response if the house is "far from" the radio map (I guess this is because it is outside the bounding box that is automatically used in the query).
20-40 seconds response time if the house/point is close to or within the radio map.
Do I have alternative ways to optimize queries, or must I change/simplify the source material in some way? Any advice is appreciated.
Hello
The first thing I would do is split the multipolygons into single polygons and create a new index. The index will then work much more effectively. Right now the whole multipolygon has one big bounding box, and the index can do nothing more than tell whether the house is inside that bounding box. So, the smaller the polygons are in relation to the whole dataset, the more effective the index use. There are even techniques to split single polygons into smaller ones with a grid to make the index part of the query even more effective. But the first thing to do is split the multipolygons into single ones with ST_Dump(). If you have a lot of attributes in the same table, it would be wise to put them into another table and keep only an ID telling which radio map each polygon belongs to; otherwise you will get a lot of duplicated attribute data.
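A minimal sketch of that approach (table and column names are assumptions): dump the multipolygons into one row per simple polygon, keep only the parent ID, and index the result.

    -- One row per simple polygon, keeping a reference back to the original radio map.
    CREATE TABLE radiomap_polygons AS
    SELECT id AS radiomap_id,
           (ST_Dump(geom)).geom AS geom
    FROM radiomaps;

    CREATE INDEX radiomap_polygons_geom_idx
        ON radiomap_polygons USING GIST (geom);

    -- Point-in-coverage test: each small polygon now has its own tight
    -- bounding box in the index, so far fewer candidates need exact testing.
    SELECT DISTINCT radiomap_id
    FROM radiomap_polygons
    WHERE ST_Contains(geom, (SELECT geom FROM houses WHERE id = 42));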
HTH
Nicklas
