Map displaying many measures - sql-server

I need to create a SSRS report with many measures in a map.
For example, how do I display Total sales in colour and Prices in bullets?

There is a map component in SSRS, but you'll need a spatial data set that defines the points or shapes you want your measures mapped to. It ships with a bunch of US-centric data sets, and possibly a world one at the country level.
I'm not sure how much information you can put in the map annotations, but you can also combine traditional reporting techniques with the map and show your metrics in a sidebar.
One caveat: I think the maximum size of the map component is 36 inches (at least the last time I used it), so if you need something larger you'll likely have to use dedicated GIS software instead.

Related

Does Apache Superset support Weighted Averages?

I'm trying to use Apache Superset to create a dashboard that will display the average rate of X/Y at different entities such that the time grain can be changed on the fly. However, all I have available as raw data is daily totals of X and Y for the entities in question.
It would be simple to do if I could just get a line chart that displayed sum(X)/sum(Y) as its own metric, where the sum range would change with the time grain, but that doesn't seem to be supported.
Creating a function in SQLAlchemy that calculates the daily rates and then uses that as the raw data is also an insufficient solution, since taking the average of that over different time ranges would not be properly weighted.
Is there a workaround I'm not seeing?
Is there a way to use Druid or some other tool to make displaying a quotient over a variable range possible?
My current best solution is to just set up different charts for each time grain size (day, month, quarter, year), but that's extremely inelegant and I'm hoping to do better.
There are multiple ways to do this. One is to use the Metric editor as shown below; in this case the metric definition is stored as part of the chart.
Another way is to define a metric in the "datasource editor", where the metric is stored with the datasource definition and becomes reusable by any chart that uses this datasource, as shown here.
Side note: depending on the database you use, you may have to CAST from, say, an integer to a numeric type (as I did in the example), or multiply by 100, in order to get a useful result.
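For illustration, here is a minimal Python sketch of why the metric has to be SUM(x)/SUM(y) rather than an average of per-day rates; the column names x and y are hypothetical, and the SQL in the comment is the kind of expression you would put in the Metric editor:

# Hypothetical daily totals for one entity, as (x, y) pairs
daily = [(10, 100), (300, 1000), (5, 20)]

# Correctly weighted rate over the whole range: SUM(x) / SUM(y)
weighted = sum(x for x, _ in daily) / sum(y for _, y in daily)

# Average of per-day rates: every day counts equally, regardless of volume
unweighted = sum(x / y for x, y in daily) / len(daily)

print(weighted)    # 0.28125
print(unweighted)  # ~0.2167, a different (and misleading) number

# In the Metric editor the weighted version is written in SQL, e.g.
#   SUM(CAST(x AS NUMERIC)) / SUM(y)
# with the CAST guarding against integer division on databases that truncate.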

Deep Learning: Dataset containing images of varying dimensions and orientations

This is a repost of the question asked in ai.stackexchange. Since there is not much traction in that forum, I thought I might try my chances here.
I have a dataset of images of varying dimensions of a certain object. A few images of the object are also in varying orientations. The objective is to learn the features of the object (using Autoencoders).
Is it possible to create a network with layers that account for varying dimensions and orientations of the input image, or should I strictly consider a dataset containing images of uniform dimensions? What are the necessary criteria for a dataset to be eligible for training a deep network in general?
The idea is, I want to avoid pre-processing my dataset by normalizing it via scaling, re-orienting operations etc. I would like my network to account for the variability in dimensions and orientations. Please point me to resources for the same.
EDIT:
As an example, consider a dataset consisting of images of bananas. They are of varying sizes, say, 265x525 px, 1200x1200 px, 165x520 px etc. 90% of the images display the banana in one orthogonal orientation (say, front view) and the rest display the banana in varying orientations (say, isometric views).
Almost always people will resize all their images to the same size before sending them to the CNN. Unless you're up for a real challenge this is probably what you should do.
That said, it is possible to build a single CNN that takes inputs of varying dimensions. There are a number of ways you might try to do this, and I'm not aware of any published work analyzing these different choices. The key is that the set of learned parameters needs to be shared between the different input sizes. While convolutions can be applied at different image sizes, ultimately the feature maps get converted to a single vector to make predictions with, and the size of that vector will depend on the geometries of the inputs, convolutions and pooling layers. You'd probably want to dynamically change the pooling layers based on the input geometry and leave the convolutions the same, since the convolutional layers have parameters and pooling usually doesn't. So on bigger images you pool more aggressively.
Practically, you'd want to group similarly (ideally identically) sized images into minibatches for efficient processing. This is common for LSTM-type models, and the technique is commonly called "bucketing". See for example http://mxnet.io/how_to/bucketing.html for a description of how to do this efficiently.
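As a rough sketch of the bucketing idea (not MXNet-specific), assuming images is a list of NumPy arrays that all share the same channel count:

from collections import defaultdict
import numpy as np

def make_buckets(images, batch_size):
    # Group images by (height, width) so every minibatch has a single geometry
    buckets = defaultdict(list)
    for img in images:
        buckets[img.shape[:2]].append(img)
    for same_size in buckets.values():
        for i in range(0, len(same_size), batch_size):
            yield np.stack(same_size[i:i + batch_size])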
Is it possible to create a network with layers that account for varying dimensions and orientations of the input image
The usual way to deal with different images is the following:
You take one or multiple crops of the image to make width = height. If you take multiple crops, you pass all of them through the network and average the results.
You scale the crop(s) to the size which is necessary for the network.
However, there is also Global Average Pooling (e.g. Keras docs).
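As a minimal Keras sketch of that idea (the 3 input channels and the 10-class output head are assumptions, not from the question), the None spatial dimensions plus GlobalAveragePooling2D are what let the network accept images of varying height and width:

from tensorflow.keras import layers, models

inputs = layers.Input(shape=(None, None, 3))         # any height/width, 3 channels assumed
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)               # collapses spatial dims to a fixed-size vector
outputs = layers.Dense(10, activation="softmax")(x)  # hypothetical 10 classes
model = models.Model(inputs, outputs)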
What are the necessary criteria for a dataset to be eligible for training a deep network in general?
That is a difficult question to answer, because (1) there are many different approaches in deep learning and the field is quite young, and (2) I'm pretty sure there is no quantitative answer right now.
Here are two rules of thumb:
You should have at least 50 examples per class
The more parameters your model has, the more data you need
Learning curves and validation curves help to estimate the effect of more training data.
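If you use scikit-learn, a quick sketch of such a curve looks like this (the classifier and dataset here are placeholders, not a recommendation):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5))
print(train_sizes)
print(valid_scores.mean(axis=1))  # if this is still rising, more data will likely help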

Databasing to feed ~9k data points to High Charts

I am working on a freelance project that captures an audio file, runs some Fourier analysis, and spits out three charts (x-y plots). Each chart has about 3,000 data points, which I plan to display with High Charts in the browser.
What database techniques do you recommend for storing and accessing this much data? Should I be storing the points in an array or in multiple rows? I'm considering Mongo too. The plan is to use Rails, so I was hoping to use a single database for both data and authentication.
I haven't dealt with queries accessing this much data for a single page, though this may well be a tiny amount of data overall. In addition, this is an MVP for demonstration to investors, so massive scalability isn't an immediate concern.
My initial thought is that using Postgres and having one large table of data points, stored one per row, will be fine, and that a bunch of doubles is not going to be too memory-intensive relative to images and such.
Realistically, I may just pull 100 evenly-spaced data points to make the chart, but the original data must still be stored.
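For concreteness, here is a sketch of that Postgres idea with hypothetical table and column names, storing one row per point and pulling roughly 100 evenly spaced points for the chart (shown with Python/psycopg2 just to make the SQL concrete):

import psycopg2

chart_id = 1                                   # hypothetical chart identifier
conn = psycopg2.connect("dbname=charts_mvp")   # hypothetical database
cur = conn.cursor()

# One row per point: data_points(chart_id, x, y)
cur.execute("""
    SELECT x, y FROM (
        SELECT x, y,
               row_number() OVER (ORDER BY x) AS rn,
               count(*)     OVER ()           AS total
        FROM data_points
        WHERE chart_id = %s
    ) t
    WHERE rn %% GREATEST(total / 100, 1) = 0
    ORDER BY x
""", (chart_id,))
points = cur.fetchall()   # roughly 100 (x, y) tuples to hand to High Charts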
I've done a lot of Mongo work and I can tell you what I would do if I were you.
One of the very nice properties of your data is that the x,y coordinates are generally of a fixed size. In other words, it's not like you are storing comments from users, which can vary greatly in size.
With Mongo I would first make a sample document with the 3,000 points. Just a simple array of x,y points. I would see how big that document is and how my front end handled it - in other words can High Charts handle that?
I would also try to stick to the easiest conceptual model to manage, which is one document per chart, each chart having 3k points. This is a natural way to think of the data and I would start there and see if there were any performance hits. Mongo can easily store those documents, so I think the biggest pain would be in the UI with rendering the data.
Mongo would handle authentication well. I think it's a good choice for general data storage for an MVP.
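A minimal PyMongo sketch of that one-document-per-chart model (database, collection and field names are hypothetical):

from pymongo import MongoClient

db = MongoClient()["audio_mvp"]   # hypothetical database name

# One document per chart, with all ~3,000 points embedded as an array
db.charts.insert_one({
    "recording_id": "demo-001",
    "chart": "fourier-magnitude",
    "points": [{"x": i * 0.01, "y": 0.0} for i in range(3000)],
})

# Everything High Charts needs for one chart comes back in a single query
doc = db.charts.find_one({"recording_id": "demo-001", "chart": "fourier-magnitude"})
print(len(doc["points"]))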

Get info from map data file by coordinate

Imagine I have a map shapefile (.shp) or OSM XML. I'm able to see different kinds of data from different layers in GIS-oriented programs, e.g. ArcGIS, QGIS, etc. But how can I get this info programmatically? Is there a specific library for that?
What I'm really looking for is some kind of method getMapData(longitude, latitude) that returns landscape/terrain info (e.g. forest, river, city, highway) at the specified location.
Thanks in advance for your answers!
It still depends on what you want to achieve whether you are better off using raster or vector data.
If you are using your grid to subdivide an area as an array of containers for geographic features, then stick with vector data. To do this, I would create a polygon grid file and intersect it with each of your data layers. You can then add an ID field that represents the cell's location in the array (and hence its position relative to a known lat/long coordinate, let's say the lower left). Alternatively, you can use spatial queries to access your data by selecting a polygon in your vector grid file and then finding all the features in your other file that are contained by it.
OTOH, if you want to do some multi-feature analysis based on presence/absence, then you may be better off going down the route of raster analysis. My gut feeling from what you have said is that this is what you are trying to achieve, but I am still not 100% sure. You would handle this by creating a set of boolean rasters of a suitable resolution and then performing maths operations on the set (add, subtract, average etc., depending on what questions you are asking).
Let's say you are looking at animal migration. Let's say your model assumes that streams, hedges and towns are all obstacles to migration but roads only reduce the chance of an area being crossed. So you convert your obstacles to a value of '1' and NoData to '0' in each case, except roads where you decide to set the value to 0.5. You can then add all your rasters together in one big stack and predict migration routes.
OK, that's a simplistic example, but perhaps you can see why we need even more information on what you are trying to do.
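A tiny NumPy illustration of that overlay idea, assuming each layer has already been rasterized onto the same grid as a 0/1 array (the random data here is purely a placeholder):

import numpy as np

shape = (100, 100)                         # hypothetical grid size
streams = np.random.randint(0, 2, shape)   # 1 = obstacle present in that cell
hedges  = np.random.randint(0, 2, shape)
towns   = np.random.randint(0, 2, shape)
roads   = np.random.randint(0, 2, shape)

# Obstacles count fully, roads only half, as in the example above
resistance = streams + hedges + towns + 0.5 * roads
print(resistance.max())   # cells with high values are hard to cross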
Shapefiles or an osm xml file are just containers that hold geometric shapes. There are plenty of software libraries out there that let you read these files and extract the data. I would recommend looking at GDAL/OGR as a starting point.
A method like getMapData(longitude, latitude) is essentially a search/query function. You need to be a little more specific, too: do you want geometries that contain the point, geometries within a certain distance of the point, etc.?
You could find the map data with a brute-force scan over every shape, e.g. using shapely geometries read from the shapefile with a library like fiona:
from shapely.geometry import shape, Point

def get_map_data(features, longitude, latitude):   # features: e.g. fiona.open("map.shp")
    point = Point(longitude, latitude)
    for feature in features:
        if shape(feature["geometry"]).contains(point):
            return feature["properties"]
Or you can use more advanced algorithms/data structures such as R-trees, k-d trees, quadtrees, etc. The easiest way to get started with querying map data is to load it into a spatial database; I would recommend investigating PostgreSQL+PostGIS and SpatiaLite.
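For example, once the shapefile is loaded into PostGIS (e.g. with shp2pgsql), getMapData becomes a simple spatial query; the table and attribute columns below are hypothetical, and the data is assumed to be stored in WGS84 (SRID 4326):

import psycopg2

def get_map_data(longitude, latitude):
    conn = psycopg2.connect("dbname=gis")   # hypothetical database
    cur = conn.cursor()
    cur.execute(
        """SELECT name, landuse              -- hypothetical attribute columns
           FROM map_features
           WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))""",
        (longitude, latitude))
    return cur.fetchall()

print(get_map_data(10.75, 59.91))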
You may also like to look at SpatiaLite and/or PostGIS, which are two spatially enabled databases that you can use separately or in conjunction with GDAL/OGR.
I must echo Charles' request that you explain your use-case in more detail because the actual implementation will depend greatly on exactly what you are wanting to achieve. My reading of this is that you may want to convert your data into a series of aligned rasters which you can overlay and treat as a 3 dimensional array.

How to efficiently query large multi-polygons with PostGIS

I am working with radio maps that seem to be too fragmented to query efficiently. The response time is 20-40 seconds when I ask if a single point is within the multipolygon (I have tested "within"/"contains"/"overlaps"). I use PostGIS, with GeoDjango to abstract the queries.
The multi-polygon column has a GiST index, and I have tried VACUUM ANALYZE. I use PostgreSQL 8.3.7 and Django 1.2.
The maps stretch over large geographical areas. They were originally generated by a topography-aware radio tool, and the radio cells/polygons are therefore fragmented.
My goal is to query for points within the multipolygons (i.e. houses that may or may not be covered by the signals).
All the radio maps are made up of between 100,000 and 300,000 vertices in total, with wildly varying numbers of polygons. Some of the maps have fewer than 10 polygons; from there it jumps to between 10,000 and 30,000 polygons. The ratio of polygons to vertices does not seem to affect the time the queries take to complete very much.
I use a projected coordinate system, and use the same system for both houses and radio sectors. QGIS shows that the radio sectors and maps are correctly placed in the terrain.
My test queries are with only one house at a time within a single radio map. I have tested queries like "within"/"contains"/"overlaps", and the results are the same:
Sub-second response if the house is "far from" the radio map (I guess this is because it is outside the bounding box that is automatically used in the query).
20-40 seconds response time if the house/point is close to or within the radio map.
Do I have alternative ways to optimize queries, or must I change/simplify the source material in some way? Any advice is appreciated.
Hello
The first thing I would do is split the multipolygons into single polygons and create a new index. Then the index will work a lot more effectively. Right now the whole multipolygon has one big bounding box, and the index can do nothing more than tell whether the house is inside that bounding box. So, the smaller the polygons are in relation to the whole dataset, the more effective the index use. There are even techniques for splitting single polygons into smaller ones with a grid, to make the index part of the query still more effective. But the first thing would be to split the multipolygons into single ones with ST_Dump(). If you have a lot of attributes in the same table, it would be wise to put those into another table and only keep an ID telling which radio map each polygon belongs to; otherwise you will get a lot of duplicated attribute data.
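Sketched with hypothetical table and column names (radio_maps(id, geom)), the split-and-reindex step could look like this, shown via psycopg2 but the SQL is equally runnable in psql:

import psycopg2

conn = psycopg2.connect("dbname=radio")   # hypothetical database
cur = conn.cursor()

# Split each multipolygon into its component polygons, keeping only the map id
cur.execute("""
    CREATE TABLE radio_map_parts AS
    SELECT id AS radio_map_id, (ST_Dump(geom)).geom AS geom
    FROM radio_maps
""")
cur.execute("CREATE INDEX radio_map_parts_geom_idx ON radio_map_parts USING GIST (geom)")
cur.execute("ANALYZE radio_map_parts")
conn.commit()

# The point-in-coverage test now works against many small bounding boxes
easting, northing = 600000, 6650000       # hypothetical projected coordinates
cur.execute("""
    SELECT DISTINCT radio_map_id
    FROM radio_map_parts
    WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 25832))  -- use your data's SRID
""", (easting, northing))
print(cur.fetchall())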
HTH
Nicklas
