I am trying to group points of geospatial data based on density and relative distance. Is there a way that this can be done in SQL Server 2008 using the spatial features or would it be better to translate the data into graph data and use a graph clustering algorithm?
As far as I know there are no inbuilt spatial methods for clustering points in SQL Server 2008. I've never come across any examples of this done in T-SQL / at the database level.
It would be far easier to go with your second approach and do these calculations at the application level, using R, GRASS, or MapServer, depending on your needs / development preferences.
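That said, if a rough approximation directly in the database is enough, you can fake simple clustering with plain GROUP BY arithmetic by snapping points to a grid. A minimal sketch, assuming a table dbo.Points with a geography column Geo (both names invented); proper density-based clustering (e.g. DBSCAN) still belongs at the application level:

-- Grid-snap "clustering": bin points into cells and aggregate each cell.
DECLARE @cell float = 0.01;  -- cell size in degrees; tune to your point density

SELECT
    ROUND(Geo.Long / @cell, 0)                          AS cell_x,
    ROUND(Geo.Lat  / @cell, 0)                          AS cell_y,
    COUNT(*)                                            AS points_in_cluster,
    geography::Point(AVG(Geo.Lat), AVG(Geo.Long), 4326) AS cluster_centroid
FROM dbo.Points
GROUP BY ROUND(Geo.Long / @cell, 0), ROUND(Geo.Lat / @cell, 0);

This only groups points by proximity within fixed cells; it knows nothing about relative density across cells.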
If it is just for displaying clusters of points (rather than associated analysis) then check out the following links:
OpenLayers
http://openlayers.org/dev/examples/strategy-cluster.html
Google
http://googlemapsapi.martinpearman.co.uk/articles.php?cat_id=1
http://econym.org.uk/gmap/example_clusterer.htm
Python / PostGIS
http://wiki.osgeo.org/wiki/Point_Clustering
Related
We have a content ingestion system which receives (mobile) digital contents of different types (Music, Ringtone, Video, Game, Wallpaper etc) from various providers (Sony, Universal Music, EA Games etc) and then dispatches them across several online stores (e.g. Store1, Store2 etc).
The managers want to know how many items of each content type have come through from each provider in a given time window, and which store they have gone to!
To me it seems like a report that needs an OLAP cube. Am I correct? The problem is that I am a .NET developer, not much skilled in BI and SQL Server Analysis Services, so I want to keep this simple yet flexible and meaningful. Is there an easier way of building a reporting cube and a data mart to produce reports like this? (I am not sure we can purchase SSAS and SSIS licenses at all.)
And for such data mart and cube, what structure is suggested?
From your description, a cube isn't necessary. Assuming this data is in a database, you can just write a query to get that result. If you've bought a licence of SQL Server (i.e. not the free edition) then you already have SSAS, SSIS, and SSRS.
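For example, a hedged sketch of that query; the table and column names (dbo.ContentDispatch and so on) are invented for illustration:

DECLARE @from datetime = '20120101', @to datetime = '20120201';

SELECT ContentType, Provider, Store, COUNT(*) AS ItemCount
FROM dbo.ContentDispatch
WHERE DispatchedAt >= @from AND DispatchedAt < @to
GROUP BY ContentType, Provider, Store
ORDER BY ContentType, Provider, Store;

That answers "how many of each content type, from which provider, to which store, in a given window" without any BI stack.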
Some of a cube's main advantages are:
It's easier for end users to do ad-hoc reporting
Performance is often better than a relational (SQL Query) source
Some disadvantages are:
You need to spend processing time 'building' the cube
The query language (MDX) can be a challenge to learn
You don't have an ad-hoc user analysis requirement here
An SSAS cube presented in Excel Pivot Tables is probably still the most powerful and flexible end-user query tool out there, with a very low learning curve (most managers/analysts can already use Excel). Once they have a cube they can satisfy many requirements themselves, without you needing to constantly tweak queries. Even when they do want something more complex, you have a perfect source for report/query design and testing.
But designing and building an SSAS cube is very difficult, and cubes are quite obscure to debug.
I suggest starting with Power Pivot - it's a free Excel Add-In that builds an in-memory cube, and presents the results as Excel Pivot Tables. It scales well through advanced compression and the resulting Model can be published to an SSAS Tabular server. The calculation language is DAX which is an improvement on the horrible MDX - DAX reads more like Excel functions.
This site is probably the best starting point for Power Pivot:
http://www.powerpivotpro.com/
You can solve this with just standard queries or views in SQL Server. Tools such as PowerPivot for Excel also allow you to create local cubes with very little effort.
Of course, purchasing an SSAS license and moving to a cube environment has several advantages, despite the extra cost:
Cubes are faster and allow for more complex calculations than SQL queries
With the introduction of the SSAS Tabular Model, making cubes really isn't hard anymore
Creating cubes often forces you to clean up your data model, which has a positive effect on your architecture overall in most cases
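As for what structure is suggested: a small star schema serves both the plain-query approach and a later SSAS / PowerPivot model. A hedged sketch; every name here is invented:

CREATE TABLE dbo.DimContentType (ContentTypeKey int PRIMARY KEY, Name nvarchar(50));
CREATE TABLE dbo.DimProvider    (ProviderKey    int PRIMARY KEY, Name nvarchar(100));
CREATE TABLE dbo.DimStore       (StoreKey       int PRIMARY KEY, Name nvarchar(100));
CREATE TABLE dbo.DimDate        (DateKey        int PRIMARY KEY, FullDate date);

-- One row per dispatched item; pre-aggregate per day if volume demands it.
CREATE TABLE dbo.FactDispatch (
    DateKey        int NOT NULL REFERENCES dbo.DimDate,
    ContentTypeKey int NOT NULL REFERENCES dbo.DimContentType,
    ProviderKey    int NOT NULL REFERENCES dbo.DimProvider,
    StoreKey       int NOT NULL REFERENCES dbo.DimStore,
    DispatchCount  int NOT NULL DEFAULT 1
);

Counting then becomes a join-and-group over the fact table, and the same tables load cleanly into a cube if you go that way later.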
Creating a cube might be overkill for your scenario, as your data is neither very complicated nor very big. But Excel alone might not be enough, since it is hard to pivot data in your database directly.
You could try embedding WebPivotTable into your website or application. It provides all the functions of an Excel pivot table and can connect to CSV/Excel files, or to a database through a web service interface. It is web based, and the front-end user interface is quite intuitive, so users can easily get what they want with simple drag and drop. Its site has a demo and documentation.
Of course, if you still want to create a cube, this tool can also be very helpful as it can connect to SSAS cubes directly.
I have a bunch of points in a SQL Server database using the geography data type, that I would like to be able to generate thiessen polygons for.
Is this processing available natively within SQL Server, or must this processing be done outside of the system?
As far as I know there's no built-in method to do this, but it can certainly be done using CLR integration. I borrowed the Pro Spatial with SQL Server 2012 book from a library some time ago, and I remember it included an example of how to do Voronoi tessellations in Chapter 15; the sample code for it is available online for the curious.
MongoDB has lots of street cred especially since FourSquare uses it. MS SQL Server 2008 R2 also has Geospatial support.
Which DB is easier/better for doing GPS-like searches, e.g. finding the k nearest points around a point (X,Y)?
If the only geospatial function you need is finding the k nearest points around a point (X,Y), then any old database will do. Just use the haversine formula; it has been implemented in a bunch of languages.
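A hedged T-SQL sketch of such a k-nearest query; the table dbo.Places(Name, Lat, Lon) and the search point are assumptions:

DECLARE @lat float = 40.7128, @lon float = -74.0060, @k int = 10;

SELECT TOP (@k)
    Name,
    -- Haversine distance; 6371 km is the mean Earth radius.
    2 * 6371.0 * ASIN(SQRT(
        POWER(SIN(RADIANS(Lat - @lat) / 2), 2) +
        COS(RADIANS(@lat)) * COS(RADIANS(Lat)) *
        POWER(SIN(RADIANS(Lon - @lon) / 2), 2)
    )) AS distance_km
FROM dbo.Places
ORDER BY distance_km;

Note this scans the whole table; for large data sets you'd want the database's native spatial indexing instead.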
Simple Geospatial Queries with MongoDB
http://blog.mxunit.org/2010/11/simple-geospatial-queries-with-mongodb.html
Much easier to use than SQL Server! I like it.
To make a long story short, SQL Server's geospatial support is way more robust than Mongo's. However, if you are just storing points on a map and want to calculate distances, Mongo is more than adequate.
MongoDB supports Geospatial points. SQL Server supports geospatial objects of an arbitrary number of points.
Either solution will be fine for your geospatial needs, so it's more about your data model, scalability, and which database you're comfortable with.
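For comparison, the same k-nearest search on the SQL Server side using the built-in geography type; dbo.Places with a geography column Geo is an assumption (whether a spatial index actually kicks in for nearest-neighbour queries takes some care on 2008):

DECLARE @p geography = geography::Point(40.7128, -74.0060, 4326);

SELECT TOP (10) Name, Geo.STDistance(@p) AS meters  -- meters for SRID 4326
FROM dbo.Places
ORDER BY Geo.STDistance(@p);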
You might also want to look at PostgreSQL. Geospatial data processing (with PostGIS) is where it excels. Can't really tell you about Windows performance, however.
ColdFusion 9's full-text search is now based on Apache Solr, which is built on Lucene (Verity is still available, but it has too many limitations). We also use SQL Server.
Which one's better? Which one's easier?
UPDATE: going to use for... searching against the name & description fields of the Products table.
Thanks!
Here's my 2 cents, tested with ~3,000,000 images with captions (primary key + image caption text from 100 to 500 chars):
CF9's Solr implementation is fast at returning results, really easy to set up, and fairly fast at building the index.
SQL Server 2005 FTS wasn't good enough; I tried it some time ago and didn't put it into production. SQL Server 2008 FTS is much better, though, and we currently use it in our application. But the basic setup had to be tuned to get good results.
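For reference, a hedged sketch of what the 2008 FTS setup looks like against the Products table from the question; the catalog name and the unique key index (PK_Products) are assumptions:

CREATE FULLTEXT CATALOG ProductsCatalog;
CREATE FULLTEXT INDEX ON dbo.Products (Name, Description)
    KEY INDEX PK_Products ON ProductsCatalog;

-- Match inflectional forms (e.g. run/running/ran) across both columns.
SELECT ProductID, Name
FROM dbo.Products
WHERE CONTAINS((Name, Description), 'FORMSOF(INFLECTIONAL, "widget")');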
Based on my experience and that of colleagues working with huge data sets and applications mostly built around search, here is my top list:
Lucene
Tuned SQL Server 2008 FTS
Solr
SQL Server 2005
Of course, CF9's Solr is the winner here if you are chasing a fast setup, since you need just 3 tags to finish the job and get awesome results.
The important question: What are you going to use it for?
Can't pick the right tool for the job when you don't know what the job is ;)
At work we are having a bit of trouble with spatial support of SQL Server 2008.
We have a big system in production on SQL Server 2008, managing a bunch of important stuff. In some of the tables I have pairs of coordinates, and I need to display them in ArcGIS and other GIS software.
My question here really is: Is it possible to use DBI-Link (PostgreSQL tool) to connect to SQL Server 2008?
What kind of performance loss should I expect? I don't expect to conduct complicated queries; it's just a matter of reading, from PostgreSQL, a view inside SQL Server 2008 (a simple view, such as SELECT * FROM foo).
So, what are your thoughts about this? I know this is a bit of a haxor solution, but inside SQL Server I lose a lot of spatial handling functions, and all my SQL Server databases store coordinate pairs.
Yes, that should work fine, as long as you have a DBI driver properly set up.
Performance depends on what you're doing. DBI-Link doesn't have the ability to push down restrictions, so if your view is defined on "SELECT * FROM foo", it will always do exactly that. If your app does "SELECT * FROM myview WHERE pk=1", DBI-Link will still request the whole table with SELECT * and then filter it on the PostgreSQL side. You may be better off using functions that can adapt the query.
As long as your queries don't shuffle lots of data, performance is usually pretty decent.