Geospatial support: SQL Server 2008 vs MongoDB?

MongoDB has lots of street cred, especially since Foursquare uses it. MS SQL Server 2008 R2 also has geospatial support.
Which database is easier or better for GPS-like searches, e.g. finding the k nearest points around a point (X, Y)?

If the only geospatial function you need is finding the k nearest points around a point (X, Y), then any old database will do. Just use the Haversine formula, which has been implemented in a bunch of languages.
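A minimal brute-force sketch of that approach in Python, assuming points are (latitude, longitude) pairs in degrees and a mean Earth radius of 6371 km. `haversine_km` and `k_nearest` are hypothetical helper names, not part of any database API. The code computes the great-circle distance to every point and keeps the k smallest:

```python
import heapq
import math

# Mean Earth radius in kilometres (a common approximation, assumed here).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest(points, lat, lon, k):
    """Return the k points closest to (lat, lon) as (distance_km, point) pairs.

    `points` is any iterable of (lat, lon) tuples, e.g. rows fetched from a table.
    This is a brute-force scan: the distance to every point is computed.
    """
    return heapq.nsmallest(
        k,
        ((haversine_km(lat, lon, p[0], p[1]), p) for p in points),
        key=lambda pair: pair[0],
    )
```

Because every point is scanned, each query costs O(n); the spatial indexes in SQL Server and MongoDB exist to avoid that full scan on large tables.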

Simple Geospatial Queries with MongoDB
http://blog.mxunit.org/2010/11/simple-geospatial-queries-with-mongodb.html
Much easier to use than SQL Server! I like it.

In short, SQL Server's geospatial support is far more robust than MongoDB's. However, if you are just storing points on a map and want to calculate distances, MongoDB is more than adequate.
MongoDB supports geospatial points; SQL Server supports geospatial objects made up of an arbitrary number of points.
Either solution will be fine for your geospatial needs, so it's more about your data model, scalability, and which database you're comfortable with.

You might also want to look at PostgreSQL. Geospatial data processing is where PostgreSQL excels. I can't really tell you about its performance on Windows, however.

Related

Do I need a cube?

We have a content ingestion system that receives (mobile) digital content of different types (Music, Ringtone, Video, Game, Wallpaper etc.) from various providers (Sony, Universal Music, EA Games etc.) and then dispatches it to several online stores (e.g. Store1, Store2 etc.).
The managers want to know how many items of each content type have come in from each supplier in a given time window, and which stores they were sent to.
To me this looks like a report that needs an OLAP cube. Am I correct? The problem is that I am a .NET developer without much experience in BI or SQL Server Analysis Services, so I want to keep this simple yet flexible and meaningful. Is there an easier way to build a reporting cube and a data mart that produce reports like this? (I am not sure whether we can purchase SSAS and SSIS licenses at all.)
And what structure would you suggest for such a data mart and cube?
From your description, a cube isn't necessary. Assuming this data is in a database, you can just write a query to get that result. If you've bought a SQL Server licence (i.e. not the free edition), then you already have SSAS, SSIS and SSRS.
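As an illustration of the "just write a query" route, here is a sketch using Python's built-in `sqlite3` module so it runs anywhere. The `deliveries` table and its columns are hypothetical, since the question doesn't show a schema; the same GROUP BY should carry over to T-SQL with minor changes, and SQL Server 2008 also offers GROUP BY ROLLUP/CUBE if subtotals are wanted:

```python
import sqlite3

# Hypothetical schema: one row per content item dispatched to a store.
SCHEMA = """
CREATE TABLE deliveries (
    content_type TEXT NOT NULL,   -- e.g. 'Music', 'Ringtone', 'Game'
    supplier     TEXT NOT NULL,   -- e.g. 'Sony', 'EA Games'
    store        TEXT NOT NULL,   -- e.g. 'Store1'
    delivered_at TEXT NOT NULL    -- ISO-8601 timestamp string
);
"""

# Count items per supplier / content type / store inside a time window.
REPORT_SQL = """
SELECT supplier, content_type, store, COUNT(*) AS items
FROM deliveries
WHERE delivered_at >= ? AND delivered_at < ?
GROUP BY supplier, content_type, store
ORDER BY supplier, content_type, store;
"""


def content_report(conn, start, end):
    """Return (supplier, content_type, store, items) rows for the window [start, end)."""
    return conn.execute(REPORT_SQL, (start, end)).fetchall()
```

The window is half-open (`>= start AND < end`) so that consecutive reporting periods never count the same item twice.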
Some of a cube's main advantages are:
It's easier for end users to do ad hoc reporting
Performance is often better than with a relational (SQL query) source
Some disadvantages are:
You need to spend processing time 'building' the cube
The query language (MDX) can be a challenge to learn
You don't have an ad hoc user analysis requirement here
An SSAS cube presented in Excel Pivot Tables is probably still the most powerful and flexible end-user query tool out there, with a very low learning curve (most managers/analysts can already use Excel). Once they have a cube they can satisfy many requirements themselves, without you needing to constantly tweak queries. Even when they do want something more complex, you have a perfect source for report/query design and testing.
But designing and building an SSAS cube is very difficult, and cubes are quite obscure to debug.
I suggest starting with Power Pivot, a free Excel add-in that builds an in-memory cube and presents the results as Excel Pivot Tables. It scales well through advanced compression, and the resulting model can be published to an SSAS Tabular server. The calculation language is DAX, which is an improvement on the horrible MDX; DAX reads more like Excel functions.
This site is probably the best starting point for Power Pivot:
http://www.powerpivotpro.com/
You can solve this with just standard queries or views in SQL Server. Tools such as PowerPivot for Excel also allow you to create local cubes with very little effort.
Of course, purchasing an SSAS license and moving to a cube environment has several advantages, despite the extra cost:
Cubes are faster and allow for more complex calculations than SQL queries
With the introduction of the SSAS Tabular Model, making cubes really isn't hard anymore
Creating cubes often forces you to clean up your data model, which has a positive effect on your architecture overall in most cases
Creating a cube might be overkill for your scenario, as your data is neither very complicated nor very large. But Excel might not be enough, as it is hard to pivot data in your database directly.
You can try embedding WebPivotTable in your website or application. It provides all the functions of an Excel pivot table and can connect to CSV/Excel files, or to a database through a web service interface. It is web-based and its front-end user interface is quite intuitive, so users can easily get what they want with simple drag and drop. Here are the demo and documentation.
Of course, if you still want to create a cube, this tool can also be very helpful as it can connect to SSAS cubes directly.

Is it possible to generate Thiessen (Voronoï) polygons using the SQL Server spatial data types?

I have a bunch of points in a SQL Server database using the geography data type, and I would like to generate Thiessen polygons for them.
Is this processing available natively within SQL Server, or must this processing be done outside of the system?
As far as I know there's no built-in method to do this, but it can certainly be done using CLR integration. Some time ago I borrowed the book Pro Spatial with SQL Server 2012 from a library, and I remember that Chapter 15 included an example of how to do Voronoi tessellations; the sample code for it is also available online for the curious.
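The underlying geometry is simple enough to sketch outside the database. The Python below is not the book's code. It assumes planar x/y coordinates (geography points would need projecting first) and computes one site's Thiessen polygon by starting from a bounding box and clipping it with the perpendicular-bisector half-plane of every other site. `clip_half_plane` and `voronoi_cell` are hypothetical names. Each cell costs O(n), so all cells cost O(n²), which is fine for small point sets:

```python
def clip_half_plane(polygon, a, b, c):
    """Keep the part of a convex polygon where a*x + b*y <= c (one Sutherland-Hodgman pass)."""
    result = []
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            result.append(p)  # p is inside or on the boundary line
        if (fp < 0 < fq) or (fq < 0 < fp):
            # Edge p->q crosses the line: add the intersection point.
            t = fp / (fp - fq)
            result.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return result


def voronoi_cell(site, others, bbox):
    """Thiessen polygon of `site` against `others`, clipped to bbox = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    cell = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    sx, sy = site
    for ox, oy in others:
        # Points at least as close to `site` as to (ox, oy) satisfy
        # (o - s) . x <= (|o|^2 - |s|^2) / 2, i.e. the side of the bisector containing `site`.
        a, b = ox - sx, oy - sy
        c = (ox * ox + oy * oy - sx * sx - sy * sy) / 2
        cell = clip_half_plane(cell, a, b, c)
        if not cell:
            break
    return cell
```

The returned vertex list could then be converted to WKT (repeating the first vertex to close the ring) and loaded with `geometry::STPolyFromText`, for example.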

SQL Server 2008 Spatial Clustering

I am trying to group points of geospatial data based on density and relative distance. Is there a way that this can be done in SQL Server 2008 using the spatial features or would it be better to translate the data into graph data and use a graph clustering algorithm?
As far as I know there are no built-in spatial methods for clustering points in SQL Server 2008. I've never come across any examples of this done in T-SQL or at the database level.
It would be far easier to go with your second approach and do these calculations at the application level, using R, GRASS or MapServer depending on your needs and development preferences.
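As one example of the application-level route, here is a naive DBSCAN sketch in Python. DBSCAN is a density-based clustering algorithm chosen here purely for illustration; SQL Server does not provide it. The sketch assumes planar (x, y) coordinates, and `eps` (neighbourhood radius) and `min_pts` (minimum neighbours, counting the point itself, for a core point) are assumptions that would need tuning for real data:

```python
import math


def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or -1 for noise.

    Naive O(n^2) DBSCAN: neighbours are found by scanning every point.
    """
    labels = [None] * len(points)  # None = not yet visited

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels
```

For latitude/longitude data, the `math.dist` call could be swapped for a great-circle distance such as Haversine.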
If it is just for displaying clusters of points (rather than associated analysis) then check out the following links:
OpenLayers
http://openlayers.org/dev/examples/strategy-cluster.html
Google
http://googlemapsapi.martinpearman.co.uk/articles.php?cat_id=1
http://econym.org.uk/gmap/example_clusterer.htm
Python / PostGIS
http://wiki.osgeo.org/wiki/Point_Clustering

Object Relational Features of SQL Server

Does anyone know a good reference on the object-relational features available in SQL Server (any version)? I found a really good summary for Oracle, but all I can find for SQL Server is information about LINQ to SQL, which is good stuff, but I'm looking for more power in the database, like user-defined types, nested tables, etc.
I know you can use CLR types in SQL Server, and that would be interesting to me too; I'm just looking for a good place to read about all the OR features it has.
PS. I'm willing to purchase a book.
You should read Best Practices for Semantic Data Modeling for Performance and Scalability.
SQL Server is not as object-relational as one might expect: not long ago I realized that it doesn't even support table inheritance.

Searching full-text fields in SQL Server to detect plagiarism

I'm storing papers in SQL Server 2005 and am looking for a way to paste in the text of a paper and then search for potential plagiarism (copied content) in the database.
What's the best way to go about this? Is there a way to get a gauge for the extent to which something is similar to something else using full-text indexing, for several paragraphs of content?
Why don't you install Google Desktop and have it index only that one directory?
Then Google can do the indexing for you.
This is not really the sort of problem that full-text indexing in SQL Server is designed to solve. There's nothing built in to SQL Server that you can really use to help with this.
There are a number of specialised plagiarism detection tools, which a Google search will turn up for you. That's probably your best bet.
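To give a sense of what such tools measure, here is a minimal sketch of one common technique, word n-gram "shingling" with Jaccard similarity, in Python. It is not a SQL Server feature. It assumes the stored papers have been loaded into a dict mapping an id to the paper's text, and `shingles`, `jaccard` and `most_similar` are hypothetical helper names:

```python
import re


def shingles(text, n=5):
    """Set of overlapping n-word sequences ("shingles") from the lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(submission, papers, n=5, top=3):
    """Rank stored papers (id -> text) by shingle overlap with the submitted text."""
    sub = shingles(submission, n)
    scores = [(jaccard(sub, shingles(text, n)), paper_id) for paper_id, text in papers.items()]
    scores.sort(reverse=True)  # highest similarity first
    return scores[:top]
```

Scores near 1.0 indicate heavily shared wording. For a large corpus you would precompute and store each paper's shingles (or a compact signature such as MinHash) rather than recomputing them on every search.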
