Geospatial data in SQL - sql-server

I have been experimenting with geography datatype lately and just love it. But I can't decide should i convert from my current schema, that stores latitude and longitude in two separate numeric(9,5) fields to geography type. I have calculated the size of both types and Lat/Long way of representing a point is 28 bytes for a single point whereas geography type is 26. Not a big gain in space but huge improvement in performing geospatial operations (intersect, distance measurement etc.) which are currently handled using awkward stored procedures and scalar functions. What I wonder is the indices. Will geography data type require more space for indexing the data? I have a feeling that it will, even though the actual data stored in columns is less, I thing the way geospatial indices work will eventually result in larger space allocation for them.
P.S. as a side note, it seems that SQL Server 2008 (not R2) does not automatically seek through geospatial indices unless explicitly told to using WITH(INDEX()) clause

In my opinion you should definitely use the spatial types only. The spatial type are optimized for spatial queries and if spatial queries are what you need then I think it is an easy choice.
As a sideeffect you can get rid of your geographical functions and procedures since they are (probably) built-in in SQL server 2008. One caveat though, you might have to spend some time optimizing the spatial indexes, but this depends on your specific case.

I understand that you are trying to decide between keep one of the two, but you might want to consider keeping both. If you export your data into shape files, its a common practice to let lat lon field be along with the geom field.

I would keep both. It can be useful to easily query the original coordinates of a particular feature without requiring spatial operations. You have the benefit of knowing the original points as well as the ability to create a new geometry from them in case you need it in a different coordinate system (like if you have your geometry in a particular projection that will lose a lot of precision going to another).

Related

Using STCrosses() with a Spatial Index in SQL Server

Does The Microsoft StCrosses() function for Geography data support Spatial Index?
When I try to execute this function with Spatial Index I get this error message:
"The query processor could not produce a query plan for a query with a spatial index hint. Reason: Spatial indexes do not support the method name supplied in the predicate. Try removing the index hints or removing SET FORCEPLAN"
No.
Indexing spatial data is nontrivial, and the class you are discussing can contain arbitrarily complex figures, not just simple geometric shapes. The specific way shapes and indexing is implemented can make finding overlaps difficult or impossible in the general case. It's also not based on whatever is indexed of the spatial data for complex geometries. This may be why you can't require SQL to only use the index - there is not enough data there. In the degenerate case there may be, but it would not know that, so it is turned off.
Imagine having a star-shape, with complex things embedded in it. The index may only store the boundary of the outer shpe, or the center of the shape, or the bounding rectangle. None of these would be enough to compute the intersect of 2 shapes, or if the shapes actually overlap.
See http://msdn.microsoft.com/en-us/library/bb895265.aspx#geometry to confirm that it is not supported.

Should I use SqlGeometry or SqlGeography?

We're in a bit of internal conflict on this issue, and can't seem to come to a happy conclusion.
We'll only be storing latitudes and longitudes, and possibly simple polygons. All we need it for is computing distance between two points (and possibly to see if a point is within a polygon), and the entirety of the data is in such close proximity to make planar estimations acceptable.
Since our requirements are so relaxed, half of the dev team suggests using SqlGeometry types, which are apparently simpler. I'm having trouble accepting this, though, since we're storing geographic data, which seems like storing them in SqlGeography is the right thing to do. Also, I'm not finding any substantive evidence that the SqlGeometry data type is that much easier to work with than the SqlGeography type.
Does anyone have advice as to which type would be more appropriate for this relatively simple scenario?
It's not a question of comparing features, or accuracy, or simplicity - the two spatial datatypes are for working with different sorts of data.
As an analogy, suppose you were choosing the best datatype for a column that contained a unique identifier for each row. If that UID only contained integer values, you'd use int, whereas if it was a 6-character alphanumeric value you'd use char(6). And if it had variable-length unicode values, you'd use nvarchar instead, right?
The same logic goes for spatial data - you choose the appropriate datatype based on the values that that column contains; if you're working with geographic (i.e. latitude/longitude) coordinates, use the SqlGeography datatype. It's that simple.
You can use SqlGeometry to store latitude/longitude values, but it would be like using nvarchar(max) to store an integer... and I promise you it will lead to further problems down the line (when all your area calculations come out measured in degrees squared, for example)
The SqlGeography type has less methods available than SqlGeometry (especially in Sql 2008).
SqlGeography reference
SqlGeometry reference
For example, suppose you want to get the centroid of a polygon in Sql2008. You have a native method for that in geometry, but not in geography.
Also, it has the following limitations:
You can't have a geography exceeding one hemisphere
The ring-order matters when creating the polygon
Also, most API and libraries available (that I know of) handle geometries better than geographies.
That said, if the distance calculation has to be precise, you have large distances and have coordinates all over the world, geography would probably be a better fit. Otherwise, and according to your description of the problem, you would be well served with the geometry type.
Regarding your question: "is that much easier to work?". It depends. Anyway, and as a rule of thumb, for simple scenarios I typically opt for SqlGeometry.
Anyway, IMHO you shouldn't worry too much on that decision. It's relatively easy to create a new column with the other type and migrate the data if necessary.
Four years later, it has become apparent that we should have stored the data in SqlGeometry instead of SqlGeography.
Why?
We were importing information from legislative district maps, and their data was stored in SqlGeometry. When determining if a particular lat/long was within a certain legislative district boundary, we'd get inconsistent results when the point was close to two boundaries.
This required us to do additional work to identify locations that were "close" to a boundary, and manually verify that they were assigned to the proper district. Not ideal.
Moral of the story: if you're relying on any data, consider what type it's stored in to help guide your decision.

calculate or save spatial data

When working with spatial data in a data base like Postgis, is it a good way to calculate on every SELECT the intersections of two polygons or the area of polygons? Or is it better for performance issues to do the calculations on an INSERT, UPDATE or DELETE-statement and save the results in a column of the tables? How is the approach in big spatial data bases?
Thanks for an answer.
The questuion is too abstract.
Of course if you use the intersection area (ST_Intersection) you should store the result of ST_Intersection geometry. But in practice we often have to calculate intersection on-fly, because entrance arguments depend on dynamical parameters (e.g. intersection of an area with temperature <30C with and area of wind > 20 ms). By the way you can use VIEW to simplify a query in that way.
Of course if your table contains both geometrical column-arguments or one of them is constant it's better to store the intersection. Particularly you can build spatial indexes for this column.
There is no any constant rules. You should be guided by practice conditions: size of database, type of using etc. For example I store the generated ellipse (belief zone) for point of lightning stroke, but I don't store facts of intersectioning (boolean) with powerlines because those intersetionings may be parametrized.

What is the most flexible data type in SQL Server 2008 R2?

I am experimenting with a possible data structure for an app of mine, and I need to provision a column in one of my SQL Server data tables to hold various data of unpredictable size.
Literally, this could mean a string of text, or a Base64 encoded video clip and everything in between.
I realize that the instant response is going to be that I should provision different tables for different types -- and I don't disagree -- but please humor me here.
varchar(MAX)?
nvarchar(MAX)?
I am not a DBA so I don't know what type gives me the most flexibility for the lowest storage cost.
VARBINARY(MAX)?
In principle, trying to force multiple different data types into a single type is a bad idea. You may be better served with a different table for each type. But if you're never going to search the field, you should be able to do anything with a binary field...
Consider using the xml datatype. It will permit you to store, query and index arbitrary XML documents.

Stringly typed values table in sql, is there a better way to do this? (we're using MSSQL)

We have have a table layout with property names in one table, and values in a second table, and items in a third. (Yes, we're re-implementing tables in SQL.)
We join all three to get a value of a property for a specific item.
Unfortunately the values can have multiple data types double, varchar, bit, etc. Currently the consensus is to stringly type all the values and store the type name in the column next to the value.
tblValues
DataTypeName nvarchar
Is there a better, cleaner way to do this?
Clarifications:
Our requirements state that we must add new "attributes" at run time without modifying the db schema
I would prefer not to use EAV, but that is the direction we are headed right now.
This system currently exists in SQL server using a traditional db design, but I can't see a way to fulfill our requirement of not modifying the db schema without moving to EAV.
There are really only two patterns for implementing an 'EAV model' (assuming that's what you want to do):
Implement it as you've described, where you explicitly store the property value type along with the value, and use that to convert the string values stored into the appropriate 'native' types in the application(s) that access the DB.
Add a separate column for each possible datatype you might store as a property value. You could also include a column that indicates the property value type, but it wouldn't be strictly necessary.
Solution 1 is a simpler design, but it incurs the overhead of converting the string values stored in the table into the appropriate data type as needed.
Solution 2 has the benefit of storing values as the appropriate native type, but it will necessarily require more, though not necessarily much more, space. This may be moot if there aren't a lot of rows in this table. You may want to add a check constraint that only allows one non-NULL value in the different value columns, or if you're including a type column (so as to avoid checking for non-NULL values in the different value columns), prevent mismatches between the value stored in the type column and which value column contains the non-NULL value.
As HLGEM states in her answer, this is less preferred than a standard relational design, but I'm more sympathetic to the use of EAV model table designs for data such as application option settings.
Well don't do that! You lose all the values of having datatypes if you do. You can't properly constrain them (and will, I guarantee it, get bad data eventually) and you have to cast them back to the proper type to use in mathematical or date calculations. All in all a performance loser.
Your whole design will not scale well. Read up on why you don't want to use EAV tables in a relational database. It is not only generally slower but unusually difficult to query especially for reporting.
Perhaps a noSQL database would better suit your needs or a proper relational design and NOT an EAV design. Is it really too hard to figure out what fields each table would really need or are your developers just lazy? Are you sacrificing performance for flexibility - a flexibility that most users will hate? Especially when it means bad performance? Have you ever used a database designed that way to try to do anything?

Resources