Should I use SqlGeometry or SqlGeography?

We're in a bit of internal conflict on this issue, and can't seem to come to a happy conclusion.
We'll only be storing latitudes and longitudes, and possibly simple polygons. All we need it for is computing the distance between two points (and possibly checking whether a point lies within a polygon), and the entire dataset is in close enough proximity that planar approximations are acceptable.
Since our requirements are so relaxed, half of the dev team suggests using the SqlGeometry type, which is apparently simpler. I'm having trouble accepting this, though: we're storing geographic data, which suggests that SqlGeography is the right type to use. I'm also not finding any substantive evidence that the SqlGeometry type is that much easier to work with than SqlGeography.
Does anyone have advice as to which type would be more appropriate for this relatively simple scenario?

It's not a question of comparing features, or accuracy, or simplicity - the two spatial datatypes are for working with different sorts of data.
As an analogy, suppose you were choosing the best datatype for a column that contained a unique identifier for each row. If that UID only contained integer values, you'd use int, whereas if it was a 6-character alphanumeric value you'd use char(6). And if it had variable-length unicode values, you'd use nvarchar instead, right?
The same logic goes for spatial data - you choose the appropriate datatype based on the values that that column contains; if you're working with geographic (i.e. latitude/longitude) coordinates, use the SqlGeography datatype. It's that simple.
You can use SqlGeometry to store latitude/longitude values, but it would be like using nvarchar(max) to store an integer... and I promise you it will lead to problems down the line (when all your area calculations come out measured in degrees squared, for example).
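To make the difference concrete, here is a quick T-SQL sketch (the coordinates are made up for illustration). The same two points, one degree of latitude apart, give a planar answer in degrees from geometry and an ellipsoidal answer in meters from geography:

DECLARE @gm1 geometry  = geometry::STGeomFromText('POINT(-122.35 47.65)', 4326);
DECLARE @gm2 geometry  = geometry::STGeomFromText('POINT(-122.35 48.65)', 4326);
SELECT @gm1.STDistance(@gm2);   -- 1.0 (degrees: planar math on lat/long)

DECLARE @gg1 geography = geography::STGeomFromText('POINT(-122.35 47.65)', 4326);
DECLARE @gg2 geography = geography::STGeomFromText('POINT(-122.35 48.65)', 4326);
SELECT @gg1.STDistance(@gg2);   -- ~111,000 (meters, computed on the ellipsoid)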

The SqlGeography type has fewer methods available than SqlGeometry (especially in SQL Server 2008).
SqlGeography reference
SqlGeometry reference
For example, suppose you want to get the centroid of a polygon in SQL Server 2008. There is a native method for that (STCentroid) in geometry, but not in geography.
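A minimal sketch of that difference:

DECLARE @poly geometry = geometry::STGeomFromText(
    'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0);
SELECT @poly.STCentroid().ToString();   -- POINT (5 5); geography has no STCentroid() in 2008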
Also, geography has the following limitations:
A single geography instance cannot exceed one hemisphere
Ring orientation matters when creating a polygon
Also, most API and libraries available (that I know of) handle geometries better than geographies.
That said, if the distance calculations have to be precise, span large distances, or involve coordinates spread all over the world, geography would probably be a better fit. Otherwise, going by your description of the problem, you would be well served by the geometry type.
Regarding your question "is it that much easier to work with?": it depends. As a rule of thumb, though, for simple scenarios I typically opt for SqlGeometry.
In any case, IMHO you shouldn't worry too much about this decision. It's relatively easy to create a new column with the other type and migrate the data if necessary.

Four years later, it has become apparent that we should have stored the data in SqlGeometry instead of SqlGeography.
Why?
We were importing information from legislative district maps, and their data was stored in SqlGeometry. When determining if a particular lat/long was within a certain legislative district boundary, we'd get inconsistent results when the point was close to two boundaries.
This required us to do additional work to identify locations that were "close" to a boundary, and manually verify that they were assigned to the proper district. Not ideal.
Moral of the story: if you're relying on any data, consider what type it's stored in to help guide your decision.

Related

When to use ST_Transform and when to use ST_SetSRID in PostGIS?

I'm new to writing PostGIS queries, and I still get confused when writing queries where I have geom vs. a projected version of geom. My queries end up being a mess, and I run into errors along the lines of 'conflicting SRIDs'.
I feel like, mentally, I don't know when to solve it with ST_Transform and when to use ST_SetSRID.
It seems I'm not the only one confused by this. The PostGIS website has this article from many years ago:
ST_Transform and ST_SetSRID: To project or not to project?
People often get confused between the ST_Transform and ST_SetSRID functions.
ST_SetSRID doesn't change the coordinates; it just adds metadata stating which spatial reference system the coordinates actually are in. If you stamp your WGS 84 long/lat data as a meter-based projection, guess what? It's still long/lat. A spade by any other name is still a spade, so don't use ST_SetSRID and expect to magically get meter coordinates.
ST_Transform is used to change the underlying coordinates from a known spatial reference system to another known spatial reference system.
source: https://postgis.net/2013/08/30/st_transform-and-st_setsrid-to-project-or-not-to-project/
If the coordinates of the geometry already represent the correct spatial reference system, but it is not yet classified as such, use ST_SetSRID.
If you need to change the actual geometry to reflect a certain spatial reference system, use ST_Transform.
From the docs: "ST_Transform actually changes the coordinates of a geometry from one spatial reference system to another, while ST_SetSRID() simply changes the SRID identifier of the geometry." (https://postgis.net/docs/ST_Transform.html)
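A small sketch of the difference in practice (the point is made up for illustration):

-- ST_SetSRID only labels the geometry; the numbers stay the same:
SELECT ST_AsText(ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326));
-- POINT(-71.06 42.36)

-- ST_Transform recomputes the coordinates into another known system,
-- here Web Mercator (EPSG:3857):
SELECT ST_AsText(ST_Transform(
    ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326), 3857));
-- the same point, now expressed in meters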

Format for storing EXIF metadata in database

I'm working on an application for which I need to be able to store EXIF metadata in a relational database. In the future I'd like to also support XMP and IPTC metadata, but at the moment my focus is on EXIF.
There are a few questions on Stack Overflow about what the table structure should look like when storing EXIF metadata. However, none of them really address my concern. The problem I have is that different EXIF tags have values in different formats, and there isn't really one column type which conveniently stores them all.
The most common type is a "rational" which is an array of two four-byte integers representing a fraction. But there are also non-fractional short and long integers, ASCII strings, byte arrays, and "undefined" (an 8-bit type which must be interpreted according to a priori knowledge of the specific tag.) I'd like to support all of these types, and I want to do so in a convenient, efficient, lossless (i.e. without converting the rationals to floats), extensible and searchable manner.
Here's what I've considered so far:
My current solution is to store everything as a string. This makes it pretty easy to store all of the different types, and is also convenient for searching and debugging. However, it's kind of clunky and inefficient because when I want to actually use the data, I have to do a bunch of string manipulation to convert the rational values into their fractional equivalents, e.g. fraction = float(value.split('/')[0]) / float(value.split('/')[1]). (It's not actually a messy one-liner like that in my real code, but this demonstrates the problem.)
I could grab the raw EXIF bytes for each value from the file and store them in a blob column, but then I'd have to reinterpret the raw bytes every time. This could be marginally more CPU-efficient than the string solution, but it's much, much worse in every other way - on the whole, not worth it.
I could have a different table for each different EXIF datatype. Using this pattern I can maintain my foreign key relationships while storing my values in several different tables. However, this will make my most common query, which is to select all EXIF metadata for a given photo, kind of nasty. It will also become unwieldy very quickly when I add support for other metadata formats.
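For illustration, the "all metadata for one photo" query under this design turns into one UNION ALL branch per type table, with everything cast to a common display type (a sketch; table and column names are hypothetical, using standard SQL concatenation):

SELECT tag_name, CAST(int_value AS varchar(32)) AS value
FROM   exif_integer  WHERE photo_id = 1
UNION ALL
SELECT tag_name, CAST(numerator AS varchar(16)) || '/' || CAST(denominator AS varchar(16))
FROM   exif_rational WHERE photo_id = 1
UNION ALL
SELECT tag_name, str_value
FROM   exif_string   WHERE photo_id = 1;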
I'm not a database expert by any means, so is there some pattern or magic union-style column type I'm missing that can make this problem go away? Or am I stuck picking my poison from among the three options above?
This is probably a very cheap solution, but I would personally just store the JSON (or something like that) in the database.
There is a cool way to extract EXIF data and parse it to JSON.
Here is the link: Img2JSON
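As a sketch of how that could look on the database side (assuming PostgreSQL's jsonb type; the table, column, and values below are made up):

CREATE TABLE photo_exif (
    photo_id integer PRIMARY KEY,
    exif     jsonb NOT NULL
);

-- Rationals can stay lossless as numerator/denominator pairs:
INSERT INTO photo_exif VALUES
    (1, '{"ExposureTime": {"num": 1, "den": 250}, "Model": "NIKON D90"}');

-- Tags remain searchable with the jsonb operators:
SELECT photo_id FROM photo_exif WHERE exif->>'Model' = 'NIKON D90';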
I hope this kind of helps you!

Get info from map data file by coordinate

Imagine I have a map shapefile (.shp) or OSM XML; I'm able to see different kinds of data from different layers in GIS-oriented programs, e.g. ArcGIS, QGIS, etc. But how can I get this info programmatically? Is there a specific library for that?
What I'm really looking for is some kind of method getMapData(longitude, latitude) that returns landscape/terrain info (e.g. forest, river, city, highway) at the specified location.
Thanks in advance for your answers!
Whether you are better off using raster or vector data still depends on what you want to achieve.
If you are using your grid to subdivide an area as an array of containers for geographic features, then stick with vector data. To do this, I would create a polygon grid file and intersect it with each of your data layers. You can then add an ID field that represents the cell's location in the array (and hence its relative position to a known lat/long coordinate - let's say the lower left). Alternatively you can use spatial queries to access your data, by selecting a polygon in your vector grid file and then finding all the features in your other file that are contained by it.
OTOH, if you want to do some multi-feature analysis based on presence/absence, then you may be better off going down the route of raster analysis. My gut feeling from what you have said is that this is what you are trying to achieve, but I am still not 100% sure. You would handle this by creating a set of boolean rasters of a suitable resolution and then performing maths operations on the set (add, subtract, average, etc. - depending on what questions you are asking).
Say you are looking at animal migration, and your model assumes that streams, hedges and towns are all obstacles to migration, but roads only reduce the chance of an area being crossed. You convert your obstacles to a value of '1' and NoData to '0' in each case, except roads, where you set the value to 0.5. You can then add all your rasters together in one big stack and predict migration routes.
OK, that's a simplistic example, but perhaps you can see why we need even more information on what you are wanting to do.
Shapefiles or an osm xml file are just containers that hold geometric shapes. There are plenty of software libraries out there that let you read these files and extract the data. I would recommend looking at GDAL/OGR as a starting point.
A method like getMapData(longitude, latitude) is essentially a search/query function. You need to be a little more specific here too: do you want geometries that contain the point, geometries within a certain distance of the point, etc.?
You could find the map data using a brute-force scan:
for shape in shapefile:
    if shape.contains(query_point):
        return shape
Or you can use more advanced algorithms and data structures such as R-trees, k-d trees, quadtrees, etc. The easiest way to get started with querying map data is to load it into a spatial database. I would recommend investigating PostgreSQL+PostGIS and SpatiaLite.
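For example, once a shapefile has been loaded into PostGIS (e.g. with shp2pgsql), the getMapData(longitude, latitude) lookup becomes a single query; the table and column names below are hypothetical:

-- Find the feature containing a given lon/lat point:
SELECT landuse
FROM   landcover
WHERE  ST_Contains(geom, ST_SetSRID(ST_MakePoint(13.40, 52.52), 4326));

-- A spatial index keeps this fast on large datasets:
CREATE INDEX landcover_geom_idx ON landcover USING GIST (geom);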
You may also like to look at SpatiaLite and/or PostGIS, which are two spatially enabled databases that you can use separately or in conjunction with GDAL/OGR.
I must echo Charles' request that you explain your use case in more detail, because the actual implementation will depend greatly on exactly what you are trying to achieve. My reading of this is that you may want to convert your data into a series of aligned rasters which you can overlay and treat as a 3-dimensional array.

Geospatial data in SQL

I have been experimenting with the geography datatype lately and just love it. But I can't decide whether I should convert my current schema, which stores latitude and longitude in two separate numeric(9,5) fields, to the geography type. I have calculated the size of both representations: the lat/long way of representing a point takes 28 bytes, whereas the geography type takes 26. Not a big gain in space, but a huge improvement in performing geospatial operations (intersection, distance measurement, etc.), which are currently handled using awkward stored procedures and scalar functions. What I wonder about is the indices. Will the geography datatype require more space for indexing the data? I have a feeling that it will: even though the actual data stored in the columns is smaller, I think the way geospatial indices work will eventually result in a larger space allocation for them.
P.S. As a side note, it seems that SQL Server 2008 (not R2) does not automatically seek through geospatial indices unless explicitly told to with a WITH(INDEX()) hint.
In my opinion you should definitely use the spatial types only. The spatial types are optimized for spatial queries, and if spatial queries are what you need, then I think it is an easy choice.
As a side effect you can get rid of your geographical functions and procedures, since they are (probably) built into SQL Server 2008. One caveat, though: you might have to spend some time optimizing the spatial indexes, but this depends on your specific case.
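Creating the index itself is a one-liner; a minimal sketch, assuming a table with a geography column (table and index names are hypothetical):

CREATE TABLE Places (
    Id  int IDENTITY PRIMARY KEY,
    Pos geography NOT NULL
);

CREATE SPATIAL INDEX SIdx_Places_Pos ON Places (Pos)
    USING GEOGRAPHY_GRID;   -- tune via options such as GRIDS and CELLS_PER_OBJECT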
I understand that you are trying to decide which of the two to keep, but you might want to consider keeping both. If you export your data into shapefiles, it's common practice to keep the lat/lon fields alongside the geom field.
I would keep both. It can be useful to easily query the original coordinates of a particular feature without requiring spatial operations. You have the benefit of knowing the original points as well as the ability to create a new geometry from them in case you need it in a different coordinate system (like if you have your geometry in a particular projection that will lose a lot of precision going to another).
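For instance, rebuilding a spatial value on the fly from the stored coordinate columns is straightforward (T-SQL sketch; the table and column names are hypothetical):

SELECT geography::Point(Latitude, Longitude, 4326) AS Pos
FROM   Stops;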

How to Handle Unknown Data Type in one Table

I have a situation where I need to store a general piece of data (could be an int, float, or string) in my database, but I don't know ahead of time which it will be. I need a table (or less preferably tables) to store this unknown typed data.
What I think I am going to do is have a column for each data type, use only one of them for each record, and leave the others NULL. This requires some logic above the database, but that is not too much of a problem because I will be representing these records in models anyway.
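Concretely, something like this (just a sketch; the names are placeholders, and the CHECK constraint is an optional way to push the "exactly one column is populated" rule into the database):

CREATE TABLE attribute_value (
    id          integer PRIMARY KEY,
    int_value   integer,
    float_value float,
    text_value  varchar(255),
    CHECK (
        (CASE WHEN int_value   IS NOT NULL THEN 1 ELSE 0 END) +
        (CASE WHEN float_value IS NOT NULL THEN 1 ELSE 0 END) +
        (CASE WHEN text_value  IS NOT NULL THEN 1 ELSE 0 END) = 1
    )
);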
Basically, is there a best practice way to do something like this? I have not come up with anything that is less of a hack than this, but it seems like this is a somewhat common problem. Thanks in advance.
EDIT: Also, is this considered 3NF?
You could easily do that if you used SQLite as a database backend:
Any column in a version 3 database, except an INTEGER PRIMARY KEY column, may be used to store any type of value.
For other RDBMS systems, I would go with Philip's solution.
Note that in my line of software (business applications), I cannot think of any situation where this kind of requirement would arise (a value with an unknown datatype). Unless the domain model was flawed, of course... I can imagine that other lines of software may call for different practices, but I suggest that you consider rethinking your overall design.
If your application can reliably convert datatypes, you might consider a single column solution based on a variable-length binary column, with a second column to track original data type. (I did a very small routine based on this once before, and it worked well enough.) Testing would show if conversion is more efficiently handled on the application or database side.
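A sketch of that two-column variant (T-SQL flavor; names are hypothetical):

CREATE TABLE attribute_blob (
    id         integer        PRIMARY KEY,
    value_raw  varbinary(256) NOT NULL,   -- the value, serialized by the application
    value_type varchar(10)    NOT NULL    -- e.g. 'int', 'float', 'string'
);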
If I were to do this, I would choose either your method, or I would cast everything to a string and use only one column. Of course there would be another column for the type (which would probably be useful for the first method too).
For faster code I would probably go with your method.
