A bug in SQL Geography POINT Lat, Long [closed] - sql-server

Update: this has been reported to Microsoft.
In a simple SQL Server 2012 table with a geography column (named geopoint) populated with a few rows containing points similar to POINT (-0.120875610750927 54.1165118880234), executing
select [geopoint].[STAsText](),
[geopoint].Lat lat,
[geopoint].Long long
from mytable
produces this
Untitled1 lat long
POINT (-0.120875610750927 54.1165118880234) 54.1165118880234 -0.120875610750927
which looks like a bug, but something this basic should have been caught before release. So am I doing something wrong?
Added info
IT professionals should look up the details of Microsoft's implementation of SQL Server on MSDN, since there can be differences between implementations, as in this case. As proof, I just checked PostGIS's implementation of ST_AsText for a geography column. It works fine, and the result is what one would expect. Therefore the bug is in SQL Server's implementation. The correct result for the above example should be
POINT (54.1165118880234 -0.120875610750927 ) 54.1165118880234 -0.120875610750927
Dare I say there is a high likelihood of other bugs in functions that work on geography columns, since basic functionality in this area has not been fully tested.

This is working as intended.
According to your question, you stored the data in this pattern:
POINT (-0.120875610750927 54.1165118880234)
then you claimed that the lat/long is reversed according to the MSDN documentation of
Point(Lat, Long, SRID).
You may realize that the syntax you're using is not the same as the one you claim:
POINT(aValue anotherValue) vs Point(Lat, Long, SRID)
Now, the question is, what does MS SQL do to the data?
It turns out that MS SQL interprets the data as Open Geospatial Consortium (OGC) Well-Known Text (WKT), and thus uses the STPointFromText function, since that format is the most suitable one for a 2-D point:
POINT(x y)
Now, the follow-up question, does it mean POINT(Lat Long)?
From the sample code
DECLARE @g geography;
SET @g = geography::STPointFromText('POINT(-122.34900 47.65100)', 4326);
it should be clear that the first coordinate is not latitude but longitude (latitude only ranges from -90 to 90), so we can now guess that the format is POINT(Long Lat). But why?
As explained in this article,
As you can see [...], the longitude is specified first before the latitude. The reason is because in the Open Geospatial Consortium (OGC) Well-Known Text (WKT) representation, the format is (x, y). Geographic coordinates are usually specified by Lat/Long but between these two, the X is the Longitude while the Y is the Latitude.
You may be wondering why the X-coordinate is the Longitude while the Y-coordinate is the Latitude. Think of the equator of the earth as the x-axis while the prime meridian is the Y-axis. Longitude is defined as the distance from the prime meridian along the x-axis (or the equator). Similarly, latitude is defined as the distance from the equator along the Y-axis.
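As a quick cross-check outside SQL Server (a minimal sketch assuming the shapely Python package; names are illustrative), parsing the same WKT shows the same x = longitude, y = latitude convention, which matches what the Lat and Long properties in the question returned:

from shapely import wkt

p = wkt.loads("POINT (-0.120875610750927 54.1165118880234)")
print(p.x)   # -0.120875610750927 -> x is the longitude (what SQL Server's .Long returned)
print(p.y)   # 54.1165118880234   -> y is the latitude  (what SQL Server's .Lat returned)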

This is a bug. The value returned by STAsText for a geography column swaps the Lat and Long values. It is definitely a bug that people should be aware of.


Geometry operations on latitude/longitude coordinates

My question is probably a duplicate, but none of the answers I have seen so far satisfies me; they all still leave me in doubt.
I have a web application that uses Google Maps API to draw and save shapes (circles and polygons) in a SQL Server DB with the geometry data type (where I save lat/long coordinates) and an SRID = 4326.
My objective is to later on, determine if a point is contained in the area of those circles/polygons thanks to SQL function geometry::ST_Intersects().
I have been told so far that my method wouldn't work because I am using geometry instead of geography. But to my surprise... after checking with a few tests, it works perfectly well with geometry and I am not able to understand why or how?
Could somebody explain to me why the geometry type works well with operations on lat/long whereas geography would be more suited?
I post this as an answer because it is too long for a comment.
geometry works well to the extent that your intersections can be approximated by planar intersections.
The difference between geometry and geography is that the former assumes it is working on a plane surface while the latter works on a spherical surface. When the polygons in question cover small areas, on the order of a few thousand meters, geometry works very well: the difference between a distance measured as if the points lie on a plane and one measured as if they lie on the earth's sphere is so small as to be negligible. It is a different matter if the points are a few hundred kilometers apart; in that case the distance measured on the plane and on the sphere differ considerably, and so, proportionally, does the result of the intersection between those areas.
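As a rough illustration of that point (a sketch assuming the pyproj Python package; the helper name is illustrative), compare the true great-circle midpoint of an edge with the midpoint of the straight line drawn between the same two endpoints in plain lon/lat space. For a short edge the two practically coincide, which is why planar intersections on small shapes give the expected results; for a long edge they can be tens of kilometres apart, and planar and spherical intersections can then disagree:

from pyproj import Geod

geod = Geod(ellps="WGS84")

def chord_vs_geodesic_midpoint_m(lon1, lat1, lon2, lat2):
    # One intermediate point from npts() is the geodesic midpoint of the edge.
    (mid_lon, mid_lat), = geod.npts(lon1, lat1, lon2, lat2, 1)
    chord_lon = (lon1 + lon2) / 2          # midpoint of the "straight line" in lon/lat space
    chord_lat = (lat1 + lat2) / 2
    _, _, dist = geod.inv(mid_lon, mid_lat, chord_lon, chord_lat)
    return dist                            # metres between the two midpoints

print(chord_vs_geodesic_midpoint_m(0.0, 51.500, 0.010, 51.505))  # ~1 km edge: well under a metre
print(chord_vs_geodesic_midpoint_m(0.0, 51.500, 21.00, 52.230))  # ~1400 km edge: tens of kilometres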

Calculate coordinates from a distance in mm [closed]

I'm writing a piece of software which is responsible for logging the position of certain machine parts.
Now this is the case:
There is 1 RTK fixed GPS receiver (+/- 2cm accuracy), fixed on the machine. The heading is calculated using 2 different locations
There are 2 arms (left and right arm) on the machine that can rotate independent of each other outwards or inwards
There is 1 arm (mid arm) with a fixed location on the machine
What I Already have:
A piece of software that calculates the outer location of the arms (this works like a charm). It produces a shapefile as a log file in which the locations of the arms are visible, and this works well for every heading.
The problem is:
The algorithm is calculating the location of the arms using the delta X and delta Y distances in mm.
My assumption was that a longitude difference of 0.00000001 is equal to 1.1 mm on the X axis (source). Boy, was I wrong...
When the generated shapefile is measured in a shapefile viewer it returns 2.19 m instead of the calculated 3.25 m. Note that this is at latitude 52.810146939 (Northern Hemisphere).
Thus the question:
Does anybody have an idea how to form a formula that takes a latitude or longitude as a starting point and a distance in mm and then returns the corrected latitude or longitude? Or how I can calculate the relative delta coordinate values to add to the original coordinates?
I've got a snippet of the code:
Armlocations->leftarm.locationX = ownLocation.locationX + MM_TO_COOR(deltaX);
Armlocations->leftarm.locationY = ownLocation.locationY + MM_TO_COOR(deltaY);
deltaX and deltaY are the distances in mm that should be added to the coordinate. The macro MM_TO_COOR is this:
#define COOR_TO_MM(x) ((x) * 110000000)  // degrees -> mm, assuming ~110,000,000 mm (110 km) per degree
#define MM_TO_COOR(x) ((x) / 110000000)  // mm -> degrees, same assumption; only valid for latitude (or longitude at the equator)
The question is not about programming (I've got that going for me) but more about the math involved.
I am sure this is not the right place, it might be a better fit to use https://gis.stackexchange.com.
About the latitude: it is always (about) 111 km per degree, so 0.00000001° (1E-8°) should indeed be about 1.1 mm in the y direction.
The relation for the longitude does indeed depend on your current latitude: there, the factor per degree is 111 km multiplied by cos(latitude). In your case, that would (on a spherical earth) make a factor of 0.60445804192912, resulting in 67.1 km per degree or 0.671 mm per 1E-8°. On our earth, which is flattened at the poles, the value is slightly different, but it should be about the same (I cannot, however, tell how big the error is).
Are you sure that your GPS device has this high resolution of several mm?
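For what it's worth, here is a minimal sketch of the correction described above (plain Python, using the usual spherical approximation of about 111.32 km per degree; the helper name is just for illustration):

import math

MM_PER_DEG_LAT = 111_320_000.0  # ~111.32 km per degree of latitude, expressed in mm

def mm_to_degrees(delta_x_mm, delta_y_mm, latitude_deg):
    # Convert local east (x) / north (y) offsets in mm to lon/lat offsets in degrees.
    dlat = delta_y_mm / MM_PER_DEG_LAT
    dlon = delta_x_mm / (MM_PER_DEG_LAT * math.cos(math.radians(latitude_deg)))
    return dlon, dlat

# At latitude 52.810146939, 1 mm east is a noticeably larger longitude step
# than 1 mm north is a latitude step (the cos(latitude) factor).
print(mm_to_degrees(1.0, 1.0, 52.810146939))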

Does anyone know of potential problems with st_line_substring in postGIS?

Specifically, I'm getting a result that I do not understand. It is possible that my understanding is simply wrong, but I don't think so. So I'm hoping that someone will either say "yes, that's a known problem" or "no, it is working correctly and here is why your understanding is wrong".
Here is my example.
To start I have the following geometry of lat/longs.
LINESTRING(-1.32007599 51.06707497,-1.31192207 51.09430508,-1.30926132 51.10206677,-1.30376816 51.11133597,-1.29261017 51.12981493,-1.27510071 51.15906713,-1.27057314 51.16440941,-1.26606703 51.16897072,-1.26235485 51.17439257,-1.26089573 51.17875111,-1.26044512 51.1833917,-1.25793457 51.19727033,-1.25669003 51.20141159,-1.25347137 51.20630532,-1.24845028 51.21110444,-1.23325825 51.22457158,-1.2274003 51.22821321,-1.22038364 51.23103494,-1.20326042 51.23596583,-1.1776185 51.24346193,-1.16356373 51.24968088,-1.13167763 51.26363353,-1.12247229 51.2659966,-1.11629248 51.26682901,-1.10906124 51.26728549,-1.09052181 51.26823871,-1.08522177 51.26885628,-1.07013702 51.27070895,-1.03683472 51.27350122,-1.00917578 51.27572955,-0.98243952 51.2779175,-0.9509182 51.28095094,-0.9267354 51.28305811,-0.90499878 51.28511151,-0.86051702 51.2883055,-0.83661318 51.29023789,-0.7534647 51.29708113,-0.74908733 51.29795323,-0.7400322 51.2988924,-0.71535587 51.30125366,-0.68475723 51.29863749,-0.65746307 51.30220618,-0.63246489 51.30380261,-0.60542822 51.30645873,-0.58150291 51.3103219,-0.57603121 51.31150225,-0.57062387 51.31317883,-0.54195642 51.32475227,-0.4855442 51.34771616,-0.4553318 51.36283147)
This is in a column called "geom" in my table, called "fibre_lines". When I run the following query,
select st_length(geography(geom), false) as full_length,
st_length(geography(st_line_substring(geom, 0, 1)), false) as full_length_2,
st_length(geography(st_line_substring(geom, 0, 0.5)), false) as first_half,
st_length(geography(st_line_substring(geom, 0.5, 1)), false) as second_half
from fibre_lines
where id = 10;
I get the following result...
76399.4939375278 76399.4939375278 41008.9667229201 35390.5272197668
The first two make sense to me, they are simply the length of my line assuming a spherical earth. The first is just using the obvious function while the second is using st_line_substring to get the length of the entire line. These two values agree.
But the last two have me puzzled. I am asking for the length of the first half of the line, then I'm asking for the length of the last half. My expectation was that these would be equal or nearly equal. Instead the first half is about 6km longer than the second half.
If you plot the geometry on the map you will see that the first third of the line is fairly north/south oriented and the remaining two thirds are more east/west. I wouldn't have thought that would make a difference when asking for the length on a spherical earth, but I am happy to be told that I'm wrong (so long as it is also explained why I'm wrong).
For reference the PostGIS I am using is 1.5.8. If this is a bug, upgrading to a newer version is possible, but not trivial, so I would prefer to only do that if it is necessary.
Anyone have ideas?
While Arunas' comments didn't directly answer my question, they did lead me to some research that I think identifies the problem. I'm posting it here partly to get it straight in my own mind and partly in case others are wondering.
It seems the key is the PostGIS distinction between a "geometry" and a "geography". A geometry is a 2D planar geometry that is typically in UTMs and used with a projection of the globe onto a flat surface (which projection is configurable). A geography, on the other hand, is designed to store latitude/longitude information specifically and is used to work either on a sphere or a spheroid. So the essential problem I have is twofold:
Perhaps not obvious from my original post is that I am using a geometry object to store lat/long information rather than UTMs. I cast that to a geography most of the time so that I get the correct answers, but it would be more correct if I actually stored it as a geography object. That would eliminate the need for a number of the casts in my code as well as allow PostGIS to tell me when I am doing something wrong.
While ST_Length will work with either a geometry or a geography, ST_Line_Substring only works with geometries. Hence when I ask it for the halfway point, I am asking it for the halfway point of a flat geometry. This will give me the correct answer for the latitude coordinate, but for the longitude it will have an error term that increases (for most projections) the farther I am from the equator.
I've looked into newer versions of PostGIS and they don't seem to have an ST_Line_Substring or anything similar that will give me the 50% point of a geography, so I will have to do it the "hard" way by using ST_Length to give me all my segment lengths and then adding them up and doing the math needed for my interpolation.
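For reference, here is a minimal sketch of that "hard way" (in Python with pyproj rather than in PostGIS; the function name is just for illustration): accumulate geodesic segment lengths along the vertices and interpolate the point at the desired fraction of the total length:

from pyproj import Geod

geod = Geod(ellps="WGS84")

def point_at_fraction(coords, fraction):
    # coords: list of (lon, lat); returns the (lon, lat) at `fraction` of the geodesic length.
    seg_lengths = [geod.inv(x1, y1, x2, y2)[2]
                   for (x1, y1), (x2, y2) in zip(coords, coords[1:])]
    target = fraction * sum(seg_lengths)
    for (x1, y1), (x2, y2), seg in zip(coords, coords[1:], seg_lengths):
        if target <= seg:
            az, _, _ = geod.inv(x1, y1, x2, y2)
            lon, lat, _ = geod.fwd(x1, y1, az, target)  # move `target` metres along this segment
            return lon, lat
        target -= seg
    return coords[-1]

# halfway_point = point_at_fraction(list_of_lon_lat_pairs, 0.5)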
Sorry, I can't add comments, so I will provide this as an answer.
I experienced the same problem, and I resolved it by transforming my lat-lon geometries to UTM geometries inside the st_line_substring call. Then I was getting sub-geometries of the proper length. Of course, I had to transform them back to lat-lon afterwards.
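The same idea expressed outside the database (a sketch assuming shapely >= 1.7 and pyproj >= 2.1; the UTM zone is an assumption based on the example's location): reproject to a metric CRS, take the substring there, and reproject back:

from pyproj import Transformer
from shapely.geometry import LineString
from shapely.ops import substring, transform

to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32630", always_xy=True).transform
to_wgs84 = Transformer.from_crs("EPSG:32630", "EPSG:4326", always_xy=True).transform

line_ll = LineString([(-1.32007599, 51.06707497), (-1.31192207, 51.09430508)])  # shortened example
line_utm = transform(to_utm, line_ll)                  # coordinates are now in metres
first_half_utm = substring(line_utm, 0, 0.5, normalized=True)
first_half_ll = transform(to_wgs84, first_half_utm)    # back to lon/lat
print(first_half_ll.wkt)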

Understanding forecast accuracy: MAPE, WMAPE, WAPE?

I am new to the forecast space and I am trying to understand the different forecast accuracy measures. I am referring to the below link
https://www.otexts.org/fpp/2/5
Can anyone please help me understand the below things:
1. MAPE: I am trying to understand the stated disadvantage of MAPE: "They also have the disadvantage that they put a heavier penalty on negative errors than on positive errors." Can anyone please provide an example to explain this in detail?
2. Also, I was assuming that WMAPE and WAPE are the same. I saw this post at stackoverflow which formulates them differently.
What's the gaps for the forecast error metrics: MAPE and WMAPE?
Also, can you please help me understand how the weights are calculated? My understanding is that the higher the value, the more important it is. But I am not sure how the value is calculated.
Thanks in advance!
MAPE = 100 * mean(|(Actual - Forecast) / Actual|)
If you check the website https://robjhyndman.com/hyndsight/smape/ and the example given, you will notice that the denominator taken is the forecast, which is incorrect (it should be the actual value). With this formula you can see that MAPE does not put a heavier penalty on negative errors than on positive errors.
WMAPE applies weights, which may in fact be biased towards the error, which would make the metric worse. The weighting for WMAPE is, as far as I know, based on the use case. For example, you are trying to predict the loss, but the percentage of loss needs to be weighted by the volume of sales, because a loss on a huge sale needs a better prediction.
In cases where the values to be predicted are very low, MAD/Mean (a.k.a. WAPE) should be used. For example, if the sales are 3 units in one particular week (maybe a holiday) and the predicted value is 9, then the MAPE for that week would be 200%. This would bloat the total MAPE when you look at multiple weeks of data.
The link given below has details of some other stats used for error measurement
http://www.forecastpro.com/Trends/forecasting101August2011.html
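As a rough numeric sketch of the measures discussed above (the weights are an assumption for illustration, since the weighting scheme depends on the use case):

import numpy as np

actual   = np.array([3.0, 100.0, 250.0, 80.0])
forecast = np.array([9.0,  90.0, 240.0, 95.0])
weights  = np.array([1.0,  10.0,  25.0,  8.0])   # e.g. sales volume; illustrative values

abs_err = np.abs(actual - forecast)

mape  = 100 * np.mean(abs_err / actual)                  # blown up by the 3-unit week
wape  = 100 * np.sum(abs_err) / np.sum(actual)           # MAD/Mean, robust to tiny actuals
wmape = 100 * np.sum(weights * abs_err / actual) / np.sum(weights)
# Note: if the weights are the actuals themselves, WMAPE reduces to WAPE.

print(round(mape, 1), round(wape, 1), round(wmape, 1))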
I'm not very sure about the rest, but I came across an answer for the first question recently.
Check out this website - http://robjhyndman.com/hyndsight/smape/
The example given there is presented below -
"Armstrong and Collopy (1992) argued that the MAPE "puts a heavier penalty on forecasts that exceed the actual than those that are less than the actual". Makridakis (1993) took up the argument saying that "equal errors above the actual value result in a greater APE than those below the actual value". He provided an example where yt=150 and y^t=100, so that the relative error is 50/150=0.33, in contrast to the situation where yt=100 and y^t=150, when the relative error would be 50/100=0.50."
y^t == estimated value of y
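Worked out in a couple of lines, the quoted numbers show the asymmetry directly:

# The same absolute error of 50 gives a larger percentage error when the forecast overshoots the actual.
under = abs(150 - 100) / 150   # actual 150, forecast 100 -> 0.33
over  = abs(100 - 150) / 100   # actual 100, forecast 150 -> 0.50
print(round(under, 2), round(over, 2))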
WMAPE and MAPE are different measures.
MAPE is Mean Absolute Percent Error - this just averages the percent errors.
WMAPE is Weighted Mean Absolute Percent Error - this weights the errors by volume, so it is more rigorous and reliable.
Negative errors do not influence the calculation, as this is all absolute error. The asymmetry could result from the denominator used, which is a separate debate.
You can download a detailed presentation from our website at https://valuechainplanning.com/download/24. The PDF can be downloaded at https://valuechainplanning.com/upload/details/Forecast_Accuracy_Presentation.pdf.

How to understand Locality Sensitive Hashing? [closed]

I noticed that LSH seems to be a good way to find similar items with high-dimensional properties.
After reading the paper http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf, I'm still confused by those formulas.
Does anyone know of a blog or article that explains it in an easier way?
The best tutorial I have seen for LSH is in the book: Mining of Massive Datasets.
Check Chapter 3 - Finding Similar Items
http://infolab.stanford.edu/~ullman/mmds/ch3a.pdf
I also recommend the slides below:
http://www.cs.jhu.edu/%7Evandurme/papers/VanDurmeLallACL10-slides.pdf
The example in the slides helped me a lot in understanding the hashing for cosine similarity.
I borrow two slides from Benjamin Van Durme & Ashwin Lall, ACL2010 and try to explain the intuitions of LSH Families for Cosine Distance a bit.
In the figure, there are two circles, colored red and yellow, representing two two-dimensional data points. We are trying to find their cosine similarity using LSH.
The gray lines are some uniformly randomly picked planes.
Depending on whether a data point is located above or below a gray line, we mark this relation as 0 or 1.
In the upper-left corner, there are two rows of white/black squares, representing the signatures of the two data points respectively. Each square corresponds to a bit: 0 (white) or 1 (black).
So once you have a pool of planes, you can encode the data points by their location relative to the planes. The more planes we have in the pool, the closer the angular difference encoded in the signature is to the actual difference, because only planes that lie between the two points give the two data points different bit values.
Now we look at the signatures of the two data points. As in the example, we use only 6 bits (squares) to represent each data point. This is the LSH hash for the original data we have.
The Hamming distance between the two hashed values is 1, because their signatures differ by only 1 bit.
Considering the length of the signature, we can calculate their angular similarity as shown in the graph.
I have some sample code (just 50 lines) in python here which is using cosine similarity.
https://gist.github.com/94a3d425009be0f94751
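Separate from the linked gist, here is a minimal numpy sketch of the signature scheme described above: random hyperplanes give each vector a bit string, and the Hamming distance between two signatures estimates the angle between the vectors:

import numpy as np

rng = np.random.default_rng(0)
n_planes, dim = 64, 2                        # more planes -> better angle estimate
planes = rng.standard_normal((n_planes, dim))

def signature(v):
    return (planes @ v > 0).astype(np.uint8)   # bit is 1 if the point is above the plane, else 0

a = np.array([1.0, 0.2])
b = np.array([0.8, 0.6])

hamming = np.count_nonzero(signature(a) != signature(b))
est_angle = np.pi * hamming / n_planes       # P(bit differs) = angle / pi for random hyperplanes
true_angle = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(est_angle, true_angle)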
Tweets in vector space can be a great example of high dimensional data.
Check out my blog post on applying Locality Sensitive Hashing to tweets to find similar ones.
http://micvog.com/2013/09/08/storm-first-story-detection/
And because one picture is worth a thousand words, check the picture below:
http://micvog.files.wordpress.com/2013/08/lsh1.png
Hope it helps.
#mvogiatzis
Here's a presentation from Stanford that explains it. It made a big difference for me. Part two is more about LSH, but part one covers it as well.
A picture of the overview (there is much more in the slides):
Near Neighbor Search in High Dimensional Data - Part1:
http://www.stanford.edu/class/cs345a/slides/04-highdim.pdf
Near Neighbor Search in High Dimensional Data - Part2:
http://www.stanford.edu/class/cs345a/slides/05-LSH.pdf
LSH is a procedure that takes as input a set of documents/images/objects and outputs a kind of Hash Table.
The indexes of this table contain the documents, such that documents that land on the same index are considered similar and those on different indexes are "dissimilar".
What counts as similar depends on the metric used and also on a similarity threshold s, which acts like a global parameter of LSH.
It is up to you to define what the adequate threshold s is for your problem.
It is important to underline that different similarity measures have different implementations of LSH.
In my blog, I tried to thoroughly explain LSH for the cases of minHashing (Jaccard similarity measure) and simHashing (cosine distance measure). I hope you find it useful:
https://aerodatablog.wordpress.com/2017/11/29/locality-sensitive-hashing-lsh/
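Not the blog's code, but as a generic illustration of the minHashing case mentioned above: k salted hash functions over a set's elements give a k-value signature, and the fraction of matching positions estimates the Jaccard similarity:

import random

def minhash_signature(items, k=100, seed=42):
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(k)]
    # For each salt, keep the minimum hash value over the set's elements.
    return [min(hash((salt, x)) for x in items) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = {"the", "quick", "brown", "fox"}
doc_b = {"the", "quick", "red", "fox"}

sig_a, sig_b = minhash_signature(doc_a), minhash_signature(doc_b)
print(estimated_jaccard(sig_a, sig_b))              # estimate from the signatures
print(len(doc_a & doc_b) / len(doc_a | doc_b))      # true Jaccard similarity = 0.6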
I am a visual person. Here is what works for me as an intuition.
Say each of the things you want to search for approximately are physical objects such as an apple, a cube, a chair.
My intuition for LSH is that it is similar to taking the shadows of these objects. If you take the shadow of a 3D cube you get a 2D square-like shadow on a piece of paper, and a 3D sphere will give you a circle-like shadow on a piece of paper.
Eventually, there are many more than three dimensions in a search problem (where each word in a text could be one dimension) but the shadow analogy is still very useful to me.
Now we can efficiently compare strings of bits in software. A fixed length bit string is kinda, more or less, like a line in a single dimension.
So with an LSH, I project the shadows of objects eventually as points (0 or 1) on a single fixed length line/bit string.
The whole trick is to take the shadows such that they still make sense in the lower dimension e.g. they resemble the original object in a good enough way that can be recognized.
A 2D drawing of a cube in perspective tells me this is a cube. But I cannot easily distinguish a 2D square from the shadow of a 3D cube without perspective: they both look like a square to me.
How I present my object to the light will determine if I get a good recognizable shadow or not. So I think of a "good" LSH as the one that will turn my objects in front of a light such that their shadow is best recognizable as representing my object.
So to recap: I think of things to index with an LSH as physical objects like a cube, a table, or chair, and I project their shadows in 2D and eventually along a line (a bit string). And a "good" LSH "function" is how I present my objects in front of a light to get an approximately distinguishable shape in the 2D flatland and later my bit string.
Finally, when I want to search whether an object I have is similar to some of the objects I indexed, I take the shadows of this "query" object using the same way of presenting my object in front of the light (eventually ending up with a bit string too). And now I can compare how similar that bit string is to all my other indexed bit strings, which is a proxy for searching over my whole objects, provided I found a good and recognizable way to present my objects to my light.
As a very short, tldr answer:
An example of locality sensitive hashing could be to first set planes randomly (with a rotation and offset) in your space of inputs, then drop the points you want to hash into that space, and for each plane measure whether the point is above or below it (e.g., 0 or 1); the resulting bits are the hash. Points that are similar in space (as measured by the cosine distance) will then tend to have similar hashes.
You could read this example using scikit-learn: https://github.com/guillaume-chevalier/SGNN-Self-Governing-Neural-Networks-Projection-Layer
