Specifically, I'm getting a result that I do not understand. It is possible that my understanding is simply wrong, but I don't think so. So I'm hoping that someone will either say "yes, that's a known problem" or "no, it is working correctly and here is why your understanding is wrong".
Here is my example.
To start I have the following geometry of lat/longs.
LINESTRING(-1.32007599 51.06707497,-1.31192207 51.09430508,-1.30926132 51.10206677,-1.30376816 51.11133597,-1.29261017 51.12981493,-1.27510071 51.15906713,-1.27057314 51.16440941,-1.26606703 51.16897072,-1.26235485 51.17439257,-1.26089573 51.17875111,-1.26044512 51.1833917,-1.25793457 51.19727033,-1.25669003 51.20141159,-1.25347137 51.20630532,-1.24845028 51.21110444,-1.23325825 51.22457158,-1.2274003 51.22821321,-1.22038364 51.23103494,-1.20326042 51.23596583,-1.1776185 51.24346193,-1.16356373 51.24968088,-1.13167763 51.26363353,-1.12247229 51.2659966,-1.11629248 51.26682901,-1.10906124 51.26728549,-1.09052181 51.26823871,-1.08522177 51.26885628,-1.07013702 51.27070895,-1.03683472 51.27350122,-1.00917578 51.27572955,-0.98243952 51.2779175,-0.9509182 51.28095094,-0.9267354 51.28305811,-0.90499878 51.28511151,-0.86051702 51.2883055,-0.83661318 51.29023789,-0.7534647 51.29708113,-0.74908733 51.29795323,-0.7400322 51.2988924,-0.71535587 51.30125366,-0.68475723 51.29863749,-0.65746307 51.30220618,-0.63246489 51.30380261,-0.60542822 51.30645873,-0.58150291 51.3103219,-0.57603121 51.31150225,-0.57062387 51.31317883,-0.54195642 51.32475227,-0.4855442 51.34771616,-0.4553318 51.36283147)
This is in a column called "geom" in my table, called "fibre_lines". When I run the following query,
select st_length(geography(geom), false) as full_length,
st_length(geography(st_line_substring(geom, 0, 1)), false) as full_length_2,
st_length(geography(st_line_substring(geom, 0, 0.5)), false) as first_half,
st_length(geography(st_line_substring(geom, 0.5, 1)), false) as second_half
from fibre_lines
where id = 10;
I get the following result...
76399.4939375278 76399.4939375278 41008.9667229201 35390.5272197668
The first two make sense to me: they are simply the length of my line assuming a spherical earth. The first just uses the obvious function, while the second uses st_line_substring to get the length of the entire line. These two values agree.
But the last two have me puzzled. I am asking for the length of the first half of the line, then I'm asking for the length of the last half. My expectation was that these would be equal or nearly equal. Instead the first half is about 6km longer than the second half.
If you plot the geometry on the map you will see that the first third of the line is fairly north/south oriented and the remaining two thirds are more east/west. I wouldn't have thought that would make a difference when asking for the length on a spherical earth, but I am happy to be told that I'm wrong (so long as it is also explained why I'm wrong).
For reference the PostGIS I am using is 1.5.8. If this is a bug, upgrading to a newer version is possible, but not trivial, so I would prefer to only do that if it is necessary.
Anyone have ideas?
While Arunas' comments didn't directly answer my question, they did lead me to some research that I think identifies the problem. I'm posting it here in part to get it straight in my own mind and in part in case others are wondering.
It seems the key is the PostGIS distinction between a "geometry" and a "geography". A geometry is a 2D planar object, typically stored in a projected coordinate system such as UTM, and used with a (configurable) projection of the globe onto a flat surface. A geography, on the other hand, is designed to store latitude/longitude coordinates specifically and works on either a sphere or a spheroid. So the essential problem I have is twofold:
Perhaps not obvious from my original post is that I am using a geometry object to store lat/long information rather than UTMs. I cast that to a geography most of the time so that I get the correct answers, but it would be more correct if I actually stored it as a geography object. That would eliminate the need for a number of the casts in my code as well as allow PostGIS to tell me when I am doing something wrong.
While ST_Length will work with either a geometry or a geography, ST_Line_Substring only works with geometries. Hence when I ask it for the halfway point, I am asking for the halfway point of a flat geometry whose "length" is measured in degrees, where a degree of longitude is treated as equal to a degree of latitude. At 51°N a degree of longitude actually covers far fewer metres on the ground than a degree of latitude, so the east/west portion of my line is over-weighted in degrees and the planar midpoint does not land at the true geodesic midpoint; the error grows the farther the data is from the equator.
I've looked into newer versions of PostGIS and they don't seem to have an ST_Line_Substring (or anything similar) that will give me the 50% point of a geography, so I will have to do it the "hard" way: using ST_Length to get the individual segment lengths, adding them up, and doing the interpolation math myself.
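For what it's worth, here is a brute-force sketch of finding the geodesic 50% fraction. It is not the segment-summing approach I just described, only a scan over candidate fractions (the 0.001 step is an arbitrary resolution I picked), keeping the fraction whose first-portion geography length is closest to half of the total:
select g.f as midpoint_fraction
from fibre_lines,
     (select generate_series(1, 999)::float8 / 1000 as f) as g   -- candidate fractions 0.001 .. 0.999
where id = 10
order by abs(st_length(geography(st_line_substring(geom, 0, g.f)), false)
             - st_length(geography(geom), false) / 2)
limit 1;
The resulting fraction can then be fed back into st_line_substring to cut the line at (approximately) its geodesic midpoint.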
Sorry, I can't add comments, so I will provide this as an answer.
I ran into the same problem and resolved it by transforming my lat/lon geometries to UTM geometries inside the st_line_substring call. That way I got sub-geometries with the proper length. Of course, I had to transform them back to lat/lon afterwards.
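In PostGIS terms that looks roughly like the query below. This is only a sketch: it assumes the geometry column has SRID 4326 set, and SRID 32630 (UTM zone 30N) happens to cover the example line; pick the UTM zone appropriate to your own data.
select st_transform(
         st_line_substring(
           st_transform(geom, 32630),   -- lat/long degrees -> UTM metres
           0, 0.5),                     -- first half, measured in metres
         4326) as first_half_geom       -- back to lat/long
from fibre_lines
where id = 10;
Within a single UTM zone the scale distortion is small, so the 50% point found this way sits very close to the true geodesic midpoint.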
I'm new to PostGIS. I have been reading the docs; they are usually very well written, at least for tables of 1 row D:
This is probably a silly question, or obvious to everyone who knows PostGIS, but please help a little so that people coming from other languages can get into it.
I have gone through a lot of:
https://postgis.net/workshops/postgis-intro/
Sadly, I still can't find an answer to a simple question: the behaviour of a lot of functions in table-to-table operations.
I know R/sf and I'm trying to learn PostGIS, but every library has its own way of relating geometries. For example, IIRC intersects exists in both sf and geopandas, but the behaviour of the function is different even when it has the same name.
Let's pick an example:
https://postgis.net/docs/ST_Intersects.html
The function is defined as:
boolean ST_Intersects( geometry geomA , geometry geomB );
All the params are defined as geometry, which means each can be a column or a single geometry, but we don't know what the behaviour will be if the tables have more than 1 row; maybe when it says "geometries" it will interpret the full table as one big geometry.
Then I can go to this link:
https://postgis.net/docs/geometry_overlaps.html
Where I can finally see a result that looks somewhat like a matrix operation; this is where the possibilities start to open up.
Is Intersects a row-wise function?
Will Intersects test every row from the first table against every row of the second table? And in that case, what would the return be? (That would need a table of rows(table1) * rows(table2) rows, which is not written in the docs.)
The points above are just the questions and what is confusing when checking Intersects; now let's go back to the specific issue.
Probably how the functions relate rows is considered common sense in PostGIS, because it is omitted from the docs, and not only for Intersects but also for others like Intersection, Disjoint, etc. I think all of them have the same behaviour; it is just implicit.
So does PostGIS work element-to-element? table-to-element? element-to-table? table-to-table? Or is there some other interpretation? Or does every function have its own way that just isn't written down, and I need to search somewhere else?
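To make the question concrete, here is the kind of query I imagine (the table and column names are made up); what I can't tell from the docs is whether this pairwise, per-row usage is the intended model:
select a.id as id_a, b.id as id_b
from table_a as a
join table_b as b
  on st_intersects(a.geom, b.geom);   -- evaluated once per candidate pair of rows
If it is, then the "table-table" behaviour comes from the SQL join producing the pairs, with the function itself only ever seeing one geometry from each side.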
Thx!
I noticed that LSH seems to be a good way to find similar items with high-dimensional properties.
After reading the paper http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf, I'm still confused by those formulas.
Does anyone know a blog or article that explains them in an easier way?
The best tutorial I have seen for LSH is in the book: Mining of Massive Datasets.
Check Chapter 3 - Finding Similar Items
http://infolab.stanford.edu/~ullman/mmds/ch3a.pdf
Also, I recommend the slides below:
http://www.cs.jhu.edu/%7Evandurme/papers/VanDurmeLallACL10-slides.pdf
The example in the slides helped me a lot in understanding hashing for cosine similarity.
I have borrowed two slides from Benjamin Van Durme & Ashwin Lall, ACL 2010, and will try to explain the intuition behind LSH families for cosine distance a bit.
In the figure there are two circles, one red and one yellow, representing two two-dimensional data points. We are trying to find their cosine similarity using LSH.
The gray lines are some uniformly randomly picked planes.
Depending on whether a data point lies above or below a gray line, we mark this relation as 0/1.
In the upper-left corner there are two rows of white/black squares, representing the signatures of the two data points respectively. Each square corresponds to a bit: 0 (white) or 1 (black).
So once you have a pool of planes, you can encode each data point by its position relative to the planes. Imagine that the more planes there are in the pool, the closer the angular difference encoded in the signature gets to the actual difference, because only planes that lie between the two points give the two data points different bit values.
Now we look at the signatures of the two data points. As in the example, we use only 6 bits (squares) to represent each data point. This is the LSH hash for the original data we have.
The Hamming distance between the two hashed values is 1, because their signatures differ by only 1 bit.
Considering the length of the signature, we can calculate their angular similarity as shown in the graph.
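Concretely (assuming the standard random-hyperplane scheme, with planes through the origin), each plane separates two points with probability equal to their angle divided by $\pi$, so the Hamming distance over the signature length estimates the angle:
$\hat{\theta} \approx \frac{d_H}{b}\cdot 180^\circ = \frac{1}{6}\cdot 180^\circ = 30^\circ$, and the cosine similarity is then roughly $\cos\hat{\theta} \approx 0.87$.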
I have some sample code (just 50 lines) in Python here which uses cosine similarity.
https://gist.github.com/94a3d425009be0f94751
Tweets in vector space can be a great example of high dimensional data.
Check out my blog post on applying Locality Sensitive Hashing to tweets to find similar ones.
http://micvog.com/2013/09/08/storm-first-story-detection/
And because one picture is worth a thousand words, check the picture below:
http://micvog.files.wordpress.com/2013/08/lsh1.png
Hope it helps.
#mvogiatzis
Here's a presentation from Stanford that explains it. It made a big difference for me. Part two is more about LSH, but part one covers it as well.
A picture of the overview (there is much more in the slides):
Near Neighbor Search in High Dimensional Data - Part1:
http://www.stanford.edu/class/cs345a/slides/04-highdim.pdf
Near Neighbor Search in High Dimensional Data - Part2:
http://www.stanford.edu/class/cs345a/slides/05-LSH.pdf
LSH is a procedure that takes as input a set of documents/images/objects and outputs a kind of Hash Table.
The indexes of this table contain the documents, such that documents that land at the same index are considered similar and those at different indexes are "dissimilar".
What counts as similar depends on the similarity metric and also on a threshold similarity s, which acts as a global parameter of LSH.
It is up to you to define what the adequate threshold s is for your problem.
It is important to underline that different similarity measures have different implementations of LSH.
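As one concrete example of that, in the minhash-plus-banding construction from Mining of Massive Datasets (mentioned in another answer here), the signature is split into $b$ bands of $r$ rows each, and two items with similarity $s$ become candidates with probability
$P(\text{candidate}) = 1 - (1 - s^r)^b$, with the effective threshold at roughly $s \approx (1/b)^{1/r}$,
so choosing $b$ and $r$ is how that global threshold gets set in practice.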
In my blog, I tried to thoroughly explain LSH for the cases of minHashing (Jaccard similarity measure) and simHashing (cosine distance measure). I hope you find it useful:
https://aerodatablog.wordpress.com/2017/11/29/locality-sensitive-hashing-lsh/
I am a visual person. Here is what works for me as an intuition.
Say each of the things you want to search for approximately are physical objects such as an apple, a cube, a chair.
My intuition for LSH is that it is similar to taking the shadows of these objects. If you take the shadow of a 3D cube you get a 2D square-like shape on a piece of paper, and a 3D sphere will give you a circle-like shadow on a piece of paper.
Eventually, there are many more than three dimensions in a search problem (where each word in a text could be one dimension) but the shadow analogy is still very useful to me.
Now we can efficiently compare strings of bits in software. A fixed length bit string is kinda, more or less, like a line in a single dimension.
So with an LSH, I project the shadows of objects eventually as points (0 or 1) on a single fixed length line/bit string.
The whole trick is to take the shadows such that they still make sense in the lower dimension e.g. they resemble the original object in a good enough way that can be recognized.
A 2D drawing of a cube in perspective tells me it is a cube. But I cannot easily distinguish a 2D square from the shadow of a 3D cube without perspective: they both look like a square to me.
How I present my object to the light will determine if I get a good recognizable shadow or not. So I think of a "good" LSH as the one that will turn my objects in front of a light such that their shadow is best recognizable as representing my object.
So to recap: I think of things to index with an LSH as physical objects like a cube, a table, or chair, and I project their shadows in 2D and eventually along a line (a bit string). And a "good" LSH "function" is how I present my objects in front of a light to get an approximately distinguishable shape in the 2D flatland and later my bit string.
Finally, when I want to search whether an object I have is similar to some objects that I indexed, I take the shadows of this "query" object using the same way of presenting my object in front of the light (eventually ending up with a bit string too). And now I can compare how similar that bit string is to all my other indexed bit strings, which is a proxy for searching over my whole objects, provided I found a good and recognizable way to present my objects to my light.
As a very short, tldr answer:
An example of locality sensitive hashing could be to first set planes randomly (with a rotation and offset) in your space of inputs to hash, then drop the points you want to hash into that space, and for each plane measure whether the point is above or below it (e.g. 0 or 1); the sequence of answers is the hash. Points that are similar in space (in terms of cosine distance) will then have similar hashes.
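In symbols, and only as a sketch of the scheme described above (the offsets $b_i$ correspond to the "offset" mentioned; with $b_i = 0$, i.e. planes through the origin, the collision probability has a clean closed form):
$h_i(x) = \mathbf{1}[\langle w_i, x \rangle + b_i > 0]$, the hash is $(h_1(x), \ldots, h_k(x))$, and for $b_i = 0$, $P[h_i(x) \ne h_i(y)] = \theta(x, y)/\pi$.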
You could read this example using scikit-learn: https://github.com/guillaume-chevalier/SGNN-Self-Governing-Neural-Networks-Projection-Layer
I've been thinking for a few days about the best solution for this but can't seem to find the right approach.
I have pieces (objects) and I want to fit them into the smallest possible space.
What I'm ultimately looking for is something like this
http://i.stack.imgur.com/Yg09E.gif
But a simpler version that just calculates the best possible fit of two lines (stripes) would already do for now,
like the lines (stripes) on the right:
http://i.stack.imgur.com/HijMo.jpg
What I have is 2 arrays of points (vertices) on an xy plane representing two lines (stripes), and I'd like to arrange them in such a manner that there is 10 or 20 mm of space between the closest points of the two.
I was thinking of looking at the first half of the array and finding the highest point, then looking at the second half and finding its highest point, and then comparing the two,
but that doesn't really seem to be a proper solution.
And I can't really imagine that writing a program which fits shapes as in the first image is even possible using such methods.
Can anyone guide me in the right direction?
Well, this is really possible.
All you would have to do is build an area function and a distance function. You might need to add different algorithms for different kinds of shapes.
For the ones you have provided in the first picture, it is difficult to calculate the area, so you will probably have to work from the distances between vertices. Also, you need to add a condition to make sure that the outlines of the shapes do not coincide at any point.
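For the two-stripe case in the question, one possible distance condition (this assumes the stripes are stacked vertically and their profiles are sampled at the same x positions, with $a(x)$ the top profile of the lower stripe and $b(x)$ the bottom profile of the upper one): the smallest upward shift $t$ of the upper stripe that keeps a clearance of at least $d$ (the 10 or 20 mm) everywhere satisfies $\min_x\bigl(b(x) + t - a(x)\bigr) \ge d$, giving
$t^{*} = d + \max_x\bigl(a(x) - b(x)\bigr).$
That is essentially the comparison the question's highest-point idea is reaching for, done over every x rather than only over the two halves.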
I have two images of the real world. (IMPORTANT) I approximately know the transformation from one image to the other. Due to a texture problem I don't get enough matches between the two images. How can I take the transformation information into account to get more (and correct) matches using SIFT?
Any ideas would be helpful.
Have you tried other alternatives? Are you sure SIFT is the answer? First, OpenCV provides SIFT, among other tools. (At the moment, I can't speak highly enough of OpenCV).
If I were solving this problem, I would first try:
Downsample your two images to reduce the influence of "texture", i.e. cvPyrDown.
Perform some feature detection: edge detection, etc. OpenCV provides a Harris corner detector, among others. Google "cvGoodFeaturesToTrack" for some detail.
If you have good confidence in your transformations, take advantage of your a priori information and look for features in neighborhoods corresponding to the transformed locations.
If you still want to look at SIFT or SURF, OpenCV provides those capabilities, as well.
If you know the transform, then apply the transform and then apply SURF/SIFT to the transformed image. That's one standard way to extend the robustness of feature descriptors/matchers across large perspective changes.
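For example, assuming the known transform can be expressed (or approximated) as a 3x3 homography $H$ between the two image planes; this is an assumption, since a general 3D motion over a non-planar scene is not a single homography. You warp one image with it, run SIFT on the warped pair, and map the matched keypoints back through the same matrix:
$\mathbf{x}_2 \simeq H\,\mathbf{x}_1$ and $\mathbf{x}_1 \simeq H^{-1}\,\mathbf{x}_2$ (homogeneous coordinates).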
There is another alternative:
In the SIFT parameters, the contrast threshold defaults to 0.04. If you reduce it to a lower value (0.02 or 0.01), SIFT will find more matches:
SIFT(int nfeatures=0, int nOctaveLayers=3, double contrastThreshold=0.04, double edgeThreshold=10, double sigma=1.6)
The first step, I think, is to experiment with the settings of the SIFT algorithm to find what works best for your problem.
Another way to use SIFT more effectively is to add colour information to it. You can append the colour (RGB) of each point used in the descriptor. For instance, if your descriptor size is 10x128, that shows you are using 10 points in each descriptor; you can then extract and append three columns, making the size 10x(128+3) [R, G, B for each point]. This way the SIFT algorithm works more effectively. But remember, you need to weight your descriptor so that the last three columns count more strongly than the other 128 columns. I do not know what your images look like, but this method helped me a lot, and this modification makes SIFT a stronger method than before.
A similar implementation can be found here.
Is there a way to subtract one geometry from another? A kind of reverse STUnion.
The problem I am having is that I need to ensure one shape fits within another (without changing the larger shape). I thought I could use STIntersection to get the part of the shape that is "in". However, STIntersection is not exact and produces a shape that can (and does) fail to equate to the true intersection.
You can easily see this if you then take the STDifference against the original shape.
So, what I would like to do, given two shapes, is subtract one from the other - e.g. take the STIntersection and then subtract the STDifference.
Any ideas?
Edit: For now, I have created my intersection from an STBuffer(-1) version of the bigger shape; this should account for the numerical variation of STIntersection, with a slight reduction in accuracy. However, I would still love to know whether you can subtract one geometry from another.
Just use .STDifference(). No need to intersect first, then subtract the intersection. Just subtract directly.
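A minimal illustration (the WKT shapes here are made up):
DECLARE @big   geometry = geometry::STGeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0);
DECLARE @small geometry = geometry::STGeomFromText('POLYGON((2 2, 8 2, 8 8, 2 8, 2 2))', 0);
-- everything in @big that is not covered by @small
SELECT @big.STDifference(@small).ToString();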
Did you try STWithin?