PostGIS ST_Distance_Spheroid or Haversine - postgis

I used ST_Distance_Spheroid in PostgreSQL (with PostGIS) to calculate the distance between Woking and Edinburgh as follows:
CREATE TABLE pointsTable (
id serial NOT NULL,
name varchar(255) NOT NULL,
location Point NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO pointsTable (name, location) VALUES
( 'Woking', '(51.3168, -0.56)' ),
( 'Edinburgh', '(55.9533, -3.1883)' );
SELECT ST_Distance_Spheroid(geometry(a.location), geometry(b.location), 'SPHEROID["WGS 84",6378137,298.257223563]')
FROM pointsTable a, pointsTable b
WHERE a.id=1 AND b.id=2;
I got a result of 592km (592,053.100454442 meters).
Unfortunately, when I made the same calculation with various sources on the web, I consistently got around 543 km, which differs by about 8.2%.
source 1 - 338 miles (543.958 km)
source 2 - 544.410km
source 3 - 543.8km
Luckily, the third source clarified that they were using the haversine formula. I am not sure about the other two sources.
Did I do something wrong in my queries or is this down to a difference in the formulas used? If so, which calculation is closest to the shortest distance a crow could fly, keeping a constant elevation?

You swapped the latitude and the longitude: PostGIS points are (x y), that is (longitude latitude). With the coordinates in the right order you get about 544,430 m. Both functions below measure the geodesic, the shortest path between the points over the WGS 84 spheroid, which is why the two results agree.
WITH src AS (
select st_geomfromtext('POINT(-0.56 51.3168)',4326) pt1,
st_geomfromtext('POINT(-3.1883 55.9533)',4326) pt2)
SELECT
ST_DistanceSpheroid(pt1, pt2, 'SPHEROID["WGS 84",6378137,298.257223563]') Dist_sphere,
ST_Distance(pt1::geography, pt2::geography) Dist_great_circle
FROM src;
dist_sphere | dist_great_circle
------------------+-------------------
544430.941199621 | 544430.94119962
(1 row)
On a side note, there is a warning
ST_Distance_Spheroid signature was deprecated in 2.2.0. Please use
ST_DistanceSpheroid
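As a rough cross-check outside the database, the effect of the swapped coordinates can be reproduced with a plain haversine calculation. This is only a sketch: it assumes a spherical Earth with a mean radius of 6,371,008.8 m, and the function name haversine_m is made up for the illustration, so the figures will differ slightly from the spheroid results above.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in metres (spherical model)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

woking = (51.3168, -0.56)        # (latitude, longitude)
edinburgh = (55.9533, -3.1883)

# Coordinates in the intended order.
correct_km = haversine_m(*woking, *edinburgh) / 1000

# Same numbers with latitude and longitude swapped, as in the original INSERT.
swapped_km = haversine_m(woking[1], woking[0], edinburgh[1], edinburgh[0]) / 1000

print(f"correct order: {correct_km:.1f} km")
print(f"swapped order: {swapped_km:.1f} km")
```

The correct order lands close to the 543 to 544 km reported by the web sources, while the swapped order lands close to the 592 km from the question.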


Weighted Average w/ Array Formula & Query That Pulls From A Separate Sheet

So I've got an array formula which I've included below. I need to adjust this so that it becomes a weighted average based on variables stored on a sheet titled Variables.
Current Formula:
=ARRAYFORMULA(QUERY(
{PROPER(ADP!A3:A),ADP!E3:S;
PROPER(ADP!J3:J),ADP!S3:S;
PROPER(ADP!Z3:Z),ADP!AG3:AG},
"select Col1, Sum(Col2)
where
Col2 is not null and
Col1 is not null
group by Col1
order by Sum(Col2)
label
Col1 'PLAYER',
Sum(Col2) 'ADP AVG'"))
Here's what I thought would work but doesn't:
=ARRAYFORMULA(QUERY(
{PROPER(ADP!A3:A),ADP!E3:E*(Variables!$F$11/Variables!$F$14);
PROPER(ADP!J3:J),ADP!S3:S*(Variables!$F$12/Variables!$F$14);
PROPER(ADP!Z3:Z),ADP!AG3:AG*(Variables!$F$13/Variables!$F$14)},
"select Col1, Sum(Col2)
where
Col2 is not null and
Col1 is not null
group by Col1
order by Sum(Col2)
label
Col1 'PLAYER',
Sum(Col2) 'ADP AVG'"))
What I'm trying to get is the value pulled in K multiplied by the value in Variables!F11, the value pulled in Y multiplied by Variables!F12, and the value in AL multiplied by Variables!F13, with that numerator then divided by the value in Variables!F14.
After our extensive chat, I'm providing here the answer we came up with, just on the chance it might somehow help someone else. But the issue in your case was less about the technicalities of the formula, and more about the structuring of multiple data sources, and the associated logic to pull the data together.
Here is the main formula:
={"Adjusted
Ranking
by " & Variables!F21;
arrayformula(
if(A2:A<>"",
( if(((D2:D>0) * Source1Used),D2:D,Variables!$F$21)*Variables!$F$12
+ if(((F2:F>0) * Source2Used),F2:F,Variables!$F$21)*Variables!$F$13
+ if(((H2:H>0) * Source3Used),H2:H,Variables!$F$21)*Variables!$F$14
+ if(((J2:J>0) * Source4Used),J2:J,Variables!$F$21)*Variables!$F$15
+ if(((L2:L>0) * Source5Used),L2:L,Variables!$F$21)*Variables!$F$16
+ if(((N2:N>0) * Source6Used),N2:N,Variables!$F$21)*Variables!$F$17 )) / Variables!$F$18) }
A2:A is the list of players' names. The D2:D>0 is a test of whether that player has a rating obtained from a particular data source.
Source1Used is a named range for a tickbox cell, where the user can indicate whether that data source is to be included in the calculations.
This formula creates an average value, using from 1 to 6 possible sources, user selectable.
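The logic of the main formula can be sketched outside the spreadsheet as follows. This is an illustration only: the function name adjusted_ranking and the sample numbers are hypothetical, with default standing in for Variables!F21, weights for the per-source weight cells, and total_weight for Variables!F18.

```python
def adjusted_ranking(ratings, used, weights, default, total_weight):
    """Weighted average of per-source ratings.

    A source contributes its own rating only if it is switched on and the
    rating is positive; otherwise the default value is substituted, which
    mirrors the IF(...) terms of the sheet formula.
    """
    total = 0.0
    for rating, is_used, weight in zip(ratings, used, weights):
        value = rating if (is_used and rating > 0) else default
        total += value * weight
    return total / total_weight

# Hypothetical player with three sources: the second has no rating and the
# third is switched off, so both fall back to the default value.
score = adjusted_ranking(
    ratings=[10, 0, 30],
    used=[True, True, False],
    weights=[2, 1, 1],
    default=100,
    total_weight=4,
)
print(score)
```

Note that, as in the sheet formula, a switched-off source still contributes the default value times its weight rather than being dropped from the sum.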
The formula that gave the rating value for one specific source is as follows:
={"Rating in
Source1";ArrayFormula(if(A2:A<>"",if(C2:C,vlookup(A2:A,indirect("ADP!$" & ADP!E3 & "$10:" & ADP!E5),ADP!E6-ADP!E4+1,0),0),""))}
This takes a name in column A, checks if it is listed in a specific source's data, and if so, it pulls back the rating value from the data source. INDIRECT is used since the column locations for each data source may vary, but are obtained from a fixed table, in cells ADP!E3 and E5. E4 and E6 are the numeric values of the column letters.

Find valid combinations based on matrix

I have the following matrix in Calc: the first row (1) contains employee numbers, and the first column (A) contains product codes.
Wherever there is an X, that product was sold by the employee in the corresponding column header.
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
I think there are two problems to solve:
get the employee numbers for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you can handle them separately.
1.
To replace the X marks with the respective employee numbers (taken from the header row), you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2.
Now, you'll have to concatenate the employee numbers. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single employee numbers, copy the complete second sheet and paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (product codes) and the last one (with the concatenated employee numbers).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the concatenate function to create the column list plus data type for the CREATE TABLE statement.
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the column names were identical, the mapping was about 99% correct.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata -- here I simply referenced the whole table
),cte
and in the next part I used the same 'worksheet' trick to list and format all the column names and pasted them in.
),cte
AS (SELECT prod_code, -- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table and my colleagues and I are querying our hearts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns:
prodcode | employee
which contains all the unique combinations of product code and employee number, and which is a lot faster and much more practical to query.
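For readers who want to see the two output shapes side by side, here is a minimal sketch of the same unpivot step done outside SQL. It uses only the two sample rows from the question, and the variable names are made up for the illustration.

```python
from collections import defaultdict

employees = ["0302", "0303", "0304", "0402"]   # header row of the matrix
matrix = {                                      # product code -> marks per employee
    "1625": ["X", "", "X", "X"],
    "1643": ["", "X", "X", ""],
}

# Long format: one (product, employee) pair per X mark.
pairs = [
    (product, employee)
    for product, marks in matrix.items()
    for employee, mark in zip(employees, marks)
    if mark == "X"
]

# Wide format: employees joined into one comma-separated string per product.
grouped = defaultdict(list)
for product, employee in pairs:
    grouped[product].append(employee)
joined = {product: ", ".join(emps) for product, emps in grouped.items()}

for product, emps in joined.items():
    print(f"{product} | {emps}")
```

The list of pairs corresponds to the two-column table described in the PS, and the joined strings correspond to the format originally asked for.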

linestring created from coordinates POSTGIS

Can someone help ?
Here is the part of my code (sql) which doesn't work :
SELECT ST_LENGTH(geom) into distance FROM
(SELECT ST_GeographyFromText('srid=4326;linestring(lon_bus lat_bus, lon_stop lat_stop)') AS geom)
AS dis;
lon_bus, lat_bus, lon_stop and lat_stop are coordinates I get from my database. When I try this, I get a parse error, but when I replace these variables with numeric values, it works. How can I keep these variables in my code?
It doesn't work because the variable names inside the quoted WKT string are not substituted, so the WKT is invalid. Remember, WKT is just regular text, so don't confuse WKT with SQL.
You can make a LineString from two point geometries, then cast it to ::geography.
SELECT ST_MakeLine(ST_MakePoint(lon_bus, lat_bus),
ST_MakePoint(lon_stop, lat_stop))::geography AS geog
FROM (
SELECT 1 AS lon_bus, 2 AS lat_bus, 3 AS lon_stop, 4 AS lat_stop
) AS data;
To get the geodesic length, use ST_Length on the geography.
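The point about WKT being plain text can be illustrated with a small sketch. The helper name linestring_ewkt is hypothetical. In SQL itself, the ST_MakePoint approach above avoids building text at all, so this only shows why the quoted string with variable names fails to parse.

```python
def linestring_ewkt(lon_bus, lat_bus, lon_stop, lat_stop, srid=4326):
    """Build an EWKT LineString from numeric coordinates (hypothetical helper)."""
    return f"SRID={srid};LINESTRING({lon_bus} {lat_bus}, {lon_stop} {lat_stop})"

# A literal string keeps the variable names as plain text, which is not valid WKT.
literal = "SRID=4326;LINESTRING(lon_bus lat_bus, lon_stop lat_stop)"

# Formatting the numbers into the text yields something a WKT parser can read.
built = linestring_ewkt(1, 2, 3, 4)

print(literal)
print(built)
```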
Based on the usage, the question isn't about how to make a linestring, but how to calculate the distance between two geographic positions. There are several ways to do this:
SELECT
ST_Distance(bus, stop) AS cartesian_distance,
ST_Distance_Sphere(bus, stop) AS sphere_distance,
ST_Distance(bus::geography, stop::geography) AS geography_distance,
ST_Length(ST_MakeLine(bus, stop)::geography) AS geography_length
FROM (
SELECT ST_MakePoint(lon_bus, lat_bus) AS bus, ST_MakePoint(lon_stop, lat_stop) AS stop
FROM (SELECT 1 AS lon_bus, 2 AS lat_bus, 3 AS lon_stop, 4 AS lat_stop) AS data
) AS data;
-[ RECORD 1 ]------+-----------------
cartesian_distance | 2.82842712474619
sphere_distance | 314283.687770102
geography_distance | 313588.397192902
geography_length | 313588.397192902
The last two get the same result. If you don't need the linestring (e.g. to draw on a map), then the simplest method is the one used for geography_distance.

Is a point within a geographical radius - SQL Server 2008

Given the following data, would it be possible, and if so, what would be the most efficient method, to determine whether the location 'Shurdington' in the first table is contained within the given radius of any of the locations in the second table?
The GeoData column is of the 'geography' type, so using SQL Server's spatial features is an option, as well as using latitude and longitude.
Location GeoData Latitude Longitude
===========================================================
Shurdington XXXXXXXXXX 51.8677979 -2.113189
ID Location GeoData Latitude Longitude Radius
==============================================================================
1000 Gloucester XXXXXXXXXX 51.8907127 -2.274598 10
1001 Leafield XXXXXXXXXX 51.8360519 -1.537438 10
1002 Wotherton XXXXXXXXXX 52.5975151 -3.061798 5
1004 Nether Langwith XXXXXXXXXX 53.2275276 -1.212108 20
1005 Bromley XXXXXXXXXX 51.4152069 0.0292294 10
Any assistance is greatly appreciated.
Create Data
CREATE TABLE #Data (
Id int,
Location nvarchar(50),
Latitude decimal(10,5),
Longitude decimal(10,5),
Radius int
)
INSERT #Data (Id,Location,Latitude,Longitude,Radius) VALUES
(1000,'Gloucester', 51.8907127 ,-2.274598 , 20), -- Increased to 20
(1001,'Leafield', 51.8360519 , -1.537438 , 10),
(1002,'Wotherton', 52.5975151, -3.061798 , 5),
(1004,'Nether Langwith', 53.2275276 , -1.212108 , 20),
(1005,'Bromley', 51.4152069 , 0.0292294 , 10)
Test
Declare your point of interest as a POINT
DECLARE @p GEOGRAPHY = GEOGRAPHY::STGeomFromText('POINT(-2.113189 51.8677979)', 4326);
To find out if it is in the radius of another point:
-- First create a Point.
DECLARE @point GEOGRAPHY = GEOGRAPHY::STGeomFromText('POINT(-2.27460 51.89071)', 4326);
-- Buffer the point (meters) and check if the 1st point intersects
SELECT @point.STBuffer(50000).STIntersects(@p)
Combining it all into a single query:
select *,
GEOGRAPHY::STGeomFromText('POINT('+
convert(nvarchar(20), Longitude)+' '+
convert( nvarchar(20), Latitude)+')', 4326)
.STBuffer(Radius * 1000).STIntersects(@p) as [Intersects]
from #Data
Gives:
Id Location Latitude Longitude Radius Intersects
1000 Gloucester 51.89071 -2.27460 20 1
1001 Leafield 51.83605 -1.53744 10 0
1002 Wotherton 52.59752 -3.06180 5 0
1004 Nether Langwith 53.22753 -1.21211 20 0
1005 Bromley 51.41521 0.02923 10 0
Re: efficiency. With correct indexing, it appears SQL Server's spatial indexes can be very quick.
If you want to do the maths yourself, you could use the equirectangular approximation, which is based on Pythagoras. With latitudes and longitudes in radians and R the Earth's radius, the formula is:
var x = (lon2-lon1) * Math.cos((lat1+lat2)/2);
var y = (lat2-lat1);
var d = Math.sqrt(x*x + y*y) * R;
In terms of SQL, this should give those locations in your 2nd table that contain your entry in the 1st within their radius:
SELECT *
FROM Table2 t2
WHERE EXISTS (
SELECT 1 FROM Table1 t1
WHERE
ABS (
SQRT (
(SQUARE((RADIANS(t2.longitude) - RADIANS(t1.longitude)) * COS((RADIANS(t2.Latitude) + RADIANS(t1.Latitude))/2))) +
(SQUARE(RADIANS(t1.Latitude) - RADIANS(t2.Latitude)))
) * 6371 --Earth radius in km, use 3959 for miles
)
<= t2.Radius
)
Note that this is not the most accurate method available but is likely good enough. If you are looking at distances that stretch across the globe, you may wish to look up the haversine formula.
It may be worth comparing this with Paddy's solution to see how well they agree and which performs best.
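To get a feel for the numbers, the equirectangular approximation can be run against the sample data directly. This is a sketch under stated assumptions: a spherical Earth with a radius of 6371 km, a made-up function name equirectangular_km, and the Gloucester radius of 20 km from the test data above rather than the 10 km in the question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth radius in km

def equirectangular_km(lat1, lon1, lat2, lon2):
    """Approximate distance in km; inputs in degrees, converted to radians first."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    x = math.radians(lon2 - lon1) * math.cos((phi1 + phi2) / 2)
    y = phi2 - phi1
    return math.hypot(x, y) * EARTH_RADIUS_KM

shurdington = (51.8677979, -2.113189)  # (latitude, longitude)

# (name, latitude, longitude, radius in km)
locations = [
    ("Gloucester", 51.8907127, -2.274598, 20),
    ("Leafield", 51.8360519, -1.537438, 10),
    ("Wotherton", 52.5975151, -3.061798, 5),
    ("Nether Langwith", 53.2275276, -1.212108, 20),
    ("Bromley", 51.4152069, 0.0292294, 10),
]

# Keep the locations whose radius reaches Shurdington.
within = [
    name
    for name, lat, lon, radius_km in locations
    if equirectangular_km(*shurdington, lat, lon) <= radius_km
]
print(within)
```

Shurdington comes out at a little over 11 km from Gloucester, which is consistent with the buffer-based answer matching only after the radius was raised from 10 to 20.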
You calculate the distance between the two points and compare this distance to the given radius.
For calculating short distances, you can use the formula at Wikipedia - Geographical distance - Spherical Earth projected to a plane, which claims to be "very fast and produces fairly accurate result for small distances".
According to the formula, you need the difference in latitudes and longitudes and the mean latitude
with geo as (select g1.id, g1.latitude as lat1, g1.longitude as long1, g1.radius,
g2.latitude as lat2, g2.longitude as long2
from geography g1
join geography g2 on g2.location = 'shurdington'
and g1.location <> 'shurdington'),
base as (select id,
(radians(lat1) - radians(lat2)) as dlat,
(radians(long1) - radians(long2)) as dlong,
(radians(lat1) + radians(lat2)) / 2 as mlat, radius
from geo),
dist as (select id,
6371.009 * sqrt(square(dlat) + square(cos(mlat) * dlong)) as distance,
radius
from base)
select id, distance
from dist
where distance <= radius
I used the with selects as intermediate steps to keep the calculations "readable".

How to create a circle in meters in postgis?

I would like to ask how to create a circle with a radius of 4 km. I have tried the ST_Buffer function, but it creates a larger circle. (I view the created circle by inserting its polygon into a new KML file.)
This is what I am trying:
INSERT INTO camera(geom_circle) VALUES(geometry(ST_Buffer(geography(ST_GeomFromText('POINT(21.304116745663165 38.68607570952619)')), 4000)))
The center of the circle is a lon lat point but I don't know its SRID because I have imported it from a kml file.
Do I need the SRID in order to transform the geometries etc?
KML files are always lat/long and use SRID=4326. This SRID is implied if you use geography. Geography is a good way to mix-in the 4 km metric measure on lat/long data ... excellent you tried this!
Try this statement to fix up the casts, and use a parameterized point constructor:
SELECT ST_Buffer(ST_MakePoint(21.304116745663165, 38.68607570952619)::geography, 4000);
And if you need to cast this back to geometry, add a ::geometry cast to the end.
Update on accuracy
The previous answer internally re-projects the geometry (usually) to a UTM zone that the point fits within (see ST_Buffer). This may cause minor distortions if the point is on the edge of two UTM boundaries. Most folks won't care about the size of these errors, which are often several meters. However, if you require sub-millimeter precision, consider building a dynamic azimuthal equidistant projection. This requires PostGIS 2.3's ST_Transform, and is adapted from another answer:
CREATE OR REPLACE FUNCTION geodesic_buffer(geom geometry, dist double precision,
num_seg_quarter_circle integer)
RETURNS geometry AS $$
SELECT ST_Transform(
ST_Buffer(ST_Point(0, 0), $2, $3),
('+proj=aeqd +x_0=0 +y_0=0 +lat_0='
|| ST_Y(ST_Centroid($1))::text || ' +lon_0=' || ST_X(ST_Centroid($1))::text),
ST_SRID($1))
$$ LANGUAGE sql IMMUTABLE STRICT COST 100;
CREATE OR REPLACE FUNCTION geodesic_buffer(geom geometry, dist double precision)
RETURNS geometry AS 'SELECT geodesic_buffer($1, $2, 8)'
LANGUAGE sql IMMUTABLE STRICT COST 100;
-- Optional wrappers for geography type
CREATE OR REPLACE FUNCTION geodesic_buffer(geog geography, dist double precision)
RETURNS geography AS 'SELECT geodesic_buffer($1::geometry, $2)::geography'
LANGUAGE sql IMMUTABLE STRICT COST 100;
CREATE OR REPLACE FUNCTION geodesic_buffer(geog geography, dist double precision,
num_seg_quarter_circle integer)
RETURNS geography AS 'SELECT geodesic_buffer($1::geometry, $2, $3)::geography'
LANGUAGE sql IMMUTABLE STRICT COST 100;
A simple example to run one of the functions is:
SELECT geodesic_buffer(ST_MakePoint(21.304116745663165, 38.68607570952619)::geography, 4000);
And to compare the distances to each of the buffered points, here are the lengths of each geodesic (shortest path on an ellipsoid of revolution, i.e. WGS84). First this function:
SELECT count(*), min(buff_dist), avg(buff_dist), max(buff_dist)
FROM (
SELECT ST_Distance((ST_DumpPoints(geodesic_buffer(poi, dist)::geometry)).geom, poi) AS buff_dist
FROM (SELECT ST_MakePoint(21.304116745663165, 38.68607570952619)::geography AS poi, 4000 AS dist) AS f
) AS f;
count | min | avg | max
-------+----------------+-----------------+----------------
33 | 3999.999999953 | 3999.9999999743 | 4000.000000001
Compare this to ST_Buffer (first part of the answer), which is off by about 1.56 m:
SELECT count(*), min(buff_dist), avg(buff_dist), max(buff_dist)
FROM (
SELECT ST_Distance((ST_DumpPoints(ST_Buffer(poi, dist)::geometry)).geom, poi) AS buff_dist
FROM (SELECT ST_MakePoint(21.304116745663165, 38.68607570952619)::geography AS poi, 4000 AS dist) AS f
) AS f;
count | min | avg | max
-------+----------------+------------------+----------------
33 | 4001.560675049 | 4001.56585986067 | 4001.571105793
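The idea of a buffer whose vertices all sit at the same geodesic distance from the centre can also be sketched without PostGIS. The version below is a simplification and not the method used above: it assumes a spherical Earth (mean radius 6,371,008.8 m) instead of the WGS84 ellipsoid, so it will not reproduce the sub-millimeter figures, and the function names are made up for the illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in metres (spherical model)

def destination(lat, lon, bearing_deg, dist_m):
    """Point reached from (lat, lon) after dist_m metres along bearing_deg, on a sphere."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = dist_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def circle(lat, lon, radius_m, segments=32):
    """Vertices of a circle around (lat, lon), one per evenly spaced bearing."""
    return [destination(lat, lon, 360.0 * i / segments, radius_m)
            for i in range(segments)]

center = (38.68607570952619, 21.304116745663165)  # (lat, lon) from the question
ring = circle(*center, 4000)

# Measure each vertex back to the centre, similar in spirit to the ST_Distance check.
dists = [haversine_m(*center, lat, lon) for lat, lon in ring]
print(min(dists), max(dists))
```

On the sphere, every vertex is the requested 4000 m from the centre up to floating-point error. The ring spans more degrees of longitude than of latitude at this latitude, which is expected for a circle measured in meters.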
