Calculating distance between GPS data using SQL Server - sql-server

I have tables in SQL Server Management Studio containing the locations of individuals by date/time over several months. The tables have the following fields: AnimalID, Date/Time, Lat, Long, Global ID. I am trying to calculate and return the distance between each consecutive pair of points, in the order of movement, without manually entering the lat and long each time. There are many posts here about calculating the distance between two points, but I'm trying to run a query that will calculate the distance between each consecutive pair. Some of my tables have hundreds of locations.
My values might look like:
`MD001 10/9/2019 1:00:00PM 40.73995 -111.8739
MD001 10/9/2019 6:00:00PM 40.75068 -111.8782
MD001 10/9/2019 10:00:00PM 40.74900 -111.89100`
I want to know the distance between the 1:00PM and 6:00PM points, then between the 6:00PM and 10:00PM points, and so forth. I want to accomplish this in SQL Server so that I can query out outliers in the data. Your insight is much appreciated. I also do not want to create a new field in this table.

The algorithm to calculate the distance between two points on a sphere is called the Haversine formula.
To calculate the distance between 2 points in SQL Server you have 2 options:
POINT 1 = 151.209030,-33.877814
POINT 2 = 144.971431, -37.808694
Option 1. You can do your own implementation of the Haversine formula:
select
2 * 6371 * asin(sqrt(POWER((sin(radians((-37.808694 - -33.877814) / 2))),2) + cos(radians(-33.877814)) * cos(radians(-37.808694)) * POWER((sin(radians((144.971431 - 151.209030) / 2))),2)))
Note this will give you the distance in kilometers; that is determined by the Earth-radius multiplier 6371. To get the distance in miles, replace 6371 with 3959.
If you search for the Haversine formula + SQL you can find more in-depth details about this implementation.
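For reuse against table columns rather than literals, the same formula could be wrapped in a scalar function; this is only a sketch, and the function name dbo.HaversineKm is an assumption:
CREATE FUNCTION dbo.HaversineKm
(
    @lat1 float, @lon1 float,
    @lat2 float, @lon2 float
)
RETURNS float
AS
BEGIN
    -- 6371 = mean Earth radius in km; use 3959 instead for miles
    RETURN 2 * 6371 * ASIN(SQRT(
        POWER(SIN(RADIANS((@lat2 - @lat1) / 2)), 2) +
        COS(RADIANS(@lat1)) * COS(RADIANS(@lat2)) *
        POWER(SIN(RADIANS((@lon2 - @lon1) / 2)), 2)));
END;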
Option 2. Use SQL Server built-in functions.
In order to do that you'll need to convert your lat and long values to the geography data type and then use the STDistance function to calculate the actual distance.
The statement below should give you an idea to get started:
select
cast('POINT(151.209030 -33.877814)' as geography).STDistance(cast('POINT(144.971431 -37.808694)' as geography)) as distance_in_meters,
cast('POINT(151.209030 -33.877814)' as geography).STDistance(cast('POINT(144.971431 -37.808694)' as geography)) / 1000 as distance_in_km
The default result will be in meters.
Note there's a slight difference between these two options when they are applied to the same coordinates, so if you need precision you might want to investigate further why that is.
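To apply this to consecutive locations per animal, a window function such as LAG can pair each fix with the previous one and feed both into STDistance. This is only a sketch; the table name dbo.AnimalLocations and the column name LocDateTime stand in for your actual table and Date/Time field:
with fixes as (
    select
        AnimalID,
        LocDateTime,
        Lat,
        Long,
        lag(Lat)  over (partition by AnimalID order by LocDateTime) as PrevLat,
        lag(Long) over (partition by AnimalID order by LocDateTime) as PrevLong
    from dbo.AnimalLocations
)
select
    AnimalID,
    LocDateTime,
    case
        when PrevLat is null then null  -- the first fix for each animal has no previous point
        else geography::Point(Lat, Long, 4326).STDistance(geography::Point(PrevLat, PrevLong, 4326))
    end as meters_from_previous
from fixes
order by AnimalID, LocDateTime;
No extra column is stored in the table; the distance is computed on the fly, so you can filter the result for outliers.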

Related

Google Data Studio: how to obtain a SUM related to a COUNT_DISTINCT?

I have a dataset including 3 columns:
ID transac (The unique ID of the transaction - Dimension)
Source (The source of the transaction - Dimension)
Amount € (The amount of the transaction - Stat)
screenshot of my dataset
To count the number of transactions (for one or more sources), I use the COUNT_DISTINCT function.
I want to sum the transaction amounts (for one or more sources), but I don't want to add up the amounts of transactions that share the same ID!
Is there a way to do this calculation with a Data Studio function?
Thanks for your answers. :-)
EDIT: I saw that this type of calculation can be done via SQL here, and I would like to do it in Data Studio (so that I don't have to pre-calculate the amounts per source).
IMO, your dataset contains data in the wrong shape. Each value should relate only to its own row, but this is not the case: if the total is 20, each row should describe that row's contribution to the total. With 4 sources, each row should be 5, or some other values that sum to 20.
To solve this in Data Studio, you would need something like the CALCULATE function in Power BI, but currently Data Studio doesn't support that feature.
But there are some options to consider to repair your data:
If you're sure there are always 4 sources, just create a new calculated field with the expression Amount/4 and SUM it. It is not an elegant solution, but it works.
If your data source is Google Sheets, you can easily repair the data using formulas, like in this example:
Link to spreadsheet
For this spreadsheet, I used this formula in the adjusted_amount column: =C2/COUNTIF(A:A,A2). With this column in Data Studio, just use the usual SUM aggregation function to summarize it correctly.
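For reference, the SQL approach linked in the question boils down to de-duplicating by transaction ID before summing. A minimal sketch, with the table name transactions and its column names as assumptions:
select source,
       sum(amount) as total_amount
from (
    -- keep a single row per transaction so duplicated IDs are not double-counted
    select distinct transaction_id, source, amount
    from transactions
) t
group by source;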

Creating a Measure to calculate Percentage in Tableau

Normally in Tableau, to calculate a %, I create a measure and use the logic below:
SUM ([MEASURE_1])/ SUM([MEASURE_2])
However, I am trying to get the % using a measure and an already aggregated measure, and I'm getting the error below:
Argument to SUM (an aggregate function) is already an aggregation, and cannot be further aggregated.
My percentage is meant to be the % difference between a policy count from table 1, which is a measure, and measure 2, which I had to create to get the count of policies from table 2 as follows: Count([Policy]). I'm using a SQL Server database.
Any ideas how I can resolve this issue?
Thanks!
Measure 1 is a count from table 1; the data looks like the following:
Measure 2 is from table 2; I am creating a count in Tableau of the policy number. The data looks like the below:
The two tables join on agent number.
Table 1 structure:
[UNIQUEPOL_CNT]
,[UNIQUEAGT_CNT]
,[VECHICLE_TXT]
,[VEHICLE_DESC]
,[VEHICLE_CD]
,[CODE_DESC]
,[POLICY_ID]
,[POLICY_NBR]
,[STATE_CD]
,[AGENT_NBR]
,[AGENT_NM]
,[PROP_ID]
,[WRITTEN_DT]
,[PURCHASE_DT]
,[SUSP_IND]
,[SUSPEND_DT]
,[BIND_ID]
,[INSURED_NM]
,[CHANGEDBY_ID]
Table 2 structure:
[REPORT_DT]
,[AGENT_NBR]
,[TRR_TOTALPIF_CNT]
,[TRR_TOTALPON_CNT]
,[TRR_TOTALREV_CNT]
,[TRR_TOTALERR_CNT]
This is the invalid output...
UPDATED
I created two datasets to demonstrate a solution:
dataset1
dataset 2
Connecting both datasets in Tableau.
I built a view in Tableau as follows (note the Policies field is blue because it has been converted to a dimension):
Now calculate a field as
ATTR({FIXED [Agent code]: count([Policy no])}/[Policies])
Adding this to the viz gives me a view in which the desired percentages are displayed.
Note: since the tables have a one-to-many relationship, a FIXED statement on the measure from the 'one' side of the relationship will also produce the same result:
ATTR({FIXED [Agent code]: count([Policy no])}/
{FIXED [Agent code]: AVG([Policies])})
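For comparison, a similar per-agent ratio could be pre-computed in SQL Server before connecting to Tableau. This is only an illustrative sketch; which column represents each policy count is an assumption based on the structures listed above:
select
    a.AGENT_NBR,
    b.policy_cnt_t2 * 1.0 / a.policy_cnt_t1 as policy_pct
from (
    -- per-agent policy count from table 1 (assumed to be one row per policy)
    select AGENT_NBR, count(distinct POLICY_NBR) as policy_cnt_t1
    from Table1
    group by AGENT_NBR
) a
join (
    -- per-agent row count from table 2 (mirrors Count([Policy]) in Tableau)
    select AGENT_NBR, count(*) as policy_cnt_t2
    from Table2
    group by AGENT_NBR
) b
    on b.AGENT_NBR = a.AGENT_NBR;
Aggregating each table separately before joining avoids inflating the counts through the one-to-many relationship, which is what the FIXED expression does on the Tableau side.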

How to resolve a latitude/longitude to a polygon in postgis

I have a list of 13000 places (with latitude and longitude) --- in table : place.
I have a list of 22000 polygons ---- in another table called place_polygon.
I need to resolve the POIs to the polygons that they belong to.
This is the query that I wrote :
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Within(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom);
also tried :
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Intersects(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom);
It's running forever.
But, if I put a filter in the query, then it runs very fast for a single record.
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Within(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom)
where a.id = <id>;
I also tried writing a stored procedure and tried to loop through a cursor to only do for one record at a time. That also didn't help. The program ran overnight with no signs of ending.
Is there a better way to solve this? (not necessarily in postgis, but in python geopy etc... ? )
(Should I consider indexing the tables?)
First of all, use the geography type for your data instead of plain lat/long columns. Why geography and not geometry? Because you use SRID=4326, and with the geography type it will be much easier if you want to, for example, calculate distances in meters, whereas the geometry type would calculate them in degrees for this SRID.
To create a geography from your lat/long columns, use st_setsrid(st_makepoint(long, lat), 4326)::geography.
OK. Now, answering your question on your actual structure:
I have a list of 13000 places (with latitude and longitude) --- in table : place. I have a list of 22000 polygons ---- in another table called place_polygon. I need to try and resolve the pois to the polygons that they belong to.
This is the query that I wrote :
select *
from stg_place.place a
left join stg_place.place_polygon b on
ST_DWithin(st_setsrid(st_makepoint(a.longitude, a.latitude), 4326), b.geom, 0);
I used ST_DWithin() instead of ST_Within() because on older versions of Postgres + PostGIS (certainly 9.6 and below) it guarantees the use of a spatial index on the geometries, if one has been created.
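Regarding the indexing question: a GiST index on the polygon geometry column is what lets ST_DWithin/ST_Within avoid comparing every place against every polygon. A minimal sketch (the index name is arbitrary):
-- spatial index on the polygons; ST_DWithin / ST_Within can then use it
create index place_polygon_geom_idx
    on stg_place.place_polygon
    using gist (geom);

-- refresh planner statistics so the new index is considered
analyze stg_place.place_polygon;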

Generating Working Hours using SQL Server Query

I have this data and I need to generate a query that will give the output below
You can do this kind of grouping of rows with two separate ROW_NUMBER()s: have one over all the data ordered by date, and a second one partitioned by code and ordered by date. To separate the groups from the data, use the difference between these two ROW_NUMBER()s; when it changes, a new block of data starts. You can then use that number in GROUP BY and take the minimum/maximum dates for each block.
For the final layout you can use PIVOT or SUM + CASE; most likely you will want a new ROW_NUMBER() to get the rows aligned properly. Depending on whether data can be missing or not matching, you will probably need additional checks.
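A minimal sketch of that ROW_NUMBER() difference ("gaps and islands") approach; since the original data isn't shown, the table and column names (WorkLog, Code, LogDate) are assumptions:
with numbered as (
    select
        Code,
        LogDate,
        row_number() over (order by LogDate)                   as rn_all,
        row_number() over (partition by Code order by LogDate) as rn_code
    from dbo.WorkLog
)
select
    Code,
    min(LogDate) as block_start,
    max(LogDate) as block_end
from numbered
group by Code, rn_all - rn_code   -- the difference is constant within each consecutive block
order by block_start;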

I tried all ways, but still my area is calculated wrongly in Postgis

I created a very simple polygon in the middle of Germany to demonstrate my problem.
You can visualize it in geojsonlint using the following GeoJSON
{"type":"Polygon","coordinates":[[
[10.439844131469727,51.17460781257472],
[10.430574417114258,51.1753073564544],
[10.429565906524658,51.17179607723465],
[10.438792705535889,51.170706315523866],
[10.439372062683105,51.17267055874809],
[10.43975830078125,51.17439256616884],
[10.439844131469727,51.17460781257472]]]}
When calculating the surface with online tools (e.g. http://www.daftlogic.com/projects-google-maps-area-calculator-tool.htm, but I tried several),
I get the following numbers (these are based on a similar drawing of the polygon, but not the exact same one, as I couldn't copy it over to these tools):
276583.39 m²
0.28 km²
68.35 acres
27.66 hectares
2977118.86 feet²
0.08 square nautical miles
Now I want to calculate these areas using PostGIS, but I always get wrong, non-matching numbers.
First I started without transformation using the examples given here:
http://postgis.net/docs/ST_Area.html
SELECT ST_Area(the_geom) As sqft, ST_Area(the_geom)*POWER(0.3048,2) As sqm
FROM (SELECT ST_GeomFromText('
POLYGON ((51.17460781257472 10.439844131469727,
51.1753073564544 10.430574417114258,
51.17179607723465 10.429565906524658,
51.170706315523866 10.438792705535889,
51.17267055874809 10.439372062683105,
51.17439256616884 10.43975830078125,
51.17460781257472 10.439844131469727))',4326) ) As foo(the_geom);
--> sqft = 3.52643124351653e-05 and sqm = 3.27616182873666e-06
How can I interpret these numbers?
Then I tried to transform it to WGS 84 / UTM zone 33N (EPSG:32633):
SELECT ST_Area(the_geom) As sqft, ST_Area(the_geom)*POWER(0.3048,2) As sqm
FROM (SELECT ST_Transform(ST_GeomFromText('
POLYGON ((51.174661624019286 10.440187454223633,
51.17067940750161 10.438899993896484,
51.17197097486416 10.429544448852539,
51.17536116708255 10.430488586425781,
51.174661624019286 10.440187454223633))',4326),32633) ) As foo(the_geom);
--> sqft = 662918.939349234 and sqm = 61587.1847391195
But even these numbers don't come close.
The coordinates of the polygon were accidentally loaded as lat,lon instead of lon, lat.
http://postgis.net/2013/08/18/tip_lon_lat
says
In spatial databases spatial coordinates are in x = longitude, and y = latitude
I converted the coordinates into EPSG:31467 (see epsg:31467), which is projected in meters and applies to the area of Germany covered by your geometry.
select st_area(
         st_transform(
           st_setsrid(
             st_geomfromtext('POLYGON((10.439844131469727 51.17460781257472,
                                       10.430574417114258 51.1753073564544,
                                       10.429565906524658 51.17179607723465,
                                       10.438792705535889 51.170706315523866,
                                       10.439372062683105 51.17267055874809,
                                       10.43975830078125 51.17439256616884,
                                       10.439844131469727 51.17460781257472))'),
             4326),
           31467));
and got the answer: 274442.27 m², which is within about 0.8% of your original answer.
Measurements are usually more accurate in projected coordinate systems that use a geoid appropriate to that region. If you run this query on the spatial reference system table in Postgis for that projection:
select * from spatial_ref_sys where srid=31467;
you will see some more details, such as the fact that it uses the Bessel 1841 spheroid.
EDIT: your original GeoJSON has coordinates in x/y (lon/lat) order, but for some reason you flipped them when putting them into PostGIS.
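As an alternative to choosing a local projection, the corrected lon/lat polygon can also be cast to geography, in which case ST_Area returns square meters directly; a sketch:
select st_area(
    st_geomfromtext('POLYGON((10.439844131469727 51.17460781257472,
                              10.430574417114258 51.1753073564544,
                              10.429565906524658 51.17179607723465,
                              10.438792705535889 51.170706315523866,
                              10.439372062683105 51.17267055874809,
                              10.43975830078125 51.17439256616884,
                              10.439844131469727 51.17460781257472))', 4326)::geography
) as area_sqm;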
