I have a list of 13,000 places (with latitude and longitude) in a table called place.
I have a list of 22,000 polygons in another table called place_polygon.
I need to resolve the POIs to the polygons they belong to.
This is the query that I wrote:
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Within(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom);
I also tried:
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Intersects(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom);
It's running forever.
But if I put a filter on the query, then it runs very fast for a single record:
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Within(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom)
where a.id = <id>;
I also tried writing a stored procedure that loops through a cursor and processes one record at a time. That didn't help either; the program ran overnight with no sign of finishing.
Is there a better way to solve this? (Not necessarily in PostGIS; maybe in Python with geopy etc.?)
(Should I consider indexing the tables?)
First of all, use the geography type for your data instead of separate lat/long columns. Why geography and not geometry? Because your data is in SRID 4326, and with the geography type it is much easier to, for example, calculate distances in meters; with the geometry type, distances for this SRID come out in degrees.
To build a geography value from your lat/long columns, use ST_SetSRID(ST_MakePoint(long, lat), 4326)::geography.
OK. Now, answering your question given your actual structure:
I have a list of 13000 places (with latitude and longitude) --- in table : place. I have a list of 22000 polygons ---- in another table called place_polygon. I need to try and resolve the pois to the polygons that they belong to.
This is the query that I wrote:
select *
from stg_place.place a
left join stg_place.place_polygon b on
ST_DWithin(ST_SetSRID(ST_MakePoint(a.longitude, a.latitude), 4326), b.geom, 0);
I used ST_DWithin() instead of ST_Within() because on older versions of Postgres + PostGIS (certainly 9.6 and below) it guarantees use of a spatial index on the geometries, if one is created.
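And to answer the indexing question: yes, index the tables. Without a spatial index on the polygons, every point is compared against all 22,000 polygons. A sketch, assuming the table and column names from the question (the persisted point column named geom is an assumption, not something the question already has):

```sql
-- Spatial index on the polygon geometries (table/column names from the question)
CREATE INDEX place_polygon_geom_idx
    ON stg_place.place_polygon USING gist (geom);

-- Optional: store the point once instead of rebuilding it from text on every
-- row, and index it as well (the column name "geom" here is an assumption)
ALTER TABLE stg_place.place ADD COLUMN geom geometry(Point, 4326);
UPDATE stg_place.place
   SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
CREATE INDEX place_geom_idx ON stg_place.place USING gist (geom);
```

With both indexes in place, the join above should use an index scan on the polygon side instead of a nested full scan.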
Related
I have tables in SQL Server Management Studio containing the location of individuals by date/time over several months. The tables have the following fields: AnimalID, Date/Time, Lat, Long, Global ID. I am trying to calculate and return the distance between each pair of points in the order of movement, without manually entering the lat and long each time. There are many posts here about calculating the distance between two points, but I'm trying to run a query that calculates the distance between each pair in consecutive order. Some of my tables have hundreds of locations.
My values might look like:
`MD001 10/9/2019 1:00:00PM 40.73995 -111.8739
MD001 10/9/2019 6:00:00PM 40.75068 -111.8782
MD001 10/9/2019 10:00:00PM 40.74900 -111.89100`
I want to know the distance between 1:00PM and 6:00PM and then from 6:00PM and 10:00PM, and so forth. I want to accomplish this in SQL Server so that I can query out outliers in the data. Your insight is much appreciated. I also do not want to create a new field in this table.
The formula to calculate the distance between two points on a sphere is called the Haversine formula.
To calculate the distance between 2 points in SQL Server you have 2 options:
POINT 1 = 151.209030,-33.877814
POINT 2 = 144.971431, -37.808694
Option 1. You can do your own implementation of the haversine formula:
select
    2 * 6371 * asin(
        sqrt(
            power(sin(radians((-37.808694 - -33.877814) / 2)), 2)
            + cos(radians(-33.877814)) * cos(radians(-37.808694))
            * power(sin(radians((144.971431 - 151.209030) / 2)), 2)
        )
    )
Note this will give you the distance in kilometers; that is set by the multiplier 6371, the Earth's mean radius in km. To get the distance in miles, replace 6371 with 3959.
If you do a search on the haversine formula + SQL, you can find more in-depth details about this implementation.
Option 2. Use SQL Server built-in functions.
In order to do that you'll need to convert your lat and long columns to geography datatype and then use the STDistance function to calculate the actual distance.
The statement below should give you an idea to get started:
select
cast('POINT(151.209030 -33.877814)' as geography).STDistance(cast('POINT(144.971431 -37.808694)' as geography)) as distance_in_meters,
cast('POINT(151.209030 -33.877814)' as geography).STDistance(cast('POINT(144.971431 -37.808694)' as geography)) / 1000 as distance_in_km
The default result will be in meters.
Note there's a slight difference between these two options when applied to the same coordinates: the haversine formula assumes a spherical Earth, while STDistance uses an ellipsoidal model by default. So if you need precision, you may want to investigate that further.
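To apply either option to consecutive fixes per animal without entering coordinates by hand, you can pair each row with the previous one using a window function. A sketch, assuming SQL Server 2012+ for LAG(); the column names come from the question, but the table name locations is an assumption:

```sql
-- Pair each fix with the previous one per animal, then measure the hop
WITH ordered AS (
    SELECT AnimalID, [Date/Time], Lat, [Long],
           LAG(Lat)    OVER (PARTITION BY AnimalID ORDER BY [Date/Time]) AS prev_lat,
           LAG([Long]) OVER (PARTITION BY AnimalID ORDER BY [Date/Time]) AS prev_long
    FROM locations   -- table name assumed
)
SELECT AnimalID, [Date/Time],
       geography::Point(Lat, [Long], 4326)
           .STDistance(geography::Point(prev_lat, prev_long, 4326)) AS distance_in_meters
FROM ordered
WHERE prev_lat IS NOT NULL;  -- the first fix of each animal has no predecessor
```

This reads the table once, needs no new column, and makes outliers easy to spot by filtering on distance_in_meters.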
I have a table in SQL Server that contains GPS "tracks" stored as latitude / longitude Decimal columns. A track is simply a series of connected points that are recorded by the GPS.
When adding a new track I need to query the database to see if it matches any existing tracks. Since recorded GPS coordinates are not exactly the same, it must allow for an error margin. Can anyone suggest an efficient way to do this?
Try this
SELECT * FROM (SELECT * FROM tracks AS table1
    WHERE longitude BETWEEN 75.294472 - 0.01 AND 75.294472 + 0.01) AS table2
WHERE latitude BETWEEN 19.881256 - 0.01 AND 19.881256 + 0.01
here 0.01 is the margin on each side of the point
table1 contains all the rows that satisfy the longitude condition, which are then filtered again in table2 by the latitude condition
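The same idea with a configurable margin, as a sketch in SQL Server syntax (the question mentions SQL Server; the table and column names are taken from the query above). A composite index on (longitude, latitude) would let this filter use an index seek instead of a full scan:

```sql
-- Bounding-box match with an adjustable error margin in degrees
DECLARE @lat float = 19.881256, @long float = 75.294472, @margin float = 0.01;

SELECT *
FROM tracks
WHERE longitude BETWEEN @long - @margin AND @long + @margin
  AND latitude  BETWEEN @lat  - @margin AND @lat  + @margin;
```

Note that a fixed margin in degrees of longitude corresponds to a varying ground distance depending on latitude, so for large latitude ranges a geography-based distance test is more accurate.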
In my Postgres 9.5 database with PostGIS 2.2.0 installed, I have a table buildings with geometric point data in the column centroid. The table contains about 3 million buildings, of which about 300,000 contain special information.
Now, for each buildings.gid I want to know how many other buildings of the same table are within a certain radius (I want to test this for different radii: 20 m, 50 m, 100 m, 200 m, 500 m, if it can be done in an adequate amount of time) and add this information to a column of buildings. The related columns are N20, N50, ...
Query
I figured out to use something like:
UPDATE buildings
SET N50=sub.N
FROM (SELECT Count(n.gid) AS N
FROM buildings n, buildings b
WHERE ST_DWithin(b.centroid, n.centroid, 50) -- distance in meter
) sub
This is related to this solution by @ErwinBrandstetter, where a single coordinate is given and the radius is drawn around it. But even when testing for only one gid, I did not receive a result in an acceptable amount of time.
The difference to my problem is that I want this done for every building.
Table definitions
CREATE TABLE public.buildings
(
gid integer NOT NULL DEFAULT nextval('buildings_gid_seq'::regclass),
osm_id character varying(11),
name character varying(48),
type character varying(16),
geom geometry(MultiPolygon,4326),
centroid geometry(Point,4326),
gembez character varying(50),
gemname character varying(50),
krsbez character varying(50),
krsname character varying(50),
pv boolean,
gr numeric,
capac double precision,
instdate date,
pvid integer,
dist double precision,
gemewz integer,
n50 integer,
n100 integer,
n200 integer,
n500 integer,
n1000 integer,
IBASE numeric,
CONSTRAINT buildings_pkey PRIMARY KEY (gid)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.buildings
OWNER TO postgres;
CREATE INDEX build_centroid_gix
ON public.buildings
USING gist
(st_transform(centroid, 31467));
CREATE INDEX buildings_geom_idx
ON public.buildings
USING gist
(geom);
Advanced Problem
(The following might be another problem, hence should be another question on stackoverflow, but there might be the chance to implement this in the first question)
Furthermore, referring to the "special information": 268,238 of the buildings contain information about dist, instdate and capac. These columns are NULL for the remaining buildings.
instdate is the date at which a building had a "PV" installed. I need to transform the table buildings into a panel-data table, meaning that for each period (in my case 11 periods) there is one row for the same building.
Now I need to check how many other buildings within the radius already had a "PV" installed.
To do so, I want to query all buildings within a radius (as in the first question) where, for example, capac IS NOT NULL; but now those buildings shall not be counted. Instead, their information about dist, instdate and capac shall be added as a string to IBASE.
Try building an index on a geography cast, which can be used by ST_DWithin (so you can calculate metric distances with geographic data):
CREATE INDEX buildings_geog_idx ON buildings USING gist (geom::geography);
UPDATE buildings SET n50=c.count
FROM (
SELECT a.gid, count(b.gid)
FROM buildings a
LEFT JOIN buildings b ON ST_DWithin(a.geom::geography, b.geom::geography, 50.0)
AND a.gid <> b.gid
GROUP BY a.gid
) c
WHERE c.gid = buildings.gid;
You could also try calculating on a sphere for faster performance, at the cost of potential errors compared to spheroid distances:
ST_DWithin(a.geom::geography, b.geom::geography, 50.0, false)
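Since several radii are needed (n50, n100, ...), they can also be filled in one pass over the widest radius using the aggregate FILTER clause (available since Postgres 9.4, so fine on 9.5). A sketch based on the query above; extending it to n500 and n1000 works the same way, with the join radius widened accordingly:

```sql
-- One neighbour scan at the widest radius, counted per radius via FILTER
UPDATE buildings SET n50 = c.n50, n100 = c.n100, n200 = c.n200
FROM (
    SELECT a.gid,
           count(b.gid) FILTER (WHERE ST_DWithin(a.geom::geography, b.geom::geography,  50.0)) AS n50,
           count(b.gid) FILTER (WHERE ST_DWithin(a.geom::geography, b.geom::geography, 100.0)) AS n100,
           count(b.gid) FILTER (WHERE ST_DWithin(a.geom::geography, b.geom::geography, 200.0)) AS n200
    FROM buildings a
    LEFT JOIN buildings b
           ON ST_DWithin(a.geom::geography, b.geom::geography, 200.0)  -- widest radius only
          AND a.gid <> b.gid
    GROUP BY a.gid
) c
WHERE c.gid = buildings.gid;
```

This assumes the geography-cast index from above; each pair is then found once instead of once per radius column.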
I have two tables:
The table LINKS has:
LINK_ID --- integer, unique ID
FROM_NODE_X -- numbers/floats, indicating a geographical position
FROM_NODE_Y --
FROM_NODE_Z --
TO_NODE_X --
TO_NODE_Y --
TO_NODE_Z --
The table LINK_COORDS has:
LINK_ID --- integer, refers to above UID
ORDER --- integer, indicating order
X ---
Y ---
Z ---
Logically each LINK consists of a number of waypoints. The final order is:
FROM_NODE , 1 , 2 , 3 , ... , TO_NODE
A link has at least two waypoints (FROM_NODE, TO_NODE), but can have a variable number of waypoints in between (0 to 100+).
I now would need a way to aggregate, sort and store the waypoints of each link in an array which later will be used to draw a line.
I'm struggling with the LINK_COORDS being available as individual rows. Having the start and end positions in the other (LINKS) table doesn't help either. If I had a way to at least get all the LINK_COORDS joined/updated to the LINKS table I probably could work out the rest myself again. So if you have an idea on how to get that far, it'd be much appreciated already.
Considering performance would be nice (the tables have somewhere between 500k and 1 million entries now and will have multiples of that later), but it is not essential for now.
EDIT:
Thanks for the suggestion, a-horse-with-no-name.
I chose to create the point geometries (PostGIS) for each XYZ before this step, so in the end there's only an array of points to create from the individual points.
The adapted SQL
UPDATE "Link"
SET "POINTS" =
array_append(
(array_prepend(
"FROM_POINT",
(SELECT array_agg(lc."POINT" ORDER BY lc."COUNT")
FROM "LinkCoordinate" lc
WHERE lc."LINK_ID" = "Link"."LINK_ID")))
, "TO_POINT")
however, is running extremely slow:
Running it on a sample of 10 links took ~120 seconds. Running it for all 1.3 million links, plus many more link coordinates, would probably take somewhere around half a year. Not really ideal.
How can I figure out where this immense slowness originates from?
If I get the source data in a pre-ordered format (so linkcoordinates of each link_ID), would this allow me to significantly speed up the SQL query?
EDIT: It appears the main slowdown originates from the SELECT subquery used in the array_agg() function. Everything else (incl. ordering) does not really cause any slowdown.
My current guess is that the SELECT query iterates over the entirety of "LinkCoordinate" for each and every link, making it work much harder than it has to, as all LinkCoordinates belonging to a Link are always stored in 'blocks' of rows. A single, sequential processing of the LinkCoordinates would be sufficient, really.
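If that guess is right, the cheapest thing to check before restructuring the query is whether the correlated subquery can use an index at all: without an index on "LinkCoordinate"."LINK_ID", every one of the 1.3 million links triggers a full scan of the coordinate table, which matches the symptoms. A sketch (identifiers taken from the query above; the sample id is arbitrary):

```sql
-- Index the column the subquery correlates on
CREATE INDEX linkcoordinate_link_id_idx ON "LinkCoordinate" ("LINK_ID");

-- Then inspect the plan of the inner query for a single link:
EXPLAIN ANALYZE
SELECT array_agg(lc."POINT" ORDER BY lc."COUNT")
FROM   "LinkCoordinate" lc
WHERE  lc."LINK_ID" = 42;  -- any existing LINK_ID
```

The plan should show an index scan rather than a sequential scan; if it still shows Seq Scan, EXPLAIN ANALYZE on the full UPDATE will point at the remaining cost.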
something like this maybe:
select l.link_id,
min(l.from_node_x) as from_node_x,
min(l.from_node_y) as from_node_y,
min(l.from_node_z) as from_node_z,
array_agg(lc.x order by lc."ORDER") as points_x,
array_agg(lc.y order by lc."ORDER") as points_y,
array_agg(lc.z order by lc."ORDER") as points_z,
min(l.to_node_x) as to_node_x,
min(l.to_node_y) as to_node_y,
min(l.to_node_z) as to_node_z
from links l
join link_coords lc on lc.link_id = l.link_id
group by l.link_id;
The min() is necessary because of the group by, but it won't change the result, as all values from the links table are the same within each group anyway.
Another possibility is to use a scalar subquery. I'm unsure which of them is faster though - but the join/group by is probably more efficient.
select l.link_id,
l.from_node_x,
l.from_node_y,
l.from_node_z,
(select array_agg(lc.x order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_x,
(select array_agg(lc.y order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_y,
(select array_agg(lc.z order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_z,
l.to_node_x,
l.to_node_y,
l.to_node_z
from links l
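Since the points are already PostGIS geometries (per the edit above), another option is to skip the intermediate array and build one linestring per link with ST_MakeLine, which accepts a geometry array. A sketch using the quoted identifiers from the edit; note the inner join drops links without intermediate waypoints, so those would need separate handling with ST_MakeLine("FROM_POINT", "TO_POINT"):

```sql
-- Assemble FROM_POINT, the ordered intermediate points, and TO_POINT
-- into a single linestring per link
SELECT l."LINK_ID",
       ST_MakeLine(
           ARRAY[l."FROM_POINT"]
           || array_agg(lc."POINT" ORDER BY lc."COUNT")
           || ARRAY[l."TO_POINT"]
       ) AS line
FROM "Link" l
JOIN "LinkCoordinate" lc ON lc."LINK_ID" = l."LINK_ID"
GROUP BY l."LINK_ID", l."FROM_POINT", l."TO_POINT";
```

Building the line directly avoids storing and later re-reading the point array, which is one less pass over the large tables.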
Hey, so I'm trying to figure out the best way of storing movement paths and then afterwards how they might be queried.
Let me try to explain a bit more. Say I have many cars moving around on a map, and I want to determine if and when they're in a convoy. If I store just the paths, then I can see that they travelled along the same road, but not whether they were there at the same time. I can store the start and end times, but that will not take into account the changes in speed of the two vehicles. I can't think of any obvious way to store the data and answer this, so I thought I'd put the question out there in case there's something I'm missing before trying to implement a solution. So does anyone know anything I don't?
Thanks,
Andrew
Well it depends on what type of movement information you have.
If you have some tables setup like:
Vehicle (Id, Type, Capacity, ...)
MovementPoint(VehicleId, Latitude, Longitude, DateTime, AverageSpeed)
This would allow you to query whether two cars passed through the same point, plus or minus 5 minutes, like so:
Select * from Vehicle v INNER JOIN MovementPoint mp on mp.VehicleId = v.Id
WHERE v.Id = @FirstCarID
AND EXISTS
(
SELECT 1 FROM Vehicle v2 INNER JOIN MovementPoint mp2 on mp2.VehicleId = v2.Id
WHERE v2.Id = @SecondCarId
AND mp2.Latitude = mp.Latitude AND mp2.Longitude = mp.Longitude
AND mp2.DateTime BETWEEN DATEADD(minute,-5,mp.DateTime) AND DATEADD(minute,5,mp.DateTime)
)
You could also query for multiple points in common between multiple vehicles with specific time windows.
Also you could make the query check latitude and longitude values are within a certain radius of each other.
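For that radius variant, a sketch replacing the exact coordinate match with a geography distance test (SQL Server syntax, matching the DATEADD usage above; @RadiusMeters is a hypothetical parameter):

```sql
SELECT *
FROM Vehicle v
INNER JOIN MovementPoint mp ON mp.VehicleId = v.Id
WHERE v.Id = @FirstCarID
AND EXISTS (
    SELECT 1
    FROM MovementPoint mp2
    WHERE mp2.VehicleId = @SecondCarId
      -- within @RadiusMeters of each other instead of exactly equal
      AND geography::Point(mp.Latitude, mp.Longitude, 4326)
            .STDistance(geography::Point(mp2.Latitude, mp2.Longitude, 4326)) <= @RadiusMeters
      AND mp2.DateTime BETWEEN DATEADD(minute, -5, mp.DateTime)
                           AND DATEADD(minute,  5, mp.DateTime)
);
```

STDistance returns meters for geography points, so the radius is specified directly in meters; GPS jitter then no longer breaks the convoy match.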