Nearest neighbour search in PostGIS giving wrong result

I am trying to find the nearest station for each polygon in a set. I use the following query, but the result is always the same station (in fact, the one with the lowest id).
SELECT DISTINCT ON (a.id)
a.id AS field_id,
a.name,
a.geom AS field_location,
b.stations_id,
b.stationsname,
st_distance(a.geom, b.geom) AS dist
FROM fields_filtered a,
kl_stationsliste b
WHERE b.bis_datum > '2020-04-01'::date
ORDER BY a.id, b.stations_id, (st_distance(a.geom, b.geom));
What am I doing wrong here?
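One likely cause, judging only from the query as posted: DISTINCT ON (a.id) keeps the first row of each a.id group in ORDER BY order, and b.stations_id is sorted ahead of the distance, so the lowest station id always wins. The sketch below reproduces the effect without PostGIS, using Python's built-in sqlite3 module, plain x/y coordinates, and ROW_NUMBER() as a stand-in for DISTINCT ON. The tables fields and stations and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fields   (id INTEGER, x REAL, y REAL);
    CREATE TABLE stations (stations_id INTEGER, x REAL, y REAL);
    INSERT INTO fields   VALUES (1, 0, 0), (2, 10, 10);
    INSERT INTO stations VALUES (100, 50, 50), (200, 1, 1), (300, 9, 9);
""")

def first_station_per_field(order_by):
    """Keep the first row per field under the given ordering (what DISTINCT ON does)."""
    sql = f"""
        SELECT field_id, stations_id FROM (
            SELECT a.id AS field_id, b.stations_id,
                   ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY {order_by}) AS rn
            FROM fields a CROSS JOIN stations b
        ) AS ranked
        WHERE rn = 1
        ORDER BY field_id
    """
    return conn.execute(sql).fetchall()

# Squared Euclidean distance is enough for ranking (hypothetical stand-in for st_distance).
dist = "(a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y)"

by_id_first = first_station_per_field(f"b.stations_id, {dist}")    # ordering as in the question
by_dist_first = first_station_per_field(f"{dist}, b.stations_id")  # distance sorted first

print(by_id_first)    # [(1, 100), (2, 100)] -> always the lowest station id
print(by_dist_first)  # [(1, 200), (2, 300)] -> the nearest station per field
```

In the original query, the equivalent change would be ORDER BY a.id, st_distance(a.geom, b.geom), with b.stations_id either dropped or moved after the distance as a tie-breaker.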

Related

Subquery in snowflake with unknown matches, matches to columns

I have a few tables that I'm joining and one of them can have an unknown number of matches, up to 6. Each match should be returned as a row value in the initial query. For example:
SELECT a.ID, a.match1, a.match2, a.match3, a.match4, a.match5, a.match6
FROM table1 a, (SELECT ID, match FROM table2 WHERE a.ID = table2.ID) b
WHERE a.ID = b.ID
That's probably not the right syntax, but hopefully it shows what I need. The nested query may return 1 match, or 5. Each match should become the value of the corresponding column, i.e. match1 = first match, match2 = second match, and so on.
Please let me know if I need to explain further. I know this isn't the optimal schema to use but it's what I was told to work with.
SQL does not cope well with an unknown number of columns.
As a quick workaround, you could aggregate all matches into an array, and then wrap that in a query that spreads the array over a predefined (large enough) number of columns.
Like this:
with data as (
select $1 id
from (values(1),(2))
), data2 as (
select $1 id, $2 match
from (values(1, 'a1'),(1, 'a2'),(2, 'b1'),(2, 'b2'),(2, 'b3'))
)
select id, matches[0], matches[1], matches[2], matches[3]
from (
select a.id, array_agg(match) matches
from data a
join data2 b
on a.id=b.id
group by 1
);
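The same fixed-number-of-columns idea can also be written without arrays, by numbering the matches per id and pivoting with conditional aggregation. This is a different technique from the array_agg query above, shown here as a minimal sketch on SQLite through Python's built-in sqlite3 module so it can be tried without a Snowflake account. The table names t1 and t2, the column name match_val, and the choice of three output columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, match_val TEXT);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1, 'a1'), (1, 'a2'), (2, 'b1'), (2, 'b2'), (2, 'b3');
""")

rows = conn.execute("""
    WITH numbered AS (
        -- Number the matches of each id: 1, 2, 3, ...
        SELECT a.id, b.match_val,
               ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY b.match_val) AS rn
        FROM t1 a JOIN t2 b ON a.id = b.id
    )
    -- One output column per position; positions without a match stay NULL.
    SELECT id,
           MAX(CASE WHEN rn = 1 THEN match_val END) AS match1,
           MAX(CASE WHEN rn = 2 THEN match_val END) AS match2,
           MAX(CASE WHEN rn = 3 THEN match_val END) AS match3
    FROM numbered
    GROUP BY id
    ORDER BY id
""").fetchall()

print(rows)  # [(1, 'a1', 'a2', None), (2, 'b1', 'b2', 'b3')]
```

Because of the inner join, ids with no match at all are left out; a LEFT JOIN would keep them with every match column NULL.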

PostGIS minimum distance between two large sets of points

I have two tables of points in PostGIS, say A and B, and I want to know, for every point in A, what is the distance to the closest point in B. I am able to solve this for small sets of points with the following query:
SELECT a.id, MIN(ST_Distance_Sphere(a.geom, b.geom))
FROM table_a a, table_b b
GROUP BY a.id;
However, I have a couple of million points in each table and this query runs indefinitely. Is there a more efficient way to approach this? I am open to getting an approximate distance rather than an exact one.
Edit: A slight modification to the answer provided by JGH to return distances in meters rather than degrees if points are unprojected.
SELECT
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;
Your query is slow because it computes the distance between every pair of points without using any index. You could rewrite it to use the <-> operator, which can use a spatial index when it appears in the ORDER BY clause.
SELECT a.id, closest_pt.id AS closest_id, closest_pt.dist
FROM table_a a
CROSS JOIN LATERAL
  (SELECT
     b.id,
     a.geom <-> b.geom AS dist
   FROM table_b b
   ORDER BY a.geom <-> b.geom
   LIMIT 1) AS closest_pt;

Finding point of interest on a square wave using sql

Good day,
I have a sql table with the following setup:
DataPoints{ DateTime timeStampUtc , bit value}
The points are on a minute interval, and store either a 1(on) or a 0(off).
I need to write a stored procedure to find the points of interest from all the data points.
I have a simplified drawing below:
I need to find the corner points only. Please note that there may be many data points between a value change. For example:
{0,0,0,0,0,0,0,1,1,1,1,0,0,0}
This is my thinking atm (high level)
Select timeStampUtc, Value
From Data Points
Where Value before or value after differs by 1 or -1
I am struggling to convert this concept to SQL, and I also have a feeling there is a more elegant mathematical solution that I am not aware of. This must be a common problem in electronics?
I have wrapped the table in a CTE. Then I join every row in the CTE to the next row of the same CTE, with the added condition that the two consecutive rows must differ in value.
This returns all rows where the value changes.
;WITH CTE AS(
SELECT ROW_NUMBER() OVER(ORDER BY TimeStampUTC) AS id, VALUE, TIMESTAMPUTC
FROM DataPoints
)
SELECT CTE.TimeStampUTC as "Time when the value changes", CTE.id, *
FROM CTE
INNER JOIN CTE as CTE2
ON CTE.id = CTE2.id + 1
AND CTE.Value != CTE2.Value
Here's a working fiddle: http://sqlfiddle.com/#!6/a0ddc/3
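On engines that support window functions, the same neighbour comparison can be done without the self-join by using LAG and LEAD. The sketch below is an illustration under stated assumptions, not the answerer's code: it runs on SQLite (3.25 or later) through Python's built-in sqlite3 module, uses a hypothetical table data_points with an integer minute column in place of timeStampUtc, and loads the sample series from the question. A row is kept when its value differs from the previous or the next row, so both corners of each edge are returned.

```python
import sqlite3

# Sample series from the question, one value per minute.
values = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_points (minute INTEGER, value INTEGER)")
conn.executemany("INSERT INTO data_points VALUES (?, ?)", list(enumerate(values)))

corners = conn.execute("""
    SELECT minute, value FROM (
        SELECT minute, value,
               LAG(value)  OVER (ORDER BY minute) AS prev_value,
               LEAD(value) OVER (ORDER BY minute) AS next_value
        FROM data_points
    ) AS neighbours
    -- Comparisons against the NULL neighbour of the first and last row are
    -- not true, so the ends of the series are not reported as corners.
    WHERE value <> prev_value OR value <> next_value
    ORDER BY minute
""").fetchall()

print(corners)  # [(6, 0), (7, 1), (10, 1), (11, 0)]
```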
If I understood correctly, you are looking for something like this:
with cte as (
select * from (values (1,0),(2,0),(3,1),(4,1),(5,0),(6,1),(7,0),(8,0),(9,1)) t(a,b)
)
select
min(a), b
from (
select
a, b, sum(c) over (order by a rows unbounded preceding) grp
from (
select
*, iif(b = lag(b) over (order by a), 0, 1) c
from
cte
) t
) t
group by b, grp

Should I use a cursor for this?

I have a table with three fields. Group number, X-coord and Y-coord. There can be from 0 to about 10 rows within each group number.
What I want to do is calculate the maximum and minimum distance between points within each group. Obviously, this will only give you a value if there are 2 or more rows within that group.
Output should consist of fields: group number, minDistance, maxDistance.
Is a cursor a good solution for this?
(Coordinates are in WGS84 and I have a working formula for calculating distances)
My reasoning for using a cursor is that I cannot avoid doing a cross join for each group and then applying the formula for each result of the cross join.
I wouldn't use a cursor in your situation. I would rather use an inline table-valued user-defined function that takes the group number as its argument and calculates the minimum and maximum distance for that group.
Please note that the distance calculation inside the function (plain Euclidean distance) is much simpler than the one you probably have.
create table dist (groupId int, X int, Y int)
insert into dist(groupid, x, y) values (1,14,20),(1,11,20),(1,10,22),(1,12,24),(1,11,28),(1,19,78)
insert into dist(groupid, x, y) values (2,10,20),(2,11,20),(2,10,22),(2,12,24),(2,11,28),(2,17,52)
create function dbo.getMinMaxDistanceForGroup (@groupId int)
returns table as return (
select MIN(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MinDistance,
MAX(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MaxDistance
from dist a cross join dist b
where a.groupId = @groupId and b.groupId = @groupId
-- skip pairing a point with itself, otherwise MinDistance is always 0
and (a.X <> b.X or a.Y <> b.Y)
)
select groupId, MinDistance, MaxDistance
from dist OUTER APPLY dbo.getMinMaxDistanceForGroup(groupId)
group by groupid, MinDistance, MaxDistance
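A set-based alternative that needs neither a cursor nor a function per group is a single self-join grouped by group number. The sketch below is a minimal illustration on SQLite through Python's built-in sqlite3 module, with made-up coordinates and plain Euclidean distance standing in for the WGS84 formula. Each point is paired only with later rows of the same group (via SQLite's rowid), so a point is never compared with itself and each pair is evaluated once. The sqrt function is registered from Python because not every SQLite build ships the math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("sqrt", 1, math.sqrt)  # make sqrt available regardless of build options
conn.executescript("""
    CREATE TABLE dist (group_id INTEGER, x REAL, y REAL);
    INSERT INTO dist VALUES (1, 0, 0), (1, 3, 4), (1, 6, 8),
                            (2, 0, 0), (2, 0, 2),
                            (3, 1, 1);   -- group 3 has a single point, so no pairs
""")

rows = conn.execute("""
    SELECT a.group_id,
           MIN(sqrt((b.x - a.x) * (b.x - a.x) + (b.y - a.y) * (b.y - a.y))) AS min_distance,
           MAX(sqrt((b.x - a.x) * (b.x - a.x) + (b.y - a.y) * (b.y - a.y))) AS max_distance
    FROM dist a
    JOIN dist b
      ON a.group_id = b.group_id
     AND a.rowid < b.rowid          -- each unordered pair once, never a point with itself
    GROUP BY a.group_id
    ORDER BY a.group_id
""").fetchall()

print(rows)  # [(1, 5.0, 10.0), (2, 2.0, 2.0)]
```

Groups with fewer than two points produce no pairs and therefore no output row, which matches the requirement that a value only exists for groups with 2 or more rows.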

Subtract 2 columns from PostgreSQL LEFT JOIN query with NULL values

I have a PostgreSQL query which should return the actual stock of samples in our lab.
The initial samples are taken from a table (tblStudies), but there are 2 other tables that record decreases in the number of samples.
So I made a union query for those 2 tables, and then matched the union query with tblStudies to calculate the actual stock.
But the union query only gives values when there is a decrease in samples.
So when a study still has its initial samples, no value is returned.
I figured out I should use a JOIN operation, but then I get NULL values for a study that still has its initial samples.
Here is how far I got; any help, please?
SELECT
"tblStudies"."Studie_ID", "SamplesWeggezet", c."Stalen_gebruikt", "SamplesWeggezet" - c."Stalen_gebruikt" as "Stock"
FROM
"Stability"."tblStudies"
LEFT JOIN
(
SELECT b."Studie_ID",sum(b."Stalen_gebruikt") as "Stalen_gebruikt"
FROM (
SELECT "tblAnalyses"."Studie_ID", sum("tblAnalyses"."Aant_stalen_gebruikt") AS "Stalen_gebruikt"
FROM "Stability"."tblAnalyses"
GROUP BY "tblAnalyses"."Studie_ID"
UNION
SELECT "tblStalenUitKamer"."Studie_ID", sum("tblStalenUitKamer".aant_stalen) AS "stalen_gebruikt"
FROM "Stability"."tblStalenUitKamer"
GROUP BY "tblStalenUitKamer"."Studie_ID"
) b
GROUP BY b."Studie_ID"
) c ON "tblStudies"."Studie_ID" = c."Studie_ID"
Because you're doing a LEFT JOIN to the inline query c, c."Stalen_gebruikt" is NULL for every study without a matching row, and any number minus NULL yields NULL. To address this, use COALESCE.
So change
"SamplesWeggezet" - c."Stalen_gebruikt" as "Stock"
to
"SamplesWeggezet" - COALESCE(c."Stalen_gebruikt", 0) as "Stock"

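The NULL behaviour described in the answer is easy to reproduce. The sketch below is a minimal illustration on SQLite through Python's built-in sqlite3 module; the tables studies and used and their columns are hypothetical, simplified stand-ins for tblStudies and the union of the two usage tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (study_id INTEGER, samples_stored INTEGER);
    CREATE TABLE used    (study_id INTEGER, samples_used INTEGER);
    INSERT INTO studies VALUES (1, 10), (2, 8);
    INSERT INTO used    VALUES (1, 3), (1, 2);   -- study 2 has no usage rows
""")

rows = conn.execute("""
    SELECT s.study_id,
           s.samples_stored - u.total_used              AS stock_raw,  -- NULL when nothing was used
           s.samples_stored - COALESCE(u.total_used, 0) AS stock       -- falls back to 0 used
    FROM studies s
    LEFT JOIN (SELECT study_id, SUM(samples_used) AS total_used
               FROM used
               GROUP BY study_id) u
           ON s.study_id = u.study_id
    ORDER BY s.study_id
""").fetchall()

print(rows)  # [(1, 5, 5), (2, None, 8)]
```

Study 2 keeps its full stock of 8 only in the COALESCE column; the plain subtraction returns NULL for it.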