Hey, so I'm trying to figure out the best way of storing movement paths, and then how they might be queried afterwards.
Let me try to explain a bit more. Say I have many cars moving around on a map and I want to determine if and when they're in a convoy. If I store just the paths, I can see that they travelled along the same road, but not whether they were there at the same time. I can store the start and end times, but that won't account for the changes in speed of the two vehicles along the way. I can't think of an obvious way to store the data that supports this, so I thought I'd put the question out there in case there's something I'm missing before trying to implement a solution. So does anyone know anything I don't?
Thanks,
Andrew
Well it depends on what type of movement information you have.
If you have some tables setup like:
Vehicle (Id, Type, Capacity, ...)
MovementPoint(VehicleId, Latitude, Longitude, DateTime, AverageSpeed)
This would allow you to query whether two cars passed the same point within plus or minus 5 minutes of each other, like so:
Select * from Vehicle v INNER JOIN MovementPoint mp on mp.VehicleId = v.Id
WHERE v.Id = @FirstCarId
AND EXISTS
(
SELECT 1 FROM Vehicle v2 INNER JOIN MovementPoint mp2 on mp2.VehicleId = v2.Id
WHERE v2.Id = #SecondCarId
AND mp2.Latitude = mp.Latitude AND mp2.Longitude = mp.Longitude
AND mp2.DateTime BETWEEN DATEADD(minute,-5,mp.DateTime) AND DATEADD(minute,5,mp.DateTime)
)
You could also query for multiple points in common between multiple vehicles within specific time windows.
You could also relax the equality check so that the latitude and longitude values only need to be within a certain radius of each other, rather than matching exactly.
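For example, a rough sketch of the radius variant, reusing the tables above (the 0.001-degree tolerance is an arbitrary placeholder - swap in a proper distance calculation if you need a real radius in metres):
SELECT v.Id, mp.DateTime AS FirstCarTime, mp2.DateTime AS SecondCarTime
FROM Vehicle v
INNER JOIN MovementPoint mp ON mp.VehicleId = v.Id
INNER JOIN MovementPoint mp2 ON mp2.VehicleId = @SecondCarId
WHERE v.Id = @FirstCarId
  -- crude "same place" test: within ~0.001 degrees on each axis
  AND ABS(mp2.Latitude - mp.Latitude) <= 0.001
  AND ABS(mp2.Longitude - mp.Longitude) <= 0.001
  AND mp2.DateTime BETWEEN DATEADD(minute,-5,mp.DateTime) AND DATEADD(minute,5,mp.DateTime)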
I am trying to get a total summation of both the ItemDetail.Quantity column and the ItemDetail.NetPrice column. For the sake of example, let's say the quantities listed for the individual items are 5, 2, and 4 respectively. I am wondering if there is a way to display the quantity as 11 for one single ItemGroup.ItemGroupName.
The query I am using is listed below
select Location.LocationName, ItemDetail.DOB, SUM (ItemDetail.Quantity) as "Quantity",
ItemGroup.ItemGroupName, SUM (ItemDetail.NetPrice)
from ItemDetail
Join ItemGroupMember
on ItemDetail.ItemID = ItemGroupMember.ItemID
Join ItemGroup
on ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
Join Location
on ItemDetail.LocationID = Location.LocationID
Inner Join Item
on ItemDetail.ItemID = Item.ItemID
where ItemGroup.ItemGroupID = '78' and DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemDetail.DOB, Item.ItemName,
ItemDetail.NetPrice, ItemGroup.ItemGroupName
If you are using SQL Server 2012 or later, you can use SUM with an OVER clause (a window function) to display the
details and the aggregates in the same query, e.g.
SUM(SalesYTD) OVER (ORDER BY DATEPART(yy,ModifiedDate))
Link: https://learn.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver15
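Applied to your tables, that could look something like this (just a sketch built from the columns in your query; untested):
SELECT Location.LocationName, ItemDetail.DOB, ItemDetail.Quantity, ItemDetail.NetPrice,
       ItemGroup.ItemGroupName,
       -- group-level totals shown alongside each detail row
       SUM(ItemDetail.Quantity) OVER (PARTITION BY ItemGroup.ItemGroupName) AS GroupQuantity,
       SUM(ItemDetail.NetPrice) OVER (PARTITION BY ItemGroup.ItemGroupName) AS GroupNetPrice
FROM ItemDetail
JOIN ItemGroupMember ON ItemDetail.ItemID = ItemGroupMember.ItemID
JOIN ItemGroup ON ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
JOIN Location ON ItemDetail.LocationID = Location.LocationID
WHERE ItemGroup.ItemGroupID = '78' AND ItemDetail.DOB = '11/20/2019'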
We can't be certain without seeing sample data, but I suspect you need to remove some fields from your GROUP BY clause -- probably Item.ItemName and ItemDetail.NetPrice.
Generally, you won't GROUP BY a column that you are applying an aggregate function to in the SELECT -- as with SUM(ItemDetail.NetPrice). And it is not very common, in my experience, to GROUP BY columns that aren't included in the SELECT list -- as you are doing with Item.ItemName.
I think you need to go back to basics and read about what GROUP BY does.
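To illustrate, the adjusted query might look roughly like this (a sketch; the join to Item is dropped because nothing from it is selected, and NetPrice is now aggregated rather than grouped):
SELECT Location.LocationName, ItemDetail.DOB,
       SUM(ItemDetail.Quantity) AS "Quantity",
       ItemGroup.ItemGroupName,
       SUM(ItemDetail.NetPrice) AS "NetPrice"
FROM ItemDetail
JOIN ItemGroupMember ON ItemDetail.ItemID = ItemGroupMember.ItemID
JOIN ItemGroup ON ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
JOIN Location ON ItemDetail.LocationID = Location.LocationID
WHERE ItemGroup.ItemGroupID = '78' AND ItemDetail.DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemDetail.DOB, ItemGroup.ItemGroupName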
First of all, welcome to the overflow...
Second: the answer is going to be "It depends".
Any time you aggregate data you need to GROUP BY the other, non-aggregated fields in the query, and you do have that in your query. The gotcha is what happens when the data is spread across multiple locations.
My suggestion is to rethink your problem and see if you really need these other fields in the query. This will depend on what the person using the data really wants to know.
Do they need to know how many of item X there are, or do they really need to know that item X is spread out over three sites?
You might find you are better off with two smaller queries.
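For example, something along these lines (just a sketch using your table names: one query for the group total, one for the per-location breakdown):
-- total for the whole item group
SELECT ItemGroup.ItemGroupName,
       SUM(ItemDetail.Quantity) AS TotalQuantity,
       SUM(ItemDetail.NetPrice) AS TotalNetPrice
FROM ItemDetail
JOIN ItemGroupMember ON ItemDetail.ItemID = ItemGroupMember.ItemID
JOIN ItemGroup ON ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
WHERE ItemGroup.ItemGroupID = '78' AND ItemDetail.DOB = '11/20/2019'
GROUP BY ItemGroup.ItemGroupName

-- breakdown per location
SELECT Location.LocationName, ItemGroup.ItemGroupName,
       SUM(ItemDetail.Quantity) AS LocationQuantity
FROM ItemDetail
JOIN ItemGroupMember ON ItemDetail.ItemID = ItemGroupMember.ItemID
JOIN ItemGroup ON ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
JOIN Location ON ItemDetail.LocationID = Location.LocationID
WHERE ItemGroup.ItemGroupID = '78' AND ItemDetail.DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemGroup.ItemGroupName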
I have a fairly complex SQL query that returns about 20 columns from a large number of joins, used to populate a grid of results in a UI. It also uses a couple of CTEs to pre-filter the results. I've included an approximation of the query below (I've commented out the lines that fix the performance).
As the amount of data in the DB increased, the query performance tanked pretty hard, even with only about 2,500 rows in the main table 'Contract'.
Through experimentation, I found that just by removing the ORDER BY ... OFFSET ... FETCH at the end, the performance went from around 30 seconds to just 1 second!
order by 1 OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
This makes no sense to me. The final line should be pretty cheap - free, even, when the OFFSET is zero - so why is it adding 29 seconds to my query time?
In order to maintain the same function for the SQL, I adapted it so that I first select into #TEMP, then perform the above order-offset-fetch on the temp table, then drop the temp table. This completes in about 2-3 seconds.
My 'optimisation' feels pretty wrong, surely there's a more sane way to achieve the same speed?
I haven't extensively tested this for larger datasets; it's essentially a quick fix to get performance back for now. I doubt it will stay efficient as the data size grows.
Other than the Clustered Indexes on the primary keys, there are no indexes on the tables. The Query Execution plan didn't appear to show any major bottlenecks, but I'm not an expert on interpreting it.
WITH tableOfAllContractIdsThatMatchRequiredStatus(contractId)
AS (
SELECT DISTINCT c.id
FROM contract c
INNER JOIN site s ON s.ContractId = c.id
INNER JOIN SiteSupply ss ON ss.SiteId = s.id AND ss.status != 'Draft'
WHERE
ISNULL(s.Deleted, '0') = 0
AND ss.status in ('saved')
)
,tableOfAllStatusesForAContract(contractId, status)
AS (
SELECT DISTINCT c.id, ss.status
FROM contract c
INNER JOIN site s ON s.ContractId = c.id
INNER JOIN SiteSupply ss ON ss.SiteId = s.id AND ss.status != 'Draft'
WHERE ss.SupplyType IN ('Electricity') AND ISNULL(s.Deleted, '0') = 0
)
SELECT
[Contract].[Id]
,[Contract].[IsMultiSite]
,statuses.StatusesAsCsv
... lots more columns
,[WaterSupply].[Status] AS ws
--INTO #temp
FROM
(
SELECT
tableOfAllStatusesForAContract.contractId,
string_agg(status, ', ') AS StatusesAsCsv
FROM
tableOfAllStatusesForAContract
GROUP BY
tableOfAllStatusesForAContract.contractId
) statuses
JOIN contract ON Contract.id = statuses.contractId
JOIN tableOfAllContractIdsThatMatchRequiredStatus ON tableOfAllContractIdsThatMatchRequiredStatus.contractId = Contract.id
JOIN Site ON contract.Id = site.contractId and site.isprimarySite = 1 AND ISNULL(Site.Deleted,0) = 0
... several more joins
JOIN [User] ON [Contract].ownerUserId = [User].Id
WHERE isnull(Deleted, 0) = 0
AND
(
[Contract].[Id] = '12659'
OR [Site].[Id] = '12659'
... often more search term type predicates here
)
--select * from #temp
order by 1
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
--drop table #temp
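For reference, the commented-out lines above implement the temp-table workaround; spelled out, the shape is (a reduced sketch - the real query keeps the full column list, joins and WHERE clause):
-- 1) materialise the unordered result set once
SELECT [Contract].[Id]            -- plus the other ~20 columns
INTO #temp
FROM contract [Contract]          -- plus the same joins and WHERE clause as above

-- 2) page over the (small) temp table, which is cheap
SELECT * FROM #temp
ORDER BY 1
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

-- 3) clean up
DROP TABLE #temp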
I've not had an answer, so I'm going to try to explain it myself, with my admittedly poor understanding of how SQL works and some pointers from Jeroen in the comments above. It's probably not entirely right, but from what I've discovered it seems plausible, and I do know how to fix my immediate problem, so it could help others.
I'll explain it with an analogy, as this is what I believe is probably happening:
Imagine you're a chef in a restaurant, and you have to prepare a large number of meals (rows in the results). You know there's going to be a lot, as your front of house has told you so (TOP 10 or FETCH 10).
You spend time setting out the multitude of ingredients required (table joins) and the equipment you'll need, and as the first order comes in, you make sure you're going to be really efficient: chopping up more than you need for the first order, putting it in little bowls ready to use for the subsequent orders. The first order takes you quite a while (30 secs) because you're planning ahead and want the subsequent dishes to go out as fast as possible.
However, as you sit in the kitchen waiting for the next orders... they don't arrive. That's it, just one order. Well, that was a waste of time! If you'd just tried to get one dish out, you could have done it much faster (1 sec), but you were planning ahead for something that was never needed.
The next night, you ditch your previous strategy and just do each plate one at a time. However, this time there are hundreds of customers. You can't deliver the meals fast enough doing them one at a time. The total time to deliver all the orders would have been much shorter if you'd planned ahead like the previous night. (I've not tested this hypothesis, but I expect that's what would happen.)
For my query, I don't know if there's going to be 1 result or 100s. I may be able to do some analysis up front based on the search criteria entered by the user, or I may have to adapt my UI to give me more information, so that I can predict the result size better and pick the appropriate strategy for SQL to use up front. As it is, I'm optimised for a small number of results, which works fine for now - but I need to do some more extensive testing to see how performance is affected as the dataset grows.
"If you want a answer to something, post something that's wrong on the internet and someone will be sure to correct you"
I have a list of 13,000 places (with latitude and longitude) in one table: place.
I have a list of 22,000 polygons in another table: place_polygon.
I need to resolve the POIs to the polygons they belong to.
This is the query that I wrote :
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Within(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom);
also tried :
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Intersects(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom);
It's running forever.
But, if I put a filter in the query, then it runs very fast for a single record.
select * from stg_place.place a
left join stg_place.place_polygon b on
ST_Within(ST_GeomFromText('SRID=4326;POINT('||a.longitude||' '||a.latitude||')'),b.geom)
where a.id = <id>;
I also tried writing a stored procedure and tried to loop through a cursor to only do for one record at a time. That also didn't help. The program ran overnight with no signs of ending.
Is there a better way to solve this? (not necessarily in postgis, but in python geopy etc... ? )
(Should I consider indexing the tables?)
First of all, use the geography type for your data instead of separate lat/long columns. Why geography and not geometry? Because you are using SRID=4326, and with the geography type it is much easier to, for example, calculate distances in meters, whereas the geometry type would calculate them in degrees for this SRID.
To create a geography value from your lat/long columns, use st_setsrid(st_makepoint(longitude, latitude), 4326)::geography
OK. Now, answering your question on your actual structure, here is the query I would write instead:
select *
from stg_place.place a
left join stg_place.place_polygon b
  on ST_DWithin(st_setsrid(st_makepoint(a.longitude, a.latitude), 4326), b.geom, 0);
I used ST_DWithin() instead of ST_Within() because on older versions of Postgres+PostGIS (certainly 9.6 and below) it guarantees the use of a spatial index on the geometries, if one has been created.
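As for the indexing question: yes, a spatial index on the polygon geometries is what makes this query fast. A minimal sketch, assuming the table and column names from the question:
-- GiST index on the polygon geometries, used by ST_DWithin / ST_Intersects
CREATE INDEX IF NOT EXISTS place_polygon_geom_idx
    ON stg_place.place_polygon USING gist (geom);

-- optionally, materialise the point once per place and index it too
ALTER TABLE stg_place.place ADD COLUMN IF NOT EXISTS geom geometry(Point, 4326);
UPDATE stg_place.place SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
CREATE INDEX IF NOT EXISTS place_geom_idx ON stg_place.place USING gist (geom);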
I looked at several other SO questions that seem somewhat related, but not quite what I need (or I'm just not smart enough to connect the dots).
Working on an app for a client. Their database holds the origin and destination of people that are traveling, limited (I believe) to just places in the US and Canada, and a date when the trip will take place. The records are updated regularly. Call these "trips."
Users come to the site, and enter an origin and destination city, and a radius for each, indicating how far away from their desired origin/destination cities they are willing to travel in order to make their trip.
The job of the app is to find any/all trips that are already in the database, that are closest to the origin and destination that the user needs to travel.
My original thought was to find all origin cities in the database that are within the radius of the user's desired origin, then use that recordset to search the destination cities in the database for any/all cities within the radius of the user's desired destination.
I also need a decent (preferably free... low budget project here) API that can help look up the city geographic location and perform the actual radius calculation... I think.
Is what I'm looking to do even close to the best option? It looks like the hardest part will be finding all the existing cities in the database that are within the radius of the user's desired cities - which is a bit of a twist on the simpler query of just "find all cities within the radius of X city".
So, this is KINDA like an Uber situation, except the Uber driver is deciding what the trip parameters are, and the user just needs to know which Uber drivers are going from/to the places nearest those of the user (on the specified date, to boot).
Right now, users are just looking things up at a state level - BC to NY, and reading down rows of data looking at rides to find the ones that seem closest to what they need.
Thanks in advance, for any clever insights you smart folks might have!
Declare @DriverLat float = 41.744068
Declare @DriverLng float = -71.315024
Declare @Within int = 20
Select *
From (
Select Distinct
A.ZipCode
,A.CityName
,A.StateCode
,Miles = [dbo].[udf-Geo-Calc-Miles] (@DriverLat,@DriverLng,A.Lat,A.Lng)
From [dbo].[ZipCodes] A
Where CityType = 'D'
and ZipType = 'S'
) A
Where Miles <= @Within
Order By Miles
This returns the zip codes, city names, state codes and distance in miles for every city within 20 miles of the driver's position, ordered by distance.
The UDF:
CREATE Function [dbo].[udf-geo-Calc-Miles] (@Lat1 float, @Lng1 float, @Lat2 float, @Lng2 float)
Returns Float as
Begin
    -- spherical law of cosines: @Miles first holds cos(central angle between the two points)
    Declare @Miles Float = (Sin(Radians(@Lat1)) * Sin(Radians(@Lat2))) + (Cos(Radians(@Lat1)) * Cos(Radians(@Lat2)) * Cos(Radians(@Lng2) - Radians(@Lng1)))
    -- convert to a great-circle distance using an Earth radius of 3958.75 miles
    Return Case When @Miles is null then 0 else abs((3958.75 * Atan(Sqrt(1 - power(@Miles, 2)) / @Miles))) end
End
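To cover both ends of the trip you can apply the same UDF twice, once against the trip's origin and once against its destination. A rough sketch, assuming a hypothetical Trips table with OriginLat/OriginLng, DestLat/DestLng and TripDate columns (those names are not from the original post):
Declare @OriginLat float = 41.744068, @OriginLng float = -71.315024
Declare @DestLat float = 40.712776, @DestLng float = -74.005974
Declare @OriginRadius int = 20, @DestRadius int = 30
Declare @TravelDate date = '2020-06-01'

Select T.*
      ,OriginMiles = [dbo].[udf-Geo-Calc-Miles](@OriginLat,@OriginLng,T.OriginLat,T.OriginLng)
      ,DestMiles   = [dbo].[udf-Geo-Calc-Miles](@DestLat,@DestLng,T.DestLat,T.DestLng)
From dbo.Trips T
Where T.TripDate = @TravelDate
  and [dbo].[udf-Geo-Calc-Miles](@OriginLat,@OriginLng,T.OriginLat,T.OriginLng) <= @OriginRadius
  and [dbo].[udf-Geo-Calc-Miles](@DestLat,@DestLng,T.DestLat,T.DestLng) <= @DestRadius
Order By OriginMiles, DestMiles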
I have two tables:
The table LINKS has:
LINK_ID --- integer, unique ID
FROM_NODE_X -- numbers/floats, indicating a geographical position
FROM_NODE_Y --
FROM_NODE_Z --
TO_NODE_X --
TO_NODE_Y --
TO_NODE_Z --
The table LINK_COORDS has:
LINK_ID --- integer, refers to above UID
ORDER --- integer, indicating order
X ---
Y ---
Z ---
Logically each LINK consists of a number of waypoints. The final order is:
FROM_NODE , 1 , 2 , 3 , ... , TO_NODE
A link has at least two waypoints (FROM_NODE, TO_NODE), but can have a variable number of waypoints in between (0 to 100+).
I now need a way to aggregate, sort and store the waypoints of each link in an array, which will later be used to draw a line.
I'm struggling with the LINK_COORDS being available only as individual rows. Having the start and end positions in the other table (LINKS) doesn't help either. If I had a way to at least get all the LINK_COORDS joined/updated into the LINKS table, I could probably work out the rest myself. So if you have an idea of how to get that far, it'd be much appreciated already.
Considering performance would be nice (the tables have somewhere between 500k and 1 million entries now and will have multiples of that later), but it is not essential for now.
EDIT:
Thanks for the suggestion, a-horse-with-no-name.
I chose to create the point geometries (PostGIS) for each XYZ before this step, so in the end there's only an array of points to create from the individual points.
The adapted SQL
UPDATE "Link"
SET "POINTS" =
array_append(
(array_prepend(
"FROM_POINT",
(SELECT array_agg(lc."POINT" ORDER BY lc."COUNT")
FROM "LinkCoordinate" lc
WHERE lc."LINK_ID" = "Link"."LINK_ID")))
, "TO_POINT")
however, it is running extremely slow:
Running it on a sample of 10 links took ~120 seconds. Running it for all 1.3 million links (and far more link coordinates) would probably take somewhere around half a year. Not really ideal.
How can I figure out where this immense slowness originates from?
If I get the source data in a pre-ordered format (so linkcoordinates of each link_ID), would this allow me to significantly speed up the SQL query?
EDIT: It appears the main slowdown originates from the SELECT subquery used inside the array_agg() call. Everything else (incl. the ordering) does not really cause any slowdown.
My current guess is that the SELECT subquery scans the entirety of "LinkCoordinate" for each and every link, making it work much harder than it has to, since all LinkCoordinates belonging to a Link are stored in 'blocks' of consecutive rows. A single sequential pass over the LinkCoordinates would really be sufficient.
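If that guess is right, a plain index on the join column should let each subquery do an index lookup instead of scanning the whole table - a minimal sketch, using the quoted identifiers from the UPDATE above:
-- index so the correlated subquery can find a link's coordinates directly
CREATE INDEX IF NOT EXISTS linkcoordinate_link_id_idx
    ON "LinkCoordinate" ("LINK_ID");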
something like this maybe:
select l.link_id,
min(l.from_node_x) as from_node_x,
min(l.from_node_y) as from_node_y,
min(l.from_node_z) as from_node_z,
array_agg(lc.x order by lc."ORDER") as points_x,
array_agg(lc.y order by lc."ORDER") as points_y,
array_agg(lc.z order by lc."ORDER") as points_z,
min(l.to_node_x) as to_node_x,
min(l.to_node_y) as to_node_y,
min(l.to_node_z) as to_node_z
from links l
join link_coords lc on lc.link_id = l.link_id
group by l.link_id;
The min() is necessary because of the GROUP BY, but it won't change the result, as all values from the links table are the same for a given link_id anyway.
Another possibility is to use a scalar subquery. I'm unsure which of them is faster though - but the join/group by is probably more efficient.
select l.link_id,
l.from_node_x,
l.from_node_y,
l.from_node_z,
(select array_agg(lc.x order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_x,
(select array_agg(lc.y order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_y,
(select array_agg(lc.z order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_z,
l.to_node_x,
l.to_node_y,
l.to_node_z
from links l;