PostGIS: Spatial Joins Summing using Lats/Lngs - postgis

I have a table, called pets, with lats and lng coordinates and variables and a table that was created by uploading a shape file, called city, with columns left, right, top, bottom, lat, lng, and geom.
lat lng cats dogs
-99.4 42.1 2 0
-98.1 44.3 1 1
-99.7 43.6 0 3
-99.8 42.0 2 1
I'm wondering how I would construct a query that will sum up the number of cats and dogs within each boundary. Usually I would do a spatial join in QGIS but my dataset is too large and it crashes before successfully joining the two.
I am pretty familiar with QGIS and Postgres, but very new to PostGIS. I've constructed a query but unsuccessful in getting the answer I needed:
select sum(pets.dogs)
from pets, city
WHERE ST_Within(pets.dogs, city.geom);
Any pointers or resources on PostGIS written for a newbie would be greatly appreciated. Thanks!

The pets table needs a GeometryColumn representing the lat/lng. Add it like this:
psql> SELECT AddGeometryColumn ('my_schema','pets','geom',4326,'POINT',2);
psql> UPDATE pets SET geom=ST_SetSRID(ST_MakePoint(lng, lat), 4326);
For large datasets, it's advisable to add a spatial index to the column:
psql> CREATE INDEX geom_idx ON pets USING GIST (geom);
This should allow for speedy queries like:
psql> SELECT sum(pets.dogs)
FROM pets, city
WHERE ST_Within(pets.geom, city.geom);
This will give a single count of all dogs within any city geometry. Group by city name (or id) to get counts by city:
psql> SELECT city.id, city.name, sum(pets.dogs)
FROM pets, city
WHERE ST_Within(pets.geom, city.geom)
GROUP BY city.id, city.name;
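Since the question asks for both cats and dogs per boundary, the grouped query can be extended along these lines. This is only a sketch: it assumes the city table has id and name columns (as the query above does) and that both geom columns share SRID 4326; if the shapefile was loaded in another SRID, the points would need ST_Transform first. The LEFT JOIN is a design choice so that boundaries containing no pets still appear, with totals of 0.

```sql
-- Sum cats and dogs per city boundary (PostGIS).
-- Assumes city.id / city.name exist and both geom columns use the same SRID.
SELECT city.id,
       city.name,
       COALESCE(sum(pets.cats), 0) AS cats,
       COALESCE(sum(pets.dogs), 0) AS dogs
FROM city
LEFT JOIN pets
       ON ST_Within(pets.geom, city.geom)  -- point inside the boundary
GROUP BY city.id, city.name
ORDER BY city.id;
```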

Related

How to implement many-to-many-to-many database relationship?

I am building a SQLite database and am not sure how to proceed with this scenario.
I'll use a real-world example to explain what I need:
I have a list of products that are sold by many stores in various states. Not every Store sells a particular Product at all, and those that do may only sell it in one State or another. Most stores sell a product in most states, but not all.
For example, let's say I am trying to buy a vacuum cleaner in Hawaii. Joe's Hardware sells vacuums in 18 states, but not in Hawaii. Walmart sells vacuums in Hawaii, but not microwaves. Burger King does not sell vacuums at all, but will give me a Whopper anywhere in the US.
So if I am in Hawaii and search for a vacuum, I should only get Walmart as a result. While other stores may sell vacuums, and may sell in Hawaii, they don't do both but Walmart does.
How do I efficiently create this type of relationship in a relational database (specifically, I am currently using SQLite, but need to be able to convert to MySQL in the future).
Obviously, I would need tables for Product, Store, and State, but I am at a loss on how to create and query the appropriate join tables...
If I, for example, query a certain Product, how would I determine which Store would sell it in a particular State, keeping in mind that Walmart may not sell vacuums in Hawaii, but they do sell tea there?
I understand the basics of 1:1, 1:n, and M:n relationships in RD, but I am not sure how to handle this complexity where there is a many-to-many-to-many situation.
If you could show some SQL statements (or DDL) that demonstrates this, I would be very grateful. Thank you!
An accepted and common way is to use a table that has one column referencing the product and another referencing the store. Such a table goes by many names: reference table, associative table, and mapping table, to name a few.
You want these references to be efficient, so reference each row by a number that uniquely identifies it. By default, an SQLite table has a special, normally hidden column that holds such a unique number. It is the rowid, and it is typically the most efficient way of accessing rows, as SQLite has been designed with this common usage in mind.
SQLite allows you to create one column per table that is an alias of the rowid: you simply follow the column name with INTEGER PRIMARY KEY. Typically you'd name the column id.
So utilising these the reference table would have a column for the product's id and another for the store's id catering for every combination of product/store.
As an example, three tables are created (products, stores, and a reference/mapping table), the first two being populated, using :-
CREATE TABLE IF NOT EXISTS _products(id INTEGER PRIMARY KEY, productname TEXT, productcost REAL);
CREATE TABLE IF NOT EXISTS _stores (id INTEGER PRIMARY KEY, storename TEXT);
CREATE TABLE IF NOT EXISTS _product_store_relationships (storereference INTEGER, productreference INTEGER);
INSERT INTO _products (productname,productcost) VALUES
('thingummy',25.30),
('Sky Hook',56.90),
('Tartan Paint',100.34),
('Spirit Level Bubbles - Large', 10.43),
('Spirit Level bubbles - Small',7.77)
;
INSERT INTO _stores (storename) VALUES
('Acme'),
('Shops-R-Them'),
('Harrods'),
('X-Mart')
;
The resultant tables being :-
_product_store_relationships would be empty
Placing products into stores (for example) could be done using :-
-- Build some relationships/references/mappings
INSERT INTO _product_store_relationships (productreference, storereference) VALUES
(2,2), -- Sky Hooks are in Shops-R-Them
(2,4), -- Sky Hooks in X-Mart
(1,3), -- thingummys in Harrods
(1,1), -- and Acme
(1,2), -- and Shops-R-Them
(4,4), -- Spirit Level Bubbles Large in X-Mart
(5,4), -- Spirit Level Bubbles Small in X-Mart
(3,3) -- Tartan Paint in Harrods
;
The _product_store_relationships would then be :-
A query such as the following would list the products in stores sorted by store and then product :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
ORDER BY storename, productname
;
The resultant output being :-
This query will only list stores that have a product whose name contains an s or S (LIKE is case-insensitive for ASCII characters by default in SQLite), with the output sorted by productcost in ascending order, then storename, then productname :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE productname LIKE '%s%'
ORDER BY productcost,storename, productname
;
Output :-
Expanding the above to consider states requires 2 new tables, _states and _store_state_references.
Strictly, there is no real need for a reference table if a store is only ever in one state. If you consider a chain of stores to be a single store, though, the reference table copes with that as well.
The SQL could be :-
CREATE TABLE IF NOT EXISTS _states (id INTEGER PRIMARY KEY, statename TEXT);
INSERT INTO _states (statename) VALUES
('Texas'),
('Ohio'),
('Alabama'),
('Queensland'),
('New South Wales')
;
CREATE TABLE IF NOT EXISTS _store_state_references (storereference, statereference);
INSERT INTO _store_state_references VALUES
(1,1),
(2,5),
(3,1),
(4,3)
;
If the following query were run :-
SELECT storename,productname,productcost,statename
FROM _stores
JOIN _store_state_references ON _stores.id = _store_state_references.storereference
JOIN _states ON _store_state_references.statereference =_states.id
JOIN _product_store_relationships ON _stores.id = _product_store_relationships.storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE statename = 'Texas' AND productname = 'Sky Hook'
;
The output would be :-
Without the WHERE clause :-
The following would make Shops-R-Them have a presence in all states :-
INSERT INTO _store_state_references VALUES
(2,1),(2,2),(2,3),(2,4)
;
Now the Sky Hook's in Texas results in :-
Note: this just covers the basics of the topic.
You will need to create a combined mapping table of products, states, and stores, for example tbl_product_states_stores, which stores each valid combination of product, state, and store. The columns will be id, product_id, state_id, stores_id.
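A minimal sketch of such a three-way mapping table in SQLite follows, using the Hawaii vacuum example from the question. The table and column names here (product, store, state, product_state_store) are illustrative assumptions, not names from the answers above. A composite primary key is used instead of a separate id column so that the same combination cannot be stored twice.

```sql
-- Hypothetical schema: names are illustrative only.
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE store   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE state   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per (product, state, store) combination that actually exists.
CREATE TABLE product_state_store (
    product_id INTEGER NOT NULL REFERENCES product(id),
    state_id   INTEGER NOT NULL REFERENCES state(id),
    store_id   INTEGER NOT NULL REFERENCES store(id),
    PRIMARY KEY (product_id, state_id, store_id)
);

INSERT INTO product (name) VALUES ('vacuum'), ('tea');
INSERT INTO store   (name) VALUES ('Walmart'), ('Joe''s Hardware');
INSERT INTO state   (name) VALUES ('Hawaii'), ('Ohio');

INSERT INTO product_state_store (product_id, state_id, store_id) VALUES
    (1, 1, 1),  -- Walmart sells vacuums in Hawaii
    (2, 1, 1),  -- Walmart sells tea in Hawaii
    (1, 2, 2);  -- Joe's Hardware sells vacuums in Ohio

-- Which stores sell vacuums in Hawaii?
SELECT store.name
FROM product_state_store AS pss
JOIN product ON product.id = pss.product_id
JOIN state   ON state.id   = pss.state_id
JOIN store   ON store.id   = pss.store_id
WHERE product.name = 'vacuum'
  AND state.name = 'Hawaii';
```

With this sample data the final query returns only Walmart. When moving to MySQL, the id columns would likely need AUTO_INCREMENT, since INTEGER PRIMARY KEY only auto-assigns values in SQLite.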

How to calculate the "Nearest Neighbour" for multiple sources in SQL Server?

The "Nearest Neighbour" problem is very common when working with spatial data.
There's even some nice, simple documentation about how to do it with MS Sql Server in their docs!
I'm usually seeing examples where it's using 1x source Lat/Long and it returns the 'x' number of nearest neighbour Lat/Longs. Fine...
e.g.
USE AdventureWorks2012
GO
DECLARE @g geography = 'POINT(-121.626 47.8315)';
SELECT TOP(7) SpatialLocation.ToString(), City FROM Person.Address
WHERE SpatialLocation.STDistance(@g) IS NOT NULL
ORDER BY SpatialLocation.STDistance(@g);
In my case, I have multiple Lat/Long sources ... and for each source, need to return the 'x' number of nearest neighbours.
Here's my schema
Table: SomeGeogBoundaries
LocationId INTEGER PRIMARY KEY (it's not an identity, but a PK & FK)
CentrePoint GEOGRAPHY
Index:
Spatial Index on CentrePoint column. [Geography || MEDIUM, MEDIUM, HIGH, HIGH]
Sample data:
LocationId | CP Lat/Long
1 | 10,10
2 | 11,11
3 | 20,20
..
So for each location in this table, I need to find the closest.. say 5 other locations.
Update
So far, it looks like using a CURSOR is the only way .. but I'm open to more set based solutions.
You need to find the nearest neighbors within the same set?
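SELECT *
FROM SomeGeogBoundaries AS b
OUTER APPLY (
SELECT TOP(5) t.LocationId AS NeighbourId, t.CentrePoint AS NeighbourPoint
FROM SomeGeogBoundaries AS t
WHERE t.LocationId <> b.LocationId -- do not return the source row as its own neighbour
AND t.CentrePoint.STIntersects(b.CentrePoint.STBuffer(100)) = 1
ORDER BY b.CentrePoint.STDistance(t.CentrePoint)
) AS nn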
Two notes.
The STIntersects condition in the OUTER APPLY limits the search to (in this case) points that are within 100 meters of each other (assuming that you're using an SRID whose native unit of measure is meters). That may or may not be appropriate for you. If not, just omit the STIntersects condition.
I think this is still a cursor. Don't fool yourself into thinking that just because there is nary a declare cursor statement to be seen that the db engine has much of a choice but to iterate through your table and evaluate the apply for each row.

How to merge adjacent polygons into 1 polygon and keep min/max data?

I have the following polygons in PostGIS
Each polygon has field with "Data" value.
I would like to automatically merge the polygons that touch each other:
1-2 and 3-4-5-6-7
Also, if possible, I would like the min/max values of the Data column across the merged polygons to be kept on the new polygon.
Id Data Geom
1 8.45098 MULTIPOLYGON(((178253.411393551 665205.232423685,178248.411393552 665205.232423685,178248.411393552 665210.232423684,178253.411393551 665210.232423684,178253.411393551 665205.232423685)))
2 10.7918 MULTIPOLYGON(((178258.411393551 665205.232423685,178253.411393551 665205.232423685,178253.411393551 665210.232423684,178258.411393551 665210.232423684,178258.411393551 665205.232423685)))
3 10.7918 MULTIPOLYGON(((178263.411393552 665185.232423682,178258.411393551 665185.232423682,178258.411393551 665190.232423685,178263.411393552 665190.232423685,178263.411393552 665185.232423682)))
4 10.4139 MULTIPOLYGON(((178268.411393553 665185.232423682,178263.411393552 665185.232423682,178263.411393552 665190.232423685,178268.411393553 665190.232423685,178268.411393553 665185.232423682)))
5 7.448 MULTIPOLYGON(((178263.411393552 665180.232423684,178258.411393551 665180.232423684,178258.411393551 665185.232423682,178263.411393552 665185.232423682,178263.411393552 665180.232423684)))
6 10.2318 MULTIPOLYGON(((178268.411393553 665180.232423684,178263.411393552 665180.232423684,178263.411393552 665185.232423682,178268.411393553 665185.232423682,178268.411393553 665180.232423684)))
7 10.998 MULTIPOLYGON(((178263.411393552 665175.232423685,178253.411393551 665175.232423685,178253.411393551 665180.232423684,178258.411393551 665180.232423684,178263.411393552 665180.232423684,178263.411393552 665175.232423685)))
8 10.7548 MULTIPOLYGON(((178263.411393552 665175.232423685,178253.411393551 665175.232423685,178253.411393551 665180.232423684,178258.411393551 665180.232423684,178263.411393552 665180.232423684,178263.411393552 665175.232423685)))
What will be the easiest way to do it (I have little knowledge in QGIS/ArcMap and better knowledge with PostGIS ) ?
The only way I could figure out to do this was to create a table of unioned geometries in a CTE and use ST_Dump to produce the individual polygons (i.e., 1-2 and 3-4-5-6-7 in your question). Then select the max and min values of the data attribute from the original table (which I have called polygons, as you didn't specify a name) for the rows that intersect each new unioned geometry, grouping by that same unioned geometry.
WITH geoms (geom) AS
(SELECT (ST_Dump(ST_Union(geom))).geom FROM polygons)
SELECT max(p.data), min(p.data), g.geom
FROM polygons p, geoms g
WHERE ST_Intersects(p.geom, g.geom)
GROUP BY g.geom;
If you want to save this to a new table, then add CREATE TABLE new_table AS in front of the WITH. There may be a more efficient way, but this works. In your question, your input polygons are MultiPolygons, so if you want this in the output also, wrap the new unioned geometry in ST_Multi. Putting that all together, you get something like:
CREATE TABLE Unioned_geometries AS
WITH geoms (geom) AS
(SELECT (ST_Dump(ST_Union(geom))).geom FROM polygons)
SELECT max(p.data), min(p.data), ST_Multi(g.geom)
FROM polygons p, geoms g
WHERE ST_Intersects(p.geom, g.geom)
GROUP BY g.geom;
You can use ST_Dump and ST_Union, but you will have problems on bigger data: if you union millions of polygons, the resulting geometry will be very complex, and PostGIS isn't designed to work with big, complex geometries. You can use topology, or something like this:
CREATE TABLE block_buildings AS
SELECT
block_id
, ST_MemUnion(geometry)
FROM houses building
, LATERAL (
with recursive building_block AS (
SELECT building.id
UNION
SELECT building2.id FROM building_block
JOIN houses build_geom USING(id)
JOIN houses building2
ON st_dwithin(build_geom.geometry, building2.geometry, 0.5)
)
SELECT md5(string_agg(id::text, ',' order by id)) block_id FROM building_block JOIN houses USING(id)
) block
GROUP BY block_id
;
LATERAL works like a for loop: the subquery is evaluated for every row. WITH RECURSIVE is a recursive common table expression; it grows like a snowball. ST_DWithin is used for optimization; you can use ST_Dump on the output geometries if you want to merge only polygons that share a boundary or overlap. It is slow, but not very memory-consuming (because of LATERAL). It could be optimized (for example with PL/pgSQL), because every group is currently recomputed for each of its polygons. You can also add aggregates for the attributes to the aggregate query. Alternatively, if you create only the geometry first, you can aggregate the attributes onto it using ST_Within and ST_PointOnSurface, which is pretty fast if the data is well indexed.
-------- edit
Current PostGIS versions also have functions for clustering.
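As one possible illustration of the clustering approach, the sketch below uses the ST_ClusterDBSCAN window function (available from PostGIS 2.3) against the polygons table with the id, data, and geom columns from the question. Which functions the edit originally linked to is not recoverable, so this particular choice is an assumption. With eps := 0 and minpoints := 1, geometries at distance 0 from each other, meaning they touch or overlap, should end up in the same cluster.

```sql
-- Assign a cluster id to every polygon, then merge and aggregate per cluster.
-- Assumes a table polygons(id, data, geom), as in the first answer.
WITH clustered AS (
    SELECT id,
           data,
           geom,
           ST_ClusterDBSCAN(geom, eps := 0, minpoints := 1) OVER () AS cluster_id
    FROM polygons
)
SELECT cluster_id,
       min(data) AS min_data,
       max(data) AS max_data,
       ST_Multi(ST_Union(geom)) AS geom  -- keep MultiPolygon output
FROM clustered
GROUP BY cluster_id;
```

Each ST_Union here only merges the members of one cluster, which avoids building a single huge geometry for the whole table. Note that polygons touching only at a corner would also fall into one cluster, so the merged result may have several parts.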

SQL Server 2008 R2 STcontains spatial join using two tables

I've been stabbing at this for a while and am getting nowhere, so I'm hoping that someone with greater skills than I might have the answer.
I have two tables. One holds a set of latitude and longitude coordinates as separate columns. In the second table I have polygon shapes stored in a spatial geometry column.
The goal is to select all of the latitude and longitude pairs from table 1, which might be called separately as:
SQLSTRING = "SELECT LAT,LONG FROM dbo.Table1;"
The second table can be called using a scripting language loop to parse through each result one by one by using the following query:
SQLSTRING = "SELECT * FROM dbo.Table2 a WHERE a.POLY.STContains(geometry::STPointFromText('POINT(" & -Text Longitude Value from Table 1- & " " & -Text Latitude Value from Table 1- & ")',0))=1;"
So, my dilemma is that it surely would be possible to select all items from Table 1 and run them through a query that will only return those results where the latitude and longitude from table 1 are contained within any specified polygon stored in table 2. The scripting language loop is so obviously inefficient, so a single SQL query that could replace this and just return any matches would be a major time and resource saver.
Any help or pointers would be most gratefully appreciated. Thank you, in advance, for your advice.
Since you're working with spatial data, you can do a cross join (join all the rows from both tables together), then filter out what matches.
SELECT *
FROM dbo.Table2 AS a
, dbo.Table1 AS b
WHERE a.POLY.STContains(geometry::STPointFromText('POINT('+CAST(b.LONG AS VARCHAR)+' '+CAST(b.LAT AS VARCHAR)+')',0))=1;
One performance problem here is that the geometry object has to be generated repeatedly. It would be better to create a column that holds the geometry for Table1. Also make sure you have a spatial index on POLY in Table2.
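A rough sketch of that suggestion follows. The column name GeomPoint is a made-up name for illustration, and the sketch assumes LAT and LONG hold numeric values (or text that converts cleanly to float) and that the polygons in POLY use SRID 0, as in the query above.

```sql
-- Hypothetical column name; build each point once instead of on every comparison.
ALTER TABLE dbo.Table1 ADD GeomPoint geometry;
GO

-- geometry::Point takes (x, y, srid), so longitude comes first.
UPDATE dbo.Table1
SET GeomPoint = geometry::Point([LONG], LAT, 0);
GO

-- Return every lat/long pair that falls inside a polygon, with that polygon's row.
SELECT b.LAT, b.[LONG], a.*
FROM dbo.Table1 AS b
JOIN dbo.Table2 AS a
  ON a.POLY.STContains(b.GeomPoint) = 1;
```

The GO separators matter here because the UPDATE cannot be compiled in the same batch that adds the column.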

Distance Calculation with huge SQL Server database

I have a huge database of businesses (about 500,000) with zipcode, address, etc. I need to display those within a 100-mile area of the user's zipcode, in ascending order of distance. I have a table of zipcodes with their latitude and longitude. Which will be the faster/better solution?
Case 1: to calculate distance and sort by distance. I will have users current zipcode, latitude and longitude in session. I will calculate distance using a SQL Server function.
Case 2: to get all zipcodes in 50 miles area and get businesses with all those zipcodes. Here I will have to write a select in nested query while finding businesses.
I think case 1 will calculate distance for all businesses in database. While 2nd case will just fetch zipcodes and will end up fetching only required businesses. Hence case 2 should be better? I would appreciate any suggestion here.
Here is LINQ query I have for case 1.
var businessListQuery = (from b in _DB.Businesses
let distance = _DB.CalculateDistance(b.Zipcode,userLattitude,userLogntitude)
where b.BusinessCategories.Any(bc => bc.SubCategoryId == subCategoryId)
&& distance < 100
orderby distance
select new BusinessDetails(b, distance.ToString()));
int totalRecords = businessListQuery.Count();
var ret = businessListQuery.ToList().Skip(startRow).Take(pageSize).ToList();
On a side note app is in C# .
Thanks
You could do worse than look at the GEOGRAPHY datatype, for example:
CREATE TABLE Places
(
SeqID INT IDENTITY(1,1),
Place NVARCHAR(20),
Location GEOGRAPHY
)
GO
INSERT INTO Places (Place, Location) VALUES ('Coventry', geography::Point(52.4167, -1.55, 4326))
INSERT INTO Places (Place, Location) VALUES ('Sheffield', geography::Point(53.3667, -1.5, 4326))
INSERT INTO Places (Place, Location) VALUES ('Penzance', geography::Point(50.1214, -5.5347, 4326))
INSERT INTO Places (Place, Location) VALUES ('Brentwood', geography::Point(52.6208, 0.3033, 4326))
INSERT INTO Places (Place, Location) VALUES ('Inverness', geography::Point(57.4760, -4.2254, 4326))
GO
SELECT p1.Place, p2.place, p1.location.STDistance(p2.location) / 1000 AS DistanceInKilometres
FROM Places p1
CROSS JOIN Places p2
GO
SELECT p1.Place, p2.place, p1.location.STDistance(p2.location) / 1000 AS DistanceInKilometres
FROM Places p1
INNER JOIN Places p2 ON p1.SeqID > p2.SeqID
GO
geography::Point takes the latitude and longitude as well as an SRID (Spatial Reference ID number). In this case, the SRID is 4326, which is standard latitude and longitude (WGS 84). As you already have latitude and longitude, you can just ALTER TABLE to add the geography column, then UPDATE to populate it.
I've shown two ways to get the data out of the table, however you can't create an indexed view with this (indexed views can't have self-joins). You could though create a secondary table that is effectively a cache, that's populated based on the above. You then just have to worry about maintaining it (could be done through triggers or some other process).
Note that the cross join will give you 250,000,000,000 rows (500,000 × 500,000), but searching is simple as you only need to look at one of the place columns (i.e., SELECT * FROM table WHERE Place1 = 'Sheffield' AND distance < 100). The second query will give you roughly half as many rows, but a search then needs to consider both the Place1 and Place2 columns.
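For the question's actual use case, one source point and everything within 100 miles of it, a cached pair table may not be needed at all. The sketch below queries the Places table above directly. The variable names are illustrative, and the Coventry coordinates merely stand in for the user's zipcode location, which would come from the session in practice.

```sql
-- Hypothetical user location; in practice taken from the user's zipcode row.
DECLARE @user geography = geography::Point(52.4167, -1.55, 4326);
DECLARE @radiusMetres float = 100 * 1609.344;  -- 100 miles expressed in metres

SELECT p.Place,
       p.Location.STDistance(@user) / 1609.344 AS DistanceInMiles
FROM Places AS p
WHERE p.Location.STDistance(@user) <= @radiusMetres
ORDER BY DistanceInMiles;
```

With a spatial index on the Location column, the distance filter may be answerable without computing the distance for every row, which is the concern raised about case 1 in the question.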
