SQL Server 2008 R2 STcontains spatial join using two tables - sql-server

I've been stabbing at this for a while and am getting nowhere, so I'm hoping that someone with greater skills than I might have the answer.
I have two tables. The first holds a set of latitude and longitude coordinates as separate columns. The second table holds polygon shapes stored in a spatial geometry column.
The goal is to select all of the latitude and longitude pairs from table 1, which might be called separately as:
SQLSTRING = "SELECT LAT,LONG FROM dbo.Table1;"
The second table can be called using a scripting language loop to parse through each result one by one by using the following query:
SQLSTRING = "SELECT * FROM dbo.Table2 a WHERE a.POLY.STContains(geometry::STPointFromText('POINT(" & -Text Longitude Value from Table 1- & " " & -Text Latitude Value from Table 1- & ")',0))=1;"
So, my dilemma is that it surely would be possible to select all items from Table 1 and run them through a query that will only return those results where the latitude and longitude from table 1 are contained within any specified polygon stored in table 2. The scripting language loop is so obviously inefficient, so a single SQL query that could replace this and just return any matches would be a major time and resource saver.
Any help or pointers would be most gratefully appreciated. Thank you, in advance, for your advice.

Since you're working with spatial data, you can do a cross join (join all the rows from both tables together), then filter out what matches.
SELECT *
FROM dbo.Table2 AS a
, dbo.Table1 AS b
WHERE a.POLY.STContains(geometry::STPointFromText('POINT('+CAST(b.LONG AS VARCHAR)+' '+CAST(b.LAT AS VARCHAR)+')',0))=1;
One problem with performance here is that the query has to build the geometry object again for every row comparison. It would be better to create a column that holds the geometry for Table1. Also make sure you have a spatial index on POLY in Table2.
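A minimal sketch of that suggestion, under some assumptions that are not in the original post: LAT and LONG are numeric columns, Table2 has a clustered primary key (SQL Server requires one before a spatial index can be created), and the column name PT, the index name and the bounding box are made up for illustration. Building the point with geometry::Point also avoids the text round trip, which matters if the coordinates are float, because CAST(float AS VARCHAR) keeps only about six significant digits.

```sql
-- Hypothetical column name PT; stores the point once instead of rebuilding it per comparison.
ALTER TABLE dbo.Table1 ADD PT geometry NULL;
GO

-- geometry::Point takes (x, y, SRID), so longitude comes first.
UPDATE dbo.Table1
SET PT = geometry::Point([LONG], [LAT], 0)
WHERE [LONG] IS NOT NULL AND LAT IS NOT NULL;
GO

-- Assumption: dbo.Table2 has a clustered primary key.
-- The bounding box below assumes the polygons are in plain longitude/latitude degrees.
CREATE SPATIAL INDEX SIdx_Table2_POLY
ON dbo.Table2 (POLY)
WITH (BOUNDING_BOX = (-180, -90, 180, 90));
GO

-- One set-based query replaces the scripting-language loop.
SELECT b.LAT, b.[LONG], a.*
FROM dbo.Table1 AS b
INNER JOIN dbo.Table2 AS a
    ON a.POLY.STContains(b.PT) = 1;
```

Whether the optimizer actually picks the spatial index depends on the data, so it is worth checking the execution plan.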

Related

SQL Server: Comparing dates from multiple records

When a part is created in a table ("ASC_PMA_TBL"), a number is auto-generated. Any "sub-parts" that are subsequently created then have an associated number. So for example, the "master" part might be 18245, and it may have several subparts which would be "18245-50", or "18245-40", etc. Subparts are always identified by having the master part number, followed by a '-' then a two-digit number. Each sub part has a date associated with it ("EO_DATE"). All I want to do is display records where the "master" dates don't match each of the sub-parts dates. All data is in the one table "ASC_PMA_TBL".
Normally this would be easily achieved using a join. However in the database, the subparts are not related to their master through the use of foreign keys, so I'm having to find a different way of doing things.
Furthermore, the date field is a date/ time field, so to compare them I first have to convert the field into a date only field. I can do this, but then am unable to use the alias in my query!
Any help is much appreciated :)
I have tried creating temporary tables and using subqueries, but cannot solve this problem :(
UPDATE: Managed to solve the problem using temporary tables, truncating the part number of the sub-parts to match the master parts, and then joining the two to compare the dates. Might be messy, but it works!
SELECT
PMA_PART_ONLY,
CONVERT(DATE,PMA_EFFECT_DATE_OFF) As 'EO_DATE'
INTO
##MParts
FROM
ASC_PMA_TBL
WHERE
(PMA_PROC_CODE = 'M') AND
(PMA_EFFECT_DATE_OFF IS NOT NULL)
SELECT
PMA_PART_ONLY,
CONVERT(DATE,PMA_EFFECT_DATE_OFF) As 'EO_DATE',
SUBSTRING(PMA_PART_ONLY,0,CHARINDEX('-',PMA_PART_ONLY,0)) As 'MP_NO'
INTO
##SParts
FROM
ASC_PMA_TBL
WHERE
(PMA_PROC_CODE = 'S') AND
(PMA_EFFECT_DATE_OFF IS NOT NULL)
SELECT
##SParts.PMA_PART_ONLY As 'SUB_PART_NO',
##MParts.EO_DATE As 'M_PART_DATE',
##SParts.EO_DATE As 'S_PART_DATE'
FROM
##MParts INNER JOIN ##SParts ON ##SParts.MP_NO = ##MParts.PMA_PART_ONLY
WHERE
(##MParts.EO_DATE <> ##SParts.EO_DATE)
ORDER BY
SUB_PART_NO DESC
DROP TABLE ##MParts
DROP TABLE ##SParts
If you want to compare just the dates and not the times, you have to convert the dates:
select *
from ASC_PMA_TBL master
inner join ASC_PMA_TBL parts
ON parts.number like CAST(master.number AS VARCHAR(30)) + '-%'
where CAST(master.EO_DATE AS DATE) <> CAST(parts.EO_DATE AS DATE)
That's the main idea: get all master and part rows where the part number is like the master number followed by the "-" separator.
Note that "-" needs no escaping in this pattern. Only a wildcard character such as "_" would have to be written as [_] when performing LIKE.
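For comparison, here is the same self-join idea written against the column names that appear in the question's own temp-table script (PMA_PART_ONLY, PMA_PROC_CODE, PMA_EFFECT_DATE_OFF). It is only a sketch: it assumes PMA_PART_ONLY is a character column, that sub-parts use "-" as the separator as described in the question, and that part numbers contain no LIKE wildcard characters.

```sql
-- Self-join: match each sub-part to its master by prefix, then compare date parts only.
SELECT
    s.PMA_PART_ONLY                       AS SUB_PART_NO,
    CONVERT(DATE, m.PMA_EFFECT_DATE_OFF)  AS M_PART_DATE,
    CONVERT(DATE, s.PMA_EFFECT_DATE_OFF)  AS S_PART_DATE
FROM ASC_PMA_TBL AS m
INNER JOIN ASC_PMA_TBL AS s
    ON s.PMA_PART_ONLY LIKE m.PMA_PART_ONLY + '-%'
WHERE m.PMA_PROC_CODE = 'M'
  AND s.PMA_PROC_CODE = 'S'
  -- Rows with a NULL date on either side drop out, because <> against NULL is never true.
  AND CONVERT(DATE, m.PMA_EFFECT_DATE_OFF) <> CONVERT(DATE, s.PMA_EFFECT_DATE_OFF)
ORDER BY SUB_PART_NO DESC;
```

Repeating the CONVERT in the WHERE clause sidesteps the alias problem mentioned in the question, since a column alias from the SELECT list cannot be referenced in WHERE.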

SQL loop for updating rows

I'm new to SQL Server. The scenario is the following:
I have a CSV file with a bunch of serial numbers, which are unique.
Example:
Serial No
-----------
01561
21654
156416
89489
I also have a SQL Server database table containing several rows that can be identified by the serial number. For example, I have 6 rows in the SQL Server table with the serial no. 01561. Now I want to update a field in all these rows with "Yes". If it were only about this one number, I know the solution:
UPDATE dbo.Table1
SET DeleteFlag = 'Yes'
WHERE [Serial No] = '01561';
Unfortunately I have more than 10,000 serial numbers in the CSV that I have to do this for. Can you help me find a solution?
First, use the Import Data wizard to import the CSV: right-click the database, choose Tasks, then Import Data. It's a UI which is pretty self-explanatory, so it'll help you get the job done quickly and easily. Make a note of the name you give the table; SQL Server will try to give it a default name with a "$" in it. Change that to something like "MyTableImport". If the data is already in SQL Server, go to the next step.
Step 2 - You can do the UPDATE for the entire table via a join. All you're doing is matching the IDs to another table, right? Looping would be a bad idea here, especially since it'll take a while to loop through 10000+ rows and run an update FOR EACH ONE. That goes against an idea known as the "set-based approach", which to sum it up nicely is to do things all at once (google it though, because I'm horribly oversimplifying the idea for you). Here is a sample update join query for you:
UPDATE x
SET x.DeleteFlag='Yes'
FROM yourImportTable y
INNER JOIN yourLocal x ON y.SerialNo=x.SerialNo
Assuming you have created a temp table and loaded the serial numbers from the CSV into it, you can update your permanent table from the temp table using an update join like this:
UPDATE t1
SET t1.DeleteFlag = 'Yes'
FROM dbo.Table1 AS t1
INNER JOIN #TempTable2 AS t2
ON t1.Serial_No = t2.Serial_No
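If you would rather script the import than click through the wizard, something along these lines might work. It is a sketch under assumptions: the file path is a placeholder, the file has a header row and one serial number per line, and the staging table name #SerialImport is made up. The serial numbers are loaded as text so that leading zeros (as in 01561) survive.

```sql
-- Staging table; VARCHAR keeps leading zeros such as 01561.
CREATE TABLE #SerialImport (Serial_No VARCHAR(50) NOT NULL);

-- Placeholder path: the file must be readable by the SQL Server service account.
BULK INSERT #SerialImport
FROM 'C:\import\serials.csv'
WITH (
    FIRSTROW = 2,            -- skip the header line
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);

-- One set-based update instead of 10,000 single-row updates.
UPDATE t1
SET t1.DeleteFlag = 'Yes'
FROM dbo.Table1 AS t1
INNER JOIN #SerialImport AS s
    ON t1.Serial_No = s.Serial_No;

DROP TABLE #SerialImport;
```

If Serial_No in dbo.Table1 is a numeric column, the join still works through implicit conversion, but the leading zeros would already be lost on that side.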

Excel formula to SQL

I would like to add a column with computed values to my MSSQL database, but I don't know how to write the SQL code.
My data contains the columns PricePerUnit and Instance_Type.
I would like the new computed column to show what percentage cheaper each Instance_Type is compared with the most expensive row of that same Instance_Type. For example, the most expensive c5.12xlarge is on the first row (London) and is therefore 0% cheaper, but the same c5.12xlarge in Ireland is cheaper by 4.95%, and in Oregon that identical Instance_Type of c5.12xlarge is 15.84% cheaper. I would like the computed column in SQL to show 0%, 4.95%, 15.84% and so on.
In Excel I would use the following formula:
=(MAXIFS(A:A,B:B,B2)-A2)/MAXIFS(A:A,B:B,B2)
The database table is called AmazonEC2
Here is an image of it working in Excel. The first blue table is identical to the data in the SQL database, the black table represents what I want to achieve in SQL.
I don't think it would be good (it may not even be possible) to do this as a computed column.
From the comments it seems the data type of PricePerUnit is nvarchar(). If that is the case I must point out that is poor database design. I understand that may be beyond your control to change.
Anyway, I created an AmazonEC2 table as I think you may have it. Using your spreadsheet of data I created insert statements to populate that table with your data using the following formula.
=CONCATENATE("insert into AmazonEC2 (PricePerUnit, Instance_Type, Instance_Family, Location) values ('", A2, "', ", "'", B2, "', '", C2, "', '", D2, "')")
I built all of that into a dbfiddle so you can see it in action and so that other people here can manipulate the data and try different approaches.
Here is the final SQL statement to retrieve your data and calculate the PercentCheaper at the time of retrieval. You could also create a view based on this SQL statement.
SELECT
x.PricePerUnit
, x.Instance_Type
, x.Instance_Family
, x.Location
, FORMAT((x.MaxPricePerUnit - convert(decimal(10, 2), PricePerUnit)) / x.MaxPricePerUnit, 'P') AS PercentCheaper
FROM (
SELECT
PricePerUnit
, Instance_Type
, Instance_Family
, Location
, MAX(convert(decimal(10, 2), PricePerUnit)) OVER (PARTITION BY Instance_Type) AS MaxPricePerUnit
FROM AmazonEC2
) x;
What we are doing here is getting the maximum PricePerUnit for each Instance_Type in the subquery which I have aliased as "x". Then I select from that and perform the calculation to find the PercentCheaper for each row.
Since it seems PricePerUnit is an nvarchar() column you need to convert it to a number in order to do the calculations. Note that you do not need to convert MaxPricePerUnit because the conversion happened before it was used as an input to the MAX() function resulting in an output with a decimal(10,2) data type.
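The view mentioned above could look roughly like this. The view name is hypothetical, the dbo schema is an assumption, and the sketch returns the raw fraction rather than a formatted string so that callers can sort or filter on it. NULLIF guards against a division by zero if a maximum price is ever 0.

```sql
-- Hypothetical view name; wraps the same window-function calculation.
CREATE VIEW dbo.AmazonEC2_PercentCheaper
AS
SELECT
    x.PricePerUnit,
    x.Instance_Type,
    x.Instance_Family,
    x.Location,
    -- Fraction cheaper than the most expensive row of the same Instance_Type (0.0495 = 4.95%).
    (x.MaxPrice - x.Price) / NULLIF(x.MaxPrice, 0) AS FractionCheaper
FROM (
    SELECT
        PricePerUnit,
        Instance_Type,
        Instance_Family,
        Location,
        CONVERT(decimal(10, 2), PricePerUnit) AS Price,
        MAX(CONVERT(decimal(10, 2), PricePerUnit))
            OVER (PARTITION BY Instance_Type) AS MaxPrice
    FROM dbo.AmazonEC2
) AS x;
```

The decimal(10, 2) conversion follows the statement above; if the stored prices carry more than two decimal places, a wider scale would be needed to avoid rounding.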

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a DataTable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the DataTable. So I take the Guids I want to check (they come in chunks of 1000) and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure that WHERE IN will be able to maximally use an index on [Group], or if at all. However, if you had a second table containing the GUID values, and furthermore if that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
)
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.

How to calculate the "Nearest Neighbour" for multiple sources in SQL Server?

The "Nearest Neighbour" problem is very common when working with spatial data.
There's even some nice, simple documentation about how to do it with MS Sql Server in their docs!
I'm usually seeing examples where it's using 1x source Lat/Long and it returns the 'x' number of nearest neighbour Lat/Longs. Fine...
e.g.
USE AdventureWorks2012
GO
DECLARE @g geography = 'POINT(-121.626 47.8315)';
SELECT TOP(7) SpatialLocation.ToString(), City FROM Person.Address
WHERE SpatialLocation.STDistance(@g) IS NOT NULL
ORDER BY SpatialLocation.STDistance(@g);
In my case, I have multiple Lat/Long sources ... and for each source, need to return the 'x' number of nearest neighbours.
Here's my schema
Table: SomeGeogBoundaries
LocationId INTEGER PRIMARY KEY (it's not an identity, but a PK & FK)
CentrePoint GEOGRAPHY
Index:
Spatial Index on CentrePoint column. [Geography || MEDIUM, MEDIUM, HIGH, HIGH]
Sample data:
LocationId | CP Lat/Long
1 | 10,10
2 | 11,11
3 | 20,20
..
So for each location in this table, I need to find the closest.. say 5 other locations.
Update
So far, it looks like using a CURSOR is the only way .. but I'm open to more set based solutions.
You need to find the nearest neighbors within the same set?
SELECT *
FROM SomeGeogBoundaries as b
OUTER APPLY (
SELECT TOP(5) CentrePoint
FROM SomeGeogBoundaries as t
WHERE t.CentrePoint.STIntersects(b.CentrePoint.STBuffer(100)) = 1
ORDER by b.CentrePoint.STDistance(t.CentrePoint)
) AS nn
Two notes.
The where clause in the outer apply is to limit the search to (in this case) points that are within 100 meters of each other (assuming that you're using an SRID whose native unit of measure is meters). That may or may not be appropriate for you. If not, just omit the where clause.
I think this is still a cursor. Don't fool yourself into thinking that just because there is nary a declare cursor statement to be seen that the db engine has much of a choice but to iterate through your table and evaluate the apply for each row.
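A variant of the same APPLY pattern that leaves each location out of its own neighbour list and also returns the neighbour's id and distance might look like this. It is a sketch only: the output column names are made up, and the distance is in meters only if the geography data uses an SRID measured in meters (such as 4326).

```sql
-- For every location, the 5 closest other locations.
SELECT
    b.LocationId,
    nn.NeighbourId,
    nn.DistanceInMeters
FROM SomeGeogBoundaries AS b
CROSS APPLY (
    SELECT TOP (5)
        t.LocationId AS NeighbourId,
        t.CentrePoint.STDistance(b.CentrePoint) AS DistanceInMeters
    FROM SomeGeogBoundaries AS t
    WHERE t.LocationId <> b.LocationId                        -- skip the point itself
      AND t.CentrePoint.STDistance(b.CentrePoint) IS NOT NULL
    ORDER BY t.CentrePoint.STDistance(b.CentrePoint)
) AS nn
ORDER BY b.LocationId, nn.DistanceInMeters;
```

CROSS APPLY drops locations that have no neighbour at all; switch to OUTER APPLY to keep them with NULLs. The caveat above still applies: the engine evaluates the inner query once per outer row.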
