Calculate Nearest Neighbor in SQL Server

Calculate Nearest Neighbor in SQL Server - sql-server

I have 2 datasets, Fire Hydrants (3381 records) and Street Centerlines (6636 records). I would like to identify the Street nearest to each Hydrant record.
Following some SQL Server resources and tutorials I put together a script, but it has been running for over an hour now. I understand nearest neighbor may take awhile to run but it seems like something might be wrong in the logic.
select
h.OBJECTID,
st.FULL_ST_NAME
from WATER_HYDRANTS as h LEFT OUTER JOIN
STREET_CENTERLINES as st
on st.Shape.STDistance(h.SHAPE) is NOT NULL
ORDER BY st.Shape.STDistance(h.SHAPE) ASC
The reason I think something is wrong in the logic is because when I add a WHERE clause to select only one record with an ID, the query returns a list of the entire dataset. In the ObjectID column it is all the same value (e.g., 13992) and in the FULL_ST_NAME column is (I assume) a list of every street in ordered by proximity to the feature.
select
h.OBJECTID,
st.FULL_ST_NAME
from WATER_HYDRANTS as h LEFT OUTER JOIN
STREET_CENTERLINES as st
on st.Shape.STDistance(h.SHAPE) is NOT NULL
where h.OBJECTID = '13992'
ORDER BY st.Shape.STDistance(h.SHAPE) ASC
Ideally, each record in the objectID column will be unique and the FULL_ST_NAME column will have the street that is closest to each hydrant.
Please let me know if I can provide any other information. I tried to be thorough in my explanation and made an attempt at due diligence and research before coming to SO.
Thanks

Instead of LEFT OUTER JOIN to StreetCenterLines you need CROSS APPLY.
The TOP 1 with ORDER BY STDistance inside the CROSS APPLY gives you the nearest street for each Hydrant. (Your original query was giving ALL the streets for each hydrant.)
Easier to show than to explain; it's like this:
select
h.OBJECTID,
st2.FULL_ST_NAME
from WATER_HYDRANTS as h
CROSS APPLY (SELECT TOP 1 st.FULL_ST_NAME
FROM STREET_CENTERLINES as st
WHERE st.Shape.STDistance(h.SHAPE) IS NOT NULL
ORDER BY st.Shape.STDistance(h.SHAPE) ASC) as st2
It might take a long time to run though, since for every hydrant it has to calculate the distance to every street and then find the shortest one.

Related

SQL Server, 2 tables, one is input, second is database with items, find closest match

This is my first Stackflow question, I hope someone can help me out with this. I I am completely lost and a newbie at SQL.
I have two tables (which I overly simplified for this question), the first one has the customer info and the car tire that they need. The second one is simply filled with a tire id, and all of the information for the tires. I am trying to input only the customer ID and return the one closest tire that matches the input along with the values of both the selected tire and the customer's tire. The matches also need to be prioritized in that order (size most important, width next most important, ratio is least important). Any suggestions on how to do this or where to start? Is there anything I can look at to help me solve this problem? I have been trying many different procedures, and some nested selects, but nothing is getting me close. Thank you.
customertable (custno, custsize, custwidth, custratio)
1,17,255,50
2,16,235,50
etc...
tirecollection (tireid, tiresize, tirewidth, tireratio)
1,15,225,40
2,16,225,50
3,17,250,55
4,17,235,30
5,18,255,40
etc...

This is not a 100% complete solution, but may work towards coming up with a solution. The approach here is combining the tyre dimensions into one value and then ranking them within a tyre size partition. You could then pass in the customer tyre dimensions to get the closest match.
with CTE
as
(
select *, TyreSize + TyreWidth as [TyreDimensions]
from tblTyres
)
select TC.CustId, C.TyreId, C.TyreSize, C.TyreWidth, C.[TyreDimensions],
rank() over(partition by C.TyreSize order by C.[TyreDimensions]) as [RNK]
from tblTyreCustomer as TC
join CTE as C
on TC.CustTyreSize = C.TyreSize

Assuming you're running SQL Server 2008 or later, this should work (this assumes you want to get a result for a single customer on a case-by-case basis):
CREATE FUNCTION udf.GetClosestTireMatch
(
#CustomerNo int
)
RETURNS TABLE
AS RETURN
SELECT custno, tireid, tiresize, tirewidth, tireratio
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY sizediff, widthdiff, ratiodiff) AS rownum
, c.custno, c.custsize, c.custwidth, c.custratio, t.tireid, t.tiresize, t.tirewidth, t.tireratio
, ABS(c.custsize-t.tiresize) AS sizediff, ABS(c.custwidth-t.tirewidth) AS widthdiff, ABS(c.custratio-t.tireratio) AS ratiodiff
FROM (SELECT * FROM customertable WHERE custno = #CustomerNo) c
CROSS JOIN tirecollection
) sub
WHERE rownum = 1
GO
Then you run the function with:
SELECT * FROM udf.GetClosestTireMatch(5)
(where 5=the customernumber you're querying).

How to Correct This SQL CODE?

This is the question using AdvetureWorks2012.
Create a VIEW dbo.vw_Commissions to display the commissions earned last
year by all sales employees. Round the result set to two decimal places and do not include any salesperson who did not earn a commission. Include the
salesperson name, the commission earned, and the job title. Concatenate the
salesperson first and last names.
This code is not working for me. What am I screwing up?
USE AdventureWorks2012
GO
CREATE VIEW dbo.vw_Commissions
AS
SELECT
Sales.SalesPerson.SalesLastYear,
Person.Person.LastName,
Person.Person.FirstName,
HumanResources.Employee.JobTitle
FROM
Sales.SalesPerson
LEFT OUTER JOIN
Sales.SalesPerson ON Sales.SalesPerson.BusinessEntityID = Person.Person.BusinessEntityID
LEFT OUTER JOIN
Person.Person ON Person.Person.BusinessEntityID = HumanResources.Employee.BusinessEntityID

There are multiple problems with your query.
The biggest problem is it is not answering the questions asked. You are selecting SalesLastYear whereas the question asks to calculate the Commissions.
You are not filtering SalesPersons which have not earned any commission.
You need this to run only for last year.
Concatenate the FirstName and LastName.
The error you are getting is because you are using Sales.SalesPerson twice in your query. You need to give them alias names. However, I don't think you need two instances of the same table in this query. Also to use HumanResources.Employee.JobTitle column in the select list, you need to include table HumanResources.Employee in the from list.

SQL Server - Count the number of times the contents of a specified field repeat in a table

What's the best way to 'SELECT' a 'DISTINCT' list of a field from a table / view (with 'WHERE' criteria) and alongside that count the number of times that that field content repeats in the table / view?
In other words, I have an initial view that looks a bit like this:
I'd like a single SQL query to filter it (SELECT...WHERE...) so that we are only considering records where [ORDER COMPLETE] = False and [PERSONAL] = Null...
...and then create a distinct list of names with counts of the number of times each name appears in the previous table:
*Displaying the [ORDER COMPLETE] and [PERSONAL] fields is redundant by this point and could be dropped to simplify.
I can do the steps individually as above, but struggling to get a single query to do it all... any help appreciated!
Thanks in advance,
-Tim

This should just be the following
SELECT dbo.tblPerson.Person,
COUNT(dbo.tblPerson.Person) AS Count
FROM dbo.tblPerson
INNER JOIN dbo.tblNotifications ON dbo.tblPerson.PersonID = dbo.tblNotifications.AddresseeID
WHERE dbo.tblNotifications.Complete = 'False'
AND dbo.tblNotifications.Personal IS NULL
GROUP BY dbo.tblPerson.Person
ORDER BY COUNT(dbo.tblPerson.Person) DESC
You don't need your DISTINCT or TOP 100 PERCENT,
Here is a simplified fiddle

Well I got downvoted into oblivion (probably for displaying the full extent of my own ignorance!), but just in case someone from the future experiences the same problem as me and stumbles across this question while Googling (or whatever verb you use for "searching all digitised human knowledge" in the distant future), here's some sanitised code of the query I managed to get to work in the end - thanks to Mark Sinkinson's snippet for helping me realise the obvious...
SELECT DISTINCT TOP (100) PERCENT dbo.tblPerson.Person, COUNT(dbo.tblPerson.Person) AS CountPerson
FROM dbo.tblPerson INNER JOIN
dbo.tblNotifications ON dbo.tblPerson.PersonID = dbo.tblNotifications.AddresseeID
WHERE (dbo.tblNotifications.Complete = 'False') AND (dbo.tblNotifications.Personal IS NULL)
GROUP BY dbo.tblPerson.Person
ORDER BY CountPerson DESC

*EDIT** Thanks for all the input, and sorry for late reply. I have been away during the weekend without access to internet. I realized from the answers that I needed to provide more information, so people could understand the problem more throughly so here it comes:
I am migrating an old database design to a new design. The old one is a mess and very confusing ( I haven't been involved in the old design ). I've attached a picture of the relevent part of the old design below:
The table called Item will exist in the new design as well, and it got all columns that I need in the new design as well except one and it is here my problem begin. I need the column which I named 'neededProp' to be associated( with associated I mean like a column in the new Item table in the new design) with each new migrated row from Item.
So for a particular eid in table Environment there can be n entries in table Item. The "corresponding" set exists in table Room. The only way to know which rows that are associated in Item and Room are with the help of the columns "itemId" and "objectId" in the respective table. So for example for a particular eid there might be 100 entries in Item and Room and their "itemId" and "objectId" can be values from 1 to 100, so that column is only unique for a particular eid ( or baseSeq which it is called in table BaseFile).
Basically you can say that the tables Environment and BaseFile reminds of each other and the tables Item and Room reminds of each other. The difference is that some tables lack some columns and other may have some extra. I have no idea why it is designed like this from the beginning.
My question is if someone can help me with creating a query so that I can be able to find out the proper "neededProp" for each row in the Item-table so I can get that data into the new design?
*OLD-PART**This might be a trivial question but I can't get it to work as I want. I want to join a few tables as in the sql-statement below. If I start like this and run this query
select * from Environment e
join items ei on e.eid = ei.eid
I get like 400000 rows which is what I want. However if I add one more line so it looks like this:
select * from Environment e
join items ei on e.eid= ei.eid
left join Room r on e.roomnr = r.roomobjectnr
I get an insane amount of rows so there must be some multiplication going on. I want to get the same amount of rows ( like 400000 in this case ) even after joining the third table. Is that possible somehow? Maybe like creating a temporary view with the first 2 rows.
I am using MSSQL server.

So without knowing what data you have in your second query it's very difficult to say exactly how to write this out, and you're likely having a problem where there's an additional column that you are joining to in Rooms that perhaps you have forgotten such as something indicating a facility or hallway perhaps where you have multiple 'Room 1' entries as an example.
However, to answer your question regarding another way to write this out without using a temp table I've crufted up the below as an example of using a common table expression which will only return one record per source row.
;WITH cte_EnvironmentItems AS (
SELECT *
FROM Environment E
INNER JOIN Items I ON I.eid = E.eid
), cte_RankedRoom AS (
SELECT *
,ROW_NUMBER() OVER (ORDER BY R.UpdateDate DESC) [RN]
FROM Room R
)
SELECT *
FROM cte_EnvironmentItems E
LEFT JOIN cte_RankedRoom R ON E.roomnr = R.roomobjectnr
AND R.RN = 1

btw,do you want column from room table.if no then
select * from Environment e
join items ei on e.eid= ei.eid
where e.roomnr in (select r.roomobjectnr from Room r )
else
select * from Environment e
join items ei on e.eid= ei.eid
left join (select distinct roomobjectnr from Room) r on e.roomnr = r.roomobjectnr

Can i say that one column is more important then another in a full-text search?

I use Full-text indexing in SQL Server 2008.
I can't seem to find an answer to this question. Say that i have a full-text index on a table with the columns "Name" and "Description". I want to make the "Name" column much more important then the "Description" column.
So if you search for "Internet" the result with the name "Internet" will always come on top no matter how many occurences there is for internet in the description. It must be possible right?

I found this article just now.
http://www.goodercode.com/wp/?p=10
In my code it became like this. Works exactly as i want to :) Thanks for you help!
SELECT dbo.Item.Name, dbo.Item.[Description],NAME_SRCH.RANK AS rank1, DESC_SRCH.RANK AS rank2
FROM dbo.Item LEFT OUTER JOIN
FREETEXTTABLE(dbo.Item, name, 'Firefox') NAME_SRCH ON
dbo.Item.ItemId = NAME_SRCH.[KEY] LEFT OUTER JOIN
FREETEXTTABLE(dbo.Item, *, 'Firefox') DESC_SRCH ON
dbo.Item.ItemId = DESC_SRCH.[KEY]
ORDER BY rank1 DESC, rank2 DESC

You could add a computed column to your select list using Case where you have assign a value to that column based on the occurence of the search term in the columns of interest and then order by that column. So for example something like:
SELECT (CASE WHEN Name LIKE "%searchTerm%" THEN 10
WHEN Description LIKE "%searchTerm%" THEN 5
ELSE 0 END CASE) AS computedValue
FROM myTable
ORDER BY computedValue DESC

As I know there is no T-SQL syntax to specify rate of column.
But you can get this with trick of select.
(assuming using FREETEXTTABLE, but you can rewrite with other FTS constructions)
SELECT CASE
WHEN ISNULL(hi_prior.RANK) low_prior.KEY
ELSE hi_prior.RANK END --hi prior is selected whenever is possible
FROM
FREETEXTTABLE (Table1, (HiPriorColumn), #query) hi_prior
FULL OUTER JOIN
FREETEXTTABLE (Table1, (LowPriorColumn), #query) low_prior
ON (hi_prior.KEY = low_prior_KEY)
In case you need both result - use UNION, but multiply lowest on some rate: low_prior.RANK * 0.7

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight