Complicated SQL selection Statement in Big Query - sql-server

I have a table with 4 columns: latLng, dataTime, stage and index. I want to query the table so that the result is:
within a time range,
free of duplicate latLng values, returning the most recent row for each latLng (a string of the form "lat,lng", e.g. 23.123,1344),
ordered by stage and then index,
within the specified radius of the latLng.
I don't know how to achieve this in a SQL statement yet, and BigQuery is making matters worse because statements like DISTINCT are not supported. Even achieving just the first two items on the list has been really challenging. My attempt so far:
SELECT * FROM data.example
WHERE timeCollected IN
(SELECT max(timeCollected) FROM data.example GROUP BY latlng) order by col1,col2,col3
How can I achieve this? Thanks.
Update
With this statement I am able to query data within a given radius and time range, but I am still unable to deduplicate rows so that only the most recent one per latlng is returned (if more than one row has the same latlng, it should pick the most recent).
SELECT *,
( 3959 * acos( cos( radians(12.18663) ) * cos( radians( lat ) ) * cos( radians( long) - radians(6.65604) ) + sin( radians(12.18663) ) * sin( radians( lat ) ) ) ) AS distance
FROM data.example
WHERE TIMESTAMP(timeCollected) <= DATE_ADD(USEC_TO_TIMESTAMP(NOW()), 60, 'minute')
HAVING distance < 25
ORDER BY distance ASC

I was able to do it after some long hours. I don't know how efficient this statement is, but here it is:
SELECT latlng, max(TIMESTAMP(timeCollected)) as timeCollected,first(sessionKey) as session,first(stage) as stage,first(index) as index,
( 3959 * acos( cos( radians(9.0071) ) * cos( radians( lat ) ) * cos( radians( long) - radians(7.56511) ) + sin( radians(9.0071) ) * sin( radians( lat ) ) ) ) AS distance
FROM opendata.openQueryData WHERE TIMESTAMP(timeCollected) > DATE_ADD(USEC_TO_TIMESTAMP(NOW()), -60, 'minute') GROUP BY latlng,distance HAVING distance < 25
order by session,stage,index ASC
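The first attempt in the question (WHERE timeCollected IN (SELECT max(timeCollected) ... GROUP BY latlng)) keeps any row whose timestamp equals the maximum of some other latlng, because the subquery is never tied back to the row's own latlng. A minimal sketch of one common fix, joining each row to the latest timestamp of its own latlng, follows. It uses Python's sqlite3 module purely as a stand-in engine; the sample rows and the column name idx are invented for illustration, and BigQuery syntax may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (latlng TEXT, timeCollected TEXT, stage INTEGER, idx INTEGER)"
)
# Hypothetical sample rows: two readings for each of two locations.
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [
        ("9.00,7.56", "2015-01-01 10:00:00", 1, 1),
        ("9.00,7.56", "2015-01-01 11:00:00", 2, 1),
        ("9.10,7.60", "2015-01-01 10:00:00", 1, 2),
        ("9.10,7.60", "2015-01-01 09:00:00", 1, 3),
    ],
)

# Uncorrelated IN: 10:00 is the maximum for the second location, so the
# stale 10:00 row of the first location slips through as well.
naive_rows = conn.execute(
    "SELECT * FROM example WHERE timeCollected IN "
    "(SELECT MAX(timeCollected) FROM example GROUP BY latlng)"
).fetchall()

# Join each row to the latest timestamp of its own latlng.
LATEST_PER_LATLNG = """
SELECT e.latlng, e.timeCollected, e.stage, e.idx
FROM example AS e
JOIN (SELECT latlng, MAX(timeCollected) AS latest
      FROM example
      GROUP BY latlng) AS m
  ON e.latlng = m.latlng AND e.timeCollected = m.latest
ORDER BY e.stage, e.idx
"""
latest_rows = conn.execute(LATEST_PER_LATLNG).fetchall()
```

With this sample data the naive query returns three rows while the join returns one row per latlng. If two rows share both latlng and the latest timestamp, the join returns both, so a tie-breaker would still be needed.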

Related

Format column result with % char in SQL Server

I have the following SQL code that uses the lag function.
SELECT SalesAmount
,lag(SalesAmount) OVER (
ORDER BY DATE
) AS PreviousDaylSales
,(
SalesAmount - (
lag(SalesAmount) OVER (
ORDER BY DATE
)
)
) AS Difference
,(
SalesAmount - (
lag(SalesAmount) OVER (
ORDER BY DATE
)
)
) / (
lag(SalesAmount) OVER (
ORDER BY DATE
)
) * 100 AS PercentChange
FROM Sheet1$
Can I format the alias column PercentChange values with a % char?
Yes, you can use square bracket []:
select . . . as [%Change]
from Sheet1$;
You can use sub-query to avoid repetition of same expression :
SELECT t.*, (SalesAmount - PreviousDaylSales) AS Difference,
CONCAT('%', (SalesAmount - PreviousDaylSales) / PreviousDaylSales * 100) AS [%Change]
FROM (SELECT SalesAmount, lag(SalesAmount) OVER (ORDER BY DATE) AS PreviousDaylSales
FROM Sheet1$
) t;
You can use CONCAT function.
select ...
CONCAT(...*100, '%') as PercentChange
from Sheet1$
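The calculation the answers describe (previous value, difference, then a percentage rendered as text with a % sign) can be sketched outside the database as well. This is only an illustration in Python: the function name percent_changes and the sample figures are made up, and the list is assumed to be already ordered by date, as LAG() ... ORDER BY DATE would see it.

```python
def percent_changes(sales):
    """Return (amount, previous, difference, label) tuples for date-ordered sales."""
    rows = []
    previous = None
    for amount in sales:
        # The first row has no previous value, just as LAG() yields NULL there.
        difference = None if previous is None else amount - previous
        label = None
        if previous:  # skips both None and 0, avoiding a division by zero
            # Parenthesise the difference before dividing, then append '%'.
            label = f"{difference / previous * 100:.1f}%"
        rows.append((amount, previous, difference, label))
        previous = amount
    return rows


result = percent_changes([100, 120, 90])
```

Note that once the percentage is formatted with a % sign it is text, so any sorting or further arithmetic should use the numeric value instead.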

How to use recursive CTE to add resolution to a data set

I'm attempting to create a recursive CTE statement that adds blank rows in between data points, which will later be used for interpolation. I'm a beginner with SQL, this is my first time using CTEs, and I'm having some difficulty finding the proper way to do this.
I've attempted a few slight variations on the code below after some research, but I haven't grasped enough to see my issue yet. The following code should simulate sparse sampling by taking an observation every 4 hours from the sample data set. The second portion should add rows with their respective x values every 0.1 of an hour, which will later be filled with interpolated values derived from a cubic spline.
--Sample Data
create table #temperatures (hour integer, temperature double precision);
insert into #temperatures (hour, temperature) values
(0,18.5),
(1,16.9),
(2,15.3),
(3,14.1),
(4,13.8),
(5,14.7),
(6,14.7),
(7,13.5),
(8,12.2),
(9,11.4),
(10,10.9),
(11,10.5),
(12,12.3),
(13,16.4),
(14,22.3),
(15,27.2),
(16,31.1),
(17,34),
(18,35.6),
(19,33.1),
(20,25.1),
(21,21.3),
(22,22.3),
(23,20.3),
(24,18.4),
(25,16.8),
(26,15.6),
(27,15.4),
(28,14.7),
(29,14.1),
(30,14.2),
(31,14),
(32,13.9),
(33,13.9),
(34,13.6),
(35,13.1),
(36,15),
(37,18.2),
(38,21.8),
(39,24.1),
(40,25.7),
(41,29.9),
(42,28.9),
(43,31.7),
(44,29.4),
(45,30.7),
(46,29.9),
(47,27);
--1
WITH xy (x,y)
AS
(
SELECT TOP 12
CAST(hour AS double precision) AS x
,temperature AS y
FROM #temperatures
WHERE cast(hour as integer) % 4 = 0
)
Select x,y
INTO #xy
FROM xy
Select [x] As [x_input]
INTO #x_series
FROM #xy
--2
with recursive
, x_series(input_x) as (
select
min(x)
from
#xy
union all
select
input_x + 0.1
from
x_series
where
input_x + 0.1 < (select max(x) from x)
)
, x_coordinate as (
select
input_x
, max(x) over(order by input_x) as previous_x
from
x_series
left join
#xy on abs(x_series.input_x - xy.x) < 0.001
)
The first CTE works as expected and produces a list of 12 rows (a sample every 4 hours for two days), but the second produces a syntax error. The expected output would be something like
(4,13.8), (4.1,null/0), (4.2,null/0),....., (8,12.2)
I don't think you need a recursive CTE.
What about this:
SELECT DISTINCT n = number *1.0 /10 , #xy.x, #xy.y
FROM master..[spt_values] step
LEFT JOIN #xy
ON step.number*1.0 /10 = #xy.x
WHERE number BETWEEN 40 AND 480
This 480 is based on the two days you mention.
You don't even need the temp table:
SELECT DISTINCT n = number *1.0 /10 , #temperatures.temperature
FROM master..[spt_values] step
LEFT JOIN #temperatures
ON step.number *1.0 / 10 = #temperatures.hour
AND #temperatures.hour % 4 = 0
WHERE number BETWEEN 40 AND 480;
I don't think you need a recursive CTE here. I think a solution like this would be a better approach. Modify accordingly.
DECLARE @max_value FLOAT =
(SELECT MAX(hour) FROM #temperatures) * 10
INSERT INTO #temperatures (hour, temperature)
SELECT X.N / 10, NULL
FROM (
select CAST(ROW_NUMBER() over(order by t1.number) AS FLOAT) AS N
from master..spt_values t1
cross join master..spt_values t2
) X
WHERE X.N <= @max_value
AND X.N NOT IN (SELECT hour FROM #temperatures)
Using the temp table #xy produced in your step --1, the following will give you an x series:
;with x_series(input_x)
as
(
select min(x) AS input_x
from #xy
union all
select input_x + 0.1
from x_series
where input_x + 0.1 < (select max(x) from #xy)
)
SELECT * FROM x_series
OPTION (MAXRECURSION 1000); -- the default limit of 100 recursions is too low for roughly 440 steps
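Repeatedly adding 0.1 accumulates floating-point error, so a variation worth considering is to recurse over integer tenths and divide at the end. The following is a minimal sketch of that idea, run through Python's sqlite3 module as a stand-in engine rather than SQL Server. The table name xy, the CTE name steps, and the three sample rows (taken from the question's data) are choices made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xy (x REAL, y REAL)")
# Sparse samples: every 4th hour, first three rows of the question's data.
conn.executemany("INSERT INTO xy VALUES (?, ?)", [(0, 18.5), (4, 13.8), (8, 12.2)])

# Recurse over integer tenths (0, 1, 2, ...) so no rounding error builds up,
# then convert back to hours and attach the known samples with a LEFT JOIN.
SERIES = """
WITH RECURSIVE steps(n) AS (
    SELECT CAST(MIN(x) * 10 AS INTEGER) FROM xy
    UNION ALL
    SELECT n + 1 FROM steps
    WHERE n + 1 <= (SELECT CAST(MAX(x) * 10 AS INTEGER) FROM xy)
)
SELECT n / 10.0 AS input_x, xy.y
FROM steps
LEFT JOIN xy ON CAST(xy.x * 10 AS INTEGER) = steps.n
ORDER BY n
"""
rows = conn.execute(SERIES).fetchall()
```

Rows that fall between samples come back with y set to None (NULL), ready to be filled by the interpolation step.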

Getting Largest Distance from group of GPS Coordinates

I have a database with multiple rows of GPS coordinates. I know how to calculate the distance from a given lat/lng to any one of them in the database, but what I want to do is look at the coordinates of a set of rows and get the two rows that are farthest apart. I'd love to do this in SQL, but doing it in my application code would work too. Here is what I am doing to calculate the distance between two points:
ROUND(( 3960 * acos( cos( radians( :lat ) ) *
cos( radians( p.latitude ) ) * cos( radians( p.longitude ) - radians( :lng ) ) +
sin( radians( :lat ) ) * sin( radians( p.latitude ) ) ) ),1) AS distance
What we are trying to do is look at GPS data for a specific user and make sure they aren't moving wildly all over the country. All the coordinates for a user should be within a couple of miles of each other at most. Coordinates spread all over the country are a flag that there is malicious activity in our system. So I'd like to be able to quickly run through the data for a specific user and find the maximum distance between their readings.
I thought about just running a Max/Min on the lat and lng separately and set an internal threshold for what is acceptable. And maybe that is easier, but if what I asked in the first part is possible, that would be best.
If you have SQL Server 2008 or later then you can use GEOGRAPHY to calculate the distance, e.g.:
DECLARE @lat1 DECIMAL(19,6) = 44.968046;
DECLARE @lon1 DECIMAL(19,6) = -94.420307;
DECLARE @lat2 DECIMAL(19,6) = 44.33328;
DECLARE @lon2 DECIMAL(19,6) = -89.132008;
SELECT GEOGRAPHY::Point(@lat1, @lon1, 4326).STDistance(GEOGRAPHY::Point(@lat2, @lon2, 4326));
This makes the problem pretty trivial?
For a set of lats/ longs for a user you would need to calculate the distance between each set and then return the highest distance. Putting this all together, you could probably do something like this:
DECLARE @UserGPS TABLE (
UserId INT, --the user
GPSId INT, --the incrementing unique id associated with this GPS reading (could link to a table with more details, e.g. time, date)
Lat DECIMAL(19,6), --latitude
Lon DECIMAL(19,6)); --longitude
INSERT INTO @UserGPS SELECT 1, 1, 44.968046, -94.420307; --User #1 goes on a very long journey
INSERT INTO @UserGPS SELECT 1, 2, 44.33328, -89.132008;
INSERT INTO @UserGPS SELECT 1, 3, 34.12345, -92.21369;
INSERT INTO @UserGPS SELECT 1, 4, 44.978046, -94.430307;
INSERT INTO @UserGPS SELECT 2, 1, 44.968046, -94.420307; --User #2 doesn't get far
INSERT INTO @UserGPS SELECT 2, 2, 44.978046, -94.430307;
--Make a working table to store the distances between each set of co-ordinates
--This isn't strictly necessary; we could change this into a common-table expression
DECLARE @WorkTable TABLE (
UserId INT, --the user
GPSIdFrom INT, --the id of the first set of co-ordinates
GPSIdTo INT, --the id of the second set of co-ordinates being compared
Distance NUMERIC(19,6)); --the distance
--Get the distance between each and every combination of co-ordinates for each user
INSERT INTO
@WorkTable
SELECT
c1.UserId,
c1.GPSId,
c2.GPSId,
GEOGRAPHY::Point(c1.Lat, c1.Lon, 4326).STDistance(GEOGRAPHY::Point(c2.Lat, c2.Lon, 4326))
FROM
@UserGPS c1
INNER JOIN @UserGPS c2 ON c2.UserId = c1.UserId AND c2.GPSId > c1.GPSId;
--Note this is a self-join, but single-tailed. So we compare each set of co-ordinates to each other set of co-ordinates for a user
--This is handled by the "c2.GPSID > c1.GPSId" in the JOIN clause
--As an example, say we have three sets of co-ordinates for a user
--We would compare set #1 to set #2
--We would compare set #1 to set #3
--We would compare set #2 to set #3
--We wouldn't compare set #3 to anything (as we already did this)
--Determine the maximum distance between all the GPS co-ordinates per user
WITH MaxDistance AS (
SELECT
UserId,
MAX(Distance) AS Distance
FROM
@WorkTable
GROUP BY
UserId)
--Report the results
SELECT
w.UserId,
g1.GPSId,
g1.Lat,
g1.Lon,
g2.GPSId,
g2.Lat,
g2.Lon,
md.Distance AS MaxDistance
FROM
MaxDistance md
INNER JOIN @WorkTable w ON w.UserId = md.UserId AND w.Distance = md.Distance
INNER JOIN @UserGPS g1 ON g1.UserId = md.UserId AND g1.GPSId = w.GPSIdFrom
INNER JOIN @UserGPS g2 ON g2.UserId = md.UserId AND g2.GPSId = w.GPSIdTo;
Results are:
UserId  GPSId  Lat        Lon         GPSId  Lat        Lon         MaxDistance
1       3      34.123450  -92.213690  4      44.978046  -94.430307  1219979.460185
2       1      44.968046  -94.420307  2      44.978046  -94.430307  1362.820895
Now I made a LOT of assumptions about what data you are holding as there was no information about the detail of this in your question. You would probably need to adapt this to some degree?
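Since the question allows for doing this in application code, the same "compare every pair once, keep the largest" idea can be sketched in Python. This is an illustration only: it uses the haversine formula on a sphere with a mean radius of 6371 km instead of the GEOGRAPHY type, so its distances will differ slightly from the ellipsoidal figures above, and the function names are invented here.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def farthest_pair(points):
    """points: list of (gps_id, lat, lon) with at least two entries.

    Returns (distance_km, id_a, id_b) for the pair that is farthest apart.
    combinations() visits each unordered pair once, like the single-tailed
    self-join on c2.GPSId > c1.GPSId.
    """
    return max(
        (haversine_km(a[1], a[2], b[1], b[2]), a[0], b[0])
        for a, b in combinations(points, 2)
    )


# User #1's readings from the answer above.
user1 = [
    (1, 44.968046, -94.420307),
    (2, 44.33328, -89.132008),
    (3, 34.12345, -92.21369),
    (4, 44.978046, -94.430307),
]
distance_km, id_a, id_b = farthest_pair(user1)
```

Comparing all pairs is quadratic in the number of readings, which is fine for a handful of points per user. For large histories, the min/max bounding-box check mentioned in the question is a cheaper first filter.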

Sql Query to calculate Distance between 2 latitudes and Longitudes

This question has been posted many times, but I have to ask again because of the incorrect result I am getting. Can anyone tell me what I am doing wrong?
What I need is the name of the nearest warehouse for a given customer in Cust_Master, and the distance between that warehouse and the customer.
I have 2 tables as below.
WH_Master
WH_Name Latitude Longitude
----------- --------- ---------
Horamavu 13.02457 77.65723
White Field 12.985278 77.729899
Hennur 13.030672 77.634034
Cust_Master
Cust_ID Latitude Longitude
------- --------- ---------
Cust-1 13.025579 77.6515
I have tried the option below, and it gives me the wrong distance and location. For the customer in the example, Horamavu is the nearest warehouse, and according to Google the distance is 1.8 km. But I am getting 0.751, which is wrong.
The query I used is below.
SELECT Top 1 WH_Name, (( 6367.45 * acos( cos( radians(13.025579) ) * cos( radians( Latitude ) ) * cos( radians( Longitude ) - radians(77.6515) ) + sin( radians(13.025579) ) * sin( radians( Latitude ) ) ) )) AS distance_KM FROM WH_Master
Unfortunately this always returns the same WH_Name, and the distance I get is also wrong. Can you please let me know the correct query? I am using MS SQL Server as my database.
If you are using SQL 2008 or later, you should use a geography data type, and the STDistance function.
eg:
-- WKT points are written as POINT(longitude latitude)
declare @t table (wh_name nvarchar(50), p geography)
insert @t values
('horamavu',geography::STGeomFromText('POINT(77.65723 13.02457)', 4326)),
('white field',geography::STGeomFromText('POINT(77.729899 12.985278)', 4326)),
('hennur', geography::STGeomFromText('POINT(77.634034 13.030672)', 4326))
select wh_name,
p.STDistance(geography::STGeomFromText('POINT(77.6515 13.025579)', 4326)) distance_m
from @t
order by distance_m
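Are you sure the Google distance is the straight-line ("as the crow flies") distance, and not the distance by road?
As for your original query, if you want TOP 1 to return the nearest warehouse, you need to specify an ORDER BY; without it, which row comes back is arbitrary.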
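As a cross-check outside the database, the formula from the question's query can be evaluated for all three warehouses and the smallest distance picked explicitly, which is what TOP 1 with ORDER BY would do. This is a sketch in Python, assuming the same spherical law of cosines and the 6367.45 km radius used in the query. The function name is invented, and the result is a straight-line distance, not a road distance.

```python
from math import acos, cos, radians, sin

EARTH_RADIUS_KM = 6367.45  # the radius used in the question's query


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, the same formula as the question's query."""
    value = (
        cos(radians(lat1)) * cos(radians(lat2)) * cos(radians(lon2) - radians(lon1))
        + sin(radians(lat1)) * sin(radians(lat2))
    )
    # Clamp to [-1, 1] so rounding error cannot push acos() out of its domain.
    return EARTH_RADIUS_KM * acos(max(-1.0, min(1.0, value)))


# Data from the WH_Master and Cust_Master tables above: (latitude, longitude).
WAREHOUSES = {
    "Horamavu": (13.02457, 77.65723),
    "White Field": (12.985278, 77.729899),
    "Hennur": (13.030672, 77.634034),
}
CUSTOMER = (13.025579, 77.6515)

distances = {
    name: great_circle_km(CUSTOMER[0], CUSTOMER[1], lat, lon)
    for name, (lat, lon) in WAREHOUSES.items()
}
nearest = min(distances, key=distances.get)  # the "ORDER BY distance, TOP 1" step
```

For this data, Horamavu comes out nearest at well under a kilometre in a straight line, which fits the suggestion that the 1.8 km figure from Google is a road distance.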

sort by distance using latitude and longitude

I am trying to build a store locator, and am having trouble forming my sql statements. I have the following so far:
SELECT TOP 3 Custno
, ( 3959
* acos( cos( radians(36) )
* cos( radians( Latitude ) )
* cos( radians( Longitude ) - radians(120) )
+ sin( radians(120) ) * sin( radians( Latitude ) )
)
) AS distance
FROM Customers
ORDER BY distance
When I run that statement I get:
Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command.
The results, if any, should be discarded.
However the query works when I remove the order by clause and when I change the order by clause to use Custno. What is causing this error and how can I avoid it?
Starting with SQL Server 2008 there's a Geography data type which is designed for things like this. Here's a couple links:
http://msdn.microsoft.com/en-us/library/ff929109.aspx
http://blogs.msdn.com/b/isaac/archive/2008/10/23/nearest-neighbors.aspx
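One possible contributor, not confirmed anywhere in this thread: the query uses sin( radians(120) ) where the formula expects the sine of the origin latitude (36 here). With that term the value passed to acos() can exceed 1 for some rows, which is outside its domain. The Python sketch below only illustrates that arithmetic. The customer position (latitude 60, longitude 120) is hypothetical, and whether this is what triggers the severe error in SQL Server is an assumption.

```python
from math import acos, cos, radians, sin


def acos_argument(lat0, lng0, lat, lng, sin_term_deg):
    """Value handed to acos(); sin_term_deg is the angle used in the sin() factor."""
    return (
        cos(radians(lat0)) * cos(radians(lat)) * cos(radians(lng) - radians(lng0))
        + sin(radians(sin_term_deg)) * sin(radians(lat))
    )


# Hypothetical customer at latitude 60, longitude 120; search origin is (36, 120).
as_posted = acos_argument(36, 120, 60, 120, sin_term_deg=120)  # sin(radians(120))
corrected = acos_argument(36, 120, 60, 120, sin_term_deg=36)   # sin(radians(36))

try:
    acos(as_posted)  # about 1.15, outside the domain [-1, 1]
    domain_error = False
except ValueError:
    domain_error = True

# Clamping also guards against tiny rounding overshoots when two points coincide.
distance_miles = 3959 * acos(max(-1.0, min(1.0, corrected)))
```

If this is the cause, correcting the sine term, and clamping the value before acos() as a safeguard, would be worth trying before or alongside a move to the geography type.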
