Ride Sharing App - finding surrounding origins and destinations - sql-server

I looked at several other SO questions that seem somewhat related, but they are not quite what I need (or I'm just not smart enough to connect the dots).
I'm working on an app for a client. Their database holds the origin and destination of people who are traveling, limited (I believe) to places in the US and Canada, and the date when the trip will take place. The records are updated regularly. Call these "trips."
Users come to the site, and enter an origin and destination city, and a radius for each, indicating how far away from their desired origin/destination cities they are willing to travel in order to make their trip.
The job of the app is to find any/all trips that are already in the database, that are closest to the origin and destination that the user needs to travel.
My original thought was to find all origin cities in the database that are within the radius of the user's desired origin, then use that recordset to search the destination cities in the database for any/all cities within the radius of the user's desired destination.
I also need a decent (preferably free... low budget project here) API that can help look up the city geographic location and perform the actual radius calculation... I think.
Is what I'm looking to do even close to the best option? It looks like the hardest part will be finding all the existing cities in the database that are within the radius of the user's desired cities, which is a bit of a twist on the simpler query of just "find all cities in the radius of X city".
So, this is KINDA like an Uber situation, except the Uber driver is deciding what the trip parameters are, and the user just needs to know which Uber drivers are going from/to the places nearest those of the user (on the specified date, to boot).
Right now, users are just looking things up at a state level - BC to NY, and reading down rows of data looking at rides to find the ones that seem closest to what they need.
Thanks in advance, for any clever insights you smart folks might have!

Declare @DriverLat float = 41.744068
Declare @DriverLng float = -71.315024
Declare @Within int = 20
Select *
From (
    Select Distinct
         A.ZipCode
        ,A.CityName
        ,A.StateCode
        ,Miles = [dbo].[udf-Geo-Calc-Miles] (@DriverLat,@DriverLng,A.Lat,A.Lng)
    From [dbo].[ZipCodes] A
    Where CityType = 'D'
      and ZipType = 'S'
) A
Where Miles <= @Within
Order By Miles
Returns
The UDF
CREATE Function [dbo].[udf-geo-Calc-Miles] (@Lat1 float,@Lng1 float,@Lat2 Float,@Lng2 float)
Returns Float as
Begin
    Declare @Miles Float = (Sin(Radians(@Lat1)) * Sin(Radians(@Lat2))) + (Cos(Radians(@Lat1)) * Cos(Radians(@Lat2)) * Cos(Radians(@Lng2) - Radians(@Lng1)))
    Return Case When @Miles is null then 0 else abs((3958.75 * Atan(Sqrt(1 - power(@Miles, 2)) / @Miles))) end
End
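The two-radius search described in the question can be sketched outside the database as well. The following is a minimal Python sketch, not the answerer's method: it swaps in the haversine formula for the UDF's arctangent form, reuses the UDF's 3958.75-mile Earth radius, and assumes a hypothetical Trip record holding origin and destination coordinates. The names miles_between, Trip, and matching_trips are illustrative only.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.75  # same constant the UDF uses


def miles_between(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


@dataclass
class Trip:  # hypothetical shape of a stored trip
    trip_id: int
    origin: tuple       # (lat, lng)
    destination: tuple  # (lat, lng)


def matching_trips(trips, want_origin, origin_radius, want_dest, dest_radius):
    """Trips whose origin AND destination both fall inside the rider's radii,
    ordered by combined distance from the requested endpoints."""
    hits = []
    for trip in trips:
        d_origin = miles_between(*want_origin, *trip.origin)
        d_dest = miles_between(*want_dest, *trip.destination)
        if d_origin <= origin_radius and d_dest <= dest_radius:
            hits.append((d_origin + d_dest, trip))
    hits.sort(key=lambda pair: pair[0])
    return [trip for _, trip in hits]
```

Filtering on both distances in one pass avoids building an intermediate recordset of origin matches first; in SQL the same idea would be two distance predicates in a single WHERE clause.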

Related

How to find the name of a place from latitude and longitude value

I am analysing app data which contains lat value and lon value of a user visited places. I was able to export the data to tableau and plot it on the map but I want to find the name of place for each pair of lat and lon.
One solution could be, if I get a table of three columns (Lat, Lon, Place) then I can join it with my user data table to find the name of a place at a given Lat and Lon.
My question is, do we have a ready made table with the above three columns which I can import in my SQL-Server? I am interested in places of UK or London. Is there any other approach to achieve it?
You can get this from the Ordnance Survey, which should get you lat, long, and postcode:
https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-open.html
You'll then need another data source to map the postcode to location name (e.g. town, county etc). See the similar post below;
Where can I find a list of all UK _full_ postcodes including street name and their precise coordinates?
It might take a little fiddling about, and you're always going to have the issue with data being a little out of date but it should be good enough.
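Once a (Lat, Lon, Place) table exists, an exact join on coordinates will rarely match, so a nearest-row lookup is usually needed. Here is a minimal Python sketch of that idea. The PLACES rows are hypothetical, with rough illustrative coordinates, and km_between and nearest_place are names made up for this example.

```python
import math

# Hypothetical (lat, lon, place) rows; coordinates are rough illustrative values.
PLACES = [
    (51.4975, -0.1357, "Westminster"),
    (51.5390, -0.1426, "Camden Town"),
    (51.4826, -0.0077, "Greenwich"),
]


def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def nearest_place(lat, lon, places=PLACES, max_km=5.0):
    """Name of the closest reference row, or None if nothing is within max_km."""
    best = min(places, key=lambda row: km_between(lat, lon, row[0], row[1]))
    if km_between(lat, lon, best[0], best[1]) > max_km:
        return None
    return best[2]
```

The max_km cutoff keeps points far from every reference row from being labelled with an unrelated place name.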
I wrote an API wrapper in R for postcodes.io, which is a free UK postcode database. Check the original documentation so that you could create an API wrapper in your language of choice. Wrappers in languages other than R are also available.
If you use R, you can get the place names in the following way:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("erzk/PostcodesioR")
library(PostcodesioR)
rev_geo <- reverse_geocoding(0.127, 51.507)
It will return a list with extensive information about the latitude and longitude, e.g. wards, NUTS, administrative district, county, parish, constituency, CCG and many more.
There is also a bulk_reverse_geocoding() function which takes several lat and lon inputs.

Structure data in app engine ndb and speed up query

I am looking for some help as to the best way to structure data in app engine ndb using python, process it and query it later. I want to store temperature data at hourly intervals for different geographical regions.
I can think of two entity options, but there may be something much better. The first would be to store the hourly temperature in individual properties:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    # one property per hour (00:00 to 23:00); "00:00" is not a valid Python name
    h00 = ndb.FloatProperty()
    h01 = ndb.FloatProperty()
    ...
    h23 = ndb.FloatProperty()
Or I could store the data as one entity per hourly reading:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    time = ndb.TimeProperty()
    temp = ndb.FloatProperty()
(it might be better to store date and time as one property?)
I want to be able to query the datastore to calculate the Total, Max, Min, and average temperature for any given date range. In the first option I could potentially create 4 more properties to effectively pre-process and store the Total, Max etc for each day so if I wanted to query the total temperature for a year I would only have to sum 365 values as opposed to 8760? I'm not sure how I would do this in the second option?
I am relatively new to App Engine and the datastore, and I think I am still thinking in terms of relational databases, so any help would really be appreciated. Later on it might be necessary to store data in different time zones.
Thanks
Paul
Personally, I'd go with a variant of the first approach:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    temp = ndb.FloatProperty(repeated=True)
using the temp list to store temperatures by hour in order as you learn about them. I don't think the preprocessing per-date will add anything much: to compute whatever for a year, you'd still need to fetch 365 entities, and the delay for that will swamp the tiny amount of time required to sum up a few thousand numbers anyway.
In general, preprocessing is useful if you want to handily query by the new fields you create by such processing (e.g rapidly answer the question "which dates in locale X had average temperatures greater than 20 Celsius"). That does not seem to be your use case.
If anything, if it's common for you to have to compute many-month values, preprocessing to aggregate things per-month (into simpler TempDataMonth entities) may be more useful. Or, any other several-days period you find useful, of course (weeks, ten-day-groups, whatever). Those could be computed in a background task periodically checking which such periods have become complete since the last check. But, this is a bit beyond your question, so I'm not getting into fine-grained details.
The general idea is that minimizing the number of entities to fetch tends to be the single most important optimization; other optimizations are of course also possible, but, they tend to play second fiddle to that:-).
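To make the "fetch the per-day entities, then sum in memory" point concrete, here is a minimal sketch in plain Python. It deliberately does not use ndb: TempDay is a hypothetical stand-in for the TempData entity with a repeated temp property, and summarize is an illustrative helper, so the datastore query itself is assumed to have already returned the list of days.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TempDay:  # plain-Python stand-in for one TempData entity
    region: str
    day: date
    temp: list = field(default_factory=list)  # hourly readings, in order


def summarize(days, region, start, end):
    """Total, max, min and mean over every hourly reading in [start, end]."""
    readings = [
        t
        for d in days
        if d.region == region and start <= d.day <= end
        for t in d.temp
    ]
    if not readings:
        return None
    total = sum(readings)
    return {
        "total": total,
        "max": max(readings),
        "min": min(readings),
        "avg": total / len(readings),
    }
```

With one entity per day, a year is 365 fetched entities and at most 8760 floats to reduce in memory, which is the cheap part of the work.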

How to keep track changing items in a stock portfolio?

I have a system where people can pick some stocks and it values their portfolios, but I'm having trouble doing this in an efficient way on a daily basis because I'm creating entries for days that don't have any changes (think of it like I'm measuring the values and keeping version control so I can track changes to the way the portfolio is designed).
Here's an example (each day's portfolio with stock name and weight):
Day1:
ibm = 10%
microsoft = 50%
google = 40%
day5:
ibm = 20%
microsoft = 20%
google = 40%
cisco = 20%
I can measure the value of the portfolio on day1 and understand I need to measure it again on day5 (when it changed), but how do I measure day2-4 without recreating day1's entry in the database?
My approach right now (which I don't like) is to create a temp entry in my database when someone changes the portfolio. At the end of the day, when I calculate the values, I use the temp entry if there is one; otherwise I create a new entry (for day2-4) using the last day's data. The issue is that, as the data often doesn't change, I'm creating entries that are basically duplicates. The catch is: my stock data is all daily. I also thought of taking a portfolio that hasn't been updated in 3 days and finding the returns of the last 3 days for each stock, but I wasn't sure if there was a better solution.
Any ideas? I think this is a straightforward problem but I just can't see an efficient way of doing it.
Note: in finance terms, it's called creating a NAV, and most firms do it the inefficient way I'm doing it, but that's because the process was created like 50 years ago and hasn't changed. I think this problem is very similar to version control but I can't seem to make a solution.
In storage terms it makes most sense to just store:
UserId - StockId1 - 23% - 2012-06-25
UserId - StockId2 - 11% - 2012-06-26
UserId - StockId1 - 20% - 2012-06-30
So you see that stock 1 went down on the 30th. Now if you want to know the StockId1 percentage on the 28th you just select:
SELECT *
FROM stocks
WHERE datecolumn <= DATE('2012-06-28')
ORDER BY datecolumn DESC LIMIT 0,1
If it gives nothing back you did not have it, otherwise you get the last position back.
BTW. if you need for example a graph of stock 1 you could left join against a table full of dates. Then you can fill in the gaps easily.
Found this post here for example:
UPDATE mytable
SET number = (@n := COALESCE(number, @n))
ORDER BY date;
SQL QUERY replace NULL value in a row with a value from the previous known value
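The "store only changes, look up the latest row on or before a date" idea can also be sketched in application code. This is a minimal Python illustration, assuming a hypothetical in-memory change log keyed by stock and sorted by date; CHANGES reuses the sample weights from the answer, and weight_as_of is a made-up helper name.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical change log: one row per actual change, sorted by date per stock.
CHANGES = {
    "StockId1": [(date(2012, 6, 25), 0.23), (date(2012, 6, 30), 0.20)],
    "StockId2": [(date(2012, 6, 26), 0.11)],
}


def weight_as_of(stock_id, day, changes=CHANGES):
    """Most recent weight recorded on or before `day`, or None if not yet held."""
    rows = changes.get(stock_id, [])
    # Index of the first change dated strictly after `day`
    idx = bisect_right([d for d, _ in rows], day)
    return rows[idx - 1][1] if idx else None
```

Valuing day2 through day4 then needs no stored rows at all: each day's valuation asks for the weights as of that day and combines them with that day's prices.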

Movement Paths & Spatial-Temporal Queries in SQL Server

Hey, so I'm trying to figure out the best way of storing movement paths and then afterwards how they might be queried.
Let me try to explain a bit more. Say I have many cars moving around on a map and I want to determine if and when they're in a convoy. If I store just the paths then I can see that they travelled along the same road, but not if they were there at the same time. I can store the start and end times but that will not take into account the changes in speed of the two vehicles. I can't think of any obvious way to store and achieve this so I thought I'd put the question out there in case there's something I'm missing before trying to implement a solution. So does anyone know anything I don't?
Thanks,
Andrew
Well it depends on what type of movement information you have.
If you have some tables setup like:
Vehicle (Id, Type, Capacity, ...)
MovementPoint(VehicleId, Latitude, Longitude, DateTime, AverageSpeed)
This would allow you to query whether two cars passed through the same point within plus or minus 5 minutes of each other, like so:
Select * from Vehicle v INNER JOIN MovementPoint mp on mp.VehicleId = v.Id
WHERE v.Id = @FirstCarID
AND EXISTS
(
    SELECT 1 FROM Vehicle v2 INNER JOIN MovementPoint mp2 on mp2.VehicleId = v2.Id
    WHERE v2.Id = @SecondCarId
    AND mp2.Latitude = mp.Latitude AND mp2.Longitude = mp.Longitude
    AND mp2.DateTime BETWEEN DATEADD(minute,-5,mp.DateTime) AND DATEADD(minute,5,mp.DateTime)
)
You could also query for multiple points in common between multiple vehicles with specific time windows.
Also you could make the query check latitude and longitude values are within a certain radius of each other.
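Because two GPS fixes almost never have identical coordinates, the radius variant is probably the more practical one. Below is a minimal Python sketch of that check, assuming each track is a list of (lat, lon, datetime) tuples; metres_between and shared_points are hypothetical names, and the 100 m and 5 minute thresholds are arbitrary defaults.

```python
import math
from datetime import datetime, timedelta


def metres_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(min(1.0, math.sqrt(a)))


def shared_points(track_a, track_b, max_metres=100.0, window=timedelta(minutes=5)):
    """Timestamp pairs where the two vehicles were close in both space and time."""
    pairs = []
    for lat_a, lon_a, t_a in track_a:
        for lat_b, lon_b, t_b in track_b:
            close_in_time = abs(t_a - t_b) <= window
            if close_in_time and metres_between(lat_a, lon_a, lat_b, lon_b) <= max_metres:
                pairs.append((t_a, t_b))
    return pairs
```

A long run of consecutive shared points would suggest a convoy, while a single shared point is more likely a crossing. The nested loop is quadratic, so large tracks would need a time-sorted merge or a spatial index instead.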

SQL Server code to duplicate Excel calculation that includes circular reference

Is there a way to duplicate a formula with a circular reference from an Excel file in SQL Server? My client uses an Excel file to calculate a Selling Price. The Selling Price field is Costs / (1 - Projected Margin) = 6.5224 / (1 - 0.6) = 16.3060. One of the numbers that goes into the costs is commission, which is defined as Selling Price times a commission rate.
Costs = 6.5224
Projected Margin = 60%
Commissions = 16.3060(Selling Price) * .10(Commission Rate) = 1.6306 (which is part of the 6.5224)
They get around the circular reference issue in Excel because Excel lets them check the Enable Iterative Calculation option, which stops the iterations after 100 passes.
Is this possible using SQL Server 2005?
Thanks
Don
This is a business problem, not an IT one, so it follows that you need a business solution, not an IT one. It doesn't sound like you're working for a particularly astute customer. Essentially, you're feeding the commission back into the costs and recalculating commission 100 times. So the salesman is earning commission based on their commission?!? Seriously? :-)
I would try persuading them to calculate costs and commissions separately. In professional organisations with good accounting practices where I've worked before, these costs are often broken down into operating and non-operating or raw materials costs, which should improve their understanding of their business. To report total costs later on, add commission and raw materials costs. No circular loops and good accounting reports.
At banks where I've worked these costs are often called things like Cost (no commissions or fees), Net Cost (Cost + Commission) and then, bizarrely, Net Net Cost (Cost + Commission + Fees). Depending on the business model, cost breakdowns can get quite interesting.
Here are 2 sensible options you might suggest for them to calculate the selling price.
Option 1: If you're going to calculate margin to exclude commission then
Price before commission = Cost + (Cost * (1 - Projected Margin))
Selling price = Price before commission + (Price before commission * Commission)
Option 2: If your client insists on calculating margin to include commission (which it sounds like they might want to do) then
Cost price = Cost + (Cost * Commission)
Profit per Unit or Contribution per Unit = Cost price * (1-Projected Margin)
Selling Price = Cost Price + Profit per Unit
This is sensible in accounting terms and a doddle to implement with SQL or any other software tool. It also means your customer has a way of analysing their sales to highlight per unit costs and per unit profits when the projected margin is different per product. This invariably happens as the business grows.
Don't blindly accept calculations from spreadsheets. Think them through and don't be afraid to ask your customer what they're trying to achieve. All too often broken business processes make it as far as the IT department before being called into question. Don't be afraid of doing a good job and that sometimes means challenging customer requests when they don't make sense.
Good luck!
No, it is not possible
mysql> select 2+a as a;
ERROR 1054 (42S22): Unknown column 'a' in 'field list'
sql expressions can only refer to expressions that already exist.
You can not even write
mysql> select 2 as a, 2+a as b;
ERROR 1054 (42S22): Unknown column 'a' in 'field list'
The way to look at databases is as transactional engines that take data from one state into another state in one step (with combination of operators that operate not only on scalar values, but also on sets).
Whilst I agree with @Sir Wobin's answer, if you do want to write some recursive code, you may be able to do it by abusing Recursive Common Table Expressions:
with RecurseCalc as (
select CAST(1.5 as float) as Value,1 as Iter
union all
select 2 * Value,1+Iter from RecurseCalc where Iter < 100
), FinalResult as (
select top 1 Value from RecurseCalc order by Iter desc
)
select * from FinalResult option (maxrecursion 100)
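For the specific circular reference in the question, iteration may not be needed at all. If Price = (BaseCost + Rate * Price) / (1 - Margin), then solving for Price gives Price = BaseCost / (1 - Margin - Rate). The Python sketch below shows both the closed form and an Excel-style 100-pass iteration. It assumes that BaseCost is the 6.5224 total minus the 1.6306 commission (4.8918), which is an inference from the question's numbers rather than something the asker stated; the function names are illustrative.

```python
def selling_price_closed_form(base_cost, margin, commission_rate):
    """Solve price = (base_cost + commission_rate * price) / (1 - margin) for price."""
    denom = 1 - margin - commission_rate
    if denom <= 0:
        raise ValueError("margin + commission_rate must be below 1")
    return base_cost / denom


def selling_price_iterative(base_cost, margin, commission_rate, iterations=100):
    """Repeat the circular formula a fixed number of times, starting from 0."""
    price = 0.0
    for _ in range(iterations):
        price = (base_cost + commission_rate * price) / (1 - margin)
    return price


# Assumed inputs: 6.5224 total costs less the 1.6306 commission component.
base = 6.5224 - 1.6306
print(round(selling_price_closed_form(base, 0.60, 0.10), 4))
print(round(selling_price_iterative(base, 0.60, 0.10), 4))
```

The iteration converges here because each pass multiplies the previous price by Rate / (1 - Margin) = 0.25, which is below 1. The closed form is a single arithmetic expression, so it translates directly into a computed column or a plain SELECT.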
