Extracting road distance without Google Maps API

Is there a global alternative to Google Maps API related primarily to the extraction of road distances?
Say I have latitude and longitude information for a great number of locations, from which I would like to build a matrix of distances. Google sets a limit on our queries (2,500 per day), which is significantly lower than what I need. Instead of paying Google or approximating with geodesic distances, what would my alternatives be?

Try Distancematrix.ai. It is an alternative to the Google Maps Distance Matrix API, and it has some additional advantages.
Main features:
Unlimited matrix size
Accounts for traffic conditions, traffic jams, and restrictions
Distances can be calculated for driving, walking, or bicycling
A departure or arrival time option can be used to build a forecast
Easy migration: you just have to switch the endpoint (see the sketch below)
Disclaimer: I work at the company that creates the API.
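For illustration, here is a minimal Python sketch of what "just switch the endpoint" might look like for a Google-style Distance Matrix request. The base URL, parameter names, and response shape are assumptions modeled on the Google API, so check the provider's documentation before relying on them.

# Minimal sketch: Google-style Distance Matrix request with the base URL swapped.
# The endpoint path and parameter names are assumptions, not confirmed API details.
import requests

API_KEY = "YOUR_KEY"  # placeholder
BASE_URL = "https://api.distancematrix.ai/maps/api/distancematrix/json"  # assumed endpoint

def road_distances(origins, destinations):
    """origins/destinations: lists of (lat, lon) tuples; returns metres by road."""
    params = {
        "origins": "|".join("{},{}".format(lat, lng) for lat, lng in origins),
        "destinations": "|".join("{},{}".format(lat, lng) for lat, lng in destinations),
        "mode": "driving",
        "key": API_KEY,
    }
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    rows = resp.json()["rows"]
    # Each row corresponds to an origin; each element to a destination.
    return [[el["distance"]["value"] for el in row["elements"]] for row in rows]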

Related

Google App Engine: how to split fees per datastore namespace

I'd like to make a GAE app multi-tenant to cater to different clients (companies); datastore namespaces seem like a GAE-endorsed solution. Is there a meaningful way to split GAE fees among clients/namespaces? GAE costs for an app depend mainly on user activity (backend instance uptime), because new instances are created or, after a roughly 15-minute delay, terminated in proportion to server load, not the total volume of data a user has created. Ideally, the way the fees are split should be meaningful and explainable to the clients.
I guess the fairest fee-splitting solution is just to create a new app for each new client, so all costs are reported separately, yet the total cost will grow; I expect a few apps running on the same instances would use server resources more economically.
Every app engine request is logged with a rough estimated cost measurement. It is possible to log the namespace/client associated with every request and query the logs to add up the estimated instance costs for that namespace. Note that the estimated cost field is deprecated and may be inaccurate. It is mostly useful as a rough guide to the proportion of instance cost associated with each client.
As far as datastore pricing goes, the cloud console will tell you how much data has been stored in each namespace, and you can calculate costs from that. For reads/writes, we have set up a logging system to help us track reads and writes per namespace (i.e. every request tracks the number of datastore reads and writes it does in each namespace and logs these numbers at the end of the request).
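As a rough illustration of the kind of logging described above, here is a minimal Python sketch of per-namespace read/write accounting; the counter helpers and the log format are hypothetical, not part of any GAE API.

# Minimal sketch: count datastore ops per namespace during a request and log
# them at the end, so a later log query can sum per-client usage.
# The helper names and log format are made up for illustration.
import logging
from collections import Counter

_reads = Counter()
_writes = Counter()

def count_read(namespace, n=1):
    _reads[namespace] += n

def count_write(namespace, n=1):
    _writes[namespace] += n

def log_request_costs():
    """Call at the end of every request; clears the counters afterwards."""
    for ns in set(_reads) | set(_writes):
        logging.info("datastore-ops namespace=%s reads=%d writes=%d",
                     ns, _reads[ns], _writes[ns])
    _reads.clear()
    _writes.clear()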
The bottom line is that with some investment in infrastructure and logging, it is possible to roughly track costs per namespace. But no, App Engine does not make this easy, and it may be impossible to calculate very accurate cost estimates.

What is currently the best accepted way to get incrementing numbers in google app engine?

We have an application to move to Google App Engine, and the owners require it to continue using incrementing numbers, or an approximation thereof; i.e., it would be OK if each server had a block of 100 or so numbers to dish out (sharding).
Is there a library for this or is it still roll your own?
Roll your own. But that's a highly unscalable requirement; you won't be able to get more than about one number per second in the simple case (a singleton counter datastore entity).
There are solutions if you have wiggle room of around 100, such as a sharded counter or block allocator; a minimal sketch of the block-allocator idea follows.
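This is a minimal sketch of the "block of 100 numbers per server" idea on the App Engine datastore, assuming the Python runtime with the (legacy) ndb library; the entity kind, property name, and block size are made up for illustration.

# Minimal sketch: transactionally reserve a block of numbers from a single
# allocator entity. Each instance hands out numbers from its block in memory
# and only touches the datastore again when the block runs out, keeping
# contention on the allocator entity low.
from google.appengine.ext import ndb

BLOCK_SIZE = 100  # illustrative block size

class NumberBlock(ndb.Model):
    next_value = ndb.IntegerProperty(default=0)

@ndb.transactional
def allocate_block():
    """Reserve a contiguous [start, end) range of numbers in one transaction."""
    key = ndb.Key(NumberBlock, 'allocator')
    block = key.get() or NumberBlock(key=key)
    start = block.next_value
    block.next_value = start + BLOCK_SIZE
    block.put()
    return start, start + BLOCK_SIZE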

Best approach to caching geocoded results

We have a database with roughly 1.5 million rows, soon to be closer to 3 million. Each row has an address. Our service is responsible for visualizing each row on a map in several ways. The issue is that a map will often display well over a thousand different rows. Because of that, it is impractical to have the client (or the server) look up all 1,000+ coordinates on the fly from something such as the Google Maps API v3.
Ideally, we'd like to store the coordinate values in the table so they are ready for use whenever needed. However, rate limiting would make it take months to cache all the data.
Is there a service that has no limit, or maybe allows for multiple addresses to be sent at a time to expedite the process?
You could try LiveAddress by SmartyStreets -- the addresses will not only be geocoded but also verified. (Though you won't get geocoding results for addresses that don't verify.)
You could upload a list with all your addresses or process them through our API. (If your addresses aren't split into components, you'll need to use the API for now, which can receive 100 addresses per request.) Granted, for 1 million+ rows, it's not free (unless you're a non-profit or educational), but the service scales in the cloud and can handle thousands of addresses per second. There are plans which fit millions of lookups, all the way to just plain Unlimited. (By the way, I work at SmartyStreets.)
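As a rough sketch of the batching approach, assuming an HTTP endpoint that accepts up to 100 lookups per request: the URL, auth parameters, and payload shape below are illustrative guesses, so consult the SmartyStreets documentation for the actual API.

# Minimal sketch: submit addresses in batches of 100 (the per-request cap
# mentioned above). Endpoint, auth keys, and payload fields are assumptions.
import requests

API_URL = "https://us-street.api.smartystreets.com/street-address"  # assumed
AUTH = {"auth-id": "YOUR_ID", "auth-token": "YOUR_TOKEN"}  # placeholders

def geocode_all(addresses, batch_size=100):
    """addresses: list of single-line address strings; returns raw API results."""
    results = []
    for i in range(0, len(addresses), batch_size):
        batch = [{"street": a} for a in addresses[i:i + batch_size]]
        resp = requests.post(API_URL, params=AUTH, json=batch, timeout=60)
        resp.raise_for_status()
        results.extend(resp.json())
    return results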
Most addresses have "Zip9" precision (meaning, the coordinates are accurate to the 9-digit ZIP code level, which is like block-level). Some are Zip7 or Zip5 which are less accurate but might be good enough for your needs.
If you need to arrange precision skydiving, though, you might consider a more dedicated mapping service that gives you rooftop-level precision and allows you to store the data. I know that you can store and cache SmartyStreets data, but map services have different restrictions. For example, Google has rooftop-level data for most US addresses, and lets you cache their data to improve performance, but you aren't allowed to store it in a database and build your own data set. You could also pay Google to raise your rate limits, though it's a little pricey.
I'm not sure what terms the other mapping providers have. (Geocoding services like TAMU have better accuracy but less capable infrastructure, thus rate limits, although you can probably pay to have those raised or lifted.)

How to estimate hosting services cost on GAE?

I'm building a system which I plan to deploy on Google App Engine. Current pricing is described here:
Google App Engine - Pricing and Features
I need an estimate of cost per client managed by the webapp. The cost won't be very accurate until I have completed development. GAE pricing is calculated in such fine-grained units (datastore READs and WRITEs, for example) that estimating the operation cost per user becomes a very daunting task.
My agile dev process leaves me even more clueless about determining cost. I've been using my user stories to create a cost baseline per user story, then roughly estimating how a user would execute each story's workflow to compute a simplistic estimate.
As I see it, computing estimates for the Datastore API is overly complex for a startup project. The other costs are a bit easier to grasp. Unfortunately, I need to give an approximate cost to my manager!
Has anyone undergone such a task? Any pointers would be great, regarding tools, examples, or any other related information.
Thank you.
Yes, it is possible to do cost estimate analysis for app engine applications. Based on my experience, the three major areas of cost that I encountered while doing my analysis are the instance hour cost, the datastore read/write cost, and the datastore stored data cost.
YMMV based on the type of app that you are developing, of course. If it is an intense OLTP application that handles simple but frequent CRUD operations on your data records, most of the cost will be in the datastore read/write operations, so I would suggest starting your estimate with that resource.
For datastore reads/writes, writing is generally much more expensive than reading the data. This is because the write cost takes into account not only the cost to write the entity, but also the cost to write all the indexes associated with the entity. I would suggest reading Google's article about the life of a datastore write, especially the part about the Apply phase, to understand how to calculate the number of writes per entity based on your data model.
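To make the arithmetic concrete, here is a back-of-the-envelope sketch using the legacy write-op formula (roughly 2 base writes plus 2 writes per indexed property value plus 1 per composite index entry); treat the constants as an assumption and verify them against the current pricing documentation.

# Back-of-the-envelope sketch of write ops per new-entity put, using the
# legacy formula. The constants are assumptions to be checked against the docs.
def writes_for_new_entity(indexed_property_values, composite_index_entries=0):
    return 2 + 2 * indexed_property_values + 1 * composite_index_entries

# Example: an entity with 5 indexed properties and no composite indexes costs
# roughly 2 + 2*5 = 12 write ops per put; at 100k puts a day that is ~1.2M
# write ops, which is why write-heavy models dominate the estimate.
print(writes_for_new_entity(5))  # -> 12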
To estimate the instance hours you would need, the simplest approach (but not always feasible) is to deploy a simple app and test how long a particular request takes. If this approach is undesirable, you might also base your estimate on the Google App Engine System Status page (e.g. the latency of a datastore write for an entity of a particular size) to get a (very) rough picture of how long it would take to process your request.
The third major area of cost, in my opinion, is the datastore stored-data cost. This will vary based on your data model, of course, but any estimate you make needs to also take into account the storage taken by the entity indexes. Taking a quick glance at the datastore statistics page, I think indexes can increase the storage size by anywhere from 40% to 400%, depending on how many indexes you have for the particular entity.
Remember that most of these figures are estimations of the real costs. The definitive source of truth is here: https://cloud.google.com/pricing/.
A good tool to estimate your App Engine cost is this awesome Chrome extension: "App Engine Offline Statistics Estimator".
You can also check out the AppStats package (to infer costs from within the app via API).
Recap:
Official Appengine Pricing
AppStats for Python
AppStats for Java
Online Estimator (OSE) Chrome Extension
You can use the pricing calculator
https://cloud.google.com/products/calculator/

What is workload throttling?

Could somebody give a good explanation for a newbie of what the following phrase means:
1) workload throttling within a single cluster and 2) workload balance across multiple clusters.
This is from an overview of the advantages of an ETL tool that helps perform ETL (Extract, Transform, Load) jobs on a Redshift database.
Many web services allocate a maximum amount of "interaction" that you can have with the service. Once you exceed that amount, the service will shift how it completes its interactions.
Amazon imposes limitations on how much compute power you can consume within your nodes. The phrase "workload throttling" means that if you exceed the limits detailed in Amazon's documentation, Amazon Redshift Limits, your queries, jobs, tasks, or work items will be given lower priority or will fail outright.
The idea is that Amazon doesn't want you to consume so much compute power that it prevents others from using the service and, honestly, they don't want you to consume more power than it costs them to provide.
Workload throttling isn't an idea exclusive to this Amazon service, or to cloud services in general. The concept can be found in any system that needs to account for receiving more tasks than it can handle. Different systems deal with being overburdened in different ways.
For example, a load balancer may defer you to alternate services. Third-party data APIs will allot you a maximum amount of data per hour/minute and then either delay the responses you get back, charge you more money, or stop responding altogether.
Another service you can look at that deals with throttling is the Google Maps Geocoding service. If you look at their documentation, Google Maps Geocoding API Usage Limits, you will see that:
Users of the standard API:
2,500 free requests per day, calculated as the sum of client-side and server-side queries.
50 requests per second, calculated as the sum of client-side and server-side queries.
If you exceed this and have billing enabled, Google will shift to:
$0.50 USD / 1000 additional requests, up to 100,000 daily.
I can't remember what the response looks like after you hit that daily limit, but once you hit it, you basically don't get responses back until the day resets.
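If you want to stay under a per-second cap on the client side, a minimal throttling sketch might look like the following; the 50 requests-per-second figure comes from the limits quoted above, and the geocode call is hypothetical.

# Minimal sketch: space out calls so a client never exceeds a per-second cap.
import time

class RateLimiter(object):
    def __init__(self, max_per_second=50):  # 50 req/s from the quoted limits
        self.min_interval = 1.0 / max_per_second
        self.last_call = 0.0

    def wait(self):
        """Sleep just long enough to respect the per-second cap."""
        now = time.time()
        elapsed = now - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.time()

limiter = RateLimiter(max_per_second=50)
# for address in addresses:
#     limiter.wait()
#     geocode(address)  # hypothetical call to the geocoding service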
