How to use an array to work around Google Apps Script's "service invoked too many times for one day" error

I wrote a simple script to fetch the distance between two locations, each in a different cell in Google Sheets (below). My sheet has one set of 65 locations in the top row and a second set of 6000 locations listed in the first column. I want to find the distance between each location in the top row and each location in the first column.
Given the size of my data set, I'm running into the "service invoked too many times for one day: route" error message. I found this post suggesting that one could create an array to execute calculations for the whole spreadsheet at once, rather than cell by cell. Would this be a suitable solution for my current problem? If so, how would I go about writing the script? Here's my current code:
function GOOGLEMAPS(start_address, end_address) {
  // Pause briefly between calls so the quota is not burned through as quickly.
  Utilities.sleep(1000);
  var mapObj = Maps.newDirectionFinder();
  mapObj.setOrigin(start_address);
  mapObj.setDestination(end_address);
  var directions = mapObj.getDirections();
  // Distance of the first leg of the first route, in meters.
  var meters = directions["routes"][0]["legs"][0]["distance"]["value"];
  // Convert meters to miles.
  var distance = meters * 0.000621371;
  //Logger.log(distance)
  return distance;
}

If you want to overcome the daily limit imposed by Google Apps Script, you should initialize Maps with Google Maps Premium plan credentials. This means you would need to contact Google and purchase a Google Maps Platform Premium plan license in order to get additional quota allowances.
There is a Maps.setAuthentication(clientId, signingKey) method for this purpose.
Enables the use of an externally established Maps API for Business account, to leverage additional quota allowances. Your client ID and signing key can be obtained from the Google Enterprise Support Portal. Set these values to null to go back to using the default quota allowances.
source: https://developers.google.com/apps-script/reference/maps/maps#setAuthentication(String,String)
I hope this answer clarifies your doubt.

Related

NDB Queries Exceeding GAE Soft Private Memory Limit

I currently have an application running in the Google App Engine Standard Environment, which, among other things, contains a large database of weather data and a frontend endpoint that generates graphs of this data. The database lives in Google Cloud Datastore, and the Python Flask application accesses it via the NDB library.
My issue is as follows: when I try to generate graphs for WeatherData spanning more than about a week (the data is stored for every 5 minutes), my application exceeds GAE's soft private memory limit and crashes. However, stored in each of my WeatherData entities are the relevant fields that I want to graph, in addition to a very large json string containing forecast data that I do not need for this graphing application. So, the part of the WeatherData entities that is causing my application to exceed the soft private memory limit is not even needed in this application.
My question is thus as follows: is there any way to query only certain properties in the entity, such as can be done for specific columns in a SQL-style query? Again, I don't need the entire forecast json string for graphing, only a few other fields stored in the entity. The other approach I tried to run was to only fetch a couple of entities out at a time and split the query into multiple API calls, but it ended up taking so long that the page would time out and I couldn't get it to work properly.
Below is my code for how it is currently implemented and breaking. Any input is much appreciated:
wDataCsv = 'Time,' + ','.join(wData.keys())
qry = WeatherData.time_ordered_query(ndb.Key('Location', loc), start=start_date, end=end_date)
for acct in qry.fetch():
    d = [acct.time.strftime(date_string)]
    for attr in wData.keys():
        d.append(str(acct.dict_access(attr)))
        wData[attr].append([acct.time.strftime(date_string), acct.dict_access(attr)])
    wDataCsv += '\n' + ','.join(d)

# Child entity - log of the weather at a parent location
class WeatherData(ndb.Model):
    # model for data to save
    ...

    # Function for querying data below a given ancestor between two optional
    # times
    @classmethod
    def time_ordered_query(cls, ancestor_key, start=None, end=None):
        return cls.query(cls.time >= start, cls.time <= end, ancestor=ancestor_key).order(-cls.time)
EDIT: I tried the iterative page fetching strategy described in the link from the answer below. My code was updated to the following:
wDataCsv = 'Time,' + ','.join(wData.keys())
qry = WeatherData.time_ordered_query(ndb.Key('Location', loc), start=start_date, end=end_date)
cursor = None
while True:
    gc.collect()
    fetched, next_cursor, more = qry.fetch_page(FETCHNUM, start_cursor=cursor)
    if fetched:
        for acct in fetched:
            d = [acct.time.strftime(date_string)]
            for attr in wData.keys():
                d.append(str(acct.dict_access(attr)))
                wData[attr].append([acct.time.strftime(date_string), acct.dict_access(attr)])
            wDataCsv += '\n' + ','.join(d)
    if more and next_cursor:
        cursor = next_cursor
    else:
        break
where FETCHNUM = 500. In this case, I am still exceeding the soft private memory limit for queries of the same length as before, and the query takes much, much longer to run. I suspect the problem may be that Python's garbage collector is not freeing the already-processed entities because they are still referenced, but even when I include gc.collect() I see no improvement.
EDIT:
Following the advice below, I fixed the problem using Projection Queries. Rather than have a separate projection for each custom query, I simply ran the same projection each time: namely, querying all properties of the entity except the JSON string. While this is not ideal, as it still pulls gratuitous information from the database each time, generating an individual projection for each specific query does not scale because of the exponential growth in the required indexes. For this application, since each additional property adds negligible memory (aside from that JSON string), it works!
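As a rough sketch of what that fix looks like with ndb (the property names temperature, humidity, and forecast_json below are assumptions standing in for the real WeatherData schema, not the asker's actual code):

from google.appengine.ext import ndb

class WeatherData(ndb.Model):
    # Illustrative model only; the real property names are assumptions.
    time = ndb.DateTimeProperty()
    temperature = ndb.FloatProperty()
    humidity = ndb.FloatProperty()
    forecast_json = ndb.TextProperty()  # the large field we never want to fetch

# Project onto every property except the big JSON string, so it is never
# pulled out of the datastore. Note that projection queries need indexes
# covering the projected properties.
wanted = [name for name in WeatherData._properties if name != 'forecast_json']
qry = WeatherData.query(ancestor=ndb.Key('Location', 'some-location'))
for row in qry.fetch(projection=wanted):
    print row.time, row.temperature, row.humidity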
You can use projection queries to fetch only the properties of interest from each entity. Watch out for the limitations, though. And this still can't scale indefinitely.
You can split your queries across multiple requests (more scalable), but use bigger chunks, not just a couple (you can fetch 500 at a time), together with cursors; a combined sketch is shown after these suggestions. Check out examples in How to delete all the entries from google datastore?
You can bump your instance class to one with more memory (if not done already).
You can prepare intermediate results (also in the datastore) from the big entities ahead of time and use these intermediate pre-computed values in the final stage.
Finally, you could try to create and store just portions of the graphs and stitch them together at the end (only if it comes to that; I'm not sure exactly how it would be done, and I imagine it wouldn't be trivial).
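To make the first two suggestions concrete, here is a rough sketch (not the asker's code) that combines page-wise fetching with a projection; FETCHNUM, forecast_json, and the process() helper are placeholders:

# Sketch only: cursor paging plus a projection, so each page carries only
# the small properties (names are illustrative).
FETCHNUM = 500
wanted = [name for name in WeatherData._properties if name != 'forecast_json']
qry = WeatherData.time_ordered_query(ndb.Key('Location', loc),
                                     start=start_date, end=end_date)
cursor = None
while True:
    page, cursor, more = qry.fetch_page(FETCHNUM,
                                        start_cursor=cursor,
                                        projection=wanted)
    for acct in page:
        process(acct)  # e.g. append a CSV row, as in the code above
    if not (more and cursor):
        break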

Incrementing keys for a multi-user tool in google cloud datastore

I am building a tool using the Google Cloud Datastore Java API. The backend of this tool has a bunch of methods and APIs we built, which are hosted on Google App Engine. The data we collect from the tool comes from a Chrome extension we built, and using the above-mentioned APIs we store our data in GCD. Everything works perfectly well in our implementation except for one thing: the identifiers.
I created a method to store all our relevant information in several tables, and while submitting I create each Entity with an identifier that is the next number in ascending order after the previous entry in the table. The tool is being used by several people and the entries for any particular day are stored in the correct order. However, every day the ID variable seems to be reset, and our table starts overwriting information as the ID starts from 1 again. It remains constant during the day, but as soon as the date changes, the ID starts from 1 again.
AtomicInteger Identifier = new AtomicInteger();

public void DataEntity(String EmpName, String Date, String Col1, String Col2)
{
    int id = Identifier.incrementAndGet();
    Entity en = new Entity("DataTable", id);
    en.setProperty("Employee Name", EmpName);
    en.setProperty("Submit_Date", Date);
    en.setProperty("Column1", Col1);
    en.setProperty("Column2", Col2);
    ...
    ds.put(en);
}
My guess is that at the end of the day all the methods are garbage collected. I should also note that our app is threadsafe, so data is not being overwritten simultaneously. It is only the next day, when all the variables seem to have been reset, that everything starts from 1 again. Any help will be much appreciated. Please let me know in case you have any questions; I'll be happy to provide more info.

App Engine query in admin datastore viewer returning different results than programmatic query

I'm flummoxed.
I noticed today that some data I thought should be present in my production appengine app wasn't showing up. I connected to the app via the remote console and ran the queries manually. Sure enough it looked like I only had 15 of the 101 rows I was expecting to see.
Then I went to my admin console at appengine.google.com and fired up the datastore viewer with the following query:
SELECT * FROM Assignment where game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')
The result I see is the first page of 20 results. I page through those results, and am able to see all 101 entities. HOORAY! My data is still there. BUT why then can't I access it via the db api? (NOTE: I've already tried clearing memcache via the memcache viewer, even though this particular query isn't manually memcached.)
From the remote console:
> from google.appengine.ext.db import GqlQuery
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')").count()
15
The remote console agrees with the app itself, which only seems to be able to see 15 of the expected 101 rows.
What gives?
UPDATE:
I suspect this might be an indexing issue. If I issue get_by_key_name for one of the missing rows, it subsequently shows up in db api queries.
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')").count()
15
> entities.Assignment.get_by_key_name('201212-assignment-135.9')
<entities.Assignment object at 0xa11eb6c>
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb')").count()
16
So should I (or can I) rebuild my indexes to remedy this problem?
UPDATE #2:
I attempted to build a perfect index for this query, and have just verified that even when the query does use the just-built index (via query.index_list()), the results are still only limited to a small subset of items available via the datastore viewer. Infuriatingly, it's actually a different subset than is available with the previous index (20 items vs 15 items). So now adding an additional filter term results in an additional 5 rows returned. So dumb.
All indexes claim to be "serving" so there shouldn't be any reason that the indexes are this far off.
UPDATE #3:
Sometimes, using my new index, I'll get the right answer:
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb') and user = 'zee'").count()
101
However if I issue this query 10 times, it comes back with the 'bad' results about half the time:
> GqlQuery("SELECT * FROM Assignment WHERE game = KEY('Game', '201212-foo') and player = KEY('Player', 'player-mb') and user = 'zee'").count()
16
So maybe it's an issue of a bad/behind Bigtable replica that I'm hitting half the time, or something else completely opaque that we won't get an answer to (App Engine status does list a service disruption today), but I have a feeling that this will be fixed on its own. Will update again if it does.
FINAL UPDATE:
As I suspected, when I woke up this morning my app (and manual queries) now see a consistent, correct view of the data. Would still love an answer as to why this happened, but until I get that I'm going to chalk it up to internal Google bigtable weirdness.
I filed this issue against appengine to see if I can get an answer from someone in the know.
For HRD applications, this is working as intended. App Engine High Replication Datastore (HRD) stores your data synchronously in multiple datacenters. However, the delay from the time a write is committed until it becomes visible in all datacenters means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. [1]
In your specific case, the discrepancy between the results from your application and the Admin Console Datastore Viewer is most likely because they are reading from different Datastore servers with different degrees of consistency.
If you require a consistent view of your data, I advise taking a closer look at the article "Structuring Data for Strong Consistency".
[1] https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency
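For what it's worth, the standard way to get a strongly consistent read is an ancestor query. Here is a minimal sketch with the same db/GQL API used above, under the assumption (purely for illustration; it may not match this schema) that each Assignment was created with its Game as parent:

from google.appengine.ext.db import GqlQuery

# Sketch only: this is strongly consistent *only if* the Assignment entities
# were written with the Game entity as their ancestor, which is an assumption.
q = GqlQuery("SELECT * FROM Assignment "
             "WHERE ANCESTOR IS KEY('Game', '201212-foo') "
             "AND player = KEY('Player', 'player-mb')")
print q.count()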

Why is geosearching/location based searches returning zero results?

I am trying to use app engine's search API to search locations:
https://developers.google.com/appengine/docs/python/search/overview#Performing_Location-Based_Searches
The problem is that no matter what I do, I get zero results. I set the search lat/lng to the exact point in a document's GeoPoint property and it still returns zero.
I know the regular search is working because if I change the query to be a regular full-text search, it works.
Here is an example of my data (this is actually from the example app here: http://www.youtube.com/watch?v=cE6gb5pqr1k)
Full Text Search > stores1
Document Id: sanjose

Field Name        Field Value
store_address     123 Main St.
store_location    search.GeoPoint(latitude=37.37, longitude=-121.92)
store_name        San Jose
And then my query:
index = search.Index('stores1')
loc = (37.37, -121.92)
query = "distance(store_location, geopoint(37.37, -121.92)) < 4500"
loc_expr = "distance(store_location, geopoint(37.37, -121.92))"
sortexpr = search.SortExpression(
    expression=loc_expr,
    direction=search.SortExpression.ASCENDING, default_value=4501)
search_query = search.Query(
    query_string=query,
    options=search.QueryOptions(
        sort_options=search.SortOptions(expressions=[sortexpr])))
results = index.search(search_query)
print results
And the returns:
search.SearchResults(number_found=0L)
Am I missing something or doing something wrong? This should return at least that one result, right?
** UPDATE **
After doing some prying/searching/testing, I think this may be a bug in the Google App Engine development server.
If I run location searches on the same data in the production environment, I get expected results. When I compare and run the exact same query on the data in the development environment, I get the unexpected 0 results.
If anybody has any insight on this, please advise. Otherwise, for those of you seeing the same problem, I created an issue on App Engine's issue tracker here.
You've probably already figured this out, but in case someone comes across this post, the geosearch feature of AppEngine's Search API returns zero results on the dev server. From https://developers.google.com/appengine/training/fts_intro/lesson2:
"...some search queries are not fully supported on the Development Web Server (running locally), so you’ll need to run them using a deployed application."
Here's another useful link:
https://developers.google.com/appengine/docs/python/search/devserver

How do travel websites implement the sorting of search results?

For example, you search for a hotel in London and get 250 hotels, of which 25 are shown on the first page. On each page the user has the option to sort the hotels by price, name, user reviews, etc. The intelligent thing to do would be to fetch only the first 25 hotels from the database for the first page. When the user moves to page 2, make another database query for the next 25 hotels and keep the previous results in cache.
Now consider this: the user is on page 1 and sees 25 hotels sorted by price, and then sorts them by user rating instead. In this case, we should keep the hotels we already have in cache and only request the additional ones. How is that implemented? Is there something built into any language (preferably PHP), or do we have to implement it from scratch using multiple queries?
This is usually done as follows:
The query is executed with an ORDER BY on the required field, and with a TOP clause (LIMIT in some databases) set to (page_index + 1) * entries_per_page results. The query returns a random-access rowset (you might also hear this referred to as a resultset or a recordset, depending on the database library you are using) which supports methods such as MoveTo(row_index) and MoveNext(). So we execute MoveTo(page_index * entries_per_page) and then read and display entries_per_page results. The rowset generally also offers a Count property, which we invoke to get the total number of rows that would be fetched by the query if we ever let it run to the end (which of course we don't), so that we can compute and show the user how many pages exist.
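To illustrate the pattern (a generic sketch, not tied to any particular site or framework; the hotels table and its columns are invented), a LIMIT/OFFSET version of the same idea looks roughly like this:

import sqlite3

ENTRIES_PER_PAGE = 25

def fetch_page(conn, sort_column, page_index):
    """Return one page of hotels plus the total page count."""
    # Whitelist the sort column so user input cannot inject SQL.
    assert sort_column in ('price', 'name', 'rating')
    offset = page_index * ENTRIES_PER_PAGE
    rows = conn.execute(
        "SELECT name, price, rating FROM hotels "
        "ORDER BY {} LIMIT ? OFFSET ?".format(sort_column),
        (ENTRIES_PER_PAGE, offset)).fetchall()
    total = conn.execute("SELECT COUNT(*) FROM hotels").fetchone()[0]
    pages = (total + ENTRIES_PER_PAGE - 1) // ENTRIES_PER_PAGE
    return rows, pages

# e.g. rows, pages = fetch_page(sqlite3.connect('hotels.db'), 'price', 0)
# Re-sorting simply re-runs the query with a different ORDER BY; pages the
# user has already seen can be cached per (sort_column, page_index) key.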
