ndb.put_multi_async with cloud ndb - google-app-engine

The Google cloud ndb documentation doesn't say much about async operations.
In the old days, I would do this
@ndb.toplevel
@flask.route('/', methods=['GET'])
def page():
    for x in xxx:
        ndb.put_multi_async([...])
    return 'Done', 200
and the toplevel decorator would make sure that my async puts were done.
How do I do this with the latest cloud ndb?
The cloud ndb docs for toplevel say
Use of this decorator is largely unnecessary, as you should be using
context() which also flushes pending work when exiting the context.
but it would be helpful to have more clarity. When would it still be necessary to use toplevel?

As stated in the documentation for NDB Asynchronous Operation:
As a convenience, you can decorate the request handler with @ndb.toplevel. This tells the handler not to exit until its asynchronous requests have finished.
...
Using a toplevel application is more convenient than decorating all its handler functions.
This was convenient when using the NDB Client Library with Python 2, as you've said:
the toplevel decorator would make sure that my async puts were done
Nowadays using the Cloud NDB library, as shown in this answer,
each NDB operation needs to be wrapped in a context manager:
with ndb_client.context():  # <- you need this line
    cls.get_or_insert('master')
That is why the documentation says that use of the toplevel decorator
is largely unnecessary, as you should be using context()
because the context manager replaced it and flushes pending async work when the context exits.
As stated in the Cloud NDB documentation:
The context is used to manage the connection to Google Cloud Datastore, an event loop for asynchronous API calls, runtime caching policy, and other essential runtime state.
Finally, as stated in the NDB migration notes:
The biggest difference is in establishing a runtime context for your NDB application. The Google App Engine Python 2.7 runtime had a strong assumption that all code executed inside a web framework request-response cycle, in a single thread per request. In order to decouple from that assumption, Cloud NDB implements explicit clients and contexts.
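Putting that together, here is a minimal sketch of the modern pattern, assuming a Flask app, an ndb.Client() named client, and a hypothetical batches iterable; exiting the context flushes the pending async puts, so no toplevel decorator is needed:
from flask import Flask
from google.cloud import ndb

app = Flask(__name__)
client = ndb.Client()  # assumes default project credentials

@app.route('/', methods=['GET'])
def page():
    with client.context():
        for batch in batches:  # 'batches' is a hypothetical iterable of entity lists
            ndb.put_multi_async(batch)
        # Exiting the context flushes all pending async work,
        # so the puts complete before the response is returned.
    return 'Done', 200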

Personally, I've always been in the habit of calling .get_result() on my async tasklets/operations, so this is something that I've never actually used.
The only use case I can think of for toplevel is when you want to force the flush to occur before you reach the end of your request handler (because at the end of your request handler, you should be exiting the context anyway). In the example below, we want the puts in operation_1 to finish before operation_2 begins:
@ndb.toplevel
def operation_1():
    for x in xxx:
        ndb.put_multi_async([...])

@flask.route('/', methods=['GET'])
def page():
    operation_1()
    operation_2()
    return 'Done', 200
This could be useful in request handlers for Google Cloud Tasks, which can run for up to 10 minutes, so you could be doing a bunch of things in there.
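For comparison, the get_result() habit mentioned earlier might look like this sketch (reusing the hypothetical batches iterable); put_multi_async returns a list of futures, so we wait on each one explicitly:
def operation_1():
    futures = []
    for batch in batches:  # hypothetical iterable of entity lists
        # put_multi_async returns one future per entity
        futures.extend(ndb.put_multi_async(batch))
    # Block until every async put has finished before returning,
    # so callers can rely on the writes having been applied.
    for future in futures:
        future.get_result()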

Related

Init and destroy function

I am still a beginner with Go on Google App Engine (standard).
I want to use a function that is automatically called when the instance is shutting down.
There is a function init that is called during startup.
Now I am looking for the opposite, something like a destroy function.
There seems to be something like that for Python, but I could not find
anything for Go.
How could you implement such a destroy function in Google App Engine instances?
This is documented at Go - How Instances are Managed.
Unfortunately the Go doc is incomplete; here's the link to the Python version: Python - How Instances are Managed. The way it is implemented / supported is language-agnostic.
When an instance is spun up, an HTTP GET request is sent to the /_ah/start path.
Before an instance is taken down, an HTTP GET request is sent to the /_ah/stop path.
You should use package init() functions for initialization purposes, as they always run, and only once. If a request is required for your init functions, then register a handler for the /_ah/start path.
And you may register a handler to /_ah/stop and implement "shutdown" functionality like this:
func init() {
    // App Engine sends a GET request to /_ah/stop before taking
    // the instance down, so register our shutdown logic there.
    http.HandleFunc("/_ah/stop", shutdownHandler)
}

func shutdownHandler(w http.ResponseWriter, r *http.Request) {
    doSomeWork()
    saveState()
}
But you can't rely on this 100%:
Note: It's important to recognize that the shutdown hook is not always able to run before an instance terminates. In rare cases, an outage can occur that prevents App Engine from providing 30 seconds of shutdown time. Thus, we recommend periodically checkpointing the state of your instance and using it primarily as an in-memory cache rather than a reliable data store.

How many async calls can we make in one request in Google App Engine

I'm writing an app on top of Google App Engine, using Java as the language.
I'm new to GAE. I want to use the URL Fetch service for async calls in the application.
I have a few doubts:
1. How many async calls can I make in one App Engine request?
2. What happens if I make more async calls in one request, let's say 50?
3. What happens if these async calls take more than 1 minute (the App Engine request timeout)?
4. What is the default timeout for each async call? (I'm assuming it is 60 sec. Is that correct?)
There is no limit on the number of calls that App Engine can process in general. There are limits (quotas) set for different APIs: e.g. number of Mail API, Task API, URLFetch, etc. calls. Often quotas are different for paying customers, and in most cases you can request to increase a quota if your app requires it. In other words, these quotas do not reflect a technological limitation - they are in place to prevent abuse.
The number of requests that your app can process depends on how many instances you have running, and how fast each call can be processed. You can enable multithreading, so each instance can process multiple calls in parallel.
As for the timeouts, they depend on the instance type and the type of calls that you are making (Datastore API, Cloud Storage API, URLFetch, etc.)
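To make the timeout point concrete, here is a sketch using Python's URL Fetch API (the question is about Java, but the pattern is analogous): each asynchronous call can be given an explicit deadline instead of relying on the default; the 30-second value below is an arbitrary choice for illustration.
from google.appengine.api import urlfetch

def fetch_all(urls):
    # Start every fetch without blocking; give each RPC an explicit
    # deadline in seconds instead of relying on the default.
    rpcs = []
    for url in urls:
        rpc = urlfetch.create_rpc(deadline=30)
        urlfetch.make_fetch_call(rpc, url)
        rpcs.append(rpc)
    # get_result() blocks until the call finishes, or raises an
    # error if the deadline passes first.
    return [rpc.get_result() for rpc in rpcs]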

GAE behavior when relocating an application to another server

Two questions:
Does Google App Engine send any kind of message to an application just before relocating it to another server?
If so, what is that message?
No, it doesn't. It doesn't relocate either: old instances keep running (and eventually stop when idle for long enough) while new ones are spawned.
There are times when App Engine needs to move your instance to a different machine to improve load distribution.
When App Engine needs to turn down a manual scaling instance it first notifies the instance. There are two ways to receive this notification. First, the is_shutting_down() method from google.appengine.api.runtime begins returning true. Second, if you have registered a shutdown hook, it will be called. It's a good idea to register a shutdown hook in your start request. After the notification is issued, existing requests are given 30 seconds to complete, and new requests immediately return 404.
If an instance is handling a request, App Engine pauses the request and runs the shutdown hook. If there is no active request, App Engine sends an /_ah/stop request, which runs the shutdown hook. The /_ah/stop request bypasses normal handling logic and cannot be handled by user code; its sole purpose is to invoke the shutdown hook. If you raise an exception in your shutdown hook while handling another request, it will bubble up into the request, where you can catch it.
The following code sample demonstrates a basic shutdown hook:
from google.appengine.api import apiproxy_stub_map
from google.appengine.api import runtime

def my_shutdown_hook():
    apiproxy_stub_map.apiproxy.CancelApiCalls()
    save_state()
    # May want to raise an exception

runtime.set_shutdown_hook(my_shutdown_hook)
Alternatively, the following sample demonstrates how to use the is_shutting_down() method:
while more_work_to_do and not runtime.is_shutting_down():
    do_some_work()
    save_state()
More details here: https://developers.google.com/appengine/docs/python/modules/#Python_Instance_states

Is ndb async guaranteed to execute after application request has finished?

I am using NDB to write a profiling model that logs some data per application request. Each request issues an ndb.put_async call to log the data, while the client does not care about the result. In essence, I do not want the application request to wait on saving the profiling statistics.
However, I was confused about the explanation from the official documentation. If an application request has finished before the ndb request finishes, would the ndb request still be guaranteed to finish? The documentation indicates that
if the request handler exits too early, the put might never happen
Under what criteria would this happen? Does this mean that, regardless of whether the user cares about the result, future.get_result needs to be called anyway just to make sure the NDB request is performed?
The original documentation (https://developers.google.com/appengine/docs/python/ndb/async) says:
In this example, it's a little silly to call future.get_result: the application never uses the result from NDB. That code is just in there to make sure that the request handler doesn't exit before the NDB put finishes; if the request handler exits too early, the put might never happen. As a convenience, you can decorate the request handler with @ndb.toplevel. This tells the handler not to exit until its asynchronous requests have finished. This in turn lets you send off the request and not worry about the result.
If an application request has finished before the ndb request finishes, would the ndb request still be guaranteed to finish?
No.
Does this mean that regardless of whether a user care about the result, future.get_result needs to be called anyway just to make sure the ndb request is performed?
Basically yes, but you can use the ndb.toplevel decorator for convenience so that you don't have to wait for the result explicitly. That said, I don't think this is what you want.
Probably the task queue is what you want. Please check it out.
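A minimal sketch of that approach (the /tasks/log-stats worker URL and payload shape are hypothetical): enqueueing is fast, and the task queue retries the worker until it succeeds, so the write happens even though the user-facing request has already returned.
from google.appengine.api import taskqueue

def record_stats(data):
    # Enqueue the write instead of performing it inline; the task
    # queue retries the worker task until it succeeds.
    taskqueue.add(url='/tasks/log-stats',  # hypothetical worker endpoint
                  params={'payload': data})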
Thanks for the clarification. What about a general RPC (non-NDB) - e.g., incr_async() in memcache.Client()? Setting aside that this is a very, very fast RPC call, is it guaranteed that the RPC will complete?
I.e., which of the following is true:
(a) there is something in the infrastructure that will wait on all known RPCs before completing the request
(b) the request will complete and the async RPCs will also complete regardless of when the request completes
(c) the in-flight RPCs are formally cancelled
(d) something else?

GaeUtilities: Session Problem

I'm programming an application on Google App Engine with Django 1.1 (no Django patch or the like). As you know, it is impossible to use Django's login and session features there, so I downloaded GAE Utilities and use its Session object (http://gaeutilities.appspot.com/), but sometimes this object creates 2 sessions instead of 1. Here's the code:
def index(request):
    aSWrap = SWrap(SWrap.createSession())
    ....

def login(request):
    aSWrap = SWrap(SWrap.createSession())
    ....

class SWrap(object):
    @classmethod
    def createSession():
        return Session(cookie_name='my_cookie', session_expire_time=7200)
And how can I set the session to have no expiration, or a really long expiration?
Thanks
Judging by the code, you're calling createSession twice within the same request. That will cause problems with David's library as well.
Also, gaeutilities session includes a config file where you can modify all the default values as you like:
https://github.com/joerussbowman/gaeutilities/blob/master/appengine_utilities/settings_default.py
gaeutilities session also has security features lacking in gae-sessions. I'm afraid David didn't attempt to answer your question, but rather just suggested you use his library, which under your current implementation would have the exact same problem. You need to be sure you only initiate the session once per HTTP request, no matter what session library you're using.
I'm moving gaeutilities session to a decorator in order to address this issue as well as to provide better performance. You can watch the master branch on GitHub for updates: https://github.com/joerussbowman/gaeutilities
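Whatever library you settle on, one way to guarantee a single session per request is to memoize it on the request object; a rough sketch (the _session attribute name is our own choice):
def get_session(request):
    # Create the session at most once per request and cache it on
    # the request object so later calls reuse the same instance.
    if not hasattr(request, '_session'):
        request._session = SWrap.createSession()
    return request._session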
I suggest using a different sessions library. Check out this comparison of the available sessions libraries for GAE.
I'd recommend gae-sessions - it presents an API almost identical to the library you are currently using, but it is much faster and shouldn't give you headaches like the bug you've encountered above.
Disclaimer: I wrote gae-sessions, but I'm not the only one who would recommend it. Here is a recent thread discussing sessions on the google group for GAE python.
What are you trying to do with SWrap(SWrap.createSession())? It looks like the result of SWrap.createSession() is passed to the SWrap() constructor. Have you omitted part of the definition of SWrap?
Perhaps this is more what you are wanting:
def index(request):
    mysession = SWrap.createSession()
    ....

def login(request):
    mysession = SWrap.createSession()
    ....

class SWrap(object):
    @staticmethod
    def createSession():
        return Session(cookie_name='my_cookie', session_expire_time=7200)
