I have a model named Conversation with several fields, including date_created and date_updated defined as DateTimeProperty with auto_now_add and auto_now respectively.
If I update the model using the put() method, the date_updated field is updated.
But when I use the put_async() method, the value of the date_updated field is not updated.
I also have a test case using Python's unittest.TestCase, and there it works fine.
Note: It works when I use put_async().get_result().
Sample model class:
class Conversation(ndb.Model):
    participants = ndb.StringProperty(repeated=True)
    conversation_name = ndb.StringProperty()
    date_created = ndb.DateTimeProperty(required=True, auto_now_add=True)
    date_updated = ndb.DateTimeProperty(required=True, auto_now=True)

    @staticmethod
    def update_conversation_date_by_id(conversation_id):
        conversation = Conversation.get_by_id(conversation_id) if conversation_id else None
        if conversation is None:
            raise CannotFindEntity("Given conversation_id is not found")
        else:
            conversation.put_async()
            return conversation
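For reference, here is the variant from the note above that does pick up the auto_now update; this is only a sketch restating the question's own observation, not a recommendation from elsewhere:

    @staticmethod
    def update_conversation_date_by_id(conversation_id):
        conversation = Conversation.get_by_id(conversation_id) if conversation_id else None
        if conversation is None:
            raise CannotFindEntity("Given conversation_id is not found")
        else:
            # Blocking on the future ensures the async put actually completes
            # before the caller moves on.
            conversation.put_async().get_result()
            return conversation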
If the request handler exits before the NDB put finishes, the put might never happen.
class MyRequestHandler(webapp2.RequestHandler):
    def get(self):
        acct = Account.get_by_id(users.get_current_user().user_id())
        acct.view_counter += 1
        future = acct.put_async()
        # ...read something else from Datastore...
        template.render(...)
        future.get_result()
Try adding something like the last line in that code block to force it to wait.
In this example, it's a little silly to call future.get_result: the application never uses the result from NDB. That code is just in there to make sure that the request handler doesn't exit before the NDB put finishes; if the request handler exits too early, the put might never happen. As a convenience, you can decorate the request handler with @ndb.toplevel. This tells the handler not to exit until its asynchronous requests have finished. This in turn lets you send off the request and not worry about the result.
https://developers.google.com/appengine/docs/python/ndb/async
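For illustration, here is a minimal sketch of the same handler using the @ndb.toplevel decorator instead of the explicit get_result() call (Account, users and template are the names from the docs example above):

class MyRequestHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        acct = Account.get_by_id(users.get_current_user().user_id())
        acct.view_counter += 1
        acct.put_async()  # no explicit get_result(); toplevel waits for it before exiting
        # ...read something else from Datastore...
        template.render(...)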
The async methods stop executing when the main thread stops (i.e. when the request handler finishes)... so most likely your code doesn't execute.
This can be prevented by adding @ndb.toplevel to your request handler; the handler will then wait for your async calls to complete.
https://developers.google.com/appengine/docs/python/ndb/async
Async doesn't let you run code after the request handler has completed; it lets you run other (async) commands while waiting for async commands on the same request handler thread :)
I want to create a streaming API (server-sent events) for alerts, i.e. newly added alerts should be sent to open connections as server-sent events. For that, I am using StreamingHttpResponse, with the header set to text/event-stream as it should be for SSEs. I have a model that handles alerts in the DB. I have written a signals.py file that defines a post_save signal receiver for the Alert model, and on the post_save event I return a StreamingHttpResponse with the newly added alert. When I hit the API the first time, I get a response with all alerts, but I never receive newly added alerts whenever a new alert is added to the DB. What am I doing wrong?
monitoring/signals.py
@receiver(post_save, sender=Alert)
def send_alert(sender, instance, **kwargs):
    qs_json = serializers.serialize('json', instance)
    return StreamingHttpResponse(qs_json, content_type='text/event-stream')
monitoring/views.py
# just importing defined signal receiver here
from .signals import send_alert
# endpoint view function
def get_alerts(self):
queryset = Alert.objects.all().order_by('-date')
resolved = self.request.query_params.get('resolved')
server = self.request.query_params.get('server')
source = self.request.query_params.get('source')
if source is not None:
queryset = queryset.filter(source=source)
if resolved is not None:
queryset = queryset.filter(resolved=resolved)
if server is not None:
queryset = queryset.filter(server=server)
return HttpResponse(queryset)
Even when I call the .save() method myself in the Django shell, the signal is not triggered.
>>> from monitoring.models import Alert
>>> a=Alert(container="something")
>>> a.save()
I'm using Cloud Pub/Sub, and for one of our systems I'm starting to get "Too many files open". lsof shows a ton of connections to Google Cloud, which I'm pretty sure are Pub/Sub.
Googling led me to https://github.com/googleapis/google-cloud-python/issues/5523 which indicates I need to close the transport explicitly.
The problem is that I'm using a helper Python package (which is invoked by about 50-100 other services) to publish my messages, which looks roughly like this:
def pubsub_callback(future):
    message_id = future.result()
    LOGGER.info("Successfully published %s", message_id)

def send_oneoff_pubsub_message(self, client=None):
    if not client:
        client = self.get_client('pubsubpub')  # Creates a pubsub publisher client
    future = client.publish({...})
    try:
        future.exception(timeout=10)
    except Exception as exc:
        print("error")
    future.add_done_callback(pubsub_callback)
Now, in many places we're slowly refactoring to explicitly create a client outside of the function (so we're not creating too many clients). However, I would still like to refactor this so the client is closed once the message is published.
The linked issue recommends client.api.transport._channel.close() after you're finished with the client. However, in this case I'm only finished with it after pubsub_callback has been triggered.
I don't see any way to get the client from the future, and add_done_callback (rightly so) doesn't allow passing extra arguments to the callback.
Any creative solutions?
I need to bite the bullet anyway and refactor the heavy pubsub clients, but it's not always clear cut.
Update:
Looking at the code, it appears as if this would successfully close the client after the future completes:
def send_oneoff_pubsub_message(self, client=None):
    if not client:
        client = self.get_client('pubsubpub')  # Creates a pubsub publisher client
    future = client.publish({...})
    try:
        future.exception(timeout=10)
    except Exception as exc:
        print("error")
    future.add_done_callback(pubsub_callback)
    future.result(timeout=10)
    client.api.transport._channel.close()
Any downside to such an approach, other than that the function blocks until the message is published (which is OK for me)?
Your code mixes asynchronous and synchronous operation. After you call publish, you wait up to 10 seconds for an exception on the future. In most cases the publish will have completed within those 10 seconds anyway, so at that point you might as well call future.result synchronously and not bother with add_done_callback at all:
def pubsub_callback(future):
    try:
        message_id = future.result()
        LOGGER.info("Successfully published %s", message_id)
    except Exception as exc:
        print("error")

def send_oneoff_pubsub_message(self, client=None):
    client_created = False
    if not client:
        client_created = True
        client = self.get_client('pubsubpub')  # Creates a pubsub publisher client
    future = client.publish({...})
    pubsub_callback(future)
    if client_created:
        client.api.transport._channel.close()
If you want to do things asynchronously, you could use functools.partial:
from functools import partial

def pubsub_callback(client, future):
    try:
        message_id = future.result()
        LOGGER.info("Successfully published %s", message_id)
    except Exception as exc:
        print("error")
    if client:
        client.api.transport._channel.close()

def send_oneoff_pubsub_message(self, client=None):
    client_created = False
    if not client:
        client_created = True
        client = self.get_client('pubsubpub')  # Creates a pubsub publisher client
    if client_created:
        callback = partial(pubsub_callback, client)
    else:
        callback = partial(pubsub_callback, None)
    future = client.publish({...})
    future.add_done_callback(callback)
Either way should allow you to close the client at the desired point.
Setting the session backend to datastore does not work properly, because every call to auth.get_user_by_session() causes a put() of the session entity to the datastore, even if you didn't modify the session values.
This happens because auth.get_user_by_session() calls auth.get_session_data(pop=True). pop=True pops the _user key from the session dict object (SessionDict from webapp2_extras.sessions), which sets the session's modified attribute to True.
Therefore, when the handler completes its work, the call to save_session in the handler's dispatch method checks whether the session was modified, and if so (which is always the case!) puts the session to the datastore.
So the session is put to the datastore on every request, because you're obviously calling auth.get_user_by_session() on each request.
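For context, here is a minimal sketch of the typical webapp2 dispatch pattern this refers to (the handler and store names are illustrative); the save_sessions call at the end is what performs the datastore put once the session is flagged as modified:

from webapp2_extras import auth, sessions

class BaseHandler(webapp2.RequestHandler):
    def dispatch(self):
        self.session_store = sessions.get_store(request=self.request)
        try:
            webapp2.RequestHandler.dispatch(self)
        finally:
            # Because auth.get_user_by_session() popped '_user' during the
            # request, session.modified is True and the datastore-backed
            # session is put() here on every request.
            self.session_store.save_sessions(self.response)

    @webapp2.cached_property
    def auth(self):
        return auth.get_auth()

    def get(self):
        user = self.auth.get_user_by_session()  # marks the session as modified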
I don't know how to handle this, but I have already opened an issue on the Google Code project page (which seems abandoned):
https://code.google.com/p/webapp-improved/issues/detail?id=100&sort=-id
Can anyone help me understand what's happening here, and how I can solve it?
So, if we inspect the following code from webapp2_extras.auth, we get the following:
def get_user_by_session(self, save_session=True):
    """Returns a user based on the current session.

    :param save_session:
        If True, saves the user in the session if authentication succeeds.
    :returns:
        A user dict or None.
    """
    if self._user is None:
        data = self.get_session_data(pop=True)
        if not data:
            self._user = _anon
        else:
            self._user = self.get_user_by_token(
                user_id=data['user_id'], token=data['token'],
                token_ts=data['token_ts'], cache=data,
                cache_ts=data['cache_ts'], remember=data['remember'],
                save_session=save_session)

    return self._user_or_none()
After that, let's inspect get_session_data():
def get_session_data(self, pop=False):
    """Returns the session data as a dictionary.

    :param pop:
        If True, removes the session.
    :returns:
        A deserialized session, or None.
    """
    func = self.session.pop if pop else self.session.get
    rv = func('_user', None)
    if rv is not None:
        data = self.store.deserialize_session(rv)
        if data:
            return data
        elif not pop:
            self.session.pop('_user', None)

    return None
I do not see why it pops the value when you do have a user in the session; it ends up in _user_or_none(), which only unsets the session if the user is None.
Amazing blog post on built-in authentication in webapp2:
http://blog.abahgat.com/2013/01/07/user-authentication-with-webapp2-on-google-app-engine/
I have an entity that is used to store some global app settings. These settings can be edited via an admin HTML page, but very rarely change. I have only one instance of this entity (a singleton of sorts) and always refer to this instance when I need access to the settings.
Here's what it boils down to:
class Settings(ndb.Model):
    SINGLETON_DATASTORE_KEY = 'SINGLETON'

    @classmethod
    def singleton(cls):
        return cls.get_or_insert(cls.SINGLETON_DATASTORE_KEY)

    foo = ndb.IntegerProperty(
        default = 100,
        verbose_name = "Some setting called 'foo'",
        indexed = False)

@ndb.tasklet
def foo():
    # Even though settings has already been fetched from memcache and
    # should be available in NDB's in-context cache, the following call
    # fetches it from memcache anyways. Why?
    settings = Settings.singleton()

class SomeHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        settings = Settings.singleton()
        # Do some stuff
        yield foo()
        self.response.write("The 'foo' setting value is %d" % settings.foo)
I was under the assumption that calling Settings.singleton() more than once per request handler would be pretty fast, as the first call would most probably retrieve the Settings entity from memcache (since the entity is seldom updated) and all subsequent calls within the same request handler would retrieve it from NDB's in-context cache. From the documentation:
The in-context cache persists only for the duration of a single incoming HTTP request and is "visible" only to the code that handles that request. It's fast; this cache lives in memory.
However, Appstats is showing that my Settings entity is being retrieved from memcache multiple times within the same request handler. I know this from looking at a request handler's detailed page in Appstats, expanding the call trace of each call to memcache.Get and looking at the memcache key that is being retrieved.
I am using a lot of tasklets in my request handlers, and I call Settings.singleton() from within the tasklets that need access to the settings. Could this be the reason why the Settings entity is being fetched from memcache again instead of from the in-context cache? If so, what are the exact rules that govern if/when an entity can be fetched from the in-context cache or not? I have not been able to find this information in the NDB documentation.
Update 2013/02/15: I am unable to reproduce this in a dummy test application. Test code is:
class Foo(ndb.Model):
    prop_a = ndb.DateTimeProperty(auto_now_add = True)

def use_foo():
    foo = Foo.get_or_insert('singleton')
    logging.info("Function using foo: %r", foo.prop_a)

@ndb.tasklet
def use_foo_tasklet():
    foo = Foo.get_or_insert('singleton')
    logging.info("Function using foo: %r", foo.prop_a)

@ndb.tasklet
def use_foo_async_tasklet():
    foo = yield Foo.get_or_insert_async('singleton')
    logging.info("Function using foo: %r", foo.prop_a)

class FuncGetOrInsertHandler(webapp2.RequestHandler):
    def get(self):
        for i in xrange(10):
            logging.info("Iteration %d", i)
            use_foo()

class TaskletGetOrInsertHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        logging.info("Toplevel")
        use_foo()
        for i in xrange(10):
            logging.info("Iteration %d", i)
            use_foo_tasklet()

class AsyncTaskletGetOrInsertHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        logging.info("Toplevel")
        use_foo()
        for i in xrange(10):
            logging.info("Iteration %d", i)
            use_foo_async_tasklet()
Before running any of the test handlers, I make sure that the Foo entity with keyname singleton exists.
Contrary to what I am seeing in my production app, all of these request handlers show a single call to memcache.Get in Appstats.
Update 2013/02/21: I am finally able to reproduce this in a dummy test application. Test code is:
class ToplevelAsyncTaskletGetOrInsertHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        logging.info("Toplevel 1")
        use_foo()
        self._toplevel2()

    @ndb.toplevel
    def _toplevel2(self):
        logging.info("Toplevel 2")
        use_foo()
        for i in xrange(10):
            logging.info("Iteration %d", i)
            use_foo_async_tasklet()
This handler does show 2 calls to memcache.Get in Appstats, just like my production code.
Indeed, in my production request handler codepath, I have a toplevel called by another toplevel. It seems like a toplevel creates a new ndb context.
Changing the nested toplevel to a synctasklet fixes the problem.
It seems like a toplevel creates a new ndb context.
Exactly: each function with a toplevel decorator has its own context and therefore a separate cache. You can take a look at the code for toplevel in the link below; the function's documentation states that toplevel is "A sync tasklet that sets a fresh default Context".
https://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/ndb/tasklets.py#1033
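For illustration, here is a hedged sketch of the fix mentioned above (the handler name is made up): replacing the nested @ndb.toplevel with @ndb.synctasklet keeps the outer handler's context, and with it the in-context cache, instead of creating a fresh one for the inner call:

class ToplevelSynctaskletGetOrInsertHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        logging.info("Toplevel 1")
        use_foo()
        self._inner()

    @ndb.synctasklet  # was @ndb.toplevel; synctasklet reuses the caller's context
    def _inner(self):
        logging.info("Nested synctasklet")
        use_foo()  # now served from the outer handler's in-context cache
        for i in xrange(10):
            logging.info("Iteration %d", i)
            use_foo_async_tasklet()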
Using ndb, let's say I put_async'd 40 elements with @ndb.toplevel, wrote the output to the user and ended the request, but one of those put_asyncs resulted in a contention exception. Would the response be 500 or 200? Or, let's say it is a task, would the task get re-executed?
One solution is calling get_result() on all 40 of those futures before the request ends and catching those exceptions, if they occur (as sketched below), but I'm not sure whether it will affect performance.
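A rough sketch of that option, assuming the ~40 writes are collected into a list of futures (the entity list and the broad exception handling here are illustrative, not from the question):

futures = [entity.put_async() for entity in entities_to_save]  # ~40 async puts

failed = []
for future in futures:
    try:
        future.get_result()  # re-raises contention (or any other) datastore exception
    except Exception:
        failed.append(future)
# The handler can now decide what to return (or retry) before the request ends.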
As far as I understand, using @ndb.toplevel causes the handler to wait for all async operations to finish before exiting.
From the docs:
As a convenience, you can decorate the request handler with @ndb.toplevel. This tells the handler not to exit until its asynchronous requests have finished. This in turn lets you send off the request and not worry about the result. https://developers.google.com/appengine/docs/python/ndb/async#intro
So by adding @ndb.toplevel, the response doesn't actually get returned until after the async methods have finished executing. Using @ndb.toplevel removes the need to call get_result on all the async calls that were fired off (for convenience). Based on this, the request would still return 500 if the async queries failed, because all the async queries needed to complete before returning. Update: see below.
If you are using a task (I assume you mean the task queue), the task queue will retry the request if it fails.
So your handler could be something like:
def get(self):
    deferred.defer(execute_stuff_in_background, param, param1)
    template.render(...)
and execute_stuff_in_background would do all the expensive puts once the handler had returned. If there was a contention issue in the task, your original handler would still return 200.
If you suspect there is going to be a contention issue, perhaps consider sharding or using a fork-join queue implementation to handle the writes (see implementation here: http://www.youtube.com/watch?v=zSDC_TU7rtc#t=41m35)
Edit: Short answer
The request will fail (return 500) if the async requests fail, because @ndb.toplevel waits for all results to finish before exiting.
Update: Having looked at @alexis's answer below, I re-ran my original test (where I turned off datastore writes and called put_async in a handler decorated with @ndb.toplevel); the response returns 500 only intermittently (I assume this depends on execution time). Based on this and @alexis's answer below, don't expect the result to be 500 if an async task throws an exception and the calling function is decorated with @ndb.toplevel.
That's odd, I use toplevel and expect the opposite behavior. And that's what I observe. Did something change since the first answer to this question?
As the doc says:
This in turn lets you send off the request and not worry about the result.
You can try the following unittest (using testbed):
@ndb.tasklet
def raiseSomething():
    yield ndb.Key('foo', 'bar').get_async()
    raise Exception()

@ndb.toplevel
def callRaiseSomething():
    future = raiseSomething()
    return "hello"

response = callRaiseSomething()
self.assertEqual(response, "hello")
This test passes. NDB logs a warning: "suspended generator raiseSomething(tests.py:90) raised Exception()", but it does not re-raise the exception.
ndb.toplevel only waits for the RPCs, but does nothing with the actual result.
If your decorated function is itself a tasklet, it will call get_result() on it first. At this point exceptions will be raised. Then it will wait for remaining 'orphaned' RPCs, and will only log something if an exception is raised.
So my response is: the request will succeed (return 200)
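As an aside, here is a hedged counterpart to the test above illustrating the tasklet case described two paragraphs up: if the toplevel-decorated function is itself a tasklet (i.e. a generator), toplevel calls get_result() on it, so the exception from raiseSomething() does propagate (this sketch assumes the same testbed setup):

@ndb.toplevel
def call_raise_something_and_wait():
    yield raiseSomething()  # yielding makes this a tasklet; the exception is re-raised here
    raise ndb.Return("hello")

self.assertRaises(Exception, call_raise_something_and_wait)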