Many of my handlers add a task to a task queue for non-critical background processing. Since the processing isn't critical, my code just ignores any exception raised by the taskqueue.add() call.
Tonight the task queue seemed to be down for around half an hour. Although my handlers correctly ignored the failure, each taskqueue.add() call took about 5 seconds to time out before moving on to processing the rest of the page, which made my site run very slowly.
So, is it possible to enqueue a task asynchronously - that is, to add a task without waiting to see whether the addition succeeded?
Alternatively, is there a way to reduce that timeout from 5 seconds down to, say, 1 second?
Thanks.
You can use the new taskqueue methods create_rpc and add_async. If you don't care whether the add succeeds, simply call add_async and ignore the result. If you care, but only want to wait 1 second, set the deadline when calling create_rpc and pass the return value as the rpc argument to add_async. Call get_result to find out whether the tasks were successfully added.
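For example (a minimal sketch; the queue name, task URL, and params are placeholders):

from google.appengine.api import taskqueue

queue = taskqueue.Queue('default')

# Fire and forget: start the add and never inspect the outcome.
queue.add_async(taskqueue.Task(url='/process', params={'id': '123'}))

# Or: bound the wait to one second and check the outcome later.
rpc = taskqueue.create_rpc(deadline=1)
queue.add_async(taskqueue.Task(url='/process'), rpc=rpc)
# ... render the rest of the page ...
try:
    rpc.get_result()  # raises if the add failed or missed the deadline
except Exception:     # non-critical work, so swallow any failure
    pass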
I don't think you can do anything about it, because the RPC call underneath the add() method is a synchronous, blocking API call.
You could try adding a check with the Capabilities API before enqueueing.
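For example (a sketch; I'm assuming the capability name 'taskqueue' from the Capabilities docs, and '/work' is a placeholder handler):

from google.appengine.api import capabilities, taskqueue

# Skip the enqueue entirely when the task queue service is reported down.
if capabilities.CapabilitySet('taskqueue').is_enabled():
    taskqueue.add(url='/work')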
I am pretty sure GAE announced that task queue adds will be async in the next release (as an experimental feature).
I'm learning libev and I've stumbled upon this question. Assume that I want to process something as soon as possible, but not right now (i.e. not in the currently executing function). For example, I want to split some big synchronous job into multiple pieces that are queued so that other callbacks can fire in between. In other words, I want to schedule a callback with a timeout of 0.
So the first idea is to use an ev_timer with a timeout of 0. The first question is: is that efficient? Is libev capable of transforming a 0-timeout timer into an efficient "call as soon as possible" job? I assume it is not.
I've been digging through libev's docs and I found other options as well:
it can artificially delay invoking the callback by using a prepare or idle watcher
So the idle watcher is probably not going to be good here because
Idle watchers trigger events when no other events of the same or higher priority are pending
Which is probably not what I want. Prepare watchers might work here. But why not a check watcher? Is there any crucial difference in the context I'm talking about?
The other option these docs suggest is:
or more sneakily, by reusing an existing (stopped) watcher and pushing it into the pending queue:
ev_set_cb (watcher, callback);    /* point the stopped watcher at the callback */
ev_feed_event (EV_A_ watcher, 0); /* push it straight onto the pending queue */
But that would require always having a stopped watcher around. Also, since I don't know a priori how many calls I want to schedule at the same time, I would have to keep multiple watchers, track them in some kind of list, and grow it when needed.
So am I on the right track? Are these all possibilities or am I missing something simple?
You may want to check out the ev_prepare watcher. It is scheduled for execution as the last handler in the given event loop iteration, and can be used for "execute this task ASAP" implementations. You can create a dedicated watcher for each task you want to execute, or you can implement a queue with a single prepare watcher that is started once the queue contains at least one task, as in the sketch below.
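For illustration, a minimal sketch of the one-prepare-watcher approach (names like task_runner and run_tasks are mine, not from libev):

#include <ev.h>
#include <stdio.h>

static ev_prepare task_runner;

/* Runs at the end of the loop iteration, before the loop blocks again. */
static void run_tasks(EV_P_ ev_prepare *w, int revents)
{
    /* drain your task queue here */
    puts("running queued task");
    ev_prepare_stop(EV_A_ w); /* idle again until something is enqueued */
}

static void enqueue_task(EV_P)
{
    /* push the task onto your queue, then ensure the watcher is active */
    if (!ev_is_active(&task_runner))
        ev_prepare_start(EV_A_ &task_runner);
}

int main(void)
{
    struct ev_loop *loop = EV_DEFAULT;
    ev_prepare_init(&task_runner, run_tasks);
    enqueue_task(EV_A); /* schedule one task */
    ev_run(loop, 0);
    return 0;
}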
Alternatively, you can implement a similar mechanism using an ev_idle watcher, but in that case it will be executed only when the application isn't processing any higher-priority watcher handlers.
Google's doc on async tasks assumes knowledge of the difference between regular and asynchronously added tasks.
add_async(task, transactional=False, rpc=None)
Asynchronously add a Task or a list of Tasks to this Queue.
How is adding tasks asynchronously different from adding them regularly?
I.e. what is the difference between using add(task, transactional=False) and add_async(task, transactional=False, rpc=None)?
I've heard that adding tasks regularly blocks certain things. Any explanation of what it blocks and how, and how async tasks don't block would be greatly appreciated.
Either way, the tasks themselves are scheduled and run elsewhere.
The async bit refers to the fact that the call returns immediately: your code does not wait for the round trip of the RPC that submits the task to the queue. You still have to check or wait for the result before the end of the request, but in the meantime you can do other work and only then confirm that the call completed.
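A sketch of the difference (queue name and URLs are placeholders, and build_response is a stand-in for whatever work your handler does):

from google.appengine.api import taskqueue

queue = taskqueue.Queue('default')

# Regular add: the handler blocks here for the full RPC round trip.
queue.add(taskqueue.Task(url='/work'))

# Async add: returns immediately; the RPC proceeds in the background.
rpc = queue.add_async(taskqueue.Task(url='/work'))
build_response()  # hypothetical work that overlaps with the RPC
rpc.get_result()  # now block only for whatever time the RPC still needs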
I'm doing some operations that should complete in under 60 seconds, but there may be rare cases where they take longer (though never longer than 10 minutes). The App Engine docs say that if you catch a DeadlineExceededException you have less than a second to do anything before the request permanently fails. Is that enough time to add a task to a queue and/or do a datastore write? I assume the safest way would be to asynchronously add a task or write a datastore entity at the beginning of the operation, and remove it from the queue if the operation completes. That method uses up twice as many API calls, but is it worth it?
I would suggest using the queue by default for all such operations, so you won't have to implement a fallback to it when you catch a deadline-exceeded error. It's cleaner and easier to maintain, and the user doesn't have to wait for the operation to complete. To achieve this, you can trigger the queue with an AJAX call and fetch the result in the background, so the user never waits on the operation itself. And yes, it's worth it, since it can "guarantee" the window of time you might need.
The runtime environment gives the request handler a little bit more time (less than a second) after raising the exception to prepare a custom response, so that should be sufficient to add a task to the queue.
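Something like this (a sketch assuming webapp2; do_long_operation and /finish-work are placeholders, and the Python runtime's exception class is DeadlineExceededError):

import webapp2
from google.appengine.api import taskqueue
from google.appengine.runtime import DeadlineExceededError

class WorkHandler(webapp2.RequestHandler):
    def post(self):
        try:
            do_long_operation()  # usually < 60s, occasionally longer
        except DeadlineExceededError:
            # Less than a second remains: hand the rest off to a task queue.
            taskqueue.add(url='/finish-work',
                          params={'id': self.request.get('id')})
            self.response.set_status(202)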
If you do not want the client to keep polling for a task queue result, I suggest you have a look at the Channel API. It will enable you to implement push notifications to the client.
At the end of your task, you just have to send a notification to the client to let them know that their task has been processed.
Is the following simple pattern enough to ensure the task sequence never stops, even after application updates or hard, "erratic" Google failures?
from google.appengine.ext import deferred

def do_work():
    ...  # the actual work
    deferred.defer(do_work, _countdown=7 * 24 * 60 * 60)  # re-run in 7 days
Can I schedule such a self-scheduling worker and never look back?
Two answers:
Yes: tasks will eventually execute, and they will also be retried if execution fails. The retry options are set when you define the task.
No: the task queue is not a scheduler, so you cannot schedule a task to run at a certain time. Tasks put onto a queue are served immediately, in FIFO order.
As @Jesse noted, for scheduling jobs you should look into GAE cron.
If a task is queued successfully, it will eventually execute. (And App Engine will keep trying for as long as it takes.)
The pattern you show might be better implemented using cron jobs, though, which run a task on a regular basis. A common pattern I use is to have a daily cron job kick off a task on a task queue with a small number of retries (so that if there's a temporary glitch, it will retry immediately).
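For example, a minimal cron.yaml entry for that pattern (the URL, description, and schedule are placeholders):

cron:
- description: daily kick-off for background work
  url: /cron/start_work
  schedule: every day 09:00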
If you do want to use the method above rather than cron, there's another thing to worry about: since your method can be retried because it failed or because of other system issues (e.g. the instance running it going down), you should make sure you don't end up with two tasks. Imagine if it ran, enqueued the next task, and then the node went down; App Engine would retry, starting a second task. To prevent this, you can use the datastore (in a transaction) to check whether the next task has already been enqueued. Something like:
from google.appengine.ext import db, deferred

def do_work(counter):
    ...  # the actual work

    @db.transactional
    def start_next():
        # fetch myModel from the datastore here
        if myModel.counter == counter:
            return  # already started the next job
        myModel.counter = counter
        myModel.put()
        deferred.defer(do_work, counter + 1, _transactional=True, _countdown=...)

    start_next()
Note the _transactional argument in the defer call; it ensures that the myModel instance is updated if and only if the next task is enqueued.
You might also want to look into sending an email to an administrator after a certain number of failed retries. (You can find the retry count in the task request's HTTP headers, but you can't use the deferred library if you want to do this; you have to use the task queue API directly.)
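A sketch of that, assuming webapp2 and a plain task queue handler (the threshold, sender address, and do_work are placeholders); App Engine supplies the retry count in the X-AppEngine-TaskRetryCount header:

import webapp2
from google.appengine.api import mail

class WorkHandler(webapp2.RequestHandler):
    def post(self):
        retries = int(self.request.headers.get('X-AppEngine-TaskRetryCount', 0))
        if retries > 5:
            mail.send_mail_to_admins(sender='alerts@example.com',
                                     subject='Task keeps failing',
                                     body='Giving up after %d retries.' % retries)
            return  # respond 200 OK so the queue stops retrying
        do_work()  # hypothetical: the actual job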
We're using libspotify to update playlists we've generated on a single account, which need to be kept up to date over time. We're using a fork of spotify-api-server to do this: https://github.com/tom-martin/spotify-api-server
After sending an update to a playlist's tracks using libspotify, we generally wait for the callback that we passed to sp_playlist_add_callbacks to fire before we report success to the user. Often this callback arrives within a suitable time frame, but increasingly we're seeing unacceptable delays: sometimes 30 seconds, sometimes minutes, sometimes even hours. Generally these delays seem to be caused by libspotify pausing for a period and not calling any callbacks until it "unfreezes" and delivers all the backed-up callbacks in quick succession.
Is it reasonable to use this callback as an indicator of a successful playlist update? Is there any obvious reason for these long delays?
Are you correctly handling the notify_main_thread function to keep libSpotify running?
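For reference, the standard pattern is for notify_main_thread to simply wake whatever thread drives the session, which must then call sp_session_process_events repeatedly; if that loop stalls, callbacks back up exactly as described. A minimal sketch (the condition-variable plumbing is illustrative, and a real loop would also honour next_timeout with a timed wait):

#include <pthread.h>
#include <libspotify/api.h>

static pthread_mutex_t notify_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  notify_cond  = PTHREAD_COND_INITIALIZER;
static int notified = 0;

/* Registered in sp_session_callbacks; may fire from any thread. */
static void notify_main_thread(sp_session *session)
{
    pthread_mutex_lock(&notify_mutex);
    notified = 1;
    pthread_cond_signal(&notify_cond);
    pthread_mutex_unlock(&notify_mutex);
}

/* The driving thread must keep pumping events for callbacks to arrive. */
static void event_loop(sp_session *session)
{
    int next_timeout = 0;
    for (;;) {
        pthread_mutex_lock(&notify_mutex);
        while (!notified)
            pthread_cond_wait(&notify_cond, &notify_mutex);
        notified = 0;
        pthread_mutex_unlock(&notify_mutex);

        do {
            sp_session_process_events(session, &next_timeout);
        } while (next_timeout == 0);
    }
}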
Also, the playlist system sometimes gets backed up, goes down, or otherwise takes a while to respond to requests. Our own clients keep a local cache of what the playlist tree should look like once pending transactions succeed, to keep the UI snappy.