How to increase wait time for Google Cloud Tasks? - google-app-engine

I am creating a lot of tasks to be processed in Cloud Tasks, but some of them are failing due to a lack of available resources (instances).
In my logs, Google waits 10 seconds on average before returning an HTTP 500 error, and sometimes it returns HTTP 500 after less than 10 ms. This queue has auto-retry enabled, so eventually all tasks are executed, but the errors remain.
Is there a way to increase this wait time? I don't mind waiting 5 minutes for a task to be processed; I just want to minimize the number of errors like this in my logging panel.
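The auto-retry behaviour is governed by the queue's retry configuration (`min_backoff`, `max_backoff`, `max_doublings`, `max_attempts` in the Cloud Tasks `RetryConfig`). Below is a plain-Python model of how those values space out retries, following the schedule as the Cloud Tasks documentation describes it. It does not call the API. The interval starts at `min_backoff`, doubles `max_doublings` times, then grows linearly by `min_backoff * 2**max_doublings`, and is capped at `max_backoff`. Raising `min_backoff` would presumably mean fewer retries arrive while no instance is free, but it would not stop the first attempt from failing.

```python
def retry_intervals(min_backoff, max_backoff, max_doublings, attempts):
    """Model of the Cloud Tasks retry schedule (seconds before each retry).

    Assumption: mirrors the RetryConfig semantics described in the
    Cloud Tasks docs; this does not talk to the API.
    """
    intervals = []
    interval = min_backoff
    linear_step = min_backoff * 2 ** max_doublings
    for retry in range(attempts):
        intervals.append(min(interval, max_backoff))
        if retry < max_doublings:
            interval *= 2  # exponential phase
        else:
            interval += linear_step  # linear phase after max_doublings
    return intervals


print(retry_intervals(min_backoff=10, max_backoff=300, max_doublings=3, attempts=8))
# [10, 20, 40, 80, 160, 240, 300, 300]
```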

I asked a similar question not too long ago and didn't get any helpful answers. I don't know whether you can increase the rate at which tasks are created.
A few things to try:
Batch tasks together: instead of creating 10 tasks, can you create one task that does the work of all 10?
Serial tasks: create a first task that does some work and then creates a second task. The second task does its work and then creates a third task, and so on.
Pull queues might allow a higher task creation rate, but I'm not sure about that.
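The batching and serial-task suggestions can be sketched in plain Python. The in-memory `queue` and the `enqueue` and `drain` helpers are hypothetical stand-ins for whatever actually creates and dispatches Cloud Tasks; the point is only the shape of the work. Batching turns N items into ceil(N / batch_size) tasks, and chaining means each task enqueues its successor.

```python
from collections import deque

queue = deque()  # hypothetical stand-in for a Cloud Tasks queue
processed = []


def enqueue(handler, payload):
    """Hypothetical: in a real app this would create a Cloud Task."""
    queue.append((handler, payload))


def drain():
    """Run queued tasks until the queue is empty (simulates the dispatcher)."""
    while queue:
        handler, payload = queue.popleft()
        handler(payload)


def batches(items, batch_size):
    """Split items into lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def batch_handler(items):
    # One task does the work of many items.
    processed.extend(items)


def chained_handler(items):
    # Serial chain: handle one item, then enqueue a task for the rest.
    if items:
        processed.append(items[0])
        if items[1:]:
            enqueue(chained_handler, items[1:])


# Batching: 25 items become 3 tasks instead of 25.
for batch in batches(list(range(25)), 10):
    enqueue(batch_handler, batch)
print(len(queue))  # 3
drain()

# Chaining: one task at a time, each creating the next.
enqueue(chained_handler, [100, 101, 102])
drain()
print(processed[-3:])  # [100, 101, 102]
```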

Related

Why is Google Cloud Tasks so slow?

I use Google Cloud Tasks with App Engine to process tasks, but the tasks wait about 2-3 minutes in the queue before being sent to my App Engine endpoint.
There is no "delay" set on the tasks, and I expect them to be sent right away.
So the question is: Is Cloud Tasks slow?
In the Cloud Tasks console, the task shows an ETA of about 3 minutes.
The official word from Google is that this is the best you can expect from their task queues.
In my experience, how you configure tasks seems to influence how quickly they get executed.
It seems that:
If you don't change the default behavior of your task queues (e.g., maximum concurrent dispatches) and if you don't specify an execution time for a task (e.g., eta), then your tasks will execute very soon after submission.
If you mess with either of these two things, then Google takes longer to execute your tasks. My guess is that it is the extra overhead of controlling task rate and execution.
I see from your screenshot that you have a task with an ETA of 2 min 49 sec, which is the time remaining until your task will run. You have high bucket size and concurrency numbers, so I think your issue has more to do with the parameters you use when queueing your tasks, especially the schedule_time attribute. Check your code to see whether you are adding a delay to your tasks, and make sure to tune it down.
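To check for an accidental delay, compare how the task is built. The sketch below only assembles the request as a plain dict whose keys mirror the Cloud Tasks `Task` fields (`app_engine_http_request`, `schedule_time`); the helper name `build_task` and the `/process` URI are hypothetical. When no delay is requested, `schedule_time` is simply left out, so the task is eligible to run right away. With the `google-cloud-tasks` library, a dict like this would be passed as `client.create_task(parent=parent, task=task)`; the official samples convert the datetime to a protobuf `Timestamp` first.

```python
import datetime
import json


def build_task(relative_uri, payload, delay_seconds=None):
    """Build a Cloud Tasks-style task dict (hypothetical helper).

    schedule_time is only set when a delay is explicitly requested;
    leaving it out lets the task be dispatched as soon as possible.
    """
    task = {
        "app_engine_http_request": {
            "relative_uri": relative_uri,
            "body": json.dumps(payload).encode("utf-8"),
        }
    }
    if delay_seconds:
        run_at = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(
            seconds=delay_seconds
        )
        task["schedule_time"] = run_at
    return task


immediate = build_task("/process", {"id": 1})
delayed = build_task("/process", {"id": 2}, delay_seconds=170)
print("schedule_time" in immediate, "schedule_time" in delayed)  # False True
```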
Just adding here that, as of February 2023, I can queue tasks and then consume them VERY fast using the Python 3.7 libraries.
It takes me about 13.5 seconds to queue up 1000 tasks.
It takes about 1 minute to process those 1000 tasks using a Python/Flask app deployed on Cloud Run. (No other processing is done; the app just receives each request and replies with 200.)
So, super fast!
BTW, Pub/Sub was much slower in my tests: about 40 ms to publish each message.

Google app engine API: Running large tasks

Good day,
I am running the back end of an application on App Engine (Java).
I receive requests through endpoints. The problem is that there is something big I need to compute, but I need fast response times for the front end. So, as a solution, I want to precompute it and store it in a dedicated memcache.
The way I did this is by adding a static block that runs a deferred task on the default queue. Is there a better way to have something calculated on startup?
Now, this deferred task performs a large number of datastore operations, and sometimes they time out. So I created a system that retries on a timeout until it succeeds. However, when I start up the app, it immediately creates two of the deferred tasks. It also keeps retrying the tasks when they fail, despite the fact that I set DeferredTaskContext.setDoNotRetry(true);.
Honestly, the deferred tasks feel very finicky.
I just want to run a method that takes more than 5 minutes (probably longer as the data set grows). I want to run this method on startup, and afterwards on a regular basis. How would you model this? My first thought was a cron job, but cron requests are time-limited. I would need a cron job that runs a deferred task and hope the tasks don't pile up, spawn duplicates, or start retrying.
Thanks for the help and good day.
Dries
Your datastore operations should never time out. You need to fix this, most likely by using cursors and setting the right batch size for your large queries.
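The cursor-and-batch-size advice amounts to processing a large result set one page at a time and resuming from a cursor, instead of fetching everything in one long operation. The sketch uses a hypothetical `fetch_page(cursor, limit)` over an in-memory list (the cursor is just an offset here) in place of a real datastore query; in the Java low-level API the corresponding pieces are `FetchOptions` with a limit and a start cursor. In a real task you would also save the cursor and enqueue a continuation task if time is running out.

```python
DATA = list(range(1, 1001))  # hypothetical entities


def fetch_page(cursor, limit):
    """Hypothetical query: return (results, next_cursor); None means no more pages."""
    start = cursor or 0
    results = DATA[start:start + limit]
    next_cursor = start + len(results) if start + limit < len(DATA) else None
    return results, next_cursor


def process_all(batch_size=200):
    total = 0
    pages = 0
    cursor = None
    while True:
        results, cursor = fetch_page(cursor, batch_size)
        total += sum(results)  # stand-in for the real per-entity work
        pages += 1
        if cursor is None:  # last page reached
            break
    return total, pages


print(process_all())  # (500500, 5)
```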
You can perform initialization of objects on instance startup: check whether an object is available and, if not, do the calculations.
Remember to store the results of your calculations in the datastore (in addition to Memcache) as Memcache is volatile. This way you don't have to recalculate everything a few seconds after the first calculation was completed if a Memcache object was dropped for any reason.
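The instance-startup check and the "store it in the datastore too" advice combine into a read-through lookup: try Memcache, fall back to the datastore, and only recompute when both are empty, writing the result to both. The dicts `memcache` and `datastore` and the function `expensive_calculation` are hypothetical stand-ins for the real services and computation.

```python
memcache = {}   # hypothetical: volatile cache
datastore = {}  # hypothetical: durable storage
calls = {"compute": 0}


def expensive_calculation():
    calls["compute"] += 1
    return sum(i * i for i in range(1000))


def get_precomputed(key="precomputed"):
    value = memcache.get(key)
    if value is not None:
        return value
    value = datastore.get(key)  # survives Memcache evictions
    if value is None:
        value = expensive_calculation()
        datastore[key] = value
    memcache[key] = value
    return value


get_precomputed()
memcache.clear()   # simulate Memcache dropping the entry
get_precomputed()  # served from the datastore, no recomputation
print(calls["compute"])  # 1
```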
Deferred tasks can be scheduled to run after a specified delay. So instead of using a cron job, you can create a task to be executed after 1 hour (for example). When this task completes its own calculations, it can create another task to be executed after an hour, and so on.
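A self-rescheduling task can be simulated with a small event loop. The `Scheduler` class and its fake clock are hypothetical stand-ins for the task queue. In App Engine the delay would come from the task options (for example `TaskOptions.countdownMillis` in Java, or `_countdown` with Python's `deferred.defer`). `run(until=...)` bounds the simulation so it terminates.

```python
import heapq

HOUR = 3600


class Scheduler:
    """Hypothetical task queue with a simulated clock (seconds)."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = 0  # tie-breaker so heap entries never compare functions

    def enqueue(self, fn, countdown=0):
        heapq.heappush(self._heap, (self.now + countdown, self._seq, fn))
        self._seq += 1

    def run(self, until):
        while self._heap and self._heap[0][0] <= until:
            self.now, _, fn = heapq.heappop(self._heap)
            fn()


scheduler = Scheduler()
run_times = []


def recalculate():
    run_times.append(scheduler.now)  # stand-in for the heavy calculation
    # When done, schedule the next run one hour from now.
    scheduler.enqueue(recalculate, countdown=HOUR)


scheduler.enqueue(recalculate)  # first run on startup
scheduler.run(until=3 * HOUR)
print(run_times)  # [0, 3600, 7200, 10800]
```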

Task queue doesn't run all my tasks -- App engine backend

I have an app running on a backend instance. It has 11 tasks. The first one is started by /_ah/start and it, in turn, starts the other ten, the worker tasks. The worker tasks have this structure:
import time

done = False
while not done:
    do_important_stuff()  # mostly a URL fetch, as described below
    time.sleep(30)
    if a_long_time_has_passed():
        done = True
The execution behavior on App Engine is the same every time. The scheduling task runs and enqueues the 10 worker tasks. The first seven worker tasks start running and execute correctly. The last three sit in the queue and never run. The task queue console shows all ten tasks in the queue, with seven of them running.
The app also stops responding to HTTP requests, returning 503 status codes, and the logs show that my HTTP handlers are not being invoked.
The worker task queue is configured with a maximum rate of 1/s and 2 buckets. Curiously, the admin console shows the enforced rate as 0.1 sec. Since the tasks run forever, they never return unsuccessful completion status codes. The CPU load is negligible: the workers mostly do a URL fetch and then wait 30 seconds to do it again.
The logs are not helpful. I don't know where to go to find diagnostics that will help me figure it out. I'm testing in a free account. Could there be a limit of 8 tasks executing at one time? I see nothing like that in the documentation, but I've run out of ideas. Eventually, I'd like to run even more tasks in parallel.
Thanks for any advice you can give me.
There's a limit to how many simultaneous requests a backend instance will process, and it sounds like you're running into that limit.
Alternatives include:
Use regular task queues rather than ones against a backend
Start more than one instance of your backend
Use threading to start threads yourself from the start request, rather than relying on the task queue to do it for you
Note that if your tasks are CPU bound, you're not going to get any extra benefit from running 10 of them over 5, 2, or maybe even 1.
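The threading alternative could look roughly like this: the `/_ah/start` handler launches the workers as threads instead of enqueueing ten tasks. `worker` and `start_workers` are hypothetical names, the loop is bounded and the sleep shortened so the sketch finishes quickly, and whether long-lived threads are allowed depends on the runtime and instance type you deploy to.

```python
import threading
import time

results = []
lock = threading.Lock()


def worker(worker_id, iterations, pause):
    """Hypothetical worker: the question's loop, bounded for the example."""
    for i in range(iterations):
        with lock:
            results.append((worker_id, i))  # stand-in for do_important_stuff()
        time.sleep(pause)  # the question sleeps 30 s between fetches


def start_workers(count=10, iterations=3, pause=0.01):
    threads = [
        threading.Thread(target=worker, args=(n, iterations, pause), daemon=True)
        for n in range(count)
    ]
    for t in threads:
        t.start()
    return threads


threads = start_workers()
for t in threads:
    t.join()
print(len(results))  # 30
```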

How to create X tasks as fast as possible on Google App Engine

We push out alerts from GAE. Let's say we need to push out 50,000 alerts to C2DM (Cloud to Device Messaging). For this we:
Read everyone who wants alerts from the datastore
Loop through and create a "push task" for each notification
The problem is that creating each task takes some time, so this doesn't scale as the user base grows. In my experience, we spend 20-30 seconds just creating the tasks when there are a lot of them. The reason for one task per push message is that we can retry the task if something fails, and the failure only affects a single subscriber. Also, C2DM only supports sending to one user at a time.
Will it be faster if we:
Read everyone who wants alerts from the datastore
Loop through and create a "pool task" for every 100 subscribers
Each "pool task" generates 100 "push tasks" when it executes
The task execution is very fast so in our scenario it seems like the creation of the tasks is the bottleneck and not the execution of the tasks. That's why I thought about this scenario to be able to increase the parallelism of the application. I would guess this would lead to faster execution but then again I may be all wrong :-)
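The two-level scheme can be modelled without App Engine: the first step only creates one "pool task" per 100 subscribers, and each pool task fans out into per-subscriber "push tasks" when it runs. The `tasks` list and the `(kind, payload)` tuples are hypothetical; only the counts matter here. The request that reads the subscribers now creates 500 tasks instead of 50,000, and the remaining task creation is spread across pool tasks that can run in parallel.

```python
import math

POOL_SIZE = 100
tasks = []  # hypothetical queue of (kind, payload) tuples


def create_pool_tasks(subscriber_ids):
    for start in range(0, len(subscriber_ids), POOL_SIZE):
        tasks.append(("pool", subscriber_ids[start:start + POOL_SIZE]))


def run_pool_task(ids):
    # Executed later; in a real queue, pool tasks run in parallel.
    for subscriber_id in ids:
        tasks.append(("push", subscriber_id))


subscribers = list(range(50_000))
create_pool_tasks(subscribers)
pool_count = len(tasks)
print(pool_count, math.ceil(len(subscribers) / POOL_SIZE))  # 500 500

for kind, payload in list(tasks):  # iterate over a snapshot
    if kind == "pool":
        run_pool_task(payload)
push_count = sum(1 for kind, _ in tasks if kind == "push")
print(push_count)  # 50000
```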
We do something similar with APNS (Apple Push Notification Server): we create a task for a batch of notifications at a time (= pool task as you call it). When task executes, we iterate over a batch and send it to push server.
The difference with your setup is that we have a separate server for communicating with push, as APNS only supports socket communication.
The only downside is that if there is an error, the whole task is repeated and some users might get two notifications.
This sounds like it varies based on the number of alerts you need to send out, how long it takes to send each alert, and the number of active instances you have running.
My guess is that it takes a few milliseconds to tens of milliseconds to send out a C2DM alert, while it takes a few seconds for an instance to spin up, so you can probably issue a few hundred or a few thousand alerts before justifying another task instance. The ratio of the time it takes to send each C2DM message to the time it takes to launch an instance will dictate how many messages you'd want to send per task.
If you already have a fair number of instances running though, you don't have the delay of waiting for instances to spin up.
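The ratio argument can be turned into a rough rule of thumb: put as many messages in one task as can be sent in about the time a new instance takes to start. The figures below (2 s spin-up, 10 ms or 50 ms per message) are illustrative assumptions, not measurements.

```python
def messages_per_task(instance_startup_ms, per_message_ms):
    """Messages one task can send in roughly the time a new instance takes to start."""
    return max(1, instance_startup_ms // per_message_ms)


print(messages_per_task(2000, 10))  # 200
print(messages_per_task(2000, 50))  # 40
```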
BTW, this seems almost like a perfect application of the MapReduce API. It mostly does what you describe in the second version, except that it takes your initial query and breaks it up into subqueries that each return a "page" of the result set. A task is launched for each subquery, which processes all the items in its "page". This is an improvement over what you describe, because you don't need to spend time looping through your initial result set.
I believe the default implementation of the MapReduce API just queries for all entities of a particular kind (i.e., all User objects), but you can change the filter used.
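The idea of splitting the initial query can be sketched as dividing a key range into contiguous shards, one task per shard, so no single request has to walk the whole result set. This uses integer keys for simplicity and is not the MapReduce library's actual splitting logic.

```python
def split_key_range(start, end, shards):
    """Split [start, end) into at most `shards` contiguous, non-empty sub-ranges."""
    size = end - start
    bounds = [start + size * i // shards for i in range(shards + 1)]
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


ranges = split_key_range(0, 50_000, 8)
print(ranges[0], ranges[-1], len(ranges))  # (0, 6250) (43750, 50000) 8
```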

Task Queue VS. URLFetch

I need to run a script (Python) in App Engine many times.
One possibility is to just run a loop and use urlfetch with a link to the script.
The other is to create a task with the script URL.
What is the difference between both ways? It seems like Tasks have a quota (100,000 daily free tasks) so why should I use them?
Thanks,
Joel
Briefly:
Bulk adding tasks to the queue will probably be easier, and possibly quicker, than using URLFetch, although async URL fetches might narrow the gap.
When a task fails, it will automatically retry. With URLFetch, even if you check the status of each call, a request might just hang for a while before you get some kind of error.
You can control the rate at which tasks are executed. So if you add 1,000 tasks fast you can let them slowly run at 10 / minute (or whatever you want), helping you not blow through your other quotas.
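App Engine push queues throttle dispatch with a token bucket: `rate` refills tokens and `bucket_size` caps how many can be spent in a burst (both are `queue.yaml` settings). The class below is a simplified model of that idea, not App Engine's implementation; the clock is passed in so the example is deterministic. With 10 tasks per minute and a bucket of 2, one check per second over a minute dispatches only 11 tasks however many are waiting.

```python
class TokenBucket:
    """Simplified token bucket: `rate` tokens per second, at most `bucket_size` stored."""

    def __init__(self, rate, bucket_size, now=0.0):
        self.rate = rate
        self.capacity = bucket_size
        self.tokens = float(bucket_size)
        self.last = now

    def try_dispatch(self, now):
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# 10 tasks per minute, bursts of up to 2, one dispatch attempt per second.
bucket = TokenBucket(rate=10 / 60, bucket_size=2)
dispatched = [t for t in range(60) if bucket.try_dispatch(float(t))]
print(len(dispatched))  # 11
```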
If you enable billing, the free quota is 20,000,000 tasks per day.
Depending on what you are doing, tasks can be transactionally enqueued, which gives you some really powerful abilities.
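Transactional enqueueing means a task added inside a datastore transaction is only enqueued if that transaction commits. The toy `Transaction` class, the `order:*` keys and the `send_receipt:*` tasks below are hypothetical and only model that guarantee. In the Python App Engine API, the real switch is `taskqueue.add(..., transactional=True)` called inside a transaction.

```python
class Transaction:
    """Hypothetical: buffers writes and tasks, applies both only on commit."""

    def __init__(self, store, queue):
        self.store, self.queue = store, queue
        self.pending_writes, self.pending_tasks = {}, []

    def put(self, key, value):
        self.pending_writes[key] = value

    def add_task(self, task):
        self.pending_tasks.append(task)

    def commit(self):
        self.store.update(self.pending_writes)
        self.queue.extend(self.pending_tasks)

    def rollback(self):
        self.pending_writes.clear()
        self.pending_tasks.clear()


store, queue = {}, []

txn = Transaction(store, queue)
txn.put("order:1", "paid")
txn.add_task("send_receipt:1")
txn.commit()

txn = Transaction(store, queue)
txn.put("order:2", "paid")
txn.add_task("send_receipt:2")
txn.rollback()  # e.g. a conflict: neither the write nor the task happens

print(store, queue)  # {'order:1': 'paid'} ['send_receipt:1']
```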
