I'm using Google App Engine pull queues to send push notifications at scale to APNS, GCM, and OneSignal, mostly following this architecture: https://cloudplatform.googleblog.com/2013/07/google-app-engine-takes-pain-out-of-sending-ios-push-notifications.html
The problem is that I'm hitting some kind of limit on how many tasks can be leased at the same time: my notification workers lease 3 notifications at a time, but when more than about 30 workers are running, leaseTasks() returns an empty array, even when there are hundreds or thousands of pending tasks. As far as I know, there is no limit on how many tasks can be leased simultaneously, so this behaviour is unexpected.
Have you seen this pull-queue limit in the docs:
If you generate more than 10 LeaseTasks requests per queue per second,
only the first 10 requests will return results. The others will return
no results.
If you have 30 workers, it seems that you could easily hit this limit. Could you lease more tasks at a time and use fewer workers?
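Something like this, for instance (a rough sketch against the GAE Python pull-queue API; the queue name, lease length, and send_notification() are placeholders):

```python
from google.appengine.api import taskqueue

# Placeholder queue name -- substitute your notification pull queue.
queue = taskqueue.Queue('notification-queue')

# One lease call grabs up to 100 tasks for 60 seconds instead of 3,
# so far fewer workers are needed to drain the same backlog and you
# stay under the 10 lease-requests-per-queue-per-second cap.
tasks = queue.lease_tasks(lease_seconds=60, max_tasks=100)
for task in tasks:
    send_notification(task.payload)  # hypothetical per-task processing
if tasks:
    queue.delete_tasks(tasks)  # acknowledge the whole batch at once
```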
Related
I use Google Cloud Tasks with AppEngine to process tasks, but the tasks wait about 2-3 minutes in the queue before being sent to my App Engine endpoint.
There is no "delay" set on the tasks, and I expect them to be sent right away.
So the question is: Is Cloud Tasks slow?
As you can see in the following screenshot, Cloud Tasks gives an ETA of about 3 mins:
The official word from Google is that this is the best you can expect from their task queues.
In my experience, how you configure tasks seems to influence how quickly they get executed.
It seems that:
If you don't change the default behavior of your task queues (e.g., maximum concurrent dispatches) and you don't specify an execution time for a task (e.g., an eta), then your tasks will execute very soon after submission.
If you mess with either of these two things, then Google takes longer to execute your tasks. My guess is that it is the extra overhead of controlling task rate and execution.
I see from your screenshot that you have a task with an ETA of 2 min 49 sec, which is the time remaining until your task runs. You have high bucket-size and concurrency numbers, so I think your issue has more to do with the parameters you use when enqueueing your tasks, especially the scheduled_time attribute. Check your code to see if you are adding a delay to your tasks, and make sure to tune it down.
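As a sanity check, a task created without a schedule_time should be eligible to run immediately. A minimal sketch with the google-cloud-tasks Python client (project, region, queue, and handler path are placeholders):

```python
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
# Placeholders: substitute your own project, region, and queue name.
parent = client.queue_path('my-project', 'us-central1', 'my-queue')

task = {
    'app_engine_http_request': {
        'http_method': tasks_v2.HttpMethod.POST,
        'relative_uri': '/handler',  # placeholder App Engine endpoint
    }
    # Deliberately no 'schedule_time': the task becomes eligible for
    # dispatch as soon as the queue's rate limits allow.
}
client.create_task(request={'parent': parent, 'task': task})
```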
Just adding here that, as of February 2023, I can queue tasks and then consume them VERY fast using the Python 3.7 libraries.
Takes me about 13.5 seconds to queue up 1000 tasks.
Takes about 1 minute to process those 1000 tasks using a Cloud Run deployed python/flask app. (No other processing done, just receive and reply with 200).
So, super fast!
BTW, Pub/Sub was much slower in my tests... about 40 ms to queue each message.
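For reference, the Cloud Run side was essentially just this (route name and port are illustrative):

```python
from flask import Flask

app = Flask(__name__)

@app.route('/task-handler', methods=['POST'])
def task_handler():
    # A 2xx response tells Cloud Tasks the task succeeded;
    # anything else triggers a retry per the queue's retry config.
    return ('', 200)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```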
I have a requirement to process ~20k calls per second. My system processes lists of ~1M entries and performs multiple jobs for each item. It is very “bursty” in nature, as it isn’t always processing a list. I added an App Engine flex env (with Rails), using automatic scaling, with a test endpoint that waits 5 seconds and returns. I push to the Pub/Sub topic and a push subscription sends to App Engine. Running this hits a steady state of 20-30 requests per second.
I guessed that the problem was the interaction between the Pub/Sub push volume algorithm and App Engine, but then I ran a second test where I just blasted curl requests in a loop from multiple processes. This also ran at 20-30 rps.
I’m stuck at this point and wondering how to proceed. How can I configure the system for higher performance? I need roughly three orders of magnitude more throughput than what I’m seeing.
Thanks so much for helping!
My experience with Pub/Sub is that it is really slow for single-message processing. I'd guess there's overhead from it being HTTP-based. The average latency I see is around 40 ms from outside Google and 20 ms if your code runs on Google servers.
What worked for me was to batch messages instead: I was able to get up to 100k msgs/sec when batching them at 1k messages per publish.
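With the Python client, that batching is mostly configuration. A rough sketch (the exact limits, project, and topic names are illustrative):

```python
from google.cloud import pubsub_v1

# Buffer messages client-side and send them in one publish RPC.
batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=1000,      # flush once 1000 messages are buffered...
    max_bytes=1024 * 1024,  # ...or 1 MB of payload...
    max_latency=0.05,       # ...or after 50 ms, whichever comes first
)
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
topic_path = publisher.topic_path('my-project', 'my-topic')  # placeholders

futures = [publisher.publish(topic_path, b'payload-%d' % i)
           for i in range(100000)]
for f in futures:
    f.result()  # block until every batch is acknowledged
```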
My application does some juggling of email attachments. So far it's clocking in at around 20 seconds and everything works fine. But if I send larger attachments and it passes 60 seconds, is it going to break?
The App Engine docs do not say whether the mail reception servlet has a timeout of 60 seconds or 10 minutes, so it's hard to say.
In any case, I would recommend you perform the following in the servlet that handles /_ah/mail :
Store the mail content in Cloud Storage or the Blobstore
Start a task to process this mail
That way you will take advantage of the retry capabilities of tasks, and you'll have 10 minutes to process your mail.
If you believe your task may take more than 10 minutes, you can either break it up into smaller tasks (chained or parallel, depending on your use case) or use modules to go beyond the 10-minute limit. Note that modules will not stay up forever; you should not expect to perform 4-hour tasks on modules, for example.
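A minimal sketch of that pattern in first-generation GAE Python (the bucket name, object path, and task URL are placeholders, and the processing task itself is left out):

```python
import uuid

import cloudstorage as gcs  # appengine-gcs-client library
import webapp2
from google.appengine.api import taskqueue
from google.appengine.ext.webapp import mail_handlers

class MailReceiver(mail_handlers.InboundMailHandler):
    def receive(self, mail_message):
        # Persist the raw MIME message first; '/my-bucket' is a placeholder.
        path = '/my-bucket/mail/%s.eml' % uuid.uuid4().hex
        with gcs.open(path, 'w') as f:
            f.write(mail_message.original.as_string())
        # Hand the heavy lifting to a push-queue task, which gets a
        # 10-minute deadline and automatic retries on failure.
        taskqueue.add(url='/tasks/process_mail', params={'gcs_path': path})

app = webapp2.WSGIApplication([MailReceiver.mapping()])
```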
I have a pull queue in Google App Engine and a resident backend which processes the tasks in the pull queue. The backend has several worker threads for consuming and processing tasks, as suggested in a post on the Google Cloud Platform blog:
https://cloud.google.com/resources/articles/ios-push-notifications
Workers poll the pull-queue with lease_tasks(). My question is: is lease_tasks() supposed to be a blocking method, i.e. block the current thread's execution until either the queue has some tasks or a deadline is exceeded?
According to the GAE documentation
https://developers.google.com/appengine/docs/python/taskqueue/overview-pull#Python_Leasing_tasks
lease_tasks() accepts a 'deadline' parameter and may raise DeadlineExceededError, so isn't it rational to assume that lease_tasks() blocks for up to 'deadline' seconds?
The problem is that while I'm developing the application on the development server, lease_tasks() returns immediately with an empty list of tasks. As a result, the worker thread's while-loop calls lease_tasks() constantly, consuming 100% of the CPU. If I put in an explicit sleep(), say for 5 seconds, the worker goes to sleep but won't wake up if a task is placed in the queue in the meantime. That makes the worker less responsive (in the worst case it might take up to 5 seconds to handle the next task), and I'd consume more CPU in wake-up/sleep cycles than by just having the thread block on a 'queue' (I know the pull queue is actually an RPC, but abstractly it remains a producer queue).
Perhaps this happens only with the dev app server, while on GAE lease_tasks() blocks. However, the example code from the blog post mentioned above also suspends thread execution with sleep(). The example code is available on GitHub, and a link is in the blog post (unfortunately I cannot post it here).
lease_tasks does not wait for new tasks to be added. Most task queue calls take up to 5 seconds. The calls to lease tasks and to fetch queue statistics take longer - up to 10 seconds by default.
Most users won't need to set the deadline, it is an optional parameter. If you have very many workers contending on the same queue and often experience transient errors after 10 seconds, consider increasing the lease deadline to 20 seconds (or shard the load over more queues and/or tags). Alternately, if you only have one worker and it always needs time to perform other work in addition to leasing tasks, a small deadline like 5 seconds could be used, but it is much better to use the async API instead.
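In other words, an empty lease returns immediately, so the worker loop has to throttle itself. A minimal sketch with exponential backoff (the queue name, lease length, and handle() are assumptions):

```python
import time

from google.appengine.api import taskqueue

queue = taskqueue.Queue('pull-queue')  # placeholder queue name
idle_sleep = 1.0

while True:
    tasks = queue.lease_tasks(lease_seconds=60, max_tasks=10)
    if not tasks:
        # Empty lease: back off instead of spinning at 100% CPU.
        time.sleep(idle_sleep)
        idle_sleep = min(idle_sleep * 2, 30.0)
        continue
    idle_sleep = 1.0  # reset backoff once there is work again
    for task in tasks:
        handle(task)              # hypothetical per-task processing
        queue.delete_tasks(task)  # acknowledge so it is not re-leased
```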
I recently experienced a sharp, short-lived increase in the load on my Google App Engine service. The load went from ~1-2 req/sec to about 10 req/sec for a couple of hours. My number of dynamic instances scaled up pretty quickly, but in the process I got a number of "Request waited too long" timeout messages.
So the next time around, I would like to be prepared with enough idle instances to handle my load. But now the question is, how do I determine how many is adequate. I expect a much larger burst in load this time - from practically nothing to an average of 500 requests/second, possibly with a peak of 3000. This is to last between 15 minutes and 1 hour.
My main goal is to ensure that the information passed via HTTP Post is saved to the datastore by means of a single write.
Here are the steps I have taken to prepare for the burst:
I have pruned the fast path to disable analytics and other reporting, which typically generate 2 urlfetch requests.
The datastore write is to be deferred to a task queue via the deferred library (see the sketch after this list)
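For reference, the deferred write looks roughly like this (the Record model and the handler wiring are illustrative, not my exact code):

```python
from google.appengine.ext import deferred, ndb

class Record(ndb.Model):
    data = ndb.BlobProperty()  # illustrative model for the POSTed payload

def save_payload(payload):
    # The single datastore write, executed later on a push queue.
    Record(data=payload).put()

def enqueue_save(request_body):
    # Fast path in the request handler: pickle the call into a task
    # and return to the client immediately.
    deferred.defer(save_payload, request_body)
```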
What I would like to know is:
1. Tips/insights into calculating how many idle instances one would need per N requests/second.
2. It seems that the maximum throughput of a task queue is 500 tasks/second. Is this the rate at which you can enqueue tasks, and if not, is there a cap on enqueueing? I'm guessing not, since these are probably just datastore writes, but I would like to be sure.
My fallback plan if I am not confident of saving all of the information for this flash mob is to set up a beefy Amazon EC2 instance, run a web server on it and make my clients send a backup request to this server.
You must understand that idle instances are only used while new frontend instances are being spun up. This means they are only used during traffic increases; when traffic is steady they are not used.
Now if your instance needs 20 seconds to spin up and can handle 10 req/sec of steady traffic, and your traffic increases by 5 req/sec every second, then you'll need 20 * 5 / 10 = 10 idle instances if you don't want any requests dropped.
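As a back-of-the-envelope formula, using the numbers from that example:

```python
spin_up_seconds = 20    # time for a new instance to become ready
ramp_per_second = 5     # traffic grows by 5 req/sec every second
instance_capacity = 10  # steady req/sec one instance can absorb

# Load that arrives while new instances are still spinning up must be
# soaked up by idle instances that are already warm.
idle_instances = spin_up_seconds * ramp_per_second / instance_capacity
print(idle_instances)  # -> 10.0, i.e. ten idle instances
```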
What you should do is:
Maximize instance throughput (the number of requests one instance can handle): optimize your code, use async datastore operations, and enable concurrent requests.
Minimize your instance startup time. This is important because idle instances are used while new instances spin up, and the time it takes to spin up a new instance directly determines how many idle instances you need. If you use Java, this means getting rid of heavy frameworks that do classpath scanning (Spring, etc.).
Finally, the number of frontend instances needed is very application-specific. But since you have already been through a traffic increase, you should know how many requests per second one of your frontend instances can handle.
Edit: There is one more obvious thing you should do: HTTP caching. GAE has a transparent HTTP cache which can be simply controlled via Cache-Control headers.
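For example, in a webapp2 handler (the max-age value and route are just illustrative):

```python
import webapp2

class CachedPage(webapp2.RequestHandler):
    def get(self):
        # Publicly cacheable for 5 minutes: repeat requests can be served
        # from Google's edge cache without hitting an instance at all.
        self.response.headers['Cache-Control'] = 'public, max-age=300'
        self.response.write('...page body...')

app = webapp2.WSGIApplication([('/cached', CachedPage)])
```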
Also, if analytics has a big performance impact on your server, consider using client side analytics services (like Google Analytics). They also work for devices.