When is Enforced Rate reduced below queue.xml rate?

Our "Enforced Rate" dropped to 0.10/s even though the queue is defined to be 20/s for a Push Task Queue. We had a 2 hour backlog of tasks built up.
The documentation says:
The enforced rate may be decreased when your application returns a 503
HTTP response code, or if there are no instances able to execute a
request for an extended period of time.
There were no 503s (all 200s), yet we saw 0 (zero) instances running for the version processing that queue, which may explain the Enforced Rate drop. The target version of the application processing the queue is not the primary version, and that version has no source of requests other than the Push Task Queue workload.
If the Enforced Rate had been dropped due to lack of instances, why did GAE not spin up new instances?
There were no budget limits or rate limits that explain the drop to 0 instances.
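For reference, a minimal queue.xml sketch of the kind of configuration in question (the queue name, bucket size, and worker version are placeholders, not our actual values):

<queue-entries>
  <queue>
    <name>my-push-queue</name>
    <rate>20/s</rate>
    <bucket-size>40</bucket-size>
    <!-- route tasks to the non-default version that serves nothing else -->
    <target>worker-version</target>
  </queue>
</queue-entries>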

I'm answering this by pointing to the bug we're starring: http://code.google.com/p/googleappengine/issues/detail?id=7338

Related

Why is Google Cloud Tasks so slow?

I use Google Cloud Tasks with AppEngine to process tasks, but the tasks wait about 2-3 minutes in the queue before being sent to my App Engine endpoint.
There is no "delay" set on the tasks, and I expect them to be sent right away.
So the question is: Is Cloud Tasks slow?
As you can see in the following screenshot, Cloud Tasks gives an ETA of about 3 minutes:
The official word from Google is that this is the best you can expect from their task queues.
In my experience, how you configure tasks seems to influence how quickly they get executed.
It seems that:
If you don't change the default behavior of your task queues (e.g., maximum concurrent, etc.) and if you don't specify an execution time of a task (e.g., eta) then your tasks will execute very soon after submission.
If you mess with either of these two things, then Google takes longer to execute your tasks. My guess is that it is the extra overhead of controlling task rate and execution.
I see from your screenshot that you have a task with an ETA of 2 min 49 sec, which is the time until your task will be run. You have high bucket size and concurrency numbers, so I think your issue has more to do with the parameters you are using when queueing your tasks, especially the schedule_time attribute. Check your code to see if you are adding a delay to your tasks, and make sure to tune it down.
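As an illustration, here is a rough sketch using the google-cloud-tasks Python client (a recent 2.x version is assumed; the project, location, queue name, and handler URI are placeholders) showing a task created with no schedule_time, which is eligible for dispatch immediately, versus one that is explicitly delayed:

import datetime

from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "us-central1", "my-queue")

# No schedule_time: the task becomes eligible for dispatch right away,
# subject only to the queue's rate and concurrency settings.
task = {
    "app_engine_http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "relative_uri": "/worker",
        "body": b"payload",
    }
}
client.create_task(request={"parent": parent, "task": task})

# With schedule_time: the task will not run before the given timestamp,
# and that delay is what shows up as the ETA in the console.
eta = timestamp_pb2.Timestamp()
eta.FromDatetime(datetime.datetime.utcnow() + datetime.timedelta(minutes=3))
delayed_task = dict(task, schedule_time=eta)
client.create_task(request={"parent": parent, "task": delayed_task})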
Just adding here that, as of February 2023, I can queue tasks and then consume them VERY fast using the Python 3.7 libraries.
Takes me about 13.5 seconds to queue up 1000 tasks.
Takes about 1 minute to process those 1000 tasks using a Cloud Run deployed python/flask app. (No other processing done, just receive and reply with 200).
So, super fast!
BTW, Pub/Sub was much slower in my tests... about 40 ms to queue each message.

Cloud PubSub quotas for "Push subscriber throughput" do not apply if set too low

We tried to enforce a certain rate limit on a Cloud PubSub push subscriber by setting the "Push subscriber throughput, kB" quota to 1, effectively meaning that PubSub should push no more than 1 kB/s to the subscriber.
However, the actual throughput can be higher than that, around 6-8 kB/s.
Why is that not limiting the throughput as expected?
More details:
The goal is to have a rate limit of 50 messages per second.
We can assume an average message size; for the purposes of our testing we use 50-byte messages, so 50 messages per second is 2,500 bytes per second, i.e. about 2.5 kB/s. By setting the quota to 1 kB/s we expected to get well under 50 messages per second pushed by PubSub. During testing we got significantly more than that.
At the moment, there is a known issue with the enforcement of push subscriber quota in Google Cloud Pub/Sub.
In general, push subscriber quota is not really a good way to try to enforce flow control. For true flow control, it is better to use pull subscribers and the client libraries. The goal of flow control in the subscriber is to prevent the subscriber from being overwhelmed. In the client library, flow control is defined in terms of outstanding messages and/or outstanding bytes. When one of these limits is reached, Cloud Pub/Sub suspends the delivery of more messages.
The issue with rate-based flow control is that it doesn't account well for unexpected issues with the subscriber or its downstream dependencies. For example, imagine that the subscriber receives messages, writes to a database, and then acknowledges the message. If the database were suffering from high latency or just unavailable for a period of time, then rate-based flow control is still going to deliver more messages to the subscriber, which will back up and could eventually overload its memory. With flow control based on outstanding messages or bytes, the fact that the database is unavailable (which prevents the acknowledgement of messages by the subscriber) means that delivery is completely halted. In this situation where the database cannot process any messages or is processing them extremely slowly, sending more messages--even at a very low rate--is still harmful to the subscriber.
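To illustrate what that looks like in practice, here is a minimal pull-subscriber sketch using the google-cloud-pubsub Python client (the project and subscription names and the limits are placeholders):

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Suspend delivery once 100 messages or 10 MiB are outstanding (unacked).
flow_control = pubsub_v1.types.FlowControl(max_messages=100, max_bytes=10 * 1024 * 1024)

def callback(message):
    # Do the real work here (e.g. the database write) and ack only on success,
    # so a slow or unavailable dependency naturally pauses delivery.
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)
with subscriber:
    try:
        future.result()
    except KeyboardInterrupt:
        future.cancel()
        future.result()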

Preparing for a flash crowd on Google App Engine

I recently experienced a sharp, short-lived increase in the load on my service on Google App Engine. The load went from ~1-2 req/second to about 10 req/second for a couple of hours. My number of dynamic instances scaled up pretty quickly, but in the process I did get a number of "Request waited too long" timeout messages.
So the next time around, I would like to be prepared with enough idle instances to handle my load. But now the question is, how do I determine how many is adequate. I expect a much larger burst in load this time - from practically nothing to an average of 500 requests/second, possibly with a peak of 3000. This is to last between 15 minutes and 1 hour.
My main goal is to ensure that the information passed via HTTP Post is saved to the datastore by means of a single write.
Here are the steps I have taken to prepare for the burst:
I have pruned the fast path to disable analytics and other reporting, which typically generate 2 urlfetch requests.
The datastore write is deferred to a task queue via the deferred library (see the sketch below).
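A minimal sketch of that fast path, assuming the Python runtime with webapp2 (the model, handler, and queue names are placeholders):

from google.appengine.ext import deferred, ndb
import webapp2

class Reading(ndb.Model):
    payload = ndb.TextProperty()

def save_reading(payload):
    # Runs later inside a push task, not in the user-facing request.
    Reading(payload=payload).put()

class CollectHandler(webapp2.RequestHandler):
    def post(self):
        # Fast path: enqueue the datastore write and return immediately.
        deferred.defer(save_reading, self.request.body, _queue="writes")
        self.response.set_status(200)

app = webapp2.WSGIApplication([('/collect', CollectHandler)])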
What I would like to know is:
1. Tips/insights into calculating how many idle instances one would need per N requests/second.
2. It seems that the maximum throughput of a task queue is 500/second. Is this the rate at which you can push tasks, and if not, then is there a cap on that? I'm guessing not, since these are probably just datastore writes, but I would like to be sure.
My fallback plan if I am not confident of saving all of the information for this flash mob is to set up a beefy Amazon EC2 instance, run a web server on it and make my clients send a backup request to this server.
You must understand that Idle Instances are only used while new frontend instances are being spun up. This means that they are only used during traffic increases; when traffic is steady they are not used.
Now if your instance needs 20 sec to spin up and can handle 10 req/sec of steady traffic, and your traffic INCREASE is 5 req/sec, then you'll need 20 * 5 / 10 = 10 idle instances if you don't want any requests dropped.
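As a worked version of that rule of thumb (a back-of-the-envelope sketch, not an official formula):

def idle_instances_needed(spin_up_seconds, traffic_increase_rps, per_instance_rps):
    # Requests that pile up while a new instance is starting,
    # divided by what a single warm instance can absorb.
    return spin_up_seconds * traffic_increase_rps / float(per_instance_rps)

print(idle_instances_needed(20, 5, 10))  # the example above: 10.0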
What you should do is:
Maximize instance throughput (number of requests it can handle): optimize code, use async db operations and enable Concurrent Requests.
Minimize your instance startup time. This is important because idle instances are used while new instances spin up, and the time it takes to spin up a new instance directly relates to how many idle instances you need. If you use Java this means getting rid of any heavy frameworks that do classpath scanning (Spring, etc.).
Finally, the number of frontend instances needed is VERY application specific. But since you already had a traffic increase, you should know how many requests your frontend instances can handle per second.
Edit: There is one more obvious thing you should do: HTTP caching. GAE has a transparent HTTP cache which can be simply controlled via Cache-Control headers.
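For example, a cacheable handler might look like this (a sketch assuming the Python runtime and webapp2; the path and max-age are placeholders):

import webapp2

class StatusHandler(webapp2.RequestHandler):
    def get(self):
        # Let GAE's edge cache and browsers reuse this response for 10 minutes.
        self.response.headers['Cache-Control'] = 'public, max-age=600'
        self.response.write('ok')

app = webapp2.WSGIApplication([('/status', StatusHandler)])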
Also, if analytics has a big performance impact on your server, consider using client side analytics services (like Google Analytics). They also work for devices.

Handling AppEngine outage with pending_ms during several minutes

Today AppEngine went down for a while:
http://code.google.com/status/appengine/detail/serving/2012/10/26#ae-trust-detail-helloworld-get-latency
The result was that all requests were kept pending, some for as long as 24 minutes. Here is an excerpt from my server log. These requests are normally handled in less than 200 ms.
https://www.evernote.com/shard/s8/sh/ad3b58bf-9338-4cf7-aa35-a255d96aebbc/4b90815ba1c8cd2080b157a54d714ae0
My daily budget ($8 per day) was blown through in a matter of minutes, whereas it previously sat at around $2 per day.
How can I prevent pending_ms from eating all my quota, even though my actual requests still respond very fast? I had the pending latency set from 300 ms to Automatic. Would limiting the maximum to 10 seconds prevent that kind of blowout?
blackjack75,
You're right, raising the pending latency to something like 10 seconds will help reduce the number of instances started.
It looks like the long running requests tied up your instances. When this happens, app engine spins up new instances to handle the new requests, and of course instances cost money.
Lowering your min and max idle instances to smaller numbers should also help.
On your dashboard, you can look at your instance graph to see how long the burst of instances was left idle after the request load was finished.
You can look at your typical usage to help estimate a safe max.
Lowering them can cause slowness when legitimate traffic needs to spin up a new instance, especially with bursty traffic, so you would want to adjust this to match your budget. For comparison, on a non-production appspot, having the min and max set to 1 works fine.
Besides that, general techniques for reducing App Engine resource usage will help. It sounds like you've gone through that already since your typical request time is low. Enabling concurrent requests could help here if your code handles threads correctly (no globals, etc.) and your instances have enough free memory to handle multiple requests.
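As a rough sketch of where those knobs live in a Python 2.7 module's app.yaml (historically they were Admin Console sliders; the values below are placeholders, not recommendations):

runtime: python27
api_version: 1
threadsafe: true  # enables concurrent requests; handlers must be thread-safe (no mutable globals)

automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 2
  min_pending_latency: 300ms
  max_pending_latency: 10s

handlers:
- url: /.*
  script: main.app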

GAE tasks and execution rate

No matter what I set rate and bucket_size to, I always see only one or two tasks running at the same time. Does anyone know what might be the reason?
My queue configuration is:
queue:
- name: MyQueue
  rate: 30/s
  bucket_size: 50
Note that the documentation mentions
To ensure that the taskqueue system does not overwhelm your
application, it may throttle the rate at which requests are sent. This
throttled rate is known as the enforced rate. The enforced rate may be
decreased when your application returns a 503 HTTP response code, or
if there are no instances able to execute a request for an extended
period of time.
In such situations, the admin console will show an Enforced Rate lower than the 30/s you requested.
