Bulk email with GAE and per-minute quota

I'm developing a voting application with GAE, which involves sending email to each voter. In my initial tests, I went over the per-minute email quota, and this exception was raised:
OverQuotaError: The API call mail.Send() required more quota than is available.
I was able to solve this in the short term by enabling billing, which greatly increases the per-minute email quota, but what is the right way to prevent such an exception from being raised in the future? If my app becomes wildly successful and I exceed the larger quota, it would be a big problem to have this exception raised.
I don't want to put the call to send emails in a try/except block, since this is done after processing a form, and I don't want the user to wait around for the response to the POST.
Is this a good use case for a task queue? If so, would I put a request to send a batch of emails in the task queue, or would each request to send an email go in the task queue? The former seems better in that processing the POST would be faster. Either way, would I add a delay between sending each email to ensure they are not sent too fast and I don't go over quota?

Yes, this is ideally suited to the task queue, as you can limit the rate at which your emails are sent out by changing the properties in queue.yaml.
One email per task would be best: if a task fails and is retried, only the failed email is resent, not the whole batch.
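As a minimal sketch, assuming a queue named send-email and a handler at /tasks/send_email (both names are illustrative, not from the answer), the throttling lives in queue.yaml:

queue:
- name: send-email
  rate: 20/m       # at most 20 task executions per minute
  bucket_size: 5   # allow small bursts
  retry_parameters:
    task_retry_limit: 7

and the form handler enqueues one task per recipient:

from google.appengine.api import taskqueue

def enqueue_vote_emails(recipients):
    # One task per email: a failure retries only that recipient.
    for email in recipients:
        taskqueue.add(
            queue_name='send-email',
            url='/tasks/send_email',
            params={'to': email})

The POST returns as soon as the tasks are enqueued; the queue then drains at the configured rate, safely under the mail quota.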

Yes, use a task queue. If each task sends one email, you can decide how many tasks should run per minute, and if a task fails it will be retried.

Related

For Cloud Run triggered from PubSub, when is the right time to send ACK for the request message?

I was building a service that runs on Cloud Run and is triggered by Pub/Sub through Eventarc.
Pub/Sub guarantees delivery at least once and retries a message every time its acknowledgement deadline expires. This deadline is set in the subscription details.
We could send an acknowledgement back at two points when a service receives a pub-sub request (which is received as a POST request in the service).
At the beginning of the request, as soon as it is received. The service would then continue to process the request at its own pace. However, this article points out that
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So sending a response at the beginning may not be an option.
After the request has been processed by the service. Depending on what the service does, we cannot always predict how long processing will take, so we cannot set the acknowledgement deadline correctly, resulting in Pub/Sub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
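A rough sketch of that store-then-ack pattern, assuming a Flask push endpoint and Cloud Datastore as the persistent store (all names here are illustrative, not from the answer):

from flask import Flask, request
from google.cloud import datastore

app = Flask(__name__)
db = datastore.Client()

@app.route('/', methods=['POST'])
def receive():
    envelope = request.get_json()
    msg = envelope['message']  # Pub/Sub push wraps the message in the POST body
    # Persist the raw message first, then 'ack' by returning 2xx right away.
    entity = datastore.Entity(db.key('InboxMessage'))
    entity.update({'message_id': msg['messageId'], 'data': msg.get('data', '')})
    db.put(entity)
    return '', 204  # a separate worker drains InboxMessage entities at its own pace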
Since you are concerned that you might not be able to set the correct acknowledgement deadline, you can use modify_ack_deadline() in your code to dynamically extend the deadline while the process is still running. You can refer to this document for sample code implementations.
Be aware that the maximum acknowledgement deadline is 600 seconds, so make sure that your processing in Cloud Run does not exceed that limit.
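A rough sketch of extending the deadline with the google-cloud-pubsub client (the project, subscription, and 600s value are placeholders; note this client-side call applies to pull subscriptions):

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('my-project', 'my-subscription')

def extend_deadline(ack_id, seconds=600):
    # Push the ack deadline out while a long-running job is still working.
    subscriber.modify_ack_deadline(
        request={
            'subscription': subscription_path,
            'ack_ids': [ack_id],
            'ack_deadline_seconds': seconds,  # Pub/Sub caps this at 600s
        })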
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions", where a process continuously pulls from the Cloud Pub/Sub API.
To get events from PubSub into Cloud Run, you use "push subscriptions" where PubSub makes an HTTP request to Cloud Run, and waits for it to finish.
In this push scenario, Pub/Sub already knows it made you a request (you received the event), so it does not need an acknowledgement about the receipt of the message. However, if your service responds with an error status code (e.g. HTTP 500), Pub/Sub will make another request to retry (and this is configurable on the push subscription itself).
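To illustrate the push model, here is a minimal sketch (the Flask route and the process() function are hypothetical): the HTTP status code is effectively the acknowledgement.

from flask import Flask, request

app = Flask(__name__)

def process(message):
    # Placeholder for the real event handling.
    pass

@app.route('/', methods=['POST'])
def handle_push():
    message = request.get_json()['message']
    try:
        process(message)
    except Exception:
        return 'retry', 500  # non-2xx: Pub/Sub redelivers per the subscription's retry policy
    return '', 204           # 2xx tells Pub/Sub the event was handled; no retry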

How to scale pull queues with Google Cloud Tasks

I have a GAE/P/Standard/FirstGen app that sends a lot of email with Sendgrid. Sendgrid sends my app a lot of notifications for when email is delivered, opened, etc.
This is how I process the Sendgrid notifications:
My handler processes the Sendgrid notification and adds a task to a pull queue
About once every minute I lease a batch of tasks from the pull queue to process them.
This works great except when I am sending more emails than usual. When I am adding tasks to the pull queue at a high rate, the pull queue refuses to lease tasks (it responds with TransientError) so the pull queue keeps filling up.
What is the best way to scale this procedure?
If I create a second pull queue and split the tasks between the two of them, will that double my capacity? Or is there something else I should consider?
====
This is how I add tasks:
from google.appengine.api import taskqueue

# Add a pull task carrying the Sendgrid notification payload.
q = taskqueue.Queue("pull-queue-name")
q.add(taskqueue.Task(payload=data, method="PULL", tag=tag_name))
I have found some information about this in the Google documentation here. According to it, the solution for TransientError is to:
catch these exceptions, back off from calling lease_tasks(), and then
try again later.
etc.
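A minimal sketch of that back-off advice (the lease duration, batch size, and delays are arbitrary choices here):

import time
from google.appengine.api import taskqueue

q = taskqueue.Queue('pull-queue-name')

def lease_batch(max_tasks=100, lease_seconds=60, max_attempts=5):
    # Lease a batch of pull tasks, backing off when the queue throttles us.
    delay = 1
    for _ in range(max_attempts):
        try:
            return q.lease_tasks(lease_seconds, max_tasks)
        except taskqueue.TransientError:
            time.sleep(delay)  # back off before calling lease_tasks() again
            delay *= 2         # exponential back-off
    return []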
Actually, I suppose this is the App Engine Task Queue, not Cloud Tasks, which is a different product.
To my understanding there is no option to scale this better. It seems that one solution might be to migrate to Cloud Tasks and Pub/Sub, which is a better way to manage queues in GAE, as you may find here.
I hope it will help somehow... :)

Cloud Pub/Sub subscriber repeats messages over 600ms

We recently integrated Google Pub/Sub into our app, and some of our long-running tasks are now a problem, as they sometimes take more than 1 minute. We have configured our subscriber's ack deadline to 600 seconds, yet anything that takes more than 600ms is being retried by Pub/Sub.
this is our config:
gcloud pubsub subscriptions describe name
ackDeadlineSeconds: 600
expirationPolicy: {}
messageRetentionDuration: 604800s
Not sure what the issue is. Most of our tasks will get repeated because of this.
Pub/Sub has a built-in at-least-once delivery system, which retries messages that were not acknowledged. In this case, after 600s have passed, the message becomes unacknowledged, so Pub/Sub retries it. It will keep retrying every 600s until the messageRetentionDuration is reached or you acknowledge it.
Keep in mind that the documentation specifies that your subscriber should be idempotent. So, making your code able to handle duplicate messages is the best approach to this issue.
You could also decrease the messageRetentionDuration to 600s (its minimum), so anything that passes the 10-minute mark will not be retried.
Also, it is stated in the FAQs that:
Why are there too many duplicate messages?
Cloud Pub/Sub guarantees at-least-once message delivery, which means that occasional duplicates are to be expected. However, a high rate of duplicates may indicate that the client is not acknowledging messages within the configured ack_deadline_seconds, and Cloud Pub/Sub is retrying the message delivery. This can be observed in the monitoring metrics pubsub.googleapis.com/subscription/pull_ack_message_operation_count for pull subscriptions, and pubsub.googleapis.com/subscription/push_request_count for push subscriptions. Look for elevated expired or webhook_timeout values in the /response_code. This is particularly likely if there are many small messages, since Cloud Pub/Sub may batch messages internally and a partially acknowledged batch will be fully redelivered.
Another possibility is that the subscriber is not acknowledging some messages because the code path processing those specific messages fails, and the Acknowledge call is never made; or the push endpoint never responds or responds with an error.
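Since the recommended fix is an idempotent subscriber, here is one rough sketch that deduplicates by message ID (the in-memory set and the do_work() helper are illustrative; production code would use a durable store such as Datastore or Redis):

processed_ids = set()  # stand-in for a durable deduplication store

def do_work(data):
    # Placeholder for the real long-running task.
    pass

def handle(message):
    # Safe to call repeatedly: a redelivered message is acked and skipped.
    if message.message_id in processed_ids:
        message.ack()
        return
    do_work(message.data)
    processed_ids.add(message.message_id)
    message.ack()  # ack only after the work has succeeded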

Why does the default task queue of Google App Engine get executed endlessly?

I am importing contacts from a CSV file, using the Blobstore service of Google App Engine to save the blob, and I send the blob key as a parameter to the task queue URL, so that the task queue handler can use the blob key to parse the CSV file and save it to the datastore.
This here is my java code for creating a task queue.
import com.google.appengine.api.taskqueue.*;  // Queue, QueueFactory, TaskOptions

Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder.withUrl("/queuetoimport").param("contactsToImport", contactsDetail));
The task queue actually gets executed, but it does not end; it endlessly keeps saving the same contact to the datastore until I manually delete the task.
What could be the reason?
This is done for error recovery. Suppose, for example, that your task was fetching a JSON feed from the network, parsing it, and storing it in a database. In the event that the network connection failed or timed out, or the returned feed was temporarily bad and failed to parse, or any other intermittent, probabilistic source of failure struck, this automatic retrying behavior (with exponential back-off) would ensure that the task eventually completed successfully (assuming the failure is one that can be fixed by retrying, and not a programmer error that would guarantee failure each and every time).
The HTTP status code returned by the task handler is what tells App Engine whether the task completed successfully and whether it needs to be retried. If you don't want the task to be retried, make sure it completes successfully and signals this with a success status code (any of the 2xx-level codes).
If you consider the contacts example, ensuring that the contact is saved (even if there is a temporary glitch in the task handler) is much better than silently dropping user data.
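For instance, a minimal sketch (in Python's webapp2 for brevity, though the question uses Java; the handler path and parse_and_store() are assumptions) showing how the response code controls retrying:

import logging
import webapp2

def parse_and_store(blob_key):
    # Placeholder for the real CSV parsing and datastore writes.
    pass

class ImportContactsHandler(webapp2.RequestHandler):
    def post(self):
        blob_key = self.request.get('contactsToImport')
        try:
            parse_and_store(blob_key)
        except Exception:
            logging.exception('import failed; task will be retried')
            self.error(500)  # non-2xx: App Engine retries with back-off
        else:
            self.response.set_status(200)  # 2xx: task done, no retry

app = webapp2.WSGIApplication([('/queuetoimport', ImportContactsHandler)])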

Google App Engine - Cron or Task Queue?

I'm building a simple "play against a random opponent" back-end using Google App Engine. So far I'm adding each user that wants to play into a "table" in the Datastore. As soon as there is more than 1 player in the Datastore, I can start to match them.
Scheduled Tasks with Cron looked promising for this work until I saw that the lowest resolution seems to be one minute. If there are plenty of players signing up, I want them to be matched quickly and not have to wait a whole minute (worst case).
I thought about having the servlet that receives the "play against random opponent" request POST to a task queue that would do the matchmaking, but I think this will lead to a lot of contention when reading from the Datastore and deleting the entities from the "random" table after they have been matched.
Basically I want a single worker that does the matching, and I want to signal this worker from time to time that now is a good time to try to match opponents.
Any suggestions on what would be the right course of action here?
You can guarantee exclusive access via transactions (see the sketch below):
Receive a request to play via REST. Check (within a transaction) if there is any pending request in the database.
If there is, notify both users to start the play and delete the request (transactionally) from the database.
If there isn't, add it to the database and wait for the next request.
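A minimal sketch of those steps with ndb (the model name and shared ancestor are assumptions; a common parent is needed because only ancestor queries are allowed inside a transaction):

from google.appengine.ext import ndb

POOL_KEY = ndb.Key('MatchPool', 'default')  # common ancestor for transactional queries

class PendingPlayer(ndb.Model):
    user_id = ndb.StringProperty()

@ndb.transactional
def match_or_wait(user_id):
    # Atomically pair with a waiting player, or join the waiting pool.
    waiting = PendingPlayer.query(ancestor=POOL_KEY).get()
    if waiting is not None and waiting.user_id != user_id:
        waiting.key.delete()    # claim the opponent inside the transaction
        return waiting.user_id  # caller notifies both players to start
    PendingPlayer(parent=POOL_KEY, user_id=user_id).put()
    return None                 # no opponent yet; wait for the next request

Note that the shared entity group serializes matchmaking writes, which gives you the single-worker behavior you asked for, at the cost of roughly one write per second to that group.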
Update:
Alternatively, you can achieve what you need via a pull queue. Same scenario as above, just instead of the datastore you'd check if there is a task in the pull queue, retrieve it if there is, or create a new one if there isn't.
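A rough sketch of that pull-queue variant (the queue name and payload format are assumptions; the lease is what provides the exclusive access):

from google.appengine.api import taskqueue

q = taskqueue.Queue('matchmaking')

def match_or_wait(user_id):
    # Lease a waiting player from the pull queue, or enqueue this one.
    tasks = q.lease_tasks(lease_seconds=30, max_tasks=1)
    if tasks:
        opponent_id = tasks[0].payload
        q.delete_tasks(tasks)  # claim the opponent permanently
        return opponent_id
    q.add(taskqueue.Task(payload=user_id, method='PULL'))
    return None                # wait for the next player to arrive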
