Delay message processing and delete before processing - google-cloud-pubsub

I need this ability to send push notifications for an action in a mobile app but wait for the user to undo the action until say 10 seconds.
Is it possible to delay the processing of a message published in a topic by 10 seconds ? And then (sometimes, if user does undo) delete the message before 10 seconds, if it doesn't need to be processed ?

Depends on if you write the subscribers as well or not:
You have control over your subscriber's code:
In your PubSub messages add a timestamp for when you want that message to be processed.
In your clients (subscribers), have logic to acknowledge the message only if the timestamp to process the message is reached.
PubSub will retry delivering the message until it's acknowledged (or 10 days)
If you don't have control over your subscriber you can have a my-topic and my-delayed-topic. Folks can publish to the former topic and that topic will have only one subscriber which you will implement:
Publish message as before to my-topic.
You will have a subscriber for that topic that can do the same throttling as shown above.
If the time for that message has reached your handler will publish/relay that message to my-delayed-topic.
You can also implement the logic above with task-queue+pubsub-topic instead of pubsub-topic+pubsub-topic.

If architecturally possible at all, you could use Cloud Tasks. This API has the following features that might suit your usecase:
You can schedule the delivery of the message (task)
You can delete the tasks from the queue (before they are executed)
Assuming that your client has a storage for some task Ids:
Create a task with schedule_time set to 10s in the future.
Store the task name in memory (you can either assign a name to the task at creation time, or use the automatically generated ID returned from the create response).
If user undid the job, then call DeleteTask.

Just wanted to share that I noticed Pub/Sub supports retry policies 1 that are GA as of 2020-06-16 2.
If the acknowledgement deadline expires or a subscriber responds with a negative acknowledgement, Pub/Sub can send the message again using exponential backoff.
If the retry policy isn't set, Pub/Sub resends the message as soon as the acknowledgement deadline expires or a subscriber responds with a negative acknowledgement.
If the maximum backoff duration is set, the default minimum backoff duration is 10 seconds. If the minimum backoff duration is set, the default maximum backoff duration is 600 seconds.
The longest backoff duration that you can specify is 600 seconds.

Related

For Cloud Run triggered from PubSub, when is the right time to send ACK for the request message?

I was building a service that runs on Cloud Run that is triggered by PubSub through EventArc.
'PubSub' guarantees delivery at least one time and it would retry for every acknowledgement deadline. This deadline is set in the queue subscription details.
We could send an acknowledgement back at two points when a service receives a pub-sub request (which is received as a POST request in the service).
At the beginning of the request as soon as the request was received. The service would then continue to process the request at its own pace. However, this article points out that
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So sending a response at the beginning may not be an option
After the request has been processed by the service. So this would mean that, depending on what the service would do, we cannot always predict how long it would take to process the request. Hence we cannot set the Acknowledgement deadline correctly, resulting in PubSub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
Since you are concerned that you might not be able to set the correct "Acknowledgement Deadline", you can use modify_ack_deadline() in your code where you can dynamically extend your deadline if the process is still running. You can refer to this document for sample code implementations.
Be wary that the maximum acknowledgement deadline is 600 seconds. Just make sure that your processing in cloud run does not exceed the said limit.
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions" where a process is continuously pulling the Cloud PubSub API.
To get events from PubSub into Cloud Run, you use "push subscriptions" where PubSub makes an HTTP request to Cloud Run, and waits for it to finish.
In this push scenario, PubSub already knows it made you a request (you received the event) so it does not need an acknowledgement about the receipt of the message. However, if your request sends a faulty response code (e.g. http 500) PubSub will make another request to retry (and this is configurable on the Push Subscription itself).

Pub/Sub push return 503 for basic scaling

I am using Pub/Sub push subscription, ack deadline is set to 10 minutes, the push endpoint is hosted within AppEngine using basic scaling.
In my logs, I see that some of the Pub/Sub (supposedly delivered to starting instances) push requests are failed with 503 error status and Request was aborted after waiting too long to attempt to service your request. log message. The execution time for this request varies from 10 seconds (for most of the requests) up to 30 seconds for some of them.
According to this article https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed#instance_scaling Deadlines for HTTP request is 24 hours and request should not be aborted in 10 seconds.
Is there a way to avoid such exceptions?
These failed requests are most likely timing out in the Pending Request Queue, meaning that no instances are available to serve them. This usually happens during spikes of PubSub messages that are delivered in burst and App Engine can't scale up quickly enough to cope with them.
One option to mitigate this would be to switch the scaling option to automatic scaling in your app.yaml file. You can tweak the min_pending_latency and max_pending_latency to better fit your scenario. You can also specify min_idle_instances to get idle instances that would be ready to handle extra load (make sure to also enable and handle warmup requests)
Take into account though that PubSub will automatically retry to deliver failed messages. It will adjust the delivery rate according to your system's behavior, as documented here. So you may experience some errors during spikes of messages, while new instances are being spawned, but your messages will eventually be processed (as long as you have setup max_instances high enough to handle the load).

Cloud Pub/Sub subscriber repeats messages over 600ms

We recently integrated google pubsub into our app, and some of our long running tasks are now under problem, as they take more than 1 minute sometimes. We have configured our subscriber's ack deadline to 600 seconds, yet, anything that is taking more than 600ms, is being retried by pubsub.
this is our config:
gcloud pubsub subscriptions describe name
ackDeadlineSeconds: 600
expirationPolicy: {}
messageRetentionDuration: 604800s
Not sure what is the issue. Most of our tasks will get repeated because of this
Pub/Sub has a built in At-least-once delivery system which will retry messages that were not acknowledged. In this case, after 600s have passed, the message you first sent becomes unacknowledged, thus Pub/Sub retries the message. It will keep retrying it for 600s until it reaches the messageRetentionDuration or you acknowledge it.
Keep in mind that it's specified in the documentation that your subscriber should be idempotent. So, making your code be able to handle multiple messages should be the best approach to this issue.
You could also decrease the messageRetentionDuration to 600s(it's minimum) so anything that passes the 10 min mark will not be retried.
Also, it is stated in the FAQs that:
Why are there too many duplicate messages?
Cloud Pub/Sub guarantees at-least-once message delivery, which means
that occasional duplicates are to be expected. However, a high rate of
duplicates may indicate that the client is not acknowledging messages
within the configured ack_deadline_seconds, and Cloud Pub/Sub is
retrying the message delivery. This can be observed in the monitoring
metrics.
pubsub.googleapis.com/subscription/pull_ack_message_operation_count
for pull subscriptions, and
pubsub.googleapis.com/subscription/push_request_count for push
subscriptions. Look for elevated expired or webhook_timeout values in
the /response_code. This is particularly likely if there are many
small messages, since Cloud Pub/Sub may batch messages internally and
a partially acknowledged batch will be fully redelivered.
Another possibility is that the subscriber is not acknowledging some
messages because the code path processing those specific messages
fails, and the Acknowledge call is never made; or the push endpoint
never responds or responds with an error.

Google Pub/Sub retry policy ? how to deal with a poison pill?

In a Pub/Sub 'push' model the docs say this:
If the push endpoint returns an error code, messages are retried for up to 7 days with an exponential backoff policy (capped at 10 seconds).
Is there a way to decide what to do with the message after the retry period ? i.e. send it to some error queue etc ?
The seven-day retry period represents the maximum amount of time unacknowledged messages are retrained in Cloud Pub/Sub to be delivered to subscribers. After the seven days pass, a message is automatically deleted from Cloud Pub/Sub and no longer delivered. The system does not currently support performing any actions on these deleted messages such as sending them to an error queue.

bulk email with GAE and per minute quota

I'm developing a voting application with GAE, which involves sending email to each voter. In my initial tests, I went over the per-minute email quota, and this exception was raised:
OverQuotaError: The API call mail.Send() required more quota than is available.
I was able to solve this short term by enabling billing, which greatly increases the per minute email quota, but what is the right way to prevent such an exception from being raised in the future? If my app becomes wildly successful and I exceed the larger quota, it would be a big problem to have this exception raised.
I don't want to put the call to send emails in a try, except block, since this is being done after processing a form, and I don't want the user to wait around for the response to the POST.
Is this a good use case for a task queue? If so, would I put a request to send a batch of emails in the task queue or would each request to send an email go in the task queue? The former seems better in that processing the POST would be faster. Regardless of which way I do it, would I add a delay between sending each email to ensure they are not sent to fast and I go over quota?
yes, ideally suited to the task queue as you can limit the rate at which your emails are sent out by changing the properties in the queue.yaml
one email per task would be best, so if the task fails and is retried it will only retry the failed one not all of the batch
yes. use a task queue. if a task is sending a email, you can decide how many tasks should run per minute. and if a task failed it will retry to execute.

Resources