In a Pub/Sub 'push' model the docs say this:
If the push endpoint returns an error code, messages are retried for up to 7 days with an exponential backoff policy (capped at 10 seconds).
Is there a way to decide what to do with the message after the retry period, e.g. send it to some error queue?
The seven-day retry period represents the maximum amount of time unacknowledged messages are retained in Cloud Pub/Sub to be delivered to subscribers. After the seven days pass, a message is automatically deleted from Cloud Pub/Sub and no longer delivered. The system does not currently support performing any actions on these deleted messages such as sending them to an error queue.
Related
I was building a service that runs on Cloud Run that is triggered by PubSub through EventArc.
Pub/Sub guarantees at-least-once delivery, and it redelivers a message each time the acknowledgement deadline passes without an ack. This deadline is set in the subscription details.
We could send an acknowledgement back at two points when a service receives a pub-sub request (which is received as a POST request in the service).
At the beginning of the request as soon as the request was received. The service would then continue to process the request at its own pace. However, this article points out that
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So sending a response at the beginning may not be an option
After the request has been processed by the service. So this would mean that, depending on what the service would do, we cannot always predict how long it would take to process the request. Hence we cannot set the Acknowledgement deadline correctly, resulting in PubSub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
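The "persist, then ack" idea above can be sketched roughly as follows. This is only an illustration: SQLite stands in for "some persistent storage", and the table name `inbox` and the functions `open_inbox`, `store_then_ack` and `drain` are hypothetical names, not part of any Pub/Sub API. The `ack` argument is whatever callable acknowledges the message in your setup.

```python
import sqlite3


def open_inbox(path=":memory:"):
    """Open (or create) the local store that buffers received messages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inbox ("
        " message_id TEXT PRIMARY KEY,"
        " payload BLOB NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def store_then_ack(conn, message_id, payload, ack):
    """Persist the message first; acknowledge only after the commit succeeded."""
    with conn:  # commits on success, rolls back on error
        # INSERT OR IGNORE makes a redelivered duplicate harmless.
        conn.execute(
            "INSERT OR IGNORE INTO inbox (message_id, payload) VALUES (?, ?)",
            (message_id, payload),
        )
    ack()


def drain(conn, process):
    """Separate worker step: process every stored message not yet marked done."""
    rows = conn.execute(
        "SELECT message_id, payload FROM inbox WHERE done = 0"
    ).fetchall()
    for message_id, payload in rows:
        process(payload)
        with conn:
            conn.execute(
                "UPDATE inbox SET done = 1 WHERE message_id = ?", (message_id,)
            )
    return len(rows)
```

Because the ack happens only after the write is committed, a crash before the commit leaves the message unacknowledged and Pub/Sub redelivers it.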
Since you are concerned that you might not be able to set the correct "Acknowledgement Deadline", you can use modify_ack_deadline() in your code where you can dynamically extend your deadline if the process is still running. You can refer to this document for sample code implementations.
Be wary that the maximum acknowledgement deadline is 600 seconds. Just make sure that your processing in cloud run does not exceed the said limit.
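For a pull subscriber, extending the deadline while work is still running might look roughly like this. It is a sketch under assumptions: `subscriber` is expected to be a `google.cloud.pubsub_v1.SubscriberClient` (it is passed in rather than created here), `ack_id` comes from a pull response, and `process_with_lease_extension`, `extension_seconds` and `interval_seconds` are hypothetical names chosen for the example.

```python
import threading


def process_with_lease_extension(subscriber, subscription_path, ack_id, work,
                                 extension_seconds=60, interval_seconds=30.0):
    """Run work() while a helper thread keeps pushing the ack deadline out."""
    done = threading.Event()

    def keep_alive():
        # Event.wait returns False on timeout, True once done is set.
        while not done.wait(interval_seconds):
            subscriber.modify_ack_deadline(
                request={
                    "subscription": subscription_path,
                    "ack_ids": [ack_id],
                    "ack_deadline_seconds": extension_seconds,
                }
            )

    extender = threading.Thread(target=keep_alive, daemon=True)
    extender.start()
    try:
        result = work()
    finally:
        done.set()
        extender.join()

    # Reached only if work() did not raise: ack after processing is complete.
    subscriber.acknowledge(
        request={"subscription": subscription_path, "ack_ids": [ack_id]}
    )
    return result
```

`interval_seconds` should be comfortably shorter than `extension_seconds`, otherwise the deadline can lapse between two extensions. If `work()` raises, the message is never acked and is redelivered once the current deadline expires.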
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions" where a process is continuously pulling the Cloud PubSub API.
To get events from PubSub into Cloud Run, you use "push subscriptions" where PubSub makes an HTTP request to Cloud Run, and waits for it to finish.
In this push scenario, PubSub already knows it made you a request (you received the event) so it does not need an acknowledgement about the receipt of the message. However, if your request sends a faulty response code (e.g. http 500) PubSub will make another request to retry (and this is configurable on the Push Subscription itself).
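In other words, with a push subscription the HTTP status code plays the role of the acknowledgement. A framework-independent sketch of a handler, assuming the standard push envelope (a JSON body with a `message` object whose `data` field is base64-encoded), could look like this. `handle_push` and `process` are hypothetical names; wire the returned status into whatever web framework the Cloud Run service uses.

```python
import base64
import json


def handle_push(body, process):
    """Return the HTTP status code to send back to Pub/Sub for one push request."""
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        data = base64.b64decode(message.get("data", ""))
        attributes = message.get("attributes", {})
    except (ValueError, KeyError, TypeError, AttributeError):
        # Design choice in this sketch: a malformed envelope will never parse,
        # so it is acknowledged (2xx) instead of being retried.
        return 204

    try:
        process(data, attributes)
    except Exception:
        # A non-success status tells Pub/Sub to deliver the message again.
        return 500

    # Success status, sent only after processing finished.
    return 204
```

Returning the success code only after `process` has finished matches the "ack once processing is complete" advice, and keeps all the work inside the request so Cloud Run does not throttle it.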
I have a requirement to track undelivered messages in PubSub. But when the subscriber to a PubSub pull subscription is unavailable, the message is lost from the subscription forever once the retention period ends. It is not captured by the dead letter topic created for the subscription.
It seems PubSub only sends a message to a dead letter topic if the number of delivery attempts exceeds the limit and no acknowledgement has been received from the subscriber.
Is there a way to push a message to the dead letter topic before the message is lost forever?
There is no way to send messages to a dead letter topic before the message is deleted due to the retention period expiring, no. The goal of the dead letter topic is to capture messages that are causing issues for subscribers and potentially preventing the processing of other messages, e.g., if the subscribers are crashing due to an unexpected message. The way this state is detected is via the retry count.
We recently integrated Google Pub/Sub into our app, and some of our long-running tasks are now having problems, as they sometimes take more than 1 minute. We have configured our subscriber's ack deadline to 600 seconds, yet anything that takes more than 600ms is being retried by Pub/Sub.
this is our config:
gcloud pubsub subscriptions describe name
ackDeadlineSeconds: 600
expirationPolicy: {}
messageRetentionDuration: 604800s
Not sure what is the issue. Most of our tasks will get repeated because of this
Pub/Sub has built-in at-least-once delivery, which retries messages that were not acknowledged. In this case, once 600s pass without an acknowledgement, the ack deadline expires and Pub/Sub delivers the message again. It will keep retrying every 600s until the message reaches the messageRetentionDuration or you acknowledge it.
Keep in mind that the documentation specifies that your subscriber should be idempotent. So, making your code able to handle repeated deliveries of the same message should be the best approach to this issue.
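One simple way to make a handler idempotent is to remember which message IDs have already been processed and skip repeats. The sketch below keeps that record in an in-memory set purely for illustration; a real service with several instances would need a shared, durable store instead. `IdempotentHandler` is a hypothetical name.

```python
class IdempotentHandler:
    """Wrap a processing function so that a redelivered message is processed once."""

    def __init__(self, process):
        self._process = process
        self._done = set()  # assumption: in-memory; use durable storage in practice

    def handle(self, message_id, data):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._done:
            return False
        self._process(data)
        # Recorded only after success, so a failed attempt can be retried.
        self._done.add(message_id)
        return True
```

Either way the caller can acknowledge the message: a duplicate has already been handled, so there is nothing left to do for it.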
You could also decrease the messageRetentionDuration to 600s (its minimum) so anything that passes the 10 min mark will not be retried.
Also, it is stated in the FAQs that:
Why are there too many duplicate messages?
Cloud Pub/Sub guarantees at-least-once message delivery, which means
that occasional duplicates are to be expected. However, a high rate of
duplicates may indicate that the client is not acknowledging messages
within the configured ack_deadline_seconds, and Cloud Pub/Sub is
retrying the message delivery. This can be observed in the monitoring
metrics.
pubsub.googleapis.com/subscription/pull_ack_message_operation_count
for pull subscriptions, and
pubsub.googleapis.com/subscription/push_request_count for push
subscriptions. Look for elevated expired or webhook_timeout values in
the /response_code. This is particularly likely if there are many
small messages, since Cloud Pub/Sub may batch messages internally and
a partially acknowledged batch will be fully redelivered.
Another possibility is that the subscriber is not acknowledging some
messages because the code path processing those specific messages
fails, and the Acknowledge call is never made; or the push endpoint
never responds or responds with an error.
When using Amazon's SQS, for example, I can define a dead letter queue (DLQ) where any message that has failed to be deleted (ack'd) after X retries will be routed for separate processing, but in Google Cloud Platform I don't see any mention of this.
Google Cloud Pub/Sub does not currently have any automatic dead letter queues. If you are worried about "poison pill" messages, you will have to support this in some capacity yourself by persistently keeping a map from the message ID to the number of times the message has been delivered. You would update this map in your subscriber before reading the data in the message. Once acknowledged, you remove from the map. If the count exceeds some threshold, you could publish it to a separate Cloud Pub/Sub topic that you use to keep track of such messages and then ack the message.
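The do-it-yourself approach described above could be sketched like this. The answer calls for a persistent map; a plain dict is used here only to keep the example short. `PoisonPillGuard`, `process` and `quarantine` are hypothetical names, and `quarantine` stands for "publish to the separate topic".

```python
class PoisonPillGuard:
    """Count deliveries per message ID and divert messages that keep failing."""

    def __init__(self, max_deliveries, process, quarantine):
        self._max = max_deliveries
        self._process = process
        self._quarantine = quarantine
        self._counts = {}  # assumption: in-memory stand-in for a persistent map

    def handle(self, message_id, data):
        """Return True to ack the message, False to leave it for redelivery."""
        # Update the count before touching the message data.
        count = self._counts.get(message_id, 0) + 1
        self._counts[message_id] = count

        if count > self._max:
            # Threshold exceeded: hand the message off, then ack it.
            self._quarantine(message_id, data)
            del self._counts[message_id]
            return True

        try:
            self._process(data)
        except Exception:
            return False  # no ack, Pub/Sub will deliver it again

        del self._counts[message_id]  # acknowledged, so forget it
        return True
```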
Cloud Pub/Sub now supports Dead Letter Queues that can be used to handle poison pill messages.
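As a rough illustration of the built-in feature, a dead letter policy is attached to the subscription and names a topic plus a maximum number of delivery attempts (the service accepts values from 5 to 100). The helper below only builds the request body; `subscription_with_dead_letter` and all resource IDs are made up for the example.

```python
def subscription_with_dead_letter(project, subscription, topic,
                                  dead_letter_topic, max_delivery_attempts=5):
    """Build a create-subscription request that carries a dead letter policy."""
    if not 5 <= max_delivery_attempts <= 100:
        raise ValueError("max_delivery_attempts must be between 5 and 100")
    return {
        "name": f"projects/{project}/subscriptions/{subscription}",
        "topic": f"projects/{project}/topics/{topic}",
        "dead_letter_policy": {
            "dead_letter_topic": f"projects/{project}/topics/{dead_letter_topic}",
            "max_delivery_attempts": max_delivery_attempts,
        },
    }
```

With the `google-cloud-pubsub` client installed, the result can be passed as `pubsub_v1.SubscriberClient().create_subscription(request=...)`. The Pub/Sub service account also needs permission to publish to the dead letter topic and to subscribe to the source subscription.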
I need this ability to send push notifications for an action in a mobile app, but first give the user, say, 10 seconds to undo the action.
Is it possible to delay the processing of a message published to a topic by 10 seconds? And then (sometimes, if the user does undo) delete the message within those 10 seconds, if it doesn't need to be processed?
Depends on if you write the subscribers as well or not:
You have control over your subscriber's code:
In your PubSub messages add a timestamp for when you want that message to be processed.
In your clients (subscribers), have logic to acknowledge the message only if the timestamp to process the message is reached.
PubSub will retry delivering the message until it's acknowledged (or until the retention period, 7 days by default, expires)
If you don't have control over your subscriber you can have a my-topic and my-delayed-topic. Folks can publish to the former topic and that topic will have only one subscriber which you will implement:
Publish message as before to my-topic.
You will have a subscriber for that topic that can do the same throttling as shown above.
If the time for that message has reached your handler will publish/relay that message to my-delayed-topic.
You can also implement the logic above with task-queue+pubsub-topic instead of pubsub-topic+pubsub-topic.
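The timestamp check used in both variants above might look like this. It is a sketch under assumptions: the publisher is assumed to put an ISO 8601 timestamp into a message attribute called `process_at` (a name invented for this example), and the boolean returned by `handle_delayed` tells the caller whether to acknowledge.

```python
from datetime import datetime, timezone


def is_due(attributes, now=None):
    """True once the time stored in the hypothetical 'process_at' attribute has passed."""
    now = now or datetime.now(timezone.utc)
    process_at = datetime.fromisoformat(attributes["process_at"])
    if process_at.tzinfo is None:
        # Assumption: timestamps without an offset are meant as UTC.
        process_at = process_at.replace(tzinfo=timezone.utc)
    return now >= process_at


def handle_delayed(attributes, data, process, now=None):
    """Return True to ack; False leaves the message unacked so it is redelivered."""
    if not is_due(attributes, now):
        return False
    process(data)  # or relay the message to my-delayed-topic
    return True
```

Note that this gives "no earlier than" semantics: the actual processing time depends on when Pub/Sub redelivers the unacknowledged message.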
If architecturally possible at all, you could use Cloud Tasks. This API has the following features that might suit your usecase:
You can schedule the delivery of the message (task)
You can delete the tasks from the queue (before they are executed)
Assuming that your client has a storage for some task Ids:
Create a task with schedule_time set to 10s in the future.
Store the task name in memory (you can either assign a name to the task at creation time, or use the automatically generated ID returned from the create response).
If user undid the job, then call DeleteTask.
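The steps above could be sketched as follows. `client` is assumed to be a `google.cloud.tasks_v2.CloudTasksClient` passed in by the caller, `queue_path` the full queue resource name, and `url` the endpoint that sends the notification; `schedule_task` and `cancel_task` are hypothetical helper names. The sketch also assumes the client library converts a Python `datetime` into the protobuf timestamp for `schedule_time`; if your version does not, build a `Timestamp` explicitly.

```python
from datetime import datetime, timedelta, timezone


def schedule_task(client, queue_path, url, payload, delay_seconds=10, now=None):
    """Create an HTTP task that fires delay_seconds from now; return its name."""
    now = now or datetime.now(timezone.utc)
    task = {
        "schedule_time": now + timedelta(seconds=delay_seconds),
        # No http_method given: Cloud Tasks defaults to POST.
        "http_request": {"url": url, "body": payload},
    }
    created = client.create_task(parent=queue_path, task=task)
    return created.name  # keep this so the task can be deleted later


def cancel_task(client, task_name):
    """Delete the task before it runs, e.g. when the user taps undo."""
    client.delete_task(name=task_name)
```

If the undo arrives after the task has already been dispatched, the delete call will fail or have no effect, so the notification handler should tolerate that race.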
Just wanted to share that I noticed Pub/Sub supports retry policies, which are GA as of 2020-06-16.
If the acknowledgement deadline expires or a subscriber responds with a negative acknowledgement, Pub/Sub can send the message again using exponential backoff.
If the retry policy isn't set, Pub/Sub resends the message as soon as the acknowledgement deadline expires or a subscriber responds with a negative acknowledgement.
If the maximum backoff duration is set, the default minimum backoff duration is 10 seconds. If the minimum backoff duration is set, the default maximum backoff duration is 600 seconds.
The longest backoff duration that you can specify is 600 seconds.
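The defaulting rules quoted above can be written down as a small helper. This is only an illustration of the rules as stated, not code from the client library; `resolve_backoff` is a hypothetical name and the values are in seconds. On a subscription, the corresponding fields are `minimum_backoff` and `maximum_backoff` of the retry policy.

```python
def resolve_backoff(minimum=None, maximum=None):
    """Apply the documented defaults; return (min, max) in seconds, or None."""
    if minimum is None and maximum is None:
        # No retry policy: messages are resent as soon as the deadline
        # expires or a negative acknowledgement arrives.
        return None
    if minimum is None:
        minimum = 10   # default minimum when only the maximum is set
    if maximum is None:
        maximum = 600  # default maximum when only the minimum is set
    if not 0 <= minimum <= maximum <= 600:
        raise ValueError("backoff must satisfy 0 <= minimum <= maximum <= 600")
    return (minimum, maximum)
```

For example, `resolve_backoff(maximum=300)` gives `(10, 300)` and `resolve_backoff(minimum=30)` gives `(30, 600)`.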