Why is the Google Pub/Sub Java 8 client receiving duplicate messages even after acknowledgement? - google-cloud-pubsub

We are using the google-cloud-pubsub (0.24.0-beta) pull client to read messages from a subscription and are seeing a high rate of duplicates. The Google documentation says that a small amount of duplication is expected, but in our case roughly 80% of messages are being duplicated even after acknowledgement.
The weirdest part is that even if we acknowledge the message immediately in the receiver using consumer.ack(), duplicates still occur.
Does anybody know how to handle this?

A large number of message duplicates could be the result of flow control settings being set too high or too low. If your flow control settings are too high, allowing too many messages to be outstanding to your client at the same time, then it is possible that the acks are being sent too late. If this is the cause, you would probably see the CPU of your machine at or near 100%. In this case, try setting the max number of outstanding messages or bytes to a lower number.
It could also be the case that the flow control settings are set too low. Some messages get buffered in the client before they are delivered to your MessageReceiver, particularly if you are flow controlled. In this case, messages may spend too much time buffered in the client before they are delivered. There is an issue with messages in this state that is being fixed in an outstanding PR. In this scenario, you could either increase your max outstanding bytes or messages (up to whatever your subscriber can actually handle), or try setting setAckExpirationPadding to a larger value than the default of 500ms.
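For reference, here is a minimal sketch of how the flow control limits can be tuned on the Java client. The project/subscription names and the limit values are placeholders, and older beta releases expose slightly different builder methods (e.g. Subscriber.defaultBuilder), so treat this as an illustration rather than a drop-in snippet.

import com.google.api.gax.batching.FlowControlSettings;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

public class TunedSubscriber {
  public static void main(String[] args) {
    // Placeholder project/subscription names.
    ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "my-subscription");

    MessageReceiver receiver = (message, consumer) -> {
      // Process the message, then ack as soon as the work is done.
      consumer.ack();
    };

    // Keep the number of outstanding messages/bytes low enough that the
    // client can ack everything well before the ack deadline expires.
    FlowControlSettings flowControl = FlowControlSettings.newBuilder()
        .setMaxOutstandingElementCount(100L)               // example value
        .setMaxOutstandingRequestBytes(10L * 1024 * 1024)  // example value: 10 MB
        .build();

    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
        .setFlowControlSettings(flowControl)
        .build();
    subscriber.startAsync().awaitRunning();
  }
}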
It is also worth checking your publisher to see if it is unexpectedly publishing messages multiple times. If that is the case, you may see the contents of your messages being the same, but they aren't duplicate messages being generated by Google Cloud Pub/Sub itself.
Edited to mention a bug that was in the client library:
If you were using a version of google-cloud-pubsub between v0.22.0 and v0.29.0, you might have been running into an issue where a change in the underlying mechanism for getting messages could result in excessive duplicates. The issue has since been fixed.

Related

What is the purpose of Google Pub/Sub?

So I was looking at using Google's Pub/Sub service for queues, but by trial and error I came to the conclusion that I have no idea what it's good for in real applications.
Google says that it's
A global service for real-time and reliable messaging and streaming data
but the way it works is really strange to me. It holds acked messages for up to 7 days; if the subscriber re-subscribes it will get all the messages from the past 7 days even if it already acked them; acked messages will most likely be sent again to the same subscriber that acked them already; and there's no FIFO either.
So I really do not understand how one should use this service if the only thing it guarantees is that a message will be delivered at least once to any subscriber. This cannot be used for non-idempotent actions without extra work: each subscriber has to store information about all messages that were already acked so it won't process a message multiple times, and so on...
Google Cloud Pub/Sub has a lot of different applications where decoupled systems need to send and receive messages. The overview page offers a number of use cases including balancing work loads, logging, and event notifications. It is true that Google Cloud Pub/Sub does not currently offer any FIFO guarantees and that messages can be redelivered.
However, the fact that the delivery guarantee is "at least once" should not be taken to mean acked messages are redelivered when a subscriber re-subscribes. Redelivery of acked messages is a rare event. This generally only happens when the ack did not make it all the way back to the service due to a networking issue, a machine failure, or some other exceptional condition. While that means that apps do need to be able to handle this case, it does not mean it will happen frequently.
For different applications, what happens on message redelivery can differ. In a case such as cache invalidation, mentioned in the overview page, getting two events to invalidate an entry in a cache just means the value will have to be reloaded an extra time, so there is not a correctness concern.
In other cases, like tracking button clicks or other events on a website for logging or stats purposes, infrequent acked message redelivery is likely not going to affect the information gathered in a significant way, so not bothering to check if events are duplicates is fine.
For cases where it is necessary to ensure that messages are processed exactly once, then there has to be some sort of tracking on the subscriber side to ensure this is the case. It might be that the subscriber is already accessing and updating an underlying database in response to messages and duplicate events can be detected via that storage.
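As a rough illustration of that kind of subscriber-side tracking, here is a hedged Java sketch that remembers already-processed message IDs. The in-memory set stands in for whatever durable store (database table, cache with a TTL, etc.) the application already uses, and the process method is hypothetical.

import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.PubsubMessage;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupingReceiver implements MessageReceiver {
  // Stand-in for durable storage; a real system would persist processed IDs
  // (or derive idempotency from the business data it already writes).
  private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

  @Override
  public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
    String id = message.getMessageId();
    if (!processedIds.add(id)) {
      // Already handled this message: just ack the redelivery and move on.
      consumer.ack();
      return;
    }
    try {
      process(message);   // hypothetical application logic
      consumer.ack();
    } catch (RuntimeException e) {
      processedIds.remove(id); // allow a later redelivery to be processed
      consumer.nack();
    }
  }

  private void process(PubsubMessage message) {
    // ... handle the message exactly once from the application's point of view
  }
}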

pubsub Dynamic rate limiting

Can anyone give details on the Dynamic rate limiting implemented by the Pub/Sub system? I couldn't find any details on the gcloud docs or the faq pages.
Here is my pubsub usage:
I'm planning to use pubsub in production. Right now, I have 1 topic, 1 subscription and 1 subscriber (Webhook HTTPS callback). Sometimes my subscriber can throw an exception (very rarely); in that situation my subscriber will return a 400 response back to Pub/Sub, so that Pub/Sub retains the message and retries.
If Pub/Sub gets a 400 response from the subscriber, will it severely impact the flow rate of other messages? Given the scarce documentation on how the flow control is implemented, I'm mainly concerned about the impact of one bad message on the latencies of all the other good messages.
I can split my one topic into multiple topics and multiple subscriptions, if it helps reduce the impact of a bad message.
If you are only occasionally returning a 400, you should not see a severe impact on the rate of messages delivered to your subscriber. When a 400 response occurs, as mentioned in the Subscriber Guide, the number of allowed outstanding messages would be cut in half. If you then return success for another outstanding message, the window will be immediately doubled again, effectively not reducing the number of outstanding messages allowed.
Message delivery for subsequent messages is delayed by an amount that is exponentially increasing on subsequent failures, starting with a delay that is O(10s of ms). Whenever a success response is returned, subsequent messages are no longer delayed. Therefore, a single 400 response from a subscriber that is otherwise returning successes shouldn't really have any noticeable impact.
Messages in Pub/Sub are retained until the consumer acknowledges the message. As long as the consumer does not acknowledge that it processed the message, the message will be retained and re-delivered.
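To make the success/failure signalling concrete, here is a hedged sketch of a push endpoint using the plain servlet API; the path and the process method are hypothetical. A 2xx response acknowledges the message, while a 400 tells Pub/Sub to retain it and retry later with backoff.

import java.io.IOException;
import java.util.stream.Collectors;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/pubsub/push")  // hypothetical path registered as the push endpoint URL
public class PushHandler extends HttpServlet {
  @Override
  protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    String body = req.getReader().lines().collect(Collectors.joining("\n"));
    try {
      process(body);                                      // hypothetical processing
      resp.setStatus(HttpServletResponse.SC_NO_CONTENT);  // 204: message is acked
    } catch (RuntimeException e) {
      // 400: Pub/Sub keeps the message and redelivers it later.
      resp.setStatus(HttpServletResponse.SC_BAD_REQUEST);
    }
  }

  private void process(String pushBody) {
    // ... parse the JSON envelope and handle the message
  }
}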

Not persisting messages when the system comes up in the wrong order

We're sending messages to Apache Camel using RabbitMQ.
We have a "sender" and a Camel route that processes a RabbitMQ message sent by the sender.
We're having deployment issues regarding which end of the system comes up first.
Our system is low-volume. I am sending perhaps 100 messages at a time. The point of the message is to reduce 'temporal cohesion' between a thing happening in our primary database and the logging of the same thing to a different database. We don't want our front end to have to wait.
The "sender" will create an exchange if it does not exist.
This ordering dependency is what is causing the deployment issues.
Here's what I see:
If I down the sender, down Camel, delete the exchange (clean slate), start the sender, then start Camel, and send 100 messages, the system works. (I think because the sender has to be run manually for testing, the Exchange is being created by the Camel Route...)
If I clean slate, and send a message, and then up Camel afterwards, I can see the messages land in RabbitMQ (using the web tool). No queues are bound. Once I start Camel, I can see its bound queue attached to the Exchange. But the messages have been lost to time and fate; they have apparently been dropped.
If, from the current state, I send more messages, they flow properly.
I think that if the messages that got dropped were persisted, I'd be ok. What am I missing?
For me it's hard to say what exactly is wrong, but I'll try and provide some pointers.
You should set up all exchanges and queues to be durable, and the messages persistent. You should never delete any of these entities (unless they are empty and you no longer use them); think of them as you would tables in a database. They are your infrastructure of sorts, and as with a database, you wouldn't want the first DB client that connects to be the one that has to create the tables it needs (this of course applies to your use case, or at least that's how it seems to me).
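As a concrete (if simplified) illustration, here is a sketch using the plain RabbitMQ Java client; the exchange, queue, and routing-key names are made up, and in practice the same declarations would live in your deployment scripts or Camel route configuration so that both sides agree on them.

import com.rabbitmq.client.BuiltinExchangeType;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;

public class DurableSetup {
  public static void main(String[] args) throws Exception {
    ConnectionFactory factory = new ConnectionFactory();
    factory.setHost("localhost"); // assumption: local broker

    try (Connection conn = factory.newConnection();
         Channel channel = conn.createChannel()) {
      // Declare a durable exchange and a durable queue, and bind them,
      // so the topology exists before either side starts sending.
      channel.exchangeDeclare("audit.exchange", BuiltinExchangeType.DIRECT, true /* durable */);
      channel.queueDeclare("audit.queue", true /* durable */, false, false, null);
      channel.queueBind("audit.queue", "audit.exchange", "audit");

      // Publish a persistent message; it survives a broker restart
      // only because it is routed into a durable queue.
      channel.basicPublish("audit.exchange", "audit",
          MessageProperties.PERSISTENT_TEXT_PLAIN,
          "audit event".getBytes(StandardCharsets.UTF_8));
    }
  }
}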
In the comments I mentioned the flow state of the queue, but with 100 messages this will probably never happen.
Regarding message delivery - persistent or not, the broker (server) keeps messages until they are consumed with an acknowledgment that's sent back by the consumer (in lots of APIs this is done automatically, but it's actually one of the most important concepts).
If a message is published to an exchange that has no queue bound to it, the broker simply drops it - which is what happens in your second scenario before the Camel route comes up and binds its queue. If the exchange to which the messages were published is deleted, they are gone. If the server gets killed or restarted and the messages are not persisted (or the queue is not durable), again, they're gone. There may well be more scenarios in which messages get dropped (if I think of some I'll edit the answer).
If you don't have control over creating (usually "declaring" in the APIs) exchanges and queues, then (aside from the fact that it's not the best thing IMHO) it can be tricky, since declaring those entities is idempotent only for identical parameters, i.e. you can't declare a durable queue q1 if a non-durable queue with the same name already exists. This could also be a problem in your case, since you mention the issue of which part of the system comes up first - maybe something is not declared with the same parameters on both sides...

StreamedResponse with Silverlight 4 polling duplex not sending updates

I'm trying to enable a streamed response using Silverlight 4 and polling duplex, but I'm getting strange behaviour when updates are sent to the client more frequently than the maxOutputDelay interval, which results in no updates being sent at all.
For example, with a maxOutputDelay of 7 seconds, and 1 update sent every 10 seconds, everything works fine. But if I have a maxOutputDelay of 1 second, and an update sent every 500 milliseconds, the updates just sit on the server side and don't get sent to the client.
It's my understanding that setting transferMode="StreamedResponse" should send the updates immediately to the client, but this doesn't seem to be working.
Here's the binding in my Web.config for the web service:
This config is based on the information from this article: http://blogs.msdn.com/b/silverlightws/archive/2010/06/25/http-duplex-improvements-silverlight-4.aspx
Thanks.
If you are not totally focused on using Duplex Channels (which are a pain to configure in anything but a single-host scenario), it might be worth checking out alternative solutions for implementing Server Callbacks - even if that means that you have to maintain two different types of connection to your backend.
Duplex Channel Alternatives:
PokeIn
Kaazing WebSocket Gateway
I think this article answers the question:
http://blogs.msdn.com/b/silverlightws/archive/2010/07/16/pollingduplex-multiple-mode-timeouts-demystified.aspx
The maxOutputDelay is more like an intra-message timer: it gets reset on each new message added to the queue. So if messages arrive more often than this delay, the timer never fires and you will never trigger a flush until the buffer fills. So I guess we have to tune the queue size as well as this timer to achieve a maximum actual latency.
I'm not sure why the streamed response still buffers but I see it too. Does anyone know how to tune that buffer size?
[Edited]
Ok, this article says that we cannot control the buffering in the streamed response (it is 16k in self-hosted and 32k in IIS). So, given that, it seems like small messages coming in at a rate greater than your maxOutputDelay are a pathological case. Maybe I have to pad them with data...
http://blogs.msdn.com/b/silverlightws/archive/2010/06/25/http-duplex-improvements-silverlight-4.aspx

Are PeopleSoft Integration Broker asynchronous messages fired serially on the receiving end?

I have a strange problem on a PeopleSoft application. It appears that integration broker messages are being processed out of order. There is another possibility, and that is that the commit is being fired asynchronously, allowing the transactions to complete out of order.
There are many inserts of detail records, followed by a trailer record which performs an update on the rows just inserted. Some of the rows are not receiving the update. This problem is sporadic, about once every 6 months, but it causes statistically significant financial reporting errors.
I am hoping that someone has had enough dealings with the internals of PeopleTools to know what it is up to, so that perhaps I can find a work around to the problem.
You don't mention whether you've set this or not, but you have a choice with Integration Broker. All messages flow through message channels, and a channel can either be ordered or unordered. If a channel is ordered then - if a message errors - all subsequent messages queue up behind it and will not be processed until it succeeds.
Whether a channel is ordered or not depends upon a checkbox in the message channel properties in Application Designer. From memory, channels are ordered by default, but you can uncheck the box to increase throughput.
Hope this helps.
PS. As of Tools 8.49 the setup changed slightly: Channels became Queues, Messages became Service Operations, etc.
I heard back from GSC. We had two domains on the sending end as well as two domains on the receiving end, all of them active. According to them, when you have multiple domains it is possible for each of the servers to pick up some of the messages in the group and therefore process them asynchronously rather than truly serially.
We are going to reduce the active servers to one and see if it happens again, but it is so sporadic that we may never know for sure.
There were a few changes to Integration Broker in PeopleSoft 9, so please let me know the version of your application. Async services can now work with sync services. The message channel properties need to be set properly. I found a similar kind of problem described on www.itwisesolutions.com/PsftTraining.html, but that was more related to the implementation itself.
thanks
