We are experiencing very high latencies when we start a Google PubSub client: messages do not start arriving until several minutes after the client is initialized.
When looking in the Google Cloud console, we can indeed see that google.pubsub.v1.Subscriber.StreamingPull calls have very high latencies (around 8 minutes).
Is this expected behaviour? If not, what could cause this issue?
Best regards
The latency in the Google Cloud console would not be correlated with latency in receiving messages. The nature of a StreamingPull request is that it stays open for a long time, until shut down by a connection error or when a shutdown is initiated on the client. The latency in the console would indicate how long the connections are staying open, not how long it is taking to receive messages. This is also why the error rate is 100%.
Messages should be received quickly after starting up a subscriber, assuming there are messages available in the backlog to receive. There are many different things that could lead to delays in message delivery:
Subscriber client running on a machine with limited available resources that cannot keep up with the incoming messages.
Very tight flow control settings that only allow a few messages through at a time.
Publisher-side latency due to the publisher running on a machine with limited available resources.
Messages having been received earlier by another running subscriber client or via a gcloud pull command on the same subscription, resulting in messages not being redelivered until the ack deadline has expired.
Related
We tried to enforce a rate limit on a Cloud PubSub push subscriber by setting the "Push subscriber throughput, kB" quota to 1, which should mean that PubSub pushes no more than 1 kB/s to the subscriber.
However, the actual throughput can be higher than that, around 6-8 kB/s.
Why is that not limiting the throughput as expected?
More details:
The goal is to have a rate limit of 50 messages per second.
We can assume the average message size; for the purposes of our testing we use 50-byte messages, so 50 messages per second * 50 bytes = 2,500 bytes per second, or about 2.5 kB/s. By setting the quota to 1 kB/s we expected to get far fewer than 50 messages per second pushed by PubSub. During testing we got significantly more than that.
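As a sanity check on the numbers in the question, the expected steady-state throughput is just the message rate times the message size (a minimal sketch; the function name is mine):

```javascript
// Expected push throughput in bytes per second for a given message rate.
function bytesPerSecond(messagesPerSecond, bytesPerMessage) {
  return messagesPerSecond * bytesPerMessage;
}

// 50 messages/s at 50 bytes each is 2,500 B/s (~2.5 kB/s), so a 1 kB/s
// quota, if enforced, should cap delivery well below 50 messages/s.
console.log(bytesPerSecond(50, 50)); // 2500
```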
At the moment, there is a known issue with the enforcement of push subscriber quota in Google Cloud Pub/Sub.
In general, push subscriber quota is not really a good way to try to enforce flow control. For true flow control, it is better to use pull subscribers and the client libraries. The goal of flow control in the subscriber is to prevent the subscriber from being overwhelmed. In the client library, flow control is defined in terms of outstanding messages and/or outstanding bytes. When one of these limits is reached, Cloud Pub/Sub suspends the delivery of more messages.
The issue with rate-based flow control is that it doesn't account well for unexpected problems with the subscriber or its downstream dependencies. For example, imagine that the subscriber receives messages, writes to a database, and then acknowledges each message. If the database suffered from high latency or was simply unavailable for a period of time, rate-based flow control would still deliver more messages to the subscriber, which would back up and could eventually exhaust the subscriber's memory. With flow control based on outstanding messages or bytes, the fact that the database is unavailable (which prevents the subscriber from acknowledging messages) means that delivery is completely halted. In this situation, where the database cannot process any messages or is processing them extremely slowly, sending more messages, even at a very low rate, is still harmful to the subscriber.
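The argument above can be sketched as a toy model (the function names and constants are mine, not Cloud Pub/Sub APIs): with a stalled downstream that never acks, a rate-based limiter keeps growing the backlog, while an outstanding-based limit halts delivery at the cap.

```javascript
// Rate-based delivery: `ratePerTick` messages are delivered every tick,
// regardless of whether anything has been acknowledged.
function rateBasedOutstanding(ratePerTick, ticks) {
  let outstanding = 0;
  for (let t = 0; t < ticks; t++) outstanding += ratePerTick; // nothing is ever acked
  return outstanding;
}

// Outstanding-based flow control: delivery stops once the cap is reached.
function outstandingBasedOutstanding(ratePerTick, ticks, maxOutstanding) {
  let outstanding = 0;
  for (let t = 0; t < ticks; t++) {
    const room = Math.max(maxOutstanding - outstanding, 0);
    outstanding += Math.min(ratePerTick, room); // delivery halts at the cap
  }
  return outstanding;
}

// Even at a very low rate, a stalled subscriber accumulates messages without
// bound under rate-based control, while an outstanding-based limit caps it.
console.log(rateBasedOutstanding(1, 1000));            // 1000
console.log(outstandingBasedOutstanding(1, 1000, 10)); // 10
```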
We are using the google-cloud-pubsub (0.24.0-beta) pull client to read messages from a subscription and are seeing a high rate of duplicates. Google's documentation says that a small amount of duplication is expected, but in our case about 80% of messages are being duplicated, even after acknowledgement.
The weirdest part is that even if we acknowledge the message immediately in the receiver using consumer.ack(), duplicates still occur.
Does anybody know how to handle this?
A large number of message duplicates could be the result of flow control settings being set too high or too low. If your flow control settings are too high, allowing too many messages to be outstanding to your client at the same time, then it is possible that acks are being sent too late. If this is the cause, you would probably see the CPU of your machine at or near 100%. In this case, try setting the max number of outstanding messages or bytes to a lower number.
It could also be the case that the flow control settings are set too low. Some messages get buffered in the client before they are delivered to your MessageReceiver, particularly if you are flow controlled. In this case, messages may spend too much time buffered in the client before they are delivered. There is an issue with messages in this state that is being fixed in an outstanding PR. In this scenario, you could either increase your max outstanding bytes or messages (up to whatever your subscriber can actually handle) or you can try setting setAckExpirationPadding to a value larger than the default 500ms.
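A rough way to picture the buffering problem: a message that sits in the client longer than the ack deadline minus the padding is liable to expire and be redelivered as a duplicate. This is only a toy model with illustrative numbers (10 s deadline, 500 ms padding), not the library's actual logic:

```javascript
// Toy model: will a message that sat buffered for `bufferedMs` miss its
// ack deadline? The padding is the margin before the deadline at which the
// client needs to have acked (or extended) the message.
function willBeRedelivered(bufferedMs, ackDeadlineMs, ackPaddingMs) {
  return bufferedMs > ackDeadlineMs - ackPaddingMs;
}

// A message buffered for 9.6 s against a 10 s deadline with 500 ms padding
// misses the window; one buffered for 5 s does not.
console.log(willBeRedelivered(9600, 10000, 500)); // true
console.log(willBeRedelivered(5000, 10000, 500)); // false
```

A larger padding (or fewer buffered messages via higher flow control limits) shrinks the window in which this can happen.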
It is also worth checking your publisher to see if it is unexpectedly publishing messages multiple times. If that is the case, you may see the contents of your messages being the same, but they aren't duplicate messages being generated by Google Cloud Pub/Sub itself.
Edited to mention a bug that was in the client library:
If you were using a version of google-cloud-pubsub between v0.22.0 and v0.29.0, you might have been running into an issue where a change in the underlying mechanism for getting messages could result in excessive duplicates. The issue has since been fixed.
Is Google PubSub suitable for low-volume (10 msg/sec) but mission-critical messaging, where timely delivery of each message is guaranteed within any fixed period of time?
Or, is it rather suited for high-throughput, where individual messages might be occasionally lost or delayed indefinitely?
Edit: To rephrase this question a bit: Is it true that any particular message in PubSub, regardless of the volume of messages produced, can be indefinitely delayed?
Google Cloud Pub/Sub guarantees delivery of all messages, whether low throughput or high throughput, so there should be no concern about messages being lost.
Latency for message delivery from publisher to subscriber depends on many different factors. In particular, the rate at which the subscriber is able to process messages and request more messages is vitally important. For pull subscribers, this means always having several outstanding pull requests to the server. For push subscribers, they should be returning a successful HTTP response code as quickly as possible. You can read more about the difference between push and pull subscribers.
Google Cloud Pub/Sub tries to minimize latency as much as possible, though there are no guarantees made. Empirically, Cloud Pub/Sub consistently delivers messages in no more than a couple of seconds at the 99th percentile. Note that if your publishers or subscribers are not running on Google Cloud Platform, then network latency between your servers and Google servers could also be a factor.
Can anyone give details on the Dynamic rate limiting implemented by the Pub/Sub system? I couldn't find any details on the gcloud docs or the faq pages.
Here is my pubsub usage:
I'm planning to use pubsub in our production. Right now, I have 1 topic, 1 subscription and 1 subscriber (Webhook HTTPS callback). Sometimes my subscriber can throw an exception (very rarely), in that situation my subscriber shall return a 400 response back to the pubsub, so that the pubsub can retain the message and retry.
If the pubsub gets a 400 response from the subscriber, will it severely impact the flow rate of other messages? Given the scarce documentation on how flow control is implemented, I'm mainly concerned about the impact of one bad message on the latencies of all the other good messages.
I can split my one topic into multiple topics and multiple subscriptions, if it helps reduce the impact of a bad message.
If you are only occasionally returning a 400, you should not see a severe impact on the rate of messages delivered to your subscriber. When a 400 response occurs, as mentioned in the Subscriber Guide, the number of allowed outstanding messages is cut in half. If you then return success for another outstanding message, the window is immediately doubled again, so the halving has no lasting effect on the number of outstanding messages allowed.
Message delivery for subsequent messages is delayed by an amount that is exponentially increasing on subsequent failures, starting with a delay that is O(10s of ms). Whenever a success response is returned, subsequent messages are no longer delayed. Therefore, a single 400 response from a subscriber that is otherwise returning successes shouldn't really have any noticeable impact.
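The behavior described above can be modeled roughly as follows; the halve-on-failure/double-on-success rule matches the description, but the cap and the 10 ms base delay are illustrative assumptions, not documented Cloud Pub/Sub values:

```javascript
// Allowed outstanding messages: halved on a failure (e.g. a 400 response),
// doubled again on the next success, up to the original cap.
function nextWindow(window, success, cap) {
  return success
    ? Math.min(window * 2, cap)
    : Math.max(Math.floor(window / 2), 1);
}

// Exponentially increasing delay, starting at O(10 ms) per the answer above;
// zero consecutive failures means no delay (reset on success).
function retryDelayMs(consecutiveFailures) {
  return consecutiveFailures === 0 ? 0 : 10 * 2 ** (consecutiveFailures - 1);
}

let w = 1000;                   // assume a cap of 1000 outstanding messages
w = nextWindow(w, false, 1000); // one 400: window halves to 500
w = nextWindow(w, true, 1000);  // next success: immediately back to 1000
console.log(w);                 // 1000
console.log(retryDelayMs(3));   // 40 (10 -> 20 -> 40 ms)
```

This is why a single isolated 400 is essentially invisible: both the window and the delay recover on the very next success.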
Messages in Pub/Sub are retained until the consumer acknowledges the message. As long as the consumer does not acknowledge that it processed the message, the message will be retained and re-delivered.
Working on a GAE project, one requirement we have is to be able to determine, in a timely manner, whether a user has left the application. We currently have this working, but it is unreliable, so I am researching alternatives.
The way we do this now is a JS function set up to run on an interval, sending a heartbeat signal to the GAE app via an AJAX call. This works relatively well, but generates a lot of traffic and CPU usage. If we don't hear a heartbeat from a client for several minutes, we conclude they have left the application. We also have the unload handler wired up to send a part message, again through an AJAX call. This works poorly, and most of the time not at all.
We are also making use of the Channels API. One thing I have noticed is that when our app has an open channel, the client also seems to send a heartbeat signal in the form of a call to http://talkgadget.google.com/talkgadget/dch/bind. I believe this comes from the iFrame and/or JS that gets loaded when opening a channel in the client.
My question is, can my app on the server side somehow hook into these calls to http://talkgadget.google.com/talkgadget/dch/bind and use them as the heartbeat signal? Is there a better way to detect whether a client is still connected, even if they aren't actively doing anything in the client?
Google has added this feature:
See https://developers.google.com/appengine/docs/java/channel/overview
Tracking Client Connections and Disconnections
Applications may register to be notified when a client connects to or disconnects from a channel.
You can enable this inbound service in appengine-web.xml:
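The original answer omitted the snippet; if I recall the Channel API docs correctly, the inbound service was named channel_presence, and enabling it causes App Engine to POST to /_ah/channel/connected/ and /_ah/channel/disconnected/ on connects and disconnects:

```xml
<!-- appengine-web.xml: enable connect/disconnect notifications.
     Service name as documented for the Channel API. -->
<inbound-services>
  <service>channel_presence</service>
</inbound-services>
```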
Currently the channel API bills you up-front for all the CPU time the channel will consume for two hours, so it's probably cheaper to send messages to a dead channel than to send a bunch of heartbeat messages to the server.
https://groups.google.com/d/msg/google-appengine/sfPTgfbLR0M/yctHe4uU824J
What I would try is attaching a "please acknowledge" parameter to every Nth message (staggered to avoid every client acknowledging the same message). If two of these are ignored, mute the channel until you hear from that client.
You can't currently use the Channel API to determine if a user is still online or not. Your best option for now depends on how important it is to know as soon as a user goes offline.
If you simply want to know they're offline so you can stop sending messages, or it's otherwise not vital you know immediately, you can simply piggyback pings on regular interactions. Whenever you send the client an update and you haven't heard anything from them in a while, tag the message with a 'ping request', and have the client send an HTTP ping whenever it gets such a tagged message. This way, you'll know they're gone shortly after you send them a message. You're also not imposing a lot of extra overhead, as they only need to send explicit pings if you're not hearing anything else from them.
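The piggybacked-ping idea above could be sketched like this on the server side, assuming some record of when each client was last heard from (all names and the 5-minute threshold are mine):

```javascript
const PING_AFTER_MS = 5 * 60 * 1000; // ask for a ping after 5 quiet minutes
const lastHeard = new Map();         // clientId -> timestamp in ms

// Called whenever any request (ping or otherwise) arrives from a client.
function recordActivity(clientId, now) {
  lastHeard.set(clientId, now);
}

// Called when sending an update to a client: tag it with a ping request
// if we haven't heard from them recently.
function tagOutgoingMessage(clientId, message, now) {
  const heard = lastHeard.get(clientId) ?? 0;
  if (now - heard > PING_AFTER_MS) {
    return { ...message, pingRequested: true }; // client should HTTP-ping back
  }
  return message;
}

// A client we have never heard from gets a ping request piggybacked.
const msg = tagOutgoingMessage("alice", { body: "update" }, Date.now());
console.log(msg.pingRequested === true); // true
```

The client-side counterpart just checks `pingRequested` on each received message and fires one extra AJAX call when it is set, so explicit pings only happen during quiet periods.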
If you expect long periods of inactivity and it's important to know promptly when they go offline, you'll have to have them send pings on a schedule, as you suggested. You can still use the trick of piggybacking pings on other requests to minimize them, and you should set the interval between pings as long as you can manage, to reduce load.
I do not have a good solution to your core problem of "hooking" the client to server. But I do have an interesting thought on your current problem of "traffic and CPU usage" for periodic pings.
I assume you have a predefined heart-beat interval time, say 1 min. So, if there are 120 clients, your server would process heart beats at an average rate of 2 per second. Not good if half of them are "idle clients".
Let's assume a client has already been idle for 15 minutes. Does this client's browser still need to send heartbeats at the constant predefined interval of 1 min? Why not make it variable?
My proposal is simple: Vary the heart-beats depending on activity levels of client.
When the client is "active", heart-beats work at 1 per minute. When the client is "inactive" for more than 5 minutes, heart-beat rate slows down to 50% (one after every 2 minutes). Another 10 minutes, and heart-beat rate goes down another 50% (1 after every 4 minutes)... At some threshold point, consider the client as "unhooked".
In this method, "idle clients" would not be troubling the server with frequent heartbeats, allowing your app server to focus on "active clients".
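The decaying schedule could be as simple as a lookup from idle time to interval; the thresholds follow the proposal above, while the 60-minute "unhooked" cutoff is an illustrative choice:

```javascript
// Heartbeat interval (in minutes) as a function of how long the client
// has been idle. Returns null once the client should be considered unhooked.
function heartbeatIntervalMinutes(idleMinutes) {
  if (idleMinutes >= 60) return null; // threshold: treat client as unhooked
  if (idleMinutes >= 15) return 4;    // long-idle: rate halved again
  if (idleMinutes >= 5) return 2;     // idle: rate halved to every 2 minutes
  return 1;                           // active: 1 beat per minute
}

console.log(heartbeatIntervalMinutes(0));  // 1
console.log(heartbeatIntervalMinutes(7));  // 2
console.log(heartbeatIntervalMinutes(20)); // 4
console.log(heartbeatIntervalMinutes(90)); // null
```

In the browser you would reschedule each beat with setTimeout using this function, resetting the idle clock on any user activity.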
It's a lot of JavaScript to do, but probably worth it if you are having trouble with traffic and CPU usage :-)