pubsub streaming pull nack vs no acknowledge behaviour - google-cloud-pubsub

nack() has the following behaviour:
nack()
"""Decline to acknowledge the given message.
This will cause the message to be re-delivered to the subscription."""
Now, using streaming pull, I am pulling taxi-ride streaming data and I tested the following behavior.
With nack():
The streaming pull continues to receive messages that were previously nacked.
With neither nack() nor ack():
The streaming pull reads an initial batch of messages and then waits for a long time. I waited for almost 15 minutes but it didn't pull any new messages.
My question is: in streaming pull, when a message is neither acked nor nacked, what is the expected behavior and what is the right way to process these messages?
Let's say, for example, that my processing requirement is to count the backlog messages every minute.

When a message is neither acked nor nacked, the Cloud Pub/Sub client library maintains the lease on the message for up to the maxAckExtensionPeriod. Once that time period has passed, the message will be nacked and redelivered. The reason you are not getting any more messages when you neither ack nor nack is likely because you are running into the values specified in the flowControlSettings, which limits the number of messages that can be outstanding and not yet acked or nacked.
It is generally an anti-pattern to neither ack nor nack messages. If you successfully process the message, you should ack it. If you are unable to process it (say some type of transient exception occurs) then you should nack it. Trying to receive messages without performing one of these actions isn't really going to be an effective way to count the number of messages in the backlog.

Related

Problem receiving multicast traffic from several groups on one socket

I am working on an application in C that listens to several multicast groups on a single socket. I am disabling the socket option IP_MULTICAST_ALL.
The socket receives traffic from 20 different multicast groups, and this traffic arrives at the socket in bursts. One of those groups publishes only one message per second, and no problem has been observed there.
I also have a reliable protocol (RUDP) on top of these multicast feeds. If a listener misses a message, the protocol tries to recover it by talking to the source via messages, and a retransmission is then performed over the same channel as usual.
The problem appears when message bursts arrive at the socket and the RUDP protocol forces the retransmission of those messages. The messages arrive without problem, but if the burst groups then stop sending new data because they have no more traffic to send, sometimes (it is pretty easy to reproduce) the socket does not read the pending incoming messages from those groups when a periodic message arrives from the different group (the one with tiny, periodic traffic).
The situation up to here is: there are many incoming messages, sent earlier, still pending to be read by the application (no more data is sent to these groups), plus periodic messages arriving from the other group that periodically sends a few messages.
What I have seen is that the application reads one message from the group that periodically sends a few messages, and then a batch of messages from the other (burst) groups. The socket is configured as non-blocking, and I get EAGAIN every time a batch of messages has been read from the socket; then there is no more data to read until the socket gets a new message from the periodic group, at which point that message is read along with a batch of the other pending messages from the other groups (the application only reads from one single socket). I made sure the other groups do not produce more data, because I tested this by stopping the other processes from sending; all the pending messages from these groups have already been sent.
The most surprising fact is that if I prevent the process that writes to the periodic group from sending more messages, then the listener socket magically gets all the pending traffic from the groups that published a burst of messages before. It is as if the traffic from the periodic group somehow stalls the processing of the traffic from the groups that no longer publish new data, even though the buffers are full of it.
At first I thought it was related to IGMP or the polling mechanism (my application can do either active waiting or blocking waiting). The blocking wait is implemented with the non-blocking socket: if errno is set to EAGAIN, then the app waits on a poll for new messages. I get the same behavior in both situations.
I don't think it is IGMP, because IGMP snooping is off in the switches and because I can reproduce the same behavior using a single computer's loopback for all of this communication between processes.
I can also reproduce this behavior using kernel-bypass technologies (not using the kernel API to deal with the network), so it does not seem related to the TCP/IP stack. With kernel bypass I have the same model: one message interface that gets all the traffic from all the groups. In this scenario all the processes use this mechanism to communicate, not a mix of several TCP/IP and several kernel-bypass processes; the model is homogeneous.
How can it be that I only receive batches of messages (but not all of them) while I am receiving live traffic from several groups, but then I receive all the pending traffic as soon as I stop the periodic traffic that arrives from a different multicast group? That periodic group sends only one message per second, and the burst groups do not publish any more because all their messages have already been published.
Does someone have an idea of what I should check next?
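For reference, a minimal sketch (not the asker's actual code) of the receive pattern described above: one non-blocking socket with IP_MULTICAST_ALL disabled (a Linux-specific option), several group joins, and a loop that drains the socket until EAGAIN and then blocks in poll(). The group addresses and port are placeholders.

/* Sketch of the described setup: one non-blocking socket, several group
 * memberships, IP_MULTICAST_ALL disabled, drain-until-EAGAIN + poll() loop.
 * Group addresses and the port are placeholders. */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int off = 0, on = 1;

    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    /* Only deliver datagrams for groups this socket has explicitly joined. */
    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &off, sizeof(off));

    struct sockaddr_in local = {0};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(15000);                        /* placeholder port */
    bind(fd, (struct sockaddr *)&local, sizeof(local));

    const char *groups[] = { "239.1.1.1", "239.1.1.2" };  /* placeholders */
    for (int i = 0; i < 2; i++) {
        struct ip_mreq mreq = {0};
        inet_pton(AF_INET, groups[i], &mreq.imr_multiaddr);
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
    }

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    char buf[2048];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n >= 0) {
            /* process message */
            continue;                      /* keep draining the socket */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            poll(&pfd, 1, -1);             /* block until more data arrives */
        else
            break;                         /* real error */
    }
    close(fd);
    return 0;
}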

zmq - what happens to pull queue when process dies?

So I have 8 workers (PULL sockets) that feed from a single bound PUSH socket. They deal with a huge amount of data per second and randomly crash sometimes. Obviously, I should try to get a handle on these crashes, but I'm curious how resilient this system is currently.
I've noticed that the worker processes sometimes balloon in memory usage during periods of high activity (it's not a leak though because it goes back down and this is a C program with no garbage collection) leading me to believe that the zmq PULL socket queue is filling up as the worker sorts through all the back logged messages.
What happens if the process dies while it is in this state? Are the messages also queued in the PUSH socket or are they lost?
AFAIK, yes: if the process that has a PULL socket open dies, then any messages in the receive-side queue that had not yet been read by the application just disappear.
Also, yes, you will see some memory usage increase if the PULL sockets can't keep up with the PUSHers. Basically, the messages start piling up in the queue of the PULL socket on the client's side.
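To make the queueing location concrete, here is a minimal PULL worker sketch (the endpoint is a placeholder, not the asker's code); anything already sitting in this process's receive queue at crash time is lost.

/* Minimal PULL worker sketch (placeholder endpoint). Messages that libzmq has
 * already queued inside this process are lost if the process crashes before
 * zmq_recv() returns them to the application. */
#include <zmq.h>

int main(void) {
    void *ctx = zmq_ctx_new();
    void *pull = zmq_socket(ctx, ZMQ_PULL);
    zmq_connect(pull, "tcp://127.0.0.1:5557");           /* placeholder */

    char buf[4096];
    for (;;) {
        int n = zmq_recv(pull, buf, sizeof(buf), 0);     /* next message */
        if (n == -1)
            break;
        /* process the message; a crash here loses everything still queued */
    }

    zmq_close(pull);
    zmq_ctx_term(ctx);
    return 0;
}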

ZeroMQ "lazy pirate pattern" fairly servicing multiple clients

I need an architecture for a single server reliably servicing multiple clients, with clients responding to an unresponsive server similarly to the lazy pirate pattern from the 0MQ guide (i.e., they use zmq_poll to poll for replies; if the timeout elapses, disconnect and reconnect the client socket and resend the request).
I took the "lazy pirate pattern" as a starting point, from the ZMQ C language examples directory (lpclient.c and lpserver.c). I removed the simulated failure stuff from lpserver.c so that it would run normally without simulating crashes, as follows:
Server has a simple loop:
Read next message from the socket
Do some simulated work (1 second sleep)
Reply that it has serviced the request
Client has a simple loop (see the sketch after this list):
Send request to server
Run zmq_poll to check for a response, with some set timeout value
If the timeout has elapsed, disconnect and reconnect to reset the connection, and resend the request at the start of the next iteration of the loop
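A minimal sketch of that client loop in plain libzmq C (the endpoint, timeout, and retry count are placeholders; the guide's lpclient.c differs in details but follows the same logic):

/* Lazy-pirate style client sketch: send, poll with a timeout, and on timeout
 * tear the REQ socket down and reconnect before retrying. Endpoint, timeout
 * and retry count are placeholders. */
#include <zmq.h>
#include <stdio.h>
#include <string.h>

#define ENDPOINT    "tcp://127.0.0.1:5555"   /* placeholder */
#define TIMEOUT_MS  2500
#define MAX_RETRIES 3

static void *client_socket(void *ctx) {
    void *sock = zmq_socket(ctx, ZMQ_REQ);
    int linger = 0;
    zmq_setsockopt(sock, ZMQ_LINGER, &linger, sizeof(linger));
    zmq_connect(sock, ENDPOINT);
    return sock;
}

int main(void) {
    void *ctx = zmq_ctx_new();
    void *sock = client_socket(ctx);
    const char *request = "hello";

    for (int retries = 0; retries < MAX_RETRIES; retries++) {
        zmq_send(sock, request, strlen(request), 0);

        zmq_pollitem_t items[] = { { sock, 0, ZMQ_POLLIN, 0 } };
        zmq_poll(items, 1, TIMEOUT_MS);

        if (items[0].revents & ZMQ_POLLIN) {
            char reply[256];
            int n = zmq_recv(sock, reply, sizeof(reply) - 1, 0);
            if (n >= 0) {
                reply[n] = '\0';
                printf("got reply: %s\n", reply);
                break;                       /* success */
            }
        }

        /* Timeout: the REQ socket is now stuck in its send/receive cycle,
         * so discard it, reconnect, and resend on the next iteration. */
        zmq_close(sock);
        sock = client_socket(ctx);
    }

    zmq_close(sock);
    zmq_ctx_term(ctx);
    return 0;
}

Setting ZMQ_LINGER to 0 matters here: it lets the discarded socket be closed immediately instead of waiting to flush the request that timed out.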
This worked great for one or two clients. I then tried to service 20 clients by running them like:
$ ./lpserver &
$ for i in {1..20}
do
./lpclient &
done
The behaviour I get is:
Clients all send their requests and begin polling for replies.
Server does one second work on first message it gets, then replies
First client gets its response back and sends a new request
Server does one second work on second message it gets, then replies
Second client gets its response back and sends a new request
Server receives third client's request, but third client times out before work completes (2.5 second timeout, server work period is 1 second, so on the third request clients start dropping out).
Multiple clients (fourth through Nth) timeout and resend their requests.
Server keeps processing the defunct requests from the incoming message queue and doing work, which hogs the server and causes all clients to eventually time out, since it takes 20 seconds to get through each round of a queue full of defunct messages.
Eventually all the clients are dead and the server is still spitting out responses to defunct connections. This is terrible because the server keeps responding to requests the client has given up on (and therefore shouldn't expect that the work has been done), and spending all this time servicing dead requests guarantees that all future client requests will time out.
This example was presented as a way to handle multiple clients and a single server, but it simply doesn't work (I mean, if you did very quick work and had a long timeout, you would have some illusion of reliability, but it's pretty easy to envision this catastrophic collapse rearing its head under this design).
So what's a good alternative? Yes, I could shorten the time required to do work (spinning off worker threads if needed) and increase the timeout period, but this doesn't really address the core shortcoming - just reduces its likelihood - which isn't a solution.
I just need a simple request / reply pattern that handles multiple clients and a single server that processes requests serially, in the order they're received, but in which clients can time-out reliably in the event that the server is taking too long and the server doesn't waste resources responding to defunct requests.

MQSUB ended with reason code 2429 in pub sub

I am using IBM WebSphere MQ to set up a durable subscription for Pub/Sub. I am using their C APIs. I have set up a subscription name and have MQSO_RESUME in my options.
When I set a wait interval for my subscriber and I properly close my subscriber, it works fine and restarts fine.
But if I force crash my subscriber (Ctrl-C) and I try to re open it, I get a MQSUB ended with reason code 2429 which is MQRC_SUBSCRIPTION_IN_USE.
I use MQWI_UNLIMITED as my WaitInterval in my MQGET and use MQGMO_WAIT | MQGMO_NO_SYNCPOINT | MQGMO_CONVERT as my MQGET options
This error pops up only when the topic has no pending messages for that subscription. If it has pending messages that the subscription can resume, then it resumes, but it ignores the first published message in that topic.
I tried changing the heartbeat interval to 2 seconds and that didn't fix it.
How do I prevent this?
This happens because the queue manager has not yet detected that your application has lost its connection to the queue manager. You can see this by issuing the following MQSC command:-
DISPLAY CONN(*) TYPE(ALL) ALL WHERE(APPLTYPE EQ USER)
and you will see your application still listed as connected. As soon as the queue manager notices that your process has gone you will be able to resume the subscription again. You don't say whether your connection is a locally bound connection or a client connection, but there are some tricks to help speed up the detection of connections depending on the type of connection.
You say that in the cases where you are able to resume, you don't get the first message. This is because you are retrieving these messages with MQGMO_NO_SYNCPOINT, and so the message you are not getting had already been removed from the queue and was on its way down the socket to the client application at the time you forcibly crashed it, so that message is gone. If you use MQGMO_SYNCPOINT (and MQCMIT) you will not have that issue.
You say that you don't see the problem when there are still messages on the queue to be processed, only when the queue is empty. I suspect the difference here is whether your application is in an MQGET wait or processing a message when you forcibly crash it. Clearly, when there are no messages left on the queue, you are guaranteed, with the use of MQWI_UNLIMITED, to be in the MQGET wait, but when processing messages you probably spend more time out of the MQGET than in it.
You mention tuning down the heartbeat interval to try to reduce the time frame; this was a good idea, but you said it didn't work. Please remember that you have to change it at both ends of the channel, or you will still be using the default of 5 minutes.
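For illustration, a minimal sketch of the syncpoint approach described above, with connection and subscription setup omitted (hConn and hObj are assumed to come from MQCONN and MQSUB as in the question):

/* Sketch: retrieve under syncpoint and commit only after processing, so a
 * message in flight when the process crashes is rolled back and redelivered
 * rather than lost. hConn and hObj are assumed to come from MQCONN and MQSUB. */
#include <cmqc.h>

void consume_one(MQHCONN hConn, MQHOBJ hObj)
{
    MQMD   md  = {MQMD_DEFAULT};
    MQGMO  gmo = {MQGMO_DEFAULT};
    MQLONG compCode, reason, dataLength;
    char   buffer[4096];

    gmo.Options      = MQGMO_WAIT | MQGMO_SYNCPOINT | MQGMO_CONVERT;
    gmo.WaitInterval = MQWI_UNLIMITED;

    MQGET(hConn, hObj, &md, &gmo, sizeof(buffer), buffer,
          &dataLength, &compCode, &reason);

    if (compCode != MQCC_FAILED) {
        /* ... process the message ... */

        /* Commit the unit of work; until this point a crash would cause the
         * message to be rolled back by the queue manager. */
        MQCMIT(hConn, &compCode, &reason);
    }
}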

Measure the delay between data sent by udp server and received by the client

I want to check my UDP server/client application. One of the features I want to check is the time delay between data being sent by the server and received by the client, and vice versa.
I figured out a way: send a message from the server to the client and note the time. When the client receives this message, it sends the same message back to the server. The server gets this echoed message back and again notes the time. The difference between the time at which the message was sent and the time at which the echoed message is received back tells me the delay between data sent by the server and received by the client.
Is this approach correct?
Because I also foresee a lot of other delays involved using this approach. What could be a possible way to calculate more accurate delays?
Waiting for help.
Yes, this is the most traditional way of doing it; you can do this.
You can also check with a sniffer, using the relative time between the sender's UDP packet and the receiver's UDP packet. For more accurate results you would have to go deeper into the Windows stack, where it checks whether a UDP packet has been received or not. For the time measurement you can use a real-time clock, which gives resolution down to microseconds. Also, keep in mind that you are using UDP, which has a high chance of packets getting lost, unlike TCP, which is much more reliable.
What stack are you using? LwIP?
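For what it's worth, a minimal sketch of the echo approach from the question, on the server side, using a monotonic clock. The client address and port are placeholders, and halving the round trip only approximates the one-way delay if both directions are roughly symmetric.

/* Sketch of the echo-based measurement on the server side: send a datagram,
 * wait for the client to echo it back, and time the round trip with a
 * monotonic clock. Address and port are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in client = {0};
    client.sin_family = AF_INET;
    client.sin_port = htons(9000);                          /* placeholder */
    inet_pton(AF_INET, "192.168.1.10", &client.sin_addr);   /* placeholder */

    const char probe[] = "ping";
    char echo[sizeof(probe)];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);                    /* send time */
    sendto(fd, probe, sizeof(probe), 0,
           (struct sockaddr *)&client, sizeof(client));

    /* Block until the client echoes the same payload back. */
    recvfrom(fd, echo, sizeof(echo), 0, NULL, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);                    /* receive time */

    double rtt_us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("round trip: %.1f us, approx one-way: %.1f us\n",
           rtt_us, rtt_us / 2.0);

    close(fd);
    return 0;
}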
