So I have 8 workers (PULL sockets) that feed from a single bound PUSH socket. They deal with a huge amount of data per second and randomly crash sometimes. Obviously, I should try to get a handle on these crashes, but I'm curious how resilient this system is currently.
I've noticed that the worker processes sometimes balloon in memory usage during periods of high activity (it's not a leak, because it goes back down, and this is a C program with no garbage collection), leading me to believe that the zmq PULL socket queue is filling up as the worker sorts through all the backlogged messages.
What happens if the process dies while it is in this state? Are the messages also queued in the PUSH socket or are they lost?
AFAIK, yes, if the process that has a PULL socket open dies, then any messages in the receiver side queue that had not been received in the callback just disappear.
Also, yes, you will see some memory usage increase if the PULL sockets can't keep up with the PUSHers. Basically, the messages start piling up in the queue of the PULL socket on the client's side.
Related
I would like to know what the execution pattern of multiple threads should be in a server that implements the TCP request-response cycle for high performance (for example, handling dozens of packets with a single system call, or none, on Linux using Packet MMAP or some other way).
Design 1) For simplicity, start two threads in main at the start of the server program. One thread just reads packets directly from the network interface(s), such as wlan0/eth0, using a while loop with poll() on Linux. Once a number of packets has been read in one cycle, it wakes up the other thread with a condition variable signal call. After waking up, the other thread (the sender) processes the packets and sends the TCP responses.
Design 2) Start the receiver thread at the start of the main program. The receiver thread reads packets from the interfaces using a while loop and poll(). When a number of packets has been received, it creates a sender thread and passes it the batch of packets received in that cycle as a parameter. The sender thread processes the packets and sends the TCP responses.
(I think Design 2 will be easier to implement, but the question is whether this approach has any design or performance issues.) The buffer passed from the receiver thread to the sender thread needs to be allocated before receiving packets, so I know the size of the buffer to allocate. Also, in this execution pattern I am creating a new thread each time (it returns and ends execution after processing the packets and sending the TCP responses). I would like to know what performance issues this approach will have, since I am creating a new thread every time I get a batch of packets from the interfaces.
In the first approach I never create more than two threads (or some limited number of threads, which can be tracked easily for logging and debugging, since I know how many threads were created initially). In the second approach I don't know how many threads are hanging around and executing concurrently.
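To make Design 1 concrete, here is a minimal sketch of the handoff between the receiver and one long-lived sender thread using a condition variable. It is written in Python only for brevity; the names BatchHandoff, sender_loop and respond are made up for illustration, and real packet capture (poll(), Packet MMAP) is left out entirely.

```python
import threading
from collections import deque


class BatchHandoff:
    """Hands packet batches from a receiver to one long-lived sender thread."""

    def __init__(self):
        self._cond = threading.Condition()
        self._batches = deque()
        self._closed = False

    def put(self, batch):
        # Receiver side: queue a batch and wake the sender.
        with self._cond:
            self._batches.append(batch)
            self._cond.notify()

    def close(self):
        # Tell the sender to exit once the queued batches are drained.
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self):
        # Sender side: sleep until a batch arrives; None means "shut down".
        with self._cond:
            while not self._batches and not self._closed:
                self._cond.wait()
            if self._batches:
                return self._batches.popleft()
            return None


def sender_loop(handoff, respond):
    # respond() is a hypothetical callback that builds and sends one reply.
    while True:
        batch = handoff.get()
        if batch is None:
            break
        for packet in batch:
            respond(packet)


def run_demo():
    handoff = BatchHandoff()
    sent = []
    sender = threading.Thread(target=sender_loop, args=(handoff, sent.append))
    sender.start()
    handoff.put([b"p1", b"p2"])  # the receiver would do this after each poll() cycle
    handoff.put([b"p3"])
    handoff.close()
    sender.join()
    return sent
```

The thread count stays fixed, so there is no per-batch thread creation cost and it is always known which threads exist. The price is that batches queue up if the single sender falls behind.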
I need any advice on how real websites like YouTube or others may have handled this in their high-performance servers, if they followed this way of implementing their front-facing servers.
First, when it comes to a 'real' website, the magic lies in having load balancers and a whole bunch of worker nodes to take the load, and you easily exceed the boundary of a single system. For example, take a look at the following AWS reference architecture for serving web pages at scale: AWS Cloud Architecture for serving web whitepaper.
That being said, taking this one level down, it is always interesting to look at how other well-known products have solved this issue. For example, NGINX has an excellent infographic available and a matching blog post describing their architecture and threading.
I'm trying to architect the main event handling of a libuv-based application. The application is composed of one (or more) UDP receivers sharing a socket, whose job is to delegate the processing of incoming messages to a common worker pool.
As the protocol handled is stateful, all packets coming from any given server should always be directed to the same worker – this constraint seems to make using the LibUV built-in worker pool impossible.
The workers should be able to send packets themselves.
As such, and as I am new to LibUV, I wanted to share with you the intended architecture, in order to get feedback and best practices about it.
– Each worker runs its very own LibUV loop, allowing it to send packets directly over the network. Additionally, each worker has a dedicated concurrent queue through which messages are sent to it.
– When a packet is received, its source address is hashed to select the corresponding worker from the pool.
– The receiver creates a unique async handle on the receiver loop, to act as a callback when processing has finished.
– The receiver notifies the worker with an async handle that a new message is available, which wakes up the worker, which then starts processing all enqueued messages.
– The worker thread calls the async handle on the receiver queue, which will cause the receiver to return the buffer to pool and free all allocated resources (as such, the pool does not need to be thread-safe).
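The worker selection step described above (hash the source address, pick a worker) can be sketched independently of libuv. This is only an illustration in Python; worker_for is a made-up name, and hashing both host and port is an assumption. If one server may send from several ports, hash the host alone.

```python
import zlib


def worker_for(source, num_workers):
    """Map a (host, port) source address to a stable worker index."""
    host, port = source
    # CRC32 is used only because it is deterministic across runs and
    # processes; any stable hash function would do here.
    key = f"{host}:{port}".encode("utf-8")
    return zlib.crc32(key) % num_workers


def dispatch(packets, num_workers):
    """Group (source, payload) pairs into one list per worker."""
    per_worker = [[] for _ in range(num_workers)]
    for source, payload in packets:
        per_worker[worker_for(source, num_workers)].append(payload)
    return per_worker
```

Every packet from the same source lands in the same worker's list, in arrival order, which is what the stateful protocol requires. Note that changing the number of workers remaps most sources, so the pool size should stay fixed while state is live.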
The main questions I have would be:
– What is the overhead of creating an async handle for each received message? Is it a good design?
– Is there any built-in way to send a message to another event loop?
– Would it be better to send outgoing packets using another loop, instead of doing it right from the worker loop?
Thanks.
Context: I'm developing a client-server application that is fairly solid most of the time, despite frequent network problems, outages, broken pipes, and so on. I use non-blocking sockets, select(), and OpenSSL to deliver messages between one or more nodes in a cluster, contingent on application-level heartbeats. Messages are queued and not removed from the queue until the entire message has been transferred and all the SSL_write()s return successfully. I maintain two sockets for each relationship, one incoming and one outgoing. I do this for a reason, and that's because it's much easier to detect a failed connection (very frequent) on a write than it is on a read. If a client is connecting, and I already have a connection, I replace it. Basically, the client performing the write is responsible for detecting errors and initiating a new connection (which will then replace the existing (dead) read connection on the server). This has worked well for me with one exception.
Alas, I'm losing messages. 99.9% of the time, the messages go through fine. But every now and then, I'll send, and I have no errors detected on either side for a few minutes... and then I'll get an error on the socket. The problem is that SSL_write has already returned successfully.
Let me guess: if I was blocking this would be fine, but since I'm non-blocking, I don't wait for the read on my remote end. As long as my TCP buffer can fit more, I keep stuffing things in the pipe. And when my socket goes poof, I lose anything in that buffer yet to be delivered?
How can I deal with this? Are application-level acks really necessary? (I'd rather not travel down the long road of complicated lost-acks and duplicate message complexity) Is there an elegant way to know what message I've lost? Or is there a way I can delay removal from my queue until I know it has been delivered? (Without an ack, how?)
Thanks for any help in advance.
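For reference, the "delay removal from my queue until I know it has been delivered" idea can be sketched with sequence numbers and cumulative acks. This is not the poster's code, just a minimal Python illustration; PendingQueue and Receiver are hypothetical names, and it assumes messages are resent in order over a new connection after a failure.

```python
from collections import OrderedDict


class PendingQueue:
    """Sender side: keep every message until the peer acknowledges it."""

    def __init__(self):
        self._next_seq = 0
        self._pending = OrderedDict()  # seq -> payload, oldest first

    def enqueue(self, payload):
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = payload
        return seq

    def ack(self, seq):
        # Cumulative ack: everything up to and including seq was received.
        for s in list(self._pending):
            if s > seq:
                break
            del self._pending[s]

    def unacked(self):
        # What has to be resent after a reconnect, in order.
        return list(self._pending.items())


class Receiver:
    """Receiver side: deliver each sequence number once, in order."""

    def __init__(self):
        self._expected = 0
        self.delivered = []

    def receive(self, seq, payload):
        if seq == self._expected:
            self.delivered.append(payload)
            self._expected += 1
        # Duplicates (seq < expected) are dropped; the return value is the
        # highest in-order sequence number seen, or -1 if none yet.
        return self._expected - 1
```

With this shape a lost ack only causes a resend that the receiver discards, so a successful SSL_write() no longer has to be treated as proof of delivery.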
I am developing a Windows application for client-server communication using UDP, but since UDP is connectionless, whenever a client goes down, the server does not know that the client is off and keeps sending data. The same applies when the server is down.
How can I handle this condition, so that whenever either the client or the server is down, the other party knows about it and can handle it?
Waiting for reply.
What you are asking is beyond the scope of UDP. You'd need to implement your own protocol, over UDP, to achieve this.
One simple idea could be to periodically send keepalive messages (TCP on the other hand has this feature).
You can have a simple implementation as follows:
Have a background thread keep sending those messages and waiting for replies.
Upon receiving replies, you can populate some sort of data structure or a file with a list of alive devices.
Your other main thread (or threads) can have the following changes:
Before sending any data, check if the client you're going to send to is present in that file/data structure.
If not, skip this client.
Repeat the above for all remaining clients in the populated file/data structure.
One problem I can see in the above implementation is analogous to the RAW (read-after-write) hazard from the main thread's perspective.
Use the following analogy instead of the mentioned example for the RAW hazard:
i1 = Your background thread which sends the keepalive messages.
i2 = Your main thread (or threads) which send/receive data and do your other tasks.
The RAW hazard here would be when i2 tries to read the data structure/file which is populated by i1 before i1 has updated it.
This means (worst case), i2 will not get the updated list and it can miss out a few clients this way.
If this loss would be critical, I suggest having some sort of mechanism whereby i1 signals i2 when it completes any ongoing write.
If this loss is not critical, then you can skip the above mechanism to make your program faster.
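As a rough illustration of the shared data structure, here is a small in-memory table guarded by a mutex. Note that this swaps the signalling mechanism for a plain lock plus timestamps: i2 never sees a half-written table, and a client that stopped replying drops out once its entry is older than max_age. The class name AliveTable and the max_age parameter are made up for this sketch.

```python
import threading
import time


class AliveTable:
    """Thread-safe record of when each peer last answered a keepalive."""

    def __init__(self, max_age, clock=time.monotonic):
        self._lock = threading.Lock()
        self._last_seen = {}       # peer -> timestamp of the last reply
        self._max_age = max_age    # seconds a reply stays valid
        self._clock = clock        # injectable so the logic can be tested

    def mark_alive(self, peer):
        # Called by the keepalive thread (i1) for every reply it receives.
        with self._lock:
            self._last_seen[peer] = self._clock()

    def is_alive(self, peer):
        # Called by the main thread (i2) before sending to a peer.
        with self._lock:
            seen = self._last_seen.get(peer)
            return seen is not None and self._clock() - seen <= self._max_age

    def alive_peers(self):
        with self._lock:
            now = self._clock()
            return sorted(p for p, t in self._last_seen.items()
                          if now - t <= self._max_age)
```

The main thread would call is_alive(client) before each send and skip the client when it returns False.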
Explanation for Keepalive Messages:
You just need to send a very lightweight message (usually it has no data, just the header information). Make sure this message is unique. You do not want another message being interpreted as a keepalive message.
You can send this message using a sendto() call to a broadcast address. After you finish sending, wait for replies for a certain timeout using recvfrom(), which also gives you the address of each sender.
Log every reply in a data structure/file. After the timeout expires, have the thread go to sleep for some time. When that time expires, repeat the above process.
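One round of that loop might look like the sketch below. A few assumptions: it is Python rather than C, it sends to an explicit list of peer addresses instead of a broadcast address so it can be tried on loopback, and the KEEPALIVE payload and function names are invented for the example.

```python
import socket
import threading
import time

KEEPALIVE = b"\x00KEEPALIVE\x00"  # hypothetical marker, must be unique


def keepalive_round(sock, peers, timeout):
    """Send one keepalive to each peer; return the addresses that replied."""
    for peer in peers:
        sock.sendto(KEEPALIVE, peer)
    alive = set()
    deadline = time.monotonic() + timeout
    while len(alive) < len(peers):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            data, addr = sock.recvfrom(64)
        except socket.timeout:
            break
        except ConnectionError:
            continue  # some platforms report an unreachable peer this way
        if data == KEEPALIVE and addr in peers:
            alive.add(addr)
    return alive


def echo_once(sock):
    # Stand-in for a live client: answer a single keepalive.
    data, addr = sock.recvfrom(64)
    sock.sendto(data, addr)


def demo():
    socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(3)]
    live, silent, prober = socks
    for s in socks:
        s.bind(("127.0.0.1", 0))
    live.settimeout(2.0)
    responder = threading.Thread(target=echo_once, args=(live,))
    responder.start()
    try:
        peers = [live.getsockname(), silent.getsockname()]
        return keepalive_round(prober, peers, 0.5), live.getsockname()
    finally:
        responder.join()
        for s in socks:
            s.close()
```

In the demo, the peer that never answers is simply missing from the returned set after the timeout. The background thread would sleep and call keepalive_round() again, logging the result each time.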
To help you get started writing good, robust networking code, please go through Beej's Guide to Network Programming. It is absolutely wonderful. It explains many concepts.
I have been working on a customization around UDP to make it reliable. I have this design problem which I realized only after my entire program was ready and I started sending packets from source to sink.
Scenario:
I created a single thread for reception of packets. The parent does the packet sending job. Since this is just a POC, I have kept the buffer and common data structures as global pointers, for which memory is allocated on the heap by the parent. I am protecting critical sections with a mutex.
As part of reliability I send across some control packets apart from data packets. At any time, the client will send data packets and receive control packets from the server, whereas the server will receive data packets and send out control packets. I have used a single socket, as my understanding is that send and recv can work simultaneously on a single socket, which is blocking by default.
Problem:
For test purposes, I send 100 packets from source to sink. Unfortunately, the thread on the server side keeps busy receiving packets and storing them in the buffer. The server code doesn't deliver packets to the application until the parent thread gets a context switch. This adds unacceptable delay to the overall communication.
Please help me understand what the issue is and what can be changed to improve the performance.
Thanks in advance, Kedar
Since you're using a mutex, when the mutex is released on one thread after the packets are sent, then the other thread should consume the packet. Perhaps you are not releasing the mutex soon enough.
Alternately, let select() on the socket handle the unblock-on-receive for you.
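A minimal sketch of the select() idea, in Python for brevity (the C select() call has the same shape). recv_when_ready is a made-up helper name, and the 2048-byte buffer size is arbitrary.

```python
import select
import socket


def recv_when_ready(sock, timeout):
    """Wait up to timeout seconds for a datagram; return None if none came."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recvfrom(2048)


def demo():
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    try:
        nothing = recv_when_ready(rx, 0.05)   # no packet yet: times out
        tx.sendto(b"ctrl", rx.getsockname())
        got = recv_when_ready(rx, 1.0)        # wakes as soon as data arrives
        payload = got[0] if got else None
        return nothing, payload
    finally:
        rx.close()
        tx.close()
```

With this shape the thread only holds the shared buffer's mutex for the short time it takes to store one packet, instead of sitting inside a blocking receive loop.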