Execution Pattern of a Multi-Threaded Server on Linux - C

I would like to know what the execution pattern of a server's multiple threads should be to implement the TCP request-response cycle of a high-performance server (for example, one that reads dozens of packets with a single system call, or none, on Linux using PACKET_MMAP or some other mechanism).
Design 1) For simplicity, start two threads in main at the start of the server program. One thread just reads packets directly from the network interface(s), e.g. wlan0/eth0, in a loop around poll() on Linux. Once a number of packets has been read in one cycle, it wakes the other thread with a condition variable signal; after waking up, the other thread (the sender) processes the packets and sends the TCP responses.
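A minimal sketch of Design 1, assuming hypothetical stubs for the capture and reply steps (poll_and_read_packets and process_and_respond are illustrative names, not real APIs):

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  batch_ready = PTHREAD_COND_INITIALIZER;
    static int batch_count = 0;                 /* packets handed to the sender */

    /* Hypothetical stubs: capture a batch / build and send TCP replies. */
    static int  poll_and_read_packets(void) { return 1; }
    static void process_and_respond(int n)  { (void)n; }

    static void *receiver(void *arg)
    {
        (void)arg;
        for (;;) {
            int n = poll_and_read_packets();    /* one poll() cycle */
            pthread_mutex_lock(&lock);
            batch_count = n;
            pthread_cond_signal(&batch_ready);  /* wake the sender */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    static void *sender(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (batch_count == 0)            /* guards against spurious wakeups */
                pthread_cond_wait(&batch_ready, &lock);
            int n = batch_count;
            batch_count = 0;
            pthread_mutex_unlock(&lock);
            process_and_respond(n);             /* process batch, send responses */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t rx, tx;
        pthread_create(&rx, NULL, receiver, NULL);
        pthread_create(&tx, NULL, sender, NULL);
        pthread_join(rx, NULL);                 /* both threads run until killed */
        pthread_join(tx, NULL);
        return 0;
    }

Note that if the receiver finishes another cycle before the sender has drained the previous batch, batch_count here is simply overwritten; a real implementation would double-buffer or queue the batches.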
Design 2) Start only the receiver thread at the start of the main program. The packet receiver thread reads packets from the interfaces in a loop around poll(). When a number of packets has been received, it creates a sender thread and passes the number of packets received in that cycle to the sender as a parameter. The sender thread processes the packets and responds with the TCP responses.
(I think Design 2 will be easier to implement, but is there any design issue or possible performance issue with this approach? That is the question.) The buffer passed from the receiver thread to the sender thread needs to be allocated before the packets are received, so I know the size of the buffer to allocate. In this execution pattern I am also creating a new thread each time (which returns and ends execution after processing the packets and sending the TCP responses), so I would like to know what the performance cost will be of creating a new thread for every batch of packets I get from the interfaces.
With the first approach I am not creating more than two threads (or some limited number of threads), and the threads can be tracked easily for logging and debugging, since I know how many threads were created initially. With the second approach I don't know how many threads are hanging around and executing concurrently.
I would appreciate any advice on how real websites like YouTube and others may have handled this in their high-performance servers, if they followed this way of implementing their front-facing servers.

First, when you go to a 'real' website, the magic lies in load balancers and a whole fleet of worker nodes taking the load; you quickly exceed the boundary of a single system. For an example, take a look at the following AWS reference architecture for serving web pages at scale: the AWS Cloud Architecture for serving web whitepaper.
That being said, taking this one level down, it is always interesting to look at how other well-known products have solved the issue. For example, NGINX has an excellent infographic and a matching blog post describing its architecture and threading.

Related

C/C++ code using pthreads to execute sync and async communications

I am using a Linux machine to communicate with a PLC. The PLC and the Linux machine are connected within a local network and use UDP/IP as the base protocol. The port number is fixed on both sides.
Such a communication needs to achieve:
Requirement 1: The Linux machine can send commands (one command at a time) to the PLC. After each command is received, the PLC will respond to the Linux machine with a success/failure message within 50 ms.
Requirement 2: Vice versa, the PLC can send commands to the Linux machine, and the Linux machine has to respond with a message within 50 ms. The PLC sends asynchronously with respect to the Linux machine, so the Linux machine needs to monitor (listen on) the port continuously.
Simple C/C++ code has been used to test the communication for each requirement separately. It worked, but it used blocking calls.
Here comes the challenging part. I would like to use pthreads for this communication. My solution is simply to create one thread for each requirement. I sketched my idea in the attached picture https://www.dropbox.com/s/vriyrprl7j6tntx/multi-thread%20solution.png?dl=0, with 'thread 0' denoting the main thread, 'thread 1' the Requirement 1 thread, and 'thread 2' the Requirement 2 thread. 'Shared data' indicates data that can be shared by all child threads. 'Thread 1 data' is dedicated to thread 1; other threads will not access it. Likewise, 'thread 2 data' is used only by thread 2.
My concern arises from the fact that two threads will be making system calls on the same port. Hence I need a review of my solution, and it would be awesome to get more working solutions. P.S. I am not too worried about thread synchronization and creation, and it is totally fine with me if they are necessary in your solution.
Thanks in advance.
There is no general problem with two threads executing system calls on the same socket. You may encounter some specific issues, though:
If you call recvfrom() in both threads (one waiting for the PLC to send a request, and the other waiting for the PLC to respond to a command from the server), you don't know which one will receive the response. To get around this, you can dedicate one thread to reading from the PLC and have it pass reply messages from the PLC to the sending thread using a shared queue or similar structure (sketched below).
You have to be careful when you close a socket that could be in use by another thread - because of the way file descriptors are reused, it's easy to have a race condition that ends up with a thread acting on the wrong socket.
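A sketch of the single-reader idea, assuming hypothetical protocol helpers (is_reply_to_command and answer_plc_request are placeholders): one thread owns every recvfrom() on the shared socket and routes command replies to the sending thread through a small mutex-protected queue.

    #include <pthread.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    #define MSG_MAX   512
    #define QUEUE_MAX  16

    struct msg { char buf[MSG_MAX]; ssize_t len; };

    static struct msg      queue[QUEUE_MAX];
    static int             q_head, q_tail;
    static pthread_mutex_t q_lock     = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;

    /* Hypothetical protocol helpers - placeholders, not real APIs. */
    static int  is_reply_to_command(const struct msg *m) { (void)m; return 0; }
    static void answer_plc_request(int sock, const struct msg *m) { (void)sock; (void)m; }

    static void push_reply(const struct msg *m)
    {
        pthread_mutex_lock(&q_lock);
        queue[q_tail] = *m;                      /* sketch: no overflow check */
        q_tail = (q_tail + 1) % QUEUE_MAX;
        pthread_cond_signal(&q_nonempty);        /* wake the command thread */
        pthread_mutex_unlock(&q_lock);
    }

    /* Called by the command thread after it sends a command to the PLC. */
    void wait_for_reply(struct msg *out)
    {
        pthread_mutex_lock(&q_lock);
        while (q_head == q_tail)
            pthread_cond_wait(&q_nonempty, &q_lock);
        *out = queue[q_head];
        q_head = (q_head + 1) % QUEUE_MAX;
        pthread_mutex_unlock(&q_lock);
    }

    /* The only thread that ever calls recvfrom() on the shared socket. */
    void *reader_thread(void *arg)
    {
        int sock = *(int *)arg;
        struct msg m;
        for (;;) {
            m.len = recvfrom(sock, m.buf, sizeof m.buf, 0, NULL, NULL);
            if (m.len < 0)
                continue;                        /* sketch: error handling elided */
            if (is_reply_to_command(&m))
                push_reply(&m);                  /* reply to our own command */
            else
                answer_plc_request(sock, &m);    /* Requirement 2: respond < 50 ms */
        }
        return NULL;
    }

In a real version wait_for_reply() would use pthread_cond_timedwait() so the 50 ms budget can be enforced.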

Handling of multiple UDP streams

Let's assume I have m UDP streams, each uniquely identified by some ID (e.g. an RTP SSRC). I need to process them in n associated threads, and the association is 1:N, i.e. one UDP stream is processed by one or more threads.
What is the difference in the kernel networking stack's performance if I:
Start m UDP servers, each on a different port. Each server processes one stream and pushes its data to one or more associated threads.
Start just one server. All streams are handled through its single port, and this thread pushes each stream's data on to one or more associated threads.
I think it comes down to this question: is it better to open one single port, or many ports where each receives proportionally less data?
Is it possible for a single socket to be overwhelmed by the amount of incoming data? Or does a socket, which is more a logical thing in the Linux kernel than a physical one, have so little work to do compared with the data itself that there is no real difference?
What is the maximum bitrate a single UDP socket (with an enlarged buffer) can handle?
I am sure I could find the answer by browsing the kernel's networking code, but perhaps someone could give the answer straight away. Thank you.
There is no easy answer to this question, because it all boils down to the processing speed of your threads and how you delegate the work among them.
If you think the UDP socket is going to be overwhelmed, you can place a queue right behind it. This queue can grow as large as you allow it to grow; of course, you then use more memory.
What you have then is a producer/consumer paradigm: one thread puts things into the queue, other threads take from it.
If the processing threads are slower than the thread filling the queue, and this keeps up for long enough, your queue is going to overrun anyway.
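To make the paradigm concrete, here is a minimal sketch of such a queue, assuming a bounded ring (all names are illustrative); the dropped counter is where the overrun described above becomes visible:

    #include <pthread.h>
    #include <stddef.h>

    #define Q_CAP 1024

    struct datagram { char data[1500]; size_t len; };

    static struct datagram q[Q_CAP];
    static size_t q_count, q_in, q_out;
    static unsigned long dropped;                /* overrun counter */
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    /* Producer side: called by the thread reading the UDP socket. */
    void enqueue(const struct datagram *d)
    {
        pthread_mutex_lock(&m);
        if (q_count == Q_CAP) {
            dropped++;                           /* consumers too slow: overrun */
        } else {
            q[q_in] = *d;
            q_in = (q_in + 1) % Q_CAP;
            q_count++;
            pthread_cond_signal(&nonempty);
        }
        pthread_mutex_unlock(&m);
    }

    /* Consumer side: called by each processing thread. */
    void dequeue(struct datagram *d)
    {
        pthread_mutex_lock(&m);
        while (q_count == 0)
            pthread_cond_wait(&nonempty, &m);
        *d = q[q_out];
        q_out = (q_out + 1) % Q_CAP;
        q_count--;
        pthread_mutex_unlock(&m);
    }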
There are frameworks dedicated to multimedia processing. You might want to take a look at GStreamer: http://gstreamer.freedesktop.org/documentation/
It has support for RTP and is basically a system for building a pipeline over a data stream, which is exactly what you are doing here.
You will find that GStreamer has ready-made queue elements that let you queue up data at any point in the pipeline, which suggests that something like this is needed when you process at high speeds, though I am not a GStreamer specialist. GStreamer is made of building blocks, so you can experiment with a pipeline and easily add or remove queueing and compare the results for the overall pipeline. It does require some studying to get to know the API. It is written in C.
The more sockets you have, the more socket receive buffers you have, so the more space is available for incoming data.
This suggests that multiple sockets may be the better option.
However datagrams can be lost anywhere, not just at the target host.
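On the 'enlarged buffer' point in the question, a single socket's receive buffer can itself be grown. A minimal sketch, assuming Linux defaults: the kernel silently caps the request at net.core.rmem_max, and per socket(7) it doubles the value internally and reports the doubled value back.

    #include <stdio.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int make_udp_socket_with_big_buffer(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0)
            return -1;

        int want = 4 * 1024 * 1024;              /* ask for 4 MiB */
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &want, sizeof want);

        int got = 0;
        socklen_t len = sizeof got;
        getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &got, &len);
        printf("effective receive buffer: %d bytes\n", got);
        return sock;
    }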

Libevent: multithreading to handle HTTP keep-alive connections

I am writing an HTTP reverse-proxy in C using Libevent and I would like to implement multithreading to make use of all available CPU cores. I had a look at this example: http://roncemer.com/software-development/multi-threaded-libevent-server-example/
In this example it appears that one thread is used for the full duration of a connection, but for HTTP 1.1 I don't think this will be the most effective solution, as connections are kept alive by default after each request so that they can be reused later. I have noticed that even one browser panel can open several connections to one server and keep them open until the tab is closed, which would immediately exhaust the thread pool. For an HTTP 1.1 proxy there will be many open connections, but only very few of them actively transferring data at any given moment.
So I was thinking of an alternative: have one event base for all incoming connections, and have the event callback functions delegate to worker threads. This way we could have many open connections and only use a thread when data arrives on a connection, returning it to the pool once the data has been dealt with.
My question is: is this a suitable implementation of threads with Libevent?
Specifically – is there any need to have one event base per connection as in the example or is one for all connections sufficient?
Also – are there any other issues I should be aware of?
Currently the only problem I can see is burstiness: when data is received in many small chunks, many read events are triggered per HTTP response, which would lead to a lot of handing off to worker threads. Would this be a problem? If so, it could be somewhat mitigated using Libevent's watermarks, although I'm not sure how they behave if a request arrives in two chunks and the second chunk is small enough to leave the buffered size below the watermark. Would it then stay there until more data arrives?
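For reference, a sketch of how the watermark idea looks with one event base for all connections (MIN_CHUNK and the port are illustrative; the worker hand-off is not shown):

    #include <event2/event.h>
    #include <event2/bufferevent.h>
    #include <event2/listener.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>

    #define MIN_CHUNK 4096                       /* illustrative threshold */

    static void read_cb(struct bufferevent *bev, void *ctx)
    {
        /* At least MIN_CHUNK bytes are buffered when this fires; hand the
         * input buffer off to a worker thread here (not shown). */
        (void)bev; (void)ctx;
    }

    static void accept_cb(struct evconnlistener *lst, evutil_socket_t fd,
                          struct sockaddr *addr, int socklen, void *ctx)
    {
        (void)addr; (void)socklen; (void)ctx;
        struct event_base *base = evconnlistener_get_base(lst);
        struct bufferevent *bev =
            bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
        /* Only invoke read_cb once >= MIN_CHUNK bytes are pending. */
        bufferevent_setwatermark(bev, EV_READ, MIN_CHUNK, 0);
        bufferevent_setcb(bev, read_cb, NULL, NULL, NULL);
        bufferevent_enable(bev, EV_READ);
    }

    int main(void)
    {
        struct event_base *base = event_base_new();
        struct sockaddr_in sin = { 0 };
        sin.sin_family = AF_INET;
        sin.sin_port = htons(8080);              /* illustrative port */
        evconnlistener_new_bind(base, accept_cb, NULL,
                                LEV_OPT_CLOSE_ON_FREE | LEV_OPT_REUSEABLE, -1,
                                (struct sockaddr *)&sin, sizeof sin);
        event_base_dispatch(base);               /* one base, all connections */
        return 0;
    }

If I read Libevent's behaviour correctly, data below the read low-watermark simply stays in the bufferevent's input buffer and the read callback is not invoked until more data arrives; a closed connection is still reported through the event callback, so a short final chunk is delayed, not lost.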
Also, I would need to implement scheduling so that a chunk is only sent once the previous chunk has been fully sent.
The second problem I thought of is when the thread pool is exhausted, i.e. all threads are currently doing something, and another read event occurs – this would lead to the read event callback blocking. Does that matter? I thought of putting these events into another queue, but surely that's exactly what happens internally in the event base anyway. On the other hand, a second queue might be a good way to organise the scheduling of chunks without blocking worker threads.

Is threading the best way to handle 40 Clients at a time in UDP Server?

I am working on a UDP server/client application.
I want my server to be able to handle 40 clients at a time. I have thought of creating 40 threads on the server side, each handling one client. Clients are distinguished by IP address, with one thread for each unique IP address.
Whenever a client sends some data to the server, the main thread extracts the client's IP address and decides which thread will process this specific client. Is there a better way to achieve this functionality?
There are different approaches to scalable server applications. One thread per client works well if the number of clients is not large. A more efficient approach is a thread pool: the threads work on tasks, and whenever a new task comes in you assign it to a free worker thread.
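One way to combine the pool with the per-IP dispatch from the question is to hash the sender's address onto a fixed set of workers; a sketch, assuming a hypothetical per-worker queue (worker_queue_push is a placeholder, e.g. a mutex+condvar ring):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    #define N_WORKERS 8
    #define PKT_MAX 1500

    struct task { struct sockaddr_in from; char data[PKT_MAX]; ssize_t len; };

    /* Placeholder: push onto worker i's queue (not a real API). */
    static void worker_queue_push(int i, const struct task *t) { (void)i; (void)t; }

    /* The same source IP always maps to the same worker, so one client's
     * packets are processed in order without a dedicated thread per client. */
    static int pick_worker(const struct sockaddr_in *client)
    {
        return (int)(ntohl(client->sin_addr.s_addr) % N_WORKERS);
    }

    void dispatch_loop(int sock)
    {
        struct task t;
        socklen_t alen;
        for (;;) {
            alen = sizeof t.from;
            t.len = recvfrom(sock, t.data, sizeof t.data, 0,
                             (struct sockaddr *)&t.from, &alen);
            if (t.len < 0)
                continue;                        /* sketch: error handling elided */
            worker_queue_push(pick_worker(&t.from), &t);
        }
    }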
Take a look at this project; I think it is very helpful to start with: http://www.codeproject.com/Articles/16935/A-Chat-Application-Using-Asynchronous-UDP-sockets
With IPAddress.Any, we specify that the server should accept client requests coming on any interface. To use any particular interface, we can use IPAddress.Parse("192.168.1.1") instead of IPAddress.Any. The Bind function then binds the serverSocket to this IP address. The epSender identifies the clients from which the data is coming.
With BeginReceiveFrom, we start receiving the data that will be sent by the client. Note that we pass epSender as the last parameter of BeginReceiveFrom; the AsyncCallback OnReceive gets this object via the AsyncState property of IAsyncResult, and it then processes the client requests (login, logout, and send message to the users). Please see the code attached to understand the implementation of OnReceive.
A better way would be to use the Proactor pattern (take a look at the Boost.Asio library) instead of creating a thread per client. With such an approach your application will have much better scalability and performance (especially on platforms that have native async I/O).
Besides, with this technique the threading is decoupled from the concurrency, meaning that you don't necessarily have to mess with multi-threading and all its complications.

Server Architecture for Embedded Device

I am working on a server application for an embedded ARM platform. The ARM board is connected to various digital IOs, ADCs, etc. that the system will constantly poll. It is currently running a Linux kernel with the hardware interfaces implemented as drivers. The idea is to have a client application which can connect to the embedded device, receive the sensor data as it is updated, and issue commands to the device (shut down sensor 1, restart sensor 2, etc.). Assume that access to the sensor devices is done through typical ioctl calls.
Now my question concerns the design/architecture of this server application running on the embedded device. At first I was thinking of using something like libevent or libev, lightweight C event-handling libraries. The application would prioritize the sensor-polling event (and send the information to the client once polling is done) and process client commands as they are received (over a typical TCP socket). The server would typically have a single connection, but might have up to a dozen or so - not thousands of connections. Is this the best approach to designing something like this? Of the two event-handling libraries I listed, is one better for embedded applications, or are there other alternatives?
The other approach under consideration is a multi-threaded application in which the sensor polling is done in a prioritized/blocking thread that reads the sensor data, and each client connection is handled in a separate thread. The sensor data is written into some sort of buffer/data structure, and the connection threads handle sending the data out to the client and processing client commands (I suppose you would still need an event loop of sorts in those threads to monitor for incoming commands). Are there any libraries or packages that facilitate designing an application like this, or is it something you have to build from scratch?
How would you design what I am trying to accomplish?
I would use a unix domain socket and write the library myself -- I can't see any advantage to using libevent, since the application is tied to Linux, and libevent is also aimed at hundreds of connections. You can do all of what you are trying to do with a single thread in your daemon. KISS.
You don't need a dedicated master thread for priority queues; you just need to write your threads so that they always process high-priority events before anything else.
In terms of libraries, you could possibly benefit from Google's Protocol Buffers (for serialization and for representing your protocol); however, it only has first-class support for C++, and the over-the-wire (serialization) format does a bit of simple bit shifting on numeric data. I doubt it will add any serious overhead. An alternative is ASN.1 (asn1c).
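A minimal sketch of the single-threaded daemon suggested above, assuming a unix domain socket and hypothetical sensor/command helpers (read_sensors_and_update_state and handle_client_command are stand-ins); error and hangup handling are elided:

    #include <poll.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    #define MAX_CLIENTS 16

    /* Hypothetical stubs for the hardware and protocol sides. */
    static void read_sensors_and_update_state(void) { }
    static void handle_client_command(int fd) { (void)fd; }

    int main(void)
    {
        int lst = socket(AF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, "/tmp/sensord.sock", sizeof addr.sun_path - 1);
        unlink(addr.sun_path);
        bind(lst, (struct sockaddr *)&addr, sizeof addr);
        listen(lst, 4);

        struct pollfd fds[1 + MAX_CLIENTS] = { { .fd = lst, .events = POLLIN } };
        int nfds = 1;

        for (;;) {
            poll(fds, nfds, 100);                /* 100 ms tick doubles as the
                                                    sensor polling interval */
            read_sensors_and_update_state();
            if (fds[0].revents & POLLIN) {       /* new client connecting */
                int c = accept(lst, NULL, NULL);
                if (c >= 0 && nfds < 1 + MAX_CLIENTS) {
                    fds[nfds].fd = c;
                    fds[nfds].events = POLLIN;
                    nfds++;
                }
            }
            for (int i = 1; i < nfds; i++)       /* sketch: disconnects elided */
                if (fds[i].revents & POLLIN)
                    handle_client_command(fds[i].fd);
        }
    }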
My suggestion would be a modified form of your second proposal. I would create a server that has two threads: one thread polling the sensors, and another for ALL of your client connections. I have used the boost::asio library on embedded devices (MIPS) with great results.
A single thread that handles all socket connections asynchronously can usually handle the load easily (it depends, of course, on how many clients you have). It would then serve the data it holds in a shared buffer. To reduce the number and complexity of mutexes, I would create two buffers, one 'active' and one 'inactive', plus a flag indicating which buffer is currently active. The polling thread reads data into the inactive buffer; when it has finished and produced a 'consistent' state, it flips the flag, swapping the active and inactive buffers. This can be done atomically and should therefore not require anything more complex.
This would all be very simple to set up, since you would pretty much have only two threads that know nothing about each other.
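A sketch of the double-buffer flip using C11 atomics (all names illustrative; the snapshot layout is an assumption):

    #include <stdatomic.h>

    #define N_SENSORS 32

    struct snapshot { double values[N_SENSORS]; };

    static struct snapshot bufs[2];
    static _Atomic int active = 0;               /* index of the readable buffer */

    /* Polling thread: fill the inactive buffer, then flip the flag. */
    void publish_snapshot(const struct snapshot *fresh)
    {
        int inactive = 1 - atomic_load(&active);
        bufs[inactive] = *fresh;                 /* build a consistent state */
        atomic_store(&active, inactive);         /* readers now see the new data */
    }

    /* Connection thread: copy out whatever snapshot is currently active. */
    void read_snapshot(struct snapshot *out)
    {
        *out = bufs[atomic_load(&active)];
    }

One caveat the sketch glosses over: a reader still copying the buffer that has just become inactive can race with the next fill, so the flip either needs to be paced slower than the longest read, or extended with a seqlock or reference count.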
