Let's assume I have m UDP streams, each uniquely identified by some id (e.g. RTP SSRC). I need to process them in n associated threads, and the association is 1-N, i.e. one UDP stream is processed by one or many threads.
What is the difference in the kernel's networking-stack performance if I:
Start m UDP servers, each on a different port. Each server processes one stream and pushes its data to one or more associated threads.
Start just one server. All streams are handled through its single port, and this thread then pushes each stream's data to one or more associated threads.
I think it comes down to the question: is it better to open one single port, or many ports, each of which will receive proportionally less data?
Is there a possibility that a single socket may be overwhelmed by the amount of incoming data? Or is a socket, which is more a logical construct in the Linux kernel than a physical one, so cheap compared to handling the data itself that there is no real difference?
What is the maximum bitrate a single UDP socket (with an enlarged buffer) can handle?
I am sure I could best find the answer by browsing the kernel's networking code, but maybe someone could give the answer straight away. Thank you.
There is no easy answer to this question, because it all boils down to the processing speed of your threads and how you delegate the work among them.
If you think that the UDP socket is going to be overwhelmed, you can create a queue right behind it. This queue can grow as large as you allow it to grow; of course, you then use more memory.
What you will have then is a producer/consumer paradigm: one thread puts things into the queue, other threads take them out.
If the consuming threads process more slowly than the thread filling the queue, and this keeps going for a long time, your queue is going to be overrun anyway.
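A minimal sketch of such a producer/consumer queue behind a single UDP socket (the port 5004, the queue depth, and the four workers are arbitrary assumptions; datagrams are simply dropped when the queue is full):

    /* build with: gcc -pthread this_file.c */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/socket.h>

    #define QUEUE_DEPTH 1024
    #define MAX_DGRAM   2048

    struct dgram { size_t len; unsigned char buf[MAX_DGRAM]; };

    static struct dgram queue[QUEUE_DEPTH];
    static size_t q_head, q_tail, q_count;
    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  q_not_empty = PTHREAD_COND_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&q_lock);
            while (q_count == 0)
                pthread_cond_wait(&q_not_empty, &q_lock);
            struct dgram d = queue[q_head];          /* copy out under the lock */
            q_head = (q_head + 1) % QUEUE_DEPTH;
            q_count--;
            pthread_mutex_unlock(&q_lock);
            /* ... process d.buf / d.len here, e.g. demultiplex by RTP SSRC ... */
        }
        return NULL;
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_addr.s_addr = htonl(INADDR_ANY),
                                    .sin_port = htons(5004) };
        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("bind");
            return 1;
        }

        pthread_t tid;
        for (int i = 0; i < 4; i++)                  /* consumers (joining omitted) */
            pthread_create(&tid, NULL, worker, NULL);

        for (;;) {                                   /* producer: drain the socket */
            struct dgram d;
            ssize_t n = recv(fd, d.buf, sizeof d.buf, 0);
            if (n < 0)
                continue;
            d.len = (size_t)n;
            pthread_mutex_lock(&q_lock);
            if (q_count < QUEUE_DEPTH) {             /* drop when the queue is full */
                queue[q_tail] = d;
                q_tail = (q_tail + 1) % QUEUE_DEPTH;
                q_count++;
                pthread_cond_signal(&q_not_empty);
            }
            pthread_mutex_unlock(&q_lock);
        }
    }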
There are frameworks dedicated to the task of multimedia processing. You might want to take a look at gstreamer. http://gstreamer.freedesktop.org/documentation/
It has support for RTP and is basically a system that lets you build a pipeline for a data stream, which is exactly what you are doing here.
You will find that GStreamer has ready-made queue elements that let you buffer data somewhere in the pipeline, which suggests that something like this is needed when you are processing at high speed (though I am not a GStreamer specialist). GStreamer is made of building blocks, so you can experiment with a pipeline, easily add or remove queueing, and compare the results of the overall pipeline. It does require some studying to get to know the API. It is written in C.
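Roughly, such a pipeline with one of those ready-made queue elements can be put together from C with gst_parse_launch(); this is only a sketch (port 5000, the element choices and the unlimited queue sizes are placeholder assumptions, and a real RTP pipeline would need proper caps and depayloading):

    /* build (assumption): gcc demo.c $(pkg-config --cflags --libs gstreamer-1.0) */
    #include <gst/gst.h>

    int main(int argc, char *argv[])
    {
        gst_init(&argc, &argv);

        /* udpsrc feeds a queue element that buffers data inside the pipeline;
           fakesink stands in for the real processing elements */
        GError *err = NULL;
        GstElement *pipeline = gst_parse_launch(
            "udpsrc port=5000 ! queue max-size-buffers=0 max-size-time=0 ! fakesink",
            &err);
        if (!pipeline) {
            g_printerr("parse error: %s\n", err->message);
            g_error_free(err);
            return 1;
        }

        gst_element_set_state(pipeline, GST_STATE_PLAYING);

        /* run until an error or end-of-stream message is posted on the bus */
        GstBus *bus = gst_element_get_bus(pipeline);
        GstMessage *msg = gst_bus_timed_pop_filtered(
            bus, GST_CLOCK_TIME_NONE, GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
        if (msg)
            gst_message_unref(msg);
        gst_object_unref(bus);

        gst_element_set_state(pipeline, GST_STATE_NULL);
        gst_object_unref(pipeline);
        return 0;
    }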
The more sockets you have, the more socket receive buffers you have, so the more space is available for incoming data.
This suggests that multiple sockets may be the better option.
However, datagrams can be lost anywhere, not just at the target host.
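Whichever layout you choose, each socket's receive buffer can also be enlarged. A hedged sketch (on Linux the kernel caps the request at net.core.rmem_max and accounts roughly double the requested value, so it is worth reading the result back):

    #include <stdio.h>
    #include <sys/socket.h>

    int enlarge_rcvbuf(int fd, int bytes)
    {
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0) {
            perror("setsockopt(SO_RCVBUF)");
            return -1;
        }
        int actual = 0;
        socklen_t len = sizeof actual;
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len);
        printf("requested %d bytes, kernel granted %d\n", bytes, actual);
        return 0;
    }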
Related
I would like to know what the execution pattern of a server's multiple threads should be to implement a TCP request-response cycle in a high-performance server (e.g. reading dozens of packets with a single system call, or none at all, on Linux using PACKET_MMAP or some other mechanism).
Design 1) For simplicity, start two threads in main() at the start of the server program. One thread just gets packets directly from the network interface(s), e.g. wlan0/eth0, and once a number of packets has been read in one cycle (using a while loop with poll() on Linux), it wakes up the other thread with a condition-variable signal. After waking up, the other thread (the sender) processes the packets and sends them out as TCP responses.
Design 2) Start the receiver thread at the start of the main program. The packet receiver thread reads packets from the interfaces using a while loop and poll(). When a number of packets has been received, it creates a sender thread and passes the packets received in that cycle to the sender as a parameter. The sender thread processes the packets and replies with TCP responses.
(I think Design 2 will be easier to implement, but is there any design issue or possible performance issue with this approach? That is the question.) The buffer passed from the receiver thread to the sender thread needs to be allocated before receiving packets, so I know the size of the buffer to allocate. Also, in this execution pattern I am creating a new thread each time (which returns and ends execution after processing the packets and sending the TCP responses). I would like to know what the performance issue with this approach will be, since I am creating a new thread every time I get a batch of packets from the interfaces.
In the first approach I am not creating more than two threads (or a limited number of threads, which can be tracked easily for logging and debugging since I will know how many threads were created initially). In the second approach I don't know how many threads are hanging around and executing concurrently.
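For concreteness, a rough sketch of the Design 1 hand-off between the two long-lived threads (the actual packet I/O with poll()/PACKET_MMAP and the TCP replies are omitted and stubbed out with comments; the batch counter is a placeholder):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define BATCH 64

    static int batch_len;                    /* real code would also hand over the packet buffers */
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;

    static void *sender(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&m);
            while (batch_len == 0)
                pthread_cond_wait(&ready, &m);   /* sleep until a batch arrives */
            int n = batch_len;
            batch_len = 0;
            pthread_mutex_unlock(&m);
            printf("processing %d packets and sending TCP responses\n", n);
        }
        return NULL;
    }

    static void *receiver(void *arg)
    {
        (void)arg;
        for (;;) {
            /* ... poll() the interface and read up to BATCH packets here ... */
            sleep(1);                        /* stand-in for blocking in poll() */
            pthread_mutex_lock(&m);
            batch_len = BATCH;               /* pretend a full batch arrived */
            pthread_cond_signal(&ready);     /* wake the existing sender, don't create one */
            pthread_mutex_unlock(&m);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t r, s;
        pthread_create(&s, NULL, sender, NULL);
        pthread_create(&r, NULL, receiver, NULL);
        pthread_join(r, NULL);
        pthread_join(s, NULL);
        return 0;
    }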
I would appreciate any advice on how real websites like YouTube or others may have handled this in their high-performance servers, if they followed this way of implementing their front-facing servers.
First, with a 'real' website the magic lies in having load balancers and a whole bunch of worker nodes to take the load, so you easily exceed the boundary of a single system. For example, take a look at the AWS reference architecture for serving web pages at scale: the AWS Cloud Architecture for serving web whitepaper.
That being said, taking this one level down, it is always interesting to look at how other well-known products have solved this issue. For example, NGINX has an excellent infographic and a matching blog post describing their architecture and threading.
I've made a test C program that creates an AF_PACKET socket, creates x threads via pthreads, and within each thread performs epoll on the socket's file descriptor. This program was made for Linux and I've compiled it using GCC on Ubuntu 18.04. I've submitted a GitHub Gist of the program here since it's 200+ lines of code. I am still fairly new to C and network programming, so I'm sure there are many improvements I can make to the code. I am open to suggestions!
I have two main questions:
Is there a better way to receive and process high volumes of packets/traffic in a user-space program than the above? I've read that using pthreads along with epoll would be the best option, but I've also looked into select() and standard poll().
When the program above is executed without any debug output via fprintf(), each thread consumes 100% CPU on the epoll_wait() function within the while loop. Is this normal behavior or am I using epoll incorrectly? I've looked at some other examples and I use epoll the same way as the examples do. I've taken a look at the manual page for epoll and I believe I'm using it correctly in my case. I've also tried setting a timeout for the epoll_wait() function, but it was still consuming 100% CPU per thread (which I'd expect due to the while loop).
I plan to make a program that inspects traffic and redirects it where necessary, and I expect a lot of incoming packets, which is why I wanted to see if there is a better way to receive and process high volumes of packets. I also understand I could just use standard SOCK_DGRAM or SOCK_STREAM sockets and bind them to an IP and port. However, I do want to process and inspect all incoming traffic to an interface and forward traffic if necessary (e.g. if the destination address matches a forwarding rule). I also wasn't sure if I should make multiple sockets in this case (perhaps a socket per thread). I did do this initially, but it resulted in unexpected behavior and it was only ever reading from one socket descriptor anyway. Perhaps I wasn't creating the new sockets properly.
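For reference, a stripped-down reconstruction of the kind of loop described above (not the actual Gist: error handling is trimmed, binding to a specific interface is omitted, and the thread count is arbitrary):

    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>

    #define NTHREADS 4

    static void *rx_loop(void *arg)
    {
        int sock = *(int *)arg;
        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = sock };
        epoll_ctl(ep, EPOLL_CTL_ADD, sock, &ev);

        unsigned char buf[65536];
        for (;;) {
            struct epoll_event out;
            /* -1 = block until the socket is readable */
            int n = epoll_wait(ep, &out, 1, -1);
            if (n <= 0)
                continue;
            ssize_t len = recv(sock, buf, sizeof buf, 0);
            if (len > 0) {
                /* ... inspect the raw frame here ... */
            }
        }
        return NULL;
    }

    int main(void)
    {
        /* one raw socket shared by all threads; requires root/CAP_NET_RAW */
        int sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (sock < 0) {
            perror("socket(AF_PACKET)");
            return 1;
        }
        pthread_t tids[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tids[i], NULL, rx_loop, &sock);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tids[i], NULL);
        return 0;
    }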
Any help is highly appreciated and if you need any more information, please let me know.
Thank you for your time.
I have an ARM device running a Linux 2.6 kernel, with 64 MB of RAM in total.
There is a data source, which consists of a meter that is queried by the Linux box over RS485, using Modbus as the application protocol.
There is another task, which consists of reading these values, building a JSON object, and sending it to a specific server via HTTP POST.
The network operation might be slower than the serial one, especially under poor GPRS coverage.
I need concurrency; the program is written in C.
How would you implement the concurrency: using select() or using pthreads?
When analyzing this particular application there's really only one question relevant to choosing pthreads:
Do the sensor reader and network writer need to share an address space?
In this instance I think the answer is clearly "no". Of course that isn't the only possible question, but it is the only germane one here. There are reasons to prefer separate processes:
the two halves of the application have no common code; RS485 is wildly different from HTTP/JSON
segregation of responsibility: if the RS485 side is waiting on a UART, do you really want to block the HTTP side?
letting the OS do its job so you don't have to: if you use pthreads, you have to handle a lot of the synchronization and preemption that the kernel otherwise does for you for free, and code that you don't have to write has no new bugs.
Further analysis would require more detail than you've given, but here is one additional way to think about the choice: threads were invented to mitigate some limitations of the process model. Unless you know that you are going to hit those limitations, use separate processes.
added in response to comments:
I half agree with psusi's suggested design. There need only be two processes: one (let's say the sensor reader; that's a fine choice) which forks one and only one HTTP sender. The two processes can communicate using traditional IPC such as a pipe. The sensor process sends data down the pipe when it has some, and the child (HTTP) process packs it up as JSON and sends it on its way.
It only takes two long-lived processes, it probably uses about the same amount of memory as a pthread implementation would, and it is far, far easier to get right.
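A rough sketch of that shape, with the Modbus read and the HTTP POST reduced to hypothetical placeholders (read_meter() and http_post_json() are not real library calls; the polling interval is an assumption):

    #include <stdio.h>
    #include <unistd.h>

    /* placeholder: poll the meter over RS485/Modbus and return a reading */
    static double read_meter(void) { return 42.0; }

    /* placeholder: POST the JSON body, e.g. via libcurl or a small HTTP client */
    static void http_post_json(const char *json) { printf("POST %s\n", json); }

    int main(void)
    {
        int fd[2];
        if (pipe(fd) < 0) { perror("pipe"); return 1; }

        pid_t pid = fork();
        if (pid == 0) {                        /* child: the one HTTP sender */
            close(fd[1]);
            double value;
            while (read(fd[0], &value, sizeof value) == sizeof value) {
                char json[128];
                snprintf(json, sizeof json, "{\"reading\": %.3f}", value);
                http_post_json(json);          /* a slow GPRS link only blocks the child */
            }
            return 0;
        }

        close(fd[0]);                          /* parent: sensor reader */
        for (;;) {
            double value = read_meter();
            if (write(fd[1], &value, sizeof value) != sizeof value)
                break;
            sleep(5);                          /* polling interval: an assumption */
        }
        return 0;
    }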
select() is more efficient, because it avoids the context switching that comes with multiple threads. And threads would be more efficient than separate processes, because you avoid having to copy the data (unless you set up shared memory, but at that point you might as well have gone with threads). However, non-blocking I/O, as with select(), is harder to write and get right, and doesn't enjoy the multitasking that comes with multiple threads. And multiple processes are likely to be the easiest implementation, especially because you can use curl rather than writing the HTTP POST half yourself.
Why do you need concurrency? Does the meter have to be polled at a strict time interval?
If the answer is YES: just use two processes, one polling the meter and writing the data to a ring buffer in NAND storage, the other reading the data from the ring buffer and sending it over HTTP.
If the answer is NO: you don't need concurrency or non-blocking I/O at all. A big loop in main() is enough.
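A sketch of that big loop, reusing the hypothetical read_meter()/http_post_json() placeholders from the sketch above (the 60-second interval is an assumption):

    #include <stdio.h>
    #include <unistd.h>

    /* hypothetical placeholders, as in the sketch above */
    static double read_meter(void) { return 42.0; }
    static void http_post_json(const char *json) { printf("POST %s\n", json); }

    int main(void)
    {
        char json[128];
        for (;;) {
            double value = read_meter();              /* serial query first */
            snprintf(json, sizeof json, "{\"reading\": %.3f}", value);
            http_post_json(json);                     /* then the (possibly slow) POST */
            sleep(60);                                /* 60 s interval: an assumption */
        }
    }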
Say there are two programs running on a computer (for the sake of simplification, the only user programs running on Linux), one of which calls recv(), and one of which is using pcap to detect incoming packets. A packet arrives, and it is detected both by the program using pcap and by the program using recv(). But is there any case (for instance recv() returning between calls to pcap_next()) in which one of these two will not get the packet?
I really don't understand how the buffering system works here, so the more detailed the explanation the better - is there any conceivable case in which one of these programs would see a packet that the other does not? And if so, what is it and how can I prevent it?
AFAIK, there do exist cases where one would receive the data and the other wouldn't (both ways). It's possible that I've gotten some of the details wrong here, but I'm sure someone will correct me.
Pcap uses different mechanisms to sniff on interfaces, but here's how the general case works:
A network card receives a packet (the driver is notified via an interrupt)
The kernel places that packet into appropriate listening queues: e.g.,
The TCP stack.
A bridge driver, if the interface is bridged.
The interface that PCAP uses (a raw socket connection).
Those buffers are flushed independently of each other:
As TCP streams are assembled and data delivered to processes.
As the bridge sends the packet to the appropriate connected interfaces.
As PCAP reads received packets.
I would guess that there is no hard way to guarantee that both programs receive every packet. That would require blocking on a buffer when it's full (and that could lead to starvation, deadlock, all kinds of problems). It may be possible with interconnects other than Ethernet, but the general philosophy there is best-effort.
Unless the system is under heavy load, however, I would say that the loss rates would be quite low and that most packets would be received by all. You can decrease the risk of loss by increasing the buffer sizes. A quick Google search turned this up, but I'm sure there are a million more ways to do it.
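On the pcap side, for example, the capture buffer can be set before activating the handle; a sketch using the libpcap create/activate API ("eth0" and the 16 MB size are arbitrary choices):

    #include <pcap/pcap.h>
    #include <stdio.h>

    pcap_t *open_capture(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_create("eth0", errbuf);
        if (!p) {
            fprintf(stderr, "pcap_create: %s\n", errbuf);
            return NULL;
        }
        pcap_set_snaplen(p, 65535);
        pcap_set_buffer_size(p, 16 * 1024 * 1024);  /* must be set before pcap_activate() */
        pcap_set_timeout(p, 100);                   /* read timeout in ms */
        if (pcap_activate(p) < 0) {
            fprintf(stderr, "pcap_activate: %s\n", pcap_geterr(p));
            pcap_close(p);
            return NULL;
        }
        return p;
    }

The recv() side's buffer can be enlarged with setsockopt(SO_RCVBUF) in the same spirit.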
If you need hard guarantees, I think a more powerful model of the network is needed. I've heard great things about Netgraph for these kinds of tasks. You could also just install a physical box that inspects packets (the hardest guarantee you can get).
I wrote a program in C that creates a TCP and a UDP socket and starts both servers up. The goal of the application is to monitor requests over the TCP socket for which UDP packets to send it (i.e. watch for something like "0x01 0x02" and, if I see it, have the UDP server parse the payload and forward it over to the TCP server for processing). The problem is that the UDP server will be busy keeping another device up, literally sending thousands of packets back and forth with this device. So what is the best way to continuously monitor requests from the TCP server, but send it certain payloads from the UDP server when requested, given that the UDP server will be busy?
I looked into pthreads with semaphores and/or mutexes (I'm not sure all the socket operations are thread-safe, though, or whether this is the right way to approach it) as well as fork/pipe. Forking the UDP server off as a child process seems easy enough, but I don't see exactly how I would pass the kind of data I need between the two servers (I need request data from the TCP side and payload data from the UDP side).
Firstly, would it make sense to put these two servers into one program? If so, you won't have to communicate between processes, and the whole logic becomes substantially easier. You will have to think about doing asynchronous input and output, and the select() function is designed for just this. There will be many explanations around on how to do this, and a quick look finds this page.
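A hedged sketch of that single-process shape: one select() loop watching a TCP listening socket, at most one accepted TCP client (for brevity), and the busy UDP socket; the ports are arbitrary and error handling is mostly omitted:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int bound_socket(int type, int port)
    {
        int fd = socket(AF_INET, type, 0);
        struct sockaddr_in a = { .sin_family = AF_INET,
                                 .sin_addr.s_addr = htonl(INADDR_ANY),
                                 .sin_port = htons(port) };
        bind(fd, (struct sockaddr *)&a, sizeof a);
        if (type == SOCK_STREAM)
            listen(fd, 8);
        return fd;
    }

    int main(void)
    {
        int tcp = bound_socket(SOCK_STREAM, 8000);   /* control requests */
        int udp = bound_socket(SOCK_DGRAM, 9000);    /* busy device traffic */
        int client = -1;
        char buf[2048];

        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(tcp, &rfds);
            FD_SET(udp, &rfds);
            int maxfd = tcp > udp ? tcp : udp;
            if (client >= 0) {
                FD_SET(client, &rfds);
                if (client > maxfd) maxfd = client;
            }
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
                continue;

            if (FD_ISSET(tcp, &rfds) && client < 0)
                client = accept(tcp, NULL, NULL);    /* one TCP peer, for brevity */

            if (FD_ISSET(udp, &rfds)) {
                ssize_t n = recv(udp, buf, sizeof buf, 0);
                (void)n;  /* ... handle device traffic; keep payloads the TCP side may ask for ... */
            }

            if (client >= 0 && FD_ISSET(client, &rfds)) {
                ssize_t n = recv(client, buf, sizeof buf, 0);
                if (n <= 0) { close(client); client = -1; }
                /* ... else parse the request (e.g. 0x01 0x02) and reply with the UDP payload ... */
            }
        }
    }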
However, if you must have two separate processes, then you will need to choose a mechanism for inter-process communication, of which there are several, and your choice will be affected by your operating system. A pipe, if available, might be suitable, as might a Unix named pipe. Or you could look into third-party message passing frameworks, or just use shared memory and/or semaphores (but be very careful!).
What you should look at is libevent; with anything else you are reinventing the wheel, writing this low-level code yourself. Here is a Tutorial, Google, Krugle.
Also, you should use some predefined protocol between the servers. There are lots to choose from, ranging from the extremely simple XDR to Protocol Buffers.
You could use pipes on Unix. See http://tldp.org/LDP/lpg/node11.html
Well, you certainly picked an interesting introduction to C!
You might try shared memory. What OS?