I have an ARM device running a Linux 2.6 kernel with 64 MB of RAM.
There is a data source: a meter that the Linux box queries over RS485, using Modbus as the application protocol.
There is another task that reads these values, builds a JSON object, and HTTP POSTs it to a specific server.
The network operation may be slower than the serial one, especially with poor GPRS coverage.
I need concurrency; the program is written in C.
How would you implement the concurrency: using select() or using pthreads?
When analyzing this particular application there's really only one question relevant to choosing pthreads:
Do the sensor reader and network writer need to share an address space?
In this instance I think the answer is clearly "no". Of course that isn't the only possible question, but it is the germane one here. There are reasons to prefer separate processes:
the two halves of the application have no common code; RS485 is wildly different from HTTP/JSON
segregation of responsibility: if the RS485 side is waiting on a UART, do you really want to block the HTTP side?
letting the OS do its job so you don't have to: with pthreads you have to handle a lot of the synchronization and preemption that the kernel otherwise does for you for free, and code that you don't have to write has no new bugs.
Further analysis would require more detail than you've given, but here is one additional way to think about the choice: threads were invented to mitigate some limitations of the process model. Unless you know that you are going to hit those limitations, use separate processes.
added in response to comments:
I half agree with psusi's suggested design. There need only be two processes: one (let's say the sensor reader; that's a fine choice) which forks one and only one HTTP sender. The two processes can communicate using traditional IPC like a pipe. The sensor process sends data down the pipe when it has some, and the child (HTTP) process packs it up into JSON and sends it on its way.
It takes only two long-lived processes, it probably uses about the same amount of memory as a pthread implementation would, and it is far, far easier to get right.
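To make that concrete, here is a minimal sketch of that layout; read_meter() and http_post_json() are hypothetical placeholders for the Modbus polling and the HTTP/JSON side:

```c
/*
 * Minimal sketch of the two-process layout: the parent polls the meter and
 * writes fixed-size readings into a pipe; the forked child turns each reading
 * into JSON and POSTs it. read_meter() and http_post_json() are hypothetical
 * placeholders for the RS485/Modbus and HTTP code.
 */
#include <stdio.h>
#include <unistd.h>

struct reading { double value; long timestamp; };

extern int read_meter(struct reading *r);      /* hypothetical: Modbus poll */
extern int http_post_json(const char *json);   /* hypothetical: HTTP sender */

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }

    if (pid == 0) {                        /* child: the slow network side */
        close(fds[1]);
        struct reading r;
        while (read(fds[0], &r, sizeof r) == (ssize_t)sizeof r) {
            char json[128];
            snprintf(json, sizeof json,
                     "{\"value\": %.3f, \"ts\": %ld}", r.value, r.timestamp);
            http_post_json(json);          /* may stall on GPRS; parent keeps polling */
        }
        _exit(0);
    }

    close(fds[0]);                         /* parent: the serial side */
    for (;;) {
        struct reading r;
        if (read_meter(&r) == 0)
            write(fds[1], &r, sizeof r);   /* the pipe buffers if the child lags */
        sleep(1);                          /* poll interval, adjust to taste */
    }
}
```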
select() is more efficient, because it avoids the context switching that comes with multiple threads; and threads would be more efficient than separate processes, because you avoid having to copy the data (unless you set up shared memory, but at that point you might as well have gone with threads). However, non-blocking I/O with select() is harder to write and to get right, and you don't get the preemptive multitasking that threads give you. Multiple processes are likely to be the easiest implementation, especially because you can use curl rather than writing the HTTP POST half yourself.
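If you do use libcurl for the HTTP half, that side stays very small. A hedged sketch of the POST, with the URL and JSON assumed to be supplied by the caller:

```c
/* Sketch of the HTTP POST side using libcurl (link with -lcurl).
 * The URL is a placeholder; error handling is kept minimal. */
#include <curl/curl.h>

int post_json(const char *url, const char *json)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return -1;

    struct curl_slist *hdrs =
        curl_slist_append(NULL, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, json);   /* implies an HTTP POST */
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30L);       /* GPRS can be slow */

    CURLcode rc = curl_easy_perform(curl);

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : -1;
}
```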
Why do you need concurrency? Does the meter have to be polled at a strict time interval?
If the answer is YES: just use two processes, one that polls the meter and writes the data to a ring buffer in NAND storage, the other that reads from the ring buffer and sends the data over HTTP.
If the answer is NO: you don't need concurrency or non-blocking I/O at all. A big loop in main() is enough.
Related
I am creating a server that will be sending and receiving tasks from over 200 clients simultaneously (potentially more clients in the future). There will also be background engines on the clients that will perform tasks and send responses to the server without being asked first. I expect there to be a high volume of information transferred both ways. I've been doing research into multi-threading and using the select function, and I'm wondering, given some of the parameters of the project, which option (or a combination) would be the most efficient, scalable solution based on the amount of traffic that might occur.
Any suggestions would be greatly appreciated. I'd be glad to answer any questions to provide more clarity.
Either approach will work; as far as which is "better", that's going to depend a lot on how you define the word "better".
The single-threaded approach avoids any chance of problems with race conditions or deadlocks, because those problems inherently can't occur in a single-threaded program. In a multithreaded program you have to be extremely careful about data-locking patterns, or else you will find yourself trying to debug very mysterious malfunctions that only occur once every few days/weeks/months.
On the other hand, the single-threaded approach limits you to using a single core; it won't be able to take advantage of a modern multi-core CPU to give you a parallelism speedup.
On the third hand, the multi-threaded approach can get hairy (and lose its speedup potential) if the various threads/connections often need to access any shared/mutable data structures. In that "shared data bottleneck" scenario, the threads may spend a lot of their time blocked waiting to lock a mutex, and then you're mostly back to using a single core anyway. If each connection operates independently of the others (e.g. as part of a simple web server) and doesn't need to interact with the other threads, then this shouldn't be a concern.
Multithreading allows you to use blocking I/O (which is simpler to implement than non-blocking I/O), but blocking I/O limits your control over the threads (e.g. how do you get a thread to exit cleanly, or take some other non-client-initiated action, if it is blocked indefinitely inside a recv() call? There aren't any good solutions to that problem, only poor ones)
Single-threading requires you to use non-blocking I/O (otherwise a single unresponsive client can halt service to all the other clients while the server is blocked inside a send() or recv() call), and non-blocking I/O is tricky to do correctly, since you have to handle partial-reads and partial-writes gracefully.
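The partial-write bookkeeping in particular trips people up; here is a rough sketch of what a non-blocking sender has to do (the conn structure is purely illustrative):

```c
/* Sketch of draining a per-connection output buffer on a non-blocking socket.
 * Returns 1 when the buffer is fully sent, 0 if the socket is not writable yet,
 * -1 on a real error. The conn layout is illustrative only. */
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

struct conn {
    int fd;
    char outbuf[4096];
    size_t out_len;    /* bytes queued */
    size_t out_off;    /* bytes already sent */
};

int flush_conn(struct conn *c)
{
    while (c->out_off < c->out_len) {
        ssize_t n = send(c->fd, c->outbuf + c->out_off,
                         c->out_len - c->out_off, 0);
        if (n > 0) {
            c->out_off += (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0;   /* kernel buffer full: wait until the socket is writable */
        } else if (n < 0 && errno == EINTR) {
            continue;   /* interrupted, retry */
        } else {
            return -1;  /* connection error */
        }
    }
    c->out_off = c->out_len = 0;
    return 1;
}
```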
If your program ever needs to do a non-trivial amount of computation or file I/O, note that a single-threaded design will force all clients to wait while the computation (or I/O) for any one client completes. In a multithreaded design, OTOH, clients B through Z can continue to be serviced on other cores/threads while client A's thread is busy reading from the disk or crunching numbers.
The overhead of spawning and maintaining threads will vary from one OS to another. If you're going to be running hundreds of threads simultaneously, you might want to verify first that your target OS (and hardware) will be able to handle that load efficiently. (You can reduce the overhead of spawning and reaping threads via a thread-pool, at some expense of increased RAM usage)
I personally prefer the single-threaded/non-blocking-I/O approach, because blocking I/O is problematic if you want your program to be able to shut down cleanly and reliably (which you should want, if only so you can do e.g. memory-leak testing under valgrind). If single-core performance turns out to be insufficient, it's often fairly straightforward to extend the handle-N-sockets-on-1-thread design to a more powerful handle-N-sockets-on-each-of-M-threads design, and then you can play around with different values of N and M until you find the one that gives you the best performance (e.g. by setting M to the number of cores on the host machine, and handing out newly-accepted sockets to whichever thread is currently handling the smallest number of sockets).
I once wrote a chat application in Java in which each connection established with the server was represented by a new thread on the server to manage the client in question.
Inside the Server class there was a static variable to track which clients were connected.
I don't know if recommending different technologies is the right way to answer your question, but I think that for your case it would be a good idea to take a look at the Erlang/Elixir platform; its premise is that it can hold a lot of clients at the same time.
Currently, big companies use them: WhatsApp uses Erlang and Discord uses Elixir.
I hope that my answer was helpful.
Let's assume I have m UDP streams uniquely identified by some id (e.g. RTP SSRC). I need to process them in n associated threads and association is 1-N, i.e. one UDP stream is processed by one or many threads.
What is the difference in kernel's networking stack performance if I:
Start m UDP servers, each on a different port. Each server processes one stream and pushes its data to one or more associated threads.
Start just one server. All streams are handled by its single port, and this thread pushes each stream's data on to one or more associated threads.
I think it comes down to the question: is it better to open one single port or many of them, where each will receive proportionally less data?
Is there a possibility that a single socket may be overwhelmed by the amount of incoming data? Or maybe a socket, which is more a logical thing in the Linux kernel than a physical one, has not so much to do with the data itself, so there is no real difference?
What is the maximum bitrate a single UDP socket (with an enlarged buffer) can handle?
I am sure I could best find the answer by browsing the kernel's networking code, but maybe someone could give the answer straight away. Thank you.
There is no easy answer to this question, because it all boils down to the processing speed of your threads and how you delegate the work among them.
If you think that the UDP socket is going to be overwhelmed, you can create a queue right behind it. This queue can grow as large as you allow it to grow; of course, you then use more memory.
What you have then is a producer/consumer paradigm: one thread puts things in the queue, other threads take from it.
If the threads' processing is slower than the thread that fills the queue, and this keeps going for a long time, the queue is eventually going to be overrun anyway.
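To illustrate, here is a minimal sketch of such a producer/consumer queue using a mutex and condition variables. The capacity and item type are made up, and making it bounded means a full queue pushes back-pressure onto the socket-reading thread instead of growing without limit:

```c
/* Sketch of a bounded producer/consumer queue between the socket-reading
 * thread and the worker threads. Capacity and item type are illustrative. */
#include <pthread.h>
#include <string.h>

#define QCAP 1024

struct packet { unsigned char data[1500]; size_t len; };

struct queue {
    struct packet items[QCAP];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void queue_init(struct queue *q)
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void queue_push(struct queue *q, const struct packet *p)   /* producer */
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)                 /* full: block, creating back-pressure */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = *p;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

void queue_pop(struct queue *q, struct packet *out)         /* consumer */
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                    /* empty: sleep until data arrives */
        pthread_cond_wait(&q->not_empty, &q->lock);
    *out = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}
```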
There are frameworks dedicated to the task of multimedia processing. You might want to take a look at gstreamer. http://gstreamer.freedesktop.org/documentation/
It has support for RTP and is basically a system that lets you build a pipeline for a data stream, which is exactly what you are doing here.
You will find that gstreamer has ready-made queue elements that let you buffer data at any point in the pipeline. That in itself suggests that something like this is needed when you are processing at high speed, though I am not a gstreamer specialist. Gstreamer is made of building blocks, so you can experiment with a pipeline and easily add queueing, remove it, and compare the results of the overall pipeline. It does require some studying to get to know the API. It is written in C.
The more sockets you have, the more socket receive buffers you have, so the more space is available for incoming data.
This suggests that multiple sockets may be the better option.
However datagrams can be lost anywhere, not just at the target host.
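If you do stick with a single socket, here is a hedged sketch of enlarging its receive buffer; the size is illustrative, and on Linux the kernel caps the request at net.core.rmem_max and internally doubles the granted value for its own bookkeeping, which is what getsockopt reports back:

```c
/* Sketch: ask the kernel for a larger UDP receive buffer, then read back
 * what it actually granted. */
#include <stdio.h>
#include <sys/socket.h>

int enlarge_rcvbuf(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) == -1)
        return -1;

    int granted = 0;
    socklen_t len = sizeof granted;
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &granted, &len) == -1)
        return -1;

    printf("requested %d bytes, kernel granted %d\n", bytes, granted);
    return granted;
}
```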
I am exploring several concepts for a web crawler in C on Linux. To decide whether I'll use blocking I/O, multiplexed I/O, AIO, some combination, etc., I especially need to know (I probably should discover it for myself via some test code, but for expediency I'd prefer to hear it from others): when a call to I/O in blocking mode is made, is it the particular thread (assuming a multithreaded app/service) or the whole process that is blocked? Even more specifically, in a multithreaded (POSIX) app/service, can a thread dedicated to remote reads/writes block the entire process? If so, how can I unblock such a thread without terminating the entire process?
NB: Whether or not I should use blocking/nonblocking is not really the question here.
Blocking calls block only the thread that made them, not the entire process.
Whether to use blocking I/O (with one socket per thread) or non-blocking I/O (with each thread managing multiple sockets) is something you are going to have to benchmark. But as a rule of thumb...
Linux handles multiple threads reasonably efficiently. So if you are only handling a few dozen sockets, using one thread for each is easy to code and should perform well. If you are handling hundreds of sockets, it is a closer call. And for thousands of sockets, you are almost certainly better off using one thread (or process) to manage large groups.
In the latter case, for optimal performance you probably want to use epoll, even though it is Linux-specific.
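For reference, a minimal epoll skeleton; handle_readable() is a hypothetical per-connection handler, and newly accepted sockets would be registered the same way with EPOLL_CTL_ADD:

```c
/* Minimal epoll loop sketch: one thread watching many descriptors.
 * handle_readable() is a hypothetical handler (non-blocking read + parse). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

extern void handle_readable(int fd);   /* hypothetical per-connection handler */

void event_loop(int first_fd)
{
    int epfd = epoll_create1(0);
    if (epfd == -1) { perror("epoll_create1"); exit(1); }

    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = first_fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, first_fd, &ev) == -1) {
        perror("epoll_ctl");
        exit(1);
    }

    for (;;) {
        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (n == -1) { perror("epoll_wait"); break; }
        for (int i = 0; i < n; i++)
            handle_readable(events[i].data.fd);   /* dispatch each ready fd */
    }
}
```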
I've found out that native file access has no "non-blocking" mode. (Am I correct?)
I've been googling for daemons which are "non-blocking", and I've found one which achieved said behavior by threading file access operations, so that the daemon itself won't block.
My question is: wouldn't threading and IPC'ing such operations be rather expensive? Wouldn't it make more sense to either:
A) Pre-spawn a thread pool, simply give each client its own thread, and let it block on whichever blocking operations it might need. Or,
B) For file access that blocks, use a relatively small buffer; that way it's still blocking, but one would assume that a tiny buffer across multiple operations would make more sense than paying the price of threading each operation and IPC'ing it?
If you use threading, little IPC overhead is needed. You have the same memory space for all your threads, so a simple mutex or semaphore may be all you need. Now, if you are blocking on a mutex or semaphore too long or too often, why use async I/O in the first place?
As to the actual computation performed by threads doing I/O, they are waiting for the kernel to wake them up most of the time, so I wouldn't worry.
If your application is going to revolve around reading files and other I/O sources, you may want to read up on Reactor patterns, and event-driven programming.
Also, you mentioned a daemon, and servicing clients. If the service you provide is reading files, the computational cost of spawning a new thread to serve each client is minimal, since each individual thread will take "long" to complete requests, and block most of the time anyway. There may be a memory problem if your client count is in the thousands, but otherwise I think you'll do okay.
Give us a little more detail about what you want to do, maybe there are more straightforward ways.
In a Linux application I'm using pipes to pass information between threads.
The idea behind using pipes is that I can wait for multiple pipes at once using poll(2). That works well in practice, and my threads are sleeping most of the time. They only wake up if there is something to do.
In user-space the pipes look just like two file handles. Now I wonder how many resources such pipes use on the OS side.
Btw: in my application I only send single bytes every now and then. Think of my pipes as simple message queues that allow me to wake up receiving threads and tell them to send some status data or to terminate.
No, I would not consider pipes "lightweight", but that doesn't necessarily mean they're the wrong answer for your application either.
Sending a byte over a pipe is going to require a minimum of 3 system calls (write, poll, read). Using an in-memory queue and pthread operations (mutex_lock, cond_signal) involves much less overhead. Open file descriptors definitely do consume kernel resources; that's why processes are typically limited to 256 open files by default (not that the limit can't be expanded where appropriate).
Still, the pipe/poll solution for inter-thread communication does have advantages too: particularly if you need to wait for input from a combination of sources (network + other threads).
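As an illustration of that combination, here is a rough sketch of the pattern described in the question: a worker thread sleeping in poll() on both its socket and the read end of a control pipe, so that another thread can wake it with a single byte. The command values are made up for the example:

```c
/* Sketch of the pipe-plus-poll wake-up pattern: the worker waits on its
 * network socket and on a control pipe; another thread writes one byte to
 * the pipe to wake it ("send status" or "terminate"). Fds are assumed to be
 * set up by the caller. */
#include <poll.h>
#include <unistd.h>

enum { CMD_STATUS = 's', CMD_QUIT = 'q' };   /* illustrative command bytes */

void worker_loop(int sock_fd, int ctl_read_fd)
{
    for (;;) {
        struct pollfd fds[2] = {
            { .fd = sock_fd,     .events = POLLIN },
            { .fd = ctl_read_fd, .events = POLLIN },
        };
        if (poll(fds, 2, -1) == -1)
            continue;                         /* e.g. EINTR: just retry */

        if (fds[1].revents & POLLIN) {        /* control message from another thread */
            char cmd;
            if (read(ctl_read_fd, &cmd, 1) == 1 && cmd == CMD_QUIT)
                return;
            /* cmd == CMD_STATUS: send the status data here */
        }
        if (fds[0].revents & POLLIN) {
            /* network data ready: read and process it here */
        }
    }
}
```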
As you are using Linux you can investigate and compare pipe performance with eventfd's. They are technically faster and lighter weight but you'll be very lucky to actually see the gains in practice.
http://www.kernel.org/doc/man-pages/online/pages/man2/eventfd.2.html
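For comparison, a sketch of the same wake-up mechanism built on eventfd(2): one descriptor instead of a pipe pair, and multiple notifications coalesce into a single counter that one read() drains:

```c
/* Sketch: eventfd as a poll()able wake-up signal between threads. */
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

int make_wakeup_fd(void)
{
    return eventfd(0, EFD_NONBLOCK);    /* can be watched with poll()/epoll like a pipe */
}

void notify(int efd)                    /* called by the signalling thread */
{
    uint64_t one = 1;
    write(efd, &one, sizeof one);       /* adds 1 to the kernel-side counter */
}

int drain(int efd)                      /* called by the woken thread after poll() */
{
    uint64_t count;
    /* returns how many notifications were pending, 0 if none */
    return read(efd, &count, sizeof count) == sizeof count ? (int)count : 0;
}
```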
Measure and you'll know. Full processes with pipes are sufficiently lightweight for lots of applications. Other applications require something lighter weight, like OS threads (pthreads being the popular choice for many Unix apps), or superlightweight, like a user-level threads package that never goes into kernel mode except to handle I/O. While the only way to know for sure is to measure, pipes are probably good enough for up to a few tens of threads, whereas you probably want user-level threads once you get to a few tens of thousands of threads. Exactly where the boundaries should be drawn using today's codes, I don't know. If I wanted to know, I would measure :-)