Maintaining connections with many real-time devices - c

I'm writing a program on Linux to control about 1000 Patient Monitors at same time over UDP sockets. I've successfully written a library to parse and send messages to collect the data from a single patient monitor device. There are various scheduling constraints on the the device, listed below:-
Each device must constantly get an alive-request from computer client within max time-period of 300 milliseconds(may differ for different devices), otherwise connection is lost.
Computer client must send a poll-request to a device in order fetch the data within some time period. I'm polling for about 5 seconds of averaged data from patient monitor, therefore, I'm required to send poll-request in every 5 * 3 = 15 seconds. If I fail to send the request within 15 seconds time-frame, I looses the connection from device.
Now, I'm trying to extend my current program so that it is capable of handling about 1000+ devices at same time. Right now, my program can efficiently handle and parse response from just one device. In case of handling multiple devices, it is necessary to synchronize multiple responses from different device and serialize them and stream it over TCP socket, so that remote computers can also analyze the data. Well, that is not a problem because it is a well know multiple-producer and single consumer problem. My main concern is, what approach should I use in order to maintain alive-connection 1000+ devices.
After reading over Internet and browsing for similar questions on this website, I'm mainly considering two options:-
Use one thread per device. In order to control 1000+ device, I would end up in making 1000+ threads which does not look feasible to me.
Use multiplexing approach, selecting FD that requires attention and deal with it one at a time. I'm not sure how would I go about it and if multiplexing approach would be able to maintain alive-connection with all the devices considering above two constants.
I need some suggestions and advice on how to deal with this situation where you need to control 1000+ real-time-device over UDP sockets. Each device requires some alive-signal every 300 milliseconds (differ for different devices) and they require poll request in about 3 times the time interval mentioned during association phase. For example, patient monitors in ICU may require real-time (1 second averaged) data where as patient monitors in general wards may require 10-seconds averaged data, therefore, poll period for two devices would be 3*1(3 seconds) and 3*10 (30 seconds) respectively.
Thanks
Shivam Kalra

for the most part either approach is at least functionally capable of handling the functionality you describe, but by the sounds of things performance will be a crucial issue. From the figures you have provided it seems that the application could be CPU-buond.
A multithreaded approach has the advantage of using all of the available CPU cores on the machine, but multithreaded programs are notorious for being difficult to make reliable and robust.
You could also use the Apache's old tried-and-true forked-worker model - create, say, a separate process to handle a maximum of 100 devices. You could then need to write code to manage the mapping of connections to processes.
You could also use multiple hosts and some mechanism to distribute devices among them. This would have the advantage of making it easier to handle recovery situations. It sounds like your application could well be mission critical, and it may need to be architected so that if any one piece of hardware breaks then other hardware will take over automatically.

Related

Is it better to use select or multi-threading (or both) when creating a server that will simultaneously send/receive tasks for 200+ clients

I am creating a server that will be sending and receiving tasks from over 200 clients simultaneously (potentially more client in the future). There will also be background engines on the clients that will perform tasks and send responses to the server without asking first. I expect there to be a high volume of information transferred both ways. I've been doing research into multi-threading and using the select function, and I'm wondering given some of the parameters of the project which option (or a combination) would be the most efficient scalable solution based on the amount of traffic that might occur.
Any suggestions would be greatly appreciated. I'd be glad to answer any questions to provide more clarity.
Either approach will work; as far is which is "better", that's going to depend a lot on how you define the word "better".
The single-threaded approach avoids any chance of problems with race conditions or deadlocks, because those problems inherently can't occur in a single-threaded program. In a multithreaded program you have to be extremely careful about data-locking patterns, or else you will find yourself trying to debug very mysterious malfunctions that only occur once every few days/weeks/months.
On the other hand, the single-threaded approach limits you to using a single core; it won't be able to take advantage of a modern multi-core CPU to give you a parallelism speedup.
On the third hand, the multi-threaded approach can get hairy (and lose its speedup potential) if the various threads/connections often need to access any shared/mutable data structures. In that "shared data bottleneck" scenario, the threads may spend a lot of their time blocked waiting to lock a mutex, and then you're mostly back to using a single core anyway. If each connection operates independently of the others (e.g. as part of a simple web server) and doesn't need to interact with the other threads, then this shouldn't be a concern.
Multithreading allows you to use blocking I/O (which is simpler to implement than non-blocking I/O), but blocking I/O limits your control over the threads (e.g. how do you get a thread to exit cleanly, or take some other non-client-initiated action, if it is blocked indefinitely inside a recv() call? There aren't any good solutions to that problem, only poor ones)
Single-threading requires you to use non-blocking I/O (otherwise a single unresponsive client can halt service to all the other clients while the server is blocked inside a send() or recv() call), and non-blocking I/O is tricky to do correctly, since you have to handle partial-reads and partial-writes gracefully.
If your program ever needs to do a non-trivial amount of computation or file I/O, note that a single-threaded design will force all clients to wait while the computation (or I/O) for any client completes. In a multithreaded design, OTOH, clients B through Z can continue to be serviced on other cores/threads while client A's is busy reading from the disk or crunching numbers.
The overhead of spawning and maintaining threads will vary from one OS to another. If you're going to be running hundreds of threads simultaneously, you might want to verify first that your target OS (and hardware) will be able to handle that load efficiently. (You can reduce the overhead of spawning and reaping threads via a thread-pool, at some expense of increased RAM usage)
I personally prefer the single-threaded/non-blocking-I/O approach, because blocking I/O is problematic if you want your program to be able to shut down cleanly and reliably (which you should want, if only so you can do e.g. memory-leak testing under valgrind). If single-core performance turns out to be insufficient, it's often fairly straightforward extend the handle-N-sockets-on-1-thread design to a more powerful handle-N-sockets-on-each-of-M-threads design, and then you can play around with different values of N and M until you find the one that gives you the best performance (e.g. by setting M to the number of cores on the host machine, and handing out newly-accepted sockets to whichever thread is currently handling the smallest number of sockets)
I once made a program in Java, a chat application, that each connection with the server that was established, represented a new Thread in the server, to manage the client in question.
Inside the Server class, there was a static variable, to manage which clients were connected.
I don't know if recommend different technologies is the right way to answer you question, but i think, that for your case, would be a good idea to take a look at Erlang/Elixir platform, the premise is the is able to hold a lot of clients at the same time.
Currently, big companies, like Whatsapp uses Erlang and Discord Elixir.
I hope that my answer was helpful.

should I use processes or threads for my application?

I have an ARM device running a Linux 2.6 Kernel, with total ram of 64 MB RAM.
There is a data source, which consists of a meter that is queried by the Linux box, through RS485 and ModBus as app protocol.
There is another task, that consists of reading these values and making a json object, then HTTP POST to a specific server.
Network operation might be slower than serial, especially on low GPRS Coverage.
I need concurrency, program is written in C.
Which way would you have concurrency? Using select() or using pthreads?
When analyzing this particular application there's really only one question relevant to choosing pthreads:
Do the sensor reader and network writer need to share an address space?
In this instance I think the answer is clearly "no". Of course that isn't the only possible question, but the only germane one. There are reasons to prefer separate processes:
the two halves of the application have no common code; RS485 is wildly different from HTTP/JSON
segregation of responsibility: if the RS485 side is waiting on a UART, do you really want to block the HTTP side?
letting the OS do its job so you don't have to: if using pthreads, you have to handle a lot of the synchronization and preemption that the kernel does for you for free and code that you don't have to write has no new bugs.
Further analysis would require more detail than you've given, but here is one additional way to think about the choice: threads were invented to mitigate some limitations of the process model. Unless you know that you are going to hit those limitations, use separate processes.
added in response to comments:
I half agree with psusi's suggested design. There need only be two processes, one (let's say the sensor reader, that's a fine choice) which forks one and only one http sender. The two processes can communicate using traditional IPC like a pipe. The sensor process sends data down the pipe when it has some and the child (http) process packs it up in json and sends it on its way.
It only takes two long-lived processes, it uses probably about the same amount of core as would a pthread implementation and it is far, far easier to get right.
select() is more efficient, because it avoids the context switching that comes with multiple threads. And threads would be more efficient than separate processes, because you avoid having to copy the data (unless you setup shared memory, but at that point you might as well have gone with threads). However, writing non-blocking I/O, as with select(), is harder to do and get right, and doesn't enjoy the multitasking that comes with multiple threads. And multiple processes is likely to be the easiest implementation, especially because you can use curl rather than writing the HTTP POST half yourself.
Why you need concurrency? Is the meter has to be polled in a strict time interval?
If the answer is YES: Just use two processes, one poll the meter data and write to a ring buffer in nand storage, the other read the data from the ring buffer and send HTTP data.
If the answer is NO: You don't need concurrency and non-block at all. Use a big loop in main() is enough.

How would one go about to measure differences in clock time on two different devices?

I'm currently in an early phase of developing a mobile app that depends heavily on timestamps.
A master device is connected to several client devices over wifi, and issues various commands to these. When the client devices receive commands, they need to mark the (relative) timestamp when the command is executed.
While all this is simple enough, I haven't come up with a solution for how to deal with clock differences. For example, the master device might have its clock at 12:01:01, while client A is on 12:01:02 and client B on 12:01:03. Mostly, I can expect these devices to be set to similar times, as they sync over NTP. However, the nature of my application requires ms precision, so therefore I would like to safeguard against discrepancies.
A short delay between issuing a command and executing the command is fine, however an incorrect timestamp of when that command was executed is not.
So far, I'm thinking of something along the line of having the master device ping each client device to determine transaction time, and then request the client to send their "local" time. Based on this, I can calculate what the time difference is between master and client. Once the time difference is know, the client can adapt its timestamps accordingly.
I am not very familiar with networking though, and I suspect that pinging a device is not a very reliable method of establishing transaction time, since a lot factors apply, and latency may change.
I assume that there are many real-world settings where such timing issues are important, and thus there should be solutions already. Does anyone know of any? Is it enough to simply divide response time by two?
Thanks!
One heads over to RFC 5905 for NTPv4 and learns from the folks who really have put their noodle to this problem and how to figure it out.
Or you simply make sure NTP is working properly on your servers so that you don't have this problem in the first place.

Server Architecture for Embedded Device

I am working on a server application for an embedded ARM platform. The ARM board is connected to various digital IOs, ADCs, etc that the system will consistently poll. It is currently running a Linux kernel with the hardware interfaces developed as drivers. The idea is to have a client application which can connect to the embedded device and receive the sensory data as it is updated and issue commands to the device (shutdown sensor 1, restart sensor 2, etc). Assume the access to the sensory devices is done through typical ioctl.
Now my question relates to the design/architecture of this server application running on the embedded device. At first I was thinking to use something like libevent or libev, lightweight C event handling libraries. The application would prioritize the sensor polling event (and then send the information to the client after the polling is done) and process client commands as they are received (over a typical TCP socket). The server would typically have a single connection but may have up to a dozen or so, but not something like thousands of connections. Is this the best approach to designing something like this? Of the two event handling libraries I listed, is one better for embedded applications or are there any other alternatives?
The other approach under consideration is a multi-threaded application in which the sensor polling is done in a prioritized/blocking thread which reads the sensory data and each client connection is handled in separate thread. The sensory data is updated into some sort of buffer/data structure and the connection threads handle sending out the data to the client and processing client commands (I supposed you would still need an event loop of sort in these threads to monitor for incoming commands). Are there any libraries or typical packages used which facilitate designing an application like this or is this something you have to start from scratch?
How would you design what I am trying to accomplish?
I would use a unix domain socket -- and write the library myself, can't see any advantages to using libvent since the application is tied to linux, and libevent is also for hundreds of connections. You can do all of what you are trying to do with a single thread in your daemon. KISS.
You don't need a dedicated master thread for priority queues you just need to write your threads so that it always processes high priority events before anything else.
In terms of libraries, you will possibly benifit from Google's protocol buffers (for serialization and representing your protocol) -- however it only has first class supports for C++, and the over the wire (serialization) format does a bit of simple bit shifting to numeric data. I doubt it will add any serious overhead. However an alternative is ASN.1 (asn1c).
My suggestion would be a modified form of your 2nd proposal. I would create a server that has two threads. One thread polling the sensors, and another for ALL of your client connections. I have used in embedded devices (MIPS) boost::asio library with great results.
A single thread that handles all sockets connections asynchronously can usually handle the load easily (of course, it depends on how many clients you have). It would then serve the data it has on a shared buffer. To reduce the amount and complexity of mutexes, I would create two buffers, one 'active' and another 'inactive', and a flag to indicate the current active buffer. The polling thread would read data and put it in the inactive buffer. When it finished and had created a 'consistent' state, it would flip the flag and swap the active and inactive buffers. This could be done atomically and should therefore not require anything more complex than this.
This would all be very simple to set up since you would pretty much have only two threads that know nothing about the other.

How to scale a TCP listener on modern multicore/multisocket machines

I have a daemon to write in C, that will need to handle 20-150K TCP connections simultaneously. They are long running connections, and rarely ever tear down. They have a very small amount of data (rarely exceeding MTU even.. it's a stimulus/response protocol) in transmit at any given time, but response times to them are critical. I'm wondering what the current UNIX community is using to get large amounts of sockets, and minimizing the latency on response of them. I've seen designs revolving around multiplexing connects to fork worker pools, threads (per connection), static sized thread pools. Any suggestions?
the easiest suggestion is to use libevent, it makes it easy to write a simple non-blocking single-threaded server that would comply with your requirements.
if the processing for each response takes some time, or if it uses some blocking API (like almost anything from a DB), then you'll need some threading.
One answer is the worker threads, where you spawn a set of threads, each listening on some queue to work. it can be separate processes, instead of threads, if you like. The main difference would be the communications mechanism to tell the workers what to do.
A different way to do is to use several threads, and give to each of them a portion of those 150K connections. each will have it's own process loop and work mostly like the single-threaded server, except for the listening port, which will be handled by a single thread. This helps spreading the load between cores, but if you use a blocking resource, it would block all the connections handled by this specific thread.
libevent lets you use the second way if you're careful; but there's also an alternative: libev. it's not as well known as libevent, but it specifically supports the multi-loop scheme.
If performance is critical then you'll really want to go for a multithreaded event loop solution - i.e. a pool of worker threads to handle your connections. Unfortunately, there is no abstraction library to do this that works on most Unix platforms (note that libevent is only single-threaded as are most of these event-loop libraries), so you'll have to do the dirty work yourself.
On Linux that means using edge-triggered epoll with a pool of worker threads (Windows would have I/O completion ports which also works fine in a multithreaded environment - I am not sure about other Unixes).
BTW, I have done some work trying to abstract edge-triggered epoll on Linux and Windows I/O completion ports on http://nginetd.cmeerw.org (it is work in progress, but might provide some ideas).
If you have system configuration access don't over-do it and set up some iptables/pf/etc to load-balance connections across n daemon instances (processes) as this will work out of the box. Depending on how blocking the nature of the daemon n should be from the number of cores on the system or several times higher. This approach looks crude but it can handle broken daemons and even restart them if necessary. Also migration would be smooth as you could start diverting new connections to another set of processes (for example, a new release or migrating to a new box) instead of service interruptions. On top of that you get several features like source affinity wich can help significantly caching and contention of problematic sessions.
If you don't have system access (or ops can't be bothered), you can use load balancer daemon (there are plenty of open source ones) instead of iptables/pf/etc and use also n service daemons, like above.
Also this approach helps with separating privileges of ports. If the external service needs to service on a low port (<1024) you only need the load balancer running privileged/or admin/root, or kernel.)
I've written several IP load balancers in the past and it can be very error-prone in production. You don't want to support and debug that. Also operations and management will tend second-guess your code more than external code.
i think javier's answer makes the most sense. if you want to test the theory out, then check out the node javascript project.
Node is based on Google's v8 engine which compiles javascript to machine code and is as fast as c for certain tasks. It is also based on libev and is designed to be completely non-blocking, meaning you don't have to worry about context switching between threads (everything runs on a single event loop). It is very similar to erlang in that respect.
Writing high performance servers in javascript is now really, really easy with node. You could also, with a little bit of effort, write your custom code in c and create bindings for node to call into it to do your actual processing (look at the node source to see how to do this - documentation is a little sketchy at the moment). as an uglier alternative, you could build your custom c code as an application and use stdin/stdout to communicate with it.
I've tested node myself with upwards of 150k connections with absolutely no issues (of course you will need some serious hardware if all these connections are going to be communicating at once). A TCP connection in node.js on average uses only 2-3k of memory so you could theoretically handle 350-500k connections per 1GB of RAM.
Note - Node.js is not currently supported on windows, but it is only at an early stage of development and i'd imagine it will be ported at some stage.
Note 2 - you will have to ensure the code you are calling into from Node does not block
Several systems have been developed to improve on select(2) performance: kqueue, epoll, and /dev/poll. In all these systems, you can have a pool of worker threads waiting for tasks; you will not be forced to setup all file handles over and over again when done with one of them.
do you have to start from scratch? You could use something like gearman.

Resources