POSIX Message Queues For Passing Data Between Pthreads - c

I have a Linux C program where I'm passing data between threads. I was looking into using POSIX message queues to solve this since they don't require mutexes/locks.
Looking at the mq_open() call, I have to specify permissions and the path to the queue. This leads me to two questions.
Is there a well-known convention for specifying the file path? I was just going to dump the queues in the same folder as the executable.
In terms of permissions, I was going to use 0600, but I want to restrict this even further to prevent other processes from accessing the queues (I'm sharing data between threads and not processes). Given that the queue is "just" a file, can I use flock() with LOCK_EX to prevent accesses from other processes?
Thanks in advance.

Regarding your question 1, look at the implementation notes for mq_open on your system. At least on Linux and FreeBSD, message queue names must start with a slash but must not contain any other slashes.
So while the name of a message queue looks like a path, it might or might not be an actual inode in a filesystem, depending on the implementation. According to mq_overview(7), Linux uses a virtual filesystem for message queues, which may or may not be mounted.
In view of this, question 2 might be moot. You'd have to run a test or check the kernel source if locking of a file in /dev/mqueue is actually even supported and if it accomplishes what you want.
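One way to keep other processes away from the queue without any file locking is to remove the queue's name as soon as it has been opened. The following is a minimal sketch under that assumption, not a definitive implementation: open_private_queue and the "/myapp-<pid>" name are made up for illustration. mq_unlink() removes only the name, and the queue itself stays alive until the last descriptor is closed, so the threads of the process can keep sharing the returned descriptor. There is still a short window between mq_open() and mq_unlink() in which a process of the same user could open the queue by name.

```c
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a queue that is reachable only through the
 * returned descriptor. Returns (mqd_t)-1 on failure. */
mqd_t open_private_queue(long max_msgs, long msg_size)
{
    char name[64];
    struct mq_attr attr;

    assert(max_msgs > 0 && msg_size > 0);
    memset(&attr, 0, sizeof attr);
    attr.mq_maxmsg = max_msgs;
    attr.mq_msgsize = msg_size;

    /* The name must start with '/' and contain no further slashes.
     * The PID is only there to avoid name collisions. */
    snprintf(name, sizeof name, "/myapp-%ld", (long)getpid());

    /* O_EXCL makes mq_open fail if the name already exists. */
    mqd_t q = mq_open(name, O_CREAT | O_EXCL | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1)
        return (mqd_t)-1;

    /* Drop the name immediately; nobody can mq_open it by name afterwards. */
    mq_unlink(name);
    return q;
}
```

On older glibc versions this needs to be linked with -lrt. The sizes passed in must stay within the system limits described in mq_overview(7).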

I would not bother protecting the queue from outside processes.
Since flock is only advisory, not mandatory, it will not do you any good.
Also, I am not sure that flock will even work on queue descriptors.
Running your service as its own user will, of course, keep other users' processes from accessing a queue created with mode 0600.
I would however ensure on startup only one service can work on a queue at a time.
You could use a PID lock file or D-Bus to do so.
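A PID lock could be sketched as follows. This is only an illustration of the idea, assuming a lock file on a local filesystem; acquire_pid_lock and the lock file path are hypothetical. The service takes a non-blocking exclusive flock() on the file at startup, writes its PID into it, and keeps the descriptor open for as long as it runs. A second instance fails to get the lock and can exit.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns an open, locked fd if this is the only
 * running instance, or -1 if another instance holds the lock (or on error).
 * The caller keeps the fd open for the lifetime of the service. */
int acquire_pid_lock(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* LOCK_NB: fail immediately instead of waiting for the other instance. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }

    /* Record our PID so that others can see who holds the lock. */
    char pid[32];
    int n = snprintf(pid, sizeof pid, "%ld\n", (long)getpid());
    if (ftruncate(fd, 0) != 0 || write(fd, pid, (size_t)n) != n) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The kernel drops the lock when the process exits, even after a crash, so a stale file left on disk does not block the next start.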

Related

Cross-platform synchronization primitives which allow determining which PID is using them

I need to design a wrapper for a process synchronization primitive which acts like a semaphore with, let's say, a limit of 1 (so that only one client can have it locked at a time). If this were the only requirement, I could just use named semaphores. But when a client cannot lock the primitive, I'd also like to know who actually holds the lock, ideally the locking process ID. I see how I can achieve this on POSIX systems with semctl and GETPID, but Windows does not expose anything like that. I am also aware that I can easily achieve this with files (e.g. opening a known file with shared read and non-shared write permissions: when locking, the client creates that file and writes its PID so that the others can read it), but if possible I'd like to use actual OS API primitives instead of the filesystem. Is this possible?
In Windows there is the Wait Chain Traversal which allows you to see who has locked what.

How can I serialize access to a directory in Linux?

Let's say 4 simultaneous processes are running on a processor, and data needs to be copied from an HDFS (used with Spark) file system to a local directory. I want only one process to copy that data, while the other processes just wait for the first process to finish copying.
So, basically, I want some kind of semaphore mechanism, where every process tries to obtain the semaphore in order to copy the data, but only one process gets it. All processes that failed to acquire the semaphore would then just wait for it to be cleared (the process that acquired the semaphore would clear it once it's done copying), and when it's cleared they know the data has already been copied. How can I do that in Linux?
There's a lot of different ways to implement semaphores. The classical, System V semaphore way is described in man semop and more broadly in man sem_overview.
You might still want to do something more easily scalable and modern. Many IPC frameworks (Apache has one or two of those, too!) have atomic IPC operations. These can be used to implement semaphores, but I'd be very very careful.
Generally, I regularly encourage people who write multi-process or multi-threaded applications to use C++ instead of C. It's often simpler to see where a shared state must be protected if your state is nicely encapsulated in an object which might do its own locking. Hence, I urge you to have a look at Boost's IPC synchronization mechanisms.
In addition to Marcus Müller's answer, you could use some file locking mechanism to synchronize.
File locking might not work very well on networked or remote file systems. You should use it on a locally mounted file system (e.g. Ext4, BTRFS, ...), not on a remote one (e.g. NFS).
For example, you might adopt the convention that your directory contains (or else you'll create it) some .lock file and use an advisory lock flock(2) (or a POSIX lockf(3)) on that .lock file before accessing the directory.
If using flock, you could even lock the directory directly....
The advantage of using such a file lock approach is that you could code shell scripts using flock(1)
And on Linux, you might also use inotify(7) (e.g. to be notified when some file is created in that directory)
Notice that most solutions are (advisory, so) presupposing that every process accessing that directory is following some convention (in other words, without more precautions like using flock(1), a careless user could access that directory - e.g. with a plain cp command -, or files under it, while your locking process is accessing the directory). If you don't accept that, you might look for mandatory file locking (which is a feature of some Linux kernels & filesystems, AFAIK it is sort-of deprecated).
BTW, you might read more about ACID properties and consider using some database, etc...
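The .lock file convention described above could be sketched roughly like this. It is an illustration under assumptions, not a definitive implementation: populate_once, the ".done" marker file, and fake_copy (a stand-in for the real copy from HDFS) are all hypothetical. Every process calls populate_once(); the first one to get the lock does the copy and leaves the marker, and the others block in flock() until it is finished and then see the marker.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stand-in for the real copy step; it only counts how often it ran. */
int copy_calls = 0;
int fake_copy(const char *dir) { (void)dir; copy_calls++; return 0; }

/* Hypothetical helper: runs populate(dir) at most once across all
 * cooperating processes. Returns 0 once the data is in place, -1 on error. */
int populate_once(const char *dir, int (*populate)(const char *))
{
    char lock_path[1024], done_path[1024];

    assert(dir != NULL && populate != NULL);
    if (snprintf(lock_path, sizeof lock_path, "%s/.lock", dir) >= (int)sizeof lock_path ||
        snprintf(done_path, sizeof done_path, "%s/.done", dir) >= (int)sizeof done_path)
        return -1;

    /* Create the directory if it does not exist yet. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    int fd = open(lock_path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Blocks until whoever currently holds the lock has released it. */
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    if (access(done_path, F_OK) != 0) {
        /* No marker yet: we are first, so copy and then leave the marker. */
        rc = populate(dir);
        if (rc == 0) {
            int marker = open(done_path, O_WRONLY | O_CREAT, 0644);
            if (marker < 0)
                rc = -1;
            else
                close(marker);
        }
    }

    flock(fd, LOCK_UN);
    close(fd);
    return rc;
}
```

Because the lock is advisory, this only works if every process that touches the directory goes through the same helper.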

What are the disadvantages of Linux's message queues?

I am working on a message queue used for communication among processes on embedded Linux. I am wondering why I shouldn't simply use the message queues provided by Linux, namely:
msgctl, msgget, msgrcv, msgsnd,
instead of creating shared memory and synchronizing with a semaphore?
What's the disadvantage of using this set of functions directly on a business embedded product?
The functions msgctl(), msgget(), msgrcv(), and msgsnd() are the 'System V IPC' message queue functions. They'll work for you, but they're fairly heavy-weight. They are standardized by POSIX.
POSIX also provides a more modern set of functions, mq_close(), mq_getattr(), mq_notify(), mq_open(), mq_receive(), mq_send(), mq_setattr(), and mq_unlink() which might be better for you (such an embarrassment of riches).
However, you will need to check which, if either, is installed on your target platforms by default. Especially in an embedded system, it could be that you have to configure them, or even get them installed because they aren't there by default (and the same might be true of shared memory and semaphores).
The primary advantage of either set of message facilities is that they are pre-debugged (probably) and therefore have concurrency issues already resolved - whereas if you're going to do it for yourself with shared memory and semaphores, you've got a lot of work to do to get to the same level of functionality.
So, (re)use when you can. If it is an option, use one of the two message queue systems rather than reinvent your own. If you eventually find that there is a performance bottleneck or something similar, then you can investigate writing your own alternatives, but until then — reuse!
System V message queues (the ones manipulated by the msg* system calls) have a lot of weird quirks and gotchas. For new code, I'd strongly recommend using UNIX domain sockets.
That being said, I'd also strongly recommend message-passing IPC over shared-memory schemes. Shared memory is much easier to get wrong, and tends to go wrong much more catastrophically.
Message passing is great for small data chunks and where immutability needs to be maintained, as message queues copy data.
A shared memory area does not copy data on send/receive and can be more efficient for larger data sets at the tradeoff of a less clean programming model.
The disadvantages of message queues are minuscule: some system call and copying overhead, which amounts to nothing for most applications. The benefits far outweigh that overhead. Synchronization is automatic, and they can be used in a variety of ways: blocking, non-blocking, and, since on Linux POSIX message queue descriptors are implemented as file descriptors, even in select() calls for multiplexing. In the POSIX variety, which you should be using unless you have a really compelling need to use SysV queues, you can even have a thread started or a signal delivered when a message arrives (see mq_notify()). And best of all, they are fully debugged.
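The select() point could look roughly like the sketch below. It relies on Linux treating a POSIX mqd_t as a file descriptor, which POSIX itself does not guarantee, so it is not portable. wait_readable and select_demo are hypothetical names, and the demo queue is created and unlinked only to have something to wait on.

```c
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <unistd.h>

/* Linux-specific: passes the mqd_t to select() as if it were a plain fd.
 * Returns 1 if a message can be received, 0 on timeout, -1 on error. */
int wait_readable(mqd_t q, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_ZERO(&rfds);
    FD_SET((int)q, &rfds);

    int n = select((int)q + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/* Small self-check: an empty queue times out, a non-empty one is readable.
 * Returns 0 if both observations hold, -1 otherwise. */
int select_demo(void)
{
    char name[64];
    struct mq_attr attr = { .mq_maxmsg = 4, .mq_msgsize = 32 };

    snprintf(name, sizeof name, "/select-demo-%ld", (long)getpid());
    mqd_t q = mq_open(name, O_CREAT | O_EXCL | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1)
        return -1;
    mq_unlink(name);

    int before = wait_readable(q, 50);   /* queue is empty */
    int sent = mq_send(q, "x", 1, 0);
    int after = wait_readable(q, 50);    /* one message is queued */

    mq_close(q);
    return (sent == 0 && before == 0 && after == 1) ? 0 : -1;
}
```

In a real program the same fd_set would typically also contain sockets or pipes, which is the reason to multiplex in the first place.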
Message queues and shared memory are different, and it is up to the programmer and the requirements to select which to use. With shared memory you have to be careful when reading and writing, and the processes must be synchronized, so the order of execution is very important. Shared memory by itself offers no way to tell whether the value being read is newly written or an older one, and there is no built-in mechanism to wait.
With message queues, by contrast, there are predefined functions that make your life easy.

Inter-program communication for an arbitrary number of programs

I am attempting to have a bunch of independent programs intelligently allocate shared resources among themselves. However, I could have only one program running, or could have a whole bunch of them.
My thought was to mmap a virtual file in each program, but the concurrency is killing me. Mutexes are obviously ineffective because each program could have a lock on the file and be completely oblivious of the others. However, my attempts to write a semaphore have all failed, since the semaphore would be internal to the file, and I can't rely on only one thing writing to it at a time, etc.
I've seen quite a bit about named pipes, but they don't seem to me to be a practical solution for what I'm doing, since I don't know how many other programs there will be, if any, and have no way of identifying which programs are participating in my resource-sharing operation.
You could use a UNIX-domain socket (AF_UNIX) - see man 7 unix.
When a process starts up, it tries to bind() a well-known path. If the bind() succeeds then it knows that it is the first to start up, and becomes the "resource allocator". If the bind() fails with EADDRINUSE then another process is already running, and it can connect() to it instead.
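The bind-or-connect startup described above might be sketched like this. It is a minimal illustration, and join_or_lead is a hypothetical name. One caveat that the sketch does not handle: if the allocator crashes, its socket file stays behind, bind() keeps failing with EADDRINUSE, and connect() then fails with ECONNREFUSED, so a real program would need some cleanup strategy for stale socket files.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper. On success returns a socket fd and sets
 * *is_allocator: 1 means we bound the path and are now listening,
 * 0 means we connected to an already running allocator.
 * Returns -1 on error. */
int join_or_lead(const char *path, int *is_allocator)
{
    struct sockaddr_un addr;

    assert(path != NULL && is_allocator != NULL);
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        /* We are the first process: become the resource allocator. */
        if (listen(fd, 16) != 0) {
            close(fd);
            return -1;
        }
        *is_allocator = 1;
        return fd;
    }

    if (errno != EADDRINUSE) {
        close(fd);
        return -1;
    }

    /* Somebody else owns the path: talk to that process instead. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    *is_allocator = 0;
    return fd;
}
```

The allocator would then accept() connections on its fd, which also gives it one socket per participating program and so a way to tell them apart.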
You could also use a dedicated resource allocator process that always listens on the path, and arbitrates resource requests.
It's not entirely clear what you're trying to do, but personally my first thought would be to use D-Bus. It should be easy enough within that framework for your processes/programs to register/announce themselves and enumerate/signal other registered processes, and/or to create a central resource arbiter and communicate with it. It is also readily available on any system with GNOME or KDE installed.

What have you used sysv/posix message queues for?

I've never seen any project or anything utilizing posix or sysv message queues - and being curious, what problems or projects have you guys used them for ?
I had a series of commands that needed to be executed in order, but the main program flow did not depend on their completion so I queued them up and passed them to another process via a System V message queue to be executed independently of the main program. Since message queues provide an asynchronous communications protocol, they were a good fit for this task.
To be honest, I used System V message queues because I had never used them before and I wanted to. I'm sure there are other IPC methods I could have used.
It's been a while since I've done any real VxWorks programming, but you can also find message queues used in VxWorks applications. According to the VxWorks Application Programmer's Guide, the primary intertask communication mechanism within a single CPU is message queues. VxWorks uses two message queue subroutine libraries (POSIX and VxWorks).
I once wrote a text-mode I/O generator utility that had one thread in charge of updating the UI and a number of worker threads to do the actual I/O work. When a worker thread completed an I/O, it sent an update message to the UI thread. I implemented this message system using a POSIX message queue.
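A design like this might look roughly as follows. This is a sketch of the general pattern, not the original utility: struct update, worker, run_workers, and the queue name are hypothetical, and the workers send a fixed dummy value instead of doing real I/O. Each worker thread posts one update, and the calling thread plays the role of the UI thread and receives them.

```c
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define WORKERS 3

/* Hypothetical message layout for a progress update. */
struct update { int worker_id; int bytes_done; };

static mqd_t ui_queue;

static void *worker(void *arg)
{
    struct update u = { .worker_id = (int)(long)arg, .bytes_done = 4096 };
    /* mq_send copies the struct into the queue, so no mutex is needed. */
    mq_send(ui_queue, (const char *)&u, sizeof u, 0);
    return NULL;
}

/* Starts the workers and drains their updates on the calling ("UI") thread.
 * Returns the number of updates received, or -1 if the queue cannot be made. */
int run_workers(void)
{
    char name[64];
    struct mq_attr attr = { .mq_maxmsg = 8, .mq_msgsize = sizeof(struct update) };
    pthread_t tid[WORKERS];
    int started = 0, received = 0;

    snprintf(name, sizeof name, "/ui-updates-%ld", (long)getpid());
    ui_queue = mq_open(name, O_CREAT | O_EXCL | O_RDWR, 0600, &attr);
    if (ui_queue == (mqd_t)-1)
        return -1;
    mq_unlink(name);  /* threads share the descriptor; the name is not needed */

    for (long i = 0; i < WORKERS; i++)
        if (pthread_create(&tid[started], NULL, worker, (void *)i) == 0)
            started++;

    /* Blocks in mq_receive until each started worker has reported. */
    for (int i = 0; i < started; i++) {
        struct update u;
        if (mq_receive(ui_queue, (char *)&u, sizeof u, NULL) == (ssize_t)sizeof u) {
            assert(u.worker_id >= 0 && u.worker_id < WORKERS);
            received++;
        }
    }

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    mq_close(ui_queue);
    return received;
}
```

The receive buffer has to be at least mq_msgsize bytes, which is why the queue is created with a message size equal to sizeof(struct update).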
Why implement it like this? It sounded like a good idea at the time, and I was curious about how they worked. I figured I could solve the problem and learn something at the same time. There were many different techniques I could have used, and I don't suppose there was any profound reason why I chose this technique. I didn't realize it until later, but I was glad I used a POSIX queue when I had to port the utility to another system (it was also POSIX compliant, so I didn't have to worry about porting external libraries to get my app to run).
You can certainly use it for IPC, because it is an IPC mechanism. With it you can write multi-process event-processing applications in which all of the processes use the queue and each one waits for a particular type of message (a particular event to occur). When the message arrives, that process takes it, processes it, and puts the result back into the queue so that another process can use it.
I once wrote such an application using message queues. They are pretty easy to work with and do not need inter-process synchronization mechanisms such as semaphores. You can use them in place of shared memory or memory-mapped files as well: in situations where all you need is to send a structure or some kind of packed data to other processes, message queues are far easier to use than any other IPC mechanism.
This book contains all information you need to know about Message Queues and other IPC mechanisms in Linux.

Resources