Inter-program communication for an arbitrary number of programs - c

I am attempting to have a bunch of independent programs intelligently allocate shared resources among themselves. However, I could have only one program running, or could have a whole bunch of them.
My thought was to mmap a virtual file in each program, but the concurrency is killing me. Mutexes are obviously ineffective because each program could have a lock on the file and be completely oblivious of the others. However, my attempts to write a semaphore have all failed, since the semaphore would be internal to the file, and I can't rely on only one thing writing to it at a time, etc.
I've seen quite a bit about named pipes, but they don't seem to me to be a practical solution for what I'm doing, since I don't know how many other programs there will be, if any, and I have no way of identifying which programs are participating in my resource-sharing operation.

You could use a UNIX-domain socket (AF_UNIX) - see man 7 unix.
When a process starts up, it tries to bind() a well-known path. If the bind() succeeds then it knows that it is the first to start up, and becomes the "resource allocator". If the bind() fails with EADDRINUSE then another process is already running, and it can connect() to it instead.
You could also use a dedicated resource allocator process that always listens on the path, and arbitrates resource requests.
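The bind-or-connect startup described above could look roughly like this. It is only a sketch: the path `/tmp/resource-allocator.sock` and the function name `join_allocator` are made up for illustration, and the protocol spoken over the socket is left out entirely.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical rendezvous path; any path all participants agree on will do. */
#define ALLOCATOR_PATH "/tmp/resource-allocator.sock"

/*
 * Try to become the allocator by binding `path`. On success returns a
 * listening socket and sets *is_allocator to 1. If the path is already
 * bound, connects to the existing allocator instead, sets *is_allocator
 * to 0 and returns the connected socket. Returns -1 on any other error.
 */
int join_allocator(const char *path, int *is_allocator)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        /* We were first: start accepting requests from later arrivals. */
        if (listen(fd, 16) < 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        *is_allocator = 1;
        return fd;
    }

    if (errno != EADDRINUSE) {
        close(fd);
        return -1;
    }

    /* Somebody else owns the path: become a client of that process. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    *is_allocator = 0;
    return fd;
}
```

Each program would call `join_allocator(ALLOCATOR_PATH, &is_allocator)` once at startup. One thing this sketch does not handle: if the allocator crashes without unlinking the path, the stale socket file makes bind() fail with EADDRINUSE while connect() is refused, so a real program needs some recovery step for that case.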

It's not entirely clear what you're trying to do, but personally my first thought would be to use D-Bus. It should be easy enough within that framework for your processes to register or announce themselves and to enumerate or signal other registered processes, and/or to create a central resource arbiter and communicate with it. It is also readily available on any system with GNOME or KDE installed.

Related

When a process writes to a file

Generally, when a process writes to a file, e.g. a Python script running open('file', 'w').write('text'), what are the exact events that occur? By that I mean something along the lines of 'process A loads file from hard disk to RAM, process B changes content then ...'. I've read about IPC and now I'm trying to dig deeper and understand more on the subject of processes. I couldn't find a thorough explanation on the subject, so if you could find one or explain I'd really appreciate it.
The example of "a python script running open('file', 'w').write('text')" is heavily OS-dependent. The only processes involved here are the process running the Python interpreter, which, e.g. on Linux, sometimes executes in user space and sometimes in kernel space, and possibly some kernel-only processes, with any IPC, if required, happening inside the kernel. Nothing requires other processes to take part: everything down to the disk access itself could be handled in the context of the user's process while it runs in kernel mode, but in practice other processes may be involved. This is OS- and even driver-specific behavior.
In this particular example (which isn't great, because it relies on CPython automatically closing the file once the file object is no longer referenced), the Python process makes one system call to open the file, one to write to it, and one to close it. These are all blocking -- that is, they do not return until the results are ready. When the process blocks, it is put on a queue waiting for some event to occur to make it ready to run again.
The opposite of this is asynchronous I/O, which can be performed by polling, by callbacks, or by the select() call, which can block until any one of a number of events has occurred.
But when most people talk about IPC, they are not usually talking about communication between or with kernel processes. Rather, they are talking about communication between multiple user processes and/or threads, using semaphores, mutexes, named pipes, etc. A good introduction to these sorts of things would be any tutorial information you can find on using pthreads, or even the Python threading and multiprocessing modules. There are examples there for several simple cases.
The primary difference between processes and threads on Linux is that threads share an address space and processes each have their own address space. Python itself adds the wrinkle of the GIL, which limits the utility of threads in Python somewhat.
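To make the three system calls mentioned above visible, here is roughly what the Python one-liner boils down to when written directly in C on a POSIX system. The function name `write_text_file` is made up for this sketch, and the interpreter's own buffering is ignored.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `text` to `path`, creating or truncating the file, roughly like
 * open(path, 'w').write(text) followed by an explicit close.
 * Returns 0 on success, -1 on error.
 */
int write_text_file(const char *path, const char *text)
{
    size_t len = strlen(text);
    size_t done = 0;

    /* System call 1: open. Blocks until the kernel hands back a descriptor. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* System call 2: write. It may accept fewer bytes than asked, so loop. */
    while (done < len) {
        ssize_t n = write(fd, text + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* System call 3: close. */
    return close(fd);
}
```

Note that a successful write() normally only means the kernel has accepted the data; when it actually reaches the disk is decided later by the kernel, which is part of why the full story is OS-specific.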

POSIX Message Queues For Passing Data Between Pthreads

I have a Linux C program where I'm passing data between threads. I was looking into using POSIX message queues to solve this since they don't require mutexes/locks.
Looking at the mq_open() call, I have to specify permissions and the path to the queue. This leads me to two questions.
1. Is there a well-known convention for specifying the filepath? I was just going to dump the queues in the same folder as the executable.
2. In terms of permissions, I was going to use 0600, but I want to restrict this even further to prevent other processes from accessing the queues (I'm sharing data between threads and not processes). Given that the queue is "just" a file, can I use flock() with LOCK_EX to prevent accesses from other processes?
Thanks in advance.
Regarding your question 1, look at the implementation notes for mq_open on your system. At least on Linux and FreeBSD, message queue names must start with a slash but must not contain any other slashes.
So while the name of a message queue looks like a path, it might or might not be an actual inode in a filesystem, depending on the implementation. According to mq_overview(7), Linux uses a virtual filesystem for message queues, which may or may not be mounted.
In view of this, question 2 might be moot. You'd have to run a test or check the kernel source to find out whether locking a file in /dev/mqueue is even supported and whether it accomplishes what you want.
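The naming rule above is easy to check before calling mq_open(). A small sketch (the helper name `is_portable_mq_name` is invented here, and it deliberately ignores any length limit the implementation may impose):

```c
#include <string.h>

/*
 * Returns 1 if `name` follows the rule described above (a leading slash
 * followed by at least one character and no further slashes), else 0.
 */
int is_portable_mq_name(const char *name)
{
    if (name == NULL || name[0] != '/')
        return 0;
    if (name[1] == '\0')            /* "/" alone names nothing */
        return 0;
    return strchr(name + 1, '/') == NULL;
}
```

So a name like "/worker_queue" passes, while "worker_queue" and "/tmp/worker_queue" do not. In other words, putting the queue "in the same folder as the executable" is not something the name can express.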
I would not bother protecting the queue from outside processes.
Since flock is only advisory, not mandatory, it will not do you any good.
Also, I am not sure that flock will even work on queue descriptors.
Running your service as its own user will, of course, keep other users' processes from being able to access a queue created with mode 0600.
I would, however, ensure on startup that only one instance of the service can work on a queue at a time.
You could use PID locking or D-Bus to do so.
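For the PID-locking idea, one common pattern is a lock file that the service keeps open and locked for as long as it runs. The sketch below assumes Linux/BSD flock() semantics; the function name `acquire_pid_lock` is made up, and the caller picks the path. An advisory lock is enough here because only instances of the same service need to respect it.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Take an exclusive, non-blocking lock on `path` and record our PID in it.
 * Returns the open descriptor (keep it open for the life of the service),
 * or -1 if another instance holds the lock or some other error occurred.
 */
int acquire_pid_lock(const char *path)
{
    char buf[32];
    int len;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Fails with EWOULDBLOCK if another instance already holds the lock. */
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }

    /* Record who owns the lock, purely for humans inspecting the file. */
    len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
    if (ftruncate(fd, 0) < 0 || write(fd, buf, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The lock disappears automatically when the descriptor is closed or the process dies, so a crashed service does not leave a stale lock behind, only a stale file.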

How to portably share a variable between threads/processes?

I have a server that spawns a new process or thread for every incoming request and I need to read and write a variable defined in this server from both threads and processes. Since the server program needs to work both on UNIX and Windows I need to share the variable in a portable way, but how do I do it?
I need to use the standard C library or the native syscalls, so please don’t suggest third party libraries.
Shared memory is operating system specific. On Linux, consider reading shm_overview(7) and (since with shared memory you always need some way to synchronize) sem_overview(7).
Of course you need to find out the similar (but probably not equivalent) Windows function calls.
Notice that threads are not the same as processes. Threads by definition share a common single address space. With threads, the main issue is then mostly synchronization, often using mutexes (e.g. pthread_mutex_lock etc.). On Linux, read a pthread tutorial and pthreads(7).
Recall that several libraries (glib, QtCore, Poco, ...) provide useful abstractions above operating system specific functionalities, but you seem to want to avoid them.
Finally, I am not at all sure that sharing a variable like you ask is the best way to achieve your goals (I would definitely consider some message passing approach with an event loop: pipe(7) & poll(2), perhaps with a textual protocol à la JSON).
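As a rough illustration of the pipe(7) & poll(2) suggestion, the receiving side of such an event loop could wait for a message like this. This covers the UNIX half only; the Windows side would need a different mechanism. The function name `read_when_ready` is invented for the sketch.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to `timeout_ms` milliseconds for data on `fd`, then read at most
 * `cap` bytes into `buf`. Returns the number of bytes read, 0 on timeout
 * or end of file, and -1 on error.
 */
ssize_t read_when_ready(int fd, char *buf, size_t cap, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                   /* nothing arrived in time */
    return read(fd, buf, cap);
}
```

With this approach the variable lives in exactly one process (or thread), and the others send it short textual messages such as "counter=1\n" over a pipe created with pipe() instead of touching the variable directly.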

setting up IPC between unrelated processes

I would like to inject a shared library into a process (I'm using ptrace() to do that part) and then be able to get output from the shared library back into the debugger I'm writing using some form of IPC. My instinct is to use a pipe, but the only real requirements are:
I don't want to store anything on the filesystem to facilitate the communication as it will only last as long as the debugger is running.
I want a portable Unix solution (so Unix-standard syscalls would be ideal).
The problem I'm running into is that as far as I can see, if I call pipe() in the debugger, there is no way to pass the "sending" end of the pipe to the target process, and vice versa with the receiving end. I could set up shared memory, but I think that would require creating a file somewhere so I could reference the memory segment from both processes. How do other debuggers capture output when they attach to a process after it has already begun running?
I assume you need a debugging system for your application code. In my experience, this kind of problem is tackled with the system design explained below. (My experience is in C++, but I think the same should hold for a C-based system.)
Have a logger system (a separate process). It contains a logger manager and the logging code, which takes responsibility for writing the log to disk.
Each application instance (a process running on Unix) communicates with this process over sockets, so you can define your own messaging protocol for talking to the logger system.
Later, give each application a switch that turns logging on or off, so that you can have a tool that sends a signal to the process to toggle message logging.
At a high level, this is the most generic way to develop a logging system. If you need any more information, leave a comment and I will try to answer.
Using better search terms showed me this question is a dup of these guys:
Can I share a file descriptor to another process on linux or are they local to the process?
Can I open a socket and pass it to another process in Linux
How to use sendmsg() to send a file-descriptor via sockets between 2 processes?
The top answers were what I was looking for. You can use a Unix-domain socket to hand a file descriptor off to a different process. This could work either from debugger to library or vice versa, but is probably easier to do from debugger to library because the debugger can write the socket's address into the target process while it injects the library.
However, once I pass the socket's address into the target process, I might as well just use the socket itself instead of using a pipe in addition.
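For reference, handing a descriptor across a Unix-domain socket with sendmsg() and SCM_RIGHTS looks roughly like the following. The function names `send_fd` and `recv_fd` are made up for this sketch, and error handling is kept to a minimum.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` over the connected Unix-domain socket `sock`. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';                     /* one byte of ordinary data */
    struct iovec iov = { &dummy, 1 };
    union {
        struct cmsghdr align;             /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* The descriptor travels as ancillary data of type SCM_RIGHTS. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from `sock`; returns it, or -1 on failure. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { &dummy, 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;                            /* a new descriptor in this process */
}
```

The receiver gets its own descriptor number referring to the same open file, so the debugger could create a pipe, pass the write end to the injected library this way, and keep reading from the read end.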

Any possible solution to capture process entry/exit?

I would like to capture process entry and exit and maintain a log for the entire system (probably from a daemon process).
One approach is to read the /proc file system periodically and maintain the list, as I do not see any possibility of registering inotify for /proc. Also, for desktop applications, I could get the help of dbus, and whenever a client registers with the desktop, I can capture it.
But for non-desktop applications, I don't know how to go ahead apart from reading /proc periodically.
Kindly provide suggestions.
You mentioned /proc, so I'm going to assume you've got a Linux system there.
Install the acct package. The lastcomm command shows all processes executed and their run duration, which is what you're asking for. Have your program "tail" /var/log/account/pacct (you'll find its structure described in acct(5)) and voila. It's just notification on termination, though. To detect start-ups, you'll need to dig through the system process table periodically, if that's what you really need.
Maybe the safer way to go is to create a SuperProcess that acts as a parent and forks children. Every time a child process stops, the parent can find out. That is just a thought in case that architecture fits your needs.
Of course, if the parent process is not doable then you must go to the kernel.
If you really want to log all process entries and exits, you'll need to hook into the kernel, which means modifying the kernel or at least writing a kernel module. The Linux Security Modules framework will certainly allow hooking into entry, but I am not sure whether it's possible to hook into exit.
If you can live with an occasional process slipping past (if the binary is linked statically or somehow avoids your environment setting), there is a simple option: preloading a library.
The Linux dynamic linker has a feature: if the environment variable LD_PRELOAD names a shared library, it will force-load that library into the starting process. So you can create a library that will, in its static initialization, tell the daemon that a process has started, and do it in such a way that the daemon will find out when the process exits.
Static initialization is most easily done by creating a global object with a constructor in C++. The dynamic linker will ensure the static constructor runs when the library is loaded.
It will also try to make the corresponding destructor run when the process exits, so you could simply log the process in the constructor and destructor. But it won't work if the process dies of signal 9 (KILL), and I am not sure what other signals will do.
So instead you should have a daemon, tell it about the process start in the constructor, and make sure it will notice on its own when the process exits. One option that comes to mind is opening a unix-domain socket to the daemon and leaving it open. The kernel will close it when the process dies and the daemon will notice. You should take some precautions to use a high descriptor number for the socket, since some processes may assume the low descriptor numbers (3, 4, 5) are free and dup2 to them. And don't forget to allow more file descriptors for the daemon and for the system in general.
Note that by just polling the /proc filesystem you would probably miss the great number of processes that only live for a split second. There are really many of them on unix.
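Two pieces of the daemon approach described above can be sketched in C: moving the library's socket to a high descriptor number, and the daemon noticing that the other end has gone away. The names `move_fd_high`, `peer_has_exited` and the floor of 200 are assumptions made for this example; the floor must stay below the process's open-file limit.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical floor for the library's descriptor; chosen arbitrarily. */
#define HIGH_FD_FLOOR 200

/*
 * Library side: move `fd` to a descriptor number >= HIGH_FD_FLOOR so that
 * the host program's own use of low numbers does not clobber it.
 * Returns the new descriptor, or -1 on error (the original stays open).
 */
int move_fd_high(int fd)
{
    int high = fcntl(fd, F_DUPFD, HIGH_FD_FLOOR);
    if (high < 0)
        return -1;
    close(fd);
    return high;
}

/*
 * Daemon side: returns 1 if the peer of stream socket `fd` has closed its
 * end (for instance because the process died), 0 if it still looks
 * connected, and -1 on any other error. Never blocks.
 */
int peer_has_exited(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;                   /* end of file: the other end is closed */
    if (n > 0)
        return 0;                   /* unread data is still queued */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                   /* open, just nothing to read */
    if (errno == ECONNRESET)
        return 1;                   /* closed abruptly */
    return -1;
}
```

A real daemon would more likely put all client sockets into one poll() or epoll loop and treat a readable socket that returns end of file as the exit notification, but the signal it relies on is the same as in `peer_has_exited`.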
Here is an outline of the solution that we came up with.
We created a program that read a configuration file of all possible applications that the system is able to monitor. Through its command-line interface you were able to start or stop programs. The program itself stored a table in shared memory that marked applications as running or not. An interface that anybody could access could get the status of these programs. This program also had an alarm system that could either email/page or set off an alarm.
This solution does not require any changes to the kernel and is therefore a less painful solution.
Hope this helps.

Resources