Runtime-detecting nommu Linux unobtrusively - c

I'm looking for a reliable, unobtrusive runtime check a process can make for whether it's running on Linux without mmu. By unobtrusive, I mean having minimal or no side effects on process state. For example, getting EINVAL from fork would be one indication, but would create a child process if the test failed. Attempting to cause a fault and catch a signal is out of the question since it involves changing global signal dispositions. Anything involving /proc or /sys would be unreliable since they may not be mounted/visible (e.g. in chroot or mount namespace).
Failure of mprotect with ENOSYS seems to be reliable, and can be done without any side effects beyond the need to map a test page to attempt it on. But I'm not sure if it's safe to rely on this.
Are there any better/recommended ways to go about this?
Before anyone tries to challenge the premise and answer that this is known statically at compile time, no, it's not. Assuming you build a position-independent executable for an ISA level supported by both mmu-ful and mmu-less variants of the architecture, it can run on either. (I'm the author of the kernel commit that made this work.)
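The mprotect check described above could be sketched roughly as follows. This is only an illustration of the idea in the question, not a confirmation that ENOSYS from mprotect is a dependable signal; the function name probe_nommu and its return convention are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if mprotect fails with ENOSYS (taken here
 * as a hint of a nommu kernel), 0 if mprotect works, -1 if the test page
 * could not be mapped. errno is restored so the probe leaves no trace. */
int probe_nommu(void)
{
    int saved_errno = errno;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0); /* the page size is always positive on Linux */

    /* The only side effect: one temporary anonymous page. */
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        errno = saved_errno;
        return -1;
    }

    /* Request the protection the page already has, so nothing changes
     * even when the call succeeds. */
    int rc = mprotect(p, (size_t)page, PROT_READ | PROT_WRITE);
    int nommu = (rc != 0 && errno == ENOSYS);

    munmap(p, (size_t)page);
    errno = saved_errno;
    return nommu;
}
```

A caller would run probe_nommu() once at startup and cache the result; a return of -1 means the probe itself could not run and says nothing either way.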

Related

Linux timers with O_ASYNC?

The man page for open() states for O_ASYNC:
This feature is available only for terminals, pseudoterminals,
sockets, and (since Linux 2.6) pipes and FIFOs. See fcntl(2) for
further details.
But I've used linux timers with epoll() successfully and setting O_ASYNC with fcntl() on a timer fd does not return an error. Obviously, no signals are being sent either. My question is, is it possible to get O_ASYNC working with linux timers? Are there any examples online? I know about the POSIX alternative, but was hoping to avoid it.
We can see that F_SETFL calls setfl which calls a fasync function specific to the type of file.
By searching for fasync we can see how async support is implemented in many devices. Seems that it's not too complicated as mostly it only needs the device to store the async registration and send the signal (here is the fasync function implementation for this kind of file).
Going back to setfl we can notice that if the file type's fasync function is null, it just silently succeeds. This could be a bug, or it could be intentional. Let's assume it's a bug.
Now that the bug is in the kernel, there are probably programs relying on it, which would stop working if the bug was fixed. If a program did break and someone complained about it, the fix would get undone so the program would keep working, because Linus doesn't like to break programs. If it doesn't break any programs that actually exist (which is unlikely, in my opinion), it can be fixed.
Another option is to update the documentation.
Another option is to make it actually work.
My question is, is it possible to get O_ASYNC working with linux timers?
It's unlikely (but still possible) that any program is setting O_ASYNC on a timerfd since it doesn't work - so it's unlikely that it will break compatibility. And it looks like it's not terribly complicated to implement, based on the other examples. So, go ahead and write this patch and send it to the mailing list.
If you meant whether it's possible on today's kernels, without a patch, the answer is no, it is not. Here is the timerfd ops structure, and it has no entry for fasync.
Are there any examples online?
Yes, the examples are the source code for all the other kinds of files that support fasync.
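For reference, the behaviour described in the question can be reproduced with a small sketch like the one below: it sets O_ASYNC on a timerfd (reporting what fcntl returned) and then waits for the timer through epoll, which is the path that does work. The function name wait_timerfd_once and its parameters are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: arms a one-shot timerfd for delay_ms milliseconds,
 * waits for it with epoll, and returns the expiration count read from the
 * descriptor (-1 on error). The result of the F_SETFL/O_ASYNC call is
 * stored in *setfl_result so the caller can inspect it. */
long long wait_timerfd_once(long delay_ms, int *setfl_result)
{
    assert(delay_ms > 0); /* an all-zero it_value would disarm the timer */
    long long expirations = -1;
    int epfd = -1;

    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (tfd < 0)
        return -1;

    /* Per the question, this reports no error, yet no signal is sent. */
    int flags = fcntl(tfd, F_GETFL);
    int rc = (flags < 0) ? -1 : fcntl(tfd, F_SETFL, flags | O_ASYNC);
    if (setfl_result)
        *setfl_result = rc;

    struct itimerspec its = {0};
    its.it_value.tv_sec = delay_ms / 1000;
    its.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;
    if (timerfd_settime(tfd, 0, &its, NULL) != 0)
        goto out;

    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
        goto out;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) != 0)
        goto out;

    struct epoll_event ready;
    if (epoll_wait(epfd, &ready, 1, -1) == 1) {
        uint64_t count;
        if (read(tfd, &count, sizeof count) == (ssize_t)sizeof count)
            expirations = (long long)count;
    }
out:
    if (epfd >= 0)
        close(epfd);
    close(tfd);
    return expirations;
}
```

Calling wait_timerfd_once(10, &rc) blocks for about ten milliseconds; whatever rc ends up being, the notification arrives through epoll and not through a signal.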

Does umount asynchronously release the underlying device?

I have some code that umounts a file system on a device and then immediately removes the device from device-mapper using the DM_DEV_REMOVE ioctl command.
Sometimes, as part of a stress test, I run this code in a tight loop of:
create the device
mount the file system on the device
unmount the file system
remove the device
Often, when running this test over thousands of iterations, I will eventually get the errno EBUSY when trying to remove the device. The umount is always successful.
I have tried searching on this issue, but mostly what I find is people having issues with getting EBUSY when umounting, which is not the problem I am having.
The closest thing to being helpful that I could find is that in the man page for dmsetup it talks about using the --retry option as a workaround for udev rules opening up devices when you are trying to remove them. Unfortunately for me though, I have been able to confirm that udev does not have my device open when I am trying to remove it.
I have used the DM_DEV_STATUS command to check the open_count for my device. The open_count is always 1 before the umount. When my test succeeds, it is 0 after the umount; when it fails, it is still 1 after the umount.
Now, what I am trying to find out to root-cause my issue is, "Could my resource busy failure be caused by umount asynchronously releasing my device, thus creating a race condition?". I know that umount is supposed to be synchronous when it comes to the actual unmounting, but I couldn't find any documentation for whether releasing/closing the underlying device could occur asynchronously or not.
And, if it isn't umount holding a open handle to my device, are there any other likely candidates?
My test is running on a 3.10 kernel.
Historically, a system call blocked the calling process until the whole task was done (write(2) to a block device being the first major exception, for obvious reasons). The reason was that a process was needed to do the work, and the process that made the syscall was there for that purpose (and the CPU time could be charged to that user's account).
Nowadays, plenty of kernel threads handle work that is not tied to any process, and umount(2) could be one of the syscalls that defers some work to the background. (I think it doesn't, as umount(2) is not issued frequently enough to justify such a change in the code.)
But Linux is not a Unix descendant, so umount(2) could be implemented this way. I don't believe it is, though.
The umount(2) syscall normally succeeds, except when inodes on the filesystem are in use. That's not the case here. But the kernel may be busy with some heavy-duty work that requires it to allocate (non-swappable) kernel memory, and that allocation may fail. This could lead to the error you get. (Note that this is only a guess; I have not checked it in the code, so you had better look at the umount(2) implementation.)
There's another issue that could block your umount (or make it fail) if you have touched the filesystem in some way. Dependency-tracking code keeps filesystems in a consistent state across power failures (in Linux this is called ordered data; in BSD systems it is called soft updates), which means erased files are not freed immediately after unlink(2). This could block umount(2) (or make it fail) if some data has to be written to the filesystem before the actual unmount. But again, this should not be your case since, as you say, you don't modify the mounted filesystem.
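Whatever the root cause turns out to be, the dmsetup --retry workaround mentioned in the question amounts to retrying the removal while it fails with EBUSY. A minimal sketch of such a loop is shown below; retry_while_busy and busy_n_times are hypothetical names, and busy_n_times is only a stand-in for the real DM_DEV_REMOVE ioctl call so that the loop can be exercised without a device.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

/* An operation that returns 0 on success, or -1 with errno set. */
typedef int (*remove_fn)(void *ctx);

/* Hypothetical helper: call op until it succeeds, fails with something
 * other than EBUSY, or max_attempts tries have been used. Sleeps delay_ms
 * milliseconds between tries. Returns 0 on success, -1 otherwise (errno
 * is left as set by the last call to op). */
int retry_while_busy(remove_fn op, void *ctx, int max_attempts, long delay_ms)
{
    assert(op != NULL);
    for (int attempt = 1; ; attempt++) {
        if (op(ctx) == 0)
            return 0;
        if (errno != EBUSY || attempt >= max_attempts)
            return -1;
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
}

/* Stand-in for the real removal (assumption: the actual code would issue
 * the DM_DEV_REMOVE ioctl here). Fails with EBUSY *ctx times, then
 * succeeds. */
int busy_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

A retry loop hides the race without explaining it, so it is better treated as a stopgap for the stress test than as an answer to who still holds the device open.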

Any possible solution to capture process entry/exit?

I would like to capture process entry and exit and maintain a log for the entire system (probably with a daemon process).
One approach was to read /proc file system periodically and maintain the list, as I do not see the possibility to register inotify for /proc. Also, for desktop applications, I could get the help of dbus, and whenever client registers to desktop, I can capture.
But for non-desktop applications, I don't know how to go ahead apart from reading /proc periodically.
Kindly provide suggestions.
You mentioned /proc, so I'm going to assume you've got a linux system there.
Install the acct package. The lastcomm command shows all processes executed and their run duration, which is what you're asking for. Have your program "tail" /var/log/account/pacct (you'll find its structure described in acct(5)) and voila. It's just notification on termination, though. To detect start-ups, you'll need to dig through the system process table periodically, if that's what you really need.
Maybe the safer way to go is to create a SuperProcess that acts as a parent and forks the children. Every time a child process stops, the parent can detect it. That is just a thought, in case that architecture fits your needs.
Of course, if the parent process is not doable then you must go to the kernel.
If you really want to log every process entry and exit, you'll need to hook into the kernel, which means modifying the kernel or at least writing a kernel module. The "linux security modules" will certainly allow hooking into entry, but I am not sure whether it's possible to hook into exit.
If you can live with an occasional process slipping past (if the binary is linked statically or somehow avoids your environment setting), there is a simple option: preloading a library.
The Linux dynamic linker has a feature whereby, if the environment variable LD_PRELOAD (see this question) names a shared library, it force-loads that library into the starting process. So you can create a library that, in its static initialization, tells the daemon that a process has started, and does so in a way that lets the daemon find out when the process exits.
Static initialization is most easily done by creating a global object with a constructor in C++. The dynamic linker will ensure the static constructor runs when the library is loaded.
It will also try to run the corresponding destructor when the process exits, so you could simply log the process in the constructor and destructor. But that won't work if the process dies of signal 9 (KILL), and I am not sure what other signals will do.
So instead you should have a daemon; in the constructor, tell the daemon about the process start and make sure it will notice on its own when the process exits. One option that comes to mind is opening a unix-domain socket to the daemon and leaving it open. The kernel will close it when the process dies and the daemon will notice. You should take the precaution of using a high descriptor number for the socket, since some processes may assume the low descriptor numbers (3, 4, 5) are free and dup2 to them. And don't forget to allow more file descriptors for the daemon and for the system in general.
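A rough C sketch of the preload library described above follows. It uses the GCC/Clang constructor attribute in place of a C++ global object; the socket path /run/procmon.sock, the minimum descriptor number 200, and the function names are all assumptions made for this example, not part of any existing tool.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MONITOR_SOCKET_PATH "/run/procmon.sock" /* hypothetical daemon socket */
#define MONITOR_MIN_FD 200                      /* keep clear of low fds */

/* Hypothetical helper: connect to the daemon's unix-domain socket and move
 * the connection to a descriptor numbered min_fd or higher. Returns the
 * descriptor, or -1 if the daemon cannot be reached. */
int connect_to_monitor(const char *path, int min_fd)
{
    struct sockaddr_un addr;
    assert(path != NULL && min_fd >= 0); /* programming errors only */
    if (strlen(path) >= sizeof addr.sun_path)
        return -1;

    int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }

    /* Duplicate onto a high descriptor number and drop the low one. */
    int high = fcntl(fd, F_DUPFD_CLOEXEC, min_fd);
    close(fd);
    return high;
}

/* Runs when the dynamic linker loads the library. The descriptor is left
 * open on purpose: the kernel closes it when the process dies, and the
 * daemon sees the connection end. Failures are ignored so the host
 * program is never disturbed. */
__attribute__((constructor))
static void announce_process_start(void)
{
    (void)connect_to_monitor(MONITOR_SOCKET_PATH, MONITOR_MIN_FD);
}
```

Built as a shared object (for example with gcc -shared -fPIC) and named in LD_PRELOAD, this would open one connection per process start. Because the descriptor is close-on-exec, an exec ends the old connection and the preloaded library in the new image opens a fresh one. On the daemon side, the SO_PEERCRED socket option can report the connecting process's pid.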
Note that by just polling the /proc filesystem you would probably miss the great number of processes that live for only a split second. There are really many of them on unix.
Here is an outline of the solution that we came up with.
We created a program that read a configuration file listing all the applications that the system is able to monitor. Through a command-line interface, you were able to start or stop those programs. The program itself stored a table in shared memory that marked applications as running or not. An interface that anybody could access reported the status of these programs. This program also had an alarm system that could email, page, or set off an alarm.
This solution does not require any changes to the kernel and is therefore a less painful solution.
Hope this helps.

Inter-program communication for an arbitrary number of programs

I am attempting to have a bunch of independent programs intelligently allocate shared resources among themselves. However, I could have only one program running, or could have a whole bunch of them.
My thought was to mmap a virtual file in each program, but the concurrency is killing me. Mutexes are obviously ineffective because each program could have a lock on the file and be completely oblivious of the others. However, my attempts to write a semaphore have all failed, since the semaphore would be internal to the file, and I can't rely on only one thing writing to it at a time, etc.
I've seen quite a bit about named pipes, but they don't seem to me to be a practical solution for what I'm doing, since I don't know how many other programs there will be, if any, nor do I have any way of identifying which programs are participating in my resource-sharing operation.
You could use a UNIX-domain socket (AF_UNIX) - see man 7 unix.
When a process starts up, it tries to bind() a well-known path. If the bind() succeeds then it knows that it is the first to start up, and becomes the "resource allocator". If the bind() fails with EADDRINUSE then another process is already running, and it can connect() to it instead.
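The bind-or-connect startup logic just described might look like the sketch below. The function name join_or_lead and the backlog of 16 are choices made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: try to become the resource allocator by binding the
 * well-known path. On success *is_leader is set to 1 and a listening
 * socket is returned. If the path is already bound, connect to the
 * existing allocator instead, set *is_leader to 0, and return the
 * connected socket. Returns -1 on any other failure. */
int join_or_lead(const char *path, int *is_leader)
{
    struct sockaddr_un addr;
    assert(path != NULL && is_leader != NULL);
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        /* First to start: become the allocator. */
        if (listen(fd, 16) != 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        *is_leader = 1;
        return fd;
    }

    /* Someone else owns the path: join as a client. */
    if (errno == EADDRINUSE &&
        connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        *is_leader = 0;
        return fd;
    }

    close(fd);
    return -1;
}
```

One caveat with this scheme: the socket file outlives a crashed allocator, in which case bind() still fails with EADDRINUSE and connect() fails with ECONNREFUSED. A fuller version would need to handle that case, for example by unlinking the stale path and trying again.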
You could also use a dedicated resource allocator process that always listens on the path, and arbitrates resource requests.
Not entirely clear what you're trying to do, but personally my first thought would be to use dbus (more detail). Should be easy enough within that framework for your processes/programs to register/announce themselves and enumerate/signal other registered processes, and/or to create a central resource arbiter and communicate with it. Readily available on any system with gnome or KDE installed too.

Debugging tools for Multithreaded Apps

What free tools can I use to debug multithreaded programs created using the pthread library in Linux? (Besides pen & paper, of course...)
The usual way of printing debug messages isn't working out very well.
Debugging concurrent programs is inherently difficult, because the debugging tool tends to change the scheduling (often making it tamer so that the bug disappears).
One technique I've had some success with is to log to a data structure that is not protected by a lock. Then, once the system is idle, print the data structure or look at it in a debugger. The important thing is to avoid making a system call or invoking a synchronization primitive when logging, so that logging has minimal influence on the scheduler.
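#include <stddef.h>
#define LOG_BUFFER_LENGTH 4096  /* number of slots; size to taste */
static const char *log_buffer[LOG_BUFFER_LENGTH];
static size_t log_index;
/* Wrap around instead of writing past the end of the buffer. */
#define LOG(message) (log_buffer[log_index++ % LOG_BUFFER_LENGTH] = (message))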
If your thread is interrupted in the middle of logging, the log buffer will become inconsistent. This is improbable enough to be useful for debugging, though it must be kept in mind. I've never tried this on a multiprocessor machine; I would expect the lack of synchronization to make the log buffer inconsistent very quickly in practice¹.
¹ Which is one more reason not to do multithreaded programming on multiprocessor machines. Use message passing instead.
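If the inconsistency on multiprocessor machines is a concern, one possible variation (not the technique described above, which deliberately avoids all synchronization) is to claim each slot with a C11 atomic fetch-and-add. It still makes no system call and takes no lock, but the atomic operation may perturb timing slightly more than a plain increment. The names log_message, log_count and log_entry are invented for this sketch, and a compiler with <stdatomic.h> is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define LOG_BUFFER_LENGTH 1024u /* assumed size; a power of two keeps % cheap */
static_assert((LOG_BUFFER_LENGTH & (LOG_BUFFER_LENGTH - 1u)) == 0,
              "LOG_BUFFER_LENGTH must be a power of two");

static const char *log_buffer[LOG_BUFFER_LENGTH];
static atomic_size_t log_index;

/* Claim a unique slot number atomically, then store the message pointer.
 * Two threads never get the same slot number, although old entries are
 * overwritten once the index wraps around the buffer. */
static inline void log_message(const char *message)
{
    size_t slot = atomic_fetch_add_explicit(&log_index, 1,
                                            memory_order_relaxed);
    log_buffer[slot % LOG_BUFFER_LENGTH] = message;
}

/* Total number of messages logged so far (may exceed the buffer size). */
static inline size_t log_count(void)
{
    return atomic_load_explicit(&log_index, memory_order_relaxed);
}

/* Read back an entry once the system is idle. */
static inline const char *log_entry(size_t i)
{
    return log_buffer[i % LOG_BUFFER_LENGTH];
}
```

As with the original macro, the messages should be string literals or other storage that outlives the log, since only the pointer is recorded.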
Both the GNU gdb debugger and its ddd graphical front-end support threads.

Resources