Question: are initialized pthread_mutex_t objects kernel persistent?
-- concern is for Linux V 2.6 onward.
Motivation:
If persistent: the object's resources will not be released without specific cleanup, i.e. pthread_mutex_destroy.
In practical coding terms this means the mutex object will persist after the
creating program exits or aborts without cleanup, unless pthread_mutex_destroy
is called. I have code which is routinely removed by a nasty control program,
that employs kill -9, SIGKILL, after trying kill -15 (SIGTERM). The design
of the program is not going to change, it is vendor code. There is no way to
alter its base behavior. Correctly cleaning up the code often takes longer than the
control daemon likes, so 'zap' goes the process. This occurs frequently.
https://www.kernel.org/doc/Documentation/mutex-design.txt
From Ingo Molnar
[ this is older material which says 'yes', spinlocks are a kernel mode object ]
'struct mutex' is the new mutex type, defined in include/linux/mutex.h and
implemented in kernel/locking/mutex.c. It is a counter-based mutex with a
spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", 0 for
"locked" and negative numbers (usually -1) for "locked, potential waiters
queued".
http://man7.org/linux/man-pages/man2/execve.2.html has:
All threads other than the calling thread are destroyed during an
execve(). Mutexes, condition variables, and other pthreads
objects are not preserved.
So calling one of the exec() family is not a way to determine persistence.
http://man7.org/linux/man-pages/man3/exit.3.html has nothing about mutexes one
way or the other.
Can someone point me to definitive code or documentation one way or the other?
I need to confront our vendor with something solid.
Pthreads mutexes in Linux are not kernel objects. pthread_mutex_destroy does not make any system calls because there's no kernel resource to free. strace it and see for yourself.
The linked document by Ingo Molnar talks about mutexes that are internal to the Linux kernel, not about pthreads. They are totally different beasts.
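A minimal sketch of how one might check this for oneself, assuming glibc on Linux. The function name mutex_lifecycle_ok is made up for illustration. It runs one full init/lock/unlock/destroy cycle on a mutex that lives on the stack, so the whole object is ordinary process memory:

```c
#include <pthread.h>

/* Runs one init/lock/unlock/destroy cycle on a mutex that lives on the
 * stack. Returns 1 if every call succeeded, 0 otherwise.
 * (Hypothetical helper, not part of any library.) */
int mutex_lifecycle_ok(void)
{
    pthread_mutex_t m;  /* plain memory inside this process */

    if (pthread_mutex_init(&m, NULL) != 0)
        return 0;
    if (pthread_mutex_lock(&m) != 0)
        return 0;
    if (pthread_mutex_unlock(&m) != 0)
        return 0;
    /* No handle or descriptor is involved; destroy only invalidates
     * the in-memory object. */
    return pthread_mutex_destroy(&m) == 0;
}
```

Calling this from a small main() and running the program under strace should let you confirm that the uncontended calls do not show up as system calls. Since the mutex is just memory, it goes away with the process's address space, whether the process exits normally or is killed.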
Background, from POSIX:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
The difficulty is that we generally don't know if we're a multi-threaded process, since threads may have been created by library code. And "async-signal-safe" is a quite-severe restriction.
It is nonsensical to ask "how many threads are there", since if other threads are still running, they may be exiting or creating new threads while asking. We can, however, get answers (or partial answers) to simpler questions:
Is it even possible for other threads to exist?
Am I the only thread that ever existed?
Am I the only thread that exists right now?
...
For simplicity's sake let's assume:
we're not in a signal handler
nobody is mad enough to invoke UB by calling pthread_create or C11's thrd_create from a signal handler
nobody is doing threads outside of pthreads, C11, and C++11
C++11 threads appear to always be implemented in terms of pthreads (on platforms that support fork, at least)
C11 threads are very similar to pthreads, although we sometimes have to handle the functions separately.
Answers that involve arcane implementation details are encouraged, as long as they are (fairly) stable.
Some partial answers (more still needed):
Question 1 is addressed by libstdc++'s __gthread_active_p() for several libc implementations. The header is compatible with C, but it is a static function in a C++-only part of the include path, and it also relies on the existence of the macro __GXX_WEAK__, which is only predefined for C++. (libc++ unconditionally pulls in pthreads)
Unfortunately, this is dangerously unreliable for the dlopen case (race conditions in correct user code), see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78017
Question 2 can be addressed by installing interceptors for pthread_create and thrd_create. But this can potentially be finicky (see comments in gthr.h about interceptors).
If the values returned by clock_gettime for CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID differ, this may be proof that another thread has existed, but beware of races, resolution, and clock settability (setting these clocks is not possible on Linux, but POSIX potentially allows it)
Question 3 is the interesting one, anyway:
GDB is likely to know the answer, but spawning a whole other process seems unnecessary (e.g. answers involving ps should be rewritten to use /proc/ directly), and it may not have permission anyway
libthread_db.so exists but appears undocumented except in the original Solaris version. It looks like it might be possible to implement the proc_service.h callbacks for the current process however, if we ignore the "stop" part ...
On Linux, if gettid() != getpid(), you're not the main thread, thus there probably are at least two threads. (it's possible for the main thread to call pthread_exit, but this is weird)
A (somewhat) more portable version of the preceding: use __attribute__((constructor)) (or politely ask your caller) to stash the value of pthread_self() for the main thread. Unfortunately, there is a disturbing comment in libstdc++'s <thread> header (grep for __GLIBC__) about returning 0 (but I cannot reproduce this).
On Linux if /proc is mounted and accessible, you can enumerate /proc/self/task/. The code to do this is portable, it will just fail on OSes that don't provide this. (are there others that do provide this much?). Is /proc/self/status or /proc/self/stat any more portable? They have less information (and stat is hard to parse securely), but we probably don't need any more. (need to test these for the "main thread exited" case)
On GLIBC, we could possibly read the debug symbols to find the multiple_threads flag (sometimes global, sometimes part of struct pthread - ugh). But this is probably similar to libthread_db.so
Similarly MUSL has a count (minus one) and a linked list ... though it prefers to take an internal lock first. If we're only reading, is it safe to skip that?
If we block a signal and then kill the current process (not thread) with it, and our thread isn't the one that receives it, we know that other threads must exist to handle it. But there's no way to know how long to wait, and signals are dangerous global state anyway.
On Linux, unshare(2) ignores CLONE_THREAD for single-threaded processes and errors for multithreaded processes! (There are also some harder cases with user namespaces but I don't think they're needed)
On Linux, SELinux's setcon(3) is guaranteed to fail for multithreaded processes under certain conditions. This requires investigation; it takes some steps to correlate the kernel implementation to userland headers (there is a userland library involved).
From grepping kernel sources those are the only 2 that use specific functions, but there's nothing stopping other functions from being implemented on the same data structures.
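Two of the Linux-specific ideas above (enumerating /proc/self/task/ and comparing gettid() with getpid()) can be sketched as follows. This is only an illustration under the stated assumptions: Linux, /proc mounted and readable. The names count_tasks and is_main_thread are made up here. The thread ID is fetched through syscall(SYS_gettid) because older glibc versions ship no gettid() wrapper.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts the entries in /proc/self/task, one per thread of the calling
 * process. Returns -1 if the directory cannot be opened, for example
 * when /proc is not mounted. (Hypothetical helper.) */
int count_tasks(void)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip "." and ".."; every other entry is a numeric thread ID. */
        if (ent->d_name[0] != '.')
            n++;
    }
    closedir(dir);
    return n;
}

/* Returns 1 if the calling thread's ID equals the process ID, i.e. it is
 * the initial thread of the process. (Hypothetical helper.) */
int is_main_thread(void)
{
    return (pid_t)syscall(SYS_gettid) == getpid();
}
```

The count is only a snapshot: if it is greater than 1, other threads may be starting or exiting while the directory is being read, which is exactly the race described earlier.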
I've just started to study the pthread API. I've been using different books and websites, and judging from what they all report, pthread synchronization functions (e.g. those involving mutexes) all work both for a uniprocessor and multiprocessor environments. But none of these sources explicitly stated it, so I wanted to know if that's actually the case (of course I believe so, I just wanted to be 100% sure).
So, if two threads running on different CPUs called a lock (e.g. pthread_mutex_lock()) on the same mutex at the same time, would the execution of this routine be executed sequentially rather than in parallel? And after the first lock is over and the thread invoking it has private access to the critical section, does the lock executed by the other thread on another CPU cause the latter thread to suspend?
Yes, it does. The POSIX API is described in terms of requirements on implementations - for example, a pthread_mutex_lock() that returns zero or EOWNERDEAD must return with the mutex locked and owned by the calling thread. There's no exception for multiprocessor environments, so conforming implementations in multiprocessor environments must continue to make it work.
So, if two threads running on different CPUs called a lock (e.g.
pthread_mutex_lock()) on the same mutex at the same time, would the
execution of this routine be executed sequentially rather than in
parallel?
It's not specified how pthread_mutex_lock() works underneath, but from an application point of view you know that if it doesn't return an error, your thread has acquired the lock.
And after the first lock is over and the thread invoking it has
private access to the critical section, does the lock executed by the
other thread on another CPU cause the latter thread to suspend?
Yes - the specification for pthread_mutex_lock() says:
If the mutex is already locked by another thread, the calling thread
shall block until the mutex becomes available.
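A small sketch of what that guarantee buys you in practice; the names run_counter_demo, bump and struct counter are made up for this example. Several threads increment one shared counter, and because each increment happens while holding the mutex, no update is lost regardless of how many CPUs the threads run on:

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

struct counter {
    pthread_mutex_t lock;
    long value;
    long iters;
};

static void *bump(void *arg)
{
    struct counter *c = arg;
    for (long i = 0; i < c->iters; i++) {
        pthread_mutex_lock(&c->lock);    /* blocks while another thread holds it */
        c->value++;                      /* critical section */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Starts nthreads threads that each add iters to a shared counter.
 * Returns the final value, or -1 on error. (Hypothetical helper.) */
long run_counter_demo(int nthreads, long iters)
{
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    struct counter c = { .value = 0, .iters = iters };
    pthread_t tids[MAX_THREADS];
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    int started = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, bump, &c) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

With the lock in place, run_counter_demo(2, 100000) returns 200000. Removing the lock/unlock pair typically makes the result come out lower on a multiprocessor machine, since increments from different CPUs then overwrite each other.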
I am trying to create a shared memory region which will be used by multiple processes. These processes communicate with each other using MPI calls (MPI_Send, MPI_Recv).
I need a mechanism to control access to this shared memory. I asked a question yesterday ("Shared memory access control mechanism for processes created by MPI") to see if MPI provides any facility to do that, but it seems that MPI has no such provision.
So I have to choose between named semaphore or flock.
For a named semaphore, if any of the processes dies abruptly without calling sem_close(), then that semaphore remains and can be seen with ll /dev/shm/. This sometimes results in deadlock (if I run the same code again!), and for this reason I am currently thinking of using flock.
Just wanted to confirm: is flock best suited for this type of operation?
Are there any disadvantages of using flock?
Is there anything else apart from named semaphore and flock that can be used here ?
I am working in C under Linux.
You can also use a POSIX mutex in shared memory; you just have to set the "pshared" attribute on it first. See pthread_mutexattr_setpshared. This is arguably the most direct way to do what you want.
That said, you can also call sem_unlink on your named semaphore while you are still using it. This will remove it from the file system, but the underlying semaphore object will continue to exist until the last process calls sem_close on it (which happens automatically if the process exits or crashes).
I can think of two minor disadvantages to using flock. First, it is not POSIX, so it makes your code somewhat less portable, although I believe most Unixes implement it in practice. Second, it is implemented as a system call, so it will be slower. Both pthread_mutex_lock and sem_wait use the "futex" mechanism on Linux, which only does a system call when you actually have to wait. This is only a concern if you are grabbing and releasing the lock a lot.
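Here is a rough sketch of the process-shared mutex approach. To keep it self-contained it uses an anonymous shared mapping and fork() as a stand-in for your MPI processes (that substitution is my assumption; unrelated processes would normally map a common object created with shm_open instead). The names shared_area, add_locked and shared_counter_demo are made up for the example.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_area {
    pthread_mutex_t lock;   /* must live inside the shared mapping */
    long counter;
};

static void add_locked(struct shared_area *s, long iters)
{
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
}

/* Parent and child each add iters to a counter in shared memory.
 * Returns the final counter value, or -1 on error. (Hypothetical helper.) */
long shared_counter_demo(long iters)
{
    struct shared_area *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    int rc = pthread_mutex_init(&s->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(s, sizeof *s);
        return -1;
    }
    s->counter = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(s, sizeof *s);
        return -1;
    }
    if (pid == 0) {              /* child: second process using the lock */
        add_locked(s, iters);
        _exit(0);
    }
    add_locked(s, iters);        /* parent */

    int status = 0;
    long result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = s->counter;

    pthread_mutex_destroy(&s->lock);
    munmap(s, sizeof *s);
    return result;
}
```

Note that a plain process-shared mutex does not help if a process dies while holding the lock; the remaining processes would then block, much as with the stale semaphore.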
In a comment on the question Automatically release mutex on crashes in Unix back in 2010, jilles claimed:
glibc's robust mutexes are so fast because glibc takes dangerous shortcuts. There is no guarantee that the mutex still exists when the kernel marks it as "will cause EOWNERDEAD". If the mutex was destroyed and the memory replaced by a memory mapped file that happens to contain the last owning thread's ID at the right place and the last owning thread terminates just after writing the lock word (but before fully removing the mutex from its list of owned mutexes), the file is corrupted. Solaris and will-be-FreeBSD9 robust mutexes are slower because they do not want to take this risk.
I can't make any sense of the claim, since destroying a mutex is not legal unless it's unlocked (and thus not in any thread's robust list). I also can't find any references searching for such a bug/issue. Was the claim simply erroneous?
The reason I ask and that I'm interested is that this is relevant to the correctness of my own implementation built upon the same Linux robust-mutex primitive.
I think I found the race, and it is indeed very ugly. It goes like this:
Thread A has held the robust mutex and unlocks it. The basic procedure is:
Put it in the "pending" slot of the thread's robust list header.
Remove it from the linked list of robust mutexes held by the current thread.
Unlock the mutex.
Clear the "pending" slot of the thread's robust list header.
The problem is that between steps 3 and 4, another thread in the same process could obtain the mutex, then unlock it, and (rightly) believing itself to be the final user of the mutex, destroy and free/munmap it. After that, if any thread in the process creates a shared mapping of a file, device, or shared memory and it happens to get assigned the same address, and the value at that location happens to match the pid of the thread that's still between steps 3 and 4 of unlocking, you have a situation whereby, if the process is killed, the kernel will corrupt the mapped file by setting the high bit of a 32-bit integer it thinks is the mutex owner id.
The solution is to hold a global lock on mmap/munmap between steps 2 and 4 above, exactly the same as in my solution to the barrier issue described in my answer to this question:
Can a correct fail-safe process-shared barrier be implemented on Linux?
The description of the race by FreeBSD pthread developer David Xu: http://lists.freebsd.org/pipermail/svn-src-user/2010-November/003668.html
I don't think the munmap/mmap cycle is strictly required for the race. The piece of shared memory might be put to a different use as well. This is uncommon but valid.
As also mentioned in that message, more "fun" occurs if threads with different privilege access a common robust mutex. Because the node for the list of owned robust mutexes is in the mutex itself, a thread with low privilege may corrupt a high privilege thread's list. This could be exploited easily to make the high privilege thread crash and in rare cases this might allow the high privilege thread's memory to be corrupted. Apparently Linux's robust mutexes are only designed for use by threads with the same privileges. This could have been avoided easily by making the robust list an array fully in the thread's memory instead of a linked list.
I am looking at changing some code that I would like to run on linux, unix, and OSX. There are some calls in the code for a sem_init, but the pshared value is set to zero. I did some reading in the Rochkind book on unix programming and he basically said that sem_init that is not shared is the same as a pthread_mutex_init because it's acting in an in-memory, binary fashion.
The question is - am I safe to change these sem_init's to pthread_mutex_init, or use sem_open to get a more portable version of this code?
OSX does not support unnamed semaphores, but I guess the other two do. I don't really want to have a separate compile flag to #ifdef(__APPLE__) or something either.
Thanks
Mutexes and semaphores have different semantics. A mutex must be unlocked by the same thread that has taken the lock. So lock / unlock must always come in pairs in the same thread.
A semaphore is much more flexible in that one thread can post a token that another thread consumes. They are e.g. commonly used to implement producer / consumer patterns. So you'd have to check whether the program that you want to port fits the restricted semantics of mutexes.
The semantics of mutexes and semaphores are different. It is true that a non-shared semaphore is equivalent to a mutex if it is only used as a binary semaphore, i.e. if its value is never greater than 1. However, this is something you need to determine from your code's logic not how it is initialized. If you are sure that the semaphore is only used as a binary semaphore then a pthread mutex is a perfect replacement. If not you can either use sem_open() for portability or write a wrapper that emulates semaphores using pthread mutexes and condition variables.
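The wrapper option could look something like the sketch below. The ksem_* names are made up for illustration and are not an existing API. It is a counting semaphore built from one mutex and one condition variable, so it works where unnamed semaphores are not available:

```c
#include <pthread.h>

/* Counting semaphore emulated with a mutex and a condition variable.
 * (Hypothetical type and functions.) */
struct ksem {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;   /* signalled when count becomes > 0 */
    unsigned        count;
};

int ksem_init(struct ksem *s, unsigned initial)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&s->nonzero, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&s->lock);
        return rc;
    }
    s->count = initial;
    return 0;
}

/* Blocks until a token is available, then takes it. */
void ksem_wait(struct ksem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Takes a token if one is available. Returns 0 on success, -1 if none. */
int ksem_trywait(struct ksem *s)
{
    int rc = -1;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        rc = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Adds a token and wakes one waiter, if any. */
void ksem_post(struct ksem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

void ksem_destroy(struct ksem *s)
{
    pthread_cond_destroy(&s->nonzero);
    pthread_mutex_destroy(&s->lock);
}
```

One difference to keep in mind: sem_post is async-signal-safe, whereas this ksem_post is not, so the emulation is not suitable if the original code posts from a signal handler.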
Switching to mutexes should be safe in the given instance. If only one thread can enter the given critical section at a time, you effectively have a mutex whether it's written as a semaphore or not. However, depending on how the functions are implemented by the OS, you may get different performance characteristics. It's not something I would lose sleep over, but still something to keep in the back of your mind while testing.
I prefer to use a mutex and a condition variable, because in my past work I have run into problems caused by incorrect use of semaphores, and those problems were extremely difficult to track down.
It is hard to use sem_init and sem_post in a fully correct way.
Like:
// Thread a
sem_init(&sem, 0, 0);  // pshared = 0, initial value 0
// Thread b
sem_wait(&sem);
// Kernel: Linux 3.10
If Thread a starts before Thread b, Thread b may block on sem_wait forever.
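It is hard to make assumptions about the start order of multiple threads, and thread a may restart after it crashes.
But if you call pthread_mutex_init repeatedly on the same mutex, the function may return EBUSY (the specification lists this as an error an implementation may detect, so it is not guaranteed everywhere):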
https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_mutex_init.html