This answer recommends fts as a good way to have a reentrant filesystem traversal. While reading the manpages, however, I noticed that fts_read and fts_children are marked as MT-Unsafe.
I could not find any information on why they are marked as such. I found this thread, so I suspect the reason is that chdir is called (two threads trying to chdir the process at the same time can't be good).
If that is so, I guess that passing FTS_NOCHDIR would be enough to have thread safety. Is there any other reason I don't see?
(And for the record, I'm very surprised that we came to this day without having a good reentrant, reasonable to use way of scanning through a filesystem tree! Seriously? ☺)
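For reference, a minimal sketch of a traversal that passes FTS_NOCHDIR, so that fts itself never changes the working directory. It assumes glibc's <fts.h> and that each thread opens and uses its own FTS handle. The helper name count_regular_files is made up for illustration. Whether FTS_NOCHDIR alone is enough for thread safety is exactly the open question here, so this only shows the call pattern.

```c
#include <fts.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: counts regular files below `root`.
 * Returns -1 if the traversal could not be started. */
long count_regular_files(const char *root)
{
    char *const paths[] = { (char *)root, NULL };

    /* FTS_NOCHDIR: fts does not chdir() while walking the tree.
     * FTS_PHYSICAL: symbolic links are reported, not followed. */
    FTS *tree = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (tree == NULL)
        return -1;

    long count = 0;
    FTSENT *ent;
    while ((ent = fts_read(tree)) != NULL) {
        if (ent->fts_info == FTS_F)   /* regular file */
            count++;
    }
    fts_close(tree);
    return count;
}
```

With FTS_NOCHDIR, fts_accpath and fts_path refer to the same path, so entries can be opened without depending on the current directory.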
Details about the issue leading to the question
We're facing a SIGSEGV error under extremely rare circumstances when using the lmdb database library that are not easily reproducible. The best we got out of it is a stack-trace that looks like this through the core dump:
#0 in mdb_env_reader_dest (ptr=...) at .../mdb.c: 4935
#1 in __nptl_deallocate_tsd () at pthread_create.c:301
...
The line the stack-trace is pointing to is this (it's not the same because we attempted some changes to fix this).
Having tracked the issue for a while, the only explanation I have is that, when the lmdb environment is closed from a thread other than the main thread, there is some kind of race between this line and this line. The latter deletes the key and calls the custom destructor mdb_env_reader_dest, which, according to the lmdb documentation, causes a SIGSEGV when it attempts to use resources after they have been freed.
The question
The documentation of pthread_key_create and pthread_key_delete are ambiguous to me, personally, in the sense whether they're talking about the calls to pthread_key_create and pthread_key_delete or the data underneath the pointers. This is what the docs say:
The associated destructor functions used to free thread-specific data at thread exit time are only guaranteed to work correctly when called in the thread that allocated the thread-specific data.
So the question is, can we call mdb_env_open and mdb_env_close from two different threads, leading to pthread_key_create and pthread_key_delete from different threads, and expect correct behavior?
I couldn't find such a requirement in the lmdb documentation. The closest I could find is this, which doesn't seem to reference the mdb_env_open function.
Are we allowed to call pthread_key_create and pthread_key_delete from different threads?
Yes.
However, a key must be created via pthread_key_create before it can be used by any thread, and that is not, itself, inherently thread safe. The key creation is often synchronized by performing it before starting any (other) threads that will use the key, but there are other alternatives.
Similarly, a key must not be deleted before all threads are finished with it, and the deletion is not, itself, inherently thread safe. TSD keys often are not deleted at all, and when they are deleted, that is often synchronized by first joining all (other) threads that may use the key. But again, there are other alternatives.
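One of those "other alternatives" for key creation is pthread_once. Here is a minimal sketch, assuming a key that is never deleted; the names counter_key and get_thread_counter are made up for illustration and have nothing to do with lmdb.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t counter_key;                       /* hypothetical key */
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;
static int counter_key_error;

static void make_counter_key(void)
{
    /* free() is the destructor: it runs at thread exit, in each thread
     * whose value for this key is non-NULL. */
    counter_key_error = pthread_key_create(&counter_key, free);
}

/* Returns the calling thread's private counter, allocating it on first
 * use, or NULL on failure. Any thread may call this first. */
int *get_thread_counter(void)
{
    pthread_once(&counter_key_once, make_counter_key);
    if (counter_key_error != 0)
        return NULL;

    int *counter = pthread_getspecific(counter_key);
    if (counter == NULL) {
        counter = calloc(1, sizeof *counter);
        if (counter != NULL && pthread_setspecific(counter_key, counter) != 0) {
            free(counter);
            counter = NULL;
        }
    }
    return counter;
}
```

pthread_once guarantees that make_counter_key runs exactly once, no matter how many threads race to call get_thread_counter first.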
The documentation of pthread_key_create and pthread_key_delete are
ambiguous to me, personally, in the sense whether they're talking
about the calls to pthread_key_create and pthread_key_delete or the
data underneath the pointers. This is what the docs say:
The associated destructor functions used to free thread-specific data
at thread exit time are only guaranteed to work correctly when called
in the thread that allocated the thread-specific data.
The destructor functions those docs are talking about are the ones passed as the second argument to pthread_key_create().
And note well that that text is drawn from the Rationale section of the docs, not the Description section. It is talking about why the TSD destructor functions are not called by pthread_key_delete(), not trying to explain what the function does. That particular point is that TSD destructor functions must run in each thread carrying non-NULL TSD, as opposed to in the thread that calls pthread_key_delete().
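A small sketch of that distinction, assuming nothing beyond plain pthreads (all names are made up for illustration): the destructor runs once, in the worker thread as it exits, and pthread_key_delete() does not run it for the value still set in the calling thread.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int destructor_calls;

/* TSD destructor: only counts how often it is invoked. */
static void count_destructor(void *value)
{
    (void)value;
    atomic_fetch_add(&destructor_calls, 1);
}

static void *tsd_worker(void *arg)
{
    static int worker_value = 1;
    pthread_key_t *key = arg;
    pthread_setspecific(*key, &worker_value);
    return NULL;   /* thread exit: the destructor runs in this thread */
}

/* Returns the number of destructor calls observed, or -1 on setup failure. */
int destructor_demo(void)
{
    static int caller_value = 2;
    pthread_key_t key;
    pthread_t tid;

    atomic_store(&destructor_calls, 0);
    if (pthread_key_create(&key, count_destructor) != 0)
        return -1;
    if (pthread_create(&tid, NULL, tsd_worker, &key) != 0) {
        pthread_key_delete(key);
        return -1;
    }
    pthread_join(tid, NULL);

    /* The caller still holds a non-NULL value when the key is deleted;
     * pthread_key_delete() does not invoke the destructor for it. */
    pthread_setspecific(key, &caller_value);
    pthread_key_delete(key);
    return atomic_load(&destructor_calls);
}
```

Any cleanup of the caller's own value is therefore the application's job before it deletes the key.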
So the question is, can we call mdb_env_open and mdb_env_close from
two different threads, leading to pthread_key_create and
pthread_key_delete from different threads, and expect correct
behavior?
The library's use of thread-specific data does not imply otherwise. However, you seem to be suggesting that there is a race between two different lines in the same function, mdb_env_close0, which can be the case only if that function is called in parallel by two different threads. The MDB docs say of mdb_env_close() that "Only a single thread may call this function." I would guess that they mean that to be scoped to a particular MDB environment. In any case, if you really have the race you think you have, then it seems to me that your program must be calling mdb_env_close() from multiple threads, contrary to specification.
So, as far as I know or can tell, the thread that calls mdb_env_close() does not need to be the same one that called mdb_env_open() (or mdb_env_create()), but it does need to be the only one that calls it.
Background, from POSIX:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
The difficulty is that we generally don't know if we're a multi-threaded process, since threads may have been created by library code. And "async-signal-safe" is a quite-severe restriction.
It is nonsensical to ask "how many threads are there", since if other threads are still running, they may be exiting or creating new threads while asking. We can, however, get answers (or partial answers) to simpler questions:
Is it even possible for other threads to exist?
Am I the only thread that ever existed?
Am I the only thread that exists right now?
...
For simplicity's sake let's assume:
we're not in a signal handler
nobody is mad enough to invoke UB by calling pthread_create or C11's thrd_create from a signal handler
nobody is doing threads outside of pthreads, C11, and C++11
C++11 threads appear to always be implemented in terms of pthreads (on platforms that support fork, at least)
C11 threads are very similar to pthreads, although we sometimes have to handle the functions separately.
Answers that involve arcane implementation details are encouraged, as long as they are (fairly) stable.
Some partial answers (more still needed):
Question 1 is addressed by libstdc++'s __gthread_active_p() for several libc implementations. The header is compatible with C, but it is a static function in a C++-only part of the include path, and it also relies on the existence of the macro __GXX_WEAK__, which is only predefined for C++. (libc++ unconditionally pulls in pthreads)
Unfortunately, this is dangerously unreliable for the dlopen case (race conditions in correct user code), see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78017
Question 2 can be addressed by installing interceptors for pthread_create and thrd_create. But this can potentially be finicky (see comments in gthr.h about interceptors).
If the values returned by clock_gettime for CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID differ, this may be proof that another thread has existed, but beware of races, resolution, and clock settability (setting these clocks is not possible on Linux, but POSIX potentially allows it)
Question 3 is the interesting one, anyway:
GDB is likely to know the answer, but spawning a whole other process seems unnecessary (e.g. answers involving ps should be rewritten to use /proc/ directly), and it may not have permission anyway
libthread_db.so exists but appears undocumented except in the original Solaris version. It looks like it might be possible to implement the proc_service.h callbacks for the current process however, if we ignore the "stop" part ...
On Linux, if gettid() != getpid(), you're not the main thread, thus there probably are at least two threads. (it's possible for the main thread to call pthread_exit, but this is weird)
A (somewhat) more portable version of the preceding: use __attribute__((constructor)) (or politely ask your caller) to stash the value of pthread_self() for the main thread. Unfortunately, there is a disturbing comment in libstdc++'s <thread> header (grep for __GLIBC__) about returning 0 (but I cannot reproduce this).
On Linux if /proc is mounted and accessible, you can enumerate /proc/self/task/. The code to do this is portable, it will just fail on OSes that don't provide this. (are there others that do provide this much?). Is /proc/self/status or /proc/self/stat any more portable? They have less information (and stat is hard to parse securely), but we probably don't need any more. (need to test these for the "main thread exited" case)
On GLIBC, we could possibly read the debug symbols to find the multiple_threads flag (sometimes global, sometimes part of struct pthread - ugh). But this is probably similar to libthread_db.so
Similarly MUSL has a count (minus one) and a linked list ... though it prefers to take an internal lock first. If we're only reading, is it safe to skip that?
If we block a signal and then kill the current process (not thread) with it, and our thread isn't the one that receives it, we know that other threads must exist to handle it. But there's no way to know how long to wait, and signals are dangerous global state anyway.
On Linux, unshare(2) ignores CLONE_THREAD for single-threaded processes and errors for multithreaded processes! (There's also some harder cases with user namespaces but I don't think they're needed)
On Linux, SELinux's setcon(3) is guaranteed to fail for multithreaded processes under certain conditions. This requires investigation; it takes some steps to correlate the kernel implementation to userland headers (there is a userland library involved).
From grepping kernel sources those are the only 2 that use specific functions, but there's nothing stopping other functions from being implemented on the same data structures.
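As a concrete starting point for the /proc/self/task idea above, a minimal sketch (Linux-specific in effect; the function name count_tasks is made up). The result is only a snapshot: other threads may start or exit while the directory is being read, and the "main thread exited" case still needs testing.

```c
#include <dirent.h>
#include <stddef.h>

/* Counts the entries of /proc/self/task, one per thread of the calling
 * process. Returns -1 where that directory is unavailable. */
int count_tasks(void)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] != '.')   /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}
```

A result of exactly 1 answers question 3 for the instant of the scan only; a result of -1 means no answer at all.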
I am new to C programming. I used to think using exit() was the cleanest way of process termination (as it is capable of removing temporary files, closing open files, normal process termination...), but when I tried man exit command on the terminal (Ubuntu 16.04.5, gcc 5.4.0) I saw the following line:
The exit() function uses a global variable that is not protected, so
it is not thread-safe.
After that I tried to do some research into a better replacement for exit() (to change my programming habits from the beginning). While doing that I came across this question, in which the side effects of exit() are mentioned and it is suggested to use atexit() properly to solve the problem (at least partially).
There were some cases in which using abort() was preferred over exit(). On top of that, this question suggests that atexit() might also be harmful.
So here are my questions:
Is there any general and better way of terminating a process (one that is guaranteed to clean up like exit() and is never harmful to the system)?
If the answer to the first question is NO!, what is the best possible way of terminating a process (including the cases in which they are most useful)?
what is the best possible way of terminating a process
1. If going single threaded, just use exit(), as your code is not going multi-threaded.
2. Otherwise, make sure all but one thread have ended before the last thread does, and then safely call exit() because of 1. above.
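A minimal sketch of the second case, assuming the program itself created every thread and kept the handles (WORKER_COUNT and the function names are made up). It only covers threads started here; threads created by libraries are not accounted for.

```c
#include <pthread.h>

#define WORKER_COUNT 4   /* hypothetical number of workers */

static void *mark_done(void *arg)
{
    int *done = arg;
    *done = 1;           /* stand-in for real work */
    return NULL;
}

/* Starts the workers and joins every one of them. Returns how many ran
 * to completion. Once this returns, none of the threads started here is
 * still running. */
int run_and_join_workers(void)
{
    pthread_t tids[WORKER_COUNT];
    int done[WORKER_COUNT] = { 0 };
    int started = 0;
    int finished = 0;

    for (int i = 0; i < WORKER_COUNT; i++) {
        if (pthread_create(&tids[started], NULL, mark_done, &done[started]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0 && done[i])
            finished++;
    }
    return finished;
}
```

The one remaining thread could then call exit() with a status derived from the return value, for example EXIT_SUCCESS if it equals WORKER_COUNT.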
Given that power/hardware fails can happen at any time, the imposs.. extreme difficulty of reliably terminating threads with user code and the chaotic nature of the use of memory pools etc. in many non-trivial multithreaded apps, it is better to design apps and systems that can clean temp files etc. on start-up, rather than trying to micro-manage shutdown.
'Clean up all the resources you allocate before you exit' sounds like good advice in a classroom or lecture, but quickly becomes a whole chain of albatross round your neck when faced with a dozen threads, queues and pools in a continually changing dynamic system.
If you can, if you are running under a non trivial OS, let it do its job and clean up for you. It's much better at it than your user code will ever be.
I have a multi-threaded application in a POSIX/Linux environment - I have no control over the code that creates the pthreads. At some point the process - owner of the pthreads - receives a signal.
The handler of that signal should abort, cancel or stop all the pthreads and log how many pthreads were running.
My problem is that I could not find how to list all the pthreads running in process.
There doesn't seem to be any portable way to enumerate the threads in a process.
Linux has pthread_kill_other_threads_np, which looks like a leftover from the original purely-userland pthreads implementation that may or may not work as documented today. It doesn't tell you how many threads there were.
You can get a lot of information about your process by looking in /proc/self (or, for other processes, /proc/123). Although many unices have a file or directory with that name, the layout is completely different, so any code using /proc will be Linux-specific. The documentation of /proc is in Documentation/filesystems/proc.txt in the kernel source. In particular, /proc/self/task has a subdirectory for each thread. The name of the subdirectory is the LWP id; unfortunately, [1][2][3] there doesn't seem to be a way to associate LWP ids with pthread ids (but you can get your own thread id with gettid(2) if you work for it). Of course, reading /proc/self/task is not atomic; the number of threads is available atomically through /proc/self/status (but of course it might change before you act on it).
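A minimal sketch of reading that count from /proc/self/status (Linux-specific; the function name read_thread_count is made up). It uses stdio, so it is meant for ordinary code, not for use inside a signal handler.

```c
#include <stdio.h>

/* Returns the value of the "Threads:" field of /proc/self/status,
 * or -1 if the file or the field is unavailable. */
int read_thread_count(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    int threads = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The line looks like "Threads:<tab>4". */
        if (sscanf(line, "Threads: %d", &threads) == 1)
            break;
    }
    fclose(f);
    return threads;
}
```

As noted above, the number can already be stale by the time the caller acts on it.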
If you can't achieve what you want with the limited support you get from Linux pthreads, another tactic is to play dynamic linking tricks to provide your own version of pthread_create that logs to a data structure you can inspect afterwards.
You could wrap ps -eLF (or another command that more closely reads just the process you're interested in) and read the NLWP column to find out how many threads are running.
Given that the threads are in your process, they should be under your control. You can record all of them in a data structure and keep track.
However, doing this won't be race-condition free unless it's appropriately managed (or you only ever create and join threads from one thread).
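A minimal sketch of such a registry, with a mutex so that threads may be created from more than one thread (all names and the fixed MAX_TRACKED capacity are made up for illustration). It only tracks threads created through tracked_create.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define MAX_TRACKED 64   /* hypothetical fixed capacity */

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t registry[MAX_TRACKED];
static int registry_len;

/* Creates a thread and records its handle. Returns 0 or an errno value. */
int tracked_create(void *(*fn)(void *), void *arg)
{
    int rc = EAGAIN;     /* reported when the registry is full */
    pthread_mutex_lock(&registry_lock);
    if (registry_len < MAX_TRACKED) {
        rc = pthread_create(&registry[registry_len], NULL, fn, arg);
        if (rc == 0)
            registry_len++;
    }
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

/* Number of threads recorded and not yet joined. */
int tracked_count(void)
{
    pthread_mutex_lock(&registry_lock);
    int n = registry_len;
    pthread_mutex_unlock(&registry_lock);
    return n;
}

/* Joins every recorded thread; returns how many were joined. The list is
 * copied first so the lock is not held while waiting. */
int tracked_join_all(void)
{
    pthread_t snapshot[MAX_TRACKED];
    pthread_mutex_lock(&registry_lock);
    int n = registry_len;
    memcpy(snapshot, registry, (size_t)n * sizeof snapshot[0]);
    registry_len = 0;
    pthread_mutex_unlock(&registry_lock);

    int joined = 0;
    for (int i = 0; i < n; i++) {
        if (pthread_join(snapshot[i], NULL) == 0)
            joined++;
    }
    return joined;
}

/* Trivial thread body used only to exercise the registry. */
void *tracked_noop(void *arg)
{
    return arg;
}
```

The logging part of the original request would then be a call to tracked_count() from normal code; none of this is async-signal-safe, so it should not be called directly from the signal handler.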
Any threads created by libraries you use are their business and you should not be messing with them directly, or the library may break.
If you are planning to exit the process of course, you can just leave the threads running anyway, as calling exit() terminates them all.
Remember that a robust application should be crash-safe anyway, so you should not depend upon shutdown behaviour to avoid data loss etc.
Is there something equivalent to SIGSTOP and SIGCONT for threads? I am using pthreads.
Thanks
An edit:
I am implementing a crude form of file access synchronization among threads. So if a file is already opened by a thread, and another thread wants to open it again, I need to halt or pause the second thread at that point of its execution. When the first thread has completed its work, it will check what other threads wanted to use a file it released and "wake" them up. The second thread then resumes execution from exactly that point. I use my own bookkeeping data structures.
I'm going to tell you how to do things instead of answering the question. (Look up the "X Y problem".)
You are trying to prevent two threads from accessing the same file at the same time. In other words, access is MUTually EXclusive. A "mutex" is designed to do this. In general, it is easier to find help if you search for what you are trying to do (prevent two threads from accessing the same resource simultaneously) rather than searching for how you want to do it (make one thread wait for the other).
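A minimal sketch of what that looks like with a pthread mutex, assuming one mutex per file (struct guarded_file and guarded_append are made-up names). A thread that finds the mutex taken simply blocks in pthread_mutex_lock and resumes from that point once the owner unlocks, which is the "pause and wake up" behaviour being asked for.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical pairing of a file path with the mutex that guards it. */
struct guarded_file {
    pthread_mutex_t lock;
    const char *path;
};

/* Appends `line` to the file while holding its mutex.
 * Returns 0 on success, -1 on I/O failure. */
int guarded_append(struct guarded_file *gf, const char *line)
{
    int result = -1;

    pthread_mutex_lock(&gf->lock);      /* other threads wait here */
    FILE *f = fopen(gf->path, "a");
    if (f != NULL) {
        result = (fputs(line, f) >= 0) ? 0 : -1;
        if (fclose(f) != 0)
            result = -1;
    }
    pthread_mutex_unlock(&gf->lock);    /* one waiting thread proceeds */
    return result;
}
```

A guarded_file can be initialized statically with PTHREAD_MUTEX_INITIALIZER for the lock member, or at run time with pthread_mutex_init.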
Edit: It sounds like you actually want many readers but one writer. This is probably the second most common synchronization problem (after the "producer-consumer" problem). Use a pthread_rwlock: readers call pthread_rwlock_rdlock and writers call pthread_rwlock_wrlock.
If you're doing something this sophisticated, you really should start reading the relevant literature. If you think you can do multithreaded programming without some serious reading, you are much smarter than me and you don't need my help. I recommend "The Little Book of Semaphores", which is a free download (source). It's not about pthreads, but it's good stuff. The readers-writers problem you are asking about is found under §4.2 in the chapter "Classical Synchronization Problems" (heck, this problem is even mentioned in the blurb).
Multithreaded programming is HARD with capital letters and a bold font.
Well, there is pthread_kill.
But you almost certainly do not want to do this. What if the other thread holds (e.g.) a mutex for the heap, and you try to call new while it is stopped?
Since you do not know what the runtime is doing with mutexes, there is no way to avoid this kind of problem in general unless you completely avoid the standard library.
[edit]
Actually, come to think of it, I am not sure what happens if you target a specific thread with SIGSTOP, since that signal usually affects the whole process.
So to update my answer, I do not believe there is any standard mechanism for suspending a thread asynchronously... And for the reason mentioned above, I do not think you want one.
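What standard pthreads does offer is cooperative pausing, which is a different technique from asynchronous suspension: the thread to be paused checks a flag at points where it holds no locks and waits on a condition variable until it is cleared. A minimal sketch, with all names made up for illustration:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical "gate" that worker threads pass through at safe points. */
struct pause_gate {
    pthread_mutex_t lock;
    pthread_cond_t resumed;
    bool paused;
};

#define PAUSE_GATE_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Called by a worker at a point where it is safe to stop.
 * Blocks for as long as the gate is paused. */
void gate_wait_if_paused(struct pause_gate *gate)
{
    pthread_mutex_lock(&gate->lock);
    while (gate->paused)
        pthread_cond_wait(&gate->resumed, &gate->lock);
    pthread_mutex_unlock(&gate->lock);
}

/* Called by the controlling thread to pause or resume the workers. */
void gate_set_paused(struct pause_gate *gate, bool paused)
{
    pthread_mutex_lock(&gate->lock);
    gate->paused = paused;
    if (!paused)
        pthread_cond_broadcast(&gate->resumed);
    pthread_mutex_unlock(&gate->lock);
}
```

Because the worker only ever stops inside gate_wait_if_paused, it cannot be frozen while holding the heap mutex or any other runtime lock.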
Depending on your application, Pthreads supports what can be considered more refined mechanisms, such as http://www.unix.com/man-page/all/3t/pthread_suspend/ and mutex mechanisms