How does _dl_runtime_resolve achieve thread safety?

I'm reading this article: https://ypl.coffee/dl-resolve/
Since GOT entries are resolved at runtime (on the first call to each function), I'm wondering how _dl_runtime_resolve ensures that it's multi-thread safe.

how does _dl_runtime_resolve ensure it's multi-thread safe?
Any pure function is thread-safe by definition.
_dl_runtime_resolve is not pure (updates the GOT slot), but if two threads are resolving the same symbol in parallel, both threads will resolve to the same symbol (i.e. they'll compute the exact same target address), and will write the same value into the GOT slot.
While technically this is a data race, practically it doesn't lead to any observable problems.
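To make the "same value written twice" argument concrete, here is a minimal sketch of lazy binding in plain C. It is not glibc's implementation: got_slot, resolver_stub, real_target, call_through_slot and run_two_callers are hypothetical names, and relaxed C11 atomics are used only so that the sketch itself is well-defined C (the real resolver performs an ordinary pointer-sized store).

```c
#include <pthread.h>
#include <stdatomic.h>

typedef int (*fn_t)(int);

/* Hypothetical stand-in for the function the symbol finally resolves to. */
static int real_target(int x) { return x + 1; }

static int resolver_stub(int x);

/* Hypothetical stand-in for one GOT slot: it starts out pointing at the
 * resolver, just as an unresolved slot leads into the dynamic linker. */
static _Atomic(fn_t) got_slot = resolver_stub;

/* First call lands here: "resolve" the symbol, patch the slot, then jump. */
static int resolver_stub(int x) {
    fn_t target = real_target;  /* every thread computes the same address */
    atomic_store_explicit(&got_slot, target, memory_order_relaxed);
    return target(x);
}

/* What a call through the PLT amounts to: load the slot, call through it. */
int call_through_slot(int x) {
    fn_t f = atomic_load_explicit(&got_slot, memory_order_relaxed);
    return f(x);
}

static void *caller(void *arg) {
    int *result = arg;
    *result = call_through_slot(41);
    return NULL;
}

/* Runs two concurrent first calls; returns 0 on success, -1 on error. */
int run_two_callers(int results[2]) {
    pthread_t threads[2];
    int started = 0;
    int status = 0;
    for (; started < 2; started++) {
        if (pthread_create(&threads[started], NULL, caller,
                           &results[started]) != 0) {
            status = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return status;
}
```

Whichever thread wins, each caller sees either the resolver or the final target in the slot, and both paths end up calling the same function.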

Related

Are we allowed to call pthread_key_create and pthread_key_delete from different threads? Should lmdb make this clear?

Details about the issue leading to the question
We're facing a SIGSEGV error under extremely rare circumstances when using the lmdb database library that are not easily reproducible. The best we got out of it is a stack-trace that looks like this through the core dump:
#0 in mdb_env_reader_dest (ptr=...) at .../mdb.c: 4935
#1 in __nptl_deallocate_tsd () at pthread_create.c:301
...
The line the stack-trace is pointing to is this (it's not the same because we attempted some changes to fix this).
Having tracked the issue for a while, the only explanation I have is that, when closing the lmdb environment from a thread other than the main thread, there's some kind of race between this line and this line, where the latter deletes the key, calls the custom destructor mdb_env_reader_dest, which causes SIGSEGV according to the documentation of lmdb, when attempting to use resources after freeing them.
The question
The documentation of pthread_key_create and pthread_key_delete are ambiguous to me, personally, in the sense whether they're talking about the calls to pthread_key_create and pthread_key_delete or the data underneath the pointers. This is what the docs say:
The associated destructor functions used to free thread-specific data at thread exit time are only guaranteed to work correctly when called in the thread that allocated the thread-specific data.
So the question is, can we call mdb_env_open and mdb_env_close from two different threads, leading to pthread_key_create and pthread_key_delete from different threads, and expect correct behavior?
I couldn't find such a requirement in the lmdb documentation. The closest I could find is this, which doesn't seem to reference the mdb_env_open function.

Are we allowed to call pthread_key_create and pthread_key_delete from different threads?
Yes.
However, a key must be created via pthread_key_create before it can be used by any thread, and that is not, itself, inherently thread safe. The key creation is often synchronized by performing it before starting any (other) threads that will use the key, but there are other alternatives.
Similarly, a key must not be deleted before all threads are finished with it, and the deletion is not, itself, inherently thread safe. TSD keys often are not deleted at all, and when they are deleted, that is often synchronized by first joining all (other) threads that may use the key. But again, there are other alternatives.
The documentation of pthread_key_create and pthread_key_delete are
ambiguous to me, personally, in the sense whether they're talking
about the calls to pthread_key_create and pthread_key_delete or the
data underneath the pointers. This is what the docs say:
The associated destructor functions used to free thread-specific data
at thread exit time are only guaranteed to work correctly when called
in the thread that allocated the thread-specific data.
The destructor functions those docs are talking about are the ones passed as the second argument to pthread_key_create().
And note well that that text is drawn from the Rationale section of the docs, not the Description section. It is talking about why the TSD destructor functions are not called by pthread_key_delete(), not trying to explain what the function does. That particular point is that TSD destructor functions must run in each thread carrying non-NULL TSD, as opposed to in the thread that calls pthread_key_delete().
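A minimal sketch of the usual pattern, assuming the key is created before any worker thread starts and deleted only after all of them have been joined. The names free_tsd, worker and run_tsd_demo are hypothetical. It also illustrates the point from the Rationale: the destructor runs in each exiting thread that holds a non-NULL value, and pthread_key_delete() does not call it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define N_THREADS 4

static pthread_key_t key;
static atomic_int destructor_calls;

/* Runs in each exiting thread whose value for `key` is non-NULL. */
static void free_tsd(void *ptr) {
    free(ptr);
    atomic_fetch_add(&destructor_calls, 1);
}

static void *worker(void *arg) {
    (void)arg;
    int *value = malloc(sizeof *value);
    if (value != NULL) {
        *value = 1;
        if (pthread_setspecific(key, value) != 0)
            free(value);  /* not registered, so the destructor won't see it */
    }
    return NULL;  /* free_tsd runs as this thread exits */
}

/* Returns the number of destructor calls observed, or -1 on error. */
int run_tsd_demo(void) {
    pthread_t threads[N_THREADS];
    int started = 0;

    /* Create the key before any thread that uses it exists. */
    if (pthread_key_create(&key, free_tsd) != 0)
        return -1;

    for (; started < N_THREADS; started++) {
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    /* All users are finished; deleting the key does not call free_tsd. */
    pthread_key_delete(key);
    return atomic_load(&destructor_calls);
}
```

Here the create and delete calls happen to be in the same thread, but nothing in the pattern requires that; what matters is that they are ordered before the first use and after the last one.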
So the question is, can we call mdb_env_open and mdb_env_close from
two different threads, leading to pthread_key_create and
pthread_key_delete from different threads, and expect correct
behavior?
The library's use of thread-specific data does not imply otherwise. However, you seem to be suggesting that there is a race between two different lines in the same function, mdb_env_close0, which can be the case only if that function is called in parallel by two different threads. The MDB docs say of mdb_env_close() that "Only a single thread may call this function." I would guess that they mean that to be scoped to a particular MDB environment. In any case, if you really have the race you think you have, then it seems to me that your program must be calling mdb_env_close() from multiple threads, contrary to specification.
So, as far as I know or can tell, the thread that calls mdb_env_close() does not need to be the same one that called mdb_env_open() (or mdb_env_create()), but it does need to be the only one that calls it.

Call functions that require the main thread from different fibers

There are lots of functions that are supposed to be called from the main thread. In my limited experience, these are mostly UI functions.
Examples:
-[UIApplication delegate] must be called from main thread only
java.lang.IllegalStateException: Not on the main thread
Drawing to window from child thread
Suppose I have a fiber library that creates "threads" with set/get context. Is it safe to call main thread only functions from any fiber started from the main OS thread?
I think it is fine since the OS doesn't know about my fibers, but I'm not sure. I would test this, but the results would not be definitive, since it might appear to work while relying on undefined behavior.
Edit: marking this question C since set/get context are C functions, although as mentioned in the comments I think it may apply to programs written in other languages as well.

Yes, you can call any function in your program from any context. Note that getcontext and setcontext do not create real "threads", and you're not getting any parallel processing with them; you're only getting scheduling. That's why it will work, whether it's a UI function or not. It's basically just a goto that works across functions. To quote the manpage directly:
If the context was obtained by a call of getcontext(), program
execution continues as if this call just returned.
That means if I write
... code ...
getcontext(&cxt);
... code ...
setcontext(&cxt);
Then when I reach setcontext, the state that I go to is identical to when the function getcontext just returned. There is no perceivable difference (Of course, you may have changed memory values in the mean time, but that's beside the point). The manpage has a similar guarantee with makecontext, but with the note that it'll redirect you after the given function finishes execution.
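A small runnable sketch of that "cross-function goto" behaviour, assuming a glibc-style <ucontext.h>. The function name count_resumes is hypothetical. The counter is volatile so that its value survives the jump back to the getcontext() return point.

```c
#define _XOPEN_SOURCE 700
#include <ucontext.h>

/* Jumps back to the getcontext() return point until `limit` passes
 * have been made, then returns the number of passes. */
int count_resumes(int limit) {
    ucontext_t cxt;
    volatile int passes = 0;  /* volatile: must survive the jump */

    getcontext(&cxt);         /* execution resumes here after setcontext() */
    passes++;
    if (passes < limit)
        setcontext(&cxt);     /* does not return when it succeeds */
    return passes;
}
```

Everything here runs on the one OS thread that called count_resumes(); no second thread is ever created.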
The examples you give are in higher-level programming languages, which have a lot more complexity, and thus are not as simple as setcontext/getcontext in C. The Java error you posted seems to actually involve a distinct OS thread, and the same goes for the third example. The first example looks like it might be a fake thread, but of course there are hidden complexities which might prevent UI calls from working (since they interact with external APIs).
That's why threading in JS is so easy: because the threads aren't real. What you lose in parallel performance you gain in being able to call anything anywhere from your dispatched functions and ajax calls.
If you know your fiber library is really only using getcontext and setcontext, then you'll be fine. The library might do something else though, so it would be good to verify with the library writers in such a situation.

Glib hash table issues with signal handling code

I've got some system level code that fires timers every once in a while, and has a signal handler that manages these signals when they arrive. This works fine and seems completely reasonable. There are also two separate threads running alongside the main program, but they do not share any variables, but use glib's async queues to pass messages in one direction only.
The same code uses glib's GHashTable to store, well, key/value pairs. When the signal code is commented out of the system, the hash table appears to operate fine. When it is enabled, however, there is a strange race condition where the call to g_hash_table_lookup actually returns NULL (meaning that there is no entry with the key used to look it up), when indeed the entry is actually there (yes I made sure by printing the whole list of key/value pairs with g_hash_table_foreach). Why would this occur most of the time? Is GLib's hash table implementation buggy? Sometimes the lookup call is successful.
It's a very particular situation, and I can clarify further if it didn't make sense, but I'm hoping I am doing something wrong so that this can actually be fixed.
More info: The code segments that are not within the signal handler scope but access the g_hash_table variable are surrounded by signal blocking calls so that the signal handler does not access these variables when the process was originally accessing them too.

Generally, signal handlers can only set flags and make system calls
As it happens, there are severe restrictions in ISO C regarding what signal handlers can do. Most library entry points and most APIs are not even remotely 100% multi-thread-safe, and approximately 0.0% of them are signal-handler-safe. That is, there is an absolute prohibition against calling almost anything from a signal handler.
In particular, for GHashTable, g_hash_table_ref() and g_hash_table_unref() are the only API elements that are even thread-safe, and none of them are signal-handler safe. Actually, ISO-C only allows signal handlers to modify objects declared with volatile sig_atomic_t and only a couple of library routines may be called.
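A minimal sketch of the "only set a flag" pattern, using POSIX sigaction(). The names got_signal, on_signal and install_and_raise are hypothetical, and SIGUSR1 stands in for whatever signal the timers deliver. The handler touches nothing but a volatile sig_atomic_t; any hash table work would happen later, in normal context, once the flag is seen.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* The handler does nothing except record that the signal arrived. */
static void on_signal(int signo) {
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, raises the signal once, and reports whether the
 * flag was seen from normal (non-handler) context.
 * Returns 1 if seen, 0 if not, -1 on error. */
int install_and_raise(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    raise(SIGUSR1);  /* returns only after the handler has run */

    if (got_signal) {
        got_signal = 0;
        /* Normal context: this is where non-signal-safe work
         * (for example, hash table updates) would go. */
        return 1;
    }
    return 0;
}
```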
Some of us consider threaded systems to be intrinsically dangerous, practically radioactive sources of subtle bugs. A good place to start worrying is The Problem with Threads. (And note that signal handlers themselves are much worse. No one thinks an API is safe there...)

POSIX threads and global variables in C on Linux

If I have two threads and one global variable (one thread constantly loops to read the variable; the other constantly loops to write to it), would anything happen that shouldn't (e.g. exceptions, errors)? If it does, what is a way to prevent this? I was reading about mutex locks and that they allow exclusive access to a variable to one thread. Does this mean that only that thread can read and write to it and no other?

Would anything happen that shouldn't?
It depends in part on the type of the variables. If the variable is, say, a string (long array of characters), then if the writer and the reader access it at the same time, it is completely undefined what the reader will see.
This is why mutexes and other coordinating mechanisms are provided by pthreads.
Does this mean that only that thread can read and write to it and no other?
Mutexes ensure that at most one thread that is using the mutex can have permission to proceed. All other threads using the same mutex will be held up until the first thread releases the mutex. Therefore, if the code is written properly, at any time, only one thread will be able to access the variable. If the code is not written properly, then:
one thread might access the variable without checking that it has permission to do so
one thread might acquire the mutex and never release it
one thread might destroy the mutex without notifying the other
None of these is desirable behaviour, but the mere existence of a mutex does not prevent any of these happening.
Nevertheless, your code could reasonably use a mutex carefully and then the access to the global variable would be properly controlled. While it has permission via the mutex, either thread could modify the variable, or just read the variable. Either will be safe from interference by the other thread.
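A minimal sketch of that careful use, with one writer and one reader as in the question. The names shared_value, shared_lock and run_reader_writer are hypothetical. Every access to the global, read or write, happens with the mutex held.

```c
#include <pthread.h>

#define ITERATIONS 100000

static long shared_value = 0;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: every update happens with the mutex held. */
static void *writer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&shared_lock);
        shared_value++;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

/* Reader: every read also happens with the same mutex held. */
static void *reader(void *arg) {
    long *last_seen = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&shared_lock);
        *last_seen = shared_value;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the final value,
 * or -1 if a thread could not be created. */
long run_reader_writer(void) {
    pthread_t w, r;
    long last_seen = 0;

    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, &last_seen) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return shared_value;  /* safe: both threads have been joined */
}
```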

Does this mean that only that thread can read and write to it and no other?
It means that only one thread can read or write to the global variable at a time.
The two threads will not race with each other to access the global variable, nor will they access it at the same time.
In short the access to the global variable is Synchronized.

First: in C/C++, an unsynchronized read/write of a variable does not generate any exception or system error, BUT it can cause application-level errors, mostly because you are unlikely to fully understand how the memory is accessed, and whether the access is atomic, unless you look at the generated assembler. A multi-core CPU is likely to produce hard-to-debug race conditions when you access shared memory without synchronization.
Hence
Second: you should always use synchronization, such as mutex locks, when dealing with shared memory. A mutex lock is cheap, so it will not really impact performance if done right. Rule of thumb: hold the lock for as short a time as possible, such as just for the duration of reading/incrementing/writing the shared memory.
However, from your description, it sounds like one of your threads is doing nothing BUT waiting for the shared memory to change state before doing something. That is a bad multi-threaded design that burns CPU unnecessarily, so
Third: look at using semaphores (sem_init/sem_wait/sem_post) for synchronization between your threads if you are trying to send a "message" from one thread to the other.
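A minimal sketch of the semaphore approach, using the POSIX unnamed-semaphore calls sem_init(), sem_wait() and sem_post(). The names ready, message, consumer and send_one_message are hypothetical. The consumer sleeps inside sem_wait() instead of spinning on the shared variable.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;   /* counts messages that are ready to be read */
static int message;   /* the shared "mailbox" */

static void *consumer(void *arg) {
    int *received = arg;
    /* Sleep until the producer posts; retry if a signal interrupts us. */
    while (sem_wait(&ready) == -1 && errno == EINTR) {
    }
    *received = message;
    return NULL;
}

/* Sends `value` to a consumer thread and returns what it received,
 * or -1 if setup failed. */
int send_one_message(int value) {
    pthread_t thread;
    int received = -1;

    if (sem_init(&ready, 0, 0) != 0)  /* not process-shared, count 0 */
        return -1;
    if (pthread_create(&thread, NULL, consumer, &received) != 0) {
        sem_destroy(&ready);
        return -1;
    }

    message = value;   /* write the data first ... */
    sem_post(&ready);  /* ... then wake the consumer */

    pthread_join(thread, NULL);
    sem_destroy(&ready);
    return received;
}
```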

As others already said, when communicating between threads through "normal" objects you have to take care of race conditions. Besides mutexes and other lock structures that are relatively heavy weight, the new C standard (C11) provides atomic types and operations that are guaranteed to be race-free. Most modern processors provide instructions for such types and many modern compilers (in particular gcc on linux) already provide their proper interfaces for such operations.
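A minimal sketch of the C11 route, assuming a compiler that ships <stdatomic.h>. The names counter, incrementer and run_atomic_counter are hypothetical. Two threads update the same counter with no mutex, and no increments are lost.

```c
#include <pthread.h>
#include <stdatomic.h>

#define INCREMENTS 100000

static atomic_long counter;  /* zero-initialized, shared by both threads */

static void *incrementer(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        atomic_fetch_add(&counter, 1);  /* atomic read-modify-write */
    return NULL;
}

/* Runs two incrementing threads; returns the final count,
 * or -1 if a thread could not be created. */
long run_atomic_counter(void) {
    pthread_t a, b;

    if (pthread_create(&a, NULL, incrementer, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, incrementer, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&counter);
}
```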

If the threads truly are only one producer and only one consumer, then (barring compiler bugs)
1) marking the variable as volatile, and
2) making sure that it is correctly aligned, so as to avoid interleaved fetches and stores
will allow you to do this without locking.

Can I catch SIGSEGV and other signals in a multi-threaded (pthreads) app and print a backtrace of the thread that caused it, or all threads?

I saw Getting a backtrace of other thread but it didn't contain a lot of practical information.
What I want is to be able to catch SIGSEGV in a C multi-threaded app using POSIX threads running on Linux (CentOS, 2.6 kernel), and print the stack trace of the thread that caused it. Of course, not knowing which thread caused it, it's Good Enough For Me (tm) that the main thread that caught the signal to enumerate over all the threads and just print the stack trace of all of them.
It was noted over there that perhaps libunwind can be used for this, but its documentation is rather lacking and I couldn't find a good example of how to go about using it for this purpose. Also, I wondered if it has any significant performance overhead or other impact, and whether it is battle-tested and used in production code, or if it's mostly only used in debugging and development, and not in production systems.
Does anyone have sample code using libunwind or another reasonably straightforward (like not writing it in assembly) way to do this?

Getting the backtrace of the thread that caused the exception is easy, more or less:
Pass the -rdynamic flag to the linker
Then, in your code, register a signal handler, extract the EIP of the fault from the signal handler parameters, and then use it and the backtrace() function to get an array of the addresses.
Find some way to pass the data in the array outside your app (to a different process over a pipe, for example), and there you can use backtrace_symbols() to translate the backtrace to symbol names.
Make sure not to call any function that is not async-signal-safe in the signal handler: don't take any locks, allocate memory, or call any function that does.
Here are the slides to a presentation I gave on the subject: http://www.scribd.com/doc/3726406/Crash-N-Burn-Writing-Linux-application-fault-handlers
A video of the talk is also available somewhere, but I can't find it now...
Extending this to get the backtrace of multiple threads is possible but quite tricky: you need to keep track of your various threads and send signals to them in the event of a crash.
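The single-thread steps above can be sketched roughly as follows, assuming glibc's <execinfo.h>. This is a simplified variant rather than the exact method described: it does not extract the faulting EIP from the handler's context argument, and instead of shipping raw addresses to another process it writes them to stderr with backtrace_symbols_fd(), which the glibc manual describes as not calling malloc(). The names on_crash and install_crash_handler are hypothetical. Note that backtrace() itself is not formally async-signal-safe, so treat this as best effort.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Runs in the thread that faulted: dump its stack, then exit. */
static void on_crash(int signo, siginfo_t *info, void *context) {
    (void)info;
    (void)context;  /* the faulting instruction pointer would come from here */
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    _exit(128 + signo);
}

/* Installs the SIGSEGV handler; returns 0 on success, -1 on error. */
int install_crash_handler(void) {
    /* Call backtrace() once up front: its first call may load a shared
     * library, which is not something to do inside a signal handler. */
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_crash;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Linking with -rdynamic, as mentioned in the first step, lets the printed frames carry function names rather than bare addresses.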