Making a C library thread safe - c

I am writing a shared library in C. I know C functions are not thread safe by default.
My library routines look like this:
struct lib_handle {
....
};
int lib_init(struct lib_handle **handle);
int lib_process(struct lib_handle *handle);
....
....
Every function takes a pointer to a lib_handle object. All the state is stored inside this structure. No global variables are used.
I assume that if each thread creates its own lib_handle instance, multiple threads can use the library functions. Since each thread has its own handle, everything should work.
I haven't validated this assumption yet. I am wondering what you think about this design, and whether I can describe my library as thread safe given that each thread has its own handle.
Any help would be great!

That will make data/state of library thread safe.
But you also have to make sure that your library uses threadsafe functions from other libraries, e.g. use strtok_r instead of strtok.
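As a minimal sketch of that point, assuming a POSIX system: the helper below (count_tokens is a hypothetical name, not part of the asker's library) tokenizes with strtok_r, which keeps its parsing position in a caller-supplied pointer instead of hidden static storage.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r under strict -std=c11 */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts whitespace-separated tokens in `text`.
 * Like strtok, strtok_r writes NUL bytes into `text`. */
int count_tokens(char *text)
{
    char *save = NULL;   /* parser position lives here, per call */
    int n = 0;

    assert(text != NULL);
    for (char *tok = strtok_r(text, " \t\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save)) {
        n++;
    }
    return n;
}
```

Because the only state is the local `save` pointer, two threads can run count_tokens on different buffers at the same time without interfering.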

Threads work in a shared memory space. Unsafe objects are those that can be accessed by multiple threads simultaneously. So if you have a separate lib_handle object for each thread, there will be no problems.
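A rough sketch of the one-handle-per-thread pattern, assuming pthreads. The `calls` field, the function bodies and lib_free are hypothetical stand-ins, since the question elides the real contents of lib_handle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical contents: the question does not show the real fields. */
typedef struct lib_handle {
    long calls;              /* all state lives in the handle, no globals */
} lib_handle;

int lib_init(lib_handle **handle)
{
    *handle = calloc(1, sizeof **handle);
    return *handle ? 0 : -1;
}

int lib_process(lib_handle *handle)
{
    assert(handle != NULL);
    handle->calls++;         /* touches only this handle's state */
    return 0;
}

/* Hypothetical cleanup routine, not named in the question. */
void lib_free(lib_handle *handle) { free(handle); }

static void *worker(void *arg)
{
    long *result = arg;
    lib_handle *h;           /* private to this thread */

    if (lib_init(&h) != 0)
        return NULL;
    for (int i = 0; i < 100000; i++)
        lib_process(h);
    *result = h->calls;
    lib_free(h);
    return NULL;
}

/* Runs two workers, each with its own handle.
 * Returns 0 if neither lost an update, -1 otherwise. */
int run_private_handles(void)
{
    pthread_t t[2];
    long results[2] = {0, 0};
    int created = 0;

    while (created < 2 &&
           pthread_create(&t[created], NULL, worker, &results[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return (created == 2 && results[0] == 100000 && results[1] == 100000)
               ? 0 : -1;
}
```

No locking is needed here because no two threads ever touch the same handle.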

If each thread has a private lib_handle object, your library should be fully thread safe. If you let several threads share lib_handle objects, the person using your library can still write a thread-safe program if she uses your library correctly (i.e. your library is not inherently thread-unsafe, which it would be if you used e.g. global variables).
If this mode of operation (shared lib_handle) is interesting, you should clearly separate the functions which only read the state of lib_handle from those which modify it. The former need a read lock and the latter need a write lock (the calling scope must handle this).
For what it is worth I have used the pattern you describe quite a lot, and like it.
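The shared-handle mode described above might look roughly like this, assuming POSIX read-write locks. lib_get, lib_add and the `value` field are hypothetical; the point is only that the caller, not the library, owns the lock.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes pthread_rwlock_* under strict -std=c11 */
#include <assert.h>
#include <pthread.h>

typedef struct lib_handle { long value; } lib_handle;  /* hypothetical field */

/* Hypothetical library routines: one only reads, one modifies. */
long lib_get(const lib_handle *h) { assert(h != NULL); return h->value; }
void lib_add(lib_handle *h, long d) { assert(h != NULL); h->value += d; }

/* Caller-owned: a single shared handle plus the lock that guards it. */
static lib_handle shared;
static pthread_rwlock_t shared_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 50000; i++) {
        pthread_rwlock_wrlock(&shared_lock);   /* exclusive access */
        lib_add(&shared, 1);
        pthread_rwlock_unlock(&shared_lock);
    }
    return NULL;
}

static void *reader(void *arg)
{
    long *last_seen = arg;
    for (int i = 0; i < 50000; i++) {
        pthread_rwlock_rdlock(&shared_lock);   /* readers may overlap */
        *last_seen = lib_get(&shared);
        pthread_rwlock_unlock(&shared_lock);
    }
    return NULL;
}

/* Two writers and one reader share the handle.
 * Returns the final value, or -1 if a thread could not be started. */
long run_shared_handle(void)
{
    void *(*fn[3])(void *) = { writer, writer, reader };
    pthread_t t[3];
    long last_seen = 0;
    int created = 0;

    shared.value = 0;
    while (created < 3 &&
           pthread_create(&t[created], NULL, fn[created], &last_seen) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == 3 ? shared.value : -1;
}
```

With both writers taking the write lock, no increment is lost, so the final value equals the total number of lib_add calls.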

Related

Initializing a thread local variable

Is there a mechanism for a library supplying a thread local variable to register a constructor function for it?
I'd like to have my library supply a thread-local struct which should be initialized on thread creation with dynamically obtained data.
If the struct was just global but not thread-local, I'd have a function marked with gcc's __attribute__((__constructor__)) initialize it, but these constructors don't retrigger when a new thread is created.
No, thread creation does not invoke any constructors. This is a good thing; automatic invocation of constructors would not scale in a potentially large application where most threads have nothing to do with your library code and will never call it.
Instead, you need to either have your library code that uses the thread-local object construct it lazily on the first library call in the new thread, or require the calling application to call an initialization function explicitly in threads that will use it. The first option is generally a lot better and the performance impact should not even be measurable; accessing thread-local storage in a library takes longer than a predictable branch:
static _Thread_local int init_done;
if (!init_done) ...
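A fuller sketch of the lazy approach, assuming C11 _Thread_local and pthreads. The names thread_ctx, get_ctx and lib_thread_id are hypothetical, and the "dynamically obtained data" is simulated with a per-thread id handed out under a mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-thread state; zero-initialized in every new thread. */
struct thread_ctx {
    int init_done;
    int init_runs;   /* how many times initialization ran in this thread */
    int id;          /* stands in for dynamically obtained data */
};

static _Thread_local struct thread_ctx ctx;

static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_id = 1;

/* Every library entry point fetches the state through this accessor,
 * so the first call made by a thread performs the initialization. */
static struct thread_ctx *get_ctx(void)
{
    if (!ctx.init_done) {
        pthread_mutex_lock(&id_lock);
        ctx.id = next_id++;
        pthread_mutex_unlock(&id_lock);
        ctx.init_runs++;
        ctx.init_done = 1;
    }
    return &ctx;
}

/* Hypothetical library call that uses the thread-local state. */
int lib_thread_id(void) { return get_ctx()->id; }

static void *worker(void *arg)
{
    int *out = arg;
    int first = lib_thread_id();
    int second = lib_thread_id();

    assert(out != NULL);
    /* Report the id only if it was stable and initialized exactly once. */
    *out = (first == second && get_ctx()->init_runs == 1) ? first : -1;
    return NULL;
}

/* Returns 0 if two threads each got their own, distinct state. */
int run_lazy_init(void)
{
    pthread_t t[2];
    int ids[2] = {-1, -1};
    int created = 0;

    while (created < 2 &&
           pthread_create(&t[created], NULL, worker, &ids[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return (created == 2 && ids[0] > 0 && ids[1] > 0 && ids[0] != ids[1])
               ? 0 : -1;
}
```

Threads that never call into the library never pay for the initialization, which is the scaling argument made above.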

Is shared library local variable thread safe?

I'm developing a multi-threaded application which will access a shared library. I see that the shared library doesn't contain any global variables, so does that mean the library is thread safe? For example:
I'm calling function func() from various threads to a shared library like:
thread 1 -> func()
thread 2 -> func()
...
thread N ->func()
and the func() is defined as below,
void func(){
int var;
func2(&var);
}
In this cases, will it be thread safe?
The usage that you are showing is thread-safe, because invocations of func from each thread will have their own copy of the variable var.
This is not a guarantee, though, for several reasons:
The library needs to be careful about its use of static variables as well. If you replace int var with static int var, func would no longer be thread-safe.
You need to be careful about calling the library. If the same pattern that you show is present in your code, i.e. if your code shares a local variable among threads, the code would not be thread-safe.
The library may use functions that are not thread-safe, such as strtok. Using these functions makes your library not thread-safe.
Yes, the code in question will execute in the context of each thread, and the local automatic variable will typically be stored on each thread's stack.
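To illustrate the automatic-variable case, here is a small sketch using pthreads. The body of func2 and the extra `input` parameter are assumptions added so the result can be checked; the question shows neither.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical body for func2; the question does not show it. */
static void func2(int *out, int input) { *out = input * 2; }

int func(int input)
{
    int var;                 /* automatic: a fresh copy for every call */
    func2(&var, input);
    return var;
    /* With `static int var;` there would be one copy shared by every
     * thread, and concurrent calls would race on it. */
}

static void *worker(void *arg)
{
    int *slot = arg;         /* in: input value, out: 1 if every call agreed */
    assert(slot != NULL);
    int input = *slot;
    int ok = 1;

    for (int i = 0; i < 100000; i++)
        if (func(input) != input * 2)
            ok = 0;
    *slot = ok;
    return NULL;
}

/* Calls func concurrently from four threads; returns 0 if every call
 * returned the value expected for its own input. */
int run_func_threads(void)
{
    enum { N = 4 };
    pthread_t t[N];
    int slots[N];
    int created = 0;

    for (int i = 0; i < N; i++)
        slots[i] = i + 1;
    while (created < N &&
           pthread_create(&t[created], NULL, worker, &slots[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    if (created != N)
        return -1;
    for (int i = 0; i < N; i++)
        if (slots[i] != 1)
            return -1;
    return 0;
}
```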

non-thread safe libraries and threads

Using non-thread-safe libraries with threads. Say I have a library that makes a connection to a server, and it is not thread safe. Can I initialize the library inside 2 threads?
ie:
thread_1(){
telnet_lib_t *connection1;
while(1){
do_somestuff;
}
free_telnet(connection1);
}
thread_2(){
telnet_lib_t *connection2;
while(1){
do_somestuff;
}
free_telnet(connection2);
}
Would this work? Now I have 2 independent instances of the library running. So they would not interfere with each other, right?
No, you can't do this. If the library has no global state and its functions are just internally non-thread-safe, you could solve the problem by having the whole library protected by a mutex and only allowing one thread to access it at once (but this might be prohibitive, especially if the library performs any slow or blocking tasks). But if the library fundamentally has a singular global state it uses, and no way to save/restore/swap states, then there's simply no way to use it in multiple threads (or even to use multiple contexts in alternation in a non-threaded program). Such libraries are generally considered trash and should be replaced.
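A sketch of the mutex-around-the-whole-library idea, assuming pthreads. legacy_next_id is a hypothetical stand-in for a non-thread-safe routine; the approach only works if every call into the library goes through the wrapper.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for a non-thread-safe library routine (hypothetical): it
 * updates hidden static state with no locking of its own. */
static long legacy_next_id(void)
{
    static long counter;
    return ++counter;
}

static pthread_mutex_t legacy_lock = PTHREAD_MUTEX_INITIALIZER;

/* The only way the rest of the program reaches the library. */
long safe_next_id(void)
{
    int rc = pthread_mutex_lock(&legacy_lock);
    assert(rc == 0);
    (void)rc;
    long id = legacy_next_id();
    pthread_mutex_unlock(&legacy_lock);
    return id;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        safe_next_id();
    return NULL;
}

/* Two threads make 100000 calls each; returns the next id handed out
 * afterwards, or -1 if a thread could not be started. */
long run_serialized(void)
{
    pthread_t t[2];
    int created = 0;

    while (created < 2 &&
           pthread_create(&t[created], NULL, worker, NULL) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == 2 ? safe_next_id() : -1;
}
```

The cost is that only one thread is inside the library at a time, which is why this becomes prohibitive when the library blocks.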
It depends on why it isn't threadsafe.
For example, if the library uses some static variable then that would still be shared between two threads.
So, generally this would be a bad idea. If something isn't thread safe, don't use it in threads. You could, however, fork a child process and use it there, which is heavier than threads but safer.
Without knowing more about the specifics of the non-thread-safe library, it's not possible to say it's safe to use as you suggest.
If the library has any global shared resource (e.g. a global variable), the two threads could well step on each other, overwriting that global variable in a manner not intended by the library writer.
The trouble is, no amount of testing can prove with certainty that you will not eventually trigger a conflict.
If you must use the library in parallel, the only safe way I can think of to do that is to use process isolation... create multiple child processes that each load an instance of the library.
You can do that only if you know that the telnet_lib_t and its methods don't work with any global state (i.e. they are not relying on global variables). If you know that the library's state is contained within itself, for sure then use it, otherwise don't do it. Even if you don't run into any issue during your testing, it won't mean there isn't a problem lurking somewhere.
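The process-isolation route suggested in the answers above could be sketched as follows on a POSIX system. legacy_session and session_state are hypothetical stand-ins for the library and its hidden global state; the result travels back through the exit status only to keep the example short, and a real program would more likely use a pipe or socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the library's hidden global state (hypothetical). */
static int session_state;

/* Stand-in for a library call that depends on that global state. */
static int legacy_session(int input)
{
    session_state = input;
    session_state += 1;
    return session_state;
}

/* Runs the library call in a child process, so the child's copy of the
 * global state cannot collide with any other session.
 * Returns the result (0..254 in this sketch) or -1 on failure. */
int run_isolated(int input)
{
    assert(input >= 0 && input < 255);   /* must fit in an exit status */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(legacy_session(input));    /* child: own copy of the globals */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Each child gets a private copy of session_state, so the parent's copy is never modified. Forking from a process that already has several threads has its own pitfalls, so it is usually simplest to start the children before creating any threads.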

Modifying a thread's data from outside the thread

How does one modify a thread's data from outside the thread?
If a thread is running a function that loops for the runtime of the application, how can its data be set or changed?
How does one call functions which modify a specific thread's data?
Where do these functions belong?
The advantage and disadvantage of threads is that they share the memory space with every other thread in the process. You can use any form of data transfer you would use in single-threaded applications to pass data between segments of your application. However, in a multi-threaded application you must use some type of synchronization to ensure data integrity and prevent data races.
If the "thread's data" you want to modify from outside is in the form of local variables in a function running in the thread, or thread-specific data created with the __thread extension, then the only way you can modify them from outside (modulo code with UB that's technically just trashing memory) is by having the thread take the addresses of its variables and store them somewhere other threads can see them (either in a global variable, or at a location passed in via the thread start function's void * argument).
Also note that, as rerun pointed out, you have to use some method of synchronization if multiple threads are accessing the same data. On POSIX systems the portable synchronization methods are the pthread ones: pthread_mutex_lock etc. (C11 also defines <threads.h> and <stdatomic.h>), but you can also use assembly or compiler intrinsics (like __sync_* in gcc).
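One way this could look, assuming pthreads: the data lives in a struct whose address is passed through the void * argument, and both sides take the embedded mutex before touching it. The struct and field names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical block of data shared between the worker and its creator. */
struct worker_data {
    pthread_mutex_t lock;
    int setting;      /* written from outside the thread */
    int stop;         /* set from outside to end the loop */
    int last_seen;    /* last value of `setting` the worker observed */
};

static void *worker(void *arg)
{
    struct worker_data *d = arg;
    int stop;

    assert(d != NULL);
    do {
        pthread_mutex_lock(&d->lock);
        d->last_seen = d->setting;
        stop = d->stop;
        pthread_mutex_unlock(&d->lock);
        /* A real loop would do its work here, or sleep on a
         * condition variable instead of spinning. */
    } while (!stop);
    return NULL;
}

/* Starts a looping worker, changes its data from the calling thread,
 * then stops it. Returns the last setting the worker saw, or -1. */
int run_outside_update(void)
{
    struct worker_data d = { .setting = 1, .stop = 0, .last_seen = 0 };
    pthread_t t;

    if (pthread_mutex_init(&d.lock, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, &d) != 0) {
        pthread_mutex_destroy(&d.lock);
        return -1;
    }

    pthread_mutex_lock(&d.lock);     /* modify the thread's data */
    d.setting = 42;
    d.stop = 1;
    pthread_mutex_unlock(&d.lock);

    pthread_join(t, NULL);
    pthread_mutex_destroy(&d.lock);
    return d.last_seen;
}
```

Because `setting` and `stop` are changed inside one critical section, the iteration that observes the stop request also observes the new setting.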

Can we lock a function with a pthreads mutex for all its other calls?

Say a program spawns a thread. That thread calls func1(). However, func1() is also called in various places elsewhere in the main app. If I wrap the call in a mutex lock in the thread only, will it be safe for the whole app? Or does the locking have to go inside func1() itself? And if func1() calls other functions that are also called from various places in the main app, does one have to go through them recursively and lock them too?
Get out of the habit of thinking that you protect functions with mutexes, you don't.
You actually protect resources such as variables shared amongst threads.
Once you accept that little pearl of wisdom, you start thinking in terms of what data has to be protected and can minimise the granularity of the protections.
For example, if func1() and func2() both access the shared variable x, and you can call func2() either from func1() or main(), you're going to have to engineer a solution that can detect if the mutex is already locked so that func2() can claim/release (when called from main) or do nothing (when called from func1()). Either that or use a recursive mutex.
Functions which are thread-unsafe (such as using static data areas) can be protected with mutexes but I find it's usually easier to refactor them so that they're inherently thread-safe (with allocated memory or thread-local storage).
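A sketch of the recursive-mutex option mentioned above, assuming POSIX threads. The lock protects the variable x rather than any particular function, and func2 can be called either on its own or from inside func1 while the lock is already held. The amounts added to x are arbitrary.

```c
#define _XOPEN_SOURCE 700  /* exposes PTHREAD_MUTEX_RECURSIVE under strict -std=c11 */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t x_lock;   /* guards x, whichever function touches it */
static int x;

/* Creates x_lock as a recursive mutex. Returns 0 on success. */
int init_x_lock(void)
{
    pthread_mutexattr_t attr;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&x_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc == 0 ? 0 : -1;
}

void func2(void)
{
    pthread_mutex_lock(&x_lock);     /* fine even if this thread holds it */
    x += 1;
    pthread_mutex_unlock(&x_lock);
}

void func1(void)
{
    pthread_mutex_lock(&x_lock);
    x += 10;
    func2();                         /* re-locks the same mutex */
    pthread_mutex_unlock(&x_lock);
}

/* Calls func1 (which calls func2) and then func2 directly.
 * Returns the resulting value of x, or -1 if setup failed. */
int run_recursive_demo(void)
{
    if (init_x_lock() != 0)
        return -1;
    x = 0;
    func1();
    func2();
    int result = x;
    pthread_mutex_destroy(&x_lock);
    assert(result == 12);
    return result;
}
```

With a default (non-recursive) mutex, the nested lock in func2 would deadlock or be undefined behaviour, which is the problem the answer describes.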
You only need to lock shared resources, or anything not thread-local. You should also consider writing your functions to be reentrant whenever possible. Reentrant functions are inherently thread-safe, whereas not all thread-safe functions can be made reentrant.
As long as you declare static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; in the function and use it, you can accomplish what you want. But making functions which are not reentrant, which have global state, etc. is generally a Bad Thing(tm). Good design is to lock data, and not to have globals (or singletons which is a euphemism for global variables).

Resources