Using thread-safe libraries - c

I can think of two ways a thread-safe library can be used:
One is having a global instance of the library protected by a mutex, which is initialised by the main thread and used by worker threads, like so:
mutex g_lib_mutex;
lib_t g_lib;
thread:
lock(&g_lib_mutex);
/* use lib */
unlock(&g_lib_mutex);
main:
lib_init(&g_lib);
start_threads(thread);
lock(&g_lib_mutex);
/* use lib */
unlock(&g_lib_mutex);
join_threads();
lib_close(&g_lib);
The other is for every thread to have a local instance of the library, something like this:
thread:
lib_t g_lib;
lib_init(&g_lib);
/* use lib */
lib_close(&g_lib);
main:
start_threads(thread);
lib_t g_lib;
lib_init(&g_lib);
/* use lib */
lib_close(&g_lib);
Which of these ways is more correct / preferable?
Do I need to protect library calls with a global mutex in both cases?
I was trying to use libmysql and POSIX message queues in a multi-threaded application when this question crossed my mind.

Generally, initialize a library only once. Remember, all threads run in the same process's memory space, so whatever thread X does to a global variable is visible to all threads. Library initialization should happen only once per process.
Now, whether library calls are thread-safe or must be protected by mutexes depends on your library. Modern libraries should have definite documentation on which functions you're allowed to call from multiple threads. If that information is missing, you can either
assume the worst and guard everything that calls into the library, or changes anything the library deals with, with a single global mutex, or
read the source code of the library to figure out what might go wrong where, introduce safety measures (mutexes/condition variables) accordingly, and make sure that no one uses a different version of the library (where things might be different), or
improve the documentation and send that patch to the upstream developers, asking them to verify that what you document about thread (un)safety is intentional and matches reality (documentation patches are, for any project that I know of, always welcome), or
modify the library itself to be thread-safe (making yourself a hero).
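As a rough sketch of the "assume the worst" option, every call into the library can be funnelled through one wrapper that holds a single global mutex. The library here is a hypothetical stand-in: lib_t, lib_init, lib_use and lib_close are invented for illustration and are not the API of libmysql or any real library. POSIX threads are assumed.

```c
#include <pthread.h>

/* Hypothetical, non-thread-safe library: a stand-in for illustration only. */
typedef struct { long calls; } lib_t;
static void lib_init(lib_t *lib)  { lib->calls = 0; }
static void lib_use(lib_t *lib)   { lib->calls++; }  /* unsynchronised update */
static void lib_close(lib_t *lib) { (void)lib; }

static lib_t g_lib;  /* one instance per process */
static pthread_mutex_t g_lib_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Every call into the library goes through this wrapper. */
static void locked_lib_use(void)
{
    pthread_mutex_lock(&g_lib_mutex);
    lib_use(&g_lib);
    pthread_mutex_unlock(&g_lib_mutex);
}

static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        locked_lib_use();
    return NULL;
}

/* Initialise once, run up to 16 workers making n calls each,
 * and return the number of calls the library recorded. */
long run_workers(int nthreads, int n)
{
    pthread_t tid[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    lib_init(&g_lib);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, worker, &n) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    long total = g_lib.calls;
    lib_close(&g_lib);
    return total;
}
```

With the mutex in place, run_workers(4, 10000) should report 40000 calls; without it, the unsynchronised increment in the stand-in library could lose updates.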

Related

Calling a function from inside a pthread?

If I have a threaded function, and from inside it I call another function (that is located in a separate .c file), does each instance of the threaded function get its own version of the called function? Example:
file 1
void* threaded_func(void* data) {
function2(...);
}
file 2
int a;
int function2(...) {
int b;
}
In my example, would int a be the same int a for all threads? What about for variables I define inside function2 (i.e int b)?
Some context:
I'm creating a simple HTTP server and my function2 is a serve_client function which should only return when the connection is closed. I'm having a problem that I think is due to some variables in file2 (int a) being overwritten for each new thread. I think my problem is me expecting a C source file to behave like a class would in Java.
"does each instance of the threaded function get its own version of the called function?"
Loosely speaking, that's correct, provided the function doesn't use any global variables (e.g., int a; in your example) or any variable with static storage duration. If a is shared by multiple threads, that's likely to be the source of your problem.
If a needs to be shared among threads, then you need to synchronize access to it. If a needs to be unique to each thread, you need thread-local storage, e.g., C11's _Thread_local or gcc's __thread.
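To make the three cases concrete, here is a minimal sketch assuming a C11 compiler and POSIX threads. The names shared_counter, per_thread_counter and run_two_threads are made up for the example; function2 and threaded_func mirror the question.

```c
#include <pthread.h>

int shared_counter = 0;                    /* global: one copy for all threads */
_Thread_local int per_thread_counter = 0;  /* thread-local: one copy per thread */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of function2 from the question. */
int function2(void)
{
    int b = 0;  /* automatic: a fresh copy on every call, in every thread */
    b++;
    per_thread_counter++;  /* no lock needed, only this thread sees it */
    pthread_mutex_lock(&shared_lock);
    shared_counter++;      /* shared, so access is synchronised */
    pthread_mutex_unlock(&shared_lock);
    return b;
}

static void *threaded_func(void *data)
{
    int *seen = data;
    for (int i = 0; i < 1000; i++)
        function2();
    *seen = per_thread_counter;  /* this thread's own total */
    return NULL;
}

/* Returns 1 if each thread counted 1000 privately and 2000 were counted in total. */
int run_two_threads(void)
{
    pthread_t t1, t2;
    int seen1 = 0, seen2 = 0;

    shared_counter = 0;
    if (pthread_create(&t1, NULL, threaded_func, &seen1) != 0)
        return 0;
    if (pthread_create(&t2, NULL, threaded_func, &seen2) != 0) {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return seen1 == 1000 && seen2 == 1000 && shared_counter == 2000;
}
```

The code of function2 exists once, but b lives on each caller's stack, per_thread_counter exists once per thread, and shared_counter exists once per process.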
A C function is just machine code, and can be run by several threads (or processes). What matters is the data used by that function. C doesn't have closures (you can emulate them with callbacks, that is, functions taking additional client data). As answered by usr, read also about thread storage duration with thread_local. Each thread has its own call stack.
You should first read some POSIX thread tutorial.
You need to care about synchronization, notably using mutexes (locks that serialize access to global shared variables) and semaphores. Read also about ACID transactions. Sometimes you can use atomic types and atomic operations.
BTW, you might want to use some HTTP server library like libonion.
You should look into the source code of existing free software projects (e.g. on github).
On Linux, read also about pthreads(7), nptl(7) and futex(7) and sem_overview(7) and clone(2).

Why thread specific data is required in pthread?

All the threads share the same memory. For example, a global variable changed in one thread is seen as changed in another thread. Since each thread has its own stack, the local variables created inside a thread are unique to it. In that case, why do we need a thread-specific data mechanism? Can't the same be achieved with auto storage variables inside the thread function?
Kindly clarify.
BR
Rj
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
unsigned int seed = (unsigned int) time (NULL);
int r = rand_r (&seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
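that relies on the implementation keeping a thread-specific seed in thread-local storage (the C standard permits this but does not require rand to be thread-safe, so check your platform). The same applies to the information strtok keeps about its position within the string it's processing.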
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function, but how is the rand function going to know about its address without it being passed down? And what if rand is called 87 stack levels down? That's an awful lot of levels to be transferring a pointer through.
And, even if you do something like:
void *pthread_fn (void *unused) {
int seed;
rand_set_seed_location (&seed);
:
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
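For illustration, here is a minimal sketch of a generator that keeps its seed in thread-local storage, so no caller has to pass a seed around. my_srand and my_rand are hypothetical names, not the C library's srand and rand, and a C11 compiler with _Thread_local plus POSIX threads is assumed.

```c
#include <pthread.h>

/* Each thread gets its own copy of the seed. */
static _Thread_local unsigned int tls_seed = 1;

void my_srand(unsigned int seed) { tls_seed = seed; }

int my_rand(void)
{
    /* Linear congruential step, using the constants from the
     * sample rand implementation shown in the C standard. */
    tls_seed = tls_seed * 1103515245u + 12345u;
    return (int)((tls_seed / 65536u) % 32768u);
}

struct seq { unsigned int seed; int out[3]; };

static void *fill_seq(void *arg)
{
    struct seq *s = arg;
    my_srand(s->seed);  /* affects only the calling thread */
    for (int i = 0; i < 3; i++)
        s->out[i] = my_rand();
    return NULL;
}

/* Returns 1 when two concurrent threads seeded alike produce the same
 * sequence, i.e. neither disturbed the other's generator state. */
int sequences_match(unsigned int seed)
{
    struct seq a = { seed, {0} }, b = { seed, {0} };
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, fill_seq, &a) != 0)
        return 0;
    if (pthread_create(&tb, NULL, fill_seq, &b) != 0) {
        pthread_join(ta, NULL);
        return 0;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    for (int i = 0; i < 3; i++)
        if (a.out[i] != b.out[i])
            return 0;
    return 1;
}
```

The call sites look exactly like single-threaded code, which is the point being made above: the per-thread state is found by the function itself, however deep in the call stack it runs.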
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example of thread-specific data is errno. When a call to some function in the C library fails, errno is set, and you can check it to find the reason for the failure. Without thread-specific data, it would be impossible to port these functions to a multi-threaded environment, because errno could be set by another thread before you check it.
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
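A small sketch of the gmtime_r style, where the caller owns the result buffer, might look like this. utc_year is a hypothetical helper written for this example; gmtime_r itself is the POSIX function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declaration of gmtime_r */
#endif
#include <time.h>

/* Returns the UTC calendar year for t, or -1 on failure.
 * The broken-down time is written into a struct tm owned by this call,
 * so no other thread (or library) can overwrite it. */
int utc_year(time_t t)
{
    struct tm result;

    if (gmtime_r(&t, &result) == NULL)
        return -1;
    return result.tm_year + 1900;
}
```

Because the buffer is supplied by the caller, the function needs neither a static buffer nor thread-specific data behind the scenes.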
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.

Information Exchange between two threads by calling a shared DLL

Can you create a "conversation" (or-Information Exchange) between 2 threads, if those two threads are calling a shared DLL library? And, if this conversation is possible, What are the requirements or restrictions for it to actually take place between the threads?
This question was given to us by our professor. I can only assume, by the question's context, that my professor is referring to synchronization required between the two threads for the conversation to succeed, or restricting the DLL linking type (Implicit or Explicit).
Then again, assumptions or not, I am rather at a loss here :)
P.s. - In this case, we are programming in C.
Thanks in advance for your help :)
It appears that your professor is testing your understanding of what space DLLs are loaded into, and how this relates to threads.
Without doing your homework for you, I encourage you to consider what happens if two threads each call LoadLibrary() on a particular DLL. Is the DLL loaded into the process twice?
Given the result of the above, what implications does this have regarding the two threads making calls into that DLL?
Did you think about using Boost.Interprocess, because C++ has many implicit allocations. In general you need a system-wide mutex in order to synchronize access to that portion of memory.
I think that if each thread calls LoadLibrary(), the system will allocate a different memory segment for the DLL each time, so the threads will have no mutual resource to work with and will be unable to exchange any information.
but...
Say we link explicitly to the DLL using #pragma comment(lib, "myDLL.lib")
I think that in this way you'll be able to share resources between threads because the DLL is fully loaded at the program startup.
Jeff? .. is this right ?...

Tips to write thread-safe UNIX code?

What are the guidelines to write thread-safe UNIX code in C and C++?
I know only a few:
Don't use globals
Don't use static local storage
What others are there?
The simple thing to do is read a little. The following list contains some stuff to look at and research.
Spend time reading the Open Group Base Specification, particularly the General Information section and the subsection on threads. This is the basic information for multithreading under most UN*X-alike systems.
Learn the difference between a mutex and a semaphore
Realize that everything that is shared MUST be protected. This applies to global variables, static variables, and any shared dynamically allocated memory.
Replace global state flags with condition variables. These are implemented using pthread_cond_init and related functions.
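One way to read the last point: the flag itself can stay as the predicate, but threads wait on a condition variable instead of polling it. The sketch below assumes POSIX threads and uses static initialisers rather than pthread_cond_init; ready, payload and wait_for_value are illustrative names.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;  /* the predicate, only touched under ready_lock */
static int  payload = 0;

static void *producer(void *arg)
{
    pthread_mutex_lock(&ready_lock);
    payload = *(int *)arg;
    ready = true;
    pthread_cond_signal(&ready_cond);  /* wake the waiter */
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Starts a producer thread, blocks until it has published value,
 * and returns what was received (-1 if the thread could not start). */
int wait_for_value(int value)
{
    pthread_t tid;
    int received;

    pthread_mutex_lock(&ready_lock);
    ready = false;
    pthread_mutex_unlock(&ready_lock);

    if (pthread_create(&tid, NULL, producer, &value) != 0)
        return -1;

    pthread_mutex_lock(&ready_lock);
    while (!ready)  /* the loop guards against spurious wakeups */
        pthread_cond_wait(&ready_cond, &ready_lock);
    received = payload;
    pthread_mutex_unlock(&ready_lock);

    pthread_join(tid, NULL);
    return received;
}
```

pthread_cond_wait releases the mutex while sleeping and reacquires it before returning, so the waiter never spins on the flag and never reads it unprotected.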
Once you understand the basics, learn about the common problems so that you can identify them when they occur:
Lock inversion deadlocks
Priority inversion - if you are interested in a real life scenario, then read this snippet about the Mars Pathfinder
It really comes down to shared state; globals and static locals are examples of shared state. If you don't share state, you won't have a problem. Other examples of shared state include multiple threads writing to a file or socket.
Any shared resource will need to be managed properly - that might mean making something mutex protected, opening another file, or intelligently serializing requests.
If two threads are reading and writing from the same struct, you'll need to handle that case.
Beware of the sem_t functions: a blocking call such as sem_wait may return early, with errno set to EINTR, when a signal (SIGCHLD, I/O signals, etc.) is handled. If you use them, be sure to always handle that case.
The pthread_mutex_t and pthread_cond_t functions are safe with respect to EINTR.
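A common way to handle an interrupted semaphore wait is a retry loop around sem_wait. The helper name sem_wait_retry is made up for this sketch; sem_wait and EINTR are the standard POSIX names.

```c
#include <errno.h>
#include <semaphore.h>

/* Waits on sem, retrying whenever a signal handler interrupts the wait.
 * Returns 0 on success, or -1 with errno set for any other error. */
int sem_wait_retry(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

Whether retrying is the right reaction depends on the program; some designs deliberately use the interruption to check a shutdown condition before waiting again.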
A good open book about concurrency in general can be found here: Little Book of Semaphores
It presents various problems that are solved step by step and includes solutions to common concurrency issues like starvation, race conditions etc.
It is not language-specific but contains short chapters about implementing the solutions in C with the Pthread-Library or Python.

example of thread specific data in C

Does anybody know of (or can post) an example of the use of thread-specific data? I'm looking for something that's clearly explained and easy to understand. I've got a global char * variable that I'm wanting to share between a couple threads, and I'm thinking this is what the thread specific data mechanism in C is for, am I right?
I'm a Linux user!
Actually, thread-specific data is for when you DON'T want to share data between threads -- with thread-specific data, each thread can use the same variable name, but that variable refers to distinct storage.
With gcc, you can declare a variable as thread-specific using the __thread storage-class keyword. If you are only trying to make a primitive type thread-specific, and you are only dealing with Linux and GCC, then this is a possible solution. If you actually want to be portable, though, between various unices (a desirable goal), or if you want to make complex data types thread-specific, then you need to use the UNIX routines for that...
The way it works in UNIX is that you use pthread_key_create before any thread is spawned, in order to create a unique variable name. You then use pthread_setspecific and pthread_getspecific to modify/access the data associated with the key. The semantics of the set/get specific functions is that the key behaves as an index into a map, where each thread has its own map, so executing these routines from different threads causes different data to be accessed/modified. If you can use a map, you can use thread-specific storage.
Obviously, when you are done, you need to call the appropriate routines to clean up the data. You can use pthread_cleanup_push to schedule a cleanup routine to deallocate any data structures you have associated with the thread-specific key, and you can use pthread_key_destroy when the key is no longer in use.
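Here is a minimal sketch of the key-based interface, assuming POSIX threads. It differs from the description above in two ways that should be read as choices of this example: the key is created lazily with pthread_once instead of before any thread is spawned, and cleanup is done by passing free as the destructor argument of pthread_key_create. The names set_thread_name, get_thread_name and names_are_per_thread are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_key_t name_key;
static pthread_once_t name_key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is run on each thread's non-NULL value when that thread exits. */
    pthread_key_create(&name_key, free);
}

/* Stores a private copy of name for the calling thread; 0 on success. */
int set_thread_name(const char *name)
{
    pthread_once(&name_key_once, make_key);

    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);

    void *old = pthread_getspecific(name_key);
    if (pthread_setspecific(name_key, copy) != 0) {
        free(copy);
        return -1;
    }
    free(old);  /* release the value this thread stored earlier, if any */
    return 0;
}

/* Returns the calling thread's value, or NULL if it never set one. */
const char *get_thread_name(void)
{
    pthread_once(&name_key_once, make_key);
    return pthread_getspecific(name_key);
}

static void *check_thread(void *arg)
{
    int *ok = arg;

    /* A new thread starts out with NULL for every key. */
    *ok = (get_thread_name() == NULL);
    *ok = *ok && set_thread_name("worker") == 0
              && strcmp(get_thread_name(), "worker") == 0;
    return NULL;
}

/* Returns 1 if the calling thread and a worker each keep their own value. */
int names_are_per_thread(void)
{
    pthread_t tid;
    int worker_ok = 0;

    if (set_thread_name("main") != 0)
        return 0;
    if (pthread_create(&tid, NULL, check_thread, &worker_ok) != 0)
        return 0;
    pthread_join(tid, NULL);
    return worker_ok && strcmp(get_thread_name(), "main") == 0;
}
```

Both threads use the same key, yet each reads back only what it stored itself, which matches the "one map per thread" picture.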
The errno variable from the original C runtime library is a good example. If a process has two threads making system calls, it would be extremely bad for that to be a shared variable.
thread 1:
    int f = open (...);
    if (f < 0)
        printf ("error %d encountered\n", errno);
thread 2:
    int s = socket (...);
    if (s < 0)
        printf ("error %d encountered\n", errno);
Imagine the confusion if open and socket are called at about the same time, both fail somehow, and both try to display the error number!
To solve this, multi-threaded runtime libraries make errno an item of thread-specific data.
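The effect can be observed with a small experiment, sketched here under the assumption of a POSIX system where the path used below does not exist. The helper names and the path are made up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>

struct report {
    int  err;         /* errno as seen by the worker after its failed call */
    int *errno_addr;  /* where the worker's errno lives */
};

static void *failing_open(void *arg)
{
    struct report *r = arg;

    /* Assumed not to exist, so open() should fail with ENOENT. */
    int fd = open("/nonexistent-dir-for-demo/file", O_RDONLY);
    r->err = (fd < 0) ? errno : 0;
    r->errno_addr = &errno;
    return NULL;
}

/* Returns 1 if the worker saw ENOENT in an errno that is a different
 * object from the calling thread's errno. */
int errno_is_per_thread(void)
{
    pthread_t tid;
    struct report r = { 0, NULL };

    if (pthread_create(&tid, NULL, failing_open, &r) != 0)
        return 0;
    pthread_join(tid, NULL);
    return r.err == ENOENT && r.errno_addr != &errno;
}
```

Comparing addresses rather than values avoids relying on library calls in the main thread leaving its errno untouched.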
The short answer to your question is: you don't have to do anything to share a variable between multiple threads. All global variables are shared among all threads by default.
When a variable has to be different for each thread and your compiler supports the __thread extension (GCC does; C11 standardises the same idea as _Thread_local), you only need to add the __thread storage class keyword to your variable declaration, as in:
__thread char *variable;
This instructs every stage of the build and runtime chain (cc, ld, ld.so, libc.so and libpthread.so) to handle this variable in a special thread-specific way.
The following compilers support this syntax (cf wikipedia):
Sun Studio C/C++
IBM XL C/C++
GNU C
Intel C/C++ (Linux systems)
Borland C++ Builder

Resources