Thread Safety in C

Imagine I write a library in C, and that this library is to be used from a multi-threaded environment. How do I make it thread-safe? More specifically: how do I ensure that certain functions are executed by only one thread at a time?
Unlike Java or C#, for example, C has no means to deal with threads, locks, etc., and neither does the C standard library. I know that operating systems support threads, but using their APIs would restrict the compatibility of my library very much. What options do I have to keep my library as compatible and portable as possible? (For example, relying on OpenMP, or on POSIX threads to keep it compatible with at least all Unix-like operating systems?)

You can create wrappers with #ifdef. It's really the best you can do. (Or you can use a third party library to do this).
I'll show how I did it as an example for Windows and Linux. It's in C++ and not C, but again it's just an example:
/* NoCopyAssign and NoAssign are helper base classes that are not shown here. */
#ifdef WIN32
#include <windows.h>
typedef HANDLE thread_t;
typedef unsigned ThreadEntryFunction;
#define thread __declspec(thread)
class Mutex : NoCopyAssign
{
public:
    Mutex() { InitializeCriticalSection(&mActual); }
    ~Mutex() { DeleteCriticalSection(&mActual); }
    void Lock() { EnterCriticalSection(&mActual); }
    void Unlock() { LeaveCriticalSection(&mActual); }
private:
    CRITICAL_SECTION mActual;
};
class ThreadEvent : NoCopyAssign
{
public:
    ThreadEvent() { Actual = CreateEvent(NULL, false, false, NULL); }
    ~ThreadEvent() { CloseHandle(Actual); }
    void Send() { SetEvent(Actual); }
    HANDLE Actual;
};
#else
#include <pthread.h>
typedef pthread_t thread_t;
typedef void *ThreadEntryFunction;
#define thread __thread
extern pthread_mutexattr_t MutexAttributeRecursive;
class Mutex : NoCopyAssign
{
public:
    Mutex() { pthread_mutex_init(&mActual, &MutexAttributeRecursive); }
    ~Mutex() { pthread_mutex_destroy(&mActual); }
    void Lock() { pthread_mutex_lock(&mActual); }
    void Unlock() { pthread_mutex_unlock(&mActual); }
private:
    pthread_mutex_t mActual;
};
class ThreadEvent : NoCopyAssign
{
public:
    ThreadEvent() { pthread_cond_init(&mActual, NULL); }
    ~ThreadEvent() { pthread_cond_destroy(&mActual); }
    void Send() { pthread_cond_signal(&mActual); }
private:
    pthread_cond_t mActual;
};
inline thread_t GetCurrentThread() { return pthread_self(); }
#endif
/* Allows for easy mutex locking */
class MutexLock : NoAssign
{
public:
    MutexLock(Mutex &m) : mMutex(m) { mMutex.Lock(); }
    ~MutexLock() { mMutex.Unlock(); }
private:
    Mutex &mMutex;
};

You will need to use your OS's threading library. On POSIX, that will usually be pthreads and you'll want pthread_mutex_lock.
Windows has its own threading library, and you'll want to look at either critical sections or CreateMutex. Critical sections are more optimized, but they are limited to a single process and you can't use them in WaitForMultipleObjects.
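As a rough illustration of the POSIX route, here is a minimal sketch that serialises access to one piece of library state with pthread_mutex_lock. It assumes pthreads is available; counter_add, counter_get and counter_demo are hypothetical names made up for this example.

```c
#include <pthread.h>

/* Hypothetical library state guarded by a single mutex. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Only one thread at a time executes the code between lock and unlock. */
void counter_add(long n)
{
    pthread_mutex_lock(&counter_lock);
    counter += n;
    pthread_mutex_unlock(&counter_lock);
}

long counter_get(void)
{
    pthread_mutex_lock(&counter_lock);
    long value = counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}

/* Demo worker: adds 1 to the counter `iterations` times. */
static void *counter_worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++)
        counter_add(1);
    return NULL;
}

/* Demo helper: runs up to 16 workers concurrently and returns the total. */
long counter_demo(int nthreads, long iterations)
{
    pthread_t threads[16];
    if (nthreads > 16)
        nthreads = 16;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, counter_worker, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return counter_get();
}
```

On most toolchains this needs to be built with -pthread. A Windows build would swap the three pthread_mutex calls for a critical section.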

You have two main options:
1) You specify which multi-threaded environment your library is thread-safe in, and use the synchronisation functions of that environment.
2) You specify that your library is not thread-safe. If your caller wants to use it in a multi-threaded environment, then it's their responsibility to make it thread-safe, by using external synchronisation if necessary to serialise all calls to your library. If your library uses handles and doesn't need any global state, this might for instance mean that if they have a handle they only use in a single thread, then they don't need any synchronisation on that handle, because it's automatically serialised.
Obviously you can take a multi-pack approach to (1), and use compile-time constants to support all the environments you know about.
You could also use a callback architecture, link-time dependency, or macros, to let your caller tell you how to synchronise. This is kind of a mixture of (1) and (2).
But there's no such thing as a standard multi-threaded environment, so it's pretty much impossible to write self-contained code that is thread-safe everywhere unless it's completely stateless (that is, the functions are all side-effect free). Even then you have to interpret "side-effect" liberally, since of course the C standard does not define which library functions are thread-safe. It's a bit like asking how to write C code which can execute in a hardware interrupt handler. "What's an interrupt?", you might very well ask, "and what things that I might do in C aren't valid in one?". The only answers are OS-specific.
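The callback architecture mentioned above could look roughly like the following sketch. All names here (mylib_set_locking, mylib_add, mylib_lock_fn) are hypothetical; the point is only that the library calls whatever lock and unlock functions the caller registered and never touches a threading API itself.

```c
#include <stddef.h>

/* Hypothetical callback type: the caller decides how locking is done. */
typedef void (*mylib_lock_fn)(void *ctx);

static mylib_lock_fn user_lock = NULL;
static mylib_lock_fn user_unlock = NULL;
static void *user_ctx = NULL;
static int shared_total = 0;

/* Must be called before any other thread starts using the library. */
void mylib_set_locking(mylib_lock_fn lock, mylib_lock_fn unlock, void *ctx)
{
    user_lock = lock;
    user_unlock = unlock;
    user_ctx = ctx;
}

/* Adds to the shared total; locks only if callbacks were registered. */
int mylib_add(int n)
{
    if (user_lock)
        user_lock(user_ctx);
    shared_total += n;
    int result = shared_total;
    if (user_unlock)
        user_unlock(user_ctx);
    return result;
}
```

A pthreads caller would register two small functions that lock and unlock a pthread_mutex_t passed as ctx; a single-threaded caller registers nothing and pays no locking cost.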

You should also avoid static and global variables that can be modified; this avoids having synchronization code all over your module.
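One way to follow this advice is to move all mutable state into a handle that the caller owns. The sketch below uses a made-up pseudo-random generator (rng_t, rng_create, rng_next and rng_destroy are hypothetical names) purely to show the shape: no function touches anything except the handle it is given.

```c
#include <stdlib.h>

/* Hypothetical handle: all mutable state lives here, not in globals. */
typedef struct {
    unsigned long state;
} rng_t;

rng_t *rng_create(unsigned long seed)
{
    rng_t *r = malloc(sizeof *r);
    if (r != NULL)
        r->state = seed;
    return r;
}

/* Touches only the caller's handle, so distinct handles need no locking. */
unsigned long rng_next(rng_t *r)
{
    r->state = r->state * 1103515245UL + 12345UL;
    return (r->state / 65536UL) % 32768UL;
}

void rng_destroy(rng_t *r)
{
    free(r);
}
```

Two threads that each create their own rng_t never share data, so no mutex is involved. Sharing a single handle between threads would still require external synchronisation.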

It is a misconception that the pthreads library doesn't work on Windows. Check out sourceforge.net. I would recommend pthreads because it is cross-platform and its mutexes are way faster than e.g. the Windows builtin mutexes.

Write your own lock.
Since you're targeting PCs you're dealing with the x86 architecture which natively supplies all the multi-threading support you should need. Go over your code and identify any functions that have shared resources. Give each shared resource a 32-bit counter. Then using the interlocked operations that are implemented by the CPUs keep track of how many threads are using each shared resource and make any thread that wants to use a shared resource wait until the resource is released.
Here's a really good blog post about interlocked operations: Using Interlocked Instructions from C/C++
The author focuses mostly on using the Win32 Interlocked wrappers, but pretty much every operating system has its own wrappers for the interlocked operations, and you can always write the assembly (each of these operations is only one instruction).
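For illustration only, here is a minimal spinlock sketch in the same spirit. Note the substitution: instead of the Win32 Interlocked functions or hand-written assembly, it uses the atomic_flag type from C11's stdatomic.h, so it assumes a C11 compiler. The names mylib_spinlock, spin_trylock, spin_lock and spin_unlock are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical spinlock built on an atomic test-and-set flag. */
typedef struct {
    atomic_flag busy;
} mylib_spinlock;

#define MYLIB_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Tries once; returns 1 if the lock was acquired, 0 if it was already held. */
int spin_trylock(mylib_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire);
}

/* Busy-waits until the flag was previously clear, i.e. we now own the lock. */
void spin_lock(mylib_spinlock *l)
{
    while (!spin_trylock(l))
        ;  /* spin */
}

void spin_unlock(mylib_spinlock *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

A lock like this burns CPU while it waits, so it is only reasonable for very short critical sections; an OS mutex is usually the safer default.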

If your goal is to be compatible on unix-like operating systems, I would use POSIX threading.
That being said, if you want to support Windows as well, you'll need to have two code paths for this: pthreads on Unix and Windows threads on Windows. It's fairly easy to just make your own "thread library" to wrap these.
There are quite a few that do this (like OpenThreads), but most of them I've used are C++, not C.

Using Posix threads sounds like a good idea to me (but I'm no expert). In particular, Posix has good primitives for ensuring mutual exclusion.
If you had to create a library without any dependencies, you would have to implement the mutual exclusion algorithms yourself, which is a bad idea.

"imagine I write a library in C. Further, imagine this library to be used from a multi-threaded environment. How do I make it thread-safe? More specific: How do I assure, that certain functions are executed only by one thread at a time?"
You can't. Instead, write thread-safe or, better, re-entrant functions.
Unless you would like to write system-wide locks, which is a very bad idea.
"In opposite to Java or C# for example, C has no means to deal with threads/locks/etc."
This is a joke, right? Long before Java and C# were developed, locks were invented and widely used as synchronization objects...
"I know, that operating systems support threads, but using their api would restrict the compatibility of my library very much."
The thing is that such libraries already exist, e.g. wxWidgets, which offers the portable wxThread... (but this is C++)
Anyway, there are 2 main "flavours" of C: ANSI C and GNU C. They are two different worlds... pick one or the other.

Related

Thread-safe init of read-only global data

Let's imagine that I'm writing a library that has a reasonably large amount of read-only global data that needs to be initialized before the library can be used. For example, perhaps the global data are lookup tables for various parts of the application logic that won't change during the lifetime of the program.
Now I have a few ways to initialize this data:
(1) I may require that the user call some kind of init() function before the library is used.
(2) I may lazily construct the data the first time a function is called on my library.
(3) I may include the data in an initializer statement in the source, such that variables are statically initialized to their final value.
Now if my data is read-only and should be the same for every environment the library runs in, then (3) is fairly appealing. Even in that case it has some downsides: if the data is very large (but easy to generate procedurally), the size of the binary will bloat up a lot (e.g., a library with 50K of code but 8MB of lookup tables would end up around 8050K). Similarly, the source itself may be very large, or the build system needs to handle the generation of the source at compile time.
The main reason you might not be able to use (3) is that the tables might be fixed (read-only), but require generation at runtime because they embed some information about the environment (e.g., the value of an environment variable, a configuration setting read from a file, information about the machine architecture, whatever). This data can't be embedded in the source since it depends on the runtime environment.
So we have methods (1) and (2) at least - but I can't see how to make these thread-safe in a simple way. The rest of the library can be thread-safe simply by not mutating any global state - just like the vast majority of C functions can be written in a thread-safe way w/o any explicit use of threading primitives.
I can't figure out a similar alternative for this global init, however:
(1) Is undesirable because we prefer not to require the user to call this method, and in any case it simply moves the problem up to the calling code: the calling code then needs to organize to call this init() method exactly once across all threads using the library, and before any thread uses the library.
(2) Fails since concurrent calls to the library might do a double init.
In C++ you can just initialize globals with a method call, like int data[] = loadData(). Is there any equivalent in C? Or am I stuck using threading primitives (which vary by platform, e.g., pthread_once, call_once and whatever Windows has) just to get my thread-safe init?

I don't know of any platform-independent way of initializing a library in a thread-safe manner. That's not surprising since there's no platform-independent threading model in C.
So your solution is going to be platform-specific.
@ThingyWotsit mentions in the comments using C++ to initialize your library, and that will be thread-safe. But it may very well lock you into a specific C++ run-time, so it may not be a useful solution for your C shared object/library. You may not be willing or able to add a dependency on C++ and you may especially not be willing or able to be locked into a specific C++ run-time.
For GCC, you can use __attribute__((constructor)) to have your initialization function called when the shared object is loaded:
constructor
destructor
constructor (priority)
destructor (priority)
The constructor attribute causes the function to be called automatically before execution enters main (). Similarly, the destructor attribute causes the function to be called automatically after main () has completed or exit () has been called. Functions with these attributes are useful for initializing data that will be used implicitly during the execution of the program.
You may provide an optional integer priority to control the order in which constructor and destructor functions are run. A constructor with a smaller priority number runs before a constructor with a larger priority number; the opposite relationship holds for destructors. So, if you have a constructor that allocates a resource and a destructor that deallocates the same resource, both functions typically have the same priority. The priorities for constructor and destructor functions are the same as those specified for namespace-scope C++ objects (see C++ Attributes).
For example:
static __attribute__((constructor)) void my_lib_init_func( void )
{
    ...
}
Your code will run before main() is called.
If your library is dynamically loaded (by an explicit call to dlopen(), for example), your init function will be called when your library is loaded, and your library won't be considered loaded until it returns.
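To make this concrete, here is a small sketch that fills a lookup table from a constructor function. It assumes GCC or a compatible compiler such as Clang; the table contents and the names mylib_init, mylib_square and mylib_is_ready are invented for the example.

```c
#include <stddef.h>

#define TABLE_SIZE 256

static unsigned table[TABLE_SIZE];
static int table_ready = 0;

/* GCC/Clang extension: runs before main(), or when dlopen() loads the library. */
__attribute__((constructor))
static void mylib_init(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = (unsigned)(i * i);
    table_ready = 1;
}

/* Callers reached from main() can rely on the table already being filled. */
unsigned mylib_square(unsigned char i)
{
    return table[i];
}

int mylib_is_ready(void)
{
    return table_ready;
}
```

Because the constructor has finished before main() starts, threads created by the program only ever see the finished, read-only table.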
Other compilers provide the functionally-identical #pragma init():
#pragma init(my_lib_init_func)
static void my_lib_init_func( void )
{
    ...
}
See #pragma init and #pragma fini using gcc compiler on linux
For Windows? The Windows C++ run-time is pretty stable and ubiquitous. I'd just use a C++ solution on Windows, especially if you're compiling with MSVC. (But see the comments...)

Option 3 is always preferable when possible. Your reasoning about the cons is wrong. If you have an 8MB constant table in the executable file, it's directly mapped and shared by all instances of the program or users of the shared library on any remotely modern operating system. If you generate it at runtime, each process will have its own copy of the table.
When option 3 is not available you must use pthread_once or equivalent or implement your own version of the same (much less efficiently) using a lock. There is little reason to use weird OS-specific replacements for it; all major platforms either support POSIX threads API natively or have existing libraries which provide it on top of the platform's low-level primitives.
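A minimal sketch of the pthread_once approach, assuming pthreads is available. The table contents and the names build_lazy_table, lookup_double and lookup_init_calls are hypothetical; init_calls exists only so the example can show that initialization ran once.

```c
#include <pthread.h>

#define LAZY_TABLE_SIZE 256

static pthread_once_t lazy_once = PTHREAD_ONCE_INIT;
static unsigned lazy_table[LAZY_TABLE_SIZE];
static int init_calls = 0;  /* demo only: counts how often init ran */

/* Runs exactly once, no matter how many threads race to call lookup_double(). */
static void build_lazy_table(void)
{
    for (int i = 0; i < LAZY_TABLE_SIZE; i++)
        lazy_table[i] = (unsigned)(i * 2);
    init_calls++;
}

/* Every public entry point first makes sure the table exists. */
unsigned lookup_double(unsigned char i)
{
    pthread_once(&lazy_once, build_lazy_table);
    return lazy_table[i];
}

int lookup_init_calls(void)
{
    return init_calls;
}
```

This is the lazy variant of option (2) from the question: the user never calls an init function, and concurrent first calls cannot cause a double init.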

What is zalloc in embedded programming?

I am looking into programming the ESP8266 serial-wifi chip. In its SDK examples it makes extensive use of a function called os_zalloc where I would expect malloc.
Occasionally though, os_malloc is used as well. So they do not appear to be identical in function.
Unfortunately there is no documentation. Can anybody make an educated guess from the following header file?
#ifndef __MEM_H__
#define __MEM_H__
//void *pvPortMalloc( size_t xWantedSize );
//void vPortFree( void *pv );
//void *pvPortZalloc(size_t size);
#define os_malloc pvPortMalloc
#define os_free vPortFree
#define os_zalloc pvPortZalloc
#endif

Since os_zalloc is a macro, and the definition is given in mem.h, a better question to ask would be about what pvPortZalloc does.
Given the function names pvPortMalloc, vPortFree and pvPortZalloc, it would appear that the OS in use is FreeRTOS (or its commercially licensed equivalent, OpenRTOS), which is documented. pvPortZalloc is not specifically documented, but it would be strange if it did not simply allocate and zero-initialise; that is, for example, what it means here. The functions are part of the target porting layer for FreeRTOS and are not normally called at the application level, but I imagine the macro wrapper is used here to give application code access to the porting layer rather than writing the code twice.
In an RTOS kernel, RTOS-aware dynamic memory allocation functions are required to ensure thread safety, although some standard library implementations include thread-safety stubs that you implement using the RTOS mutex calls. That is a better method, since existing libraries and C++ new/delete can then be used more easily.

I would say "allocate memory and fill with zeros"
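If that guess is right, the behaviour would be equivalent to the sketch below. This is not the SDK's actual implementation, which is not shown in the header; my_zalloc is a hypothetical stand-in written against the standard library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pvPortZalloc: allocate, then zero-fill. */
void *my_zalloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        memset(p, 0, size);
    return p;
}
```

With the standard library, calloc(1, size) gives the same result in a single call.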

How to create a library which uses mutexes only if pthread is linked?

I'm creating a C library on Linux which has several functions, which together operate upon some global data. In order for these functions to be thread safe, they must employ mutexes at the appropriate points in the code.
In Linux, in order to use pthreads in an application, one needs to link in the appropriate library, -lpthread. In the case of my library once compiled, I'd like to make it work both if the user of it decided to use pthreads in their application, as well as if they don't.
In the case where a developer does not use threads in their application, they will not link against pthreads. Therefore I'd like my compiled library to not require it, and furthermore, employing mutexes in a single threaded application uses needless overhead (not to mention is silly).
Is there some kind of way to write code (with GCC extensions if necessary) that a certain block of code will only run if certain symbols were linked in? I'm aware I can use dlopen() and friends, but that in itself would require some of what I'm trying to avoid. I imagine what I'm looking for must exist, as several standard functions are in the same boat, and would require mutexes to be thread safe (and they are), but work even when not linked with pthreads.
On this point, I notice that FreeBSD's popen() function on line 66 & 67 employs a non portable check - isthreaded, to determine if threads are used or not, and whether to use mutexes. I doubt anything like that is standardized in any way. But more to the point such code can't compile and link if the symbols aren't recognized, which in Linux, the mutex symbols won't even be present if pthread is not linked.
To summarize: On Linux, how does one create a library, which knows when threads are also used, and if so, employs mutexes where appropriate, and does not require linking against pthreads, unless the application developer specifically wants to use threading somewhere?

After some testing, it seems that Linux already does what I want automatically! You only need to link against pthreads if you use threading, not if you just want pthread mutex support.
In this test case:
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
int main()
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    if (!(errno = pthread_mutex_lock(&mutex))) { puts("Mutex locked!"); }
    else { perror("Could not lock mutex"); }
    if (!(errno = pthread_mutex_lock(&mutex))) { puts("Mutex locked!"); }
    else { perror("Could not lock mutex"); }
    return 0;
}
When compiling this without pthreads linked, I see "Mutex locked!" twice, which indicates that pthread_mutex_lock() is essentially a no-op. But with pthreads linked, the application stalls after "Mutex locked!" is printed the first time.
Therefore, I can use mutexes in my library where appropriate without requiring pthreads, and with no (significant?) overhead where it isn't needed.

The usual solutions are:
1) Use a #define switch to control at build time whether to call the pthreads functions or not, and have your build process create two versions of your library: one pthread-aware and one not, with different names. Rely on the user of your library to link against the correct one.
2) Don't call the pthreads functions directly, but instead call user-provided lock and unlock callbacks (and thread-local-storage too, if you need that). The library user is responsible for allocating and calling the appropriate locking mechanisms, which also allows them to use a non-pthreads threading library.
3) Do nothing at all, and merely document that user code should ensure that your library functions aren't entered at the same time from multiple threads.
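The build-time switch from the first solution might be sketched as follows. The macro MYLIB_USE_PTHREADS and the names MYLIB_LOCK, MYLIB_UNLOCK and mylib_next_id are hypothetical; the build system would compile the file twice, once with -DMYLIB_USE_PTHREADS and once without.

```c
#ifdef MYLIB_USE_PTHREADS
#include <pthread.h>
static pthread_mutex_t mylib_mutex = PTHREAD_MUTEX_INITIALIZER;
#define MYLIB_LOCK()   pthread_mutex_lock(&mylib_mutex)
#define MYLIB_UNLOCK() pthread_mutex_unlock(&mylib_mutex)
#else
/* Single-threaded build: the lock macros compile to nothing. */
#define MYLIB_LOCK()   ((void)0)
#define MYLIB_UNLOCK() ((void)0)
#endif

static int next_id = 0;

/* Hands out increasing IDs; serialised only in the pthread-aware build. */
int mylib_next_id(void)
{
    MYLIB_LOCK();
    int id = ++next_id;
    MYLIB_UNLOCK();
    return id;
}
```

The library code itself is identical in both builds; only the two macros change, which keeps the non-threaded variant free of any pthreads symbols.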
glibc does something different again - it uses tricks with lazy binding symbols to call the pthreads functions only if they are linked into the binary. This isn't portable though, because it relies on specific details of the glibc implementation of pthreads. See the definition of __libc_maybe_call():
#ifdef __PIC__
# define __libc_maybe_call(FUNC, ARGS, ELSE) \
  (__extension__ ({ __typeof (FUNC) *_fn = (FUNC); \
                    _fn != NULL ? (*_fn) ARGS : ELSE; }))
#else
# define __libc_maybe_call(FUNC, ARGS, ELSE) \
  (FUNC != NULL ? FUNC ARGS : ELSE)
#endif

Where can I find the list of non-reentrant functions provided in gnu libc?

I am now porting a single-threaded library to support multiple threads, and I need the whole list of functions that use local static or global variables.
Any information is appreciated.

Check the manual page for each function you use ... the non-thread-safe ones will be identified as such, and the manual page will mention a thread safe version when there is one (e.g., readdir_r). You could extract the list by running a script over the man pages.
Edit: Although my answer has been accepted, I fear that it is inaccurate and possibly dangerous. For example, while strerror_r mentions that it is a thread safe version of strerror, strerror itself says nothing about thread safety ... what it says instead is "the string might be overwritten", which merely implies that it isn't thread-safe. So you need to search for at least "might be overwritten" as well as "thread", but there's no guarantee that even that will be complete.

It's always a good idea to know whether a particular function is reentrant or not, but you must also consider the situation where you call several reentrant functions from a shared piece of code in multiple threads, which can also lead to problems when using shared data.
So, if you have any data shared between threads, the data must be "protected" regardless of the fact that the functions being called are reentrant.
Consider the following function:
void yourFunc(CommonObject *o)
{
    /* This function is NOT thread safe */
    reentrant_func1(o->propertyA);
    reentrant_func2(o->propertyA);
}
If this function is not mutex-protected, you will get undesired behavior in a multithreaded application, regardless of the fact that func1 and func2 are reentrant.
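A mutex-protected variant could look like the sketch below. It assumes pthreads and makes up the details the answer leaves open: the layout of CommonObject, the bodies of the two reentrant functions, and the name yourFuncLocked are all hypothetical.

```c
#include <pthread.h>

/* Hypothetical shared object; the mutex travels with the data it protects. */
typedef struct {
    pthread_mutex_t lock;
    int propertyA;
} CommonObject;

/* Stand-ins for the reentrant functions: they touch only their arguments. */
static int reentrant_func1(int a) { return a + 1; }
static int reentrant_func2(int a) { return a * 2; }

/* Both calls and both updates now happen as one step per object. */
void yourFuncLocked(CommonObject *o)
{
    pthread_mutex_lock(&o->lock);
    o->propertyA = reentrant_func1(o->propertyA);
    o->propertyA = reentrant_func2(o->propertyA);
    pthread_mutex_unlock(&o->lock);
}
```

Without the lock, another thread could change propertyA between the two calls, even though each call on its own is reentrant.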

Are there monitors in C?

I am reading the synchronization chapter of an operating systems book and have reached the topic "Monitors". I understand that monitors are high-level language constructs. This makes me wonder whether C provides something like a monitor. Perhaps the library containing the POSIX threads implementation should provide the monitor construct as well. Also, threads in C are not part of the STL, right?
If yes, which header file/library contains it, what would a most elementary test program using monitors look like, and how does the library implement monitors?
The book says a monitor type is an ADT (abstract data type). I wonder, can a C structure simulate a monitor data type?
Thanks,

C has no notion of threads and doesn't provide monitors as a syntactic structure.
The POSIX thread library is just a library, and C's abstraction facilities are not powerful enough to allow monitors to be provided as a library element. POSIX gives the primitives needed to build monitors.
STL is a C++ term (and not even a good one, as it means different things to different people).
To implement a monitor in C, you'd need a structure whose content you keep private and which has at least a mutex, and a set of functions operating on the struct which start by taking the mutex.
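A minimal sketch of such a hand-built monitor, assuming POSIX threads. The type monitor_t and its functions are hypothetical names; the example adds a condition variable so that monitor_take can wait for an item, which is one of the POSIX primitives referred to above.

```c
#include <pthread.h>

/* Hypothetical monitor: a counter of available items that never goes below zero. */
typedef struct {
    pthread_mutex_t lock;     /* taken by every operation */
    pthread_cond_t  nonzero;  /* signalled when count becomes positive */
    int count;
} monitor_t;

void monitor_init(monitor_t *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->nonzero, NULL);
    m->count = 0;
}

/* Adds one item and wakes one waiting taker, if any. */
void monitor_put(monitor_t *m)
{
    pthread_mutex_lock(&m->lock);
    m->count++;
    pthread_cond_signal(&m->nonzero);
    pthread_mutex_unlock(&m->lock);
}

/* Blocks until an item is available, takes it, and returns how many remain. */
int monitor_take(monitor_t *m)
{
    pthread_mutex_lock(&m->lock);
    while (m->count == 0)
        pthread_cond_wait(&m->nonzero, &m->lock);
    int remaining = --m->count;
    pthread_mutex_unlock(&m->lock);
    return remaining;
}

void monitor_destroy(monitor_t *m)
{
    pthread_cond_destroy(&m->nonzero);
    pthread_mutex_destroy(&m->lock);
}
```

To keep the content private, as the answer suggests, a real library would declare monitor_t as an incomplete type in its header and define the struct only in the .c file.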

C doesn't even have support for threads, that's implementation specific. You'll need to use a library for your monitor.

You're right that threads are not part of the standard C library.
POSIX threads don't provide monitors specifically, but everything that you can do with a monitor, you can do with a mutex plus a condition variable. Or possibly two condition variables, depending exactly what kind of monitor you're interested in: http://en.wikipedia.org/wiki/Monitor_%28synchronization%29

Threads are only foreseen for the next version of the C standard, not the current one. The current proposal closely resembles the functionality of POSIX threads and has, e.g., mutexes and condition variables as control structures. AFAIR monitors are not among them.