Portable thread-safety in C?

Portable thread-safety in C? - c

Purpose
I'm writing a small library for which portability is the biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more. The set of functions provided by the library all operate (read/write) on an internal data structure. I've considered some other design alternatives, but nothing else seems feasible for what the library is trying to achieve.
Question
Are there any portable algorithms, techniques, or incantations which can be used to ensure thread-safety? I am not concerned with making the functions re-entrant. Moreover, I am not concerned with speed or (possibly) wasting resources if the algorithm/technique/incantation is portable. Ideally, I don't want to depend on any libraries (such as GNU Pth) or system-specific operations (like atomic test-and-set).
I have considered modifying Lamport's bakery algorithm, but I do not know how to alter it to work inside of the functions called by the threads instead of working in the threads themselves.
Any help is greatly appreciated.

Without OS/hardware support, at least an atomic CAS, there's nothing you can do that's practical. There are portable libraries that abstract various platforms into a common interface, though.
http://www.gnu.org/software/pth/related.html

Almost all systems (even Windows) can run libpthread these days.

Lamport's bakery algorithm would probably work; unfortunately, there are still practical problems with it. Specifically, many CPUs implement out-of-order memory operations: even if you've compiled your code into a perfectly correct instruction sequence, the CPU, when executing your code, may decide to reorder the instructions on the fly to achieve better performance. The only way to get around this is to use memory barriers, which are highly system- and CPU-specific.
You really only have two choices here: either (1) keep your library thread-unsafe and make your users aware of that in the documentation, or (2) use a platform-specific mutex. Option 2 can be made easier by using another library that implements mutexes for a large variety of platforms and provides you with a unified, abstract interface.

Functions either cannot be thread safe or are innately thread safe, depending on how you want to look at it. And threading/locking is innately platform specific. Really, it is up to users of your library to handle the the threading issues.

Related

What would be a simple to use JIT library?

I'm trying to write a language runtime (and a language itself) that is similar to .NET or to the JVM. It's got a form of bytecode that is custom.
What I want is a way to translate said bytecode to actual, runnable machine code. So, because I'm not wanting to write such a translator myself (this is more of a toy project/personal side project) I want to find a good JIT library to use.
Here's what I want the library to do:
The library should be as easy to use as possible (toy project and I don't really have much experience here)
The library should support at least x86_64 (development machine), though preferably it should cover other architectures as well
The library should preferably do some low level optimizations (register tracking and allocation, reducing memory accesses etc); those optimizations shouldn't be very expensive to do though (I will myself do other optimizations to e.g. remove virtual calls and convert them to direct ones, for example). I can accept a library with no optimization if it's easiest to use though.
The library must have an interface that is usable from C (preferred) or C++ (acceptable).
I will use Boehm GC for garbage collection, if it matters (probably doesn't, but just in case). Maybe a compacting GC would be nice, but I guess I shouldn't combine the questions...

I would suggest llvm. There are some great tutorials on how to implement your own language with it and basic stuff is not too complicated. You also get the option to do a lot of more advanced stuff later on. As a bonus not only can you use JIT but you can also statically compile and optimize your binaries. LLVM also does have a C interface and can target all common CPU architectures and even a lot of more obscure ones.

Multithreading implementation itself in C

I am a beginner C/C++ programmer first of all, but I am curious about it.
My question is more theoretical.
I heard that C does not have explicit multithreading (MT) support, however there are libraries which implement this. I found "process.h" header which has to be included for building MT programs, but the thing I don't understand is how the MT itself works.
I know there are threads in CPU (assume it's single core for simplicity) running and there is only one thread per moment. The CPU is switching between threads really fast so that user sees it as a simultaneous work (correct me if not).
But - what really happens when I write the following
beginthread( Thread, 0, NULL ) //or whatever function/class method we use
keeping in mind that C does not have MT support. I mean, how does code tell the PC to run two functions multithreaded while it is not possible by the language explicit methods? I guess there is some "cheat" inside library related to "process.h", but what is that cheat, I can't just find on the web.
To be more specific - I am not asking about how to use MT, but how is it build?
Sorry if was answered earlier, or question is too complicated :)
UPD:
Imagine we have C language. It has functions, variables, pointers etc. I dont know any "special" function type that can run concurrently with other. Unless there are calls to some other functions from it. But then the caller function stops and waits?
Is it so that when I run MT applications, there is a special "global" function that calls my f1() and f2() repeatedly which looks like they were simultaneously working?

First of all, C11 does actually add multithreading support to the standard, so the premise that C does not support multithreading is no longer entirely correct.
However, I'm assuming your question is more to do with how can multithreading be implemented by a C library when standard C does(/did) not provide the necessary tools. The answer lies in the word “standard” – compilers and platforms can provide additional functionality beyond that required by the standard. Using such extra features makes the program/library less portable (i.e., more is required than is specified in the C standard), but the language and function call semantics can still be C.
Perhaps it is helpful to consider a standard library function such as fopen – somewhere inside that function code must eventually be called which could not be written in standard C, i.e., the implementation of the standard library itself must also rely on platform-specific code to access operating system functionality such as the file system. Every implementation of the standard library must thus implement the non-portable parts in a way specific to that platform (this is kind of the point of having a standard library instead of all code being platform-specific). But likewise a multithreading library can be implemented with non-standard features provided by that platform, but using such a library makes the code portable only to the platforms for which the same (or compatible) multithreading library is available.
As for how multithreading itself works, it is certainly outside the scope of what can be answered here, but as a simplified conceptual model on a single processor core, you can imagine the operating system managing “concurrent” processes by running one process for a short time, interrupting it, saving its state (current instruction, registers, etc), loading the saved state of another process, and repeating this. This gives the illusion of concurrent execution though in actual fact it is switching rapidly between different processes. On multi-core systems the execution on different cores can actually be concurrent, but there are typically more processes than there are cores, so this kind of switching will still happen on individual cores. Things are further complicated by processes waiting for something (I/O, another process, a timer, etc). Perhaps it suffice to say that the scheduler is a piece of software managing all of this inside the operating system and the multithreading library communicates with it.
(Note that there are many different ways to implement multithreading and multitasking, and statements in the above paragraph do not apply to all of them.)

It's platform specific. On Windows it eventually goes down to NtCreateThread which uses the assembly instruction syscall to call the operating system. So you can qualify it as a cheat.
On Linux it's the same, just the function with the syscall is called clone instead.

Disable __thread support

I'm implementing a pthread replacement library which is extremely lightweight. There are several reasons why I want to disable __thread completely.
It's a waste of memory. If I'm creating a thousand threads which has nothing to do with the context that declares a variable with __thread they will still allocate the program will still have allocated 1000*the size of that data bytes and never use it. It's simply not memory compatible with the mass-concurrency model. If we need extremely lightweight fibers with only 8K of stack, a TLS block of just 4K would be an overhead of 50% of the memory used by each thread. In some scenarios the TLS overhead would be enormous.
TLS is a complex standard and I simply don't have the time/resources to support it. It's to expensive. Personally I think the standard is poorly designed. It should have defined standard functions that had to be provided by the linker so thread libraries can take control over where TLS allocation takes place and insert relevant offsets and addresses it requires. Also the standard ELF implementation has been infected with pthread, expecting pthread sized structs to calculate offsets making it really hard to adapt to something else.
It's just a bad pattern. It encourages using globals and creating functions with static functions/side effects. This is not a territory we want to be in if we're creating correct programs that are easy to analyze.
If we really need "thread context" for some magic that tracks thread state behind the scenes (like for allocation or cancellation tracking) why not just expose the magic that TLS uses to understand that context in the first place? Personally I'm just going to use the %fs register directly. This would not be possible in libraries for obvious reasons but why should they be thread aware to begin with? Why not just design them correctly so they get the context related data they need in the first place right in the argument list?
My question is simply: What is the simplest way to disable __thread support and make clang emit errors if you accidentally used it? How can I get errors if I load a dynamic library which happens to require TLS?

I believe the simplest way is adding something like this unconditionally to your CFLAGS (perhaps from clang's equivalent of the gcc specfile if you want it to be system-global):
-D__thread='^-^'
where the righthand side can be anything that's syntactically invalid (a constraint violation) at any point in a C program.
As for preventing loading of libraries with TLS, you'd have to patch the linker and/or dynamic linker to reject them. If you're just talking about dlopen, your program could first read the file and parse the ELF headers for TLS relocations, then reject the library (without passing it to dlopen) if it has any. This might even be possible with an LD_PRELOAD wrapper.
I agree with you that, especially in its current implementation, TLS is something whose use should generally be avoided, but may I ask if you've measured the costs? I think stamping it out completely will be fairly difficult on a system that's designed to use it, and there's much lower-hanging fruit for cutting bloat. Which libc are you using? If it's glibc, I'm pretty sure glibc has lots of TLS it uses internally itself these days... Of course if you're writing your own threads implementation, that's going to entail a lot of interaction with the rest of the standard library, so perhaps you're already patching it out...?
By the way (shameless plug follows), we have an extremely light-weight threads implementation in musl libc which presently does not have TLS. I don't think it would be easy to integrate with another libc (and I'm sure if you're writing your own you will find it difficult to integrate with glibc, especially glibc's dynamic linker, which expects TLS to be supported) but if you can use the whole library as-is, it may meet your needs for specific projects, or have useful code you can borrow (license is MIT).

Mixing OpenMP with pthreads

My question is whether is it a good idea to mix OpenMP with pthreads. Are there applications out there which combine these two. Is it a good practice to mix these two? Or typical applications normally just use one of the two.

Typically it's better to just use one or the other. But for myself at least, I do regularly mix the two and it's safe if it's done correctly.
The most common case I do this is where I have a lower-level library that is threaded using pthreads, but I'm calling it in a user application that uses OpenMP.
There are some cases where it isn't safe. If for example, you kill a pthread before you exit all OpenMP regions in that thread.

I don't think so..
Its not a good idea. See the thing is, OpenMP is basically made for portability. Now if u are using pthread, then you are loosing the very essence of it!
pthread could only be supported by POSIX compliant OS's. While OpenMP could be used virtually on any OS provided they have a support for it.
Anyway, OpenMP gives you an abstraction much higher than what is provided by pthead.

No problem.
The purpose of OpenMP and pthreads are different. OpenMP is perfect to write a loop-level parallelism. However, OpenMP is not adequate to express sophisticated thread communications and synchronizations. OpenMP does not support all kinds of synchronizations, such as condition variables.
The caveat would be, as Mystrical pointed out, handling and accessing native threads within OpenMP parallel constructs.
FYI, Intel's TBB and Cilk Plus are also often used in a mixed way.

On Windows and Linux it seems to work just fine. However, OpenMP does not work on a Mac if it is run in a new thread. It only works in the main thread.
It appears that the behavior of how to mix the two threading modules is not defined. Some platform/compilers support it, others do not.

Sure. I do it all the time. You have to be careful. Why do it, though? Because there are some instances in which you have to! In complicated tasking models, such as pipelined functions where you want to keep the pipe going, it may be the only way to take advantage of all the power available.

I find very hard that you would need to use pthreads if you already use OpenMP. You could use a section pragma to run procedures with different functions. I personally have used it to implement pipeline parallelism.
Nowadays OpenMP does much more than pthreads, so if you use OpenMP you are covered. For instance, GCC 5.0 forward implements OpenMP extensions that exports code to GPU. :D

What's the difference between "C system calls" and "C library routines"?

There are multiple sections in the manpages. Two of them are:
2 Unix and C system calls
3 C Library routines for C programs
For example there is getmntinfo(3) and getfsstat(2), both look like they do the same thing. When should one use which and what is the difference?

System calls are operating system functions, like on UNIX, the malloc() function is built on top of the sbrk() system call (for resizing process memory space).
Libraries are just application code that's not part of the operating system and will often be available on more than one OS. They're basically the same as function calls within your own program.
The line can be a little blurry but just view system calls as kernel-level functionality.

Libraries of common functions are built on top of the system call interface, but applications are free to use both.
System calls are like authentication keys which have the access to use kernel resources.
Above image is from Advanced Linux programming and helps to understand how the user apps interact with kernel.

System calls are the interface between user-level code and the kernel. C Library routines are library calls like any other, they just happen to be really commonly provided (pretty much universally). A lot of standard library routines are wrappers (thin or otherwise) around system calls, which does tend to blur the line a bit.
As to which one to use, as a general rule, use the one that best suits your needs.

The calls described in section 2 of the manual are all relatively thin wrappers around actual calls to system services that trap to the kernel. The C standard library routines described in section 3 of the manual are client-side library functions that may or may not actually use system calls.
This posting has a description of system calls and trapping to the kernel (in a slightly different context) and explains the underlying mechanism behind system calls with some references.

As a general rule, you should always use the C library version. They often have wrappers that handle esoteric things like restarts on a signal (if you have requested that). This is especially true if you have already linked with the library. All rules have reasons to be broken. Reasons to use the direct calls,
You want to be libc agnostic; Maybe with an installer. Such code could run on Android (bionic), uClibc, and more traditional glibc/eglibc systems, regardless of the library used. Also, dynamic loading with wrappers to make a run-time glibc/bionic layer allowing a dual Android/Linux binary.
You need extreme performance. Although this is probably rare and most likely misguided. Probably rethinking the problem will give better performance benefits and not calling the system is often a performance win, which the libc can occasionally do.
You are writing some initramfs or init code without a library; to create a smaller image or boot faster.
You are testing a new kernel/platform and don't want to complicate life with a full blown file system; very similar to the initramfs.
You wish to do something very quickly on program startup, but eventually want to use the libc routines.
To avoid a known bug in the libc.
The functionality is not available through libc.
Sorry, most of the examples are Linux specific, but the rationals should apply to other Unix variants. The last item is quite common when new features are introduced into a kernel. For example when kqueue or epoll where first introduced, there was no libc to support them. This may also happen if the system has an older library, but a newer kernel and you wish to use this functionality.
If your process hasn't used the libc, then most likely something in the system will have. By coding your own variants, you can negate the cache by providing two paths to the same end goal. Also, Unix
will share the code pages between processes. Generally there is no reason not to use the libc version.
Other answers have already done a stellar job on the difference between libc and system calls.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight