Mixing OpenMP with pthreads

My question is whether it is a good idea to mix OpenMP with pthreads. Are there applications out there that combine the two? Is it good practice to mix them, or do typical applications normally use just one of the two?

Typically it's better to just use one or the other. But for myself at least, I do regularly mix the two and it's safe if it's done correctly.
The most common case I do this is where I have a lower-level library that is threaded using pthreads, but I'm calling it in a user application that uses OpenMP.
There are some cases where it isn't safe: for example, if you kill a pthread before it has exited all the OpenMP regions in that thread.
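A minimal sketch of the pattern described above, assuming a POSIX system and an OpenMP-capable compiler (e.g. `gcc -fopenmp -pthread`). The names `lib_sum` and `app_sum_rows` are hypothetical: the "library" splits a sum across two pthreads and joins them before returning, and the "application" calls it from an OpenMP loop. Because every pthread is created and joined inside the call, none of them outlives the OpenMP region that triggered it. Without `-fopenmp` the pragma is ignored and the loop simply runs serially.

```c
#include <pthread.h>
#include <stddef.h>

/* --- "Library" side: threaded internally with pthreads (hypothetical). --- */
#define NWORKERS 2

struct chunk { const int *data; size_t len; long sum; };

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    size_t i;
    c->sum = 0;
    for (i = 0; i < c->len; i++)
        c->sum += c->data[i];
    return NULL;
}

/* Every thread created here is joined before returning, so no pthread
   outlives the call, even when the caller is inside an OpenMP region. */
long lib_sum(const int *data, size_t n)
{
    pthread_t tid[NWORKERS];
    struct chunk ch[NWORKERS];
    int started[NWORKERS];
    size_t per = n / NWORKERS;
    long total = 0;
    int w;

    for (w = 0; w < NWORKERS; w++) {
        ch[w].data = data + (size_t)w * per;
        ch[w].len = (w == NWORKERS - 1) ? n - (size_t)w * per : per;
        started[w] = pthread_create(&tid[w], NULL, sum_chunk, &ch[w]) == 0;
        if (!started[w])
            sum_chunk(&ch[w]);          /* fall back to running inline */
    }
    for (w = 0; w < NWORKERS; w++) {
        if (started[w])
            pthread_join(tid[w], NULL);
        total += ch[w].sum;
    }
    return total;
}

/* --- Application side: an OpenMP loop calling the pthread-based library. --- */
long app_sum_rows(const int *matrix, size_t rows, size_t cols)
{
    long total = 0;
    long r;
#pragma omp parallel for reduction(+:total)
    for (r = 0; r < (long)rows; r++)
        total += lib_sum(matrix + (size_t)r * cols, cols);
    return total;
}
```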

I don't think so.
It's not a good idea. The thing is, OpenMP is basically made for portability. If you use pthreads as well, you lose the very essence of it!
pthreads are only supported by POSIX-compliant OSs, while OpenMP can be used on virtually any OS, provided the compiler supports it.
Anyway, OpenMP gives you a much higher-level abstraction than pthreads do.

No problem.
The purposes of OpenMP and pthreads are different. OpenMP is perfect for writing loop-level parallelism. However, OpenMP is not adequate for expressing sophisticated thread communication and synchronization. OpenMP does not support all kinds of synchronization, such as condition variables.
The caveat would be, as Mystrical pointed out, handling and accessing native threads within OpenMP parallel constructs.
FYI, Intel's TBB and Cilk Plus are also often used in a mixed way.
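To illustrate the condition-variable point, here is a sketch of a one-slot mailbox in which a producer and a consumer wait for each other with `pthread_cond_wait`; OpenMP has no construct for this kind of "sleep until the other thread changes something" hand-off. The `mailbox_*` and `run_mailbox_demo` names are hypothetical. Compile with `-pthread`.

```c
#include <pthread.h>

/* One-slot mailbox: put blocks while full, get blocks while empty. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;   /* signalled whenever 'full' flips */
    int full;
    int value;
};

void mailbox_init(struct mailbox *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->changed, NULL);
    m->full = 0;
}

void mailbox_put(struct mailbox *m, int v)
{
    pthread_mutex_lock(&m->lock);
    while (m->full)                       /* sleep until the slot is empty */
        pthread_cond_wait(&m->changed, &m->lock);
    m->value = v;
    m->full = 1;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

int mailbox_get(struct mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    while (!m->full)                      /* sleep until a value arrives */
        pthread_cond_wait(&m->changed, &m->lock);
    int v = m->value;
    m->full = 0;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
    return v;
}

static void *producer(void *arg)
{
    struct mailbox *m = arg;
    for (int i = 1; i <= 5; i++)
        mailbox_put(m, i);
    return NULL;
}

/* Runs one producer thread, consumes its five values, returns their sum. */
int run_mailbox_demo(void)
{
    struct mailbox m;
    pthread_t t;
    int sum = 0;
    mailbox_init(&m);
    if (pthread_create(&t, NULL, producer, &m) != 0)
        return -1;
    for (int i = 0; i < 5; i++)
        sum += mailbox_get(&m);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&m.lock);
    pthread_cond_destroy(&m.changed);
    return sum;
}
```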

On Windows and Linux it seems to work just fine. However, OpenMP does not work on a Mac if it is run in a new thread. It only works in the main thread.
It appears that how the two threading models interact is not defined: some platforms/compilers support mixing them, others do not.

Sure. I do it all the time. You have to be careful. Why do it, though? Because there are some instances in which you have to! In complicated tasking models, such as pipelined functions where you want to keep the pipe going, it may be the only way to take advantage of all the power available.

I find it hard to believe that you would need pthreads if you already use OpenMP. You could use the sections pragma to run different functions concurrently. I have personally used it to implement pipeline parallelism.
Nowadays OpenMP does much more than pthreads, so if you use OpenMP you are covered. For instance, GCC 5.0 onward implements the OpenMP extensions that offload code to GPUs. :D
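A minimal sketch of the `sections` construct, assuming an OpenMP-capable compiler (`-fopenmp`). Each `section` is executed once by some thread of the team, so two unrelated functions (the hypothetical `sum_to` and `product_to`) can run concurrently. This shows only the building block; a real pipeline would also need a way to pass data between stages. Without OpenMP the sections simply run one after the other.

```c
/* Two unrelated pieces of work (hypothetical examples). */
static long sum_to(long n)
{
    long s = 0;
    for (long i = 1; i <= n; i++)
        s += i;
    return s;
}

static long product_to(long n)
{
    long p = 1;
    for (long i = 1; i <= n; i++)
        p *= i;
    return p;
}

/* Each section is handed to one thread of the team. */
void run_both(long n, long *sum, long *prod)
{
#pragma omp parallel sections
    {
#pragma omp section
        *sum = sum_to(n);
#pragma omp section
        *prod = product_to(n);
    }
}
```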

Related

Task library for C?

Is there a task library for C? I'm talking about the parallel task library as it exists in C#, or Java. In other words, I need a layer of abstraction over pthread for Linux. Thanks.
Have a look at OpenMP.
In particular, you might be interested in the task feature of OpenMP 3.0.
I suggest, however, that you first check whether your problem can be solved using other, "basic" constructs, such as parallel for, since they are simpler to use.
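A sketch of the OpenMP 3.0 task feature, using the usual recursive Fibonacci example (assumes `-fopenmp`; the cutoff of 20 is an arbitrary choice). One thread creates the root call inside `single`, each level above the cutoff spawns two child tasks, and `taskwait` waits for both before combining their results. Below the cutoff the recursion runs serially, since creating a task per call would cost more than it saves.

```c
/* Serial cutoff: below this, spawning tasks costs more than it saves. */
#define CUTOFF 20

static long fib(int n)
{
    long a, b;
    if (n < 2)
        return n;
    if (n < CUTOFF)
        return fib(n - 1) + fib(n - 2);
#pragma omp task shared(a)
    a = fib(n - 1);
#pragma omp task shared(b)
    b = fib(n - 2);
    /* Wait for both child tasks before reading a and b. */
#pragma omp taskwait
    return a + b;
}

long parallel_fib(int n)
{
    long result = 0;
#pragma omp parallel
    {
        /* One thread starts the recursion; the whole team executes the tasks. */
#pragma omp single
        result = fib(n);
    }
    return result;
}
```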
Probably the most widely-used parallel programming primitives aside from the Win32 ones are those provided by pthreads.
They are quite low-level but include everything you need to write an efficient blocking queue and so create a thread pool of workers that carry out a queue of asynchronous tasks.
There is also a Win32 implementation so you can use the same codebase on Windows as well as POSIX systems.
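A rough sketch of the blocking queue plus worker pool described above, using only pthreads (compile with `-pthread`). The names `pool_start`, `pool_submit`, `pool_stop` and the fixed sizes `QCAP` and `NWORKERS` are hypothetical choices for this example, not an existing API. Workers sleep on `not_empty` while the queue is empty, submitters sleep on `not_full` while it is full, and tasks run outside the lock.

```c
#include <pthread.h>

#define QCAP 64
#define NWORKERS 4

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    struct task q[QCAP];                /* ring buffer of pending tasks */
    int head, count, stopping, nstarted;
    pthread_t workers[NWORKERS];
};

static void *worker(void *arg)
{
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->stopping)
            pthread_cond_wait(&p->not_empty, &p->lock);
        if (p->count == 0) {            /* stopping and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        struct task t = p->q[p->head];
        p->head = (p->head + 1) % QCAP;
        p->count--;
        pthread_cond_signal(&p->not_full);
        pthread_mutex_unlock(&p->lock);
        t.fn(t.arg);                    /* run the task outside the lock */
    }
}

/* Returns 0 if at least one worker started, -1 otherwise. */
int pool_start(struct pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    pthread_cond_init(&p->not_full, NULL);
    p->head = p->count = p->stopping = p->nstarted = 0;
    for (int i = 0; i < NWORKERS; i++)
        if (pthread_create(&p->workers[p->nstarted], NULL, worker, p) == 0)
            p->nstarted++;
    return p->nstarted > 0 ? 0 : -1;
}

/* Enqueue a task; blocks while the queue is full. */
void pool_submit(struct pool *p, task_fn fn, void *arg)
{
    pthread_mutex_lock(&p->lock);
    while (p->count == QCAP)
        pthread_cond_wait(&p->not_full, &p->lock);
    p->q[(p->head + p->count) % QCAP] = (struct task){ fn, arg };
    p->count++;
    pthread_cond_signal(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
}

/* Let the workers drain the queue, then join them. */
void pool_stop(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->stopping = 1;
    pthread_cond_broadcast(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->nstarted; i++)
        pthread_join(p->workers[i], NULL);
    pthread_mutex_destroy(&p->lock);
    pthread_cond_destroy(&p->not_empty);
    pthread_cond_destroy(&p->not_full);
}

/* Usage demo: each task bumps a shared counter under its own mutex. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_counter;

static void demo_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    demo_counter++;
    pthread_mutex_unlock(&demo_lock);
}

int pool_demo(int ntasks)
{
    struct pool p;
    demo_counter = 0;
    if (pool_start(&p) != 0)
        return -1;
    for (int i = 0; i < ntasks; i++)
        pool_submit(&p, demo_task, NULL);
    pool_stop(&p);
    return demo_counter;
}
```

Running tasks outside the mutex keeps the critical section down to the queue manipulation, so a slow task does not block submitters or the other workers.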
Many concepts in TPL (Task, Work-Stealing Scheduler, ...) are inspired by a very successful MIT project named Cilk. Its advanced framework (Cilk Plus) was acquired by Intel and integrated into Intel Parallel Building Blocks. You can still use Cilk as an open-source project, without some of the advanced features. The good news is that Intel is releasing Cilk Plus as open source in GCC.
You should try out Cilk, as it adds another layer of abstraction to C that makes it easy to express parallel algorithms, while staying close enough to C to ensure good performance.
I've been meaning to check out libdispatch. Yes, it's built for OS X and blocks, but it has function interfaces as well. I haven't really had time to look at it yet, though, so I'm not sure whether it fills all your needs.
There is an academic project called Wool that implements a work-stealing scheduler in C (with significant help from the C preprocessor, AFAIK). It might be worth looking at, though it does not seem to be actively developed.

Is there any concurrency package for C-language?

I know Java and C# both have library packages to support concurrent programming. Does anyone know whether there is a library package for C? Thanks
Qt QThread
pthread
MPI (for computations on multiple computers)
(more)
At the lowest level, you have pthreads, which give you threads, locks, condition variables, etc. It's about as basic as you can get. If your program uses a framework, it might provide its own threading primitives so you don't have to use pthreads directly.
Qt Threading Support
Glib threads (used by GTK)
Boost threads (for C++)
Other packages provide higher-level concurrency operations that may be easier to reason about.
Intel Threading Building Blocks
OpenMP
MPI
QtConcurrent
There is OpenMP, which is supported by compilers such as icc, MSVC and GCC (at least).

Pthreads vs. OpenMP

I'm creating a multi-threaded application in C using Linux.
I'm unsure whether I should use the POSIX thread API or the OpenMP API.
What are the pros & cons of using either?
Edit:
Could someone clarify whether both APIs create kernel-level or user-level threads?
Pthreads and OpenMP represent two totally different multiprocessing paradigms.
Pthreads is a very low-level API for working with threads. Thus, you have extremely fine-grained control over thread management (create/join/etc), mutexes, and so on. It's fairly bare-bones.
On the other hand, OpenMP is much higher level, is more portable and doesn't limit you to using C. It's also much more easily scaled than pthreads. One specific example of this is OpenMP's work-sharing constructs, which let you divide work across multiple threads with relative ease. (See also Wikipedia's pros and cons list.)
That said, you've really provided no detail about the specific program you're implementing, or how you plan on using it, so it's nearly impossible to recommend one API over the other.
If you use OpenMP, it can be as simple as adding a single pragma, and you'll be 90% of the way to properly multithreaded code with linear speedup. To get the same performance boost with pthreads takes a lot more work.
But as usual, you get more flexibility with pthreads.
Basically, it depends on what your application is. Do you have a trivially-parallelisable algorithm? Or do you just have lots of arbitrary tasks that you'd like to run simultaneously? How much do the tasks need to talk to each other? How much synchronisation is required?
OpenMP has the advantages of being cross-platform and simpler for some operations. It handles threading differently, in that it gives you higher-level threading options, such as the parallelization of loops:
```c
/* Each iteration is independent, so OpenMP can split them across threads. */
#pragma omp parallel for
for (int i = 0; i < 500; i++)
    arr[i] = 2 * i;
```
If this interests you, and if C++ is an option, I'd also recommend Threading Building Blocks.
Pthreads is a lower level API for generating threads and synchronization explicitly. In that respect, it provides more control.
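For comparison, here is a sketch of the `arr[i] = 2 * i` loop from the OpenMP example written directly with pthreads (assumes `-pthread`; `fill_pthreads`, `N` and `NTHREADS` are illustrative names). The range splitting, thread creation and joining that `#pragma omp parallel for` does for you all become explicit, which is both the extra work and the extra control.

```c
#include <pthread.h>

#define N 500
#define NTHREADS 4

struct range { int *arr; int begin, end; };

/* Thread body: fill one contiguous slice [begin, end). */
static void *fill_range(void *p)
{
    struct range *r = p;
    for (int i = r->begin; i < r->end; i++)
        r->arr[i] = 2 * i;
    return NULL;
}

/* Same effect as the OpenMP loop, with the work split by hand.
   arr must hold at least N elements. */
void fill_pthreads(int *arr)
{
    pthread_t tid[NTHREADS];
    struct range rg[NTHREADS];
    int started[NTHREADS];

    for (int t = 0; t < NTHREADS; t++) {
        rg[t].arr = arr;
        rg[t].begin = t * N / NTHREADS;
        rg[t].end = (t + 1) * N / NTHREADS;
        started[t] = pthread_create(&tid[t], NULL, fill_range, &rg[t]) == 0;
        if (!started[t])
            fill_range(&rg[t]);          /* fall back to doing the slice inline */
    }
    for (int t = 0; t < NTHREADS; t++)
        if (started[t])
            pthread_join(tid[t], NULL);
}
```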
It depends on two things: your code base and your place within it. The key questions are: 1) "Does your code base have threads, thread pools, and the control primitives (locks, events, etc.)?" and 2) "Are you developing reusable libraries or ordinary apps?"
If your library has thread tools (almost always built on some flavor of pthreads), USE THOSE. If you are a library developer, spend the time (if possible) to build them. It is worth it: you can put together much more fine-grained, advanced threading than OpenMP will give you.
Conversely, if you are pressed for time, or just developing apps or building something on top of third-party tools, use OpenMP. You can wrap it in a few macros and get the basic parallelism you need.
In general, OpenMP is good enough for basic multithreading. Once you start managing system resources directly or building highly asynchronous code, its ease-of-use advantage gets crowded out by performance and interface issues.
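One way to wrap OpenMP in a few macros, sketched under the assumption of a C99 compiler (for `_Pragma`). `PAR_FOR`, `NUM_WORKERS` and `square_all` are hypothetical names. When the code is compiled with `-fopenmp`, `_OPENMP` is defined and the macros expand to OpenMP; otherwise they fall back to serial code, so the rest of the program never mentions OpenMP directly.

```c
/* Hypothetical project macros hiding OpenMP behind our own names. */
#ifdef _OPENMP
#include <omp.h>
#define PAR_FOR _Pragma("omp parallel for")
#define NUM_WORKERS() omp_get_max_threads()
#else
#define PAR_FOR
#define NUM_WORKERS() 1
#endif

/* Square every element: parallel when built with OpenMP, serial otherwise. */
void square_all(double *v, int n)
{
    PAR_FOR
    for (int i = 0; i < n; i++)
        v[i] *= v[i];
}
```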
Think of it this way: on a Linux system, it is very likely that the OpenMP implementation itself uses pthreads to implement features such as parallel regions, barriers and locks/mutexes. Having said that, there are good reasons to work directly with the pthreads API.
It is my opinion that -
You use OpenMP when -
Your program contains easy-to-spot for loops where each iteration is independent of the others.
You want to retain the readability of the program. (Basically keep the parallelized program looking like the sequential version)
You want to stay away from the nitty-gritty details of how threads are spawned and controlled at a micro level.
And pthreads when -
You don't have easy-to-parallelize loops.
You have different tasks that need to be performed concurrently, and you may want to give each of them different responsibilities.
You want to control the flow of thread execution at a micro level.
Feel free to correct me in the comments.

GCC Atomic Builtins instead of pthread?

I've found the following article: Use GCC-provided atomic lock operations to replace pthread_mutex_lock functions
It refers to GCC Atomic Builtins.
The article suggests using GCC atomic builtins instead of pthread synchronization tools.
Is this a good idea?
PS. The mysql post is obviously misleading. Atomic builtins can't replace all pthread tools. For example, locking requires that, if a lock can't be acquired, the thread has to wait. In other words, it asks the OS to put it to sleep, so that the wait is passive. A simple GCC builtin can't do that.
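To illustrate the PS: a lock built only from GCC's `__atomic` builtins (also supported by Clang) ends up as a spinlock. A thread that cannot get it keeps looping on the CPU instead of being put to sleep by the OS, which is what `pthread_mutex_lock` can do. A minimal sketch with hypothetical names, compiled with `-pthread` for the demo threads:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical spinlock built only from GCC/Clang __atomic builtins. */
typedef struct { bool flag; } spinlock;

static void spin_lock(spinlock *s)
{
    /* __atomic_test_and_set returns the previous value: true means another
       thread holds the lock, so keep spinning (an active, CPU-burning wait). */
    while (__atomic_test_and_set(&s->flag, __ATOMIC_ACQUIRE))
        ;
}

static void spin_unlock(spinlock *s)
{
    __atomic_clear(&s->flag, __ATOMIC_RELEASE);
}

/* Demo: two threads increment a shared counter under the spinlock. */
static spinlock demo_spin = { false };
static long spin_counter;

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock(&demo_spin);
        spin_counter++;
        spin_unlock(&demo_spin);
    }
    return NULL;
}

long spin_demo(void)
{
    pthread_t a, b;
    spin_counter = 0;
    if (pthread_create(&a, NULL, spin_worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, spin_worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return spin_counter;
}
```

Spinning can be acceptable for very short critical sections on a multicore machine, but under heavy contention or when there are more threads than cores, the waiting threads waste CPU time that a mutex would hand back to the scheduler.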
Is this a good idea?
Not if you ever intend to compile the code with something other than gcc. Is pthreads causing you any specific problems?
If you are already using pthread, and the pthread lock functions already do what you want, it is best to use the pthread lock functions.
These atomic builtins are just the building blocks for higher-level primitives; writing these higher-level primitives tends to be tricky, and any mistakes can cause errors which can take a long time to show up (since they usually depend on timing). If you already have a library with higher-level primitives which do what you want and are fast enough for your needs (and do not assume they are too slow just because you have to do a function call), it is best to not reinvent the wheel.
No. The GCC built-ins probably make good sense to the guys who write operating systems, libc, and maybe pthreads itself, but for your average application there is no reason not to use the pthreads approach.
And even if you always use GCC, some day you may want to run a static analysis tool that won't handle all the custom GCC extensions.
C Extensions
http://gcc.gnu.org/ml/libstdc++/2006-06/msg00089.html
Atomic Builtins
http://sources.redhat.com/ml/libc-alpha/2005-06/msg00132.html
http://www.redi.uklinux.net/doc/c++/shared_ptr.html
Atomic builtins make sense if you want to improve performance. Builtins allow you to minimize the contention caused by the serialization of mutexes. When you use mutexes and create a critical section, you serialize access to that section of your code; in performance-critical code you may want to avoid contention by using thread-specific data and, when that is not possible, atomics. The last resort is locking, and when you lock, minimize the time during which the lock is held (using messaging and double-checked locking, though some claim the latter doesn't work; it works for me).
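A sketch of the "thread-specific data first, atomics second" advice, assuming GCC or Clang `__atomic` builtins and `-pthread` (`atomic_demo`, `NWORK` and `PER_WORKER` are illustrative). Each worker counts into a private local variable and publishes its total with a single `__atomic_fetch_add`, instead of locking a mutex once per increment.

```c
#include <pthread.h>

#define NWORK 4
#define PER_WORKER 100000

static long total;

static void *count_worker(void *arg)
{
    (void)arg;
    long local = 0;                       /* thread-private: no contention */
    for (int i = 0; i < PER_WORKER; i++)
        local += 1;
    /* One atomic operation per thread instead of one lock per increment. */
    __atomic_fetch_add(&total, local, __ATOMIC_RELAXED);
    return NULL;
}

long atomic_demo(void)
{
    pthread_t tid[NWORK];
    int started = 0;
    __atomic_store_n(&total, 0, __ATOMIC_RELAXED);
    for (int i = 0; i < NWORK; i++) {
        if (pthread_create(&tid[i], NULL, count_worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return __atomic_load_n(&total, __ATOMIC_RELAXED);
}
```

`__ATOMIC_RELAXED` is enough here because only the final sum matters, and `pthread_join` already makes each worker's update visible before the final read.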

Portable thread-safety in C?

Purpose
I'm writing a small library for which portability is the biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more. The set of functions provided by the library all operate (read/write) on an internal data structure. I've considered some other design alternatives, but nothing else seems feasible for what the library is trying to achieve.
Question
Are there any portable algorithms, techniques, or incantations which can be used to ensure thread-safety? I am not concerned with making the functions re-entrant. Moreover, I am not concerned with speed or (possibly) wasting resources if the algorithm/technique/incantation is portable. Ideally, I don't want to depend on any libraries (such as GNU Pth) or system-specific operations (like atomic test-and-set).
I have considered modifying Lamport's bakery algorithm, but I do not know how to alter it to work inside of the functions called by the threads instead of working in the threads themselves.
Any help is greatly appreciated.
Without OS/hardware support, at least an atomic CAS, there's nothing you can do that's practical. There are portable libraries that abstract various platforms into a common interface, though.
http://www.gnu.org/software/pth/related.html
Almost all systems (even Windows) can run libpthread these days.
Lamport's bakery algorithm would probably work; unfortunately, there are still practical problems with it. Specifically, many CPUs implement out-of-order memory operations: even if you've compiled your code into a perfectly correct instruction sequence, the CPU, when executing your code, may decide to reorder the instructions on the fly to achieve better performance. The only way to get around this is to use memory barriers, which are highly system- and CPU-specific.
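For illustration only, here is Lamport's bakery algorithm written with C11 `<stdatomic.h>`, whose default sequentially consistent loads and stores supply the memory barriers mentioned above. It needs a C11 compiler, so it does not meet the question's C90-only constraint; it shows where the barriers have to come from rather than avoiding them. `NT`, `ROUNDS` and the function names are hypothetical, and the demo uses pthreads (`-pthread`) only to start the threads.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NT 2                 /* number of participating threads */
#define ROUNDS 2000

static atomic_bool choosing[NT];     /* static atomics start out zero */
static atomic_int number[NT];
static long bakery_counter;          /* protected by the bakery lock */

static void bakery_lock(int i)
{
    int max = 0;
    atomic_store(&choosing[i], true);
    for (int j = 0; j < NT; j++) {
        int n = atomic_load(&number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&number[i], max + 1);      /* take a ticket */
    atomic_store(&choosing[i], false);
    for (int j = 0; j < NT; j++) {
        while (atomic_load(&choosing[j]))   /* wait while j picks a ticket */
            sched_yield();
        for (;;) {                          /* wait while (number[j], j) < (number[i], i) */
            int nj = atomic_load(&number[j]);
            int ni = atomic_load(&number[i]);
            if (nj == 0 || nj > ni || (nj == ni && j >= i))
                break;
            sched_yield();
        }
    }
}

static void bakery_unlock(int i)
{
    atomic_store(&number[i], 0);
}

static void *bakery_worker(void *arg)
{
    int id = *(int *)arg;
    for (int k = 0; k < ROUNDS; k++) {
        bakery_lock(id);
        bakery_counter++;
        bakery_unlock(id);
    }
    return NULL;
}

long bakery_demo(void)
{
    pthread_t t[NT];
    int ids[NT];
    int started = 0;
    bakery_counter = 0;
    for (int i = 0; i < NT; i++) {
        ids[i] = i;
        if (pthread_create(&t[i], NULL, bakery_worker, &ids[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return bakery_counter;
}
```

Each thread needs a fixed index in `0..NT-1`, which is part of why the algorithm is awkward to hide inside library functions that arbitrary threads call.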
You really only have two choices here: either (1) keep your library thread-unsafe and make your users aware of that in the documentation, or (2) use a platform-specific mutex. Option 2 can be made easier by using another library that implements mutexes for a large variety of platforms and provides you with a unified, abstract interface.
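A minimal sketch of option 2: a tiny platform layer that gives the library one mutex type, backed by Win32 critical sections on Windows and pthreads elsewhere. All `lib_*` names are hypothetical. The library code itself stays in C90 style; only the `#ifdef` block is platform-specific, and `lib_init` must be called before any threads use the library.

```c
/* Hypothetical portability layer: one mutex type, four functions. */
#ifdef _WIN32
#include <windows.h>
typedef CRITICAL_SECTION lib_mutex;
static void lib_mutex_init(lib_mutex *m)    { InitializeCriticalSection(m); }
static void lib_mutex_lock(lib_mutex *m)    { EnterCriticalSection(m); }
static void lib_mutex_unlock(lib_mutex *m)  { LeaveCriticalSection(m); }
static void lib_mutex_destroy(lib_mutex *m) { DeleteCriticalSection(m); }
#else
#include <pthread.h>
typedef pthread_mutex_t lib_mutex;
static void lib_mutex_init(lib_mutex *m)    { pthread_mutex_init(m, NULL); }
static void lib_mutex_lock(lib_mutex *m)    { pthread_mutex_lock(m); }
static void lib_mutex_unlock(lib_mutex *m)  { pthread_mutex_unlock(m); }
static void lib_mutex_destroy(lib_mutex *m) { pthread_mutex_destroy(m); }
#endif

/* Library code: every access to the shared state goes through the lock. */
static lib_mutex state_lock;
static long state_value;

/* Must be called once, before any other thread uses the library. */
void lib_init(void)
{
    lib_mutex_init(&state_lock);
    state_value = 0;
}

void lib_cleanup(void)
{
    lib_mutex_destroy(&state_lock);
}

long lib_increment(void)
{
    long v;
    lib_mutex_lock(&state_lock);
    v = ++state_value;
    lib_mutex_unlock(&state_lock);
    return v;
}
```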
Functions either cannot be thread-safe or are innately thread-safe, depending on how you look at it. And threading/locking is innately platform-specific. Really, it is up to the users of your library to handle the threading issues.
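One way to leave threading to the users while keeping the library pure C90 is to let them register lock and unlock callbacks. This is a sketch of that design with hypothetical names (`lib_set_lock_hooks`, `lib_next_id`): the library calls the hooks around every access to its internal state and does nothing extra when none are registered. The caller-side part at the end plugs in a pthread mutex, but any platform lock would do.

```c
#include <stddef.h>
#include <pthread.h>   /* needed only by the caller-side adapter below */

/* --- Library side: no threading API at all. --- */
typedef void (*lib_lock_hook)(void *ctx);

static lib_lock_hook hook_lock = NULL;
static lib_lock_hook hook_unlock = NULL;
static void *hook_ctx = NULL;
static long next_id = 0;

void lib_set_lock_hooks(lib_lock_hook lock, lib_lock_hook unlock, void *ctx)
{
    hook_lock = lock;
    hook_unlock = unlock;
    hook_ctx = ctx;
}

long lib_next_id(void)
{
    long id;
    if (hook_lock != NULL)
        hook_lock(hook_ctx);
    id = ++next_id;                 /* the internal state being protected */
    if (hook_unlock != NULL)
        hook_unlock(hook_ctx);
    return id;
}

/* --- Caller side (not part of the library): adapt a pthread mutex. --- */
static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;
static void app_lock(void *ctx)   { pthread_mutex_lock((pthread_mutex_t *)ctx); }
static void app_unlock(void *ctx) { pthread_mutex_unlock((pthread_mutex_t *)ctx); }
```

The hooks must be registered before a second thread starts using the library, since the hook pointers themselves are not protected.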

Resources