I'm creating a multi-threaded application in C using Linux.
I'm unsure whether I should use the POSIX thread API or the OpenMP API.
What are the pros & cons of using either?
Edit:
Could someone clarify whether both APIs create kernel-level or user-level threads?
Pthreads and OpenMP represent two totally different multiprocessing paradigms.
Pthreads is a very low-level API for working with threads. Thus, you have extremely fine-grained control over thread management (create/join/etc), mutexes, and so on. It's fairly bare-bones.
On the other hand, OpenMP is much higher level, is more portable and doesn't limit you to using C. It's also much more easily scaled than pthreads. One specific example of this is OpenMP's work-sharing constructs, which let you divide work across multiple threads with relative ease. (See also Wikipedia's pros and cons list.)
That said, you've really provided no detail about the specific program you're implementing, or how you plan on using it, so it's fairly impossible to recommend one API over the other.
If you use OpenMP, it can be as simple as adding a single pragma, and you'll be 90% of the way to properly multithreaded code with linear speedup. To get the same performance boost with pthreads takes a lot more work.
But as usual, you get more flexibility with pthreads.
Basically, it depends on what your application is. Do you have a trivially-parallelisable algorithm? Or do you just have lots of arbitrary tasks that you'd like to run simultaneously? How much do the tasks need to talk to each other? How much synchronisation is required?
OpenMP has the advantages of being cross-platform and simpler for some operations. It handles threading in a different manner, giving you higher-level threading options, such as parallelization of loops:
int arr[500], i;
#pragma omp parallel for
for (i = 0; i < 500; i++)
    arr[i] = 2 * i;
If this interests you, and if C++ is an option, I'd also recommend Threading Building Blocks.
Pthreads is a lower level API for generating threads and synchronization explicitly. In that respect, it provides more control.
It depends on two things: your code base and your place within it. The key questions are 1) "Does your code base have threads, thread pools, and the control primitives (locks, events, etc.)?" and 2) "Are you developing reusable libraries or ordinary apps?"
If your library has thread tools (almost always built on some flavor of PThread), USE THOSE. If you are a library developer, spend the time (if possible) to build them. It is worth it- you can put together much more fine-grained, advanced threading than OpenMP will give you.
Conversely, if you are pressed for time or just developing apps or something off of 3rd party tools, use OpenMP. You can wrap it in a few macros and get the basic parallelism you need.
In general, OpenMP is good enough for basic multi-threading. Once you get to the point that you're managing system resources directly or building highly async code, its ease-of-use advantage gets crowded out by performance and interface issues.
Think of it this way. On a Linux system, it is very likely that the OpenMP implementation itself uses pthreads to implement its features such as parallelism, barriers, and locks/mutexes. Having said that, there are good reasons to work directly with the pthreads API.
It is my opinion that -
You use OpenMP when -
Your program contains easy-to-spot for loops where each iteration is independent of the others.
You want to retain the readability of the program. (Basically keep the parallelized program looking like the sequential version)
You want to stay away from the nitty-gritty details of how threads are spawned and controlled at a micro level.
And pthreads when -
You don't have easy-to-parallelize loops.
You have different tasks which need to be performed concurrently, and you may want to give each of those tasks a different responsibility.
You want to control the flow of thread execution at a micro level.
Feel free to correct me in the comments.
Say you wanted to write your own version of OpenCL from scratch in C. How would you go about doing it? How does OpenCL accomplish parallel programming "under the hood"? Is it just pthreads?
OpenCL covers a lot of functionality: a runtime API library, a programming language based on C, a library environment for that language, and likely a loader library for supporting multiple implementations. If you want to look at open-source examples of how it could be implemented, Pocl, Clover, Beignet and ROCm exist. At least Pocl's CPU target does indeed use pthreads, but OpenCL is designed to support offloading tasks to coprocessors such as GPUs, as well as using vector operations, so one thread does not necessarily run one work item.
The title does not refer to OpenCL, but does request to use "standard" libraries. The great thing about standards is that there are so many to choose from; for instance, the C standard provides no multithreading and no guarantee of multitasking. Multiprocessing frequently refers to running in multiple processes (in e.g. CPython, this is the only way to get concurrent execution of Python code because of the global interpreter lock). That can be done with the Unix standard function fork. Multithreading can be done using POSIX threads (POSIX.1c standard extension) or OpenMP. Recent versions of OpenMP also support accelerator offloading, which is what OpenCL was designed for. Since OpenMP and OpenCL provide restricted and abstracted environments, they could in principle be implemented on top of many of the others, for instance CUDA.
Implementing parallel execution itself requires hardware knowledge and access, and is typically the domain of the operating system; POSIX threads is often an abstraction layer on this, using e.g. clone on Linux.
OpenMP is frequently the easiest way to convert a C program to parallel execution, as it is supported by many compilers; you annotate branching points using pragmas and compile with e.g. -fopenmp for GCC. Such programs will still work as before if compiled without OpenMP.
First off: OpenCL != parallel processing. That is one of its strengths, but there's a lot more to it.
Focusing on one part of your question:
Say you wanted to write your own version of OpenCL from scratch in C.
For one: get familiar with driver development. Our GPU CL runtime is pretty intimately involved with the drivers. If you want to start from scratch, you're going to need to get very familiar with the PCIe protocols and dig up some memories about toggling pins. This is doable, but it exemplifies "nontrivial."
Multithreading at the CPU level is an entirely different matter that's been documented out the yin-yang. The great thing about using an OS that you didn't have to write yourself is that this is already handled for you.
Is it just pthreads?
How do you think those are implemented? Their functionality is part of the spec, but their implementation is entirely platform-dependent, which you might call "non-standard." The underlying implementation of a thread depends on the OS (if there is one, which is not a given), the compiler, and a ton of other factors.
This is a great question.
I am a beginner C/C++ programmer first of all, but I am curious about it.
My question is more theoretical.
I heard that C does not have explicit multithreading (MT) support; however, there are libraries which implement it. I found the "process.h" header, which has to be included to build MT programs, but the thing I don't understand is how MT itself works.
I know there are threads running on the CPU (assume it's single core for simplicity) and that only one thread runs at any given moment. The CPU switches between threads really fast, so the user sees it as simultaneous work (correct me if I'm wrong).
But - what really happens when I write the following
_beginthread( Thread, 0, NULL ); // or whatever function/class method we use
keeping in mind that C does not have MT support. I mean, how does the code tell the PC to run two functions multithreaded when it is not possible through the language's explicit methods? I guess there is some "cheat" inside the library related to "process.h", but what is that cheat? I can't find it on the web.
To be more specific - I am not asking about how to use MT, but how is it build?
Sorry if this was answered earlier, or if the question is too complicated :)
UPD:
Imagine we have the C language. It has functions, variables, pointers, etc. I don't know of any "special" function type that can run concurrently with others. Unless it calls some other functions, but then the caller function stops and waits?
Is it so that when I run MT applications, there is a special "global" function that calls my f1() and f2() repeatedly, so that they look like they are working simultaneously?
First of all, C11 does actually add multithreading support to the standard, so the premise that C does not support multithreading is no longer entirely correct.
However, I'm assuming your question is more to do with how can multithreading be implemented by a C library when standard C does(/did) not provide the necessary tools. The answer lies in the word “standard” – compilers and platforms can provide additional functionality beyond that required by the standard. Using such extra features makes the program/library less portable (i.e., more is required than is specified in the C standard), but the language and function call semantics can still be C.
Perhaps it is helpful to consider a standard library function such as fopen – somewhere inside that function code must eventually be called which could not be written in standard C, i.e., the implementation of the standard library itself must also rely on platform-specific code to access operating system functionality such as the file system. Every implementation of the standard library must thus implement the non-portable parts in a way specific to that platform (this is kind of the point of having a standard library instead of all code being platform-specific). But likewise a multithreading library can be implemented with non-standard features provided by that platform, but using such a library makes the code portable only to the platforms for which the same (or compatible) multithreading library is available.
As for how multithreading itself works, it is certainly outside the scope of what can be answered here, but as a simplified conceptual model on a single processor core, you can imagine the operating system managing “concurrent” processes by running one process for a short time, interrupting it, saving its state (current instruction, registers, etc.), loading the saved state of another process, and repeating this. This gives the illusion of concurrent execution, though in actual fact it is switching rapidly between different processes. On multi-core systems the execution on different cores can actually be concurrent, but there are typically more processes than there are cores, so this kind of switching still happens on individual cores. Things are further complicated by processes waiting for something (I/O, another process, a timer, etc.). Perhaps it suffices to say that the scheduler is a piece of software inside the operating system managing all of this, and the multithreading library communicates with it.
(Note that there are many different ways to implement multithreading and multitasking, and statements in the above paragraph do not apply to all of them.)
It's platform-specific. On Windows it eventually goes down to NtCreateThread, which uses the syscall assembly instruction to call the operating system. So you can qualify it as a cheat.
On Linux it's the same, except the function that makes the syscall is called clone instead.
Is there a task library for C? I'm talking about something like the parallel task library that exists in C# or Java. In other words, I need a layer of abstraction over pthreads for Linux. Thanks.
Give a look at OpenMP.
In particular, you might be interested in the Task feature of OpenMP 3.0.
I suggest, however, that you first see whether your problem can be solved using other, "basic" constructs, such as parallel for, since they are simpler to use.
Probably the most widely-used parallel programming primitives aside from the Win32 ones are those provided by pthreads.
They are quite low-level but include everything you need to write an efficient blocking queue and so create a thread pool of workers that carry out a queue of asynchronous tasks.
There is also a Win32 implementation so you can use the same codebase on Windows as well as POSIX systems.
Many concepts in TPL (Task, the work-stealing scheduler, ...) are inspired by a very successful MIT project named Cilk. Their advanced framework (Cilk Plus) was acquired by Intel and integrated into Intel Parallel Building Blocks. You can still use Cilk as an open source project without some advanced features. The good news is that Intel is releasing Cilk Plus as open source in GCC.
You should try out Cilk as it adds another layer of abstraction to C, which makes it easy to express parallel algorithms but it is close enough to C to ensure good performance.
I've been meaning to check out libdispatch. Yeah, it's built for OS X and blocks, but it has function interfaces as well. I haven't really had time to look at it yet, though, so I'm not sure if it fills all your needs.
There is an academic project called Wool that implements a work-stealing scheduler in C (with significant help from the C preprocessor, AFAIK). It might be worth looking at, though it does not seem actively developed.
My question is whether it is a good idea to mix OpenMP with pthreads. Are there applications out there which combine the two? Is it good practice to mix them, or do typical applications normally just use one or the other?
Typically it's better to just use one or the other. But for myself at least, I do regularly mix the two and it's safe if it's done correctly.
The most common case I do this is where I have a lower-level library that is threaded using pthreads, but I'm calling it in a user application that uses OpenMP.
There are some cases where it isn't safe. For example, if you kill a pthread before you exit all OpenMP regions in that thread.
I don't think so.
It's not a good idea. See, the thing is, OpenMP is basically made for portability. Now if you are using pthreads, you are losing the very essence of it!
pthreads is only supported on POSIX-compliant OSes, while OpenMP can be used on virtually any OS provided it has support for it.
Anyway, OpenMP gives you a much higher abstraction than what is provided by pthreads.
No problem.
The purposes of OpenMP and pthreads are different. OpenMP is perfect for writing loop-level parallelism. However, OpenMP is not adequate to express sophisticated thread communication and synchronization. OpenMP does not support all kinds of synchronization; for example, it has no condition variables.
The caveat would be, as Mystrical pointed out, handling and accessing native threads within OpenMP parallel constructs.
FYI, Intel's TBB and Cilk Plus are also often used in a mixed way.
On Windows and Linux it seems to work just fine. However, OpenMP does not work on a Mac if it is run in a new thread. It only works in the main thread.
It appears that the behavior of mixing the two threading models is not defined. Some platforms/compilers support it; others do not.
Sure, I do it all the time, but you have to be careful. Why do it, though? Because sometimes you have to! In complicated tasking models, such as pipelined functions where you want to keep the pipe going, it may be the only way to take advantage of all the power available.
I find it very hard to believe that you would need pthreads if you already use OpenMP. You could use the sections pragma to run procedures with different functions. I personally have used it to implement pipeline parallelism.
Nowadays OpenMP does much more than pthreads, so if you use OpenMP you are covered. For instance, GCC 5.0 onward implements OpenMP extensions that offload code to the GPU. :D
Purpose
I'm writing a small library for which portability is the biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more. The set of functions provided by the library all operate (read/write) on an internal data structure. I've considered some other design alternatives, but nothing else seems feasible for what the library is trying to achieve.
Question
Are there any portable algorithms, techniques, or incantations which can be used to ensure thread-safety? I am not concerned with making the functions re-entrant. Moreover, I am not concerned with speed or (possibly) wasting resources if the algorithm/technique/incantation is portable. Ideally, I don't want to depend on any libraries (such as GNU Pth) or system-specific operations (like atomic test-and-set).
I have considered modifying Lamport's bakery algorithm, but I do not know how to alter it to work inside of the functions called by the threads instead of working in the threads themselves.
Any help is greatly appreciated.
Without OS/hardware support, at least an atomic CAS, there's nothing you can do that's practical. There are portable libraries that abstract various platforms into a common interface, though.
http://www.gnu.org/software/pth/related.html
Almost all systems (even Windows) can run libpthread these days.
Lamport's bakery algorithm would probably work; unfortunately, there are still practical problems with it. Specifically, many CPUs implement out-of-order memory operations: even if you've compiled your code into a perfectly correct instruction sequence, the CPU, when executing your code, may decide to reorder the instructions on the fly to achieve better performance. The only way to get around this is to use memory barriers, which are highly system- and CPU-specific.
You really only have two choices here: either (1) keep your library thread-unsafe and make your users aware of that in the documentation, or (2) use a platform-specific mutex. Option 2 can be made easier by using another library that implements mutexes for a large variety of platforms and provides you with a unified, abstract interface.
Functions either cannot be thread-safe or are innately thread-safe, depending on how you want to look at it. And threading/locking is innately platform-specific. Really, it is up to the users of your library to handle the threading issues.