OpenMP, multithreading or multiprocessing (C)? - c

I'm having some trouble understanding how OpenMP works. I know that it executes tasks in parallel and that it's a multi-processing tool, but what does that mean?
It uses 'threads', but at the same time it's a multi-processing tool? Aren't the two mutually exclusive, in that you use one method but not the other? Can you help explain which one it is?
To clarify, I've only worked with multi-threading with POSIX pthreads. And that's totally different from multiprocessing with fork and exec and shared memory.
Thank you.

OpenMP was developed as an abstraction layer for parallel architectures using multi-threading and shared memory, so you don't have to write commonly used parallel code from scratch. Note that, in general, the threads all have access to shared memory (the memory allocated by the master thread). It takes advantage of multiple processors, but it uses threads to do so.
MPI is its counterpart for distributed systems. That may be closer to the traditional "multi-processing" you are thinking of, since all the "ranks" operate independently of each other without shared memory and must communicate through concepts such as scatter/map/reduce.

OpenMP is used for multithreading. I go fairly in depth on how to use OpenMP and its pitfalls here:
http://austingwalters.com/the-cache-and-multithreading/
It works very similarly to POSIX pthreads, except with less fuss. It was designed to be added to existing code, which is then recompiled with an appropriate compiler (g++ supports it; clang/llvm did not at the time of writing). As the link above explains, a thread enables multiprocessing in the sense that it can be executed on any of the available processors.
Meaning that even on a single core, threads can still speed things up, since your processor shares its time among all running programs. If you have multiple processors and multiple threads, the threads can run on different processors simultaneously and therefore execute even faster.
Furthermore, OpenMP allows both shared and private memory, depending on the implementation, and I believe you can mix OpenMP with POSIX threads as well, though you will not gain any advantage if the pthreads were already being used correctly.
Below is a link to an excellent guide to OpenMP:
http://bisqwit.iki.fi/story/howto/openmp/

Related

Is there any benefit to using a barrier over a semaphore?

Preface:
I'm maintaining some code in a library that currently uses a cross-platform implementation of semaphores to sync a few threads one time at the beginning of the program. The semaphore implementation is a thin wrapper around the pthread library on Linux and around winbase's semaphore calls on Windows. This platform-agnostic code only needs to operate on those two systems.
My conundrum:
I would like to switch to a barrier implementation, because that's all the semaphores are being used for in this library anyway. However, in order to add this functionality, I would have to add similar platform-agnostic code for a barrier. Since Windows's synchronization barrier API is quite different from its other thread-related primitives (mutexes and semaphores), it would be a fair bit of work to translate the Windows sync code into the platform-agnostic version. I would like to make the change to barriers, but if there is no benefit to using barriers then I see no reason to go through the hassle of making a new implementation for a library that already works with semaphores.
Question:
Is there any performance benefit (or other benefit) that using a synchronization barrier would give over using a plain old semaphore implementation?

How does N<->1 threading model work?

In continuation of the question linked above, this is an additional query on the N:1 threading model.
It is taught that, before designing an application, the choice of threading model needs to be made.
In the N:1 threading model, a single kernel thread works on behalf of each user process, and the OS scheduler gives CPU time slices to that single kernel thread.
In user space, the programmer uses either POSIX pthreads or Windows CreateThread() to spawn multiple threads within a user process. Because the programmer used POSIX pthreads or Windows CreateThread(), the kernel is aware of the user-land threads, and each thread is considered for processor time by the scheduler. So that means every user thread gets a kernel thread.
My question:
So how can an N:1 threading model exist at all? Wouldn't it always be a 1:1 threading model? Please clarify.
In user space, the programmer uses either POSIX pthreads or Windows CreateThread() to spawn multiple threads within a user process. Because the programmer used POSIX pthreads or Windows CreateThread(), the kernel is aware of the user-land threads, and each thread is considered for processor time by the scheduler. So that means every user thread gets a kernel thread.
That's how 1-to-1 threading works.
This doesn't have to be the case. A platform can implement pthread_create, CreateThread, or whatever other "create a thread" function it offers that does whatever it wants.
My question:
So how can an N:1 threading model exist at all? Wouldn't it always be a 1:1 threading model?
Please clarify.
Precisely as you explained at the beginning of your question: when the programmer creates a thread, instead of creating a thread the kernel is aware of, the runtime creates a thread that only the userland scheduler is aware of, still using a single kernel thread for the entire process.
Short answer: there is more than Windows and Linux.
Slightly longer answer (EDITED):
Many programming languages and frameworks expose multithreading to the programmer. At the same time, they aim to be portable, i.e., it is not known whether any given target platform supports threads at all. Here, the best approach is to implement N:1 threading, either in general or at least for the backends without threading support.
The classic example is Java: the language supports multithreading, while JVMs exist even for very simple embedded platforms that do not support threads. However, there are JVMs (actually, most of them) that use kernel threads (e.g., AFAIK, the JVM by Sun/Oracle).
Another reason a language/platform may not want to hand threading control over to the operating system entirely is special implementation features such as reactor models or global language locks. Here, the objective is to exploit knowledge about special execution patterns in the user runtime system (which does the local scheduling) that the OS scheduler has no access to.
Does [1:1 threading] add more space occupancy on the user process's virtual address space because of these kernel threads?
Well, in theory, execution flows (processes, threads, etc.) and address spaces are independent concepts. One can find all kinds of mappings between processes (used here as a general term) and memory spaces: 1:1, n:1, 1:n, n:n. However, the classic approach to threading is that the several threads of a process share the memory space of the task (which is the owner of the memory space). Thus, there is usually no difference between user threads and kernel threads regarding the memory space. (One exception is, e.g., the Erlang VM: there, user threads exist with isolated memory spaces.)

Shared memory access control mechanism for processes created by MPI

I have a shared memory used by multiple processes, these processes are created using MPI.
Now I need a mechanism to control the access of this shared memory.
I know that named semaphores and flock can be used to do this, but I just wanted to know if MPI provides any special locking mechanism for shared memory usage?
I am working on C under Linux.
MPI actually does provide support for shared memory now (as of version 3.0). You might try looking at the One-sided communication chapter (http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf) starting with MPI_WIN_ALLOCATE_SHARED (11.2.3). To use this, you'll have to make sure you have an implementation that supports it. I know that the most recent versions of both MPICH and Open MPI work.
No, MPI doesn't provide any support for shared memory. In fact, MPI would not want to support shared memory. The reason is that a program written with MPI is supposed to scale to a large number of processors, and a large number of processors never share memory.
However, it may happen, and often does, that small groups of processors (within that large set) do have shared memory. To utilize that shared memory, OpenMP is used.
OpenMP is very simple. I strongly suggest you learn it.

OpenMp and Shared Memory definition

According to the OpenMP web site, OpenMP is "the de-facto standard for parallel programming on shared memory systems". According to Wikipedia, "Using memory for communication inside a single program, for example among its multiple threads, is generally not referred to as shared memory."
What is wrong here? Is it the term "generally"?
Isn't OpenMP really just creating threads that "share memory" through a single, common virtual address space?
Moreover, I guess OpenMP is able to run on NUMA architectures, where all the memory can be addressed by all the processors, but with increased memory access time when threads sharing data are assigned to cores attached to different memories. Is this true?
I'm writing a full-fledged answer here to try to address further questions asked as comments on lucas1024's answer.
On the meaning of "shared memory"
On the one hand, you have the software-oriented (i.e. OS-oriented) meaning of shared memory: a way to enable different processes to access the same chunk of memory (i.e. to relax the usual OS constraint that a given process should not be able to tamper with other processes' memory). As stated in the Wikipedia page, the POSIX shared memory API is one implementation of such a facility. In this sense, it does not make much sense to speak of threads (an OS might well provide shared memory without even providing threads).
On the other hand, you have the hardware-oriented meaning of "shared memory": a hardware configuration where all CPUs have access to the same piece of RAM.
On the meaning of "thread"
Now we have to disambiguate another term: "thread". An OS might provide a way to have multiple concurrent execution flows within a process. POSIX threads are an implementation of such a feature.
However, the OpenMP specification has its own definitions:
thread: An execution entity with a stack and associated static memory, called
threadprivate memory.
OpenMP thread: A thread that is managed by the OpenMP runtime system.
Such definitions fit nicely with the definition of, e.g., POSIX threads, and most OpenMP implementations do indeed use POSIX threads to create OpenMP threads. But you might imagine OpenMP implementations on top of OSes which do not provide POSIX threads or equivalent features. Such OpenMP implementations would have to manage execution flows internally, which is difficult but entirely doable. Alternatively, they might map OpenMP threads onto OS processes and use some kind of "shared memory" feature (in the OS sense) to let them share memory (though I don't know of any OpenMP implementation that does this).
In the end, the only constraint you have for an OpenMP implementation is that all CPUs should have a way to share access to the same central memory. That is to say OpenMP programs should run on "shared memory" systems in the hardware sense. However, OpenMP threads do not necessarily have to be POSIX threads of the same OS process.
A "shared memory system" is simply a system where multiple cores or CPUs are accessing a single pool of memory through a local bus. So the OpenMP site is correct.
Communicating between threads in a program is not done using "shared memory" - instead the term typically refers to communication between processes on the same machine through memory. So the Wikipedia entry is not in contradiction and it, in fact, points out the difference in terminology between hardware and software.

Threading in C, cross platform

I am dealing with an existing project (in C) that is currently running on a single thread, and we would like to run on multiple platforms AND have multiple threads. Hopefully, there is a library for this, because, IMHO, the Win32 API is like poking yourself in the eye repeatedly. I know about Boost.Thread for C++, but, this must be C (and compilable on MinGW and gcc). Cygwin is not an option, sorry.
Try OpenMP API, it's multi-platform and you can compile it with GCC.
Brief description from the wikipedia:
OpenMP (Open Multi-Processing) is an application programming interface
(API) that supports multi-platform shared memory multiprocessing
programming in C, C++, and Fortran,[3] on most platforms, processor
architectures and operating systems, including Solaris, AIX, HP-UX,
Linux, macOS, and Windows. It consists of a set of compiler
directives, library routines, and environment variables that influence
run-time behavior.
I would use the POSIX thread API - pthread. This article has some hints for implementing it on Windows, and a header-file-only download (BSD license):
http://locklessinc.com/articles/pthreads_on_windows/
Edit: I used the sourceforge pthreads-win32 project in the past for multi-platform threading and it worked really nicely. Things have moved on since then and the above link seems more up-to-date, though I haven't tried it. This answer assumes of course that pthreads are available on your non-Windows targets (for Mac / Linux I should think they are, probably even embedded)
Windows threading has sufficiently different functionality compared to that of Linux that you should perhaps consider two different implementations, at least if application performance could be an issue. On the other hand, simply adding multi-threading may well make your app slower than it was before. Let's assume that performance is an issue and that multi-threading is the best option.
With Windows threads I'm specifically thinking of I/O Completion Ports (IOCPs) which allow implementing I/O-event driven threads that make the most efficient use of the hardware.
Many "classic" applications are constructed along a one thread/one socket (/one user, or similar) concept, where the number of simultaneous sessions is limited by the scheduler's ability to handle large numbers of threads (>1000). The IOCP concept allows limiting the number of threads to the number of cores in your system, which means the scheduler has very little to do. Threads only execute when the IOCP releases them after an I/O event has occurred. The thread services the completion, (typically) initiates a new I/O, and returns to wait at the IOCP for the next completion. Before releasing a thread, the IOCP also provides the context of the completion, so the thread "knows" which processing context the completion belongs to.
The IOCP concept completely does away with polling, which is a great waster of resources, although "wait on multiple objects" polling is somewhat of an improvement. The last time I looked, Linux had nothing remotely like IOCPs, so a multi-threaded Linux application would be constructed quite differently from a Windows app built around IOCPs.
In really efficient IOCP apps there is a risk that so many IOs (or rather Outputs) are queued to the IO resource involved that the system runs out of non-paged memory to store them. Conversely, in really inefficient IOCP apps there is a risk that so many Inputs are queued (waiting to be serviced) that the non-paged memory is exhausted when trying to temporarily buffer them.
If someone needs a portable and lightweight solution for threading in C, take a look at the plibsys library. It provides thread management and synchronization, as well as other useful features like a portable socket implementation. All major operating systems (Windows, Linux, OS X) are supported, and various less popular operating systems are also supported (e.g., AIX, HP-UX, Solaris, QNX, IRIX, etc.). On every platform only native calls are used, to minimize overhead. The library is fully covered by unit tests which are run on a regular basis.
glib threads can be compiled cross-platforms.
The "best"/"simplest"/... answer here is definitely pthreads. It's the native threading architecture on Unix/POSIX systems and works almost as well on Windows. No need to look any further.
Given that you are constrained with C. I have two suggestions:
1) I have seen a project (similar to yours) that had to run on Windows and Linux with threads. The same codebase used pthreads on Linux and Win32 threads on Windows, selected with a conditional #ifdef wherever threads needed to be created, such as
#ifdef WIN32
//use win32 threads
#else
//use pthreads
#endif
2) The second suggestion might be to use OpenMP. Have you considered OpenMP at all?
Please let me know if I missed something or if you want more details. I am happy to help.
Best,
Krishna
From my experience, multithreading in C on Windows is heavily tied to the Win32 APIs. Other languages like C# and Java, supported by a framework, also tie into these core libraries while offering their own thread classes.
However, I did find an openthreads API platform on sourceforge which might help you:
http://openthreads.sourceforge.net/
The API is modeled on the Java and POSIX thread standards.
I have not tried this myself, as I currently do not have a need to support multiple platforms in my C/C++ projects.
