What happens when multi thread program creates new processes? - c

I am a bit confused I would like to know in detail, what happens if a C program with more than one thread creates new processes. Does the behaviour depends on which thread is creating new processes or how many threads create new processes?

With pthreads, only the calling thread is forked in the new process when fork is called.
From the Linux man page:
The child process is created with a single thread--the one that
called fork(). The entire virtual address space of the parent
is replicated in the child, including the states of mutexes,
condition variables, and other pthreads objects; the use of
pthread_atfork(3) may be helpful for dealing with problems that this
can cause.
There are however some versions of fork on Solaris that duplicate all threads.
From the Solaris man page:
A call to forkall() or forkallx() replicates in the child process
all of the threads (see
thr_create(3C) and pthread_create(3C)) in the parent process. A call to fork1() or forkx()
replicates only the calling thread in the child process.
A call to fork() is identical to a call to fork1(); only the calling
thread is replicated in the child process. This is the POSIX-specified
behavior for fork().
In releases of Solaris prior to Solaris 10, the behavior of fork()
depended on whether or not the application was linked with the
POSIX threads library. When linked with -lthread (Solaris Threads)
but not linked with -lpthread (POSIX Threads), fork() was the same
as forkall(). When linked with -lpthread, whether or not also
linked with -lthread, fork() was the same as fork1().

Related

Are threads copied when calling fork?

If I have a program running with threads and call fork() on a unix-based system, are the threads copied? I know that the virtual memory for the current process is copied 1:1 to the new process spawned. I know that threads have their own stack in the virtual memory of a process. Thus, at least the stack of threads should be copied too. However, I do not know if there is anything more to threads that does not reside in virtual memory and is thus NOT copied over. If there is not, do the two processes share the threads or are they independent copies?
No.
Threads are not copied on fork(). POSIX specification says (emphasize is mine):
fork - create a new process
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
To circumvent this problem, there exists a pthread_atfork() function to help.
man fork:
The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.
From The Open Group Base Specifications Issue 7, 2018 edition's fork:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not async-signal-safe, the behavior is undefined.
Originally, "fork" was achieved by writing the task to disk and then, rather than reading in a different thread (which would be done if swapping the task with a different one), modifying the task ID of the image still in memory and continuing with its execution (as the new task). This was a very simple modification to the basic task switching mechanism, where only one task would occupy RAM memory at a time.
Of course, as memory management got more elaborate this scheme was modified to suit the new environment.

vfork VS fork in MPI mutiple thread

My program is very very large. So, I can't list it here. My program uses openMPI & mutiple_thread.
The problem has been solved. (using vfork() instead of fork()) But I don't know why it works. So, could anyone give me an explaination about it?
The problem is caused by free().
There are some segments of code in my program. All these segments are in threads which is created by pthread_create. The logic of these segments are like:
{
*p = malloc();
fun(p);
free(p);
}
All errors are at free(). It report a segment fault error. I ran the program more than 100 times. I found that there is always a fork() being called before each corruption at free.
The logic of fork segment is like(in thread):
{
MPI_program_code...
if(!fork())
{
execv(exe_file,arg);
}
MPI_program_code...
}
(Note that, in exe_file no MPI_function is used.)
When I use vfork() instead of fork(), there is no problem at all. But I don't know why it works.
So, could anyone explain why it works?
You might find the Open MPI FAQ topic on forking child processes very useful. Also an explanation on why using fork() with InfiniBand is dangerous can be found here.
vfork(2) differs from fork(2) in that it is specifically designed to be as lightweight as possible and is only meant to be used together with an immediately following execve(2) (or any of its wrappers from the C library) or _exit(2) call. The reason for that is that vfork(2) creates a child process that shares all memory with the parent instead of having it copy-on-write mapped, i.e. the new child is more like a thread than like a full-blown process. Since the child also uses the stack of the original thread, the parent is blocked until the child has either execve'd another executable or exited.
Open MPI registers a fork() handler using pthread_atfork(). The handler is not called when vfork() is used on modern Linux systems, therefore no actions are taken by the parent process upon forking.

Relation between Thread ID and Process ID

I am having some confusion between Process Id and Thread Id. I have gone through several web-post including stack overflow here, Which says
starting a new process gives you a new PID and a new TGID, while starting a new thread gives you a new PID while maintaining the same TGID.
So when I run a program why all the threads created from the program don't have different PID?
I know in programming we usually say that the main is a thread and execution starts from main , So if I create multiple thread from main, all the threads will have the same PID which is equal to the main's PID.
So what I wanted to ask is as below:
1) When we run a program it will run as a process or a thread?
2) Is there any difference between main thread creating threads and Process creating threads?
3) Is there any difference between thread and process in linux? Since I read somewhere that linux doesn't differentiate between Thread and Process.
Simplifying a bit:
The PID is the process ID, TID is the thread ID. The thing is that for the first thread created by fork(), PID=TID. If you create more threads within the process, with a clone() command, then PID and TID will be different, PID will always be smaller than TID.
No, there is no difference, except maybe that if main is killed, all other threads are also killed.
Yes, the thread is what actually gets scheduled. Technically, the process is only a memory mapping of the different segments of code (text, bss, stack, heap and the OS).
This confusion comes from the Linux concept of tasks.
In Linux there is little difference between a task and a thread though.
Every process is a self contained VM running at least one task.
Each task is an independent execution unit within a process scope.
The main task of a process gives it's task id (TID) to the process as it's process id (PID).
Every new thread that you spawn within a process creates a new task within it. In order to identify then individually in the kernel they get assigned their own individual task id (TID).
All tasks within a process share the same task group id (TGID).
I got the answer here on stackoverflow. It states that if we run a program on Linux that contains the libc libuClibc-0.9.30.1.so (1). Basically an older version of libc then thread created will have different PID as shown below
root#OpenWrt:~# ./test
main thread pid is 1151
child thread pid is 1153
and I tried to run this program with a linux that contains the libc from ubuntu libc6 (2) i.e newer version of libc then Thread created will have the same PID as the process.
$ ./test
main thread pid is 2609
child thread pid is 2609
The libc (1) use linuxthreads implementation of pthread
And the libc (2) use NPTL ("Native posix thread library") implementation of pthread
According to the linuxthreads FAQ (in J.3 answer):
each thread is really a distinct process with a distinct PID, and signals sent to the PID of a thread can only be handled by that thread
So in the old libc which use linuxthreads implementation, each thread has its distinct PID
In the new libc version which use NPTL implementation, all threads has the same PID of the main process.
The NPTL was developed by redhat team. and according to the redhat NPTL document: One of the problems which are solved in the NPTL implementation is:
(Chapter: Problems with the Existing Implementation, page5)
Each thread having a different process ID causes compatibility problems with other POSIX thread implementations. This is in part a moot point since signals can't be used very well but is still noticeable
And that explain this issue.
I am using the new libc version that contains the NPTL ("Native posix thread library") implementation of pthread.
The post you have shown describes of Linux threading implementation which I suppose is the older version of Linux implementation where threads were created as a different process.
In the POSIX implementation of threads, the threads are not created as a different process rather they create different streams of parallel execution of the code which have some components differ in the those parallel execution, the information of which is stored by Thread Descriptor storing the TID.
Whereas the process creating multiple thread can be referred as a multi-threaded process, thus has a same PID of all its thread but different TID's. The main process creating thread can be referred as Main thread
You will get same Process ID as all threads are sharing your program data which is your process so when you call for Process ID you get the same.

Operating system inside

I have three questions which are causing me a lot of doubts:
If one thread in a program calls fork(), does the new process
duplicate all threads, or is the new process single-threaded?
If a thread invokes exec(), will the program specified in the parameter
to exec() replace the entire process including ALL the threads?
Are system calls preemptive? For example whether a process can be scheduled in middle of a system call?
For exec, from man execve:
All threads other than the calling thread are destroyed during an execve().
From man fork:
The child process is created with a single thread — the one that called fork().
W.r.t. #3: Yes, you can invoke a system call that directly or indirectly makes another thread ready to run. And if that thread has a greater priority than the current and the system is designed to schedule it right then, it can do so.

How to handle a fork error for a multithreaded process?

I am working on a multithreaded process that forks to execute another process. Sometimes, the fork may error if the execution file does not exist. Since this process has multiple threads running prior to fork I have a couple questions:
Are threads copied over to the forked process.
What is the best practice to handling an error from fork with a multithreaded process. For example:
/* in a multithreaded process */
pid = fork();
if(pid == 0)
{
execlp(filename, filename, NULL);
fprintf(stderr, "filename doesn't exist");
/* what do i do here if there's multiple threads running
from the fork? */
exit(-1);
}
Well, the fork doesn't error if the executable file doesn't exist. The exec errors in that case. But, to your actual question, POSIX states that fork creates a new process with a single thread, a copy of the thread that called fork. See here for details:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources.
Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
So what you have is okay, if a little sparse :-)
A single thread will be running in the child and, if you cannot exec another program, log a message and exit.
And, in the rationale section, it explains why it was done that way:
There are two reasons why POSIX programmers call fork(). One reason is to create a new thread of control within the same program (which was originally only possible in POSIX by creating a new process); the other is to create a new process running a different program. In the latter case, the call to fork() is soon followed by a call to one of the exec functions.
The general problem with making fork() work in a multi-threaded world is what to do with all of the threads. There are two alternatives. One is to copy all of the threads into the new process. This causes the programmer or implementation to deal with threads that are suspended on system calls or that might be about to execute system calls that should not be executed in the new process. The other alternative is to copy only the thread that calls fork(). This creates the difficulty that the state of process-local resources is usually held in process memory. If a thread that is not calling fork() holds a resource, that resource is never released in the child process because the thread whose job it is to release the resource does not exist in the child process.
When a programmer is writing a multi-threaded program, the first described use of fork(), creating new threads in the same program, is provided by the pthread_create() function. The fork() function is thus used only to run new programs, and the effects of calling functions that require certain resources between the call to fork() and the call to an exec function are undefined.

Resources