Operating system inside - c

I have three questions that are causing me a lot of doubt:
1. If one thread in a program calls fork(), does the new process duplicate all threads, or is the new process single-threaded?
2. If a thread invokes exec(), will the program specified in the parameter to exec() replace the entire process, including ALL the threads?
3. Are system calls preemptible? For example, can a process be preempted in the middle of a system call?

For exec, from man execve:
All threads other than the calling thread are destroyed during an execve().
From man fork:
The child process is created with a single thread — the one that called fork().

W.r.t. #3: Yes. You can invoke a system call that directly or indirectly makes another thread ready to run. If that thread has a higher priority than the current one and the system is designed to schedule it right then, it can do so.

Related

Are threads copied when calling fork?

If I have a program running with threads and call fork() on a unix-based system, are the threads copied? I know that the virtual memory for the current process is copied 1:1 to the new process spawned. I know that threads have their own stack in the virtual memory of a process. Thus, at least the stack of threads should be copied too. However, I do not know if there is anything more to threads that does not reside in virtual memory and is thus NOT copied over. If there is not, do the two processes share the threads or are they independent copies?
No.
Threads are not copied on fork(). The POSIX specification says (emphasis mine):
fork - create a new process
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
To circumvent this problem, there exists a pthread_atfork() function to help.
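As a rough sketch of how pthread_atfork() is typically used: the prepare handler takes a lock just before the fork, and the parent and child handlers release it just after, so the child never inherits the lock in a held state from a thread that no longer exists. The names state_lock, install_fork_handlers and fork_and_use_lock are made up for this illustration. Note that the quoted text still restricts the child to async-signal-safe operations, so treat this as a demonstration of the mechanism rather than a guarantee.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock that another thread might hold at fork time. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(): wait until the lock is free. */
static void before_fork(void)       { pthread_mutex_lock(&state_lock); }
/* Runs in the parent just after fork(): give the lock back. */
static void after_fork_parent(void) { pthread_mutex_unlock(&state_lock); }
/* Runs in the child just after fork(): the child's copy is released too. */
static void after_fork_child(void)  { pthread_mutex_unlock(&state_lock); }

/* Register the handlers once, early in the program. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Forks and lets the child try the lock.
   Returns 0 if the child could take it, 1 if not, -1 on fork/wait failure. */
int fork_and_use_lock(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The child handler already unlocked state_lock in this process. */
        int rc = pthread_mutex_trylock(&state_lock);
        if (rc == 0)
            pthread_mutex_unlock(&state_lock);
        _exit(rc == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Call install_fork_handlers() once at startup, before any thread can call fork(). After that, fork_and_use_lock() should return 0.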
man fork:
The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.
From The Open Group Base Specifications Issue 7, 2018 edition's fork:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not async-signal-safe, the behavior is undefined.
Originally, "fork" was achieved by writing the task to disk and then, rather than reading in a different thread (which would be done if swapping the task with a different one), modifying the task ID of the image still in memory and continuing with its execution (as the new task). This was a very simple modification to the basic task switching mechanism, where only one task would occupy RAM at a time.
Of course, as memory management got more elaborate this scheme was modified to suit the new environment.

What happens when multi thread program creates new processes?

I am a bit confused and would like to know in detail what happens if a C program with more than one thread creates new processes. Does the behaviour depend on which thread creates the new processes, or on how many threads create new processes?
With pthreads, only the calling thread is forked in the new process when fork is called.
From the Linux man page:
The child process is created with a single thread--the one that
called fork(). The entire virtual address space of the parent
is replicated in the child, including the states of mutexes,
condition variables, and other pthreads objects; the use of
pthread_atfork(3) may be helpful for dealing with problems that this
can cause.
There are however some versions of fork on Solaris that duplicate all threads.
From the Solaris man page:
A call to forkall() or forkallx() replicates in the child process
all of the threads (see thr_create(3C) and pthread_create(3C)) in
the parent process. A call to fork1() or forkx() replicates only
the calling thread in the child process.
A call to fork() is identical to a call to fork1(); only the calling
thread is replicated in the child process. This is the POSIX-specified
behavior for fork().
In releases of Solaris prior to Solaris 10, the behavior of fork()
depended on whether or not the application was linked with the
POSIX threads library. When linked with -lthread (Solaris Threads)
but not linked with -lpthread (POSIX Threads), fork() was the same
as forkall(). When linked with -lpthread, whether or not also
linked with -lthread, fork() was the same as fork1().
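On a system where fork() replicates only the calling thread (the POSIX behaviour quoted above), you can observe it with something like the following sketch. A worker thread keeps bumping a counter in the parent. The child inherits a copy of the counter but not the worker, so the value it sees never changes. The names ticks, worker and child_counter_is_frozen are invented for the example, and poll() with no descriptors is used only as a millisecond sleep.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_long ticks;        /* advanced only by the worker thread */
static atomic_int  stop_worker;  /* set by the parent to end the worker */

/* Sleep for roughly ms milliseconds (poll with no descriptors just times out). */
static void sleep_ms(int ms) { poll(NULL, 0, ms); }

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker)) {
        atomic_fetch_add(&ticks, 1);
        sleep_ms(1);
    }
    return NULL;
}

/* Returns 1 if the counter stayed frozen in the child (no worker there),
   0 if it moved, -1 on error. */
int child_counter_is_frozen(void)
{
    pthread_t tid;
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    sleep_ms(20);                /* let the worker run for a bit */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here. */
        long before = atomic_load(&ticks);
        sleep_ms(50);
        long after = atomic_load(&ticks);
        _exit(before == after ? 0 : 1);
    }

    int status = 0, result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0) ? 1 : 0;

    /* Parent: the worker is still alive here and must be stopped and joined. */
    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return result;
}
```

With forkall() semantics the worker would be replicated as well, and the child would be expected to see the counter move.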

Use of pthread_join()

I am wondering, what can happen if we do a pthread_create without a pthread_join?
Who will "clean" all the memory of the "non-joined" thread.
When the process terminates, all resources associated with the process cease to exist. (This of course does not include shared resources the process created, like files in the filesystem, shared memory segments, etc.) Until then, unjoined threads will continue to consume resources, potentially causing future calls to pthread_create or even malloc to fail.
Well, assuming that it's an app-lifetime thread that does not need or try to explicitly terminate, the OS will do it when its process is terminated, (on all non-trivial OS).
If threads are created but never joined, then when the main thread returns from main (which ends the process), all the other threads are stopped, so they may not finish executing their code.
Look at the documentation of pthread_join.
It suspends the calling thread until the target thread has finished executing.
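For illustration, here is a minimal sketch of both options: joining a thread to wait for it and collect its return value, and detaching a thread that nobody will join so that its resources can be reclaimed as soon as it finishes. pthread_detach is not mentioned in the answers above; it is added here as the usual alternative to joining. The function names sum_to_n, run_and_join and run_detached are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: sums 1..n, with n smuggled through the void* argument. */
static void *sum_to_n(void *arg)
{
    intptr_t n = (intptr_t)arg;
    intptr_t total = 0;
    for (intptr_t i = 1; i <= n; i++)
        total += i;
    return (void *)total;
}

/* Creates a thread, blocks in pthread_join until it ends,
   and returns the thread's result (-1 on error). */
long run_and_join(long n)
{
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, sum_to_n, (void *)(intptr_t)n) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (long)(intptr_t)result;
}

/* Creates a thread that will never be joined; detaching it means its
   resources are released when it terminates. Returns 0 on success. */
int run_detached(long n)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, sum_to_n, (void *)(intptr_t)n) != 0)
        return -1;
    return pthread_detach(tid);
}
```

For example, run_and_join(100) returns 5050. A detached thread is still killed if the process exits before it finishes, so detaching only addresses cleanup, not completion.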

Is there a way to spawn a new process without terminating the other threads?

I'm writing a multithreaded program on Linux and want to create a process from a thread without ending the other threads. I looked into fork/exec, but the exec man page in section 3p on Linux states:
A call to any exec function from a process with more than one thread
shall result in all threads being terminated and the new executable
image being loaded and executed. No destructor functions shall be
called.
Is there a way to spawn a new process without terminating the other threads?
But if you fork() first and call exec in the child, the child process has only one thread, and exec replaces only the child's process image. The parent process and all of its threads are unaffected.
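A small sketch of that pattern, under the assumption that /bin/sh exists on the system: a background thread keeps running in the parent while the main thread forks, the child execs a shell command, and the parent waits for it. The background thread is joined afterwards to show that it survived the child's exec. The names background and spawn_while_threaded are made up for the example.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_int child_done;   /* set by the parent once the child has exited */

/* Hypothetical background thread: spins until told to stop, then returns 42. */
static void *background(void *arg)
{
    (void)arg;
    while (!atomic_load(&child_done))
        sched_yield();
    return (void *)(intptr_t)42;
}

/* Forks and execs a shell command while another thread is running.
   Returns the child's exit status (-1 on error) and stores the
   background thread's return value in *worker_result. */
int spawn_while_threaded(long *worker_result)
{
    pthread_t tid;
    atomic_store(&child_done, 0);
    if (pthread_create(&tid, NULL, background, NULL) != 0)
        return -1;

    int exit_code = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: single-threaded copy; exec replaces only this process. */
        execl("/bin/sh", "sh", "-c", "exit 7", (char *)NULL);
        _exit(127);             /* reached only if exec failed */
    }

    int status;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        exit_code = WEXITSTATUS(status);

    /* Parent: the background thread is still there and can be joined. */
    atomic_store(&child_done, 1);
    void *ret = NULL;
    pthread_join(tid, &ret);
    *worker_result = (long)(intptr_t)ret;
    return exit_code;
}
```

If /bin/sh is available, the function returns 7 (the shell's exit status) and *worker_result ends up as 42.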

How to handle a fork error for a multithreaded process?

I am working on a multithreaded process that forks to execute another process. Sometimes the fork may fail if the executable file does not exist. Since this process has multiple threads running prior to the fork, I have a couple of questions:
Are threads copied over to the forked process?
What is the best practice for handling an error from fork in a multithreaded process? For example:
/* in a multithreaded process */
pid = fork();
if(pid == 0)
{
    execlp(filename, filename, NULL);
    fprintf(stderr, "filename doesn't exist");
    /* what do i do here if there's multiple threads running
       from the fork? */
    exit(-1);
}
Well, the fork doesn't fail if the executable file doesn't exist. The exec fails in that case. But, to your actual question, POSIX states that fork creates a new process with a single thread, a copy of the thread that called fork. The specification says:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources.
Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
So what you have is okay, if a little sparse :-)
A single thread will be running in the child and, if you cannot exec another program, log a message and exit.
And, in the rationale section, it explains why it was done that way:
There are two reasons why POSIX programmers call fork(). One reason is to create a new thread of control within the same program (which was originally only possible in POSIX by creating a new process); the other is to create a new process running a different program. In the latter case, the call to fork() is soon followed by a call to one of the exec functions.
The general problem with making fork() work in a multi-threaded world is what to do with all of the threads. There are two alternatives. One is to copy all of the threads into the new process. This causes the programmer or implementation to deal with threads that are suspended on system calls or that might be about to execute system calls that should not be executed in the new process. The other alternative is to copy only the thread that calls fork(). This creates the difficulty that the state of process-local resources is usually held in process memory. If a thread that is not calling fork() holds a resource, that resource is never released in the child process because the thread whose job it is to release the resource does not exist in the child process.
When a programmer is writing a multi-threaded program, the first described use of fork(), creating new threads in the same program, is provided by the pthread_create() function. The fork() function is thus used only to run new programs, and the effects of calling functions that require certain resources between the call to fork() and the call to an exec function are undefined.
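Given the restriction to async-signal-safe operations quoted above, a slightly more defensive variant of the child's error path might look like the sketch below. It is one possible shape, not the only correct one: the message is written with write() and the child leaves with _exit() (both on the POSIX async-signal-safe list) instead of fprintf() and exit(), so no stdio buffers or atexit handlers inherited from the parent are touched. The function name run_program and the exit code 127 (the shell's convention for "command not found") are choices made for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `file` (looked up via PATH) with no extra arguments.
   Returns its exit status, 127 if exec failed in the child,
   or -1 if fork/waitpid failed or the child died abnormally. */
int run_program(const char *file)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork itself failed: there is no child */

    if (pid == 0) {
        execlp(file, file, (char *)NULL);

        /* Reached only if exec failed; only this one thread exists here. */
        static const char msg[] = "exec failed\n";
        ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)n;                /* nothing useful to do if the write fails */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent learns about the exec failure from the exit status, so a missing executable shows up as a return value of 127 rather than as an error from fork().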

Resources