Let me explain: I have been developing an application on Linux which forks and execs an external binary and waits for it to finish. Results are communicated through shm files that are unique to each fork + process. The entire code is encapsulated within a class.
Now I am considering threading the process in order to speed things up: many different instances of the class function would fork and execute the binary concurrently (with different parameters) and communicate results through their own unique shm files.
Is this thread safe? If I fork within a thread, even assuming it is safe, is there something I have to watch out for? Any advice or help is much appreciated!
The problem is that fork() only copies the calling thread, and any mutexes held by other threads will be forever locked in the forked child. The pthreads solution is the pthread_atfork() handlers: you can register three handlers, one pre-fork handler, one parent handler, and one child handler. When fork() happens, the pre-fork handler is called before the fork and is expected to obtain all application mutexes; the parent and child handlers must then release all those mutexes in the parent and child processes respectively.
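A minimal sketch of how an application might register such handlers; the single app_lock mutex here stands in for "all application mutexes" and is purely illustrative:

#include <pthread.h>

static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;

/* prefork: take every application mutex so no other thread holds one
   across the fork() */
static void prepare(void) { pthread_mutex_lock(&app_lock); }

/* parent: release the locks taken in prepare() */
static void parent(void)  { pthread_mutex_unlock(&app_lock); }

/* child: release (or re-initialize) the locks in the new process */
static void child(void)   { pthread_mutex_unlock(&app_lock); }

static void register_fork_handlers(void)
{
    /* The handlers stay registered for the lifetime of the process. */
    pthread_atfork(prepare, parent, child);
}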
This isn't the end of the story though! Libraries call pthread_atfork to register handlers for library-specific mutexes; for example, libc does this. This is a good thing: the application can't possibly know about the mutexes held by 3rd-party libraries, so each library must call pthread_atfork to ensure its own mutexes are cleaned up in the event of a fork().
The problem is that the order in which pthread_atfork handlers are called for unrelated libraries is undefined (it depends on the order in which the libraries are loaded by the program). This means that, technically, a deadlock can happen inside a pre-fork handler because of a race condition.
For example, consider this sequence:
Thread T1 calls fork()
libc prefork handlers are called in T1 (i.e. T1 now holds all libc locks)
Next, in Thread T2, a 3rd party library A acquires its own mutex AM, and then makes a libc call which requires a mutex. This blocks, because libc mutexes are held by T1.
Thread T1 runs prefork handler for library A, which blocks waiting to obtain AM, which is held by T2.
There's your deadlock, and it's unrelated to your own mutexes or code.
This actually happened on a project I once worked on. The advice I had found at that time was to choose fork or threads but not both. But for some applications that's probably not practical.
It's safe to fork in a multithreaded program as long as you are very careful about the code between fork and exec. In that span you can make only async-signal-safe calls. In theory, you are not allowed to malloc or free there, although in practice the default Linux allocator is safe, and Linux libraries have come to rely on that. The end result is that if you depend on this, you must use the default allocator.
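A sketch of that pattern, keeping the window between fork() and exec() down to async-signal-safe calls only; run_child and its error handling are illustrative, not a drop-in implementation:

#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec the given argv, and wait for the child.  Only
   async-signal-safe calls (execv, _exit) sit between fork() and exec(). */
static int run_child(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execv(argv[0], argv);   /* replaces the child's image on success */
        _exit(127);             /* exec failed; never call exit() here */
    }

    int status;
    do {
        if (waitpid(pid, &status, 0) >= 0)
            return status;
    } while (errno == EINTR);   /* retry if interrupted by a signal */
    return -1;
}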
Back at the Dawn of Time, we called threads "lightweight processes" because while they act a lot like processes, they're not identical. The biggest distinction is that threads by definition live in the same address space of one process. This has advantages: switching from thread to thread is fast, they inherently share memory so inter-thread communications are fast, and creating and disposing of threads is fast.
The distinction here is with "heavyweight processes", which are complete address spaces. A new heavyweight process is created by fork(2). As virtual memory came into the UNIX world, that was augmented with vfork(2) and some others.
A fork(2) copies the entire address space of the process, including all the registers, and puts that process under the control of the operating system scheduler; the next time the scheduler comes around, the instruction counter picks up at the next instruction -- the forked child process is a clone of the parent. (If you want to run another program, say because you're writing a shell, you follow the fork with an exec(2) call, which loads that new address space with a new program, replacing the one that was cloned.)
Basically, your answer is buried in that explanation: when you have a process with many LWPs (threads) and you fork(2) it, the parent keeps running all of its threads, while the child comes up containing only the single thread that called fork().
This trick is even useful: in many programs, you have a parent process that may have many threads, some of which fork new child processes. (For example, an HTTP server might do that: each connection to port 80 is handled by a thread, and a child process could then be forked for something like a CGI program; exec(2) would then be called to run the CGI program in place of the forked child.)
While you can use Linux's NPTL pthreads(7) support for your program, threads are an awkward fit on Unix systems, as you've discovered with your fork(2) question.
Since fork(2) is a very cheap operation on modern systems, you might do better to just fork(2) your process when you have more handling to perform. It depends on how much data you intend to move back and forth: the share-nothing philosophy of forked processes is good for reducing shared-data bugs, but it does mean you either need to create pipes to move data between processes or use shared memory (shmget(2) or shm_open(3)).
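For modest amounts of data, a pipe is often the simplest channel between the parent and a forked child. A rough sketch (the message and buffer sizes are arbitrary):

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {                 /* child: write a result and exit */
        close(fds[0]);
        const char *msg = "result from child\n";
        write(fds[1], msg, strlen(msg));
        _exit(0);
    }

    close(fds[1]);                  /* parent: read whatever the child sends */
    char buf[128];
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("parent got: %s", buf);
    }
    waitpid(pid, NULL, 0);
    return 0;
}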
But if you choose to use threading, you can fork(2) a new process, with the following hints from the fork(2) manpage:
* The child process is created with a single thread — the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.
Provided you quickly call either exec() or _exit() in the forked child process, you're OK in practice.
You might want to use posix_spawn() instead which will probably do the Right Thing.
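A minimal posix_spawn() sketch; the program path and arguments are placeholders:

#include <spawn.h>
#include <stdio.h>
#include <sys/wait.h>

extern char **environ;

int main(void)
{
    pid_t pid;
    char *argv[] = { "/bin/echo", "hello from child", NULL };

    /* posix_spawn does the fork+exec dance internally, which avoids the
       async-signal-safety pitfalls of doing it by hand in a threaded program */
    int err = posix_spawn(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn failed: %d\n", err);
        return 1;
    }

    int status;
    waitpid(pid, &status, 0);
    return 0;
}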
My experience of fork()'ing within threads is really bad. The software generally fails pretty quickly.
I've found several solutions to the matter. Although you may not like them much, I think these are generally the best ways to avoid close-to-undebuggable errors.
Fork first
Assuming you know the number of external processes you need at the start, you can create them upfront and just have them sit there waiting for an event (i.e. read from a blocking pipe, wait on a semaphore, etc.)
Once you have forked enough children, you are free to use threads and communicate with those forked processes via your pipes, semaphores, etc. From the moment you create the first thread, you cannot call fork() anymore. Keep in mind that if you're using 3rd-party libraries which may create threads, those have to be used/initialized after the fork() calls have happened.
Note that you can then start using threads within the main and fork()'ed processes.
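A rough sketch of the fork-first idea: the workers are forked before any thread exists, and each blocks on a read() from its own command pipe; the single-byte command protocol is just an illustration:

#include <sys/types.h>
#include <unistd.h>

#define NUM_WORKERS 4

/* Pre-fork NUM_WORKERS children before any thread is created.  Each
   child blocks on read() from its command pipe; later, threads in the
   parent can write commands without ever calling fork() again. */
static int spawn_workers(int cmd_fds[NUM_WORKERS])
{
    for (int i = 0; i < NUM_WORKERS; i++) {
        int p[2];
        if (pipe(p) < 0)
            return -1;

        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                  /* worker process */
            close(p[1]);
            char cmd;
            while (read(p[0], &cmd, 1) == 1) {
                /* ... run the external job selected by 'cmd' here ... */
            }
            _exit(0);
        }
        close(p[0]);
        cmd_fds[i] = p[1];               /* parent keeps the write end */
    }
    return 0;
}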
Know your state
In some circumstances, it may be possible for you to stop all of your threads to start a process and then restart your threads. This is somewhat similar to point (1) in the sense that you do not want threads running at the time you call fork(), although it requires a way for you to know about all the threads currently running in your software (something not always possible with 3rd party libraries).
Remember that "stopping a thread" using a wait is not going to work. You have to join with the thread so it is fully exited, because a wait require a mutex and those need to be unlocked when you call fork(). You just cannot know when the wait is going to unlock/re-lock the mutex and that's usually where you get stuck.
Choose one or the other
The other obvious possibility is to choose one or the other and not bother worrying about them interfering with each other. This is by far the simplest method, if it is at all possible in your software.
Create Threads only when Necessary
In some software, one creates one or more threads in a function, uses said threads, then joins all of them when exiting the function. This is somewhat equivalent to point (2) above, only you (micro-)manage threads as required instead of creating threads that sit around and get used when necessary. This will work too; just keep in mind that creating a thread is a costly call: it has to allocate a new task with a stack and its own set of registers... it is a complex operation. However, this makes it easy to know when you have threads running, and outside of those functions you are free to call fork().
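A small sketch of that per-function pattern; the worker count and the do_work body are placeholders:

#include <pthread.h>

static void *do_work(void *arg)
{
    /* ... the actual computation goes here ... */
    (void)arg;
    return NULL;
}

/* Threads live only inside this function: they are created, used and
   joined before it returns, so the rest of the program may fork() freely. */
static void parallel_section(void)
{
    pthread_t workers[4];

    for (int i = 0; i < 4; i++)
        pthread_create(&workers[i], NULL, do_work, NULL);

    for (int i = 0; i < 4; i++)
        pthread_join(workers[i], NULL);
}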
In my own programming I have used all of these solutions. I used point (2) because I was using the threaded version of log4cplus and I needed fork() for some parts of my software.
As mentioned by others, if you are using fork() to then call execve(), the idea is to do as little as possible between the two calls. That is likely to work 99.999% of the time (many people use system() or popen() with fairly good success too, and those do similar things). The fact is that if you do not hit any of the mutexes held by the other threads, then this will work without issue.
On the other hand, if, like me, you want to do a fork() and never call execve(), then it's not likely to work right while any thread is running.
What is actually happening?
The issue is that fork() creates a separate copy of only the current task (a process is called a task inside the Linux kernel).
Each time you create a new thread (pthread_create()), you also create a new task, but within the same process (i.e. the new task shares the process space: memory, file descriptors, ownership, etc.). However, a fork() ignores those extra tasks when duplicating the currently running task.
+-----------------------------------------------+
| Process A |
| |
| +----------+ +----------+ +----------+ |
| | thread 1 | | thread 2 | | thread 3 | |
| +----------+ +----+-----+ +----------+ |
| | |
+----------------------|------------------------+
| fork()
|
+----------------------|------------------------+
| v Process B |
| +----------+ |
| | thread 1 | |
| +----------+ |
| |
+-----------------------------------------------+
So in Process B, we lose thread 1 and thread 3 from Process A. This means that if either or both held a lock on a mutex or something similar, then Process B is going to lock up quickly. The locks are the worst, but any resources that either thread still held at the time the fork() happened are also lost (socket connections, memory allocations, device handles, etc.). This is where point (2) above comes in: you need to know your state before the fork(). If you have a very small number of threads, or worker threads defined in one place that can all be stopped easily, then it will be easy enough.
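One quick way to observe this on Linux is to read the Threads: field of /proc/self/status before and after the fork(). This little demo is only illustrative and, for brevity, ignores the async-signal-safety caveats discussed elsewhere on this page:

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static void print_thread_count(const char *who)
{
    char line[128];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return;
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "Threads:", 8) == 0)
            printf("%s %s", who, line);   /* e.g. "parent: Threads: 3" */
    fclose(f);
}

static void *idle(void *arg) { pause(); return arg; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, idle, NULL);
    pthread_create(&t2, NULL, idle, NULL);

    print_thread_count("parent:");        /* expect 3 threads */

    if (fork() == 0) {
        /* demo only: fopen/printf here ignore the async-signal-safety rules */
        print_thread_count("child: ");    /* expect 1 thread */
        _exit(0);
    }
    wait(NULL);
    return 0;
}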
If you are using the Unix fork() system call, then you are not technically using threads; you are using processes. They will have their own memory space and therefore cannot interfere with each other.
As long as each process uses different files, there should not be any issue.
If I have a program running with threads and call fork() on a Unix-based system, are the threads copied? I know that the virtual memory of the current process is copied 1:1 to the new process. I know that threads have their own stacks in the virtual memory of a process, so at least the threads' stacks should be copied too. However, I do not know whether there is anything more to threads that does not reside in virtual memory and is thus NOT copied over. If there is not, do the two processes share the threads, or are they independent copies?
No.
Threads are not copied on fork(). The POSIX specification says (emphasis mine):
fork - create a new process
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
To circumvent this problem, there exists a pthread_atfork() function to help.
man fork:
The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.
From The Open Group Base Specifications Issue 7, 2018 edition's fork:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not async-signal-safe, the behavior is undefined.
Originally, "fork" was achieved by writing the task to disk and then, rather than reading in a different thread (which would be done if swapping the task with a different one), modifying the task ID of the image still in memory and continuing with its execution (as the new task). This was a very simple modification to the basic task switching mechanism, where only one task would occupy RAM memory at a time.
Of course, as memory management got more elaborate this scheme was modified to suit the new environment.
I'm going to write a program in which the main thread creates a new thread, and then that new thread creates a child process. Since I have a hard time keeping track of the new thread and the forked process, I'd like to get a wise answer from someone.
My question is
1. Does the process created in a thread start executing the code right after pthread_create?
2. If not, where does the forked process start from when fork() is called in a thread?
Thank you for reading my question.
Some of this is a bit OS-dependent, as different systems have different POSIX thread implementations and this can expose internals.
POSIX offers pthread_atfork as a somewhat blunt instrument for dealing with some of the issues, but it still looks pretty messy to me.
If your system uses a one-to-one map between "user land thread" and "kernel thread" using clone or rfork to achieve proper user-space sharing of data between threads, then fork will merely duplicate the (single) thread that calls it. However, if your system has a many-to-many style mapping (so that one user process is handling multiple threads, at least before they enter into blocking syscalls), fork may internally duplicate multiple threads. POSIX says it should look like it only duplicated one thread, so that's not supposed to be visible, but I'm not sure how well all systems implement this.
There's some general advice at http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them (Linux-centric, obviously, but still useful).
Is there some particular reason you want to fork inside a thread but not exec? In general, if you just want to run more code in parallel, you just spin off yet another thread (i.e., once you choose to run any threads, you do everything in threads, except if you have to fork for exec; if the exec fails, just _exit).
I am trying to work out a way to synchronize two processes which share data.
Basically I have two processes linked using shared memory. I need process A to set some data in the shared memory area, then process B to read that data and act on it.
The sequence of events I am looking to have is:
B blocks waiting for data available signal
A writes data
A signals data available
B reads data
B blocks waiting for data not available signal
A signals data not available
All goes back to the beginning.
In other words, B would block until it got a "1" signal, get the data, then block again until that signal went to "0".
I have managed to emulate it OK using purely shared memory, but either I block using a while loop, which consumes 100% of the CPU, or I use a while loop with a nanosleep in it, which sometimes misses some of the signals.
I have tried using semaphores, but I can only find a way to wait for a zero, not for a one, and trying to use two semaphores just didn't work. I don't think semaphores are the way to go.
There will be numerous processes all accessing the same shared memory area, and all processes need to be notified when that shared memory has been modified.
It's basically trying to emulate a hardware data and control bus, where events are edge-triggered rather than level-triggered. It's the transitions between states I am interested in, rather than the states themselves.
So, any ideas or thoughts?
Linux has its own eventfd(2) facility that you can incorporate into your normal poll/select loop. You can pass eventfd file descriptor from process to process through a UNIX socket the usual way, or just inherit it with fork(2).
Edit 0:
After re-reading the question, I think one of your options is signals and process groups: start your "listening" processes in the same process group (setpgid(2)), then signal them all by passing a negative pid argument to kill(2) or sigqueue(2). Again, Linux provides signalfd(2) for polling and for avoiding slow signal trampolines.
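A minimal eventfd(2) sketch, with the descriptor simply inherited across fork(2); the read blocks until the writer bumps the counter:

#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int efd = eventfd(0, 0);             /* counter starts at 0 */
    if (efd < 0)
        return 1;

    if (fork() == 0) {                   /* B: blocks until A posts */
        uint64_t val;
        read(efd, &val, sizeof val);     /* wakes once the counter is non-zero */
        printf("B: data available (%llu)\n", (unsigned long long)val);
        _exit(0);
    }

    /* A: write the data into the shared memory area here,
       then signal availability by bumping the counter. */
    uint64_t one = 1;
    write(efd, &one, sizeof one);
    wait(NULL);
    return 0;
}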
If two processes are involved, you can use a file, shared memory, or even networking to pass the flag or signal. If more processes are involved, there may be suitable solutions that require modifying the kernel. There is one shared memory area in your question, right? How are the signals passed now?
On Linux, all the POSIX synchronization structures (mutexes, condition variables, read-write locks, semaphores) have an option such that they can also be used between processes, provided they reside in shared memory. For the scheme that you describe, a classic mutex/condition pair seems to fit the job well. Look into the man pages of the ..._init functions for these structures.
Linux has other utilities, such as futex, to handle this even more efficiently, but these are probably not the right tools to start with.
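A sketch of setting up a process-shared mutex/condition pair inside a shared memory segment; the struct layout and the "/demo_ctl" name are assumptions, and error checking is omitted:

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

/* Layout of the shared segment: one mutex, one condition, one flag. */
struct shared_ctl {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    int             data_available;
};

static struct shared_ctl *create_shared_ctl(void)
{
    int fd = shm_open("/demo_ctl", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(struct shared_ctl));
    struct shared_ctl *ctl = mmap(NULL, sizeof(struct shared_ctl),
                                  PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The PROCESS_SHARED attribute is what allows use across processes. */
    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&ctl->mtx, &ma);

    pthread_condattr_t ca;
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&ctl->cond, &ca);

    ctl->data_available = 0;
    return ctl;
}

The reader (B) would then hold the mutex and loop on pthread_cond_wait() while data_available is 0; the writer (A) sets the flag under the mutex and calls pthread_cond_signal().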
1 Single Reader & Single Writer
This can be implemented using semaphores.
In the POSIX semaphore API, sem_wait() blocks while the semaphore count is zero; once the count is incremented by a sem_post() from the other process, the wait finishes (and the count is decremented again).
In this case you have to use 2 semaphores for synchronization.
process 1 (reader)
sem_wait(sem1);
.......
sem_post(sem2);
process 2 (writer)
sem_wait(sem2);
.......
sem_post(sem1);
In this way you can achieve synchronization in shared memory.
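A sketch of that two-semaphore handshake with named POSIX semaphores; the semaphore names and initial values (sem2 starting at 1 so the writer runs first, sem1 at 0) are assumptions:

#include <fcntl.h>
#include <semaphore.h>

/* Writer side: produce data, then let the reader run. */
static void writer_loop(sem_t *sem1, sem_t *sem2, volatile char *shm)
{
    for (;;) {
        sem_wait(sem2);          /* wait until the reader has consumed */
        shm[0] = 'x';            /* ... write data into shared memory ... */
        sem_post(sem1);          /* signal "data available" */
    }
}

/* Reader side: consume data, then let the writer run again. */
static void reader_loop(sem_t *sem1, sem_t *sem2, volatile char *shm)
{
    for (;;) {
        sem_wait(sem1);          /* block until data is available */
        (void)shm[0];            /* ... read data from shared memory ... */
        sem_post(sem2);          /* signal "slot free again" */
    }
}

/* Both processes open the same named semaphores, e.g.:
     sem_t *sem1 = sem_open("/demo_sem1", O_CREAT, 0600, 0);
     sem_t *sem2 = sem_open("/demo_sem2", O_CREAT, 0600, 1);  */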
As you might know, all the threads in the application die in a forked process, other than the thread doing the fork. However, I plan to resurrect those threads in the forked process by calling pthread_create and using pthread_attr_setstack, so as to assign the newly created threads the same stacks as the dead threads. Something like the following:
// stackAddr and stacksize taken from the dead thread
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setstack(&attr, stackAddr, stacksize);
rc = pthread_create(&thread, &attr, threadRoutine, NULL);
However, I would still need to get the CPU register values, such as stack pointer, base pointer, instruction pointer etc, to restart threads from the same point. How can I do that? And what else do I need to do to successfully achieve my goal?
Also note that I'm using a 64-bit architecture. What additional difficulties would it have as compared to 32-bit one?
I see two possible ways to shoot yourself in the foot and lose hair^W^W^W^W^W^W^W^Wtry to do this:
1. Try to force each thread into calling getcontext() before the fork(), and then restore the context of each thread via setcontext(). Probably won't work, but you can try for fun.
2. Save the register state with ptrace(PTRACE_GETREGS) and ptrace(PTRACE_GETFPREGS), and restore it with ptrace(PTRACE_SETREGS) and ptrace(PTRACE_SETFPREGS).
The other threads in the current process aren't killed by a fork -- they're still there and running in the parent. The problem you seem to have is that fork only duplicates a SINGLE thread of the current process, creating a new process running one thread with a copy of all non-thread resources of the parent.
What you apparently want is a way of duplicating an entire multithreaded task, forking all the threads in it and creating a new process/task with the same number of threads.
In order to do THAT, you would need to find and pause all the other threads in the process, dump their current state (including all locks they hold), fork a new process, and then (re)create each of those other threads in the child, rewiring the lock state to refer to the new child threads where needed.
Unfortunately, the POSIX pthread interface is hopelessly underspecified, and provides no way of doing that. In particular, it lacks any sort of reflective interface allowing you to figure out what threads are actually running.
If you want to try to do this anyway, I can see two ways of trying to approach this:
1. Poke around in /proc/self/task to figure out what threads are running in your process, effectively getting that reflective interface in a highly non-portable way. You'll likely end up having to ptrace(2) the other threads to get their internal state. This will be very difficult.
2. Wrap the pthreads library: instead of using the library directly, intercept every call and keep track of all the threads/mutexes/locks that get created, so that you have that information available when you want to fork. This will work fine as long as you don't want to use any third-party libraries that use pthreads.
The second option is much easier (and somewhat portable), but only works well if you have access to all the source code of your entire application, and can modify it to use your wrappers properly.
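A rough sketch of the wrapping idea: route all thread creation through one function that records the thread IDs, so the application at least knows which threads exist when it wants to fork. Everything here (names, the fixed-size registry) is illustrative:

#include <pthread.h>

#define MAX_TRACKED 64

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t registry[MAX_TRACKED];
static int registry_count;

/* Use this everywhere instead of calling pthread_create() directly. */
int tracked_pthread_create(pthread_t *t, const pthread_attr_t *attr,
                           void *(*fn)(void *), void *arg)
{
    int rc = pthread_create(t, attr, fn, arg);
    if (rc == 0) {
        pthread_mutex_lock(&registry_lock);
        if (registry_count < MAX_TRACKED)
            registry[registry_count++] = *t;
        pthread_mutex_unlock(&registry_lock);
    }
    return rc;
}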
Just googling around, I found that Solaris has a forkall() call that does exactly what you want; see the documentation here:
http://download.oracle.com/docs/cd/E19963-01/html/821-1601/gen-1.html
I assume you're running on Linux, but it is possible to run Solaris on x86 hardware, so maybe that is an option for you.
Is something like the following possible in C on Linux platform:
I have a thread, say A, reading (intercepting) system calls made by application processes. For each process, A creates a worker thread, which performs the required system call and then sleeps until A wakes it up with another system call made by its corresponding application process. When a process exits, its worker thread ceases to exist.
So it's like a number of processes converging on one thread, which then fans out to many threads, one thread per process.
Thanks
If you are looking for some kind of thread-pool implementation and are not strictly limited to C, I would recommend threadpool (which is almost Boost). It's easy to use and quite lean. The only logic you still need is catching the system event and then spawning a new task that will execute the call; the thread pool keeps track of all created threads and assigns work to them automatically.
EDIT
Since you are limited to C, try this implementation. It looks fairly complete and rather simple, but it will basically do the job.