backgrounding a threaded application with fork() - c

So I have an application which uses threads. Now when the program first starts up, I want it to go through setting up database connections and whatnot before it backgrounds itself so that whatever/whoever starts the program can know if there was an error starting up.
I did some looking around and have found some resources that say 'do not mix fork and threads', while others say that forking in linux will only duplicate the main thread and leave the others alone.
In the case of the latter (where it just duplicates the main thread), how then do the threads access file level (global) variables? Will the threads not be able to access the variables that are now in the forked process's address space?
Ultimately the goal is to have the application background itself after threads have been created. If this is not possible, I can put the fork before the thread creation, just would like to do it as late as possible.
Note: at the time of the fork, the threads will be in a sleep() loop, waiting until the main thread puts data into a shared variable for them to process. So if the sleep gets interrupted, they won't be harmed.

There is no way to duplicate threads as part of the fork, and the parent's threads will all terminate when the parent exits, so even if they could access the child's memory, it wouldn't help you. You need to either create your threads after forking, or use pthread_atfork to register handlers that will recreate them in the child process. I would recommend just waiting until after forking to create your threads since it's a lot simpler and more efficient.
Why is it that you want to delay forking as long as possible? If you want to maintain connection to a terminal or something until initialization is finished, you can just have the parent process wait to terminate until the child process (with its threads) is done initializing and ready to be "in the background". Various synchronization tools could be used to accomplish this. One simple one would be opening a pipe through which the child sends its output back to the parent to display; the parent could simply exit when it receives EOF on this pipe.
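A minimal sketch of the pipe approach described above, under stated assumptions: `start_in_background`, `initialize` and `worker` are hypothetical names standing in for the application's own code, and the convention that "EOF with no data means ready" is just one possible protocol. Steps a full daemon would normally add, such as calling setsid(), are left out.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical worker; stands in for the application's real threads. */
static void *worker(void *arg) {
    (void)arg;
    return NULL;
}

/* Hypothetical startup step (database connections and so on); 0 = success. */
static int initialize(void) {
    return 0;
}

/*
 * Fork first, then initialize and create threads in the child.
 * In the parent: returns 0 if the child signalled readiness, nonzero otherwise.
 * The child never returns from this function.
 */
int start_in_background(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;

    if (pid == 0) {                       /* child: the background process */
        close(fds[0]);
        pthread_t tid;
        if (initialize() != 0 ||
            pthread_create(&tid, NULL, worker, NULL) != 0) {
            const char msg[] = "startup failed\n";
            if (write(fds[1], msg, sizeof msg - 1) < 0) {
                /* nothing more we can do */
            }
            _exit(EXIT_FAILURE);
        }
        close(fds[1]);                    /* EOF with no data means "ready" */
        pthread_join(tid, NULL);          /* real code would run its main loop */
        _exit(EXIT_SUCCESS);
    }

    /* parent: relay whatever the child reports, stop at EOF */
    close(fds[1]);
    char buf[256];
    ssize_t n;
    long total = 0;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        fwrite(buf, 1, (size_t)n, stderr);
        total += n;
    }
    close(fds[0]);
    return total == 0 ? 0 : 1;
}
```

In a real program, main would typically finish with something like `exit(start_in_background())` in the parent, so that whoever launched the program sees a nonzero exit status when startup failed.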

Forking a process creates two different processes and threads in one process will not be able to access memory in the second process. If you want different processes to access the same memory, you want something called shared memory.
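As an illustration of shared memory between a parent and a forked child, here is a small sketch using an anonymous shared mapping. The function name `shared_memory_demo` is made up for this example, and mmap() with MAP_SHARED | MAP_ANONYMOUS is only one of several shared memory mechanisms (POSIX shm_open and System V shmget are others).

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the value the parent reads after the child has written 42
 * into a shared anonymous mapping, or -1 on error.
 */
int shared_memory_demo(void) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        *shared = 42;          /* the page is shared, not copied on write */
        _exit(0);
    }

    waitpid(pid, NULL, 0);     /* wait so the child's write has happened */
    int result = *shared;
    munmap(shared, sizeof *shared);
    return result;
}
```

An ordinary global variable used the same way would still read 0 in the parent, because the child would be writing to its own private copy.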

When a thread in a process calls fork(), a new process is created by copying, among other things, (1) the full address space of the process and (2) the (one) thread that called fork. If there are other threads in the process, they don't get copied. This will almost certainly lead to bugs in your program. Hence the advice not to mix threads and forks.
If you want to create a background process with many threads, you must fork it before spawning any other thread. Then, the two processes behave normally, like any two isolated processes: threads within one process share the same memory, but your background threads and your foreground process won't share any memory (by default).

Related

Are my fork processes running parallel or executing one after another?

I am just going to post pseudo code,
but my question is I have a loop like such
for(i<n){
createfork();
if(child)
/*
Exit so I can control exact amount of forks
without children creating more children
*/
exit
}
void createfork(){
fork
//execute other methods
}
Does each fork create a process that does what it is supposed to do and exits, and only then is the next process created, and so on? If so, what are some ways around this to get the processes running concurrently?
Your pseudocode is correct as written and does not need to be modified.
The processes are already executing in parallel, all six of them or however many you spawn. As written, the parent process does not wait for the children to finish before spawning more children. It calls fork(), checks if (child) (which is skipped), then immediately proceeds to the next for loop iteration and forks again.
Notably, there's no wait() call. If the parent were to call wait() or waitpid() after each fork to wait for that child to finish, that would introduce the serialization you're trying to avoid. But there is no such call, so you're good.
When a process successfully performs a POSIX fork(), that process and the new child process are initially both eligible to run. In that sense, they will run concurrently until one or the other blocks. Whether there will be any periods of time when both are executing machine instructions (on different processing units) depends at least on details of hardware capabilities, OS scheduling, the work each process is performing, and what other processes there are in the system and what they are doing.
The parent certainly does not, in general, automatically wait for the child to terminate before it proceeds with its own work (there is a family of functions to make it wait when you want that), nor does the child process automatically wait for any kind of signal from the parent. If the next thing the parent does is fork another child, then that will under many circumstances result in the parent running concurrently with both (all) children, in the sense described above.
I cannot speak to specifics of the behavior of your pseudocode, because it's pseudocode.
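For reference, one possible concrete C rendering of the loop in the question, as a sketch only: `spawn_children` and `do_work` are hypothetical names, and `do_work` stands in for the question's "execute other methods". The parent reaps the children only after all of them have been started, so the waiting does not serialize them.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the question's "execute other methods". */
static void do_work(int i) {
    (void)i;
}

/*
 * Forks up to n children that run concurrently with the parent, then
 * reaps them. Returns the number of children that exited with status 0.
 */
int spawn_children(int n) {
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) break;      /* could not fork; stop creating more */
        if (pid == 0) {            /* child */
            do_work(i);
            _exit(0);              /* exit so the child never continues the loop */
        }
        started++;                 /* parent: go straight on to the next fork */
    }

    int ok = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```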

Forked processes order of execution

I know there's another thread with the same name, but this is actually a different question.
When a process forks multiple times, does the parent finish executing before the children? Vice versa? Concurrently?
Here's an example. Lets say I have a for loop that forks 1 parent process into 4 children. At the end of that for loop, I want the parent process to feed some data to the children via pipes. The data is written to each child process' respective stdin.
Will the parent send the data first, before any of the children execute their code? This is important, because we don't want it to start working from an invalid stdin.
The order of execution is determined by the OS scheduling policy and is not guaranteed by anything. To synchronize processes, use the inter-process communication (IPC) facilities that are designed for this purpose. The pipes you mention are one example: a read from an empty pipe blocks until another process writes to it, which creates a (one-way) synchronization point. Other examples are FIFOs and sockets. For simpler tasks, the wait() family of functions or signals can be used.
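The scenario from the question can be sketched as follows, assuming one pipe per child and a made-up message format ("job N"). The names `feed_children` and `NCHILD` are invented for this example. It does not matter whether a child gets scheduled before the parent writes, because the child's read() simply blocks until data arrives.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHILD 4

/*
 * Forks NCHILD children, each with its own pipe as stdin, then writes one
 * line to each. A child exits with 0 only if it read the expected line.
 * Returns the number of children that reported success, or -1 on error.
 */
int feed_children(void) {
    int wfd[NCHILD];
    pid_t pids[NCHILD];

    for (int i = 0; i < NCHILD; i++) {
        int fds[2];
        if (pipe(fds) == -1) return -1;
        pids[i] = fork();
        if (pids[i] == -1) return -1;
        if (pids[i] == 0) {
            /* child: drop every write end it inherited, or EOF never arrives */
            close(fds[1]);
            for (int j = 0; j < i; j++) close(wfd[j]);
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);

            char buf[64], expected[64];
            size_t len = 0;
            ssize_t n;
            /* read() blocks until the parent writes; loop ends at EOF */
            while (len < sizeof buf - 1 &&
                   (n = read(STDIN_FILENO, buf + len, sizeof buf - 1 - len)) > 0)
                len += (size_t)n;
            buf[len] = '\0';
            snprintf(expected, sizeof expected, "job %d\n", i);
            _exit(strcmp(buf, expected) == 0 ? 0 : 1);
        }
        close(fds[0]);             /* parent keeps only the write end */
        wfd[i] = fds[1];
    }

    /* Send the data; children that are already running are waiting in read(). */
    for (int i = 0; i < NCHILD; i++) {
        char msg[64];
        int len = snprintf(msg, sizeof msg, "job %d\n", i);
        if (write(wfd[i], msg, (size_t)len) != len) return -1;
        close(wfd[i]);             /* closing the write end gives the child EOF */
    }

    int ok = 0;
    for (int i = 0; i < NCHILD; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```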
"When a process forks multiple times, does the parent finish executing before the children? Vice versa? Concurrently?"
Concurrently. The exact order depends on the scheduler and is unpredictable.
Using pipe to pass integer values between parent and child
This link explains in detail about sharing data between parent process and child.
Since you have four child processes, you will likely need a separate pipe for each child.
Each byte of data written to a pipe will be read exactly once. It isn't duplicated to every process with the read end of the pipe open.
Multiple child processes reading/writing on the same pipe
Alternatively you can try shared memory for the data transfer.
They will execute concurrently. This is basically the point of processes.
Look into mutexes or other ways to deal with concurrency.

Where does the forked process start from if a call of fork in a thread occurs?

I'm going to write a program in which the main thread creates a new thread, and the new thread then creates a child process. Since I have a hard time keeping track of the new thread and the forked process, I'd appreciate some guidance.
My questions are:
1. Does a process created in a thread start executing the code after pthread_create?
2. If not, where does the forked process start from when fork is called in a thread?
Thank you for reading my question.
Some of this is a bit OS-dependent, as different systems have different POSIX thread implementations and this can expose internals.
POSIX offers pthread_atfork as a somewhat blunt instrument for dealing with some of the issues, but it still looks pretty messy to me.
If your system uses a one-to-one map between "user land thread" and "kernel thread" using clone or rfork to achieve proper user-space sharing of data between threads, then fork will merely duplicate the (single) thread that calls it. However, if your system has a many-to-many style mapping (so that one user process is handling multiple threads, at least before they enter into blocking syscalls), fork may internally duplicate multiple threads. POSIX says it should look like it only duplicated one thread, so that's not supposed to be visible, but I'm not sure how well all systems implement this.
There's some general advice at http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them (Linux-centric, obviously, but still useful).
Is there some particular reason you want to fork inside a thread but not exec? In general, if you just want to run more code in parallel, you just spin off yet another thread (i.e., once you choose to run any threads, you do everything in threads, except if you have to fork for exec; if the exec fails, just _exit).
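A small sketch of the fork-then-exec pattern inside a thread, with hypothetical function names (`fork_and_exec`, `run_in_thread`) and `/bin/sh -c "exit 0"` chosen only as a stand-in for whatever program would really be executed. It also shows where the child starts: at the point where fork() returns inside the thread function, not after pthread_create in the main thread.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs in a secondary thread. Stores the exit status of the exec'd
 * program in *arg, or -1 if something went wrong.
 */
static void *fork_and_exec(void *arg) {
    int *result = arg;
    *result = -1;

    pid_t pid = fork();
    if (pid == -1) return NULL;
    if (pid == 0) {
        /* The child continues here, containing a copy of this thread only. */
        execl("/bin/sh", "sh", "-c", "exit 0", (char *)NULL);
        _exit(127);                /* reached only if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        *result = WEXITSTATUS(status);
    return NULL;
}

/* Creates the thread, waits for it, and returns the child's exit status. */
int run_in_thread(void) {
    pthread_t tid;
    int result = -1;
    if (pthread_create(&tid, NULL, fork_and_exec, &result) != 0) return -1;
    pthread_join(tid, NULL);
    return result;
}
```

Doing nothing between fork and exec except calling _exit on failure keeps the child away from any locks or state that other threads of the parent may have held at the moment of the fork.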

what is the difference in system calls in thread creation and child process creation

How is the implementation of threads done in a system?
I know that child processes are created using the fork() call
and a thread is a light weight. How does the creation of a thread differ from that of a child process?
On Linux, threads are created using the clone() system call, which can make a new process that shares its memory space and some of the kernel control structures with its parent. These processes are called LWPs (light-weight processes) and are also known as kernel-level threads.
fork() creates a new process that initially shares memory with its parent, but the pages are copy-on-write, which means that a separate copy of a memory page is created when the content of the original one is altered. Thus the parent and child processes cannot change each other's memory, and effectively they run as separate processes. Also, the newly forked child is a full-blown process with its own kernel control structures.
Each process has its own address space, that is, the range of virtual addresses that the process can access. When a new process is forked, a duplicate copy of all the resources involved has to be made. After the fork is complete, the child and the parent each have their own distinct address space and all the resources within it. Naturally, this is a performance-intensive operation.
All threads in the same process, by contrast, share the same address space. When a new thread is spawned, it only needs its own stack, and there is no duplication of all resources as in the case of processes. Hence spawning a thread is considerably less performance intensive.
Of course, the two operations cannot and should not be compared directly, because they provide essentially different features for different requirements.
Well, it differs very much. First of all, a child process is in some way a copy of the parent program and has all of its variables duplicated, and you tell the child from the parent by its PID. Threads are like new programs: they run at the same time as the main program (or appear to, because the OS slices CPU time between them). Threads can use the program's global variables, but they don't get a duplicate of them as processes do. So it's much cheaper to use threads than new processes.
Well you've read the important parts, now here's something behind the curtains:
In current implementations (where current means the last few decades), the process memory isn't technically copied immediately upon forking. Read-only sections are just shared between the two processes (as they can't change anyway), as are the read-only parts of shared libraries, of course. But most importantly, everything writable is initially also just shared. However, it is shared in a write-protected manner, and as soon as either process writes to such memory (e.g. by incrementing a variable), a page fault is generated, which only then causes the kernel to actually copy the respective page (where the modification then occurs).
This great optimization, which is called "copy on write", results in child processes usually not really consuming exactly as much (physical) memory as their parent processes. To the program developer (and user), however, it's completely transparent.

pthread and child process data sharing in C

My question is somewhat conceptual: how is a parent process's data shared with a child process created by a fork() call, or with a thread created by pthread_create()?
For example, are global variables directly passed into the child process, and if so, does a modification made to such a variable by the child process affect its value in the parent process?
I appreciate partial and complete answers in advance. If I'm missing any existing resource, I'm sorry; I've done some searching on Google but couldn't find good results.
Thanks again for your time and answers.
The semantics of fork() and pthread_create() are a little different.
fork() will create a new process, where the global variables will be separate between the parent and children. Most OS implementations use copy-on-write semantics, meaning that the parent and child process use the same physical memory pages for all global variables until one of the processes attempts to write to one. At that point a copy of that page is made, so that each process has its own copy and does not see the other's changes. The processes are therefore isolated.
pthread_create() on the other hand, creates a new thread within the same process. The new thread will have a separate stack space from the other running threads of the same process, however the global variables and heap space are shared between all threads of the same process. This is why you often need a mutex to coordinate access to a shared piece of memory between multiple threads of the same process.
TL;DR version: with fork(), you don't see the other guy's changes; with pthread_create() you do.
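The difference can be checked with a short sketch like the one below. The names `counter`, `bump`, `counter_after_fork` and `counter_after_thread` are invented for this example.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;        /* file-level (global) variable */

static void *bump(void *arg) {
    (void)arg;
    counter += 10;             /* same address space: the creator sees this */
    return NULL;
}

/* Returns counter as seen by the parent after a forked child changed it. */
int counter_after_fork(void) {
    counter = 1;
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        counter += 10;         /* modifies the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;            /* still 1 in the parent */
}

/* Returns counter as seen by the creator after a thread changed it. */
int counter_after_thread(void) {
    counter = 1;
    pthread_t tid;
    if (pthread_create(&tid, NULL, bump, NULL) != 0) return -1;
    pthread_join(tid, NULL);   /* join orders the thread's write before this read */
    return counter;            /* now 11 */
}
```

Here pthread_join is enough to make the read safe. If both threads touched `counter` while running at the same time, a mutex would be needed, as described above.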
A fork creates an almost exact copy of the calling process, including memory and file descriptors. Global variables are copied along with everything else, but they are not in any way linked to the parent process. Since file descriptors are also copied, parent and child can interact via these (as long as they're setup properly, usually via pipe or socketpair).
There's a big difference between processes created by fork and threads created with pthread_create. Processes don't share global variables and should communicate through pipes, sockets, or other tools provided by the OS. One option is MPI, a message-passing library for inter-process communication.
Threads are quite different. A thread created with pthread_create shares all the global variables with its caller. Moreover, the caller can pass an arbitrary structure into the thread, and this structure will also be shared. This means that one should be extremely careful when programming with threads - such amounts of sharing are dangerous and error prone. The pthread API provides mutexes and conditions for robust synchronization between threads (although it still requires practice and expertise to implement correctly).

Resources