Does Linux clone() create a new thread for the child? - c

I have a C program that uses fork-execv to create a child process to run a Python program. I want the parent and child processes to communicate through IPC (for example, FIFOs) and the child process will continue running for the duration of the parent process, performing Python functions for the parent at intervals signalled through IPC.
My problem is that once the child process starts it prevents the C program from running, so there is no opportunity for the two to interact. The C program is created on the main thread, not with a new thread created with pthreads.
It looks like the solution is to use clone() instead of fork(), which is also helpful because I want the two programs to share the same heap, which I can't do with fork().
The Python child process runs in a while True loop to keep it running while the parent process proceeds. Here is a very simplified version for the purpose of illustration (without IPC):
#!/usr/bin/python3
import os
import time
a = 0
while True:
    a = a + 1
    if a > 50000:
        a = 0
    time.sleep(1)
The C program for fork-execv:
#include <sys/types.h> /* for pid_t */
#include <sys/wait.h> /* for wait */
#include <unistd.h>
#include <stdint.h> /* for int64_t */
#include <stdio.h>
#include <string.h>
int call_PyFn(int64_t fdx)
{
    //Create buffer for file descriptor
    char fdbuf[32]; /* large enough for any int plus the terminating NUL */
    snprintf(fdbuf, sizeof fdbuf, "%d", (int)fdx);
    int status;
    char * paramsList[] = {
        "/usr/bin/python3",
        "-m",
        "NLTK_Python_Libs",
        "-c",
        fdbuf,
        (char *)NULL };
    if ( fork() == 0 ){
        printf("I am the child\n");
        execv(paramsList[0],paramsList); }
    else {
        printf("I am the parent\n");
        wait( &status ); } /* blocks until the child exits */
    return 0;
}
My questions are:
Will clone() run the child process on a separate thread?
When I do multithreading work I use pthread_setaffinity to pin each thread to a specific core. Can I do that with clone()? From the docs, "If CLONE_THREAD is set, the child is placed in the same thread group as the calling process" but "without specifying CLONE_THREAD, then the resulting thread is placed in a new thread group whose TGID is the same as the thread's TID." https://linux.die.net/man/2/clone. That doesn't completely answer this question re affinity. Ideally I would like the two processes to run on the same core so that if I use this in a multicore project I can have each core run its own cloned process on the second thread of the parent's core.
I've read a lot of information on this but nothing completely clears it up for me. In the scenario described, am I right that fork does not create a new thread, which results in the child process blocking the parent, whereas clone will create a new thread?
It looks like the answer is yes, but confirmation would help. The question at How to create a real thread with clone() on Linux? comes close, but doesn't clear up all my questions.
Thanks.

Related

How many process are created in this program?

#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
int main ()
{
    int i = 3;
    int pid;
    while(i > 0) {
        pid = fork();
        if(pid > 0) {
            exit(0);
        } else {
            i--;
        }
    }
}
How many processes are created in this program? I am still confused by the fork() system call; can anybody explain this to me?
What does fork() do?
fork() is an interesting call. You can think of it as cloning the state of your program into two exact copies -- the only difference between them will be the return value of fork(). The process that did the fork() receives the process id (pid) of the new process, while the new process receives 0.
With that in mind:
How many processes are created?
Each time you fork, you create a new process and exit the parent. You do this three times, therefore -- three processes are created by forking. This doesn't include the one that you started by starting the process, of course. :)
At the start of your program, the system creates 1 process (+1)
i=3: the program creates a new process (+1), the parent exits, the child continues
i=2: the program creates a new process (+1), the parent exits, the child continues
i=1: the program creates a new process (+1), the parent exits, the child continues
i=0: the loop ends and the program exits
So, 4 processes in total: 3 created by the program, 1 by the system
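The count above can be checked with a small sketch. It is not the asker's program: the helper name count_forked_processes and the pipe used for counting are assumptions added for illustration. A helper process plays the role of the original program and runs the same loop, and each parent writes one byte to the pipe just before it exits, so the caller can count how many fork() calls succeeded.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the question's loop inside a helper process and
 * returns how many processes the loop itself created, or -1 on error. */
int count_forked_processes(int iterations)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t runner = fork();          /* stands in for "the program" */
    if (runner == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (runner == 0) {
        close(fds[0]);
        int i = iterations;
        while (i > 0) {
            pid_t pid = fork();
            if (pid > 0) {
                /* Parent side: record the new process, then exit. */
                char mark = 'x';
                if (write(fds[1], &mark, 1) != 1)
                    _exit(1);
                _exit(0);
            } else if (pid == 0) {
                i--;                /* child side: carry on with the loop */
            } else {
                _exit(1);           /* fork failed */
            }
        }
        _exit(0);                   /* last child: loop finished */
    }

    /* Caller: count marks until every process in the chain has exited. */
    close(fds[1]);
    int count = 0;
    char c;
    while (read(fds[0], &c, 1) == 1)
        count++;
    close(fds[0]);
    waitpid(runner, NULL, 0);
    return count;
}
```

Calling count_forked_processes(3) should report three forks, matching the "+1" steps listed above; the helper process corresponds to the one the system created.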

Where does code Execution start in a child process?

Consider the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
/* main --- do the work */
int main(int argc, char **argv)
{
    pid_t child;
    if ((child = fork()) < 0) {
        fprintf(stderr, "%s: fork of child failed: %s\n",
                argv[0], strerror(errno));
        exit(1);
    } else if (child == 0) {
        // do something in child
    } else {
        // do something in parent
    }
}
My question is: where in the code does the child process start executing, i.e. which line is executed first?
If it executed the whole code, it would also create its own child process, and this would go on indefinitely, which certainly does not happen.
If it starts after the fork() call, how does it get into the if statement in the first place?
Execution of the child starts at the return from the fork function, not at the start of the code. fork returns the pid of the child in the parent process, and returns 0 in the child process.
When you execute a fork(), the calling process is duplicated in memory.
So what effectively happens is that you have two processes executing the snippet you posted, but their fork() return values are different.
In the child process fork() returns 0, so the other branch of the if won't be executed; the same reasoning applies to the parent process.
When fork() is called, the operating system assigns a new address space to the new process and then starts it. Both processes share the same code segment, but since the return value differs they execute different parts of the code (if correctly split, like in your example).
The child starts by executing the next instruction (not line) after fork. So in your case it is the assignment of the fork's return value to the child variable.
Well, if I understand your question correctly: your code is already running as a process. When you run a program, it is already a process, so that process reaches the if statement anyway. After fork(), you have another process (the child process).
In Unix, a process can create another process, that's why that happens.
Code execution in a child process starts from the next instruction following the fork() system call.
The fork() system call creates a separate address space for the child process; the child is therefore a cloned copy of the parent process and has all the memory contents of its parent.
Thus, after spawning a child process through fork(), both processes (the parent process and the child process) resume execution right from the next instruction following the fork() system call.
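To make the "both processes continue from the return of fork()" point concrete, here is a minimal sketch. The function name observe_fork_returns and the pipe used to pass the child's value back are assumptions for illustration, not part of the original code. The child reports what fork() returned to it, so the caller can compare both return values.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stores what fork() returned in the parent and in the
 * child. Returns 0 on success, -1 on error. */
int observe_fork_returns(pid_t *in_parent, pid_t *in_child)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t ret = fork();     /* both processes continue from this point */
    if (ret == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (ret == 0) {
        /* Child: send back the value fork() returned here. */
        close(fds[0]);
        ssize_t n = write(fds[1], &ret, sizeof ret);
        _exit(n == (ssize_t)sizeof ret ? 0 : 1);
    }

    /* Parent: ret holds the child's pid. */
    close(fds[1]);
    pid_t seen = -1;
    ssize_t n = read(fds[0], &seen, sizeof seen);
    close(fds[0]);
    waitpid(ret, NULL, 0);
    if (n != (ssize_t)sizeof seen)
        return -1;

    *in_parent = ret;
    *in_child = seen;
    return 0;
}
```

After a successful call, *in_parent is a positive pid and *in_child is 0, which is why only one branch of the if/else runs in each process.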

Can the order of execution of fork() be determined?

I'm working on an exercise from the textbook "Operating System Concepts 7th Edition", and I'm a bit confused about how fork() works. From my understanding, fork() creates a child process which runs concurrently with its parent. But then, how do we know exactly which process runs first? I mean the order of execution.
Problem
Write a C program using fork() system call that generates the Fibonacci sequence in the child process. The number of sequence will be provided in the command line.
This is my solution:
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
void display_fibonacci_sequence( int n ) {
    int i = 0;
    int a = 1;
    int b = 1;
    int value;
    printf( "%d, %d, ", a, b );
    for( ;i < n - 2; ++i ) {
        value = a + b;
        printf( "%d, ", value );
        a = b;
        b = value;
    }
    printf( "\n" );
}
int main( int argc, char** argv ) {
    int n;
    pid_t pid;
    pid = fork();
    if( argc != 2 ) {
        fprintf( stderr, "Invalid arguments" );
        exit( -1 );
    }
    n = atoi( argv[1] );
    if( pid < 0 ) {
        fprintf( stderr, "Fork failed" );
        exit( -1 );
    }
    else if( pid == 0 ) {
        display_fibonacci_sequence( n );
    }
    else { // parent process
        // what do we need to do here?
    }
}
To be honest, I don't see any difference between using fork and not using fork. Besides, if I want the parent process to handle the input from user, and let the child process handle the display, how could I do that?
You are asking many questions, I'll try to answer them in a convenient order.
First question
To be honest, I don't see any difference between using fork and not
using fork.
That's because the example is not a very good one. In your example the parent doesn't do anything so the fork is useless.
Second
else {
// what do we need to do here?
}
You need to wait(2) for the child to terminate. Make sure you read that page carefully.
Third
I want the parent process to handle the input from user, and let the
child process handle the display
Read the input before the fork and "handle" the display inside if (pid == 0)
Fourth
But then, how do we know exactly which process runs first?
Very few programs should concern themselves with this. You can't know the order of execution, it's entirely dependent on the environment. TLPI says this:
After a fork(), it is indeterminate which process—the parent or the
child—next has access to the CPU. On a multiprocessor system, they may both simultaneously get access to a CPU.
Applications that implicitly or explicitly rely on a particular
sequence of execution in order to achieve correct results are open to
failure due to race conditions
That said, an operating system can allow you to control this order. For instance, Linux has /proc/sys/kernel/sched_child_runs_first.
We don't know which runs first, the parent or the child. This is why the parent generally has to wait for the child process to complete if there is some dependency on order of execution between them.
In your specific problem, there isn't any particular reason to use fork(). Your professor probably gave you this just for a trivial example.
If you want the parent to handle input and the child to calculate, all you have to do is move the call to fork() below the point at which you handle the command-line args. Using the same basic logic as above, have the child call display_fibonacci_sequence, and have the parent simply wait.
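The arrangement described here (the parent deals with the input, the child prints, the parent waits) might look like the sketch below. The wrapper name fibonacci_in_child is an assumption, and display_fibonacci_sequence is rewritten slightly so that it also behaves for n < 2; treat it as one possible layout rather than the expected textbook solution.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prints the first n Fibonacci numbers (1, 1, 2, 3, ...) on one line. */
static void display_fibonacci_sequence(int n)
{
    long a = 1, b = 1;
    for (int i = 0; i < n; i++) {
        printf("%ld%s", a, i + 1 < n ? ", " : "\n");
        long next = a + b;
        a = b;
        b = next;
    }
}

/* Hypothetical wrapper: n has already been parsed by the caller (the parent).
 * The child prints the sequence; the parent waits for it.
 * Returns the child's exit status, or -1 on error. */
int fibonacci_in_child(int n)
{
    fflush(stdout);         /* do not let the child inherit pending output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        display_fibonacci_sequence(n);
        fflush(stdout);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A main() would validate argc, convert argv[1] with atoi, and only then call fibonacci_in_child(n); for n = 6 the child prints 1, 1, 2, 3, 5, 8.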
The process which is selected by your system scheduler is chosen to run, not unlike any other application running on your operating system. The process spawned is treated like any other process where the scheduler assigns a priority or spot in queue or whatever the implementation is.
But then, how do we know exactly which process runs first? I meant the
order of execution.
There is no guarantee as to which one runs first. fork returns 0 in the child and the pid of the child in the parent. Theoretically they could run at exactly the same time on a multiprocessor system. If you actually wanted to determine which ran first, you could have a shared lock between the two processes. The one that acquires the lock first could be said to have run first.
As for what to do in your else branch: you'll want to wait for the child process to exit using wait or waitpid.
To be honest, I don't see any difference between using fork and not using fork.
The difference is that you create a child process. Another process on the system doing computation. For this simple problem the end user experience is the same. But fork is very different when you are writing systems like servers that need to deal with things concurrently.
Besides, if I want the parent process to handle the input from user, and let the child process handle the display, how could I do that?
You appear to have that setup already. The parent process just needs to wait for the child process to finish. The child process will printf the results to the terminal. And the parent process currently gets user input from the command line.
While you cannot control which process (parent or child) gets scheduled first after the fork (in fact on SMP/multicore it might be both!) there are many ways to synchronize the two processes, having one wait until the other reaches a certain point before it performs any nontrivial operations. One classic, extremely portable method is the following:
Prior to fork, call pipe to create a pipe.
Immediately after fork, the process that wants to wait should close the writing end of the pipe and call read on the reading end of the pipe.
The other process should immediately close the reading end of the pipe, and wait to close the writing end of the pipe until it's ready to let the other process run. (read will then return 0 in the other process)
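The three steps above can be sketched as follows. The function name run_parent_step_first and the second pipe (called trace here) are assumptions added only so the ordering can be observed; the synchronization itself uses just the first pipe. The child waits at the gate, the parent performs its step, and closing the write end releases the child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the parent's step ('P') is forced to happen before the
 * child's step ('C'). On success returns 0 and stores the observed order as
 * a NUL-terminated string in order[]; returns -1 on error. */
int run_parent_step_first(char order[3])
{
    int gate[2], trace[2];
    if (pipe(gate) == -1)
        return -1;
    if (pipe(trace) == -1) {
        close(gate[0]); close(gate[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(gate[0]); close(gate[1]);
        close(trace[0]); close(trace[1]);
        return -1;
    }
    if (pid == 0) {
        char dummy;
        close(gate[1]);     /* the waiter must close its own write end */
        close(trace[0]);
        /* Blocks until the parent closes the write end; read() returns 0. */
        if (read(gate[0], &dummy, 1) != 0)
            _exit(1);
        if (write(trace[1], "C", 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(gate[0]);
    int ok = (write(trace[1], "P", 1) == 1);    /* the parent's step */
    close(gate[1]);         /* now let the child run */
    close(trace[1]);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        ok = 0;
    if (ok && read(trace[0], order, 2) != 2)
        ok = 0;
    close(trace[0]);
    order[2] = '\0';
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The detail that is easy to miss is the close(gate[1]) in the waiting process: if the waiter keeps its own copy of the write end open, read never sees end-of-file and the waiter blocks forever.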

How variables are shared between two process when the fork is involved

/* In alarm.c, the first function, ding, simulates an alarm clock. */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
static int alarm_fired = 0;
void ding(int sig)
{
    alarm_fired = 1;
}
/* In main, we tell the child process to wait for five seconds
   before sending a SIGALRM signal to its parent. */
int main()
{
    pid_t pid;
    printf("alarm application starting\n");
    pid = fork();
    switch(pid) {
    case -1:
        /* Failure */
        perror("fork failed");
        exit(1);
    case 0:
        /* child */
        sleep(5);
        printf("getppid: %d\n", getppid());
        kill(getppid(), SIGALRM);
        exit(0);
    }
    /* The parent process arranges to catch SIGALRM with a call to signal
       and then waits for the inevitable. */
    printf("waiting for alarm to go off\n");
    (void) signal(SIGALRM, ding);
    printf("pid: %d\n", getpid());
    pause();
    if (alarm_fired)
        printf("Ding!\n");
    printf("done\n");
    exit(0);
}
I have run the above code under Ubuntu 10.04 LTS
> user#ubuntu:~/Documents/./alarm
> alarm application starting
> waiting for alarm to go off
> pid: 3055
> getppid: 3055
> Ding!
> done
I have read the following statement from a book.
It’s important to be clear about the
difference between the fork system
call and the creation of new threads.
When a process executes a fork call, a
new copy of the process is created
with its own variables and its own
PID. This new process is scheduled
independently, and (in general)
executes almost independently of the
process that created it.
Question:
It seems to me that the variable alarm_fired is shared between the original process and the new created process.
Is that correct?
No. Each process gets its own copy of the variable (and pretty much everything else). If you change the variable in one process, it is changed only in that process, not in both. Each process has its own address space.
Compare that with threads, where all threads share a single address space, so a change in a variable in one thread will be visible in all other threads (within that process).
From the Linux fork(2) manpage:
fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.
It is shared in the sense that immediately after the fork it has the same value in both processes. But when either process writes to it, the change is not propagated to the other process.
Also, see copy on write for interesting stuff.
EDIT
It seems that the new created process
modified the variable alarm_fired
which is then later seen by the old
process
The child is sending a signal to the parent. The parent then executes the handler and personally sets alarm_fired to one. The child itself never touches that variable.
No, variables are not shared across a fork(). In your code, the child process never touches alarm_fired. What the child does is send a signal to the parent. That signal fires a signal handler in the parent process' context, setting the variable.
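A short sketch makes the "own copy" point visible. The variable flag and the function name parent_value_after_child_write are assumptions for illustration and are unrelated to the alarm program: the child overwrites its copy of a global, and the parent then checks its own copy.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int flag = 1;    /* each process gets its own copy after fork() */

/* Hypothetical demo: returns the parent's value of flag after the child has
 * overwritten its own copy, or -1 on error. */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        flag = 99;                      /* changes the child's copy only */
        _exit(flag == 99 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return flag;                        /* the parent's copy is untouched */
}
```

The parent still sees 1 even though the child has finished writing 99, which is the behaviour the answers describe: a write in one process never shows up in the other.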

Only one process must execute a code portion at a time

I am sorry to repeat a question, https://stackoverflow.com/questions/5687837/monitor-implementation-in-c, but I have not gotten a solution yet. I probably asked the question incorrectly.
Say I have a code portion B. A parent process spawns a number of child processes to execute code B, but I would like only one process to be inside code portion B at a time. How can I do this in C on Linux?
Thanks for your help.
Edit: processes, not threads.
You want a mutex.
pthread_mutex_t mutexsum;
pthread_mutex_init(&mutexsum, NULL);
pthread_mutex_lock (&mutexsum);
// Critical code
pthread_mutex_unlock (&mutexsum);
If you are serious about it being multiple processes instead of multiple threads, the mutex needs to be stored in a shared memory segment.
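For the multi-process case, one way to put the mutex in shared memory is an anonymous shared mapping plus the PTHREAD_PROCESS_SHARED attribute. The sketch below assumes Linux with glibc; struct shared_state, run_locked_children and the counter standing in for "code portion B" are all names invented for illustration. Depending on the toolchain it may need to be built with -pthread.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc */
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in memory that the parent and all children map in common. */
struct shared_state {
    pthread_mutex_t lock;
    long counter;
};

/* Hypothetical demo: forks `children` processes, each of which enters the
 * critical section `increments` times. Returns the final counter value,
 * or -1 on error. */
long run_locked_children(int children, int increments)
{
    struct shared_state *st = mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0 ||
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutex_init(&st->lock, &attr) != 0) {
        munmap(st, sizeof *st);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);
    st->counter = 0;

    int started = 0;
    for (int c = 0; c < children; c++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            for (int i = 0; i < increments; i++) {
                pthread_mutex_lock(&st->lock);
                st->counter++;          /* stands in for code portion B */
                pthread_mutex_unlock(&st->lock);
            }
            _exit(0);
        }
        started++;
    }
    for (int c = 0; c < started; c++)
        wait(NULL);

    long total = (started == children) ? st->counter : -1;
    pthread_mutex_destroy(&st->lock);
    munmap(st, sizeof *st);
    return total;
}
```

The mapping has to be created before fork() so that every child inherits it. With the lock in place, run_locked_children(4, 10000) should end with a counter of 40000, because only one process at a time executes the increment.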
If what you want is exactly one child running at any point in time, why spawn all the child processes at once?
When a child process ends, a SIGCHLD is issued, you can write your own handler for this signal and call spawn from the handler. Then you have one new child process created when one perishes -- only one child process running. Below is a hack (useless, just for demo) to achieve this:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>
void spawn(void){
    pid_t child_pid=fork();
    if(child_pid > 0){
        printf("new child created by %d !\n",getpid());
        sleep(1);
    }else if(child_pid == 0){
        printf("child %d created !\n",getpid());
    }else{
        exit(EXIT_FAILURE);
    }
}
void handler(int sigval){
    spawn();
}
int main(void){
    signal(SIGCHLD,handler);
    spawn();
    return 0;
}

Resources