I have a program in C which utilizes the fork() system call:
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>   /* for exit() */

void doit(void)
{
    pid_t pid;
    fork();
    fork();
    printf("Unix System Programming\n");
    return;
}

int main(void)
{
    doit();
    printf("WTF\n");
    exit(0);
}
Now, this program gives me 8 lines of output. I think that is because the two forks give 2^2 = 4 processes, and each prints 2 lines, so 4 * 2 = 8. If I am wrong, please correct me and explain why.
Now, the question is why am I getting different outputs on each run? Let's say I execute this code: the first time I get output
Unix System Programming
WTF
Unix System Programming
Unix System Programming
Unix System Programming
WTF
WTF
WTF
and the 2nd time I get:
Unix System Programming
WTF
Unix System Programming
Unix System Programming
WTF
Unix System Programming
WTF
WTF
And the third time it is different again. Why does this happen? I am clueless; kindly explain in detail.
When you fork a new process, the parent and child both run concurrently. The order that they execute their respective printf() statements is unpredictable -- sometimes the parent will print first, sometimes the child will.
You might understand better if you included the PID in the output, so you could see which process is printing each line. So change it to:
printf("%d: Unix System Programming\n", getpid());
and
printf("%d: WTF\n", getpid());
What you should see is that each process prints Unix System Programming before WTF, but the order of processes will be mixed up.
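For reference, here is the whole program with just those two printf calls changed (nothing else differs from the original):

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

void doit(void)
{
    pid_t pid;
    fork();
    fork();
    printf("%d: Unix System Programming\n", getpid());
    return;
}

int main(void)
{
    doit();
    printf("%d: WTF\n", getpid());
    exit(0);
}

Running it a few times should show four distinct pids, each printing its "Unix System Programming" line before its "WTF" line, but interleaved differently on every run.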
The outputs that you are seeing are from different processes. Once fork has succeeded, you get a different child process (as well as the parent process). Since they are from different processes, you can't guarantee when one process gets its turn and executes. So the outputs of different processes get intermixed and the order might be different at each run.
If you want to make sure that parent and child processes run in some specific order then you have to synchronize them, which is typically a lot more work than just forking. (It will probably require a wait operation or the use of pipes — depending on the nature of synchronization that you want.)
Related
Consider the code given below:
#include <stdio.h>
#include <unistd.h>
int main()
{
fork();
fork() && fork() || fork();
fork();
printf("forked\n");
return 0;
}
The question is how many times forked will be printed. As per my analysis, it should be printed 20 times. Also this answer confirms the same.
However when I run the code on onlinegdb.com and ideone.com, they print it 18 and 5 times respectively. Why so?
Your code doesn't create any threads. Threads are called Pthreads on Linux and you'd use pthread_create(3) (which internally uses clone(2)) to create them.
Of course, your code is (incorrectly) using fork(2), so it creates processes (unless fork fails). Notice that fork is difficult to understand (and to explain, so I won't even try here). You may need to read a lot about it, e.g. the fork wikipage, ALP, and perhaps Operating Systems: Three Easy Pieces (both have several chapters explaining it).
You should handle the failure of fork. As explained here, there are three cases to consider for each fork, and you'd better rewrite your code to do at most one fork per statement (an assignment like pid_t pida = fork();).
BTW, you'd better flush the standard streams (and the data in their buffers) before every fork. I recommend calling fflush(NULL); before every fork (see fflush(3)).
Notice that each process has its own (unique) pid (see getpid(2) and credentials(7)). You might understand things better if you print it, so try using something like printf("forked in %ld\n", (long) getpid());
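Putting those suggestions together (check fork for failure, flush before forking, print the pid), a sketch could look like this; to keep it readable it does a single checked fork rather than the chained fork() && fork() || fork() expression:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    fflush(NULL);                 /* flush stdio buffers before forking */
    pid_t pida = fork();          /* at most one fork per statement */
    if (pida < 0) {               /* case 1: fork failed */
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pida == 0) {       /* case 2: in the child */
        printf("forked in child %ld\n", (long)getpid());
    } else {                      /* case 3: in the parent */
        printf("forked in parent %ld, child is %ld\n",
               (long)getpid(), (long)pida);
    }
    return 0;
}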
when I run the code
You really should run that code on your computer under Linux. Consider installing a Linux distribution (perhaps in some virtual machine) on your laptop or desktop. Notice that Linux is very developer- and student-friendly, and it is mostly made of free software whose source code you can study.
they print it 18 and 5 times respectively. Why so?
Free web services should limit the resources used by outside clients (on Linux, they probably use setrlimit(2) for that purpose). Obviously such sites, which let you run nearly arbitrary C code, want to avoid fork bombs. Very likely, some of your forks failed on them (and since your original code doesn't check for failures, you did not notice that).
Even on your own desktop, you cannot create very many processes. As a rule of thumb, you might have a few hundred processes on your computer, with most of them idle (waiting, perhaps in poll(2) or a blocking read(2), for some I/O or some timeout; see also time(7)) and only a dozen of them runnable (by the process scheduler of your kernel). In other words, a process is quite a costly computing resource. If you have too many runnable processes, you could experience thrashing.
Use ps(1) and top(1) (and also htop and pgrep(1)) to query the processes on your Linux system. If you want to do that programmatically, use /proc/ (and see proc(5) for more) - which is used by ps, top, pgrep, htop etc...
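For example, a small Linux-specific sketch of doing it programmatically: it enumerates the numeric entries of /proc and prints each pid together with the command name read from /proc/<pid>/comm (assumed to exist, as it does on modern Linux kernels):

#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DIR *d = opendir("/proc");
    if (!d) { perror("opendir"); return 1; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* processes show up as purely numeric directory names */
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;

        char path[300];
        char comm[128] = "?";
        snprintf(path, sizeof path, "/proc/%s/comm", e->d_name);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fgets(comm, sizeof comm, f))
                comm[strcspn(comm, "\n")] = '\0';
            fclose(f);
        }
        printf("%s\t%s\n", e->d_name, comm);
    }
    closedir(d);
    return 0;
}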
I am trying to program a shell in C, and I found that each command is executed in a new process. My question is: why do we make a new process to execute the command? Can't we just execute the command in the current process?
It's because of how the UNIX system was designed, where the exec family of calls replace the current process. Therefore you need to create a new process for the exec call if you want the shell to continue afterward.
When you execute a command, one of the following happens:
You're executing a builtin command
You're executing an executable program
An executable program needs many things to work: different memory sections (stack, heap, code, ...), it is executed with a specific set of privileges, and many more things are happening.
If you run this new executable program in your current process, you're going to replace the current program (your shell) with the new one. It works perfectly fine but when the new executable program is done, you cannot go back to your shell since it's not in memory anymore. This is why we create a new process and run the executable program in this new process. The shell waits for this new process to be done, then it collects its exit status and prompts you again for a new command to execute.
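A stripped-down sketch of that loop (no argument parsing, no builtins, minimal error handling; the prompt string "mysh> " is made up for the example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    char line[256];

    for (;;) {
        printf("mysh> ");                     /* prompt */
        if (!fgets(line, sizeof line, stdin))
            break;                            /* EOF: leave the shell */
        line[strcspn(line, "\n")] = '\0';
        if (line[0] == '\0')
            continue;

        pid_t pid = fork();
        if (pid < 0) { perror("fork"); continue; }

        if (pid == 0) {
            /* child: replace this process with the command */
            char *args[] = { line, NULL };    /* no argument splitting here */
            execvp(args[0], args);
            perror("execvp");
            _exit(127);
        }

        /* parent (the shell): wait, collect the exit status, prompt again */
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status))
            printf("[exit status %d]\n", WEXITSTATUS(status));
    }
    return 0;
}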
can't we just execute the command in the current process?
Sure we can, but that would then replace the shell program with the program of the command called, and that's probably not something you want in this particular application. There are, in fact, many situations in which replacing the process's program via execve is the most straightforward way to implement something. But in the case of a shell, that's likely not what you want.
You should not think of processes as something to be avoided or "feared". As a matter of fact, segregating different things into different processes is the foundation of reliability and security features. Processes are (mostly) isolated from each other, so if a process gets terminated for whatever reason (bug, crash, etc.), in the first instance this affects only that particular process.
Here's something to try out:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int segfault_crash()
{
    fprintf(stderr, "I will SIGSEGV...\n");
    fputs(NULL, stderr);   /* deliberate undefined behavior */
    return 0;
}

int main(void)
{
    int status = -1;
    pid_t const forked_pid = fork();
    if( -1 == forked_pid ){
        perror("fork");
        return 1;
    }
    if( 0 == forked_pid ){
        return segfault_crash();
    }
    waitpid(forked_pid, &status, 0);
    if( WIFSIGNALED(status) ){
        fprintf(stderr, "Child process %lld terminated by signal %d\n",
                (long long)forked_pid,
                (int)WTERMSIG(status) );
    } else {
        fprintf(stderr, "Child process %lld terminated normally\n",
                (long long)forked_pid);
    }
    return 0;
}
This little program forks itself, then calls a function that deliberately performs undefined behavior, which on commonplace systems triggers some kind of memory protection fault (Access Violation on Windows, Segmentation Fault on *nix systems). But because this crash has been isolated into a dedicated process, the parent process (and any siblings) does not crash together with it.
Furthermore, processes may drop their privileges, limit themselves to only a subset of system calls, and be moved into namespaces/containers, each of which prevents a bug in the process from damaging the rest of the system. This is how modern browsers (for example) implement sandboxing to improve security.
When using fork(), is it possible to ensure that the child process executes before the parent without using wait() in the parent?
This is related to a homework problem in the Process API chapter of Operating Systems: Three Easy Pieces, a free online operating systems book.
The problem says:
Write another program using fork(). The child process should
print "hello"; the parent process should print "goodbye". You should
try to ensure that the child process always prints first; can you do
this without calling wait() in the parent?
Here's my solution using wait():
#include <stdio.h>
#include <stdlib.h> // exit
#include <sys/wait.h> // wait
#include <unistd.h> // fork
int main(void) {
    pid_t f = fork();
    if (f < 0) {          // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (f == 0) {  // child
        printf("hello\n");
    } else {              // parent
        wait(NULL);
        printf("goodbye\n");
    }
}
After thinking about it, I decided the answer to the last question was "no, you can't", but then a later question seems to imply that you can:
Now write a program that uses wait() to wait for the child process
to finish in the parent. What does wait() return? What happens if
you use wait() in the child?
Am I interpreting the second question wrong? If not, how do you do what the first question asks? How can I make the child print first without using wait() in the parent?
I hope this answer is not too late.
Minutes ago, I emailed Remzi (this book's author) and got such a reply (extracting some segments):
Without calling wait() is hard, and not really the main point.
What you did -- learning about signals on your own -- is a good sign,
showing you will seek out deeper knowledge. Good for you!
Later, you'll be able to use a shared memory segment, and
either condition variables or semaphores, to solve this problem.
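For completeness, here is one way to do that with a process-shared POSIX semaphore placed in an anonymous shared mapping. This is only a sketch (Linux/POSIX; you may need to link with -pthread), not the book's intended solution:

#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* Put an unnamed POSIX semaphore in a shared anonymous mapping
       so both parent and child see the same object after fork. */
    sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED) { perror("mmap"); exit(1); }
    if (sem_init(sem, 1, 0) == -1) { perror("sem_init"); exit(1); }  /* pshared = 1 */

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }

    if (pid == 0) {            /* child */
        printf("hello\n");
        fflush(stdout);
        sem_post(sem);         /* let the parent proceed */
    } else {                   /* parent */
        sem_wait(sem);         /* block until the child has printed */
        printf("goodbye\n");
    }
    return 0;
}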
Create a pipe in the parent. After fork, close the write half in the parent and the read half in the child.
Then, poll for readability. Since the child never writes to it, it will wait until the child (and all grandchildren, unless you take special care) no longer exists, at which time poll will give a "read with hangup" response. (Alternatively, you could actually communicate over the pipe).
You should read about O_CLOEXEC. As a general rule, that flag should always be set unless you have a good reason to clear it.
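A minimal sketch of the pipe trick; it uses a blocking read() that returns 0 at end-of-file instead of poll(), which amounts to the same thing for this purpose:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    char buf;

    if (pipe(fds) == -1) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }

    if (pid == 0) {                 /* child: keep only the write end */
        close(fds[0]);
        printf("hello\n");
        return 0;                   /* exiting closes the write end */
    }

    /* parent: keep only the read end */
    close(fds[1]);
    /* The child never writes, so read() blocks until the write end is
       closed (i.e. the child has exited) and then returns 0 (EOF). */
    while (read(fds[0], &buf, 1) > 0)
        ;
    printf("goodbye\n");
    return 0;
}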
I can't see why second question would imply that answer is "yes" to the first.
Yes, there are plenty of solutions to achieve what is asked, but of course I suspect that none of them are in the "spirit" of the problem/question, where the focus is on the fork/wait primitives. The point is always to remember that you can't assume anything after a fork about the way the processes run relative to each other.
To ensure the child process prints first, you need some kind of synchronization between the two processes, and there are a lot of system primitives that have a semantics of "communication" between processes (for example locks, semaphores, signals, etc.). I doubt any of these are meant to be used here, as they are generally introduced slightly later in such a course.
Any other attempt that relies only on timing assumptions (like using sleep or loops to "slow down" the parent, etc.) can lead to failure, meaning that you will not be able to prove that it always succeeds. Even if testing would probably suggest that it seems correct, most of the runs you try will not have the bad characteristics that lead to failure. Remember that, except in real-time OSes, scheduling is at best an approximation of fair concurrency.
NOTE:
As Jonathan Leffler commented, I also suppose that using other wait-like primitives is forbidden (e.g. wait4, waitpid, etc.) -- the "spirit" argument.
I'm not sure whether this is against the spirit of the question, but calling the pause system call in the parent process branch will block the parent until a signal arrives, which gives the child a chance to run first (if it didn't already run); the child then has to wake the parent by sending it a signal.
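Here is a sketch of that idea; it uses sigsuspend() rather than a bare pause() so the parent cannot miss the signal if the child runs before the parent starts waiting (SIGUSR1 is just an arbitrary choice):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void handler(int sig) { (void)sig; }    /* only needed to interrupt the wait */

int main(void) {
    sigset_t block, old;

    /* Block SIGUSR1 before forking so the child's signal cannot be
       delivered before the parent is ready to wait for it. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);
    signal(SIGUSR1, handler);

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }

    if (pid == 0) {                    /* child */
        printf("hello\n");
        fflush(stdout);
        kill(getppid(), SIGUSR1);      /* wake the parent */
        return 0;
    }

    sigsuspend(&old);                  /* parent: atomically unblock and wait */
    printf("goodbye\n");
    return 0;
}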
#include <unistd.h>
#include <stdio.h>

int main(){
    fork();
    return 0;
}
In my understanding, fork() will copy the parent process and run it as a child process; if that is the case, would the program above break? The way I am understanding this program is that it will call fork() indefinitely and eventually cause a stack overflow.
According to the POSIX specification:
Both processes shall continue to execute from the fork() function.
So, both processes will continue after the call to fork(), and both will immediately terminate.
The fork call does not make either the child or the parent process go back to the beginning of main and start over. It returns like a normal function, but it does it twice, once in the child and once in the parent, with different return values so you can tell which is which.
So, in your program, fork succeeds and then both processes go on to the return 0 and exit. Nothing bad will happen.
A variation will cause problems, though:
#include <unistd.h>

int
main(void)
{
    for (;;)
        fork();
    /* not reached */
}
This is called a "fork bomb". Because it calls fork inside an infinite loop, never checking whether it's the parent or the child, the original process becomes two processes, and then four, and then eight, and ... until you run out of RAM, or at least process IDs. And it doesn't check for failure either, so it doesn't stop after the fork calls start failing. All of these processes will continue chewing up CPU forever, and none of the other programs running on the computer will be able to make forward progress.
Back in the days of mammoths and SunOS 4 it was even worse than that, a fork bomb would be liable to tickle a kernel bug and outright crash the minicomputer, and then the BOFH would come looking for you and he or she would not be happy. I would expect a modern kernel not to crash, and you might even be able to kill off the entire exponential process tree with control-C, but I'm not going to try it just to find out.
Incidentally, return_type whatever() is bad style in C, because for historical reasons it means whatever takes any number of arguments. Always write return_type whatever(void) instead.
How do you run an external program and pass it command line parameters using C? If you have to use operating system API, include a solution for Windows, Mac, and Linux.
It really depends on what you're trying to do, exactly, as it's:
OS dependent
Not quite clear what you're trying to do.
Nevertheless, I'll try to provide some information for you to decide.
On UNIX, fork() creates a clone of your process from the place where you called fork. Meaning, if I have the following process:
#include <unistd.h>
#include <stdio.h>
int main()
{
    printf( "hi 2 u\n" );
    pid_t mypid = fork();
    if( 0 == mypid )
        printf( "lol child\n" );
    else
        printf( "lol parent\n" );
    return( 0 );
}
The output will look as follows:
hi 2 u
lol child
lol parent
When you fork() the pid returned in the child is 0, and the pid returned in the parent is the child's pid. Notice that "hi 2 u" is only printed once... by the parent.
execve() and its family of functions are almost always used with fork(). execve() and the like replace the current process image with the program whose path you pass to it: you fork a child process, and if you're the parent you keep doing whatever you need to do, while the child execs the new program. execve() is also almost always used with waitpid() -- waitpid takes the pid of a child process and, quite literally, waits until that child terminates, giving you the child's exit status.
Using this information, you should be able to write a very basic shell; one that takes process names on the command line and runs processes you tell it to. Of course, shells do more than that, like piping input and output, but you should be able to accomplish the basics using fork(), execve() and waitpid().
NOTE: This is *nix specific! This will NOT work on Windows.
Hope this helped.
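As a concrete sketch of the fork()/execvp()/waitpid() pattern described above (the command "ls -l /tmp" is just an arbitrary example of passing parameters):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* argv for the child: program name, its arguments, then NULL */
    char *args[] = { "ls", "-l", "/tmp", NULL };

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                      /* child: become the new program */
        execvp(args[0], args);
        perror("execvp");                /* only reached if exec failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) { perror("waitpid"); return 1; }
    if (WIFEXITED(status))
        printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}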
If you want to perform more complicated operations, like reading the output of the external program, you may be better served by the popen library function. For example, to programmatically access a directory listing (a somewhat silly, but illustrative, example), you could write something like this:
#include <stdio.h>

int main(void)
{
    int entry = 1;
    char line[200];
    FILE* output = popen("/usr/bin/ls -1 /usr/man", "r");
    if (!output) {
        perror("popen");
        return 1;
    }
    while ( fgets(line, sizeof line, output) )
    {
        printf("%5d: %s", entry++, line);
    }
    pclose(output);
    return 0;
}
to give output like this
1: cat1
2: cat1b
3: cat1c
4: cat1f
5: cat1m
6: cat1s
...
#include <stdlib.h>

int main()
{
    system("echo HAI");
    return 0;
}
I want to give a big warning: do not use system, and 100% never use system when you write a library. It was designed 30 years ago, when multithreading was unknown to the toy operating system called Unix, and it is still not really usable even though almost all programs are multithreaded today.
Use popen or do a fork+execvp; everything else will give you hard-to-find problems with signal handling, crashes in environment-handling code, etc. It's pure evil, and a shame that the selected and most upvoted answer is promoting the use of "system". It would be healthier to promote the use of cocaine in the workplace.
On UNIX, I think you basically need to fork if you want the spawned process to run detached from the spawning one, for instance if you don't want your spawned process to be terminated when you quit your spawning process.
Here is a page that explains all the subtle differences between Fork, System, Exec.
If you work on Windows, Mac and Linux, I can recommend the Qt framework and its QProcess object, but I don't know if that's an option for you. The great advantage is that you will be able to compile the same code on Windows, Linux and Mac:
QString program = "./yourspawnedprogram";
QProcess * spawnedProcess = new QProcess(parent);
spawnedProcess->start(program);
// or spawnedProcess->startDetached(program);
And as a bonus, you can even kill the child process from the parent process,
and keep communicating with it through a stream.
One solution is the system function defined in stdlib.h
int system(const char *string);
If you need to check/read/parse the output of your external command, I would suggest using popen() instead of system().
Speaking of platform-dependent recipes: on Windows use CreateProcess, on POSIX (Linux, Mac) use fork + execvp. But system() should cover your basic needs and is part of the standard library.
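For the Windows side, a minimal CreateProcess sketch might look like this (error handling kept short; "notepad.exe" is just a placeholder command line):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    char cmdline[] = "notepad.exe";   /* placeholder command line */

    ZeroMemory(&si, sizeof si);
    si.cb = sizeof si;
    ZeroMemory(&pi, sizeof pi);

    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed (%lu)\n", GetLastError());
        return 1;
    }

    /* Wait for the child and collect its exit code, much like waitpid on POSIX. */
    WaitForSingleObject(pi.hProcess, INFINITE);
    DWORD code = 0;
    GetExitCodeProcess(pi.hProcess, &code);
    printf("child exited with code %lu\n", code);

    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}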