Different outputs with fork() - c

can anyone explain why the output of
main()
{
printf("hello ");
fork();
printf("hello ");
}
is:
hello hello hello hello
and the output of:
main()
{
printf("hello\n");
fork();
printf("hello ");
}
is:
hello
hello hello
what difference does \n make w.r.t to buffer?

When you fork the memory of the process is copied. This includes stdio buffers, so if the hello stays in the buffer it will be printed by both processes. Both processes go on about their business and eventually flush their buffers and you see "hello" twice.
Now on most implementations stdout is line-buffered which means a \n triggers a flush. So when the fork happens the buffer is empty. A sure fire way to prevent this would be to flush everything before forking.
EDIT
so why does the hello appears twice in the second line of second
output
There are now two processes (parent & child) executing the same code so that printf is executed twice.

While cnicutar's answer describes the mechanism behind what's happening in common implementations, the core issue is that your program is invoking undefined behavior. POSIX places rules on switching the "active handle" for an open file:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05_01
fork is one situation where new handles for a file come into existence, and unless you have prepared for switching the active handle prior to fork, it's undefined behavior to use both of them after fork:
Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec functions or _exit() (not exit()), the handle is never accessed in that process.)

Related

when a child process is created using fork() system call, where the child process starts execution? [duplicate]

This question already has answers here:
printf anomaly after "fork()"
(3 answers)
Closed 8 years ago.
fork() creates a new process and the child process starts to execute from the current state of the parent process.
This is the thing I know about fork() in Linux.
So, accordingly the following code:
int main() {
printf("Hi");
fork();
return 0;
}
needs to print "Hi" only once as per the above.
But on executing the above in Linux, compiled with gcc, it prints "Hi" twice.
Can someone explain to me what is happening actually on using fork() and if I have understood the working of fork() properly?
(Incorporating some explanation from a comment by user #Jack)
When you print something to the "Standard Output" stdout (computer monitor usually, although you can redirect it to a file), it gets stored in temporary buffer initially.
Both sides of the fork inherit the unflushed buffer, so when each side of the fork hits the return statement and ends, it gets flushed twice.
Before you fork, you should fflush(stdout); which will flush the buffer so that the child doesn't inherit it.
stdout to the screen (as opposed to when you're redirecting it to a file) is actually buffered by line ends, so if you'd done printf("Hi\n"); you wouldn't have had this problem because it would have flushed the buffer itself.
printf("Hi"); doesn't actually immediately print the word "Hi" to your screen. What it does do is fill the stdout buffer with the word "Hi", which will then be shown once the buffer is 'flushed'. In this case, stdout is pointing to your monitor (assumedly). In that case, the buffer will be flushed when it is full, when you force it to flush, or (most commonly) when you print out a newline ("\n") character. Since the buffer is still full when fork() is called, both parent and child process inherit it and therefore they both will print out "Hi" when they flush the buffer. If you call fflush(stout); before calling fork it should work:
int main() {
printf("Hi");
fflush(stdout);
fork();
return 0;
}
Alternatively, as I said, if you include a newline in your printf it should work as well:
int main() {
printf("Hi\n");
fork();
return 0;
}
In general, it's very unsafe to have open handles / objects in use by libraries on either side of fork().
This includes the C standard library.
fork() makes two processes out of one, and no library can detect it happening. Therefore, if both processes continue to run with the same file descriptors / sockets etc, they now have differing states but share the same file handles (technically they have copies, but the same underlying files). This makes bad things happen.
Examples of cases where fork() causes this problem
stdio e.g. tty input/output, pipes, disc files
Sockets used by e.g. a database client library
Sockets in use by a server process - which can get strange effects when a child to service one socket happens to inherit a file handle for anohter - getting this kind of programming right is tricky, see Apache's source code for examples.
How to fix this in the general case:
Either
a) Immediately after fork(), call exec(), possibly on the same binary (with necessary parameters to achieve whatever work you intended to do). This is very easy.
b) after forking, don't use any existing open handles or library objects which depend on them (opening new ones is ok); finish your work as quickly as possible, then call _exit() (not exit() ). Do not return from the subroutine that calls fork, as that risks calling C++ destructors etc which may do bad things to the parent process's file descriptors. This is moderately easy.
c) After forking, somehow clear up all the objects and make them all in a sane state before having the child continue. e.g. close underlying file descriptors without flushing data which are in a buffer which is duplicated in the parent. This is tricky.
c) is approximately what Apache does.
printf() does buffering. Have you tried printing to stderr?
Technical answer:
when using fork() you need to make sure that exit() is not called twice (falling off of main is the same as calling exit()). The child (or rarely the parent) needs to call _exit instead. Also, don't use stdio in the child. That's just asking for trouble.
Some libraries have a fflushall() you can call before fork() that makes stdio in the child safe. In this particular case it would also make exit() safe but that is not true in the general case.

How fork() and scanf() work together?

I tried to see what happens if I read something from keyboard while I have multiple processes with fork() (in my case there are two children and a parent) and I discovered the following problem: I need to tell the parent to wait for children's processes, otherwise the program behaves strangely.
I did a research and I found that the problem is with the parent, he needs to wait for the child's process to end because if the parent's process ends first somehow he closes the STDIN, am I right? But also I found that every process has a copy of STDIN so my question is:
Why it works this way and why only the parent has the problem with STDIN and the children not, I mean why if the child's process ends first doesn't affect STDIN but if the parent's process ends first it does affect STDIN?
Here are my tests:
I ran the program without wait() and after I typed a number the program stopped, but then I pressed enter two more times and the other two messages from printf() appeared.
When I ran the program with wait() everything worked fine, every process called scanf() separately and read a different number.
Well, a lot of stuff is going on here. I will try to explain it step by step.
When you start your terminal, the terminal creates a special file having path /dev/pts/<some number>. Then it starts your shell (which is bash in this case) and links the STDIN, STDOUT and STDERR of the bash process to this special file. This file is called a special file because it doesn't actually exist on your hard disk. Instead, whatever you write to this file, it goes directly to the terminal and the terminal renders it on the screen. (Similarly, whenever you try to read from this file, the read blocks until someone types something at the terminal).
Now when you launch your program by typing ./main, bash calls the fork function in order create a new process. The child process execs your executable file, while the parent process waits for the child to terminate. Your program then calls fork twice and we have three processes trying to read their STDINs, ie the same file /dev/pts/something. (Remember that calling fork and exec duplicates and preserves the file descriptors respectively).
The three processes are in race condition. When you enter something at the terminal, one of the three processes will receive it (99 out of 100 times it would be the parent process since the children have to do more work before reaching scanf statement).
So, parent process prints the number and exits first. The bash process that was waiting for the parent to finish, resumes and puts the STDIN into a so called "non-canonical" mode, and calls read in order to read the next command. Now again, three processes (Child1, Child2 and bash) are trying to read STDIN.
Since the children are trying to read STDIN for a longer time, the next time you enter something it will be received by one of the children, rather than bash. So you think of typing, say, 23. But oops! Just after you press the 2 key, you get Your number is: 2. You didn't even press the Enter key! That happened because of this so called "non-canonical" mode. I won't be going into what and why is that. But for now, to make things easier, use can run your program on sh instead of bash, since sh doesn't put STDIN into non-canonical mode. That will make the picture clear.
TL;DR
No, parent process closing its STDIN doesn't mean that its children or other process won't be able to use it.
The strange behavior you are seeing is because when the parent exits, bash puts the pty (pseudo terminal) into non-canonical mode. If you use sh instead, you won't see that behavior. Read up on pseudo terminals, and line discipline if you want to have a clear understading.
The shell process will resume as soon as the parent exits.
If you use wait to ensure that parents exits last, you won't have any problem, since the shell won't be able to run along with your program.
Normally, bash makes sure that no two foreground processes read from STDIN simultaneously, so you don't see this strange behavior. It does this by either piping STDOUT of one program to another, or by making one process a background process.
Trivia: When a background process tries to read from its STDIN, it is sent a signal SIGTTIN, which stops the process. Though, that's not really relevant to this scenario.
There are several issues that can happen when multiple processes try to do I/O to the same TTY. Without code, we can't tell which may be happening.
Trying to do I/O from a background process group may deliver a signal: SIGTTIN for input (usually enabled), or SIGTTOU for output (usually disabled)
Buffering: if you do any I/O before the fork, any data that has been buffered will be there for both processes. Under some conditions, using fflush may help, but it's better to avoid buffering entirely. Remember that, unlike output buffering, it is impossible to buffer input on a line-by-line basis (although you can only buffer what is available, so it may appear to be line-buffered at first).
Race conditions: if more than one process is trying to read the same pipe-like file, it is undefined which one will "win" and actually get the input each time it is available.

General Quick fork() explanation

suppose if we have something like this:
printf("A");
fork();
printf("B");
Is the output going to be
1) ABAB
2) ABB
Can you please explain?
The right answer is that it depends on the buffering mode of stdout, which the other answers seem to be ignoring.
When you fork with unflushed buffers and then continue using stdio in both processes (instead of the usual quick execve or _exit in the child process), the stuff that was in the buffer at the time of the fork can be printed twice, once by each process.
It is undefined and ABB, ABAB and AABB is possible. First (ABB) can happen on unbuffered output only; with buffered output both processes will have the A in their output buffer. By calling fflush(3) before the fork(2) you can enforce this behavior.
Order of last chars depends on order of execution; most likely you will get ABAB as in this short example the program won't be interrupted by the scheduler.
The output should read as "ABB"
fork copies the entire program into a new memory space and continues from the fork. Since both processes are running the same code, I would save the process id that is returned from fork, so the rest of your program knows what to do.
the output is simply abb. Fork create one new child process so after execution of fork() two process will run and generally child process first get the chance to executes. See before fork only one process so the out put before fork only once A. Then fork executes and hence two different process now running each will print B and hence output is ABB.

Standard streams and vfork

I am playing a bit with fork/vfork functions, and there is something that is puzzling to me. In Stevens book it is written that:
Note in Figure 8.3 that we call _exit instead of exit.
As we described in Section 7.3, _exit does not perform any flushing of standard I/O buffers. If we call exit instead, the results are indeterminate.
Depending on the implementation of the standard I/O library, we might see no difference in the output, or we might find that the output from the parent's printf has disappeared.
If the child calls exit, the implementation flushes the standard I/O streams.
If this is the only action taken by the library, then we will see no difference with the output generated if the child called _exit.
If the implementation also closes the standard I/O streams, however, the memory representing the FILE object for the standard output will be cleared out.
Because the child is borrowing the parent's address space, when the parent resumes and calls printf, no output will appear and printf will return -1.
Note that the parent's STDOUT_FILENO is still valid, as the child gets a copy of the parent's file descriptor array (refer back to Figure 8.2).
Most modern implementations of exit will not bother to close the streams.
Because the process is about to exit, the kernel will close all the file descriptors open in the process.
Closing them in the library simply adds overhead without any benefit.
so I tried to test if I can get printf error, in my manual of vfork there is:
All open stdio(3) streams are flushed and closed. Files created by tmpfile(3) are removed.
but when I compile and execute this program:
#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
int main()
{
int s;
pid_t ret;
if (vfork() == 0)
{
//abort();
exit(6);
}
else
{
ret=wait(&s);
printf("termination status to %d\n",s);
if (WIFEXITED(s))
printf("normalnie, status to %d\n",WEXITSTATUS(s));
}
return 0;
}
everything is working fine, I don't get any printf errors. Why is that?
The end of the paragraph you quoted says:
Most modern implementations of exit will not bother to close the streams. Because the process is about to exit, the kernel will close all the file descriptors open in the process. Closing them in the library simply adds overhead without any benefit.
This is most likely what's happening. Your OS doesn't actually close the stream (but it does probably flush it).
The important thing isn't what exit does here, its the underlying concept. The child is sharing the parent's memory and stack frame. That means that the child can very easily change something that the parent did not expect, which could easily cause the parent to crash or misbehave when it starts running again. The man page for vfork says the only thing a process can do is call exit() or an exec. In fact, the child should not even allocate memory or modify any variables.
To see the impact of this, try putting the vfork call inside of a function and let the child return or modify some variables there and see what happens.

fork starts executing form where?

To my previous question about segmentation fault ,I got very useful answers.Thanks for those who have responded.
#include<stdio.h>
main()
{
printf("hello");
int pid = fork();
wait(NULL);
}
output: hellohello.
In this the child process starts executing form the beginning.
If Iam not wrong , then how the program works if I put the sem_open before fork()
(ref answers to :prev questions)
I need a clear explanation about segmentation fault which happens occasionally and not always. And why not always... If there is any error in coding then it should occur always right...?
fork creates a clone of your process. Conceptually speaking, all state of the parent also ends up in the child. This includes:
CPU registers (including the instruction pointer, which defines where in the code your program is)
Memory (as an optimization your kernel will most likely mark all pages as copy-on-write, but semantically speaking it should be the same as copying all memory.)
File descriptors
Therefore... Your program will not "start running" from anywhere... All the state that you had when you called fork will propagate to the child. The child will return from fork just as the parent will.
As for what you can do after a fork... I'm not sure about what POSIX says, but I wouldn't rely on semaphores doing the right thing after a fork. You might need an inter-process semaphore (see man sem_open, or the pshared parameter of sem_init). In my experience cross-process semaphores aren't really well supported on free Unix type OS's... (Example: Some BSDs always fail with ENOSYS if you ever try to create one.)
#GregS mentions the duplicated "hello" strings after a fork. He is correct to say that stdio (i.e. FILE*) will buffer in user-space memory, and that a fork leads to the string being buffered in two processes. You might want to call fflush(stdout); fflush(stderr); and flush any other important FILE* handles before a fork.
No, it starts from the fork(), which returns 0 in the child or the child's process ID in the parent.
You see "hello" twice because the standard output is buffered, and has not actually been written at the point of the fork. Both parent and child then actually write the buffered output. If you fflush(stdout); after the printf(), you should see it only once.

Resources