Standard streams and vfork - c

I am playing a bit with the fork/vfork functions, and there is something that puzzles me. In Stevens' book it is written that:
Note in Figure 8.3 that we call _exit instead of exit.
As we described in Section 7.3, _exit does not perform any flushing of standard I/O buffers. If we call exit instead, the results are indeterminate.
Depending on the implementation of the standard I/O library, we might see no difference in the output, or we might find that the output from the parent's printf has disappeared.
If the child calls exit, the implementation flushes the standard I/O streams.
If this is the only action taken by the library, then we will see no difference with the output generated if the child called _exit.
If the implementation also closes the standard I/O streams, however, the memory representing the FILE object for the standard output will be cleared out.
Because the child is borrowing the parent's address space, when the parent resumes and calls printf, no output will appear and printf will return -1.
Note that the parent's STDOUT_FILENO is still valid, as the child gets a copy of the parent's file descriptor array (refer back to Figure 8.2).
Most modern implementations of exit will not bother to close the streams.
Because the process is about to exit, the kernel will close all the file descriptors open in the process.
Closing them in the library simply adds overhead without any benefit.
So I tried to test whether I can get a printf error. In my manual page for exit there is:
All open stdio(3) streams are flushed and closed. Files created by tmpfile(3) are removed.
but when I compile and execute this program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main()
{
    int s;
    pid_t ret;
    if (vfork() == 0)
    {
        //abort();
        exit(6);
    }
    else
    {
        ret = wait(&s);
        printf("termination status is %d\n", s);
        if (WIFEXITED(s))
            printf("exited normally, status is %d\n", WEXITSTATUS(s));
    }
    return 0;
}
everything is working fine, I don't get any printf errors. Why is that?

The end of the paragraph you quoted says:
Most modern implementations of exit will not bother to close the streams. Because the process is about to exit, the kernel will close all the file descriptors open in the process. Closing them in the library simply adds overhead without any benefit.
This is most likely what's happening. Your C library doesn't actually close the stream (but it probably does flush it).
The important thing isn't what exit does here, it's the underlying concept. The child is sharing the parent's memory and stack frame. That means that the child can very easily change something that the parent did not expect, which could easily cause the parent to crash or misbehave when it starts running again. The man page for vfork says the only thing the child can safely do is call _exit() or one of the exec functions. In fact, the child should not even allocate memory or modify any variables.
To see the impact of this, try putting the vfork call inside of a function and let the child return or modify some variables there and see what happens.

Related

C file pointer changing after fork and (failed) exec

I wrote a program that calls fork, and I thought the child would not affect the parent.
But the file pointer changes even though I did not make any changes in the parent.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void) {
FILE *fp = fopen("sm.c", "r");
char buf[1000];
char *args[] = {"invailid_command", NULL};
fgets(buf, sizeof(buf), fp);
printf("I'm one %d %ld\n", getpid(), ftell(fp));
if (fork() == 0) {
execvp(args[0], args);
exit(EXIT_FAILURE);
}
wait(NULL);
printf("I'm two %d %ld\n", getpid(), ftell(fp));
}
This outputs
I'm one 21500 20
I'm two 21500 -1
I want the file pointer not to change between the two printf calls.
Why does the file pointer change, and can I keep it unchanged even though execvp fails?
Credit to Jonathan Leffler for pointing us in the right direction.
Although your program does not produce the same unexpected behavior for me on CentOS 7 / GCC 4.8.5 / GLIBC 2.17, it is plausible that you observe different behavior. Your program's behavior is in fact undefined according to POSIX (on which you rely for fork). Here are some excerpts from the relevant section (emphasis added):
An open file description may be accessed through a file descriptor,
which is created using functions such as open() or pipe(), or through
a stream, which is created using functions such as fopen() or popen().
Either a file descriptor or a stream is called a "handle" on the open
file description to which it refers; an open file description may have
several handles.
[...]
The result of function calls involving any one handle (the "active
handle") is defined elsewhere in this volume of POSIX.1-2017, but if
two or more handles are used, and any one of them is a stream, the
application shall ensure that their actions are coordinated as
described below. If this is not done, the result is undefined.
[...]
For a handle to become the active handle, the application shall ensure
that the actions below are performed between the last use of the
handle (the current active handle) and the first use of the second
handle (the future active handle). The second handle then becomes the
active handle. [...]
The handles need not be in the same process for these rules to apply.
Note that after a fork(), two handles exist where one existed before.
The application shall ensure that, if both handles can ever be
accessed, they are both in a state where the other could become the
active handle first. [Where subject to the preceding qualification, the] application shall prepare for a fork()
exactly as if it were a change of active handle. (If the only action
performed by one of the processes is one of the exec functions or
_exit() (not exit()), the handle is never accessed in that process.)
For the first handle, the first applicable condition below applies.
[An impressively long list of alternatives that do not apply to the OP's situation ...]
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of
seeking, the application shall either perform an fflush(), or the
stream shall be closed.
For the second handle:
If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the
first handle, the application shall perform an lseek() or fseek() (as
appropriate to the type of handle) to an appropriate location.
Thus, for the OP's program to access the same stream in both parent and child, POSIX demands that the parent fflush() the stream (fp) before forking, and that the child fseek() it after starting. Then, after waiting for the child to terminate, the parent must fseek() the stream. Given that we know the child's exec will fail, however, the requirement for all the flushing and seeking can be avoided by having the child use _exit() (which does not access the stream) instead of exit().
Complying with POSIX's provisions yields the following:
When these rules are followed, regardless of the sequence of handles
used, implementations shall ensure that an application, even one
consisting of several processes, shall yield correct results: no data
shall be lost or duplicated when writing, and all data shall be
written in order, except as requested by seeks.
It is worth noting, however, that
It is
implementation-defined whether, and under what conditions, all input
is seen exactly once.
I appreciate that it may be somewhat unsatisfying to hear merely that your expectations for program behavior are not justified by the relevant standards, but that's really all there is. The parent and child processes do have some relevant shared data in the form of a common open file description (with which they have separate handles associated), and that seems likely to be the vehicle for the unexpected (and undefined) behavior, but there's no basis for predicting the specific behavior you see, nor the different behavior I see for the same program.
I was able to reproduce this on Ubuntu 16.04 with gcc 5.4.0. The culprit here is exit in conjunction with the way the child process is being created.
The man page for exit states the following:
The exit() function causes normal process termination and the value
of status & 0377 is returned to the parent (see wait(2)).
All functions registered with atexit(3) and on_exit(3) are
called, in the reverse order of their registration. (It is possible
for one of these functions to use atexit(3) or on_exit(3) to
register an additional function to be executed during exit processing;
the new registration is added to the front of the list of
functions that remain to be called.) If one of these functions does
not return (e.g., it calls _exit(2), or kills itself with a
signal), then none of the remaining functions is called, and further
exit processing (in particular, flushing of stdio(3) streams)
is abandoned. If a function has been registered multiple times
using atexit(3) or on_exit(3), then it is called as many times as
it was registered.
All open stdio(3) streams are flushed and closed. Files created by
tmpfile(3) are removed.
The C standard specifies two constants, EXIT_SUCCESS and
EXIT_FAILURE, that may be passed to exit() to indicate successful or
unsuccessful termination, respectively.
So when you call exit in the child it closes the FILE represented by fp.
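Normally when a child process is created with fork, it gets its own copy of the parent's memory and of the parent's file descriptors. However, the copied descriptors still refer to the same open file description as the parent's, including its file offset. So when exit flushes and closes the FILE in the child, it can change that shared state, and that affects the parent.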
If you change the child to instead call _exit, it closes the child's file descriptor but manages to not touch the FILE object and the second call to ftell in the parent will succeed. It's good practice to use _exit in a non-exec'ed child anyway because it prevents atexit handlers from being called in the child.

when a child process is created using fork() system call, where the child process starts execution? [duplicate]

fork() creates a new process and the child process starts to execute from the current state of the parent process.
This is the thing I know about fork() in Linux.
So, accordingly the following code:
int main() {
printf("Hi");
fork();
return 0;
}
needs to print "Hi" only once as per the above.
But on executing the above in Linux, compiled with gcc, it prints "Hi" twice.
Can someone explain to me what is happening actually on using fork() and if I have understood the working of fork() properly?
(Incorporating some explanation from a comment by user @Jack)
When you print something to the "Standard Output" stdout (computer monitor usually, although you can redirect it to a file), it gets stored in temporary buffer initially.
Both sides of the fork inherit the unflushed buffer, so when each side of the fork hits the return statement and ends, it gets flushed twice.
Before you fork, you should fflush(stdout); which will flush the buffer so that the child doesn't inherit it.
stdout to the screen (as opposed to when you're redirecting it to a file) is actually buffered by line ends, so if you'd done printf("Hi\n"); you wouldn't have had this problem because it would have flushed the buffer itself.
printf("Hi"); doesn't actually immediately print the word "Hi" to your screen. What it does do is put the word "Hi" into the stdout buffer, which will then be shown once the buffer is 'flushed'. In this case, stdout is pointing to your monitor (presumably). In that case, the buffer will be flushed when it is full, when you force it to flush, or (most commonly) when you print out a newline ("\n") character. Since the buffer still holds "Hi" when fork() is called, both parent and child process inherit it and therefore they both will print out "Hi" when they flush the buffer. If you call fflush(stdout); before calling fork it should work:
#include <stdio.h>
#include <unistd.h>
int main() {
    printf("Hi");
    fflush(stdout);
    fork();
    return 0;
}
Alternatively, as I said, if you include a newline in your printf it should work as well:
#include <stdio.h>
#include <unistd.h>
int main() {
    printf("Hi\n");
    fork();
    return 0;
}
In general, it's very unsafe to have open handles / objects in use by libraries on either side of fork().
This includes the C standard library.
fork() makes two processes out of one, and no library can detect it happening. Therefore, if both processes continue to run with the same file descriptors / sockets etc, they now have differing states but share the same file handles (technically they have copies, but the same underlying files). This makes bad things happen.
Examples of cases where fork() causes this problem
stdio e.g. tty input/output, pipes, disc files
Sockets used by e.g. a database client library
Sockets in use by a server process, which can get strange effects when a child to service one socket happens to inherit a file handle for another. Getting this kind of programming right is tricky; see Apache's source code for examples.
How to fix this in the general case:
Either
a) Immediately after fork(), call exec(), possibly on the same binary (with necessary parameters to achieve whatever work you intended to do). This is very easy.
b) after forking, don't use any existing open handles or library objects which depend on them (opening new ones is ok); finish your work as quickly as possible, then call _exit() (not exit() ). Do not return from the subroutine that calls fork, as that risks calling C++ destructors etc which may do bad things to the parent process's file descriptors. This is moderately easy.
c) After forking, somehow clear up all the objects and make them all in a sane state before having the child continue. e.g. close underlying file descriptors without flushing data which are in a buffer which is duplicated in the parent. This is tricky.
c) is approximately what Apache does.
printf() does buffering. Have you tried printing to stderr?
Technical answer:
when using fork() you need to make sure that exit() is not called twice (falling off of main is the same as calling exit()). The child (or rarely the parent) needs to call _exit instead. Also, don't use stdio in the child. That's just asking for trouble.
Some libraries have a flush-all function you can call before fork() that makes stdio in the child safe; in standard C, fflush(NULL) flushes all open output streams. In this particular case it would also make exit() safe, but that is not true in the general case.

printf flush at program exit

I'm interested in knowing how the printf() function's flush works when the program exits.
Let's take the following code:
int main(int ac, char **av)
{
printf("Hi");
return 0;
}
In this case, how does printf() manage to flush its buffer to stdout?
I guess it's platform dependent, so let's take Linux.
It could be implemented using gcc's __attribute__((destructor)), but then the standard library would be compiler dependent. I assume this is not the way it works.
Any explanations or links to documentation are appreciated. Thank you.
The C runtime will register atexit() handlers to flush standard buffers when exit() is called.
See this explanation.
When the program exits normally, the exit function performs a clean shutdown of the standard I/O library, which causes all buffered output data to be flushed.
Returning an integer value from the main function is equivalent to calling exit with the same value. So return 0 has the same effect as exit(0).
If _Exit or _exit is called, the process is terminated immediately and the I/O buffers are not flushed.
Just to expand trofanjoe's response:
exit causes normal program termination. atexit functions are called in
reverse order of registration, open files are flushed, open streams
are closed, and control is returned to the environment.
and
Within main, return expr is equivalent to exit(expr). exit has the
advantage that it can be called from other functions
From man stdio on my machine here (emphasis added), which runs RHEL 5.8:
A file may be subsequently reopened, by the same or another
program execution, and its contents reclaimed or modified (if it can
be repositioned at the start). If the main function returns to its
original caller, or the exit(3) function is called, all open files are
closed (hence all output streams are flushed) before program
termination. Other methods of program termination, such as abort(3)
do not bother about closing files properly.

Different outputs with fork()

can anyone explain why the output of
main()
{
printf("hello ");
fork();
printf("hello ");
}
is:
hello hello hello hello
and the output of:
main()
{
printf("hello\n");
fork();
printf("hello ");
}
is:
hello
hello hello
What difference does \n make with respect to the buffer?
When you fork, the memory of the process is copied. This includes stdio buffers, so if the hello stays in the buffer it will be printed by both processes. Both processes go on about their business and eventually flush their buffers and you see "hello" twice.
Now on most implementations stdout is line-buffered when it refers to a terminal, which means a \n triggers a flush. So when the fork happens the buffer is empty. A sure-fire way to prevent the duplicate output is to flush everything before forking.
EDIT
so why does the hello appears twice in the second line of second
output
There are now two processes (parent & child) executing the same code so that printf is executed twice.
While cnicutar's answer describes the mechanism behind what's happening in common implementations, the core issue is that your program is invoking undefined behavior. POSIX places rules on switching the "active handle" for an open file:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05_01
fork is one situation where new handles for a file come into existence, and unless you have prepared for switching the active handle prior to fork, it's undefined behavior to use both of them after fork:
Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec functions or _exit() (not exit()), the handle is never accessed in that process.)

fork starts executing from where?

To my previous question about a segmentation fault, I got very useful answers. Thanks to those who responded.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
    printf("hello");
    pid_t pid = fork();
    wait(NULL);
    return 0;
}
output: hellohello.
In this case the child process seems to start executing from the beginning.
If I am not wrong, then how does the program work if I put the sem_open before fork()?
(See the answers to my previous question.)
I need a clear explanation of the segmentation fault, which happens occasionally and not always. And why not always? If there is an error in the code, then it should always occur, right?
fork creates a clone of your process. Conceptually speaking, all state of the parent also ends up in the child. This includes:
CPU registers (including the instruction pointer, which defines where in the code your program is)
Memory (as an optimization your kernel will most likely mark all pages as copy-on-write, but semantically speaking it should be the same as copying all memory.)
File descriptors
Therefore... Your program will not "start running" from anywhere... All the state that you had when you called fork will propagate to the child. The child will return from fork just as the parent will.
As for what you can do after a fork... I'm not sure about what POSIX says, but I wouldn't rely on semaphores doing the right thing after a fork. You might need an inter-process semaphore (see man sem_open, or the pshared parameter of sem_init). In my experience cross-process semaphores aren't really well supported on free Unix type OS's... (Example: Some BSDs always fail with ENOSYS if you ever try to create one.)
@GregS mentions the duplicated "hello" strings after a fork. He is correct to say that stdio (i.e. FILE*) will buffer in user-space memory, and that a fork leads to the string being buffered in two processes. You might want to call fflush(stdout); fflush(stderr); and flush any other important FILE* handles before a fork.
No, it starts from the fork(), which returns 0 in the child or the child's process ID in the parent.
You see "hello" twice because the standard output is buffered, and has not actually been written at the point of the fork. Both parent and child then actually write the buffered output. If you fflush(stdout); after the printf(), you should see it only once.

Resources