Different input/output stream for every forked process

Different input/output stream for every forked process - c

I have some code, where several processes are created by forking. Every process have popen() function to execute some shell command. Problem is that all of these processes use same input/output stream. This cause situation, when collision occurs because of processes write to one stream simultaneously.
Is there any way to resolve that problem, so that every forked process used it's own stream?
It is not allowed to do anything with forking in my case.

You'll have to close and reopen your stdin and stdout before or, if possible, right after the fork, in the child process.

When you call fork(), you inherit the file descriptors (stdin, stdout, etc) from the parent process. When you popen it's going to take the shared stdin/stdout and pipe it into the popened process. It sounds like you want to close any open file descriptors after forking, and reopen them.

Related

read() hangs on zombie process

I have a while loop that reads data from a child process using blocking I/O by redirecting stdout of the child process to the parent process. Normally, as soon as the child process exits, a blocking read() in this case will return since the pipe that is read from is closed by the child process.
Now I have a case where the read() call does not exit for a child process that finishes. The child process ends up in a zombie state, since the operating system is waiting for my code to reap it, but instead my code is blocking on the read() call.
The child process itself does not have any child processes running at the time of the hang, and I do not see any file descriptors listed when looking in /proc/<child process PID>/fd. The child process did however fork two daemon processes, whose purpose seems to be to monitor the child process (the child process is a proprietary application I do not have any control over, so it is hard to say for sure).
When run from a terminal, the child process I try to read() from exits automatically, and in turn the daemon processes it forked terminate as well.
Linux version is 4.19.2.
What could be the reason of read() not returning in this case?
Follow-up: How to avoid read() from hanging in the following situation?

The child process did however fork two daemon processes ... What could be the reason of read() not returning in this case?
Forked processes still have the file descriptor open when the child terminates. Hence read call never returns 0.
Those daemon processes should close all file descriptors and open files for logging.

A possible reason (the most common) for read(2) blocking on a pipe with a dead child, is that the parent has not closed the writing side of the pipe, so there's still an open (for writing) descriptor for that pipe. Close the writing side of the pipe in the parent process before reading from it. The child is dead (you said zombie) so it cannot be the process with the writing side of the pipe open. And don't forget to wait(2) for the child in the parent, or you'll get a system full of zombies :)
Remember, you have to do two closes in your code:
One in the parent process, to close the writing side of the pipe, leaving the parent process with only a reading descriptor.
One in the child process (just before exec(2)ing) closing the reading side of the pipe, leaving the child process only with a writing descriptor.
In case you want to use the pipe(2) to send information to the child, change the reading for writing and viceversa in the above two points.

Fork and pipe creation

My book on C applied to Linux, says that if a process creates a child with a fork(), then the pipe created between them follow this principle:
It is important to notice that both the parent process and the child process initially close their unused ends of the pipe
If both processes start with their pipe-end closed, how they know when the other is free to communicate? Maybe, is there an intermediate buffer between the processes?

Pipes on computers works very much like pipes in real life. There are two ends, you put something into one end and it comes out the other end.
Normally when using pipes in a program, you usually only want the input-end, where you write data, or you want the output-end, where data is read from. If the parent process only wants to write to the child process, and the child process only reads from the parent process, then the parent process could close the read end after the fork, and the child process can close the write end.

Pipe is an interprocess communication mechanism provided by the kernel. A process writing on the pipe need not worry whether there is some other process to read it. The communication is asynchronous. The kernel takes care of the data in transit.

Why parent process has to close all file descriptors of a pipe before calling wait( )?

I do not know why the parent process needs to close both the file descriptors of a pipe before calling wait()?
I have a C program which does:
Parent creates child_a, which executes ls -l using execvp, and writes to the pipe (after closing read end of pipe).
Parent creates another child (without closing any file descriptor for pipe), called child_b, which executes 'wc' by reading from pipe.(after closing write end of pipe).
Parent waits for both children to complete by calling wait() twice.
I noticed that program is blocked if parent does not close both file descriptors of the pipe before calling the wait() syscall. Also after reading few questions already posted online it looks like this is the general rule and needs to be done. But I could not find the reason why this has to be done?
Why does wait() not return if the parent does not close the file descriptors of the pipe?
I was thinking that, in the worst case, if the parent does not close the file descriptor of pipe, then the only consequence would be that the pipe would keep existing (which is a waste of resource). But I never thought this would block the execution of child process (as can be seen because wait() does not return).
Also remember, parent is not using the pipe at all. It is child_a writing in the pipe, and child_b reading from the pipe.

If the parent process doesn't close the write ends of the pipes, the child processes never get EOF (zero bytes read) because there's a process that might (but won't) write to the pipe. The child process must also close the write end of the pipe for the same reason — if it doesn't, there's a process (itself) that might (but won't) write to the pipe, so the read won't return EOF.
If you duplicate one end of a pipe to standard output or standard error, you should close both ends of that pipe. It is a common mistake not to have enough calls to close() in multiprocess code using pipes. Occasionally, you get away with being sloppy, but the details vary by case and usually you don't.

Why should you close a pipe in linux?

When using a pipe for process-process communication, what is the purpose of closing one end of the pipe?
For example: How to send a simple string between two programs using pipes?
Notice that one side of the pipe is closed in the child and parent processes. Why is this required?

If you connect two processes - parent and child - using a pipe, you create the pipe before the fork.
The fork makes the both processes have access to both ends of the pipe. This is not desirable.
The reading side is supposed to learn that the writer has finished if it notices an EOF condition. This can only happen if all writing sides are closed. So it is best if it closes its writing FD ASAP.
The writer should close its reading FD just in order not to have too many FDs open and thus reaching a maybe existing limit of open FDs. Besides, if the then only reader dies, the writer gets notified about this by getting a SIGPIPE or at least an EPIPE error (depending on how signals are defined). If there are several readers, the writer cannot detect that "the real one" went away, goes on writing and gets stuck as the writing FD blocks in the hope, the "unused" reader will read something.
So here in detail what happens:
parent process calls pipe() and gets 2 file descriptors: let's call it rd and wr.
parent process calls fork(). Now both processes have a rd and a wr.
Suppose the child process is supposed to be the reader.
Then
the parent should close its reading end (for not wasting FDs and for proper detection of dying reader) and
the child must close its writing end (in order to be possible to detect the EOF condition).

The number of file descriptors that can be open at a given time is limited. If you keep opening pipes and not closing them pretty soon you'll run out of FDs and can't open anything anymore: not pipes, not files, not sockets, ...
Another reason why it can be important to close the pipe is when the closing itself has a meaning to the application. For example, a common use of pipes is to send the errno from a child process to the parent when using fork and exec to launch an external program:
The parent creates the pipe, calls fork to create a child process, closes its writing end, and tries to read from the pipe.
The child process attempts to use exec to run a different program:
If exec fails, for example because the program does not exist, the child writes errno to the pipe, and the parent reads it and knows what went wrong, and can tell the user.
If exec is successful the pipe is closed without anything being written. The read function in the parent returns 0 indicating the pipe was closed and knows the program was successfully started.
If the parent did not close its writing end of the pipe before trying to read from the pipe this would not work because the read function would never return when exec is successful.

Closing unused pipe file descriptor is more than a matter of ensuring that a process doesn't exhaust its limited set of file descriptor-it is essential to the correct use of pipes. We now consider why the unused file descriptors for both the read and write ends of the pipe must be closed.
The process reading from the pipe closes its write descriptor for the pipe, so that, when the other process completes its output and closes its write descriptor, the read sees end-of-file (once it has ready any outstanding data in the pipe).
If the reading process doesn't close the write end of the pipe, then after the other process closes its write descriptor, the reader won't see end-of-file, even after it has read all data from the pipe. Instead, a read() would block waiting for data, because the kernel knows that there is still at least one write descriptor open for the pipe.That this descriptor is held open by the reading process itself is irrelevant; In theory, that process could still write to the pipe, even if it is blocked trying to read.
For example, the read() might be interrupted by a signal handler that writes data to the pipe.
The writing process closes its read descriptor for the pipe for a different reason.
When a process tries to write to a pipe for which no process has an open read descriptor, the kernel sends the SIGPIPE signal to the writing process. By default, this signal kills a process. A process can instead arrange to catch or ignore this signal, in which case the write() on the pipe fails with the error EPIPE (broken pipe). Receiving the SIGPIPE signal or getting the EPIPE error is useful indication about the status of the pipe, and this is why unused read descriptors for the pipe should be closed.
If the writing process doesn't close the read end of the pipe, then even after the other process closes the read end of the pipe, the writing process will fill the pipe, and a further attempt to write will block indefinitely.
One final reason for closing unused file descriptor is that only after it all file descriptor are closed that the pipe is destroyed and its resources released for reuse by other processes. At this point, any unread data in the pipe is lost.
~ Micheal Kerrisk , the Linux programming interface

how to communicate to a program with another external program

I'm trying to write to stdin and read from stdout ( and stderr ) from an external program, without changing the code.
I've tried using named pipes, but stdout doesn't show until the program is terminated and stdin only works on the first input( then cin is null ).
i've tried using /proc/[pid]/fd but that only writes and reads from the terminal and not the program.
i've tried writing a character device file for this and it worked, but only one program at a time ( this needs to work for multiple programs at a time ).
at this point, to my knowledge, I could write the driver that worked to multiplex the io across multiple programs but I don't think that's the "right" solution.
the main purpose of this is to view a feed of a program through a web interface. I'm sure there has to be someway to do this. is there anything I haven't tried that's been done before?

The typical way of doing this is:
Create anonymous pipes (not named pipes) with the pipe(2) system call for the new process's standard streams
Call fork(2) to spawn the child process
close(2) the appropriate ends of the pipes in both the parent and the child (e.g. for the stdin pipe, close the read end in the parent and close the write end in the child; vice-versa for the stdout and stderr pipes)
Use dup2(2) in the child to copy the pipe file descriptors onto file descriptors 0, 1, and 2, and then close(2) the remaining old descriptors
exec(3) the external application in the child process
In the parent process, simultaneously write to the child's stdin pipe and read from the child's stdout and stderr pipes. However, depending on how the child behaves, this can easily lead to deadlock if you're not careful. One way to avoid deadlock is to spawn separate threads to handle each of the 3 streams; another way is to use the select(2) system call to wait until one of the streams can be read from/written to without blocking, and then process that stream.
Even if you do this all correctly, you may still not see your program's output right away. This is typically due to buffering stdout. Normally, when stdout is going to a terminal, it's line-buffered—it gets flushed after every newline gets written. But when stdout is a pipe (or anything else that's not a terminal, like a file or a socket), it's fully buffered, and it only gets written to when the program has outputted a full buffer's worth of data (e.g. 4 KB).
Many programs have command line options to change their buffering behavior. For example, grep(1) has the --line-buffered flag to force it to line-buffer its output even when stdout isn't a terminal. If your external program has such an option, you should probably use it. If not, it's still possible to change the buffering behavior, but you have to use some sneaky tricks—see this question and this question for how to do that.