C fork/exec with non-blocking pipe IO


This seems to be a fairly common thing to do, and I've managed to teach myself everything that I need to make it work, except that I now have a single problem, which is defying my troubleshooting.
int nonBlockingPOpen(char *const argv[]){
    int inpipe;
    pid_t pid;
    /* open both ends of pipe nonblockingly */
    pid = fork();
    switch(pid){
    case 0: /*child*/
        sleep(1); /*child should open after parent has open for reading*/
        /*redirect stdout to opened pipe*/
        int outpipe = open("./fifo", O_WRONLY);
        /*SHOULD BLOCK UNTIL MAIN PROCESS OPENS FOR WRITING*/
        dup2(outpipe, 1);
        fcntl(1, F_SETFL, fcntl(1, F_GETFL) | O_NONBLOCK);
        printf("HELLO WORLD I AM A CHILD PROCESS\n");
        /*This seems to be written to the pipe immediately, blocking or not.*/
        execvp(*argv, argv);
        /*All output from this program, which outputs "one" sleeps for 1 second
         *outputs "two" sleeps for a second, etc, is captured only after the
         *exec'd program exits!
         */
        break;
    default: /*parent*/
        inpipe = open("./fifo", O_RDONLY | O_NONBLOCK);
        sleep(2);
        /*no need to do anything special here*/
        break;
    }
    return inpipe;
}
Why won't the child process write its stdout to the pipe each time a line is generated? Is there something I'm missing in the way execvp or dup2 work? I'm aware that my approach to all this is a bit strange, but I can't find another way to capture output of closed-source binaries programmatically.

I would guess you only get the exec'd program's output after it exits because it does not flush after each message. If so, there is nothing you can do from the outside.
I am not quite sure how this is supposed to relate to the choice between blocking and nonblocking I/O in your question. A non-blocking write may fail completely or partially: instead of blocking the program until room is available in the pipe, the call returns immediately and says that it was not able to write everything it should have. Non-blocking I/O neither makes the buffer larger nor forces output to be flushed, and it may be badly supported by some programs.
You cannot force the binary-only program that you are exec'ing to flush. If you thought that non-blocking I/O was a solution to that problem, sorry, but I'm afraid it is quite orthogonal.
EDIT: Well, if the exec'd program only uses the buffering provided by libc (does not implement its own) and is dynamically linked, you could force it to flush by linking it against a modified libc that flushes every write. This would be a desperate measure, to try only if everything else failed.

When a process is started (via execvp() in your example), the behaviour of standard output depends on whether the output device is a terminal or not. If it is not (and a FIFO is not a terminal), then the output will be fully buffered, rather than line buffered. There is nothing you can do about that; the (Standard) C library does that.
If you really want to make it work line buffered, then you will have to provide the program with a pseudo-terminal as its standard output. That gets into interesting realms - pseudo-terminals or ptys are not all that easy to handle. For the POSIX functions, see:
grantpt() - grant access to the slave pseudo-terminal device
posix_openpt() - open a pseudo-terminal device
ptsname() - get name of the slave pseudo-terminal device
unlockpt() - unlock a pseudo-terminal master/slave pair
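The four calls listed above fit together roughly as follows. This is a minimal sketch that assumes a Linux/glibc-style system with /dev/ptmx available; the helper name run_on_pty and its buffer-based interface are hypothetical. The child's standard streams all point at the slave side, so isatty() reports a terminal and the exec'd program's stdio should choose line buffering; setsid() followed by opening the slave also makes it the child's controlling terminal.

```c
#define _XOPEN_SOURCE 600  /* posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with a pseudo-terminal as its stdin, stdout
 * and stderr, and copy what it prints into buf (always NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t run_on_pty(char *const argv[], char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    char *slave_name = NULL;
    if (grantpt(master) < 0 || unlockpt(master) < 0 ||
        (slave_name = ptsname(master)) == NULL) {
        close(master);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(master);
        return -1;
    }
    if (pid == 0) {
        setsid();                              /* new session, no controlling tty yet */
        int slave = open(slave_name, O_RDWR);  /* becomes the controlling tty on Linux */
        if (slave < 0)
            _exit(127);
        close(master);
        dup2(slave, STDIN_FILENO);             /* isatty() is now true for 0, 1 and 2 */
        dup2(slave, STDOUT_FILENO);
        dup2(slave, STDERR_FILENO);
        if (slave > STDERR_FILENO)
            close(slave);
        execvp(argv[0], argv);
        _exit(127);                            /* exec failed */
    }
    size_t used = 0;
    while (used + 1 < cap) {
        ssize_t n = read(master, buf + used, cap - 1 - used);
        if (n > 0)
            used += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;
        else
            break;  /* 0, or EIO on Linux once the slave side has been closed */
    }
    buf[used] = '\0';
    close(master);
    waitpid(pid, NULL, 0);
    return (ssize_t)used;
}
```

Expect terminal processing on the captured text: by default the pty translates "\n" into "\r\n", and anything the parent writes to the master is echoed back.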

Why won't the child process write its stdout to the pipe each time a line is generated?
How do you know that? You do not even try to read the output from the fifo.
N.B. From the file name, I presume that you are using a FIFO. Or is it a plain file?
And the minor bug in the child: after dup2(), you need to close(outpipe).
fcntl(1, F_SETFL, fcntl(1, F_GETFL) | O_NONBLOCK);
Depending on what program you exec(), you might either lose some output or cause the program to fail since write to stdout now might fail with EWOULDBLOCK.
IIRC FIFOs have the same buffer as pipes. POSIX only requires PIPE_BUF (the largest write guaranteed to be atomic) to be at least 512 bytes; actual capacities are commonly 4K or 8K on older systems and 64K by default on modern Linux.
You probably want to explain why you need that at all. Non-blocking IO has different semantics compared to blocking IO, and unless your child process expects that, you will run into various problems.
printf("HELLO WORLD I AM A CHILD PROCESS\n");
stdout is buffered, so I would call fflush(stdout) right after that. exec() does not flush stdio buffers: anything still sitting in the buffer is discarded when the new program image replaces the process.
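To see why the fflush() matters, here is a small sketch (the helper name banner_then_exec is hypothetical). The child prints through stdio into a pipe, where stdout is normally fully buffered, flushes, and only then calls execvp(). Without the child's fflush(stdout) the banner could be lost, because exec() replaces the process image, buffer included. The parent's fflush(stdout) before fork() keeps any pending parent output from being copied into the child and written twice.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes a banner through stdio, flushes it,
 * then execs argv[0]; the parent collects everything that arrives on the pipe.
 * Returns the number of bytes captured in buf (NUL-terminated), or -1. */
ssize_t banner_then_exec(const char *banner, char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;
    fflush(stdout);              /* parent: don't hand pending output to the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        fputs(banner, stdout);   /* may still be sitting in the stdio buffer */
        fflush(stdout);          /* exec would discard it otherwise */
        execvp(argv[0], argv);
        _exit(127);
    }
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used + 1 < cap && (n = read(fds[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)used;
}
```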
Is there something I'm missing in the way execvp or dup2 work? I'm aware that my approach to all this is a bit strange, but I can't find another way to capture output of closed-source binaries programatically.
I wouldn't toy with non-blocking IO - and leave it as it is in blocking mode.
And I would use pipe() instead of the fifo. The Linux pipe(2) man page has a convenient example using fork().
Otherwise, that is a pretty normal practice.

The sleep()s do not guarantee that the parent will open the pipe first - as Dummy00001 says, you should be using a pipe() pipe, not a named pipe. You should also check for execvp() and fork() failing, and you shouldn't be setting the child side to non-blocking - that's a decision for the child process to make.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int nonBlockingPOpen(char *const argv[])
{
    int childpipe[2];
    pid_t pid;
    if (pipe(childpipe) < 0)
        return -1;
    pid = fork();
    if (pid == 0)
    {
        /*child*/
        /*redirect stdout to opened pipe*/
        dup2(childpipe[1], 1);
        /* close leftover pipe file descriptors */
        close(childpipe[0]);
        close(childpipe[1]);
        execvp(*argv, argv);
        /* Only reached if execvp fails */
        perror("execvp");
        _exit(1); /* _exit() skips flushing stdio buffers copied from the parent */
    }
    /*parent*/
    /* Close leftover pipe file descriptor */
    close(childpipe[1]);
    /* Check for fork() failing */
    if (pid < 0)
    {
        close(childpipe[0]);
        return -1;
    }
    /* Set file descriptor non-blocking */
    fcntl(childpipe[0], F_SETFL, fcntl(childpipe[0], F_GETFL) | O_NONBLOCK);
    return childpipe[0];
}
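Because the function above hands back a descriptor with O_NONBLOCK set, the caller has to cope with read() failing with EAGAIN whenever the child has not written anything yet. A minimal sketch of one way to do that, assuming it is acceptable to sleep in poll() until data or EOF arrives; drain_nonblocking and set_nonblocking are hypothetical helper names:

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Read a non-blocking fd until EOF (or until buf is full), sleeping in poll()
 * instead of spinning on EAGAIN. buf is NUL-terminated; returns bytes read or -1. */
ssize_t drain_nonblocking(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    for (;;) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n > 0) {
            used += (size_t)n;
            if (used + 1 == cap)
                break;                   /* buffer full */
        } else if (n == 0) {
            break;                       /* EOF: every write end is closed */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd p = { .fd = fd, .events = POLLIN };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;               /* wait for data or hang-up */
        } else if (errno != EINTR) {
            return -1;
        }
    }
    buf[used] = '\0';
    return (ssize_t)used;
}
```

For example, assuming nonBlockingPOpen(argv) succeeded, drain_nonblocking() on its result collects the child's complete output; a real program would also waitpid() for the child afterwards.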

Related

How to use stderr with execve [duplicate]

I'm writing a C program where I fork(), exec(), and wait(). I'd like to take the output of the program I exec'ed to write it to file or buffer.
For example, if I exec ls I want to write file1 file2 etc to buffer/file. I don't think there is a way to read stdout, so does that mean I have to use a pipe? Is there a general procedure here that I haven't been able to find?

For sending the output to another file (I'm leaving out error checking to focus on the important details):
if (fork() == 0)
{
    // child
    // O_TRUNC discards any previous contents of the file
    int fd = open(file, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    dup2(fd, 1); // make stdout go to file
    dup2(fd, 2); // make stderr go to file - you may choose to not do this
                 // or perhaps send stderr to another file
    close(fd);   // fd no longer needed - the dup'ed handles are sufficient
    exec(...);
}
For sending the output to a pipe so you can then read the output into a buffer:
int pipefd[2];
pipe(pipefd);
if (fork() == 0)
{
    close(pipefd[0]);    // close reading end in the child
    dup2(pipefd[1], 1);  // send stdout to the pipe
    dup2(pipefd[1], 2);  // send stderr to the pipe
    close(pipefd[1]);    // this descriptor is no longer needed
    exec(...);
}
else
{
    // parent
    char buffer[1024];
    ssize_t n;
    close(pipefd[1]);    // close the write end of the pipe in the parent
    // read() returns 0 at EOF and -1 on error, so stop on anything <= 0
    while ((n = read(pipefd[0], buffer, sizeof(buffer))) > 0)
    {
        // process the n bytes now in buffer
    }
}
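The parent loop above only discards what it reads. A sketch that keeps the output instead, growing a heap buffer as data arrives and reaping the child once EOF is seen; capture_stdout is a hypothetical name, and unlike the snippet above it leaves stderr alone:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with stdout on a pipe and return everything it
 * printed as a malloc'd, NUL-terminated string (caller frees), or NULL on error.
 * If status is non-NULL it receives the waitpid() status. */
char *capture_stdout(char *const argv[], int *status)
{
    int fds[2];
    if (pipe(fds) < 0)
        return NULL;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);   /* child: write end becomes stdout */
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                    /* only reached if exec failed */
    }
    close(fds[1]);                     /* parent keeps only the read end */
    size_t cap = 256, used = 0;
    char *out = malloc(cap);
    ssize_t n;
    while (out != NULL && (n = read(fds[0], out + used, cap - 1 - used)) > 0) {
        used += (size_t)n;
        if (cap - used < 2) {          /* keep room for more data plus the NUL */
            char *bigger = realloc(out, cap * 2);
            if (bigger == NULL) {
                free(out);
                out = NULL;
                break;
            }
            out = bigger;
            cap *= 2;
        }
    }
    close(fds[0]);                     /* EOF (or error): done reading */
    int st = -1;
    waitpid(pid, &st, 0);
    if (status != NULL)
        *status = st;
    if (out != NULL)
        out[used] = '\0';
    return out;
}
```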

You need to decide exactly what you want to do - and preferably explain it a bit more clearly.
Option 1: File
If you know which file you want the output of the executed command to go to, then:
Ensure that the parent and child agree on the name (parent decides name before forking).
Parent forks - you have two processes.
Child reorganizes things so that file descriptor 1 (standard output) goes to the file.
Usually, you can leave standard error alone; you might redirect standard input from /dev/null.
Child then execs relevant command; said command runs and any standard output goes to the file (this is the basic shell I/O redirection).
Executed process then terminates.
Meanwhile, the parent process can adopt one of two main strategies:
Open the file for reading, and keep reading until it reaches an EOF. It then needs to double check whether the child died (so there won't be any more data to read), or hang around waiting for more input from the child.
Wait for the child to die and then open the file for reading.
The advantage of the first is that the parent can do some of its work while the child is also running; the advantage of the second is that you don't have to diddle with the I/O system (repeatedly reading past EOF).
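A minimal sketch of the second strategy (wait for the child, then read the file), assuming a writable /tmp; the helper name run_via_file and the file-name pattern are illustrative only. mkstemp() lets the parent choose and create the name before forking, as described above.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() under strict ISO C modes */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with stdout sent to a temporary file, wait for
 * it to exit, then read the file into buf (NUL-terminated). Returns bytes or -1. */
ssize_t run_via_file(char *const argv[], char *buf, size_t cap)
{
    char path[] = "/tmp/childoutXXXXXX";   /* assumed scratch location */
    if (cap == 0)
        return -1;
    int fd = mkstemp(path);                 /* parent picks (and creates) the name */
    if (fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        dup2(fd, STDOUT_FILENO);            /* child: stdout now goes to the file */
        close(fd);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(fd);
    waitpid(pid, NULL, 0);                  /* let the child finish first... */
    fd = open(path, O_RDONLY);              /* ...then open the file for reading */
    unlink(path);                           /* data stays readable until close() */
    if (fd < 0)
        return -1;
    size_t used = 0;
    ssize_t n;
    while (used + 1 < cap && (n = read(fd, buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fd);
    return (ssize_t)used;
}
```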
Option 2: Pipe
If you want the parent to read the output from the child, arrange for the child to pipe its output back to the parent.
Use popen() to do this the easy way. It will run the process and send the output to your parent process. Note that the parent must be active while the child is generating the output, since pipes have a limited buffer (traditionally 4 KB; 64 KB by default on modern Linux), and if the child generates more data than that while the parent is not reading, the child will block until the parent reads. If the parent is waiting for the child to die, you have a deadlock.
Use pipe() etc to do this the hard way. Parent calls pipe(), then forks. The child sorts out the plumbing so that the write end of the pipe is its standard output, and ensures that all other file descriptors relating to the pipe are closed. This might well use the dup2() system call. It then executes the required process, which sends its standard output down the pipe.
Meanwhile, the parent also closes the unwanted ends of the pipe, and then starts reading. When it gets EOF on the pipe, it knows the child has finished and closed the pipe; it can close its end of the pipe too.
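The popen() approach described above is the shortest route. A sketch, with first_line_of as a hypothetical helper; it keeps reading after the first line so the child never blocks on a full pipe before pclose() waits for it:

```c
#define _POSIX_C_SOURCE 200809L   /* popen()/pclose() are POSIX, not ISO C */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run cmd via /bin/sh, store its first output line (without
 * the newline) in line, and return pclose()'s wait status, or -1 if popen() fails.
 * size must be > 0. */
int first_line_of(const char *cmd, char *line, int size)
{
    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, size, fp) != NULL)
        line[strcspn(line, "\n")] = '\0';
    else
        line[0] = '\0';
    char rest[256];
    while (fgets(rest, sizeof rest, fp) != NULL)
        ;                      /* drain the rest so the child never blocks on a full pipe */
    return pclose(fp);         /* waits for the child and returns its status */
}
```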

Since you look like you're going to be using this in a linux/cygwin environment, you want to use popen. It's like opening a file, only you'll get the executing program's stdout, so you can use your normal fscanf, fread etc.

After forking, use dup2(2) to duplicate the file's FD into stdout's FD, then exec.

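You could also run the sh shell and pass it a command that includes the redirection (this snippet is C++; filepath is passed to the shell unquoted, so it must not contain spaces or other shell metacharacters):
std::string cmd = "/bin/ls > " + filepath;
execl("/bin/sh", "sh", "-c", cmd.c_str(), (char *)NULL);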

For those such as myself who like a complete example with includes, here's this fantastic answer with a runnable example (still without error handling, left as an exercise):
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
int main() {
    if (fork() == 0) { // child
        // O_TRUNC discards the previous contents of test.txt
        int fd = open("test.txt", O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
        dup2(fd, 1); // make stdout go to file
        dup2(fd, 2); // make stderr go to file - you may choose to not do this
                     // or perhaps send stderr to another file
        close(fd);   // fd no longer needed - the dup'ed handles are sufficient
        execlp("ls", "ls", NULL);
    }
    else {
        while (wait(NULL) > 0) {} // wait for each child process
    }
    return 0;
}

understanding pipe() function

I'm trying to understand how the pipe() function works, and I have the following example program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(void)
{
    int fd[2], nbytes;
    pid_t childpid;
    char string[] = "Hello, world!\n";
    char readbuffer[80];
    pipe(fd);
    if((childpid = fork()) == -1)
    {
        perror("fork");
        exit(1);
    }
    if(childpid == 0)
    {
        /* Child process closes up input side of pipe */
        close(fd[0]);
        /* Send "string" through the output side of pipe */
        write(fd[1], string, (strlen(string)+1));
        exit(0);
    }
    else
    {
        /* Parent process closes up output side of pipe */
        close(fd[1]);
        /* Read in a string from the pipe */
        nbytes = read(fd[0], readbuffer, sizeof(readbuffer));
        printf("Received string: %s", readbuffer);
    }
    return(0);
}
My first question is: what benefit do we get from closing the file descriptors with close(fd[0]) and close(fd[1]) in the child and parent processes? Second, we use write in the child and read in the parent, but what if the parent process reaches read before the child reaches write and tries to read from a pipe that has nothing in it? Thanks!

Daniel Jour gave you 99% of the answer already, in a very succinct and easy to understand manner:
Closing: Because it's good practice to close what you don't need. For the second question: These are potentially blocking functions. So reading from an empty pipe will just block the reader process until something gets written into the pipe.
I'll try to elaborate.
Closing:
When a process is forked, its open files are duplicated.
Each process has a limit on how many file descriptors it's allowed to have open. As stated in the documentation, each end of the pipe is a separate fd, so a pipe uses two file descriptors, and in your example each process only uses one of them.
By closing the file descriptor you don't use, you're releasing resources that are in limited supply and which you might need further on down the road.
e.g., if you were writing a server, that extra fd means you can handle one more client.
Also, although releasing resources on exit is "optional", it's good practice. Resources that weren't properly released should be handled by the OS...
...but the OS was also written by us programmers, and we do make mistakes. So it only makes sense that the one who claimed a resource and knows about it will be kind enough to release the resource.
Race conditions (read before write):
POSIX defines a few behaviors that make read, write and pipes a good choice for synchronizing threads and processes. You can read more about it in the Rationale section of the write() specification, but here's a quick rundown:
By default, pipes (and sockets) are created in what is known as "blocking mode".
This means that the application will hang until the IO operation can be performed.
Also, a write of at most PIPE_BUF bytes to a pipe is atomic, meaning that:
Its data is never interleaved with data written by other processes or threads; it lands in the pipe as one contiguous chunk.
If two threads call read at the same time, each call gets its own bytes, read sequentially from the pipe (or socket); no byte is delivered twice. This makes pipes great tools for concurrency handling.
In other words, when your application calls:
read(fd[0], readbuffer, sizeof(readbuffer));
Your application will wait until at least some data is available, or until EOF is reached (every write end has been closed). read() then returns whatever is available, up to 80 (sizeof(readbuffer)) bytes, without waiting for the buffer to fill; at EOF it returns 0.
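A quick way to check the partial-read behaviour; partial_read_demo is an illustrative name. It writes the same 15 bytes as the question's child ("Hello, world!\n" plus its NUL) and keeps the write end open, so there is no EOF, yet a single read() with an 80-byte buffer returns immediately with 15:

```c
#include <unistd.h>

/* Illustrative check: returns what a single blocking read() of up to 80 bytes
 * yields when only 15 bytes are in the pipe and the write end is still open. */
ssize_t partial_read_demo(void)
{
    int p[2];
    char readbuffer[80];
    if (pipe(p) < 0)
        return -1;
    /* same bytes the question's child sends: "Hello, world!\n" plus its NUL */
    if (write(p[1], "Hello, world!\n", 15) != 15) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    ssize_t n = read(p[0], readbuffer, sizeof readbuffer);  /* does not wait for 80 */
    close(p[0]);
    close(p[1]);
    return n;
}
```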

beej guide pipe example explanation

The following code is the pipe implementation given in beej's guide:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
    int pfds[2];
    pipe(pfds);
    if (!fork()) {
        close(1);       /* close normal stdout */
        dup(pfds[1]);   /* make stdout same as pfds[1] */
        close(pfds[0]); /* we don't need this */
        execlp("ls", "ls", NULL);
    } else {
        close(0);       /* close normal stdin */
        dup(pfds[0]);   /* make stdin same as pfds[0] */
        close(pfds[1]); /* we don't need this */
        execlp("wc", "wc", "-l", NULL);
    }
    return 0;
}
I wanted to ask:
Is it possible that close(0) is executed before dup(pfds[1])? If yes, then in that case the program will not behave as expected.
What is the use of the following lines of code:
close(pfds[0]); /* we don't need this */
close(pfds[1]); /* we don't need this */
And what would change if these lines were not there?

Is it possible that close(0) is executed before dup(pfds[1])? If yes, then in that case the program will not behave as expected.
Yes, it is possible to have the parent successfully complete close(0) before the child calls dup(pfds[1]). However, this is not a problem. When you fork a new process, the new process gets an entire copy of the parent's memory address space, including all open file descriptors (descriptors marked close-on-exec with O_CLOEXEC or FD_CLOEXEC are inherited too; that flag only closes them when the process calls exec(), see fcntl(2)). So, essentially each process has its own private copy of the file descriptors and is isolated and free to do whatever it wants with that copy.
Thus, when the parent calls close(0), it is only closing its copy of file descriptor 0 (stdin); it does not affect the child in any way, which still has a reference to stdin and can use it if needed (even though in this example it won't).
What is the use of the following lines of code:
close(pfds[0]); /* we don't need this */
close(pfds[1]); /* we don't need this */
Best practices mandate that you should close file descriptors that you don't use - this is the case for close(pfds[0]). Unused open file descriptors eat up space and resources, why keep it open if you're not going to use it?
close(pfds[1]) is a little more subtle though. Pipes report end of file only when there is no more data in the pipe buffer and there are no active writers, i.e., no live processes that have the pipe open for writing. If you do not close pfds[1] in the parent, the program will hang forever because wc(1) will never see the end of input, since there is a process (wc(1) itself) that has the pipe opened for writing and as such could (but won't) write more data.
TL;DR: close(pfds[0]) is just good practice but not mandatory; close(pfds[1]) is absolutely necessary to ensure program correctness.
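The EOF rule can be demonstrated directly (read_child_until_eof is an illustrative name). The parent's close(p[1]) plays the role of close(pfds[1]) in the beej example; remove it and the read loop never sees 0, because the parent itself still holds a write end.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative check: returns how many bytes the parent read before EOF. */
ssize_t read_child_until_eof(void)
{
    int p[2];
    char buf[64];
    ssize_t total = 0, n;
    if (pipe(p) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        if (write(p[1], "data", 4) != 4)
            _exit(1);
        _exit(0);              /* exiting closes the child's copy of the write end */
    }
    close(p[1]);               /* drop the parent's copy too, or read() never returns 0 */
    while ((n = read(p[0], buf, sizeof buf)) > 0)
        total += n;            /* loop ends at EOF (0) or on error (-1) */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```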

Question 1:
Yes it is entirely possible that "close(0);" (in the parent) is executed before "dup(pfds[1]);" (in the child). But since this happens in different processes, the child will still have fd 0 open.
Question 2:
It is good bookkeeping practice to close the end of the pipe that a process is not going to use. That way, you can avoid bugs further down the road in more complex programs. In the above scenario, the parent process (which becomes wc) should only ever read from the pipe. If you close the write end in the parent, any attempt to write to it will fail with an error (EBADF); otherwise you might have a bug that is hard to detect.

How can I ensure a child process eventually writes data in C?

In C, I'd like to fork off a child process, and map its STDIN and STDOUT to pipes. The parent then communicates with the child by writing to or reading from the child's STDIN and STDOUT.
The MWE code below is apparently successful. The parent thread receives the string "Sending some message", and I can send arbitrary messages to the parent thread by writing to stdout. I can also freely read messages from the parent using, e.g. scanf.
The problem is that, once execl is called by the child, the output seems to stop coming through. I know that without the call to setvbuf to unbuffer stdout, this code will hang indefinitely, and so I suppose that the call to execl re-buffers stdout. Since the child program ./a.out is itself interactive, we hit a race condition where the child will not write (because of the buffering), and blocks waiting for input, while the parent blocks waiting for the child to write before producing input for the child.
Is there a nice way to avoid this? In particular, is there a way to use exec that doesn't overwrite the attributes of stdin stdout, etc.?
int main(char* argv[], int argc){
    int mgame_read_pipe[2];
    int mgame_write_pipe[2];
    pipe(mgame_read_pipe);
    pipe(mgame_write_pipe);
    pid_t is_child = fork();
    if(is_child == -1){
        perror("Error while forking.");
        exit(1);
    }
    if(is_child==0){
        dup2(mgame_read_pipe[1], STDOUT_FILENO);
        printf("Sending some message.\n");
        dup2(mgame_write_pipe[0], STDIN_FILENO);
        setvbuf(stdin, NULL, _IONBF, 0);
        setvbuf(stdout, NULL, _IONBF, 0);
        close(mgame_read_pipe[0]);
        close(mgame_write_pipe[1]);
        execl("./a.out", "./a.out", NULL);
    }
    else{
        close(mgame_read_pipe[1]);
        close(mgame_write_pipe[0]);
        int status;
        do{
            printf("SYSTEM: Waiting for inferior process op.\n");
            char buf[BUFSIZ];
            read(mgame_read_pipe[0], buf, BUFSIZ);
            printf("%s",buf);
            scanf("%s", buf);
            printf("SYSTEM: Waiting for inferior process ip.\n");
            write(mgame_write_pipe[1], buf, strlen(buf));
        } while( !waitpid(is_child, &status, WNOHANG) );
    }
}
EDIT: For completeness, here's an (untested) example a.out:
#include <stdio.h>
int main(){
    printf("I'm alive!");
    int parent_msg;
    scanf("%d", &parent_msg);
    printf("I got %d\n", parent_msg);
}

Your buffering problems stem from the fact that the buffering is being performed by the C standard library in the program that you are exec-ing, not at the kernel / file descriptor level (as observed by @Claris). There is nothing you can do to affect buffering in another program's own code (unless you modify that program).
This is actually a common problem encountered by anyone trying to automate interaction with a program.
One solution is to use a pseudo-tty, which makes the program think it is actually talking to an interactive terminal, which alters its buffering behaviour, amongst other things.
This article provides a good introduction. There is an example program there showing exactly how to achieve what you are trying to do.

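The setvbuf options you are setting apply to the child's stdio streams, not to file descriptors, and they do not survive execl(): the new program sets up its own stdio buffering from scratch. So they have no effect on the exec'd program.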
The read/write system calls are not buffered (aside from caching which is different and which might exist in the kernel), so you don't need to worry about disabling a buffer or any other such stuff. They will go directly to where they need to go.
That being said, they are blocking: if no data is available, read() blocks at the OS level until some arrives. On a pipe it then returns whatever is available, which may be less than you asked for; it returns 0 at EOF (all write ends closed). With non-blocking IO enabled, it returns -1 with EAGAIN instead of blocking.
You may be able to enable non-blocking IO through a system call using the fcntl interface. This would return immediately but is not always supported depending on how you are using a file descriptor. Async IO (for files) is supported through the AIO interface.
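A small illustration of what enabling non-blocking mode through fcntl() changes (empty_nonblocking_read_errno is a hypothetical name): on an empty pipe whose write end is still open, read() fails immediately with EAGAIN instead of sleeping.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustrative check: returns the errno from read() on an empty pipe whose read
 * end has O_NONBLOCK set (expected EAGAIN), 0 if read() unexpectedly succeeded,
 * or -1 if the pipe could not be set up. */
int empty_nonblocking_read_errno(void)
{
    int p[2];
    char c;
    if (pipe(p) < 0)
        return -1;
    int flags = fcntl(p[0], F_GETFL);
    if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    /* Nothing written and the write end is still open: a blocking read would
     * sleep here, the non-blocking one returns at once. */
    ssize_t n = read(p[0], &c, 1);
    int err = (n < 0) ? errno : 0;
    close(p[0]);
    close(p[1]);
    return err;
}
```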

How to pipe own output to another process?

I want to do a simple thing: my_process | proc2 | proc3, but programmatically, without using the shell, which can do this pretty easily. Is this possible? I cannot find anything :(
EDIT:
Well, without code, nobody will know what problem I'm trying to solve. Actually, no output is going out (I'm using printfs).
int pip1[2];
pipe(pip1);
dup2(pip1[1], STDOUT_FILENO);
int fres = fork();
if (fres == 0) {
    close(pip1[1]);
    dup2(pip1[0], STDIN_FILENO);
    execlp("wc", "wc", (char*)0);
}
else {
    close(pip1[0]);
}

Please learn about file descriptors and the pipe system call. Also, check read and write.

Your 'one child' code has some major problems, most noticeably that you configure the wc command to write to the pipe, not to your original standard output. It also doesn't close enough file descriptors (a common problem with pipes), and isn't really careful enough if the fork() fails.
You have:
int pip1[2];
pipe(pip1);
dup2(pip1[1], STDOUT_FILENO);     // The process will write to the pipe
int fres = fork();                // Both the parent and the child will…
// Should handle fork failure
if (fres == 0) {
    close(pip1[1]);
    dup2(pip1[0], STDIN_FILENO);  // Should close pip1[0] too
    execlp("wc", "wc", (char*)0);
}
else {                            // Should duplicate pipe to stdout here
    close(pip1[0]);               // Should close pip1[1] too
}
You need:
fflush(stdout);     // Print any pending output before forking
int pip1[2];
pipe(pip1);
int fres = fork();
if (fres < 0)
{
    /* Failed to create child */
    /* Report problem */
    /* Probably close both ends of the pipe */
    close(pip1[0]);
    close(pip1[1]);
}
else if (fres == 0)
{
    dup2(pip1[0], STDIN_FILENO);
    close(pip1[0]);
    close(pip1[1]);
    execlp("wc", "wc", (char*)0);
    perror("execlp");   // only reached if execlp() fails
    _exit(1);           // don't fall through into the parent's code
}
else
{
    dup2(pip1[1], STDOUT_FILENO);
    close(pip1[0]);
    close(pip1[1]);
}
Note that the amended code follows the:
Rule of thumb: If you use dup2() to duplicate one end of a pipe to standard input or standard output, you should close both ends of the original pipe.
This also applies if you use dup() or fcntl() with F_DUPFD.
The corollary is that if you don't duplicate one end of the pipe to a standard I/O channel, you typically don't close both ends of the pipe (though you usually still close one end) until you're finished communicating.
You might need to think about saving your original standard output before running the pipeline if you ever want to reinstate things.
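Applying the rule of thumb to a two-stage pipeline, here is a sketch of left | right without a shell. run_pipeline2 and its out_fd parameter are hypothetical; a longer chain such as my_process | proc2 | proc3 repeats the same pattern with one pipe per |. It assumes out_fd is either STDOUT_FILENO or a descriptor above 2.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" without a shell. right's standard
 * output goes to out_fd. Returns right's waitpid() status, or -1 on error. */
int run_pipeline2(char *const left[], char *const right[], int out_fd)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t lpid = fork();
    if (lpid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (lpid == 0) {
        dup2(p[1], STDOUT_FILENO);   /* left writes into the pipe */
        close(p[0]);                 /* rule of thumb: close both original ends */
        close(p[1]);
        execvp(left[0], left);
        _exit(127);
    }
    pid_t rpid = fork();
    if (rpid < 0) {
        close(p[0]);
        close(p[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        dup2(p[0], STDIN_FILENO);    /* right reads from the pipe */
        if (out_fd != STDOUT_FILENO) {
            dup2(out_fd, STDOUT_FILENO);
            close(out_fd);
        }
        close(p[0]);
        close(p[1]);
        execvp(right[0], right);
        _exit(127);
    }
    close(p[0]);                     /* parent keeps no pipe ends, or right never sees EOF */
    close(p[1]);
    int status;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return status;
}
```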

As Alex answered, you'll need syscalls like pipe(2), dup2(2), perhaps poll(2) and some other syscalls(2) etc.
Read Advanced Linux Programming; it explains that quite well.
Also, play with strace(1) and study the source code of some simple free software shell.
See also popen(3), which is not enough in your case.
Recall that stdio(3) streams are buffered. You probably need to fflush(3) at appropriate places (e.g. before fork(2)).
