Find out if a pipe's read end is currently blocking (C)

I'm trying to find out if a child process is waiting for user input (without parsing its output). Is it possible, in C on Unix, to determine if a pipe's read end currently has a read() call blocking?
The thing is, I have no control over the programs exec'd in the child processes. They print all kinds of verbose garbage which I would usually want to redirect to /dev/null. Occasionally though one will prompt the user for something. (With the prompt having no reliable format.) So my idea was:
In a loop:
Drain child's stdout, append it to a temporary buffer.
Check (no idea how) if the child is asking for user input, in which case the buffer is printed to stdout.
When the child exits, throw away the buffer.


You have these options:
if you know that the child will need certain input (such as a shell that will read a command), just write to the pipe
if you assume the child usually won't read anything but may do so occasionally, you probably need something like job control in the shell: use a terminal for communication with the child, and use process groups and the TIOCSPGRP ioctl on the terminal to put the child in the background. The child will then get SIGTTIN when it tries to read from the terminal, and you can wait() for that (a sketch follows this list). This is how bash handles things like "(sleep 10; read a;)&"
if you don't know what to write, or you have more possibilities, you will have to parse the output
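A minimal sketch of that second option, assuming the parent already runs on its controlling terminal. tcsetpgrp() is the portable wrapper around the TIOCSPGRP ioctl; error checks are omitted, and "verbose_tool" is a hypothetical stand-in for the real child program:

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    signal(SIGTTOU, SIG_IGN);                /* so the parent can take the tty back */
    pid_t pid = fork();
    if (pid == 0) {                          /* child */
        setpgid(0, 0);                       /* own (background) process group */
        execlp("verbose_tool", "verbose_tool", (char *)NULL);
        _exit(127);
    }
    setpgid(pid, pid);                       /* repeat in the parent to avoid a race */
    int status;
    waitpid(pid, &status, WUNTRACED);        /* returns when the child stops, too */
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTTIN) {
        printf("child is waiting for terminal input\n");
        tcsetpgrp(STDIN_FILENO, pid);        /* hand it the terminal */
        kill(-pid, SIGCONT);                 /* let it retry the read */
        waitpid(pid, &status, 0);
        tcsetpgrp(STDIN_FILENO, getpgrp());  /* take the terminal back */
    }
    return 0;
}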

That sounds as if you were trying to supervise dpkg, where occasionally some post-inst script asks the admin whether it may overwrite some config file.
Anyway, you may want to look at how strace works:
strace -f -etrace=read your.program
Of course you need to keep track of which fds are the pipes you write about, but you probably need only stdin, anyway.

I don't think that's true: for example, right before the reader side calls read(), the pipe has a reader that isn't actually reading.

You would typically just write to the pipe, or use select or poll. If you need a handshake mechanism you can do that out of band in various ways, or come up with an in-band protocol.
I don't know if there is a built-in way to know if a reader on the other end is blocking. Why do you need to know this?

If I recall correctly, you cannot have a pipe with no reader, which means that you have either a read(2) or a select(2) syscall pending at all times.

Related

Capturing stdout/stderr separately and simultaneously from child process results in wrong total order (libc/unix)

I'm writing a library that should execute a program in a child process, capture the output, and make the output available in a line by line (string vector) way. There is one vector for STDOUT, one for STDERR, and one for "STDCOMBINED", i.e. all output in the order it was printed by the program. The child process is connected via two pipes to a parent process. One pipe for STDOUT and one for STDERR. In the parent process I read from the read-ends of the pipes, in the child process I dup2()'ed STDOUT/STDERR to the write ends of the pipes.
My problem:
I'd like to capture STDOUT, STDERR, and "STDCOMBINED" (= both in the order they appeared). But the order in the combined vector is different from the original order.
My approach:
I iterate until both pipes show EOF and the child process has exited. At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR. This works so far. But when I capture the lines as they arrive in the parent process, the order of STDOUT and STDERR is not the same as if I execute the program in a shell and look at the output.
Why is this so and how can I fix this? Is this possible at all? I know in the child process I could redirect STDOUT and STDERR both to a single pipe but I need STDOUT and STDERR separately, and "STDCOMBINED".
PS: I'm familiar with libc/unix system calls, like dup2(), pipe(), etc. Therefore I didn't post code. My question is about the general approach and not a coding problem in a specific language. I'm doing it in Rust against the raw libc bindings.
PPS: I made a simple test program, that has a mixup of 5 stdout and 5 stderr messages. That's enough to reproduce the problem.
At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR.
This is the problem. This will only capture the correct order if that was exactly the order of output in the child process.
You need to capture the asynchronous nature of the beast: make your pipe endpoints nonblocking, select* on the pipes, and read whatever data is present, as soon as select returns. Then you'll capture the correct order of the output. Of course now you can't be reading "exactly one line": you'll have to read whatever data is available and no more, so that you won't block, and maintain a per-pipe buffer where you append new data, extract any lines that are present, shove the unprocessed output to the beginning, and repeat. You could also use a circular buffer to save a little bit of memcpy-ing, but that's probably not very important.
Since you're doing this in Rust, I presume there's already a good asynchronous reaction pattern that you could leverage (I'm spoiled with Go, I guess, and project my hopes onto the unsuspecting).
*Always prefer platform-specific higher-performance primitives like epoll on Linux, /dev/poll on Solaris, pollset &c. on AIX
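A rough C sketch of that loop (the question is in Rust, but the syscall pattern is identical; per-pipe line reassembly is reduced to a comment, and error checks are omitted):

#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* out_fd/err_fd are the read ends of the stdout and stderr pipes */
void drain(int out_fd, int err_fd) {
    fcntl(out_fd, F_SETFL, fcntl(out_fd, F_GETFL) | O_NONBLOCK);
    fcntl(err_fd, F_SETFL, fcntl(err_fd, F_GETFL) | O_NONBLOCK);
    int open_fds = 2;
    while (open_fds > 0) {
        fd_set rfds;
        FD_ZERO(&rfds);
        if (out_fd >= 0) FD_SET(out_fd, &rfds);
        if (err_fd >= 0) FD_SET(err_fd, &rfds);
        int maxfd = out_fd > err_fd ? out_fd : err_fd;
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            break;
        int fds[2] = { out_fd, err_fd };
        for (int i = 0; i < 2; i++) {
            if (fds[i] < 0 || !FD_ISSET(fds[i], &rfds))
                continue;
            char buf[4096];
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n > 0) {
                /* append to this pipe's buffer, extract complete lines,
                   and push them to the per-stream and combined vectors */
                fwrite(buf, 1, (size_t)n, stdout);
            } else if (n == 0) {              /* EOF on this pipe */
                if (i == 0) out_fd = -1; else err_fd = -1;
                open_fds--;
            }
        }
    }
}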
Another possibility is to launch the target process with LD_PRELOAD, with a dedicated library that takes over glibc's POSIX write(), detects writes to the pipes, and encapsulates such writes (and only those) in a packet by prepending a header that carries an (atomically updated) process-wide incrementing counter as well as the size of the write. Such headers can easily be decoded on the other end of the pipe to reorder the writes with a higher chance of success.
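A sketch of such an interposer, under two assumptions worth stating: that the child's libc actually routes its stdio writes through the interposable write symbol (this varies between libcs), and that tagged writes stay below PIPE_BUF so the single writev() below is atomic. Build with gcc -shared -fPIC tag.c -o libtag.so -ldl and run the child with LD_PRELOAD=./libtag.so:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdatomic.h>
#include <sys/uio.h>
#include <unistd.h>

static atomic_ulong seq;    /* process-wide write counter */

ssize_t write(int fd, const void *buf, size_t count) {
    if (fd == 1 || fd == 2) {                 /* tag only stdout/stderr */
        unsigned long hdr[2];
        hdr[0] = atomic_fetch_add(&seq, 1);   /* sequence number */
        hdr[1] = count;                       /* payload size */
        struct iovec iov[2] = {
            { hdr, sizeof hdr },
            { (void *)buf, count },
        };
        ssize_t n = writev(fd, iov, 2);       /* header + payload in one go */
        return n < 0 ? n : n - (ssize_t)sizeof hdr;
    }
    static ssize_t (*real_write)(int, const void *, size_t);
    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))
                     dlsym(RTLD_NEXT, "write");
    return real_write(fd, buf, count);
}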
I think it's not possible to strictly do what you want to do.
If you think about how it's done when running a command in an interactive shell, what happens is that both stdout and stderr point to the same file descriptor (the TTY), so the total ordering is correct by means of synchronization against the same file.
To illustrate, imagine what happens if the child process has 2 completely independent threads, one only writing to stderr, and to other only writing to stdout. The total ordering would depend on however the scheduler decided to schedule these threads, and if you wanted to capture that, you'd need to synchronize those threads against something.
And of course, something can write thousands of lines to stdout before writing anything to stderr.
There are 2 ways to relax your requirements into something workable:
Have the user pass a flag waiving separate stdout and stderr streams in favor of a correct stdcombined, and then redirect both to a single file descriptor (see the snippet after this list). You might need to change the buffering settings (like stdbuf does) before you execute the process.
Assume that stdout and stderr are "reasonably interleaved", an assumption pointed out by @Nate Eldredge, in which case you can use @Unslander Monica's answer.
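For the first option, the child-side redirection before exec is just this (a sketch; pipe_wr is assumed to be the write end of the single shared pipe):

#include <unistd.h>

/* child side: point both streams at the same pipe, then exec */
void combine_streams(int pipe_wr) {
    dup2(pipe_wr, STDOUT_FILENO);
    dup2(pipe_wr, STDERR_FILENO);
    close(pipe_wr);
}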

How do fork() and scanf() work together?

I tried to see what happens if I read something from the keyboard while I have multiple processes created with fork() (in my case, two children and a parent), and I discovered the following problem: I need to tell the parent to wait for the children's processes, otherwise the program behaves strangely.
I did some research and found that the problem is with the parent: it needs to wait for the children's processes to end, because if the parent's process ends first it somehow closes STDIN, am I right? But I also found that every process has a copy of STDIN, so my question is:
Why does it work this way, and why does only the parent have this problem with STDIN and not the children? I mean, why doesn't it affect STDIN if a child's process ends first, but it does if the parent's process ends first?
Here are my tests:
I ran the program without wait(), and after I typed a number the program stopped; but then I pressed Enter two more times and the other two messages from printf() appeared.
When I ran the program with wait() everything worked fine, every process called scanf() separately and read a different number.
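The asker didn't post code; a minimal reconstruction of the kind of program described (a parent that forks two children, each process reading one number) might look like this. Comment out the wait() calls to reproduce the strange behavior:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    for (int i = 0; i < 2; i++) {
        if (fork() == 0) {               /* child i */
            int n;
            scanf("%d", &n);
            printf("child %d read: %d\n", i, n);
            _exit(0);
        }
    }
    int n;
    scanf("%d", &n);
    printf("parent read: %d\n", n);
    wait(NULL);                          /* without these two calls the  */
    wait(NULL);                          /* parent exits before the kids */
    return 0;
}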
Well, a lot of stuff is going on here. I will try to explain it step by step.
When you start your terminal, the terminal creates a special file having path /dev/pts/<some number>. Then it starts your shell (which is bash in this case) and links the STDIN, STDOUT and STDERR of the bash process to this special file. This file is called a special file because it doesn't actually exist on your hard disk. Instead, whatever you write to this file, it goes directly to the terminal and the terminal renders it on the screen. (Similarly, whenever you try to read from this file, the read blocks until someone types something at the terminal).
Now when you launch your program by typing ./main, bash calls the fork function in order to create a new process. The child process execs your executable file, while the parent process waits for the child to terminate. Your program then calls fork twice and we have three processes trying to read their STDINs, i.e. the same file /dev/pts/something. (Remember that fork duplicates the file descriptors and exec preserves them.)
The three processes are in a race. When you enter something at the terminal, one of the three will receive it (99 times out of 100 it will be the parent process, since the children have to do more work before reaching their scanf statements).
So, the parent process prints the number and exits first. The bash process that was waiting for the parent to finish resumes, puts STDIN into a so-called "non-canonical" mode, and calls read in order to read the next command. Now again, three processes (child 1, child 2, and bash) are trying to read STDIN.
Since the children have been trying to read STDIN for longer, the next time you enter something it will be received by one of the children, rather than bash. So you think of typing, say, 23. But oops! Just after you press the 2 key, you get Your number is: 2. You didn't even press the Enter key! That happened because of this so-called "non-canonical" mode. I won't go into what that is and why it happens here. But for now, to make things easier, you can run your program under sh instead of bash, since sh doesn't put STDIN into non-canonical mode. That will make the picture clear.
TL;DR
No, a parent process closing its STDIN doesn't mean that its children or other processes won't be able to use it.
The strange behavior you are seeing is because when the parent exits, bash puts the pty (pseudo terminal) into non-canonical mode. If you use sh instead, you won't see that behavior. Read up on pseudo terminals and line discipline if you want a clear understanding.
The shell process will resume as soon as the parent exits.
If you use wait to ensure that the parent exits last, you won't have any problem, since the shell won't run alongside your program.
Normally, bash makes sure that no two foreground processes read from STDIN simultaneously, so you don't see this strange behavior. It does this by either piping STDOUT of one program to another, or by making one process a background process.
Trivia: When a background process tries to read from its STDIN, it is sent a signal SIGTTIN, which stops the process. Though, that's not really relevant to this scenario.
There are several issues that can happen when multiple processes try to do I/O to the same TTY. Without code, we can't tell which may be happening.
Trying to do I/O from a background process group may deliver a signal: SIGTTIN for input (usually enabled), or SIGTTOU for output (usually disabled)
Buffering: if you do any I/O before the fork, any data that has been buffered will be there for both processes. Under some conditions, using fflush may help, but it's better to avoid buffering entirely (see the demonstration after this list). Remember that, unlike output buffering, it is impossible to buffer input on a line-by-line basis (although you can only buffer what is available, so it may appear to be line-buffered at first).
Race conditions: if more than one process is trying to read the same pipe-like file, it is undefined which one will "win" and actually get the input each time it is available.
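A minimal demonstration of the buffering item above (run it with stdout redirected to a file to make the duplication visible, since output to a terminal would be line-buffered):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("before fork");   /* no newline: may sit in the stdio buffer */
    /* fflush(stdout); */    /* uncomment to prevent the double output */
    fork();
    printf(" / after fork\n");
    return 0;
}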

How to communicate with a program from another external program

I'm trying to write to the stdin and read from the stdout (and stderr) of an external program, without changing its code.
I've tried using named pipes, but stdout doesn't show up until the program terminates, and stdin only works for the first input (after that, cin is null).
I've tried using /proc/[pid]/fd, but that only writes to and reads from the terminal, not the program.
I've tried writing a character device file for this, and it worked, but only for one program at a time (and this needs to work for multiple programs at a time).
At this point, to my knowledge, I could extend the driver that worked so that it multiplexes the I/O across multiple programs, but I don't think that's the "right" solution.
The main purpose of this is to view a feed of a program through a web interface. I'm sure there has to be some way to do this. Is there anything I haven't tried that's been done before?
The typical way of doing this is:
Create anonymous pipes (not named pipes) with the pipe(2) system call for the new process's standard streams
Call fork(2) to spawn the child process
close(2) the appropriate ends of the pipes in both the parent and the child (e.g. for the stdin pipe, close the read end in the parent and close the write end in the child; vice-versa for the stdout and stderr pipes)
Use dup2(2) in the child to copy the pipe file descriptors onto file descriptors 0, 1, and 2, and then close(2) the remaining old descriptors
exec(3) the external application in the child process
In the parent process, simultaneously write to the child's stdin pipe and read from the child's stdout and stderr pipes. However, depending on how the child behaves, this can easily lead to deadlock if you're not careful. One way to avoid deadlock is to spawn separate threads to handle each of the 3 streams; another way is to use the select(2) system call to wait until one of the streams can be read from/written to without blocking, and then process that stream.
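Put together, a minimal sketch of those steps, with the stderr pipe omitted for brevity, error checks dropped, and sort standing in for the external application:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int in[2], out[2];                   /* in: parent -> child stdin   */
    pipe(in);                            /* out: child stdout -> parent */
    pipe(out);
    pid_t pid = fork();
    if (pid == 0) {                      /* child */
        close(in[1]);
        close(out[0]);
        dup2(in[0], 0);                  /* stdin  <- read end of "in"   */
        dup2(out[1], 1);                 /* stdout -> write end of "out" */
        close(in[0]);
        close(out[1]);
        execlp("sort", "sort", (char *)NULL);
        _exit(127);
    }
    close(in[0]);                        /* parent keeps in[1] and out[0] */
    close(out[1]);
    write(in[1], "b\na\n", 4);
    close(in[1]);                        /* EOF tells sort to emit output */
    char buf[256];
    ssize_t n;
    while ((n = read(out[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    waitpid(pid, NULL, 0);
    return 0;
}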
Even if you do this all correctly, you may still not see your program's output right away. This is typically due to stdout buffering. Normally, when stdout is going to a terminal, it's line-buffered: it gets flushed after every newline is written. But when stdout is a pipe (or anything else that's not a terminal, like a file or a socket), it's fully buffered, and it only gets written out when the program has produced a full buffer's worth of data (e.g. 4 KB).
Many programs have command line options to change their buffering behavior. For example, grep(1) has the --line-buffered flag to force it to line-buffer its output even when stdout isn't a terminal. If your external program has such an option, you should probably use it. If not, it's still possible to change the buffering behavior, but you have to use some sneaky tricks—see this question and this question for how to do that.
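If the program has no such option, the stdbuf wrapper from GNU coreutils can often impose line buffering from the outside (it uses LD_PRELOAD to adjust stdio buffering at startup):

stdbuf -oL ./external_program

In the fork/exec scheme above you would exec stdbuf with -oL and the real program as its arguments. Note the assumption: this only works for programs that use stdio's default buffering.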

Is it possible to know how many bytes have been printed to a file stream such as standard output?

Is it possible for a caller program in C to know how many bytes it has printed to a file stream such as stdout without actually counting and adding up the return values of printf?
I am trying to implement control of the quantity of output of a C program which uses libraries to print, but the libraries don't report the amount of data they have printed out.
I am interested in either a general solution or a Unix-specific one.
POSIX-specific: redirect stdout to a file, flush after all writing is done, then stat the file and look at st_size (or use the ls command).
Update: You say you're trying to control the quantity of output of a program. The POSIX head command will do that. If that's not satisfactory, then state your requirements clearly.
It's rather a heavyweight solution, but the following will work:
Create a pipe by calling pipe()
Spawn a child process
In the parent: redirect stdout to the write-side of the pipe, and close the read side (and the old stdout)
In the child: keep reading from the read side of the pipe, and copying the data to the inherited stdout (which is the original stdout) - counting it as it goes past
In the parent, keep writing to stdout (which is now the pipe) as usual
Use some form of IPC to communicate the result of the count from the child to the parent.
Basically the idea is to spawn a child process, and pipe all output through it, and have the child process count all the data as it goes through.
The precise form of IPC to use may vary - for example, shared memory (with atomic reads/writes on each side) would work well for fast transfer of data, but other methods (such as sockets, more pipes etc) are possible, and offer better scope for synchronisation.
The trickiest part is the synchronisation, i.e. ensuring that, at the time the child tells the parent how much data has been written, it has already processed all the data that the parent said (and there is none left in the pipe, for example). How important this is will depend on exactly what your aim is - if an approximate indication is all that's required, then you may be able to get away with using shared memory for IPC and not performing any explicit synchronisation; if the total is only required at the end then you can close stdout from the parent, and have the child indicate in the shared memory when it has received the eof notification.
If you require more frequent readouts, which must be exact, then something more complex will be required, but this can be achieved by designing some sort of protocol using sockets, pipes, or even condvars/semaphores/etc. in the shared memory.
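A sketch of the pipe-through-a-child idea; to keep the IPC simple, this version has the child report the total on stderr at EOF rather than through shared memory (error checks omitted):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int p[2];
    pipe(p);
    if (fork() == 0) {                   /* counting child */
        close(p[1]);
        long total = 0;
        char buf[4096];
        ssize_t n;
        while ((n = read(p[0], buf, sizeof buf)) > 0) {
            write(STDOUT_FILENO, buf, (size_t)n);  /* child kept the real stdout */
            total += n;
        }
        fprintf(stderr, "bytes written: %ld\n", total);
        _exit(0);
    }
    close(p[0]);
    dup2(p[1], STDOUT_FILENO);           /* parent's stdout now feeds the child */
    close(p[1]);

    printf("hello, world\n");            /* counted output */
    fflush(stdout);
    close(STDOUT_FILENO);                /* EOF lets the child finish */
    wait(NULL);
    return 0;
}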
printf returns the number of bytes written.
Add them up.
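If changing the call sites is acceptable, a thin wrapper keeps the tally for you (counted_printf is a hypothetical helper, not a standard function):

#include <stdarg.h>
#include <stdio.h>

static long total_out = 0;   /* running byte count */

int counted_printf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vprintf(fmt, ap);    /* forwards to printf, returns bytes written */
    va_end(ap);
    if (n > 0)
        total_out += n;
    return n;
}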
No idea how reliable this is, but you can use ftell on stdout:
long int start = ftell(stdout);
printf("abcdef\n");
printf("%ld\n", ftell(stdout) - start); // >> 7
EDIT
Checked this on Ubuntu Precise: it does not work if the output goes to the console, but does work if it is redirected to a file.
$ ./a.out
abcdef
0
$ ./a.out >tt
$ cat tt
abcdef
7
$ echo `./a.out`
abcdef 0
$ echo `cat tt`
abcdef 7

Pipe is not receiving all output from child process

I wanted to open up a pipe to a program and read output from it. My initial inclination was to use popen(), but the program takes a number of options, and rather than fighting with shell quoting/escaping, I decided to use a combination of pipe(), fork(), and dup() to tie the ends of the pipe to stdin/stdout in the parent/child, and execv() to replace the child with an invocation of the program, passing all of the options it expects as an array.
The program outputs many lines of data (and flushes stdout after each line). The parent code sets stdin to non-blocking and reads from it in a loop using fgets(). The loop runs while fgets() returns non-NULL or stdin has an error condition that is EAGAIN or EWOULDBLOCK.
It receives most of the lines successfully, but towards the end it seems to drop off, with the last fgets() failing with the odd error "No such file or directory."
Does anyone know what I might have done wrong here?
I found the problem. I stupidly was not resetting errno to zero each iteration. I guess I just assumed fgets() would take care of it or something... My stupid mistake. Now it is working fine. Always reset errno!
Thanks for the help anyway.
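For reference, the corrected loop shape (a sketch; stream is assumed to be a FILE * over the nonblocking read end of the pipe):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

void read_all(FILE *stream) {
    char line[1024];
    for (;;) {
        errno = 0;                       /* the fix: reset every iteration */
        if (fgets(line, sizeof line, stream) != NULL) {
            fputs(line, stdout);
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            clearerr(stream);            /* clear the error flag and retry */
            usleep(10000);
        } else {
            break;                       /* real EOF or error */
        }
    }
}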
Not sure, but there is a cool function on Linux called posix_spawn() (example here: http://www.opengroup.org/onlinepubs/000095399/xrat/xsh_chap03.html#tag_03_03_01_02); sometimes it makes it easier to set up pipes. But this sounds like a possible blocking issue with the pipe...
Make sure you open a pipe to STDERR. Most programs write error data there instead of STDOUT.
