Pipe Stream Question - c

I am trying to execute a program I created in popen and then collect the output. The program I execute takes 30 seconds (there are 30 sleep(1)s) and after each seconds it sends chunks of output. What puzzles me is that when I call pipe = popen("test -flag", "r") it finishes immediately and the FILE stream pipe is empty. Are my assumptions that that the program will halt and wait for test to finish executing before continuing or does it initiate the order to collect the output and immediately continue on? If it is the latter, is there any way to pause the program until the pipe has all the output before continuing?
Thanks!

The call to popen() is supposed to be relatively quick, so your program can get on with reading the output from the program. Certainly, popen() itself does not wait for the invoked program to finish. Once the popen() returns, your program should be able to read from the file stream; it will hang until there is input waiting, or until the other process closes the pipe (e.g. terminates), at which point it will get an EOF indication. You can then pclose() the stream you were reading from.
How does your test program behave when you run it directly with its output sent to a pipe? Note that standard I/O typically behaves differently when the output is a pipe (full buffering) as against a terminal (line buffering).

Read pipe until EOF is returned.

Related

How can I understand the behavior of pipe, with varying data flow?

My problem is a bit hard to explain properly as I do not understand fully the behavior behind it.
I have been working on pipe and pipelines in C, and I noticed some behavior that is a bit mysterious to me.
Let's take a few example: Let's try to pipe yes with head. (yes | head). Even though I coded the behavior in a custom program, I don't understand how the pipe knows when to stop piping ? It seems two underlying phenomenons are causing this (maybe), the SIGPIPE and/or the internal size a pipe can take. How does the pipe stop piping, is it when it's full ? But the size of a pipe is way superior to 10 "yes\n" no ? And SIGPIPE only works when the end read/write is closed no ?
Also let's take another example, for example cat and ls: cat | ls or even cat | cat | ls.
It seems the stdin of the pipe is waiting for input, but how does it know when to stop, i.e. after one input ? What are the mechanism that permits this behavior?
Also can anyone provide me with others examples of these very specific behavior if there are any in pipes and pipelines so I can get an good overview of theses mechanism ?
In my own implementation, I managed to replicate that behavior using waitpid. However how does the child process itself know when to stop ? Is it command specific ?
The write operation will block when the pipe buffer is full, the read operation will block when the buffer is empty.
When the write end of the pipe is closed, the reading process will get an EOF indication after reading all data from the buffer. Many programs will terminate in this case.
When the read end of the pipe is closed, the writing process will get a SIGPIPE. This will also terminate most programs.
When you run cat | ls, STDOUT of cat is connected to STDIN of ls, but ls does not read from STDIN. On the system where I checked this, ls simply ignores STDIN and the file descriptor will be closed when ls terminates.
You will see the output of ls, and cat will be waiting for input.
cat will not write anything to STDOUT before it has read enough data from STDIN, so it will not notice that the other end of the pipe has been closed.
cat will terminate when it detects EOF on STDIN which can be done by pressing CTRL+D or by redirecting STDIN from /dev/null, or when it gets SIGPIPE after trying to write to the pipe which will happen when you (type something and) press ENTER.
You can see the behavior with strace.
cat terminates after EOF on input which is shown as read(0, ...) returning 0.
strace cat < /dev/null | ls
cat killed by SIGPIPE.
strace cat < /dev/zero | ls
How does the pipe stop piping
The pipe stops piping when either end is closed.
If the input(write) end of the pipe is closed, then any data in the pipe is held until it is read from the output end. Once the buffer is emptied, anyone subsequently reading from the output end will get an EOF.
If the output(read) end of the pipe is closed, any data in the pipe will be discarded. Anyone subsequently writing to the input end will get a SIGPIPE/EPIPE. Note that a process merely holding open the input but not actively writing to it will not be signalled.
So when you type cat | ls you get a cat program with stdout connected to the input of the pipe and ls with stdin connected to the output. ls runs and outputs some stuff (to its stdout, which is still the terminal) and never reads from stdin. Once done it exits and closes the output of the pipe. Meanwhile cat is waiting for input from its stdin (the terminal). When it gets it (you type a line), it writes it to stdout, gets a SIGPIPE/EPIPE and exits (discarding the data as there's noone to write it to.) This closes the input of the pipe, so the pipe goes away now that both ends have been closed.
Now lets look at what happens with cat | cat | ls. You now have two pipes and two cat programs. As before ls runs and exits, closing the output of the second pipe. Now you type a line and the first cat reads it and copies it to the first pipe (still fully open) where the second cat reads it and copies it to the second pipe (which has its output closed), so it (the second cat) gets a SIGPIPE/EPIPE and exits (which closes the output of the first pipe). At this point the first cat is still waiting for input, so if you type a second line, it copies that to the now closed first pipe and gets a SIGPIPE/EPIPE and exits
How does the pipe stop piping, is it when it's full ?
A pipe has several states:
if you obtain the pipe through a call to pipe(2) (an unnamed pipe) both file descriptors are already open, so this doesn't apply to it (you start in point 2. below). When you open a named pipe, your open(2) call (depending if you have open with O_READ, O_WRITE, or O_RDWR. The pipe has two sides, the writer and the reader side. When you open it, you attach to the sides, depending on how do you open it. Well, up to here, the pipe blocks any open(2) call, until both sides have at least one process tied to them. So, if you open a pipe and read(2) from it, then your open will be blocked, until other process has opened it to read.
once both extremes have it open, the readers (the process issuing a read(2) call) block when the pipe is empty, and the writers (the processes issuing a write(2) call) block whenever the write call cannot be satisfied due to fillin completely the pipe. Old implementations of pipes used the filesystem to hold the data, and the data was stored only in the direct addressed disk blocks. This meant (as there are 10 such blocks in an inode) that you normally had space in the pipe to hold 10 blocks, after that, the writers are blocked. Later, pipes were implemented using the socket infrastructure in BSD systems, which allowed you to control the buffer size with ioctl(2) calls. Today, IMHO, pipes use a common implementation, that is separate from sockets also.
When the processes close the pipe continues to work as said in point 2. above, until the number of readers/writers collapses to zero. At that point, the pipe starts giving End Of File condition to all readers (this means read(2) syscall will return 0 bytes, without blocking) and error (cannot write to pipe) to writers. In addition, the kernel sends a signal (which normally aborts the writer processes) SIGPIPE to every process that has the pipe open for writing. If you have not ignored that signal or you have not installed a signal handler for it, your process will die. In this state, it's impossible to reopen the pipe again, until all processes have closed it.
A common error is when you pipe() or you open a pipe with O_RDWR, and the other process closes its file descriptor, and you don't get anything indicating about the other's close call..... this is due to the thing that both sides of the pipe are still open (by the same process) so it will not receive anything because it can still write to the pipe.
Any other kind of misbehaviour could be explained if you had posted any code, but you didn't, so IMHO, thi answer is still incomplete, but the number of different scenarios is difficult to enumerate, so I'll be pendant of any update to your question with some faulty (or needed of explanation) code.

How to detect pclose inside client

How can a C-client started by popen and writing to stdout properly detect that the calling process has called pclose.
I am sending binary data from a small client program written in C to Matlab. To this end, Matlab is starting the process by calling popen inside an API written in C. The client is continuously writing binary data to stdout using fwrite. When Matlab is stopping, the API apparently calls pclose on the client's handle, but that does not stop the client process. I guess the fwrite will not raise an error, as the data gets buffered by the OS. So what is the appropriate way to detect the pclose inside the client?
BTW I will run into the same problem agin, when trying to write to some C-client from within Matlab.
Three ways:
If you are reading from the pipe, and the other end goes away, you should get end-of-file, and you can detect that, and stop.
If you are reading from the pipe, and you control the protocol, the other end can send you a "quit" command over the pipe, and you can read that, and stop.
If you are writing to the pipe, and the other end goes away, you should get a SIGPIPE signal, which by default should kill your process.
Note that #1 only works if the other end is the only process that has the pipe open. If any other process has the writing end of pipe open, you won't get EOF, and you won't know to stop. If you're calling pipe(), then exec(), it's easy for you to have the writing end of the pipe open. This is a common mistake. (Since you're calling popen(), though, it's less likely you're having this problem.)

Will a process writing to a pipe block if the pipe is full?

I'm currently diving into the Win32 API and writing myself a wrapper class for CreateProcess and CreatePipe. I was just wondering what will happen if a process that I opened writes too much output for the pipe buffer to hold. Will the process wait until I read from the other end of the pipe? The Remark of the CreatePipe function suggests so:
When a process uses WriteFile to write to an anonymous pipe, the write operation is not completed until all bytes are written. If the pipe buffer is full before all bytes are written, WriteFile does not return until another process or thread uses ReadFile to make more buffer space available.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365152%28v=vs.85%29.aspx
Let's assume I open a process with CreateProcess, then use WaitForSingleObject to wait until the process exits. Will the process ever exit if it exceeds the buffer size of its standard output pipe?
WaitForSingleObject on a process with redirected output is indeed a deadlock. You need to keep the output pipe drained in order to let the child process run to completion.
Generally you would use overlapped I/O on the pipe and then WaitForMultipleObjects on the handle pair1 (process handle, pipe read event handle) in a loop until the process handle becomes signaled.
Raymond Chen wrote about the scenario when the input is also piped:
Be careful when redirecting both a process's stdin and stdout to pipes, for you can easily deadlock
1 As Hans commented, there can be more than one output stream. stdout, stderr are typical, even more are possible through handle inheritance. Drain all the pipes coming out of the process.

how to communicate to a program with another external program

I'm trying to write to stdin and read from stdout ( and stderr ) from an external program, without changing the code.
I've tried using named pipes, but stdout doesn't show until the program is terminated and stdin only works on the first input( then cin is null ).
i've tried using /proc/[pid]/fd but that only writes and reads from the terminal and not the program.
i've tried writing a character device file for this and it worked, but only one program at a time ( this needs to work for multiple programs at a time ).
at this point, to my knowledge, I could write the driver that worked to multiplex the io across multiple programs but I don't think that's the "right" solution.
the main purpose of this is to view a feed of a program through a web interface. I'm sure there has to be someway to do this. is there anything I haven't tried that's been done before?
The typical way of doing this is:
Create anonymous pipes (not named pipes) with the pipe(2) system call for the new process's standard streams
Call fork(2) to spawn the child process
close(2) the appropriate ends of the pipes in both the parent and the child (e.g. for the stdin pipe, close the read end in the parent and close the write end in the child; vice-versa for the stdout and stderr pipes)
Use dup2(2) in the child to copy the pipe file descriptors onto file descriptors 0, 1, and 2, and then close(2) the remaining old descriptors
exec(3) the external application in the child process
In the parent process, simultaneously write to the child's stdin pipe and read from the child's stdout and stderr pipes. However, depending on how the child behaves, this can easily lead to deadlock if you're not careful. One way to avoid deadlock is to spawn separate threads to handle each of the 3 streams; another way is to use the select(2) system call to wait until one of the streams can be read from/written to without blocking, and then process that stream.
Even if you do this all correctly, you may still not see your program's output right away. This is typically due to buffering stdout. Normally, when stdout is going to a terminal, it's line-buffered—it gets flushed after every newline gets written. But when stdout is a pipe (or anything else that's not a terminal, like a file or a socket), it's fully buffered, and it only gets written to when the program has outputted a full buffer's worth of data (e.g. 4 KB).
Many programs have command line options to change their buffering behavior. For example, grep(1) has the --line-buffered flag to force it to line-buffer its output even when stdout isn't a terminal. If your external program has such an option, you should probably use it. If not, it's still possible to change the buffering behavior, but you have to use some sneaky tricks—see this question and this question for how to do that.

Should data to the pipe be written at a time?

I have a scenario where two pipes are used for IPC between child and parent. The child process uses execvp to execute a remote program. The parent process takes care of writing data to the pipe. The remote programs stdin is duplicated to read end of one pipe. To the same pipe parent writes data at the write end. The remote program has a simple getchar() in one of the functions that is called twice in the remote program's main function.
The parent writes data in the following sequence.
writes data to the pipe. Closes all the required handles. (say wrote 1)
after some time writes data again to the pipe. Closes handle (say wrote 2)
The getchar in the remote program reads "1" in the proper fashion. But the problem comes while reading "2". The getchar is reading garbage values.
I have debugged using GDB and the program exits normally. No "signals" are raised while debugging.
I have used the fork(), dup2() and pipe() functions and need to stick to it.

Resources