How does a pipe work in Linux? - c

How does piping work? If I run a program via the CLI and redirect its output to a file, will I be able to pipe that file into another program as it is being written?
Basically, when one line is written to the file I would like it to be piped immediately to my second application (I am trying to dynamically draw a graph off an existing program). I'm just unsure whether piping completes the first command before moving on to the next command.
Any feedback would be greatly appreciated!

If you want to redirect the output of one program into the input of another, just use a simple pipeline:
program1 arg arg | program2 arg arg
If you want to save the output of program1 into a file and pipe it into program2, you can use tee(1):
program1 arg arg | tee output-file | program2 arg arg
All programs in a pipeline are run simultaneously. Most programs use blocking I/O: when they try to read their input and nothing is there, they block; that is, they stop, and the operating system de-schedules them until more input becomes available (to avoid wasting CPU). Similarly, if a program earlier in the pipeline writes data faster than a later program can read it, eventually the pipe's buffer fills up and the writer blocks: the OS de-schedules it until the reader drains the pipe's buffer, and then it can continue writing again.
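To make this concrete, here is a minimal sketch that fills a pipe's buffer and reports the point at which a blocking writer would be suspended (on Linux the buffer is typically 64 KiB):
/* pipe-fill.c - a sketch: fill a pipe's kernel buffer to see the point at
 * which a blocking writer would be de-scheduled. O_NONBLOCK makes write()
 * fail with EAGAIN instead of blocking, so we can count the bytes. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    fcntl(fds[1], F_SETFL, O_NONBLOCK);  /* observe the limit, don't block */

    char byte = 'x';
    long written = 0;
    while (write(fds[1], &byte, 1) == 1)
        written++;

    if (errno == EAGAIN)
        printf("pipe buffer full after %ld bytes; a blocking writer would "
               "now be suspended until the reader catches up\n", written);
    return 0;
}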
EDIT
If you want to use the output of program1 as command-line arguments for another program, you can use backquotes or the $() syntax:
# Runs "program1 arg", and uses the output as the command-line arguments for
# program2
program2 `program1 arg`
# Same as above
program2 $(program1 arg)
The $() syntax should be preferred, since it is clearer and it can be nested.

Piping does not complete the first command before running the second. Unix (and Linux) piping runs all commands concurrently. A command will be suspended if
It is starved for input.
It has produced significantly more output than its successor is ready to consume.
For most programs, output is buffered, which means that the C library accumulates a substantial amount of output (perhaps 8000 characters or so) before passing it on to the next stage of the pipeline. This buffering is used to avoid too much switching back and forth between processes and the kernel.
If you want output on a pipeline to be sent right away, you can use unbuffered I/O, which in C means calling something like fflush() to ensure that any buffered output is immediately sent on to the next process. Unbuffered input is also possible but is generally unnecessary, because a process that is starved for input typically does not wait for a full buffer but will process whatever input it can get.
For typical applications unbuffered output is not recommended; you generally get the best performance with the defaults. In your case, however, where you want to do dynamic graphing as soon as the first process has the data available, you definitely want unbuffered output. If you're using C, calling fflush(stdout) whenever you want output sent will be sufficient.
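For instance, here is a minimal sketch of such a producer (the one-line-per-second cadence is made up for illustration); run it as producer | your-graphing-program and each line arrives downstream immediately:
/* producer.c - emit one line per second and push it downstream right away */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Alternative: disable stdout buffering entirely. */
    /* setvbuf(stdout, NULL, _IONBF, 0); */
    for (int i = 0; i < 10; i++) {
        printf("data point %d\n", i);
        fflush(stdout);   /* push this line into the pipe right now */
        sleep(1);         /* simulate slow data production */
    }
    return 0;
}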

If your programs are communicating using stdin and stdout, then make sure that you either call fflush(stdout) after you write or find some way to disable standard I/O buffering. The best references I can think of that really describe how to best implement pipelines in C/C++ are Advanced Programming in the UNIX Environment and UNIX Network Programming: Volume 2. You could probably start with this article as well.

If your two programs insist on reading and writing to files and do not use stdin/stdout, you may find you can use a named pipe instead of a file.
Create a named pipe with the mknod(1) command (mkfifo(1) does the same thing):
$ mknod /tmp/named-pipe p
Then configure your programs to read and write to /tmp/named-pipe (use whatever path/name you feel is appropriate).
In this case, both programs will run in parallel, blocking as necessary when the pipe becomes full/empty as described in the other answers.
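For illustration, the writer's side might look like this in C (the path matches the mknod example above; the reader simply opens the same path for reading, and the open blocks until both ends are present):
/* fifo-writer.c - a minimal sketch: write one line through the named pipe */
#include <stdio.h>

int main(void) {
    FILE *out = fopen("/tmp/named-pipe", "w");  /* blocks until a reader opens */
    if (!out) { perror("fopen"); return 1; }
    fprintf(out, "hello through the fifo\n");
    fclose(out);
    return 0;
}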

Related

Capturing stdout/stderr separately and simultaneously from child process results in wrong total order (libc/unix)

I'm writing a library that should execute a program in a child process, capture the output, and make the output available line by line (as string vectors). There is one vector for STDOUT, one for STDERR, and one for "STDCOMBINED", i.e. all output in the order it was printed by the program. The child process is connected via two pipes to the parent process: one pipe for STDOUT and one for STDERR. In the parent process I read from the read ends of the pipes; in the child process I dup2()'ed STDOUT/STDERR to the write ends of the pipes.
My problem:
I'd like to capture STDOUT, STDERR, and "STDCOMBINED" (= both in the order they appeared). But the order in the combined vector is different from the original order.
My approach:
I iterate until both pipes show EOF and the child process has exited. At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR. This works so far. But when I capture the lines as they arrive in the parent process, the order of STDOUT and STDERR is not the same as when I execute the program in a shell and look at the output.
Why is this so and how can I fix this? Is this possible at all? I know in the child process I could redirect STDOUT and STDERR both to a single pipe but I need STDOUT and STDERR separately, and "STDCOMBINED".
PS: I'm familiar with libc/unix system calls, like dup2(), pipe(), etc. Therefore I didn't post code. My question is about the general approach and not a coding problem in a specific language. I'm doing it in Rust against the raw libc bindings.
PPS: I made a simple test program, that has a mixup of 5 stdout and 5 stderr messages. That's enough to reproduce the problem.
At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR.
This is the problem. This will only reproduce the correct order if the child really did write exactly one STDOUT line and one STDERR line per iteration, alternating.
You need to capture the asynchronous nature of the beast: make your pipe endpoints non-blocking, select* on the pipes, and read whatever data is present as soon as select returns. Then you'll capture the correct order of the output. Of course now you can't read "exactly one line": you'll have to read whatever data is available and no more, so that you won't block, and maintain a per-pipe buffer where you append new data, extract any complete lines that are present, move the unprocessed remainder to the beginning, and repeat. You could also use a circular buffer to save a little memcpy-ing, but that's probably not very important.
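As a rough, single-file illustration of that loop (plain select(2), a throwaway sh -c child to generate mixed output, and direct printing where real code would maintain the per-pipe line buffers):
/* capture.c - a sketch of the select() loop: watch both pipes and read
 * whatever is present, as soon as it is present. */
#include <stdio.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int out[2], err[2];
    if (pipe(out) == -1 || pipe(err) == -1) { perror("pipe"); return 1; }

    if (fork() == 0) {                       /* child: the test program */
        dup2(out[1], 1);
        dup2(err[1], 2);
        close(out[0]); close(out[1]); close(err[0]); close(err[1]);
        execlp("sh", "sh", "-c", "echo o1; echo e1 >&2; echo o2", (char *)NULL);
        _exit(127);
    }
    close(out[1]);
    close(err[1]);

    int out_fd = out[0], err_fd = err[0];
    char buf[4096];

    while (out_fd >= 0 || err_fd >= 0) {
        fd_set rset;
        FD_ZERO(&rset);
        if (out_fd >= 0) FD_SET(out_fd, &rset);
        if (err_fd >= 0) FD_SET(err_fd, &rset);
        int maxfd = out_fd > err_fd ? out_fd : err_fd;

        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0) break;

        int *watched[2] = { &out_fd, &err_fd };
        const char *tag[2] = { "OUT", "ERR" };
        for (int i = 0; i < 2; i++) {
            if (*watched[i] >= 0 && FD_ISSET(*watched[i], &rset)) {
                ssize_t n = read(*watched[i], buf, sizeof buf);
                if (n <= 0) {                /* EOF: stop watching this pipe */
                    close(*watched[i]);
                    *watched[i] = -1;
                } else {
                    /* real code: append to a per-pipe buffer, extract lines */
                    printf("[%s] %.*s", tag[i], (int)n, buf);
                }
            }
        }
    }
    wait(NULL);
    return 0;
}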
Since you're doing this in Rust, I presume there's already a good asynchronous reactor pattern that you could leverage (I'm spoiled by Go, I guess, and project my hopes onto the unsuspecting).
*Always prefer platform-specific higher-performance primitives like epoll on Linux, /dev/poll on Solaris, pollset &c. on AIX
Another possibility is to launch the target process with LD_PRELOAD and a dedicated library that takes over glibc's POSIX write, detects writes to the pipes, and encapsulates such writes (and only those) in a packet by prepending a header that carries an (atomically updated) process-wide incrementing counter, as well as the size of the write. Such headers can be easily decoded on the other end of the pipe to reorder the writes with a higher chance of success.
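A sketch of that interposer, with an invented header layout and a simplistic "is this one of our pipes" check (here just fds 1 and 2); build with something like gcc -shared -fPIC seqwrite.c -o seqwrite.so -ldl and run the target under LD_PRELOAD=./seqwrite.so:
/* seqwrite.c - a sketch of the LD_PRELOAD write interposer. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

static atomic_uint_fast64_t seq;    /* process-wide packet counter */

ssize_t write(int fd, const void *buf, size_t count) {
    static ssize_t (*real_write)(int, const void *, size_t);
    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))
                         dlsym(RTLD_NEXT, "write");

    if (fd != 1 && fd != 2)         /* pass everything else straight through */
        return real_write(fd, buf, count);

    /* One writev() keeps header and payload together; packets up to
     * PIPE_BUF bytes are written atomically into a pipe. */
    uint64_t hdr[2] = { atomic_fetch_add(&seq, 1), count };
    struct iovec iov[2] = { { hdr, sizeof hdr }, { (void *)buf, count } };
    ssize_t n = writev(fd, iov, 2);
    return n < 0 ? n : (ssize_t)count;  /* report payload size to the caller */
}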
I think it's not possible to strictly do what you want to do.
If you think about how it's done when running a command in an interactive shell, what happens is that both stdout and stderr point to the same file descriptor (the TTY), so the total ordering is correct by means of synchronization against the same file.
To illustrate, imagine what happens if the child process has 2 completely independent threads, one writing only to stderr and the other writing only to stdout. The total ordering would depend on however the scheduler decided to schedule these threads, and if you wanted to capture that, you'd need to synchronize those threads against something.
And of course, something can write thousands of lines to stdout before writing anything to stderr.
There are 2 ways to relax your requirements into something workable:
Have the user pass a flag waiving separate stdout and stderr streams in favor of a correct stdcombined, and then redirect both to a single file descriptor (a minimal sketch follows this list). You might need to change the buffering settings (like stdbuf does) before you execute the process.
Assume that stdout and stderr are "reasonably interleaved", an assumption pointed out by @Nate Eldredge, in which case you can use @Unslander Monica's answer.
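Here is the sketch for the first option; both streams share one pipe, so the kernel serializes the writes and the combined order is correct (the sh -c child just generates mixed output for demonstration):
/* combined.c - point both stdout and stderr of a child at one pipe. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    if (fork() == 0) {                  /* child */
        dup2(fds[1], 1);                /* stdout -> pipe */
        dup2(fds[1], 2);                /* stderr -> the same pipe */
        close(fds[0]);
        close(fds[1]);
        execlp("sh", "sh", "-c", "echo out; echo err >&2", (char *)NULL);
        _exit(127);
    }

    close(fds[1]);
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* "stdcombined", in order */
    wait(NULL);
    return 0;
}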

How to buffer and delay printf() output?

I wrote a C program with many printf() calls that output log information to stdout. Now I want to use multiple processes to run the program simultaneously with different arguments, and I want to redirect the output from stdout to a log file using >.
But since multiple processes are running at the same time, their log output overlaps, which can be confusing for future analysis.
One solution: considering that different processes will exit at different times, modify the C program so that each log message is temporarily written to a temporary file; when the C program is about to exit, read from the temporary file and write its contents to stdout. But this requires a lot of modification.
My idea: I hope that in the C program all the printf() output can be buffered, and the output sent to stdout (or the redirection target) only when the process exits.
Is this possible or not?
Thanks!
This is not really possible unless you are sure that the output is reasonably bounded (e.g. the total output is less than a few megabytes); otherwise, use a logging mechanism which sends to some central logger (like syslog).
On Linux and most Posix systems, the simplest way to do logging would be to use syslog(3) which is designed for logging (and is able to deal with different processes). I think this is the preferable approach.
With GNU libc, you could consider using open_memstream(3) to write to memory (here you need to be sure the total output is bounded), and use atexit(3) to have the memory stream written to some file when the program exits; you probably want to use some locking mechanism like flock(2).
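A minimal sketch of that combination (the log path is made up, and the program's existing printf() calls would have to be pointed at the memory stream, e.g. fprintf(mem, ...)):
/* memlog.c - log into a memory stream, write it out in one piece at exit;
 * flock() keeps blocks from different processes intact. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>

static FILE *mem;
static char *log_buf;
static size_t log_len;

static void dump_log(void) {
    fclose(mem);                         /* finalizes log_buf and log_len */
    FILE *f = fopen("/tmp/prog.log", "a");
    if (!f) return;
    flock(fileno(f), LOCK_EX);
    fwrite(log_buf, 1, log_len, f);
    flock(fileno(f), LOCK_UN);
    fclose(f);
    free(log_buf);
}

int main(void) {
    mem = open_memstream(&log_buf, &log_len);
    if (!mem) return 1;
    atexit(dump_log);

    fprintf(mem, "this line is buffered in memory\n");
    fprintf(mem, "and hits the log file only at exit\n");
    return 0;
}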
As commented by J.Holetzeck the simplest way is to redirect output into different files (perhaps using freopen(3), or simply in the invoking shell), and later merge these files.
I'm guessing you use Linux, or some Posix system. For Windows, I have no idea.

Replace pipe-shellscript with C-program

I have the following Bash script:
cat | command1 | command2 | command3
The commands never change.
For performance reasons, I want to replace it with a small C program that runs the commands and creates and assigns the pipes accordingly.
Is there a way to do that in C?
As others said, you probably won't get a significant performance benefit.
It's reasonable to assume that the commands you run take most of the time, not the shell script gluing them together, so even if the glue becomes faster, it will change almost nothing.
Having said that, if you want to do it, you should use the fork(), pipe(), dup2() and exec() functions.
fork will give you multiple processes.
pipe will give you a pair of file descriptors: what you write into one, you can read from the other.
dup2 can be used to change file descriptor numbers. You can take one side of a pipe and make it become file descriptor 1 (stdout) in one process, and the other side you'll make file descriptor 0 (stdin) in another (don't forget to close the normal stdin and stdout first).
exec (or one of its variants) will be used to execute the programs.
There are lots of details to fill in. Have fun.
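To make the recipe concrete, here is a minimal two-stage sketch, equivalent to "ls | wc -l" (ls and wc stand in for the real commands; a third stage means one more pipe and one more fork):
/* mini-pipeline.c - a two-command pipeline using fork/pipe/dup2/exec. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    if (fork() == 0) {                   /* first child: the producer */
        dup2(fds[1], 1);                 /* its stdout feeds the pipe */
        close(fds[0]);
        close(fds[1]);
        execlp("ls", "ls", (char *)NULL);
        _exit(127);
    }

    if (fork() == 0) {                   /* second child: the consumer */
        dup2(fds[0], 0);                 /* its stdin reads from the pipe */
        close(fds[0]);
        close(fds[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        _exit(127);
    }

    close(fds[0]);                       /* parent must close both ends, or */
    close(fds[1]);                       /* the consumer never sees EOF */
    while (wait(NULL) > 0)
        ;
    return 0;
}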
Here is an example that does pretty much this.
There is no performance benefit for the processing itself, just a couple of milliseconds in initialization. Obviously we don't know the context in which you're doing this, but just using dash instead of bash would probably have gotten you 80% of those milliseconds from a single character change in your #!

Capturing program output

I am making a small library that will basically capture the standard output of a program (such as printf()) into a separate process/thread. This process should then perform certain tasks (let's say write these captured outputs to a file). I am just beginning to do serious C programming, so I am still learning.
I wanted to know the best way to do this, i.e. whether to use a process or a thread, and how to capture these printf() statements. Also, this library must handle any child processes spawned by the program. The general assumption is that the program using it is threaded, so what sort of approach should I take?
If you want your program or library to launch the program and capture its output, look at popen(3). It will give you a FILE pointer where you can read the output from the program.
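A minimal sketch of that approach (the command ls -l is just a stand-in for the program being captured):
/* capture-popen.c - run a command and read its stdout line by line. */
#include <stdio.h>

int main(void) {
    FILE *p = popen("ls -l", "r");
    if (!p) { perror("popen"); return 1; }

    char line[512];
    while (fgets(line, sizeof line, p))
        printf("captured: %s", line);    /* log, parse, forward, ... */

    pclose(p);
    return 0;
}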
The easiest way to capture the STDOUT of another program is to simply pipe it into the STDIN of your program (via the command-line "|" operator). So basically, in your C library, you would just read from STDIN with scanf, fgets, or whatever STDIN function you're using.
This is a pretty standard convention in the Unix/Linux world - programs read from STDIN and write to STDOUT in some well-formatted way, so that you can pipeline different programs together by simply adding pipes to the command line, e.g.:
grep "somestring" file1 file2 file3 | cut -d, -f1 | sort | uniq

How to capture unbuffered output from stdout without modifying the program?

I'm writing a utility for running programs, and I need to capture unbuffered stdout and stderr from the programs. I need to:
Capture stdout and stderr to separate files.
Output needs to not be buffered (or be line buffered).
Without modifying the source of the program being run.
The problem is, when piping output to a file, the stdout stream becomes block buffered rather than line buffered. If the program crashes, the output never gets flushed, and is blank. So I need to capture stdout without buffering (or with line buffering).
I think this can be done with pty's but I'm having difficulty finding any examples that do exactly what I want (most ignore stderr). In fact, I'm not sure I've found any pty examples in C at all; most use a higher-level interface like Python's pty and subprocess modules.
Can anyone help (with code snippets or links)? Any help would be appreciated.
EDIT: I think I've solved it. The following two links were pretty helpful.
http://publib.boulder.ibm.com/infocenter/zos/v1r10/index.jsp?topic=/com.ibm.zos.r10.bpxbd00/posixopenpt.htm
http://www.gidforums.com/t-3369.html
My code is available as a repository:
https://bitbucket.org/elliottslaughter/pty
see man 7 pty
In particular:
Unix 98 pseudo-terminals
An unused Unix 98 pseudo-terminal master is opened by calling posix_openpt(3). (This function opens the master clone device, /dev/ptmx; see pts(4).) After performing any program-specific initializations, changing the ownership and permissions of the slave device using grantpt(3), and unlocking the slave using unlockpt(3), the corresponding slave device can be opened by passing the name returned by ptsname(3) in a call to open(2).
And now that you know the names of the library functions such a code will need to call, you can do two useful things:
Look up their man pages
Google for example code. Since you know what keywords to use with the search engine I suspect you will have much more luck hunting down examples.
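For example, a condensed sketch of that recipe (it funnels both stdout and stderr through a single pty purely for brevity, and ls is just a stand-in for the program being run; because the child's stdio sees a terminal, its output stays line buffered even while being captured):
/* pty-capture.c - run a child on a Unix 98 pty and read its output. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0) {
        perror("pty setup");
        return 1;
    }

    if (fork() == 0) {                        /* child */
        int slave = open(ptsname(master), O_RDWR);
        if (slave < 0) _exit(1);
        dup2(slave, 0);
        dup2(slave, 1);
        dup2(slave, 2);                       /* stderr through the pty too */
        close(slave);
        close(master);
        execlp("ls", "ls", (char *)NULL);
        _exit(127);
    }

    char buf[256];                            /* parent: read from the master */
    ssize_t n;
    while ((n = read(master, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    return 0;
}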
