How to buffer and delay printf() output? - C

I wrote a C program that contains many printf() calls which write log information to stdout. Now I want to run multiple instances of the program simultaneously with different arguments, and redirect the output from stdout to a log file using >.
But since the processes run at the same time, their log output overlaps, which can be confusing for future analysis.
One solution: considering that different processes will exit at different times, modify the C program so that each log message is first written to a temporary file; when the program is about to exit, read the temporary file back and write its contents to stdout. But this requires a lot of modification.
My idea is: I hope that in the C program, all the printf() output can be buffered, and written to stdout (or the redirection target) only when the process exits.
Is this possible or not?
Thanks!

This is not really possible unless you are sure that the output is reasonably bounded (e.g. the total output is less than a few megabytes); otherwise, use a logging mechanism which sends to some central logger (like syslog).
On Linux and most POSIX systems, the simplest way to do logging is to use syslog(3), which is designed for logging (and is able to deal with different processes). I think this is the preferable approach.
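A minimal sketch of that route (the identifier "myprog" is just a placeholder):
#include <syslog.h>

int main(void) {
    openlog("myprog", LOG_PID, LOG_USER);  /* tag each entry with the PID */
    syslog(LOG_INFO, "log line %d", 42);   /* instead of printf */
    closelog();
    return 0;
}
Because the syslog daemon serializes messages arriving from all processes, the interleaving problem goes away.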
With GNU libc, you could consider using open_memstream(3) to write to memory (here you need to be sure the total output is bounded) and use atexit(3) to have the memory stream written to some file at program exit; you probably want to use some locking mechanism like flock(2), etc.
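For instance, a minimal sketch of that idea (assuming the total output stays small; the flock locking is left out for brevity):
#include <stdio.h>
#include <stdlib.h>

static char *logbuf;
static size_t logsize;
static FILE *logstream;

/* atexit handler: closing the memory stream finalizes logbuf and
   logsize, then the whole log is written out in one piece. */
static void dump_log(void) {
    fclose(logstream);
    fwrite(logbuf, 1, logsize, stdout);
    free(logbuf);
}

int main(void) {
    logstream = open_memstream(&logbuf, &logsize);
    atexit(dump_log);
    fprintf(logstream, "log line %d\n", 42);  /* instead of printf */
    return 0;
}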
As commented by J.Holetzeck, the simplest way is to redirect output into different files (perhaps using freopen(3), or simply in the invoking shell), and later merge these files.
I'm guessing you use Linux, or some POSIX system. For Windows, I have no idea.

Related

Is there a way to set up a Linux pipe to non-buffering or line-buffering?

My program is controlling an external application on Linux, passing input commands via a pipe to the external application's stdin, and reading output results via a pipe from the external application's stdout.
The problem is that writes to pipes are buffered by block, not by line, so delays occur before my app receives data output by the external application. The external application cannot be altered to add explicit fflush() calls.
When I set the external application to /bin/cat -n (which echoes back the input with line numbers added), it works correctly; it seems cat flushes after each line. The only way to force the external application to flush is to send it the exit command: as it receives the command, it flushes, and all the answers appear on stdout just before it exits.
I'm pretty sure that Unix pipes are an appropriate solution for this kind of interprocess communication (pseudo server-client), but maybe I'm wrong.
(I've just copied some text from a similar question: Force another program's standard output to be unbuffered using Python)
Don't use a pipe. Use a pty instead. Ptys (pseudo-ttys) have the benefit of being line-buffered if you want it, which provides you with simple framing for your data stream.
Using a PTY may be overkill for the problem at hand (although it will work).
If the "target application" (the Delphi command-line utility) is dynamically linked, a possibly much simpler solution is to interpose (via LD_PRELOAD) a small library into the application. That library simply needs to implement isatty, and answer true (return 1) regardless of whether the output is going to a pipe or a terminal. You may wish to do that for all file descriptors, or just for STDOUT_FILENO.
Most UNIX implementations will call isatty to decide whether to do full buffering or line buffering for a given file descriptor.
Hmm, glibc doesn't. It calls __fxstat, and then only calls isatty if the status indicates that fd is going to a character device. So you'll need to interpose both __fxstat and isatty. More on library interposition here.
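A minimal sketch of such an interposing library (just the isatty half; the __fxstat half is left out):
/* isatty_override.c - claim every fd is a terminal so the target
   program chooses line buffering. Build and run, with hypothetical
   names:
     gcc -shared -fPIC -o isatty_override.so isatty_override.c
     LD_PRELOAD=./isatty_override.so target_app | consumer */
#include <unistd.h>

int isatty(int fd) {
    (void)fd;
    return 1;
}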
By default, standard input and standard output are fully buffered, unless they are connected to an interactive device, in which case they are line-buffered [1]. Pipes are non-interactive devices. PTYs are interactive devices. "Fully buffered" means "use a chunk of memory of a certain size".
I'm sure you want line buffering. Therefore using a master/slave PTY instead of pipes should bring the controlled application into the right buffering mode automatically.
[1] see "stdin(3)" and "setbuf(3)" for details.
Why doesn't calling fflush suitably (on the write side) work for you?
You can use poll (or other syscalls like ppoll, pselect, select) to check availability of input on the read side.
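For the read side, a small sketch (assuming fd is the read end of the pipe):
#include <poll.h>
#include <unistd.h>

/* Wait up to one second for data, then read whatever has arrived.
   Returns the byte count, 0 on timeout, or -1 on error. */
ssize_t read_with_timeout(int fd, char *buf, size_t len) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, 1000);  /* timeout in milliseconds */
    if (ready > 0 && (pfd.revents & POLLIN))
        return read(fd, buf, len);
    return ready;
}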
If the external application is using <stdio.h> without calling fflush appropriately (or without setbuf arranging for flushing to happen on newlines), data can remain inside its FILE* buffer without even being sent (with a write syscall) to the pipe!
An application can detect if its output is a terminal with e.g. isatty. But it should ensure that flushing happens...
As Michael Dillon suggested, using ptys is probably the best approach. But it is hard (I forget the gory details).

How does a pipe work in Linux?

How does piping work? If I run a program via the CLI and redirect its output to a file, will I be able to pipe that file into another program as it is being written?
Basically, when one line is written to the file I would like it to be piped immediately to my second application (I am trying to dynamically draw a graph off an existing program). I'm just unsure whether piping completes the first command before moving on to the next command.
Any feedback would be greatly appreciated!
If you want to redirect the output of one program into the input of another, just use a simple pipeline:
program1 arg arg | program2 arg arg
If you want to save the output of program1 into a file and pipe it into program2, you can use tee(1):
program1 arg arg | tee output-file | program2 arg arg
All programs in a pipeline run simultaneously. Most programs typically use blocking I/O: when they try to read their input and nothing is there, they block: that is, they stop, and the operating system de-schedules them until more input becomes available (to avoid eating up the CPU). Similarly, if a program earlier in the pipeline writes data faster than a later program can read it, the pipe's buffer eventually fills up and the writer blocks: the OS de-schedules it until the pipe's buffer is emptied by the reader, and then it can continue writing.
EDIT
If you want to use the output of program1 as the command-line parameters, you can use the backquotes or the $() syntax:
# Runs "program1 arg", and uses the output as the command-line arguments for
# program2
program2 `program1 arg`
# Same as above
program2 $(program1 arg)
The $() syntax should be preferred, since it is clearer and can be nested.
Piping does not complete the first command before running the second. Unix (and Linux) pipes run all commands concurrently. A command will be suspended if:
It is starved for input.
It has produced significantly more output than its successor is ready to consume.
For most programs, output is buffered, which means that the OS accumulates a substantial amount of output (perhaps 8000 characters or so) before passing it on to the next stage of the pipeline. This buffering is used to avoid too much switching back and forth between processes and the kernel.
If you want output on a pipeline to be sent right away, you can use unbuffered I/O, which in C means calling something like fflush() to be sure that any buffered output is immediately sent on to the next process. Unbuffered input is also possible but is generally unnecessary, because a process that is starved for input typically does not wait for a full buffer but will process any input it can get.
For typical applications, unbuffered output is not recommended; you generally get the best performance with the defaults. In your case, however, where you want to update the graph as soon as the first process has the info available, you definitely want to be using unbuffered output. If you're using C, calling fflush(stdout) whenever you want output sent will be sufficient.
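For the graphing case, the writer side might look like this sketch (emit_point is a hypothetical helper):
#include <stdio.h>

/* Emit one data point per line and flush immediately, so the grapher
   at the other end of the pipe sees it right away instead of after
   several kilobytes have accumulated. */
void emit_point(double x, double y) {
    printf("%f %f\n", x, y);
    fflush(stdout);
}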
If your programs are communicating using stdin and stdout, then make sure that you are either calling fflush(stdout) after you write, or find some way to disable standard IO buffering. The best references I can think of that really describe how to best implement pipelines in C/C++ are Advanced Programming in the UNIX Environment and UNIX Network Programming: Volume 2. You could probably start with this article as well.
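One way to disable stdio buffering without sprinkling fflush calls is setvbuf; a minimal sketch:
#include <stdio.h>

int main(void) {
    /* _IOLBF = line-buffered; use _IONBF for fully unbuffered */
    setvbuf(stdout, NULL, _IOLBF, 0);
    printf("each line now reaches the pipe immediately\n");
    return 0;
}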
If your two programs insist on reading and writing to files and do not use stdin/stdout, you may find you can use a named pipe instead of a file.
Create a named pipe with the mknod(1) command:
$ mknod /tmp/named-pipe p
Then configure your programs to read and write to /tmp/named-pipe (use whatever path/name you feel is appropriate).
In this case, both programs will run in parallel, blocking as necessary when the pipe becomes full/empty as described in the other answers.
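A minimal sketch of the writer side, reusing the /tmp/named-pipe path from the mknod example:
#include <stdio.h>

/* Opening a FIFO for writing blocks until a reader opens the other
   end, so the two programs naturally rendezvous. */
int main(void) {
    FILE *out = fopen("/tmp/named-pipe", "w");
    if (out == NULL)
        return 1;
    fprintf(out, "one line of data\n");
    fflush(out);  /* push each line through the FIFO right away */
    fclose(out);
    return 0;
}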

How can I capture another process's output using C?

How can I capture another process's output using pure C? Can you provide sample code?
EDIT: let's assume Linux. I would be interested in "pretty portable" code. All I want to do is to execute a command, capture its output, and process it in some way.
There are several options, but it does somewhat depend on your platform. That said, popen should work in most places, e.g.
#include <stdio.h>

int main(void) {
    FILE *stream = popen("acommand", "r");
    if (stream == NULL)
        return 1;
    /* use fread, fgets, etc. on stream */
    pclose(stream);
}
Note that this has a very specific use: it creates the process by running the command acommand and attaches its standard output in such a way as to make it accessible from your program through the stream FILE*.
If you need to connect to an existing process, or need to do richer operations, you may need to look into other facilities. Unix has various mechanisms for hooking up a process's stdout, etc.
Under Windows you can use the CreateProcess API to create a new process and hook up its standard output handle to what you want. Windows also supports popen (as _popen in the Microsoft C runtime).
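A compressed sketch of that Windows approach (error handling omitted; "acommand" is a placeholder):
#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE rd, wr;
    SECURITY_ATTRIBUTES sa = { sizeof sa, NULL, TRUE };  /* inheritable */
    CreatePipe(&rd, &wr, &sa, 0);
    SetHandleInformation(rd, HANDLE_FLAG_INHERIT, 0);  /* keep read end local */
    STARTUPINFO si = { sizeof si };
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdOutput = wr;
    si.hStdError = wr;
    PROCESS_INFORMATION pi;
    char cmd[] = "acommand";  /* CreateProcess may modify this buffer */
    CreateProcess(NULL, cmd, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi);
    CloseHandle(wr);  /* the child now owns the write end */
    char buf[4096];
    DWORD n;
    while (ReadFile(rd, buf, sizeof buf, &n, NULL) && n > 0)
        fwrite(buf, 1, n, stdout);  /* process the child's output */
    return 0;
}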
There's no plain C way to do this that I know of, though, so it's always going to be somewhat dependent on platform-specific APIs.
Based on your edits, popen seems ideal: it is "pretty portable" (I don't think there's a Unix-like OS without it; indeed, it is part of the Single Unix Specification and POSIX), and it lets you do exactly what you want: execute a process, grab its output, and process it.
If you can use system pipes, simply pipe the other process's output to your C program, and in your C program, just read the standard input.
otherprocess | your_c_program
Which OS are you using? On *nix-type OSes, if your process is outputting to STDOUT or STDERR, you can obviously use pipes.

Is glibc's implementation of fprintf() thread-safe?

Is fprintf thread-safe? The glibc manual seems to say it is, but my application, which writes to a file using a single call to fprintf(), seems to be intermingling partial writes from different processes.
edit: To clarify, the program in question is a lighttpd plugin, and the server is running with multiple worker threads.
Looking at the file, some of the writes are intermingled.
edit 2: It seems the problem I'm seeing might be due to lighttpd's "worker threads" actually being separate processes: http://redmine.lighttpd.net/wiki/lighttpd/Docs:MultiProcessor
Problems
By running 2 or more processes on the same socket you will have a better concurrency, but will have a few drawbacks that you have to be aware of:
mod_accesslog might create broken access logs, as the same file is opened twice and is NOT synchronized.
mod_status will have n separate counters, one set for each process.
mod_rrdtool will fail as it receives the same timestamp twice.
mod_uploadprogress will not show correct status.
You're confusing two concepts: writing from multiple threads and writing from multiple processes.
Inside a process it's possible to ensure that one invocation of fprintf completes before the next is allowed access to the output buffer, but once your app pumps that output to a file you're at the mercy of the OS. Without some kind of OS-based locking mechanism, you can't ensure that an entirely different application doesn't write to your log file.
Sounds to me like you need to read up on file locking. The problem you have is that multiple processes (i.e. not threads) are writing to the same file simultaneously, and there is no reliable way to ensure the writes will be atomic. This can result in processes overwriting each other's writes, mixed output, and altogether non-deterministic behaviour.
This has nothing to do with thread safety, which is relevant only in single-process multithreaded programs.
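A sketch of that approach using flock(2) (advisory locking, so every writer must cooperate):
#include <stdio.h>
#include <sys/file.h>

/* Take an exclusive lock around each log write so writes from
   different processes cannot interleave. */
void locked_log(FILE *log, const char *msg) {
    int fd = fileno(log);
    flock(fd, LOCK_EX);  /* blocks until we own the lock */
    fprintf(log, "%s\n", msg);
    fflush(log);         /* flush while still holding the lock */
    flock(fd, LOCK_UN);
}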
The current C++ standard says nothing useful about concurrency, nor does the 1990 C standard. (I haven't read the 1999 C standard, so can't comment on it; the upcoming C++0x standard does say things, but I don't know exactly what offhand.)
This means that fprintf() itself is likely neither thread-safe nor otherwise, and that it would depend on the implementation. I'd read exactly what the glibc documentation says about it, and compare it to exactly what you're doing.

How to capture unbuffered output from stdout without modifying the program?

I'm writing a utility for running programs, and I need to capture unbuffered stdout and stderr from the programs. I need to:
Capture stdout and stderr to separate files.
Output needs to not be buffered (or be line buffered).
Without modifying the source of the program being run.
The problem is, when piping output to a file, the stdout stream becomes block buffered rather than line buffered. If the program crashes, the output never gets flushed, and is blank. So I need to capture stdout without buffering (or with line buffering).
I think this can be done with ptys, but I'm having difficulty finding any examples that do exactly what I want (most ignore stderr). In fact, I'm not sure I've found any pty examples in C at all; most use a higher-level interface like Python's pty and subprocess modules.
Can anyone help (with code snippets or links)? Any help would be appreciated.
EDIT: I think I've solved it. The following two links were pretty helpful.
http://publib.boulder.ibm.com/infocenter/zos/v1r10/index.jsp?topic=/com.ibm.zos.r10.bpxbd00/posixopenpt.htm
http://www.gidforums.com/t-3369.html
My code is available as a repository:
https://bitbucket.org/elliottslaughter/pty
see man 7 pty
In particular:
Unix 98 pseudo-terminals
An unused Unix 98 pseudo-terminal master is opened by calling posix_openpt(3). (This function opens the master clone device, /dev/ptmx; see pts(4).) After performing any program-specific initializations, changing the ownership and permissions of the slave device using grantpt(3), and unlocking the slave using unlockpt(3), the corresponding slave device can be opened by passing the name returned by ptsname(3) in a call to open(2).
And now that you know the names of the library functions such code will need to call, you can do two useful things:
Look up their man pages
Google for example code. Since you know what keywords to use with the search engine, I suspect you will have much more luck hunting down examples.
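For reference, a rough sketch of the master-side setup described in the quoted text (error handling reduced to a bare exit):
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0)
        exit(1);
    /* A child process would open this slave device as its stdout and
       stderr; this process then reads the line-buffered output back
       through the master fd. */
    printf("slave device: %s\n", ptsname(master));
    return 0;
}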
