#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    FILE *fp = fopen("a.txt", "wt");
    fprintf(fp, "AAAA");
    // No flush and no close
    raise(SIGTERM);
    exit(EXIT_SUCCESS);
}
Result: no data was written to a.txt.
I expected this to be fine: the system would close the file handle, and the filesystem driver would flush the unflushed data in its close handler. But that's not what happened.
I tested this code on ext4 on Ubuntu 11.10.
Question:
I thought all filesystems had to flush unflushed data as part of their close processing.
Doesn't POSIX have such a rule?
P.S. This code worked (the data was flushed) on NTFS under Windows 7:
#include <windows.h>
#include <string.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE h = CreateFile(L"D:\\a.txt", GENERIC_READ | GENERIC_WRITE,
                          0, 0, OPEN_ALWAYS, 0, 0);
    BYTE a[3];
    memset(a, 'A', 3);
    DWORD dw;
    WriteFile(h, (PVOID)a, 3, &dw, 0);
    TerminateProcess(GetCurrentProcess(), 1); // abrupt termination, no CloseHandle()
    return 0;
}
Edit:
I tested it again with the write() system call, and the data was written to the file:
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char** argv)
{
    int fd = open("a.txt", O_CREAT|O_TRUNC|O_WRONLY, 0644); /* O_CREAT requires a mode argument */
    char buf[3];
    memset(buf, 'A', 3);
    ssize_t result = write(fd, buf, 3);
    raise(SIGTERM);
    exit(EXIT_SUCCESS);
}
This isn't anything to do with the filesystem; rather, it's the behaviour of the C implementation you are using that determines when open streams are flushed.
Under POSIX, the default action for the SIGTERM signal is:
Abnormal termination of the process. The process is terminated with all the consequences of _exit()...
_exit() is equivalent to _Exit(), and the choice of whether to flush streams is left implementation-defined by the standard:
The _Exit() and _exit() functions shall not call functions registered with atexit() nor any registered signal handlers. Whether open streams are flushed or closed, or temporary files are removed is implementation-defined...
Assuming you are using glibc on Linux, from the documentation (emphasis mine):
When a process terminates for any reason—either because the program terminates, or as a result of a signal—the following things happen:
All open file descriptors in the process are closed. See Low-Level I/O. Note that streams are not flushed automatically when the process terminates; see I/O on Streams.
I'm not familiar with Windows' WriteFile() and TerminateProcess(), so I can't comment on their documented behaviour.
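To see the glibc behaviour in isolation, here is a minimal sketch; the USE_* macros are hypothetical compile-time switches, not part of any API. Per the C standard, exit() flushes and closes open streams, while on glibc neither _exit() nor the default action of a fatal signal does:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("a.txt", "w");
    if (fp == NULL)
        return 1;
    fprintf(fp, "AAAA");       /* the data sits in the stdio buffer for now */
#if defined(USE_EXIT)
    exit(EXIT_SUCCESS);        /* flushes open streams: "AAAA" reaches a.txt */
#elif defined(USE_UNDERSCORE_EXIT)
    _exit(EXIT_SUCCESS);       /* terminates immediately: the buffered data is lost */
#else
    raise(SIGTERM);            /* default action terminates like _exit(): data lost */
#endif
}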
It doesn't have anything to do with file system drivers. The issue is that the CRT is buffering the file stream itself. You can set the buffer size with setvbuf(); a default is used if you don't call this function. There's no buffering in the application when you use WriteFile(): output is buffered in the operating system's file system cache, immune to abrupt app aborts.
You'll have to call fflush() to achieve the same.
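For the code in the question, a minimal sketch of that fix: flush the stream before raising the signal, so the data is already in the kernel's page cache when the process dies.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("a.txt", "w");
    if (fp == NULL)
        return 1;
    fprintf(fp, "AAAA");
    fflush(fp);          /* hand the buffered bytes to the kernel... */
    raise(SIGTERM);      /* ...so they survive the abrupt termination */
    exit(EXIT_SUCCESS);
}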
Related
The behaviour of printf() seems to depend on the location of stdout.
If stdout is sent to the console, then printf() is line-buffered and is flushed after a newline is printed.
If stdout is redirected to a file, the buffer is not flushed unless fflush() is called.
Moreover, if printf() is used before stdout is redirected to file, subsequent writes (to the file) are line-buffered and are flushed after newline.
When is stdout line-buffered, and when does fflush() need to be called?
Minimal example of each:
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

void RedirectStdout2File(const char* log_path) {
    int fd = open(log_path, O_RDWR|O_APPEND|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
    dup2(fd, STDOUT_FILENO);
    if (fd != STDOUT_FILENO) close(fd);
}
int main_1(int argc, char* argv[]) {
/* Case 1: stdout is line-buffered when run from console */
printf("No redirect; printed immediately\n");
sleep(10);
}
int main_2a(int argc, char* argv[]) {
/* Case 2a: stdout is not line-buffered when redirected to file */
RedirectStdout2File(argv[0]);
printf("Will not go to file!\n");
RedirectStdout2File("/dev/null");
}
int main_2b(int argc, char* argv[]) {
/* Case 2b: flushing stdout does send output to file */
RedirectStdout2File(argv[0]);
printf("Will go to file if flushed\n");
fflush(stdout);
RedirectStdout2File("/dev/null");
}
int main_3(int argc, char* argv[]) {
/* Case 3: printf before redirect; printf is line-buffered after */
printf("Before redirect\n");
RedirectStdout2File(argv[0]);
printf("Does go to file!\n");
RedirectStdout2File("/dev/null");
}
Flushing for stdout is determined by its buffering behaviour. The buffering can be set to three modes: _IOFBF (full buffering: waits until fflush() if possible), _IOLBF (line buffering: newline triggers automatic flush), and _IONBF (direct write always used). "Support for these characteristics is implementation-defined, and may be affected via the setbuf() and setvbuf() functions." [C99:7.19.3.3]
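For example, here is a minimal sketch of forcing a mode explicitly; note that setvbuf() may only be called before any other operation on the stream:

#include <stdio.h>

int main(void)
{
    /* Request full buffering with a 64 KiB buffer allocated by the library. */
    if (setvbuf(stdout, NULL, _IOFBF, 1 << 16) != 0)
        return 1;
    printf("fully buffered: this line waits in memory despite the newline\n");
    fflush(stdout);   /* an explicit flush is now needed even after '\n' */
    return 0;
}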
"At program startup, three text streams are predefined and need not be opened explicitly
— standard input (for reading conventional input), standard output (for writing
conventional output), and standard error (for writing diagnostic output). As initially
opened, the standard error stream is not fully buffered; the standard input and standard
output streams are fully buffered if and only if the stream can be determined not to refer
to an interactive device." [C99:7.19.3.7]
Explanation of observed behaviour
So, what happens is that the implementation does something platform-specific to decide whether stdout is going to be line-buffered. In most libc implementations, this test is done when the stream is first used.
Behaviour #1 is easily explained: when the stream is for an interactive device, it is line-buffered, and the printf() is flushed automatically.
Case #2 is also now expected: when we redirect to a file, the stream is fully buffered and will not be flushed except with fflush(), unless you write gobloads of data to it.
Finally, we understand case #3 too for implementations that only perform the check on the underlying fd once. Because we forced stdout's buffer to be initialised in the first printf(), stdout acquired the line-buffered mode. When we swap out the fd to go to file, it's still line-buffered, so the data is flushed automatically.
Some actual implementations
Each libc has latitude in how it interprets these requirements, since C99 doesn't specify what an "interactive device" is, nor does POSIX's stdio entry extend this (beyond requiring stderr to be open for reading).
Glibc. See filedoalloc.c:L111. Here we use stat() to test if the fd is a tty, and set the buffering mode accordingly. (This is called from fileops.c.) stdout initially has a null buffer, and it's allocated on the first use of the stream based on the characteristics of fd 1.
BSD libc. Very similar, but much cleaner code to follow! See this line in makebuf.c
You are wrongly combining buffered and unbuffered IO functions. Such a combination must be done very carefully, especially when the code has to be portable (and it is bad to write unportable code...).
It is certainly best to avoid combining buffered and unbuffered IO on the same file descriptor.
Buffered IO: fprintf(), fopen(), fclose(), freopen()...
Unbuffered IO: write(), open(), close(), dup()...
When you use dup2() to redirect stdout, the function is not aware of the buffer that was filled by printf(). So when dup2() closes the old descriptor 1, it does not flush the buffer, and the content could be flushed to a different output. In your case 2a it was sent to /dev/null.
The solution
In your case it is best to use freopen() instead of dup2(). This solves all your problems:
It flushes the buffers of the original FILE stream. (case 2a)
It sets the buffering mode according to the newly opened file. (case 3)
Here is the correct implementation of your function:
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

void RedirectStdout2File(const char* log_path) {
    if (freopen(log_path, "a+", stdout) == NULL) err(EXIT_FAILURE, NULL);
}
Unfortunately, with buffered IO you cannot directly set the permissions of a newly created file. You have to use other calls to change the permissions, or you can use unportable glibc extensions. See the fopen() man page.
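If you do need those wide-open permissions, one portable option (a sketch using POSIX chmod() after the stream is open; whether widening permissions like this is acceptable depends on your security requirements) is:

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

void RedirectStdout2File(const char* log_path) {
    if (freopen(log_path, "a+", stdout) == NULL)
        err(EXIT_FAILURE, NULL);
    /* freopen() creates the file with mode 0666 modified by the umask;
     * widen it to match the original open() call's S_IRWXU|S_IRWXG|S_IRWXO. */
    if (chmod(log_path, S_IRWXU|S_IRWXG|S_IRWXO) != 0)
        err(EXIT_FAILURE, "chmod");
}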
You should not close the file descriptor; remove the close(fd), and close stdout_bak_fd if you want the message to be printed only to the file.
I've boiled down my entire program to a short main that replicates the issue, so forgive me for it not making any sense.
input.txt is a text file with a couple of lines of text in it. This boiled-down program should print those lines. However, if fork() is called, the program enters an infinite loop in which it prints the contents of the file over and over again.
As far as I understand fork, the way I use it in this snippet is essentially a no-op. It forks, the parent waits for the child before continuing, and the child is immediately killed.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(){
freopen("input.txt", "r", stdin);
char s[MAX];
int i = 0;
char* ret = fgets(s, MAX, stdin);
while (ret != NULL) {
//Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0) {
exit(0);
} else {
waitpid(pid, &status, 0);
}
//End region
printf("%s", s);
ret = fgets(s, MAX, stdin);
}
}
Edit: Further investigation has only made my issue stranger. If the file contains fewer than 4 blank lines or fewer than 3 lines of text, it does not break. However, if there are more than that, it loops infinitely.
Edit 2: If the file contains 3 lines of numbers it will loop infinitely, but if it contains 3 lines of words it will not.
I am surprised that there is a problem, but it does seem to be a problem on Linux (I tested on Ubuntu 16.04 LTS running in a VMWare Fusion VM on my Mac) — but it was not a problem on my Mac running macOS 10.13.4 (High Sierra), and I wouldn't expect it to be a problem on other variants of Unix either.
As I noted in a comment:
There's an open file description and an open file descriptor behind each stream. When the process forks, the child has its own set of open file descriptors (and file streams), but each file descriptor in the child shares the open file description with the parent. IF (and that's a big 'if') the child process closing the file descriptors first did the equivalent of lseek(fd, 0, SEEK_SET), then that would also position the file descriptor for the parent process, and that could lead to an infinite loop. However, I've never heard of a library that does that seek; there's no reason to do it.
See POSIX open() and fork() for more information about open file descriptors and open file descriptions.
The open file descriptors are private to a process; the open file descriptions are shared by all copies of the file descriptor created by an initial 'open file' operation. One of the key properties of the open file description is the current seek position. That means that a child process can change the current seek position for a parent — because it is in the shared open file description.
neof97.c
I used the following code — a mildly adapted version of the original that compiles cleanly with rigorous compilation options:
#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(void)
{
if (freopen("input.txt", "r", stdin) == 0)
return 1;
char s[MAX];
for (int i = 0; i < 30 && fgets(s, MAX, stdin) != NULL; i++)
{
// Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0)
{
exit(0);
}
else
{
waitpid(pid, &status, 0);
}
// End region
printf("%s", s);
}
return 0;
}
One of the modifications limits the number of cycles (children) to just 30.
I used a data file with 4 lines of 20 random letters plus a newline (84 bytes total):
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
I ran the command under strace on Ubuntu:
$ strace -ff -o st-out -- neof97
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
…
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
$
There were 31 files with names of the form st-out.808## where the hashes were 2-digit numbers. The main process file was quite large; the others were small, each with a size of 66, 110, 111, or 137 bytes:
$ cat st-out.80833
lseek(0, -63, SEEK_CUR) = 21
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80834
lseek(0, -42, SEEK_CUR) = -1 EINVAL (Invalid argument)
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80835
lseek(0, -21, SEEK_CUR) = 0
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80836
exit_group(0) = ?
+++ exited with 0 +++
$
It just so happened that the first 4 children each exhibited one of the four behaviours — and each further set of 4 children exhibited the same pattern.
This shows that three out of four of the children were indeed doing an lseek() on standard input before exiting. Obviously, I have now seen a library do it. I have no idea why it is thought to be a good idea, but empirically, that is what is happening.
neof67.c
This version of the code, using a separate file stream (and file descriptor) and fopen() instead of freopen() also runs into the problem.
#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(void)
{
FILE *fp = fopen("input.txt", "r");
if (fp == 0)
return 1;
char s[MAX];
for (int i = 0; i < 30 && fgets(s, MAX, fp) != NULL; i++)
{
// Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0)
{
exit(0);
}
else
{
waitpid(pid, &status, 0);
}
// End region
printf("%s", s);
}
return 0;
}
This also exhibits the same behaviour, except that the file descriptor on which the seek occurs is 3 instead of 0. So two of my hypotheses are disproven: that the problem was related to freopen(), and that it was related to stdin. Both are shown incorrect by the second test code.
Preliminary diagnosis
IMO, this is a bug. You should not be able to run into this problem.
It is most likely a bug in the Linux (GNU C) library rather than the kernel. It is caused by the lseek() in the child processes. It is not clear (because I've not gone to look at the source code) what the library is doing or why.
GLIBC Bug 23151
GLIBC Bug 23151 - A forked process with unclosed file does lseek before exit and can cause infinite loop in parent I/O.
The bug was created 2018-05-08 US/Pacific, and was closed as INVALID by 2018-05-09. The reason given was:
Please read
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05_01,
especially this paragraph:
Note that after a fork(), two handles exist where one existed before. […]
POSIX
The complete section of POSIX referred to (apart from verbiage noting that this is not covered by the C standard) is this:
2.5.1 Interaction of File Descriptors and Standard I/O Streams
An open file description may be accessed through a file descriptor, which is created using functions such as open() or pipe(), or through a stream, which is created using functions such as fopen() or popen(). Either a file descriptor or a stream is called a "handle" on the open file description to which it refers; an open file description may have several handles.
Handles can be created or destroyed by explicit user action, without affecting the underlying open file description. Some of the ways to create them include fcntl(), dup(), fdopen(), fileno(), and fork(). They can be destroyed by at least fclose(), close(), and the exec functions.
A file descriptor that is never used in an operation that could affect the file offset (for example, read(), write(), or lseek()) is not considered a handle for this discussion, but could give rise to one (for example, as a consequence of fdopen(), dup(), or fork()). This exception does not include the file descriptor underlying a stream, whether created with fopen() or fdopen(), so long as it is not used directly by the application to affect the file offset. The read() and write() functions implicitly affect the file offset; lseek() explicitly affects it.
The result of function calls involving any one handle (the "active handle") is defined elsewhere in this volume of POSIX.1-2017, but if two or more handles are used, and any one of them is a stream, the application shall ensure that their actions are coordinated as described below. If this is not done, the result is undefined.
A handle which is a stream is considered to be closed when either an fclose(), or freopen() with non-full(1) filename, is executed on it (for freopen() with a null filename, it is implementation-defined whether a new handle is created or the existing one reused), or when the process owning that stream terminates with exit(), abort(), or due to a signal. A file descriptor is closed by close(), _exit(), or the exec() functions when FD_CLOEXEC is set on that file descriptor.
(1) [sic] Using 'non-full' is probably a typo for 'non-null'.
For a handle to become the active handle, the application shall ensure that the actions below are performed between the last use of the handle (the current active handle) and the first use of the second handle (the future active handle). The second handle then becomes the active handle. All activity by the application affecting the file offset on the first handle shall be suspended until it again becomes the active file handle. (If a stream function has as an underlying function one that affects the file offset, the stream function shall be considered to affect the file offset.)
The handles need not be in the same process for these rules to apply.
Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec() functions or _exit() (not exit()), the handle is never accessed in that process.)
For the first handle, the first applicable condition below applies. After the actions required below are taken, if the handle is still open, the application can close it.
If it is a file descriptor, no action is required.
If the only further action to be performed on any handle to this open file descriptor is to close it, no action need be taken.
If it is a stream which is unbuffered, no action need be taken.
If it is a stream which is line buffered, and the last byte written to the stream was a <newline> (that is, as if a putc('\n') was the most recent operation on that stream), no action need be taken.
If it is a stream which is open for writing or appending (but not also open for reading), the application shall either perform an fflush(), or the stream shall be closed.
If the stream is open for reading and it is at the end of the file (feof() is true), no action need be taken.
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.
For the second handle:
If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the first handle, the application shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location.
If the active handle ceases to be accessible before the requirements on the first handle, above, have been met, the state of the open file description becomes undefined. This might occur during functions such as a fork() or _exit().
The exec() functions make inaccessible all streams that are open at the time they are called, independent of which streams or file descriptors may be available to the new process image.
When these rules are followed, regardless of the sequence of handles used, implementations shall ensure that an application, even one consisting of several processes, shall yield correct results: no data shall be lost or duplicated when writing, and all data shall be written in order, except as requested by seeks. It is implementation-defined whether, and under what conditions, all input is seen exactly once.
Each function that operates on a stream is said to have zero or more "underlying functions". This means that the stream function shares certain traits with the underlying functions, but does not require that there be any relation between the implementations of the stream function and its underlying functions.
Exegesis
That is hard reading! If you're not clear on the distinction between open file descriptor and open file description, read the specification of open() and fork() (and dup() or dup2()). The definitions for file descriptor and open file description are also relevant, if terse.
In the context of the code in this question (and also for Unwanted child processes being created while file reading), we have a file stream handle open for reading only which has not yet encountered EOF (so feof() would not return true, even though the read position is at the end of the file).
One of the crucial parts of the specification is: The application shall prepare for a fork() exactly as if it were a change of active handle.
This means that the steps outlined for 'first file handle' are relevant, and stepping through them, the first applicable condition is the last:
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.
If you look at the definition for fflush(), you find:
If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() shall cause any unwritten data for that stream to be written to the file, [CX] ⌦ and the last data modification and last file status change timestamps of the underlying file shall be marked for update.
For a stream open for reading with an underlying file description, if the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream, and any characters pushed back onto the stream by ungetc() or ungetwc() that have not subsequently been read from the stream shall be discarded (without further changing the file offset). ⌫
It isn't exactly clear what happens if you apply fflush() to an input stream associated with a non-seekable file, but that isn't our immediate concern. However, if you're writing generic library code, then you might need to know whether the underlying file descriptor is seekable before doing a fflush() on the stream. Alternatively, use fflush(NULL) to have the system do whatever is necessary for all I/O streams, noting that this will lose any pushed-back characters (via ungetc() etc).
The lseek() operations shown in the strace output seem to be implementing the fflush() semantics associating the file offset of the open file description with the file position of the stream.
So, for the code in this question, it seems that fflush(stdin) is necessary before the fork() to ensure consistency. Not doing that leads to undefined behaviour ('if this is not done, the result is undefined') — such as looping indefinitely.
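In code terms, a minimal sketch of the question's loop with that fix applied (per the POSIX text above, fflush() on a seekable input stream sets the file offset of the open file description to the stream's position):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(void)
{
    if (freopen("input.txt", "r", stdin) == NULL)
        return 1;
    char s[MAX];
    while (fgets(s, MAX, stdin) != NULL)
    {
        fflush(stdin);   /* synchronize the fd offset with the stream position */
        int status;
        pid_t pid = fork();
        if (pid == 0)
            exit(0);     /* the child's exit-time lseek() is now a no-op */
        else
            waitpid(pid, &status, 0);
        printf("%s", s);
    }
    return 0;
}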
The exit() call closes all open file handles. After the fork, the child and parent have identical copies of the execution stack, including the FILE handle pointer. When the child exits, it closes the file, and the library's cleanup repositions the shared file offset.
int main(){
    freopen("input.txt", "r", stdin);
    char s[MAX];
    int i = 0;
    char* ret = fgets(s, MAX, stdin);
    while (ret != NULL) {
        //Commenting out this region fixes the issue
        int status;
        pid_t pid = fork(); // At this point both processes have a copy of the file handle
        if (pid == 0) {
            exit(0); // At this point the child closes the file handle
        } else {
            waitpid(pid, &status, 0);
        }
        //End region
        printf("%s", s);
        ret = fgets(s, MAX, stdin);
    }
}
As visibleman pointed out, the child process is closing the file and messing things up in main.
I was able to work around it by checking whether the program is in terminal mode with
!isatty(fileno(stdin))
If stdin has been redirected, the program reads all of it into a linked list before doing any processing or forking, as sketched below.
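A sketch of that workaround (the slurp_stdin() helper and the fixed line-length limit are mine, for illustration, not the original code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct line { char *text; struct line *next; };

/* Read all of (redirected) stdin into a linked list of lines. */
static struct line *slurp_stdin(void)
{
    struct line *head = NULL, **tail = &head;
    char buf[100];
    while (fgets(buf, sizeof buf, stdin) != NULL) {
        struct line *n = malloc(sizeof *n);
        if (n == NULL)
            break;
        n->text = strdup(buf);
        n->next = NULL;
        *tail = n;
        tail = &n->next;
    }
    return head;
}

int main(void)
{
    if (!isatty(fileno(stdin))) {
        /* stdin is redirected: consume it all before any fork() happens */
        for (struct line *p = slurp_stdin(); p != NULL; p = p->next)
            fputs(p->text, stdout);
    }
    return 0;
}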
Replace exit(0) with _exit(0), and all is fine. This is an old Unix tradition: if you are using stdio, your forked image must use _exit(), not exit().
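Applied to the region marked in the question, that change is just (a sketch of the changed lines, not a complete program):

#include <sys/wait.h>
#include <unistd.h>
/* ... inside the read loop from the question ... */
int status;
pid_t pid = fork();
if (pid == 0)
    _exit(0);   /* skips stdio cleanup: no lseek() touches the shared open file description */
else
    waitpid(pid, &status, 0);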
Is it alright for multiple processes to write to the same file at the same time? Using the following code, it seems to work, but I have my doubts.
The use case in this instance is an executable that gets called every time an email is received and logs its output to a central file.
if (freopen(console_logfile, "a+", stdout) == NULL || freopen(error_logfile, "a+", stderr) == NULL) {
perror("freopen");
}
printf("Hello World!");
This is running on CentOS and compiled as C.
Using the C standard IO facility introduces a new layer of complexity: the file is ultimately modified via the write(2) family of system calls (or memory mappings, but those are not used in this case), and the C standard IO wrappers may postpone writing to the file for a while and may not submit complete requests in one system call.
The write(2) call itself should behave well:
[...] If the file was open(2)ed with O_APPEND, the file offset is first set to the end of the file before writing. The adjustment of the file offset and the write operation are performed as an atomic step.
POSIX requires that a read(2) which can be proved to occur after a write() has returned returns the new data. Note that not all file systems are POSIX conforming.
Thus your underlying write(2) calls will behave properly.
For the higher-level C standard IO streams, you'll also need to take care of the buffering. The setvbuf(3) function can be used to request unbuffered, line-buffered, or block-buffered output. The default behaviour varies from stream to stream: if standard output is writing to a terminal, it is line-buffered by default, and standard error is unbuffered; otherwise, block-buffering is the default.
You might wish to select line buffering manually if your data is naturally line-oriented, to prevent interleaved data. If your data is not line-oriented, you might wish to use unbuffered output, or leave it block-buffered but manually flush the data whenever you've accumulated a single "unit" of output.
If you are writing more than BUFSIZ bytes at a time, your writes might become interleaved. The setvbuf(3) function can help prevent the interleaving.
It might be premature to talk about performance, but line-buffering is going to be slower than block buffering. If you're logging near the speed of the disk, you might wish to take another approach entirely to ensure your writes aren't interleaved.
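Putting those pieces together for the email-logging use case, here is a sketch (the log file name is made up for illustration): "a" mode opens the file with O_APPEND, so each underlying write(2) lands atomically at end-of-file, and line buffering makes each complete log line one such write.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* "a" opens with O_APPEND: the kernel serializes each write at EOF */
    if (freopen("central.log", "a", stdout) == NULL) {
        perror("freopen");
        exit(1);
    }
    /* line buffering: each complete line is submitted as one write(2) */
    setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
    printf("received mail for %s\n", "user@example.com");
    return 0;
}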
This answer was incorrect; it does work. The race condition I had in mind would be:
process 1 opens it for append, then
later process 2 opens it for append, then
later still 1 writes and closes, then
finally 2 writes and closes.
I'd be impressed if that 'worked', because it isn't clear to me what working should mean. I assume 'working' means all of the bytes written by the two processes are in the log file? I'd expect that they both write starting at the same byte offset, so one will replace the other's bytes. It will all be okay up to and including step 3, and only show up as a problem at step 4. Seems like an easy test to write: open, getchar, write, close.
Is it critical that they can have the file open simultaneously? A more obvious solution, if the write is quick, is to open the file in exclusive mode.
For a quick check on your system, try:
/* write the first command line argument to a file called foo
* stackoverflow topic 9880935
*/
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int main(int argc, const char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Error: need some text to write to the file foo\n");
        exit(1);
    }
    FILE *fp = freopen("foo", "a+", stdout);
    if (fp == NULL) {
        perror("Error: failed to open file");
        exit(1);
    }
    fprintf(stderr, "Press a key to continue\n");
    (void) getchar(); /* Yes, I really mean to ignore the character */
    if (printf("%s\n", argv[1]) < 0) {
        perror("Error: failed to write to file");
        exit(1);
    }
    fclose(fp);
    return 0;
}
I have a C application with many worker threads. It is essential that these do not block so where the worker threads need to write to a file on disk, I have them write to a circular buffer in memory, and then have a dedicated thread for writing that buffer to disk.
The worker threads do not block any more. The dedicated thread can safely block while writing to disk without affecting the worker threads (it does not hold a lock while writing to disk). My memory buffer is tuned to be sufficiently large that the writer thread can keep up.
This all works great. My question is, how do I implement something similar for stdout?
I could macro printf() to write into a memory buffer, but I don't have control over all the code that might write to stdout (some of it is in third-party libraries).
Thoughts?
NickB
I like the idea of using freopen. You might also be able to redirect stdout to a pipe using dup and dup2, and then use read to grab data from the pipe.
Something like so:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_LEN 40
int main( int argc, char *argv[] ) {
char buffer[MAX_LEN+1] = {0};
int out_pipe[2];
int saved_stdout;
saved_stdout = dup(STDOUT_FILENO); /* save stdout for display later */
if( pipe(out_pipe) != 0 ) { /* make a pipe */
exit(1);
}
dup2(out_pipe[1], STDOUT_FILENO); /* redirect stdout to the pipe */
close(out_pipe[1]);
/* anything sent to printf should now go down the pipe */
printf("ceci n'est pas une pipe");
fflush(stdout);
read(out_pipe[0], buffer, MAX_LEN); /* read from pipe into buffer */
dup2(saved_stdout, STDOUT_FILENO); /* reconnect stdout for testing */
printf("read: %s\n", buffer);
return 0;
}
If you're working with the GNU libc, you might use memory streams (string streams).
You can "redirect" stdout into file using freopen().
man freopen says:
The freopen() function opens the file whose name is the string pointed to by path and associates the stream pointed to by stream with it. The original stream (if it exists) is closed. The mode argument is used just as in the fopen() function.
The primary use of the freopen() function is to change the file associated with a standard text stream (stderr, stdin, or stdout).
This file could well be a pipe: worker threads will write to that pipe and the writer thread will listen.
Why don't you wrap your entire application in another? Basically, what you want is a smart cat that copies stdin to stdout, buffering as necessary. Then use standard stdin/stdout redirection. This can be done without modifying your current application at all.
~MSalters/# YourCurrentApp | bufcat
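A minimal sketch of such a bufcat (the name and buffer size are made up; a production version would decouple reading and writing with a ring buffer and a writer thread, as described in the question):

#include <unistd.h>

int main(void)
{
    char buf[1 << 16];
    ssize_t n;
    /* copy stdin to stdout through a large buffer */
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
    {
        ssize_t off = 0;
        while (off < n)    /* handle short writes */
        {
            ssize_t w = write(STDOUT_FILENO, buf + off, (size_t)(n - off));
            if (w < 0)
                return 1;
            off += w;
        }
    }
    return 0;
}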
You can change how buffering works with setvbuf() or setbuf(). There's a description here: http://publications.gbdirect.co.uk/c_book/chapter9/input_and_output.html.
[Edit]
stdout really is a FILE*. If the existing code works with FILE*s, I don't see what prevents it from working with stdout.
One solution (for both things you're doing) would be to use a gathering write via writev().
Each thread could, for example, sprintf() into an iovec buffer and then pass the iovec pointers to the writer thread, which would simply call writev() on standard output.
Here is an example of using writev from Advanced Unix Programming
Under Windows you would use WSASend() for similar functionality.
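A sketch of that scheme, collapsed into one thread for brevity (the two buffers stand in for per-thread output):

#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    /* imagine each buffer was filled by a different worker thread */
    char a[64], b[64];
    snprintf(a, sizeof a, "worker 1: job done\n");
    snprintf(b, sizeof b, "worker 2: job done\n");

    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = strlen(a) },
        { .iov_base = b, .iov_len = strlen(b) },
    };
    /* one writev() call submits both buffers in order as a single write */
    if (writev(STDOUT_FILENO, iov, 2) < 0)
        return 1;
    return 0;
}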
The method using the 4096 bigbuf will only sort of work. I've tried this code, and while it does successfully capture stdout into the buffer, it's unusable in a real-world case. You have no way of knowing how long the captured output is, so no way of knowing where to terminate the string with '\0'. If you try to use the buffer, you get 4000 characters of garbage spat out even if you had successfully captured only 96 characters of stdout output.
In my application, I'm using a Perl interpreter in the C program. I have no idea how much output is going to be produced by whatever document is thrown at the C program, and hence the code above would never allow me to cleanly print that output anywhere.