C: multi-processes stdio append mode - c

I wrote this code in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
void random_seed(){
struct timeval tim;
gettimeofday(&tim, NULL);
double t1=tim.tv_sec+(tim.tv_usec/1000000.0);
srand (t1);
}
void main(){
FILE *f;
int i;
int size=100;
char *buf=(char*)malloc(size);
f = fopen("output.txt", "a");
setvbuf (f, buf, _IOFBF, size);
random_seed();
for(i=0; i<200; i++){
fprintf(f, "[ xx - %d - 012345678901234567890123456789 - %d]\n", rand()%10, getpid());
fflush(f);
}
fclose(f);
free(buf);
}
This code opens in append mode a file and attaches 200 times a string.
I set the buf of size 100 that can contains the full string.
Then I created multi processes running this code by using this bash script:
#!/bin/bash
gcc source.c
rm output.txt
for i in `seq 1 100`;
do
./a.out &
done
I expected that in the output the strings are never mixed up, as I read that when opening a file with O_APPEND flag the file offset will be set to the end of the file prior to each write and i'm using a fully buffered stream, but i got the first line of each process is mixed as this:
[ xx - [ xx - 7 - 012345678901234567890123456789 - 22545]
and some lines later
2 - 012345678901234567890123456789 - 22589]
It looks like the write is interrupted for calling the rand function.
So...why appear these lines?
Is the only way to prevent this the use file locks...even if i'm using only the append mode?
Thanks in advance!

You will need to implement some form of concurrency control yourself, POSIX makes no guarantees with respect to concurrent writes from multiple processes. You get some guarantees for pipes, but not for regular files written to from different processes.
Quoting POSIX write():
This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control.
(At the end of the Rationale section.)

You open the file in the fully buffered mode. That means that every line of the output first goes into the buffer and when the buffer overflows it gets flushed to the file regardless whether it contains incomplete lines. That causes chunks of output from different processes writing into the same file concurrently to be interleaved.
An easy fix would be to open the file in line buffered mode _IOLBF, so that the buffer gets flushed on each complete line. Just make sure that the buffer size is at least as big as your longest line, otherwise it will end up writing incomplete lines. The buffer is normally flushed with a single write() system call, so that lines from different processes won't interleave each other.
There is no guarantee that write() system call is atomic for different filesystems though, but it normally works as expected because write() normally locks the file descriptor in the kernel with a mutex before proceeding.

Related

How to use write() or fwrite() for writing data to terminal (stdout)?

I am trying to speed up my C program to spit out data faster.
Currently I am using printf() to give some data to the outside world. It is a continuous stream of data, therefore I am unable to use return(data).
How can I use write() or fwrite() to give the data out to the console instead of file?
Overall my setup consist of program written in C and its output goes to the python script, where the data is processed further. I form a pipe:
./program_in_c | script_in_python
This gives additional benefit on Raspberry Pi by using more of processor's cores.
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
write() writes up to count bytes from the buffer starting at buf to
the file referred to by the file descriptor fd.
the standard output file descriptor is: 1 in linux at least!
concern using flush the stdoutput buffer as well, before calling to write system call to ensure that all previous garabge was cleaned
fflush(stdout); // Will now print everything in the stdout buffer
write(1, buf, count);
using fwrite:
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
The function fwrite() writes nmemb items of data, each size bytes
long, to the stream pointed to by stream, obtaining them from the
location given by ptr.
fflush(stdout);
int buf[8];
fwrite(buf, sizeof(int), sizeof(buf), stdout);
Please refare to man pages for further reading, in the links below:
fwrite
write
Well, there's little or no win in trying to overcome the already used buffering system of the stdio.h package. If you try to use fwrite() with larger buffers, you'll probably win no more time, and use more memory than is necessary, as stdio.h selects the best buffer size appropiate to the filesystem where the data is to be written.
A simple program like the following will show that speed is of no concern, as stdio is already buffering output.
#include <stdio.h>
int
main()
{
int c;
while((c = getchar()) >= 0)
putchar(c);
}
If you try the above and below programs:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main()
{
char buffer[512];
int n;
while((n = read(0, buffer, sizeof buffer)) > 0)
write(1, buffer, n);
if (n < 0) {
perror("read");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
You will see that there's no significative difference or, even, the first program will be faster, despite it is doing I/O on a per character basis. (as B. Kernighan & Dennis Ritchie wrote it in her first edition of "The C programming language") Most probably the first program will win.
The calls to read() and write() involve a system call each, with a buffer size decided by you. The individual getchar() and putchar() calls don't. They just store the received chars in a memory buffer, as you print them, whose size has been decided by the stdio.h library implementation, based on the filesystem, and it flushes the buffer, once it is full of data. If you grow the buffer size in the second program, you'll see that you get better results increasing it up to a point, but after that you'll see no more increment in speed. The number of calls made to the library is insignificant with respect to the time involved in doing the actual I/O, and selecting a very large buffer, will eat much memory from your system (and a Raspberry Pi memory is limited in this sense, to 1Gb or ram) If you end making swap due to a so large buffer, you'll lose the battle completely.
Most filesystems have a preferred buffer size, because the kernel does write ahead (the kernel reads more than what you asked for, on sequential reads, in prevision that you'll continue reading more after you consumed the data) and this affects the optimum buffer size. For that, the stat(2) system call tells you what is the optimum buffer size, and stdio uses that when it selects the actual buffer size.
Don't think you are going to get better (or much better) than the program listed first above. Even if you use large enough buffers.
What is not correct (or valid) is to intermix calls that do buffering (like all the stdio package's) with basic system calls (like read(2) or write(2) ---as I've seen recommending you to use fflush(3) after write(2), which is totally incoherent--- that do not buffer the data) there's no earn (and probably you'll get your output incorrectly ordered, if you do part of the calls using printf(3) and part using write(2) (this happens more in pipelines like you plan to do, because the buffers are not line oriented ---another characteristic of buffered output in stdio---)
Finally, I recomend you to read "The Unix programming environment" by Dennis Ritchie and Rob Pike. It will teach you a lot of unix, but one very good thing is that it will teach you to use perfectly the stdio package and the unix filesystem calls for reading and writing. With a little of luck you'll find it in .pdf on internet.
The next program shows you the effect of buffering:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main()
{
int i;
char *sep = "";
for (i = 0; i < 10; i++) {
printf("%s%d", sep, i);
sep = ", ";
sleep(1);
}
printf("\n");
}
One would assume you are going to see (on the terminal) the program, writing the numbers 0 to 9, separated by , and paced on one second intervals.
But due to the buffering, what you observe is quite different, you'll see how your program waits for 10 seconds without writing anything at all on the terminal, and at the end, writes everything in one shot, including the final line end, when the program terminates, and the shell shows you the prompt again.
If you change the program to this:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main()
{
int i;
char *sep = "";
for (i = 0; i < 10; i++) {
printf("%s%d", sep, i);
fflush(stdout);
sep = ", ";
sleep(1);
}
printf("\n");
}
You'll see the expected output, because you have told stdio to flush the buffer at each loop pass. In both programs you did 10 calls to printf(3), but there was only one write(2) at the end to write the full buffer. In the second version you forced stdio to do one such write(2) after each printf, and that showed the data out as the program passed through the loop.
Be careful, because another characteristic of stdio can be confounding you, as printf(3), when you print to a terminal device, flushes the output at each \n, but when you run it through a pipe, it does it only when the buffer fills up completely. This saves system calls (in FreeBSD, for example, the buffer size selected by stdio is around 32kb, large enough to force two blocks to write(2) and optimum (you'll not get better going above that size)
The console output in C works almost the same way as a file. Once you have included stdio.h, you can write on the console output, named stdout (for "standard output"). In the end, the following statement:
printf("hello world!\n");
is the same as:
char str[] = "hello world\n";
fwrite(str, sizeof(char), sizeof(str) - 1, stdout);
fflush(stdout);

why "Line buffer stdout to ensure lines are written atomically and immediately"

I'm reading the source code of wc command, and in main function I found follow code:
/* Line buffer stdout to ensure lines are written atomically and immediately
so that processes running in parallel do not intersperse their output. */
setvbuf (stdout, NULL, _IOLBF, 0);
so why line buffer stdout ensure that?
Say block buffering is used for stdout instead of line buffering. (This is the default if stdout refers to a regular file, for example.) Let the buffer size be 1024 bytes (so that output is flushed to the file every 1024 bytes), and pretend that two processes are writing to the same file.
Say that the first process currently has 1020 bytes in its I/O buffer and writes the line "foo_file 37\n" to stdout. This will put "foo_" at the end of the I/O buffer, flush the buffer to the file (since the buffer is now full), and then put "file 37\n" at the beginning of the buffer. Say that the second process then comes along and flushes its buffer, which happens to start with "bar_file 48\n". The resulting line in the output file will then be "foo_bar_file 48", which clearly isn't what we want.
The basic problem is that buffer boundaries do not necessarily correspond to line boundaries when block buffering is used.
You could play around with two instances of the following program writing to the same file to see this effect in action yourself:
#include <stdio.h>
int main(void) {
setvbuf (stdout, NULL, _IOLBF, 0);
for (;;)
puts("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
return 0;
}
With the setvbuf() call commented out, you will see some lines get mixed up with other lines. Be aware that this will program will quickly write a huge file, of course. :)

stderr and stdout - not buffered vs. buffered?

I was writing a small program that had various console output strings depending upon different events. As I was looking up the best way to send these messages I came across something that was a bit confusing.
I have read that stderr is used to shoot messages directly to the console - not buffered. While, in contrast, I read that stdout is buffered and is typically used to redirect messages to various streams?, that may or may not be error messages, to an output file or some other medium.
What is the difference when something is said to be buffered and not buffered? It made sense when I was reading that the message is shot directly to the output and is not buffered .. but at the same time I realized that I was not entirely sure what it meant to be buffered.
Typically, stdout is line buffered, meaning that characters sent to stdout "stack up" until a newline character arrives, at which point that are all outputted.
A buffered stream is one in which you keep writing until a certain threshold. This threshold could be a specific character, as Konrad mentions for line buffering, or another threshold, such as a specific count of characters written.
Buffering is intended to speed up input/output operations. One of the slowest things a computer does is write to a stream (whether a console or a file). When things don't need to be immediately seen, it saves time to store it up for a while.
You are right, stderr is typically an unbuffered stream while stdout typically is buffered. So there can be times when you output things to stdout then output to stderr and stderr appears first on the console. If you want to make stdout behave similarly, you would have to flush it after every write.
When an output stream is buffered, it means that the stream doesn't necessarily output data the moment you tell it to. There can be significant overhead per IO operation, so lots and lots of little IO operations can create a bottleneck. By buffering IO operations and then flushing many at once, this overhead is reduced.
While stdout and stderr may behave differently regarding buffering, that is generally not the deciding factor between them and shouldn't be relied on. If you absolutely need the output immediately, always manually flush the stream.
Assume
int main(void)
{
printf("foo\n");
sleep(10);
printf("bar\n");
}
when executing it on the console
$ ./a.out
you will see foo line and 10 seconds later bar line (--> line buffered). When redirecting the output into a file or a pipe
$ ./a.out > /tmp/file
the file stays empty (--> buffered) until the program terminates (--> implicit fflush() at exit).
When lines above do not contain a \n, you won't see anything on the console either until program terminates.
Internally, printf() adds a character to a buffer. To make things more easy, let me describe fputs(char const *s, FILE *f) instead of. FILE might be defined as
struct FILE {
int fd; /* is 0 for stdin, 1 for stdout, 2 for stderr (usually) */
enum buffer_mode mode;
char buf[4096];
size_t count; /* number of chars in buf[] */
};
typedef struct FILE *FILE;
int fflush(FILE *f)
{
write(f->fd, f->buf, f->count);
f->count = 0;
}
int fputc(int c, FILE *f)
{
if (f->count >= ARRAY_SIZE(f->buf))
fflush(f);
f->buf[f->count++] = c;
}
int fputs(char const *s, FILE *f)
{
while (*s) {
char c = *s++;
fputc(c, f);
if (f->mode == LINE_BUFFERED && c == '\n')
fflush(f);
}
if (f->mode == UNBUFFERED)
fflush(f);
}

Multiple processes accessing the same file

Is it alright for multiple processes to access (write) to the same file at the same time? Using the following code, it seems to work, but I have my doubts.
Use case in the instance is an executable that gets called every time an email is received and logs it's output to a central file.
if (freopen(console_logfile, "a+", stdout) == NULL || freopen(error_logfile, "a+", stderr) == NULL) {
perror("freopen");
}
printf("Hello World!");
This is running on CentOS and compiled as C.
Using the C standard IO facility introduces a new layer of complexity; the file is modified solely via write(2)-family of system calls (or memory mappings, but that's not used in this case) -- the C standard IO wrappers may postpone writing to the file for a while and may not submit complete requests in one system call.
The write(2) call itself should behave well:
[...] If the file was
open(2)ed with O_APPEND, the file offset is first set to the
end of the file before writing. The adjustment of the file
offset and the write operation are performed as an atomic
step.
POSIX requires that a read(2) which can be proved to occur
after a write() has returned returns the new data. Note that
not all file systems are POSIX conforming.
Thus your underlying write(2) calls will behave properly.
For the higher-level C standard IO streams, you'll also need to take care of the buffering. The setvbuf(3) function can be used to request unbuffered output, line-buffered output, or block-buffered output. The default behavior changes from stream to stream -- if standard output and standard error are writing to the terminal, then they are line-buffered and unbuffered by default. Otherwise, block-buffering is the default.
You might wish to manually select line-buffered if your data is naturally line-oriented, to prevent interleaved data. If your data is not line-oriented, you might wish to use un-buffered or leave it block-buffered but manually flush the data whenever you've accumulated a single "unit" of output.
If you are writing more than BUFSIZ bytes at a time, your writes might become interleaved. The setvbuf(3) function can help prevent the interleaving.
It might be premature to talk about performance, but line-buffering is going to be slower than block buffering. If you're logging near the speed of the disk, you might wish to take another approach entirely to ensure your writes aren't interleaved.
This answer was incorrect. It does work:
So the race condition would be:
process 1 opens it for append, then
later process 2 opens it for append, then
later still 1 writes and closes, then
finally 2 writes and closes.
I'd be impressed if that 'worked' because it isn't clear to me what
working should mean. I assume 'working' means all of the bytes written
by the two processes are inthe log file? I'd expect that they both
write starting at the same byte offset, so one will replace the others
bytes. It will all be okay upto and including step 3. and only show up
as a problem at step 4, Seems like an easy test to write: open getchar
... write close.
Is it critical that they can have the file open simultaneously? A
more obvious solution if the write is quick, is to open exclusive.
For a quick check on your system, try:
/* write the first command line argument to a file called foo
* stackoverflow topic 9880935
*/
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int main (int argc, const char * argv[]) {
if (argc <2) {
fprintf(stderr, "Error: need some text to write to the file Foo\n");
exit(1);
}
FILE* fp = freopen("foo", "a+", stdout);
if (fp == NULL) {
perror("Error failed to open file\n");
exit(1);
}
fprintf(stderr, "Press a key to continue\n");
(void) getchar(); /* Yes, I really mean to ignore the character */
if (printf("%s\n", argv[1]) < 0) {
perror("Error failed to write to file: ");
exit(1);
}
fclose(fp);
return 0;
}

How to buffer stdout in memory and write it from a dedicated thread

I have a C application with many worker threads. It is essential that these do not block so where the worker threads need to write to a file on disk, I have them write to a circular buffer in memory, and then have a dedicated thread for writing that buffer to disk.
The worker threads do not block any more. The dedicated thread can safely block while writing to disk without affecting the worker threads (it does not hold a lock while writing to disk). My memory buffer is tuned to be sufficiently large that the writer thread can keep up.
This all works great. My question is, how do I implement something similar for stdout?
I could macro printf() to write into a memory buffer, but I don't have control over all the code that might write to stdout (some of it is in third-party libraries).
Thoughts?
NickB
I like the idea of using freopen. You might also be able to redirect stdout to a pipe using dup and dup2, and then use read to grab data from the pipe.
Something like so:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_LEN 40
int main( int argc, char *argv[] ) {
char buffer[MAX_LEN+1] = {0};
int out_pipe[2];
int saved_stdout;
saved_stdout = dup(STDOUT_FILENO); /* save stdout for display later */
if( pipe(out_pipe) != 0 ) { /* make a pipe */
exit(1);
}
dup2(out_pipe[1], STDOUT_FILENO); /* redirect stdout to the pipe */
close(out_pipe[1]);
/* anything sent to printf should now go down the pipe */
printf("ceci n'est pas une pipe");
fflush(stdout);
read(out_pipe[0], buffer, MAX_LEN); /* read from pipe into buffer */
dup2(saved_stdout, STDOUT_FILENO); /* reconnect stdout for testing */
printf("read: %s\n", buffer);
return 0;
}
If you're working with the GNU libc, you might use memory streams string streams.
You can "redirect" stdout into file using freopen().
man freopen says:
The freopen() function opens the file
whose name is the string pointed to
by path and associates the stream
pointed to by stream with it. The
original stream (if it exists) is
closed. The mode argument is used
just as in the fopen() function.
The primary use of the freopen()
function is to change the file
associated with a standard text
stream (stderr, stdin, or stdout).
This file well could be a pipe - worker threads will write to that pipe and writer thread will listen.
Why don't you wrap your entire application in another? Basically, what you want is a smart cat that copies stdin to stdout, buffering as necessary. Then use standard stdin/stdout redirection. This can be done without modifying your current application at all.
~MSalters/# YourCurrentApp | bufcat
You can change how buffering works with setvbuf() or setbuf(). There's a description here: http://publications.gbdirect.co.uk/c_book/chapter9/input_and_output.html.
[Edit]
stdout really is a FILE*. If the existing code works with FILE*s, I don't see what prevents it from working with stdout.
One solution ( for both things your doing ) would be to use a gathering write via writev.
Each thread could for example sprintf into a iovec buffer and then pass the iovec pointers to the writer thread and have it simply call writev with stdout.
Here is an example of using writev from Advanced Unix Programming
Under Windows you would use WSAsend for similar functionality.
The method using the 4096 bigbuf will only sort of work. I've tried this code, and while it does successfully capture stdout into the buffer, it's unusable in a real world case. You have no way of knowing how long the captured output is, so no way of knowing when to terminate the string '\0'. If you try to use the buffer you get 4000 characters of garbage spit out if you had successfully captured 96 characters of stdout output.
In my application, I'm using a perl interpreter in the C program. I have no idea how much output is going to be spit out of what ever document is thrown at the C program, and hence the code above would never allow me to cleanly print that output out anywhere.

Resources