I have two (POSIX) threads that write to a log file like this:
pthread_mutex_lock(&log_mutex);
fprintf(LOG, "something...\n");
fsync(fileno(LOG));
pthread_mutex_unlock(&log_mutex);
The file is opened in main() by fopen() with mode "a". While the process is running I can't see anything appearing in the file with cat or tail, although after the process is terminated and the file is fclose()-d, the lines are all there.
What am I doing wrong?
I guess you need to call fflush() to flush the changes to the file system.
See also difference between fsync() and fflush().
Since you are using a FILE handle in C, you first need to flush the data from the C/C++ stdio buffers to the kernel buffers by calling fflush(). fsync() is not really required unless you also want to make sure the data reaches the underlying physical storage, which matters mainly for durability concerns.
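For example, the original snippet becomes (a minimal sketch; fflush() is what makes the lines visible to cat and tail, and fsync() stays optional, for durability only):

pthread_mutex_lock(&log_mutex);
fprintf(LOG, "something...\n");
fflush(LOG);            /* move the data from the stdio buffer to the kernel */
fsync(fileno(LOG));     /* optional: push the kernel buffer to stable storage */
pthread_mutex_unlock(&log_mutex);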
One thread writes to a file (or even deletes it); another one calls sendfile() on that file simultaneously, for overlapping positions.
What is the expected behaviour here?
Also, if my understanding is correct, the sendfile() call will just add a record (file descriptor, position, length) to the socket and return to the caller (no copy to the socket's buffer yet?), so even though sendfile() returns before the modification to the file, the OS still depends on the file's contents until the file has been sent completely.
So what is the result of any modification made before the file has been fully sent?
The expected behavior is that the result is unpredictable. sendfile() is not atomic; it's effectively equivalent to writing your own loop that calls read() on the file descriptor and write() on the socket descriptor. If some other process writes to the file while this is going on, you'll get a mix of the old and new contents.

Think of it mainly as a convenience function, although it also should be significantly more efficient since it doesn't require multiple system calls, and can copy directly between the file buffer and the socket buffer rather than copying back and forth between application buffers. Because of this the window of vulnerability should be smaller than with the read/write loop, but it's still there.
To avoid this problem, you should use file locking between the process(es) doing the writing and the one calling sendfile(); a sketch follows the two lists below. So the sequence should be:
lock the file
call sendfile()
unlock the file
and the writing process should do:
lock the file
write to the file
unlock the file
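A minimal sketch of the sending side, using flock() here for brevity (fcntl() locking works the same way; file_fd, sock_fd and file_size are assumed to already exist, and error handling is omitted):

#include <sys/file.h>
#include <sys/sendfile.h>
#include <sys/types.h>

flock(file_fd, LOCK_SH);                          /* shared lock is enough for reading;
                                                     the writer takes LOCK_EX */
off_t offset = 0;
sendfile(sock_fd, file_fd, &offset, file_size);   /* copy file -> socket in the kernel */
flock(file_fd, LOCK_UN);                          /* unlock the file */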
EDIT:
Actually, it looks like it isn't this simple, because sendfile() links the socket buffer to the file buffer cache, rather than copying it in the kernel. Since sendfile() doesn't wait for the data to be sent, modifying the file after it returns could still affect what gets sent. You need to check that the data has all been received with application-layer acknowledgements. See Minimizing copies when writing large data to a socket for an excerpt from an article that explains these details.
I'm reading some files from the procfs every few seconds, and displaying the information. Instead of opening and closing the files each time, I'm maintaining open file handles and closing them when I'm finished. The problem is that I'm consistently getting old data. The information gathered from the first read is returned in subsequent reads, and I've confirmed that the procfs files are indeed changing.
The only workaround I've found is to do an fflush() before a rewind() when reading the data. This works, but I don't understand why. I know that if I have two programs reading and writing the same file, an fflush() would be necessary on the PRODUCER side for the changes to be seen by the consumer. Here I'm doing an fflush() on the consumer side and it works. Don't the producer and consumer have different file handles, so that an fflush() in the consumer does not flush data written by the producer?
Any ideas why I'm getting stale data without fflush(), and up-to-date information using fflush()?
File streams are usually buffered, meaning that data is read from the file into memory in blocks, and subsequent reads are served from that in-memory copy to reduce the number of system calls. You must ensure that your stream is unbuffered to continually retrieve fresh information from the file. To do that, use setbuf(stream, NULL); to disable the buffer.
You can read about setbuf here:
http://www.cplusplus.com/reference/clibrary/cstdio/setbuf/
The reason I assumed that your stream is buffered is that fflush(stream) flushes a buffered stream.
You can read about that here:
http://www.cplusplus.com/reference/clibrary/cstdio/fflush/
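For instance, a minimal sketch (the setbuf() call must come before the first read on the stream; /proc/meminfo is just an example path, not from the question):

#include <stdio.h>

FILE *fp = fopen("/proc/meminfo", "r");   /* example path */
if (fp != NULL)
    setbuf(fp, NULL);                     /* disable stdio buffering on this stream */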
I don't know the exact answer, but I would imagine that the reason for the behavior you observed is a CONSUMER-side cache. The library reads the file in blocks, most likely larger than what you are processing at a time, so the "rest" of the buffer is being fed to you when you ask for "more". fflush() makes sure that buffer is discarded before giving you fresh data.
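For reference, a minimal sketch of the fflush()-before-rewind() workaround described in the question (/proc/loadavg is just an example path):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("/proc/loadavg", "r");   /* example path */
    char line[256];

    if (fp == NULL)
        return 1;
    for (;;) {
        fflush(fp);   /* POSIX: on a seekable input stream this discards the stale buffer */
        rewind(fp);   /* go back to the start of the file */
        if (fgets(line, sizeof line, fp) != NULL)
            printf("%s", line);
        sleep(2);
    }
}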
I'm writing a web server.
Each connection is served by a separate thread, so I don't know in advance the number of threads.
There are also a group of text files (don't know the number, too), and each thread can read/write on each file.
A file can be written by just one thread at a time, but different threads can write to different files at the same time.
If a file is read by one or more threads (reads can be concurrent), no thread can write on THAT file.
Now, I noticed this (Thread safe multi-file writing) solution, but I'd like also to use functions as fgets(), for example.
So, can I flock() a file, and then use a fgets() or another stdio read/write library function?
First of all, use fcntl, not flock. The latter is a non-standard, deprecated BSD function and does not work with NFS and possibly other filesystems. fcntl locking on the other hand is POSIX standard and is intended to work everywhere.
Now if you want to use file-level reader-writer locking mixed with stdio, it will work, but you have to take some care to ensure that buffering does not break your assumptions about locks. The method I'm about to explain is not the only one, but I believe it's the clearest/simplest:
When you want to operate on one of your files with stdio, obtaining the correct type of lock (read or write, aka shared or exclusive) should be the first thing you do after fopen. Use fileno to get the file descriptor number and apply the lock to it. After that, perform your entire read or write operation. Do not make any attempt to unlock the file; instead, call fclose to close the file and let it be implicitly unlocked when it's closed. Otherwise you may release the lock while buffered data is still unwritten, or later read data that was buffered before the lock was released and is no longer valid afterwards.
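A minimal sketch of the reading side under this scheme (error handling mostly omitted; the function name and path handling are illustrative):

#include <stdio.h>
#include <fcntl.h>

/* read one of the files under a shared (reader) lock */
void read_locked(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return;

    struct flock fl = { 0 };
    fl.l_type   = F_RDLCK;              /* use F_WRLCK for the writing case */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                    /* 0 = lock the whole file */
    fcntl(fileno(fp), F_SETLKW, &fl);   /* block until the lock is granted */

    char buf[512];
    while (fgets(buf, sizeof buf, fp) != NULL) {
        /* process the line */
    }

    fclose(fp);   /* closing the stream releases the fcntl lock implicitly */
}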
My program is controlling an external application on Linux, passing input commands via a pipe to the external application's stdin, and reading output results via a pipe from the external application's stdout.
The problem is that writes to pipes are buffered by block, and not by line, and therefore delays occur before my app receives data output by the external application. The external application cannot be altered to add explicit fflush() calls.
When I set the external application to /bin/cat -n (it echoes back the input, with line numbers added), it works correctly; it seems cat flushes after each line. The only way to force the external application to flush is to send it an exit command; as it receives the command, it flushes, and all the answers appear on its stdout just before it exits.
I'm pretty sure that Unix pipes are an appropriate solution for this kind of interprocess communication (pseudo server-client), but maybe I'm wrong.
(I've just copied some text from a similar question: Force another program's standard output to be unbuffered using Python)
Don't use a pipe. Use a pty instead. Ptys (pseudo-ttys) have the benefit of being line buffered if you want it, which provides you with simple framing for your data stream.
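A minimal sketch using forkpty() (glibc: link with -lutil; ls is just an example child command):

#include <pty.h>        /* forkpty() */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int master;
    pid_t pid = forkpty(&master, NULL, NULL, NULL);
    if (pid == 0) {
        /* child: stdout is now the pty slave, so the program believes it is
           writing to a terminal and stdio switches to line buffering */
        execlp("ls", "ls", (char *)NULL);   /* example command */
        _exit(127);
    }
    /* parent: read the child's line-buffered output from the master side */
    char buf[256];
    ssize_t n;
    while ((n = read(master, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, n);
    return 0;
}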
Using a PTY may be overkill for the problem at hand (although it will work).
If the "target application" (the Delphi command-line utility) is dynamically linked, a possibly much simpler solution is to interpose (via LD_PRELOAD) a small library into the application. That library simply needs to implement isatty, and answer true (return 1) regardless of whether the output is going to a pipe or a terminal. You may wish to do that for all file descriptors, or just for STDOUT_FILENO.
Most UNIX implementations will call isatty to decide whether to do full buffering or line buffering for a given file descriptor.
Hmm, glibc doesn't. It calls __fxstat, and then only calls isatty if the status indicates that fd is going to a character device. So you'll need to interpose both __fxstat and isatty. More on library interposition here.
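A minimal sketch of such an interposer, covering only the isatty() part (as noted above, on glibc you would also have to interpose __fxstat, which is not shown here; the file names are illustrative):

/* fake_isatty.c
   build: gcc -shared -fPIC -o fake_isatty.so fake_isatty.c
   use:   LD_PRELOAD=./fake_isatty.so ./target_app */
int isatty(int fd)
{
    (void)fd;
    return 1;   /* claim that every descriptor is a terminal */
}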
By default standard input and standard output are fully buffered unless they are connected to an interactive device, in which case they are line buffered [1]. Pipes are non-interactive devices. PTYs are interactive devices. "Fully buffered" means "use a chunk of memory of a certain size".
I'm sure you want line buffering. Therefore using a master/slave PTY instead of pipes should bring the controlled application into the right buffering mode automatically.
[1] see "stdin(3)" and "setbuf(3)" for details.
Why doesn't calling fflush suitably (on the write side) work for you?
You can use poll (or other syscalls like ppoll, pselect, select) to check availability of input on the read side.
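For example, on the read side (a minimal sketch; pipe_fd is assumed to be the read end of the pipe):

#include <poll.h>

struct pollfd pfd = { .fd = pipe_fd, .events = POLLIN };
int ready = poll(&pfd, 1, 1000);   /* wait up to 1000 ms for input */
if (ready > 0 && (pfd.revents & POLLIN)) {
    /* data is available; read() will not block */
}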
If the external application is using <stdio.h> without calling fflush appropriately (or without using setvbuf to make flushing happen on newlines), data can remain inside its FILE* buffer without even being sent (with a write syscall) to the pipe!
An application can detect if its output is a terminal with e.g. isatty. But it should ensure that flushing happens...
As Michael Dillon suggested, using ptys is probably the best approach. But it is hard (I forgot the gory details).
I have a new file, opened read/write. One thread receives data from the network and appends the binary data to that file; the other thread reads from the same file to process the binary data. But read() always returns 0, so I can't read the data. However, if I append data with cat from the command line, the program can read the data and process it. I don't know why it doesn't notice the new data coming from the network. I'm using open(), read(), and write() in this program.
Use a pipe instead of a file on disk. Depending on your system (which you didn't tell us) there are only minor modifications to your code (which you didn't give us) to do that.
File operations are buffered. Try flushing the stream?
Assuming that your read() and write() functions are the POSIX ones, they share the file position, even when used from different threads. So your read after a write was trying to read past the position at which write had just written, i.e. at end of file, which is why read() returned 0. Don't use file I/O to communicate between threads. In most contexts I wouldn't even use pipes or sockets for that (one context where I would is when the reading thread is already using poll/select with other file descriptors); simple shared memory and a mutex are enough.
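A minimal sketch of that alternative: a shared buffer guarded by a mutex and a condition variable (sizes and function names are illustrative, and the buffer-overflow case is not handled):

#include <pthread.h>
#include <string.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static char   buf[65536];
static size_t len = 0;

/* network thread: append received data to the shared buffer */
void put(const char *data, size_t n)
{
    pthread_mutex_lock(&m);
    memcpy(buf + len, data, n);   /* sketch: assumes the buffer never overflows */
    len += n;
    pthread_cond_signal(&c);      /* wake the processing thread */
    pthread_mutex_unlock(&m);
}

/* processing thread: block until data arrives, then consume it */
size_t get(char *out, size_t cap)
{
    pthread_mutex_lock(&m);
    while (len == 0)
        pthread_cond_wait(&c, &m);
    size_t n = len < cap ? len : cap;
    memcpy(out, buf, n);
    memmove(buf, buf + n, len - n);   /* shift the remainder to the front */
    len -= n;
    pthread_mutex_unlock(&m);
    return n;
}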