flush without sync - c

From what I've read, flush pushes data into the OS buffers and sync makes sure that data goes down to the storage media. So, if you want to be sure that data is actually written to disk, you need to do a flush followed by a sync. So, are there any cases where you want to call flush but not sync?

You only want to fflush if you're using stdio's FILE *. This writes a user space buffer to the kernel.
The other answers seem to be missing fdatasync. This is the system call you want to flush a specific file descriptor to disk.

When you fflush, you flush the buffer of one file to disk (unless you give NULL, in which case it flushes all open files). http://www.manpagez.com/man/3/fflush/
When you sync, you flush all the buffers to disk. http://www.manpagez.com/man/2/sync/
The most important thing that you should notice is that fflush is a standard function, while sync is a system call provided by the operating system (Linux for example).
So basically, if you are writing portable program, you in fact never use sync.

Yes, lots. Most programs most of the time would not bother to call any of the various sync operations; flushing the data into the kernel buffer pool as you close the file is sufficient. This is doubly true if you're using a journalled file system.
Note that flushing is a higher level operation than the read() or similar system calls. It is used by the C <stdio.h> library, or the C++ <iostream> library. The system calls inherently flush the data to the kernel buffer pool (or direct to disk if you're using direct I/O or something similar).
Note, too, that on POSIX-like systems, you can arrange for data sync etc by setting flags on the open() system call (O_SYNC, O_DSYNC, O_RSYNC), or subsequently via fcntl().

Just to clarify, fflush() applies only when using the FILE interface of UNIX that buffers writes at the application level. In case the normal write() call is used, fflush() makes little sense.
Having said that, I can think of two situations where you would like to call fflush() but not sync:
You want to make sure that the data will eventually make it to disk even though the application crashes.
Force to screen the data that the application has written to standard output so far.
The second case is the most common use I have seen and it is usually required if the printf() call does not end with a new line character ('\n').

Related

POSIX guaranteeing write to disk

As I understand it, if I want to synchronise data to the storage device I can use fsync() to supposedly flush all the OS output caches... but apparently it doesn't guarantee this at all, unlike the documentation tries to deceive you, and the data may not be written to the disk!
This is not very good for many purposes because it can lead to data corruption. How do I use the POSIX libraries (In a portable way if possible) to guarantee that the data has been written (as far as possible) and prevent data corruption?
There is fdatasync() but it is not implemented on OSX, so is there a better and more portable way, or does one have to implement different code on different systems? I'm also not sure if fdatasync() is good enough.
Of-course, in the worst case scenario I could forget about this and use a redundant database library that uses ACID to store the data. I don't want that.
Also I'm interested in how to ensure truncate and rename operations have definitely completed.
Thanks!
You are looking for sync. There is both a program called sync and a system call called sync (man 1 sync and man 2 sync respectively):
#include <unistd.h>
void sync(void);
DESCRIPTION
sync() first commits inodes to buffers, and then buffers to disk.
So it will ensure that all pending operations (truncates, renames etc) are in fact written to the disk.
fsync does not claim to flush all output caches, but instead claims to flush all changes to a particular file descriptor to disk. It explicitly does not ensure that the directory entry is updated (in which case a call to fsync on a filedescriptor for the directory is needed).
fsyncdata is even more useless as it will not flush file metadata and instead will just ensure that the data in the file is flushed.
It is a good idea to trust the manpages. I won't say there are not mistakes, but they tend to be extremely accurate.

When does actual write() takes place in C?

What really happens when write() system call is executed?
Lets say I have a program which writes certain data into a file using write() function call. Now C library has its own internal buffer and OS too has its own buffer.
What interaction takes place between these buffers ?
Is it like when C library buffer gets filled completely, it writes to OS buffer and when OS buffer gets filled completely, then the actual write is done on the file?
I am looking for some detailed answers, useful links would also help. Consider this question for a UNIX system.
The write() system call (in fact all system calls) are nothing more that a contract between the application program and the OS.
for "normal" files, the write() only puts the data on a buffer, and marks that buffer as "dirty"
at some time in the future, these dirty buffers will be collected and actually written to disk. This can be forced by fsync()
this is done by the .write() "method" in the mounted-filesystem-table
and this will invoke the hardware's .write() method. (which could involve another level of buffering, such as DMA)
modern hard disks have there own buffers, which may or may not have actually been written to the physical disk, even if the OS->controller told them to.
Now, some (abnormal) files don't have a write() method to support them. Imagine open()ing "/dev/null", and write()ing a buffer to it. The system could choose not to buffer it, since it will never be written anyway.
Also note that the behaviour of write() does depend on the nature of the file; for network sockets the write(fd,buff,size) can return before size bytes have been sent(write will return the number of characters sent). But it is impossible to find out where they are once they have been sent. They could still be in a network buffer (eg waiting for Nagle ...), or a buffer inside the network interface, or a buffer in a router or switch somewhere on the wire.
As far as I know...
The write() function is a lower level thing where the library doesn't buffer data (unlike fwrite() where the library does/may buffer data).
Despite that, the only guarantee is that the OS transfers the data to disk drive before the next fsync() completes. However, hard disk drives usually have their own internal buffers that are (sometimes) beyond the OS's control, so even if a subsequent fsync() has completed it's possible for a power failure or something to occur before the data is actually written from the disk drive's internal buffer to the disk's physical media.
Essentially, if you really must make sure that your data is actually written to the disk's physical media; then you need to redesign your code to avoid this requirement, or accept a (small) risk of failure, or ensure the hardware is capable of it (e.g. get a UPS).
write() writes data to operating system, making it visible for all processes (if it is something which can be read by other processes). How operating system buffers it, or when it gets written permanently to disk, that is very library, OS, system configuration and file system specific. However, sync() can be used to force buffers to be flushed.
What is quaranteed, is that POSIX requires that, on a POSIX-compliant file system, a read() which can be proved to occur after a write() has returned must return the written data.
OS dependant, see man 2 sync and (on Linux) the discussion in man 8 sync.
Years ago operating systems were supposed to implement an 'elevator algorithm' to schedule writes to disk. The idea would be to minimize the disk writing head movement, which would allow a good throughput for several processes accessing the disk at the same time.
Since you're asking for UNIX, you must keep in mind that a file might actually be on an FTP server, which you have mounted, as an example. For example files /dev and /proc are not files on the HDD, as well.
Also, on Linux data is not written to the hard drive directly, instead there is a polling process, that flushes all pending writes every so often.
But again, those are implementation details, that really don't affect anything from the point of view of your program.

fsync vs write system call

I would like to ask a fundamental question about when is it useful to use a system call like fsync. I am beginner and i was always under the impression that write is enough to write to a file, and samples that use write actually write to the file at the end.
So what is the purpose of a system call like fsync?
Just to provide some background i am using Berkeley DB library version 5.1.19 and there is a lot of talk around the cost of fsync() vs just writing. That is the reason i am wondering.
Think of it as a layer of buffering.
If you're familiar with the standard C calls like fopen and fprintf, you should already be aware of buffering happening within the C runtime library itself.
The way to flush those buffers is with fflush which ensures that the information is handed from the C runtime library to the OS (or surrounding environment).
However, just because the OS has it, doesn't mean it's on the disk. It could get buffered within the OS as well.
That's what fsync takes care of, ensuring that the stuff in the OS buffers is written physically to the disk.
You may typically see this sort of operation in logging libraries:
fprintf (myFileHandle, "something\n"); // output it
fflush (myFileHandle); // flush to OS
fsync (fileno (myFileHandle)); // flush to disk
fileno is a function which gives you the underlying int file descriptor for a given FILE* file handle, and fsync on the descriptor does the final level of flushing.
Now that is a relatively expensive operation since the disk write is usually considerably slower than in-memory transfers.
As well as logging libraries, one other use case may be useful for this behaviour. Let me see if I can remember what it was. Yes, that's it. Databases! Just like Berzerkely DB. Where you want to ensure the data is on the disk, a rather useful feature for meeting ACID requirements :-)

Write system call writes data to disk directly?

I've read couple of questions(here) related to this but I still have some confusion.
My understanding is that write system call puts the data into Buffered Cache(OS caches as referred in that question). When the Buffered Cache gets full it is written to the disk.
Buffered IO is further optimization on top of this. It caches in the C RTL buffers and when they get full a write system call issued to move the contents to Buffered Cache. If I use fflush then data related to this particular file that is present in the C RTL buffers as well as Buffered Cache is sent to the disk.
Is my understanding correct?
How the stdio buffers are flushed is depending on the standard C library you use. To quote from the Linux manual page:
Note that fflush() only flushes the user space buffers provided by the C library.
To ensure that the data is physically stored on disk the kernel buffers must be
flushed too, for example, with sync(2) or fsync(2).
This means that on a Linux system, using fflush or overflowing the buffer will call the write function. But the operating system may keep internal buffers, and not actually write the data to the device. To make sure the data is truly written to the device, use both fflush and the low-level fsync.
Edit: Answer rephrased.

How OS performs buffering for a file

I know that when you call fwrite or fprintf or rather any other function that writes to a file, the contents aren't immediately flushed to the disk, but buffered in the memory.
Firstly, where do the OS manage these buffers and how. Secondly, if you do the write to a file and later read in the content you wrote and assuming that the OS didn't flushed the contents between the time you wrote and read, how it knows that it has to return the read from the buffer? How does it handle this situation.
The reason I want to know this is that I'm interested in implementing my own buffering scheme in user-space, rather than kernel space as done by OS. That is, write to a file would be buffered in user-space and the actual write will only occur at a certain point. Consquently I also need to handle situations where read is called for the content that is still in the buffer. Is it possible to do all this in user-space.
Firstly, where do the OS manage these buffers and how
The functions fwrite and fprintf use stdio buffers which already are completely in userspace. The buffers are (likely) static arrays or perhaps malloced memory.
how it knows that it has to return the read from the buffer
It doesn't, so the changes aren't seen. Nothing actually happens to a file until the underlying system call (write) is called (and even then - read on).
Is it possible to do all this in user-space
No, it's not possible. The good news is that the kernel already has buffers so every write you do isn't atually translated into an actual write to the file. It is postponed and executed later. If in the meantime somebody tries to read from the file, the kernel is smart enough to serve him from the buffer.
Bits from TLPI:
When working with disk files, the read() and write() system calls
don’t directly ini- tiate disk access. Instead, they simply copy data
between a user-space buffer and a buffer in the kernel buffer cache.
When performing I/O on a disk file, a successful return from write()
doesn’t guarantee that the data has been transferred to disk, because
the kernel performs buffering of disk I/O in order to reduce disk
activity and expedite write() calls.
At some later point, the kernel writes (flushes) its buffer to the
disk.
If, in the interim, another process attempts to read these bytes of
the file, then the kernel automatically supplies the data from the
buffer cache, rather than from (the outdated contents of) the file.
So you may want to find out about sync and fsync.
Multiple levels of buffering are generally bad. The reason stdio buffers are useful is that they minimize the number of system calls performed. If a system call would be cheaper nobody would use stdio buffers anymore.

Resources