How does sync() work?

I checked man 2 sync. It documents both sync and syncfs:
void sync(void);
int syncfs(int fd);
syncfs seems easy to understand: an fd is given, and the buffered data of the file system containing that fd is written to the underlying storage.
What about sync? The man page says:
sync() causes all buffered modifications to file metadata and data to be written to the underlying file systems.
Is it that all the buffers in the system are written to the file systems, or only the files opened by this process? I also didn't quite understand "buffered modifications to file metadata".

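Whenever you issue a write(), a send(), a write to a file-backed mapping, or something similar, the kernel is not forced to flush that data straight to persistent storage, the underlying network stack, and so on. This buffering is done for performance reasons.
sync instructs the kernel to do exactly that for file data: empty all file-system buffers in the whole system, not just those belonging to files opened by the calling process. "Metadata" means information about a file rather than its contents (its size, timestamps, permissions, directory entries and so on); changes to it are buffered too, and sync writes them out as well.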
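A minimal sketch of the difference on Linux, assuming glibc: syncfs() is Linux-specific (Linux 2.6.39 and later) and is only declared when _GNU_SOURCE is defined. The helper name append_and_sync is hypothetical. Note that sync() returns void, so there is nothing to check, while syncfs() returns 0 or -1.

```c
#define _GNU_SOURCE          /* syncfs() is a Linux extension declared under _GNU_SOURCE */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append msg to the file at path, then flush buffered data to storage.
 *  whole_system != 0: sync(), which flushes every mounted file system,
 *                     no matter which process dirtied the buffers.
 *  whole_system == 0: syncfs(fd), which flushes only the file system
 *                     that contains this file.
 * Returns 0 on success, -1 on failure (including a short write). */
int append_and_sync(const char *path, const char *msg, int whole_system)
{
    assert(path != NULL && msg != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    int rc = -1;
    if (write(fd, msg, len) == (ssize_t)len) {
        if (whole_system) {
            sync();              /* returns void: there is no error to check */
            rc = 0;
        } else {
            rc = syncfs(fd);     /* only the file system holding this file */
        }
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Either call covers more than the one file: syncfs() flushes every dirty buffer on that file system, and sync() flushes every file system.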

Related

What is the functional difference between sync()/fsync() and std::ofstream.flush()?

Background: I'm working with VxWorks for the first time in my life, and I'm having to translate some inherited code so that it will work in VxWorks.
Problem & Solution: The codebase calls sync(). There's no definition for sync() in VxWorks headers (at least, not the ones I have). I have fsync(), which requires a file descriptor to work. The function which calls sync() is writing to the file with an ofstream object before calling sync... and there's no obvious way to recover a file descriptor from an ofstream object.
After an embarrassingly long time rooting around for options, I discover the ofstream::flush() function, which ought to work for what I'm trying to do.
However, in my rooting it was pointed out to me that ofstream::flush() and sync/fsync are associated with different things (IIUC, the C library and the operating system, respectively). That suggests that ofstream::flush() won't achieve quite the same thing that sync() or fsync() will.
Can someone lay out the difference for me? My understanding is still far too fuzzy to be relied upon in future.
Followup: The original code appends a std::endl to the stream and calls sync. My understanding is that appending endl will have the effect of flush(). If the original code has both, that suggests to me that either:
- the code is redundant, or
- flush() and sync()/fsync() are both required because they do similar but not identical things.
Which is it?
If you know the device name used for the file system, it would be easy to write a stub function that performs the sync() operation on that device:
fd = open(...);
fsync(fd);
close(fd);
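A fuller version of that stub, as a sketch: sync_path() is a hypothetical name, and the path it receives is whatever device, mount point or file your system exposes (the original leaves it unspecified). It relies only on open(), fsync() and close(), and keeps fsync()'s error if close() also fails. On some systems fsync() may require a descriptor opened for writing, in which case O_RDONLY would have to change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for sync() where only fsync() is available:
 * open the given path, fsync() it, and close it again.
 * Returns 0 on success, -1 on failure with errno set. */
int sync_path(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);   /* read-only also works for directories */
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    int fsync_errno = errno;         /* keep fsync()'s error if close() also fails */
    int close_rc = close(fd);
    if (rc != 0) {
        errno = fsync_errno;
        return -1;
    }
    return close_rc == 0 ? 0 : -1;
}
```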
Ideally, the code would walk through all file descriptors open on the file system and issue an ioctl() on each one, passing the command FIOFLUSH or FIOSYNC.
For example:
status = ioctl(fd, FIOSYNC, 0); /* see definition in ioLib.h */
You can look at the iosFdShow() code to see how to go through all file descriptors.
flush() flushes the application-level buffer; that is, it ensures all buffered data has been handed to the OS using write().
sync() is a system call that makes the operating system synchronize its file cache with the underlying storage. In other words, you call this if you want to make sure the file is written to disk and not just on some page in memory.
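To make the two layers concrete in C, where a FILE* plays the role of the ofstream, here is a sketch (the function names are hypothetical) that checks the file size the OS reports before and after fflush(). It assumes a fully buffered stream, which is the default for regular files, so a small write stays in the library's buffer until fflush() calls write().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of the file as the OS currently sees it (what other processes see). */
static long os_visible_size(FILE *f)
{
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Walks through the layers for a FILE*:
 * 1. fputs() only fills the C library's buffer in user space;
 * 2. fflush() hands that buffer to the kernel with write();
 * 3. fsync() asks the kernel to push its cached copy to the storage device.
 * sizes[0] and sizes[1] receive the OS-visible size before and after fflush(). */
int layered_write(FILE *f, const char *text, long sizes[2])
{
    assert(f != NULL && text != NULL);
    if (fputs(text, f) == EOF)
        return -1;
    sizes[0] = os_visible_size(f);   /* typically 0: data still in the stdio buffer */
    if (fflush(f) != 0)
        return -1;
    sizes[1] = os_visible_size(f);   /* the kernel now has the data */
    return fsync(fileno(f));         /* only this step targets the disk */
}
```

Read against the VxWorks question: std::endl corresponds to the fflush() step and sync()/fsync() to the last one, so code that does both is not redundant if it wants the data on disk rather than merely handed to the OS.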

POSIX guaranteeing write to disk

As I understand it, if I want to synchronise data to the storage device, I can use fsync() to supposedly flush all the OS output caches... but apparently it doesn't guarantee this at all, despite what the documentation suggests, and the data may not be written to the disk!
This is not very good for many purposes because it can lead to data corruption. How do I use the POSIX libraries (in a portable way, if possible) to guarantee that the data has been written (as far as possible) and prevent data corruption?
There is fdatasync() but it is not implemented on OSX, so is there a better and more portable way, or does one have to implement different code on different systems? I'm also not sure if fdatasync() is good enough.
Of course, in the worst-case scenario I could forget about this and use a redundant database library that uses ACID to store the data. I don't want that.
Also I'm interested in how to ensure truncate and rename operations have definitely completed.
Thanks!
You are looking for sync. There is both a program called sync and a system call called sync (man 1 sync and man 2 sync respectively):
#include <unistd.h>
void sync(void);
DESCRIPTION
sync() first commits inodes to buffers, and then buffers to disk.
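So, on Linux, it will ensure that all pending operations (truncates, renames, etc.) are in fact written to the disk. Note that POSIX itself only requires sync() to schedule the writes, which may not be complete when it returns; Linux waits for them to finish.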
fsync does not claim to flush all output caches; it claims to flush all changes for a particular file descriptor to disk. It explicitly does not ensure that the directory entry is updated; for that, you also need to call fsync on a file descriptor for the directory.
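A sketch of the pattern this implies for durable renames, assuming a POSIX system where rename() within one directory is atomic. durable_replace() and the ".name.tmp" naming scheme are hypothetical; the key point is the second fsync() on a descriptor for the directory, which makes the new directory entry durable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: durably replace dir/name with new contents.
 * 1. write the data to a temporary file in the same directory;
 * 2. fsync() it so the data and inode reach the disk;
 * 3. rename() it over the target (atomic on POSIX file systems);
 * 4. fsync() the directory so the new directory entry reaches the disk. */
int durable_replace(const char *dir, const char *name, const char *data)
{
    assert(dir != NULL && name != NULL && data != NULL);
    char tmp[4096], dst[4096];
    if (snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name) >= (int)sizeof tmp ||
        snprintf(dst, sizeof dst, "%s/%s", dir, name) >= (int)sizeof dst)
        return -1;                                   /* path too long */

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }
    int dfd = open(dir, O_RDONLY);                   /* directories are opened read-only */
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);                             /* persist the renamed entry */
    close(dfd);
    return rc;
}
```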
fdatasync is even weaker: it does not flush file metadata, except what is needed to read the data back (such as the file size), and only ensures that the data in the file is flushed.
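For the portability question about OS X, one common approach is a small wrapper that picks the strongest available call at compile time. This is a sketch: data_sync() is a hypothetical name. F_FULLFSYNC is a macOS fcntl() command that also asks the drive to flush its write cache, which plain fsync() on macOS does not do; elsewhere the wrapper uses fdatasync() when the system advertises _POSIX_SYNCHRONIZED_IO and falls back to fsync() otherwise.

```c
#ifndef __APPLE__
#define _POSIX_C_SOURCE 200809L   /* expose fdatasync() on glibc in strict modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical "flush this file's data to stable storage" wrapper. */
int data_sync(int fd)
{
    assert(fd >= 0);
#if defined(__APPLE__) && defined(F_FULLFSYNC)
    if (fcntl(fd, F_FULLFSYNC) == 0)   /* also flushes the drive's write cache */
        return 0;
    return fsync(fd);                  /* F_FULLFSYNC unsupported here: best effort */
#elif defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return fdatasync(fd);              /* data plus metadata needed to read it back */
#else
    return fsync(fd);                  /* data plus all metadata */
#endif
}
```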
It is a good idea to trust the man pages. I won't say there are no mistakes, but they tend to be extremely accurate.

When does the actual write() take place in C?

What really happens when write() system call is executed?
Let's say I have a program which writes certain data into a file using the write() function call. The C library has its own internal buffer, and the OS has its own buffer too.
What interaction takes place between these buffers?
Is it like when C library buffer gets filled completely, it writes to OS buffer and when OS buffer gets filled completely, then the actual write is done on the file?
I am looking for some detailed answers, useful links would also help. Consider this question for a UNIX system.
The write() system call (in fact, any system call) is nothing more than a contract between the application program and the OS.
- For "normal" files, write() only puts the data in a buffer and marks that buffer as "dirty".
- At some time in the future, these dirty buffers will be collected and actually written to disk. This can be forced with fsync().
- This is done by the .write() "method" in the mounted-file-system table,
- which in turn invokes the hardware's .write() method (which could involve another level of buffering, such as DMA).
- Modern hard disks have their own buffers, whose contents may not yet have been written to the physical disk, even if the OS told the controller to write them.
Now, some (abnormal) files don't have a write() method to support them. Imagine open()ing "/dev/null", and write()ing a buffer to it. The system could choose not to buffer it, since it will never be written anyway.
Also note that the behaviour of write() depends on the nature of the file: for network sockets, write(fd, buff, size) may accept fewer than size bytes (it returns the number of bytes it actually accepted). And once bytes have been accepted, it is impossible to find out where they are. They could still be in a network buffer (e.g. waiting for Nagle's algorithm), in a buffer inside the network interface, or in a router or switch somewhere on the wire.
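Because write() on a socket or pipe may accept only part of the buffer, callers usually loop until everything has been accepted. A minimal sketch, with write_all() as a hypothetical helper name; success only means the kernel has taken the bytes, not that they have reached the peer or the disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Keep calling write() until all len bytes have been accepted by the kernel.
 * Returns 0 on success, -1 on error (errno set). */
int write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* skip the part the kernel accepted */
        len -= (size_t)n;
    }
    return 0;
}
```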
As far as I know...
The write() function is a lower level thing where the library doesn't buffer data (unlike fwrite() where the library does/may buffer data).
Despite that, the only guarantee is that the OS transfers the data to the disk drive before the next fsync() completes. However, hard disk drives usually have their own internal buffers that are (sometimes) beyond the OS's control, so even after a subsequent fsync() has completed, a power failure or something similar could occur before the data is actually written from the drive's internal buffer to its physical media.
Essentially, if you really must make sure that your data is actually written to the disk's physical media, then you need to redesign your code to avoid this requirement, accept a (small) risk of failure, or ensure the hardware is capable of it (e.g. get a UPS).
write() writes data to the operating system, making it visible to all processes (if it is something other processes can read). How the operating system buffers it, and when it gets written permanently to disk, depends heavily on the library, OS, system configuration and file system. However, sync() can be used to force buffers to be flushed.
What is guaranteed is that POSIX requires that, on a POSIX-compliant file system, a read() which can be proved to occur after a write() has returned must return the written data.
It is OS-dependent; see man 2 sync and (on Linux) the discussion in man 8 sync.
Years ago operating systems were supposed to implement an 'elevator algorithm' to schedule writes to disk. The idea would be to minimize the disk writing head movement, which would allow a good throughput for several processes accessing the disk at the same time.
Since you're asking about UNIX, keep in mind that a file might actually live on a mounted FTP server, for example. Likewise, the files under /dev and /proc are not files on the hard disk at all.
Also, on Linux, data is not written to the hard drive directly; instead, a background process flushes all pending writes every so often.
But again, those are implementation details that don't really affect anything from your program's point of view.

fsync vs write system call

I would like to ask a fundamental question about when it is useful to use a system call like fsync. I am a beginner, and I was always under the impression that write is enough to write to a file, and samples that use write actually write to the file at the end.
So what is the purpose of a system call like fsync?
Just to provide some background: I am using Berkeley DB library version 5.1.19, and there is a lot of talk around the cost of fsync() vs. just writing. That is the reason I am wondering.
Think of it as a layer of buffering.
If you're familiar with the standard C calls like fopen and fprintf, you should already be aware of buffering happening within the C runtime library itself.
The way to flush those buffers is with fflush which ensures that the information is handed from the C runtime library to the OS (or surrounding environment).
However, just because the OS has it, doesn't mean it's on the disk. It could get buffered within the OS as well.
That's what fsync takes care of, ensuring that the stuff in the OS buffers is written physically to the disk.
You may typically see this sort of operation in logging libraries:
fprintf (myFileHandle, "something\n"); // output it
fflush (myFileHandle); // flush to OS
fsync (fileno (myFileHandle)); // flush to disk
fileno is a function which gives you the underlying int file descriptor for a given FILE* file handle, and fsync on the descriptor does the final level of flushing.
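The same three steps as a reusable function, with the error checks the snippet leaves out. This is a sketch; log_line() is a hypothetical name. Each step can fail independently, and a failing fsync() is the signal that the data may not have reached the storage device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical logging helper: append one line and push it all the way down.
 * Returns 0 on success, -1 on failure. */
int log_line(FILE *fp, const char *msg)
{
    assert(fp != NULL && msg != NULL);
    if (fprintf(fp, "%s\n", msg) < 0)   /* into the C library's buffer */
        return -1;
    if (fflush(fp) != 0)                /* C library -> OS */
        return -1;
    if (fsync(fileno(fp)) != 0)         /* OS cache -> storage device */
        return -1;
    return 0;
}
```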
Now that is a relatively expensive operation since the disk write is usually considerably slower than in-memory transfers.
As well as logging libraries, one other use case may be useful for this behaviour. Let me see if I can remember what it was. Yes, that's it. Databases! Just like Berzerkely DB. Where you want to ensure the data is on the disk, a rather useful feature for meeting ACID requirements :-)

Writing and reading the same fd without fsync in Linux

Suppose I write a block to a file descriptor without doing fsync and then read the same block from the same descriptor some time later. Is it guaranteed that I will receive the same information?
The program is single-threaded and no other process will access the file at any time.
Yes, it is guaranteed by the operating system.
Even if the modifications have not made it to disk yet, the OS uses its buffer cache to reflect file modifications and presents the same, up-to-date view of the file to ALL processes. So not only your process, but any other process, would be able to see the changes.
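A small check of that guarantee, as a sketch with a hypothetical function name: it writes a block and reads it back without any fsync(), first through the same descriptor after seeking back, and then through a second descriptor opened independently, as another process would.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if both reads match the written block, 0 if either differs,
 * -1 on I/O error. No fsync() is called anywhere. */
int reread_without_fsync(const char *path, const char *block, size_t len)
{
    assert(path != NULL && block != NULL && len <= 4096);
    char same[4096], other[4096];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int result = -1;
    if (write(fd, block, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, same, len) == (ssize_t)len) {        /* same descriptor */
        int fd2 = open(path, O_RDONLY);               /* second, independent descriptor */
        if (fd2 >= 0) {
            if (read(fd2, other, len) == (ssize_t)len)
                result = memcmp(same, block, len) == 0 &&
                         memcmp(other, block, len) == 0;
            close(fd2);
        }
    }
    close(fd);
    return result;
}
```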
As to fsync(), it only instructs the operating system to do its best to flush the contents to disk. See also fdatasync().
Also, I suggest you use two file descriptors: one for reading, another for writing.
fsync() synchronizes cache and disk. Since the data is already in the cache, it will be read from there instead of from disk.
When you write to a file descriptor, the data is stored in RAM caches and buffers before being sent to disk, so as long as you don't close the descriptor, you can access the data you just wrote. If you close the descriptor, the file contents will be put on disk either when you flush them yourself or when the OS gets around to it for efficiency. BUT if you want to be sure the just-written data is actually on disk before you open a new FD, you MUST flush it to disk with fsync().

Resources