What is the difference between fsync and syncfs? - c

What is the difference between fsync and syncfs ?
int syncfs(int fd);
int fsync(int fd);
The man page for fsync says the following:
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages
for) the file referred to by the file descriptor fd to the disk device (or other permanent storage
device) so that all changed information can be retrieved even after the system crashed or
was rebooted. This includes writing through or flushing a disk cache if present. The call
blocks until the device reports that the transfer has completed. It also flushes metadata
information associated with the file (see stat(2)).
The manpage for syncfs says the following:
sync() causes all buffered modifications to file metadata and data to be written to the underlying
filesystems.
syncfs() is like sync(), but synchronizes just the filesystem containing file referred to by the
open file descriptor fd.
To me both seem equivalent: they synchronize the file referred to by the file descriptor and its associated metadata.

First, fsync() (and sync()) are POSIX-standard functions while syncfs() is Linux-only.
So availability is one big difference.
From the POSIX standard for fsync():
The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage
device associated with the file described by fildes. The nature of
the transfer is implementation-defined. The fsync() function shall
not return until the system has completed that action or until an
error is detected.
Note that it's just a request.
From the POSIX standard for sync():
The sync() function shall cause all information in memory that
updates file systems to be scheduled for writing out to all file
systems.
The writing, although scheduled, is not necessarily complete upon
return from sync().
Again, that's not something guaranteed to happen.
The Linux man page for syncfs() (and sync()) states
sync() causes all pending modifications to filesystem metadata and
cached file data to be written to the underlying filesystems.
syncfs() is like sync(), but synchronizes just the filesystem
containing file referred to by the open file descriptor fd.
Note that it is not specified when the function returns.
The Linux man page for fsync() states:
fsync() transfers ("flushes") all modified in-core data of (i.e.,
modified buffer cache pages for) the file referred to by the file
descriptor fd to the disk device (or other permanent storage device)
so that all changed information can be retrieved even if the system
crashes or is rebooted. This includes writing through or flushing a
disk cache if present. The call blocks until the device reports that
the transfer has completed.
As well as flushing the file data, fsync() also flushes the metadata
information associated with the file (see inode(7)).
Calling fsync() does not necessarily ensure that the entry in the
directory containing the file has also reached disk. For that an
explicit fsync() on a file descriptor for the directory is also
needed.
Note that the guarantees Linux provides for fsync() are much stronger than those provided for sync() or syncfs(), and by POSIX for both fsync() and sync().
In summary:
POSIX fsync(): "please write data for this file to disk"
POSIX sync(): "write all data to disk when you get around to it"
Linux sync(): "write all data to disk (when you get around to it?)"
Linux syncfs(): "write all data for the filesystem associated with this file to disk (when you get around to it?)"
Linux fsync(): "write all data and metadata for this file to disk, and don't return until you do"
Note that the Linux man page doesn't specify when sync() and syncfs() return.
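To make the difference in scope concrete, here is a minimal sketch, assuming Linux with a glibc that exposes syncfs() when _GNU_SOURCE is defined. The helper name flush_file_then_fs is made up for illustration and is not part of any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc exposes syncfs() only with this defined */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write a small buffer to `path`, then flush at two
 * different scopes. Returns 0 on success, -1 on failure with errno set
 * (errno may not be meaningful if the failure was a short write). */
int flush_file_then_fs(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    int rc = 0;
    if (write(fd, buf, len) != (ssize_t)len)   /* short write treated as failure */
        rc = -1;
    else if (fsync(fd) == -1)                  /* scope: this one file (data + metadata) */
        rc = -1;
    else if (syncfs(fd) == -1)                 /* scope: the whole filesystem holding the file */
        rc = -1;

    int saved = errno;                         /* keep the first error across close() */
    if (close(fd) == -1 && rc == 0)
        return -1;
    if (rc == -1)
        errno = saved;
    return rc;
}
```

Both calls take the same kind of argument, but fsync(fd) only concerns the file behind fd, while syncfs(fd) uses fd merely to identify which filesystem to flush.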

I think the current answer is not complete. For Linux:
According to the standard specification (e.g., POSIX.1-2001), sync()
schedules the writes, but may return before the actual writing is
done. However Linux waits for I/O completions, and thus sync() or
syncfs() provide the same guarantees as fsync called on every file
in the system or filesystem respectively.
and
Before version 1.3.20 Linux did not wait for I/O to complete before
returning.
This is mentioned on the sync(2) page in the "notes" and "bugs" sections.

Related

Calling fsync after file read/write operations returns -1

I am copying a 900 MB file from a directory to a USB drive (/mnt/usb) using file read/write operations.
After the read/write completes, I call fflush and fsync as below:
FILE *filename;
/* file read/write operations */
fflush(filename);
fsync(fileno(filename));
In the above code, fsync returns -1. What could be the reason, and how can I check it? Thanks in advance.
FSYNC(2) Linux Programmer's Manual FSYNC(2)
NAME
fsync, fdatasync - synchronize a file's in-core state with storage device
SYNOPSIS
#include <unistd.h>
int fsync(int fd);
int fdatasync(int fd);
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
fsync():
Glibc 2.16 and later:
No feature test macros need be defined
Glibc up to and including 2.15:
_BSD_SOURCE || _XOPEN_SOURCE
|| /* since glibc 2.8: */ _POSIX_C_SOURCE >= 200112L
fdatasync():
_POSIX_C_SOURCE >= 199309L || _XOPEN_SOURCE >= 500
DESCRIPTION
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by
the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be
retrieved even if the system crashes or is rebooted. This includes writing through or flushing a disk cache if present.
The call blocks until the device reports that the transfer has completed.
As well as flushing the file data, fsync() also flushes the metadata information associated with the file (see inode(7)).
Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk.
For that an explicit fsync() on a file descriptor for the directory is also needed.
fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a
subsequent data retrieval to be correctly handled. For example, changes to st_atime or st_mtime (respectively, time of
last access and time of last modification; see inode(7)) do not require flushing because they are not necessary for a
subsequent data read to be handled correctly. On the other hand, a change to the file size (st_size, as made by say
ftruncate(2)), would require a metadata flush.
The aim of fdatasync() is to reduce disk activity for applications that do not require all metadata to be synchronized
with the disk.
RETURN VALUE
On success, these system calls return zero. On error, -1 is returned, and errno is set appropriately.
ERRORS
EBADF fd is not a valid open file descriptor.
EIO An error occurred during synchronization. This error may relate to data written to some other file descriptor on
the same file. Since Linux 4.13, errors from write-back will be reported to all file descriptors that might have
written the data which triggered the error. Some filesystems (e.g., NFS) keep close track of which data came
through which file descriptor, and give more precise reporting. Other filesystems (e.g., most local filesystems)
will report errors to all file descriptors that were open on the file when the error was recorded.
ENOSPC Disk space was exhausted while synchronizing.
EROFS, EINVAL
fd is bound to a special file (e.g., a pipe, FIFO, or socket) which does not support synchronization.
ENOSPC, EDQUOT
fd is bound to a file on NFS or another filesystem which does not allocate space at the time of a write(2) system
call, and some previous write failed due to insufficient storage space.
CONFORMING TO
POSIX.1-2001, POSIX.1-2008, 4.3BSD.
AVAILABILITY
On POSIX systems on which fdatasync() is available, _POSIX_SYNCHRONIZED_IO is defined in <unistd.h> to a value greater
than 0. (See also sysconf(3).)
NOTES
On some UNIX systems (but not Linux), fd must be a writable file descriptor.
In Linux 2.2 and earlier, fdatasync() is equivalent to fsync(), and so has no performance advantage.
The fsync() implementations in older kernels and lesser used filesystems does not know how to flush disk caches. In these
cases disk caches need to be disabled using hdparm(8) or sdparm(8) to guarantee safe operation.
SEE ALSO
sync(1), bdflush(2), open(2), posix_fadvise(2), pwritev(2), sync(2), sync_file_range(2), fflush(3), fileno(3), hdparm(8),
mount(8)
COLOPHON
This page is part of release 4.16 of the Linux man-pages project. A description of the project, information about
reporting bugs, and the latest version of this page, can be found at https://www.kernel.org/doc/man-pages/.
Linux 2017-09-15 FSYNC(2) RTFM

Is there an official document that marks the read/write functions as thread-safe?

The man pages for read/write don't mention anything about their thread safety.
According to one linked comment, these functions are thread-safe, but that comment does not link to an official document.
On the other hand, another linked document says:
The read() function shall attempt to read nbyte bytes
from the file associated with the open file descriptor,
fildes, into the buffer pointed to by buf.
The behavior of multiple concurrent reads on the same pipe, FIFO, or
terminal device is unspecified.
I concluded the read function is not thread safe.
I am so confused now. Please send me a link to an official document about the thread safety of these functions.
I tested these functions with a pipe and there wasn't any problem (of course, I know I can't conclude anything certain from testing a few examples).
Thanks in advance :)
The thread safe versions of read and write are pread and pwrite:
pread(2)
The pread() and pwrite() system calls are especially useful in
multithreaded applications. They allow multiple threads to perform
I/O on the same file descriptor without being affected by changes to
the file offset by other threads.
When two threads call write() at the same time, the order in which the calls complete is not specified, so without synchronization the resulting behaviour is unspecified.
read() and write() are not strictly thread-safe, and there is no documentation that says they are, as the location where the data is read from or written to can be modified by another thread.
Per the POSIX read documentation:
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
That's the part you noticed - but that does not cover all possible types of file descriptors, such as regular files. It only applies to "pipe[s], FIFO[s]" and "terminal device[s]". This part covers almost everything else (weird things like "files" in /proc that are generated on the fly by the kernel are, well, weird and highly implementation-specific):
On files that support seeking (for example, a regular file), the read() shall start at a position in the file given by the file offset associated with fildes. The file offset shall be incremented by the number of bytes actually read.
Since the "file offset associated with fildes" is subject to modification from other threads in the process, the following code is not guaranteed to return the same results even given the exact same file contents and inputs for fd, offset, buffer, and bytes:
lseek( fd, offset, SEEK_SET );
read( fd, buffer, bytes );
Since both read() and write() depend upon a state (current file offset) that can be modified at any moment by another thread, they are not thread-safe.
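For contrast, here is a minimal sketch of the positional alternative mentioned earlier, pread(), which takes the offset as an argument instead of relying on the shared file offset. The names read_fully_at and demo_offset_untouched are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read `bytes` bytes starting at absolute `offset`, or fewer at end of file.
 * Returns the number of bytes read, or -1 with errno set. */
ssize_t read_fully_at(int fd, void *buffer, size_t bytes, off_t offset)
{
    assert(buffer != NULL);
    char *p = buffer;
    size_t done = 0;
    while (done < bytes) {
        ssize_t n = pread(fd, p + done, bytes - done, offset + (off_t)done);
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)              /* end of file */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Demo: returns 1 if the descriptor's own offset is untouched by the
 * positional read, 0 otherwise. Creates and removes `path`. */
int demo_offset_untouched(const char *path)
{
    char buf[4];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return 0;
    int ok = write(fd, "0123456789", 10) == 10
          && read_fully_at(fd, buf, sizeof buf, 3) == 4
          && memcmp(buf, "3456", 4) == 0
          && lseek(fd, 0, SEEK_CUR) == 10;   /* still where write() left it */
    close(fd);
    unlink(path);
    return ok;
}
```

Because no lseek() is involved, there is no window between "seek" and "read" in which another thread could move the position.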
On some embedded file systems, or really old desktop systems that weren't designed to facilitate multitasking support (e.g. MS-DOS 3.0), an attempt to perform an fread() on one file while an fread() is being performed on another file may result in arbitrary system corruption.
Any modern operating system and language runtime will guarantee that such corruption won't occur as a result of operations performed on unrelated files, or when independent file descriptors are used to access the same file in ways that do not modify it. Functions like fread() and fwrite() will be thread-safe when used in that fashion.
The act of reading data from a disk file does not modify it, but reading data from many kinds of stream will modify them by removing data. If two threads both perform actions that modify the same stream, such actions may interfere with each other in unspecified ways even if such modifications are performed by fread() operations.

How to prevent data loss when closing a file descriptor?

When I issue write(), my data goes to some kernel-space buffers. The actual commit to the physical layer ("phy-commit") is (likely) deferred until... (until exactly what events?)
When I issue a close() for a file descriptor, then
If [...], the resources associated with the open file description are freed
Does it mean releasing (freeing) those kernel buffers which contained my data? What will happen to my precious data, contained in those buffers? Will be lost?
How to prevent that loss?
Via fsync()? It requests an explicit phy-commit. I suppose that happens either immediately (synchronous call) or is deferred only "for a short time" and queued to precede subsequent operations, at least destructive ones.
But I do not really want an immediate or urgent phy-commit. I only want my data to be retained and the phy-commit not to be forgotten at some later time.
From man fclose:
The fclose() function [...] closes the underlying file descriptor.
...
fclose() flushes only the user-space buffers provided by the C library. To ensure that the data is physically stored on disk the kernel buffers must be flushed too, for example, with sync(2) or fsync(2).
It may suggest that fsync does not have to precede close (or fclose, which contains a close), but can (even have to) come after it. So the close() cannot be very destructive...
Does it mean releasing (freeing) those kernel buffers which contained my data? What will happen to my precious data, contained in those buffers? Will be lost?
No. The kernel buffers will not be freed before the kernel writes the data to the underlying file. So, there won't be any data loss (unless something goes really wrong, such as a power outage to the system).
Whether that data will be written to the physical file immediately is another question. It may depend on the filesystem (which may be buffering) and/or any hardware caching as well.
As far as your user program is concerned, a successful close() call can be considered a successful write to the file.
It may suggest that fsync does not have to precede close (or fclose, which contains a close), but can (even have to) come after it. So the close() cannot be very destructive...
After a call to close(), the state of the file descriptor is left unspecified by POSIX (regardless of whether close() succeeded). So, you are not allowed to use fsync(fd); after calling close().
See: POSIX/UNIX: How to reliably close a file descriptor.
And no, it doesn't suggest close() can be destructive. It says that the C library does its own buffering in user space, which fclose() flushes to the kernel, and that fsync() is needed if the kernel buffers must also reach the disk (and now we are in the same position as described before).
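Putting the ordering together, a minimal sketch of "write, then fsync(), then close()" with every return value checked. The helper name write_durably is hypothetical, and the sketch only covers the file's contents, not its directory entry.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` and flush them to the
 * device before closing. Returns 0 on success, -1 on failure with errno set. */
int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int saved;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);     /* may write fewer bytes than asked */
        if (n == -1) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) == -1)                   /* must happen while fd is still open */
        goto fail;

    return close(fd);                      /* close() can report errors too */

fail:
    saved = errno;                         /* keep the original error across close() */
    close(fd);
    errno = saved;
    return -1;
}
```

If the urgent flush is not wanted, the fsync() call can simply be left out; the kernel still writes the data back later, but the program then has no confirmation of when, or whether, that succeeded.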

Calling fsync(2) after close(2)

Scenario:
Task code (error checking omitted):
// open, write and close
fd = open(name, O_WRONLY);
write(fd, buf, len);
close(fd);
/* more code here, not issuing reads/writes to name but maybe open()ing it */
// open again and fsync
fd = open(name, O_WRONLY);
fsync(fd);
No more tasks accessing name concurrently in the system.
Is it defined and, more importantly, will it sync any outstanding writes on the inode referred to by name? I.e., will I read back buf from the file after the fsync?
From POSIX http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html I would say it seems legit ...
Thanks.
Edit (May 18):
Thanks for the answers and research. I took this question (in 2016) to one of the extfs lead developers (Ted) and got this answer: "It's not guaranteed by Posix, but in practice it should work on most
file systems, including ext4. The key wording in the Posix specification is:
The fsync() function shall request that all data for the open file
^^^^^^^^^^^^^^^^^
descriptor named by fildes is to be transferred to the storage device
^^^^^^^^^^^^^^^^^^^^^^^^^^
associated with the file described by fildes.
It does not say "all data for the file described by fildes...." it
says "all data for the open file descriptor". So technically data
written by another file descriptor is not guaranteed to be synced to
disk.
In practice, file systems don't track dirty data by which fd it came in
on, so you don't need to worry. And an OS which writes more than what
is strictly required is standards compliant, and so that's what you
will find in general, even if it isn't guaranteed." This is less specific than "exact same durability guarantees" but is quite authoritative, even though maybe outdated.
What I was trying to do was a 'sync' command that worked on single files.
Like fsync /some/file without having to sync the whole filesystem, to use it in shell scripts for example.
Now (since a few years ago) gnu coreutils 'sync' works on single files and does exactly this (open/fsync). commit: https://github.com/coreutils/coreutils/commit/8b2bf5295f353016d4f5e6a2317d55b6a8e7fd00
No, close()+re-open()+fsync() does not provide the same guarantees as fsync()+close().
Source: I took this question to the linux-fsdevel mailing list and got the answer:
Does a sequence of close()/re-open()/fsync() provide the same durability
guarantees as fsync()/close()?
The short answer is no, the latter provides a better guarantee.
The longer answer is that durability guarantees depend on kernel version,
because the situation has been changing in v4.13, v4.14 and now again in
v4.17-rc and stable kernels.
Further relevant links are:
https://wiki.postgresql.org/wiki/Fsync_Errors ("fsyncgate")
Mailing list entry PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Writing programs to cope with I/O errors causing lost writes on Linux from the same author
In particular, the latter links describe how
after closing an FD, you lose all ways to enforce durability
after an fsync() fails, you cannot call fsync() again in the hope that now your data would be written
you must re-do/confirm all writing work if that happens
The current (2017) specification of POSIX fsync()
recognizes a base functionality and an optional functionality:
The fsync() function shall request that all data for the open file descriptor named by fildes is to be transferred to the storage device associated with the file described by fildes. The nature of the transfer is implementation-defined. The fsync() function shall not return until the system has completed that action or until an error is detected.
[SIO] ⌦ If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state. All I/O operations shall be completed as defined for synchronized I/O file integrity completion. ⌫
If _POSIX_SYNCHRONIZED_IO is not defined by the implementation, then your reopened file descriptor has no unwritten data to be transferred to the storage device, so the fsync() call is effectively a no-op.
If _POSIX_SYNCHRONIZED_IO is defined by the implementation, then fsync() on your reopened file descriptor will ensure that all data written on any file descriptor associated with the file is transferred to the storage device.
The section of the standard on Conformance has information about options and option groups.
The Definitions section has definitions 382..387 which defines aspects of Synchronized I/O and Synchronous I/O (yes, they're different — beware open file descriptors and open file descriptions, too).
The section on Realtime defers to the Definitions section for what synchronized I/O means.
It defines:
3.382 Synchronized Input and Output
A determinism and robustness improvement mechanism to enhance the data input and output mechanisms, so that an application can ensure that the data being manipulated is physically present on secondary mass storage devices.
3.383 Synchronized I/O Completion
The state of an I/O operation that has either been successfully transferred or diagnosed as unsuccessful.
3.384 Synchronized I/O Data Integrity Completion
For read, when the operation has been completed or diagnosed if unsuccessful. The read is complete only when an image of the data has been successfully transferred to the requesting process. If there were any pending write requests affecting the data to be read at the time that the synchronized read operation was requested, these write requests are successfully transferred prior to reading the data.
For write, when the operation has been completed or diagnosed if unsuccessful. The write is complete only when the data specified in the write request is successfully transferred and all file system information required to retrieve the data is successfully transferred.
File attributes that are not necessary for data retrieval (access time, modification time, status change time) need not be successfully transferred prior to returning to the calling process.
3.385 Synchronized I/O File Integrity Completion
Identical to a synchronized I/O data integrity completion with the addition that all file attributes relative to the I/O operation (including access time, modification time, status change time) are successfully transferred prior to returning to the calling process.
3.386 Synchronized I/O Operation
An I/O operation performed on a file that provides the application assurance of the integrity of its data and files.
3.387 Synchronous I/O Operation
An I/O operation that causes the thread requesting the I/O to be blocked from further use of the processor until that I/O operation completes.
Note:
A synchronous I/O operation does not imply synchronized I/O data integrity completion or synchronized I/O file integrity completion.
It is not 100% clear whether the 'all currently queued I/O operations associated with the file indicated by [the] file descriptor' applies across processes.
Conceptually, I think it should, but the wording isn't there in black and white (or black on pale yellow). It certainly should apply to any open file descriptors in the current process referring to the same file. It is not clear that it would apply to the previously opened (and closed) file descriptor in the current process. If it applies across all processes, then it should include the queued I/O from the current process. If it does not apply across all processes, it is possible that it does not.
In view of this and the rationale notes for fsync(), it is by far safest to assume that the fsync() operation has no effect on the queued operations associated with the closed file descriptor. If you want fsync() to be effective, call it before you close the file descriptor.

Writing and reading the same fd without fsync in Linux

Suppose I write a block to a file descriptor without doing fsync and then read the same block from the same descriptor some time later. Is it guaranteed that I will receive the same information?
The program is single-threaded and no other process will access the file at any time.
Yes, it is guaranteed by the operating system.
Even if the modifications have not made it to disk yet, the OS uses its buffer cache to reflect file modifications and guarantees atomicity level for reads and writes, to ALL processes. So not only your process, but any other process, would be able to see the changes.
As to fsync(), it only instructs the operating system to do its best to flush the contents to disk. See also fdatasync().
Also, I suggest you use two file descriptors: one for reading, another for writing.
fsync() synchronizes cache and disk. Since the data is already in the cache, it will be read from there instead of from disk.
When you write to a file descriptor, the data is stored in RAM caches and buffers before being sent to disk, so you can read back the data you just wrote whether or not the descriptor stays open. After a close, the OS still writes the file contents to disk at a time of its choosing unless you flush them yourself. A newly opened FD is served from the same cache, so it also sees the written data; fsync() is what you MUST call if you need assurance that the data is actually on disk, for example to survive a crash.
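The read-after-write behaviour is easy to try. Here is a minimal sketch, assuming a local Linux filesystem and a single process; the function name read_back_without_fsync is made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes a block without calling fsync(), then reads it back through the
 * same descriptor and through a second, freshly opened descriptor.
 * Returns 1 if both reads see the written bytes, 0 otherwise.
 * Creates and removes `path`. */
int read_back_without_fsync(const char *path)
{
    assert(path != NULL);
    static const char block[] = "block written without fsync";
    char a[sizeof block], b[sizeof block];

    int wfd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (wfd == -1)
        return 0;

    /* Same descriptor: write, then read back from offset 0. */
    int ok = write(wfd, block, sizeof block) == (ssize_t)sizeof block
          && pread(wfd, a, sizeof a, 0) == (ssize_t)sizeof a
          && memcmp(a, block, sizeof block) == 0;

    /* Second descriptor opened after the write, still no fsync(). */
    int rfd = open(path, O_RDONLY);
    ok = ok && rfd != -1
            && read(rfd, b, sizeof b) == (ssize_t)sizeof b
            && memcmp(b, block, sizeof block) == 0;

    if (rfd != -1)
        close(rfd);
    close(wfd);
    unlink(path);
    return ok;
}
```

A passing run only shows that reads are served consistently from the cache; it says nothing about what would survive a crash, which is the part fsync() addresses.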
