Background: I'm working with VxWorks for the first time in my life, and I'm having to translate some inherited code so that it will work in VxWorks.
Problem & Solution: The codebase calls sync(). There's no definition for sync() in the VxWorks headers (at least, not the ones I have). I do have fsync(), which requires a file descriptor to work. The function that calls sync() writes to the file through an ofstream object before calling sync... and there's no obvious way to recover a file descriptor from an ofstream object.
After an embarrassingly long time rooting around for options, I discovered the ofstream::flush() function, which ought to work for what I'm trying to do.
However, while rooting around it was pointed out to me that ofstream::flush() and sync()/fsync() are associated with different things (IIUC, the C++ library and the operating system, respectively). That suggests that ofstream::flush() won't achieve quite the same thing that sync() or fsync() will.
Can someone lay out the difference for me? My understanding is still far too fuzzy to be relied upon in future.
Followup: The original code appends a std::endl to the stream and then calls sync(). My understanding is that appending endl has the same effect as calling flush(). If the original code does both, that suggests to me that either:
1. the code is redundant, or
2. flush() and sync()/fsync() are both required because they do similar but not identical things.
Which is it?
If you know the device name being used for the file system, it would be easy to develop a stub function that performs the sync() operation on that device:
fd = open(...);
fsync(fd);
close(fd);
Ideally the code would iterate over all open file descriptors on the file system and perform an ioctl() operation on each, passing the command FIOFLUSH or FIOSYNC.
For example:
status = ioctl(fd, FIOSYNC, 0); /* see definition in ioLib.h */
You can have a look at the iosFdShow() code for an example of walking through all the file descriptors.
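Here is a minimal sketch of such a stub, assuming the volume is mounted on a device such as "/ata0a" (that name, and the function name sync_volume, are only illustrations) and that FIOSYNC is available from ioLib.h:

#include <stdio.h>    /* perror() */
#include <fcntl.h>    /* O_RDWR */
#include <ioLib.h>    /* open(), close(), ioctl(), FIOSYNC */

/* Hypothetical stand-in for the missing sync(): flush one known volume.
   "/ata0a" is just an example device name; use whatever your BSP mounts. */
int sync_volume(const char *devName)
{
    int status;
    int fd = open(devName, O_RDWR, 0);

    if (fd < 0)
    {
        perror("open");
        return -1;
    }

    /* FIOSYNC flushes the volume's buffers to the physical medium
       (FIOFLUSH is a related command; see ioLib.h). */
    status = ioctl(fd, FIOSYNC, 0);
    if (status != 0)
        perror("ioctl(FIOSYNC)");

    close(fd);
    return status;
}

Wrapping this in a function (or macro) named sync() would let the inherited call sites compile unchanged.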
flush() flushes the application-level buffer. That is, it ensures all of the buffered data is handed to the operating system via write().
sync() is a system call that makes the operating system synchronize its file cache with the underlying storage. In other words, you call this if you want to make sure the file is written to disk and not just on some page in memory.
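As a rough sketch of those two layers using plain C stdio (the file name here is only illustrative, and fileno()/fsync() assume a POSIX-style environment):

#include <stdio.h>
#include <unistd.h>   /* fsync() */

int main(void)
{
    FILE *fp = fopen("example.log", "w");   /* "example.log" is just an example */
    if (fp == NULL)
        return 1;

    fprintf(fp, "hello\n");   /* data sits in the stdio (application-level) buffer */

    fflush(fp);               /* application buffer -> operating system file cache */
    fsync(fileno(fp));        /* operating system file cache -> physical storage   */

    fclose(fp);
    return 0;
}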
Is it better to use fopen() and fclose() at the beginning and end of every function that uses that file, or is it better to pass the file pointer to each of those functions? Or even to make the file pointer a member of the struct the file is related to?
I have two projects going on and each one uses one method (because I only thought about passing the file pointer after I began the first one).
When I say better, I mean in terms of speed and/or readability. What's best practice?
Thank you!
It depends. You certainly should document which function is fopen(3)-ing a FILE handle and which function is expected to fclose(3) it.
You might put the FILE* in a struct, but you should have a convention about who should read and/or write and close the file, and when.
Be aware that open files are somewhat expensive resources in a process (= your running program). BTW, it is also operating system and file system specific. And FILE handles are buffered; see fflush(3) & setvbuf(3).
On small systems, the maximal number of fopen-ed file handles could be as small as a few dozen. On a current Linux desktop, a process can have a few thousand open file descriptors (each of which an internal FILE may be keeping, along with its buffers). In any case, it is a rather precious and scarce resource (on Linux, you might limit it with setrlimit(2)).
Be aware that disk I/O is very slow compared to the CPU.
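For instance, here is a minimal sketch of the struct approach with a documented open/close convention (all of the names, paths and messages below are made up for illustration):

#include <stdio.h>

/* The struct owns the FILE*: log_open() is the only place that fopen()s it,
   and log_close() the only place that fclose()s it. */
struct logger {
    FILE *fp;
};

int log_open(struct logger *lg, const char *path)
{
    lg->fp = fopen(path, "a");
    return lg->fp ? 0 : -1;
}

void log_write(struct logger *lg, const char *msg)
{
    fprintf(lg->fp, "%s\n", msg);   /* callers never touch fp directly */
}

void log_close(struct logger *lg)
{
    if (lg->fp) {
        fclose(lg->fp);
        lg->fp = NULL;
    }
}

int main(void)
{
    struct logger lg;
    if (log_open(&lg, "app.log") == 0) {
        log_write(&lg, "started");
        log_close(&lg);
    }
    return 0;
}

This keeps the expensive resource in one well-defined place instead of reopening the file in every function.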
I checked man 2 sync
It shows sync and syncfs
void sync(void);
int syncfs(int fd);
syncfs is easy to understand. An fd is given and the data of that fd is written completely to underlying file systems.
What is it with sync?
sync() causes all buffered modifications to file metadata and data to be written to the underlying file systems.
Is it that all the buffers in the system are written to the fs? Or is it that all the files that are opened by this process are written to the fs? I did not quite understand "buffered modifications to file metadata".
Whenever you issue a write, a send, a write to a file-backed mapping, or something similar, the kernel is not forced to flush that data straight to persistent storage, the underlying network stack, etc. This buffering is done for performance reasons.
sync instructs the kernel to do exactly that: empty all of those buffers. It is system-wide, covering every dirty buffer the kernel holds, not just the files opened by your process.
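As a small sketch of the difference in scope (the file name is only illustrative and the example assumes a POSIX/Linux system): fsync(fd) flushes the cached data and metadata of one file, while sync() asks the kernel to flush everything it has buffered, regardless of which process wrote it.

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_WRONLY | O_CREAT, 0644);   /* example path only */
    if (fd < 0)
        return 1;

    if (write(fd, "payload", 7) < 0) {   /* lands in the kernel's page cache first */
        close(fd);
        return 1;
    }

    fsync(fd);   /* flush this one file's cached data and metadata             */
    sync();      /* flush every buffered modification in the system, all files */

    close(fd);
    return 0;
}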
I used to write files with Node.js in two steps:
1. First check whether the file exists or not, using the fs.exists function;
2. Then use fs.writeFile to write the file directly.
But now I have noticed there are more functions for writing files, like fs.open and fs.close; should I use these to open and close the file while writing?
Besides, I noticed there are also fs.createReadStream and fs.createWriteStream functions. What are the differences between them and fs.writeFile and fs.readFile?
Here's how I would explain the differences:
Low-level:
fs.open and fs.close work on file descriptors. These are low-level functions that map to the open(2) and close(2) system calls. As you'll have a file descriptor, you'd be using these with fs.read or fs.write.
Note, all these are asynchronous and there are synchronous versions as well: fs.openSync, fs.closeSync, fs.readSync, fs.writeSync, where you wouldn't use a callback. The difference between the asynchronous and synchronous versions is that fs.openSync would only return when the operation to open the file has completed, whereas fs.open returns straight away and you'd use the file descriptor in the callback.
These low-level functions give you full control, but will mean a lot more coding.
Mid-level:
fs.createReadStream and fs.createWriteStream create stream objects which you can wire up to events. Examples for these events are 'data' (when a chunk of data has been read, but that chunk is only part of the file) or 'close'. Advantages of this are that you can read a file and process it as data comes in, i.e. you don't have to read the whole file, keep it in memory and then process it. This makes sense when dealing with large files as you can get better performance in processing bits in chunks rather than dealing with the whole file (e.g. a whole 1GB file in memory).
High-level:
fs.readFile and fs.writeFile operate on the whole file. So you'd call fs.readFile, node would read in the whole file and then present you the whole data in your callback. The advantage of this is that you don't need to deal with differently sized chunks (like when using streams). When writing, node would write the whole file. The disadvantage of this approach is that when reading/writing, you'd have to have the whole file in memory. For example, if you are transforming a log file, you may only need lines of data; using streams you can do this without having to wait for the file to be read in completely before starting to write.
There are also fs.readFileSync and fs.writeFileSync, which do not use a callback but wait for the read/write to finish before returning. The advantage of using these is that, for a small file, you may not want to do anything else until the operation completes anyway; for big files, however, it means the CPU will idle away while waiting for the file I/O to finish.
Hope that makes sense and in answer to your question, when using fs.writeFile you don't need fs.open or fs.close.
I would like to ask a fundamental question about when it is useful to use a system call like fsync. I am a beginner and I was always under the impression that write is enough to write to a file, and that samples that use write do actually end up writing to the file.
So what is the purpose of a system call like fsync?
Just to provide some background, I am using the Berkeley DB library version 5.1.19 and there is a lot of talk about the cost of fsync() vs. just writing. That is the reason I am wondering.
Think of it as a layer of buffering.
If you're familiar with the standard C calls like fopen and fprintf, you should already be aware of buffering happening within the C runtime library itself.
The way to flush those buffers is with fflush which ensures that the information is handed from the C runtime library to the OS (or surrounding environment).
However, just because the OS has it, doesn't mean it's on the disk. It could get buffered within the OS as well.
That's what fsync takes care of, ensuring that the stuff in the OS buffers is written physically to the disk.
You may typically see this sort of operation in logging libraries:
fprintf (myFileHandle, "something\n"); // output it
fflush (myFileHandle); // flush to OS
fsync (fileno (myFileHandle)); // flush to disk
fileno is a function which gives you the underlying int file descriptor for a given FILE* file handle, and fsync on the descriptor does the final level of flushing.
Now that is a relatively expensive operation since the disk write is usually considerably slower than in-memory transfers.
As well as logging libraries, there is one other use case where this behaviour is useful. Let me see if I can remember what it was. Yes, that's it. Databases! Just like Berzerkely DB. Where you want to ensure the data is on the disk, a rather useful feature for meeting ACID requirements :-)
I'd like to write a small C program that reads from a file while it is actively being written to. Any ideas?
If you have control over the writing process you should use mmap() with MAP_SHARED in both reader and writer. This way the reader will see the changes done by the writer practically immediately.
Also, note that Linux does not make any snapshot of the data in the file when you open the file, so you should see the changes that are being made in the file even if you just use read() and lseek().
In order to determine whether a file was modified/opened/accessed/etc. in Linux you can use the inotify API (see the inotify manpage). This allows your process to wait until an event you're interested in occurs (as opposed to polling for it regularly). You can also use epoll() or the more traditional select() to accomplish a similar result.
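A rough sketch of the read-until-EOF-then-wait approach using inotify, assuming Linux (the watched path is only an example):

#include <limits.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/inotify.h>

int main(void)
{
    const char *path = "/tmp/growing.log";   /* example path only */
    char buf[4096];

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int in_fd = inotify_init();
    if (in_fd < 0) { perror("inotify_init"); return 1; }
    if (inotify_add_watch(in_fd, path, IN_MODIFY) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    for (;;) {
        ssize_t n;

        /* Drain whatever the writer has appended so far. */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);
        fflush(stdout);

        /* At EOF: block until the file is modified again, then loop. */
        char events[sizeof(struct inotify_event) + NAME_MAX + 1];
        if (read(in_fd, events, sizeof events) < 0) { perror("read"); break; }
    }

    close(in_fd);
    close(fd);
    return 0;
}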
I think that tail -f is exactly what you want, isn't it? Take a look at the source code:
http://www.gnu.org/s/coreutils/
Or this one (not sure if updated): http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c