using fwrite as an atomic process on Linux - c

I am developing C code on Linux. I use fwrite to write some data to files. The program will run in an environment where power cuts occur often (at least once a day). Therefore, I want to ensure that the file is not updated if a power cut occurs while fwrite is writing data; the file should change only once fwrite has finished its job. How can I use fwrite so that it affects the file only when the write has completed?
EDIT: I use fopen with wb to discard the previous info in the file and write a new file e.g.
FILE *rtng_p;
rtng_p = fopen("/etc/routing_table", "wb");
fwrite(&user_list, sizeof(struct routing), 40, rtng_p);
and the data is very small, only some bytes long.

First write the file to a temporary path on the same filesystem, like /etc/routing_table.tmp. Then just rename the copy on top of the original file. Renames are guaranteed atomic: other processes see either the old file or the new one, never a partially written one.
So, the sequence of calls would be fopen, fwrite, fclose, rename.
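The sequence above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the name replace_file and its parameters are hypothetical, and it assumes tmp_path is on the same filesystem as path.

```c
#include <stdio.h>

/* Hypothetical helper: write `len` bytes to `tmp_path`, then rename it
 * over `path`. Assumes both paths are on the same filesystem.
 * Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const char *tmp_path,
                 const void *data, size_t len)
{
    FILE *f = fopen(tmp_path, "wb");
    if (!f)
        return -1;

    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    /* fclose flushes the stdio buffer, so its result must be checked. */
    if (fclose(f) != 0) {
        remove(tmp_path);
        return -1;
    }
    /* Only now does the new content become visible under the real name. */
    if (rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

For the question's data this would be called as replace_file("/etc/routing_table", "/etc/routing_table.tmp", &user_list, sizeof user_list), assuming user_list is the array being saved.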

In addition to the sequence given in David Schwartz's answer, you could perhaps use advisory locks with e.g. the flock(2) syscall (or maybe lockf(3), i.e. fcntl(2) with F_SETLK ....)
That would mean adding, just after
FILE * fil = fopen("/etc/routing_table.tmp", "wb");
the lines
if (!fil)
{ perror("/etc/routing_table.tmp"); exit(EXIT_FAILURE); };
if (flock(fileno(fil), LOCK_EX))
{ perror("flock LOCK_EX"); exit(EXIT_FAILURE); };
and at the end, you would
if (fflush(fil)) /* flush the file before unlocking it!!*/
{ perror("fflush"); exit(EXIT_FAILURE); };
if (flock(fileno(fil), LOCK_UN))
{ perror("flock LOCK_UN"); exit(EXIT_FAILURE); };
if (fclose (fil))
{ perror("fclose"); exit(EXIT_FAILURE); };
if (rename("/etc/routing_table.tmp", "/etc/routing_table"))
{ perror("rename"); exit(EXIT_FAILURE); };
Using such advisory locking would ensure that even if two processes of your program are running, only one at a time would write the file.
But it is probably overkill.
BTW, you seem to write binary data in /etc/. I believe that is against the habits or the conventions (see Linux Filesystem Hierarchy, or Linux Standard Base). I expect files under /etc to be textual. Perhaps you want your file under /var/lib?
See also Advanced Linux Programming book online.

There has been a large argument going on in the UNIX/Linux community about whether the open/write/close/rename pattern (as described in David Schwartz's answer) is actually guaranteed to be atomic. Note this conversation is about write and not fwrite!
The primary author of the EXT4 filesystem did not believe that it should be guaranteed according to POSIX and early versions of the filesystem did not treat it as atomic. Eventually he capitulated and made that set of operations atomic as the default behavior for EXT4. The claim was made, however, that user programs should actually be doing open/write/fsync/close/rename.
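Other filesystems may not guarantee atomicity without the fsync, and if EXT4 is mounted with noauto_da_alloc then that guarantee is lost there as well. So if you want to be really safe you should call fsync after the last write and before close and rename (fsync needs the descriptor to still be open). I haven't tried this with fwrite; it should work if you call fflush first and then fsync(fileno(f)), since fflush only moves the data from the stdio buffer to the kernel.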
See the auto_da_alloc section at https://www.kernel.org/doc/Documentation/filesystems/ext4.txt for more information. Also see an article written by the primary author of EXT4 here: http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
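For stdio, the open/write/fsync/close/rename pattern could look roughly like the sketch below. The helper name replace_file_synced is hypothetical, and the sketch assumes a POSIX system where fileno and fsync are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: like fopen/fwrite/fclose/rename, but pushes the
 * data to the storage device before the rename.
 * Returns 0 on success, -1 on failure. */
int replace_file_synced(const char *path, const char *tmp_path,
                        const void *data, size_t len)
{
    FILE *f = fopen(tmp_path, "wb");
    if (!f)
        return -1;

    int ok = fwrite(data, 1, len, f) == len
          && fflush(f) == 0            /* stdio buffer -> kernel */
          && fsync(fileno(f)) == 0;    /* kernel cache -> storage device */

    if (fclose(f) != 0)
        ok = 0;
    if (ok && rename(tmp_path, path) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp_path);              /* harmless if it no longer exists */
        return -1;
    }
    return 0;
}
```

This makes the file contents durable before the new name appears; making the rename itself durable additionally requires an fsync on the containing directory.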

Related

is there an official document that mark read/write function as thread-safe functions?

The man pages of read/write don't mention anything about their thread-safety.
According to this link,
I understood that these functions are thread-safe, but that comment does not link to an official document.
On the other hand, according to this link, which says:
The read() function shall attempt to read nbyte bytes
from the file associated with the open file descriptor,
fildes, into the buffer pointed to by buf.
The behavior of multiple concurrent reads on the same pipe, FIFO, or
terminal device is unspecified.
I concluded that the read function is not thread-safe.
I am so confused now. Please send me a link to an official document about the thread-safety of these functions.
I tested these functions with a pipe, but there wasn't any problem (of course, I know I can't establish any certain result by testing a few examples).
thanks in advance:)
The thread safe versions of read and write are pread and pwrite:
pread(2)
The pread() and pwrite() system calls are especially useful in
multithreaded applications. They allow multiple threads to perform
I/O on the same file descriptor without being affected by changes to
the file offset by other threads.
When two threads call write() at the same time, the order in which the calls complete is not specified, so without synchronization the resulting behaviour is unspecified.
read() and write() are not strictly thread-safe, and there is no documentation that says they are, as the location where the data is read from or written to can be modified by another thread.
Per the POSIX read documentation (note the bolded parts):
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
That's the part you noticed - but that does not cover all possible types of file descriptors, such as regular files. It only applies to "pipe[s], FIFO[s]" and "terminal device[s]". This part covers almost everything else (weird things like "files" in /proc that are generated on the fly by the kernel are, well, weird and highly implementation-specific):
On files that support seeking (for example, a regular file), the read() shall start at a position in the file given by the file offset associated with fildes. The file offset shall be incremented by the number of bytes actually read.
Since the "file offset associated with fildes" is subject to modification from other threads in the process, the following code is not guaranteed to return the same results even given the exact same file contents and inputs for fd, offset, buffer, and bytes:
lseek( fd, offset, SEEK_SET );
read( fd, buffer, bytes );
Since both read() and write() depend upon a state (the current file offset) that can be modified at any moment by another thread, they are not thread-safe.
On some embedded file systems, or really old desktop systems that weren't designed to facilitate multitasking support (e.g. MS-DOS 3.0), an attempt to perform an fread() on one file while an fread() is being performed on another file may result in arbitrary system corruption.
Any modern operating system and language runtime will guarantee that such corruption won't occur as a result of operations performed on unrelated files, or when independent file descriptors are used to access the same file in ways that do not modify it. Functions like fread() and fwrite() will be thread-safe when used in that fashion.
The act of reading data from a disk file does not modify it, but reading data from many kinds of stream will modify them by removing data. If two threads both perform actions that modify the same stream, such actions may interfere with each other in unspecified ways even if such modifications are performed by fread() operations.

how to (f)sync a directory under linux in c

I have a C application under Linux. I'm renaming some files with rename(...).
How can I ensure that the rename is written persistently to the underlying disk?
With a file I can do something like:
FILE * f = fopen("foo","w");
...
fflush(f);
fsync(fileno(f));
fclose(f);
How can I fsync (or similar) a directory after a rename() in c?
This is how you can do what you want:
#include <fcntl.h>
int fd = open("/path/to/dir", O_RDONLY);
fsync(fd);
Don't forget to close the fd file descriptor when no longer needed of course.
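With error handling added, the same idea could be wrapped as below. This is a sketch only; fsync_dir is a hypothetical name, and O_DIRECTORY is used so that the call fails if the path is not a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush the entries of the directory `dir_path`
 * to disk. Returns 0 on success, -1 on failure. */
int fsync_dir(const char *dir_path)
{
    int fd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

After rename("foo.tmp", "foo") in the current directory, one would call fsync_dir(".") so that the new directory entry itself reaches the disk.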
Contrary to some misconceptions, the atomicity of rename() does not guarantee the file will be persisted to disk. The atomicity guarantee only ensures that the metadata in the file system buffers is in a consistent state but not that it has been persisted to disk.
rename() is atomic (on linux), so I don't think you need to worry about that
Atomicity is typically guaranteed in operations involving filename handling ; for example, for rename, “specification requires that the action of the function be atomic” – that is, when renaming a file from the old name to the new one, at no circumstances should you ever see the two files at the same time.
a power outage in the middle of a rename() operation shall not leave the filesystem in a “weird” state, with the filename being unreachable because its metadata has been corrupted. (ie. either the operation is lost, or the operation is committed.)
Source
So, I think you should only be worried about error value.
If you really want to be safe, fsync() also flushes metadata (on Linux), so you could fsync both the file and its directory to be sure they are present on the disk.
According to the manual, when the function returns, the rename has either taken effect (return value 0) or an error occurred (return value -1, with errno set to indicate what went wrong).
If you want the system to flush the pending modifications after the rename, you can call syncfs(), which synchronizes the whole filesystem containing the file referred to by fd (not only that file):
int fd = open(new_name, O_RDONLY);
syncfs(fd);

Equivalent of fgetc with Unix file descriptors

The fgetc(3) function takes a FILE * as its input stream. Must I reimplement character-at-a-time input with read(2), or is there a <unistd.h>-style equivalent taking an integer file descriptor instead?
No, there isn't such a thing, and please never do read(fd, &ch, sizeof(char)) (explanations below).
The function read(2) is usually implemented as a system call to the operating system kernel. Although the internal (and funky) details of such a thing shall not be discussed here, the overall idea is that system calls are (usually) not cheap.
It would be inefficient for both the userspace application and the kernel to do a system call just to get a single character from a file descriptor.
For instance, fgetc(3) usually ends up doing some buffering inside the structure of the FILE object. This means that the internal read(2) from fgetc(3) wouldn't just read a single character, but rather it'll try to get more for the sake of efficiency.
Anyway, it's not usually a good idea to mess with such low-level stuff. You can get all the benefits of buffering (and of FILEs overall) by using fdopen(3) to create a FILE object from a file descriptor, as your question appears to imply that you have at hand just a raw file descriptor at the moment.
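As an illustration of the fdopen(3) route, here is a minimal sketch that reads a descriptor one character at a time through a buffered stream. The function name count_bytes_fd is hypothetical; the point is only that fgetc works once the descriptor is wrapped in a FILE.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: count the bytes readable from `fd` using fgetc.
 * Takes ownership of `fd`: fclose() also closes the descriptor.
 * Returns the count, or -1 on error. */
long count_bytes_fd(int fd)
{
    FILE *fp = fdopen(fd, "r");
    if (!fp)
        return -1;

    long n = 0;
    int ch;                     /* int, not char, so EOF is representable */
    while ((ch = fgetc(fp)) != EOF)
        n++;

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : n;
}
```

Each fgetc call is served from the stream's buffer, so the number of underlying read(2) calls stays small even though the loop handles one character at a time.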
If you want to, you can open a file using open() -
int fh = open("abc.txt", O_RDONLY); // a third (permissions) argument is only needed together with O_CREAT
and then you can use fh in read() calls.

Are fopen/fread/fgets PID-safe in C?

Various users are browsing through a website 100% programmed in C (CGI). Each webpage uses fopen/fgets/fread to read common data (like navigation bars) from files. Would the calls to fopen/fgets/fread interfere with each other if various people are browsing the same page? If so, how can this be solved in C? (This is a Linux server, compiling is done with gcc and this is for a CGI website programmed in C.)
Example:
FILE *DATAFILE = fopen(PATH, "r");
if ( DATAFILE != NULL )
{
while ( fgets( LINE, BUFFER, DATAFILE ) )
{
/* do something */
}
}
On Linux, it's perfectly safe for multiple processes to simultaneously read from a file.
It's perfectly safe to read from multiple processes (in any modern system).
A call to fopen() returns a pointer to a FILE structure, which has its own members, like flags, current position, etc.
You should only care if somebody changes the file (e.g: shrink), while others are reading it. But I imagine this isn't your case.
Concurrent reads from a file (whether from multiple threads, assuming separately opened descriptors, or from multiple processes) are well-defined and permitted on all modern major operating systems. It is only concurrent writes to a file which are ill-defined and which you should not attempt without locking (unless you are appending to the file, like a log, and the OS makes such concurrent writes well-defined).
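For the append-only log case mentioned above, POSIX O_APPEND positions each write at the end of the file as part of the write itself. A minimal sketch, with a hypothetical helper name append_record, assuming a local filesystem (the open(2) manual page warns that O_APPEND is not reliable over NFS):

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append one record to `path` with a single write.
 * Returns 0 on success, -1 on failure. */
int append_record(const char *path, const char *line, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* With O_APPEND the kernel moves to end-of-file before each write. */
    ssize_t n = write(fd, line, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Keeping each record in a single write call is what keeps records from different processes from landing on top of each other.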

C fopen vs open

Is there any reason (other than syntactic ones) that you'd want to use
FILE *fdopen(int fd, const char *mode);
or
FILE *fopen(const char *path, const char *mode);
instead of
int open(const char *pathname, int flags, mode_t mode);
when using C in a Linux environment?
First, there is no particularly good reason to use fdopen if fopen is an option and open is the other possible choice. You shouldn't have used open to open the file in the first place if you want a FILE *. So including fdopen in that list is incorrect and confusing because it isn't very much like the others. I will now proceed to ignore it because the important distinction here is between a C standard FILE * and an OS-specific file descriptor.
There are four main reasons to use fopen instead of open.
fopen provides you with buffering IO that may turn out to be a lot faster than what you're doing with open.
fopen does line ending translation if the file is not opened in binary mode, which can be very helpful if your program is ever ported to a non-Unix environment (though the world appears to be converging on LF-only (except IETF text-based networking protocols like SMTP and HTTP and such)).
A FILE * gives you the ability to use fscanf and other stdio functions.
Your code may someday need to be ported to some other platform that only supports ANSI C and does not support the open function.
In my opinion the line ending translation more often gets in your way than helps you, and the parsing of fscanf is so weak that you inevitably end up tossing it out in favor of something more useful.
And most platforms that support C have an open function.
That leaves the buffering question. In places where you are mainly reading or writing a file sequentially, the buffering support is really helpful and a big speed improvement. But it can lead to some interesting problems in which data does not end up in the file when you expect it to be there. You have to remember to fclose or fflush at the appropriate times.
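The "data does not end up in the file when you expect it" effect can be observed directly. The sketch below uses hypothetical helper names: it writes five bytes through a fully buffered stream and records the size the kernel reports for the file before and after fflush.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of `path` as the kernel sees it, or -1. */
long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical demo: write 5 bytes via stdio and record the file size
 * before and after fflush(). Returns 0 on success, -1 on failure. */
int buffering_demo(const char *path, long *before, long *after)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;

    /* Request full buffering explicitly so the write stays in memory. */
    setvbuf(f, NULL, _IOFBF, BUFSIZ);

    if (fwrite("hello", 1, 5, f) != 5) {
        fclose(f);
        return -1;
    }
    *before = file_size(path);   /* data is still in the stdio buffer */
    fflush(f);
    *after = file_size(path);    /* data has been handed to the kernel */

    return fclose(f) == 0 ? 0 : -1;
}
```

On a typical glibc system the size before the flush is 0 and the size after it is 5, which is why a crash between fwrite and fflush or fclose can lose data that the program believes it has written.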
If you're doing seeks (aka fsetpos or fseek the second of which is slightly trickier to use in a standards compliant way), the usefulness of buffering quickly goes down.
Of course, my bias is that I tend to work with sockets a whole lot, and there the fact that you really want to be doing non-blocking IO (which FILE * totally fails to support in any reasonable way) with no buffering at all and often have complex parsing requirements really color my perceptions.
open() is a low-level os call. fdopen() converts an os-level file descriptor to the higher-level FILE-abstraction of the C language. fopen() calls open() in the background and gives you a FILE-pointer directly.
There are several advantages to using FILE objects rather than raw file descriptors, including greater ease of use and technical advantages such as built-in buffering. The buffering in particular generally results in a sizeable performance advantage.
fopen vs open in C
1) fopen is a library function while open is a system call.
2) fopen provides buffered IO, which is faster compared to open, which is unbuffered.
3) fopen is portable while open is not portable (open is environment specific).
4) fopen returns a pointer to a FILE structure(FILE *); open returns an integer that identifies the file.
5) A FILE * gives you the ability to use fscanf and other stdio functions.
Unless you're part of the 0.1% of applications where using open is an actual performance benefit, there really is no good reason not to use fopen. As far as fdopen is concerned, if you aren't playing with file descriptors, you don't need that call.
Stick with fopen and its family of methods (fwrite, fread, fprintf, et al) and you'll be very satisfied. Just as importantly, other programmers will be satisfied with your code.
If you have a FILE *, you can use functions like fscanf, fprintf and fgets etc. If you have just the file descriptor, you have limited (but likely faster) input and output routines read, write etc.
open() is a system call and specific to Unix-based systems and it returns a file descriptor. You can write to a file descriptor using write() which is another system call.
fopen() is an ANSI C function call which returns a file pointer and it is portable to other OSes. We can write to a file pointer using fprintf.
In Unix:
You can get a file pointer from the file descriptor using:
fP = fdopen(fD, "a");
You can get a file descriptor from the file pointer using:
fD = fileno (fP);
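Using open, read, and write means you have to worry about signal interruptions.
If a call is interrupted by a signal handler, the function returns -1
and sets errno to EINTR.
For close, the retry loop would look like this, but note that the Linux close(2) manual page advises against retrying close, because on Linux the descriptor is released even when close fails with EINTR:
while (retval = close(fd), retval == -1 && errno == EINTR) ;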
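The EINTR concern mentioned above also applies to write, where a retry loop is the common pattern and also has to cope with short writes. A minimal sketch, with the hypothetical name write_all:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes to `fd`, retrying after
 * signal interruptions and short writes.
 * Returns `len` on success, -1 on error. */
ssize_t write_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing: retry */
            return -1;
        }
        done += (size_t)n;       /* short write: continue with the rest */
    }
    return (ssize_t)done;
}
```

The stdio functions handle this looping internally, which is one of the conveniences given up when moving to raw descriptors.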
I changed to open() from fopen() for my application, because fopen was causing double reads every time I ran fopen fgetc . Double reads were disruptive of what I was trying to accomplish. open() just seems to do what you ask of it.
Each of the fopen() family of functions eventually calls open(). open() is a system call, while fopen() and its relatives are library wrapper functions provided for ease of use.
Depends also on what flags are required to open. With respect to usage for writing and reading (and portability) f* should be used, as argued above.
But if you want to specify more than the standard flags (like the read/write and append flags), you will have to use a platform-specific API (like POSIX open) or a library that abstracts these details. The C standard does not have any such flags.
For example, you might want to open a file only if it exists. If you don't specify the create flag, the file must exist. If you add exclusive to create, it will only create the file if it does not exist. There are many more.
For example, on Linux systems there is an LED interface exposed through sysfs. It exposes the brightness of the LED through a file, which you write or read as a number in string form, ranging from 0 to 255. Of course you don't want to create that file; you only want to write to it if it exists. The cool thing now: use fdopen to read/write this file using the standard calls.
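A minimal sketch of that combination: open without the create flag, then hand the descriptor to fdopen. The helper name open_existing_for_write is hypothetical, and any sysfs path passed to it would depend on the hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for writing only if it already
 * exists. Returns a stream, or NULL on failure. */
FILE *open_existing_for_write(const char *path)
{
    /* No O_CREAT here, so a missing file is an error, not a new file. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return NULL;

    FILE *fp = fdopen(fd, "w");  /* fdopen with "w" does not truncate */
    if (!fp)
        close(fd);
    return fp;
}
```

The returned stream can then be used with fprintf and closed with fclose, which also closes the underlying descriptor.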
Opening a file using fopen
Before we can read information from (or write it to) a file on disk, we must open the file. To open the file we call the function fopen, which, conceptually, does the following:
1. First, it locates the file to be opened on the disk.
2. Then it sets up a place in memory called a buffer, into which the file's contents are loaded (typical implementations fill it lazily on the first read rather than inside fopen itself).
3. It sets up a character pointer that points to the first character of the buffer.
This is how the fopen function behaves.
In some cases this buffering can add delay or time out, so when comparing fopen (high-level I/O) with the open system call (low-level I/O), open can be the faster and more appropriate choice.

Resources