fopen multiple times in append mode - c

I have multiple threads attempting to log to the same file.
Each thread has a FILE * that points to the file. The FILE *s were opened in append ('a') mode and are using line buffering.
Opening multiple FILE * to the same file within the same process is implementation defined according to ANSI C.
Would anyone happen to know the implementation-specific behaviour for MacOS, FreeBSD and Linux, specifically whether each FILE * will have its own line buffer, and whether there's any chance of lost or interleaved writes?

MacOS, FreeBSD and Linux are all POSIX systems. As such, each FILE * will have its own user-space buffer (or none if you disable buffering), and when that buffer is flushed its contents are written to the underlying file descriptor. For a descriptor opened in append mode, POSIX requires the file offset to be set to the end of the file before each write, with no intervening file modification between setting the offset and writing, so no data will be lost. As long as a piece of data isn't split across multiple flushes, it won't interleave with data from the other streams either.
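To see the behaviour described above without involving threads, here is a minimal sketch that opens two independent streams on the same path in append mode, makes both line buffered, and alternates writes between them. The helper names append_via_two_streams and count_intact_lines are made up for illustration, and the sketch assumes a POSIX system where "a" maps to an append-mode descriptor.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append n lines through each of two independent
 * streams opened on the same path, alternating between them.
 * Returns 0 on success, -1 on failure. */
int append_via_two_streams(const char *path, int n)
{
    FILE *a = fopen(path, "a");
    FILE *b = fopen(path, "a");
    if (a == NULL || b == NULL) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -1;
    }
    /* Each stream gets its own line buffer; a complete line is handed
     * to the underlying descriptor when the '\n' is written. */
    setvbuf(a, NULL, _IOLBF, BUFSIZ);
    setvbuf(b, NULL, _IOLBF, BUFSIZ);

    int rc = 0;
    for (int i = 0; i < n; i++) {
        if (fprintf(a, "stream A line %d\n", i) < 0) rc = -1;
        if (fprintf(b, "stream B line %d\n", i) < 0) rc = -1;
    }
    if (fclose(a) != 0) rc = -1;
    if (fclose(b) != 0) rc = -1;
    return rc;
}

/* Hypothetical helper: count lines that start with "stream ", to check
 * that no line was overwritten or cut in half. Returns -1 on error. */
int count_intact_lines(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;

    char line[128];
    int count = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "stream ", 7) == 0) count++;
    }
    fclose(fp);
    return count;
}
```

With n lines per stream, count_intact_lines should report 2 * n afterwards. In a real multi-threaded logger the same reasoning applies only as long as each log record fits in one flush.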

Related

Is there an official document that marks the read/write functions as thread-safe?

The man pages of read/write don't mention anything about their thread safety.
According to this link,
I understood that these functions are thread-safe, but that comment does not link to an official document.
On the other hand, according to this link, which says:
The read() function shall attempt to read nbyte bytes
from the file associated with the open file descriptor,
fildes, into the buffer pointed to by buf.
The behavior of multiple concurrent reads on the same pipe, FIFO, or
terminal device is unspecified.
I concluded that the read function is not thread-safe.
I am so confused now. Please send me a link to an official document about the thread safety of these functions.
I tested these functions with a pipe and there wasn't any problem (of course, I know I can't draw any firm conclusion from testing a few examples).
Thanks in advance :)
The thread safe versions of read and write are pread and pwrite:
pread(2)
The pread() and pwrite() system calls are especially useful in
multithreaded applications. They allow multiple threads to perform
I/O on the same file descriptor without being affected by changes to
the file offset by other threads.
When two threads call write() at the same time, the order in which the calls complete is not specified, so without synchronization the resulting behaviour is unspecified.
read() and write() are not strictly thread-safe, and there is no documentation that says they are, as the location where the data is read from or written to can be modified by another thread.
Per the POSIX read documentation (note the bolded parts):
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
That's the part you noticed - but that does not cover all possible types of file descriptors, such as regular files. It only applies to "pipe[s], FIFO[s]" and "terminal device[s]". This part covers almost everything else (weird things like "files" in /proc that are generated on the fly by the kernel are, well, weird and highly implementation-specific):
On files that support seeking (for example, a regular file), the read() shall start at a position in the file given by the file offset associated with fildes. The file offset shall be incremented by the number of bytes actually read.
Since the "file offset associated with fildes" is subject to modification from other threads in the process, the following code is not guaranteed to return the same results even given the exact same file contents and inputs for fd, offset, buffer, and bytes:
lseek( fd, offset, SEEK_SET );
read( fd, buffer, bytes );
Since both read() and write() depend upon a state (the current file offset) that can be modified at any moment by another thread, they are not thread-safe.
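By contrast, pread() takes the offset as an argument and leaves the shared file offset alone, so the lseek()/read() race above does not arise. A minimal sketch, assuming a POSIX system and a seekable file; read_at is a made-up helper name, not a library function.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open() flags for callers */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes starting at byte `off`.
 * The descriptor's own file offset is neither used nor changed, so other
 * threads calling lseek()/read() on the same fd cannot affect the result.
 * Returns the number of bytes read (fewer than n at end of file), or -1. */
ssize_t read_at(int fd, void *buf, size_t n, off_t off)
{
    char *p = buf;
    size_t done = 0;

    while (done < n) {
        ssize_t r = pread(fd, p + done, n - done, off + (off_t)done);
        if (r < 0) return -1;   /* error; errno was set by pread() */
        if (r == 0) break;      /* end of file */
        done += (size_t)r;
    }
    return (ssize_t)done;
}
```

Note that this only removes the dependency on the shared offset. Concurrent writers to the same region still need their own synchronization.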
On some embedded file systems, or really old desktop systems that weren't designed to facilitate multitasking support (e.g. MS-DOS 3.0), an attempt to perform an fread() on one file while an fread() is being performed on another file may result in arbitrary system corruption.
Any modern operating system and language runtime will guarantee that such corruption won't occur as a result of operations performed on unrelated files, or when independent file descriptors are used to access the same file in ways that do not modify it. Functions like fread() and fwrite() will be thread-safe when used in that fashion.
The act of reading data from a disk file does not modify it, but reading data from many kinds of stream will modify them by removing data. If two threads both perform actions that modify the same stream, such actions may interfere with each other in unspecified ways even if such modifications are performed by fread() operations.

Is there a guaranteed and safe way to truncate a file from ANSI C FILE pointer?

I know ANSI C defines fopen, fwrite, fread and fclose to modify a file's content. However, when it comes to truncating a file, we have to turn to OS-specific functions, e.g. truncate() on Linux or _chsize_s() on Windows. But before we can call those OS-specific functions, we have to obtain the file handle from the FILE pointer by calling fileno, which is also a non-ANSI-C function.
My question is: Is it reliable to continue using the FILE * after truncating the file? I mean, the ANSI C FILE layer has its own buffer and does not know the file has been truncated underneath it. If the buffered bytes lie beyond the truncation point, will the buffered content be flushed to the file on fclose()?
If there is no guarantee, what is the best practice for using file I/O functions together with a truncate operation when writing a Windows-Linux portable program?
Similar question: When querying file size from a file-handle returned by fileno , is it the accurate size when I later call fclose() -- without further fwrite()?
[EDIT 2012-12-11]
According to Joshua's suggestion. I conclude that current possible best practice is: Set the stream to unbuffered mode by calling setbuf(stream, NULL); , then truncate() or _chsize_s() can work peacefully with the stream.
Anyhow, no official document seems to explicitly confirm this behavior, whether Microsoft CRT or GNU glibc.
The POSIX way....
ftruncate() is what you're looking for, and it's been in POSIX base specifications since 2001, so it should be in every modern POSIX-compatible system by now.
Note that ftruncate() operates on a POSIX file descriptor (despite its potentially misleading name), not a STDIO stream FILE handle. Note also that mixing operations on the STDIO stream and on the underlying OS calls which operate on the file descriptor for the open stream can confuse the internal runtime state of the STDIO library.
So, to use ftruncate() safely with STDIO it may be necessary to first flush any STDIO buffers (with fflush()) if your program may have already written to the stream in question. This will avoid STDIO trying to flush the otherwise unwritten buffer to the file after the truncation has been done.
You can then use fileno() on the STDIO stream's FILE handle to find the underlying file descriptor for the open STDIO stream, and you would then use that file descriptor with ftruncate(). You might consider putting the call to fileno() right in the parameter list for the ftruncate() call so that you don't keep the file descriptor around and accidentally use it yet other ways which might further confuse the internal state of STDIO. Perhaps like this (say to truncate a file to the current STDIO stream offset):
/*
 * NOTE: fflush() is not needed here if there have been no calls to fseek() since
 * the last fwrite(), assuming it extended the length of the stream --
 * ftello() will account for any unwritten buffers
 */
if (ftruncate(fileno(stdout), ftello(stdout)) == -1) {
        fprintf(stderr, "%s: ftruncate(stdout) failed: %s\n", argv[0], strerror(errno));
        exit(1);
}
/* fseek() is not necessary here since we truncated at the current offset */
Note also that the POSIX definition of ftruncate() says "The value of the seek pointer shall not be modified by a call to ftruncate()", so you may also need to use fseek() to set the STDIO layer (and thus indirectly the file descriptor) either to the new end of the file, or perhaps back to the beginning of the file, or somewhere still within the boundaries of the file, as desired. (Note that the fseek() should not be necessary if the truncation point is found using ftello().)
You should not have to make the STDIO stream unbuffered if you follow the procedure above, though of course doing so could be an alternative to using fflush() (but not fseek()).
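For an arbitrary truncation length (rather than the current offset), the flush, truncate, reposition sequence described above might be wrapped up roughly like this. It is only a sketch under the assumption of a POSIX system and a seekable stream open for writing; truncate_stream is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: cut the file behind fp down (or up) to `length`
 * bytes and leave the stream positioned at the new end of the file.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int truncate_stream(FILE *fp, off_t length)
{
    /* 1. Push anything still sitting in the stdio buffer out to the OS,
     *    so it cannot be written after the truncation. */
    if (fflush(fp) != 0) return -1;

    /* 2. Truncate through the descriptor; fileno() is used inline so the
     *    descriptor is not kept around for other uses. */
    if (ftruncate(fileno(fp), length) == -1) return -1;

    /* 3. ftruncate() does not move the file offset, so reposition the
     *    stream explicitly. */
    return fseeko(fp, length, SEEK_SET);
}
```

After truncate_stream(fp, 4) on a stream that had "0123456789" written to it, a following fputs("AB", fp) would leave the file containing "0123AB".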
Without POSIX....
If you need to stick to strict ISO Standard C, say C99, then you have no portable way to truncate a file to a given length other than zero (0) length. The latest draft of C11 that I have says this in Section 7.21.3 (paragraph 2):
Binary files are not truncated, except as defined in 7.21.5.3. Whether a write on a text stream causes the associated file to be truncated beyond that point is implementation-defined.
(and 7.21.5.3 describes the flags to fopen() which allow a file to be truncated to a length of zero)
The caveat about text files is there because on silly systems that have both text and binary files (as opposed to just plain POSIX-style content agnostic files) then it is often possible to write a value to the file which will be stored in the file at the position written and which will be treated as an EOF indicator when the file is next read.
Other types of systems may have different underlying file I/O interfaces that are not compatible with POSIX while still providing a compatible ISO C STDIO library. In theory if such a system offers something similar to fileno() and ftruncate() then a similar procedure could be used with them as well, provided that one took the same care to avoid confusing the internal runtime state of the STDIO library.
With regard to querying file size....
You also asked whether the file size found by querying the file descriptor returned by fileno() would be an accurate representation of the file size after a successful call to fclose(), even without any further calls to fwrite().
The answer is: Don't do that!
As I mentioned above, the POSIX file descriptor for a file opened as a STDIO stream must be used very carefully if you don't want to confuse the internal runtime state of the STDIO library. We can add here that it is important not to confuse yourself with it either.
The most correct way to find the current size of a file opened as a STDIO stream is to seek to the end of it and then ask where the stream pointer is by using only STDIO functions.
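A minimal sketch of that seek-to-the-end approach, using only STDIO calls and restoring the original position afterwards. stream_size is a made-up name, and the sketch assumes a seekable stream opened in binary mode whose size fits in a long (strict ISO C does not promise that SEEK_END is meaningful on every binary stream, although POSIX systems support it).

```c
#include <stdio.h>

/* Hypothetical helper: return the current size of the file behind fp in
 * bytes, or -1L on error. The stream position is restored before
 * returning. Unflushed buffered output is accounted for because fseek()
 * and ftell() go through the STDIO layer. */
long stream_size(FILE *fp)
{
    long saved = ftell(fp);                 /* remember where we were */
    if (saved == -1L) return -1L;

    if (fseek(fp, 0L, SEEK_END) != 0) return -1L;
    long size = ftell(fp);                  /* offset of end == size */

    if (fseek(fp, saved, SEEK_SET) != 0) return -1L;
    return size;
}
```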
Isn't an unbuffered write of zero bytes supposed to truncate the file at that point?
See this question for how to set unbuffered: Unbuffered I/O in ANSI C

What is Opening a file in C?

In C when we open a file what happens?? As I know that the contents of the file is not loaded in the memory when we open a file. It just sets the file descriptor ? So what is this file descriptor then?? And if the contents of the file is not loaded in the memory then how a file is opened?
Typically, if you're opening a file with fopen or (on a POSIX system) open, the function, if successful, will "open the file" - it merely gives you a value (a FILE * or an int) to use in future calls to a read function.
Under the hood, the operating system might read some or all of the file in, it might not. You have to call some function to request data to be read anyways, and if it hasn't done it by the time you call fread/fgets/read/etc... then it will at that point.
A "file descriptor" typically refers to the integer returned by open in POSIX systems. It is used to identify an open file. If you get a value 3, somewhere, the operating system is keeping track that 3 refers to /home/user/dir/file.txt, or whatever. It's a short little value to indicate to the OS which file to read from. When you call open, and open say, foo.txt, the OS says, "ok, file open, calling it 3 from here on".
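To make that concrete, here is a small sketch for a POSIX system: open() hands back a small integer, and nothing lands in your buffer until you ask for it with read(). The helper name read_first_bytes is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, report which descriptor the OS chose,
 * then read up to n bytes into buf. Returns the number of bytes read,
 * or -1 if the file could not be opened or read. */
ssize_t read_first_bytes(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    /* fd is just a number the OS uses to look up its own bookkeeping. */
    printf("descriptor %d now refers to %s\n", fd, path);

    ssize_t got = read(fd, buf, n);   /* data is copied to buf only here */
    close(fd);
    return got;
}
```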
This question is not entirely related to the programming language. Although the library does have an impact on what happens when opening a file (using open or fopen, for example), the main behavior comes from the operating system.
Linux, and I assume other OSs, perform read-ahead in most cases. This means that the file is actually read from the physical storage even before you call read for the file. This is done as an optimization, reducing the time for the read when the file is actually read by the user. This behavior can be controlled partially by the programmer, using specific flags for the open functions. For example, the Win32 API CreateFile can specify FILE_FLAG_RANDOM_ACCESS or FILE_FLAG_SEQUENTIAL_SCAN to specify random access (in which case the file is not read ahead) or sequential access (in which case the OS will perform quite aggressive read-ahead), respectively. Other OS APIs might give more or less control.
For the basic POSIX API of open, read and write, which uses a file descriptor, the file descriptor is a simple integer that is passed to the OS and identifies the file. In the OS itself this is most often translated to some structure that contains all the needed information for the file (name, path, seek offsets, size, read and write buffers, etc.). The OS will open the file, meaning it finds the specific file system entry (an inode under Linux) that corresponds to the path you've given to open, creates the file structure and returns an ID to the user: the file descriptor. From that point on the OS is free to read whatever data it sees fit, even if not requested by the user (reading more than was requested is often done, to at least work in the file system's native block size).
C has no primitives for file I/O, it all depends on what operating system
and what libraries you are using.
File descriptors are just abstractions. Everything is done by the operating system.
If the program uses fopen() then a buffering package will use an implementation-specific system call to get a file descriptor and it will store it in a FILE structure.
The system call (at least on Unix, Linux, and the Mac) will look around on (usually) a disk-based filesystem to find the file. It creates data structures in the kernel memory that collects the information needed to read or write the file.
It also creates a table for each process that links to the other kernel data structures necessary to access the file. The index into this table is a (usually) small number. This is the file descriptor that is returned from the system call to the user process, and then stored in the FILE struct.
As already mentioned it is OS functionality.
But for C file I/O most probably you need info on fopen function.
If you will check description for that function, it says :
Description:
Opens a stream.
fopen opens the file named by
filename and associates a stream with
it. fopen returns a pointer to be used
to identify the stream in subsequent
operations.
So on successful completion fopen just returns a pointer to the newly opened stream. And it returns NULL in case of any error.
When you open the file, the file pointer refers to a FILE structure that describes the stream (not to the contents of the file itself). You then use different functions to work on the file.
EDIT:
Thanks to Chris, here is the structure named FILE as one implementation defines it (the layout is implementation-specific):
typedef struct {
    int level;              /* fill/empty level of buffer */
    unsigned flags;         /* File status flags */
    char fd;                /* File descriptor */
    unsigned char hold;     /* Ungetc char if no buffer */
    int bsize;              /* Buffer size */
    unsigned char *buffer;  /* Data transfer buffer */
    unsigned char *curp;    /* Current active pointer */
    unsigned istemp;        /* Temporary file indicator */
    short token;            /* Used for validity checking */
} FILE;

Difference between fflush and fsync

I thought fsync() does fflush() internally, so using fsync() on a stream is OK. But I am getting an unexpected result when executed under network I/O.
My code snippet:
FILE* fp = fopen(file, "wb");
/* multiple fputs() calls like: */
fputs(buf, fp);
...
...
fputs(buf.c_str(), fp);
/* get fd of the FILE pointer */
fd = fileno(fp);
#ifndef WIN32
ret = fsync(fd);
#else
ret = _commit(fd);
#endif
fclose(fp);
But it seems _commit() is not flushing the data (I tried on Windows and the data was written on a Linux exported filesystem).
When I changed the code to be:
FILE* fp = fopen(file, "wb");
/* multiple fputs() calls like: */
fputs(buf, fp);
...
...
fputs(buf.c_str(), fp);
/* fflush the data */
fflush(fp);
fclose(fp);
it flushes the data.
I am wondering if _commit() does the same thing as fflush(). Any inputs?
fflush() works on FILE*, it just flushes the internal buffers in the FILE* of your application out to the OS.
fsync works on a lower level, it tells the OS to flush its buffers to the physical media.
OSs heavily cache data you write to a file. If the OS enforced every write to hit the drive, things would be very slow. fsync (among other things) allows you to control when the data should hit the drive.
Furthermore, fsync/commit works on a file descriptor. It has no knowledge of a FILE* and can't flush its buffers. FILE* lives in your application, file descriptors live in the OS kernel, typically.
The standard C function fflush() and the POSIX system call fsync() are conceptually somewhat similar. fflush() operates on C file streams (FILE objects), and is therefore portable.
fsync() operates on POSIX file descriptors.
Both cause buffered data to be sent to a destination.
On a POSIX system, each C file stream has an associated file descriptor, and all the operations on a C file stream will be implemented by delegating, when necessary, to POSIX system calls that operate on the file descriptor.
One might think that a call to fflush on a POSIX system would cause a write of any data in the buffer of the file stream, followed by a call of fsync() for the file descriptor of that file stream. So on a POSIX system there would be no need to follow a call to fflush with a call to fsync(fileno(fp)). But is that the case: is there a call to fsync from fflush?
No, calling fflush on a POSIX system does not imply that fsync will be called.
The C standard for fflush says (emphasis added) it
causes any unwritten data for [the] stream to be delivered to the host environment to be written to the file
Saying that the data is to be written, rather than that it is written, implies that further buffering by the host environment is permitted. That buffering by the "host environment" could include, for a POSIX environment, the internal buffering that fsync flushes. So a close reading of the C standard suggests that the standard does not require the POSIX implementation to call fsync.
The POSIX standard description of fflush does not declare, as an extension of the C semantics, that fsync is called.
fflush() and fsync() can be used together to try to ensure data is written to the storage media (though this is not always possible):
first use fflush(fp) on the output stream (fp being a FILE * obtained from fopen or one of the standard streams stdout or stderr) to write the contents of the buffer associated with the stream to the OS.
then use fsync(fileno(fp)) to tell the OS to write its own buffers to the storage media.
Note however that fileno() and fsync() are POSIX functions that might not be available on all systems, notably Microsoft legacy systems where alternatives may be named _fileno(), _fsync() or _commit()...
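Put together, the two steps above could look like the following sketch. It assumes a POSIX system; flush_to_disk is a made-up helper name, and on Windows the _commit() call mentioned in the question would take the place of fsync().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push buffered output of fp all the way down.
 * Returns 0 on success, -1 if either step fails (errno is set). */
int flush_to_disk(FILE *fp)
{
    /* Step 1: stdio buffer (in this process) -> operating system. */
    if (fflush(fp) != 0) return -1;

    /* Step 2: operating system cache -> storage device. fsync() knows
     * nothing about the FILE buffer, which is why step 1 must come first. */
    if (fsync(fileno(fp)) == -1) return -1;

    return 0;
}
```

Calling fsync(fileno(fp)) on its own, as in the question, skips step 1 and so can leave recently written data sitting in the stdio buffer.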
I could say that for simplicity:
use fsync() with non-stream files (integer file descriptors)
use fflush() with file streams.
Also here is the help from man:
int fflush(FILE *stream); // flush a stream, FILE* type
int fsync(int fd); // synchronize a file's in-core state with storage device
// int type
To force the commitment of recent changes to disk, use the sync() or fsync() functions.
fsync() will synchronize all of the given file's data and metadata with the permanent storage device. It should be called just before the corresponding file has been closed.
sync() will commit all modified files to disk.
I think the document below from Python (https://docs.python.org/2/library/os.html) clarifies it very well.
os.fsync(fd) Force write of file with filedescriptor fd to disk. On
Unix, this calls the native fsync() function; on Windows, the MS
_commit() function.
If you’re starting with a Python file object f, first do f.flush(),
and then do os.fsync(f.fileno()), to ensure that all internal buffers
associated with f are written to disk.
Availability: Unix, and Windows starting in 2.2.3.

Is there any ordinary reason to use open() instead of fopen()?

I'm doing a small project in C after quite a long time away from it. This happens to include some file handling. I noticed in various documentation that there are functions which return FILE * handles and others which return (small integer) descriptors. Both sets of functions offer the same basic services I need so it really does not matter which I use.
But I'm curious about the collective wisdom: is it better to use fopen() and friends, or open() and friends?
Edit Since someone mentioned buffered vs unbuffered and accessing devices, I should add that one part of this small project will be writing a userspace filesystem driver under FUSE. So the file level access could as easily be on a device (e.g. a CDROM or a SCSI drive) as on a "file" (i.e. an image).
It is better to use open() if you are sticking to unix-like systems and you might like to:
Have more fine-grained control over unix permission bits on file creation.
Use the lower-level functions such as read/write/mmap as opposed to the C buffered stream I/O functions.
Use file descriptor (fd) based IO scheduling (poll, select, etc.) You can of course obtain an fd from a FILE * using fileno(), but care must be taken not to mix FILE * based stream functions with fd based functions.
Open any special device (not a regular file)
It is better to use fopen/fread/fwrite for maximum portability, as these are standard C functions, the functions I've mentioned above aren't.
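As an illustration of the first point in the list (control over permission bits at creation time), here is a minimal sketch for a POSIX system. create_private_file is a made-up name; the flags and mode macros are the standard ones from <fcntl.h> and <sys/stat.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path` readable and writable by the owner
 * only, failing if it already exists. fopen() offers no way to pass the
 * mode or O_EXCL in strict ISO C. Returns a descriptor, or -1 on error. */
int create_private_file(const char *path)
{
    return open(path,
                O_WRONLY | O_CREAT | O_EXCL,   /* create, never reuse */
                S_IRUSR | S_IWUSR);            /* mode 0600 before umask */
}
```

The mode is still filtered through the process umask, which can only remove permission bits, never add them.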
The objection that "fopen" is portable and "open" isn't is bogus.
fopen is part of libc, open is a POSIX system call.
Each is as portable as the place they come from.
I/O to fopen'ed files is buffered by libc (you must assume it may be, and for practical purposes, it is); file descriptors open()'ed are not buffered by libc (they may well be, and usually are, buffered in the filesystem, but not everything you open() is a file on a filesystem).
What's the point of fopen'ing, for example, a device node like /dev/sg0, say, or /dev/tty0... What are you going to do? You're going to do an ioctl on a FILE *? Good luck with that.
Maybe you want to open with some flags like O_DIRECT -- makes no sense with fopen().
fopen works at a higher level than open ....
fopen returns you a pointer to FILE stream which is similar to the stream abstraction that you read in C++
open returns you a file descriptor for the file opened ... It does not provide you a stream abstraction and you are responsible for handling the bits and bytes yourself ... This is at a lower level as compared to fopen
Stdio streams are buffered, while open() file descriptors are not. Depends on what you need. You can also create one from the other:
int fileno (FILE * stream) returns the file descriptor for a FILE *, FILE * fdopen(int fildes, const char * mode) creates a FILE * from a file descriptor.
Be careful when intermixing buffered and non-buffered IO, since you'll lose what's in your buffer when you don't flush it with fflush().
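A small sketch of that caution, assuming a POSIX system: output written through the stream is flushed before anything is written through the raw descriptor, so the two pieces end up in the file in program order. write_mixed is a made-up helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write "first\n" via stdio and "second\n" via the
 * descriptor behind the same stream. Returns 0 on success, -1 on failure. */
int write_mixed(const char *path)
{
    static const char tail[] = "second\n";
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;

    int rc = 0;
    if (fputs("first\n", fp) == EOF) rc = -1;  /* still in the stdio buffer */
    if (fflush(fp) != 0) rc = -1;              /* hand it to the OS first */

    /* Only now write through the raw descriptor. Without the fflush()
     * above, "first\n" would typically reach the file after "second\n",
     * when fclose() finally flushes the buffer. */
    if (write(fileno(fp), tail, strlen(tail)) != (ssize_t)strlen(tail))
        rc = -1;

    if (fclose(fp) != 0) rc = -1;
    return rc;
}
```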
Yes. When you need a low-level handle.
On UNIX operating systems, you can generally exchange file handles and sockets.
Also, low-level handles make for better ABI compatibility than FILE pointers.
read() & write() use unbuffered I/O. (fd: integer file descriptor)
fread() & fwrite() use buffered I/O. (FILE* structure pointer)
Binary data written to a pipe with write() may not be readable with fread(), because of byte alignment, variable sizes, etc. It's a crap-shoot.
Most low-level device driver code uses unbuffered I/O calls.
Most application level I/O uses buffered.
Use of the FILE* and its associated functions
is OK on a machine-by-machine basis: but portability is lost
on other architectures in the reading and writing of binary data.
Binary data written with fwrite() on one platform can be read back incorrectly
on another (64-bit versus 32-bit, or Windows versus Linux), because type sizes,
padding and byte order may differ.
Most OSs have compatibility macros within their own code to prevent this.
read() and write() transfer the bytes exactly as given, but they do not convert
data layouts either, so a portable binary format has to fix sizes and byte order itself.
The basic thing is to pick one way or the other and be consistent about it,
throughout the binary suite.
<stdio.h> // mostly FILE* some fd input/output parameters for compatibility
// gives you a lot of helper functions -->
List of Functions
Function     Description
───────────────────────────────────────────────────────────────────
clearerr     check and reset stream status
fclose       close a stream
fdopen       stream open functions  // (fd argument, returns FILE*)
feof         check and reset stream status
ferror       check and reset stream status
fflush       flush a stream
fgetc        get next character or word from input stream
fgetpos      reposition a stream
fgets        get a line from a stream
fileno       get file descriptor  // (FILE* argument, returns fd)
fopen        stream open functions
fprintf      formatted output conversion
fpurge       flush a stream
fputc        output a character or word to a stream
fputs        output a line to a stream
fread        binary stream input/output
freopen      stream open functions
fscanf       input format conversion
fseek        reposition a stream
fsetpos      reposition a stream
ftell        reposition a stream
fwrite       binary stream input/output
getc         get next character or word from input stream
getchar      get next character or word from input stream
gets         get a line from a stream
getw         get next character or word from input stream
mktemp       make temporary filename (unique)
perror       system error messages
printf       formatted output conversion
putc         output a character or word to a stream
putchar      output a character or word to a stream
puts         output a line to a stream
putw         output a character or word to a stream
remove       remove directory entry
rewind       reposition a stream
scanf        input format conversion
setbuf       stream buffering operations
setbuffer    stream buffering operations
setlinebuf   stream buffering operations
setvbuf      stream buffering operations
sprintf      formatted output conversion
sscanf       input format conversion
strerror     system error messages
sys_errlist  system error messages
sys_nerr     system error messages
tempnam      temporary file routines
tmpfile      temporary file routines
tmpnam       temporary file routines
ungetc       un-get character from input stream
vfprintf     formatted output conversion
vfscanf      input format conversion
vprintf      formatted output conversion
vscanf       input format conversion
vsprintf     formatted output conversion
vsscanf      input format conversion
So for basic use I would personally use the above without mixing idioms too much.
By contrast,
<unistd.h>     write()
               lseek()
               close()
               pipe()
<sys/types.h>
<sys/stat.h>
<fcntl.h>      open()
               creat()
               fcntl()
all use file descriptors.
These provide fine-grained control over reading and writing bytes
(recommended for special devices and fifos (pipes) ).
So again, use what you need, but keep consistent in your idioms and interfaces.
If most of your code base uses one mode, use that too, unless there is
a real reason not to. Both sets of I/O library functions are extremely reliable
and used millions of times a day.
note-- If you are interfacing C I/O with another language,
(perl, python, java, c#, lua ...) check out what the developers of those languages
recommend before you write your C code and save yourself some trouble.
Usually, you should favor using the standard library (fopen). However, there are occasions where you will need to use open directly.
One example that comes to mind is to work around a bug in an older version of Solaris which made fopen fail after 256 files were open. This was because they erroneously used an unsigned char for the fd field in their struct FILE implementation instead of an int. But this was a very specific case.
fopen and its cousins are buffered. open, read, and write are not buffered. Your application may or may not care.
fprintf and fscanf have a richer API that allows you to read and write formatted text files. read and write work on plain arrays of bytes. Conversions and formatting must be hand-crafted.
The difference between file descriptors and (FILE *) is really inconsequential.
Randy
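To illustrate the point about formatting, here is a sketch of writing the same "name=value" line both ways on a POSIX system. With a stream, fprintf() formats and buffers for you; with a descriptor, you format into your own buffer and loop over write() yourself. Both function names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* stdio version: formatting and buffering are handled by the library. */
int write_record_stdio(FILE *fp, const char *name, int value)
{
    return fprintf(fp, "%s=%d\n", name, value) < 0 ? -1 : 0;
}

/* Descriptor version: format by hand, then write the raw bytes.
 * Returns 0 on success, -1 on error or if the line would not fit. */
int write_record_fd(int fd, const char *name, int value)
{
    char line[128];
    int len = snprintf(line, sizeof line, "%s=%d\n", name, value);
    if (len < 0 || (size_t)len >= sizeof line) return -1;

    /* write() may accept fewer bytes than requested, so loop. */
    size_t off = 0;
    while (off < (size_t)len) {
        ssize_t w = write(fd, line + off, (size_t)len - off);
        if (w < 0) return -1;
        off += (size_t)w;
    }
    return 0;
}
```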

Resources