Equivalent of fgetc with Unix file descriptors - c

The fgetc(3) function takes a FILE * as its input stream. Must I reimplement character-at-a-time input with read(2), or is there a <unistd.h>-style equivalent taking an integer file descriptor instead?

No, there isn't such a thing, and please never do read(fd, &ch, sizeof(char)) (explanations below).
The function read(2) is usually implemented as a system call to the operating system kernel. Although the internal (and funky) details of such a thing shall not be discused here, the overall idea is that system calls are (usually) not something cheap.
It would be inefficient for both the userspace application and the kernel to do a system call just to get a single character from a file descriptor.
For instance, fgetc(3) usually ends up doing some buffering inside the structure of the FILE object. This means that the internal read(2) from fgetc(3) wouldn't just read a single character, but rather it'll try to get more for the sake of efficiency.
Anyway, it's not usually a good idea to mess up with such low-level stuff. You can get all the benefits of buffering (and of FILEs overall) by using fdopen(3) to create a FILE object from a file descriptor, as your question appears to imply that you have at hand just a raw file descriptor at the moment.

If you want to, you can open a file using open() -
int fh = open("abc.txt", O_RDONLY, S_IREAD); // there are different permissions you can provide (refer to link).
and then you can use fh in read() calls.

Related

is there an official document that mark read/write function as thread-safe functions?

the man pages of read/write didn't mention anything about their thread-safety
According to this link!
i understood this functions are thread safe but in this comment there is not a link to an official document.
In other hand according to this link! which says:
The read() function shall attempt to read nbyte bytes
from the file associated with the open file descriptor,
fildes, into the buffer pointed to by buf.
The behavior of multiple concurrent reads on the same pipe, FIFO, or
terminal device is unspecified.
I concluded the read function is not thread safe.
I am so confused now. please send me a link to official document about thread-safety of this functions.
i tested this functions with pipe but there wasn't any problem.(of course i know i couldn't state any certain result by testing some example)
thanks in advance:)
The thread safe versions of read and write are pread and pwrite:
pread(2)
The pread() and pwrite() system calls are especially useful in
multithreaded applications. They allow multiple threads to perform
I/O on the same file descriptor without being affected by changes to
the file offset by other threads.
when two threads write() at the same time the order is not specified (which write call completes first) therefore the behaviour is unspecified (without synchronization)
read() and write() are not strictly thread-safe, and there is no documentation that says they are, as the location where the data is read from or written to can be modified by another thread.
Per the POSIX read documentation (note the bolded parts):
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
That's the part you noticed - but that does not cover all possible types of file descriptors, such as regular files. It only applies to "pipe[s], FIFO[s]" and "terminal device[s]". This part covers almost everything else (weird things like "files" in /proc that are generated on the fly by the kernel are, well, weird and highly implementation-specific):
On files that support seeking (for example, a regular file), the read() shall start at a position in the file given by the file offset associated with fildes. The file offset shall be incremented by the number of bytes actually read.
Since the "file offset associated with fildes" is subject to modification from other threads in the process, the following code is not guaranteed to return the same results even given the exact same file contents and inputs for fd, offset, buffer, and bytes:
lseek( fd, offset, SEEK_SET );
read( fd, buffer, bytes );
Since both read() and write() depend upon a state (current file offset) that can be modified at any moment by another thread, they are not tread-safe.
On some embedded file systems, or really old desktop systems that weren't designed to facilitate multitasking support (e.g. MS-DOS 3.0), an attempt to perform an fread() on one file while an fread() is being performed on another file may result in arbitrary system corruption.
Any modern operating system and language runtime will guarantee that such corruption won't occur as a result of operations performed on unrelated files, or when independent file descriptors are used to access the same file in ways that do not modify it. Functions like fread() and fwrite() will be thread-safe when used in that fashion.
The act of reading data from a disk file does not modify it, but reading data from many kinds of stream will modify them by removing data. If two threads both perform actions that modify the same stream, such actions may interfere with each other in unspecified ways even if such modifications are performed by fread() operations.

Can I adapt a function that writes to disk to write to memory

I have third-party library with a function that does some computation on the specified data, and writes the results to a file specified by file name:
int manipulateAndWrite(const char *filename,
const FOO_DATA *data);
I cannot change this function, or reimplement the computation in my own function, because I do not have the source.
To get the results, I currently need to read them from the file. I would prefer to avoid the write to and read from the file, and obtain the results into a memory buffer instead.
Can I pass a filepath that indicates writing to memory instead of a
filesystem?
Yes, you have several options, although only the first suggestion below is supported by POSIX. The rest of them are OS-specific, and may not be portable across all POSIX systems, although I do believe they work on all POSIXy systems.
You can use a named pipe (FIFO), and have a helper thread read from it concurrently to the writer function.
Because there is no file per se, the overhead is just the syscalls (write and read); basically just the overhead of interprocess communication, nothing to worry about. To conserve resources, do create the helper thread with a small stack (using pthread_attr_ etc.), as the default stack size tends to be huge (on the order of several megabytes; 2*PTHREAD_STACK_SIZE should be plenty for helper threads.)
You should ensure the named pipe is in a safe directory, accessible only to the user running the process, for example.
In many POSIXy systems, you can create a pipe or a socket pair, and access it via /dev/fd/N, where N is the descriptor number in decimal. (In Linux, /proc/self/fd/N also works.) This is not mandated by POSIX, so may not be available on all systems, but most do support it.
This way, there is no actual file per se, and the function writes to the pipe or socket. If the data written by the function is at most PIPE_BUF bytes, you can simply read the data from the pipe afterwards; otherwise, you do need to create a helper thread to read from the pipe or socket concurrently to the function, or the write will block.
In this case, too, the overhead is minimal.
On ELF-based POSIXy systems (basically all), you can interpose the open(), write(), and close() syscalls or C library functions.
(In Linux, there are two basic approaches, one using the linker --wrap, and one using dlsym(). Both work fine for this particular case. This ability to interpose functions is based on how ELF binaries are linked at run time, and is not directly related to POSIX.)
You first set up the interposing functions, so that open() detects if the filename matches your special "in-memory" file, and returns a dedicated descriptor number for it. (You may also need to interpose other functions, like ftruncate() or lseek(), depending on what the function actually does; in Linux, you can run a binary under ptrace to examine what syscalls it actually uses.)
When write() is called with the dedicated descriptor number, you simply memcpy() it to a memory buffer. You'll need to use global variables to describe the allocated size, size used, and the pointer to the memory buffer, and probably be prepared to resize/grow the buffer if necessary.
When close() is called with the dedicated descriptor number, you know the memory buffer is complete, and the contents ready for processing.
You can use a temporary file on a RAM filesystem. While the data is technically written to a file and read back from it, the operations involve RAM only.
You should arrange for a default path to one to be set at compile time, and for individual users to be able to override that for their personal needs, for example via an environment variable (YOURAPP_TMPDIR?).
There is no need for the application to try and look for a RAM-based filesystem: choices like this are, and should be, up to the user. The application should not even care what kind of filesystem the file is on, and should just use the specified directory.
You could not use that library function. Take a look at this on how to write to in-memory files:
Is it possible to create a C FILE object to read/write in memory

Is there a guaranteed and safe way to truncate a file from ANSI C FILE pointer?

I know ANSI C defines fopen, fwrite, fread, fclose to modify a file's content. However, when it comes to truncating a file, we have to turn to OS specific function, e.g, truncate() on Linux, _chsize_s_() on Windows. But before we can call those OS specific functions, we have to obtain the file-handle from FILE pointer, by calling fileno, also an non-ANSI-C one.
My question is: Is it reliable to continue using FILE* after truncating the file? I mean, ANSI C FILE layer has its own buffer and does not know the file is truncated from beneath. In case the buffered bytes is beyond the truncated point, will the buffered content be flushed to the file when doing fclose() ?
If no guarantee, what is the best practice of using file I/O functions accompanied with truncate operation when write a Windows-Linux portable program?
Similar question: When querying file size from a file-handle returned by fileno , is it the accurate size when I later call fclose() -- without further fwrite()?
[EDIT 2012-12-11]
According to Joshua's suggestion. I conclude that current possible best practice is: Set the stream to unbuffered mode by calling setbuf(stream, NULL); , then truncate() or _chsize_s() can work peacefully with the stream.
Anyhow, no official document seems to explicitly confirm this behavior, whether Microsoft CRT or GNU glibc.
The POSIX way....
ftruncate() is what you're looking for, and it's been in POSIX base specifications since 2001, so it should be in every modern POSIX-compatible system by now.
Note that ftruncate() operates on a POSIX file descriptor (despite its potentially misleading name), not a STDIO stream FILE handle. Note also that mixing operations on the STDIO stream and on the underlying OS calls which operate on the file descriptor for the open stream can confuse the internal runtime state of the STDIO library.
So, to use ftruncate() safely with STDIO it may be necessary to first flush any STDIO buffers (with fflush()) if your program may have already written to the stream in question. This will avoid STDIO trying to flush the otherwise unwritten buffer to the file after the truncation has been done.
You can then use fileno() on the STDIO stream's FILE handle to find the underlying file descriptor for the open STDIO stream, and you would then use that file descriptor with ftruncate(). You might consider putting the call to fileno() right in the parameter list for the ftruncate() call so that you don't keep the file descriptor around and accidentally use it yet other ways which might further confuse the internal state of STDIO. Perhaps like this (say to truncate a file to the current STDIO stream offset):
/*
* NOTE: fflush() is not needed here if there have been no calls to fseek() since
* the last fwrite(), assuming it extended the length of the stream --
* ftello() will account for any unwritten buffers
*/
if (ftruncate(fileno(stdout), ftello(stdout)) == -1) {
fprintf(stderr, "%s: ftruncate(stdout) failed: %s\n", argv[0], strerror(errno));
exit(1);
}
/* fseek() is not necessary here since we truncated at the current offset */
Note also that the POSIX definition of ftruncate() says "The value of the seek pointer shall not be modified by a call to ftruncate()", so this means you may also need to use use fseek() to set the STDIO layer (and thus indirectly the file descriptor) either to the new end of the file, or perhaps back to the beginning of the file, or somewhere still within the boundaries of the file, as desired. (Note that the fseek() should not be necessary if the truncation point is found using ftello().)
You should not have to make the STDIO stream unbuffered if you follow the procedure above, though of course doing so could be an alternative to using fflush() (but not fseek()).
Without POSIX....
If you need to stick to strict ISO Standard C, say C99, then you have no portable way to truncate a file to a given length other than zero (0) length. The latest draft of C11 that I have says this in Section 7.21.3 (paragraph 2):
Binary files are not truncated, except as defined in 7.21.5.3. Whether a write on a text stream causes the associated file to be truncated beyond that point is implementation-defined.
(and 7.21.5.3 describes the flags to fopen() which allow a file to be truncated to a length of zero)
The caveat about text files is there because on silly systems that have both text and binary files (as opposed to just plain POSIX-style content agnostic files) then it is often possible to write a value to the file which will be stored in the file at the position written and which will be treated as an EOF indicator when the file is next read.
Other types of systems may have different underlying file I/O interfaces that are not compatible with POSIX while still providing a compatible ISO C STDIO library. In theory if such a system offers something similar to fileno() and ftrunctate() then a similar procedure could be used with them as well, provided that one took the same care to avoid confusing the internal runtime state of the STDIO library.
With regard to querying file size....
You also asked whether the file size found by querying the file descriptor returned by fileno() would be an accurate representation of the file size after a successful call to fclose(), even without any further calls to fwrite().
The answer is: Don't do that!
As I mentioned above, the POSIX file descriptor for a file opened as a STDIO stream must be used very carefully if you don't want to confuse the internal runtime state of the STDIO library. We can add here that it is important not to confuse yourself with it either.
The most correct way to find the current size of a file opened as a STDIO stream is to seek to the end of it and then ask where the stream pointer is by using only STDIO functions.
Isn't an unbuffered write of zero bytes supposed to truncate the file at that point?
See this question for how to set unbuffered: Unbuffered I/O in ANSI C

C fopen vs open

Is there any reason (other than syntactic ones) that you'd want to use
FILE *fdopen(int fd, const char *mode);
or
FILE *fopen(const char *path, const char *mode);
instead of
int open(const char *pathname, int flags, mode_t mode);
when using C in a Linux environment?
First, there is no particularly good reason to use fdopen if fopen is an option and open is the other possible choice. You shouldn't have used open to open the file in the first place if you want a FILE *. So including fdopen in that list is incorrect and confusing because it isn't very much like the others. I will now proceed to ignore it because the important distinction here is between a C standard FILE * and an OS-specific file descriptor.
There are four main reasons to use fopen instead of open.
fopen provides you with buffering IO that may turn out to be a lot faster than what you're doing with open.
fopen does line ending translation if the file is not opened in binary mode, which can be very helpful if your program is ever ported to a non-Unix environment (though the world appears to be converging on LF-only (except IETF text-based networking protocols like SMTP and HTTP and such)).
A FILE * gives you the ability to use fscanf and other stdio functions.
Your code may someday need to be ported to some other platform that only supports ANSI C and does not support the open function.
In my opinion the line ending translation more often gets in your way than helps you, and the parsing of fscanf is so weak that you inevitably end up tossing it out in favor of something more useful.
And most platforms that support C have an open function.
That leaves the buffering question. In places where you are mainly reading or writing a file sequentially, the buffering support is really helpful and a big speed improvement. But it can lead to some interesting problems in which data does not end up in the file when you expect it to be there. You have to remember to fclose or fflush at the appropriate times.
If you're doing seeks (aka fsetpos or fseek the second of which is slightly trickier to use in a standards compliant way), the usefulness of buffering quickly goes down.
Of course, my bias is that I tend to work with sockets a whole lot, and there the fact that you really want to be doing non-blocking IO (which FILE * totally fails to support in any reasonable way) with no buffering at all and often have complex parsing requirements really color my perceptions.
open() is a low-level os call. fdopen() converts an os-level file descriptor to the higher-level FILE-abstraction of the C language. fopen() calls open() in the background and gives you a FILE-pointer directly.
There are several advantages to using FILE-objects rather raw file descriptors, which includes greater ease of usage but also other technical advantages such as built-in buffering. Especially the buffering generally results in a sizeable performance advantage.
fopen vs open in C
1) fopen is a library function while open is a system call.
2) fopen provides buffered IO which is faster compare to open which is non buffered.
3) fopen is portable while open not portable (open is environment specific).
4) fopen returns a pointer to a FILE structure(FILE *); open returns an integer that identifies the file.
5) A FILE * gives you the ability to use fscanf and other stdio functions.
Unless you're part of the 0.1% of applications where using open is an actual performance benefit, there really is no good reason not to use fopen. As far as fdopen is concerned, if you aren't playing with file descriptors, you don't need that call.
Stick with fopen and its family of methods (fwrite, fread, fprintf, et al) and you'll be very satisfied. Just as importantly, other programmers will be satisfied with your code.
If you have a FILE *, you can use functions like fscanf, fprintf and fgets etc. If you have just the file descriptor, you have limited (but likely faster) input and output routines read, write etc.
open() is a system call and specific to Unix-based systems and it returns a file descriptor. You can write to a file descriptor using write() which is another system call.
fopen() is an ANSI C function call which returns a file pointer and it is portable to other OSes. We can write to a file pointer using fprintf.
In Unix:
You can get a file pointer from the file descriptor using:
fP = fdopen(fD, "a");
You can get a file descriptor from the file pointer using:
fD = fileno (fP);
Using open, read, write means you have to worry about signal interaptions.
If the call was interrupted by a signal handler the functions will return -1
and set errno to EINTR.
So the proper way to close a file would be
while (retval = close(fd), retval == -1 && ernno == EINTR) ;
I changed to open() from fopen() for my application, because fopen was causing double reads every time I ran fopen fgetc . Double reads were disruptive of what I was trying to accomplish. open() just seems to do what you ask of it.
open() will be called at the end of each of the fopen() family functions. open() is a system call and fopen() are provided by libraries as a wrapper functions for user easy of use
Depends also on what flags are required to open. With respect to usage for writing and reading (and portability) f* should be used, as argued above.
But if basically want to specify more than standard flags (like rw and append flags), you will have to use a platform specific API (like POSIX open) or a library that abstracts these details. The C-standard does not have any such flags.
For example you might want to open a file, only if it exits. If you don't specify the create flag the file must exist. If you add exclusive to create, it will only create the file if it does not exist. There are many more.
For example on Linux systems there is a LED interface exposed through sysfs. It exposes the brightness of the led through a file. Writing or reading a number as a string ranging from 0-255. Of course you don't want to create that file and only write to it if it exists. The cool thing now: Use fdopen to read/write this file using the standard calls.
opening a file using fopen
before we can read(or write) information from (to) a file on a disk we must open the file. to open the file we have called the function fopen.
1.firstly it searches on the disk the file to be opened.
2.then it loads the file from the disk into a place in memory called buffer.
3.it sets up a character pointer that points to the first character of the buffer.
this the way of behaviour of fopen function
there are some causes while buffering process,it may timedout. so while comparing fopen(high level i/o) to open (low level i/o) system call , and it is a faster more appropriate than fopen.

How can you flush a write using a file descriptor?

It turns out this whole misunderstanding of the open() versus fopen() stems from a buggy I2C driver in the Linux 2.6.14 kernel on an ARM. Backporting a working bit bashed driver solved the root cause of the problem I was trying to address here.
I'm trying to figure out an issue with a serial device driver in Linux (I2C). It appears that by adding timed OS pauses (sleeps) between writes and reads on the device things work ... (much) better.
Aside: The nature of I2C is that each byte read or written by the master is acknowledged by the device on the other end of the wire (slave) - the pauses improving things encourage me to think of the driver as working asynchronously - something that I can't reconcile with how the bus works. Anyhoo ...
I'd either like to flush the write to be sure (rather than using fixed duration pause), or somehow test that the write/read transaction has completed in an multi-threaded friendly way.
The trouble with using fflush(fd); is that it requires 'fd' to be stream pointer (not a file descriptor) i.e.
FILE * fd = fopen("filename","r+");
... // do read and writes
fflush(fd);
My problem is that I require the use of the ioctl(), which doesn't use a stream pointer. i.e.
int fd = open("filename",O_RDWR);
ioctl(fd,...);
Suggestions?
I think what you are looking for may be
int fsync(int fd);
or
int fdatasync(int fd);
fsync will flush the file from kernel buffer to the disk. fdatasync will also do except for the meta data.
You have two choices:
Use fileno() to obtain the file descriptor associated with the stdio stream pointer
Don't use <stdio.h> at all, that way you don't need to worry about flush either - all writes will go to the device immediately, and for character devices the write() call won't even return until the lower-level IO has completed (in theory).
For device-level IO I'd say it's pretty unusual to use stdio. I'd strongly recommend using the lower-level open(), read() and write() functions instead (based on your later reply):
int fd = open("/dev/i2c", O_RDWR);
ioctl(fd, IOCTL_COMMAND, args);
write(fd, buf, length);
fflush() only flushes the buffering added by the stdio fopen() layer, as managed by the FILE * object. The underlying file itself, as seen by the kernel, is not buffered at this level. This means that writes that bypass the FILE * layer, using fileno() and a raw write(), are also not buffered in a way that fflush() would flush.
As others have pointed out, try not mixing the two. If you need to use "raw" I/O functions such as ioctl(), then open() the file yourself directly, without using fopen<() and friends from stdio.
Have you tried disabling buffering?
setvbuf(fd, NULL, _IONBF, 0);
It sounds like what you are looking for is the fsync() function (or fdatasync()?), or you could use the O_SYNC flag in your open() call.
If you want to go the other way round (associate FILE* with existing file descriptor), use fdopen() :
FDOPEN(P)
NAME
fdopen - associate a stream with a file descriptor
SYNOPSIS
#include <stdio.h>
FILE *fdopen(int fildes, const char *mode);

Resources