Amount of data read() syscall will actually read - c

Suppose I have a file for which the file descriptor has more than n bytes left until EOF, and I invoke the read() syscall for n bytes. Is the function guaranteed to read n bytes into the buffer? Or can it read less?

The read system call is guaranteed to read as many many characters as you asked for, except when it can't. But it turns out that there are so many exceptions -- so many cases where it can't read as many characters as you asked for -- that it basically ends up being safest to assume that any given read call probably won't read as many characters as you asked for. I believe it's good practice to always write your code with that in mind.
The man page on my system says
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
So if it's not a normal file, or if it is a normal file but there aren't enough characters, you'll get fewer than you asked for. But in the case you asked about, yes, you should be guaranteed to get exactly as many characters as you asked for.
With that said, though, if you find yourself with a choice between assuming that read is allegedly guaranteed to read exactly the number of characters requested, versus acknowledging that it might return less, I would always write the code to assume it might return less. That is, if you have a call like
r = read(fd, buf, n);
there isn't usually much to be gained by assuming that if r is greater than 0, it must be exactly n. Your code has to be able to handle the r < n case so it will behave properly when it's almost at end-of-file, so unless you want to have two different code paths (one for "normal" reads, and one for the last read), you might as well write one piece of code, that can handle the r < n case, and let it operate all the time.
(Also, as Zan Lynx reminds in a comment, don't have the code notice that r < n, and infer from that that end-of-file is coming up soon. Wait for r == 0 before deciding you're at end-of-file.)

You could've read it from the man page yourself:
On Linux, read() (and similar system calls) will transfer at most
0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
actually transferred. (This is true on both 32-bit and 64-bit
systems.)
So even if you had enough RAM and so on, you couldn't read a full-size DVD image in one go - however, this wouldn't be the sane thing to do either; to access such large files, mmap would be better.
Other than that, a signal might be delivered, which can cause exit with EINTR and buffer contents indeterminate.
ERRORS
[...]
EINTR The call was interrupted by a signal before any data was read; see signal(7).

Is the function guaranteed to read n bytes into the buffer? Or can it
read less?
No, even if your file has more than n bytes before its end, the read(fd, buf, n) function is not guaranteed to read n bytes into bufffer and then return n. It can read less and return a positive value that is less than n.
See Linux man page at https://man7.org/linux/man-pages/man2/read.2.html
RETURN VALUE
It is not an error if this number is smaller than the number of
bytes requested; this may happen for example because fewer bytes
are actually available right now (maybe because we were close to
end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.

Related

Understanding read syscall

I'm reading man read manual page and discovered that it was possible to read less then the desired number of bytes passed in as a parameter:
It is not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to
end-of-file, or because we are reading from a pipe, or from a
termi‐nal), or because read() was interrupted by a signal.
I have the following situation:
Some process moved a file into a directory I'm listening to IN_MOVED_TO inotify events.
I receive a IN_MOVED_TO event, open a file and start reading it till the EOF is reached
No other processes modify the moved at 1. file (After it is moved it is left unchanged all the time)
Is it guaranteed that if read returns the number of bytes read less then I requested then the next call to read will return 0? I mean the situation like 'reading 1 000 000 000 by a single bytes for a gigabyte file' is forbidden by the documentation
Is it guaranteed that if read returns the number of bytes read less then I requested then the next call to read will return 0?
No, not in practice. It should be true if the file system is entirely POSIX compliant, but many of them are not (in corner cases). In particular NFS (see nfs(5)) and FUSE or proc (see proc(5)) are not exactly POSIX compliant.
So in practice I strongly recommend handling the "read returns a smaller number of bytes than wanted case", even if you are right to believe that it should not happen. Handling that "impossible" case should be easy for you.
Notice also that inotify(7) facilities don't work with bizarre filesystems like NFS, proc, FUSE, ... Think also of corner cases like, inside an Ext4 file system, a symlink to an NFS file,; or bind mounts, etc...

Finding out the number of chars read/write reads

I'm fairly new to c so bear with me.
How do I go about finding out the number of chars read/write reads?
Can I be more specific and designate the # of chars read/write reads in an argument? If so, how?
From man(2) read:
If successful, the number of bytes actually read is returned
From man(2) write:
Upon successful completion the number of bytes which were written is returned
Now concerning:
Can I be more specific and designate the # of chars read/write reads in an argument? If so, how?
AFAIK no, but there might be some device/kernel specific ways using for example ioctl(2)
C and C++ has different IO libraries. I guess you are coding in C.
fprintf(3) returns (when successful) the number of printed characters.
scanf(3) returns the number of successfully read items, but also accept the %n specifier:
n Nothing is expected; instead, the number of characters
consumed thus far from the input is stored through the next
pointer, which must be a pointer to int.
You could also do IO line by line... (getline, snprintf, sscanf, fputs ....)
for Linux and Posix
If you call directly the read(2) or write(2) functions (i.e. syscalls) they return the number of input or output bytes on success.
And you could use the lseek(2) syscall, or the ftell(3) <stdio.h> function, to query the current file offset (which has no meaning so would fail on non-seekable files like pipes, sockets, FIFOs, ...).
See also FIONREAD

Reading the last chunk of a file open for un-cached direct I/O doesn't produce EOF, normal behavior?

I'm opening a file using CreateFile() with the flags FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH for several reasons, and I've noticed a strange behavior:
Since for using those flags we have to allocate memory aligned to the sector size, let's say the sector size is 512.
Now, if I allocate 512 bytes with _aligned_malloc() and I read from the file, everything works fine if the file size is exactly a multiple of the sector size, let's say 512*4, or 2048. I read pieces of 512 bytes and the last piece makes ReadFile() to return the EOF code, that is, to return FALSE and GetLastError() set as ERROR_HANDLE_EOF.
The problem arise when the file size it not aligned to the sector size, that is, the file's size is let's say 2048+13, or 2061 bytes.
I can successfully read the first 4 512-sized chunks from the file, and a 5th call to ReadFile() lets me to read the latest 13 surplus bytes from the file, but this is the strange thing: in such a case ReadFile() doesn't return the EOF code! Even if I told to ReadFile() to read 512 bytes, and it read only 13 bytes (so it surpassed the end of file), it doesn't tell me that, and returns just 13 bytes read, without no other further information.
So, when I read the last 13 bytes and my loop is set to read until EOF, it will call ReadFile() again for a 6th time, causing an error: ERROR_INVALID_PARAMETER and I guess this is correct, because I'm trying to read after I had surpassed the end of file!
My question is: is this a normal behavior or am I doing something wrong? When using non-buffered I/O, I should expect to not having EOF code when I read the last non-sector-aligned chunk of file? Or there is another way to do that?
How I can understand that I've just passed the EOF?
I guess that I could solve this problem by modifying the loop: instead of reading until EOF, I could read until EOF OR until the actually returned bytes are less than the requested bytes for the reading. Is this a correct assumption?
NOTE: this does not happen when using files with normal flags, it only happens when I use FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH.
NOTE 2: I'm using I/O Completion Ports for reading files, but I guess this happens also without using them, by just using blocking I/O.
EOF is surprisingly hard. Even C's feof function is often misunderstood.
Basically, you get ERROR_HANDLE_EOF in the first case to distinguish the "512 bytes read, more to read" and "512 bytes read, nothing left" cases.
In the seconds case, this is not needed. "512 bytes requested, 13 bytes read, no error" already means that you're at EOF. Any other reason for a partial read would have been an error.

What are the conditions under which a short read/write can occur?

The read and write functions (and relatives like send, recv, readv, ...) can return a number of bytes less than the requested read/write length if interrupted by a signal (under certain circumstances), and perhaps in other cases too. Is there a well-defined set of conditions for when this can happen, or is it largely up to the implementation? Here are some particular questions I'm interested in the answers to:
If a signal handler is non-interrupting (SA_RESTART) that will cause IO operations interrupted before any data is transferred to be restarted after the signal handler returns. But if a partial read/write has already occurred and the signal handler is non-interrupting, will the syscall return immediately with the partial length, or will it be resumed attempting to read/write the remainder?
Obviously read functions can return short reads on network, pipe, and terminal file descriptors when less data than the requested amount is available. But can write functions return short writes in these cases due to limited buffer size, or will they block until all the data can be written?
I'd be interested in all three of standards-required, common, and Linux-specific behavior.
For your second question : write can return short writes for a limited buffer size if it is non-blocking
There's at least one standard condition that can cause write on a regular file to return a short size:
If a write() requests that more bytes
be written than there is room for (for
example, [XSI] the file size limit
of the process or the physical end of
a medium), only as many bytes as there
is room for shall be written. For
example, suppose there is space for 20
bytes more in a file before reaching a
limit. A write of 512 bytes will
return 20. The next write of a
non-zero number of bytes would give a
failure return (except as noted
below).

"short read" from filesystem, when can it happen?

It is obvious that in general the read(2) system call can return less bytes than what was asked to be read. However, quite a few programs assume that when working with a local files, read(2) never returns less than what was asked (unless the file is shorter, of course).
So, my question is: on Linux, in which cases can read(2) return less than what was requested if reading from an open file and EOF is not encountered and the amount being read is a few kilobytes at maximum?
Some guesses:
Can received signals interrupt a read like that, but not make it fail?
Can different filesystems affect this behavior? Is there anything special about jffs2?
POSIX.1-2008 states:
The value returned may be less than
nbyte if the number of bytes left in
the file is less than nbyte, if the
read() request was interrupted by a
signal, or if the file is a pipe or
FIFO or special file and has fewer
than nbyte bytes immediately available
for reading.
Disk-based filesystems generally use uninterruptible reads, which means that the
read operation generally cannot be interrupted by a signal. Network-based
filesystems sometimes use interruptible reads, which can return partial data or no data.
(In the case of NFS this is configurable using the intr mount option.)
They sometimes also implement timeouts.
Keep in mind that even /some/arbitrary/file/path may refer to a FIFO or
special file, so what you thought was a regular file may not be. It is therefore
good practice to handle partial reads even though they may be unlikely.
I have to ask: "why do you care about the reason"? If read can return a number of bytes less than the requested amount (which, as you point out, it certainly can) why would you not want to deal with that situation?
A received signal only makes read() fail if it hasn't yet read a single byte. Otherwise, it will return partial data.
And I guess alternate filesystems may indeed return short reads in other situations. For example, it makes some sense (to me) to have a network-based filesystem behave just like a network socket wrt short reads (= having them often).
If it's really a file you are reading, then you can get short read as the last read before end of file.
Howver, it's generally best to behave as if ANY read could be a short read. If what you are reading is a pipe or an input device (stdin) rather than a file, you can get a short read whenever your buffer is larger than what is currently in the input buffer.
I am not sure but this situation could arise when the OS is running out of pages in the page cache. You could suggest that flush thread will be invoked in that case, but it depends on the heuristic used in the I/O scheduler. This situation could cause a read to return fewer bytes.
What I have always read being called a "short read" is not related to the file access read(2) but to the physical read of a disk sector. It happens when, while reading the data part of the sector, less valid magnetic signals are found than to make the 512 (or 4096 or whatever) bytes of a sector. That makes an invalid sector and a read fault. Regarding "when", or rather why it happens is most probably because the power feeding the drive fell down while that sector was written.
Could it be that a read(2) ends with a physical error code called "short read"?

Resources