C - How to handle the last part of a file if the buffer is bigger?

Isn't it possible to read the bytes left in a file when fewer remain than the buffer size?
char * buffer = (char *)malloc(size);
FILE * fp = fopen(filename, "rb");
while(fread(buffer, size, 1, fp)){
// do something
}
Let's assume size is 4 and the file is 17 bytes long. I thought fread could handle the last read even when fewer bytes remain in the file than the buffer size, but apparently the while loop just terminates without reading the last byte.
I tried the lower-level system call read(), but for some reason I couldn't read any bytes.
What should I do if fread cannot handle a final part that is smaller than the buffer size?

Yep, turn your parameters around.
Instead of requesting one block of size bytes, request size blocks of 1 byte each. The function then returns how many blocks (bytes) it was able to read:
size_t nread;
while ((nread = fread(buffer, 1, size, fp)) > 0) ...
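Spelled out a bit more, the loop could look like the sketch below. The helper name read_in_chunks and its last_chunk out-parameter are made up for illustration; only fread and ferror come from the standard library.

```c
#include <stdio.h>

/* Hypothetical helper: reads fp in chunks of up to chunk_size bytes and
 * returns the total number of bytes read. *last_chunk receives the size
 * of the final (possibly short) chunk. Returns (size_t)-1 on read error. */
size_t read_in_chunks(FILE *fp, char *buffer, size_t chunk_size,
                      size_t *last_chunk)
{
    size_t total = 0;
    size_t nread;

    *last_chunk = 0;
    /* size = 1, count = chunk_size: the return value is a byte count */
    while ((nread = fread(buffer, 1, chunk_size, fp)) > 0) {
        /* process buffer[0 .. nread-1] here */
        total += nread;
        *last_chunk = nread;
    }
    if (ferror(fp))
        return (size_t)-1;
    return total;
}
```

With a 17-byte file and chunk_size 4, this should perform four full reads and one final read of 1 byte.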

Try "man fread".
It clearly states the following, which answers your question:
SYNOPSIS
size_t fread(void *ptr, size_t size, size_t nitems, FILE *stream);
DESCRIPTION
fread() copies, into an array pointed to by ptr, up to nitems items of
data from the named input stream, where an item of data is a sequence
of bytes (not necessarily terminated by a null byte) of length size.
fread() stops appending bytes if an end-of-file or error condition is
encountered while reading stream, or if nitems items have been read.
fread() leaves the file pointer in stream, if defined, pointing to the
byte following the last byte read if there is one.
The argument size is typically sizeof(*ptr) where the pseudo-function
sizeof specifies the length of an item pointed to by ptr.
RETURN VALUE
fread(), return the number of items read. If size or nitems is 0, no
characters are read or written and 0 is returned.
The value returned will be less than nitems only if a read error or
end-of-file is encountered. The ferror() or feof() functions must be
used to distinguish between an error condition and an end-of-file
condition.
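As a rough illustration of that last paragraph, a short read can be classified like this. The enum and the function name read_items are invented for the example, and the sketch assumes size and nitems are both nonzero.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch */
enum read_status { READ_FULL, READ_EOF, READ_ERROR };

/* Calls fread and reports why fewer than nitems items came back.
 * Assumes size > 0 and nitems > 0. *got receives the item count. */
enum read_status read_items(FILE *fp, void *buf, size_t size,
                            size_t nitems, size_t *got)
{
    *got = fread(buf, size, nitems, fp);
    if (*got == nitems)
        return READ_FULL;       /* everything requested was read */
    if (ferror(fp))
        return READ_ERROR;      /* short count caused by a read error */
    return READ_EOF;            /* short count caused by end-of-file */
}
```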

Related

Is there a limitation of bytes to be read with fread()

I'm trying to read data from a file into a buffer. The file holds 900K bytes (determined by seeking to the end of the file and calling ftell()). I allocated a buffer of 900K + 1 bytes (to null-terminate it). fread() returns 900K, but strlen(buffer) reports a smaller value, and at the end of the buffer I can see something like ".....(truncated)". Why does this happen? Is there a limit beyond which fread() cannot read into a buffer and truncates the data? And why does fread() return 900K if it actually read less?
strlen does something along these lines:
size_t strlen(const char *str)
{
    size_t len = 0;
    while (*str++) len++;
    return len;
}
If your file contains binary data (or if it's a text file in an encoding such as UTF-16, where the unused upper bytes are zero), strlen is going to stop at the first 0x00 byte it encounters and return the offset of that byte. If you read a text file in a single-byte encoding like ANSI, there won't be a null terminator in the data, so unless you append one yourself, calling strlen invokes undefined behavior.
If you want to determine how many bytes fread successfully read out of the file, check its return value. [1]
If you want to determine the file size before reading a file, do this:
long len;
fseek(fp, 0, SEEK_END);
len = ftell(fp);
rewind(fp);
len will contain the file's size in bytes (or -1L if ftell fails).
[1] Assuming you called fread with parameter 2 set to 1 byte per element and didn't try to read more bytes than are actually in the file.
Your main question has already been answered, though it's worth noting that strlen is not designed to measure the size of an array but the length of a null-terminated string. It probably reports a lower value because strlen returns the number of characters that appear before the first null character, so if there are null characters ('\0') anywhere in your data, strlen stops as soon as it finds one.
You should trust fread's return value.
EDIT: as a note, fread MAY read fewer bytes than requested; this is caused either by an error or by reaching the end of the file. You can check which with ferror and feof, respectively.
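To make the strlen point concrete, here is a minimal sketch that loads a whole file and keeps the byte count from fread instead of relying on strlen. The function name load_file and its out_len parameter are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: loads the whole stream into a malloc'd buffer,
 * appends a '\0', and stores the number of bytes read in *out_len.
 * Returns NULL on failure; the caller frees the result. */
char *load_file(FILE *fp, size_t *out_len)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);              /* ftell returns long, -1L on error */
    if (size < 0)
        return NULL;
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;

    /* size = 1 so the return value is a byte count */
    size_t nread = fread(buf, 1, (size_t)size, fp);
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[nread] = '\0';
    *out_len = nread;
    return buf;
}
```

If the file contains a zero byte in the middle, *out_len still holds the full count, while strlen on the returned buffer stops at that zero byte.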

fread() return value in C

I am trying to understand how the fread() function in <stdio.h> works and I am confused about the return value of this function. In the man pages it says
RETURN VALUE
On success, fread() and fwrite() return the number of items read
or written. This number equals the number of bytes transferred only
when size is 1. If an error occurs, or the end of the file is
reached, the return value is a short item count (or zero).
fread() does not distinguish between end-of-file and error, and
callers must use feof(3) and ferror(3) to determine which occurred.
Could someone please explain to me what is meant by number of items read or written in this context. Also can anyone provide me with some example return values and their meanings?
fread() will read a count of items of a specified size from the provided IO stream (FILE* stream). It returns the number of items successfully read from the stream. If it returns a number less than the requested number of items, the IO stream can be considered empty (or otherwise broken).
The number of bytes read will be equal to the number of items successfully read times the provided size of the items.
Consider the following program.
#include <stdio.h>
int main() {
char buf[8];
size_t ret = fread(buf, sizeof(*buf), sizeof(buf)/sizeof(*buf), stdin);
printf("read %zu bytes\n", ret*sizeof(*buf));
return 0;
}
When we run this program, depending on the amount of input provided, different outcomes can be observed.
We can provide no input at all. The IO stream will be empty to begin with (EOF). The return value will be zero. No items have been read. Zero is returned.
$ : | ./a.out
read 0 bytes
We provide less input than requested. Some items will be read before EOF is encountered. The number of items read is returned. No more items are available. Thereafter the stream is empty.
$ echo "Hello" | ./a.out
read 6 bytes
We provide as much input as requested, or more. The requested number of items will be returned. More items may be available.
$ echo "Hello World" | ./a.out
read 8 bytes
Related reading:
What is the rationale for fread/fwrite taking size and count as arguments?
When there are fewer bytes in the stream than constitute an item, the number of bytes consumed from the stream might, however, be greater than the number of bytes read as calculated by the above formula. I find this answer to the question linked above (and the comment on it) especially insightful in this matter:
https://stackoverflow.com/a/296305/1025391
The syntax for fread() is
size_t fread(void *ptr, size_t size, size_t nmemb, FILE * stream );
which means,
The function fread() reads nmemb elements of data, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
So, for a fully successful call, the total number of bytes read will be nmemb * size.
Now, by saying
on success, fread() and fwrite() return the number of items read or written. This number equals the number of bytes transferred only when size is 1.
it means that the return value is a count of items, so it equals the number of bytes transferred only when size is 1.
The logic is the same for fwrite().
EDIT
for example, a fully successful call to fread() like
fread(readbuf, sizeof(int), 5 , stdin);
will return 5 while it will read sizeof(int) * 5 bytes. If we assume sizeof(int) is 4, then total bytes read will be 5 * 4 or 20. As you can see, here, the number of items read or written is not equal to the number of bytes transferred.
OTOH, another fully successful call to fread() like
fread(readbuf, sizeof(char), 5 , stdin);
will also return 5 while it will read sizeof(char) * 5 bytes, i.e., 5 bytes. In this case, as sizeof(char) is 1, so, here, the number of items read or written is equal to the number of bytes transferred. i.e., 5.
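The item/byte distinction can be checked with a small sketch like the following. The helper name read_ints and the bytes out-parameter are assumptions for illustration, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max_items ints from fp.
 * Returns the number of complete ints read and stores the
 * corresponding byte count (items * sizeof(int)) in *bytes. */
size_t read_ints(FILE *fp, int *out, size_t max_items, size_t *bytes)
{
    size_t items = fread(out, sizeof *out, max_items, fp);
    *bytes = items * sizeof *out;   /* counts complete items only */
    return items;
}
```

If the stream holds three ints followed by one stray byte and five ints are requested, the call should return 3, not the number of bytes.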

fprintf and fwrite (written bytes to fp)

I have a question about fprintf and fwrite.
How many bytes are written when this code runs (assuming fp has been correctly set up)?
int i = 10000;
fprintf(fp,"%d",i);
fwrite(&i,sizeof(int),1,fp);
When I checked, I got 5 bytes and 9 bytes respectively. Maybe I am wrong; I thought it would be 4 bytes, since i is an int. Can someone explain, please? Thanks.
fprintf writes the string 10000 (5 bytes) to the file, while fwrite writes the binary representation of 10000 (sizeof(int) bytes) to the file.
How are you checking the number of bytes written? sizeof(int) depends on platform.
Given below is the function signature for fwrite.
size_t fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );
fwrite writes an array of count elements, each size bytes long, from the block of memory pointed to by ptr to stream. The return value is the number of elements actually written, which equals count on success; the number of bytes written is then size * count.
Similarly, fprintf returns the number of characters written.
fprintf(fp,"%d",i); writes 5 bytes: it writes 10000 as a string of 5 chars.
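A small sketch can show both return values side by side. The function name write_both and the items_written parameter are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes value once as decimal text and once as
 * raw bytes. Returns fprintf's result (characters written, negative on
 * error) and stores fwrite's result (elements written) in *items_written. */
int write_both(FILE *fp, int value, size_t *items_written)
{
    int chars = fprintf(fp, "%d", value);                  /* text form */
    *items_written = fwrite(&value, sizeof value, 1, fp);  /* binary form */
    return chars;
}
```

For value 10000, fprintf should report 5 characters and fwrite 1 element, so the stream grows by 5 + sizeof(int) bytes in total.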

Size of information in last fread() in C

#define MAXSIZE 256
fread(buff, sizeof(MAXSIZE), 1, infp);
Say we need at most 3 reads, and after 2 reads the data remaining in infp is smaller than MAXSIZE. How do we determine the size of the data returned by the last read?
You can just check the return value of fread():
Return value
Number of objects read successfully, which may be less than count if an error or end-of-file condition occurs.
Like this:
size_t num = fread(...);
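P.S.: as @chux commented, sizeof(MAXSIZE) is the size of an int, not 256, so you actually need to use fread(buff, MAXSIZE, 1, infp) instead. Note that with this form the return value is only 0 or 1; to get the byte count of a short final read, use fread(buff, 1, MAXSIZE, infp).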
From fread man page
On success, fread() and fwrite() return the number of items read or written. This number equals the number of bytes transferred only when size is 1. If an error occurs, or the end of the file is reached, the return value is a short item count (or zero).
fread() does not distinguish between end-of-file and error, and callers must use feof(3) and ferror(3) to determine which occurred.
Man fread
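Applied to the question's code, a minimal sketch might look like this. The wrapper name read_block is hypothetical; MAXSIZE, buff and infp are taken from the question.

```c
#include <stdio.h>

#define MAXSIZE 256

/* Hypothetical wrapper: reads up to MAXSIZE bytes into buff and returns
 * how many bytes were actually read (0 at end-of-file or on error).
 * Passing 1 as the size makes the return value a byte count. */
size_t read_block(FILE *infp, char *buff)
{
    return fread(buff, 1, MAXSIZE, infp);
}
```

On a 600-byte stream, successive calls should return 256, 256, 88 and then 0; the 88 is the size of the last read.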

How does fread really work?

The declaration of fread is as following:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The question is: Is there a difference in reading performance of two such calls to fread:
char a[1000];
fread(a, 1, 1000, stdin);
fread(a, 1000, 1, stdin);
Will it read 1000 bytes at once each time?
There may or may not be any difference in performance. There is a difference in semantics.
fread(a, 1, 1000, stdin);
attempts to read 1000 data elements, each of which is 1 byte long.
fread(a, 1000, 1, stdin);
attempts to read 1 data element which is 1000 bytes long.
They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.
In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.
There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900.
But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
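The semantic difference is easy to observe. In the sketch below, the two wrapper names are made up; each simply forwards to fread with the arguments in one of the two orders.

```c
#include <stdio.h>

/* Hypothetical wrapper: n elements of 1 byte each.
 * Returns the number of bytes read. */
size_t read_as_bytes(FILE *fp, char *a, size_t n)
{
    return fread(a, 1, n, fp);
}

/* Hypothetical wrapper: 1 element of n bytes.
 * Returns 1 if the whole element was read, otherwise 0. */
size_t read_as_one_element(FILE *fp, char *a, size_t n)
{
    return fread(a, n, 1, fp);
}
```

On a 900-byte file with n == 1000, the first call should return 900 and the second 0.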
According to the specification, the two may be treated differently by the implementation.
If your file is less than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes until EOF. On the other hand, the result of fread(a, 1000, 1, stdin) (read 1 1000-byte element) stored in a is unspecified, because there is not enough data to finish reading the 'first' (and only) 1000 byte element.
Of course, some implementations may still copy the 'partial' element into as many bytes as needed.
That would be implementation detail. In glibc, the two are identical in performance, as it's implemented basically as (Ref http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):
size_t fread (void* buf, size_t size, size_t count, FILE* f)
{
size_t bytes_requested = size * count;
size_t bytes_read = read(f->fd, buf, bytes_requested);
return bytes_read / size;
}
Note that the C and POSIX standards do not guarantee that a complete object of size size is read every time. If a complete object cannot be read (e.g. stdin only has 999 bytes but you've requested size == 1000), the value of the partially read object is indeterminate (C99 §7.19.8.1/2).
Edit: See the other answers about POSIX.
fread calls getc internally. In Minix, the number of times getc is called is simply size*nmemb, so it depends only on the product of the two. Both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will therefore call getc 1000 (= 1000*1) times.
Here is the simple implementation of fread from Minix:
size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream){
    register char *cp = ptr;
    register int c;
    size_t ndone = 0;
    register size_t s;
    if (size)
        while ( ndone < nmemb ) {
            s = size;
            do {
                if ((c = getc(stream)) != EOF)
                    *cp++ = c;
                else
                    return ndone;
            } while (--s);
            ndone++;
        }
    return ndone;
}
There may be no performance difference, but those calls are not the same.
fread returns the number of elements read, so those calls will return different values.
If an element cannot be completely read, its value is indeterminate:
If an error occurs, the resulting value of the file position indicator for the stream is
indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)
There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.
One more sentence from http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is notable:
The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.
In short, in both cases the data will be accessed via fgetc().
I wanted to clarify the answers here. fread performs buffered IO. The actual read block sizes fread uses are determined by the C implementation being used.
All modern C libraries will have the same performance with the two calls:
fread(a, 1, 1000, file);
fread(a, 1000, 1, file);
Even something like:
for (int i=0; i<1000; i++)
    a[i] = fgetc(file);
should result in the same disk access patterns, although fgetc would be slower due to more calls into the standard C library and, in some cases, the need for the disk to perform additional seeks that would otherwise have been optimized away.
Getting back to the difference between the two forms of fread. The former returns the actual number of bytes read. The latter returns 0 if the file size is less than 1000, otherwise it returns 1. In both cases the buffer would be filled with the same data, i.e. the contents of the file up to 1000 bytes.
In general, you probably want to keep the 2nd parameter (size) set to 1 such that you get the number of bytes read.
