When reading the documentation for fread here, it explains that the two arguments after the void *ptr are multiplied together to determine the number of bytes to be read from the file. Below is the function header for fread given from the link:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
My question is, other than the returned value from the function, is there really a behavior or performance difference between calling each of these:
// assume arr is an int[SOME_LARGE_NUMBER] and fp is a FILE*
fread(arr, sizeof(arr), 1, fp);
fread(arr, sizeof(arr) / sizeof(int), sizeof(int), fp);
fread(arr, sizeof(int), sizeof(arr) / sizeof(int), fp);
fread(arr, 1, sizeof(arr), fp);
And which one would generally be the best practice? Or a more general question is, how do I decide what to specify for each of the arguments in any given scenario?
EDIT
To clarify, I am not asking for a justification of two arguments instead of one; I'm asking for a general approach to deciding what to pass for each argument in any given scenario. The answer that Massimiliano linked in the comments only provides two specific examples and doesn't sufficiently explain why that behavior happens.
There is a behavior difference if there is not enough data to satisfy the request. From the page you linked to:
The total number of elements successfully read are returned as a size_t object, which is an integral data type. If this number differs from the nmemb parameter, then either an error had occurred or the End Of File was reached.
So if you specify that there is only one element of size sizeof(arr), and there is not enough data to fill arr, then the call returns 0 and you can't tell how much data was actually read. If you do:
fread(arr, sizeof(int), sizeof(arr) / sizeof(int), fp);
then arr will be partially filled if there is not enough data.
The third line of your code most naturally fits the API of fread. However, you could use one of the other forms if you document why you are not doing the normal thing.
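To make the behavior difference concrete, here is a rough sketch (reusing arr and fp from the question, and assuming the file holds fewer than sizeof(arr) bytes):
// assume arr and fp are declared as in the question, and the file holds
// fewer than sizeof(arr) bytes
size_t n_whole = fread(arr, sizeof(arr), 1, fp);
// n_whole is 0: the single sizeof(arr)-byte element could not be completed,
// even though some bytes may have been consumed from the stream

rewind(fp);  // start over to try the other form

size_t n_ints = fread(arr, sizeof(int), sizeof(arr) / sizeof(int), fp);
// n_ints is the number of complete ints actually read, so a short file
// still gives you a usable count and a partially filled arr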
Related
I have the following C code that opens a file in rb+ mode, then writes 100 bytes of value 0. When I read the file with an offset of anything other than 0, I get 96. Why is this?
FILE *fp = fopen("myfile", "rb+");
rewind(fp);
char zero = 0;
fwrite(&zero, 1, 100, fp);
char result;
fseek(fp, 1, SEEK_SET);
fread(&result, 1, 1, fp);
printf("%d\n", result);
I'm on Linux x64 using GCC.
From your clarification in the comments, your intent was to write 100 zero bytes to the file. There are at least two and possibly three ways to do this.
The first is to allocate an array of 100 zero-initialized bytes, and write that:
char zeroes[100] = { 0 };
fwrite(zeroes, sizeof(char) /* == 1 */, sizeof(zeroes), f);
This doesn't scale well if you want to write, say, 10,000 or 10,000,000 zero bytes. You could also do this:
char zero = 0;
for (int i = 0; i < 100; ++i) fwrite(&zero, sizeof(char), 1, f);
This scales better, but performs poorly, since a single large write is generally much more efficient than many tiny writes. Instead, you can seek to a later position in the file and write only the last byte. On POSIX systems, this is guaranteed to fill the earlier unwritten portion of the file with zeroes:
char zero = 0;
fseek(f, 99, SEEK_SET);
fwrite(&zero, sizeof(char) /* == 1 */, 1, f);
I believe the zero-fill guarantee is also given for Windows' MSVCRT runtime, but I can't immediately find proof of that on MSDN (this might make a good question). If someone knows whether Windows, other platforms, and/or some version(s) of the C standard itself make or do not make this guarantee, this answer could be improved.
Of course, if you are on a POSIX system and don't need portable code, you can use ftruncate() which makes the same guarantee without even needing to do an fwrite(). Windows has SetEndOfFile() but that function fills the extended portion of the file with undefined values, not zero bytes.
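For completeness, a minimal POSIX-only sketch of the ftruncate() approach might look like this (the stream must be writable, and fileno()/ftruncate() are POSIX functions, not standard C):
#include <stdio.h>
#include <unistd.h>    /* ftruncate() and fileno() are POSIX, not standard C */

int main(void)
{
    FILE *fp = fopen("myfile", "wb");
    if (fp == NULL)
        return 1;

    /* Flush stdio buffers so the file descriptor and the stream agree,
       then extend the file to 100 bytes; on POSIX the new region reads
       back as zero bytes. */
    fflush(fp);
    if (ftruncate(fileno(fp), 100) != 0)
        perror("ftruncate");

    fclose(fp);
    return 0;
}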
You probably want something like:
int i;
for (i = 0; i < 100; ++i) {
    fwrite(&zero, 1, 1, fp);
}
You cannot write 100 bytes from a pointer that points to a single char.
I've seen two ways of providing the size of a file to fread.
If lets say I have a char array data, a file pointer and filesize is the size of the file in bytes then the first is:
fread(data, 1, filesize, file);
And the second:
fread(data, filesize, 1, file);
My question is, is there any difference between the two lines of code?
And which line of code is more "correct"?
Also, I'm assuming the 1 in the two lines of code actually means sizeof(char), is that correct?
Argument 2: the size of each object
Argument 3: the number of objects you want to read
Now your question:
is there any difference between the two lines of code?
fread(data, 1, filesize, file);
Reads filesize objects into the buffer pointed to by data, where each object is 1 byte. If fewer than filesize bytes are available, you get a partial read, and the return value tells you exactly how many bytes were read.
fread(data, filesize, 1, file);
Reads 1 object into the buffer pointed to by data, where the object is filesize bytes long. If fewer than filesize bytes are available, the call returns 0, and you cannot tell how much of the buffer was actually filled.
Do whichever fits the requirements of your program.
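As a rough sketch of the first form, here is one way to read a whole file and check the byte count; the file name is a placeholder, and the fseek/ftell idiom for getting the size is common but not strictly guaranteed by the C standard for binary streams:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Sketch only: "input.bin" is a placeholder file name. */
    FILE *file = fopen("input.bin", "rb");
    if (file == NULL)
        return 1;

    /* Common (though not strictly portable) way to get the file size. */
    fseek(file, 0, SEEK_END);
    long filesize = ftell(file);
    rewind(file);

    if (filesize > 0) {
        char *data = malloc((size_t)filesize);
        if (data != NULL) {
            size_t got = fread(data, 1, (size_t)filesize, file);
            if (got != (size_t)filesize) {
                /* short read: got tells you exactly how many bytes arrived */
            }
            free(data);
        }
    }
    fclose(file);
    return 0;
}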
The first tells fread to read filesize elements of size 1.
The second tells fread to read one element of size filesize.
In theory both produce the same result.
In practice and in theory, both fill the buffer with the same data when the full filesize bytes are available (the return values differ, though). But if you respect the intent of the fread interface, the first line is the correct one.
I know that fwrite takes the following parameters:
fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );
As far as I know, size_t is a typedef, often nothing more than:
typedef unsigned long size_t;
Is it possible to use values greater than what size_t can hold for count and size?
And if it is not could I connect the written blocks somehow?
No, fwrite accepts only values that fit in the size_t type.
There may be implementation-specific ways to write more but, for standard C, the approach is generally just to do sequential fwrite calls. Each subsequent call will append to what you've already written.
And keep in mind that size_t is a distinct type. It may be defined as an unsigned long in some implementations, but that's not guaranteed.
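A rough sketch of that sequential approach, with a placeholder file name, chunk size, and total size:
#include <stdio.h>

int main(void)
{
    /* Sketch: write a large logical amount of data as a sequence of smaller
       fwrite calls; each call appends after the previous one.
       "big.out", the 64 KiB chunk, and the 5 GiB total are placeholders. */
    FILE *fp = fopen("big.out", "wb");
    if (fp == NULL)
        return 1;

    static unsigned char chunk[64 * 1024];   /* static, so zero-initialized */
    unsigned long long total = 5ULL * 1024 * 1024 * 1024;
    unsigned long long written = 0;

    while (written < total) {
        size_t want = sizeof(chunk);
        if (total - written < want)
            want = (size_t)(total - written);
        if (fwrite(chunk, 1, want, fp) != want)
            break;                           /* write error */
        written += want;
    }
    fclose(fp);
    return 0;
}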
From an application programmer's point of view, a file is a contiguous series of bytes. Successive writes place the data sequentially into the file. (This comment is necessary because some will argue details NOT relevant to your question.)
Thus:
fwrite(&user_record1, sizeof(user_record1), 1, fp);
fwrite(&user_record2, sizeof(user_record2), 1, fp);
Results in two user records, one immediately following the other, on the file.
If you have a very large record, then divide it into two smaller records, as:
fwrite(&user_record_parta, sizeof(user_record_parta), 1, fp);
fwrite(&user_record_partb, sizeof(user_record_partb), 1, fp);
However, I would question an application design that uses such large records. Perhaps what you are really doing in the application is writing an array of user records and that array grows really large. If this is the case, write each entry of the array, rather than the whole array.
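For example, here is a minimal sketch of writing an array entry by entry; the user_record type, field names, and file name are hypothetical:
#include <stdio.h>

/* Hypothetical record type, for illustration only. */
struct user_record {
    int  id;
    char name[32];
};

int main(void)
{
    static struct user_record records[1000];    /* pretend this is filled in */
    size_t count = sizeof(records) / sizeof(records[0]);

    FILE *fp = fopen("records.dat", "wb");      /* placeholder file name */
    if (fp == NULL)
        return 1;

    /* Write the array one entry at a time rather than in one huge call. */
    for (size_t i = 0; i < count; ++i) {
        if (fwrite(&records[i], sizeof(records[i]), 1, fp) != 1)
            break;                              /* handle the write error */
    }
    fclose(fp);
    return 0;
}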
In order to use fwrite, the entire object you're writing must be in the object pointed to by ptr. Unless you have a really messed-up C implementation, it's impossible to have an object larger than the maximum value of size_t, so trying to write more bytes than that would be a programming error, since the pointed-to object is not actually that large anyway.
If you pass a value greater than the maximum value of size_t, it will simply wrap around.
What you can do is write up to that maximum number of bytes, seek to the end position and continue writing, and repeat that in a loop until you've written all your data.
I am wondering if this is the best way to go about solving my problem.
I know the values for particular offsets of a binary file where the information I want is held...What I want to do is jump to the offsets and then read a certain amount of bytes, starting from that location.
After using google, I have come to the conclusion that my best bet is to use fseek() to move to the position of the offset, and then to use fread() to read an amount of bytes from that position.
Am I correct in thinking this? And if so, how is best to go about doing so? i.e. how to incorporate the two together.
If I am not correct, what would you suggest I do instead?
Many thanks in advance for your help.
Matt
Edit:
I followed a tutorial on fread() and adjusted it to the following:
#include <stdio.h>

int main()
{
    FILE *f;
    char buffer[11];

    if ((f = fopen("comm_array2.img", "rt")))
    {
        fread(buffer, 1, 10, f);
        buffer[10] = 0;
        fclose(f);
        printf("first 10 characters of the file:\n%s\n", buffer);
    }
    return 0;
}
So I used the file 'comm_array2.img' and read the first 10 characters from the file.
But from what I understand of it, this reads from the start of the file, whereas I want to start reading from some place in the file (an offset).
Is this making more sense?
Edit Number 2:
It appears that I was being a bit dim, and all that is needed (it would seem from my attempt) is to put the fseek() before the fread() that I have in the code above, and it seeks to that location and then reads from there.
If you are using file streams instead of file descriptors, then you can write yourself a (simple) function analogous to the POSIX pread() system call.
You can easily emulate it using streams instead of file descriptors [1]. Perhaps you should write yourself a function such as this (which has a slightly different interface from the one I suggested in a comment):
size_t fpread(void *buffer, size_t size, size_t nitems, size_t offset, FILE *fp)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(buffer, size, nitems, fp);
}
This is a reasonable compromise between the conventions of pread() and fread().
What would the syntax of the function call look like? For example, reading from the offset 732 and then again from offset 432 (both being from start of the file) and filestream called f.
Since you didn't say how many bytes to read, I'm going to assume 100 each time. I'm assuming that the target variables (buffers) are buffer1 and buffer2, and that they are both big enough.
if (fpread(buffer1, 100, 1, 732, f) != 1)
...error reading at offset 732...
if (fpread(buffer2, 100, 1, 432, f) != 1)
...error reading at offset 432...
The return count is the number of complete units of 100 bytes each; either 1 (got everything) or 0 (something went awry).
There are other ways of writing that code:
if (fpread(buffer1, sizeof(char), 100, 732, f) != 100)
...error reading at offset 732...
if (fpread(buffer2, sizeof(char), 100, 432, f) != 100)
...error reading at offset 432...
This reads 100 single bytes each time; the test ensures you got all 100 of them, as expected. If you capture the return value in this second example, you can know how much data you did get. It would be very surprising if the first read succeeded and the second failed; some other program (or thread) would have had to truncate the file between the two calls to fpread(), but funnier things have been known to happen.
[1] The emulation won't be perfect; the pread() call provides guaranteed atomicity that the combination of fseek() and fread() will not provide. But that will seldom be a problem in practice, unless you have multiple processes or threads concurrently updating the file while you are trying to position and read from it.
It frequently depends on the distance between the parts you care about. If you're only skipping over a few bytes between the parts you care about, it's often easier to just read that data and ignore what you read, rather than using fseek to skip past it. A typical way to do this is to define a struct holding both the fields you care about and placeholders for the ones you don't, read in the whole struct, and then use only the parts you care about:
struct whatever {
long a;
long ignore;
short b;
} w;
fread(&w, 1, sizeof(w), some_file);
// use 'w.a' and 'w.b' here.
If there's any great distance between the parts you care about, though, chances are that your original idea of using fseek to get to the parts that matter will be simpler.
Your theory sounds correct. Open, seek, read, close.
Create a struct for the data you want to read and pass a pointer to the struct's allocated memory to read(). You'll likely need #pragma pack(1) or similar on the struct to prevent misalignment problems.
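A minimal sketch of that approach; the record layout, the 732-byte offset, and the use of #pragma pack(push, 1) (a common compiler extension) are assumptions for illustration:
#include <stdio.h>

/* Hypothetical packed record layout; the field names, the offset, and the
   packing pragma are made up for illustration. */
#pragma pack(push, 1)
struct record {
    unsigned int   magic;
    unsigned short version;
    unsigned char  flags;
};
#pragma pack(pop)

int main(void)
{
    FILE *f = fopen("comm_array2.img", "rb");
    if (f == NULL)
        return 1;

    struct record rec;
    if (fseek(f, 732, SEEK_SET) == 0 &&             /* jump to the offset */
        fread(&rec, sizeof(rec), 1, f) == 1) {      /* read one whole record */
        /* use rec.magic, rec.version, rec.flags here */
    }
    fclose(f);
    return 0;
}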
The declaration of fread is as follows:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The question is: is there a difference in reading performance between two such calls to fread:
char a[1000];
fread(a, 1, 1000, stdin);
fread(a, 1000, 1, stdin);
Will it read 1000 bytes at once each time?
There may or may not be any difference in performance. There is a difference in semantics.
fread(a, 1, 1000, stdin);
attempts to read 1000 data elements, each of which is 1 byte long.
fread(a, 1000, 1, stdin);
attempts to read 1 data element which is 1000 bytes long.
They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.
In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.
There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read (900 for the first version).
But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
According to the specification, the two may be treated differently by the implementation.
If your file is shorter than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes until EOF. On the other hand, the result that fread(a, 1000, 1, stdin) (read one 1000-byte element) stores in a is indeterminate, because there is not enough data to finish reading the first (and only) 1000-byte element.
Of course, some implementations may still copy the 'partial' element into as many bytes as needed.
That would be an implementation detail. In glibc, the two are identical in performance, as it's implemented basically as follows (Ref http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):
size_t fread (void *buf, size_t size, size_t count, FILE *f)
{
    size_t bytes_requested = size * count;
    size_t bytes_read = read(f->fd, buf, bytes_requested);
    return bytes_read / size;
}
Note that the C and POSIX standards do not guarantee that a complete object of size size will be read every time. If a complete object cannot be read (e.g. stdin only has 999 bytes but you've requested size == 1000), the file will be left in an indeterminate state (C99 §7.19.8.1/2).
Edit: See the other answers about POSIX.
fread calls getc internally. In Minix, the number of times getc is called is simply size * nmemb, so the number of getc calls depends on the product of the two. So both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will call getc 1000 (= 1000 * 1) times.
Here is the simple implementation of fread from Minix:
size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream)
{
    register char *cp = ptr;
    register int c;
    size_t ndone = 0;
    register size_t s;

    if (size)
        while (ndone < nmemb) {
            s = size;
            do {
                if ((c = getc(stream)) != EOF)
                    *cp++ = c;
                else
                    return ndone;
            } while (--s);
            ndone++;
        }
    return ndone;
}
There may be no performance difference, but those calls are not the same.
fread returns the number of elements read, so those calls will return different values.
If an element cannot be completely read, its value is indeterminate:
If an error occurs, the resulting value of the file position indicator for the stream is
indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)
There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.
One more passage from http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is worth noting:
The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.
In short, in both cases the data will be read via fgetc().
I wanted to clarify the answers here. fread performs buffered I/O. The actual read block sizes fread uses are determined by the C implementation being used.
All modern C libraries will have the same performance with the two calls:
fread(a, 1, 1000, file);
fread(a, 1000, 1, file);
Even something like:
for (int i = 0; i < 1000; i++)
    a[i] = fgetc(file);
This should result in the same disk access patterns, although fgetc would be slower because of the larger number of calls into the standard C library and, in some cases, because the disk has to perform additional seeks that would otherwise have been optimized away.
Getting back to the difference between the two forms of fread: the former returns the actual number of bytes read, while the latter returns 0 if the file holds fewer than 1000 bytes and 1 otherwise. In both cases the buffer would be filled with the same data, i.e. the contents of the file up to 1000 bytes.
In general, you probably want to keep the 2nd parameter (size) set to 1, so that the return value is the number of bytes read.
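As a final sketch of that recommendation, here is a chunked read loop where size is 1, so the return value is always a byte count; the file name and buffer size are arbitrary:
#include <stdio.h>

int main(void)
{
    /* Sketch: read a file in fixed-size chunks with size == 1, so the return
       value of fread is always the byte count for that chunk.
       "input.bin" and the 4096-byte buffer are arbitrary choices. */
    FILE *file = fopen("input.bin", "rb");
    if (file == NULL)
        return 1;

    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), file)) > 0) {
        /* process the n bytes in buf here */
    }

    if (ferror(file)) {
        /* a read error occurred, as opposed to a clean end-of-file */
    }
    fclose(file);
    return 0;
}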