How to use fread correctly? - c

I've seen two ways of providing the size of a file to fread.
If, let's say, I have a char array data, a file pointer file, and filesize is the size of the file in bytes, then the first is:
fread(data, 1, filesize, file);
And the second:
fread(data, filesize, 1, file);
My question is: is there any difference between the two lines of code, and which one is more "correct"?
Also, I'm assuming the 1 in the two lines of code actually means sizeof(char), is that correct?

Argument 2: size of each member
Argument 3: Number of objects you want to read
Now your question:
is there any difference between the two lines of code?
fread(data, 1, filesize, file);
Reads filesize objects into the buffer pointed to by data, where the size of each object is 1 byte. If fewer than filesize bytes are available, the bytes that are available are still read, and the return value is the number of bytes read.
fread(data, filesize, 1, file);
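Reads 1 object into the buffer pointed to by data, where the size of this object is filesize bytes. If fewer than filesize bytes are available, fread returns 0 because no complete object was read, and the contents of data cannot be relied on.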
Choose whichever form matches the requirements of your program.
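A minimal sketch of the difference in return values, assuming two hypothetical wrapper functions (read_as_bytes and read_as_block are illustration names, not part of any library):

```c
#include <stdio.h>

/* Hypothetical helper: ask for n elements of 1 byte each.
   The return value is the number of bytes actually stored in buf. */
size_t read_as_bytes(void *buf, size_t n, FILE *fp)
{
    return fread(buf, 1, n, fp);
}

/* Hypothetical helper: ask for 1 element of n bytes.
   The return value is 1 for a complete read, 0 otherwise. */
size_t read_as_block(void *buf, size_t n, FILE *fp)
{
    return fread(buf, n, 1, fp);
}
```

With a 5-byte file and n set to 10, the first helper reports 5 and the second reports 0, so only the first tells you how much of the buffer is usable.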

The first tells fread to read filesize elements of size 1.
The second tells fread to read 1 element of size filesize.
In theory both produce the same result when all filesize bytes are available.

In practice and theory both produce the same result. But if you respect the 'fread' standard, the first line is the correct one.

Related

C - Writing a buffer as binary file (wav)

I am reading a wav file as binary, putting it in a buffer and I want to write the exact same wav file again.
Here is my code so far:
file = fopen("tone1.wav", "rb");
file3 = fopen("outout.wav","wb");
fseek(file, 0, SEEK_END);
fileLen=ftell(file);
fseek(file, 0, SEEK_SET);
buffer=(char *)malloc(fileLen+1);
buffer3=(char *)malloc(fileLen+1);
fread(buffer, fileLen, 1, file);
for (int i=0;i<fileLen+1;++i){
buffer3[i]=buffer[i];
fwrite(buffer3,sizeof(buffer3),1,file3);
}
fclose(file);
fclose(file3);
free(buffer);
free(buffer3);
The problem is that the outout wav file comes empty and unplayable.
I am not sure what i'm doing wrong. If I replace fwrite(buffer3,sizeof(buffer3),1,file3); by fwrite(buffer3,sizeof(buffer3),1048,file3); (let's say 1048) I get something playable but not the entire wav with a loop in it.
Can anyone tell me what's the problem? Maybe it's the for loop length that's wrong maybe i shouldn't put fileLen as a limit to it? What should i replace 1 by?
Thanks in advance
Please observe the following:
The fact that the file is "raw" and has a ".wav" extension, as you put it, neither means it is a WAV file nor makes it one. To be a WAV file it needs a valid WAV header, and reading and writing it as audio requires a proper audio file API. What you're reading and copying is headerless data of unknown format and unknown endianness.
If you want to use standard C library functions to copy the contents of one file to another, you do it at the byte level, without interpreting the content, which is what you are doing.
In that case, there are few issues in your code:
redundant padding of the buffers and casting the return of malloc in C:
buffer = malloc(fileLen); should work.
ambiguous logic: why do you, upon having read the source file in one pass, both copy buffers and write to the destination file byte-per-byte, inside the loop?
even if so, you are still passing incorrect arguments to fread and fwrite functions, please, check man pages. fread(buffer, 1, fileLen, file); should fix the read. (1 equals sizeof (char)).
why do you need a redundant buffer buffer3 if you don't interpret the content of file?
even if so, you are still passing incorrect arguments to your functions inside the loop. This should do the fix:
for (int i=0; i<fileLen; i++){
buffer3[i]=buffer[i];
fwrite(&(buffer3[i]),1,1,file3);
}
Generally one doesn't know the size of the file in advance, before opening the source file. So one allocates a buffer of reasonable size, then reads and writes in chunks determined by the size of the buffer. In that case your read routine should also handle short reads, where fread returns fewer bytes than the buffer can hold.
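The chunked approach could be sketched roughly like this. The function name copy_stream and the 4096-byte chunk size are assumptions made for the example, not something from the question:

```c
#include <stdio.h>

/* Copy everything from `in` to `out` in fixed-size chunks.
   Returns 0 on success, -1 on a read or write error.
   The 4096-byte chunk size is an arbitrary choice. */
int copy_stream(FILE *in, FILE *out)
{
    char chunk[4096];
    size_t got;

    /* size = 1, so `got` is a byte count, also for the final short chunk */
    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, got, out) != got)
            return -1;              /* short write */
    }
    return ferror(in) ? -1 : 0;     /* loop ended: EOF or read error */
}
```

Because each fwrite uses the count returned by the matching fread, the last, shorter chunk is written correctly without knowing the file size up front.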
Do not write inside the loop.
Or write different characters every time (not always buffer3[0]).
fread(buffer, fileLen, 1, file);
for (int i = 0; i < fileLen; ++i) {
// transform and copy
buffer3[i] = transform(buffer[i]);
//fwrite(buffer3 + i, 1, 1, file3); // write 1 character only
}
fwrite(buffer3, fileLen, 1, file3); // write all the new buffer
I solved it by putting fwrite(buffer3,sizeof(char),78000,file3); outside of the for loop, with 78000 being the size of the input file. But my question is, how can I find out its size in code?
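One common way to get the size in code is the fseek/ftell pair already used at the top of the question. A sketch, assuming a hypothetical helper named file_size; note that the C standard does not strictly require binary streams to support SEEK_END, and a long may be too small for very large files on some platforms:

```c
#include <stdio.h>

/* Return the size of an open binary stream in bytes, or -1 on failure.
   The original file position is restored before returning. */
long file_size(FILE *fp)
{
    long start = ftell(fp);
    if (start < 0 || fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);          /* position at the end == size in bytes */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    return size;
}
```

The returned value can then replace the hard-coded 78000 in the fwrite call.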

Convert a file into binary buffer in C?

I'm just starting in C/C++. I'm able to write a file from a binary buffer:
FILE *myFile= fopen("/mnt/music.mp3", "ab+"); // Doesn't exist
fwrite(binaryBuffer, sizeOfBuffer, 1, myFile);
All I want is to get a new "binaryBuffer" from "myFile".
How can I do that?
Thanks !
Use the fread function, which works just like fwrite:
char buffer[BUFFER_SIZE]; // declare a buffer
fread(buffer, length, 1, file); //read length amount of bytes into buffer
If you don't know how many bytes to read you can seek to the end of the file to find the length.
(If you read from the same file you just wrote to you will want to rewind)
http://www.cplusplus.com/reference/cstdio/fread/
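To illustrate the rewind remark, here is a small sketch. The helper name write_then_read_back is made up for the example, and it assumes the stream was opened for both reading and writing (for example with "wb+"):

```c
#include <stdio.h>

/* Write `len` bytes from `src` to `fp`, rewind, and read them back into `dst`.
   `fp` must be open for both reading and writing (for example "wb+").
   Returns the number of bytes read back, or 0 if the write failed. */
size_t write_then_read_back(FILE *fp, const void *src, void *dst, size_t len)
{
    if (fwrite(src, 1, len, fp) != len)
        return 0;
    rewind(fp);                     /* move back to the start before reading */
    return fread(dst, 1, len, fp);
}
```

Without the rewind, the read would start at the position right after the data that was just written, and nothing would be read.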

Significance of two arguments in fread?

When reading the documentation for fread here, it explains that the two arguments after the void *ptr are multiplied together to determine the amount of bytes being read / written in the file. Below is the function header for fread given from the link:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
My question is, other than the returned value from the function, is there really a behavior or performance difference between calling each of these:
// assume arr is an int[SOME_LARGE_NUMBER] and fp is a FILE*
fread(arr, sizeof(arr), 1, fp);
fread(arr, sizeof(arr) / sizeof(int), sizeof(int), fp);
fread(arr, sizeof(int), sizeof(arr) / sizeof(int), fp);
fread(arr, 1, sizeof(arr), fp);
And which one would generally be the best practice? Or a more general question is, how do I decide what to specify for each of the arguments in any given scenario?
EDIT
To clarify, I am not asking for a justification of two arguments instead of one, I'm asking for a general approach on deciding what to pass to the arguments in any given scenario. And this answer that Massimiliano linked in the comments and cited only provides two specific examples and doesn't sufficiently explain why that behavior happens.
There is a behavior difference if there is not enough data to satisfy the request. From the page you linked to:
The total number of elements successfully read are returned as a size_t object, which is an integral data type. If this number differs from the nmemb parameter, then either an error had occurred or the End Of File was reached.
So if you specify that there is only one element of size sizeof(arr), and there is not enough data to fill arr, then fread returns 0 and you cannot tell how much of arr, if any, was filled. If you do:
fread(arr, sizeof(int), sizeof(arr) / sizeof(int), fp);
then arr will be partially filled if there is not enough data.
The third line of your code most naturally fits the API of fread. However, you could use one of the other forms if you document why you are not doing the normal thing.
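As a sketch of that third form, with read_ints being a hypothetical helper name, the return value tells you how many elements of the array are valid:

```c
#include <stdio.h>

/* Read up to `capacity` ints from `fp` into `arr`.
   Returns how many complete ints were stored; arr[0..result-1] are valid. */
size_t read_ints(FILE *fp, int *arr, size_t capacity)
{
    return fread(arr, sizeof arr[0], capacity, fp);
}
```

If the file holds three ints and capacity is 8, the call returns 3 and only arr[0] to arr[2] should be used.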

fseek behavior with binary file

I'm working with a binary format.
I've noticed that
fseek(fp, offset, SEEK_SET);
fread(&mystruct, sizeof(struct mystruct_thing), 1, fp);
produces output that's different from simply
fread(&mystruct, sizeof(struct mystruct_thing), 1, fp);
which follows expected behavior.
Why is this the case? Is it because SEEK_SET overrides the offset parameter?
The second argument of fread is the size of each item to be read, in this case the struct. I don't understand how you would expect the offset to go there – it should be something like sizeof(mystruct).
Edit: Now that the question has been edited, the reason why the two pieces of code produce different results is simply that the first one moves the position of fp to offset before reading and the second one doesn't. fread reads sizeof(struct mystruct_thing) bytes starting from the current position of fp, so the starting position differs (assuming offset is not the position you are already at before the fseek), because fseek sets the position for future reads (and writes).
The first fragment will read a struct from offset bytes into the file, the second fragment will read it from the current file position - if the file has just been opened, that will be zero.
The obvious explanation perhaps is that offset is not equal to zero.
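A small sketch of seek-then-read, assuming a made-up struct record in place of struct mystruct_thing and a hypothetical helper read_record_at:

```c
#include <stdio.h>

/* Hypothetical record layout, standing in for struct mystruct_thing. */
struct record {
    int id;
    int value;
};

/* Read one record starting `offset` bytes from the beginning of the file.
   Returns 1 on success, 0 if the seek fails or the record is incomplete. */
int read_record_at(FILE *fp, long offset, struct record *out)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof *out, 1, fp) == 1;
}
```

Calling it with offset 0 reads the first record, and with offset sizeof(struct record) it reads the second, which is the difference the two fragments in the question show.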

How does fread really work?

The declaration of fread is as following:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The question is: Is there a difference in reading performance of two such calls to fread:
char a[1000];
fread(a, 1, 1000, stdin);
fread(a, 1000, 1, stdin);
Will it read 1000 bytes at once each time?
There may or may not be any difference in performance. There is a difference in semantics.
fread(a, 1, 1000, stdin);
attempts to read 1000 data elements, each of which is 1 byte long.
fread(a, 1000, 1, stdin);
attempts to read 1 data element which is 1000 bytes long.
They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.
In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.
There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900.
But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
According to the specification, the two may be treated differently by the implementation.
If your file is less than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes until EOF. On the other hand, the result of fread(a, 1000, 1, stdin) (read 1 1000-byte element) stored in a is unspecified, because there is not enough data to finish reading the 'first' (and only) 1000 byte element.
Of course, some implementations may still copy the 'partial' element into as many bytes as needed.
That would be an implementation detail. In glibc, the two are identical in performance, as it's implemented basically as (Ref http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):
size_t fread (void* buf, size_t size, size_t count, FILE* f)
{
size_t bytes_requested = size * count;
size_t bytes_read = read(f->fd, buf, bytes_requested);
return bytes_read / size;
}
Note that the C and POSIX standards do not guarantee that a complete object of size size will be read every time. If a complete object cannot be read (e.g. stdin only has 999 bytes but you've requested size == 1000), the file will be left in an indeterminate state (C99 §7.19.8.1/2).
Edit: See the other answers about POSIX.
fread calls getc internally. In Minix, the number of times getc is called is simply size*nmemb, so it depends only on the product of the two. Both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will therefore call getc 1000 (= 1000*1) times.
Here is the simple implementation of fread from Minix:
size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream){
    register char *cp = ptr;
    register int c;
    size_t ndone = 0;
    register size_t s;
    if (size)
        while ( ndone < nmemb ) {
            s = size;
            do {
                if ((c = getc(stream)) != EOF)
                    *cp++ = c;
                else
                    return ndone;
            } while (--s);
            ndone++;
        }
    return ndone;
}
There may be no performance difference, but those calls are not the same.
fread returns the number of elements read, so those calls will return different values.
If an element cannot be completely read, its value is indeterminate:
If an error occurs, the resulting value of the file position indicator for the stream is
indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)
There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.
One more sentence from http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is notable:
The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.
In short, in both cases the data will be accessed by fgetc().
I wanted to clarify the answers here. fread performs buffered IO. The actual read block sizes fread uses are determined by the C implementation being used.
All modern C libraries will have the same performance with the two calls:
fread(a, 1, 1000, file);
fread(a, 1000, 1, file);
Even something like:
for (int i=0; i<1000; i++)
a[i] = fgetc(file);
should result in the same disk access patterns, although fgetc would be slower due to more calls into the standard C library and, in some cases, the need for the disk to perform additional seeks that would otherwise have been optimized away.
Getting back to the difference between the two forms of fread. The former returns the actual number of bytes read. The latter returns 0 if the file size is less than 1000, otherwise it returns 1. In both cases the buffer would be filled with the same data, i.e. the contents of the file up to 1000 bytes.
In general, you probably want to keep the 2nd parameter (size) set to 1 such that you get the number of bytes read.
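A sketch of that recommendation, with read_some and the read_result enum being names invented for the example. It keeps size at 1 so the byte count is known, then uses ferror to tell an error from end-of-file on a short read:

```c
#include <stdio.h>

enum read_result { READ_FULL, READ_EOF, READ_ERROR };

/* Try to read `want` bytes; store the number actually read in *got. */
enum read_result read_some(FILE *fp, void *buf, size_t want, size_t *got)
{
    *got = fread(buf, 1, want, fp);
    if (*got == want)
        return READ_FULL;
    /* Short read: either an I/O error or end-of-file was reached */
    return ferror(fp) ? READ_ERROR : READ_EOF;
}
```

For the 900-byte file mentioned earlier and a request of 1000 bytes, this reports READ_EOF with *got set to 900.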

Resources