Why does fread() give seemingly random data? - c

I have the following C code that opens a file in rb+ mode, then writes 100 bytes of value 0. When I read the file with an offset of anything other than 0, I get 96. Why is this?
FILE *fp = fopen("myfile", "rb+");
rewind(fp);
char zero = 0;
fwrite(&zero, 1, 100, fp);
char result;
fseek(fp, 1, SEEK_SET);
fread(&result, 1, 1, fp);
printf("%d\n", result);
I'm on Linux x64 using GCC.

From your clarification in the comments, your intent was to write 100 zero bytes to the file. There are at least two and possibly three ways to do this.
The first is to allocate an array of 100 zero-initialized bytes, and write that:
char zeroes[100] = { 0 };
fwrite(zeroes, sizeof(char) /* == 1 */, sizeof(zeroes), f);
This doesn't scale well if you want to write, say, 10,000 or 10,000,000 zero bytes. You could also do this:
char zero = 0;
for (int i = 0; i < 100; ++i) fwrite(&zero, sizeof(char), 1, f);
This scales better, but performs very badly since it's always more efficient to do a single large write than many tiny writes. Instead, you can seek to a later position in the file, and then write only the last byte. On POSIX systems, this is guaranteed to fill the earlier unwritten portion of the file with zeroes:
char zero = 0;
fseek(f, 99, SEEK_SET);
fwrite(&zero, sizeof(char) /* == 1 */, 1, f);
I believe the zero-fill guarantee is also given for Windows' MSVCRT runtime, but I can't immediately find proof of that on MSDN (this might make a good question). If someone knows whether Windows, other platforms, and/or some version(s) of the C standard itself make or do not make this guarantee, this answer could be improved.
Of course, if you are on a POSIX system and don't need portable code, you can use ftruncate() which makes the same guarantee without even needing to do an fwrite(). Windows has SetEndOfFile() but that function fills the extended portion of the file with undefined values, not zero bytes.

You probably want something like:
int i; for (i=0;i<100;++i){fwrite(&zero, 1, 1, fp);}
You cannot write 100 bytes from a pointer that points to a single char.

Related

Alternatives of fopen, fread for reading from string?

I have a code which integrates with a certain library I cannot modify and reads from file. I need to turn it into reading from string instead of from file.
string str = "I now have the string from file.txt in memory";
// original code:
FILE *file = fopen("file.txt", "rb");
// ...
uint8_t buffer[128];
// ...
var->var1 = buffer;
// ...
var->var2 = fread(buffer, 1, 128, file);
// ...
So a simple question, what's an alternative for fread for reading from string into buffer?
Platform: Windows
If you don't require a FILE *, if you just need to copy counted characters from one buffer (or string) to another, that's what memcpy is for. For example, to a first approximation, you could replace
fread(buffer, 1, 128, file);
with
memcpy(buffer, str, 128);
Now, this will break pretty badly if str does not contain 128 characters. (If your file had less than 128 characters on it, fread would give you less than 128.) So a safer replacement would be
int n = 128;
if(strlen(str) < n) n = strlen(str);
memcpy(buffer, str, n);
There is a function that does precisely what you want: fmemopen. You hand it a pointer to an in-memory string buffer, and a "r" or "w" flag (just like fopen), and it gives you a regular old FILE * that you can read from or write to -- or, in your case, pass to a function that needs a FILE * to read from or write to.
It's available in glibc and therefore in virtually all versions of Linux. I think it's available in some versions of Unix. I don't seem to have it on my Mac. I'm afraid you'll probably have a hard time finding it for Windows.
If i understand your task correctly, you need to get information out of a string as if they would been written from a file. A very comfortable way to solve that is std::stringstream, if you are using C++.

replace bytes in file c

I got some code and I want improve it to find and replace bytes in file
so
I want to find all origbytes in FILE and then replace it with newbytes
then save file, I know how to open, write and save, but hot I can find bytes in char?
FILE* file;
file = fopen("/Users/Awesome/Desktop/test", "r+b");
int size = sizeof(file)+1;
char bytes [size];
fgets(bytes, size, file);
for (int i=0; i<size; i++){
char origbytes [] = {0x00, 0x00};
char newbytes [] = {0x11, 0x11};
if (strcmp(bytes[i], origbytes)) //Here the problem
{
fseek(file, i, SEEK_SET);
fwrite(newbytes, sizeof(newbytes), 1, file);
}
}
fclose(file);
strcmp() is for string compare and not character compare. Two characters can be compared directly
if ( bytes[i] == origbytes[something] )
Also you you should not apply sizeof() on a file pointer to determine file size. You should seek to the end of file using fseek and then query ftell except for binary files. For binary files, use something like fstat
One more thing to note is that fgets returns much before the EOF if it sees a newline. Hence in your code, you may not read entire file contents even after doing the changes that we suggested. You need to use fread appropriately
Strings are null terminated in the C standard library. Your search data is effectively a zero length string. You want memcmp instead.
memcmp (&bytes [i], origBytes, 2)
Firstly sizeof(file) + 1 just returns you the size of a pointer + 1. I don't think you need this for the size of the file. Use this: How do you determine the size of a file in C?
Then since you compare bytes (more or less smae as char) you simply compare using =
you can use fseek and then ftell functions to get the file size, not sizeof.

Go to a certain point of a binary file in C (using fseek) and then reading from that location (using fread)

I am wondering if this is the best way to go about solving my problem.
I know the values for particular offsets of a binary file where the information I want is held...What I want to do is jump to the offsets and then read a certain amount of bytes, starting from that location.
After using google, I have come to the conclusion that my best bet is to use fseek() to move to the position of the offset, and then to use fread() to read an amount of bytes from that position.
Am I correct in thinking this? And if so, how is best to go about doing so? i.e. how to incorporate the two together.
If I am not correct, what would you suggest I do instead?
Many thanks in advance for your help.
Matt
Edit:
I followed a tutorial on fread() and adjusted it to the following:
`#include <stdio.h>
int main()
{
FILE *f;
char buffer[11];
if (f = fopen("comm_array2.img", "rt"))
{
fread(buffer, 1, 10, f);
buffer[10] = 0;
fclose(f);
printf("first 10 characters of the file:\n%s\n", buffer);
}
return 0;
}`
So I used the file 'comm_array2.img' and read the first 10 characters from the file.
But from what I understand of it, this goes from start-of-file, I want to go from some-place-in-file (offset)
Is this making more sense?
Edit Number 2:
It appears that I was being a bit dim, and all that is needed (it would seem from my attempt) is to put the fseek() before the fread() that I have in the code above, and it seeks to that location and then reads from there.
If you are using file streams instead of file descriptors, then you can write yourself a (simple) function analogous to the POSIX pread() system call.
You can easily emulate it using streams instead of file descriptors1. Perhaps you should write yourself a function such as this (which has a slightly different interface from the one I suggested in a comment):
size_t fpread(void *buffer, size_t size, size_t mitems, size_t offset, FILE *fp)
{
if (fseek(fp, offset, SEEK_SET) != 0)
return 0;
return fread(buffer, size, nitems, fp);
}
This is a reasonable compromise between the conventions of pread() and fread().
What would the syntax of the function call look like? For example, reading from the offset 732 and then again from offset 432 (both being from start of the file) and filestream called f.
Since you didn't say how many bytes to read, I'm going to assume 100 each time. I'm assuming that the target variables (buffers) are buffer1 and buffer2, and that they are both big enough.
if (fpread(buffer1, 100, 1, 732, f) != 1)
...error reading at offset 732...
if (fpread(buffer2, 100, 1, 432, f) != 1)
...error reading at offset 432...
The return count is the number of complete units of 100 bytes each; either 1 (got everything) or 0 (something went awry).
There are other ways of writing that code:
if (fpread(buffer1, sizeof(char), 100, 732, f) != 100)
...error reading at offset 732...
if (fpread(buffer2, sizeof(char), 100, 432, f) != 100)
...error reading at offset 432...
This reads 100 single bytes each time; the test ensures you got all 100 of them, as expected. If you capture the return value in this second example, you can know how much data you did get. It would be very surprising if the first read succeeded and the second failed; some other program (or thread) would have had to truncate the file between the two calls to fpread(), but funnier things have been known to happen.
1 The emulation won't be perfect; the pread() call provides guaranteed atomicity that the combination of fseek() and fread() will not provide. But that will seldom be a problem in practice, unless you have multiple processes or threads concurrently updating the file while you are trying to position and read from it.
It frequently depends on the distance between the parts you care about. If you're only skipping over/ignoring a few bytes between the parts you care about, it's often easier to just read that data and ignore what you read, rather than using fseek to skip past it. A typical way to do this is define a struct holding both the data you care about, and place-holders for the ones you don't care about, read in the struct, and then just use the parts you care about:
struct whatever {
long a;
long ignore;
short b;
} w;
fread(&w, 1, sizeof(w), some_file);
// use 'w.a' and 'w.b' here.
If there's any great distance between the parts you care about, though, chances are that your original idea of using fseek to get to the parts that matter will be simpler.
Your theory sounds correct. Open, seek, read, close.
Create a struct to for the data you want to read and pass a pointer to read() of struct's allocated memory. You'll likely need #pragma pack(1) or similar on the struct to prevent misalignment problems.

How does fread really work?

The declaration of fread is as following:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The question is: Is there a difference in reading performance of two such calls to fread:
char a[1000];
fread(a, 1, 1000, stdin);
fread(a, 1000, 1, stdin);
Will it read 1000 bytes at once each time?
There may or may not be any difference in performance. There is a difference in semantics.
fread(a, 1, 1000, stdin);
attempts to read 1000 data elements, each of which is 1 byte long.
fread(a, 1000, 1, stdin);
attempts to read 1 data element which is 1000 bytes long.
They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.
In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.
There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900.
But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
According to the specification, the two may be treated differently by the implementation.
If your file is less than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes until EOF. On the other hand, the result of fread(a, 1000, 1, stdin) (read 1 1000-byte element) stored in a is unspecified, because there is not enough data to finish reading the 'first' (and only) 1000 byte element.
Of course, some implementations may still copy the 'partial' element into as many bytes as needed.
That would be implementation detail. In glibc, the two are identical in performance, as it's implemented basically as (Ref http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):
size_t fread (void* buf, size_t size, size_t count, FILE* f)
{
size_t bytes_requested = size * count;
size_t bytes_read = read(f->fd, buf, bytes_requested);
return bytes_read / size;
}
Note that the C and POSIX standard does not guarantee a complete object of size size need to be read every time. If a complete object cannot be read (e.g. stdin only has 999 bytes but you've requested size == 1000), the file will be left in an interdeterminate state (C99 ยง7.19.8.1/2).
Edit: See the other answers about POSIX.
fread calls getc internally. in Minix number of times getc is called is simply size*nmemb so how many times getc will be called depends on the product of these two. So Both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will run getc 1000=(1000*1) Times.
Here is the siimple implementation of fread from Minix
size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream){
register char *cp = ptr;
register int c;
size_t ndone = 0;
register size_t s;
if (size)
while ( ndone < nmemb ) {
s = size;
do {
if ((c = getc(stream)) != EOF)
*cp++ = c;
else
return ndone;
} while (--s);
ndone++;
}
return ndone;
}
There may be no performance difference, but those calls are not the same.
fread returns the number of elements read, so those calls will return different values.
If an element cannot be completely read, its value is indeterminate:
If an error occurs, the resulting value of the file position indicator for the stream is
indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)
There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.
One more sentence form http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is notable
The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.
Inshort in both case data will be accessed by fgetc()...!
I wanted to clarify the answers here. fread performs buffered IO. The actual read block sizes fread uses are determined by the C implementation being used.
All modern C libraries will have the same performance with the two calls:
fread(a, 1, 1000, file);
fread(a, 1000, 1, file);
Even something like:
for (int i=0; i<1000; i++)
a[i] = fgetc(file)
Should result in the same disk access patterns, although fgetc would be slower due to more calls into the standard c libraries and in some cases the need for a disk to perform additional seeks which would have otherwise been optimized away.
Getting back to the difference between the two forms of fread. The former returns the actual number of bytes read. The latter returns 0 if the file size is less than 1000, otherwise it returns 1. In both cases the buffer would be filled with the same data, i.e. the contents of the file up to 1000 bytes.
In general, you probably want to keep the 2nd parameter (size) set to 1 such that you get the number of bytes read.

Reading a binary file 1 byte at a time

I am trying to read a binary file in C 1 byte at a time and after searching the internet for hours I still can not get it to retrieve anything but garbage and/or a seg fault. Basically the binary file is in the format of a list that is 256 items long and each item is 1 byte (an unsigned int between 0 and 255). I am trying to use fseek and fread to jump to the "index" within the binary file and retrieve that value. The code that I have currently:
unsigned int buffer;
int index = 3; // any index value
size_t indexOffset = 256 * index;
fseek(file, indexOffset, SEEK_SET);
fread(&buffer, 256, 1, file);
printf("%d\n", buffer);
Right now this code is giving me random garbage numbers and seg faulting. Any tips as to how I can get this to work right?
Your confusing bytes with int. The common term for a byte is an unsigned char. Most bytes are 8-bits wide. If the data you are reading is 8 bits, you will need to read in 8 bits:
#define BUFFER_SIZE 256
unsigned char buffer[BUFFER_SIZE];
/* Read in 256 8-bit numbers into the buffer */
size_t bytes_read = 0;
bytes_read = fread(buffer, sizeof(unsigned char), BUFFER_SIZE, file_ptr);
// Note: sizeof(unsigned char) is for emphasis
The reason for reading all the data into memory is to keep the I/O flowing. There is an overhead associated with each input request, regardless of the quantity requested. Reading one byte at a time, or seeking to one position at a time is the worst case.
Here is an example of the overhead required for reading 1 byte:
Tell OS to read from the file.
OS searches to find the file location.
OS tells disk drive to power up.
OS waits for disk drive to get up to speed.
OS tells disk drive to position to the correct track and sector.
-->OS tells disk to read one byte and put into drive buffer.
OS fetches data from drive buffer.
Disk spins down to a stop.
OS returns 1 byte to your program.
In your program design, the above steps will be repeated 256 times. With everybody's suggestion, the line marked with "-->" will read 256 bytes. Thus the overhead is executed only once instead of 256 times to get the same quantity of data.
In your code you are trying to read 256 bytes to the address of one int. If you want to read one byte at a time, call fread(&buffer, 1, 1, file); (See fread).
But a simpler solution will be to declare an array of bytes, read it all together and process it after that.
unsigned char buffer; // note: 1 byte
fread(&buffer, 1, 1, file);
It is time to read mans I believe.
Couple of problems with the code as it stands.
The prototype for fread is:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
You've set the size to 256 (bytes) and the count to 1. That's fine, that means "read one lump of 256 bytes, shove it into the buffer".
However, your buffer is on the order of 2-8 bytes long (or, at least, vastly smaller than 256 bytes), so you have a buffer overrun. You probably want to use fred(&buffer, 1, 1, file).
Furthermore, you're writing byte data to an int pointer. This will work on one endian-ness (small-endian, in fact), so you'll be fine on Intel architecture and from that learn bad habits tha WILL come back and bite you, one of these days.
Try real hard to only write byte data into byte-organised storage, rather than into ints or floats.
You are trying to read 256 bytes into a 4-byte integer variable called "buffer". You are overwriting the next 252 bytes of other data.
It seems like buffer should either be unsigned char buffer[256]; or you should be doing fread(&buffer, 1, 1, f) and in that case buffer should be unsigned char buffer;.
Alternatively, if you just want a single character, you could just leave buffer as int (unsigned is not needed because C99 guarantees a reasonable minimum range for plain int) and simply say:
buffer = fgetc(f);

Resources