How does fread read different data blocks in a binary file in C?

I'm porting some C code to C#. I know little about C, though I'm happy to learn new programming languages. Even so, I wasn't able to figure out the exact behaviour of one line in the code I'm porting.
I've read about fread() on the web.
fread(&(targetObj->data), sizeof(TestObj), 1, file);
Now, file is a big binary file with lots of data in it.
What I want to know is how I can do this in C#.
Let me explain:
I think that line of code does this:
TestObj is an unsigned short
it reads, one time, a chunk of data the size of TestObj (unsigned short)
it reads it from file (which is a pointer to a binary file on the filesystem) into targetObj->data
What I don't understand is:
I have a big binary file; what does it actually read? Are there headers somewhere that define where an unsigned short sized chunk of data is written?
Where in the binary file does it take that object from? How can I know how to read it back from the binary file in C#? Maybe C knows where to pick that single unsigned short, but I don't in C#.
For example, if that binary file has 40 unsigned shorts saved in it, does the C code line above read just the first one?
and if I do
fread(&(targetObj->data), sizeof(TestObj), 5, file);
is it expected that targetObj->data is an array of 5 unsigned shorts?
And will the code read the first 5 unsigned shorts that it finds in the whole binary file?
I can't wrap my head around this. I need to know how C recognizes that unsigned short in a big binary file whose content I don't know, and I can't think how to tell C# to read the first C unsigned short from that file.

fread just reads the specified number of bytes from the current file cursor position, and advances the file cursor (or "file pointer", but not to be confused with a C pointer).
So if sizeof(TestObj) is 2, it will read two bytes and place them into the location pointed to by &(targetObj->data), with no bounds checking, and regardless of any differences between your architecture's endianness and the file protocol's endianness. Note that this approach is not a platform-independent way of parsing files containing numbers in binary form, since the number might be stored differently on your machine, compared to how it is stored inside the file (by whoever designed the binary protocol you are trying to read).
In C#, you might achieve a similar thing by manually specifying struct packing and field placement, although the code will suffer from the same problems as your C code.
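One way to sidestep the endianness concern raised above is to read raw bytes and assemble the value explicitly. The following is a minimal sketch, assuming the file stores the value as a 16-bit unsigned integer in little-endian order (the question does not state the actual file format). The helper name read_u16_le is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads one 16-bit unsigned value stored
   little-endian (least significant byte first) from the current
   position of fp. Returns 1 on success, 0 on EOF or read error. */
int read_u16_le(FILE *fp, uint16_t *out)
{
    unsigned char b[2];

    assert(fp != NULL && out != NULL);
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;
    /* Assemble from individual bytes so the result does not
       depend on the byte order of the host machine. */
    *out = (uint16_t)(b[0] | (b[1] << 8));
    return 1;
}
```

On the C# side, BinaryReader.ReadUInt16 reads a 2-byte unsigned integer in little-endian order, so it would be a natural counterpart if the file really is little-endian.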

fread reads from the current position in the stream; see also ftell and fseek. The equivalent in C# would be Stream.Read.
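To make the role of the stream position concrete, here is a small sketch that seeks to a byte offset and then reads a number of unsigned shorts from there. The helper name read_shorts_at is hypothetical, and the values are read in the host's native byte order, which is an assumption about the file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads `count` unsigned shorts starting at byte
   `offset` of fp. Returns the number of complete items read. */
size_t read_shorts_at(FILE *fp, long offset, unsigned short *dst, size_t count)
{
    assert(fp != NULL && dst != NULL);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    /* fread starts at the current position and advances it
       by the number of bytes it consumed. */
    return fread(dst, sizeof *dst, count, fp);
}
```

Without the fseek, consecutive fread calls simply continue where the previous one stopped: reading 1 item and then 5 items from the start of a file yields items 0 and then 1 to 5, and ftell reports 6 * sizeof(unsigned short) afterwards.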

From man fread
size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
The function fread() reads nitems objects, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
sizeof(short) is resolved by the compiler, as per https://stackoverflow.com/a/14171152/6204612
And C does not do any pretty conversions for you. What is read is precisely sizeof(short) bytes, and these bytes are stored at the address passed as ptr (here, targetObj->data). Whether that is correct or not is the implementer's responsibility. You need to manage offsets, collection sizes etc. on your own.
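Since managing sizes is the caller's job, it helps to check how many items fread actually delivered. The sketch below wraps fread and reports whether the full count arrived; the enum and the function name read_exact are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes for this sketch. */
enum read_status { READ_OK, READ_SHORT, READ_ERROR };

/* Reads nitems objects of `size` bytes each into dst and stores the
   number of complete objects read in *got. */
enum read_status read_exact(FILE *fp, void *dst, size_t size,
                            size_t nitems, size_t *got)
{
    assert(fp != NULL && dst != NULL && got != NULL);
    *got = fread(dst, size, nitems, fp);
    if (*got == nitems)
        return READ_OK;
    /* Fewer items than requested: either a read error occurred
       or the stream ended early. */
    return ferror(fp) ? READ_ERROR : READ_SHORT;
}
```

For a file that holds only 3 unsigned shorts, asking for 5 gives READ_SHORT with *got set to 3; the remaining 2 elements of the destination are left untouched.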

Related

How do fread and fwrite distinguish between different data (types) in C?

I am working with a program in C (on Ubuntu with bash) and using it to manipulate binary data files. First of all, when I use fopen(filename, "w") it creates a file but without any extension. However, when I use vim filename it opens it up in some binary form.
For this question, when I use fwrite(array, sizeof(some struct), # of structs, filePointer) it writes (which I am not sure how in binary) into the file. When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer) it somehow magically knows how to read each struct in binary form and puts it into the array just by knowing its size and how much to read. What happens if I put a decimal value less than the number of structs there are in the # of structs parameter? How would fread know what to read correctly? How does it work in reading data just by looking at the sizes and not knowing what type of data it is?
fwrite writes the bytes of the memory where the object is stored to the output stream and fread reads bytes from the input stream into the memory whose address it gets as an argument. No assumption is made regarding the types and representations of the C objects stored in this memory.
Hence a number of problems can occur:
the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, possibly even depending on compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
the mode for accessing the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents such as what happens for text files on legacy systems. For example text mode on MS-Windows causes 0A bytes to convert to 0D 0A sequences on output and 0D bytes to be stripped on input, resulting in different contents for isolated 0D bytes in the initial content.
if the C structure contains pointers, the bytes written to the output represent the value of these pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
if the C structure has a flexible array at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
the C structure may contain padding between members, causing the output file to contain non-deterministic bytes, which might be a problem in some circumstances.
if the C structure has arrays with only partial meaningful contents, such as char arrays containing C strings, beware that fwrite will write the bytes beyond the null terminator, which should not be meaningful, but might be sensitive information such as password fragments or other meaningful data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
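The padding point in the list above can be checked directly with sizeof and offsetof. This is a small sketch using a hypothetical struct record; the exact numbers it reports depend on the compiler and target, so none are assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct record {      /* hypothetical example type */
    char tag;
    int  value;
};

/* Number of bytes in struct record that belong to no member.
   fwrite(&r, sizeof r, 1, fp) would write these bytes too. */
size_t record_padding_bytes(void)
{
    return sizeof(struct record) - (sizeof(char) + sizeof(int));
}

/* Prints where each member sits inside the struct. */
void print_record_layout(void)
{
    printf("sizeof(struct record) = %zu\n", sizeof(struct record));
    printf("offset of tag         = %zu\n", offsetof(struct record, tag));
    printf("offset of value       = %zu\n", offsetof(struct record, value));
    printf("padding bytes         = %zu\n", record_padding_bytes());
}
```

On many common platforms value is aligned to 4 bytes, so the struct is larger than the sum of its members, but that is a property of the particular compiler and settings.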
For all the above reasons and other ones, reading/writing binary data is to be reserved to very specific cases where the programmer knows exactly what is happening. For other purposes, saving as text files in human readable form is much preferred.
In the question comments, from @David C. Rankin:
"Well, fread/fwrite read and write bytes (binary data - if you write out then read in the same number of bytes -- you get the same thing back). If you want to read and write text where you need to worry about line-breaks, etc.., fgets/fputs. or fprintf"
So I guess I can never know what I read in with fread unless I know what I wrote to it with fwrite?
"Right, look at the type for your buffer in fwrite(3) - Linux man page it is type void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write. (obviously you know what it is writing) The same for fread -- it just reads bytes -- you have to know what you are reading (or at least the format of it). That's what binary I/O is about, it's all just bytes -- it's up to you, the Programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted-I/O and lines, words, etc.."

Reading data from a file to a struct in C

Let's say I used the fread function to read data from a file into a struct. How exactly is the data read into the struct? Let's say my struct has the following:
int x;
char c;
Will the first 4 bytes read go into x and the next byte go into c?
And if I read in more bytes than the elements in my struct can hold, what's going to happen?
Will the first 4 bytes read go into x and the next byte go into c?
Yes, unless your compiler has extremely strange padding rules (e.g. every member must be 8 byte aligned). And assuming int is 4 bytes and char is 1 byte.
And if I read in more bytes than the elements in my struct can hold, what's going to happen?
That's undefined behavior, unless perhaps the over-long write is not more than sizeof(YourStruct) in which case you'll only be writing to the padding bytes (which on a lot of platforms will be 3 bytes after the char).
fread reads data byte-for-byte from a file (stream) into memory. Therefore, if what you're trying to read is a struct, the byte layout of the struct in the file must exactly match the layout your compiler has chosen for the struct in memory.
So the question of "How does fread read from a file?" really boils down to, "How does the compiler lay out structs in memory?"
And the answer to that question is, it's partly determined by the rules of the C language, and it's partly up to the compiler.
So if you want to read structures from a file, you have three choices:
1. Learn everything you can about the C rules for laying out structures in memory, and the choices compilers can make in interpreting these rules. Keep all these rules in mind as you design your structures and your data file formats. (This is not an impossible task. Many programmers take this approach to file i/o all the time.)
2. Don't worry about the layout too much. Define your structures, and write them out to files using fwrite. Then the files are automatically readable using fread -- at least, as long as the program doing the reading is running on the same kind of machine, and was compiled by the same compiler using the same settings. (This, too, is a popular strategy, and works much of the time.)
3. Don't use fread to read structures from a file. (And although it sounds defeatist, this is my own preferred approach.)
There's much, much more that could be said about this question. If you choose approach 1, as I've already said, you're going to have to learn everything you can about the C rules for laying out structures in memory, and the choices compilers can make in interpreting these rules. If you choose approach 3, you have to learn some decent techniques for doing so without using fwrite and fread. But I'm not going to launch into long explanations of either of those topics here. I'm sure someone else will post some links, or you could start with Chapter 17 of these C programming notes.
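As one possible illustration of the third choice (not the answerer's own code), the struct from the question can be written and read one field at a time through a byte buffer, so the file layout no longer depends on the compiler's padding. The sketch assumes a file format of 4 little-endian bytes for x followed by 1 byte for c; the names struct item, write_item and read_item are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct item {            /* hypothetical in-memory type */
    int32_t x;
    char    c;
};

#define ITEM_FILE_SIZE 5 /* 4 bytes for x + 1 byte for c, no padding */

/* Writes one item in the assumed file format. Returns 1 on success. */
int write_item(FILE *fp, const struct item *in)
{
    uint32_t ux = (uint32_t)in->x;
    unsigned char b[ITEM_FILE_SIZE];

    b[0] = (unsigned char)(ux & 0xFFu);
    b[1] = (unsigned char)((ux >> 8) & 0xFFu);
    b[2] = (unsigned char)((ux >> 16) & 0xFFu);
    b[3] = (unsigned char)((ux >> 24) & 0xFFu);
    b[4] = (unsigned char)in->c;
    return fwrite(b, 1, sizeof b, fp) == sizeof b;
}

/* Reads one item in the assumed file format. Returns 1 on success. */
int read_item(FILE *fp, struct item *out)
{
    unsigned char b[ITEM_FILE_SIZE];
    uint32_t ux;

    assert(fp != NULL && out != NULL);
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;
    ux = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
         ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    /* Conversion of values above INT32_MAX is implementation-defined;
       two's complement compilers such as GCC and Clang wrap as expected. */
    out->x = (int32_t)ux;
    out->c = (char)b[4];
    return 1;
}
```

Each record takes exactly 5 bytes in the file, whatever sizeof(struct item) happens to be in memory.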

Is there a limitation on blocksize for reading when using fread in C?

I am currently programming an application for smartphones using C++ and the NDK. For reading external files, I use fread. This works well on Windows; however, on Android phones, I got a mess with my implementation of the deflate decompressor. Of course I thought there was something wrong with my implementation of deflate, but it didn't really make sense as everything worked perfectly on Windows machines. After hours, I was finally able to track down the problem to fread.
I am reading a file of size 4790954 and the return value of fread is also 4790954. However, I discovered that the buffer starts to contain trash at offset 4194304, exactly 4MB. Is there any known limitation on the blocksize that can be read at once, defined in ANSI C, that I am not aware of? Also, isn't it considered a bug if the Google NDK fread function returns 4790954 as the number of bytes read when it only read 4194304 bytes (4MB)?
Is there a limitation on blocksize for reading when using fread in C?
Not per se. The limitation is implied based on the data types. Android is 32-bit only, so size_t is 32-bits. There's also a potential for wrap leading to a smaller read size because of object_size * number_of_objects (these are unsigned values, so they wrap rather than overflow).
From The Open Group Base Specifications Issue 6 and fread:
size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
And the description:
The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.
What does ferror(fp) return after the read? Is there an error?
Related: you might want to have a quick look at the answer to Using fread properly. I'm not claiming there's a problem in your usage of fread or fgetc. But there's no code, so we can't tell.
I know it's quite some time ago, but looking around my profile I realized that this question remained open even though I found a solution to the problem a long time ago. The error was rather stupid: I didn't open the file in binary mode using fopen, which led to this strange behaviour...
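A minimal sketch of the fix described above: open the file with "rb" so the C library performs no text-mode translation, and check ferror after the read. The function name read_file_binary is hypothetical, and returning 0 on failure is a simplification that cannot distinguish an error from an empty file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads up to `cap` bytes of the file at `path`
   into buf. Returns the number of bytes read, or 0 if the file could
   not be opened or a read error occurred. */
size_t read_file_binary(const char *path, unsigned char *buf, size_t cap)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && buf != NULL);
    fp = fopen(path, "rb");   /* "b" requests binary mode */
    if (fp == NULL)
        return 0;
    n = fread(buf, 1, cap, fp);
    if (ferror(fp))
        n = 0;
    fclose(fp);
    return n;
}
```

On systems that do not distinguish text and binary streams the "b" has no effect, so adding it is harmless and keeps the code portable to systems where the distinction matters.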

Reading a binary file bit by bit

I know the function below:
size_t fread(void *ptr, size_t size_of_elements, size_t number_of_elements, FILE *a_file);
It only reads byte by byte, my goal is to be able to read 12 bits at a time and then take them into an array. Any help or pointers would be greatly appreciated!
Adding to the first comment, you can try reading one byte at a time (declare a char variable and write there), and then use the bitwise operators >> and << to read bit by bit. Read more here: http://www.cprogramming.com/tutorial/bitwise_operators.html
Many years ago, I wrote some I/O routines in C for a Huffman encoder. This needs to be able to read and write on the granularity of bits rather than bytes. I created functions analogous to read(2) and write(2) that could be asked to (say) read 13 bits from a stream. To encode, for example, bytes would be fed into the coder and variable numbers of bits would emerge the other side. I had a simple structure with a bit pointer into the current byte being read or written. Every time it went off the end, it flushed the completed byte out and reset the pointer to zero. Unfortunately that code is long gone, but it might be an idea to pull apart an open-source Huffman coder and see how the problem was solved there. Similarly, base64 coding takes 3 bytes of data and turns them into 4 (or vice versa).
I've implemented a couple of methods to read/write files bit by bit. Here they are. Whether it is viable or not for your use case, you have to decide that for yourself. I've tried to make the most readable, optimized code i could, not being a seasoned C developer (for now).
Internally, it uses a "bitCursor" to store information about previous bits that don't yet fit a full byte. It has two data fields: d stores data and s stores size, or the amount of bits stored in the cursor.
You have four functions:
newBitCursor(), which returns a bitCursor object with default values {0,0}. Such a cursor is needed at the beginning of a sequence of read/write operations to or from a file.
fwriteb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which writes sizeb rightmost bits of the value stored in ptr to fp.
fcloseb(FILE *fp, bitCursor *c), which writes a remaining byte, if the previous writes did not exactly encapsulate all data needed to be written, that is probably almost always the case...
freadb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which reads sizeb bits and bitwise ORs them to *ptr. (It is, therefore, your responsibility to init *ptr as 0.)
More info is provided in the comments. Have Fun!
Edit: It has come to my knowledge today that when i made that i assumed Little Endian! :P Oops! It's always nice to realize how much of a noob i still am ;D.
Edit: GNU's Binary File Descriptors.
Read the first two bytes from your a_file file pointer and check the bits in the least or greatest byte — depending on the endianness of your platform (x86 is little-endian) — using bitshift operators.
You can't really put bits into an array, as there isn't a datatype for bits. Rather than keeping 1's and 0's in an array, which is inefficient, it seems cheaper just to keep the two bytes in a two-element array (say, of type unsigned char) and write functions to map those two bytes to one of 4096 (2^12) values-of-interest.
As a complication, on subsequent reads, if you want to fread through the pointer every 12 bits, you would read only one byte, using the left-over bits from the previous read to build a new 12-bit value. If there are no leftovers, you would need to read two bytes.
Your mapping functions would need to address the second case where bits were used from previous read, because the two bytes would need different mapping. To do this efficiently, a modulus on a read-counter could be used to swap between two mappings.
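The leftover-bits bookkeeping described above can be sketched with a small accumulator instead of two alternating mappings. This is only an illustration under an assumed bit order (most significant bit first within the stream); struct bit_reader and read12 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reader state: bits left over from bytes already read. */
struct bit_reader {
    FILE    *fp;
    uint32_t acc;    /* leftover bits, right-aligned */
    unsigned nbits;  /* number of valid bits in acc */
};

/* Reads the next 12 bits, most significant bit first.
   Returns 1 on success, 0 if the stream ends first. */
int read12(struct bit_reader *r, uint16_t *out)
{
    assert(r != NULL && r->fp != NULL && out != NULL);
    /* Pull in whole bytes until at least 12 bits are available:
       two bytes when nothing is left over, one byte otherwise. */
    while (r->nbits < 12) {
        int ch = fgetc(r->fp);
        if (ch == EOF)
            return 0;
        r->acc = (r->acc << 8) | (uint32_t)ch;
        r->nbits += 8;
    }
    r->nbits -= 12;
    *out = (uint16_t)((r->acc >> r->nbits) & 0xFFFu);
    r->acc &= (1u << r->nbits) - 1u;   /* keep only the unread bits */
    return 1;
}
```

With the bytes AB CD EF in the file, a reader initialised as { fp, 0, 0 } yields 0xABC and then 0xDEF, and a third call reports the end of the stream.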
Read 2 bytes and use bitwise operations to extract the 12 bits. For the next value, start from the second byte and apply the bitwise operations again to get the expected result.
For your problem, see the demo program below, which reads a whole struct at a time (sizeof(NODE) bytes, typically 4 rather than 2 because the bit-field lives in an int-sized storage unit) although only 12 bits carry information. Bit-fields like this are one way to get bit-level access.
fread() and fwrite() are standard library functions that take their size argument in bytes, as a size_t, so it is not possible to read exactly 12 bits. If you create the file yourself, writing it and reading it back as shown below solves your problem.
If it is a special file that was not written by you, follow the format specification for that file; I think such files are written in a similar way. Or you can post the exact format and I may be able to help.
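#include <stdio.h>
#include <stdlib.h>

/* A 12-bit bit-field; the struct still occupies sizeof(NODE) bytes. */
struct node
{
    unsigned int data : 12;   /* unsigned so that 2048 fits (range 0..4095) */
} NODE;

int main(void)
{
    FILE *fp;

    fp = fopen("t", "wb");    /* binary mode: no text translation */
    if (fp == NULL)
        return EXIT_FAILURE;
    NODE.data = 1024;
    printf("%u\n", (unsigned)NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);
    NODE.data = 2048;
    printf("%u\n", (unsigned)NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);
    fclose(fp);

    fp = fopen("t", "rb");
    if (fp == NULL)
        return EXIT_FAILURE;
    /* Each fread fills the whole struct, not just 12 bits. */
    if (fread(&NODE, sizeof(NODE), 1, fp) == 1)
        printf("%u\n", (unsigned)NODE.data);
    if (fread(&NODE, sizeof(NODE), 1, fp) == 1)
        printf("%u\n", (unsigned)NODE.data);
    fclose(fp);
    return EXIT_SUCCESS;
}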

What's a good coding style for reading different bits of data from a binary file in C?

I'm novice programmer and am writing a simple wav-player in C as a pet project. Part of the file loading process requires reading specific data (sampling rate, number of channels,...) from the file header.
Currently what I'm doing is similar to this:
Scan for a sequence of bytes and skip past it
Read 2 bytes into variable a
Check value and return on error
Skip 4 bytes
Read 4 bytes into variable b
Check value and return on error
...and so on. (code see: https://github.com/qgi/Player/blob/master/Importer.c)
I've written a number of helper functions to do the scanning/skipping/reading bit. Still, I'm repeating the reading, checking, skipping part several times, which seems neither very effective nor very smart. It's not a real issue for my project, but as this seems to be quite a common task when handling binary files, I was wondering:
Is there some kind of a pattern on how to do this more effectively with cleaner code?
Most often, people define structs (often with something like #pragma pack(1) to prevent padding) that match the file's structures. They then read data into an instance of such a struct with something like fread, and use the values from the struct.
The cleanest option that I've come across is the scanf-like function unpack presented by Kernighan & Pike on page 219 of The Practice of Programming, which can be used like
// assume we read the file header into buf
// and the header consists of magic (4 bytes), type (2) and length (4).
// "l" == 4 bytes (long)
// "s" == 2 bytes (short)
unpack(buf, "lsl", &magic, &type, &length);
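To show what such a format-driven helper might look like, here is a small sketch. It is not the code from the book: the name unpack_le, the return value, and the choice of little-endian byte order (picked here because WAV headers are little-endian) are all assumptions of this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unpack-style helper, not the book's implementation.
   Format characters: 'c' = 1 byte, 's' = 2 bytes, 'l' = 4 bytes,
   all little-endian. Each character consumes one pointer argument:
   uint8_t *, uint16_t * or uint32_t * respectively.
   Returns the number of bytes consumed, or -1 on an unknown character.
   The caller must ensure buf holds enough bytes for the format. */
int unpack_le(const unsigned char *buf, const char *fmt, ...)
{
    const unsigned char *p = buf;
    va_list args;

    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    for (; *fmt != '\0'; fmt++) {
        switch (*fmt) {
        case 'c':
            *va_arg(args, uint8_t *) = p[0];
            p += 1;
            break;
        case 's':
            *va_arg(args, uint16_t *) = (uint16_t)(p[0] | (p[1] << 8));
            p += 2;
            break;
        case 'l':
            *va_arg(args, uint32_t *) =
                (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
            p += 4;
            break;
        default:
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return (int)(p - buf);
}
```

The call then mirrors the usage above, with fixed-width types for the destinations: unpack_le(buf, "lsl", &magic, &type, &length) fills a uint32_t, a uint16_t and a uint32_t and returns 10.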
For efficiency, reading into a buffer of, say, 4096 bytes and then parsing the data in the buffer would be better, and of course a single-pass parse where you only go forward is the most efficient.
