Reading data from a file to a struct in C

Let's say I used the fread function to read data from a file into a struct. How exactly is the data read into the struct? Let's say my struct has the following:
int x;
char c;
Will the first 4 bytes read go into x and the next byte go into c?
And if I read in more bytes than the members of my struct can hold, what's going to happen?

Will the first 4 bytes read go into x and the next byte go into c?
Yes, unless your compiler has extremely strange padding rules (e.g. every member must be 8-byte aligned), and assuming int is 4 bytes and char is 1 byte.
And if I read in more bytes than the members of my struct can hold, what's going to happen?
That's undefined behavior, unless perhaps the over-long write is no more than sizeof(YourStruct), in which case you'll only be writing into the padding bytes (which on a lot of platforms will be the 3 bytes after the char).
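For concreteness, here is a minimal sketch of what that read looks like, assuming a 4-byte int, a 1-byte char, and a hypothetical input file data.bin:

#include <stdio.h>

struct Record {
    int  x;   /* typically 4 bytes on common platforms */
    char c;   /* 1 byte, usually followed by 3 padding bytes */
};

int main(void)
{
    struct Record r;
    FILE *f = fopen("data.bin", "rb");   /* hypothetical file name */
    if (f == NULL)
        return 1;

    /* Ask for exactly sizeof(struct Record) bytes (padding included),
       so the read can never overrun the struct. */
    if (fread(&r, sizeof r, 1, f) == 1)
        printf("x = %d, c = %c\n", r.x, r.c);

    fclose(f);
    return 0;
}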

fread reads data byte-for-byte from a file (stream) into memory. Therefore, if what you're trying to read is a struct, the byte layout of the struct in the file must exactly match the layout your compiler has chosen for the struct in memory.
So the question of "How does fread read from a file?" really boils down to, "How does the compiler lay out structs in memory?"
And the answer to that question is, it's partly determined by the rules of the C language, and it's partly up to the compiler.
So if you want to read structures from a file, you have three choices:
Learn everything you can about the C rules for laying out structures in memory, and the choices compilers can make in interpreting these rules. Keep all these rules in mind as you design your structures and your data file formats. (This is not an impossible task. Many programmers take this approach to file i/o all the time.)
Don't worry about the layout too much. Define your structures, and write them out to files using fwrite. Then the files are automatically readable using fread -- at least, as long as the program doing the reading is running on the same kind of machine, and was compiled by the same compiler using the same settings. (This, too, is a popular strategy, and works much of the time.)
Don't use fread to read structures from a file. (And although it sounds defeatist, this is the approach I usually argue for.)
There's much, much more that could be said about this question. If you choose approach 1, as I've already said, you're going to have to learn everything you can about the C rules for laying out structures in memory, and the choices compilers can make in interpreting these rules. If you choose approach 3, you have to learn some decent techniques for doing so without using fwrite and fread. But I'm not going to launch into long explanations of either of those topics here. I'm sure someone else will post some links, or you could start with Chapter 17 of these C programming notes.
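To give a flavor of approach 3, here is a minimal sketch that writes and reads the two fields explicitly, in a fixed (little-endian) byte order, instead of dumping the whole struct; the struct and function names are invented for illustration:

#include <stdio.h>
#include <stdint.h>

struct Record { int32_t x; char c; };

/* Write x as 4 little-endian bytes, then c as 1 byte: 5 bytes total,
   regardless of padding or host byte order. */
int write_record(FILE *f, const struct Record *r)
{
    unsigned char buf[5];
    uint32_t u = (uint32_t)r->x;
    buf[0] = u & 0xff;
    buf[1] = (u >> 8) & 0xff;
    buf[2] = (u >> 16) & 0xff;
    buf[3] = (u >> 24) & 0xff;
    buf[4] = (unsigned char)r->c;
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf ? 0 : -1;
}

int read_record(FILE *f, struct Record *r)
{
    unsigned char buf[5];
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return -1;
    r->x = (int32_t)((uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24));
    r->c = (char)buf[4];
    return 0;
}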

Related

Confusion about files in C?

I want to work with files in C. I know that the standard library has some functions to help with that. However, I have some questions. Are the contents of a file just an array of characters, or a combination of ints, chars, doubles and other data types? What are files terminated by in C?
A few things jump to mind such as EOF and NULL but I'm not sure.
C is extremely basic in that the contents of a file are just a byte stream until you write code that interprets them otherwise.
Keep in mind the very concept of something like an int is extremely hazy at best due to endianness issues. While any given C compiler has strong opinions on the form those take, these are architecture-specific and not portable. Over the decades it has meant anything from binary-coded-decimal values, to "words" of varying sizes, and likely even stranger things along the way.
Files aren't "terminated" by anything in C. They're just files. They have zero or more bytes of data, and possibly holes to keep things interesting.
When reading a file you read in either a fixed amount of data, or read until you bump into the "End of File".
Remember, some files may be actively written to, so the EOF position might be constantly changing.
It's also important to keep in mind that NULL in C means a pointer. When talking about the zero byte used as a string terminator, that's often called NUL to avoid confusion, a term captured in the ASCII standard, though it actually pre-dates that standard.
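To illustrate the "read until End of File" point above, here is a small sketch that treats a file purely as a byte stream and counts bytes until EOF (the file name is made up):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("example.bin", "rb");   /* hypothetical file name */
    if (f == NULL)
        return 1;

    int byte;
    long count = 0;
    /* EOF is a value returned by fgetc, not a terminator stored in the file. */
    while ((byte = fgetc(f)) != EOF)
        count++;

    printf("%ld bytes read\n", count);
    fclose(f);
    return 0;
}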

Serializing a C struct to a file portably [duplicate]

I need to serialize a C struct to a file in a portable way, so that I can read the file on other machines and can be guaranteed that I will get the same thing that I put in.
The file format doesn't matter as long as it is reasonably compact (writing out the in-memory representation of a struct would be ideal if it wasn't for the portability issues.)
Is there a clean way to easily achieve this?
You are essentially designing a binary network protocol, so you may want to use an existing library (like Google's protocol buffers). If you still want to design your own, you can achieve reasonable portability of writing raw structs by doing this:
Pack your structs (GCC's __attribute__((packed)), MSVC's #pragma pack). This is compiler-specific.
Make sure your integer endianness is correct (htons, htonl). This is architecture-specific.
Do not use pointers for strings (use character buffers).
Use C99 exact-width integer types (uint32_t etc.).
Ensure that the code only compiles where CHAR_BIT is 8, which is the most common, or otherwise handles transformation of character strings to a stream of 8-bit octets. There are some environments where CHAR_BIT != 8, but they tend to be special-purpose hardware.
With this you can be reasonably sure you will get the same result on the other end as long as you are using the same struct definition. I am not sure about floating-point representation, however, but I usually avoid sending floats.
Another thing, unrelated to portability, that you may want to address is backwards compatibility, by introducing a length as the first field and/or using a version tag.
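A minimal sketch of what that checklist can look like in code; the struct, its field names, and the use of arpa/inet.h for htonl/ntohl are assumptions for illustration (on Windows the equivalents live in winsock2.h):

#include <stdint.h>
#include <arpa/inet.h>    /* htonl/ntohl on POSIX systems */

#pragma pack(push, 1)     /* compiler-specific: no padding between members */
struct wire_record {
    uint32_t length;      /* total record length, handy for versioning */
    uint32_t id;
    char     name[32];    /* character buffer, not a pointer */
};
#pragma pack(pop)

/* Convert integer fields to network byte order before fwrite... */
void record_to_wire(struct wire_record *r)
{
    r->length = htonl(r->length);
    r->id     = htonl(r->id);
    /* name is already a byte array; nothing to swap */
}

/* ...and back to host order after fread. */
void record_from_wire(struct wire_record *r)
{
    r->length = ntohl(r->length);
    r->id     = ntohl(r->id);
}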
You could try using a library such as protocol buffers; rolling your own is probably not worth the effort.
Write one function for output. Use fprintf to print an ASCII representation of each field to the file, one field per line.
Write one function for input. Use fgets to load each line from the file, then use sscanf to convert it to binary, directly into the field in your structure.
If you plan on doing this with a lot of different structures, consider adding a header to each file which identifies what kind of structure it represents.
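A rough sketch of that pattern, using an invented two-field struct (fprintf on output, fgets plus sscanf on input):

#include <stdio.h>

struct Settings { int rate; int channels; };   /* hypothetical struct */

/* One function for output: one field per line, in ASCII. */
int save_settings(FILE *f, const struct Settings *s)
{
    return fprintf(f, "%d\n%d\n", s->rate, s->channels) < 0 ? -1 : 0;
}

/* One function for input: fgets a line, then sscanf it into the field. */
int load_settings(FILE *f, struct Settings *s)
{
    char line[64];
    if (fgets(line, sizeof line, f) == NULL || sscanf(line, "%d", &s->rate) != 1)
        return -1;
    if (fgets(line, sizeof line, f) == NULL || sscanf(line, "%d", &s->channels) != 1)
        return -1;
    return 0;
}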

How do fread and fwrite distinguish between different data (types) in C?

I am working with a program in C (on Ubuntu, using bash) and using it to manipulate binary data files. First of all, when I use fopen(filename, "w") it creates a file, but without any extension. However, when I use vim filename it opens it up in some binary form.
For this question: when I use fwrite(array, sizeof(some struct), # of structs, filePointer) it writes into the file (I am not sure how, in binary). When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer) it somehow magically knows how to read each struct in binary form and puts it into the array, just by knowing its size and how much to read. What happens if I pass a count smaller than the number of structs in the "# of structs" parameter? How would fread know what to read correctly? How does it work, reading data just by looking at the sizes and not knowing what type of data it is?
fwrite writes the bytes of the memory where the object is stored to the output stream and fread reads bytes from the input stream into the memory whose address it gets as an argument. No assumption is made regarding the types and representations of the C objects stored in this memory.
Hence a number of problems can occur:
the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, possibly even depending on compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
the mode for accessing the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents, such as what happens for text files on legacy systems. For example, text mode on MS-Windows causes 0A bytes to be converted to 0D 0A sequences on output and 0D bytes to be stripped on input, resulting in different contents for isolated 0D bytes in the initial content.
if the C structure contains pointers, the bytes written to the output represent the value of these pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
if the C structure has a flexible array member at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
the C structure may contain padding between members, causing the output file to contain non-deterministic bytes, which might be a problem in some circumstances.
if the C structure has arrays with only partially meaningful contents, such as char arrays containing C strings, beware that fwrite will also write the bytes beyond the null terminator; these should not be meaningful, but might contain sensitive information such as password fragments or other leftover data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
For all the above reasons and other ones, reading/writing binary data is to be reserved to very specific cases where the programmer knows exactly what is happening. For other purposes, saving as text files in human readable form is much preferred.
From the question comments by David C. Rankin:
"Well, fread/fwrite read and write bytes (binary data - if you write out then read in the same number of bytes -- you get the same thing back). If you want to read and write text where you need to worry about line-breaks, etc.., fgets/fputs. or fprintf"
So I guess I can never know what I read in with fread unless I know what I wrote to it with fwrite?
"Right, look at the type for your buffer in fwrite(3) - Linux man page it is type void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write. (obviously you know what it is writing) The same for fread -- it just reads bytes -- you have to know what you are reading (or at least the format of it). That's what binary I/O is about, it's all just bytes -- it's up to you, the Programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted-I/O and lines, words, etc.."

C How fread reads different data blocks in a binary file?

I'm porting some C code to C#. I know little about C, but I'm flexible and I can learn new programming languages. Anyway, I wasn't able to figure out the exact behaviour from the code I'm porting.
I've read about fread() and on the web.
fread(&(targetObj->data), sizeof(TestObj), 1, file);
Now, file is a big binary file with lots of data in it.
What I want to know is how I can do this in C#.
Let me explain:
I think that line of code does this:
TestObj is an unsigned short
reads, 1 time, a chunk of data of the size of TestObj (unsigned short)
reads it from file (which is a pointer to a binary file on the filesystem) into targetObj->data
What I don't understand is:
I have a big binary file; what does it actually read? Are there headers somewhere which define where an unsigned-short-sized chunk of data is written?
Where in the binary does it take that object from? How can I know how to read it back from the binary file in C#? Maybe C knows where to pick up that single unsigned short, but I don't in C#.
For example, if that binary file has 40 unsigned shorts saved in it, does the C code line above read just the first one?
and if I do
fread(&(targetObj->data), sizeof(TestObj), 5, file);
is targetObj->data expected to be an array of 5 unsigned shorts?
And the code will read the first 5 unsigned shorts that it finds in the whole binary file?
I can't wrap my head around this, but I need to know how C recognizes that unsigned short in a big binary file whose contents I don't know, nor can I think of how I would tell C# to read the first C unsigned short from that file.
fread just reads the specified number of bytes from the current file cursor position, and advances the file cursor (or "file pointer", but not to be confused with a C pointer).
So if sizeof(TestObj) is 2, it will read two bytes and place them into the location pointed to by &(targetObj->data), with no bounds checking, and regardless of any differences between your architecture's endianness and the file protocol's endianness. Note that this approach is not a platform-independent way of parsing files containing numbers in binary form, since the number might be stored differently on your machine compared to how it is stored inside the file (by whoever designed the binary protocol you are trying to read).
In C#, you might achieve a similar thing by manually specifying struct packing and field placement, although the code will suffer from the same problems as your C code.
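If the file format defines the field as, say, little-endian, a common way to sidestep the endianness problem (sketched here in C; the helper name is made up) is to read the raw bytes and assemble the value yourself:

#include <stdio.h>
#include <stdint.h>

/* Read a 2-byte little-endian field regardless of host byte order. */
static int read_u16_le(FILE *f, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, f) != sizeof b)
        return -1;
    *out = (uint16_t)(b[0] | (b[1] << 8));
    return 0;
}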
fread reads from the current position in the stream; see also ftell and fseek. The equivalent in C# would be Stream.Read.
From man fread
size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
The function fread() reads nitems objects, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
sizeof(short) is resolved by the compiler, as per https://stackoverflow.com/a/14171152/6204612
And C does not do any pretty conversions for you. What is read is precisely sizeof(short) bytes, and these bytes are put into the destination variable. Whether that is correct or not is the implementer's responsibility. You need to manage offsets, collection sizes, etc. on your own.
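In other words, you (not fread) decide where the field lives. A sketch, with an invented file name and an assumed offset of 40 bytes standing in for whatever the file format actually specifies:

#include <stdio.h>

int main(void)
{
    unsigned short value;
    FILE *f = fopen("data.bin", "rb");   /* hypothetical file name */
    if (f == NULL)
        return 1;

    /* Seek to the offset the file format defines for this field... */
    if (fseek(f, 40L, SEEK_SET) == 0 &&
        /* ...then read sizeof(unsigned short) raw bytes from there. */
        fread(&value, sizeof value, 1, f) == 1)
        printf("%u\n", (unsigned)value);

    fclose(f);
    return 0;
}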

What's a good coding style for reading different bits of data from a binary file in C?

I'm a novice programmer and am writing a simple wav-player in C as a pet project. Part of the file-loading process requires reading specific data (sampling rate, number of channels, ...) from the file header.
Currently what I'm doing is similar to this:
Scan for a sequence of bytes and skip past it
Read 2 bytes into variable a
Check value and return on error
Skip 4 bytes
Read 4 bytes into variable b
Check value and return on error
...and so on. (For the code, see: https://github.com/qgi/Player/blob/master/Importer.c)
I've written a number of helper functions to do the scanning/skipping/reading bit. Still, I'm repeating the reading, checking, skipping part several times, which seems to be neither very efficient nor very smart. It's not a real issue for my project, but as this seems to be quite a common task when handling binary files, I was wondering:
Is there some kind of a pattern on how to do this more effectively with cleaner code?
Most often, people define structs (often with something like #pragma pack(1) to guard against padding) that match the file's structures. They then read data into an instance of that with something like fread, and use the values from the struct.
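For a WAV header specifically, that approach might look roughly like the sketch below. The layout follows the canonical RIFF/WAVE "fmt " chunk, assumes no extra chunks before it, and assumes a little-endian machine, so treat it as an illustration rather than a robust parser:

#include <stdio.h>
#include <stdint.h>

#pragma pack(push, 1)            /* match the file layout byte-for-byte */
struct WavHeader {               /* simplified canonical RIFF/WAVE header */
    char     riff[4];            /* "RIFF" */
    uint32_t chunk_size;
    char     wave[4];            /* "WAVE" */
    char     fmt[4];             /* "fmt " */
    uint32_t fmt_size;
    uint16_t audio_format;
    uint16_t num_channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
};
#pragma pack(pop)

int main(void)
{
    struct WavHeader h;
    FILE *f = fopen("song.wav", "rb");   /* hypothetical file name */
    if (f == NULL)
        return 1;
    if (fread(&h, sizeof h, 1, f) == 1)
        printf("%u Hz, %u channel(s)\n",
               (unsigned)h.sample_rate, (unsigned)h.num_channels);
    fclose(f);
    return 0;
}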
The cleanest option that I've come across is the scanf-like function unpack presented by Kernighan & Pike on page 219 of The Practice of Programming, which can be used like
// assume we read the file header into buf
// and the header consists of magic (4 bytes), type (2) and length (4).
// "l" == 4 bytes (long)
// "s" == 2 bytes (short)
unpack(buf, "lsl", &magic, &type, &length);
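For reference, here is a stripped-down sketch of what such an unpack helper might look like. This is my own simplified version, not Kernighan & Pike's code; it assumes the fields in buf are stored big-endian, so in this variant magic and length would be uint32_t and type a uint16_t:

#include <stdarg.h>
#include <stdint.h>

/* 'l' pulls a 4-byte big-endian value, 's' a 2-byte big-endian value. */
static const unsigned char *unpack(const unsigned char *buf, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p == 'l') {
            uint32_t *out = va_arg(ap, uint32_t *);
            *out = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                   ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
            buf += 4;
        } else if (*p == 's') {
            uint16_t *out = va_arg(ap, uint16_t *);
            *out = (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
            buf += 2;
        }
    }
    va_end(ap);
    return buf;   /* position just past the last field read */
}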
Using a buffer of, say, 4096 bytes to read into and then doing your parsing on the data in the buffer would be more efficient, and of course doing a single-scan parse where you only go forward is most efficient.

Resources