Dumping struct in C - c

Is it a good idea to simply dump a struct to a binary file using fwrite?
e.g
struct Foo {
char name[100];
double f;
int bar;
} data;
fwrite(&data,sizeof(data),1,fout);
How portable is it?
I think it's really a bad idea to just throw whatever the compiler gives(padding,integer size,etc...). even if platform portability is not important.
I've a friend arguing that doing so is very common.... in practice.
Is it true???
Edit: What're the recommended way to write portable binary file? Using some sort of library?
I'm interested how this is achieved too.(By specifying byte order,sizes,..?)

That's certainly a very bad idea, for two reasons:
the same struct may have different sizes on different platforms due to alignment issues and compiler mood
the struct's elements may have different representations on different machines (think big-endian/little-endian, IEE754 vs. some other stuff, sizeof(int) on different platforms)

It rather critically matters whether you want the file to be portable, or just the code.
If you're only ever going to read the data back on the same C implementation (and that means with the same values for any compiler options that affect struct layout in any way), using the same definition of the struct, then the code is portable. It might be a bad idea for other reasons: difficulty of changing the struct, and in theory there could be security risks around dumping padding bytes to disk, or bytes after any NUL terminator in that char array. They could contain information that you never intended to persist. That said, the OS does it all the time in the swap file, so whatEVER, but try using that excuse when users notice that your document format doesn't always delete data they think they've deleted, and they just emailed it to a reporter.
If the file needs to be passed between different platforms then it's a pretty bad idea, because you end up accidentally defining your file format to be something like, "whatever MSVC on Win32 ends up writing". This could end up being pretty inconvenient to read and write on some other platform, and certainly the code you wrote in the first place won't do it when running on another platform with an incompatible storage representation of the struct.
The recommended way to write portable binary files, in order of preference, is probably:
Don't. Use a text format. Be prepared to lose some precision in floating-point values.
Use a library, although there's a bit of a curse of choice here. You might think ASN.1 looks all right, and it is as long as you never have to manipulate the stuff yourself. I would guess that Google Protocol Buffers is fairly good, but I've never used it myself.
Define some fairly simple binary format in terms of what each unsigned char in turn means. This is fine for characters[*] and other integers, but gets a bit tricky for floating-point types. "This is a little-endian representation of an IEEE-754 float" will do you OK provided that all your target platforms use IEEE floats. Which I expect they do, but you have to bet on that. Then, assemble that sequence of characters to write and interpret it to read: if you're "lucky" then on a given platform you can write a struct definition that matches it exactly, and use this trick. Otherwise do whatever byte manipulation you need to. If you want to be really portable, be careful not to use an int throughout your code to represent the value taken from bar, because if you do then on some platform where int is 16 bits, it won't fit. Instead use long or int_least32_t or something, and bounds-check the value on writing. Or use uint32_t and let it wrap.
[*] Until you hit an EBCDIC machine, that is. Not that anybody will seriously expect your files to be portable to a machine that plain text files aren't portable to either.

How fond are you of getting a call in the middle of the night? Either use a #pragma to pack them or write them variable by variable.

Yes, this sort of foolishness is very common but that doesn't make it a good idea. You should write each field individually in a specified byte order, that will avoid alignment and byte order problems at the cost of a little tiny bit of extra effort. Reading and writing field by field will also make your life easier when you upgrade your software and have to read your old data format or if the underlying hardware architecture changes.

Related

Are there reasons to avoid bit-field structure members?

I long knew there are bit-fields in C and occasionally I use them for defining densely packed structs:
typedef struct Message_s {
unsigned int flag : 1;
unsigned int channel : 4;
unsigned int signal : 11;
} Message;
When I read open source code, I instead often find bit-masks and bit-shifting operations to store and retrieve such information in hand-rolled bit-fields. This is so common that I do not think the authors were not aware of the bit-field syntax, so I wonder if there are reasons to roll bit-fields via bit-masks and shifting operations your own instead of relying on the compiler to generate code for getting and setting such bit-fields.
Why other programmers use hand-coded bit manipulations instead of bitfields to pack multiple fields into a single word?
This answer is opinion based as the question is quite open:
Many programmers are unaware of the availability of bitfields or unsure about their portability and precise semantics. Some even distrust the compiler's ability to produce correct code. They prefer to write explicit code that they understand.
As commented by Cornstalks, this attitude is rooted in real life experience as explained in this article.
Bitfield's actual memory layout is implementation defined: if the memory layout must follow a precise specification, bitfields should not be used and hand-coded bit manipulations may be required.
The handing of signed values in signed typed bitfields is implementation defined. If signed values are packed into a range of bits, it may be more reliable to hand-code the access functions.
Are there reasons to avoid bitfield-structs?
bitfield-structs come with some limitations:
Bit fields result in non-portable code. Also, the bit field length has a high dependency on word size.
Reading (using scanf()) and using pointers on bit fields is not possible due to non-addressability.
Bit fields are used to pack more variables into a smaller data space, but cause the compiler to generate additional code to manipulate these variables. This results in an increase in both space as well as time complexities.
The sizeof() operator cannot be applied to the bit fields, since sizeof() yields the result in bytes and not in bits.
Source
So whether you should use them or not depends. Read more in Why bit endianness is an issue in bitfields?
PS: When to use bit-fields in C?
There is no reason for it. Bitfields are useful and convenient. They are in the common use in the embedded projects. Some architectures (like ARM) have even special instructions to manipulate bitfields.
Just compare the code (and write the rest of the function foo1)
https://godbolt.org/g/72b3vY
In many cases, it is useful to be able to address individual groups of bits within a word, or to operate on a word as a unit. The Standard presently does not provide
any practical and portable way to achieve such functionality. If code is written to use bitfields and it later becomes necessary to access multiple groups as a word, there would be no nice way to accommodate that without reworking all the code using the bit fields or disabling type-based aliasing optimizations, using type punning, and hoping everything gets laid out as expected.
Using shifts and masks may be inelegant, but until C provides a means of treating an explicitly-designated sequence of bits within one lvalue as another lvalue, it is often the best way to ensure that code will be adaptable to meet needs.

standardising bit fields using unions (making them portable)

The page shown is from
"Programming Embedded Systems"
by Michael Barr, Anthony Massa,
published by O'Reilly,
ISBN: 0-596-00983-6, page 204.
I am asking you to give more details and explanations on this, like:
Does this means that the bit fields are going to be portable across all the compilers?
For (different) architectures does this work for bit fields with sizes more than one byte or (considering the endianness difference which I don't think that using this method will overcome this problem)?
For (same) architectures does this work for bit fields with sizes more than one byte?
If they are standardised across all compilers as the book says, can we specify how are they going to be aligned?
Q.1.2 if the bit fields are just one byte so the endianness problem won't affect it, right? So will the bit fields be portable across all the compilers on different architectures and endiannesses?
does this means that the bit fields are going to be portable across all the compilers
No, the cited text about unions somehow making bit-fields portable is strange. union does not add any further portability guarantees what-so-ever. There are many aspects of bit-fields that make them completely non-portable, because they are very poorly specified by the standard. Some examples here.
For example, using uint8_t or a char type for a bit-field is not covered by the standard. The book fails to mention this even though it makes such a non-standard example.
for (different) architectures does this work for bit fields with sizes more than one byte
for (same) architectures does this work for bit fields with sizes more than one byte
No, there are no guarantees at all.
if they are standardised across all compilers as the book says ,can we specify how are they going to be aligned?
They aren't, the book is misleading. My advise is to stop reading at "bitfields are not portable" then forget that you ever heard about bit-fields. It is a 100% superfluous feature anyway. Instead, use bitwise operators.
That is a very disturbing text, I think I would toss the whole book.
Please go read the spec, you dont have to pay for it there are older ones available and draft versions (which of course are not the final) versions available, but for that couple of decades they are more common that different.
If I could add more upvotes to Lundin's answer I would, not worth creating new users with email addresses just to do that...
I have/will possibly spark an argument on this, but...The spec does say that if you define some number of (non-zero sized) bitfields in a row in a struct or union they will get packed, and there is a special zero sized one that is used to break that so that you can declare a bunch of bitfields and group them without having to make some other struct.
Perhaps it says they will be aligned, but I would never assume, that. I know for a fact the same compiler will treat endians different and pack them on opposite ends (top bit down or bottom bit up). But there is no reason to assume that any compiler follows any convention other than packing, and I would assume although perhaps it is also subject to interpretation, that they are packed in the order defined once you figure out where they start. So I wouldnt assume that 6 bits worth of declaration are aligned either way, could be up to six different alignents within a byte assuming a byte is the size of the unit, if the size of the unit is 32 or 64 bits then I am not going to bother counting the combinations, it is more than one and that is all that matters. I know for a fact from gcc that when the 32 to 64 bit x86 transition happened, that caused problems with code making assumptions on where those bits landed.
I personally wouldnt even assume that the bits are in the declared order when they pack them together...Popular compilers tend to do that, but the spec does not say more than they are packed...what does that mean. If I have a 1 then an 8 then a 1 then an 8 then a 6 bit I would hope the compiler alignes the 8 bit ones on a byte boundary and then moves the two ones near the 6, if I were to ever use a bitfield which I dont...
The prime contention here, is to me the spec is very clear that the initial items in more than one declaration in a union only uses the same memory if the order and size are the same they are compatible types. A one bit unsigned it is not the same as a 32 bit unsigned int, so they are NOT compatible types IMO. The spec goes further to state that, for bitfields the types have to be the same type and size, so for a bitfield to share the same memory in a union, you need two structures with the same initial bitfield items to be the same type and size, and only those items are per spec going to share memory, what happens with the rest of the bits is a different story, per the spec. so your example from my reading of the spec does nothing to say that the 8 bit char (using a made up non-spec declaration) and 8 declared bits of bit field are in no way expected to line up with each other and share the same memory. Just because a compiler chooses to do that in some version does not mean you can assume that, the union in particular does not make that code portable or more portable in anyway, in fact perhaps worse as now you not only have a bitfield issue across compilers or versions, but you now have union issues across compilers or versions.
As a general rule NEVER use a structure across a compile domain (with or without bitfields)(this includes unions), so never read a file into a structure, you have crossed a compile domain, never point structures at hardware registers, you have crossed a compile domain, never point structures at memory, dont point a structure at a char array that contains an ethernet packet and use the struct and/or bitfields to pick apart the ip header. Yes, these rules are widely used and abused and are a ticking time bomb. The primary reason the time bomb only goes off rarely is that the code keeps using the same compiler, and or a couple of vary popular compilers that currently have the same implementation. but struct pointing in general fails very very often, bitfield failures are just a side effect of that, and perhaps because of the horrible text in your book, unions are starting to show up a lot making the time bomb now nuclear instead of conventional.
So if you want to use a struct or a union or a bitfield and have the code actually work without maintenance. Then stay within the same compile domain (one program compiled at the same time with the same compiler and settings), pass structures defined as structures across functions, do not point at memory or other arrays, etc. for unions never access across individually defined items, if a single variable only use that variable until finished completely with it assume it is now trash if you use a struct or other variable in that union. With bitfields each variable is a standalone item independent of the other variables next to it, you are just ATTEMPTING to save memory by using them but you are actually wasting a lot of code overhead, performance, code space by using them in the first place. Keep to that and your code is far more likely to work without maintenance, across compilers. Now if you want to use this as job security and have your code fail to build or function every minor or major release of a compiler, then do those things above, point structs across a compile domain, point bitfields at hardware registers, etc. Other than your boss noting that you write horrible code that breaks often when some other employees dont, you will have to keep maintaining that code on a regular basis for the life of that product.
All the compiler does with your bitfield is generate masks and shifts, if you write those masks and shifts yourself, MASSIVELY more portable, you may still have endian issues (which can actually at times be easily solved in portable endian-less code) but you wont be completely pointing at the wrong thing using masking and shifting. it simply works, it does not produce more code, does not produce slower code, if you really need make macros for everything, using a macro to isolate a field within a "unit" is far more portable than using a bitfield.
Forget you ever read about bitfields or heard about them, never ever use them again. The need for them died decades ago. Unions somewhat fall into the same category as well, they do actually save memory but you have to be careful to share that memory properly.
And I would toss that book as well, if they dont understand this simple topic what else do those authors not understand. As with most folks this may be confusing a popular compilers interpretation with reality.
There is a little 'bit' confusion with the concepts of bit-fields and endians -
Suppose you have an MCU of 32bit - it means that internal memory of device have to be multiplied with a size of 32bits.
Now as you might know, the way each MCU stores the memory is LSB or MSB which is Big Endian and Little Endian respectively,
see here for illustrations: Endians figure
As can be seen, same data: 0x12345678 (32 bit value) is stored differently internally.
When you are reading and writing memory using a 32 bit pointer (trivial) case - the MCU will convert it for you (it doesn't makes any difference between the endians) - the problem arose when you are dealing with a one by one byte manipulation or when exporting (or importing) from other MCU / memory peripheral also suing 8 bit / 1 byte manipulations.
Bit field will be aligned to the Byte, Word and to the Long Word types (as seen) so it can be miss-interpreted when porting to other target.
Hence, the answer your questions:
If it is only one byte that you dividing into bits it will be ported nicely.
if you define a multi-bytes union it will make you goes in trouble.
Answered at the introduction of this answer
See answer no. 1
See the figure I have attached for illustrations
Right in general
1, 2: Not quite: it always depends on the platform (endianess) and the types you are using.
3: Yes they will always land on the same spot in memory.
4: Which alignment do you mean — the memory alignment or the field alignment?

C - What to do when double is not 8 bytes long?

I have a C program and I will use it on deprecated machines. I am a little bit scared of problems with basic types compatibilities - I mean, when I ask How can I guarantee, that double will be 8 bytes long? answer is
"You can put this in your code:
static_assert(sizeof(double) == 8, "invalid double size");"
Ok, I get it...
But what do I do, when I run the code and it goes "assertion failed"?
What are my options now? How can I deal with this problem? I need to run the program on that specific machine... Is it hard to adapt the program to different size of double?
P.S. My program needs 8byte doubles
EDIT: Why do I need 8 byte doubles?
I read some data from serial port in binary format to a buffer and from that buffer I need to extract the IEEE754 double precision and single precision numbers, to do some calculations with them.
If the reason you need 8-byte doubles is because you are reading and writing them to data files that way, you need to step back and have a long, hard think about this. Reading and writing "binary" data files is one of those things people love to do because it's "easy" and "efficient" -- but of course it's wildly unportable. You can try to patch things up to make your binary data files vaguely portable, but it's actually pretty difficult, and by the time you do, it's no longer easy or efficient.
If you can possibly see your way clear to using text data files, just about all of your portability problems will magically vanish, and you won't have to worry about those data files any more.
You can read and write floating-point quantities to text data files using %a, which will do a good job of preserving accuracy and precision.
If you know the specific types of machines and C environment on which your code must run, then you can and should study their documentation to determine whether their characteristics match your data format. If they do, then you have no problem. If they don't, then you have to write FP emulation code.
If you can't make this determination prior to writing the code, or if characteristics of the target machines vary in a way that makes a difference here, then you need to write the emulation code regardless. Of course, you can use such code on any machine, so you might stop there. You could also, however, provide an alternative code path that uses native FP, and choose between the two paths either at build time (via conditional compilation) or at run time (via an explicit test of FP representation).

Questions about memory alignement in structures and portability of the sizeof operator

I have a question about structure padding and memory alignment optimizations regarding structures in C language. I am sending a structure over the network, I know that, for run-time optimizations purposes, the memory inside a structure is not contiguous. I've run some tests on my local computer and indeed, sizeof(my_structure) was different than the sum of all my structure members. I ran some research to find out two things :
First, the sizeof() operator retrieves the padded size of the structure (i.e the real size that would be stored in memory).
When specifying __attribute__((__packed__)) in the declaration of the structure this optimization is disabled by the compiler, so sizeof(my_structure) will be exactly the same as the sum of the fields of my structure.
That being said, i am wondering if the sizeof operator was getting the padded size on every compilers implementation and on every architecture, in other words, is it always safe to copy a structure with memcpy for example using the sizeof operator such as :
memcpy(struct_dest, struct_src, sizeof(struct_src));
I am also wondering what is the real purpose of __attribute__((__packed__)), is it used to send a less important amount the data on a network when submitting a structure or is it, in fact, used to avoid some unspecified and platform-dependant sizeof operator behaviour ?
Thanks by advance.
Different compilers on different architectures can and do use different padding. So for wire transmission it is not uncommon to pack structs to achieve a consistent binary layout. This can then cater for the code at each end of the wire running on different architecture.
However you also need to make sure that your data types are the same size if you use this approach. For example, on 64 bit systems, long is 4 bytes on Windows and 8 bytes almost everywhere else. And you also need to deal with endianness issues. The standard is to transmit over the wire in network byte order. In practice you would be better using a dedicated serialization library rather than trying to reinvent solutions to all these issues.
I am sending a structure over the network
Stop there. Perhaps some would disagree with me on this (in practice you do see a lot of projects doing this), but struct is a way of laying out things in memory - it's not a serialization mechanism. By using this tool for the job, you're already tying yourself to a bunch of non-portable assumptions.
Sure, you may be able to fake it with things like structure padding pragmas and attributes, but - can you really? Even with those non-portable mechanisms you never know what quirks might show up. I recall working in a code base where "packed" structures were used, then suddenly taking it to a platform where access had to be word aligned... even though it was nominally the same compiler (thus supported the same proprietary extensions) it produced binaries which crashed. Any pain you get from this path is probably deserved, and I would say only take it if you can be 100% sure it will only run in a given compiler and environment, and that will never change. I'd say the safer bet is to write a proper serialization mechanism that doesn't allow writing structures around across process boundaries.
Is it always safe to copy a structure with memcpy for example using the sizeof operator
Yes, it is and that is the purpose of providing the sizeof operator.
Usually __attribute__((__packed__)) is used not for size considerations but when you want want to to make sure of the layout of a structure is exactly as you want it to be.
For ex:
If a structure is to be used to match hardware or be sent on a wire then it needs to have the exact same layout without any padding.This is because different architectures usually implement different kinds & amounts of padding and alignment and the only way to ensure common ground is to remove padding out out of the picture by using packing.

Reasons to use (or not) stdint

I already know that stdint is used to when you need specific variable sizes for portability between platforms. I don't really have such an issue for now, but what are the cons and pros of using it besides the already shown fact above?
Looking for this on stackoverflow and others sites, I found 2 links that treats about the theme:
codealias.info - this one talks about the portability of the stdint.
stackoverflow - this one is more specific about uint8_t.
These two links are great specially if one is looking to know more about the main reason of this header - portability. But for me, what I like most about it is that I think uint8_t is cleaner than unsigned char (for storing an RBG channel value for example), int32_t looks more meaningful than simply int, etc.
So, my question is, exactly what are the cons and pros of using stdint besides the portability? Should I use it just in some specifics parts of my code, or everywhere? if everywhere, how can I use functions like atoi(), strtok(), etc. with it?
Thanks!
Pros
Using well-defined types makes the code far easier and safer to port, as you won't get any surprises when for example one machine interprets int as 16-bit and another as 32-bit. With stdint.h, what you type is what you get.
Using int etc also makes it hard to detect dangerous type promotions.
Another advantage is that by using int8_t instead of char, you know that you always get a signed 8 bit variable. char can be signed or unsigned, it is implementation-defined behavior and varies between compilers. Therefore, the default char is plain dangerous to use in code that should be portable.
If you want to give the compiler hints of that a variable should be optimized, you can use the uint_fastx_t which tells the compiler to use the fastest possible integer type, at least as large as 'x'. Most of the time this doesn't matter, the compiler is smart enough to make optimizations on type sizes no matter what you have typed in. Between sequence points, the compiler can implicitly change the type to another one than specified, as long as it doesn't affect the result.
Cons
None.
Reference: MISRA-C:2004 rule 6.3."typedefs that indicate size and signedness shall be used in place of the basic types".
EDIT : Removed incorrect example.
The only reason to use uint8_t rather than unsigned char (aside from aesthetic preference) is if you want to document that your program requires char to be exactly 8 bits. uint8_t exists if and only if CHAR_BIT==8, per the requirements of the C standard.
The rest of the intX_t and uintX_t types are useful in the following situations:
reading/writing disk/network (but then you also have to use endian conversion functions)
when you want unsigned wraparound behavior at an exact cutoff (but this can be done more portably with the & operator).
when you're controlling the exact layout of a struct because you need to ensure no padding exists (e.g. for memcmp or hashing purposes).
On the other hand, the uint_least8_t, etc. types are useful anywhere that you want to avoid using wastefully large or slow types but need to ensure that you can store values of a certain magnitude. For example, while long long is at least 64 bits, it might be 128-bit on some machines, and using it when what you need is just a type that can store 64 bit numbers would be very wasteful on such machines. int_least64_t solves the problem.
I would avoid using the [u]int_fastX_t types entirely since they've sometimes changed on a given machine (breaking the ABI) and since the definitions are usually wrong. For instance, on x86_64, the 64-bit integer type is considered the "fast" one for 16-, 32-, and 64-bit values, but while addition, subtraction, and multiplication are exactly the same speed whether you use 32-bit or 64-bit values, division is almost surely slower with larger-than-necessary types, and even if they were the same speed, you're using twice the memory for no benefit.
Finally, note that the arguments some answers have made about the inefficiency of using int32_t for a counter when it's not the native integer size are technically mostly correct, but it's irrelevant to correct code. Unless you're counting some small number of things where the maximum count is under your control, or some external (not in your program's memory) thing where the count might be astronomical, the correct type for a count is almost always size_t. This is why all the standard C functions use size_t for counts. Don't consider using anything else unless you have a very good reason.
cons
The primary reason the C language does not specify the size of int or long, etc. is for computational efficiency. Each architecture has a natural, most-efficient size, and the designers specifically empowered and intended the compiler implementor to use the natural native data size data for speed and code size efficiency.
In years past, communication with other machines was not a primary concern—most programs were local to the machine—so the predictability of each data type's size was of little concern.
Insisting that a particular architecture use a particular size int to count with is a really bad idea, even though it would seem to make other things easier.
In a way, thanks to XML and its brethren, data type size again is no longer much of a concern. Shipping machine-specific binary structures from machine to machine is again the exception rather than the rule.
I use stdint types for one reason only, when the data I hold in memory shall go on disk/network/descriptor in binary form. You only have to fight the little-endian/big-endian issue but that's relatively easy to overcome.
The obvious reason not to use stdint is when the code is size-independent, in maths terms everything that works over the rational integers. It would produce ugly code duplicates if you provided a uint*_t version of, say, qsort() for every expansion of *.
I use my own types in that case, derived from size_t when I'm lazy or the largest supported unsigned integer on the platform when I'm not.
Edit, because I ran into this issue earlier:
I think it's noteworthy that at least uint8_t, uint32_t and uint64_t are broken in Solaris 2.5.1.
So for maximum portability I still suggest avoiding stdint.h (at least for the next few years).

Resources