Avoidable padding with "unfull" structs - c

I was trying to give a "logical" counter-example to this answer indicating that sorting the members of a struct based on their size would minimize padding, when I encountered what seems to me as illogical.
Imagine the following struct:
struct A
{
int32_t a;
int16_t b;
};
sizeof this struct would normally be padded to 8 bytes to make sure a is aligned for example in an array of struct A.
Now image these other structs:
struct B
{
struct A a, b;
int16_t c, d;
};
struct C
{
struct A a;
int16_t c;
struct A b;
int16_t d;
};
As expected struct B has size 20 due to padding. However, I would have expected struct C to have size 16 since padding could be avoided, but ideone and gcc (with or without optimization) give a size of 24 bytes, clearly padding 2 bytes after each of the members.
My reasoning is that struct A in reality has only 6 bytes and should be padded when necessary, for example in an array of struct A or its usage in struct B. However, in struct C padding of struct A is unnecessary and c could have been placed where the padding of a could have been and the same with d and b.
Why doesn't the compiler minimize the padding by putting c where the padding of a would be?
P.S. I understand that sizeof(struct A) must return 8. Otherwise something like memset(array_of_A, 0, N * sizeof *array_of_A) won't work properly since array_of_A would contain padding while N * sizeof *array_of_A would ignore that padding.
The only thing I can think of that could be a problem is that with the optimization above then sizeof(struct C) would be smaller than the sizeof of all its members. However, I can't think of a case where such a thing could become a problem (i.e. a usage that is not based on undefined behavior).

struct C someC;
struct A someA;
*(struct A*)&(someC.a) = someA;
The assignment above may fail (mistakenly write to someC.c) if the padding you describe is supported by compilers.
EDITED: The example above relies on the compiler behavior when assigning structs. As I have known (an just checked) gcc copies as the struct is a flat region of memory, which is not member-wise.
EDITED: Changed from "would fail" to "may fail" since it's not defined if the padding bits shall be copied, see item 6 of section 6.2.6.1 of ISO_IEC_9899_2011:
When a value is stored in an object of structure or union type, including in a member
object, the bytes of the object representation that correspond to any padding bytes take
unspecified values.51)
and footnote 51):
51) Thus, for example, structure assignment need not copy any padding bits.

memcpy(&someC.a, &someA, sizeof(someC.a)) would write over someC.c.
That's what I was trying to get at with my comment about sizeof()'s having
to be different. For that memcpy() to work, sizeof(someC.a) would have to
be different from sizeof(someA) which just seems to be asking for a lot of
trouble and hard to find bugs.

Related

Memory Alignment warning with gcc

I'm trying to implement a polymorphic data structure e.g. an intrusive linked list (I already know the kernel has one - this is more of learning experience).
The trouble is casting a nested struct to the containing struct leads to gcc issuing a memory-alignment warning.
The specifics are provided below:
// t.c
#include <stdio.h>
enum OL_TYPE { A, B, C };
struct base {
enum OL_TYPE type;
};
struct overlay {
struct base base;
int i;
};
struct overlay2 {
struct base base;
float f;
int i;
// double d; // --> adding this causes a memory alignment warning
};
void testf(struct base *base) {
if (base->type == A) {
struct overlay *olptr = (struct overlay *)base;
printf("overlay->i = %d\n", olptr->i);
} else
if (base->type == B) {
struct overlay2 *olptr = (struct overlay2 *)base;
printf("overlay->i = %d\n", olptr->i);
}
}
int main(int argc, char *argv[]) {
struct overlay ol;
ol.base.type = A;
ol.i = 3;
testf(&ol.base);
}
Compiled with gcc t.c -std=c99 -pedantic -fstrict-aliasing -Wcast-align=strict -O3 -o q leads to this:
t.c: In function ‘testf’:
t.c:28:34: warning: cast increases required alignment of target type [-Wcast-align]
28 | struct overlay2 *olptr = (struct overlay2 *)base;
| ^
It's interesting to notice that if I comment out the double from overlay2 such that it's not part of the structure anymore, the warning disappears.
In light of this, I have a few questions:
Is there any danger to this?
If yes, what is the danger? If not, does that mean the warning is a false positive and how can it be dealt with?
Why does adding the double suddenly lead to the alignment warning?
Is misalignment / an alignment problem even possible if I cast from base to a struct overlay2 * IF the actual type IS in fact a struct overlay2 with a nested struct base ? Please explain!
Appreciate any answers that can provide some insight
is there any danger to this?
Only if what base points to was not allocated as a struct overlay2 in the first place. If you're just smuggling in a pointer to struct overlay2 as a struct base*, and casting back internally, that should be fine (the address would actually be aligned correctly in the first place).
If yes, what is the danger? If not, does that mean the warning is a false positive and how can it be dealt with?
The danger occurs when you allocate something as something other than struct overlay2 then try to use the unaligned double inside it. For example, a union of struct base and char data[sizeof(struct overlay2)] would be the right size to contain a struct overlay2, but it might be four byte aligned but not eight byte aligned, causing the double (size 8, alignment 8) to be misaligned. On x86 systems, a misaligned field typically just slows your code, but on non-x86 misaligned field access can crash your program.
As for dealing with it, you can silence the warning (with some compiler-specific directives), or you can just force all your structures to be aligned identically, by adding alignas directives (available since C11/C++11):
#include <stdalign.h> // Add at top of file to get alignas macro
struct base {
alignas(double) enum OL_TYPE type;
};
why does adding the double suddenly lead to the alignment warning?
Because it's the only data type in any of the structures that requires eight byte alignment; without it, the structures can all be legally aligned to four bytes; with it, overlay2 needs eight byte alignment while the others only need four byte alignment.
is an alignment even possible if I cast from base to a struct overlay2 * IF the actual type IS in fact a struct overlay2 with a nested struct base ?
As noted above, if it was really pointing to something allocated as a struct overlay2 all along, you're safe.
It's not really a false positive, because the compiler doesn't know that your passed base* argument will be the address of an actual overlay2 structure (even though you do). However, if that condition is always going to be true, then no issue should arise due to inappropriate alignment.
However, to be completely sure (and to silence the warning), you could always make the base structure align to the requirements of a double, if that doesn't cause any other issues:
struct base {
_Alignas(double) enum OL_TYPE type;
};
Why does adding the double suddenly lead to the alignment warning
Because, in general, the alignment requirement of any structure must be at least that of the largest alignment requirement of all its members. Without such an assurance, an array of those structures would – by definition – result in misalignment of such members in any one one of two adjacent elements in the array.
Some processors require that pointers to an item of size S must be aligned to the next power-of-two greater than or equal to S, otherwise you get memory alignment faults.
If you take pointer to a variable (including aggregate) that's lesser alignment (lower power-of-two) and cast it to the type of a variable that has higher alignment requirements, you potentially set yourself up for those alignment faults.
If this is likely to be an issue, one option is to put all the item types that will be cast between into a union, always allocate that union, and then reference the type within the union.
That has the potential to get expensive on memory, so another possibility is to create a copy function that initialises a greater-alignment object from the contents of the lesser-alignment one, ensuring any additional values the larger one has will be set to sensible default values.
Finally, and since we're talking about linked lists anyway, the better option is to have a single struct type that comprises the linked items. It will also contain a void * or union thing * pointer to the actual data and an indication of what type of data the pointer refers to. This way you're pretty economical with memory, you don't have to worry about alignment issues, and you have the cleanest and most adaptable representation.

Would the two following examples take up equal memory after compiling?

Please help me with this concept:
Which would take up more memory after compiling if both are declared and initialized?
struct V
{
int a, b;
};
struct X
{
struct V v;
int N;
};
OR:
struct X
{
int a, b, c;
};
Rules for structure padding:
Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure.
The last member is padded with the number of bytes required so that the total size of the structure should be a multiple of the largest alignment of any structure member.
This suggests that both will take same memory space, 12-bytes each assuming size of int is 4 bytes. The reason is that there is no padding in either case.
My common sense says me that in both cases the program will allocate the same amount of memory (3 integers).
In the end the C structs are not more than a way to help the developer to organize variables and make code more readable. Once compiled the program doesn't care about structs or code organization but about variables (memory) and operations (instructions).
Both variants of struct X take up the same amount of memory, after all, they only contain three integers and the nested struct doesn't affect the memory layout of its member data when it comes to size.
The struct in this case is only syntactic sugar to arrange its members in a specific order and doesn't introduce additional overhead.
If there is no additional padding between the ints, both structs will consume 3 * sizeof(int) bytes.

C Are arrays in structs aligned?

struct {
uint8_t foo;
uint8_t bar;
uint8_t baz;
uint8_t foos[252];
uint8_t somethingOrOther;
} A;
struct {
uint8_t foo;
uint8_t bar;
uint8_t baz;
uint8_t somethingOrOther;
uint8_t foos[252];
} B;
Does it matter that I've put foos on byte 3 in the first example, vs on byte 4 in B?
Does an array in C have to start aligned?
Is the size of this struct 256 bytes exactly?
Given that the data type is uint8_t (equivalent to unsigned char), there is no need for padding in the structure, regardless of how it is ordered. So, you can reasonably assume in this case that every compiler will make that structure into 256 bytes, regardless of the order of the elements.
If there were data elements of different sizes, then you might well get padding added and the size of the structure might vary depending on the order of the elements.
As the good book says (C11 section 6.7.2.1 paragraph 14):
Each non-bit-field member of a structure or union object is aligned in
an implementation- defined manner appropriate to its type... There
may be unnamed padding within a structure object
You didn't "put foos on byte 3" - apart from the fact that the first element is always on byte 0, you have no real control over what byte an element will be put on. The compiler can give each field a full machine word or more if it thinks that will provide the most efficient accesses. If so, it will allocate "padding" bytes in between - unused space.
Note that this only applies to structs; arrays themselves do not add any padding bytes (if they did, the pointer arithmetic/indexing rule wouldn't work), so you can be sure that the size of foos itself is exactly the declared size, and know the exact alignment of each numbered element.
No, it doesn't matter.
Pretty much every compiler out there will align struct fields to a natural boundary appropriate for the target architecture, by inserting hidden "padding bytes" between fields where necessary.
Specifically, since you're using only uint8_t, there will be no padding bytes inserted - every field already falls on a naturally-aligned boundary. (Everything is a multiple of one.)
Both of the structs you've shown are exactly 256 bytes in size. You can confirm this like so:
int main(void)
{
printf("sizeof(struct A)=%zu \n", sizeof(struct A));
printf("sizeof(struct B)=%zu \n", sizeof(struct B));
}
You can prevent this padding from being added by "packing" the structure:
Microsoft Visual C uses #pragma pack(1)
GCC uses __attribute__((packed))
Do note that doing this on a structure with unaligned members can have a serious impact on the performance of your program.

Structures and Unions in C, determining size and accessing members

All,
Here is an example on Unions which I find confusing.
struct s1
{
int a;
char b;
union
{
struct
{
char *c;
long d;
}
long e;
}var;
};
Considering that char is 1 byte, int is 2 bytes and long is 4 bytes. What would be the size of the entire struct here ? Will the union size be {size of char*}+ {size of double} ? I am confused because of the struct wrapped in the union.
Also, how can I access the variable d in the struct. var.d ?
The sizes are implementation-defined, because of padding. A union will be at least the size of the largest member, while a struct will be at least the sum of the members' sizes. The inner struct will be at least sizeof(char *) to sizeof(long), so the union will be at least that big. The outer struct will be at least sizeof(int) + 1 + sizeof(char *) + sizeof(long). All of the structs and unions can have padding.
You are using an extension to the standard, unnamed fields. In ISO C, there would be no way to access the inner struct. But in GCC (and I believe MSVC), you can do var.d.
Also, you're missing the semi-colon after the inner struct.
With no padding, and assuming sizeof(int)==sizeof(char *)==sizeof(long)==4, the size of the outer struct will be 13.
Breaking it down, the union var overlaps an anonymous struct with a single long. That inner struct is larger (a pointer and a long) so its size controls the size of the union, making the union consume 8 bytes. The other members are 4 bytes and 1 byte, so the total is 13.
In any sensible implementation with the size assumptions I made above, this struct will be padded to either 2 byte or 4 byte boundaries, adding at least 1 or 3 additional bytes to the size.
Edit: In general, since the sizes of all of the member types are themselves implementation defined, and the padding is implementation defined, you need to refer to the documentation for your implementation and the platform to know for sure.
The implementation is allowed to insert padding after essentially any element of a struct. Sensible implementations use as little padding as required to comply with platform requirements (e.g. RISC processors often require that a value is aligned to the size of that value) or for performance.
If using a struct to map fields to the layout of values assumed by file format specification, a coprocessor in shared memory, a hardware device, or any similar case where the packing and layout actually matter, then you might want to be concerned that you are testing at either compile time or run time that your assumptions of the member layout are true. This can be done by verifying the size of the whole structure, as well as the offsets of its members.
See this question among others for a discussion of compile-time assertion tricks.
Unions are dangerous and risky to use without strict discipline. And the fact you put it in a struct is really dangerous because by default all struct members are public: that exposes the possibility of client code making changes to your union, without informing your program what type of data it stuffed in there. If you use a union you should put it in a class where at least you can hide it by making it private.
We had a dev years ago who drank the koolaid of unions, and put it in all his data structures. As a result, the features he wrote with it, are now one of the most despised parts of our entire application, since they are unmodifiable, unfixable and incomprehensible.
Also Unions throw away all the type safety that modern c/c++ compilers give you. Surely if you lie to the compiler it will get back at you someday. Well actually it will get back at your customer when your app crashes.

Can C arrays contain padding in between elements?

I heard a rumor that, in C, arrays that are contained inside structs may have padding added in between elements of the array. Now obviously, the amount of padding could not vary between any pair of elements or calculating the next element in an array is not possible with simple pointer arithmetic.
This rumor also stated that arrays which are not contained in structures are guaranteed to contain no padding. I know at least that part is true.
So, in code, the rumor is:
{
// Given this:
struct { int values[20]; } foo;
int values[20];
// This may be true:
sizeof(values) != sizeof(foo.values);
}
I'm pretty certain that sizeof(values) will always equal sizeof(foo.values). However, I have not been able to find anything in the C standard (specifically C99) that explicitly confirms or denies this.
Does anyone know if this rumor is addressed in any C standard?
edit: I understand that there may be padding between the end of the array foo.values and the end of the struct foo and that the standard states that there will be no padding between the start of foo and the start of foo.values. However, does anyone have a quote from or reference to the standard where it says there is no padding between the elements of foo.values?
No, there will never be padding in between elements of an array. That is specifically not allowed. The C99 standard calls array types "An array type describes a contiguously allocated nonempty set of objects...". For contrast, a structure is "sequentially", not "contiguously" allocated.
There might be padding before or after an array within a structure; that is another animal entirely. The compiler might do that to aid alignment of the structure, but the C standard doesn't say anything about that.
Careful here. Padding may be added at the end of the struct, but will not be added between the elements of the array as you state in your question. Arrays will always reference contiguous memory, though an array of structures may have padding added to each element as part of the struct itself.
In your example, the values and foo.values arrays will have the same size. Any padding will be part of the struct foo instead.
Here's the explanation as to why a structure may need padding between its members or even after its last member, and why an array doesn't:
Different types might have different alignment requirements. Some types need to be aligned on word boundaries, others on double or even quad word boundaries. To accomplish this, a structure may contain padding bytes between its members. Trailing padding bytes might be needed because the memory location directly ofter a structure must also conform to the structure's alignment requirements, ie if bar is of type struct foo *, then
(struct foo *)((char *)bar + sizeof(struct foo))
yields a valid pointer to struct foo (ie doesn't fail due to mis-alignment).
As each 'member' of an array has the same alignment requirement, there's no reason to introduce padding. This holds true for arrays contained in structures as well: If an array's first elment is correctly aligned, so are all following elements.
Yes, sort of. Variables are often aligned to some boundry, depending on the variable. Take the following, for instance:
typedef struct
{
double d;
char c;
} a_type_t;
double and char are 8 and 1 bytes, on my system, respectively. Total of 9. That structure, however, will be 16 bytes, so that the doubles will always be 8-byte aligned. If I had just used ints, chars, etc, then the alignment might be 1, 2, 4, or 8.
For some type T, sizeof(T) may or may not equal sizeof(T.a) + sizeof(T.b) + sizeof(T.c) ... etc.
Generally, this is entirely compiler and architecture dependent. In practice, it never matters.
Consider:
struct {
short s;
int i;
} s;
Assuming shorts are 16 bits and you're on 32 bits, the size will probably be 8 bytes as each struct members tends to be aligned a word (32 bit in this case) boundary. I say "probably" because it is implementation specific behaviour that can be varied by compiler flags and the like.
It's worth stressing that this is implementation behaviour not necessarily defined by the C standard. Much like the size of shorts, ints and longs (the C standard simply says shorts won't be larger than ints and longs won't be smaller than ints, which can end up as 16/32/32, 16/32/64, 32/32/64 or a number of other configurations).

Resources