struct packing: how to add struct members at the beginning? - c

I'm implementing a binary tree in C89, and I'm trying to share common attributes among all node structs through composition. Thus I have the following code:
enum foo_type
{
FOO_TYPE_A,
FOO_TYPE_B
};
struct foo {
enum foo_type type;
};
struct foo_type_a {
struct foo base;
struct foo * ptr;
};
struct foo_type_b {
struct foo base;
char * text;
};
I'm including a member of type struct foo in all struct definitions as their initial member in order to provide access to the value held by enum foo_type regardless of struct type. To achieve this I'm expecting that a pointer to a structure object points to its initial member, but I'm not sure if this assumption holds in this case. With C99, the standard states the following (see ISO/IEC 9899:1999 6.7.2.1 §13)
A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
Although all structs share a common struct foo object as their initial member, padding comes into play. While struct foo has only a single member, which is the size of an int, both struct foo_type_a and struct foo_type_b include pointer members, which in some cases increase the alignment and thus add padding.
So, considering this scenario, does the C programming language (C89 or any subsequent version) ensure that it's safe to access the value of struct foo::type through a pointer to an object, whether that object is of type struct foo or includes an object of type struct foo as its first member, such as struct foo_type_a or struct foo_type_b?

As you yourself quote from the C Standard, what you describe is supported by C99 and later versions.
It appears it was also supported by C89, as the language you quoted was already present in the ANSI C document from 1988:
3.5.2.1 Structure and union specifiers
...
Within a structure object, the non-bit-field members and the units
in which bit-fields reside have addresses that increase in the order
in which they are declared. A pointer to a structure object, suitably
cast, points to its initial member (or if that member is a bit-field,
then to the unit in which it resides), and vice versa. There may
therefore be unnamed holes within a structure object, but not at its
beginning, as necessary to achieve the appropriate alignment.
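As a minimal sketch of what that guarantee permits, the question's structs can be read through a pointer to the common base. The helper names `foo_get_type` and `foo_demo` are hypothetical and not part of the original code; the sketch only assumes the struct definitions from the question.

```c
#include <assert.h>
#include <stddef.h>

enum foo_type { FOO_TYPE_A, FOO_TYPE_B };

struct foo { enum foo_type type; };

struct foo_type_a {
    struct foo base;    /* must stay the first member */
    struct foo *ptr;
};

struct foo_type_b {
    struct foo base;    /* must stay the first member */
    char *text;
};

/* Hypothetical helper: reads the tag through a pointer to the common base. */
enum foo_type foo_get_type(const struct foo *node)
{
    return node->type;
}

/* Hypothetical self-check: returns 1 when both node kinds report their tag. */
int foo_demo(void)
{
    struct foo_type_a a;
    struct foo_type_b b;

    a.base.type = FOO_TYPE_A;
    a.ptr = NULL;
    b.base.type = FOO_TYPE_B;
    b.text = NULL;

    /* Padding may follow base, but never precedes it. */
    assert(offsetof(struct foo_type_a, base) == 0);
    assert(offsetof(struct foo_type_b, base) == 0);

    /* A converted struct pointer points to the initial member. */
    return foo_get_type((struct foo *)&a) == FOO_TYPE_A
        && foo_get_type((struct foo *)&b) == FOO_TYPE_B;
}
```

Passing `&a.base` instead of `(struct foo *)&a` gives the same address and avoids the cast entirely when the concrete type is known.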

Related

Does C ever pad a struct before the first element?

Consider an arbitrary struct where the C compiler will perform padding
struct node {
enum type type; /* the member needs a name; "enum type;" declares no member */
size_t num_children;
void** nodes;
};
Will C ever perform padding before the first element? I ask this as I need to do some funky things with void* and require that
void* a = node->nodes[0];
enum type t = *(enum type*)(a);
will always be evaluated correctly. I'm aware that I can force no padding but would rather not.
Will C ever perform padding before the first element?
No. This is explicitly prohibited in the C standard:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
(emphasis mine).
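A rough sketch of the `void*` access from the question, assuming the first member is declared as `enum type type;` and that every child pointer really points at a `struct node`. The enumerators and the helper names `node_type_of` and `node_demo` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum type { TYPE_LEAF, TYPE_BRANCH };   /* hypothetical enumerators */

struct node {
    enum type type;
    size_t num_children;
    void **nodes;
};

/* Reads the tag through an untyped pointer known to point at a struct node. */
enum type node_type_of(void *p)
{
    return *(enum type *)p;
}

/* Hypothetical self-check: returns 1 when both tags are read back correctly. */
int node_demo(void)
{
    struct node leaf;
    struct node root;
    void *children[1];

    leaf.type = TYPE_LEAF;
    leaf.num_children = 0;
    leaf.nodes = NULL;

    children[0] = &leaf;
    root.type = TYPE_BRANCH;
    root.num_children = 1;
    root.nodes = children;

    /* No padding before the first member. */
    assert(offsetof(struct node, type) == 0);

    return node_type_of(root.nodes[0]) == TYPE_LEAF
        && node_type_of(&root) == TYPE_BRANCH;
}
```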

structure members and pointer alignment

Is a pointer to the struct aligned as if it were a pointer to the first element?
or
Is a conversion between a pointer to a struct and a pointer to the type of its first member (or vice versa) ever UB?
(I hope they are the same question...)
struct element
{
tdefa x;
tdefb y;
};
int foo(struct element* e);
int bar(tdefa* a);
~~~~~
tdefa i = 0;
foo((struct element*)&i);
or
struct element e;
bar((tdefa*)&e);
Where tdefa and tdefb could be defined as any type
Background:
I asked this question
and a user in a comment on one of the answers brought up C11 6.3.2.3 p7 that states:
"A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined"
However, I am having trouble working out when this would become an issue. My understanding was that padding would allow all members of the struct to be aligned correctly. Have I misunderstood?
and if:
struct element e;
tdefa* a = &e.x;
would work then:
tdefa* a = (tdefa*)&e;
would too.
There is never any initial padding; the first member of a struct is required to start at the same address as the struct itself.
You can always access the first member of a struct by casting a pointer to the whole struct, to be a pointer to the type of the first member.
Your foo example might run into trouble because foo will be expecting its argument to point to a struct element which in fact it does not, and there might be an alignment mismatch.
However, the bar example and the final example are fine.
A pointer to a structure always points to its initial member.
Here is the citation directly from C99 standard (6.7.2.1, paragraph 13), emphasis mine:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning
As for your foo and bar examples:
The call to bar will be fine, as bar expects a tdefa, which is exactly what it's getting.
The call to foo, however, is problematic. foo expects a full struct element, but you're only passing a tdefa (while the struct consists of both tdefa and tdefb).
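To make the asymmetry concrete, here is a small sketch. The typedefs for `tdefa` and `tdefb` are stand-ins chosen only for illustration (the question leaves them open), `element_demo` is a hypothetical name, and `_Alignof` requires C11.

```c
#include <assert.h>

typedef short  tdefa;   /* assumption: stand-in for the question's tdefa */
typedef double tdefb;   /* assumption: stand-in for the question's tdefb */

struct element {
    tdefa x;
    tdefb y;
};

int bar(tdefa *a)
{
    return *a;
}

/* Hypothetical self-check: returns the value read through the cast pointer. */
int element_demo(void)
{
    struct element e;
    e.x = 7;
    e.y = 0.5;

    /* Struct to first member: same address, so the cast is always aligned. */
    assert((void *)&e == (void *)&e.x);

    /* The struct is at least as strictly aligned as its first member, which
       is why a lone tdefa object need not be aligned for struct element. */
    assert(_Alignof(struct element) >= _Alignof(tdefa));

    return bar((tdefa *)&e);
}
```

The reverse direction, `foo((struct element *)&i)` on a standalone `tdefa i`, is deliberately left out: `i` may be under-aligned for `struct element`, and `foo` would read past the object.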

Is this a C11 anonymous struct?

I was looking into the C11 draft and it says
An unnamed member of structure type with no tag is called an anonymous structure; an unnamed member of union type with no tag is called an anonymous union. The members of an anonymous structure or union are considered to be members of the containing structure or union.
So I constructed the following testcase
// struct type with no tag
typedef struct {
unsigned char a;
unsigned char b;
// ... Some other members ...
unsigned char w;
} AToW;
union
{
AToW; // <- unnamed member
unsigned char bytes[sizeof(AToW)];
} myUnion;
Clang and GCC both complain about the unnamed member, saying that the declaration has no effect. Did I do something wrong, or do they simply not support that feature yet?
No, that's not an unnamed member.
An example is:
struct outer {
int a;
struct {
int b;
int c;
};
int d;
};
The inner structure containing members b and c is an unnamed member of struct outer. The members of this unnamed member, b and c, are considered to be members of the containing structure.
This is probably more useful with a contained union rather than a contained structure. In particular, it can be used to define something similar to a Pascal or Ada variant record:
enum variant_type { t_int, t_double, t_pointer, t_pair };
struct variant {
enum variant_type type;
union {
int i;
double d;
void *p;
struct {
int x;
int y;
};
};
};
This lets you refer to i, d, and p directly as members of a struct variant object rather than creating an artificial name for the variant portion. If some variants require more than one member, you can nest anonymous structures within the anonymous union.
(It differs from Pascal and Ada in that there's no mechanism to enforce which variant is active given the value of the type member; that's C for you.)
In your example, AToW is a typedef for a struct type that you defined previously. You're not permitted to have a bare
AToW;
in the middle of a struct definition, any more than you can have a bare
int;
C11 added the ability to define a nested anonymous struct within another struct, but only by defining a new anonymous struct type at that point. You can't have an anonymous struct member of a previously defined type. The language could have been defined to permit it, and the semantics would (I think) be reasonably straightforward -- but there wasn't much point in defining two different ways to do the same thing. (For "struct" in the above, read "struct or union".)
Quoting the N1570 draft (which is very close to the released 2011 ISO C standard), section 6.7.2.1 paragraph 13:
An unnamed member whose type specifier is a structure specifier with
no tag is called an anonymous structure; an unnamed member whose type
specifier is a union specifier with no tag is called an anonymous
union. The members of an anonymous structure or union are considered
to be members of the containing structure or union. This applies
recursively if the containing structure or union is also anonymous.
A structure specifier consists of the keyword struct, followed by an optional identifier (the tag, omitted in this case), followed by a sequence of declarations enclosed in { and }. In your case, AToW is a type name, not a structure specifier, so it can't be used to define an anonymous structure.
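For completeness, a short sketch of the variant record above in use, assuming a C11 compiler. The helper names `variant_as_int` and `variant_demo` are hypothetical.

```c
#include <assert.h>

enum variant_type { t_int, t_double, t_pointer, t_pair };

struct variant {
    enum variant_type type;
    union {                 /* anonymous union: members join struct variant */
        int i;
        double d;
        void *p;
        struct {            /* anonymous struct nested inside the union */
            int x;
            int y;
        };
    };
};

/* Hypothetical helper: returns i for an int, x + y for a pair, else 0. */
int variant_as_int(const struct variant *v)
{
    switch (v->type) {
    case t_int:
        return v->i;
    case t_pair:
        return v->x + v->y;
    default:
        return 0;
    }
}

/* Hypothetical self-check: builds one int and one pair and sums them. */
int variant_demo(void)
{
    struct variant a;
    struct variant b;

    a.type = t_int;
    a.i = 5;        /* no intermediate member name needed */

    b.type = t_pair;
    b.x = 2;
    b.y = 3;

    assert(a.type == t_int && b.type == t_pair);
    return variant_as_int(&a) + variant_as_int(&b);
}
```

Nothing stops code from reading `v->d` while `type` says `t_int`; as noted above, keeping the tag and the active member in sync is up to the programmer.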

Struct pointer compatibility

Suppose we have two structs:
typedef struct Struct1
{
short a_short;
int id;
} Struct1;
typedef struct Struct2
{
short a_short;
int id;
short another_short;
} Struct2;
Is it safe to cast from Struct2 * to Struct1 * ? What does the ANSI spec say about this?
I know that some compilers have the option to reorder struct fields to optimize memory usage, which might render the two structs incompatible. Is there any way to be sure this code will be valid, regardless of the compiler flag?
Thank you!
It is safe, as far as I know.
But it's far better, if possible, to do:
typedef struct {
Struct1 struct1;
short another_short;
} Struct2;
Then you've even told the compiler that Struct2 starts with an instance of Struct1, and since a pointer to a struct always points at its first member, you're safe to treat a Struct2 * as a Struct1 *.
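A minimal sketch of that composition approach. The function names `struct1_id` and `embed_demo` are hypothetical, and the values are arbitrary.

```c
#include <assert.h>

typedef struct {
    short a_short;
    int id;
} Struct1;

typedef struct {
    Struct1 struct1;        /* initial member */
    short another_short;
} Struct2;

/* Hypothetical function that only knows about Struct1. */
int struct1_id(const Struct1 *s)
{
    return s->id;
}

/* Hypothetical self-check: returns the id read through the cast pointer. */
int embed_demo(void)
{
    Struct2 s2;
    s2.struct1.a_short = 1;
    s2.struct1.id = 42;
    s2.another_short = 3;

    /* Preferred when the concrete type is known: no cast needed. */
    assert(struct1_id(&s2.struct1) == 42);

    /* Also valid: a Struct2 pointer, suitably converted, points to struct1. */
    return struct1_id((Struct1 *)&s2);
}
```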
Struct pointer types always have the same representation in C.
(C99, 6.2.5p27) "All pointers to structure types shall have the same
representation and alignment requirements as each other."
And members in structure types are always in order in C.
(C99, 6.7.2.1p5) "a structure is a type consisting of a sequence of
members, whose storage is allocated in an ordered sequence"
No, the standard doesn't allow this; accessing the elements of a Struct2 object through a Struct1 pointer is undefined behavior. Struct1 and Struct2 are not compatible types (as defined in 6.2.7) and may be padded differently, and accessing them via the wrong pointer also violates aliasing rules.
The only way something like this is guaranteed to work is when Struct1 is included in Struct2 as its initial member (6.7.2.1.15 in the standard), as in unwind's answer.
The language specification contains the following guarantee
6.5.2.3 Structure and union members
6 One special guarantee is made in order to simplify the use of unions: if a union contains
several structures that share a common initial sequence (see below), and if the union
object currently contains one of these structures, it is permitted to inspect the common
initial part of any of them anywhere that a declaration of the completed type of the union
is visible. Two structures share a common initial sequence if corresponding members
have compatible types (and, for bit-fields, the same widths) for a sequence of one or more
initial members.
This only applies to type-punning through unions. However, this essentially guarantees that the initial portions of these struct types will have identical memory layout, including padding.
The above does not necessarily allow one to do the same by casting unrelated pointer types. Doing so might constitute a violation of aliasing rules
6.5 Expressions
7 An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
The only question here is whether accessing
((Struct1 *) struct2_ptr)->a_short
constitutes access to the whole Struct2 object (in which case it is a violation of 6.5/7 and it is undefined), or merely access to a short object (in which case it might be perfectly defined).
In general, it might be a good idea to stick to the following rule: type-punning is allowed through unions but not through pointers. Don't do it through pointers, even if you are dealing with two struct types with a common initial subsequence of members.
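A sketch of the union route covered by 6.5.2.3/6, using the question's two structs. The union name `either` and the function `union_demo` are invented for this example.

```c
#include <assert.h>

typedef struct {
    short a_short;
    int id;
} Struct1;

typedef struct {
    short a_short;
    int id;
    short another_short;
} Struct2;

/* The completed union type must be visible where the common part is read. */
union either {
    Struct1 s1;
    Struct2 s2;
};

/* Hypothetical self-check: packs the two common members into one number. */
int union_demo(void)
{
    union either u;

    /* The union currently holds a Struct2. */
    u.s2.a_short = 1;
    u.s2.id = 2;
    u.s2.another_short = 3;

    /* The common initial sequence (a_short, id) may be inspected via s1. */
    assert(u.s1.a_short == 1);
    return u.s1.a_short * 10 + u.s1.id;
}
```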
It will most probably work. But you are very correct in asking how you can be sure this code will be valid. So: somewhere in your program (at startup maybe) embed a bunch of ASSERT statements which make sure that offsetof(Struct1, a_short) is equal to offsetof(Struct2, a_short), etc. Besides, some programmer other than you might one day modify one of these structures but not the other, so better safe than sorry.
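One way to write those offset checks, as a sketch: C11's `_Static_assert` moves them to compile time, and the function `layout_matches` (a hypothetical name) shows the runtime form for older compilers. This verifies layout only; it does not address the aliasing objection raised in the other answers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    short a_short;
    int id;
} Struct1;

typedef struct {
    short a_short;
    int id;
    short another_short;
} Struct2;

/* C11: the build fails if someone changes one struct but not the other. */
_Static_assert(offsetof(Struct1, a_short) == offsetof(Struct2, a_short),
               "a_short offset differs between Struct1 and Struct2");
_Static_assert(offsetof(Struct1, id) == offsetof(Struct2, id),
               "id offset differs between Struct1 and Struct2");

/* Runtime form of the same check; returns 1 when the offsets agree. */
int layout_matches(void)
{
    return offsetof(Struct1, a_short) == offsetof(Struct2, a_short)
        && offsetof(Struct1, id) == offsetof(Struct2, id);
}
```

Note that `offsetof` takes the type and the member as two separate arguments.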
Yes, it is ok to do that!
A sample program is as follows.
#include <stdio.h>
typedef struct Struct1
{
short a_short;
int id;
} Struct1;
typedef struct Struct2
{
short a_short;
int id;
short another_short;
} Struct2;
int main(void)
{
Struct2 s2 = {1, 2, 3};
Struct1 *ptr = (Struct1 *)&s2; /* explicit cast: Struct2 * does not convert to Struct1 * implicitly */
void *vp = &s2;
Struct1 *s1ptr = (Struct1 *)vp;
printf("%d, %d \n", ptr->a_short, ptr->id);
printf("%d, %d \n", s1ptr->a_short, s1ptr->id);
return 0;
}

What does the C standard say about pointers to structs and their first member?

Consider the following two struct:
struct a
{
int a;
};
struct b
{
struct a a_struct;
int b;
};
the following instantiation of struct b:
struct b b_struct;
and this condition:
if (&b_struct == (struct b*)&b_struct.a_struct)
printf("Yes\n");
Does the C standard mandate this to always evaluate true?
Yes, according to 6.7.2.1, "Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning."
Can't find it in the C Standard, but the answer is "yes" - the C++ Standard says:
A pointer to a POD-struct object,
suitably converted using a
reinterpret_cast, points to its
initial member (or if that member is a
bit-field, then to the unit in which
it resides) and vice versa. [Note:
There might therefore be unnamed
padding within a POD-struct object,
but not at its beginning, as necessary
to achieve appropriate alignment. ]
As C and C++ POD objects must be compatible, the same must be true for C.
Yes.
There must not be any padding in front of the first member.
The address of a structure is the same as the address of its first member, provided that the appropriate cast is used.
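The condition from the question can be checked directly. This is only a sketch; `first_member_demo` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

struct a {
    int a;
};

struct b {
    struct a a_struct;
    int b;
};

/* Hypothetical self-check: returns 1 when both pointers compare equal. */
int first_member_demo(void)
{
    struct b b_struct;
    b_struct.a_struct.a = 1;
    b_struct.b = 2;

    /* The initial member always sits at offset 0. */
    assert(offsetof(struct b, a_struct) == 0);

    /* The comparison from the question. */
    return &b_struct == (struct b *)&b_struct.a_struct;
}
```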