Is multiple-level "struct inheritance" guaranteed to work everywhere?


I know that in C, the first member of a struct is guaranteed to have no padding before it. Thus (void *)&mystruct == (void *)&mystruct.firstmember is always true.
This allows the "struct inheritance" technique, as described in this question:
typedef struct
{
// base members
} Base;
typedef struct
{
Base base;
// derived members
} Derived;
// ... later
Base* object = (Base*) malloc(sizeof(Derived)); // This is legal
However, I'd like to make sure that this actually works safely with unlimited layers of "inheritance". E.g.:
typedef struct
{
// members
} A;
typedef struct
{
A base;
// members
} B;
typedef struct
{
B base;
// members
} C;
Are all of the following uses guaranteed to work?
A* a = (A*) malloc(sizeof(B));
A* a = (A*) malloc(sizeof(C));
B* b = (B*) malloc(sizeof(C));
C* c = malloc(sizeof(C));
// ... use and access members through the pointers
EDIT:
Let me clarify what I'm asking. Is the following use of "multi-level inheritance" guaranteed to work by the C standard?
C* c = malloc(sizeof(C));
// ... initialize fields in c
A* a = (A*) c;
// ... use A fields in a
B* b = (B*) a;
// ... use B fields in b
b = (B*) c;
// ... use B fields in b
c = (C*) a;
// ... go back to using C fields in c

That the kind of "multi-level inheritance" you describe must work follows from the same principles (explained in the other Q&A you referenced) that make this kind of inheritance work at all. Specifically, the standard explicitly provides that casting the addresses of structures and of their initial members between the applicable types has the desired effect:
A pointer to a structure object, suitably converted, points to its initial member [...] and vice versa.
(paragraph 6.7.2.1/15)
So consider this declaration, relative to the structure definitions provided:
C c;
The quoted provision specifies that &c == (C *) &c.base and (B *) &c == &c.base are both true.
But c.base is a B, so the provision also specifies that (A *) &c.base == &c.base.base and &c.base == (B *) &c.base.base are both true.
Since (B *) &c == &c.base and &c.base == (B *) &c.base.base are both true, it follows that (B *) &c == (B *) &c.base.base is also true.
Casting both sides to either A * or C * then produces also the equalities (A *) &c == &c.base.base and &c == (C *) &c.base.base.
This reasoning can be extended to an arbitrary nesting depth.
One can quibble a bit about dynamically allocated structures vis a vis the strict aliasing rule, but there's no reason to think that it is supposed to work any differently in that case, and as long as one first accesses the dynamically-allocated space via an lvalue of the most specific type (C in this example), I see no scenario that supports a different interpretation of the standard for the dynamic-allocation case than applies to other cases. In practice, I do not expect initial access via the most specific type actually to be required by any implementation.

What the ISO C standard requires to work is the following situation:
union U {
struct X x;
struct Y y;
struct Z z;
/* ... */
};
If the structures share some common initial sequence of members, then that initial sequence can be accessed through any of the members. For instance:
struct X {
/* common members, same as in Y and Z: */
int type;
unsigned flags;
/* different members */
};
If all the structures have type and flags in the same order and of the same types, then this is required to work:
union U u;
u.x.type = 42; /* store through x.type */
foo(u.y.type); /* access through y.type */
Other hacks of this type are not "blessed" by ISO C.
The situation you have there is a little different. It's a question of whether, given a structure whose first member has some type, we can convert a pointer to the structure into a pointer to that member's type and then use it. The simplest case is something like this:
struct S {
int m;
};
Given an object struct S s, we can take the address of m using &s.m, obtaining an int * pointer. Equivalently, we can obtain the same pointer using (int *) &s.
ISO C does require that a structure have the same address as its first member; a pointer to the structure and a pointer to the first member have different types but point to the same address, and we can convert between them.
This isn't restricted by nesting level. Given an object a of this type:
struct A {
struct B {
struct C {
int m;
} c;
} b;
};
the address &a.b.c.m is still the same as the address &a. The pointer &a.b.c.m is the same as (int *) &a.

Related

Does `offsetof(struct Derived, super.x) == offsetof(struct Base, x)` hold true in C?

I am unsure what André Caron means here:
Virtual functions in C
... some of this code relies on (officially) non-standard behavior that "just happens" to work on most compilers. The main issue is that the code assumes that &m.base == &m (e.g. the offset of the base member is 0). If that is not the case, then the cast in custom_bar() results in undefined behavior. To work around this issue, you can add an extra pointer in struct foo as such:
m is of type struct meh *. A pointer f of type struct foo * is assigned to m through a cast to struct meh *.
struct meh has member base of type struct foo (struct foo meh::base = foo::bar).
Why is it supposedly not guaranteed that &m.base == &m?
I can see this if the structure is not a POD. André also hints at this. However, why is it necessary for a POD structure to have another pointer void *foo::hook?
struct meh * m = (struct meh*)f; becomes struct meh * m = (struct meh*)f->hook;.
This works after he assigns the hook with m->base.hook = m;.
struct meh
{
/* inherit from "class foo". MUST be first. */
struct foo base;
int more_data;
};
Below, I listed relevant ISO C90/C++98 excerpts from my research. I also created a code example.
The example code can be compiled with Clang via -fsanitize=undefined -std=c++98 -O0 -Wall -Wextra -Wpedantic -Wconversion -Wundef.
Here it is:
https://godbolt.org/z/qo9f8KnYM
Excerpts
From ISO C90 (ANSI C89):
An object shall have its stored value accessed only by an lvalue that has one of the following types: (footnote 28)
...
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
A pointer to a structure object, suitably cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may therefore be unnamed holes within a structure object, but not at its beginning, as necessary to achieve the appropriate alignment.
From ISO C++98:
16 If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
17 A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]
Code example
#include <iostream>
struct A {
int m1;
};
struct B {
int m1;
int m2;
};
struct C {
struct A super;
int m3;
};
int main(void) {
struct A a = {42};
struct C c = {{666}, 1984};
// Access A::m1 through pointer of type B
std::cout << ((B *)&a)->m1 << std::endl; // 42
// Access A::m1 through pointer of type C
std::cout << ((C *)&a)->super.m1 << std::endl; // 42
// Access C::super::A::m1 through pointer of type A.
std::cout << ((A *)(&c))->m1 << std::endl; // 666
return 0;
}
Edit 1: Let me rewrite this question in this edit section.
I will ignore C++, as people in the comments told me to not complicate the question. If this edit is more helpful than the original, then perhaps you can consider replacing the original post with this edit.
Or I or someone else can just "strike-through" the original one.
Or, if you have a better idea on how to improve my question, please tell me.
(I might add that I have issues with attention and get lost in details quite easily... I will leave it at that. You may have guessed what it is...)
If my second attempt still fails to deliver, then perhaps I should take my failure to ask a clear question as a hint to think and write it down another time, if applicable.
Without further ado, here is my second attempt to pose this question:
I am referring to an answer posted here:
Virtual functions in C
struct Base {
int x;
};
struct Derived {
struct Base super;
};
If offsetof(struct Derived, super) == 0 and offsetof(struct Base, x) == 0, can we then imply that offsetof(struct Derived, super.x) == offsetof(struct Base, x)?
André Caron suggests using an extra pointer pointing to a derived object.
Apparently, it is not sufficient or portable to rely on offsetof(struct Derived, super.x) == offsetof(struct Base, x).
Even though this works, you are relying on compiler extensions for type punning that can lead to undefined behavior blablabla. This works in GCC and MSVC for a fact.
Indeed the alignment stuff relies on compiler extensions. You can make it portable using an extra void* pointer in struct foo that points to the "derived object". However, the technique is sufficiently popular in well-known libraries to be considered "portable". Any compiler that made this type of code break would have lots of complaints from its customers.
I have trouble understanding why offsetof(struct Derived, super.x) != offsetof(struct Base, x) could potentially be the case.
I have not found clarification in the C standards. Hence, I am looking for further clarification on that.
13:26, restate my assumptions:
Assuming offsetof(struct Derived, super.x) != offsetof(struct Base, x)
struct Base {
int x;
void *hook;
};
struct Derived {
struct Base super;
};
With the assumption above, consider:
struct Base base = {42};
struct Derived derived;
base.hook = &base; /* Assuming offsetof(struct Base, x) == 0 */
derived.super = base;
(struct Base*)(derived.super.hook) == &base shall be true.
#include <stddef.h>
#include <stdio.h>
struct Base {
int x;
void *hook;
};
struct Derived {
struct Base super;
};
int main(void) {
struct Base base = {42};
struct Derived derived;
base.hook = &base; /* Assuming offsetof(struct Base, x) == 0 */
derived.super = base;
printf("Offset Base x: %zu\n", offsetof(struct Base, x));
printf("Offset Derived super: %zu\n", offsetof(struct Derived, super));
printf("Offset Derived super.x: %zu\n", offsetof(struct Derived, super.x));
printf("Offset Derived super.hook: %zu\n",
offsetof(struct Derived, super.hook));
printf("derived.super.hook == &base, yields %d\n",
(struct Base *)(derived.super.hook) == &base);
return 0;
}

However, why is it necessary for a POD structure to have another pointer void *foo::hook?
It isn't necessary. From the original question and answer:
This technique is more reliable, especially if you plan to write the "derived struct" in C++ and use virtual functions. In that case, the offset of the first member is often non-0 as compilers store run-time type information and the class' v-table there.
A C++ struct/class with virtual functions is not a POD. Any non-POD struct/class may place its data members at a nonzero offset, and that is the case the hook is there to handle.

Does inheritance via unwinding violate strict aliasing rule?

I have a struct X which inherits from struct Base. However, in my current setup, due to alignment, the size of X is 24B:
typedef struct {
double_t a;
int8_t b;
} Base;
typedef struct {
Base base;
int8_t c;
} X;
In order to save the memory, I'd like to unwind the Base struct, so I created struct Y which contains fields from Base (in the same order, always at the beginning of the struct), so the size of the struct is 16B:
typedef struct {
double_t base_a;
int8_t base_b;
int8_t c;
} Y;
Then I'm going to use an instance of struct Y in a function which expects a pointer to a Base struct:
void print_base(Base* b)
{
printf("%f %d\n", b->a, b->b);
}
// ...
Y data;
print_base((Base*)&data);
Does the code above violate the strict aliasing rule and cause undefined behavior?

First, Base and Y are not compatible types as defined by the standard in 6.2.7; among other things, all members must match.
To access an Y through a Base* without creating a strict aliasing violation, Y needs to be "an aggregate type" (it is) that contains a Base type among its members. It does not.
So it is a strict aliasing violation and furthermore, since Y and Base are not compatible, they may have different memory layouts. Which is kind of the whole point, you made them different types for that very reason :)
What you can do in situations like this, is to use unions with struct members that share a common initial sequence, which is a special allowed case. Example of valid code from C11 6.5.2.3:
union {
struct {
int alltypes;
} n;
struct {
int type;
int intnode;
} ni;
struct {
int type;
double doublenode;
} nf;
} u;
u.nf.type = 1;
u.nf.doublenode = 3.14;
/* ... */
if (u.n.alltypes == 1)
if (sin(u.nf.doublenode) == 0.0)
/* ... */

Complicated structure offsets in contiguous space

I would like to know if there is an elegant alternative to this:
struct A{
uint64_t w;
uint64_t x;
uint64_t y;
uint64_t z;
};
struct B{
uint32_t a;
uint16_t b;
};
void function(uint32_t length){
//we have one struct A at the head and multiple struct B.
struct B *ptr = malloc(length * sizeof(struct B) + sizeof(struct A));
//we set random values in the head:
struct A * tmp = (struct A*)ptr;
tmp->w = 1000;
tmp->x = 1200;
tmp->y = 99;
tmp->z = ~(0ULL);
/*then we set the first element of type B.
*this is where my question lies
*/
// put the pointer at the right position:
tmp++;
//convert that position again:
struct B * right_position = (struct B*)tmp;
...// do things with B type.
}
Obviously, it would be simpler to have those fitted like so:
struct root{
struct A child1;
struct B *child2;
};
But my question is much more about how to express those offsets properly without writing tmp++.
How could I directly access the first B element on that array without using tmp++?
Again, this is not how I would do it in real code. This is just.. kind of art we are discussing here, if you will :)

Perhaps struct B * right_position = (struct B*)((char *)ptr + sizeof(struct A));. The (char *) cast makes the pointer arithmetic be performed in bytes.

struct A *a_ptr = malloc(sizeof(struct A) + length * sizeof(struct B));
struct B *b_ptr = (struct B *)(a_ptr + 1);

Maybe you should create a structure type with a flexible array member:
struct Both
{
struct A a;
struct B b[];
};
struct Both *c = malloc(sizeof(*c) + length * sizeof(c->b[0]));
c->a.w = 1000;
c->a.x = 1200;
c->a.y = 99;
c->a.z = ~(0ULL);
c->b[0].a = 37;
c->b[0].b = 59;
This guarantees alignment and doesn't require any casting or other chicanery. It's a part of C99 and C11 and replaces the struct hack. The standard (ISO/IEC 9899:2011) says:
§6.7.2.1 Structure and union specifiers
¶18 As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
The standard then has paragraphs 20-25 with examples discussing aspects of the use of flexible array members.

Extending a structure

typedef struct A
{
int x;
}A;
typedef struct B
{
A a;
int d;
}B;
void fn(void)
{
B obj;
B *b = &obj;
((A*)b)->x = 10;
}
I read the above code snippet on SO: ((A*)b)->x is not good programming style, while b->a.x is good programming style, because if anybody adds something before the declaration "A a;" in struct B, it will not work. I don't understand why. I tried it too. Any suggestions, please?

That trick is used to emulate inheritance in C. It makes it possible to pass the address of an A or of a B to a function which expects a pointer to A.
This works because C guarantees that there is no padding before the first member of a struct. So if A is the first member of B, the memory layout at the beginning of B is always the same as that of A.
int doStuff(A * a) {
return a->x + 1;
}
...
B b;
doStuff((A*)&b); // Will work because b and b.a have the same start address
If you would change B declaration:
typedef struct B
{
int d;
A a;
}B;
this would no longer work, as (A*)&b would yield the address of b.d, not b.a.

What you have here is a "poor man's inheritance". Like true inheritance in C++ it's used to define a type which comprises the common features (data, function pointers) of objects which may in fact carry more information than just that common subset.
That technique is widely used in e.g. GhostScript where printer drivers carry some common information and on top special information to control that particular printer model.
The C language mechanism employed here is that a struct is essentially the concatenation of its data in memory, in the order of member declaration. That order is important for getting access right after casting.
The memory layout of your B is |---int x---|---int d ---|. There is no additional information stored. An A * points to the first element, x; so does a B *. You could have a struct C:
struct C
{
B b;
float f;
};
whose layout would be |---int x---|---int d ---|-----float f---|. The interesting thing is that you can pass an A *pa to a function which somehow knows that pa actually points to a C and cast it "down": ((C *)pa)->f. (C *)pa does not change the value of pa but just tells the compiler what it points to (at the responsibility of the programmer). Which type is actually hidden behind the pointer is often encoded in an enum/int data member that is manually set to a magic, type-indicating value when the object is created.

Variable as array size in struct

I implement a file's struct in my program but for some arrays in the struct I don't know the size. The size of the array is stored in another variable but it's unknown before the struct is filled in.
struct Vertex {
float x;
float y;
float z;
};
struct myFile {
ulong nVertices;
Vertex vertices[nVertices];
};
That gives an error: "error C2065: 'nVertices' : undeclared identifier".

You should store a pointer in your struct:
Vertex *vertices;
Then allocate the memory at runtime:
myFile f;
f.vertices = malloc(nVertices * sizeof(Vertex));
if (f.vertices == 0)
handle_out_of_memory();
f.nVertices = nVertices;
Remember to free the memory when done:
free(f.vertices);

C99 introduces 'flexible array members', which may be what you want to use. Your code still ends up looking remarkably like the code suggested by @frast, but is subtly different.
§6.7.2.1 Structure and union specifiers
A structure or union shall not contain a member with incomplete or function type (hence, a structure shall not contain an instance of itself, but may contain a pointer to an instance of itself), except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.
[...]
As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length (footnote 106). Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
EXAMPLE Assuming that all array members are aligned the same, after the declarations:
struct s { int n; double d[]; };
struct ss { int n; double d[1]; };
the three expressions:
sizeof (struct s)
offsetof(struct s, d)
offsetof(struct ss, d)
have the same value. The structure struct s has a flexible array member d.
If sizeof (double) is 8, then after the following code is executed:
struct s *s1;
struct s *s2;
s1 = malloc(sizeof (struct s) + 64);
s2 = malloc(sizeof (struct s) + 46);
and assuming that the calls to malloc succeed, the objects pointed to by s1 and s2 behave as if the
identifiers had been declared as:
struct { int n; double d[8]; } *s1;
struct { int n; double d[5]; } *s2;
Following the further successful assignments:
s1 = malloc(sizeof (struct s) + 10);
s2 = malloc(sizeof (struct s) + 6);
they then behave as if the declarations were:
struct { int n; double d[1]; } *s1, *s2;
and:
double *dp;
dp = &(s1->d[0]); // valid
*dp = 42; // valid
dp = &(s2->d[0]); // valid
*dp = 42; // undefined behavior
The assignment:
*s1 = *s2;
only copies the member n and not any of the array elements. Similarly:
struct s t1 = { 0 }; // valid
struct s t2 = { 2 }; // valid
struct ss tt = { 1, { 4.2 }}; // valid
struct s t3 = { 1, { 4.2 }}; // invalid: there is nothing for the 4.2 to initialize
t1.n = 4; // valid
t1.d[0] = 4.2; // undefined behavior
106) The length is unspecified to allow for the fact that implementations may give array members different
alignments according to their lengths.
The example is from the C99 standard.