Is a pointer to the struct aligned as if it were a pointer to the first element?
or
Is a conversion between a pointer to a struct and a pointer to the type of its first member (or vice versa) ever UB?
(I hope they are the same question...)
struct element
{
    tdefa x;
    tdefb y;
};
int foo(struct element* e);
int bar(tdefa* a);
~~~~~
tdefa i = 0;
foo((struct element*)&i);
or
struct element e;
bar((tdefa*)&e);
Here tdefa and tdefb could be defined as any type.
Background:
I asked this question, and a user in a comment on one of the answers brought up C11 6.3.2.3p7, which states:
"A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined."
However, I am having trouble working out when this would become an issue; my understanding was that padding would allow all members of the struct to be aligned correctly. Have I misunderstood?
And if:
struct element e;
tdefa* a = &e.x;
works, then:
tdefa* a = (tdefa*)&e;
should work too.
There is never any initial padding; the first member of a struct is required to start at the same address as the struct itself.
You can always access the first member of a struct by converting a pointer to the whole struct into a pointer to the type of the first member.
Your foo example might run into trouble because foo will expect its argument to point to a struct element, which in fact it does not. There might also be an alignment mismatch, since a struct element can require stricter alignment than a lone tdefa.
However, the bar example and the final example are fine.
A pointer to a structure, suitably converted, always points to its initial member.
Here is the citation directly from the C99 standard (6.7.2.1, paragraph 13), emphasis mine:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
As for your foo and bar examples:
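The call to bar will be fine: bar expects a pointer to a tdefa, and (tdefa*)&e points to e.x, which is exactly that.
The call to foo, however, is problematic. foo expects a pointer to a full struct element, but you're only passing the address of a lone tdefa (while the struct consists of both a tdefa and a tdefb). Any access to the y member through that pointer would go past the end of i.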
Related
Consider an arbitrary struct where the C compiler will perform padding:
struct node {
    enum type type;
    size_t num_children;
    void** nodes;
};
Will C ever perform padding before the first element? I ask because I need to do some funky things with void* and require that
void* a = node->nodes[0];
enum type t = *(enum type*)(a);
will always be evaluated correctly. I'm aware that I can force no padding but would rather not.
Will C ever perform padding before the first element?
No. This is explicitly prohibited in the C standard:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
(emphasis mine).
I'm implementing a binary tree in C89, and I'm trying to share common attributes among all node structs through composition. Thus I have the following code:
enum foo_type
{
    FOO_TYPE_A,
    FOO_TYPE_B
};
struct foo {
    enum foo_type type;
};
struct foo_type_a {
    struct foo base;
    struct foo * ptr;
};
struct foo_type_b {
    struct foo base;
    char * text;
};
I'm including a member of type struct foo in all struct definitions as their initial member in order to provide access to the value of the enum foo_type member regardless of the struct type. To achieve this, I'm relying on a pointer to a structure object pointing to its initial member, but I'm not sure whether this assumption holds in this case. The C99 standard states the following (see ISO/IEC 9899:1999 6.7.2.1 §13):
A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
Although all structs share a common struct foo object as their initial member, padding comes into play. While struct foo only has a single member, which is int-sized, both struct foo_type_a and struct foo_type_b include pointer members, which in some cases increase the alignment and thus add padding.
So, considering this scenario, does the C programming language (C89 or any subsequent version) ensure that it's safe to access the value of struct foo::type through a pointer to an object, whether that object is of type struct foo or includes an object of type struct foo as its first member, such as struct foo_type_a or struct foo_type_b?
As you yourself quote from the C Standard, what you describe is supported by C99 and later versions.
It appears it was also supported by C89, as the language you quoted was already present in the ANSI C document from 1988:
3.5.2.1 Structure and union specifiers
...
Within a structure object, the non-bit-field members and the units
in which bit-fields reside have addresses that increase in the order
in which they are declared. A pointer to a structure object, suitably
cast, points to its initial member (or if that member is a bit-field,
then to the unit in which it resides), and vice versa. There may
therefore be unnamed holes within a structure object, but not at its
beginning, as necessary to achieve the appropriate alignment.
This code snippet prints the value 5. I don't understand why.
#include <stdio.h>
struct A
{
    int x;
};
struct B
{
    struct A a;
    int y;
};
void printA(struct A *a)
{
    printf("A obj: %d\n", a->x);
}
int main(void)
{
    struct B b = {
        {
            5
        },
        10
    };
    struct A *a = (struct A*)&b;
    printA(a);
    printf("Done.\n");
    return 0;
}
When I create b, a pointer to it would point to the data { {5}, 10 }.
When I cast &b to struct A*, I'm assuring the compiler that this struct A* points to a struct of a single data element of data type int. Instead, I'm providing it a pointer to a struct of two data elements of data types struct A, int.
Even if the second variable is ignored (since struct A has only one data member) I am still providing it a struct whose member is of data type struct A, not int.
Thus, when I pass in a to printA, the line a->x is performed, essentially asking to access the first data element of a. The first data element of a is of data type struct A, which is a type mismatch due to the %d expecting a digit, not a struct A.
What exactly is happening here?
When I create b, a pointer to it would point to the data { {5}, 10 }.
Yes, in the sense of that being the text of a type-appropriate and value-correct C initializer. That text itself should not be taken literally as the value of the structure.
When I cast &b to struct A*, I'm assuring the compiler that this
struct A* points to a struct of a single data element of data type
int.
No, not exactly. You are converting the value of the expression &b to type struct A *. Whether the resulting pointer actually points to a struct A is a separate question.
Instead, I'm providing it a pointer to a struct of two data
elements of data types struct A, int.
No, not "instead". Given that struct B's first member is a struct A, and that C forbids padding before the first member of a structure, a pointer to a struct B also points to a struct A (the B's first member) in a general sense. As @EricPostpischi observed in comments, the C standard explicitly specifies the outcome in your particular case: given struct B b, converting a pointer to b to type struct A * yields a pointer to b's first member, a struct A.
Even if the second variable is ignored (since struct A has only one
data member) I am still providing it a struct whose member is of data
type struct A, not int.
The first sizeof(struct A) bytes of the representation of a struct B form the representation of its first member, a struct A. That the latter is a member of the former has no physical manifestation other than their overlap in memory.
Even if the language did not explicitly specify it, given your declaration of variable b as a struct B, there would be no practical reason to expect that the expression (struct A*)&b == &b.a would evaluate to false, and there can be no question that the right-hand pointer can be used to access a struct A.
Thus, when I pass in a to printA, the line a->x is performed,
essentially asking to access the first data element of a.
Yes, and this is where an assertion enters that a really does point to a struct A. Which it does in your case, as already discussed.
The first
data element of a is of data type struct A,
No. *a is by definition a struct A. Specifically, it is the struct A whose representation overlaps the beginning of the representation of b. If there were not such a struct A then the behavior would be undefined, but that's not an issue here. Like every struct A, it has a member, designated by x, that is an int.
which is a type mismatch
due to the %d expecting a digit, not a struct A.
You mean expecting an int. And that's what it gets. That's what the expression a->x reads, supposing the behavior is defined at all, because that is the type of that expression. Under different circumstances the behavior might indeed not be defined, but under no circumstance does that expression ever provide a struct A.
What exactly is happening here?
What seems to be happening is that you are imagining different, higher-level semantics than C actually provides. In particular, you seem to have a mental model of structures as lists of distinguishable member objects, and that's leading you to form incorrect expectations.
Perhaps you are more familiar with a weakly typed language such as Perl, or a dynamically typed language such as Python, but C works differently. You cannot look at a C object and usefully ask "what is your type?" Instead, you look at each and every object through the lens of the static type of the expression used to access it.
The language-lawyer explanation of why the code is fine:
Any object pointer in C may be converted to any other object pointer type (C17 6.3.2.3 §7).
Whether it is safe to dereference the pointed-at object after conversion depends on 1) whether the resulting pointer is correctly aligned for the new type, and 2) whether the respective pointer types used are allowed to alias.
As a special case, a pointer to a struct, suitably converted, points to its first member. The relevant part of C17 6.7.2.1 §15 says:
A pointer to a structure object,
suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in
which it resides), and vice versa.
This means that (struct A*)&b is fine. &b is suitably converted to the correct type.
There is no violation of "strict aliasing", since we fulfil C17 6.5 §7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
...
an aggregate or union type that includes one of the aforementioned types among its members
The effective type of the initial member is struct A, so the lvalue access that happens inside the print function is fine. struct B is also an aggregate type that includes struct A among its members, so strict aliasing violations are impossible here, regardless of the initial member rule cited at the top.
There is a special rule in the C standard for this case. C 2011 6.7.2.1 15 says:
A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.
I've got a struct x:
struct x {
__s32 array[10];
};
How can I create a pointer to the array x->array if I only have a pointer to the structure?
The straightforward method is the commonly used one:
struct x * ptr = NULL;
//allocation
__s32 * otherPtr = ptr->array; //array name decays to pointer to first member
__s32 (*p) [10] = &(ptr->array); // pointer to whole array.
Otherwise, there's another way, but only for specialized cases. Quoting C11 §6.7.2.1, Structure and union specifiers:
[...] A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
So, if the array is the first member (or the only member, as in the example above) of the structure, the pointer to the structure, suitably converted to the proper type, will also point to the beginning of the array member.
In this case, you can use a cast of (__s32 (*)[10]).
The correct way is
__s32 *pointer = x->array;
It is equivalent to
__s32 *pointer = &(x->array[0]);
Suppose I have some structures defined like:
struct foo { int a; };
struct bar { struct foo r; int b; };
struct baz { struct bar z; int c; };
Does the C standard guarantee that the following code is strictly conforming?
struct baz x;
struct foo *p = (void *)&x;
assert(p == &x.z.r);
The motivation for this construct is to provide a consistent programming idiom for casting to a pointer type that is known to be compatible.
Now, this is what C says about how structures and their initial members are convertible:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
C.11 §6.7.2.1¶15
This is what it says about void pointer conversions:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
C.11 §6.3.2.3¶1
And this is what it says about converting between object pointer types:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned (footnote 68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
C.11 §6.3.2.3¶7
My understanding from the above is that converting an object pointer to an object pointer of a different type via a void * conversion is perfectly fine. However, I got a comment that suggests otherwise.
Your example is strictly conforming.
The second sentence of §6.7.2.1 ¶15 (A pointer to a structure object, suitably converted, points to its initial member ... and vice versa.) guarantees the following equalities:
(struct bar *) &x == &(x.z)
(struct foo *) &(x.z) == &(x.z.r)
As you are at the beginning of the struct, no padding can occur, and my understanding of the standard is that the address of a struct and of its first element are the same.
So struct foo *p = (void *) &x; is correct as would be struct foo *p = (struct foo *) &x;
In that particular case, the alignment is guaranteed to be correct per §6.7.2.1 ¶15. And it is always allowed to pass via a void *, but it is not necessary, because §6.3.2.3 ¶7 allows the conversion between pointers to different object types, provided there is no alignment problem.
It should also be noted that §6.3.2.3 ¶7 says: "When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object." That means all those pointers in fact point to the lowest addressed byte of x.z.r.a, so you could also pass via pointers to char, because we also have:
(char *) &x == (char *) &(x.z) == (char *) &(x.z.r) == (char *) &(x.z.r.a)
To complete the analysis, it is necessary to see the definition of how pointer equality is defined:
... If one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void, the former is converted to the type of the latter.
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
C.11 §6.5.9¶5-6
So, here is an argument that it is well-defined:
(void *)&x == (void *)&x.z §6.7.2.1¶15, §6.5.9¶5-6
(void *)&x.z == &x.z.r §6.7.2.1¶15, §6.5.9¶5-6
(void *)&x == &x.z.r transitive equality
(struct foo *)(void *)&x == (void *)&x §6.3.2.3¶1, §6.5.9¶5-6
(struct foo *)(void *)&x == &x.z.r transitive equality
The last step above is the essence of initializing p and the assertion from the code in the question.
IMHO yes. Setting aside what a strict interpretation of the standard might make questionable, memory allocation on a given compiler follows the same rule every time: it supplies an address suitable for any kind of variable. Because each structure starts with its first member, by transitivity the structure's address is also correctly aligned for that member (and, recursively, for the first member of any nested structure). That removes the doubt that the nested structures could have different addresses: no adjustment can occur between conversions, because the addresses are determined by the same rules. That is of course not true for subsequent structure fields, which may not be contiguous because of their own alignment requirements.
If you take the address of a structure, it is guaranteed to be the same as the address of its first field.
Now have a look at one of the most widely used pieces of software around: the Independent JPEG Group's JPEG library (libjpeg).
The whole library, compiled on many processors and machines, uses a technique resembling C++ inheritance: it passes around structures whose beginning is always the same but which hold many other, different substructures and fields between calls.
This code compiles and runs on everything from toys to PCs to tablets.
Yes, in terms of the language standard your example is strictly conforming and thus perfectly legal. This essentially follows from two of the quotes you provided (the important parts are highlighted). The first one:
A pointer to void may be converted to or from a pointer to any object type.
This means that the initialization in your code performs a successful conversion from a struct baz pointer to a void pointer and then a successful conversion from the void pointer to a struct foo pointer, because both pointers are aligned equally. If that were not the case, we would have undefined behaviour due to non-compliance with the 6.3.2.3 paragraph you quoted.
And the second one:
68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
And this one is more important. It does not state (nor should it) that types A and C must be the same, which in turn allows them to differ. The only restriction is alignment.
That's pretty much it.
Of course, such manipulations are still fragile in practice: they silently break if someone later reorders the members.