Members of a struct must be properly aligned, so the struct may be padded with unused bytes.
Size of the struct is always the same (since it's a constant expression).
So I presume the compiler must somehow use the same padding every time, for example when:
placing a single struct instance
passing a struct instance by value to a function
returning a struct instance by value from a function
in the array of structs
If this presumption is correct, can you please point me to where the C standard says so? Are there any strict rules for padding placement that guarantee this property?
C locks down what the offsets of fields are in a particular structure type via the specification of the offsetof macro, in §7.17 of this version of the draft, which is describing the required macros in <stddef.h>.
The macros are
NULL
which expands to an implementation-defined null pointer constant; and
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes, to the structure member (designated by member-designator),
from the beginning of its structure (designated by type). The type and member designator
shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the
specified member is a bit-field, the behavior is undefined.)
Since it is a constant expression, every use of a particular structure type must use the same offset (or this would all be completely crazy).
Two different struct types, even if declared the same, are not constrained by this to have the same layout. (For example, if the types were declared under the control of different structure layout pragmas, they would not be expected to necessarily have the same layout, yet the declarations would look prima facie identical. typedef is important!)
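As a minimal sketch of the point above (assuming a C11 compiler for static_assert; struct sample and value_offset_is_stable are hypothetical names used only for illustration), the following checks that a member sits at the same offset in a standalone object and in every element of an array:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, used only for illustration. */
struct sample {
    char   tag;
    double value;
    int    count;
};

/* offsetof yields an integer constant expression, so it can be checked
   at compile time. */
static_assert(offsetof(struct sample, tag) == 0,
              "no padding before the first member");
static_assert(offsetof(struct sample, value) >= sizeof(char),
              "value is placed after tag");

/* Returns 1 if `value` is found at offsetof(struct sample, value) in a
   standalone object and in every array element, and if consecutive array
   elements are exactly sizeof(struct sample) bytes apart. */
int value_offset_is_stable(void)
{
    struct sample single;
    struct sample many[4];
    size_t expected = offsetof(struct sample, value);

    if ((size_t)((char *)&single.value - (char *)&single) != expected)
        return 0;
    for (size_t i = 0; i < 4; i++) {
        if ((size_t)((char *)&many[i].value - (char *)&many[i]) != expected)
            return 0;
    }
    return (char *)&many[1] - (char *)&many[0]
           == (ptrdiff_t)sizeof(struct sample);
}
```

Calling value_offset_is_stable() should return 1 on any conforming implementation, whatever padding the compiler chose.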
Related
N2479 C17..C2x working draft — February 5, 2020 ISO/IEC 9899:202x (E) (emphasis added):
6.7.2.1 Structure and union specifiers
17 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
18 The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
Question: what is the exact definition of suitably converted?
Extra: if there is no exact definition of suitably converted, then shall the C implementation document its understanding? For example (C/C++ preprocessor domain), Microsoft understands the term single item (C++, N4713) as single, permanently indivisible preprocessor token (which leads to issues while porting code from gcc/clang/other, which has different understanding), however, it seems that they don't document their understanding of the single item.
In this context "suitably converted" means converted to a proper compatible type. For example:
#include <stdio.h>
struct mystruct {
double a;
int b;
};
int main()
{
struct mystruct s = { 2.5, 4 };
double *d = (double *)&s;
printf("%f\n", *d); // prints 2.500000
return 0;
}
Here the first member of struct mystruct has type double. So a "suitable conversion" in this case means a struct mystruct * may be converted via explicit cast to double * and it will point to the a member.
This isn't a term with special meaning.
You are dealing with conversion from type A to type B. Type A is a pointer-to-structure type. Type B is not specified except that it must be suitable for use as a pointer to the initial member. Therefore, it must be a pointer (so we are dealing only with conversions between two pointer types) and it must follow the strict aliasing rule (the destination type can be pointer to a narrow character type, std::byte, the actual type of the first member, or a type that is representation compatible such as differing only in signedness).
Any pointer conversion that results in a suitable pointer satisfies "suitably converted".
Question: what is the exact definition of suitably converted?
The C standard does not give any “exact definition” of “suitably converted.”
I interpret it to mean any sequence of conversions to a type of “pointer to type of initial member” or “pointer to type of structure” such that the specifications of the conversions ensure the final pointer is pointing to the appropriate address. (E.g., has not passed through a conversion with possibly incorrect alignment.)
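One way to picture this interpretation is the sketch below. It reuses struct mystruct from the earlier answer; the function name initial_member_round_trip is hypothetical. It converts a structure pointer to a pointer to the initial member, both directly and through void *, and then back again:

```c
#include <assert.h>
#include <stddef.h>

struct mystruct {
    double a;
    int b;
};

/* 6.7.2.1: there is no padding at the beginning of a structure. */
static_assert(offsetof(struct mystruct, a) == 0,
              "initial member starts at offset 0");

/* Returns 1 when every conversion below lands on the expected address. */
int initial_member_round_trip(void)
{
    struct mystruct s = { 2.5, 4 };

    double *direct = (double *)&s;   /* struct pointer -> initial member */
    void *generic = &s;              /* struct pointer -> void * */
    double *via_void = generic;      /* void * -> initial member */
    struct mystruct *back = (struct mystruct *)direct;  /* "and vice versa" */

    return direct == &s.a
        && via_void == &s.a
        && back == &s
        && *direct == 2.5
        && back->b == 4;
}
```

Each intermediate pointer is correctly aligned for its target type, so none of the conversions runs into the alignment caveat.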
Extra: if there is no exact definition of suitably converted, then shall the C implementation document its understanding?
The C standard does not impose any requirement on a C implementation to document its understanding or interpretation of “suitably converted.”
It's not a formal term, but between the lines we can tell that it is used to mean a valid pointer conversion. And it needs to be carried out explicitly by the programmer by means of a cast.
In C, pretty much any object pointer can be converted to another object pointer type by casting. What happens if you then dereference such a pointer through the wrong type is a whole different story, though. Most of the time it's poorly-defined behavior.
Valid pointer conversions in this case:
A pointer to a type compatible with the type of the first member.1) 2) 3)
A void pointer. 1) 2)
A pointer to character type.2)
A pointer to another structure type, where both structures are part of a union and share common initial member(s) of compatible type.4)
Optionally the pointer in any of the above cases can be type qualified. Unless the first member is a qualified type, in which case the pointer needs to share all qualifiers of that type. 1) 3)
1) The rules of simple assignment 6.5.16.1.
2) The rules regarding pointer conversions (6.3.2.3).
3) The rules of compatible type qualifiers (6.7.3).
4) The rule of common initial sequence (6.5.2.3).
The common initial sequence one is an oddball rule which apparently got poor compiler support, but it's in line with "strict aliasing".
Consider the following code:
#include <stdio.h>
int main()
{
typedef struct { int first; float second; } type;
type whole = { 1, 2.0 };
void * vp = &whole;
struct { int first; } * shorn = vp;
printf("values: %d, %d\n", ((type *)vp)->first, shorn->first);
if (vp == shorn)
printf("ptrs compare the same\n");
return 0;
}
Two questions:
Is the pointer equality comparison UB?
Regarding the "shearing" away of the second member on the line that initializes shorn: is it valid C to cast away struct members like this and then dereference the manipulated pointer to access the remaining member?
Comparing two pointers with == when one is a void * is well defined.
Section 6.5.9 of the C standard regarding the equality operator == says the following:
2 One of the following shall hold:
both operands have arithmetic type;
both operands are pointers to qualified or unqualified versions of compatible types;
one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or
one operand is a pointer and the other is a null pointer constant
...
5 Otherwise, at least one operand is a pointer. If one operand is a pointer and the other is a null pointer constant, the null pointer
constant is converted to the type of the pointer. If one operand
is a pointer to an object type and the other is a pointer
to a qualified or unqualified version of void, the former is
converted to the type of the latter.
The usage of shorn->first works because a pointer to a struct can be converted to a pointer to its first member. For both type and the unnamed struct type their first member is an int so it works out.
Section 6.2.5 Types paragraph 28 of the C standard says:
[...] All pointers to structure types shall have the same representation and alignment requirements as each other. [...]
Section 6.3.2.3 Pointers paragraph 1 says:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
And paragraph 7 says:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. [...]
And footnote 68 says:
In general, the concept "correctly aligned" is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
Because all pointers to structure types have the same representation, the conversions between pointers to void and pointers to structure types must be the same for all pointers to structure types. So it seems that a pointer to structure type A could be converted by a cast operator directly to a pointer to structure type B without an intermediate conversion to a pointer to void as long as the pointer is "correctly aligned" for structure type B. (This may be a weak argument.)
The question remains when, in the case of two structure types A and B where the initial sequence of structure type A consists of all the members of structure type B, a pointer to structure type A is guaranteed to be correctly aligned for structure type B (the reverse is obviously not guaranteed). As far as I can tell, the C standard makes no such guarantee. So strictly speaking, a pointer to the larger structure type A might not be correctly aligned for the smaller structure type B, and if it is not, the behavior is undefined. For a "sane" compiler, the larger structure type A would not have weaker alignment than the smaller structure type B, but for an "insane" compiler, that might not be the case.
Regarding the second question about accessing members of the truncated (shorter) structure using the pointer derived from the full (longer) structure, then as long as the pointer is correctly aligned for the shorter structure (see above for why that might not be true for an "insane" compiler), and as long as strict aliasing rules are avoided (for example, by going through an intermediate pointer to void in an intermediate external function call across compilation unit boundaries), then accessing the members through the pointer to the shorter structure type should be perfectly fine. There is a special guarantee for that when objects of both structure types appear as members of the same union type. Section 6.5.2.3 Structure and union members paragraph 6 says:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union
object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union
is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
However, since the offsets of members within a structure type do not depend on whether an object of the structure type appears in a union type or not, the above implies that any structures with a common initial sequence of members will have those common members at the same offsets within their respective structure types.
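A small sketch of both halves of this argument, assuming a C11 compiler (the struct circle, struct rect and union shape types and the function names are hypothetical): the static assertions check that the common initial members have matching offsets on the implementation at hand, and shape_id inspects the common part through the union, which is the case the standard explicitly permits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types sharing the common initial sequence { int; int; }. */
struct circle { int kind; int id; double radius; };
struct rect   { int kind; int id; float w, h; };

union shape {
    struct circle c;
    struct rect   r;
};

/* Offsets of the common initial members, checked at compile time. */
static_assert(offsetof(struct circle, kind) == offsetof(struct rect, kind),
              "kind has the same offset in both structures");
static_assert(offsetof(struct circle, id) == offsetof(struct rect, id),
              "id has the same offset in both structures");

/* The union type is visible here, so the common initial part may be
   inspected through either member (6.5.2.3). */
int shape_id(const union shape *s)
{
    return s->r.id;
}

/* Stores a circle, then reads the id through the rect member. */
int shape_demo(void)
{
    union shape s;
    s.c = (struct circle){ 1, 42, 2.0 };
    return shape_id(&s) == 42;
}
```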
In the language the C89 Standard was written to describe, it was well established that if two structures share a common initial sequence, a pointer to either may be cast to the other and used to inspect members of that common initial sequence. Code which relied upon this was commonplace and not considered even remotely controversial.
In the interest of optimization, the authors of the C99 Standard deliberately allowed compilers to assume that structures of different types won't alias in cases where such an assumption would be useful for their customers. Because there are many good means by which implementations could recognize cases where such assumptions would needlessly break what had been perfectly good code, and because the authors of the Standard expected that compiler writers would make a bona fide effort to behave in ways useful to the programmers using their products, the Standard doesn't mandate any particular means of making such distinctions. Instead, it regards the ability to support constructs that had been universally supported as a "quality of implementation" issue, which would be reasonable if compiler writers made a bona fide effort to treat it as such.
Unfortunately, some compiler writers who aren't interested in selling their products to paying customers have taken the Standard's failure to mandate useful behavior as an invitation to behave in needlessly useless fashion. Code which relies upon Common Initial Sequence guarantees can thus not be meaningfully processed by clang or gcc without using either non-standard syntax or disabling type-based aliasing entirely.
I wanted to make something like a binary search tree for mapping addresses to Page *s (except it was actually hexadecimal, and the address is implied by the structure itself), so:
typedef union Map Map;
union Map {
Map (*ps)[16];
Page *p;
};
This makes perfect sense, logically (a union Map containing either a pointer to a Page or a pointer to an array of 16 Maps), but gcc errors at the ps declaration with array type has incomplete element type, so I guess this kind of recursive definition is not allowed in C.
Is there any way to do this without using tricks like pointer aliasing?
C 2018 6.7.6.2 1 specifies the constraints for array declarators. It says, in part:
The element type shall not be an incomplete or function type.
In Map (*ps)[16], the (*ps)[16] is a declarator as shown in the grammar at 6.7.6 1. Since it is an array declarator, it is subject to the rules in 6.7.6.2, and therefore the element type must be complete. This is true even though the ultimate type being declared is a pointer.
As noted in a comment, you can declare Map *ps instead. If this is unsatisfactory because then pointer arithmetic on ps works in units of Map instead of Map [16], an alternative could be to define typedef struct Map16 Map16; before the union, then Map16 *ps; inside the union, and then struct Map16 { Map element[16]; } after the union. That will make pointer arithmetic on ps work in the desired units (supposing the implementation does not pad the structure, which would be unusual), although it does make you use an extra .element when referencing elements.
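The Map16 workaround described above might look like the following sketch. Page is assumed to be an opaque structure type (the question does not show its definition), and map_child and map_demo are hypothetical helpers added for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Page Page;     /* assumed opaque; not shown in the question */
typedef union Map Map;
typedef struct Map16 Map16;   /* incomplete here; pointers to it are fine */

union Map {
    Map16 *ps;   /* pointer to a block of 16 Maps */
    Page  *p;
};

struct Map16 {
    Map element[16];   /* Map is a complete type at this point */
};

/* Assumption: the implementation adds no padding to Map16. */
static_assert(sizeof(Map16) == 16 * sizeof(Map), "Map16 has no padding");

/* Selects one of the 16 children by hexadecimal digit. */
Map *map_child(Map *node, unsigned nibble)
{
    return &node->ps->element[nibble & 0xFu];
}

/* Returns 1 if lookup and pointer arithmetic behave as described. */
int map_demo(void)
{
    static Map16 level;
    Map root = { .ps = &level };
    Map *child = map_child(&root, 0xA);

    return child == &level.element[10]
        && (char *)(root.ps + 1) - (char *)root.ps
           == (ptrdiff_t)sizeof(Map16);
}
```

Pointer arithmetic on ps moves in units of 16 Maps, at the cost of the extra .element when indexing.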
In considering why Map *ps is accepted while Map (*ps)[16] is not, we can see that both declare pointers to incomplete types, and so it is not the completeness of the pointed-to type that distinguishes them but merely this rule in the C standard. It may be that the rule in 6.7.6.2 1 could have been modified to permit Map (*ps)[16], as it does not appear the compiler needs complete information about the pointed-to type at that point.
I can't seem to wrap my head around certain parts of the C standard, so I'm coming here to clear up that foggy, anxious uncertainty that comes when I have to think about which of these tricks are defined behaviour and which are undefined or violate the standard. I don't care whether or not it will WORK, I care if the C standard considers it legal, defined behaviour.
Such as this, which I am fairly certain is UB:
struct One
{
int Hurr;
char Durr[2];
float Nrrr;
} One;
struct Two
{
int Hurr;
char Durr[2];
float Nrrr;
double Wibble;
} Two;
One = *(struct One*)&Two;
This is not all I am talking about. Such as casting the pointer to One to int*, and dereferencing it, etc. I want to get a good understanding of what such things are defined so I can sleep at night. Cite places in the standard if you can, but be sure to specify whether it's C89 or C99. C11 is too new to be trusted with such questions IMHO.
I think that technically that example is UB, too. But it will almost certainly work, and neither gcc nor clang complain about it with -pedantic.
To start with, the following is well-defined in C99 (§6.5.2.3/6): [1]
union OneTwo {
struct One one;
struct Two two;
};
union OneTwo tmp = {.two = {3, {'a', 'b'}, 3.14f, 3.14159} };
struct One one = tmp.one;
The fact that accessing the "punned" struct One through the union must work implies that the layout of the prefix of struct Two is identical to struct One. This cannot be contingent on the existence of a union because a given composite type can only have one storage layout, and its layout cannot be contingent on its use in a union because the union does not need to be visible to every translation unit in which the struct is used.
Furthermore, in C all types are no more than a sequence of bytes (unlike, for example, C++) (§6.2.6.1/4) [2]. Consequently, the following is also guaranteed to work:
struct One one;
struct Two two = ...;
unsigned char tmp[sizeof one];
memcpy(tmp, &two, sizeof one);
memcpy(&one, tmp, sizeof one);
Given the above and the convertibility of any pointer type to a void*, I think it is reasonable to conclude that the temporary storage above is unnecessary, and it could have been written directly as:
struct One one;
struct Two two = ...;
memcpy(&one, &two, sizeof one);
From there to the direct assignment through an aliased pointer as in the OP is not a very big leap, but there is an additional problem for the aliased pointer: it is theoretically possible for the pointer conversion to create an invalid pointer, because it's possible that the bit format of a struct Two* differs from a struct One*. Although it is legal to cast one pointer type to another pointer type with looser alignment (§6.3.2.3/7) [3] and then convert it back again, it is not guaranteed that the converted pointer is actually usable, unless the conversion is to a character type. In particular, it is possible that the alignment of struct Two is different from (more strict than) the alignment of struct One, and that the bit format of the more strongly-aligned pointer is not directly usable as a pointer to the less strongly-aligned struct. However, it is hard to see an argument against the almost equivalent:
one = *(struct One*)(void*)&two;
although this may not be explicitly guaranteed by the standard.
In comments, various people have raised the spectre of aliasing optimizations. The above discussion does not touch on aliasing at all because I believe that it is irrelevant to a simple assignment. The assignment must be sequenced after any preceding expressions and before any succeeding ones; it clearly modifies one and almost as clearly references two. An optimization which made a preceding legal mutation of two invisible to the assignment, would be highly suspect.
But aliasing optimizations are, in general, possible. Consequently, even though all of the above pointer casts should be acceptable in the context of a single assignment expression, it would certainly not be legal behaviour to retain the converted pointer of type struct One* which actually points into an object of type struct Two and expect it to be usable either to mutate a member of its target or to access a member of its target which has otherwise been mutated. The only context in which you could get away with using a pointer to struct One as though it were a pointer to the prefix of struct Two is when the two objects are overlaid in a union.
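For reference, the memcpy variant from this answer can be written out as a compilable sketch. It assumes a C11 compiler and, like the answer, relies on the prefix of struct Two being laid out like struct One; the static assertions check that on the implementation at hand. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct One { int Hurr; char Durr[2]; float Nrrr; };
struct Two { int Hurr; char Durr[2]; float Nrrr; double Wibble; };

/* The layout assumptions the copy relies on, checked at compile time. */
static_assert(offsetof(struct One, Durr) == offsetof(struct Two, Durr),
              "Durr at the same offset");
static_assert(offsetof(struct One, Nrrr) == offsetof(struct Two, Nrrr),
              "Nrrr at the same offset");
static_assert(sizeof(struct One) <= sizeof(struct Two),
              "the copy stays inside the source object");

/* Copies the leading bytes of *two into a struct One without forming a
   struct One * that points into a struct Two. */
struct One prefix_of(const struct Two *two)
{
    struct One one;
    memcpy(&one, two, sizeof one);
    return one;
}

/* Returns 1 if all common members arrive intact. */
int prefix_copy_demo(void)
{
    struct Two two = { 3, { 'a', 'b' }, 3.5f, 3.14159 };
    struct One one = prefix_of(&two);

    return one.Hurr == 3
        && one.Durr[0] == 'a'
        && one.Durr[1] == 'b'
        && one.Nrrr == 3.5f;
}
```

Because only bytes are copied, no lvalue of type struct One is ever used to access the struct Two object.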
--- Standard references:
[1] "if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible."
[2] "Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT
bits, where n is the size of an object of that type, in bytes. The value may be copied into
an object of type unsigned char [n] (e.g., by memcpy)…"
[3] "A pointer to an object type may be converted to a pointer to a different object type… When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object."
C99 6.7.2.1 says:
Para 5
As discussed in 6.2.5, a structure is a type consisting of a sequence
of members, whose storage is allocated in an ordered sequence
Para 12
Each non-bit-field member of a structure or union object is aligned in
an implementation-defined manner appropriate to its type.
Para 13
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There
may be unnamed padding within a structure object, but not at its
beginning
That last paragraph covers your second question (casting the pointer to One to int*, and dereferencing it).
The first point - whether it is valid to "Downcast" a Two* to a One* - I could not find specifically addressed. It boils down to whether the other rules ensure that the memory layout of the fields of One and the initial fields of Two are identical in all cases.
The members have to be packed in ordered sequence, no padding is allowed at the beginning, and they have to be aligned according to type, but the standard does not actually say that the layout needs to be the same (even though in most compilers I am sure it is).
There is, however, a better way to define these structures so that you can guarantee it:
struct One
{
int Hurr;
char Durr[2];
float Nrrr;
} One;
struct Two
{
struct One one;
double Wibble;
} Two;
You might think you can now safely cast a Two* to a One* - Para 13 says so. However strict aliasing might bite you somewhere unpleasant. But with the example above you don't need to anyway:
One = Two.one;
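A sketch of the nested layout in use, assuming a C11 compiler (read_hurr and nested_prefix_demo are hypothetical names). Here the object at the start of struct Two really is a struct One, so both the plain member copy and the converted pointer refer to an object of the expected type:

```c
#include <assert.h>
#include <stddef.h>

struct One { int Hurr; char Durr[2]; float Nrrr; };
struct Two { struct One one; double Wibble; };

/* Para 13: no padding before the initial member. */
static_assert(offsetof(struct Two, one) == 0,
              "the initial member starts the structure");

/* Works on a standalone struct One or on one embedded in a struct Two. */
int read_hurr(const struct One *base)
{
    return base->Hurr;
}

/* Returns 1 if the copy and the converted pointer agree. */
int nested_prefix_demo(void)
{
    struct Two two = { { 7, { 'a', 'b' }, 1.5f }, 3.14159 };

    struct One copy = two.one;                          /* no cast needed */
    const struct One *view = (const struct One *)&two;  /* initial member */

    return copy.Hurr == 7
        && read_hurr(view) == 7
        && view == &two.one;
}
```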
A1. Undefined behaviour, because of Wibble.
A2. Defined.
S9.2 in N3337.
Two standard-layout struct (Clause 9) types are layout-compatible if
they have the same number of non-static data members and corresponding
non-static data members (in declaration order) have layout-compatible
types
Your structs would be layout compatible and thus interchangeable but for Wibble. There is a good reason too: Wibble might cause different padding in struct Two.
A pointer to a standard-layout struct object, suitably converted using
a reinterpret_cast, points to its initial member (or if that member is
a bit-field, then to the unit in which it resides) and vice versa.
I think that guarantees that you can dereference the initial int.
In the QEMU source code, I found the following macro named offsetof. Can anybody tell me what it does?
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *) 0)->MEMBER)
It's used in this manner :
offsetof(CPUState, icount_decr.u32)
where CPUState is a struct.
I think it gives the offset of the member inside a struct, but I'm not sure.
EDIT: Yeah, I found out what was happening. The definition of CPUState had a macro inside, which I missed, which included the variable icount_decr.
It gets the offset of the member of a struct. It does so by casting address zero to a pointer to a struct of that type, then taking the address of the member.
Your thinking is correct! And the name of the macro gives a good hint, too. ;)
It's defined in §7.17/3:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes, to the structure member (designated by member-designator),
from the beginning of its structure (designated by type). The type and member designator
shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the
specified member is a bit-field, the behavior is undefined.)
Because the library doesn't have to necessarily follow language rules, an implementation is free to get the result however it pleases.
So the result of this particular implementation is not undefined behavior, because you aren't supposed to care how it's implemented. (In other words, your implementation makes the guarantee that taking the address of an indirection through a null pointer is well-defined. You of course can't assume this in your own programs.)
If some library has (re)defined offsetof, they've made your program's behavior undefined and should be using the standard library instead. (The dummies.)
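For comparison, here is a minimal sketch that uses the standard offsetof from <stddef.h> with a nested member designator, as in the question. struct cpu_state_demo is a hypothetical stand-in and is not QEMU's actual CPUState definition; the helper names are also invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUState; the real QEMU type is different. */
struct cpu_state_demo {
    void *opaque;
    union {
        uint32_t u32;
        struct { uint16_t low, high; } u16;
    } icount_decr;
};

/* A union member starts at the address of the union itself. */
static_assert(offsetof(struct cpu_state_demo, icount_decr.u32)
              == offsetof(struct cpu_state_demo, icount_decr),
              "u32 starts where icount_decr starts");

/* Byte offset of the nested member from the start of the structure. */
size_t icount_offset(void)
{
    return offsetof(struct cpu_state_demo, icount_decr.u32);
}

/* Recomputes the member's address from the base address plus the offset. */
uint32_t *icount_ptr(struct cpu_state_demo *cpu)
{
    return (uint32_t *)((char *)cpu + icount_offset());
}

/* Returns 1 if the computed pointer refers to the member itself. */
int offsetof_demo(void)
{
    struct cpu_state_demo cpu = { 0 };
    cpu.icount_decr.u32 = 5;

    return icount_ptr(&cpu) == &cpu.icount_decr.u32
        && *icount_ptr(&cpu) == 5;
}
```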