With an anonymous union declared in a struct, you can access the members directly. This made sense, and I thought that, like a normal union, you could only read from the member that was most recently written to. Then I saw this:
#include<stdio.h>
struct Scope
{
// Anonymous union
union
{
char alpha;
int num;
};
};
int main()
{
struct Scope x;
x.num = 65;
// Note that members of union are accessed directly
printf("x.alpha = %c, x.num = %d", x.alpha, x.num);
return 0;
}
What then, is the point if I can just access all of the variables all of the time? Why not just declare the variables in the scope of "Scope"?
According to C11 6.2.6.1, the value of the accessed member is unspecified. In general, it could be a trap representation, which triggers undefined behavior.
(If you changed char to unsigned char, it would be safe, since unsigned char cannot have trap representations. So your program would run to completion and print something, but the C standard does not specify what value would be printed for x.alpha.)
Of course, any given implementation may specify what value you actually get (e.g. the low byte, or the high byte, of x.num). So such code is most likely intended to work only on such implementations, and not meant to be portable or standard-conforming.
As Peter notes, all this is independent of whether you use anonymous unions or the old-fashioned kind.
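As a minimal sketch of the unsigned char variant mentioned above (the names ScopeBytes and first_byte_of are made up for illustration), the following stores an int and reads it back through the one-byte member. The read cannot trap, but which byte comes back is up to the implementation:

```c
#include <assert.h>

/* The example is only interesting when int is wider than one byte. */
static_assert(sizeof(int) > 1, "sketch assumes a multi-byte int");

/* Same shape as the question's struct, but with unsigned char, so that
   reading the member that was not written cannot hit a trap
   representation. */
struct ScopeBytes {
    union {
        unsigned char alpha;
        int num;
    };
};

/* Stores n in num, then reads alpha. Which byte of num is returned is
   implementation-specific: typically the low byte on little-endian
   targets and the high byte on big-endian ones. */
unsigned char first_byte_of(int n)
{
    struct ScopeBytes x;
    x.num = n;
    return x.alpha;
}
```

Calling first_byte_of(65) would typically give 65 on a little-endian machine and 0 on a big-endian one, which is exactly the non-portability described above.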
The point is that struct Scope could have other members. A more realistic example:
struct Scope
{
union
{
int num;
uint8_t num_byte [sizeof(int)];
};
int foo;
};
Now you can access struct Scope members as obj.num or obj.num_byte[i]. After that union in memory, there will be a different variable foo, so clearly the union members can't get moved out to the struct.
If not for anonymous union, we'd have to type something like obj.name.num, where name is potentially just clutter.
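To make the difference in access syntax concrete, here is a small sketch that declares the same layout twice, once with a named union member and once with an anonymous one. The type and function names (NamedScope, AnonScope and the two helpers) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct NamedScope {
    union {
        int num;
        uint8_t num_byte[sizeof(int)];
    } name;              /* the union member has a name ... */
    int foo;
};

struct AnonScope {
    union {
        int num;
        uint8_t num_byte[sizeof(int)];
    };                   /* ... or, since C11, no name at all */
    int foo;
};

/* foo is a separate member that lives after the union in memory. */
static_assert(offsetof(struct AnonScope, foo) >= sizeof(int),
              "foo must not overlap the union");

int named_num_after_foo(int num, int foo)
{
    struct NamedScope s;
    s.name.num = num;    /* extra level of naming */
    s.foo = foo;         /* separate storage: does not disturb num */
    return s.name.num;
}

int anon_num_after_foo(int num, int foo)
{
    struct AnonScope s;
    s.num = num;         /* union member accessed directly */
    s.foo = foo;
    return s.num;
}
```

Both helpers return the num they were given; the only difference is whether the caller has to spell out the intermediate name.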
Regarding reading different union members, your statement "you could only read from the most recent value that has been written to" is not true in C. Note that C and C++ are different here.
C17 6.5.2.3 states:
A postfix expression followed by the . operator and an identifier designates a member of
a structure or union object. The value is that of the named member,95) and is an lvalue if
the first expression is an lvalue.
where footnote 95) is helpful:
95) If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type
punning’’). This might be a trap representation.
The part in 6.2.6 referred to by the footnote:
When a value is stored in a member of an object of union type, the bytes of the object
representation that do not correspond to that member but do correspond to other members
take unspecified values.
Where an operator is applied to a value that has more than one object representation,
which object representation is used shall not affect the value of the result. Where a
value is stored in an object using a type that has more than one object representation for
that value, it is unspecified which representation is used, but a trap representation shall
not be generated.
What this means in plain English is that C allows type punning, but it's the programmer's responsibility to ensure that it is feasible with regard to alignment, padding, trap representations, endianness and so on.
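A short sketch of what such punning can look like in practice, assuming float and uint32_t have the same size (checked at compile time). The function names are invented for this example; the memcpy version is shown next to the union version because both reinterpret the same object representation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Type punning through a union: store one member, read the other. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* The same reinterpretation done by copying the bytes. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Since uint32_t has no padding bits and no trap representations, both functions should return the same value for any given input; what that value means still depends on the platform's floating-point format.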
You can interpret the same memory segment in different ways. This is widely used in protocol implementations, where you receive a buffer of bytes and decode it as needed based on various flags.
In this example int and char have different sizes, so the value read through alpha also depends on the platform's byte order.
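The protocol use case could be sketched roughly as follows. The wire format (one tag byte followed by four payload bytes), the struct message type and decode_message are all assumptions made up for this illustration, not part of any real protocol:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PAYLOAD_SIZE = 4 };

static_assert(sizeof(uint32_t) == PAYLOAD_SIZE && sizeof(float) == PAYLOAD_SIZE,
              "sketch assumes 4-byte uint32_t and float");

struct message {
    uint8_t kind;                   /* hypothetical tag byte */
    union {
        uint8_t  raw[PAYLOAD_SIZE]; /* bytes exactly as received */
        uint32_t counter;           /* one way to interpret them */
        float    reading;           /* another way */
    } body;
};

/* Copies one message out of buf. Returns 0 on success, -1 if the
   buffer is too short. The payload is stored as raw bytes; the caller
   picks counter or reading according to kind. */
int decode_message(const uint8_t *buf, size_t len, struct message *out)
{
    if (len < (size_t)1 + PAYLOAD_SIZE)
        return -1;
    out->kind = buf[0];
    memcpy(out->body.raw, buf + 1, PAYLOAD_SIZE);
    return 0;
}
```

Reading body.counter after filling body.raw gives a value that depends on the host byte order, so a real decoder would normally fix the wire byte order and convert explicitly.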
Prompted by this question:
The C11 standard states that a pointer to a union can be converted to a pointer to each of its members. From Section 6.7.2.1p17:
The size of a union is sufficient to contain the largest of
its members. The value of at most one of the members can be
stored in a union object at any time. A pointer to a union
object, suitably converted, points to each of its members (or
if a member is a bit-field, then to the unit in which it
resides), and vice versa.
This implies you can do the following:
union u {
int a;
double b;
};
union u myunion;
int *i = (int *)&myunion;
double *d = (double *)&myunion;
myunion.a = 2;
printf("*i=%d\n", *i);
myunion.b = 3.5;
printf("*d=%f\n", *d);
But what about the reverse: in case of the above union, can an int * or double * be safely converted to a union u *? Consider the following code:
#include <stdio.h>
union u {
int a;
double b;
};
void f(int isint, union u *p)
{
if (isint) {
printf("int value=%d\n", p->a);
} else {
printf("double value=%f\n", p->b);
}
}
int main()
{
int a = 3;
double b = 8.25;
f(1, (union u *)&a);
f(0, (union u *)&b);
return 0;
}
In this example, pointers to int and double, both of which are members of union u, are passed to a function where a union u * is expected. A flag is passed to the function to tell it which "member" to access.
Assuming, as in this case, that the member accessed matches the type of the object that was actually passed in, is the above code legal?
I compiled this on gcc 6.3.0 with both -O0 and -O3 and both gave the expected output:
int value=3
double value=8.250000
In this example, pointers to int and double, both of which are members
of union u, are passed to a function where a union u * is expected. A
flag is passed to the function to tell it which "member" to access.
Assuming, as in this case, that the member accessed matches the type
of the object that was actually passed in, is the above code legal?
You seem to be focusing your analysis with respect to the strict aliasing rule on the types of the union members. However, given
union a_union {
int member;
// ...
} my_union, *my_union_pointer;
, I would be inclined to argue that expressions of the form my_union.member and my_union_pointer->member express accessing the stored value of an object of type union a_union in addition to accessing an object of the member's type. Thus, if my_union_pointer does not actually point to an object whose effective type is union a_union then there is indeed a violation of the strict aliasing rule -- with respect to type union a_union -- and the behavior is therefore undefined.
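One way to sidestep the question is to hand the function a pointer to a real union u object instead of a cast pointer. The sketch below assumes that copying the value is acceptable; u_from_int, u_from_double and describe are hypothetical helpers, and describe formats into a buffer rather than printing:

```c
#include <assert.h>
#include <stdio.h>

union u {
    int a;
    double b;
};

/* Wrap a plain int or double in an actual union object, so that any
   pointer passed on really does point at a union u. */
union u u_from_int(int v)       { union u r; r.a = v; return r; }
union u u_from_double(double v) { union u r; r.b = v; return r; }

/* Same logic as f() in the question, but writing into a buffer.
   Returns what snprintf returns. */
int describe(int isint, const union u *p, char *out, size_t n)
{
    assert(p != NULL && out != NULL);
    return isint ? snprintf(out, n, "int value=%d", p->a)
                 : snprintf(out, n, "double value=%f", p->b);
}
```

With this approach the effective type of the object being accessed is union u, so neither the aliasing argument nor the size and alignment concerns apply.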
The Standard gives no general permission to access a struct or union object using an lvalue of member type, nor--so far as I can tell--does it give any specific permission to perform such access unless the member happens to be of character type. Nor does it define any means by which the act of casting an int* into a union u* can create a union object where one did not already exist. Instead, the creation of any storage that will ever be accessed as a union u implies the simultaneous creation of a union u object within that storage.
Instead, the Standard (references quoted from the C11 draft N1570) relies upon implementations to apply the footnote 88 (The intent of this list is to specify those circumstances in which an object may or may not be aliased.) and recognize that the "strict aliasing rule" (6.5p7) should only be applied when an object is referenced both via an lvalue of its own type and a seemingly-unrelated lvalue of another type during some particular execution of a function or loop [i.e. when the object aliases some other lvalue].
The question of when two lvalues may be viewed as "seemingly unrelated", and when an implementation should be expected to recognize a relationship between them, is a Quality of Implementation issue. Clang and gcc seem to recognize that lvalues with forms unionPtr->value and unionPtr->value[index] are related to *unionPtr, but seem unable to recognize that pointers to such lvalues have any relationship to unionPtr. They will thus recognize that both unionPtr->array1[i] and unionPtr->array2[j] access *unionPtr (since array subscripting via [] seems to be treated differently from array-to-pointer decay), but will not recognize that *(unionPtr->array1+i) and *(unionPtr->array2+j) do likewise.
Addendum--standard reference:
Given
union foo {int x;} foo,bar;
void test(void)
{
foo=bar; // 1
foo.x = 2; // 2
bar=foo; // 3
}
The Standard would describe the type of foo.x as int. If the second statement didn't access the stored value of foo, then the third statement would have no effect. Thus, the second statement accesses the stored value of an object of type union foo using an lvalue of type int. Looking at N1570 6.5p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:(footnote 88)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
Footnote 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Note that there is no permission given above to access an object of type union foo using an lvalue of type int. Because the above is a "shall" requirement, any violation thereof invokes UB even if the behavior of the construct would otherwise be defined by the Standard.
Regarding strict aliasing, there is not an issue going from pointer-to-type (for example &a), to pointer-to-union containing that type. It is one of the exceptions to the strict aliasing rule, C17 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object, /--/
- an aggregate or union type that includes one of the aforementioned types among its
members
So this is fine as far as strict aliasing goes, as long as the union contains an int/double. And the pointer conversion in itself is well-defined too.
The problem comes when you try to access the contents, for example the contents of an int as a larger double. This is probably UB for multiple reasons - I can think of at least C17 6.3.2.3/7:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned69) for the referenced type, the behavior is undefined.
Where the non-normative footnote provides more information:
69) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B,
which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
No. It's not formally correct.
In C you can do whatever, and it could work, but constructs like this are bombs. Any future modification could lead to a big failure.
The union reserves enough memory to hold the largest of its elements:
The size of a union is sufficient to contain the largest of its
members.
In the reverse direction, the object you point at may not be large enough.
Consider:
union myunion
{
char a;
int b;
double c;
};
char c;
((union myunion *)&c)->b = 0;
This will corrupt memory, because it writes sizeof(int) bytes into a one-byte object.
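A safer pattern, sketched here with hypothetical helper names, is to copy the char into a real union object first. That object is guaranteed to be large enough for every member, which the static assertion spells out:

```c
#include <assert.h>

union myunion {
    char a;
    int b;
    double c;
};

/* The union is at least as large as its largest member. */
static_assert(sizeof(union myunion) >= sizeof(double),
              "union must be able to hold a double");
static_assert(sizeof(union myunion) >= sizeof(int),
              "union must be able to hold an int");

/* Instead of casting &c to union myunion *, build a full-sized union
   object that holds the char. */
union myunion myunion_from_char(char c)
{
    union myunion u;
    u.a = c;
    return u;
}

/* Writing b is fine here because *u is a complete union object. */
void myunion_set_int(union myunion *u, int v)
{
    u->b = v;
}
```

The cost is one copy; in exchange, every later store through the union stays inside storage that actually belongs to it.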
The meaning of the standard definition:
The value of at most one of the members can be stored in a union
object at any time. A pointer to a union object, suitably converted,
points to each of its members (or if a member is a bit-field, then to
the unit in which it resides), and vice versa.
This enforces the point that each union member starts at the union's start address and, implicitly, states that the compiler shall align unions on a boundary suitable for each of their members, that is, choose an alignment that is correct for every member. Because alignments are normally powers of 2, as a rule of thumb the union is aligned on the boundary of the member that requires the largest alignment.
Consider the following union:
union{
uint32_t a;
uint64_t b;
};
Both a and b reside in the same memory area. If we initialize the 32-bit integer a to some value, how is it possible to get b (when b was not initialized)? Does it mean that the compiler internally converts a to b?
Thanks
The extra bytes in the uint64_t will have an unspecified value.
From section 6.2.6.1 of the C standard:
7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not
correspond to that member but do correspond to other members take
unspecified values.
And section 6.5.2.3:
3 A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The
value is that of the named member, 95) and is an lvalue if
the first expression is an lvalue. If the first expression
has qualified type, the result has the so-qualified version of
the type of the designated member
95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object,
the appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as
described in 6.2.6 (a process sometimes called "type punning").
This might be a trap representation.
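The quoted rules can be illustrated with a small sketch (union halves and the two functions are names invented here). After a store to a, only the bytes that belong to a are known; no conversion to b takes place, so an explicit conversion is the way to get the value as a 64-bit number:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union halves {
    uint32_t a;
    uint64_t b;
};

static_assert(sizeof(union halves) >= sizeof(uint64_t),
              "union must be able to hold b");

/* Stores v in a, then checks that the first sizeof(a) bytes of the
   union hold exactly v's object representation. Returns 1 if they
   match. The remaining bytes of b are deliberately not examined:
   after the store to a they hold unspecified values. */
int a_bytes_survive(uint32_t v)
{
    union halves h;
    h.b = 0;
    h.a = v;
    return memcmp(&h, &v, sizeof v) == 0;
}

/* To obtain v as a 64-bit value, convert it explicitly. */
uint64_t widen(uint32_t v)
{
    return (uint64_t)v;
}
```

In practice most compilers leave the extra bytes untouched, but as the quoted paragraph says, the standard does not promise that.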
The only time this is allowed is if you have a union of one or more structs and each struct has an initial member of the same type.
From section 6.5.2.3:
6 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a
common initial sequence (see below), and if the union object
currently contains one of these structures, it is permitted
to inspect the common initial part of any of them anywhere that a
declaration of the completed type of the union is visible. Two
structures share a common initial sequence if corresponding
members have compatible types (and, for bit-fields, the same widths)
for a sequence of one or more initial members.
Here's an example where this might be useful:
union sockaddr_u {
struct sockaddr sa;
struct sockaddr_un sun;
struct sockaddr_in sin;
};
These structures store information on sockets of different types. Most of the members differ, but the first member of each is of type sa_family_t, whose value tells you the socket type. This allows you to inspect the first member of any of these structures to figure out which one contains meaningful data in its remaining members.
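The same idea in a platform-independent sketch: two made-up structs (circle and rect) share an initial kind member, and the tag is read through one struct even when the other is the one currently stored. This relies on the common initial sequence guarantee and on the union type being visible at the point of access:

```c
#include <assert.h>

enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

struct circle { enum shape_kind kind; double radius; };
struct rect   { enum shape_kind kind; double width, height; };

union shape {
    struct circle c;
    struct rect   r;
};

double shape_area(const union shape *s)
{
    /* kind belongs to the common initial sequence, so it may be
       inspected through c even when r is the struct currently stored. */
    switch (s->c.kind) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->c.radius * s->c.radius;
    case SHAPE_RECT:
        return s->r.width * s->r.height;
    }
    assert(!"unknown shape kind");
    return 0.0;
}
```

Only the shared leading members get this guarantee; reading radius while a rect is stored would be ordinary type punning again.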
Sample code:
#include <assert.h>
#include <string.h>
struct S
{
unsigned char ch;
int i;
};
int main()
{
struct S s;
memset(&s, 0, sizeof s);
s.ch = 257;
assert( 0 == ((unsigned char *)&s)[1] );
}
Can the assertion fail?
The motivation for the question is whether a compiler on a little-endian system could decide to use a 4-byte store to implement s.ch = 257;. Obviously nobody would ever write code like I did in my example, but something similar might realistically occur if ch is assigned in various ways in a program which then goes on to use memcmp to check for struct equality.
For example, if the code does --s.ch instead of s.ch = 257 - can the compiler emit a word-size decrement instruction?
I don't think the discussion around DR 451 is relevant, as that only applies to uninitialized padding; however the memset initializes all the padding to zero bytes.
Yes, it can fail. The behavior is unspecified, but not undefined.
After the assignment s.ch = 257;, all padding bytes take unspecified values [1], which means that, if the second byte of the structure is a padding byte, it takes an unspecified value and the result of the comparison to zero isn't specified. The assertion may or may not fire.
The read value in the assert cannot be a trap representation because unsigned char doesn't have trap representations, and because the value is unspecified, not indeterminate.
[1] (Quoted from: ISO/IEC 9899:201x 6.2.6.1 General 6):
When a value is stored in an object of structure or union type, including in a member
object, the bytes of the object representation that correspond to any padding bytes take
unspecified values.
ISO/IEC 9899:2011 §6.2.6.1 (Representations of types) General says:
¶6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
51) Thus, for example, structure assignment need not copy any padding bits.
However, your example doesn't do a structure assignment, so maybe that doesn't apply. I believe there is no reason to think that an assignment to a simple type member of a structure would modify the data.
However, your assert code does exhibit undefined behaviour, trying to access structure padding, which is simply not allowed.
So, it is unlikely that the assertion would fire, but because your code exhibits undefined behaviour, it could happen and you'd have no recourse.
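Whichever reading of the standard is right, the practical consequence for the memcmp scenario in the question is the same: compare the members, not the bytes. A sketch, with hypothetical helpers s_make and s_equal:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct S {
    unsigned char ch;
    int i;
};

/* Builds an S whose bytes, padding included, are first filled with
   the given byte value and whose members are then assigned. */
struct S s_make(unsigned char ch, int i, int fill)
{
    struct S s;
    memset(&s, fill, sizeof s);
    s.ch = ch;
    s.i = i;
    return s;
}

/* Member-wise equality: the result does not depend on padding. */
bool s_equal(const struct S *x, const struct S *y)
{
    assert(x != NULL && y != NULL);
    return x->ch == y->ch && x->i == y->i;
}
```

Two structs built with different fill bytes compare equal with s_equal as long as their members match, whereas memcmp on them could report a difference.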
Basically, I have a
struct foo {
/* variable denoting active member of union */
enum whichmember w;
union {
struct some_struct my_struct;
struct some_struct2 my_struct2;
struct some_struct3 my_struct3;
/* let's say that my_struct is the largest member */
};
};
main()
{
/*...*/
/* earlier in main, we get some struct foo d with an */
/* unknown union assignment; d.w is correct, however */
struct foo f;
f.my_struct = d.my_struct; /* mystruct isn't necessarily the */
/* active member, but is the biggest */
f.w = d.w;
/* code that determines which member is active through f.w */
/* ... */
/* we then access the *correct* member that we just found */
/* say, f.my_struct3 */
f.my_struct3.some_member_not_in_mystruct = /* something */;
}
Accessing C union members via pointers seems to say that accessing the members via pointers is okay. See comments.
But my question concerns directly accessing them. Basically, if I write all the information that I need to the largest member of the union and keep track of types manually, will accessing the manually specified member still yield the correct information every time?
I note that the code in the question uses an anonymous union, which means that it must be written for C11; anonymous unions were not a part of C90 or C99.
ISO/IEC 9899:2011, the current C11 standard, has this to say:
§6.5.2.3 Structure and union members
¶3 A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.
¶4 A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue.96) If the first expression is a pointer to a qualified type, the result has the so-qualified version of the type of the designated member.
¶5 …
¶6 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
96) If &E is a valid pointer expression (where & is the ‘‘address-of’’ operator, which generates a pointer to its operand), the expression (&E)->MOS is the same as E.MOS.
Italics as in the standard
And section §6.2.6 Representations of types says (in part):
§6.2.6.1 General
¶6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
¶7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
51) Thus, for example, structure assignment need not copy any padding bits.
My interpretation of what you're doing is that footnote 51 says "it might not work" because you may have assigned only part of the structure. You're treading on thin ice, at best. However, against that, you stipulate that the assigned structure (in the f.my_struct = d.my_struct; assignment) is the largest member. The chances are moderately high that it won't go wrong, but if the padding bytes in the two structures (in the active member of the union and in the largest member of the union) are at different places, then things could go wrong and if you reported a problem to the compiler writer, the compiler writer would simply say to you "don't contravene the standard".
So, to the extent I'm a language lawyer, this language lawyer's answer is "It is not guaranteed". In practice, you're unlikely to run into problems, but the possibility is there and you have no comeback on anyone.
To make your code safe, simply use f = d;, which assigns the whole structure, union included.
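A minimal sketch of that whole-structure copy, using a cut-down struct foo with two invented members (as_int and as_double) in place of the structs from the question:

```c
#include <assert.h>

enum whichmember { MEMBER_INT, MEMBER_DOUBLE };

struct foo {
    enum whichmember w;      /* records which union member is active */
    union {
        int    as_int;
        double as_double;
    };
};

/* Copies tag and union in one assignment, so there is no need to
   guess which member is the largest. */
struct foo copy_foo(const struct foo *d)
{
    struct foo f = *d;
    return f;
}

/* Reads back only the member that the tag says is active. */
double foo_value(const struct foo *f)
{
    switch (f->w) {
    case MEMBER_INT:    return (double)f->as_int;
    case MEMBER_DOUBLE: return f->as_double;
    }
    assert(!"unknown tag");
    return 0.0;
}
```

Because the copy goes through the struct type itself, the value of whichever member is active is preserved without any reliance on padding being copied.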
Illustrative Example
Suppose that the machine requires double aligned on an 8-byte boundary and sizeof(double) == 8, that int must be aligned on a 4-byte boundary and sizeof(int) == 4, and that short must be aligned on a 2-byte boundary and sizeof(short) == 2. This is a plausible and even common set of sizes and alignment requirements.
Further, suppose that you have a two-structure union variant of the structure in the question:
struct Type_A { char x; double y; };
struct Type_B { int a; short b; short c; };
enum whichmember { TYPE_A, TYPE_B };
struct foo
{
enum whichmember w;
union
{
struct Type_A s1;
struct Type_B s2;
};
};
Now, under the sizes and alignments specified, the struct Type_A will occupy 16 bytes, and struct Type_B will occupy 8 bytes, so the union will use 16 bytes too. The layout of the union will be like this:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | p...a...d...d...i...n...g |               y               |  s1
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|       a       |   b   |   c   |   p...a...d...d...i...n...g   |  s2
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The w element would also mean that there are 8 bytes in struct foo before the (anonymous) union, of which it is likely that w only occupies 4. The size of struct foo is therefore 24 on this machine. That's not particularly relevant to the discussion, though.
Now suppose we have code like this:
struct foo d;
d.w = TYPE_B;
d.s2.a = 1234;
d.s2.b = 56;
d.s2.c = 78;
struct foo f;
f.s1 = d.s1;
f.w = TYPE_B;
Now, under the ruling of footnote 51, the structure assignment f.s1 = d.s1; does not have to copy the padding bits. I know of no compiler that behaves like this, but the standard says that a compiler need not copy the padding bits. That means that the value of f.s1 could be:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | g...a...r...b...a...g...e |   r...u...b...b...i...s...h   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The garbage is because those 7 bytes need not have been copied (footnote 51 says that is an option, even though it is not likely to be an option exercised by any current compiler). The rubbish is because the initialization of d never set any values in those bytes; the contents of that part of the structure is unspecified.
If you now go ahead and try to treat f as a copy of d, you might be a little surprised to find that only 1 byte of the 8 relevant bytes of f.s2 is actually initialized.
I'll reemphasize: I know of no compiler that would do this. But the question is tagged 'language lawyer' so the issue is 'what does the language standard state' and this is my interpretation of the quoted sections of the standard.
Yes, your code will work, because with a union the compiler shares the same memory space among all the elements.
For example if:
&f.mystruct = 100 then &f.mystruct2 = 100 and &f.mystruct3 = 100
If mystruct is the largest one then it will work all the time.
Yes you can directly access them. You can assign a value to a union member and read it back through a different union member. The result will be deterministic and correct.
I can't seem to wrap my head around certain parts of the C standard, so I'm coming here to clear up that foggy, anxious uncertainty that comes when I have to think about what such tricks are defined behaviour and what are undefined or violate the standard. I don't care whether or not it will WORK, I care if the C standard considers it legal, defined behaviour.
Such as this, which I am fairly certain is UB:
struct One
{
int Hurr;
char Durr[2];
float Nrrr;
} One;
struct Two
{
int Hurr;
char Durr[2];
float Nrrr;
double Wibble;
} Two;
One = *(struct One*)&Two;
This is not all I am talking about. Such as casting the pointer to One to int*, and dereferencing it, etc. I want to get a good understanding of what such things are defined so I can sleep at night. Cite places in the standard if you can, but be sure to specify whether it's C89 or C99. C11 is too new to be trusted with such questions IMHO.
I think that technically that example is UB, too. But it will almost certainly work, and neither gcc nor clang complain about it with -pedantic.
To start with, the following is well-defined in C99 (§6.5.2.3/6): [1]
union OneTwo {
struct One one;
struct Two two;
};
union OneTwo tmp = {.two = {3, {'a', 'b'}, 3.14f, 3.14159} };
struct One one = tmp.one;
The fact that accessing the "punned" struct One through the union must work implies that the layout of the prefix of struct Two is identical to struct One. This cannot be contingent on the existence of a union because a given composite type can only have one storage layout, and its layout cannot be contingent on its use in a union because the union does not need to be visible to every translation unit in which the struct is used.
Furthermore, in C all types are no more than a sequence of bytes (unlike, for example, C++) (§6.2.6.1/4) [2]. Consequently, the following is also guaranteed to work:
struct One one;
struct Two two = ...;
unsigned char tmp[sizeof one];
memcpy(tmp, &two, sizeof one);
memcpy(&one, tmp, sizeof one);
Given the above and the convertibility of any pointer type to a void*, I think it is reasonable to conclude that the temporary storage above is unnecessary, and it could have been written directly as:
struct One one;
struct Two two = ...;
memcpy(&one, &two, sizeof one);
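A variant of this copy that checks the layout assumption at compile time instead of taking it on trust might look like the sketch below. The macro ONE_PREFIX_SIZE and the function one_from_two are hypothetical, and the copy stops at the end of the last member of struct One rather than using sizeof:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct One { int Hurr; char Durr[2]; float Nrrr; };
struct Two { int Hurr; char Durr[2]; float Nrrr; double Wibble; };

/* Refuse to compile if the shared members are not laid out alike. */
static_assert(offsetof(struct One, Durr) == offsetof(struct Two, Durr),
              "Durr offsets differ");
static_assert(offsetof(struct One, Nrrr) == offsetof(struct Two, Nrrr),
              "Nrrr offsets differ");

/* Bytes from the start of struct One to the end of its last member. */
#define ONE_PREFIX_SIZE (offsetof(struct One, Nrrr) + sizeof(float))

/* Copies the shared leading members of *src into *dst byte by byte. */
void one_from_two(struct One *dst, const struct Two *src)
{
    memcpy(dst, src, ONE_PREFIX_SIZE);
}
```

If a compiler ever laid the two structs out differently, this would fail to build instead of silently copying the wrong bytes.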
From there to the direct assignment through an aliased pointer as in the OP is not a very big leap, but there is an additional problem for the aliased pointer: it is theoretically possible for the pointer conversion to create an invalid pointer, because it's possible that the bit format of a struct Two* differs from a struct One*. Although it is legal to cast one pointer type to another pointer type with looser alignment (§6.3.2.3/7) [3] and then convert it back again, it is not guaranteed that the converted pointer is actually usable, unless the conversion is to a character type. In particular, it is possible that the alignment of struct Two is different from (more strict than) the alignment of struct One, and that the bit format of the more strongly-aligned pointer is not directly usable as a pointer to the less strongly-aligned struct. However, it is hard to see an argument against the almost equivalent:
one = *(struct One*)(void*)&two;
although this may not be explicitly guaranteed by the standard.
In comments, various people have raised the spectre of aliasing optimizations. The above discussion does not touch on aliasing at all because I believe that it is irrelevant to a simple assignment. The assignment must be sequenced after any preceding expressions and before any succeeding ones; it clearly modifies one and almost as clearly references two. An optimization which made a preceding legal mutation of two invisible to the assignment, would be highly suspect.
But aliasing optimizations are, in general, possible. Consequently, even though all of the above pointer casts should be acceptable in the context of a single assignment expression, it would certainly not be legal behaviour to retain the converted pointer of type struct One* which actually points into an object of type struct Two and expect it to be usable either to mutate a member of its target or to access a member of its target which has otherwise been mutated. The only context in which you could get away with using a pointer to struct One as though it were a pointer to the prefix of struct Two is when the two objects are overlaid in a union.
--- Standard references:
[1] "if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible."
[2] "Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT
bits, where n is the size of an object of that type, in bytes. The value may be copied into
an object of type unsigned char [n] (e.g., by memcpy)…"
[3] "A pointer to an object type may be converted to a pointer to a different object type… When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object."
C99 6.7.2.1 says:
Para 5
As discussed in 6.2.5, a structure is a type consisting of a sequence
of members, whose storage is allocated in an ordered sequence
Para 12
Each non-bit-field member of a structure or union object is aligned in
an implementation-defined manner appropriate to its type.
Para 13
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There
may be unnamed padding within a structure object, but not at its
beginning
That last paragraph covers your second question (casting the pointer to One to int*, and dereferencing it).
The first point - whether it is valid to "Downcast" a Two* to a One* - I could not find specifically addressed. It boils down to whether the other rules ensure that the memory layout of the fields of One and the initial fields of Two are identical in all cases.
The members have to be packed in ordered sequence, no padding is allowed at the beginning, and they have to be aligned according to type, but the standard does not actually say that the layout needs to be the same (even though in most compilers I am sure it is).
There is, however, a better way to define these structures so that you can guarantee it:
struct One
{
int Hurr;
char Durr[2];
float Nrrr;
} One;
struct Two
{
struct One one;
double Wibble;
} Two;
You might think you can now safely cast a Two* to a One* - Para 13 says so. However strict aliasing might bite you somewhere unpleasant. But with the example above you don't need to anyway:
One = Two.one;
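The embedding approach can be sketched as follows; two_as_one and copy_prefix are hypothetical helper names, and the variables declared alongside the struct definitions in the snippet above are left out. Taking the address of the embedded member avoids any cast at all:

```c
#include <assert.h>
#include <stddef.h>

struct One { int Hurr; char Durr[2]; float Nrrr; };
struct Two { struct One one; double Wibble; };

/* The initial member sits at the very start of the struct. */
static_assert(offsetof(struct Two, one) == 0,
              "initial member must be at offset 0");

/* "Downcast" without a cast: point at the embedded struct One. */
struct One *two_as_one(struct Two *t)
{
    return &t->one;
}

/* Copy out the struct One part by plain member access. */
struct One copy_prefix(const struct Two *t)
{
    return t->one;
}
```

The pointer returned by two_as_one compares equal to the address of the enclosing struct Two, which is the Para 13 guarantee quoted above.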
A1. Undefined behaviour, because of Wibble.
A2. Defined.
From §9.2 in N3337 (a C++11 working draft):
Two standard-layout struct (Clause 9) types are layout-compatible if
they have the same number of non-static data members and corresponding
non-static data members (in declaration order) have layout-compatible
types
Your structs would be layout compatible and thus interchangeable but for Wibble. There is a good reason too: Wibble might cause different padding in struct Two.
A pointer to a standard-layout struct object, suitably converted using
a reinterpret_cast, points to its initial member (or if that member is
a bit-field, then to the unit in which it resides) and vice versa.
I think that guarantees that you can dereference the initial int.