As previously established, a union of the form
union some_union {
type_a member_a;
type_b member_b;
...
};
with n members comprises n + 1 objects in overlapping storage: one object for the union itself and one object for each union member. It is clear that you may freely read and write any union member in any order, even when reading a member that was not the last one written. The strict aliasing rule is never violated, as the lvalue through which you access the storage has the correct effective type.
This is further supported by footnote 95, which explains how type punning is an intended use of unions.
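As a minimal sketch of the type punning that footnote 95 describes, the function below stores a float into one union member and reads the bits back through another, using the member access operator throughout. The names float_bits_via_union and pun are made up for illustration, and the sketch assumes a 32-bit IEEE 754 float:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float is exactly 32 bits wide (IEEE 754 single precision). */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: returns the object representation of a float. */
uint32_t float_bits_via_union(float value)
{
    union {
        float f;
        uint32_t u;
    } pun;

    pun.f = value;  /* store through one member ... */
    return pun.u;   /* ... and read through the other, via the "." operator */
}
```

On such a platform, float_bits_via_union(1.0f) yields 0x3F800000.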
A typical example of the optimizations enabled by the strict aliasing rule is this function:
int strict_aliasing_example(int *i, float *f)
{
*i = 1;
*f = 1.0;
return (*i);
}
which the compiler may optimize to something like
int strict_aliasing_example(int *i, float *f)
{
*i = 1;
*f = 1.0;
return (1);
}
because it can safely assume that the write to *f does not affect the value of *i.
However, what happens when we pass two pointers to members of the same union? Consider this example, assuming a typical platform where float is an IEEE 754 single precision floating point number and int is a 32 bit two's complement integer:
int breaking_example(void)
{
union {
int i;
float f;
} fi;
return (strict_aliasing_example(&fi.i, &fi.f));
}
As previously established, fi.i and fi.f refer to an overlapping memory region. Reading and writing them is unconditionally legal in any order (reading is only legal once the union has been initialized). In my opinion, the previously discussed optimization performed by all major compilers yields incorrect code, as the two pointers of different types legally point to the same location.
I somehow can't believe that my interpretation of the strict aliasing rule is correct. It doesn't seem plausible that the very optimization the strict aliasing rule was designed for is not possible due to the aforementioned corner case.
Please tell me why I'm wrong.
A related question turned up during research.
Please read all existing answers and their comments before adding your own to make sure that your answer adds a new argument.
Starting with your example:
int strict_aliasing_example(int *i, float *f)
{
*i = 1;
*f = 1.0;
return (*i);
}
Let's first acknowledge that, in the absence of any unions, this would violate the strict aliasing rule if i and f both point to the same object; assuming the object has no declared type, then *i = 1 sets the effective type to int and *f = 1.0 then sets it to float, and the final return (*i) then accesses an object with effective type of float via an lvalue of type int, which is clearly not allowed.
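The effective-type reasoning above can be sketched with allocated storage, which has no declared type. This is only an illustration under that assumption; the function name reuse_allocated_storage is hypothetical:

```c
#include <stdlib.h>

/* Hypothetical demo: returns 1.0f on success, -1.0f if allocation fails. */
float reuse_allocated_storage(void)
{
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *storage = malloc(size);   /* allocated storage has no declared type */
    if (storage == NULL)
        return -1.0f;

    int *i = storage;
    float *f = storage;

    *i = 1;      /* the store makes the effective type int */
    *f = 1.0f;   /* the store makes the effective type float */

    float result = *f;  /* fine: float lvalue, float effective type */
    /* Reading *i at this point would be the access described above as not allowed. */

    free(storage);
    return result;
}
```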
The question is about whether this would still amount to a strict-aliasing violation if both i and f point to members of the same union. For this not to be the case, it would either have to be that there is some special exemption from the strict aliasing rule that applies in this situation, or that accessing the object via *i does not (also) access the same object as *f.
On union member access via the "." member access operator, the standard says (6.5.2.3):
A postfix expression followed by the . operator and an identifier
designates a member of a structure or union object. The value is that
of the named member (95) and is an lvalue if the first expression is
an lvalue.
Footnote 95, referred to above, says:
If the member used to read the contents of a union object is not the
same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be
a trap representation.
This is clearly intended to allow type punning via a union, but it should be noted that (1) footnotes are non-normative, that is, they are not supposed to prescribe behaviour, but rather they should clarify the intention of some part of the text in accordance with the rest of the specification, and (2) this allowance for type punning via a union is deemed by compiler vendors as applying only for access via the union member access operator - since otherwise strict aliasing is pretty useless for optimisation, as just about any two pointers potentially refer to different members of the same union (your example is a case in point).
So at this point, we can say that:
the code in your example is explicitly allowed by a non-normative footnote
the normative text on the other hand seems to disallow your example (due to strict aliasing), assuming that accessing one member of a union also constitutes access to another - but more on this shortly
Does accessing one member of a union actually access the others, though? If not, the strict aliasing rule isn't concerned with the example. (If it does, the strict aliasing rule, problematically, disallows just about any type-punning via a union).
A union is defined as (6.2.5 para 20):
A union type describes an overlapping nonempty set of member objects
And note that (6.7.2.1 para 16):
The value of at most one of the members can be stored in a union object at any time
Since access is (3):
〈execution-time action〉 to read or modify the value of an object
... and, since non-active union members do not have a stored value, then presumably accessing one member does not constitute access to the other members!
However, the definition of member access (6.5.2.3, quoted above) says "The value is that of the named member" (this is the precise statement that footnote 95 is attached to) - if the member has no value, what then? Footnote 95 gives an answer but as I've noted it is not supported by the normative text.
In any case, nothing in the text would seem to imply that reading or modifying a union member "via the member object" (i.e. directly via an expression using the member access operator) should be any different than reading or modifying it via pointer to that same member. The consensus understanding applied by compiler vendors, which allows them to perform optimisations under the assumption that pointers of different types do not alias, and that requires type punning be performed only via expressions involving member access, is not supported by the text of the standard.
If footnote 95 is considered normative, your example is perfectly fine code without undefined behaviour (unless the value of (*i) is a trap representation), according to the rest of the text. However, if footnote 95 is not considered normative, there is an attempted access to an object which has no stored value and the behaviour then is at best unclear (though the strict aliasing rule is arguably not relevant).
In the understanding of compiler vendors currently, your example has undefined behaviour, but since this isn't specified in the standard it's not clear exactly what constraint the code violates.
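For comparison, here is a sketch of a variant that stays within what compiler vendors appear to accept: the function receives a pointer to the union itself, so every access is a member access expression. The type name int_or_float and both function names are assumptions made for this illustration, and it presumes int and float have the same size:

```c
#include <assert.h>

union int_or_float {
    int i;
    float f;
};

/* Assumption: int and float occupy the same number of bytes. */
static_assert(sizeof(int) == sizeof(float), "sketch assumes equal sizes");

/* Same three steps as strict_aliasing_example, but through the union type. */
int punning_through_union(union int_or_float *u)
{
    u->i = 1;
    u->f = 1.0f;
    return u->i;  /* reinterprets the float's bytes, as footnote 95 describes */
}

/* Hypothetical caller, mirroring breaking_example. */
int punning_demo(void)
{
    union int_or_float fi;
    return punning_through_union(&fi);
}
```

With an IEEE 754 single precision float and a 32-bit int, punning_demo() returns 0x3F800000 rather than 1, because the compiler can see that both members share storage.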
Personally, I think the "fix" to the standard is to:
disallow access to a non-active union member except via lvalue conversion of a member access expression, or via assignment where the left-hand-side is a member access expression (an exception to this could perhaps be made for when the member in question has character type, since that would not have an effect on possible optimisations due to a similar exception in the strict aliasing rule itself)
specify in the normative text that the value of a non-active member is as is currently described by footnote 95
That would make your example not a violation of the strict aliasing rule, but rather a violation of the constraint that a non-active union member must be accessed only via an expression containing the member access operator (and appropriate member).
Therefore, to answer your question - Is the strict aliasing rule incorrectly specified? - no, the strict aliasing rule is not relevant to this example because the objects accessed by the two pointer dereferences are separate objects and, even though they overlap in storage, only one of them has a value at a time. However, the union member access rules are incorrectly specified.
A note on Defect Report 236:
Arguments about union semantics invariably refer to DR 236 at some point. Indeed, your example code is superficially very similar to the code in that Defect Report. I would note that:
The example in DR 236 is not about type-punning. It is about whether it is ok to assign to a non-active union member via a pointer to that member. The code in question is subtly different to that in the question here, since it does not attempt to access the "original" union member again after writing to the second member. Thus, despite the structural similarity in the example code, the Defect Report is largely unrelated to your question.
"Committee believes that Example 2 violates the aliasing rules in 6.5 paragraph 7" - this indicates that the committee believes that writing a "non-active" union member, but not via an expression containing a member access of the union object, is a strict-aliasing violation. As I've detailed above, this is not supported by the text of the standard.
"In order to not violate the rules, function f in example should be written as" - i.e. you must use the union object (and the "." operator) to change the active member type; this is in agreement with the "fix" to the standard I proposed above.
The Committee Response in DR 236 claims that "Both programs invoke undefined behavior". It has no explanation for why the first does so, and its explanation for why the 2nd does so seems to be wrong.
Under the definition of union members in §6.5.2.3:
3 A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. ...
4 A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. ...
See also §6.2.3 ¶1:
the members of structures or unions; each structure or union has a separate name space for its members (disambiguated by the type of the expression used to access the member via the . or -> operator);
It is clear that footnote 95 refers to the access of a union member with the union in scope and using the . or -> operator.
Since assignments and accesses to the bytes comprising the union are not made through union members but through pointers, your program does not invoke the aliasing rules of union members (including those clarified by footnote 95).
Further, normal aliasing rules are violated since the effective type of the object after *f = 1.0 is float, but its stored value is accessed by an lvalue of type int (see §6.5 ¶7).
Note: All references cite this C11 standard draft.
The C11 standard (§6.5.2.3.9 EXAMPLE 3) has the following example:
The following is not a valid fragment (because the union type is not
visible within function f):
struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g()
{
union {
struct t1 s1;
struct t2 s2;
} u;
/* ... */
return f(&u.s1, &u.s2);
}
But I can't find more clarification on this.
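For contrast, a sketch of how the same logic could be written so that the union type is visible where the members are accessed. This rests on my reading of the example above; the names t1_or_t2, f_visible and g_visible are hypothetical:

```c
struct t1 { int m; };
struct t2 { int m; };

/* The union type is declared at file scope, so it is visible inside f_visible. */
union t1_or_t2 {
    struct t1 s1;
    struct t2 s2;
};

int f_visible(union t1_or_t2 *u)
{
    /* Every access names the union object and goes through its members. */
    if (u->s1.m < 0)
        u->s2.m = -u->s2.m;
    return u->s1.m;
}

int g_visible(void)
{
    union t1_or_t2 u;
    u.s1.m = -5;
    return f_visible(&u);
}
```

Here the compiler cannot assume that s1 and s2 are unrelated, and g_visible() returns 5.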
Essentially the strict aliasing rule describes circumstances in which a compiler is permitted to assume (or, conversely, not permitted to assume) that two pointers of different types do not point to the same location in memory.
On that basis, the optimisation you describe in strict_aliasing_example() is permitted because the compiler is allowed to assume f and i point to different addresses.
The breaking_example() causes the two pointers passed to strict_aliasing_example() to point to the same address. This breaks the assumption that strict_aliasing_example() is permitted to make, therefore results in that function exhibiting undefined behaviour.
So the compiler behaviour you describe is valid. It is the fact that breaking_example() causes the pointers passed to strict_aliasing_example() to point to the same address which causes undefined behaviour - in other words, breaking_example() breaks the assumption that the compiler is allowed to make within strict_aliasing_example().
The strict aliasing rule forbids access to the same object by two pointers that do not have compatible types, unless one is a pointer to a character type:
7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
In your example, *f = 1.0; is modifying fi.i, but the types are not compatible.
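The last item of the quoted list, the character type exception, can be illustrated with a short sketch. Reading any object's bytes through unsigned char is permitted whatever the effective type is; the helper names below are made up for the illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Counts the nonzero bytes in the representation of any object. */
size_t count_nonzero_bytes(const void *object, size_t size)
{
    assert(object != NULL);
    const unsigned char *bytes = object;  /* character type: always allowed */
    size_t count = 0;
    for (size_t k = 0; k < size; k++) {
        if (bytes[k] != 0)
            count++;
    }
    return count;
}

/* Hypothetical demo: the int value 1 has exactly one nonzero byte,
   regardless of byte order (assuming int has no padding bits). */
size_t nonzero_bytes_of_int_one(void)
{
    int one = 1;
    return count_nonzero_bytes(&one, sizeof one);
}
```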
I think the mistake is in thinking that a union contains n objects, where n is the number of members. A union contains only one active object at any point during program execution by §6.7.2.1 ¶16
The value of at most one of the members can be stored in a union object at any time.
Support for this interpretation that a union does not simultaneously contain all of its member objects can be found in §6.5.2.3:
and if the union object currently contains one of these structures
Finally, an almost identical issue was raised in defect report 236 in 2006.
Example 2
// optimization opportunities if "qi" does not alias "qd"
void f(int *qi, double *qd) {
int i = *qi + 2;
*qd = 3.1; // hoist this assignment to top of function???
*qd *= i;
return;
}
main() {
union tag {
int mi;
double md;
} u;
u.mi = 7;
f(&u.mi, &u.md);
}
Committee believes that Example 2 violates the aliasing rules in 6.5
paragraph 7:
"an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union)."
In order to not violate the rules, function f in example should be
written as:
union tag {
int mi;
double md;
} u;
void f(int *qi, double *qd) {
int i = *qi + 2;
u.md = 3.1; // union type must be used when changing effective type
*qd *= i;
return;
}
Here is note 95 and its context:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, (95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.
(95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”). This might be a trap representation.
Note 95 clearly applies to an access via a union member. Your code does not do that. Two overlapping objects are accessed via pointers to 2 separate types, neither of which is a character type, and neither of which is a postfix expression pertinent for type punning.
This is not a definitive answer...
Let's back away from the standard for a second, and think about what's actually possible for a compiler.
Suppose that strict_aliasing_example() is defined in strict_aliasing_example.c, and breaking_example() is defined in breaking_example.c. Assume both of these files are compiled separately and then linked together, like so:
gcc -c -o strict_aliasing_example.o strict_aliasing_example.c
gcc -c -o breaking_example.o breaking_example.c
gcc -o breaking_example strict_aliasing_example.o breaking_example.o
Of course we'll have to add a function prototype to breaking_example.c, which looks like this:
int strict_aliasing_example(int *i, float *f);
Now consider that the first two invocations of gcc are completely independent and cannot share information except for the function prototype. It is impossible for the compiler to know that i and f will point to members of the same union when it generates code for strict_aliasing_example(). There's nothing in the linkage or type system to specify that these pointers are somehow special because they came from a union.
This supports the conclusion that other answers have mentioned: from the standard's point of view, accessing a union via . or -> obeys different aliasing rules compared with dereferencing an arbitrary pointer.
Prior to the C89 Standard, the vast majority of implementations defined the behavior of write-dereferencing a pointer of a particular type as setting the bits of the underlying storage in the fashion defined for that type, and defined the behavior of read-dereferencing a pointer of a particular type as reading the bits of the underlying storage in the fashion defined for that type. While such abilities would not have been useful on all implementations, there were many implementations where the performance of hot loops could be greatly improved by e.g. using 32-bit loads and stores to operate on groups of four bytes at once. Further, on many such implementations, supporting such behaviors didn't cost anything.
The authors of the C89 Standard state that one of their objectives was to avoid irreparably breaking existing code, and there are two fundamental ways the rules could have been interpreted consistent with that:
The C89 rules could have been intended to be applicable only in the cases similar to the one given in the rationale (accessing an object with declared type both directly via that type and indirectly via pointer), and where compilers would not have reason to expect that lvalues are related. Keeping track for each variable whether it is currently cached in a register is pretty simple, and being able to keep such variables in registers while accessing pointers of other types is a simple and useful optimization and would not preclude support for code which uses the more common type punning patterns (having a compiler interpret a float* to int* cast as necessitating a flush of any register-cached float values is simple and straightforward; such casts are rare enough that such an approach would be unlikely to adversely affect performance).
Given that the Standard is generally agnostic with regard to what makes a good-quality implementation for a given platform, the rules could be interpreted as allowing implementations to break code which uses type punning in ways that would be both useful and obvious, without suggesting that good quality implementations shouldn't try to avoid doing so.
If the Standard defines a practical way of allowing in-place type punning which is not in any way significantly inferior to other approaches, then approaches other than the defined way might reasonably be regarded as deprecated. If no Standard-defined means exists, then quality implementations for platforms where type punning is necessary to achieve good performance should endeavor to efficiently support common patterns on those platforms whether or not the Standard requires them to do so.
Unfortunately, the lack of clarity as to what the Standard requires has resulted in a situation where some people regard as deprecated constructs for which no replacements exist. Having the existence of a complete union type definition involving two primitive types be interpreted as an indication that any access via pointer of one type should be regarded as a likely access to the other would make it possible to adjust programs which rely upon in-place type punning to do so without Undefined Behavior--something which is not achievable any other practical way given the present Standard. Unfortunately, such an interpretation would also limit many optimizations in the 99% of cases where they would be harmless, thus making it impossible for compilers which interpret the Standard that way to run existing code as efficiently as would otherwise be possible.
As to whether the rule is correctly specified, that would depend upon what it is supposed to mean. Multiple reasonable interpretations are possible, but combining them yields some rather unreasonable results.
PS--the only interpretation of the rules regarding pointer-comparisons and memcpy that would make sense without giving the term "object" a meaning different from its meaning in the aliasing rules would suggest that no allocated region can be used to hold more than a single kind of object. While some kinds of code might be able to abide such a restriction, it would make it impossible for programs to use their own memory management logic to recycle storage without excessive numbers of malloc/free calls. The authors of the Standard may have intended to say that implementations are not required to let programmers create a large region and partition it into smaller mixed-type chunks themselves, but that doesn't mean that they intended general-purpose implementations would fail to do so.
The Standard does not allow the stored value of a struct or union to be accessed using an lvalue of the member type. Since your example accesses the stored value of a union using lvalues whose type is not that of the union, nor any type that contains that union, behavior would be Undefined on that basis alone.
The one thing that gets tricky is that under a strict reading of the Standard, even something so straightforward as
int main(void)
{
struct { int x; } foo;
foo.x = 1;
return 0;
}
also violates N1570 6.5p7 because foo.x is an lvalue of type int, it is used to access the stored value of an object of type struct foo, and type int does not satisfy any of the conditions listed in that section.
The only way the Standard can be even remotely useful is if one recognizes that there need to be exceptions to N1570 6.5p7 in cases involving lvalues that are derived from other lvalues. If the Standard were to describe cases where compilers may or must recognize such derivation, and specify that N1570 6.5p7 only applies in cases where storage is accessed using more than one type within a particular execution of a function or loop, that would have eliminated a lot of complexity including any need for the notion of "Effective Type".
Unfortunately, some compilers have taken it upon themselves to ignore derivation of lvalues and pointers even in some obvious cases like:
s1 *p1 = &unionArr[i].v1;
p1->x ++;
It may be reasonable for a compiler to fail to recognize the association between p1 and unionArr[i].v1 if other actions involving unionArr[i] separated the creation and use of p1, but neither gcc nor clang can consistently recognize such association even in simple cases where the use of the pointer immediately follows the action which takes the address of the union member.
Again, since the Standard doesn't require that compilers recognize any usage of derived lvalues unless they are of character types, the behavior of gcc and clang does not make them non-conforming. On the other hand, the only reason they are conforming is because of a defect in the Standard which is so outrageous that nobody reads the Standard as saying what it actually does.
First, I apologize if this appears to be a duplicate, but I couldn't find exactly this question elsewhere.
I was reading through N1570, specifically §6.5¶7, which reads:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
This reminded me of a common idiom I had seen in (BSD-like) socket programming, especially in the connect() call. Though the second argument to connect() is a struct sockaddr *, I have often seen passed to it a struct sockaddr_in *, which appears to work because they share a similar initial element. My question is:
To which contingency detailed in the above rule does this situation apply and why, or is it now undefined behavior that's an artifact of previous standard(s)?
This behavior is not defined by the C standard.
The behavior is defined by The Single Unix Specification and/or other documents relating to the software you are using, albeit in part implicitly.
The phrasing that “An object shall have its stored value accessed only by…” is misleading. The C standard cannot compel you to do anything; you are not obligated to obey its “shall” requirements. In terms of the C standard, the only consequence of not obeying its requirements is that the C standard does not define the behavior. This does not prohibit other documents from defining the behavior.
In the netinet/in.h documentation, we see “The sockaddr_in structure is used to store addresses for the Internet protocol family. Values of this type must be cast to struct sockaddr for use with the socket interfaces defined in this document.” So the documentation tells us not only that we should, but that we must, convert a sockaddr_in to a sockaddr. The fact that we must do so implies that the software supports it and that it will work. (Note that the phrasing is imprecise here; we do not actually cast a sockaddr_in to a sockaddr but actually convert the pointer, causing the sockaddr_in object in memory to be treated as a sockaddr.)
Thus there is an implied promise that the operating system, libraries, and developer tools provided for a Unix implementation support this.
This is an extension to the C language: Where behavior is not defined by the C standard, other documents may provide definitions and allow you to write software that cannot be written using the C standard alone. Behavior that the C standard says is undefined is not behavior that is prohibited but rather is an empty space that may be filled in by other specifications.
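The idiom in question can be sketched as follows. This is only an illustration for a POSIX system, not code from the question; the function name connect_to_loopback and the choice of the loopback address are assumptions:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: tries to connect socket_fd to 127.0.0.1:port.
   Returns whatever connect() returns (0 on success, -1 on error). */
int connect_to_loopback(int socket_fd, uint16_t port)
{
    struct sockaddr_in address;
    memset(&address, 0, sizeof address);
    address.sin_family = AF_INET;
    address.sin_port = htons(port);
    address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* The pointer conversion that the netinet/in.h documentation asks for. */
    return connect(socket_fd, (struct sockaddr *)&address, sizeof address);
}
```

The C standard says nothing about what connect() does with the converted pointer; that part is covered by the platform's documentation.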
The rules about common initial sequences go back to 1974. The earliest rules about "strict aliasing" only go back to 1989. The intention of the latter was not that they trump everything else, but merely that compilers be allowed to perform optimizations that their customers would find useful without being branded non-conforming. The Standard makes clear that in situations where one part of the Standard and/or an implementation's documentation would describe the behavior of some action but another part of the Standard would characterize it as Undefined Behavior, implementations may opt to give priority to the first, and the Rationale makes clear that the authors thought "the marketplace" would be better placed than the Committee to determine when implementations should do so.
Under a sufficiently pedantic reading of the N1570 6.5p7 constraints, almost all programs violate them, but in ways that won't matter unless an implementation is being sufficiently obtuse. The Standard makes no attempt to list all the situations in which an object of one type may be accessed by an lvalue of another, but rather those where a compiler must allow for an object of one type to be accessed by a seemingly unrelated lvalue of another. Given the code sequence:
int x;
int *p[10];
p[2] = &someStruct.intMember;
...
*p[2] = 23;
x = someStruct.intMember;
In the absence of the rules in 6.5p7, unless a compiler kept track of where p[2] came from, it would have no reason to recognize that the read of someStruct.intMember might be targeting storage that was just written using *p[2]. On the other hand, given the code:
int x;
int *p[10];
...
someStruct.intMember = 12;
p[2] = &someStruct.intMember;
x = *p[2];
Here, there is no rule that would actually allow the storage associated with a structure to be accessed by an lvalue of that member type, but unless a compiler is being deliberately blind, it would be able to see that after the first assignment to someStruct.intMember, the address of that member is being taken, and should either:
Account for all actions that will ever be done with the resulting pointer, if it is able to do so, or
Refrain from assuming that the structure's storage will not be accessed between the previous and succeeding actions using the structure's type.
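Both sequences above can be turned into complete functions for experimentation. This is a sketch only: the layout of struct sample and the function names are assumptions, since the original fragments do not show the declaration of someStruct:

```c
/* Hypothetical structure standing in for the type of someStruct. */
struct sample {
    int intMember;
    double otherMember;
};

static struct sample someStruct;

/* First sequence: write through the derived pointer, read the member. */
int write_via_pointer_read_via_member(void)
{
    int *p[10] = {0};
    p[2] = &someStruct.intMember;
    *p[2] = 23;
    return someStruct.intMember;
}

/* Second sequence: write the member, read through the derived pointer. */
int write_via_member_read_via_pointer(void)
{
    int *p[10] = {0};
    someStruct.intMember = 12;
    p[2] = &someStruct.intMember;
    return *p[2];
}
```

A compiler that tracks where p[2] came from returns 23 and 12 respectively.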
I don't think it ever occurred to the people who were writing the rules that would later be renumbered as N1570 6.5p7 that they would be construed so as to disallow common patterns that exploited the Common Initial Sequence rule. As noted, most programs violate the constraints of 6.5p7, but do so in ways that would be processed predictably by any compiler that isn't being obtuse; those using the Common Initial Sequence guarantees would have fallen into that category. Since the authors of the Standard recognized the possibility of a "conforming" compiler that was only capable of meaningfully processing one contrived and useless program, the fact that an obtuse compiler could abuse the "aliasing rules" wasn't seen as a defect.
Is it OK to do something like this?
struct MyStruct {
int x;
const char y; // notice the const
unsigned short z;
};
struct MyStruct AStruct;
fread(&AStruct, sizeof (AStruct), 1,
SomeFileThatWasDefinedEarlierButIsntIncludedInThisCodeSnippet);
I am changing the constant struct member by writing to the entire struct from a file. How is that supposed to be handled? Is this undefined behavior, to write to a non-constant struct, if one or more of the struct members is constant? If so, what is the accepted practice to handle constant struct members?
It's undefined behavior.
The C11 draft n1570 says:
6.7.3 Type qualifiers
...
...
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
My interpretation of this is: To be compliant with the standard, you are only allowed to set the value of the const member during object creation (aka initialization) like:
struct MyStruct AStruct = {1, 'a', 2}; // Fine
Doing
AStruct.y = 'b'; // Error
should give a compiler error.
You can trick the compiler with code like:
memcpy(&AStruct, &AnotherStruct, sizeof AStruct);
It will probably work fine on most systems but it's still undefined behavior according to the C11 standard.
Also see memcpy with destination pointer to const data
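One way to stay within the rule quoted above is to read the bytes into a buffer type without const members and then create the real struct by initialization. The following is a sketch under assumptions: struct MyStructRaw and both function names are hypothetical, and the file is assumed to have been written using the layout of MyStructRaw:

```c
#include <stdio.h>

struct MyStruct {
    int x;
    const char y;
    unsigned short z;
};

/* Hypothetical mutable mirror of MyStruct, used only as an I/O buffer. */
struct MyStructRaw {
    int x;
    char y;
    unsigned short z;
};

/* Reads one record into the mutable buffer; returns 1 on success, 0 otherwise. */
int read_raw_record(FILE *fp, struct MyStructRaw *raw)
{
    return fread(raw, sizeof *raw, 1, fp) == 1;
}

/* Builds a MyStruct by initialization, so the const member is never assigned. */
struct MyStruct my_struct_from_raw(struct MyStructRaw raw)
{
    struct MyStruct result = {raw.x, raw.y, raw.z};
    return result;
}
```

The caller would then write struct MyStruct AStruct = my_struct_from_raw(raw); which is again an initialization rather than a modification.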
How are constant struct members handled in C?
Read the C11 standard n1570 and its §6.7.3 related to the const qualifier.
If so, what is the accepted practice to handle constant struct members?
It depends if you care more about strict conformance to the C standard, or about practical implementations. See this draft report (work in progress in June 2020) discussing these concerns. Such considerations depend on the development efforts allocated on your project, and on portability of your software (to other platforms).
It is likely that you won't spend the same efforts on the embedded software of a Covid respirator (or inside some ICBM) and on the web server (like lighttpd or a library such as libonion or some FastCGI application) inside a cheap consumer appliance or running on some cheap rented Linux VPS.
Consider also using static analysis tools such as Frama-C or the Clang static analyzer on your code.
Regarding undefined behavior, be sure to read this blog.
See also this answer to a related question.
I am changing the constant struct member by writing to the entire struct from a file.
Then endianness issues and file system issues are important. Consider perhaps using libraries related to JSON or YAML, perhaps combined with SQLite, PostgreSQL or Tokyo Cabinet (and the source code of all these open source libraries, or of the Linux kernel, could be inspirational).
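If you do keep a binary format, one common way to sidestep the endianness issue is to serialize each field byte by byte in a fixed order instead of writing the struct's memory image. A minimal sketch for a 32-bit field, with function names invented for the illustration and little-endian chosen arbitrarily as the file byte order:

```c
#include <stdint.h>

/* Writes value into out[0..3], least significant byte first. */
void store_u32_le(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value & 0xFFu);
    out[1] = (unsigned char)((value >> 8) & 0xFFu);
    out[2] = (unsigned char)((value >> 16) & 0xFFu);
    out[3] = (unsigned char)((value >> 24) & 0xFFu);
}

/* Reassembles a value stored by store_u32_le, on any host byte order. */
uint32_t load_u32_le(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Hypothetical demo: encodes and decodes a value through the byte buffer. */
uint32_t round_trip_u32_le(uint32_t value)
{
    unsigned char bytes[4];
    store_u32_le(bytes, value);
    return load_u32_le(bytes);
}
```

The bytes would be passed to fwrite and fread as plain unsigned char arrays, so no struct with const members is ever overwritten.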
The Standard is a bit sloppy in its definition and use of the term "object". For a statement like "All X must be Y" or "No X may be Z" to be meaningful, the definition of X must have criteria that are not only satisfied by all X, but that would unambiguously exclude all objects that aren't required to be Y or are allowed to be Z.
The definition of "object", however, is simply "region of data storage in the execution environment, the contents of which can represent values". Such a definition, however, fails to make clear whether every possible range of consecutive addresses is always an "object", or when various possible ranges of addresses are subject to the constraints that apply to "objects" and when they are not.
In order for the Standard to unambiguously classify a corner case as defined or undefined, the Committee would have to reach a consensus as to whether it should be defined or undefined. If the Committee members fundamentally disagree about whether some cases should be defined or undefined, the only way to pass a rule by consensus will be if the rule is written ambiguously in a way that allows people with contradictory views about what should be defined to each think the rule supports their viewpoint. While I don't think the Committee members explicitly wanted to make their rules ambiguous, I don't think the Committee could have reached consensus on rules that weren't.
Given that situation, many actions, including updating structures that have constant members, most likely fall in the realm of actions which the Standard doesn't require implementations to process meaningfully, but which the authors of the Standard would have expected that implementations would process meaningfully anyhow.
Consider the following code on a platform where the ABI does not insert padding into unions:
union { int xi; } x;
x.xi = 1;
I believe that the second line exhibits undefined behaviour as it violates the strict aliasing rule:
The object referred to by x.xi is the same object as the object referred to by x. Both are the same region of storage and the term object is defined in ISO 9899:2011 §3.15 as:
object
1 region of data storage in the execution environment, the contents of which can represent values
2 NOTE When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.
As an object is not more than a region of storage, I conclude that as x and x.xi occupy the same storage, they are the same object.
The effective type of x is union { int xi; } as that's the type it has been declared with. See §6.5 ¶6:
6 The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
By the wording of ¶6 it is also clear that each object can only have one effective type.
In the statement x.xi = 1 I access x through the lvalue x.xi, which has type int. This is not one of the types listed in §6.5 ¶7:
7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Therefore, the second line exhibits undefined behaviour.
As this interpretation is clearly wrong, where lies my misreading of the standard?
The error is thinking that x and x.xi are the same object.
The union is an object and it contains member objects [1]. They are distinct objects, each with its own type.
[1] (Quoted from: ISO/IEC 9899:201x 6.2.5 Types 20)
A union type describes an overlapping nonempty set of member objects, each of
which has an optionally specified name and possibly distinct type.
Outside of the rules which forbid the use of pointers to access things of other types, the term "object" refers to a contiguous allocation of storage. Each individual variable of automatic or static duration is an independent object (since an implementation could arbitrarily scatter them throughout memory) but any region of memory created by malloc would be a single object--effectively of type "char[]", no matter how many different ways the contents therein were indexed and accessed.
The C89 rules regarding pointer type access could be made workable if, in addition to the special rule for character-pointer types, there were a corresponding rule for suitably-aligned objects of character-array types, or for objects with no declared type that were effectively "char[]". Interpreting the rules in such a fashion would limit their application to objects that had declared types. This would have allowed most of the optimizations that would have been practical in 1989, but as compilers got more sophisticated it became more desirable to be able to apply similar optimizations to allocated storage; since the rules weren't set up for that, there was little clarity as to what was or was not permissible.
By 1999, there was a substantial overlap between the kinds of pointer-based
accesses some programs needed to do, and the kinds of pointer-based accesses
that compilers were assuming programs wouldn't do, so any single C99 standard
would have either required that some C99 implementations be made less efficient
than they had been, or else allow C99 compilers to behave arbitrarily with a
large corpus of code that relies upon techniques that some compilers didn't
support.
The authors of C99, rather than resolving the situation by defining directives to specify different aliasing modes, attempted to "clarify" it by adding language that either requires applying a different definition of "object" from the one used elsewhere, or else requires that each allocated region hold either one array of a single type or a single structure which may contain a flexible array member. The latter restriction might be usable in a language being designed from scratch, but would effectively invalidate a huge amount of C code. Fortunately or unfortunately, however, the authors of the Standard were able to get away with such sloppy drafting since compiler writers were, at least until recently, more interested in doing what was necessary to make a compiler useful than in doing the minimum necessary to comply with the poorly-written Standard.
If one wants to write code that will work with a quality compiler, ensure that any aliasing is done in ways that a compiler would have to be obtuse to ignore (e.g. if a function receives a parameter of type T*, casts it to U*, and then accesses the object as a U*, a compiler that's not being obtuse should have no trouble recognizing that the function might really be accessing a T*). If one wants to write code that will work with the most obtuse compiler imaginable... that's impossible, since the Standard doesn't require that an implementation be capable of processing anything other than a possibly-contrived and useless program. If one wants to write code that will work on gcc, the authors' willingness to support constructs will be far more relevant than what the Standard has to say about them.
The Question
The question of whether all pointers derived from pointers to structure types are the same, is not easy to answer. I find it to be a significant question for the following two primary reasons.
A. The lack of a pointer to pointer to 'any' incomplete or object type, imposes a limitation on convenient function interfaces, such as:
int allocate(ANY_TYPE **p,
size_t s);
int main(void)
{
int *p;
int r = allocate(&p, sizeof *p);
}
[Complete code sample]
The existing pointer to 'any' incomplete or object type is explicitly described as:
C99 / C11 §6.3.2.3 p1:
A pointer to void may be converted to or from a pointer to any incomplete or object type. [...]
A pointer derived from the existing pointer to 'any' incomplete or object type, pointer to pointer to void, is strictly a pointer to pointer to void, and is not required to be convertible with a pointer derived from a pointer to 'any' incomplete or object type.
B. It is not uncommon for programmers to utilize conventions based on assumptions that are not required, related to the generalization of pointers, knowingly or unknowingly, while depending on their experience with their specific implementations. Assumptions such as being convertible, being representable as integers, or sharing a common property: object size, representation, or alignment.
The words of the standard
According to C99 §6.2.5 p27 / C11 §6.2.5 p28:
[...] All pointers to structure types shall have the same representation and alignment requirements as each other. [...]
Followed by C99 TC3 Footnote 39 / C11 Footnote 48:
The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
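As a rough compile-time illustration of the quoted requirement (a sketch only; struct s1, struct s2, struct never_completed and struct_pointer_size are hypothetical names, and C11 is assumed for static_assert and alignof), pointers to unrelated structure types, including a type that is never completed, must agree in size and alignment. Equal size and alignment are consequences of the rule, not a full check of "same representation".

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct s1 { int i; };
struct s2 { double d; char c[3]; };
struct never_completed;            /* incomplete on purpose */

/* These hold on any conforming C11 implementation, per the quoted text. */
static_assert(sizeof(struct s1 *) == sizeof(struct s2 *),
              "struct pointer sizes differ");
static_assert(sizeof(struct s1 *) == sizeof(struct never_completed *),
              "pointer to incomplete struct has a different size");
static_assert(alignof(struct s1 *) == alignof(struct s2 *),
              "struct pointer alignments differ");
static_assert(alignof(struct s1 *) == alignof(struct never_completed *),
              "pointer to incomplete struct has a different alignment");

/* Returns the common size so that it can also be inspected at run time. */
size_t struct_pointer_size(void)
{
    return sizeof(struct s1 *);
}
```

Nothing in this sketch says anything about struct s1 ** versus struct s2 **, which is exactly the open point of the question.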
Although the standard doesn't say: "A pointer to a structure type" and the following words have been chosen: "All pointers to structure types", it doesn't explicitly specify whether it applies to a recursive derivation of such pointers. In other occasions where special properties of pointers are mentioned in the standard, it doesn't explicitly specify or mention recursive pointer derivation, which means that either the 'type derivation' applies, or it doesn't- but it's not explicitly mentioned.
And although the phrasing "All pointers to" while referring to types is used only twice, (for structure and union types), as opposed to the more explicit phrasing: "A pointer to" which is used throughout the standard, we can't conclude whether it applies to a recursive derivation of such pointers.
Background
The assumption that the standard implicitly requires all pointers to structure types, (complete, incomplete, compatible and incompatible), to have the same representation and alignment requirements, began at C89- many years before the standard required it explicitly. The reasoning behind it was the compatibility of incomplete types in separate translation units, and although according to the C standards committee, the original intent was to allow the compatibility of an incomplete type with its completed variation, the actual words of the standard did not describe it. This has been amended in the second Technical corrigendum to C89, and therefore made the original assumption concrete.
Compatibility and Incomplete Types
While reading the guidelines related to compatibility and incomplete types, thanks to Matt McNabb, we find further insight of the original C89 assumption.
Pointer derivation of object and incomplete types
C99 / C11 §6.2.5 p1:
Types are partitioned into object types, function types, and incomplete types.
C99 / C11 §6.2.5 p20:
A pointer type may be derived from a function type, an object type, or an incomplete type, called the referenced type.
C99 / C11 §6.2.5 p22:
A structure or union type of unknown content is an incomplete type. It is completed, for all declarations of that type, by declaring the same structure or union tag with its defining content later in the same scope.
Which means that pointers may be derived from both object types and incomplete types. Although it isn't specified that incomplete types are not required to be completed; in the past the committee responded on this matter, and stated that the lack of a prohibition is sufficient and there's no need for a positive statement.
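The following pointer to the incomplete type 'struct never_completed' refers to a type that is never completed:
#include <stdlib.h>
int main(void)
{
    /* struct never_completed is declared here and never defined */
    struct never_completed *p;
    p = malloc(1024);
}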
[Complete code sample]
Compatible types of separate translation units
C99 / C11 §6.7.2.3 p4:
All declarations of structure, union or enumerated types that have the same scope and use the same tag declare the same type.
C99 / C11 §6.2.7 p1:
Two types have compatible type if their types are the same. Two structure types declared in separate translation units are compatible if their tags (are) the same tag. [trimmed quote] [...]
This paragraph has a great significance, allow me to summarize it: two structure types declared in separate translation units are compatible if they use the same tag. If both of them are completed- their members have to be the same (according to the specified guidelines).
Compatibility of pointers
C99 §6.7.5.1 p2 / C11 §6.7.6.1 p2:
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
If the standard mandates that two structures under specified conditions, are to be compatible in separate translation units whether being incomplete or complete, it means that the pointers derived from these structures are compatible just as well.
C99 / C11 §6.2.5 p20:
Any number of derived types can be constructed from the object, function, and incomplete types
These methods of constructing derived types can be applied recursively.
And due to the fact that pointer derivation is recursive, it makes pointers derived from pointers to compatible structure types, to be compatible with each other.
Representation of compatible types
C99 §6.2.5 p27 / C11 §6.2.5 p28:
pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements.
C99 / C11 §6.3 p2:
Conversion of an operand value to a compatible type causes no change to the value or the representation.
C99 / C11 §6.2.5 p26:
The qualified or unqualified versions of a type are distinct types that belong to the same type category and have the same representation and alignment requirements.
This means that a conforming implementation can't have a distinct judgement concerning the representation and alignment requirements of pointers derived from incomplete or complete structure types, due to the possibility that a separate translation unit might have a compatible type, which will have to share the same representation and alignment requirements, and it is required to apply the same distinct judgement with either an incomplete or a complete variation of the same structure type.
The following pointer to pointer to incomplete 'struct complete_incomplete':
struct complete_incomplete **p;
Is compatible and shares the same representation and alignment requirements as the following pointer to pointer to complete 'struct complete_incomplete':
struct complete_incomplete { int i; } **p;
C89 related
If we wonder about the premise concerning C89, defect report #059 of June '93 questioned:
Both sections do not explicitly require that an incomplete type eventually must be completed, nor do they explicitly allow incomplete types to remain incomplete for the whole compilation unit. Since this feature is of importance for the declaration of true opaque data types, it deserves clarification.
Considering mutual referential structures defined and implemented in different compilation units makes the idea of an opaque data type a natural extension of an incomplete data type.
The response of the committee was:
Opaque data types were considered, and endorsed, by the Committee when drafting the C Standard.
Compatibility versus Interchangeability
We have covered the aspect concerning the representation and alignment requirements of recursive pointer derivation of pointers to structure types, now we are facing a matter that a non-normative footnote mentioned, 'interchangeability':
C99 TC3 §6.2.5 p27 Footnote 39 / C11 §6.2.5 p28 Footnote 48:
The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
The standard says that the notes, footnotes, and examples are non-normative and are "for information only".
C99 FOREWORD p6 / C11 FOREWORD p8:
[...] this foreword, the introduction, notes, footnotes, and examples are also for information only.
It's unfortunate that this confusing footnote was never changed, because at best- the footnote is specifically about the direct types referring to it, so phrasing the footnote as-if the properties of "representation and alignment requirements" are without the context of these specific types, makes it easy to interpret as being a general rule for all types that share a representation and alignment. If the footnote is to be interpreted without the context of specific types, then it's obvious that the normative text of the standard doesn't imply it, even without the need to debate the interpretation of the term 'interchangeable'.
Compatibility of pointers to structure types
C99 / C11 §6.7.2.3 p4:
All declarations of structure, union or enumerated types that have the same scope and use the same tag declare the same type.
C99 / C11 §6.2.7 p1:
Two types have compatible type if their types are the same.
C99 §6.7.5.1 p2 / C11 §6.7.6.1 p2:
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
This states the obvious conclusion, different structure types are indeed different types, and because they are different they are incompatible. Therefore, two pointers to two different and incompatible types, are incompatible just as well, regardless of their representation and alignment requirements.
Effective types
C99 / C11 §6.5 p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object
C99 / C11 §6.5 p6:
The effective type of an object for an access to its stored value is the declared type of the object, if any.
Incompatible pointers are not 'interchangeable' as arguments to functions, nor as return values from functions. Implicit conversions and specified special cases are the exceptions, and these types are not part of any such exception. Even if we decide to add an unrealistic requirement for said 'interchangeability', and say that an explicit conversion is required to make it applicable, then accessing the stored value of an object with an incompatible effective type breaks the effective types rules. For making it a reality we need a new property that currently the standard doesn't have. Therefore sharing the same representation and alignment requirements, and being convertible, is simply not enough.
This leaves us with being interchangeable 'as members of unions', and although they are indeed interchangeable as members of union- it bears no special significance.
Official interpretations
1. The first 'official' interpretation belongs to a member of the C standards committee. His interpretation for: "are meant to imply interchangeability", is that it doesn't actually imply that such an interchangeability exists, but actually makes a suggestion for it.
As much as I would like it to become a reality, I wouldn't consider an implementation that took a suggestion from a non-normative footnote, not to mention an unreasonably vague footnote, while contradicting normative guidelines- to be a conforming implementation. This obviously renders a program that utilizes and depends on such a 'suggestion', to be a non-strictly conforming one.
2. The second 'official' interpretation belongs to a member/contributor to the C standards committee, by his interpretation the footnote doesn't introduce a suggestion, and because the (normative) text of standard doesn't imply it- he considers it to be a defect in the standard. He even made a suggestion to change the effective types rules for addressing this matter.
3. The third 'official' interpretation is from defect report #070 of Dec '93. It asked, within the context of C89, whether a program that passes an 'unsigned int' argument where the type 'int' is expected, to a function with a non-prototype declarator, introduces undefined behavior.
In C89 there's the very same footnote, with the same implied interchangeability as arguments to functions, attached to:
C89 §3.1.2.5 p2:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same.
The committee responded that they encourage implementors to allow this interchangeability to work, but since it's not a requirement, it renders the program to be a non-strictly conforming one.
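The C89 guarantee behind that footnote can be observed with a small sketch. The function name same_representation is hypothetical, and the byte comparison assumes an implementation whose int has no padding bits, since the Standard only speaks about the representation of the value.

```c
#include <assert.h>
#include <string.h>

/* For a nonnegative value, compares the bytes of an int with the bytes of
 * the same value stored in an unsigned int.
 * Assumption: int and unsigned int have no padding bits. */
int same_representation(int value)
{
    assert(value >= 0);
    unsigned int as_unsigned = (unsigned int)value;
    return memcmp(&value, &as_unsigned, sizeof value) == 0;
}
```

Matching bytes are what makes the interchangeability tempting; as the committee response says, that alone does not make a program relying on it strictly conforming.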
The following code sample is not strictly conforming. '&s1' and 'struct generic **' are sharing the same representation and alignment requirements, but nevertheless they are incompatible. According to the effective types rules, we are accessing the stored value of the object 's1' with an incompatible effective type, a pointer to 'struct generic', while its declared type, and therefore effective type, is a pointer to 'struct s1'. To overcome this limitation we could've used the pointers as members of a union, but this convention damages the goal of being generic.
#include <stdlib.h>
int allocate_struct(void *p,
                    size_t s)
{
    /* accesses the caller's 'struct s1 *' object as a 'struct generic *' */
    struct generic **p2 = p;
    if ((*p2 = malloc(s)) == NULL)
        return -1;
    return 0;
}
int main(void)
{
    struct s1 { int i; } *s1;
    if (allocate_struct(&s1, sizeof *s1) != 0)
        return EXIT_FAILURE;
}
[Complete code sample]
The following code sample is strictly conforming, to overcome both issues of effective types and being generic, we're taking advantage of: 1. a pointer to void, 2. the representation and alignment requirements of all pointers to structs, and 3. accessing the pointer's byte representation 'generically', while using memcpy to copy the representation, without affecting its effective type.
#include <stdlib.h>
#include <string.h>
int allocate_struct(void *pv,
                    size_t s)
{
    struct generic *pgs;
    if ((pgs = malloc(s)) == NULL)
        return -1;
    /* copy the pointer's bytes; the destination keeps its declared type */
    memcpy(pv, &pgs, sizeof pgs);
    return 0;
}
int main(void)
{
    struct s1 { int i; } *s1;
    if (allocate_struct(&s1, sizeof *s1) != 0)
        return EXIT_FAILURE;
}
[Complete code sample]
The Conclusion
The conclusion is that a conforming implementation must have the same representation and alignment requirements, respectively, for all recursively derived pointers to structure types, whether they are incomplete or complete, and whether they are compatible or incompatible. Whether the types are compatible or incompatible is significant, but due to the mere possibility of a compatible type, they must share the fundamental properties of representation and alignment. It would have been preferable if we could access pointers that share representation and alignment directly, but unfortunately the current effective types rules do not require it.
My answer is "no."
There is no wording in any standard of C that I'm aware of which suggests otherwise. The fact that all pointers to structure types have the same representation and alignment requirements has no bearing on any derived type.
This makes complete sense and any other reality would seem to be inconsistent. Consider the alternative:
Let's call the alignment and representation requirements for pointers to structure types "A". Suppose that any "recursively derived type" shares the requirements "A".
Let's call the alignment and representation requirements for pointers to union types "B". Suppose that any "recursively derived type" shares the requirements "B".
Let's suppose that "A" and "B" are not the same[1]. Furthermore, let's suppose that they cannot be satisfied at the same time. (A 4-byte representation and an 8-byte representation, for example.)
Now derive a type from both:
A type with requirements "A"
A type with requirements "B"
Now you have a type whose requirements are impossible to satisfy, because it must satisfy "A" and "B", but they cannot both be satisfied at once.
Perhaps you're thinking of derived types as having a flat lineage all the way back to a single ancestor, but that's not so. Derived types can have many ancestors. The standard definition of "derived types" discusses this.
[1] While it might seem unreasonable, unlikely and silly, it's allowed.
A number of answers for the Stack Overflow question Getting the IEEE Single-precision bits for a float suggest using a union structure for type punning (e.g.: turning the bits of a float into a uint32_t):
union {
float f;
uint32_t u;
} un;
un.f = your_float;
uint32_t target = un.u;
However, the value of the uint32_t member of the union appears to be unspecified according to the C99 standard (at least draft n1124), where section 6.2.6.1.7 states:
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
At least one footnote of the C11 n1570 draft seems to imply that this is no longer the case (see footnote 95 in 6.5.2.3):
If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type
punning’’). This might be a trap representation.
However, the text to section 6.2.6.1.7 is the same in the C99 draft as in the C11 draft.
Is this behavior actually unspecified under C99? Has it become specified in C11? I realize that most compilers seem to support this, but it would be nice to know if it's specified in the standard, or just a very common extension.
The behavior of type punning with union changed from C89 to C99. The behavior in C99 is the same as C11.
As Wug noted in his answer, type punning is allowed in C99 / C11. An unspecified value that could be a trap is read when the union members are of different size.
The footnote was added in C99 after Clive D.W. Feather Defect Report #257:
Finally, one of the changes from C90 to C99 was to remove any restriction on accessing one member of a union when the last store was to a different one. The rationale was that the behaviour would then depend on the representations of the values. Since this point is often misunderstood, it might well be worth making it clear in the Standard.
[...]
To address the issue about "type punning", attach a new footnote 78a to the words "named member" in 6.5.2.3#3:
78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
The wording of Clive D.W. Feather was accepted for a Technical Corrigendum in the answer by the C Committee for Defect Report #283.
The original C99 specification left this unspecified.
One of the technical corrigenda to C99 (TC3, I think) added footnote 82 to correct this oversight:
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
That footnote is retained in the C11 standard (it's footnote 95 in C11).
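For reference, the punning from the question can be written as below, next to a memcpy-based variant that does not depend on the footnote at all. This is a sketch under assumptions: float_bits_union and float_bits_memcpy are hypothetical names, and float is assumed to be 32 bits wide, which the static_assert checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Type punning through a union, as described by the footnote. */
uint32_t float_bits_union(float value)
{
    union {
        float f;
        uint32_t u;
    } un;

    un.f = value;
    return un.u;
}

/* Copies the object representation byte by byte instead. */
uint32_t float_bits_memcpy(float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform both functions yield 0x3F800000 for 1.0f; the memcpy variant is the one to reach for when the union reading is in doubt.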
This has always been "iffy". As others have noted, a footnote was added to C99 via a Technical Corrigendum. It reads as follows:
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
However, footnotes are specified in the Foreword as non-normative:
Annexes D and F form a normative part of this standard; annexes A, B, C, E, G, H, I, J, the bibliography, and the index are for information only. In accordance with Part 3 of the ISO/IEC Directives, this foreword, the introduction, notes, footnotes, and examples are also for information only.
That is, the footnotes cannot prescribe behaviour; they should only clarify the existing text. It's an unpopular opinion, but the footnote quoted above actually fails in this regard - there is no such behaviour prescribed in the normative text. Indeed, there are contradictory sections, such as 6.7.2.1:
... The value of at most one of the members can be stored in a union object at any time
In conjunction with 6.5.2.3 (regarding accessing union members with the "." operator):
The value is that of the named member
I.e. if the value of only one member can be stored, the value of another member is non-existent. This strongly implies that type punning via a union should not be possible; the member access yields a non-existent value. The same text still exists in the C11 document.
However, it's clear that the purpose of adding the footnote was to allow for type-punning; it's just that the committee seemingly broke the rules on footnotes not containing normative text. To accept the footnote, you really have to disregard the section that says footnotes aren't normative, or otherwise try to figure out how to interpret the normative text in such a way that supports the conclusion of the footnote (which I have tried, and failed, to do).
About the best we can do to ratify the footnote is to make some assumptions about the definition of a union as a set of "overlapping objects", from 6.2.5:
A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type
Unfortunately there is no elaboration on what is meant by "overlapping". An object is defined as a (3.14) "region of data storage in the execution environment, the contents of which can represent values" (that the same region of storage can be identified by two or more distinct objects is implied by the "overlapping objects" definition above, that is, objects have an identity which is separate to their storage region). The reasonable assumption seems to be that union members (of a particular union instance) use the same storage region.
Even if we ignore 6.7.2.1/6.5.2.3 and allow, as the footnote suggests, that reading any union member returns the value that would be represented by the contents of the corresponding storage region—which would therefore allow for type punning—the ever-problematic strict-aliasing rule in 6.5 disallows (with certain minor exceptions) accessing an object other than by its type. Since an "access" is an (3.1) "〈execution-time action〉 to read or modify the value of an object", and since modifying one of a set of overlapping objects necessarily modifies the others, then the strict-aliasing rule could potentially be violated by writing to a union member (regardless of whether it is then read through another, or not).
For example, by the wording of the standard, the following is illegal:
union {
int a;
float b;
} u;
u.a = 0; // modifies a float object by an lvalue of type int
int *pa = &u.a;
*pa = 1; // also modifies a float object, without union lvalue involved
(Specifically, the two commented lines break the strict-aliasing rule.)
Strictly speaking, the footnote speaks to a separate issue, that of reading an inactive union member; however the strict-aliasing rule in conjunction with other sections as noted above seriously limits its applicability and in particular means that it does not allow type-punning in general (but only for specific combinations of types).
Frustratingly, the committee responsible for developing the standard seem to intend for type-punning to generally be possible via a union, and yet do not appear to be troubled that the text of the standard still disallows it.
Worth noting also is that the consensus understanding (by compiler vendors) seems to be that type punning via a union is allowed, but "access must be via the union type" (e.g. the first commented line in the example above, but not the second). It's a little unclear whether this restriction should apply to both read and write accesses, and it is in no way supported by the text of the standard (disregarding the footnote).
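As a minimal sketch of what "access via the union type" usually looks like in practice, the following reads the bits of a float through a union, with both the store and the load going through the union lvalue. The helper name float_bits is hypothetical, and the code assumes a 32-bit IEEE 754 float (only the size is checked). It shows the form that compilers are generally understood to accept; it does not settle what the text of the standard permits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float occupies 32 bits; only the size is verified here. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: returns the object representation of f as an integer. */
uint32_t float_bits(float f)
{
    union {
        float    f;
        uint32_t u;
    } pun;

    pun.f = f;     /* store through the union lvalue */
    return pun.u;  /* load the other member, also through the union lvalue */
}
```

On an IEEE 754 platform, float_bits(1.0f) would be expected to return 0x3F800000. Note that no pointer to a member is ever formed, which is the distinction the "via the union type" reading draws.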
In conclusion: while it is largely accepted that type punning via a union is legal (most consider it allowed only if the access is done "via the union type", so to speak), the wording of the standard prohibits it in all but certain trivial cases.
The section you quote:
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
... has to be read carefully, though. "The bytes of the object representation that do not correspond to that member" is referring to bytes beyond the size of the member, which isn't itself an issue for type punning (except that you cannot assume writing to a union member will leave the "extra" part of any larger member untouched).
However, this appears to violate the C99 standard (at least draft n1124), where section 6.2.6.1.7 states some stuff. Is this behavior actually unspecified under C99?
No, you're fine.
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
This applies to members of different sizes. That is, if you have:
union u
{
float f;
double d;
};
and you assign something to f, it would overwrite the bytes of d that overlap f (the first 4 bytes in memory, assuming a 4-byte float and an 8-byte double), but the remaining 4 bytes would take unspecified values.
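A small sketch of that point, using the same union: after storing to f, only the leading sizeof(float) bytes of the union can be relied upon to hold the representation of f. The helper name leading_bytes_match is hypothetical, and the check assumes that float has no padding bits and that the stored value is not a NaN.

```c
#include <assert.h>
#include <string.h>

union u {
    float  f;
    double d;
};

/* Hypothetical helper: returns 1 if, after a store to f, the leading
 * sizeof(float) bytes of the union equal the representation of value. */
int leading_bytes_match(float value)
{
    union u x;

    x.d = 123.456;  /* fill every byte of the union first */
    x.f = value;    /* bytes beyond sizeof(float) now hold unspecified values */

    /* Compare only the bytes that correspond to f; the rest are not inspected. */
    return memcmp(&x, &value, sizeof value) == 0;
}
```

The function deliberately never looks at the trailing sizeof(union u) - sizeof(float) bytes, since nothing may be assumed about them after the store.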
Unions exist primarily for type punning.