When is it valid to access a pointer to a "dead" object? - c

First, to clarify, I am not talking about dereferencing invalid pointers!
Consider the following two examples.
Example 1
typedef struct { int *p; } T;
T a = { malloc(sizeof(int) };
free(a.p); // a.p is now indeterminate?
T b = a; // Access through a non-character type?
Example 2
void foo(int *p) {}
int *p = malloc(sizeof(int));
free(p); // p is now indeterminate?
foo(p); // Access through a non-character type?
Question
Do either of the above examples invoke undefined behaviour?
Context
This question is posed in response to this discussion. The suggestion was that, for example, pointer arguments may be passed to a function via x86 segment registers, which could cause a hardware exception.
From the C99 standard, we learn the following (emphasis mine):
[3.17] indeterminate value - either an unspecified value or a trap representation
and then:
[6.2.4 p2] The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
and then:
[6.2.6.1 p5] Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.
Taking all of this together, what restrictions do we have on accessing pointers to "dead" objects?
Addendum
Whilst I've quoted the C99 standard above, I'd be interested to know if the behaviour differs in any of the C++ standards.

Example 2 is invalid. The analysis in your question is correct.
Example 1 is valid. A structure type never holds a trap representation, even if one of its members does. This means that structure assignment, on a system where trap representations would cause problems, must be implemented as a bytewise copy, rather than a member-by-member copy.
6.2.6 Representations of types
6.2.6.1 General
6 [...] The value of a structure or union object is never a t rap
representation, even though the value of a member of the structure or union object may be
a trap representation.

My interpretation is that while only non-character types can have trap representations, any type can have indeterminate value, and that accessing an object with indeterminate value in any way invokes undefined behavior. The most infamous example might be OpenSSL's invalid use of uninitialized objects as a random seed.
So, the answer to your question would be: never.
By the way, an interesting consequence of not just the pointed-to object but the pointer itself being indeterminate after free or realloc is that this idiom invokes undefined behavior:
void *tmp = realloc(ptr, newsize);
if (tmp != ptr) {
/* ... */
}

C++ discussion
Short answer: In C++, there is no such thing as accessing "reading" a class instance; you can only "read" non-class object, and this is done by a lvalue-to-rvalue conversion.
Detailed answer:
typedef struct { int *p; } T;
T designates an unnamed class. For the sake of the discussion let's name this class T:
struct T {
int *p;
};
Because you did not declare a copy constructor, the compiler implicitly declares one, so the class definition reads:
struct T {
int *p;
T (const T&);
};
So we have:
T a;
T b = a; // Access through a non-character type?
Yes, indeed; this is initialization by copy constructor, so the copy constructor definition will be generated by the compiler; the definition is equivalent with
inline T::T (const T& rhs)
: p(rhs.p) {
}
So you are accessing the value as a pointer, not a bunch of bytes.
If the pointer value is invalid (not initialized, freed), the behavior is not defined.

Related

Is reading an uninitialized value always an undefined behaviour? Or are there exceptions to it?

An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:
Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue to rvalue conversion. The part in bold applies here as the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic
storage duration, as do some compound literals. The result of
attempting to indirectly access an object with automatic storage
duration from a thread other than the one with which the object is
associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block
with which it is associated until execution of that block ends in any
way. (Entering an enclosed block or calling a function
suspends, but does not end,execution of the current block.) If
the block is entered recursively, a new instance of the object is
created each time. The initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration or compound
literal is reached in the execution of the block; otherwise,
the value becomes indeterminate each time the declaration is reached
So in this case the lifetime of i begins before the declaration in encountered. So int i = i; is still undefined behavior, but not for this reason.
The bolded part of 6.3.2.1p2 does however open the door for use of an uninitialized variable not being undefined behavior, and that is if the variable in question had it's address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In which case the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example as these implementations do not have trap representations for integer types.
Use of the not initialized automatic storage duration objects invokes UB.
Use of the not initialized static storage duration objects is defined as they are initialized to 0s
int a;
int foo(void)
{
static int b;
int c;
int d = d; //UB
static int e = e; //OK
printf("%d\n", a); //OK
printf("%d\n", b); //OK
printf("%d\n", c); //UB
}
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
struct ztstr15 { char dat[16]; } x,y;
void test(void)
{
struct zstr15 hey;
strcpy(hey.dat, "Hey");
x=hey;
y=hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function would be to write all portions of hey were written, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.

Could this memcpy result in undefined behavior?

With this definition:
struct vector {
const float x;
const float y;
};
would the code snippet below possibly result in undefined behavior?
struct vector src = {.x=1.0, .y=1.0};
struct vector dst;
void *dstPtr = &dst;
memcpy(dstPtr, &src, sizeof dst);
gcc and clang do not emit any warnings, but it does result in modification of a const-qualified type.
The construct looks a lot like the one given in the accepted answer to How to initialize const members of structs on the heap
, which apparently is conformant. I do not understand how my example would therefore be non-conformant.
The const-qualifiers on members let the compiler assume that - after an object has been initialized - these members must not be altered through any way, and it may optimise code accordingly (cf, for example, #Ajay Brahmakshatriya comment).
So it is essential to distinguish the initialization phase from the subsequent phases where assignments would apply, i.e. from when on may a compiler assume that an object got initialized and has an effective type to rely on.
I think there is a main difference between your example and that in the accepted answer you cited. In this SO answer, the target aggregate object with const-qualified member types is created through malloc:
ImmutablePoint init = { .x = x, .y = y };
ImmutablePoint *p = malloc(sizeof *p);
memcpy(p, &init, sizeof *p);
According to the rules on how a stored value of an object may be accessed (cf. this part of an online c standard draft), the p-target object will get its effective type the first time in the course of performing memcpy; the effective type is then that of the source object init, and the first memcpy on an object that got malloced can be seen as an initialization. Modifying the target object's const members afterwards, however, would be UB then (I think that even a second memcpy would be UB, but that's probably opinion based).
6.5 Expressions
The effective type of an object for an access to its stored value is the declared type of the object, if any.87) ... If a value is
copied into an object having no declared type using memcpy or memmove,
or is copied as an array of character type, then the effective type of
the modified object for that access and for subsequent accesses that
do not modify the value is the effective type of the object from which
the value is copied, if it has one. For all other accesses to an
object having no declared type, the effective type of the object is
simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
In your example, however, the target object dst already has a declared type through it's definition struct vector dst;. Hence, the const-qualifiers on dst's members are already in place before the memcpy is applied, and it has to be seen as an assignment rather than an initialization.
So I'd vote for UB in this case.

Casting structure pointers between structs containing pointers to different types?

I have a structure, defined by as follows:
struct vector
{
(TYPE) *items;
size_t nitems;
};
where type may literally be any type, and I have a type-agnostic structure of similar kind:
struct _vector_generic
{
void *items;
size_t nitems;
};
The second structure is used to pass structures of the first kind of any type to a resizing function, for example like this:
struct vector v;
vector_resize((_vector_generic*)&v, sizeof(*(v->items)), v->nitems + 1);
where vector_resize attempts to realloc memory for the given number of items in the vector.
int
vector_resize (struct _vector_generic *v, size_t item_size, size_t length)
{
void *new = realloc(v->items, item_size * length);
if (!new)
return -1;
v->items = new;
v->nitems = length;
return 0;
}
However, the C standard states that pointers to different types are not required to be of the same size.
6.2.5.27:
A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type.39) Similarly, pointers
to qualified or unqualified versions of compatible types shall have
the same representation and alignment requirements. All pointers to
structure types shall have the same representation and alignment
requirements as each other. All pointers to union types shall have the
same representation and alignment requirements as each other. Pointers
to other types need not have the same representation or alignment
requirements.
Now my question is, should I be worried that this code may break on some architectures?
Can I fix this by reordering my structs such that the pointer type is at the end? for example:
struct vector
{
size_t nitems;
(TYPE) *items;
};
And if not, what can I do?
For reference of what I am trying to achieve, see:
https://github.com/andy-graprof/grapes/blob/master/grapes/vector.h
For example usage, see:
https://github.com/andy-graprof/grapes/blob/master/tests/grapes.tests/vector.exp
You code is undefined.
Accessing an object using an lvalue of an incompatible type results in undefined behavior.
Standard defines this in:
6.5 p7:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
struct vector and struct _vector_generic have incompatible types and do not fit into any of the above categories. Their internal representation is irrelevant in this case.
For example:
struct vector v;
_vector_generic* g = &v;
g->size = 123 ; //undefined!
The same goes for you example where you pass the address of the struct vector to the function and interpret it as a _vector_generic pointer.
The sizes and padding of the structs could also be different causing elements to be positioned at different offsets.
What you can do is use your generic struct, and cast if depending on the type the void pointer holds in the main code.
struct gen
{
void *items;
size_t nitems;
size_t nsize ;
};
struct gen* g = malloc( sizeof(*g) ) ;
g->nitems = 10 ;
g->nsize = sizeof( float ) ;
g->items = malloc( g->nsize * g->nitems ) ;
float* f = g->items ;
f[g->nitems-1] = 1.2345f ;
...
Using the same struct definition you can allocate for a different type:
struct gen* g = malloc( sizeof(*g) ) ;
g->nitems = 10 ;
g->nsize = sizeof( int ) ;
g->items = malloc( g->nsize * g->nitems ) ;
int* i = g->items ;
...
Since you are storing the size of the type and the number of elements, it is obvious how your resize function would look like( try it ).
You will have to be careful to remember what type is used in which variable as the compiler will not warn you because you are using void*.
The code in your question invokes undefined behaviour (UB), because you de-reference a potentially invalid pointer. The cast:
(_vector_generic*)&v
... is covered by 6.3.2.3 paragraph 7:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
If we assume alignment requirements are met, then the cast does not invoke UB. However, there is no requirement that the converted pointer must "compare equal" with (i.e. point at the same object as) the original pointer, nor even that it points to any object at all - that is to say, the value of the pointer is unspecified - therefore, to dereference this pointer (without first ascertaining that it is equal to the original) invokes undefined behaviour.
(Many people who know C well find this odd. I think this is because they know a pointer cast usually compiles to no operation - the pointer value simply remains as it is - and therefore they see pointer conversion as purely a type conversion. However, the standard does not mandate this).
Even if the pointer after conversion did compare equal with the original pointer, 6.5 paragraph 7 (the so-called "strict aliasing rule") would not allow you to dereference it. Essentially, you cannot access the same object via two pointers with different type, with some limited exceptions.
Example:
struct a { int n; };
struct b { int member; };
struct a a_object;
struct b * bp = (struct b *) &a_object; // bp takes an unspecified value
// Following would invoke UB, because bp may be an invalid pointer:
// int m = b->member;
// But what if we can ascertain that bp points at the original object?:
if (bp == &a_object) {
// The comparison in the line above actually violates constraints
// in 6.5.9p2, but it is accepted by many compilers.
int m = b->member; // UB if executed, due to 6.5p7.
}
Lets for the sake of discussion ignore that the C standard formally says this is undefined behavior. Because undefined behavior simply means that something is beyond the scope of the language standard: anything can happen and the C standard makes no guarantees. There may however be "external" guarantees on the particular system you are using, made by those who made the system.
And in the real world where there is hardware, there are indeed such guarantees. There are just two things that can go wrong here in practice:
TYPE* having a different representation or size than void*.
Different struct padding in each struct type because of alignment requirements.
Both of these seem unlikely and can be dodged with a static asserts:
static void ct_assert (void) // dummy function never linked or called by anyone
{
struct vector v1;
struct _vector_generic v2;
static_assert(sizeof(v1.items) == sizeof(v2.items),
"Err: unexpected pointer format.");
static_assert(sizeof(v1) == sizeof(v2),
"Err: unexpected padding.");
}
Now the only thing left that could go wrong is if a "pointer to x" has same size but different representation compared to "pointer to y" on your specific system. I have never heard of such a system anywhere in the real world. But of course, there are no guarantees: such obscure, unorthodox systems may exist. In that case, it is up to you whether you want to support them, or if it will suffice to just have portability to 99.99% of all existing computers in the world.
In practice, the only time you have more than one pointer format on a system is when you are addressing memory beyond the CPU's standard address width, which is typically handled by non-standard extensions such as far pointers. In all such cases, the pointers will have different sizes and you will detect such cases with static assert above.

Pointer difference across members of a struct?

The C99 standard states that:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object
Consider the following code:
struct test {
int x[5];
char something;
short y[5];
};
...
struct test s = { ... };
char *p = (char *) s.x;
char *q = (char *) s.y;
printf("%td\n", q - p);
This obviously breaks the above rule, since the p and q pointers are pointing to different "array objects", and, according to the rule, the q - p difference is undefined.
But in practice, why should such a thing ever result in undefined behaviour? After all, the struct members are laid out sequentially (just as array elements are), with any potential padding between the members. True, the amount of padding will vary across implementations and that would affect the outcome of the calculations, but why should that result be "undefined"?
My question is, can we suppose that the standard is just "ignorant" of this issue, or is there a good reason for not broadening this rule? Couldn't the above rule be rephrased to "both shall point to elements of the same array object or members of the same struct"?
My only suspicion are segmented memory architectures where the members might end up in different segments. Is that the case?
I also suspect that this is the reason why GCC defines its own __builtin_offsetof, in order to have a "standards compliant" definition of the offsetof macro.
EDIT:
As already pointed out, arithmetic on void pointers is not allowed by the standard. It is a GNU extension that fires a warning only when GCC is passed -std=c99 -pedantic. I'm replacing the void * pointers with char * pointers.
Subtraction and relational operators (on type char*) between addresses of member of the same struct are well defined.
Any object can be treated as an array of unsigned char.
Quoting N1570 6.2.6.1 paragraph 4:
Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type
unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value.
...
My only suspicion are segmented memory architectures where the members
might end up in different segments. Is that the case?
No. For a system with a segmented memory architecture, normally the compiler will impose a restriction that each object must fit into a single segment. Or it can permit objects that occupy multiple segments, but it still has to ensure that pointer arithmetic and comparisons work correctly.
Pointer arithmetic requires that the two pointers being added or subtracted to be part of the same object because it doesn't make sense otherwise.
The quoted section of standard specifically refers to two unrelated objects such as int a[b]; and int b[5]. The pointer arithmetic requires to know the type of the object that the pointers pointing to (I am sure you are aware of this already).
i.e.
int a[5];
int *p = &a[1]+1;
Here p is calculated by knowing the that the &a[1] refers to an int object and hence incremented to 4 bytes (assuming sizeof(int) is 4).
Coming to the struct example, I don't think it can possibly be defined in a way to make pointer arithmetic between struct members legal.
Let's take the example,
struct test {
int x[5];
char something;
short y[5];
};
Pointer arithmatic is not allowed with void pointers by C standard (Compiling with gcc -Wall -pedantic test.c would catch that). I think you are using gcc which assumes void* is similar to char* and allows it.
So,
printf("%zu\n", q - p);
is equivalent to
printf("%zu", (char*)q - (char*)p);
as pointer arithmetic is well defined if the pointers point to within the same object and are character pointers (char* or unsigned char*).
Using correct types, it would be:
struct test s = { ... };
int *p = s.x;
short *q = s.y;
printf("%td\n", q - p);
Now, how can q-p be performed? based on sizeof(int) or sizeof(short) ? How can the size of char something; that's in the middle of these two arrays be calculated?
That should explain it's not possible to perform pointer arithmetic on objects of different types.
Even if all members are of same type (thus no type issue as stated above), then it's better to use the standard macro offsetof (from <stddef.h>) to get the difference between struct members which has the similar effect as pointer arithmetic between members:
printf("%zu\n", offsetof(struct test, y) - offsetof(struct test, x));
So I see no necessity to define pointer arithmetic between struct members by the C standard.
Yes, you are allowed to perform pointer arithmetric on structure bytes:
N1570 - 6.3.2.3 Pointers p7:
... When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
This means that for the programmer, bytes of the stucture shall be seen as a continuous area, regardless how it may have been implemented in the hardware.
Not with void* pointers though, that is non-standard compiler extension. As mentioned on paragraph from the standard, it applies only to character type pointers.
Edit:
As mafso pointed out in comments, above is only true as long as type of substraction result ptrdiff_t, has enough range for the result. Since range of size_t can be larger than ptrdiff_t, and if structure is big enough, it's possible that addresses are too far apart.
Because of this it's preferable to use offsetof macro on structure members and calculate result from those.
I believe the answer to this question is simpler than it appears, the OP asks:
but why should that result be "undefined"?
Well, let's see that the definition of undefined behavior is in the draft C99 standard section 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
it is simply behavior for which the standard does not impose a requirement, which perfectly fits this situation, the results are going to vary depending on the architecture and attempting to specify the results would have probably been difficult if not impossible in a portable manner. This leaves the question, why would they choose undefined behavior as opposed to let's say implementation of unspecified behavior?
Most likely it was made undefined behavior to limit the number of ways an invalid pointer could be created, this is consistent with the fact that we are provided with offsetof to remove the one potential need for pointer subtraction of unrelated objects.
Although the standard does not really define the term invalid pointer, we get a good description in Rationale for International Standard—Programming Languages—C which in section 6.3.2.3 Pointers says (emphasis mine):
Implicit in the Standard is the notion of invalid pointers. In
discussing pointers, the Standard typically refers to “a pointer to an
object” or “a pointer to a function” or “a null pointer.” A special
case in address arithmetic allows for a pointer to just past the end
of an array. Any other pointer is invalid.
The C99 rationale further adds:
Regardless how an invalid pointer is created, any use of it yields
undefined behavior. Even assignment, comparison with a null pointer
constant, or comparison with itself, might on some systems result in
an exception.
This strongly suggests to us that a pointer to padding would be an invalid pointer, although it is difficult to prove that padding is not an object, the definition of object says:
region of data storage in the execution environment, the contents of
which can represent values
and notes:
When referenced, an object may be interpreted as having a particular
type; see 6.3.2.1.
I don't see how we can reason about the type or the value of padding between elements of a struct and therefore they are not objects or at least is strongly indicates padding is not meant to be considered an object.
I should point out the following:
from the C99 standard, section 6.7.2.1:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
It isn't so much that the result of pointer subtraction between members is undefined so much as it is unreliable (i.e. not guaranteed to be the same between different instances of the same struct type when the same arithmetic is applied).

Is const-casting via a union undefined behaviour?

Unlike C++, C has no notion of a const_cast. That is, there is no valid way to convert a const-qualified pointer to an unqualified pointer:
void const * p;
void * q = p; // not good
First off: Is this cast actually undefined behaviour?
In any event, GCC warns about this. To make "clean" code that requires a const-cast (i.e. where I can guarantee that I won't mutate the contents, but all I have is a mutable pointer), I have seen the following "conversion" trick:
typedef union constcaster_
{
void * mp;
void const * cp;
} constcaster;
Usage: u.cp = p; q = u.mp;.
What are the C language rules on casting away constness through such a union? My knowledge of C is only very patchy, but I've heard that C is far more lenient about union access than C++, so while I have a bad feeling about this construction, I would like an argument from the standard (C99 I suppose, though if this has changed in C11 it'll be good to know).
It's implementation defined, see C99 6.5.2.3/5:
if the value of a member of a union object is used when the most
recent store to the object was to a different member, the behavior is
implementation-defined.
Update: #AaronMcDaid commented that this might be well-defined after all.
The standard specified the following 6.2.5/27:
Similarly, pointers to qualified or unqualified versions of compatible
types shall have the same representation and alignment
requirements.27)
27) The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions.
And (6.7.2.1/14):
A pointer to a union object, suitably converted, points to each of its
members (or if a member is a bitfield, then to the unit in which it
resides), and vice versa.
One might conclude that, in this particular case, there is only room for exactly one way to access the elements in the union.
My understanding it that the UB can arise only if you try to modify a const-declared object.
So the following code is not UB:
int x = 0;
const int *cp = &x;
int *p = (int*)cp;
*p = 1; /* OK: x is not a const object */
But this is UB:
const int cx = 0;
const int *cp = &cx;
int *p = (int*)cp;
*p = 1; /* UB: cx is const */
The use of a union instead of a cast should not make any difference here.
From the C99 specs (6.7.3 Type qualifiers):
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined.
The initialization certainly won't cause UB. The conversion between qualified pointer types is explicitly allowed in §6.3.2.3/2 (n1570 (C11)). It's the use of content in that pointer afterwards that cause UB (see #rodrigo's answer).
However, you need an explicit cast to convert a void* to a const void*, because the constraint of simple assignment still require all qualifier on the LHS appear on the RHS.
§6.7.9/11: ... The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.
§6.5.16.1/1: (Simple Assignment / Contraints)
... both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right;
... one operand is a pointer
to an object type, and the other is a pointer to a qualified or unqualified version of
void, and the type pointed to by the left has all the qualifiers of the type pointed to
by the right;
I don't know why gcc just gives a warning though.
And for the union trick, yes it's not UB, but still the result is probably unspecified.
§6.5.2.3/3 fn 95: If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
§6.2.6.1/7: When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values. (* Note: see also §6.5.2.3/6 for an exception, but it doesn't apply here)
The corresponding sections in n1124 (C99) are
C11 §6.3.2.3/2 = C99 §6.3.2.3/2
C11 §6.7.9/11 = C99 §6.7.8/11
C11 §6.5.16.1/1 = C99 §6.5.16.1/1
C11 §6.5.2.3/3 fn 95 = missing ("type punning" doesn't appear in C99)
C11 §6.2.6.1/7 = C99 §6.2.6.1/7
Don't cast it at all. It's a pointer to const which means that attempting to modify the data is not allowed and in many implementations will cause the program to crash if the pointer points to unmodifiable memory. Even if you know the memmory can be modified, there may be other pointers to it that do not expect it to change e.g. if it is part of the storage of a logically immutable string.
The warning is there for good reason.
If you need to modify the content of a const pointer, the portable safe way to do it is first to copy the memory it points to and then modify that.

Resources