Can we use va_arg with unions? - c

6.7.2.1 paragraph 14 of my draft of the C99 standard has this to say about unions and pointers (emphasis, as always, added):
The size of a union is sufficient to contain the largest of its members. The value of at
most one of the members can be stored in a union object at any time. A pointer to a
union object, suitably converted, points to each of its members (or if a member is a bit-
field, then to the unit in which it resides), and vice versa.
All well and good, that means that it is legal to do something like the following to copy either a signed or unsigned int into a union, assuming we only want to copy it out into data of the same type:
union ints { int i; unsigned u; };
int i = 4;
union ints is = *(union ints *)&i;
int j = is.i; // legal
unsigned k = is.u; // not so much
7.15.1.1 paragraph 2 has this to say:
The va_arg macro expands to an expression that has the specified type and the value of
the next argument in the call. The parameter ap shall have been initialized by the
va_start or va_copy macro (without an intervening invocation of the va_end macro for the same ap). Each invocation of the va_arg macro modifies ap so that the values of successive arguments are returned in turn. The parameter type shall be a type name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type. If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
—one type is a signed integer type, the other type is the corresponding unsigned integer
type, and the value is representable in both types;
—one type is pointer to void and the other is a pointer to a character type.
I'm not going to go and cite the part about default argument promotions. My question is: is this defined behavior:
void func(int i, ...)
{
va_list arg;
va_start(arg, i);
union ints is = va_arg(arg, union ints);
va_end(arg);
}
int main(void)
{
func(0, 1);
return 0;
}
If so, it would appear to be a neat trick to overcome the "and the value is representable in both types" requirement of signed/unsigned integer conversion (albeit in a way that's rather difficult to do anything with legally). If not, it would appear to be safe to just use unsigned in this case, but what if there were more elements in the union with more incompatible types? If we can guarantee that we won't access the union by element (i.e. we just copy it into another union or storage space that we're treating like a union) and that all elements of the union are the same size, is this allowed with varargs? Or would it only be allowed with pointers?
In practice I expect this code will almost never fail, but I want to know if it's defined behavior. My current guess is that it appears not to be defined, but that seems incredibly dumb.

You have a couple things off.
A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit- field, then to the unit in which it resides), and vice versa.
This does not mean that the types are compatible. In fact, they are not compatible. So the following code is wrong:
func(0, 1); // undefined behavior
If you want to pass a union,
func(0, (union ints){ .u = BLAH });
You can check by writing the code,
union ints x;
x = 1;
GCC gives an "error: incompatible types in assignment" message when compiling.
However, most implementations will "probably" do the right thing in both cases. There are some other problems...
union ints {
int i;
unsigned u;
};
int i = 4;
union ints is = *(union ints *)&i; // Invalid
int j = is.i; // legal
unsigned k = is.u; // also legal (see note)
The behavior when you dereference a pointer through a type other than the object's actual type, as in *(union ints *)&i, is sometimes undefined (looking up the reference, but I'm pretty sure about this). However, in C99 it is permitted to access a union member other than the most recently stored union member (or is it C1x?), but the value is implementation-defined and may be a trap representation.
About type punning through unions: As Pascal Cuoq notes, it's actually TC3 that defines the behavior of accessing a union element other than the most recently stored one. TC3 is the third update to C99. The good news is that this part of TC3 is really codifying existing practice — so think of it as a de facto part of C prior to TC3.

Since the standard says:
The parameter type shall be a type name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type.
For union ints, that condition is satisfied: union ints * is a perfectly good representation of a pointer to a union ints, so there is nothing in that sentence to prevent it being used to collect a value pushed onto the stack as a union.
If you cheat and try to pass a plain int or unsigned int in place of a union, then you would be invoking undefined behaviour. Thus, you could use:
union ints u1 = ...;
func(0, (union ints) { .i = 0 });
func(1, (union ints) { .u = UINT_MAX });
func(2, u1);
You could not use:
func(1, 0);
The arguments are not union types.

I don't see why you think that code should never fail in practice. It would fail on any implementation where integer types are passed by register but aggregate types (even when small) are passed on the stack, and I see nothing in the standard that forbids such implementations. A union containing an int is not a type compatible with int, even if their sizes are the same.
Back to your first code fragment, it has a problem too:
union ints is = *(union ints *)&i;
This is an aliasing violation and invokes undefined behavior. You could avoid it by using memcpy, and I suppose then it would be legal.
I'm also a bit confused about your comment here:
unsigned k = is.u; // not so much
Since the value 4 is represented in both the signed and unsigned types, this should be legal, unless it's specifically forbidden as a special case.
If this doesn't answer your question, perhaps you could elaborate more on what (albeit theoretical) problem you're trying to solve.

Related

Why void pointer if pointers can be casted into any type(in c)?

I want to understand the real need for a void pointer. For example, in the following code I use casting to be able to use the same pointer in different ways, so why is there really a void pointer if anything can be cast?
int main()
{
int x = 0xAABBCCDD;
int * y = &x;
short * c = (short *)y;
char * d = (char*)y;
*c = 0;
printf("x is %x\n",x);//aabb0000
d +=2;
*d = 0;
printf("x is %x\n",x);//aa000000
return 0;
}
Converting any pointer type to any other pointer type is not supported by base C (that is, C without any extensions or behavior not required by the C standard). The 2018 C standard says in clause 6.3.2.3, paragraph 7:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer…
In that passage, we see two limitations:
If the pointer is not properly aligned, the conversion may fail in various ways. In your example, converting an int * to a short * is unlikely to fail since int typically has stricter alignment than short. However, the reverse conversion is not supported by base C. Say you define an array with short x[20]; or char x[20];. Then the array will be aligned as needed for a short or char, but not necessarily as needed for an int, in which case the behavior of (int *) x would not be defined by the C standard.
The value that results from the conversion is mostly unspecified. This passage only guarantees that converting it back yields the original pointer (or something equivalent). It does not guarantee you can do anything useful with the pointer without converting it back; you cannot necessarily use a pointer converted from int * to access a short.
The standard does make some additional guarantees about certain pointer conversions. One of them is in the continuation of the passage above:
… When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
So you can use a pointer converted from int * to access the individual bytes that represent an int, and you can do the same to access the bytes of any other object type. But that guarantee is made only for accessing the individual bytes with a character type, not with a short type.
From the above, we know that after the short * c = (short *)y; in your example, c does not necessarily point to any part of the x it originated from; the value resulting from the pointer conversion is not guaranteed to work as a short * at all. But, even if it does point to the place where x is, base C does not support using c to access those bytes, because 6.5 7 says:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
So the *c = 0; in your example is not supported by C for two reasons: c does not necessarily point to any part of x or to any valid address, and, even if it does, the behavior of modifying part of the int x using short type is not defined by the C standard. It might appear to work in your C implementation, and it might even be supported by your C implementation, but it is not strictly conforming C code.
The C standard provides the void * type for use when a specific type is inadequate. 6.3.2.3 1 makes a similar guarantee for pointers to void as it does for pointers to objects:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
void * is used with routines that must work with arbitrary object types, such as qsort. char * could serve this purpose, but it is better to have a separate type that clearly denotes no specific type is associated with it. For example, if the parameter to a function were char *p, the function could inadvertently use *p and get a character that it does not want. If the parameter is void *p, then the function must convert the pointer to a specific type before using it to access an object. Thus having a special type for “generic pointers” can help avoid errors as well as indicate intent to people reading the code.
Why void pointer if pointers can be casted into any type(in c)?
C does not specify that void* can be cast into a pointer of any type. A void * may be cast into a pointer to any object type. IOWs, a void * may be insufficient to completely store a function pointer.
need of having a void pointer
A void * is a universal pointer for object types. Setting aside pointers to const, volatile, etc. concerns, functions like malloc(), memset() provide universal ways to allocate and move/set data.
In more novel architectures, an int * and a void * and others have different sizes and interpretations. void * is the common pointer type for objects, complete enough to store information to reconstitute the original pointer, regardless of the object type pointed to.

Can a type which is a union member alias that union?

Prompted by this question:
The C11 standard states that a pointer to a union can be converted to a pointer to each of its members. From Section 6.7.2.1p17:
The size of a union is sufficient to contain the largest of
its members. The value of at most one of the members can be
stored in a union object at any time. A pointer to a union
object, suitably converted, points to each of its members (or
if a member is a bit-field, then to the unit in which it
resides), and vice versa.
This implies you can do the following:
union u {
int a;
double b;
};
union u myunion;
int *i = (int *)&myunion;
double *d = (double *)&myunion;
myunion.a = 2;
printf("*i=%d\n", *i);
myunion.b = 3.5;
printf("*d=%f\n", *d);
But what about the reverse: in case of the above union, can an int * or double * be safely converted to a union u *? Consider the following code:
#include <stdio.h>
union u {
int a;
double b;
};
void f(int isint, union u *p)
{
if (isint) {
printf("int value=%d\n", p->a);
} else {
printf("double value=%f\n", p->b);
}
}
int main()
{
int a = 3;
double b = 8.25;
f(1, (union u *)&a);
f(0, (union u *)&b);
return 0;
}
In this example, pointers to int and double, both of which are members of union u, are passed to a function where a union u * is expected. A flag is passed to the function to tell it which "member" to access.
Assuming, as in this case, that the member accessed matches the type of the object that was actually passed in, is the above code legal?
I compiled this on gcc 6.3.0 with both -O0 and -O3 and both gave the expected output:
int value=3
double value=8.250000
In this example, pointers to int and double, both of which are members
of union u, are passed to a function where a union u * is expected. A
flag is passed to the function to tell it which "member" to access.
Assuming, as in this case, that the member accessed matches the type
of the object that was actually passed in, is the above code legal?
You seem to be focusing your analysis with respect to the strict aliasing rule on the types of the union members. However, given
union a_union {
int member;
// ...
} my_union, *my_union_pointer;
, I would be inclined to argue that expressions of the form my_union.member and my_union_pointer->member express accessing the stored value of an object of type union a_union in addition to accessing an object of the member's type. Thus, if my_union_pointer does not actually point to an object whose effective type is union a_union then there is indeed a violation of the strict aliasing rule -- with respect to type union a_union -- and the behavior is therefore undefined.
The Standard gives no general permission to access a struct or union object using an lvalue of member type, nor--so far as I can tell--does it give any specific permission to perform such access unless the member happens to be of character type. Nor does it define any means by which the act of casting an int* into a union u* can create one which did not already exist. Instead, the creation of any storage that will ever be accessed as a union u implies the simultaneous creation of a union u object within that storage.
Instead, the Standard (references quoted from the C11 draft N1570) relies upon implementations to apply the footnote 88 (The intent of this list is to specify those circumstances in which an object may or may not be aliased.) and recognize that the "strict aliasing rule" (6.5p7) should only be applied when an object is referenced both via an lvalue of its own type and a seemingly-unrelated lvalue of another type during some particular execution of a function or loop [i.e. when the object aliases some other lvalue].
The question of when two lvalues may be viewed as "seemingly unrelated", and when an implementation should be expected to recognize a relationship between them, is a Quality of Implementation issue. Clang and gcc seem to recognize that lvalues with forms unionPtr->value and unionPtr->value[index] are related to *unionPtr, but seem unable to recognize that pointers to such lvalues have any relationship to unionPtr. They will thus recognize that both unionPtr->array1[i] and unionPtr->array2[j] access *unionPtr (since array subscripting via [] seems to be treated differently from array-to-pointer decay), but will not recognize that *(unionPtr->array1+i) and *(unionPtr->array2+j) do likewise.
Addendum--standard reference:
Given
union foo {int x;} foo,bar;
void test(void)
{
foo=bar; // 1
foo.x = 2; // 2
bar=foo; // 3
}
The Standard would describe the type of foo.x as int. If the second statement didn't access the stored value of foo, then the third statement would have no effect. Thus, the second statement accesses the stored value of an object of type union foo using an lvalue of type int. Looking at N1570 6.5p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:(footnote 88)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
Footnote 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Note that there is no permission given above to access an object of type union foo using an lvalue of type int. Because the above is a constraint, any violation thereof invokes UB even if the behavior of the construct would otherwise be defined by the Standard.
Regarding strict aliasing, there is not an issue going from pointer-to-type (for example &a), to pointer-to-union containing that type. It is one of the exceptions to the strict aliasing rule, C17 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object, /--/
- an aggregate or union type that includes one of the aforementioned types among its
members
So this is fine as far as strict aliasing goes, as long as the union contains an int/double. And the pointer conversion in itself is well-defined too.
The problem comes when you try to access the contents, for example the contents of an int as a larger double. This is probably UB for multiple reasons - I can think of at least C17 6.3.2.3/7:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned 69) for the referenced type, the behavior is undefined.
Where the non-normative footnote provides more information:
69) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B,
which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
No. It's not formally correct.
In C you can do whatever, and it could work, but constructs like this are bombs. Any future modification could lead to a big failure.
The union reserves memory space to hold the largest of its elements:
The size of a union is sufficient to contain the largest of its
members.
In the reverse direction, the space may not be enough.
Consider:
union myunion
{
char a;
int b;
double c;
};
char c;
((union myunion *)&c)->b = 0;
This will corrupt memory.
The meaning of the standard definition:
The value of at most one of the members can be stored in a union
object at any time. A pointer to a union object, suitably converted,
points to each of its members (or if a member is a bit-field, then to
the unit in which it resides), and vice versa.
is to enforce the point that each union member starts at the union's start address and, implicitly, to state that the compiler shall align unions on a boundary suitable for each of their members, that is, choose an alignment correct for every member. Because standard alignments are normally powers of 2, as a rule of thumb the union will be aligned on the boundary that fits the member requiring the largest alignment.

Accessing bytes in a long long variable with pointers

I'm supposed to create a variable
long long hex = 0x1a1b2a2b3a3b4a4bULL;
and then define 4 pointers that point to 1a1b, 2a2b, 3a3b and 4a4b. I'm then printing the addresses and values of those double bytes.
My approach was to create a pointer
long long *ptr1 = &hex;
and then use pointer arithmetic to get to the next value. What I realized was that incrementing this pointer would increment it by long long bytes and not by 2 bytes like I need it to. Creating a short pointer
short *ptr1 = &hex;
Is what I would need but my compiler won't let me since the data types are incompatible. How do I get around that? Is there a way to create a pointer that increments by 2 bytes and assign that to a variable of a larger data type?
You can access any variable only through compatible types.
However, a char pointer can be used to access any type of variable.
Please do not cast it to a short * (see the NOTE below); they are not compatible types. You can only use a char * for conforming code.
Quoting C11, chapter §6.3.2.3
[...] When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
So, the way out is, use a char * and use pointer arithmetic to get to the required address.
NOTE: Since all other answers suggest a blatantly wrong method (casting the pointer to short *, which explicitly violates strict aliasing), let me expand a bit on my answer and supporting quotes.
Quoting C11, chapter §6.5/P7
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types: 88)
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
In this case, short and long long are not compatible types, so the only way out is to use a pointer to char type.
Cut-'n-Paste from Question body
This was added as update by OP
Edit:
Here's the correct solution that doesn't cause undefined behavior.
Edit 2:
Added the memory address.
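#include <stdio.h>
int main(void) {
long long hex = 0x1a1b2a2b3a3b4a4bULL;
/* unsigned char avoids sign extension when a byte is passed to printf */
unsigned char *ptr = (unsigned char *)&hex;
int i; int j;
/* i indexes the high byte of each pair, j the low byte (little-endian host assumed) */
for (i = 1, j = 0; i < 8 && j < 7; i += 2, j += 2) {
printf("0x%02x%02x at address %p\n", ptr[i], ptr[j], (void *)(ptr + i));
}
return 0;
}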
As expected, it has been pointed out that this is undefined behavior. It's probably one of these stupid "C course" assignments where C isn't completely understood.
Just in case you want to avoid the UB, you could solve it using a union:
#include <stdio.h>
union longparts
{
unsigned long long whole;
unsigned short parts[4];
};
int main(void)
{
union longparts test;
test.whole = 0x1a1b2a2b3a3b4a4bULL;
for (int i = 0; i < 4; ++i)
{
unsigned short *part = &test.parts[i];
printf("short at addr %p: 0x%hx\n", (void *)part, *part);
}
return 0;
}
from C11 §6.5.2.3, footnote 95:
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
So, you could still run into problems in some cases with trap representations, but at least it's not undefined. The result is implementation defined, e.g. because of endianness of the host machine.
Add a cast:
short *ptr1 = (short*)&hex;
However, make sure you pay attention to the endianness of your platform.
On x86, for instance, data is stored little end first, so
ptr1[0] should point to 0x4a4b
Also pay attention to your platform's actual sizes: long long is at least 64 bits, and short is at least 16 bits. If you want to make sure the types are really those sizes, use uint64_t and uint16_t. You'll get a compiler error if there aren't any types matching those exact sizes available on your system.
Furthermore, take note of alignment. You can use uint64_t as uint16_t[4], but not the other way around, as the address of a uint16_t is usually divisible by 2, and the address of a uint64_t is usually divisible by 8.
Should I worry about the alignment during pointer casting?
You need to cast the pointer to assign it to a different type:
short *ptr1 = (short*)&hex;
However, doing this results in implementation-defined behavior, since you're depending on the endianness of the system.

Are pointers to union required to be aligned for all members

Given
typedef union { unsigned char b; long l; } BYTE_OR_LONG;
would it be legitimate to have a function
unsigned long get_byte_or_long(BYTE_OR_LONG *it)
{
if (it->b)
return it->b;
else
return decode_long(it->l); // Platform-dependent method
// Could return (it), (it>>8), etc.
}
and call it
void test()
{
long l = encode_long(12345678); // Platform-dependent; could return
// (it<<8), (it & 16777215), etc.
char b[2] = {12,34};
BYTE_OR_LONG *bl[3];
bl[0] = (BYTE_OR_LONG*)&l;
bl[1] = (BYTE_OR_LONG*)b;
bl[2] = (BYTE_OR_LONG*)(b+1);
for (int i=0; i<3; i++)
printf("%lu\n", get_byte_or_long(bl[i]));
}
Certainly constructing an unaligned BYTE_OR_LONG *p and then accessing p->l would be Undefined Behavior. Further, even the act of casting an unaligned pointer to (unsigned long*) would be Undefined Behavior, since an implementation might not need as many bits for such a type as for a char*. With a union, however, things seem unclear.
From what I understand, a pointer to a union is supposed to be equivalent to a pointer to any of its elements. Does that mean that implementations are required to guarantee that a pointer to a union type must be capable of identifying any instance of any type contained therein [thus a BYTE_OR_LONG* would have to be able to identify any unsigned char], or are programmers required to only cast to union types pointers which would satisfy every alignment requirement of every constituent therein?
Does that mean that implementations are required to guarantee that a pointer to a union type must be capable of identifying any instance of any type contained therein ... ?
Long question, short answer: Yes.
(I'll dig out the Standard reference later)
Basically it's because a struct/union's 1st element is guaranteed to carry no padding before it.
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned [...] for the referenced type, the behavior is undefined. [C11 (n1570) 6.3.2.3 p7]
I couldn't find any explicit guarantees about the alignment requirements of unions, so the conversion to the union pointer appears not strictly conforming. On my machine, _Alignof(char) is 1, but _Alignof(BYTE_OR_LONG) is 4.
Does that mean that implementations are required to guarantee that a pointer to a union type must be capable of identifying any instance of any type contained therein [thus a BYTE_OR_LONG* would have to be able to identify any unsigned char], or are programmers required to only cast to union types pointers which would satisfy every alignment requirement of every constituent therein?
No, a pointer to T may be made to point to any union containing a T, but not necessarily the other way round. As far as I know, the alignment requirements of a union could even be stricter than those of all of its members.

Is const-casting via a union undefined behaviour?

Unlike C++, C has no notion of a const_cast. That is, there is no valid way to convert a const-qualified pointer to an unqualified pointer:
void const * p;
void * q = p; // not good
First off: Is this cast actually undefined behaviour?
In any event, GCC warns about this. To make "clean" code that requires a const-cast (i.e. where I can guarantee that I won't mutate the contents, but all I have is a mutable pointer), I have seen the following "conversion" trick:
typedef union constcaster_
{
void * mp;
void const * cp;
} constcaster;
Usage: u.cp = p; q = u.mp;.
What are the C language rules on casting away constness through such a union? My knowledge of C is only very patchy, but I've heard that C is far more lenient about union access than C++, so while I have a bad feeling about this construction, I would like an argument from the standard (C99 I suppose, though if this has changed in C11 it'll be good to know).
It's implementation defined, see C99 6.5.2.3/5:
if the value of a member of a union object is used when the most
recent store to the object was to a different member, the behavior is
implementation-defined.
Update: @AaronMcDaid commented that this might be well-defined after all.
The standard specifies the following in 6.2.5/27:
Similarly, pointers to qualified or unqualified versions of compatible
types shall have the same representation and alignment
requirements.27)
27) The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions.
And (6.7.2.1/14):
A pointer to a union object, suitably converted, points to each of its
members (or if a member is a bitfield, then to the unit in which it
resides), and vice versa.
One might conclude that, in this particular case, there is only room for exactly one way to access the elements in the union.
My understanding is that the UB can arise only if you try to modify a const-declared object.
So the following code is not UB:
int x = 0;
const int *cp = &x;
int *p = (int*)cp;
*p = 1; /* OK: x is not a const object */
But this is UB:
const int cx = 0;
const int *cp = &cx;
int *p = (int*)cp;
*p = 1; /* UB: cx is const */
The use of a union instead of a cast should not make any difference here.
From the C99 specs (6.7.3 Type qualifiers):
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined.
The initialization certainly won't cause UB. The conversion between qualified pointer types is explicitly allowed in §6.3.2.3/2 (n1570 (C11)). It's the use of content in that pointer afterwards that causes UB (see @rodrigo's answer).
However, you need an explicit cast to convert a const void * to a void *, because the constraints of simple assignment still require that the type pointed to on the LHS has all the qualifiers of the type pointed to on the RHS.
§6.7.9/11: ... The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.
§6.5.16.1/1: (Simple Assignment / Constraints)
... both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right;
... one operand is a pointer
to an object type, and the other is a pointer to a qualified or unqualified version of
void, and the type pointed to by the left has all the qualifiers of the type pointed to
by the right;
I don't know why gcc just gives a warning though.
And for the union trick, yes it's not UB, but still the result is probably unspecified.
§6.5.2.3/3 fn 95: If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
§6.2.6.1/7: When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values. (* Note: see also §6.5.2.3/6 for an exception, but it doesn't apply here)
The corresponding sections in n1124 (C99) are
C11 §6.3.2.3/2 = C99 §6.3.2.3/2
C11 §6.7.9/11 = C99 §6.7.8/11
C11 §6.5.16.1/1 = C99 §6.5.16.1/1
C11 §6.5.2.3/3 fn 95 = missing ("type punning" doesn't appear in C99)
C11 §6.2.6.1/7 = C99 §6.2.6.1/7
Don't cast it at all. It's a pointer to const, which means that attempting to modify the data is not allowed and in many implementations will cause the program to crash if the pointer points to unmodifiable memory. Even if you know the memory can be modified, there may be other pointers to it that do not expect it to change, e.g. if it is part of the storage of a logically immutable string.
The warning is there for good reason.
If you need to modify the content of a const pointer, the portable safe way to do it is first to copy the memory it points to and then modify that.
