Implicit Pointer-To-Const-T conversion - c

I was wondering if the implicit conversion to a pointer to a const data type is somewhere defined in the C11 standard:
T x;
const T *p = &x;
A pointer to an object of type T is implicitly converted into a pointer to an object of type const T. Is this implicit conversion somewhere defined in the C11 standard? (I know that it makes sense to allow this and how useful it is. I'm just curious to know where it is defined in the standard)
Furthermore, is an implicit conversion from type T** to const T** forbidden according to C11?
T *p;
const T **pp = &p;
This is a well known problematic part and therefore GCC and LLVM/clang raise a warning. Still I'm wondering if this is allowed according to the C11 standard or not. I only found in §6.5.16.1P6 a comment that this should be a constraint violation. However, I do not see which constraint should be violated. Again I know that this should be prohibited and that this implicit conversion can lead to subtle problems. I'm just curious to know if this is (un)defined behaviour according to C11.
Again, my two questions are not about if this is good or not (which is answered multiple times here) but how/where the C11 standard defines this.
Just for the sake of completeness here is a link to why the second example is problematic: http://c-faq.com/ansi/constmismatch.html

A pointer to an object of type T is implicitly converted into a
pointer to an object of type const-T. Is this implicit conversion
somewhere defined in the C11 standard?
Yes. This implicit conversion is mandated by the standard.
Paragraph 3 of section 6.5.4 Cast Operators says that
Conversions that involve pointers, other than where permitted by the
constraints of 6.5.16.1, shall be specified by means of an explicit
cast.
and the referenced 6.5.16.1 under point 3 says:
the left operand has atomic, qualified, or unqualified pointer type,
and (considering the type the left operand would have after lvalue
conversion) both operands are pointers to qualified or unqualified
versions of compatible types, and the type pointed to by the left has
all the qualifiers of the type pointed to by the right;
Therefore the implicit conversion for const T *p = &x; holds because you're only adding qualifiers, not removing them.
const T **pp = &p; doesn't fall under this, so you need an explicit cast (C++ would allow const T*const*pp = &p; (the second const is needed) but C still wouldn't.)
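As a minimal sketch of the constraint quoted above (the helper name `sum_two_views` is hypothetical, not from the standard or the question): a qualifier may be added implicitly to the type the pointer points to directly, but not one level deeper.

```c
#include <assert.h>

/* Hypothetical helper: reads the same int through two differently
   qualified pointers, using only conversions that need no cast. */
static int sum_two_views(int x)
{
    int *p = &x;

    const int *cp = p;     /* OK: pointed-to type gains const (int -> const int) */
    int *const *pcp = &p;  /* OK: pointed-to type gains const (int * -> int *const) */

    /* const int **bad = &p;
       would not satisfy 6.5.16.1: the pointed-to types int * and
       const int * are not qualified versions of compatible types. */

    assert(*pcp == p);     /* both name the same pointer object */
    return *cp + **pcp;
}
```

Under these assumptions, `sum_two_views(21)` reads `x` twice and returns the sum of the two reads.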
The pointer conversion through an explicit cast isn't a problem as far as UB is concerned as long as the alignments match (which for pointers to differently qualified types they will) because
6.3.2.3p7 guarantees that:
A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined.
Otherwise, when converted back again, the result shall compare equal
to the original pointer. When a pointer to an object is converted to a
pointer to a character type, the result points to the lowest addressed
byte of the object. Successive increments of the result, up to the
size of the object, yield pointers to the remaining bytes of the
object.
but you need to be mindful of accesses/dereferences which will be governed by the strict aliasing rule:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
a type compatible with the effective type of the object, a qualified
version of a type compatible with the effective type of the object, a
type that is the signed or unsigned type corresponding to the
effective type of the object, a type that is the signed or unsigned
type corresponding to a qualified version of the effective type of the
object, an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a
member of a subaggregate or contained union), or a character type.
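A few of the permitted access types from that list can be sketched as follows. The function names are hypothetical, and the unsigned read assumes the stored value is non-negative so that both types represent it identically.

```c
#include <assert.h>
#include <stddef.h>

/* Access through a character type: allowed for any object. */
static int all_bytes_zero(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    assert(obj != NULL);
    for (size_t i = 0; i < size; i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Access through a qualified version of the effective type. */
static int read_via_const(int *p)
{
    const int *cp = p;
    return *cp;
}

/* Access through the unsigned type corresponding to the effective type.
   Assumption: *p holds a non-negative value. */
static unsigned read_via_unsigned(int *p)
{
    return *(unsigned *)p;
}
```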

Related

Modify pointer-to-const variable (`const int*`) through a pointer to pointer-to-non-const (`int**`)

Let's consider the next code:
#include <stdlib.h>
int some_api_function_with_multiple_results( int** OUT_result ) {
*OUT_result = malloc( 123 * sizeof(**OUT_result) );
return 42;
}
int main(void) {
const int* numbers; /* don't ask me why it's 'const' please, I'm just curious! */
int some_result = some_api_function_with_multiple_results( (int**)&numbers );
free( (void*)numbers );
return EXIT_SUCCESS;
}
Notice the const int* pointer (NOT int* const!!) that is passed by reference to the function expecting a pointer to int* (that is, int**) as a variable for an additional return value.
This compiles fine and without any warnings with GCC 9.4.0 in C90 mode (-std=c90). And the C90 standard (ansi-iso-9899-1990-1.pdf) states the following:
6.1.2.5 Types
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements.16 Pointers to other types need not have the same representation or alignment requirements.
16 The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
It's worth noting that C89 draft (http://port70.net/~nsz/c/c89/c89-draft.html#3.1.2.5) imposes a much more restrictive constraint. I wonder if this is the difference between the draft standard and its final version, or is it between C89 and C90? (update: I found a scanned copy of C89 called fipspub160.pdf and it has the same wording as in C90, so it appears to be a draft inaccuracy)
3.1.2.5 Types
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Other pointer types need not have the same representation or alignment requirements.
It's also well-known that a value of type T* can be assigned to a variable (or passed as a function parameter) of type const T* just fine and without any explicit coercion such as typecasting:
3.3.16.1 Simple assignment
One of the following shall hold:42
both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
42 The asymmetric appearance of these constraints with respect to type qualifiers is due to the conversion (specified in 3.2.2.1) that changes lvalues to "the value of the expression" which removes any type qualifiers from the top type of the expression.
3.3.2.2 Function calls
If the expression that denotes the called function has a type that includes a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters.
The actual thing that is bothering me is that const int* and int* types are not compatible according to both the C89 and C90 standards (and C99 as well, so this was not an erratum that got fixed by either of the intermediate Technical Corrigenda or by the numerous Defect Reports of the standardization committee). So I'm not sure whether the following statements give my code undefined behaviour:
3.5.4.1 Pointer declarators
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
3.5.3 Type qualifiers
For two qualified types to be compatible, both shall have the identically qualified version of a compatible type; the order of type qualifiers within a list of specifiers or qualifiers does not affect the specified type.
3.1.2.6 Compatible type and composite type
All declarations that refer to the same object or function shall have compatible type; otherwise the behaviour is undefined.
The question is, does it actually take place? Or is it still valid code?
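For comparison, a variant that sidesteps the question entirely is sketched below: the API writes into a plain int * temporary, and the result is then assigned to the const int * with the ordinary implicit conversion. The wrapper name `get_numbers` is hypothetical; the API function is a stand-in with the same shape as the one in the question.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in with the same shape as the API function in the question. */
static int some_api_function_with_multiple_results(int **OUT_result)
{
    *OUT_result = malloc(123 * sizeof **OUT_result);
    return 42;
}

/* Hypothetical wrapper: no (int **) cast of a const int ** is needed,
   because the pointer object the API writes to really is an int *. */
static int get_numbers(const int **out)
{
    int *tmp = NULL;
    int status;

    assert(out != NULL);
    status = some_api_function_with_multiple_results(&tmp);
    *out = tmp;  /* int * -> const int *: implicit, only adds a qualifier */
    return status;
}
```

The caller still releases the memory with free( (void*)numbers ), as in the original code.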

What rules are there for qualifiers of effective type?

So I was re-reading C17 6.5/6 - 6.5/7 regarding effective type and strict aliasing, but couldn't figure out how to treat qualifiers. Some things confuse me:
I always assumed that qualifiers aren't really relevant for effective type since the rules speak of lvalue access, meaning lvalue conversion that discards qualifiers. But what if the object is a pointer? Qualifiers to the pointed-at data aren't affected by lvalue conversion.
Q1: What if the effective type is a pointer to qualified-type? Can I lvalue access it as a non-qualified pointer to the same type? Where in the standard is this stated?
The exceptions to the strict aliasing rule mention qualifiers in these cases:
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
None of these address qualifiers of the effective type itself, only by the lvalue used for access. Which should be quite irrelevant, because of lvalue conversion... right?
Q2: Does lvalue conversion happen before or after the above quoted rules of effective type/strict aliasing are applied?
Q3: Does the effective type come with qualifiers or not? Where in the standard is this stated?
"Qualified type" being a defined term, the definition is potentially relevant:
Any type so far mentioned is an unqualified type. Each unqualified type has several qualified versions of its type, corresponding to the combinations of one, two, or all three of the const, volatile, and restrict qualifiers. The qualified or unqualified versions of a type are distinct types that belong to the same type category and have the same representation and alignment requirements. A derived type is not qualified by the qualifiers (if any) of the type from which it is derived.
(C17 6.2.5/26)
I note that the _Atomic keyword is different from the other three categorized as type qualifiers, and I presume that this is related to the fact that atomic types are not required to have the same representation or alignment requirements as their corresponding non-atomic types.
I also note that the specification is explicit that qualified and unqualified versions of a type are different types.
With that background,
Q1: What if the effective type is a pointer to qualified-type? Can I lvalue access it as a non-qualified pointer to the same type? Where in the standard is this stated?
I take you to mean this:
const uint32_t *x = &some_uint32;
uint32_t * y = *(uint32_t **) &x;
The effective type of x is const uint32_t * (an unqualified pointer to const-qualified uint32_t), and it is being accessed via an lvalue of type uint32_t * (an unqualified pointer to unqualified uint32_t). This combination is not among the exceptions allowed by the language spec. In particular, uint32_t * is not a qualified version of a const uint32_t *. The resulting behavior is therefore undefined, as specified in C17 6.5, paragraphs 6 and 7.
Although the standard does not discuss this particular application of the SAR, I take it to be justified indirectly. The issue in cases such as this is not so much about accessing the pointer value itself as about producing a pointer whose type discards qualifiers of the pointed-to type.
Note also that the SAR does allow this variation:
const uint32_t *x = &some_uint32;
const uint32_t * const y = *(const uint32_t * const *) &x;
This is allowed because const uint32_t * const is a qualified version of const uint32_t *.
Q2: Does lvalue conversion happen before or after the above quoted rules of effective type/strict aliasing are applied?
I don't see how lvalue conversion could be construed to apply before strict aliasing. The strict aliasing rule is expressed in terms of the lvalues used for accessing objects, and the result of lvalue conversion is not an lvalue.
Additionally, as @EricPostpischil observed, the SAR applies to all accesses, which include writes. There is no lvalue conversion in the first place for an lvalue that is being written.
Q3: Does the effective type come with qualifiers or not? Where in the standard is this stated?
Qualified and unqualified versions of a type are different types. I see no justification for interpreting the paragraph 6.5/6's "the declared type of the object" or "the type of the lvalue" as if the type were supposed to be considered stripped of its qualifiers, much less as if all qualifiers in the type(s) from which it is derived were stripped. The words "the type" mean what they say.
Q3: Does the effective type come with qualifiers or not? Where in the standard is this stated?
The effective type includes qualifiers (or lack thereof) because the rules about effective type say that a type is used, and types include qualifiers, and the rules about effective type do not say the qualifiers are disregarded.
C 2018 6.5 6 says the effective type of an object for access to its stored value is one of:
“the declared type of the object” (if any),
“the type of the lvalue” previously used to store into it (if that is not a character type),
“the effective type of the object from which the value is copied” (if it was copied by a byte-copy method and the source has an effective type), or
“the type of the lvalue used for the access.”
The third of these is recursive, so it leads to one of the others. The others all say the effective type is some type, and they do not say the effective type is the unqualified version of that type. It simply is that type; the qualifiers are not removed.
Q2: Does lvalue conversion happen before or after the above quoted rules of effective type/strict aliasing are applied?
Lvalue conversion is immaterial. The aliasing rules in C 2018 6.5 7 make no mention of lvalue conversion, and it might not occur at all, since the rules apply to both reading and modifying values. (The rules in 6.5 7 are for when a stored value is “accessed,” and “access” in the C standard means reading or modifying, per 3.1.) When an object is modified, a new value is written into it; there is no lvalue conversion. When an object is read, the aliasing rules apply to that access, and lvalue conversion happens afterward, as a separate thing.
Q1: What if the effective type is a pointer to qualified-type? Can I lvalue access it as a non-qualified pointer to the same type? Where in the standard is this stated?
The phrasing of these sentences does not make sense in this context. I will consider two meanings for them.
First, I take the first sentence as it stands and the second question as “Can I lvalue access it as a pointer to the unqualified version of the effective type?” Although I suspect my second interpretation below is the one that was intended, this one involves less change to the text. The answer is the C standard does not define the behavior because it does not conform to the rule in 6.5 7.
Given const char *p;, p is a pointer to a qualified type. Then, after, char **q = (char **) &p;, *q is a pointer to an unqualified type. Using *q to read or to modify p would not conform to the rule in 6.5 7. When we consider accessing p with *q, then as we see above, the effective type of the object is const char *, the type of the lvalue is char *, and none of the cases in 6.5 7 say a const char * may be accessed as a char *.
Second, I take the sentences as “What if the effective type is a qualified type? Can I lvalue access it as an unqualified version of the same type?” Again, the answer is the C standard does not define the behavior because it does not conform to the rule in 6.5 7.
Given const int p = 3;, p has a qualified type. Then, after int *q = (int *) &p;, *q has the unqualified version of the same type. When we consider accessing p with *q, the effective type of the object is const int, and the type of the lvalue is int, and none of the cases in 6.5 7 say a const int may be accessed as an int.
None of these address qualifiers of the effective type itself, only by the lvalue used for access. Which should be quite irrelevant, because of lvalue conversion... right?
No, the qualifiers of the effective type are relevant. lvalue conversion, if it occurs, does not make them irrelevant. 6.5 7 states requirements for the lvalue type with relation to the effective type, and the qualifiers of each are parts of their types and partake in the rule in 6.5 7.

Can you assign pointers of different types to each others?

Considering
T1 *p1;
T2 *p2;
Can we assign p1 to p2 or vice versa? If so, can it be done without the casting or we must use a cast?
First, let’s consider assignment without casting. C 2018 6.5.16.1 1 lists constraints for simple assignment, saying that one of them must hold. The first two are for arithmetic, structure, and union types. The last two involve null pointer constants or _Bool. The middle two deal with assigning pointers to pointers:
the left operand has atomic, qualified, or unqualified pointer type, and … both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right
the left operand has atomic, qualified, or unqualified pointer type, and … one operand is a pointer to an object type, and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right
The latter says we can assign void * to any object pointer and vice-versa, as long as no qualifiers (const, volatile, restrict, or _Atomic) are removed.
The former says we can assign pointers to compatible types, as long as no qualifiers are removed. What are compatible types?
6.2.7 1 says:
Two types are compatible if they are the same.
Additional rules are in 6.7.2, 6.7.3, and 6.7.6.
Two structure, union, or enumerated types declared in separate translation units are compatible if they are, in essence, declared identically. (Interestingly, two such types declared in the same translation unit are not compatible.)
6.7.2 4 says each enumerated type (an enum type) is compatible with an implementation-defined choice of char or a signed or unsigned integer type. So a pointer to some enum can be assigned to a pointer to one char or integer type (and vice-versa), but you cannot know which one without knowing something about the particular C implementation.
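6.7.3 11 says qualified types must have the same qualifiers to be compatible. Thus an int is not compatible with a const int, and this prevents an int ** from being assigned to a const int ** (the pointed-to types int * and const int * are not compatible), even though an int * may be assigned to a const int *.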
6.7.6.1 2 says for two pointer types to be compatible, they must be identically qualified pointers to compatible types. This tells us, for example, that int * is not compatible with char *, and, therefore, by assignment constraints above, a char ** may not be assigned to an int **.
6.7.6.2 6 says for two array types to be compatible, they must have compatible element types and, if they both have integer constant sizes, they must be the same. (This allows that an array with unknown size may be compatible with an array of known size. However, additional text says that if the arrays ultimately have different sizes, using them in a context that requires them to be compatible has undefined behavior. So an assignment of pointers to such arrays may satisfy its constraints and compile without error, but the resulting program may misbehave.)
6.7.6.3 15 presents somewhat complicated rules for the compatibility of function types. These rules are complicated because functions may be declared with or without parameter lists, with ellipses, and so on. I will omit complete discussion of these.
Those are the rules that tell you what pointer assignments may be made without casts.
6.5.4 discusses casts. Its constraints do not restrict which pointer types may be converted to which other pointer types. (They do prohibit other things involving pointers, such as converting a pointer type to a floating-point type.) So you can specify any pointer conversion you want in a cast, and, as long as the resulting type is compatible with the type to which it is being assigned, no assignment or cast constraint is violated. However, there is still a question about whether the conversion is proper.
6.3.2.3 specifies rules for pointer conversions. Those dealing with conversion from pointers to pointers (excluding integers and null pointer constants) say:
Any pointer to an object type (not to function types) may be converted to a pointer to void and vice-versa. The result of converting an object pointer to a void pointer and back compares equal to the original.
A pointer may be converted to the same type with more qualifiers, and the result compares equal to the original.
A pointer to an object type may be converted to a pointer to a different object type if the resulting pointer is correctly aligned for its type (otherwise, the behavior is undefined). When converted back, the result compares equal to the original pointer. (Note that, while you are allowed to make this conversion, this rule does not say the resulting pointer may be used to access an object of the new type. There are other rules in C about that.)
A pointer to a function type may be converted to a pointer to another function type. When converted back, the result compares equal to the original pointer. (As with objects, you are allowed to make this conversion, but using the resulting pointer to call an incompatible function has undefined behavior.)
So, when casts are used, you may convert any object pointer type to any object pointer type and assign it as long as the alignment requirement is satisfied, and you may convert any function pointer type to any function pointer type and assign it.
When types T1 and T2 are different, assignments between T1 *p1 and T2 *p2 are formally disallowed, unless at least one of T1 and T2 is void and the other is an object (not function) type.
In many cases incompatible assignments will work in practice, especially on machines (such as all popular ones today) with "flat" address spaces and where all pointer types share the same internal representation.
After a "mixed-mode" pointer assignment, however, problems may very well occur when the pointers are dereferenced, due to (1) alignment issues and (2) strict aliasing.
Since "mixed mode" pointer assignments are formally illegal and often a bad idea, most compilers will warn about them. Most compilers allow the warnings to be suppressed with an explicit cast. Most of the time, the cast serves only to suppress the warning; it does not introduce any actual conversion that wouldn't have been performed anyway. (That is, changing p1 = p2 to p1 = (T1 *)p2 is a lot like changing i = f to i = (int)f, where i is an int and f is a float.)
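Addendum: I wrote, "When types T1 and T2 are different," but a more precise statement would be when they are incompatible. For example, an enumerated type is compatible with an implementation-defined integer type, so assignment between pointers to those two types is fine. (Note that char and unsigned char are not compatible with each other, even on implementations where plain char is unsigned.) See Eric Postpischil's longer answer for more details.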

Does copying data byte wise between types break strict aliasing?

Suppose I have 2 types A and B with same size and I have two variables
A a = ... ; // Initialized to some constant of type A
B b;
If I copy the contents of a to b using something like -
assert(sizeof(A) == sizeof(B));
size_t t;
for( t=0; t < sizeof(A); t++){
((char*)&b)[t] = ((char*)&a)[t];
}
Does this break strict aliasing rules of C?
I know casting a pointer to char* and reading it is not UB but I am concerned about both the dereferences involved in the assignment.
If this is not UB, can this be a valid way for type punning?
This code does not violate aliasing rules. From the latest draft (n1570), §6.5 paragraph 7:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type
(emphasis mine)
I am concerned about both the dereferences involved in the assignment.
These dereferences are accessing the stored value using a character type.
Of course, you could still trigger undefined behavior if the representation of your A is not a valid representation for B.
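A sketch of the byte-wise copy applied to a concrete pair of types. The names `copy_bytes`, `float_to_bits`, and `bits_to_float` are hypothetical, and the example assumes float and uint32_t have the same size, which the static assertion checks at compile time. uint32_t has no padding bits, so every bit pattern copied into it is a valid value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies n bytes using character-type lvalues only, as in the question. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
}

/* Both destinations below have a declared type, so the copy does not
   change their effective type. */
static uint32_t float_to_bits(float f)
{
    _Static_assert(sizeof(uint32_t) == sizeof(float), "size mismatch");
    uint32_t bits;
    copy_bytes(&bits, &f, sizeof bits);
    return bits;
}

static float bits_to_float(uint32_t bits)
{
    float f;
    copy_bytes(&f, &bits, sizeof f);
    return f;
}
```

Copying a float's bytes out and back in reproduces the original value, since the object representation is restored unchanged.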
In cases where the destination has a declared type, there is no problem, but in cases where the destination is known only via pointer the Standard is ambiguous. According to the absolutely horribly written 6.5p6, copying data using memcpy or memmove, or "as an array of character type" [whatever that means] will cause the Effective Type of the source to be applied to the destination. The Standard doesn't specify what one must do to copy a sequence of bytes without the operation being regarded as copying an "array of character type".

Is const-casting via a union undefined behaviour?

Unlike C++, C has no notion of a const_cast. That is, there is no valid way to convert a const-qualified pointer to an unqualified pointer:
void const * p;
void * q = p; // not good
First off: Is this cast actually undefined behaviour?
In any event, GCC warns about this. To make "clean" code that requires a const-cast (i.e. where I can guarantee that I won't mutate the contents, but all I have is a mutable pointer), I have seen the following "conversion" trick:
typedef union constcaster_
{
void * mp;
void const * cp;
} constcaster;
Usage: u.cp = p; q = u.mp;.
What are the C language rules on casting away constness through such a union? My knowledge of C is only very patchy, but I've heard that C is far more lenient about union access than C++, so while I have a bad feeling about this construction, I would like an argument from the standard (C99 I suppose, though if this has changed in C11 it'll be good to know).
It's implementation defined, see C99 6.5.2.3/5:
if the value of a member of a union object is used when the most
recent store to the object was to a different member, the behavior is
implementation-defined.
Update: @AaronMcDaid commented that this might be well-defined after all.
The standard specifies the following in 6.2.5/27:
Similarly, pointers to qualified or unqualified versions of compatible
types shall have the same representation and alignment
requirements.27)
27) The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions.
And (6.7.2.1/14):
A pointer to a union object, suitably converted, points to each of its
members (or if a member is a bitfield, then to the unit in which it
resides), and vice versa.
One might conclude that, in this particular case, there is only room for exactly one way to access the elements in the union.
My understanding is that the UB can arise only if you try to modify a const-declared object.
So the following code is not UB:
int x = 0;
const int *cp = &x;
int *p = (int*)cp;
*p = 1; /* OK: x is not a const object */
But this is UB:
const int cx = 0;
const int *cp = &cx;
int *p = (int*)cp;
*p = 1; /* UB: cx is const */
The use of a union instead of a cast should not make any difference here.
From the C99 specs (6.7.3 Type qualifiers):
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined.
The initialization certainly won't cause UB. The conversion between qualified pointer types is explicitly allowed in §6.3.2.3/2 (n1570 (C11)). It's the use of the content behind that pointer afterwards that can cause UB (see @rodrigo's answer).
However, you need an explicit cast to convert a const void* to a void*, because the constraints of simple assignment still require the type pointed to by the LHS to have all the qualifiers of the type pointed to by the RHS.
§6.7.9/11: ... The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.
§6.5.16.1/1: (Simple Assignment / Constraints)
... both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right;
... one operand is a pointer
to an object type, and the other is a pointer to a qualified or unqualified version of
void, and the type pointed to by the left has all the qualifiers of the type pointed to
by the right;
I don't know why gcc just gives a warning though.
And for the union trick, yes it's not UB, but still the result is probably unspecified.
§6.5.2.3/3 fn 95: If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
§6.2.6.1/7: When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values. (* Note: see also §6.5.2.3/6 for an exception, but it doesn't apply here)
The corresponding sections in n1124 (C99) are
C11 §6.3.2.3/2 = C99 §6.3.2.3/2
C11 §6.7.9/11 = C99 §6.7.8/11
C11 §6.5.16.1/1 = C99 §6.5.16.1/1
C11 §6.5.2.3/3 fn 95 = missing ("type punning" doesn't appear in C99)
C11 §6.2.6.1/7 = C99 §6.2.6.1/7
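For reference, the two approaches discussed above can be put side by side in a short sketch. The function names are hypothetical; the union version relies on void * and void const * sharing the same representation, as the 6.2.5 text quoted earlier requires.

```c
#include <assert.h>

typedef union constcaster_
{
    void * mp;
    void const * cp;
} constcaster;

/* Union "trick": store through one member, read through the other. */
static void *unconst_via_union(void const *p)
{
    constcaster u;
    assert(sizeof u.mp == sizeof u.cp);
    u.cp = p;
    return u.mp;
}

/* Plain explicit cast: the conversion itself is permitted; only a later
   modification of an object defined const would be undefined. */
static void *unconst_via_cast(void const *p)
{
    return (void *)p;
}
```

On implementations that follow the representation requirement, both functions return a pointer that compares equal to the original one.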
Don't cast it at all. It's a pointer to const, which means that attempting to modify the data is not allowed and, in many implementations, will cause the program to crash if the pointer points to unmodifiable memory. Even if you know the memory can be modified, there may be other pointers to it that do not expect it to change, e.g. if it is part of the storage of a logically immutable string.
The warning is there for good reason.
If you need to modify the content of a const pointer, the portable safe way to do it is first to copy the memory it points to and then modify that.
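A minimal sketch of the copy-then-modify approach, using a string as the example data (the function name `uppercase_copy` is hypothetical): the const input is never written to, and the caller owns the returned buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated, upper-cased copy of text, or NULL if the
   allocation fails. The original string is left untouched. */
static char *uppercase_copy(const char *text)
{
    size_t size;
    char *copy;

    assert(text != NULL);
    size = strlen(text) + 1;  /* include the terminating '\0' */
    copy = malloc(size);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, size);
    for (size_t i = 0; copy[i] != '\0'; i++) {
        copy[i] = (char)toupper((unsigned char)copy[i]);
    }
    return copy;
}
```

The caller releases the copy with free() when it is no longer needed.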

Resources