Recently I had code (in C) where I passed the address of an int to a function expecting a pointer to unsigned char. Is this not valid? Is this UB or what?
e.g.,
void f(unsigned char*p)
{
// do something
}
// Call it somewhere
int x = 0; // actually it was uint32_t, if that makes a difference
f(&x);
I did get a warning though ... Compiled in Xcode
int * and unsigned char * are not considered compatible types, so implicit conversion will issue a diagnostic. However, the standard does allow explicit casting between different pointers, subject to two rules (C11 section 6.3.2.3):
Converting a type "pointer to A" to type "pointer to B" and back to "pointer to A" shall result in the same original pointer. (i.e., if p is of type int *, then (int *)(double *)p will yield p)
Converting any pointer to a char * will point to the lowest-addressable byte of the object.
So, in your case, an explicit (unsigned char *) cast will yield a conforming program without any undefined behavior.
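A minimal sketch of the conforming version, assuming the function only needs to look at the bytes of the object. The names `sum_bytes` and `u32_byte_sum` are made up for illustration and are not from the question:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums the n bytes starting at p. */
unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned total = 0;
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Passes the address of a uint32_t using an explicit cast.
   Without the cast, the call is a constraint violation. */
unsigned u32_byte_sum(uint32_t x)
{
    return sum_bytes((const unsigned char *)&x, sizeof x);
}
```

Which byte comes first depends on the endianness of the platform, so the sketch only computes something order-independent.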
The cast is required, see C11 (n1570) 6.5.2.2 p.2:
[…] Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
This refers to the rules for assignment, the relevant part is (ibid. 6.5.16.1 p.1)
One of the following shall hold:
[…]
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right.
[…]
And unsigned char isn’t compatible with int.
These rules both appear in a “constraint” section, where “shall” means that the compiler has to give a “diagnostic message” (cf. C11 5.1.1.3) and may stop compiling (or whatever, everything beyond that diagnostic is, strictly speaking, out of the scope of the C standard). Your code is an example of a constraint violation.
Other examples of constraint violations are calling a (prototyped and non-variadic) function with the wrong number of arguments, using bitwise operators on doubles, or redeclaring an identifier with an incompatible type in the same scope, ibid. 5.1.1.3 p.2:
Example
An implementation shall issue a diagnostic for the translation unit:
char i;
int i;
because in those cases where wording in this International Standard describes the behavior for a construct as being both a constraint error and resulting in undefined behavior, the constraint error shall be diagnosed.
Syntax violations are treated equally.
So, strictly speaking, your program is as invalid as
int foo(int);
int main() {
It's my birthday!
foo(0.5 ^ 42, 12);
}
which a conforming implementation very well may compile, maybe to a program having undefined behavior, as long as it gives at least one diagnostic (e.g. a warning).
For e.g. gcc, a warning is a diagnostic (you can turn syntax and constraint violations into errors with -pedantic-errors).
The term ill-formed may be used to refer to either a syntax or a constraint violation, the C standard doesn't use this term, but cf. C++11 (n3242):
1.3.9
ill-formed program
program that is not well formed
1.3.26
well-formed program
C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.
The language-lawyer attitude aside, your code will probably always either be not compiled at all (which should be reason enough to do the cast), or show the expected behavior.
C11, §6.5.2.2:
2 Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
§6.5.16.1 describes assignment in terms of a list of constraints, including
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right
int and unsigned char are not compatible types, so the program is not well-formed and the Standard doesn't even guarantee that it will compile.
Although some would say "it is undefined behavior according to the standard", here is what happens de-facto (answering by an example):
Safe:
void f(char* p)
{
char r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
int x = 0;
f((char*)&x); // the cast is only there to suppress the compilation warning
This code is safe as long as you access memory with p[i], where 0 <= i <= sizeof(int)-1.
Unsafe:
void f(int* p)
{
int r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
char x[sizeof(int)] = {0};
f((int*)&x); // the cast is only there to suppress the compilation warning
This code is unsafe because although the allocated variable is large enough to accommodate an int, its address in memory is not necessarily a multiple of sizeof(int). As a result, unless the compiler (as well as the underlying HW architecture) supports unaligned load/store operations, a memory access violation will occur during runtime if the address of this variable in memory is indeed not properly aligned.
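If you do need to get an int in or out of a char buffer, one common way to sidestep the alignment problem is to copy the bytes with memcpy instead of dereferencing a converted pointer. A small sketch, with `load_int` and `store_int` as hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Reads an int from a byte buffer that may not be aligned for int. */
int load_int(const unsigned char *bytes)
{
    int value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Stores an int into a byte buffer that may not be aligned for int. */
void store_int(unsigned char *bytes, int value)
{
    assert(bytes != NULL);
    memcpy(bytes, &value, sizeof value);
}
```

Both helpers work even when the buffer address is odd, because no int-typed access to the buffer ever takes place.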
Take this code for instance:
const char *s;
char *t = s;
This would emit this: warning: assignment discards 'const' qualifier from pointer target type
It's easy to silence the compiler, by just adding a cast:
char *t = (char*)s;
And you can do similar things for regular variables and not just pointers, and it does not only apply to the const qualifier. You can basically cast from any type to any type, even if some of those casts would cause trouble.
I have also read that you never should cast malloc because a cast to and from void pointer is always safe.
But what does the casting actually do? Does it do something else than just preventing the compiler from spitting out warnings. When are explicit casts actually necessary?
Clarification:
I'm talking about assignments here. Not things like double res = (double)a / b. In that situation, I know what the cast does, and in this case we could easily get rid of the explicit cast in favor of an implicit cast like this: double c = a; double res = c/b; or just double res = (1.0 * a) / b
C 2018 6.5.16.1 1 specifies constraints for the simple assignment operator. For left and right operands that are both pointers other than pointers to void, these say:
One of the following shall hold: … both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right; …
Thus, the C standard requires that pointers be of the same “kind” (compatible types) and that the assignment not remove any qualifiers (the left side has all the qualifiers of the right side, but possibly more). If this requirement is violated, the compiler must issue a diagnostic message (although it can still choose to accept the program; the message can be a warning rather than an error).
6.5.4 specifies constraints for casts, in paragraphs 2 to 4. None of them limit conversions from one pointer type to another. Therefore, you can specify any pointer conversion with a cast, and the compiler is not required to issue a diagnostic (although some may choose to).
The underlying philosophy is that an assignment allows you to implicitly do conversions that are “normal” uses of pointers, but explicitly using a cast allows “special” uses of pointers.
The fact that a particular conversion is allowed with a cast does not necessarily mean it is meaningful or well-defined. 6.3.2.3 defines some rules for conversions of pointers. For example, an int * may be converted to a char *, and the resulting pointer can be used to examine the bytes that represent an int. But converting a char * to an int * may result in undefined behavior. So a cast will allow you to do a conversion, but whether the result is useful or is what you want depends on other rules in the C standard.
When a conversion is allowed and defined, it will have the same effect whether it occurs implicitly by assignment or explicitly by cast. The cast does not change what the conversion is; it only changes whether it is allowed (without a diagnostic) or not.
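To make the split between implicit and explicit conversions concrete, here is a short sketch. The function names are invented for the example:

```c
#include <assert.h>

/* Adding a qualifier is allowed implicitly: char * to const char *. */
const char *as_const(char *s) { return s; }

/* Removing a qualifier needs a cast; a plain return would violate
   the assignment constraint and require a diagnostic. */
char *as_mutable(const char *s) { return (char *)s; }

/* void * converts implicitly to any object pointer type. */
int *from_void(void *p) { return p; }

void demo_conversions(void)
{
    char text[] = "abc";
    int n = 42;

    const char *c = as_const(text);
    char *m = as_mutable(c);

    assert(m == text);            /* same address either way */
    m[0] = 'x';                   /* fine: text itself is not const */
    assert(text[0] == 'x');
    assert(*from_void(&n) == 42); /* int * to void * and back */
}
```

The cast in `as_mutable` does not change the pointer value; it only makes the conversion acceptable without a diagnostic.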
In C programming, comparing two different types of pointers like this :
int i = 1;
double d = 2.5;
int *ip = &i;
double *dp = &d;
if(ip != dp) // is it UB?
printf("Not same\n");
Is ip != dp undefined behaviour in C?
The direct comparison ip != dp is invalid in C. Specification of != operator does not allow mixing int * and double * pointers in one comparison. It is a constraint violation in C (aka a "compile error"). A conforming C compiler will report your code as invalid by issuing a diagnostic message.
What happens next depends solely on your compiler. It has nothing to do with C language.
Referring to this code as "C code that produces undefined behavior" would be misleading. It is formally true, but it makes exactly as much sense as saying that the text of "War and Peace" is "C code that produces undefined behavior" (in some strange C compiler that accepts it).
The key point here is that this code violates language constraints, meaning that it is not C code at all.
This is not well defined. A prerequisite of the != operator is that if both operands are pointers, they must be to compatible types. int and double are not compatible types.
From section 6.5.9 of the C standard:
2 One of the following shall hold:
— both operands have arithmetic type;
— both operands are pointers to qualified or unqualified versions of
compatible types;
— one operand is a pointer to an object type and the other is a
pointer to a qualified or unqualified version of void; or
— one operand is a pointer and the other is a null pointer constant.
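The third bullet is what makes a comparison through void * acceptable. A minimal sketch, assuming all you want to know is whether two pointers hold the same address (`same_address` is a hypothetical helper):

```c
#include <assert.h>
#include <stdbool.h>

/* Both arguments convert implicitly to const void *, so the
   comparison inside satisfies the constraints of 6.5.9. */
bool same_address(const void *a, const void *b)
{
    return a == b;
}

void demo_compare(void)
{
    int i = 1;
    double d = 2.5;
    int *ip = &i;
    double *dp = &d;

    /* ip != dp would be a constraint violation; these are not. */
    assert((void *)ip != (void *)dp);
    assert(!same_address(ip, dp));
    assert(same_address(ip, &i));
}
```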
Yes, this operation is undefined, and Kernighan & Ritchie mention it in their book "The C Programming Language":
Any pointer can be meaningfully compared for equality or inequality with zero. But the behavior is undefined for arithmetic or comparisons with pointers that do not point to members of the same array.
This means that pointers to different types cannot be checked for equality.
This is going to be a long, language lawyerish question, so I'd like to quickly state why I find it relevant. I am working on a project where strict standard compliance is crucial (writing a language that compiles to C). The example I am going to give seems like a standard violation on the part of clang, and so, if this is the case, I'd like to confirm it.
gcc says that a conditional with a pointer to a restrict qualified pointer can not co-inhabit a conditional statement with a void pointer. On the other hand, clang compiles such things fine. Here is an example program:
#include <stdlib.h>
int main(void){
int* restrict* A = malloc(8);
A ? A : malloc(8);
return 0;
}
For gcc, the options -std=c11 and -pedantic may be included or not in any combination, likewise for clang and the options -std=c11 and -Weverything. In any case, clang compiles with no errors, and gcc gives the following:
tem-2.c: In function ‘main’:
tem-2.c:7:2: error: invalid use of ‘restrict’
A ? A : malloc(8);
^
The c11 standard says the following with regard to conditional statements, emphasis added:
6.5.15 Conditional operator
...
One of the following shall hold for the second and third operands:
— both operands have arithmetic type;
— both operands have the same structure or union type;
— both operands have void type;
— both operands are pointers to qualified or unqualified versions of compatible types;
— one operand is a pointer and the other is a null pointer constant; or
— one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void.
...
If both the second and third operands are pointers or one is a null pointer constant and the
other is a pointer, the result type is a pointer to a type qualified with all the type qualifiers
of the types referenced by both operands. Furthermore, if both operands are pointers to
compatible types or to differently qualified versions of compatible types, the result type is
a pointer to an appropriately qualified version of the composite type; if one operand is a
null pointer constant, the result has the type of the other operand; otherwise, one operand
is a pointer to void or a qualified version of void, in which case the result type is a
pointer to an appropriately qualified version of void.
...
The way I see it, the first bold portion above says that the two types can go together, and the second bold portion defines the result to be a pointer to a restrict qualified version of void. However, as the following states, this type can not exist, and so the expression is correctly identified as erroneous by gcc:
6.7.3 Type qualifiers, paragraph 2
Types other than pointer types whose referenced type is an object type shall not be restrict-qualified.
Now, the problem is that a "shall not" condition is violated by this example program, and so is required to produce an error, by the following:
5.1.1.3 Diagnostics, paragraph 1
A conforming implementation shall produce at least one diagnostic message (identified in
an implementation-defined manner) if a preprocessing translation unit or translation unit
contains a violation of any syntax rule or constraint, even if the behavior is also explicitly
specified as undefined or implementation-defined. Diagnostic messages need not be
produced in other circumstances.
It seems clang is not standard compliant by treating an erroneous type silently. That makes me wonder what else clang does silently.
I am using gcc version 5.4.0 and clang version 3.8.0, on an x86-64 Ubuntu machine.
Yes it looks like a bug.
Your question more briefly: can void be restrict qualified? Since void is clearly not a pointer type, the answer is no. Because this violates a constraint, the compiler should give a diagnostic.
I was able to trick clang to confess its sins by using a _Generic expression
puts(_Generic(A ? A : malloc(8), void* : "void*"));
and clang tells me
static.c:24:18: error: controlling expression type 'restrict void *' not compatible with any generic association type
puts(_Generic(A ? A : malloc(8), void* : "void*"));
which shows that clang here really tries to match a nonsense type restrict void*.
Please file them a bug report.
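The same _Generic trick can be used to check the result types that 6.5.15 prescribes for the ordinary, valid cases. This is only a sketch; the enum and the function names are invented here, and it needs a C11 compiler:

```c
#include <assert.h>
#include <stddef.h>

enum ptr_kind { KIND_INT_PTR, KIND_VOID_PTR, KIND_CONST_VOID_PTR, KIND_OTHER };

/* Maps the type of an expression to one of the constants above.
   The controlling expression is not evaluated. */
#define PTR_KIND(expr) _Generic((expr), \
    int *: KIND_INT_PTR,                \
    void *: KIND_VOID_PTR,              \
    const void *: KIND_CONST_VOID_PTR,  \
    default: KIND_OTHER)

/* int * and void *: result is void *. */
int kind_int_and_void(int *ip, void *vp) { return PTR_KIND(ip ? ip : vp); }

/* const int * and void *: result is const void *, the qualifiers are merged. */
int kind_const_int_and_void(const int *cip, void *vp) { return PTR_KIND(cip ? cip : vp); }

/* int * and a null pointer constant: result is int *. */
int kind_int_and_null(int *ip) { return PTR_KIND(ip ? ip : NULL); }

void demo_conditional_types(void)
{
    int x = 0;
    assert(kind_int_and_void(&x, &x) == KIND_VOID_PTR);
    assert(kind_const_int_and_void(&x, &x) == KIND_CONST_VOID_PTR);
    assert(kind_int_and_null(&x) == KIND_INT_PTR);
}
```

The second case shows the qualifier merging that, applied to restrict, produces the impossible type discussed above.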
While a compiler could satisfy all obligations surrounding restrict by ignoring the qualifier altogether, a compiler which wants to keep track of what it is or is not allowed to do needs to keep track of which pointers hold copies of restrict pointers. Given something like:
int *foo;
int *bar;
int wow(int *restrict p)
{
foo = p;
...
*p = 123;
*foo = 456;
*p++;
*bar = 890;
return *p;
}
since foo is derived from p, a compiler must allow for accesses made via
foo to alias accesses via p. A compiler need not make such allowances
for accesses made via bar, since that is known not to hold an address derived from p.
The rules surrounding restrict get murky in cases where a pointer may or
may not be derived from another. A compiler would certainly be allowed to
simply ignore a restrict qualifier in cases where it can't track all of
the pointers derived from a pointer; I'm not sure if any such cases would
invoke UB even if nothing ever modifies the storage identified by the
pointer. If a syntactic construct is structurally guaranteed to invoke
UB, having a compiler squawk may be more useful than having it act in an
arbitrary fashion (though having a compiler simply ignore any restrict
qualifiers it can't fully handle might be more useful yet).
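For contrast, here is a sketch of the uncontroversial use of restrict, where no pointer derived from one parameter ever reaches the storage of the other. The function `scale_into` is made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* The caller promises that dst and src do not overlap, so the
   compiler may assume a store to dst[i] never changes src[j]. */
void scale_into(size_t n, int *restrict dst, const int *restrict src, int factor)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}

void demo_restrict(void)
{
    int in[3] = {1, 2, 3};
    int out[3] = {0};

    scale_into(3, out, in, 2); /* distinct arrays: the promise holds */
    assert(out[0] == 2 && out[1] == 4 && out[2] == 6);
}
```

Calling it with overlapping arrays would break the promise and make the behavior undefined.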
Given the following C code, what is the difference between a = f; and a = (int *) f;?
float *f;
int *a;
...
a = f;
a = (int *) f;
float *f;
int *a;
a = f;
This assignment is erroneous (it is a C constraint violation); there is no implicit conversion between pointer types (except with void *). A compiler can refuse to compile a program with this assignment.
Given:
float *f;
int *a;
This:
a = f;
is a constraint violation. It requires a diagnostic from any conforming compiler. After issuing the required diagnostic, it may or may not reject the program. (IMHO it should do so.) A conforming compiler may choose to accept it with a mere warning (which qualifies as a diagnostic), but once it does so the behavior of the program is undefined. Compilers that do this most commonly generate an implicit conversion from float* to int*, giving the same behavior as if there were a cast (an explicit conversion), but the standard does not require that.
Non-conforming compilers, of course, are free to do anything they like.
Conclusion: Don't write code like that. Even if your compiler lets you get away with it, another compiler might not. If you want to convert from one pointer type to another, use a cast. Aside from validity issues, the cast makes it much clearer to the reader that something funny is going on. If your compiler gave you a warning, heed it. If it didn't, find out how to increase the warning levels on your compiler.
This:
a = (int *) f;
takes the value of f (which is of type float*) and explicitly converts it to type int*, then assigns that int* value to a. (I'll assume that something between the declaration and the assignment has set f to some valid value.)
If f is a null pointer, the conversion is well defined, and yields a null pointer of type int*. The rules for converting a non-null object pointer to another pointer type are (quoting N1570 6.3.2.3p7):
A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined.
Otherwise, when converted back again, the result shall compare equal
to the original pointer.
This kind of conversion, assuming int and float are the same size and have similar alignment requirements, is likely intended to let you treat a float object as if it were an int object. This is called "type-punning". If int and float aren't the same size, or if they have different alignment requirements, this can easily blow up in your face, crashing your program (if you're lucky) or giving you garbage results (if you're not). (Yes, crashing the program is a good outcome; it lets you know there's a problem.)
If you really need to do that for some reason, it's better to define a union with int and float members, or to use memcpy() to copy the contents of a float object into an int object.
But it very rarely makes sense to do that kind of thing. If you want to examine the representation of a float object, it's better to treat it as an array of unsigned char, something that the language standard explicitly permits.
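A minimal sketch of both suggestions, assuming float is 32 bits wide on your platform (the static assertion checks that). The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to be 32 bits wide");

/* Copies the object representation of a float into an integer. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The reverse direction. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Returns byte i of the representation of *f. Access through
   unsigned char is always permitted. */
unsigned char float_byte(const float *f, size_t i)
{
    assert(f != NULL && i < sizeof *f);
    return ((const unsigned char *)f)[i];
}
```

Neither helper ever dereferences an int * that points at a float, so the alignment and aliasing concerns above do not arise.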
6.5.16.1 Simple assignment
the left operand has atomic, qualified, or unqualified pointer type, and (considering
the type the left operand would have after lvalue conversion) both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right.
So, a = f is a constraint violation and invokes undefined behavior.
In the second case, the cast converts the value of f to a type compatible with a's type. Casting like this is legal in C (not sure about other languages).
But note that after the cast f is still a pointer to float, so you have to cast it every time you assign it to a.
a = (int*) f; makes explicit that you want to cast a float* pointer to an int* pointer. Without it, you'll receive an incompatible pointer types error.
Your code will compile (at least on my Linux machine with gcc), but you will get a warning.
If you use a = f; and then use a somewhere in your code, you will get erroneous data, because a float is stored in a different format in memory. Even if you do the casting first you probably will get erroneous results, but the compiler sees your casting and assumes you know what you are doing.
a = f; //assignment
// is a constraint violation
a = (int *) f; //cast + assignment
Explicitly casting the float pointer to an int pointer simply hides the compiler warnings or errors,
but the program may well crash at run time, because the size of what it expects when dereferencing the pointer can differ from reality.
Unlike C++, C has no notion of a const_cast. That is, there is no valid way to convert a const-qualified pointer to an unqualified pointer:
void const * p;
void * q = p; // not good
First off: Is this cast actually undefined behaviour?
In any event, GCC warns about this. To make "clean" code that requires a const-cast (i.e. where I can guarantee that I won't mutate the contents, but all I have is a mutable pointer), I have seen the following "conversion" trick:
typedef union constcaster_
{
void * mp;
void const * cp;
} constcaster;
Usage: u.cp = p; q = u.mp;.
What are the C language rules on casting away constness through such a union? My knowledge of C is only very patchy, but I've heard that C is far more lenient about union access than C++, so while I have a bad feeling about this construction, I would like an argument from the standard (C99 I suppose, though if this has changed in C11 it'll be good to know).
It's implementation defined, see C99 6.5.2.3/5:
if the value of a member of a union object is used when the most
recent store to the object was to a different member, the behavior is
implementation-defined.
Update: @AaronMcDaid commented that this might be well-defined after all.
The standard specified the following 6.2.5/27:
Similarly, pointers to qualified or unqualified versions of compatible
types shall have the same representation and alignment
requirements.27)
27) The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions.
And (6.7.2.1/14):
A pointer to a union object, suitably converted, points to each of its
members (or if a member is a bitfield, then to the unit in which it
resides), and vice versa.
One might conclude that, in this particular case, there is only room for exactly one way to access the elements in the union.
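For comparison, here is the union trick next to a plain cast. This is a sketch only; it leans on the quoted guarantee that void * and const void * have the same representation, and the function names are invented:

```c
#include <assert.h>

typedef union constcaster_ {
    void *mp;
    const void *cp;
} constcaster;

/* Stores through the const member and reads the non-const member. */
void *unconst_via_union(const void *p)
{
    constcaster u;
    u.cp = p;
    return u.mp;
}

/* The same conversion written as an explicit cast. */
void *unconst_via_cast(const void *p)
{
    return (void *)p;
}

void demo_constcaster(void)
{
    int x = 5;
    assert(unconst_via_union(&x) == &x);
    assert(unconst_via_union(&x) == unconst_via_cast(&x));
}
```

Neither version makes it any safer to write through the result if the object was defined const.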
My understanding is that UB can arise only if you try to modify a const-declared object.
So the following code is not UB:
int x = 0;
const int *cp = &x;
int *p = (int*)cp;
*p = 1; /* OK: x is not a const object */
But this is UB:
const int cx = 0;
const int *cp = &cx;
int *p = (int*)cp;
*p = 1; /* UB: cx is const */
The use of a union instead of a cast should not make any difference here.
From the C99 specs (6.7.3 Type qualifiers):
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined.
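A typical legitimate use of such a cast is a search function in the style of strchr, which accepts a const pointer so it works on any string, but hands back a non-const pointer into the same storage. A sketch with a hypothetical `find_char`:

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the first occurrence of c in s, or NULL.
   The cast drops const, but the function itself never writes. */
char *find_char(const char *s, char c)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == c)
            return (char *)s;
    }
    return NULL;
}
```

Whether the caller may write through the result depends only on whether the original object was defined const, exactly as in the two cases above.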
The initialization certainly won't cause UB. The conversion between qualified pointer types is explicitly allowed in §6.3.2.3/2 (n1570 (C11)). It's the use of the content behind that pointer afterwards that causes UB (see @rodrigo's answer).
However, you need an explicit cast to convert a const void * to a void *, because the constraint on simple assignment still requires every qualifier of the type pointed to by the RHS to appear on the type pointed to by the LHS.
§6.7.9/11: ... The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.
§6.5.16.1/1: (Simple Assignment / Constraints)
... both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right;
... one operand is a pointer
to an object type, and the other is a pointer to a qualified or unqualified version of
void, and the type pointed to by the left has all the qualifiers of the type pointed to
by the right;
I don't know why gcc just gives a warning though.
And for the union trick, yes it's not UB, but still the result is probably unspecified.
§6.5.2.3/3 fn 95: If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
§6.2.6.1/7: When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values. (* Note: see also §6.5.2.3/6 for an exception, but it doesn't apply here)
The corresponding sections in n1124 (C99) are
C11 §6.3.2.3/2 = C99 §6.3.2.3/2
C11 §6.7.9/11 = C99 §6.7.8/11
C11 §6.5.16.1/1 = C99 §6.5.16.1/1
C11 §6.5.2.3/3 fn 95 = missing ("type punning" doesn't appear in C99)
C11 §6.2.6.1/7 = C99 §6.2.6.1/7
Don't cast it at all. It's a pointer to const, which means that attempting to modify the data is not allowed and in many implementations will cause the program to crash if the pointer points to unmodifiable memory. Even if you know the memory can be modified, there may be other pointers to it that do not expect it to change, e.g. if it is part of the storage of a logically immutable string.
The warning is there for good reason.
If you need to modify the content of a const pointer, the portable safe way to do it is first to copy the memory it points to and then modify that.
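A minimal sketch of the copy-then-modify approach, using upper-casing a string as the example modification. `upper_copy` is a made-up name, and the caller is assumed to free the result:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated upper-case copy of s, or NULL if the
   allocation fails. The original string is never written to. */
char *upper_copy(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1); /* includes the terminating '\0' */
    for (size_t i = 0; i < len; i++)
        copy[i] = (char)toupper((unsigned char)copy[i]);
    return copy;
}
```

Because only the copy is modified, this works even when s points at a string literal or other read-only storage.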