In what version(s) of the C standards (if any) is the following well-defined?
void foo(void) {
char *nullPtr = NULL;
&*nullPtr;
}
Note that I am not assigning the result to anything - the second line is a simple statement.
This should be a question with an obvious answer, but (as seemingly happens way too often on such questions) I have heard just as many people say the answer is "obviously undefined" as "obviously defined".
On a rather related note, what about the following? Should foo produce a read of c?
extern volatile char c;
void bar(void) {
volatile char *nonnullptr = &c;
&*nonnullptr;
}
(C++ version of the same question: Is &*NULL well-defined in C++?)
While attempts to dereference a null pointer cause undefined behavior, so that *nullPtr by itself would be illegal, &*nullPtr is perfectly well-defined. According to footnote 102 in the C11 Draft Standard:
Thus, &*E is equivalent to E (even if E is a null pointer),....
This is a result of the fact that, for the unary & operator (§6.5.3.2 ¶3):
If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted,....
The C99 Standard has the same language, but this does not appear in the C90 Standard, and my reading of that standard is that &*nullPtr would indeed cause undefined behavior in pre-C99 implementations.
From the C90 Standard (§6.3.2.3):
The result of the unary & (address-of) operator is a pointer to the object or function designated by its operand....
and:
The unary * operator denotes indirection.... If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Curiously, I don't see any discussion of this change in the C99 Rationale, though I may simply have missed it.
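To make the equivalence concrete, here is a minimal sketch for C99/C11 (the demo name and the assert are just illustration):
#include <assert.h>
#include <stddef.h>
void demo(void) {
    char *nullPtr = NULL;
    char *p = &*nullPtr; /* equivalent to p = nullPtr; nothing is evaluated or dereferenced */
    assert(p == NULL);
}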
Is the behaviour of this code well defined?
#include <stdio.h>
#include <stdint.h>
int main(void)
{
void *ptr = (char *)0x01;
size_t val;
ptr = (char *)ptr + 1;
val = (size_t)(uintptr_t)ptr;
printf("%zu\n", val);
return 0;
}
I mean, can we assign some fixed number to a pointer and increment it even if it is pointing to some random address? (I know that you cannot dereference it.)
The assignment:
void *ptr = (char *)0x01;
is implementation-defined behavior because it converts an integer to a pointer. This is detailed in section 6.3.2.3 of the C standard regarding Pointers:
5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
As for the subsequent pointer arithmetic:
ptr = (char *)ptr + 1;
This is dependent on a few things.
First, the current value of ptr may be a trap representation as per 6.3.2.3 above. If it is, the behavior is undefined.
Next is the question of whether 0x1 points to a valid object. Adding a pointer and an integer is only valid if both the pointer operand and the result point to elements of an array object (a single object counts as an array of size 1) or one element past the array object. This is detailed in section 6.5.6:
7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
On a hosted implementation the value 0x1 almost certainly does not point to a valid object, in which case the addition is undefined. An embedded implementation could however support setting pointers to specific values, and if so it could be the case that 0x1 does in fact point to a valid object. If so, the behavior is well defined, otherwise it is undefined.
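As a hedged sketch of the embedded case: the register address below is hypothetical, and the code is only meaningful on an implementation that documents that such integer-to-pointer conversions refer to that hardware location.
#include <stdint.h>
#define GPIO_ODR ((volatile uint32_t *)0x40021018u) /* hypothetical memory-mapped register */
void set_led(void)
{
    *GPIO_ODR |= 1u; /* well-defined only because the implementation documents the conversion */
}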
No, the behaviour of this program is undefined. Once an undefined construct is reached in a program, any future behaviour is undefined. Paradoxically, any past behaviour is undefined too.
The result of void *ptr = (char*)0x01; is implementation-defined, due in part to the fact that the resulting pointer value may be a trap representation.
But the behaviour of the ensuing pointer arithmetic in the statement ptr = (char *)ptr + 1; is undefined. This is because pointer arithmetic is only valid within arrays including one past the end of the array. For this purpose an object is an array of length one.
Yes, the code is well-defined as implementation-defined. It is not undefined. See ISO/IEC 9899:2011 [6.3.2.3]/5 and note 67.
The C language was originally created as a system programming language. Systems programming required manipulating memory-mapped hardware, requiring that you would stuff hard-coded addresses into pointers, sometimes increment those pointers, and read and write data from and to the resulting address. To that end, assigning an integer to a pointer and manipulating that pointer using arithmetic is well defined by the language. By making it implementation-defined, the language allows all kinds of things to happen: from the classic halt-and-catch-fire to raising a bus error when trying to dereference an odd address.
The difference between undefined behaviour and implementation-defined behaviour is basically undefined behaviour means "don't do that, we don't know what will happen" and implementation-defined behaviour means "it's OK to go ahead and do that, it's up to you to know what will happen."
It is undefined behavior.
From N1570 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
If the value is a trap representation, reading it is undefined behavior:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.
And
An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator).
Therefore, the line void *ptr = (char *)0x01; is already potentially undefined behavior, on an implementation where (char*)0x01 or (void*)(char*)0x01 is a trap representation: the object being initialized is an lvalue that does not have character type, and the store may produce a trap representation in it.
On some hardware, loading an invalid pointer into a machine register could crash the program, so this was a forced move by the standards committee.
The Standard does not require that implementations process integer-to-pointer conversions in a meaningful fashion for any particular integer values, or even for any possible integer values other than Null Pointer Constants. The only thing it guarantees about such conversions is that a program which stores the result of such a conversion directly into an object of suitable pointer type and does nothing with it except examine the bytes of that object will, at worst, see Unspecified values. While the behavior of converting an integer to a pointer is Implementation-Defined, nothing would forbid any implementation (no matter what it actually does with such conversions!) from specifying that some (or even all) of the bytes of the representation having Unspecified values, and specifying that some (or even all) integer values may behave as though they yield trap representations.
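A sketch of the "store it and only examine the bytes" pattern described above; the stored bytes are at worst unspecified, and reading them through unsigned char is not undefined:
#include <stdio.h>
void dump_ptr_bytes(void)
{
    void *p = (void *)0x01; /* implementation-defined result of the conversion */
    const unsigned char *b = (const unsigned char *)&p;
    for (size_t i = 0; i < sizeof p; i++)
        printf("%02x ", (unsigned)b[i]); /* unspecified values, but no UB */
    putchar('\n');
}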
The only reasons the Standard says anything at all about integer-to-pointer conversions are that:
In some implementations, the construct is meaningful, and some programs for those implementations require it.
The authors of the Standard did not like the idea that a construct which was used on some implementations would represent a constraint violation on others.
It would have been odd for the Standard to describe a construct but then specify that it has Undefined Behavior in all cases.
Personally, I think the Standard should have allowed implementations to treat integer-to-pointer conversions as constraint violations if they don't define any situations where they would be useful, rather than require that compilers accept the meaningless code, but that wasn't the philosophy at the time.
I think it would be simplest to simply say that any operation involving integer-to-pointer conversions with anything other than intptr_t or uintptr_t values received from pointer-to-integer conversions invokes Undefined Behavior, but then note that it is common for quality implementations intended for low-level programming to process Undefined Behavior "in a documented manner characteristic of the environment". The Standard doesn't specify when implementations should process programs that invoke UB in that fashion but instead treats it as a Quality of Implementation issue.
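For contrast, here is a sketch of the one round trip the Standard does describe, assuming the optional uintptr_t type is provided (C11 7.20.1.4):
#include <stdint.h>
#include <assert.h>
void roundtrip(int *p)
{
    uintptr_t n = (uintptr_t)(void *)p; /* implementation-defined integer value */
    int *q = (void *)n;                 /* guaranteed to compare equal to the original pointer */
    assert(q == p);
}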
If an implementation specifies that integer-to-pointer conversions operate in a fashion that would define the behavior of
char *p = (char*)1;
p++;
as equivalent to "char *p = (char*)2;", then the implementation should be expected to work that way. On the other hand, an implementation could define the behavior of integer-to-pointer conversion in such a way that even:
char *p = (char*)1;
char *q = p; // Not doing any arithmetic here--just a simple assignment
would release nasal demons. On most platforms, a compiler where arithmetic on pointers produced by integer-to-pointer conversions behaved oddly would not be viewed as a high-quality implementation suitable for low-level programming. A programmer that is not intending to target any other kind of implementations could thus expect such constructs to behave usefully on compilers for which the code was suitable, even though the Standard does not require it.
When a pointer to a particular type (say int, char, float, ..) is incremented, its value is increased by the size of that data type. If a void pointer which points to data of size x is incremented, how does it get to point x bytes ahead? How does the compiler know to add x to value of the pointer?
Final conclusion: arithmetic on a void* is illegal in both C and C++.
GCC allows it as an extension, see Arithmetic on void- and Function-Pointers (note that this section is part of the "C Extensions" chapter of the manual). Clang and ICC likely allow void* arithmetic for the purposes of compatibility with GCC. Other compilers (such as MSVC) disallow arithmetic on void*, and GCC disallows it if the -pedantic-errors flag is specified, or if the -Werror=pointer-arith flag is specified (this flag is useful if your code base must also compile with MSVC).
The C Standard Speaks
Quotes are taken from the n1256 draft.
The standard's description of the addition operation states:
6.5.6-2: For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to an object type and the other shall have integer type.
So, the question here is whether void* is a pointer to an "object type", or equivalently, whether void is an "object type". The definition for "object type" is:
6.2.5-1: Types are partitioned into object types (types that fully describe objects), function types (types that describe functions), and incomplete types (types that describe objects but lack information needed to determine their sizes).
And the standard defines void as:
6.2.5-19: The void type comprises an empty set of values; it is an incomplete type that cannot be completed.
Since void is an incomplete type, it is not an object type. Therefore it is not a valid operand to an addition operation.
Therefore you cannot perform pointer arithmetic on a void pointer.
Notes
Originally, it was thought that void* arithmetic was permitted, because of these sections of the C standard:
6.2.5-27: A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.
However,
The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
So this means that printf("%s", x) has the same meaning whether x has type char* or void*, but it does not mean that you can do arithmetic on a void*.
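If you need to move a void * by a byte count portably, the usual approach is to go through unsigned char *, which is an object type of size 1. A minimal sketch (the advance name is just illustrative):
#include <stddef.h>
static void *advance(void *p, size_t n)
{
    return (unsigned char *)p + n; /* well-defined byte-wise arithmetic */
}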
Pointer arithmetic is not allowed on void* pointers.
Cast it to a char pointer and increment it to move forward x bytes.
The C standard does not allow void pointer arithmetic. However, GNU C allows it by treating the size of void as 1.
C11 standard §6.2.5, paragraph 19:
The void type comprises an empty set of values; it is an incomplete object type that cannot be completed.
The following program works fine with the GCC compiler:
#include <stdio.h>
int main()
{
    int arr[2] = {1, 2};
    void *ptr = &arr;
    ptr = ptr + sizeof(int); /* GNU extension: arithmetic on void * behaves as if sizeof(void) were 1 */
    printf("%d\n", *(int *)ptr);
    return 0;
}
Other compilers may generate an error.
You can't do pointer arithmetic on void * types, for exactly this reason!
Void pointers can point to any memory chunk. Hence the compiler does not know how many bytes to increment/decrement when we attempt pointer arithmetic on a void pointer. Therefore void pointers must be first typecast to a known type before they can be involved in any pointer arithmetic.
void *p = malloc(sizeof(char) * 10);
p++;                 // invalid: the compiler cannot know how many bytes to advance a void pointer
char *c = (char *)p;
c++;                 // valid: c advances by 1, since sizeof(char) is 1 byte
You have to cast it to another type of pointer before doing pointer arithmetic.
[answer copied from a comment on a later, duplicate question]
Allowing arithmetic on void pointers is a controversial, nonstandard extension. If you're thinking in assembly language, where pointers are just addresses, arithmetic on void pointers makes sense, and adding 1 just adds 1. But if you're thinking in C terms, using C's model of pointer arithmetic, adding 1 to any pointer p actually adds sizeof(*p) to the address, and this is what you want pointer arithmetic to do, but since void has no size (sizeof(void) is not valid in standard C), it breaks down for void pointers.
If you're thinking in C terms you don't mind that it breaks down, and you don't mind inserting explicit casts to (char *) if that's the arithmetic you want. But if you're thinking in assembler you want it to just work, which is why the extension (though a departure from the proper definition of pointer arithmetic in C) is desirable in some circles, and provided by some compilers.
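A minimal sketch of C's scaled arithmetic as described above, showing that +1 advances a typed pointer by sizeof(*p) bytes:
#include <stdio.h>
int main(void)
{
    int a[2];
    int *ip = a;
    char *cp = (char *)a;
    printf("int* step:  %zu bytes\n", (size_t)((char *)(ip + 1) - (char *)ip)); /* sizeof(int) */
    printf("char* step: %zu bytes\n", (size_t)(cp + 1 - cp));                   /* 1 */
    return 0;
}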
Pointer arithmetic is not allowed in the void pointer.
Reason: Pointer arithmetic is not the same as ordinary arithmetic; it is scaled by the size of the base data type.
Solution: Use a type cast at the time of the arithmetic; this makes the base data type known in the expression doing the pointer arithmetic.
Example: point is a void pointer.
*point = *point + 1; // not valid: a void pointer cannot be dereferenced
*(int *)point = *(int *)point + 1; // valid
Compiler knows by type cast. Given a void *x:
x+1 adds one byte to x, pointer goes to byte x+1
(int*)x+1 adds sizeof(int) bytes, pointer goes to byte x + sizeof(int)
(float*)x+1 adds sizeof(float) bytes,
etc.
Although the first item is not portable and goes against proper C/C++ practice, most compilers will nevertheless compile it to something (GCC can warn about it with -Wpointer-arith).
( I'm quoting ISO/IEC 9899:201x )
Here we see that an integer constant expression has an integer type:
6.6 Constant expressions, paragraph 6:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
Then this holds true for any integer type:
6.2.6.2 Integer types, paragraph 5:
The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
Then we see that a null pointer constant is defined using an integer constant expression with the value 0.
6.3.2.3 Pointers, paragraph 3:
An integer constant expression with the value 0, or such an expression cast to type void*, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
Therefore the null pointer constant must have all its bits set to zero.
But there are many answers online and on StackOverflow that say that that isn't true.
I have a hard time believing them given the quoted parts.
( Please answer using references to the latest Standard )
Does Standard define null pointer constant to have all bits set to zero?
No, it doesn't. No paragraph of the C Standard imposes such a requirement.
void *p = 0;
p, for example, is a null pointer, but the Standard does not require that the object p have all of its bits set to zero.
For information the c-faq website mentions some systems with non-zero null pointer representations here: http://c-faq.com/null/machexamp.html
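If you want to see what a particular implementation actually does, here is a sketch that dumps the object representation of a null pointer; on common platforms it prints all zeros, but nothing in the Standard requires that:
#include <stdio.h>
#include <string.h>
int main(void)
{
    void *p = 0;                 /* p holds a null pointer */
    unsigned char bytes[sizeof p];
    memcpy(bytes, &p, sizeof p); /* copy the object representation */
    for (size_t i = 0; i < sizeof p; i++)
        printf("%02x ", (unsigned)bytes[i]);
    putchar('\n');
    return 0;
}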
No, NULL doesn't have to be all bits zero.
N1570 6.3.2.3 Pointers paragraph 3:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. 66) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
See my emphasis above: the integer 0 is converted if necessary; it doesn't have to have the same bit representation.
Note 66 on bottom of the page says:
66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.
Which leads us to a paragraph of that chapter:
The macros are
NULL
which expands to an implementation-defined null pointer constant
And what is more, on Annex J.3.12 (Portability issues, Implementation-defined behaviour, Library functions) says:
— The null pointer constant to which the macro NULL expands (7.19).
Asking about the representation of a null pointer constant is quite pointless.
A null pointer constant either has an integer type or the type void*. Whatever it is, it is a value. It is not an object. Values don't have a representation, only objects have. We can only talk about representations by taking the address of an object, casting it to char* or unsigned char*, and looking at the bytes. We can't do that with a null pointer constant. As soon as it is assigned to an object, it's not a null pointer constant anymore.
A major limitation of the C standard is that, because the authors want to avoid prohibiting compilers from behaving in any way that any production code anywhere might be relying upon, it fails to specify many things which programmers need to know. As a consequence, it is often necessary to make assumptions about things which are not specified by the standard but which match the behavior of common compilers. The fact that all of the bytes comprising a null pointer are zero is one such assumption.
Nothing in the C standard specifies anything about the bit-level representation of any pointer beyond the fact that every possible value of each and every data type--including pointers--will be representable as a sequence of char values(*). Nonetheless, on nearly all common platforms, zeroing out all the bytes associated with a structure is equivalent to setting all the members to the static default values for their types (the default value for a pointer being null). Further, code which uses calloc to receive a zeroed-out block of RAM for a collection of structures will often be much faster than code which uses malloc and then has to manually clear every member of every structure, or which uses calloc but still manually clears every non-integer member of every structure.
I would suggest therefore that in many cases it is perfectly reasonable to write code targeted for those dialects of C where null pointers are stored as all-bytes-zero, and have as a documented requirement that it will not work on dialects where that is not the case. Perhaps someday the ISO will provide a standard means by which such requirements could be documented in machine-readable form (such that every compiler would be required to either abide by a program's stated requirements or refuse compilation), but so far as I know none yet exists.
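A sketch of the trade-off described above; struct node and both function names are hypothetical. The fast version documents its reliance on the all-bytes-zero dialect, while the portable version spells the initialization out:
#include <stdlib.h>
struct node { struct node *next; double value; }; /* hypothetical example type */
struct node *make_nodes_fast(size_t n)
{
    /* relies on the documented assumption that all-bytes-zero means next == NULL and value == 0.0 */
    return calloc(n, sizeof(struct node));
}
struct node *make_nodes_portable(size_t n)
{
    struct node *a = malloc(n * sizeof *a);
    if (a)
        for (size_t i = 0; i < n; i++) {
            a[i].next = NULL;  /* correct even where a null pointer is not all-bits-zero */
            a[i].value = 0.0;
        }
    return a;
}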
(*) From what I understand, there's some question as to whether compilers are required to honor that assumption anymore. Consider, for example:
int funcomp(int **pp, int **qq)
{
    int *p, *q;
    p = (int *)malloc(1234);
    *pp = p;
    free(p);
    q = (int *)malloc(1234);
    *qq = q;
    *q = 1234;
    if (!memcmp(pp, qq, sizeof p)) /* compare the byte representations of the two stored pointers */
        return *p;                 /* p is dangling here, even if its bits match q's */
    return 0;
}
Following free(p), any attempt to access *p will be Undefined Behavior. Although there's a significant likelihood that q will receive the exact same bit pattern as p, nothing in the standard would require that p must be considered a valid alias for q even in that scenario. On the other hand, it also seems strange to say that two variables of the same type can hold the exact same bits without their contents being equivalent. Thus, while it's clearly natural that the function would be allowed to either return 0 along with values of *pp and *qq that don't compare bitwise equal, or 1234 along with values of *pp and *qq that do compare bitwise equal, the Standard would seem to allow the function to behave arbitrarily if both malloc calls happen to yield bitwise-equivalent values.
Recently I had code (in C) where I passed the address of an int to a function expecting a pointer to unsigned char. Is this not valid? Is this UB or what?
e.g.,
void f(unsigned char *p)
{
    // do something
}
// Call it somewhere
int x = 0; // actually it was uint32_t, if it makes a difference
f(&x);
I did get a warning though ... Compiled in Xcode
int * and unsigned char * are not considered compatible types, so implicit conversion will issue a diagnostic. However, the standard does allow explicit casting between different pointers, subject to two rules (C11 section 6.3.2.3):
Converting a type "pointer to A" to type "pointer to B" and back to "pointer to A" shall result in the same original pointer. (i.e., if p is of type int *, then (int *)(double *)p will yield p)
Converting any pointer to a char * will yield a pointer to the lowest-addressed byte of the object.
So, in your case, an explicit (unsigned char *) cast will yield a conforming program without any undefined behavior.
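A minimal sketch of what that looks like at the call site (f and caller are just placeholders):
void f(unsigned char *p); /* the function from the question */
void caller(void)
{
    int x = 0;
    f((unsigned char *)&x); /* explicit cast: conforming; points at the lowest-addressed byte of x */
}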
The cast is required, see C11 (n1570) 6.5.2.2 p.2:
[…] Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
This refers to the rules for assignment, the relevant part is (ibid. 6.5.16.1 p.1)
One of the following shall hold:
[…]
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right.
[…]
And unsigned char isn’t compatible with int.
These rules both appear in a “constraint” section, where “shall” means that the compiler has to give a “diagnostic message” (cf. C11 5.1.1.3) and may stop compiling (or whatever, everything beyond that diagnostic is, strictly speaking, out of the scope of the C standard). Your code is an example of a constraint violation.
Other examples of constraint violations are calling a (prototyped and non-variadic) function with the wrong number of arguments, using bitwise operators on doubles, or redeclaring an identifier with an incompatible type in the same scope, ibid. 5.1.1.3 p.2:
Example
An implementation shall issue a diagnostic for the translation unit:
char i;
int i;
because in those cases where wording in this International Standard describes the behavior for a construct as being both a constraint error and resulting in undefined behavior, the constraint error shall be diagnosed.
Syntax violations are treated equally.
So, strictly speaking, your program is as invalid as
int foo(int);
int main() {
It's my birthday!
foo(0.5 ^ 42, 12);
}
which a conforming implementation very well may compile, maybe to a program having undefined behavior, as long as it gives at least one diagnostic (e.g. a warning).
For e.g. gcc, a warning is a diagnostic (you can turn syntax and constraint violations into errors with -pedantic-errors).
The term ill-formed may be used to refer to either a syntax or a constraint violation, the C standard doesn't use this term, but cf. C++11 (n3242):
1.3.9
ill-formed program
program that is not well formed
1.3.26
well-formed program
C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.
The language-lawyer attitude aside, your code will probably always either be not compiled at all (which should be reason enough to do the cast), or show the expected behavior.
C11, §6.5.2.2:
2 Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
§6.5.16.1 describes assignment in terms of a list of constraints, including
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right
int and unsigned char are not compatible types, so the program is not well-formed and the Standard doesn't even guarantee that it will compile.
Although some would say "it is undefined behavior according to the standard", here is what happens de facto (answering with an example):
Safe:
void f(char *p)
{
    char r, w = 0;
    r = p[0]; // read access
    p[0] = w; // write access
}
...
int x = 0;
f((char *)&x); // the cast is only there to avoid the compiler warning
This code is safe as long as you access memory with p[i], where 0 <= i <= sizeof(int)-1.
Unsafe:
void f(int *p)
{
    int r, w = 0;
    r = p[0]; // read access
    p[0] = w; // write access
}
...
char x[sizeof(int)] = {0};
f((int *)&x); // the cast is only there to avoid the compiler warning
This code is unsafe because, although the allocated variable is large enough to accommodate an int, its address in memory is not necessarily suitably aligned for int. As a result, unless the compiler (and the underlying hardware architecture) supports unaligned load/store operations, a memory access violation will occur at runtime if the address of this variable is indeed not properly aligned.
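The usual portable fix for the unsafe case is to avoid the misaligned int * entirely and copy the bytes instead; a sketch (read_int_safely is just an illustrative name):
#include <string.h>
int read_int_safely(const char *buf) /* buf may have any alignment */
{
    int v;
    memcpy(&v, buf, sizeof v); /* byte-wise copy imposes no alignment requirement on buf */
    return v;
}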