Dereferencing null pointer valid and working (not sizeof) [duplicate] - c

Code sample:
struct name
{
int a, b;
};
int main()
{
&(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)

From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.
From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.
As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.
My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.

Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.

Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is converted to a null pointer constant.
But the value of a null pointer constant is not defined as 0. The value is implementation defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior
Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.

First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.

Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.

No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.
In fact, the same approach is used on the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).

Related

What does this statement about pointer operators mean? "& can be used only with a variable, * can be used with variable, constant or expression."

'Address of' operator gives memory location of variables. So it can be used with variables.
I tried compiling this code.
#include<stdio.h>
int main()
{
int i=889,*j,*k;
j=&889;
k=*6422296;
printf("%d\n",j);
return 0;
}
It showed this error error: lvalue required as unary '&' operand for j=&889.
And I was expecting this error: invalid type argument of unary '*' (have 'int')| for k=*6422296.
6422296 is the memory location of variable i.
Can someone give examples of when '*' is used with constants and expressions?
P.S:- I have not yet seen any need for this But....
All constants in a program are also assigned some memory. Is it possible to determine their address with &? (Just wondering).
An expression that is a value (rvalue in C idom) may not represent a variable with a defined lifetime and for that reason you cannot take its address.
In the opposite direction, it is legal (and common) to dereference an expression:
int a[] = {1,2,3}
int *pt = a + 1; // pt points to the second element of the array
inf first = *(a - 1); // perfectly legal C
Dereferencing a constant is not common in C code. It only makes sense when dealing directly with the hardware, that is in kernel mode, or when programming for some embedded systems. Then you can have special registers that are mapped at well known addresses.
first_byte_of_screen = *((char *) 0xC0000); // may remember things to old MS/DOS programmers
But best practices would recommed to define a constant
#define SCREEN ((unsigned char *) 0xC0000)
first_byte = *SCREEN; // or even SCREEN[0] because it is the same thing
k=*6422296 means go to address number 6422296, read the content inside it and assign it to k, which is completely valid.
j=&889 means get me the address of 889, 889 is an rvalue, it's a temporary that theoretically only exists temporarily in the CPU registers, and might never even get stored in the memory, so asking for it's memory address makes no sense.
All constants in a program are also assigned some memory.
That's not necessarily the case for numeric literals; they're often hardcoded into the machine code instructions with no storage allocated for them.
A good rule of thumb is that anything that can't be the target of an assignment (such as a numeric literal) cannot be the operand of the unary & operator1.
6.5.3.2 Address and indirection operators
Constraints
1 The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
2 The operand of the unary * operator shall have pointer type.
C 2011 Online Draft
*6422296 "works" (as in, doesn't result in a diagnostic from the compiler) because integer expressions can be converted to pointers:
6.3.2.3 Pointers
...
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
Like all rules of thumb, there are exceptions; array expressions are lvalues, but cannot be the target of an assignment; if you declare an array like int a[10];, then you can't reassign a such as a = some_other_array_expression ;. Such lvalues are known as non-modifiable lvalues.

Does the C standard permit assigning an arbitrary value to a pointer and incrementing it?

Is the behaviour of this code well defined?
#include <stdio.h>
#include <stdint.h>
int main(void)
{
void *ptr = (char *)0x01;
size_t val;
ptr = (char *)ptr + 1;
val = (size_t)(uintptr_t)ptr;
printf("%zu\n", val);
return 0;
}
I mean, can we assign some fixed number to a pointer and increment it even if it is pointing to some random address? (I know that you can not dereference it)
The assignment:
void *ptr = (char *)0x01;
Is implementation defined behavior because it is converting an integer to a pointer. This is detailed in section 6.3.2.3 of the C standard regarding Pointers:
5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined,
might not be correctly aligned, might not point to an entity
of the referenced type, and might be a trap representation.
As for the subsequent pointer arithmetic:
ptr = (char *)ptr + 1;
This is dependent on a few things.
First, the current value of ptr may be a trap representation as per 6.3.2.3 above. If it is, the behavior is undefined.
Next is the question of whether 0x1 points to a valid object. Adding a pointer and an integer is only valid if both the pointer operand and the result point to elements of an array object (a single object counts as an array of size 1) or one element past the array object. This is detailed in section 6.5.6:
7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a
pointer to the first element of an array of length one with the type
of the object as its element type
8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N (equivalently, N+(P) ) and (P)-N (where N has the value n ) point to,
respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last element of
the array object, and if the expression Q points one past the
last element of an array object, the expression (Q)-1 points to
the last element of the array object. If both the pointer
operand and the result point to elements of the same array
object, or one past the last element of the array object, the
evaluation shall not produce an overflow; otherwise, the behavior is
undefined. If the result points one past the last element of the
array object, it shall not be used as the operand of a unary
* operator that is evaluated.
On a hosted implementation the value 0x1 almost certainly does not point to a valid object, in which case the addition is undefined. An embedded implementation could however support setting pointers to specific values, and if so it could be the case that 0x1 does in fact point to a valid object. If so, the behavior is well defined, otherwise it is undefined.
No, the behaviour of this program is undefined. Once an undefined construct is reached in a program, any future behaviour is undefined. Paradoxically, any past behaviour is undefined too.
The result of void *ptr = (char*)0x01; is implementation-defined, due in part to the fact that a char can have a trap representation.
But the behaviour of the ensuing pointer arithmetic in the statement ptr = (char *)ptr + 1; is undefined. This is because pointer arithmetic is only valid within arrays including one past the end of the array. For this purpose an object is an array of length one.
Yes, the code is well-defined as implementation-defined. It is not undefined. See ISO/IEC 9899:2011 [6.3.2.3]/5 and note 67.
The C language was originally created as a system programming language. Systems programming required manipulating memory-mapped hardware, requiring that you would stuff hard-coded addresses into pointers, sometimes increment those pointers, and read and write data from and to the resulting address. To that end, assigning and integer to a pointer and manipulating that pointer using arithmetic is well defined by the language. By making it implementation-defined, what the language allows is that all kinds of things can happen: from the classic halt-and-catch-fire to raising a bus error when trying to dereference an odd address.
The difference between undefined behaviour and implementation-defined behaviour is basically undefined behaviour means "don't do that, we don't know what will happen" and implementation-defined behaviour means "it's OK to go ahead and do that, it's up to you to know what will happen."
It is undefined behavior.
From N1570 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
If the value is a trap representation, reading it is undefined behavior:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.) Such a representation is called a trap representation.
And
An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator).
Therefore, the line void *ptr = (char *)0x01; is already potentially undefined behavior, on an implementation where (char*)0x01 or (void*)(char*)0x01 is a trap representation. The left-hand side is an lvalue expression that does not have character type and reads a trap representation.
On some hardware, loading an invalid pointer into a machine register could crash the program, so this was a forced move by the standards committee.
The Standard does not require that implementations process integer-to-pointer conversions in a meaningful fashion for any particular integer values, or even for any possible integer values other than Null Pointer Constants. The only thing it guarantees about such conversions is that a program which stores the result of such a conversion directly into an object of suitable pointer type and does nothing with it except examine the bytes of that object will, at worst, see Unspecified values. While the behavior of converting an integer to a pointer is Implementation-Defined, nothing would forbid any implementation (no matter what it actually does with such conversions!) from specifying that some (or even all) of the bytes of the representation having Unspecified values, and specifying that some (or even all) integer values may behave as though they yield trap representations.
The only reasons the Standard says anything at all about integer-to-pointer conversions are that:
In some implementations, the construct is meaningful, and some programs for those implementations require it.
The authors of the Standard did not like the idea of a construct that was used on some implementations would represent a constraint violation on others.
It would have been odd for the Standard to describe a construct but then specify that it has Undefined Behavior in all cases.
Personally, I think the Standard should have allowed implementations to treat integer-to-pointer conversions as constraint violations if they don't define any situations where they would be useful, rather than require that compilers accept the meaningless code, but that wasn't the philosophy at the time.
I think it would be simplest to simply say that any operation involving integer-to-pointer conversions with anything other than intptr_t or uintptr_t values received from pointer-to-integer conversions invokes Undefined Behavior, but then note that it is common for quality implementations intended for low-level programming to process Undefined Behavior "in a documented manner characteristic of the environment". The Standard doesn't specify when implementations should process programs that invoke UB in that fashion but instead treats it as a Quality of Implementation issue.
If an implementation specifies that integer-to-pointer conversions operate in a fashion that would define the behavior of
char *p = (char*)1;
p++;
as equivalent to "char p = (char)2;", then the implementation should be expected to work that way. On the other hand, an implementation could define the behavior of integer-to-pointer conversion in such a way that even:
char *p = (char*)1;
char *q = p; // Not doing any arithmetic here--just a simple assignment
would release nasal demons. On most platforms, a compiler where arithmetic on pointers produced by integer-to-pointer conversions behaved oddly would not be viewed as a high-quality implementation suitable for low-level programming. A programmer that is not intending to target any other kind of implementations could thus expect such constructs to behave usefully on compilers for which the code was suitable, even though the Standard does not require it.

Is there a reason to use C language pointers like this &(*foo)?

I use to code pointers like this when I need to change the original memory address of a pointer.
Example:
static void get_line_func(struct data_s *data,
char **begin)
{
data->slot_number = strsep(&(*(begin)), "/");
data->protocol = *begin;
strsep(&(*begin), ">");
data->service_name = strsep(&(*begin), "\n");
}
I mean, isn't &(*foo) == foo?
There is no reason to do that directly. However, the combination can arise in machine-generated code (such as the expansion of a preprocessor macro).
For instance, suppose we have a macro do_something_to(obj) which expects the argument expression obj to designate an object. Suppose somewhere in its expansion, this macro takes the address of the object using &(obj). Now suppose we would like to apply the macro to an object which we only hold via a pointer ptr. To designate the object, we must use the expression *ptr so that we use the macro as do_something_to(*ptr). That of course means that&(*ptr) now occurs in the program.
The status of the expression &*ptr has changed over the years. I seem to remember that in the ANSI C 89 / ISO C90 dialect, the expression produced undefined behavior if ptr was an invalid pointer.
In ISO C11 the following is spelled out (and I believe nearly the same text is in C99), requiring &* not to dereference the pointer: "if the operand [of the address-of unary & operator] is the result of a unary * operator,
neither that operator nor the & operator is evaluated and the result is as if both were
omitted, except that the constraints on the operators still apply and the result is not an lvalue". Thus in the modern C dialect, the expression &*ptr doesn't dereference ptr, hence has defined behavior even if that value is null.
What does that mean? "constraints still apply" basically means that it still has to type check. Just because &*P doesn't dereference P doesn't mean that P can be a double or a struct; it has to be a pointer.
The "result is not an lvalue" part is potentially useful. If we have a pointer P which is an value, if we wrap it in the expression &*P, we obtain the same pointer value as a non-lvalue. There are other ways to obtain the value of P as a non-lvalue, but &*P is a "code golfed" solution to the problem requiring only two characters, and having the property that it will remain correct even if P changes from one pointer type to another.

Cancellation of *& in ANSI C

With some friends we discuss about the corectness of this simple following code in ANSI C.
#include <stdio.h>
int main(void)
{
int a=2;
printf("%d", *&a);
return 0;
}
The main discuss is about *&. If * access a memory location (aka pointer) and & gives memory address of some variable... I think * tries to access a int value as memory adress ( that obviously don't work), but my friend says *& it cancels automatically ( or interpret as &*). We tested it with GCC 4.8.1 (MinGW) and the code avobe worked it well... I think was not right.
What do you think about? Think there's a bad workaround here ( or this is just stupidity?). Thanks in advice :)
a is an lvalue: the variable a.
&a is a pointer to this lvalue.
*&a is the lvalue being pointed to by &a — that is, it is a.
Technically speaking, *&a and a are not completely equivalent in all cases, in that *&a is not permitted in all circumstances where a is (for example, if a is declared as register), but in your example, they are completely the same.
There is an interesting excerpt from C standard (as a footnote), namely C11 §6.5.3.2/4 (footnote 102, emphasis mine), which discusses this aspect directly:
Thus, &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to ((E1)+(E2)). It is always true that if E is a function
designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P
is an lvalue and T is the name of an object pointer type, *(T)P is an
lvalue that has a type compatible with that to which T points.
In your case a is a (modifiable) lvalue, that reflects to E symbol from standard and it's valid operand of the & operator as standard requires, thus *&a (i.e. *&E) is an lvalue equal to a.
Note that you can't take address of register storage class variable (as pointed by #Deduplicator), so it does not qualify into such reduction (that is, even as modifiable lvalue).
So long as *&a is meaningful, then *&a and a are the same thing and are interchangeable.
In general, *&a is the same as a.
Still, there are corner-cases:
*& may be invalid because &a is not allowed, as it is not an lvalue or it is of register-storage-class.
Using a may be Undefined Behavior, because a is an uninitialized memory-less variable (register or auto-storage-class which might have been declared register (address was never taken)).
Applying that to your case, leaving out *& does not change anything.

Can a conforming C implementation #define NULL to be something wacky

I'm asking because of the discussion that's been provoked in this thread.
Trying to have a serious back-and-forth discussion using comments under other people's replies is not easy or fun. So I'd like to hear what our C experts think without being restricted to 500 characters at a time.
The C standard has precious few words to say about NULL and null pointer constants. There's only two relevant sections that I can find. First:
3.2.2.3 Pointers
An integral constant expression with
the value 0, or such an expression
cast to type void * , is called a null
pointer constant. If a null pointer
constant is assigned to or compared
for equality to a pointer, the
constant is converted to a pointer of
that type. Such a pointer, called a
null pointer, is guaranteed to compare
unequal to a pointer to any object or
function.
and second:
4.1.5 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant;
The question is, can NULL expand to an implementation-defined null pointer constant that is different from the ones enumerated in 3.2.2.3?
In particular, could it be defined as:
#define NULL __builtin_magic_null_pointer
Or even:
#define NULL ((void*)-1)
My reading of 3.2.2.3 is that it specifies that an integral constant expression of 0, and an integral constant expression of 0 cast to type void* must be among the forms of null pointer constant that the implementation recognizes, but that it isn't meant to be an exhaustive list. I believe that the implementation is free to recognize other source constructs as null pointer constants, so long as no other rules are broken.
So for example, it is provable that
#define NULL (-1)
is not a legal definition, because in
if (NULL)
do_stuff();
do_stuff() must not be called, whereas with
if (-1)
do_stuff();
do_stuff() must be called; since they are equivalent, this cannot be a legal definition of NULL.
But the standard says that integer-to-pointer conversions (and vice-versa) are implementation-defined, therefore it could define the conversion of -1 to a pointer as a conversion that produces a null pointer. In which case
if ((void*)-1)
would evaluate to false, and all would be well.
So what do other people think?
I'd ask for everybody to especially keep in mind the "as-if" rule described in 2.1.2.3 Program execution. It's huge and somewhat roundabout, so I won't paste it here, but it essentially says that an implementation merely has to produce the same observable side-effects as are required of the abstract machine described by the standard. It says that any optimizations, transformations, or whatever else the compiler wants to do to your program are perfectly legal so long as the observable side-effects of the program aren't changed by them.
So if you are looking to prove that a particular definition of NULL cannot be legal, you'll need to come up with a program that can prove it. Either one like mine that blatantly breaks other clauses in the standard, or one that can legally detect whatever magic the compiler has to do to make the strange NULL definition work.
Steve Jessop found an example of way for a program to detect that NULL isn't defined to be one of the two forms of null pointer constants in 3.2.2.3, which is to stringize the constant:
#define stringize_helper(x) #x
#define stringize(x) stringize_helper(x)
Using this macro, one could
puts(stringize(NULL));
and "detect" that NULL does not expand to one of the forms in 3.2.2.3. Is that enough to render other definitions illegal? I just don't know.
Thanks!
In the C99 standard, §7.17.3 states that NULL “expands to an implementation defined null pointer constant”. Meanwhile §6.3.2.3.3 defines null pointer constant as “an integer constant expression with the value 0, or such an expression cast to type void *”. As there is no other definition for a null pointer constant, a conforming definition of NULL must expand to an integer constant expression with the value zero (or this cast to void *).
Further quoting from the C FAQ question 5.5 (emphasis added):
Section 4.1.5 of the C Standard states that NULL “expands to an implementation-defined null pointer constant,” which means that the implementation gets to choose which form of 0 to use and whether to use a `void *` cast; see questions 5.6 and 5.7. “Implementation-defined” here does not mean that NULL might be #defined to match some implementation-specific nonzero internal null pointer value.
It makes perfect sense; since the standard requires a zero integer constant in pointer contexts to compile into a null pointer (regardless of whether or not the machine's internal representation of that has a value of zero), the case where NULL is defined as zero must be handled anyhow. The programmer is not required to type NULL to obtain null pointers; it's just a stylistic convention (and may help catch errors e.g. when a NULL defined as (void *)0 is used in a non-pointer context).
Edit: One source of confusion here seems to be the concise language used by the standard, i.e. it does not explicitly say that there is no other value that might be considered a null pointer constant. However, when the standard says “…is called a null pointer constant”, it means that exactly the given definitions are called null pointer constants. It does not need to explicitly follow every definition by stating what is non-conforming when (by definition) the standard defines what is conforming.
This expands a bit on some of the other answers and makes some points that the others missed.
Citations are to N1570 a draft of the 2011 ISO C standard. I don't believe there have been any significant changes in this area since the 1989 ANSI C standard (which is equivalent to the 1990 ISO C standard). A reference like "7.19p3" refers to subsection 7.19, paragraph 3. (The citations in the question seem to be to the 1989 ANSI standard, which described the language in section 3 and the library in section 4. All editions of the ISO standard describe the language in section 6 and the library in section 7.)
7.19p3 requires the macro NULL to expand to "an implementation-defined null pointer constant".
6.3.2.3p3 says:
An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant.
Since null pointer constant is in italics, that's the definition of the term (3p1 specifies that convention) -- which implies that nothing other that what's specified there can be a null pointer constant. (The standard doesn't always strictly follow that convention for its definitions, but there's no problem assuming that it does so in this case.)
So if we want "something wacky", we need to look at what can be an "integer constant expression".
The phrase null pointer constant needs to be taken as a single term, not as a phrase whose meaning depends on its constituent words. In particular, the integer constant 0 is a null pointer constant, regardless of the context in which it appears; it needn't result in a null pointer value, and it's of type int, not of any pointer type.
An "integer constant expression with the value 0" can be any of a number of things (infinitely many if we ignore capacity limits). A literal 0 is the most obvious. Other possibilities are 0x0, 00000, 1-1, '\0', and '-'-'-'. (It's not 100% clear from the wording whether "the value 0" refers specifically to that value of type int, but I think the consensus is that 0L is also a valid null pointer constant.)
Another relevant clause is 6.6p10:
An implementation may accept other forms of constant expressions.
It's not entirely clear (to me) how much latitude this is intended to allow. For example, a compiler might support binary literals as an extension; then 0b0 would be a valid null pointer constant. It might also allow C++-style references to const objects, so that given
const int x = 0;
a reference to x could be a constant expression (it isn't in standard C).
So it's clear that 0 is a null pointer constant, and that it's a valid definition for the NULL macro.
It's equally clear that (void*)0 is a null pointer constant, but it's not a valid definition for NULL, because of 7.1.2p5:
Any definition of an object-like macro described in this clause shall
expand to code that is fully protected by parentheses where necessary,
so that it groups in an arbitrary expression as if it were a single
identifier.
If NULL expanded to (void*)0, then the expression sizeof NULL would be a syntax error.
So what about ((void*)0)? Well, I'm 99.9% sure that it's intended to be a valid definition for NULL, but 6.5.1, which describes parenthesized expressions, says:
A parenthesized expression is a primary expression. Its type and value
are identical to those of the unparenthesized expression. It is an
lvalue, a function designator, or a void expression if the
unparenthesized expression is, respectively, an lvalue, a function
designator, or a void expression.
It doesn't say that a parenthesized null pointer constant is a null pointer constant. Still, as far as I know all C compilers reasonably assume that a parenthesized null pointer constant is a null pointer constant, making ((void*)0) a valid definition for NULL.
What if a null pointer is represented not as all-bits-zero, but as some other bit pattern, for example, one equivalent to 0xFFFFFFFF. Then (void*)0xFFFFFFFF, even if it happens to evaluate to a null pointer is not a null pointer constant, simply because it doesn't satisfy the definition of that term.
So what other variations are permitted by the standard?
Since implementations may accept other forms of constant expression, a compiler could define __null as a constant expression of type int with the value 0, allowing either __null or ((void*)__null) as the definition of NULL. It could make also __null itself a constant of pointer type, but it couldn't then use __null as the definition of NULL, since it doesn't satisfy the definition in 6.3.2.3p3.
An implementation could accomplish the same thing, with no compiler magic, like this:
enum { __null };
#define NULL __null
Here __null is an integer constant expression of type int with the value 0, so it can be used anywhere a constant 0 can be used.
The advantage defining NULL in terms of a symbol like __null is that the compiler could then issue a (perhaps optional) warning if NULL is used in a non-pointer constant. For example, this:
char c = NULL; /* PLEASE DON'T DO THIS */
is perfectly legal if NULL happens to be defined as 0; expanding NULL to some recognizable token like __null would make it easier for the compiler to detect this questionable construct.
Ages later, but no one brought up this point: Suppose that the implementation does in fact choose to use
#define NULL __builtin_null
My reading of C99 is that that's fine as long as the special keyword __builtin_null behaves as-if it were either "an integral constant expression with value 0" or "an integral constant expression with value 0, cast to void *". In particular, if the implementation chooses the former of those options, then
int x = __builtin_null;
int y = __builtin_null + 1;
is a valid translation unit, setting x and y to the integer values 0 and 1 respectively. If it chooses the latter, of course, both are constraint violations (6.5.16.1, 6.5.6 respectively; void * is not a "pointer to an object type" per 6.2.5p19; 6.7.8p11 applies the constraints for assignment to initialization). And I don't offhand see why an implementation would do this if not to provide better diagnostics for "misuse" of NULL, so it seems likely that it would take the option that invalidated more code.
Well, I've found a way to prove that
#define NULL ((void*)-1)
is not a legal definition of NULL.
int main(void)
{
void (*fp)() = NULL;
}
Initializing a function pointer with NULL is legal and correct, whereas...
int main(void)
{
void (*fp)() = (void*)-1;
}
...is a constraint violation that requires a diagnostic. So that's out.
But the __builtin_magic_null_pointer definition of NULL wouldn't suffer that problem. I'd still like to know if anybody can come up with a reason why it can't be.
An integral constant expression with
the value 0, or such an expression
cast to type void * , is called a
null pointer constant.
NULL which expands to an
implementation-defined null pointer
constant;
therefore either
NULL == 0
or
NULL == (void *)0
The null pointer constant must evaluate 0, otherwise expressions like !ptr would not work as expected.
The NULL macro expands to a 0-valued expression; AFAIK, it always has.

Resources