Related
According to §6.3.2.3 ¶3 of the C11 standard, a null pointer constant in C can be defined by an implementation as either the integer constant expression 0 or such an expression cast to void *. In C the null pointer constant is defined by the NULL macro.
My implementation (GCC 9.4.0) defines NULL in stddef.h in the following ways:
#define NULL ((void *)0)
#define NULL 0
Why are both of the above expressions considered semantically equivalent in the context of NULL? More specifically, why do there exist two ways of expressing the same concept rather than one?
Let's consider this example code:
#include <stddef.h>
int *f(void) { return NULL; }
int g(int x) { return x == NULL ? 3 : 4; }
We want f to compile without warnings, and we want g to cause an error or a warning (because an int variable x was compared to a pointer).
In C, #define NULL ((void*)0) gives us both (GCC warning for g, clean compile for f).
However, in C++, #define NULL ((void*)0) causes a compile error for f. Thus, to make it compile in C++, <stddef.h> has #define NULL 0 for C++ only (not for C). Unfortunately, this also prevents the warning from being reported for g. To fix that, C++11 uses built-in nullptr instead of NULL, and with that, C++ compilers report an error for g, and they compile f cleanly.
((void *)0) has stronger typing and could lead to better compiler or static analyser diagnostics. For example since implicit conversions between pointers and plain integers aren't allowed in standard C.
0 is likely allowed for historical reasons, from a pre-standard time when everything in C was pretty much just integers and wild implicit conversions between pointers and integers were allowed, though possibly resulting in undefined behavior.
Ancient K&R 1st edition provides some insight (7.14 the assignment operator):
The compilers currently allow a pointer to be assigned to an integer, an integer to a pointer, and a pointer to a pointer of another type. The assignment is a pure copy operation, with no conversion. This usage is nonportable, and may produce pointers which cause addressing exceptions when used. However, it is guaranteed that assignment of the constant 0 to a pointer will produce a null pointer distinguishable from a pointer to any object.
Few things in C are more confusing than null pointers. The C FAQ list devotes an entire section to the topic, and to the myriad misunderstandings that eternally arise. And we can see that those misunderstandings never go away, as some of them are being recycled even in this thread, in 2022.
The basic facts are these:
C has the concept of a null pointer, a distinguished pointer value which points definitively nowhere.
The source code construct by which a null pointer is requested — a null pointer constant — fundamentally involves the token 0.
Because the token 0 has other uses, ambiguity (not to mention confusion) is possible.
To help reduce the confusion and ambiguity, for many years the token 0 as a null pointer constant has been hidden behind the preprocessor macro NULL.
To provide some type safety and further reduce errors, it's attractive to have the macro definition of NULL include a pointer cast.
However, and most unfortunately, enough confusion crept in along the way that properly mitigating it all has become almost impossible. In particular, there is so very much extant code that says things like strbuf[len] = NULL; (in an obvious but basically wrong attempt to null-terminate a string) that it is believed in some circles to be impossible to actually define NULL with an expansion including either the explicit cast or the hypothetical future (or extant in C++) new keyword nullptr.
See also Why not call nullptr NULL?
Footnote (call this point 3½): It's also possible for a null pointer — despite being represented in C source code as an integer constant 0 — to have an internal value that is not all-bits-0. This fact adds massively to the confusion whenever this topic is discussed, but it doesn't fundamentally change the definition.
There is just one way to express NULL in C, it's a single 4-character token.
But hold on, when going into its definition it gets more interesting.
NULL has to be defined as a null pointer constant, meaning an integer constant with value 0 or such cast to void*.
As an integer constant is just an expression of integer type with a few restrictions to guarantee static evaluation, there are infinite possibilities for any wanted value.
Of all those possibilities, only an integer literal with value 0 is also a null pointer constant in C++, for what it's worth.
The reason for such variation is history and precedent (everyone did it differently, void* was late to the party, and existing code/implementations trumps all), reinforced with backwards-compatibility which preserves it.
6.3.2.3 Pointers
[...]
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.
67) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
[...]
6.6 Constant expressions
[...]
Description
2 A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.
Constraints
3 Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.117)
4 Each constant expression shall evaluate to a constant that is in the range of representable values for its type.
Semantics
5 An expression that evaluates to a constant is required in several contexts. If a floating expression is evaluated in the translation environment, the arithmetic range and precision shall be at least as
great as if the expression were being evaluated in the execution environment.118)
6 An integer constant expression119) shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts.
Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
C was originally developed on machines where a null pointer constant and the integer constant 0 had the same representation. Later, some vendors ported the language to mainframes where a different special value triggered a hardware trap when used as a pointer, and wanted to use that value for NULL. These companies discovered that so much existing code type-punned between integers and pointers, they had to recognize 0 as a special constant that could implicitly convert to a null pointer constant. ANSI C incorporated this behavior, at the same time as they introduced the void* as a pointer that implicitly converts to any type of object pointer. This allowed NULL to be used as a safer alternative to 0.
I’ve seen some code that (possibly tongue-in-cheek) detected one of these machines by testing if ((char*)1 == 0).
why do there exist two ways of expressing the same concept rather than one?
History.
NULL started as 0 and later better programming practices encouraged ((void *)0).
First, there are more than 2 ways:
#define NULL ((void *)0)
#define NULL 0
#define NULL 0L
#define NULL 0LL
#define NULL 0u
...
Before void * (Pre C89)
Before void * and void existed, #define NULL some_integer_type_of_zero was used.
It was useful to have the size of that integer type to match the size of object pointers. Consider the below. With 16-bit int and 32-bit long, it is useful for the type of zero used to match the width of an object pointer.
Consider printing pointers.
double x;
printf("%ld\n", &x); // On systems where an object pointer was same size as long
printf("%ld\n", NULL);// Would like to use the same specifier for NULL
With 32-bit object pointers, #define NULL 0L is better.
double x;
printf("%d\n", &x); // On systems where an object pointer was same size as int
printf("%d\n", NULL);// Would like to use the same specifier for NULL
With 16-bit object pointers, #define NULL 0 is better.
C89
After the birth of void, void *, it is natural to have the null pointer constant to be a pointer type. This allowed the bit pattern of (void*)0) to be non-zero. This was useful in some architectures.
printf("%p\n", NULL);
With 16-bit object pointers, #define NULL ((void*)0) works above.
With 32-bit object pointers, #define NULL ((void*)0) works.
With 64-bit object pointers, #define NULL ((void*)0) works.
With 16-bit int, #define NULL ((void*)0) works.
With 32-bit int, #define NULL ((void*)0) works.
We now have independence of the int/long/object pointer size. ((void*)0) works in all cases.
Using #define NULL 0 creates issues when passing NULL as a ... argument, hence the irksome need to do printf("%p\n", (void*)NULL); for highly portable code.
With #define NULL ((void*)0), code like char n = NULL; will more likely raise a warning, unlike ``#define NULL 0`
C99
With the advent of _Generic, we can distinguish, for better or worse, NULL as a void *, int, long, ...
According to §6.3.2.3 ¶3 of the C11 standard, a null pointer constant in C can be defined by an implementation as either the integer constant expression 0 or such an expression cast to void *.
No, that a misleading paraphrase of the language spec. The actual language of the cited paragraph is
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. [...]
Implementations don't get to choose between those alternatives. Both are forms of a null pointer constant in the C language. They can be used interchangeably for the purpose.
Moreover, not only the specific integer constant expression 0 can serve in this role, but any integer constant expression with value 0 can do. For example, 1 + 2 + 3 + 4 - 10 is such an expression.
Additionally, do not confuse null pointer constants generally with the macro NULL. The latter is defined by conforming implementations to expand to a null pointer constant, but that doesn't mean that the replacement text of NULL is the only null pointer constant.
My implementation (GCC 9.4.0) defines NULL in stddef.h in the
following ways:
#define NULL ((void *)0)
#define NULL 0
Not both at the same time, of course.
Why are both of the above expressions considered semantically
equivalent in the context of NULL?
Again with the reversal. It's not "the context of NULL". It's pointer context. There is nothing particularly special about the macro NULL itself to distinguish contexts in which it appears from contexts where its replacement text appears directly.
And I guess you're asking for rationale for paragraph 6.3.2.3/3, as opposed to "because 6.3.2.3/3". There is no published rationale for C11. There is one for C99, which largely serves for C90 as well, but it does not address this issue.
It should be noted, however, that void (and therefore void *) was an invention of the committee that developed the original C language specification ("ANSI C" / C89 / C90). There was no possibility of an "integer constant expression cast to type void *" before then.
More specifically, why do there
exist two ways of expressing the same concept rather than one?
Are there, really?
If we accept an integer constant expression with value 0 as a null pointer constant (a source-code entity), and we want to convert it to a runtime null pointer value, then which pointer type do we choose? Pointers to different object types do not necessarily have the same representation, so this actually matters. Type void * seems the natural choice to me, and that's consistent with the fact that, alone of all pointer types, void * can be converted to other object pointer types without a cast.
But then, in a context where 0 is being interpreted as a null pointer constant, casting it to void * is a no-op, so (void *) 0 expresses exactly the same thing as 0 in such a context.
What's really going on here
At the time the ANSI committee was working, many existing C implementations accepted integer-to-pointer conversions without a cast, and although the meaning of most such conversions was implementation and / or context specific, there was wide acceptance that converting constant 0 to a pointer yielded a null pointer. That use was by far the most common one of converting an integer constant to a pointer. The committee wanted to impose stricter rules on type conversions, but it did not want to break all the existing code that used 0 as a constant representing a null pointer.
So they hacked the spec.
They invented a special kind of constant, the null pointer constant, and provided rules around it that made it compatible with existing use. A null pointer constant, regardless of lexical form, can be implicitly converted to any pointer type, yielding a null pointer (value) of that type. Otherwise, no implicit integer-to-pointer conversions are defined.
But the committee preferred that null pointer constants should actually have pointer type without conversion (which 0 does not, pointer context or no), so they provided for the "cast to type void *" option as part of the definition of a null pointer constant. At the time, that was a forward-looking move, but the general consensus now appears to be that it was the right direction to aim.
And why do we still have the "integer constant expression with value 0"? Backwards compatibility. Consistency with conventional idioms such as {0} as a universal initializer for objects of any type. Resistance to change. Perhaps other reasons as well.
The "why" - it is for historical reasons. NULL was used in various implementations before it was added to a standard. And at the time it was added to a C standard, implementations defined NULL usually as 0, or as 0 cast to some pointer. At that point you wouldn't want to make one of them illegal, because whichever you made illegal, you'd break half the existing code.
The C11 standard allows for a null pointer constant to be defined either as the integer constant expression 0 or as an expression that is cast to void *. The use of the NULL macro makes it easier for programmers to use the null pointer constant in their code, as they don't have to remember which of these definitions the implementation uses.
Using a macro also makes it easier to change the underlying definition of the null pointer constant in the future, if necessary. For example, if the implementation decided to change the definition of NULL to be a different integer constant expression, they could do so by simply modifying the definition of the NULL macro. This would not require any changes to the code that uses the NULL macro, as long as the code uses the NULL macro consistently.
There are two definitions of the NULL macro provided in the example you gave because some systems may define NULL as an expression that is cast to void *, while others may define it as the integer constant expression 0. By providing both definitions, the stddef.h header can be used on a wide range of systems without requiring any modifications.
I included a check for nullptr in a line of C code. The compiler (gcc) complained when using -std=c17 as well as -std=gnu17.
Is there such a thing as nullptr (or equivalent) in modern C standards? (C11, C17)
If not, then why?
No, C still uses NULL for a null pointer.
C++ needs a dedicated null pointer literal because it has overloading and template type deduction. These features get confused by NULL, which expands to 0 (or something like that) in C++. However, the risk of such confusion in C is small (maybe _Generic can get confused by this), and in addition, C can use (void*)0 for a null pointer, which mitigates this risk even more.
The closest thing to C++'s nullptr is C's NULL. Which may be
an integer constant expression with the value 0,
an integer constant expression with the value 0 cast to the type void*.
A null pointer constant may be converted to any pointer type; such conversion results in the null pointer value of that type.
The formal C17 specifications state that the stddef.h header defines NULL "which expands to an implementation-defined null pointer constant." (7.19)
A null pointer constant is defined as follows (6.3.2.3)
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
Conversion of a null pointer to another pointer type yields a null pointer of that type. Any two null
pointers shall compare equal.
Note that this makes the following program ambiguous, as NULL could be an integer constant expression (accepted by the function) or of the type void* (not accepted by the function).
#include <stdio.h>
void printInt(int n)
{
printf("%d\n", n);
}
int main(void)
{
printInt(NULL);
}
Which is why nullptr was introduced in C++11. For C, having no function overloading or type deduction, this is less of an issue.
A null pointer in C is a pointer object pointing at "null". You can turn a pointer into a null pointer by assigning it to a null pointer constant. Valid null pointer constants are 0 and (void*)0. The macro NULL is guaranteed to be a null pointer constant.
The internal representation of the pointer then becomes a "null pointer", which could in theory point at an address different from zero on some exotic system. Similarly, NULL could in theory expand to something different from zero in old, pre-standard C.
When creating C++, Bjarne Stroustrup found all of this to be needlessly complex and decided that "NULL is 0" (source: https://www.stroustrup.com/bs_faq2.html#null). Notably C++ was created long before the first standardization of C, so his arguments are less relevant to standard C than they were to pre-standard C.
For more info about null pointers vs NULL in C, see What's the difference between null pointers and NULL?
ISO C 23 now has nullptr, as well as the nullptr_t type. The proposal that introduced it has some rationale.
Code sample:
struct name
{
int a, b;
};
int main()
{
&(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)
From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.
From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.
As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.
My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.
Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.
Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is converted to a null pointer constant.
But the value of a null pointer constant is not defined as 0. The value is implementation defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior
Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.
First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.
Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.
No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.
In fact, the same approach is used on the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).
NULL appears to be zero in my GCC test programs, but wikipedia says that NULL is only required to point to unaddressable memory.
Do any compilers make NULL non-zero? I'm curious whether if (ptr == NULL) is better practice than if (!ptr).
NULL is guaranteed to be zero, perhaps casted to (void *)1.
C99, §6.3.2.3, ¶3
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.(55) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
And note 55 says:
55) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant.
Notice that, because of how the rules for null pointers are formulated, the value you use to assign/compare null pointers is guaranteed to be zero, but the bit pattern actually stored inside the pointer can be any other thing (but AFAIK only few very esoteric platforms exploited this fact, and this should not be a problem anyway since to "see" the underlying bit pattern you should go into UB-land anyway).
So, as far as the standard is concerned, the two forms are equivalent (!ptr is equivalent to ptr==0 due to §6.5.3.3 ¶5, and ptr==0 is equivalent to ptr==NULL); if(!ptr) is also quite idiomatic.
That being said, I usually write explicitly if(ptr==NULL) instead of if(!ptr) to make it extra clear that I'm checking a pointer for nullity instead of some boolean value.
Notice that in C++ the void * cast cannot be present due to the stricter implicit casting rules that would make the usage of such NULL cumbersome (you would have to explicitly convert it to the compared pointer's type every time).
From the language standard:
6.3.2.3 Pointers
...
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.55) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
...
55) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.17.
Given that language, the macro NULL should evaluate to a zero-valued expression (either an undecorated literal 0, an expression like (void *) 0, or another macro or expression that ultimately evaluates to 0). The expressions ptr == NULL and !ptr should be equivalent. The second form tends to be more idiomatic C code.
Note that the null pointer value doesn't have to be 0. The underlying implementation may use any value it wants to represent a null pointer. As far as your source code is concerned, however, a zero-valued pointer expression represents a null pointer.
In practice is the same, but NULL is different to zero. Since zero means there's a value and NULL means there isn't any. So, theoretically they are different, NULL having a different meaning and in some cases that difference should be of some use.
in practice no, !ptr is correct
I'm asking because of the discussion that's been provoked in this thread.
Trying to have a serious back-and-forth discussion using comments under other people's replies is not easy or fun. So I'd like to hear what our C experts think without being restricted to 500 characters at a time.
The C standard has precious few words to say about NULL and null pointer constants. There's only two relevant sections that I can find. First:
3.2.2.3 Pointers
An integral constant expression with
the value 0, or such an expression
cast to type void * , is called a null
pointer constant. If a null pointer
constant is assigned to or compared
for equality to a pointer, the
constant is converted to a pointer of
that type. Such a pointer, called a
null pointer, is guaranteed to compare
unequal to a pointer to any object or
function.
and second:
4.1.5 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant;
The question is, can NULL expand to an implementation-defined null pointer constant that is different from the ones enumerated in 3.2.2.3?
In particular, could it be defined as:
#define NULL __builtin_magic_null_pointer
Or even:
#define NULL ((void*)-1)
My reading of 3.2.2.3 is that it specifies that an integral constant expression of 0, and an integral constant expression of 0 cast to type void* must be among the forms of null pointer constant that the implementation recognizes, but that it isn't meant to be an exhaustive list. I believe that the implementation is free to recognize other source constructs as null pointer constants, so long as no other rules are broken.
So for example, it is provable that
#define NULL (-1)
is not a legal definition, because in
if (NULL)
do_stuff();
do_stuff() must not be called, whereas with
if (-1)
do_stuff();
do_stuff() must be called; since they are equivalent, this cannot be a legal definition of NULL.
But the standard says that integer-to-pointer conversions (and vice-versa) are implementation-defined, therefore it could define the conversion of -1 to a pointer as a conversion that produces a null pointer. In which case
if ((void*)-1)
would evaluate to false, and all would be well.
So what do other people think?
I'd ask for everybody to especially keep in mind the "as-if" rule described in 2.1.2.3 Program execution. It's huge and somewhat roundabout, so I won't paste it here, but it essentially says that an implementation merely has to produce the same observable side-effects as are required of the abstract machine described by the standard. It says that any optimizations, transformations, or whatever else the compiler wants to do to your program are perfectly legal so long as the observable side-effects of the program aren't changed by them.
So if you are looking to prove that a particular definition of NULL cannot be legal, you'll need to come up with a program that can prove it. Either one like mine that blatantly breaks other clauses in the standard, or one that can legally detect whatever magic the compiler has to do to make the strange NULL definition work.
Steve Jessop found an example of way for a program to detect that NULL isn't defined to be one of the two forms of null pointer constants in 3.2.2.3, which is to stringize the constant:
#define stringize_helper(x) #x
#define stringize(x) stringize_helper(x)
Using this macro, one could
puts(stringize(NULL));
and "detect" that NULL does not expand to one of the forms in 3.2.2.3. Is that enough to render other definitions illegal? I just don't know.
Thanks!
In the C99 standard, §7.17.3 states that NULL “expands to an implementation defined null pointer constant”. Meanwhile §6.3.2.3.3 defines null pointer constant as “an integer constant expression with the value 0, or such an expression cast to type void *”. As there is no other definition for a null pointer constant, a conforming definition of NULL must expand to an integer constant expression with the value zero (or this cast to void *).
Further quoting from the C FAQ question 5.5 (emphasis added):
Section 4.1.5 of the C Standard states that NULL “expands to an implementation-defined null pointer constant,” which means that the implementation gets to choose which form of 0 to use and whether to use a `void *` cast; see questions 5.6 and 5.7. “Implementation-defined” here does not mean that NULL might be #defined to match some implementation-specific nonzero internal null pointer value.
It makes perfect sense; since the standard requires a zero integer constant in pointer contexts to compile into a null pointer (regardless of whether or not the machine's internal representation of that has a value of zero), the case where NULL is defined as zero must be handled anyhow. The programmer is not required to type NULL to obtain null pointers; it's just a stylistic convention (and may help catch errors e.g. when a NULL defined as (void *)0 is used in a non-pointer context).
Edit: One source of confusion here seems to be the concise language used by the standard, i.e. it does not explicitly say that there is no other value that might be considered a null pointer constant. However, when the standard says “…is called a null pointer constant”, it means that exactly the given definitions are called null pointer constants. It does not need to explicitly follow every definition by stating what is non-conforming when (by definition) the standard defines what is conforming.
This expands a bit on some of the other answers and makes some points that the others missed.
Citations are to N1570 a draft of the 2011 ISO C standard. I don't believe there have been any significant changes in this area since the 1989 ANSI C standard (which is equivalent to the 1990 ISO C standard). A reference like "7.19p3" refers to subsection 7.19, paragraph 3. (The citations in the question seem to be to the 1989 ANSI standard, which described the language in section 3 and the library in section 4. All editions of the ISO standard describe the language in section 6 and the library in section 7.)
7.19p3 requires the macro NULL to expand to "an implementation-defined null pointer constant".
6.3.2.3p3 says:
An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant.
Since null pointer constant is in italics, that's the definition of the term (3p1 specifies that convention) -- which implies that nothing other that what's specified there can be a null pointer constant. (The standard doesn't always strictly follow that convention for its definitions, but there's no problem assuming that it does so in this case.)
So if we want "something wacky", we need to look at what can be an "integer constant expression".
The phrase null pointer constant needs to be taken as a single term, not as a phrase whose meaning depends on its constituent words. In particular, the integer constant 0 is a null pointer constant, regardless of the context in which it appears; it needn't result in a null pointer value, and it's of type int, not of any pointer type.
An "integer constant expression with the value 0" can be any of a number of things (infinitely many if we ignore capacity limits). A literal 0 is the most obvious. Other possibilities are 0x0, 00000, 1-1, '\0', and '-'-'-'. (It's not 100% clear from the wording whether "the value 0" refers specifically to that value of type int, but I think the consensus is that 0L is also a valid null pointer constant.)
Another relevant clause is 6.6p10:
An implementation may accept other forms of constant expressions.
It's not entirely clear (to me) how much latitude this is intended to allow. For example, a compiler might support binary literals as an extension; then 0b0 would be a valid null pointer constant. It might also allow C++-style references to const objects, so that given
const int x = 0;
a reference to x could be a constant expression (it isn't in standard C).
So it's clear that 0 is a null pointer constant, and that it's a valid definition for the NULL macro.
It's equally clear that (void*)0 is a null pointer constant, but it's not a valid definition for NULL, because of 7.1.2p5:
Any definition of an object-like macro described in this clause shall
expand to code that is fully protected by parentheses where necessary,
so that it groups in an arbitrary expression as if it were a single
identifier.
If NULL expanded to (void*)0, then the expression sizeof NULL would be a syntax error.
So what about ((void*)0)? Well, I'm 99.9% sure that it's intended to be a valid definition for NULL, but 6.5.1, which describes parenthesized expressions, says:
A parenthesized expression is a primary expression. Its type and value
are identical to those of the unparenthesized expression. It is an
lvalue, a function designator, or a void expression if the
unparenthesized expression is, respectively, an lvalue, a function
designator, or a void expression.
It doesn't say that a parenthesized null pointer constant is a null pointer constant. Still, as far as I know all C compilers reasonably assume that a parenthesized null pointer constant is a null pointer constant, making ((void*)0) a valid definition for NULL.
What if a null pointer is represented not as all-bits-zero, but as some other bit pattern, for example, one equivalent to 0xFFFFFFFF. Then (void*)0xFFFFFFFF, even if it happens to evaluate to a null pointer is not a null pointer constant, simply because it doesn't satisfy the definition of that term.
So what other variations are permitted by the standard?
Since implementations may accept other forms of constant expression, a compiler could define __null as a constant expression of type int with the value 0, allowing either __null or ((void*)__null) as the definition of NULL. It could make also __null itself a constant of pointer type, but it couldn't then use __null as the definition of NULL, since it doesn't satisfy the definition in 6.3.2.3p3.
An implementation could accomplish the same thing, with no compiler magic, like this:
enum { __null };
#define NULL __null
Here __null is an integer constant expression of type int with the value 0, so it can be used anywhere a constant 0 can be used.
The advantage defining NULL in terms of a symbol like __null is that the compiler could then issue a (perhaps optional) warning if NULL is used in a non-pointer constant. For example, this:
char c = NULL; /* PLEASE DON'T DO THIS */
is perfectly legal if NULL happens to be defined as 0; expanding NULL to some recognizable token like __null would make it easier for the compiler to detect this questionable construct.
Ages later, but no one brought up this point: Suppose that the implementation does in fact choose to use
#define NULL __builtin_null
My reading of C99 is that that's fine as long as the special keyword __builtin_null behaves as-if it were either "an integral constant expression with value 0" or "an integral constant expression with value 0, cast to void *". In particular, if the implementation chooses the former of those options, then
int x = __builtin_null;
int y = __builtin_null + 1;
is a valid translation unit, setting x and y to the integer values 0 and 1 respectively. If it chooses the latter, of course, both are constraint violations (6.5.16.1, 6.5.6 respectively; void * is not a "pointer to an object type" per 6.2.5p19; 6.7.8p11 applies the constraints for assignment to initialization). And I don't offhand see why an implementation would do this if not to provide better diagnostics for "misuse" of NULL, so it seems likely that it would take the option that invalidated more code.
Well, I've found a way to prove that
#define NULL ((void*)-1)
is not a legal definition of NULL.
int main(void)
{
void (*fp)() = NULL;
}
Initializing a function pointer with NULL is legal and correct, whereas...
int main(void)
{
void (*fp)() = (void*)-1;
}
...is a constraint violation that requires a diagnostic. So that's out.
But the __builtin_magic_null_pointer definition of NULL wouldn't suffer that problem. I'd still like to know if anybody can come up with a reason why it can't be.
An integral constant expression with
the value 0, or such an expression
cast to type void * , is called a
null pointer constant.
NULL which expands to an
implementation-defined null pointer
constant;
therefore either
NULL == 0
or
NULL == (void *)0
The null pointer constant must evaluate 0, otherwise expressions like !ptr would not work as expected.
The NULL macro expands to a 0-valued expression; AFAIK, it always has.