Related
Why is there a NULL in the C language? Is there a context in which just plain literal 0 would not work exactly the same?
Actually, you can use a literal 0 anyplace you would use NULL.
Section 6.3.2.3p3 of the C standard states:
An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer
constant. If a null pointer constant is converted to a pointer type,
the resulting pointer, called a null pointer, is guaranteed to
compare unequal to a pointer to any object or function.
And section 7.19p3 states:
The macros are:
NULL
which expands to an implementation-defined null pointer constant
So 0 qualifies as a null pointer constant, as do (void *)0 and NULL. The use of NULL is preferred, however, as it makes it more evident to the reader that a null pointer is being used and not the integer value 0.
NULL is used to make it clear it is a pointer type.
Ideally, the C implementation would define NULL as ((void *) 0) or something equivalent, and programmers would always use NULL when they want a null pointer constant.
If this is done, then, when a programmer has, for example, an int *x and accidentally writes *x = NULL;, then the compiler can recognize that a mistake has been made, because the left side of = has type int, and the right side has type void *, and this is not a proper combination for assignment.
In contrast, if the programmer accidentally writes *x = 0; instead of x = 0;, then the compiler cannot recognize this mistake, because the left side has type int, and the right side has type int, and that is a valid combination.
Thus, when NULL is defined well and is used, mistakes are detected earlier.
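As a minimal sketch of that point (assuming an implementation that defines NULL as ((void *) 0); the function and variable names are made up for illustration):

#include <stddef.h>

void example(int *x)
{
    *x = NULL;   /* mistake: a null pointer constant assigned to an int lvalue;
                    with NULL defined as ((void *) 0), most compilers diagnose
                    the int = void * mismatch */
    *x = 0;      /* the same mistake, but silently valid: int = int */
    x = NULL;    /* what was probably intended */
    (void)x;
}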
In particular, to answer your question “Is there a context in which just plain literal 0 would not work exactly the same?”:
In correct code, NULL and 0 may be used interchangeably as null pointer constants.
0 will function as an integer (non-pointer) constant, but NULL might not, depending on how the C implementation defines it.
For the purpose of detecting errors, NULL and 0 do not work exactly the same; using NULL with a good definition serves to help detect some mistakes that using 0 does not.
The C standard allows 0 to be used for null pointer constants for historic reasons. However, this is not beneficial except for allowing previously written code to compile in compilers using current C standards. New code should avoid using 0 as a null pointer constant.
It is for humans, not compilers.
If I see p = NULL; in the code instead of p = 0;, it is much easier for me to understand that p is a pointer, not an integer.
For compilers it does not matter; for humans it does.
It is the same reason we use named constants instead of "raw" values or expressions, and human-readable variable names like loopCounter instead of p755_x.
Why is there a NULL in the C language?
To help make clear the assignment implies a pointer and not an integer.
Example: strtok(char *s1, const char *s2). In both calls below the function receives a null pointer, as NULL and 0 are both converted to the parameter's pointer type. The first is usually considered better self-documentation. As a style issue, follow your group's coding standard.
strtok(s, NULL);
strtok(s, 0);
Is there a context in which just plain literal 0 would not work exactly the same?
Yes - when the original type is important.
0 is an int
NULL may be a void *, or an int, unsigned, long, long long, etc. The choice is implementation-defined.
Consider a function that takes a variable number of pointers, with a sentinel null pointer to indicate the last.
foo("Hello", "World", NULL); // might work if `NULL` is a pointer.
foo("Hello", "World", 0);
As the arguments 0 and NULL are not converted when passed to a ... (variadic) function (aside from the default argument promotions), the function foo() might not read them the same way. Portable code would use:
foo("Hello", "World", (char*) NULL);
// or
foo("Hello", "World", (char*) 0);
A difference may also occur when NULL and 0 are passed to _Generic.
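As a hedged sketch of that difference (the kind macro is made up for illustration; what is selected for NULL depends on how the implementation defines it):

#include <stdio.h>

#define kind(x) _Generic((x), \
    int:     "int",           \
    void *:  "void *",        \
    default: "something else")

int main(void)
{
    printf("%s\n", kind(0));    /* always "int" */
    printf("%s\n", kind(NULL)); /* "int", "void *", or "something else",
                                   depending on how NULL is defined */
    return 0;
}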
The integer constant literal 0 has different meanings depending upon the context in which it's used. In all cases, it is still an integer constant with the value 0, it is just described in different ways.
Namely, the most common purposes of a null pointer are (a short sketch follows the list):
1) To initialize a pointer variable when that pointer variable isn't assigned any valid memory address yet.
2) To check for a null pointer before accessing any pointer variable; by doing so, we can perform error handling in pointer-related code, e.g. dereference a pointer variable only if it is not NULL.
3) To pass a null pointer to a function argument when we don't want to pass any valid memory address. (ref)
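A hedged sketch of all three uses (the report function and the variable names are made up for illustration):

#include <stddef.h>
#include <stdio.h>

static void report(const int *value)
{
    if (value == NULL) {             /* 2) check before dereferencing */
        puts("no value supplied");
        return;
    }
    printf("value = %d\n", *value);
}

int main(void)
{
    int *result = NULL;              /* 1) initialize before any valid address exists */
    report(NULL);                    /* 3) pass "no address" to a function */
    int n = 42;
    result = &n;
    report(result);
    return 0;
}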
Unfortunately, I haven't found anything like std-discussion for the ISO C standard, so I'll ask here.
Before answering the question, make sure you are familiar with the idea of pointer provenance (see DR260 and/or http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2263.htm).
6.3.2.3(Pointers) paragraph 7 says:
When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
The questions are:
What does "a pointer to an object" mean? Does it mean that if we want to get the pointer to the lowest addressed byte of an object of type T, we shall convert a pointer of type cvT* to a pointer to a character type, and converting a pointer to void, obtained from the pointer of type T*, won't give us desired result? Or "a pointer to an object" is the value of a pointer which follows from the pointer provenance and cast to void* does not change the value (analogous to how it was recently formalized in C++17)?
Why does the paragraph explicitly mention increments? Does it mean that adding a value greater than 1 is undefined? Does it mean that decrementing (after we have incremented the result several times, so that we won't go beyond the lower object boundary) is undefined? In short: is the sequence of bytes composing an object an array?
The general description of pointer addition suggests that for any pointer ptr and signed integers x and y where ((ptr+x)+y) and (x+y) are both defined by the Standard, (ptr+(x+y)) would behave equivalently to ((ptr+x)+y). While it might be possible to argue that the Standard doesn't explicitly say that incrementing a pointer five times would be equivalent to adding 5, there is nothing in the Standard that would suggest that quality implementations should not be expected to behave in that fashion.
Note that the authors of the Standard didn't try to make it "language-lawyer-proof". Further, they didn't think anyone would care about whether or not an obviously-inferior implementation was "conforming". An implementation that only works reliably if bytes of an object are accessed sequentially would be less versatile than one which supported reliable indexing, while offering no plausible advantage. Consequently, there should be no need for the Standard to mandate support for indexing, because anyone trying to produce a quality implementation would support it whether the Standard mandated it or not.
Of course, there are some constructs which programmers in the 1990s--or even the authors of the Standard themselves--expected quality compilers to handle reliably, but which some of today's "clever" compilers don't. Whether that means such expectations were unreasonable, or whether they remain accurate when applied to quality compilers, is a matter of opinion. In this particular case, I think the implication that positive indexing should behave like repeated incrementing is strong enough that I wouldn't expect compiler writers to argue otherwise, but I'm not 100% certain that no compiler would ever be "clever"/obtuse enough to look at something like:
int test(unsigned char foo[5][5], int x)
{
    foo[1][0] = 1;
    // Following should yield a pointer that can be used to access the entire
    // array 'foo', but an obtuse compiler writer could perhaps argue that the
    // code is converting the address of foo[0] into a pointer to the first
    // element of that sub-array, and that the resulting pointer is thus only
    // usable to access items within that sub-array.
    unsigned char *p = (unsigned char*)foo;
    // Following should be able to access any element of the object [i.e. foo]
    // whose address was taken.
    p[x] = 2;
    return foo[1][0];
}
and decide that it could skip the second read of foo[1][0] since p[x] wouldn't possibly access any element of foo beyond the first row. I would, however, say that programmers should not try to code around the possibility of vandals writing a compiler that would behave that way. No program can be made bullet-proof against vandals writing obtuse-but-"conforming" compilers, and the fact that a program can be undermined by such vandals should hardly be viewed as a defect.
Take a non-char C object and create a pointer to it, i.e.
int obj;
int *objPtr = &obj;
convert the pointer to object to pointer to char:
char *charPtr = (char *)objPtr;
Now charPtr points to the lowest addressed byte of the int obj.
increment it:
charPtr++;
Now it points to the next byte of the object, and so on until you reach the size of the object:
int i;
charPtr = (char *)objPtr;   /* start again from the lowest addressed byte */
for (i = 0; i < sizeof(obj); i++)
    printf("%d", *charPtr++);
Which of the following ways of treating and trying to recover a C pointer, is guaranteed to be valid?
1) Cast to void pointer and back
int f(int *a) {
    void *b = a;
    a = b;
    return *a;
}
2) Cast to appropriately sized integer and back
int f(int *a) {
    uintptr_t b = a;
    a = (int *)b;
    return *a;
}
3) A couple of trivial integer operations
int f(int *a) {
    uintptr_t b = a;
    b += 99;
    b -= 99;
    a = (int *)b;
    return *a;
}
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
int f(int *a) {
    uintptr_t b = a;
    char s[32];
    // assume %lu is suitable
    sprintf(s, "%lu", b);
    b = strtoul(s, NULL, 10);
    a = (int *)b;
    return *a;
}
5) More indirect integer operations that will leave the value unchanged
int f(int *a) {
    uintptr_t b = a;
    for (uintptr_t i = 0;; i++)
        if (i == b) {
            a = (int *)i;
            return *a;
        }
}
Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying something similar to case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop. Yet each case looks like an unobjectionable extension of the previous one.
Where is the line drawn between a valid case and an invalid one?
Added based on discussion in comments: while I still can't find the post that inspired case 5, I don't remember what type of pointer was involved; in particular, it might have been a function pointer, which might be why that case demonstrated invalid code whereas my case 5 is valid code.
Second addition: okay, here's another source that says there is a problem, and this one I do have a link to. https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf - the discussion about pointer provenance - says, and backs up with evidence, that no, if the compiler loses track of where a pointer came from, it's undefined behavior.
According to the C11 draft standard:
Example 1
Valid, by §6.5.16.1, even without an explicit cast.
Example 2
The intptr_t and uintptr_t types are optional. Assigning a pointer to an integer requires an explicit cast (§6.5.16.1), although gcc and clang will only warn you if you don’t have one. With those caveats, the round-trip conversion is valid by §7.20.1.4. ETA: John Bellinger brings up that the behavior is only specified when you do an intermediate cast to void* both ways. However, both gcc and clang allow the direct conversion as a documented extension.
Example 3
Safe, but only because you’re using unsigned arithmetic, which cannot overflow, and are therefore guaranteed to get the same object representation back. An intptr_t could overflow! If you want to do pointer arithmetic safely, you can convert any kind of pointer to char* and then add or subtract offsets within the same structure or array. Remember, sizeof(char) is always 1. ETA: The standard guarantees that the two pointers compare equal, but your link to Chisnall et al. gives examples where compilers nevertheless assume the two pointers do not alias each other.
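A small sketch of that kind of safe offsetting within a single object (struct pair and its member names are made up for illustration; offsetof is the idiomatic way to obtain the offset):

#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct pair { int first; int second; };

int main(void)
{
    struct pair p = { 1, 2 };
    unsigned char *bytes = (unsigned char *)&p;          /* character-type view of the object */
    int second;
    memcpy(&second, bytes + offsetof(struct pair, second), sizeof second);
    printf("%d\n", second);                              /* prints 2 */
    return 0;
}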
Example 4
Always, always, always check for buffer overflows whenever you read from and especially whenever you write to a buffer! If you can mathematically prove that overflow cannot happen by static analysis? Then write out the assumptions that justify that, explicitly, and assert() or static_assert() that they haven’t changed. Use snprintf(), not the deprecated, unsafe sprintf()! If you remember nothing else from this answer, remember that!
To be absolutely pedantic, the portable way to do this would be to use the format specifiers in <inttypes.h> and define the buffer length in terms of the maximum value of any pointer representation. In the real world, you would print pointers out with the %p format.
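As a hedged sketch of that pedantic approach (assuming the <inttypes.h> facilities PRIuPTR and strtoumax, and sizing the buffer from the width of uintptr_t), example 4 might be rewritten along these lines:

#include <inttypes.h>
#include <stdio.h>

int f(int *a)
{
    uintptr_t b = (uintptr_t)(void *)a;
    char s[sizeof b * 3 + 2];                 /* 3 decimal digits per byte is always enough, plus '\0' */
    snprintf(s, sizeof s, "%" PRIuPTR, b);
    b = (uintptr_t)strtoumax(s, NULL, 10);
    return *(int *)(void *)b;
}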
The answer to the question you intended to ask is yes, though: all that matters is that you get back the same object representation. Here’s a less contrived example:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int i = 1;
    const uintptr_t u = (uintptr_t)(void*)&i;
    uintptr_t v;
    memcpy( &v, &u, sizeof(v) );
    int* const p = (int*)(void*)v;
    assert(p == &i);
    *p = 2;
    printf( "%d = %d.\n", i, *p );
    return EXIT_SUCCESS;
}
All that matters are the bits in your object representation. This code also follows the strict aliasing rules in §6.5. It compiles and runs fine on the compilers that gave Chisnall et al trouble.
Example 5
This works, same as above.
One extremely pedantic footnote that will never ever be relevant to your coding: some obsolete esoteric hardware has one’s-complement or sign-and-magnitude representation of signed integers, and on these, there might be a distinct value of negative zero that might or might not trap. On some CPUs, this might be a valid pointer or null pointer representation distinct from positive zero. And on some CPUs, positive and negative zero might compare equal.
PS
The standard says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
Furthermore, if the two array objects are consecutive rows of the same multidimensional array, one past the end of the first row is a valid pointer to the start of the next row. Therefore, even a pathological implementation that deliberately sets out to cause as many bugs as the standard allows could only do so if your manipulated pointer compares equal to the address of an array object, in which case the implementation might in theory decide to interpret it as one-past-the-end of some other array object instead.
The intended behavior is clearly that the pointer comparing equal both to &array1+1 and to &array2 is equivalent to both: it means to let you compare it to addresses within array1 or dereference it to get array2[0]. However, the Standard does not actually say that.
PPS
The standards committee has addressed some of these issues and proposes that the C standard explicitly add language about pointer provenance. This would nail down whether a conforming implementation is allowed to assume that a pointer created by bit manipulation does not alias another pointer.
Specifically, the proposed corrigendum would introduce pointer provenance and allow pointers with different provenance not to compare equal. It would also introduce a -fno-provenance option, which would guarantee that any two pointers compare equal if and only if they have the same numeric address. (As discussed above, two object pointers that compare equal alias each other.)
1) Cast to void pointer and back
This yields a valid pointer equal to the original. Paragraph 6.3.2.3/1 of the standard is clear on this:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
2) Cast to appropriately sized integer and back
3) A couple of trivial integer operations
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
5) More indirect integer operations that will leave the value unchanged
[...] Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop.
C does require a cast when converting either way between pointers and integers, and you have omitted some of those in your example code. In that sense your examples (2) - (5) are all non-conforming, but for the rest of this answer I'll pretend the needed casts are there.
Still, being very pedantic, all of these examples have implementation-defined behavior, so they are not strictly conforming. On the other hand, "implementation-defined" behavior is still defined behavior; whether that means your code is "valid" or not depends on what you mean by that term. In any event, what code the compiler might emit for any of the examples is a separate matter.
These are the relevant provisions of the standard from section 6.3.2.3 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
The definition of uintptr_t is also relevant to your particular example code. The standard describes it this way (C2011, 7.20.1.4/1; emphasis added):
an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
You are converting back and forth between int * and uintptr_t. int * is not void *, so 7.20.1.4/1 does not apply to these conversions, and the behavior is implementation-defined per section 6.3.2.3.
Suppose, however, that you convert back and forth via an intermediate void *:
uintptr_t b = (uintptr_t)(void *)a;
a = (int *)(void *)b;
On an implementation that provides uintptr_t (which is optional), that would make your examples (2 - 5) all strictly conforming. In that case, the result of the integer-to-pointer conversions depends only on the value of the uintptr_t object, not on how that value was obtained.
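A sketch of example 2 in that strictly conforming form (assuming the optional uintptr_t type is provided):

#include <stdint.h>

int f(int *a)
{
    uintptr_t b = (uintptr_t)(void *)a;   /* pointer -> void * -> integer */
    a = (int *)(void *)b;                 /* integer -> void * -> pointer */
    return *a;
}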
As for the claims you attribute to Chris Lattner, they are substantially incorrect. If you have represented them accurately, then perhaps they reflect a confusion between implementation-defined behavior and undefined behavior. If the code exhibited undefined behavior then the claim might hold some water, but that is not, in fact, the case.
Regardless of how its value was obtained, b has a definite value of type uintptr_t, and the loop must eventually increment i to that value, at which point the if block will run. In principle, the implementation-defined behavior of a conversion from uintptr_t directly to int * could be something crazy such as skipping the next statement (thus causing an infinite loop), but such behavior is entirely implausible. Every implementation you ever meet will either fail at that point, or store some value in variable a, and then, if it didn't crash, it will execute the return statement.
Because different application fields require the ability to manipulate pointers in different ways, and because the best implementations for some purposes may be totally unsuitable for some others, the C Standard treats the support (or lack thereof) for various kinds of manipulations as a Quality of Implementation issue. In general, people writing an implementation for a particular application field should be more familiar than the authors of the Standard with what features would be useful to programmers in that field, and people making a bona fide effort to produce quality implementations suitable for writing applications in that field will support such features whether the Standard requires them to or not.
In the pre-Standard language invented by Dennis Ritchie, all pointers of a particular type that identified the same address were equivalent. If any sequence of operations on a pointer would end up producing another pointer of the same type that identified the same address, that pointer would--essentially by definition--be equivalent to the first. The C Standard specifies some situations, however, where pointers can identify the same location in storage and be indistinguishable from each other without being equivalent. For example, given:
int foo[2][4] = {0};
int *p = foo[0]+4, *q=foo[1];
both p and q will compare equal to each other, and to foo[0]+4, and to foo[1]. On the other hand, despite the fact that evaluation of p[-1] and q[0] would have defined behavior, evaluation of p[0] or q[-1] would invoke UB. Unfortunately, while the Standard makes clear that p and q would not be equivalent, it does nothing to clarify whether performing various sequences of operations on e.g. p will yield a pointer that is usable in all cases where p was usable, a pointer that's usable in all cases where either p or q would be usable, a pointer that's only usable in cases where q would be usable, or a pointer that's only usable in cases where both p and q would be usable.
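A compilable restatement of that point, with the defined and undefined accesses marked (the printouts are only there to make the example self-contained):

#include <stdio.h>

int main(void)
{
    int foo[2][4] = {0};
    int *p = foo[0] + 4;   /* one past the end of foo[0] */
    int *q = foo[1];       /* first element of foo[1]    */

    printf("%d\n", p == q);   /* prints 1: the pointers compare equal */
    printf("%d\n", p[-1]);    /* defined: last element of foo[0]      */
    printf("%d\n", q[0]);     /* defined: first element of foo[1]     */
    /* p[0] and q[-1] would be undefined behavior, despite the equality. */
    return 0;
}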
Quality implementations intended for low-level programming should generally process pointer manipulations other than those involving restrict pointers in a way that yields a pointer usable in any case where a pointer that compares equal to it would be usable. Unfortunately, the Standard provides no means by which a program can determine whether it's being processed by a quality implementation that is suitable for low-level programming and refuse to run if it isn't, so most forms of systems programming must rely upon quality implementations processing certain actions in a documented manner characteristic of the environment even when the Standard would impose no requirements.
Incidentally, even if the normal constructs for manipulating pointers wouldn't have any way of creating pointers where the principles of equivalence shouldn't apply, some platforms may define ways of creating "interesting" pointers. For example, if an implementation that would normally trap operations on null pointers were run in an environment where it might sometimes be necessary to access an object at address zero, it might define a special syntax for creating a pointer that could be used to access any address, including zero, within the context where it was created. A "legitimate pointer to address zero" would likely compare equal to a null pointer (even though they're not equivalent), but performing a round-trip conversion to another type and back would likely convert what had been a legitimate pointer to address zero into a null pointer. If the Standard had mandated that a round-trip conversion of any pointer must yield one that's usable in the same way as the original, that would require that the compiler omit null traps on any pointers that could have been produced in such a fashion, even if they would far more likely have been produced by round-tripping a null pointer.
Incidentally, from a practical perspective, "modern" compilers, even in -fno-strict-aliasing, will sometimes attempt to track provenance of pointers through pointer-integer-pointer conversions in such a way that pointers produced by casting equal integers may sometimes be assumed incapable of aliasing.
For example, given:
#include <stdint.h>
extern int x[], y[];

int test(void)
{
    if (!x[0]) return 999;
    uintptr_t upx = (uintptr_t)x;
    uintptr_t upy = (uintptr_t)(y+1);
    // Consider code with and without the following line
    if (upx == upy) upy = upx;
    if ((upx ^ ~upy)+1) // Will return if upx != upy
        return 123;
    int *py = (int*)upy;
    *py += 1;
    return x[0];
}
In the absence of the marked line, gcc, icc, and clang will all assume--even when using -fno-strict-aliasing--that an operation on *py can't affect x[0], even though the only way that code could be reached would be if upx and upy held the same value (meaning that py was produced by casting the same uintptr_t value that was obtained from x). Adding the marked line causes icc and clang to recognize that py could identify x, but gcc assumes that the assignment can be optimized out, even though it should mean that py would be derived from x--a situation a quality compiler should have no trouble recognizing as implying possible aliasing.
I'm not sure what realistic benefit compiler writers expect from their efforts to track provenance of uintptr_t values, given that I don't see much purpose for doing such conversions outside of cases where the results of conversions may be used in "interesting" ways. Given compiler behavior, however, I'm not sure I see any nice way to guarantee that conversions between integers and pointers will behave in a fashion consistent with the values involved.
( I'm quoting ISO/IEC 9899:201x )
Here we see that an integer constant expression has an integer type:
6.6 Constant expressions
6.
An integer constant expression shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, _Alignof expressions, and floating
constants that are the immediate operands of casts. Cast operators in an integer constant
expression shall only convert arithmetic types to integer types, except as part of an
operand to the sizeof or _Alignof operator.
Then this holds true for any integer type:
6.2.6.2 Integer types
5.
The values of any padding bits are unspecified. A valid (non-trap) object representation
of a signed integer type where the sign bit is zero is a valid object representation of the
corresponding unsigned type, and shall represent the same value. For any integer type,
the object representation where all the bits are zero shall be a representation of the value
zero in that type.
Then we see that a null pointer constant is defined using an integer constant expression with the value 0.
6.3.2.3 Pointers
3.
An integer constant expression with the value 0, or such an expression cast to type
void*, is called a null pointer constant. If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Therefore the null pointer constant must have all its bits set to zero.
But there are many answers online and on StackOverflow that say that that isn't true.
I have a hard time believing them given the quoted parts.
( Please answer using references to the latest Standard )
Does Standard define null pointer constant to have all bits set to zero?
No, it doesn't. No paragraph of the C Standard imposes such a requirement.
void *p = 0;
p, for example, is a null pointer, but the Standard does not require that the object p have all bits set to zero.
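A small sketch of the distinction between a value conversion and an all-bits-zero object representation (on most machines the two agree, but the Standard does not require it):

#include <stdio.h>
#include <string.h>

int main(void)
{
    void *p = 0;                 /* value conversion: p is a null pointer */
    void *q;
    memset(&q, 0, sizeof q);     /* all-bits-zero representation: not guaranteed to be a
                                    null pointer, and could even be a trap representation
                                    on exotic hardware */
    printf("%d\n", p == NULL);   /* prints 1 */
    return 0;
}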
For information the c-faq website mentions some systems with non-zero null pointer representations here: http://c-faq.com/null/machexamp.html
No, NULL doesn't have to be all bits zero.
N1570 6.3.2.3 Pointers paragraph 3:
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Note the wording "converted" above: the integer 0 is converted if necessary; it does not have to have the same bit representation.
Note 66 on bottom of the page says:
66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.
Which leads us to a paragraph of that chapter:
The macros are
NULL
which expands to an implementation-defined null pointer constant
What is more, Annex J.3.12 (Portability issues, Implementation-defined behavior, Library functions) says:
— The null pointer constant to which the macro NULL expands (7.19).
Asking about the representation of a null pointer constant is quite pointless.
A null pointer constant either has an integer type or the type void*. Whatever it is, it is a value. It is not an object. Values don't have a representation, only objects have. We can only talk about representations by taking the address of an object, casting it to char* or unsigned char*, and looking at the bytes. We can't do that with a null pointer constant. As soon as it is assigned to an object, it's not a null pointer constant anymore.
A major limitation of the C standard is that, because the authors want to avoid prohibiting compilers from behaving in any way that any production code anywhere might be relying upon, it fails to specify many things which programmers need to know. As a consequence, it is often necessary to make assumptions about things which are not specified by the standard but which match the behaviors of common compilers. The fact that all of the bytes comprising a null pointer are zero is one such assumption.
Nothing in the C standard specifies anything about the bit-level representation of any pointer beyond the fact that every possible value of each and every data type--including pointers--will be representable as a sequence of char values(*). Nonetheless, on nearly all common platforms, zeroing out all the bytes associated with a structure is equivalent to setting all the members to the static default values for their types (the default value for a pointer being null). Further, code which uses calloc to receive a zeroed-out block of RAM for a collection of structures will often be much faster than code which uses malloc and then has to manually clear every member of every structure, or which uses calloc but still manually clears every non-integer member of every structure.
I would suggest therefore that in many cases it is perfectly reasonable to write code targeted for those dialects of C where null pointers are stored as all-bytes-zero, and have as a documented requirement that it will not work on dialects where that is not the case. Perhaps someday the ISO will provide a standard means by which such requirements could be documented in machine-readable form (such that every compiler would be required to either abide by a program's stated requirements or refuse compilation), but so far as I know none yet exists.
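A hedged sketch of that trade-off (struct node and the function names are made up; the calloc version is equivalent to the explicit one only on platforms where null pointers are all-bytes-zero):

#include <stdlib.h>

struct node { struct node *next; int value; };

struct node *make_nodes_fast(size_t n)
{
    return calloc(n, sizeof(struct node));   /* relies on the all-bytes-zero assumption */
}

struct node *make_nodes_portable(size_t n)
{
    struct node *a = malloc(n * sizeof *a);
    if (a)
        for (size_t i = 0; i < n; i++) {
            a[i].next = NULL;                /* explicit, portable */
            a[i].value = 0;
        }
    return a;
}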
(*) From what I understand, there's some question as to whether compilers are required to honor that assumption anymore. Consider, for example:
#include <stdlib.h>
#include <string.h>

int funcomp(int **pp, int **qq)
{
    int *p, *q;
    p = (int*)malloc(1234);
    *pp = p;
    free(p);
    q = (int*)malloc(1234);
    *qq = q;
    *q = 1234;
    if (!memcmp(pp, qq, sizeof p))
        return *p;
    return 0;
}
Following free(p), any attempt to access *p is Undefined Behavior. Although there's a significant likelihood that q will receive the exact same bit pattern as p, nothing in the standard would require that p must be considered a valid alias for q even in that scenario. On the other hand, it also seems strange to say that two variables of the same type can hold the exact same bits without their contents being equivalent. Thus, while it's clearly natural that the function would be allowed to either return 0 along with values of *pp and *qq that don't compare bit-wise equal, or 1234 along with values of *pp and *qq that do compare bit-wise equal, the Standard would seem to allow the function to behave arbitrarily if both calls to malloc happen to yield bitwise-equivalent values.
Background:
When deleting a cell in a hash table that uses linear probing, you have to indicate that a value once existed at that cell but can be skipped during a search. The easiest way to solve this is to add another variable to store this information, but this extra variable can be avoided if a guaranteed-invalid memory address is known and is used to represent this state.
Question:
I assume that since 0 is a guaranteed invalid memory address (more often than not), there must be more than just NULL. So my question is, does C provide a standard macro for any other guaranteed invalid memory addresses?
Technically, NULL is not guaranteed to be invalid. It is only guaranteed not to be the address of any object (C11 6.3.2.3:3):
An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant(66). If a null
pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
(66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant
Your usage does not require the special address value to be invalid either: obviously, you are not accessing it, unless segfaulting is part of the normal behavior of your program.
So you could use the addresses of as many objects as you like, as long as the addresses of these objects are not intended to be part of the normal contents of a cell.
For instance, on an architecture where converting between pointers to objects preserves the representation, you could use:
char a, b, …;
#define NULL1 (&a)
#define NULL2 (&b)
…
Strictly speaking, NULL is not required to be numerically zero at runtime. C translates 0 and NULL, in a pointer context, into an implementation-defined invalid address. That address is often numerically zero, but that is not guaranteed by the C standard. To the best of my knowledge, C itself does not provide any invalid addresses guaranteed to be distinct from NULL.
You can also create your own 'invalid' address pointer:
const void* const SOME_MARKER = (void*) &x;
If you make sure that x (or its address) can never be actually used where you want to use SOME_MARKER you should be safe and 100% portable.
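A hedged sketch of that idea applied to the hash-table question above (the tombstone object, the TOMBSTONE macro, and struct table are made up for illustration):

#include <stddef.h>

static char tombstone_obj;                     /* its address can never equal a stored pointer */
#define TOMBSTONE ((void *)&tombstone_obj)

struct table { void *slot[64]; };

static void erase(struct table *t, size_t i)
{
    t->slot[i] = TOMBSTONE;                    /* once held a value: keep probing past it */
}

static int slot_never_used(const struct table *t, size_t i)
{
    return t->slot[i] == NULL;                 /* never occupied: a probe may stop here */
}

static int slot_deleted(const struct table *t, size_t i)
{
    return t->slot[i] == TOMBSTONE;
}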