Why NULL is not predefined by the compiler - c

This issue has bothered me for a while. I have never seen a different definition of NULL; it's always
#define NULL ((void *) 0)
Is there any architecture where NULL is defined differently, and if so, why doesn't the compiler declare it for us?

C 2011 Standard, online draft
6.3.2.3 Pointers
...
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.
The macro NULL is always defined as a zero-valued constant expression; it can be a naked 0, or 0 cast to void *, or some other integral expression that evaluates to 0. As far as your source code is concerned, NULL will always evaluate to 0.
Once the code has been translated, any occurrence of the null pointer constant (0, NULL, etc.) will be replaced with whatever the underlying architecture uses for a null pointer, which may or may not be 0-valued.
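A minimal sketch of what this means at the source level (nothing here depends on the null pointer's actual bit pattern):

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int *p = NULL;   /* null pointer constant, however the header spells it */
    int *q = 0;      /* integer constant 0 in pointer context: the same null pointer */

    /* Both tests compare against the platform's null pointer representation,
       whatever bit pattern that happens to be. */
    if (p == NULL && q == 0 && p == q)
        puts("all spellings denote the same null pointer");
    return 0;
}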

WhozCraig wrote these comments to a now-deleted answer, but it could be promoted to a full answer (and that's what I've done here). He notes:
Interesting note: AS/400 is a very unique platform where any non-valid pointer is considered equivalent to NULL. The mechanics which they employ to do this are simply amazing. "Valid" in this sense is any 128-bit pointer (the platform uses a 128-bit linear address space for everything) containing a "value" obtained by a known-trusted instruction set. Hard as it is to believe, int *p = (int *)1; if (p) { printf("foo"); } will not print "foo" on that platform. The value assigned to p is not source-trusted, and thus, considered "invalid" and thereby equivalent to NULL.
It's frankly startling how it works. Each 16-byte paragraph in the mapped virtual address space of a process has a corresponding "bit" in a process-wide bitmap. All pointers must reside on one of these paragraph boundaries. If the bit is "lit", the corresponding pointer was stored from a trusted source, otherwise it is invalid and equivalent to NULL. Calls to malloc, pointer math, etc, are all scrutinized in determining whether that bit gets lit or not. And as you can imagine, putting pointers in structures brings a whole new world of hurt on the idea of structure packing.
This is marked community-wiki (it's not my answer — I shouldn't get the credit) but it can be deleted if WhozCraig writes his own answer.
What this shows is that there are real platforms with interesting pointer properties.
There have been platforms where #define NULL ((void *)0) is not the usual definition; on some platforms it can be just 0, on others, 0L or 0ULL or other appropriate values as long as the compiler understands it. C++ does not like ((void *)0) as a definition; systems where the headers interwork with C++ may well not use the void pointer version.
I learned C on a machine where the representation for the char * address for a given memory location was different from the int * address for the same memory location. This was in the days before void *, but it meant that you had to have malloc() properly declared (char *malloc(); — no prototypes either), and you had to explicitly cast the return value to the correct type or you got core dumps. Be grateful for the C standard (though the machine in question, an ICL Perq — badged hardware from Three Rivers — was largely superseded by the time the standard was defined).

In the dark ages before ANSI C, the old K&R C had many different implementations on hardware that would be considered bizarre today. This was before the days of virtual memory, when machines were very "real". Addresses of zero were not only just fine on these machines; an address of zero could even be popular... I think it was CDC that sometimes stored the system constant zero at address zero (and strange things happened if it was set non-zero).
if ( NULL != ptr ) /* like this */
if ( ptr ) /* never like this */
The trick was finding an address you could safely use to indicate "nothing", since storing things at the very end of memory was also popular, which ruled out 0xFFFF on some architectures. And those architectures tended to use word addresses rather than byte addresses.

I don't know the answer to this, but I'm making a guess. In C you usually do a lot of mallocs, and consequently many tests on returned pointers. Since malloc returns void *, and in particular returns a null pointer ((void *) 0) upon failure, NULL is a natural thing to define in order to test whether malloc succeeded. Since this is so essential, other library functions that return pointers use NULL too, like fopen. Actually, everything that returns a pointer.
Hence there is no reason to define this at the language level - it's just a special pointer value that can be returned by so many functions.
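A minimal sketch of the usage pattern described above (the allocation size is arbitrary):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double *buf = malloc(1000 * sizeof *buf);
    if (buf == NULL) {   /* the classic test: malloc signals failure with a null pointer */
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    buf[0] = 3.14;
    free(buf);
    return 0;
}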

Related

Getting the offset of a variable inside a struct is based on the NULL pointer, but why?

I found a trick in a YouTube video explaining how you can get the offset of a struct member by using a NULL pointer. I understand the code snippet below (the casts, the ampersand, and so on), but I do not understand why this works with the NULL pointer. I thought that the NULL pointer could not point to anything, so I cannot mentally visualize how it works. Second, the NULL pointer is not always represented by the compiler as 0; sometimes it is a non-zero value. But then how could this piece of code work correctly? Or wouldn't it work correctly anymore?
#include <stdio.h>

int main(void)
{
    /* Getting the offset of a variable inside a struct */
    typedef struct {
        int a;
        char b[23];
        float c;
    } MyStructType;

    unsigned offset = (unsigned)(&((MyStructType *)NULL)->c);
    printf("offset = %u\n", offset);
    return 0;
}
I found a trick on a youtube video explaining how you can get the
offset of a struct member by using a NULL pointer.
Well, at least you came here to ask about the random Internet advice you turned up. We're an Internet resource ourselves, of course, but I like to think that our structure and reputation gives you a basis for estimating the reliability of what we have to say.
I understand the
code snippet below (the casts, the ampersand, and so on), but I do not
understand why this works with the NULL pointer. I thought that the
NULL pointer could not point to anything.
Yes, from the perspective of C semantics, a null pointer definitely does not point to anything, and NULL is a null pointer constant.
So I cannot mentally
visualize how it works.
The (flawed) idea is that
NULL is equivalent to a pointer to address 0 in a flat address space (unsafe assumption);
((MyStructType * )NULL)->c designates the member c of an altogether hypothetical object of type MyStructType residing at that address (not supported by the standard);
applying the & operator yields the address that such a member would have if it in fact existed (not supported by the standard); and
converting the resulting address to an integer yields an address in the assumed flat address space, expressed in units the size of a C char (in no way guaranteed);
so that the resulting integer simultaneously represents both an absolute address and an offset (follows from the previous assumptions, because the supposed base address of the hypothetical structure is 0).
Second, the NULL pointer is not always
represented by the compiler as being 0, sometimes it is a non-zero
value.
Quite right, that is one of the flaws in the scheme presented.
But then how could this piece of code work correctly? Or
wouldn't it work correctly anymore?
Although the Standard provides no basis to justify relying on the code to behave as advertised, that does not mean that it must necessarily fail. C implementations do need to be internally consistent about how they represent null pointers, and -- to a certain degree -- about how they convert between pointers and integers. It turns out to be fairly common that the code's assumptions about those things are in fact satisfied by implementations.
So in practice, the code does work with many C implementations. But it systematically produces the wrong answer with some others, and there may be some in which it produces the right answer some appreciable fraction of the time, but the wrong answer the rest of the time.
Note that this code actually has undefined behaviour. Dereferencing a NULL pointer is never allowed, even if no value is accessed, only the address (this was the root cause of a Linux kernel exploit).
Use offsetof instead for a safe alternative.
As to why it seems to work with a NULL pointer: it assumes that NULL is 0. Basically you could use any pointer and calculate:
MyStructType t;
unsigned off = (unsigned)(&(&t)->c) - (unsigned)&t;
if &t == 0, this becomes:
unsigned off = (unsigned)(&(0)->c) - 0;
Subtracting 0 is a no-op.
This code is platform specific: it might cause undefined behaviour on one platform and it might work on others.
That's why the C standard requires every implementation to provide the offsetof macro. It may internally expand to code like dereferencing the NULL pointer, but at least you can be sure the code will not crash on any platform:
typedef struct Struct
{
    double d;
} Struct;

offsetof(Struct, d)
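For the struct in the original question, a minimal runnable sketch of the safe version might look like this (nothing here is from the answer itself; offsetof comes from <stddef.h> and %zu prints its size_t result):

#include <stddef.h>
#include <stdio.h>

typedef struct {
    int a;
    char b[23];
    float c;
} MyStructType;

int main(void)
{
    /* offsetof is evaluated by the implementation without the program ever
       forming a pointer from NULL, so it is well-defined on every platform. */
    printf("offset of c = %zu\n", offsetof(MyStructType, c));
    return 0;
}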
This question reminds me of something I saw more than 30 years ago:
#define XtOffset(p_type,field) \
((Cardinal) (((char *) (&(((p_type)NULL)->field))) - ((char *) NULL)))
#ifdef offsetof
#define XtOffsetOf(s_type,field) offsetof(s_type,field)
#else
#define XtOffsetOf(s_type,field) XtOffset(s_type*,field)
#endif
from xorg-libXt/include/X11/Intrinsic.h X11R4.
They took into account that a NULL pointer could be different from 0x0 and included that in the definition of the XtOffsetOf macro.
This is a dirty hack and might not necessarily work.
(MyStructType * )NULL creates a null pointer. Null pointer and null pointer constant are two different terms. NULL is guaranteed to be a null pointer constant equivalent to 0, but the obtained null pointer we get when casting it to another type can be any implementation-defined value.
So it happened to work by luck on your specific system, you could as well have gotten any strange value.
The offsetof macro has been standard C since 1989, so maybe your YouTube hacker is still stuck in the early 1980s.

What other definitions of NULL were there on older platforms? [duplicate]

This question already has answers here:
When was the NULL macro not 0?
(7 answers)
Closed 9 years ago.
Occasionally, one reads that older C compilers had definitions of NULL that were not 0 or (void *)0. My understanding of the C standard was that even if the platform's bit pattern for a null pointer is nonzero, an integer 0 cast to a pointer (either implicitly or explicitly) is still a null pointer, and is stored internally as the platform's null pointer bit pattern.
But for example, here it is written:
In some older C compilers, NULL is variously defined to some weird things, so you have to be more careful with it.
I remember reading this in various other places from time to time. Unless this is a persistent urban legend, what other definitions of NULL have been in use?
You are absolutely right when you say that a null-pointer constant (a constant integral zero, or a constant integral zero cast to void *) is guaranteed to be properly converted to the appropriate internal representation of a null pointer for the target type. Which means that there's no need for any other definition of NULL: 0 or (void *) 0 will work everywhere.
However, at the same time, very early versions of C language did not make such guarantee and did not have a standard NULL macro. Assigning an integral value to a pointer variable caused the pointer to literally point to the address represented by that integral value. Assigning constant 0 to a pointer simply made it to point to address 0. If users wanted to have a reserved pointer value in their program, they had to manually choose a "throwaway" address to use for that purpose.
It is quite possible that the macro NULL came into informal usage before the standardization of the language and before the aforementioned zero-to-pointer conversion rule came into existence. At that time NULL would have to be defined as an integral value representing that exact reserved address. E.g. a pre-standard C implementation that wanted to use address 0xBAADF00D for null pointers would define NULL as 0xBAADF00D. I can't confirm that, though, since I don't know when exactly the NULL macro first appeared "in the wild".
You're absolutely correct.
One of the "weird things" you might encounter is #define'ing NULL to ((void *)0), which would cause code that used NULL as anything but a pointer to fail.
Here are a few other variations:
http://www.tutorialspoint.com/c_standard_library/c_macro_null.htm
#define NULL ((char *)0)
or
#define NULL 0L
or
#define NULL 0
It's also worth noting that C++11 introduces nullptr to provide a type-safe disambiguation of NULL, 0, etc. Refer to "Much Ado about Nothing", or http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1601.pdf.
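As a hedged sketch of one place where the exact definition can matter in C: in a variadic call there is no parameter type to drive a conversion, so a bare NULL that expands to a plain int 0 is not guaranteed to be passed as a pointer. (print_list below is a made-up helper for illustration.)

#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Prints strings until it reaches a null pointer terminator. */
static void print_list(const char *first, ...)
{
    va_list ap;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        puts(s);
    va_end(ap);
}

int main(void)
{
    /* The explicit cast is portable no matter how NULL is defined; a bare
       NULL here is only safe if it already has a pointer type (or if int
       and pointers happen to share size and representation). */
    print_list("hello", "world", (const char *)NULL);
    return 0;
}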

When did constant 0 in pointer context acquire its special status?

As you know, in standard modern C language the constant 0 value used in pointer context acts as a null-pointer constant, which gets converted to a platform-specific (and possibly even type-specific) null-pointer value.
Meanwhile, the early versions of C language, as the one described in C Reference Manual, did not make much of a distinction between pointer and integer contexts, allowing one to freely compare and assign integers to pointers. If I am not mistaken, in that version of C the constant 0 had no special status, meaning that assigning the value of constant 0 to a pointer would simply make it point to physical address 0 (just like assigning the value of 42 to a pointer would make it point to physical address 42).
In ANSI C things have changed significantly. Now assigning the constant 0 to a pointer will place some platform-specific null-pointer value into that pointer. Null-pointer value is not required to be represented by physical 0 value.
So, at what point in the history of C language did it change from one to another? Did K&R C already incorporate the higher-level concept of null-pointer with constant 0 given its special status? Or did the K&R C still guarantee physical assignment of integers to pointers even for constant 0?
It goes back to nearly the beginning of C (if not the very beginning). If you look on page 21 of the January 1974 C reference manual, it's more or less directly stated in some sample code:
/* is pointer null? */
if (p == 0) {
Going back still a bit further, to ca. 1972-73 PDP-11/20 compiler, we find:
match(tree, table, nreg)
int tree[], table[]; {
    extern opdope[], dcalc, notcompat;
    int op, d1, d2, t1, t2, p1[], p2[];
    char mp[];

    if (tree==0)
        return(0);
    op = *tree;
At least if I'm reading this correctly, the if (tree==0) line is checking that tree is a non-null pointer before attempting to dereference it.
Unfortunately, Dennis says he can't be much more certain about the date than "1972-73".
There isn't much history of C before that. Nonetheless, there does seem to be a bit of history of 0 being treated as a null pointer. It looks to me like use of 0 as a null pointer is something that C "inherited" from Unix. The entry for exec in the November 1971 1st Edition Unix programmer's manual shows a pointer with the value 0 to signal the end of the list of arguments. According to Dennis' description, at this point "C was still to come."
Based on all this, I'd tentatively conclude that C treated 0 as a null pointer from the very beginning, or at least so early on that there's probably no longer any record of a version of the language that was otherwise.
I haven't been nearly as successful at tracking down documentation about the first point at which a null pointer might have had non-zero bits. From the viewpoint of the language, this has never been relevant. I suspect it happened fairly early on, but finding documentation to support that would be difficult. One of the earliest ports of C was to IBM System/360 mainframes, and although I can't find direct documentation of it, my guess would be that internally the null pointer value used on these machines was probably non-zero. I don't have the exact number handy, but I know that PL/I on these machines used a non-zero value for its equivalent of a null pointer; I'd guess that when they ported C to these machines, they probably used the same value.
See the C-faq question 5.4
As a matter of style, many programmers prefer not to have unadorned 0's scattered through their programs, some representing numbers and some representing pointers. Therefore, the preprocessor macro NULL is defined (by several headers, including <stddef.h> and <stdio.h>) as a null pointer constant, typically 0 or ((void *)0) (see also question 5.6). A programmer who wishes to make explicit the distinction between 0 the integer and 0 the null pointer constant can then use NULL whenever a null pointer is required.
Using NULL is a stylistic convention only; the preprocessor turns NULL back into 0 which is then recognized by the compiler, in pointer contexts, as before. In particular, a cast may still be necessary before NULL (as before 0) in a function call argument. The table under question 5.2 above applies for NULL as well as 0 (an unadorned NULL is equivalent to an unadorned 0).
NULL should be used only as a pointer constant; see question 5.9.
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102
ISO Sec. 7.1.6, Sec. 6.2.2.3
Rationale Sec. 4.1.5
H&S Sec. 5.3.2 p. 122, Sec. 11.1 p. 292
What is this infamous null pointer anyways?
The language definition states that for each pointer type, there is a special value--the "null pointer"--which is distinguishable from all other pointer values and which is "guaranteed to compare unequal to a pointer to any object or function." That is, a null pointer points definitively nowhere; it is not the address of any object or function. The address-of operator & will never yield a null pointer, nor will a successful call to malloc.[footnote] (malloc does return a null pointer when it fails, and this is a typical use of null pointers: as a "special" pointer value with some other meaning, usually "not allocated" or "not pointing anywhere yet.")
A null pointer is conceptually different from an uninitialized pointer. A null pointer is known not to point to any object or function; an uninitialized pointer might point anywhere. See also questions 1.30, 7.1, and 7.31.
As mentioned above, there is a null pointer for each pointer type, and the internal values of null pointers for different types may be different. Although programmers need not know the internal values, the compiler must always be informed which type of null pointer is required, so that it can make the distinction if necessary (see questions 5.2, 5.5, and 5.6).
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102
ISO Sec. 6.2.2.3
Rationale Sec. 3.2.2.3
H&S Sec. 5.3.2 pp. 121-3
Finally, only constant integral expressions with value 0 are guaranteed to indicate null pointers.
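A small sketch of that last point, assuming nothing beyond the standard rules quoted above:

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int zero = 0;          /* a runtime value, not a constant expression */
    int *p = 0;            /* constant 0 is a null pointer constant, so p is a null pointer */
    int *q = (int *)zero;  /* legal with the cast, but only implementation-defined to be null */

    printf("p is null: %d\n", p == NULL);
    printf("q compares equal to null here: %d\n", q == NULL);  /* typically 1, not guaranteed */
    return 0;
}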

Pointer comparisons in C. Are they signed or unsigned?

Hi, I'm sure this must be a common question, but I can't find the answer when I search for it. My question basically concerns two pointers: I want to compare their addresses and determine if one is bigger than the other. I would expect all addresses to be treated as unsigned during comparison. Is this true, and does it vary between C89, C99 and C++? When I compile with gcc, the comparison is unsigned.
If I have two pointers that I'm comparing like this:
char *a = (char *) 0x80000000; //-2147483648 or 2147483648 ?
char *b = (char *) 0x1;
Then a is greater. Is this guaranteed by a standard?
Edit to update on what I am trying to do. I have a situation where I would like to determine that if there's an arithmetic error it will not cause a pointer to go out of bounds. Right now I have the start address of the array and the end address. And if there's an error and the pointer calculation is wrong, and outside of the valid addresses of memory for the array, I would like to make sure no access violation occurs. I believe I can prevent this by comparing the suspect pointer, which has been returned by another function, and determining if it is within the acceptable range of the array. The question of negative and positive addresses has to do with whether I can make the comparisons, as discussed above in my original question.
I appreciate the answers so far. Based on my edit would you say that what I'm doing is undefined behavior in gcc and msvc? This is a program that will run on Microsoft Windows only.
Here's an over simplified example:
char letters[26];
char *do_not_read = &letters[26];
char *suspect = somefunction_i_dont_control(letters, 26);
if( (suspect >= letters) && (suspect < do_not_read) )
    printf("%c", *suspect);
Another edit, after reading AndreyT's answer it appears to be correct. Therefore I will do something like this:
char letters[26];
uintptr_t begin = (uintptr_t)letters;
uintptr_t toofar = begin + sizeof(letters);
char *suspect = somefunction_i_dont_control(letters, 26);
if( ((uintptr_t)suspect >= begin) && ((uintptr_t)suspect < toofar) )
    printf("%c", *suspect);
Thanks everyone!
Pointer comparisons cannot be signed or unsigned. Pointers are not integers.
C language (as well as C++) defines relative pointer comparisons only for pointers that point into the same aggregate (struct or array). The ordering is natural: the pointer that points to an element with smaller index in an array is smaller. The pointer that points to a struct member declared earlier is smaller. That's it.
You can't legally compare arbitrary pointers in C/C++. The result of such comparison is not defined. If you are interested in comparing the numerical values of the addresses stored in the pointers, it is your responsibility to manually convert the pointers to integer values first. In that case, you will have to decide whether to use a signed or unsigned integer type (intptr_t or uintptr_t). Depending on which type you choose, the comparison will be "signed" or "unsigned".
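A minimal sketch of that manual conversion (the numeric comparison itself is well-formed, but what the numbers mean is implementation-defined):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x, y;
    /* x and y are unrelated objects, so &x < &y is not defined by the standard.
       Converting to uintptr_t first gives a well-formed unsigned comparison. */
    uintptr_t a = (uintptr_t)(void *)&x;
    uintptr_t b = (uintptr_t)(void *)&y;
    printf("&x is numerically %s &y\n", a < b ? "below" : "not below");
    return 0;
}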
The integer-to-pointer conversion is wholly implementation defined, so it depends on the implementation you are using.
That said, you are only allowed to relationally compare pointers that point to parts of the same object (basically, to subobjects of the same struct or elements of the same array). You aren't allowed to compare two pointers to arbitrary, wholly unrelated objects.
From a draft C++ Standard 5.9:
If two pointers p and q of the same type point to different objects
that are not members of the same object or elements of the same array
or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.
So, if you cast numbers to pointers and compare them, C++ gives you unspecified results. If you take the address of elements you can validly compare, the results of comparison operations are specified independently of the signed-ness of the pointer types.
Note unspecified is not undefined: it's quite possible to compare pointers to different objects of the same type that aren't in the same structure or array, and you can expect some self-consistent result (otherwise it'd be impossible to use such pointers as keys in trees, or to sort a vector of such pointers, binary search the vector etc., where a consistent intuitive overall < ordering is needed).
Note that in older standards the behaviour was undefined - like in the 2005 WG14/N1124 draft that andrewdski links to under James McNellis's answer.
To complement the other answers, comparison between pointers that point to different objects depends on the standard.
In C99 (ISO/IEC 9899:1999 (E)), §6.5.8:
5 [...] In all other cases, the behavior is undefined.
In C++03 (ISO/IEC 14882:2003(E)), §5.9:
-Other pointer comparisons are unspecified.
I know several of the answers here say you cannot compare pointers unless they point to within the same structure, but that's a red herring and I'll try to explain why. One of your pointers points to the start of your array, the other to the end, so they are pointing to the same structure. A language lawyer could say that if your third pointer points outside of the object, the comparison is undefined, so x >= array.start might be true for all x. But this is no issue, since at the point of comparison C++ cannot know if the array isn't embedded in an even bigger structure. Furthermore, if your address space is linear, like it's bound to be these days, your pointer comparison will be implemented as an (un)signed integer comparison, since any other implementation would be slower. Even in the times of segments and offsets, (far) pointer comparison was implemented by first normalising the pointer and then comparing them as integers.
What this all boils down to then, is that if your compiler is okay, comparing the pointers without worrying about the signs should work, if all you care about is that the pointer points within the array, since the compiler should make the pointers signed or unsigned depending on which of the two boundaries a C++ object may straddle.
Different platforms behave differently in this matter, which is why C++ has to leave it up to the platform. There are even platforms in which both addresses near 0 and 80..00h are not mappable or already taken at process start-up. In that case, it doesn't matter, as long as you're consistent about it.
Sometimes this can cause compatibility issues. As an example, in Win32 pointers are unsigned. Now, it used to be the case that of the 4GB address space only the lower half (more precisely 10000h ... 7FFFFFFFh, because of the NULL-Pointer Assignment Partition) was available to applications; high addresses were only available to the kernel. This caused some people to put addresses in signed variables, and their programs would keep working since the high bit was always 0. But then came /3GB switch, which made almost 3 GB available to applications (more precisely 10000h ... BFFFFFFFh) and the application would crash or behave erratically.
You explicitly state your program will be Windows-only, which uses unsigned pointers. However, maybe you'll change your mind in the future, and using intptr_t or uintptr_t is bad for portability. I also wonder if you should be doing this at all... if you're indexing into an array it might be safer to compare indices instead. Suppose for example that you have a 1 GB array at 1500000h ... 41500000h, consisting of 16,384 elements of 64 kB each. Suppose you accidentally look up index 80,000 – clearly out of range. The pointer calculation will yield 39D00000h, so your pointer check will allow it, even though it shouldn't.
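A hedged sketch of the index-based check suggested above, using made-up sizes that match the example:

#include <stddef.h>
#include <stdio.h>

#define NELEMS 16384   /* 16,384 elements... */
#define ELEMSZ 65536   /* ...of 64 kB each, as in the example above */

/* Validate the index before forming a pointer at all: an out-of-range index
   such as 80000 is rejected here, even though the wrapped-around pointer it
   would produce could slip past an address range check. */
static char *element_at(char *base, size_t index)
{
    if (index >= NELEMS)
        return NULL;
    return base + index * ELEMSZ;
}

int main(void)
{
    /* 80000 is out of range, so no pointer is ever formed from it. */
    printf("index 80000 %s\n", element_at(NULL, 80000) ? "accepted" : "rejected");
    return 0;
}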

Can a pointer (address) ever be negative?

I have a function that I would like to be able to return special values for failure and uninitialized (it returns a pointer on success).
Currently it returns NULL for failure, and -1 for uninitialized, and this seems to work... but I could be cheating the system. IIRC, addresses are always positive, are they not? (although since the compiler is allowing me to set an address to -1, this seems strange).
[update]
Another idea I had (in the event that -1 was risky) is to malloc a char at global scope and use that address as a sentinel.
No, addresses aren't always positive - on x86_64, pointers are sign-extended and the address space is clustered symmetrically around 0 (though it is usual for the "negative" addresses to be kernel addresses).
However, the point is mostly moot, since C only defines the meaning of < and > pointer comparisons between pointers that point to parts of the same object, or one past the end of an array. Pointers to completely different objects cannot be meaningfully compared other than for exact equality, at least in standard C - if (p < NULL) has no well-defined semantics.
You should create a dummy object with static storage duration and use its address as your unintialised value:
extern char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)
It's guaranteed to have a single, unique address across your program.
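A small sketch of how that might be used, building on the two lines above (get_handle is a made-up function; the point is only that the sentinel compares unequal to NULL and to every real address):

/* in exactly one .c file: */
char uninit_sentinel;

/* hypothetical accessor distinguishing "never set up" from "set up but failed" */
void *get_handle(int ready)
{
    if (!ready)
        return UNINITIALISED;   /* distinct from NULL and from every object's address */
    return NULL;                /* NULL still means plain failure */
}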
The valid values for a pointer are entirely implementation-dependent, so, yes, a pointer address could be negative.
More importantly, however, consider (as an example of a possible implementation choice) the case where you are on a 32-bit platform with a 32-bit pointer size. Any value that can be represented by that 32-bit value might be a valid pointer. Other than the null pointer, any pointer value might be a valid pointer to an object.
For your specific use case, you should consider returning a status code and perhaps taking the pointer as a parameter to the function.
It's generally a bad design to try to multiplex special values onto a return value... you're trying to do too much with a single value. It would be cleaner to return your "success pointer" via argument, rather than the return value. That leaves lots of non-conflicting space in the return value for all of the conditions you want to describe:
int SomeFunction(SomeType **p)
{
    *p = NULL;

    if (/* check for uninitialized ... */)
        return UNINITIALIZED;

    if (/* check for failure ... */)
        return FAILURE;

    *p = yourValue;
    return SUCCESS;
}
You should also do typical argument checking (ensure that 'p' isn't NULL).
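A hedged sketch of the calling side, assuming SUCCESS, FAILURE and UNINITIALIZED are an enum and SomeType is a typedef from the surrounding code (none of these names are defined in the answer itself):

enum { SUCCESS, FAILURE, UNINITIALIZED };   /* assumed return codes */

SomeType *p = NULL;
switch (SomeFunction(&p)) {
case SUCCESS:
    /* p points at the result */
    break;
case UNINITIALIZED:
    /* nothing to use yet */
    break;
case FAILURE:
default:
    /* handle the error; p is still NULL */
    break;
}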
The C language does not define the notion of "negativity" for pointers. The property of "being negative" is a chiefly arithmetical one, not in any way applicable to values of pointer type.
If you have a pointer-returning function, then you cannot meaningfully return the value of -1 from that function. In C language, integral values (other than zero) are not implicitly convertible to pointer types. An attempt to return -1 from a pointer-returning function is an immediate constraint violation that will result in a diagnostic message. In short, it is an error. If your compiler allows it, it simply means that it doesn't enforce that constraint too strictly (most of the time they do it for compatibility with pre-standard code).
If you force the value of -1 to pointer type by an explicit cast, the result of the cast will be implementation-defined. The language itself makes no guarantees about it. It might easily prove to be the same as some other, valid pointer value.
If you want to create a reserved pointer value, there is no need to malloc anything. You can simply declare a global variable of the desired type and use its address as the reserved value. It is guaranteed to be unique.
Pointers can be negative like an unsigned integer can be negative. That is, sure, in a two's-complement interpretation, you could interpret the numerical value to be negative because the most-significant-bit is on.
What's the difference between failure and uninitialized? If uninitialized is not another kind of failure, then you probably want to redesign the interface to separate these two conditions.
Probably the best way to do this is to return the result through a parameter, so the return value only indicates an error. For example where you would write:
void *func();

void *result = func();
if (result == 0) {
    /* handle error */
} else if (result == (void *)-1) {
    /* uninitialized */
} else {
    /* initialized */
}
Change this to
// sets *a to the returned object
// *a will be null if the object has not been initialized
// returns 0 on success, nonzero on error
int func(void **a);

void *result;
if (func(&result)) {
    /* handle error */
    return;
}

/* do real stuff now */
if (!result) {
    /* initialize */
}
/* continue using the result now that it's been initialized */
@James is correct, of course, but I'd like to add that pointers don't always represent absolute memory addresses, which theoretically would always be positive. Pointers can also represent addresses relative to some point in memory, often a stack or frame pointer, and those can be both positive and negative.
So your best bet is to have your function accept a pointer to a pointer as a parameter and fill that pointer with a valid pointer value on success while returning a result code from the actual function.
James's answer is probably correct, but of course it describes an implementation choice, not a choice that you can make.
Personally, I think addresses are "intuitively" unsigned. Finding a pointer that compares as less-than a null pointer would seem wrong. But ~0 and -1, for the same integer type, give the same value. If it's intuitively unsigned, ~0 may make a more intuitive special-case value - I use it for error-case unsigned ints quite a lot. It's not really different (zero is an int by default, so ~0 is -1 until you cast it) but it looks different.
Pointers on 32-bit systems can use all 32 bits BTW, though -1 or ~0 is an extremely unlikely pointer to occur for a genuine allocation in practice. There are also platform-specific rules - for example on 32-bit Windows, a process can only have a 2GB address space, and there's a lot of code around that encodes some kind of flag into the top bit of a pointer (e.g. for balancing flags in balanced binary trees).
Actually (at least on x86), the NULL-pointer exception is generated not only by dereferencing the NULL pointer, but also by dereferencing a larger range of addresses (e.g. the first 64 KB). This helps catch such errors as
int* x = NULL;
x[10] = 1;
So, there are more addresses that are guaranteed to generate the NULL-pointer exception when dereferenced.
Now consider this code (made compilable for AndreyT):
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define ERR_NOT_ENOUGH_MEM ((int)NULL)
#define ERR_NEGATIVE       ((int)NULL + 1)
#define ERR_NOT_DIGIT      ((int)NULL + 2)

char *fn(int i)
{
    if (i < 0)
        return (char *)ERR_NEGATIVE;
    if (i >= 10)
        return (char *)ERR_NOT_DIGIT;
    char *rez = (char *)malloc(strlen("Hello World ") + sizeof(char) * 2);
    if (rez)
        sprintf(rez, "Hello World %d", i);
    return rez;
}

int main(void)
{
    char *rez = fn(3);
    switch ((int)rez) {
    case ERR_NOT_ENOUGH_MEM: printf("Not enough memory!\n"); break;
    case ERR_NEGATIVE:       printf("The parameter was negative\n"); break;
    case ERR_NOT_DIGIT:      printf("The parameter is not a digit\n"); break;
    default:                 printf("we received %s\n", rez);
    }
    return 0;
}
this could be useful in some cases.
It won't work on some Harvard architectures, but will work on von Neumann ones.
Do not use malloc for this purpose. It might keep unnecessary memory tied up (if a lot of memory is already in use when malloc gets called and the sentinel gets allocated at a high address, for example) and it confuses memory debuggers/leak detectors. Instead simply return a pointer to a local static const char object. This pointer will never compare equal to any pointer the program could obtain in any other way, and it only wastes one byte of bss.
You don't need to care about the signedness of a pointer, because it's implementation defined. The real question here is "how to return special values from a function returning pointer?" which I've explained in detail in my answer to the question Pointer address span on various platforms.
In summary, the all-one bit pattern (-1) is (almost) always safe, because it's already at the end of the spectrum and data cannot be stored wrapped around to the first address, and the malloc family never returns -1. In fact this value is even returned by many Linux system calls and Win32 APIs to indicate another state for the pointer. So if you need just failure and uninitialized then it's a good choice
But you can return far more error states by utilizing the fact that variables must be aligned properly (unless you specified some other options). For example in a pointer to int32_t the low 2 bits are always zero which means only ¹⁄₄ of the possible values are valid addresses, leaving all of the remaining bit patterns for you to use. So a simple solution would be just checking the lowest bit
int* result = func();
if (!result)
    error_happened();
else if ((uintptr_t)result & 1)
    uninitialized();
In this case you can return both a valid pointer and some additional data at the same time
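A sketch of the producer side to pair with the check above; the names and the specific encoding are illustrative, not taken from any library (this func takes a flag parameter purely for demonstration):

#include <stdint.h>
#include <stdlib.h>

#define UNINITIALIZED_PTR ((int *)(uintptr_t)1)   /* bit 0 set: never a valid, aligned int * */

/* NULL still means failure, the tagged value means "not initialized yet",
   and anything else is a real, properly aligned pointer from malloc. */
static int *func(int initialized)
{
    if (!initialized)
        return UNINITIALIZED_PTR;
    return malloc(sizeof(int));
}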
You can also use the high bits for storing data in 64-bit systems. On ARM there's a flag that tells the CPU to ignore the high bits in the addresses. On x86 there isn't a similar thing but you can still use those bits as long as you make it canonical before dereferencing. See Using the extra 16 bits in 64-bit pointers
See also
Is ((void *) -1) a valid address?
NULL is the only valid error return in this case; this is true any time an unsigned value such as a pointer is returned. It may be true that in some cases pointers will not be large enough to use the sign bit as a data bit; however, since pointers are controlled by the OS, not the program, I would not rely on this behavior.
Remember that a pointer on a 32-bit system is basically a 32-bit value; whether this is a possibly negative or always positive number is just a matter of interpretation, i.e. whether the 32nd bit is interpreted as the sign bit or as a data bit. So if you interpreted 0xFFFFFFFF as a signed number it would be -1; if you interpreted it as an unsigned number it would be 4294967295. Technically, it is unlikely that a pointer would ever be this large, but this case should be considered anyway.
As far as an alternative you could use an additional out parameter (returning NULL for all failures), however this would require clients to create and pass a value even if they don't need to distinguish between specific errors.
Another alternative would be to use the GetLastError/SetLastError mechanism to provide additional error information (This would be specific to Windows, don't know if that is an issue or not), or to throw an exception on error instead.
Positive or negative is not a meaningful attribute of a pointer type. It pertains to signed integer types such as signed char, short, int, etc.
People talk about negative pointers mostly in situations that treat a pointer's machine representation as an integer type, e.g. reinterpret_cast<intptr_t>(ptr). In that case, they are actually talking about the converted integer, not the pointer itself.
In some scenarios I think a pointer is inherently unsigned; we talk about addresses in terms of below or above. 0xFFFF.FFFF is above 0x0AAA.0000, which is intuitive for human beings, although 0xFFFF.FFFF is actually "negative" while 0x0AAA.0000 is positive.
But in other scenarios, such as pointer subtraction (ptr1 - ptr2), which results in a signed value of type ptrdiff_t, it's inconsistent when you compare with integer subtraction: signed_int_a - signed_int_b results in a signed int type, and unsigned_int_a - unsigned_int_b produces an unsigned type. But pointer subtraction produces a signed type, because the semantics are the distance between two pointers, and the unit is the number of elements.
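A small sketch of that signed result (%td is the printf conversion for ptrdiff_t):

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int a[10];
    int *p = &a[2];
    int *q = &a[7];
    ptrdiff_t d = q - p;            /* distance in elements, signed */
    printf("q - p = %td\n", d);     /* prints 5 */
    printf("p - q = %td\n", p - q); /* prints -5: pointer subtraction can be negative */
    return 0;
}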
In summary I suggest treating the pointer type as a standalone type; every type has its own set of operations on it. For pointers (excluding function pointers, member function pointers, and void *) these are:
+, +=
ptr + any_integer_type
-, -=
ptr - any_integer_type
ptr1 - ptr2
++ both prefix and postfix
-- both prefix and postfix
Note there are no /, *, or % operations for pointers. That also supports the idea that a pointer should be treated as a standalone type, rather than "a type similar to int" or "a type whose underlying type is int so it should look like int".

Resources