In the open source component cjson, there is this macro:
#define is_error(ptr) ((unsigned long)ptr > (unsigned long)-4000L)
The above statement is used to check the validity of a pointer, as shown below:
json_object* reply = json_object_new_object();
if (!reply || is_error(reply))
{
. . . //error handling
}
How does comparing a pointer with (unsigned long)-4000L validate the pointer?
The reason for this appears to be that they're using the pointer value to carry "either a pointer or an error value".
Look here:
struct json_object* json_tokener_parse(const char *str)
{
struct json_tokener* tok;
struct json_object* obj;
tok = json_tokener_new();
obj = json_tokener_parse_ex(tok, str, -1);
if(tok->err != json_tokener_success)
obj = error_ptr(-tok->err); // <<<<<---
json_tokener_free(tok);
return obj;
}
The function is returning a special value as a pointer. The error_ptr macro turns the negated error code into a pointer value, presumably because the author assumes this will never be a valid pointer address.
Here is a test that demonstrates the expected usage of the macro, i.e. malformed JSON.
new_obj = json_tokener_parse("{ foo }");
if(is_error(new_obj)) printf("got error as expected\n");
So, the reason for using that special value is so they can hold "either a pointer to a structure or an error code". This could also be done with a union or a struct, or by some other means, but they chose to do it this way.
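To see the round trip concretely, here is a minimal compilable sketch. The error_ptr definition is reconstructed from old json-c headers (an assumption, not a verbatim quote), and the pointer/integer casts involved are implementation-defined, so treat this as illustrative only:

#include <stdio.h>

/* error_ptr reconstructed from old json-c headers -- an assumption,
   not a verbatim quote */
#define error_ptr(error) ((void *)(error))
#define is_error(ptr) ((unsigned long)(ptr) > (unsigned long)-4000L)

int main(void)
{
    long err = 2;                /* a small positive tokener error code */
    void *obj = error_ptr(-err); /* -2 becomes an address near the top  */
    if (is_error(obj))           /* top ~4000 addresses mean "error"    */
        printf("error code %ld\n", -(long)obj);
    return 0;
}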
Wrong check! There should not be any such value as (unsigned long)-4000L. A pointer should be either NULL (not allocated) or non-NULL (holding a valid value). Whenever the pointer is deallocated, it should be reset to NULL, I mean reassigned to NULL, to avoid a dangling pointer. A non-NULL check is then enough to know the pointer is valid and points to valid content.
It doesn't, except in some very specific cases - maybe the original author had a pointer bug where his pointers were corrupted in a way that put them in the "range" of this comparison, or his compiler and operating system have some strange non-standard pointer validation or interpretation "features".
-4000 interpreted as an unsigned 32-bit integer is 0xfffff060.
If it were code running in the kernel, it might have something to do with the userland/kernel memory space divide (i.e. top half vs. bottom half of the address space), but even then it would probably be wrong. My money is on the first idea.
It looks like the author is making platform-specific assumptions about where virtual memory will be allocated. In this case the assumption is that a valid pointer will never land in the last 4000 or so bytes of the 4GB address space [edit: that's on the (dubious) assumption that long is 32 bits].
Your particular example seems to come from a thing called OpenWebOS. I don't know what that is; but if they are the makers of the Operating System, then they get to make up their own rules about where the pointers will go. Perhaps that OS even has a convention that addresses above a certain value are used to signal errors.
If you want to write portable code, you should think hard before trying tricks like this.
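For contrast, here is a minimal sketch of a more portable shape for the same idea (all names are invented for illustration): the status travels in an out parameter, and NULL is the only special pointer value.

#include <stdio.h>

typedef enum { PARSE_OK, PARSE_SYNTAX_ERROR } parse_status;

static const char *parse_word(const char *input, parse_status *status)
{
    if (input == NULL || *input == '\0') {
        *status = PARSE_SYNTAX_ERROR;
        return NULL;             /* NULL is the only special pointer */
    }
    *status = PARSE_OK;
    return input;
}

int main(void)
{
    parse_status st;
    const char *w = parse_word("hello", &st);
    if (st != PARSE_OK)
        printf("parse failed\n");
    else
        printf("parsed: %s\n", w);
    return 0;
}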
Related
I found a trick on a YouTube video explaining how you can get the offset of a struct member by using a NULL pointer. I understand the code snippet below (the casts, the ampersand, and so on), but I do not understand why this works with the NULL pointer. I thought that the NULL pointer could not point to anything, so I cannot mentally visualize how it works. Second, the NULL pointer is not always represented by the compiler as being 0; sometimes it is a non-zero value. But then how could this piece of code work correctly? Or wouldn't it work correctly anymore?
#include <stdio.h>
int main(void)
{
/* Getting the offset of a variable inside a struct */
typedef struct {
int a;
char b[23];
float c;
} MyStructType;
unsigned offset = (unsigned)(&((MyStructType * )NULL)->c);
printf("offset = %u\n", offset);
return 0;
}
I found a trick on a YouTube video explaining how you can get the
offset of a struct member by using a NULL pointer.
Well, at least you came here to ask about the random Internet advice you turned up. We're an Internet resource ourselves, of course, but I like to think that our structure and reputation gives you a basis for estimating the reliability of what we have to say.
I understand the
code snippet below (the casts, the ampersand, and so on), but I do not
understand why this works with the NULL pointer. I thought that the
NULL pointer could not point to anything.
Yes, from the perspective of C semantics, a null pointer definitely does not point to anything, and NULL is a null pointer constant.
So I cannot mentally
visualize how it works.
The (flawed) idea is that
NULL is equivalent to a pointer to address 0 in a flat address space (unsafe assumption);
((MyStructType * )NULL)->c designates the member c of an altogether hypothetical object of type MyStructType residing at that address (not supported by the standard);
applying the & operator yields the address that such a member would have if it in fact existed (not supported by the standard); and
converting the resulting address to an integer yields an address in the assumed flat address space, expressed in units the size of a C char (in no way guaranteed);
so that the resulting integer simultaneously represents both an absolute address and an offset (follows from the previous assumptions, because the supposed base address of the hypothetical structure is 0).
Second, the NULL pointer is not always
represented by the compiler as being 0, sometimes it is a non-zero
value.
Quite right, that is one of the flaws in the scheme presented.
But then how could this piece of code work correctly? Or
wouldn't it work correctly anymore?
Although the Standard provides no basis to justify relying on the code to behave as advertised, that does not mean that it must necessarily fail. C implementations do need to be internally consistent about how they represent null pointers, and -- to a certain degree -- about how they convert between pointers and integers. It turns out to be fairly common that the code's assumptions about those things are in fact satisfied by implementations.
So in practice, the code does work with many C implementations. But it systematically produces the wrong answer with some others, and there may be some in which it produces the right answer some appreciable fraction of the time, but the wrong answer the rest of the time.
Note that this code actually invokes undefined behaviour: dereferencing a NULL pointer is never allowed, even if no value is accessed and only the address is taken (this was a root cause of a Linux kernel exploit).
Use offsetof instead for a safe alternative.
As to why it seems to work with a NULL pointer: it assumes that NULL is 0. Basically you could use any pointer and calculate:
MyStructType t;
unsigned off = (unsigned)(&(&t)->c) - (unsigned)&t;
if &t == 0, this becomes:
unsigned off = (unsigned)(&(0)->c) - 0;
Subtracting 0 is a no-op.
This code is platform-specific: it might cause undefined behaviour on one platform and work on others.
That's why the C standard requires every implementation to provide the offsetof macro, which may internally expand to code like dereferencing the NULL pointer; at least you can be sure the code will not crash on any platform:
#include <stddef.h> /* offsetof */

typedef struct Struct
{
double d;
} Struct;

size_t off = offsetof(Struct, d);
This question reminds me of something I saw more than 30 years ago:
#define XtOffset(p_type,field) \
((Cardinal) (((char *) (&(((p_type)NULL)->field))) - ((char *) NULL)))
#ifdef offsetof
#define XtOffsetOf(s_type,field) offsetof(s_type,field)
#else
#define XtOffsetOf(s_type,field) XtOffset(s_type*,field)
#endif
from xorg-libXt/include/X11/Intrinsic.h X11R4.
They took into account that a null pointer could be different from 0x0 and included that in the definition of the XtOffsetOf macro.
This is a dirty hack and might not necessarily work.
(MyStructType * )NULL creates a null pointer. Null pointer and null pointer constant are two different terms. NULL is guaranteed to be a null pointer constant, but the null pointer we obtain when converting it to another pointer type can have any implementation-defined representation.
So it happened to work by luck on your specific system; you could just as well have gotten some strange value.
The offsetof macro has been standard C since 1989 so maybe your Youtube hacker is still stuck in the early 1980s.
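For reference, the program from the question rewritten with the standard macro; this version has no undefined behaviour:

#include <stdio.h>
#include <stddef.h> /* offsetof */

typedef struct {
    int a;
    char b[23];
    float c;
} MyStructType;

int main(void)
{
    /* offsetof is well defined; no null pointer is dereferenced */
    printf("offset = %zu\n", offsetof(MyStructType, c));
    return 0;
}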
While switching from Linux back to Windows, I noticed that my code stopped working. Using the trusty debugger, I found that structs were being initialised differently.
typedef struct base{
struct item * first;
}BASE;
typedef BASE *SPACE;
...
hashmap = malloc(sizeof(SPACE *) * length);
hashSpaceSize = length;
Look at this code for example (I hid extra code to keep it tidy; also ignore struct item, it's not useful here). Let's say that length is 3. In Linux, when I check the debugger, I see that:
hashmap[0] = NULL;
hashmap[1] = NULL;
hashmap[2] = NULL;
Because I did not initialise the BASEs, I only initialised the fact that there is an array of them. However, in Windows, I see that all of the BASES are initialised. Not only that, but all of the ITEMs within the BASEs are initialised as well.
However, if I, for example, immediately afterwards add this line:
hashmap[0]->first = NULL, I end up with a SIGSEGV error that I can't find the cause of. In Linux, this is because hashmap[0] is NULL, and hence hashmap[0]->first can't even be accessed in the first place. But on Windows, it clearly shows that hashmap[0] exists and has an initialised first value.
I don't know what is going on here, and I can't find anything regarding this bug. If more code is needed, everything is on my GitHub (linked to the actual file this code is in). But for now, I'm confused as to what's going on...
UPDATE: Apparently I had some looking up to do. I didn't know that malloc returns uninitialized memory rather than just NULLs; the zeros I saw were simply what Linux happened to hand me. Thanks though, learnt something new today.
Let's say that length is 3. In Linux, when I check the debugger, I see
that:
hashmap[0] = NULL;
hashmap[1] = NULL;
hashmap[2] = NULL;
Because I did not initialise the BASEs, I only initialised the fact
that there is an array of them.
No. You get all of those being NULL because that happens to be what you get. C does not specify the initial contents of the memory returned by malloc(), and if you performed that allocation under other circumstances then you might not get all NULLs.
However, in Windows, I see that all of
the BASES are initialised. Not only that, but all of the ITEMs within
the BASEs are initialised as well.
They may have non-NULL values, but that's very different from being initialized. The values are very likely to be wild pointers. If they happen to point to accessible memory then you can interpret the data where they point as ITEMs, but again, that does not mean they are initialized, or that it is safe to access that memory. You are delving into undefined behavior here.
However, if I, for example, immediately afterwards add this line:
hashmap[0]->first = NULL, I end up with a SIGSEGV error that I can't
find the cause of.
We can't speak to the cause of your segmentation fault because you have not presented the code responsible, but having an array of pointers does not mean the pointer values within are valid. If they are not, then dereferencing them produces undefined behavior, which often manifests as a segfault. Note well that this does not depend on those pointers being NULL; it can attend accessing any pointer value that does not point to an object belonging to your program and having compatible type.
In Linux, this is because hashmap[0] is NULL, and
hence hashmap[0]->first can't even be accessed in the first place. But
on Windows, it clearly shows that hashmap[0] exists and has an
initialised first value.
No, it doesn't. Again, your debugger shows hashmap[0] having a non-NULL value, which is not at all the same thing.
It is your responsibility to avoid dereferencing invalid pointer values, which are by no means limited to NULL.
The values of the bytes pointed to after a successful call to malloc are uninitialized. That means they may hold any arbitrary value, including zero. So the bytes being zero or non-zero doesn't mean they were initialized.
Section 7.22.3.4 of the C standard, regarding malloc, states:
#include <stdlib.h>
void *malloc(size_t size);
The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.
So there are no guarantees what the memory returned by malloc will contain.
If on the other hand you use calloc, that function will initialize all allocated bytes to 0.
hashmap = calloc(length, sizeof(SPACE *));
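If the allocation must remain a plain malloc, explicitly zeroing the block gives the same result. A sketch with a hypothetical helper (note the C standard does not strictly promise that all-bits-zero reads back as null pointers, though it does on mainstream platforms; the same caveat applies to calloc):

#include <stdlib.h>
#include <string.h>

/* hypothetical helper: allocate an array of n pointers and zero it */
void **alloc_zeroed_pointers(size_t n)
{
    void **p = malloc(n * sizeof *p);
    if (p != NULL)
        memset(p, 0, n * sizeof *p);
    return p;
}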
I have a callback function written in C that runs on a server and MUST be crash-proof. That is, if it expects an integer but is passed a character string pointer, I must determine that inside the function and prevent the segmentation faults that come from doing something not allowed on the incorrect parameter type.
The function prototype is:
void callback_function(parameter_type a, union param_2 b);
and 'a' is supposed to tell me, via an enum, whether 'b' is an integer or a character string pointer.
If the calling function programmer makes a mistake, and tells me in 'a' that 'b' is an integer, and 'b' is really a character string pointer, then how do I determine that without crashing the callback function code. This runs on a server and must keep going if the caller function made a mistake.
The code has to be in C and be portable, so C library extensions are not possible. The compiler is gcc v4.8.2.
The size of an integer on the platform is 4 bytes, and so is the size of a character pointer.
An integer could have the same value, numerically, as a character pointer, and vice versa.
If I think I got a character pointer and it's not one, then when I try to read what it points to, I of course get a segmentation fault.
If I write a signal handler to handle the fault, how do I now "clear" the signal, and resume execution at a sane place?
Did I mention that 'b' is a union defined as:
union param_2 {
char * c;
int i;
} param_to_be_passed;
I think that's about it.
Thank You for your answers.
That is not possible.
There's no way to "look" at a pointer and determine if it's valid to dereference, except of course for NULL-checking it.
Other than that, there's no magic way to know if a pointer points at character data, an integer, a function, or anything else.
You are looking for a hack.
Whatever proposal comes, do not use such things in production.
If late binding is needed, take a different, fail-safe approach.
If you're writing code for an embedded device, you would expect all variables to reside in RAM. For example, you might have 128 kB of RAM from addresses 0x20000000 to 0x20020000. If you were passed a pointer to a memory address outside this range, that would be another way to determine something was wrong, in addition to checking for a NULL address.
/* uintptr_t comes from <stdint.h>; comparing raw pointers with integer
   literals is not valid C without the casts */
if ((a == STRING) && ((b.c == NULL) ||
    ((uintptr_t)b.c < 0x20000000u) || ((uintptr_t)b.c > 0x20020000u)))
    return ERROR;
If you're working in a multithreaded environment, you may be able to take this a step further and require all addresses passed to callback_function come from a certain thread's memory space (stack).
If the caller says in a that the value is an int, there is no great risk of a crash, because:
in your case both types have the same length (be aware that this is NOT GUARANTEED TO BE PORTABLE!)
The C standard says (ISO, sect. 6.3.2.3): "Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined."
But fortunately, most 32-bit values will be valid integers.
Keep in mind that in the worst case the value could be meaningless. So it's up to you to avoid the crash by systematically verifying the consistency of the value (for example, do bounds checks if you use the integer to address some array elements).
If the caller says in "a" that the result is a pointer but provides an int, it's much more difficult to avoid a crash in a portable manner.
The ISO standard says: "An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation."
In practice, most of these errors are trapped as memory access exceptions at a very low system level. The behaviour being implementation-defined, there's no portable way of detecting it.
NOTE: This doesn't actually attempt to make the function "crash-proof", because I suspect that's not possible.
If you are allowed to change the API, one option may be to wrap the union in a tagged struct and only use an API for accessing the value:
typedef enum Type { STRING, INT } Type;
typedef struct StringOrInt {
Type type;
union { int i; char* s; } value;
} StringOrInt;
void soi_set_int(StringOrInt* v, int i) {
v->type = INT;
v->value.i = i;
}
void soi_set_string(StringOrInt* v, char* s) {
v->type = STRING;
v->value.s = s;
}
Type soi_get_type(StringOrInt const* v) {
return v->type;
}
int soi_get_int(StringOrInt const* v) {
assert(v->type == INT);
return v->value.i;
}
char* soi_get_string(StringOrInt const* v) {
assert(v->type == STRING);
return v->value.s;
}
While this doesn't actually make it crash proof, users of the API will find it more convenient to use the API than change the members by hand, reducing the errors significantly.
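A short usage sketch, assuming the definitions above are in scope:

#include <stdio.h>

int main(void)
{
    StringOrInt v;
    soi_set_string(&v, "hello");

    if (soi_get_type(&v) == STRING)
        printf("%s\n", soi_get_string(&v)); /* the asserts guard misuse */
    return 0;
}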
Run-time type checking in C is effectively impossible.
The burden is on the caller to pass the data correctly; there's no (good, standard, portable) way for you to determine whether b contains data of the correct type (that is, that the caller didn't pass you a pointer value as an integer or vice versa).
The only suggestion I can make is to create two separate callbacks, one of which takes an int and the other a char *, and put the burden on the compiler to do type checking at compile time.
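A sketch of that idea, with invented names; the compiler now rejects a char * passed where an int is expected, instead of crashing at run time:

#include <stdio.h>

void callback_int(int value) { printf("int: %d\n", value); }
void callback_string(const char *s) { printf("string: %s\n", s); }

int main(void)
{
    callback_int(42);
    callback_string("hello");
    /* callback_int("oops"); would now draw a compile-time diagnostic */
    return 0;
}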
I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE, and I'm used to Eclipse from Java development), and I'm stuck with something weird. In one part of my code, I initialize a char array in a function, and it is by default pointing to the same location as one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
Ok, at the part I've marked with [*], there is a breakpoint. Even at that breakpoint, when I check my locals, I see that retval is pointing to the same address as my argument input. That is not even possible, as input comes from another function and retval is created in this function. Is it me being inexperienced with C and missing something, or is there a bug somewhere in the C compiler?
It seems so obvious to me that they shouldn't point to the same (and a valid, of course; they aren't NULL) location. When the code goes on, it literally messes up everything; I get random characters and shapes in the console, and the program crashes.
I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.
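One way the function from the question could be rewritten; a sketch only, where the caller must free() the result:

#include <stdlib.h>
#include <string.h>

char* subdir(const char input[], const char dir[]){
    size_t totallen = strlen(input) + strlen(dir) + 1; /* one terminator */
    char *retval = malloc(totallen);
    if (retval == NULL)
        return NULL;              /* allocation failed */
    strcpy(retval, input);
    strcat(retval, dir);
    return retval;                /* heap memory survives the return */
}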
What cnicutar said.
I hate people who do this, so I hate me ... but ... Arrays of non-constant size are a C99 feature and are not supported by C++. Of course GCC has extensions to make it happen.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and specify the calls should call free() on it when done. These solutions all suck because they are error prone to leaks or truncation or overflow. :/
Well, your fundamental problem is that you are returning a pointer to the stack allocated VLA. You can't do that. Pointers to local variables are only valid inside the scope of the function that declares them. Your code results in Undefined Behaviour.
At least I am assuming that somewhere in the "..." of the real code there is the line return retval;.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
Try changing retval to a character pointer and allocating your buffer using malloc().
Pass the two string arguments as char * or const char *.
Rather than returning char *, you should just pass another parameter with a string pointer for which you have already malloc'd space.
Return a bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly, don't forget to free the memory, since you had to malloc space for the string on the heap...
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

//retstr is not const like the other two
bool subdir(const char *input, const char *dir, char *retstr){
strcpy(retstr, input);
strcat(retstr, dir);
return true;
}

int main()
{
char h[]="Hello ";
char w[]="World!";
char *greet=malloc(strlen(h)+strlen(w)+1); //Size of the result plus room for the terminator!
subdir(h,w,greet);
printf("%s",greet);
free(greet);
return 0;
}
This will print: "Hello World!" added together by your function.
Also, when you're creating a string on the fly like this you must malloc: the compiler doesn't know at compile time how long the two other strings are going to be, and a stack array such as char greet[totallen]; could not be returned from the function that creates it.
I have a function that I would like to be able to return special values for failure and uninitialized (it returns a pointer on success).
Currently it returns NULL for failure and -1 for uninitialized, and this seems to work... but I could be cheating the system. IIRC, addresses are always positive, are they not? (Although, since the compiler is allowing me to set an address to -1, this seems strange.)
[update]
Another idea I had (in the event that -1 is risky) is to malloc a char at global scope, and use that address as a sentinel.
No, addresses aren't always positive - on x86_64, pointers are sign-extended and the address space is clustered symmetrically around 0 (though it is usual for the "negative" addresses to be kernel addresses).
However the point is mostly moot, since C only defines the meaning of < and > comparisons between pointers that point into the same object, or one past the end of an array. Pointers to completely different objects cannot be meaningfully compared other than for exact equality, at least in standard C; if (p < NULL) has no well-defined semantics.
You should create a dummy object with static storage duration and use its address as your uninitialised value:
extern char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)
It's guaranteed to have a single, unique address across your program.
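Usage might then look like this sketch (assuming the declarations above; the sentinel must be defined in exactly one translation unit):

char uninit_sentinel; /* the single definition backing the extern above */

void example(void)
{
    void *p = UNINITIALISED;
    /* ... code that may or may not assign a real result to p ... */
    if (p == UNINITIALISED) {
        /* never assigned */
    } else if (p == NULL) {
        /* assigned, but the operation failed */
    }
}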
The valid values for a pointer are entirely implementation-dependent, so, yes, a pointer address could be negative.
More importantly, however, consider (as an example of a possible implementation choice) the case where you are on a 32-bit platform with a 32-bit pointer size. Any value that can be represented by that 32-bit value might be a valid pointer. Other than the null pointer, any pointer value might be a valid pointer to an object.
For your specific use case, you should consider returning a status code and perhaps taking the pointer as a parameter to the function.
It's generally a bad design to try to multiplex special values onto a return value... you're trying to do too much with a single value. It would be cleaner to return your "success pointer" via argument, rather than the return value. That leaves lots of non-conflicting space in the return value for all of the conditions you want to describe:
int SomeFunction(SomeType **p)
{
*p = NULL;
if (/* check for uninitialized ... */)
return UNINITIALIZED;
if (/* check for failure ... */)
return FAILURE;
*p = yourValue;
return SUCCESS;
}
You should also do typical argument checking (ensure that 'p' isn't NULL).
The C language does not define the notion of "negativity" for pointers. The property of "being negative" is a chiefly arithmetical one, not in any way applicable to values of pointer type.
If you have a pointer-returning function, then you cannot meaningfully return the value of -1 from that function. In the C language, integral values (other than zero) are not implicitly convertible to pointer types. An attempt to return -1 from a pointer-returning function is an immediate constraint violation that will result in a diagnostic message. In short, it is an error. If your compiler allows it, that simply means it doesn't enforce that constraint too strictly (most of the time they do this for compatibility with pre-standard code).
If you force the value of -1 to pointer type by an explicit cast, the result of the cast will be implementation-defined. The language itself makes no guarantees about it. It might easily prove to be the same as some other, valid pointer value.
If you want to create a reserved pointer value, there is no need to malloc anything. You can simply declare a global variable of the desired type and use its address as the reserved value. It is guaranteed to be unique.
Pointers can be negative in the same sense that an unsigned integer can be negative: sure, under a two's-complement interpretation you could read the numerical value as negative because the most significant bit is on.
What's the difference between failure and uninitialized? If uninitialized is not another kind of failure, you probably want to redesign the interface to separate these two conditions.
Probably the best way to do this is to return the result through a parameter, so the return value only indicates an error. For example where you would write:
void* func();
void* result=func();
if (result==0)
/* handle error */
else if (result==(void*)-1)
/* uninitialized */
else
/* initialized */
Change this to
// sets *a to the returned object
// *a will be null if the object has not been initialized
// returns 0 on success, nonzero on error
int func(void** a);
void* result;
if (func(&result)){
/* handle error */
return;
}
/*do real stuff now*/
if (!result){
/* initialize */
}
/* continue using the result now that it's been initialized */
@James is correct, of course, but I'd like to add that pointers don't always represent absolute memory addresses, which theoretically would always be positive. Pointers can also represent addresses relative to some point in memory, often a stack or frame pointer, and those relative addresses can be both positive and negative.
So your best bet is to have your function accept a pointer to a pointer as a parameter and fill that pointer with a valid pointer value on success while returning a result code from the actual function.
James's answer is probably correct, but of course it describes an implementation choice, not a choice that you can make.
Personally, I think addresses are "intuitively" unsigned. Finding a pointer that compares as less-than a null pointer would seem wrong. But ~0 and -1, for the same integer type, give the same value. If it's intuitively unsigned, ~0 may make a more intuitive special-case value - I use it for error-case unsigned ints quite a lot. It's not really different (zero is an int by default, so ~0 is -1 until you cast it) but it looks different.
Pointers on 32-bit systems can use all 32 bits, BTW, though -1 or ~0 is an extremely unlikely pointer for a genuine allocation in practice. There are also platform-specific rules: for example, on 32-bit Windows a process can only have a 2GB address space, and there's a lot of code around that encodes some kind of flag into the top bit of a pointer (e.g. for balancing flags in balanced binary trees), as sketched below.
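A sketch of that top-bit trick, assuming the 32-bit Windows layout described above (user addresses stay below 0x80000000, so bit 31 is free); it is entirely platform-specific and not portable C:

#include <stdint.h>

#define FLAG_BIT ((uintptr_t)1u << 31)

void *set_flag(void *p)   { return (void *)((uintptr_t)p | FLAG_BIT); }
int   has_flag(void *p)   { return ((uintptr_t)p & FLAG_BIT) != 0; }
void *strip_flag(void *p) { return (void *)((uintptr_t)p & ~FLAG_BIT); }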
Actually (at least on x86), the NULL-pointer exception is generated not only by dereferencing the NULL pointer, but also by a larger range of addresses (e.g. the first 64 kB). This helps catch errors such as:
int* x = NULL;
x[10] = 1;
So there is a wider range of addresses that will generate the NULL-pointer exception when dereferenced.
Now consider this code (made compilable for AndreyT):
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define ERR_NOT_ENOUGH_MEM ((int)NULL)
#define ERR_NEGATIVE ((int)NULL + 1)
#define ERR_NOT_DIGIT ((int)NULL + 2)
char* fn(int i){
if (i < 0)
return (char*)ERR_NEGATIVE;
if (i >= 10)
return (char*)ERR_NOT_DIGIT;
char* rez = (char*)malloc(strlen("Hello World ")+sizeof(char)*2);
if (rez)
sprintf(rez, "Hello World %d", i);
return rez;
}
int main(){
char* rez = fn(3);
switch((int)rez){
case ERR_NOT_ENOUGH_MEM: printf("Not enough memory!\n"); break;
case ERR_NEGATIVE: printf("The parameter was negative\n"); break;
case ERR_NOT_DIGIT: printf("The parameter is not a digit\n"); break;
default: printf("we received %s\n", rez);
};
return 0;
}
this could be useful in some cases.
It won't work on some Harvard architectures, but will work on von Neumann ones.
Do not use malloc for this purpose. It might keep unnecessary memory tied up (if a lot of memory is already in use when malloc gets called and the sentinel gets allocated at a high address, for example) and it confuses memory debuggers/leak detectors. Instead simply return a pointer to a local static const char object. This pointer will never compare equal to any pointer the program could obtain in any other way, and it only wastes one byte of bss.
You don't need to care about the signedness of a pointer, because it's implementation-defined. The real question here is "how do I return special values from a function returning a pointer?", which I've explained in detail in my answer to the question Pointer address span on various platforms.
In summary, the all-ones bit pattern (-1) is (almost) always safe, because it's already at the very end of the address space (an object cannot wrap around past the last address), and the malloc family never returns -1. In fact this value is even returned by many Linux system calls and Win32 APIs to indicate another state for the pointer. So if you need just failure and uninitialized, it's a good choice.
But you can signal far more error states by exploiting the fact that variables must be properly aligned (unless you request otherwise). For example, in a pointer to int32_t the low 2 bits are always zero, which means only ¹⁄₄ of the possible values are valid addresses, leaving all the remaining bit patterns for you to use. A simple version just checks the lowest bit:
#include <stdint.h> /* uintptr_t */

int* result = func();
if (!result)
    error_happened();
else if ((uintptr_t)result & 1)
    uninitialized();
In this case you can return both a valid pointer and some additional data at the same time
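The encoding side of the same trick might look like this sketch, assuming int pointers are at least 4-byte aligned so bit 0 is free:

#include <stdint.h>

int *mark_uninitialized(int *p)
{
    return (int *)((uintptr_t)p | (uintptr_t)1);
}

int *strip_tag(int *p)
{
    return (int *)((uintptr_t)p & ~(uintptr_t)1);
}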
You can also use the high bits for storing data on 64-bit systems. On ARM there's a feature (top-byte ignore) that tells the CPU to ignore the high byte of an address. On x86 there isn't a similar feature, but you can still use those bits as long as you make the pointer canonical before dereferencing it. See Using the extra 16 bits in 64-bit pointers.
See also
Is ((void *) -1) a valid address?
NULL is the only valid error return in this case; this is true any time an unsigned value such as a pointer is returned. It may be true that in some cases pointers will not be large enough to use the sign bit as a data bit; however, since pointers are controlled by the OS, not the program, I would not rely on that behavior.
Remember that on a 32-bit system a pointer is basically a 32-bit value; whether this is a possibly negative or always positive number is just a matter of interpretation (i.e. whether the 32nd bit is interpreted as the sign bit or as a data bit). So if you interpreted 0xFFFFFFFF as a signed number it would be -1; if you interpreted it as an unsigned number it would be 4294967295. Technically it is unlikely that a pointer would ever be this large, but the case should be considered anyway.
As far as an alternative goes, you could use an additional out parameter (returning NULL for all failures); however, this would require clients to create and pass a value even if they don't need to distinguish between specific errors.
Another alternative would be to use the GetLastError/SetLastError mechanism to provide additional error information (This would be specific to Windows, don't know if that is an issue or not), or to throw an exception on error instead.
Positive or negative is not a meaningful facet of a pointer type. It pertains to signed integer types, including signed char, short, int, etc.
People talk about negative pointers mostly in situations that treat a pointer's machine representation as an integer type, e.g. reinterpret_cast<intptr_t>(ptr) in C++. In that case they are actually talking about the converted integer, not the pointer itself.
In some scenarios I think of pointers as inherently unsigned: we talk about addresses being below or above one another. 0xFFFFFFFF is above 0x0AAA0000, which is intuitive for human beings, although 0xFFFFFFFF is actually "negative" while 0x0AAA0000 is positive.
But in other scenarios, such as pointer subtraction (ptr1 - ptr2), the result is a signed value of type ptrdiff_t, which is inconsistent with integer subtraction: signed_int_a - signed_int_b yields a signed int, and unsigned_int_a - unsigned_int_b yields an unsigned type. For pointer subtraction the result is signed because the semantic is the distance between two pointers, measured in number of elements, as in the sketch below.
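A small self-contained illustration:

#include <stddef.h> /* ptrdiff_t */

void example(void)
{
    int a[10];
    int *p1 = &a[7];
    int *p2 = &a[2];
    ptrdiff_t d = p1 - p2; /* 5 elements, not 20 bytes; p2 - p1 is -5 */
    (void)d;
}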
In summary, I suggest treating pointer types as standalone types; every type has its own set of operations. For pointers (excluding function pointers, member function pointers, and void *) the operations are:
+, +=
ptr + any_integer_type
-, -=
ptr - any_integer_type
ptr1 - ptr2
++ both prefix and postfix
-- both prefix and postfix
Note there are no *, / or % operations for pointers. That also supports treating the pointer as a standalone type, instead of "a type similar to int" or "a type whose underlying type is int, so it should look like int".