Pointer to integer and back again - c

First, let me emphasize that this question is legalistic in nature. I am not asking whether the following program will work, in practice, on real implementations; I am asking whether it is legal (:= not producing undefined behavior) according to the strictest legalistic interpretation of the ISO-9899 standards (:1999 and :2011).
The question is whether it is permissible to convert a pointer to an uintptr_t integer, perform some arithmetic on that integer, return it to the same value, and convert the integer back to a pointer.
So, is the following program legal (in the sense that it does not produce an undefined behavior)?
#include <stdint.h>
#include <stdio.h>
int
main(void)
{
int answer = 42;
void *ptr;
uintptr_t deepthought;
ptr = &answer;
deepthought = (uintptr_t)ptr;
ptr = 0;
deepthought ^= 0xdeadbeef;
printf("I'm thinking about it...\n");
deepthought ^= 0xdeadbeef;
ptr = (void *)deepthought;
printf("The answer is: %d\n", *((int *)ptr));
return 0;
}
Again, I am aware that this code will cause no difficulty on any real system. The question is whether it lives up to the legalese in the C standard, esp. §7.18.1.4 in ISO-9899:1999 / §7.20.1.4 in ISO-9899:2011, which describes uintptr_t as "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer". It isn't clear whether "then converted back" allows for intermediate arithmetic computations.
To make the question a little less theoretical, here is a reason why one might wish for this kind of processing to be forbidden. If we change the example just a little bit so that the pointer is mallocated instead of pointing to a local variable, and if it happens to run on an implementation with a (conservative) garbage-collector, the memory could conceivably be reclaimed during the printf call because, at that moment, there is nothing pointing to that region of memory. So if the C standard makes the above example illegal (e.g., if nothing can be written to a pointer that did not hold a legal pointer value all the time), this provides a legalistic justification for the assumptions made by garbage-collectors.
But I repeat that the question is about the hermeneutics of the C standard, not about any practical or real-world outcome.

Yes, it must work.
From how I read the Standardese, you could write the value of deepthought to a file (say with fwrite), destroy any copies of the value in the program, then read the value from the file again (fread). The value so read and converted to a pointer must compare equal to the original pointer. I don't find any wording that forbids this.
A garbage collector which could move the address of an object would have to take such possibilities into account.
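A minimal sketch of the round trip from the question, assuming the implementation provides uintptr_t (the standard makes that type optional). The helper name scramble_roundtrip and the mask are made up for illustration; the only property relied on is that XOR-ing twice with the same mask restores the original integer value before it is converted back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: hides a pointer inside an integer, then recovers it.
 * Assumes uintptr_t exists on this implementation (it is an optional type). */
void *scramble_roundtrip(void *ptr, uintptr_t mask)
{
    uintptr_t bits = (uintptr_t)ptr; /* pointer -> integer */
    bits ^= mask;                    /* no pointer-shaped value exists here */
    bits ^= mask;                    /* back to the same integer value */
    void *back = (void *)bits;       /* integer -> pointer */
    assert(back == ptr);             /* the equality the standard's wording describes */
    return back;
}
```

Calling scramble_roundtrip(&answer, 0xdeadbeef) should hand back a pointer that compares equal to &answer, which is the reading of the standard this answer argues for.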

Related

Is the difference between the addresses of a function's parameters always 4 bytes?

I've been doing some pointers testing in C, and I was just curious if the addresses of a function's parameters are always in a difference of 4 bytes from one another.
I've tried to run the following code:
#include <stdio.h>
void func(long a, long b);
int main(void)
{
func(1, 2);
getchar();
return 0;
}
void func(long a, long b)
{
printf("%d\n", (int)&b - (int)&a);
}
This code seems to always print 4, no matter what the type of func's parameters is.
I was just wondering if it's ALWAYS 4, because if so it can be useful for something I'm trying to do (but if it isn't necessarily 4 I guess I could just use va_list for my function or something).
So: Is it necessarily 4 bytes?
Absolutely not, in so many ways that it would be hard to count them all.
First and foremost, the memory layout of arguments is simply not specified by the C language. Full stop. It is not specified. Thus the answer is "no" immediately.
va_list exists because code needed some way to walk a list of variadic arguments, and the language did not specify their layout. va_list is intentionally very limited, so that it works on platforms where the shape of the stack does not match your intuition.
Other reasons it can't always be 4:
What if you pass an object of length 8?
What if the compiler optimizes a reference to actually point at the object in another frame?
What if the compiler adds padding, perhaps to align a 64-bit number on a 64-bit boundary?
What if the stack is built in the opposite direction (such that the difference would be -4 instead of +4)?
The list goes on and on. C does not specify the relative addresses between arguments.
As the other answers correctly say:
No.
Furthermore, even trying to determine whether the addresses differ by 4 bytes, depending on how you do it, probably has undefined behavior, which means the C standard says nothing about what your program does.
void func(long a, long b)
{
printf("%d\n", (int)&b - (int)&a);
}
&a and &b are expressions of type long*. Converting a pointer to int is legal, but the result is implementation-defined, and "If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type."
It's very likely that pointers are 64 bits and int is 32 bits, so the conversions could lose information.
Most likely the conversions will give you values of type int, but they don't necessarily have any meaning, nor does their difference.
Now you can subtract pointer values directly, with a result of the signed integer type ptrdiff_t (which, unlike int, is probably big enough to hold the result).
printf("%td\n", &b - &a);
But "When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements." Pointers to distinct objects cannot be meaningfully compared or subtracted.
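For contrast, a small sketch of the case the standard does define: both pointers refer to elements of one array. The function name element_distance is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Both pointers must point into (or one past the end of) the same array;
 * only then is the subtraction defined. The result counts elements, not bytes. */
ptrdiff_t element_distance(const long *from, const long *to)
{
    assert(from != NULL && to != NULL);
    return to - from;
}
```

Given long v[4], element_distance(&v[0], &v[3]) is 3 whatever sizeof(long) happens to be, and the result can be printed with %td.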
Having said all that, it's likely that the implementation you're using has a memory model that's reasonably straightforward, and that pointer values are in effect represented as indices into a monolithic memory space. Comparing &b vs. &a is not permitted by the C language, but examining the values can provide some insight about what's going on behind the curtain -- which can be especially useful if you're tracking down a bug.
Here's something you can do portably to examine the addresses:
printf("&a = %p\n", (void*)&a);
printf("&b = %p\n", (void*)&b);
The result you're seeing for the subtraction (4) suggests that type long is probably 4 bytes (32 bits) on your system. I'd guess you're on Windows. It also suggests something about the way function parameters are allocated -- something that you as a programmer should almost never have to care about, but is worth understanding anyway.
"[...] I was just curious if the addresses of a function's parameters are always in a difference of 4 bytes from one another."
The greatest error in your reasoning is to think that the parameters exist in memory at all.
I am running this program on x86-64:
#include <stdio.h>
#include <stdint.h>
void func(long a, long b)
{
printf("%d\n", (int)((intptr_t)&b - (intptr_t)&a));
}
int main(void)
{
func(1, 2);
}
When I compile it with gcc -O3, it prints 8, proving that your guess is absolutely wrong. Except... when I compile it without optimization it prints out -8.
The x86-64 System V calling convention passes these arguments in registers rather than in memory. a and b do not have an address until you take their address with &; then the compiler is caught with its pants down from cheating the as-if rule and it quickly pulls up its pants and stuffs them into some memory location so that they can have their address taken, but it is in no way consistent in where they're stored.

Does C always have to use pointers to handle addresses?

As I understand it, all of the cases where C has to handle an address involve the use of a pointer. For example, the & operator creates a pointer to its operand, instead of just giving the bare address as data (i.e., it never gives the address without using a pointer first):
scanf("%d", &foo)
Or when using the & operator
int i; //a variable
int *p; //a variable that stores an address
p = &i; //The & operator returns a pointer to its operand, and p is set to that pointer.
My question is: Is there a reason why C programs always have to use a pointer to manage addresses? Is there a case where C can handle a bare address (the numerical value of the address) on its own or with another method? Or is that completely impossible? (Being because of system architecture, memory allocation changing during and in each runtime, etc). And finally, would that be useful being that addresses change because of memory management? If that was the case, it would be a reason why pointers are always needed.
I'm trying to figure out if the use of pointers is a must in standard C. Not because I want to use something else, but because I want to know for sure that the only way to use addresses is with pointers, and just forget about everything else.
Edit: Since part of the question was answered in the comments by Eric Postpischil, Michał Marszałek, user3386109, Mike Holt and Gecko, I'll group those bits here: Yes, bare addresses are of little to no use on their own, for several reasons (pointers allow a number of operations, addresses may change each time the program is run, etc.). As Michał Marszałek pointed out (no pun intended), scanf() takes a pointer because C passes arguments by value, so a pointer is needed to change the caller's variable, i.e.
int foo;
scanf("%d", foo); //Wrong: passes a copy of foo's value, so scanf cannot store into foo (undefined behavior)
scanf("%d", &foo); //Now foo can be changed, since we pass its address.
Finally, as Gecko mentioned, pointers are there to represent indirection, so that the compiler can make the difference between data and address.
John Bode covers most of those topics in his answer, so I'll mark that one.
A pointer is an address (or, more properly, it’s an abstraction of an address). Pointers are how we deal with address values in C.
Outside of a few domains, a “bare address” value simply isn’t useful on its own. We’re less interested in the address than the object at that address. C requires us to use pointers in two situations:
When we want a function to write to a parameter
When we need to track dynamically allocated memory
In these cases, we don’t really care what the address value actually is; we just need it to access the object we’re interested in.
Yes, in the embedded world specific address values are meaningful. But you still use pointers to access those locations. Like I said above, a pointer is an address for our purposes.
C allows you to convert pointers to integers. The <stdint.h> header provides a uintptr_t type with the property that any pointer to void can be converted to uintptr_t and back, and the result will compare equal to the original pointer.
Per C 2018 6.3.2.3 6, the result of converting a pointer to an integer is implementation-defined. Non-normative note 69 says “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.”
Thus, on a machine where addresses are a simple numbering scheme, converting a pointer to a uintptr_t ought to give you the natural machine address, even though the standard does not require it. There are, however, environments where addresses are more complicated, and the result of converting a pointer to an integer may not be straightforward.
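A short sketch of inspecting the integer that a pointer converts to, assuming uintptr_t is available. The helper format_address is a made-up name; the digits it produces are implementation-defined and need not be a "real" machine address.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Writes the implementation-defined integer value of p into buf as hex.
 * Returns what snprintf returns: the number of characters needed. */
int format_address(char *buf, size_t size, const void *p)
{
    assert(buf != NULL && size > 0);
    uintptr_t value = (uintptr_t)p; /* pointer -> integer, implementation-defined */
    return snprintf(buf, size, "0x%" PRIxPTR, value);
}
```

PRIxPTR comes from <inttypes.h> and is the matching printf conversion for uintptr_t; for simply printing a pointer, %p with a void * argument does the same job without the conversion.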
int i; //a variable
int *p; //a variable that stores an address
i = 10; //now i is set to 10
p = &i; //now p holds the address of i
*p = 20; //store 20 in the object at that address, so i is now 20
int tab[10]; //an array
p = tab; //p now holds the address of tab[0]
p++; //advance the address to the next element, tab[1]
We can operate on addresses through pointers, moving them forwards or backwards, and we can read from and write to a given address.
In C, if we want a function to hand back more than one value, we must use pointers. A function's return value works too, but it can carry only one value.
In C we don't have references therefore we must use pointers.
void fun(int j){
j = 10;
}
void fun2(int *j){
*j = 10;
}
int i;
i = 5; // now i is set to 5
fun(i);
// printing i here gives 5
fun2(&i);
// printing i here gives 10

how can I 'crash proof' a callback function written in C if passed the wrong parameter type

I have a callback function written in C that runs on a server and MUST be crash proof. That is, if expecting an integer and is passed a character string pointer, I must internal to the function determine that, and prevent getting Segmentation faults when trying to do something not allowed on the incorrect parameter type.
The function prototype is:
void callback_function(parameter_type a, const b);
and 'a' is supposed to tell me, via enum, whether 'b' is an integer or a character string pointer.
If the calling function programmer makes a mistake, and tells me in 'a' that 'b' is an integer, and 'b' is really a character string pointer, then how do I determine that without crashing the callback function code. This runs on a server and must keep going if the caller function made a mistake.
The code has to be in C, and be portable so C library extensions would not be possible. The compiler is: gcc v4.8.2
The sizeof an integer on the platform is 4, and so is the length of a character pointer.
An integer could have the same value, numerically, as a character pointer, and vice versa.
If I think I get a character pointer and it's not, when I try to find the content of that, I of course get a Segmentation Fault.
If I write a signal handler to handle the fault, how do I now "clear" the signal, and resume execution at a sane place?
Did I mention that 'b' is a union defined as:
union param_2 {
char * c;
int i;
} param_to_be_passed;
I think that's about it.
Thank You for your answers.
That is not possible.
There's no way to "look" at a pointer and determine if it's valid to de-reference, except for NULL-checking it of course.
Other than that, there's no magic way to know if a pointer points at character data, an integer, a function, or anything else.
You are looking for a hack.
Whatever proposal comes up, do not use such things in production.
If late binding is needed, take a different, fail-safe approach.
If you're writing code for an embedded device, you would expect that all variables would reside in RAM. For example, you might have 128 kB of RAM at addresses 0x20000000 up to (but not including) 0x20020000. If you were passed a pointer in b.c that lies outside this range, that would be another way to determine something was wrong, in addition to checking for a NULL address. (The check below converts the pointer to uintptr_t from <stdint.h>, since a pointer cannot be compared directly with an integer constant.)
if ((a == STRING) && ((b.c == NULL) || ((uintptr_t)b.c < 0x20000000u) || ((uintptr_t)b.c >= 0x20020000u)))
    return ERROR;
If you're working in a multithreaded environment, you may be able to take this a step further and require all addresses passed to callback_function come from a certain thread's memory space (stack).
If the caller says in a that the result is int, there is no great risk of crash, because:
in your case both types have the same length (be aware that this is NOT GUARANTEED TO BE PORTABLE!)
The C standard says (ISO, sect. 6.3.2.3): "Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined."
But fortunately, most 32 bit values will be a valid integer.
Keep in mind that in the worst case, the value could be meaningless. So it's up to you to avoid the crash, by systematically verifying the consistency of the value (for example, do bounds checks if you use the integer to address some array elements).
If the caller says in "a" that the result is a pointer but provides an int, it's much more difficult to avoid a crash in a portable manner.
The ISO standard says: "An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation."
In practice most of these errors are trapped by memory access exceptions at a very low system level. The behaviour being implementation defined, there's no portable way of doing it.
NOTE: This doesn't actually attempt to make the function "crash-proof", because I suspect that that's not possible.
If you are allowed to change the API, one option may be to combine the type tag and the union in a single struct, and only use an API for accessing it.
#include <assert.h>

typedef enum Type { STRING, INT } Type;
typedef struct StringOrInt {
    Type type;
    union { int i; char* s; } value;
} StringOrInt;
void soi_set_int(StringOrInt* v, int i) {
    v->type = INT;
    v->value.i = i;
}
void soi_set_string(StringOrInt* v, char* s) {
    v->type = STRING;
    v->value.s = s;
}
Type soi_get_type(StringOrInt const* v) {
    return v->type;
}
int soi_get_int(StringOrInt const* v) {
    assert(v->type == INT); /* aborts if the tag does not match */
    return v->value.i;
}
char* soi_get_string(StringOrInt const* v) {
    assert(v->type == STRING); /* aborts if the tag does not match */
    return v->value.s;
}
While this doesn't actually make it crash proof, users of the API will find it more convenient to use the API than change the members by hand, reducing the errors significantly.
Run-time type checking in C is effectively impossible.
The burden is on the caller to pass the data correctly; there's no (good, standard, portable) way for you to determine whether b contains data of the correct type (that is, that the caller didn't pass you a pointer value as an integer or vice versa).
The only suggestion I can make is to create two separate callbacks, one of which takes an int and the other a char *, and put the burden on the compiler to do type checking at compile time.
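A minimal sketch of that suggestion, with hypothetical names (on_int_param, on_string_param) and placeholder bodies. Because each function has a fully typed parameter, passing the wrong kind of argument draws a compiler diagnostic instead of a run-time fault.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for the integer case of callback_function. */
int on_int_param(int value)
{
    return value; /* placeholder: real processing would go here */
}

/* Hypothetical replacement for the string case. A null check is the only
 * run-time validation that plain C can perform on the pointer itself. */
size_t on_string_param(const char *text)
{
    if (text == NULL)
        return 0;
    return strlen(text); /* still trusts the caller to pass a terminated string */
}
```

A call such as on_int_param("hello") is a constraint violation, so the compiler must diagnose it; nothing comparable is possible once both kinds of value travel through one union.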

Invalid pointer becoming valid again

int *p;
{
int x = 0;
p = &x;
}
// p is no longer valid
{
int x = 0;
if (&x == p) {
*p = 2; // Is this valid?
}
}
Accessing a pointer after the thing it points to has been freed is undefined behavior, but what happens if some later allocation happens in the same area, and you explicitly compare the old pointer to a pointer to the new thing? Would it have mattered if I cast &x and p to uintptr_t before comparing them?
(I know it's not guaranteed that the two x variables occupy the same spot. I have no reason to do this, but I can imagine, say, an algorithm where you intersect a set of pointers that might have been freed with a set of definitely valid pointers, removing the invalid pointers in the process. If a previously-invalidated pointer is equal to a known good pointer, I'm curious what would happen.)
By my understanding of the standard (6.2.4. (2))
The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
you have undefined behaviour when you compare
if (&x == p) {
as that meets these points listed in Annex J.2:
— The value of a pointer to an object whose lifetime has ended is used (6.2.4).
— The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).
Okay, this seems to be interpreted as a two- make that three part question by some people.
First, there were concerns if using the pointer for a comparison is defined at all.
As is pointed out in the comments, the mere use of the pointer is UB, since §J.2 says that use of a pointer to an object whose lifetime has ended is UB.
However, if that obstacle is passed (which is well in the range of UB, it can work after all and will on many platforms), here is what I found about the other concerns:
Given the pointers do compare equal, the code is valid:
C Standard, §6.5.3.2,4:
[...] If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Although a footnote at that location explicitly says that the address of an object after the end of its lifetime is an invalid pointer value, this does not apply here, since the if makes sure the pointer's value is the address of x and thus is valid.
C++ Standard, §3.9.2,3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.
Emphasis is mine.
It will probably work with most of the compilers but it still is undefined behavior. For the C language these x are two different objects, one has ended its lifetime, so you have UB.
More seriously, some compilers may decide to fool you in a different way than you expect.
The C standard says
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow
the first array object in the address space.
Note in particular the phrase "both are pointers to the same object". In the sense of the standard the two "x"s are not the same object. They may happen to be realized in the same memory location, but this is at the discretion of the compiler. Since they are clearly two distinct objects, declared in different scopes, the comparison should in fact never be true. So an optimizer might well cut away that branch completely.
Another aspect that has not yet been discussed is that the validity of all this depends on the "lifetime" of the objects and not the scope. If you added a possible jump into that scope
{
int x = 0;
p = &x;
BLURB: ;
}
...
if (...)
...
if (something) goto BLURB;
the lifetime would extend as long as the scope of the first x is reachable. Then everything is valid behavior, but still your test would always be false, and optimized out by a decent compiler.
From all that, you can see that you'd better leave it at the argument for UB, and not play such games in real code.
It would work, if by work you use a very liberal definition, roughly equivalent to that it would not crash.
However, it is a bad idea. I cannot imagine a single reason why it is easier to cross your fingers and hope that the two local variables are stored in the same memory address than it is to write p=&x again. If this is just an academic question, then yes it's valid C - but whether the if statement is true or not is not guaranteed to be consistent across platforms or even different programs.
Edit: To be clear, the undefined behavior is whether &x == p in the second block. The value of p will not change, it's still a pointer to that address, that address just doesn't belong to you anymore. Now the compiler might (probably will) put the second x at that same address (assuming there isn't any other intervening code). If that happens to be true, it's perfectly legal to dereference p just as you would &x, as long as its type is a pointer to an int or something smaller. Just like it's legal to say p = 0x00000042; if (p == &x) {*p = whatever;}.
The behaviour is undefined. However, your question reminds me of another case where a somewhat similar concept was being employed. In the case alluded to, there were these threads which would get different amounts of CPU time because of their priorities. So, thread 1 would get a little more time because thread 2 was waiting for I/O or something. Once its job was done, thread 1 would write values to the memory for thread 2 to consume. This is not "sharing" the memory in a controlled way. It would write to the calling stack itself, where variables in thread 2 would be allocated memory. Now, when thread 2 eventually got round to execution, all its declared variables would never have to be assigned values because the locations they were occupying had valid values. I don't know what they did if something went wrong in the process but this is one of the most hellish optimizations in C code I have ever witnessed.
Winner #2 in this undefined behavior contest is rather similar to your code:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *p = (int*)malloc(sizeof(int));
int *q = (int*)realloc(p, sizeof(int));
*p = 1;
*q = 2;
if (p == q)
printf("%d %d\n", *p, *q);
}
According to the post:
Using a recent version of Clang (r160635 for x86-64 on Linux):
$ clang -O realloc.c ; ./a.out
1 2
This can only be explained if the Clang developers consider that this example, and yours, exhibit undefined behavior.
Putting aside the question of whether it is valid (and I'm convinced now that it's not, see Arne Mertz's answer), I still think that it's academic.
The algorithm you are thinking of would not produce very useful results, as you could only compare two pointers, but you have no chance to determine if these pointers point to the same kind of object or to something completely different. A pointer to a struct could now be the address of a single char for example.

Can a pointer (address) ever be negative?

I have a function that I would like to be able to return special values for failure and uninitialized (it returns a pointer on success).
Currently it returns NULL for failure, and -1 for uninitialized, and this seems to work... but I could be cheating the system. IIRC, addresses are always positive, are they not? (although since the compiler is allowing me to set an address to -1, this seems strange).
[update]
Another idea I had (in the event that -1 was risky) is to malloc a char at the global scope, and use that address as a sentinel.
No, addresses aren't always positive - on x86_64, pointers are sign-extended and the address space is clustered symmetrically around 0 (though it is usual for the "negative" addresses to be kernel addresses).
However the point is mostly moot, since C only defines the meaning of < and > pointer comparisons between pointers into the same object, or one past the end of an array. Pointers to completely different objects cannot be meaningfully compared other than for exact equality, at least in standard C - if (p < NULL) has no well defined semantics.
You should create a dummy object with static storage duration and use its address as your uninitialised value:
extern char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)
It's guaranteed to have a single, unique address across your program.
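A sketch of how such a sentinel might be used, with a made-up function find_slot and a made-up state parameter standing in for whatever the real function inspects. Only equality comparisons against the sentinel are performed; it is never dereferenced.

```c
#include <stddef.h>

/* One byte with static storage duration; its address cannot collide with
 * NULL or with the address of any other live object. */
static char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)

static int slot_value = 123; /* stand-in for the real result object */

/* Hypothetical: state < 0 means failure, 0 means uninitialised, else success. */
void *find_slot(int state)
{
    if (state < 0)
        return NULL;
    if (state == 0)
        return UNINITIALISED;
    return &slot_value;
}

/* Callers classify the result with equality tests only. */
int is_usable(const void *result)
{
    return result != NULL && result != UNINITIALISED;
}
```

Here the sentinel is defined in the same file for brevity; with the extern declaration shown above, exactly one source file would need to provide the definition.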
The valid values for a pointer are entirely implementation-dependent, so, yes, a pointer address could be negative.
More importantly, however, consider (as an example of a possible implementation choice) the case where you are on a 32-bit platform with a 32-bit pointer size. Any value that can be represented by that 32-bit value might be a valid pointer. Other than the null pointer, any pointer value might be a valid pointer to an object.
For your specific use case, you should consider returning a status code and perhaps taking the pointer as a parameter to the function.
It's generally a bad design to try to multiplex special values onto a return value... you're trying to do too much with a single value. It would be cleaner to return your "success pointer" via argument, rather than the return value. That leaves lots of non-conflicting space in the return value for all of the conditions you want to describe:
int SomeFunction(SomeType **p)
{
*p = NULL;
if (/* check for uninitialized ... */)
return UNINITIALIZED;
if (/* check for failure ... */)
return FAILURE;
*p = yourValue;
return SUCCESS;
}
You should also do typical argument checking (ensure that 'p' isn't NULL).
The C language does not define the notion of "negativity" for pointers. The property of "being negative" is a chiefly arithmetical one, not in any way applicable to values of pointer type.
If you have a pointer-returning function, then you cannot meaningfully return the value of -1 from that function. In C language integral values (other than zero) are not implicitly convertible to pointer types. An attempt to return -1 from a pointer-returning function is an immediate constraint violation that will result in diagnostic message. In short, it is an error. If your compiler allows it, it simply means that it doesn't enforce that constraint too strictly (most of the time they do it for compatibility with pre-standard code).
If you force the value of -1 to pointer type by an explicit cast, the result of the cast will be implementation-defined. The language itself makes no guarantees about it. It might easily prove to be the same as some other, valid pointer value.
If you want to create a reserved pointer value, there is no need to malloc anything. You can simply declare a global variable of the desired type and use its address as the reserved value. It is guaranteed to be unique.
Pointers can be negative like an unsigned integer can be negative. That is, sure, in a two's-complement interpretation, you could interpret the numerical value to be negative because the most-significant-bit is on.
What's the difference between failure and uninitialized? If uninitialized is not another kind of failure, then you probably want to redesign the interface to separate these two conditions.
Probably the best way to do this is to return the result through a parameter, so the return value only indicates an error. For example where you would write:
void* func();
void* result=func();
if (result==0)
    /* handle error */
else if (result==(void*)-1)
    /* uninitialized */
else
    /* initialized */
Change this to
// sets *a to the returned object
// *a will be null if the object has not been initialized
// returns true on success, false otherwise
int func(void** a);
void* result;
if (!func(&result)){
    /* handle error */
    return;
}
/*do real stuff now*/
if (!result){
    /* initialize */
}
/* continue using the result now that it's been initialized */
@James is correct, of course, but I'd like to add that pointers don't always represent absolute memory addresses, which theoretically would always be positive. Pointers also represent relative addresses to some point in memory, often a stack or frame pointer, and those can be both positive and negative.
So your best bet is to have your function accept a pointer to a pointer as a parameter and fill that pointer with a valid pointer value on success while returning a result code from the actual function.
James's answer is probably correct, but of course describes an implementation choice, not a choice that you can make.
Personally, I think addresses are "intuitively" unsigned. Finding a pointer that compares as less-than a null pointer would seem wrong. But ~0 and -1, for the same integer type, give the same value. If it's intuitively unsigned, ~0 may make a more intuitive special-case value - I use it for error-case unsigned ints quite a lot. It's not really different (zero is an int by default, so ~0 is -1 until you cast it) but it looks different.
Pointers on 32-bit systems can use all 32 bits BTW, though -1 or ~0 is an extremely unlikely pointer to occur for a genuine allocation in practice. There are also platform-specific rules - for example on 32-bit Windows, a process can only have a 2GB address space, and there's a lot of code around that encodes some kind of flag into the top bit of a pointer (e.g. for balancing flags in balanced binary trees).
Actually (at least on x86), the NULL-pointer exception is generated not only by dereferencing the NULL pointer itself, but by dereferencing a larger range of addresses (e.g., the first 65kb). This helps catch such errors as
int* x = NULL;
x[10] = 1;
So, there are more addresses that are guaranteed to generate the NULL pointer exception when dereferenced.
Now consider this code (made compilable for AndreyT):
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Plain integer constants, so they are valid case labels. This assumes a
   null pointer converts to 0, as it does on mainstream platforms. */
#define ERR_NOT_ENOUGH_MEM 0
#define ERR_NEGATIVE 1
#define ERR_NOT_DIGIT 2
char* fn(int i){
    if (i < 0)
        return (char*)(intptr_t)ERR_NEGATIVE;
    if (i >= 10)
        return (char*)(intptr_t)ERR_NOT_DIGIT;
    char* rez = (char*)malloc(strlen("Hello World ")+sizeof(char)*2);
    if (rez)
        sprintf(rez, "Hello World %d", i);
    return rez;
}
int main(void){
    char* rez = fn(3);
    /* intptr_t rather than int, so the pointer is not truncated on 64-bit targets */
    switch((intptr_t)rez){
    case ERR_NOT_ENOUGH_MEM: printf("Not enough memory!\n"); break;
    case ERR_NEGATIVE: printf("The parameter was negative\n"); break;
    case ERR_NOT_DIGIT: printf("The parameter is not a digit\n"); break;
    default: printf("we received %s\n", rez); free(rez);
    }
    return 0;
}
This could be useful in some cases.
It won't work on some Harvard architectures, but will work on von Neumann ones.
Do not use malloc for this purpose. It might keep unnecessary memory tied up (if a lot of memory is already in use when malloc gets called and the sentinel gets allocated at a high address, for example) and it confuses memory debuggers/leak detectors. Instead simply return a pointer to a local static const char object. This pointer will never compare equal to any pointer the program could obtain in any other way, and it only wastes one byte of bss.
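As a rough illustration of the static-object idea, the sketch below gives each error state its own one-byte static object and returns its address. The names err_negative_obj, err_not_digit_obj, ERR_NEGATIVE, ERR_NOT_DIGIT, and digit_name are all hypothetical.

```c
#include <stddef.h>

/* One static byte per error state. Distinct objects have distinct addresses,
   so these pointers cannot compare equal to each other, to NULL, or to any
   other object the program can obtain a pointer to. */
static const char err_negative_obj = 0;
static const char err_not_digit_obj = 0;

#define ERR_NEGATIVE  (&err_negative_obj)
#define ERR_NOT_DIGIT (&err_not_digit_obj)

static const char *const digit_names[10] = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine"
};

/* Returns the English name of a digit, or one of the sentinel pointers. */
const char *digit_name(int i)
{
    if (i < 0)
        return ERR_NEGATIVE;
    if (i >= 10)
        return ERR_NOT_DIGIT;
    return digit_names[i];
}
```

The caller compares the result against ERR_NEGATIVE and ERR_NOT_DIGIT with == before using it as a string. No integer-to-pointer conversion is involved, so the comparison does not rely on implementation-defined behavior.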
You don't need to care about the signedness of a pointer, because it's implementation-defined. The real question here is "how to return special values from a function returning pointer?", which I've explained in detail in my answer to the question Pointer address span on various platforms.
In summary, the all-one bit pattern (-1) is (almost) always safe, because it's already at the end of the address range, data cannot be stored wrapped around to the first address, and the malloc family never returns -1. In fact this value is even returned by many Linux system calls and Win32 APIs to indicate another state for the pointer. So if you only need "failure" and "uninitialized", it's a good choice.
But you can return far more error states by using the fact that variables must be aligned properly (unless you specified some other options). For example, in a pointer to int32_t the low 2 bits are always zero, which means only ¹⁄₄ of the possible values are valid addresses, leaving all of the remaining bit patterns for you to use. So a simple solution would be to just check the lowest bit:
int* result = func();
if (!result)
error_happened();
else if ((uintptr_t)result & 1)
uninitialized();
With this scheme you can return both a valid pointer and some additional data at the same time, as long as the extra bits are masked off before the pointer is dereferenced.
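A small sketch of packing a 2-bit tag into the low bits might look like the following. It assumes that int32_t objects are 4-byte aligned and that the pointer-to-integer conversion exposes that alignment in the low bits, which is implementation-defined; the helper names tag_pointer, pointer_tag, and strip_tag are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Low 2 bits, assumed free because of 4-byte alignment (an assumption). */
#define TAG_MASK ((uintptr_t)3)

/* Store a 2-bit tag in the unused low bits of an aligned pointer. */
void *tag_pointer(int32_t *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* pointer must be 4-byte aligned */
    assert(tag <= TAG_MASK);          /* tag must fit in 2 bits */
    return (void *)(bits | tag);
}

/* Read the tag back without touching the address bits. */
unsigned pointer_tag(void *tagged)
{
    return (unsigned)((uintptr_t)tagged & TAG_MASK);
}

/* Clear the tag so the result can be dereferenced again. */
int32_t *strip_tag(void *tagged)
{
    return (int32_t *)((uintptr_t)tagged & ~TAG_MASK);
}
```

The tagged value is kept as void * on purpose, so that it cannot be dereferenced by accident before strip_tag has been applied.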
You can also use the high bits for storing data in 64-bit systems. On ARM there's a flag that tells the CPU to ignore the high bits in the addresses. On x86 there isn't a similar thing but you can still use those bits as long as you make it canonical before dereferencing. See Using the extra 16 bits in 64-bit pointers
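To make the "canonical before dereferencing" step concrete, here is a sketch that works purely on 64-bit integers. It assumes 48-bit virtual addresses (x86-64 with 4-level paging); systems with wider virtual addresses leave fewer spare bits. The names ADDR_BITS, pack_high, unpack_high, and canonical_address are hypothetical.

```c
#include <stdint.h>

#define ADDR_BITS 48   /* assumed virtual address width */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Keep the low 48 address bits and store 16 bits of data above them. */
uint64_t pack_high(uint64_t addr, uint16_t data)
{
    return (addr & ADDR_MASK) | ((uint64_t)data << ADDR_BITS);
}

/* Recover the 16 bits of data. */
uint16_t unpack_high(uint64_t packed)
{
    return (uint16_t)(packed >> ADDR_BITS);
}

/* Rebuild the canonical address: bits 48..63 must be copies of bit 47. */
uint64_t canonical_address(uint64_t packed)
{
    uint64_t addr = packed & ADDR_MASK;
    if (addr & (UINT64_C(1) << (ADDR_BITS - 1)))
        addr |= ~ADDR_MASK;   /* sign-extend bit 47 into the upper bits */
    return addr;
}
```

Only the value returned by canonical_address should ever be converted back to a pointer and dereferenced.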
See also
Is ((void *) -1) a valid address?
NULL is the only valid error return in this case; this is true any time an unsigned value such as a pointer is returned. It may be true that in some cases pointers will not be large enough to need the sign bit as a data bit; however, since pointers are controlled by the OS, not the program, I would not rely on this behavior.
Remember that on a 32-bit platform a pointer is basically a 32-bit value; whether it is a possibly negative or an always positive number is just a matter of interpretation, i.e. whether the most significant bit is treated as the sign bit or as a data bit. So if you interpreted 0xFFFFFFFF as a signed number it would be -1; if you interpreted it as an unsigned number it would be 4294967295. In practice it is unlikely that a pointer would ever be this large, but the case should be considered anyway.
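The point about interpretation can be checked with plain 32-bit integers, no pointers involved. This is only a sketch; the helper name as_signed is made up, and memcpy is used so that the bits are copied unchanged rather than converted.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of `bits` as a signed value without changing them.
   int32_t is required to be two's complement with no padding bits. */
int32_t as_signed(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

With this helper, the all-ones pattern 0xFFFFFFFF reads as -1 when signed and as 4294967295 when unsigned, while any pattern with the top bit clear reads the same either way.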
As an alternative, you could use an additional out parameter (returning NULL for all failures); however, this would require clients to create and pass a variable even if they don't need to distinguish between specific errors.
Another alternative would be to use the GetLastError/SetLastError mechanism to provide additional error information (This would be specific to Windows, don't know if that is an issue or not), or to throw an exception on error instead.
Positive or negative is not a meaningful property of a pointer type. Those terms pertain to signed integer types such as signed char, short, and int.
People mostly talk about negative pointers when a pointer's machine representation is treated as an integer type, e.g. reinterpret_cast<intptr_t>(ptr). In that case they are actually talking about the converted integer, not the pointer itself.
In some scenarios I think a pointer is inherently unsigned: we talk about addresses as being below or above one another. 0xFFFF.FFFF is above 0x0AAA.0000, which is intuitive for human beings, even though 0xFFFF.FFFF is "negative" when read as a signed 32-bit integer while 0x0AAA.0000 is positive.
But other scenarios are inconsistent with integer arithmetic. signed_int_a - signed_int_b yields a signed int and unsigned_int_a - unsigned_int_b yields an unsigned type, whereas pointer subtraction (ptr1 - ptr2) always yields a signed value of type ptrdiff_t, because its meaning is the distance between two pointers, measured in elements.
In summary, I suggest treating a pointer type as a standalone type; every type has its own set of operations. For pointers (excluding function pointers, member function pointers, and void *):
+, +=
ptr + any_integer_type
-, -=
ptr - any_integer_type
ptr1 - ptr2
++ both prefix and postfix
-- both prefix and postfix
Note that there are no /, *, or % operations for pointers. This also supports treating a pointer as a standalone type rather than as "a type similar to int" or "a type whose underlying type is int, so it should look like int".
