Invalid pointer becoming valid again

int *p;
{
    int x = 0;
    p = &x;
}
// p is no longer valid
{
    int x = 0;
    if (&x == p) {
        *p = 2; // Is this valid?
    }
}
Accessing a pointer after the thing it points to has been freed is undefined behavior, but what happens if some later allocation happens in the same area, and you explicitly compare the old pointer to a pointer to the new thing? Would it have mattered if I cast &x and p to uintptr_t before comparing them?
(I know it's not guaranteed that the two x variables occupy the same spot. I have no reason to do this, but I can imagine, say, an algorithm where you intersect a set of pointers that might have been freed with a set of definitely valid pointers, removing the invalid pointers in the process. If a previously-invalidated pointer is equal to a known good pointer, I'm curious what would happen.)
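For concreteness, the uintptr_t variant I'm asking about would look roughly like the sketch below (the wrapper function demo is just hypothetical scaffolding mirroring the snippet above; per 6.3.2.3 the pointer-to-integer conversion is implementation-defined, and the code still reads the stale value of p, so this is illustrative only):
#include <stdint.h>

void demo(void) /* hypothetical wrapper, mirroring the snippet above */
{
    int *p;
    {
        int x = 0;
        p = &x;
    }
    // p is no longer valid
    {
        int x = 0;
        if ((uintptr_t)&x == (uintptr_t)p) { // compare converted integer values instead of pointers
            *p = 2;                          // still questionable, as the answers below discuss
        }
    }
}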

By my understanding of the standard (6.2.4 (2)):
The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
you have undefined behaviour when you compare
if (&x == p) {
as that meets these points listed in Annex J.2:
— The value of a pointer to an object whose lifetime has ended is used (6.2.4).
— The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).

Okay, this seems to be interpreted as a two-, make that three-part, question by some people.
First, there were concerns about whether using the pointer for a comparison is defined at all.
As is pointed out in the comments, the mere use of the pointer is UB, since J.2 says that the use of a pointer to an object whose lifetime has ended is UB.
However, if that obstacle is passed (which is well within the range of UB; it can work after all, and will on many platforms), here is what I found about the other concerns:
Given the pointers do compare equal, the code is valid:
C Standard, §6.5.3.2,4:
[...] If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Although a footnote at that location explicitly says that the address of an object after the end of its lifetime is an invalid pointer value, this does not apply here, since the if makes sure the pointer's value is the address of x and thus is valid.
C++ Standard, §3.9.2,3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.
Emphasis is mine.

It will probably work with most compilers, but it is still undefined behavior. For the C language these two x are different objects; one has ended its lifetime, so you have UB.
More seriously, some compilers may decide to fool you in a different way than you expect.
The C standard says
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow
the first array object in the address space.
Note in particular the phrase "both are pointers to the same object". In the sense of the standard the two "x"s are not the same object. They may happen to be realized in the same memory location, but this is at the discretion of the compiler. Since they are clearly two distinct objects, declared in different scopes, the comparison should in fact never be true. So an optimizer might well cut away that branch completely.
Another aspect of all this that has not yet been discussed is that the validity depends on the "lifetime" of the objects and not on their scope. If you add a possible jump back into that scope
{
    int x = 0;
    p = &x;
BLURB: ;
}
...
if (...)
...
if (something) goto BLURB;
the lifetime would extend as long as the scope of the first x is reachable. Then everything is valid behavior, but your test would still always be false, and would be optimized out by a decent compiler.
From all that you see that you'd better treat it as UB and not play such games in real code.

It would work, if by "work" you use a very liberal definition, roughly equivalent to "it would not crash".
However, it is a bad idea. I cannot imagine a single reason why it is easier to cross your fingers and hope that the two local variables are stored in the same memory address than it is to write p=&x again. If this is just an academic question, then yes it's valid C - but whether the if statement is true or not is not guaranteed to be consistent across platforms or even different programs.
Edit: To be clear, the undefined behavior is whether &x == p in the second block. The value of p will not change; it's still a pointer to that address, that address just doesn't belong to you anymore. Now the compiler might (probably will) put the second x at that same address (assuming there isn't any other intervening code). If that happens to be true, it's perfectly legal to dereference p just as you would &x, as long as its type is a pointer to an int or something smaller. Just like it's legal to say p = (int *)0x00000042; if (p == &x) { *p = whatever; }.

The behaviour is undefined. However, your question reminds me of another case where a somewhat similar concept was employed. In the case alluded to, there were threads which would get different amounts of CPU time because of their priorities. So, thread 1 would get a little more time because thread 2 was waiting for I/O or something. Once its job was done, thread 1 would write values to memory for thread 2 to consume. This was not "sharing" the memory in a controlled way: it would write to the call stack itself, where the variables of thread 2 would be allocated. Now, when thread 2 eventually got round to execution, all its declared variables never had to be assigned values, because the locations they occupied already held valid values. I don't know what they did if something went wrong in the process, but this is one of the most hellish optimizations in C code I have ever witnessed.

Winner #2 in this undefined behavior contest is rather similar to your code:
#include <stdio.h>
#include <stdlib.h>

int main() {
    int *p = (int*)malloc(sizeof(int));
    int *q = (int*)realloc(p, sizeof(int));
    *p = 1;
    *q = 2;
    if (p == q)
        printf("%d %d\n", *p, *q);
}
According to the post:
Using a recent version of Clang (r160635 for x86-64 on Linux):
$ clang -O realloc.c ; ./a.out
1 2
This can only be explained if the Clang developers consider that this example, and yours, exhibit undefined behavior.

Putting aside the question of whether it is valid (and I'm convinced now that it's not; see Arne Mertz's answer), I still think that it's academic.
The algorithm you are thinking of would not produce very useful results, as you could only compare two pointers but have no way to determine whether they point to the same kind of object or to something completely different. A pointer to a struct could now be the address of a single char, for example.

Related

Why init pointer to NULL is a good practice? [closed]

In case you don't want (or cannot) initialize a pointer with an address, I often hear people say that you should initialize it with NULL and that it's good practice.
You can find people saying something like that on SO as well, for example here.
Having worked on many C projects, I don't think it is good practice, or at least not somehow better than not initializing the pointer with anything.
One of my biggest reasons is: initializing a pointer with NULL increases the chance of a null pointer dereference, which may crash the whole software, and that's terrible.
So, could you tell me what the reasons are if you say that it is good practice, or do people just take it for granted (just like "you should always initialize a variable")?
Note that I tried to find this in MISRA C:2004 and did not find any rule or recommendation for it.
Update:
So most of the comments and answers give the main reason that the pointer could hold a random address before being used, so it's better to have it as NULL so you can figure out the problem faster.
To make my point clearer: I think that it doesn't make sense nowadays in commercial C software, from a practical point of view. A used-but-unassigned pointer will be detected right away by most static analyzers; that's why I prefer to leave it uninitialized, because if I initialize it with NULL and a developer forgets to assign it a "real" address, it passes the static analyzer and causes runtime problems with a null pointer.
You said
One of my biggest reasons is: initializing a pointer with NULL increases the chance of a null pointer dereference, which may crash the whole software, and that's terrible.
I would argue the main reason is actually exactly this. If you don't initialize pointers to NULL, then when there is a dereferencing error it's going to be a lot harder to find the problem, because the pointer is not going to be set to NULL; it is most likely going to hold a garbage value that may look exactly like a valid pointer.
C has very little runtime error checking, but NULL is guaranteed not to refer to a valid address, so a runtime environment (typically an operating system) is able to trap any attempt to de-reference a NULL. The trap will identify the point at which the de-reference occurs, rather than the point the program may eventually fail, making identification of the bug far easier.
Moreover, when debugging, an uninitialised pointer with random content may not be easily distinguishable from a valid pointer - it may refer to a plausible address, whereas NULL is always an invalid address.
If you de-reference an uninitialised pointer the result is non-deterministic - it may crash, it may not, but it will still be wrong.
If it does crash you cannot tell how or even when, since it may result in corruption of data, or reading of invalid data that may have no effect until that corrupted data is later used. The point of failure will not necessarily be the point of error.
So the purpose is that you will get deterministic failure, whereas without initialising, anything could happen - including nothing, leaving you with a latent undetected bug.
One of my biggest reasons is: initializing a pointer with NULL increases the chance of a null pointer dereference, which may crash the whole software, and that's terrible.
Deterministic failure is not "terrible" - it increases your chance of finding the error during development, rather than having your users find the error after deployment. What you are effectively suggesting is that it is better to leave the bugs in and hide them. The dereference on null is guaranteed to be trapped, while de-referencing an uninitialised pointer is not.
That said, initialising with null should only be done if, at the point of declaration, you cannot directly assign an otherwise valid value. That is to say, for example:
char* x = malloc( y ) ;
is much preferable to:
char* x = NULL ;
...
x = malloc( y ) ;
which is in turn preferable to:
char* x ;
...
x = malloc( y ) ;
Note that I tried to find this in MISRA C:2004 and did not find any rule or recommendation for it.
MISRA C:2004, 9.1 - All automatic variables shall have been assigned a value before being used.
That is to say, there is no guideline to initialise to NULL, simply that initialisation is required. As I said, initialisation to NULL is not preferable to initialising to a valid pointer. Don't blindly follow the "must initialise to NULL" advice, because the rule is simply "must initialise", and sometimes the appropriate initialisation value is NULL.
If you don't initialize the pointer, it can have any value, including possibly NULL. It's hard to imagine a scenario where having any value including NULL is preferable to definitely having a NULL value. The logic is that it's better to at least know its value than have its value unpredictably depend on who knows what, possibly resulting in the code behaving differently on different platforms, with different compilers, and so on.
I strongly disagree with any answer or argument based on the idea that you can reliably use a test for NULL to tell if a pointer is valid or not. You can set a pointer to NULL and then test it for NULL within a limited context where that is known to be safe. But there will always be contexts where more than one pointer points to the same thing and you cannot ensure that every possible pointer to an object will be set to NULL at the very place the object is freed. It is simply an essential C programming discipline to understand that a pointer may or may not point to a valid object depending on what is going on in the code.
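As a minimal sketch of the situation I mean (the names owner and alias are made up for illustration): nulling the freed pointer at the free site does not help the other aliases.
#include <stdlib.h>

void example(void)
{
    int *owner = malloc(sizeof *owner); /* one pointer to the object */
    int *alias = owner;                 /* a second pointer to the same object */

    free(owner);
    owner = NULL;  /* owner now reads reliably as "not allocated" */
    /* alias still holds the stale address: it dangles, yet alias != NULL,
       so a test like if (alias) would wrongly suggest it is safe to use. */
}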
One issue is that given a pointer variable p, there is no way defined by the C language to ask, "does this pointer point to valid memory or not?" The pointer might point to valid memory. It might point to memory that (once upon a time) was allocated by malloc, but that has since been freed (meaning that the pointer is invalid). It might be an uninitialized pointer, meaning that it's not even meaningful to ask where it points (although it is definitely invalid). But, again, there's no way to know.
So if you're bopping along in some far-off corner of a large program, and you want to say
if(p is valid) {
    do something with p;
} else {
    fprintf(stderr, "invalid pointer!\n");
}
you can't do this. Once again, the C language gives you no way of writing if(p is valid).
So that's where the rule to always initialize pointers to NULL comes in. If you adopt this rule and follow it faithfully, if you initialize every pointer to NULL or as a pointer to valid memory, if whenever you call free(p) you always immediately follow it with p = NULL;, then if you follow all these rules, you can achieve a decent way of asking "is p valid?", namely:
if(p != NULL) {
    do something with p;
} else {
    fprintf(stderr, "invalid pointer!\n");
}
And of course it's very common to use an abbreviation:
if(p) {
    do something with p;
} else {
    fprintf(stderr, "invalid pointer!\n");
}
Here most people would read if(p) as "if p is valid" or "if p is allocated".
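The other half of the discipline mentioned above (always follow free(p) with p = NULL;) is just as mechanical at the release site; a minimal sketch:
free(p);
p = NULL;   /* record that p no longer points at allocated memory,
               so a later if (p) test reads as "not allocated" */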
Addendum: This answer has attracted some criticism, and I suppose that's because, to make a point, I wrote some unrealistic code which some people are reading more into than I'd intended. The idiom I'm advocating here is not so much valid pointers versus invalid pointers, but rather, pointers I have allocated versus pointers I have not allocated (yet). No one writes code that simply detects and prints "invalid pointer!" as if to say "I don't know where this pointer points; it might be uninitialized or stale". The more realistic way of using the idiom is to do something like
/* ensure allocation before proceeding */
if(p == NULL)
    p = malloc(...);
or
if(p == NULL) {
    /* nothing to do */
    return;
}
or
if(p == NULL) {
    fprintf(stderr, "null pointer detected\n");
    exit(0);
}
(And in all three cases the abbreviation if(!p) is popular as well.)
But, of course, if what you're trying to discriminate is pointers I have allocated versus pointers I have not allocated (yet), it is vital that you initialize all your un-allocated pointers with the explicit marker you're using to record that they're un-allocated, namely NULL.
One of my biggest reasons is: initializing a pointer with NULL increases the chance of a null pointer dereference, which may crash the whole software, and that's terrible.
Which is why you add a check against NULL before using that pointer value:
if ( p ) // p != NULL
{
    // do something with p
}
NULL is a well-defined invalid pointer value, guaranteed to compare unequal to any object or function pointer value. It's a well-defined "nowhere" that's easy to check against.
Compare that to the indeterminate value that the uninitialized pointer¹ may have - most likely, it will also be an invalid pointer value that will lead to a crash as soon as you try to use it, but it's almost impossible to determine that beforehand. Is 0xfff78567abcd2220 a valid or invalid pointer value? How would you check that?
Obviously, you should do some analysis to see if an initialization is required. Is there a risk of that pointer being dereferenced before you assign a valid pointer value to it? If not, then you don't need to initialize it beforehand.
Since C99, the proper answer has been to defer instantiating a pointer (or any other type of object, really) until you have a valid value to initialize it with:
void foo( void )
{
    printf( "Gimme a length: " );
    int length;
    scanf( "%d", &length );
    char *buf = malloc( sizeof *buf * length );
    ...
}
ETA
I added a comment to Steve's answer that I think needs to be emphasized:
There's no way to determine if a pointer is valid - if you receive a pointer argument in a function like
void foo( int *ptr )
{
    ...
}
there is no test you can run on ptr to indicate that yes, it definitely points to an object within that object's lifetime and is safe to use.
By contrast, there is an easy, standard test to indicate that a pointer is definitely invalid and unsafe to use, and that's by checking that its value is NULL. So you can at least avoid using pointers that are definitely invalid with the
if ( p )
{
    // do something with p
}
idiom.
Now, just because p isn't NULL doesn't automatically mean it's valid, but if you're consistent and disciplined about setting unused pointers to NULL, then the odds are pretty high that it is.
This is one of those areas where C doesn't protect you, and you have to devote non-trivial amounts of effort to make sure your code is safe and robust. Frankly, it's a pain in the ass more often than not. But being disciplined with using NULL for inactive pointers makes things a little easier.
Again, you have to do some analysis and think about how pointers are being used in your code. If you know you're going to set a pointer to valid value before it's ever read, then it's not critical to initialize it to anything in particular. If you have multiple pointers pointing to the same object, then you need to make sure if that object ever goes away that all those pointers are updated appropriately.
¹ This assumes that the pointer in question has auto storage duration - if the pointer is declared with the static keyword or at file scope, then it is implicitly initialized to NULL.
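A minimal illustration of that footnote (the names are hypothetical):
int *file_scope_ptr;    /* file scope: static storage duration, implicitly NULL */

void f(void)
{
    static int *sp;     /* static keyword: also implicitly NULL */
    int *ap;            /* automatic storage duration: indeterminate, not NULL */
}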

Does C always have to use pointers to handle addresses?

As I understand it, all of the cases where C has to handle an address involve the use of a pointer. For example, the & operator creates a pointer for the program, instead of just giving the bare address as data (i.e. it never gives the address without using a pointer first):
scanf("%d", &foo)
Or when using the & operator:
int i;  //a variable
int *p; //a variable that stores an address
p = &i; //The & operator returns a pointer to its operand, and p is set to that pointer.
My question is: Is there a reason why C programs always have to use a pointer to manage addresses? Is there a case where C can handle a bare address (the numerical value of the address) on its own or with another method? Or is that completely impossible (be it because of system architecture, memory allocation changing during and between each run, etc.)? And finally, would that be useful, given that addresses change because of memory management? If that is the case, it would be a reason why pointers are always needed.
I'm trying to figure out whether the use of pointers is a must in standard C. Not because I want to use something else, but because I want to know for sure that the only way to use addresses is with pointers, and just forget about everything else.
Edit: Since part of the question was answered in the comments by Eric Postpischil, Michał Marszałek, user3386109, Mike Holt and Gecko, I'll group those bits here: Yes, using bare addresses is of little to no use because of different factors (pointers allow a number of operations, addresses may change each time the program is run, etc.). As Michał Marszałek pointed out (no pun intended), scanf() uses a pointer because C can only work with copies, so a pointer is needed to change the variable used, i.e.
int foo;
scanf("%d", foo)  //Does nothing, since the value can't be changed
scanf("%d", &foo) //Now foo can be changed, since we use its address.
Finally, as Gecko mentioned, pointers are there to represent indirection, so that the compiler can make the difference between data and address.
John Bode covers most of those topics in his answer, so I'll mark that one.
A pointer is an address (or, more properly, it’s an abstraction of an address). Pointers are how we deal with address values in C.
Outside of a few domains, a “bare address” value simply isn’t useful on its own. We’re less interested in the address than the object at that address. C requires us to use pointers in two situations:
When we want a function to write to a parameter
When we need to track dynamically allocated memory
In these cases, we don’t really care what the address value actually is; we just need it to access the object we’re interested in.
Yes, in the embedded world specific address values are meaningful. But you still use pointers to access those locations. Like I said above, a pointer is an address for our purposes.
C allows you to convert pointers to integers. The <stdint.h> header provides a uintptr_t type with the property that any pointer to void can be converted to uintptr_t and back, and the result will compare equal to the original pointer.
Per C 2018 6.3.2.3 6, the result of converting a pointer to an integer is implementation-defined. Non-normative note 69 says “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.”
Thus, on a machine where addresses are a simple numbering scheme, converting a pointer to a uintptr_t ought to give you the natural machine address, even though the standard does not require it. There are, however, environments where addresses are more complicated, and the result of converting a pointer to an integer may not be straightforward.
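A small sketch of the round trip that is guaranteed on implementations that provide uintptr_t (the integer value itself is implementation-defined, but converting a pointer to void to uintptr_t and back must yield a pointer that compares equal to the original):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 42;
    void *p = &x;

    uintptr_t bits = (uintptr_t)p;  /* implementation-defined mapping, often the raw address */
    void *q = (void *)bits;         /* round trip: q must compare equal to p */

    printf("%p -> %#jx -> %p (equal: %d)\n", p, (uintmax_t)bits, q, p == q);
    return 0;
}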
int i;       //a variable
int *p;      //a variable that stores an address
i = 10;      //now i is set to 10
p = &i;      //now p is set to i's address
*p = 20;     //we store 20 at the given address
int tab[10]; //an array
p = tab;     //set p to the array's address
p++;         //operate on the address and move it to the next element, tab[1]
We can operate on addresses via pointers, moving forward or backward. We can write to and read from a given address.
In C, if we want to get results back from a function through its parameters, we must use pointers. Alternatively we can use the function's return value, but that way we can only get one value.
In C we don't have references, therefore we must use pointers.
void fun(int j){
    j = 10;
}
void fun2(int *j){
    *j = 10;
}

int i;
i = 5;     // now i is set to 5
fun(i);    // printing i will print 5
fun2(&i);  // printing i will print 10

Comparison (>,>=,<,<=) of pointers from unrelated blocks

I was looking at the GNU implementation of obstacks, and I noticed the obstack_free subroutine is using pointer comparison against the beginnings and ends of the previous links of the linked list to find which block the pointer to be freed belongs to.
https://code.woboq.org/userspace/glibc/malloc/obstack.c.html
while (lp != 0 && ((void *) lp >= obj || (void *) (lp)->limit < obj))
{
    plp = lp->prev;
    CALL_FREEFUN (h, lp);
    lp = plp;
    h->maybe_empty_object = 1;
} //...
Such comparison appears to be undefined as per http://port70.net/~nsz/c/c11/n1570.html#6.5.8p5:
When two pointers are compared, the result depends on the relative
locations in the address space of the objects pointed to. If two
pointers to object types both point to the same object, or both point
one past the last element of the same array object, they compare
equal. If the objects pointed to are members of the same aggregate
object, pointers to structure members declared later compare greater
than pointers to members declared earlier in the structure, and
pointers to array elements with larger subscript values compare
greater than pointers to elements of the same array with lower
subscript values. All pointers to members of the same union object
compare equal. If the expression P points to an element of an array
object and the expression Q points to the last element of the same
array object, the pointer expression Q+1 compares greater than P. In
all other cases, the behavior is undefined.
Is there a fully standard-compliant way to implement obstacks? If not, what platforms could such a comparison practically break on?
I am not a language-lawyer, so I don't know how to answer OP's question, except that a plain reading of the standard does not describe the entire picture.
While the standard says that comparing unrelated pointers yields undefined results, the behaviour of a standards-compliant C compiler is much more restricted.
The first sentence in the section concerning pointer comparison is
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to.
and for a very good reason.
If we examine the possibilities how the pointer comparison code may be used, we find that unless the compiler can determine which objects the compared pointers belong to at compile time, all pointers in the same address space must compare arithmetically, according to the addresses they refer to.
(If we prove that a standards-compliant C compiler is required by the standard to provide specific results when a plain reading of the C standard itself says the results are undefined, is such code standards-compliant or not? I don't know. I only know such code works in practice.)
A literal interpretation of the standard may lead one to believe that there is absolutely no way of determining whether a pointer refers to an array element or not. In particular, observing
int is_within(const char *arr, const size_t len, const char *ptr)
{
    return (ptr >= arr) && (ptr < (arr + len));
}
a standards compliant C compiler could decide that because comparison between unrelated pointers is undefined, it is justified in optimizing the above function into
int is_within(const char *arr, const size_t len, const char *ptr)
{
    if (len)
        return ptr != (arr + len);
    else
        return 0;
}
which returns 1 for pointers within array const char arr[len], and zero at the element just past the end of the array, just like the standard requires; and 1 for all undefined cases.
The problem in that line of thinking arises when a caller, in a separate compilation unit, does e.g.
char buffer[1024];
char *p = buffer + 768;

if (is_within(buffer, (sizeof buffer) / 2, p)) {
    /* bug */
} else {
    /* correct */
}
Obviously, if the is_within() function was declared static (or static inline), the compiler could examine all call chains that end up in is_within(), and produce correct code.
However, when is_within() is in a separate compilation unit compared to its callers, the compiler can no longer make such assumptions: it simply does not, and cannot, know the object boundaries beforehand. Instead, the only way it can be implemented by a standards-compliant C compiler is to rely blindly on the addresses the pointers refer to; something like
int is_within(const char *arr, const size_t len, const char *ptr)
{
    const uintptr_t start = POINTER_TO_UINTPTR(arr);
    const uintptr_t limit = POINTER_TO_UINTPTR(arr + len);
    const uintptr_t thing = POINTER_TO_UINTPTR(ptr);
    return (thing >= start) && (thing < limit);
}
where the POINTER_TO_UINTPTR() would be a compiler-internal macro or function, that converts the pointer losslessly to an unsigned integer value (with the intent that there would be a corresponding UINTPTR_TO_POINTER() that could recover the exact same pointer from the unsigned integer value), without consideration for any optimizations or rules allowed by the C standard.
So, if we assume that the code is compiled in a separate compilation unit to its users, the compiler is forced to generate code that provides more guarantees than a simple reading of the C standard would indicate.
In particular, if arr and ptr are in the same address space, the C compiler must generate code that compares the addresses the pointers point to, even if the C standard says that comparison of unrelated pointers yields undefined results; simply because it is at least theoretically possible for an array of objects to occupy any subregion of the address space. The compiler just cannot make assumptions that break conforming C code later on.
In the GNU obstack implementation, the obstacks all exist in the same address space (because of how they are obtained from the OS/kernel). The code assumes that the pointers supplied to it refer to these objects. Although the code does return an error if it detects that a pointer is invalid, it does not guarantee it always detects invalid pointers; thus, we can ignore the invalid pointer cases, and simply assume that because all obstacks are from the same address space, so are all the user-supplied pointers.
There are many architectures with multiple address spaces. x86 with a segmented memory model is one of these. Many microcontrollers have Harvard architecture, with separate address spaces for code and data. Some microcontrollers have a separate address space (different machine instructions) for accessing RAM and flash memory (but capable of executing from both), and so on.
It is even possible for there to be an architecture where each pointer has not only its memory address, but some kind of unique object ID associated with it. This is nothing special; it just means that on such an architecture, each object has its own address space.
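As for the original question of a fully conforming containment test: one approach sometimes suggested (a sketch only, not what glibc does, and it assumes ptr itself holds a valid pointer value) is to avoid relational comparison entirely and rely on equality comparison, which is defined for any pair of valid pointers, at the cost of a linear scan:
#include <stddef.h>

int is_within_conforming(const char *arr, size_t len, const char *ptr)
{
    /* Equality comparison of valid pointers is always defined, so test ptr
       against every position of the array, including one past the end. */
    for (size_t i = 0; i <= len; i++)
        if (ptr == arr + i)
            return i < len;  /* matching arr + len (one past the end) counts as "not within" */
    return 0;
}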

Evaluating a condition containing an uninitialized pointer - UB, but can it crash?

Somewhere on the forums I encountered this:
Any attempt to evaluate an uninitialized pointer variable
invokes undefined behavior. For example:
int *ptr; /* uninitialized */
if (ptr == NULL) ...; /* undefined behavior */
What is meant here?
Is it meant that if I ONLY write:
if(ptr==NULL){int t;};
this statement is already UB?
Why? I am not dereferencing the pointer right?
(I noticed there may be a terminology issue; by UB in this case, I meant: will my code crash JUST due to the if check?)
Using uninitialized variables invokes undefined behavior. It doesn't matter whether it is a pointer or not.
int i;
int j = 7 * i;
is undefined as well. Note that "undefined" means that anything can happen, including a possibility that it will work as expected.
In your case:
int *ptr;
if (ptr == NULL) { int i = 0; /* this line does nothing at all */ }
ptr might contain anything; it can be some random trash, but it can be NULL too. This code will most likely not crash since you are just comparing the value of ptr to NULL. We don't know whether execution enters the condition's body or not; we can't even be sure that some value will be successfully read - and therefore, the behavior is undefined.
Your pointer is not initialized. Your statement would be the same as:
int a;
if (a == 3){int t;}
Since a is not initialized, its value can be anything, so you have undefined behavior. It doesn't matter whether you dereference your pointer or not. (If you did dereference it, you might well get a segfault.)
The C99 draft standard says it is undefined clearly in Annex J.2 Undefined behavior:
The value of an object with automatic storage duration is used while it is
indeterminate (6.2.4, 6.7.8, 6.8).
and the normative text has an example that also says the same thing in section 6.5.2.5 Compound literals paragraph 17 which says:
Note that if an iteration statement were used instead of an explicit goto and a labeled statement, the lifetime of the unnamed object would be the body of the loop only, and on entry next time around p would have an indeterminate value, which would result in undefined behavior.
and the draft standard defines undefined behavior as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
and notes that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
As Shafik has pointed out, the C99 standard draft declares any use of uninitialized variables with automatic storage duration undefined behaviour. That amazes me, but that's how it is. My rationale for the pointer case comes below, but similar reasons must be true for other types as well.
After int *pi; if (pi == NULL){} your program is allowed to do arbitrary things. In reality, on PCs, nothing will happen. But there are architectures out there which have illegal address values, much like NaN floats, which will cause a hardware trap when they are loaded into a register. These architectures, unheard of to us modern PC users, are the reason for this provision. Cf. e.g. How does a hardware trap in a three-past-the-end pointer happen even if the pointer is never dereferenced?.
The behavior of this is undefined because of how the stack is used for various function calls. When a function is called the stack grows to make space for variables within the scope of that function, but this memory space is not cleared or zeroed out.
This can be shown to be unpredictable in code like the following:
#include <stdio.h>

void test()
{
    int *ptr;
    printf("ptr is %p\n", ptr);
}

void another_test()
{
    test();
}

int main()
{
    test();
    test();
    another_test();
    test();
    return 0;
}
This simply calls the test() function multiple times, which just prints the value 'ptr' happens to hold. You might expect to get the same result each time, but as the stack is manipulated, the physical location of 'ptr' changes, and the data at that address is unknown in advance.
On my machine running this program results in this output:
ptr is 0x400490
ptr is 0x400490
ptr is 0x400575
ptr is 0x400585
To explore this a bit more, consider the possible security implications of using pointers that you have not intentionally set yourself
#include <stdio.h>

void test()
{
    int *ptr;
    printf("ptr is %p\n", ptr);
}

void something_different()
{
    int *not_ptr_or_is_it = (int*)0xdeadbeef;
}

int main()
{
    test();
    test();
    something_different();
    test();
    return 0;
}
This results in something that is undefined even though it is predictable. It is undefined because on some machines this will work the same way and on others it might not work at all; it's part of the magic that happens when your C code is converted to machine code.
ptr is 0x400490
ptr is 0x400490
ptr is 0xdeadbeef
Some implementations may be designed in such a way that an attempted rvalue conversion of an invalid pointer may cause arbitrary behavior. Other implementations are designed in such a way that an attempt to compare any pointer object with null will never do anything other than yield 0 or 1.
Most implementations target hardware where pointer comparisons simply compare bits without regard for whether those bits represent valid pointers. The authors of many such implementations have historically considered it so obvious that a pointer comparison on such hardware should never have any side-effect other than to report that pointers are equal or report that they are unequal that they seldom bothered to explicitly document such behavior.
Unfortunately, it has become fashionable for implementations to aggressively "optimize" Undefined Behavior by identifying inputs that would cause a program to invoke UB, assuming such inputs cannot occur, and then eliminating any code that would be irrelevant if such inputs were never received. The "modern" viewpoint is that because the authors of the Standard refrained from requiring side-effect-free comparisons on implementations where such a requirement would impose significant expense, there's no reason compilers for any platform should guarantee them.
You're not dereferencing the pointer, so you don't end up with a segfault. It will not crash. I don't understand why anyone thinks that comparing two numbers will crash. It's nonsense. So again:
IT WILL NOT CRASH. PERIOD.
But it's still UB. You don't know what memory address the pointer contains. It may or may not be NULL. So your condition if (ptr == NULL) may or may not evaluate to true.
Back to my IT WILL NOT CRASH statement. I've just tested the pointer going from 0 to 0xFFFFFFFF on the 32-bit x86 and ARMv6 platforms. It did not crash.
I've also tested the 0..0xFFFFFFFF and 0xFFFFFFFF00000000..0xFFFFFFFFFFFFFFFF ranges on an amd64 platform. Checking the full range would take a few thousand years, I guess.
Again, it did not crash.
I challenge the commenters and downvoters to show a platform and value where it crashes. Until then, I'll probably be able to survive a few negative points.
There is also an SO link to trap representation which also indicates that it will not crash.
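For what it's worth, the kind of brute-force experiment described above could be sketched roughly like this (assuming a platform where integer-to-pointer conversion is a plain bit copy; the conversion itself is implementation-defined, and the loop is deliberately exhaustive, so it is slow):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned long equal_to_null = 0;

    for (uint32_t bits = 0; ; bits++) {
        int *p = (int *)(uintptr_t)bits;  /* forge every possible 32-bit pointer value */
        if (p == NULL)                    /* the comparison under test; p is never dereferenced */
            equal_to_null++;
        if (bits == UINT32_MAX)
            break;
    }
    printf("%lu value(s) compared equal to NULL\n", equal_to_null);
    return 0;
}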

Is it dangerous to use NULL pointer dereferencing on the rvalue side?

I have the following code.
int *a = NULL;
int b;
b = *a;
Is it undefined in any way?
I am trying to access the location pointed to by a only for reading. What harm can it do?
Is it undefined in any way?
Yes, the behavior is undefined.
I am trying to access the location pointed by a only for reading.
The location pointed to does not exist. There's nothing there. To read something, there has to be a value to read, but a null pointer points... nowhere. It doesn't point at anything.
The implementation chooses the physical value of null pointers specifically to make them "point" to an address that contains nothing of any value on the given platform. That means that there's nothing to see there. Moreover, on a modern virtual memory platform there might not even be a specific "there" associated with that address.
If you are doing it out of pure curiosity, it still won't work on a typical modern platform: that [virtual] memory region is typically "not yours", meaning that hardware/OS protection mechanism will prevent you from reading it (by crashing your program).
Meanwhile, the language states that any attempts to dereference a null-pointer leads to undefined behavior, regardless of what you are dereferencing it for.
Yes, it's totally undefined behaviour. You are dereferencing a null pointer. Anything could happen.
Don't dereference NULL pointers. It's like crossing the streams, but worse. I'm not even sure what you're trying to accomplish? Are you trying to initialize two pointers to NULL?
The result is undefined and will be different under different platforms.
NULL is equal to 0. Therefore saying int* a = NULL means that you make a pointer to address 0, which should contain an integer.
By dereferencing a with b = *a you instruct the program to go to address 0 and get the data.
Now, depending on the platform, address 0 will most probably be inaccessible to user code. If you are not running on top of an operating system (as is the case, for example, in a microcontroller environment), then address 0 is normally the beginning of the internal ROM. In other cases it can be the beginning of the stack, etc.
In most of the cases you are going to get a null pointer exception when you dereference a null pointer.
You have two alarms that pop up here. Dereferencing NULL, and all other forms of null pointers, is strictly forbidden by the standard. Not because such a pointer is often realized internally by an all-zero bit pattern, but because NULL is the one and single reserved value for pointers that indicates the pointer isn't pointing anywhere. Don't do this.
Having a pointer point to the address that corresponds to the integer value 0 is a different thing. You could only enforce such a value and interpretation of a pointer by some trick, e.g.
#include <stdint.h> /* for uintptr_t */

union over {
    int *a;
    uintptr_t b;
} ov;
ov.b = 0;
int c = *ov.a;
What then happens depends completely on your platform. If 0 is a valid address in your address space this could work, but it probably won't. Don't do it unless you know better.
