Why init pointer to NULL is a good practice? [closed] - c

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
In case you don't want (or can not) init a pointer with an address, I often hear people say that you should init it with NULL and it's a good practice.
You can find people also say something like that in SO, for example here.
Working in many C projects, I don't think it is a good practice or at least somehow better than not init the pointer with anything.
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
So, could you tell me what are the reasons if you say that it is a good pratice or people just say it for granted (just like you should always init an variable) ?
Note that, I tried to find in Misra 2004 and also did not find any rule or recommendation for that.
Update:
So most of the comments and answers give the main reason that the pointer could have an random address before being used, so it's better to have it as null so you could figure out the problem faster.
To make my point clearer, I think that it doesn't make senses nowadays in commercial C softwares, in a practical point of view. The unassigned pointer that is used will be detected right the way by most of static analyzer, that's why I prefer to let it be un-init because if I init it with NULL then when developers forget to assign it to a "real" address, it passes the static analyzer and will cause runtime problems with null pointer.

You said
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
I would argue the main reason is actually due to exactly this. If you don't init pointers to NULL, then if there is a dereferecing error it's going to be a lot harder to find the problem because the pointer is not going to be set to NULL, it's going to be a most likely garbage value that may look exactly like a valid pointer.

C has very little runtime error checking, but NULL is guaranteed not to refer to a valid address, so a runtime environment (typically an operating system) is able to trap any attempt to de-refernce a NULL. The trap will identify the point at which the de-reference occurs rather then the point the program may eventually fail, making identification of the bug far easier.
Moreover when debugging, and unitialised pointer with random content may not be easily distinguishable from a valid pointer - it may refer to a plausible address, whereas NULL is always an invalid address.
If you de-reference an uninitialised pointer the result is non-deterministic - it may crash, it may not, but it will still be wrong.
If it does crash you cannot tell how or even when, since it may result in corruption of data, or reading of invalid data that may have no effect until that corrupted data is later used. The point of failure will not necessarily be the point of error.
So the purpose is that you will get deterministic failure, whereas without initialising, anything could happen - including nothing, leaving you with a latent undetected bug.
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
Deterministic failure is not "terrible" - it increases your chance of finding the error during development, rather then having your users finding the error after deployment. What you are effectively suggesting is that it is better to leave the bugs in and hide them. The dereference on null is guaranteed to be trapped, de-referencing an unitialised pointer is not.
That said initialising with null, should only be done if at the point of declaration you cannot directly assign an otherwise valid value. That is to say, for example:
char* x = malloc( y ) ;
is much preferable to:
char* x = NULL ;
...
x = malloc( y ) ;
which is in turn preferable to:
char* x ;
...
x = malloc( y ) ;
Note that, I tried to find in Misra 2004 and also did not find any
rule or recommendation for that.
MISRA C:2004, 9.1 - All automatic variables shall have been assigned a value before being used.
That is to say, there is no guideline to initialise to NULL, simply that initialisation is required. As I said initialisation to NULL is not preferable to initialising to a valid pointer. Don't blindly follow the "must initialise to NULL advice", because the rule is simply "must initialise", and sometimes the appropriate initialisation value is NULL.

If you don't initialize the pointer, it can have any value, including possibly NULL. It's hard to imagine a scenario where having any value including NULL is preferable to definitely having a NULL value. The logic is that it's better to at least know its value than have its value unpredictably depend on who knows what, possibly resulting in the code behaving differently on different platforms, with different compilers, and so on.
I strongly disagree with any answer or argument based on the idea that you can reliably use a test for NULL to tell if a pointer is valid or not. You can set a pointer to NULL and then test it for NULL within a limited context where that is known to be safe. But there will always be contexts where more than one pointer points to the same thing and you cannot ensure that every possible pointer to an object will be set to NULL at the very place the object is freed. It is simply an essential C programming discipline to understand that a pointer may or may not point to a valid object depending on what is going on in the code.

One issue is that given a pointer variable p, there is no way defined by the C language to ask, "does this pointer point to valid memory or not?" The pointer might point to valid memory. It might point to memory that (once upon a time) was allocated by malloc, but that has since been freed (meaning that the pointer is invalid). It might be an uninitialized pointer, meaning that it's not even meaningful to ask where it points (although it is definitely invalid). But, again, there's no way to know.
So if you're bopping along in some far-off corner of a large program, and you want to say
if(p is valid) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
you can't do this. Once again, the C language gives you no way of writing if(p is valid).
So that's where the rule to always initialize pointers to NULL comes in. If you adopt this rule and follow it faithfully, if you initialize every pointer to NULL or as a pointer to valid memory, if whenever you call free(p) you always immediately follow it with p = NULL;, then if you follow all these rules, you can achieve a decent way of asking "is p valid?", namely:
if(p != NULL) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
And of course it's very common to use an abbreviation:
if(p) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
Here most people would read if(p) as "if p is valid" or "if p is allocated".
Addendum: This answer has attracted some criticism, and I suppose that's because, to make a point, I wrote some unrealistic code which some people are reading more into than I'd intended. The idiom I'm advocating here is not so much valid pointers versus invalid pointers, but rather, pointers I have allocated versus pointers I have not allocated (yet). No one writes code that simply detects and prints "invalid pointer!" as if to say "I don't know where this pointer points; it might be uninitialized or stale". The more realistic way of using the idiom is do do something like
/* ensure allocation before proceeding */
if(p == NULL)
p = malloc(...);
or
if(p == NULL) {
/* nothing to do */
return;
}
or
if(p == NULL) {
fprintf(stderr, "null pointer detected\n");
exit(0);
}
(And in all three cases the abbreviation if(!p) is popular as well.)
But, of course, if what you're trying to discriminate is pointers I have allocated versus pointers I have not allocated (yet), it is vital that you initialize all your un-allocated pointers with the explicit marker you're using to record that they're un-allocated, namely NULL.

One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
Which is why you add a check against NULL before using that pointer value:
if ( p ) // p != NULL
{
// do something with p
}
NULL is a well-defined invalid pointer value, guaranteed to compare unequal to any object or function pointer value. It's a well-defined "nowhere" that's easy to check against.
Compare that to the indeterminate value that the uninitialized pointer1 may have - most likely, it will also be an invalid pointer value that will lead to a crash as soon as you try to use it, but it's almost impossible to determine that beforehand. Is 0xfff78567abcd2220 a valid or invalid pointer value? How would you check that?
Obviously, you should do some analysis to see if an initialization is required. Is there a risk of that pointer being dereferenced before you assign a valid pointer value to it? If not, then you don't need to initialize it beforehand.
Since C99, the proper answer has been to defer instantiating a pointer (or any other type of object, really) until you have a valid value to initialize it with:
void foo( void )
{
printf( "Gimme a length: " );
int length;
scanf( "%d", &length );
char *buf = malloc( sizeof *buf * length );
...
}
ETA
I added a comment to Steve's answer that I think needs to be emphasized:
There's no way to determine if a pointer is valid - if you receive a pointer argument in a function like
void foo( int *ptr )
{
...
}
there is no test you can run on ptr to indicate that yes, it definitely points to an object within that object's lifetime and is safe to use.
By contrast, there is an easy, standard test to indicate that a pointer is definitely invalid and unsafe to use, and that's by checking that its value is NULL. So you can at least avoid using pointers that are definitely invalid with the
if ( p )
{
// do something with p
}
idiom.
Now, just because p isn't NULL doesn't automatically mean it's valid, but if you're consistent and disciplined about setting unused pointers to NULL, then the odds are pretty high that it is.
This is one of those areas where C doesn't protect you, and you have to devote non-trivial amounts of effort to make sure your code is safe and robust. Frankly, it's a pain in the ass more often than not. But being disciplined with using NULL for inactive pointers makes things a little easier.
Again, you have to do some analysis and think about how pointers are being used in your code. If you know you're going to set a pointer to valid value before it's ever read, then it's not critical to initialize it to anything in particular. If you have multiple pointers pointing to the same object, then you need to make sure if that object ever goes away that all those pointers are updated appropriately.
This assumes that the pointer in question has auto storage duration - if the pointer is declared with the static keyword or at file scope, then it is implicitly initialized to NULL.

Related

Why is Pointer = NULL defined as clean code?

So my programming teacher told us, that if you don't use a pointer but like to declare it, it is always better to initialize it with NULL. How does it prevent any errors if i don't even use it?
Or if I am wrong, what are the benefits of it?
De-referencing NULL is likely (guaranteed?) to cause a segmentation fault, immediately crashing your app and alerting you to the unsafe memory access you just preformed.
Leaving your pointer uninitialized will mean it still has whatever junk was left over from the previous user of that memory. It's entirely possible for that to be a pointer to a real memory region in your app. Dereferencing it will be undefined behavior, might cause a seg fault, or it might not. The latter is the worst case, which you should fear. Your app will just keep chugging along with whatever non-sense behavior resulted from that.
Here's a demonstration:
#include <stdio.h>
#include <stdlib.h>
void i_segfault() {
fflush(stdout); // Flush whatever is left of STDOUT before we blow up
int *i = NULL;
printf("%i", *i); // Boom
}
void use_some_memory() {
int *some_pointer = malloc(sizeof(int));
*some_pointer = 123;
printf(
"I used the address %p to store a pointer to %p, which contains %i\n",
&some_pointer, some_pointer, *some_pointer
);
}
void i_dont_segfault() {
int *my_new_pointer; // uninitialized
printf(
"I re-used the address %p, which still has a lingering value %p, which still points to %i\n",
&my_new_pointer, my_new_pointer, *my_new_pointer
);
}
int main(int argc, char *argv[]) {
use_some_memory();
i_dont_segfault();
i_segfault();
}
There are at least two ways to make sure your program does not use an uninitialized pointer:
You can design and write your program carefully so that you are sure that program control never flows through a path that uses the pointer without initializing it first.
You can initialize the pointer to NULL.
The first can be hard, depending on the program, and humans keep making mistakes with it. The second is easy.
On the other hand, the second only solves one problem: It ensures that if you use the pointer without otherwise initializing it, it will have an assigned value. Further, in many systems, it ensures that if you attempt to use that value to access a pointed-to object, your program will crash rather than do something worse, like corrupt data and produce wrong results or erase valuable information.
That is a useful problem to solve, because it means this bug of failing to assign the desired value is likely to be caught during testing and, even if it is not, the damage is may due is likely to be limited. However, this does not solve the problem of ensuring that the pointer is assigned the desired value before it is used. So, like many of these code recommendations, it is a useful tip to help limit human errors, but it is not a complete solution.
If you don't initialize a variable (by mistake) the problem is that your program can be even more anomalous than normal. Imagine you have declared an enum with the values ONE, TWO, THREE, and you forget to initialize it, and at some point you include the following code:
switch(my_var) {
case ONE: /* do one */
...
break;
case TWO: /* do two */
...
break;
case THREE: /* do three */
...
break;
}
and you can get nuts because you assume that, at least one of the possible values of the type should be executed... but that's simply not true, as the variable has not been initialized and the value doesn't need even to comply with the constraints imposed to the data type it represents.
Initializing a variable can introduce a short couple of instructions in your code, but it saves a lot of nightmares when searching for errors.
In the case of an uninitialized pointer, the case is worse, as many programmers free() memory allocated from the heap based on the test
if (var) free(var);
and most probably an uninitialized automatic variable of pointer type will point somewhere, and sill be tried to be free()d.

Invalid pointer becoming valid again

int *p;
{
int x = 0;
p = &x;
}
// p is no longer valid
{
int x = 0;
if (&x == p) {
*p = 2; // Is this valid?
}
}
Accessing a pointer after the thing it points to has been freed is undefined behavior, but what happens if some later allocation happens in the same area, and you explicitly compare the old pointer to a pointer to the new thing? Would it have mattered if I cast &x and p to uintptr_t before comparing them?
(I know it's not guaranteed that the two x variables occupy the same spot. I have no reason to do this, but I can imagine, say, an algorithm where you intersect a set of pointers that might have been freed with a set of definitely valid pointers, removing the invalid pointers in the process. If a previously-invalidated pointer is equal to a known good pointer, I'm curious what would happen.)
By my understanding of the standard (6.2.4. (2))
The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
you have undefined behaviour when you compare
if (&x == p) {
as that meets these points listed in Annex J.2:
— The value of a pointer to an object whose lifetime has ended is used (6.2.4).
— The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).
Okay, this seems to be interpreted as a two- make that three part question by some people.
First, there were concerns if using the pointer for a comparison is defined at all.
As is pointed out in the comments, the mere use of the pointer is UB, since $J.2: says use of pointer to object whose lifetime has ended is UB.
However, if that obstacle is passed (which is well in the range of UB, it can work after all and will on many platforms), here is what I found about the other concerns:
Given the pointers do compare equal, the code is valid:
C Standard, §6.5.3.2,4:
[...] If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Although a footnote at that location explicitly says. that the address of an object after the end of its lifetime is an invalid pointer value, this does not apply here, since the if makes sure the pointer's value is the address of x and thus is valid.
C++ Standard, §3.9.2,3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.
Emphasis is mine.
It will probably work with most of the compilers but it still is undefined behavior. For the C language these x are two different objects, one has ended its lifetime, so you have UB.
More seriously, some compilers may decide to fool you in a different way than you expect.
The C standard says
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow
the first array object in the address space.
Note in particular the phrase "both are pointers to the same object". In the sense of the standard the two "x"s are not the same object. They may happen to be realized in the same memory location, but this is to the discretion of the compiler. Since they are clearly two distinct objects, declared in different scopes the comparison should in fact never be true. So an optimizer might well cut away that branch completely.
Another aspect that has not yet been discussed of all that is that the validity of this depends on the "lifetime" of the objects and not the scope. If you'd add a possible jump into that scope
{
int x = 0;
p = &x;
BLURB: ;
}
...
if (...)
...
if (something) goto BLURB;
the lifetime would extend as long as the scope of the first x is reachable. Then everything is valid behavior, but still your test would always be false, and optimized out by a decent compiler.
From all that you see that you better leave it at argument for UB, and don't play such games in real code.
It would work, if by work you use a very liberal definition, roughly equivalent to that it would not crash.
However, it is a bad idea. I cannot imagine a single reason why it is easier to cross your fingers and hope that the two local variables are stored in the same memory address than it is to write p=&x again. If this is just an academic question, then yes it's valid C - but whether the if statement is true or not is not guaranteed to be consistent across platforms or even different programs.
Edit: To be clear, the undefined behavior is whether &x == p in the second block. The value of p will not change, it's still a pointer to that address, that address just doesn't belong to you anymore. Now the compiler might (probably will) put the second x at that same address (assuming there isn't any other intervening code). If that happens to be true, it's perfectly legal to dereference p just as you would &x, as long as it's type is a pointer to an int or something smaller. Just like it's legal to say p = 0x00000042; if (p == &x) {*p = whatever;}.
The behaviour is undefined. However, your question reminds me of another case where a somewhat similar concept was being employed. In the case alluded, there were these threads which would get different amounts of cpu times because of their priorities. So, thread 1 would get a little more time because thread 2 was waiting for I/O or something. Once its job was done, thread 1 would write values to the memory for the thread two to consume. This is not "sharing" the memory in a controlled way. It would write to the calling stack itself. Where variables in thread 2 would be allocated memory. Now, when thread 2 eventually got round to execution,all its declared variables would never have to be assigned values because the locations they were occupying had valid values. I don't know what they did if something went wrong in the process but this is one of the most hellish optimizations in C code I have ever witnessed.
Winner #2 in this undefined behavior contest is rather similar to your code:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *p = (int*)malloc(sizeof(int));
int *q = (int*)realloc(p, sizeof(int));
*p = 1;
*q = 2;
if (p == q)
printf("%d %d\n", *p, *q);
}
According to the post:
Using a recent version of Clang (r160635 for x86-64 on Linux):
$ clang -O realloc.c ; ./a.out
1 2
This can only be explained if the Clang developers consider that this example, and yours, exhibit undefined behavior.
Put aside the fact if it is valid (and I'm convinced now that it's not, see Arne Mertz's answer) I still think that it's academic.
The algorithm you are thinking of would not produce very useful results, as you could only compare two pointers, but you have no chance to determine if these pointers point to the same kind of object or to something completely different. A pointer to a struct could now be the address of a single char for example.

is it necessary to call pointer = NULL when initializing?

when I create a pointer to certain struct, do I have to set it to NULL, then alloc it then use it? and why?
No, you don't have to set it to NULL, but some consider it good practice as it gives a new pointer a value that makes it explicit it's not pointing at anything (yet).
If you are creating a pointer and then immediately assigning another value to it, then there's really not much value in setting it to NULL.
It is a good idea to set a pointer to NULL after you free the memory it was pointing to, though.
No, there is no requirement (as far as the language is concerned) to initialize a pointer variable to anything when declaring it. Thus
T* ptr;
is a valid declaration that introduces a variable named ptr with an indeterminate value. You can even use the variable in certain ways without first allocating anything or setting it to any specific value:
func(&ptr);
According to the C standard, not initializing an automatic storage variable leaves its value indeterminate.
You are definitely encouraged to set pointers to NULL whenever the alternative is your pointer having an indeterminate value. Usage of pointers with an indeterminate value is undefined behavior, a concept that you might not grasp properly if you're used to C programming and come from higher level languages.
Undefined behavior means that anything could happen, and this is a part of the C standard since the philosophy of C is letting the programmer have the burden to control things, instead of protecting him from his mistakes.
You want to avoid undefined behaviors in code: it is very hard to debug stuff when any behavior can occurr. It is not caught by a compiler, and your tests may always pass, leaving the bug unnoticed until it is very expensive to correct it.
If you do something like this:
char *p;
if(!p) puts("Pointer is NULL");
You don't actually know whether that if control will be true or false. This is extremely dangerous in large programs where you may declare your variable and then use it somewhere very far in space and time.
Same concept when you reference freed memory.
free(p);
printf("%s\n", p);
p = q;
You don't actually know what you're going to get with the last two statements. You can't test it properly, you cannot be sure of your outcome. Your code may seem to work because freed memory is recycled even if it may have been altered... you could even overwrite memory, or corrupt other variables that share the same memory. So you still have a bug that sooner or later will manifest itself and could be very hard to debug it.
If you set the pointer to NULL, you protect yourself from these dangerous situations: during your tests, your program will have a deterministic, predictable behavior that will fail fast and cheaply. You get a segfault, you catch the bug, you fix the bug. Clean and simple:
#include <stdio.h>
#include <stdlib.h>
int main(){
char* q;
if(!q) puts("This may or may not happen, who knows");
q = malloc(10);
free(q);
printf("%s\n", q); // this is probably don't going to crash, but you still have a bug
char* p = NULL;
if(!p) puts("This is always going to happen");
p = malloc(10);
free(p);
p = NULL;
printf("%s\n", p); // this is always going to crash
return 0;
}
So, when you initialize a pointer, especially in large programs, you want them to be explicitely initliazed or set to NULL. You don't want them to get an indeterminate value, never. I should say unless you know what you are doing, but I prefer never instead.
No, don't forget that initialization has to be to a null pointer at all. A very convenient idiom in modern C is to declare variables at their first use
T * ptr = malloc(sizeof *ptr);
This avoids you a lot of hussle of remembering the type and whether or not a variable is already initialized. Only if you don't know where (or even if) it is later initialized, then you definitively should initialize it to a null pointer.
So as a rule of thumb, always initialize variables to the appropriate value. The "proper 0 for the type" is always a good choice if you don't have a better one at hand. For all types, C is made like that, such that it works.
Not initializing a variable is premature optimization in most cases. Only go through your variables when you see that there is a real performance bottleneck, there. In particular if there is an assignment inside the same function before a use of the initial value, andy modern compiler will optimize you the initialization out. Think correctness of your program first.
You dont have to unless you dont want it to dangle a while. You know you should after freeing(defensive style).
You do not have to initialize to NULL. If you intend to allocate it immediately, you can skip it.
At can be useful for error handling like in the following example. In this case breaking to the end after p allocation would leave q uninitialized.
int func(void)
{
char *p;
char *q;
p = malloc(100);
if (!p)
goto end;
q = malloc(100);
if (!q)
goto end;
/* Do something */
end:
free(p);
free(q);
return 0;
}
Also, and this is my personal taste, allocate structs always with calloc. Strings may be allocated uninitialized with malloc

initializing char pointers

I have a char pointer which would be used to store a string. It is used later in the program.
I have declared and initialized like this:
char * p = NULL;
I am just wondering if this is good practice. I'm using gcc 4.3.3.
Yes, it's good idea.
Google Code Style recommends:
To initialize all your variables even if you don't need them right now.
Initialize pointers by NULL, int's by 0 and float's by 0.0 -- just for better readability.
int i = 0;
double x = 0.0;
char* c = NULL;
You cannot store a string in a pointer.
Your definition of mgt_dev_name is good, but you need to point it somewhere with space for your string. Either malloc() that space or use a previously defined array of characters.
char *mgt_dev_name = NULL;
char data[4200];
/* ... */
mgt_dev_name = data; /* use array */
/* ... */
mgt_dev_name = malloc(4200);
if (mgt_dev_name != NULL) {
/* use malloc'd space */
free(mgt_dev_name);
} else {
/* error: not enough memory */
}
It is good practice to initialize all variables.
If you're asking whether it's necessary, or whether it's a good idea to initialize the variable to NULL before you set it to something else later on: It's not necessary to initialize it to NULL, it won't make any difference for the functionality of your program.
Note that in programming, it's important to understand every line of code - why it's there and what exactly it's doing. Don't do things without knowing what they mean or without understanding why you're doing them.
Another option is to not define the variable until the place in your code where you have access to it's initial value. So rather then doing:
char *name = NULL;
...
name = initial_value;
I would change that to:
...
char *name = initial_value;
The compiler will then prevent you from referencing the variable in the part of the code where it has no value. Depending on the specifics of your code this may not always be possible (for example, the initial value is set in an inner scope but the variable has a different lifetime), moving the definition as late as possible in the code prevents errors.
That said, this is only allowed starting with the c99 standard (it's also valid C++). To enable c99 features in gcc, you'll need to either do:
gcc -std=gnu99
or if you don't want gcc extensions to the standard:
gcc -std=c99
No, it is not a good practice, if I understood your context correctly.
If your code actually depends on the mgt_dev_name having the initial value of a null-pointer, then, of course, including the initializer into the declaration is a very good idea. I.e. if you'd have to do this anyway
char *mgt_dev_name;
/* ... and soon after */
mgt_dev_name = NULL;
then it is always a better idea to use initialization instead of assignment
char *mgt_dev_name = NULL;
However, initialization is only good when you can initialize your object with a meaningful useful value. A value that you will actually need. In general case, this is only possible in languages that allow declarations at any point in the code, C99 and C++ being good examples of such languages. By the time you need your object, you usually already know the appropriate initializer for that object, and so can easily come up with an elegant declaration with a good initializer.
In C89/90 on the other hand, declarations can only be placed at the beginning of the block. At that point, in general case, you won't have meaningful initializers for all of your objects. Should you just initialize them with something, anything (like 0 or NULL) just to have them initialized? No!!! Never do meaningless things in your code. It will not improve anything, regardless of what various "style guides" might tell you. In reality, meaningless initialization might actually cover bugs in your code, making it the harder to discover and fix them.
Note, that even in C89/90 it is always beneficial to strive for better locality of declarations. I.e. a well-known good practice guideline states: always make your variables as local as they can be. Don't pile up all your local object declarations at the very beginning of the function, but rather move them to the beginning of the smallest block that envelopes the entire lifetime of the object as tightly as possible. Sometimes it might even be a good idea to introduce a fictive, otherwise unnecessary block just to improve the locality of declarations. Following this practice will help you to provide good useful initializers to your objects in many (if not most) cases. But some objects will remain uninitialized in C89/90 just because you won't have a good initializer for them at the point of declaration. Don't try to initialize them with "something" just for the sake of having them initialized. This will achieve absolutely nothing good, and might actually have negative consequences.
Note that some modern development tools (like MS Visual Studio 2005, for example) will catch run-time access to uninitialized variables in debug version of the code. I.e these tools can help you to detect situations when you access a variable before it had a chance to acquire a meaningful value, indicating a bug in the code. But performing unconditional premature initialization of your variables you essentially kill that capability of the tool and sweep these bugs under the carpet.
This topic has already been discussed here:
http://www.velocityreviews.com/forums/t282290-how-to-initialize-a-char.html
It refers to C++, but it might be useful for you, too.
There are several good answers to this question, one of them has been accepted. I'm going to answer anyway in order to expand on practicalities.
Yes, it is good practice to initialize pointers to NULL, as well as set pointers to NULL after they are no longer needed (i.e. freed).
In either case, its very practical to be able to test a pointer prior to dereferencing it. Lets say you have a structure that looks like this:
struct foo {
int counter;
unsigned char ch;
char *context;
};
You then write an application that spawns several threads, all of which operate on a single allocated foo structure (safely) through the use of mutual exclusion.
Thread A gets a lock on foo, increments counter and checks for a value in ch. It does not find one, so it does not allocate (or modify) context. Instead, it stores a value in ch so that thread B can do this work instead.
Thread B Sees that counter has been incremented, notes a value in ch but isn't sure if thread A has done anything with context. If context was initialized as NULL, thread B no longer has to care what thread A did, it knows context is safe to dereference (if not NULL) or allocate (if NULL) without leaking.
Thread B does its business, thread A reads its context, frees it, then re-initializes it to NULL.
The same reasoning applies to global variables, without the use of threads. Its good to be able to test them in various functions prior to dereferencing them (or attempting to allocate them thus causing a leak and undefined behavior in your program).
When it gets silly is when the scope of the pointer does not go beyond a single function. If you have a single function and can't keep track of the pointers within it, usually this means the function should be re-factored. However, there is nothing wrong with initializing a pointer in a single function, if only to keep uniform habits.
The only time I've ever seen an 'ugly' case of relying on an initialized pointer (before and after use) is in something like this:
void my_free(void **p)
{
if (*p != NULL) {
free(*p);
*p = NULL;
}
}
Not only is dereferencing a type punned pointer frowned upon on strict platforms, the above code makes free() even more dangerous, because callers will have some delusion of safety. You can't rely on a practice 'wholesale' unless you are sure every operation is in agreement.
Probably a lot more information than you actually wanted.
Preferred styles:
in C: char * c = NULL;
in C++: char * c = 0;
My rationale is that if you don't initialize with NULL, and then forget to initialize altogether, the kinds of bugs you will get in your code when dereferencing are much more difficult to trace due to the potential garbage held in memory at that point. On the other hand, if you do initialize to NULL, most of the time you will only get a segmentation fault, which is better, considering the alternative.
Initializing variables even when you don't need them initialized right away is a good practice. Usually, we initialize pointers to NULL, int to 0 and floats to 0.0 as a convention.
int* ptr = NULL;
int i = 0;
float r = 0.0;
It is always good to initialize pointer variables in C++ as shown below:
int *iPtr = nullptr;
char *cPtr = nullptr;
Because initializing as above will help in condition like below since nullptr is convertible to bool, else your code will end up throwing some compilation warnings or undefined behaviour:
if(iPtr){
//then do something.
}
if(cPtr){
//then do something.
}

How can dereferencing a NULL pointer in C not crash a program?

I need help of a real C guru to analyze a crash in my code. Not for fixing the crash; I can easily fix it, but before doing so I'd like to understand how this crash is even possible, as it seems totally impossible to me.
This crash only happens on a customer machine and I cannot reproduce it locally (so I cannot step through the code using a debugger), as I cannot obtain a copy of this user's database. My company also won't allow me to just change a few lines in the code and make a custom build for this customer (so I cannot add some printf lines and have him run the code again) and of course the customer has a build without debug symbols. In other words, my debbuging abilities are very limited. Nonetheless I could nail down the crash and get some debugging information. However when I look at that information and then at the code I cannot understand how the program flow could ever reach the line in question. The code should have crashed long before getting to that line. I'm totally lost here.
Let's start with the relevant code. It's very little code:
// ... code above skipped, not relevant ...
if (data == NULL) return -1;
information = parseData(data);
if (information == NULL) return -1;
/* Check if name has been correctly \0 terminated */
if (information->kind.name->data[information->kind.name->length] != '\0') {
freeParsedData(information);
return -1;
}
/* Copy the name */
realLength = information->kind.name->length + 1;
*result = malloc(realLength);
if (*result == NULL) {
freeParsedData(information);
return -1;
}
strlcpy(*result, (char *)information->kind.name->data, realLength);
// ... code below skipped, not relevant ...
That's already it. It crashes in strlcpy. I can tell you even how strlcpy is really called at runtime. strlcpy is actually called with the following paramaters:
strlcpy ( 0x341000, 0x0, 0x1 );
Knowing this it is rather obvious why strlcpy crashes. It tries to read one character from a NULL pointer and that will of course crash. And since the last parameter has a value of 1, the original length must have been 0. My code clearly has a bug here, it fails to check for the name data being NULL. I can fix this, no problem.
My question is:
How can this code ever get to the strlcpy in the first place?
Why does this code not crash at the if-statement?
I tried it locally on my machine:
int main (
int argc,
char ** argv
) {
char * nullString = malloc(10);
free(nullString);
nullString = NULL;
if (nullString[0] != '\0') {
printf("Not terminated\n");
exit(1);
}
printf("Can get past the if-clause\n");
char xxx[10];
strlcpy(xxx, nullString, 1);
return 0;
}
This code never gets passed the if statement. It crashes in the if statement and that is definitely expected.
So can anyone think of any reason why the first code can get passed that if-statement without crashing if name->data is really NULL? This is totally mysterious to me. It doesn't seem deterministic.
Important extra information:
The code between the two comments is really complete, nothing has been left out. Further the application is single threaded, so there is no other thread that could unexpectedly alter any memory in the background. The platform where this happens is a PPC CPU (a G4, in case that could play any role). And in case someone wonders about "kind.", this is because "information" contains a "union" named "kind" and name is a struct again (kind is a union, every possible union value is a different type of struct); but this all shouldn't really matter here.
I'm grateful for any idea here. I'm even more grateful if it's not just a theory, but if there is a way I can verify that this theory really holds true for the customer.
Solution
I accepted the right answer already, but just in case anyone finds this question on Google, here's what really happened:
The pointers were pointing to memory, that has already been freed. Freeing memory won't make it all zero or cause the process to give it back to the system at once. So even though the memory has been erroneously freed, it was containing the correct values. The pointer in question is not NULL at the time the "if check" is performed.
After that check I allocate some new memory, calling malloc. Not sure what exactly malloc does here, but every call to malloc or free can have far-reaching consequences to all dynamic memory of the virtual address space of a process. After the malloc call, the pointer is in fact NULL. Somehow malloc (or some system call malloc uses) zeros the already freed memory where the pointer itself is located (not the data it points to, the pointer itself is in dynamic memory). Zeroing that memory, the pointer now has a value of 0x0, which is equal to NULL on my system and when strlcpy is called, it will of course crash.
So the real bug causing this strange behavior was at a completely different location in my code. Never forget: Freed memory keeps it values, but it is beyond your control for how long. To check if your app has a memory bug of accessing already freed memory, just make sure the freed memory is always zeroed before it is freed. In OS X you can do this by setting an environment variable at runtime (no need to recompile anything). Of course this slows down the program quite a bit, but you will catch those bugs much earlier.
First, dereferencing a null pointer is undefined behavior. It can crash, not crash, or set your wallpaper to a picture of SpongeBob Squarepants.
That said, dereferencing a null pointer will usually result in a crash. So your problem is probably memory corruption-related, e.g. from writing past the end of one of your strings. This can cause a delayed-effect crash. I'm particularly suspicious because it's highly unlikely that malloc(1) will fail unless your program is butting up against the end of its available virtual memory, and you would probably notice if that were the case.
Edit: OP pointed out that it isn't result that is null but information->kind.name->data. Here's a potential issue then:
There is no check for whether information->kind.name->data is null. The only check on that is
if (information->kind.name->data[information->kind.name->length] != '\0') {
Let's assume that information->kind.name->data is null, but information->kind.name->length is, say, 100. Then this statement is equivalent to:
if (*(information->kind.name->data + 100) != '\0') {
Which does not dereference NULL but rather dereferences address 100. If this does not crash, and address 100 happens to contain 0, then this test will pass.
It is possible that the structure is located in memory that has been free()'d, or the heap is corrupted. In that case, malloc() could be modifying the memory, thinking that it is free.
You might try running your program under a memory checker. One memory checker that supports Mac OS X is valgrind, although it supports Mac OS X only on Intel, not on PowerPC.
The effect of dereferencing the null pointer is undefined by standard as far as I know.
According to C Standard 6.5.3.2/4:
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
So there could be crash or could be not.
You may be experiencing stack corruption. The line of code you are refering to may not be being executed at all.
My theory is that information->kind.name->length is a very large value so that information->kind.name->data[information->kind.name->length] is actually referring to a valid memory address.
The act of dereferencing a NULL pointer is undefined by the standard. It is not guaranteed to crash and often times won't unless you actually try and write to the memory.
As an FYI, when I see this line:
if (information->kind.name->data[information->kind.name->length] != '\0') {
I see up to three different pointer dereferences:
information
name
data (if it's a pointer and not a fixed array)
You check information for non-null, but not name and not data. What makes you so sure that they're correct?
I also echo other sentiments here about something else possibly damaging your heap earlier. If you're running on windows, consider using gflags to do things like page allocation, which can be used to detect if you or someone else is writing past the end of a buffer and stepping on your heap.
Saw that you're on a Mac - ignore the gflags comment - it might help someone else who reads this. If you're running on something earlier than OS X, there are a number of handy Macsbugs tools to stress the heap (like the heap scramble command, 'hs').
I'm interested in the char* cast in the call to strlcpy.
Could the type data* be different in size than the char* on your system? If char pointers are smaller you could get a subset of the data pointer which could be NULL.
Example:
int a = 0xffff0000;
short b = (short) a; //b could be 0 if lower bits are used
Edit: Spelling mistakes corrected.
Here's one specific way you can get past the 'data' pointer being NULL in
if (information->kind.name->data[information->kind.name->length] != '\0') {
Say information->kind.name->length is large. Atleast larger than
4096, on a particular platform with a particular compiler (Say, most *nixes with a stock gcc compiler) the code will result in a memory read of "address of kind.name->data + information->kind.name->length].
At a lower level, that read is "read memory at address (0 + 8653)" (or whatever the length was).
It's common on *nixes to mark the first page in the address space as "not accessible", meaning dereferencing a NULL pointer that reads memory address 0 to 4096 will result in a hardware trap being propagated to the application and crash it.
Reading past that first page, you might happen to poke into valid mapped memory, e.g. a shared library or something else that happened to be mapped there - and the memory access will not fail. And that's ok. Dereferencing a NULL pointer is undefined behavior, nothing requires it to fail.
Missing '{' after last if statement means that something in the "// ... code above skipped, not relevant ..." section is controlling access to that entire fragment of code. Out of all the code pasted only the strlcpy is executed. Solution: never use if statements without curly brackets to clarify control.
Consider this...
if(false)
{
if(something == stuff)
{
doStuff();
.. snip ..
if(monkey == blah)
some->garbage= nothing;
return -1;
}
}
crash();
Only "crash();" gets executed.
I would run your program under valgrind. You already know there's a problem with NULL pointers, so profile that code.
The advantage that valgrind beings here is that it checks every single pointer reference and checks to see if that memory location has been previously declared, and it will tell you the line number, structure, and anything else you care to know about memory.
As every one else mentioned, referencing the 0 memory location is a "que sera, sera" kinda thing.
My C tinged spidey sense is telling me that you should break out those structure walks on the
if (information->kind.name->data[information->kind.name->length] != '\0') {
line like
if (information == NULL) {
return -1;
}
if (information->kind == NULL) {
return -1;
}
and so on.
Wow, thats strange. One thing does look slightly suspicious to me, though it may not contribute:
What would happen if information and data were good pointers (non null), but information.kind.name was null. You don't dereference this pointer until the strlcpy line, so if it was null, it might not crash until then. Of course, earlier than t hat you do dereference data[1] to set it to \0, which should also crash, but due to whatever fluke, your program may just happen to have write access to 0x01 but not 0x00.
Also, I see you use information->name.length in one place but information->kind.name.length in another, not sure if thats a typo or if thats desired.
Despite the fact that dereferencing a null pointer leads to undefined behaviour and not necessarily to a crash, you should check the value of information->kind.name->data and not the contents of information->kind.name->data[1].
char * p = NULL;
p[i] is like
p += i;
which is a valid operation, even on a nullpointer. it then points at memory location 0x0000[...]i
You should always check whether information->kind.name->data is null anyway, but in this case
in
if (*result == NULL)
freeParsedData(information);
return -1;
}
you have missed a {
it should be
if (*result == NULL)
{
freeParsedData(information);
return -1;
}
This is a good reason for this coding style, instead of
if (*result == NULL) {
freeParsedData(information);
return -1;
}
where you might not spot the missing brace because you are used to the shape of the code block without the brace separating it from the if clause.
*result = malloc(realLength); // ???
Address of newly allocated memory segment is stored at the location referenced by the address contained in the variable "result".
Is this the intent? If so, the strlcpy may need modification.
As per my understanding, the special case of this problem is invalid access resulting with an attempt to read or write, using a Null pointer. Here the detection of the problem is very much hardware dependent. On some platforms, accessing memory for read or write using in NULL pointer will result in an exception.

Resources