Why is Pointer = NULL defined as clean code? - c

So my programming teacher told us, that if you don't use a pointer but like to declare it, it is always better to initialize it with NULL. How does it prevent any errors if i don't even use it?
Or if I am wrong, what are the benefits of it?

De-referencing NULL is likely (guaranteed?) to cause a segmentation fault, immediately crashing your app and alerting you to the unsafe memory access you just preformed.
Leaving your pointer uninitialized will mean it still has whatever junk was left over from the previous user of that memory. It's entirely possible for that to be a pointer to a real memory region in your app. Dereferencing it will be undefined behavior, might cause a seg fault, or it might not. The latter is the worst case, which you should fear. Your app will just keep chugging along with whatever non-sense behavior resulted from that.
Here's a demonstration:
#include <stdio.h>
#include <stdlib.h>
void i_segfault() {
fflush(stdout); // Flush whatever is left of STDOUT before we blow up
int *i = NULL;
printf("%i", *i); // Boom
}
void use_some_memory() {
int *some_pointer = malloc(sizeof(int));
*some_pointer = 123;
printf(
"I used the address %p to store a pointer to %p, which contains %i\n",
&some_pointer, some_pointer, *some_pointer
);
}
void i_dont_segfault() {
int *my_new_pointer; // uninitialized
printf(
"I re-used the address %p, which still has a lingering value %p, which still points to %i\n",
&my_new_pointer, my_new_pointer, *my_new_pointer
);
}
int main(int argc, char *argv[]) {
use_some_memory();
i_dont_segfault();
i_segfault();
}

There are at least two ways to make sure your program does not use an uninitialized pointer:
You can design and write your program carefully so that you are sure that program control never flows through a path that uses the pointer without initializing it first.
You can initialize the pointer to NULL.
The first can be hard, depending on the program, and humans keep making mistakes with it. The second is easy.
On the other hand, the second only solves one problem: It ensures that if you use the pointer without otherwise initializing it, it will have an assigned value. Further, in many systems, it ensures that if you attempt to use that value to access a pointed-to object, your program will crash rather than do something worse, like corrupt data and produce wrong results or erase valuable information.
That is a useful problem to solve, because it means this bug of failing to assign the desired value is likely to be caught during testing and, even if it is not, the damage is may due is likely to be limited. However, this does not solve the problem of ensuring that the pointer is assigned the desired value before it is used. So, like many of these code recommendations, it is a useful tip to help limit human errors, but it is not a complete solution.

If you don't initialize a variable (by mistake) the problem is that your program can be even more anomalous than normal. Imagine you have declared an enum with the values ONE, TWO, THREE, and you forget to initialize it, and at some point you include the following code:
switch(my_var) {
case ONE: /* do one */
...
break;
case TWO: /* do two */
...
break;
case THREE: /* do three */
...
break;
}
and you can get nuts because you assume that, at least one of the possible values of the type should be executed... but that's simply not true, as the variable has not been initialized and the value doesn't need even to comply with the constraints imposed to the data type it represents.
Initializing a variable can introduce a short couple of instructions in your code, but it saves a lot of nightmares when searching for errors.
In the case of an uninitialized pointer, the case is worse, as many programmers free() memory allocated from the heap based on the test
if (var) free(var);
and most probably an uninitialized automatic variable of pointer type will point somewhere, and sill be tried to be free()d.

Related

Returning an array from function in C

I've written a function that returns an array whilst I know that I should return a dynamically allocated pointer instead, but still I wanted to know what happens when I am returning an array declared locally inside a function (without declaring it as static), and I got surprised when I noticed that the memory of the internal array in my function wasn't deallocated, and I got my array back to main.
The main:
int main()
{
int* arr_p;
arr_p = demo(10);
return 0;
}
And the function:
int* demo(int i)
{
int arr[10] = { 0 };
for (int i = 0; i < 10; i++)
{
arr[i] = i;
}
return arr;
}
When I dereference arr_p I can see the 0-9 integers set in the demo function.
Two questions:
How come when I examined arr_p I saw that its address is the same as arr which is in the demo function?
How come demo_p is pointing to data which is not deallocated (the 0-9 numbers) already in demo? I expected that arr inside demo will be deallocated as we got out of demo scope.
One of the things you have to be careful of when programming is to pay good attention to what the rules say, and not just to what seems to work. The rules say you're not supposed to return a pointer to a locally-allocated array, and that's a real, true rule.
If you don't get an error when you write a program that returns a pointer to a locally-allocated array, that doesn't mean it was okay. (Although, it means you really ought to get a newer compiler, because any decent, modern compiler will warn about this.)
If you write a program that returns a pointer to a locally-allocated array and it seems to work, that doesn't mean it was okay, either. Be really careful about this: In general, in programming, but especially in C, seeming to work is not proof that your program is okay. What you really want is for your program to work for the right reasons.
Suppose you rent an apartment. Suppose, when your lease is up, and you move out, your landlord does not collect your key from you, but does not change the lock, either. Suppose, a few days later, you realize you forgot something in the back of one closet. Suppose, without asking, you sneak back to try to collect it. What happens next?
As it happens, your key still works in the lock. Is this a total surprise, or mildly unexpected, or guaranteed to work?
As it happens, your forgotten item still is in the closet. It has not yet been cleared out. Is this a total surprise, or mildly unexpected, or guaranteed to happen?
In the end, neither your old landlord, nor the police, accost you for this act of trespass. Once more, is this a total surprise, or mildly unexpected, or just about completely expected?
What you need to know is that, in C, reusing memory you're no longer allowed to use is just about exactly analogous to sneaking back in to an apartment you're no longer renting. It might work, or it might not. Your stuff might still be there, or it might not. You might get in trouble, or you might not. There's no way to predict what will happen, and there's no (valid) conclusion you can draw from whatever does or doesn't happen.
Returning to your program: local variables like arr are usually stored on the call stack, meaning they're still there even after the function returns, and probably won't be overwritten until the next function gets called and uses that zone on the stack for its own purposes (and maybe not even then). So if you return a pointer to locally-allocated memory, and dereference that pointer right away (before calling any other function), it's at least somewhat likely to "work". This is, again, analogous to the apartment situation: if no one else has moved in yet, it's likely that your forgotten item will still be there. But it's obviously not something you can ever depend on.
arr is a local variable in demo that will get destroyed when you return from the function. Since you return a pointer to that variable, the pointer is said to be dangling. Dereferencing the pointer makes your program have undefined behavior.
One way to fix it is to malloc (memory allocate) the memory you need.
Example:
#include <stdio.h>
#include <stdlib.h>
int* demo(int n) {
int* arr = malloc(sizeof(*arr) * n); // allocate
for (int i = 0; i < n; i++) {
arr[i] = i;
}
return arr;
}
int main() {
int* arr_p;
arr_p = demo(10);
printf("%d\n", arr_p[9]);
free(arr_p) // free the allocated memory
}
Output:
9
How come demo_p is pointing to data which is not deallocated (the 0-9 numbers) already in demo? I expected that arr inside demo will be deallocated as we got out of demo scope.
The life of the arr object has ended and reading the memory addresses previously occupied by arr makes your program have undefined behavior. You may be able to see the old data or the program may crash - or do something completely different. Anything can happen.
… I noticed that the memory of the internal array in my function wasn't deallocated…
Deallocation of memory is not something you can notice or observe, except by looking at the data that records memory reservations (in this case, the stack pointer). When memory is reserved or released, that is just a bookkeeping process about what memory is available or not available. Releasing memory does not necessarily erase memory or immediately reuse it for another purpose. Looking at the memory does not necessarily tell you whether it is in use or not.
When int arr[10] = { 0 }; appears inside a function, it defines an array that is allocated automatically when the function starts executing (or at certain times within the function execution if the definition is in some nested scope). This is commonly done by adjusting the stack pointer. In common systems, programs have a region of memory called the stack, and a stack pointer contains an address that marks the end of the portion of the stack that is currently reserved for use. When a function starts executing, the stack pointer is changed to reserve more memory for that function’s data. When execution of the function ends, the stack pointer is changed to release that memory.
If you keep a pointer to that memory (how you can do that is another matter, discussed below), you will not “notice” or “observe” any change to that memory immediately after the function returns. That is why you see the value of arr_p is the address that arr had, and it is why you see the old data in that memory.
If you call some other function, the stack pointer will be adjusted for the new function, that function will generally use the memory for its own purposes, and then the contents of that memory will have changed. The data you had in arr will be gone. A common example of this that beginners happen across is:
int main(void)
{
int *p = demo(10);
// p points to where arr started, and arr’s data is still there.
printf("arr[3] = %d.\n", p[3]);
// To execute this call, the program loads data from p[3]. Since it has
// not changed, 3 is loaded. This is passed to printf.
// Then printf prints “arr[3] = 3.\n”. In doing this, it uses memory
// on the stack. This changes the data in the memory that p points to.
printf("arr[3] = %d.\n", p[3]);
// When we try the same call again, the program loads data from p[3],
// but it has been changed, so something different is printed. Two
// different things are printed by the same printf statement even
// though there is no visible code changing p[3].
}
Going back to how you can have a copy of a pointer to memory, compilers follow rules that are specified abstractly in the C standard. The C standard defines an abstract lifetime of the array arr in demo and says that lifetime ends when the function returns. It further says the value of a pointer becomes indeterminate when the lifetime of the object it points to ends.
If your compiler is simplistically generating code, as it does when you compile using GCC with -O0 to turn off optimization, it typically keeps the address in p and you will see the behaviors described above. But, if you turn optimization on and compile more complicated programs, the compiler seeks to optimize the code it generates. Instead of mechanically generating assembly code, it tries to find the “best” code that performs the defined behavior of your program. If you use a pointer with indeterminate value or try to access an object whose lifetime has ended, there is no defined behavior of your program, so optimization by the compiler can produce results that are unexpected by new programmers.
As you know dear, the existence of a variable declared in the local function is within that local scope only. Once the required task is done the function terminates and the local variable is destroyed afterwards. As you are trying to return a pointer from demo() function ,but the thing is the array to which the pointer points to will get destroyed once we come out of demo(). So indeed you are trying to return a dangling pointer which is pointing to de-allocated memory. But our rule suggests us to avoid dangling pointer at any cost.
So you can avoid it by re-initializing it after freeing memory using free(). Either you can also allocate some contiguous block of memory using malloc() or you can declare your array in demo() as static array. This will store the allocated memory constant also when the local function exits successfully.
Thank You Dear..
#include<stdio.h>
#define N 10
int demo();
int main()
{
int* arr_p;
arr_p = demo();
printf("%d\n", *(arr_p+3));
}
int* demo()
{
static int arr[N];
for(i=0;i<N;i++)
{
arr[i] = i;
}
return arr;
}
OUTPUT : 3
Or you can also write as......
#include <stdio.h>
#include <stdlib.h>
#define N 10
int* demo() {
int* arr = (int*)malloc(sizeof(arr) * N);
for(int i = 0; i < N; i++)
{
arr[i]=i;
}
return arr;
}
int main()
{
int* arr_p;
arr_p = demo();
printf("%d\n", *(arr_p+3));
free(arr_p);
return 0;
}
OUTPUT : 3
Had the similar situation when i have been trying to return char array from the function. But i always needed an array of a fixed size.
Solved this by declaring a struct with a fixed size char array in it and returning that struct from the function:
#include <time.h>
typedef struct TimeStamp
{
char Char[9];
} TimeStamp;
TimeStamp GetTimeStamp()
{
time_t CurrentCalendarTime;
time(&CurrentCalendarTime);
struct tm* LocalTime = localtime(&CurrentCalendarTime);
TimeStamp Time = { 0 };
strftime(Time.Char, 9, "%H:%M:%S", LocalTime);
return Time;
}

Why init pointer to NULL is a good practice? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
In case you don't want (or can not) init a pointer with an address, I often hear people say that you should init it with NULL and it's a good practice.
You can find people also say something like that in SO, for example here.
Working in many C projects, I don't think it is a good practice or at least somehow better than not init the pointer with anything.
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
So, could you tell me what are the reasons if you say that it is a good pratice or people just say it for granted (just like you should always init an variable) ?
Note that, I tried to find in Misra 2004 and also did not find any rule or recommendation for that.
Update:
So most of the comments and answers give the main reason that the pointer could have an random address before being used, so it's better to have it as null so you could figure out the problem faster.
To make my point clearer, I think that it doesn't make senses nowadays in commercial C softwares, in a practical point of view. The unassigned pointer that is used will be detected right the way by most of static analyzer, that's why I prefer to let it be un-init because if I init it with NULL then when developers forget to assign it to a "real" address, it passes the static analyzer and will cause runtime problems with null pointer.
You said
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
I would argue the main reason is actually due to exactly this. If you don't init pointers to NULL, then if there is a dereferecing error it's going to be a lot harder to find the problem because the pointer is not going to be set to NULL, it's going to be a most likely garbage value that may look exactly like a valid pointer.
C has very little runtime error checking, but NULL is guaranteed not to refer to a valid address, so a runtime environment (typically an operating system) is able to trap any attempt to de-refernce a NULL. The trap will identify the point at which the de-reference occurs rather then the point the program may eventually fail, making identification of the bug far easier.
Moreover when debugging, and unitialised pointer with random content may not be easily distinguishable from a valid pointer - it may refer to a plausible address, whereas NULL is always an invalid address.
If you de-reference an uninitialised pointer the result is non-deterministic - it may crash, it may not, but it will still be wrong.
If it does crash you cannot tell how or even when, since it may result in corruption of data, or reading of invalid data that may have no effect until that corrupted data is later used. The point of failure will not necessarily be the point of error.
So the purpose is that you will get deterministic failure, whereas without initialising, anything could happen - including nothing, leaving you with a latent undetected bug.
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
Deterministic failure is not "terrible" - it increases your chance of finding the error during development, rather then having your users finding the error after deployment. What you are effectively suggesting is that it is better to leave the bugs in and hide them. The dereference on null is guaranteed to be trapped, de-referencing an unitialised pointer is not.
That said initialising with null, should only be done if at the point of declaration you cannot directly assign an otherwise valid value. That is to say, for example:
char* x = malloc( y ) ;
is much preferable to:
char* x = NULL ;
...
x = malloc( y ) ;
which is in turn preferable to:
char* x ;
...
x = malloc( y ) ;
Note that, I tried to find in Misra 2004 and also did not find any
rule or recommendation for that.
MISRA C:2004, 9.1 - All automatic variables shall have been assigned a value before being used.
That is to say, there is no guideline to initialise to NULL, simply that initialisation is required. As I said initialisation to NULL is not preferable to initialising to a valid pointer. Don't blindly follow the "must initialise to NULL advice", because the rule is simply "must initialise", and sometimes the appropriate initialisation value is NULL.
If you don't initialize the pointer, it can have any value, including possibly NULL. It's hard to imagine a scenario where having any value including NULL is preferable to definitely having a NULL value. The logic is that it's better to at least know its value than have its value unpredictably depend on who knows what, possibly resulting in the code behaving differently on different platforms, with different compilers, and so on.
I strongly disagree with any answer or argument based on the idea that you can reliably use a test for NULL to tell if a pointer is valid or not. You can set a pointer to NULL and then test it for NULL within a limited context where that is known to be safe. But there will always be contexts where more than one pointer points to the same thing and you cannot ensure that every possible pointer to an object will be set to NULL at the very place the object is freed. It is simply an essential C programming discipline to understand that a pointer may or may not point to a valid object depending on what is going on in the code.
One issue is that given a pointer variable p, there is no way defined by the C language to ask, "does this pointer point to valid memory or not?" The pointer might point to valid memory. It might point to memory that (once upon a time) was allocated by malloc, but that has since been freed (meaning that the pointer is invalid). It might be an uninitialized pointer, meaning that it's not even meaningful to ask where it points (although it is definitely invalid). But, again, there's no way to know.
So if you're bopping along in some far-off corner of a large program, and you want to say
if(p is valid) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
you can't do this. Once again, the C language gives you no way of writing if(p is valid).
So that's where the rule to always initialize pointers to NULL comes in. If you adopt this rule and follow it faithfully, if you initialize every pointer to NULL or as a pointer to valid memory, if whenever you call free(p) you always immediately follow it with p = NULL;, then if you follow all these rules, you can achieve a decent way of asking "is p valid?", namely:
if(p != NULL) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
And of course it's very common to use an abbreviation:
if(p) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
Here most people would read if(p) as "if p is valid" or "if p is allocated".
Addendum: This answer has attracted some criticism, and I suppose that's because, to make a point, I wrote some unrealistic code which some people are reading more into than I'd intended. The idiom I'm advocating here is not so much valid pointers versus invalid pointers, but rather, pointers I have allocated versus pointers I have not allocated (yet). No one writes code that simply detects and prints "invalid pointer!" as if to say "I don't know where this pointer points; it might be uninitialized or stale". The more realistic way of using the idiom is do do something like
/* ensure allocation before proceeding */
if(p == NULL)
p = malloc(...);
or
if(p == NULL) {
/* nothing to do */
return;
}
or
if(p == NULL) {
fprintf(stderr, "null pointer detected\n");
exit(0);
}
(And in all three cases the abbreviation if(!p) is popular as well.)
But, of course, if what you're trying to discriminate is pointers I have allocated versus pointers I have not allocated (yet), it is vital that you initialize all your un-allocated pointers with the explicit marker you're using to record that they're un-allocated, namely NULL.
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
Which is why you add a check against NULL before using that pointer value:
if ( p ) // p != NULL
{
// do something with p
}
NULL is a well-defined invalid pointer value, guaranteed to compare unequal to any object or function pointer value. It's a well-defined "nowhere" that's easy to check against.
Compare that to the indeterminate value that the uninitialized pointer1 may have - most likely, it will also be an invalid pointer value that will lead to a crash as soon as you try to use it, but it's almost impossible to determine that beforehand. Is 0xfff78567abcd2220 a valid or invalid pointer value? How would you check that?
Obviously, you should do some analysis to see if an initialization is required. Is there a risk of that pointer being dereferenced before you assign a valid pointer value to it? If not, then you don't need to initialize it beforehand.
Since C99, the proper answer has been to defer instantiating a pointer (or any other type of object, really) until you have a valid value to initialize it with:
void foo( void )
{
printf( "Gimme a length: " );
int length;
scanf( "%d", &length );
char *buf = malloc( sizeof *buf * length );
...
}
ETA
I added a comment to Steve's answer that I think needs to be emphasized:
There's no way to determine if a pointer is valid - if you receive a pointer argument in a function like
void foo( int *ptr )
{
...
}
there is no test you can run on ptr to indicate that yes, it definitely points to an object within that object's lifetime and is safe to use.
By contrast, there is an easy, standard test to indicate that a pointer is definitely invalid and unsafe to use, and that's by checking that its value is NULL. So you can at least avoid using pointers that are definitely invalid with the
if ( p )
{
// do something with p
}
idiom.
Now, just because p isn't NULL doesn't automatically mean it's valid, but if you're consistent and disciplined about setting unused pointers to NULL, then the odds are pretty high that it is.
This is one of those areas where C doesn't protect you, and you have to devote non-trivial amounts of effort to make sure your code is safe and robust. Frankly, it's a pain in the ass more often than not. But being disciplined with using NULL for inactive pointers makes things a little easier.
Again, you have to do some analysis and think about how pointers are being used in your code. If you know you're going to set a pointer to valid value before it's ever read, then it's not critical to initialize it to anything in particular. If you have multiple pointers pointing to the same object, then you need to make sure if that object ever goes away that all those pointers are updated appropriately.
This assumes that the pointer in question has auto storage duration - if the pointer is declared with the static keyword or at file scope, then it is implicitly initialized to NULL.

is it necessary to call pointer = NULL when initializing?

when I create a pointer to certain struct, do I have to set it to NULL, then alloc it then use it? and why?
No, you don't have to set it to NULL, but some consider it good practice as it gives a new pointer a value that makes it explicit it's not pointing at anything (yet).
If you are creating a pointer and then immediately assigning another value to it, then there's really not much value in setting it to NULL.
It is a good idea to set a pointer to NULL after you free the memory it was pointing to, though.
No, there is no requirement (as far as the language is concerned) to initialize a pointer variable to anything when declaring it. Thus
T* ptr;
is a valid declaration that introduces a variable named ptr with an indeterminate value. You can even use the variable in certain ways without first allocating anything or setting it to any specific value:
func(&ptr);
According to the C standard, not initializing an automatic storage variable leaves its value indeterminate.
You are definitely encouraged to set pointers to NULL whenever the alternative is your pointer having an indeterminate value. Usage of pointers with an indeterminate value is undefined behavior, a concept that you might not grasp properly if you're used to C programming and come from higher level languages.
Undefined behavior means that anything could happen, and this is a part of the C standard since the philosophy of C is letting the programmer have the burden to control things, instead of protecting him from his mistakes.
You want to avoid undefined behaviors in code: it is very hard to debug stuff when any behavior can occurr. It is not caught by a compiler, and your tests may always pass, leaving the bug unnoticed until it is very expensive to correct it.
If you do something like this:
char *p;
if(!p) puts("Pointer is NULL");
You don't actually know whether that if control will be true or false. This is extremely dangerous in large programs where you may declare your variable and then use it somewhere very far in space and time.
Same concept when you reference freed memory.
free(p);
printf("%s\n", p);
p = q;
You don't actually know what you're going to get with the last two statements. You can't test it properly, you cannot be sure of your outcome. Your code may seem to work because freed memory is recycled even if it may have been altered... you could even overwrite memory, or corrupt other variables that share the same memory. So you still have a bug that sooner or later will manifest itself and could be very hard to debug it.
If you set the pointer to NULL, you protect yourself from these dangerous situations: during your tests, your program will have a deterministic, predictable behavior that will fail fast and cheaply. You get a segfault, you catch the bug, you fix the bug. Clean and simple:
#include <stdio.h>
#include <stdlib.h>
int main(){
char* q;
if(!q) puts("This may or may not happen, who knows");
q = malloc(10);
free(q);
printf("%s\n", q); // this is probably don't going to crash, but you still have a bug
char* p = NULL;
if(!p) puts("This is always going to happen");
p = malloc(10);
free(p);
p = NULL;
printf("%s\n", p); // this is always going to crash
return 0;
}
So, when you initialize a pointer, especially in large programs, you want them to be explicitely initliazed or set to NULL. You don't want them to get an indeterminate value, never. I should say unless you know what you are doing, but I prefer never instead.
No, don't forget that initialization has to be to a null pointer at all. A very convenient idiom in modern C is to declare variables at their first use
T * ptr = malloc(sizeof *ptr);
This avoids you a lot of hussle of remembering the type and whether or not a variable is already initialized. Only if you don't know where (or even if) it is later initialized, then you definitively should initialize it to a null pointer.
So as a rule of thumb, always initialize variables to the appropriate value. The "proper 0 for the type" is always a good choice if you don't have a better one at hand. For all types, C is made like that, such that it works.
Not initializing a variable is premature optimization in most cases. Only go through your variables when you see that there is a real performance bottleneck, there. In particular if there is an assignment inside the same function before a use of the initial value, andy modern compiler will optimize you the initialization out. Think correctness of your program first.
You dont have to unless you dont want it to dangle a while. You know you should after freeing(defensive style).
You do not have to initialize to NULL. If you intend to allocate it immediately, you can skip it.
At can be useful for error handling like in the following example. In this case breaking to the end after p allocation would leave q uninitialized.
int func(void)
{
char *p;
char *q;
p = malloc(100);
if (!p)
goto end;
q = malloc(100);
if (!q)
goto end;
/* Do something */
end:
free(p);
free(q);
return 0;
}
Also, and this is my personal taste, allocate structs always with calloc. Strings may be allocated uninitialized with malloc

pointer to a structure

this will give a proper output even though i have not allocated memory and have declared a pointer to structure two inside main
struct one
{
char x;
int y;
};
struct two
{
char a;
struct one * ONE;
};
main()
{
struct two *TWO;
scanf("%d",&TWO->ONE->y);
printf("%d\n",TWO->ONE->y);
}
but when i declare a pointer to two after the structure outside main i will get segmentation fault but why is it i don't get segmentation fault in previous case
struct one
{
char x;
int y;
};
struct two
{
char a;
struct one * ONE;
}*TWO;
main()
{
scanf("%d",&TWO->ONE->y);
printf("%d\n",TWO->ONE->y);
}
In both the cases TWO is a pointer to a object of type struct two.
In case 1 the pointer is wild and can be pointing anywhere.
In case 2 the pointer is NULL as it is global.
But in both the cases it a pointer not pointing to a valid struct two object. Your code in scanf is treating this pointer as though it was referring to a valid object. This leads to undefined behavior.
Because what you are doing is undefined behaviour. Sometimes it seems to work. That doesn't mean you should do it :-)
The most likely explanation is to do with how the variables are initialised. Automatic variables (on the stack) will get whatever garbage happens to be on the stack when the stack pointer was decremented.
Variables outside functions (like in the second case) are always initialised to zero (null pointer for pointer types).
That's the basic difference between your two situations but, as I said, the first one is working purely by accident.
When declaring a global pointer, it will be initialized to zero, and so the generated addresses will be small numbers that may or may not be readable on your system.
When declaring an automatic pointer, its initial value is likely to be much more interesting. It will be, in this case, whatever the run-time library left at that point on the stack prior to calling main(), or perhaps a left-over value from the compiler-generated stack-frame setup code. It is somewhat likely to be a saved stack pointer or frame pointer, which is a valid pointer if used with small offsets.
So anyway, the uninitialized pointer does have something in it, and one value leads to a fault while the other, for now, on your system, does not.
And that's because the segmentation fault is a mechanism of the OS and not the C language.
A fault is a block-based mechanism that allocates to itself and other programs some number of pages -- which are each several K -- and it protects itself and other program's pages while allowing your program free reign. You must stray outside of the block context or try to write a read-only page (even if yours) to generate a fault. Simply breaking a language rule is not necessarily enough. The OS is happy to let your program misbehave and act oddly due to its wild references, just as long as it only reads and writes (or clobbers) itself.

How can dereferencing a NULL pointer in C not crash a program?

I need help of a real C guru to analyze a crash in my code. Not for fixing the crash; I can easily fix it, but before doing so I'd like to understand how this crash is even possible, as it seems totally impossible to me.
This crash only happens on a customer machine and I cannot reproduce it locally (so I cannot step through the code using a debugger), as I cannot obtain a copy of this user's database. My company also won't allow me to just change a few lines in the code and make a custom build for this customer (so I cannot add some printf lines and have him run the code again) and of course the customer has a build without debug symbols. In other words, my debbuging abilities are very limited. Nonetheless I could nail down the crash and get some debugging information. However when I look at that information and then at the code I cannot understand how the program flow could ever reach the line in question. The code should have crashed long before getting to that line. I'm totally lost here.
Let's start with the relevant code. It's very little code:
// ... code above skipped, not relevant ...
if (data == NULL) return -1;
information = parseData(data);
if (information == NULL) return -1;
/* Check if name has been correctly \0 terminated */
if (information->kind.name->data[information->kind.name->length] != '\0') {
freeParsedData(information);
return -1;
}
/* Copy the name */
realLength = information->kind.name->length + 1;
*result = malloc(realLength);
if (*result == NULL) {
freeParsedData(information);
return -1;
}
strlcpy(*result, (char *)information->kind.name->data, realLength);
// ... code below skipped, not relevant ...
That's already it. It crashes in strlcpy. I can tell you even how strlcpy is really called at runtime. strlcpy is actually called with the following paramaters:
strlcpy ( 0x341000, 0x0, 0x1 );
Knowing this it is rather obvious why strlcpy crashes. It tries to read one character from a NULL pointer and that will of course crash. And since the last parameter has a value of 1, the original length must have been 0. My code clearly has a bug here, it fails to check for the name data being NULL. I can fix this, no problem.
My question is:
How can this code ever get to the strlcpy in the first place?
Why does this code not crash at the if-statement?
I tried it locally on my machine:
int main (
int argc,
char ** argv
) {
char * nullString = malloc(10);
free(nullString);
nullString = NULL;
if (nullString[0] != '\0') {
printf("Not terminated\n");
exit(1);
}
printf("Can get past the if-clause\n");
char xxx[10];
strlcpy(xxx, nullString, 1);
return 0;
}
This code never gets passed the if statement. It crashes in the if statement and that is definitely expected.
So can anyone think of any reason why the first code can get passed that if-statement without crashing if name->data is really NULL? This is totally mysterious to me. It doesn't seem deterministic.
Important extra information:
The code between the two comments is really complete, nothing has been left out. Further the application is single threaded, so there is no other thread that could unexpectedly alter any memory in the background. The platform where this happens is a PPC CPU (a G4, in case that could play any role). And in case someone wonders about "kind.", this is because "information" contains a "union" named "kind" and name is a struct again (kind is a union, every possible union value is a different type of struct); but this all shouldn't really matter here.
I'm grateful for any idea here. I'm even more grateful if it's not just a theory, but if there is a way I can verify that this theory really holds true for the customer.
Solution
I accepted the right answer already, but just in case anyone finds this question on Google, here's what really happened:
The pointers were pointing to memory, that has already been freed. Freeing memory won't make it all zero or cause the process to give it back to the system at once. So even though the memory has been erroneously freed, it was containing the correct values. The pointer in question is not NULL at the time the "if check" is performed.
After that check I allocate some new memory, calling malloc. Not sure what exactly malloc does here, but every call to malloc or free can have far-reaching consequences to all dynamic memory of the virtual address space of a process. After the malloc call, the pointer is in fact NULL. Somehow malloc (or some system call malloc uses) zeros the already freed memory where the pointer itself is located (not the data it points to, the pointer itself is in dynamic memory). Zeroing that memory, the pointer now has a value of 0x0, which is equal to NULL on my system and when strlcpy is called, it will of course crash.
So the real bug causing this strange behavior was at a completely different location in my code. Never forget: Freed memory keeps it values, but it is beyond your control for how long. To check if your app has a memory bug of accessing already freed memory, just make sure the freed memory is always zeroed before it is freed. In OS X you can do this by setting an environment variable at runtime (no need to recompile anything). Of course this slows down the program quite a bit, but you will catch those bugs much earlier.
First, dereferencing a null pointer is undefined behavior. It can crash, not crash, or set your wallpaper to a picture of SpongeBob Squarepants.
That said, dereferencing a null pointer will usually result in a crash. So your problem is probably memory corruption-related, e.g. from writing past the end of one of your strings. This can cause a delayed-effect crash. I'm particularly suspicious because it's highly unlikely that malloc(1) will fail unless your program is butting up against the end of its available virtual memory, and you would probably notice if that were the case.
Edit: OP pointed out that it isn't result that is null but information->kind.name->data. Here's a potential issue then:
There is no check for whether information->kind.name->data is null. The only check on that is
if (information->kind.name->data[information->kind.name->length] != '\0') {
Let's assume that information->kind.name->data is null, but information->kind.name->length is, say, 100. Then this statement is equivalent to:
if (*(information->kind.name->data + 100) != '\0') {
Which does not dereference NULL but rather dereferences address 100. If this does not crash, and address 100 happens to contain 0, then this test will pass.
It is possible that the structure is located in memory that has been free()'d, or the heap is corrupted. In that case, malloc() could be modifying the memory, thinking that it is free.
You might try running your program under a memory checker. One memory checker that supports Mac OS X is valgrind, although it supports Mac OS X only on Intel, not on PowerPC.
The effect of dereferencing the null pointer is undefined by standard as far as I know.
According to C Standard 6.5.3.2/4:
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
So there could be crash or could be not.
You may be experiencing stack corruption. The line of code you are refering to may not be being executed at all.
My theory is that information->kind.name->length is a very large value so that information->kind.name->data[information->kind.name->length] is actually referring to a valid memory address.
The act of dereferencing a NULL pointer is undefined by the standard. It is not guaranteed to crash and often times won't unless you actually try and write to the memory.
As an FYI, when I see this line:
if (information->kind.name->data[information->kind.name->length] != '\0') {
I see up to three different pointer dereferences:
information
name
data (if it's a pointer and not a fixed array)
You check information for non-null, but not name and not data. What makes you so sure that they're correct?
I also echo other sentiments here about something else possibly damaging your heap earlier. If you're running on windows, consider using gflags to do things like page allocation, which can be used to detect if you or someone else is writing past the end of a buffer and stepping on your heap.
Saw that you're on a Mac - ignore the gflags comment - it might help someone else who reads this. If you're running on something earlier than OS X, there are a number of handy Macsbugs tools to stress the heap (like the heap scramble command, 'hs').
I'm interested in the char* cast in the call to strlcpy.
Could the type data* be different in size than the char* on your system? If char pointers are smaller you could get a subset of the data pointer which could be NULL.
Example:
int a = 0xffff0000;
short b = (short) a; //b could be 0 if lower bits are used
Edit: Spelling mistakes corrected.
Here's one specific way you can get past the 'data' pointer being NULL in
if (information->kind.name->data[information->kind.name->length] != '\0') {
Say information->kind.name->length is large. Atleast larger than
4096, on a particular platform with a particular compiler (Say, most *nixes with a stock gcc compiler) the code will result in a memory read of "address of kind.name->data + information->kind.name->length].
At a lower level, that read is "read memory at address (0 + 8653)" (or whatever the length was).
It's common on *nixes to mark the first page in the address space as "not accessible", meaning dereferencing a NULL pointer that reads memory address 0 to 4096 will result in a hardware trap being propagated to the application and crash it.
Reading past that first page, you might happen to poke into valid mapped memory, e.g. a shared library or something else that happened to be mapped there - and the memory access will not fail. And that's ok. Dereferencing a NULL pointer is undefined behavior, nothing requires it to fail.
Missing '{' after last if statement means that something in the "// ... code above skipped, not relevant ..." section is controlling access to that entire fragment of code. Out of all the code pasted only the strlcpy is executed. Solution: never use if statements without curly brackets to clarify control.
Consider this...
if(false)
{
if(something == stuff)
{
doStuff();
.. snip ..
if(monkey == blah)
some->garbage= nothing;
return -1;
}
}
crash();
Only "crash();" gets executed.
I would run your program under valgrind. You already know there's a problem with NULL pointers, so profile that code.
The advantage that valgrind beings here is that it checks every single pointer reference and checks to see if that memory location has been previously declared, and it will tell you the line number, structure, and anything else you care to know about memory.
As every one else mentioned, referencing the 0 memory location is a "que sera, sera" kinda thing.
My C tinged spidey sense is telling me that you should break out those structure walks on the
if (information->kind.name->data[information->kind.name->length] != '\0') {
line like
if (information == NULL) {
return -1;
}
if (information->kind == NULL) {
return -1;
}
and so on.
Wow, thats strange. One thing does look slightly suspicious to me, though it may not contribute:
What would happen if information and data were good pointers (non null), but information.kind.name was null. You don't dereference this pointer until the strlcpy line, so if it was null, it might not crash until then. Of course, earlier than t hat you do dereference data[1] to set it to \0, which should also crash, but due to whatever fluke, your program may just happen to have write access to 0x01 but not 0x00.
Also, I see you use information->name.length in one place but information->kind.name.length in another, not sure if thats a typo or if thats desired.
Despite the fact that dereferencing a null pointer leads to undefined behaviour and not necessarily to a crash, you should check the value of information->kind.name->data and not the contents of information->kind.name->data[1].
char * p = NULL;
p[i] is like
p += i;
which is a valid operation, even on a nullpointer. it then points at memory location 0x0000[...]i
You should always check whether information->kind.name->data is null anyway, but in this case
in
if (*result == NULL)
freeParsedData(information);
return -1;
}
you have missed a {
it should be
if (*result == NULL)
{
freeParsedData(information);
return -1;
}
This is a good reason for this coding style, instead of
if (*result == NULL) {
freeParsedData(information);
return -1;
}
where you might not spot the missing brace because you are used to the shape of the code block without the brace separating it from the if clause.
*result = malloc(realLength); // ???
Address of newly allocated memory segment is stored at the location referenced by the address contained in the variable "result".
Is this the intent? If so, the strlcpy may need modification.
As per my understanding, the special case of this problem is invalid access resulting with an attempt to read or write, using a Null pointer. Here the detection of the problem is very much hardware dependent. On some platforms, accessing memory for read or write using in NULL pointer will result in an exception.

Resources