I need help from a real C guru to analyze a crash in my code. Not to fix the crash; I can easily fix it, but before doing so I'd like to understand how this crash is even possible, as it seems totally impossible to me.
This crash only happens on a customer machine and I cannot reproduce it locally (so I cannot step through the code with a debugger), as I cannot obtain a copy of this user's database. My company also won't allow me to just change a few lines in the code and make a custom build for this customer (so I cannot add a few printf lines and have him run the code again), and of course the customer has a build without debug symbols. In other words, my debugging abilities are very limited. Nonetheless I was able to nail down the crash and get some debugging information. However, when I look at that information and then at the code, I cannot understand how the program flow could ever reach the line in question. The code should have crashed long before getting to that line. I'm totally lost here.
Let's start with the relevant code. It's very little code:
// ... code above skipped, not relevant ...
if (data == NULL) return -1;
information = parseData(data);
if (information == NULL) return -1;

/* Check if name has been correctly \0 terminated */
if (information->kind.name->data[information->kind.name->length] != '\0') {
    freeParsedData(information);
    return -1;
}

/* Copy the name */
realLength = information->kind.name->length + 1;
*result = malloc(realLength);
if (*result == NULL) {
    freeParsedData(information);
    return -1;
}
strlcpy(*result, (char *)information->kind.name->data, realLength);
// ... code below skipped, not relevant ...
That's all of it. It crashes in strlcpy. I can even tell you how strlcpy is really called at runtime. strlcpy is actually called with the following parameters:
strlcpy ( 0x341000, 0x0, 0x1 );
Knowing this, it is rather obvious why strlcpy crashes. It tries to read one character from a NULL pointer, and that will of course crash. And since the last parameter has a value of 1, the original length must have been 0. My code clearly has a bug here: it fails to check whether the name data is NULL. I can fix this, no problem.
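For reference, the fix I have in mind is just an explicit NULL check on the data pointer before the termination test, something along these lines (a sketch only, reusing the field names from the snippet above):

/* Sketch of the intended fix: reject a NULL data pointer explicitly
 * before touching data[length]. */
if (information->kind.name->data == NULL) {
    freeParsedData(information);
    return -1;
}
if (information->kind.name->data[information->kind.name->length] != '\0') {
    freeParsedData(information);
    return -1;
}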
My question is:
How can this code ever get to the strlcpy in the first place?
Why does this code not crash at the if-statement?
I tried it locally on my machine:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *nullString = malloc(10);
    free(nullString);
    nullString = NULL;
    if (nullString[0] != '\0') {
        printf("Not terminated\n");
        exit(1);
    }
    printf("Can get past the if-clause\n");
    char xxx[10];
    strlcpy(xxx, nullString, 1);
    return 0;
}
This code never gets past the if statement. It crashes in the if statement, and that is definitely expected.
So can anyone think of any reason why the first code can get past that if-statement without crashing if name->data is really NULL? This is totally mysterious to me. It doesn't seem deterministic.
Important extra information:
The code between the two comments is really complete, nothing has been left out. Furthermore, the application is single-threaded, so there is no other thread that could unexpectedly alter any memory in the background. The platform where this happens is a PPC CPU (a G4, in case that could play any role). And in case someone wonders about "kind.": "information" contains a union named "kind", and name is again a struct (kind is a union, and every possible union value is a different type of struct); but this all shouldn't really matter here.
I'm grateful for any idea here. I'm even more grateful if it's not just a theory, but if there is a way I can verify that this theory really holds true for the customer.
Solution
I accepted the right answer already, but just in case anyone finds this question on Google, here's what really happened:
The pointers were pointing to memory that had already been freed. Freeing memory won't zero it all out or cause the process to give it back to the system at once. So even though the memory had been erroneously freed, it still contained the correct values. The pointer in question is not NULL at the time the "if check" is performed.
After that check I allocate some new memory by calling malloc. I'm not sure exactly what malloc does here, but every call to malloc or free can have far-reaching consequences for all the dynamic memory in the virtual address space of a process. After the malloc call, the pointer is in fact NULL. Somehow malloc (or some system call malloc uses) zeroes the already freed memory where the pointer itself is located (not the data it points to; the pointer itself lives in dynamic memory). With that memory zeroed, the pointer now has a value of 0x0, which is equal to NULL on my system, and when strlcpy is called, it will of course crash.
So the real bug causing this strange behavior was at a completely different location in my code. Never forget: freed memory keeps its values, but for how long is beyond your control. To check whether your app has a memory bug that accesses already freed memory, just make sure the freed memory is always zeroed when it is freed. In OS X you can do this by setting an environment variable at runtime (no need to recompile anything). Of course this slows down the program quite a bit, but you will catch those bugs much earlier.
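To make the failure mode concrete, here is a minimal sketch of the pattern described above (struct and function names are hypothetical; since this is undefined behavior, nothing here is guaranteed to reproduce the exact crash):

#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the pointer itself lives inside a heap-allocated block. */
struct name {
    char   *data;
    size_t  length;
};

int copy_name(struct name *n, char **result)
{
    /* Assume a bug elsewhere already called free() on the block n points to.
     * The freed block often still holds its old contents, so this check can
     * still pass; reading it is undefined behavior, not a guaranteed crash. */
    if (n->data[n->length] != '\0')
        return -1;

    /* This malloc may reuse or overwrite the freed block; afterwards n->data
     * can suddenly read as NULL (or anything else). */
    *result = malloc(n->length + 1);
    if (*result == NULL)
        return -1;

    strlcpy(*result, n->data, n->length + 1);  /* the crash observed above */
    return 0;
}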
First, dereferencing a null pointer is undefined behavior. It can crash, not crash, or set your wallpaper to a picture of SpongeBob Squarepants.
That said, dereferencing a null pointer will usually result in a crash. So your problem is probably memory corruption-related, e.g. from writing past the end of one of your strings. This can cause a delayed-effect crash. I'm particularly suspicious because it's highly unlikely that malloc(1) will fail unless your program is butting up against the end of its available virtual memory, and you would probably notice if that were the case.
Edit: OP pointed out that it isn't result that is null but information->kind.name->data. Here's a potential issue then:
There is no check for whether information->kind.name->data is null. The only check on that is
if (information->kind.name->data[information->kind.name->length] != '\0') {
Let's assume that information->kind.name->data is null, but information->kind.name->length is, say, 100. Then this statement is equivalent to:
if (*(information->kind.name->data + 100) != '\0') {
Which does not dereference NULL but rather dereferences address 100. If this does not crash, and address 100 happens to contain 0, then this test will pass.
It is possible that the structure is located in memory that has been free()'d, or the heap is corrupted. In that case, malloc() could be modifying the memory, thinking that it is free.
You might try running your program under a memory checker. One memory checker that supports Mac OS X is valgrind, although it supports Mac OS X only on Intel, not on PowerPC.
The effect of dereferencing a null pointer is undefined by the standard, as far as I know.
According to C Standard 6.5.3.2/4:
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
So it may crash, or it may not.
You may be experiencing stack corruption. The line of code you are referring to may not be executed at all.
My theory is that information->kind.name->length is a very large value so that information->kind.name->data[information->kind.name->length] is actually referring to a valid memory address.
The act of dereferencing a NULL pointer is undefined by the standard. It is not guaranteed to crash and often won't unless you actually try to write to the memory.
As an FYI, when I see this line:
if (information->kind.name->data[information->kind.name->length] != '\0') {
I see up to three different pointer dereferences:
information
name
data (if it's a pointer and not a fixed array)
You check information for non-null, but not name and not data. What makes you so sure that they're correct?
I also echo other sentiments here about something else possibly damaging your heap earlier. If you're running on Windows, consider using gflags to do things like page allocation, which can be used to detect whether you or someone else is writing past the end of a buffer and stepping on your heap.
Saw that you're on a Mac - ignore the gflags comment - it might help someone else who reads this. If you're running on something earlier than OS X, there are a number of handy MacsBug tools to stress the heap (like the heap scramble command, 'hs').
I'm interested in the char* cast in the call to strlcpy.
Could the data* type be a different size than char* on your system? If char pointers are smaller, you could end up with a truncated subset of the data pointer, which could be NULL.
Example:
int a = 0xffff0000;
short b = (short) a; //b could be 0 if lower bits are used
Edit: Spelling mistakes corrected.
Here's one specific way you can get past the 'data' pointer being NULL in
if (information->kind.name->data[information->kind.name->length] != '\0') {
Say information->kind.name->length is large, at least larger than 4096. On a particular platform with a particular compiler (say, most *nixes with a stock gcc compiler), the code will result in a memory read at the address information->kind.name->data + information->kind.name->length.
At a lower level, that read is "read memory at address (0 + 8653)" (or whatever the length was).
It's common on *nixes to mark the first page in the address space as "not accessible", meaning dereferencing a NULL pointer that reads memory address 0 to 4096 will result in a hardware trap being propagated to the application and crash it.
Reading past that first page, you might happen to poke into valid mapped memory, e.g. a shared library or something else that happened to be mapped there - and the memory access will not fail. And that's ok. Dereferencing a NULL pointer is undefined behavior, nothing requires it to fail.
The missing '{' after the last if statement means that something in the "// ... code above skipped, not relevant ..." section is controlling access to that entire fragment of code. Out of all the code pasted, only the strlcpy is executed. Solution: never use if statements without curly brackets; always use braces to clarify control flow.
Consider this...
if(false)
{
    if(something == stuff)
    {
        doStuff();
        .. snip ..
        if(monkey == blah)
            some->garbage = nothing;
        return -1;
    }
}
crash();
Only "crash();" gets executed.
I would run your program under valgrind. You already know there's a problem with NULL pointers, so profile that code.
The advantage that valgrind brings here is that it checks every single pointer reference to see whether that memory location has been previously allocated, and it will tell you the line number, structure, and anything else you care to know about the memory.
As everyone else mentioned, referencing the 0 memory location is a "que sera, sera" kind of thing.
My C-tinged spidey sense tells me that you should break out those structure walks on the
if (information->kind.name->data[information->kind.name->length] != '\0') {
line like
if (information == NULL) {
return -1;
}
if (information->kind == NULL) {
return -1;
}
and so on.
Wow, that's strange. One thing does look slightly suspicious to me, though it may not contribute:
What would happen if information and data were good pointers (non-null), but information.kind.name was null? You don't dereference this pointer until the strlcpy line, so if it was null, it might not crash until then. Of course, earlier than that you do dereference data[1] to set it to \0, which should also crash, but due to whatever fluke, your program may just happen to have write access to 0x01 but not 0x00.
Also, I see you use information->name.length in one place but information->kind.name.length in another; not sure if that's a typo or if that's desired.
Despite the fact that dereferencing a null pointer leads to undefined behaviour and not necessarily to a crash, you should check the value of information->kind.name->data and not the contents of information->kind.name->data[1].
char * p = NULL;
p[i] is like
*(p + i);
and the address computation p + i doesn't fault by itself, even on a null pointer; the access then reads memory location 0x0000[...]i rather than address 0.
You should always check whether information->kind.name->data is null anyway; but in this case, in
if (*result == NULL)
freeParsedData(information);
return -1;
}
you have missed a {
it should be
if (*result == NULL)
{
freeParsedData(information);
return -1;
}
This is a good reason for this coding style, instead of
if (*result == NULL) {
freeParsedData(information);
return -1;
}
where you might not spot the missing brace because you are used to the shape of the code block without the brace separating it from the if clause.
*result = malloc(realLength); // ???
The address of the newly allocated memory segment is stored at the location referenced by the address contained in the variable "result".
Is this the intent? If so, the strlcpy may need modification.
As per my understanding, the special case of this problem is an invalid access resulting from an attempt to read or write through a NULL pointer. Detection of the problem here is very much hardware dependent: on some platforms, accessing memory for read or write through a NULL pointer will result in an exception.
In case you don't want to (or cannot) initialize a pointer with an address, I often hear people say that you should initialize it with NULL, and that it's a good practice.
You can find people saying something like that on SO as well, for example here.
Having worked on many C projects, I don't think it is a good practice, or at least not somehow better than not initializing the pointer at all.
One of my biggest reasons is: initializing a pointer with NULL increases the chance of a null pointer dereference, which may crash the whole software, and that's terrible.
So, could you tell me what the reasons are if you say that it is a good practice, or do people just take it for granted (just like "you should always initialize a variable")?
Note that I tried to find this in MISRA 2004 and did not find any rule or recommendation for it.
Update:
So most of the comments and answers give the main reason that the pointer could hold a random address before being used, so it's better to have it as NULL so you can figure out the problem faster.
To make my point clearer: I think this doesn't make sense nowadays in commercial C software, from a practical point of view. An unassigned pointer that gets used will be detected right away by most static analyzers; that's why I prefer to leave it uninitialized, because if I initialize it with NULL, then when developers forget to assign it a "real" address, it passes the static analyzer and causes runtime problems with a null pointer.
You said
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
I would argue the main reason is actually exactly this. If you don't init pointers to NULL, then if there is a dereferencing error it's going to be a lot harder to find the problem, because the pointer is not going to be set to NULL; it's most likely going to be a garbage value that may look exactly like a valid pointer.
C has very little runtime error checking, but NULL is guaranteed not to refer to a valid address, so a runtime environment (typically an operating system) is able to trap any attempt to de-reference a NULL. The trap will identify the point at which the de-reference occurs rather than the point where the program may eventually fail, making identification of the bug far easier.
Moreover, when debugging, an uninitialised pointer with random content may not be easily distinguishable from a valid pointer; it may refer to a plausible address, whereas NULL is always an invalid address.
If you de-reference an uninitialised pointer the result is non-deterministic - it may crash, it may not, but it will still be wrong.
If it does crash you cannot tell how or even when, since it may result in corruption of data, or reading of invalid data that may have no effect until that corrupted data is later used. The point of failure will not necessarily be the point of error.
So the purpose is that you will get deterministic failure, whereas without initialising, anything could happen - including nothing, leaving you with a latent undetected bug.
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
Deterministic failure is not "terrible"; it increases your chance of finding the error during development, rather than having your users find the error after deployment. What you are effectively suggesting is that it is better to leave the bugs in and hide them. The dereference on NULL is guaranteed to be trapped; de-referencing an uninitialised pointer is not.
That said, initialising with NULL should only be done if at the point of declaration you cannot directly assign an otherwise valid value. That is to say, for example:
char* x = malloc( y ) ;
is much preferable to:
char* x = NULL ;
...
x = malloc( y ) ;
which is in turn preferable to:
char* x ;
...
x = malloc( y ) ;
Note that, I tried to find in Misra 2004 and also did not find any
rule or recommendation for that.
MISRA C:2004, 9.1 - All automatic variables shall have been assigned a value before being used.
That is to say, there is no guideline to initialise to NULL, simply that initialisation is required. As I said, initialisation to NULL is not preferable to initialising with a valid pointer. Don't blindly follow the "must initialise to NULL" advice, because the rule is simply "must initialise", and sometimes the appropriate initialisation value is NULL.
If you don't initialize the pointer, it can have any value, including possibly NULL. It's hard to imagine a scenario where having any value including NULL is preferable to definitely having a NULL value. The logic is that it's better to at least know its value than have its value unpredictably depend on who knows what, possibly resulting in the code behaving differently on different platforms, with different compilers, and so on.
I strongly disagree with any answer or argument based on the idea that you can reliably use a test for NULL to tell if a pointer is valid or not. You can set a pointer to NULL and then test it for NULL within a limited context where that is known to be safe. But there will always be contexts where more than one pointer points to the same thing and you cannot ensure that every possible pointer to an object will be set to NULL at the very place the object is freed. It is simply an essential C programming discipline to understand that a pointer may or may not point to a valid object depending on what is going on in the code.
One issue is that given a pointer variable p, there is no way defined by the C language to ask, "does this pointer point to valid memory or not?" The pointer might point to valid memory. It might point to memory that (once upon a time) was allocated by malloc, but that has since been freed (meaning that the pointer is invalid). It might be an uninitialized pointer, meaning that it's not even meaningful to ask where it points (although it is definitely invalid). But, again, there's no way to know.
So if you're bopping along in some far-off corner of a large program, and you want to say
if(p is valid) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
you can't do this. Once again, the C language gives you no way of writing if(p is valid).
So that's where the rule to always initialize pointers to NULL comes in. If you adopt this rule and follow it faithfully, if you initialize every pointer to NULL or as a pointer to valid memory, if whenever you call free(p) you always immediately follow it with p = NULL;, then if you follow all these rules, you can achieve a decent way of asking "is p valid?", namely:
if(p != NULL) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
And of course it's very common to use an abbreviation:
if(p) {
do something with p;
} else {
fprintf(stderr, "invalid pointer!\n");
}
Here most people would read if(p) as "if p is valid" or "if p is allocated".
Addendum: This answer has attracted some criticism, and I suppose that's because, to make a point, I wrote some unrealistic code which some people are reading more into than I'd intended. The idiom I'm advocating here is not so much valid pointers versus invalid pointers, but rather, pointers I have allocated versus pointers I have not allocated (yet). No one writes code that simply detects and prints "invalid pointer!" as if to say "I don't know where this pointer points; it might be uninitialized or stale". The more realistic way of using the idiom is to do something like
/* ensure allocation before proceeding */
if(p == NULL)
p = malloc(...);
or
if(p == NULL) {
/* nothing to do */
return;
}
or
if(p == NULL) {
fprintf(stderr, "null pointer detected\n");
exit(0);
}
(And in all three cases the abbreviation if(!p) is popular as well.)
But, of course, if what you're trying to discriminate is pointers I have allocated versus pointers I have not allocated (yet), it is vital that you initialize all your un-allocated pointers with the explicit marker you're using to record that they're un-allocated, namely NULL.
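A minimal sketch of that discipline (names are hypothetical): unallocated pointers are always NULL, and every free() is immediately followed by resetting the pointer.

#include <stdlib.h>

static char *buffer = NULL;      /* "not allocated yet" is spelled NULL */

/* Allocate on first use; returns 0 on success, -1 on failure. */
static int ensure_buffer(size_t size)
{
    if (buffer == NULL) {
        buffer = malloc(size);
        if (buffer == NULL)
            return -1;
    }
    return 0;
}

/* Release and immediately mark the pointer as unallocated again,
 * so later if (buffer) checks stay honest. */
static void release_buffer(void)
{
    free(buffer);
    buffer = NULL;
}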
One of my biggest reason is: init a pointer with NULL increase the chance of null pointer derefence which may crash the whole software, and it's terrible.
Which is why you add a check against NULL before using that pointer value:
if ( p ) // p != NULL
{
// do something with p
}
NULL is a well-defined invalid pointer value, guaranteed to compare unequal to any object or function pointer value. It's a well-defined "nowhere" that's easy to check against.
Compare that to the indeterminate value that the uninitialized pointer [1] may have - most likely, it will also be an invalid pointer value that will lead to a crash as soon as you try to use it, but it's almost impossible to determine that beforehand. Is 0xfff78567abcd2220 a valid or invalid pointer value? How would you check that?
Obviously, you should do some analysis to see if an initialization is required. Is there a risk of that pointer being dereferenced before you assign a valid pointer value to it? If not, then you don't need to initialize it beforehand.
Since C99, the proper answer has been to defer instantiating a pointer (or any other type of object, really) until you have a valid value to initialize it with:
void foo( void )
{
    printf( "Gimme a length: " );
    int length;
    scanf( "%d", &length );
    char *buf = malloc( sizeof *buf * length );
    ...
}
ETA
I added a comment to Steve's answer that I think needs to be emphasized:
There's no way to determine if a pointer is valid - if you receive a pointer argument in a function like
void foo( int *ptr )
{
...
}
there is no test you can run on ptr to indicate that yes, it definitely points to an object within that object's lifetime and is safe to use.
By contrast, there is an easy, standard test to indicate that a pointer is definitely invalid and unsafe to use, and that's by checking that its value is NULL. So you can at least avoid using pointers that are definitely invalid with the
if ( p )
{
// do something with p
}
idiom.
Now, just because p isn't NULL doesn't automatically mean it's valid, but if you're consistent and disciplined about setting unused pointers to NULL, then the odds are pretty high that it is.
This is one of those areas where C doesn't protect you, and you have to devote non-trivial amounts of effort to make sure your code is safe and robust. Frankly, it's a pain in the ass more often than not. But being disciplined with using NULL for inactive pointers makes things a little easier.
Again, you have to do some analysis and think about how pointers are being used in your code. If you know you're going to set a pointer to a valid value before it's ever read, then it's not critical to initialize it to anything in particular. If you have multiple pointers pointing to the same object, then you need to make sure that if that object ever goes away, all those pointers are updated appropriately.
[1] This assumes that the pointer in question has auto storage duration - if the pointer is declared with the static keyword or at file scope, then it is implicitly initialized to NULL.
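A tiny sketch of that note (names are hypothetical; the printed values are whatever the implementation uses to represent the pointers):

#include <stdio.h>

int *file_scope_ptr;              /* static storage duration: implicitly NULL */

int main(void)
{
    static int *static_local;     /* static storage duration: also implicitly NULL */
    int *auto_local;              /* automatic storage: indeterminate, do not read */
    int value = 42;

    auto_local = &value;          /* only meaningful once it has been assigned */

    printf("%p %p %p\n",
           (void *)file_scope_ptr, /* null pointer */
           (void *)static_local,   /* null pointer */
           (void *)auto_local);    /* address of value */
    return 0;
}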
While switching from Linux back to Windows, I noticed that my code stopped working. Using the trusty debugger, I found that structs were being initialised differently.
typedef struct base {
    struct item *first;
} BASE;

typedef BASE *SPACE;
...
hashmap = malloc(sizeof(SPACE *) * length);
hashSpaceSize = length;
Look at this code for example (hid extra code to keep it tidy, also ignore struct item it's not useful here). Let's say that length is 3. In Linux, when I check the debugger, I see that:
hashmap[0] = NULL;
hashmap[1] = NULL;
hashmap[2] = NULL;
Because I did not initialise the BASEs, I only initialised the fact that there is an array of them. However, in Windows, I see that all of the BASES are initialised. Not only that, but all of the ITEMs within the BASEs are initialised as well.
However, if I, for example, immediately afterwards add this line:
hashmap[0]->first = NULL, I end up with a SIGSEGV error that I can't find the cause of. In Linux, this is because hashmap[0] is NULL, and hence hashmap[0]->first can't even be accessed in the first place. But on Windows, it clearly shows that hashmap[0] exists and has an initialised first value.
I don't know what is going on here, and I can't find anything regarding this bug. If more code is needed, everything is on my github. Linked to the actual file this code is in. But for now, I'm confused as to what's going on...
UPDATE: Apparently I had some looking up to do. I didn't know that malloc returns uninitialized memory rather than memory set to NULL; those NULLs just happened to be what Linux gave me. Thanks though, learnt something new today.
Let's say that length is 3. In Linux, when I check the debugger, I see
that:
hashmap[0] = NULL;
hashmap[1] = NULL;
hashmap[2] = NULL;
Because I did not initialise the BASEs, I only initialised the fact
that there is an array of them.
No. You get all of those being NULL because that happens to be what you get. C does not specify the initial contents of the memory returned by malloc(), and if you performed that allocation under other circumstances then you might not get all NULLs.
However, in Windows, I see that all of
the BASES are initialised. Not only that, but all of the ITEMs within
the BASEs are initialised as well.
They may have non-NULL values, but that's very different from being initialized. The values are very likely to be wild pointers. If they happen to point to accessible memory then you can interpret the data where they point as ITEMs, but again, that does not mean they are initialized, or that it is safe to access that memory. You are delving into undefined behavior here.
However, if I, for example, immediately afterwards add this line:
hashmap[0]->first = NULL, I end up with a SIGSEGV error that I can't
find the cause of.
We can't speak to the cause of your segmentation fault because you have not presented the code responsible, but having an array of pointers does not mean the pointer values within are valid. If they are not, then dereferencing them produces undefined behavior, which often manifests as a segfault. Note well that this does not depend on those pointers being NULL; it can attend accessing any pointer value that does not point to an object belonging to your program and having compatible type.
In Linux, this is because hashmap[0] is NULL, and
hence hashmap[0]->first can't even be accessed in the first place. But
on Windows, it clearly shows that hashmap[0] exists and has an
initialised first value.
No, it doesn't. Again, your debugger shows hashmap[0] having a non-NULL value, which is not at all the same thing.
It is your responsibility to avoid dereferencing invalid pointer values, which are by no means limited to NULL.
The values of the bytes pointed to after a successful call to malloc are uninitialized. That means they can be set to any arbitrary value, including zero. So just because the bytes are either zero or non-zero doesn't mean they are initialized.
Section 7.22.3.4 of the C standard regarding malloc states:
1
#include <stdlib.h>
void *malloc(size_t size);
2 The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.
So there are no guarantees what the memory returned by malloc will contain.
If on the other hand you use calloc, that function will initialize all allocated bytes to 0.
hashmap = calloc(length, sizeof(SPACE *));
realloc may return either the same input address or a different address. If it returns a different address, then internally it de-allocates/frees the input memory, moves the content to another location, and returns that new address.
Please consider the following case.
new_ptr = realloc(2000, 10000); // Let's assume the input address is 2000
// Let's assume the new_ptr address is 3000
So, internally realloc will free the memory at address 2000, move the data to the new location 3000, and return the address 3000.
Now the address 2000 is invalid, yet it is not assigned to NULL by the realloc API.
Now consider passing that invalid address to realloc again. In a real program there is a chance that realloc may get this invalid input address:
new_ptr = realloc(2000, 10000)
This address 2000 is invalid since it was already freed by the previous realloc. Now the program crashes.
Can I resolve this issue in the following way?
if (new_ptr != old_ptr ) {
old_ptr = NULL;
}
Since old_ptr is invalid, I shall assign it to NULL.
Please confirm whether this correction is right.
Think about your first sentence:
realloc may return either the same input address or a different address.
This implies you can just use the return value as your new pointer; you don't have to know whether it's different from your previous one or not. If it is different, realloc() has already handled freeing the previous block for you.
But there's one exception: realloc() may return 0 / NULL if the allocation fails. Only in this case, the old pointer is still valid. Therefore, the common idiom to use realloc() correctly looks like this:
T *x = malloc(x_size);
// check x for NULL
// [...]

T *tmp = realloc(x, new_size);
if (!tmp)
{
    free(x);
    // handle error, in many cases just exit(1) or similar
}
x = tmp; // use the new pointer, don't care whether it's the same
Note that using x (from my example above) after a successful call to realloc() is undefined, according to the C standard, x is invalid after the call. This doesn't tell you anything about the actual value of x. It just tells you "Don't use it, otherwise your program might do anything".
This self-quote might help you to understand what undefined behavior means:
Undefined behavior in C
C is a very low-level language, and one consequence of that is the following:
Nothing will ever stop you from doing something completely wrong.
Many languages, especially those for a managed environment like Java or C#, actually stop you when you do things that are not allowed, say, accessing an array element that does not exist. C doesn't. As long as your program is syntactically correct, the compiler won't complain. If you do something forbidden in your program, C just calls the behavior of your program undefined. This formally allows anything to happen when running the program. Often, the result will be a crash or just output of "garbage" values, as seen above. But if you're really unlucky, your program will seem to work just fine until it gets some slightly different input, and by that time, you will have a really hard time spotting where exactly your program is undefined. Therefore, avoid undefined behavior by all means!
On a side note, undefined behavior can also cause security holes. This
has happened a lot in practice.
Realloc will free the old memory block if it is successful (and has to move the data).
NOTE: if it can grow the block in place, it will do so; if it cannot, it will create a new memory block and free the old one.
If you have loop logic, or you use a pointer that points into the old memory block inside the function after the realloc is done, then yes, a crash can occur.
If the old pointer is used only for the realloc itself, then there is no need, since it is a local pointer you created whose scope is limited to that function. Every time you call the function it will be a new pointer variable.
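A small sketch of the hazard described in this answer (names are hypothetical): a pointer into the old block becomes stale once realloc moves the data.

#include <stdlib.h>
#include <string.h>

void grow_example(void)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return;
    strcpy(buf, "hello");

    char *greeting = buf;            /* points into the old block */

    char *tmp = realloc(buf, 4096);  /* may move the data elsewhere */
    if (tmp == NULL) {
        free(buf);
        return;
    }
    buf = tmp;

    /* WRONG: if realloc moved the block, 'greeting' now dangles into freed
     * memory; reading through it is undefined behavior. */
    /* puts(greeting); */

    /* RIGHT: recompute any interior pointers from the new base address. */
    greeting = buf;
    (void)greeting;

    free(buf);
}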
I am now studying C, and in some code examples I saw that after we allocate some memory to a pointer, we have to check that the pointer is not NULL. For example:
CVector *vector = malloc(sizeof(struct CVectorImplementation));
assert(vector != NULL);
another example:
vector->elements = realloc(vector->elements, vector->elemsz * vector->vec_capacity);
assert(vector->elements != NULL);
However, I think that since the pointer has already been allocated, it holds the address of the allocated memory as its value; so is this check always necessary, and why?
If you've reassigned the original pointer in response to realloc, it's too late to do anything useful in response to a failure. When realloc fails, it returns NULL, but it does not free the original pointer. So even if you have some reasonable response to an allocation failure (not common), you've already leaked the memory you were trying to realloc.
The answer to your main question is mostly "it's a bad idea to allow NULL pointer dereferences to occur because it's a source of vulnerabilities"; usually the vulnerabilities crop up in kernel code (where NULL is just as valid an address as anywhere else), but even when it's not exploitable, it means the program segfaults instead of reporting an error in a useful way.
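To illustrate the realloc point in the first paragraph, here is a small sketch (hypothetical function names) contrasting the leaky pattern with a safe one:

#include <stdlib.h>

/* Leaky: if realloc fails, the only pointer to the old block is overwritten
 * with NULL, so the old block can never be freed. */
int grow_leaky(int **arr, size_t new_count)
{
    *arr = realloc(*arr, new_count * sizeof **arr);
    return *arr ? 0 : -1;
}

/* Safe: keep the old pointer until the reallocation is known to have succeeded. */
int grow_safe(int **arr, size_t new_count)
{
    int *tmp = realloc(*arr, new_count * sizeof **arr);
    if (tmp == NULL)
        return -1;      /* *arr is still valid and can still be freed */
    *arr = tmp;
    return 0;
}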
It's a great idea to check the pointer returned from malloc/realloc.
If there's an error, you will get a null value returned. Use this check to your advantage, because if you reference the same pointer later in your program and your program suddenly crashes, then chances are the pointer is set to null.
If you do have a valid pointer from a malloc/realloc call, then make sure you pass it to free() before the program terminates, and before you decide to modify the pointer value; otherwise, you may run into memory leaks.
If you need to change the pointer value to write to a different section of the memory you allocated, then use another pointer.
Here's code in C that shows what I mean:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    char *block = calloc(1, 10000);
    if (block == NULL){
        printf("Can't allocate memory\n");
        return -1;
    }
    memset(block, 48, 20);  // set 1st 20 bytes of memory to number zero (ascii code 48)
    char *insideoftheblock = block + 10;  // I set a pointer to go to index #10 in the memory
    *insideoftheblock = 'x';
    insideoftheblock++;
    *insideoftheblock = 'y';
    printf("Memory = '%s'", block);
    free(block);
}
P.S.
I updated my code to include a check to see if memory has been actually allocated.
The realloc function attempts to allocate new memory. If this allocation fails then the realloc function returns NULL. Your code must deal with this situation.
If you want to abort your program in this case then the assert as you currently have it is suitable. If you want to recover, then you will need to store the realloc result in a separate variable while you assess the situation, e.g.:
void *new = realloc(vector->elements, vector->elemsz * vector->vec_capacity);
if ( !new )
{
    // take some action.... the old vector->elements is still valid
}
else
{
    vector->elements = new;
}
A failed allocation typically results in 1 of 2 actions:
1) Exit the program with a diagnostic. This is far better than not checking and letting the code continue on to who-knows-what.
2) In select circumstances, code can cope with the failure: maybe freeing other resources and trying again, returning a failure code and leaving the problem to the calling routine, or writing a "suicide note" and re-starting the system. IAC, the action is very specific to the situation.
Robust code checks the result. Beginner code does not.
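For completeness, a small sketch of option 2 (hypothetical names): report the failure to the caller instead of exiting.

#include <stdlib.h>

/* Returns 0 on success, -1 on allocation failure; the caller decides
 * whether to retry, degrade gracefully, or abort. */
int make_table(double **out, size_t count)
{
    double *table = malloc(count * sizeof *table);
    if (table == NULL)
        return -1;      /* propagate the failure upward */
    for (size_t i = 0; i < count; i++)
        table[i] = 0.0;
    *out = table;
    return 0;
}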
I've seen a lot of code that checks for NULL pointers whenever an allocation is made. This makes the code verbose, and if it's not done consistently, only when the programmer felt like it, doesn't even ensure that the program won't crash when the address space runs out. Besides, if the program can't make more allocations, it wouldn't be able to do its function anyway, right?
So my question is, isn't it better for most programs not to check at all and just let the program crash if memory runs out? At least the code is more readable that way.
Note
I'm talking about desktop apps that run on modern computers (at least 2 GB address space), and that most definitely don't operate space shuttles, life support systems, or BP's oil platforms. Most importantly I'm talking about programs that use malloc but never really go above 5 MB of memory usage.
Always check the return value, but for clarity, it's common to wrap malloc() in a function which never returns NULL:
#include <stdio.h>
#include <stdlib.h>

void *
emalloc(size_t amt){
    void *v = malloc(amt);
    if(!v){
        fprintf(stderr, "out of mem\n");
        exit(EXIT_FAILURE);
    }
    return v;
}
Then, later you can use
char *foo = emalloc(56);
foo[12] = 'A';
With no guilty conscience.
Yes, you should check for a null return value from malloc. Even if you can't recover from the failure of memory allocation you should explicitly exit. Carrying on as though memory allocation had succeeded leaves your application in an inconsistent state and is likely to cause "undefined behavior" which should be avoided.
For example, you may end up writing inconsistent data to external storage which may hinder the ability of the next run of the application to recover. It's much safer to exit swiftly in a more controlled fashion.
Many applications that want to exit on allocation failure wrap malloc in a function that checks the return value and explicitly aborts on failure.
Arguably, this is one advantage of the C++ default new approach to throw an exception on allocation failure. It requires no effort to exit on memory allocation failure.
Similar to Dave's approach above, but adds a macro that automatically passes
the file name and line number to our allocation routine so that we can report
that information in the event of a failure.
#include <stdio.h>
#include <stdlib.h>

#define ZMALLOC(theSize) zmalloc(__FILE__, __LINE__, theSize)

static void *zmalloc(const char *file, int line, int size)
{
    void *ptr = malloc(size);
    if(!ptr)
    {
        printf("Could not allocate: %d bytes (%s:%d)\n", size, file, line);
        exit(1);
    }
    return(ptr);
}

int main()
{
    /* -- Set 'forceFailure' to a non-zero value in order to observe
          how 'zmalloc' behaves when it cannot allocate the
          requested memory -- */
    int bytes = 10 * sizeof(int);
    int forceFailure = 0;
    int *anArray = NULL;

    if(forceFailure)
        bytes = -1;

    anArray = ZMALLOC(bytes);

    free(anArray);

    return(0);
}
But it is much more difficult to troubleshoot if you don't log where the malloc failed.
"Failed to allocate memory in line XX" is preferable to just crashing.
You should definitely check the return value of malloc. It is helpful in debugging, and the code becomes robust.
Always check malloc'ed memory?
In a hosted environment, error checking the return of malloc doesn't make much sense nowadays. Most machines have a virtual address space of 64 bits. You'd need a lot of time to exhaust that. Your program will most likely fail at a completely different place, namely when your physical+swap memory is exhausted. It will have shown completely ridiculous performance before that, because it was only swapping, and the user will have hit Ctrl-C long before you ever get there.
Segfaulting "nicely" on a null pointer reference would give a clear point to see where things fail in a debugger. But in my practice I have never seen a failed malloc as the cause.
When programming for embedded systems the picture changes completely. There you definitively should check for failed malloc.
Edit: To clarify that after the edit of the question. The kind of programs/systems described there are clearly not "embedded". I have never seen malloc fail under the circumstances described there.
I'd like to add that edge cases should always be checked even if you think they are safe or cannot lead to other issues than a crash. Null pointer dereference can potentially be exploited (http://uninformed.org/?v=4&a=5&t=sumry).