C variable being set to 0 after passing to function - c

So I am running into an odd problem. I have a pointer to a struct, and I am passing it to a function. Once inside that function, one of the variables I am interested in appears to be 0, but before passing the struct to the function I check to make sure that specific variable is not 0. The weird thing is this does not happen every time, only once in a while. Has anyone ever seen something like this happen before?
Source Code:
if( expand->num == 0)
return status;
status = decode( expand );
...
Status_type decode ( expand_type * expand )
{
if(expand->cur_num >= expand->num) // Here is where my error occurs
// 'num' is 0.
{
// Do stuff
}
}

Sounds like a "dangling pointer" problem: You have a rogue pointer that is pointing to memory which does not belong to it (e.g., not an allocated instance of the type the pointer references), and somewhere the program instructs the pointer to use that "value", and the pointer is now overwriting that value (which is accidentally on the value you are inspecting).
Your symptoms match: Intermittent, memory overwrite when it appears there should be no memory overwrite.
Those can be very painful to track down, because you need to check the integrity of all your pointer operations, even the ones that have nothing to do with the code you are currently looking at (monitoring).
One thing to help you track it down would be to explicitly initialize all pointers to "null" on instantiation. This is generally not necessary, but can be helpful for debugging, because indirection-on-null will typically crash your program right there (which is what you want), rather than remain as a "hidden problem" as you access memory that is not yours (because pointers in C are by default not initialized to null). So, for example:
int* p; // Initialized to "garbage" memory address
...change to:
int* p = NULL; // Force initialization for debugging, crashes on indirection
On the bright side, when you suffer that enough, you have habits that make you very careful with your pointers. ;-)))

Three possibilities come to mind.
Corruption from another task.
Uninitialized pointer
Dangling pointer
Of these three possibilities, my money (based on the current information) is that of the uninitialized pointer.
My guess is that the value the uninitialized pointer has points just far enough down the stack that when decode() is called, the memory where 'expand->num' is stored is corrupted by either copying the parameter 'expand' onto the stack, or copying the return address to the stack, or by setting up the stack frame for decode().

Finally found the problem. There are multiple threads, and two threads were trying to access the same variable at the same time (the expand variable), causing it to be overwritten! So I had to stop one of the threads at a certain point so both wouldn't be accessing the same stuff.

Related

Why i have to assign NULL to pointer after freeing allocated memory? [duplicate]

In my company there is a coding rule that says, after freeing any memory, reset the variable to NULL. For example ...
void some_func ()
{
int *nPtr;
nPtr = malloc (100);
free (nPtr);
nPtr = NULL;
return;
}
I feel that, in cases like the code shown above, setting to NULL does not have any meaning. Or am I missing something?
If there is no meaning in such cases, I am going to take it up with the "quality team" to remove this coding rule. Please advice.
Setting unused pointers to NULL is a defensive style, protecting against dangling pointer bugs. If a dangling pointer is accessed after it is freed, you may read or overwrite random memory. If a null pointer is accessed, you get an immediate crash on most systems, telling you right away what the error is.
For local variables, it may be a little bit pointless if it is "obvious" that the pointer isn't accessed anymore after being freed, so this style is more appropriate for member data and global variables. Even for local variables, it may be a good approach if the function continues after the memory is released.
To complete the style, you should also initialize pointers to NULL before they get assigned a true pointer value.
Most of the responses have focused on preventing a double free, but setting the pointer to NULL has another benefit. Once you free a pointer, that memory is available to be reallocated by another call to malloc. If you still have the original pointer around you might end up with a bug where you attempt to use the pointer after free and corrupt some other variable, and then your program enters an unknown state and all kinds of bad things can happen (crash if you're lucky, data corruption if you're unlucky). If you had set the pointer to NULL after free, any attempt to read/write through that pointer later would result in a segfault, which is generally preferable to random memory corruption.
For both reasons, it can be a good idea to set the pointer to NULL after free(). It's not always necessary, though. For example, if the pointer variable goes out of scope immediately after free(), there's not much reason to set it to NULL.
Setting a pointer to NULL after free is a dubious practice that is often popularized as a "good programming" rule on a patently false premise. It is one of those fake truths that belong to the "sounds right" category but in reality achieve absolutely nothing useful (and sometimes leads to negative consequences).
Allegedly, setting a pointer to NULL after free is supposed to prevent the dreaded "double free" problem when the same pointer value is passed to free more than once. In reality though, in 9 cases out of 10 the real "double free" problem occurs when different pointer objects holding the same pointer value are used as arguments for free. Needless to say, setting a pointer to NULL after free achieves absolutely nothing to prevent the problem in such cases.
Of course, it is possible to run into "double free" problem when using the same pointer object as an argument to free. However, in reality situations like that normally indicate a problem with the general logical structure of the code, not a mere accidental "double free". A proper way to deal with the problem in such cases is to review and rethink the structure of the code in order to avoid the situation when the same pointer is passed to free more than once. In such cases setting the pointer to NULL and considering the problem "fixed" is nothing more than an attempt to sweep the problem under the carpet. It simply won't work in general case, because the problem with the code structure will always find another way to manifest itself.
Finally, if your code is specifically designed to rely on the pointer value being NULL or not NULL, it is perfectly fine to set the pointer value to NULL after free. But as a general "good practice" rule (as in "always set your pointer to NULL after free") it is, once again, a well-known and pretty useless fake, often followed by some for purely religious, voodoo-like reasons.
This is considered good practice to avoid overwriting memory. In the above function, it is unnecessary, but oftentimes when it is done it can find application errors.
Try something like this instead:
#if DEBUG_VERSION
void myfree(void **ptr)
{
free(*ptr);
*ptr = NULL;
}
#else
#define myfree(p) do { void ** p_tmp = (p); free(*(p_tmp)); *(p_tmp) = NULL; } while (0)
#endif
The DEBUG_VERSION lets you profile frees in debugging code, but both are functionally the same.
Edit: Added do ... while as suggested below, thanks.
If you reach pointer that has been free()d, it might break or not. That memory might be reallocated to another part of your program and then you get memory corruption,
If you set the pointer to NULL, then if you access it, the program always crashes with a segfault. No more ,,sometimes it works'', no more ,,crashes in unpredictible way''. It's way easier to debug.
Setting the pointer to the free'd memory means that any attempt to access that memory through the pointer will immediately crash, instead of causing undefined behavior. It makes it much easier to determine where things went wrong.
I can see your argument: since nPtr is going out of scope right after nPtr = NULL, there doesn't seem to be a reason to set it to NULL. However, in the case of a struct member or somewhere else where the pointer is not immediately going out of scope, it makes more sense. It's not immediately apparent whether or not that pointer will be used again by code that shouldn't be using it.
It's likely the rule is stated without making a distinction between these two cases, because it's much more difficult to automatically enforce the rule, let alone for the developers to follow it. It doesn't hurt to set pointers to NULL after every free, but it has the potential of pointing out big problems.
the most common bug in c is the double free. Basically you do something like that
free(foobar);
/* lot of code */
free(foobar);
and it end up pretty bad, the OS try to free some already freed memory and generally it segfault. So the good practice is to set to NULL, so you can make test and check if you really need to free this memory
if(foobar != null){
free(foobar);
}
also to be noted that free(NULL) won't do anything so you don't have to write the if statement. I am not really an OS guru but I am pretty even now most OSes would crash on double free.
That's also a main reason why all languages with garbage collection (Java, dotnet) was so proud of not having this problem and also not having to leave to developers the memory management as a whole.
The idea behind this, is to stop accidental reuse of the freed pointer.
Recently I come across the same question after I was looking for the answer. I reached this conclusion:
It is best practice, and one must follow this to make it portable on all (embedded) systems.
free() is a library function, which varies as one changes the platform, so you should not expect that after passing pointer to this function and after freeing memory, this pointer will be set to NULL. This may not be the case for some library implemented for the platform.
so always go for
free(ptr);
ptr = NULL;
This (can) actually be important. Although you free the memory, a later part of the program could allocate something new that happens to land in the space. Your old pointer would now point to a valid chunk of memory. It is then possible that someone would use the pointer, resulting in invalid program state.
If you NULL out the pointer, then any attempt to use it is going to dereference 0x0 and crash right there, which is easy to debug. Random pointers pointing to random memory is hard to debug. It's obviously not necessary but then that's why it's in a best practices document.
From the ANSI C standard:
void free(void *ptr);
The free function causes the space
pointed to by ptr to be deallocated,
that is, made available for further
allocation. If ptr is a null pointer,
no action occurs. Otherwise, if the
argument does not match a pointer
earlier returned by the calloc ,
malloc , or realloc function, or if
the space has been deallocated by a
call to free or realloc , the behavior
is undefined.
"the undefined behavior" is almost always a program crash. So as to avoid this it is safe to reset the pointer to NULL. free() itself cannot do this as it is passed only a pointer, not a pointer to a pointer. You can also write a safer version of free() that NULLs the pointer:
void safe_free(void** ptr)
{
free(*ptr);
*ptr = NULL;
}
I find this to be little help as in my experience when people access a freed memory allocation it's almost always because they have another pointer to it somewhere. And then it conflicts with another personal coding standard which is "Avoid useless clutter", so I don't do it as I think it rarely helps and makes the code slightly less readable.
However - I won't set the variable to null if the pointer isn't supposed to be used again, but often the higher level design gives me a reason to set it to null anyway. For example if the pointer is a member of a class and I've deleted what it points to then the "contract" if you like of the class is that that member will point to something valid at any time so it must be set to null for that reason. A small distinction but I think an important one.
In c++ it's important to always be thinking who owns this data when you allocate some memory (unless you are using smart pointers but even then some thought is required). And this process tends to lead to pointers generally being a member of some class and generally you want a class to be in a valid state at all times, and the easiest way to do that is to set the member variable to NULL to indicate it points to nothing now.
A common pattern is to set all the member pointers to NULL in the constructor and have the destructor call delete on any pointers to data that your design says that class owns. Clearly in this case you have to set the pointer to NULL when you delete something to indicate that you don't own any data before.
So to summarise, yes i often set the pointer to NULL after deleting something, but it's as part of a larger design and thoughts on who owns the data rather than due to blindly following a coding standard rule. I wouldn't do so in your example as I think there is no benefit to doing so and it adds "clutter" which in my experience is just as responsible for bugs and bad code as this kind of thing.
It is always advisable to declare a pointer variable with NULL such as,
int *ptr = NULL;
Let's say, ptr is pointing to 0x1000 memory address.
After using free(ptr), it's always advisable to nullify the pointer variable by declaring again to NULL.
e.g.:
free(ptr);
ptr = NULL;
If not re-declared to NULL, the pointer variable still keeps on pointing to the same address (0x1000), this pointer variable is called a dangling pointer.
If you define another pointer variable (let's say, q) and dynamically allocate address to the new pointer, there is a chance of taking the same address (0x1000) by new pointer variable. If in case, you use the same pointer (ptr) and update the value at the address pointed by the same pointer (ptr), then the program will end up writing a value to the place where q is pointing (since p and q are pointing to the same address (0x1000)).
e.g.
*ptr = 20; //Points to 0x1000
free(ptr);
int *q = (int *)malloc(sizeof(int) * 2); //Points to 0x1000
*ptr = 30; //Since ptr and q are pointing to the same address, so the value of the address to which q is pointing would also change.
This rule is useful when you're trying to avoid the following scenarios:
1) You have a really long function with complicated logic and memory management and you don't want to accidentally reuse the pointer to deleted memory later in the function.
2) The pointer is a member variable of a class that has fairly complex behavior and you don't want to accidentally reuse the pointer to deleted memory in other functions.
In your scenario, it doesn't make a whole lot of sense, but if the function were to get longer, it might matter.
You may argue that setting it to NULL may actually mask logic errors later on, or in the case where you assume it is valid, you still crash on NULL, so it doesn't matter.
In general, I would advise you to set it to NULL when you think it is a good idea, and not bother when you think it isn't worth it. Focus instead on writing short functions and well designed classes.
This might be more an argument to initialize all pointers to NULL, but something like this can be a very sneaky bug:
void other_func() {
int *p; // forgot to initialize
// some unrelated mallocs and stuff
// ...
if (p) {
*p = 1; // hm...
}
}
void caller() {
some_func();
other_func();
}
p ends up in the same place on the stack as the former nPtr, so it might still contain a seemingly valid pointer. Assigning to *p might overwrite all kinds of unrelated things and lead to ugly bugs. Especially if the compiler initializes local variables with zero in debug mode but doesn't once optimizations are turned on. So the debug builds don't show any signs of the bug while release builds blow up randomly...
Set the pointer that has just been freed to NULL is not mandatory but a good practice. In this way , you can avoid 1) using a freed pointed 2)free it towice
There are two reasons:
Avoid crashes when double-freeing
Written by RageZ in a duplicate question.
The most common bug in c is the double
free. Basically you do something like
that
free(foobar);
/* lot of code */
free(foobar);
and it end up pretty bad, the OS try
to free some already freed memory and
generally it segfault. So the good
practice is to set to NULL, so you
can make test and check if you really
need to free this memory
if(foobar != NULL){
free(foobar);
}
also to be noted that free(NULL)
won't do anything so you don't have to
write the if statement. I am not
really an OS guru but I am pretty even
now most OSes would crash on double
free.
That's also a main reason why all
languages with garbage collection
(Java, dotnet) was so proud of not
having this problem and also not
having to leave to developer the
memory management as a whole.
Avoid using already freed pointers
Written by Martin v. Löwis in a another answer.
Setting unused pointers to NULL is a
defensive style, protecting against
dangling pointer bugs. If a dangling
pointer is accessed after it is freed,
you may read or overwrite random
memory. If a null pointer is accessed,
you get an immediate crash on most
systems, telling you right away what
the error is.
For local variables, it may be a
little bit pointless if it is
"obvious" that the pointer isn't
accessed anymore after being freed, so
this style is more appropriate for
member data and global variables. Even
for local variables, it may be a good
approach if the function continues
after the memory is released.
To complete the style, you should also
initialize pointers to NULL before
they get assigned a true pointer
value.
To add to what other have said, one good method of pointer usage is to always check whether it is a valid pointer or not. Something like:
if(ptr)
ptr->CallSomeMethod();
Explicitly marking the pointer as NULL after freeing it allows for this kind of usage in C/C++.
Settings a pointer to NULL is to protect agains so-called double-free - a situation when free() is called more than once for the same address without reallocating the block at that address.
Double-free leads to undefined behaviour - usually heap corruption or immediately crashing the program. Calling free() for a NULL pointer does nothing and is therefore guaranteed to be safe.
So the best practice unless you now for sure that the pointer leaves scope immediately or very soon after free() is to set that pointer to NULL so that even if free() is called again it is now called for a NULL pointer and undefined behaviour is evaded.
The idea is that if you try to dereference the no-longer-valid pointer after freeing it, you want to fail hard (segfault) rather than silently and mysteriously.
But... be careful. Not all systems cause a segfault if you dereference NULL. On (at least some versions of) AIX, *(int *)0 == 0, and Solaris has optional compatibility with this AIX "feature."
To the original question:
Setting the pointer to NULL directly after freeing the contents is a complete waste of time, provided the code meets all requirements, is fully debugged and will never be modified again. On the other hand, defensively NULLing a pointer that has been freed can be quite useful when someone thoughtlessly adds a new block of code beneath the free(), when the design of the original module isn't correct, and in the case of it-compiles-but-doesn't-do-what-I-want bugs.
In any system, there is an unobtainable goal of making it easiest to the right thing, and the irreducible cost of inaccurate measurements. In C we're offered a set of very sharp, very strong tools, which can create many things in the hands of a skilled worker, and inflict all sorts of metaphoric injuries when handled improperly. Some are hard to understand or use correctly. And people, being naturally risk averse, do irrational things like checking a pointer for NULL value before calling free with it…
The measurement problem is that whenever you attempt to divide good from less good, the more complex the case, the more likely you get an ambiguous measurement. If the goal is do keep only good practices, then some ambiguous ones get tossed out with the actually not good. IF your goal is to eliminate the not good, then the ambiguities may stay with the good. The two goals, keep only good or eliminate clearly bad, would seem to be diametrically opposed, but there is usually a third group that's neither one nor the other, some of both.
Before you make a case with the quality department, try looking through the bug data base to see how often, if ever, invalid pointer values caused problems that had to be written down. If you want to make real difference, identify the most common problem in your production code and propose three ways to prevent it
As you have a quality assurance team in place, let me add a minor point about QA. Some automated QA tools for C will flag assignments to freed pointers as "useless assignment to ptr". For example PC-lint/FlexeLint from Gimpel Software says
tst.c 8 Warning 438: Last value assigned to variable 'nPtr' (defined at line 5) not used
There are ways to selectively suppress messages, so you can still satisfy both QA requirements, should your team decide so.
Long story short: You do not want to accidentally (by mistake) access the address that you have freed. Because, when you free the address, you allow that address in the heap to be allocated to some other application.
However, if you do not set the pointer to NULL, and by mistake try to de-reference the pointer, or change the value of that address; YOU CAN STILL DO IT. BUT NOT SOMETHING THAT YOU WOULD LOGICALLY WANT TO DO.
Why can I still access the memory location that I have freed? Because: You may have free the memory, but the pointer variable still had information about the heap memory address. So, as a defensive strategy, please set it to NULL.

C get value from pointer to array of structs [duplicate]

I have the following code.
#include <iostream>
int * foo()
{
int a = 5;
return &a;
}
int main()
{
int* p = foo();
std::cout << *p;
*p = 8;
std::cout << *p;
}
And the code is just running with no runtime exceptions!
The output was 58
How can it be? Isn't the memory of a local variable inaccessible outside its function?
How can it be? Isn't the memory of a local variable inaccessible outside its function?
You rent a hotel room. You put a book in the top drawer of the bedside table and go to sleep. You check out the next morning, but "forget" to give back your key. You steal the key!
A week later, you return to the hotel, do not check in, sneak into your old room with your stolen key, and look in the drawer. Your book is still there. Astonishing!
How can that be? Aren't the contents of a hotel room drawer inaccessible if you haven't rented the room?
Well, obviously that scenario can happen in the real world no problem. There is no mysterious force that causes your book to disappear when you are no longer authorized to be in the room. Nor is there a mysterious force that prevents you from entering a room with a stolen key.
The hotel management is not required to remove your book. You didn't make a contract with them that said that if you leave stuff behind, they'll shred it for you. If you illegally re-enter your room with a stolen key to get it back, the hotel security staff is not required to catch you sneaking in. You didn't make a contract with them that said "if I try to sneak back into my room later, you are required to stop me." Rather, you signed a contract with them that said "I promise not to sneak back into my room later", a contract which you broke.
In this situation anything can happen. The book can be there—you got lucky. Someone else's book can be there and yours could be in the hotel's furnace. Someone could be there right when you come in, tearing your book to pieces. The hotel could have removed the table and book entirely and replaced it with a wardrobe. The entire hotel could be just about to be torn down and replaced with a football stadium, and you are going to die in an explosion while you are sneaking around.
You don't know what is going to happen; when you checked out of the hotel and stole a key to illegally use later, you gave up the right to live in a predictable, safe world because you chose to break the rules of the system.
C++ is not a safe language. It will cheerfully allow you to break the rules of the system. If you try to do something illegal and foolish like going back into a room you're not authorized to be in and rummaging through a desk that might not even be there anymore, C++ is not going to stop you. Safer languages than C++ solve this problem by restricting your power—by having much stricter control over keys, for example.
UPDATE
Holy goodness, this answer is getting a lot of attention. (I'm not sure why—I considered it to be just a "fun" little analogy, but whatever.)
I thought it might be germane to update this a bit with a few more technical thoughts.
Compilers are in the business of generating code which manages the storage of the data manipulated by that program. There are lots of different ways of generating code to manage memory, but over time two basic techniques have become entrenched.
The first is to have some sort of "long lived" storage area where the "lifetime" of each byte in the storage—that is, the period of time when it is validly associated with some program variable—cannot be easily predicted ahead of time. The compiler generates calls into a "heap manager" that knows how to dynamically allocate storage when it is needed and reclaim it when it is no longer needed.
The second method is to have a “short-lived” storage area where the lifetime of each byte is well known. Here, the lifetimes follow a “nesting” pattern. The longest-lived of these short-lived variables will be allocated before any other short-lived variables, and will be freed last. Shorter-lived variables will be allocated after the longest-lived ones, and will be freed before them. The lifetime of these shorter-lived variables is “nested” within the lifetime of longer-lived ones.
Local variables follow the latter pattern; when a method is entered, its local variables come alive. When that method calls another method, the new method's local variables come alive. They'll be dead before the first method's local variables are dead. The relative order of the beginnings and endings of lifetimes of storages associated with local variables can be worked out ahead of time.
For this reason, local variables are usually generated as storage on a "stack" data structure, because a stack has the property that the first thing pushed on it is going to be the last thing popped off.
It's like the hotel decides to only rent out rooms sequentially, and you can't check out until everyone with a room number higher than you has checked out.
So let's think about the stack. In many operating systems you get one stack per thread and the stack is allocated to be a certain fixed size. When you call a method, stuff is pushed onto the stack. If you then pass a pointer to the stack back out of your method, as the original poster does here, that's just a pointer to the middle of some entirely valid million-byte memory block. In our analogy, you check out of the hotel; when you do, you just checked out of the highest-numbered occupied room. If no one else checks in after you, and you go back to your room illegally, all your stuff is guaranteed to still be there in this particular hotel.
We use stacks for temporary stores because they are really cheap and easy. An implementation of C++ is not required to use a stack for storage of locals; it could use the heap. It doesn't, because that would make the program slower.
An implementation of C++ is not required to leave the garbage you left on the stack untouched so that you can come back for it later illegally; it is perfectly legal for the compiler to generate code that turns back to zero everything in the "room" that you just vacated. It doesn't because again, that would be expensive.
An implementation of C++ is not required to ensure that when the stack logically shrinks, the addresses that used to be valid are still mapped into memory. The implementation is allowed to tell the operating system "we're done using this page of stack now. Until I say otherwise, issue an exception that destroys the process if anyone touches the previously-valid stack page". Again, implementations do not actually do that because it is slow and unnecessary.
Instead, implementations let you make mistakes and get away with it. Most of the time. Until one day something truly awful goes wrong and the process explodes.
This is problematic. There are a lot of rules and it is very easy to break them accidentally. I certainly have many times. And worse, the problem often only surfaces when memory is detected to be corrupt billions of nanoseconds after the corruption happened, when it is very hard to figure out who messed it up.
More memory-safe languages solve this problem by restricting your power. In "normal" C# there simply is no way to take the address of a local and return it or store it for later. You can take the address of a local, but the language is cleverly designed so that it is impossible to use it after the lifetime of the local ends. In order to take the address of a local and pass it back, you have to put the compiler in a special "unsafe" mode, and put the word "unsafe" in your program, to call attention to the fact that you are probably doing something dangerous that could be breaking the rules.
For further reading:
What if C# did allow returning references? Coincidentally that is the subject of today's blog post:
Ref returns and ref locals
Why do we use stacks to manage memory? Are value types in C# always stored on the stack? How does virtual memory work? And many more topics in how the C# memory manager works. Many of these articles are also germane to C++ programmers:
Memory management
You're are simply reading and writing to memory that used to be the address of a. Now that you're outside of foo, it's just a pointer to some random memory area. It just so happens that in your example, that memory area does exist and nothing else is using it at the moment.
You don't break anything by continuing to use it, and nothing else has overwritten it yet. Therefore, the 5 is still there. In a real program, that memory would be reused almost immediately and you'd break something by doing this (though the symptoms may not appear until much later!).
When you return from foo, you tell the OS that you're no longer using that memory and it can be reassigned to something else. If you're lucky and it never does get reassigned, and the OS doesn't catch you using it again, then you'll get away with the lie. Chances are though you'll end up writing over whatever else ends up with that address.
Now if you're wondering why the compiler doesn't complain, it's probably because foo got eliminated by optimization. It usually will warn you about this sort of thing. C assumes you know what you're doing though, and technically you haven't violated scope here (there's no reference to a itself outside of foo), only memory access rules, which only triggers a warning rather than an error.
In short: this won't usually work, but sometimes will by chance.
Because the storage space wasn't stomped on just yet. Don't count on that behavior.
A little addition to all the answers:
If you do something like this:
#include <stdio.h>
#include <stdlib.h>
int * foo(){
int a = 5;
return &a;
}
void boo(){
int a = 7;
}
int main(){
int * p = foo();
boo();
printf("%d\n", *p);
}
The output probably will be: 7
That is because after returning from foo() the stack is freed and then reused by boo().
If you disassemble the executable, you will see it clearly.
In C++, you can access any address, but it doesn't mean you should. The address you are accessing is no longer valid. It works because nothing else scrambled the memory after foo returned, but it could crash under many circumstances. Try analyzing your program with Valgrind, or even just compiling it optimized, and see...
You never throw a C++ exception by accessing invalid memory. You are just giving an example of the general idea of referencing an arbitrary memory location. I could do the same like this:
unsigned int q = 123456;
*(double*)(q) = 1.2;
Here I am simply treating 123456 as the address of a double and write to it. Any number of things could happen:
q might in fact genuinely be a valid address of a double, e.g. double p; q = &p;.
q might point somewhere inside allocated memory and I just overwrite 8 bytes in there.
q points outside allocated memory and the operating system's memory manager sends a segmentation fault signal to my program, causing the runtime to terminate it.
You win the lottery.
The way you set it up it is a bit more reasonable that the returned address points into a valid area of memory, as it will probably just be a little further down the stack, but it is still an invalid location that you cannot access in a deterministic fashion.
Nobody will automatically check the semantic validity of memory addresses like that for you during normal program execution. However, a memory debugger such as Valgrind will happily do this, so you should run your program through it and witness the errors.
Did you compile your program with the optimiser enabled? The foo() function is quite simple and might have been inlined or replaced in the resulting code.
But I agree with Mark B that the resulting behavior is undefined.
Your problem has nothing to do with scope. In the code you show, the function main does not see the names in the function foo, so you can't access a in foo directly with this name outside foo.
The problem you are having is why the program doesn't signal an error when referencing illegal memory. This is because C++ standards does not specify a very clear boundary between illegal memory and legal memory. Referencing something in popped out stack sometimes causes error and sometimes not. It depends. Don't count on this behavior. Assume it will always result in error when you program, but assume it will never signal error when you debug.
Pay attention to all warnings. Do not only solve errors.
GCC shows this warning:
warning: address of local variable 'a' returned
This is the power of C++. You should care about memory. With the -Werror flag, this warning became an error and now you have to debug it.
It works because the stack has not been altered (yet) since a was put there.
Call a few other functions (which are also calling other functions) before accessing a again and you will probably not be so lucky anymore... ;-)
You are just returning a memory address. It's allowed, but it's probably an error.
Yes, if you try to dereference that memory address you will have undefined behavior.
int * ref () {
int tmp = 100;
return &tmp;
}
int main () {
int * a = ref();
// Up until this point there is defined results
// You can even print the address returned
// but yes probably a bug
cout << *a << endl;//Undefined results
}
This behavior is undefined, as Alex pointed out. In fact, most compilers will warn against doing this, because it's an easy way to get crashes.
For an example of the kind of spooky behavior you are likely to get, try this sample:
int *a()
{
int x = 5;
return &x;
}
void b( int *c )
{
int y = 29;
*c = 123;
cout << "y=" << y << endl;
}
int main()
{
b( a() );
return 0;
}
This prints out "y=123", but your results may vary (really!). Your pointer is clobbering other, unrelated local variables.
That's classic undefined behaviour that's been discussed here not two days ago -- search around the site for a bit. In a nutshell, you were lucky, but anything could have happened and your code is making invalid access to memory.
You actually invoked undefined behaviour.
Returning the address of a temporary works, but as temporaries are destroyed at the end of a function the results of accessing them will be undefined.
So you did not modify a but rather the memory location where a once was. This difference is very similar to the difference between crashing and not crashing.
In typical compiler implementations, you can think of the code as "print out the value of the memory block with adress that used to be occupied by a". Also, if you add a new function invocation to a function that constains a local int it's a good chance that the value of a (or the memory address that a used to point to) changes. This happens because the stack will be overwritten with a new frame containing different data.
However, this is undefined behaviour and you should not rely on it to work!
It can, because a is a variable allocated temporarily for the lifetime of its scope (foo function). After you return from foo the memory is free and can be overwritten.
What you're doing is described as undefined behavior. The result cannot be predicted.
The things with correct (?) console output can change dramatically if you use ::printf but not cout.
You can play around with debugger within below code (tested on x86, 32-bit, Visual Studio):
char* foo()
{
char buf[10];
::strcpy(buf, "TEST");
return buf;
}
int main()
{
char* s = foo(); // Place breakpoint and the check 's' variable here
::printf("%s\n", s);
}
It's the 'dirty' way of using memory addresses. When you return an address (pointer) you don't know whether it belongs to local scope of a function. It's just an address.
Now that you invoked the 'foo' function, that address (memory location) of 'a' was already allocated there in the (safely, for now at least) addressable memory of your application (process).
After the 'foo' function returned, the address of 'a' can be considered 'dirty', but it's there, not cleaned up, nor disturbed/modified by expressions in other part of program (in this specific case at least).
A C/C++ compiler doesn't stop you from such 'dirty' access (it might warn you though, if you care). You can safely use (update) any memory location that is in the data segment of your program instance (process) unless you protect the address by some means.
After returning from a function, all identifiers are destroyed instead of kept values in a memory location and we can not locate the values without having an identifier. But that location still contains the value stored by previous function.
So, here function foo() is returning the address of a and a is destroyed after returning its address. And you can access the modified value through that returned address.
Let me take a real world example:
Suppose a man hides money at a location and tells you the location. After some time, the man who had told you the money location dies. But still you have the access of that hidden money.
Your code is very risky. You are creating a local variable (which is considered destroyed after function ends) and you return the address of memory of that variable after it is destroyed.
That means the memory address could be valid or not, and your code will be vulnerable to possible memory address issues (for example, a segmentation fault).
This means that you are doing a very bad thing, because you are passing a memory address to a pointer which is not trustable at all.
Consider this example, instead, and test it:
int * foo()
{
int *x = new int;
*x = 5;
return x;
}
int main()
{
int* p = foo();
std::cout << *p << "\n"; // Better to put a newline in the output, IMO
*p = 8;
std::cout << *p;
delete p;
return 0;
}
Unlike your example, with this example you are:
allocating memory for an int into a local function
that memory address is still valid also when function expires (it is not deleted by anyone)
the memory address is trustable (that memory block is not considered free, so it will be not overridden until it is deleted)
the memory address should be deleted when not used. (see the delete at the end of the program)

How does C manage stack with pointers?

So I was messing up with dynamic memory and pointers, and I was wondering how C was managing the stack when it comes to pointers that points to local variables.
I came out with this simple function :
int* dummy(){
int test = 4;
int *t2;
t2 = &test;
return t2;
}
This function initialize a pointer, and an int as a local variable (should not be accessible outside of my function, as the stack state will be restored once I get out of the function). However, I am returning the pointer as the result of my function.
I can get the pointer back and print the value of my local variable with :
#include <stdio.h>
int main(void){
int* p = dummy();
// some other calls to other functions to mess up the bellow stack,
// where my local variable "test" was supposed to be landing
printf("%d\n", *p); // printing the value of "test" (which is 4)
}
Result
$ ./a.out
4
Why is this printing the correct result? Isn't the pointer pointing at a variable in a stack from an other state? I am confused.
If the memory stays somewhere without dynamic allocation, where does it stay? Is it lost forever? (no way to "free" it)
EDIT after the comments
The behavior is undefined. Adding compiler options for warning such as pedantic will print a warning that I am returning a pointer pointing at a local variable, and the executable gets bugus.
The reason for this is that dummy's stack state get lost when the program exists the function, thus not assuring the value of local variables, because they are... local.
One of the possible outcomes of undefined behavior is - behaving as expected.
Once dummy returns, test no longer exists - logically speaking. However, the region of the stack it occupied may not be immediately overwritten, so that value may persist in that (virtual) location for some time afterwards.
The pointer is invalid - we’re using it to access an object outside of that object’s lifetime - so the behavior is definitely undefined. But that doesn’t mean that the value must be something other than 4.
Undefined behavior is undefined. There is no value this program can print that is, in any sense, "correct". What's confusing you is that you think there is some "correct" value this program can print and therefore you wonder why it's printing the "correct" value.
The problem is entirely in your incorrect understanding that some value is more "correct" than some other value. All values are equally "correct" for this program.
I was wondering how C was managing the stack when it comes to pointers that points to local variables
It does not. C does not even require a stack. What you are doing is undefined behavior which means that the C standard does not impose ANY requirements on the compiler.
So the answer to your question has nothing to do with the C standard. It can be boiled down to "how it's usually handled". But when you do stuff like it is, as I mentioned, undefined behavior. So this kind of code is likely to be messed up as soon as you turn on any compiler optimizations.
It's possible to make educated guesses about the behavior, but you have no guarantees.
Actually, it's kind of like looking into your neighbours window and be surprised that it's the same neighbour you've had for years. In the exact same way, you have no guarantees that your neighbour has not suddenly moved and someone else have moved in. Well, in your case, this happened. Your neighbour have not moved out yet.

Basic C pointer allocation/deallocation

Writing code in C, never formally learned any of it, using GNU's GSL library, quick fundamental question.
Correct me if I'm wrong, but the way I understand it, when I allocate memory to use for my matrices (using the built-in var = gsl_matrix_alloc(x,x)) and store them in a variable, I'm essentially creating
a pointer, which is simply some address of memory, like:
x01234749162
which POINTS to the first pointer/memory location of my GSL matrix. Tracking when to deallocate the memory of the structure associated with a pointer (again, built-in gsl_matrix_free(x,x,x)) is no problem, and I understand I need to do this before I reassign the pointer of the structure, otherwise I've created a memory leak.
So now to my question, and again, I know this is basic, but hear me out--I couldn't find a particularly straight answer on stackoverflow, mostly because a lot of the answers involve C++ rather than C--how do I free the pointer to the structure itself?
Everyone says "oh, just set it to NULL". Why would that work? That's just changing the memory address that POINTS to the deallocated structure. Does that tell the MMU that that memory location is okay to use now? When I debug my program in XCode, for example, all of the properties of the gsl_matrix structure is successfully deallocated; everything just becomes this garbage string of random hex characters, which is what free'ing memory is supposed to do. But, I can still see the variable name (pointer) while stepping through the debugger... even if I set the variable to NULL. I would interpret that as meaning I did not free the pointer, I just freed the structure and set it to x0000000 (NULL).
Am I doing everything correct, and this is just a feature of XCode, or am I missing something basic?
And I realize that a single pointer to a structure, if the structure is deallocated, could be argued to not be a big deal, but it matters.
Here's some code to try to illustrate my thoughts.
gsl_matrix* my_matrix;
// create single memory address in memory, not pointing to anything yet
my_matrix = gsl_matrix_alloc(5, 5);
// allocates 25 memory spaces for the values that the pointer held by my_matrix
// points too
// Note: so, now there's 26 memory spots allocated to the matrix, excluding other
// properties created along with the my-matrix structure, right?
gsl_matrix_free(my_matrix); // deallocates those 25 spaces the structure had,
// along with other properties that may have been automatically created
free(my_matrix); // SIGBRT error. Is the pointer to the deallocated structure
// still using that one memory address?
my_matrix = NULL; // this doesn't make sense to me.I get that any future referral
// to the my_matrix pointer will just return garbage, and so setting a pointer to
// that can help in debugging, but can the pointer--that is just one memory
// address--be completely deallocated such that in the debugger the variable name
// disappears?
What you miss here is the knowledge of how "local variables" work at the machine level and the concept of "the stack".
The stack is a block of free memory that is allocated for your program when it starts. Suppose, for the sake of a simple example, that your program is allocated a stack of size 1MB. The stack is accompanied with a special register, called "stack pointer", which initially points to the end of the stack (don't ask why not the beginning, historical reasons). Here's how it looks:
[---------- stack memory, all yours for taking ------------]
^
|
Stack pointer
Now suppose your program defines a bunch of variables in the main function, i.e. something like
int main() {
int x;
What this means is that when the main function is invoked at the start of your program, the compiler will generate the following instructions:
sp = sp - 4; // Decrement stack pointer
x_address = sp;
and remember (for purposes of further compilation) that x is now a 4-byte integer located at memory position x_address. Your stack now looks as follows:
[---------- stack memory, all yours for taking --------[-x--]
^
|
Stack pointer
Next, suppose you invoke some function f from within the main. Suppose f defines inside it another variable,
int f() {
char z[8];
Guess what happens now? Before entering f the compiler will perform:
sp = sp - 8;
z_address = sp;
I.e. you'll get:
[---------- stack memory, all yours for taking -[--z----][-x--]
^
|
Stack pointer
If you now invoke another function, the stack pointer will move deeper into the stack, "creating" more space for the local variables. Each time when you exit a function, though, the stack pointer is restored back to where it was before the function was invoked. E.g. after you exit f, your stack will be looking as follows:
[---------- stack memory, all yours for taking -[--z----][-x--]
^
|
Stack pointer
Note that the z array was not essentially freed, it is still there on the stack, but you do not care. Why don't you care? Because the whole stack is automatically deallocated when your application terminates. This is the reason why you do not need to manually deallocate "variables on the stack", i.e. those which are defined as local to your functions and modules. In particular, your my_matrix pointer is just a yet another variable like that.
PS: there is a bit more happening on the stack than I described. In particular, the stack pointer value is stored on the stack before decrementing it, so that it can be restored after exiting the function. In addition, function arguments are often passed by putting them onto the stack. In this sense they look like local variables for purposes of memory management and you don't need to free them.
PPS: In principle, the compiler is free to optimize your code (especially if you compile with the -O flag) and rather than allocate your local variables on the stack it may:
Decide to avoid allocating them at all (e.g. if they turn out to be useless)
Decide to allocate them temporarily in the registers (which are fixed memory slots in the processor that do not need to be freed). This is often done for loop variables (the ones within for (int i = ...)).
.. and well, do whatever else comes to his twisted mind as long as the result does not contradict the semantics.
PPPS: Now you are prepared to learn how buffer overflow works. Go read about it, really, it is an amazing trick. Oh, and, once you're at it, check out the meaning of stack overflow. ;)
Why would 26 memory spots be allocated for a 5x5 matrix? I'd say trust the library-provided gsl_matrix_free function to do the right thing and deallocate the whole structure.
In general, you only need to call free if you called malloc or calloc. Library functions that provide an allocator usually provide a matching deallocator so that you don't have to keep track of the internals.
If the 26th spot you're worried about is the pointer itself (in other words, the memory needed to store the address of the matrix), that space is part of the stack frame for your function, and it is automatically popped when the function returns.
Everyone says "oh, just set it to NULL". Why would that work?
They probably mean that this would fix the problem where you are calling free on a pointer to some data that has already been de-allocated, which is what you are doing here:
gsl_matrix_free(my_matrix); // deallocate
free(my_matrix); // Mistake, BIG PROBLEM: my_matrix points to de-allocated data
It fixes the problem because calling free on a null-ptr is a no op:
gsl_matrix_free(my_matrix); // deallocate
my_matrix = NULL;
free(my_matrix); // Mistake, but no problem
Note: my_matrix itself has automatic storage, so there is no need to de-allocate it manually. Its memory will be reclaimed when it goes out of scope. The only thing that needs to be de-allocated is the memory that was dynamically allocated (and to which my_matrix points.)

Why are the contents pointed to by a pointer not changed when memory is deallocated using free()?

I am a newbie when it comes to dynamic memory allocation. When we free the memory using void free(void *ptr) the memory is deallocated but the contents of the pointer are not deleted. Why is that? Is there any difference in more recent C compilers?
Computers don't "delete" memory as such, they just stop using all references to that memory cell and forget that anything of value is stored there. For example:
int* func (void)
{
int x = 5;
return &x;
}
printf("%d", *func()); // undefined behavior
Once the function has finished, the program stops reserving the memory location where x is stored, any other part of the program (or perhaps another program) is free to use it. So the above code could print 5, or it could print garbage, or it could even crash the program: referencing the contents of a memory cell that has ceased to be valid is undefined behavior.
Dynamic memory is no exception to this and works in the same manner. Once you have called free(), the contents of that part of the memory can be used by anyone.
Also, see this question.
The thing is that accessing memory after it has been freed is undefined behavior. It's not only that the memory contents are undefined, accessing them could lead to anything. At least some compilers when you build a debug version of the code, actually do change the contents of the memory to aid in debugging, but in release versions it's generally unnecessary to do that, so the memory is just left as is, but anyway, that is not something you can safely rely upon, don't access freed memory, it's unsafe!
In C, parameters are passed by value. So free just can't change the value of ptr.
Any change it would make would only change the value within the free function, and won't affect the caller's variable.
Also, changing it won't be so much help. There can be multiple pointers pointing to the same piece of memory, and they should all be reset when freeing. The language can't keep track of them all, so it leaves the programmer to handle the pointers.
This is very normal, because clearing the memory location after free is an overhead and generally not necessary. If you have security concerns, you can wrap the free call within a function which clears the region before freeing. You'll also notice that this requires the knowledge of the allocation size, which is another overhead.
Actually the C programming language specifies that after the lifetime of the object, even the value of any pointer pointing to it becomes indeterminate, i.e. you can't even depend on the pointer to even retain the original value.
That is because a good compiler will try to aggressively store all the variables into the CPU registers instead of memory. So after it sees that the program flow calls a function named free with the argument ptr, it can mark the register of the ptr free for other use, until it has been assigned to again, for example ptr = malloc(42);.
In between these two it could be seen changing the value, or comparing inequal against its original value, or other similar behaviour. Here's an example of what might happen.

Resources