Pointer scopes in C

In the following code, the explanation for the failure to print anything is that the pointer returned by get_message() is out of scope:
char *get_message() {
char msg [] = "Aren’t pointers fun?";
return msg ;
}
int main (void) {
char *foo = get_message();
puts(foo);
return 0;
}
When run in gdb, it turns out that the data at the position of foo is the string "Aren't pointers fun?":
Old value = 0x0
New value = 0x7fffffffde60 "Aren’t pointers fun?"
This seems consistent with answers which state that the data for a pointer which passes out of scope remains in memory. But the documentation for puts states that data is first copied from the address given, presumably 0x7fffffffde60 in this case.
Therefore: why is nothing output?
EDIT: Thanks for your answers:
I ran the original code to completion in gdb, the call to puts does indeed change the data at the address where foo was stored.
(gdb) p foo
$1 = 0x7fffffffde60 "Aren’t pointers fun?"
(gdb) n
11 return 0;
(gdb) p foo
$2 = 0x7fffffffde60 "`\336\377\377\377\177"
Interestingly, the code did print the message when I changed the code for get_message() to:
char *get_message() {
char *msg = "Aren’t pointers fun?";
return msg ;
}
In this case, the data at foo (address 0x4005f4 - does the smaller size of the address mean anything?) remains the same throughout the code. It'd be cool to find out why this changes the behaviour.

The variable msg is allocated on the stack of get_message():
char msg [] = "Aren’t pointers fun?";
Once get_message() returns, the stack frame for that function is torn down. There is no guarantee at that point of what is in the memory that the pointer returned to foo now points to.
When puts() is called, the stack is likely modified, overwriting "Aren't pointers fun?".

It is likely that calling puts modifies the stack and overwrites the string.
Just returning from get_message leaves the string unchanged, but deallocated, i.e. its memory space is available for reuse.

The real question here is not, "why doesn't it work?". The question is, "Why does the string seem to exist even after the return from get_message, but then still not work?"
To clarify, let's look at the main function again, with two comments for reference:
int main (void) {
char *foo = get_message();
/* point A */
puts(foo);
/* point B */
return 0;
}
I just compiled and ran this under gdb. Indeed, at point A, when I printed out the value of the variable foo in gdb, gdb showed me that it pointed to the string "Aren’t pointers fun?". But then, puts failed to print that string. And then, at point B, if I again printed out foo in gdb, it was no longer the string it had been.
The explanation, as several earlier commenters have explained, is that function get_message leaves the string on the stack, where it's not guaranteed to stay for long. After get_message returns, and before anything else has been called, it's still there. But when we call puts, and puts begins working, it's using that same portion of the stack for its own local storage, so sometime in there (and before puts manages to actually print the string), the string gets destroyed.
In response to the OP's follow-on question: When we had
char *get_message() {
char msg [] = "Aren’t pointers fun?";
return msg ;
}
the string lives in the array msg which is on the stack, and we return a pointer to that array, which doesn't work because the data in the array eventually disappears. If we change it to
char * msg = "Aren’t pointers fun?";
(such a tiny-seeming change!), now the string is stored in the program's initialized data segment, and we return a pointer to that, and since it's in the program's initialized data segment, it sticks around essentially forever. (And yes, the fact that get_message ends up returning a different-looking address is significant, although I wouldn't read too much into whether it's lower or higher.)
The bottom line is that arrays and pointers are different. Hugely hugely different. The line
char arr[] = "Hello, world!";
bears almost no relation to the very similar-looking line
char *ptr = "Hello, world!";
Now, they're the same in that you can do both
printf("%s\n", arr);
and
printf("%s\n", ptr);
But if you try to say
arr = "Goodbye"; /* WRONG */
you can't, because you can't assign to an array. If you want a new string here, you have to use strcpy, and you have to make sure that the new string is the same length or shorter:
strcpy(arr, "Goodbye");
But if you try the strcpy thing with the pointer:
strcpy(ptr, "Goodbye"); /* WRONG */
now that doesn't work, because the string constant that ptr points to is nonwritable. In the pointer case, you can (and often must) use simple assignment:
ptr = "Goodbye";
and in this case there's no problem setting it to a longer string, too:
ptr = "Supercalafragalisticexpialadocious";
Those are the basic differences, but as this question points out, another big difference is that the array arr can't be usefully declared in and returned from a function (unless you make it static), while the pointer ptr can.
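As a small illustration of how different the two declarations are, here is a sketch using sizeof. The function names array_size and pointer_size are made up for this example. The array holds the characters themselves, while the pointer only holds an address:

```c
#include <stddef.h>

/* Hypothetical helper: size of an array initialised from a string literal. */
size_t array_size(void)
{
    char arr[] = "Hello, world!";   /* 13 characters plus the terminating '\0' */
    return sizeof arr;              /* size of the whole array: 14 bytes */
}

/* Hypothetical helper: size of a pointer to the same literal. */
size_t pointer_size(void)
{
    const char *ptr = "Hello, world!";
    return sizeof ptr;              /* size of the pointer itself, not of the string */
}
```

The first result depends on the length of the string; the second depends only on the platform's pointer size.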

The lifetime of msg ends when returning from the function get_message. The returned pointer points to the object whose lifetime has ended.
Accessing it yields undefined behaviour. Anything can happen.
In your case, the memory of the former msg seems to be overwritten with 0 already.
And this is not about "scope". You can fix your code by making msg static. This does not change the scope but its lifetime (a.k.a. storage duration).
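A minimal sketch of that fix, assuming nothing else about the program changes: the name msg is still only visible inside get_message() (same scope), but the array now has static storage duration, so it exists for the whole run of the program.

```c
/* Sketch of the static fix: only the storage duration of msg changes. */
char *get_message(void)
{
    static char msg[] = "Aren't pointers fun?";  /* lives until the program exits */
    return msg;                                  /* still valid after the return */
}
```

With this version, puts(get_message()) in main has a live object to read from. Every call returns the same address, because there is only one msg.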

In your get_message function, the memory used by your message is on the stack and not on the heap. It's still a pointer, just to a location on the stack. Once the function returns, the stack is altered (to pop the return address, etc.), which means that although the message MIGHT still be in the same location in memory, there is absolutely no guarantee. If anything else puts something onto the stack (such as another function call), then it will most likely be overwritten. Your message is gone.
A better approach would be to allocate the memory dynamically with malloc so that the string is on the heap (although this raises the question of who owns the pointer and is responsible for freeing it).
If you must do something like this, I have seen it done using static:
static char * message = "I love static pointers";
Edit: despite mentioning that it MIGHT still be on the stack, NEVER EVER ASSUME it is. Most languages won't even allow this.
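For the malloc route, a rough sketch might look like the following. The name get_message_heap is invented for this example, and the ownership rule (the caller frees the result) is an assumption you would have to document yourself:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of get_message(): returns a heap copy of the text.
   The caller owns the result and must pass it to free(). Returns NULL if
   the allocation fails. */
char *get_message_heap(void)
{
    const char *text = "Aren't pointers fun?";
    char *msg = malloc(strlen(text) + 1);   /* +1 for the terminating '\0' */
    if (msg != NULL)
        strcpy(msg, text);
    return msg;
}
```

The caller would check the result for NULL, use it (for example with puts), and then call free on it exactly once.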

Related

C null terminators throughout a char*: correct handling

This question is aimed at improving my understanding of
what I can and cannot do with pointers when allocating and freeing:
The code below is not meant to run; it just sets up a situation for the questions below.
char *var1 = calloc(8,sizeof(char));
char **var2 = calloc(3,sizeof(char*));
var1 = "01234567";
var1[2] = '\0';
var1[5] = '\0';
//var1 = [0][1][\0][3][4][\0][6][7]
var2[0] = var1[0];
var2[1] = var1[3];
var2[2] = var1[6];
free(var1);
free(var2);
Given the snippet above:
1: Is it OK to write to a location after the \0 if you know the size you allocated?
2: Can I do what I did with var2, if it points to a block that another pointer is pointing at?
3: Are the calls to free OK? Or will free die due to the \0 located throughout var1?
I printed out all the variables after free, and only the ones up to the first null got freed (changed to null or other weird and normal looking characters). Is that OK?
4: Any other stuff you wish to point out that is completely wrong and should be avoided.
Thank you very much.
OK, let's just recap what you have done here:
char *var1 = calloc(8,sizeof(char));
char **var2 = calloc(3,sizeof(char*));
So var1 is (a pointer to) a block of 8 chars, all set to zero \0.
And var2 is (a pointer to) a block of 3 pointers, all set to NULL.
So now it's the program's memory, it can do whatever it wants with it.
To answer your questions specifically ~
It's quite normal to write characters around inside your char block. It's a common programming pattern to parse string buffers by writing a \0 after a section of text to use everyday C string operations on it, but then point to the next character after the added \0 and continue parsing.
var2 is simply a bunch of char-pointers, it can point to whatever char is necessary, it doesn't necessarily have to be at the beginning of the string.
The calls to free() are somewhat OK (except for the bug - see below). It's normal for the content of free()d blocks to be overwritten when they are returned to the heap, so they often seem to have "rubbish" characters in them if printed out afterwards.
There are some issues with the assignment of var1 ~
var1 = "01234567";
Here you are saying "var1 now points to this constant string". Your compiler may have generated a warning about this. Firstly, the code assigns a const char* to a char* (hard-coded strings are const, but C compilers will only warn about this [EDIT: this is true for C++, not C, see comment from n.m.]). Secondly, the code lost all references to the block of memory that var1 used to point to. You can now never free() this memory - it has leaked. Moreover, at the end of the program, free() is trying to operate on a pointer to a block of memory (the "01234567") which was not allocated on the heap. This is BAD. Since you're exiting immediately, there are no ill effects, but if this was in the middle of execution, the next allocation (or the 1000th after it!) could crash weirdly. These sorts of problems are hard to debug.
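Probably what you should have done here (I'm guessing your intention though) is use a string copy:
strncpy(var1, "01234567", 8);
With that operation instead of the assignment, everything is OK. This is because the digits are stored in the memory allocated on the first line. (Note that an 8-byte block leaves no room for a terminating \0 after the last digit; allocate 9 bytes if the whole buffer must also be a valid C string.)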
Question 4 - what's wrong
You 'calloc' some memory and store a pointer to it in var1. Then later you execute var1 = "01234567" which stores a pointer to a literal string in var1, thus losing the calloc'd memory. I imagine you thought you were copying a string. Use strcpy or similar.
Then you write zero values into what var1 points to. Since that's a literal string, it may fail if the literal is in read-only memory. The result is undefined.
free(var1) is not going to go well with a pointer to a literal. Your code may fail or you may get heap corruption.
Pointers don't work this way.
If someone wrote
int a = 6*9;
a = 42;
you would wonder why they ever bothered to initialise a to 6*9 in the first place — and you would be right. There's no reason to. The value returned by * is simply forgotten without being used. It could be never calculated in the first place and no one would know the difference. This is exactly equivalent to
int a = 42;
Now when pointers are involved, there's some kind of evil neural pathway in our brain that tries to tell us that a sequence of statements that is exactly like the one shown above is somehow working differently. Don't trust your brain. It isn't.
char *var1 = calloc(8,sizeof(char));
var1 = "01234567";
You would wonder why they ever bothered to initialise var1 to calloc(8,sizeof(char)); in the first place — and you would be right. There's no reason to. The value returned by calloc is simply forgotten without being used. It could be never calculated in the first place and no one would know the difference. This is exactly equivalent to
char* var1 = "01234567";
... which is a problem, because you cannot modify string literals.
What you probably want is
char *var1 = calloc(8, 1); // note sizeof(char)==1, always
strncpy (var1, "01234567", 8); // note not strcpy — you would need 9 bytes for it
or some variation of that.
var1 = "01234567"; is not correct: it assigns a pointer to an unmodifiable string literal to a pointer to mutable char, and it causes a memory leak, because the pointer to the calloc-allocated buffer of 8 chars that was stored in var1 is lost. It seems you actually intended to initialize the allocated array with the contents of the string literal instead (though that would require allocating an array of 9 items). The assignment var1[2] = '\0'; causes undefined behavior because the location var1 points to is not mutable. var2[0] = var1[0]; is wrong as well, because it assigns a char value to a pointer to char. Finally, free(var1); will try to deallocate a pointer to the buffer backing the string literal, not something you allocated.
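Putting the corrections from the answers together, one possible version of the snippet is sketched below. The wrapper function split_demo and its out-parameters are invented for this example so the pieces can be inspected by a caller. The buffer is 9 bytes so that the last piece is also terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper around the corrected snippet. On success it returns 0
   and hands both allocations to the caller, who must free() each of them. */
int split_demo(char **out_buf, char ***out_parts)
{
    char *var1 = calloc(9, sizeof(char));     /* 8 digits + terminating '\0' */
    char **var2 = calloc(3, sizeof(char *));
    if (var1 == NULL || var2 == NULL) {
        free(var1);                           /* free(NULL) is a harmless no-op */
        free(var2);
        return -1;
    }
    strcpy(var1, "01234567");   /* copy the characters; var1 itself is not reassigned */
    var1[2] = '\0';
    var1[5] = '\0';
    /* var1 = [0][1][\0][3][4][\0][6][7][\0] */
    var2[0] = &var1[0];         /* points at "01" */
    var2[1] = &var1[3];         /* points at "34" */
    var2[2] = &var1[6];         /* points at "67" */
    *out_buf = var1;
    *out_parts = var2;
    return 0;
}
```

The embedded \0 bytes do not matter to free(): it releases the whole block that calloc returned, however the contents look. The pointers in var2 point into var1's block and must not be freed individually.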

Pointer and Function ambiguity in C

Please look at the following code:
char* test ( )
{
char word[20];
printf ("Type a word: ");
scanf ("%s", word);
return word;
}
void main()
{
printf("%s",test());
}
When the function returns, the variable word is destroyed and it prints some garbage value. But when I replace
char word[20];
by char *word;
it prints the correct value. According to me, the pointer variable should have been destroyed similar to the character array and the output should be some garbage value. Can anyone please explain the ambiguity?
Undefined behavior is just that - undefined. Sometimes it will appear to work, but that is just coincidence. In this case, it's possible that the uninitialized pointer just happens to point to valid writeable memory, and that memory is not used for anything else, so it successfully wrote and read the value. This is obviously not something you should count on.
You have undefined behavior either way, but purely from a "what's going on here" viewpoint, there's still some difference between the two.
When you use an array, the data it holds is allocated on the stack. When the function returns, that memory will no longer be part of the stack, and almost certainly will be overwritten in the process of calling printf.
When you use the pointer, your data is going to be written to whatever random location that pointer happens to have pointed at. Though writing there is undefined behavior, simple statistics say that if you have (for example) a 32-bit address space of ~4 billion locations, the chances of hitting one that will be overwritten in the next few instructions are fairly low.
You obviously shouldn't do either one, but the result you got isn't particularly surprising either.
Because the char array is defined and declared in the function, it is a local variable and no longer exists after the function returns. If you use a char pointer and ALLOCATE MEMORY FOR IT then it will remain, and all you need is the pointer (aka a number).
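#include <stdio.h>
#include <stdlib.h>
char *test(void);
int main(void) {
    char *word = test();
    if (word != NULL) {
        printf("%s", word);
        free(word);             /* the caller owns the buffer and releases it */
    }
    return 0;
}
char *test(void) {
    char *str = malloc(20 * sizeof(char));
    if (str != NULL && scanf("%19s", str) != 1)
        str[0] = '\0';          /* no input read: return an empty string */
    return str;
}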
Notice how I used %19s instead of %s. Your current function can easily lead to a buffer overflow if a user enters 20 or more characters.
During program execution, an activation record for the function main is first created in the stack segment of the process memory. In that activation record, memory is allocated for the local variables of main, plus some more memory for internal purposes. In your program main doesn't have any local variables, so no memory is allocated for local variables in its activation record.
Then, while executing the statement that calls the function test, one more activation record is created for the called function (test), and 20 bytes are allocated in it for the local variable word.
Once control exits the function test, the activation record created for that function is popped off the stack. Execution then continues with the remaining statement (printf) of the calling function main. Here printf tries to print the characters in the test function's local variable, which has already been popped off the stack. So this behaviour is undefined: sometimes it may print the proper string, otherwise it will print junk.
So this is a situation where dynamic memory comes into the picture. With dynamic memory we control the lifetime of the data ourselves. So use dynamic memory like below.
char *word = NULL;
word = (char *) malloc(sizeof(char) * 20);
Note: check the malloc return value for NULL, and don't forget to free the allocated memory after the printf in the main function.

If referencing constant character strings with pointers, is memory permanently occupied?

I'm trying to understand where things are stored in memory (stack/heap, are there others?) when running a C program. Compiling this gives warning: function returns address of local variable:
char *giveString (void)
{
char string[] = "Test";
return string;
}
int main (void)
{
char *string = giveString ();
printf ("%s\n", string);
}
Running gives various results; it just prints gibberish. I gather from this that the char array called string in giveString() is stored in the stack frame of the giveString() function while it is running. But if I change the type of string in giveString() from char array to char pointer:
char *string = "Test";
I get no warnings, and the program prints out "Test". So does this mean that the character string "Test" is now located on the heap? It certainly doesn't seem to be in the stack frame of giveString() anymore. What exactly is going on in each of these two cases? And if this character string is located on the heap, so all parts of the program can access it through a pointer, will it never be deallocated before the program terminates? Or would the memory space be freed up if there were no pointers pointing to it, like if I hadn't returned the pointer to main? (But that is only possible with a garbage collector like in Java, right?) Is this a special case of heap allocation that is only applicable to pointers to constant character strings (hardcoded strings)?
You seem to be confused about what the following statements do.
char string[] = "Test";
This code means: create an array in the local stack frame of sufficient size and copy the contents of constant string "Test" into it.
char *string = "Test";
This code means: set the pointer to point to constant string "Test".
In both cases, "Test" is in the const or cstring segment of your binary, where non-modifiable data exists. It is neither in the heap nor stack. In the former case, you're making a copy of "Test" that you can modify, but that copy disappears once your function returns. In the latter case, you are merely pointing to it, so you can use it once your function returns, but you can never modify it.
You can think of the actual string "Test" as being global and always there in memory, but the concept of allocation and deallocation is not generally applicable to const data.
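To make the copy-versus-pointer distinction concrete, here is a small sketch. The function name copy_is_independent is made up for this example. It modifies the array copy and checks that the literal itself is untouched:

```c
#include <string.h>

/* Hypothetical check: returns 1 if changing the array copy leaves the
   literal unchanged. */
int copy_is_independent(void)
{
    const char *literal = "Test";   /* points at the literal itself */
    char copy[] = "Test";           /* private, writable copy in this stack frame */

    copy[0] = 'B';                  /* fine: only the local copy changes */

    return strcmp(copy, "Best") == 0 && strcmp(literal, "Test") == 0;
}
```

Writing through literal instead (after casting away const) would be undefined behaviour, which is why the pointer is declared const char * here.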
No. The string "Test" is not on the heap (nor on the stack); it lives in the data portion of the program image, which basically gets set up before the program runs. It's there, but you can think of it kind of like "global" data.
The following may clear it up a tad for you:
char string[] = "Test"; // declare a local array, and copy "Test" into it
char* string = "Test"; // declare a local pointer and point it at the "Test"
// string in the program's data section
It's because in the second case you are creating a constant string :
char *string = "Test";
The value pointed to by string is a constant and can never change, so it's allocated at compile time like a static variable (it is neither on the stack nor on the heap).

What could be the possible reason behind the warning which comes up when the following piece of code is compiled

This is a simple piece of code which I wrote to check whether it is legitimate to return the address of a local variable, and my assumptions were proved correct by the compiler, which gives a warning saying the same:
warning: function returns address of local variable
But the correct address is printed when executed... Seems strange!
#include<stdio.h>
char * returnAddress();
main()
{
char *ptr;
ptr = returnAddress();
printf("%p\n",ptr);
}
char * returnAddress()
{
int x;
printf("%p\n",&x);
return &x;
}
The behaviour is undefined.
Anything is allowed to happen when you invoke undefined behaviour - including behaving semi-sanely.
The address of a local variable is returned. It remains an address; it might even be a valid address if you're lucky. What you get if you access the data that it points to is anyone's guess - though you're best off not knowing. If you call another function, the space pointed at could be overwritten by new data.
You should be getting warnings about the conversion between int pointer and char pointer - as well as warnings about returning the address of a local variable.
What you are trying to do is usually dangerous:
In returnAddress() you declare a local, non-static variable x on the stack. Then you return its address, which will be invalid once the function has returned.
Additionally you try to return a char * while you actually have an int *.
To get rid of the warning caused by returning a pointer to a local var, you could use this code:
void *p = &x;
return p;
Of course printing it is completely harmless but dereferencing (e.g. int x = *ptr;) it would likely crash your program.
However, what you are doing is a great way to break things - other people might not know that you return an invalid pointer that must never be dereferenced.
Yes, the same address is printed both times. Except that, when the address is printed in main(), it no longer points to any valid memory address. (The variable x was created in the stack frame of returnAddress(), which was scrapped when the function returned.)
That's why the warning is generated: Because you now have an address that you must not use.
Just because you can access the memory of the local variable doesn't mean it is a correct thing to do. After the end of a function call, the stack pointer moves back to its previous position in memory, so you may still be able to access the local variables of the function, as they are not erased. But there is no guarantee that such an access won't fail (with a segmentation fault, for example), or that you won't read garbage.
Which warning? I get a type error (you're returning an int* but the type says char*) and a warning about returning the address of a local variable.
The type error is because the type you've declared for the function is a lie (or statistics?).
The second is because that is a crazy thing to do. That address is going to be smack in the middle (or rather, near the top) of the stack. If you use it you'll be stomping on data (or have your data stomped on by subsequent function calls).
It's not strange. The local variables of a function are allocated in the stack frame of that function. Once control leaves the function, the local variables are invalid. You may still hold the address, but the same memory can be overwritten with other values. This is why the behavior is undefined. If you want to reference memory throughout your program, allocate it using malloc. This will allocate the memory on the heap instead of the stack. You can safely reference it until you free the memory explicitly.
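#include <stdio.h>
#include <stdlib.h>
char *returnAddress(void);
int main(void)
{
    char *ptr;
    ptr = returnAddress();
    printf("%p\n", (void *)ptr);
    free(ptr);   /* release the heap block once it is no longer needed */
    return 0;
}
char *returnAddress(void)
{
    char *x = malloc(sizeof(char));
    printf("%p\n", (void *)x);
    return x;
}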

Returning a pointer to an automatic variable

Say you have the following function:
char *getp()
{
char s[] = "hello";
return s;
}
Since the function is returning a pointer to a local variable in the function to be used outside, will it cause a memory leak?
P.S. I am still learning C so my question may be a bit naive...
[Update]
So, if say you want to return a new char[] array (ie maybe for a substring function), what do you return exactly? Should it be pointer to an external variable ? ie a char[] that is not local to the function?
It won't cause a memory leak. It'll cause a dangling reference. The local variable is allocated on the stack and will be freed as soon as it goes out of scope. As a result, when the function ends, the pointer you are returning no longer points to memory you own. This is not a memory leak (a memory leak is when you allocate some memory and don't free it).
[Update]:
To be able to return an array allocated in a function, you should allocate it outside the stack (e.g. on the heap) like:
char *test() {
char* arr = malloc(100);
arr[0] = 'M';
return arr;
}
Now, if you don't free the memory in the calling function after you finished using it, you'll have a memory leak.
No, it won't leak, since it's destroyed after getp() ends.
It will result in undefined behaviour, because now you have a pointer to a memory area that no longer holds what you think it does, and that can be reused by anyone.
A memory leak would happen if you stored that array on the heap, without executing a call to free().
#include <stdlib.h>
#define N 100 /* buffer size, chosen arbitrarily for this example */
char* getp(void){
    char* p = malloc(N);
    //do stuff to p
    return p;
}
int main(void){
    char* p = getp();
    //free(p) No leak if this line is uncommented
    return 0;
}
Here, the memory p points to is not destroyed because it's not on the stack, but on the heap. However, when the program ends, the allocated memory has not been released, which is a memory leak (even though the operating system reclaims it once the process dies).
[UPDATE]
If you want to return a new C string from a function, you have two options:
Store it in the heap (as in the example above, or like this real example that returns a duplicated string);
Pass a buffer parameter, for example:
// doesn't exactly answer your update question, but probably a better idea.
size_t foo (const char* str, size_t strleng, char* newstr);
Here, you'd have to allocate memory somewhere for newstr (could be stack OR heap) before calling the foo function. In this particular case, it would return the number of characters in newstr.
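As a rough sketch of the buffer-parameter style applied to the substring case from the question's update: the function name substring, its parameter list, and the truncation behaviour are all assumptions made for this example, not an existing library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical substring helper: copies up to len characters of src, starting
   at index start, into the caller-supplied buffer dst of capacity dst_size.
   The result is always '\0'-terminated (truncated if it does not fit).
   Returns the number of characters written, not counting the '\0'. */
size_t substring(const char *src, size_t start, size_t len,
                 char *dst, size_t dst_size)
{
    size_t src_len = strlen(src);
    size_t n = 0;

    if (dst_size == 0)
        return 0;                    /* no room even for the terminator */

    if (start < src_len) {
        n = src_len - start;         /* characters available from start */
        if (n > len)
            n = len;                 /* do not copy more than requested */
        if (n > dst_size - 1)
            n = dst_size - 1;        /* leave room for the terminator */
        memcpy(dst, src + start, n);
    }
    dst[n] = '\0';
    return n;
}
```

The caller decides where the result lives, for example a local char buf[8] passed together with sizeof buf, so nothing needs to be freed and no pointer to a dead local is ever returned.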
It's not a memory leak because the memory is being released properly.
But it is a bug. You have a pointer to unallocated memory. It is called a dangling reference and is a common source of errors in C. The results are undefined. You won't see any problems until run-time when you try to use that pointer.
Auto variables are destroyed at the end of the function call; you can't return a pointer to them. What you're doing could be described as "returning a pointer to the block of memory that used to hold s, but now is unused (but might still have something in it, at least for now) and that will rapidly be filled with something else entirely."
It will not cause memory leak, but it will cause undefined behavior. This case is particularly dangerous because the pointer will point somewhere in the program's stack, and if you use it, you will be accessing random data. Such pointer, when written through, can also be used to compromise program security and make it execute arbitrary code.
No-one else has yet mentioned another way that you can make this construct valid: tell the compiler that you want the array "s" to have "static storage duration" (this means it lives for the life of the program, like a global variable). You do this with the keyword "static":
char *getp()
{
static char s[] = "hello";
return s;
}
Now, the downside of this is that there is now only one instance of s, shared between every invocation of the getp() function. With the function as you've written it, that won't matter. In more complicated cases, it might not do what you want.
PS: The usual kind of local variables have what's called "automatic storage duration", which means that a new instance of the variable is brought into existence when the function is called, and disappears when the function returns. There's a corresponding keyword "auto", but it's implied anyway if you don't use "static", so you almost never see it in real world code.
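To illustrate the "only one instance" downside in a more complicated case, here is a sketch with a function that formats into a static buffer. The name label_for and the "item %d" format are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: formats a label into a single static buffer.
   Every call returns the same address and overwrites the previous text. */
const char *label_for(int n)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "item %d", n);
    return buf;
}
```

If a caller keeps the pointer from label_for(1) and then calls label_for(2), both pointers refer to the same buffer, which now reads "item 2". Code that needs both strings at once has to copy the first result before making the second call.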
I've deleted my earlier answer after putting the code in a debugger and watching the disassembly and the memory window.
The code in the question is invalid and returns a reference to stack memory, which will be overwritten.
This slightly different version, however, returns a reference to fixed memory, and works fine:
char *getp()
{
char* s = "hello";
return s;
}
s is a stack variable; it is automatically destroyed at the end of the function. The pointer you return is therefore no longer valid and refers to an area of memory that could be overwritten at any point.
