Today something strange came to my mind. When I want to hold a string in C (or C++) the old way, without using the string header, I just create an array and store the string in it. But I read that any variable defined in the local scope of a function ends up being pushed onto the stack.
So the string actually takes twice the space it needs. First, the push instructions (which contain the string's contents) are stored in memory, and then, when they are executed, another "copy" of the string is created on the stack. So first the push instructions take up space, then the stack space is used for the same string.
So why does it work this way? Why doesn't the compiler just add the string (or other variables) to the program instead of creating them once again when it is executed? Yes, I know you can't just put data inside a code block, but it could be attached to the end of the program, with a jump instruction before it. Then we would just point to that data, since it is loaded into RAM when the program is executed.
Thanks.
There are a couple of ways of dealing with static strings in C and C++:
char string[] = "Contents of the string";
char const *string2 = "Contents of another string";
If you do these inside a function, the first creates a string on the stack, roughly as you described. The second just creates a pointer to a statically allocated string that's embedded in the executable, roughly as you imply you want.
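A minimal sketch of the difference, assuming an ordinary hosted C compiler: the array form gets its own writable copy sized to the whole string, while the pointer form stores only an address. The function names below are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Array form: a writable copy of the literal, sized for every character plus '\0'. */
size_t array_form_size(void) {
    char string[] = "Contents of the string";
    return sizeof string;
}

/* Pointer form: only the pointer is local; the characters live in static storage. */
size_t pointer_form_size(void) {
    char const *string2 = "Contents of another string";
    return sizeof string2;   /* size of the pointer, not of the string */
}

/* The array copy may be modified; the literal it was copied from is untouched. */
char modified_first_char(void) {
    char string[] = "Contents of the string";
    string[0] = 'c';
    return string[0];
}
```

Writing through string2 instead (after casting away the const) would be undefined behavior, which is why the pointer version is declared char const *.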
A very good question indeed. You do know that using the static keyword on a local variable declaration does just what you described, right?
As far as locals are concerned, performance optimization is the key. A local variable cannot be accessed outside the scope of a function. Why then would the compiler try to persist memory for it outside of the stack?
That is not how it works. Nothing gets "pushed"; the compiler simply reserves space in the stack frame. You cannot return such a string from the function: you'd return a pointer to a dead stack frame, and any subsequent function call will destroy the string.
Return strings by letting the caller pass a pointer to a buffer, as well as an argument that says how large the buffer is so you won't overrun the end of the buffer when the string is too long.
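A minimal sketch of that convention, using a hypothetical make_greeting function: the caller owns the buffer and passes its size, and snprintf never writes more than that many bytes (including the terminating '\0').

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example: writes "Hello, <name>!" into the caller's buffer.
   Returns the length the full result would have had (as snprintf does),
   so a return value >= len tells the caller the text was truncated. */
int make_greeting(char *buf, size_t len, const char *name) {
    return snprintf(buf, len, "Hello, %s!", name);
}
```

A caller would declare something like char buf[32]; and call make_greeting(buf, sizeof buf, "world"). The string then lives in the caller's own frame, so it is still valid after make_greeting returns.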
If you have:
extern void some_function(char * s, int l);
void do_it(void) {
char str[] = "I'm doing it!";
some_function(str, sizeof(str) );
}
This would turn into something like the following (in pseudo-asm for a made-up processor):
.data
local .do_it.str ; The contents of str are stored in a static variable
.text ; text is where code lives within the executable or object file
do_it:
subtract (sizeof(str)) from stack_pointer ; This reserves the space for str only;
; sizeof(str) and the pointer to str are pushed below
copy (sizeof(str)) bytes from .do_it.str to [stack_pointer+0] ; this loads the local variable
; using the memory at the top of the stack
; This copy can be a function call or inline code.
push sizeof(str) ; push second argument first
push stack_pointer+4 ; assuming that we have 4 byte integers,
; this is the memory just on the other side of where we pushed
; sizeof(str), which is where str[0] ended up
call some_function
add (sizeof(str)+8) to stack_pointer ; reclaim the memory used by str, sizeof(str),
; and the pointer to str from the stack
return
From this you can see that your assumption about how the local variable str is created isn't completely correct, but this still is not necessarily as efficient as it could be.
If you did
void do_it(void) {
static char str[] = "I'm doing it!";
some_function(str, sizeof(str) );
}
Then the compiler would not reserve the space on the stack for the string and then copy it onto the stack. If some_function were to alter the contents of str then the next (or a concurrent) call to do_it (in the same process) would be using the altered version of str.
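A small sketch of that consequence. Here some_function is given a hypothetical body that upper-cases its argument in place (the original only declares it extern), and the two do_it variants report the character at index 4 before the call.

```c
#include <ctype.h>

/* Hypothetical stand-in for some_function that modifies its argument:
   it upper-cases the string in place, looking at no more than l bytes. */
void some_function(char *s, int l) {
    for (int i = 0; i < l && s[i] != '\0'; i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}

/* Automatic array: re-initialized from the literal on every call. */
char do_it_auto(void) {
    char str[] = "I'm doing it!";
    char before = str[4];              /* 'd' on every call */
    some_function(str, (int)sizeof(str));
    return before;
}

/* Static array: initialized once, so the previous call's change is still there. */
char do_it_static(void) {
    static char str[] = "I'm doing it!";
    char before = str[4];              /* 'd' on the first call, 'D' afterwards */
    some_function(str, (int)sizeof(str));
    return before;
}
```

Calling do_it_auto twice sees 'd' both times, whereas the second call to do_it_static already sees the upper-cased 'D' left by the first call.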
If some_function had been declared as:
extern void some_function(const char * s, int l);
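Then, since do_it itself never changes str, the compiler could in principle get away with not making a local copy of str on the stack even if str were not declared static. In practice this is only safe if the compiler can also prove that some_function never casts away the const and writes to the array, because a const char * parameter is a promise by convention rather than a guarantee.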
Related
I have a global variable declared char* global=NULL and a function where I parse a separate string but copy its value into the global variable. Essentially copying the second word in the string into the global variable. And at the end of this function in the last print statement it prints out the global variable correctly, which again contains the second word of the string.
void parsestring(char* s, char** ssp){
int i;
char st[30];
strcpy(st,s);
printf("String st after copy is %s",st);
char* first=strtok(st," ");
char* second=strtok(NULL," ");
printf("\nstring s at the end is %s\n",s);
global=second;
printf("\nsecond after assigning to global is %s\n",second);
printf("\n global at end of parse function is %s",global);
}
But when I call this global variable from a different function and test print its value, it only prints part of the string, not the entirety of it like it does at the end of the above function.
I know what I'm doing isn't optimal, but I don't get why the global variable changes when it's used later in my program.
In your function you assign global to point into a buffer on the stack.
When the function exits, st is no longer yours.
Change global to be a buffer of its own.
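A minimal sketch of that fix, assuming a 30-byte global is enough (the size mirrors the question's st[30]; longer words are truncated). The unused ssp parameter and the debugging printf calls are dropped here for brevity.

```c
#include <stdio.h>
#include <string.h>

/* The global now owns its storage instead of pointing into parsestring's stack frame. */
char global[30];

/* Copies the second space-separated word of s into global (empty if there is none). */
void parsestring(const char *s) {
    char st[30];
    snprintf(st, sizeof st, "%s", s);              /* bounded copy; strcpy could overflow st */
    char *first = strtok(st, " ");
    char *second = first ? strtok(NULL, " ") : NULL;
    /* Copy the characters, not the pointer: st dies when this function returns. */
    snprintf(global, sizeof global, "%s", second ? second : "");
}
```

After parsestring("hello brave world"), global holds "brave" and stays valid no matter what other functions are called afterwards.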
No, you cannot do that safely:
char st[30];
char * second=strtok(NULL," ");
global = second;
The first line above creates a local variable in the function (which you copy some string into).
The second line gives you an address within that string, and the third line assigns that address to your global variable.
Unfortunately for you, the st object goes out of scope at the end of the function so you are not allowed to use it by, for example, dereferencing an address that points into it.
What's probably happening (although this is irrelevant) is that the area of the stack where st is stored is being reused somehow, "damaging" the string at which the global variable points.
I say "irrelevant" since the word "stack" appears approximately zero times in the ISO C standard. The basic rule is that you need to follow what the standard says, or bad things may happen. What those bad things are is dependent totally on the implementation :-)
In terms of fixing it, you basically need to make sure the information goes somewhere that won't go out of scope before you need to use it. That could be, for example, making global a character array (much like st) rather than a pointer, and strcpying to it.
Or, have your function allocate memory that will survive function exit, such as with strdup (which is POSIX, and standard C only since C23; if your implementation doesn't provide it, it is easy to write your own):
global = strdup(second);
Just remember to free that memory once you're done with it.
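If strdup is not available, a portable replacement takes only a few lines. The name copy_string below is hypothetical; it behaves like strdup, returning a heap copy that the caller must free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable stand-in for strdup: returns a heap copy of s,
   or NULL if malloc fails. The caller must free() the result. */
char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```

In parsestring this would be global = copy_string(second);, with a matching free(global); once the value is no longer needed. Because the copy is independent, later changes to (or the death of) st do not affect it.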
My question is if i have some function
void func1(){
char * s = "hello";
char * c;
int b;
c = (char *) malloc(15);
strcpy(c,s);
}
I think the s pointer is allocated on the stack, but where is the data "hello" stored? Does that go in the data segment of the program? As for c and b, they are uninitialized, and since c is supposed to hold some memory address and doesn't have one yet, how does that work? And b also has no contents, so can it be stored on the stack?
Then, when we allocate memory for c on the heap with malloc, c now holds some memory address. How is this uninitialized variable c given the address of the first byte of that block on the heap?
We need to consider what memory location a variable has and what its contents are. Keep this in mind.
For an int, the variable has a memory address and has a number as its contents.
For a char pointer, the variable has a memory address and its contents are a pointer to a string--the actual string data is at another memory location.
To understand this, we need to consider two things:
(1) the memory layout of a program
(2) the memory layout of a function when it's been called
Program layout [typical], from lower memory address to higher memory address:
code segment -- where instructions go:
    ...
    machine instructions for func1
    ...
data segment -- where initialized global variables and constants go (string literals are often kept in a separate read-only section, such as .rodata):
    ...
    int myglobal_inited = 23;
    ...
    "hello"
    ...
bss segment -- for uninitialized globals:
    ...
    int myglobal_tbd;
    ...
heap segment -- where malloc data is stored (grows upward towards higher memory addresses):
    ...
stack segment -- starts at the top memory address and grows downward toward the end of the heap
Now here's a stack frame for a function. It will be within the stack segment somewhere. Note, this is higher memory address to lower:
function arguments [if any]:
arg2
arg1
arg0
function's return address [where it will go when it returns]
function's stack/local variables:
char *s
char *c
int b
char buf[20]
Note that I've added a "buf". If we changed func1 to return a string pointer (e.g. "char *func1(arg0,arg1,arg2)") and we added "strcpy(buf,s)", buf would be usable within func1. func1 could return either c or s, but not buf.
That's because with s the data is stored in the data segment and persists after func1 returns. Likewise, c can be returned because the data is in the heap segment.
But buf would not work (e.g. return buf) because the data is stored in func1's stack frame, and that is popped off the stack when func1 returns [meaning it would appear as garbage to the caller]. In other words, data in the stack frame of a given function is available to it and to any function that it may call [and so on ...]. But this stack frame is not available to a caller of that function. That is, the stack frame data only "persists" for the lifetime of the called function.
Here's the fully adjusted sample program:
#include <stdlib.h>
#include <string.h>
int myglobal_initialized = 23;
int myglobal_tbd;
char *
func1(int arg0,int arg1,int arg2)
{
char *s = "hello";
char *c;
int b;
char buf[20];
char *ret;
c = malloc(15);
if (c == NULL)
return NULL;
strcpy(c,s);
strcpy(buf,s);
// ret can be c or s, but _not_ buf (the caller must free c when done)
ret = c;
return ret;
}
Let's divide this answer in two points of view of the same stuff, because the standards only complicate understanding of this topic, but they're standards anyway :).
Subject common to both parts
void func1() {
char *s = "hello";
char *c;
int b;
c = (char*)malloc(15);
strcpy(c, s);
}
Part I: From a standardese point of view
According to the standards, there's this useful concept known as automatic storage duration, in which a variable's space is reserved automatically upon entering a given scope (with uninitialized values, a.k.a. garbage!), it may or may not be set/accessed during that scope, and the space is freed for future use when the scope is left. Note: In C++, this also involves construction and destruction of objects.
So, in your example, you have three automatic variables:
char *s, which gets initialized to whatever the address of "hello" happens to be.
char *c, which holds garbage until it's initialized by a later assignment.
int b, which holds garbage all of its lifetime.
BTW, how storage works with functions is unspecified by the standards.
Part II: From a real-world point of view
On any decent computer architecture you will find a data structure known as the stack. The stack's purpose is to hold space that can be used and recycled by automatic variables, as well as some space for some stuff needed for recursion/function calling, and can serve as a place to hold temporary values (for optimization purposes) if the compiler decides to.
The stack works in a PUSH/POP (last in, first out) fashion, and on most architectures it grows downwards. Let me explain it a little better. Imagine an empty stack like this:
[Top of the Stack]
[Bottom of the Stack]
If you, for example, PUSH an int of value 5, you get:
[Top of the Stack]
5
[Bottom of the Stack]
Then, if you PUSH -2:
[Top of the Stack]
5
-2
[Bottom of the Stack]
And, if you POP, you retrieve -2, and the stack looks as before -2 was PUSHed.
The bottom of the stack is a barrier that moves upon PUSHing and POPing. On most architectures, its position is recorded by a processor register known as the stack pointer. Think of it as an unsigned char*: you can decrease it, increase it, do pointer arithmetic on it, etcetera, all with the sole purpose of doing black magic on the stack's contents.
Reserving (space for) automatic variables on the stack is done by decreasing it (remember, it grows downwards), and releasing them is done by increasing it. Based on this, the previous theoretical PUSH -2 is shorthand for something like this in pseudo-assembly:
SUB %SP, $4 # Subtract sizeof(int) from the stack pointer
MOV $-2, (%SP) # Copy the value `-2` to the address pointed by the stack pointer
POP whereToPop is merely the inverse
MOV (%SP), whereToPop # Get the value
ADD %SP, $4 # Free the space
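To make the SUB/ADD bookkeeping concrete, here is a toy model in C. It assumes a stack of int slots where the stack pointer is an array index rather than a real register or byte address; the names toy_stack, toy_push and toy_pop are made up for this sketch, and overflow/underflow checks are left out.

```c
#include <stddef.h>

#define TOY_SLOTS 16

/* A toy downward-growing stack of ints. */
struct toy_stack {
    int mem[TOY_SLOTS];
    size_t sp;                   /* index of the most recently pushed value */
};

void toy_init(struct toy_stack *st) {
    st->sp = TOY_SLOTS;          /* empty: sp sits just past the highest slot */
}

void toy_push(struct toy_stack *st, int value) {
    st->sp -= 1;                 /* like SUB %SP, $4: reserve one slot */
    st->mem[st->sp] = value;     /* like MOV $value, (%SP) */
}

int toy_pop(struct toy_stack *st) {
    int value = st->mem[st->sp]; /* like MOV (%SP), whereToPop */
    st->sp += 1;                 /* like ADD %SP, $4: release the slot */
    return value;
}
```

Pushing 5 and then -2, then popping twice, yields -2 followed by 5, and sp ends up back where it started, just like the diagrams with PUSH 5 and PUSH -2.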
Now, compiling func1() may yield the following pseudo-assembly (Note: you are not expected to understand this at its fullest):
.rodata # Read-only data goes here!
.STR0 = "hello" # The string literal goes here
.text # Code goes here!
func1:
SUB %SP, $12 # sizeof(char*) + sizeof(char*) + sizeof(int)
LEA .STR0, (%SP) # Copy the address (LEA, load effective address) of `.STR0` (the string literal) into the first 4-byte space in the stack (a.k.a `char *s`)
PUSH $15 # Pass argument to `malloc()` (note: arguments are pushed last to first)
CALL malloc
ADD %SP, $4 # The caller cleans up the stack/pops arguments
MOV %RV, 4(%SP) # Move the return value of `malloc()` (%RV) to the second 4-byte variable allocated (`4(%SP)`, a.k.a `char *c`)
PUSH (%SP) # Second argument to `strcpy()` (`s`)
PUSH 8(%SP) # First argument to `strcpy()` (`c`, now 8 bytes up because of the previous PUSH)
CALL strcpy
ADD %SP, $8 # Pop the arguments of `strcpy()`
ADD %SP, $12 # Release the automatic variables
RET # Return with no value
I hope this has shed some light for you!
In the following code, the explanation for the failure to print anything is that the pointer returned by get_message() is out of scope:
char *get_message() {
char msg [] = "Aren’t pointers fun?";
return msg ;
}
int main (void) {
char *foo = get_message();
puts(foo);
return 0;
}
When run in gdb, it turns out that the data at the position of foo is the string "Aren't pointers fun?":
Old value = 0x0
New value = 0x7fffffffde60 "Aren’t pointers fun?"
(This seems consistent with answers which state that the data for a pointer which passes out of scope remains in memory), but the documentation for "puts" states that the data is first copied from the address given: presumably 0x7fffffffde60 in this case.
Therefore: why is nothing output?
EDIT: Thanks for your answers:
I ran the original code to completion in gdb, and the call to puts does indeed change the data at the address foo points to.
(gdb) p foo
$1 = 0x7fffffffde60 "Aren’t pointers fun?"
(gdb) n
11 return 0;
(gdb) p foo
$2 = 0x7fffffffde60 "`\336\377\377\377\177"
Interestingly, the code did print the message when I changed the code for get_message() to:
char *get_message() {
char *msg = "Aren’t pointers fun?";
return msg ;
}
In this case, the data at foo (address 0x4005f4 - does the smaller size of the address mean anything?) remains the same throughout the code. It'd be cool to find out why this changes the behaviour.
The variable msg is allocated on the stack of get_message()
char msg [] = "Aren’t pointers fun?";
Once get_message() returns, the stack for that method is torn down. There is no guarantee at that point of what is in the memory that the pointer returned to foo now points to.
When puts() is called, the stack is likely modified, overwriting "Aren't pointers fun?".
It is likely that calling puts modifies the stack and overwrites the string.
Just returning from get_message leaves the string unchanged, but deallocated, i.e. its memory space is available for reuse.
The real question here is not, "why doesn't it work?". The question is, "Why does the string seem to exist even after the return from get_message, but then still not work?"
To clarify, let's look at the main function again, with two comments for reference:
int main (void) {
char *foo = get_message();
/* point A */
puts(foo);
/* point B */
return 0;
}
I just compiled and ran this under gdb. Indeed, at point A, when I printed out the value of the variable foo in gdb, gdb showed me that it pointed to the string "Aren’t pointers fun?". But then, puts failed to print that string. And then, at point B, if I again printed out foo in gdb, it was no longer the string it had been.
The explanation, as several earlier commenters have explained, is that function get_message leaves the string on the stack, where it's not guaranteed to stay for long. After get_message returns, and before anything else has been called, it's still there. But when we call puts, and puts begins working, it's using that same portion of the stack for its own local storage, so sometime in there (and before puts manages to actually print the string), the string gets destroyed.
In response to the OP's follow-on question: When we had
char *get_message() {
char msg [] = "Aren’t pointers fun?";
return msg ;
}
the string lives in the array msg which is on the stack, and we return a pointer to that array, which doesn't work because the data in the array eventually disappears. If we change it to
char * msg = "Aren’t pointers fun?";
(such a tiny-seeming change!), now the string is stored in the program's initialized data segment, and we return a pointer to that, and since it's in the program's initialized data segment, it sticks around essentially forever. (And yes, the fact that get_message ends up returning a different-looking address is significant, although I wouldn't read too much into whether it's lower or higher.)
The bottom line is that arrays and pointers are different. Hugely hugely different. The line
char arr[] = "Hello, world!";
bears almost no relation to the very similar-looking line
char *ptr = "Hello, world!";
Now, they're the same in that you can do both
printf("%s\n", arr);
and
printf("%s\n", ptr);
But if you try to say
arr = "Goodbye"; /* WRONG */
you can't, because you can't assign to an array. If you want a new string here, you have to use strcpy, and you have to make sure that the new string is the same length or shorter:
strcpy(arr, "Goodbye");
But if you try the strcpy thing with the pointer:
strcpy(ptr, "Goodbye"); /* WRONG */
now that doesn't work, because the string constant that ptr points to is not writable (attempting to modify it is undefined behavior). In the pointer case, you can (and often must) use simple assignment:
ptr = "Goodbye";
and in this case there's no problem setting it to a longer string, too:
ptr = "Supercalifragilisticexpialidocious";
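The "same length or shorter" rule for the array can be enforced with a size check before copying. The helper set_array_string below is hypothetical; it refuses to copy a string that would overrun the array, while the pointer simply gets reassigned.

```c
#include <stddef.h>
#include <string.h>

/* Copies s into an array of arr_size bytes only if it fits, '\0' included.
   Returns 1 on success and 0 (leaving arr unchanged) if it would overrun. */
int set_array_string(char *arr, size_t arr_size, const char *s) {
    if (strlen(s) + 1 > arr_size)
        return 0;
    strcpy(arr, s);
    return 1;
}
```

With char arr[] = "Hello, world!"; (14 bytes), set_array_string(arr, sizeof arr, "Goodbye") succeeds, but the long word is rejected; a const char *ptr, on the other hand, can be pointed at either literal with plain assignment.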
Those are the basic differences, but as this question points out, another big difference is that the array arr can't be usefully declared in and returned from a function (unless you make it static), while the pointer ptr can.
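Both safe ways of writing get_message can be sketched side by side (the function names are made up here). The first returns a pointer to the literal, which has static storage duration; the second keeps the array but makes it static, so it outlives the call.

```c
/* Safe: the literal has static storage duration, so the pointer stays valid.
   const documents that the literal must not be modified. */
const char *get_message_literal(void) {
    const char *msg = "Aren't pointers fun?";
    return msg;
}

/* Also safe: a static array lives for the whole program. It is writable,
   but every call returns the same array. */
char *get_message_static(void) {
    static char msg[] = "Aren't pointers fun?";
    return msg;
}
```

The trade-off of the static version is sharing: a change made through one returned pointer is visible to every later caller.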
The lifetime of msg ends when returning from the function get_message. The returned pointer points to the object whose lifetime has ended.
Accessing it yields undefined behaviour. Anything can happen.
In your case, the memory of the former msg seems to be overwritten with 0 already.
And this is not about "scope". You can fix your code by making msg static. This does not change the scope but its lifetime (a.k.a. storage duration).
In your getMessage function, the memory used by your message is on the stack and not on the heap. It's still a pointer, just to a location on the stack. Once the function returns, the stack is unwound (the return address is popped, and so on), which means that although the message MIGHT still be in the same location in memory, there is absolutely no guarantee. If anything else puts something onto the stack (such as another function call), it will most likely be overwritten. Your message is gone.
The better approach would be to allocate the memory dynamically with malloc to make certain the string is on the heap (although this leads to the problem of who owns the pointer and is responsible for freeing it).
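A minimal sketch of the heap approach, with a hypothetical get_message_heap: the function allocates exactly enough bytes for the text and hands ownership to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of the message, or NULL if malloc fails.
   Ownership passes to the caller, who must free() it. */
char *get_message_heap(void) {
    const char *text = "Aren't pointers fun?";
    char *msg = malloc(strlen(text) + 1);   /* +1 for the terminating '\0' */
    if (msg != NULL)
        strcpy(msg, text);
    return msg;
}
```

In main this would be char *foo = get_message_heap(); puts(foo); free(foo);, which works regardless of what puts does with the stack.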
If you must do something like this, I have seen it done using static:
static char * message = "I love static pointers";
Edit: despite my mentioning that it MIGHT still be on the stack, NEVER EVER ASSUME it is. Most languages won't even allow this.
The following code:
#include <stdio.h>
#include "string.h"
int main() {
char *s ;
char *fun() ;
s = fun() ;
printf ("%s",s) ;
return 0;
}
char *fun() {
char buffer[30] ;
strcpy ( buffer, "RAM - Rarely Adequate Memory" ) ;
return ( buffer ) ;
}
Gives unexpected results whenever the size of buffer is changed and does not give the required answer.
By making char buffer[30] static, the code prints "RAM - Rarely Adequate Memory", which is right.
How does static make a difference?
(automatic) local variables are destroyed as soon as you leave the block where they were declared. Keeping a reference to them is just as wrong as keeping a pointer to a deallocated memory block.
static (local) variables have the same lifespan as the program that declares them.
Implementation-wise, a static variable is allocated in the same space as the global variables.
Semantically, it is still only visible inside the block where it was declared, but it retains its value outside its declaration scope.
Beware, though. Retaining a reference to a static variable might have unpleasant side effects.
For instance, each call to your function will reset the string to its initial value, so the callers should make their own copies of it if they don't want any other piece of the code to be able to mess with the value they got from your function.
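The sharing problem is easy to demonstrate with a hypothetical int_to_text function that formats into a static buffer: every call returns the same address, so an earlier result is silently overwritten by the next call.

```c
#include <stdio.h>

/* Hypothetical function returning a pointer to its static buffer.
   Every call writes into the same array. */
const char *int_to_text(int n) {
    static char buffer[16];
    snprintf(buffer, sizeof buffer, "%d", n);
    return buffer;
}
```

After const char *a = int_to_text(1); const char *b = int_to_text(42);, both a and b point to the same buffer, which now holds "42". A caller that needs the first result to survive should copy it before calling again.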
You're returning a pointer to a local variable, which triggers undefined behaviour. char buffer[30] is a stack variable, so when the function exits, it goes out of scope and is cleaned up.
Making it static means it does not cease to exist when the function exits, and thus works correctly.
In C, one would usually fix this by passing in a buffer to write to:
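#include <stdio.h>
void fun(char *buffer, size_t len)
{
// Write the string into the caller's buffer, truncating it if len is too small
snprintf(buffer, len, "%s", "RAM - Rarely Adequate Memory");
}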
In C++, use std::string.
Without static, the variable buffer passes out of scope as soon as the function fun terminates. The main routine is then left holding a pointer to a defunct variable, and dereferencing it causes Undefined Behavior.
buffer is a local variable, which is destroyed once fun ends.
If buffer is static, it exists for the whole life of the program.
If you want fun to return an array, you can use dynamic memory.
#include<stdio.h>
#include"string.h"
int main( ){
char *s;
char *fun();
s = fun();
printf ("%s",s);
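// in pure C (free() needs #include <stdlib.h>)
//free(s);
// in C++ (memory from new[] must be released with delete[], not delete)
delete[] s;
}
char *fun(){
// in pure C (malloc() needs #include <stdlib.h>)
//char * buffer = (char*) malloc(30*sizeof(char));
// in C++
char *buffer = new char[30];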
strcpy ( buffer, "RAM - Rarely Adequate Memory" );
return ( buffer ) ;
}
First, a pointer to a local variable should not be returned.
Second, an automatic local variable disappears as soon as the code block (function) exits, but a static local variable stays there until the whole program ends. For the same reason, a static local variable still holds its last value the next time you enter that code block.
The buffer without static is placed on the processor's stack. When fun ends, the stack memory that was used for this function is freed.
But this freeing does not actually change the contents of these used stack memory locations. This means that the string you copied to buffer will initially still be there.
As the program progresses, this freed stack space (which was first used by your buffer), will get used for other stuff. But it's hard to tell when this will happen, as it all depends on how the compiler planned to use stack space.
In theory, your program could have nicely printed out RAM - Rarely Adequate Memory if this part of the stack had not been overwritten. So actually you are lucky, as you could have had a latent bug that could bite at a later stage (for example, after you started adding code to main() or changed compiler options).
I'm trying to understand where things are stored in memory (stack/heap, are there others?) when running a C program. Compiling this gives the warning "function returns address of local variable":
char *giveString (void)
{
char string[] = "Test";
return string;
}
int main (void)
{
char *string = giveString ();
printf ("%s\n", string);
}
Running it gives various results; it just prints gibberish. I gather from this that the char array called string in giveString() is stored in the stack frame of the giveString() function while it is running. But if I change the type of string in giveString() from char array to char pointer:
char *string = "Test";
I get no warnings, and the program prints out "Test". So does this mean that the character string "Test" is now located on the heap? It certainly doesn't seem to be in the stack frame of giveString() anymore. What exactly is going on in each of these two cases? And if this character string is located on the heap, so all parts of the program can access it through a pointer, will it never be deallocated before the program terminates? Or would the memory space be freed up if there was no pointers pointing to it, like if I hadn't returned the pointer to main? (But that is only possible with a garbage collector like in Java, right?) Is this a special case of heap allocation that is only applicable to pointers to constant character strings (hardcoded strings)?
You seem to be confused about what the following statements do.
char string[] = "Test";
This code means: create an array in the local stack frame of sufficient size and copy the contents of constant string "Test" into it.
char *string = "Test";
This code means: set the pointer to point to constant string "Test".
In both cases, "Test" is in the const or cstring segment of your binary, where non-modifiable data exists. It is neither in the heap nor stack. In the former case, you're making a copy of "Test" that you can modify, but that copy disappears once your function returns. In the latter case, you are merely pointing to it, so you can use it once your function returns, but you can never modify it.
You can think of the actual string "Test" as being global and always there in memory, but the concept of allocation and deallocation is not generally applicable to const data.
No. The string "Test" is not on the heap, and it is not on the stack either: it's in the program's data section, which basically gets set up before the program runs. It's there, and you can think of it kind of like "global" data.
The following may clear it up a tad for you:
char string[] = "Test"; // declare a local array, and copy "Test" into it
char* string = "Test"; // declare a local pointer and point it at the "Test"
// string in the data section of the program
It's because in the second case you are creating a pointer to a constant string:
char *string = "Test";
The value pointed to by string is a constant and must never be changed, so it's allocated at compile time like a static variable (it is on neither the stack nor the heap).