If referencing constant character strings with pointers, is memory permanently occupied? - c

I'm trying to understand where things are stored in memory (stack/heap, are there others?) when running a c program. Compiling this gives warning: function return adress of local variable:
char *giveString (void)
{
char string[] = "Test";
return string;
}
int main (void)
{
char *string = giveString ();
printf ("%s\n", string);
}
Running gives various results, it just prints jibberish. I gather from this that the char array called string in giveString() is stored in the stack frame of the giveString() function while it is running. But if I change the type of string in giveString() from char array to char pointer:
char *string = "Test";
I get no warnings, and the program prints out "Test". So does this mean that the character string "Test" is now located on the heap? It certainly doesn't seem to be in the stack frame of giveString() anymore. What exactly is going on in each of these two cases? And if this character string is located on the heap, so all parts of the program can access it through a pointer, will it never be deallocated before the program terminates? Or would the memory space be freed up if there was no pointers pointing to it, like if I hadn't returned the pointer to main? (But that is only possible with a garbage collector like in Java, right?) Is this a special case of heap allocation that is only applicable to pointers to constant character strings (hardcoded strings)?

You seem to be confused about what the following statements do.
char string[] = "Test";
This code means: create an array in the local stack frame of sufficient size and copy the contents of constant string "Test" into it.
char *string = "Test";
This code means: set the pointer to point to constant string "Test".
In both cases, "Test" is in the const or cstring segment of your binary, where non-modifiable data exists. It is neither in the heap nor stack. In the former case, you're making a copy of "Test" that you can modify, but that copy disappears once your function returns. In the latter case, you are merely pointing to it, so you can use it once your function returns, but you can never modify it.
You can think of the actual string "Test" as being global and always there in memory, but the concept of allocation and deallocation is not generally applicable to const data.

No. The string "Test" is still on the stack, it's just in the data portion of the stack which basically gets set up before the program runs. It's there, but you can think of it kind of like "global" data.
The following may clear it up a tad for you:
char string[] = "Test"; // declare a local array, and copy "Test" into it
char* string = "Test"; // declare a local pointer and point it at the "Test"
// string in the data section of the stack

It's because in the second case you are creating a constant string :
char *string = "Test";
The value pointed by string is a constant and can never change, so it's allocated at compile time like a static variable(but it's still stack not heap).

Related

How do I free up the memory consumed by a string literal?

How do I free up all the memory used by a char* after it's no longer useful?
I have some struct
struct information
{
/* code */
char * fileName;
}
I'm obviously going to save a file name in that char*, but after using it some time afterwards, I want to free up the memory it used to take, how do I do this?
E: I didn't mean to free the pointer, but the space pointed by fileName, which will most likely be a string literal.
There are multiple string "types" fileName may point to:
Space returned by malloc, calloc, or realloc. In this case, use free.
A string literal. If you assign info.fileName = "some string", there is no way. The string literal is written in the executable itsself and is usually stored together with the program's code. There is a reason a string literal should be accessed by const char* only and C++ only allows const char*s to point to them.
A string on the stack like char str[] = "some string";. Use curly braces to confine its scope and lifetime like that:
struct information info;
{
char str[] = "some string";
info.fileName = str;
}
printf("%s\n", info.fileName);
The printf call results in undefined behavior since str has already gone out of scope, so the string has already been deallocated.
You could use foo.fileName = malloc(howmanychars); and free(foo.fileName);.
You cannot free the memory if you initialize fileName from a string literal or other non-dynamically allocated way.
But then, freeing a handful of bytes is next to pointless, unless you need a large number of such structs/fileNames. The OS will likely not return the freed memory to other processes; the returned memory may be available for future memory allocations of your process.

When to allocate memory to char *

I am bit confused when to allocate memory to a char * and when to point it to a const string.
Yes, I understand that if I wish to modify the string, I need to allocate it memory.
But in cases when I don't wish to modify the string to which I point and just need to pass the value should I just do the below? What are the disadvantages in the below steps as compared to allocating memory with malloc?
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Let's try again your example with the -Wwrite-strings compiler warning flag, you will see a warning:
warning: initialization discards 'const' qualifier from pointer target type
This is because the type of "This is a test" is const char *, not char *. So you are losing the constness information when you assign the literal address to the pointer.
For historical reasons, compilers will allow you to store string literals which are constants in non-const variables.
This is, however, a bad behavior and I suggest you to use -Wwrite-strings all the time.
If you want to prove it for yourself, try to modify the string:
char *str = "foo";
str[0] = 'a';
This program behavior is undefined but you may see a segmentation fault on many systems.
Running this example with Valgrind, you will see the following:
Process terminating with default action of signal 11 (SIGSEGV)
Bad permissions for mapped region at address 0x4005E4
The problem is that the binary generated by your compiler will store the string literals in a memory location which is read-only. By trying to write in it you cause a segmentation fault.
What is important to understand is that you are dealing here with two different systems:
The C typing system which is something to help you to write correct code and can be easily "muted" (by casting, etc.)
The Kernel memory page permissions which are here to protect your system and which shall always be honored.
Again, for historical reasons, this is a point where 1. and 2. do not agree. Or to be more clear, 1. is much more permissive than 2. (resulting in your program being killed by the kernel).
So don't be fooled by the compiler, the string literals you are declaring are really constant and you cannot do anything about it!
Considering your pointer str read and write is OK.
However, to write correct code, it should be a const char * and not a char *. With the following change, your example is a valid piece of C:
const char *str = "some string";
str = "some other string";
(const char * pointer to a const string)
In this case, the compiler does not emit any warning. What you write and what will be in memory once the code is executed will match.
Note: A const pointer to a const string being const char *const:
const char *const str = "foo";
The rule of thumb is: always be as constant as possible.
If you need to modify the string, use dynamic allocation (malloc() or better, some higher level string manipulation function such as strdup, etc. from the libc), if you don't need to, use a string literal.
If you know that str will always be read-only, why not declare it as such?
char const * str = NULL;
/* OR */
const char * str = NULL;
Well, actually there is one reason why this may be difficult - when you are passing the string to a read-only function that does not declare itself as such. Suppose you are using an external library that declares this function:
int countLettersInString(char c, char * str);
/* returns the number of times `c` occurs in `str`, or -1 if `str` is NULL. */
This function is well-documented and you know that it will not attempt to change the string str - but if you call it with a constant string, your compiler might give you a warning! You know there is nothing dangerous about it, but your compiler does not.
Why? Because as far as the compiler is concerned, maybe this function does try to modify the contents of the string, which would cause your program to crash. Maybe you rely very heavily on this library and there are lots of functions that all behave like this. Then maybe it's easier not to declare the string as const in the first place - but then it's all up to you to make sure you don't try to modify it.
On the other hand, if you are the one writing the countLettersInString function, then simply make sure the compiler knows you won't modify the string by declaring it with const:
int countLettersInString(char c, char const * str);
That way it will accept both constant and non-constant strings without issue.
One disadvantage of using string-literals is that they have length restrictions.
So you should keep in mind from the document ISO/IEC:9899
(emphasis mine)
5.2.4.1 Translation limits
1 The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
[...]
— 4095 characters in a character string literal or wide string literal (after concatenation)
So If your constant text exceeds this count (What some times throughout may be possible, especially if you write a dynamic webserver in C) you are forbidden to use the string literal approach if you want to stay system independent.
There is no problem in your code as long as you are not planing to modify the contents of that string. Also, the memory for such string literals will remain for the full life time of the program. The memory allocated by malloc is read-write, so you can manipulate the contents of that memory.
If you have a string literal that you do not want to modify, what you are doing is ok:
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Here str a pointer has a memory which it points to. In second line you write to that memory "This is a test" and then again in 3 line you write in that memory "Now I am pointing here". This is legal in C.
You may find it a bit contradicting but you can't modify string that is something like this -
str[0]='X' // will give a problem.
However, if you want to be able to modify it, use it as a buffer to hold a line of input and so on, use malloc:
char *str=malloc(BUFSIZE); // BUFSIZE size what you want to allocate
free(str); // freeing memory
Use malloc() when you don't know the amount of memory needed during compile time.
It is legal in C unfortunately, but any attempt to modify the string literal via the pointer will result in undefined behavior.
Say
str[0] = 'Y'; //No compiler error, undefined behavior
It will run fine, but you may get a warning by the compiler, because you are pointing to a constant string.
P.S.: It will run OK only when you are not modifying it. So the only disadvantage of not using malloc is that you won't be able to modify it.

Pointer scopes in C

In the following code, the explanation for the failure to print anything is that the pointer returned by get_message() is out of scope:
char *get_message() {
char msg [] = "Aren’t pointers fun?";
return msg ;
}
int main (void) {
char *foo = get_message();
puts(foo);
return 0;
}
When run in gdb, it turns out that the data at the position of foo is the string "Aren't pointers fun?":
Old value = 0x0
New value = 0x7fffffffde60 "Aren’t pointers fun?"
(This seems consistent with answers which states that the data for a pointer which passes out of scope remains in memory), but the documentation for "puts" states first data is copied from the address given: presumably 0x7fffffffde60 in this case.
Therefore: why is nothing output?
EDIT: Thanks for your answers:
I ran the original code to completion in gdb, the call to puts does indeed change the data at the address where foo was stored.
(gdb) p foo
$1 = 0x7fffffffde60 "Aren’t pointers fun?"
(gdb) n
11 return 0;
(gdb) p foo
$2 = 0x7fffffffde60 "`\336\377\377\377\177"
Interestingly, the code did print the message when I changed the code for change_msg() to:
char *get_message() {
char *msg = "Aren’t pointers fun?";
return msg ;
}
In this case, the data at foo (address 0x4005f4 - does the smaller size of the address mean anything?) remains the same throughout the code. It'd be cool to find out why this changes the behaviour
The variable msg is allocated on the stack of get_message()
char msg [] = "Aren’t pointers fun?";
Once get_message() returns, the stack for that method is torn down. There is no guarantee at that point of what is in the memory that the pointer returned to foo now points to.
When puts() is called, the stack is likely modified, overwriting "Aren't pointer's fun."
It is likely that calling puts modifies the stack and overwrites the string.
Just returning from get_message leaves the string unchanged, but deallocated, i.e. its memory space is available for reuse.
The real question here is not, "why doesn't it work?". The question is, "Why does the string seem to exist even after the return from get_message, but then still not work?"
To clarify, let's look at the main function again, with two comments for reference:
int main (void) {
char *foo = get_message();
/* point A */
puts(foo);
/* point B */
return 0;
}
I just compiled and ran this under gdb. Indeed, at point A, when I printed out the value of the variable foo in gdb, gdb showed me that it pointed to the string "Aren’t pointers fun?". But then, puts failed to print that string. And then, at point B, if I again printed out foo in gdb, it was no longer the string it had been.
The explanation, as several earlier commenters have explained, is that function get_message leaves the string on the stack, where it's not guaranteed to stay for long. After get_message returns, and before anything else has been called, it's still there. But when we call puts, and puts begins working, it's using that same portion of the stack for its own local storage, so sometime in there (and before puts manages to actually print the string), the string gets destroyed.
In response to the OP's follow-on question: When we had
char *get_message() {
char msg [] = "Aren’t pointers fun?";
return msg ;
}
the string lives in the array msg which is on the stack, and we return a pointer to that array, which doesn't work because the data in the array eventually disappears. If we change it to
char * msg = "Aren’t pointers fun?";
(such a tiny-seeming change!), now the string is stored in the program's initialized data segment, and we return a pointer to that, and since it's in the program's initialized data segment, it sticks around essentially forever. (And yes, the fact that get_message ends up returning a different-looking address is significant, although I wouldn't read too much into whether it's lower or higher.)
The bottom line is that arrays and pointers are different. Hugely hugely different. The line
char arr[] = "Hello, world!";
bears almost no relation to the very similar-looking line
char *ptr = "Hello, world!";
Now, they're the same in that you can do both
printf("%s\n", arr);
and
printf("%s\n", ptr);
But if you try to say
arr = "Goodbye"; /* WRONG */
you can't, because you can't assign to an array. If you want a new string here, you have to use strcpy, and you have to make sure that the new string is the same length or shorter:
strcpy(arr, "Goodbye");
But if you try the strcpy thing with the pointer:
strcpy(ptr, "Goodbye"); /* WRONG */
now that doesn't work, because the string constant that ptr points is nonwritable. In the pointer case, you can (and often must) use simple assignment:
ptr = "Goodbye";
and in this case there's no problem setting it to a longer string, too:
ptr = "Supercalafragalisticexpialadocious";
Those are the basic differences, but as this question points out, another big difference is that the array arr can't be usefully declared in and returned from a function (unless you make it static), while the pointer ptr can.
The lifetime of msg ends when returning from the function get_message. The returned pointer points to the object whose lifetime has ended.
Accessing it yields undefined behaviour. Anything can happen.
In your case, the memory of the former msg seems to be overwritten with 0 already.
And this is not about "scope". You can fix your code by making msg static. This does not change the scope but its lifetime (a.k.a. storage duration).
In your getMessage function, the memory used by your message is on the stack and not on the heap. Its still a pointer, just to a location on the stack. Once the function returns, the stack altered (to get the return ip etc) when means, although the message MIGHT still be in the same location in memory, there is absolutely no guarantee. If anything else puts something on to the stack (such as another function call) then most likely it will be overridden. Your message is gone.
The better approach would be to allocate the memory dynamically with malloc to make certain the string in on the heap (although this leads to the problem of who owns the pointer and is responsible for freeing it.)
If you must do something like this, I have seen it done using static:
static char * message = "I love static pointers";
Edit: despite mentioning that is MIGHT still be on the stack, NEVER EVER ASSUME it is. Most languages won't even allow this.

Regarding initializing a string as an array

I'm told that when initializing a string like so
char str[] = "Hello world!";
The compiler will allocate an area in constants memory(read only for the program) and then copy the string to the array which resides in the stack. My question is, can I read or point to the original string after modifying the copy I'm given, and how? And if not, why does the string even exist outside of the stack in the first place?
It's done this way for space efficiency. When you write:
char str[] = "Hello world!";
it's compiled effectively as if you'd written:
static char str_init[] = "Hello world!";
char str[13];
strncpy(str, str_init, 13);
An alternative way to implement this might be equivalent to:
char str[13];
str[0] = 'H';
str[1] = 'e';
...
str[11] = '!';
str[12] = 0;
But for long strings, this is very inefficient. Instead of 1 byte of static data for each character of the string, it will use a full word of instruction (probably 4 bytes, but maybe more on some architectures) for each character. This will quadruple the size of the initialization data unnecessarily.
Because the program has to remember the string somewhere, i.e., your so-called "constant memory". Otherwise how can it know what values to assign when allocating the variable? Think about a variable with a given initial value. The variable is not allocated until declared. But the initial value must be stored somewhere else.
When this statement is compiled
char str[] = "Hello world!";
the compiler does not keep the string literal in the program. it is used only to initialize the array.
If you want to keep the string literal then you have to write the following way
char *s = "Hello world!";
char str[13];
strcpy( str, s );
When the program runs, "Hello world" will be stored in the constant part of the memory as a string literal, after that, the program will reserve enough space in the stack and copy character by character from the constant part of the memory. Unfortunately, you don't have access to the constant part that stores the string literal because you are telling the program that you want the values to be modifiable (string stored in stack), so it gives what you asked.
Most of your question has been addressed in the other answers, However, I did not see anyone address this one specifically:
Regarding your question: ...can I read or point to the original string after modifying the copy I'm given, and how?
The following sequence demonstrates how you can read the original after modifying a copy:
char str[] = "hello world"; //creates original (stack memory)
char *str2 = 0;//create a pointer (pointer created, no memory allocated)
str2 = StrDup(str); populate pointer with original (memory allocated on heap)
str2[5]=0; //edit copy: results in "hello" (i.e. modified) (modifying a location on the heap)
str; //still contains "hello world" (viewing value on the stack)
EDIT (answering comment question)
The answer above only addressed the specific question about accessing an original string after a copy has been modified. I just showed one possible set of steps to address that. You can edit the original string too:
char str[] = "Hello world!"; //creates location in stack memory called "str",
//and assigns space enough for literal string:
//"Hello world!", 13 spaces in all (including the \0)
strcpy(str, "new string"); //replaces original contents with "new string"
//old contents are no longer available.
So, using these steps, the original values in the variable str are changed, and are no longer available.
The method I outline in my original answer, (at top) shows a way whereby you can make an editable copy, while maintaining the original variable.
In your comment question, you are referring to things such as system memory and constant memory. Normally, system memory refers to RAM implementations on a system (i.e. how much physical memory). By constant memory, my guess is that you are referring to memory used by variables created on the stack. (read on)
First In a development, or run-time environment, there is stack memory. This is usually defaulted to some maximum value, such as 250,000 bytes perhaps. It is a pre-build settable value in most development environments, and is available for use by any variable you create on the stack. Example:
int x[10]; //creates a variable on the stack
//using enough memory space for 10 integers.
int y = 1; //same here, except uses memory for only 1 integer value
Second There is also what is referred to a heap memory. The amount of heap memory is system dependent, the more physical memory your system has available, the more heap memory you can use for variable memory space in your application. Heap memory is used when you dynamically allocate memory, for example using malloc(), calloc(), realloc().
int *x=0; //creates a pointer, no memory allocation yet...
x = malloc(10); //allocates enough memory for 10 integers, but the
//memory allocated is from the _heap_
//and must be freed for use by the system
//when you are done with it.
free(x);
I have marked the original post (above) with indications showing what type of memory each variable is using. I hope this helps.

C Duration of strings, constants, compound literals, and why not, the code itself

I didn't remember where I read, that If I pass a string to a function like.
char *string;
string = func ("heyapple!");
char *func (char *string) {
char *p
p = string;
return p;
}
printf ("%s\n", string);
The string pointer continue to be valid because the "heyapple!" is in memory, it IS in the code the I wrote, so it never will be take off, right?
And about constants like 1, 2.10, 'a'?
And compound literals?
like If I do it:
func (1, 'a', "string");
Only the string will be all of my program execution, or the constans will be too?
For example I learned that I can take the address of string doing it
&"string";
Can I take the address of the constants literals? like 1, 2.10, 'a'?
I'm passing theses to functions arguments and it need to have static duration like strings without the word static.
Thanks a lot.
This doesn't make a whole lot of sense.
Values that are not pointers cannot be "freed", they are values, they can't go away.
If I do:
int c = 1;
The variable 'c' is not a pointer, it cannot do anything else than contain an integer value, to be more specific it can't NOT contain an integer value. That's all it does, there are no alternatives.
In practice, the literals will be compiled into the generated machine-code, so that somewhere in the code resulting from the above will be something like
load r0, 1
Or whatever the assembler for the underlying instruction set looks like. The '1' is a part of the instruction encoding, it can't go away.
Make sure you distinguish between values and pointers to memory. Pointers are themselves values, but a special kind of value that contains an address to memory.
With char* hello = "hello";, there are two things happening:
the string "hello" and a null-terminator are written somewhere in memory
a variable named hello contains a value which is the address to that memory
With int i = 0; only one thing happens:
a variable named i contains the value 0
When you pass around variables to functions their values are always copied. This is called pass by value and works fine for primitive types like int, double, etc. With pointers this is tricky because only the address is copied; you have to make sure that the contents of that address remain valid.
Short answer: yes. 1 and 'a' stick around due to pass by value semantics and "hello" sticks around due to string literal allocation.
Stuff like 1, 'a', and "heyapple!" are called literals, and they get stored in the compiled code, and in memory for when they have to be used. If they remain or not in memory for the duration of the program depends on where they are declared in the program, their size, and the compiler's characteristics, but you can generally assume that yes, they are stored somewhere in memory, and that they don't go away.
Note that, depending on the compiler and OS, it may be possible to change the value of literals, inadvertently or purposely. Many systems store literals in read-only areas (CONST sections) of memory to avoid nasty and hard-to-debug accidents.
For literals that fit into a memory word, like ints and chars it doesn't matter how they are stored: one repeats the literal throughout the code and lets the compiler decide how to make it available. For larger literals, like strings and structures, it would be bad practice to repeat, so a reference should be kept.
Note that if you use macros (#define HELLO "Hello!") it is up to the compiler to decide how many copies of the literal to store, because macro expansion is exactly that, a substitution of macros for their expansion that happens before the compiler takes a shot at the source code. If you want to make sure that only one copy exists, then you must write something like:
#define HELLO "Hello!"
char* hello = HELLO;
Which is equivalent to:
char* hello = "Hello!";
Also note that a declaration like:
const char* hello = "Hello!";
Keeps hello immutable, but not necessarily the memory it points to, because of:
char h = (char) hello;
h[3] = 'n';
I don't know if this case is defined in the C reference, but I would not rely on it:
char* hello = "Hello!";
char* hello2 = "Hello!"; // is it the same memory?
It is better to think of literals as unique and constant, and treat them accordingly in the code.
If you do want to modify a copy of a literal, use arrays instead of pointers, so it's guaranteed a different copy of the literal (and not an alias) is used each time:
char hello[] = "Hello!";
Back to your original question, the memory for the literal "heyapple!" will be available (will be referenceable) as long as a reference is kept to it in the running code. Keeping a whole module (a loadable library) in memory because of a literal may have consequences on overall memory use, but that's another concern (you could also force the unloading of the module that defines the literal and get all kind of strange results).
First,it IS in the code the I wrote, so it never will be take off, right? my answer is yes. I recommend you to have a look at the structure of ELF or runtime structure of executable. The position that the string literal stored is implementation dependent, in gcc, string literal is store in the .rdata segment. As the name implies, the .rdata is read-only. In your code
char *p
p = string;
the pointer p now point to an address in a readonly segment, so even after the end of function call, that address is still valid. But if you try to return a pointer point to a local variable then it is dangerous and may cause hard-to-find bugs:
int *func () {
int localVal = 100;
int *ptr = localVal;
return p;
}
int val = func ();
printf ("%d\n", val);
after the execution of func, as the stack space of func is retrieve by the c runtime, the memory address where localVal was stored will no longer guarantee to hold the original localVal value. It can be overidden by operation following the func.
Back to your question title
-
string literal have static duration.
As for "And about constants like 1, 2.10, 'a'?"
my answer is NO, your can't get address of a integer literal using &1. You may be confused by the name 'integer constant', but 1,2.10,'a' is not right value ! They do not identify a memory place,thus, they don't have duration, a variable contain their value can have duration
compound literals, well, I am not sure about this.

Resources