memory allocation of string literal strcpy - c

int main()
{
char *s;
strcpy(s,"here");
return 0;
}
In the code above, I guess the memory for the string literal is assigned in a global space. Which section does it actually go to, and when? Does the compiler go through and assign it in the program space? Also, if I initialise another string with the same string literal, i.e. ( char *k = "here"; ), will it point to the same memory location?
I am trying to think this through: since I cannot free this location, do I run into any trouble if I have a lot of string initialisations in my code? I guess the only thing I should worry about is the compiler output being too big, since there is no run-time memory allocation in this case?

The exact location depends on the object file format (PE vs. ELF vs. COFF) and any command-line options (some may allow string literals to be stored in a writable memory segment). ELF will store it in the .rodata section, which, as the name implies, is read-only.
Multiple instances of the same string literal may map to the same location, but it's not required AFAIK (I'm not aware of any compiler that creates multiple instances of the same literal, but my experience isn't that broad).
Things that are certain:
Space for string literals is allocated at program startup (usually when the program is loaded into memory) and held until the program terminates;
Attempting to modify the contents of a string literal invokes undefined behavior - your code may segfault, or it may work as intended, or it may reformat your hard drive, or it may trigger the zombie apocalypse.
Note that your code has a bug - you never assign a meaningful address to s, so the strcpy is essentially trying to write the string "here" to a random location, which again is undefined behavior. You may have intended to write
s = "here";
which sets s to point to the literal. If not, then s will either have to be an array large enough to hold the string:
char s[sizeof "here"]; // sizeof evaluated at compile time
or you'll have to allocate that space dynamically:
char *s = malloc( strlen( "here" ) + 1 );
if ( s )
strcpy( s, "here" );
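As a minimal sketch of the dynamic-allocation option, the allocate-then-copy steps can be wrapped in a small helper. The name copy_text is made up for this illustration and is not part of any library:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, writable copy of text,
   or NULL if malloc fails. The caller must free() the result. */
char *copy_text(const char *text)
{
    assert(text != NULL);

    size_t size = strlen(text) + 1;     /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;
}
```

Calling copy_text("here") gives a buffer that can be modified and later passed to free(), while the literal itself is left untouched.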

Related

Is String Literal in C really not modifiable?

As far as I know, a string literal can't be modified for example:
char* a = "abc";
a[0] = 'c';
That would not work since string literal is read-only. I can only modify it if:
char a[] = "abc";
a[0] = 'c';
However, in this post,
Parse $PATH variable and save the directory names into an array of strings, the first answer modified a string literal at these two places:
path_var[j]='\0';
array[current_colon] = path_var+j+1;
I'm not very familiar with C so any explanation would be appreciated.
In programming, there are quite a few rules that are up to you to follow, even though they are not — necessarily — enforced. And "String literals in C are not modifiable" is one of those. So is "Strings returned by getenv should not be modified".
There are some real-world analogies that apply. Here's one: If you're at an intersection, and the light is red, you're not supposed to cross. But, much of the time, if you break the rule, and cross, you might get away with it. You might get a ticket from a policeman — or you might not. You might cause a crash — or you might not. But if you get lucky, and neither of these things happens, that does not imply that crossing the intersection against the red light was okay — it's still quite true that it was very much against the rules.
Similarly, in C, if you write some code that modifies a string literal, or a string returned from getenv, you might get away with it. The compiler might give you a warning or error message — or it might not. Your program might crash — or it might not. But if the program seems to work, that does not imply that these strings are actually modifiable — they're not.
Code blocks from the post you linked:
const char *orig_path_var = getenv("PATH");
char *path_var = strdup(orig_path_var ? orig_path_var : "");
const char **array;
array = malloc((nb_colons+1) * sizeof(*array));
array[0] = path_var;
array[current_colon] = path_var+j+1;
First block:
In the 1st line getenv() returns a pointer to a string, and orig_path_var is set to point to it. The string that getenv() returns should be treated as read-only, as the behaviour is undefined if the program attempts to modify it.
In the 2nd line strdup() is called to make a duplicate of this string. The way strdup() does this is by calling malloc() to allocate memory for the length of the string + 1 and then copying the string into that memory.
Since malloc() is used, the string is stored on the heap, which allows us to edit and modify it.
Second block:
In the 1st line we can see that array points to an array of const char * pointers. There are nb_colons+1 pointers in the array.
Then in the 2nd line the 0th element of array is initialized to path_var (remember it is not the string returned by getenv(), but a copy of it).
In the 3rd line, the current_colonth element of array is set to path_var+j+1. If you don't understand pointer arithmetic, this just means it assigns the address of the char at index j+1 of path_var to array[current_colon].
As you can see, the code is not operating on a string literal or on the read-only string that orig_path_var points to. Instead it uses a copy made with strdup(). This seems to be where your confusion stems from, so take a look at this:
char *strdup(const char *s);
The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).
The above text shows what strdup() does according to its man page.
It may also help to read the malloc() man page.
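To make the idea concrete, here is a rough sketch of splitting a colon-separated string by editing a private copy. It is not the code from the linked post: the names split_path_copy and free_split_path are invented for this example, and the copy is made with malloc() and memcpy() directly so that the sketch does not depend on strdup() being available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies path, replaces each ':' in the copy with '\0',
   and returns a newly allocated array of pointers to the pieces. The number
   of pieces is written to *count. Returns NULL on allocation failure. */
char **split_path_copy(const char *path, size_t *count)
{
    assert(path != NULL && count != NULL);

    size_t size = strlen(path) + 1;
    char *copy = malloc(size);          /* same effect as strdup(path) */
    if (copy == NULL)
        return NULL;
    memcpy(copy, path, size);

    size_t pieces = 1;                  /* n colons give n + 1 pieces */
    for (const char *p = copy; *p != '\0'; ++p)
        if (*p == ':')
            ++pieces;

    char **array = malloc(pieces * sizeof *array);
    if (array == NULL) {
        free(copy);
        return NULL;
    }

    size_t n = 0;
    array[n++] = copy;                  /* first piece starts the buffer */
    for (char *p = copy; *p != '\0'; ++p) {
        if (*p == ':') {
            *p = '\0';                  /* allowed: copy is memory we own */
            array[n++] = p + 1;
        }
    }
    *count = n;
    return array;
}

/* Frees the copied buffer (array[0]) and then the pointer array itself. */
void free_split_path(char **array)
{
    if (array != NULL) {
        free(array[0]);
        free(array);
    }
}
```

The string passed in is only read, so it can safely be the result of getenv() or a string literal.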
In the example
char* a = "abc";
the token "abc" produces a literal object in the program image, and denotes an expression which yields that object's address.
In the example
char a[] = "abc";
The token "abc" serves as an array initializer, and doesn't denote a literal object. It is equivalent to:
char a[] = { 'a', 'b', 'c', 0 };
The individual character values of "abc" are literal data that is recorded somewhere and somehow in the program image, but they are not accessible as a string literal object.
The array a isn't a literal, needless to say. Modifying a doesn't constitute modifying a literal, because it isn't one.
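A small sketch of that point, assuming nothing beyond standard C (the function name is made up): two arrays initialized from the same text are separate objects, and changing one does not affect the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: returns 1 if two arrays initialized from the same
   text turn out to be independent, modifiable objects. */
int arrays_are_independent(void)
{
    char a[] = "abc";               /* same as { 'a', 'b', 'c', 0 } */
    char b[] = "abc";

    assert(sizeof a == 4);          /* three characters plus '\0' */

    a[0] = 'c';                     /* fine: a is an ordinary array */
    return strcmp(a, "cbc") == 0 && strcmp(b, "abc") == 0;
}
```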
Regarding the remark:
That would not work since string literal is read-only.
That isn't accurate. The ISO C standard (no version of it to date) doesn't specify any requirements for what happens if a program tries to modify a string literal. It is undefined behavior. If your implementation stops the program with some diagnostic message, that's because of undefined behavior, not because it is required.
C implementations are not required to support string literal modification, which has benefits such as:
standard-conforming C programs can be translated into images that can be burned into ROM chips, such that their string literals are accessed directly from that ROM image without having to be copied into RAM on start-up.
compilers can condense the storage for string literals by taking advantage of situations when one literal is a suffix of another. The expression "string" + 2 == "ring" can yield true. Since a strictly conforming program will not do something like "ring"[0] = 'w', due to that being undefined behavior, such a program will thereby avoid falling victim to the surprise of "string" unexpectedly turning into "stwing".
There are several reasons why you had better not modify them:
First, the operating system and/or the compiler can enforce the non-writable property of string literals by putting them in read-only memory (e.g. ROM) or in the .text segment.
Second, the compiler is allowed to merge string literals, so if you modify one (and succeed) you can get surprises later, because other literals (merged because, e.g., one of them is a suffix of the other) change for no apparent reason.
If you need an initialized string that is modifiable, you can declare an array, which you can freely modify:
char array[100] = "abc"; // initialized to { 'a' ,'b', 'c', '\0',
// /* and 96 more '\0' characters */
// };
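A short check of what that declaration gives you, as a sketch with a made-up function name: the unused elements are zero, so there is room to append to the string in place.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns 1 if the padded array can be extended in place. */
int padded_array_demo(void)
{
    char array[100] = "abc";

    assert(sizeof array == 100);
    assert(array[3] == '\0' && array[99] == '\0');  /* remainder is zeroed */

    strcat(array, "def");           /* fits easily within 100 bytes */
    return strcmp(array, "abcdef") == 0;
}
```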

When to allocate memory to char *

I am bit confused when to allocate memory to a char * and when to point it to a const string.
Yes, I understand that if I wish to modify the string, I need to allocate it memory.
But in cases when I don't wish to modify the string to which I point and just need to pass the value should I just do the below? What are the disadvantages in the below steps as compared to allocating memory with malloc?
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Let's try your example again with the -Wwrite-strings compiler warning flag. You will see a warning:
warning: initialization discards 'const' qualifier from pointer target type
This is because, with -Wwrite-strings, GCC gives the string literal "This is a test" the type const char[N], which converts to const char *, not char *. (In standard C the type is plain char[N], even though modifying the array is undefined behavior.) So you are losing the constness information when you assign the literal's address to the pointer.
For historical reasons, compilers will allow you to store the address of a string literal, which is effectively constant, in a non-const pointer variable.
This is, however, bad practice, and I suggest you use -Wwrite-strings all the time.
If you want to prove it for yourself, try to modify the string:
char *str = "foo";
str[0] = 'a';
This program's behavior is undefined, but you may see a segmentation fault on many systems.
Running this example with Valgrind, you will see the following:
Process terminating with default action of signal 11 (SIGSEGV)
Bad permissions for mapped region at address 0x4005E4
The problem is that the binary generated by your compiler typically stores string literals in a memory region that is read-only. By trying to write to it, you cause a segmentation fault.
What is important to understand is that you are dealing here with two different systems:
The C typing system which is something to help you to write correct code and can be easily "muted" (by casting, etc.)
The Kernel memory page permissions which are here to protect your system and which shall always be honored.
Again, for historical reasons, this is a point where 1. and 2. do not agree. Or to be more clear, 1. is much more permissive than 2. (resulting in your program being killed by the kernel).
So don't be fooled by the compiler, the string literals you are declaring are really constant and you cannot do anything about it!
As for your pointer str itself, reading it and assigning to it is OK.
However, to write correct code, it should be a const char * and not a char *. With the following change, your example is a valid piece of C:
const char *str = "some string";
str = "some other string";
(const char * is a pointer to a const string)
In this case, the compiler does not emit any warning. What you write and what will be in memory once the code is executed will match.
Note: a const pointer to a const string is written const char *const:
const char *const str = "foo";
The rule of thumb is: always be as constant as possible.
If you need to modify the string, use dynamic allocation (malloc() or better, some higher level string manipulation function such as strdup, etc. from the libc), if you don't need to, use a string literal.
If you know that str will always be read-only, why not declare it as such?
char const * str = NULL;
/* OR */
const char * str = NULL;
Well, actually there is one reason why this may be difficult - when you are passing the string to a read-only function that does not declare itself as such. Suppose you are using an external library that declares this function:
int countLettersInString(char c, char * str);
/* returns the number of times `c` occurs in `str`, or -1 if `str` is NULL. */
This function is well-documented and you know that it will not attempt to change the string str - but if you call it with a constant string, your compiler might give you a warning! You know there is nothing dangerous about it, but your compiler does not.
Why? Because as far as the compiler is concerned, maybe this function does try to modify the contents of the string, which would cause your program to crash. Maybe you rely very heavily on this library and there are lots of functions that all behave like this. Then maybe it's easier not to declare the string as const in the first place - but then it's all up to you to make sure you don't try to modify it.
On the other hand, if you are the one writing the countLettersInString function, then simply make sure the compiler knows you won't modify the string by declaring it with const:
int countLettersInString(char c, char const * str);
That way it will accept both constant and non-constant strings without issue.
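For illustration, one possible body for such a function is sketched below. Only the signature and the documented behaviour (count occurrences, return -1 for NULL) come from the text above; the implementation and the check_count_examples helper are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of times c occurs in str, or -1 if str is NULL.
   The const in the parameter type promises that str is only read. */
int countLettersInString(char c, char const *str)
{
    if (str == NULL)
        return -1;

    int count = 0;
    for (; *str != '\0'; ++str)
        if (*str == c)
            ++count;
    return count;
}

/* Hypothetical usage: both a literal and a writable array are accepted. */
void check_count_examples(void)
{
    char buffer[] = "banana";                       /* non-const array */
    char const *literal = "This is a test";         /* read-only text */

    assert(countLettersInString('a', buffer) == 3);
    assert(countLettersInString('t', literal) == 2);
    assert(countLettersInString('x', NULL) == -1);
}
```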
One disadvantage of using string literals is that the length an implementation is required to support is limited.
So you should keep in mind the following from the document ISO/IEC 9899:
(emphasis mine)
5.2.4.1 Translation limits
1 The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
[...]
— 4095 characters in a character string literal or wide string literal (after concatenation)
So if your constant text exceeds this count (which may sometimes happen, especially if you write a dynamic web server in C), you cannot rely on the string literal approach if you want to stay system-independent.
There is no problem in your code as long as you are not planning to modify the contents of that string. Also, the memory for such string literals will remain for the full lifetime of the program. The memory allocated by malloc is read-write, so you can manipulate the contents of that memory.
If you have a string literal that you do not want to modify, what you are doing is ok:
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Here str is a pointer. In the second line you make it point to the literal "This is a test", and in the third line you make it point to a different literal, "Now I am pointing here". Nothing is copied or overwritten; only the pointer changes. This is legal in C.
You may find it a bit contradictory, but you can't modify the string itself, for example:
str[0]='X'; // will give a problem.
However, if you want to be able to modify it, use it as a buffer to hold a line of input and so on, use malloc:
char *str=malloc(BUFSIZE); // BUFSIZE size what you want to allocate
free(str); // freeing memory
Use malloc() when you don't know the amount of memory needed during compile time.
It is legal in C unfortunately, but any attempt to modify the string literal via the pointer will result in undefined behavior.
Say
str[0] = 'Y'; //No compiler error, undefined behavior
Your original code will run fine, but you may get a warning from the compiler, because you are pointing a char * at a constant string.
P.S.: It will run OK only as long as you do not modify the string. So the only disadvantage of not using malloc is that you won't be able to modify it.

Regarding initializing a string as an array

I'm told that when initializing a string like so
char str[] = "Hello world!";
The compiler will allocate an area in constants memory(read only for the program) and then copy the string to the array which resides in the stack. My question is, can I read or point to the original string after modifying the copy I'm given, and how? And if not, why does the string even exist outside of the stack in the first place?
It's done this way for space efficiency. When you write:
char str[] = "Hello world!";
it's compiled effectively as if you'd written:
static char str_init[] = "Hello world!";
char str[13];
strncpy(str, str_init, 13);
An alternative way to implement this might be equivalent to:
char str[13];
str[0] = 'H';
str[1] = 'e';
...
str[11] = '!';
str[12] = 0;
But for long strings, this is very inefficient. Instead of 1 byte of static data for each character of the string, it will use a full word of instruction (probably 4 bytes, but maybe more on some architectures) for each character. This will quadruple the size of the initialization data unnecessarily.
Because the program has to remember the string somewhere, i.e., your so-called "constant memory". Otherwise how can it know what values to assign when allocating the variable? Think about a variable with a given initial value. The variable is not allocated until declared. But the initial value must be stored somewhere else.
When this statement is compiled
char str[] = "Hello world!";
the compiler does not keep the string literal in the program as a separate object. It is used only to initialize the array.
If you want to keep the string literal then you have to write the following way
char *s = "Hello world!";
char str[13];
strcpy( str, s );
When the program runs, "Hello world" will be stored in the constant part of the memory as a string literal, after that, the program will reserve enough space in the stack and copy character by character from the constant part of the memory. Unfortunately, you don't have access to the constant part that stores the string literal because you are telling the program that you want the values to be modifiable (string stored in stack), so it gives what you asked.
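A minimal sketch of keeping a handle on the original text while editing the array (the function name is invented for this example): a separate pointer refers to a string literal, and the array is changed independently of it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: edits the array copy, then returns the pointer that
   still refers to the unmodified literal. */
const char *original_after_edit(void)
{
    const char *original = "Hello world!";  /* points at a string literal */
    char str[] = "Hello world!";            /* separate, writable array */

    str[0] = 'J';
    assert(strcmp(str, "Jello world!") == 0);

    /* Returning original is safe: string literals have static storage
       duration. Returning str would not be, since the array is local. */
    return original;
}
```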
Most of your question has been addressed in the other answers, However, I did not see anyone address this one specifically:
Regarding your question: ...can I read or point to the original string after modifying the copy I'm given, and how?
The following sequence demonstrates how you can read the original after modifying a copy:
char str[] = "hello world"; //creates original (stack memory)
char *str2 = 0;//create a pointer (pointer created, no memory allocated)
str2 = strdup(str); //populate pointer with a copy of the original (memory allocated on heap)
str2[5]=0; //edit copy: results in "hello" (i.e. modified) (modifying a location on the heap)
str; //still contains "hello world" (viewing value on the stack)
EDIT (answering comment question)
The answer above only addressed the specific question about accessing an original string after a copy has been modified. I just showed one possible set of steps to address that. You can edit the original string too:
char str[] = "Hello world!"; //creates location in stack memory called "str",
//and assigns space enough for literal string:
//"Hello world!", 13 spaces in all (including the \0)
strcpy(str, "new string"); //replaces original contents with "new string"
//old contents are no longer available.
So, using these steps, the original values in the variable str are changed, and are no longer available.
The method I outline in my original answer, (at top) shows a way whereby you can make an editable copy, while maintaining the original variable.
In your comment question, you are referring to things such as system memory and constant memory. Normally, system memory refers to RAM implementations on a system (i.e. how much physical memory). By constant memory, my guess is that you are referring to memory used by variables created on the stack. (read on)
First In a development, or run-time environment, there is stack memory. This is usually defaulted to some maximum value, such as 250,000 bytes perhaps. It is a pre-build settable value in most development environments, and is available for use by any variable you create on the stack. Example:
int x[10]; //creates a variable on the stack
//using enough memory space for 10 integers.
int y = 1; //same here, except uses memory for only 1 integer value
Second There is also what is referred to as heap memory. The amount of heap memory is system dependent: the more physical memory your system has available, the more heap memory you can use for variable memory space in your application. Heap memory is used when you dynamically allocate memory, for example using malloc(), calloc(), realloc().
int *x=0; //creates a pointer, no memory allocation yet...
x = malloc(10 * sizeof *x); //allocates enough memory for 10 integers, but the
//memory allocated is from the _heap_
//and must be freed for use by the system
//when you are done with it.
free(x);
I have marked the original post (above) with indications showing what type of memory each variable is using. I hope this helps.
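Put together as a compilable sketch (the function name is made up), the two kinds of storage look like this. Note that malloc() takes a size in bytes, so the element count is multiplied by the element size:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical demo: fills a stack array and a heap array with 0..9 and
   returns the sum of both, or -1 if the heap allocation fails. */
int sum_stack_and_heap(void)
{
    int on_stack[10];                               /* automatic storage */
    int *on_heap = malloc(10 * sizeof *on_heap);    /* room for 10 ints */
    if (on_heap == NULL)
        return -1;

    assert(sizeof on_stack == 10 * sizeof(int));

    int total = 0;
    for (int i = 0; i < 10; ++i) {
        on_stack[i] = i;
        on_heap[i] = i;
        total += on_stack[i] + on_heap[i];
    }

    free(on_heap);      /* heap memory must be released explicitly */
    return total;       /* on_stack disappears when the function returns */
}
```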

C Duration of strings, constants, compound literals, and why not, the code itself

I don't remember where I read that if I pass a string to a function like this:
char *string;
string = func ("heyapple!");
char *func (char *string) {
char *p;
p = string;
return p;
}
printf ("%s\n", string);
The string pointer continues to be valid because "heyapple!" is in memory; it IS in the code that I wrote, so it will never be taken away, right?
And about constants like 1, 2.10, 'a'?
And compound literals?
like If I do it:
func (1, 'a', "string");
Will only the string last for my whole program's execution, or will the constants too?
For example, I learned that I can take the address of a string like this:
&"string";
Can I take the address of constant literals like 1, 2.10, 'a'?
I'm passing these as function arguments and they need to have static duration like strings, without the word static.
Thanks a lot.
This doesn't make a whole lot of sense.
Values that are not pointers cannot be "freed", they are values, they can't go away.
If I do:
int c = 1;
The variable 'c' is not a pointer, it cannot do anything else than contain an integer value, to be more specific it can't NOT contain an integer value. That's all it does, there are no alternatives.
In practice, the literals will be compiled into the generated machine-code, so that somewhere in the code resulting from the above will be something like
load r0, 1
Or whatever the assembler for the underlying instruction set looks like. The '1' is a part of the instruction encoding, it can't go away.
Make sure you distinguish between values and pointers to memory. Pointers are themselves values, but a special kind of value that contains an address to memory.
With char* hello = "hello";, there are two things happening:
the string "hello" and a null-terminator are written somewhere in memory
a variable named hello contains a value which is the address to that memory
With int i = 0; only one thing happens:
a variable named i contains the value 0
When you pass around variables to functions their values are always copied. This is called pass by value and works fine for primitive types like int, double, etc. With pointers this is tricky because only the address is copied; you have to make sure that the contents of that address remain valid.
Short answer: yes. 1 and 'a' stick around due to pass by value semantics and "hello" sticks around due to string literal allocation.
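Both cases can be seen in a short sketch. The function names are invented for the example:

```c
#include <assert.h>
#include <string.h>

/* Receives a copy of the caller's int; changing it has no effect outside. */
static void bump(int n)
{
    n = n + 1;
    (void)n;
}

/* Returns the address of a string literal. The pointer value is copied to
   the caller, and the characters it points to exist for the whole run. */
static const char *greeting(void)
{
    return "heyapple!";
}

/* Hypothetical demo: returns 1 if both observations hold. */
int pass_by_value_demo(void)
{
    int i = 1;
    bump(i);
    assert(i == 1);                     /* the caller's variable is unchanged */

    const char *s = greeting();
    return strcmp(s, "heyapple!") == 0; /* still valid after the call */
}
```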
Stuff like 1, 'a', and "heyapple!" are called literals, and they get stored in the compiled code, and in memory for when they have to be used. If they remain or not in memory for the duration of the program depends on where they are declared in the program, their size, and the compiler's characteristics, but you can generally assume that yes, they are stored somewhere in memory, and that they don't go away.
Note that, depending on the compiler and OS, it may be possible to change the value of literals, inadvertently or purposely. Many systems store literals in read-only areas (CONST sections) of memory to avoid nasty and hard-to-debug accidents.
For literals that fit into a memory word, like ints and chars it doesn't matter how they are stored: one repeats the literal throughout the code and lets the compiler decide how to make it available. For larger literals, like strings and structures, it would be bad practice to repeat, so a reference should be kept.
Note that if you use macros (#define HELLO "Hello!") it is up to the compiler to decide how many copies of the literal to store, because macro expansion is exactly that, a substitution of macros for their expansion that happens before the compiler takes a shot at the source code. If you want to make sure that only one copy exists, then you must write something like:
#define HELLO "Hello!"
char* hello = HELLO;
Which is equivalent to:
char* hello = "Hello!";
Also note that a declaration like:
const char* hello = "Hello!";
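Prevents modification of the characters through hello (the pointer hello itself can still be reassigned), but does not by itself protect the memory it points to, because the const can be cast away:
char *h = (char *) hello;
h[3] = 'n'; /* compiles, but modifying the literal is undefined behavior */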
The C standard leaves it unspecified whether the two pointers below refer to the same memory, so I would not rely on it either way:
char* hello = "Hello!";
char* hello2 = "Hello!"; // is it the same memory?
It is better to think of literals as unique and constant, and treat them accordingly in the code.
If you do want to modify a copy of a literal, use arrays instead of pointers, so it's guaranteed a different copy of the literal (and not an alias) is used each time:
char hello[] = "Hello!";
Back to your original question, the memory for the literal "heyapple!" will be available (will be referenceable) as long as a reference is kept to it in the running code. Keeping a whole module (a loadable library) in memory because of a literal may have consequences on overall memory use, but that's another concern (you could also force the unloading of the module that defines the literal and get all kind of strange results).
First, "it IS in the code that I wrote, so it will never be taken away, right?" My answer is yes. I recommend you have a look at the structure of ELF, or the runtime layout of an executable. Where the string literal is stored is implementation-dependent; with gcc on ELF targets it is stored in the .rodata section (the equivalent PE/COFF section is named .rdata). As the name implies, it is read-only. In your code
char *p;
p = string;
the pointer p now points to an address in a read-only section, so even after the function call ends, that address is still valid. But returning a pointer to a local variable is dangerous and may cause hard-to-find bugs:
int *func (void) {
int localVal = 100;
int *ptr = &localVal;
return ptr; /* bug: localVal no longer exists once func returns */
}
int *val = func ();
printf ("%d\n", *val); /* undefined behavior: val is a dangling pointer */
After func returns, the stack space it used is reclaimed by the C runtime, so the address where localVal was stored is no longer guaranteed to hold the original localVal value. It can be overwritten by whatever runs after func.
Back to your question title: string literals have static storage duration.
As for "And about constants like 1, 2.10, 'a'?"
My answer is no, you can't get the address of an integer literal using &1. You may be confused by the name 'integer constant', but 1, 2.10 and 'a' are not lvalues. They do not identify a memory location, so they don't have a duration; a variable that contains their value can have a duration.
compound literals, well, I am not sure about this.
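On the compound literal point, my understanding of C99 and later is that a compound literal is an lvalue, so its address can be taken. One written inside a function has automatic storage duration tied to the enclosing block, while one at file scope has static storage duration. A small sketch, with made-up function names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads an int through a pointer. */
int read_int(const int *p)
{
    assert(p != NULL);
    return *p;
}

/* Hypothetical demo: returns 1 if both address-of expressions behave as
   described in the comments. */
int address_demo(void)
{
    /* A string literal is an lvalue of type char[7] here, so & is allowed. */
    char (*ps)[7] = &"string";

    /* A compound literal is also an lvalue. This one lives until the end of
       the enclosing block, which is long enough for the call. */
    int from_literal = read_int(&(int){ 42 });

    /* &1 or &'a' would not compile: those constants are not lvalues. */
    return (*ps)[0] == 's' && from_literal == 42;
}
```

So a compound literal can stand in for a constant when a function needs an address, as long as the pointer is not kept beyond the block in which the literal appears.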

Advantages and disadvantages of using strdup on a string literal

I want to be clear about all the advantages/disadvantages of the following code:
{
char *str1 = strdup("some string");
char *str2 = "some string";
free(str1);
}
str1:
You can modify the contents of the string
str2:
You don't have to use free()
Faster
Any other differences?
Use neither if you can and avoid it by one of the following
static char const str3[] = { "some string" };
char str4[] = { "some string" };
str3 if you never plan to modify it and str4 if you do.
str3 ensures that no other function in your program can modify your string (string literals may be shared and mutable). str4 allocates a constant sized array on the stack, so allocation and deallocation comes with no overhead. The system has just to copy your data.
Using the original string - whether it's a literal in the source, part of a memory-mapped file, or even an allocated string "owned" by another part of your program - has the advantage of saving memory, and possibly eliminating ugly error conditions you'd otherwise have to handle if you performed an allocation (which could fail). The disadvantage, of course, is that you have to keep track of the fact that this string is not "owned" by the code currently using it, and thus that it cannot be modified/freed. Sometimes this means you need a flag in a structure to indicate whether a string it uses was allocated for the structure or not. With smaller programs, it might just mean you have to manually follow the logic of string ownership through several functions and make sure it's correct.
By the way, if the string is going to be used by a structure, one nice way to get around having to keep a flag marking whether it was allocated for the structure or not is to allocate space for the structure and the string (if needed) with a single call to malloc. Then, freeing the structure always just works, regardless of whether the string was allocated for the structure or assigned from a string literal or other source.
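Here is a rough sketch of that single-allocation idea, using a C99 flexible array member. The struct and function names are invented for the example; other layouts would work too.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct item {
    const char *name;   /* borrowed text, or a pointer to storage[] below */
    char storage[];     /* C99 flexible array member, may be empty */
};

/* The item refers to text owned by someone else (e.g. a string literal). */
struct item *item_borrow(const char *name)
{
    assert(name != NULL);

    struct item *it = malloc(sizeof *it);
    if (it != NULL)
        it->name = name;
    return it;
}

/* The item carries its own copy of the text in the same allocation. */
struct item *item_copy(const char *name)
{
    assert(name != NULL);

    size_t size = strlen(name) + 1;
    struct item *it = malloc(sizeof *it + size);
    if (it != NULL) {
        memcpy(it->storage, name, size);
        it->name = it->storage;
    }
    return it;
}
```

In both cases a single free() on the struct pointer releases everything the item owns, so no ownership flag is needed.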
str1: strdup is not part of C89 or C99, so it is not ANSI C and not strictly portable (it comes from POSIX).
str2: is portable, and str2 is implicitly const.

Resources