On this answer by michael-burr on this question:
what-is-the-type-of-string-literals-in-c-and-c
I found that
In C the type of a string literal is a char[] - it's not const according to the type, but it is undefined behavior to modify the contents
From this I gather that a string such as "How are you" can't be modified (as with char *c = "how are you?"), but once it is used to initialize a char[], that array can be modified unless it is declared const.
Apart from this from that answer:
The multibyte character sequence is then used to initialize an array of static storage duration
and from C Primer Plus 6th Edition I found:
Character string constants are placed in the static storage class, which means that if you use
a string constant in a function, the string is stored just once and lasts for the duration of the
program, even if the function is called several times
But when I tried this code:
#include <stdio.h>
void fun() {
    char c[] = "hello";
    printf("%s\n", c);
    c[2] = 'x';
}
int main(void) {
    fun();
    fun();
    return 0;
}
The array inside function fun doesn't behave as if it has retained the changed value.
Where am I going wrong on this?
char c[]="hello"; is not at all the same thing as char *c="hello";. The latter initializes a pointer to the aforementioned static string storage, so modifying c[2] through it would be undefined behavior. The former is equivalent to:
char c[] = {'h', 'e', 'l', 'l', 'o', '\0'};
It initializes an array with automatic storage duration (typically on the stack); it does not create a reference or pointer to the static string in some other memory location. Like any other non-const automatic array, you can modify it however you please (as long as you don't go out of bounds).
You’re not modifying the string literal. You’re modifying a local array that contains a copy of the string literal. Each time you call fun, a new instance of c is created and initialized. When fun exits, that instance of c ceases to exist.
Because it is an automatic variable and its instance is initialized every time you call the function.
Change it to have static storage duration:
static char c[]="hello";
And it will behave as you expect, keeping the changed value between function calls.
This is something completely different from how string literals and compound literals are stored and used in variable initialization. That is left to the implementation: for example, the initialization of an automatic variable may be done by copying data from the .rodata segment, or it may be just a couple of immediate store instructions, in which case the literal's bytes end up in the .text segment.
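The difference between the automatic and the static array can be checked directly. The following is a minimal sketch; the helper names automatic_third_char and static_third_char are hypothetical and not part of the original question. Each helper reports what c[2] held on entry and then overwrites it.

```c
/* Automatic array: a fresh copy of "hello" is made on every call,
   so the write below never survives into the next call. */
char automatic_third_char(void) {
    char c[] = "hello";
    char before = c[2];   /* value seen on entry */
    c[2] = 'x';           /* modifies only this call's copy */
    return before;
}

/* Static array: initialized once before the program starts,
   so the write below is still visible on the next call. */
char static_third_char(void) {
    static char c[] = "hello";
    char before = c[2];   /* value seen on entry */
    c[2] = 'x';           /* persists between calls */
    return before;
}
```

Calling automatic_third_char twice returns 'l' both times, while static_third_char returns 'l' on the first call and 'x' on the second.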
Related
I've been reading in various sources that string literals remain in memory for the whole lifetime of the program. In that case, what is the difference between those two functions
char *f1() { return "hello"; }
char *f2() {
char str[] = "hello";
return str;
}
While f1 compiles fine, f2 complains that I'm returning stack allocated data. What happens here?
if the str points to the actual string literal (which has static duration), why do I get an error?
if the string literal is copied to the local variable str, where does the original string literal go? does it remain in memory with no reference to it?
I've been reading in various sources that string literals remain in
memory for the whole lifetime of the program.
Yes.
In that case, what is
the difference between those two functions
char *f1() { return "hello"; }
char *f2() {
char str[] = "hello";
return str;
}
f1 returns a pointer to the first element of the array represented by a string literal, which has static storage duration. f2 returns a pointer to the first element of the automatic array str. str has a string literal for an initializer, but it is a separate object.
While f1 compiles fine, f2 complains that I'm returning stack
allocated data. What happens here?
if the str points to the actual string literal (which has static duration), why do I get an error?
It does not. In fact, it itself does not point to anything. It is an array, not a pointer.
if the string literal is copied to the local variable str, where does the original string literal go? does it remain in memory with no
reference to it?
C does not specify, but in practice, yes, some representation of the string literal must be stored somewhere in the program, perhaps in the function implementation, because it needs to be used to initialize str anew each time f2 is called.
This
char str[] = "hello";
is a declaration of a local array that is initialized by the string literal "hello".
In fact it is the same as if you declared the array the following way
char str[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
That is, the array's own area of memory (with automatic storage duration) is initialized by the string literal.
After exiting the function the array will not be alive.
That is the function
char *f2() {
char str[] = "hello";
return str;
}
tries to return a pointer to the first element of the local character array str that has the automatic storage duration.
As for this function definition
char *f1() { return "hello"; }
then the function returns a pointer to the first character of the string literal "hello" that indeed has the static storage duration.
You may imagine the first function definition the following way
char literal[] = "hello";
char *f1() { return literal; }
Now compare where the arrays are defined in the first function definition and in the second function definition.
In the first function definition the array literal is defined globally while in the second function definition the array str is defined locally.
if the str points to the actual string literal (which has static
duration), why do I get an error?
str is not a pointer. It is a named extent of memory that was initialized by a string literal. That is, the array has the type char[6].
In the return statement
return str;
the array is implicitly converted to a pointer to its first element, of type char *.
Functions in C and C++ may not return arrays. In C++ functions may return references to arrays.
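The array-versus-pointer distinction described above can be observed with sizeof. This is a minimal sketch with hypothetical helper names (size_of_array, size_of_pointer, decays_to_first_element); it only illustrates the types involved and is not taken from the question.

```c
#include <stddef.h>

/* str is an array of type char[6]: five letters plus the terminating '\0'. */
size_t size_of_array(void) {
    char str[] = "hello";
    return sizeof str;
}

/* ptr is a pointer object; its size is that of a pointer,
   whatever the length of the literal it points to. */
size_t size_of_pointer(void) {
    const char *ptr = "hello";
    return sizeof ptr;
}

/* In most expressions the array is converted to a pointer
   to its first element. */
int decays_to_first_element(void) {
    char str[] = "hello";
    char *p = str;            /* implicit array-to-pointer conversion */
    return p == &str[0];
}
```

size_of_array returns 6, size_of_pointer returns sizeof(char *) on the platform in use, and decays_to_first_element returns 1.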
The string that you will see on your stack is not a direct result of the presence of a string literal. The string is stored, in case of ELF, in a separate region of the executable binary called "string table section", along with other string literals that the linker meets during the linking process. Whenever the stack context of the code that actually caused a string to be included is instantiated, the contents of the string in string table section are actually copied to the stack.
A brief reading that you might be interested in:
http://refspecs.linuxbase.org/elf/gabi4+/ch4.strtab.html
char str[] = "hello"; is a special syntax which copies the string literal, and your function returns a pointer to this local variable, which is destroyed once the function returns.
char *f1() { return "hello"; } is correct but returning const char* would probably be better.
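Two ways of handing a string back to a caller without returning a pointer to a dead local array might look like the sketch below. The function names greeting_literal and greeting_copy are hypothetical, and the caller-supplied buffer is just one possible design.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the literal itself. The array behind the literal
   has static storage duration, so the pointer stays valid; const stops
   callers from writing through it. */
const char *greeting_literal(void) {
    return "hello";
}

/* Copies the text into a buffer owned by the caller, so the result can be
   modified. Returns buf, or NULL if the buffer is missing or too small. */
char *greeting_copy(char *buf, size_t size) {
    static const char text[] = "hello";
    if (buf == NULL || size < sizeof text) {
        return NULL;
    }
    memcpy(buf, text, sizeof text);   /* includes the terminating '\0' */
    return buf;
}
```

With char buf[6]; a call to greeting_copy(buf, sizeof buf) fills buf with a writable copy whose lifetime is controlled by the caller.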
In this question it was said in the comments:
char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; and char arr[10] =
"Hello"; are strictly the same thing. – Michael Walz
This got me thinking.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
char a = 'a'; is the same thing as char a; ...; a = 'a';, so your thoughts are correct 'a' is simply written to a
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
EDIT:
I see that I haven't made it clear enough that I am particularly interested in the memory usage/storage duration of the literals. I will leave the question as it is, but would like to make the emphasis of the question more clear in this edit.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
Yes, but string literals are also a grammatical item in the C language. char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; is not a string literal, it is an initializer list. The initializer list does however behave as if it has static storage duration, remaining elements after the explicit \0 are set to zero etc.
The initializer list itself is stored in some manner of ROM memory. If your variable arr has static storage duration too, it will get allocated in the .data segment and initialized from the ROM init list before the program is started. If arr has automatic storage duration (local), then it is initialized from ROM in run-time, when the function containing arr is called.
The ROM memory where the initializer list is stored may or may not be the same ROM memory as used for string literals. Often there's a segment called .rodata where these things end up, but they may as well end up in some other segment, such as the code segment .text.
Compilers like to store string literals in a particular memory segment, because that means that they can perform an optimization called "string pooling". Meaning that if you have the string literal "Hello" several times in your program, the compiler will use the same memory location for it. It may not necessarily do this same optimization for initializer lists.
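Whether pooling happens can be probed by comparing addresses. The sketch below uses hypothetical helper names; the result of literals_are_pooled depends on the compiler and its options, and either answer is permitted by the standard.

```c
/* Returns 1 if the two identical literals share one address, 0 otherwise.
   The C standard leaves this unspecified. */
int literals_are_pooled(void) {
    const char *first = "Hello";
    const char *second = "Hello";
    return first == second;
}

/* Arrays initialized from the same literal are always separate objects,
   each holding its own copy of the characters. */
int arrays_are_distinct(void) {
    char a[] = "Hello";
    char b[] = "Hello";
    return &a[0] != &b[0];
}
```

arrays_are_distinct always returns 1, which is why writing to one array never affects the other.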
Regarding 'a' versus {'a'} in an initializer list, that's just a syntax hiccup in the C language. C11 6.7.9/11:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. The
initial value of the object is that of the expression (after conversion); the same type
constraints and conversions as for simple assignment apply,
In plain English, this means that a "non-array" (scalar) can be either initialized with or without braces, it has the same meaning. Apart from that, the same rules as for regular assignment apply.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
Yes. But with char arr[10] = "Hello";, you are copying the string literal to an array arr and there's no need to "keep" the string literal. So if an implementation chooses to remove the string literal altogether after copying it to arr, that's totally valid.
But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Again there's no need to make/store a string literal for this.
Only if you directly have a pointer to a string literal, it'd be usually stored somewhere such as:
char *string = "Hello, world!\n";
Even then an implementation can choose not to do so under the "as-if" rule. E.g.,
#include <stdio.h>
#include <string.h>
static const char *str = "Hi";
int main(void)
{
    char arr[10];
    strcpy(arr, str);
    puts(arr);
}
"Hi" can be eliminated because it's used only for copying into arr and isn't accessed directly anywhere. So the compiler may eliminate the string literal (and the strcpy call too), as if you had written char arr[10] = "Hi";, and that wouldn't affect the observable behaviour.
Basically the C standard doesn't necessitate a string literal has to be stored anywhere as long as the properties associated with a string literal are satisfied.
Are there differences between: char a = 'a'; char a = {'a'}; How/where are the differences defined?
No. C11, 6.7.9 says:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. [..]
Per the syntax, even:
char c = {'a',}; is valid and equivalent too (though I wouldn't recommend this :).
In the abstract machine, char arr[10] = "Hello"; means that arr is initialized by copying data from the string literal "Hello" which has its own existence elsewhere; whereas the other version just has initial values like any other variable -- there is no string literal involved.
However, the observable behaviour of both versions is identical: arr is created with its values set as specified. This is what the other poster meant by the code being identical; according to the Standard, two programs are the same if they have the same observable behaviour. Compilers are allowed to generate the same assembly for both versions.
Your second question is entirely separate to the first; but char a = 'a'; and char a = {'a'}; are identical. A single initializer may optionally be enclosed in braces.
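That the two array initializers produce the same contents, and that braces around a scalar initializer change nothing, can be checked with a short sketch. The helper names below are hypothetical.

```c
#include <string.h>

/* Both forms fill the first six elements with 'H','e','l','l','o','\0'
   and set the remaining four elements to zero. */
int array_initializers_match(void) {
    char from_literal[10] = "Hello";
    char from_list[10] = { 'H', 'e', 'l', 'l', 'o', '\0' };
    return memcmp(from_literal, from_list, sizeof from_literal) == 0;
}

/* A scalar initializer may optionally be enclosed in braces. */
int scalar_initializers_match(void) {
    char plain = 'a';
    char braced = { 'a' };
    return plain == braced;
}
```

Both functions return 1. This compares only the resulting objects; it says nothing about where the compiler keeps the initial values.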
I believe your question is highly implementation dependent (hardware- and compiler-wise). However, in general, arrays are placed in RAM, whether they are global or not.
I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.
Yes, this saves the string "Hello" in ROM (read-only memory). Your array is loaded with the literal at run time.
But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Yes, but in this case the single characters are placed in ROM. The array you are initializing is loaded with the character literals at run time.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
If you use UTF-8, then yes, since char == uint8_t and those are the values.
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
I believe not.
In reply to edit
Do you mean the lifetime of storage of string literals? Have a look at this.
So a string literal has static storage duration. It remains throughout the lifetime of the program, hardcoded in memory.
Somewhere I read the following lines:
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].
A: It turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the declaration initializes p to point to the unnamed array's first element.
I know what static does, but I did not understand the following phrase in the lines above:
static array of characters.
This does not refer to the static keyword, but static in the sense that it cannot be changed.
EDIT: Thinking better, it seems this phrase was badly written, I think the author back then (for those wondering, this comes from the C faq) meant "constant"
EDIT2: OP asked what is a string literal, here is the answer:
A string literal is a string that is hardcoded in your source (and later in your compiled program). You write one using double quotes; an example would be "some string literal here".
When you assign this to a pointer, the pointer points to the string literal, which is stored with your program's code rather than in ordinary writable memory; this is why it cannot be modified.
You can also use a string literal to initialize an array. The meaning there is different: the array gets its own memory and has that string as its initial value.
Mind you, a string literal must be inside double quotes " if you attempt other hacks it won't compile at all. You cannot for example do this: char* someVar = {'f', 'o', 'o', '\0'}; it won't work at all. (my compiler gives the error: excess elements in scalar initializer)
"Static" refers to the storage duration of the object that will be created for the string literal.
To quote C99 6.4.5:
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.
Simply string literals refer to string constants about which C11 standard says that:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
A string literal can't change during program execution, while string variables can. String variables are arrays of characters whose last element is a NUL character (\0).
All strings (variables) are arrays of characters, but not all character arrays are strings.
When the compiler encounters a string literal, it typically stores it in a read-only section of memory. Here the word static is used to mean unmodifiable, not the keyword static.
A string literal:
char *string_literal = "string literal";
or, conceptually (this form is not valid C for a pointer; it only shows the characters the pointer refers to):
char *string_literal = {'s','t','r','i','n','g',' ','l','i','t','e','r','a','l','\0'};
A string variable
char string_var[] = "string variable";
or it can also be seen as
char string_var[] = {'s','t','r','i','n','g',' ','v','a','r','i','a','b','l','e', '\0'};
A character array:
char character_array[] = {'c','h','a','r','a','c','t','e','r',' ', 'a', 'r', 'r', 'a', 'y'};
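The difference between a string and a plain character array comes down to the terminating '\0'. A minimal sketch of a check for it follows; has_terminator is a hypothetical helper, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first size bytes of buf contain a '\0', meaning the
   buffer holds a string that is safe to pass to functions such as
   strlen or printf("%s"). Returns 0 otherwise. */
int has_terminator(const char *buf, size_t size) {
    return memchr(buf, '\0', size) != NULL;
}
```

For char text[] = "abc"; sizeof text is 4 and has_terminator(text, sizeof text) returns 1. For char raw[] = {'a', 'b', 'c'}; sizeof raw is 3 and the check returns 0, so raw must not be treated as a string.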
I know that a string literal used in a program is stored in a read-only area, e.g.
//global
const char *s="Hello World \n";
Here the string literal "Hello World\n" is stored in a read-only area of the program.
Now suppose I declare a literal in the body of a function, like
void func1(char *name)
{
    const char *s="Hello World\n";
}
As the local variables of a function are stored in that function's activation record, is the same true for string literals? Again, assume I call func1 from some function func2 as
void func2(void)
{
    //code
    char *s="Mary\n";
    //call1
    func1(s);
    //call2
    func1("Charles");
    //code
}
In the first call of func1 from func2 above, the starting address of 's' is passed, i.e. the address of s[0], while in the second call I am not sure what actually happens. Where is the string literal "Charles" stored? Is some temporary created by the compiler and its address passed, or does something else happen?
I found literals get storage in "read-only-data" section from
String literals: Where do they go?
but I am unclear about whether that happens only for global literals or also for literals local to some function. Any insight will be appreciated. Thank you.
A C string literal represents an array object of type char[len+1], where len is the length, plus 1 for the terminating '\0'. This array object has static storage duration, meaning that it exists for the entire execution of the program. This applies regardless of where the string literal appears.
The literal itself is an expression of type char[len+1]. (In most but not all contexts, it will be implicitly converted to a char* value pointing to the first character.)
Compilers may optimize this by, for example, storing identical string literals just once, or by not storing them at all if they're never referenced.
If you write this:
const char *s="Hello World\n";
inside a function, the literal's meaning is as I described above. The pointer object s is initialized to point to the first character of the array object.
For historical reasons, string literals are not const in C, but attempting to modify the corresponding array object has undefined behavior. Declaring the pointer const, as you've done here, is not required, but it's an excellent idea.
Where string literals (or rather, the character arrays they are compiled to) are located in memory is an implementation detail in the compiler, so if you're thinking about what the C standard guarantees, they could be in a number of places, and string literals used in different ways in the program could end up in different places.
But in practice most compilers will treat all string literals the same, and they will probably all end up in a read-only segment. So string literals used as function arguments, or used inside functions, will be stored in the same place as the "global" ones.
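Because the array behind a literal has static storage duration wherever the literal appears, a pointer to a literal passed as an argument stays valid after the call returns. The sketch below illustrates this with hypothetical names (saved_name, remember_name, recall_name, remember_literal).

```c
#include <stddef.h>

/* Holds a pointer that outlives the call in which it was stored. */
static const char *saved_name = NULL;

/* Stores the pointer only, not a copy of the characters. */
void remember_name(const char *name) {
    saved_name = name;
}

const char *recall_name(void) {
    return saved_name;
}

/* The literal appears inside a function body, yet no temporary is created:
   the pointer passed here remains valid after this function returns. */
void remember_literal(void) {
    remember_name("Charles");
}
```

After remember_literal() has returned, recall_name() still points at the characters of "Charles". Doing the same with the address of a local char array would leave a dangling pointer.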
I was going through some documentation which states that
First Case
char * p_var="Sack";
will create a constant string literal.
And hence code like
p_var[1]='u';
will fail because of that property.
Second Case
Also mentioned is that this is possible only for character literals and not for other data types through pointers. So code like
float *p="3.14";
will fail, resulting in a compiler error.
But when I try it out I don't get compiler errors, though accessing it gives me 0.000000f (using gcc on Ubuntu).
So regarding the above, I have three queries:
1. Why are string literals created in First Case read-only?
2. Why are only string literals allowed to be created and not other constants like float through pointers?
3. Why is Second Case not giving me compiler errors?
Update
Please discard the 3rd question and second case. I tested it by adding quotes.
Thanks
The premise is wrong: pointers don’t create any string literals, neither read-only nor writeable.
What does create a read-only string literal is the literal itself: "foo" is a read-only string literal. And if you assign it to a pointer, then that pointer points to a read-only memory location.
With that, let’s turn to your questions:
Why are string literals created in First Case read-only?
The real question is: why not? In most cases, you won’t want to change the value of a string literal later on so the default assumption makes sense. Furthermore, you can create writeable strings in C via other means.
Why are only string literals allowed to be created and not other constants like float?
Again, wrong assumption. You can create other constants:
float f = 1.23f;
Here, the 1.23f literal is read-only. You can also assign it to a constant variable:
const float f = 1.23f;
Why is Second Case not giving me compiler errors?
Because the compiler cannot check in general whether your pointer points to read-only memory or to writeable memory. Consider this:
char* p = "Hello";
char str[] = "world"; // `str` is a writeable string!
p = &str[0];
p[1] = 'x';
Here, p[1] = 'x' is entirely legal – if we hadn’t re-assigned p beforehand, it would have been illegal. Checking this cannot be generally done at compile-time.
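One way to stay on the safe side is to treat literals as const and to modify only a copy. The following is a minimal sketch; copy_and_replace is a hypothetical helper and the caller-supplied buffer is an assumption about how the copy would be stored.

```c
#include <stddef.h>
#include <string.h>

/* Copies text into buf and replaces the character at position index.
   Returns buf on success, or NULL if the buffer is too small or the
   index lies outside the string. The source text is never written to,
   so passing a string literal is safe. */
char *copy_and_replace(char *buf, size_t size, const char *text,
                       size_t index, char ch) {
    size_t len = strlen(text);
    if (len + 1 > size || index >= len) {
        return NULL;
    }
    memcpy(buf, text, len + 1);   /* copy including the terminating '\0' */
    buf[index] = ch;              /* modify the writable copy only */
    return buf;
}
```

With char buf[8]; the call copy_and_replace(buf, sizeof buf, "Sack", 1, 'o') leaves "Sock" in buf. Declaring pointers to literals as const char * additionally lets the compiler reject direct writes such as p[1] = 'x'.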
Regarding your question:
Why are string literals created in First Case read-only?
char *p_var="Sack";
Well, p_var is assigned the starting address of the memory allocated to the string "Sack". The contents p_var points to are not const-qualified, since you haven't put the const keyword anywhere in the declaration. However, modifying those contents, for example with strcpy or strcat, may cause undefined behavior.
Quote C ISO 9899:
The declaration
char s[] = "abc", t[3] = "abc";
defines plain char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to:
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration:
char *p = "abc";
defines p with type pointer to char and initializes it to point to an object with type array of char with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
An explanation of why it could be read-only per your platform and compiler:
Commonly, string literals are put in a "read-only-data" section which gets mapped into the process space as read-only (which is why you seem not to be allowed to change it).
But some platforms do allow the data segment to be writable.
Why are only string literals allowed to be created and not other constants like float? and the third question.
To create a float constant you should use:
const float f=1.5f;
Now, when you are doing:
float *p="3.14";
you are basically assigning the string literal's address to a float pointer.
Try compiling with -Wall -Werror -Wextra. You will find out what is happening.
It works because, in practice, there's no difference between a char * and a float * under the hood.
It's as if you are writing this:
float *p=(float*) "3.14";
This conversion is well-defined behaviour unless the resulting pointer is not correctly aligned for float, in which case it results in undefined behaviour (Reference: C99, 6.3.2.3 p7).
Efficiency
they are
It's a string; mind the quotes:
float *p="3.14";
This is also a string literal !
Why are string literals created in First Case read-only?
No, both "Sack" and "3.14" are string literals and both are read-only.
Why are only string literals allowed to be created and not other constants like float?
If you want to create a float const then do:
const float p=3.14;
Why is Second Case not giving me compiler errors?
You are making the pointer p point to a string literal. When you dereference p, it expects to read a float value. So there's nothing wrong as far as the compiler can see.
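If the intent was to obtain the number 3.14 from the text "3.14", the characters have to be parsed rather than reinterpreted through a float pointer. A minimal sketch using the standard strtof function follows; parse_float and its ok parameter are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Converts text to a float. Sets *ok to 1 when the whole string was a
   valid number and to 0 otherwise. */
float parse_float(const char *text, int *ok) {
    char *end;
    float value = strtof(text, &end);
    /* Success requires that something was consumed and nothing is left over. */
    *ok = (end != text && *end == '\0');
    return value;
}
```

parse_float("3.14", &ok) yields a value close to 3.14 with ok set to 1, while parse_float("abc", &ok) sets ok to 0.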