I understand that char pointers initialized to string literals are stored in read-only memory. But what about arrays of string literals?
In the array:
int main(void) {
char *str[] = {"hello", "world"};
}
Are "hello" and "world" stored as string literals in read-only memory? Or on the stack?
Technically, a string literal is a quoted string in source code. Colloquially, people use “string literal” to refer to the array of characters created for a string literal. Often we can overlook this informality, but, when asking about storage, we should be clear.
The array created for a string literal has static storage duration, meaning it exists (notionally, in the abstract computer the C standard uses as a model of computing) for the entire execution of the program. Because the behavior of attempting to modify the elements of this array is not defined by the C standard, the C implementation may treat them as constants and may place them in read-only memory. It is not required to do so by the C standard, but this is common practice in C implementations for general-purpose multi-user operating systems.
In the code you show, string literals are used as initializers for an array of pointers. In this use, the array of each string literal is converted to a pointer to its first element, and that address is used as the initial value for the corresponding element of the array of pointers.
The array of the string literal is the same as for any string literal; the C implementation may place it in read-only memory, and common practice is to do so.
Here is what the c17 standard says:
String literals [...] It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined. (6.4.5 p 7)
Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. (6.5.2.5 p 13).
Related
char *str = "String!";
char *str = (char []){"String!"};
Are these two initializations equivalent? If not, what is the difference between them?
Are these two initializations equivalent?
No.
If not, what is the difference between them?
One of the major differences between the two is that you cannot modify the object that str is pointing to in the first snippet of code, while in the second one, you can.
Trying to modify a string literal in C is undefined behavior. So in case of
char *str = "String!";
if you try to modify the object pointed to by str, it'll invoke UB.
In case of
char *str = (char []){"String!"};
however, (char[]){"String!"} is a compound literal, which has type array of chars, and str points to the first element of that array.
Since the compound literal is not read-only (doesn't have a const qualifier), you can modify the object pointed to by str.
Another difference you should note is that the string literal "String!" in the first one has static storage duration, while the compound literal (char []){"String!"} in the second one has static storage duration only if it occurs outside the body of a function; otherwise, it has automatic storage duration associated with the enclosing block.
From n1570 6.5.2.5 (Compound literals) p12:
12 EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of
char, but need not be modifiable; the last two have automatic storage
duration when they occur within the body of a function, and the first
of these two is modifiable.
Are these two initializations equivalent?
No. There are at least three differences:
The C standard does not define the behavior upon attempting to modify the characters of "String!", but it does define the behavior of attempting to modify the characters of (char []){"String!"}.
Each occurrence of "String!" may be the same object, but each occurrence of (char []){"String!"} is a distinct object.
The array defined by "String!" has static lifetime (the memory for it is reserved for all of program execution), but the array defined by (char []){"String!"} has static lifetime if it is outside of any function and automatic lifetime if it is inside a function.
Let’s examine these differences:
The C standard only requires the contents of string literals to be available for reading, not writing. It does not define the behavior if you attempt to modify them. This does not mean a program may not modify them, just that the standard does not say what happens if a program tries. In consequence, good engineers will not attempt to modify them in ordinary situations. (However, a C implementation may define the behavior, in which case a program for that implementation could make use of that.)
C implementations are allowed to coalesce string literals and parts of them, so defining two pointers initialized with the same string, as with char *str0 = "String!"; and char *str1 = "String!";, may yield two pointers with the same value. This is due to a special rule in the C standard for string literals, so it does not apply for compound literals. When two pointers are initialized with the same compound literal source code, they must have different values. This means that multiple uses of compound literals must use more memory than multiple uses of strings, unless the compiler is able to optimize them away. Even a single use of a compound literal to initialize a pointer inside a function may cause more use of memory because the compiler typically must use space on the stack for the compound literal and separately have a copy of the string used to initialize it.
Because a string literal has static lifetime, its address may be returned from a function and used by the caller. However, a compound literal may not be relied on after the function it is defined in returns. For example:
char *GetErrorMessageFromCode(int code)
{
char *ErrorMessages[] =
{
"invalid argument",
"out of range",
"resource unavailable",
…
}
return ErrorMessages[code];
/* The above is working code for returning pointers to (the first
characters of) static string literals. If compound literals were
used instead, the behavior would not be defined by the C standard
because the memory for the compound literals would not be reserved
after the function returns.
*/
}
I am trying to understand the reason behind not being able to modify a string literal in C.
Why is the following illegal in C?
char* p = "abc";
*p = 'd';
From the C89 Rationale, 3.1.4 String literals:
String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see F.5.5).
A book on C programming says,
"There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals.
Takeaway - String literals are read-only.
If introduced today, the type of string literals would certainly be char const[], an array of const-qualified characters. Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward
compatibility."
Question 1. How can strings be read only, like takeaway says, if they can be modified?
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
How can strings be read only, like takeaway says, if they can be modified?
Because Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward compatibility.
String literals existed before protection (in the form of const keyword) existed in the language.
Ergo, they are not protected, because the tools to protect it did not exist. A classic example of undefined behavior is writing to a string literal, see Undefined, unspecified and implementation-defined behavior .
What type is this referring to, which doesn't keep a string literal from being modified?
It has the type char[N] - "array of N chars" - where N is the number of characters in the string plus one for zero terminating character.
See https://en.cppreference.com/w/c/language/string_literal .
char *str = "string literal";
// ^^^ - implicit decay from `char [N]` -> `char *`
str[0] = 'a'; // compiler will be fine, but it's invalid code
// or super shorter form:
"string literal"[0] = 'a'; // invalid code
If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
The type would be const char[N] - an array of N constant chars, which means you can't modify the characters.
// assuming string literal has type const char [N]
const char *str = "string literal";
// ^^^ - implicit decay from `const char [N]` -> `const char *`
str[0] = 'a'; // compile time error
// or super shorter form:
"string literal"[0] = 'a'; // with `const char [N]` would be a compiler time error
With gcc compiler use -Wwrite-strings to protect against mistakes like writing to string literals.
Question 1. How can strings be read only, like takeaway says, if they can be modified?
The text does not say they can be modified. It says they are not protected from being modified. That is a slight error; properly, it should say they are not protected from attempts to modify them: The rules of the C standard do not prevent you from writing code that attempts to modify a string literal, and they do not define the results when a program executes such an attempt. In some circumstances, attempting to modify a string literal may result in a signal, usually ending program execution by default. In other circumstances, the attempt may succeed, and the string literal will be modified. In other circumstances, nothing will happen; there will be neither a signal nor a change to the string literal. It is also possible other behaviors may occur.
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Technically, a string literal is a piece of source code that has a character sequence inside quotes, optionally with an encoding prefix. During compilation or program execution, an array is generated with the contents of the character sequence and a terminating null character. For string literals without a prefix, the type of that array is char []. (If there is a prefix, the type may also be wchar_t [], char16_t [], or char32_t, depending on the prefix.)
Colloquially, we often refer to this array as the string literal, even though the array is the thing that results from a string literal (an array in memory) not the actual string literal (in the source code).
The type char [] does not contain const, so it does not offer the protections that const char [] does. (Those protections are fairly mild.)
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
Your confusion here is unclear. When a string literal appears in source code, the compiler arranges for its contents to be in the memory of the running program. Those contents are in memory as an array of characters. If the rules of C were different, the type of that array would be const char [] instead of char [].
Say I do this:
const char *myvar = NULL;
Then later
*myval = “hello”;
And then again:
*myval = “world”;
I’d like to understand what happens to the memory where “hello” was stored?
I understand it is in the read only stack space, but does that memory space stays there forever while running and no other process can use that memory space?
Thanks
Assuming you meant
myval = "world";
instead, then
I’d like to understand what happens to the memory where “hello” was stored?
Nothing.
You just modify the pointer itself to point to some other string literal.
And string literals in C programs are static fixed (non-modifiable) arrays of characters, with a life-time of the full programs. The assignment really makes the pointer point to the first element of such an array.
String literals have static storage duration. They are usually placed by the compiler in a stack pool. So string literals are created before the program will run.
In these statements
*myval = “hello”;
*myval = “world”;
the pointer myval is reassigned by addresses of first characters of these two string literals.
Pat attention to that you may not change string literals. Whether the equal string literals are stored as a one character array or different character arrays with the static storage duration depends on compiler options.
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
I am trying to understand the reason behind not being able to modify a string literal in C.
Why is the following illegal in C?
char* p = "abc";
*p = 'd';
From the C89 Rationale, 3.1.4 String literals:
String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see F.5.5).