I am trying to understand the reason behind not being able to modify a string literal in C.
Why is the following illegal in C?
char* p = "abc";
*p = 'd';
From the C89 Rationale, 3.1.4 String literals:
String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see F.5.5).
Related
I understand that char pointers initialized to string literals are stored in read-only memory. But what about arrays of string literals?
In the array:
int main(void) {
char *str[] = {"hello", "world"};
}
Are "hello" and "world" stored as string literals in read-only memory? Or on the stack?
Technically, a string literal is a quoted string in source code. Colloquially, people use “string literal” to refer to the array of characters created for a string literal. Often we can overlook this informality, but, when asking about storage, we should be clear.
The array created for a string literal has static storage duration, meaning it exists (notionally, in the abstract computer the C standard uses as a model of computing) for the entire execution of the program. Because the behavior of attempting to modify the elements of this array is not defined by the C standard, the C implementation may treat them as constants and may place them in read-only memory. It is not required to do so by the C standard, but this is common practice in C implementations for general-purpose multi-user operating systems.
In the code you show, string literals are used as initializers for an array of pointers. In this use, the array of each string literal is converted to a pointer to its first element, and that address is used as the initial value for the corresponding element of the array of pointers.
The array of the string literal is the same as for any string literal; the C implementation may place it in read-only memory, and common practice is to do so.
Here is what the c17 standard says:
String literals [...] It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined. (6.4.5 p 7)
Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. (6.5.2.5 p 13).
A book on C programming says,
"There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals.
Takeaway - String literals are read-only.
If introduced today, the type of string literals would certainly be char const[], an array of const-qualified characters. Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward
compatibility."
Question 1. How can strings be read only, like takeaway says, if they can be modified?
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
How can strings be read only, like takeaway says, if they can be modified?
Because Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward compatibility.
String literals existed before protection (in the form of const keyword) existed in the language.
Ergo, they are not protected, because the tools to protect it did not exist. A classic example of undefined behavior is writing to a string literal, see Undefined, unspecified and implementation-defined behavior .
What type is this referring to, which doesn't keep a string literal from being modified?
It has the type char[N] - "array of N chars" - where N is the number of characters in the string plus one for zero terminating character.
See https://en.cppreference.com/w/c/language/string_literal .
char *str = "string literal";
// ^^^ - implicit decay from `char [N]` -> `char *`
str[0] = 'a'; // compiler will be fine, but it's invalid code
// or super shorter form:
"string literal"[0] = 'a'; // invalid code
If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
The type would be const char[N] - an array of N constant chars, which means you can't modify the characters.
// assuming string literal has type const char [N]
const char *str = "string literal";
// ^^^ - implicit decay from `const char [N]` -> `const char *`
str[0] = 'a'; // compile time error
// or super shorter form:
"string literal"[0] = 'a'; // with `const char [N]` would be a compiler time error
With gcc compiler use -Wwrite-strings to protect against mistakes like writing to string literals.
Question 1. How can strings be read only, like takeaway says, if they can be modified?
The text does not say they can be modified. It says they are not protected from being modified. That is a slight error; properly, it should say they are not protected from attempts to modify them: The rules of the C standard do not prevent you from writing code that attempts to modify a string literal, and they do not define the results when a program executes such an attempt. In some circumstances, attempting to modify a string literal may result in a signal, usually ending program execution by default. In other circumstances, the attempt may succeed, and the string literal will be modified. In other circumstances, nothing will happen; there will be neither a signal nor a change to the string literal. It is also possible other behaviors may occur.
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Technically, a string literal is a piece of source code that has a character sequence inside quotes, optionally with an encoding prefix. During compilation or program execution, an array is generated with the contents of the character sequence and a terminating null character. For string literals without a prefix, the type of that array is char []. (If there is a prefix, the type may also be wchar_t [], char16_t [], or char32_t, depending on the prefix.)
Colloquially, we often refer to this array as the string literal, even though the array is the thing that results from a string literal (an array in memory) not the actual string literal (in the source code).
The type char [] does not contain const, so it does not offer the protections that const char [] does. (Those protections are fairly mild.)
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
Your confusion here is unclear. When a string literal appears in source code, the compiler arranges for its contents to be in the memory of the running program. Those contents are in memory as an array of characters. If the rules of C were different, the type of that array would be const char [] instead of char [].
I am trying to understand the reason behind not being able to modify a string literal in C.
Why is the following illegal in C?
char* p = "abc";
*p = 'd';
From the C89 Rationale, 3.1.4 String literals:
String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see F.5.5).
String is said to be a constant in C programming language.
So, when I give a statement like char *s = "Hello", I have learned that s points to a memory location of H since "Hello" is stored in some static memory of the program and also "Hello" is immutable.
Does it mean the variable s is now a variable of type pointer to constant data such as const int a = 3;const int *i = &a;. This seems so because I can't manipulate the data (when I do, it results in segmentation fault).
But, if it is so, shouldn't compiler be able to detect and say that I have assigned qualified data to unqualified variable.
Something like char *p p is a pointer to unqualified character and when I say char *p="Hello" p, the pointer to unqualified character can't point to a const character type?
What am I missing here?
If it is not the case as above, then how is an array of constant characters made immutable?
First of all, a string in C isn't immutable. C doesn't even know a type for strings -- a string is just defined as a sequence of char ending with '\0'.
What you're talking about are string literals and they can be immutable. The C standard defines that attempting to modify a string literal is undefined behavior, still their type is char *. So, if you are sure that in your implementation of C, a string literal is writable, you can do so! *)
But your code won't be well-defined C any more and won't work on other platforms with read-only string literals. It will compile, because writing through char * is perfectly fine, but fail at runtime in unpredictable ways (like, possibly, a crash).
Therefore, it's just best practice for portable code to assign string literals only to const char * pointers and, if you need a mutable string, use the string literal as an initializer for a char [].
*) beware this is very uncommon, you'll find it nowadays only with specialized compilers targeting embedded or very old platforms. A modern platform will place string literals in a read-only data segment or similar.
Syntax char *s = "Hello"; is present from days when const keyword was not part of C specs. Later it remained for reverse compatibility. Writing to such s[i] would lead to undefined behaviour. (Seg fault observed in your case for few runs)
This behaviour (Conversion from string literal or const char [] to non-constant char *) was supported in C++ briefly until C++11 and then deprecated.
Type safety in C is limited.
I was going through some documentation which states that
First Case
char * p_var="Sack";
will create a constant string literal.
And hence code like
p_var[1]="u";
will fail because of that property.
Second Case
Also mentioned is that this is possible only for character literals and not for other data types through pointers. So code like
float *p="3.14";
will fail, resulting in a compiler error.
But when i try it out i don't get compiler errors ,accessing it though gives me 0.000000f(using gcc on Ubuntu).
So regarding the above, i have three queries:
Why are string literals created in First Case read-only?
Why are only string literals allowed to be created and not other constants like float through pointers?
3. Why is Second Case not giving me compiler errors?
Update
Please discard the 3rd question and second case. I tested it by adding quotes.
Thanks
The premise is wrong: pointers don’t create any string literals, neither read-only nor writeable.
What does create a read-only string literal is the literal itself: "foo" is a read-only string literal. And if you assign it to a pointer, then that pointer points to a read-only memory location.
With that, let’s turn to your questions:
Why are string literals created in First Case read-only?
The real question is: why not? In most cases, you won’t want to change the value of a string literal later on so the default assumption makes sense. Furthermore, you can create writeable strings in C via other means.
Why are only string literals allowed to be created and not other constants like float?
Again, wrong assumption. You can create other constants:
float f = 1.23f;
Here, the 1.23f literal is read-only. You can also assign it to a constant variable:
const float f = 1.23f;
Why is Second Case not giving me compiler errors?
Because the compiler cannot check in general whether your pointer points to read-only memory or to writeable memory. Consider this:
char* p = "Hello";
char str[] = "world"; // `str` is a writeable string!
p = &str[0];
p[1] = 'x';
Here, p[1] = 'x' is entirely legal – if we hadn’t re-assigned p beforehand, it would have been illegal. Checking this cannot be generally done at compile-time.
Regarding your question:
Why are string literals created in First Case read-only?
char *p_var="Sack";
Well, the p_var is assigned with the starting address of the memory allocated to the string "Sack". p_var content is not read-only, since you haven't put the const keyword anywhere in C constructs. Although manipulating the p_var contents like strcpy or strcat may cause undefined behavior.
Quote C ISO 9899:
The declaration
char s[] = "abc", t[3] = "abc";
defines plain char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to:
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration:
char *p = "abc";
defines p with type pointer to char and initializes it to point to an object with type array of char with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
An explanation of why it could be read-only per your platform and compiler:
Commonly string literals will be put in "read-only-data" section which gets mapped into the process space as read-only (which is why you seem to not being allowed to change it).
But some platforms do allow, the data segment to be writable.
Why are only string literals allowed to be created and not other constants like float? and the third question.
To create a float constant you should use:
const float f=1.5f;
Now, when you are doing:
float *p="3.14";
you are basically assigning the string literal's address to a float pointer.
Try compiling with -Wall -Werror -Wextra. You will find out what is happening.
It works because, in practice, there's no difference between a char * and a float * under the hood.
Its as if you are writing this:
float *p=(float*) "3.14";
This is a well-defined behaviour, unless the memory alignment requirements of float and char differ, in which case it results in undefined behaviour (Reference: C99, 6.3.2.3 p7).
Efficiency
they are
It's a string mind the quotes
float *p="3.14";
This is also a string literal !
Why are string literals created in First Case read-only?
No, both "sack" and "3.14" are string literals and both are read-only.
Why are only string literals allowed to be created and not other constants like float?
If you want to create a float const then do:
const float p=3.14;
Why is Second Case not giving me compiler errors?
You are making the pointer p point to a string literal. When you dereference p, it expects to read a float value. So there's nothing wrong as far as the compiler can see.