A book on C programming says,
"There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals.
Takeaway - String literals are read-only.
If introduced today, the type of string literals would certainly be char const[], an array of const-qualified characters. Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward compatibility."
Question 1. How can strings be read only, like takeaway says, if they can be modified?
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
How can strings be read only, like takeaway says, if they can be modified?
Because, as the book says, the const keyword was introduced to the C language much later than string literals, and the original type was kept for backward compatibility.
String literals existed before protection (in the form of the const keyword) existed in the language.
Ergo, they are not protected, because the tools to protect them did not exist. Writing to a string literal is a classic example of undefined behavior; see "Undefined, unspecified and implementation-defined behavior".
What type is this referring to, which doesn't keep a string literal from being modified?
It has the type char[N] - "array of N chars" - where N is the number of characters in the string plus one for the terminating null character.
See https://en.cppreference.com/w/c/language/string_literal.
char *str = "string literal";
// ^^^ - implicit decay from `char [N]` -> `char *`
str[0] = 'a'; // compiler will be fine, but it's invalid code
// or super shorter form:
"string literal"[0] = 'a'; // invalid code
If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
The type would be const char[N] - an array of N constant chars, which means you can't modify the characters.
// assuming string literal has type const char [N]
const char *str = "string literal";
// ^^^ - implicit decay from `const char [N]` -> `const char *`
str[0] = 'a'; // compile-time error
// or super shorter form:
"string literal"[0] = 'a'; // with `const char [N]` this would be a compile-time error
With GCC, use -Wwrite-strings to protect against mistakes like writing to string literals: it gives string literals the type const char[N], so assigning one to a plain char * produces a warning.
Question 1. How can strings be read only, like takeaway says, if they can be modified?
The text does not say they can be modified. It says they are not protected from being modified. That is a slight error; properly, it should say they are not protected from attempts to modify them: The rules of the C standard do not prevent you from writing code that attempts to modify a string literal, and they do not define the results when a program executes such an attempt. In some circumstances, attempting to modify a string literal may result in a signal, usually ending program execution by default. In other circumstances, the attempt may succeed, and the string literal will be modified. In other circumstances, nothing will happen; there will be neither a signal nor a change to the string literal. It is also possible other behaviors may occur.
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Technically, a string literal is a piece of source code that has a character sequence inside quotes, optionally with an encoding prefix. During compilation or program execution, an array is generated with the contents of the character sequence and a terminating null character. For string literals without a prefix, the type of that array is char []. (If there is a prefix, the type may also be wchar_t [], char16_t [], or char32_t [], depending on the prefix.)
Colloquially, we often refer to this array as the string literal, even though the array is the thing that results from a string literal (an array in memory) not the actual string literal (in the source code).
The type char [] does not contain const, so it does not offer the protections that const char [] does. (Those protections are fairly mild.)
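The point about the array's type can be checked directly. This is a minimal sketch, assuming a C11 compiler (for _Generic) run without options such as GCC's -Wwrite-strings, which change the type; the names literal_kind, classify_literal and literal_array_size are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

enum literal_kind { PLAIN_CHAR_ARRAY, CONST_CHAR_ARRAY, OTHER_TYPE };

/* Classifies the array behind "abc". Taking the address gives a pointer
   to the whole array, so array-to-pointer decay does not hide its type. */
enum literal_kind classify_literal(void) {
    return _Generic(&"abc",
                    char (*)[4]: PLAIN_CHAR_ARRAY,
                    const char (*)[4]: CONST_CHAR_ARRAY,
                    default: OTHER_TYPE);
}

/* Size of that array: three characters plus the terminating null. */
size_t literal_array_size(void) {
    size_t n = sizeof "abc";
    assert(n == 4);
    return n;
}
```

In standard C, classify_literal() should yield PLAIN_CHAR_ARRAY, which is exactly the missing const the book complains about.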
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
Your confusion here is unclear. When a string literal appears in source code, the compiler arranges for its contents to be in the memory of the running program. Those contents are in memory as an array of characters. If the rules of C were different, the type of that array would be const char [] instead of char [].
Related
I am trying to understand the reason behind not being able to modify a string literal in C.
Why is the following illegal in C?
char* p = "abc";
*p = 'd';
From the C89 Rationale, 3.1.4 String literals:
String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see F.5.5).
A string is said to be a constant in the C programming language.
So, when I write a statement like char *s = "Hello", I have learned that s points to the memory location of H, since "Hello" is stored in some static memory of the program, and also that "Hello" is immutable.
Does it mean the variable s is now of type pointer to constant data, as in const int a = 3; const int *i = &a;? This seems so, because I can't manipulate the data (when I do, it results in a segmentation fault).
But if that is so, shouldn't the compiler be able to detect it and say that I have assigned qualified data to an unqualified variable?
Something like: char *p declares p as a pointer to an unqualified character, so when I say char *p = "Hello", shouldn't that be rejected, since a pointer to an unqualified character can't point to a const character type?
What am I missing here?
If it is not the case as above, then how is an array of constant characters made immutable?
First of all, a string in C isn't immutable. C doesn't even know a type for strings -- a string is just defined as a sequence of char ending with '\0'.
What you're talking about are string literals, and they can be immutable. The C standard says that attempting to modify a string literal is undefined behavior, yet their type is still char [N] (which decays to char *), without const. So, if you are sure that in your implementation of C a string literal is writable, you can do so! *)
But your code won't be well-defined C any more and won't work on other platforms with read-only string literals. It will compile, because writing through char * is perfectly fine, but fail at runtime in unpredictable ways (like, possibly, a crash).
Therefore, it's just best practice for portable code to assign string literals only to const char * pointers and, if you need a mutable string, use the string literal as an initializer for a char [].
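That best practice can be sketched as follows. The function names default_name and capitalized_copy are hypothetical, chosen only for this illustration:

```c
#include <assert.h>
#include <string.h>

/* Read-only use: returning const char * documents that the
   literal must never be written through this pointer. */
const char *default_name(void) {
    return "hello";
}

/* Mutable use: buf is a separate array initialized with a copy of
   the literal, so modifying it is well-defined. */
void capitalized_copy(char *out, size_t out_size) {
    char buf[] = "hello";
    assert(out != NULL && out_size >= sizeof buf);
    buf[0] = 'H';
    memcpy(out, buf, sizeof buf); /* copies "Hello" including the '\0' */
}
```

With const char *, an accidental write such as default_name()[0] = 'H' is rejected at compile time instead of failing at run time.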
*) beware this is very uncommon, you'll find it nowadays only with specialized compilers targeting embedded or very old platforms. A modern platform will place string literals in a read-only data segment or similar.
The syntax char *s = "Hello"; dates from the days when the const keyword was not part of the C specification. It was later kept for backward compatibility. Writing to such an s[i] leads to undefined behaviour (a segmentation fault was observed in your case).
This conversion (from a string literal to a non-const char *) was also allowed in C++, where it was deprecated from C++98 and removed in C++11.
Type safety in C is limited.
I am trying to understand how a string is passed to a called function and how the elements of the array are modified inside the called function.
#include <stdio.h>

void foo(char p[]){
    p[0] = 'a';
    printf("%s",p);
}
int main(void){
    char p[] = "jkahsdkjs";
    p[0] = 'a';
    printf("%s",p);
    foo("fgfgf");
    return 0;
}
The above code raises an exception. I know that string in C is immutable, but I would like to know what the difference is between modifying it in main and modifying it in the called function. What happens in case of other data types?
I know that string in C is immutable
That's not true. The correct version is: modifying a string literal in C is undefined behavior.
In main(), you defined the string as:
char p[] = "jkahsdkjs";
which is a non-literal character array, so you can modify it. But what you passed to foo is "fgfgf", which is a string literal.
Changing it to:
char str[] = "fgfgf";
foo(str);
would be fine.
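Put together, the corrected call looks roughly like this. It is a sketch only: run_foo_on_copy is a hypothetical wrapper added so the result can be inspected, and foo here omits the original printf:

```c
#include <assert.h>
#include <stdio.h>

/* Modifies the first character in place; the caller must pass
   writable storage, never a string literal. */
void foo(char p[]) {
    assert(p != NULL && p[0] != '\0');
    p[0] = 'a';
}

/* Hypothetical wrapper: runs foo on a writable array that was
   initialized from the literal, then returns the new first character. */
char run_foo_on_copy(void) {
    char str[] = "fgfgf"; /* a copy of the literal in automatic storage */
    foo(str);
    printf("%s\n", str);  /* prints "agfgf" */
    return str[0];
}
```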
In the first case:
char p[] = "jkahsdkjs";
p is an array that is initialized with a copy of the string literal. Since you don't specify the size, it will be determined by the length of the string literal plus the terminating null character. This is covered in the draft C99 standard, section 6.7.8 Initialization, paragraph 14:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
In the second case:
foo("fgfgf");
you are attempting to modify a string literal which is undefined behavior, which means the behavior of program is unpredictable, and an exception is one possibility. From the C99 draft standard section 6.4.5 String literals paragraph 6 (emphasis mine):
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
The difference is in how you are initializing p[].
char p[] = "jkahsdkjs";
This initializes a writable array called p, auto-sized to be large enough to contain your string and, being a local variable, typically stored on the stack at run time.
However, in the case of:
foo("fgfgf");
You are passing in a pointer to the actual string literal, which most implementations place in read-only memory.
What happens in case of other data types?
String literals are a very special case. Other data types, such as int, etc do not have an issue that is analogous to this, since they are stored strictly by value.
I was going through some documentation which states that
First Case
char * p_var="Sack";
will create a constant string literal.
And hence code like
p_var[1]='u';
will fail because of that property.
Second Case
Also mentioned is that this is possible only for character literals and not for other data types through pointers. So code like
float *p="3.14";
will fail, resulting in a compiler error.
But when I try it out, I don't get compiler errors; accessing it, though, gives me 0.000000f (using gcc on Ubuntu).
So regarding the above, I have three queries:
1. Why are string literals created in First Case read-only?
2. Why are only string literals allowed to be created and not other constants like float through pointers?
3. Why is Second Case not giving me compiler errors?
Update
Please discard the 3rd question and second case. I tested it by adding quotes.
Thanks
The premise is wrong: pointers don’t create any string literals, neither read-only nor writeable.
What does create a read-only string literal is the literal itself: "foo" is a read-only string literal. And if you assign it to a pointer, then that pointer points to a read-only memory location.
With that, let’s turn to your questions:
Why are string literals created in First Case read-only?
The real question is: why not? In most cases, you won’t want to change the value of a string literal later on so the default assumption makes sense. Furthermore, you can create writeable strings in C via other means.
Why are only string literals allowed to be created and not other constants like float?
Again, wrong assumption. You can create other constants:
float f = 1.23f;
Here, the 1.23f literal is read-only. You can also assign it to a constant variable:
const float f = 1.23f;
Why is Second Case not giving me compiler errors?
Because the compiler cannot check in general whether your pointer points to read-only memory or to writeable memory. Consider this:
char* p = "Hello";
char str[] = "world"; // `str` is a writeable string!
p = &str[0];
p[1] = 'x';
Here, p[1] = 'x' is entirely legal – if we hadn’t re-assigned p beforehand, it would have been illegal. Checking this cannot be generally done at compile-time.
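To see why the compiler cannot decide this in general, consider a pointer whose target depends on a run-time value. This is a sketch with hypothetical helper names (choose_target, write_if_writable); the guard is there so the example itself never writes to the literal:

```c
#include <assert.h>
#include <string.h>

/* Returns either the caller's writable array or a string literal,
   depending on a value known only at run time. */
char *choose_target(int use_array, char *array) {
    return use_array ? array : "Hello";
}

/* Writes through the pointer only when it refers to writable storage.
   Returns 1 if the write happened, 0 otherwise. */
int write_if_writable(int use_array, char *array) {
    assert(array != NULL && strlen(array) >= 2);
    char *p = choose_target(use_array, array);
    if (p != array) {
        return 0; /* p points at the literal: writing would be undefined */
    }
    p[1] = 'x';
    return 1;
}
```

Both branches have the same static type, char *, so nothing in the type system distinguishes the legal write from the illegal one.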
Regarding your question:
Why are string literals created in First Case read-only?
char *p_var="Sack";
Well, p_var is assigned the starting address of the memory allocated to the string "Sack". What p_var points to is not const-qualified, since you haven't put the const keyword anywhere in the declaration. However, modifying the contents through p_var, for example with strcpy or strcat, is undefined behavior.
Quote C ISO 9899:
The declaration
char s[] = "abc", t[3] = "abc";
defines plain char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to:
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration:
char *p = "abc";
defines p with type pointer to char and initializes it to point to an object with type array of char with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
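The array sizes in that quotation can be checked with sizeof. A small sketch, with function names (size_of_s, size_of_t) invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* s has no explicit size, so it gets room for 'a', 'b', 'c' and '\0'. */
size_t size_of_s(void) {
    char s[] = "abc";
    assert(s[3] == '\0');
    s[0] = 'x'; /* fine: s is an ordinary modifiable array */
    return sizeof s;
}

/* t has exactly three elements, so the terminating null is dropped
   and t is not a string in the C sense. */
size_t size_of_t(void) {
    char t[3] = "abc";
    assert(memcmp(t, "abc", 3) == 0);
    return sizeof t;
}
```

Note that t must not be passed to functions such as strlen or printf("%s"), because it has no terminator.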
An explanation of why it could be read-only per your platform and compiler:
Commonly, string literals are put in a "read-only data" section, which gets mapped into the process address space as read-only (which is why you are not allowed to change it).
But some platforms do allow the data segment to be writable.
Why are only string literals allowed to be created and not other constants like float? and the third question.
To create a float constant you should use:
const float f=1.5f;
Now, when you are doing:
float *p="3.14";
you are basically assigning the string literal's address to a float pointer.
Try compiling with -Wall -Werror -Wextra. You will find out what is happening.
It compiles on many compilers because, in practice, there's no difference between a char * and a float * under the hood.
It's as if you had written this:
float *p=(float*) "3.14";
The conversion itself is well-defined unless the resulting pointer is not correctly aligned for float, in which case the behaviour is undefined (Reference: C99, 6.3.2.3 p7). Dereferencing p is a separate matter: reading the bytes of a char array through a float lvalue is undefined behaviour.
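If the intent was to get the number 3.14 out of the text rather than to point at the text, the string has to be parsed. One possible sketch uses strtof from <stdlib.h>; parse_float is a hypothetical helper name, and the error handling is deliberately minimal:

```c
#include <assert.h>
#include <stdlib.h>

/* Converts the leading part of text to a float.
   Asserts that at least one character was consumed. */
float parse_float(const char *text) {
    char *end = NULL;
    float value = strtof(text, &end);
    assert(end != text); /* no conversion was performed */
    return value;
}
```

Here the string literal is only read, so no const or aliasing rule is violated.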
1. Efficiency.
2. They are.
3. It's a string; mind the quotes:
float *p="3.14";
This is also a string literal!
Why are string literals created in First Case read-only?
No, both "Sack" and "3.14" are string literals, and both are read-only.
Why are only string literals allowed to be created and not other constants like float?
If you want to create a float const then do:
const float p=3.14;
Why is Second Case not giving me compiler errors?
You are making the pointer p point to a string literal. When you dereference p, it expects to read a float value. The pointer types are incompatible, but many compilers report this only as a warning, so the code still compiles.