What is the difference between the two definitions below?
char *str1 = "string 1"; // (1)
const char *str2 = "string 2"; // (2)
Regarding (1): is this undefined behavior for a string literal?
If not, which definition should we use (can you give me some examples)?
If you need a mutable string, you should use the following:
char str1[]="string 1";
In C++ you cannot implicitly convert a string literal to a pointer to non-const char. In C you can, but the practice is not recommended.
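As a minimal sketch of the distinction above (upper_first and mutable_copy_demo are made-up names for illustration), an array initialized from a literal holds its own writable copy, while a pointer to the literal itself is best const-qualified and only read:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: uppercases the first character of a writable buffer. */
static void upper_first(char *s)
{
    if (s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}

/* Returns 1 if both strings end up as described in the comments. */
static int mutable_copy_demo(void)
{
    char str1[] = "string 1";       /* array: a private, writable copy */
    const char *str2 = "string 2";  /* pointer to the literal: read only */

    upper_first(str1);              /* fine: changes the array, not the literal */
    /* upper_first(str2); would draw a diagnostic, since it discards const */

    return strcmp(str1, "String 1") == 0 && strcmp(str2, "string 2") == 0;
}
```

Calling mutable_copy_demo() from main should return 1.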
Update.
In C++ you may do the following
char *str1 = (char *)"string 1" ; // (1)
But you must not use this pointer to change the value of the string.
(1), is this an undefined behavior for string literal ?
This declaration
char *str1 = "string 1"; // (1)
is a valid declaration of a pointer to a string literal in C. Unlike in C++, string literals in C have non-constant character array types.
However string literals are immutable in C as in C++. You may not change a string literal. Any attempt to change a string literal results in undefined behavior.
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
It is preferable to declare pointers to string literals the way C++ requires, that is:
const char *str2 = "string 2"; // (2)
This makes your program safer, because the compiler will issue a diagnostic if you try to pass a pointer to a string literal to a function whose corresponding parameter is a pointer to non-constant char.
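For illustration only, here is a small sketch of a const-correct function; count_char is a hypothetical helper, not something from the standard library. Because its parameter is a pointer to const char, it accepts string literals and writable arrays alike, and the compiler would complain if its body tried to write through s:

```c
#include <stddef.h>

/* Hypothetical helper: counts how often c occurs in s.
   The const qualifier documents and enforces that s is only read. */
static size_t count_char(const char *s, char c)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c)
            ++count;
    }
    return count;
}

/* Calls the helper with a literal and with a writable array. */
static size_t count_char_demo(void)
{
    const char *str2 = "string 2";  /* pointer to a literal */
    char buffer[] = "banana";       /* writable array */
    return count_char(str2, 'n') + count_char(buffer, 'a');
}
```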
Related
A book on C programming says,
"There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals.
Takeaway - String literals are read-only.
If introduced today, the type of string literals would certainly be char const[], an array of const-qualified characters. Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward
compatibility."
Question 1. How can strings be read only, like takeaway says, if they can be modified?
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
How can strings be read only, like takeaway says, if they can be modified?
Because, as the quoted passage says, the const keyword was introduced to the C language much later than string literals, and the type therefore remained as it was for backward compatibility.
String literals existed before protection (in the form of const keyword) existed in the language.
Ergo, they are not protected, because the tools to protect them did not exist. A classic example of undefined behavior is writing to a string literal; see "Undefined, unspecified and implementation-defined behavior".
What type is this referring to, which doesn't keep a string literal from being modified?
It has the type char[N] - "array of N chars" - where N is the number of characters in the string plus one for the terminating null character.
See https://en.cppreference.com/w/c/language/string_literal.
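The array type described above can be observed with sizeof, which sees the array before it decays to a pointer. A small sketch (the function names are made up for this example):

```c
#include <stddef.h>
#include <string.h>

/* sizeof is applied to the array itself: 14 characters plus the terminator. */
static size_t literal_array_size(void)
{
    return sizeof("string literal");
}

/* strlen counts the characters only, without the terminating null. */
static size_t literal_length(void)
{
    return strlen("string literal");
}
```

Here literal_array_size() gives 15 and literal_length() gives 14.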
char *str = "string literal";
// ^^^ - implicit decay from `char [N]` -> `char *`
str[0] = 'a'; // compiler will be fine, but it's invalid code
// or super shorter form:
"string literal"[0] = 'a'; // invalid code
If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
The type would be const char[N] - an array of N constant chars, which means you can't modify the characters.
// assuming string literal has type const char [N]
const char *str = "string literal";
// ^^^ - implicit decay from `const char [N]` -> `const char *`
str[0] = 'a'; // compile time error
// or super shorter form:
"string literal"[0] = 'a'; // with `const char [N]` this would be a compile-time error
With GCC, use -Wwrite-strings to protect against mistakes like writing to string literals.
Question 1. How can strings be read only, like takeaway says, if they can be modified?
The text does not say they can be modified. It says they are not protected from being modified. That is a slight error; properly, it should say they are not protected from attempts to modify them: The rules of the C standard do not prevent you from writing code that attempts to modify a string literal, and they do not define the results when a program executes such an attempt. In some circumstances, attempting to modify a string literal may result in a signal, usually ending program execution by default. In other circumstances, the attempt may succeed, and the string literal will be modified. In other circumstances, nothing will happen; there will be neither a signal nor a change to the string literal. It is also possible other behaviors may occur.
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Technically, a string literal is a piece of source code that has a character sequence inside quotes, optionally with an encoding prefix. During compilation or program execution, an array is generated with the contents of the character sequence and a terminating null character. For string literals without a prefix, the type of that array is char []. (If there is a prefix, the type may also be wchar_t [], char16_t [], or char32_t [], depending on the prefix.)
Colloquially, we often refer to this array as the string literal, even though the array is the thing that results from a string literal (an array in memory) not the actual string literal (in the source code).
The type char [] does not contain const, so it does not offer the protections that const char [] does. (Those protections are fairly mild.)
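One way to see the missing const is a C11 _Generic selection on a pointer into the literal. This is only a sketch: literal_pointer_kind is a made-up name, it requires a C11 compiler, and it assumes default options (GCC documents that -Wwrite-strings gives string constants the type const char[length] when compiling C, which would change the result to 2).

```c
/* Reports which pointer type the address of a literal's first element has.
   Returns 1 for char *, 2 for const char *, 0 for anything else. */
static int literal_pointer_kind(void)
{
    return _Generic(&"abc"[0],
                    char *: 1,
                    const char *: 2,
                    default: 0);
}
```

In standard C the selection picks the char * branch, even though writing through that pointer is undefined behavior.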
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
Your confusion here is unclear. When a string literal appears in source code, the compiler arranges for its contents to be in the memory of the running program. Those contents are in memory as an array of characters. If the rules of C were different, the type of that array would be const char [] instead of char [].
I am trying to understand the reason behind not being able to modify a string literal in C.
Why is the following illegal in C?
char* p = "abc";
*p = 'd';
From the C89 Rationale, 3.1.4 String literals:
String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see F.5.5).
One does usually associate 'unmodifiable' with the term literal
char* str = "Hello World!";
*str = 'B'; // Bus Error!
However when using compound literals, I quickly discovered they are completely modifiable (and looking at the generated machine code, you see they are pushed on the stack):
char* str = (char[]){"Hello World"};
*str = 'B'; // A-Okay!
I'm compiling with clang-703.0.29. Shouldn't those two examples generate the exact same machine code? Is a compound literal really a literal, if it's modifiable?
EDIT: An even shorter example would be:
"Hello World"[0] = 'B'; // Bus Error!
(char[]){"Hello World"}[0] = 'B'; // Okay!
A compound literal is an lvalue and values of its elements are modifiable. In case of
char* str = (char[]){"Hello World"};
*str = 'B'; // A-Okay!
you are modifying a compound literal which is legal.
C11-§6.5.2.5/4:
If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.9, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type
of the compound literal is that specified by the type name. In either case, the result is an lvalue.
As can be seen, the type of the compound literal is a complete array type and the result is an lvalue; therefore it is modifiable, unlike a string literal.
The Standard also mentions that
§6.5.2.5/7:
String literals, and compound literals with const-qualified types, need not designate distinct objects.
Further it says:
11 EXAMPLE 4 A read-only compound literal can be specified through constructions like:
(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}
12 EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of char, but need not be modifiable; the last two have automatic storage duration when they occur within the body of a function, and the first of these
two is modifiable.
13 EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. For example,
(const char []){"abc"} == "abc"
might yield 1 if the literals’ storage is shared.
The compound literal syntax is a short hand expression equivalent to a local declaration with an initializer followed by a reference to the unnamed object thus declared:
char *str = (char[]){ "Hello World" };
is equivalent to:
char __unnamed__[] = { "Hello World" };
char *str = __unnamed__;
The __unnamed__ object has automatic storage duration and is defined as modifiable, so it can be modified via the pointer str that is initialized to point to it.
In the case of char *str = "Hello World!"; the object pointed to by str is not supposed to be modified. In fact attempting to modify it has undefined behavior.
The C Standard could have defined such string literals as having type const char[] instead of char[], but this would generate many warnings and errors in legacy code.
Yet it is advisable to pass a flag to the compiler to make such string literals implicitly const and to make the whole project const correct, i.e., to declare as const all pointer arguments that are not used to modify their object. For gcc and clang, the command line option is -Wwrite-strings. I also strongly advise enabling many more warnings and making them fatal with -Wall -W -Werror.
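The equivalence described above between a compound literal and a local array can be sketched as follows (compound_literal_demo and the variable names are invented for this example). Both pointers refer to modifiable automatic arrays, so both writes are well defined:

```c
#include <string.h>

/* Returns 1 if both arrays were modified as expected. */
static int compound_literal_demo(void)
{
    /* Unnamed array with automatic storage, initialized from the literal. */
    char *str = (char[]){ "Hello World" };

    /* The long-hand form: a named local array and a pointer to it. */
    char named[] = "Hello World";
    char *alias = named;

    *str = 'B';    /* modifies the compound literal, not a string literal */
    *alias = 'B';  /* modifies the named array */

    return strcmp(str, "Bello World") == 0 && strcmp(str, alias) == 0;
}
```

Note that both objects only live until the end of the enclosing block, so neither pointer should be returned from the function.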
I am trying to understand how a string is passed to a called function and how the elements of the array are modified inside the called function.
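#include <stdio.h>

void foo(char p[]){
    p[0] = 'a'; // overwrite the first character of the caller's string
    printf("%s",p);
}
int main(void){
    char p[] = "jkahsdkjs";
    p[0] = 'a'; // overwrite the first character of the local array
    printf("%s",p);
    foo("fgfgf"); // pass a string literal directly
    return 0;
}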
The code above throws an exception. I know that strings in C are immutable, but I would like to know what the difference is between modifying the string in main and modifying it in the called function. What happens in the case of other data types?
I know that strings in C are immutable
That's not true. The correct version is: modifying a string literal in C is undefined behavior.
In main(), you defined the string as:
char p[] = "jkahsdkjs";
which is a non-literal character array, so you can modify it. But what you passed to foo is "fgfgf", which is a string literal.
Changing the call to the following would be fine:
char str[] = "fgfgf";
foo(str);
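A complete version of that fix might look like the sketch below. The printf call is dropped so the result can be checked programmatically, and foo_demo is a name made up for this example:

```c
#include <string.h>

/* Same idea as the original foo: overwrite the first character. */
static void foo(char p[])
{
    p[0] = 'a';
}

/* Returns 1 if the caller's array was modified through the parameter. */
static int foo_demo(void)
{
    char str[] = "fgfgf";  /* writable array initialized from the literal */
    foo(str);              /* str decays to a pointer to its first element */
    return strcmp(str, "agfgf") == 0;
}
```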
In the first case:
char p[] = "jkahsdkjs";
p is an array that is initialized with a copy of the string literal. Since you don't specify the size, it will be determined by the length of the string literal plus the terminating null character. This is covered in the draft C99 standard, section 6.7.8 Initialization, paragraph 14:
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
In the second case:
foo("fgfgf");
you are attempting to modify a string literal, which is undefined behavior. That means the behavior of the program is unpredictable, and an exception is one possibility. From the C99 draft standard, section 6.4.5 String literals, paragraph 6 (emphasis mine):
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
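The first case above, where the array size is inferred from the initializer, can be checked with sizeof. This is a rough sketch with invented function names; the second function additionally shows that an explicitly sized array is padded with zeros after the copied characters:

```c
#include <stddef.h>

/* "jkahsdkjs" has 9 characters, so the array gets 10 elements. */
static size_t inferred_size(void)
{
    char p[] = "jkahsdkjs";
    return sizeof p;
}

/* With an explicit size, the remaining elements are zero-initialized. */
static int padded_with_zeros(void)
{
    char q[8] = "abc";
    return q[3] == '\0' && q[7] == '\0' && sizeof q == 8;
}
```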
The difference is in how you are initializing p[].
char p[] = "jkahsdkjs";
This initializes a writable array called p, automatically sized to be large enough to hold your string and typically stored on the stack at runtime.
However, in the case of:
foo("fgfgf");
You are passing in a pointer to the actual string literal, which most implementations place in read-only memory.
What happens in the case of other data types?
String literals are a very special case. Other data types, such as int, do not have an analogous issue, since they are stored strictly by value.
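To illustrate the by-value point with a small sketch (set_to_zero and by_value_demo are hypothetical names): an int argument is copied into the parameter, so the function only ever changes its own copy, whether the caller passes a variable or a constant.

```c
/* Changes only the local copy of the argument. */
static void set_to_zero(int n)
{
    n = 0;
    (void)n;  /* silence "set but not used" warnings */
}

/* Returns the caller's variable after the calls. */
static int by_value_demo(void)
{
    int x = 5;
    set_to_zero(x);  /* x is copied; the original is untouched */
    set_to_zero(7);  /* passing a constant is equally harmless */
    return x;
}
```

by_value_demo() returns 5, because neither call can reach the caller's storage.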
I am working in the Microsoft Visual Studio environment. I came across a strange behavior:
char *src ="123";
char *des ="abc";
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c", des[0]);
des[0] = src[0];
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c", des[0]);
The result is:
1 a
1 a
That means des[0] is not being assigned. Since src points to the first element of the string, I guess by the rules this should work.
This is undefined behavior:
des[0] = src[0];
Try this instead:
char des[] ="abc";
Since src and des are initialized with string literals, their type should actually be const char *, not char *; like this:
const char * src ="123";
const char * des ="abc";
There was never any writable memory allocated for either of them; they just point to the predefined constants. Therefore, the statement des[0] = src[0] is undefined behavior; you're trying to change a constant there!
Any decent compiler should actually warn you about the implicit conversion from const char * to char *...
If using C++, consider using std::string instead of char *, and std::cout instead of printf.
Section 2.13.4 of ISO/IEC 14882 (Programming languages - C++) says:
A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char” and static storage duration (3.7), where n is the size of the string as defined below, and is initialized with the given characters. ...
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
In C, string literals such as "123" are stored as arrays of char (const char in C++). These arrays are stored in memory such that they are available over the lifetime of the program. Attempting to modify the contents of a string literal results in undefined behavior; sometimes it will "work", sometimes it won't, depending on the compiler and the platform, so it's best to treat string literals as unwritable.
Remember that under most circumstances, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" whose value is the location of the first element in the array.
Thus, when you write
char *src = "123";
char *des = "abc";
the expressions "123" and "abc" are converted from "4-element array of char" (three characters plus the terminating null) to "pointer to char", and src will point to the '1' in "123", and des will point to the 'a' in "abc".
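The conversion can be observed on a named array, which behaves the same way as the literal's array but is safe to inspect and modify. A minimal sketch (decay_demo is a made-up name):

```c
#include <stddef.h>

/* Returns 1 if the array decays to a pointer to its first element. */
static int decay_demo(void)
{
    char digits[] = "123";  /* elements: '1', '2', '3', '\0' */
    char *p = digits;       /* array expression converted to &digits[0] */

    return p == &digits[0]
        && *p == '1'
        && sizeof digits == 4;  /* sizeof still sees the whole array */
}
```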
Again, attempting to modify the contents of a string literal results in undefined behavior, so when you write
des[0] = src[0];
the compiler is free to treat that statement any way it wants to, from ignoring it completely to doing exactly what you expect it to do to anything in between. That means that string literals, or a pointer to them, cannot be used as target parameters to calls like strcpy, strcat, memcpy, etc., nor should they be used as parameters to calls like strtok.
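For functions such as strtok that write into their argument, one option is to copy the text into a writable buffer first. The sketch below assumes a hypothetical helper, count_fields, and an arbitrary 64-byte buffer chosen only for illustration:

```c
#include <string.h>

/* Hypothetical helper: counts comma-separated fields in text.
   Returns -1 if text does not fit into the local buffer. */
static int count_fields(const char *text)
{
    char buffer[64];
    int count = 0;

    if (strlen(text) >= sizeof buffer)
        return -1;
    strcpy(buffer, text);  /* strtok will modify this copy, not the literal */

    for (char *tok = strtok(buffer, ","); tok != NULL; tok = strtok(NULL, ","))
        ++count;
    return count;
}
```

With this, count_fields("a,b,c") is safe even though its argument is a string literal.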
vinaygarg: That means des[0] is not being assigned. Since src points to the first element of the string, I guess by the rules this should work.
Firstly, you must remember that src and des are defined as pointers, nothing more, nothing less.
So you must then ask yourself what exactly "123" and "abc" are, and why they cannot be altered. To cut a long story short, they are stored in application memory, which is read-only. Why? The strings must be stored with the program in order to be available to your code at run time. In theory you should get a compiler warning for assigning a const char * to a non-const char *. Why is it read-only? The memory of EXEs and DLLs needs to be protected from being overwritten, so it is made read-only to stop bugs and viruses from modifying executing code.
So how can you get this string into modifiable memory?
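// Copying into an array.
const size_t BUFFER_SIZE = 256;
char buffer[BUFFER_SIZE];
strcpy(buffer, "abc"); // fine here: "abc" and its terminator fit in the buffer
// Bounded alternative: strncpy does not terminate the result when the source
// fills the whole count, so terminate explicitly.
strncpy(buffer, "abc", BUFFER_SIZE-1);
buffer[BUFFER_SIZE-1] = '\0';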
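When the length is not known in advance, another option is a heap copy. This is a minimal sketch with a hypothetical helper, copy_string; the caller owns the result and must free it:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a writable heap copy of s,
   or NULL if the allocation fails. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;  /* include the terminating null */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

A typical use would be char *p = copy_string("abc"); followed by a NULL check, any modifications through p, and finally free(p).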