Why are compound literals in C modifiable - c

One usually associates 'unmodifiable' with the term literal:
char* str = "Hello World!";
*str = 'B'; // Bus Error!
However when using compound literals, I quickly discovered they are completely modifiable (and looking at the generated machine code, you see they are pushed on the stack):
char* str = (char[]){"Hello World"};
*str = 'B'; // A-Okay!
I'm compiling with clang-703.0.29. Shouldn't those two examples generate the exact same machine code? Is a compound literal really a literal, if it's modifiable?
EDIT: An even shorter example would be:
"Hello World"[0] = 'B'; // Bus Error!
(char[]){"Hello World"}[0] = 'B'; // Okay!

A compound literal is an lvalue and values of its elements are modifiable. In case of
char* str = (char[]){"Hello World"};
*str = 'B'; // A-Okay!
you are modifying a compound literal which is legal.
C11-§6.5.2.5/4:
If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.9, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type
of the compound literal is that specified by the type name. In either case, the result is an lvalue.
As can be seen, the compound literal has a complete array type and is an lvalue. Since that type is not const-qualified, the object is modifiable, unlike a string literal.
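The difference can be sketched in a few lines. The function name modified_greeting and the caller-supplied buffer are assumptions made for illustration; only the two compound-literal lines come from the question.

```c
#include <string.h>

/* Hypothetical helper: writes "Bello World" into out,
   which must have room for at least 12 bytes. */
void modified_greeting(char *out)
{
    char *str = (char[]){"Hello World"}; /* unnamed char[12] with automatic storage */
    *str = 'B';                          /* defined: the array is not const-qualified */
    strcpy(out, str);                    /* copy the result out before the block ends */
}
```

The result is copied into the caller's buffer because the compound literal stops existing when the enclosing block ends.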
The standard also mentions that
§6.5.2.5/7:
String literals, and compound literals with const-qualified types, need not designate distinct objects.
Further it says:
11 EXAMPLE 4 A read-only compound literal can be specified through constructions like:
(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}
12 EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of char, but need not be modifiable; the last two have automatic storage duration when they occur within the body of a function, and the first of these
two is modifiable.
13 EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. For example,
(const char []){"abc"} == "abc"
might yield 1 if the literals’ storage is shared.
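As a rough illustration of EXAMPLE 4, a const-qualified compound literal can serve as a small read-only table. The helper name power_of_ten and its argument range are assumptions, not something taken from the standard.

```c
/* Hypothetical lookup helper; n must be in the range 0..6. */
float power_of_ten(unsigned n)
{
    /* const-qualified compound literal: the compiler may place it in
       read-only storage and may share it with identical literals */
    const float *table = (const float[]){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6};

    /* table[0] = 2.0f; would be rejected at compile time: the elements are const */
    return table[n];
}
```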

The compound literal syntax is a short hand expression equivalent to a local declaration with an initializer followed by a reference to the unnamed object thus declared:
char *str = (char[]){ "Hello World" };
is equivalent to:
char __unnamed__[] = { "Hello World" };
char *str = __unnamed__;
The object __unnamed__ has automatic storage duration and is defined as modifiable, so it can be modified through the pointer str, which is initialized to point to it.
In the case of char *str = "Hello World!"; the object pointed to by str is not supposed to be modified. In fact attempting to modify it has undefined behavior.
The C Standard could have defined such string literals as having type const char[] instead of char[], but this would generate many warnings and errors in legacy code.
Yet it is advisable to pass a flag to the compiler to make such string literals implicitly const and make the whole project const-correct, i.e. declare as const every pointer argument that is not used to modify the object it points to. For gcc and clang, the command line option is -Wwrite-strings. I also strongly advise enabling many more warnings and making them fatal with -Wall -W -Werror.

Related

Are these two initializations equivalent?

char *str = "String!";
char *str = (char []){"String!"};
Are these two initializations equivalent? If not, what is the difference between them?
Are these two initializations equivalent?
No.
If not, what is the difference between them?
One of the major differences between the two is that you cannot modify the object that str is pointing to in the first snippet of code, while in the second one, you can.
Trying to modify a string literal in C is undefined behavior. So in case of
char *str = "String!";
if you try to modify the object pointed to by str, it'll invoke UB.
In case of
char *str = (char []){"String!"};
however, (char[]){"String!"} is a compound literal, which has type array of chars, and str points to the first element of that array.
Since the compound literal is not read-only (doesn't have a const qualifier), you can modify the object pointed to by str.
Another difference you should note is that the string literal "String!" in the first one has static storage duration, while the compound literal (char []){"String!"} in the second one has static storage duration only if it occurs outside the body of a function; otherwise, it has automatic storage duration associated with the enclosing block.
From n1570 6.5.2.5 (Compound literals) p12:
12 EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of
char, but need not be modifiable; the last two have automatic storage
duration when they occur within the body of a function, and the first
of these two is modifiable.
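The storage-duration rule can be seen from the other side with a compound literal at file scope, which has static storage duration. This is only a sketch; the names file_scope_text and lowercase_first are made up for the example.

```c
/* File scope: this compound literal has static storage duration. */
static char *file_scope_text = (char[]){"String!"};

/* Hypothetical helper: edits the array in place and returns it. */
char *lowercase_first(void)
{
    file_scope_text[0] = 's'; /* modifiable: there is no const qualifier */
    return file_scope_text;   /* safe to return: the object outlives the call */
}
```

Had the same compound literal appeared inside lowercase_first, returning a pointer to it would leave the caller with a dangling pointer.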
Are these two initializations equivalent?
No. There are at least three differences:
The C standard does not define the behavior upon attempting to modify the characters of "String!", but it does define the behavior of attempting to modify the characters of (char []){"String!"}.
Each occurrence of "String!" may be the same object, but each occurrence of (char []){"String!"} is a distinct object.
The array defined by "String!" has static lifetime (the memory for it is reserved for all of program execution), but the array defined by (char []){"String!"} has static lifetime if it is outside of any function and automatic lifetime if it is inside a function.
Let’s examine these differences:
The C standard only requires the contents of string literals to be available for reading, not writing. It does not define the behavior if you attempt to modify them. This does not mean a program may not modify them, just that the standard does not say what happens if a program tries. In consequence, good engineers will not attempt to modify them in ordinary situations. (However, a C implementation may define the behavior, in which case a program for that implementation could make use of that.)
C implementations are allowed to coalesce string literals and parts of them, so defining two pointers initialized with the same string, as with char *str0 = "String!"; and char *str1 = "String!";, may yield two pointers with the same value. This is due to a special rule in the C standard for string literals, so it does not apply for compound literals. When two pointers are initialized with the same compound literal source code, they must have different values. This means that multiple uses of compound literals must use more memory than multiple uses of strings, unless the compiler is able to optimize them away. Even a single use of a compound literal to initialize a pointer inside a function may cause more use of memory because the compiler typically must use space on the stack for the compound literal and separately have a copy of the string used to initialize it.
Because a string literal has static lifetime, its address may be returned from a function and used by the caller. However, a compound literal may not be relied on after the function it is defined in returns. For example:
char *GetErrorMessageFromCode(int code)
{
    char *ErrorMessages[] =
    {
        "invalid argument",
        "out of range",
        "resource unavailable",
        …
    };
    return ErrorMessages[code];
    /* The above is working code for returning pointers to (the first
       characters of) static string literals. If compound literals were
       used instead, the behavior would not be defined by the C standard
       because the memory for the compound literals would not be reserved
       after the function returns.
    */
}
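A self-contained variant of the same idea might look like the sketch below. The function name error_message, the bounds check, and the "unknown error" fallback are assumptions added for illustration; the three messages are the ones listed above.

```c
#include <stddef.h>

/* Hypothetical variant of the lookup with a bounds check. */
const char *error_message(int code)
{
    /* static: the table, like the literals it points to,
       exists for the whole run of the program */
    static const char *const messages[] = {
        "invalid argument",
        "out of range",
        "resource unavailable",
    };
    size_t count = sizeof messages / sizeof messages[0];

    if (code < 0 || (size_t)code >= count)
        return "unknown error"; /* also a string literal, so also safe to return */
    return messages[code];
}
```

Returning const char * documents that the caller must not write through the pointer, since it points into a string literal.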

What are the differences between string literal and const string literal

What is difference between two definitions below:
char *str1 = "string 1"; // (1)
const char *str2 = "string 2"; // (2)
(1), is this an undefined behavior for string literal ?
If no, what is definition we should use (can you give me some examples) ?
If you need a mutable string you should use the following
char str1[]="string 1";
In C++ you cannot convert a string literal to non-const, in C you can, but this practice is not recommended.
Update.
In C++ you may do the following
char *str1 = (char *)"string 1" ; // (1)
But you must not use this pointer to change the value of the string.
(1), is this an undefined behavior for string literal ?
This declaration
char *str1 = "string 1"; // (1)
is a valid declaration of a pointer to a string literal in C. Unlike in C++, string literals in C have non-const character array types.
However string literals are immutable in C as in C++. You may not change a string literal. Any attempt to change a string literal results in undefined behavior.
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
It is preferable to declare pointers to string literals the way C++ requires, that is:
const char *str2 = "string 2"; // (2)
This makes your program safer because the compiler will issue a diagnostic if you try to pass a pointer to a string literal to a function whose corresponding parameter is a pointer to non-const char.
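The two recommendations (an array for a mutable string, const char * for a pointer to a literal) can be combined in a short sketch. The helper names count_char and edit_copy_and_count are hypothetical.

```c
#include <stddef.h>

/* Takes const char *, so it accepts string literals and modifiable arrays alike. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

/* Hypothetical demo: counts 'S' characters after editing a local copy. */
size_t edit_copy_and_count(void)
{
    const char *str2 = "string 2"; /* read-only view of the literal */
    char str1[] = "string 1";      /* modifiable array initialized from the literal */

    str1[0] = 'S';                 /* fine: str1 is the function's own array */
    return count_char(str1, 'S') + count_char(str2, 'S');
}
```

With str2 declared const, an accidental str2[0] = 'S' is rejected at compile time instead of causing undefined behavior at run time.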

Why do variables of type char* cstring = "myString" are impossible to modify but variables defined char[] cstring= "mystring" are possible to modify? [duplicate]

Both GCC and Clang do not complain if I assign a string literal to a char*, even when using lots of pedantic options (-Wall -W -pedantic -std=c99):
char *foo = "bar";
while they (of course) do complain if I assign a const char* to a char*.
Does this mean that string literals are considered to be of char* type? Shouldn't they be const char*? It's not defined behavior if they get modified!
And (an uncorrelated question) what about command line parameters (ie: argv): is it considered to be an array of string literals?
They are of type char[N] where N is the number of characters including the terminating \0. So yes you can assign them to char*, but you still cannot write to them (the effect will be undefined).
Wrt argv: It points to an array of pointers to strings. Those strings are explicitly modifiable. You can change them and they are required to hold the last stored value.
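The char[N] type is visible through sizeof, which reports the whole array for the literal but only the pointer once it has decayed. A small sketch, with helper names invented for the example:

```c
#include <stddef.h>

/* Hypothetical helpers that report sizes as seen by sizeof. */
size_t literal_size(void)
{
    return sizeof "bar";    /* char[4]: 'b', 'a', 'r', '\0' */
}

size_t pointer_size(void)
{
    const char *foo = "bar"; /* the array decays to a pointer to its first element */
    return sizeof foo;       /* size of the pointer itself, not of the array */
}
```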
For completeness' sake, the C99 draft standard (C89 and C11 have similar wording), in section 6.4.5 String literals, paragraph 5, says:
[...]a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;[...]
So this says a string literal has static storage duration (it lasts for the lifetime of the program), its type is char[] (not char *), and its length is the size of the string literal with an appended zero. Paragraph 6 says:
If the program attempts to modify such an array, the behavior is undefined.
So attempting to modify a string literal is undefined behavior regardless of the fact that they are not const.
With respect to argv, section 5.1.2.2.1 Program startup, paragraph 2, says:
If they are declared, the parameters to the main function shall obey the following
constraints:
[...]
-The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program
startup and program termination.
So argv is not considered an array of string literals and it is ok to modify the contents of argv.
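Because the argument strings are modifiable, code like the following sketch is well defined. The helper name upcase_args is an assumption; it would typically be called as upcase_args(argc, argv) from main.

```c
#include <ctype.h>

/* Hypothetical helper: upper-cases every argument string in place. */
void upcase_args(int argc, char **argv)
{
    for (int i = 0; i < argc; ++i)
        for (char *p = argv[i]; *p != '\0'; ++p)
            *p = (char)toupper((unsigned char)*p); /* writing into argv strings is allowed */
}
```

The same loop applied to an array of pointers to string literals would be undefined behavior, which is exactly the distinction the standard draws.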
Using -Wwrite-strings option you will get:
warning: initialization discards qualifiers from pointer target type
Irrespective of that option, GCC will put literals into read-only memory section, unless told otherwise by using -fwritable-strings (however this option has been removed from recent GCC versions).
Command line parameters are not const, they typically live on the stack.
(Sorry, I've only just noticed this question is tagged as c, not c++. Maybe my answer isn't so relevant to this question after all!)
String literals are not quite const or not-const, there is a special strange rule for literals.
(Summary: Literals can be taken by reference-to-array as foo( const char (&)[N]) and cannot be taken as the non-const array. They prefer to decay to const char *. So far, that makes it seem like they are const. But there is a special legacy rule which allows literals to decay to char *. See experiments below.)
(Following experiments done on clang3.3 with -std=gnu++0x. Perhaps this is a C++11 issue? Or specific to clang? Either way, there is something strange going on.)
At first, literals appear to be const:
#include <iostream>

void foo( const char * ) { std::cout << "const char *" << std::endl; }
void foo( char * ) { std::cout << " char *" << std::endl; }
int main() {
    const char arr_cc[3] = "hi";
    char arr_c[3] = "hi";
    foo(arr_cc); // const char *
    foo(arr_c); // char *
    foo("hi"); // const char *
}
The two arrays behave as expected, demonstrating that foo is able to tell us whether the pointer is const or not. Then "hi" selects the const version of foo. So it seems like that settles it: literals are const ... aren't they?
But if you remove void foo( const char * ), it gets strange. First, the call to foo(arr_cc) fails with an error at compile time. That is expected. But the literal call (foo("hi")) works via the non-const overload.
So, literals are "more const" than arr_c (because they prefer to decay to const char *, unlike arr_c). But literals are "less const" than arr_cc because they are willing to decay to char * if needed.
(Clang gives a warning when it decays to char *).
But what about the decaying? Let's avoid it for simplicity.
Let's take the arrays by reference into foo instead. This gives us more 'intuitive' results:
void foo( const char (&)[3] ) { std::cout << "const char (&)[3]" << std::endl; }
void foo( char (&)[3] ) { std::cout << " char (&)[3]" << std::endl; }
As before, the literal and the const array (arr_cc) use the const version, and the non-const version is used by arr_c. And if we delete foo( const char (&)[3] ), then we get errors with both foo(arr_cc); and foo("hi");. In short, if we avoid the pointer-decay and use reference-to-array instead, literals behave as if they are const.
Templates?
In templates, the system will deduce const char * instead of char * and you're "stuck" with that.
template<typename T>
void bar(T *t) { // will deduce const char when a literal is supplied
foo(t);
}
So basically, a literal behaves as const at all times, except in the particular case where you directly initialize a char * with a literal.
Johannes' answer is correct concerning the type and contents. But in addition to that, yes, it is undefined behavior to modify contents of a string literal.
Concerning your question about argv:
The parameters argc and argv and the
strings pointed to by the argv array
shall be modifiable by the program,
and retain their last-stored values
between program startup and program
termination.
In both C89 and C99, string literals have type array of char, which decays to char * (they are not const for historical reasons, as I understand it). You are correct that trying to modify one results in undefined behavior. GCC has a specific warning flag, -Wwrite-strings (which is not part of -Wall), that gives string literals the type const char[] so that the compiler warns when one is assigned to a plain char *.
As for argv, the arguments are copied into your program's address space, and can safely be modified in your main() function.
EDIT: Whoops, had -Wno-write-strings copied by accident. Updated with the correct (positive) form of the warning flag.
String literals have formal type char [] but semantic type const char []. The purists hate it but this is generally useful and harmless, except for bringing lots of newbies to SO with "WHY IS MY PROGRAM CRASHING?!?!" questions.
In C++ they are arrays of const char, but there was a specific exception allowing them to be assigned to char* for legacy code that existed before const did. In C they are plain arrays of char that must nevertheless not be modified. And the command line arguments are definitely not literals; they are created at run time.

Pointers create constant string literals, why?

I was going through some documentation which states that
First Case
char * p_var="Sack";
will create a constant string literal.
And hence code like
p_var[1]='u';
will fail because of that property.
Second Case
Also mentioned is that this is possible only for character literals and not for other data types through pointers. So code like
float *p="3.14";
will fail, resulting in a compiler error.
But when I try it out I don't get compiler errors, although accessing it gives me 0.000000f (using gcc on Ubuntu).
So regarding the above, I have three queries:
1. Why are string literals created in First Case read-only?
2. Why are only string literals allowed to be created and not other constants like float through pointers?
3. Why is Second Case not giving me compiler errors?
Update
Please discard the 3rd question and second case. I tested it by adding quotes.
Thanks
The premise is wrong: pointers don’t create any string literals, neither read-only nor writeable.
What does create a read-only string literal is the literal itself: "foo" is a read-only string literal. And if you assign it to a pointer, then that pointer points to a read-only memory location.
With that, let’s turn to your questions:
Why are string literals created in First Case read-only?
The real question is: why not? In most cases, you won’t want to change the value of a string literal later on so the default assumption makes sense. Furthermore, you can create writeable strings in C via other means.
Why are only string literals allowed to be created and not other constants like float?
Again, wrong assumption. You can create other constants:
float f = 1.23f;
Here, the 1.23f literal is read-only. You can also assign it to a constant variable:
const float f = 1.23f;
Why is Second Case not giving me compiler errors?
Because the compiler cannot check in general whether your pointer points to read-only memory or to writeable memory. Consider this:
char* p = "Hello";
char str[] = "world"; // `str` is a writeable string!
p = &str[0];
p[1] = 'x';
Here, p[1] = 'x' is entirely legal – if we hadn’t re-assigned p beforehand, it would have been illegal. Checking this cannot be generally done at compile-time.
Regarding your question:
Why are string literals created in First Case read-only?
char *p_var="Sack";
Well, p_var is assigned the starting address of the memory allocated for the string "Sack". The type of p_var does not mark that memory as read-only, since you haven't put the const keyword anywhere in the declaration. However, modifying the contents p_var points to, for example with strcpy or strcat, causes undefined behavior.
Quote C ISO 9899:
The declaration
char s[] = "abc", t[3] = "abc";
defines plain char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to:
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration:
char *p = "abc";
defines p with type pointer to char and initializes it to point to an object with type array of char with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
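The sizes of the two arrays in the quoted declaration can be checked directly. The helper name array_sizes is an assumption; note that t holds no terminating null character, so it must not be passed to functions that expect a string.

```c
#include <stddef.h>

/* Hypothetical helper: fills sizes[0] and sizes[1] with the sizes of s and t. */
void array_sizes(size_t sizes[2])
{
    char s[] = "abc";  /* four elements: 'a', 'b', 'c', '\0' */
    char t[3] = "abc"; /* three elements: no room for the terminator */

    s[0] = 'x';        /* both arrays are ordinary modifiable objects */
    t[0] = 'x';
    sizes[0] = sizeof s;
    sizes[1] = sizeof t;
}
```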
An explanation of why it could be read-only per your platform and compiler:
Commonly, string literals are put in a "read-only data" section that gets mapped into the process space as read-only (which is why you seem not to be allowed to change them).
But some platforms do allow the data segment to be writable.
Why are only string literals allowed to be created and not other constants like float? and the third question.
To create a float constant you should use:
const float f=1.5f;
Now, when you are doing:
float *p="3.14";
you are basically assigning the string literal's address to a float pointer.
Try compiling with -Wall -Werror -Wextra. You will find out what is happening.
It works because, in practice, there's no difference between a char * and a float * under the hood.
It's as if you were writing this:
float *p=(float*) "3.14";
This is a well-defined behaviour, unless the memory alignment requirements of float and char differ, in which case it results in undefined behaviour (Reference: C99, 6.3.2.3 p7).
1. Efficiency.
2. They are.
3. It's a string; mind the quotes.
float *p="3.14";
This is also a string literal !
Why are string literals created in First Case read-only?
No, both "sack" and "3.14" are string literals and both are read-only.
Why are only string literals allowed to be created and not other constants like float?
If you want to create a float const then do:
const float p=3.14;
Why is Second Case not giving me compiler errors?
You are making the pointer p point to a string literal. When you dereference p, it expects to read a float value. So there's nothing wrong as far as the compiler can see.
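If the goal in the second case was to obtain the number 3.14 from the text "3.14", the conversion has to be requested explicitly, because pointing a float * at the characters converts nothing. A minimal sketch using the standard strtof function follows; the helper name parse_float and the return-0-on-failure convention are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: converts text to a float; returns 0 if no number is found. */
float parse_float(const char *text)
{
    char *end;
    float value = strtof(text, &end); /* end is set to the first unparsed character */

    return end == text ? 0.0f : value;
}
```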

Char String Assignment

I am working on Microsoft Visual Studio environment. I came across a strange behavior
char *src ="123";
char *des ="abc";
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c",des[0]);
des[0] = src[0];
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c",des[0]);
The result is:
1 a
1 a
That means the des[0] is not being initialized. As src is pointing to the first element of the string. I guess by rules this should work.
This is undefined behavior:
des[0] = src[0];
Try this instead:
char des[] ="abc";
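Put together, the suggested fix might look like this sketch. The function name copy_first_char is hypothetical; the variable names follow the question.

```c
/* Hypothetical helper: returns des[0] after the assignment from the question. */
char copy_first_char(void)
{
    const char *src = "123"; /* only read, so pointing at the literal is fine */
    char des[] = "abc";      /* array copy of the literal, owned by this function */

    des[0] = src[0];         /* well defined: writes into the array, not the literal */
    return des[0];
}
```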
Since src and des are initialized with string literals, their type should actually be const char *, not char *; like this:
const char * src ="123";
const char * des ="abc";
There was never memory allocated for either of them, they just point to the predefined constants. Therefore, the statement des[0] = src[0] is undefined behavior; you're trying to change a constant there!
Any decent compiler should actually warn you about the implicit conversion from const char * to char *...
If using C++, consider using std::string instead of char *, and std::cout instead of printf.
Section 2.13.4 of ISO/IEC 14882 (Programming languages - C++) says:
A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char” and static storage duration (3.7), where n is the size of the string as defined below, and is initialized with the given characters. ...
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
In C, string literals such as "123" are stored as arrays of char (const char in C++). These arrays are stored in memory such that they are available over the lifetime of the program. Attempting to modify the contents of a string literal results in undefined behavior; sometimes it will "work", sometimes it won't, depending on the compiler and the platform, so it's best to treat string literals as unwritable.
Remember that under most circumstances, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" whose value is the location of the first element in the array.
Thus, when you write
char *src = "123";
char *des = "abc";
the expressions "123" and "abc" are converted from "4-element array of char" (three characters plus the terminating null) to "pointer to char", and src will point to the '1' in "123", and des will point to the 'a' in "abc".
Again, attempting to modify the contents of a string literal results in undefined behavior, so when you write
des[0] = src[0];
the compiler is free to treat that statement any way it wants to, from ignoring it completely to doing exactly what you expect it to do to anything in between. That means that string literals, or a pointer to them, cannot be used as target parameters to calls like strcpy, strcat, memcpy, etc., nor should they be used as parameters to calls like strtok.
vinaygarg: That means the des[0] is not being initialized. As src is pointing to the first element of the string. I guess by rules this should work.
First, remember that src and des are defined as pointers, nothing more, nothing less.
So you must then ask yourself what exactly "123" and "abc" are and why they cannot be altered. To cut a long story short, they are stored in application memory, which is read-only. Why? The strings must be stored with the program in order to be available to your code at run time; in theory you should get a compiler warning for assigning a const char * to a non-const char *. Why is it read-only? The memory of executables and DLLs needs to be protected from being overwritten, so it is made read-only to stop bugs and viruses from modifying executing code.
So how can you get this string into modifiable memory?
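// Copying into an array.
const size_t BUFFER_SIZE = 256;
char buffer[BUFFER_SIZE];
strcpy(buffer, "abc"); // fine here because "abc" is known to fit
// Bounded alternative: copy at most BUFFER_SIZE-1 bytes, then terminate explicitly.
strncpy(buffer, "abc", BUFFER_SIZE-1);
buffer[BUFFER_SIZE-1] = '\0';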
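When the source length is not known in advance, a bounded copy that always terminates the destination is easier to get right. One possible sketch uses the standard snprintf function; the helper name copy_string and its return convention are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst (capacity size) and always
   terminates it. Returns 1 if the whole string fit, 0 if it was
   truncated or size is 0. */
int copy_string(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return 0;

    int needed = snprintf(dst, size, "%s", src); /* writes at most size-1 chars plus '\0' */
    return needed >= 0 && (size_t)needed < size;
}
```

After a call such as copy_string(buffer, sizeof buffer, "abc"), the buffer holds a private copy that can be modified freely.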

Resources