Why are strings in C declared with 'const'? - c

For example, why not:
char *s= "example";
instead of:
const char *s= "example";
I understand that const makes it unchangeable, but why do I receive an error when compiling the first?
Additionally, how does the concept apply to
int * x;
vs
const int *x;
I see the second used a lot more; is it good practice to use "const int *"?

There's no requirement to use const, but it's a good idea.
In C, a string literal is an expression of type char[N], where N is the length of the string plus 1 (for the terminating '\0' null character). But attempting to modify the array that corresponds to the string literal has undefined behavior. Many compilers arrange for that array to be stored in read-only memory (not physical ROM, but memory that's marked read-only by the operating system). (An array expression is, in most contexts, converted to a pointer expression referring to the initial element of the array object.)
It would have made more sense to make string literals const, but the const keyword did not exist in old versions of C, and it would have broken existing code. (C++ did make string literals const).
This:
char *s= "example"; /* not recommended */
is actually perfectly valid in C, but it's potentially dangerous. If, after this declaration, you do:
s[0] = 'E';
then you're attempting to modify the string literal, and the behavior is undefined.
This:
const char *s= "example"; /* recommended */
is also valid; the char* value that results from evaluating the string literal is safely and quietly converted to const char*. And it's generally better than the first version because it lets the compiler warn you if you attempt to modify the string literal (it's better to catch errors at compile time than at run time).
If you get an error on your first example, then it's likely that you're inadvertently compiling your code as C++ rather than as C -- or that you're using gcc's -Wwrite-strings option or something similar. (-Wwrite-strings makes string literals const; it can improve safety, but it can also cause gcc to reject, or at least warn about, valid C code.)

With Visual Studio 2015 at warning level 4, this compiles and runs whether compiled as C or C++:
#include <stdio.h>
char *s1= "example\n";
const char *s2= "example\n";
int main(int argc, char **argv)
{
printf(s1); // prints "example"
s1[2] = 'x';
printf(s1); // prints "exxmple"
printf(s2);
return 0;
}
If I add this line, it will fail to compile as C or C++ with every compiler I know of:
s2[2] = 'x'; // produces compile error
This is the error the const keyword is designed to avoid. It simply tells the compiler not to allow assignments to the object pointed to.
It doesn't matter if your pointer points to char or int or anything else. The const keyword has the same effect on all pointers, and that's to make it impossible (well, very hard) to assign to the thing declared const.

A string literal used as a value compiles to an array of char that should not be modified. Attempting to modify it invokes undefined behavior. For historical reasons of backward compatibility, its type is char [] although it really should be const char []. You can enable extra compiler warnings to change this and instruct the compiler to consider such strings to be const.

Related

char*, char[] and const char* - stack, code segment and compiler behavior [duplicate]

And where are literals in memory exactly? (see examples below)
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Whereas an implicit cast of a const char* type to a char* type gives me a warning, see below (tested on GCC, but it behaves similarly on VC++2010).
Also, if I modify the value of a const char (with the trick below, for which GCC really ought to give me a warning), I get no error, and I can even modify and display it on GCC (even though I guess it is still undefined behavior; I wonder why it does not do the same with the literal). That is why I am asking where those literals are stored, and where more ordinary const objects are supposedly stored.
const char* a = "test";
char* b = a; /* warning: initialization discards qualifiers
from pointer target type (on gcc), error on VC++2k10 */
char *c = "test"; // no compile errors
c[0] = 'p'; /* bus error at execution (we are not supposed to
modify const anyway, so why can I, and with no compile errors? And where is the
literal stored, given that I get a "bus error"?)
I get 'access violation writing' on VC++2010 */
const char d = 'a';
*(char*)&d = 'b'; // no warnings (why not?)
printf("%c", d); /* displays 'b' (why doesn't it behave the same
way as modifying a literal?). It displays 'a' on VC++2010 */
The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.
Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++03 allows an implicit conversion from a string literal to char *. That conversion was deprecated, and C++11 removed it.
I'm not certain what the C/C++ standards say about strings. But I can tell exactly what actually happens with string literals in MSVC. And, I believe, other compilers behave similarly.
String literals reside in a const data section. Their memory is mapped into the process address space. However, the memory pages they're stored in are read-only (unless explicitly modified during the run).
But there's something more you should know. Not all the C/C++ expressions containing quotes have the same meaning. Let's clarify everything.
const char* a = "test";
The above statement makes the compiler create a string literal "test". The linker makes sure it'll be in the executable file.
In the function body the compiler generates code that declares a variable a on the stack, which gets initialized with the address of the string literal "test".
char* b = a;
Here you declare another variable b on the stack, which gets the value of a. Since a pointed to a read-only address, so does b. The fact that b has no const semantics doesn't mean you may modify what it points to.
char *c = "test"; // no compile errors
c[0] = 'p';
The above generates an access violation. Again, the lack of const doesn't mean anything at the machine level.
const char d = 'a';
*(char*)&d = 'b';
First of all - the above is not related to string literals. 'a' is not a string. It's a character. It's just a number. It's like writing the following:
const int d = 55;
*(int*)&d = 56;
The above code makes a fool of the compiler. You say the variable is const, yet you manage to modify it (which is still undefined behavior as far as the standard is concerned). But this does not trigger a processor exception, since d resides in read/write memory anyway.
I'd like to add one more case:
char b[] = "test";
b[2] = 'o';
The above declares an array on the stack, and initializes it with the string "test". It resides in the read/write memory, and can be modified. There's no problem here.
Mostly historical reasons. But keep in mind that they are somewhat justified: String literals don't have type char *, but char [N] where N denotes the size of the buffer (otherwise, sizeof wouldn't work as expected on string literals) and can be used to initialize non-const arrays. You can only assign them to const pointers because of the implicit conversions of arrays to pointers and non-const to const.
It would be more consistent if string literals exhibited the same behaviour as compound literals, but as these are a C99 construct and backwards-compatibility had to be maintained, this wasn't an option, so string literals stay an exceptional case.
And where are literals in memory exactly? (see examples below)
Initialized data segment. On Linux it is either .data or .rodata.
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Historical, as was already explained by others. Most compilers allow you to tell whether the string literals should be read-only or modifiable with a command line option.
The reason it is generally desired to have string literals read-only is that the segment with read-only data in memory can be (and normally is) shared between all the processes started from the executable. That obviously frees some RAM from being wasted to keep redundant copies of the same information.
I have no warnings even with most of the compiler flags
Really? When I compile the following code snippet:
int main()
{
char* p = "some literal";
}
on g++ 4.5.0 even without any flags, I get the following warning:
warning: deprecated conversion from string constant to 'char*'
You can write to c because you didn't make it const. Defining c as const would be correct practice since the right hand side has type const char*.
It generates an error at runtime because the "test" value is probably allocated in the code segment, which is read-only.

What should happen, when we try to modify a string constant?

#include<stdio.h>
#include<string.h>
int main()
{
int i, n;
char *x="Alice"; // ....... 1
n = strlen(x); // ....... 2
*x = x[n]; // ....... 3
for(i=0; i<=n; i++)
{
printf("%s ", x);
x++;
}
printf("\n");
return 0;
}
A string constant cannot be modified. In the above code, *x means 'A'. In line 3 we are trying to modify a string constant. Is it correct to write that statement? When I ran this code on Linux, I got a segmentation fault. But on www.indiabix.com, they have given this answer:
If you compile and execute this program in windows platform with Turbo C, it will give lice ice ce e It may give different output in other platforms (depends upon compiler and machine). The online C compiler given in this site will give Alice lice ice ce e as output (it runs on Linux platform).
Your analysis is correct. The line
*x = x[n];
is trying to modify a string literal, so it's undefined behavior.
BTW, I checked the website that you linked. Just browsing it for two minutes, I've already found multiple incorrect code samples (to name a few, using gets, using char(not int) to assign return value of getchar, etc), so my suggestion is don't use it.
Your analysis is correct, but doesn't contradict what you quoted.
The code is broken. The answer already acknowledges that it may behave differently on different implementations, and has given two different outputs by two different implementations. You happen to have found an implementation that behaves in a third way. That's perfectly fine.
Modification of a string literal is Undefined Behaviour. So the behaviour you observe, and the two described, are consistent with the requirements of the C standard (as is emailing your boss and your spouse, or making demons fly out of your nose). Those three are all actually quite reasonable actions (modify the 'constant', ignore the write, or signal an error).
With GCC, you can ask to be warned when you assign the address of a string literal to a pointer to (writable) char:
cc -g -Wall -Wextra -Wwrite-strings -c -o 27211884.o 27211884.c
27211884.c: In function ‘main’:
27211884.c:7:13: warning: initialization discards ‘const’ qualifier from pointer target type [enabled by default]
char *x="Alice"; // ....... 1
^
This warning is on by default when compiling C++, but not for C, because char* is often used for string literals in old codebases. I recommend using it when writing new code.
There are two correct ways to write the code of the example, depending on whether you want your string to actually be constant or not:
const char *x = "Alice";
char x[] = "Alice";
In this code, the memory for "Alice" will be in the read-only data section of the executable file and x is a pointer pointing to that read-only location. When we try to modify the read-only data section, it should not allow this. But char *x="Alice"; is telling the compiler that x is declared as a pointer to a character, i.e. x is pointing to a character which can be modified (i.e. is not read-only). So the compiler will think that it can be modified. Thus the line *x = x[n]; will behave differently on different compilers. So it will be undefined behavior.
The correct way of declaring a pointer to a string literal is as below:
const char *x ="Alice";
Only then can the behavior of the compiler be predicted.

Why doesn't the compiler detect and produce errors when attempting to modify char * string literals?

Assume the following two pieces of code:
char *c = "hello world";
c[1] = 'y';
The one above doesn't work.
char c[] = "hello world";
c[1] = 'y';
This one does.
With regards to the first one, I understand that the string "hello world" might be stored in the read only memory section and hence can't be changed. The second one however creates a character array on the stack and hence can be modified.
My question is this - why don't compilers detect the first type of error? Why isn't that part of the C standard? Is there some particular reason for this?
C compilers are not required to detect the first error, because C string literals are not const.
Referring to the N1256 draft of the C99 standard:
6.4.5 paragraph 5:
In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal
or literals. The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the
array elements have type char, and are initialized with the
individual bytes of the multibyte character sequence; [...]
Paragraph 6:
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
(C11 does not change this.)
So the string literal "hello, world" is of type char[13] (not const char[13]), which is converted to char* in most contexts.
Attempting to modify a const object has undefined behavior, and most code that attempts to do so must be diagnosed by the compiler (you can get around that with a cast, for example). Attempting to modify a string literal also has undefined behavior, but not because it's const (it isn't); it's because the standard specifically says the behavior is undefined.
For example, this program is strictly conforming:
#include <stdio.h>
void print_string(char *s) {
printf("%s\n", s);
}
int main(void) {
print_string("Hello, world");
return 0;
}
If string literals were const, then passing "Hello, world" to a function that takes a (non-const) char* would require a diagnostic. The program is valid, but it would exhibit undefined behavior if print_string() attempted to modify the string pointed to by s.
The reason is historical. Pre-ANSI C didn't have the const keyword, so there was no way to define a function that takes a char* and promises not to modify what it points to. Making string literals const in ANSI C (1989) would have broken existing code, and there hasn't been a good opportunity to make such a change in later editions of the standard.
gcc's -Wwrite-strings does cause it to treat string literals as const, but makes gcc a non-conforming compiler, since it fails to issue a diagnostic for this:
const char (*p)[6] = &"hello";
("hello" is of type char[6], so &"hello" is of type char (*)[6], which is incompatible with the declared type of p. With -Wwrite-strings, &"hello" is treated as being of type const char (*)[6].) Presumably this is why neither -Wall nor -Wextra includes -Wwrite-strings.
On the other hand, code that triggers a warning with -Wwrite-strings should probably be fixed anyway. It's not a bad idea to write your C code so it compiles without diagnostics both with and without -Wwrite-strings.
(Note that C++ string literals are const, because when Bjarne Stroustrup was designing C++ he wasn't as concerned about strict compatibility for old C code.)
Compilers can detect the first "error".
In modern versions of gcc, if you use -Wwrite-strings, you'll get a message saying that you can't assign from const char* to char*. This warning is on by default for C++ code.
That's where the problem is - the first assignment, not the c[1] = 'y' bit. Of course it's legal to take a char*, dereference it, and assign to the dereferenced address.
Quoting from man 1 gcc:
When compiling C, give string constants the type "const char[length]" so that
copying the address of one into a non-"const" "char *" pointer will get a warning.
These warnings will help you find at compile time code that can try to write into a
string constant, but only if you have been very careful about using "const" in
declarations and prototypes. Otherwise, it will just be a nuisance. This is why we
did not make -Wall request these warnings.
So, basically, because most programmers didn't write const-correct code in the early days of C, it's not the default behavior for gcc. But it is for g++.
-Wwrite-strings seems to do what you want. Could have sworn that this was part of -Wall.
% cat chars.c
#include <stdio.h>
int main()
{
char *c = "hello world";
c[1] = 'y';
return 0;
}
% gcc -Wall -o chars chars.c
% gcc -Wwrite-strings -o chars chars.c
chars.c: In function ‘main’:
chars.c:5: warning: initialization discards qualifiers from pointer target type
From the man pages:
When compiling C, give string constants the type "const char[length]" so that copying the address of one into a non-"const" "char *" pointer will get a warning. These warnings will help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using "const" in declarations and prototypes. Otherwise, it will just be a nuisance. This is why we did not make -Wall request these warnings.
When compiling C++, warn about the deprecated conversion from string literals to "char *". This warning is enabled by default for C++ programs.
Note the "enabled by default for C++" is probably why I (and others) think -Wall covers it. Also note the explanation as to why it isn't part of -Wall.
As for relating to the standard, C99, 6.4.5 item 6 (page 63 of the linked PDF) reads:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
char* c = strdup("..."); would make c[1] sensible. (Removed rant on C) Though an intelligent compiler could/does warn against this, C traditionally is machine near, without (bounds/format/...) checking and other such "needless" overhead.
lint is the tool for detecting such errors: that a const char* was assigned to a char*. It would also mark a char c = c[30]; (No longer type dependent, but also addressing error.) As it would be nice to have declared c as const char*. C is an older language with a tradition of leniency and operating on many platforms.

Know if const qualifier is used

Is there any way in C to find if a variable has the const qualifier? Or if it's stored in the .rodata section?
For example, if I have this function:
void foo(char* myString) {...}
different actions should be taken in these two different function calls:
char str[] = "abc";
foo(str);
foo("def");
In the first case I can modify the string, in the second one no.
Not in standard C, i.e. not portably.
myString is just a char* in foo, all other information is lost. Whatever you feed into the function is automatically converted to char*.
And C does not know about ".rodata".
Depending on your platform you could check the address in myString (if you know your address ranges).
You can't distinguish them using the language alone. In other words, this is not possible without resorting to features specific to the compiler you're using, which are likely not to be portable. A few important remarks though:
In the first case you COULD modify the string, but you MUST NOT. If you want a mutable string, use initialization instead of assignment.
char *str1 = "abc"; // NOT OK, should be const char *
const char *str2 = "abc"; // OK, but not mutable
char str3[] = "abc"; // OK, using initialization, you can change its contents
#include<stdio.h>
void foo(char *mystr)
{
int a;
/*code goes here*/
#ifdef CHECK
int local_var;
printf(" strings address %p\n",mystr);
printf("local variables address %p \n",&local_var);
puts("");
puts("");
#endif
return;
}
int main()
{
char a[]="hello";
char *b="hello";
foo(a);
foo(b);
foo("hello");
}
On compiling with gcc -DCHECK prog_name.c and executing on my linux machine the following output comes...
strings address 0xbfdcacf6
local variables address 0xbfdcacc8
strings address 0x8048583
local variables address 0xbfdcacc8
strings address 0x8048583
local variables address 0xbfdcacc8
In the first case, where the string is defined and initialized in the "proper C way for mutable strings", the difference between the addresses is 0x2E (46 bytes).
In the second case, where the string is defined as char *b="hello", the difference between the addresses is
0xB7D82745. That's bigger than the size of my stack, so I am pretty sure the string is not on the stack. Hence the only place where you can find it is the .rodata section.
The third case is similar.
PS: As mentioned above, this isn't portable, but the original question hardly leaves any scope for portability by mentioning .rodata :)
GCC provides the __builtin_constant_p builtin function, which enables you to determine whether an expression is constant or not at compile-time:
Built-in Function: int __builtin_constant_p (exp)
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the `-O' option.
So I guess you should rewrite your foo function as a macro in such a case:
#define foo(x) \
(__builtin_constant_p(x) ? foo_on_const(x) : foo_on_var(x))
foo("abc") would expand to foo_on_const("abc") and foo(str) would expand to foo_on_var(str).

char *p="orkut" vs const char *p="orkut"

char *p="orkut" vs const char *p="orkut"
What's the difference between these two?
EDIT
From Bjarne Stroustrup, 3rd edition, page 90:
void f()
{
char* p="plato";
p[4]='e'; // error: assign to const; result is undefined
}
This kind of error cannot in general be caught until run time, and implementations differ in their enforcement of this rule.
Same with const char *p="plato"
That's why I am asking about the difference. What's the significance of const here?
The const char* variant is correct.
You should not change memory that comes from a string literal (referred to as static storage usually). It is read only memory.
The difference is that the char* variant will allow you to write the syntax to change the data that it points to by dereferencing it. What it actually does though is undefined.
//Option 1:
char *p = "orkut";
*p = 'x';//undefined behavior
//Option 2:
const char *q = "orkut";
*q = 'x';//compiling error
I would rather have option 2 happen to me.
A declaration of const char * p means that the thing p points at is const, ie should not change. I say should not because it is possible to cast away the constness. As has been pointed out, changing a string literal is undefined and often leads to an access violation/segmentation fault.
Note that this declaration is different from char * const p, which means that p itself is const rather than the thing p points at.
The problem that Stroustrup discusses in what you're quoting is that in C++ a string literal will readily convert to "an rvalue of type 'pointer to char'" (4.2/2 "Array-to-pointer conversion"). This is specifically so that the very common idiom of pointing a char* to a literal string wouldn't cause a bazillion programs to fail to compile (especially when C++ was initially evolving from C).
If you can get away with declaring your pointer as char const* (or the equivalent const char*), you'll save yourself from running into problems like those described in the Stroustrup quote. However, you might well run into irritating problems using the pointer with functions that aren't 100% const correct.
See this question.
Basically,
char *p="orkut";
In this case, p is supposed to be read-only, but this is not enforced by all compilers (or the standard).
Note that more recent gcc implementations will not let you do
char *foo = "foo";
They insist on the const (certainly in -Wall -Werror mode when compiling as C++).
