What happens when a char array gets initialized from a string literal? - c

As I understand it, the following code works like so:
char* cptr = "Hello World";
"Hello World" lives in the .rodata section of the program's memory. The string literal "Hello World" returns a pointer to the base address of the string, or the address of the first element in the so-called "array", since the chars are laid out sequentially in memory it would be the 'H'. This is my little diagram as I visualize the string literal getting stored in the memory:
0x4 : 'H'
0x5 : 'e'
0x6 : 'l'
0x7 : 'l'
0x8 : 'o'
0x9 : ' '
0xa : 'W'
0xb : 'o'
0xc : 'r'
0xd : 'l'
0xe : 'd'
0xf : '\0'
So the declaration above becomes:
char* cptr = 0x4;
Now cptr points to the string literal. I'm just making up the addresses.
0xa1 : 0x4
Now how does this code work?
char cString[] = "Hello World";
I am assuming that as in the previous situation "Hello World" also degrades to the address of 'H' and 0x4.
char cString[] = 0x4;
I am reading the = as an overloaded assignment operator when it is used to initialize a char array. As I understand it, at initialization of a C-string only, it copies char by char starting at the given base address into the C-string until it hits a '\0' as the last char copied. It also allocates enough memory for all the chars. Because overloaded operators are really just functions, I assume that its internal implementation is similar to strcpy().
I would like one of the more experienced C programmers to confirm my assumptions of how this code works. This is my visualization of the C-string after the chars from the string literal get copied into it:
0xb4 : 'H'
0xb5 : 'e'
0xb6 : 'l'
0xb7 : 'l'
0xb8 : 'o'
0xb9 : ' '
0xba : 'W'
0xbb : 'o'
0xbc : 'r'
0xbd : 'l'
0xbe : 'd'
0xbf : '\0'
Once again, the addresses are arbitrary, the point is that the C-string in the stack is distinct from the string literal in the .rodata section in memory.
What am I trying to do? I am trying to use a char pointer to temporarily hold the base address of the string literal, and use that same char pointer (base address of string literal) to initialize the C-string.
char* cptr = "Hello World";
char cString[] = cptr;
I assume that "Hello World" evaluates to its base address, 0x4. So this code ought to look like this:
char* cptr = 0x4;
char cString[] = 0x4;
I assume that it should be no different from char cString[] = "Hello World"; since "Hello World" evaluates to its base address, and that is what is stored in the char pointer!
However, gcc gives me an error:
error: invalid initializer
char cString[] = cptr;
^
How come you can't use a char pointer as a temporary placeholder to store the base address of a string literal?
How does this code work? Are my assumptions correct?
Does using a string literal in the code return the base address to the "array" where the chars are stored in the memory?

Your understanding of memory layout is more or less correct. But the problem you are having is one of initialization semantics in C.
The = symbol in a declaration here is NOT the assignment operator. Instead, it is syntax that specifies the initializer for a variable being instantiated. In the general case, T x = y; is not the same as T x; x = y;.
There is a language rule that a character array can be initialized from a string literal. (The string literal is not "evaluated to its base address" in this context). There is not a language rule that an array can be initialized from a pointer to the elements intended to be copied into the array.
Why are the rules like this? "Historical reasons".
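To get the effect the question is after (an array filled from whatever a char * points at), the copy has to be written out explicitly. A minimal sketch, assuming the destination size is chosen by hand; copy_from_pointer and demo_copy are hypothetical names, not standard functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string that src points at into dst,
   which has room for dst_size bytes. Returns 0 on success, -1 if the
   string (including its '\0') does not fit. */
int copy_from_pointer(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus terminator */
    if (needed > dst_size)
        return -1;
    memcpy(dst, src, needed);
    return 0;
}

/* Illustrative use: the array is sized by hand, because an
   initializer of type char * cannot size it. */
void demo_copy(void)
{
    const char *cptr = "Hello World";
    char cString[12];                  /* 11 characters + '\0' */

    int rc = copy_from_pointer(cString, sizeof cString, cptr);
    assert(rc == 0);
    assert(strcmp(cString, "Hello World") == 0);
    assert(cString != cptr);           /* separate storage */
}
```

This is roughly what the questioner imagined the = doing, except that it is an ordinary run-time copy into an array whose size was fixed beforehand.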

I am assuming that as in the previous situation "Hello World" also degrades to the address of 'H' and 0x4.
Not really: cString[] gets a completely new address in memory. The compiler allocates 12 chars for it and initializes them with the contents of the "Hello World" string literal.
I assume that "Hello World" evaluates to its base address, 0x4. Does using a string literal in the code return the base address to the "array" where the chars are stored in the memory?
cString may be converted to char* later on, yielding its base address, but it remains an array in the regular contexts. In particular, if you invoke sizeof(cString) you would get the size of the array, not the size of the pointer.
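The sizeof difference can be checked directly. A small sketch; demo_sizeof is a hypothetical name, and the size of a pointer depends on the platform:

```c
#include <assert.h>

void demo_sizeof(void)
{
    char cString[] = "Hello World";
    const char *cptr = cString;     /* the array converts to a pointer here */

    assert(sizeof cString == 12);           /* 11 characters + '\0' */
    assert(sizeof cptr == sizeof(char *));  /* commonly 4 or 8 */
    assert(cptr == &cString[0]);            /* same base address */
}
```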
How come you can't use a char pointer as a temporary placeholder to store the base address of a string literal?
You can. However, once a string literal is assigned to char *, it stops being a string literal, at least as far as the compiler is concerned. It becomes a char * pointer, no different from other char * pointers.
Note that modern C compilers combine identical string literals as an optimization, so if you write
#define HELLO_WORLD "Hello World"
...
char* cptr = HELLO_WORLD;
char cString[] = HELLO_WORLD;
and turn optimization on, the compiler may keep only one copy of the literal's data in the executable. cString still gets its own, separate storage.

The second definition char cString[] = "Hello World"; is a shorthand for this equivalent definition:
char cString[12] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
If this definition occurs at global scope or with static storage, cString will be in the .data segment with the initial contents in the executable image. If it occurs in the scope of a function with automatic storage, the compiler will allocate automatic storage for the array (reserving space on the stack frame or equivalent) and generate code to perform the initialization at run-time.
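The difference between the two storage classes is observable. A minimal sketch; global_greeting, first_char_local, first_char_global and demo_storage are hypothetical names chosen for illustration:

```c
#include <assert.h>

/* File scope: static storage, initialized once before main() runs. */
char global_greeting[] = "Hello World";

/* Automatic storage: the array is created and re-initialized each
   time the function is entered. Returns the first character seen on
   entry, then overwrites it. */
char first_char_local(void)
{
    char local[] = "Hello World";
    char seen = local[0];
    local[0] = 'J';          /* change is lost when the function returns */
    return seen;
}

/* Same idea against the file-scope array: the change persists. */
char first_char_global(void)
{
    char seen = global_greeting[0];
    global_greeting[0] = 'J';
    return seen;
}

void demo_storage(void)
{
    char a = first_char_local();
    char b = first_char_local();
    char c = first_char_global();
    char d = first_char_global();

    assert(a == 'H' && b == 'H');   /* re-initialized on every entry */
    assert(c == 'H' && d == 'J');   /* earlier write persisted */
}
```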

Related

How to replace a char in a string in C? [duplicate]

In C, one can use a string literal in a declaration like this:
char s[] = "hello";
or like this:
char *s = "hello";
So what is the difference? I want to know what actually happens in terms of storage duration, both at compile and run time.
The difference here is that
char *s = "Hello world";
will place "Hello world" in the read-only parts of the memory, and making s a pointer to that makes any writing operation on this memory illegal.
While doing:
char s[] = "Hello world";
puts the literal string in read-only memory and copies the string to newly allocated memory on the stack. Thus making
s[0] = 'J';
legal.
First off, in function arguments, they are exactly equivalent:
void foo(char *x);
void foo(char x[]); // exactly the same in all respects
In other contexts, char * allocates a pointer, while char [] allocates an array. Where does the string go in the former case, you ask? The compiler secretly allocates a static anonymous array to hold the string literal. So:
char *x = "Foo";
// is approximately equivalent to:
static const char __secret_anonymous_array[] = "Foo";
char *x = (char *) __secret_anonymous_array;
Note that you must not ever attempt to modify the contents of this anonymous array via this pointer; the effects are undefined (often meaning a crash):
x[1] = 'O'; // BAD. DON'T DO THIS.
Using the array syntax directly allocates it into new memory. Thus modification is safe:
char x[] = "Foo";
x[1] = 'O'; // No problem.
However, the array only lives as long as its containing scope, so if you do this in a function, don't return or leak a pointer to this array - make a copy instead with strdup() or similar. If the array is allocated in global scope, of course, no problem.
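A minimal sketch of the "make a copy" advice. strdup() is POSIX and was not part of ISO C before C23, so this version uses malloc() and memcpy() instead; make_greeting and demo_greeting are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: edits a local array, then returns a heap copy
   that outlives the function. The caller must free() the result.
   Returns NULL if allocation fails. */
char *make_greeting(void)
{
    char local[] = "Foo";
    local[1] = 'O';                       /* safe: local is our own array */

    char *copy = malloc(sizeof local);    /* sizeof includes the '\0' */
    if (copy != NULL)
        memcpy(copy, local, sizeof local);
    return copy;                          /* never `return local;` */
}

void demo_greeting(void)
{
    char *g = make_greeting();
    if (g != NULL) {
        assert(strcmp(g, "FOo") == 0);
        free(g);
    }
}
```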
This declaration:
char s[] = "hello";
Creates one object - a char array of size 6, called s, initialised with the values 'h', 'e', 'l', 'l', 'o', '\0'. Where this array is allocated in memory, and how long it lives for, depends on where the declaration appears. If the declaration is within a function, it will live until the end of the block that it is declared in, and almost certainly be allocated on the stack; if it's outside a function, it will probably be stored within an "initialised data segment" that is loaded from the executable file into writeable memory when the program is run.
On the other hand, this declaration:
char *s ="hello";
Creates two objects:
a read-only array of 6 chars containing the values 'h', 'e', 'l', 'l', 'o', '\0', which has no name and has static storage duration (meaning that it lives for the entire life of the program); and
a variable of type pointer-to-char, called s, which is initialised with the location of the first character in that unnamed, read-only array.
The unnamed read-only array is typically located in the "text" segment of the program, which means it is loaded from disk into read-only memory, along with the code itself. The location of the s pointer variable in memory depends on where the declaration appears (just like in the first example).
Given the declarations
char *s0 = "hello world";
char s1[] = "hello world";
assume the following hypothetical memory map (the columns represent characters at offsets 0 to 3 from the given row address, so e.g. the 0x00 in the bottom right corner is at address 0x0001000C + 3 = 0x0001000F):
+0 +1 +2 +3
0x00008000: 'h' 'e' 'l' 'l'
0x00008004: 'o' ' ' 'w' 'o'
0x00008008: 'r' 'l' 'd' 0x00
...
s0: 0x00010000: 0x00 0x00 0x80 0x00
s1: 0x00010004: 'h' 'e' 'l' 'l'
0x00010008: 'o' ' ' 'w' 'o'
0x0001000C: 'r' 'l' 'd' 0x00
The string literal "hello world" is a 12-element array of char (const char in C++) with static storage duration, meaning that the memory for it is allocated when the program starts up and remains allocated until the program terminates. Attempting to modify the contents of a string literal invokes undefined behavior.
The line
char *s0 = "hello world";
defines s0 as a pointer to char with auto storage duration (meaning the variable s0 only exists for the scope in which it is declared) and copies the address of the string literal (0x00008000 in this example) to it. Note that since s0 points to a string literal, it should not be used as an argument to any function that would try to modify it (e.g., strtok(), strcat(), strcpy(), etc.).
The line
char s1[] = "hello world";
defines s1 as a 12-element array of char (length is taken from the string literal) with auto storage duration and copies the contents of the literal to the array. As you can see from the memory map, we have two copies of the string "hello world"; the difference is that you can modify the string contained in s1.
s0 and s1 are interchangeable in most contexts; here are the exceptions:
sizeof s0 == sizeof (char*)
sizeof s1 == 12
type of &s0 == char **
type of &s1 == char (*)[12] // pointer to a 12-element array of char
You can reassign the variable s0 to point to a different string literal or to another variable. You cannot reassign the variable s1 to point to a different array.
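The address-of difference shows up in pointer arithmetic. A small sketch (demo_address_types is a hypothetical name; s0 is declared const char * here, so &s0 has type const char ** rather than char **):

```c
#include <assert.h>

void demo_address_types(void)
{
    char s1[] = "hello world";
    const char *s0 = "hello world";

    char (*whole)[12] = &s1;       /* pointer to the whole 12-char array */
    char *first = s1;              /* pointer to the first element */
    const char **pp = &s0;         /* pointer to the pointer variable */

    /* Same address, different types, so arithmetic steps differently. */
    assert((void *)whole == (void *)first);
    assert((char *)(whole + 1) - first == 12);
    assert(first + 1 == &s1[1]);
    assert(*pp == s0);

    s0 = s1;                       /* fine: s0 can be re-pointed */
    assert(s0 == first);
    /* s1 = s0; would not compile: an array is not assignable */
}
```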
C99 N1256 draft
There are two different uses of character string literals:
Initialize char[]:
char c[] = "abc";
This is "more magic", and described at 6.7.8/14 "Initialization":
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
So this is just a shortcut for:
char c[] = {'a', 'b', 'c', '\0'};
Like any other regular array, c can be modified.
Everywhere else, it generates an object that is:
unnamed
an array of char (see "What is the type of string literals in C and C++?")
of static storage duration
undefined behavior to modify
So when you write:
char *c = "abc";
This is similar to:
/* __unnamed is magic because modifying it gives UB. */
static char __unnamed[] = "abc";
char *c = __unnamed;
Note the implicit conversion from char[] to char *, which is always legal.
Then if you modify c[0], you also modify __unnamed, which is UB.
This is documented at 6.4.5 "String literals":
5 In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals. The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence [...]
6 It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
6.7.8/32 "Initialization" gives a direct example:
EXAMPLE 8: The declaration
char s[] = "abc", t[3] = "abc";
defines "plain" char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration
char *p = "abc";
defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
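The t[3] case in that example is worth a closer look, because t ends up without a terminating null character. A minimal sketch, assuming a C compiler (this initializer is valid C but not valid C++); demo_example8 is a hypothetical name:

```c
#include <assert.h>
#include <string.h>

void demo_example8(void)
{
    char s[] = "abc";      /* 4 elements: 'a', 'b', 'c', '\0' */
    char t[3] = "abc";     /* 3 elements: no room for the '\0' */

    assert(sizeof s == 4);
    assert(sizeof t == 3);
    assert(s[3] == '\0');
    assert(memcmp(s, t, 3) == 0);   /* same first three characters */

    /* t is not a string: strlen(t) or printf("%s", t) would read past
       the array. Use length-bounded functions such as memcmp instead. */
}
```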
GCC 4.8 x86-64 ELF implementation
Program:
#include <stdio.h>
int main(void) {
char *s = "abc";
printf("%s\n", s);
return 0;
}
Compile and disassemble:
gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o
Output contains:
char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata
Conclusion: GCC stores the string that the char * points to in the .rodata section, not in .text.
Note however that the default linker script puts .rodata and .text in the same segment, which has execute but no write permission. This can be observed with:
readelf -l a.out
which contains:
Section to Segment mapping:
Segment Sections...
02 .text .rodata
If we do the same for char[]:
char s[] = "abc";
we obtain:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
so it gets stored in the stack (relative to %rbp).
char s[] = "hello";
declares s to be an array of char which is long enough to hold the initializer (5 + 1 chars) and initializes the array by copying the members of the given string literal into the array.
char *s = "hello";
declares s to be a pointer to one or more (in this case more) chars and points it directly at a fixed (read-only) location containing the literal "hello".
char s[] = "Hello world";
Here, s is an array of characters, which can be overwritten if we wish.
char *s = "hello";
Here the string literal creates a block of characters somewhere in memory, and the pointer s points to it. We can make s point to a different object, but as long as it points to a string literal, the characters it points to must not be changed.
In addition, for read-only purposes the two are used identically: you can access a char either by indexing with [] or with the *(<var> + <index>) form:
printf("%c", x[1]); //Prints r
And:
printf("%c", *(x + 1)); //Prints r
Obviously, if x is a pointer to a string literal and you attempt to do
*(x + 1) = 'a';
you will probably get a segmentation fault, as you are trying to write to read-only memory.
Just to add: you also get different values for their sizes.
printf("sizeof s[] = %zu\n", sizeof(s)); //6
printf("sizeof *s = %zu\n", sizeof(s)); //4 or 8
As mentioned above, for an array '\0' will be allocated as the final element.
char *str = "Hello";
The above sets str to point to the literal "Hello", which is hard-coded in the program's binary image and typically mapped as read-only memory. Any attempt to change this string literal is undefined behavior and commonly causes a segmentation fault.
char str[] = "Hello";
copies the string to newly allocated memory on the stack, so changing it is allowed.
That means str[0] = 'M';
will change str to "Mello".
For more details, please go through the similar question:
Why do I get a segmentation fault when writing to a string initialized with "char *s" but not "char s[]"?
An example of the difference:
printf("hello" + 2); //llo
char a[] = "hello" + 2; //error
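In the first case pointer arithmetic works (an array passed to a function decays to a pointer). In the second, the initializer is a pointer expression rather than a string literal, so it cannot initialize an array.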
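The same contrast as a compilable sketch; skip_two and demo_literal_arithmetic are hypothetical names, and some compilers warn about adding an integer to a string literal even though it is valid C:

```c
#include <assert.h>
#include <string.h>

/* Returns a pointer two characters into the literal "hello". */
const char *skip_two(void)
{
    return "hello" + 2;    /* literal converts to char *, then + 2 */
}

void demo_literal_arithmetic(void)
{
    assert(strcmp(skip_two(), "llo") == 0);
    assert("hello"[1] == 'e');       /* indexing a literal also works */

    char a[] = "llo";                /* an array needs a literal ... */
    assert(sizeof a == 4);
    /* char a2[] = "hello" + 2;  ... not a pointer expression: error */
}
```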
char *s1 = "Hello world"; // Points to a string literal, which must not be modified
char s2[] = "Hello world"; // An array holding its own copy of the characters, which may be modified
// s1[0] = 'J'; // Undefined behavior
s2[0] = 'J'; // Legal
In the case of:
char *x = "fred";
x is a modifiable lvalue -- it can be assigned to. But in the case of:
char x[] = "fred";
x is still an lvalue, but not a modifiable one -- you cannot assign to the array as a whole, although you can assign to its individual elements.
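In practice, the pointer is changed by assignment, while the array's contents are changed by copying into it. A small sketch; demo_assignability is a hypothetical name:

```c
#include <assert.h>
#include <string.h>

void demo_assignability(void)
{
    const char *p = "fred";
    char a[] = "fred";

    p = "barney";                    /* OK: the pointer itself is reassigned */
    assert(strcmp(p, "barney") == 0);

    /* a = "barney"; would not compile: arrays are not assignable */
    memcpy(a, "wilm", sizeof a);     /* copies 'w', 'i', 'l', 'm', '\0' */
    assert(strcmp(a, "wilm") == 0);
    assert(sizeof a == 5);           /* the size never changes */
}
```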
In light of the comments here it should be obvious that char *s = "hello"; is a bad idea and should be used only in a very narrow scope.
This might be a good opportunity to point out that "const correctness" is a "good thing". Whenever and wherever you can, use the "const" keyword to protect your code from "relaxed" callers or programmers, who are usually most "relaxed" when pointers come into play.
Enough melodrama, here is what one can achieve when adorning pointers with "const".
(Note: One has to read pointer declarations right-to-left.)
Here are the 3 different ways to protect yourself when playing with pointers :
const DBJ* p means "p points to a DBJ that is const"
— that is, the DBJ object can't be changed via p.
DBJ* const p means "p is a const pointer to a DBJ"
— that is, you can change the DBJ object via p, but you can't change the pointer p itself.
const DBJ* const p means "p is a const pointer to a const DBJ"
— that is, you can't change the pointer p itself, nor can you change the DBJ object via p.
The errors related to attempted const-ant mutations are caught at compile time. There is no runtime space or speed penalty for const.
(Assumption is you are using C++ compiler, of course ?)
--DBJ
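The same protection is available in plain C for string literals. A minimal sketch, assuming C99 or later; count_char and demo_const are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts occurrences of ch in a string that it
   promises not to modify. */
size_t count_char(const char *s, char ch)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == ch)
            ++n;
    /* s[0] = 'x'; would be rejected at compile time */
    return n;
}

void demo_const(void)
{
    const char *literal = "hello";   /* writes through it are compile errors */
    char buf[] = "hello";
    char *const fixed = buf;         /* pointer fixed, contents writable */

    fixed[0] = 'j';
    assert(count_char(literal, 'l') == 2);
    assert(count_char(buf, 'j') == 1);
}
```

Declaring the pointer as const char * turns an accidental write to the literal from undefined behavior at run time into a diagnostic at compile time.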


C: Pointers/arrays manipulation [duplicate]

In C, one can use a string literal in a declaration like this:
char s[] = "hello";
or like this:
char *s = "hello";
So what is the difference? I want to know what actually happens in terms of storage duration, both at compile and run time.
The difference here is that
char *s = "Hello world";
will place "Hello world" in the read-only parts of the memory, and making s a pointer to that makes any writing operation on this memory illegal.
While doing:
char s[] = "Hello world";
puts the literal string in read-only memory and copies the string to newly allocated memory on the stack. Thus making
s[0] = 'J';
legal.
First off, in function arguments, they are exactly equivalent:
void foo(char *x);
void foo(char x[]); // exactly the same in all respects
In other contexts, char * allocates a pointer, while char [] allocates an array. Where does the string go in the former case, you ask? The compiler secretly allocates a static anonymous array to hold the string literal. So:
char *x = "Foo";
// is approximately equivalent to:
static const char __secret_anonymous_array[] = "Foo";
char *x = (char *) __secret_anonymous_array;
Note that you must not ever attempt to modify the contents of this anonymous array via this pointer; the effects are undefined (often meaning a crash):
x[1] = 'O'; // BAD. DON'T DO THIS.
Using the array syntax directly allocates it into new memory. Thus modification is safe:
char x[] = "Foo";
x[1] = 'O'; // No problem.
However the array only lives as long as its contaning scope, so if you do this in a function, don't return or leak a pointer to this array - make a copy instead with strdup() or similar. If the array is allocated in global scope, of course, no problem.
This declaration:
char s[] = "hello";
Creates one object - a char array of size 6, called s, initialised with the values 'h', 'e', 'l', 'l', 'o', '\0'. Where this array is allocated in memory, and how long it lives for, depends on where the declaration appears. If the declaration is within a function, it will live until the end of the block that it is declared in, and almost certainly be allocated on the stack; if it's outside a function, it will probably be stored within an "initialised data segment" that is loaded from the executable file into writeable memory when the program is run.
On the other hand, this declaration:
char *s ="hello";
Creates two objects:
a read-only array of 6 chars containing the values 'h', 'e', 'l', 'l', 'o', '\0', which has no name and has static storage duration (meaning that it lives for the entire life of the program); and
a variable of type pointer-to-char, called s, which is initialised with the location of the first character in that unnamed, read-only array.
The unnamed read-only array is typically located in the "text" segment of the program, which means it is loaded from disk into read-only memory, along with the code itself. The location of the s pointer variable in memory depends on where the declaration appears (just like in the first example).
Given the declarations
char *s0 = "hello world";
char s1[] = "hello world";
assume the following hypothetical memory map (the columns represent characters at offsets 0 to 3 from the given row address, so e.g. the 0x00 in the bottom right corner is at address 0x0001000C + 3 = 0x0001000F):
                 +0    +1    +2    +3
    0x00008000:  'h'   'e'   'l'   'l'
    0x00008004:  'o'   ' '   'w'   'o'
    0x00008008:  'r'   'l'   'd'   0x00
    ...
s0: 0x00010000:  0x00  0x00  0x80  0x00
s1: 0x00010004:  'h'   'e'   'l'   'l'
    0x00010008:  'o'   ' '   'w'   'o'
    0x0001000C:  'r'   'l'   'd'   0x00
The string literal "hello world" is a 12-element array of char (const char in C++) with static storage duration, meaning that the memory for it is allocated when the program starts up and remains allocated until the program terminates. Attempting to modify the contents of a string literal invokes undefined behavior.
The line
char *s0 = "hello world";
defines s0 as a pointer to char with auto storage duration (meaning the variable s0 only exists for the scope in which it is declared) and copies the address of the string literal (0x00008000 in this example) to it. Note that since s0 points to a string literal, it should not be used as an argument to any function that would try to modify it (e.g., strtok(), strcat(), strcpy(), etc.).
The line
char s1[] = "hello world";
defines s1 as a 12-element array of char (length is taken from the string literal) with auto storage duration and copies the contents of the literal to the array. As you can see from the memory map, we have two copies of the string "hello world"; the difference is that you can modify the string contained in s1.
s0 and s1 are interchangeable in most contexts; here are the exceptions:
sizeof s0 == sizeof (char*)
sizeof s1 == 12
type of &s0 == char **
type of &s1 == char (*)[12] // pointer to a 12-element array of char
You can reassign the variable s0 to point to a different string literal or to another variable. You cannot reassign the variable s1 to point to a different array.
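The sizeof and &-type exceptions listed above can be checked with a small sketch. The two helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers that report what sizeof sees for each declaration. */
size_t size_of_pointer_form(void)
{
    char *s0 = "hello world";
    return sizeof s0;            /* size of the pointer, not of the string */
}

size_t size_of_array_form(void)
{
    char s1[] = "hello world";
    char (*whole)[12] = &s1;     /* &s1 is a pointer to the whole array */

    assert(sizeof *whole == sizeof s1);
    return sizeof s1;            /* 11 characters plus the '\0' */
}
```

The first helper returns sizeof (char *), whatever that is on the platform; the second always returns 12.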
C99 N1256 draft
There are two different uses of character string literals:
Initialize char[]:
char c[] = "abc";
This is "more magic", and described at 6.7.8/14 "Initialization":
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
So this is just a shortcut for:
char c[] = {'a', 'b', 'c', '\0'};
Like any other regular array, c can be modified.
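A small sketch that checks the "shortcut" claim by comparing the two arrays byte for byte. The function name initializers_match is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: returns 1 if both initializers give identical arrays. */
int initializers_match(void)
{
    char from_literal[] = "abc";
    char from_braces[]  = {'a', 'b', 'c', '\0'};

    assert(sizeof from_literal == sizeof from_braces);
    return sizeof from_literal == 4
        && memcmp(from_literal, from_braces, sizeof from_literal) == 0;
}
```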
Everywhere else: it generates an:
unnamed
array of char (see "What is the type of string literals in C and C++?")
with static storage
that gives UB if modified
So when you write:
char *c = "abc";
This is similar to:
/* __unnamed is magic because modifying it gives UB. */
static char __unnamed[] = "abc";
char *c = __unnamed;
Note the implicit conversion from char[] to char *, which is always legal.
Then if you modify c[0], you also modify __unnamed, which is UB.
This is documented at 6.4.5 "String literals":
5 In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals. The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence [...]
6 It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
6.7.8/32 "Initialization" gives a direct example:
EXAMPLE 8: The declaration
char s[] = "abc", t[3] = "abc";
defines "plain" char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration
char *p = "abc";
defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
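The difference between s and t in EXAMPLE 8 is easy to miss: t has no room for the terminating null character, so it is not a string. A sketch with made-up helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers contrasting the two arrays from EXAMPLE 8. */
size_t size_with_terminator(void)
{
    char s[] = "abc";

    assert(s[3] == '\0');        /* the terminator is stored */
    return sizeof s;
}

size_t size_without_terminator(void)
{
    char t[3] = "abc";           /* valid C: no room for '\0', so none is stored */

    assert(memcmp(t, "abc", sizeof t) == 0);   /* compare by length, not strcmp */
    return sizeof t;
}
```

The helpers return 4 and 3 respectively. Because t is not null-terminated, passing it to strlen() or printing it with %s would read past the array.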
GCC 4.8 x86-64 ELF implementation
Program:
#include <stdio.h>
int main(void) {
char *s = "abc";
printf("%s\n", s);
return 0;
}
Compile and disassemble:
gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o
Output contains:
char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata
Conclusion: GCC stores the string literal that the char* points to in the .rodata section, not in .text.
Note however that the default linker script puts .rodata and .text in the same segment, which has execute but no write permission. This can be observed with:
readelf -l a.out
which contains:
Section to Segment mapping:
Segment Sections...
02 .text .rodata
If we do the same for char[]:
char s[] = "abc";
we obtain:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
so it gets stored in the stack (relative to %rbp).
char s[] = "hello";
declares s to be an array of char which is long enough to hold the initializer (5 + 1 chars) and initializes the array by copying the members of the given string literal into the array.
char *s = "hello";
declares s to be a pointer to one or more (in this case more) chars and points it directly at a fixed (read-only) location containing the literal "hello".
char s[] = "Hello world";
Here, s is an array of characters, which can be overwritten if we wish.
char *s = "hello";
Here the string literal creates a block of characters somewhere in memory, and the pointer s points to it. We can reassign s to point to something else, but as long as it points to a string literal, the block of characters it points to can't be changed.
In addition, for read-only purposes the two are used identically: you can access a char either by indexing with [] or with the *(<var> + <index>) form:
printf("%c", x[1]); //Prints r
And:
printf("%c", *(x + 1)); //Prints r
Obviously, if you attempt to do
*(x + 1) = 'a';
you will probably get a segmentation fault when x points to a string literal, as you are trying to write to read-only memory.
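A sketch of the read-only equivalence between the two access forms. The helper names are invented; both accept an array or a pointer, since an array argument decays to a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: x[i] and *(x + i) are the same expression by definition. */
char by_index(const char *x, size_t i)
{
    assert(x != NULL);
    return x[i];
}

char by_pointer(const char *x, size_t i)
{
    assert(x != NULL);
    return *(x + i);
}
```

Given char arr[] = "fred"; and const char *ptr = "fred"; (an assumed example string), all four combinations of helper and argument return 'r' for index 1.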
Just to add: you also get different values for their sizes.
printf("sizeof s[] = %zu\n", sizeof(s)); //6
printf("sizeof *s = %zu\n", sizeof(s)); //4 or 8
As mentioned above, for an array '\0' will be allocated as the final element.
char *str = "Hello";
The above sets str to point to the literal "Hello", which is hard-coded in the program's binary image and typically mapped as read-only memory. Any attempt to change this string literal is illegal (undefined behavior) and will often cause a segmentation fault.
char str[] = "Hello";
copies the string to newly allocated memory on the stack, so changing it is allowed and legal. For example,
str[0] = 'M';
will change str to "Mello".
For more details, please go through the similar question:
Why do I get a segmentation fault when writing to a string initialized with "char *s" but not "char s[]"?
An example to the difference:
printf("hello" + 2); //llo
char a[] = "hello" + 2; //error
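In the first case pointer arithmetic works because the string literal decays to a pointer in the expression "hello" + 2. In the second case the initializer of a char array must be a string literal (or a brace-enclosed list), not a pointer expression, hence the error.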
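The same arithmetic as "hello" + 2, split into two steps so the array-to-pointer conversion is visible. The function name skip_two is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer two characters into the literal. */
const char *skip_two(void)
{
    const char *whole = "hello";   /* the literal decays to a pointer to 'h' */
    const char *tail  = whole + 2; /* now points at the first 'l' */

    assert(strlen(tail) == 3);
    return tail;                   /* still valid: literals have static storage */
}
```

The returned pointer refers to "llo" inside the original literal; no new string is created.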
char *s1 = "Hello world"; // Points to a string literal, which must not be modified
char s2[] = "Hello world"; // An array holding its own copy of the string, so it may be modified
// s1[0] = 'J'; // Illegal
s2[0] = 'J'; // Legal
In the case of:
char *x = "fred";
x is an lvalue -- it can be assigned to. But in the case of:
char x[] = "fred";
x is not a modifiable lvalue -- an array cannot be assigned to as a whole (its elements can still be changed).
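A sketch of what "assignable" means for each form. The names and strings are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when both updates worked. */
int reassign_demo(void)
{
    const char *p = "fred";
    char a[] = "fred";

    p = "barney";                /* the pointer can be re-pointed */
    /* a = "wilma"; */           /* would not compile: arrays are not assignable */
    assert(sizeof a >= sizeof "bob");
    strcpy(a, "bob");            /* the array's contents can be overwritten */

    return strcmp(p, "barney") == 0 && strcmp(a, "bob") == 0;
}
```

The array keeps its size of 5 throughout, so only strings of up to 4 characters fit in it.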
In light of the comments here, it should be obvious that char *s = "hello"; is a bad idea and should be used only in a very narrow scope.
This might be a good opportunity to point out that "const correctness" is a "good thing". Whenever and wherever you can, use the "const" keyword to protect your code from "relaxed" callers or programmers, who are usually most "relaxed" when pointers come into play.
Enough melodrama, here is what one can achieve when adorning pointers with "const".
(Note: One has to read pointer declarations right-to-left.)
Here are the 3 different ways to protect yourself when playing with pointers :
const DBJ* p means "p points to a DBJ that is const"
— that is, the DBJ object can't be changed via p.
DBJ* const p means "p is a const pointer to a DBJ"
— that is, you can change the DBJ object via p, but you can't change the pointer p itself.
const DBJ* const p means "p is a const pointer to a const DBJ"
— that is, you can't change the pointer p itself, nor can you change the DBJ object via p.
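The first two forms can be sketched in plain C with char in place of DBJ. The function names count_char and fill are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to const char: p may move, but *p may not be written through p. */
size_t count_char(const char *p, char c)
{
    size_t n = 0;

    assert(p != NULL);
    for (; *p != '\0'; p++)      /* p++ is fine; *p = c would not compile */
        if (*p == c)
            n++;
    return n;
}

/* Const pointer to char: *p may be written, but p itself may not change. */
void fill(char *const p, size_t n, char c)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        p[i] = c;                /* p = NULL would not compile */
}
```

count_char can safely take a string literal because it promises not to write through p. fill needs a writable array, such as one declared with char buf[] = "abc";.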
The errors related to attempted const-ant mutations are caught at compile time. There is no runtime space or speed penalty for const.
(Assumption is you are using C++ compiler, of course ?)
--DBJ

Resources