In the code char * str = "hello";, I understand that "hello" places the word hello somewhere else in memory and then stores the address of the first character of that memory in the variable str.
But when I use the code char str[10] = "hello";, I understood that each character of the word hello is stored in an element of the array.
If so, then in the first case, the code "hello" returns the address of the memory,
and in the second case, the code "hello" returns the word h e l l o \n.
I want to know why they are different and, if I'm wrong, what double quotes return.
C is a bit quirky. You have two distinct use cases here. But let's first start with what "hello" is.
Your "hello" in the program source code is a character string literal, that is, a character sequence enclosed in double quotes. When the compiler compiles this source code, it appends a zero byte to the sequence, so that standard library functions like strlen() can work on it. The resulting zero-terminated sequence is then used by the compiler to "initialize an array of static storage duration and length just sufficient to contain the sequence" (n1570 ISO C draft, 6.4.5/6). That length is 6: the 5 characters h, e, l, l and o as well as the appended zero byte.
"Static storage duration" means that the array exists the entire time the program is running (as opposed to objects with automatic storage duration, e.g. local variables, and those with allocated storage duration, which are created via malloc() or calloc()).
You can store the address of that array, as in char *str = "hello";. This address will point to valid memory during the lifetime of the program.
The second use case is a special syntax for initializing character arrays. It is just syntactic sugar for this common use case, and an exception to the rule that you cannot normally initialize arrays with arrays.1
This time you don't define a pointer; you define a proper array of 10 chars, and you use the string literal to initialize it. You can always use the generic method to initialize a character array by listing the individual array elements, separated by commas, in curly braces (by the way, this generic method also works for the other kind of aggregate type, namely structs):
char str[10] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This is entirely equivalent to
char str[10] = "hello";
Now your array has more elements (10) than the initializing array produced from the string literal (6); the standard stipulates that "subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration". Such objects (for example, global and static char variables) are initialized to zero, so the remaining 4 elements of str are zero, and str ends with five zero characters in total: the literal's own terminator plus those four.
It is immediately obvious why Dennis Ritchie added the somewhat anti-paradigmatic initialization of character arrays via a string literal, probably after the second time he had to do it with the generic array initialization syntax. Designing your own language has its benefits.
1 For example, static char src[] = "123"; char dest[] = src; doesn't work. You have to use strcpy().
The initialization:
char * str = "hello";
in most C implementations makes sure that the string hello is placed in a constant data section of the executable. Exactly six bytes are written, the last one being the string terminator '\0'.
The char pointer str contains the address of the first character 'h', so that anyone accessing the string knows that the following bytes have to be read until the terminator character is found.
The other initialization
char str[10] = "hello"; // <-- string must be enclosed in double quotes
is very similar: the characters of the string, including the string terminator, are written to consecutive memory locations starting at str[0], and in most expressions str decays to a pointer to that first character.
But:
Even if only six bytes are explicitly initialized, ten bytes are allocated because that's the size of the array. In this case, the four trailing bytes will contain zeroes.
The data is not constant and can be changed, whereas in the previous example it could not (attempting to do so is undefined behavior), because that initialization, in most C implementations, makes the compiler use a constant data section.
You seem to be mixing up some things:
char str[10] = "hello';
This does not even compile: when you start with a double-quote, you should end with one:
char str[10] = "hello";
In memory, this has the following effect:
str[0] : h
str[1] : e
str[2] : l
str[3] : l
str[4] : o
str[5] : 0 (the zero character constant)
str[6] : 0
str[7] : 0
str[8] : 0
str[9] : 0
(The elements not covered by the initializer are set to zero, just like objects with static storage duration.)
As a result, the code will not return hello\n (with an end-of-line character), just hello\0 (the zero character).
The double quotes just mark the beginning and the end of a string literal; by themselves, they return nothing.
#include <stdio.h>
int main() {
char a = 5;
char b[2] = "hi"; // No explicit room for `\0`.
char c = 6;
return 0;
}
Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character
http://www.eskimo.com/~scs/cclass/notes/sx8.html
In the above example, b only has room for 2 characters, so the terminating null character has no spot to be placed in, and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.
Is this expected or am I hitting undefined behavior?
It is allowed to initialize a char array with a string if the array is at least large enough to hold all of the characters in the string besides the null terminator.
This is detailed in section 6.7.9p14 of the C standard:
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
However, this also means that you can't treat the array as a string since it's not null terminated. So as written, since you're not performing any string operations on b, your code is fine.
What you can't do is initialize with a string that's too long, i.e.:
char b[2] = "hello";
This gives more initializers than can fit in the array, which is a constraint violation. Section 6.7.9p2 states this as follows:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
If you were to declare and initialize the array like this:
char b[] = "hi";
Then b would be an array of size 3, which is large enough to hold the two characters in the string constant plus the terminating null byte, making b a string.
To summarize:
If the array has a fixed size:
If the string constant used to initialize it is shorter than the array, the array will contain the characters in the string with successive elements set to 0, so the array will contain a string.
If the array is exactly large enough to contain the elements of the string but not the null terminator, the array will contain the characters in the string without the null terminator, meaning the array is not a string.
If the string constant (not counting the null terminator) is longer than the array, this is a constraint violation, and the compiler must issue a diagnostic.
If the array does not have an explicit size, the array will be sized to hold the string constant plus the terminating null byte.
Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character.
Those notes are mildly misleading in this case. I shall have to update them.
When you write something like
char *p = "Hello";
or
printf("world!\n");
C automatically creates an array of characters for you, of just the right size, containing the string, terminated by the \0 character.
In the case of array initializers, however, things are slightly different. When you write
char b[2] = "hi";
the string is merely the initializer for an array which you are creating. So you have complete control over the size. There are several possibilities:
char b0[] = "hi"; // compiler infers size
char b1[1] = "hi"; // error
char b2[2] = "hi"; // No terminating 0 in the array. (Illegal in C++, BTW)
char b3[3] = "hi"; // explicit size matches string literal
char b4[10] = "hi"; // space past end of initializer is always zero-initialized
For b0, you don't specify a size, so the compiler uses the string initializer to pick the right size, which will be 3.
For b1, you specify a size, but it's too small, so the compiler should give you an error.
For b2, which is the case you asked about, you specify a size which is just barely big enough for the explicit characters in the string initializer, but not the terminating \0. This is a special case. It's legal, but what you end up with in b2 is not a proper null-terminated string. Since it's unusual at best, the compiler might give you a warning. See this question for more information on this case.
For b3, you specify a size which is just right, so you get a proper string in an exactly-sized array, just like b0.
For b4, you specify a size which is larger than needed, but this is no problem. There ends up being extra space in the array, beyond the terminating \0. (As a matter of fact, this extra space will also be filled with \0.) This extra space would let you safely do something like strcat(b4, ", wrld!").
Needless to say, most of the time you want to use the b0 form. Counting characters is tedious and error-prone. As Brian Kernighan (one of the creators of C) has written in this context, "Let the computer do the dirty work."
One more thing. You wrote:
and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.
I don't know what's going on there, but it's safe to say that the compiler is not trying to "make room for a \0". Compilers can and often do store variables in their own inscrutable internal order, matching neither the order you declared them, nor alphabetical order, nor anything else you might think of. If under your compiler array b ended up with extra space after it which did contain a \0 as if to terminate the string, that was most likely just chance, not because the compiler was trying to be nice to you and helping to make something like printf("%s\n", b) better defined. (Under the two compilers where I tried it, printf("%s\n", b) printed hi^E and hi ??, clearly showing the presence of trailing random garbage, as expected.)
There are two things in your question.
String literal. A string literal (i.e. something enclosed in double quotes) is always a correctly null-terminated string.
char *p = "ABC"; // p references null character terminated string
Character array. A character array can only hold as many elements as it has, so if you initialize a two-element array with a three-element string literal (two characters plus the terminator), only the first two elements are written, and the array will not contain a null-terminated C string.
char p[2] = "AB"; // p is not a valid C string.
An array of char need not be terminated by anything at all. It is an array. If the actual content is smaller than the dimensions of the array, then you need to track the size of that content.
Answers here seem to have degenerated into a string discussion. Not all arrays of char are strings. However it is a very strong convention to use a null terminator as a sentinel if they are to be handled as de facto strings.
Your array may use something else, and may also have separators and zones. After all, it may be part of a union or overlay a structure, or possibly be a staging area for another system.
I was reading about pointers in the K&R book here:
https://hikage.freeshell.org/books/theCprogrammingLanguage.pdf
There is an important difference between these definitions:
char amessage[] = "now is the time"; /* an array */
char *pmessage = "now is the time"; /* a pointer */
amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents.
I don't understand why we can't modify the string contents!
Because the C standard says so: “If the program attempts to modify such an array [the array defined by a string literal], the behavior is undefined” (C 2018 6.4.5 7). A string literal is a sequence of characters in quotes in source code, such as "Hello, world.\n". (String literals may also be preceded by an encoding prefix u8, u, U, or L, as in L"abc".) A string literal defines an array containing the characters of the string plus a terminating null character.
A reason that attempting to modify the string literal's array is not defined is that string literals were, and are, widely used for strings that are constant: error messages to be printed at times, format strings for printf operations, hard-coded names of things, and so on. As C developed, and the standard was written, it made sense for string literals to be treated as read-only and to allow a compiler to put them in read-only storage. Additionally, some compilers would use the same storage for identical string literals that appeared in different places, and some would use the same storage for a string literal that was a trailing substring of another string literal. Because of this shared storage, modifying one string would also modify the other. So allowing programs to modify string literals could cause some problems.
So, if you merely point to a string literal, you are pointing to something that should not be modified. If you want your own copy that can be modified, simply define it with an array as you show with char amessage[] = "now is the time";. Such a definition defines an array, amessage that has its own storage. That array is initialized with the contents of the string literal but is separate from it.
char amessage[] = "now is the time"; /* an array */
amessage is a modifiable array of chars.
char *pmessage = "now is the time"; /* a pointer */
pmessage is a pointer to the string literal. Attempting to modify the string literal is undefined behaviour.
When you initialize a pointer with a string literal, the compiler creates a read-only array (and is indeed free to merge such arrays into one if you have several initializers using the same literal string, character by character), as in:
char *a = "abcdef", *b = "abcdef";
it is probable that both pointers are initialized to the same address in memory. This is one reason why you are not allowed to modify the string, and why the behaviour can be unpredictable (you don't know whether the compiler has merged both strings).
It goes further: the compiler is permitted to do the following in the next scenario:
char *a = "foo bar", *b = "bar";
the compiler is permitted to initialize a to point to a char array with the characters {'f', 'o', 'o', ' ', 'b', 'a', 'r', '\0'} and also to initialize the pointer b to point to the fifth position of that array, as one of the string literals is a suffix of the other.
Permitting this lets the compiler make significant savings in the final executable, and so string literals are assigned to a read-only segment of the executable (they are placed in the .text segment or a similar one).
On the other hand, initializing an array has no such problems, as you are defining the array variable that will store the characters; it is not an unnamed array created by the compiler. An initialization like:
char a[] = "Hello";
will create a variable (global or local, depending on where it is declared) of type array of char with space for six characters. But you can also specify the array size between the brackets, as in
char a[32] = "Hello";
and then the array will have 32 characters (indices 0 to 31): the first five will be initialized to the character constants 'H', 'e', 'l', 'l' and 'o', followed by 27 null characters '\0'.
Some compilers will also accept:
char a[4] = "Hello";
but this is a constraint violation in standard C, because the string has more characters than the array can hold. Compilers such as GCC typically just warn and initialize the array as {'H', 'e', 'l', 'l'} (only the first four characters of the string literal are used), signalling the dangerous bend.
Finally, always keep in mind that assignment and initialization are different things: even though both use the same symbol =, they are not the same thing. You will never be allowed to write a statement like:
char a[26];
a = "foo bar";
because an array cannot be assigned to; the expression "foo bar" just converts to a char * pointing to a static (unmodifiable) array.
For instance, if I write:
void function(char *k){ printf("%s",k);}
and call it like this:
function("hello");
does the compiler turn that string into "hello\0", or do I have to add the terminator myself?
In C (and C++), when you do
const char* mystr = "Hello";, the compiler will generate something like the following in read-only memory (the addresses are only illustrative):
0x7fff2fe0: 'H'
0x7fff2fe1: 'e'
0x7fff2fe2: 'l'
0x7fff2fe3: 'l'
0x7fff2fe4: 'o'
0x7fff2fe5: '\0'
Then, the compiler will replace
const char* mystr = "Hello";
with
const char* mystr = 0x7fff2fe0;
For your usage, your code will turn into
function(0x7fff2fe0)
Simple as that.
At the language level, a string literal has type const char[N] in C++ and type char[N] in C (where modifying it is nonetheless undefined behavior). The array contains all of the written characters, followed by a \0, so N is 1 + the length of the string you write (char[6] for "Hello"). More information can be found here, where they also use the string "Hello" as an example. Thus, sizeof("Hello") == 6, and "Hello"[5] == '\0' (yes, "Hello"[5] is legal: remember, "Hello" is an array of 6 characters). We see this information exemplified in the following:
printf("%zu\n", sizeof("Hello")); // 6
const char str[] = "Hello"; // initializes a const char[6] array
// with a copy of all 6 bytes
printf("%zu\n", sizeof(str)); // 6
const char* str2 = "Hello"; // the array decays to a pointer to its first element
printf("%zu\n", sizeof(str2)); // typically 4 on a 32-bit system, 8 on a 64-bit system
Do note that when the array decays to a pointer, you get some pointer, e.g. 0x7fff2fe0, to an array of characters that is not modifiable: attempting to modify the data anywhere from 0x7fff2fe0 to 0x7fff2fe5 is explicitly undefined behavior. This status is commonly represented with const; by writing const, you get the compiler to complain if you try to edit it.
As an additional note, by writing
char myarr[] = "Hello";
you will create a separate character array named myarr (on the stack, if declared inside a function), and that array may be modified. myarr will indeed still contain \0 and have a size of 6 chars; in particular, myarr will have type char[6], with sizeof(myarr) == 6.
From the C11 Standard
Section 6.4.5 String Literals, Paragraph 6 (p. 71):
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.78) The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence
A string literal already includes a terminating \0 by itself, regardless of what you do with that literal. "hello" is always a char [6] array of h, e, l, l, o and \0, by definition. So, the fact that you "pass it to a function" is completely inconsequential here.
There's no need to add anything.
String literals are not passed to functions; only a pointer to the first character is. The referenced object contains all the characters plus the terminating zero.
The following code gets a segfault on line 2:
char *str = "string";
str[0] = 'z'; // could be also written as *str = 'z'
printf("%s\n", str);
While this works perfectly well:
char str[] = "string";
str[0] = 'z';
printf("%s\n", str);
Tested with MSVC and GCC.
See the C FAQ, Question 1.32
Q: What is the difference between these initializations?
char a[] = "string literal";
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].
A: A string literal (the formal term for a double-quoted string in C source) can be used in two slightly different ways:
As the initializer for an array of char, as in the declaration of char a[], it specifies the initial values of the characters in that array (and, if necessary, its size).
Anywhere else, it turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the second declaration initializes p to point to the unnamed array's first element.
Some compilers have a switch controlling whether string literals are writable or not (for compiling old code), and some may have options to cause string literals to be formally treated as arrays of const char (for better error catching).
Normally, string literals are stored in read-only memory when the program is run. This is to prevent you from accidentally changing a string constant. In your first example, "string" is stored in read-only memory and str points to its first character. The segfault happens when you try to change the first character to 'z'.
In the second example, the string "string" is copied by the compiler from its read-only home to the str[] array. Then changing the first character is permitted. You can check this by printing the address of each:
printf("%p", (void *)str); // %p expects a void *
Also, printing the size of str in the second example will show you that the compiler has allocated 7 bytes for it:
printf("%zu", sizeof(str)); // sizeof yields a size_t, printed with %zu
Most of these answers are correct, but just to add a little more clarity...
The "read-only memory" that people are referring to is the text segment in ASM terms (or a read-only data section such as .rodata, which is often mapped into the same segment as the code). It's where the instructions are loaded, and it is read-only for obvious reasons like security. When you create a char* initialized to a string, the string data is compiled into this read-only area and the program initializes the pointer to point into it. So if you try to change it, kaboom. Segfault.
When written as an array, the initialized string data ends up in writable memory instead: the data segment for a global array, which is the same place that your global variables and such live, or the stack for a local array. This time the character array itself holds the characters (it only decays to a char* when used in an expression), so you can safely alter it at run time.
Why do I get a segmentation fault when writing to a string?
C99 N1256 draft
There are two different uses of character string literals:
Initialize char[]:
char c[] = "abc";
This is "more magic", and described at 6.7.8/14 "Initialization":
An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So this is just a shortcut for:
char c[] = {'a', 'b', 'c', '\0'};
Like any other regular array, c can be modified.
Everywhere else: it generates an:
unnamed
array of char (see the question "What is the type of string literals in C and C++?")
with static storage
that gives UB if modified
So when you write:
char *c = "abc";
This is similar to:
/* __unnamed is magic because modifying it gives UB. */
static char __unnamed[] = "abc";
char *c = __unnamed;
Note the implicit conversion from char[] to char * (the array decays to a pointer to its first element), which is always legal.
Then if you modify c[0], you also modify __unnamed, which is UB.
This is documented at 6.4.5 "String literals":
5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence [...]
6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
6.7.8/32 "Initialization" gives a direct example:
EXAMPLE 8: The declaration
char s[] = "abc", t[3] = "abc";
defines "plain" char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration
char *p = "abc";
defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
GCC 4.8 x86-64 ELF implementation
Program:
#include <stdio.h>
int main(void) {
char *s = "abc";
printf("%s\n", s);
return 0;
}
Compile and decompile:
gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o
Output contains:
char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata
Conclusion: GCC stores the string literal used to initialize the char * in the .rodata section, not in .text.
If we do the same for char[]:
char s[] = "abc";
we obtain:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
so it gets stored in the stack (relative to %rbp).
Note however that the default linker script puts .rodata and .text in the same segment, which has execute but no write permission. This can be observed with:
readelf -l a.out
which contains:
Section to Segment mapping:
Segment Sections...
02 .text .rodata
In the first code, "string" is a string constant, and string constants should never be modified because they are often placed in read-only memory. str is a pointer being used to modify the constant.
In the second code, "string" is an array initializer, sort of short hand for
char str[7] = { 's', 't', 'r', 'i', 'n', 'g', '\0' };
"str" is an array allocated on the stack and can be modified freely.
Because "string" in the first example must be treated as read-only: in C++ its type is const char[7], and although in C its type is plain char[7], modifying it is undefined behavior, even if you assign it to a non-const char *. So you shouldn't try to write to it.
The compiler has enforced this by putting the string in a read-only part of memory, hence writing to it generates a segfault.
char *str = "string";
The above sets str to point to the literal value "string", which is hard-coded in the program's binary image, in memory that is probably flagged as read-only.
So str[0]= is attempting to write to the application's read-only data. Exactly what happens is probably compiler and platform dependent, though.
To understand this error, you should first know the difference between a pointer and an array,
so first let me explain the differences between them.
String array
char strarray[] = "hello";
In memory, the array is stored in contiguous memory cells, as [h][e][l][l][o][\0], where each [] is a one-byte memory cell, and these contiguous cells can be accessed by the name strarray. So the string array strarray itself contains all the characters of the string it was initialized with, in this case "hello".
So we can easily change its contents by accessing each character by its index:
`strarray[0]='m'` accesses the character at index 0, which is 'h' in strarray,
and changes its value to 'm', so strarray now holds "mello".
One point to note: we can change the contents of a string array character by character, but we cannot assign another string to it directly; strarray="new string" is invalid.
Pointer
As we all know, a pointer points to a location in memory.
An uninitialized local pointer holds an indeterminate value; after initialization, it points to a particular memory location.
char *ptr = "hello";
Here the pointer ptr is initialized to the string "hello", which is a constant string typically stored in read-only memory, so "hello" cannot be changed,
and ptr itself is stored in the stack section, pointing to the constant string "hello".
So ptr[0]='m' is invalid, since you cannot write to read-only memory (formally, the behavior is undefined).
But ptr can be assigned another string directly, since it is just a pointer and can point to any object of its pointed-to type:
ptr="new string"; is valid
char *str = "string";
allocates a pointer to a string literal, which the compiler is putting in a non-modifiable part of your executable;
char str[] = "string";
allocates and initializes a local array which is modifiable
The C FAQ that @matli linked to mentions it, but no one else here has yet, so for clarification: if a string literal (a double-quoted string in your source) is used anywhere other than to initialize a character array (i.e. @Mark's second example, which works correctly), that string is stored by the compiler in a special static string table, which is akin to creating a global static variable (read-only, of course) that is essentially anonymous (has no variable "name"). The read-only part is the important part, and is why @Mark's first code example segfaults.
The
char *str = "string";
line defines a pointer and points it to a literal string. The literal string is not writable so when you do:
str[0] = 'z';
you get a seg fault. On some platforms, the literal might be in writable memory so you won't see a segfault, but it's invalid code (resulting in undefined behavior) regardless.
The line:
char str[] = "string";
allocates an array of characters and copies the literal string into that array, which is fully writable, so the subsequent update is no problem.
String literals like "string" are probably allocated in your executable's address space as read-only data (give or take your compiler). When you go to touch it, it freaks out that you're in its bathing suit area and lets you know with a seg fault.
In your first example, you're getting a pointer to that const data. In your second example, you're initializing an array of 7 characters with a copy of the const data.
#include <stdio.h>
int main(void) {
// create a string constant like this - it will be read-only
char *str_p;
str_p = "String constant";
// create an array of characters like this
char *arr_p;
char arr[] = "String in an array";
arr_p = &arr[0];
// now we try to change a character in the array first; this will work
*arr_p = 'E';
// let's try to change the first character of the string constant
*str_p = 'G'; // undefined behavior, typically a segmentation fault. Comment it out to make the program work.
/*-----------------------------------------------------------------------------
* String constants can't be modified. A segmentation fault is the result,
* because most operating systems will not allow a write
* operation on read-only memory.
*-----------------------------------------------------------------------------*/
// print both strings to see if they have changed
printf("%s\n", str_p); // print the string literal through the pointer
printf("%s\n", arr_p); // print the string, which is in an array
return 0;
}
In the first place, str is a pointer that points at "string". The compiler is allowed to put string literals in places in memory that you cannot write to, but can only read. (In C++ this should trigger a diagnostic, since a string literal there is an array of const char; in C the literal's type is plain char[7], so compilers such as GCC only warn about it if you enable an option like -Wwrite-strings.)
In the second place, you're creating an array, which is memory that you've got full access to, and initializing it with "string". You're creating a char[7] (six for the letters, one for the terminating '\0'), and you do whatever you like with it.
Assume the strings are:
char a[] = "string literal copied to stack";
char *p = "string literal referenced by p";
In the first case, the literal is copied when 'a' comes into scope. Here 'a' is an array defined on the stack. This means the string is created on the stack and its data is copied from the program's constant data, which is typically read-only (this is implementation-specific: a compiler can also place this read-only program data in writable memory).
In the second case, p is a pointer defined on the stack (local scope) that refers to a string literal (program data or text) stored elsewhere. Modifying that memory is undefined behavior, so don't do it.
Section 5.5, "Character Pointers and Functions", of K&R also discusses this topic:
There is an important difference between these definitions:
char amessage[] = "now is the time"; /* an array */
char *pmessage = "now is the time"; /* a pointer */
amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents.
Constant memory
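Since modifying a string literal is undefined behavior, implementations typically store string literals in the Constant (read-only) part of memory, for example the .rodata section on ELF platforms such as Linux. Data stored there is immutable, i.e., it cannot be changed. Thus, string literals defined in C code usually get a read-only memory address here.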
Stack memory
The Stack part of memory is where local (automatic) variables live, e.g., variables defined in functions.
As @matli's answer suggests, there are two ways of working with these constant strings.
1. Pointer to string literal
When we define a pointer to a string literal, we are creating a pointer variable living in Stack memory. It points to the read-only address where the underlying string literal resides.
#include <stdio.h>
int main(void) {
char *s = "hello";
printf("%p\n", (void *)s); // Prints the literal's read-only address, e.g. 0x7ffc8e224620
return 0;
}
If we try to modify the string that s points to by inserting
s[0] = 'H';
we typically get a Segmentation fault (core dumped). The behavior is undefined, and in practice we are trying to write to memory we aren't allowed to write to: the read-only address 0x7ffc8e224620.
2. Array of chars
For the sake of the example, suppose the string literal "hello" stored in constant memory has a read-only memory address identical to the one above, 0x7ffc8e224620.
#include <stdio.h>
int main(void) {
// We create an array from a string literal with address 0x7ffc8e224620.
// C initializes an array variable on the stack; let's give it address
// 0x7ffc7a9a9db2.
// Conceptually, C then copies the read-only value from 0x7ffc8e224620 into
// 0x7ffc7a9a9db2 to give us a local copy we can mutate.
char a[] = "hello";
// We can now mutate the local copy
a[0] = 'H';
printf("%p\n", (void *)a); // Prints the Stack address, e.g. 0x7ffc7a9a9db2
printf("%s\n", a); // Prints "Hello"
return 0;
}
Note: When using pointers to string literals as in 1., best practice is to use the const keyword, like const char *s = "hello". This is more readable, and the compiler will catch attempts to modify the literal: instead of a seg fault at run time, you get a compile-time error like error: assignment of read-only location ‘*s’. Linters in editors will also likely pick up the error before you compile the code.
The first is a string constant, which can't be modified. The second is an array with an initial value, so it can be modified.
A segmentation fault occurs when you try to access memory in a way you aren't allowed to, for example writing to read-only memory.
char *str is a pointer to a string that is non-modifiable (the reason for getting a segfault),
whereas char str[] is an array and can be modified.
Somewhere I read the following lines:
char *p = "string literal";
Q: My program crashes if I try to assign a new value to p[i].
A: It turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the declaration initializes p to point to the unnamed array's first element.
I know what static does, but I did not understand the following phrase in the lines above:
static array of characters.
This does not refer to the static keyword, but static in the sense that it cannot be changed.
EDIT: On further thought, this phrase seems badly written. I think the author back then (for those wondering, this comes from the C FAQ) meant "constant".
EDIT2: OP asked what a string literal is; here is the answer:
A string literal is a string that is hard-coded in your source (and later in your compiled program). You write one by enclosing characters in double quotes ", for example "some string literal here".
When you assign this to a pointer, the pointer points to the string literal, which is stored with your program's data, typically in a region the operating system maps as read-only. This is why it cannot be modified.
You can also use a string literal to initialize an array. The meaning there is different: the array is a separate object in writable memory that gets that string as its initial value.
Mind you, this shortcut only works with a string literal inside double quotes; if you attempt other hacks, it won't compile. For example, you cannot do this: char* someVar = {'f', 'o', 'o', '\0'}; (my compiler gives the error: excess elements in scalar initializer)
"Static" refers to the storage duration of the object that will be created for the string literal.
To quote C99 6.4.5:
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.
Simply put, string literals are string constants, about which the C11 standard says:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
So a string literal must not be changed during program execution, whereas string variables can be. String variables are arrays of characters in which the string is terminated by a null character ('\0').
All string variables are arrays of characters, but not all character arrays are strings.
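When the compiler encounters a string literal, it typically stores it in a read-only section of the executable (not ROM, but memory that the operating system maps as read-only). Here the word static refers to static storage duration (the array exists for the whole run of the program), not to the keyword static or to being unmodifiable.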
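A string literal:
char *string_literal = "string literal";
or, conceptually, the pointer refers to the first element of an unnamed array with static storage duration, as if you had written the following (the name unnamed_literal is hypothetical, since the real array has no name, and unlike this named array, writing to the real one is undefined behavior):
static char unnamed_literal[] = {'s','t','r','i','n','g',' ','l','i','t','e','r','a','l','\0'};
char *string_literal = unnamed_literal;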
A string variable
char string_var[] = "string variable";
or it can also be seen as
char string_var[] = {'s','t','r','i','n','g',' ','v','a','r','i','a','b','l','e', '\0'};
A character array that is not a string (it has no terminating '\0'):
char character_array[] = {'c','h','a','r','a','c','t','e','r',' ', 'a', 'r', 'r', 'a', 'y'};