meaning of static array of characters? - c

somewhere I read the following lines :-
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].
A:-It turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the declaration initializes p to point to the unnamed array's first element.
I know what static do but I did not understand the following in the above lines
static array of characters.

This does not refer to the static keyword, but static in the sense that it cannot be changed.
EDIT: Thinking better, it seems this phrase was badly written, I think the author back then (for those wondering, this comes from the C faq) meant "constant"
EDIT2: OP asked what is a string literal, here is the answer:
String literal is a string that is hardcoded in your source (and later in your compiled program), you do it by using double quotes " a example would be this "some string literal here"
When you assigned this to a pointer, the pointer points to the string literal, that is stored in your program running code, NOT on the main memory, this is why it cannot be modified.
You can assign a string literal to array, to initialize the array, the meaning there is different, where the array will be sent to the memory, and will have that string as its initial value.
Mind you, a string literal must be inside double quotes " if you attempt other hacks it won't compile at all. You cannot for example do this: char* someVar = {'f', 'o', 'o', '\0'}; it won't work at all. (my compiler gives the error: excess elements in scalar initializer)

"Static" refers to the storage duration of the object that will be created for the string literal.
To quote C99 6.4.5:
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.

Simply string literals refer to string constants about which C11 standard says that:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
It can't change during program execution. While the string variables can change during program execution. String variables are arrays of characters whose last element is a NUL character (\0).
All string (variables) are array of characters but all character arrays are not string.
When compiler encounters a string literal, then it stores it in the read only section of memory, i.e, ROM. Here the word static refers to unmodifiable not the keyword static.
A string literal:
char *string_literal = "string literal";
or this can also be seen as
char *string_literal = {'s','t','r','i','n','g',' ','l','i','t','e','r','a','l','\0'};
A string variable
char string_var[] = "string variable";
or it can also be seen as
char string_var[] = {'s','t','r','i','n','g',' ','v','a','r','i','a','b','l','e', '\0'};
A character array:
char character_array[] = {'c','h','a','r','a','c','t','e','r',' ', 'a', 'r', 'r', 'a', 'y'};

Related

I want to know how double quotes are used in C

In the code char * str = "hello";, I understand that code "hello" is to allocate the word hello to any other memory and then put the first value of that allocated memory into the variable str.
But when I use the code char str[10] = "hello";, I understood that the word hello is included in each element of the array.
If then, on the top, the code "hello" returns the address of the memory
and on the bottom, the code "hello" returns the word h e l l o \n.
I want to know why they are different and if I'm wrong, I want to know what double quotes return.
C is a bit quirky. You have two distinct use cases here. But let's first start with what "hello" is.
Your "hello" in the program source code is a character string literal. That is a character sequence enclosed in double quotes. When the compiler is compiling this source code, it appends a zero byte to the sequence, so that standard library functions like strlen() can work on it. The resulting zero-terminated sequence is then used by the compiler to "initialize an array of static storage duration and length just sufficient to contain the sequence array of constant characters" (n1570 ISO C draft, 6.4.5/6). That length is 6: The 5 characters h, e, l, l and o as well as the appended zero byte.
"Static storage duration" means that the array exists the entire time the program is running (as opposed to objects with automatic local storage duration, e.g. local variables, and those with dynamic storage duration, which are created via malloc() or calloc()).
You can memorize the address of that array, as in char *str = "hello";. This address will point to valid memory during the lifetime of the program.
The second use case is a special syntax for initializing character arrays. It is just syntactic sugar for this common use case, and a deviation from the fact that you cannot normally initialize arrays with arrays.1
This time you don't define a pointer, you define a proper array of 10 chars. You then use the string literal to initialize it. You always can use the generic method to initialize a character array by listing the individual array elements, separated by commas, in curly braces (by the way, this generic method works also for the other kind of compound types, namely structs):
char str[10] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This is entirely equivalent to
char str[10] = "hello";
Now your array has more elements (10) than the number of characters in the initializing array produced from the string literal (6); the standard stipulates that "subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration". Those global and static variables are initialized with zero, which means that the character array str ends with 4 zero characters.
It is immediately obvious why Dennis Ritchie added the somewhat anti-paradigmatic initialization of character arrays via a string literal, probably after the second time he had to do it with the generic array initialization syntax. Designing your own language has its benefits.
1 For example, static char src[] = "123"; char dest[] = src; doesn't work. You have to use strcpy().
The initialization:
char * str = "hello";
in most C implementations makes sure that the string hello is placed in a constant data section of the executable memory. Exactly six bytes are written, the last one being the string terminator '\0'.
str char pointer contains the address of the first character 'h', so that anyone accessing the string knows that the following bytes have to be read until the terminator character is found.
The other initialization
char str[10] = "hello"; // <-- string must be enclosed in double quotes
is very similar, as str points to the first character of the string and that the following characters are written in the following memory locations (included the string terminator).
But:
Even if only six bytes are explicitly initialized, ten bytes are allocated because that's the size of the array. In this case, the four trailing bytes will contain zeroes
Data is not constant and can be changed, while in the previous example it wasn't possible because such initialization, in most C implementations, instructs the compiler to use a constant data section
You seem to be mixing up some things:
char str[10] = "hello';
This does not even compile: when you start with a double-quote, you should end with one:
char str[10] = "hello";
In memory, this has following effect:
str[0] : h
str[1] : e
str[2] : l
str[3] : l
str[4] : o
str[5] : 0 (the zero character constant)
str[6] : xxx
str[7] : xxx
str[8] : xxx
str[9] : xxx
(By xxx, I mean that this can be anything)
As a result, the code will not return hello\n (with an end-of-line character), just hello\0 (the zero character).
The double quotes just mention the beginning and the ending of a string constant and return nothing.

difference between a string defined by pointer or an array

I was reading about pointers in K&R book here:
https://hikage.freeshell.org/books/theCprogrammingLanguage.pdf
There is an important difference between these definitions:
char amessage[] = "now is the time"; /* an array */
char *pmessage = "now is the time"; /* a pointer */
amessage is an array, just big enough to hold the sequence of characters and ’\0’ that initializes it. Individual characters
within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a
pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is
undefined if you try to modify the string contents.
I dont understand why cwe cant modify the string content !
I dont understand why cwe cant modify the string content !
Because the C standard says so: “If the program attempts to modify such an array [the array defined by a string literal], the behavior is undefined” (C 2018 6.4.5 7). A string literal is a sequence of characters in quotes in source code, such as "Hello, world.\n". (String literals may also be preceded by an encoding prefix u8, u, U, or L, as in L"abc".) A string literal defines an array containing the characters of the string plus a terminating null character.
A reason that attempting to modify the string literal’s array is that string literals were, and are, widely used for strings that are constant—error messages to be printed at times, format strings for printf operations, hard-coded names of things, and so on. As C developed, and the standard was written, it made sense for string literals to be treated as read-only and to allow a compiler to put them in read-only storage. Additionally, some compilers would use the same storage for identical string literals that appeared in different places, and some would use the same storage for a string literal that was a trailing substring of another string literal. Because of this shared storage, modifying one string would also modify the other. So allowing programs to modify string literals could cause some problems.
So, if you merely point to a string literal, you are pointing to something that should not be modified. If you want your own copy that can be modified, simply define it with an array as you show with char amessage[] = "now is the time";. Such a definition defines an array, amessage that has its own storage. That array is initialized with the contents of the string literal but is separate from it.
char amessage[] = "now is the time"; /* an array */
amessage is a modifiable array of chars.
char *pmessage = "now is the time"; /* a pointer */
pmessage is a pointer to the string literal. Attempt to modify the string literal is an Undefined Behaviour.
When you initialize a pointer with a string literal, the compiler creates a read-only array (and indeed is free to merge the pointers into one if you have several initializers using the same literal string (character by character) as in:
char *a = "abcdef", *b = "abcdef";
it is probable that both pointers be initialized to the same address in memory. This is the reason by which you are not allowed to modify the string, and why the behaviour can be unpredictable (you don't know if the compiler has merged both strings)
The thing goes further, as the compiler is permitted to do the following, on the next scenario:
char *a = "foo bar", *b = "bar";
the compiler is permitted to initialize a to point to a char array with the characters {'f', 'o', 'o', ' ', 'b', 'a', 'r', '\0'} and initialize also the pointer b to the fifth position of the array, as one of the string literals is a suffix of the other.
Allowing this allows the compiler to make extensive savings in the final executable and so, the string literals are assigned a read-only segment in the executable (they are placed in the .text segment or a similar one)
On the other hand, initializing an array has no problems, as you are defining the array variable that will store the characters, and it is not the compiler which is doing this. An initialization like:
char a[] = "Hello";
will arrange things to have a global variable of type array of chars with space for six characters. But you can also specify between the brackets the array size, as in
char a[32] = "Hello";
and then the array will have 32 characters (from 0 to 31) and the first five will be initialized to the character literals 'H', 'e', 'l', 'l' and 'o', followed by 27 null characters '\0'.
You are also allowed to say:
char a[4] = "Hello";
but in this case you will get an array initialized as {'H', 'e', 'l', 'l'} (only the first four characters are used from the string literal, and you will get a warning from the compiler, signalling the dangerous bend)
Last, think always that an assignment and an initialization are different things, despite they use the same symbol = to indicate it, they are not the same thing. You will never be allowed to write a sentence like:
char a[26];
a = "foo bar";
because the expression "foo bar" represents a char * pointing to a static array (unmodifiable) and an array cannot be assigned.

Static duration of string literal

On this answer by michael-burr on this question:
what-is-the-type-of-string-literals-in-c-and-c
I found that
In C the type of a string literal is a char[] - it's not const according to the type, but it is undefined behavior to modify the contents
from this I can think that sentence "How are you" can't be modified (just as char c*="how are you?") but once it is used to initialize some char[] then it can be unless declared as const.
Apart from this from that answer:
The multibyte character sequence is then used to initialize an array of static storage duration
and from C Primer Plus 6th Edition I found:
Character string constants are placed in the static storage class, which means that if you use
a string constant in a function, the string is stored just once and lasts for the duration of the
program, even if the function is called several times
But when I tried this code:
#include <stdio.h>
void fun() {
char c[] = "hello";
printf("%s\n", c);
c[2] = 'x';
}
int main(void) {
fun();
fun();
return 0;
}
The array inside function fun doesn't behave as if it has retained the changed value.
Where am I going wrong on this?
char c[]="hello"; is not at all the same thing as char *c="hello";. The latter initializes a pointer to the aforementioned static string storage, modifying c[2] would be undefined behavior. The former is equivalent to:
char c[] = {'h', 'e', 'l', 'l', 'o', '\0'};
It's initializing an array on the stack, it's not creating a reference or pointer to the static string in some other memory location. Like any other non-const stack array, you can modify it however you please (as long as you don't go out of bounds).
You’re not modifying the string literal. You’re modifying a local array that contains a copy of the string literal. Each time you call fun, a new instance of c is created and initialized. When fun exits, that instance of c ceases to exist.
Because it is an automatic variable and it's instance is initialized every time you call the function.
Change it to have static storage
static char c[]="hello";
And it will behave as you expect keeping the changed value between the function calls
This is something completely different from how the string literals and compound literals are stored and used in the variable initialization. It is left to the implementation - for example this initialization of the automatic storage variable may be done by copying data from .rodata segment or it can be just couple immediate store instructions and the literal will be stored in the .text segment

differences in array initialization(char, string, other) regarding storage duration

In this question it was said in the comments:
char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; and char arr[10] =
"Hello"; are strictly the same thing. – Michael Walz
This got me thinking.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
But if both are are really the same then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal with.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
char a = 'a'; is the same thing as char a; ...; a = 'a';, so your thoughts are correct 'a' is simply written to a
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
EDIT:
I see that I haven't made it clear enough that I am particularly interested in the memory usage/storage duration of the literals. I will leave the question as it is, but would like to make the emphasis of the question more clear in this edit.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
Yes, but string literals are also a grammatical item in the C language. char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; is not a string literal, it is an initializer list. The initializer list does however behave as if it has static storage duration, remaining elements after the explicit \0 are set to zero etc.
The initializer list itself is stored in some manner of ROM memory. If your variable arr has static storage duration too, it will get allocated in the .data segment and initialized from the ROM init list before the program is started. If arr has automatic storage duration (local), then it is initialized from ROM in run-time, when the function containing arr is called.
The ROM memory where the initializer list is stored may or may not be the same ROM memory as used for string literals. Often there's a segment called .rodata where these things end up, but they may as well end up in some other segment, such as the code segment .text.
Compilers like to store string literals in a particular memory segment, because that means that they can perform an optimization called "string pooling". Meaning that if you have the string literal "Hello" several times in your program, the compiler will use the same memory location for it. It may not necessarily do this same optimization for initializer lists.
Regarding 'a' versus {'a'} in an initializer list, that's just a syntax hiccup in the C language. C11 6.7.6/11:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. The
initial value of the object is that of the expression (after conversion); the same type
constraints and conversions as for simple assignment apply,
In plain English, this means that a "non-array" (scalar) can be either initialized with or without braces, it has the same meaning. Apart from that, the same rules as for regular assignment apply.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
Yes. But with char arr[10] = "Hello";, you are copying the string literal to an array arr and there's no need to "keep" the string literal. So if an implementation chooses to do remove the string literal altogether after copying it to arr and that's totally valid.
But if both are are really the same then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Again there's no need to make/store a string literal for this.
Only if you directly have a pointer to a string literal, it'd be usually stored somewhere such as:
char *string = "Hello, world!\n";
Even then an implementation can choose not to do so under the "as-if" rule. E.g.,
#include <stdio.h>
#include <string.h>
static const char *str = "Hi";
int main(void)
{
char arr[10];
strcpy(arr, str);
puts(arr);
}
"Hi" can be eliminated because it's used only for copying it into arr and isn't accessed directly anywhere. So eliminating the string literal (and the strcpy call too) as if you had "char arr[10] = "Hi"; and wouldn't affect the observable behaviour.
Basically the C standard doesn't necessitate a string literal has to be stored anywhere as long as the properties associated with a string literal are satisfied.
Are there differences between: char a = 'a'; char a = {'a'}; How/where are the differences defined?
Yes. C11, 6.7.9 says:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. [..]
Per the syntax, even:
char c = {'a',}; is valid and equivalent too (though I wouldn't recommend this :).
In the abstract machine, char arr[10] = "Hello"; means that arr is initialized by copying data from the string literal "Hello" which has its own existence elsewhere; whereas the other version just has initial values like any other variable -- there is no string literal involved.
However, the observable behaviour of both versions is identical: there is created arr with values set as specified. This is what the other poster meant by the code being identical; according to the Standard, two programs are the same if they have the same observable behaviour. Compilers are allowed to generate the same assembly for both versions.
Your second question is entirely separate to the first; but char a = 'a'; and char a = {'a'}; are identical. A single initializer may optionally be enclosed in braces.
I belive your question is highly implementation dependant (HW and compiler wise). However, in general: arrays are placed in RAM, let it be global or not.
I know that "Hello" is string literal. String literals are stored with static storage duraction and are immutable.
Yes this saves the string "Hello" in ROM (read only memory). Your array is loaded the literal in runtime.
But if both are are really the same then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.
Yes but in this case the single characters are placed in ROM. The array you are initialized is loaded with character literals in runtime.
Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.
If you use UTF-8, then yes, since char == uint8_t and those are the values.
Are there differences between:
char a = 'a';
char a = {'a'};
How/where are the differences defined?
I believe not.
In reply to edit
Do you mean the lifetime of storage of string literals? Have a look at this.
So a string literal has static storage duration. It remains throughout the lifetime of the program, hardcoded in memory.

I want to know why this works without having to bind memory for the string

Hello guys I recently picked up C programming and I am stuck at understanding pointers. As far as I understand to store a value in a pointer you have to bind memory (using malloc) the size of the value you want to store. Given this, the following code should not work as I have not allocated 11 bytes of memory to store my string of size 11 bytes and yet for some reason beyond my comprehension it works perfectly fine.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(){
char *str = NULL;
str = "hello world\0";
printf("filename = %s\n", str);
return 0;
}
In this case
str = "hello world\0";
str points to the address of the first element of an array of chars, initialized with "hello world\0". In other words, str points to a "string literal".
By definition, the array is allocated and the address of the first element has to be "valid".
Quoting C11, chapter §6.4.5, String literals
In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals.78) The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [....]
Memory allocation still happens, just not explicitly by you (via memory allocator functions).
That said, the "...\0" at the end is repetitive, as mentioned (in the first statement of the quote) above, by default, the array will be null-terminated.
Using a char variable without malloc is stating that the string you are assigning is read-only. This means that you are creating a pointer to a string constant. "hello world\0" is somewhere in the read-only part of memory and you are just pointing to it.
Now if you want to make changes to the string. Let's say changing the h to H, that would be str[0]='H'. Without malloc that will not be possible to make.
When you declare a string literal in a C program, it is stored in a read-only section of the program code. A statement of the form char *str = "hello"; will assign the address of this string to the char* pointer. However, the string itself (i.e., the characters h, e, l, l and o, plus the \0 string terminator) are still located in read-only memory, so you can't change them at all.
Note that there's no need for you to explicitly add a zero byte terminator to your string declarations. The C compiler will do this for you.
Right. But in this case you are just pointing to a string literal which is placed in the constant memory area. Your pointer is created in the stack area. So you are just pointing to another address. i.e, at the starting address of string literal.
Try using copy the string literal in your pointer variable. Then it will give error because you have not allocated memory. Hope you understand now.
Storage for string literals is set aside at program startup and held until the program exits. This storage may be read-only, and attempting to modify the contents of a string literal results in undefined behavior (it may work, it may crash, it may do something in between).

Resources