Where is the null-character in a fixed-length empty string? [duplicate] - c

So I got curious reading some C code; let's say we have the following code:
char text[10] = "";
Where does the C compiler then put the null character?
I can think of 3 possible cases:
1. At the beginning, followed by 9 characters of whatever used to be in memory
2. At the end, so 9 characters of garbage, and then a trailing '\0'
3. It fills the array completely with 10 '\0' characters
The question is whether, depending on the case, it's necessary to add the trailing '\0' when doing a strncpy. If it's case 2 or 3, then it's not strictly necessary, but a good idea; and if it's case 1, then it's absolutely necessary.
Which is it?

In your initialization, the text array is filled with null bytes (i.e. option #3).
char text[10] = "";
is equivalent to:
char text[10] = { '\0' };
That is, the first element of text is explicitly initialized to zero and the rest are implicitly zero-initialized, as required by C11, 6.7.9 Initialization, paragraph 21:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.

Quoting N1256 (roughly C99), since there are no relevant changes to the language before or after:
6.7.8 Initialization
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
"" is a string literal consisting of one character (its terminating null character), and this paragraph states that that one character is used to initialise the elements of the array, which means the first character is initialised to zero. There's nothing in here that says what happens to the rest of the array, but there is:
21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
This paragraph states that the remaining characters are initialised the same as if they had static storage duration, which means the rest of the array gets initialised to zero as well.
Worth mentioning here as well is the "if there is room" in p14:
In C, char a[5] = "hello"; is perfectly valid too, and for this case too you might want to ask where the compiler puts the null character. The answer here is: it doesn't.

String literal "" has the array type char[1] in C and const char[1] in C++.
You can imagine it the following way.
In C:
char no_name[] = { '\0' };
or in C++:
const char no_name[] = { '\0' };
When a string literal is used to initialize a character array, all its characters are used as initializers. So for this declaration
char text[10] = "";
you in fact have
char text[10] = { '\0' };
All other elements of the array, that is, every element except text[0], have no corresponding initializer and are initialized to 0.
From the C Standard (6.7.9 Initialization)
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
and
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration
and at last
10 If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static
or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized
(recursively) according to these rules, and any padding is initialized
to zero bits;
The C++ Standard has similar wording.
Note that in C you may also write, for example,
char text[5] = "Hello";
In this case the character array will not have the terminating zero because there is no room for it. :) It is the same as if you defined
char text[5] = { 'H', 'e', 'l', 'l', 'o' };

Related

What exactly happens when a character array is initialized with data larger than its size? [duplicate]

int main(void)
{
char s[4] = "heloo"; // The character array is initialized with more data than its size
printf("%s",s);
}
The output is: helo�[�G�.
Why is the output in this format?
The compiler must issue a diagnostic, because there are more initializers than the array has elements and the extra initializer is not just the terminating zero of the string literal. Some compilers only warn here, which would explain why the program still ran.
From the C Standard (6.7.9 Initialization)
2 No initializer shall attempt to provide a value for an object not
contained within the entity being initialized.
The one exception to this rule concerns character arrays: the terminating zero of a string literal is left out of the initializers if the character array has no corresponding element for it.
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
If the program is run anyway, it has undefined behavior: the character array does not contain a string, which is required when the conversion specifier %s is used.

Is there a null character added after the string literal even if the bound is not correct?

Is there any null character after the character c in memory:
char a[3]="abc";
printf("the value of the character is %.3s\n",a);
printf("the value of the character is %s\n",a);
Which line is correct ?
char a[3] = "abc"; is well-formed; the three elements of the array will be the characters 'a', 'b', and 'c'. There will not be a NUL terminator. (There might still happen to be a zero byte in memory immediately after the storage allocated to the array, but if there is, it is not part of the array. printf("%s", a) has undefined behavior.)
You might think that this violates the normal rule for when the initializer is too long for the object, C99 6.7.8p2
No initializer shall attempt to provide a value for an object not contained within the entity
being initialized.
That's a "shall" sentence in a "constraints" section, so a program that violates it is ill-formed. But there is a special case for when you initialize a char array with a string literal: C99 6.7.8p14 reads
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
The parenthetical overrides 6.7.8p2 and specifies that in this case the terminating null character is discarded.
There is a similar special case for initializing a wchar_t array with a wide string literal.

What happens when the length of a string is greater than the length of its characters?

char matrix_string[1000] = "the";
In the code above, is the resulting string "the" followed by a bunch of zeros or garbage values? What should I do if I know that this string will be getting bigger as I will be appending values to it?
Any time you initialize an array with fewer items than the array can hold, the remainder of the array is initialized to zero. Using a string literal as an initializer is no different. When you use a string literal to initialize an array, all of the array elements after the string will be initialized to zero.
The following quote is from the C11 specification, §6.7.9 paragraph 21:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
And this is what §6.7.9 paragraph 10 says about the initialization of objects that have static storage duration
If an object that has static or thread storage duration is not
initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero
bits;
if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized
to zero bits;
So the line
char matrix_string[1000] = "the";
puts 't','h','e','\0' in the first four elements of the array and sets the other 996 elements to 0.
It does not depend on where the array is declared.
Whether matrix_string is a global or a local variable, it has an initializer, so the remainder of the bytes (after the first four bytes including the trailing nul) are initialised to zeros.
Only a local array declared with no initializer at all has indeterminate contents.
is the resulting string "the" followed by a bunch of zeros or garbage values?
It will be all zeros (null characters) after "the". You can check by printing matrix_string[5], for example, with the conversion specifiers %c (shows nothing) and %d (shows zero).

Why is "%s" format specifier working even for a character array without `\0` & printing all its elements at once?

Look at the following code:
#include<stdio.h>
int main(void)
{
char name[7]={'E','R','I','C'};
printf("%s",name);
}
It outputs the entire name ERIC. Why is that? Isn't %s supposed to work only if we initialize the character array name as follows:
char name[7]={'E','R','I','C','\0'}; //With null terminator
I am not considering the following as this obviously assumes a null-terminated character array:
char name[7]="ERIC";
According to the C11 specification
(6.7.9.21)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
(6.7.9.10)
If an object that has static or thread storage duration is not initialized
explicitly, then:
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
Thus, when you init an array like this:
char name[7]={'E','R','I','C'};
It is the same as:
char name[7]={'E','R','I','C', 0, 0, 0};
So name is still null-terminated.
From C11 Section 7.21.6.1 (7.19.6.1 in C99), paragraph 8, on the %s specifier:
If no l length modifier is present, the argument shall be a pointer to
the initial element of an array of character type. Characters from
the array are written up to (but not including) the terminating null
character. If the precision is specified, no more than that many bytes
are written. If the precision is not specified or is greater than the
size of the array, the array shall contain a null character.
Therefore, if you pass printf a char * for %s, it prints characters until it finds a \0.
Also
char name[7]={'E','R','I','C'}; is '\0'-terminated in this case, because the array has 7 elements but only 4 of them are explicitly initialized, so the remaining elements are initialized to 0. Check johnchen902's answer for more.

Char Array Initialization

How does this work:
char Test1[8] = {"abcde"} ;
AFAIK, this should be stored in memory at Test1 as
a b c d e 0 SomeJunkValue SomeJunkValue
instead it get stored as:
a b c d e 0 0 0
Initializing only adds one trailing null character after the string literal, so how and why are all the other array members initialized to zero?
Also, any links or any conceptual idea on what is the underlying method or function that does: char TEST1[8] = {"abcde"} ; would be very helpful.
How is:
char Test1[8] = {"abcde"} ;
different from
char Test1[8] = "abcde" ;
?
Unspecified members of a partially initialized aggregate are initialized to the zero of that type.
6.7.9 Initialization
21 - If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
10 - [...] If an object that has static or thread storage duration is not initialized
explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero; [...]
For the array char Test1[8], the initializers {"abcde"} and "abcde" are completely equivalent per 6.7.9:14:
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces.

Resources