C automatically appending null character to a string?

#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[]) {
    int i;
    char s1[100] = "Computer Programming Class";
    char s2[100] = "ECE";
    int length = (int)strlen(s1);
    for (i = 0; i < length; i++) {
        s2[i] = s1[length - 1 - i];
    }
    s2[i] = '\n';
    printf("%s", s2);
    return 0;
}
This was on one of my tests and I don't understand why it works as intended. It's a piece of code that reverses the order of s1 and stores it in s2 and then prints it out. It appears to me that the null character in s2 would be overwritten when s1 is being stored in it backwards, plus the null character in s1 would never be written in s2 since it's starting from the last character. But it prints out just fine. Why?

strlen returns the length of the string in characters, not including the null terminator, so the loop condition i < length never copies the terminator from s1.
When you partially initialize an array, as you did with char s2[100] = "ECE";, the remaining elements are initialized to zero. In other words, as long as length < 99, everything after the characters you write into s2 is still zero, so the result is guaranteed to be null-terminated.

From the C Standard (6.7.9 Initialization)
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
and
10 If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static
or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or
unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized
(recursively) according to these rules, and any padding is initialized
to zero bits;
Thus in this declaration
char s2[100] = "ECE";
all 96 elements that were not initialized explicitly by elements of the string literal are zero initialized implicitly.

Related

initializing string to null termination in order to avoid memset

I wrote this code:
#include<stdio.h>
int main(void)
{
    char c[10]=""; //Q
    if(c[2]=='\0')
        printf("hello");
    return 0;
}
In the line marked //Q, is the entire array set to '\0' or just index 0? The program does print hello, but I am not sure whether that happens by accident or by design.
From the C Standard (6.7.9 Initialization)
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
and
...If an object that has static or thread storage duration is not
initialized explicitly, then:
— if it has arithmetic type, it is initialized to (positive or
unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
Thus all elements of the character array will be zero-initialized.
If you want to set only one char to zero (in your case the first one), you need to assign zero to that char.
void foo()
{
    char c[64];
    c[0] = 0;
    /* ... */
}

Why is this non-null terminated string printed correctly

Yesterday, I had my Unit Test. One of the programs was to copy strings and find out its length without the string functions. This was the code I wrote:
#include <stdio.h>
int main(){
    char str1[100], str2[100] = {'a'};
    printf("Enter a string\n");
    fgets(str1, sizeof(str1), stdin);
    int i;
    for(i = 0; str1[i] != '\0'; i++){
        str2[i] = str1[i];
    }
    str2[i] = '\0';
    printf("Copied string = %s", str2);
    printf("Length of string = %d", i-1);
}
I had a rather surprising observation! Even if I commented out str2[i] = '\0', the string would be printed correctly, i.e., without the extra 'a's from the initialization, which as far as I knew should not have been overwritten.
After commenting out str2[i] = '\0', I expected to see this output:
test
Copied string = testaaaaaaaaaaaaaaaaaaaaaaaaaaa....
Length of string = 4
This is the output:
test
Copied string = test
Length of string = 4
How is str2 printed correctly? Did the compiler recognize the copying of the string and silently add the null termination? I am using gcc, but clang also produces similar output.
str2[100] = {'a'}; does not fill str2 with 100 repeated 'a's. It just sets str2[0] to 'a' and the rest to zero.
As far back as C89:
3.5.7 Initialization
...
Semantics
...
If an object that has static storage duration is not initialized
explicitly, it is initialized implicitly as if every member that has
arithmetic type were assigned 0 and every member that has pointer type
were assigned a null pointer constant. If an object that has
automatic storage duration is not initialized explicitly, its value is
indeterminate.
...
If there are fewer initializers in a list than there are members of
an aggregate, the remainder of the aggregate shall be initialized
implicitly the same as objects that have static storage duration.
First, the rule of initialization for aggregate types[1], quoting C11, chapter 6.7.9 (emphasis mine)
The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.
and,
If an object that has static or thread storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
Now, an initialization statement like
char str2[100] = {'a'};
will initialize str2[0] to 'a', and str2[1] through str2[99] with 0, according to the above rule. That 0 value is the null-terminator for strings.
Thus, any string you store there that is shorter than the array (at most 99 characters here) is automatically followed by a null, provided those zero elements have not been overwritten earlier.
So, you're okay to use the array as string and get the expected behavior of that of a string.
[1]: Aggregate types:
According to chapter 6.2.5/P21
[...] Array and structure types are collectively called aggregate types.

Where is the null-character in a fixed-length empty string? [duplicate]

So I got curious reading some C code; let's say we have the following code:
char text[10] = "";
Where does the C compiler then put the null character?
I can think of 3 possible cases:
1. In the beginning, and then 9 characters of whatever used to be in memory
2. In the end, so 9 characters of garbage, and then a trailing '\0'
3. It fills it completely with 10 '\0'
The question is, depending on the case, whether it's necessary to add the trailing '\0' when doing a strncpy. If it's case 2 or 3, then it's not strictly necessary, but a good idea; and if it's case 1, then it's absolutely necessary.
Which is it?
In your initialization, the text array is filled with null bytes (i.e. option #3).
char text[10] = "";
is equivalent to:
char text[10] = { '\0' };
That is, the first element of text is explicitly initialized to zero and the rest are implicitly zero-initialized, as required by C11, 6.7.9 Initialization, paragraph 21:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
Quoting N1256 (roughly C99), since there are no relevant changes to the language before or after:
6.7.8 Initialization
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
"" is a string literal consisting of one character (its terminating null character), and this paragraph states that that one character is used to initialise the elements of the array, which means the first character is initialised to zero. There's nothing in here that says what happens to the rest of the array, but there is:
21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
This paragraph states that the remaining characters are initialised the same as if they had static storage duration, which means the rest of the array gets initialised to zero as well.
Worth mentioning here as well is the "if there is room" in p14:
In C, char a[5] = "hello"; is perfectly valid too, and for this case too you might want to ask where the compiler puts the null character. The answer here is: it doesn't.
String literal "" has type of character array char[1] in C and const char [1] in C++.
You can imagine it the following way.
In C:
char no_name[] = { '\0' };
or in C++:
const char no_name[] = { '\0' };
When a string literal is used to initialize a character array then all its characters are used as initializers. So for this declaration
char text[10] = "";
you in fact have
char text[10] = { '\0' };
All other elements of the array that have no corresponding initializer (that is, all except text[0]) are initialized to 0.
From the C Standard (6.7.9 Initialization)
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
and
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration
and at last
10 If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static
or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized
(recursively) according to these rules, and any padding is initialized
to zero bits;
The C++ Standard contains similar wording.
Take into account that in C you may write for example the following way
char text[5] = "Hello";
         ^^^
In this case the character array will not have the terminating zero because there is no room for it. :) It is the same as if you defined
char text[5] = { 'H', 'e', 'l', 'l', 'o' };

Why is "%s" format specifier working even for a character array without `\0` & printing all its elements at once?

Look at the following code:
#include<stdio.h>
int main(void)
{
    char name[7]={'E','R','I','C'};
    printf("%s",name);
}
It outputs the entire name ERIC. Why is that? Isn't %s supposed to work only if we initialize the character array name as follows:
char name[7]={'E','R','I','C','\0'}; //With null terminator
I am not considering the following, as this obviously produces a null-terminated character array:
char name[7]="ERIC";
According to the c11 specification
(6.7.9.21)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
(6.7.9.10)
If an object that has static or thread storage duration is not initialized
explicitly, then:
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
Thus, when you init an array like this:
char name[7]={'E','R','I','C'};
It is the same as:
char name[7]={'E','R','I','C', 0, 0, 0};
So name is still null-terminated.
From C11 Section 7.21.6.1 Paragraph 8, on the %s specifier:
If no l length modifier is present, the argument shall be a pointer to
the initial element of an array of character type. Characters from
the array are written up to (but not including) the terminating null
character. If the precision is specified, no more than that many bytes
are written. If the precision is not specified or is greater than the
size of the array, the array shall contain a null character.
Therefore, if you pass printf a char * for %s, it prints characters until a \0 is found.
Also
char name[7]={'E','R','I','C'}; is '\0'-terminated in this case, because the array has 7 elements but only 4 of them are initialized explicitly, so the remaining elements are initialized to 0. Check johnchen902's answer for more.

Char Array Initialization

How does this work:
char Test1[8] = {"abcde"} ;
AFAIK, this should be stored in memory at Test1 as
a b c d e 0 SomeJunkValue SomeJunkValue
instead it get stored as:
a b c d e 0 0 0
Initialization should only add one trailing null character after the string literal, so how and why are all the other array members initialized to zero?
Also, any links or conceptual ideas about the underlying mechanism behind char TEST1[8] = {"abcde"}; would be very helpful.
How is:
char Test1[8] = {"abcde"} ;
different from
char Test1[8] = "abcde" ;
?
Unspecified members of a partially initialized aggregate are initialized to the zero of that type.
6.7.9 Initialization
21 - If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
10 - [...] If an object that has static or thread storage duration is not initialized
explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero; [...]
For the array char Test1[8], the initializers {"abcde"} and "abcde" are completely equivalent per 6.7.9:14:
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces.

Resources