Yesterday I had my unit test. One of the programs was to copy a string and find its length without using the string functions. This was the code I wrote:
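#include <stdio.h>

int main(void) {
    char str1[100], str2[100] = {'a'};
    printf("Enter a string\n");
    fgets(str1, sizeof(str1), stdin);       /* fgets keeps the trailing '\n' */
    int i;
    for (i = 0; str1[i] != '\0'; i++) {
        str2[i] = str1[i];                  /* copy character by character */
    }
    str2[i] = '\0';                         /* the line that was commented out */
    printf("Copied string = %s", str2);     /* str2 already ends in '\n' */
    printf("Length of string = %d", i - 1); /* i - 1 discounts the '\n' */
    return 0;
}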
I had a rather surprising observation! Even when I commented out str2[i] = '\0', the string was printed correctly, i.e., without the extra 'a's from the initialization, which as far as I knew should not have been overwritten.
After commenting out str2[i] = '\0', I expected to see this output:
test
Copied string = testaaaaaaaaaaaaaaaaaaaaaaaaaaa....
Length of string = 4
This is the output:
test
Copied string = test
Length of string = 4
How is str2 printed correctly? Did the compiler recognize the string copy and silently add the null terminator? I am using gcc, but clang produces similar output.
str2[100] = {'a'}; does not fill str2 with 100 'a's. It sets only str2[0] to 'a' and the rest to zero.
As far back as C89:
3.5.7 Initialization
...
Semantics
...
If an object that has static storage duration is not initialized
explicitly, it is initialized implicitly as if every member that has
arithmetic type were assigned 0 and every member that has pointer type
were assigned a null pointer constant. If an object that has
automatic storage duration is not initialized explicitly, its value is
indeterminate./65/
...
If there are fewer initializers in a list than there are members of
an aggregate, the remainder of the aggregate shall be initialized
implicitly the same as objects that have static storage duration.
First, consider the rule of initialization for aggregate types[1], quoting C11, chapter 6.7.9:
The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject;151) all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.
and,
If an object that has static or thread storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
Now, an initialization statement like
char str2[100] = {'a'};
will initialize str2[0] to 'a', and str2[1] through str2[99] to 0, according to the above rule. That 0 value is the null terminator for strings.
Thus, any non-empty string of at most 99 characters that you copy into str2 starting at str2[0] is automatically null-terminated, because the element right after the last copied character is one of those implicit zeros.
So you can safely use the array as a string and get the expected string behavior.
[1]: Aggregate types:
According to chapter 6.2.5/P21
[...] Array and structure types are collectively called aggregate types.
Related
#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[]) {
int i;
char s1[100] = "Computer Programming Class";
char s2[100] = "ECE";
int length = (int)strlen(s1);
for (i = 0; i < length; i++) {
s2[i] = s1[length - 1 - i];
}
s2[i] = '\n';
printf("%s", s2);
return 0;
}
This was on one of my tests and I don't understand why it works as intended. It's a piece of code that reverses the order of s1 and stores it in s2 and then prints it out. It appears to me that the null character in s2 would be overwritten when s1 is being stored in it backwards, plus the null character in s1 would never be written in s2 since it's starting from the last character. But it prints out just fine. Why?
strlen returns the length of the string in characters, not including the null terminator, so the loop condition i < length never reaches s1's null terminator.
When you partially initialize an array, as you did with char s2[100] = "ECE";, the remaining elements are initialized to zero. In other words, as long as length < 99, what you write into s2 (including the '\n' at s2[length]) is guaranteed to be followed by a null terminator.
From the C Standard (6.7.9 Initialization)
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
and
10 If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static
or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or
unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized
(recursively) according to these rules, and any padding is initialized
to zero bits;
Thus in this declaration
char s2[100] = "ECE";
all 96 elements that were not initialized explicitly by elements of the string literal are zero initialized implicitly.
I wrote this code:
#include<stdio.h>
int main(void)
{
char c[10]=""; //Q
if(c[2]=='\0')
printf("hello");
return 0;
}
In the line marked //Q, is the entire array set to '\0', or just index 0? When I run it, it prints hello, but I am not sure whether that happens by accident or by design.
From the C Standard (6.7.9 Initialization)
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
and
...If an object that has static or thread storage duration is not
initialized explicitly, then:
— if it has arithmetic type, it is initialized to (positive or
unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
Thus all elements of the character array will be zero-initialized.
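If you want to set only one char to zero (in your case the first one), leave the array without an initializer and assign zero to that char:
void foo(void)
{
    char c[64];   /* no initializer: every element is indeterminate */
    c[0] = 0;     /* only c[0] is set; c[1]..c[63] stay indeterminate */
    /* ... */
}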
So I got curious reading some C code; let's say we have the following code:
char text[10] = "";
Where does the C compiler then put the null character?
I can think of 3 possible cases
In the beginning, and then 9 characters of whatever used to be in memory
In the end, so 9 characters of garbage, and then a trailing '\0'
It fills it completely with 10 '\0'
The question is whether, depending on the case, it's necessary to add the trailing '\0' when doing a strncpy. If it's case 2 or 3, it's not strictly necessary, but a good idea; if it's case 1, it's absolutely necessary.
Which is it?
In your initialization, the text array is filled with null bytes (i.e. option #3).
char text[10] = "";
is equivalent to:
char text[10] = { '\0' };
Here the first element of text is explicitly initialized to zero, and the rest of them are implicitly zero-initialized, as required by C11, 6.7.9 Initialization, paragraph 21:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
Quoting N1256 (roughly C99), since there are no relevant changes to the language before or after:
6.7.8 Initialization
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
"" is a string literal consisting of one character (its terminating null character), and this paragraph states that that one character is used to initialise the elements of the array, which means the first character is initialised to zero. There's nothing in here that says what happens to the rest of the array, but there is:
21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
This paragraph states that the remaining characters are initialised the same as if they had static storage duration, which means the rest of the array gets initialised to zero as well.
Worth mentioning here as well is the "if there is room" in p14:
In C, char a[5] = "hello"; is perfectly valid too, and for this case too you might want to ask where the compiler puts the null character. The answer here is: it doesn't.
String literal "" has type of character array char[1] in C and const char [1] in C++.
You can imagine it the following way
In C:
char no_name[] = { '\0' };
and in C++:
const char no_name[] = { '\0' };
When a string literal is used to initialize a character array, all of its characters are used as initializers. So for this declaration
char text[10] = "";
you in fact have
char text[10] = { '\0' };
All other characters of the array (every element except text[0]) have no corresponding initializer, so they are initialized to 0.
From the C Standard (6.7.9 Initialization)
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
and
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration
and at last
10 If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static
or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized
(recursively) according to these rules, and any padding is initialized
to zero bits;
The C++ Standard contains similar wording.
Note that in C you may write, for example,
char text[5] = "Hello";
Here the array size 5 leaves no room for the terminating zero, so the character array will not have one. It is the same as if you defined
char text[5] = { 'H', 'e', 'l', 'l', 'o' };
How does this work:
char Test1[8] = {"abcde"} ;
AFAIK, this should be stored in memory at Test1 as
a b c d e 0 SomeJunkValue SomeJunkValue
instead it get stored as:
a b c d e 0 0 0
Initializing adds only one trailing null character after the string literal, so how and why are all the other array members initialized to zero?
Also, any links or any conceptual idea on what underlying method or function does char Test1[8] = {"abcde"}; would be very helpful.
How is:
char Test1[8] = {"abcde"} ;
different from
char Test1[8] = "abcde" ;
?
Unspecified members of a partially initialized aggregate are initialized to the zero of that type.
6.7.9 Initialization
21 - If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
10 - [...] If an object that has static or thread storage duration is not initialized
explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero; [...]
For the array char Test1[8], the initializers {"abcde"} and "abcde" are completely equivalent per 6.7.9:14:
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces.
In C, you can partially initialize a struct or array, with the result that the members/elements that aren't mentioned in the initializer are zero-initialized. (C99 section 6.7.8.19). For example:-
int a[4] = {1, 2};
// a[0] == 1
// a[1] == 2
// a[2] == 0
// a[3] == 0
You can also initialize "an array of character type" with a string literal (C99 section 6.7.8.14), and "successive characters ... initialize the elements of the array". For example:-
char b[4] = "abc";
// b[0] == 'a'
// b[1] == 'b'
// b[2] == 'c'
// b[3] == '\0'
All pretty straightforward. But what happens if you explicitly give the length of the array, but use a literal that's too short to fill the array? Are the remaining characters zero-initialized, or do they have undefined values?
char c[4] = "a";
// c[0] == 'a'
// c[1] == '\0'
// c[2] == ?
// c[3] == ?
Treating it as a partial initializer would make sense, it would make char c[4] = "a" behave exactly like char c[4] = {'a'}, and it would have the useful side-effect of letting you zero-initialize a whole character array concisely with char d[N] = "", but it's not at all clear to me that that's what the spec requires.
char c[4] = "a";
All the remaining elements of the array will be set to 0. That is, not only c[1] but also c[2] and c[3].
Note that this does not depend on the storage duration of c, i. e., even if c has automatic storage duration the remaining elements will be set to 0.
From the C Standard:
(C99, 6.7.8p21) "If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration."
From the C99 standard (as already stated by ouah):
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
and:
If an object that has automatic storage duration is not initialized explicitly,
its value is indeterminate. If an object that has static storage duration is
not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these
rules.
And char is an arithmetic type, so the remaining elements of the array will be initialised to zero.
Everywhere in the C language, initialization follows an all-or-nothing approach: if you initialize an aggregate only partially, the rest of that aggregate gets zero-initialized.
One can say that this is excessive and less than optimal with strings, but that's just how it works in C.