Sample 1:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hi☻hi♥
Sample 2:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
char l = a[i];
printf("%c",a[i]);
}
printf("%s",a);
Output:hii♥hi♥♦
Sample 3:
char a [5]={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hihi
Why are the outputs of these three programs different?
Samples 1 and 2 are nearly identical, except for the extra line char l = a[i]; in sample 2. Sample 3 differs from samples 1 and 2 only in that the array is declared with an explicit size.
In C, arrays only have a size, but no terminator. So an array of two characters (like your first two examples) will have the two characters you specified and nothing else. When you loop looking for the "terminator" you will go out of bounds and have undefined behavior.
The third case is different, because there you define an array of five elements but only initialize the first two. The C standard then requires the rest of the array to be initialized to zero, which is the same as the character '\0'. The array in the third example still hasn't got an explicit terminator though; it just so happens that the remainder is initialized to the same value as the string terminator.
For sample 1 and 2, you invoke undefined behavior by passing a non-null terminated array as argument to %s in printf().
For a definition like
char a []={'h','i'};
a is allocated memory for exactly two elements. With a brace-enclosed initializer list like this one, no extra space is allocated to store a terminating null.
Quoting C11, chapter §7.21.6.1, for use of the %s format specifier with the printf() family,
s If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
OTOH, in case of sample 3, for a definition like
char a [5]={'h','i'};
the array is null-terminated, so the output is proper. It is null-terminated because you provided the array size in the declaration and supplied fewer initializers than elements in the brace-enclosed list, so the remaining elements are initialized to 0 (as if they had static storage duration). Related, C11, chapter §6.7.9, paragraph 21:
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
For printf("%s",a) to work, the memory block pointed to by a must end with 0.
The same goes for the loop starting with for (i=0; a[i]!='\0'; i++).
In your first two examples, this memory block ends with 'i', not with 0.
You can fix it by changing the initialization of a to either one of the following:
char a[] = {'h','i',0};
char a[] = {'h','i','\0'};
char a[] = "hi";
char *a = "hi";
#include <stdio.h>
#include <string.h>
int main()
{
char ch[20] = {'h','i'};
int k=strlen(ch);
printf("%d",k);
return 0;
}
The output is 2.
As far as I know '\0' helps compiler identify the end of string, but the output here suggests the strlen can detect the end on its own. Then why do we need '\0'?
Long story short: it's your compiler making proactive decisions based on the standard.
Long story:
char ch[20] = {'h','i'};
In the line above, what you are telling your compiler is:
allocate memory big enough to store 20 characters (i.e., an array of 20 chars).
initialize the first two elements of the array as 'h' and 'i'.
implicitly initialize the rest.
Since you are initializing your char array, the implicit initialization sets every remaining element to zero, so the third element acts as the null terminator as long as the array has space remaining. This is what the standard requires for initialization.
If you were to remove the initializer and assign each member manually as below, the remaining elements would stay uninitialized (for an array with automatic storage), and passing the array to strlen would be undefined behavior.
char ch[20];
ch[0] = 'h';
ch[1] = 'i';
Also, if you did not leave extra space for the null terminator, the result would still be undefined behavior even with an initializer, as you can easily test with the snippet below:
char ch[2] = { 'h','i' };
int k = strlen(ch);
printf("%d\n%s\n", k, ch);
Now, if you increase the array size of ch from 2 to 3 or more, the element after 'i' is initialized to the null terminator, so there is no more undefined behavior.
In this declaration:
char ch[20] = {'h','i'};
the first two elements are initialized explicitly and all other elements are initialized implicitly by zeroes.
The above declaration is in fact equivalent to the following (with one exception: here the third element of the array is also explicitly initialized):
char ch[20] = "hi";
Note that the string literal is represented as the following array:
{ 'h', 'i', '\0' }
That is, the array contains a string that is terminated by the zero character '\0', and the function strlen can successfully find the length of the stored string.
If you wrote, for example:
char ch[2] = "hi";
then the array ch would not have space to store the terminating zero of the string literal. In this case applying the function strlen to this array invokes undefined behavior.
A null byte (i.e. the value 0) is what defines the end of a string in C.
When you defined ch, you gave fewer initializers than there are elements in the array, so the remaining elements are set to 0. This results in a null-terminated string.
The strlen function is basically looking for that value and counting how many elements it sees before it finds the null byte.
As far as I know '\0' helps compiler identify the end of string
Technically, it helps user code and the C runtime library identify the ends of strings. To the extent that the compiler needs to know where strings end, it knows without looking for a terminator.
but the output here suggests the strlen can detect the end on its own
That would be a misinterpretation. The actual fact is that your string is null-terminated even though you did not put a null terminator in it explicitly. This is a consequence of declaring your array with an initializer that specifies values for only some of the elements. As some of your other answers describe in more detail, that does not produce a partial initialization. Rather, elements for which the initializer does not specify values are default-initialized. For elements of type char, that means initialization with 0, which serves as a string terminator.
Moreover, if the array were without a terminator then the result of passing it to strlen() would be undefined. You could not then conclude anything from the result.
then why do we need '\0'?
So that user code and many standard library functions can recognize the ends of strings. You already know this.
But in many cases we do not need to provide terminators explicitly. In particular, we do not need to represent them in string literals (and it means something different than you probably intended if you do), and you don't need to represent them in the initializers for char arrays storing strings, provided that the array has more elements than you specify in the initializer.
Your array ch is zero-filled after the explicit initializers, so the byte after 'i' is already set to zero. You can view it with a debugger or simply test it in the code. Trust me, strlen needs the zero to work.
I have some simple C code to see if three identical char arrays all end with '\0':
int main(){
char a[4] = "1234";
char b[4] = "1234";
char c[4] = "1234";
if(a[4] == '\0')
printf("a end with '\\0'\n");
if(b[4] == '\0')
printf("b end with '\\0'\n");
if(c[4] == '\0')
printf("c end with '\\0'\n");
return 0;
}
But the output shows that only array b ends with terminator '\0'. Why is that? I supposed all char arrays have to end with '\0'.
Output:
b end with '\0'
The major problem is that, for an array defined like char a[4] = .... (with a size of 4 elements), using if (a[4] ....) is already off by one and causes undefined behavior.
You want to check a[3], as that is the last valid element.
That said, in your case, you don't have room for the null terminator!
Emphasizing the quote from C11, §6.7.9,
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So, you need to either
use an array size which has room for the null-terminator
or, use an array of unknown size, like char a[ ] = "1234"; where, the array size is automatically determined by the length of the supplied initializer (including the null-terminator.)
It is undefined behaviour because you are trying to access the array out of bounds.
Do not specify the bound of a character array initialized with a string literal; the compiler will then automatically allocate sufficient space for the entire string literal, including the terminating null character.
The C standard (C11, 6.7.9, paragraph 14) says:
An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
So, do not specify the bound of a character array when you initialize it with a string literal:
char a[] = "1234";
You need one more place at the end of the array to store the \0. Declare the arrays with length 5.
An array with n elements has valid indices 0 through n-1. Here each array is 4 bytes long, and you are trying to read the 5th byte (as array indices in C start from 0) when you do something like if(a[4] == '\0').
Run the above code without specifying the array size; in that case all three if statements will execute. A string literal needs one more char for the null terminator, but because you specified the array size as 4, there is no room for it. Reading a[4] is then out of bounds, so the behavior is undefined and the result can appear random.
#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="123456789";
char b[10]="123456789";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 0
compare value 0
The same program behaves differently when I fill the array completely, i.e. with 10 characters. This time the values are different.
#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="1234567890";
char b[10]="1234567890";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 1
compare value -175
Why does strcmp respond differently when the string fills the whole array?
The behaviour of your second snippet is undefined.
There's no room for the null-terminator, which is relied upon by strcmp, when you write char a[10]="1234567890";. This causes strcmp to overrun the array.
One remedy is to use strncmp.
Another one is to use char a[]="1234567890"; (with b adjusted similarly) and let the compiler figure out the array length which will be, in this case, 11.
According to the definitions of terms used in the C Standard (7.1.1 Definitions of terms)
1 A string is a contiguous sequence of characters terminated by and
including the first null character. ... The length of a string is the
number of bytes preceding the null character and the value of a string
is the sequence of the values of the contained characters, in order.
According to the description of function strcmp
2 The strcmp function compares the string pointed to by s1 to
the string pointed to by s2.
According to the section 6.7.9 Initialization Of the Standard
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating
null character if there is room or if the array is of unknown size)
initialize the elements of the array.
In the first program arrays a and b initialized by string literals have room to store the terminating zero.
char a[10]="123456789";
char b[10]="123456789";
Thus the arrays contain strings and the function strcmp may be applied to them.
In the second program arrays a and b do not have room to store the terminating zero
char a[10]="1234567890";
char b[10]="1234567890";
So the arrays do not contain strings and the function strcmp may not be applied to them. Doing so has undefined behaviour: because all characters in the arrays are equal, strcmp reads beyond the arrays until it finds either non-equal characters or a terminating zero.
You could get a valid result if you limit the comparison to the sizes of the arrays. To do that you have to use another standard function, strncmp.
A call can look, for example, like this:
strncmp( a, b, sizeof( a ) );
In your second case,
char a[10]="1234567890";
char b[10]="1234567890";
your arrays are not null-terminated, so they cannot be used as strings. Any function from the string family operating on them will invoke undefined behavior (as it will go past the allocated memory in search of the null terminator).
You would be better off using
char a[ ]="1234567890";
char b[ ]="1234567890";
to leave the size allocation to the compiler and avoid the null-termination issue. The compiler will allocate enough memory to hold the supplied initializer as well as the terminating null.
That said, void main() should be int main(void), at least to conform to the standard.
You declare and initialize your arrays with a string literal (but leave no space for the null terminator), and string manipulation functions require a C-style string (terminated with '\0') to be passed as argument.
So, in your second program, your arrays -
char a[10]="1234567890";
char b[10]="1234567890";
have no space for the '\0' character, so passing them to strcmp invokes undefined behavior.
Increase the size of your arrays -
char a[11]="1234567890"; //or char a[]="1234567890";
Look at the following code:
#include<stdio.h>
int main(void)
{
char name[7]={'E','R','I','C'};
printf("%s",name);
}
It outputs the entire name ERIC. Why is it so? Isn't %s supposed to work only if we initialize the character array name as follows:
char name[7]={'E','R','I','C','\0'}; //With null terminator
I am not considering the following as this obviously assumes a null-terminated character array:
char name[7]="ERIC";
According to the C11 specification
(6.7.9.21)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
(6.7.9.10)
If an object that has static or thread storage duration is not initialized
explicitly, then:
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
Thus, when you init an array like this:
char name[7]={'E','R','I','C'};
It is the same as:
char name[7]={'E','R','I','C', 0, 0, 0};
So name is still null-terminated.
From C11 Section 7.21.6.1 Paragraph 8 %s specifier
If no l length modifier is present, the argument shall be a pointer to
the initial element of an array of character type. Characters from
the array are written up to (but not including) the terminating null
character. If the precision is specified, no more than that many bytes
are written. If the precision is not specified or is greater than the
size of the array, the array shall contain a null character.
Therefore, if you pass a char * to printf with %s, it will print characters until a \0 is found.
Also
char name[7]={'E','R','I','C'}; is '\0'-terminated in this case because the length of the array is 7 but only 4 of the locations are initialized, which results in the remaining locations being initialized to 0. Check johnchen902's answer for more.
What is the difference between
char str1[32] = "\0";
and
char str2[32] = "";
Since you already declared the sizes, the two declarations are exactly equal. However, if you do not specify the sizes, you can see that the first declaration makes a larger string:
char a[] = "a\0";
char b[] = "a";
printf("%zu %zu\n", sizeof(a), sizeof(b));
prints
3 2
This is because a ends with two nulls (the explicit one and the implicit one) while b ends only with the implicit one.
Well, assuming the two cases are as follows (to avoid compiler errors):
char str1[32] = "\0";
char str2[32] = "";
As people have stated, str1 is initialized with two null characters:
char str1[32] = {'\0','\0'};
char str2[32] = {'\0'};
However, according to both the C and C++ standard, if part of an array is initialized, then remaining elements of the array are default initialized. For a character array, the remaining characters are all zero initialized (i.e. null characters), so the arrays are really initialized as:
char str1[32] = {'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0'};
char str2[32] = {'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0'};
So, in the end, there really is no difference between the two.
As others have pointed out, "" implies one terminating '\0' character, so "\0" actually initializes the array with two null characters.
Some other answerers have implied that this is "the same". For the declarations as written, with an explicit size of 32, that is right: the explicit initializers differ ("\0" supplies two null characters, "" supplies one), but every element without an explicit initializer is implicitly initialized to zero, so Str[1] and all later elements are zero in both arrays. The two forms only produce different objects when the size is omitted, where "\0" gives a 2-element array and "" gives a 1-element array.
The first explicitly initializes 2 chars to 0 (the '\0' and the terminator that's always there), and the second explicitly initializes only 1 char (the terminator). The rest is not left untouched, though: the remaining elements are implicitly initialized to 0 in both cases.