Why does strcmp give a different result for a completely filled character array? - c

#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="123456789";
char b[10]="123456789";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 0
compare value 0
The same program responds differently when the strings fill the arrays completely, that is, when they have 10 characters. In that case the results are different.
#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="1234567890";
char b[10]="1234567890";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 1
compare value -175
Why does strcmp respond differently when the string fills the array completely?

The behaviour of your second snippet is undefined.
There's no room for the null-terminator, which is relied upon by strcmp, when you write char a[10]="1234567890";. This causes strcmp to overrun the array.
One remedy is to use strncmp.
Another one is to use char a[]="1234567890"; (with b adjusted similarly) and let the compiler figure out the array length which will be, in this case, 11.

According to the definitions of terms used in the C Standard (7.1.1 Definitions of terms)
1 A string is a contiguous sequence of characters terminated by and
including the first null character....The length of a string is the
number of bytes preceding the null character and the value of a string
is the sequence of the values of the contained characters, in order.
According to the description of function strcmp
2 The strcmp function compares the string pointed to by s1 to
the string pointed to by s2.
According to section 6.7.9 Initialization of the Standard
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating
null character if there is room or if the array is of unknown size)
initialize the elements of the array.
In the first program, the arrays a and b initialized by string literals have room to store the terminating zero.
char a[10]="123456789";
char b[10]="123456789";
Thus the arrays contain strings and the function strcmp may be applied to them.
In the second program, the arrays a and b do not have room to store the terminating zero
char a[10]="1234567890";
char b[10]="1234567890";
So the arrays do not contain strings and the function strcmp may not be applied to them. Doing so has undefined behaviour, because the function stops only when it finds either unequal characters or a terminating zero somewhere beyond the arrays (all characters inside the arrays are equal).
You can get a valid result if you limit the comparison to the sizes of the arrays. To do that, use another standard function, strncmp.
Its call can look, for example, like this:
strncmp( a, b, sizeof( a ) );

In your second case,
char a[10]="1234567890";
char b[10]="1234567890";
your arrays are not null-terminated, so they cannot be used as strings. Any function from the string family will invoke undefined behavior (as it will go past the allocated memory in search of the null terminator).
You would be better off using
char a[ ]="1234567890";
char b[ ]="1234567890";
to leave the size allocation to the compiler and avoid the null-termination issue. The compiler will allocate enough memory to hold the supplied initializer as well as the terminating null.
That said, void main() should be int main(void), at least to conform to the standard.

You declare and initialize your arrays with string literals (but leave no space for the null terminator), and string functions require a C-style string (terminated with '\0') as their argument.
So, in your second program, your arrays are:
char a[10]="1234567890";
char b[10]="1234567890";
There is no space for the '\0' character, so passing these arrays to strcmp invokes undefined behavior.
Increase the size of your arrays:
char a[11]="1234567890"; //or char a[]="1234567890";

Related

Utility of '\0' in C string [duplicate]

#include <stdio.h>
#include <string.h>
int main()
{
char ch[20] = {'h','i'};
int k=strlen(ch);
printf("%d",k);
return 0;
}
The output is 2.
As far as I know, '\0' helps the compiler identify the end of a string, but the output here suggests that strlen can detect the end on its own. Then why do we need '\0'?
Long story short: your compiler is following the initialization rules of the standard.
Long story:
char ch[20] = {'h','i'};
With the line above, you are telling your compiler to:
allocate memory big enough to store 20 characters (that is, an array of 20 chars).
initialize the first two elements of the array as 'h' and 'i'.
implicitly initialize the rest.
Since you are initializing your char array, the compiler sets every remaining element to zero, so the third element acts as the null terminator as long as the array has room for it. This is the standard behaviour for initialization.
If you were to remove the initializer and assign each member manually as below, the remaining elements would be indeterminate, and calling strlen on the array would be undefined behavior.
char ch[20];
ch[0] = 'h';
ch[1] = 'i';
Also, if you did not leave extra space for the null terminator, then even with an initializer the result would still be undefined behavior, as you can easily test with the code snippet below:
char ch[2] = { 'h','i' };
int k = strlen(ch);
printf("%d\n%s\n", k, ch);
Now, if you increase the size of ch from 2 to 3 or any higher number, the remaining elements are zero-initialized, which supplies the null terminator, so there is no more undefined behavior.
In this declaration:
char ch[20] = {'h','i'};
the first two elements are initialized explicitly and all other elements are initialized implicitly with zeroes.
The above declaration is in fact equivalent (with the one exception that the third element of the array is also explicitly initialized) to:
char ch[20] = "hi";
Pay attention to the fact that the string literal is represented as the following array:
{ 'h', 'i', '\0' }
That is, the array contains a string that is terminated by the zero character '\0', and the function strlen can successfully find the length of the stored string.
If you were to write, for example:
char ch[2] = "hi";
then the array ch does not have space to store the terminating zero of the string literal. In this case applying the function strlen to this array invokes undefined behavior.
A null byte (i.e. the value 0) is what defines the end of a string in C.
When you defined ch, you gave fewer initializers than there are elements in the array, so the remaining elements are set to 0. This results in a null-terminated string.
The strlen function is basically looking for that value and counting how many elements it sees before it finds the null byte.
As far as I know '\0' helps compiler identify the end of string
Technically, it helps user code and the C runtime library identify the ends of strings. To the extent that the compiler needs to know where strings end, it knows without looking for a terminator.
but the output here suggests the strlen can detect the end on its own
That would be a misinterpretation. The actual fact is that your string is null-terminated even though you did not put a null terminator in it explicitly. This is a consequence of declaring your array with an initializer that specifies values for only some of the elements. As some of your other answers describe in more detail, that does not produce a partial initialization. Rather, elements for which the initializer does not specify values are default-initialized. For elements of type char, that means initialization with 0, which serves as a string terminator.
Moreover, if the array were without a terminator then the result of passing it to strlen() would be undefined. You could not then conclude anything from the result.
then why do we need '\0'?
So that user code and many standard library functions can recognize the ends of strings. You already know this.
But in many cases we do not need to provide terminators explicitly. In particular, we do not need to represent them in string literals (and it means something different than you probably intended if you do), and you don't need to represent them in the initializers for char arrays storing strings, provided that the array has more elements than you specify in the initializer.
The elements of ch that you did not initialize explicitly are set to zero, so the byte after 'i' is zero. You can view it with a debugger or simply test it in the code. Trust me, strlen needs the zero to work.

C Arbitrary length string

I have a question about how the length of an array is determined.
#include <stdio.h>
#include <string.h>
int main()
{
char str[] = "s";
long unsigned a = strlen(str);
scanf("%s", str);
printf("%s\n%lu\n", str, a);
return 0;
}
In the above program, I assign the string "s" to a char array.
I thought the length of str[] was 1, so we could not store more than that in the array. But it behaves differently: if I read a string using scanf, it is stored in str[] without any error. What is the length of the array str?
Sample I/O :
Hello
Hello 1
Your str is an array of char initialized with "s", that is, it has size 2 and length 1. The size is one more than the length because a NUL string terminator character (\0) is added at the end.
Your str array can hold at most two char. Trying to write more will cause your program to access memory past the end of the array, which is undefined behavior.
What actually happens though, is that since the str array is stored somewhere in memory (on the stack), and that memory region is far larger than 2 bytes, you are actually able to write past the end without causing a crash. This does not mean that you should. It's still undefined behavior.
Since your array has size 2, it can only hold a string of length 1, plus its terminator. To use scanf() and correctly avoid writing past the end of the array, you can use the field width specifier: a numeric value after the % and before the s, like this:
scanf("%1s", str);
When an array is declared without specifying its size, the size is determined by the initializers used.
In this declaration of an array
char str[] = "s";
a string literal is used as an initializer. A string literal is a sequence of characters that includes a terminating zero character. That is, the string literal "s" has two characters { 's', '\0' }.
Its characters are used to initialize the elements of the array str sequentially.
So if you write
printf( "sizeof( str ) = %zu\n", sizeof( str ) );
then the output will be 2. The length of a string is determined as the number of characters before the terminating zero character. So if you write
#include <string.h>
//...
printf( "strlen( str ) = %zu\n", strlen( str ) );
then the output will be 1.
If you try to write data outside an array, you get undefined behavior, because memory that does not belong to the array will be overwritten. In some cases you can get the expected result. In other cases the program can terminate abnormally. That is, the behavior of the program is undefined.
The array str has size 2: 1 byte for the character 's' and one for the terminating null byte. What you're doing is writing past the end of the array. Doing so invokes undefined behavior.
When your code has undefined behavior, it could crash, it could output strange results, or it could (as in this case) appear to work properly. Also, making a seemingly unrelated change such as a printf call for debugging or an unused local variable can change how undefined behavior manifests itself.

Some char arrays don't end with '\0'

I have some simple C code to check whether three identical char arrays all end with '\0':
int main(){
char a[4] = "1234";
char b[4] = "1234";
char c[4] = "1234";
if(a[4] == '\0')
printf("a end with '\\0'\n");
if(b[4] == '\0')
printf("b end with '\\0'\n");
if(c[4] == '\0')
printf("c end with '\\0'\n");
return 0;
}
But the output shows that only array b ends with terminator '\0'. Why is that? I supposed all char arrays have to end with '\0'.
Output:
b end with '\0'
The major problem is, for an array defined like char a[4] = .... (with a size of 4 elements), using if (a[4] ....) is already off-by-one and causes undefined behavior.
You want to check for a[3], as that is the last valid element.
That said, in your case, you don't have room for the null terminator!
Emphasizing the quote from C11, §6.7.9,
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So, you need to either
use an array size which has room for the null-terminator
or, use an array of unknown size, like char a[ ] = "1234"; where, the array size is automatically determined by the length of the supplied initializer (including the null-terminator.)
It is undefined behaviour because you are trying to access the array out of bounds.
Do not specify the bound of a character array initialized with a string literal, because the compiler will automatically allocate sufficient space for the entire string literal, including the terminating null character.
C standard(c11 - 6.7.9 : paragraph 14) says:
An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
So, do not specify the bound of a character array in its initialization.
char a[] = "1234";
You need one more place at the end of the array to store the \0. Declare the arrays with length 5.
You can access the nth element of an array only if the array has at least n elements. Here the size of each array is 4 bytes, and you are trying to read the 5th byte (as array indices in C start from 0) when you do something like if(a[4] == '\0').
If you run the above code without specifying the array sizes, all three if statements will be executed, because each array then gets one more char for the null terminator. Here the size was fixed at 4, which leaves the arrays no room for it, so reading a[4] is out of bounds and the program behaves unpredictably.

Different outputs for almost same programs in C

Sample 1:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hi☻hi♥
Sample 2:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
char l = a[i];
printf("%c",a[i]);
}
printf("%s",a);
Output:hii♥hi♥♦
Sample 3:
char a [5]={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hihi
Why are the outputs of these three programs different?
Samples 1 and 2 are almost the same code except for the extra line char l = a[i], and sample 3 differs from samples 1 and 2 in that the declaration specifies the size of the array.
In C, arrays only have a size, but no terminator. So an array of two characters (like your first two examples) will have the two characters you specified and nothing else. When you loop looking for the "terminator" you will go out of bounds and have undefined behavior.
The third case is different, because there you define an array of five elements but only initialize the first two. The C standard then requires the rest of the array to be initialized to zero, which is the same as the character '\0'. The array in the third example still hasn't got an explicit terminator though, it just so happens that the remainder is initialized to the same value as the string terminator.
For sample 1 and 2, you invoke undefined behavior by passing a non-null terminated array as argument to %s in printf().
For a definition like
char a []={'h','i'};
a will be allocated memory to hold only two elements, there will be no extra space allocated to store a terminating null, in this case of using brace-enclosed initializer list.
Quoting Chapter §7.21.6.1, for use of %s format specifier with printf() family,
s If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
OTOH, in case of sample 3, for a definition like
char a [5]={'h','i'};
the array is null-terminated, so the output is proper. The array is null-terminated in this case because you provided the array size at the time of declaration and supplied fewer initializers in the brace-enclosed list, so the remaining elements are initialized to 0 (as if they had static storage duration). Related, C11, chapter §6.7.9, (emphasis mine)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
For printf("%s",a) to work, the memory block pointed by a must end with 0.
Same thing goes for the code starting with for (i=0; a[i]!='\0'; i++).
In your first two examples, this memory block ends with 'i', not with 0.
You can fix it by changing the initialization of a to either one of the following:
char a[] = {'h','i',0};
char a[] = {'h','i','\0'};
char a[] = "hi";
char *a = "hi";

strlen and size of for character arrays

I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char p[5];
char q[]="Hello";
int i=0;
strcpy(p,"Hello");
printf("strlen(p)=%d\n",strlen(p));
printf("sizeof(p)=%d\n",sizeof(p));
printf("strlen(q)=%d\n",strlen(q));
printf("sizeof(q)=%d\n",sizeof(q));
for(i=0;i<6;i++)
{
printf("p[%d]=%c\tq[%d]=%c\n",i,p[i],i,q[i]);
}
return 0;
}
The output that I get is:
strlen(p)=5
sizeof(p)=5
strlen(q)=5
sizeof(q)=6
p[0]=H q[0]=H
p[1]=e q[1]=e
p[2]=l q[2]=l
p[3]=l q[3]=l
p[4]=o q[4]=o
p[5]= q[5]=
I know declaring array like q[]="some string" sets the size of the array equal to the number of characters in the string const, but why is there a difference in the output of sizeof() for both the types of array declaration?
How does the strlen() & the printf() know when to stop, there was no null character added while declaring the two arrays.
There are multiple questions in your question.
strcpy(p,"Hello");
This is illegal since p is only 5 chars long, so there's no room
left for the terminating 0 added by strcpy. The 0 byte is written
outside the available space, which is undefined behavior, and so is
calling strlen on p afterwards.
Calling sizeof on p is okay and yields the correct value of 5.
Calling strlen(q) yields 5 because q indeed contains a 0 terminator - implicitly added by initializing with a string literal - and there are 5 chars before the 0
Since it contains a 0 terminator, q is really an array of 6
characters so sizeof yields 6.
char p[5];
strcpy(p,"Hello");
copies 5 characters into p and writes the terminating null-character ('\0') at 6th position, i.e. out of the bounds of this array, which yields undefined behavior.
From manual page of strcpy:
"If the destination string of a strcpy() is not large enough, then anything might happen. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space."
char p[5];
strcpy(p,"Hello");
This strcpy writes a 0 into p[5]. So it's out of bounds. The sizeof(p) is still 5 though.
You have written over the end of p. It's incorrect and results in undefined behavior. In
this case nothing bad happened and it went unnoticed.
The other string you have, has a length of 5 and a sizeof 6.
The q char array also contains the null terminating character, while the fixed size of p doesn't leave room for the null character to be copied in. Notice that strlen looks for the null character to count the characters of a string, so calling it on an array without one causes undefined behavior.
sizeof(q) is 6, since it contains null terminator.
p does not hold enough space for the null terminator - so strlen(p) can be any random value. This is called undefined behavior.
Strings in C are terminated by a NUL character '\0';
This is why sizeof(q) returns 6, it has enough space to store the '\0' at the end.
You've sized p yourself to be able to hold 5 characters, not enough for the trailing '\0'.
So, this code is undefined behaviour:
strcpy(p, "Hello");
This is copying the '\0' into p[5], which is out-of-bounds.
Question: why is there a difference in the output of sizeof() for both the types of array declaration?
Answer: This statement declares a variable named q, an array of 6 chars whose elements are initialized with the contents of "Hello".
char q[] = "Hello";
sizeof(q) is 6 because the string "Hello" consists of 'H','e','l','l','o','\0', which includes the null character in the count.
This statement declares a variable named p, an array for which 5 chars are reserved.
char p[5];
Note that, depending on how the compiler lays out and aligns memory, there may actually be 6, 8, or more bytes available at the location reserved for p. And C won't complain if you reference or assign p[5] (the sixth char of the p[] array), although doing so is still undefined behavior.
sizeof(p) is 5 because the compiler has recorded how big the array you declared for p is. So sizeof(p) and sizeof(q) return different values because p and q are declared differently and refer to different entities.
Question: How does the strlen() & the printf() know when to stop, there was no null character added while declaring the two arrays.
Answer: Both strlen() calls count chars until they locate the null terminator. Both p and q appear to have one, at least until the memory location at p+5 is assigned another value. This is because p and q are both allocated on the stack. Look at the addresses of p, q, and the integer i. Here is your function with additional variables added to help illustrate where p and q are located:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define min(a,b) (((a)<(b))?(a):(b))
#define max(a,b) (((a)<(b))?(b):(a))
int main()
{
char m0 = 'X';
char p[5];
char m1 = 'Y';
char q[]="Hello";
char m2 = 'Z';
int i=0;
strcpy(p,"World");
printf("strlen(p)=%d\n",strlen(p));
printf("sizeof(p)=%d\n",sizeof(p));
printf("strlen(q)=%d\n",strlen(q));
printf("sizeof(q)=%d\n",sizeof(q));
for(i=0;i<6;i++)
{
printf("p[%d]=%c\tq[%d]=%c\n",i,p[i],i,q[i]);
}
printf("m0=%x, %c\n",&m0,m0);
printf(" p=%x\n",p);
printf("m1=%x, %c\n",&m1,m1);
printf(" q=%x\n",q);
printf("m2=%x, %c\n",&m2,m2);
char *x;
for(x=min(&m0,&m2);x<max(&m0,&m2);x++)
{
printf("x[%x]=%c\n",x,*x);
}
return 0;
}
Observe that m0, m1, and m2 are adjacent to the arrays p[] and q[]. When run on my Linux system, we observe that the strcpy of "World" modifies the value of m0 (replaces the 'X' with '\0').
strlen(p)=5
sizeof(p)=5
strlen(q)=5
sizeof(q)=6
p[0]=W q[0]=H
p[1]=o q[1]=e
p[2]=r q[2]=l
p[3]=l q[3]=l
p[4]=d q[4]=o
p[5]= q[5]=
m0=bfbea6a7,
p=bfbea6a2
m1=bfbea6a1, Y
q=bfbea69b
m2=bfbea69a, Z
x[bfbea69a]=Z
x[bfbea69b]=H
x[bfbea69c]=e
x[bfbea69d]=l
x[bfbea69e]=l
x[bfbea69f]=o
x[bfbea6a0]=
x[bfbea6a1]=Y
x[bfbea6a2]=W
x[bfbea6a3]=o
x[bfbea6a4]=r
x[bfbea6a5]=l
x[bfbea6a6]=d
x[bfbea6a7]=
A C string literal such as "Hello" or "World" is terminated by the null character, and that character is included in the size of the literal. The strcpy() function copies the entire string, including the null character at the end.
You should check the destination size before copying, or use strncpy (keeping in mind that strncpy does not add a terminator when the source does not fit). Note that when you used strcpy(p,"Hello"), you copied more characters (counting the null terminator) than p[] had allocated. That is something you want to avoid. C does not do bounds checking on arrays, so it will let you perform the strcpy, though a tool such as lint would detect this error.

Resources