Some char arrays don't end with '\0' - c

I have some simple C code to check whether three identical char arrays all end with '\0':
#include <stdio.h>

int main(){
    char a[4] = "1234";
    char b[4] = "1234";
    char c[4] = "1234";
    if(a[4] == '\0')
        printf("a end with '\\0'\n");
    if(b[4] == '\0')
        printf("b end with '\\0'\n");
    if(c[4] == '\0')
        printf("c end with '\\0'\n");
    return 0;
}
But the output shows that only array b ends with the terminator '\0'. Why is that? I assumed all char arrays have to end with '\0'.
Output:
b end with '\0'

The major problem is that for an array defined like char a[4] = .... (with a size of 4 elements), using if (a[4] ....) is already off by one and causes undefined behavior.
You want to check a[3], as that is the last valid element.
That said, in your case, you don't have room for the null terminator!
Emphasizing the quote from C11, §6.7.9,
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So, you need to either
use an array size which has room for the null-terminator
or, use an array of unknown size, like char a[ ] = "1234"; where, the array size is automatically determined by the length of the supplied initializer (including the null-terminator.)

It is undefined behaviour because you are trying to access the array out of bounds.
Do not specify the bound of a character array initialized with a string literal; the compiler will then automatically allocate sufficient space for the entire string literal, including the terminating null character.
The C standard (C11, 6.7.9, paragraph 14) says:
An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
So, do not specify the bound of the character array in its declaration:
char a[] = "1234";

You need one more place at the end of the array to store the \0. Declare the arrays with length 5.
You can access the nth element of an array if the array has n elements. Here each array is 4 bytes long, and you are trying to read the 5th byte (array indices in C start from 0) when you do something like if(a[4] == '\0').

Run the above code without specifying the array sizes; in that case all three if statements will be executed. A string literal needs one extra char for the null terminator. Because the specified size of 4 leaves no room for it, the arrays are not terminated, and reading a[4] is undefined behavior, so the result is unpredictable.

Related

Are char arrays guaranteed to be null terminated?

#include <stdio.h>
int main() {
char a = 5;
char b[2] = "hi"; // No explicit room for `\0`.
char c = 6;
return 0;
}
Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character
http://www.eskimo.com/~scs/cclass/notes/sx8.html
In the above example b only has room for 2 characters so the null terminating char doesn't have a spot to be placed at and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.
Is this expected or am I hitting undefined behavior?
It is allowed to initialize a char array with a string if the array is at least large enough to hold all of the characters in the string besides the null terminator.
This is detailed in section 6.7.9p14 of the C standard:
An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
However, this also means that you can't treat the array as a string since it's not null terminated. So as written, since you're not performing any string operations on b, your code is fine.
What you can't do is initialize with a string that's too long, i.e.:
char b[2] = "hello";
As this gives more initializers than can fit in the array and is a constraint violation. Section 6.7.9p2 states this as follows:
No initializer shall attempt to provide a value for an object not contained within the entity
being initialized.
If you were to declare and initialize the array like this:
char b[] = "hi";
Then b would be an array of size 3, which is large enough to hold the two characters in the string constant plus the terminating null byte, making b a string.
To summarize:
If the array has a fixed size:
If the string constant used to initialize it is shorter than the array, the array will contain the characters in the string with successive elements set to 0, so the array will contain a string.
If the array is exactly large enough to contain the elements of the string but not the null terminator, the array will contain the characters in the string without the null terminator, meaning the array is not a string.
If the string constant (not counting the null terminator) is longer than the array, this is a constraint violation: the compiler must issue a diagnostic, and the behavior is undefined if the program is translated anyway.
If the array does not have an explicit size, the array will be sized to hold the string constant plus the terminating null byte.
Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character.
Those notes are mildly misleading in this case. I shall have to update them.
When you write something like
char *p = "Hello";
or
printf("world!\n");
C automatically creates an array of characters for you, of just the right size, containing the string, terminated by the \0 character.
In the case of array initializers, however, things are slightly different. When you write
char b[2] = "hi";
the string is merely the initializer for an array which you are creating. So you have complete control over the size. There are several possibilities:
char b0[] = "hi"; // compiler infers size
char b1[1] = "hi"; // error
char b2[2] = "hi"; // No terminating 0 in the array. (Illegal in C++, BTW)
char b3[3] = "hi"; // explicit size matches string literal
char b4[10] = "hi"; // space past end of initializer is always zero-initialized
For b0, you don't specify a size, so the compiler uses the string initializer to pick the right size, which will be 3.
For b1, you specify a size, but it's too small, so the compiler should give you an error.
For b2, which is the case you asked about, you specify a size which is just barely big enough for the explicit characters in the string initializer, but not the terminating \0. This is a special case. It's legal, but what you end up with in b2 is not a proper null-terminated string. Since it's unusual at best, the compiler might give you a warning. See this question for more information on this case.
For b3, you specify a size which is just right, so you get a proper string in an exactly-sized array, just like b0.
For b4, you specify a size which is too big, although this is no problem. There ends up being extra space in the array, beyond the terminating \0. (As a matter of fact, this extra space will also be filled with \0.) This extra space would let you safely do something like strcat(b4, ", wrld!").
Needless to say, most of the time you want to use the b0 form. Counting characters is tedious and error-prone. As Brian Kernighan (one of the creators of C) has written in this context, "Let the computer do the dirty work."
One more thing. You wrote:
and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.
I don't know what's going on there, but it's safe to say that the compiler is not trying to "make room for a \0". Compilers can and often do store variables in their own inscrutable internal order, matching neither the order you declared them, nor alphabetical order, nor anything else you might think of. If under your compiler array b ended up with extra space after it which did contain a \0 as if to terminate the string, that was probably basically random chance, not because the compiler was trying to be nice to you and helping to make something like printf("%s\n", b) be better defined. (Under the two compilers where I tried it, printf("%s\n", b) printed hi^E and hi ??, clearly showing the presence of trailing random garbage, as expected.)
There are two things in your question.
String literals. A string literal (i.e. something enclosed in double quotes) is always a correctly null-terminated string.
char *p = "ABC"; // p references null character terminated string
Character arrays. A character array can hold only as many elements as it has, so if you initialize a two-element array with a string literal that needs three elements (two characters plus the terminator), only the first two are written. The array will therefore not contain a null-terminated C string.
char p[2] = "AB"; // p is not a valid C string.
An array of char need not be terminated by anything at all. It is an array. If the actual content is smaller than the dimensions of the array then you need to track the size of that content.
Answers here seem to have degenerated into a string discussion. Not all arrays of char are strings. However it is a very strong convention to use a null terminator as a sentinel if they are to be handled as de facto strings.
Your array may use something else, and may also have separators and zones. After all, it may be part of a union or overlay a structure. Possibly a staging area for another system.

char pwdToTest[4] = "ABCD"; how can I detect the end of the string in C?

I am trying to get a grasp on char arrays to do "string" operations but I cannot figure out this behavior. I am basically trying to implement strlen function.
In essence, the problem is that if I allocate:
char pwdToTest[4] = "ABCD";
and then I try to iterate through each character looking for '\0', I have just realized that there is no '\0' at the end, but rather whatever is in that fifth memory position (it could be a '\0' or something else from previous tests), so I cannot detect the end of the "string".
This solves the problem though:
char pwdToTest[5] = "ABCD\0";
What is the proper way to create a "string" and assign it a value so that it can be used with strlen or with my own strlen implementation?
My implementation for reference:
int calLenCharArr(char *charArray)
{
    int i = 0;
    while (charArray[i] != '\0')
    {
        printf("%c %i\n", charArray[i], charArray[i]);
        i++;
    }
    printf("length: %i", i);
    return i;
}
C strings are null-terminated character arrays. Unless a char array is terminated by a null character, you cannot call it a string, by definition.
Quoting C11, chapter §7.1.1
A string is a contiguous sequence of characters terminated by and including the first null
character. [...]
So, when you say
char pwdToTest[4] = "ABCD";
you don't have any place for the null-terminator to be stored.
The best way is to omit the size of the array and initialize it with the required string literal
char pwdToTest[ ] = "ABCD";
which automatically allocates the space, including the terminating null character.
Related, chapter §6.7.9, (emphasis mine)
An array of character type may be initialized by a character string literal or UTF−8 string
literal, optionally enclosed in braces. Successive bytes of the string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
Now pwdToTest qualifies as a string, and all string-related functions will work with it.

Is there a null character added after the string literal even if the bound is not correct?

Is there any null character after the character c in memory:
char a[3]="abc";
printf("the value of the character is %.3s\n",a);
printf("the value of the character is %s\n",a);
Which line is correct?
char a[3] = "abc"; is well-formed; the three elements of the array will be the characters 'a', 'b', and 'c'. There will not be a NUL terminator. (There might still happen to be a zero byte in memory immediately after the storage allocated to the array, but if there is, it is not part of the array. printf("%s", a) has undefined behavior.)
You might think that this violates the normal rule for when the initializer is too long for the object, C99 6.7.8p2
No initializer shall attempt to provide a value for an object not contained within the entity
being initialized.
That's a "shall" sentence in a "constraints" section, so a program that violates it is ill-formed. But there is a special case for when you initialize a char array with a string literal: C99 6.7.8p14 reads
An array of character type may be initialized by a character string literal, optionally
enclosed in braces. Successive characters of the character string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
The parenthetical overrides 6.7.8p2 and specifies that in this case the terminating null character is discarded.
There is a similar special case for initializing a wchar_t array with a wide string literal.

Different outputs for almost same programs in C

Sample 1:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hi☻hi♥
Sample 2:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
char l = a[i];
printf("%c",a[i]);
}
printf("%s",a);
Output:hii♥hi♥♦
Sample 3:
char a [5]={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hihi
Why are the outputs of these three programs different?
Sample 1 and sample 2 are almost similar code except an extra line char l = a[i] and Sample 3 is different from sample 1 and 2 based on the declaration of the size of the array.
In C, arrays only have a size, but no terminator. So an array of two characters (like your first two examples) will have the two characters you specified and nothing else. When you loop looking for the "terminator" you will go out of bounds and have undefined behavior.
The third case is different, because there you define an array of five elements but only initialize the first two. The C standard then requires the rest of the array to be initialized to zero, which is the same as the character '\0'. The array in the third example still hasn't got an explicit terminator though; it just so happens that the remainder is initialized to the same value as the string terminator.
For sample 1 and 2, you invoke undefined behavior by passing a non-null terminated array as argument to %s in printf().
For a definition like
char a []={'h','i'};
a will be allocated memory to hold only two elements, there will be no extra space allocated to store a terminating null, in this case of using brace-enclosed initializer list.
Quoting Chapter §7.21.6.1, for use of %s format specifier with printf() family,
s If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
OTOH, in case of sample 3, for a definition like
char a [5]={'h','i'};
the array is null-terminated, so the output is proper. The array is null-terminated in this case because you have provided the array size at the time of declaration and supplied fewer initializers in the brace-enclosed list than there are elements, so the remaining elements are initialized to 0 (as if they had static storage duration). Related, C11, chapter §6.7.9, (emphasis mine)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
For printf("%s",a) to work, the memory block pointed to by a must end with 0.
The same goes for the code starting with for (i=0; a[i]!='\0'; i++).
In your first two examples, this memory block ends with 'i', not with 0.
You can fix it by changing the initialization of a to either one of the following:
char a[] = {'h','i',0};
char a[] = {'h','i','\0'};
char a[] = "hi";
char *a = "hi";

2-D character array

#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[0]);
}
Why this code printing only hi
#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[1]);
}
While this code is printing "hellofellow"
Why this code printing only hi
You've told printf to print the string stored at a[0], and that string happens to be "hi".
While this code is printing "hellofellow"
This one is by coincidence, in fact your code ought to be rejected by the compiler due to a constraint violation:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
The string "fellow", specifically the 'w' at the end of it does not fit within the char[5] being initialised, and this violates the C standard. Perhaps also by coincidence, your compiler provides an extension (making it technically a non-C compiler), and so you don't see the error messages that I do:
prog.c:3:6: error: return type of 'main' is not 'int' [-Werror=main]
void main()
^
prog.c: In function 'main':
prog.c:5:37: error: initializer-string for array of chars is too long [-Werror]
char a[10][5] = {"hi", "hello", "fellow"};
^
Note that the second error message is complaining about "fellow", but not "hello". Your "hello" initialisation is valid by exception:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
The emphasis is mine. What the emphasised section states is that if there isn't enough room for a terminal '\0' character, that won't be used in the initialisation.
Your code:
char a[10][5] = {"hi", "hello", "fellow"};
Allocates 10 arrays of char[5].
"hello" takes up 5 so there is no room for the terminating \0, so it runs into "fellow"
Note that a[3] is not expected to hold the "w": "fellow" is too long for a char[5], and compilers that accept this initializer anyway (GCC does, with a warning) typically drop the excess character rather than spilling it into a[3].
Aside from being undefined behavior, it is confusing what you were trying to do.
It will give undefined behaviour, because strings must be null-terminated and the element "hello" already has a length of 5.
Declare your array as a[10][7]; then you will get the intended output.
See here: https://ideone.com/c2zUs0
Why this code printing only hi
Because a[0][2] is null indicating termination thus giving you hi.
This is undefined behavior due to insufficient space to store \0 character.
Please note that the memory allocated is 5 bytes per string in your array of strings. Thus, for a[1] there is not sufficient memory to store the \0 character, as all five bytes are taken up by "hello".
Thus, the subsequent memory is read until the \0 character is found.
Thus, you can change the line:
char a[10][5] = {"hi", "hello", "fellow"};
to
char a[][7] = {"hi", "hello", "fellow"};
Why this code printing only hi
This is because the \0 character is already encountered at a[0][2] and thus the reading of the characters is stopped.
What Your Code Does:
Look at the following statement:
char a[10][5] = {"hi", "hello", "fellow"};
It allocates 10 rows. 5 characters are allocated for each index of a.
What is the Problem:
Strings are null-terminated: a null terminator must always be stored in addition to the given characters, so the size actually used in the array is numOfCharacters + 1, the extra byte being for the null terminator. When you initialize the array with exactly as many characters as its size, the null terminator is skipped. Normally a character array is printed until the first \0 (null terminator) is found. Please also have a look at this.
The Solution:
No need to worry about this problem; all you need to do is set the size to numOfCharactersInString + 1. You can use the following statement:
char a[10][7] = {"hi", "hello", "fellow"};
Since the largest string is "fellow", which contains 6 characters, you need to set the size to 6 + 1; that is why the statement should use char a[10][7] instead of char a[10][5].
Hope it helps.
When you declare a 2-D character array as
char a[10][5] = {"hi", "hello", "fellow"};
char a[10][5] reserves memory for 10 arrays of 5 chars each, which is enough for strings of 4 characters + 1 '\0' character. A point to note is that the array elements are stored in contiguous memory locations.
a[0] points to the first string, a[1] to the second and so on.
Also when you initialize an array partially the other uninitialized elements become 0 instead of being garbage values.
Now in your case, after initialization, if you try to visualize the array, it would be something like
hi\0\0\0hellofello\0\0...
Now the command
printf("%s",a[0]);
prints characters starting from 'h' of "hi" and stops printing when a '\0' is encountered so "hi" is printed.
Now for the second case,
printf("%s",a[1]);
characters are printed starting from the 'h' of "hello" until a '\0' is encountered. Here the '\0' is encountered only after printing "hellofello", hence the output.

Resources