Why do I need to append a NUL character to an array? - c

#include <stdio.h>
#include <string.h>
int main()
{
char x[] = "Happy birthday to You";
char y[25];
char z[15];
printf("The string in array x is: %s\nThe string in array y is: %s\n", x, strcpy(y, x));
strncpy(z, x, 14);
z[14] = '\0'; // Why do I need to do this?
printf("The string in array z is: %s\n", z);
return 0;
}
Why do I need to append a NUL character to the array z?
Also I commented that line and the output didn't change, Why?
And if I do something like z[20] = '\0'; it's compiling and showing me result without any errors. Isn't that illegal access to memory as I declared my z array to be the size of 15?

C language does not have native string type. In C, strings are actually one-dimensional array of characters terminated by a null character \0.
Why do I need to append a NUL character to the array z?
C library function, like strcpy()/strncpy() operate on null-terminated character array. strncpy() copies at most count characters from the source to the destination. If it finds the terminating null character, copies it to destination and return. If count is reached before the entire source array was copied, the resulting character array is not null-terminated and you need to explicitly add the null terminating character at the end of destination.
Also I commented that line and the output didn't change, Why?
The only thing I can say is you are lucky. Perhaps the last character of destination (i.e. z[14]) could have all bits 0. But this may not happen every time you run your program. Make sure to add the null terminating character explicitly when using strncpy() or use some other C library function which do it automatically for you like snprintf().
And if I do something like z[20] = '\0'; it's compiling and showing me result without any errors. Isn't that illegal access to memory as I declared my z array to be the size of 15?
The size of array z is 15 and you are accessing z[20] i.e. accessing array z beyond its size. In C, accessing array out of bounds is undefined behavior. An undefined behavior includes program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Why do I need to append a NUL character to the array z?
From the strncpy man:
No null-character is implicitly appended at the end of destination if
source is longer than num. Thus, in this case, destination shall not
be considered a null terminated C string (reading it as such would
overflow).
Also I commented that line and the output didn't change, Why?
My guess is: z was already initialized to 0 (you can not rely on this for arrays with non static storage duration)
Isn't that illegal access to memory as I declared my z array to be the
size of 15?
Yes, out-of-bounds array accesses have undefined behavior, and can result in crashes or incorrect program output.
If you want to avoid hardcoding the NUL terminator, you can switch to snprintf:
snprintf(z, sizeof z, "%.*s", 14, x);

Why do I need to append a NUL character to the array z?
%s in printf prints everything until a NUL-terminator. If you don't add the NUL-terminator at the end, printf will go on accessing invalid memory locations beyond the array until a NUL-terminator invoking Undefined Behavior.
Also I commented that line and the output didn't change, Why?
Consider yourself unlucky that the z[14] had a NUL-terminator. This isn't guaranteed and you still invoke Undefined Behavior.
And if I do something like z[20] = '\0'; it's compiling and showing me result without any errors. Isn't that illegal access to memory as I declared my z array to be the size of 15?
C is a loosely typed language and does not do any bound checking. All power rests in the hands of the programmer and you should be coding stuff properly.
Yes, it accesses illegal memory. This invokes Undefined Behavior which means that anything could happen. Consider yourself unlucky that it didn't crash or anything

In C all char strings are really called null-terminated byte strings. All string functions use the null-terminator to know where the strings end, as they can't otherwise know the length of a string (don't forget that arrays decays to pointers to their first element, and pointer have no information about what they point to except the type).
Also note that there are cases when strncpy will not automatically terminate the destination, which means you have to do it explicitly. Your use in the example is one such case.
If a string is not terminated the string functions can go out of bounds searching for it, and that will lead to undefined behavior. Unfortunately one of the possibilities of UB is to seemingly work.
Lastly, uninitialized local non-static (a.k.a. "automatic") variables, including arrays, will have indeterminate (and seemingly random) values and contents. You could be "lucky" that z[14] just happens to contain a zero.

Related

Utility of '\0' in C string [duplicate]

This question already has answers here:
What is a null-terminated string?
(7 answers)
Closed last year.
#include <stdio.h>
#include <string.h>
int main()
{
char ch[20] = {'h','i'};
int k=strlen(ch);
printf("%d",k);
return 0;
}
The output is 2.
As far as I know '\0' helps compiler identify the end of string but the output here suggests the strlen can detect the end on it's own then why do we need '\0'?
long story short: it's your compiler making proactive decisions based on the standard.
long story:
char ch[20] = {'h','i'}
in the line above what you are implying to your compiler is;
allocate a memory big enough to store 20 characters (aka, array of 20 chars).
initialize first two slices (first two members of the array) as 'h' & 'i'.
implicitly initialize the rest.
since you are initialing your char array, your compiler is smart enough to insert the null terminator to the third element if it has enough space remaining. This process is the standard for initialization.
if you were to remove the initialization syntax and initialize each member manually like below, the result is undefined behavior.
char ch[20];
ch[0] = 'h';
ch[1] = 'i';
Also, if you were to not have extra space for your compiler to put the null terminator, even if you used a initializer the result would still be an undefined behavior as you can easily test via this code snippet below:
char ch[2] = { 'h','i' };
int k = strlen(ch);
printf("%d\n%s\n", k, ch);
now, if you were to increase the array size of 'ch' from 2 to 3 or any other number higher than 2, you can see that your compiler initializes it with the null terminator thus no more undefined behavior.
In this declaration:
char ch[20] = {'h','i'};
the first two elements are initialized explicitly and all other elements are initialized implicitly by zeroes.
The above declaration in fact (with one exceptions that the third element of the array is also explicitly initialized) is equivalent to:
char ch[20] = "hi";
Pat attention to that the string literal is represented as the following array:
{ 'h', 'i', '\0' }
That is the array contains a string that is terminated by the zero character '\0' and the function strlen can successfully find the length of the stored string.
If you would write for example:
char ch[2] = "hi";
then in this case the array ch does not have a space to store the terminating zero of the string literal. In this case applying the function strlen to this array invokes undefined behavior.
A null byte (i.e. the value 0) is what defines the end of a string in C.
When you defined ch, you gave less initializers than values in the array, so the remaining elements are set to 0. This results in a null terminated string.
The strlen function is basically looking for that value and counting how many elements it sees before it finds the null byte.
As far as I know '\0' helps compiler identify the end of string
Technically, it helps user code and the C runtime library identify the ends of strings. To the extent that the compiler needs to know where strings end, it knows without looking for a terminator.
but the output here suggests the strlen can detect the end on it's own
That would be a misinterpretation. The actual fact is that your string is null-terminated even though you did not put a null terminator in it explicitly. This is a consequence of declaring your array with an initializer that specifies values for only some of the elements. As some of your other answers describe in more detail, that does not produce a partial initialization. Rather, elements for which the initializer does not specify values are default-initialized. For elements of type char, that means initialization with 0, which serves as a string terminator.
Moreover, if the array were without a terminator then the result of passing it to strlen() would be undefined. You could not then conclude anything from the result.
then why do we need '\0'?
So that user code and many standard library functions can recognize the ends of strings. You already know this.
But in many cases we do not need to provide terminators explicitly. In particular, we do not need to represent them in string literals (and it means something different than you probably intended if you do), and you don't need to represent them in the initializers for char arrays storing strings, provided that the array has more elements than you specify in the initializer.
It is likely that your array ch contained zeros thus the byte after i is already set to zero. You can view it with a debugger or simply test it in the code. Trust me, strlen needs the zero to work.

Are char arrays guaranteed to be null terminated?

#include <stdio.h>
int main() {
char a = 5;
char b[2] = "hi"; // No explicit room for `\0`.
char c = 6;
return 0;
}
Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character
http://www.eskimo.com/~scs/cclass/notes/sx8.html
In the above example b only has room for 2 characters so the null terminating char doesn't have a spot to be placed at and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.
Is this expected or am I hitting undefined behavior?
It is allowed to initialize a char array with a string if the array is at least large enough to hold all of the characters in the string besides the null terminator.
This is detailed in section 6.7.9p14 of the C standard:
An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
However, this also means that you can't treat the array as a string since it's not null terminated. So as written, since you're not performing any string operations on b, your code is fine.
What you can't do is initialize with a string that's too long, i.e.:
char b[2] = "hello";
As this gives more initializers than can fit in the array and is a constraint violation. Section 6.7.9p2 states this as follows:
No initializer shall attempt to provide a value for an object not contained within the entity
being initialized.
If you were to declare and initialize the array like this:
char b[] = "hi";
Then b would be an array of size 3, which is large enough to hold the two characters in the string constant plus the terminating null byte, making b a string.
To summarize:
If the array has a fixed size:
If the string constant used to initialize it is shorter than the array, the array will contain the characters in the string with successive elements set to 0, so the array will contain a string.
If the array is exactly large enough to contain the elements of the string but not the null terminator, the array will contain the characters in the string without the null terminator, meaning the array is not a string.
If the string constant (not counting the null terminator) is longer than the array, this is a constraint violation which triggers undefined behavior
If the array does not have an explicit size, the array will be sized to hold the string constant plus the terminating null byte.
Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character.
Those notes are mildly misleading in this case. I shall have to update them.
When you write something like
char *p = "Hello";
or
printf("world!\n");
C automatically creates an array of characters for you, of just the right size, containing the string, terminated by the \0 character.
In the case of array initializers, however, things are slightly different. When you write
char b[2] = "hi";
the string is merely the initializer for an array which you are creating. So you have complete control over the size. There are several possibilities:
char b0[] = "hi"; // compiler infers size
char b1[1] = "hi"; // error
char b2[2] = "hi"; // No terminating 0 in the array. (Illegal in C++, BTW)
char b3[3] = "hi"; // explicit size matches string literal
char b4[10] = "hi"; // space past end of initializer is always zero-initialized
For b0, you don't specify a size, so the compiler uses the string initializer to pick the right size, which will be 3.
For b1, you specify a size, but it's too small, so the compiler should give you a error.
For b2, which is the case you asked about, you specify a size which is just barely big enough for the explicit characters in the string initializer, but not the terminating \0. This is a special case. It's legal, but what you end up with in b2 is not a proper null-terminated string. Since it's unusual at best, the compiler might give you a warning. See this question for more information on this case.
For b3, you specify a size which is just right, so you get a proper string in an exactly-sized array, just like b0.
For b4, you specify a size which is too big, although this is no problem. There ends up being extra space in the array, beyond the terminating \0. (As a matter of fact, this extra space will also be filled with \0.) This extra space would let you safely do something like strcat(b4, ", wrld!").
Needless to say, most of the time you want to use the b0 form. Counting characters is tedious and error-prone. As Brian Kernighan (one of the creators of C) has written in this context, "Let the computer do the dirty work."
One more thing. You wrote:
and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.
I don't know what's going on there, but it's safe to say that the compiler is not trying to "make room for a \0". Compilers can and often do store variables in their own inscrutable internal order, matching neither the order you declared them, nor alphabetical order, nor anything else you might think of. If under your compiler array b ended up with extra space after it which did contain a \0 as if to terminate the string, that was probably basically random chance, not because the compiler was trying to be nice to you and helping to make something like printf("%s\n", b) be better defined. (Under the two compilers where I tried it, printf("%s\n", b) printed hi^E and hi ??, clearly showing the presence of trailing random garbage, as expected.)
There are two things in your question.
String literal. String literal (ie something enclosed in the double quotes) is always the correct null character terminated string.
char *p = "ABC"; // p references null character terminated string
Character array may only hold as many elements as it has so if you try to initialize two element array with three elements string literal, only two first will be written. So the array will not contain the null character terminated C string
char p[2] = "AB"; // p is not a valid C string.
A array of char need not be terminated by anything at all. It is an array. If the actual content is smaller than the dimensions of the array then you need to track the size of that content.
Answers here seem to have degenerated into a string discussion. Not all arrays of char are strings. However it is a very strong convention to use a null terminator as a sentinel if they are to be handled as de facto strings.
Your array may use something else, and may also have separators and zones. After all it may be a Union or overlay a structure. Possibly a staging area for another system.

I want to know why this works without having to bind memory for the string

Hello guys I recently picked up C programming and I am stuck at understanding pointers. As far as I understand to store a value in a pointer you have to bind memory (using malloc) the size of the value you want to store. Given this, the following code should not work as I have not allocated 11 bytes of memory to store my string of size 11 bytes and yet for some reason beyond my comprehension it works perfectly fine.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(){
char *str = NULL;
str = "hello world\0";
printf("filename = %s\n", str);
return 0;
}
In this case
str = "hello world\0";
str points to the address of the first element of an array of chars, initialized with "hello world\0". In other words, str points to a "string literal".
By definition, the array is allocated and the address of the first element has to be "valid".
Quoting C11, chapter §6.4.5, String literals
In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals.78) The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [....]
Memory allocation still happens, just not explicitly by you (via memory allocator functions).
That said, the "...\0" at the end is repetitive, as mentioned (in the first statement of the quote) above, by default, the array will be null-terminated.
Using a char variable without malloc is stating that the string you are assigning is read-only. This means that you are creating a pointer to a string constant. "hello world\0" is somewhere in the read-only part of memory and you are just pointing to it.
Now if you want to make changes to the string. Let's say changing the h to H, that would be str[0]='H'. Without malloc that will not be possible to make.
When you declare a string literal in a C program, it is stored in a read-only section of the program code. A statement of the form char *str = "hello"; will assign the address of this string to the char* pointer. However, the string itself (i.e., the characters h, e, l, l and o, plus the \0 string terminator) are still located in read-only memory, so you can't change them at all.
Note that there's no need for you to explicitly add a zero byte terminator to your string declarations. The C compiler will do this for you.
Right. But in this case you are just pointing to a string literal which is placed in the constant memory area. Your pointer is created in the stack area. So you are just pointing to another address. i.e, at the starting address of string literal.
Try using copy the string literal in your pointer variable. Then it will give error because you have not allocated memory. Hope you understand now.
Storage for string literals is set aside at program startup and held until the program exits. This storage may be read-only, and attempting to modify the contents of a string literal results in undefined behavior (it may work, it may crash, it may do something in between).

storing of strings in char arrays in C

#include<stdio.h>
int main()
{
char a[5]="hello";
puts(a); //prints hello
}
Why does the code compile correctly? We need six places to store "hello", correct?
The C compiler will let you run off the end of arrays, it does no checks of that sort.
The C compiler allows you to explicitly ask for no null terminator.
char a[] = "Hello"; /* adds a terminator implicitly */
char a[6] = "Hello"; /* adds a terminator implicitly */
char a[5] = "Hello"; /* skips it */
Any value smaller than 5 results in an error.
As for why - one possibility is that your strings are of a fixed size, or are being used as buffers of byte values. In these cases you do not need a null terminator.
Best practice is to use char a[] so the compiler can set it to the correct value (including terminator) automatically.
a doesn't contain a null terminated string (extra initializers for fixed size arrays - such as the null terminator in "hello" - are discarded), so the behaviour when a pointer to that array is passed to puts is undefined.
In my experience, a lot of compilers will let you get away with compiling this. It will usually crash at runtime, though (because you don't have a null terminator).
C char array initialization includes the terminating null only if there is room or if the array dimensions are not specified.
You need 6 characters to store "hello" as a null terminated string. But char arrays are not constrained to store nul terminated string, you may need the array for another purpose and forcing an additional nul character in those cases would be pointless.
That is because in C memory management is done manually unlike in java and some other few languages....
The six places you allocated is not checked for during compilation but if you
have to get into filing(I mean storing actually) you are going to have a runtime error becuase the program kept five places in memory(but is expected to hold six) for the characters but the compiler did not check!
"hello" string is kept in read-only memory with 0 in the end. "a" points to this string, this is why the program may work correctly. But I think that generally this is undefined behavior.
It is necessary to see Assembly code generated by compiler to see what happens exactly. If you want to get junk output in this situation, try:
char a[5] = {'h', 'e', 'l', 'l', 'o'}
The C compiler you are using does not check that the string literal fits to the char array. You need 6 characters in the array to fit the literal "Hello" since the literal includes a terminating zero. Modern compilers, such as Visual C++ 2010 do check these things and give you and error.

I'm new to C, can someone explain why the size of this string can change?

I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. The one below I wrote trying to understand the difference between char* string and char string[] and how then lengths of strings work. Furthermore I wanted to see if sprintf could be used to concatenate two strings and set it into a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: Why does this happen, is it invalid or have risks/drawbacks? Furthermore, why is char str3[length3] valid but char str3[7] causes "SIGABRT (Abort)" when sprintf line tries to execute?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings, it has character arrays. Moreover, the character arrays don't have associated length information -- instead, string length is determined by iterating over the characters until a null byte is encountered. This implies, that every char array should be at least strlen + 1 characters in length.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack allocated char array, you'll overwrite the values of local variables. For heap-allocated char arrays, you may write beyond the memory area of your application. In either case, the best case is you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work -- str is an uninitialized pointer, so the value is "not defined", which in practice means "garbage". Pointers are memory addresses, so an attempt to write to an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (string length + 1));
which allocates n+1 characters worth of storage and stores the pointer to that storage in str. Of course, to be safe, you should check whether or not malloc returns null. And when you're done, you need to call free(str).
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = (char)0; // I'd use the character literal, but this editor won't let me
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which strlen will probably crash, because it will keep trying to read more bytes from beyond the array boundary. It might encounter a null byte and then stop (and of course, return the wrong string length), or it might crash.
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short - you need to add extra byte for null-terminator and the length of "<-------------------->" string literal.
Out of curiosity, though, I tried
expanding the "concatenated" string to
be longer than the size I had
allocated. Much to my surprise, it
still worked and the string size
increased and could be printf'd fine.
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing NULL byte (\0, 0x00) but when you create a variable to hold the combined strings you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
C strings are '\0' terminated and require an extra byte for that, so at least you should do
char str3[length3 + 1]
will do the job.
In sprintf() ypu are writing beyond the space allocated for str3. This may cause any type of undefined behavior (If you are lucky then it will crash). In strlen(), it is just searching for a NULL character from the memory location you specified and it is finding one in 29th location. It can as well be 129 also i.e. it will behave very erratically.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string function such as printf() or strcat() assume that there is a null-terminated string to be found there.

Resources