C String Null Zero? - c

I have a basic C programming question, here is the situation. If I am creating a character array and if I wanted to treat that array as a string using the %s conversion code do I have to include a null zero. Example:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
The console output for this is:
abcdef
Notice that there is not a null zero as the last element in the array, yet I am still printing this as a string.
I am new to programming...So I am reading a beginners C book, which states that since I am not using a null zero in the last element I cannot treat it as a string.
This is the same output as above, although I include the null zero.
char name[7] = {'a','b','c','d','e','f','\0'};
printf("%s",name);

You're just being lucky; probably after the end of that array, on the stack, there's a zero, so printf stops reading just after the last character. If your program is very short and that zone of stack is still "unexplored" - i.e. the stack hasn't grown yet up to that point - it's very easy that it's zero, since generally modern OSes give initially zeroed pages to the applications.
More formally: by not having explicitly the NUL terminator, you're going in the land of undefined behavior, which means that anything can happen; such anything may also be that your program works fine, but it's just luck - and it's the worst type of bug, since, if it generally works fine, it's difficult to spot.
TL;DR version: don't do that. Stick to what is guaranteed to work and you won't introduce sneaky bugs in your application.

The output of your fist printf is not predictable specifically because you failed to include the terminating zero character. If it appears to work in your experiment, it is only because by a random chance the next byte in memory happened to be zero and worked as a zero terminator. The chances of this happening depend greatly on where you declare your name array (it is not clear form your example). For a static array the chances might be pretty high, while for a local (automatic) array you'll run into various garbage being printed pretty often.

You must include the null character at the end.
It worked without error because of luck, and luck alone. Try this:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
printf("%d",name[6]);
You'll most probably see that you can read that memory, and that there's a zero in it. But it's sheer luck.

What most likely happened is that there happened to be the value of 0 at memory location name + 6. This is not defined behavior though, you could get different output on a different system or with a different compiler.

Yes. You do. There are a few other ways to do it.
This form of initialization, puts the NUL character in for you automatically.
char name[7] = "abcdef";
printf("%s",name);
Note that I added 1 to the array size, to make space for that NUL.
One can also get away with omitting the size, and letting the compiler figure it out.
char name[] = "abcdef";
printf("%s",name);
Another method is to specify it with a pointer to a char.
char *name = "abcdef";
printf("%s",name);

Related

If strncat adding NUL may cause the array go out of bound

I have some trouble with strncat().The book called Pointers On C says the function ,strncat(),always add a NUL in the end of the character string.To better understand it ,I do an experiment.
#include<stdio.h>
#include<string.h>
int main(void)
{
char a[14]="mynameiszhm";
strncat(a,"hello",3);
printf("%s",a);
return 0;
}
The result is mynameiszhmhel
In this case the array has 14 char memory.And there were originally 11 characters in the array except for NUL.Thus when I add three more characters,all 14 characters fill up the memory of array.So when the function want to add a NUL,the NUL takes up memory outside the array.This cause the array to go out of bounds but the program above can run without any warning.Why?Will this causes something unexpected?
So when we use the strncat ,should we consider the NUL,in case causes the array go out of bound?
And I also notice the function strncpy don't add NUL.Why this two string function do different things about the same thing?And why the designer of C do this design?
This cause the array to go out of bounds but the program above can run without any warning. Why?
Maybe. With strncat(a,"hello",3);, code attempted to write beyond the 14 of a[]. It might go out of bounds, it might not. It is undefined behavior (UB). Anything is allowed.
Will this causes something unexpected?
Maybe, the behavior is not defined. It might work just as you expect - whatever that is.
So when we use thestrncat ,should we consider the NUL, in case causes the array go out of bound?
Yes, the size parameter needs to account for appending a null character, else UB.
I also notice the function strncpy don't add NUL. Why this two string function do different things about the same thing? And why the designer of C do this design?
The 2 functions strncpy()/strncat() simple share similar names, not highly similar paired functionality of strcpy()/strcat().
Consider that the early 1970s, memory was far more expensive and many considerations can be traced back to a byte of memory more that an hour's wage. Uniformity of functionality/names was of lesser importance.
And there were originally 11 characters in the array except for NUL.
More like "And there were originally 11 characters in the array except for 3 NUL.". This is no partial initialization in C.
This is not really an answer, but a counterexample.
Observe the following modification to your program:
#include<stdio.h>
#include<string.h>
int main(void)
{
char p[]="***";
char a[14]="mynameiszhm";
char q[]="***";
strncat(a,"hello",3);
printf("%s%s%s", p, a, q);
return 0;
}
The results of this program are dependent on where p and q are located in memory, compared to a. If they are not adjacent, the results are not so clear but if either p or q immediately comes after a, then your strncat will overwrite the first * causing one of them not to be printed anymore because that will now be a string of length 0.
So the results are dependent on memory layout, and it should be clear that the compiler can put the variables in memory in any order it likes. And they can be adjacent or not.
So the problem is that you are not keeping to your promise not to put more than 14 bytes into a. The compiler did what you asked, and the C standards guarantee behaviour as long as you keep to the promises.
And now you have a program that may or may not do what you wanted it to do.

What happens when strnlen() is used with a larger maximum length than the buffer size actually is?

I've written the following code to understand better how strnlen behaves:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
char bufferOnStack[10]={'a','b','c','d','e','f','g','h','i','j'};
char *bufferOnHeap = (char *) malloc(10);
bufferOnHeap[ 0]='a';
bufferOnHeap[ 1]='b';
bufferOnHeap[ 2]='c';
bufferOnHeap[ 3]='d';
bufferOnHeap[ 4]='e';
bufferOnHeap[ 5]='f';
bufferOnHeap[ 6]='g';
bufferOnHeap[ 7]='h';
bufferOnHeap[ 8]='i';
bufferOnHeap[ 9]='j';
int lengthOnStack = strnlen(bufferOnStack,39);
int lengthOnHeap = strnlen(bufferOnHeap, 39);
printf("lengthOnStack = %d\n",lengthOnStack);
printf("lengthOnHeap = %d\n",lengthOnHeap);
return 0;
}
Note the deliberate lack of null termination in both buffers.
According to the documentation, it seems that the lengths should
both be 39:
RETURN VALUE
The strnlen() function returns strlen(s), if that is less than maxlen, or
maxlen if there is no null terminating ('\0') among the first maxlen characters
pointed to by s.
Here's my compile line:
$ gcc ./main_08.c -o main
And the output:
$ ./main
lengthOnStack = 10
lengthOnHeap = 10
What's going on here? Thanks!
First of all, strnlen() is not defined by C standard; it's a POSIX standard function.
That being said, read the documentation carefully
The strnlen() function returns the number of bytes in the string pointed to by s, excluding the terminating null byte ('\0'), but at most maxlen. In doing this, strnlen() looks only at the first maxlen bytes at s and never beyond s+maxlen.
So that means, while calling the function, you need to make sure, for the value you provide for maxlen, the array idexing is valid for [maxlen -1] for the supplied string, i.e, the string has at least maxlen elements in it.
Otherwise, while accessing the string, you'll venture into memory location which is not allocated to you (array out of bound access) hereby invoking undefined behaviour.
Remember, this function is to calculate the length of an array, upper-bound to a value (maxlen). That implies, the supplied arrays are at least equal to or greater than the bound, not the other way around.
[Footnote]:
By definition, a string is null-terminated.
Quoting C11, chapter §7.1.1, Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null
character. [...]
Firstly, don't cast malloc.
Secondly, you are reading past the end of your arrays. The memory outside your array bounds is undefined, and therefore there is no guarantee that it is not zero; in this instance, it is!
In general, this kind of behaviour is sloppy - see this answer for a good summary of the potential consequences
Your question is roughly equivalent to the following:
I know that a burglar alarm is supposed to prevent your house from getting robbed. This morning when I left the house, I turned off the burglar alarm. Sometime during the day when I was away, a burglar broke in and stole my stuff. How did this happen?
Or to this:
I know you can use the cruise control on your car to help you avoid getting speeding tickets. Yesterday I was driving on a road where the speed limit was 65. I set the cruise control to 95. A cop pulled me over and I got a speeding ticket. How did this happen?
Actually, those aren't quite right. Here's a more contrived analogy:
I live in a house with a 10 yard long driveway to the street. I have trained my dog to fetch my newspaper. One day I made sure there were no newspapers on the driveway. I put my dog on a 39 yard leash, and I told him to fetch the newspapwer. I expected him to go to the end of the leash, 39 yards away. But instead, he only went 10 yards, then stopped. How did this happen?
And of course there are many answers. Perhaps, when your dog got to the end of your newspaper-free driveway, right away he found someone else's newspaper in the gutter. Or perhaps, when the leash failed to stop him at the end of the driveway and he continued into the street, he got run over by a car.
The point of putting your dog on a leash is to restrict him to a safe area -- in this case, your property, that you control. If you put him on such a long leash that he can go off into the street, or into the woods, you're kind of defeating the purpose of controlling him by putting him on a leash.
Similarly, the whole point of strnlen is to behave gracefully if, within the buffer you have defined, there is no null character for strnlen to find.
The problem with non-null-terminated strings is that functions like strlen (which blindly search for null terminators) sail off the end and rummage blindly around in undefined memory, desperately trying to find the terminator. For example, if you say
char non_null_terminated_string[3] = "abc";
int len = strlen(non_null_terminated_string);
the behavior is undefined, because strlen sails off the end. One way to fix this is to use strnlen:
char non_null_terminated_string[3] = "abc";
int len = strnlen(non_null_terminated_string, 3);
But if you hand a bigger number to strnlen, it defeats the whole purpose. You're back wondering what will happen when strnlen sails off the end, and there's no way to answer that.
What happens when ... "Undefined behaviour (UB)"?
“When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose”
Your heading is actually not UB, since calling strnlen("hi", 5) is perfectly legal, but the specifics of your question shows it is indeed UB...
Both strlen and strnlen expect a string, i.e. a nul-terminated char sequence. Providing your non-nul-terminatedchar array to the function is UB.
What happens in your case is that the function reads the first 10 chars, finds no '\0', and since it hasn't went out-of-bounds it continues to read further, and by that invoking UB (reading un-allocated memory). It could be that your compiler took the liberty to end your array with '\0', it could be that the '\0' was there before... the possibilities are limited only by the compiler designers.

C character array and its length

I am studying now C with "C Programming Absolute Beginner's Guide" (3rd Edition) and there was written that all character arrays should have a size equal to the string length + 1 (which is string-termination zero length). But this code:
#include <stdio.h>
main()
{
char name[4] = "Givi";
printf("%s\n",name);
return 0;
}
outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.
Why does my code output Givi and not Giv?
I am using MinGW 4.9.2 SEH for compilation.
You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.
In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.
+-----------------------+
|G|i|v|i|\0|\0|... |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Other languages, such as Java, have safeguards against this sort of situations. Languages like C, however, do less hand holding, which, on the one hand, allows more flexibility, but on the other, give you much, much more ways to shoot you in the foot with subtle issues such as this one. In other words, if your code compiles, that doesn't mean it's correct and it won't blow up now, in 5 minutes or in 5 years.
In real life, this is almost never the case, and your string might end up getting stored next to other things, which would always end up getting printed out together with your string. You never want this. Situations like this might lead to crashes, exploits and leaked confidential information.
See the following diagram for an example. Imagine you're working on a web server and the string "secret"--a user's password or key is stored right next to your harmless string:
+-----------------------+
|G|i|v|i|s|e|c|r|e|t |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Every time you would output what you would think is "Givi", you'd end up printing out the secret string, which is not what you want.
The byte after the last character always has to be 0, otherwise printf would not know when the string is terminanted and would try to access bytes (or chars) while they are not 0.
As Andrei said, apparently it just happened, that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.
This can vary from compiler to compiler and thus is undefined behaviour.
There could, for instance, be a chance to have printf accessing an address, which your program is not allowed to. This would result in a crash.
In C text strings are stored as zero terminated arrays of characters. This means that the end of a text string is indicated by a special character, a numeric value of zero (0), to indicate the end of the string.
So the array of text characters to be used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.
All of the C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters due to a missing zero terminator or copying a long text string into a smaller buffer. This type of error is known as a buffer overflow error.
The C compiler will perform some types of adjustments for you automatically. For instance:
char *pText = "four"; // pointer to a text string constant, compiler automatically adds zero to an additional array element for the constant "four"
char text[] = "four"; // compiler creates a array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text[5] = "four"; // programmer creates array of 5 elements, compiler puts the characters four in the first four array elements, a value of 0 in the fifth
In the example you provided a good C compiler should issue at the minimum a warning and probably an error. However it looks like your compiler is truncating the string to the array size and is not adding the additional zero string terminator. And you are getting lucky in that there is a zero value after the end of the string. I suppose there is also the possibility that the C compiler is adding an additional array element anyway but that would seem unlikely.
What your book states is basically right, but there is missing the phrase "at least". The array can very well be larger.
You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!
What it exhibits is called undefined behaviour (UB) and might result in daemons flying out your nose for the printf() - not the initializer. It is just not covered by the C standard (well ,the standard actually says this is UB), so the compiler (and your libraries) are not expected to behave correctly.
For such cases, no terminator will be appended explicitly, so the string is not properly terminated when passed to `printf()".
Reason this does not produce an error is likely some legacy code which did exploit this to safe some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, it simply does not append it. Silently truncating the string literal would also be a bad idea.
The following line:
char name[4] = "Givi";
May give warning like:
string for array of chars is too long
Because the behavior is Undefined, still compiler may pass it. But if you debug, you will see:
name[0] 'G'
name[1] 'i'
name[2] 'V'
name[3] '\0'
And so the output is
Giv
Not Give as you mentioned in the question!
I'm using GCC compiler.
But if you write something like this:
char name[4] = "Giv";
Compiles fine! And output is
Giv

Why I encounter a NULL terminating character at start of a string when I go backwards through it?

I found the following piece of code embedded in a C++ project. The code goes backwards through a C-style string. When I saw this I thought this should result in undefined behaviour. But it seems to work perfectly:
const char * hello = "Hello World.";
const char * helloPointPos = strchr(hello, '.');
for (const char * curchar = helloPointPos; *curchar; curchar--) {
printf("%s", curchar);
}
What I was wondering about is the part with *curchar; curchar--. This assumes that the string begins with a \0. Is this a legal assumption? Does this piece of code result in undefined behaviour? If not, why not?
I would appreciate if you could put some light on this. BTW platform is Windows and Compiler is VC++ 2010.
EDIT : Thank you all for your participation. Both answers are very good and helped me. But since I can only accept one answer I will go for paxdiablo's answer since it has more detail. Thank you!
No, it's very much not a requirement that the character before a string be \0, so that code does not have defined behaviour.
In fact, it's doubly undefined since you're not permitted to derefernce a pointer that's not within the array or one byte beyond the end. Since this is dereferencing one byte before the array, it's invalid in that sense as well.
It may work in some situations(a) but it's by no means good code.
In any case, the printing of the string rather than the character is going to give you strange results:
.d.ld.rld.orld.World. World.
and so on.
A better reverse iterator would be something like:
char *curchar = &(hello[strlen (hello)]); // one byte beyond
while (curchar-- != hello) // check if reached start, post-decr
putchar (*curchar); // just the character, thanks.
(a) In fact, it's often one of the most annoying things about undefined behaviour is that it sometimes does work, lulling you into a false sense of security.
I've often thought that all coders should have electrical wires hooked up to their most private parts so that undefined behaviour could deliver a short sharp shock - I suspect there would be a lot less undefined behaviour (or far fewer developers) after a while :-)
It's certainly not defined behavior, but in this case it isn't surprising that it works.
const char * hello = "Hello World."; puts the string Hello World. in a section with all other string literals. So very likely, there's a string literal before it, and it ends with \0, so there's \0' before Hello World., and the code works.
Obviously you can't rely on it - you're string might be the first in the section, or some non-string constant may be in there. Also, if the string is allocated any other way, chances to get \0 before it are lower.

Strings without a '\0' char?

If by mistake,I define a char array with no \0 as its last character, what happens then?
I'm asking this because I noticed that if I try to iterate through the array with while(cnt!='\0'), where cnt is an int variable used as an index to the array, and simultaneously print the cnt values to monitor what's happening the iteration stops at the last character +2.The extra characters are of course random but I can't get it why it has to stop after 2.Does the compiler automatically inserts a \0 character? Links to relevant documentation would be appreciated.
To make it clear I give an example. Let's say that the array str contains the word doh(with no '\0'). Printing the cnt variable at every loop would give me this:
doh+
or doh^
and so on.
EDIT (undefined behaviour)
Accessing array elements outside of the array boundaries is undefined behaviour.
Calling string functions with anything other than a C string is undefined behaviour.
Don't do it!
A C string is a sequence of bytes terminated by and including a '\0' (NUL terminator). All the bytes must belong to the same object.
Anyway, what you see is a coincidence!
But it might happen like this
,------------------ garbage
| ,---------------- str[cnt] (when cnt == 4, no bounds-checking)
memory ----> [...|d|o|h|*|0|0|0|4|...]
| | \_____/ -------- cnt (big-endian, properly 4-byte aligned)
\___/ ------------------ str
If you define a char array without the terminating \0 (called a "null terminator"), then your string, well, won't have that terminator. You would do that like so:
char strings[] = {'h', 'e', 'l', 'l', 'o'};
The compiler never automatically inserts a null terminator in this case. The fact that your code stops after "+2" is a coincidence; it could just as easily stopped at +50 or anywhere else, depending on whether there happened to be \0 character in the memory following your string.
If you define a string as:
char strings[] = "hello";
Then that will indeed be null-terminated. When you use quotation marks like that in C, then even though you can't physically see it in the text editor, there is a null terminator at the end of the string.
There are some C string-related functions that will automatically append a null-terminator. This isn't something the compiler does, but part of the function's specification itself. For example, strncat(), which concatenates one string to another, will add the null terminator at the end.
However, if one of the strings you use doesn't already have that terminator, then that function will not know where the string ends and you'll end up with garbage values (or a segmentation fault.)
In C language the term string refers to a zero-terminated array of characters. So, pedantically speaking there's no such thing as "strings without a '\0' char". If it is not zero-terminated, it is not a string.
Now, there's nothing wrong with having a mere array of characters without any zeros in it, as long as you understand that it is not a string. If you ever attempt to work with such character array as if it is a string, the behavior of your program is undefined. Anything can happen. It might appear to "work" for some magical reasons. Or it might crash all the time. It doesn't really matter what such a program will actually do, since if the behavior is undefined, the program is useless.
This would happen if, by coincidence, the byte at *(str + 5) is 0 (as a number, not ASCII)
As far as most string-handling functions are concerned, strings always stop at a '\0' character. If you miss this null-terminator somewhere, one of three things will usually happen:
Your program will continue reading past the end of the string until it finds a '\0' that just happened to be there. There are several ways for such a character to be there, but none of them is usually predictable beforehand: it could be part of another variable, part of the executable code or even part of a larger string that was previously stored in the same buffer. Of course by the time that happens, the program may have processed a significant amount of garbage. If you see lots of garbage produced by a printf(), an unterminated string is a common cause.
Your program will continue reading past the end of the string until it tries to read an address outside its address space, causing a memory error (e.g. the dreaded "Segmentation fault" in Linux systems).
Your program will run out of space when copying over the string and will, again, cause a memory error.
And, no, the C compiler will not normally do anything but what you specify in your program - for example it won't terminate a string on its own. This is what makes C so powerful and also so hard to code for.
I bet that an int is defined just after your string and that this int takes only small values such that at least one byte is 0.

Resources