Why/when to include terminating '\0' character for C Strings?

I'm very new to C and am a bit confused as to when we need to manually add the terminating '\0' character to strings. Given this function to calculate string length (for clarity's sake):
int stringLength(char string[])
{
    int i = 0;
    while (string[i] != '\0') {
        i++;
    }
    return i;
}
which calculates the string's length based on the null terminating character. So, using the following cases, what is the role of the '\0' character, if any?
Case 1:
char * stack1 = "stack";
printf("WORD %s\n", stack1);
printf("Length %d\n", stringLength(stack1));
Prints:
WORD stack
Length 5
Case 2:
char stack2[5] = "stack";
printf("WORD %s\n", stack2);
printf("Length %d\n", stringLength(stack2));
Prints:
WORD stack���
Length 8
(These results vary each time, but are never correct).
Case 3:
char stack3[6] = "stack";
printf("WORD %s\n", stack3);
printf("Length %d\n", stringLength(stack3));
Prints:
WORD stack
Length 5
Case 4:
char stack4[6] = "stack";
stack4[5] = '\0';
printf("WORD %s\n", stack4);
printf("Length %d\n", stringLength(stack4));
Prints:
WORD stack
Length 5
Case 5:
char * stack5 = malloc(sizeof(char) * 5);
if (stack5 != NULL) {
    stack5[0] = 's';
    stack5[1] = 't';
    stack5[2] = 'a';
    stack5[3] = 'c';
    stack5[4] = 'k';
    printf("WORD %s\n", stack5);
    printf("Length %d\n", stringLength(stack5));
}
free(stack5);
Prints:
WORD stack
Length 5
Case 6:
char * stack6 = malloc(sizeof(char) * 6);
if (stack6 != NULL) {
    stack6[0] = 's';
    stack6[1] = 't';
    stack6[2] = 'a';
    stack6[3] = 'c';
    stack6[4] = 'k';
    stack6[5] = '\0';
    printf("WORD %s\n", stack6);
    printf("Length %d\n", stringLength(stack6));
}
free(stack6);
Prints:
WORD stack
Length 5
Namely, I would like to know the difference between cases 1, 2, 3, and 4: why does case 2 behave erratically, why is there no need to specify the null-terminating character in cases 1 and 3, and how do cases 3 and 4 work the same? I would also like to know how cases 5 and 6 print the same thing even though case 5 does not allocate enough memory for the null-terminating character. Since only 5 char slots are allocated, one for each letter in "stack", how does it detect a '\0' character, i.e. the 6th character?
I'm so sorry for this absurdly long question; I just couldn't find a good didactic explanation of these specific cases anywhere else.

The storage for a string must always leave room for the terminating null character. In some of your examples you don't do this, explicitly giving a length of 5. In those cases, passing the array to printf("%s") or stringLength() gives undefined behavior.
String literals always get the null terminator automatically. Even though strlen returns a length of 5, the literal "stack" actually occupies 6 bytes.
Your case 5 only works because undefined behavior sometimes looks like it worked. You probably have a zero byte following the string in memory, but you can't rely on that.

In case 1, you are creating a string literal (a constant that is typically placed in read-only memory), which has the \0 added to it implicitly.
Since stringLength() relies on the position of the \0 to find the end of the string, it returns 5.
In case 2, you are initialising a character array of size 5 with a string of 5 characters, leaving no space for the \0 terminator. (C allows this initialization and simply drops the terminator; C++ rejects it.) The memory adjacent to the array can contain anything and might have a \0 somewhere. That \0 is then treated as the end of the string, which explains the weird characters you get. For the output you gave, the \0 was apparently found only after 3 more bytes, which were also counted in the string length. Since the contents of that memory can differ from run to run, the output may not always be the same.
In case 3, you are initialising a character array of size 6 with a string of size 5 leaving enough space to store the \0 which will be implicitly stored. Hence, it will work properly.
Case 4 is similar to case 3. No modification is done by
stack4[5] = '\0';
because stack4 has size 6, so its last index is 5, and the initialization already stored \0 there. You are overwriting that element with the value it already had.
In case 5, you have completely filled the character array with characters without leaving space for \0. Yet when you print the string, it prints correctly. I think it is because the byte just past the memory allocated by malloc() merely happened to be zero, which is the value of \0. But this is undefined behavior and should not be relied upon; what really happens depends on the implementation.
It should be noted that malloc() will not initialise the memory that it allocates unlike calloc().
For a char variable ch, the two assignments
ch = '\0';
and
ch = 0;
are exactly the same, since '\0' is simply the character whose value is zero.
But you cannot rely upon it being zero. Freshly allocated memory may happen to contain zeros, for example because the operating system clears pages before handing them to a process for security reasons, but memory that malloc() reuses from earlier free() calls can contain anything.
If you need the default value of dynamically allocated memory to be zero, you can use calloc().
Case 6 has the \0 at the end and characters in the other positions, so the proper string is displayed when you print it.

Related

The null character at the end of a string literal

I wrote two versions of code to practice making a character array, and I expected the results to be the same.
version 1:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char a[7] = "and";
    printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
version 2:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char a[7];
    a[1] = 'a';
    a[2] = 'n';
    a[3] = 'd';
    printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
However, here is the result I got:
version 1:
size: 7 length: 3
version 2:
size: 7 length: 4
As far as I know, a string ends with a null character, and the null character in a string literal is implicit, so why did it disappear? Why isn't it counted as the last element, given that the length in version 1 is 3?
In fact, strlen is supposed to exclude the null character, so the output 3 is correct. The problem with the second version is that the array a is not initialized, so its values are indeterminate. It just so happens in this case that a[4] is 0 and a[0] is not, hence the output 4. Also note that the second version uses the wrong indices: array indexing starts from 0, not from 1.
If you want to make this work in the second version, re-write it like so:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char a[7] = {0};
    a[0] = 'a';
    a[1] = 'n';
    a[2] = 'd';
    printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
This explicitly initializes the first element of a to 0, and the remaining elements are implicitly initialized to zero too.
In the second case
int main(void)
{
    char a[7];
    a[1] = 'a';
    a[2] = 'n';
    a[3] = 'd';
    printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
you are lucky that
a[0] did not contain a 0 (i.e., '\0' or null): otherwise you would have seen 0 as the length.
a[4] actually held a null value: otherwise strlen would have kept reading past the end of the array, which is undefined behavior (anything from garbage output to a crash).
In other words, for a local variable with automatic storage, if left uninitialized, the values are indeterminate. There's no guarantee of zero-filling (which would act as a null terminator), so using the array as a string (e.g. as an argument to strlen()) may access out-of-bounds memory and invokes undefined behavior.
sizeof measures the memory size of something. strlen calculates the length of a C string (defined as the number of characters, excluding the terminating NUL). Here your C string is shorter than the memory used to store it.
sizeof is a C operator evaluated at compile time (the only exception being variable-length arrays).
strlen is a library function, called at runtime.
strlen(3)
DESCRIPTION
The strlen() function computes the length of the string s.
RETURN VALUES
The strlen() function returns the number of characters that precede the
terminating NUL character.
Beware that your second example has undefined behavior because the character array is not initialized; strlen may read past its end. There is no guarantee that uninitialized chars are set to 0.
As per strlen description:
size_t strlen(const char *str);
Returns the length of the given null-terminated byte string, that is, the number of characters in a character array whose first element is pointed to by str up to and not including the first null character.
The behavior is undefined if str is not a pointer to a null-terminated byte string.
Since in the second version, a is not a null-terminated byte string, the behavior is undefined.
Note that assigning individual character literals to a char array does not add a null terminator the way initializing from a string literal does; to create a properly null-terminated char array with that kind of assignment, you need to write the terminator yourself, starting at index 0:
a[0] = 'a';
//...
a[3] = '\0';

Convert a char string to an int, but not possible to convert a single char from the string to an int?

char input[5] = "12345";
printf("Convert to int valid %d", atoi(input));
printf("Convert to int invalid %d", atoi(input[1])); // program crash
Is there a way to convert a char "slice" of a char string into an int?
Short description:
The user inputs a string with values, for example: 1, 2 3 4 ,5
I am formatting that string to 12345
I want to use each of those numbers as the index of an array.
If you mean "how to access a substring in the char [] array", you can use pointer arithmetic:
char input[6] = "12345";
printf("strtol(input + 1) = %ld\n", strtol(input + 1, NULL, 10)); // prints 2345
A few things to note, though:
Your array should be 6 elements long to hold the null terminator.
atoi shouldn't be used at all: it cannot report errors, and its behavior is undefined if the value doesn't fit in an int. strtol is a better function for converting a string to a signed integer.
Also, to convert a single character to an int:
if (isdigit((unsigned char)c))
{
    c -= '0';
}
This works because the C standard guarantees that the characters '0' through '9' are contiguous and in increasing order, so a digit character is always exactly '0' more than its numeric value, in every character set supported by C.
To properly convert an arbitrary slice, you have to either make a copy or modify the string by inserting a \0 after the slice. The latter may not be an option, depending on where the string is stored.
To make a copy, allocate an array big enough to hold the slice and a \0. If you know the size of the slice at compile time, you can allocate on the stack:
char slice[2];
Otherwise, you'll have to allocate dynamically:
char *slice;
slice = malloc(2);
Stack allocated slices do not need to be deallocated, but dynamically allocated ones should be freed as soon as they are no longer needed:
free(slice);
Once you have the slice allocated, copy the portion of interest and terminate it with \0:
strncpy(slice, s + 1, 1);
slice[1] = '\0';
atoi(slice);
This technique will pretty much always work.
If your slice always ends with the string, you don't need to make a copy: you just need to pass a pointer to the start of the slice:
atoi(s + 1);
Modifying the string itself probably won't work unless it's in writable memory. If you're sure this is the case, you can do something like:
char tmp;
tmp = s[1];
s[1] = '\0';
atoi(s);
s[1] = tmp;
If you were sure but the memory wasn't writable (a string literal, for example), your program will likely crash with a segmentation fault.
For the special case where your slice is exactly one character long, you can use the fact that characters are numbers:
s[0] - '0'
Note that '0' != '\0'. This works in every character set C supports, EBCDIC included, because the standard requires the digits '0' to '9' to be contiguous.

Print one character from string

I'm facing an issue printing one character from a string in C.
The function reads two values from the user: a number (the position of the character to print) and a string. When I enter the string "Martin" and the number 5, the output is "i". But when the number is larger than the string length, something goes wrong and I don't know what.
PS. If the number is larger than the string length, it should print "Nothing".
void printLetter() {
    char * string = (char*)malloc(sizeof(char));
    int n;
    printf("Number:\n");
    scanf("%i", &n);
    printf("String:\n");
    scanf("%s", string);
    if(n > strlen(string)) {
        printf("nothing");
    } else {
        printf("%c\n", string[n+1]);
    }
    free(string);
}
Since you do not know the length of the string in advance, dynamic allocation buys you nothing here; just use a fixed-size array:
void printLetter() {
    char string[100]; // example size 100
    ...
    scanf("%99s", string); // read no more than your array can hold
}
A fun exercise would be to count the length of the string, dynamically allocate exactly as much space as you need (+1 for the null terminator), copy the string into that space, use it as you wish, and then free it.
Moreover this:
printf("%c\n", string[n+1]);
should be written as this:
printf("%c\n", string[n-1]);
since you do not want to go out of the bounds of your array (and cause undefined behavior), or print the character two positions after the requested one. When I ask for the 1st character, you should print string[0]; when I ask for the 2nd, string[1]; and so on. That is why we need to print string[n-1] when the user asks for the n-th letter.
By the way, it's common to use a variable named i, and not n as in your case, when dealing with an index. ;)
In your code, this:
char * string = malloc(sizeof(char));
allocates memory for just one character, which is no good: even if the string had only one letter, where would you put the null terminator? Strings in C should (almost) always be null-terminated.
In order to allocate dynamically memory for a string of size N, you should do:
char * string = malloc((N + 1) * sizeof(char));
where you allocate space for N characters, plus 1 for the null terminator.
A couple of problems...
sizeof(char) is always 1 by definition, so malloc() is allocating only one byte of memory for string. A larger block of memory is required: "Martin", for example, needs 6 bytes for the letters plus one for the string terminator (seven bytes total).
printf("%c\n", string[n+1]) is perhaps not quite right...
String: Martin\0
strlen= 6
Offset: 0123456
n = 5... [n+1] = 6
The character being output is the string terminator '\0' at index 6.
This might work better:
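#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void printLetter() {
    char * string = malloc(100 * sizeof(char));
    int n = 0;
    if (string == NULL) {
        return;
    }
    printf("Number:\n");
    scanf("%i", &n);
    printf("String:\n");
    if (scanf("%99s", string) != 1) {
        string[0] = '\0'; // no input: treat it as an empty string
    }
    if (n < 1 || (size_t)n > strlen(string)) {
        printf("nothing");
    } else {
        printf("%c\n", string[n-1]);
    }
    free(string);
}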
This is a buffer overflow.
Take a look at this question, which shows how to manage your memory properly in such a situation: How to prevent scanf causing a buffer overflow in C?
Alternatively, you can ask for the letter's position first, allocate only n + 1 bytes, and then call fgets(string, n + 1, stdin) (after consuming the newline that scanf left behind), because you don't need the rest of the string :-)

Strcpy: behaving more like 'strcut'

char a[3], b[3];
strcpy(a,"abc");
printf("a1 = %s\n", a);
strcpy(b,a);
printf("a2 = %s\n", a);
printf("b = %s\n", b);
From how I understand strcpy to work the output would be:
a1 = abc
a2 = abc
b = abc
Instead I obtain
a1 = abc
a2 =
b = abc
Why when I call strcpy the second time does it (apparently) erase the contents of a?
Thanks
This is a buffer overflow problem: your a and b are too short, since they don't have room for the null terminator. What is most likely happening is that a sits just after b in memory, so when strcpy(b,a) executes, the null terminator stored at the end of b lands in the same memory location as the first character of a. This suddenly makes a an empty string.
For starters, make the lengths of the arrays 4 instead of 3. This is okay in sandbox/play/learning mode, but consider in production code:
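Use safer string functions to avoid buffer overflows, but note that strncpy does not null-terminate the destination when the source is too long, so terminate it yourself (or use snprintf).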
Use character arrays/buffers that support variable size or pre-calculation of the size required to fit your data.
Since your arrays are too small and do not have room for the null terminator, you are most likely overwriting a when you copy a to b, because strcpy does not know how big the destination is. This declaration would fix the problem for this particular program:
char a[4], b[4];
In the general case you need to ensure that your destination has enough space to accommodate the source as well as the null terminator.
This example gives you a better idea of what is going on. It is just for demonstration purposes, and you should not use code like this for anything but learning. It works for me on ideone, but may not work the same way with other compilers, since we are invoking undefined behavior:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char a[3], b[4];
    // on ideone, a had a lower address in memory than b (not guaranteed)
    printf("%p %p\n", (void *)a, (void *)b);
    // "abc" is a null-terminated literal; use a size of 4 to force a copy of the null
    strncpy(a,"abc",4);
    // the '\0' spilled past the end of a (into b[0] here), so printf stops there
    printf("a2 = %s\n", a);
    // explicitly only copy 3 bytes
    strncpy(b,a,3);
    // manually null terminate b
    b[3] = '\0' ;
    // So we can prove we are seeing b's contents
    b[0] = 'z' ;
    // This will overrun into b now since b[0] is no longer null
    printf("a2 = %s\n", a);
    printf("b = %s\n", b);
    return 0;
}
The first strcpy(a,"abc") is already wrong. Don't confuse a char array with a C string: a C string is always a char array, but a char array is NOT always a C string.
A C string must have a '\0' char at the end. So when you strcpy "abc" into a[3], you are actually copying 4 bytes into your array: { 'a', 'b', 'c', '\0' }.
Because a and b were declared together, they sit next to each other in memory. When you print a, it works IN THIS CASE because printf() still finds a '\0' to treat as the end of the string, even though that '\0' was written past the end of a, into memory that doesn't belong to it.
The following problems are all related to the same thing...
The solution: the buffer for your C string must be at least the maximum length of your string + 1, so you are guaranteed room for the '\0' char. For more details, search for "C-String" or "null-terminated string".
You've made a very common beginner's mistake. In C, there is no string primitive; when we talk about strings, we're really talking about null-terminated character arrays (or buffers, whatever nomenclature you prefer). So your char[3] can hold a string of 2 letters plus the null terminator. Another subtle issue is that the two arrays are most likely laid out next to each other on the stack, and this is the reason you didn't crash when you deserved to. "abc" is REALLY "abc\0", so copying it needs 4 bytes: a[0] through a[2] get 'a', 'b', 'c', and the '\0' lands one byte past the end of the array, in whatever happens to be next. The same thing happens to b in strcpy(b, a); if b sits immediately before a, b's terminator overwrites a[0], and a becomes an empty string. Since the behavior is undefined, another compiler or optimization level may give different results.
On the other hand, your program works as it was written to. What you wrote isn't what you meant :)

About string length, terminating NUL, etc

I'm currently learning C and I'm confused with differences between char array and string, as well as how they work.
Question 1:
Why is there a difference in the outcomes of source code 1 and source code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char c[2]="Hi";
    printf("%zu\n", strlen(c)); //returns 3 (not 2!?)
    return 0;
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char c[3]="Hi";
    printf("%zu\n", strlen(c)); //returns 2 (not 3!?)
    return 0;
}
Question 2:
How is a string variable different from a char array? How to declare them with the minimum required index numbers allowing \0 to be stored if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
#define name "Mick" //what is the size? Is there a \0?
Question 3:
Does the terminating NUL ONLY follow strings but not char arrays? So the actual value of the string "Hi" is [H][i][\0] and the actual value of the char array "Hi" is [H][i]?
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how this is done, using gets(c) maybe?). So where is the \0 stored? Is it stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be appended with a \0 to become c[3] which is [H][i][\0]?
It is quite confusing that sometimes there is a \0 following the string/char array and causes trouble when I compare two variables by if (c1==c2) as it most likely returns FALSE (0).
Detailed answers are appreciated. But keeping your answer brief helps my understanding :)
Thank you in advance!
Answer 1: In code 1 you have a char array that is not a string; in code 2 you have a char array that is also a string.
Answer 2: A string is a char array in which (at least) one element has the value 0. If you leave the size part empty, the compiler automatically uses the minimum size that fits the initializer, including the terminating 0.
char astring[] = "foobar"; /* compiler automagically uses 7 for size */
printf("%d\n", (int)sizeof astring);
Answer 3: a char array in which one of the elements is NUL is a string; a char array where no elements are NUL is not a string.
Answer 4: an array defined to hold two elements (char c[2];) cannot hold three elements. If it is going to be a string it can only be the empty string or a string with 1 character.
Question 1:
Why is there a difference in the outcomes of source code 1 and source code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main()
{
    char c[2]="Hi";
    printf("%zu", strlen(c)); //returns 3 (not 2!?)
    getchar();
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main()
{
    char c[3]="Hi";
    printf("%zu", strlen(c)); //returns 2 (not 3!?)
    getchar();
}
Answer:
Because in the first case, c[] holds only "Hi", with no terminator. strlen looks for a zero at the end and, depending on exactly what is in memory after c[], finds one sooner or later, or crashes. We can't say without knowing exactly what is in the memory after the c[] array.
Question 2:
How is a string variable different from a char array? How to declare them with the minimum required index numbers allowing \0 to be stored if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
Answer:
It really depends on what you want to do. Probably 5 if you want to actually use the content as a string. But nothing says you can't store "Mick" in a 4-character array; you just can't use strlen to find out how long it is, because strlen will read past the 4th element, possibly (much) further, looking for a zero. If there is no zero in the following memory locations, it could crash, because eventually there won't be valid memory addresses to read.
#define name "Mick" //what is the size? Is there a \0?
This has no size at all until you use name somewhere. #defines are not part of what the compiler sees: the preprocessor replaces name with "Mick" wherever you use it, and hopefully that's in a place the compiler can make sense of. Each such use is the string literal "Mick", which occupies 5 bytes including the \0. After that, the same rules apply as in the previous answer: it depends on how you want to use the array of characters. For correct operation with strlen, strcpy, and nearly all other str... functions, you need a zero at the end.
Question 3:
Does the terminating null ONLY follow strings but not char arrays? So the actual value of the string "Hi" is [H][i][\0] and the actual value of the char array "Hi" is [H][i]?
Yes, no, maybe. It all depends on how you USE the "Hi" string literal (that's the technical name for 'something within double quotes'). If the compiler is "allowed" to, it will put a zero at the end. But if you initialize an array of a given size, it will stuff the bytes in there, and if there isn't room for a zero, that's your problem, not the compiler's.
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how this is done, using gets(c) maybe?). So where is the \0 stored? Is it stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be appended with a \0 to become c[3] which is [H][i][\0]?
In c[2], beyond the 'H' and 'i', there is no telling what is stored [technically, it could well be "the end of the earth": in computer terms, memory that can't be read, in which case strlen on it WILL crash your program, because strlen reads beyond the end of the earth]. But it could also be a zero, a one, the letter 'a', the number 42, or any other 8-bit [1] value.
It is quite confusing that sometimes there is a \0 following the string/char array and causes trouble when I compare two variables by if (c1==c2) as it most likely returns FALSE (0).
If c1 and c2 are char arrays, that will ALWAYS be false, since c1 and c2 are never going to have the same address, and when an array is used that way in C, it becomes "the address in memory of the first element in the array". So no matter what the contents of c1 and c2 are, their addresses can never be the same [because they are two different variables, and two variables cannot occupy the same location in memory; that's like trying to park two cars in a parking space large enough for only one car, and no, crushing either car is not allowed in our thought experiment].
[1] char isn't guaranteed to be 8 bits, but let's ignore that for now.
Running source code one is undefined behavior because strlen() requires a NUL-terminated string, which c[2] = "Hi"; /* = { 'H', 'i' } */ is not. A string differs from a char array in that a string is a char array with at least one NUL byte somewhere in the array.
The remaining answers should follow easily from this fact.
To autosize a char array to match the size of a string literal at initialization, simply specify no array size:
char c[] = "This will automatically size the c array (including the NUL).";
Note that you cannot compare char arrays with the == operator. You have to use
if (strcmp(c1, c2) == 0) {
    /* Equal. */
} else {
    /* Not equal. */
}
strlen() relies on the \0 terminating character, and in C all strings should be \0-terminated. With char c[2] = "Hi"; you have given only 2 spaces for the 2 characters H and i, so there is no room for \0. Hence strlen() has undefined behavior.
In case of char c[3] = "Hi"; there is \0 at the third place and strlen() will calculate the actual length.
How to declare them with the minimum required index numbers allowing \0 to be stored if any ?
When you are not sure about the size of the char array, do it like this:
char c1[] = "Mike"; // strlen = 4
char c2[] = "Omkant"; // strlen = 6
Note: in the above case, where no size is given explicitly, don't confuse sizeof with strlen().
strlen() returns only the number of characters.
sizeof gives the number of characters plus one more (for the \0 character).
So for arrays declared this way from a literal with no embedded \0, sizeof gives exactly 1 more than the number returned by strlen().
