Confused about the Null Zero in C

Let's say you have:
char[5] = "March";
This won't work in C, even though the characters would be:
char[0]='M'
char[1]='a'
char[2]='r'
char[3]='c'
char[4]='h'
char[5]='\0'
For some reason, I have to tell C char[6] = "March" rather than char[5].
Why is this? What goes into char[6]?

First things first: char is not a valid variable name in C; it's a keyword. You probably meant
char xyzzy[6];
or something similar, which would create a character array called xyzzy. But, once that's fixed up, nothing goes into xyzzy[6]. The statement char xyzzy[6]; means an array of six characters, the indexes of which are 0 through 5 inclusive: {0,1,2,3,4,5}, that's six elements.
In any case, unless you need the array to be bigger, you're usually better off letting the compiler choose the size with:
char xyzzy[] = "March";

A string literal, "March" in this case, has an implicit null terminator, so a char[] array requires 6 elements to store it. To quote the relevant points from section 6.4.5 String literals of the C99 standard:
A character string literal is a sequence of zero or more multibyte characters enclosed in
double-quotes, as in "xyz".
In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals.
In both cases the code is overrunning the end of the array. The code currently has undefined behaviour.
Change to:
char a[6] = "March";
The second code snippet accesses beyond the end of the array, as indexes run from 0 to N - 1, where N is the number of elements in the array. For an array char a[5]; the valid indexes are 0, 1, 2, 3, 4.

The null terminator \0 goes in the last slot.
It is there because otherwise there would be no way to tell where the string ends, and therefore no way to find its length.

There are 6 elements if you count from 0 to 5 inclusive.
So you must declare a char[6] in order to have str[0] to str[5].

char [5] means you have memory for 5 * sizeof(char). Indexes 0-5 are six elements: 0 first, 1 second, 2 third, 3 fourth, 4 fifth, 5 sixth. So you need a segment of 6 * sizeof(char): char [6]. The first cell, index zero, consumes space too.

Nothing goes into char[6], but you still need a char[] of size 6 to hold the 6 characters including the null-terminator byte. They go into char[0] through char[5]. char[6] is outside the array, and accessing it is undefined behaviour.

char array[n]
means an n elements long array, not an array whose last index is n. Because in C, arrays are zero-based, it means that the last valid index of an n-element array is n - 1. Since there's a zero terminator for a string, and "March" is 5 letters, that's a total of 6 characters, so you have to write char str[6] = "March";. If that's confusing for now, don't include the length of the array; for initialized arrays, the compiler will automagically fill it in, so you can write char str[] = "March"; instead.

char is a keyword, not a variable name, so don't use it as one.
Array indexing: counting the characters in an array begins from 0, i.e. a[0].
Suppose there are 5 characters in the array (for example "tiger"); then a[0] = 't' and a[4] = 'r'. Every string literal, which is an array of characters, ends in '\0', the null character. (This is not EOF; EOF is a negative int value returned by input functions.)
So to store a string of n characters, declare a[n + 1]; when you initialize it from a string literal, the compiler puts the '\0' in the last element.

Related

Why are char array sizes different for identical strings in C?

In C the string is always terminated with a \0 character, so the length of the string is the number of characters + 1. So, in case you need to initialise a string of 12 characters, you must "allow one element for the termination character" (from p.221, "Beginning C", Horton).
All right, let's try
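#include <stdio.h>
int main(void)
{
char arr1[12] = "Hello world!"; /* exactly 12 chars: no room for '\0' */
char arr2[] = "Hello world!";
/* arr1 is not null-terminated, so give %s an explicit maximum length */
printf(">%.*s< has %zu elements\n", (int)sizeof arr1, arr1, sizeof(arr1) / sizeof(arr1[0]));
printf(">%s< has %zu elements\n", arr2, sizeof(arr2) / sizeof(arr2[0]));
return 0;
}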
outputs
>Hello world!< has 12 elements
>Hello world!< has 13 elements
Why did the C compiler allow me to create a fixed-size array arr1 without a \0 character, but add \0 when I asked for an array without a stated size, arr2?
When you define an array with a fixed size and initialize it with a string literal:
If the size is smaller than the number of characters in the string (not counting the terminating null character), the compiler will complain about the mismatch.
If the size equals the number of characters in the string (not counting the terminating null character), the compiler will initialize the array with the characters in the string and no terminating null character. This is used to initialize an array that will be used only as an array of characters, not as a string. (A string is a sequence of characters terminated by a null character.)
If the size is greater than the number of characters in the string, the compiler will initialize the array with the characters in the string and a terminating null character. Any elements in the array beyond that will also be initialized to zero.
When you define an array without a stated size and initialize it with a string literal:
The compiler counts the characters in the string literal, including the terminating null character, and makes that the array size.
These are the rules of the C standard.

char array in a struct data type

I actually have a question regarding the concept of a char array, especially the one which is declared and initialized like below.
char aString[10] = "";
What I was taught was that this array can store up to 10 characters (index 0-9) and that at index 10 there is an automatically placed null terminating character (I know that accessing it would not be right) such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
However when I tried making a struct data type like below,
typedef struct customer{
char accountNum[10];
char name[100];
char idNum[15];
char address[200];
char dateOfBirth[10];
unsigned long long int balance;
char dateOpening[10];
}CUSTOMER;
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name (I know that printf will stop at a space or a '\0'). This indicates that a char array does not have a terminating null at the end of the array.
Does this mean that if we have a char array of size 10 (char aString[10]), the maximum number of chars it can store is 9? Or do things work differently in a struct? It would be nice if someone could help me understand the concept, because it seems like I may have been working with undefined behaviour this whole time.
char aString[10] = "";
What I was taught was that this array can store up to 10 characters (index 0-9)
Yes.
and that at index 10 there is an automatically placed null terminating character
That is wrong. For one thing, index 10 would be out of bounds of the array. The compiler will certainly not initialize data outside of the memory it has reserved for the array.
What actually happens is that the compiler copies the entire string literal, including its null terminator, into the array, and sets any remaining elements to zero. (The one exception is an array exactly as long as the literal's characters: then the terminator is simply not stored.) If the literal's characters do not fit in the array at all, the initialization is invalid and the compiler must report it.
In your example, the string literal has a length of 1 char (the null terminator), so the entire array ends up initialized with zeros.
I know that accessing it would not be right
There is no problem with accessing the null terminator, as long as it is inside the bounds of the array.
such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
Yes, they expect C-style strings and so will look for a null terminator, unless they are explicitly told the actual string length, e.g. by using a precision modifier for %s, or using strncmp(), etc.
However when I tried making a struct data type like below,
<snip>
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name
That means you either forgot to null-terminate accountNum, or you likely overflowed it by writing too many characters into it. For instance, that is very easy to do when misusing scanf(), strcpy(), etc.
I know that printf will stop at a space or a '\0'
printf() does not stop at a space, only at a null terminator, unless you tell it the maximum length explicitly, e.g.:
CUSTOMER c;
strncpy(c.accountNum, "1234567890", 10); // <-- will not be null terminated!
printf("%.10s", c.accountNum); // <-- stops after printing 10 chars!
If it has not encountered a null terminator by the time it reaches the 10th character, it stops anyway.
This indicates that a char array does not have a terminating null at the end of the array.
An array is just an array, there is no terminator, only a size. If you want to treat a character array as a C-style string, then you are responsible for making sure the array contains a nul character in it. But that is just semantics of the character data, the compiler will not do anything to ensure that behavior for you (except for in the one case of initializing a character array with a string literal).
Does this mean that if we have a char array of size 10 (char aString[10]), the maximum number of chars it can store is 9?
Its maximum storage will always be 10 chars, period. But if you want to treat the array as a C-style string, then one of those chars must be a nul.
or do things work differently in a struct?
No. Where an array is used does not matter. The compiler treats all arrays the same, regardless of context (except for the one special case of initializing a character array with a string literal).
What I was taught was that this array can store up to 10 characters (index 0-9) and that at index 10 there is an automatically placed null terminating character (I know that accessing it would not be right) such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
Only partly. The array holds 10 characters at indexes 0-9; there is no index 10, and nothing is placed there automatically. Accessing a null terminator that lies inside the array, however, is absolutely safe.
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name (I know that printf will stop at a space or a '\0'). This indicates that a char array does not have a terminating null at the end of the array.
printf does not stop for a space, only for a null terminating character. In this case, printf will print all characters until it sees '\0'.
Does this mean that if we have a char array of size 10 (char aString[10]), the maximum number of chars it can store is 9?
Yes, if you use it as a string: 9 characters plus the null terminator. As a plain array it holds 10 chars.
or do things work differently in a struct?
There is no difference.

Char string length not getting initialized properly despite literally putting in the integer size I want it to be?

I'm working with char arrays in C. I'm setting the size in a previous step. When I print it out, it clearly shows num_digits as 1.
But then when I use it to set the size of a char array, to make it a char array of size num_digits, it's setting the size of the array as 6.
In the next step, when I print strlen(number_array), it prints 6. Printing the array out, I get something with a lot of question marks. Does anyone know why this is happening?
int num_digits = get_num_digits(number);
printf("Num digits are %d\n", num_digits);
char number_array[num_digits];
printf("String len of array: %d\n", strlen(number_array));
You need to null terminate your array.
char number_array[num_digits + 1];
number_array[num_digits] = '\0';
Without this null terminator, C has no way of knowing when you've reached the end of the string.
Just use sizeof instead of strlen:
printf("String len of array: %zu\n", sizeof(number_array));
There are a couple possible issues I see here:
As noted in Michael Bianconi's answer, C character arrays (often called strings) require null terminators. You would explicitly set this with something like:
number_array[num_digits] = '\0'; /* the array needs num_digits + 1 elements; see below */
Rather than just setting the last element to null, pre-initializing the entire character array to nulls might be helpful. C does not guarantee this for a local array, so you'll need to do it explicitly with something like:
for (int i = 0; i < num_digits + 1; i ++) number_array[i] = '\0';
Note that with gcc I had to use C99 mode (-std=c99) to get this to compile, as the compiler rejected the declaration within the for statement; newer gcc releases default to a later standard and accept it.
Also, the code presented sets the length of the character array to be the same as number's length. We don't know what get_num_digits returns, but if it returns the actual number of significant digits in an integer, this will come up one short (see above and the other answer), as you need an extra character for the null terminator. An example: if the number is 123456 and get_num_digits returns 6, you would need to set the length of number_array to 7 instead of 6 (i.e. num_digits + 1).
char number_array[num_digits]; allocates some space for a string. It's an array of num_digits characters. Strings in C are represented as an array of characters, with a null byte at the end. (A null byte has the value zero, not to be confused with the digit character '0'.) So this array has room for a string of up to num_digits - 1 characters.
sizeof(number_array) gives you the array storage size. That's the total amount of space you have for a string plus its null terminator. At any given time, the array can contain a string of any length up to sizeof(number_array) - 1, or it might not contain a string at all if the array doesn't contain a null terminator.
strlen(number_array) gives you the length of the string contained in the array. If the array doesn't contain a null terminator, this call may return a garbage value or crash your program (or make demons fly out of your nose, but most computers fortunately lack the requisite hardware).
Since you haven't initialized number_array, it contains whatever happened to be there in memory before. Depending on how your system works, this may or may not vary from one execution of the program to the next, and this certainly does vary depending on what the program has been doing and on the compiler and operating system.
What you need to do is:
Give the array enough room for the null terminator.
Initialize the array to an empty string by setting the first character to zero.
Optionally, initialize the whole array to zero. This is not necessary, but it may simplify further work with the array.
Use %zu rather than %d to print a size. %d is for an int, but sizeof and strlen produce a size_t, which, depending on your system, may or may not be the same size as int.
char number_array[num_digits + 1];
number_array[0] = 0; // or memset(number_array, 0, sizeof(number_array));
printf("Storage size of array: %zu\n", sizeof(number_array));
printf("The array contains an empty string: length=%zu\n", strlen(number_array));

What's the difference between including and excluding the null character in a character array?

What's the difference between the following two character arrays: one with a null character and one without a null character?
char name1[] = {'j','o','h','n'};
char name2[] = {'j','o','h','n','\0'};
If there is a difference between name1 and name2 how does strlen work on name1 since it has no null character?
What would the result be for
printf("%d", name1[5] == '\0');
I expected it to be 0 but got 1
how does strlen work on name1 since it has no null character.
It doesn't. This would invoke undefined behaviour.
I expected it to be 0 but got 1
Your code snippet tries to access name1[5]. Given that name1 is a char array of size 4, you are accessing memory that has nothing to do with that array. Possibly at the time of execution that memory happened to contain a null character, leading to this result. This cannot be predicted, however, and so the behaviour is undefined.
name1 doesn't define a C-string, but name2 does.
A C-string is a sequence of chars with the last one being the NUL char. C-string is not a type, you don't have string type in C; but the standard defines the concept of C-string. strlen should be used on C-string.
You defined arrays of chars. That is a type in C: a sequence of chars. Some arrays of chars contain a C-string, others do not. strlen should not be used on arrays of chars that do not contain a C-string.
name1[5] doesn't exist; that array contains only 4 chars (indexes 0 to 3).
the differences between the first and second arrays?
1) the first array is 4 characters while the second array is 5 characters
2) you cannot use functions like strlen(), strcpy(), strcmp() on the first array, but you can use those functions on the second array
The size of name1 is not the size of name2:
the size of name1 is 4
the size of name2 is 5
sizeof knows these sizes from the array definitions.
Someone forgot to count from 0 not 1 when addressing an array. That is always annoying.
name1[5] refers to memory outside the array. This kind of out-of-bounds access is what is called a buffer overflow (strictly, a buffer over-read, since it only reads).

What if a null character is present in the middle of a string?

I understand that the end of a string is indicated by a null character, but I cannot understand the output of the following code.
#include <stdio.h>
#include <string.h>
int
main(void)
{
char s[] = "Hello\0Hi";
printf("%d %d", strlen(s), sizeof(s));
}
OUTPUT: 5 9
If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing? Even if it doesn't do the same thing, isn't '\0' a null character (i.e. only one character), so shouldn't the answer be 8?
The sizeof operator does not give you the length of a string but instead the size of the type of its operand. Since in your code the operand is an array, sizeof gives you the size of the array, including both null characters.
If it were like this
const char *string = "This is a large text\0This is another string";
printf("%zu %zu\n", strlen(string), sizeof(string));
the result will be very different because string is a pointer and not an array.
Note: Use the "%zu" specifier for size_t which is what strlen() returns, and is the type of the value given by sizeof.
strlen() doesn't care about the actual size of the array. It looks for a null byte and stops when it sees the first one.
But the sizeof operator knows the total size. It doesn't care what bytes are in the string literal. You might as well have all null bytes in the string and sizeof would still give the correct size of the array (strlen() would return 0 in that case).
They are not comparable; they do different things.
If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing?
strlen only works for strings (null-terminated character arrays), whereas sizeof works for every data type. sizeof calculates the exact memory space for any given data type, whereas strlen provides the length of a string (NOT including the null terminator \0). So in normal cases, this is true for a typical character array s:
char s[] = "Hello";
strlen( s ) + 1 == sizeof( s ); // +1 for the \0
In your case it's different because you have a null terminator in the middle of character array s:
char s[] = "Hello\0Hi";
Here, strlen detects the first \0 and gives the length as 5. sizeof, however, counts the total space needed to hold the whole character array, including both \0 characters, which is why it gives 9 as the second output.
strlen() computes the length of the string. This is done by returning the amount of characters before (and not including) the '\0' character. (See the manual page below.)
sizeof() returns the number of bytes of the given variable (or data type). Note that your example "Hello\0Hi" has 9 characters. But you don't seem to understand where character 9 comes from in your question. Let me explain the given string first. Your example string is:
"Hello\0Hi"
This can be written as the following array:
['H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0']
Note the last '\0' character. When using the string quotes the compiler ends the string with an '\0' character. This means "" also is ['\0'] and thus has 1 element.
BEWARE that sizeof() does NOT return the number of elements in the array. It returns the number of bytes. A char is 1 byte, and therefore for a char array sizeof() does return the number of elements. But with any other data type it does not: for example, sizeof() on an int array {1, 2, 3, 4} returns 16 on a typical platform where int is 4 bytes, since the array has 4 elements.
BEWARE that passing an array as a parameter only passes a pointer. If you pass s to another function and call sizeof() on it there, it returns the size of the pointer, sizeof(char *). This is a fixed size, independent of the array.
STRLEN(3) BSD Library Functions Manual STRLEN(3)
NAME
strlen, strnlen -- find length of string
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <string.h>
size_t
strlen(const char *s);
size_t
strnlen(const char *s, size_t maxlen);
DESCRIPTION
The strlen() function computes the length of the string s. The strnlen()
function attempts to compute the length of s, but never scans beyond the
first maxlen bytes of s.
RETURN VALUES
The strlen() function returns the number of characters that precede the
terminating NUL character. The strnlen() function returns either the
same result as strlen() or maxlen, whichever is smaller.
SEE ALSO
string(3), wcslen(3), wcswidth(3)
STANDARDS
The strlen() function conforms to ISO/IEC 9899:1990 (``ISO C90'').
The strnlen() function conforms to IEEE Std 1003.1-2008 (``POSIX.1'').
BSD February 28, 2009 BSD
As the name implies, a string literal is a sequence of characters enclosed in double quotes. This sequence of characters is implicitly followed by a terminating zero.
So any character enclosed in the double quotes is a part of the string literal.
When a string literal is used to initialize a character array all its characters including the terminating zero serve as initializers of the corresponding elements of the character array.
Each string literal, in turn, has a character array type.
For example this string literal "Hello\0Hi" in C has type char[9]: 8 characters enclosed in the quotes plus the implicit terminating zero.
So in memory this string literal is stored like
{ 'H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0' }
The sizeof operator returns the number of bytes occupied by an object. So for the string literal above, sizeof returns 9: the number of bytes occupied by the literal in memory.
If you write "Hello\0Hi", the compiler may not simply drop the Hi part of the literal. It has to store it in memory along with the other characters enclosed in the quotes.
The sizeof operator returns the size in bytes of any object in C not only of character arrays.
In general, character arrays can store any raw data, for example binary data read from a binary file. In that case, neither the user nor the program treats the data as strings, and it is processed differently.
The standard C function strlen is written specifically to find the length of a string stored in a character array. It does not know what data is stored in the array or how it was written there. All it does is search for the first zero character and return the number of characters before it.
You can store several strings sequentially in one character array. For example
char s[12];
strcpy( s, "Hello" );
strcpy( s + sizeof( "Hello" ), "World" );
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
If you would define a two dimensional array like this
char t[2][6] = { "Hello", "World" };
then in memory it will be stored the same way as the one-dimensional array above. So you can write
char *s = ( char * )t;
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
Another example: the standard C function strtok can split a string stored in a character array into several strings by replacing user-specified delimiters with zero bytes. As a result, the character array will contain several strings.
For example
char s[] = "Hello World";
printf( "%zu\n", sizeof( s ) ); // outputs 12
strtok( s, " " );
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
printf( "%zu\n", sizeof( s ) ); // outputs 12
The last printf statement will output the same value equal to 12 because the array occupies the same number of bytes. Simply one byte in the memory allocated for the array was changed from ' ' to '\0'.
Character arrays in C and pointers to character arrays are not the same thing, even though printing their addresses may give the same value.
An array in C is made up of following things.
Size of array
Its address / pointer
Homogeneous type of elements
Where a pointer is made up of just:
Address
Type information
char s[] = "Hello\0Hi";
printf("%d %d", strlen(s), sizeof(s));
Here you are calculating the size of array (which is s variable) using sizeof() which is 9.
But if you treat this character array as a string, the array decays to a pointer to its first character and loses its size information. The same thing happens when you print a character array using %s.
So strlen() and %s treat the character array as a string and use only its address. As you can guess, strlen() keeps incrementing the pointer to calculate the length up to the first null character. When it encounters a null character, you get the length up to that point.
So strlen() gives you 5 and does not count the null character.
The sizeof operator tells you only the size of its operand. If you give it an array variable, it uses the array size information and tells you the size regardless of where the null characters are.
But if you give sizeof a pointer to an array of characters, it sees only a pointer without size information and gives the size of the pointer, which is usually 8 bytes on 64-bit systems or 4 bytes on 32-bit systems.
One more thing: if you initialize a character array using double quotes, like "Hello", C adds a null character; it does not in the case of {'H','e','l','l','o'}.
Tested with the gcc compiler. I hope this helps with understanding.
