Why adding a Null Character in a String array? [duplicate] - c

This question already has answers here:
Why are strings in C++ usually terminated with '\0'?
(5 answers)
Why do we need to add a '\0' (null) at the end of a character array in C?
(9 answers)
Closed 9 years ago.
I know that we have to use a null character to terminate a string array like this:
char str[5] = { 'A','N','S','\0' };
But I just wanted to know why it is essential to use a null character to terminate an array like this.
Also, why don't we add a null character to terminate these:
char str1[5]="ANS";

Null termination is what differentiates a plain char array from a string (a null-terminated char array) in C. Most string-manipulating functions rely on the null character to know when the string is finished (and their job is done), and won't work with a plain char array: they keep going past the bounds of the array until they find a zero byte somewhere in memory, often corrupting memory as they go. (The null character '\0' is sometimes written NUL; it is not the same thing as the NULL pointer constant.)
In C, 0 (the integer value) is considered boolean FALSE, and all other values are considered TRUE. if, for and while use 0 (FALSE) or non-zero (TRUE) to determine how to branch or whether to loop. char is an integer type, and the null character ('\0') is simply a char with the integer value 0, i.e. FALSE. This makes it very simple to write functions for things like manipulating or copying strings: they can loop as long as the character they are working on is non-zero (TRUE) and stop when they encounter the null character (FALSE), which signifies the end of the string. The loops stay simple because no explicit comparison is needed; we only need to know whether the character is 0 (FALSE) or not (TRUE).
Example:
char source[] = "Test"; // Actually: T e s t \0 ('\0' is the null character)
char dest[8];
int i = 0;
char curr;
do {
    curr = source[i];
    dest[i] = curr;
    i++;
} while (curr); // Loops while the condition is TRUE, i.e. non-zero; the null character is copied last and ends the loop.
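The loop above trusts that dest is large enough. As a rough sketch of the same idea with a size check added, something like the following could be used; copy_bounded and its return convention are made up for this illustration and are not a standard library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dest, never writing more than
   dest_size bytes. Returns 1 if the whole string fit, 0 if it was cut. */
int copy_bounded(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);

    size_t i = 0;
    /* src[i] is "true" for every character except the terminating '\0'. */
    while (src[i] && i < dest_size - 1) {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';          /* always terminate the destination */
    return src[i] == '\0';   /* did the loop stop at the end of src? */
}
```

Called as copy_bounded(dest, sizeof dest, source), it still relies on the null character to find the end of source, but it stops early rather than writing past the end of dest.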

It isn't essential in itself, but all of the standard library string functions expect it.

Related

char array in a struct data type

I actually have a question regarding the concept of a char array, especially the one which is declared and initialized like below.
char aString[10] = "";
What I was taught was that this array can store up to 10 characters (index 0-9) and that at index 10 there is an automatically placed null terminating character (I know that accessing it would not be right), so that if we use string handling functions (printf, scanf, strcmp, etc.) they would know where the string stops.
However when I tried making a struct data type like below,
typedef struct customer{
char accountNum[10];
char name[100];
char idNum[15];
char address[200];
char dateOfBirth[10];
unsigned long long int balance;
char dateOpening[10];
}CUSTOMER;
inserted 10 characters into accountNum (any method, e.g. scanf), and printed it with printf, what is printed is accountNum followed by the first word of name (I know that printf will stop at a space or a '\0'). This indicates that a char array does not have a terminating null at the end of the array.
Does this mean that if we have a char array of size 10 (char aString[10]), the maximum number of chars it can store is 9? Or do things work differently in a struct? It would be nice if someone could help me understand the concept, because it seems like I may have been working with undefined behaviour this whole time.
char aString[10] = "";
What i was taught was that this array can store up to 10 characters (index 0-9)
Yes.
and that at index 10 there is an automatically placed null terminating character
That is wrong. For one thing, index 10 would be out of bounds of the array. The compiler will certainly not initialize data outside of the memory it has reserved for the array.
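What actually happens is that the compiler copies the entire string literal, including the null terminator, into the array, and any remaining elements are set to zero. If the literal's characters exactly fill the array, C still accepts the initializer but leaves out the null terminator (C++ rejects this). If the literal is longer than that, the initializer is invalid and the compiler must diagnose it, typically with a warning or an error.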
In your example, the string literal has a length of 1 char (the null terminator), so the entire array ends up initialized with zeros.
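If you want to see the zero-fill for yourself, a small scan over the array is enough. This is only a sketch; the helper name all_zero is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every one of the n bytes is zero. */
int all_zero(const char *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

With char aString[10] = ""; the call all_zero(aString, sizeof aString) should report that all ten elements are zero. With char partial[10] = "abc"; the same holds for the seven elements starting at partial + 3.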
i know that accessing it would not be right
There is no problem with accessing the null terminator, as long as it is inside the bounds of the array.
such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
Yes, they expect C-style strings and so will look for a null terminator, unless they are explicitly told the actual string length, e.g. by using a precision modifier for %s, or by using strncmp(), etc.
However when I tried making a struct data type like below,
<snip>
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name
That means you either forgot to null-terminate accountNum, or you likely overflowed it by writing too many characters into it. For instance, that is very easy to do when misusing scanf(), strcpy(), etc.
i know that printf will stop at a space or a '\0'
printf() does not stop on a space, only on a null terminator, unless you tell it the maximum length explicitly, e.g.:
CUSTOMER c;
strncpy(c.accountNum, "1234567890", 10); // <-- will not be null terminated!
printf("%.10s", c.accountNum); // <-- stops after printing 10 chars!
If it has not encountered a null terminator by the time it reaches the 10th character, it will stop itself.
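If you would rather have the field always be a valid C string, one option is to truncate on the way in. The sketch below assumes that silently dropping the excess characters is acceptable for your data; set_field is a hypothetical helper, not part of the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores src in a fixed-size field, truncating if
   needed, and always leaves the field null-terminated.
   Returns the number of characters stored (excluding the terminator). */
size_t set_field(char *field, size_t field_size, const char *src)
{
    assert(field != NULL && src != NULL && field_size > 0);

    size_t len = strlen(src);
    if (len > field_size - 1) {
        len = field_size - 1;   /* leave room for '\0' */
    }
    memcpy(field, src, len);
    field[len] = '\0';
    return len;
}
```

For a 10-element accountNum, set_field(c.accountNum, sizeof c.accountNum, "1234567890") keeps only the first 9 digits, which is the price of reserving one element for the terminator. If all 10 digits matter, the array needs to be declared with 11 elements.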
This indicates that a char array does not have a terminating null at the end of the array.
An array is just an array, there is no terminator, only a size. If you want to treat a character array as a C-style string, then you are responsible for making sure the array contains a nul character in it. But that is just semantics of the character data, the compiler will not do anything to ensure that behavior for you (except for in the one case of initializing a character array with a string literal).
Does this mean that if we have a char array of size 10 (char aString[10]), its maximum number of char it can store is 9 characters?
Its maximum storage will always be 10 chars, period. But if you want to treat the array as a C-style string, then one of those chars must be a nul.
or does things work differently in a struct?
No. Where an array is used does not matter. The compiler treats all arrays the same, regardless of context (except for the one special case of initializing a character array with a string literal).
What i was taught was that this array can store up to 10 characters (index 0-9) and that at index 10 there is an automatically placed null terminating character (i know that accessing it would not be right) such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
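Not quite: the valid indices are 0-9, so there is no element at index 10 and nothing is placed there automatically. Accessing a null terminating character that lies inside the array, however, is absolutely safe.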
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name (i know that printf will stop at a space or a '\0'). This indicates that a char array does not have a terminating null at the end of the array.
printf does not stop for a space, only for a null terminating character. In this case, printf will print all characters until it sees '\0'.
Does this mean that if we have a char array of size 10 (char aString[10]), its maximum number of char it can store is 9 characters?
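Yes, if the array is to hold a string: 9 characters plus the null terminator. As raw storage it still holds 10 chars.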
or does things work differently in a struct?
There is no difference.

How does an array terminate?

As we know, a string terminates with '\0'.
That is so that the compiler knows where the string ends, or to guard against garbage values.
But how does an array terminate?
If '\0' is used, it will be taken as 0, a valid integer.
So how does the compiler know that the array has ended?
C does not perform bounds checking on arrays. That's part of what makes it fast. However that also means it's up to you to ensure you don't read or write past the end of an array. So the language will allow you to do something like this:
int arr[5];
arr[10] = 4;
But if you do, you invoke undefined behavior. So you need to keep track of how large an array is yourself and ensure you don't go past the end.
Note that this also applies to character arrays, which can be treated as strings if they contain a sequence of characters terminated by a null byte. So this is a string:
char str[10] = "hello";
And so is this:
char str[5] = { 'h', 'i', 0, 0, 0 };
But this is not:
char str[5] = "hello"; // no space for the null terminator.
C doesn't provide any protections or guarantees to you about 'knowing the array is ended.' That's on you as the programmer to keep in mind in order to avoid accessing memory outside your array.
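Since the language won't check for you, one common pattern is to carry the length around and check the index yourself. A minimal sketch, with store_checked being a name invented for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores value at arr[index] only if index is in
   range. Returns 1 on success, 0 if the index is out of bounds. */
int store_checked(int *arr, size_t len, size_t index, int value)
{
    assert(arr != NULL);
    if (index >= len) {
        return 0;   /* refuse instead of invoking undefined behavior */
    }
    arr[index] = value;
    return 1;
}
```

With int arr[5], store_checked(arr, 5, 10, 4) returns 0 and leaves the array untouched, where the plain arr[10] = 4 would write outside it.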
The C language does not have a native string type. In C, strings are one-dimensional arrays of characters terminated by a null character '\0'.
From C Standard#7.1.1p1 [emphasis mine]
A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
A string is a special case of a character array: one that is terminated by a null character '\0'. All the standard library string functions read their input according to this rule, i.e. they read until the first null character.
The null character '\0' has no special significance in arrays of any type other than character arrays in C.
So, apart from strings, the programmer is supposed to keep track of the number of elements explicitly for every other kind of array.
Also note that the first null character ('\0') indicates where the string ends, but it does not stop you from reading beyond it.
Consider this example:
#include <stdio.h>
int main(void) {
char str[5] = {'H', 'i', '\0', 'z'};
printf ("%s\n", str);
printf ("%c\n", str[3]);
return 0;
}
When you print the string
printf ("%s\n", str);
the output you will get is - Hi
because with the %s format specifier, printf() writes every byte up to, but not including, the first null terminator. You can still print the 4th character of the array, since it is within the bounds of the char array str even though it lies beyond the first '\0' character:
printf ("%c\n", str[3]);
the output you will get is - z
Additional:
Trying to access an array beyond its size leads to undefined behavior: the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
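To make the difference between "length of the string" and "size of the array" concrete, here is a small sketch that measures the string without ever looking past the array. bounded_length is a made-up name; it behaves much like the POSIX strnlen function, but is written with memchr so that it only depends on standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string stored in buf, looking at no
   more than size bytes. Returns size if no '\0' is found. */
size_t bounded_length(const char *buf, size_t size)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', size);
    return end ? (size_t)(end - buf) : size;
}
```

For the str array in the example above, bounded_length(str, sizeof str) gives the string length 2, while sizeof str is still 5.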
It’s just a matter of convention. If you wanted to, you could totally write code that handled array termination (for arrays of any type) via some sentinel value. Here’s an example that does just that, arbitrarily using -1 as the sentinel:
int length(int arr[]) {
int i;
for (i = 0; arr[i] != -1; i++) {}
return i;
}
However, this is obviously utterly impractical: you couldn’t use -1 as a value in the array any longer.
By contrast, for C strings the sentinel value '\0' is less problematic because normal text is not expected to contain this character. This assumption is mostly valid. Even so, there are many strings that do contain '\0' as a valid character, and null termination is therefore by no means universal.
One very common alternative is to store strings in a struct that looks something like this:
struct string {
    unsigned int length;
    char *buffer;
};
That is, we explicitly store a length alongside a buffer. This buffer isn’t null-terminated (although in practice it often has an additional terminal '\0' byte for compatibility with C functions).
Anyway, the answer boils down to: For C strings, null termination is a convenient convention. But it is only a convention, enforced by the C string functions (and by the C string literal syntax). You could use a similar convention for other array types but it would be prohibitively impractical. This is why other conventions developed for arrays. Notably, most functions that deal with arrays expect both an array and a length parameter. This length parameter determines where the array terminates.
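As an illustration of the length-plus-buffer convention, here is a sketch of a comparison that works even when the data contains '\0'. The names counted_string and counted_equal are assumptions made for this example, and size_t is used for the length instead of unsigned int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length, not a '\0', marks the end. */
struct counted_string {
    size_t length;
    const char *buffer;
};

/* Returns 1 if both strings have the same length and the same bytes. */
int counted_equal(struct counted_string a, struct counted_string b)
{
    assert(a.buffer != NULL && b.buffer != NULL);
    return a.length == b.length
        && memcmp(a.buffer, b.buffer, a.length) == 0;
}
```

Two 5-byte values such as "ab\0cd" and "ab\0ce" look identical to strcmp, which stops at the embedded '\0', but counted_equal compares all 5 bytes and reports them as different.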

Alternative to strlen not breaking on 0 [duplicate]

This question already has answers here:
How to find the size of an array (from a pointer pointing to the first element array)?
(17 answers)
Closed 4 years ago.
Is there any better way of getting the right length of array containing digits?
I have an array of digits: 0, 0, 1 and I try to get length of it. It obviously breaks and returns 0. I am new to C but I tried to make custom strlen function:
int custom_strlen(char *str) {
for(int i = 1; ;i++) {
if (str[i] == 0) {
return i;
}
}
return -47;
}
but it is not that efficient and in some cases returns unexpected values as well. The expected output would be 3 in this case.
Is there any function to use?
An array of integers is not a string. C arrays do not contain length information inherently. The way strlen works is that C strings are null terminated, meaning the last character is NUL (null character), which is 0. Otherwise, there is just no way to know how long an array is.
I think you may be wanting to do an array of '0','0','1'. Can you post the array you are using?
As mentioned, C strings are null terminated.
The only choices are
Using a string terminated with some special character that you watch for (like a null) or
Keeping track of how long the string is when you create it.
FWIW, if it's not null terminated, it's not actually a string in C, it's just memory that contains chars that you happen to be interpreting as a string.
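For the second choice (keeping track of the length), a sketch could look like the following. It assumes the digits are stored as small integers in an unsigned char array; ARRAY_LEN and count_zero_digits are names invented for this example. Note that the sizeof trick only works on a real array in the scope where it is declared, not on a pointer received as a function parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer): total size / element size. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: counts how many of the first count digits are zero.
   The length is passed in, so a 0 digit is ordinary data, not a terminator. */
size_t count_zero_digits(const unsigned char *digits, size_t count)
{
    assert(digits != NULL);
    size_t zeros = 0;
    for (size_t i = 0; i < count; i++) {
        if (digits[i] == 0) {
            zeros++;
        }
    }
    return zeros;
}
```

For unsigned char digits[] = {0, 0, 1}; the expression ARRAY_LEN(digits) evaluates to 3, and that value is what gets passed along with the array to any function that needs to walk it.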

How to Compare 2 Character Arrays [duplicate]

This question already has answers here:
How do I properly compare strings in C?
(10 answers)
Closed 6 years ago.
How do I compare these two character arrays to make sure they are identical?
char test[10] = "idrinkcoke";
char test2[10] = "idrinknote";
I'm thinking of using a for loop, but I read somewhere else that I couldn't do test[i] == test2[i] in C.
I would really appreciate it if someone could help with this. Thank you.
but I read somewhere else that I couldnt do test[i] == test2[i] in C.
That would be really painful to compare character-by-character like that. As you want to compare two character arrays (strings) here, you should use strcmp instead:
if( strcmp(test, test2) == 0)
{
printf("equal");
}
Edit:
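There is no need to specify the size when you initialise the character arrays. In fact, a size of 10 leaves no room for the null terminator of a 10-character literal, so strcmp would read past the end of the arrays. This would be better: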
char test[] = "idrinkcoke";
char test2[] = "idrinknote";
It would also be better to use strncmp, which is safer in general because it stops after a given number of characters even if a character array happens not to be null-terminated.
if(strncmp(test, test2, sizeof(test)) == 0)
You can use the C library function strcmp
Like this:
if (strcmp(test, test2) == 0)
From the documentation on strcmp:
Compares the C string str1 to the C string str2.
This function starts comparing the first character of each string. If
they are equal to each other, it continues with the following pairs
until the characters differ or until a terminating null-character is
reached.
This function performs a binary comparison of the characters. For a
function that takes into account locale-specific rules, see strcoll.
and on the return value:
returns 0 if the contents of both strings are equal
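If the arrays are kept at exactly 10 elements with no terminator, as in the question, another option is to compare them as raw bytes with an explicit length. This is only a sketch, and arrays_equal is a hypothetical wrapper around the standard memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of a and b match.
   No null terminator is needed because the length is given explicitly. */
int arrays_equal(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0;
}
```

It would be called as arrays_equal(test, test2, sizeof test), which for the two arrays in the question reports a mismatch because they differ at index 6.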

How does this example from K and R work [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How does “while(*s++ = *t++)” work?
I was trying to understand the following example. I am a little confused how this would actually work.
void strcpy(char *s, char *t)
{
while (*s++ = *t++)
;
}
Any help is great. Thanks!
Remember that a string in C is just a sequence of chars terminated with a \0, usually handled through a pointer to its first character.
Also remember that \0 (the null byte) is falsy, that is, if it's in a condition, that condition will be false.
This function gets a pointer to the start of the source string and one to the start of the destination string.
It then loops over each character in the source string, copying the character to the destination string. When the condition is evaluated, the post-increment ++ will advance the pointer forward a byte.
Note that the terminator itself is copied: the assignment stores the \0 in the destination before its value (0) ends the loop, so the destination is always null-terminated. The real hazard is that nothing checks the size of the destination, so it must be large enough to hold the source string including its \0.
The value of *s++ = *t++ is the value of the right side of the assignment, *t. So the loop will terminate when *t is 0, i.e., at the end of the string pointed by t. The condition also increments the value of t (and s), after assigning the character pointed by t to the char pointed by s. There is nothing in the loop body, the condition by itself does the copy.
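If the one-line condition is hard to read, it may help to see the same steps spelled out one at a time. The version below is a sketch that should behave the same way as the K&R loop; it is renamed copy_expanded here so that it does not clash with the library strcpy, and the source parameter is declared const.

```c
#include <assert.h>
#include <stddef.h>

/* Long-form equivalent of: while (*s++ = *t++) ;
   Named copy_expanded here to avoid clashing with the library strcpy. */
void copy_expanded(char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    for (;;) {
        char c = *t;   /* read the current source character */
        *s = c;        /* store it, including the final '\0' */
        s++;           /* both pointers advance after the copy */
        t++;
        if (c == '\0') {
            break;     /* the value just assigned was 0, so stop */
        }
    }
}
```

As with the original, the caller has to make sure the destination has room for every character of the source plus the terminator.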
