String termination in C/C++: char = 0

#include <stdio.h>
#include <string.h>
void terminateString(char *str) {
    str[3] = 0;
    printf("string after termination is:%s\n", str);
}
int main() {
    char str[] = "abababcdfef";
    terminateString(str);
    return 0;
}
Output:
string after termination is:aba
We are only assigning the element at index 3 to 0, so why are all the characters after that index ignored? Can someone please explain this behavior?

The convention with a zero-terminated string is that the 0 byte is what indicates the end of the string. So when printf() encounters the zero-byte at position 3, it stops printing.

The ISO C standard defines a string as follows (see, for example, C11 7.1.1 Definition of terms):
A string is a contiguous sequence of characters terminated by and including the first null character.
Hence, when you have the character sequence abababcdfef\0, that is indeed a string.
However, when you put a null at offset 3, the string is not aba\0abcdfef\0 but, by virtue of the fact it's only a string up to and including the first null, it is aba\0.

A C string is a null-terminated string: a null character terminates (indicates the end of) the string.
A null character is a character with all its bits set to 0, written '\0' and represented in memory as 0x00.
When you set str[3] = 0, you change str[3] to the terminator, so when printf reads it, it treats that as the end of the string and prints only "aba".

What you are demonstrating is the difference, in C and C++, between strings and char arrays. A string is a sequence of characters that continues up to and including the first null character. A character array is a unit of memory allocation. A string might not use the entire character array allocated for it (and if the array contains no null character at all, string functions will read past its end). If you want to print an array for diagnostic purposes rather than as a string, you need to iterate over the array in a loop. See below:
#include <stdio.h>
#include <string.h>
void terminateString(char *str) {
    str[3] = 0;
    printf("string after termination is:%s\n", str);
}
int main() {
    char str[] = "abababcdfef";
    terminateString(str);
    /* sizeof works here because str is an array in this scope, not a pointer */
    for (size_t i = 0; i < sizeof(str) / sizeof(str[0]); i++) {
        (str[i] != 0) ? printf("%c ", str[i]) : printf("\\0 ");
    }
    printf("\n");
    return 0;
}
// OUTPUT
// string after termination is:aba
// a b a \0 a b c d f e f \0

In C and C++, 0 and '\0' have the same value, zero, and NULL is a macro for a null pointer constant. C-style strings are sequences of characters that end with '\0', so every function that works with them stops when it finds this character. Assigning str[3]=0; is the same as str[3]='\0';, i.e. it ends the string after 3 characters. If you want the digit 0, write str[3]='0';, where the single quotes tell the compiler you want the character '0' (48 in ASCII).
Edit:
Note that NULL is not the same as nullptr. In C, NULL may be defined as 0 or ((void *)0); in C++98 it is an integer constant with value zero, such as 0; starting with C++11 it may also be defined as a value of type std::nullptr_t, such as nullptr.
http://www.cplusplus.com/reference/cstring/NULL/

Related

Array showing random characters at the end

I wanted to test things out with arrays in C, as I'm just starting to learn the language. Here is my code:
#include <stdio.h>
int main(void) {
    int i, t;
    char orig[5];
    for (i = 0; i <= 4; i++) {
        orig[i] = '.';
    }
    printf("%s\n", orig);
}
Here is my output:
.....�
That is exactly the output. What are those mysterious characters? What have I done wrong?
%s with printf() expects a pointer to a string, that is, pointer to the initial element of a null terminated character array. Your array is not null terminated.
Thus, in search of the terminating null character, printf() goes out of bounds and thereby invokes undefined behavior.
You have to null-terminate your array, if you want that to be used as a string.
Quote: C11, §7.21.6.1:
s: If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many bytes are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
Quick solution:
Increase the array size by 1: char orig[6];.
Add a null terminator at the end: after the loop, add orig[i] = '\0';.
Then print the result.
char orig[5]; //creates an array of 5 char (with indices ranging from 0 to 4)
|?|?|?|0|0|0|0|0|?|?|?|?|
       |         ^memory you do not own (your mysterious characters)
       ^start of orig
for(i=0;i<=4;i++){ //attempts to populate array with '.'
    orig[i] = '.';
}
|?|?|?|.|.|.|.|.|?|?|?|?|
       |         ^memory you do not own (your mysterious characters)
       ^start of orig
This results in a char array that is not null-terminated, which invokes undefined behavior if used in a function that expects a C string. C strings must have enough space for the null terminator. Change your declaration to the following to accommodate it:
char orig[6];
Then add the null terminator after your loop:
...
for(i=0;i<=4;i++){
    orig[i] = '.';
}
orig[i] = 0;
Resulting in:
|?|?|?|.|.|.|.|.|0|?|?|?|
       |           ^memory you do not own
       ^start of orig
Note: Because the null termination results in a C string, the function using it knows how to interpret its contents (i.e. no undefined behavior), and your mysterious characters are held at bay.
There is a difference between an arbitrary character array and a string. You can consider a string a special case of an array: each element is of type char, and the sequence must be ended (terminated) by a null character (ASCII value 0).
The %s format specifier of printf() expects a pointer to a character array that is terminated by a null character. Your array is not null-terminated, so printf() goes beyond the 5 characters you assigned and prints whatever garbage follows your 5th character ('.'); strictly speaking, this is undefined behavior.
To solve the issue, declare the character array with one more element than the number of characters you want to store. In your case, a character array of size 6 will work.
#include <stdio.h>
int main(){
    int i,t;
    char orig[6]; // If you want to store 5 characters, allocate an array of size 6 to store the null character at the last position.
    for (i=0; i<=4; i++) {
        orig[i] = '.';
    }
    orig[5] = '\0';
    printf("%s\n", orig);
}
There is a reason to spend one extra character of space on the null character. Whenever you pass an array to a function, only a pointer to its first element is passed. This makes it impossible for the function to determine where the array ends (operators like sizeof don't help inside the function: sizeof returns the size of a pointer on your machine). That is why functions like memcpy and memset take an additional argument giving the size (or the length up to which you want them to operate).
With a null-terminated character array, however, a function can determine the length by looking for the special null character.
You need to add a NUL character (\0) at the end of your string.
#include <stdio.h>
int main(void)
{
    int i,t;
    char orig[6];
    for(i=0;i<=4;i++){
        orig[i] = '.';
    }
    orig[i] = '\0';
    printf("%s\n", orig);
}
If you do not know what \0 is, I strongly recommend checking an ASCII table (https://www.asciitable.com/).
Good luck
printf takes a pointer to the start of a memory region (an array in this case) and prints until it encounters a \0 character. Strings of this type are called null-terminated strings.
So please add a \0 at the end, and fill in characters only up to index (size of array - 2), like this:
#include <stdio.h>
int main(void) {
    int i, t;
    char orig[5];
    for (i = 0; i < 4; i++) { //less than size of array - 1
        orig[i] = '.';
    }
    orig[i] = '\0';
    printf("%s\n", orig);
}

Does a C for-loop exit on encountering a '\0' character?

I was going through a hash function and came across a for loop that is supposed to exit when a '\0' (NUL) character is reached.
unsigned int hash_string (const char *s)
{
    register unsigned int i;
    for (i = 0; *s; s++) { // This for loop is supposed to end
                           // when a '\0' comes?
        i *= 16777619;
        i ^= *s;
    }
    return i;
}
As far as I know, a C loop is supposed to end when its condition evaluates to 0.
Here, however, there is no explicit comparison, and it still works?
Could someone also explain under what conditions a loop continues or ends?
The null character has the value of 0, so in your example, *s will evaluate to zero if it corresponds to the null termination of the character string.
From the C standard, 5.2.1 Character sets:
... A byte with all bits set to 0, called the null character, shall
exist in the basic execution character set; it is used to terminate a
character string.
Then in 6.4.4.4 Character constants
EXAMPLE 1 (paragraph 12): The construction '\0' is commonly used to represent the null character.
*s dereferences s, yielding the character it points to.
If the character code is 0 the loop ends; it continues for all other values.
'\0' is guaranteed to be 0; that is why the loop is guaranteed to terminate at the end of the string, when it encounters the NUL character.
One of the reasons for choosing \0 as the string terminator in C is to make constructs like this possible.
When *s evaluates to 0 (or false; the two convert to each other), the loop ends.
In fact, the integer representation for character \0 is 0. So it's the same thing.

sizeof() showing different output

Here is a snippet of C99 code:
int main(void)
{
char c[] = "\0";
printf("%d %d\n", sizeof(c), strlen(c));
return 0;
}
The program outputs 2 0. I do not understand why sizeof(c) is 2, seeing as I defined c to be a string literal that is immediately null-terminated. Can someone explain why this is the case? Can you also provide some resources where I can investigate this phenomenon further on my own time?
A string literal has an implicit terminating null character, so c[] actually holds \0\0 and its size is two. From section 6.4.5 String literals of the C99 standard (draft N1124), paragraph 5:
In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals
As for strlen(), it stops counting when it encounters the first null character. The value it returns is unrelated to the size of the array containing the string. In the case of c[], it returns zero because the first character in the array is a null terminator.
In C, "" means: give me a string and null terminate it for me.
For example arr[] = "A" is completely equivalent to arr[] = {'A', '\0'};
Thus "\0" means: give me a string containing a null character, then null-terminate it for me.
arr[] = "\0" is equivalent to arr[] = {'\0', '\0'};
"\0" is not the same as "". String literals are null-terminated, so the first is equivalent to the compound literal (char[]){ 0, 0 }, whereas the second is just (char[]){ 0 }. strlen finds the first character to be zero, so it assumes the string ends there. That doesn't mean the data ends.
When you initialize an array from a string literal:
char c[]="\0";
the literal already has a '\0' character at the end, so sizeof(c) gives 2: the array actually holds \0\0.
strlen(c) still gives 0 because it stops at the first \0.
strlen measures up to the first \0 and returns the number of characters before it, so the answer is zero.
sizeof on an array such as char x[] gives the storage used in bytes, which here is two, including the implicit \0 appended at the end of the literal.
Great question. Consider this ...
ubuntu#amrith:/tmp$ more x.c
#include <stdio.h>
#include <string.h>
int main() {
    char c[16] = "";   /* initialized: strlen on an uninitialized array is undefined behavior */
    printf("%zu %zu\n",sizeof(c),strlen(c));
    return 0;
}
ubuntu#amrith:/tmp$ ./x
16 0
ubuntu#amrith:/tmp$
Consider also this:
ubuntu#amrith:/tmp$ more x.c
#include <stdio.h>
#include <string.h>
int main() {
    int c[16];
    printf("%zu\n",sizeof(c));
    return 0;
}
ubuntu#amrith:/tmp$ ./x
64
ubuntu#amrith:/tmp$
When c is declared as an array (which is what c[] is), sizeof(c) gives the total size of the array in bytes: 16 for char c[16], and 16 * sizeof(int) for int c[16], which is 64 on this platform where int is 4 bytes.
The literal "\0" consists of two NUL bytes (the explicit one plus the implicit terminator), so it takes two bytes.
On the other hand, strlen() computes the string length, which is the offset of the first null character, and that turns out to be zero; hence you get 2 0.

Why do we need to add a '\0' (null) at the end of a character array in C?

I've read about it in K&R, 2nd edition (section 1.9, Character Arrays). The code in the book to find the longest input line is as follows:
#include <stdio.h>
#define MAXLINE 1000
int readline(char line[], int maxline);
void copy(char to[], char from[]);
int main(void) {
    int len;
    int max;
    char line[MAXLINE];
    char longest[MAXLINE];
    max = 0;
    while ((len = readline(line, MAXLINE)) > 0)
        if (len > max) {
            max = len;
            copy(longest, line);
        }
    if (max > 0)
        printf("%s", longest);
    return 0;
}
int readline(char s[], int lim) {
    int c, i;
    for (i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0'; //WHY DO WE DO THIS???
    return i;
}
void copy(char to[], char from[]) {
    int i;
    i = 0;
    while ((to[i] = from[i]) != '\0')
        ++i;
}
My question is: why do we set the last element of the character array to '\0'?
The program works fine without it...
Please help me...
You need to end C strings with '\0' since this is how the library knows where the string ends (and, in your case, this is what the copy() function expects).
The program works fine without it...
Without it, your program has undefined behaviour. If the program happens to do what you expect it to do, you are just lucky (or, rather, unlucky since in the real world the undefined behaviour will choose to manifest itself in the most inconvenient circumstances).
In C, "string" means a null-terminated array of characters. Compare this with a Pascal string, which means at most 255 characters preceded by a byte indicating the length of the string (but requiring no termination).
Each approach has its pros and cons.
In particular, when all you have is a pointer to an array of characters whose length is not known, the null terminator is the only way to determine the length of the string.
Because C defines a string as contiguous sequence of characters terminated by and including the first null character.
Basically the authors of C had the choice to define a string as a sequence of characters + the length of string or to use a magic marker to delimit the end of the string.
For more information on the subject I suggest to read this article:
"The Most Expensive One-byte Mistake" by Poul-Henning Kamp
http://queue.acm.org/detail.cfm?id=2010365
You have actually written the answer yourself right here:
void copy(char to[], char from[]) {
    int i;
    i = 0;
    while ((to[i] = from[i]) != '\0')
        ++i;
}
The loop in this function continues until it encounters a '\0' in the array from. Without a terminating zero, the loop continues for an unknown number of steps, until it happens to encounter a zero or an invalid memory region.
Strictly speaking, you do not need to end a character array with \0. It is a C string (usually handled through a char *) that needs to be terminated by it.
If you want to treat an array as a string, you have to put a \0 right after the last character that belongs to the string.
In other words, you need a \0 inside the array if you want to address it as a char * and plan to use string functions on it.
'\0' in the array indicates the end of the string, which means any character after it is not considered part of the string. It doesn't mean those characters are not part of the character array: we can still access them by indexing, but string functions called on the array ignore them.
For a string to be in proper format and to work with the string functions, it must be a null-terminated character array. Without the null terminator, calling string functions on the character array is undefined behavior. Even if we get the expected results most of the time, it is still undefined behavior.
I've just looked it up.
If your array is filled as a string,
which means like this: char array[MAX]="string";
or like this: scanf("%s",array);
then the null character '\0' is appended automatically after the last character.
(A bare pointer such as char* table; holds no characters at all until it points at a string.)
But if you fill it character by character with the %c format:
for(int i=0;i<MAX;i++)
    scanf("%c",&array[i]);
or with getchar() instead of scanf("%c",...),
then you have to add '\0' yourself.
In that case the array is treated like an array of any other type (int, float...): if it is a local array without an initializer, the elements you have not written hold indeterminate values, not '\0'.
If you initialize it with a brace list such as char array[MAX]={'n','o','t','s','t','r'};, the elements you do not list are set to 0, so there is a '\0' after the last character only if MAX is greater than the number of listed characters.
For example, the size of char array[]="12345" is 6 (while strlen gives 5),
and array[5]=='\0' evaluates to 1.
In C++ you can't define char array[3]="123"; because there is no room for the '\0' that is appended automatically. C accepts it, but the array then holds no terminator and is not a string.
Last example: char array[7]={'t','e','s','t','\0'};
Here array[4] is the null character, and array[5] and array[6] are also 0, because omitted initializers are zero-filled. The same holds for char array[7]="test";.
Only '\0' is the null character; whitespace characters such as ' ', '\t' and '\n' are ordinary, non-zero characters.
Note: array[7] is out of bounds, so neither writing nor reading it is allowed; doing so is undefined behavior and may show any value.
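It is the string-terminating character. When a string function encounters it at run time, it knows the string has ended. (The compiler is not involved in this check; it only appends '\0' to string literals.)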

How to print a string with embedded nulls so that "(null)" is substituted for '\0'

I have a string I composed using memcpy() that (when expanded) looks like this:
char* str = "AAAA\x00\x00\x00...\x11\x11\x11\x11\x00\x00...";
I would like to print every character in the string, and if the character is null, print out "(null)" as a substitute for '\0'.
If I use a function like puts() or printf() it will just end at the first null and print out
AAAA
So how can I get it to print out the actual word "(null)" without it interpreting it as the end of the string?
You have to do that mapping yourself, if you want to. In C, strings are null-terminated, so if you ask an output function such as printf (via the %s format specifier) or puts to print a string, it stops printing str as soon as it hits the first null. C has no built-in way to print a word such as "(null)" in place of a zero byte. If you know exactly how many characters are in str, you can loop over them and print each character individually, substituting your chosen mnemonic for the 0.
The draft says 7.21.6.1/8:
p: The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.
However, the following:
$ cat null.c
#include <stdio.h>
int main() {
printf("%p\n", (void *)0);
}
produces:
00000000
on both gcc 4.6 and clang 3.2.
However, on digging deeper:
$ cat null.c
#include <stdio.h>
int main() {
printf("%s\n", (void *)0);
}
does indeed produce the desired output:
(null)
on both gcc and clang.
Note that the standard does not mandate this:
s: If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many bytes are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
Relying on this behavior may lead to surprises!
Instead of printing the string with %s, you will have to write a for loop that checks whether each char in your array is '\0' and, if so, prints "(null)" instead.
From C++ Reference on puts() (emphasis mine):
Writes the C string pointed by str to stdout and appends a newline
character ('\n'). The function begins copying from the address
specified (str) until it reaches the terminating null character
('\0'). This final null-character is not copied to stdout.
To process data such as you have, you'll need to know the length. From there, you can simply loop across the characters:
/* ugly example */
char* str = "AAAA\x00\x00\x00...\x11\x11\x11\x11\x00\x00...";
int len = ...; /* get the len somehow or know ahead of time */
for (int i = 0; i < len; ++i) {
    if ('\0' == str[i]) {
        printf(" (null) ");
    } else {
        printf(" %c ", str[i]);
    }
}
One of the key cornerstones of C is strings are terminated by '\0'. Everyone lives by that rule. so I suggest you not think of your string as a string but as an array of characters.
If you traverse the array and test for '\0', you can print "(null)" in place of the character. Here is an example. Note that your char * str was created either as a char array or on the heap using malloc; this code needs to know the actual buffer size.
char* str = "AAAA\x00\x00\x00...\x11\x11\x11\x11\x00\x00...";
int iStrSz = <str's actual buffer size>;
int idx;
for (idx = 0; idx < iStrSz; idx++)
{
    if ('\0' == *(str + idx))
    {
        printf("%s", "(null)");
    }
    else
    {
        putchar(*(str + idx));
    }
}
printf("%s", "\n");

Resources