Printing an array of characters with "while" - c

So here is the working version:
#include <stdio.h>
int main(int argc, char const *argv[]) {
    char myText[] = "hello world\n";
    int counter = 0;
    while(myText[counter]) {
        printf("%c", myText[counter]);
        counter++;
    }
}
and in action:
Korays-MacBook-Pro:~ koraytugay$ gcc koray.c
Korays-MacBook-Pro:~ koraytugay$ ./a.out
hello world
My question is, why is this code even working? When (or how) does
while(myText[counter])
evaluate to false?
These 2 work as well:
while(myText[counter] != '\0')
while(myText[counter] != 0)
This one prints garbage in the console:
while(myText[counter] != EOF)
and this does not even compile:
while(myText[counter] != NULL)
I can see why the '\0' works, as C puts this character at the end of my array in compile time. But why does not NULL work? How is 0 == '\0'?

As for your last question,
But why does not NULL work?
Usually, NULL expands to an expression of pointer type, such as ((void *)0). Here, myText[counter] is a value of type char. The constraints on the == and != operators, from the C11 standard, chapter 6.5.9, are:
One of the following shall hold:
both operands have arithmetic type;
both operands are pointers to qualified or unqualified versions of compatible types;
one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or
one operand is a pointer and the other is a null pointer constant.
In other words, a null pointer constant (NULL) can be compared only with a pointer. When NULL is defined with pointer type, comparing it with a char violates these constraints.
After that,
When (or how) does while(myText[counter]) evaluate to false?
Easy: when myText[counter] has a value of 0.
To elaborate, after the initialization, myText holds the character values used to initialize it, with a "null" at the end. The "null" is how C marks the end of a string. The null is represented by a value of 0, so when the end of the string is reached, the while() condition is false.
Additional explanation:
while(myText[counter] != '\0') works, because '\0' is the representation of the "null" used as the string terminator.
while(myText[counter] != 0) works, because 0 is the decimal value of '\0'.
Both statements above are equivalent to while(myText[counter]).
while(myText[counter] != EOF) does not work because a null is not an EOF.
Reference: C11 standard, chapter 6.3.2.3, Pointers, Paragraph 3
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.
and, from chapter, 7.19,
NULL
which expands to an implementation-defined null pointer constant
Note: In the end, this question and related answers will clear the confusion, should you have any.

In C, any non-zero value evaluates to true. C strings are null-terminated. That is, there is a special zero-value character after the last character in the string.
And so when the null terminator is reached, the zero value evaluates to false, and the loop terminates.
I can see why the '\0' works, as C puts this character at the end of my array in compile time. But why does not NULL work? How is 0 == '\0'?
0 has the same value as '\0' because '\0' is a character with the value zero. (Not to be confused with '0', which is the zero digit and has a value of 48 in ASCII.)
Regarding NULL, that actually can work since it also evaluates to zero. However, NULL is a pointer type so you may have to type cast to avoid errors. (Hard to say for certain since you didn't post the error that you got.)

When (or how) does this work?
while(myText[counter])
Any built-in with value zero will evaluate to false in a boolean context. So this while(myText[counter]) is false when myText[counter] is '\0', which has the value 0.
How is 0 == '\0'?
It is defined that way in the language. '\0' is a character constant, which in C has type int, with value zero. 0 is an int constant with value zero, so both compare equal, and they both evaluate to false in a boolean context.

How is 0 == '\0'?
In the usual case where a char is 8 bits wide, every character has an 8-bit numeric value; for example, 'a' is 97 (decimal) in ASCII. The backslash in the character literal '\0' introduces an "escape" that specifies the character directly through its numeric value, written in octal. In this case, the numeric value is 0.

The termination of a string is \0
NULL is used to initialise a pointer to a determined value
while(myText[counter]) evaluates to false as soon as counter points to the zero byte.
In the end there is nothing different than a zero byte at the end of the string. Actually NULL would mean the same but it is used for notation purposes only in the context of pointers.
If something is not 100% clear from a coding perspective, you might want to look inside your debugger watch window, what are the bits and bytes actually during program execution.

Related

Is this undefined behaviour ( working with string literal)

#include<stdio.h>
int main()
{
    char *s = "Abc";
    while(*s)
        printf("%c", *s++);
    return 0;
}
I have seen this (on a site) as a correct code but I feel this is undefined behavior.
My reasoning:
Here s stores the address of the string literal Abc. So while traversing through the while loop :
Iteration - 1:
Here *(s++) increments the address stored in s by 1 and returns the non-incremented address (i.e the previous/original value of s). So, no problem everything works fine and Abc is printed.
Iteration - 2:
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
Any help would be really appreciated!
No. There's no undefined behaviour here.
*s++ is evaluated as *(s++) because the postfix increment operator has higher precedence than the dereference operator. So the loop simply iterates over the string, prints the bytes, and stops when it sees the null byte.
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
No. In the first iteration s points to the char A, in the next to b, and in the next to c. The loop terminates when s reaches the null byte at the end of the string (i.e. *s is 0).
Basically, there's no modification of the string literal. The loop is functionally equivalent to:
while(*s) {
    printf("%c", *s);
    s++;
}
Iteration - 1:
Here *(s++) increments the address stored in s by 1 and returns the non-incremented address (i.e the previous/original value of s). So, no problem everything works fine and Abc is printed.
No, “Abc” is not printed. %c tells printf to expect a character value and print that. It prints a single character, not a string. Initially, s points to the first character of "Abc". s++ increments it to point to the next character.
Iteration - 2:
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
In iteration 2, s is pointing to “b”.
You may have been thinking of some char **p for which *p had been set to a pointer to "abc". In that case, incrementing p would change it to point to a different pointer (or to uncontrolled memory), and there would be a problem. That is not the case; for char *s, s points to a single character, and incrementing it adjusts it to point to the next character.
Now s points to a completely different address
Indeed, it is a completely different but well-defined address: s now references the next char of the string literal, because the increment just adds 1 to the pointer.
Because a string literal is nul (zero) terminated, the while loop stops when s references the terminator.
There is no UB.

How to understand strings with pointers

I have been studying the C language for the last few months. I'm using a book and I have this exercise:
char vector[N_STRINGS][20] = {"ola", "antonio", "susana"};
char (*ptr)[20] = vector;
char *p;
while(ptr-vector<N_STRINGS)
{
    p = *ptr;
    while(*p)
        putchar(*p++);
    putchar('\n');
    ptr++;
}
I understand everything except the while(*p)! I can't figure out what the while(*p) is doing.
The variable p in your code is defined as a pointer to a char. To get the actual char value that p points to, you need to dereference the pointer, using the * operator.
So, the expression in your while loop, *p evaluates - at the beginning of each loop - to the char variable that p is currently pointing to. Inside the loop, the putchar call also uses this dereference operator but then increments the pointer's value so, after sending that character to the output, the pointer is incremented (the ++ operator) and it then points to the next character in the string.
Conventionally (in fact, virtually always), character strings in C are NUL-terminated, meaning that the end of the string is signalled by having a character with the value of zero at the end of the string.
When the while loop in your code reaches this NUL terminator, the value of the expression *p will thus be ZERO. And, as ZERO is equivalent to a logical "false" in C (any non-zero value is considered "true"), the while loop will end.
Feel free to ask for further clarification and/or explanation.
From the C Standard (6.8.5 Iteration statements)
4 An iteration statement causes a statement called the loop body to be
executed repeatedly until the controlling expression compares equal to
0.
In this part of the program
p = *ptr;
while(*p)
//…
the pointer p points to the first character of a current string. String in C is a sequence of characters terminated by the zero character '\0'.
So let's say, for example, that the pointer initially points to the first character of the string "ola". The string is represented in the corresponding character array like
{ 'o', 'l', 'a', '\0' }
The condition in the loop
while(*p)
may be rewritten like
while(*p != 0 )
So the loop will execute for every character of the string except the terminating zero character, and the first three characters of the string will be output.
Pay attention to that (6.5.9 Equality operators)
3 The == (equal to) and != (not equal to) operators are analogous to
the relational operators except for their lower precedence. Each
of the operators yields 1 if the specified relation is true and 0 if it
is false. The result has type int. For any pair of operands, exactly
one of the relations is true.

Why does the OR statement evaluate only one even if both are true? [duplicate]

Consider the following C code, according to what I have learnt the OR statement should evaluate both the printf. But in actual output I see only "XX". Why is this happening?
#include<stdio.h>
int main() {
    int a;
    a = (printf("XX")||printf("YY"));
    printf("%d\n",a);
    a = (printf("XX")&&printf("YY"));
    printf("%d\n",a);
}
Output -
XX1
XXYY1
OR operator output is true even if one condition is true. And in this case first printf statement returns true. So there is no need to evaluate the second operand of OR operator.
|| operator evaluates to true if any of the operands is true. So if first operand evaluates to true it doesn't check the second. It only checks second operand if first operand was false.
&& operator evaluates to true only when both the operands are true. It would check the second operand iff the first one is not false.
As stated in the manpage:
Upon successful return, these functions return the number of characters printed (excluding the null byte used to end output to strings).
The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated. (See also below under NOTES.)
Say you have a code like:
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("%d", printf(""));
    return 0;
}
This would print 0 just as explained in the manpage.
If your statement is:
printf("") && printf("XX");
It won't print anything because first operand evaluated to 0.
Whereas,
printf("") || printf("YY");
Would print YY.
In your case,
a = (printf("XX")||printf("YY"));
would evaluate printf("YY") only if the first printf returned 0, that is, if it printed no characters.
a = (printf("XX")&&printf("YY"));
would print XXYY, because the first printf returns a non-zero count, so the second operand is evaluated as well.
Say you're trying to answer this question "Is it raining or is it after 7 PM?". Once you see that it's raining, you know the answer is "yes". You have no need to check if it's after 7 PM or not.
a = (printf("XX")||printf("YY"));
You asked it an "OR" question. Once it evaluates the first printf, it knows that the answer to your question is yes, so it has no need to evaluate the second part.
The || (OR) operator doesn't evaluate its right part if the left part has been determined to be true, because it does not need to: TRUE || something is always true, even if a potential crash hides on the right. The left part is always evaluated first.
The return value of printf is not "false" in case of error: it returns a negative value, which still counts as true. The "false" value is 0, which occurs if you print an empty format string or put a '\0' at its beginning. Any value that is not 0 evaluates as "true".

Why adding a Null Character in a String array? [duplicate]

I know that we have to use a null character to terminate a string array like this:
char str[5] = { 'A','N','S','\0' };
But I just wanted to know why is it essential to use a null character to terminate an array like this?
Also why don't we add a null character to terminate these :-
char str1[5]="ANS";
Null termination is what differentiates a string (a null-terminated char array) from a plain char array in C. Most string-manipulating functions rely on the null character to know when the string has ended (and their job is done), and won't work with a plain char array (e.g. they'll keep working past the bounds of the array and continue until they find a zero byte somewhere in memory, often corrupting memory as they go).
In C, 0 (the integer value) is considered boolean false; all other values are considered true. if, for and while use 0 (false) or non-zero (true) to determine how to branch or whether to loop. char is an integer type, and the null character ('\0') is simply a char with the integer value 0, i.e. false. This makes it very simple to write functions for things like manipulating or copying strings, as they can safely loop as long as the character they are working on is non-zero (true) and stop when they encounter the null character (false), since this signifies the end of the string. The loops stay very simple: there is no explicit comparison, we just need to know whether the value is 0 (false) or not (true).
Example:
char source[]="Test"; // Actually: T e s t \0 ('\0' is the null character)
char dest[8];
int i=0;
char curr;
do {
    curr = source[i];
    dest[i] = curr;
    i++;
} while(curr); // Loops while the condition is true (non-zero); stops after copying the null character.
It isn't essential, but if you are using any of the standard libraries, they all expect it.

Can the null character be used to represent the zero character?

The C99 standard requires that "A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string." (5.2.1.2) It then goes on to list 99 other characters that must be in the execution set. Can a character set be used in which the null character is one of these 99 characters? In particular, is it allowed that '0' == '\0' ?
Edit: Everyone is pointing out that in ASCII, '0' is 0x30. This is true, but the standard doesn't mandate the use of ASCII.
No matter if you use ASCII, EBCDIC or something "self-crafted", '0' must be distinct from '\0', for the reason you mention yourself:
A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string. (5.2.1.2)
If the null character terminates a character string, it cannot be contained in that string. It is the only character which cannot be contained in a string; all other characters can be used and thus must be distinct from 0.
I don't think the standard states that each of the characters that it lists (including the null character) has a distinct value, other than that the digits do. But a "character set" containing a value 0 that allegedly represents 91 of the 100 required characters is clearly not really a character set containing the required 100 characters. So this is either:
part of the English-language definition of "a character set",
obvious from context,
a very minor flaw in the text of the standard, that it should spell it out to prevent wilful misinterpretation by a faithless implementer.
Take your pick.
If '0' == '\0' held, you would not be able to tell the end of a string apart from the '0' character.
A string like "0_any_string" would then be hard to use, since it would end at its very first character.
No, it can't. A character set must be described by an injective function, i.e. a function that maps each character to exactly one distinct binary value. Mapping two characters to the same value would make the character set ambiguous: the computer could not decode the value back to a single character, since more than one would fit.
The C99 standard poses another restriction by forcing the mapping of null character to a specific binary value. Given the above paragraph this means that no other character can have a value identical to null.
The integer constant literal 0 has different meanings depending upon
the context in which it's used. In all cases, it is still an integer
constant with the value 0, it is just described in different ways.
If a pointer is being compared to the constant literal 0, then this is
a check to see if the pointer is a null pointer. This 0 is then
referred to as a null pointer constant. The C standard defines that 0
cast to the type void * is both a null pointer and a null pointer
constant.
What is the difference between NULL, '\0' and 0

Resources