How to understand strings with pointers - c

I have been studying the C language for the last few months. I'm using a book and I have this exercise:
char vector[N_STRINGS][20] = {"ola", "antonio", "susana"};
char (*ptr)[20] = vector;
char *p;
while(ptr-vector<N_STRINGS)
{
p = *ptr;
while(*p)
putchar(*p++);
putchar('\n');
ptr++;
}
I understand everything except the while(*p)! I can't figure out what the while(*p) is doing.

The variable p in your code is defined as a pointer to a char. To get the actual char value that p points to, you need to dereference the pointer, using the * operator.
So, the expression in your while loop, *p, evaluates (at the beginning of each iteration) to the char that p is currently pointing to. Inside the loop, the putchar call also dereferences the pointer, and the ++ operator then increments it, so after that character has been sent to the output, p points to the next character in the string.
Conventionally (in fact, virtually always), character strings in C are NUL-terminated, meaning that the end of the string is signalled by having a character with the value of zero at the end of the string.
When the while loop in your code reaches this NUL terminator, the value of the expression *p will thus be ZERO. And, as ZERO is equivalent to a logical "false" in C (any non-zero value is considered "true"), the while loop will end.
Feel free to ask for further clarification and/or explanation.

From the C Standard (6.8.5 Iteration statements)
4 An iteration statement causes a statement called the loop body to be
executed repeatedly until the controlling expression compares equal to
0.
In this part of the program
p = *ptr;
while(*p)
//…
the pointer p points to the first character of the current string. A string in C is a sequence of characters terminated by the zero character '\0'.
So suppose, for example, that the pointer initially points to the first character of the string "ola". The string is represented in the corresponding character array like
{ 'o', 'l', 'a', '\0' }
The condition in the loop
while(*p)
may be rewritten like
while(*p != 0 )
So the loop body is executed for every character of the string except the terminating zero character, and the first three characters of the string are output.
Note also (6.5.9 Equality operators):
3 The == (equal to) and != (not equal to) operators are analogous to
the relational operators except for their lower precedence. Each
of the operators yields 1 if the specified relation is true and 0 if it
is false. The result has type int. For any pair of operands, exactly
one of the relations is true.

Related

Is this undefined behaviour ( working with string literal)

#include<stdio.h>
int main()
{
char *s = "Abc";
while(*s)
printf("%c", *s++);
return 0;
}
I have seen this (on a site) as a correct code but I feel this is undefined behavior.
My reasoning:
Here s stores the address of the string literal Abc. So while traversing through the while loop :
Iteration - 1:
Here *(s++) increments the address stored in s by 1 and returns the non-incremented address (i.e the previous/original value of s). So, no problem everything works fine and Abc is printed.
Iteration - 2:
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
Any help would be really appreciated!
No. There's no undefined behaviour here.
*s++ is evaluated as *(s++) because the postfix increment operator has higher precedence than the dereference operator. So the loop simply iterates over the string, prints the bytes, and stops when it sees the null byte.
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
No. In the first iteration s points to the char A, in the next to b, and in the next to c. The loop terminates when s reaches the null byte at the end of the string (i.e. *s is 0).
Basically, there's no modification of the string literal. The loop is functionally equivalent to:
while(*s) {
printf("%c", *s);
s++;
}
Iteration - 1:
Here *(s++) increments the address stored in s by 1 and returns the non-incremented address (i.e the previous/original value of s). So, no problem everything works fine and Abc is printed.
No, “Abc” is not printed. %c tells printf to expect a character value and print that. It prints a single character, not a string. Initially, s points to the first character of "Abc". s++ increments it to point to the next character.
Iteration - 2:
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
In iteration 2, s is pointing to “b”.
You may have been thinking of some char **p for which *p had been set to a pointer to "abc". In that case, incrementing p would change it to point to a different pointer (or to uncontrolled memory), and there would be a problem. That is not the case; for char *s, s points to a single character, and incrementing it adjusts it to point to the next character.
Now s points to a completely different address
Indeed, it is a completely different but well-defined address. s now references the next char of the string literal; the increment just adds 1 to the pointer.
Because the string literal is NUL (zero) terminated, the while loop stops when s reaches the terminator.
There is no UB.

Output of the following

Code snippet
#include <stdio.h>
int main(void){
printf(5 + "GeeksQuiz");
return 0;
}
The output is Quiz.
Can you tell me how this output is produced?
What's the logic behind it?
Addition is commutative. a + b is equal to b + a.
Adding an integer to a pointer follows pointer arithmetic: it advances the pointer by that many elements, not bytes. So, on typical implementations, (int*)a + b refers to the same address as (int*)((uintptr_t)a + b * sizeof(int)).
sizeof(char) is always equal to 1.
"GeeksQuiz" is a string literal. strlen("GeeksQuiz") is equal to 9. Accounting for the string terminating null byte, the type of the literal is char[10]. It's an array of 10 characters with the content {'G','e','e','k','s','Q','u','i','z','\0'}.
The C rules say that an array of type T is converted into a pointer to the first element of that array in most contexts. That happens here: "GeeksQuiz", of type char[10], is converted into a char* pointer to the first character 'G' in the string.
5 + "GeeksQuiz": "GeeksQuiz" is converted to a pointer to the first character. Then that pointer is incremented by 5. So the result of 5 + "GeeksQuiz" is a char* pointer that points to the character 'Q' inside the string literal.
printf prints the null-terminated string passed to it as the first argument, except for conversion specifications that start with %, which do not occur here.
printf is therefore passed a pointer that points to the letter 'Q' inside the "GeeksQuiz" string literal.
printf advances through the string until it finds the terminating null byte. So it prints {'Q','u','i','z'}, because the null byte follows the z character.

Post increment with pointers in while loop in C

I am writing the strcat function
/*appends source string to destination string*/
#include <stdio.h>
int main()
{
char srcstr[100], deststr[100];
char *psrcstr, *pdeststr;
printf("\n Enter source string: ");
gets(srcstr);
printf("\n Enter destination string: ");
gets(deststr);
pdeststr = deststr;
psrcstr = srcstr;
while(*pdeststr++)
;
while(*pdeststr++ = *psrcstr++)
;
printf("%s", deststr);
return 0;
}
For srcstr = " world" and deststr = "hello" I get hello, when I expect to see hello world, which is what I see if I change the first while to
while(*pdeststr)
pdeststr++;
Why can't I write it all in one line in the first while, just like in the second while?
Your one line loop
while(*pdeststr++);
Is equivalent to
while(*pdeststr)
pdeststr++;
pdeststr++;
This is because the post-increment yields the old value of pdeststr for the test, but the increment itself always takes place, even on the final iteration when the test fails.
So you could cater for this with
while(*pdeststr++);
pdeststr--;
Mandatory introduction: do not use gets(); use fgets() instead.
Your problem is here:
while(*pdeststr++)
;
The side effect of incrementing is carried out in your last iteration step (when pdeststr points to the NUL terminator), so after this loop, pdeststr points one after your NUL terminator. Write it like this instead:
while(*pdeststr) ++pdeststr;
The boolean value for the while condition is computed before the ++ post-increment.
So when your while loop exits, the post-increment operator is executed one last time, hence pdeststr is pointing right after the null terminator char that follows the word "hello".
Then the rest of the program appends more data after that null char. You end up with the string "hello\0 world\0". printf thinks the string ends at the first null char it encounters.
You have an extra increment that leaves the pointer just past the NUL character, so in the end only the first string is printed.
The postfix increment operator (ptr++) has higher precedence than the indirection (dereference) operator (*ptr). Therefore
while(*pdeststr++);
is parsed as while(*(pdeststr++)): it always increments pdeststr, and the test uses the character it pointed to before the increment. As a result, when that character is 0, pdeststr already points to the next element, so a null terminator ('\0') remains between your concatenated words.
As a one-liner solution with while loop you can use the short-circuit evaluation as follows:
while(*pdeststr && pdeststr++);
The code snippet above stops when *pdeststr evaluates to 0, and in that case it does not evaluate the pdeststr++ part.
My conclusion is that at the end, deststr = "hello\0 world\0", and because of that printf("%s", deststr); finds the first \0 and gives hello as output.

How does c compare character variable against string?

The following code is completely ok in C but not in C++. In the following code the if statement is always false. How does C compare a character variable against a string?
int main()
{
char ch='a';
if(ch=="a")
printf("confusion");
return 0;
}
The following code is completely ok in C
No, not at all.
In your code
if(ch=="a")
is essentially trying to compare the value of ch with the base address of the string literal "a". This is both meaningless and useless.
What you want here, is to use single quotes (') to denote a char literal, like
if(ch == 'a')
NOTE 1:
To elaborate on the difference between single quotes for char literals and double quotes for string literals:
For char literal, C11, chapter §6.4.4.4
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'
and, for string literal, chapter §6.4.5
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz".
NOTE 2:
That said, as a note, the recommended signature of main() is int main(void).
I wouldn't say the code is okay in either language.
'a' is a single character. It is actually a small integer, having as its value the value of the given character in the machine's character set (almost invariably ASCII). So 'a' has the value 97, as you can see by running
char c = 'a';
printf("%d\n", c);
"a", on the other hand, is a string. It is an array of characters, terminated by a null character. In C, arrays are almost always referred to by pointers to their first element, so in this case the string constant "a" acts like a pointer to an array of two characters, 'a' and the terminating '\0'. You could see that by running
char *str = "a";
printf("%d %d\n", str[0], str[1]);
This will print
97 0
Now, we don't know where in memory the compiler will choose to put our string, so we don't know what the value of the pointer will be, but it's safe to say that it will never be equal to 97. So the comparison if(ch=="a") will always be false.
When you need to compare a character and a string, you have two choices. You can compare the character to the first character of the string:
if(c == str[0])
printf("they are equal\n");
else printf("confusion\n");
Or you can construct a string from the character, and compare that. In C, that might look like this:
char tmpstr[2];
tmpstr[0] = c;
tmpstr[1] = '\0';
if(strcmp(tmpstr, str) == 0)
printf("they are equal\n");
else printf("confusion\n");
That's the answer for C. There's a different, more powerful string type in C++, so things would be different in that language.
There is a difference between 'a' (a character) and "a" (a string having two characters, a and \0). The ch=="a" comparison will evaluate to false because in this expression "a" is converted to a pointer to its first element, and that address is a pointer value, not a character code.
Change it to
if(ch=='a')

Printing an array of characters with "while"

So here is the working version:
#include <stdio.h>
int main(int argc, char const *argv[]) {
char myText[] = "hello world\n";
int counter = 0;
while(myText[counter]) {
printf("%c", myText[counter]);
counter++;
}
}
and in action:
Korays-MacBook-Pro:~ koraytugay$ gcc koray.c
Korays-MacBook-Pro:~ koraytugay$ ./a.out
hello world
My question is, why is this code even working? When (or how) does
while(myText[counter])
evaluate to false?
These 2 work as well:
while(myText[counter] != '\0')
while(myText[counter] != 0)
This one prints garbage in the console:
while(myText[counter] != EOF)
and this does not even compile:
while(myText[counter] != NULL)
I can see why the '\0' works, as C puts this character at the end of my array in compile time. But why does not NULL work? How is 0 == '\0'?
As for your last question,
But why does not NULL work?
Usually, NULL is a pointer type. Here, myText[counter] is a value of type char. As per the conditions for using the == operator, from C11 standard, chapter 6.5.9,
One of the following shall hold:
both operands have arithmetic type;
both operands are pointers to qualified or unqualified versions of compatible types;
one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or
one operand is a pointer and the other is a null pointer constant.
So, it tells us that a null pointer constant (NULL) can only be compared with a pointer type.
After that,
When (or how) does while(myText[counter]) evaluate to false?
Easy, when myText[counter] has got a value of 0.
To elaborate, after the initialization, myText holds the character values used to initialize it, with a "null" character at the end. The "null" is how C marks the end of a string, and it is represented by the value 0. So when the end of the string is reached, the while() condition is false.
Additional explanation:
while(myText[counter] != '\0') works, because '\0' is the representation of the "null" used as the string terminator.
while(myText[counter] != 0) works, because 0 is the decimal value of '\0'.
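Both statements above are equivalent to while(myText[counter]).
while(myText[counter] != EOF) does not work because the null terminator is not equal to EOF, so the loop does not stop at the end of the string and keeps reading past the array.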
Reference: C11 standard, chapter 6.3.2.3, Pointers, Paragraph 3
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.
and, from chapter, 7.19,
NULL
which expands to an implementation-defined null pointer constant
Note: In the end, this question and related answers will clear the confusion, should you have any.
In C, any non-zero value evaluates to true. C strings are null-terminated. That is, there is a special zero-value character after the last character in the string.
And so when the null terminator is reached, the zero value evaluates to false, and the loop terminates.
I can see why the '\0' works, as C puts this character at the end of my array in compile time. But why does not NULL work? How is 0 == '\0'?
0 has the same value as '\0' because '\0' is a character with the value zero. (Not to be confused with '0', which is the zero digit and has a value of 48.)
Regarding NULL, that actually can work since it also evaluates to zero. However, NULL is a pointer type so you may have to type cast to avoid errors. (Hard to say for certain since you didn't post the error that you got.)
When (or how) does this work?
while(myText[counter])
Any built-in with value zero will evaluate to false in a boolean context. So this while(myText[counter]) is false when myText[counter] is '\0', which has the value 0.
How is 0 == '\0'?
It is defined that way in the language. '\0' is an int literal with value zero. 0 is also an int literal with value zero, so both compare equal, and they both evaluate to false in a boolean context
How is 0 == '\0'?
Every character has a numeric value; for example, 'a' is 97 (decimal) in ASCII. The backslash in the character literal '\0' introduces an "escape" that specifies the character directly through its numeric value (written in octal). In this case, the numeric value is 0.
The termination of a string is \0
NULL is used to initialise a pointer to a determined value
while(myText[counter]) evaluates to false as soon as counter points to the zero byte.
In the end, there is nothing more than a zero byte at the end of the string. NULL also represents zero, but by convention it is used only in the context of pointers.
If something is not 100% clear from a coding perspective, you might want to look in your debugger's watch window to see what the bits and bytes actually are during program execution.
