We are currently learning how pointers work in C.
I have this very short string-copy function in C that was given to us by a tutor. I tried to explain what it does in my own words, but I am unsure whether I have understood it correctly, and I would appreciate it if somebody could correct my mistakes and answer my questions about it.
void copy(char *source, char *dest) {
while (*dest++ = *source++);
}
"Copy is a function with 2 parameters, source and dest, which are both pointers of type char. The function calls a while statement which sets the dereferenced dest, incremented by 1 * sizeof(char), equal to the dereferenced and incremented (by 1 * sizeof(char)) source."
What exactly does the while statement do? From my understanding *dest means that I am getting the char which dest points to, is that correct? But why would a while statement only set 2 pointers equal to each other, I don't really get it.
I appreciate any help, thank you!
This is the terse way to write an implementation of strcpy.
It is equivalent to the longer, but easier to follow
while (*dest = *source) {
dest++;
source++;
}
*dest = *source copies one char from the source to the destination. The result is zero if a zero byte was copied, at which time the loop ends.
Adding the increment to the condition is possible because the pointer dereferenced is the pointer value before the increment.
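To check that the two forms copy the same bytes, here is a minimal sketch. The names copy_terse and copy_expanded are made up for this illustration, and both functions assume the destination buffer is large enough for the source plus its terminating zero byte.

```c
/* Terse form: copy, test the copied char, and advance, all in the condition.
   The extra parentheses only tell the compiler the assignment is intended. */
void copy_terse(const char *source, char *dest)
{
    while ((*dest++ = *source++))
        ;   /* empty body: all the work happens in the condition */
}

/* Expanded form: the same copy, with the increments moved into the body. */
void copy_expanded(const char *source, char *dest)
{
    while ((*dest = *source)) {   /* copy one char; stop once '\0' was copied */
        dest++;
        source++;
    }
}
```

Calling either one with a 16-byte buffer and the source "hello" should leave "hello" (including the terminator) in the buffer. The only difference is where the local pointers end up, which the caller never sees.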
First, you need to start evaluating the expressions from the inner to the outer side. An important thing is to take into consideration that the body of the loop is empty (the ; at the right parenthesis indicates that the loop is executing the null statement --do nothing--, so everything must happen in the test expression).
The expression in the while test is not a test for equality, but an assignment. The equality operator is written by doubling the =, as in ==, while a single = assigns the value of the right subexpression to the variable on the left side.
The target on the left side is *dest++, a slightly complicated expression that combines the dereference operator * with the post-increment operator ++. The ++ has higher precedence, so it binds first: dest++ increments the pointer, but the value of that subexpression is the value the pointer had before the increment. The * operator is therefore applied to the old pointer value and designates the char that dest pointed to. So the assignment stores into the char pointed to by dest, and the pointer is incremented at some point before the whole expression finishes, without affecting which char is written.
The right side of the assignment shows what is going to be assigned. The expression has the same shape, so I will pass over it quickly: the value assigned is the char pointed to by source, and source is incremented once that value has been taken. After that, both pointers have been incremented, so the next time the test expression is evaluated, the next source character is assigned to the next destination character.
So when does the loop finish? The value of an assignment expression is precisely the value assigned to the target, which in this case is the character copied from source to dest. The test fails when the character copied happens to be zero (false in C terms), that is, when the string terminator has just been copied. As you can see, there is nothing left to put in the body of the loop.
This sample is famous as an example of how cryptic C can get. It appears in both editions of "The C Programming Language" by Kernighan & Ritchie (Ritchie being the designer of the language), accompanied by a comment that says something more or less like this: this is a bit obscure, but every programmer who is proud of being proficient at C must be capable of interpreting it with a bit of care. (In the book the pointer parameters have one-letter names, s and t, I think.)
I'm learning C and I came across the below code which implements concatenation but I am struggling to understand the second portion despite recapping pointer/increment precedence and associativity.
I have run examples of all the different combinations of dereferencing and post/pre-increment, and I now recall that pre-increment combined with dereferencing associates right to left, e.g. in *++q the increment occurs first. The page I learned from stated that with post-increment and the dereference operator the increment has higher precedence, and went on to say that the associativity of this example is left to right. I don't quite see why it mentions associativity, since as I understand it precedence is considered first, e.g. *p++ would increment before it dereferences.
The code below starts by recursively calling itself with a pre-increment on dest to get to the end of the destination string, which it detects by reaching and dereferencing the null byte. At this point I lose track, because I would have thought it would simply assign the dereferenced src to the dereferenced dest. Instead it post-increments dest, which, based on what I've learned, would move to the memory location after the null byte, dereference it, and proceed to assign the string "geeks" to the position after the null byte in dest. Not only does this confuse me, but the program goes on to call itself in another ternary operator / return 0 for seemingly no reason.
Thanks
#include <stdio.h>
/* my_strcat(dest, src) copies data of src to dest. To do so, it first reaches end of the string dest using recursive calls my_strcat(++dest, src). Once end of dest is reached, data is copied using
(*dest++ = *src++)? my_strcat(dest, src). */
void my_strcat(char *dest, char *src)
{
(*dest)? my_strcat(++dest, src): (*dest++ = *src++)? my_strcat(dest, src): 0 ;
}
/* driver function to test above function */
int main()
{
char dest[100] = "geeksfor";
char *src = "geeks";
my_strcat(dest, src);
printf(" %s ", dest);
getchar();
}
I ran the program and it does what it is expected to do, that is, it prints the string "geeksforgeeks", so clearly I'm just not understanding something.
OK, so you understand the first recursive bit fine (keep incrementing dest to find the end).
Once you've found the end, it's time to start copying. The code it's using to copy one byte is: (*dest++ = *src++)
It might help your understanding to expand that code to something like: *dest = *src; dest += 1; src += 1
(because they are post-increment operators)
Now, a normal 'copy' function would repeat that statement in a while or for loop, but because we are cleverly showing off, we use a second ternary operator and recursion. The final '0' doesn't really do anything, but it needs to be there as the "else" part of the ternary operator.
While this may not be the worst implementation of strcat ever written, it's certainly a strong contender.
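For comparison, here is a sketch of the "normal" loop-based version mentioned above. The name my_strcat_iter is hypothetical, and, like the recursive original, it assumes dest has room for both strings plus the terminating '\0'.

```c
/* Hypothetical loop-based equivalent of the recursive my_strcat.
   Assumes dest is large enough to hold dest + src + '\0'. */
void my_strcat_iter(char *dest, const char *src)
{
    while (*dest)                 /* step 1: walk to dest's terminating '\0' */
        ++dest;
    while ((*dest++ = *src++))    /* step 2: copy src, including its '\0' */
        ;
}
```

The first loop corresponds to the my_strcat(++dest, src) branch, and the second loop to the (*dest++ = *src++) ? my_strcat(dest, src) : 0 branch. Note that the first copied character overwrites the old terminator of dest, because the post-increment hands the old pointer value to the dereference.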
Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
//this is true. I want to know why?
Based on the above, what happens here?
#include<stdio.h>
void main() {
static int i;
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
}
if("whatiamreturning")
is equivalent to
if (1)
This is because "whatiamreturning" is an array of char that decays into a non-null char * inside the if(). Any non-null pointer evaluates to true in the context of a boolean expression.
The line
if(i+++"The Matrix")
can be simplified to:
if( (i++) + "The Matrix")
In the first iteration of the loop, the value of i is 0. Hence, the (i++) + "The Matrix" evaluates to "The Matrix".
In the second iteration of the loop, the value of i is 1. Hence, the (i++) + "The Matrix" evaluates to "he Matrix".
However, the loop never ends and goes into the territory of undefined behavior since (i++) + "The Matrix" never evaluates to 0 and the value of i keeps on increasing.
Perhaps they meant to use:
if(i++["The Matrix"])
which will allow the expression inside if() to be 0 after 10 iterations.
Update
If you are following somebody else's code, stay away from anything else that they have written. The main function can be cleaned up to:
int main() {
char name[] = "The Matrix";
int i = 0;
for( ; name[i] != '\0'; ++i )
{
printf("Memento\n");
}
}
if(i+++"The Matrix") // what is happening here please help here to understand
This will take the value of i, add the pointer value of the location of the string "The Matrix" in memory and compare it to zero. After that it will increase the value of i by one.
It's not very useful, since the pointer value could be basically any random number (it depends on architecture, OS, etc). And thus the whole program amounts to printing Memento a random number of times (likely the same number each run though).
Perhaps you meant to write if(*(i+++"The Matrix")). That would loop 10 times, until i+"The Matrix" evaluates to the address of the NUL byte at the end of the string and *(i+"The Matrix") thus yields 0.
Btw, spaces are a nice way to make your code more readable.
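As a quick check of the "10 times" claim for the dereferenced form, here is a small sketch. The function name count_true_iterations is made up for this example.

```c
/* Counts how many times the condition *(i++ + "The Matrix") is true,
   i.e. how many characters precede the terminating '\0'. */
int count_true_iterations(void)
{
    int i = 0;
    int count = 0;
    while (*(i++ + "The Matrix"))   /* same as "The Matrix"[i++] */
        ++count;
    return count;
}
```

The pointer never leaves the literal here: the last dereference reads the terminating NUL at index 10, which ends the loop.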
It will return the address of first element of the string whatiamreturning.
Basically when you assign a string literal to a char pointer
char *p;
p = "whatiamreturning";
the assignment doesn't copy the characters in whatiamreturning; instead it makes p point to the first character of the string, and that's why string literals can be subscripted
char ch = "whatiamreturning"[1];
ch will now hold the character h. This works because the compiler treats whatiamreturning as a char * and calculates the base address of the literal.
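The two points above (no characters are copied, and a literal can be subscripted directly) can be sketched as follows. The function names are hypothetical and only serve this illustration.

```c
/* Returns the character at index idx of the literal by subscripting the
   literal directly; idx must stay within the literal (0..16 here, where
   index 16 is the terminating '\0'). */
char literal_char_at(int idx)
{
    return "whatiamreturning"[idx];
}

/* Returns 1 if a pointer assigned from the literal points at its first char. */
int points_at_first_char(void)
{
    const char *p = "whatiamreturning";   /* no characters are copied */
    return *p == 'w' && p[1] == 'h';
}
```

The sketch uses const char * for p because writing through a pointer to a string literal is undefined behavior, even though the literal's type in C is a plain array of char.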
if(i+++"The Matrix") is equivalent to
if( i++ + "The Matrix")
or it can be rewritten as
if(&("The Matrix"[i++]))
which will be true for every i and results in an infinite loop. Ultimately, the code will suffer from undefined behavior due to integer overflow for variable i.
Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
The string literal "whatiamreturning" is a constant of type char[].
In nearly all contexts, including this one, arrays decay to pointers to their first element.
In a boolean context, like the condition of an if-statement, all non-zero values are true.
As the pointer points to an object, it is not the null-pointer, and thus is true.
Based on the above, what happens here?
#include<stdio.h>
void main() {
The above is your first instance of Undefined Behavior, whatever happens, it is right.
We will now pretend the error is corrected by substituting int for void.
Now, your loop:
static int i;
Static variables are default initialized, so i starts with value 0.
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
This loop has Undefined Behavior as well.
The condition takes i and adds it to the string literal "The Matrix", which decays to a pointer as in the previous example, interprets the resulting pointer in a boolean context, and as a side effect increments i.
As long as i is no more than strlen("The Matrix")+1 on entry, everything is ok, the pointer points to an element of the string literal or one past, and the standard guarantees that's not a null pointer.
The moment it is though, all hell breaks loose because calculating such a pointer is Undefined Behavior.
Well, now that we know the loop is UB, let's ignore the loop too.
The rest of the program is:
}
Which is ok, because even though main has a return type of int, there's a special rule which states that if control reaches the end of main without executing a return-statement, an implicit return 0; is added.
Side-note: If an execution of a program encounters Undefined Behavior anywhere, the whole program is undefined, not only from that point on:
Undefined behavior can result in time travel (among other things, but time travel is the funkiest)
About the expression statement(an example)
i = 1;
it is said that after assigning 1 to i the value of the entire expression is discarded. If the value is discarded, then how can it be used later in the program, for example
printf("%d",i);
?
I know this is very basic question but I am really confused with discarded.
The value of the expression is indeed discarded, but this expression has a side effect - it changes the value of i. So next time you will access this variable, you will read the new value, which is 1.
The term "discarded" is more helpful when you do things like foo(5); or even simply "hello";. Since the expression "hello" does not have any side effect, and its value is discarded, it does absolutely nothing. When a compiler encounters it as a stand-alone statement:
"hello";
it may simply ignore it altogether, as if it did not exist at all. The same happens when you use operators or call functions that have no side effects:
4+5;
sin(2.6);
These expressions, too, have no side effect, and their values are ignored. When you do something like
printf("hello");
This is an expression, too. Its value is the total number of characters written. This value is ignored. But the expression must not be completely ignored, since it has an important side effect: it prints these characters to the standard output.
So let's build a function instead of using the assignment operator (since C has no references, we'll use pointers):
int assign_int(int* var, int value) {
*var = value;
return *var;
}
now, back to your example, you do something like:
assign_int(&i, 1);
the value returned from assign_int is discarded. Just like in the printf() case. But since the function assign_int has a side effect (changing the value of i), it is not ignored by the compiler.
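To make the printf case concrete, here is a small sketch that calls printf once as a plain expression statement and once while keeping its value. The function name printf_value_demo is made up, and the sketch assumes standard output is writable (printf returns a negative value on an output error).

```c
#include <stdio.h>

/* Calls printf twice: once discarding its return value, once keeping it. */
int printf_value_demo(void)
{
    printf("hello");           /* value discarded; the output still happens */
    int n = printf("hello");   /* value kept: number of characters written */
    return n;
}
```

Both calls print the same five characters. The only difference is whether the program looks at the value of the expression afterwards.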
The important point is that i = 1 has two properties:
1. It changes the value stored in the variable i to be 1.
2. It is an expression and has a value (which is also 1).
That second part is interesting in a case like
if ( (i=1) == 2 ) { // ...
or
y = 3 + (i = 1); // assign 4 to y
The line
the value of entire expression is being discarded.
refers to the value of the expression (my #2), but does not affect assignment to variable i (my #1).
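Here is a minimal sketch of both properties side by side. The function name assignment_value_demo and its out-parameter are hypothetical, introduced only for this example.

```c
/* Returns the value of y after y = 3 + (i = 1), and reports the final
   value of i through i_out. */
int assignment_value_demo(int *i_out)
{
    int i = 0;
    int y = 3 + (i = 1);   /* the value of (i = 1) is used: y becomes 4 */
    i = 7;                 /* expression statement: the value 7 is discarded,
                              but the side effect (i is now 7) remains */
    *i_out = i;
    return y;
}
```

In the first statement the value of the assignment expression is consumed by the addition. In the second it is thrown away, yet a later read of i still sees 7.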
What does the code below do? I'm very confused about how it works, because I thought the loop would run until i reaches the limit of int. But I'm confused when I try to print the value of i. Please help me out with this.
#include<stdio.h>
void main()
{
static int i;
for (;;)
if (i+++"Apple")
printf("Banana");
else
break;
}
It is interpreted as i++ + "Apple". Since i is static and does not have an initializer, i++ yields 0. So the whole expression is 0 + some address or equivalent to if ("Apple").
EDIT
As Jonathan Leffler correctly notes in the comments, what I said above only applies to the first iteration. After that it will keep incrementing i and will keep printing "Banana".
I think at some point, due to overflows (if it doesn't crash) "Apple" + i will yield 0 and the loop will break. Again, I don't really know what a well-meaning compiler should do when one adds a pointer and a large number.
As Eric Postpischil commented, you can only advance the pointer until it points one past the allocated space. In your example, adding 6 advances the pointer one past the allocated space ("Apple\0"). Adding more is undefined behavior, and technically strange things can happen.
Use int main(void) instead of void main().
The expression i+++"Apple" is parsed as (i++) + "Apple"; the string literal "Apple" is converted from an expression of type "6-element array of char" to "pointer to char", and its value is the address of the first element of the array. The expression i++ evaluates to the current value of i, and as a side effect, the value in i is incremented by 1.
So, we're adding the result of the integer expression i++ to the pointer value resulting from the expression "Apple"; this gives us a new pointer value that's equal or greater than the address of "Apple". So assuming the address of the string literal "Apple" is 0x80123450, then basically we're evaluating the values
0x80123450 + 0
0x80123450 + 1
0x80123450 + 2
...
all of which should evaluate to non-zero, which causes the printf statement to be executed. The question is what happens when i++ results in an integer overflow (the behavior of which is not well defined) or the value of i+++"Apple" results in an overflow for a pointer value. It's not clear that i+++"Apple" will ever result in a 0-valued expression.
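The part of this that is well defined can be sketched as follows: for "Apple", the offsets 0 through 6 (the six elements plus one past the end) all give valid, non-null pointers. The function name count_valid_offsets is made up for this illustration, and the loop deliberately stops before the first offset whose computation would be undefined.

```c
#include <stddef.h>
#include <string.h>

/* Counts the offsets i for which "Apple" + i may legally be computed
   (each element, plus one past the end) and checks that none is null. */
int count_valid_offsets(void)
{
    const char *apple = "Apple";
    size_t n = strlen(apple) + 1;      /* 6 elements including '\0' */
    int valid = 0;
    for (size_t i = 0; i <= n; ++i)    /* i == n is the one-past-the-end pointer */
        if (apple + i)                 /* never null within these bounds */
            ++valid;
    return valid;
}
```

A compiler may warn that the condition is always true, which is exactly the point: within these bounds the test can never fail, and beyond them the behavior is undefined.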
This code should have been written like this:
char *apple = "Apple";
for(i = 0; apple[i++];)
printf("Banana");
It is clearer than the code posted in the original, and it is easier to see what it does. But I guess this came from "Look how bizarre we can write things in C". There are lots of things that are possible in C that aren't a great idea.
It is also possible to learn to balance a plate of hot food on your head for the purpose of serving yourself dinner. It doesn't make it a particularly great idea - unless you don't have hands and feet, I suppose... ;)
Edit: Except this is wrong... The equivalent is:
char *apple = "Apple";
for(i = 0; apple+i++ != NULL;)
printf("Banana");
On a 64-bit machine, that will take a while. If it finishes in reasonable time (sending output to /dev/null), I will update. It takes approximately three minutes on my machine (AMD 3.4GHz Phenom II).
I am confused about this code: (http://www.joelonsoftware.com/articles/CollegeAdvice.html)
while (*s++ = *t++);
What is the order of execution? Is *s = *t first done, and then are they each incremented? Or other way around?
Thanks.
EDIT: And what if it was:
while(*(s++) = *(t++));
and
while(++*s = ++*t);
while (*s++ = *t++);
From the precedence table you can see that ++ has higher precedence than *. But ++ is used here as a post-increment operator, so each pointer's old value is what gets dereferenced. In effect, *s = *t happens first, then s and t are incremented.
EDIT:
while(*(s++) = *(t++));
This is the same as above; the parentheses only make the grouping explicit. Remember that ++ is still a post-increment.
while(++*s = ++*t);
There is just one operator next to s, so * is applied first and ++ is then applied to that result. In C the result of ++*s is not an lvalue, so the assignment fails to compile with an "lvalue required" error.
while(*++s = *++t);
Again there is just one operator next to s and t, so the increment happens first, followed by the copy. We are effectively skipping the copy of the first char from t to s.
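The difference between the post-increment and pre-increment loops can be sketched like this. The names copy_post and copy_pre are made up; both assume s has enough room, and copy_pre additionally assumes t is not an empty string, since it reads t[1] first.

```c
/* Post-increment copy: copies t into s starting at s[0]. */
void copy_post(char *s, const char *t)
{
    while ((*s++ = *t++))
        ;
}

/* Pre-increment copy: advances both pointers before each copy, so
   s[0] is left untouched and t[0] is never copied.
   Assumes t has at least one character before its '\0'. */
void copy_pre(char *s, const char *t)
{
    while ((*++s = *++t))
        ;
}
```

Starting from a buffer filled with 'x' characters and the source "abc", copy_post should leave "abc" in the buffer, while copy_pre should leave "xbc": the first destination char keeps its old value.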
You are right. *s = *t is done first, and then they are incremented.
The increment is a post-increment. Post not just because it comes after the variable being incremented, but also because it comes after the expression is evaluated. So the order of execution is
*s = *t
then s++ and t++
EDIT:
@chrisgoyal
Order of execution is an ambiguous term. There are two different things here: the syntactic order, and the semantics of the expression.
Syntactically, the operator ++ is applied first. If the * were applied first, the expression would be equivalent to what @Hogan said:
(*s)++ = (*t)++
which is very different from Joel's sample.
The semantics of the postfix operator ++ is that the increment takes effect after the value has been used in the expression.
Hope that clarifies what I meant.
Actually, in terms of precedence, s++ and t++ are applied first. But don't forget that a postfix ++ yields the old value of its operand, so *s = *t operates on the characters the pointers referred to before the increments.
In a post-increment operation the variable's value is used first, and only afterwards is the variable modified.
So there are two forms of increment
++s // increment before using value
s++ // increment after using value
And the result of these can be dereferenced:
*++s // or...
*s++
This worked out really well on one of the very first machines C ran on, the PDP-11, which had a register-indirect addressing mode that incremented the register afterwards. The following ops were available in hardware:
*--s // or
*s++
You could do either
*x++ = *y++; // or
*--x = *--y; // or some combination
And if you did, the whole line happened in a single instruction. Since // comments were introduced by C99, however, you couldn't actually get away with my comment syntax.
The code while (*s++ = *t++); is roughly equivalent to:
while (*s = *t) {
++s;
++t;
}
The second is exactly the same -- the extra parens don't change anything (in this case). For the parens to do anything, they'd have to be like: while ((*s)++ = (*t)++);. This would do roughly the same as your third example (covered in the paragraph below).
The last example: while(++*s = ++*t); is completely different. Since the dereference (*) is closer to the operand, this dereferences the operand, and increments the result of the dereference, which means it increments what the pointer points AT, instead of incrementing the pointer itself. As a result, this would copy the first character, then increment that character, then check whether that character was non-zero and continue the same until it was zero. The result would be both the source and the destination becoming empty strings (since the first character of both would now be a zero, which is used to terminate strings).
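The distinction between incrementing the pointer and incrementing what it points at can be sketched in C as below. The function name increment_forms and its out-parameters are hypothetical. The sketch only uses ++*p and *p++ as standalone expressions, because in C the result of ++*p cannot appear on the left side of an assignment.

```c
/* Applies both forms to buf and reports what happened:
   first_out receives buf[0] after ++*p, advanced_out receives how far
   the pointer moved after *p++. */
void increment_forms(char *buf, char *first_out, int *advanced_out)
{
    char *p = buf;
    ++*p;                     /* increments the char p points at; p is unchanged */
    *first_out = *p;
    char c = *p++;            /* reads the char, then advances p by one */
    (void)c;                  /* value not needed here */
    *advanced_out = (int)(p - buf);
}
```

With buf holding "abc", the first form turns it into "bbc" without moving the pointer, and the second form moves the pointer by one element without changing the buffer.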