I'm learning C and I came across the below code which implements concatenation but I am struggling to understand the second portion despite recapping pointer/increment precedence and associativity.
I have run examples of all the different combinations of dereferencing and post/pre increment and now recall that pre-increment with dereferencing is right to left associative e.g *++q where the inc would occur first. The page I learned from stated that post-increment with the dereference operator has the increment as higher precedence and goes on to say that the associativity of this example is left to right. I don't particularly know why it mentions the associativity as I understand precedence is regarded before it e.g *p++ would increment before its dereference.
The code below starts by recursively calling itself with a pre-increment on dest to get to the end of the destination string, which it detects by dereferencing dest and finding the null byte. At this point I lose track. I would have thought it would simply assign the dereferenced src to the dereferenced dest. Instead it post-increments dest, which, based on what I've learned, would move to the memory location after the null byte, dereference that, and proceed to assign the string "eeksfor" to the position after the null byte in dest. Not only does this confuse me, but the program goes on to call itself in another ternary operator, or return 0, for seemingly no reason.
Thanks
#include <stdio.h>
/* my_strcat(dest, src) copies data of src to dest. To do so, it first reaches end of the string dest using recursive calls my_strcat(++dest, src). Once end of dest is reached, data is copied using
(*dest++ = *src++)? my_strcat(dest, src). */
void my_strcat(char *dest, char *src)
{
(*dest)? my_strcat(++dest, src): (*dest++ = *src++)? my_strcat(dest, src): 0 ;
}
/* driver function to test above function */
int main()
{
char dest[100] = "geeksfor";
char *src = "geeks";
my_strcat(dest, src);
printf(" %s ", dest);
getchar();
}
I ran the program and it does what it is expected to do. That is, it prints the string "geeksforgeeks", so clearly I'm just not understanding something.
Ok, so you understand the first recursive bit fine (keep incrementing dest to find the end).
Once you've found the end, it's time to start copying. The code it's using to copy one byte is: (*dest++ = *src++)
It might help your understanding to expand that code to something like: *dest = *src; dest += 1; src += 1
(because they are post-increment operators)
Now, a normal 'copy' function would repeat that statement in a while or for loop, but because we are cleverly showing off, we use a second ternary operator and recursion: the value of the assignment is the character just copied, so the recursion continues until the terminating null byte has been copied. The final '0' doesn't really do anything, but it needs to be there as the "else" part of the ternary operator.
While this may not be the worst implementation of strcat ever written, it's certainly a strong contender.
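For comparison, here is a minimal iterative sketch of the same two phases (find the end of dest, then copy src including its terminator). The name my_strcat_iter is made up for illustration and is not part of the original code; like strcat, it assumes dest has enough room for the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterative counterpart of the recursive my_strcat.
   Assumes dest is a null-terminated string with enough spare room. */
char *my_strcat_iter(char *dest, const char *src)
{
    char *start = dest;

    assert(dest != NULL && src != NULL);

    /* Phase 1: walk to the terminating null byte of dest. */
    while (*dest)
        ++dest;

    /* Phase 2: copy src one byte at a time. The value of the
       assignment is the byte just copied, so the loop stops right
       after the null byte itself has been copied. */
    while ((*dest++ = *src++) != '\0')
        ;

    return start;
}
```

Called with the buffer from the question (char dest[100] = "geeksfor") and "geeks", this should leave "geeksforgeeks" in dest.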
We are currently learning how pointers work in C.
I have this very short string copy function in C that was given to us by a tutor. I tried to explain what it does in my own words, but I am unsure if I have understood it correctly and would appreciate it if somebody could correct my mistakes and answer my questions about it.
void copy(char *source, char *dest) {
while (*dest++ = *source++);
}
"Copy is a function with 2 parameters, source and dest, which are both pointers of type char. The function calls a while statement which sets the dereferenced dest, incremented by 1 * sizeof(char), equal to the dereferenced and incremented (by 1*sizeof(char)) source."
What exactly does the while statement do? From my understanding *dest means that I am getting the char which dest points to, is that correct? But why would a while statement only set 2 pointers equal to each other, I don't really get it.
I appreciate any help, thank you!
This is the terse way to write an implementation of strcpy.
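It is equivalent to the longer, but easier to follow, loop below (the only difference being that the terse version also advances both pointers once more after copying the terminator):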
while (*dest = *source) {
dest++;
source++;
}
*dest = *source copies one char from the source to the destination. The result is zero if a zero byte was copied, at which time the loop ends.
Adding the increment to the condition is possible because the pointer dereferenced is the pointer value before the increment.
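To see the "value before the increment" rule in action, the sketch below measures how far dest has advanced when the terse loop finishes. copy_and_measure is a hypothetical helper, not part of the question's code, and it assumes dest is large enough for the copy.

```c
#include <assert.h>
#include <stddef.h>

/* Copies source to dest with the terse loop and returns how many
   positions dest advanced. Illustrative helper only. */
size_t copy_and_measure(char *dest, const char *source)
{
    char *start = dest;

    assert(dest != NULL && source != NULL);

    /* Each iteration stores through the old value of dest and reads
       through the old value of source; both pointers are advanced
       as a side effect. */
    while ((*dest++ = *source++) != '\0')
        ;

    /* dest now points one past the copied terminator. */
    return (size_t)(dest - start);
}
```

For the source string "abc" this returns 4: three letters plus the terminator, each written before the corresponding increment took effect.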
First, you need to start evaluating the expressions from the inner to the outer side. An important thing is to take into consideration that the body of the loop is empty (the ; at the right parenthesis indicates that the loop is executing the null statement --do nothing--, so everything must happen in the test expression).
The expression in the while test is not a test for equality, but an assignment. The equality operator is written by doubling the =, as in ==, while = is used to assign the value of the right subexpression to the variable on the left side.
The variable on the left side is *dest++, a slightly complicated expression that includes a dereference operator * and a post-increment operator ++. The ++ takes higher precedence, so it binds first: the pointer is incremented, and the value returned by this subexpression is the value of the pointer before it was incremented. This means that the pointer value being used is the one it had before dest++ was evaluated, and the * operator designates the char variable pointed to by that value. So the place where the value will be stored is the char that dest pointed to, and the pointer will be incremented before the whole expression ends, but after its value is used.
The right side of the assignment shows what is going to be assigned. As the expression has the same form, I will pass quickly over it. The value assigned is the one pointed to by source, which is incremented once the value is taken, and the place designated by dest is filled with the character pointed to by source. After that, both pointers have been incremented, so the next time the test expression is evaluated, the characters involved will be the next source character assigned to the next destination character.
So, when will the loop finish? The value of an assignment expression is precisely the value assigned to the destination, so in this case it is the character copied from source to destination. The test fails when the character copied happens to be zero (false in C terms). As you see, there's nothing left to put in the body of the loop.
This sample is famous for being an example of how cryptic C can get. It appears in both editions of "The C Programming Language" by Kernighan & Ritchie (Ritchie designed the language), accompanied by a comment that says something more or less like this: this is a bit obscure, but every programmer who is proud of being proficient at C must be capable of interpreting it with a bit of care. (In the book the pointer variables have one-letter names, s and t, I think.)
So I have this main:
#define NUM 5
int main()
{
int a[NUM]={20,-90,450,-37,87};
int *p;
for (p=a; (char *)p < ((char *)a + sizeof(int) * NUM); ) //same meaning: for (p=a; p<a+NUM;)
*p++ = ++*p < 60 ? *p : 0; //same meaning: *(p++)=++(*p)<60?*p:0;
for(p=a; (char *)p < ((char *)a + sizeof(int) * NUM); )
printf("\n %d ", *p++);
return 0;
}
And I need to find what the output is.
After trying to work it out without success, I ran it, and this is the output:
21
-89
0
-36
0
So I would be glad of an explanation of how to solve this kind of question (I have an exam soon and will probably see questions of this type).
EDIT:
To begin with, I want to understand what the first for statement is doing:
Does it advance by one integer each time? And what happens inside the loop body?
And what is the difference between *p++ and ++*p?
The question is similar to Why are these constructs (using ++) undefined behavior in C? although not an exact duplicate due to the (subtle) sequence point inside the ?: operator.
There is no predictable output since the program contains undefined behavior.
While the sub-expression ++*p is sequenced in a well-defined way compared to *p because of the internal sequence point of the ?: operator, this is not true for the other combinations of sub-expressions. Most notably, the order of evaluation of the operands to = is not specified:
C11 6.5.16/3:
The evaluations of the operands are unsequenced.
*p++ is not sequenced in relation to ++*p. The order of evaluation of the sub-expressions is unspecified, and since the side effect on p (from p++) is unsequenced relative to a value computation that reads p (in ++*p), the behavior is undefined.
Similarly, *p++ is not sequenced in relation to *p. This also leads to undefined behavior.
Summary: the code is broken and full of bugs. Anything can happen. Whoever gave you the assignment is incompetent.
at the beginning i want to understand what the first for statement doing
This is what one would call code obfuscation... The difficult part is obviously this one:
(char *)p < ((char *)a+sizeof(int)*NUM);
OK, we convert p to a pointer to char, then compare it to another pointer retrieved from array a that points to the first element past a: sizeof(int)*NUM is the size of the array - which we could have gotten much more easily by just having sizeof(a), so (char*)p < (char*)a + sizeof(a)
Be aware that comparing pointers other than with (in-)equality is undefined behaviour if the pointers do not point into the same array or one past the end of the latter (they do, in this example, though).
Typically, one would have this comparison as p < a + sizeof(a)/sizeof(*a) (or sizeof(a)/sizeof(a[0]), if you prefer).
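A small sketch of that idiom follows. ARRAY_LEN and sum_ints are illustrative names that do not appear in the question; note that the sizeof trick only works on a real array, not on a pointer that an array has decayed to.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not of a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Walks the elements with an int pointer, comparing against the
   one-past-the-end pointer, with no casts to char * needed. */
int sum_ints(const int *first, size_t count)
{
    const int *end = first + count;
    int total = 0;

    assert(first != NULL);

    for (const int *p = first; p < end; ++p)
        total += *p;

    return total;
}
```

With the array from the question, ARRAY_LEN(a) is 5 and the loop visits exactly a[0] through a[4].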
*p++ dereferences the pointer's current value and increments the pointer as a side effect, so *p++ = ... is short for *p = ...; p = p + 1. ++*p, on the other hand, dereferences the pointer and increments the value it is pointing to (note the difference to *++p, yet another variant - can you get it yourself?), i.e. it is equivalent to *p = *p + 1.
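The three spellings can be compared side by side when each one is used in its own statement, which keeps everything well defined. The function name and the array values below are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Fills out[] with the values produced by *p++, ++*p and *++p,
   each evaluated in a separate statement. Illustrative only. */
void pointer_increment_demo(int out[3])
{
    int a[3] = {10, 20, 30};
    int *p = a;

    assert(out != NULL);

    out[0] = *p++;  /* yields a[0] (10); afterwards p points at a[1] */
    out[1] = ++*p;  /* a[1] becomes 21 and 21 is yielded; p does not move */
    out[2] = *++p;  /* p moves to a[2] first, then 30 is read */
}
```

After the call, out holds 10, 21 and 30.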
The entire line *p++ = ++*p<60 ? *p : 0; then shall do the following:
increment the value of *p
if the result is less than 60, use it, otherwise use 0
assign this to *p
increment p
However, this is undefined behaviour as there is no sequence point in between read and write access of p; you do not know if the left or the right side of the assignment is evaluated first, in the former case we would assign a[i] = ++a[i + 1], in the latter case, a[i] = ++a[i]! You might have gotten different output with another compiler!!!
However, these are only the two most likely outputs. Once you fall into undefined behaviour, anything might happen: the compiler might just ignore the piece of code in question, decide not to do anything at all (just exit from main right at the first instruction), the program might crash, or it could even switch off the sun...
Be aware that one single location with undefined behaviour results in the whole program itself having undefined behaviour!
Short answer: because of this line
*p++ = ++*p<60 ? *p : 0;
it is impossible to say how the program behaves. When we access *p on the right-hand side, does it use the old or the new value of p, that is, before or after the p++ on the left-hand side gets to it? There is no rule in C to tell us. What there is instead is a rule that says that for this reason the code is undefined.
Unfortunately the person setting the question didn't understand this, and thinks that "tricky" code like this is something to make a puzzle about, instead of something to be avoided at all costs.
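If the intent was "increment each element and replace it with 0 when the result is not below 60", which is only an assumption about what the puzzle author meant, it can be written so that every read and write is clearly ordered. bump_or_zero is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* One well-defined reading of the puzzle line, under the assumption
   stated above. Each element is read once and written once. */
void bump_or_zero(int *a, size_t count)
{
    assert(a != NULL);

    for (size_t i = 0; i < count; ++i) {
        int bumped = a[i] + 1;
        a[i] = (bumped < 60) ? bumped : 0;
    }
}
```

With the array from the question this yields 21, -89, 0, -36, 0, which happens to match the output the asker observed, although the original line gives no guarantee of that.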
The only way to really understand this kind of stuff (memory management, pointer behaviour, etc.) is to experiment yourself. Anyway, I smell someone is trying to seem clever fooling students, so I will try to clarify a few things.
int a[NUM]={20,-90,450,-37,87};
int *p;
This creates a vector of five int, so far, so good. The obvious move, given that data, is to run over the elements of a using p. You would do the following:
for(p = a; p < (a + NUM); ++p) {
printf("%d ", *p);
}
However, the first change to notice is that both loops convert the pointers to char. So, they would be:
for (p=a;(char *)p<((char *)a+sizeof(int)*NUM); ++p) {
printf("%d ", *p);
}
Instead of pointing to a with a pointer to int, the code converts p to a pointer to char. Say your machine is a 32-bit one. Then an int will probably occupy four bytes. With p being a pointer to int, when you do ++p you effectively go to the next element in a, since your compiler will transparently jump four bytes. If you convert the int pointer to a char pointer instead, then you cannot add NUM and assume that you are at the end of the array anymore: a char is stored in one byte, so ((char *)p) + 5 will point to the second byte of the second element of a, provided p was pointing at the beginning of a before. That is why you have to take sizeof(int) and multiply it by NUM in order to get the end of the array.
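The difference in stride can be checked directly. The helper below (bytes_between, a made-up name) measures the distance between two int pointers in bytes; it assumes both point into the same array, or one past its end.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between two pointers into the same int array.
   Illustrative helper; from must not be after to. */
size_t bytes_between(const int *from, const int *to)
{
    assert(from != NULL && to != NULL);
    assert(from <= to);

    /* Viewed as char pointers, the same addresses are counted in
       single bytes instead of whole ints. */
    return (size_t)((const char *)to - (const char *)from);
}
```

For an int array a, bytes_between(a, a + 1) is sizeof(int), and bytes_between(a, a + NUM) is sizeof(int) * NUM, which is the bound used in the question's loops.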
And finally, the infamous *p++ = ++*p<60 ? *p : 0;. This is something unfair to face students with, since as others have already pointed out, the behaviour of that code is undefined. Lets go expression by expression.
++*p means "access the value p points to and add 1 to it". If p is pointing to the first position of a, then the result would be 21. ++*p not only returns 21, but also stores 21 in memory in the place where you had 20. If you only wanted the value 21 without modifying memory, you would write *p + 1.
++*p<60 ? *p : 0 means "if the result of permanently adding 1 to the value pointed to by p is less than 60, then return that result, otherwise return 0".
*p++ = x means "store the value of x in the memory address pointed to by p, and then increment p". That's why you don't find ++p or p++ in the increment part of the for loop.
Now about the whole instruction (*p++ = ++*p<60 ? *p : 0;), it is undefined behaviour (check @Lundin's answer for more details). In summary, the most obvious problem is that you don't know which side of the assignment operator is going to be evaluated first. The right-hand side on its own is well ordered, because ?: evaluates its condition completely before the selected operand, but that does not help: the p++ on the left is still unsequenced relative to the reads of p on the right.
How could you fix this? It would actually be very simple:
for (p=a;(char *)p<((char *)a+sizeof(int)*NUM); ++p) {
*p = (*p + 1) <60 ? (*p + 1) : 0;
}
And much more readable. Even better:
for (p = a; p < (a + NUM); ++p) {
*p = (*p + 1) <60 ? (*p + 1) : 0;
}
Hope this helps.
I searched for a string replacement function and found this question
What is the function to replace string in C?
If I use the code from the answer, it works but it looks wrong and gives a warning:
/home/dac/osh/util.c: In function ‘str_replace’:
/home/dac/osh/util.c:867:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
for (count = 0; tmp = strstr(ins, rep); ++count) {
It looks like it's maybe a bug with = and == . Is it a bug or did I misunderstand? Should it be == instead?
No, it's not. In this case, the value of tmp is actually intended to be used as the condition.
The return value of strstr:
char * strstr(char * str1, const char * str2 );
Returns a pointer to the first occurrence of str2 in str1, or a null pointer if str2 is not part of str1.
To remove the warning, try this:
for (count = 0; (tmp = strstr(ins, rep)) != NULL; ++count) {
No, it's not a bug. As per the body of the loop:
for (count = 0; tmp = strstr(ins, rep); ++count) {
ins = tmp + len_rep;
}
it actually uses tmp for something. The continuation condition in that for statement will assign the result of strstr() to tmp then execute the body as long as it's non-zero (i.e., as long as it found the string). That's because strstr() returns NULL only if the string cannot be found.
I suspect this is just gcc being paranoid in that it realises the continuation condition (the middle bit) on a for statement usually tends to be a comparison and you may have accidentally used = rather than ==.
That's why the diagnostic states warning: suggest ... rather than error: what the heck? :-) If you want to get rid of the warning (which isn't a bad position to take), simply do what it suggests and surround the entire continuation condition with parentheses.
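As a sketch of the same pattern outside str_replace, the function below counts non-overlapping matches using an assignment in the loop condition. count_occurrences is a hypothetical helper written for this illustration; the explicit != NULL comparison is one way to keep -Wparentheses quiet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts non-overlapping occurrences of needle in text.
   Illustrative helper, not taken from the linked answer. */
size_t count_occurrences(const char *text, const char *needle)
{
    const char *hit;
    size_t count = 0;
    size_t needle_len;

    assert(text != NULL && needle != NULL);

    needle_len = strlen(needle);
    if (needle_len == 0)
        return 0;  /* an empty needle would match forever */

    /* The assignment is intentional: hit receives the result of
       strstr, and the loop runs while that result is not NULL. */
    while ((hit = strstr(text, needle)) != NULL) {
        ++count;
        text = hit + needle_len;  /* continue after this match */
    }

    return count;
}
```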
[ strstr reference ] states the return value for char * strstr (char *str1,const char *str2 ) is :
A pointer to the first occurrence in str1 of the entire sequence of
characters specified in str2, or a null pointer if the sequence is not
present in str1.
Now for a little more C terminology, when you do :
tmp = strstr(ins, rep)
C evaluates the expression as a whole, and its value is the value assigned, here the return value of strstr(ins, rep). The side effect is assigning this return value to tmp. The warning:
suggest parentheses around assignment used as truth value
is a way gcc helps you to avoid a careless mistake, say typing a=b instead of a==b, I believe. Note that in the first case the value of b is used as the truth value, but in the second case the result of comparing a with b is used as the truth value. By putting an extra pair of parentheses around tmp = strstr(ins, rep) you tell the compiler that the assignment is intentional.
Side Note :
The controlling expression of a for statement is a full expression, with or without the extra parentheses around tmp = strstr(ins, rep), and there is a sequence point at the end of every full expression. A sequence point is a point in program execution at which all side effects of previous evaluations are complete before going on to the next step.
Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
//this is true. I want to know y?
Based on the above, what happens here?
#include<stdio.h>
void main() {
static int i;
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
}
if("whatiamreturning")
is equivalent to
if (1)
This is because "whatiamreturning" is a char [] that decays into a non-NULL char const* inside the if(). Any non-NULL pointer evaluates to true in the context of a boolean expression.
The line
if(i+++"The Matrix")
can be simplified to:
if( (i++) + "The Matrix")
In the first iteration of the loop, the value of i is 0. Hence, the (i++) + "The Matrix" evaluates to "The Matrix".
In the second iteration of the loop, the value of i is 1. Hence, the (i++) + "The Matrix" evaluates to "he Matrix".
However, the loop never ends and goes into the territory of undefined behavior since (i++) + "The Matrix" never evaluates to 0 and the value of i keeps on increasing.
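The pointer arithmetic can be tried safely as long as the offset stays inside the array. skip_chars below is an invented helper that enforces that limit with an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer n characters into text, i.e. text + n.
   Illustrative helper; n may be at most strlen(text). */
const char *skip_chars(const char *text, size_t n)
{
    assert(text != NULL);
    assert(n <= strlen(text));  /* stay inside the array */

    return text + n;
}
```

skip_chars("The Matrix", 1) points at "he Matrix", and an offset of 10 points at the terminating null byte. None of these pointers is a null pointer, which is why the original condition is always true.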
Perhaps they meant to use:
if(i++["The Matrix"])
which will allow the expression inside if() to be 0 after 10 iterations.
Update
If you are following somebody else's code, stay away from anything else that they have written. The main function can be cleaned up to:
int main() {
char name[] = "The Matrix";
int i = 0;
for( ; name[i] != '\0'; ++i )
{
printf("Memento\n");
}
}
if(i+++"The Matrix") // what is happening here please help here to understand
This will take the value of i, add the pointer value of the location of the string "The Matrix" in memory and compare it to zero. After that it will increase the value of i by one.
It's not very useful, since the pointer value could be basically any random number (it depends on architecture, OS, etc). And thus the whole program amounts to printing Memento a random number of times (likely the same number each run though).
Perhaps you meant to write if(*(i+++"The Matrix")). That would loop 10 times, until i+"The Matrix" evaluates to the address of the NUL byte at the end of the string and *(i+"The Matrix") thus yields 0.
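A sketch of that dereferencing variant, with the counting made explicit. count_until_nul is a made-up name, and the loop is a rewrite for illustration rather than the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the condition *(text + i++) is true,
   i.e. how many characters precede the terminator. */
int count_until_nul(const char *text)
{
    int i = 0;
    int count = 0;

    assert(text != NULL);

    /* *(text + i++) reads the character at index i, then bumps i;
       text[i++] and i++[text] mean exactly the same thing. */
    while (*(text + i++))
        ++count;

    return count;
}
```

For "The Matrix" the condition holds 10 times and fails on the 11th evaluation, when the terminator is read.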
Btw, spaces are a nice way to make your code more readable.
It will return the address of first element of the string whatiamreturning.
Basically when you assign a string literal to a char pointer
char *p;
p = "whatiamreturning";
the assignment doesn't copy the characters in whatiamreturning; instead it makes p point to the first character of the string, and that's why string literals can be subscripted:
char ch = "whatiamreturning"[1];
ch will have the character h now. This works because the string literal is an array of char, which decays to a pointer to its first character and can therefore be subscripted.
if(i+++"The Matrix") is equivalent to
if( i++ + "The Matrix")
or it can be rewritten as
if(&("The Matrix"[i++]))
which will be true for every i and results in an infinite loop. Ultimately, the code will suffer from undefined behavior due to integer overflow for variable i.
Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
The string literal "whatiamreturning" is a constant of type char[].
In nearly all contexts, including this one, arrays decay to pointers to their first element.
In a boolean context, like the condition of an if-statement, all non-zero values are true.
As the pointer points to an object, it is not the null-pointer, and thus is true.
Based on the above, what happens here?
#include<stdio.h>
void main() {
The above is your first instance of Undefined Behavior, whatever happens, it is right.
We will now pretend the error is corrected by substituting int for void.
Now, your loop:
static int i;
Static variables are default initialized, so i starts with value 0.
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
This loop has Undefined Behavior as well.
The condition takes i and adds it to the string literal "The Matrix", which has decayed to a pointer like in the previous example, interpreting the resultant pointer in a boolean context and, as a side effect, incrementing i.
As long as i is no more than strlen("The Matrix")+1 on entry, everything is ok, the pointer points to an element of the string literal or one past, and the standard guarantees that's not a null pointer.
The moment it is though, all hell breaks loose because calculating such a pointer is Undefined Behavior.
Well, now that we know the loop is UB, let's ignore the loop too.
The rest of the program is:
}
Which is ok, because even though main has a return type of int, there's a special rule which states that if control reaches the end of main without executing a return-statement, an implicit return 0; is added.
Side-note: If an execution of a program encounters Undefined Behavior anywhere, the whole program is undefined, not only from that point on:
Undefined behavior can result in time travel (among other things, but time travel is the funkiest)
I am confused about this code: (http://www.joelonsoftware.com/articles/CollegeAdvice.html)
while (*s++ = *t++);
What is the order of execution? Is *s = *t first done, and then are they each incremented? Or other way around?
Thanks.
EDIT: And what if it was:
while(*(s++) = *(t++));
and
while(++*s = ++*t);
while (*s++ = *t++);
From the precedence table you can clearly see that ++ has higher precedence than *, so the expression parses as *(s++). But ++ is used here as the post-increment operator, so the value of s++ is the pointer before the increment. In effect *s = *t happens with the old pointers, and s and t end up incremented.
EDIT:
while(*(s++) = *(t++));
Is same as above. You are making it more explicit with the use of parenthesis. But remember ++ is still a post increment.
while(++*s = ++*t);
There is just one operator next to s. So * is applied first and on that result ++ is applied which results in the lvalue required error.
while(*++s = *++t);
Again there is just one operator next to s and t. So the incrementation happens first, followed by the copy. We are effectively skipping the copy of the first char from t to s.
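The skipped first character is easy to observe. In the sketch below, copy_skipping_first is an invented name; it assumes t is not empty, because otherwise ++t would already step past the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Pre-increment variant of the copy loop: both pointers move
   before the first character is transferred. Illustrative only. */
void copy_skipping_first(char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    assert(t[0] != '\0');  /* otherwise ++t would pass the terminator */

    while ((*++s = *++t) != '\0')
        ;
}
```

Copying "hello" into a buffer that starts with 'X' leaves the 'X' in place and writes "ello" after it.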
You are right. *s = *t is done first, and then they are incremented.
The increment is a post-increment. Post not just because it comes after the variable being incremented, but also because it comes after the expression is evaluated. So the order of execution is
*s = *t
then s++ and t++
EDIT::
@chrisgoyal
Order of execution is an ambiguous term. There are two different things here. The syntactical order, and the semantics of the expression.
Syntactically, the operator ++ is applied first. If the *s is applied first, then the following is equivalent to what @Hogan said:
(*s)++ = (*t)++
Which is very different from Joel's sample.
The semantics of the operator ++ is that it is executed after the expression.
Hope that clarifies what I meant.
Actually, s++ and t++ are applied first, in the sense that ++ binds more tightly than *. Don't forget, though, that a postfix ++ yields the value from before the increment. So the operator ++ is applied to both pointers, and *s = *t is executed using their old values.
In a post-increment operation the variable's value is used first, and the variable gets modified afterwards.
So there are two forms of increment
++s // increment before using value
s++ // increment after using value
And the result of these can be dereferenced:
*++s // or...
*s++
This worked out really well on one of the very first machines for C to run on, the PDP-11, which had a register-indirect addressing mode that incremented the register after use. The following ops were available in hardware:
*--s // or
*s++
You could do either
*x++ = *y++; // or
*--x = *--y; // or some combination
And if you did, the whole line happened in a single instruction. Since // comments were introduced by C99, however, you couldn't actually get away with my comment syntax.
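The pre-decrement form can be sketched in portable C as a backward copy. copy_backward is a hypothetical name, both pointers start one past the last element to be copied, and whether a compiler maps the loop body onto a single instruction is outside what this sketch shows.

```c
#include <assert.h>
#include <stddef.h>

/* Copies count bytes backwards: both pointers start one past the
   end of their ranges and are decremented before each access. */
void copy_backward(char *dest_end, const char *src_end, size_t count)
{
    assert(dest_end != NULL && src_end != NULL);

    while (count-- > 0)
        *--dest_end = *--src_end;
}
```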
The code: while (*s++ = *t++); is roughly equivalent to:
while (*s = *t) {
++s;
++t;
}
The second is exactly the same -- the extra parens don't change anything (in this case). For the parens to do anything, they'd have to be like: while ((*s)++ = (*t)++);, which applies the increment to the characters rather than to the pointers (and is rejected by the compiler, because the result of a post-increment is not an lvalue that can be assigned to).
The last example: while(++*s = ++*t); is completely different. Since the dereference (*) is closer to the operand, this dereferences the operand and increments the result of the dereference, which means it increments what the pointer points at instead of incrementing the pointer itself. In C this does not compile either, because the result of ++*s is not an lvalue. In C++, where prefix ++ yields an lvalue, it would keep incrementing the first character of both strings, never advancing either pointer, until the stored character wrapped around to zero. The result would be both the source and the destination becoming empty strings (since the first character of both would now be a zero, which is used to terminate strings).