I am confused about how to parse operator precedence in C. The line
countArray[*string++]--
Is executing how I want it to, but I don't understand the steps that result in countArray[*string]-- being evaluated before *string++.
My research on C precedence and binding didn't provide answers that relate to this case, and I'm wondering about general rules for post/pre-increment and post/pre-decrement when combined with other increments or decrements.
How does C know to evaluate it this way?
void discountChars(char* string, char** countArray)
{
int test;
while(*string) {
test = *string;
//why is countArray[*string]-- evaluated before string++ is incremented?
countArray[*string++]--;
printf("countArray[%d] = %d\n", test, countArray[test]);
}
}
You can break this:
countArray[*string++]--;
down into this:
char index = *string; // get char from `string` into temporary index
string++; // increment `string`
countArray[index]--; // decrement `countArray` value at given index
and then it should be clearer what's going on.
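A minimal sketch to check that the compact statement and the three-step breakdown behave the same. It assumes the counts are stored as int, one slot per unsigned char value; the function names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Compact form from the question. Counts are assumed to be int, indexed
   by unsigned char so the index can never be negative. */
void discount_compact(const unsigned char *string, int *countArray)
{
    while (*string)
        countArray[*string++]--;
}

/* The same loop written as three separate steps. */
void discount_expanded(const unsigned char *string, int *countArray)
{
    while (*string) {
        unsigned char idx = *string; /* read the current character */
        string++;                    /* advance the pointer */
        countArray[idx]--;           /* decrement the count for that character */
    }
}

/* Runs both versions on the same input and reports whether the tables match. */
int forms_agree(const char *text)
{
    int a[UCHAR_MAX + 1] = {0};
    int b[UCHAR_MAX + 1] = {0};

    discount_compact((const unsigned char *)text, a);
    discount_expanded((const unsigned char *)text, b);
    return memcmp(a, b, sizeof a) == 0;
}

/* Small self-check that can be called from main(). */
void check_forms_agree(void)
{
    assert(forms_agree("hello world"));
}
```

Both functions leave identical tables because the subscript always uses the character read before the pointer moved, whichever order the compiler picks for the two side effects.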
As has been stated many times, precedence has no connection to the order of evaluation. The only thing in the C language that can affect order of evaluation is sequencing. Precedence has nothing to do with it.
It is also unclear where you got the strange idea that "countArray[*string]-- is being evaluated before *string++". This is simply impossible. The expression in [] will always be evaluated first, since its result is required to perform the element access (i.e. sequenced before the element access). Which means that the opposite is true: *string++ is evaluated before countArray[*string]--.
So, the sequence of steps here is
Evaluate *string++. The result of this expression is the original value of *string. Let's designate it tmp.
This expression also "schedules" a side-effect - increment of string. But this increment does not have to happen right now.
Evaluate countArray[tmp]--. The result of this expression is the original value of countArray[tmp]. This result is immediately discarded.
This expression also "schedules" a side-effect - decrement of countArray[tmp]. But this decrement does not have to happen right now.
Complete the evaluation of the full expression. If any of the above side-effects are still pending, complete them right now.
For example, one possible evaluation schedule might look as follows
char tmp = *string; // subexpression inside `[]`
countArray[tmp]; // full expression, result discarded
countArray[tmp] = countArray[tmp] - 1; // side-effect
string = string + 1; // side-effect
Another possible evaluation schedule is
char tmp = *string; // subexpression inside `[]`
string = string + 1; // side-effect
countArray[tmp]; // full expression, result discarded
countArray[tmp] = countArray[tmp] - 1; // side-effect
It can even be evaluated as
string = string + 1; // side-effect
char tmp = *(string - 1); // subexpression inside `[]`
countArray[tmp]; // full expression, result discarded
countArray[tmp] = countArray[tmp] - 1; // side-effect
Precedence controls groupings of operators and operands, not order of evaluation.
The expression *string++ must be evaluated before it can be used as an array subscript; however, the side effect of updating string may happen after the larger expression has been evaluated. The following sequence of events is allowed:
t1 <- *string
countArray[t1] <- countArray[t1] - 1
string <- string + 1
Then again, so is the following:
t1 <- *string
string <- string + 1
countArray[t1] <- countArray[t1] - 1
Related
I'm having trouble wrapping my head around a concept in C regarding the use of the operator ++ and arrays. I know enough that ++ will increment a value, but I'm running into a code snippet that does not make any sense to me:
while((c = getchar()) != EOF)
{
if(c < NUM_CHARS)
{
thisval = ++freqarr[c];
if(thisval > maxval)
{
maxval = thisval;
}
}
the line thisval = ++freqarr[c]; does not make a lot of sense to me.
Does it mean that thisval adds an additional index or value to the array?
I'm still new to C, so I'm not sure if this is common in C or not, if it's not or looked down upon, please let me know.
Prefix ++ has lower precedence than [], see https://en.cppreference.com/w/c/language/operator_precedence. So this is equivalent to thisval = ++(freqarr[c]);. It takes the cth element of the array, increments that element, and then assigns the new value to thisval. Just like:
freqarr[c] = freqarr[c] + 1;
thisval = freqarr[c];
I presume that c is defined as an int and freqarr is defined as an array. (It's generally best to include a Minimal, Reproducible Example in your question).
The prefix ++ operator increments an object and yields the incremented value. freqarr[c] is an object. The fact that it happens to be an element of an array is not relevant here. It works the same way as it would if it were applied to a simple named variable.
Note that the indexing operator [] binds more tightly than ++, so the expression ++freqarr[c] is equivalent to ++(freqarr[c]).
The prefix ++ operator increments its operand, and the resulting expression has the incremented value.
So this line does two things:
It increments the value of freqarr[c] which is a member of an array
It assigns the incremented value of freqarr[c] to thisval
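A small sketch of the same idiom in a complete function. NUM_CHARS is given an assumed value of 256 here (the question does not show its definition), and max_frequency and increment_and_read are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHARS 256 /* assumed table size; not shown in the question */

/* Returns the highest frequency of any byte in text, using the
   prefix-increment idiom from the question. */
int max_frequency(const char *text)
{
    int freqarr[NUM_CHARS] = {0};
    int maxval = 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        unsigned char c = (unsigned char)text[i];
        int thisval = ++freqarr[c]; /* increment the element, then read the new value */
        if (thisval > maxval)
            maxval = thisval;
    }
    return maxval;
}

/* The same update spelled out as two statements. */
int increment_and_read(int *element)
{
    *element = *element + 1;
    return *element;
}

/* Small self-check that can be called from main(). */
void check_max_frequency(void)
{
    assert(max_frequency("banana") == 3); /* 'a' occurs three times */
}
```

The array itself never grows; only the value stored in one existing element changes.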
Code :
#include<stdio.h>
#include<stdlib.h>
int arr[] = {1, 2, 3, 4};
static int count = 0;
int incr( ) {
++count;
++count;
return count;
}
int main(void)
{
printf("\ncount= %d \n",count);
int i;
arr[count++]=incr( );
for(i=0;i<4;i++)
printf("arr[%d]=%d\n", i,arr[i]);
printf("\nIncremented count= %d \n",count);
return 0;
}
Output
count = 0
arr[0]=2
arr[1]=2
arr[2]=3
arr[3]=4
Incremented count = 1
The final incremented value of global variable count is 1 even though it has been incremented thrice.
When count++ is replaced by count in arr[count++]=incr( ) the final incremented value of count is 2.
This is undefined behaviour from bad sequencing. On this line:
arr[count++]=incr( );
What happens (with your compiler) is:
arr[count] is resolved to arr[0], postfix ++ will be applied at the end of the statement;
incr() is called, count is now equal to 2, incr() returns 2;
arr[0] gets assigned 2;
postfix ++'s side effect kicks in, and count is now equal to 1. Previous changes to count are lost.
You will find more info on "side effects" and "sequence points" by googling their real name :)
To understand why your code goes wrong, you must first understand sequence points, which is a rather advanced topic. You also need to understand what undefined behavior is and how it differs from unspecified behavior.
If you do something to a variable which counts as a side-effect, such as modifying it, then you are not allowed to access that variable again before the next sequence point, for other purposes than to calculate which value to store in your variable.
For example i = i++ is undefined behavior because there are two side effects on the same variable with no sequence point in between. But i = i+1; is well-defined, because there is only one side effect (the assignment) and the i+1 is only a read access to determine what value to store.
In your case, there is no sequence point between the arr[count++] sub-expression and the incr() sub-expression, so you get undefined behavior.
This is how sequence points appear in functions, C11 6.5.2.2:
There is a sequence point after the evaluations of the function
designator and the actual arguments but before the actual call. Every
evaluation in the calling function (including other function calls)
that is not otherwise specifically sequenced before or after the
execution of the body of the called function is indeterminately
sequenced with respect to the execution of the called function.
This means that the contents of the function aren't sequenced in relation to the rest of the expression. So you are essentially writing an expression identical to arr[count++] = ++count;, except through the function you managed to squeeze in two unsequenced ++count on the right side of the operation, which wouldn't otherwise be possible. At any rate, it is undefined behavior.
Fix your code by enforcing sequence points between the left hand and the right hand of the expression. However, the order of evaluation of sub-expressions is unspecified behavior, so you need to ensure that your code is safe no matter if the left or right side is evaluated first. This code will fix the problems:
// artificial example, don't write code like this
0,arr[count++] = 0,incr();
since the comma operator introduces a sequence point. But of course, writing nonsense code like that isn't something you should be doing. The real solution is to never use ++ together with other operators in the same expression.
// good code, write code like this
int index = count; // read count before incr() modifies it
arr[index] = incr();
count++;
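A sketch of the split-statement approach as a complete function. It reads count into a local first (slot and store_next are names chosen here), because incr() itself modifies count and the order in which the two sides of = are evaluated is unspecified.

```c
#include <assert.h>

int arr[] = {1, 2, 3, 4};
int count = 0;

/* Same as in the question: bumps the global twice and returns it. */
int incr(void)
{
    ++count;
    ++count;
    return count;
}

/* Each step is its own statement, so the outcome does not depend on
   how the compiler orders the operands of the assignment. */
void store_next(void)
{
    int slot = count;   /* read the index before incr() changes count */
    arr[slot] = incr(); /* incr() returns count after its two increments */
    count++;            /* the third increment, in its own statement */
}

/* Small self-check that can be called from main(). */
void check_store_next(void)
{
    store_next();
    assert(arr[0] == 2);
    assert(count == 3);
}
```

Starting from count equal to 0, the first call stores 2 in arr[0] and leaves count at 3, so none of the three increments is lost.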
About the expression statement (an example)
i = 1;
it is said that after assigning 1 to i, the value of the entire expression is discarded. If the value is discarded, then how can it be used later in the program, for example
printf("%d",i);
?
I know this is very basic question but I am really confused with discarded.
The value of the expression is indeed discarded, but this expression has a side effect - it changes the value of i. So next time you will access this variable, you will read the new value, which is 1.
The term "discarded" is more helpful when you do things like foo(5); or even simply "hello";. Since the expression "hello" does not have any side effect, and its value is discarded, it does absolutely nothing. When a compiler encounters it as a standalone statement:
"hello";
It may simply ignore it altogether, as if it did not exist at all. The same applies when you call functions or use operators without using the result:
4+5;
sin(2.6);
These expressions, too, have no side effect, and their values are ignored. When you do something like
printf("hello");
This is an expression, too. Its value is the total number of characters written. This value is ignored. But the expression must not be completely ignored, since it has an important side effect: it prints these characters to the standard output.
So let's build a function instead of using the assignment operator (since C has no references, we'll use pointers):
int assign_int(int* var, int value) {
*var = value;
return *var;
}
now, back to your example, you do something like:
assign_int(&i, 1);
the value returned from assign_int is discarded. Just like in the printf() case. But since the function assign_int has a side effect (changing the value of i), it is not ignored by the compiler.
The important point is that i = 1 has two properties.
It changes the value stored in the variable i to be 1
It is an expression and has a value (which is also 1);
That second part is interesting in a case like
if ( (i=1) == 2 ) { // ...
or
y = 3 + (i = 1); // assign 4 to y
The line
the value of entire expression is being discarded.
refers to the value of the expression (my #2), but does not affect assignment to variable i (my #1).
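A short sketch that exercises both properties. assign_int is taken from the answer above; the other function names are hypothetical.

```c
#include <assert.h>

/* Function version of assignment: stores the value and also returns it. */
int assign_int(int *var, int value)
{
    *var = value;
    return *var;
}

/* Uses the value of an assignment expression inside a larger expression. */
int add_three_to_assigned(int *i)
{
    int y = 3 + (*i = 1); /* stores 1 in *i; the expression's value is also 1 */
    return y;
}

/* Discards the values, but the stores still take effect. */
int discard_demo(void)
{
    int i = 0;
    i = 1;             /* value of the expression is discarded, i is updated */
    assign_int(&i, 7); /* return value is discarded, i is updated */
    return i;
}

/* Small self-check that can be called from main(). */
void check_discard_demo(void)
{
    assert(discard_demo() == 7);
}
```

Discarding only drops the value the expression produces; the store into i has already happened by then.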
#include <stdio.h>
main()
{
int i;
for(i=0; i<0, 5; i++)
printf("%d\n", i);
}
I am unable to understand the i<0, 5 part in the condition of the for loop.
Even if I make it i>0, 5, there's no change in output.
How does this work?
On topic
The comma operator will always yield the last value in the comma separated list.
Basically it's a binary operator that evaluates the left hand value but discards it, then evaluates the right hand value and returns it.
If you chain multiple of these they will eventually yield the last value in the chain.
As per anatolyg's comment, this is useful if you want to evaluate the left hand value before the right hand value (if the left hand evaluation has a desirable side effect).
For example i < (x++, x/2) would be a sane way to use that operator because you're affecting the right hand value with the repercussions of the left hand value evaluation.
http://en.wikipedia.org/wiki/Comma_operator
Sidenote: did you ever hear of this curious operator?
int x = 100;
while(x --> 0) {
// do stuff with x
}
It's just another way of writing x-- > 0.
The comma operator evaluates i<0 (or i>0) and discards the result. Hence, it's always the 5 that serves as the condition.
So it's equivalent to:
for(i=0;5;i++)
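A minimal sketch of what the condition evaluates to. The function names are hypothetical, and the (void) cast is only there to silence the "left operand has no effect" warning some compilers emit.

```c
#include <assert.h>

/* Value of the question's loop condition for a given i. */
int condition_value(int i)
{
    return ((void)(i < 0), 5); /* the comparison is evaluated, then dropped */
}

/* A use where the left operand matters: x is incremented first, and the
   right operand sees the updated x. */
int half_after_increment(int x)
{
    return (x++, x / 2);
}

/* The loop from the question never ends on its own, so this copy breaks
   out manually after limit iterations and returns how many ran. */
int iterations_until_break(int limit)
{
    int n = 0;
    for (int i = 0; ((void)(i < 0), 5); i++) {
        if (i >= limit)
            break;
        n++;
    }
    return n;
}

/* Small self-check that can be called from main(). */
void check_condition_value(void)
{
    assert(condition_value(-3) == 5);
    assert(condition_value(10) == 5);
}
```

Since 5 is non-zero, the condition is always true whatever i holds, which is why changing i<0 to i>0 makes no difference.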
The comma operator is typically used in the initialization and increment parts of a for loop, to write something like for(i=0,j=20;i<j;i++,j--). If you use it in the comparison part, only the last expression determines the result (as other answers have already said).
i<0,5 will always evaluate to 5, because for ex1,ex2 the value of the right-hand expression is always the result.
The comma operator is intended for cases where the first operand has some side effect. In your loop the first operand, i<0, has none, so it has no effect on the evaluation of the condition.
For example,
for (i = 0; (i++, i < 5); i++) {
// something
}
will increment i, and then check if i<5.
I am confused about this code: (http://www.joelonsoftware.com/articles/CollegeAdvice.html)
while (*s++ = *t++);
What is the order of execution? Is *s = *t first done, and then are they each incremented? Or other way around?
Thanks.
EDIT: And what if it was:
while(*(s++) = *(t++));
and
while(++*s = ++*t);
while (*s++ = *t++);
From the precedence table you can see that postfix ++ has higher precedence than *, so the expression groups as *(s++) = *(t++). But ++ is used here as a post-increment operator, so each operand yields the pointer's old value and the copy uses the un-incremented pointers. In effect *s = *t happens first, then s and t are incremented.
EDIT:
while(*(s++) = *(t++));
This is the same as above. You are just making the grouping explicit with parentheses. But remember that ++ is still a post-increment.
while(++*s = ++*t);
There is just one operator next to s. So * is applied first, and ++ is applied to that result, which gives an "lvalue required" error.
while(*++s = *++t);
Again, there is just one operator next to s and t. So the increment happens first, followed by the copy. We are effectively skipping the copy of the first char from t to s.
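A sketch comparing the post-increment and pre-increment loops. copy_post and copy_pre are hypothetical names, the destination is assumed to be large enough, and copy_pre assumes t holds at least one character, since it advances past t[0] before reading anything.

```c
#include <assert.h>
#include <string.h>

/* Post-increment: copies t into s, terminator included. */
void copy_post(char *s, const char *t)
{
    while ((*s++ = *t++) != '\0')
        ;
}

/* Pre-increment: both pointers advance before the first copy, so s[0]
   is left untouched and t[0] is never copied. */
void copy_pre(char *s, const char *t)
{
    while ((*++s = *++t) != '\0')
        ;
}

/* Small self-check that can be called from main(). */
void check_copies(void)
{
    char a[8];
    char b[8] = "Xyyyyyy";

    copy_post(a, "abc");
    assert(strcmp(a, "abc") == 0);

    copy_pre(b, "abc");
    assert(strcmp(b, "Xbc") == 0); /* first char of "abc" was skipped */
}
```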
You are right. *s = *t is done first, and then they are incremented.
The increment is a post-increment. Post not just because it comes after the variable being incremented, but also because it comes after the expression is evaluated. So the order of execution is
*s = *t
then s++ and t++
EDIT::
@chrisgoyal
Order of execution is an ambiguous term. There are two different things here. The syntactical order, and the semantics of the expression.
Syntactically, the operator ++ is applied first. If the *s is applied first, then the following is equivalent to what @Hogan said:
(*s)++ = (*t)++
Which is very different from Joel's sample.
The semantics of the operator ++ is that it is executed after the expression.
Hope that clarifies what I meant.
Actually, s++ and t++ are applied first. Don't forget that the post-fix operator is executed after the expression is done. Basically the operator ++ is applied for both, then *s = *t is executed.
In a post-increment operation, the variable's value is used first, and the variable is modified afterwards.
So there are two forms of increment
++s // increment before using value
s++ // increment after using value
And the result of these can be dereferenced:
*++s // or...
*s++
This worked out really well on one of the very first machines for C to run on, the PDP-11, which had a register-indirect addressing mode that incremented the register afterwards. The following ops were available in hardware:
*--s // or
*s++
You could do either
*x++ = *y++; // or
*--x = *--y; // or some combination
And if you did, the whole line happened in a single instruction. Since // comments were introduced by C99, however, you couldn't actually get away with my comment syntax.
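A sketch of the two copy directions these idioms give you. The function names and the length parameter n are made up for illustration; nothing here is specific to the PDP-11.

```c
#include <assert.h>
#include <stddef.h>

/* Forward copy: dst and src point at the first element. */
void copy_forward(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Backward copy: dst_end and src_end point one past the last element. */
void copy_backward(char *dst_end, const char *src_end, size_t n)
{
    while (n-- > 0)
        *--dst_end = *--src_end;
}

/* Small self-check that can be called from main(). */
void check_copy_directions(void)
{
    char src[] = "abcd";
    char a[4];
    char b[4];

    copy_forward(a, src, 4);
    copy_backward(b + 4, src + 4, 4);
    assert(a[0] == 'a' && a[3] == 'd');
    assert(b[0] == 'a' && b[3] == 'd');
}
```

The backward form is handy when the destination overlaps the source at a higher address, for example when shifting the contents of a buffer one position to the right.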
The code: while (*s++ = *t++); is roughly equivalent to:
while (*s = *t) {
++s;
++t;
}
The second is exactly the same -- the extra parens don't change anything (in this case). For the parens to do anything, they'd have to be like: while ((*s)++ = (*t)++);. This would do roughly the same as your third example (covered in the paragraph below).
The last example: while(++*s = ++*t); is completely different. Since the dereference (*) is closer to the operand, this dereferences the operand, and increments the result of the dereference, which means it increments what the pointer points AT, instead of incrementing the pointer itself. As a result, this would copy the first character, then increment that character, then check whether that character was non-zero and continue the same until it was zero. The result would be both the source and the destination becoming empty strings (since the first character of both would now be a zero, which is used to terminate strings).
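A sketch of why the first rewrite is only "roughly" equivalent: both loops copy the same characters, but they leave the pointers in different places. The function names are hypothetical; each returns how far s moved.

```c
#include <assert.h>
#include <stddef.h>

/* Compact form: the increment also happens on the final, failing test. */
size_t advance_compact(char *s, const char *t)
{
    char *start = s;
    while ((*s++ = *t++) != '\0')
        ;
    return (size_t)(s - start); /* one past the copied terminator */
}

/* Expanded form: the increment is skipped once the terminator is copied. */
size_t advance_expanded(char *s, const char *t)
{
    char *start = s;
    while ((*s = *t) != '\0') {
        ++s;
        ++t;
    }
    return (size_t)(s - start); /* exactly at the copied terminator */
}

/* Small self-check that can be called from main(). */
void check_advance(void)
{
    char a[8];
    char b[8];

    assert(advance_compact(a, "abc") == 4);
    assert(advance_expanded(b, "abc") == 3);
    assert(a[0] == 'a' && a[3] == '\0');
    assert(b[0] == 'a' && b[3] == '\0');
}
```

The difference only matters if the pointers are used again after the loop.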