How to solve this expression, keeping precedence and associativity in mind? - C

int p=4, s=5;
int m;
m = (p=s) * (p==s);
printf("%d",m);
How is this expression evaluated?
What will be the value of m?
How do parentheses change the precedence of operators in this example?
I am getting m=4 in Turbo C.

Is this a code fragment that actually came up in your work, or is it an assignment someone gave you?
The expression contains two parts:
p=s /* in this part p's value is assigned */
p==s /* in this part p's value is used */
So before we can figure out what the value of the expression is, we have to figure out: does p's value get set before or after it gets used?
And the answer is -- I'm going to shout a little here -- WE DO NOT KNOW.
Let me say that again. We simply do not know whether p's value gets set before or after it gets used. So we have no way of predicting what value this expression will evaluate to.
Now, you might think that precedence will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case precedence does not tell you whether p gets set before or after it gets used.
You might think that associativity will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case associativity does not tell you whether p gets set before or after it gets used, either.
Finally, you might think that I'm wrong, that precedence and/or associativity have to be able to tell you what you need to know, that there has to be a way of figuring out what this expression does. But it turns out that I'm right: there is no way of figuring out what this expression does.
You could compile it and try it, but that will only tell you how the compiler you're using today chooses to evaluate it. I guarantee you that there's another compiler out there that will evaluate it differently.
You might ask, which compiler is right? And the answer is, they're both right (or at least, neither of them is wrong). As far as the C language is concerned, this expression is undefined. So there's no right answer, and your compiler isn't wrong no matter what it does.
If this is code that came up in your work, please delete it right away, and figure out what you were really trying to do, and figure out some cleaner, well-defined way of expressing it. And if this code is an assignment, your answer is simply: it's undefined. (If your instructor believes this expression has a well-defined result, you're in the unfortunate position of having an instructor who doesn't know what he's talking about.)

You're reading and writing the variable p multiple times in the same expression without a sequence point, which means the compiler is free to evaluate the two sub-expressions in any order.
This results in undefined behavior, meaning the program's behavior is unpredictable.

Related

Why does c = ++(a+b) give compilation error?

After researching, I read that the increment operator requires the operand to have a modifiable data object: https://en.wikipedia.org/wiki/Increment_and_decrement_operators.
From this I guess that it gives compilation error because (a+b) is a temporary integer and so is not modifiable.
Is this understanding correct? This was my first time trying to research a problem so if there was something I should have looked for please advise.
It's just a rule, that's all, and it is possibly there because (1) it makes it easier to write C compilers and (2) nobody has convinced the C standards committee to relax it.
Informally speaking you can only write ++foo if foo can appear on the left hand side of an assignment expression like foo = bar. Since you can't write a + b = bar, you can't write ++(a + b) either.
There's no real reason why a + b couldn't yield a temporary on which ++ can operate, and the result of that is the value of the expression ++(a + b).
The C11 standard states in section 6.5.3.1
The operand of the prefix increment or decrement operator shall have
atomic, qualified, or unqualified real or pointer type, and shall be a
modifiable lvalue
And "modifiable lvalue" is described in section 6.3.2.1, paragraph 1
An lvalue is an expression (with an object type other than void) that
potentially designates an object; if an lvalue does not designate an
object when it is evaluated, the behavior is undefined. When an
object is said to have a particular type, the type is
specified by the lvalue used to designate the object. A modifiable
lvalue is an lvalue that does not have array type, does not have
an incomplete type, does not have a const-qualified type, and
if it is a structure or union, does not have any member
(including, recursively, any member or element of all contained
aggregates or unions) with a const-qualified type.
So (a+b) is not a modifiable lvalue and is therefore not eligible for the prefix increment operator.
You are correct. The ++ operator tries to assign the new value to the original variable. So ++a will take the value of a, add 1 to it, and then assign it back to a. Since, as you said, (a+b) is a temporary value and not a variable with an assigned memory address, the assignment can't be performed.
I think you mostly answered your own question.
I might make a small change to your phrasing and replace "temporary variable" with "rvalue" as C.Gibbons mentioned.
The terms variable, argument, temporary variable and so on will become more clear as you learn about C's memory model (this looks like a nice overview: https://www.geeksforgeeks.org/memory-layout-of-c-program/ ).
The term "rvalue" may seem opaque when you're just starting out, so I hope the following helps with developing an intuition about it.
Lvalue/rvalue are talking about the different sides of an equals sign (assignment operator):
lvalue = left hand side (lowercase L, not a "one")
rvalue = right hand side
Learning a little about how C uses memory (and registers) will be helpful for seeing why the distinction is important. In broad brush strokes, the compiler creates a list of machine language instructions that compute the result of an expression (the rvalue) and then puts that result somewhere (the lvalue). Imagine a compiler dealing with the following code fragment:
x = y * 3
In assembly pseudocode it might look something like this toy example:
load register A with the value at memory address y
load register B with a value of 3
multiply register A and B, saving the result in A
write register A to memory address x
The ++ operator (and its -- counterpart) need a "somewhere" to modify, essentially anything that can work as an lvalue.
Understanding the C memory model will be helpful because you'll get a better idea in your head about how arguments get passed to functions and (eventually) how to work with dynamic memory allocation, like the malloc() function. For similar reasons you might study some simple assembly programming at some point to get a better idea of what the compiler is doing. Also if you're using gcc, the -S option "Stop after the stage of compilation proper; do not assemble." can be interesting (though I'd recommend trying it on a small code fragment).
Just as an aside:
The ++ instruction has been around since 1969 (though it started in C's predecessor, B):
"(Ken Thompson's) observation (was) that the translation of ++x was smaller than that of x=x+1."
Following that wikipedia reference will take you to an interesting writeup by Dennis Ritchie (the "R" in "K&R C") on the history of the C language, linked here for convenience: http://www.bell-labs.com/usr/dmr/www/chist.html where you can search for "++".
The reason is that the standard requires the operand to be an lvalue. The expression (a+b) is not an lvalue, so applying the increment operator isn't allowed.
Now, one might say "OK, that's indeed the reason, but there is actually no *real* reason other than that", but unfortunately the particular wording of how the operator works does require that to be the case.
The expression ++E is equivalent to (E+=1).
Obviously, you cannot write E += 1 if E isn't an lvalue. Which is a shame, because one could just as well have said "increments E by one" and been done. In that case, applying the operator to a non-lvalue would (in principle) be perfectly possible, at the expense of making the compiler slightly more complex.
Now, the definition could trivially be reworded (I think it isn't even originally C but an heirloom of B), but doing so would fundamentally change the language to something that's no longer compatible with its former versions. Since the possible benefit is rather small but the possible implications are huge, that never happened and probably is never going to happen.
If you consider C++ in addition to C (question is tagged C, but there was discussion about operator overloads), the story becomes even more complicated. In C, it's hard to imagine that this could be the case, but in C++ the result of (a+b) could very well be something that you cannot increment at all, or incrementing could have very considerable side effects (not just adding 1). The compiler must be able to cope with that, and diagnose problematic cases as they occur. On an lvalue, that's still kinda trivial to check. Not so for any kind of haphazard expression inside parentheses that you throw at the poor thing.
This isn't a real reason why it couldn't be done, but it does help explain why the people who implement compilers are not exactly ecstatic about adding a feature that promises very little benefit to very few people.
(a+b) evaluates to an rvalue, which cannot be incremented.
++ tries to store the new value back into the original variable, and since (a+b) is a temporary value, it cannot perform the operation. These are simply rules of the C language, intended to keep programming straightforward. That's it.
Consider what would happen when the expression ++(a+b) is evaluated, for example:
int a, b;
a = 10;
b = 20;
/* NOTE :
// step 1: the inner expression has to be evaluated before ++ can be applied to it
++ ( exp );
// in your case
++ ( 10 + 20 );
// step 2: the result of that would be incremented by one
++ ( 30 );
// here, you're applying the ++ operator to a plain value rather than to an object, which is an invalid use of ++
*/
++(a+b);

Strange C precedence evaluation

Can somebody explain what is happening with the precedence in this code? I've been trying to figure out what is happening by myself, but I couldn't handle it alone.
#include <stdio.h>
int main(void) {
int v[]={20,35,76,80};
int *a;
a=&v[1];
--(*++a);
printf("%d,%d,%d,%d\n",v[0],v[1],v[2],v[3]);
(*++a);
printf("%d\n", *a);
*a--=*a+1; // WHAT IS HAPPENING HERE?
printf("%d\n", *a);
printf("%d,%d,%d,%d\n",v[0],v[1],v[2],v[3]);
}
//OUTPUT
20,35,75,80
80
75
20,35,75,76
*a--=*a+1; // WHAT IS HAPPENING HERE?
What's happening is that the behavior is undefined.
6.5 Expressions
...
2 If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.
3 The grouping of operators and operands is indicated by the syntax. Except as specified
later, side effects and value computations of subexpressions are unsequenced.
C 2011 Online Draft (N1570)
The expressions *a-- and *a are unsequenced relative to each other. Except in a few cases, C does not guarantee that expressions are evaluated left to right; therefore, it's not guaranteed that *a-- is evaluated (and the side effect applied) before *a.
*a-- has a side effect - it updates a to point to the previous element in the sequence. *a + 1 is a value computation - it adds 1 to the value of what a currently points to.
Depending on the order in which *a-- and *a are evaluated and when the side effect of the -- operator is actually applied, you could be assigning the result of v[3] + 1 to v[3], or v[2] + 1 to v[3], or v[3] + 1 to v[2], or v[2] + 1 to v[2], or something else entirely.
Since the behavior is undefined, the compiler is not required to do anything in particular - it may issue a diagnostic and halt translation, it may issue a diagnostic and finish translation, or it may finish translation without a diagnostic. At runtime, the code may crash, you may get an unexpected result, or the code may work as intended.
I'm not going to explain the whole program; I'm going to focus on the "WHAT IS HAPPENING HERE" line. I think we can agree that before this line, the v[] array looks like this, with a pointing at v's last element:
     +----+----+----+----+
  v: | 20 | 35 | 75 | 80 |
     +----+----+----+----+
       0    1    2    3
                      ^
                    +-|-+
                 a: | * |
                    +---+
Now, we have
*a-- = *a+1;
It looks like this is going to assign something to where a points, and decrement a. So it looks like it will assign something to v[3], but leave a pointing at v[2].
And the value that gets assigned will evidently be the value that a points to, plus 1.
But the key question is, when we take *a+1 on the right-hand side, will it use the old or the new value of a, before or after the decrement on the left-hand side? It turns out this is a really, really hard question to answer.
If we take the value after the decrement, it'll be v[2], plus 1, or 76 that gets assigned to v[3]. It looks like that's how your compiler interpreted it. And this makes a certain amount of sense, because when we read from left to right, it's easy to imagine that by the time we get around to computing *a+1, the a-- has already happened.
Or, if we took the value before the decrement, it would be v[3], plus 1, or 81 that gets assigned to v[3]. And that's how it was interpreted by three different compilers I tried it on. And this makes a certain amount of sense, too, because of course assignments actually proceed from right to left, so it's easy to imagine that *a+1 happens before the a-- on the left-hand side.
So which compiler is correct, yours or mine, and which is wrong? This is where the answer gets a little strange, and/or surprising. The answer is that neither compiler is wrong. This is because it turns out that it's not just really hard to decide what should happen here, it is (by definition) impossible to figure out what happens here. The C standard does not define how this expression should behave. In fact, it goes one farther than not defining how this expression should behave: the C Standard explicitly says that this expression is undefined. So your compiler is right to put 76 in v[3], and my compilers are right to put 81. And since "undefined behavior" means that anything can happen, it wouldn't be wrong for a compiler to arrange to put some other number into v[3], or to end up assigning to something other than v[3].
So the other part of the answer is that you must not write code like this. You must not depend on undefined behavior. It will do different things under different compilers. It may do something completely unpredictable. It is impossible to understand, maintain, or explain.
It's pretty easy to detect when an expression is undefined due to order-of-evaluation ambiguity. There are two cases: (1) the same variable gets modified twice, as in x++ + x++. (2) The same variable gets modified in one place, and used in another, as in *a-- = *a+1.
It's worth noting that one of the three compilers I used said "eo.c:15: warning: unsequenced modification and access to 'a'", and another said "eo.c:15:5: warning: operation on ‘a’ may be undefined". If your compiler has an option to enable warnings like these, use it! (Under gcc it's -Wsequence-point or -Wall. Under clang, it's -Wunsequenced or -Wall.)
See John Bode's answer for the detailed language from the C Standard that makes this expression undefined. See also the canonical StackOverflow question on this topic, Why are these constructs (using ++) undefined behavior?
Not exactly sure which expression you have problems with. Postfix increment and decrement have the highest precedence. Prefix increment, prefix decrement, and dereference come next and group right to left. Addition and subtraction come after that.
But with regards to assignment, C does not specify order of evaluation (right to left or left to right).
Will the right-hand side of an expression always be evaluated first?
C does not specify which of the right hand side or left hand side of the = operator is evaluated first.
*a--=*a+1;
So it could be that your pointer a is decremented before or after it is dereferenced on the right-hand side.
In other words, depending on the compiler, this expression could behave like either:
*a = *a+1;
a--;
or
*a = *(a-1)+1;
a--;
I personally never rely too much on operator precedence in my code. It is more legible to either add parentheses or split the statement across several lines.
Unless you're building a compiler yourself and need to make a decision to what assembly code to generate.

Logic program: increment and arithmetic operator of C [duplicate]

This question already has answers here:
sequence points in c
Why are these constructs using pre and post-increment undefined behavior?
Can anyone help me solve this question?
void main()
{
int num, a=5;
num = -a-- + + ++a;
printf("%d %d\n", num, a);
}
Answer: 0 5
But how? Can anyone explain the logic behind?
Once upon a time I watched two aggressive Boston drivers trying to park in the same parking space, such that they actually crashed their cars into each other. Something very much like that happens here.
There are other things wrong with this code, but let's just focus on what gets stored into a. If we imagine that the expression a-- + ++a gets evaluated from left to right, the first thing that happens is the a-- part. a started out as 5, so a-- has the value of 5, with a note to store 4 into a. Imagine that this note is given to the driver of the first car for delivery.
Next we get to ++a. Assume for the moment that the message to store 4 into a hasn't gotten through yet. So ++a has the value 6, with a note to store 6 into a. Imagine that this note is given to the driver of the second car for delivery.
One big question is, what is the final value of num, but I'm not worried about that for now. Let's just ask, what is the final value of a? We've got one car driver trying to set it to 4, and one trying to set to 6. But there's only one parking space labeled a. Maybe the first driver gets there first and sets it to 4, or maybe the second driver gets there and sets it to 6. Or maybe they get there at the exact same time, and crash into each other, and a gets set to a twisted piece of metal from the second car's front bumper. (Perhaps that twisted piece of metal looks a little like the number 5, so printing 5 was the best that printf could do.)
As I said, there are other things wrong with this code. For one thing, we shouldn't have imagined that it gets evaluated from left to right; there's no guarantee about that at all. We also don't know if the ++a part should operate on the original value of a, or the value affected by a--. In fact, we have no idea what this expression should do, and the perhaps surprising fact is, the compiler doesn't know, either! Different compilers will interpret an expression like this differently, giving wildly different answers. None of the answers are right, and none are wrong, because this expression is undefined.
The expression is undefined because there are two different attempts inside it to assign a new value to a. (You'll find more formal definitions of undefined behavior in the linked answers.) So a simple rule for avoiding undefined behavior is, "don't try to set a variable's value twice in one expression".
You might be thinking that this expression ought to have a well-defined interpretation. You might be thinking that it should obviously be evaluated from left to right. You might be thinking that a-- should store the new value into a immediately, such that it would be guaranteed to be the value seen by "later" parts of the expression. You might think those things, and they might be true about some other programming language, but not C. C does not guarantee that expressions are evaluated strictly left to right, and it does not guarantee that a-- or ++a update a's value immediately. Trying to update a's value twice in one expression leads to undefined behavior, and undefined behavior means that anything can happen.

Undefined expression with a definite answer?

I'm trying to understand the answer to an interview question.
The following code is presented:
int x = 5;
int y = x++ * ++x;
What is the value of y?
The answers are presented as a list of multiple choice answers, one of them being 35.
Writing and running this code on my machine results in y being equal to 35.
Therefore I would expect to mark the answer to this question as 35.
However, isn't this expression undefined because of the x++? That is, the side effect (the actual incrementing of x) could happen at any time, depending on how the compiler chooses to compile the code.
Therefore, I would not have thought that you could say for certain that the answer is 35, as different compilers might produce different results.
Another possible multiple choice answer is 30, which I would have thought was also viable, i.e. if the post-increment side effect takes place just before the next sequence point.
I don't have the answer key, so it's hard to determine what would be the best answer to give.
Is this just a poor question or is the answer more obvious?
You are correct; the Standard does not impose any requirements at all on what this code might do.
If this was a multiple-choice question that required you to choose a specific defined answer, then — whoever wrote the question doesn't understand C as well as you do. Keep that in mind when you decide whether to accept an offer there. :-)
Undefined behavior means UNDEFINED, sometimes it could be the EXPECTED behavior, it doesn't mean it's DEFINED.
Perhaps they want you to answer what the expected behavior is, which is none since it's undefined behavior.
You are correct that this is undefined. However...
Often these questions are designed to make you think and, even though undefined by standard, the questioner may be looking to see if you:
Know what happens "in the real world" most of the time. (I'm not saying this is good, but I am saying that you will find constructs this bad and worse in codebases that you may be asked to take over and maintain.)
Know about operator precedence and order of operations.

Is *++*p acceptable syntax?

In K&R Section 5.10, in their sample implementation of a grep-like function, there are these lines:
while (--argc > 0 && (*++argv)[0] == '-')
while (c = *++argv[0])
Understanding the syntax there was one of the most challenging things for me, and even now a couple weeks after viewing it for the first time, I still have to think very slowly through the syntax to make sense of it. I compiled the program with this alternate syntax, but I'm not sure that the second line is allowable. I've just never seen *'s and ++'s interleaved like this, but it makes sense to me, it compiles, and it runs. It also requires no parentheses or brackets, which is maybe part of why it seems more clear to me. I just read the operators in one direction only (right to left) rather than bouncing back and forth to either side of the variable name.
while (--argc > 0 && **++argv == '-')
while (c = *++*argv)
Well, for one, that's one way to make anyone reading your code go huh?!?!?!
So, from a readability standpoint, no, you probably shouldn't write code like that.
Nevertheless, it's valid code and breaks down as this:
*(++(*p))
First, p is dereferenced. Then the object it points to (*p) is incremented. Then the incremented value is dereferenced again.
To make things worse, this line:
while (c = *++*argv)
has an assignment in the loop-condition. So now you have two side-effects to make your reader's head spin. YAY!!!
Seems valid to me. Of course, you should not read it left to right; that's not how a C compiler parses the source, and that's not how the C language grammar works. As a rule of thumb, you should first locate the object that's being operated on (in this case, argv), and then analyze the operators, often, like in this case, from inside (the object) to outside. The actual parsing (and reading) rules are of course more complicated.
P. S. And personally, I think this line of code is really not hard to understand (and I'm not a C programming guru), so I don't think you should surround it with parentheses as Mysticial suggests. That would only make the code look big, if you know what I mean...
There's no ambiguity, even without knowledge of the precedence rules.
Both ++ and * are prefix unary operators; they can only apply to an operand that follows them. The second * can only apply to argv, the ++ to *argv, and the first * to ++*argv. So it's equivalent to *(++(*argv)). There's no possible relationship between the precedences of ++ and * that could make it mean anything else.
This is unlike something like *argv++, which could conceivably be either (*argv)++ or *(argv++), and you have to apply precedence rules to determine which (it's *(argv++) because postfix operators bind more tightly than prefix unary operators).
There's a constraint that ++ can only be applied to an lvalue; since *argv is an lvalue, that's not a problem.
Is this code valid? Yes, but that's not what you asked.
Is this code acceptable? That depends (acceptable to who?).
I wouldn't consider it acceptable - I'd consider it "harder to read than necessary" for a few different reasons.
First; lots of programmers have to work with several different languages, potentially with different operator precedence rules. If your code looks like it relies on a specific language's operator precedence rules (even if it doesn't) then people have to stop and try to remember which rules apply to which language.
Second; different programmers have different skill levels. If you're ever working in a large team of developers you'll find that the best programmers write code that everyone can understand, and the worst programmers write code that contains subtle bugs that half of the team can't spot. Most C programmers should understand "*++*argv", but a good programmer knows that a small number of "not-so-good" programmers either won't understand it or will take a while to figure it out.
Third; out of all the different ways of writing something, you should choose the variation that expresses your intent the best. For this code you're working with an array, and therefore it should look like you intend to be working with an array (and not a pointer). Note: For the same reason, "uint32_t foo = 0x00000002;" is better than "uint32_t foo = 0x02;".

Resources