Understanding the difference between ++i and i++ at the Assembly Level - c

I know that variations of this question have been asked here multiple times, but I'm not asking what the difference between the two is. I would just like some help understanding the assembly behind both forms.
I think my question is more related to the whys than to the what of the difference.
I'm reading Prata's C Primer Plus and in the part dealing with the increment operator ++ and the difference between using i++ or ++i the author says that if the operator is used by itself, such as ego++; it doesn't matter which form we use.
If we look at the disassembly of the following code (compiled with Xcode, Apple LLVM version 9.0.0 (clang-900.0.39.2)):
int main(void)
{
int a = 1, b = 1;
a++;
++b;
return 0;
}
we can see that indeed the form used doesn't matter, since the assembly code is the same for both (both variables would print out a 2 to the screen).
Initialization of a and b:
0x100000f8d <+13>: movl $0x1, -0x8(%rbp)
0x100000f94 <+20>: movl $0x1, -0xc(%rbp)
Assembly for a++:
0x100000f9b <+27>: movl -0x8(%rbp), %ecx
0x100000f9e <+30>: addl $0x1, %ecx
0x100000fa1 <+33>: movl %ecx, -0x8(%rbp)
Assembly for ++b:
0x100000fa4 <+36>: movl -0xc(%rbp), %ecx
0x100000fa7 <+39>: addl $0x1, %ecx
0x100000faa <+42>: movl %ecx, -0xc(%rbp)
Then the author states that when the operator and its operand are part of a larger expression, for example in an assignment statement, the choice of prefix or postfix does make a difference.
For example:
int main(void)
{
int a = 1, b = 1;
int c, d;
c = a++;
d = ++b;
return 0;
}
This would print 1 and 2 for c and d, respectively.
And:
Initialization of a and b:
0x100000f46 <+22>: movl $0x1, -0x8(%rbp)
0x100000f4d <+29>: movl $0x1, -0xc(%rbp)
Assembly for c = a++; :
0x100000f54 <+36>: movl -0x8(%rbp), %eax // eax = a = 1
0x100000f57 <+39>: movl %eax, %ecx // ecx = 1
0x100000f59 <+41>: addl $0x1, %ecx // ecx = 2
0x100000f5c <+44>: movl %ecx, -0x8(%rbp) // a = 2
0x100000f5f <+47>: movl %eax, -0x10(%rbp) // c = eax = 1
Assembly for d = ++b; :
0x100000f62 <+50>: movl -0xc(%rbp), %eax // eax = b = 1
0x100000f65 <+53>: addl $0x1, %eax // eax = 2
0x100000f68 <+56>: movl %eax, -0xc(%rbp) // b = eax = 2
0x100000f6b <+59>: movl %eax, -0x14(%rbp) // d = eax = 2
Clearly the assembly code is different for the assignments:
The form c = a++; includes the use of the registers eax and ecx. It uses ecx for performing the increment of a by 1, but uses eax for the assignment.
The form d = ++b; uses eax for both the increment of b by 1 and the assignment.
My question is:
Why is that?
What determines that c = a++; requires two registers instead of just one (ecx for example)?

In the following statements:
a++;
++b;
the values of the expressions a++ and ++b are not used. Here the compiler is only interested in the side effects of these operators (i.e., incrementing the operand by one). In this context, both operators behave the same way, so it's no wonder that these statements result in the same assembly code.
However, in the following statements:
c = a++;
d = ++b;
the values of the expressions a++ and ++b are relevant to the compiler because they have to be stored in c and d, respectively:
d = ++b;: b is incremented and the result of this increment assigned to d.
c = a++; : the value of a is first assigned to c and then a is incremented.
Therefore, these operators behave differently in this context. So, it would make sense to result in different assembly code, at least in the beginning, without more aggressive optimizations enabled.
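As a rough illustration of the two behaviours described above, here is a minimal sketch in plain C. The helper names post_increment and pre_increment are hypothetical, not part of the language or of the question; they only spell out which value each expression yields and when the side effect happens.

```c
/* Hypothetical helpers that spell out what each operator yields. */

/* Models `*p++` on an int object: yields the ORIGINAL value. */
int post_increment(int *p)
{
    int old = *p;   /* remember the original value */
    *p = old + 1;   /* side effect: store the incremented value */
    return old;     /* the expression's value is the original */
}

/* Models `++*p` on an int object: yields the NEW value. */
int pre_increment(int *p)
{
    *p = *p + 1;    /* side effect: store the incremented value */
    return *p;      /* the expression's value is the new value */
}
```

With a and b both starting at 1, c = post_increment(&a) leaves c at 1 and a at 2, while d = pre_increment(&b) leaves both d and b at 2. When the return value is ignored, the two helpers have the same observable effect.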

A good compiler would replace this whole code with c = 1; d = 2;. And if those variables aren't used in turn, the whole program is one big NOP - there should be no machine code generated at all.
But you do get machine code, so you are not enabling the optimizer correctly. Discussing the efficiency of non-optimized C code is quite pointless.
Discussing a particular compiler's failure to optimize the code might be meaningful, if a specific compiler is mentioned. Which isn't the case here.
All this code shows is that your compiler isn't doing a good job, possibly because you didn't enable optimizations, and that's it. No other conclusions can be made. In particular, no meaningful discussion about the behavior of i++ versus ++i is possible.

Your test has flaws: the compiler optimized your code by replacing your values with ones that could be easily predicted.
The compiler can, and will, calculate the result in advance during compilation and avoid the use of 'jmp' instructions (jumping back to the top of the while loop each time the condition is still true).
If you try this code:
int a = 0;
int i = 0;
while (i++ < 10)
{
a += i;
}
The assembly will not use a single jmp instruction.
It will directly assign the value of ½ n (n + 1), here (0.5 * 10 * 11) = 55, to the register holding the value of the 'a' variable.
You would have the following assembly output:
mov eax, 55 ; a register
mov ecx, 11 ; i register, this line only if i is still used after.
Whether you write :
int i = 0;
while (i++ < 10)
{
...
}
or
int i = -1;
while (++i < 11)
{
...
}
will also result in the same assembly output.
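To check by hand what an optimizer could fold these loops into, here is a small sketch using the a += i body from the first example. The function names and the final_i out-parameter are hypothetical, added only so the results can be inspected.

```c
/* Postfix form: the body sees i = 1..10. */
int sum_postfix(int *final_i)
{
    int a = 0;
    int i = 0;
    while (i++ < 10) {
        a += i;
    }
    *final_i = i;   /* the last, failing test still increments i */
    return a;
}

/* Prefix form: the body sees i = 0..10, which adds the same total. */
int sum_prefix(int *final_i)
{
    int a = 0;
    int i = -1;
    while (++i < 11) {
        a += i;
    }
    *final_i = i;
    return a;
}

/* Closed form n(n+1)/2 that a compiler can substitute for the loop. */
int closed_form(int n)
{
    return n * (n + 1) / 2;
}
```

Both loops return 55 and leave i at 11, matching closed_form(10), so for this particular body there is nothing left to distinguish the prefix and postfix spellings once the loop is folded to constants.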
If you had much more complex code you would be able to witness differences in the assembly code.
a = ++i;
would translate into :
inc rcx ; increase i by 1, RCX holds the current value of both the a and i variables.
mov rax, rcx ; a = i;
and a = i++; into :
lea rax, [rcx+1] ; RAX now holds i, RCX now holds a.
mov rax, rcx ; a = i;
inc rcx ; increase i by 1
(edit: See comment below)

Both the expressions ++i and i++ have the effect of incrementing i. The difference is that ++i produces a result (a value stored somewhere, for example in a machine register, that can be used within other expressions) equal to the new value of i, whereas i++ produces a result equal to the original value of i.
So, assuming we start with i having a value of 2, the statement
b = ++i;
has the effect of setting both b and i equal to 3, whereas
b = i++;
has the effect of setting b equal to 2 and i equal to 3.
In the first case, there is no need to keep track of the original value of i after incrementing i whereas in the second there is. One way of doing this is for the compiler to employ an additional register for i++ compared with ++i.
This is not needed for a trivial expression like
i++;
since the compiler can immediately detect that the original value of i will not be used (i.e. is discarded).
For simple expressions like b = i++ the compiler could - in principle at least - avoid using an additional register, by simply storing the original value of i in b before incrementing i. However, in slightly more complex expressions such as
c = i++ - *p++; // p is a pointer
it can be much more difficult for the compiler to eliminate the need to store old and new values of i and p (unless, of course, the compiler looks ahead and determines how (or if) c, i, and p (and *p) are being used in subsequent code). In more complex expressions (involving multiple variables and interacting operations) the analysis needed can be significant.
It then comes down to implementation choices by developers/designers of the compiler. Practically, compiler vendors compete pretty heavily on compilation time (getting compilation times as small as possible) and, in doing so, may choose not to do all possible code transformations that remove unneeded uses of temporaries (or machine registers).
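To make the "old and new values" point concrete, here is a sketch of c = i++ - *p++; written out with explicit temporaries. The function name sub_and_advance and the locals old_i and old_p are hypothetical. The order of the two stores is just one choice a compiler may make, since C does not sequence the two side effects relative to each other.

```c
/* Expanded form of `c = i++ - *p++;` with hypothetical names.
 * i and p are passed by address so the side effects are visible. */
int sub_and_advance(int *i, const int **p)
{
    int old_i = *i;          /* value yielded by i++ */
    const int *old_p = *p;   /* value yielded by p++ */

    *i = old_i + 1;          /* side effect of i++ */
    *p = old_p + 1;          /* side effect of p++ */

    return old_i - *old_p;   /* computed from the ORIGINAL values */
}
```

Each temporary is a value the compiler has to keep alive (typically in a register) until the subtraction is done, unless it can reorder the stores after the subtraction.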

You compiled with optimization disabled! For gcc and LLVM, that means each C statement is compiled independently, so you can modify variables in memory with a debugger, and even jump to a different source line. To support this, the compiler can't optimize between C statements at all, and in fact spills / reloads everything between statements.
So the major flaw in your analysis is that you're looking at an asm implementation of that statement where the inputs and outputs are memory, not registers. This is totally unrealistic: compilers keep most "hot" values in registers inside inner loops, and don't need separate copies of a value just because it's assigned to multiple C variables.
Compilers generally (and LLVM in particular, I think) transform the input program into an SSA (Static Single Assignment) internal representation. This is how they track data flow, not according to C variables. (This is why I said "hot values", not "hot variables". A loop induction variable might be totally optimized away into a pointer-increment / compare against end_pointer in a loop over arr[i++]).
c = ++i; produces one value with 2 references to it (one for c, one for i). The result can stay in a single register. If it doesn't optimize into part of some other operation, the asm implementation could be as simple as inc %ecx, with the compiler just using ecx/rcx everywhere that c or i is read before the next modification of either. If the next modification of c can't be done non-destructively (e.g. with a copy-and-modify like lea (,%rcx,4), %edx or shrx %eax, %ecx, %edx), then a mov instruction to copy the register will be emitted.
d = b++; produces one new value, and makes d a reference to the old value of b. It's syntactic sugar for d=b; b+=1;, and compiles into SSA the same as that would. x86 has a copy-and-add instruction, called lea. The compiler doesn't care which register holds which value (except in loops, especially without unrolling, when the end of the loop has to have values in the right registers to jump to the beginning of the loop). But other than that, the compiler can do lea 1(%rbx), %edx to leave %ebx unmodified and make EDX hold the incremented value.
An additional minor flaw in your test is that with optimization disabled, the compiler is trying to compile quickly, not well, so it doesn't look for all possible peephole optimizations even within the statement that it does allow itself to optimize.
If the value of c or d is never read, then it's the same as if you had never done the assignment in the first place. (In un-optimized code, every value is implicitly read by the memory barrier between statements.)
What determines that c = a++; requires two registers instead of just one (ecx for example)?
The surrounding code, as always. +1 can be optimized into other operations, e.g. done with an LEA as part of a shift and/or add. Or built in to an addressing mode.
Or before/after negation, use the 2's complement identity that -x == ~x+1, and use NOT instead of NEG. (Although often you're adding the negated value to something, so it turns into a SUB instead of NEG + ADD, so there isn't a stand-alone NEG you can turn into a NOT.)
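The 2's complement identity mentioned above can be checked with a short sketch. The function names are hypothetical, and unsigned arithmetic is used here so that wraparound is well defined in C.

```c
/* -x computed as ~x + 1 (NOT followed by an increment). */
unsigned negate_with_not(unsigned x)
{
    return ~x + 1u;
}

/* -x computed directly as 0 - x (what NEG does). */
unsigned negate_plain(unsigned x)
{
    return 0u - x;
}
```

Both functions return the same bit pattern for every input, which is why a +1 next to a negation can be merged into a single NOT.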
++ prefix or postfix is too simple to look at on its own; you always have to consider where the input comes from (does the incremented value have to end up back in memory right away or eventually?) and how the incremented and original values are used.
Basically, un-optimized code is un-interesting. Look at optimized code for short functions. See Matt Godbolt's talk at CppCon2017: “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”, and also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler asm output.

Related

How do I translate an optimized x86-64 asm loop back to a C for loop?

I have the following:
foo:
movl $0, %eax //result = 0
cmpq %rsi, %rdi // rdi = x, rsi = y?
jle .L2
.L3:
addq %rdi, %rax //result = result + i?
subq $1, %rdi //decrement?
cmp %rdi, rsi
jl .L3
.L2
rep
ret
And I'm trying to translate it to:
long foo(long x, long y)
{
long i, result = 0;
for (i= ; ; ){
//??
}
return result;
}
I don't know what cmpq %rsi, %rdi means.
Why isn't there another %eax for long i?
I would love some help in figuring this out. I don't know what I'm missing - I've been going through my notes, textbook, and the rest of the internet and I am stuck. It's a review question, and I've been at it for hours.
Assuming this is a function taking 2 parameters. Assuming this is using the gcc amd64 calling convention, it will pass the two parameters in rdi and rsi. In your C function you call these x and y.
long foo(long x /*rdi*/, long y /*rsi*/)
{
//movl $0, %eax
long result = 0; /* rax */
//cmpq %rsi, %rdi
//jle .L2
if (x > y) {
do {
//addq %rdi, %rax
result += x;
//subq $1, %rdi
--x;
//cmp %rdi, rsi
//jl .L3
} while (x > y);
}
return result;
}
I don't know what cmpq %rsi, %rdi mean
That's AT&T syntax for cmp rdi, rsi. https://www.felixcloutier.com/x86/CMP.html
You can look up the details of what a single instruction does in an ISA manual.
More importantly, cmp/jcc like cmp %rsi,%rdi/jl is like jump if rdi<rsi.
Assembly - JG/JNLE/JL/JNGE after CMP. If you go through all the details of how cmp sets flags, and which flags each jcc condition checks, you can verify that it's correct, but it's much easier to just use the semantic meaning of JL = Jump on Less-than (assuming flags were set by a cmp) to remember what they do.
(It's reversed because of AT&T syntax; jcc predicates have the right semantic meaning for Intel syntax. This is one of the major reasons I usually prefer Intel syntax, but you can get used to AT&T syntax.)
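For anyone who does want to go through the flag details once, here is a sketch that models AT&T cmp src, dst followed by jl on 64-bit operands. The function name jl_taken is hypothetical. It computes the sign and overflow flags of dst - src by hand and applies the JL condition SF != OF.

```c
#include <stdint.h>

/* Models AT&T `cmp src, dst` followed by `jl`.
 * Returns 1 if the jump would be taken, 0 otherwise. */
int jl_taken(int64_t src, int64_t dst)
{
    uint64_t d = (uint64_t)dst;
    uint64_t s = (uint64_t)src;
    uint64_t diff = d - s;                  /* cmp computes dst - src */

    int sf = (int)(diff >> 63);             /* sign flag: top bit of result */
    /* Overflow flag for subtraction: the operands have different signs
     * and the result's sign differs from dst's sign. */
    int of = (int)(((d ^ s) & (d ^ diff)) >> 63);

    return sf != of;                        /* JL: SF != OF */
}
```

The result agrees with the plain signed comparison dst < src, including at the extremes where the subtraction overflows, which is the "semantic meaning" shortcut described above.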
From the use of rdi and rsi as inputs (reading them without / before writing them), they're the arg-passing registers. So this is the x86-64 System V calling convention, where integer args are passed in RDI, RSI, RDX, RCX, R8, R9, then on the stack. (What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 covers function calls as well as system calls). The other major x86-64 calling convention is Windows x64, which passes the first 2 args in RCX and RDX (if they're both integer types).
So yes, x=RDI and y=RSI. And yes, result=RAX. (writing to EAX zero-extends into RAX).
From the code structure (not storing/reloading every C variable to memory between statements), it's compiled with some level of optimization enabled, so the for() loop turned into a normal asm loop with the conditional branch at the bottom. Why are loops always compiled into "do...while" style (tail jump)? (@BrianWalker's answer shows the asm loop transliterated back to C, with no attempt to form it back into an idiomatic for loop.)
From the cmp/jcc ahead of the loop, we can tell that the compiler can't prove the loop runs a non-zero number of iterations. So whatever the for() loop condition is, it might be false the first time. (That's unsurprising given signed integers.)
Since we don't see a separate register being used for i, we can conclude that optimization reused another var's register for i. Like probably for(i=x;, and then with the original value of x being unused for the rest of the function, it's "dead" and the compiler can just use RDI as i, destroying the original value of x.
I guessed i=x instead of y because RDI is the arg register that's modified inside the loop. We expect that the C source modifies i and result inside the loop, and presumably doesn't modify its input variables x and y. It would make no sense to do i=y and then do stuff like x--, although that would be another valid way of decompiling.
cmp %rdi, %rsi / jl .L3 means the loop condition to (re)enter the loop is rsi-rdi < 0 (signed), or y<i, i.e. i>y.
The cmp/jcc before the loop is checking the opposite condition; notice that the operands are reversed and it's checking jle, i.e. jng. So that makes sense, it really is same loop condition peeled out of the loop and implemented differently. Thus it's compatible with the C source being a plain for() loop with one condition.
sub $1, %rdi is obviously i-- or --i. We can do that inside the for(), or at the bottom of the loop body. The simplest and most idiomatic place to put it is in the 3rd section of the for(;;) statement.
addq %rdi, %rax is obviously adding i to result. We already know what RDI and RAX are in this function.
Putting the pieces together, we arrive at:
long foo(long x, long y)
{
long i, result = 0;
for (i= x ; i>y ; i-- ){
result += i;
}
return result;
}
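As a quick sanity check that the idiomatic for loop and the literal if + do/while transliteration of the asm agree, here is a sketch with both side by side. The names foo_for and foo_do_while are hypothetical, chosen only to keep the two versions apart.

```c
/* Idiomatic reconstruction: for loop counting down from x to y+1. */
long foo_for(long x, long y)
{
    long i, result = 0;
    for (i = x; i > y; i--) {
        result += i;
    }
    return result;
}

/* Literal shape of the asm: one check before the loop,
 * then a loop with the conditional branch at the bottom. */
long foo_do_while(long x, long y)
{
    long result = 0;
    if (x > y) {
        do {
            result += x;
            --x;
        } while (x > y);
    }
    return result;
}
```

For example, both return 12 (5 + 4 + 3) for x = 5, y = 2, and both return 0 when x is not greater than y.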
Which compiler made this code?
From the .L3: label names, this looks like output from gcc. (Which somehow got corrupted, removing the : from .L2, and more importantly removing the % from %rsi in one cmp. Make sure you copy/paste code into SO questions to avoid this.)
So it may be possible with the right gcc version/options to get exactly this asm back out for some C input. It's probably gcc -O1, because movl $0, %eax rules out -O2 and higher (where GCC would look for the xor %eax,%eax peephole optimization for zeroing a register efficiently). But it's not -O0 because that would be storing/reloading the loop counter to memory. And -Og (optimize a bit, for debugging) likes to use a jmp to the loop condition instead of a separate cmp/jcc to skip the loop. This level of detail is basically irrelevant for simply decompiling to C that does the same thing.
The rep ret is another sign of gcc; gcc7 and earlier used this in their default tune=generic output for ret that's reached as a branch target or a fall-through from a jcc, because of AMD K8/K10 branch prediction. What does `rep ret` mean?
gcc8 and later will still use it with -mtune=k8 or -mtune=barcelona. But we can rule that out because that tuning option would use dec %rdi instead of subq $1, %rdi. (Only a few modern CPUs have any problems with inc/dec leaving CF unmodified, for register operands. INC instruction vs ADD 1: Does it matter?)
gcc4.8 and later put rep ret on the same line. gcc4.7 and earlier print it as you've shown, with the rep prefix on the line before.
gcc4.7 and later like to put the initial branch before the mov $0, %eax, which looks like a missed optimization. It means they need a separate return 0 path out of the function, which contains another mov $0, %eax.
gcc4.6.4 -O1 reproduces your output exactly, for the source shown above, on the Godbolt compiler explorer
# compiled with gcc4.6.4 -O1 -fverbose-asm
foo:
movl $0, %eax #, result
cmpq %rsi, %rdi # y, x
jle .L2 #,
.L3:
addq %rdi, %rax # i, result
subq $1, %rdi #, i
cmpq %rdi, %rsi # i, y
jl .L3 #,
.L2:
rep
ret
So does this other version which uses i=y. Of course there are many things we could add that would optimize away, like maybe i=y+1 and then having a loop condition like x>--i. (Signed overflow is undefined behaviour in C, so the compiler can assume it doesn't happen.)
// also the same asm output, using i=y but modifying x in the loop.
long foo2(long x, long y) {
long i, result = 0;
for (i= y ; x>i ; x-- ){
result += x;
}
return result;
}
In practice the way I actually reversed this:
I copy/pasted the C template into Godbolt (https://godbolt.org/). I could see right away (from the mov $0 instead of xor-zero, and from the label names) that it looked like gcc -O1 output, so I put in that command line option and picked an old-ish version of gcc like gcc6. (Turns out this asm was actually from a much older gcc).
I tried an initial guess like x<y based on the cmp/jcc, and i++ (before I'd actually read the rest of the asm carefully at all), because for loops often use i++. The trivial-looking infinite-loop asm output showed me that was obviously wrong :P
I guessed that i=x, but after taking a wrong turn with a version that did result += x but i--, I realized that i was a distraction and at first simplified by not using i at all. I just used x-- while first reversing it because obviously RDI=x. (I know the x86-64 System V calling convention well enough to see that instantly.)
After looking at the loop body, the result += x and x-- were totally obvious from the add and sub instructions.
cmp/jl was obviously a something < something loop condition involving the 2 input vars.
I wasn't sure if it was x<y or y<x, and newer gcc versions were using jne as the loop condition. I think at that point I cheated and looked at Brian's answer to check it really was x > y, instead of taking a minute to work through the actual logic. But once I had figured out it was x--, only x>y made sense. The other one would be true until wraparound if it entered the loop at all, but signed overflow is undefined behaviour in C.
Then I looked at some older gcc versions to see if any made asm more like in the question.
Then I went back and replaced x with i inside the loop.
If this seems kind of haphazard and slapdash, that's because this loop is so tiny that I didn't expect to have any trouble figuring it out, and I was more interested in finding source + gcc version that exactly reproduced it, rather than the original problem of just reversing it at all.
(I'm not saying beginners should find it that easy, I'm just documenting my thought process in case anyone's curious.)

Assembly: CMOVB instruction in Intel x86-64 assembly

I'm a little confused about what "cmovb" does in this assembly code
leal (%rsi, %rsi), %eax // %eax <- %rsi + %rsi
cmpl %esi, %edi // compare %edi and %esi
cmovb %edi, %eax
ret
and the C code for this is:
int foo(unsigned int a, unsigned int b)
{
if(a < b)
return a;
else
return 2*b;
}
Can anyone help me understand how cmovb works here?
Like Jester commented to the question, the cmov* family of instructions are conditional moves, paired via the flags register with a previous (comparison) operation.
You can use for example the Intel documentation as a reference for the x86-64/AMD64 instruction set. The conditional move instructions are shown on page 172 of the combined volume.
cmovb, cmovnae, and cmovc all perform the same way: If the carry flag is set, they move the source operand to the destination operand. Otherwise they do nothing.
If we then look at the preceding instructions that affect flags, we'll see that the cmp instruction (the l suffix is part of AT&T syntax, and means the arguments are "longs") changes the set of flags depending on the difference between the two arguments. In particular, if the second is smaller than the first (in AT&T syntax), the carry flag is set, otherwise the carry flag is cleared; just as if a subtraction was performed without storing the result anywhere. (The cmp instruction affects other flags as well, but they are ignored by the code.)
CMOVB = Conditional MOVe if Below (carry flag set). It literally does what it says: if the condition is met, then move. The condition is a<b and the value moved is a.
The ABI returns the value in %eax, so the code first stores 2*b there and then conditionally overwrites it with a.
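Here is a sketch that walks through the three instructions from the question one at a time in C. The function name foo_model and the locals eax and carry are hypothetical stand-ins for the register and the carry flag.

```c
/* Step-by-step model of:
 *   leal  (%rsi,%rsi), %eax
 *   cmpl  %esi, %edi
 *   cmovb %edi, %eax
 * with a in %edi and b in %esi. */
unsigned foo_model(unsigned a, unsigned b)
{
    unsigned eax = b + b;   /* lea: eax = 2*b, flags untouched */
    int carry = a < b;      /* cmp: CF is set when edi < esi (unsigned) */
    if (carry) {
        eax = a;            /* cmovb: copy edi into eax only if CF is set */
    }
    return eax;             /* return value lives in eax */
}
```

So foo_model(1, 2) yields 1 (a is below b), while foo_model(3, 2) and foo_model(2, 2) both yield 4 (the conditional move does nothing and 2*b survives).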

Understanding gcc output for if (a>=3)

I thought since condition is a >= 3, we should use jl (less).
But gcc used jle (less or equal).
It make no sense to me; why did the compiler do this?
You're getting mixed up by a transformation the compiler made on the way from the C source to the asm implementation. gcc's output implements your function this way:
a = 5;
if (a<=2) goto ret0;
return 1;
ret0:
return 0;
It's all clunky and redundant because you compiled with -O0, so it stores a to memory and then reloads it, so you could modify it with a debugger if you set a breakpoint and still have the code "work".
See also How to remove "noise" from GCC/clang assembly output?
Compilers generally prefer to reduce the magnitude of a comparison constant, so it's more likely to fit in a sign-extended 8-bit immediate instead of needing a 32-bit immediate in the machine code.
We can get some nice compact code by writing a function that takes an arg, so it won't optimize away when we enable optimizations.
int cmp(int a) {
return a>=128; // In C, a boolean converts to int as 0 or 1
}
gcc -O3 on Godbolt, targeting the x86-64 ABI (same as your code):
xorl %eax, %eax # whole RAX = 0
cmpl $127, %edi
setg %al # al = (edi>127) : 1 : 0
ret
So it transformed a >=128 into a >127 comparison. This saves 3 bytes of machine code, because cmp $127, %edi can use the cmp $imm8, r/m32 encoding (cmp r/m32, imm8 in Intel syntax in Intel's manual), but 128 would have to use cmp $imm32, r/m32.
BTW, comparisons and conditions make sense in Intel syntax, but are backwards in AT&T syntax. For example, cmp edi, 127 / jg is taken if edi > 127.
But in AT&T syntax, it's cmp $127, %edi, so you have to mentally reverse the operands or think of a > instead of <
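The constant-shrinking transformation can be sketched in C as two equivalent predicates plus a helper that says whether a constant fits in a sign-extended 8-bit immediate. The names cmp_ge128, cmp_gt127 and fits_simm8 are hypothetical.

```c
#include <stdint.h>

/* The comparison as written in the source. */
int cmp_ge128(int a)
{
    return a >= 128;
}

/* The comparison as the compiler emits it: same truth table for ints. */
int cmp_gt127(int a)
{
    return a > 127;
}

/* 1 if v can be encoded as a sign-extended 8-bit immediate. */
int fits_simm8(int32_t v)
{
    return v >= -128 && v <= 127;
}
```

Here fits_simm8(127) is 1 and fits_simm8(128) is 0, which is why rewriting >= 128 as > 127 allows the shorter encoding. The same reasoning turns a >= 3 into a > 2, i.e. a jle 2 to skip the taken path, even though both constants fit in 8 bits there.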
The assembly code is comparing a to two, not three. That's why it uses jle. If a is less than or equal to two it logically follows that a IS NOT greater than or equal to 3, and therefore 0 should be returned.

Assembly language - How it works

I am really new at learning assembly language and just started digging in to it so I was wondering if maybe some of you guys could help me figure one problem out. I have a homework assignment which tells me to compare assembly language instructions to c code and tell me which c code is equivalent to the assembly instructions. So here is the assembly instructions:
pushl %ebp // What i think is happening here is that we are creating more space for the function.
movl %esp,%ebp // Here i think we are moving the stack pointer to the old base pointer.
movl 8(%ebp),%edx // Here we are taking parameter int a and storing it in %edx
movl 12(%ebp),%eax // Here we are taking parameter int b and storing it in %eax
cmpl %eax,%edx // Here i think we are comparing int a and b ( b > a ) ?
jge .L3 // Jump to .L3 if b is greater than a - else continue the instructions
movl %edx,%eax // If the term is not met here it will return b
.L3:
movl %ebp,%esp // Starting to finish the function
popl %ebp // Putting the base pointer in the right place
ret // return
I am trying to comment it based on my understanding of this, but I might be totally wrong. The candidate C functions, one of which is supposed to be equivalent to the assembly, are:
int fun1(int a, int b)
{
unsigned ua = (unsigned) a;
if (ua < b)
return b;
else
return ua;
}
int fun2(int a, int b)
{
if (b < a)
return b;
else
return a;
}
int fun3(int a, int b)
{
if (a < b)
return a;
else
return b;
}
I think the correct answer is fun3 .. but I'm not quite sure.
First off, welcome to StackOverflow. Great place, really it is.
Now for starters, let me help you; a lot; a whole lot.
You have good comments that help both you and me and everyone else tremendously, but they are so ugly that reading them is painful.
Here's how to fix that: white space, lots of it, blank lines, and grouping the instructions into small groups that are related to each other.
More to the point, after a conditional jump, insert one blank line, after an absolute jump, insert two blank lines. (Old tricks, work great for readability)
Secondly, line up the comments so that they are neatly arranged. It looks a thousand times better.
Here's your stuff, with 90 seconds of text arranging by me. Believe me, the professionals will respect you a thousand times better with this kind of source code...
pushl %ebp // What i think is happening here is that we are creating more space for the function.
movl %esp,%ebp // Here i think we are moving the stack pointer to the old base pointer.
movl 8(%ebp),%edx // Here we are taking parameter int a and storing it in %edx
movl 12(%ebp),%eax // Here we are taking parameter int b and storing it in %eax
cmpl %eax,%edx // Here i think we are comparing int a and b ( b > a ) ?
// No, Think like this: "What is the value of edx with respect to the value of eax ?"
jge .L3 // edx is greater, so return the value in eax as it is
movl %edx,%eax // If the term is not met here it will return b
// (pssst, I think you're wrong; think it through again)
.L3:
movl %ebp,%esp // Starting to finish the function
popl %ebp // Putting the base pointer in the right place
ret // return
Now, back to your problem at hand. What he's getting at is the "sense" of the compare instruction and the related JGE instruction.
Here's the confuse-o-matic stuff you need to comprehend to survive these sorts of "academic experiences"
This biz, the cmpl %eax,%edx instruction, is one of the forms of the "compare" instructions
Try to form an idea something like this when you see that syntax, "...What is the value of the destination operand with respect to the source operand ?..."
Caveat: I am absolutely no good with the AT&T syntax, so anybody is welcome to correct me on this.
Anyway, in this specific case, you can phrase the idea in your mind like this...
"...I see cmpl %eax,%edx so I think: With respect to eax, the value in edx is..."
You then complete that sentence in your mind with the "sense" of the next instruction which is a conditional jump.
The paradigmatic process in the human brain works out to form a sentence like this...
"...With respect to eax, the value in edx is greater or equal, so I jump..."
So, if you are correct about the locations of a and b, then you can do the paradigmatic brain scrambler and get something like this...
"...With respect to the value in b, that value in a is greater or equal, so I will jump..."
To get a grasp of this, take note that JGE is the "opposite sense" if you will, of JL (i.e., "Jump if less than")
Okay, now it so happens that return in C is related to the ret instruction in assembly language, but it isn't the same thing.
When C programmers say "...That function returns an int..." what they mean is...
The assembly language subroutine will place a value in Eax
The subroutine will then fix the stack and put it back in neat order
The subroutine will then execute its Ret instruction
One more item of obfuscation is thrown in your face now.
These following conditional jumps are applicable to Signed arithmetic comparison operations...
JG
JGE
JNG
JL
JLE
JNL
There it is ! The trap waiting to screw you up in all this !
Do you want to do signed or unsigned compares ???
By the way, I've never seen anybody do anything like that first function where an unsigned number is compared with a signed number. Is that even legal ?
So anyway, we put all these facts together, and we get: This assembly language routine returns the value in a if it is less than the value in b otherwise it returns the value in b.
These values are evaluated as signed integers.
(I think I got that right; somebody check my logic. I really don't like that assembler's syntax at all.)
So anyway, I am reasonably certain that you don't want to ask people on the internet to provide you with the specific answer to your specific homework question, so I'll leave it up to you to figure it out from this explanation.
Hopefully, I have explained enough of the logic and the "sense" of comparisons and the signed and unsigned biz so that you can get your brain around this.
Oh, and disclaimer again, I always use the Intel syntax (e.g., Masm, Tasm, Nasm, whatever) so if I got something backwards here, feel free to correct it for me.
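For readers who want to test their reading of the listing, here is a sketch that models the instructions literally, plus two helpers contrasting a signed comparison with the mixed one in fun1. All names are hypothetical. On the "is that even legal?" point: comparing an unsigned with an int is legal C; the int operand is converted to unsigned first, which the second helper writes out explicitly.

```c
/* Literal model of the listing, with a at 8(%ebp) and b at 12(%ebp). */
int asm_model(int a, int b)
{
    int edx = a;          /* movl 8(%ebp), %edx  */
    int eax = b;          /* movl 12(%ebp), %eax */
    if (!(edx >= eax)) {  /* cmpl %eax, %edx ; jge .L3 skips the move */
        eax = edx;        /* movl %edx, %eax */
    }
    return eax;           /* the return value is whatever is in eax */
}

/* Signed comparison, as JGE/JL implement it. */
int less_signed(int a, int b)
{
    return a < b;
}

/* What `ua < b` does in fun1: b is converted to unsigned as well. */
int less_mixed(int a, int b)
{
    return (unsigned)a < (unsigned)b;
}
```

With a = -1 and b = 1, less_signed gives 1 but less_mixed gives 0, because -1 converts to the largest unsigned value. That difference is what separates the signed jumps (JG, JGE, JL, JLE) from their unsigned counterparts.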

where is rvalue stored in c?

in C, i have this code piece:
int a;
a = 10 + 5 - 3;
I want to ask: where is (10+5-3) stored at?
(As far as I know, a is located on stack, how about (10+5-3)? How does this rvalue get calculated?)
Typically, the r-value is "stored" within the program itself.
In other words, the compiler itself (before the program is ever run) computes the 10 + 5 - 3 value (it can do so since it is all based on constant immediate values), and it emits the assembly code to store the result of this calculation in whatever l-value is the target of the assignment (in this case, the variable named a, which the compiler probably knows as a relative address to a data segment origin of sorts).
The r-value, which has a value of 12, is therefore only found inside the binary of the program, within an assembly instruction that looks like
mov <some dest, typically DS-relative>, $0C
$0C is the "r-value".
If the r-value happened to be the result of a calculation that can only be done at run-time, say if the underlying C code was: a = 17 * x; // x some run time var, the r-value would too be "stored" (or rather materialized) as a series of instructions within the program binary. The difference from the simple "mov dest, imm" above is that it would take several instructions to load the variable x into an accumulator, multiply by 17 and store the result at the address where the variable a is. It is possible that the compiler may "authorize itself" ;-) to use the stack for some intermediate result etc. but such use would be
a) completely compiler dependent
b) transient
c) and typically would only involve part of the r-value
it is therefore safe to say that the r-value is a compile-time concept which is encapsulated in parts of the program (not the data), and isn't stored anywhere but in the program binary.
In response to paxdiablo: the explanation offered above is indeed restrictive of the possibilities, because the C standard effectively does not dictate anything of that nature. Nevertheless, almost any r-value is eventually materialized, at least in part, by some instructions which set things up so that the proper value, whether calculated (at run time) or immediate, gets addressed properly.
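To make the constant case and the run-time case concrete, here is a minimal sketch. The names FOLDED, constant_case and runtime_case are made up for illustration, and the _Static_assert assumes a C11 compiler. The comments describe what a compiler typically does, not what the standard requires.

```c
/* Hypothetical names throughout; assumes C11 for _Static_assert. */

/* An enumerator must be an integer constant expression, so the compiler
   has to be able to evaluate 10 + 5 - 3 before the program ever runs. */
enum { FOLDED = 10 + 5 - 3 };

_Static_assert(FOLDED == 12, "10 + 5 - 3 is evaluated at compile time");

int constant_case(void)
{
    int a;
    a = 10 + 5 - 3;   /* typically emitted as one "store immediate 12" */
    return a;
}

int runtime_case(int x)
{
    int a;
    a = 17 * x;       /* x is unknown until run time: load, multiply, store */
    return a;
}
```

Compiling this with your compiler's "emit assembly" option (for example -S with gcc or clang) and comparing the two functions should show the immediate operand in the first and an actual multiplication (or an equivalent sequence) in the second.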
Constants are probably simplified at compile time, so your question as literally posed may not help. But something like, say, i - j + k that does need to be computed at runtime from some variables, may be "stored" wherever the compiler likes, depending on the CPU architecture: the compiler will typically try to do its best to use registers, e.g.
LOAD AX, i
SUB AX, j
ADD AX, k
to compute such an expression "storing" it in the accumulator register AX, before assigning it to some memory location with STORE AX, dest or the like. I'd be pretty surprised if a modern optimizing compiler on an even semi-decent CPU architecture (yeah, x86 included!-) needed to spill registers to memory for any reasonably simple expression!
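As a rough C-level illustration of the same point (combine and combine_named are hypothetical helper names), the intermediate result of i - j has no name and no address in the first function, so the compiler is free to keep it in a register. The second function names the intermediate, and an optimizing compiler will usually generate the same code for both.

```c
/* Hypothetical helpers, for illustration only. */

int combine(int i, int j, int k)
{
    /* (i - j) is an unnamed intermediate value; nothing in C says
       where it lives, so a register is the usual choice. */
    return i - j + k;
}

int combine_named(int i, int j, int k)
{
    int tmp = i - j;   /* same computation with the intermediate named */
    return tmp + k;
}
```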
This is compiler dependent. Usually the value (12) will be calculated by the compiler. It is then stored in the code, typically as part of a load/move immediate assembly instruction.
The result of the computation on the RHS (right-hand side) is computed by the compiler in a step that's usually called "constant folding".
Then, it is stored as an operand of the assembly instruction that moves the value into a.
Here's a disassembly from MSVC:
int a;
a = 10 + 5 - 3;
0041338E mov dword ptr [a],0Ch
Where it stores it is actually totally up to the compiler. The standard does not dictate this behavior.
A typical place can be seen by actually compiling the code and looking at the assembler output:
int main (int argc, char *argv[]) {
int a;
a = 10 + 5 - 3;
return 0;
}
which produces:
.file "qq.c"
.def ___main;
.scl 2;
.type 32;
.endef
.text
.globl _main
.def _main;
.scl 2;
.type 32;
.endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
movl $12, -4(%ebp) ;*****
movl $0, %eax
leave
ret
The relevant bit is marked ;***** and you can see that the value is created by the compiler and just inserted directly into a mov type instruction.
Note that it's only this simple because the expression is a constant value. As soon as you introduce non-constant values (like variables), the code becomes a little more complicated. That's because you have to look those variables up in memory (or they may already be in a register) and then manipulate the values at run-time, not compile-time.
As to how the compiler calculates what the value should be, that's to do with expression evaluation and is a whole other question :-)
Your question is based on an incorrect premise.
The defining property of an lvalue in C is that it has a place in storage, i.e., it is stored. This is what differentiates an lvalue from an rvalue. An rvalue is not stored anywhere. That's what makes it an rvalue. If it were stored, it would be an lvalue by definition.
The terms "lvalue" and "rvalue" are used to bisect the world of expressions. That is, (10+5-3) is an expression that happens to be an rvalue (because you cannot apply the & operator to it -- in C++ the rules are more complicated). At runtime, there are no expressions, lvalues or rvalues. In particular, they aren't stored anywhere.
You were wondering where the value 12 was stored, but the value 12 is neither an lvalue nor an rvalue (as opposed to the expression 12 which would be an rvalue, but 12 does not appear in your program).
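The & test mentioned above can be sketched like this (address_demo is a hypothetical name). The line that would be rejected is left commented out so the snippet still compiles.

```c
/* Hypothetical example, for illustration only. */
int address_demo(void)
{
    int a = 0;
    int *p = &a;                    /* fine: a is an lvalue, it designates an object */
    /* int *q = &(10 + 5 - 3); */   /* rejected by the compiler: the operand is an rvalue */
    *p = 10 + 5 - 3;                /* the value of the rvalue expression ends up in a */
    return a;
}
```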

Resources