unable to understand the base pointer calculation in assembly code [closed] - c

I am trying to understand the assembly code for the following function by looking at its disassembly. I don't understand why all the operations are relative to the base pointer.
Why are the values of rcx and rdx moved to memory at offsets 0x10 and 0x18 from rbp?
( mov %rcx,0x10(%rbp) and mov %rdx,0x18(%rbp) ).
Why is the return value stored to memory by
mov %rax,-0x8(%rbp)?
long absdiff(long x, long y)
{
long result;
if (x>y)
result = x-y;
else
result = y-x;
return result;
}
0x00000001004010e0 <+0>: push %rbp
0x00000001004010e1 <+1>: mov %rsp,%rbp
0x00000001004010e4 <+4>: sub $0x10,%rsp
0x00000001004010e8 <+8>: mov %rcx,0x10(%rbp)
0x00000001004010ec <+12>: mov %rdx,0x18(%rbp)
0x00000001004010f0 <+16>: mov 0x10(%rbp),%rax
0x00000001004010f4 <+20>: cmp 0x18(%rbp),%rax
0x00000001004010f8 <+24>: jle 0x100401108 <absdiff+40>
0x00000001004010fa <+26>: mov 0x10(%rbp),%rax
0x00000001004010fe <+30>: sub 0x18(%rbp),%rax
0x0000000100401102 <+34>: mov %rax,-0x8(%rbp)
0x0000000100401106 <+38>: jmp 0x100401114 <absdiff+52>
0x0000000100401108 <+40>: mov 0x18(%rbp),%rax
0x000000010040110c <+44>: sub 0x10(%rbp),%rax
0x0000000100401110 <+48>: mov %rax,-0x8(%rbp)
0x0000000100401114 <+52>: mov -0x8(%rbp),%rax
0x0000000100401118 <+56>: add $0x10,%rsp
0x000000010040111c <+60>: pop %rbp
0x000000010040111d <+61>: retq

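1) Why sub $0x10, %rsp?
It subtracts 16 bytes from the stack pointer to reserve space for the function's local variables. The only local here is 'result', which lives at -0x8(%rbp); the remaining 8 bytes are most likely padding that keeps the stack 16-byte aligned. Try printing 'sizeof(long)' and I'm pretty sure you'll get '8' as the answer on the machine you're on.
2) Why move register values to memory?
The two long arguments arrive in the registers 'rcx' and 'rdx' (the Windows x64 calling convention). Unoptimized code gives every variable a home in memory, so the function stores them at 0x10(%rbp) and 0x18(%rbp). Those offsets are positive: the slots lie above the saved %rbp (at 0x0(%rbp)) and the return address (at 0x8(%rbp)), in the "shadow space" that the caller reserves, not in the 16 bytes allocated in '1)'. 0x10 and 0x18 are 8 bytes apart, one 'long' each.
3) Why is the return value stored in the mov %rax,-0x8(%rbp)?
-0x8(%rbp) is the stack slot of the local variable 'result'. Without optimization the compiler translates each C statement on its own: 'result = x-y' (or 'result = y-x') computes the value in %rax and stores it to 'result', and 'return result' then loads it back into %rax, the return-value register. The store and the reload are redundant and disappear when you compile with optimization.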
mov %rax,-0x8(%rbp) <--- storing result (x > y branch)
jmp 0x100401114 <absdiff+52>
...
mov %rax,-0x8(%rbp) <--- storing result (else branch)
mov -0x8(%rbp),%rax <--- reloading result for the return
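A hedged C sketch of the store/reload pattern may make the listing easier to follow. The names x_home and y_home and the use of volatile are illustrative assumptions: they stand in for the stack slots at 0x10(%rbp), 0x18(%rbp) and -0x8(%rbp), and are not what the compiler literally generates.

```c
/* Sketch only: mimics the store/reload pattern of unoptimized code.
 * x_home and y_home are hypothetical names for the slots at 0x10(%rbp)
 * and 0x18(%rbp); result corresponds to the slot at -0x8(%rbp). */
long absdiff_unoptimized(long x, long y)
{
    volatile long x_home = x;   /* mov %rcx,0x10(%rbp) */
    volatile long y_home = y;   /* mov %rdx,0x18(%rbp) */
    volatile long result;       /* slot at -0x8(%rbp)  */

    if (x_home > y_home)        /* cmp 0x18(%rbp),%rax ; jle */
        result = x_home - y_home;
    else
        result = y_home - x_home;

    return result;              /* mov -0x8(%rbp),%rax */
}
```

Every read of x_home, y_home and result goes through memory here, which is roughly what the disassembly shows for the plain variables when no optimization is requested.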
A Suggestion
I'm pretty sure you'll find this link really helpful:
https://www.recurse.com/blog/7-understanding-c-by-learning-assembly


Understanding x86-64 assembly for simple program in C with a function call

I have a simple C program that produces this x86-64 assembly for the function func:
#include <stdio.h>
#include <string.h>
void func(char *name)
{
char buf[90];
strcpy(buf, name);
printf("Welcome %s\n", buf);
}
int main(int argc, char *argv[])
{
func(argv[1]);
return 0;
}
So I think this
0x000000000000118d <+4>: push %rbp
pushes the base pointer like placed argument which is char *name
then 0x000000000000118e <+5>: mov %rsp,%rbp sets the stack pointer to what is at the base pointer. I believe that the instruction above and this one make the stack pointer point to char *name at this point
then
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
I am little unsure about this. Why is 0xffffffffffffff80 added to rsp? What is the point of this instruction. Can any one please tell.
then in next instruction 0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp)
its just setting -128 decimal to rdi. But still no buffer char buf[90] can be seen. Where is my buffer in the following assembly? Can anyone please tell me?
also what this line 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
Dump of assembler code for function func:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: add $0xffffffffffffff80,%rsp
0x0000000000001195 <+12>: mov %rdi,-0x78(%rbp)
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
0x00000000000011a6 <+29>: xor %eax,%eax
0x00000000000011a8 <+31>: mov -0x78(%rbp),%rdx
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy@plt>
0x00000000000011bb <+50>: lea -0x70(%rbp),%rax
0x00000000000011bf <+54>: mov %rax,%rsi
0x00000000000011c2 <+57>: lea 0xe3b(%rip),%rax # 0x2004
0x00000000000011c9 <+64>: mov %rax,%rdi
0x00000000000011cc <+67>: mov $0x0,%eax
0x00000000000011d1 <+72>: call 0x1090 <printf@plt>
0x00000000000011d6 <+77>: nop
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail@plt>
0x00000000000011eb <+98>: leave
0x00000000000011ec <+99>: ret
End of assembler dump.
also what in above assembly the use of fs register what this instruction actually doing 0x0000000000001199 <+16>: mov %fs:0x28,%rax
As already mentioned in comments, your buffer is on the stack.
In the beginning of the function the rsp is decreased to allow more space (stack grows towards lower addresses, thus rsp is decreased as stack grows). This space is generally used for local variables, arguments passed to the function, and also for other purposes (will get back to it below).
In your case, you may trace back where your buffer buf is by looking at what arguments are passed to the strcpy - the first argument is passed in rdi register, the second - in rsi.
0x00000000000011b0 <+39>: mov %rdx,%rsi
0x00000000000011b3 <+42>: mov %rax,%rdi
0x00000000000011b6 <+45>: call 0x1070 <strcpy@plt>
In the snippet above you can see that the pointer to buf (first argument to strcpy) was in rax prior to being put to rdi. And rax got its value from this instruction:
0x00000000000011ac <+35>: lea -0x70(%rbp),%rax
which means "load effective address (i.e. a pointer) that resides at offset -0x70 from the address rbp is pointing to". rbp points to where the stack pointer was in the beginning of the function (function frame pointer).
So it answers where the compiler has put your buffer.
Now for other questions:
then in next instruction 0x0000000000001195 <+12>:
mov %rdi,-0x78(%rbp) its just setting -128 decimal to rdi.
As we said, rdi holds the first argument to a function. Here it holds the first argument to func(), which is the pointer name. This instruction stores that argument on the stack at an offset of -0x78 from rbp, in the 8 bytes just below the space reserved for your buffer buf (which starts at -0x70).
And the last two questions are related:
also what this line 0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
and
also what in above assembly the use of fs register what this instruction actually doing 0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x0000000000001199 <+16>: mov %fs:0x28,%rax
0x00000000000011a2 <+25>: mov %rax,-0x8(%rbp)
...
...
0x00000000000011d7 <+78>: mov -0x8(%rbp),%rax
0x00000000000011db <+82>: sub %fs:0x28,%rax
0x00000000000011e4 <+91>: je 0x11eb <func+98>
0x00000000000011e6 <+93>: call 0x1080 <__stack_chk_fail@plt>
0x00000000000011eb <+98>: leave
There is some value at %fs:0x28 (which denotes an offset of 0x28 in the fs segment; on x86-64 Linux, fs points to thread-local storage). This value is placed (via rax) on the stack at -0x8(%rbp), the very first 8 bytes of the space allocated for your function, directly above your buffer. There it stays, hopefully untouched, until the function is about to return. At that point the code checks whether the value on the stack has changed. If it is unchanged, the jump (je) takes you to the leave and the function returns. If the value on the stack did change, something wrote past the end of a local buffer (a stack buffer overflow, aha!), and __stack_chk_fail is called, which with glibc reports "stack smashing detected" and aborts the program. So the value at %fs:0x28 is a kind of magic value, usually called a stack canary.
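The check described above can be imitated in portable C. This is a simulation under stated assumptions, not the compiler's mechanism: struct demo_frame, DEMO_CANARY and copy_and_check are hypothetical names, the guard value is a fixed constant instead of the random value read from %fs:0x28, and the "frame" is an ordinary struct so that the overwrite stays well-defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard value; the real one is random and read from %fs:0x28. */
#define DEMO_CANARY UINT64_C(0x1122334455667788)

/* Simulated frame: the buffer sits directly below the guard word. */
struct demo_frame {
    char     buf[16];
    uint64_t canary;
};

/* Copies n bytes of src into the simulated frame, starting at buf.
 * Returns 1 if the guard survived, 0 if the copy overwrote it. */
int copy_and_check(const void *src, size_t n)
{
    struct demo_frame frame;

    frame.canary = DEMO_CANARY;          /* prologue: store the guard     */
    if (n > sizeof frame)
        n = sizeof frame;                /* keep the simulation in bounds */
    memcpy(&frame, src, n);              /* the "unchecked" copy          */
    return frame.canary == DEMO_CANARY;  /* epilogue: compare the guard   */
}
```

Copying 16 bytes fills only buf and leaves the guard intact; copying 24 bytes of anything other than the guard pattern changes it, which is the situation in which the real code would call __stack_chk_fail.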
And one last thing, about why add $0xffffffffffffff80,%rsp was used to allocate space on the stack rather than sub: other compilers do use sub, as did GCC (version 8.5.0 20210514):
sub $0x70,%rsp
It allocated less, and one of the reasons is that the compiler did not reserve space for the stack overflow check.
As to "why use an add %rsp rather than a sub %rsp instruction":
On x86_64 there are actually two versions of these add/sub immediate with rsp instructions
a 4 byte version with a 1 byte immediate
a 7 byte version with a 4 byte immediate
For both versions, the immediate will be sign-extended to 64 bits and then added to (or subtracted from) %rsp. Now because of that sign extension, a 1-byte immediate can be any value from -128 (-0x80) up to 127 (0x7f). So the instruction
add $-0x80, %rsp
can use the 4-byte encoding, while the instruction
sub $0x80, %rsp
would require the 7 byte encoding. All else being equal (as it never is), the shorter encoding is better as it occupies less memory/cache.
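The two facts this answer relies on can be checked with a few lines of C. This is a small sketch with hypothetical helper names (fits_imm8, add_ff80): one function tests whether a value fits a sign-extended 1-byte immediate, the other shows that adding the 64-bit pattern 0xffffffffffffff80 is the same as subtracting 0x80.

```c
#include <stdint.h>

/* Returns 1 if v can be encoded as a sign-extended 1-byte immediate. */
int fits_imm8(int64_t v)
{
    return v >= -128 && v <= 127;
}

/* Adding the 64-bit pattern 0xffffffffffffff80 wraps around modulo 2^64,
 * which gives the same result as subtracting 0x80. */
uint64_t add_ff80(uint64_t rsp)
{
    return rsp + UINT64_C(0xffffffffffffff80);
}
```

fits_imm8(-0x80) holds while fits_imm8(0x80) does not, which is why the add form gets the shorter encoding for a 128-byte allocation.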

Basic buffer overflow tutorial

I'm learning about basic buffer overflows, and I have the following C code:
int your_fcn()
{
char buffer[4];
int *ret;
ret = buffer + 8;
(*ret) += 16;
return 1;
}
int main()
{
int mine = 0;
int yours = 0;
yours = your_fcn();
mine = yours + 1;
if(mine > yours)
printf("You lost!\n");
else
printf("You won!\n");
return EXIT_SUCCESS;
}
My goal is to bypass the line mine = yours + 1;, skip straight to the if statement comparison, so I can "win". main() cannot be touched, only your_fcn() can.
My approach is to override the return address with a buffer overflow. So in this case, I identified that the return address should be 8 bytes away from buffer, since buffer is 4 bytes and EBP is 4 bytes. I then used gdb to identify that the line I want to jump to is 16 bytes away from the function call. Here is the result from gdb:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000054a <+0>: lea 0x4(%esp),%ecx
0x0000054e <+4>: and $0xfffffff0,%esp
0x00000551 <+7>: pushl -0x4(%ecx)
0x00000554 <+10>: push %ebp
0x00000555 <+11>: mov %esp,%ebp
0x00000557 <+13>: push %ebx
0x00000558 <+14>: push %ecx
0x00000559 <+15>: sub $0x10,%esp
0x0000055c <+18>: call 0x420 <__x86.get_pc_thunk.bx>
0x00000561 <+23>: add $0x1a77,%ebx
0x00000567 <+29>: movl $0x0,-0xc(%ebp)
0x0000056e <+36>: movl $0x0,-0x10(%ebp)
0x00000575 <+43>: call 0x51d <your_fcn>
0x0000057a <+48>: mov %eax,-0x10(%ebp)
0x0000057d <+51>: mov -0x10(%ebp),%eax
0x00000580 <+54>: add $0x1,%eax
0x00000583 <+57>: mov %eax,-0xc(%ebp)
0x00000586 <+60>: mov -0xc(%ebp),%eax
0x00000589 <+63>: cmp -0x10(%ebp),%eax
0x0000058c <+66>: jle 0x5a2 <main+88>
0x0000058e <+68>: sub $0xc,%esp
0x00000591 <+71>: lea -0x1988(%ebx),%eax
I see the line 0x00000575 <+43>: call 0x51d <your_fcn> and 0x00000583 <+57>: mov %eax,-0xc(%ebp) are four lines away from each other, which tells me I should offset ret by 16 bytes. But the address from gdb says something different. That is, the function call starts on 0x00000575 and the line I want to jump to is on 0x00000583, which means that they are 15 bytes away?
Either way, whether I use 16 bytes or 15 bytes, I get a segmentation fault error and I still "lose".
Question: What am I doing wrong? Why don't the address given in gdb go by 4 bytes at a time and what's actually going on here. How can I correctly jump to the line I want?
Clarification: This is being done on a 32-bit x86 VM running Ubuntu Linux. I'm compiling with the command gcc -fno-stack-protector -z execstack -m32 -g guesser.c -o guesser.o, which turns the stack protector off and forces 32-bit compilation.
gdb of your_fcn() as requested:
(gdb) disassemble your_fcn
Dump of assembler code for function your_fcn:
0x0000051d <+0>: push %ebp
0x0000051e <+1>: mov %esp,%ebp
0x00000520 <+3>: sub $0x10,%esp
0x00000523 <+6>: call 0x5c3 <__x86.get_pc_thunk.ax>
0x00000528 <+11>: add $0x1ab0,%eax
0x0000052d <+16>: lea -0x8(%ebp),%eax
0x00000530 <+19>: add $0x8,%eax
0x00000533 <+22>: mov %eax,-0x4(%ebp)
0x00000536 <+25>: mov -0x4(%ebp),%eax
0x00000539 <+28>: mov (%eax),%eax
0x0000053b <+30>: lea 0xc(%eax),%edx
0x0000053e <+33>: mov -0x4(%ebp),%eax
0x00000541 <+36>: mov %edx,(%eax)
0x00000543 <+38>: mov $0x1,%eax
0x00000548 <+43>: leave
0x00000549 <+44>: ret
x86 has variable length instructions, so you cannot simply count instructions and multiply by 4. Since you have the output from gdb, trust it to determine the address of each instruction.
The return address from the function is the address after the call instruction. In the code shown, this would be main+48.
The if statement starts at main+60, not main+57. The instruction at main+57 stores yours+1 into mine. So to adjust the return address to return to the if statement, you should add 12 (that is, 60 - 48).
Doing that skips the assignments to both yours and mine. Since they are both initialized to 0, it will print "You won".
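The arithmetic can be written out using the addresses from the gdb listing of main. The enum names and the helper function are hypothetical; only the two addresses come from the question.

```c
/* Addresses copied from the gdb listing of main in the question. */
enum {
    RETURN_ADDRESS = 0x57a,  /* main+48: instruction after the call  */
    IF_COMPARISON  = 0x586   /* main+60: first instruction of the if */
};

/* Number of bytes to add to the saved return address. */
int return_address_delta(void)
{
    return IF_COMPARISON - RETURN_ADDRESS;
}
```

The difference is 0xc, so the adjustment is 12 bytes, whatever the number of instructions in between.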

GDB disass main [closed]

I'm doing a security project for my school.
For this project I have a binary, and I have to do 2 things, make a pseudo code of this binary and do the exploit.
To get better at ASM, I'm trying to write C source that compiles to exactly the same code. I have a problem with edx in main. I have no idea how to do this in C:
0x080484a5 <+41>: mov edx,0x8048468
This is the full main code:
Dump of assembler code for function main:
0x0804847c <+0>: push ebp
0x0804847d <+1>: mov ebp,esp
0x0804847f <+3>: and esp,0xfffffff0
0x08048482 <+6>: sub esp,0x20
0x08048485 <+9>: mov DWORD PTR [esp],0x40
0x0804848c <+16>: call 0x8048350 <malloc@plt>
0x08048491 <+21>: mov DWORD PTR [esp+0x1c],eax
0x08048495 <+25>: mov DWORD PTR [esp],0x4
0x0804849c <+32>: call 0x8048350 <malloc@plt>
0x080484a1 <+37>: mov DWORD PTR [esp+0x18],eax
0x080484a5 <+41>: mov edx,0x8048468
0x080484aa <+46>: mov eax,DWORD PTR [esp+0x18]
0x080484ae <+50>: mov DWORD PTR [eax],edx
0x080484b0 <+52>: mov eax,DWORD PTR [ebp+0xc]
0x080484b3 <+55>: add eax,0x4
0x080484b6 <+58>: mov eax,DWORD PTR [eax]
0x080484b8 <+60>: mov edx,eax
0x080484ba <+62>: mov eax,DWORD PTR [esp+0x1c]
0x080484be <+66>: mov DWORD PTR [esp+0x4],edx
0x080484c2 <+70>: mov DWORD PTR [esp],eax
0x080484c5 <+73>: call 0x8048340 <strcpy@plt>
0x080484ca <+78>: mov eax,DWORD PTR [esp+0x18]
0x080484ce <+82>: mov eax,DWORD PTR [eax]
0x080484d0 <+84>: call eax
0x080484d2 <+86>: leave
0x080484d3 <+87>: ret
Can you help me to find how to do the line main + 41 please :) ?
Thank you
0x8048468 is probably a pointer value based on the number. Maybe a function pointer, because it's in the page above 0x8048350 (the PLT entry for malloc). But maybe just a pointer to a static buffer (maybe a read-only buffer, like a string literal).
So perhaps void *edx = "hello world"; or void *edx = &some_function, and then use it somehow. C statements don't map to single asm instructions, but with un-optimized output (gcc -O0), each C statement does map to a contiguous block of instructions that finishes with all values in memory. (This means you can modify C variables with a debugger and still have it "work" in un-optimized code.)
I didn't trace through the mess of store/reload that looks like un-optimized code, so I'm not sure what exactly is being done with that value after it's stored to memory in the next instruction after the mov-immediate.
Look at your compiler's asm output if you have source (gcc -S instead of compiling all the way to a binary and then disassembling), or use objdump -drwC -Mintel to get relocation info for that value if there is any. Or use nm to look for it in the symbol table.
If it is a function pointer, the disassembly for that address should make some sense.
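Reading the listing further, the value is stored through the pointer returned by the second malloc, and the same slot is loaded and called at main+78 to main+84, which fits the function-pointer guess. Below is a hedged reconstruction in C. The names run, target and target_calls are hypothetical, target stands in for whatever lives at 0x8048468, and the length check is an addition that the original binary does not have (it is there only to keep the sketch safe to run).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for whatever lives at 0x8048468. */
static int target_calls = 0;
static void target(void) { target_calls++; }

/* Sketch of the logic in main, with argv[1] passed in as arg. */
int run(const char *arg)
{
    char *buf = malloc(0x40);                /* main+9  .. main+21 */
    void (**fp)(void) = malloc(sizeof *fp);  /* main+25 .. main+37, malloc(4) on 32-bit */

    /* Not in the original: bail out instead of overflowing buf. */
    if (buf == NULL || fp == NULL || strlen(arg) >= 0x40) {
        free(buf);
        free(fp);
        return -1;
    }

    *fp = target;       /* main+41 .. main+50: mov edx,0x8048468 ; mov [eax],edx */
    strcpy(buf, arg);   /* main+52 .. main+73 */
    (*fp)();            /* main+78 .. main+84: call eax */

    free(fp);
    free(buf);
    return 0;
}
```

Under this reading, the line at main+41 is just the right-hand side of an assignment such as *fp = target; the address is an immediate because the function's location is known at link time.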

Why gcc disassembler allocating extra space for local variable?

I have written simple function in C,
void GetInput()
{
char buffer[8];
gets(buffer);
puts(buffer);
}
When I disassemble it in gdb's disassembler, it gives following disassembly.
0x08048464 <+0>: push %ebp
0x08048465 <+1>: mov %esp,%ebp
0x08048467 <+3>: sub $0x10,%esp
0x0804846a <+6>: mov %gs:0x14,%eax
0x08048470 <+12>: mov %eax,-0x4(%ebp)
0x08048473 <+15>: xor %eax,%eax
=> 0x08048475 <+17>: lea -0xc(%ebp),%eax
0x08048478 <+20>: mov %eax,(%esp)
0x0804847b <+23>: call 0x8048360 <gets@plt>
0x08048480 <+28>: lea -0xc(%ebp),%eax
0x08048483 <+31>: mov %eax,(%esp)
0x08048486 <+34>: call 0x8048380 <puts@plt>
0x0804848b <+39>: mov -0x4(%ebp),%eax
0x0804848e <+42>: xor %gs:0x14,%eax
0x08048495 <+49>: je 0x804849c <GetInput+56>
0x08048497 <+51>: call 0x8048370 <__stack_chk_fail@plt>
0x0804849c <+56>: leave
0x0804849d <+57>: ret
Now please look at line number three, 0x08048467 <+3>: sub $0x10,%esp, I have only 8 bytes allocated as local variable, then why compiler is allocating 16 bytes(0x10).
Secondly, what is meaning of xor %gs:0x14,%eax.
Edit: If it is optimization, is there any way to stop it?
Thanks.
Two things:
The compiler may reserve space for intermediate expressions to which you did not give names in the source code (or conversely not allocate space for local variables that can live entirely in registers). The list of stack slots in the binary does not have to match the list of local variables in the source code.
On some platforms, the compiler has to keep the stack pointer aligned. For the particular example in your question, it is likely that the compiler is striving to keep the stack pointer aligned to a boundary of 16 bytes.
Regarding your other question that you should have asked separately, xor %gs:0x14,%eax is clearly part of a stack protection mechanism, enabled by default. If you are using GCC, turn it off with -fno-stack-protector.
Besides the other answers already given, gcc will prefer to keep the stack 16-byte aligned for storing SSE values on the stack since some (all?) of the SSE instructions require their memory argument to be 16-byte aligned.
This more builds upon Pascal's answer, but in this case, it's probably because of the stack protection mechanism.
You allocate 8 bytes, which is fair enough and taken into account with the stack pointer. In addition, the stack protection value (the canary) is loaded from %gs:0x14 and saved at -0x4(%ebp), just below the saved %ebp at the top of the current stack frame, on the following lines
0x0804846a <+6>: mov %gs:0x14,%eax
0x08048470 <+12>: mov %eax,-0x4(%ebp)
This takes four bytes. The other four bytes, at the bottom of the frame, are where the outgoing argument for gets and puts is written on the following lines:
=> 0x08048475 <+17>: lea -0xc(%ebp),%eax
0x08048478 <+20>: mov %eax,(%esp)
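Adding up the slots visible in the disassembly gives the 16 bytes directly. The sketch below uses hypothetical helper names and assumes, as the answers above suggest, that the compiler rounds the frame to a multiple of 16; the slot sizes are read off the offsets -0xc(%ebp), -0x4(%ebp) and (%esp).

```c
/* Rounds n up to the next multiple of align (align must be a power of two). */
unsigned align_up(unsigned n, unsigned align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Frame contents read off the disassembly of GetInput. */
unsigned getinput_frame_size(void)
{
    unsigned buffer   = 8;  /* char buffer[8] at -0xc(%ebp)          */
    unsigned canary   = 4;  /* guard value at -0x4(%ebp)             */
    unsigned outgoing = 4;  /* argument slot for gets/puts at (%esp) */

    return align_up(buffer + canary + outgoing, 16);
}
```

Here the three slots already sum to 16, so the rounding adds nothing; with -fno-stack-protector the total could differ, depending on how the compiler pads.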

How does this disassembly output correlate to the function source? [closed]

I have the following function in C:
uint myFunction (uint arrayLen,
uint* array
)
{
uint i;
uint j;
uint sum = 0;
for (i = 0; i < arrayLen/2; i++)
for (j = 0; j < arrayLen; j++)
if (array[i*2] == array[j])
sum += mySumFunction(arrayLen,array,6);
mySortFunction(arrayLen,array);
for (i = 0; i < arrayLen/2; i++)
for (j = 0; j < arrayLen; j++)
if (array[i*2] == array[j])
sum -= mySumFunction(arrayLen,array,7);
return(sum);
}
And here's the output for the disassemble command on the function
Dump of assembler code for function myFunction:
0x080486a8 <myFunction+0>: push %ebp
0x080486a9 <myFunction+1>: mov %esp,%ebp
0x080486ab <myFunction+3>: sub $0x1c,%esp
0x080486ae <myFunction+6>: call 0x8048418 <mcount@plt>
0x080486b3 <myFunction+11>: movl $0x0,-0x4(%ebp)
0x080486ba <myFunction+18>: movl $0x0,-0xc(%ebp)
0x080486c1 <myFunction+25>: jmp 0x8048713 <myFunction+107>
0x080486c3 <myFunction+27>: movl $0x0,-0x8(%ebp)
0x080486ca <myFunction+34>: jmp 0x8048707 <myFunction+95>
0x080486cc <myFunction+36>: mov -0xc(%ebp),%eax
0x080486cf <myFunction+39>: shl $0x3,%eax
0x080486d2 <myFunction+42>: add 0xc(%ebp),%eax
0x080486d5 <myFunction+45>: mov (%eax),%edx
0x080486d7 <myFunction+47>: mov -0x8(%ebp),%eax
0x080486da <myFunction+50>: shl $0x2,%eax
0x080486dd <myFunction+53>: add 0xc(%ebp),%eax
0x080486e0 <myFunction+56>: mov (%eax),%eax
0x080486e2 <myFunction+58>: cmp %eax,%edx
0x080486e4 <myFunction+60>: jne 0x8048703 <myFunction+91>
0x080486e6 <myFunction+62>: movl $0x6,0x8(%esp)
0x080486ee <myFunction+70>: mov 0xc(%ebp),%eax
0x080486f1 <myFunction+73>: mov %eax,0x4(%esp)
0x080486f5 <myFunction+77>: mov 0x8(%ebp),%eax
0x080486f8 <myFunction+80>: mov %eax,(%esp)
0x080486fb <myFunction+83>: call 0x8048668 <mySumFunction>
0x08048700 <myFunction+88>: add %eax,-0x4(%ebp)
0x08048703 <myFunction+91>: addl $0x1,-0x8(%ebp)
0x08048707 <myFunction+95>: mov -0x8(%ebp),%eax
0x0804870a <myFunction+98>: cmp 0x8(%ebp),%eax
0x0804870d <myFunction+101>: jb 0x80486cc <myFunction+36>
0x0804870f <myFunction+103>: addl $0x1,-0xc(%ebp)
0x08048713 <myFunction+107>: mov 0x8(%ebp),%eax
0x08048716 <myFunction+110>: shr %eax
0x08048718 <myFunction+112>: cmp -0xc(%ebp),%eax
0x0804871b <myFunction+115>: ja 0x80486c3 <myFunction+27>
0x0804871d <myFunction+117>: mov 0xc(%ebp),%eax
0x08048720 <myFunction+120>: mov %eax,0x4(%esp)
0x08048724 <myFunction+124>: mov 0x8(%ebp),%eax
0x08048727 <myFunction+127>: mov %eax,(%esp)
0x0804872a <myFunction+130>: call 0x80485cc <mySortFunction>
0x0804872f <myFunction+135>: movl $0x0,-0xc(%ebp)
0x08048736 <myFunction+142>: jmp 0x8048788 <myFunction+224>
0x08048738 <myFunction+144>: movl $0x0,-0x8(%ebp)
0x0804873f <myFunction+151>: jmp 0x804877c <myFunction+212>
0x08048741 <myFunction+153>: mov -0xc(%ebp),%eax
0x08048744 <myFunction+156>: shl $0x3,%eax
0x08048747 <myFunction+159>: add 0xc(%ebp),%eax
0x0804874a <myFunction+162>: mov (%eax),%edx
0x0804874c <myFunction+164>: mov -0x8(%ebp),%eax
0x0804874f <myFunction+167>: shl $0x2,%eax
0x08048752 <myFunction+170>: add 0xc(%ebp),%eax
0x08048755 <myFunction+173>: mov (%eax),%eax
0x08048757 <myFunction+175>: cmp %eax,%edx
0x08048759 <myFunction+177>: jne 0x8048778 <myFunction+208>
0x0804875b <myFunction+179>: movl $0x7,0x8(%esp)
0x08048763 <myFunction+187>: mov 0xc(%ebp),%eax
0x08048766 <myFunction+190>: mov %eax,0x4(%esp)
0x0804876a <myFunction+194>: mov 0x8(%ebp),%eax
0x0804876d <myFunction+197>: mov %eax,(%esp)
0x08048770 <myFunction+200>: call 0x8048668 <mySumFunction>
0x08048775 <myFunction+205>: sub %eax,-0x4(%ebp)
0x08048778 <myFunction+208>: addl $0x1,-0x8(%ebp)
0x0804877c <myFunction+212>: mov -0x8(%ebp),%eax
0x0804877f <myFunction+215>: cmp 0x8(%ebp),%eax
0x08048782 <myFunction+218>: jb 0x8048741 <myFunction+153>
0x08048784 <myFunction+220>: addl $0x1,-0xc(%ebp)
0x08048788 <myFunction+224>: mov 0x8(%ebp),%eax
0x0804878b <myFunction+227>: shr %eax
0x0804878d <myFunction+229>: cmp -0xc(%ebp),%eax
0x08048790 <myFunction+232>: ja 0x8048738 <myFunction+144>
0x08048792 <myFunction+234>: mov -0x4(%ebp),%eax
0x08048795 <myFunction+237>: leave
0x08048796 <myFunction+238>: ret
End of assembler dump.
I was wondering if someone could help me read and translate the assembly instructions to the C code above. I was specifically looking for an explanation regarding the division portion seen in the code (arrayLen/2). I was under the impression that it would translate into a right shift instruction, but when I didn't see that in the assembly code I wasn't sure what was happening.
Edit: I've added the missing assembly code. It also looks like I found the explanation for the division portion.
It doesn't look like you have the whole function, but here's a breakdown of what's here:
These 3 instructions set up your stack frame:
0x080486a8 <myFunction+0>: push %ebp
0x080486a9 <myFunction+1>: mov %esp,%ebp
0x080486ab <myFunction+3>: sub $0x1c,%esp
I'm not sure what this is for:
0x080486ae <myFunction+6>: call 0x8048418 <mcount@plt>
This is sum and i getting initialized to 0 (they are stored on the stack, sum at -0x4(%ebp) and i at -0xc(%ebp)):
0x080486b3 <myFunction+11>: movl $0x0,-0x4(%ebp)
0x080486ba <myFunction+18>: movl $0x0,-0xc(%ebp)
This is the beginning of the outer for loop. Typically, a loop in assembly starts by jumping to the end. The end in this case is past the end of your assembly listing.
0x080486c1 <myFunction+25>: jmp 0x8048713 <myFunction+107>
This is j getting initialized to 0. It's done here because it has to be reset every time the outer for loop runs.
0x080486c3 <myFunction+27>: movl $0x0,-0x8(%ebp)
This is the beginning of the inner for loop.
0x080486ca <myFunction+34>: jmp 0x8048707 <myFunction+95>
This indexes array by i*2 by doing pointer arithmetic on the address of array. First it puts i into eax, then left shifts it 3 (multiplying it by 8). This is an optimization of the *2 as well as accounting for the size of elements of array (4). Finally it adds this to the address of array, storing the result in eax.
0x080486cc <myFunction+36>: mov -0xc(%ebp),%eax
0x080486cf <myFunction+39>: shl $0x3,%eax
0x080486d2 <myFunction+42>: add 0xc(%ebp),%eax
This takes the value pointed to by address calculated above and stores it in edx. In this dialect of assembly x(y) means *(y+x)
0x080486d5 <myFunction+45>: mov (%eax),%edx
This calculates array[j] in a similar fashion, storing the result in eax this time:
0x080486d7 <myFunction+47>: mov -0x8(%ebp),%eax
0x080486da <myFunction+50>: shl $0x2,%eax
0x080486dd <myFunction+53>: add 0xc(%ebp),%eax
0x080486e0 <myFunction+56>: mov (%eax),%eax
This checks the two calculations above to see if they are equal:
0x080486e2 <myFunction+58>: cmp %eax,%edx
If the check doesn't pass (if they are not equal), skip the inside of the if. (This jumps past the end of your listing) jne means "jump if not equal"
0x080486e4 <myFunction+60>: jne 0x8048703 <myFunction+91>
These instructions load the arguments of mySumFunction into the proper places:
0x080486e6 <myFunction+62>: movl $0x6,0x8(%esp)
0x080486ee <myFunction+70>: mov 0xc(%ebp),%eax
0x080486f1 <myFunction+73>: mov %eax,0x4(%esp)
The listing is cut off here, but hopefully this gives you a good general idea.
It's just the beginning.
The first lines declare the variables and set up the condition of the loop.
This part
0x080486cc <myFunction+36>: mov -0xc(%ebp),%eax
0x080486cf <myFunction+39>: shl $0x3,%eax
0x080486d2 <myFunction+42>: add 0xc(%ebp),%eax
0x080486d5 <myFunction+45>: mov (%eax),%edx
0x080486d7 <myFunction+47>: mov -0x8(%ebp),%eax
0x080486da <myFunction+50>: shl $0x2,%eax
0x080486dd <myFunction+53>: add 0xc(%ebp),%eax
0x080486e0 <myFunction+56>: mov (%eax),%eax
0x080486e2 <myFunction+58>: cmp %eax,%edx
0x080486e4 <myFunction+60>: jne 0x8048703 <myFunction+91>
is like if (array[2*i] != array[j]) goto myFunction+91; where array is at 0xc(%ebp), i is at -0xc(%ebp) and j is at -0x8(%ebp).
I guess that it's the if (array[i*2] == array[j])
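The shifts in the listing can be checked against ordinary C indexing. This sketch uses hypothetical helper names and assumes a 4-byte unsigned int, as on the 32-bit target in the question: shl $0x3 combines the *2 of the index with the element size, and shr %eax is the arrayLen/2 of the loop condition.

```c
#include <stdint.h>

typedef unsigned int uint;

/* Address computed by:
 * mov -0xc(%ebp),%eax ; shl $0x3,%eax ; add 0xc(%ebp),%eax
 * Equals &array[i*2] when sizeof(uint) is 4 (assumed here). */
uintptr_t addr_of_even_element(const uint *array, uint i)
{
    return (uintptr_t)array + ((uintptr_t)i << 3);
}

/* Loop bound computed by: mov 0x8(%ebp),%eax ; shr %eax */
uint half(uint arrayLen)
{
    return arrayLen >> 1;
}
```

For unsigned values a right shift by one is exactly division by two, so the division the question was looking for is the shr %eax at myFunction+110 and myFunction+227.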
