I'm doing a security project for my school.
For this project I have a binary, and I have to do two things: write pseudocode for this binary and write the exploit.
To get better at ASM I'm trying to write exactly the same source code in C. I have a problem with edx in main. I have no idea how to do this in C:
0x080484a5 <+41>: mov edx,0x8048468
This is the full main code:
Dump of assembler code for function main:
0x0804847c <+0>: push ebp
0x0804847d <+1>: mov ebp,esp
0x0804847f <+3>: and esp,0xfffffff0
0x08048482 <+6>: sub esp,0x20
0x08048485 <+9>: mov DWORD PTR [esp],0x40
0x0804848c <+16>: call 0x8048350 <malloc@plt>
0x08048491 <+21>: mov DWORD PTR [esp+0x1c],eax
0x08048495 <+25>: mov DWORD PTR [esp],0x4
0x0804849c <+32>: call 0x8048350 <malloc@plt>
0x080484a1 <+37>: mov DWORD PTR [esp+0x18],eax
0x080484a5 <+41>: mov edx,0x8048468
0x080484aa <+46>: mov eax,DWORD PTR [esp+0x18]
0x080484ae <+50>: mov DWORD PTR [eax],edx
0x080484b0 <+52>: mov eax,DWORD PTR [ebp+0xc]
0x080484b3 <+55>: add eax,0x4
0x080484b6 <+58>: mov eax,DWORD PTR [eax]
0x080484b8 <+60>: mov edx,eax
0x080484ba <+62>: mov eax,DWORD PTR [esp+0x1c]
0x080484be <+66>: mov DWORD PTR [esp+0x4],edx
0x080484c2 <+70>: mov DWORD PTR [esp],eax
0x080484c5 <+73>: call 0x8048340 <strcpy@plt>
0x080484ca <+78>: mov eax,DWORD PTR [esp+0x18]
0x080484ce <+82>: mov eax,DWORD PTR [eax]
0x080484d0 <+84>: call eax
0x080484d2 <+86>: leave
0x080484d3 <+87>: ret
Can you help me work out how to write the line at main+41 in C, please? :)
Thank you
0x8048468 is probably a pointer value, based on the number. Maybe a function pointer, because it's in the same page as 0x8048350 (the PLT entry for malloc), just a little above it. But maybe just a pointer to a static buffer (maybe a read-only buffer, like a string literal).
So perhaps void *edx = "hello world"; or void *edx = &some_function, and then use it somehow. C statements don't map to single asm instructions, but with un-optimized output (gcc -O0), each C statement does map to a contiguous block of instructions that finishes with all values in memory. (This means you can modify C variables with a debugger and still have it "work" in un-optimized code.)
I didn't trace through the mess of store/reload that looks like un-optimized code, so I'm not sure what exactly is being done with that value after it's stored to memory in the next instruction after the mov-immediate.
Look at your compiler's asm output if you have source (gcc -S instead of compiling all the way to a binary and then disassembling), or use objdump -drwC -Mintel to get relocation info for that value if there is any. Or use nm to look for it in the symbol table.
If it is a function pointer, the disassembly for that address should make some sense.
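Tracing the rest of main supports the function-pointer reading: the value is stored into the 4-byte malloc block at main+50, then reloaded and called with call eax at main+78..84. Here is a minimal C sketch of the same shape, assuming a hypothetical function target standing in for whatever code lives at 0x8048468. The names run, target, and calls are made up for illustration; run plays the role of main, with arg as argv[1].

```c
#include <stdlib.h>
#include <string.h>

static int calls;                        /* hypothetical: lets a caller see that target ran */
static void target(void) { calls++; }    /* stand-in for the code at 0x8048468 */

/* Same shape as main in the dump; arg plays the role of argv[1]. */
int run(const char *arg)
{
    char *buf = malloc(0x40);                  /* main+9..21:  saved at [esp+0x1c] */
    void (**fp)(void) = malloc(sizeof *fp);    /* main+25..37: malloc(4) on i386, saved at [esp+0x18] */

    if (buf == NULL || fp == NULL) {           /* check added here; not in the binary */
        free(buf);
        free(fp);
        return -1;
    }

    *fp = target;        /* main+41..50: mov edx,0x8048468 ; mov [eax],edx */
    strcpy(buf, arg);    /* main+52..73: unbounded copy into the 0x40-byte block */
    (*fp)();             /* main+78..84: reload the pointer and call eax */

    free(fp);            /* cleanup added here; the original just returns */
    free(buf);
    return 0;
}
```

Because strcpy has no length limit, an argument longer than 0x40 bytes runs past buf and, depending on heap layout, may reach the pointer stored in the second block. That is presumably the point of the exercise.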
I've got an NX-enabled, canary-enabled x64 ELF and can only view the assembly, not the source code, but I do know it's written in C. When run, it only accepts command-line args and prints nothing. Inside the main function there's only one function call of note, to evil:
0x000000000040060e <+0>: push rbp
0x000000000040060f <+1>: mov rbp,rsp
0x0000000000400612 <+4>: sub rsp,0x10
0x0000000000400616 <+8>: mov DWORD PTR [rbp-0x4],edi
0x0000000000400619 <+11>: mov QWORD PTR [rbp-0x10],rsi
0x000000000040061d <+15>: mov rax,QWORD PTR [rbp-0x10]
0x0000000000400621 <+19>: add rax,0x8
0x0000000000400625 <+23>: mov rax,QWORD PTR [rax]
0x0000000000400628 <+26>: mov rdi,rax
0x000000000040062b <+29>: call 0x4005a7 <evil>
0x0000000000400630 <+34>: mov eax,0x0
0x0000000000400635 <+39>: leave
0x0000000000400636 <+40>: ret
Inside evil, it passes the command-line argument to sprintf and then checks a local variable at [rbp-0x54] against 0xdeadbeef. If they match, it prints a "you win!" message. It then verifies the canary, and if that check fails, it runs call __stack_chk_fail@plt, which indirects through a GOT entry that is in writable memory somewhere.
0x00000000004005a7 <+0>: push rbp
0x00000000004005a8 <+1>: mov rbp,rsp
0x00000000004005ab <+4>: sub rsp,0x70
0x00000000004005af <+8>: mov QWORD PTR [rbp-0x68],rdi
0x00000000004005b3 <+12>: mov rax,QWORD PTR fs:0x28
0x00000000004005bc <+21>: mov QWORD PTR [rbp-0x8],rax
0x00000000004005c0 <+25>: xor eax,eax
0x00000000004005c2 <+27>: mov DWORD PTR [rbp-0x54],0x0
0x00000000004005c9 <+34>: mov rdx,QWORD PTR [rbp-0x68]
0x00000000004005cd <+38>: lea rax,[rbp-0x50]
0x00000000004005d1 <+42>: mov rsi,rdx
0x00000000004005d4 <+45>: mov rdi,rax
0x00000000004005d7 <+48>: mov eax,0x0
0x00000000004005dc <+53>: call 0x4004b0 <sprintf@plt>
0x00000000004005e1 <+58>: mov eax,DWORD PTR [rbp-0x54]
0x00000000004005e4 <+61>: cmp eax,0xdeadbeef
0x00000000004005e9 <+66>: jne 0x4005f7 <evil+80>
0x00000000004005eb <+68>: lea rdi,[rip+0xd6] # 0x4006c8
0x00000000004005f2 <+75>: call 0x400490 <puts@plt>
0x00000000004005f7 <+80>: nop
0x00000000004005f8 <+81>: mov rax,QWORD PTR [rbp-0x8]
0x00000000004005fc <+85>: xor rax,QWORD PTR fs:0x28
0x0000000000400605 <+94>: je 0x40060c <evil+101>
0x0000000000400607 <+96>: call 0x4004a0 <__stack_chk_fail@plt>
0x000000000040060c <+101>: leave
0x000000000040060d <+102>: ret
In Ghidra and with cyclic strings I'm able to verify that the buffer is 72 characters. I've found a bunch of old info from LiveOverflow, about 5 years old now, with the exact same problem (protostar format0), except his buffer is 64. I believe this buffer mismatch is causing me all sorts of problems.
I've tried hundreds of inputs to achieve the winning statement;
I've tried overwriting the buffer of 72 with 72 A's followed by variations of 0xdeadbeef such as little endian, strings, hex, etc
I've played around with the buffer and offset, so for example putting 0xdeadbeef and then the buffer after, or putting 72 A's with a nop sled of 8 or so after it then 0xdeadbeef
I've tried following liveoverflow's method of overwriting __stack_chk_fail's GOT entry completely, via a format-string vulnerability like %1640d which you can see here, but either have the wrong numbers or am misunderstanding how it works/if it will work on my binary and machine
None of these have given me the winning statement, and I'd really like to understand the why and how and the assembly reasoning behind it.
edit
I've noticed that if I pass python -c 'print ("%x."*5)' as an argument, my stack fills up with ffffe374.f7fa3658.4006b0.f7fcbb60.0, which repeats if I raise the number from 5. When checking these with x 0x4006b0, 0x4006b0 is the only value that returns anything of use - 0x4006b0 <__libc_csu_fini>: "\363", <incomplete sequence \303>. I've tried creating a ROP chain using this and looking up ret2csu guides, but I've had no luck so far.
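The frame offsets in the evil dump can be turned into a few lines of arithmetic. This is a minimal sketch that uses only the offsets visible in the disassembly; the enum and function names are made up for illustration.

```c
/* Offsets relative to rbp, read off the evil disassembly. */
enum {
    BUF_OFF    = -0x50,  /* lea rax,[rbp-0x50]: destination passed to sprintf */
    CANARY_OFF = -0x08,  /* mov QWORD PTR [rbp-0x8],rax: saved canary */
    TARGET_OFF = -0x54   /* DWORD that is compared against 0xdeadbeef */
};

/* Bytes from the start of the buffer up to the canary. */
int bytes_before_canary(void) { return CANARY_OFF - BUF_OFF; }

/* Nonzero if the compared variable sits at a lower address than the buffer. */
int target_below_buffer(void) { return TARGET_OFF < BUF_OFF; }
```

The first function gives 72, which matches what the cyclic strings showed. The second shows that the compared variable sits 4 bytes below the buffer, so writing forward from the start of the buffer moves away from it and toward the canary.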
I'm writing small C programs and disassembling them using objdump and gdb to see what the assembly looks like. I don't understand how functions like printf work on an assembly level. The C program I last compiled is:
#include <stdio.h>
int main(){
printf("test");
return 0;
}
A gdb disassembly of the program produces the following:
Dump of assembler code for function main:
0x00401460 <+0>: push ebp
0x00401461 <+1>: mov ebp,esp
0x00401463 <+3>: and esp,0xfffffff0
0x00401466 <+6>: sub esp,0x10
0x00401469 <+9>: call 0x4019c0 <__main>
0x0040146e <+14>: mov DWORD PTR [esp],0x405064
0x00401475 <+21>: call 0x403a60 <printf>
0x0040147a <+26>: mov eax,0x0
0x0040147f <+31>: leave
0x00401480 <+32>: ret
0x00401481 <+33>: nop
0x00401482 <+34>: nop
0x00401483 <+35>: nop
0x00401484 <+36>: xchg ax,ax
0x00401486 <+38>: xchg ax,ax
0x00401488 <+40>: xchg ax,ax
0x0040148a <+42>: xchg ax,ax
0x0040148c <+44>: xchg ax,ax
0x0040148e <+46>: xchg ax,ax
End of assembler dump.
I understand that the printf function is part of the standard C library, which is a precompiled DLL. However the string I'm passing to printf is not precompiled and should be found somewhere in my program. Where is it?
The only place I can think of is main+14. I run 405064 into a hexadecimal to ascii and utf converter, neither of which produced "test". Does that mean 0x405064 is the address where the "test" string is stored? If so, why is it in a random address rather than one relative to the main function? Or is text encoded differently in memory and my hex to ascii converter is pointless?
Also, if the stack is allocated 16 bytes, why is the string stored in [esp]? Aren't characters 1 byte long, and therefore the string should be stored in [ebp-4]?
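Here is a small sketch of what main+14 appears to be doing, assuming a typical toolchain that places string literals in a data section of the executable. The instruction stores the address of the literal (a pointer) in the argument slot, not its characters. The helper names below are made up for illustration.

```c
#include <stddef.h>

/* Returns the address of the literal's first byte; this is the kind of
   value that mov DWORD PTR [esp],0x405064 places in the argument slot. */
const char *literal_address(void) { return "test"; }

/* Size of the argument slot: one pointer (4 bytes on a 32-bit target). */
size_t argument_slot_size(void) { return sizeof(const char *); }

/* Size of the literal itself: 't' 'e' 's' 't' plus the terminating NUL. */
size_t literal_size(void) { return sizeof "test"; }
```

The bytes at the returned address are 0x74 0x65 0x73 0x74 0x00, which is "test" in ASCII. In gdb, x/s 0x405064 should print the string stored at that address.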
This question already has answers here:
Why is the address of static variables relative to the Instruction Pointer?
(1 answer)
32-bit absolute addresses no longer allowed in x86-64 Linux?
(1 answer)
Closed 4 years ago.
The C source:
int sum(int a, int b) {
return a + b;
}
int main() {
int (*ptr_sum_1)(int,int) = sum; // assign the address of the "sum"
int (*ptr_sum_2)(int,int) = sum; // to the function pointer
int (*ptr_sum_3)(int,int) = sum;
int a = (*ptr_sum_1)(2,4); // call the "sum" through the pointer
int b = sum(2,4); // call the "sum" by usual way
return 0;
}
The crucial part of the assembly code:
lea rax, sum[rip]
mov QWORD PTR -24[rbp], rax
lea rax, sum[rip]
mov QWORD PTR -16[rbp], rax
lea rax, sum[rip]
mov QWORD PTR -8[rbp], rax
The executing program instructions from GDB:
0x5fa <sum>: push rbp
0x5fb <sum+1>: mov rbp,rsp
0x5fe <sum+4>: mov DWORD PTR [rbp-0x4],edi
0x601 <sum+7>: mov DWORD PTR [rbp-0x8],esi
0x604 <sum+10>: mov edx,DWORD PTR [rbp-0x4]
0x607 <sum+13>: mov eax,DWORD PTR [rbp-0x8]
0x60a <sum+16>: add eax,edx
0x60c <sum+18>: pop rbp
0x60d <sum+19>: ret
0x60e <main>: push rbp
0x60f <main+1>: mov rbp,rsp
0x612 <main+4>: sub rsp,0x20
0x616 <main+8>: lea rax,[rip+0xffffffffffffffdd] # 0x5fa <sum>
0x61d <main+15>: mov QWORD PTR [rbp-0x18],rax
0x621 <main+19>: lea rax,[rip+0xffffffffffffffd2] # 0x5fa <sum>
0x628 <main+26>: mov QWORD PTR [rbp-0x10],rax
0x62c <main+30>: lea rax,[rip+0xffffffffffffffc7] # 0x5fa <sum>
0x633 <main+37>: mov QWORD PTR [rbp-0x8],rax
0x637 <main+41>: mov rax,QWORD PTR [rbp-0x18]
0x63b <main+45>: mov esi,0x4
0x640 <main+50>: mov edi,0x2
0x645 <main+55>: call rax
0x647 <main+57>: mov DWORD PTR [rbp-0x20],eax
0x64a <main+60>: mov esi,0x4
0x64f <main+65>: mov edi,0x2
0x654 <main+70>: call 0x5fa <sum>
0x659 <main+75>: mov DWORD PTR [rbp-0x1c],eax
0x65c <main+78>: mov eax,0x0
0x661 <main+83>: leave
0x662 <main+84>: ret
I think that the sum label is just the starting address of the sum procedure - 0x5fa, so I don't understand why gcc can't use it directly, but uses the calculation sum[rip] for this.
Question:
Why is sum[rip] used in the lea rax, sum[rip] instruction in assembly, instead of the simple sum label, e.g. lea rax, sum?
Will the mov rax, 0x5fa instruction do the same? Because we know the sum address after linking: the call 0x5fa <sum> instruction just uses it directly.
I believe that it might depend on your build of GCC, but on the Linux distributions that I use everything is set up to default to PIC builds. That's Position Independent Code. It's better for both shared libraries and executables, because the result can be mapped into memory anywhere without needing a fixup pass. It's better for security because ASLR can be applied.
With x86-64 there's no significant penalty for using PIC so why wouldn't it be used everywhere?
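The displacements in the GDB listing from the question can be checked by hand. A RIP-relative operand is the address of the next instruction plus a signed displacement, so the encoded bytes stay valid wherever the image is mapped. Here is a minimal sketch of that arithmetic; the function name is made up, and the 7-byte instruction length is read off the listing (0x61d - 0x616).

```c
#include <stdint.h>

/* Address that a RIP-relative operand refers to: the address of the NEXT
   instruction plus the signed 32-bit displacement. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

0xffffffffffffffdd is -0x23 as a signed value, so rip_relative_target(0x616, 7, -0x23) gives 0x5fa. The other two lea instructions reach the same address with -0x2e and -0x39. The call 0x5fa <sum> instruction is also encoded with a relative displacement; the disassembler just prints the computed target. A mov rax,0x5fa would hard-code an absolute address, which would be wrong as soon as the image is loaded anywhere else.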
I am trying to understand the assembly code for the following function by disassembling it. I don't understand why all the operations are relative to the base pointer.
Why are the values of registers rcx and rdx moved to memory at offsets 0x10 and 0x18?
(mov %rcx,0x10(%rbp) and mov %rdx,0x18(%rbp)).
Why is the return value stored to memory by mov %rax,-0x8(%rbp)?
long absdiff(long x, long y)
{
long result;
if (x>y)
result = x-y;
else
result = y-x;
return result;
}
0x00000001004010e0 <+0>: push %rbp
0x00000001004010e1 <+1>: mov %rsp,%rbp
0x00000001004010e4 <+4>: sub $0x10,%rsp
0x00000001004010e8 <+8>: mov %rcx,0x10(%rbp)
0x00000001004010ec <+12>: mov %rdx,0x18(%rbp)
0x00000001004010f0 <+16>: mov 0x10(%rbp),%rax
0x00000001004010f4 <+20>: cmp 0x18(%rbp),%rax
0x00000001004010f8 <+24>: jle 0x100401108 <absdiff+40>
0x00000001004010fa <+26>: mov 0x10(%rbp),%rax
0x00000001004010fe <+30>: sub 0x18(%rbp),%rax
0x0000000100401102 <+34>: mov %rax,-0x8(%rbp)
0x0000000100401106 <+38>: jmp 0x100401114 <absdiff+52>
0x0000000100401108 <+40>: mov 0x18(%rbp),%rax
0x000000010040110c <+44>: sub 0x10(%rbp),%rax
0x0000000100401110 <+48>: mov %rax,-0x8(%rbp)
0x0000000100401114 <+52>: mov -0x8(%rbp),%rax
0x0000000100401118 <+56>: add $0x10,%rsp
0x000000010040111c <+60>: pop %rbp
0x000000010040111d <+61>: retq
1) Why sub $0x10, %rsp?
It subtracts 16 bytes from the stack pointer to make room for local variables. Only result (at -0x8(%rbp)) needs a slot here; the rest keeps the stack 16-byte aligned. Try printing sizeof(long) and I'm pretty sure you'll get 8 on the machine you're on.
2) Why move register values to memory?
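This is where the compiler stores the two long arguments from registers rcx and rdx to memory, as unoptimized code does for every variable. Note that the offsets 0x10 and 0x18 are positive: these slots sit above the saved %rbp and the return address, in space reserved by the caller (the "shadow space" of the Windows x64 calling convention, which also passes the first two integer arguments in rcx and rdx), not in the 16 bytes from 1). 0x10 and 0x18 differ by 8 bytes, the size of one long.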
3) Why is the return value stored by mov %rax,-0x8(%rbp)?
That store is the assignment to the local variable result. Unoptimized code writes each variable back to its stack slot at the end of the statement that assigns it, and reloads it when it is needed again. You can see that both branches store to -0x8(%rbp), and that the value is loaded back into rax just before the function returns.
mov %rax,-0x8(%rbp) <--- saving
jmp 0x100401114 <absdiff+52>
...
mov %rax,-0x8(%rbp) <--- saving (other branch)
mov -0x8(%rbp),%rax <--- retrieving
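One way to read the dump is to line each C statement up with its block of instructions. The sketch below is the function from the question with the offsets from the disassembly added as comments. It assumes unoptimized output, where every variable is stored to memory at the end of each statement.

```c
/* absdiff from the question, annotated with instruction ranges from the dump. */
long absdiff(long x, long y)     /* +8..+12: x to 0x10(%rbp), y to 0x18(%rbp) */
{
    long result;                 /* lives at -0x8(%rbp) */
    if (x > y)                   /* +16..+24: cmp 0x18(%rbp),%rax ; jle */
        result = x - y;          /* +26..+34: ends with mov %rax,-0x8(%rbp) */
    else
        result = y - x;          /* +40..+48: ends with mov %rax,-0x8(%rbp) */
    return result;               /* +52: mov -0x8(%rbp),%rax */
}
```

Compiling with optimization enabled (for example gcc -O2) would normally keep x, y, and result in registers, and most of these loads and stores would disappear.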
A Suggestion
I'm pretty sure you'll find this link really helpful:
https://www.recurse.com/blog/7-understanding-c-by-learning-assembly
I have written a simple function in C:
void GetInput()
{
char buffer[8];
gets(buffer);
puts(buffer);
}
When I disassemble it in gdb's disassembler, it gives following disassembly.
0x08048464 <+0>: push %ebp
0x08048465 <+1>: mov %esp,%ebp
0x08048467 <+3>: sub $0x10,%esp
0x0804846a <+6>: mov %gs:0x14,%eax
0x08048470 <+12>: mov %eax,-0x4(%ebp)
0x08048473 <+15>: xor %eax,%eax
=> 0x08048475 <+17>: lea -0xc(%ebp),%eax
0x08048478 <+20>: mov %eax,(%esp)
0x0804847b <+23>: call 0x8048360 <gets@plt>
0x08048480 <+28>: lea -0xc(%ebp),%eax
0x08048483 <+31>: mov %eax,(%esp)
0x08048486 <+34>: call 0x8048380 <puts@plt>
0x0804848b <+39>: mov -0x4(%ebp),%eax
0x0804848e <+42>: xor %gs:0x14,%eax
0x08048495 <+49>: je 0x804849c <GetInput+56>
0x08048497 <+51>: call 0x8048370 <__stack_chk_fail@plt>
0x0804849c <+56>: leave
0x0804849d <+57>: ret
Now please look at the third line, 0x08048467 <+3>: sub $0x10,%esp. I have only 8 bytes allocated for the local variable, so why is the compiler allocating 16 bytes (0x10)?
Secondly, what is the meaning of xor %gs:0x14,%eax?
Edit: If it is an optimization, is there any way to turn it off?
Thanks.
Two things:
The compiler may reserve space for intermediate expressions to which you did not give names in the source code (or conversely not allocate space for local variables that can live entirely in registers). The list of stack slots in the binary does not have to match the list of local variables in the source code.
On some platforms, the compiler has to keep the stack pointer aligned. For the particular example in your question, it is likely that the compiler is striving to keep the stack pointer aligned to a boundary of 16 bytes.
Regarding your other question that you should have asked separately, xor %gs:0x14,%eax is clearly part of a stack protection mechanism, enabled by default. If you are using GCC, turn it off with -fno-stack-protector.
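Conceptually, the protector copies a guard value into the frame on entry and compares it again before returning. The following is a rough emulation in portable C, not what GCC actually emits. The struct frame, the GUARD constant, and copy_and_check are made up for illustration, and the real guard is read from %gs:0x14 rather than being a compile-time constant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the frame: the buffer with the guard right above it. */
struct frame {
    char buffer[8];          /* -0xc(%ebp) .. -0x5(%ebp) */
    unsigned int canary;     /* -0x4(%ebp) */
};

#define GUARD 0xdecafbadu    /* made-up value for this sketch */

/* Copies n bytes into the modelled frame and returns 1 if the guard
   survived, 0 if it was overwritten. */
int copy_and_check(const char *src, size_t n)
{
    struct frame f;
    unsigned char *raw = (unsigned char *)&f;

    if (n > sizeof f)        /* keep the emulation itself within bounds */
        n = sizeof f;

    f.canary = GUARD;                                        /* mov %gs:0x14,%eax ; mov %eax,-0x4(%ebp) */
    memcpy(raw + offsetof(struct frame, buffer), src, n);    /* stands in for gets() */
    return (f.canary ^ GUARD) == 0;                          /* xor %gs:0x14,%eax ; je */
}
```

A short input such as copy_and_check("abc", 4) leaves the guard intact. Twelve bytes of input overwrite it, which is the case where the real code calls __stack_chk_fail.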
Besides the other answers already given, gcc will prefer to keep the stack 16-byte aligned so that SSE values can be stored on it, since some SSE instructions (movaps, for example) require their memory operand to be 16-byte aligned.
This builds on Pascal's answer, but in this case it's probably because of the stack protection mechanism.
You allocate 8 bytes, which is fair enough and is accounted for in the stack pointer adjustment. In addition, the stack protector's guard value is read from %gs:0x14 and saved at -0x4(%ebp), just below the saved frame pointer, on the following lines:
0x0804846a <+6>: mov %gs:0x14,%eax
0x08048470 <+12>: mov %eax,-0x4(%ebp)
This takes four bytes. The remaining four bytes, at (%esp), are used as the outgoing argument slot for gets and puts, as on the following lines:
=> 0x08048475 <+17>: lea -0xc(%ebp),%eax
0x08048478 <+20>: mov %eax,(%esp)