Below are the first few lines of a disassembled C program that I am trying to reverse engineer back into its C code, to learn assembly language better. At the beginning of this code I see that it makes room on the stack and immediately executes
0x000000000040054e <+8>: mov %fs:0x28,%rax
I am confused about what this line does, and about what in the corresponding C program might produce it. The only time I have seen this line so far is when a different function within a C program is called, but this time it is not followed by any callq instruction, so I am not so sure... Any ideas what else could be in this C program to produce this instruction?
0x0000000000400546 <+0>: push %rbp
0x0000000000400547 <+1>: mov %rsp,%rbp
0x000000000040054a <+4>: sub $0x40,%rsp
0x000000000040054e <+8>: mov %fs:0x28,%rax
0x0000000000400557 <+17>: mov %rax,-0x8(%rbp)
0x000000000040055b <+21>: xor %eax,%eax
0x000000000040055d <+23>: movl $0x17,-0x30(%rbp)
...
I know this is there to provide some form of stack protection against buffer overflow attacks; I just need to know what C code would prompt this protection, if not a separate function.
As you say, this is code used to defend against buffer overflows. The compiler generates this "stack canary check" for functions that have local variables that might be buffers that could be overflowed. Note the instructions immediately above and below the line you are asking about:
sub $0x40, %rsp
mov %fs:0x28, %rax
mov %rax, -0x8(%rbp)
xor %eax, %eax
The sub allocates 64 bytes of space on the stack, which is enough room for at least one small array. Then a secret value is copied from %fs:0x28 to the top of that space, just below the previous frame pointer and the return address, and then it is erased from the register file.
The body of the function does something with arrays; if it writes sufficiently far past the end of an array, it will overwrite the secret value. At the end of the function, there will be code along the lines of
mov -0x8(%rbp), %rax
xor %fs:0x28, %rax
jne 1f
mov %rbp, %rsp
pop %rbp
ret
1:
call __stack_chk_fail # does not return
This verifies that the secret value is unchanged, and crashes the program if it has changed. The idea is that someone trying to exploit a simple buffer overflow vulnerability, like you have when you use gets, won't be able to change the return address without also modifying the secret value.
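As a rough model in portable C, the prologue/epilogue logic behaves like the sketch below. The names struct frame, guard and copy_and_check are hypothetical; the real check is generated by the compiler, keeps the secret at %fs:0x28 rather than in a global, and calls __stack_chk_fail instead of returning 0. A struct is used so that the "canary sits just past the buffer" layout is guaranteed rather than left to the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a protected frame: the canary slot sits right after the
   buffer, so an overflow of buf reaches it first. */
struct frame {
    char      buf[16];
    uintptr_t canary;
};

/* Stand-in for the per-process secret at %fs:0x28 (the real one is random). */
static const uintptr_t guard = 0x5a5aa5a5u;

/* Copies len bytes into the frame without checking them against the size
   of buf, then runs the epilogue check. Returns 1 if the canary survived,
   0 where real code would call __stack_chk_fail. */
int copy_and_check(const void *src, size_t len)
{
    struct frame f;
    f.canary = guard;                /* prologue: save the secret */
    if (len > sizeof f)
        len = sizeof f;              /* keep the model inside f */
    memcpy(&f, src, len);            /* may spill past buf into canary */
    return (f.canary ^ guard) == 0;  /* epilogue: xor, then jne */
}
```

Copying 16 bytes leaves the canary intact; copying more overwrites it and the check fails.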
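The compiler has several different heuristics, selectable with command-line options (in GCC: -fstack-protector, -fstack-protector-strong and -fstack-protector-all, with -fno-stack-protector to turn it off), for deciding when it is necessary to generate stack-canary protection code.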
You can't write C code corresponding to this assembly language yourself, because it uses the unusual %fs:nnnn addressing mode; the stack-canary code intentionally uses an addressing mode that no other code generation relies on, to make it as difficult as possible for the adversary to learn the secret value.
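As for what C code triggers the check: any function with a local array is enough under -fstack-protector-strong, which many Linux distributions configure GCC to enable by default. The function below is only a guess at the kind of code involved, not a reconstruction of the original program; the name sum_table and the array contents are hypothetical (23 is 0x17, echoing the movl $0x17 in the listing). Compile it at -O0 so the array is not folded away.

```c
#include <stddef.h>

/* Hypothetical function: the local array makes GCC emit the
   mov %fs:0x28,%rax ... __stack_chk_fail pair under
   -fstack-protector-strong. */
int sum_table(void)
{
    int table[8] = { 23, 1, 2, 3, 4, 5, 6, 7 };  /* 32-byte local array */
    int sum = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        sum += table[i];
    return sum;
}
```

Building it with gcc -O0 -fstack-protector-strong -c and disassembling with objdump -d should show a prologue similar to the one in the question.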
Related
I've written a piece of C code and I've disassembled it as well as read the registers to understand how the program works in assembly.
#include <string.h>
int test(char *this){
char sum_buf[6];
strncpy(sum_buf,this,32); /* can write up to 32 bytes into a 6-byte buffer */
return 0;
}
The piece of my code that I've been examining is the test function. When I disassemble my test function, I get:
0x00000000004005c0 <+12>: mov %fs:0x28,%rax
=> 0x00000000004005c9 <+21>: mov %rax,-0x8(%rbp)
... stuff ..
0x00000000004005f0 <+60>: xor %fs:0x28,%rdx
0x00000000004005f9 <+69>: je 0x400600 <test+76>
0x00000000004005fb <+71>: callq 0x4004a0 <__stack_chk_fail@plt>
0x0000000000400600 <+76>: leaveq
0x0000000000400601 <+77>: retq
What I would like to know is: what is mov %fs:0x28,%rax really doing?
Both the FS and GS registers can be used as base-pointer addresses in order to access special operating system data-structures. So what you're seeing is a value loaded at an offset from the value held in the FS register, and not bit manipulation of the contents of the FS register.
Specifically, what's taking place is that FS:0x28 on Linux stores a special sentinel stack-guard value, and the code is performing a stack-guard check. For instance, if you look further in your code, you'll see that the value at FS:0x28 is stored on the stack, and later that stored copy is loaded back and XORed with the original value at FS:0x28. If the two values are equal, the XOR yields zero and sets the zero flag (ZF), so the je skips the call and jumps to the normal epilogue (leaveq; retq). Otherwise execution falls through to a special function that reports that the stack was somehow corrupted, because the sentinel value stored on the stack was changed.
If using GCC, this can be disabled with:
-fno-stack-protector
In glibc, the canary value is set up at startup:
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
Here _dl_random points to random bytes supplied by the kernel (through the AT_RANDOM entry of the auxiliary vector).
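A minimal sketch for reading back the value glibc stored, assuming x86-64 Linux. Plain C cannot express the %fs: addressing, so this uses GCC inline assembly; on any other target the hypothetical helper read_stack_guard simply returns 0.

```c
#include <stdint.h>

/* Reads the stack-protector canary the same way the compiler-generated
   prologue does. Assumption: x86-64 Linux, where the FS segment base
   points at the thread control block. Elsewhere this returns 0. */
uintptr_t read_stack_guard(void)
{
    uintptr_t v = 0;
#if defined(__x86_64__) && defined(__linux__)
    __asm__ volatile ("mov %%fs:0x28, %0" : "=r"(v));
#endif
    return v;
}
```

The value stays the same for the lifetime of the process, which is exactly what lets the epilogue compare the saved copy against it.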
Looking at http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.htm, I think %fs:0x28 is actually an offset of 0x28 (40) bytes from the base address of the FS segment, not from the selector value held in %fs itself. So I think it's loading a full 8-byte register from location fs_base + 0x28 into %rax.
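The arithmetic behind 0x28 can be checked with offsetof. The struct below is a simplified model of the first fields of glibc's x86-64 thread control block (tcbhead_t), where the canary is the stack_guard member; the name tcb_model is hypothetical and the real structure has more fields after it. The 32-bit counterpart of this field is consistent with the %gs:0x14 used by the -m32 listing further down this page.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of glibc's x86-64 tcbhead_t (first fields only).
   Offsets in the comments assume x86-64 (8-byte pointers, 4-byte int). */
struct tcb_model {
    void     *tcb;              /* 0x00 */
    void     *dtv;              /* 0x08 */
    void     *self;             /* 0x10 */
    int       multiple_threads; /* 0x18 */
    int       gscope_flag;      /* 0x1c */
    uintptr_t sysinfo;          /* 0x20 */
    uintptr_t stack_guard;      /* 0x28 = 40: what %fs:0x28 reads */
};

/* Byte offset of the canary inside the model. */
size_t stack_guard_offset(void)
{
    return offsetof(struct tcb_model, stack_guard);
}
```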
I have an exam coming up, and I'm struggling with assembly. I have written some simple C code, gotten its assembly code, and am now trying to comment on the assembly code as practice. The C code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[])
{
int x = 10;
char const* y = argv[1];
printf("%s\n",y );
return 0;
}
Its assembly code:
0x00000000000006a0 <+0>: push %rbp # Creating stack
0x00000000000006a1 <+1>: mov %rsp,%rbp # Saving base of stack into base pointer register
0x00000000000006a4 <+4>: sub $0x20,%rsp # Allocate 32 bytes of space on the stack
0x00000000000006a8 <+8>: mov %edi,-0x14(%rbp) # First argument stored in stackframe
0x00000000000006ab <+11>: mov %rsi,-0x20(%rbp) # Second argument stored in stackframe
0x00000000000006af <+15>: movl $0xa,-0xc(%rbp) # Value 10 stored in x's address in the stackframe
0x00000000000006b6 <+22>: mov -0x20(%rbp),%rax # Second argument stored in return value register
0x00000000000006ba <+26>: mov 0x8(%rax),%rax # ??
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # ??
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # ??
0x00000000000006c6 <+38>: mov %rax,%rdi # Return value copied to 1st argument register - why??
0x00000000000006c9 <+41>: callq 0x560 # printf??
0x00000000000006ce <+46>: mov $0x0,%eax # Value 0 is copied to return register
0x00000000000006d3 <+51>: leaveq # Destroying stackframe
0x00000000000006d4 <+52>: retq # Popping return address, and setting instruction pointer equal to it
Can a friendly soul help me out wherever I have "??" (meaning I don't understand what is happening or I'm unsure)?
0x00000000000006ba <+26>: mov 0x8(%rax),%rax # get argv[1] to rax
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # move argv[1] to local variable
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # move local variable to rax (for move to rdi)
0x00000000000006c6 <+38>: mov %rax,%rdi # now rdi has argv[1]
0x00000000000006c9 <+41>: callq 0x560 # it is puts (optimized)
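Why the displacement is 0x8: argv holds the address of an array of pointers, so argv[1] is read from sizeof(char *) bytes past that address, which is 8 on x86-64. A small sketch in C (the helper names second_arg and argv1_byte_offset are hypothetical):

```c
#include <stddef.h>

/* What mov -0x20(%rbp),%rax ; mov 0x8(%rax),%rax computes:
   load argv, then load the pointer stored one slot past it. */
const char *second_arg(char const *argv[])
{
    return argv[1];          /* same as *(argv + 1) */
}

/* Byte distance between argv[0] and argv[1]: the 0x8 in 0x8(%rax)
   on a 64-bit target. */
size_t argv1_byte_offset(char const *argv[])
{
    return (size_t)((const char *)&argv[1] - (const char *)&argv[0]);
}
```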
I will try to make a guess:
mov -0x20(%rbp),%rax # retrieve argv (the address of the argv array)
mov 0x8(%rax),%rax # store argv[1] into rax
mov %rax,-0x8(%rbp) # store argv[1] (which now is in rax) into y
mov -0x8(%rbp),%rax # put y back into rax (which might look dumb, but possibly it has its reasons)
mov %rax,%rdi # copy y to rdi, possibly to prepare the context for the printf
When you deal with assembler, please specify which architecture you are using. An Intel processor uses a different instruction set from an ARM one; even similar instructions may behave differently or rely on different assumptions. As you might know, optimisations change the sequence of instructions generated by the compiler, so you might also want to specify whether you are using them (it looks like you are not) and which compiler you are using, as each has its own policy for generating assembler.
Maybe we will never know why the compiler must prepare the context for printf by copying from rax; it could be the compiler's choice or an obligation imposed by the specific architecture. For all those annoying reasons, most people prefer to use a "high-level language" such as C, so that the generated instructions are always correct, although they might look very dumb to a human (as we know, computers are dumb by design) and are not always the best choice; that's why there are still many compilers around.
I can give you two more tips:
your IDE must have a way to interleave assembler instructions with C code, and to single-step within the assembler. Try to find it and explore it yourself
the IDE should also have a function to explore the memory of your program. If you find it, try to enter the 0x560 address and look where it leads you. It is very likely that that will be the entry point of your printf
I hope that my answer will help you work it out, good luck
I have the following program that enables the alignment check (AC) bit in the x86 processor flags register in order to trap unaligned memory accesses. Then the program declares two volatile variables:
#include <assert.h>
int main(void)
{
#ifndef NOASM
__asm__(
"pushf\n"
"orl $(1<<18),(%esp)\n"
"popf\n"
);
#endif
volatile unsigned char foo[] = { 1, 2, 3, 4, 5, 6 };
volatile unsigned int bar = 0xaa;
return 0;
}
If I compile this, the code generated initially does the obvious things like setting up the stack and creating the array of chars by moving the values 1, 2, 3, 4, 5, 6 onto the stack:
/tmp ➤ gcc test3.c -m32
/tmp ➤ gdb ./a.out
(gdb) disassemble main
0x0804843d <+0>: push %ebp
0x0804843e <+1>: mov %esp,%ebp
0x08048440 <+3>: and $0xfffffff0,%esp
0x08048443 <+6>: sub $0x20,%esp
0x08048446 <+9>: mov %gs:0x14,%eax
0x0804844c <+15>: mov %eax,0x1c(%esp)
0x08048450 <+19>: xor %eax,%eax
0x08048452 <+21>: pushf
0x08048453 <+22>: orl $0x40000,(%esp)
0x0804845a <+29>: popf
0x0804845b <+30>: movb $0x1,0x16(%esp)
0x08048460 <+35>: movb $0x2,0x17(%esp)
0x08048465 <+40>: movb $0x3,0x18(%esp)
0x0804846a <+45>: movb $0x4,0x19(%esp)
0x0804846f <+50>: movb $0x5,0x1a(%esp)
0x08048474 <+55>: movb $0x6,0x1b(%esp)
0x08048479 <+60>: mov 0x16(%esp),%eax
0x0804847d <+64>: mov %eax,0x10(%esp)
0x08048481 <+68>: movzwl 0x1a(%esp),%eax
0x08048486 <+73>: mov %ax,0x14(%esp)
0x0804848b <+78>: movl $0xaa,0xc(%esp)
0x08048493 <+86>: mov $0x0,%eax
0x08048498 <+91>: mov 0x1c(%esp),%edx
0x0804849c <+95>: xor %gs:0x14,%edx
0x080484a3 <+102>: je 0x80484aa <main+109>
0x080484a5 <+104>: call 0x8048310 <__stack_chk_fail@plt>
0x080484aa <+109>: leave
0x080484ab <+110>: ret
However, at main+60 it does something strange: it copies the 6-byte array to another part of the stack through a register, first as a 4-byte word and then as a 2-byte halfword. But the bytes start at offset 0x16, which is not 4-byte aligned, so with AC set the program crashes when it attempts the first mov.
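The alignment arithmetic can be checked directly. The AC bit set by the inline assembly is bit 18 of EFLAGS (1 << 18 = 0x40000, matching the orl at main+22). Because and $0xfffffff0,%esp leaves %esp 16-byte aligned (and the sub $0x20 and the pushf/popf pair keep it so), the alignment of each 0xNN(%esp) operand depends only on its displacement. The helper below is a hypothetical illustration, not part of the original program.

```c
#include <stdbool.h>
#include <stddef.h>

/* With AC set, an access faults unless its address is a multiple of its
   size. Assumes the base register is 16-byte aligned, so only the
   displacement matters. */
bool naturally_aligned(size_t displacement, size_t size)
{
    return displacement % size == 0;
}
```

Applied to the listing: mov 0x16(%esp),%eax (22 % 4 == 2) is misaligned, while mov %eax,0x10(%esp), movzwl 0x1a(%esp),%eax and mov %ax,0x14(%esp) are all naturally aligned.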
So I've two questions:
Why is the compiler emitting code to copy the array to another part of the stack? I assumed volatile would skip every optimization and always perform memory accesses. Maybe volatile vars are required to always be accessed as whole words, and so the compiler would always use temporary registers to read/write whole words?
Why does the compiler not put the char array at an aligned address if it later intends to use these mov instructions? I understand that x86 normally handles unaligned accesses safely, and on modern processors they do not even carry a performance penalty; however, in all other instances I see the compiler trying to avoid generating unaligned accesses, since they are considered, AFAIK, undefined behavior in C. My guess is that, since it later provides a properly aligned pointer for the copied array on the stack, it simply does not care about the alignment of data used only for initialization, in a way that is invisible to the C program?
If my hypotheses above are right, it means that I cannot expect an x86 compiler to always generate aligned accesses, even if the compiled code never attempts to perform unaligned accesses itself, and so setting the AC flag is not a practical way to detect parts of the code where unaligned accesses are performed.
EDIT: After further research I can answer most of this myself. In an attempt to make progress, I added an option in Redis to set the AC flag and otherwise run normally. I found that this approach is not viable: the process immediately crashes inside libc: __mempcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:83. I assume that the whole x86 software stack simply does not really care about misalignment since it is handled very well by this architecture. Thus it is not practical to run with the AC flag set.
So the answer to question 2 above is that, like the rest of the software stack, the compiler is free to do as it pleases, and relocate things on the stack without caring about alignment, so long as the behavior is correct from the perspective of the C program.
The only question left is why, with volatile, a copy is made in a different part of the stack. My best guess is that the compiler is attempting to access variables declared volatile as whole words even during initialization (imagine if this address were mapped to an I/O port), but I'm not sure.
You're compiling without optimization, so the compiler is generating straightforward code without worrying about how inefficient it is. So it first creates the initializer { 1, 2, 3, 4, 5, 6 } in temp space on the stack, and it then copies that into the space it allocated for foo.
The compiler populates the array in a working storage area, one byte at a time, which is not atomic. It then moves the entire array to its final resting place with wider moves (a 4-byte mov followed by a 2-byte movzwl/mov pair), each of which is atomic when its target address is naturally aligned.
The write has to be atomic because the compiler must assume (due to the volatile keyword) that the array can be accessed at any time by anyone else.
There's a series of problems in SPOJ about creating a function in a single line with some constraints. I've already solved the easy, medium and hard ones, but for the impossible one I keep getting Wrong Answer.
To sum it up, the problem requests to fill in the code of the return statement such that if x is 1, the return value should be 2. For other x values, it should return 3. The constraint is that the letter 'x' can't be used, and no more code can be added; one can only code that return statement. Clearly, to solve this, one must create a hack.
So I've used GCC's built-in way to get the stack frame address, and then decremented the pointer to get a pointer to the first parameter. Other than that, the statement is just a normal comparison.
On my machine it works fine, but on the cluster (Intel Pentium G860) used by the online judge it doesn't work, probably due to a different calling convention. I'm not sure I understood the processor's ABI (I'm not sure whether the frame pointer is saved on the stack or only in a register), or even whether I'm reading the correct ABI.
The question is: what would be the correct way to get the first parameter of a function using the stack?
My code is (it must be formatted this way, otherwise it's not accepted):
#include <stdio.h>
int count(int x){
return (*(((int*)__builtin_frame_address(0))-1) == 1) ? 2 : 3;
}
int main(i){
for(i=1;i%1000001;i++)
printf("%d %d\n",i,count(i));
return 0;
}
The question is: what would be the correct way to get the first parameter of a function using the stack?
There is no portable way to do it. You must assume a specific compiler, its settings and ABI, along with its calling conventions.
The gcc compiler is likely to "lay down" an int local variable at offset -0x4 (assuming that sizeof(int) == 4). You can observe this with the most basic definition of count:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
0x00000000004004c8 <+4>: mov %edi,-0x4(%rbp)
5 return x == 1 ? 2 : 3;
0x00000000004004cb <+7>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cf <+11>: jne 0x4004d8 <count+20>
0x00000000004004d1 <+13>: mov $0x2,%eax
0x00000000004004d6 <+18>: jmp 0x4004dd <count+25>
0x00000000004004d8 <+20>: mov $0x3,%eax
6 }
0x00000000004004dd <+25>: leaveq
0x00000000004004de <+26>: retq
You can also see that the %edi register holds the first parameter. This is the case for the AMD64 (System V) ABI (%edi is also not preserved across calls).
Now, with that knowledge, you might write something like:
int count(int x)
{
return *((int*)(__builtin_frame_address(0) - sizeof(int))) == 1 ? 2 : 3;
}
which can be obfuscated as:
return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
However, the catch is that an optimizing compiler may assume that, since x is not referenced in count, it can simply skip storing it on the stack. For example, it produces the following assembly with the -O flag:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
5 return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
0x00000000004004c8 <+4>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cc <+8>: setne %al
0x00000000004004cf <+11>: movzbl %al,%eax
0x00000000004004d2 <+14>: add $0x2,%eax
6 }
0x00000000004004d5 <+17>: leaveq
0x00000000004004d6 <+18>: retq
As you can see, the mov %edi,-0x4(%rbp) instruction is now missing, so the only way (see note 1 below) would be to access the value of x from the %edi register:
int count(int x)
{
return ({register int edi asm("edi");edi==1?2:3;});
}
but this method cannot be "obfuscated" into a whitespace-free line, because the declaration of the variable that holds the value of %edi needs whitespace.
1) Not necessarily. Even if the compiler decides to skip the mov from register to stack, it is still possible to "force" it with an asm("mov %edi,-0x4(%rbp)"); inline-assembly statement. Beware, though: the compiler may have its revenge sooner or later.
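For comparison, the -O listing above computes the result without a branch: cmpl/setne produces 0 or 1, movzbl widens it, and add $0x2 turns it into 2 or 3. The same computation in ordinary C, using x directly, looks like this (count_branchless is a hypothetical name; this does not satisfy the puzzle's "no letter x" rule, it only shows what the optimized code computes):

```c
/* Mirrors cmpl $0x1,...; setne %al; movzbl %al,%eax; add $0x2,%eax.
   In C, (x != 1) evaluates to 0 or 1. */
int count_branchless(int x)
{
    return 2 + (x != 1);
}
```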
The C standard does NOT require an implementation to use a stack, so really your problem doesn't make any sense.
In the context of gcc, the behavior differs between x86 and x86-64 (and any other architecture).
In x86, parameters reside on the stack, but in x86-64 the first 6 integer or pointer parameters (including implicit ones) are passed in registers. So basically you can't do the hack as you describe.
If you want to hack the code, you need to specify the platform you want to run on; otherwise, there is no point in answering your question.
When I use disas in gdb, I may get something like this:
(gdb) disas bar
Dump of assembler code for function bar:
0x08048e84 <+0>: push %ebp
0x08048e85 <+1>: mov %esp,%ebp
0x08048e87 <+3>: sub $0x8,%esp
0x08048e8a <+6>: mov 0xc(%ebp),%eax
0x08048e8d <+9>: mov 0x8(%ebp),%edx
0x08048e90 <+12>: add %edx,%eax
0x08048e92 <+14>: mov %eax,-0x4(%ebp)
0x08048e95 <+17>: mov 0x81f4074,%eax
0x08048e9a <+22>: mov %eax,(%esp)
0x08048e9d <+25>: call 0x8048ed8 <traceback>
0x08048ea2 <+30>: mov -0x4(%ebp),%eax
0x08048ea5 <+33>: mov %eax,0x8(%ebp)
0x08048ea8 <+36>: leave
0x08048ea9 <+37>: ret
End of assembler dump.
Let's say I have the address 0x08048ea2 in my C program. How can I obtain the offset <+30> and the function's start address 0x08048e84?
It's hard to tell what exactly you are after, but backtrace functions generally look at the "stack frame" and find return addresses from there. They then search the list of functions for the one immediately preceding each return address. If you are in a debugger, it can probably also tell you which line of code the return address corresponds to.
From here I'm not sure whether the question is "how do I evaluate the stack frame?" or "how do I find the immediately preceding function?".
Stack frame generation is platform specific, but generally uses a specific register to hold an address that is a fixed position on the stack established on entry to the function. Stack frames usually point to a location in the stack at (or just beside) the previous stack frame pointer. This allows walking the stack using the frame pointers and evaluating the return addresses (which will usually be at a fixed position relative to the stack frame information).
Be aware that there are various optimizations the compiler can make which will affect the stack frame (two specifically: disabling stack frame generation, and the return value optimization).
Determining the address of functions is also platform dependent. Compilers and linkers will create and manage debug symbols if asked to. This essentially embeds a lookup table of function name and starting address that is loaded into memory for the debugger to access. Better systems will also provide the ability to lookup symbol names in release builds. They usually do this by having the IDE provide the function base addresses (a map file), and then figure out where the sections of code were loaded into memory. I suspect this information is provided by OS calls, but maybe there is some other mechanism.
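Once such a table of function names and start addresses is available, finding the "immediately preceding function" is a search for the greatest start address that is less than or equal to the address in hand. A minimal sketch, assuming a hypothetical table sorted by start address (struct sym and find_sym are illustrative names, not a real API):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: function name and start address. */
struct sym {
    const char *name;
    uintptr_t   start;
};

/* Returns the entry with the greatest start <= addr and stores
   addr - start in *offset. tab must be sorted by start. Returns NULL if
   addr precedes every entry. Without symbol sizes it cannot tell whether
   addr is actually past the end of that function. */
const struct sym *find_sym(const struct sym *tab, size_t n,
                           uintptr_t addr, uintptr_t *offset)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                /* find first entry with start > addr */
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    if (offset != NULL)
        *offset = addr - tab[lo - 1].start;
    return &tab[lo - 1];
}
```

With bar at 0x08048e84 and traceback at 0x08048ed8, looking up 0x08048ea2 yields bar with offset 30 (0x1e), matching <+30>. On glibc, dladdr() from <dlfcn.h> (with _GNU_SOURCE defined) performs a similar lookup against the dynamic symbol table, so it only finds exported symbols, for example in an executable linked with -rdynamic.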
I have a stupid method.
Add a label like this:
/* some code omitted */
label:
mov -0x4(%ebp),%eax
mov %eax,0x8(%ebp)
leave
ret
Then:
lea eax, label /* eax will be 0x08048ea2 */
lea ebx, bar /* ebx will be 0x08048e84 */
sub eax, ebx /* get the offset */