I have to work with x86 for a homework assignment. I keep getting this segfault for an atoi call, but I'm not sure why it's occurring. This is the code:
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp)
call atoi
Where $eax right before the "call atoi" equals "11". To prove it, this is what I get in gdb right before the call.
x/s $eax
0xffffd658: "11"
I guess my big question is what should be held in register $eax before the "call atoi" and what would $eax have to hold for the program to then segfault in the "call atoi"?
EDIT: Someone asked for the registers.
(gdb) x/s $eax
0xffffd658: "11"
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x2aaaaaa3 in ?? ()
(gdb) inf r
eax 0x0 0
ecx 0x2aaaaaab 715827883
edx 0x0 0
ebx 0x2c3ff4 2899956
esp 0xffffd458 0xffffd458
ebp 0x80484ef 0x80484ef
esi 0x0 0
edi 0x0 0
eip 0x2aaaaaa3 0x2aaaaaa3
eflags 0x10287 [ CF PF SF IF RF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb)
It depends on whether your atoi function expects the pointer in %eax or on the stack. You seem to be undecided about that, given the code segment:
movl %eax, (%esp)
call atoi
If that's supposed to be pushing the address on to the stack, you either need to really push it (which modifies the stack pointer correctly) or modify the stack pointer yourself (such as with sub %esp, 4 or something).
The x86 stack pointer points to the last pushed entry so that push will first decrement the stack pointer then store the value at the new pointer. That means movl %eax (%esp) is going to corrupt the last pushed entry on the stack rather than place a new value on the stack.
Most people let the CPU handle the stacking of data rather than trying to do it manually, so you're probably looking for:
push %eax
However, if instead atoi expects its argument in %eax rather than on the stack, why would you write it to the stack? That's still going to corrupt whatever was there.
So, the first thing I would do would be choose either (for a pushed value):
addl $4, %eax
movl (%eax), %eax
push %eax
call atoi
or (for using %eax), just:
addl $4, %eax
movl (%eax), %eax
call atoi
Related
I have a __kernel_vsyscall error in a function that gives the correct output but the program can never get past it and gives a __kernel_vsyscall error.
C functions:
void f1(int* input, int* output, int nbElements);
Assembler function (GAS). I can't post the whole thing since it's for an assignment and I don't want someone to copy it.
f1:
push %ebp
mov %esp, %ebp
movl $0, %ecx
movl $0, %ebx
jmp for_loop1
for_loop1:
cmpl %ecx, 16(%ebp)
jb end
movl $0, %ebx
jmp for_loop2
for_loop2:
/*move input elements to output elements*/
cmpl %ecx, %ebx
jmp incr_2
incr_2:
addl $1, %ebx
cmpl %ebx, 16(%ebp)
jb incr_1
jmp for_loop2
incr_1:
addl $1, %ecx
jmp for_loop1
end:
addl $8, %esp
leave
ret
Error when the program terminated :
malloc(): invalid size (unsorted)
Aborted (core dumped)
Error when debugging with gdb with the coredump file :
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./program_name...
[New LWP 2634]
warning: Loadable section ".note.gnu.property" outside of ELF segments
Core was generated by `./program_name'.
Program terminated with signal SIGABRT, Aborted.
#0 0xf7f35ac9 in __kernel_vsyscall ()
I've tried looking at what is at address 0xf7f35ac9 and it returned -402652697, or another random value.
Local variables are at adresses around 0xff83b1fc, the values stored in the pointers are at adresses around 0x80cab30, the functions declared previously are at adresses around 0x8049fc4, and the stack has values around 0xffaddedc, so I have no idea where this is.
Thanks
I was able to fix the problem by changing
cmpl %ecx, 16(%ebp)
jb end
to
cmpl 16(%ebp), %ecx
jge end
same as with all the other comparisons. Looked around online and apparently the difference between jb/ja and jl/jg is that jb/ja is for unsigned comparison while jl/jg is for signed comparison, but I'm still not sure why this causes this error.
Here is the assembly code I'm looking at:
push %ebp
mov %esp, %epb
sub $0x18, %esp
movl $0x804a170, 0x4(%esp)
mov 0x8(%ebp), %eax
mov %eax, (%esp)
call 0x8048f9b <strings_not_equal>
test %eax, %eax
je 0x8048f78 <phase_1+34>
call 0x8049198 <failed>
leave
ret
The goal is to give the function of the assembly the correct input as to jump to the function phase_1+34. So here is what I am interpreting so far:
The first two lines are set up code for the function. The first 'sub' line is allocating space by moving '%esp' downward to store arguments to call the strings_not_equal function. I think that the two arguments being passed to strings_not_equal are 0x804a170 and the input value. I assume that <strings_not_equal> will return 0 if the passed strings are equal. The je then checks to see if %eax & %eax is zero, which will only happen if %eax = 0. So basically, it seems that the input string just has to be equal to 0x804a170.
Anyone see any flaws so far?
At this point, I'm stuck, and what I have tried isn't working. 0x804a170 in decimal is 134521200. But the function this is being passed to is expecting strings. So 0x804a170 should be converted to a string? and that string is what the input should be equal to?
I dunno. If anyone sees any flaws or can give me a pointer it is very much appreciated!
0x804a170 is a pointer to something considered a "string" by the runtime. Since you give no information about what it is, we can only guess it's either a pointer to the first character, followed by more characters and ending with a 0, or it's a pointer to the first character, and the previous 1/2/4/8 bytes describe the length as an integer.
It's not a string itself.
Commented the code:
push %ebp ;function name is apparently phase_1
mov %esp, %epb
sub $0x18, %esp ;allocate 24 bytes from stack
movl $0x804a170, 0x4(%esp) ;[esp+4] = pointer to literal, static, or global string
mov 0x8(%ebp), %eax ;[esp ] = [ebp + 8] (this and next instruction)
mov %eax, (%esp)
call 0x8048f9b <strings_not_equal>
test %eax, %eax
je 0x8048f78 <phase_1+34> ;probably branches to next instruction after ret
call 0x8049198 <failed>
leave
ret
... ;padding here, nop (0x90) or int 3 (0xcc) are common
phase_1 + 34 ... ;continuation of phase_1
I got this short C Code.
#include <stdint.h>
uint64_t multiply(uint32_t x, uint32_t y) {
uint64_t res;
res = x*y;
return res;
}
int main() {
uint32_t a = 3, b = 5, z;
z = multiply(a,b);
return 0;
}
There is also an Assembler Code for the given C code above.
I don't understand everything of that assembler code. I commented each line and you will find my question in the comments for each line.
The Assembler Code is:
.text
multiply:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // takes the current stack pointer and uses it as the frame for the called function
subl $16, %esp // it leaves room on the stack, but why 16Bytes. sizeof(res) = 8Bytes
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
movl $0, -4(%ebp) // here as well
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
leave
ret
main:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // // takes the current stack pointer and uses it as the frame for the called function
andl $-8, %esp // what happens here and why?
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
movl $3, 20(%esp) // 3 gets pushed on the stack
movl $5, 16(%esp) // 5 also get pushed on the stack
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
movl %eax, 4(%esp) // we got the here as well
movl 20(%esp), %eax // and also here
movl %eax, (%esp) // what does happen in this line?
call multiply // thats clear, the function multiply gets called
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
movl $0, %eax // I suppose, this line is because of "return 0;"
leave
ret
Negative references relative to %ebp are for local variables on the stack.
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because`
%eax = x
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
%eax = %eax * y
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
(u_int32_t)res = %eax // sets low 32 bits of res
movl $0, -4(%ebp) // here as well
clears upper 32 bits of res to extend 32-bit multiplication result to uint64_t
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
return ret; //64-bit results are returned as a pair of 32-bit registers %edx:%eax
As for the main, see x86 calling convention which may help making sense of what happens.
andl $-8, %esp // what happens here and why?
stack boundary is aligned by 8. I believe it's ABI requirement
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
Multiples of 8 (probably due to alignment requirements)
movl $3, 20(%esp) // 3 gets pushed on the stack
a = 3
movl $5, 16(%esp) // 5 also get pushed on the stack
b = 5
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
%eax = b
z is at 12(%esp) and is not used yet.
movl %eax, 4(%esp) // we got the here as well
put b on the stack (second argument to multiply())
movl 20(%esp), %eax // and also here
%eax = a
movl %eax, (%esp) // what does happen in this line?
put a on the stack (first argument to multiply())
call multiply // thats clear, the function multiply gets called
multiply returns 64-bit result in %edx:%eax
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
z = (uint32_t) multiply()
movl $0, %eax // I suppose, this line is because of "return 0;"
yup. return 0;
Arguments are pushed onto the stack when the function is called. Inside the function, the stack pointer at that time is saved as the base pointer. (You got that much already.) The base pointer is used as a fixed location from which to reference arguments (which are above it, hence the positive offsets) and local variables (which are below it, hence the negative offsets).
The advantage of using a base pointer is that it is stable throughout the entire function, even when the stack pointer changes (due to function calls and new scopes).
So 8(%ebp) is one argument, and 12(%ebp) is the other.
The code is likely using more space on the stack than it needs to, because it is using temporary variables that could be optimized out of you had optimization turned on.
You might find this helpful: http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
I started typing this as a comment but it was getting too long to fit.
You can compile your example with -masm=intel so the assembly is more readable. Also, don't confuse the push and pop instructions with mov. push and pop always increments and decrements esp respectively before derefing the address whereas mov does not.
There are two ways to store values onto the stack. You can either push each item onto it one item at a time or you can allocate up-front the space required and then load each value onto the stackslot using mov + relative offset from either esp or ebp.
In your example, gcc chose the second method since that's usually faster because, unlike the first method, you're not constantly incrementing esp before saving the value onto the stack.
To address your other question in comment, x86 instruction set does not have a mov instruction for copying values from memory location a to another memory location b directly. It is not uncommon to see code like:
mov eax, [esp+16]
mov [esp+4], eax
mov eax, [esp+20]
mov [esp], eax
call multiply(unsigned int, unsigned int)
mov [esp+12], eax
Register eax is being used as an intermediate temporary variable to help copy data between the two stack locations. You can mentally translate the above as:
esp[4] = esp[16]; // argument 2
esp[0] = esp[20]; // argument 1
call multiply
esp[12] = eax; // eax has return value
Here's what the stack approximately looks like right before the call to multiply:
lower addr esp => uint32_t:a_copy = 3 <--. arg1 to 'multiply'
esp + 4 uint32_t:b_copy = 5 <--. arg2 to 'multiply'
^ esp + 8 ????
^ esp + 12 uint32_t:z = ? <--.
| esp + 16 uint32_t:b = 5 | local variables in 'main'
| esp + 20 uint32_t:a = 3 <--.
| ...
| ...
higher addr ebp previous frame
In the given url this function is given:
http://insecure.org/stf/smashstack.html
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = buffer1 + 12;
(*ret) += 8;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
The corresponding assembly code for main function is:
Dump of assembler code for function main:
0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
In the variable ret, they are pointing ret to the address of the next instruction to be run. I cannot understand that just by keeping the next instruction in the ret variable, how is the program going to jump to this next location?
I know how buffer overflow works, but by changing the ret variable, how is this doing buffer overflow?
Even by considering that this is a dummy program and is just supposed to let us understand how buffer overflow works, changing the ret variable seems wrong.
Explanation of how this is an example of a buffer overrun:
The local variables of function, including buffer1, are on the stack, along with the return address, which is calculated as being 12 bytes beyond buffer1. This is an example of a buffer overrun because writing to an address 12 bytes beyond buffer1 is writing outside the proper bounds of buffer1. By replacing the return address by a number 8 larger than it was, when function finishes, rather than popping off a return to the statement following the function call as usual (x = 1;, in this case), the return address will be 8 bytes later (at the printf statement, in this case).
Skipping the x = 1; statement is not the buffer overflow -- it's the effect of the buffer overflow which modified the return address.
Note on the calculation of 8 as the proper offset for skipping x = 1; statement:
See also FrankH's careful reevaluation of the calculation of 8 as the proper offset to add to the return address to achieve the intent of skipping x = 1;. His findings contradict the GDB-based analysis of the insecure.org source article. Regardless of this detail, the explanation of how a buffer overrun is used to change the return address remains the same -- it's just a question of what to write into the overrun.
For completeness, here is the GDB-based analysis of the insecure.org source article:
What we have done is add 12 to buffer1[]'s address. This new
address is where the return address is stored. We want to skip pass
the assignment to the printf call. How did we know to add 8 to the
return address? We used a test value first (for example 1), compiled
the program, and then started gdb:
[aleph1]$ gdb example3
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15 (i586-unknown-linux), Copyright 1995 Free Software Foundation, Inc...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main:
0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
We can see that when calling function() the RET will be 0x8004a8,
and we want to jump past the assignment at 0x80004ab. The next
instruction we want to execute is the at 0x8004b2. A little math
tells us the distance is 8 bytes.
A little better math tells us that the distance is 0x8004a8 - 0x8004b2 = 0xA or 10 bytes, not 8 bytes.
The layout on the stack is like this (addresses downwards - as stacks grow):
buffer + ... value found description
=================================================================================
+24 3 # from main, pushl $0x3
+20 2 # from main, pushl $0x2
+16 1 # from main, pushl $0x1
+12 <main+24> # from main, call 0x8000470 <function>
+8 <frameptr main> # from function, pushl %ebp
+4 %ebp(function) padding (3 bytes) # ABI - compiler will not _pack_ vars
+0 buffer[5];
... buffer1[12]; # might be optimized out (unused)
... int *ret # might be optimized out (reg used instead)
The tricky thing is that buffer starts at a four-byte-aligned address even though it's not sized a multiple of four bytes. The "effective size" is eight bytes, so if you add eight bytes to the start of it, you find the saved framepointer, and if you go another four bytes down, the saved return address (which, according to your disassembly, is main+0x24 / 0x80004a8. Adding 8 to that jumps "into the middle" of two intructions, the result is garbage - you're not skipping the x = 1 statement.
Wanting to see the output of the compiler (in assembly) for some C code, I wrote a simple program in C and generated its assembly file using gcc.
The code is this:
#include <stdio.h>
int main()
{
int i = 0;
if ( i == 0 )
{
printf("testing\n");
}
return 0;
}
The generated assembly for it is here (only the main function):
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
leave
ret
I am at an absolute loss to correlate the C code and assembly code. All that the code has to do is store 0 in a register and compare it with a constant 0 and take suitable action. But what is going on in the assembly?
Since main is special you can often get better results by doing this type of thing in another function (preferably in it's own file with no main). For example:
void foo(int x) {
if (x == 0) {
printf("testing\n");
}
}
would probably be much more clear as assembly. Doing this would also allow you to compile with optimizations and still observe the conditional behavior. If you were to compile your original program with any optimization level above 0 it would probably do away with the comparison since the compiler could go ahead and calculate the result of that. With this code part of the comparison is hidden from the compiler (in the parameter x) so the compiler can't do this optimization.
What the extra stuff actually is
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
This is setting up a stack frame for the current function. In x86 a stack frame is the area between the stack pointer's value (SP, ESP, or RSP for 16, 32, or 64 bit) and the base pointer's value (BP, EBP, or RBP). This is supposedly where local variables live, but not really, and explicit stack frames are optional in most cases. The use of alloca and/or variable length arrays would require their use, though.
This particular stack frame construction is different than for non-main functions because it also makes sure that the stack is 16 byte aligned. The subtraction from ESP increases the stack size by more than enough to hold local variables and the andl effectively subtracts from 0 to 15 from it, making it 16 byte aligned. This alignment seems excessive except that it would force the stack to also start out cache aligned as well as word aligned.
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
I don't know what all this does. alloca increases the stack frame size by altering the value of the stack pointer.
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
I think you know what this does. If not, the movl just befrore the call is moving the address of your string into the top location of the stack so that it may be retrived by printf. It must be passed on the stack so that printf can use it's address to infer the addresses of printf's other arguments (if any, which there aren't in this case).
leave
This instruction removes the stack frame talked about earlier. It is essentially movl %ebp, %esp followed by popl %ebp. There is also an enter instruction which can be used to construct stack frames, but gcc didn't use it. When stack frames aren't explicitly used, EBP may be used as a general puropose register and instead of leave the compiler would just add the stack frame size to the stack pointer, which would decrease the stack size by the frame size.
ret
I don't need to explain this.
When you compile with optimizations
I'm sure you will recompile all fo this with different optimization levels, so I will point out something that may happen that you will probably find odd. I have observed gcc replacing printf and fprintf with puts and fputs, respectively, when the format string did not contain any % and there were no additional parameters passed. This is because (for many reasons) it is much cheaper to call puts and fputs and in the end you still get what you wanted printed.
Don't worry about the preamble/postamble - the part you're interested in is:
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
It should be pretty self-evident as to how this correlates with the original C code.
The first part is some initialization code, which does not make any sense in the case of your simple example. This code would be removed with an optimization flag.
The last part can be mapped to C code:
movl $0, -4(%ebp) // put 0 into variable i (located at -4(%ebp))
cmpl $0, -4(%ebp) // compare variable i with value 0
jne L2 // if they are not equal, skip to after the printf call
movl $LC0, (%esp) // put the address of "testing\n" at the top of the stack
call _printf // do call printf
L2:
movl $0, %eax // return 0 (calling convention: %eax has the return code)
Well, much of it is the overhead associated with the function. main() is just a function like any other, so it has to store the return address on the stack at the start, set up the return value at the end, etc.
I would recommend using GCC to generate mixed source code and assembler which will show you the assembler generated for each sourc eline.
If you want to see the C code together with the assembly it was converted to, use a command line like this:
gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst
See http://www.delorie.com/djgpp/v2faq/faq8_20.html
On linux, just use gcc. On Windows down load Cygwin http://www.cygwin.com/
Edit - see also this question Using GCC to produce readable assembly?
and http://oprofile.sourceforge.net/doc/opannotate.html
You need some knowledge about Assembly Language to understand assembly garneted by C compiler.
This tutorial might be helpful
See here more information. You can generate the assembly code with C comments for better understanding.
gcc -g -Wa,-adhls your_c_file.c > you_asm_file.s
This should help you a little.