My homework program compiles fine and runs the way it should on Ubuntu, but when I run it on Mac OS X it dies with a bus error. Why is that?
I'm compiling with gcc -m32 source.c -o test
Here's the Mac OSX version (added prefixed underscores):
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char phrase[] = "slow but sure";
int sz;
int phrasesz;
char *arg;
char *result;
// Add any extra variables you may need here.
int main(int argc, char* argv[]) {
if (argc != 2) {
printf("Usage: %s takes 1 string argument.\n", argv[0]);
exit(1);
}
// Allocate memory and copy argument string into arg.
sz = strlen(argv[1]) + 1;
arg = malloc(sz);
strcpy(arg, argv[1]);
// Allocate lots of memory for the result.
phrasesz = strlen(phrase) + 1;
result = malloc(sz * phrasesz);
// Now copy phrase into result, while replacing SPACE
// with SPACE+arg+SPACE.
__asm__("\n\
leal _phrase, %esi\n\
movl _result, %ebx\n\
outerLoop:\n\
cmpb $0, (%esi)\n\
je finished\n\
forLoop:\n\
cmpb $32,(%esi)\n\
je endLoop\n\
cmpb $0, (%esi)\n\
je finished\n\
mov (%esi), %eax\n\
mov %eax, (%ebx)\n\
incl %ebx\n\
incl %esi\n\
jmp forLoop\n\
endLoop:\n\
mov (%esi), %eax\n\
mov %eax, (%ebx)\n\
incl %ebx\n\
incl %esi\n\
movl _arg, %edx\n\
copyArgv1IntoResult:\n\
cmpb $0, (%edx)\n\
je finishedCopyingArgv1\n\
mov (%edx), %ecx\n\
mov %ecx, (%ebx)\n\
incl %ebx\n\
incl %edx\n\
jmp copyArgv1IntoResult\n\
finishedCopyingArgv1:\n\
movb $32, (%ebx)\n\
incl %ebx\n\
jmp outerLoop\n\
finished:\n\
movb $0, (%ebx)\n\
");
printf("%s\n", result);
return 0;
}
Update:
I ran it in gdb debugger and this is the error I am getting.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00001ee8 in finished ()
1: x/i $pc 0x1ee8 <finished+11>: mov (%eax),%eax
Also, I am removing the Ubuntu version so there's less scrolling.
Some of your instructions, like...
mov (%esi), %eax
...are copying more than a byte from the character buffer at a time. I assume that's accidental? The matching store, mov %eax, (%ebx), also writes four bytes, so it overwrites three bytes beyond the character you meant to copy. You'd do well to write the code in C, then use gcc -S and compare the output to your hand-written code. Even if the buffers are aligned to a word boundary, you're incrementing the pointers by one byte, so you're certain to attempt an unaligned memory read. A SIGBUS basically means that you're trying to read a word's worth of memory from an address that isn't at the start of an aligned word; some CPUs silently, if slowly, battle on while others bail out. I've no idea of the hardware differences between your hosts.
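To make the comparison with gcc -S concrete, here is a rough C sketch of what the assembly seems to be aiming for. The function name expand_spaces and the exact buffer-size calculation are my own assumptions, not part of the assignment; the point is only that every copy moves a single char, which is what a movb would do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from the original post: returns a newly
   allocated copy of phrase in which every ' ' becomes ' ' + arg + ' '.
   The caller frees the result. Returns NULL if malloc fails. */
char *expand_spaces(const char *phrase, const char *arg)
{
    size_t arglen = strlen(arg);
    size_t spaces = 0;

    for (const char *p = phrase; *p != '\0'; p++)
        if (*p == ' ')
            spaces++;

    /* Each space grows by arglen + 1 bytes; the final +1 is the terminator. */
    char *result = malloc(strlen(phrase) + spaces * (arglen + 1) + 1);
    if (result == NULL)
        return NULL;

    char *out = result;
    for (const char *p = phrase; *p != '\0'; p++) {
        *out++ = *p;                  /* exactly one byte per copy */
        if (*p == ' ') {
            memcpy(out, arg, arglen); /* arg without its terminator */
            out += arglen;
            *out++ = ' ';
        }
    }
    *out = '\0';
    return result;
}
```

Compiling this with gcc -m32 -S should show byte-sized loads and stores in the inner loop, which you can line up against the hand-written version.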
Related
This is my assembly program add.s
.globl add
add:
movl 4(%esp), %eax
movl 8(%esp), %ebx
addl %ebx, %eax
ret
This is my C program. I am trying to call the assembly program from the C program.
#include <stdio.h>
int add(int a, int b);
int main() {
int res = add(5,6);
printf("%d",res);
return 0;
}
But the above code gives me a segmentation fault. What is causing this error and how do I fix it?
Assuming the cdecl calling convention, you are using the ebx register which is supposed not to be clobbered: its value has to be saved and then restored by the callee if it is going to be modified.
The caller assumes that ebx is not going to change by calling a function. Therefore if the callee modifies ebx it has to save it first and then restore it to its original value before returning from the function.
The registers eax, ecx and edx can be used without having to save them first and then restored. Therefore, I would recommend replacing ebx with edx in your code:
add:
movl 4(%esp), %eax
movl 8(%esp), %edx
addl %edx, %eax
ret
I'm running this under GDB and keep getting a segmentation fault as soon as the C program enters the main function.
GDB Error:
Breakpoint 1, main () at binom_main.c:7
7 n=10;
(gdb) s
10 0;
(gdb) s
12 +){
(gdb) s
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005c4 in otherwise ()
(gdb)
I compiled the code as such:
as binom.s -o binom.o
gcc -S -Og binom_main.c
gcc -c binom_main.s
gcc binom_main.o binom.o -o runtimes
I'm trying to learn how to use GDB more efficiently here but segfaults like these are pretty ambiguous and limiting. Why is this segfault being caused the moment the function begins? Have I linked the two files incorrectly?
Main :
#include <stdio.h>
unsigned int result,m,n,i;
unsigned int binom(int,int);
int main(){
n=10;
i=0;
for (i=1; i<2;i++){
result = binom(n,i);
printf("i=%d | %d \n", i, result );
}
return 0;
}
Sub:
.text
.globl binom
binom:
mov $0x00, %edx #for difference calculation
cmp %edi, %esi #m=n?
je equalorzero #jump to equalorzero for returning of value 1
cmp $0x00, %esi #m=0?
je equalorzero
cmp $0x01, %esi #m=1?
mov %esi,%edx
sub %edi, %edx
cmp $0x01, %edx # n-m = 1 ?
je oneoronedifference
jmp otherwise
equalorzero:
add $1, %eax #return 1
call printf
ret
oneoronedifference:
add %edi, %eax #return n
ret
otherwise:
sub $1, %edi #binom(n-1,m)
call binom
sub $1, %esi #binom(n-1,m-1)
call binom
ret
When you use gdb to debug asm, look at the disassembly window as well as the source window. (e.g. layout asm / layout reg, and layout next until you get the combo of windows that you want.) See the bottom of the x86 tag wiki for some more tips and a link to docs.
You can use stepi (si) to step by instructions, not by C statements, while investigating a crash outside your asm, caused by it corrupting something before returning.
This looks like a bug:
sub $1, %edi #binom(n-1,m)
call binom
# at this point, %edi no longer holds n-1, and %esi no longer holds m.
# because binom clobbers them. (This is normal)
# as Jester points out, you also don't save the return value (%eax) from the first call anywhere.
sub $1, %esi #binom(n-1,m-1)
call binom
Another (minor?) bug is:
cmp $0x01, %esi #m=1?
# but then you never read the flags that cmp set
Another serious bug:
equalorzero:
add $1, %eax #return 1 # wrong: nothing before this set %eax to anything.
# mov $1, %eax # You probably want this instead
ret
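For reference, this is roughly the recursion the assembly appears to be attempting, written in C. It is only a sketch under the assumption that 0 <= m <= n. It makes the two things the asm loses explicit: n and m have to survive the first call, and the first call's result has to be kept and added to the second.

```c
/* Reference version of the recursion; assumes 0 <= m <= n. */
unsigned int binom(int n, int m)
{
    if (m == 0 || m == n)
        return 1;
    if (m == 1 || n - m == 1)
        return (unsigned int)n;

    /* n and m are still needed after the first call, so in asm they must
       live in callee-saved registers or on the stack. The first result
       must be saved as well, because the second call overwrites %eax. */
    unsigned int first = binom(n - 1, m);
    unsigned int second = binom(n - 1, m - 1);
    return first + second;
}
```

Compiling this with gcc -Og -S and reading the output is a quick way to see how the compiler preserves the arguments and the partial result across the two calls.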
I have a buffer overflow problem that I need to solve. Below is the problem, at the bottom is my question:
#include <stdio.h>
#include <string.h>
void lan(void) {
printf("Your loyalty to your captors is touching.\n");
}
void vulnerable(char *str) {
char buf[LENGTH]; //Length is not given
strcpy(buf, str); //str to fixed size buf (uh-oh)
}
int main(int argc, char **argv) {
if (argc < 2)
return -1;
vulnerable(argv[1]);
return 0;
}
(gdb) disass vulnerable
0x08048408: push %ebp
0x08048409: mov %esp, %ebp
0x0804840b: sub $0x88, %esp
0x0804840e: mov 0x8(%ebp), %eax
0x08048411: mov %eax, 0x4(%esp)
0x08048415: lea -0x80(%ebp), %eax
0x08048418: mov %eax, (%esp)
0x0804841b: call 0x8048314 <strcpy>
0x08048420: leave
0x08048421: ret
End of assembler dump.
(gdb) disass lan
0x080483f4: push %ebp
0x080483f5: mov %esp, %ebp
0x080483f7: sub $0x4, %esp
0x080483fa: movl $0x8048514, (%esp)
0x08048401: call 0x8048324 <puts>
0x08048406: leave
0x08048407: ret
End of assembler dump.
Then we have the following info:
(gdb) break *0x08048420
Breakpoint 1 at 0x8048420
(gdb) run 'perl -e' print "\x90" x Length' 'AAAABBBBCCCCDDDDEEEE'
Breakpoint 1, 0x08048420 in vulnerable
(gdb) info reg $ebp
ebp 0xffffd61c 0xffffd61c
(gdb) # QUESTION: Where in memory does the buf buffer start?
(gdb) cont
Program received signal SIGSEGV, Segmentation fault.
And finally, the perl command is a shorthand for writing out LENGTH copies of the character 0x90.
I've done a couple of problems of this sort before, but what stops me here is the following question: "By looking at the assembly code, what is the value of LENGTH?"
I'm not sure how to find that from the given assembly code. What I do know is.. the buffer that we're writing into is on the stack at the location -128(%ebp) (where -128 is a decimal number). However, I'm not sure where to go from here to get the length of the buffer.
Let's look at your vulnerable function.
First the compiler creates a frame and reserves 0x88 bytes on the stack:
0x08048408: push %ebp
0x08048409: mov %esp, %ebp
0x0804840b: sub $0x88, %esp
Then it puts two values onto the stack:
0x0804840e: mov 0x8(%ebp), %eax
0x08048411: mov %eax, 0x4(%esp)
0x08048415: lea -0x80(%ebp), %eax
0x08048418: mov %eax, (%esp)
And the last thing it does before returning is calling strcpy(buf, str):
0x0804841b: call 0x8048314 <strcpy>
0x08048420: leave
0x08048421: ret
So we can deduce that the two values it put on the stack are the arguments to strcpy.
mov 0x8(%ebp) would be char *str and lea -0x80(%ebp) would be a pointer to char buf[LENGTH].
Therefore, we know that your buffer starts at -0x80(%ebp), so it has a length of 0x80 = 128 bytes assuming the compiler didn't waste any space.
What I do know is.. the buffer that we're writing into is on the stack
at the location -128(%ebp)
Since the local variables end at %ebp, and you only have a single local variable which is buffer itself, you can conclude that it has length at most 128. It may be shorter, if the compiler added some padding for alignment.
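The address arithmetic behind both answers can be written down as a small C sketch. The helper names are made up for illustration, and it assumes the usual 32-bit x86 frame layout (saved %ebp at 0(%ebp), return address at 4(%ebp)), which is what the push %ebp / mov %esp, %ebp prologue in the dump suggests.

```c
#include <stdint.h>

/* Offsets read from the disassembly of vulnerable(); 32-bit x86 assumed. */
enum {
    BUF_OFFSET_BELOW_EBP = 0x80, /* from: lea -0x80(%ebp), %eax */
    SAVED_EBP_SIZE       = 4     /* from: push %ebp */
};

/* Address of buf[0] for a given frame pointer value. */
uint32_t buf_start(uint32_t ebp)
{
    return ebp - BUF_OFFSET_BELOW_EBP;
}

/* Address of the slot holding the return address (just above saved ebp). */
uint32_t ret_addr_slot(uint32_t ebp)
{
    return ebp + SAVED_EBP_SIZE;
}

/* Number of bytes from buf[0] up to the return address slot. */
uint32_t bytes_to_ret(void)
{
    return BUF_OFFSET_BELOW_EBP + SAVED_EBP_SIZE;
}
```

With the ebp value from the gdb session, buf_start(0xffffd61c) gives 0xffffd59c, and bytes_to_ret() gives 132: up to 128 bytes of buffer (possibly including padding) plus 4 bytes of saved %ebp.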
I was looking at the difference in C between char* c = "thomas"; and char c[] = "thomas";. I saw questions about this here and while trying to understand the answers I wanted to check that I was right by looking at the assembly. And a few questions were born.
Here is what I thought :
1. char* c = ... : the characters are allocated somewhere in static memory (read-only from the program's perspective), alongside the code. That's why it should be marked const. The string can be printed but not modified.
2. char c[] = ... : Same as 1., except that when the function is called, the characters are copied into an array on the stack, so it can be modified etc.
I wanted to check this so I made this C code :
#include <stdio.h>
int main(){
char c [] = "thomas blabljbflkjbsdflkjbds";
printf("%s\n", c);
}
Looking at the generated assembly :
0x400564 <main>: push rbp
0x400565 <main+1>: mov rbp,rsp
0x400568 <main+4>: sub rsp,0x30
0x40056c <main+8>: mov rax,QWORD PTR fs:0x28
0x400575 <main+17>: mov QWORD PTR [rbp-0x8],rax
0x400579 <main+21>: xor eax,eax
0x40057b <main+23>: mov DWORD PTR [rbp-0x30],0x6978616d
0x400582 <main+30>: mov DWORD PTR [rbp-0x2c],0x6220656d
0x400589 <main+37>: mov DWORD PTR [rbp-0x28],0x6c62616c
0x400590 <main+44>: mov DWORD PTR [rbp-0x24],0x6c66626a
0x400597 <main+51>: mov DWORD PTR [rbp-0x20],0x73626a6b
0x40059e <main+58>: mov DWORD PTR [rbp-0x1c],0x6b6c6664
0x4005a5 <main+65>: mov DWORD PTR [rbp-0x18],0x7364626a
0x4005ac <main+72>: mov BYTE PTR [rbp-0x14],0x0
0x4005b0 <main+76>: lea rax,[rbp-0x30]
0x4005b4 <main+80>: mov rdi,rax
0x4005b7 <main+83>: call 0x400450 <puts@plt>
0x4005bc <main+88>: mov rdx,QWORD PTR [rbp-0x8]
0x4005c0 <main+92>: xor rdx,QWORD PTR fs:0x28
0x4005c9 <main+101>: je 0x4005d0 <main+108>
So characters are copied into the stack, which is what I thought.
Questions :
The characters are stored by bytes at addresses 0x6978616d, 0x6220656d and so on. Why aren't they allocated contiguously in an array ? Simple optimization of the compiler ?
explains why char* doesn't behave like an array and why c[10] isn't the 11th character of the string. However it doesn't explain why
char* c = "thomas blabljbflkjbsdflkjbds";
printf("%s\n", c);
works. (Note the [] -> *). I guess that printf reads character by character until it reaches a 0, so knowing just c (i.e. &c[0]), how does it access c[10]? (I ask because the characters look non-contiguous, and this time they are not copied into an array on the stack.)
I hope that I am clear, I can reformulate if you ask/don't understand a point. Thanks
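1: 0x6978616d and 0x6220656d are not addresses; they are the data in your string. Converted from hex to ASCII, 0x6978616d is "ixam" and 0x6220656d is "b em": each 32-bit store carries four characters, and they read backwards in the hex because x86 is little-endian.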
2: When used in a function call, arrays decay into pointers. So printf will receive a pointer to char regardless of if c is an array or a pointer.
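If it helps to see where those constants come from, here is a small sketch that packs four characters the way a little-endian 32-bit store lays them out. pack_le is a made-up helper for illustration, not something the compiler emits.

```c
#include <stdint.h>

/* Hypothetical helper: packs four characters into a 32-bit value with
   s[0] in the lowest byte, matching how a little-endian machine stores
   a dword at increasing addresses. */
uint32_t pack_le(const char s[4])
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}
```

For example, pack_le("labl") yields 0x6c62616c, which is the immediate stored at [rbp-0x28] in the listing. The stores go to consecutive offsets (-0x30, -0x2c, -0x28, ...), so the characters do end up contiguous in memory.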
A compiler may actually choose to compile character array initialisation as a copy from read-only storage, but as Klas suggests, that is not happening in your example.
Here is an example of code for which that does happen (using gcc). It may be illuminating to change the definition of STR to strings of various lengths and look at the difference in assembly output.
/* 99 characters */
#define STR "123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789"
void observe(const char *);
void test1() {
char *str = STR;
observe(str);
}
void test2() {
char str[] = STR;
observe(str);
}
And the assembly:
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
.string "123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789"
.text
test2:
pushl %ebp
movl $25, %ecx
movl %esp, %ebp
subl $136, %esp
movl %esi, -8(%ebp)
movl $.LC0, %esi
movl %edi, -4(%ebp)
leal -108(%ebp), %edi
rep movsl
leal -108(%ebp), %eax
movl %eax, (%esp)
call observe
movl -8(%ebp), %esi
movl -4(%ebp), %edi
movl %ebp, %esp
popl %ebp
ret
test1:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $.LC0, (%esp)
call observe
leave
ret
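The same distinction can be observed from C without reading any assembly. The following is only a sketch with function names I made up. It checks that each char[] initialised from a literal is its own writable copy, and that sizeof sees the whole array while a decayed pointer does not.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if modifying one array initialised from a literal leaves a
   second array initialised from the same literal untouched. */
int array_copies_are_independent(void)
{
    char a[] = "thomas";
    char b[] = "thomas";
    a[0] = 'T'; /* legal: a is a private, writable copy */
    return strcmp(a, "Thomas") == 0 && strcmp(b, "thomas") == 0;
}

/* sizeof applied to the array counts all its bytes, terminator included. */
size_t array_size(void)
{
    char c[] = "thomas";
    return sizeof c;
}

/* Once the array has decayed to a pointer, only the terminating 0 marks
   the end, so the length has to be found by scanning. */
size_t length_via_pointer(const char *c)
{
    return strlen(c);
}
```

Doing the equivalent of a[0] = 'T' through a char * that points at the literal itself would be undefined behaviour, which is why that form should be declared const char *.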
I'm trying to get the FP in my C program, I tried two different ways, but they both differ from what I get when I run GDB.
The first way I tried, I made a prototype in C for the assembly function:
int* getEbp();
and my code looks like this:
int* ebp = getEbp();
printf("ebp: %08x\n", ebp); // value i get here is 0xbfe2db58
while( esp <= ebp )
esp -= 4;
printf( "ebp: %08x, esp" ); //value i get here is 0xbfe2daec
My assembly code
getEbp:
movl %ebp, %eax
ret
I tried making the prototype function to just return an int, but that also doesn't match up with my GDB output. We are using x86 assembly.
EDIT: typos, and my getEsp function looks exactly like the other one:
getEsp:
movl %esp, %eax
ret
For reading a register, it's indeed best to use GCC extended inline assembly syntax.
Your getEbp() looks like it should work if you compiled it in a separate assembler file.
Your getEsp() is obviously incorrect since it doesn't take the return address pushed by the caller into account.
Here's a code snippet that gets ebp through extended inline asm and does stack unwinding by chasing the frame pointer:
struct stack_frame {
struct stack_frame *prev;
void *return_addr;
} __attribute__((packed));
typedef struct stack_frame stack_frame;
void backtrace_from_fp(void **buf, int size)
{
int i;
stack_frame *fp;
__asm__("movl %%ebp, %[fp]" : /* output */ [fp] "=r" (fp));
for(i = 0; i < size && fp != NULL; fp = fp->prev, i++)
buf[i] = fp->return_addr;
}
I'll show two working implementations of reading the registers below. The pure asm functions are get_ebp() and get_esp() in getbp.S. The other set implemented as inline functions are get_esp_inline() and get_ebp_inline() at the top of test-getbp.c.
In getbp.S
.section .text
/* obviously incurring the cost of a function call
to read a register is inefficient */
.global get_ebp
get_ebp:
movl %ebp, %eax
ret
.global get_esp
get_esp:
/* 4: return address pushed by caller */
lea 4(%esp), %eax
ret
In test-getbp.c
#include <stdio.h>
#include <stdint.h>
/* see http://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation */
#include <sys/sdt.h>
int32_t *get_ebp(void);
int32_t *get_esp(void);
__attribute__((always_inline)) uintptr_t *get_ebp_inline(void)
{
uintptr_t *r;
__asm__ volatile ("movl %%ebp, %[r]" : /* output */ [r] "=r" (r));
return r;
}
__attribute__((always_inline)) uintptr_t *get_esp_inline(void)
{
uintptr_t *r;
__asm__ volatile ("movl %%esp, %[r]" : /* output */ [r] "=r" (r));
return r;
}
int main(int argc, char **argv)
{
uintptr_t *bp, *sp;
/* allocate some random data on the stack just for fun */
int a[10] = { 1, 3, 4, 9 };
fprintf(fopen("/dev/null", "w"), "%d\n", a[3]); /* "w": the stream is written to */
STAP_PROBE(getbp, getbp); /* a static probe is like a named breakpoint */
bp = get_ebp();
sp = get_esp();
printf("asm: %p, %p\n", (void*)bp, (void*)sp);
bp = get_ebp_inline();
sp = get_esp_inline();
printf("inline: %p, %p\n", (void*)bp, (void*)sp);
return 0;
}
We can now write a GDB script to dump ebp and esp while making use of the getbp static probe defined in test-getbp.c above.
In test-getbp.gdb
file test-getbp
set breakpoint pending on
break -p getbp
commands
silent
printf "gdb: 0x%04x, 0x%04x\n", $ebp, $esp
continue
end
run
quit
To verify that the functions return the same data as GDB:
$ gdb -x test-getbp.gdb
< ... >
gdb: 0xffffc938, 0xffffc920
asm: 0xffffc938, 0xffffc920
inline: 0xffffc938, 0xffffc920
< ... >
Disassembling test-getbp main() produces:
0x08048370 <+0>: push %ebp
0x08048371 <+1>: mov %esp,%ebp
0x08048373 <+3>: push %ebx
0x08048374 <+4>: and $0xfffffff0,%esp
0x08048377 <+7>: sub $0x10,%esp
0x0804837a <+10>: movl $0x8048584,0x4(%esp)
0x08048382 <+18>: movl $0x8048586,(%esp)
0x08048389 <+25>: call 0x8048360 <fopen@plt>
0x0804838e <+30>: movl $0x9,0x8(%esp)
0x08048396 <+38>: movl $0x8048590,0x4(%esp)
0x0804839e <+46>: mov %eax,(%esp)
0x080483a1 <+49>: call 0x8048350 <fprintf@plt>
0x080483a6 <+54>: nop
0x080483a7 <+55>: call 0x80484e4 <get_ebp>
0x080483ac <+60>: mov %eax,%ebx
0x080483ae <+62>: call 0x80484e7 <get_esp>
0x080483b3 <+67>: mov %ebx,0x4(%esp)
0x080483b7 <+71>: movl $0x8048594,(%esp)
0x080483be <+78>: mov %eax,0x8(%esp)
0x080483c2 <+82>: call 0x8048320 <printf@plt>
0x080483c7 <+87>: mov %ebp,%eax
0x080483c9 <+89>: mov %esp,%edx
0x080483cb <+91>: mov %edx,0x8(%esp)
0x080483cf <+95>: mov %eax,0x4(%esp)
0x080483d3 <+99>: movl $0x80485a1,(%esp)
0x080483da <+106>: call 0x8048320 <printf@plt>
0x080483df <+111>: xor %eax,%eax
0x080483e1 <+113>: mov -0x4(%ebp),%ebx
0x080483e4 <+116>: leave
0x080483e5 <+117>: ret
The nop at <main+54> is the static probe. See the code around the two printf calls for how the registers are read.
BTW, this loop in your code seems strange to me:
while( esp <= ebp )
esp -= 4;
Don't you mean
while (esp < ebp)
esp +=4
?
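One more thing to watch in that loop: if esp is declared as int *, then esp += 4 advances by four ints (16 bytes on i386), not four bytes. A sketch of the walk with typed pointers, using a made-up helper name, could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the 32-bit words in the range [lo, hi).
   On x86 the stack grows downwards, so the stack pointer is the low end
   and the frame pointer is the high end of the current frame. */
size_t words_between(const uint32_t *lo, const uint32_t *hi)
{
    size_t n = 0;
    while (lo < hi) {
        lo++; /* pointer arithmetic: moves by sizeof(uint32_t), i.e. 4 bytes */
        n++;
    }
    return n;
}
```

Inside the loop you would read *lo before incrementing if you want to dump the frame contents word by word.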
Because you're relying on implementation specific details, you need to provide more information about your target to get an accurate answer. You didn't specify architecture, compiler or operating system, which are really required to answer your question.
Making an educated guess based on the register names you referenced and the fact that you're using AT&T syntax, I'm going to assume this is i386 and you're using gcc.
The simplest way to achieve this is with a gcc explicit register variable, a gcc-specific syntax that ties a variable to a named register. You can try this:
#include <stdint.h>
#include <stdio.h>
int main(int argc, char **argv)
{
const uintptr_t register framep asm("ebp");
fprintf(stderr, "val: %#x\n", framep);
return 0;
}
An alternative is to use inline assembly to load the value, like this:
#include <stdint.h>
#include <stdio.h>
int main(int argc, char **argv)
{
uintptr_t framep;
asm("movl %%ebp, %0" : "=r" (framep));
fprintf(stderr, "val: %#x\n", framep);
return 0;
}
The "=r" constraint asks the compiler for any general-purpose register as a write-only output (the = modifier); the movl puts ebp into that register, and the compiler then takes care of storing it into framep.
In gdb, you can print the value and verify it matches the output.
(gdb) b main
Breakpoint 1 at 0x40117f: file ebp2.c, line 8.
(gdb) r
Starting program: /home/zero/a.exe
[New Thread 4664.0x1290]
[New Thread 4664.0x13c4]
Breakpoint 1, main (argc=1, argv=0x28ac50) at ebp2.c:8
8 asm("movl %%ebp, %0" : "=r" (framep));
(gdb) n
10 fprintf(stderr, "val: %#x\n", framep);
(gdb) p/x framep
$1 = 0x28ac28
(gdb) p/x $ebp
$2 = 0x28ac28
(gdb) c
Continuing.
val: 0x28ac28
[Inferior 1 (process 4664) exited normally]
(gdb) q
Remember that you cannot rely on this behaviour: even on x86, gcc can be configured not to use the frame pointer and to keep track of stack usage itself. This is generally called FPO by Microsoft, or omit-frame-pointer on other platforms. The trick frees up another register for general-purpose use, but makes debugging a little more complicated.
You're correct that eax is generally used for return values where possible in x86 calling conventions, I have no idea why the comments on your post claim the stack is used.
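If you would rather not name a register at all, gcc and clang also provide builtins for this. The sketch below assumes one of those compilers; the wrapper names are mine. Level 0 refers to the function that contains the builtin call, so the wrappers are marked noinline to keep "the current function" well defined.

```c
/* GCC/Clang builtins; not part of standard C. */

/* Frame address of this wrapper function itself. */
__attribute__((noinline)) void *current_frame(void)
{
    return __builtin_frame_address(0);
}

/* Address this wrapper will return to, i.e. a location in its caller. */
__attribute__((noinline)) void *caller_return_address(void)
{
    return __builtin_return_address(0);
}
```

Printing the result with printf("%p\n", current_frame()) and comparing it with p/x $ebp at a breakpoint inside current_frame is a reasonable sanity check on i386. Note that the value is the wrapper's frame, not main's.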