I tried to read input into a buffer with fgets. I pushed the 3 parameters, but got a segmentation fault. I tried to find the problem with GDB, but I didn't understand the message that I got there.
This is the code:
section .rodata
buffer: db 10
section .text
align 16
global main
extern fgets
extern stdin
main:
push ebp
mov ebp, esp
pushad
push dword[stdin]
push 10;
push buffer;
call fgets;
add esp, 12
popad
mov esp, ebp
pop ebp
ret
And this is the message that I got:
Program received signal SIGSEGV, Segmentation fault.
__GI__IO_getline_info (fp=fp@entry=0xf7fb1c20 <_IO_2_1_stdin_>,
buf=buf@entry=0x80484f0 "\n", n=8, n@entry=9, delim=delim@entry=10,
extract_delim=extract_delim@entry=1, eof=eof@entry=0x0) at iogetline.c:86
86 iogetline.c: No such file or directory.
What is wrong with my code?
You segfault because you ask fgets to write to an address in the .rodata section. It's of course read-only.
Put your buffer in the .bss section, and use resb 10 to reserve 10 bytes. Your current version is one byte, initialized to { 10 }. You don't want to store a bunch of zeros in your executable for no reason; that's what the bss is for.
section .bss
buffer: resb 10
buffer_length equ $ - buffer
section .text
align 16
global main
extern fgets
extern stdin
main:
push dword [stdin]
push buffer_length
push buffer ; 3 pushes gets the stack back to 16B-alignment
call fgets
add esp, 12
ret
You don't need pusha, or a stack frame (the stuff with ebp) in this function. Normally you only save/restore call-preserved registers you want to use, not all of them every time.
As Michael Petch points out, it would also be better to reserve space on the stack for the buffer, instead of using static storage. Have a look at compiler output for an equivalent C function that uses a local array. (e.g. on http://gcc.godbolt.org/).
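An equivalent C function of the kind suggested above might look like the following sketch. The function name read_line_length and the choice to return the line length are assumptions made for illustration; only the 10-byte local array and the fgets call mirror the assembly.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): reads one line from
 * `in` into a local array and returns how many characters were read,
 * not counting the newline. */
size_t read_line_length(FILE *in)
{
    char buffer[10];   /* automatic storage: the compiler reserves it on the stack */

    if (fgets(buffer, sizeof buffer, in) == NULL)
        return 0;      /* EOF or read error before any character was stored */

    buffer[strcspn(buffer, "\n")] = '\0';   /* strip the newline, if present */
    return strlen(buffer);
}
```

Compiling something like this with gcc -m32 -O1 -S should show the array being carved out of the stack frame with a sub on esp, rather than living in .bss.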
I'm pretty new to assembly, and am trying my best to learn it. I'm taking a course to learn it, and they showed a very basic Hello World example, which I decompiled.
original c file:
#include <stdio.h>
int main()
{
printf("Hello Students!");
return 0;
}
This was decompiled using the following command:
C:> objdump -d -Mintel HelloStudents.exe > disasm.txt
decompilation (assembly):
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret
I'm having trouble mapping this decompilation output to the original C file. Can someone help?
Thank you very much!
The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure is heavily variable.
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
This is all preamble, setting up the stack frame for the main function. It aligns the stack pointer (ESP) to a 16-byte boundary, then reserves another 16 bytes of space for outgoing function args.
call 401e80 <___main>
This function call to ___main is how MinGW arranges for libc initialization functions to run at the start of the program, making sure stdio buffers are allocated and stuff like that.
That's the end of the preamble; the part of the function that implements the C statements in your source starts with:
mov DWORD PTR [esp], 0x404000
This writes the address of the string literal "Hello Students!" onto the stack. Combined with the earlier sub esp, 16, this is like a push instruction. In this 32-bit calling convention, function args are passed on the stack, not registers, so that's where the compiler has to put them before function calls.
call 4025f8 <_puts>
This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call.
mov eax, 0x0
The return value of main (0) is loaded into the eax register.
leave
ret
Restore the previous EBP value, and tear down the stack frame, then exit the function. ret pops a return address off the stack, which can only work when ESP is pointing at the return address.
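The printf-to-puts substitution described above can be pictured with the C sketch below. The helper names are hypothetical. The sketch also assumes the literal ends in a newline, because that is the case where I know GCC makes this substitution (puts appends a newline itself, so the stored literal loses its own).

```c
#include <stdio.h>

/* Both helpers are hypothetical names used only for this comparison. */

/* What the source code asks for: printf with a constant string and no
 * conversion specifiers. Returns the number of characters written. */
int greet_with_printf(void)
{
    return printf("Hello Students!\n");
}

/* What the compiler may emit instead: puts adds the newline itself, so
 * the literal is stored without it. Returns a non-negative value on
 * success. */
int greet_with_puts(void)
{
    return puts("Hello Students!");
}
```

Both functions print the same 16 bytes; only the library call differs, which is why the disassembly shows _puts even though the source says printf.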
I use the fgets() function in assembly and it should work, but I have a problem with my buffer. Is there a way to define a char pointer? I ask because the function needs a char pointer as the first argument.
Here you can see my code:
; nasm fgets.asm -f elf64 -o fgets.o
; gcc -no-pie fgets.o
; ./a.out
; Define fgets as an external function
extern fgets
SECTION .DATA
buffer: db "0000000000", 0
SECTION .TEXT
global main
main:
push rbp ; Push stack
; Set up parameters and call the C function
mov rdi, buffer
mov rsi,10
mov rdx, 1
mov rax,0
call fgets
pop rbp ; Pop stack
mov rax,0 ; Exit code 0
ret ; Return
I want to read something from the stdin which is a maximum of 10 characters long.
A FILE * is not a file descriptor. Instead of passing 1 like you do, pass [stdin] (this works because stdin is a global pointer variable in glibc, and the symbol stdin in NASM is the address of that variable):
mov rdx, [stdin]
If you use GAS, this will work:
mov stdin, %rdx
However, you should probably be using RIP-relative addressing; this allows your executable to be relocated and is required for position-independent executables (PIEs), which are the default now. In NASM, simply put this at the top of the file:
default rel
In GAS, it's a bit more complicated. You have to add (%rip) to all the external symbols you use, like this:
mov stdin(%rip), %rdx
This loads the memory located at stdin (which is the 8-byte FILE * pointer you're looking for) into rdx.
fgets takes a FILE * pointer arg, not an integer file descriptor, as the 3rd arg.
Also, 1 is the stdout file descriptor, not stdin. But anyway, when fgets dereferences 1 as a pointer it segfaults. You could have seen this using a debugger to find the instruction that faulted.
In C you'd call fgets(buf, len, stdin). The global variable stdin is a pointer of type FILE *stdin. This pointer value (to an opaque FILE struct) is itself stored in memory at the symbol address stdin. (Glibc's startup code initializes this pointer to point at a FILE struct it allocates).
Therefore you want to load a qword pointer from static storage as the 3rd arg for fgets. You could have seen this yourself by looking at compiler output for the C function.
default rel ; Use RIP-relative addressing by default
extern stdin
extern fgets
...
main:
push rax ; dummy push to realign the stack by 16
...
lea rdi, [buffer] ; RIP-relative LEA, or mov edi, buffer in a non-PIE
mov esi, buffer.len
mov rdx, [stdin]
call fgets
...
pop rcx ; dummy pop to readjust RSP
ret
section .bss
buffer: resb 11 ; reserve 11 bytes, zero-filled
.len equ $ - buffer
Note that you should not use lea rdx, [stdin] or mov edx, stdin here; those would put the address of the stdin variable itself in a register. fgets takes the pointer arg by value, not as a pointer to a pointer, and it's not FILE stdin, it's FILE *stdin.
I got the assembler to calculate the length of buffer for me. I filled it with zero bytes instead of ASCII '0' characters.
fgets() reads at most len-1 bytes, and writes a terminating \0 after those, so you need to pass 11 (the size of the full buffer) to read up to 10 bytes.
fgets isn't variadic so you don't need to zero AL before calling it.
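The len-1 rule is easy to check from C. The sketch below uses a hypothetical helper, read_up_to_ten, with an 11-byte buffer like the one reserved in .bss above; the stream is passed in as a parameter so the behavior can be exercised without typing on stdin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads from `in` into an 11-byte buffer, as the
 * assembly above does, and returns how many characters fgets stored. */
size_t read_up_to_ten(FILE *in)
{
    char buffer[11];   /* 10 characters plus the terminating '\0' */

    if (fgets(buffer, sizeof buffer, in) == NULL)
        return 0;      /* EOF or read error before any character was stored */

    return strlen(buffer);   /* never more than sizeof buffer - 1, i.e. 10 */
}
```

Given a 15-character line, the first call stores 10 characters and leaves the rest in the stream for the next call.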
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
int main(int argc, char *argv[]){
char a[5];
char b[10];
strcpy(a,"nop");
gets(b);
printf("Hello there %s. Value in a is %s.\n",b,a);
exit(0);
}
The first few lines of assembly output show:
push %ebp
mov %esp,%ebp
sub $0x28,%esp
mov $0x80c5b08,%edx
lea -0xd(%ebp),%eax
mov (%edx),%edx
mov %edx,(%eax)
lea -0x17(%ebp),%eax
mov %eax,(%esp)
call 0x8049c60 <gets>
I'm confused for a few reasons. First, why do we do sub $0x28,%esp which accounts for 40 bytes if char *argv[] accounts for 8 bytes, int argc accounts for 4, a accounts for 8, and b accounts for 12 -> 8+4+8+12 = 32?
I'm also struggling to see where strcpy happens and what accounts for the two memory addresses $0x80c5b08 and 0x8049c60.
All right, I'll give you what I can, but my assembly is a bit rusty. First, let's start with what you are looking at. This is AT&T syntax, which puts the source operand first and the destination last (operation source, destination), the reverse of Intel syntax (operation destination, source). That is part of the reason some people prefer to read Intel.
The instructions are not too difficult to digest from an overview standpoint. The first pushes the caller's base pointer onto the stack to save it (when this function returns, that value will be restored so the calling routine can pick back up with its own frame). The second copies the current stack pointer (top of the stack) into the base pointer, establishing the frame for this function. Together, these two lines are known as the function prologue.
push %ebp
mov %esp,%ebp
The next line subtracts 40 bytes (28 hex) from the stack pointer (the stack grows toward lower addresses) to create space for the local variables a and b, where the "nop" data and the results of gets will be copied. I'm not sure what precise alignment it is trying to achieve, but the storage for a will be 5 bytes and b 10.
sub $0x28,%esp
The following line moves the pointer address 0x80c5b08 to the general purpose dx register (edx for 80386 32-bit registers). In assembly, you put the address of the data you want to manipulate into one of the CPU registers, before you do something with it. Here it looks to be putting the memory address for "nop" in edx.
mov $0x80c5b08,%edx
The next instruction, lea (load effective address), computes the address base pointer - 13 (0xd hex) and puts it in the eax register. This is the starting address of a, where the string "nop" will be copied.
lea -0xd(%ebp),%eax
The following two mov instructions copy the 4 bytes pointed to by edx to the memory location specified in eax, copying "nop" (and its terminating null byte) to a.
mov (%edx),%edx
mov %edx,(%eax)
The next lea loads the address base pointer - 23 (0x17 hex), which is b, into eax, and the mov places that address on the stack before the call to gets fills the memory at that location.
lea -0x17(%ebp),%eax
mov %eax,(%esp)
call 0x8049c60 <gets>
Afterwards, there will be instructions to load the memory addresses for a and b, along with the address of the format string, before calling printf. Hopefully this will help.
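The offsets in the walkthrough above can be turned into a small piece of arithmetic. This sketch only restates numbers visible in the disassembly; the function names are hypothetical, and the interpretation of the leftover bytes as padding or outgoing argument slots is an assumption, not something the listing proves.

```c
/* Offsets copied from the disassembly above. Everything derived from
 * them below is plain arithmetic. */
enum {
    FRAME_BYTES = 0x28,   /* sub $0x28,%esp  -> 40 bytes reserved   */
    A_OFFSET    = 0x0d,   /* lea -0xd(%ebp)  -> a starts at ebp-13  */
    B_OFFSET    = 0x17,   /* lea -0x17(%ebp) -> b starts at ebp-23  */
    A_SIZE      = 5,      /* char a[5]  */
    B_SIZE      = 10      /* char b[10] */
};

/* Bytes from the start of b to the start of a. */
int gap_between_b_and_a(void) { return B_OFFSET - A_OFFSET; }

/* Bytes between the end of a and the saved ebp. */
int bytes_above_a(void) { return A_OFFSET - A_SIZE; }

/* Bytes between esp and the start of b (this region holds the
 * outgoing argument slots such as the one written for gets). */
int bytes_below_b(void) { return FRAME_BYTES - B_OFFSET; }
```

The gap between b and a is exactly 10 bytes, the size of b, so b ends where a begins. The remaining 8 bytes above a and 17 bytes below b account for the rest of the 40-byte frame.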
There may be some padding after the local variables, as there needs to be (32-bit aligned) room for the parameter for gets() and for the return address that the call instruction pushes.
Note: after the prologue, the ebp register points at the saved ebp at the base of the frame, and esp points at the lowest in-use stack address, below the locals.
Note: the gets() function should never be used, for several reasons. Use fgets() instead.
The compiler replaced the strcpy() call with inline code (a built-in expansion), which produced the following:
mov $0x80c5b08,%edx
lea -0xd(%ebp),%eax
mov (%edx),%edx
mov %edx,(%eax)
"I'm also struggling to see where strcpy() happens and what accounts for the two memory addresses $0x80c5b08 and 0x8049c60"
The 0x80c5b08 is the address of the literal that is to be copied into the variable.
The 0x8049c60 is the linked address of the gets() function.
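To see why a single 32-bit load and store is enough to replace the strcpy call, note that "nop" plus its terminator is exactly 4 bytes. The sketch below expresses the same copy in C; copy_nop is a hypothetical name, and memcpy stands in for the two mov instructions.

```c
#include <string.h>

/* Hypothetical helper showing what the inlined strcpy amounts to:
 * "nop" plus its terminating '\0' is exactly 4 bytes, so one 32-bit
 * load followed by one 32-bit store can copy all of it. */
void copy_nop(char dest[5])
{
    memcpy(dest, "nop", sizeof "nop");   /* sizeof "nop" == 4 */
}
```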
I made a simple program which should just push a number and display it on the screen, but I don't know what is going wrong.
section .data
value db 10
section .text
global main
extern printf
main:
push 10 ; can we push value directly on stack?
call printf
add esp,4
ret
I get a segmentation fault with the above.
section .data
value db 10
section .text
global main
extern printf
main:
push [value]
call printf
add esp,4
ret
The second version should push the value stored at the value variable onto the stack, but I get "operation size not specified".
Yes, you can push any DWORD value (in 32-bit assembler) onto the stack.
The problem in the first code fragment is that printf expects the first argument to be a format string (in C, you'd write printf("%d\n", 10);). So something like
section .data
fmt db "%d", 10, 0
...
push 10
push fmt
call printf
add esp, 8
will work.
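For reference, the C-level effect of that format string can be checked with snprintf, which formats the same way printf does but writes into a buffer. The helper name format_value is hypothetical and exists only for this sketch.

```c
#include <stdio.h>

/* Hypothetical helper: formats `value` the way printf("%d\n", value)
 * would print it, but into `out` so the result can be inspected.
 * Returns the number of characters the full output needs. */
int format_value(char *out, size_t out_size, int value)
{
    return snprintf(out, out_size, "%d\n", value);
}
```

With value 10 this produces the three characters '1', '0' and a newline, matching the bytes "%d", 10, 0 in the fmt definition: the 10 after the string is the newline character and the 0 is the terminator.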
In the second code fragment, instead of push [value] you should write push dword [value], but that's not correct if your value variable is a single byte. Either declare it as a DWORD (dd), or perform
movsx eax, byte [value] ; if it's a signed integer; movzx for unsigned
push eax
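The difference between the two widening instructions shows up only for bytes with the top bit set. A C sketch of both, with hypothetical helper names:

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the two widening instructions. */

/* Like movsx: sign-extend, so a byte holding -10 (0xF6) stays -10. */
int32_t widen_signed(int8_t byte)
{
    return byte;
}

/* Like movzx: zero-extend, so a byte holding 0xF6 becomes 246. */
uint32_t widen_unsigned(uint8_t byte)
{
    return byte;
}
```

For the value 10 used in the question, both give 10, so either instruction would work there.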
And one more thing. When calling printf (or any of the C library functions), beware of stack alignment. Some platforms require that the stack is 16-byte aligned at the time of a function call (this is necessary for correct execution of optimized CPU instructions like SSE). So, to make the stack aligned:
push ebp
mov ebp, esp
sub esp, 8 ; reserve 8 bytes for parameters
and esp, -16 ; align the stack (the reserved space can increase)
mov dword [esp], fmt ; put parameters into stack
mov dword [esp+4], 10
call printf
mov esp, ebp ; restore stack
pop ebp
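The and esp, -16 step is a round-down to a multiple of 16, because -16 is 0xFFFFFFF0 in 32 bits and the mask clears the low four bits. A C sketch of the same operation, with a hypothetical helper name and a plain integer standing in for the stack pointer:

```c
#include <stdint.h>

/* Hypothetical helper: rounds a stack-pointer value down to a multiple
 * of 16, which is what `and esp, -16` does. */
uint32_t align_down_16(uint32_t sp)
{
    return sp & ~UINT32_C(15);   /* mask is 0xFFFFFFF0 */
}
```

Rounding down is the safe direction here: the stack grows toward lower addresses, so lowering esp only enlarges the reserved space.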
I just noticed some strange assembly language code for an empty main function.
//filename: main.c
void main()
{
}
disassembly:
push ebp
mov ebp,esp
sub esp,0C0h; why on the earth is it reserving 192 bytes?
push ebx
push esi
push edi ; good compiler. Its saving ebx, esi & edi values.
lea edi,[ebp-0C0h] ; line 1
mov ecx,30h ; line 2
mov eax,0CCCCCCCCh ; line 3
rep stos dword ptr es:[edi] ; line 4
xor eax,eax ; returning value 0. Code following this line is explanatory.
pop edi ; restoring the original states of edi,esi & ebx
pop esi
pop ebx
mov esp,ebp
pop ebp
ret
Why on earth is it reserving 192 bytes for a function where there aren't any variables?
What's up with the four lines marked line 1, line 2, line 3 and line 4? What are they trying to do, and why?
Greg already explained how the compiler generates code to diagnose uninitialized local variables, enabled by the /RTCu compile option. The 0xcccccccc value was chosen to be distinctive and easily recognized in the debugger, to ensure the program bombs when an uninitialized pointer is dereferenced, and to ensure it terminates the program when it is executed as code. 0xcc is pretty ideal for all of these jobs: it is the instruction opcode for INT3.
The mysterious 192 bytes that are allocated in the stack frame are there to support the Edit + Continue feature, /ZI compile option. It allows you to edit the code while a breakpoint is active and add local variables to a function; those 192 bytes are available to provide the space for those added locals. Exceeding that space will make the IDE force you to rebuild your program.
Btw: this can cause a problem if you use recursion in your code. The debug build will bomb with a stack overflow (this site's name) a lot quicker. That is not normally much of an issue, since you debug with practical dataset sizes.
The four code lines you've indicated are the debug build clearing out the local variable space with the "clear" special value (0xCCCCCCCC).
I'm not sure why there are 192 bytes of seemingly dead space, but that might be VC++ building some guard space into your local variable area to try to detect stack smashing.
You will probably get a very different output if you switch from Debug to Release build.
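The four marked instructions amount to a simple fill loop. The sketch below restates them in C using the constants from the disassembly; fill_frame is a hypothetical name, and an ordinary array stands in for the stack frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the disassembly above. */
enum { FILL_DWORDS = 0x30 };                        /* mov ecx,30h        */
static const uint32_t FILL_PATTERN = 0xCCCCCCCCu;   /* mov eax,0CCCCCCCCh */

/* Hypothetical helper: the C equivalent of `rep stos dword ptr es:[edi]`,
 * storing the pattern into `count` consecutive 32-bit slots starting
 * at `frame` (the address loaded by lea edi,[ebp-0C0h]). */
void fill_frame(uint32_t *frame, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        frame[i] = FILL_PATTERN;
}
```

0x30 dwords of 4 bytes each is 192 bytes, so the loop covers exactly the 0C0h bytes reserved by sub esp,0C0h.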