I use the fgets() function in assembly and it should work, but I have a problem with my buffer. Is there a way to define a char pointer? I ask because the function needs a char pointer as the first argument.
Here you can see my code:
; nasm fgets.asm -f elf64 -o fgets.o
; gcc -no-pie fgets.o
; ./a.out
; Define fgets as an external function
extern fgets
SECTION .DATA
buffer: db "0000000000", 0
SECTION .TEXT
global main
main:
push rbp ; Push stack
; Set up parameters and call the C function
mov rdi, buffer
mov rsi,10
mov rdx, 1
mov rax,0
call fgets
pop rbp ; Pop stack
mov rax,0 ; Exit code 0
ret ; Return
I want to read at most 10 characters from stdin.
A FILE * is not a file descriptor. Instead of passing 1 like you do, pass [stdin]. This works because stdin is a global pointer variable in glibc, and the NASM symbol stdin (declared with extern stdin) is the address of that variable, so [stdin] loads the FILE * stored there:
mov rdx, [stdin]
If you use GAS, this will work:
mov stdin, %rdx
However, you should probably be using RIP-relative addressing. It allows your executable to be relocated, and it is required for position-independent executables (PIEs), which many Linux distributions now build by default. In NASM, simply put this at the top of the file:
default rel
In GAS, it's a bit more complicated: you have to add (%rip) to every symbol reference in a memory operand, like this:
mov stdin(%rip), %rdx
This loads the memory located at stdin (which is the 8-byte FILE * pointer you're looking for) into rdx.
fgets takes a FILE * pointer, not an integer file descriptor, as its 3rd argument.
Also, 1 is the stdout file descriptor, not stdin (which is 0). In any case, when fgets dereferences 1 as a pointer, it segfaults. You could have seen this by using a debugger to find the instruction that faulted.
In C you'd call fgets(buf, len, stdin). The global variable stdin is declared as FILE *stdin, a pointer to an opaque FILE struct, and that pointer value is itself stored in memory at the symbol address stdin. (In glibc, it points at a FILE struct that the library itself provides, _IO_2_1_stdin_.)
Therefore you want to load a qword pointer from static storage as the 3rd arg for fgets. You could have seen this yourself by looking at compiler output for the C function.
default rel ; Use RIP-relative addressing by default
extern stdin
extern fgets
...
main:
push rax ; dummy push to realign the stack by 16
...
lea rdi, [buffer] ; RIP-relative LEA, or mov edi, buffer in a non-PIE
mov esi, buffer.len
mov rdx, [stdin]
call fgets
...
pop rcx ; dummy pop to readjust RSP
ret
section .bss
buffer: resb 11 ; reserve 11 bytes, zero-filled
.len equ $ - buffer
Note that it's not lea rdx, [stdin] or mov edx, stdin. Those would put the address of the stdin variable itself in a register. fgets takes the FILE * by value, not as a pointer to a pointer, and stdin is declared as FILE *stdin, not FILE stdin.
I let the assembler calculate the length of the buffer for me. I also reserved the buffer in .bss, where it is zero-filled, instead of filling it with ASCII '0' characters.
fgets() reads at most len-1 bytes, and writes a terminating \0 after those, so you need to pass 11 (the size of the full buffer) to read up to 10 bytes.
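For comparison, here is a minimal C sketch of the same fgets call with an 11-byte buffer. It shows that fgets stores at most 10 bytes plus the terminator. The function name read_up_to_10 is hypothetical. It takes a FILE * parameter, where you would pass stdin, only so that it can be exercised with any stream.

```c
#include <stdio.h>
#include <string.h>

/* Same call as the assembly: fgets(buffer, 11, stream).
   Returns the number of bytes stored before the terminating '\0',
   or 0 if fgets hit EOF or an error before reading anything. */
size_t read_up_to_10(FILE *fp, char buf[11])
{
    /* Pass the full buffer size (11): fgets stores at most 10 bytes
       and then writes the terminating '\0' into the 11th byte. */
    if (fgets(buf, 11, fp) == NULL)
        return 0;
    return strlen(buf);
}
```

In a real program you would call it as read_up_to_10(stdin, buffer).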
fgets isn't variadic so you don't need to zero AL before calling it.
Related
I'm pretty new to assembly and am trying my best to learn it. I'm taking a course, and they showed a very basic Hello World example, which I decompiled.
Original C file:
#include <stdio.h>
int main()
{
printf("Hello Students!");
return 0;
}
This was decompiled using the following command:
C:> objdump -d -Mintel HelloStudents.exe > disasm.txt
Decompilation (assembly):
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret
I'm having trouble mapping this decompiled output to the original C file. Can someone help?
Thank you very much!
The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure is heavily variable.
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
This is all preamble, setting up the stack frame for the main function. It rounds the stack pointer (ESP) down to a multiple of 16, then reserves another 16 bytes of space for outgoing function args.
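The arithmetic of those two instructions can be sketched in C. The function names and the sample ESP values are hypothetical and are chosen only to illustrate the rounding.

```c
#include <stdint.h>

/* "and esp, 0xfffffff0": clear the low 4 bits of the stack pointer,
   rounding it DOWN to a multiple of 16. Rounding down is safe because
   the x86 stack grows toward lower addresses, so this only adds unused
   padding below the old ESP. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}

/* "sub esp, 0x10": reserve 16 more bytes for outgoing args.
   Subtracting a multiple of 16 keeps the 16-byte alignment. */
uint32_t reserve_16(uint32_t esp)
{
    return esp - 0x10u;
}
```

For example, a hypothetical incoming ESP of 0x0061FF2C becomes 0x0061FF20 after the and, and 0x0061FF10 after the sub.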
call 401e80 <__main>
This call to __main is how MinGW arranges for libc initialization functions to run at the start of the program, making sure stdio buffers are allocated and so on.
That's the end of the preamble. The part of the function that implements the C statements in your source starts with:
mov DWORD PTR [esp], 0x404000
This writes the address of the string literal "Hello Students!" onto the stack. Combined with the earlier sub esp, 16, this is like a push instruction. In this 32-bit calling convention, function args are passed on the stack, not in registers, so that's where the compiler has to put them before function calls.
call 4025f8 <_puts>
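This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call. GCC only makes this substitution when the format string ends in a newline, because puts appends one itself; the compiler then strips the \n from the literal. So the source that produced this output most likely had printf("Hello Students!\n").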
mov eax, 0x0
The return value of main (0) is loaded into the eax register, where this calling convention returns integer values.
leave
ret
Tear down the stack frame and restore the caller's EBP (leave is equivalent to mov esp, ebp followed by pop ebp), then return from the function. ret pops a return address off the stack, which only works when ESP is pointing at the return address.
I tried to read input into a buffer with fgets. I pushed the 3 parameters, but got a segmentation fault. I tried to find the problem with GDB, but I didn't understand the message I got there.
This is the code:
section .rodata
buffer: db 10
section .text
align 16
global main
extern fgets
extern stdin
main:
push ebp
mov ebp, esp
pushad
push dword[stdin]
push 10;
push buffer;
call fgets;
add esp, 12
popad
mov esp, ebp
pop ebp
ret
And this is the message that I got:
Program received signal SIGSEGV, Segmentation fault.
__GI__IO_getline_info (fp=fp@entry=0xf7fb1c20 <_IO_2_1_stdin_>,
buf=buf@entry=0x80484f0 "\n", n=8, n@entry=9, delim=delim@entry=10,
extract_delim=extract_delim@entry=1, eof=eof@entry=0x0) at iogetline.c:86
86 iogetline.c: No such file or directory.
What is wrong with my code?
It segfaults because you ask fgets to write to an address in the .rodata section, which is, of course, read-only.
Put your buffer in the .bss section, and use resb 10 to reserve 10 bytes. Your current version is one byte, initialized to { 10 }. You don't want to store a bunch of zeros in your executable for no reason; that's what the bss is for.
section .bss
buffer: resb 10
buffer_length equ $ - buffer
section .text
align 16
global main
extern fgets
extern stdin
main:
push dword [stdin]
push buffer_length
push buffer ; 3 pushes get the stack back to 16B alignment
call fgets
add esp, 12
xor eax, eax ; return 0 from main
ret
You don't need pushad or a stack frame (the stuff with ebp) in this function. Normally you only save/restore the call-preserved registers you want to use, not all of them every time.
As Michael Petch points out, it would also be better to reserve space on the stack for the buffer instead of using static storage. Have a look at compiler output for an equivalent C function that uses a local array (e.g. on http://gcc.godbolt.org/).
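As a starting point for that comparison, here is a minimal C sketch with the 10-byte buffer as a local array. The function name is hypothetical. It takes a FILE * (you would pass stdin) so that it can be tested with any stream. Compiling something like it with -m32 on Compiler Explorer should show the buffer being reserved by adjusting ESP rather than being placed in .bss.

```c
#include <stdio.h>
#include <string.h>

/* Same fgets call as the corrected NASM code, but the 10-byte buffer
   has automatic storage (on the stack) instead of living in .bss.
   fgets stores at most sizeof buffer - 1 = 9 bytes plus the '\0'.
   Returns how many bytes were read, or 0 on EOF/error. */
size_t read_with_local_buffer(FILE *fp)
{
    char buffer[10];   /* the compiler reserves this by adjusting ESP */
    if (fgets(buffer, sizeof buffer, fp) == NULL)
        return 0;
    return strlen(buffer);
}
```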
Take this code-golf example, which reads stdin and prints it reversed on stdout:
main(_){write(read(0,&_,1)&&main());}
Here write() prints the intended output even though it seems to make do with only 1 parameter passed to it. For comparison, here's the actual write() prototype which clearly specifies 3 parameters:
ssize_t write(int fildes, const void *buf, size_t nbyte);
Moreover, in the example above, the only parameter passed to write() is (as far as I can tell) not even the 1st one, but the 2nd, which corresponds to the buffer pointer. So how are the file descriptor and size values set correctly here? UPDATE: the only argument explicitly passed is argument 1. See my answer for more information.
Can anyone give a precise explanation for this hack?
It's taking advantage of ANSI C89's implicit declaration of functions. (Search implicit declaration here http://flash-gordon.me.uk/ansi.c.txt)
This code golfer must have a system with a syscall called write that defaults to stdout.
This program doesn't work on my system.
Update:
The program works for me if I compile it as a 32-bit application, which gave me a hint as to what's going on.
read writes into the location of the first parameter of main (_), which is normally argc. If write is the real three-argument write, then the address of that parameter in the current call to main must be passed as the second argument to write, and the value 1 must be passed as the third argument. That 1 is also the value of argc, because no arguments are passed to the program.
It has something to do with abusing leftover values on the call stack as the arguments to write.
Here's the x86 asm, if someone wants to step through and explain this one. I'm currently working on it, but haven't figured it out completely yet.
main:
push ebp
mov ebp, esp
and esp, -16
sub esp, 16
mov DWORD PTR [esp+12], eax
mov DWORD PTR [esp+8], 1
lea eax, [esp+12]
mov DWORD PTR [esp+4], eax
mov DWORD PTR [esp], 0
call read
test eax, eax
je .L2
call main
test eax, eax
je .L2
mov eax, 1
jmp .L3
.L2:
mov eax, 0
.L3:
mov DWORD PTR [esp], eax
call write
leave
ret
I think I've figured this out now. As OregonTrail crucially discovered, the code example in question only works on 32-bit systems. This led me to suspect that the code relies on the commonly used 32-bit calling convention called CDECL. To quote this wikibook:
In the CDECL calling convention the following holds:
Arguments are passed on the stack in Right-to-Left order, and return values are passed in eax.
The calling function cleans the stack. This allows CDECL functions to have variable-length argument lists (aka variadic functions). For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used.
In chronological order:
1 is pushed to the stack as the 3rd parameter (length) to read().
the pointer &_ to a char buffer is pushed to the stack as the 2nd parameter to read().
0 is pushed to the stack as the 1st parameter (the file descriptor of stdin) to read().
read() is called, returns, and now write() is about to be called.
The assembly was generated on the premise, given by the source code, that write() takes only 1 parameter, so only one item is "cleaned"/popped off the stack. (In the GCC output above, arguments are actually stored into pre-reserved stack slots with mov rather than pushed and popped. The call to write only overwrites [esp], leaving &_ at [esp+4] and 1 at [esp+8] from the read() call.)
1 is pushed to the stack as the 1st parameter (the file descriptor of stdout) to write(), because 1 is the result of the read(0,&_,1)&&main() expression.
write() expects 3 parameters, so it reads the top 3 items on the stack.
So why does write() write correctly? Because the 2nd and 3rd items on the stack have not been touched since they were put there for read(). In this case, the 2nd and 3rd parameters of read() and write() need to hold exactly the same values anyway: a pointer to the char buffer and a length of 1.
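For contrast, here is a sketch of a well-defined C version of what the golfed program does. It passes every argument explicitly instead of relying on leftover stack slots. It swaps in stdio getc/putc and FILE * parameters instead of the original's read/write. That choice is mine, made so the code is portable and easy to test. reverse_stream is a hypothetical name.

```c
#include <stdio.h>

/* Read one character, recurse to handle the rest of the input first,
   then write the character, so the output comes out reversed.
   Each recursion level keeps its own character in a local variable,
   just as each call to the golfed main() keeps its byte in _. */
void reverse_stream(FILE *in, FILE *out)
{
    int c = getc(in);
    if (c == EOF)
        return;                 /* end of input: start unwinding */
    reverse_stream(in, out);    /* emit everything after c first */
    putc(c, out);               /* then emit c itself */
}
```

In a real program you would call it as reverse_stream(stdin, stdout).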
I made a simple program that should just push a number and display it on the screen, but I don't know what is going wrong.
section .data
value db 10
section .text
global main
extern printf
main:
push 10 ; can we push a value directly onto the stack?
call printf
add esp,4
ret
I get a segmentation fault with the code above.
section .data
value db 10
section .text
global main
extern printf
main:
push [value]
call printf
add esp,4
ret
In the second version, I'm pushing the value pointed to by the value variable onto the stack, but I get "operation size not specified".
Yes, you can push any DWORD value (in 32-bit assembler) onto the stack.
The problem in the first code fragment is that printf expects the first argument to be a format string (in C, you'd write printf("%d\n", 10);). So something like
section .data
fmt db "%d", 10, 0
...
push 10
push fmt
call printf
add esp, 8
will work.
In the second code fragment, instead of push [value] you should write push dword [value], but that's not correct if your value variable is a single byte. Either declare it as a DWORD (dd), or perform
movsx eax, byte [value] ; if it's a signed integer; movzx for unsigned
push eax
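In C terms, the choice between movsx and movzx is the choice between widening a signed char and an unsigned char to int. Here is a minimal sketch. The helper names are hypothetical.

```c
/* movsx eax, byte [value]: widen a signed byte to int,
   copying its sign bit into the upper bits (sign extension). */
int widen_signed(signed char b)
{
    return b;
}

/* movzx eax, byte [value]: widen an unsigned byte to int,
   filling the upper bits with zeros (zero extension). */
int widen_unsigned(unsigned char b)
{
    return b;
}
```

The same byte pattern 0xF6 reads as -10 with sign extension and as 246 with zero extension. That is why it matters which instruction you pick before pushing the DWORD.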
And one more thing: when calling printf (or any other C library function), beware of stack alignment. Some platforms require the stack to be 16-byte aligned at the time of a function call. This is necessary for correct execution of optimized code that uses aligned SSE loads and stores. So, to align the stack:
push ebp
mov ebp, esp
sub esp, 8 ; reserve 8 bytes for parameters
and esp, -16 ; align the stack (the reserved space can increase)
mov dword [esp], fmt ; put parameters into stack
mov dword [esp+4], 10
call printf
mov esp, ebp ; restore stack
pop ebp
I've recently read this article on using printf and scanf in assembly:
Meaning of intfmt: db "%d", 10, 0 in assembly
In particular it says
"In printf, the newline prints a newline and then (if the output is in line buffered mode, which it probably is), flushes the internal output buffer so you can actually see the result. So when you remove the 10, there's no flush and you don't see the output."
However I do not know what to do if I do not want a newline after my output in my assembly file.
Here's a simple test file I've written to try printing without a newline:
extern printf
LINUX equ 80H ; interrupt number for entering Linux kernel
EXIT equ 60 ; Linux x86-64 system call 60, i.e. exit()
section .data
int_output_format: db "%ld", 0
segment .text
global main
main:
mov r8, 10
push rdi
push rsi
push r10
push r9
mov rsi, r8
mov rdi, int_output_format
xor rax, rax
call printf
pop r9
pop r10
pop rsi
pop rdi
call os_return ; return to operating system
os_return:
mov rax, EXIT ; Linux x86-64 system call 60, i.e. exit()
mov rdi, 0 ; Error code 0 i.e. no errors
syscall ; enter the Linux kernel (64-bit syscall instruction)
But, as the article suggests, stdout isn't being flushed. I was thinking perhaps I need to somehow flush after I output the number? But I'm really not sure.
I am using the NASM assembly language.
Thanks in advance!
fflush() flushes buffered output in line or full-buffered stdio streams:
extern fflush
...
xor edi, edi ; RDI = 0
call fflush ; fflush(NULL) flushes all streams
...
Alternatively, mov rdi, [stdout] / call fflush also works to flush only that stream. (Use default rel for efficient RIP-relative addressing, and you'll need extern stdout as well.)
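The C equivalent of printing without a newline and then flushing explicitly might look like this minimal sketch. print_without_newline is a hypothetical helper that takes the stream as a parameter (you would pass stdout).

```c
#include <stdio.h>

/* Print a number with no trailing newline, then flush explicitly.
   fflush(NULL) would flush every output stream (the "xor edi, edi"
   version); fflush(out) flushes only this one ("mov rdi, [stdout]").
   Returns 0 on success, EOF or -1 on failure. */
int print_without_newline(FILE *out, long value)
{
    if (fprintf(out, "%ld", value) < 0)
        return -1;
    return fflush(out);
}
```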
Call fflush(stdout); to display what's currently sitting in the buffers.
For Windows 32-bit mode (FASM):
push [_iob]
call [fflush] ; call into DLL. msvcrt's fflush is cdecl (caller-pops)
add esp, 4 ; so the caller removes the argument
GNU/Linux 32-bit mode (NASM)
extern fflush
extern stdout
...
push dword [stdout]
call fflush ; linker takes care of PLT stub for dynamic linking (in a non-PIE executable)
add esp, 4 ; caller-pops calling convention
etc...
The other possibility is to disable the default line buffering of the stdout stream. Here is the C call to do that. Translation to assembly is left as an exercise, as I don't think it even makes sense to do file/stream I/O in asm; the cost/benefit is tremendously wrong.
setvbuf(stdout, NULL, _IONBF, 0);
This way, every printf (and fputs, putc, puts, etc.) would behave as if followed by an implicit fflush.
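Here is a minimal sketch of that setvbuf call wrapped in a hypothetical helper. The C standard only allows setvbuf before any other operation has been performed on the stream, so in a real program it would be the first thing main does with stdout.

```c
#include <stdio.h>

/* Disable buffering on a stream so each output call is handed to the
   OS immediately. Must be called after the stream is opened and
   before any other I/O on it. Returns 0 on success. */
int make_unbuffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IONBF, 0);
}
```

Usage: call make_unbuffered(stdout) at the start of main, before the first printf.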
My answer is for those searching for a quick workaround to the problem, not an actual fix.
I was attempting to output the number 1234 one digit at a time and ran into the same issue as the others here. After having no success with the mentioned solutions, and not wanting to spend more than a few minutes on this, I found an easy way to display the number.
Among your output format strings, simply add one that is an empty line (just the newline, of course).
Digit_out: db "%u", 0
Number_end: db "", 10, 0
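Output as you normally would. In my case, I had 4 pushes of Digit_out (each with a push of the digit itself) and 4 calls to printf.
Once this is done, push Number_end and do a final call to printf. The entire number will then show :) This works because stdout to a terminal is line-buffered, so printing the newline flushes the buffered digits.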
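Here is the same workaround sketched in C, assuming a FILE * parameter (stdout in practice) and hypothetical helper names. It splits the number into digits with recursion rather than the four manual pushes described above. The sequence of calls is the same: one "%u" call per digit, then a final call whose format is just a newline.

```c
#include <stdio.h>

/* One "%u" call per digit, most significant digit first,
   like repeated printf calls with Digit_out. */
static void print_digits(FILE *out, unsigned n)
{
    if (n >= 10)
        print_digits(out, n / 10);
    fprintf(out, "%u", n % 10);
}

/* Digits first, then a format that is just a newline, like Number_end.
   On a line-buffered stdout, that newline triggers the flush. */
void print_number_then_newline(FILE *out, unsigned n)
{
    print_digits(out, n);
    fprintf(out, "\n");
}
```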