I have some assembly code that uses scanf and printf and I'm running into some problems. When both of these functions are used in the same code, the values in the registers seem to be lost. The program basically loads a number and prints it out. We run it using
nasm -f elf64 file.asm && gcc -o file file.o && ./file
on linux
Here's our code:
extern printf
extern scanf
section .data
a db "set: ", 0
b db "not set: ", 0
reading db "Please enter a number: ", 0
message db "\n", 0
printsent db "%s", 10, 0
printint db "%d", 10, 0
printchar db "%c", 10, 0
readInt db "%d", 0
input db "%d", 0
section .text
global main
main:
hatta:
push rbp,
mov rbp, rsp,
push rbx,
xor rax, rax,
mov rdi, printsent,
mov rsi, reading
call printf,
pop rbx,
xor rax, rax,
mov rdi, readInt,
call scanf,
mov rbx, rdi
push rbx,
xor rax, rax,
mov rdi, printint,
mov rsi, rbx,
call printf,
pop rbx,
pop rbp,
ret
The odd thing is that if the line mov rdi, printint, is removed, we obtain the correct values. However, if we do the same thing with printsentence, we get a segmentation fault. Could anyone tell us the reason for this?
Thanks!
There are two errors in your scanf usage, possibly based on one false assumption: You seem to mean that scanf returns the loaded number in rdi and that no further argument is needed with format "%d". In truth the number (if scanned successfully) is returned in the memory pointed to by the second argument. Thus, the following changes make the code work.
pop rbx, | ;delete
=
xor rax, rax, = xor rax, rax,
mov rdi, readInt, = mov rdi, readInt,
> mov rsi, rsp
call scanf, = call scanf,
mov rbx, rdi | pop rbx,
Concerning if the line mov rdi, printint, is removed, we obtain the correct values - I doubt that.
I don't understand why you have the C flag here, there is no C code involved, but to your question:
As far as I remember the calling convention in Linux glibc x64 for printf(format, argument) is format in rdi, argument in rsi.
If you remove mov rdi, printsent, then you are calling printf(undefined,"Please enter a number: "). You did not provide a format argument in rdi, but printf
doesn't know that and uses whatever is in rdi at that moment. Most likely an invalid memory address which thus invokes a SIGSEGV.
By default function calls in x86 are supposed to be non-destructive on the arguments, though this is not a requirement. Standard library functions generally are. They achieve this by pushing the arguments on to the stack and reload them when done.
As such when you call scanf(readInt, ...) it will restore the pointer to readInt which has the same contents as printint in rdi when it returns. As such removing the line mov rdi, printint, has no effect because rdi contains a valid pointer to the format you need.
Related
#include <stdio.h>
int add(int a, int b)
{
if (a > b)
return a * b;
}
int main(void)
{
printf("%d", add(3, 7));
return 0;
}
Output:
3
In the above code, I am calling the function inside the print. In the function, the if condition is not true, so it won't execute. Then why I am getting 3 as output? I tried changing the first parameter to some other value, but it's printing the same when the if condition is not satisfied.
What happens here is called undefined behaviour.
When (a <= b), you don't return any value (and your compiler probably told you so). But if you use the return value of the function anyway, even if the function doesn't return anything, that value is garbage. In your case it is 3, but with another compiler or with other compiler flags it could be something else.
If your compiler didn't warn you, add the corresponding compiler flags. If your compiler is gcc or clang, use the -Wall compiler flags.
Jabberwocky is right: this is undefined behavior. You should turn your compiler warnings on and listen to them.
However, I think it can still be interesting to see what the compiler was thinking. And we have a tool to do just that: Godbolt Compiler Explorer.
We can plug your C program into Godbolt and see what assembly instructions it outputs. Here's the direct Godbolt link, and here's the assembly that it produces.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
Again, to be perfectly clear, what you've done is undefined behavior. With different compiler flags or a different compiler version or even just a compiler that happens to feel like doing things differently on a particular day, you will get different behavior. What I'm studying here is the assembly output by gcc 12.2 on Godbolt with optimizations disabled, and I am not representing this as standard or well-defined behavior.
This engine is using the System V AMD64 calling convention, common on Linux machines. In System V, the first two integer or pointer arguments are passed in the rdi and rsi registers, and integer values are returned in rax. Since everything we work with here is either an int or a char*, this is good enough for us. Note that the compiler seems to have been smart enough to figure out that it only needs edi, esi, and eax, the lower half-words of each of these registers, so I'll start using edi, esi, and eax from this point on.
Our main function works fine. It does everything we'd expect. Our two function calls are here.
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
To call add, we put 3 in the edi register and 7 in the esi register and then we make the call. We get the return value back from add in eax, and we move it to esi (since it will be the second argument to printf). We put the address of the static memory containing "%d" in edi (the first argument), and then we call printf. This is all normal. main knows that add was declared to return an integer, so it has the right to assume that, after calling add, there will be something useful in eax.
Now let's look at add.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
The rbp and rsp shenanigans are standard function call fare and aren't specific to add. First, we load our two arguments onto the call stack as local variables. Now here's where the undefined behavior comes in. Remember that I said eax is the return value of our function. Whatever happens to be in eax when the function returns is the returned value.
We want to compare a and b. To do that, we need a to be in a register (lots of assembly instructions require their left-hand argument to be a register, while the right-hand can be a register, reference, immediate, or just about anything). So we load a into eax. Then we compare the value in eax to the value b on the call stack. If a > b, then the jle does nothing. We go down to the next two lines, which are the inside of your if statement. They correctly set eax and return a value.
However, if a <= b, then the jle instruction jumps to the end of the function without doing anything else to eax. Since the last thing in eax happened to be a (because we happened to use eax as our comparison register in cmp), that's what gets returned from our function.
But this really is just random. It's what the compiler happened to have put in that register previously. If I turn optimizations up (with -O3), then gcc inlines the whole function call and ends up printing out 0 rather than a. I don't know exactly what sequence of optimizations led to this conclusion, but since they started out by hinging on undefined behavior, the compiler is free to make what assumptions it chooses.
the following code tries to read the command line arguments and then scans them with sscanf() and use the result to emit utf8 text.
I'm failing to call sscanf() and getting segfault error at the line where I call this function
I have already debugged this and I know where is the problem but not how to solve it.
global main
extern puts
extern sscanf
extern printf
extern emit_utf_8
section .text
main:
cmp rdi, 2
jl argumentsError
add rsi, 8 ; skip the name of the program
forloop:
mov r12, rsi
push rdi
push rsi
push r12
sub rsp, 8 ; must align stack before call
; start of for bloc
xor rax, rax
mov rdi, qword [r12]
mov rsi, codePointFormat
mov rdx, qword [codePoint]
call sscanf
cmp rax, 1
je ifthen
jmp else
ifthen:
mov rdi, codePoint
call emit_utf_8
jmp endif
else:
mov rdi, incorrectFormat
mov rsi, r12
call printf
endif:
; end of for bloc
add rsp, 8 ; restore %rsp to pre-aligned value
pop r12
pop rsi
pop rdi
add rsi, 8 ; point to next argument
dec rdi ; count down
jnz forloop ; if not done counting keep going
ret
argumentsError: mov rdi, argumentsRequiredMessage
call puts
mov rdi, argumentDescription
call puts
xor rax, rax
inc rax
ret
section .data
argumentsRequiredMessage:
db "This program requires one or more command line arguments,", 0
argumentDescription: db "one for each code point to encode as UTF-8.", 0
incorrectFormat: db "(%s incorrect format)", 0
codePointFormat: db "U+%6X", 0
section .bss
codePoint: resb 8 ; The code point from sscanf should go here.
Is there a way to pass that third argument?
sscanf() signature.
in __cdecl sscanf(const char *const _buffer, const char *const _Format, ...)
I'm using ubuntu 19.04 64 bit
extern printf
extern scanf
global main
section .text
main:
sub rsp, 0x10
mov rbx, rsp
add rbx, 0x08
mov rdi, format1
mov rsi, rbx
xor rax, rax
call scanf
mov rdi, format2
mov rsi, [rbx]
xor rax, rax
call printf
add rsp, 0x10
ret
format1:
db "%d", 0
format2:
db "%d", 0xa, 0
value:
dd 0xa
Above source is same with
#include <stdio.h>
int main(void)
{
int tmp;
scanf("%d", &tmp);
printf("%d\n", tmp);
}
It works well. But I have question. If I change my source code to
extern printf
extern scanf
global main
section .text
main:
mov rdi, format1
mov rsi, value
xor rax, rax
call scanf
mov rdi, format2
mov rsi, [value]
xor rax, rax
call printf
ret
format1:
db "%d", 0
format2:
db "%d", 0xa, 0
value:
dd 0xa
it makes segmentation fault. I think there are no difference between above source code and first one. Did I misunderstand?
In the first code, you're allocating space for a variable (the tmp in your C code, unnamed in the asm code) on the stack and passing the address of it to the scanf function, then passing the value that scanf wrote there to printf.
In the second, you're trying to use a global value allocated in the .text section instead, but .text is read-only by default on most systems. So when scanf tries to write to it, you get a segfault.
Stick a section .data just before value: to put it in the data section instead and it should be fine...
Using NASM, I need to change a character in a string at a given index and print the string in its new form. Here is a simplified version of my code:
;test_code.asm
global main
extern printf
output_str: db "----------"
index: dq 7
main:
push rbp
mov rdi, output_str
mov rax, index
mov byte[rdi + rax], 'x'
xor rax, rax
call printf
pop rbp
ret
I then compile using:
nasm -felf64 test_code.asm && gcc test_code.o -lm
and get a seg fault. Would someone please point out the flaw here? I can't seem to find it myself.
your string is in the .text section of the executable, which is read only by default. Either you allocate a buffer on the stack, copy the string and you modify it there, or you put the string in the .data section (which is read/write) using the section directive. In this last case, notice that the character replacement will be persistent, i.e. even later in the program the string will remain modified;
if you want to print that string with printf it has to be NUL-terminated. Add a ,0 to the end of the db line;
that mov rax, index is wrong - index is the address of the qword you wrote above, while you actually want to copy in rax the datum wrote there; you probably want mov rax, [index].
So, something like
;test_code.asm
global main
extern printf
section .data
output_str:
db "----------",0
section .text
index:
dq 7
main:
push rbp
mov rdi, output_str
mov rax, [index]
mov byte[rdi + rax], 'x'
xor rax, rax
call printf
pop rbp
ret
I'm trying to input values into an array in x86-64 Intel assembly, but I can't quite figure it out.
I'm creating an array in segement .bss. Then I try to pass the address of the array along to another module by using r15. Inside that module I prompt the user for a number that I then insert into the array. But it doesn't work.
I'm trying to do the following
segment .bss
dataArray resq 15 ; Array that will be manipulated
segment .text
mov rdi, dataArray ; Store memory address of array so the next module can use it.
call inputqarray ; Calling inputqarray module
Inside of inputqarary I have:
mov r15, rdi ; Move the memory address of the array into r15 for safe keeping
push qword 0 ; Make space on the stack for the value we are reading
mov rsi, rsp ; Set the second argument to point to the new locaiton on the stack
mov rax, 0 ; No SSE input
mov rdi, oneFloat ; "%f", 0
call scanf ; Call C Standard Library scanf function
call getchar ; Clean the input stream
pop qword [r15]
I then try to output the value entered by the use by doing
push qword 0
mov rax, 1
mov rdi, oneFloat
movsd xmm0, [dataArray]
call printf
pop rax
Unfortunately, all I get for output is 0.00000
Output is 0 because you are using the wrong format specifier. It should be "%lf"
Next, no need to push and pop in your procedure. Since your going to pass the address of the data array to scanf, and that will be in rsi, just pass it in rsi; one less move.
You declared your array as 15 QWORDS, is that correct - 120 bytes? or did you mean resb 15?
This works and should get you on your way:
extern printf, scanf, exit
global main
section .rodata
fmtFloatIn db "%lf", 0
fmtFloatOut db `%lf\n`, 0
section .bss
dataArray resb 15
section .text
main:
sub rsp, 8 ; stack pointer 16 byte aligned
mov rsi, dataArray
call inputqarray
movsd xmm0, [dataArray]
mov rdi, fmtFloatOut
mov rax, 1
call printf
call exit
inputqarray:
sub rsp, 8 ; stack pointer 16 byte aligned
; pointer to buffer is in rsi
mov rdi, fmtFloatIn
mov rax, 0
call scanf
add rsp, 8
ret
Since you are passing params in rdi to the C functions, this is not on Windows.