ASM x64 scanf printf double, GAS


I can't figure out why this code isn't working. I need to use scanf to read a double and then printf to print the same double.
When I run this code the results are wrong: what I see are pretty random characters.
.data
d1: .double
format: .asciz "%lf\n"
format2: .asciz "%lf"
.text
.globl main
main:
subq $8, %rsp
#scanf
movq $0, %rax
movq $d1, %rsi
movq $format2, %rdi
call scanf
addq $16, %rsp
#printf
movq $1, %rax
movsd d1, %xmm0
movq $format, %rdi
call printf
addq $16, %rsp
#exit
movq $1, %rdi
xorq %rax, %rax
call exit

This is the problem:
.data
d1: .double # declares zero doubles, since you used an empty list
format: .asciz "%lf\n"
d1 and format have the same address, since .double with no args assembles to nothing. (".double expects zero or more flonums, separated by commas. It assembles floating point numbers.").
So scanf overwrites the format string you use for printf. This is the random garbage that printf prints.
The fix is to actually reserve some space, preferably on the stack. But if you really want static storage then use the BSS. (This doc explains it well, even though it's about some specific gcc port.)
Instead, use this:
#.bss
# .p2align 3
# d1: .skip 8 ### This is the bugfix. The rest is just improvements
# or just use .lcomm instead of switching to the .bss and back
.lcomm d1, 8
.section .rodata
print_format: .asciz "%f\n" # For printf, "%f" is the format for double. %lf still works to print a double, though. Only %llf or %Lf is long double.
scan_format: .asciz "%lf" # scanf does care about trailing whitespace in the format string: it won't return until it sees something after the whitespace :/ Otherwise we could use the same format string for both.
.text
.globl main
main:
subq $8, %rsp
xor %eax,%eax
mov $d1, %esi # addresses for code and static data are always in the low 2G in the default "small" code model, so we can save insn bytes by avoiding REX prefixes.
mov $scan_format, %edi
call scanf
mov $1, %eax
movsd d1, %xmm0
mov $print_format, %edi
call printf
add $8, %rsp
ret
#xor %edi,%edi # exit(0) means success, but we can just return from main instead. It's not a varargs function, so you don't need to zero rax
#call exit
For more stuff about writing efficient asm code, see the links in the x86 tag wiki.
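In C terms, the fixed program does roughly the following. This is a sketch, not the compiler output for the asm above: it swaps scanf/printf for sscanf/snprintf (an assumption made so the behavior can be checked without a terminal), and the function name scan_then_format is hypothetical. It shows the two points from the answer: the double needs its own 8 bytes of storage, and scanf needs "%lf" while printf's "%f" already means double.

```c
#include <stdio.h>

/* Static storage for the value, like `.lcomm d1, 8` in the asm:
   8 zero-initialized bytes in the BSS, separate from the format strings. */
static double d1;

/* Hypothetical helper: parse one double from `input`, then format it the way
   the asm's printf call does. Returns the number of characters written,
   or -1 if no double could be converted. */
int scan_then_format(const char *input, char *out, size_t outsz)
{
    /* scanf family: "%lf" because the argument is a double* */
    if (sscanf(input, "%lf", &d1) != 1)
        return -1;
    /* printf family: "%f" takes a double (a float would be promoted anyway) */
    return snprintf(out, outsz, "%f\n", d1);
}
```

For example, scan_then_format("2.5", buf, sizeof buf) writes "2.500000\n" into buf.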
This would also have worked, but it wastes 8 bytes in your executable:
.data
d1: .double 0.0
Or use scratch space on the stack. Also changed: RIP-relative LEA for the format strings, so this will work in a PIE (PIC executable). The explicit @plt is necessary to generate a PLT call when making a PIE executable.
.globl main
main:
xor %eax, %eax # no FP args. (double* is a pointer, aka integer)
push %rax # reserve 8 bytes, and align the stack. (sub works, push is more compact and usually not slower)
mov %rsp, %rsi # pointer to the 8 bytes
lea scan_format(%rip), %rdi
call scanf@plt
# %eax will be 1 if scanf successfully converted an arg
movsd (%rsp), %xmm0
mov $1, %eax # 1 FP arg in xmm registers (as opposed to memory)
lea print_format(%rip), %rdi
pop %rdx # deallocate 8 bytes. add $8, %rsp would work, too
jmp printf@plt # tailcall return printf(...)
.section .rodata
print_format: .asciz "%f\n"
scan_format: .asciz "%lf"
You could even store your format strings as immediates, too, but then you need to reserve more stack space to keep it aligned. (e.g. push $"%lf", except GAS syntax doesn't do multi-character integer constants. In NASM you really could do push '%lf' to get those 3 bytes + 5 zeros of padding.)
Related: How to print a single-precision float with printf: you can't because of C default-conversion rules that promote to double.
Also related: a Q&A about the ABI alignment rules: Printing floating point numbers from x86-64 seems to require %rbp to be saved
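A minimal C sketch of the default argument promotions behind that rule, using a hypothetical variadic function sum_fp_args: a float passed through `...` arrives as a double, which is why printf's %f (and the asm's movsd into %xmm0) always deals in doubles.

```c
#include <stdarg.h>

/* Hypothetical variadic function that adds up `count` floating-point args.
   Because of the default argument promotions, a float passed through `...`
   arrives as a double, so va_arg must read double (va_arg(ap, float) is
   undefined behavior). */
double sum_fp_args(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    double sum = 0.0;
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, double);
    va_end(ap);
    return sum;
}
```

Calling sum_fp_args(2, 1.5f, 2.25) works because the float 1.5f is converted to double before the call.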

Related

Why does GCC allocate more stack memory than needed?

I'm reading "Computer Systems: A Programmer's Perspective, 3/E" (CS:APP3e) and the following code is an example from the book:
long call_proc() {
long x1 = 1;
int x2 = 2;
short x3 = 3;
char x4 = 4;
proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4);
return (x1+x2)*(x3-x4);
}
The book gives the assembly code generated by GCC:
long call_proc()
call_proc:
; Set up arguments to proc
subq $32, %rsp ; Allocate 32-byte stack frame
movq $1, 24(%rsp) ; Store 1 in &x1
movl $2, 20(%rsp) ; Store 2 in &x2
movw $3, 18(%rsp) ; Store 3 in &x3
movb $4, 17(%rsp) ; Store 4 in &x4
leaq 17(%rsp), %rax ; Create &x4
movq %rax, 8(%rsp) ; Store &x4 as argument 8
movl $4, (%rsp) ; Store 4 as argument 7
leaq 18(%rsp), %r9 ; Pass &x3 as argument 6
movl $3, %r8d ; Pass 3 as argument 5
leaq 20(%rsp), %rcx ; Pass &x2 as argument 4
movl $2, %edx ; Pass 2 as argument 3
leaq 24(%rsp), %rsi ; Pass &x1 as argument 2
movl $1, %edi ; Pass 1 as argument 1
; Call proc
call proc
; Retrieve changes to memory
movslq 20(%rsp), %rdx ; Get x2 and convert to long
addq 24(%rsp), %rdx ; Compute x1+x2
movswl 18(%rsp), %eax ; Get x3 and convert to int
movsbl 17(%rsp), %ecx ; Get x4 and convert to int
subl %ecx, %eax ; Compute x3-x4
cltq ; Convert to long
imulq %rdx, %rax ; Compute (x1+x2) * (x3-x4)
addq $32, %rsp ; Deallocate stack frame
ret ; Return
I can understand this code: the compiler allocates 32 bytes of space on the stack, of which the first 16 bytes hold the arguments passed to proc and the last 16 bytes hold 4 local variables.
Then I tested this code on GCC 11.2, using the optimization flag -Og, and got this assembly code:
call_proc():
subq $24, %rsp
movq $1, 8(%rsp)
movl $2, 4(%rsp)
movw $3, 2(%rsp)
movb $4, 1(%rsp)
leaq 1(%rsp), %rax
pushq %rax
pushq $4
leaq 18(%rsp), %r9
movl $3, %r8d
leaq 20(%rsp), %rcx
movl $2, %edx
leaq 24(%rsp), %rsi
movl $1, %edi
call proc(long, long*, int, int*, short, short*, char, char*)
movslq 20(%rsp), %rax
addq 24(%rsp), %rax
movswl 18(%rsp), %edx
movsbl 17(%rsp), %ecx
subl %ecx, %edx
movslq %edx, %rdx
imulq %rdx, %rax
addq $40, %rsp
ret
I noticed that gcc first allocated 24 bytes for 4 local variables. Then it uses pushq to add 2 arguments to the stack, so the final code uses addq $40, %rsp to free stack space.
Compared to the code in the book, GCC allocates 8 more bytes of space here, and it doesn't seem to use the extra space. Why does it need the extra space?

(This answer is a summary of comments posted above by Antti Haapala, klutt and Peter Cordes.)
GCC allocates more space than "necessary" in order to ensure that the stack is properly aligned for the call to proc: the stack pointer must be adjusted by a multiple of 16, plus 8 (i.e. by an odd multiple of 8). Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?
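The arithmetic behind that rule can be sketched in C. This is an illustration under the x86-64 System V assumptions stated in the comments, and the helper names are hypothetical: the book's frame (16 bytes of locals plus 16 bytes of stack args, 32 in total) leaves RSP misaligned at the call, while GCC's 24 + 2 pushes = 40 keeps it aligned.

```c
#include <stdbool.h>
#include <stddef.h>

/* x86-64 System V: RSP is a multiple of 16 right before a `call`, so on
   entry to a function (after the return address is pushed) RSP % 16 == 8.
   To make its own call, a function must lower RSP by an amount that is
   8 more than a multiple of 16. */
bool aligned_at_call(size_t rsp_adjust)
{
    return rsp_adjust % 16 == 8;
}

/* Smallest adjustment >= `needed` bytes that keeps the next call aligned. */
size_t min_aligned_adjust(size_t needed)
{
    return ((needed + 7) / 16) * 16 + 8;
}
```

With needed = 32 (the book's locals plus outgoing args), min_aligned_adjust returns 40, which is exactly the total that GCC's subq $24 plus two pushq instructions adds up to.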
What's strange is that the code in the book doesn't do that; the code as shown would violate the ABI and, if proc actually relies on proper stack alignment (e.g. using aligned SSE2 instructions), it may crash.
So it appears that either the code in the book was incorrectly copied from compiler output, or else the authors of the book are using some unusual compiler flags which alter the ABI.
Modern GCC 11.2 emits nearly identical asm (Godbolt) using -Og -mpreferred-stack-boundary=3 -maccumulate-outgoing-args, the former of which changes the ABI to maintain only 2^3 byte stack alignment, down from the default 2^4. (Code compiled this way can't safely call anything compiled normally, even standard library functions.) -maccumulate-outgoing-args used to be the default in older GCC, but modern CPUs have a "stack engine" that makes push/pop single-uop so that option isn't the default anymore; push for stack args saves a bit of code size.
One difference from the book's asm is a movl $0, %eax before the call, because there's no prototype so the caller has to assume it might be variadic and pass AL = the number of FP args in XMM registers. (A prototype that matches the args passed would prevent that.) The other instructions are all the same, and in the same order as whatever older GCC version the book used, except for choice of registers after call proc returns: it ends up using movslq %edx, %rdx instead of cltq (sign-extend with RAX).
CS:APP 3e global edition is notorious for errors in practice problems introduced by the publisher (not the authors), but apparently this code is present in the North American edition, too. So this may be the author's mistake / choice to use actual compiler output with weird options. Unlike some of the bad global edition practice problems, this code could have come unmodified from some GCC version, but only with non-standard options.
Related: Why does GCC allocate more space than necessary on the stack, beyond what's needed for alignment? - GCC has a missed-optimization bug where it sometimes reserves an additional 16 bytes that it truly didn't need to. That's not what's happening here, though.

Printing multiple values in Assembly

I have some assembly code that I wrote on my Linux VM (Manjaro, x86_64). It looks like this:
.section .rodata
.LC0:
.string "The value of a is: %d, of b: %d"
.text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $15, -4(%rbp)
movl $20, -8(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
Basically I want to put 2 values in registers and then print them (formatted as in .LC0). I got stuck, so I wrote a C program and used gcc -S to see how it looks. It gave me something similar to the code above. I don't understand two things:
If I store 20 in %edx and 15 in %eax, then why does passing only %eax to %esi cause printf to print the values from both %eax and %edx?
Why do I have to put a zero constant every time before and after printf (as gcc does)?

Why do I have to put a zero constant every time before and after printf
These are two different issues.
The zero before printf is required by the x86-64 (a.k.a. AMD64) System V ABI: when calling a variadic function, %al holds the number of variable arguments passed in vector (XMMn, YMMn...) registers, and here there are none.
The zero after printf is this function's own return value (most likely a return 0; at the end of main).
why does passing only %eax to %esi cause printf to print the values from both %eax and %edx?
It does not.
The same ABI specifies: the first argument (the printf format string pointer) goes in %rdi, the second argument (the first variable argument) in %rsi, the third in %rdx, and so on. So 15 reaches printf through %esi and 20 through %edx, which already holds it; printf never reads an argument from %eax. The extra moves through %eax are an artifact of non-optimized (-O0) gcc output. If you enable any optimization (even -Og), you'll see these pointless moves disappear.
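A small C sketch of the same call, with the register assignment spelled out in comments. It uses snprintf instead of printf (an assumption made so the output can be checked), and format_a_b is a hypothetical name. Because snprintf takes two extra leading arguments, its registers shift by two compared with the question's printf call, which illustrates the "and so on" above.

```c
#include <stdio.h>

/* Hypothetical helper formatting the question's two values.
   Under the x86-64 System V ABI this snprintf call passes:
     out -> %rdi, outsz -> %rsi, format -> %rdx, a -> %ecx, b -> %r8d,
   with %al = 0 (no vector-register args). In the question's printf call,
   which has no buffer arguments, format, a and b land in %rdi, %esi and %edx. */
int format_a_b(char *out, size_t outsz, int a, int b)
{
    return snprintf(out, outsz, "The value of a is: %d, of b: %d", a, b);
}
```

With a = 15 and b = 20, the buffer holds "The value of a is: 15, of b: 20".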

Where is string data stored?

I wrote a small C program:
#include <stdio.h>
int main()
{
char s[] = "Hello, world!";
printf("%s\n", s);
return 0;
}
which compiles to (on my linux machine):
.file "hello.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl $1819043144, -32(%rbp)
movl $1998597231, -28(%rbp)
movl $1684828783, -24(%rbp)
movw $33, -20(%rbp)
leaq -32(%rbp), %rax
movq %rax, %rdi
call puts
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L3
call __stack_chk_fail
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2"
.section .note.GNU-stack,"",@progbits
I don't understand the assembly code, but I can't see the string message anywhere. So how does the executable know what to print?

It's here:
movl $1819043144, -32(%rbp) ; 1819043144 = 0x6C6C6548, stored little-endian as "Hell"
movl $1998597231, -28(%rbp) ; 1998597231 = 0x77202C6F, stored as "o, w"
movl $1684828783, -24(%rbp) ; 1684828783 = 0x646C726F, stored as "orld"
movw $33, -20(%rbp) ; 33 = 0x0021, stored as "!\0"
In this particular case the compiler generates inline instructions that build the string on the stack before the call (GCC has also turned printf("%s\n", s) into puts(s)). In other situations it may not do this and may instead store the string constant in another section of memory. Bottom line: you cannot make any assumptions about how or where the compiler will generate and store string literals.
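To check that those immediates really spell the string, here is a C sketch that stores the same four values into a buffer. It assumes a little-endian host (as x86 is), so memcpy lays out each value lowest byte first, just like the movl/movw stores; rebuild_hello is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild the 14 bytes that the four mov instructions store at -32(%rbp).
   Assumes a little-endian host, so each value is copied lowest byte first,
   exactly as the movl/movw stores write it. */
void rebuild_hello(char out[14])
{
    const uint32_t w0 = 1819043144u; /* 0x6C6C6548 -> 'H' 'e' 'l' 'l' */
    const uint32_t w1 = 1998597231u; /* 0x77202C6F -> 'o' ',' ' ' 'w' */
    const uint32_t w2 = 1684828783u; /* 0x646C726F -> 'o' 'r' 'l' 'd' */
    const uint16_t w3 = 33;          /* 0x0021     -> '!' '\0'        */
    memcpy(out + 0, &w0, 4);
    memcpy(out + 4, &w1, 4);
    memcpy(out + 8, &w2, 4);
    memcpy(out + 12, &w3, 2);
}
```

After rebuild_hello(buf), buf holds the NUL-terminated string "Hello, world!" (13 characters plus the terminator).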

The string is here:
movl $1819043144, -32(%rbp)
movl $1998597231, -28(%rbp)
movl $1684828783, -24(%rbp)
This copies a bunch of values to the stack. Those values happen to be your string.

String constants are stored in the binary of your application. Exactly where is up to your compiler.

Assembly has no "string" concept. A "string" is just a chunk of memory stored somewhere (where is up to the compiler), which you manipulate through its memory address (a pointer).
If your string is constant, the compiler may encode its bytes as immediate constants in instructions instead of copying them from a string stored elsewhere in memory, which can be faster. This is your case, as pointed out by Paul R:
movl $1819043144, -32(%rbp)
movl $1998597231, -28(%rbp)
movl $1684828783, -24(%rbp)
You cannot make assumptions about how the compiler will treat your string.

In addition to the above, the compiler can see that the string literal itself cannot be referenced directly (there can't be any valid pointers to it, only to the local array s initialized from it), which is why it can just build the array inline. If, however, you initialize a character pointer instead, i.e.
char *s = "Hello, world!";
The compiler will initialise a string literal somewhere in memory, since you can of course now point to it. This modification produces on my machine:
.LC0:
.string "Hello, world!"
.text
.globl main
.type main, @function
One assumption can be made about string literals: if a pointer is initialised to a literal, it will point to a static char array held somewhere in memory. As a result the pointer is valid in any part of the program, e.g. you can return a pointer to a string literal initialised in a function, and it will still be valid.
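A minimal C sketch of that guarantee, with a hypothetical function greeting. The pointer is declared const char * because modifying a string literal is undefined behavior.

```c
/* A pointer initialized from a string literal points at a static array,
   so returning it is fine: the literal lives for the whole program. */
const char *greeting(void)
{
    const char *s = "Hello, world!";
    return s;
}

/* By contrast, `char s[] = "Hello, world!";` makes a local copy on the stack
   (the movl sequence in the question), and returning `s` from such a function
   would leave the caller with a dangling pointer. */
```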

Why doesn't the compiler allocate and deallocate local var with "sub" and "add" on the stack?

According to some textbooks, the compiler will use sub* to allocate memory for local variables.
For example, I write a Hello World program:
int main()
{
puts("hello world");
return 0;
}
I guess this will be compiled to some assembly code on the 64 bit OS:
subq $8, %rsp
movq $.LC0, (%rsp)
callq puts
addq $8, %rsp
The subq allocates 8 bytes of memory (the size of a pointer) for the argument and the addq deallocates it.
But when I input gcc -S hello.c (I use the llvm-gcc on Mac OS X 10.8), I get some assembly code.
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $16, %rsp
Ltmp2:
xorb %al, %al
leaq L_.str(%rip), %rcx
movq %rcx, %rdi
callq _puts
movl $0, -8(%rbp)
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
movl -4(%rbp), %eax
addq $16, %rsp
popq %rbp
ret
.......
L_.str:
.asciz "hello world!"
There is no subq or addq around this callq. Why? And what is the purpose of addq $16, %rsp?
Thanks for any input.

You don't have any local variables in your main(). All you may have in it is a pseudo-variable for the parameter passed to puts(), the address of the "hello world" string.
According to your last disassembly, the calling conventions appear to be such that the first parameter to puts() is passed in the rdi register and not on the stack, which is why there isn't any stack space allocated for this parameter.
However, since you're compiling your program with optimization disabled, you may encounter some unnecessary stack space allocations and reads and writes to and from that space.
This code illustrates it:
subq $16, %rsp ; allocate some space
...
movl $0, -8(%rbp) ; write to it
movl -8(%rbp), %eax ; read back from it
movl %eax, -4(%rbp) ; write to it
movl -4(%rbp), %eax ; read back from it
addq $16, %rsp
Those four mov instructions are equivalent to just one simple movl $0, %eax, no memory is needed to do that.
If you add an optimization switch like -O2 in your compile command, you'll see more meaningful code in the disassembly.
Also note that some space allocations may be needed solely for the purpose of keeping the stack pointer aligned, which improves performance or avoids issues with misaligned memory accesses (you could get the #AC exception on misaligned accesses if it's enabled).
The above code shows it too. See, those four mov instructions only use 8 bytes of memory, while the add and sub instructions grow and shrink the stack by 16.
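The rounding described above can be sketched in C, assuming (as in this frame) that %rsp is already 16-byte aligned after push %rbp, so the sub amount just needs to be a multiple of 16. The helper name round_up_16 is hypothetical.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 16. Assumes %rsp is already
   16-byte aligned after push %rbp, so any multiple of 16 keeps it aligned. */
size_t round_up_16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}
```

Here the 8 bytes actually used by the four movs round up to the 16 that subq/addq reserve.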

Incrementing from 0 to 100 in assembly language

This is kinda oddball, but I was poking around with the GNU assembler today (I want to be able to at least read the syntax), and was trying to get this little contrived example of mine to work. Namely I just want to go from 0 to 100, printing out numbers all the while. So a few minutes later I come up with this:
# count.s: print the numbers from 0 to 100.
.text
string: .asciz "%d\n"
.globl _main
_main:
movl $0, %eax # The starting point/current value.
movl $100, %ebx # The ending point.
_loop:
# Display the current value.
pushl %eax
pushl $string
call _printf
addl $8, %esp
# Check against the ending value.
cmpl %eax, %ebx
je _end
# Increment the current value.
incl %eax
jmp _loop
_end:
All I get from this is 3 printed over and over again. Like I said, just a little contrived example, so don't worry too much about it, it's not a life or death problem.
(The formatting's a little messed up, but nothing major).

You can't trust what any called procedure does to any of the registers.
Either push the registers onto the stack and pop them back off after calling printf or have the increment and end point values held in memory and read/written into registers as you need them.
I hope the following works. I'm assuming that pushl has an equivalent popl and that you can push an extra couple of numbers onto the stack.
# count.s: print the numbers from 0 to 100.
.text
string: .asciz "%d\n"
.globl _main
_main:
movl $0, %eax # The starting point/current value.
movl $100, %ebx # The ending point.
_loop:
# Remember your registers.
pushl %eax
pushl %ebx
# Display the current value.
pushl %eax
pushl $string
call _printf
addl $8, %esp
# reinstate registers.
popl %ebx
popl %eax
# Check against the ending value.
cmpl %eax, %ebx
je _end
# Increment the current value.
incl %eax
jmp _loop
_end:

I'm not too familiar with _printf, but could it be that it modifies eax? Printf should return the number of chars printed, which in this case is two: '0' and '\n'. I think it returns this in eax, and when you increment it, you get 3, which is what you proceed to print.
You might be better off using a different register for the counter.
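A C sketch of that explanation. It uses snprintf as a stand-in because it returns the same character count printf would, and the function names are hypothetical. On x86 that count comes back in %eax, overwriting the counter the question keeps there.

```c
#include <stdio.h>

/* Number of characters printf("%d\n", value) would write; on x86 printf's
   return value comes back in %eax, clobbering whatever counter was there. */
int chars_for(int value)
{
    char buf[16];
    return snprintf(buf, sizeof buf, "%d\n", value);
}

/* The intended program: print 0 through 100, one per line. The compiler
   keeps `i` in a callee-saved register (or spills it) across the call. */
void count_to_100(void)
{
    for (int i = 0; i <= 100; i++)
        printf("%d\n", i);
}
```

chars_for(0) is 2 ('0' and '\n'), matching the value the question's loop picks up from %eax and increments to 3.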

You can safely use registers that are "callee-saved" without saving them around each call yourself (your own function must still preserve them for its caller). On 32-bit x86 these are ebx, esi, edi and ebp; other architectures have more.
These are documented in the ABI references: http://math-atlas.sourceforge.net/devel/assembly/

Well-written functions will usually push all the registers onto the stack and then pop them when they're done so that they remain unchanged during the function. The exception would be eax that contains the return value. Library functions like printf are most likely written this way, so I wouldn't do as Wedge suggests:
You'll need to do the same for any other variable you have. Using registers to store local variables is pretty much reserved to architectures with enough registers to support it (e.g. EPIC, amd64, etc.)
In fact, from what I know, compilers usually compile functions that way to deal exactly with this issue.
@seanyboy, your solution is overkill. All that's needed is to replace eax with some other register like ecx.

Nathan is on the right track. You can't assume that register values will be unmodified after calling a subroutine. In fact, it's best to assume they will be modified, or else the subroutine wouldn't be able to do its work (at least on low register count architectures like x86). If you want to preserve a value you should store it in memory (e.g. push it onto the stack and keep track of its location).
You'll need to do the same for any other variable you have. Using registers to store local variables is pretty much reserved to architectures with enough registers to support it (e.g. EPIC, amd64, etc.)

You could rewrite it so that it uses registers that aren't supposed to change across calls (callee-saved registers), for example %ebx and %ebp. Just make sure you push them onto the stack at the beginning, and pop them off at the end of your routine.
# count.s: print the numbers from 0 to 100.
.text
string: .asciz "%d\n"
.globl _main
_main:
push %ebx # save the callee-saved registers we are about to use
push %ebp
subl $12, %esp # keep %esp 16-byte aligned at each call (2 args pushed below)
movl $0, %ebx # The starting point/current value (printf preserves %ebx).
movl $100, %ebp # The ending point.
_loop:
# Display the current value.
pushl %ebx
pushl $string
call _printf
addl $8, %esp
# Check against the ending value.
cmpl %ebx, %ebp
je _end
# Increment the current value.
incl %ebx
jmp _loop
_end:
addl $12, %esp
pop %ebp
pop %ebx
xorl %eax, %eax # return 0 from main
ret
