subq $40 %rsp crash with AS but GCC not - c

I ran into a strange phenomenon; the code is recorded below.
My test bed is x86_64 with gcc 5.3.0.
When I reserve some space on the stack for local variables, the program sometimes crashes.
                  | as + ld | gcc   |
------------------+---------+-------+
40 bytes on stack | crash   | ok    |
------------------+---------+-------+
32 bytes on stack | ok      | crash |
------------------+---------+-------+
.section .data
fmt:
.ascii "0x%lx\n\0"
.section .text
.global _start
_start:
subq $40, %rsp # subq $32, %rsp is OK
# I want to reserve some place for local value.
movq $8, %rsi
movq $fmt, %rdi
call printf #print something
addq $40, %rsp
movq $1, %rax
int $0x80
as tsp.s -o tsp.o
ld -lc -I /lib64/ld-linux-x86-64.so.2 tsp.o -o tsp
./tsp
Segmentation fault (core dumped)
This time I use gcc to compile and link.
It works when I reserve 40 bytes on the stack.
It crashes when I reserve 32 bytes on the stack.
.section .data
fmt:
.ascii "0x%lx\n\0"
.section .text
.global main
main:
subq $40, %rsp # if subq $32, %rsp, it would crash.
movq $8, %rsi
movq $fmt, %rdi
call printf
addq $40, %rsp
movq $1, %rax
int $0x80
gcc tsp.s -o tsp
./tsp
0x8

When I tested your code, printf crashed while accessing xmm registers. There are two reasons for this. First, when you let gcc do the compiling and linking, additional startup code runs before main. That code aligns the stack correctly and then calls main.
Since main is called like a normal function, the call instruction pushes a return address and leaves the stack at 8 mod 16 on entry, but the stack must be correctly aligned (0 mod 16) whenever you call a function. The alignment requirement exists because of the xmm registers (among others).
Second, why did printf touch xmm registers in the first place? Because you called printf incorrectly. The ABI for amd64 says:
When a function taking variable-arguments is called, %rax must be set to the total number of floating point parameters passed to the function in SSE registers.
Your rax probably has some non-zero value in it.
So there are two things to fix. First, use xorl %eax, %eax to zero %rax before the call to printf. Second, be aware of how you were called and align the stack accordingly. If you were called as a normal function, subtract 8+n*16 (n can be 0) from the stack pointer before making a call. If you were entered as the entry point, align the stack pointer explicitly to be safe, because I'm not sure the kernel always guarantees that it will be aligned.
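The subtraction rule above, and the crash table in the question, can be checked with a little modular arithmetic. The C sketch below is only illustrative: aligned_before_call and padded_reservation are hypothetical helper names, and the value 0 for _start assumes %rsp is a multiple of 16 at process entry, which is what the question's results suggest.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of any ABI or library.
 * entry_mod16 is %rsp modulo 16 when the code starts running:
 *   8 inside a function reached by `call` (such as main),
 *   0 at _start, ASSUMING the stack is 16-byte aligned at process entry. */

/* Returns 1 if %rsp is a multiple of 16 after `subq $reserved, %rsp`. */
int aligned_before_call(size_t entry_mod16, size_t reserved)
{
    return (entry_mod16 + 16 - reserved % 16) % 16 == 0;
}

/* Smallest reservation >= needed that leaves %rsp 16-byte aligned. */
size_t padded_reservation(size_t entry_mod16, size_t needed)
{
    size_t pad = (entry_mod16 + 16 - needed % 16) % 16;
    return needed + pad;
}
```

With these definitions, aligned_before_call(8, 40) and aligned_before_call(0, 32) hold, while (8, 32) and (0, 40) do not, which matches the four cells of the table. padded_reservation(8, 32) gives 40, the 8+n*16 form with n = 2.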

Related

Why does this assembly cause a segmentation fault? [duplicate]

This question already has answers here:
How to write assembly language hello world program for 64 bit Mac OS X using printf?
(1 answer)
Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?
(1 answer)
Closed 1 year ago.
I'm working on a hobby compiler project, and am getting frustrated because my generated assembly makes perfect sense to me but causes a segmentation fault on running. For the input printf("%d", strcmp("1", "2"));, the compiler generates:
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
leaq S0(%rip), %rax
movq %rax, %rdi
pushq %rdi
leaq S1(%rip), %rax
movq %rax, %rdi
leaq S2(%rip), %rax
movq %rax, %rsi
callq _strcmp
popq %rdi
movq %rax, %rsi
callq _printf
addq $16, %rsp
popq %rbp
retq
S0:
.asciz "%d"
S1:
.asciz "1"
S2:
.asciz "2"
The compiler starts by generating code to call printf, since that's what it encounters first. However, the second argument is a function call, so I thought it would be a good idea to push all the arguments I've already parsed, hence the pushq %rdi (and later popq %rdi). However, running the program causes a segfault without printing anything, and I'm not sure why. I'm guessing it's either due to stack alignment somehow or due to the offset-based addressing (leaq S0(%rip), %rax) not being stored or retrieved correctly. Commenting out the instructions that push and pop %rdi, and re-inserting leaq S0(%rip), %rdi after the call to strcmp and before callq _printf, causes the program to run as expected.
I've been banging my head against the wall trying to figure this out. If anyone can help figure out exactly what's wrong with the given assembly, it would be much appreciated.
System info: Working on a 2015 Mac running Mac OSX 10.15.6.
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

GCC assembly code shows 32bit registers on 64bit machine

I am trying to learn how to use the ptrace library to trace all system calls and their arguments. I am stuck on getting the arguments passed to a system call.
I went through many online resources and SO questions and figured out that on a 64-bit machine the arguments are stored in the registers rax (system call number), rdi, rsi, rdx, r10, r8, r9, in that order. Check this website.
Just to confirm this I wrote a simple C program as follows
#include<stdio.h>
#include<fcntl.h>
int main() {
printf("some print data");
open("/tmp/sprintf.c", O_RDWR);
}
and generated assembly code for this using gcc -S t.c but assembly code generated is as below
.file "t.c"
.section .rodata
.LC0:
.string "some print data"
.LC1:
.string "/tmp/sprintf.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
movl $0, %eax
call printf
movl $2, %esi
movl $.LC1, %edi
movl $0, %eax
call open
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits
As you can see, this code is storing parameters in esi and edi instead.
Why is this happening?
Also, please guide me on the best way to access these passed arguments from the registers or memory locations in C code. How can I figure out if the contents of a register are the argument itself or a memory location where the actual argument is stored?
Thanks!
this code is storing parameters on esi and edi
32-bit instructions are smaller, thus preferred when possible. See also Why do most x64 instructions zero the upper part of a 32 bit register.
How can I figure out if the contents of register is the argument itself or is it a memory location where actual argument is stored?
The AMD64 SystemV calling convention never implicitly replaces a function arg with a hidden pointer. Integer / pointer args in the C prototype always go in the arg-passing registers directly.
structs / unions passed by value go in one or more registers, or on the stack.
The full details are documented in the ABI. See more links in the x86 tag wiki. http://www.x86-64.org/documentation.html is down right now, so I linked the current revision on github.
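For the ptrace part of the question, the register order listed there can be written down directly in C. This is a minimal sketch for Linux on x86-64 only: struct syscall_args and decode_syscall are hypothetical names introduced here, and the register snapshot is assumed to come from something like ptrace(PTRACE_GETREGS, pid, 0, &regs) at a syscall-entry stop. Note that the snapshot exposes the system call number as orig_rax, since rax itself is reused for the return value.

```c
#include <sys/user.h>  /* struct user_regs_struct (Linux, x86-64 only) */

/* Hypothetical container for one decoded system call. */
struct syscall_args {
    unsigned long long nr;      /* system call number */
    unsigned long long arg[6];  /* arguments 1..6 */
};

/* Copy the syscall number and arguments out of a register snapshot.
 * The snapshot is assumed to have been taken at a syscall-entry stop. */
struct syscall_args decode_syscall(const struct user_regs_struct *regs)
{
    struct syscall_args s;
    s.nr     = regs->orig_rax;
    s.arg[0] = regs->rdi;
    s.arg[1] = regs->rsi;
    s.arg[2] = regs->rdx;
    s.arg[3] = regs->r10;  /* r10 here, not rcx as in function calls */
    s.arg[4] = regs->r8;
    s.arg[5] = regs->r9;
    return s;
}
```

Each value is the argument itself. When the C prototype says the argument is a pointer (the path passed to open, for example), the value is an address in the traced process, and the bytes behind it have to be read separately, for instance with PTRACE_PEEKDATA.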

ASM x64 scanf printf double, GAS

I can't figure out why this code isn't working for me. I need to read a double with scanf and then print the same double with printf.
With this code the results are wrong: what I see is seemingly random characters.
.data
d1: .double
format: .asciz "%lf\n"
format2: .asciz "%lf"
.text
.globl main
main:
subq $8, %rsp
#scanf
movq $0, %rax
movq $d1, %rsi
movq $format2, %rdi
call scanf
addq $16, %rsp
#printf
movq $1, %rax
movsd d1, %xmm0
movq $format, %rdi
call printf
addq $16, %rsp
#exit
movq $1, %rdi
xorq %rax, %rax
call exit
This is the problem:
.data
d1: .double # declares zero doubles, since you used an empty list
format: .asciz "%lf\n"
d1 and format have the same address, since .double with no args assembles to nothing. (".double expects zero or more flonums, separated by commas. It assembles floating point numbers.").
So scanf overwrites the format string you use for printf. This is the random garbage that printf prints.
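The overlap can be modeled in C with a union, which also places two objects at one address. This is only a sketch of the effect, not of the assembler's behavior: union overlapped and store_clobbers_format are hypothetical names, and the model assumes an 8-byte double.

```c
#include <string.h>

/* Hypothetical model of the layout in the question: d1 and format
 * start at the same address, so they are the same bytes. */
union overlapped {
    char   format[8];  /* "%lf\n" plus padding */
    double d1;         /* assumed to be 8 bytes */
};

/* Emulates scanf storing a parsed double at d1.
 * Returns 1 if the format string bytes were changed by the store. */
int store_clobbers_format(double value)
{
    union overlapped u;
    memset(&u, 0, sizeof u);
    memcpy(u.format, "%lf\n", 5);  /* 4 characters plus the NUL */
    u.d1 = value;                  /* the store scanf performs */
    return memcmp(u.format, "%lf\n", 5) != 0;
}
```

Storing almost any double changes the leading bytes, so the string printf later receives as its format is no longer "%lf\n".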
The fix is to actually reserve some space, preferably on the stack. But if you really want static storage then use the BSS. (This doc explains it well, even though it's about some specific gcc port.)
Instead, use this:
#.bss
# .p2align 3
# d1: .skip 8 ### This is the bugfix. The rest is just improvements
# or just use .lcomm instead of switching to the .bss and back
.lcomm d1, 8
.section .rodata
print_format: .asciz "%f\n" # For printf, "%f" is the format for double. %lf still works to print a double, though. Only %llf or %Lf is long double.
scan_format: .asciz "%lf" # scanf does care about the trailing whitespace in the format string: it won't return until it sees something after the whitespace :/ Otherwise we could use the same format string for both.
.text
.globl main
main:
subq $8, %rsp
xor %eax,%eax
mov $d1, %esi # addresses for code and static data are always in the low 2G in the default "small" code model, so we can save insn bytes by avoiding REX prefixes.
mov $scan_format, %edi
call scanf
mov $1, %eax
movsd d1, %xmm0
mov $print_format, %edi
call printf
add $8, %rsp
ret
#xor %edi,%edi # exit(0) means success, but we can just return from main instead. It's not a varargs function, so you don't need to zero rax
#call exit
For more stuff about writing efficient asm code, see the links in the x86 tag wiki.
This also would have worked, but it wastes 8 bytes in your executable:
.data
d1: .double 0.0
Or use scratch space on the stack. Also changed: RIP-relative LEA for the format strings, so this will work in a PIE (position-independent executable). The explicit @plt is necessary to generate a PLT call when making a PIE executable.
.globl main
main:
xor %eax, %eax # no FP args. (double* is a pointer, aka integer)
push %rax # reserve 8 bytes, and align the stack. (sub works, push is more compact and usually not slower)
mov %rsp, %rsi # pointer to the 8 bytes
lea scan_format(%rip), %rdi
call scanf@plt
# %eax will be 1 if scanf successfully converted an arg
movsd (%rsp), %xmm0
mov $1, %eax # 1 FP arg in xmm registers (as opposed to memory)
lea print_format(%rip), %rdi
pop %rdx # deallocate 8 bytes. add $8, %rsp would work, too
jmp printf@plt # tailcall return printf(...)
.section .rodata
print_format: .asciz "%f\n"
scan_format: .asciz "%lf"
You could even store your format strings as immediates, too, but then you need to reserve more stack space to keep it aligned. (e.g. push $"%lf", except GAS syntax doesn't do multi-character integer constants. In NASM you really could do push '%lf' to get those 3 bytes + 5 zeros of padding.)
Related: How to print a single-precision float with printf: you can't because of C default-conversion rules that promote to double.
Also related: a Q&A about the ABI alignment rules: Printing floating point numbers from x86-64 seems to require %rbp to be saved
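The default-conversion rule mentioned above is easy to see from the C side. A minimal sketch, where sum_varargs is a hypothetical function written for this example: a float passed through the ... part of a variadic call is promoted to double, so the callee must read it as double.

```c
#include <stdarg.h>

/* Hypothetical variadic function: sums `count` floating-point arguments. */
double sum_varargs(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);  /* float arguments arrive as double */
    va_end(ap);
    return total;
}
```

Calling sum_varargs(2, a, b) with float a and b works because both are converted to double at the call site. This is the same reason printf has no conversion for a single-precision float, and why the asm above passes the value as a double in %xmm0 with %eax set to 1.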

gcc 4.7.2 in Debian wheezy doesn't always properly align stack pointer. Is this a bug?

While casually reading the assembler listing of a sample C program, I noticed that the stack pointer is not 16-byte aligned before calling function foo:
void foo() { }
int func(int p) { foo(); return p; }
int main() { return func(1); }
func:
pushq %rbp
movq %rsp, %rbp
subq $8, %rsp ; See here
movl %edi, -4(%rbp)
movl $0, %eax
call foo
movl -4(%rbp), %eax
leave
ret
The subq $8, %rsp instruction makes RSP not aligned before calling foo (it should be "subq $16, %rsp").
In System V ABI, par. 3.2.2, I read: "the value (%rsp − 8) is always a multiple of 16 when control is transferred to the function entry point".
Can someone help me understand why gcc doesn't emit subq $16, %rsp?
Thank you in advance.
Edit:
I forgot to mention my OS and compiler version:
Debian wheezy, gcc 4.7.2
Assuming that the stack pointer is 16-byte aligned when func is entered, then the combination of
pushq %rbp ; <- 8 bytes
movq %rsp, %rbp
subq $8, %rsp ; <- 8 bytes
will keep it 16-byte aligned for the subsequent call to foo().
It seems that since the compiler knows the implementation of foo() and that it's a no-op, it's not bothering with the stack alignment. If foo() is visible only as a declaration or prototype in the translation unit where func() is compiled, you'll see your expected stack alignment.

Not getting Segmentation Fault in C

Here is the C code:
char **s;
s[334]=strdup("test");
printf("%s\n",s[334]);
I know that strdup allocates the copy of "test", but the slot s[334], where we put the pointer to that string, is not allocated. However, this code works like a charm.
Your code exhibits undefined behavior. That does not mean it will crash. All it means is that you can't predict anything about what will happen.
A crash is rather likely, but not guaranteed at all, in this case.
"Undefined behaviour" doesn't mean you'll get a segfault, it means you might get a segfault. A conforming implementation might also decide to display ASCII art of a puppy.
You might like to check this code with a tool like Valgrind.
You don't always get segmentation fault if you access uninitialized memory.
You do access uninitialized memory here.
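For contrast, a well-defined version gives s something to point at before indexing it. This is a minimal sketch, with store_in_slot as a hypothetical helper name; it assumes a POSIX system for strdup.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates a pointer table with slots 0..slot,
 * stores a duplicate of text in the last slot, and returns that duplicate.
 * The caller frees the result. Returns NULL if an allocation fails. */
char *store_in_slot(size_t slot, const char *text)
{
    char **s = calloc(slot + 1, sizeof *s);  /* every slot starts as NULL */
    if (s == NULL)
        return NULL;

    s[slot] = strdup(text);  /* now a write into allocated memory */
    char *copy = s[slot];
    free(s);                 /* frees the table, not the string */
    return copy;
}
```

store_in_slot(334, "test") performs the same store as the snippet in the question, except that s[334] now lies inside an allocation of 335 pointers.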
The compiler is too smart for us! It knows that printf("%s\n", some_string) is exactly the same as puts(some_string), so it can simplify
char **s;
s[334]=strdup("test");
printf("%s\n",s[334]);
into
char **s;
s[334]=strdup("test");
puts(s[334]);
and then (assuming no UB) that is again equivalent to
puts(strdup("test"));
So, by chance, the segmentation fault didn't happen (this time).
I get a segfault without optimisations, but when compiled with optimisations, gcc doesn't bother with the s at all, it's eliminated as dead code.
gcc -Os -S:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $.LC0, %edi # .LC0 is where "test" is at
call strdup
addq $8, %rsp
.cfi_def_cfa_offset 8
movq %rax, %rdi
jmp puts
.cfi_endproc
gcc -S -O (same for -O2, -O3):
.LFB23:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $5, %edi
call malloc
movq %rax, %rdi
testq %rax, %rax
je .L2
movl $1953719668, (%rax)
movb $0, 4(%rax)
.L2:
call puts
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
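In the -O listing, gcc has expanded strdup("test") inline: malloc(5), a 4-byte store of the characters, and a 1-byte store of the terminating NUL. The constant 1953719668 is 0x74736574, the bytes 't', 'e', 's', 't' in little-endian order. A short C sketch of that packing, where pack_le32 is a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: builds the 32-bit value whose little-endian
 * byte order spells the first four characters of s. */
uint32_t pack_le32(const char *s)
{
    return  (uint32_t)(unsigned char)s[0]
         | ((uint32_t)(unsigned char)s[1] << 8)
         | ((uint32_t)(unsigned char)s[2] << 16)
         | ((uint32_t)(unsigned char)s[3] << 24);
}
```

pack_le32("test") evaluates to 1953719668, the immediate in the movl instruction.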

Resources