System V AMD64 ABI floating point varargs order

System V AMD64 ABI floating point varargs order - c

I compiled a call to printf with different kinds of args.
Here's the code + generated asm:
int main(int argc, char const *argv[]){
// 0: 55 push rbp
// 1: 48 89 e5 mov rbp,rsp
// 4: 48 83 ec 20 sub rsp,0x20
// 8: 89 7d fc mov DWORD PTR [rbp-0x4],edi
// b: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
printf("%s %f %d %f\n", "aye u gonna get some", 133.7f, 420, 69.f);
// f: f2 0f 10 05 00 00 00 00 movsd xmm0,QWORD PTR [rip+0x0] # 17 <main+0x17> 13: R_X86_64_PC32 .rodata+0x2c 69
// 17: 48 8b 05 00 00 00 00 mov rax,QWORD PTR [rip+0x0] # 1e <main+0x1e> 1a: R_X86_64_PC32 .rodata+0x34 133.7
// 1e: 66 0f 28 c8 movapd xmm1,xmm0
// 22: ba a4 01 00 00 mov edx,0x1a4 (420)
// 27: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
// 2b: f2 0f 10 45 e8 movsd xmm0,QWORD PTR [rbp-0x18]
// 30: 48 8d 35 00 00 00 00 lea rsi,[rip+0x0] # 37 <main+0x37> 33: R_X86_64_PC32 .rodata-0x4 "aye u wanna get some"
// 37: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 3e <main+0x3e> 3a: R_X86_64_PC32 .rodata+0x18 "%s %f %d %f\n"
// 3e: b8 02 00 00 00 mov eax,0x2
// 43: e8 00 00 00 00 call 48 <main+0x48> 44: R_X86_64_PLT32 printf-0x4
return 0;
// 48: b8 00 00 00 00 mov eax,0x0
// 4d: c9 leave
// 4e: c3 ret
}
Most stuff here makes sense to me.
In fact everything here makes some level of sense to me.
"%s %f %d %f\n" -> rdi
"aye u gonna get some" -> rsi
133.7 -> xmm0
420 -> rdx
69 -> xmm1
2 -> rax (to indicate there are 2 floating point arguments)
Now what I don't understand is how printf (or any other varargs function) would figure out the position of these floating point arguments among the others.
It can't be compiler magic either since it's dynamically linked.
So the only thing I can think of is maybe it's just va_arg internals, and how when you provide a type, if it's floating point, it must get from the xmms (or stack) instead of otherwise.
Is that correct? If not, how does the other side know where to get 'em? Thanks in advance.

For printf the format string indicates the type of the remaining argments.
The implementation of va_arg knows the type as it is an argument of va_arg, and the correct register can be deduced from the types.

Related

AFL-GCC compiles differently than GCC

I want to understand AFL's code instrumentation in detail.
Compiling a sample program sample.c
int main(int argc, char **argv) {
int ret = 0;
if(argc > 1) {
ret = 7;
} else {
ret = 12;
}
return ret;
}
with gcc -c -o obj/sample-gcc.o src/sample.c and afl-gcc -c -o obj/sample-afl-gcc.o src/sample.c and disassembling both with objdump -d leads to different Assembly code:
[GCC]
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 89 7d ec mov %edi,-0x14(%rbp)
b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
f: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
16: 83 7d ec 01 cmpl $0x1,-0x14(%rbp)
1a: 7e 09 jle 25 <main+0x25>
1c: c7 45 fc 07 00 00 00 movl $0x7,-0x4(%rbp)
23: eb 07 jmp 2c <main+0x2c>
25: c7 45 fc 0c 00 00 00 movl $0xc,-0x4(%rbp)
2c: 8b 45 fc mov -0x4(%rbp),%eax
2f: 5d pop %rbp
30: c3 retq
[AFL-GCC]
0000000000000000 <main>:
0: 48 8d a4 24 68 ff ff lea -0x98(%rsp),%rsp
7: ff
8: 48 89 14 24 mov %rdx,(%rsp)
c: 48 89 4c 24 08 mov %rcx,0x8(%rsp)
11: 48 89 44 24 10 mov %rax,0x10(%rsp)
16: 48 c7 c1 0e ff 00 00 mov $0xff0e,%rcx
1d: e8 00 00 00 00 callq 22 <main+0x22>
22: 48 8b 44 24 10 mov 0x10(%rsp),%rax
27: 48 8b 4c 24 08 mov 0x8(%rsp),%rcx
2c: 48 8b 14 24 mov (%rsp),%rdx
30: 48 8d a4 24 98 00 00 lea 0x98(%rsp),%rsp
37: 00
38: f3 0f 1e fa endbr64
3c: 31 c0 xor %eax,%eax
3e: 83 ff 01 cmp $0x1,%edi
41: 0f 9e c0 setle %al
44: 8d 44 80 07 lea 0x7(%rax,%rax,4),%eax
48: c3 retq
AFL (usually) adds a trampoline in front of every basic block to track executed paths [https://github.com/mirrorer/afl/blob/master/afl-as.h#L130]
-> Instruction 0x00 lea until 0x30 lea
AFL (usually) adds a main payload to the program (which I excluded due to simplicity) [https://github.com/mirrorer/afl/blob/master/afl-as.h#L381]
AFL claims to use a wrapper for GCC, so I expected everything else to be equal. Why is the if-else-condition still compiled differently?
Bonus question: If a binary without source code available should be instrumented manually without using AFL's QEMU-mode or Unicorn-mode, can this be achieved by (naively) adding the main payload and each trampoline manually to the binary file or are there better approaches?

Re: Why the compilation with gcc and with afl-gcc is different, a short look at the afl-gcc source shows that by default it modifies the compiler parameters, setting -O3 -funroll-loops (as well as defining __AFL_COMPILER and FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION).
According to the documentation (docs/env_variables.txt):
By default, the wrapper appends -O3 to optimize builds. Very rarely,
this will cause problems in programs built with -Werror, simply
because -O3 enables more thorough code analysis and can spew out
additional warnings. To disable optimizations, set AFL_DONT_OPTIMIZE.

Why is a returned stack-pointer replaced by a null-pointer by gcc?

I've created the following function in c as a demonstration/small riddle about how the stack works in c:
#include "stdio.h"
int* func(int i)
{
int j = 3;
j += i;
return &j;
}
int main()
{
int *tmp = func(4);
printf("%d\n", *tmp);
func(5);
printf("%d\n", *tmp);
}
It's obviously undefined behavior and the compiler also produces a warning about that. However unfortunately the compilation didn't quite work out. For some reason gcc replaces the returned pointer by NULL (see line 6d6).
00000000000006aa <func>:
6aa: 55 push %rbp
6ab: 48 89 e5 mov %rsp,%rbp
6ae: 48 83 ec 20 sub $0x20,%rsp
6b2: 89 7d ec mov %edi,-0x14(%rbp)
6b5: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
6bc: 00 00
6be: 48 89 45 f8 mov %rax,-0x8(%rbp)
6c2: 31 c0 xor %eax,%eax
6c4: c7 45 f4 03 00 00 00 movl $0x3,-0xc(%rbp)
6cb: 8b 55 f4 mov -0xc(%rbp),%edx
6ce: 8b 45 ec mov -0x14(%rbp),%eax
6d1: 01 d0 add %edx,%eax
6d3: 89 45 f4 mov %eax,-0xc(%rbp)
6d6: b8 00 00 00 00 mov $0x0,%eax
6db: 48 8b 4d f8 mov -0x8(%rbp),%rcx
6df: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
6e6: 00 00
6e8: 74 05 je 6ef <func+0x45>
6ea: e8 81 fe ff ff callq 570 <__stack_chk_fail#plt>
6ef: c9 leaveq
6f0: c3 retq
This is the disassembly of the binary compiled with gcc version 7.5.0 and the -O0-flag; no other flags were used. This behavior makes the entire code pointless, since it's supposed to show how the stack behaves across function-calls. Is there any way to achieve a more literal compilation of this code with a at least somewhat up-to-date version of gcc?
And just for the sake of curiosity: what's the point of changing the behavior of the code like this in the first place?

Putting the return value in a pointer variable seems to change the behavior of the compiler and it generates the assembly code that returns a pointer to stack:
int* func(int i) {
int j = 3;
j += i;
int *p = &j;
return p;
}

GCC optimizer generating error in nostdlib code

I have the following code:
void cp(void *a, const void *b, int n) {
for (int i = 0; i < n; ++i) {
((char *) a)[i] = ((const char *) b)[i];
}
}
void _start(void) {
char buf[20];
const char m[] = "123456789012345";
cp(buf, m, 15);
register int rax __asm__ ("rax") = 60; // exit
register int rdi __asm__ ("rdi") = 0; // status
__asm__ volatile (
"syscall" :: "r" (rax), "r" (rdi) : "cc", "rcx", "r11"
);
__builtin_unreachable();
}
If I compile it with gcc -nostdlib -O1 "./a.c" -o "./a", I get a functioning program, but if I compile it with -O2, I get a program that generates a segmentation fault.
This is the generated code with -O1:
0000000000001000 <cp>:
1000: b8 00 00 00 00 mov $0x0,%eax
1005: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
1009: 88 14 07 mov %dl,(%rdi,%rax,1)
100c: 48 83 c0 01 add $0x1,%rax
1010: 48 83 f8 0f cmp $0xf,%rax
1014: 75 ef jne 1005 <cp+0x5>
1016: c3 retq
0000000000001017 <_start>:
1017: 48 83 ec 30 sub $0x30,%rsp
101b: 48 b8 31 32 33 34 35 movabs $0x3837363534333231,%rax
1022: 36 37 38
1025: 48 ba 39 30 31 32 33 movabs $0x35343332313039,%rdx
102c: 34 35 00
102f: 48 89 04 24 mov %rax,(%rsp)
1033: 48 89 54 24 08 mov %rdx,0x8(%rsp)
1038: 48 89 e6 mov %rsp,%rsi
103b: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
1040: ba 0f 00 00 00 mov $0xf,%edx
1045: e8 b6 ff ff ff callq 1000 <cp>
104a: b8 3c 00 00 00 mov $0x3c,%eax
104f: bf 00 00 00 00 mov $0x0,%edi
1054: 0f 05 syscall
And this is the generated code with -O2:
0000000000001000 <cp>:
1000: 31 c0 xor %eax,%eax
1002: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1008: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
100c: 88 14 07 mov %dl,(%rdi,%rax,1)
100f: 48 83 c0 01 add $0x1,%rax
1013: 48 83 f8 0f cmp $0xf,%rax
1017: 75 ef jne 1008 <cp+0x8>
1019: c3 retq
101a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000001020 <_start>:
1020: 48 8d 44 24 d8 lea -0x28(%rsp),%rax
1025: 48 8d 54 24 c9 lea -0x37(%rsp),%rdx
102a: b9 31 00 00 00 mov $0x31,%ecx
102f: 66 0f 6f 05 c9 0f 00 movdqa 0xfc9(%rip),%xmm0 # 2000 <_start+0xfe0>
1036: 00
1037: 48 8d 70 0f lea 0xf(%rax),%rsi
103b: 0f 29 44 24 c8 movaps %xmm0,-0x38(%rsp)
1040: eb 0d jmp 104f <_start+0x2f>
1042: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1048: 0f b6 0a movzbl (%rdx),%ecx
104b: 48 83 c2 01 add $0x1,%rdx
104f: 88 08 mov %cl,(%rax)
1051: 48 83 c0 01 add $0x1,%rax
1055: 48 39 f0 cmp %rsi,%rax
1058: 75 ee jne 1048 <_start+0x28>
105a: b8 3c 00 00 00 mov $0x3c,%eax
105f: 31 ff xor %edi,%edi
1061: 0f 05 syscall
The crash happens at 103b, instruction movaps %xmm0,-0x38(%rsp).
I noticed that if m contains less than 15 characters, then the generated code is different and the crash does not happen.
What am I doing wrong?

_start is not a function. It's not called by anything, and on entry the stack is 16-byte aligned, not (as the ABI requires) 8 bytes away from 16-byte alignment.
(The ABI requires 16-byte alignment before a call, and call pushes an 8-byte return address. So on function entry RSP-8 and RSP+8 are 16-byte aligned.)
At -O2 GCC uses alignment-required 16-byte instructions to implement the copy done by cp(), copying the "123456789012345" from static storage to the stack.
At -O1, GCC just uses two mov r64, imm64 instructions to get bytes into integer regs for 8-byte stores. These don't require alignment.
Workarounds
Just write a main in C like a normal person if you want everything to work.
Or if you're trying to microbenchmark something light-weight in asm, you can use gcc -nostdlib -O3 -mincoming-stack-boundary=3 (docs) to tell GCC that functions can't assume they're called with more than 8-byte alignment. Unlike -mpreferred-stack-boundary=3, this will still align by 16 before making further calls. So if you have other non-leaf functions, you might want to just use an attribute on your hacky C _start() instead of affecting the whole file.
A worse, more hacky way would be to try putting
asm("push %rax"); at the very top of _start to modify RSP by 8, where GCC hopefully runs it before doing anything else with the stack. GNU C Basic asm statements are implicitly volatile so you don't need asm volatile, although that wouldn't hurt.
You're 100% on your own and responsible for correctly tricking the compiler by using inline asm that works for whatever optimization level you're using.
Another safer way would be write your own light-weight _start that calls main:
// at global scope:
asm(
".globl _start \n"
"_start: \n"
" mov (%rsp), %rdi \n" // argc
" lea 8(%rsp), %rsi \n" // argv
" lea 8(%rsi, %rdi, 8), %rdx \n" // envp
" call main \n"
// NOT DONE: stdio cleanup or other atexit stuff
// DO NOT USE WITH GLIBC; use libc's CRT code if you use libc
" mov %eax, %edi \n"
" mov $231, %eax \n"
" syscall" // exit_group( main() )
);
int main(int argc, char**argv, char**envp) {
... your code here
return 0;
}
If you didn't want main to return, you could just pop %rdi; mov %rsp, %rsi ; jmp main to give it argc and argv without a return address.
Then main can exit via inline asm, or by calling exit() or _exit() if you link libc. (But if you link libc, you should usually use its _start.)
See also: How Get arguments value using inline assembly in C without Glibc? for other hand-rolled _start versions; this is pretty much like #zwol's there.

analyzing i386 assembled function... line by line

hi i'm such newb in assemble and OS world. and yes this is my homework which i'm in stuck in deep dark of i386 manual. please help me or give me some hint.. here's code i have to analyze ine by line. this function is part of EOS(educational OS), doing about interrupt request in hal(hardware abstraction layer). i did "objdump -d interrupt.o" and got this assemble code. of course in i386.
00000000 <eos_ack_irq>:
0: 55 push %ebp ; push %ebp to stack to save stack before
1: b8 fe ff ff ff mov $0xfffffffe,%eax ; what is this??
6: 89 e5 mov %esp,%ebp ; couple with "push %ebp". known as prolog assembly function.
8: 8b 4d 08 mov 0x8(%ebp),%ecx ; set %ecx as value of (%ebp+8)...and what is this do??
b: 5d pop %ebp ; pop the top of stack to %ebp. i know this is for getting back to callee..
c: d3 c0 rol %cl,%eax ; ????? what is this for???
e: 21 05 00 00 00 00 and %eax,0x0 ; make %eax as 0. for what??
14: c3 ret ; return what register??
00000015 <eos_get_irq>:
15: 8b 15 00 00 00 00 mov 0x0,%edx
1b: b8 1f 00 00 00 mov $0x1f,%eax
20: 55 push %ebp
21: 89 e5 mov %esp,%ebp
23: 56 push %esi
24: 53 push %ebx
25: bb 01 00 00 00 mov $0x1,%ebx
2a: 89 de mov %ebx,%esi
2c: 88 c1 mov %al,%cl
2e: d3 e6 shl %cl,%esi
30: 85 d6 test %edx,%esi
32: 75 06 jne 3a <eos_get_irq+0x25>
34: 48 dec %eax
35: 83 f8 ff cmp $0xffffffff,%eax
38: 75 f0 jne 2a <eos_get_irq+0x15>
3a: 5b pop %ebx
3b: 5e pop %esi
3c: 5d pop %ebp
3d: c3 ret
0000003e <eos_disable_irq_line>:
3e: 55 push %ebp
3f: b8 01 00 00 00 mov $0x1,%eax
44: 89 e5 mov %esp,%ebp
46: 8b 4d 08 mov 0x8(%ebp),%ecx
49: 5d pop %ebp
4a: d3 e0 shl %cl,%eax
4c: 09 05 00 00 00 00 or %eax,0x0
52: c3 ret
00000053 <eos_enable_irq_line>:
53: 55 push %ebp
54: b8 fe ff ff ff mov $0xfffffffe,%eax
59: 89 e5 mov %esp,%ebp
5b: 8b 4d 08 mov 0x8(%ebp),%ecx
5e: 5d pop %ebp
5f: d3 c0 rol %cl,%eax
61: 21 05 00 00 00 00 and %eax,0x0
67: c3 ret
and here's pre-assembled C code
/* ack the specified irq */
void eos_ack_irq(int32u_t irq) {
/* clear the corresponding bit in _irq_pending register */
_irq_pending &= ~(0x1<<irq);
}
/* get the irq number */
int32s_t eos_get_irq() {
/* get the highest bit position in the _irq_pending register */
int i = 31;
for(; i>=0; i--) {
if (_irq_pending & (0x1<<i)) {
return i;
}
}
return -1;
}
/* mask an irq */
void eos_disable_irq_line(int32u_t irq) {
/* turn on the corresponding bit */
_irq_mask |= (0x1<<irq);
}
/* unmask an irq */
void eos_enable_irq_line(int32u_t irq) {
/* turn off the corresponding bit */
_irq_mask &= ~(0x1<<irq);
}
so these functions do ack and get and mask and unmask an interrupt request. and i'm stuck at the first one. so if you are mercy enough, would you please get me some hint or answer to analyze the first function? i'll try to get the others... and i'm very sorry for another homework.. (my TA doesn't look email)

21 05 00 00 00 00 (that and) is actually an and with a memory operand (namely and [0], eax) which the AT&T syntax obscures (but technically it does say that, note the absence of a $ sign). It makes more sense that way (the offset of 0 suggests you didn't link the code before disassembling).
mov $0xfffffffe, %eax is doing exactly what it looks like it's doing (note that 0xfffffffe is all ones except the lowest bit), and that means the function has been implemented like this:
_irq_pending &= rotate_left(0xFFFFFFFE, irq);
Saving a not operation. It has to be a rotate there instead of a shift in order to make the low bits 1 if necessary.

Why does this code prevent gcc & llvm from tail-call optimization?

I have tried the following code on gcc 4.4.5 on Linux and gcc-llvm on Mac OSX(Xcode 4.2.1) and this. The below are the source and the generated disassembly of the relevant functions. (Added: compiled with gcc -O2 main.c)
#include <stdio.h>
__attribute__((noinline))
static void g(long num)
{
long m, n;
printf("%p %ld\n", &m, n);
return g(num-1);
}
__attribute__((noinline))
static void h(long num)
{
long m, n;
printf("%ld %ld\n", m, n);
return h(num-1);
}
__attribute__((noinline))
static void f(long * num)
{
scanf("%ld", num);
g(*num);
h(*num);
return f(num);
}
int main(void)
{
printf("int:%lu long:%lu unsigned:%lu\n", sizeof(int), sizeof(long), sizeof(unsigned));
long num;
f(&num);
return 0;
}
08048430 <g>:
8048430: 55 push %ebp
8048431: 89 e5 mov %esp,%ebp
8048433: 53 push %ebx
8048434: 89 c3 mov %eax,%ebx
8048436: 83 ec 24 sub $0x24,%esp
8048439: 8d 45 f4 lea -0xc(%ebp),%eax
804843c: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048443: 00
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 d0 85 04 08 movl $0x80485d0,(%esp)
804844f: e8 f0 fe ff ff call 8048344 <printf#plt>
8048454: 8d 43 ff lea -0x1(%ebx),%eax
8048457: e8 d4 ff ff ff call 8048430 <g>
804845c: 83 c4 24 add $0x24,%esp
804845f: 5b pop %ebx
8048460: 5d pop %ebp
8048461: c3 ret
8048462: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
8048469: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
08048470 <h>:
8048470: 55 push %ebp
8048471: 89 e5 mov %esp,%ebp
8048473: 83 ec 18 sub $0x18,%esp
8048476: 66 90 xchg %ax,%ax
8048478: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
804847f: 00
8048480: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
8048487: 00
8048488: c7 04 24 d8 85 04 08 movl $0x80485d8,(%esp)
804848f: e8 b0 fe ff ff call 8048344 <printf#plt>
8048494: eb e2 jmp 8048478 <h+0x8>
8048496: 8d 76 00 lea 0x0(%esi),%esi
8048499: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
080484a0 <f>:
80484a0: 55 push %ebp
80484a1: 89 e5 mov %esp,%ebp
80484a3: 53 push %ebx
80484a4: 89 c3 mov %eax,%ebx
80484a6: 83 ec 14 sub $0x14,%esp
80484a9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
80484b0: 89 5c 24 04 mov %ebx,0x4(%esp)
80484b4: c7 04 24 e1 85 04 08 movl $0x80485e1,(%esp)
80484bb: e8 94 fe ff ff call 8048354 <__isoc99_scanf#plt>
80484c0: 8b 03 mov (%ebx),%eax
80484c2: e8 69 ff ff ff call 8048430 <g>
80484c7: 8b 03 mov (%ebx),%eax
80484c9: e8 a2 ff ff ff call 8048470 <h>
80484ce: eb e0 jmp 80484b0 <f+0x10>
We can see that g() and h() are mostly identical except the & (address of) operator beside the argument m of printf()(and the irrelevant %ld and %p).
However, h() is tail-call optimized and g() is not. Why?

In g(), you're taking the address of a local variable and passing it to a function. A "sufficiently smart compiler" should realize that printf does not store that pointer. Instead, gcc and llvm assume that printf might store the pointer somewhere, so the call frame containing m might need to be "live" further down in the recursion. Therefore, no TCO.

It's the & that does it. It tells the compiler that m should be stored on the stack. Even though it is passed to printf, the compiler has to assume that it might be accessed by somebody else and thus must the cleaned from the stack after the call to g.
In this particular case, as printf is known by the compiler (and it knows that it does not save pointers), it could probably be taught to perform this optimization.
For more info on this, look up 'escape anlysis'.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

System V AMD64 ABI floating point varargs order - c

For printf the format string indicates the type of the remaining argments. The implementation of va_arg knows the type as it is an argument of va_arg, and the correct register can be deduced from the types.

Related

AFL-GCC compiles differently than GCC

Why is a returned stack-pointer replaced by a null-pointer by gcc?

GCC optimizer generating error in nostdlib code

analyzing i386 assembled function... line by line

Why does this code prevent gcc & llvm from tail-call optimization?

Categories

Resources