Why use JLE instead of JL?

Why use JLE instead of JL? - c

I wrote the following program:
#include <stdio.h>
int main()
{
int i = 0;
for (; i < 4; i++)
{
printf("%i",i);
}
return 0;
}
I compiled it using gcc test.c -o test.o, then disassembled it using objdump -d -Mintel test.o. The assembly code I got (at least the relevant part) is the following:
0804840c <main>:
804840c: 55 push ebp
804840d: 89 e5 mov ebp,esp
804840f: 83 e4 f0 and esp,0xfffffff0
8048412: 83 ec 20 sub esp,0x20
8048415: c7 44 24 1c 00 00 00 mov DWORD PTR [esp+0x1c],0x0
804841c: 00
804841d: eb 19 jmp 8048438 <main+0x2c>
804841f: 8b 44 24 1c mov eax,DWORD PTR [esp+0x1c]
8048423: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
8048427: c7 04 24 e8 84 04 08 mov DWORD PTR [esp],0x80484e8
804842e: e8 bd fe ff ff call 80482f0 <printf#plt>
8048433: 83 44 24 1c 01 add DWORD PTR [esp+0x1c],0x1
8048438: 83 7c 24 1c 03 cmp DWORD PTR [esp+0x1c],0x3
804843d: 7e e0 jle 804841f <main+0x13>
804843f: b8 00 00 00 00 mov eax,0x0
8048444: c9 leave
8048445: c3 ret
I noticed that, although my compare operation was i < 4, the assembly code is (after disassembly) i <= 3. Why does that happen? Why would it use JLE instead of JL?

Loops that count upwards, and have constant limit, are very common. The compiler has two options to implement the check for loop termination - JLE and JL. While the two ways seem absolutely equivalent, consider the following.
As you can see in the disassembly listing, the constant (3 in your case) is encoded in 1 byte. If your loop counted to 256 instead of 4, it would be impossible to use such an efficient encoding for the CMP instruction, and the compiler would have to use a "larger" encoding. So JLE provides a marginal improvement in code density (which is ultimately good for performance because of caching).

It would JLE because it shifted the value by one.
if (x < 4) {
// ran when x is 3, 2, 1, 0, -1, ... MIN_INT.
}
is logically equivalent to
if (x <= 3) {
// ran when x is 3, 2, 1, 0, -1, ... MIN_INT.
}
Why the compiler chose one internal representation over another is often a matter of optimization, but really it is hard to know if optimization was the true driver. In any case, functional equivalents like this is the reason why back-mapping isn't 100% accurate. There are many ways to write a condition that has the same effect over the same inputs.

Related

Why is a returned stack-pointer replaced by a null-pointer by gcc?

I've created the following function in c as a demonstration/small riddle about how the stack works in c:
#include "stdio.h"
int* func(int i)
{
int j = 3;
j += i;
return &j;
}
int main()
{
int *tmp = func(4);
printf("%d\n", *tmp);
func(5);
printf("%d\n", *tmp);
}
It's obviously undefined behavior and the compiler also produces a warning about that. However unfortunately the compilation didn't quite work out. For some reason gcc replaces the returned pointer by NULL (see line 6d6).
00000000000006aa <func>:
6aa: 55 push %rbp
6ab: 48 89 e5 mov %rsp,%rbp
6ae: 48 83 ec 20 sub $0x20,%rsp
6b2: 89 7d ec mov %edi,-0x14(%rbp)
6b5: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
6bc: 00 00
6be: 48 89 45 f8 mov %rax,-0x8(%rbp)
6c2: 31 c0 xor %eax,%eax
6c4: c7 45 f4 03 00 00 00 movl $0x3,-0xc(%rbp)
6cb: 8b 55 f4 mov -0xc(%rbp),%edx
6ce: 8b 45 ec mov -0x14(%rbp),%eax
6d1: 01 d0 add %edx,%eax
6d3: 89 45 f4 mov %eax,-0xc(%rbp)
6d6: b8 00 00 00 00 mov $0x0,%eax
6db: 48 8b 4d f8 mov -0x8(%rbp),%rcx
6df: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
6e6: 00 00
6e8: 74 05 je 6ef <func+0x45>
6ea: e8 81 fe ff ff callq 570 <__stack_chk_fail#plt>
6ef: c9 leaveq
6f0: c3 retq
This is the disassembly of the binary compiled with gcc version 7.5.0 and the -O0-flag; no other flags were used. This behavior makes the entire code pointless, since it's supposed to show how the stack behaves across function-calls. Is there any way to achieve a more literal compilation of this code with a at least somewhat up-to-date version of gcc?
And just for the sake of curiosity: what's the point of changing the behavior of the code like this in the first place?

Putting the return value in a pointer variable seems to change the behavior of the compiler and it generates the assembly code that returns a pointer to stack:
int* func(int i) {
int j = 3;
j += i;
int *p = &j;
return p;
}

Understanding how assembly language passes arguments from one method to another

I've, for a few hours, been trying to enlarge my understanding of Assembly Language, by trying to read and understand the instructions of a very simple program I wrote in C to initiate myself to how arguments were handled in ASM.
#include <stdio.h>
int say_hello();
int main(void) {
printf("say_hello() -> %d\n", say_hello(10, 20, 30, 40, 50, 60, 70, 80, 90, 100));
}
int say_hello(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j) {
printf("a:b:c:d:e:f:g:h:i:j -> %d:%d:%d:%d:%d:%d:%d:%d:%d:%d\n", a, b, c, d, e, f, g, h, i, j);
return 1000;
}
The program is as I said, very basic and contains two functions, the main and another one called say_hello which takes 10 arguments, from a to j and print each one of them in a printf call. I've tried doing the same process (So trying to understand the instructions and what's happening), with the same program and less arguments, I think I was able to understand most of it, but then I was wondering, "ok but what's happening if I have so many arguments, there isn't any more register available to store the value in"
So I went to look for how many registers were available and usable in my case, and I found out from this website that "only" (not sure, correct me if I'm wrong) the following registers could be used in my case to store argument values in them edi, esi, r8d, r9d, r10d, r11d, edx, ecx, which is 8, so I went to modify my C program and I added a few more arguments, so that I reach the 8 limit, I even added one more, I don't really know why, let's say just in case.
So when I compiled my program using gcc with no optimization related option whatsoever, I was expecting the main() function to push the values that were left after all the 8 registers have been used, but I wasn't expecting anything from the say_hello() method, that's pretty much why I tried this out in the first place.
So I went to compile my program, then disassembled it using the objdump command (More specifically, this is the full command I used: objdump -d -M intel helloworld) and I started looking for my main method, which was doing pretty much as I expected
000000000000064a <main>:
64a: 55 push rbp
64b: 48 89 e5 mov rbp,rsp
64e: 6a 64 push 0x64
650: 6a 5a push 0x5a
652: 6a 50 push 0x50
654: 6a 46 push 0x46
656: 41 b9 3c 00 00 00 mov r9d,0x3c
65c: 41 b8 32 00 00 00 mov r8d,0x32
662: b9 28 00 00 00 mov ecx,0x28
667: ba 1e 00 00 00 mov edx,0x1e
66c: be 14 00 00 00 mov esi,0x14
671: bf 0a 00 00 00 mov edi,0xa
676: b8 00 00 00 00 mov eax,0x0
67b: e8 1e 00 00 00 call 69e <say_hello>
680: 48 83 c4 20 add rsp,0x20
684: 89 c6 mov esi,eax
686: 48 8d 3d 0b 01 00 00 lea rdi,[rip+0x10b] # 798 <_IO_stdin_used+0x8>
68d: b8 00 00 00 00 mov eax,0x0
692: e8 89 fe ff ff call 520 <printf#plt>
697: b8 00 00 00 00 mov eax,0x0
69c: c9 leave
69d: c3 ret
So it, as I expected pushed the values that were left after all the registers had been used into the stack, and then just did the usual work to pass values from one method to another. But then I went to look for the say_hello method, and it got me really confused.
000000000000069e <say_hello>:
69e: 55 push rbp
69f: 48 89 e5 mov rbp,rsp
6a2: 48 83 ec 20 sub rsp,0x20
6a6: 89 7d fc mov DWORD PTR [rbp-0x4],edi
6a9: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
6ac: 89 55 f4 mov DWORD PTR [rbp-0xc],edx
6af: 89 4d f0 mov DWORD PTR [rbp-0x10],ecx
6b2: 44 89 45 ec mov DWORD PTR [rbp-0x14],r8d
6b6: 44 89 4d e8 mov DWORD PTR [rbp-0x18],r9d
6ba: 44 8b 45 ec mov r8d,DWORD PTR [rbp-0x14]
6be: 8b 7d f0 mov edi,DWORD PTR [rbp-0x10]
6c1: 8b 4d f4 mov ecx,DWORD PTR [rbp-0xc]
6c4: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
6c7: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
6ca: 48 83 ec 08 sub rsp,0x8
6ce: 8b 75 28 mov esi,DWORD PTR [rbp+0x28]
6d1: 56 push rsi
6d2: 8b 75 20 mov esi,DWORD PTR [rbp+0x20]
6d5: 56 push rsi
6d6: 8b 75 18 mov esi,DWORD PTR [rbp+0x18]
6d9: 56 push rsi
6da: 8b 75 10 mov esi,DWORD PTR [rbp+0x10]
6dd: 56 push rsi
6de: 8b 75 e8 mov esi,DWORD PTR [rbp-0x18]
6e1: 56 push rsi
6e2: 45 89 c1 mov r9d,r8d
6e5: 41 89 f8 mov r8d,edi
6e8: 89 c6 mov esi,eax
6ea: 48 8d 3d bf 00 00 00 lea rdi,[rip+0xbf] # 7b0 <_IO_stdin_used+0x20>
6f1: b8 00 00 00 00 mov eax,0x0
6f6: e8 25 fe ff ff call 520 <printf#plt>
6fb: 48 83 c4 30 add rsp,0x30
6ff: b8 e8 03 00 00 mov eax,0x3e8
704: c9 leave
705: c3 ret
706: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
70d: 00 00 00
I'm really sorry in advance, I'm not exactly sure I really understand well what the square brackets do, but from what I've read and understand it's a way to "point" to the address containing the value I want (please correct me if I'm wrong), so for example mov DWORD PTR [rbp-0x4],edi moves the value in edi to the value at the address rsp-0x4, right?
I'm also not actually not sure why this process is required, can't the say_hello method just read edi for example and that's it? Why does the program have to move it into [rbp-0x4] and then re-reading it back from [rbp-0x4] to eax ?
So the program just goes on and reads every value it needs and put them into an available register, and when it reaches the point when there's no register left, it just starts moving all of them into esi and then pushing them onto the stack, then repeating the process until all the 10 arguments have been stored somewhere.
So that makes sense, I was satisfied and then just went to double check if I really had got it well, so I started reading from bottom to top, starting from 0x6ea to 0x6e2 so the sample I'm working on is
6e2: 45 89 c1 mov r9d,r8d
6e5: 41 89 f8 mov r8d,edi
6e8: 89 c6 mov esi,eax
6ea: 48 8d 3d bf 00 00 00 lea rdi,[rip+0xbf] # 7b0 <_IO_stdin_used+0x20>
So just like on all my previous tests, I was expecting the arguments to go in "reverse" like the first argument is the last instruction executed, and the last one the first instruction executed, so I started double checking every field.
So the first one, rdi was [rip+0x10b] which I thought for sure was pointing to my string.
So then I moved to 0x6e8, which moves eax which is currently equal to the value stored in [rbp-0x4], which is equal to edi as stated at 0x6a6, and edi is equal to 0xa (10) as stated on 0x671, so my first argument is my string, and the second one is 10, which is exactly what I expected.
But then when I jumped on the instruction executed right before 0x6e8, so 0x6e5 I was expecting it to be 20, so I did the same process. edi is moved to r8d and is currently equal to the value stored in [rbp-0x10] which is equal to ecx which is equal to, as stated at 0x662.. 40? What the heck? I'm confused, why would it be 40? Then I tried looking up the instruction right above that one, and found 50, and did the same for the next one, and again I found 60!! Why? Is the way I get those values wrong? Am I missing something in the instructions? Or did I just assume something by looking at my previous programs (which all had way less arguments, and were all in "reverse" like I said earlier) that I should not have?
I'm sorry if this is a dumb post, I'm very new to ASM (few hours of experience!) and just trying to get my mind cleared on that one, as I really can't figure it out alone. I'm also sorry if this post is too long, I was trying to include a lot of informations so that what I'm trying to do is clear, the result I get is clear, and what my problem is is clear aswell. Thanks a lot for reading and even a bigger thanks to anyone who will help!

Compiler optimization of strcmp I don't understand, against a constant string

In order to improve my binary exploitation skills, and deepen my understanding in low level environments I tried solving challenges in pwnable.kr, The first challenge- called fd has the following C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char buf[32];
int main(int argc, char* argv[], char* envp[]){
if(argc<2){
printf("pass argv[1] a number\n");
return 0;
}
int fd = atoi( argv[1] ) - 0x1234;
int len = 0;
len = read(fd, buf, 32);
if(!strcmp("LETMEWIN\n", buf)){
printf("good job :)\n");
system("/bin/cat flag");
exit(0);
}
printf("learn about Linux file IO\n");
return 0;
}
I used objdump -S -g ./fd in order to disassemble it, and I got confused, because insteaad of calling a strcmp function. It just compared the strings without calling it.
This is the assembly code im talking about:
80484c6: e8 05 ff ff ff call 80483d0 <atoi#plt>
80484cb: 2d 34 12 00 00 sub eax,0x1234
; eax = atoi( argv[1] ) - 0x1234;
; initialize fd=eax
80484d0: 89 44 24 18 mov DWORD PTR [esp+0x18],eax
; initialize len
80484d4: c7 44 24 1c 00 00 00 mov DWORD PTR [esp+0x1c],0x0
; Set up read variables
80484db: 00
80484dc: c7 44 24 08 20 00 00 mov DWORD PTR [esp+0x8],0x20 ; read 32 bytes
80484e3: 00
80484e4: c7 44 24 04 60 a0 04 mov DWORD PTR [esp+0x4],0x804a060 ; buf variable address
80484eb: 08
80484ec: 8b 44 24 18 mov eax,DWORD PTR [esp+0x18]
80484f0: 89 04 24 mov DWORD PTR [esp],eax ; fd variable
80484f3: e8 78 fe ff ff call 8048370 <read#plt>
80484f8: 89 44 24 1c mov DWORD PTR [esp+0x1c],eax
80484fc: ba 46 86 04 08 mov edx,0x8048646 ; "LETMEWIN\n" address
8048501: b8 60 a0 04 08 mov eax,0x804a060 ; buf address
8048506: b9 0a 00 00 00 mov ecx,0xa ; what is this?
; strcmp starts here?
804850b: 89 d6 mov esi,edx
804850d: 89 c7 mov edi,eax
804850f: f3 a6 repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi] ; <------- ?STRCMP?
The things I don't understand are:
Where is the strcmp call? And why is it like that?
What does this 8048506: b9 0a 00 00 00 mov ecx,0xa do?

The compiler inlined strcmp against a known-length string using repe cmpsb which implements memcmp.
It loads into register esi the address of the constant literal string "LETMEWIN\n". Note that the length of this string is 10 (with the '\0' at the end).
Then it loads the address of buf into edi register, then it calls for the x86 instruction:
repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi]
repz repeats the following instruction as long as zero flag is set and up to the number of times stored in ecx (this explains you the mov ecx,0xa ; what is this?).
The repeated instruction is cmps which compares strings (byte by byte) and automatically increases the pointers by 1 on each iteration.
When the compared bytes are equal, it sets the zero flag.
So per your questions:
Where is the strcmp call? And why is it like that?
No explicit call for strcmp, it is optimized out and replaced with inlined code:
80484fc: ba 46 86 04 08 mov edx,0x8048646 ; "LETMEWIN\n" address
8048501: b8 60 a0 04 08 mov eax,0x804a060 ; buf address
8048506: b9 0a 00 00 00 mov ecx,0xa ; number of bytes to compare
804850b: 89 d6 mov esi,edx
804850d: 89 c7 mov edi,eax
804850f: f3 a6 repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi] ;
Actually it misses the part where it should check if the returned value of strcmp is zero or not. I think you just didn't copy it here. There probably should be something like je ... / jz ... / jne ... / jnz ... right after the repz ... line.
What does this 8048506: b9 0a 00 00 00 mov ecx,0xa do?
It sets the maximum number of bytes to compare.

GCC optimizer generating error in nostdlib code

I have the following code:
void cp(void *a, const void *b, int n) {
for (int i = 0; i < n; ++i) {
((char *) a)[i] = ((const char *) b)[i];
}
}
void _start(void) {
char buf[20];
const char m[] = "123456789012345";
cp(buf, m, 15);
register int rax __asm__ ("rax") = 60; // exit
register int rdi __asm__ ("rdi") = 0; // status
__asm__ volatile (
"syscall" :: "r" (rax), "r" (rdi) : "cc", "rcx", "r11"
);
__builtin_unreachable();
}
If I compile it with gcc -nostdlib -O1 "./a.c" -o "./a", I get a functioning program, but if I compile it with -O2, I get a program that generates a segmentation fault.
This is the generated code with -O1:
0000000000001000 <cp>:
1000: b8 00 00 00 00 mov $0x0,%eax
1005: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
1009: 88 14 07 mov %dl,(%rdi,%rax,1)
100c: 48 83 c0 01 add $0x1,%rax
1010: 48 83 f8 0f cmp $0xf,%rax
1014: 75 ef jne 1005 <cp+0x5>
1016: c3 retq
0000000000001017 <_start>:
1017: 48 83 ec 30 sub $0x30,%rsp
101b: 48 b8 31 32 33 34 35 movabs $0x3837363534333231,%rax
1022: 36 37 38
1025: 48 ba 39 30 31 32 33 movabs $0x35343332313039,%rdx
102c: 34 35 00
102f: 48 89 04 24 mov %rax,(%rsp)
1033: 48 89 54 24 08 mov %rdx,0x8(%rsp)
1038: 48 89 e6 mov %rsp,%rsi
103b: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
1040: ba 0f 00 00 00 mov $0xf,%edx
1045: e8 b6 ff ff ff callq 1000 <cp>
104a: b8 3c 00 00 00 mov $0x3c,%eax
104f: bf 00 00 00 00 mov $0x0,%edi
1054: 0f 05 syscall
And this is the generated code with -O2:
0000000000001000 <cp>:
1000: 31 c0 xor %eax,%eax
1002: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1008: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
100c: 88 14 07 mov %dl,(%rdi,%rax,1)
100f: 48 83 c0 01 add $0x1,%rax
1013: 48 83 f8 0f cmp $0xf,%rax
1017: 75 ef jne 1008 <cp+0x8>
1019: c3 retq
101a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000001020 <_start>:
1020: 48 8d 44 24 d8 lea -0x28(%rsp),%rax
1025: 48 8d 54 24 c9 lea -0x37(%rsp),%rdx
102a: b9 31 00 00 00 mov $0x31,%ecx
102f: 66 0f 6f 05 c9 0f 00 movdqa 0xfc9(%rip),%xmm0 # 2000 <_start+0xfe0>
1036: 00
1037: 48 8d 70 0f lea 0xf(%rax),%rsi
103b: 0f 29 44 24 c8 movaps %xmm0,-0x38(%rsp)
1040: eb 0d jmp 104f <_start+0x2f>
1042: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1048: 0f b6 0a movzbl (%rdx),%ecx
104b: 48 83 c2 01 add $0x1,%rdx
104f: 88 08 mov %cl,(%rax)
1051: 48 83 c0 01 add $0x1,%rax
1055: 48 39 f0 cmp %rsi,%rax
1058: 75 ee jne 1048 <_start+0x28>
105a: b8 3c 00 00 00 mov $0x3c,%eax
105f: 31 ff xor %edi,%edi
1061: 0f 05 syscall
The crash happens at 103b, instruction movaps %xmm0,-0x38(%rsp).
I noticed that if m contains less than 15 characters, then the generated code is different and the crash does not happen.
What am I doing wrong?

_start is not a function. It's not called by anything, and on entry the stack is 16-byte aligned, not (as the ABI requires) 8 bytes away from 16-byte alignment.
(The ABI requires 16-byte alignment before a call, and call pushes an 8-byte return address. So on function entry RSP-8 and RSP+8 are 16-byte aligned.)
At -O2 GCC uses alignment-required 16-byte instructions to implement the copy done by cp(), copying the "123456789012345" from static storage to the stack.
At -O1, GCC just uses two mov r64, imm64 instructions to get bytes into integer regs for 8-byte stores. These don't require alignment.
Workarounds
Just write a main in C like a normal person if you want everything to work.
Or if you're trying to microbenchmark something light-weight in asm, you can use gcc -nostdlib -O3 -mincoming-stack-boundary=3 (docs) to tell GCC that functions can't assume they're called with more than 8-byte alignment. Unlike -mpreferred-stack-boundary=3, this will still align by 16 before making further calls. So if you have other non-leaf functions, you might want to just use an attribute on your hacky C _start() instead of affecting the whole file.
A worse, more hacky way would be to try putting
asm("push %rax"); at the very top of _start to modify RSP by 8, where GCC hopefully runs it before doing anything else with the stack. GNU C Basic asm statements are implicitly volatile so you don't need asm volatile, although that wouldn't hurt.
You're 100% on your own and responsible for correctly tricking the compiler by using inline asm that works for whatever optimization level you're using.
Another safer way would be write your own light-weight _start that calls main:
// at global scope:
asm(
".globl _start \n"
"_start: \n"
" mov (%rsp), %rdi \n" // argc
" lea 8(%rsp), %rsi \n" // argv
" lea 8(%rsi, %rdi, 8), %rdx \n" // envp
" call main \n"
// NOT DONE: stdio cleanup or other atexit stuff
// DO NOT USE WITH GLIBC; use libc's CRT code if you use libc
" mov %eax, %edi \n"
" mov $231, %eax \n"
" syscall" // exit_group( main() )
);
int main(int argc, char**argv, char**envp) {
... your code here
return 0;
}
If you didn't want main to return, you could just pop %rdi; mov %rsp, %rsi ; jmp main to give it argc and argv without a return address.
Then main can exit via inline asm, or by calling exit() or _exit() if you link libc. (But if you link libc, you should usually use its _start.)
See also: How Get arguments value using inline assembly in C without Glibc? for other hand-rolled _start versions; this is pretty much like #zwol's there.

Assembly - why is %rsp decremented by so much, and why are arguments stored at the top of the stack?

Assembly newbie here... I wrote the following simple C program:
void fun(int x, int* y)
{
char arr[4];
int* sp;
sp = y;
}
int main()
{
int i = 4;
fun(i, &i);
return 0;
}
I compiled it with gcc and ran objdump with -S, but the Assembly code output is confusing me:
000000000040055d <fun>:
void fun(int x, int* y)
{
40055d: 55 push %rbp
40055e: 48 89 e5 mov %rsp,%rbp
400561: 48 83 ec 30 sub $0x30,%rsp
400565: 89 7d dc mov %edi,-0x24(%rbp)
400568: 48 89 75 d0 mov %rsi,-0x30(%rbp)
40056c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
400573: 00 00
400575: 48 89 45 f8 mov %rax,-0x8(%rbp)
400579: 31 c0 xor %eax,%eax
char arr[4];
int* sp;
sp = y;
40057b: 48 8b 45 d0 mov -0x30(%rbp),%rax
40057f: 48 89 45 e8 mov %rax,-0x18(%rbp)
}
400583: 48 8b 45 f8 mov -0x8(%rbp),%rax
400587: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
40058e: 00 00
400590: 74 05 je 400597 <fun+0x3a>
400592: e8 a9 fe ff ff callq 400440 <__stack_chk_fail#plt>
400597: c9 leaveq
400598: c3 retq
0000000000400599 <main>:
int main()
{
400599: 55 push %rbp
40059a: 48 89 e5 mov %rsp,%rbp
40059d: 48 83 ec 10 sub $0x10,%rsp
int i = 4;
4005a1: c7 45 fc 04 00 00 00 movl $0x4,-0x4(%rbp)
fun(i, &i);
4005a8: 8b 45 fc mov -0x4(%rbp),%eax
4005ab: 48 8d 55 fc lea -0x4(%rbp),%rdx
4005af: 48 89 d6 mov %rdx,%rsi
4005b2: 89 c7 mov %eax,%edi
4005b4: e8 a4 ff ff ff callq 40055d <fun>
return 0;
4005b9: b8 00 00 00 00 mov $0x0,%eax
}
4005be: c9 leaveq
4005bf: c3 retq
First, in the line:
400561: 48 83 ec 30 sub $0x30,%rsp
Why is the stack pointer decremented so much in the call to 'fun' (48 bytes)? I assume it has to do with alignment issues, but I cannot visualize why it would need so much space (I only count 12 bytes for local variables (assuming 8 byte pointers))?
Second, I thought that in x86_64, the arguments to a function are either stored in specific registers, or if there are a lot of them, just 'above' (with a downward growing stack) the base pointer, %rbp. Like in the picture at http://en.wikipedia.org/wiki/Call_stack#Structure except 'upside-down'.
But the lines:
400565: 89 7d dc mov %edi,-0x24(%rbp)
400568: 48 89 75 d0 mov %rsi,-0x30(%rbp)
suggest to me that they are being stored way down from the base of the stack (%rsi and %edi are where main put the arguments, right before calling 'fun', and 0x30 down from %rbp is exactly where the stack pointer is pointing...). And when I try to do stuff with them , like assigning their values to local variables, it grabs them from those locations near the head of the stack:
sp = y;
40057b: 48 8b 45 d0 mov -0x30(%rbp),%rax
40057f: 48 89 45 e8 mov %rax,-0x18(%rbp)
... what is going on here?! I would expect them to grab the arguments from either the registers they were stored in, or just above the base pointer, where I thought they are 'supposed to be', according to every basic tutorial I read. Every answer and post I found on here related to stack frame questions confirms my understanding of what stack frames "should" look like, so why is my Assembly output so darn weird?

Because that stuff is a hideously simplified version of what really goes on. It's like wondering why Newtonian mechanics doesn't model the movement of the planets down to the millimeter. Compilers need stack space for all sorts of things. For example, saving callee-saved registers.
Also, the fundamental fact is that debug-mode compilations contain all sorts of debugging and checking machinery. The compiler outputs all sorts of code that checks that your code is correct, for example the call to __stack_chk_fail.
There are only two ways to understand the output of a given compiler. The first is to implement the compiler, or be otherwise very familiar with the implementation. The second is to accept that whatever you understand is a gross simplification. Pick one.

Because you're compiling without optimization, the compiler does lots of extra stuff to maybe make things easier to debug, which use lots of extra space.
it does not attempt to compress the stack frame to reuse memory for anything, or get rid of any unused things.
it redundantly copies the arguments into the stack frame (which requires still more memory)
it copies a 'canary' on to the stack to guard against stack smashing buffer overflows (even though they can't happen in this code).
Try turning on optimization, and you'll see more real code.

This is 64 bit code. 0x30 of stack space corresponds to 6 slots on the stack. You have what appears to be:
2 slots for function arguments (which happen also to be passed in registers)
2 slots for local variables
1 slot for saving the AX register
1 slot looks like a stack guard, probably related to DEBUG mode.
Best thing is to experiment rather than ask questions. Try compiling in different modes (DEBUG, optimisation, etc), and with different numbers and types of arguments and variables. Sometimes asking other people is just too easy -- you learn better by doing your own experiments.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight