What does disassembly look like on machines with memory larger than 4G?

This is what it looks like on my laptop with less than 4G:
0x004012f1 <main+0>: push %ebp
0x004012f2 <main+1>: mov %esp,%ebp
0x004012f4 <main+3>: sub $0x18,%esp
0x004012f7 <main+6>: and $0xfffffff0,%esp
Can someone using RAM larger than 4G paste a dump?
I think it should be no longer like 0x004012f7 as its capacity is only 2^32=4G

Here's a sample from my 64-bit OS. The addresses are just twice as long, as you'd expect: twice the address length lets you address 2^(2n) bytes instead of 2^n:
000000007729EE15 ldmxcsr dword ptr [rcx+34h]
000000007729EE19 fldcw word ptr [rcx+100h]
000000007729EE1F mov rsp,qword ptr [rcx+98h]
000000007729EE26 mov rcx,qword ptr [rcx+0F8h]

On a 32-bit OS, the addressable space will indeed only be 2^32 bytes = 4 GB.
On a 64-bit OS (assuming a 64-bit application), it will be up to 2^64 bytes, which is much, much larger.
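As a rough illustration of why the 64-bit dump shows 16 hex digits per address instead of 8, the sketch below computes how many hex digits an address of a given width needs and reports the pointer width of whatever build it is compiled in. The helper names hex_digits_for_bits and pointer_bits are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of hex digits needed to print an address of the given width.
   Each hex digit encodes 4 bits. (Hypothetical helper for illustration.) */
unsigned hex_digits_for_bits(unsigned bits)
{
    assert(bits % 4 == 0);
    return bits / 4;
}

/* Width of a data pointer in the current build:
   typically 32 when compiled with -m32, 64 for x86-64. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Note that the width of an address is a property of the program and OS, not of the installed RAM: a 32-bit program still prints 8-digit addresses on a machine with more than 4G of memory.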

Related

Difference in x86-32 and x64 Assembly stack allocation for a fixed-size buffer with unoptimized C (GCC)

I am doing some basic disassembly and have noticed that the buffer is being given additional space for some reason, although the tutorial I am following uses the same code and is only given the correct 500 chars in length. Why is this?
My code:
#include <stdio.h>
#include <string.h>
int main (int argc, char** argv){
char buffer[500];
strcpy(buffer, argv[1]);
return 0;
}
Compiled with GCC, the disassembled code is:
0x0000000000001139 <+0>: push %rbp
0x000000000000113a <+1>: mov %rsp,%rbp
0x000000000000113d <+4>: sub $0x210,%rsp
0x0000000000001144 <+11>: mov %edi,-0x204(%rbp)
0x000000000000114a <+17>: mov %rsi,-0x210(%rbp)
0x0000000000001151 <+24>: mov -0x210(%rbp),%rax
0x0000000000001158 <+31>: add $0x8,%rax
0x000000000000115c <+35>: mov (%rax),%rdx
0x000000000000115f <+38>: lea -0x200(%rbp),%rax
0x0000000000001166 <+45>: mov %rdx,%rsi
0x0000000000001169 <+48>: mov %rax,%rdi
0x000000000000116c <+51>: call 0x1030 <strcpy@plt>
0x0000000000001171 <+56>: mov $0x0,%eax
0x0000000000001176 <+61>: leave
0x0000000000001177 <+62>: ret
However, this video https://www.youtube.com/watch?v=1S0aBV-Waeo clearly only has 500 bytes assigned
Why is this the case? The only difference I can see is that one is 32-bit and the other (mine) is x86-64.
500 is not a multiple of 16.
The x86-64 ABI (application binary interface) requires the stack pointer to be a multiple of 16 whenever a call instruction is about to happen. (Since call pushes an 8-byte return address, this means the stack pointer is always congruent to 8, mod 16, when control reaches the first instruction of a called function.) For the code shown, it is convenient for the compiler to achieve this requirement by increasing the value it uses in the sub instruction, making it be a multiple of 16.
The x86-32 ABI did not make this requirement, so there was no reason for the compiler used in the video to increase the size of the stack frame.
Note that you appear to have compiled your code without optimization. I get this at -O2:
0x0000000000000000 <+0>: sub $0x208,%rsp
0x0000000000000007 <+7>: mov 0x8(%rsi),%rsi
0x000000000000000b <+11>: mov %rsp,%rdi
0x000000000000000e <+14>: call <strcpy@PLT>
0x0000000000000013 <+19>: xor %eax,%eax
0x0000000000000015 <+21>: add $0x208,%rsp
0x000000000000001c <+28>: ret
The stack adjustment is still somewhat larger than the size of the array, but not as big as what you had, and no longer a multiple of 16; the difference is that with optimization on, the frame pointer is eliminated, so %rbp does not need to be saved and restored, and so the stack pointer is not a multiple of 16 at the point of the sub instruction.
(Incidentally, there is no requirement anywhere for a stack frame to be as small as possible. "Quality of implementation" dictates that it should be as small as possible, but for various reasons it's quite common for the compiler to miss that target. In my optimized code dump, I don't see any reason why the immediate operand to sub and add couldn't have been 0x1f8 (504).)
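The alignment arithmetic in this answer can be checked with a small sketch. It assumes the x86-64 SysV rule described above (rsp is 8 bytes short of a 16-byte boundary on function entry) and uses a hypothetical helper, call_site_aligned, that is not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* On function entry, rsp is 8 bytes short of a 16-byte boundary because
   the caller's call instruction pushed an 8-byte return address.
   Returns true if, after pushing `pushed` more bytes (e.g. 8 for
   push %rbp) and subtracting `frame` bytes, rsp is a multiple of 16 again. */
bool call_site_aligned(size_t pushed, size_t frame)
{
    const size_t return_address = 8;
    assert(pushed % 8 == 0);
    return (return_address + pushed + frame) % 16 == 0;
}
```

With these assumptions, call_site_aligned(8, 0x210) holds for the unoptimized listing (push %rbp, then sub $0x210), and call_site_aligned(0, 0x208) holds for the -O2 listing. call_site_aligned(0, 0x1f8) also holds, which is consistent with the remark that 504 would have been enough, while a frame of exactly 500 bytes fails the check in both cases.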

MOVABS opcode in the assembly code

While debugging one of the assembly code examples, I found the following piece of information:
(gdb) x /10i 0x4005c4
0x4005c4: push %rbp
0x4005c5: mov %rsp,%rbp
0x4005c8: sub $0xa0,%rsp
0x4005cf: mov %fs:0x28,%rax
0x4005d8: mov %rax,-0x8(%rbp)
0x4005dc: xor %eax,%eax
0x4005de: movabs $0x6673646c6a6b3432,%rax
0x4005e8: mov %rax,-0x40(%rbp)
0x4005ec: movl $0x323339,-0x38(%rbp)
0x4005f3: movl $0x553059,-0x90(%rbp)
As per my understanding movabs should not be used, it seems like it was introduced intentionally. Am I right in my understanding?
What should be the equivalent MOV command to replace it?
As a direct copy from this question: https://reverseengineering.stackexchange.com/questions/2627/what-is-the-meaning-of-movabs-in-gas-x86-att-syntax
[...] The movabs instruction to load arbitrary 64-bit
constant into register and to load/store integer register from/to
arbitrary constant 64-bit address is available.
http://www.ucw.cz/~hubicka/papers/amd64/node1.html
It does exactly what you'd expect from it - it puts the immediate into the register.
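To see what the immediate in the question actually holds, here is a minimal sketch. It assumes (the C source is not shown, so this is a guess) that the compiler is initializing a local char array with constants, and it decodes the 64-bit immediate into the bytes that mov %rax,-0x40(%rbp) would store. The function name imm64_to_bytes is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the 8 bytes of imm into out, lowest byte first, which is the
   order x86-64 (little-endian) stores them in memory, then adds a NUL
   terminator so the result can be printed as a C string. */
void imm64_to_bytes(uint64_t imm, char out[9])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        out[i] = (char)((imm >> (8 * i)) & 0xff);
    }
    out[8] = '\0';
}
```

For the constant 0x6673646c6a6b3432 from the listing, this yields the characters 24kjldsf, so the movabs is most likely just the first 8 bytes of a string literal being loaded in one instruction.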

Causes and benefits of this improvement on gcc version >= 4.9.0 vs gcc version < 4.9?

I have recently exploited a dangerous program and found something interesting about the difference between versions of gcc on x86-64 architecture.
Note:
Wrongful usage of gets is not the issue here.
If we replace gets with any other functions, the problem doesn't change.
This is the source code I use:
#include <stdio.h>
int main()
{
char buf[16];
gets(buf);
return 0;
}
I use gcc.godbolt.org to disassemble the program with flag -m32 -fno-stack-protector -z execstack -g.
At the disassembled code, when gcc with version >= 4.9.0:
lea ecx, [esp+4] # begin of main
and esp, -16
push DWORD PTR [ecx-4] # push esp
push ebp
mov ebp, esp
/* the code between these comment markers is not related to the question
push ecx
sub esp, 20
sub esp, 12
lea eax, [ebp-24]
push eax
call gets
add esp, 16
mov eax, 0
*/
mov ebp, esp
mov ecx, DWORD PTR [ebp-4] # ecx = saved esp
leave
lea esp, [ecx-4]
ret # end of main
But gcc with version < 4.9.0 just:
push ebp # begin of main
mov ebp, esp
/* the code between these comment markers is not related to the question
and esp, -16
sub esp, 32
lea eax, [esp+16]
mov DWORD PTR [esp], eax
call gets
mov eax, 0
*/
leave
ret # end of main
My question is: what causes this difference in the disassembled code, and what are its benefits? Does this technique have a name?
I can't say for sure without the actual values in:
and esp, 0xXX # XX is a number
but this looks a lot like extra code to align the stack to a larger value than the ABI requires.
Edit: The value is -16, which is 32-bit 0xFFFFFFF0 or 64-bit 0xFFFFFFFFFFFFFFF0 so this is indeed stack alignment to 16 bytes, likely meant for use of SSE instructions. As mentioned in comments, there is more code in the >= 4.9.0 version because it also aligns the frame pointer instead of only the stack pointer.
The i386 ABI, used for 32-bit programs, requires that a process, immediately after being loaded, have its stack aligned on 32-bit values:
%esp Performing its usual job, the stack pointer holds the address of the
bottom of the stack, which is guaranteed to be word aligned.
Compare this with the x86_64 ABI [1] used for 64-bit programs:
%rsp The stack pointer holds the address of the byte with lowest address which
is part of the stack. It is guaranteed to be 16-byte aligned at process entry
The opportunity given by AMD's new 64-bit technology to rewrite the old i386 ABI allowed a number of optimizations that had been missing due to backward compatibility, among them a bigger (stricter?) stack alignment.
I won't dwell on the benefits of stack alignment but it suffices to say that if a 4-byte alignment was good, so is a 16-byte one.
So much that it is worth spending some instructions aligning the stack.
That's what GCC 4.9.0+ does: it aligns the stack to 16 bytes.
That explains the and esp, -16 but not the other instructions.
Aligning the stack with and esp, -16 is the fastest way to do it when the compiler only knows that the stack is 4-byte aligned (since esp MOD 16 can be 0, 4, 8 or 12).
However, it is a destructive method: the compiler loses the original esp value.
But now comes a chicken-and-egg problem: if we save the original esp on the stack before aligning the stack, we lose it because we don't know how far the stack pointer is lowered by the alignment. If we save it after the alignment, well, we can't. We lost it in the alignment.
So the only possible solution is to save it in a register, align the stack and then save said register on the stack.
;Save the stack pointer in ECX, actually is ESP+4 but still does
lea ecx, [esp+4] #ECX = ESP+4
;Align the stack
and esp, -16 #This lowers ESP by 0, 4, 8 or 12
;IGNORE THIS FOR NOW
push DWORD PTR [ecx-4]
;Usual prolog
push ebp
mov ebp, esp
;Save the original ESP (before alignment), actually is ESP+4 but OK
push ecx
GCC saves esp+4 in ecx. I don't know why [2], but this value still does the trick.
The only mystery left is the push DWORD PTR [ecx-4].
But it turns out to be a simple mystery: for debugging purposes GCC pushes the return address just before the old frame pointer (before push ebp), because this is where 32-bit tools expect it to be.
Since ecx=esp_o+4, where esp_o is the original stack pointer pre-alignment, [ecx-4] = [esp_o] = return address.
Note that 12 bytes have now been pushed since the alignment, so esp is 4 modulo 16; thus the local variable area must be of size 16*k+4 to have the stack aligned at 16 bytes again.
In your example k is 1 and the area is of 20 bytes in size.
The subsequent sub esp, 12 is to align the stack for the gets function (the requirement is to have the stack aligned at the function call).
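The stack-pointer bookkeeping described so far can be replayed in a few lines of C. This is only a sketch of the arithmetic, not of what GCC does internally; the function names and the sample entry addresses are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds esp down to a multiple of align (a power of two),
   which is what `and esp, -16` does for align == 16. */
uint32_t align_down(uint32_t esp, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return esp & ~(align - 1);
}

/* Replays the esp arithmetic of the GCC >= 4.9 prolog from the question
   and returns esp at the moment `call gets` executes. */
uint32_t esp_at_call(uint32_t esp_entry)
{
    uint32_t esp = align_down(esp_entry, 16); /* and esp, -16 */
    esp -= 4;   /* push DWORD PTR [ecx-4]  (copy of the return address) */
    esp -= 4;   /* push ebp */
    esp -= 4;   /* push ecx               (original esp + 4) */
    esp -= 20;  /* sub esp, 20            (locals, 16*k+4 with k = 1) */
    esp -= 12;  /* sub esp, 12            (padding before the argument) */
    esp -= 4;   /* push eax               (argument to gets) */
    return esp;
}
```

Whatever 4-byte-aligned value esp has on entry, the total subtracted after the and is 48 bytes, a multiple of 16, so esp_at_call always returns a 16-byte-aligned address.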
Finally, the code
mov ebp, esp
mov ecx, DWORD PTR [ebp-4] # ecx = saved esp
leave
lea esp, [ecx-4]
ret
The first instruction is a copy-paste error.
One could check it out or simply reason that
if it were there the [ebp-4] would be below the stack pointer (and there is no red zone for the i386 ABI).
The rest just undoes what was done in the prolog:
;Get the original stack pointer
mov ecx, DWORD PTR [ebp-4] ;ecx = esp_o+4
;Standard epilog
leave ;mov esp, ebp / pop ebp
;The stack pointer points to the copied return address
;Restore the original stack pointer
lea esp, [ecx-4] ;esp = esp_o
ret
GCC has to first get the original stack pointer (+4) saved on the stack, then restore the old frame pointer (ebp) and finally, restore the original stack pointer.
The return address is on the top of the stack when lea esp, [ecx-4] is executed, so in theory GCC could just return but it has to restore the original esp because main is not the first function to be executed in a C program, so it cannot leave the stack unbalanced.
[1] This is not the latest version but the text quoted went unchanged in the successive editions.
[2] This has been discussed here on SO but I can't remember if in some comment or in an answer.

Why do byte spills occur and what do they achieve?

What is a byte spill?
When I dump the x86 ASM from an LLVM intermediate representation generated from a C program, there are numerous spills, usually of a 4 byte size. I cannot figure out why they occur and what they achieve.
They seem to "cut" pieces of the stack off, but in an unusual way:
## this fragment comes from a C program right before a malloc() call to a struct.
## there are other spills in different circumstances in this same program, so it
## is not related exclusively to malloc()
...
sub ESP, 84
mov EAX, 60
mov DWORD PTR [ESP + 80], 0
mov DWORD PTR [ESP], 60
mov DWORD PTR [ESP + 60], EAX # 4-byte Spill
call malloc
mov ECX, 60
...
A register spill is simply what happens when you have more local variables than registers (it's an analogy - really the meaning is that they must be saved to memory). The instruction is saving the value of EAX, likely because EAX is clobbered by malloc and you don't have another spare register to save it in (and for whatever reason the compiler has decided it needs the constant 60 in the register later).
By the looks of it, the compiler could certainly have omitted mov DWORD PTR [ESP + 60], EAX and instead repeated the mov EAX, 60 where it would otherwise mov EAX, DWORD PTR [ESP + 60] or whatever offset it used, because the saved value of EAX cannot be other than 60 at that point. However, compilation is not guaranteed to be perfectly optimal.
Bear also in mind that after sub ESP, 84, the stack size is not adjusted (except by the call instruction which of course pushes the return address). The following instructions are using ESP as a memory offset, not a destination.

C allocated space size on stack for an array

I have a simple program called demo.c which allocates space for a char array with the length of 8 on the stack
#include<stdio.h>
main()
{
char buffer[8];
return 0;
}
I thought that 8 bytes will be allocated from stack for the eight chars but if I check this in gdb there are 10 bytes subtracted from the stack.
I compile the program with this command on my Ubuntu 32-bit machine:
$ gcc -ggdb -o demo demo.c
Then I analyze the program with:
$ gdb demo
(gdb) disassemble main
Dump of assembler code for function main:
0x08048404 <+0>: push %ebp
0x08048405 <+1>: mov %esp,%ebp
0x08048407 <+3>: and $0xfffffff0,%esp
0x0804840a <+6>: sub $0x10,%esp
0x0804840d <+9>: mov %gs:0x14,%eax
0x08048413 <+15>: mov %eax,0xc(%esp)
0x08048417 <+19>: xor %eax,%eax
0x08048419 <+21>: mov $0x0,%eax
0x0804841e <+26>: mov 0xc(%esp),%edx
0x08048422 <+30>: xor %gs:0x14,%edx
0x08048429 <+37>: je 0x8048430 <main+44>
0x0804842b <+39>: call 0x8048340 <__stack_chk_fail@plt>
0x08048430 <+44>: leave
0x08048431 <+45>: ret
End of assembler dump.
0x0804840a <+6>: sub $0x10,%esp says that 10 bytes are allocated from the stack, right?
Why are there 10 bytes allocated and not 8?
No, 0x10 means it's hexadecimal, i.e. 10 in base 16, which is 16 bytes in decimal.
Probably due to alignment requirements for the stack.
Please note that the constant $0x10 is in hexadecimal; it is equal to 16 bytes.
Take a look at the machine code:
0x08048404 <+0>: push %ebp
0x08048405 <+1>: mov %esp,%ebp
0x08048407 <+3>: and $0xfffffff0,%esp
0x0804840a <+6>: sub $0x10,%esp
...
0x08048430 <+44>: leave
0x08048431 <+45>: ret
As you can see, before we subtract 16 from esp we first make sure esp points to a 16-byte-aligned address (take a look at the and $0xfffffff0,%esp instruction).
I guess the compiler tries to respect the alignment, so it simply reserves 16 bytes as well. It does not matter anyway, because 8 bytes fit into 16 bytes very well.
sub $0x10, %esp is saying that there are 16 bytes on the stack, not 10 since 0x is hexadecimal notation.
The amount of space for the stack is completely dependent on the compiler. In this case it's most likely an alignment issue where the alignment is 16 bytes and you've requested 8, so it gets increased to 16.
If you requested 17 bytes, it would most likely have been sub $0x20, %esp or 32 bytes instead of 17.
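Both points made in these answers (0x10 is 16, and sizes get rounded up to the alignment) can be sketched in C. The helper names are hypothetical, and the rounding shown is only the usual power-of-two round-up, not a statement about what any particular GCC version does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parses an immediate as gdb prints it, without the leading '$'.
   Base 0 makes strtoul honor the 0x prefix, so "0x10" is read as hex. */
unsigned long parse_immediate(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 0);
}

/* Rounds size up to the next multiple of align (a power of two). */
size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

Here parse_immediate("0x10") gives 16, round_up(8, 16) gives 16 (0x10), and round_up(17, 16) gives 32 (0x20), matching the values discussed above.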
(I skipped over some things the other answers explain in more detail).
You compiled with -O0, so gcc is operating in a super-simple way that tells you something about compiler internals, but little about how to make good code from C.
gcc is keeping the stack 16B-aligned at all times. The 32bit SysV ABI only guarantees 4B stack alignment, but GNU/Linux systems actually assume and maintain gcc's default -mpreferred-stack-boundary=4 (16B-aligned).
Your version of gcc also defaults to using -fstack-protector, so it checks for stack-smashing in functions with local char arrays with 4 or more elements:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call "alloca", and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
For some reason, this is actually kicking in with char arrays >= 4B, but not with integer arrays. (At least, not when they're unused!). char pointers can alias anything, which may have something to do with it.
See the code on godbolt, with asm output. Note how main is special: it uses andl $-16, %esp to align the stack on entry to main, but other functions assume the stack was 16B-aligned before the call instruction that called them. So they'll typically sub $24, %esp, after pushing %ebp. (%ebp and the return address are 8B total, so the stack is 8B away from being 16B-aligned). This leaves room for the stack-protector canary.
The 32bit SysV ABI only requires arrays to be aligned to the natural alignment of their elements, so this 16B alignment for the char array is just what the compiler decided to do in this case, not something you can count on.
The 64bit ABI is different:
An array uses the same alignment as its elements, except that a local
or global array variable of length at least 16 bytes or a C99
variable-length array variable always has alignment of at least 16
bytes
(links from the x86 tag wiki)
So you can count on char buf[1024] being 16B-aligned on SysV, allowing you to use SSE aligned loads/stores on it.
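If you want to verify the alignment rather than take it on faith, a check along these lines should work. This is a sketch with a made-up helper name, and it relies on the common (implementation-defined) behavior that converting a pointer to uintptr_t yields its address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of align bytes (align must be a power of two). */
bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Code that must not depend on the x86-64 SysV rule can request the alignment explicitly in C11, for example with _Alignas(16) char buf[1024];, and then is_aligned(buf, 16) holds on any conforming compiler.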
