What is the purpose of the RBP register in x86_64 assembler? - c

So I'm trying to learn a little bit of assembly, because I need it for Computer Architecture class. I wrote a few programs, like printing the Fibonacci sequence.
I recognized that whenever I write a function I use those 3 lines (as I learned from comparing assembly code generated from gcc to its C equivalent):
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
I have 2 questions about it:
First of all, why do I need to use %rbp? Isn't it simpler to use %rsp, as its contents are moved to %rbp on the 2nd line?
Why do I have to subtract anything from %rsp? I mean it's not always 16, when I was printfing like 7 or 8 variables, then I would subtract 24 or 28.
I use Manjaro 64 bit on a Virtual Machine (4 GB RAM), Intel 64 bit processor

rbp is the frame pointer on x86_64. In your generated code, it gets a snapshot of the stack pointer (rsp) so that when adjustments are made to rsp (i.e. reserving space for local variables or pushing values on to the stack), local variables and function parameters are still accessible from a constant offset from rbp.
A lot of compilers offer frame pointer omission as an optimization option; this will make the generated assembly code access variables relative to rsp instead and free up rbp as another general purpose register for use in functions.
In the case of GCC, which I'm guessing you're using from the AT&T assembler syntax, that switch is -fomit-frame-pointer. Try compiling your code with that switch and see what assembly code you get. You will probably notice that when accessing values relative to rsp instead of rbp, the offset from the pointer varies throughout the function.

Linux uses the System V ABI for x86-64 (AMD64) architecture; see System V ABI at OSDev Wiki for details.
This means the stack grows down; smaller addresses are "higher up" in the stack. Typical C functions are compiled to
pushq %rbp ; Save address of previous stack frame
movq %rsp, %rbp ; Address of current stack frame
subq $16, %rsp ; Reserve 16 bytes for local variables
; ... function ...
movq %rbp, %rsp ; \ equivalent to the
popq %rbp ; / 'leave' instruction
ret
The amount of memory reserved for the local variables is always a multiple of 16 bytes, to keep the stack aligned to 16 bytes. If no stack space is needed for local variables, there is no subq $16, %rsp or similar instruction.
(Note that the return address and the previous %rbp pushed to the stack are both 8 bytes in size, 16 bytes in total.)
While %rbp points to the current stack frame, %rsp points to the top of the stack. Because the compiler knows the difference between %rbp and %rsp at any point within the function, it is free to use either one as the base for the local variables.
A stack frame is just the local function's playground: the region of stack the current function uses.
Current versions of GCC disable the stack frame whenever optimizations are used. This makes sense, because for programs written in C, the stack frames are most useful for debugging, but not much else. (You can use e.g. -O2 -fno-omit-frame-pointer to keep stack frames while enabling optimizations otherwise, however.)
Although the same ABI applies to all binaries, no matter what language they are written in, certain other languages do need stack frames for "unwinding" (for example, to "throw exceptions" to an ancestor caller of the current function); i.e. to "unwind" stack frames that one or more functions can be aborted and control passed to some ancestor function, without leaving unneeded stuff on the stack.
When stack frames are omitted -- -fomit-frame-pointer for GCC --, the function implementation changes essentially to
subq $8, %rsp ; Re-align stack frame, and
; reserve memory for local variables
; ... function ...
addq $8, %rsp
ret
Because there is no stack frame (%rbp is used for other purposes, and its value is never pushed to stack), each function call pushes only the return address to the stack, which is an 8-byte quantity, so we need to subtract 8 from %rsp to keep it a multiple of 16. (In general, the value subtracted from and added to %rsp is an odd multiple of 8.)
Function parameters are typically passed in registers. See the ABI link at the beginning of this answer for details, but in short, integral types and pointers are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, with floating-point arguments in the %xmm0 to %xmm7 registers.
In some cases you'll see rep ret instead of rep. Don't be confused: the rep ret means the exact same thing as ret; the rep prefix, although normally used with string instructions (repeated instructions), does nothing when applied to the ret instruction. It's just that certain AMD processors' branch predictors don't like jumping to a ret instruction, and the recommended workaround is to use a rep ret there instead.
Finally, I've omitted the red zone above the top of the stack (the 128 bytes at addresses less than %rsp). This is because it is not really useful for typical functions: In the normal have-stack-frame case, you'll want your local stuff to be within the stack frame, to make debugging possible. In the omit-stack-frame case, stack alignment requirements already mean we need to subtract 8 from %rsp, so including the memory needed by the local variables in that subtraction costs nothing.

Related

Dynamic allocation of structure array inside another structure [duplicate]

I wrote a simple code on a 64 bit machine
int main() {
printf("%d", 2.443);
}
So, this is how the compiler will behave. It will identify the second argument to be a double hence it will push 8 bytes on the stack or possibly just use registers across calls to access the variables. %d expects a 4 byte integer value, hence it prints some garbage value.
What is interesting is that the value printed changes everytime I execute this program. So what is happening? I expected it to print the same garbage value everytime not different ones everytime.
It's undefined behaviour, of course, to pass arguments not corresponding to the format, so the language cannot tell us why the output changes. We must look at the implementation, what code it produces, and possibly the operating system too.
My setup is different from yours,
Linux 3.1.10-1.16-desktop x86_64 GNU/Linux (openSuSE 12.1)
with gcc-4.6.2. But it's similar enough that it's reasonable to suspect the same mechanisms.
Looking at the generated assembly (-O3, out of habit), the relevant part (main) is
.cfi_startproc
subq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 16
movl $.LC1, %edi # move format string to edi
movl $1, %eax # move 1 to eax, seems to be the number of double arguments
movsd .LC0(%rip), %xmm0 # move the double to the floating point register
call printf
xorl %eax, %eax # clear eax (return 0)
addq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 8
ret # return
If instead of the double, I pass an int, not much changes, but that significantly
movl $47, %esi # move int to esi
movl $.LC0, %edi # format string
xorl %eax, %eax # clear eax
call printf
I have looked at the generated code for many variations of types and count of arguments passed to printf, and consistently, the first double (or promoted float) arguments are passed in xmmN, N = 0, 1, 2, and the integer (int, char, long, regardless of signedness) are passed in esi, edx, ecx, r8d, r9d and then the stack.
So I venture the guess that printf looks for the announced int in esi, and prints whatever happens to be there.
Whether the contents of esi are in any way predictable when nothing is moved there in main, and what they might signify, I have no idea.
This answer attempts to address some of the sources of variation. It is a follow-up to Daniel Fischer’s answer and some comments to it.
As I do not work with Linux, I cannot give a definitive answer. For a printf later in a large application, there would be a myriad of sources of potential variation. This early in a small application, there should be only a few.
Address space layout randomization (ASLR) is one: The operating system deliberately rearranges some memory randomly to prevent malware for knowing what addresses to use. I do not know if Linux 3.4.4-2 has this.
Another is environment variables. Your shell environment variables are copied into processes it spawns (and accessible through the getenv routine). A few of those might change automatically, so they would have slightly different values. This is unlikely to directly affect what printf sees when it attempts to use a missing integer argument, but there could be cascading effects.
There may be a shared-library loader that runs either before main is called or before printf is called. For example, if printf is in a shared library, rather than built into your executable file, then a call to printf likely actually results in a call to a stub routine that calls the loader. The loader looks up the shared library, finds the module containing printf, loads that module into your process’ address space, changes the stub so that it calls the newly loaded printf directly in the future (instead of calling the loader), and calls printf. As you can imagine, that can be a fairly extensive process and involves, among other things, finding and reading files on disk (all the directories to get to the shared library and the shared library). It is conceivable that some caching or file operations on your system result in slightly different behavior in the loader.
So far, I favor ASLR as the most likely candidate of the ones above. The latter two are likely to be fairly stable; the values involved would usually change occasionally, not frequently. ASLR would change each time, and simply leaving an address in a register would suffice to explain the printf behavior.
Here is an experiment: After the initial printf, insert another printf with this code:
printf("%d\n", 2.443);
int a;
printf("%p\n", (void *) &a);
The second printf prints the address of a, which is likely on the stack. Run the program two or three times and calculate the difference between the value printed by the first printf and the value printed by the second printf. (The second printf is likely to print in hexadecimal, so it might be convenient to change the first to "%x" to make it hexadecimal too.) If the value printed by the second printf varies from run to run, then your program is experiencing ASLR. If the values change from run to run but the difference between them remains constant, then the value that printf has happened upon in the first printf is some address in your process that was left lying around after program initialization.
If the address of a changes but the difference does not remain constant, you might try changing int a; to static int a; to see if comparing the first value to different part of your address space yields a better result.
Naturally, none of this is useful for writing reliable programs; it is just educational with regard to how program loading and initialization works.

GCC compiler gives different results for Windows and Linux? [duplicate]

I wrote a simple code on a 64 bit machine
int main() {
printf("%d", 2.443);
}
So, this is how the compiler will behave. It will identify the second argument to be a double hence it will push 8 bytes on the stack or possibly just use registers across calls to access the variables. %d expects a 4 byte integer value, hence it prints some garbage value.
What is interesting is that the value printed changes everytime I execute this program. So what is happening? I expected it to print the same garbage value everytime not different ones everytime.
It's undefined behaviour, of course, to pass arguments not corresponding to the format, so the language cannot tell us why the output changes. We must look at the implementation, what code it produces, and possibly the operating system too.
My setup is different from yours,
Linux 3.1.10-1.16-desktop x86_64 GNU/Linux (openSuSE 12.1)
with gcc-4.6.2. But it's similar enough that it's reasonable to suspect the same mechanisms.
Looking at the generated assembly (-O3, out of habit), the relevant part (main) is
.cfi_startproc
subq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 16
movl $.LC1, %edi # move format string to edi
movl $1, %eax # move 1 to eax, seems to be the number of double arguments
movsd .LC0(%rip), %xmm0 # move the double to the floating point register
call printf
xorl %eax, %eax # clear eax (return 0)
addq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 8
ret # return
If instead of the double, I pass an int, not much changes, but that significantly
movl $47, %esi # move int to esi
movl $.LC0, %edi # format string
xorl %eax, %eax # clear eax
call printf
I have looked at the generated code for many variations of types and count of arguments passed to printf, and consistently, the first double (or promoted float) arguments are passed in xmmN, N = 0, 1, 2, and the integer (int, char, long, regardless of signedness) are passed in esi, edx, ecx, r8d, r9d and then the stack.
So I venture the guess that printf looks for the announced int in esi, and prints whatever happens to be there.
Whether the contents of esi are in any way predictable when nothing is moved there in main, and what they might signify, I have no idea.
This answer attempts to address some of the sources of variation. It is a follow-up to Daniel Fischer’s answer and some comments to it.
As I do not work with Linux, I cannot give a definitive answer. For a printf later in a large application, there would be a myriad of sources of potential variation. This early in a small application, there should be only a few.
Address space layout randomization (ASLR) is one: The operating system deliberately rearranges some memory randomly to prevent malware for knowing what addresses to use. I do not know if Linux 3.4.4-2 has this.
Another is environment variables. Your shell environment variables are copied into processes it spawns (and accessible through the getenv routine). A few of those might change automatically, so they would have slightly different values. This is unlikely to directly affect what printf sees when it attempts to use a missing integer argument, but there could be cascading effects.
There may be a shared-library loader that runs either before main is called or before printf is called. For example, if printf is in a shared library, rather than built into your executable file, then a call to printf likely actually results in a call to a stub routine that calls the loader. The loader looks up the shared library, finds the module containing printf, loads that module into your process’ address space, changes the stub so that it calls the newly loaded printf directly in the future (instead of calling the loader), and calls printf. As you can imagine, that can be a fairly extensive process and involves, among other things, finding and reading files on disk (all the directories to get to the shared library and the shared library). It is conceivable that some caching or file operations on your system result in slightly different behavior in the loader.
So far, I favor ASLR as the most likely candidate of the ones above. The latter two are likely to be fairly stable; the values involved would usually change occasionally, not frequently. ASLR would change each time, and simply leaving an address in a register would suffice to explain the printf behavior.
Here is an experiment: After the initial printf, insert another printf with this code:
printf("%d\n", 2.443);
int a;
printf("%p\n", (void *) &a);
The second printf prints the address of a, which is likely on the stack. Run the program two or three times and calculate the difference between the value printed by the first printf and the value printed by the second printf. (The second printf is likely to print in hexadecimal, so it might be convenient to change the first to "%x" to make it hexadecimal too.) If the value printed by the second printf varies from run to run, then your program is experiencing ASLR. If the values change from run to run but the difference between them remains constant, then the value that printf has happened upon in the first printf is some address in your process that was left lying around after program initialization.
If the address of a changes but the difference does not remain constant, you might try changing int a; to static int a; to see if comparing the first value to different part of your address space yields a better result.
Naturally, none of this is useful for writing reliable programs; it is just educational with regard to how program loading and initialization works.

Assembly: push vs movl

I have some C code that I compiled with gcc:
int main() {
int x = 1;
printf("%d\n",x);
return 0;
}
I've run it through gdb 7.9.1 and come up with this assembler code for main:
0x0000000100000f40 <+0>: push %rbp # save original frame pointer
0x0000000100000f41 <+1>: mov %rsp,%rbp # stack pointer is new frame pointer
0x0000000100000f44 <+4>: sub $0x10,%rsp # make room for vars
0x0000000100000f48 <+8>: lea 0x47(%rip),%rdi # 0x100000f96
0x0000000100000f4f <+15>: movl $0x0,-0x4(%rbp) # put 0 on the stack
0x0000000100000f56 <+22>: movl $0x1,-0x8(%rbp) # put 1 on the stack
0x0000000100000f5d <+29>: mov -0x8(%rbp),%esi
0x0000000100000f60 <+32>: mov $0x0,%al
0x0000000100000f62 <+34>: callq 0x100000f74
0x0000000100000f67 <+39>: xor %esi,%esi # set %esi to 0
0x0000000100000f69 <+41>: mov %eax,-0xc(%rbp)
0x0000000100000f6c <+44>: mov %esi,%eax
0x0000000100000f6e <+46>: add $0x10,%rsp # move stack pointer to original location
0x0000000100000f72 <+50>: pop %rbp # reclaim original frame pointer
0x0000000100000f73 <+51>: retq
As I understand it, push %rbb pushes the frame pointer onto the stack, so we can retrieve it later with pop %rbp. Then, sub $0x10,%rsp clears 10 bytes of room on the stack so we can put stuff on it.
Later interactions with the stack move variables directly into the stack via memory addressing, rather than pushing them onto the stack:
movl $0x0, -0x4(%rbp)
movl $0x1, -0x8(%rbp)
Why does the compiler use movl rather than push to get this information onto the stack?
Does referencing the register after the memory address also put that value into that register?
It is very common for modern compilers to move the stack pointer once at the beginning of a function, and move it back at the end. This allows for more efficient indexing because it can treat the memory space as a memory mapped region rather than a simple stack. For example, values which are suddenly found to be of no use (perhaps due to an optimized shortcutted operator) can be ignored, rather than forcing one to pop them off the stack.
Perhaps in simpler days, there was a performance reason to use push. With modern processors, there is no advantage, so there's no reason to make special cases in the compiler to use push/pop when possible. It's not like compiler-written assembly code is readable!
While Cort is correct, there is another important reason for this practice of apparently allocating space on the stack. According to the ABI, function calls must find the stack 16 byte aligned. Rather than fiddling with the stack every single time a call needs to be made from a function, it is generally easier and more efficient to adjust the stack for proper alignment first and then modify the values that might otherwise have been pushed onto it.
So, the stack is absolutely adjusted for local variable space, but it is also adjusted to provide correct stack alignment for calls into the standard library.
I'm not an authority on assemblers or compilers but I've played around with MASM back in the day and did spend a whole bunch of time with WinDbg while debugging production C++ issues.
I think the answer to your question is because it's easier.
push/pop instructions write to and read from the stack but they also modify the stack as they are processed. C/C++ compiler uses stack for all its local variables. It does it by shifting stack pointer by the exact number of bytes that is needed to hold all local variables and it does so right when you enter the function.
After that reading and writing all those variables can be done from anywhere in the function and also as many times as you want by simply using the mov instructions. If you look at pure assembly, you might question why create a hole in the stack just to copy two values into that space using mov when you could have done two push instructions.
But look at it from compiler author perspective. The process of entering a function, and allocating stack for local variables is coded separately and is completely decoupled from the process of reading/writing those variables.

Forcing integer constant to be quadword

Writing a compiler for class, and no one in the class could figure out exactly why we couldn't do the straight-forward thing.
cmpq %r13, %r10
movq $0, %r10
cmovne $1, %r10
My best guess is that since cmovXX doesn't explicitly define the size of its arguments like movq or movl, $1 doesn't know how big to be, and therefore, throws a type mismatch tantrum.
My question is, how does one force an integer constant to be a quadword? $1q didn't work, so I'm out of guesses.
Thanks!
Not really. cmov is simply not available (neither Intel, nor AMD created such an encoding of this particular instruction) with an immediate operand. It operates only on registers and memory locations.
Forcing a particular size of an instruction in AT&T syntax is done by appending one of the size prefixes to the instruction's mnemonic - just the way you have done it.
The only instruction in the x86-64 instruction set that can accept a quadword (64-bit) immediate is the mov instruction with a 64-bit register. However, doing movq $0, %rax will give you the ordinary encoding with a 32-bit immediate. In order to force the assembler to emit a 64-bit immediate, you have to use movabs $0, %rax.

Why does gcc use movl instead of push to pass function args?

pay attention to this code :
#include <stdio.h>
void a(int a, int b, int c)
{
char buffer1[5];
char buffer2[10];
}
int main()
{
a(1,2,3);
}
after that :
gcc -S a.c
that command shows our source code in assembly.
now we can see in the main function, we never use "push" command to push the arguments of
the a function into the stack. and it used "movel" instead of that
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $3, 8(%esp)
movl $2, 4(%esp)
movl $1, (%esp)
call a
leave
why does it happen?
what's difference between them?
Here is what the gcc manual has to say about it:
-mpush-args
-mno-push-args
Use PUSH operations to store outgoing parameters. This method is shorter and usually
equally fast as method using SUB/MOV operations and is enabled by default.
In some cases disabling it may improve performance because of improved scheduling
and reduced dependencies.
-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will be
computed in the function prologue. This is faster on most modern CPUs because of
reduced dependencies, improved scheduling and reduced stack usage when preferred
stack boundary is not equal to 2. The drawback is a notable increase in code size.
This switch implies -mno-push-args.
Apparently -maccumulate-outgoing-args is enabled by default, overriding -mpush-args. Explicitly compiling with -mno-accumulate-outgoing-args does revert to the PUSH method, here.
2019 update: modern CPUs have had efficient push/pop since about Pentium M.
-mno-accumulate-outgoing-args (and using push) eventually became the default for -mtune=generic in Jan 2014.
That code is just directly putting the constants (1, 2, 3) at offset positions from the (updated) stack pointer (esp). The compiler is choosing to do the "push" manually with the same result.
"push" both sets the data and updates the stack pointer. In this case, the compiler is reducing that to only one update of the stack pointer (vs. three). An interesting experiment would be to try changing function "a" to take only one argument, and see if the instruction pattern changes.
gcc does all sorts of optimizations, including selecting instructions based upon execution speed of the particular CPU being optimized for. You will notice that things like x *= n is often replaced by a mix of SHL, ADD and/or SUB, especially when n is a constant; while MUL is only used when the average runtime (and cache/etc. footprints) of the combination of SHL-ADD-SUB would exceed that of MUL, or n is not a constant (and thus using loops with shl-add-sub would come costlier).
In case of function arguments: MOV can be parallelized by hardware, while PUSH cannot. (The second PUSH has to wait for the first PUSH to finish because of the update of the esp register.) In case of function arguments, MOVs can be run in parallel.
Is this on OS X by any chance? I read somewhere that it requires the stack pointer to be aligned at 16-byte boundaries. That could possibly explain this kind of code generation.
I found the article: http://blogs.embarcadero.com/eboling/2009/05/20/5607

Resources