The only thing that I know about the mechanism of how C passes values is that it is done either through a register or the stack.
Register or Stack? Exactly how?
Both. And the conventions will vary by platform.
On x86, values are usually passed by stack. On x64, passing by register is preferred.
In all cases, if you have too many parameters, some will have to be passed by stack.
Refer to x86 calling conventions
Typically (although, as pointed out, some compilers do it differently), arguments to normal function calls are passed on the stack. That is, the call site usually contains a series of push instructions that just put the data onto the stack.
There are special cases such as system calls where parameters get passed via assembly instructions and registers. In hardware cases they are passed via registers or even certain interrupt signals which consequently write to registers.
On architectures with a large number of registers, such as some RISC and 64-bit architectures, they are usually passed in registers.
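To make the "too many parameters" case concrete, here is a minimal sketch. The function name sum8 is made up for illustration, and the register assignments in the comment assume the x86-64 System V convention and 32-bit cdecl; other platforms will differ.

```c
/* Hypothetical eight-argument function, used only for illustration. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h)
{
    /* Assuming x86-64 System V: a..f arrive in rdi, rsi, rdx, rcx, r8, r9;
       g and h do not fit in registers and are read from the stack.
       Assuming 32-bit x86 cdecl: all eight are read from the stack. */
    return a + b + c + d + e + f + g + h;
}
```

Compiling this with something like gcc -S and looking at the generated assembly is the easiest way to see what your own compiler and platform actually do.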
I am at a situation where returning a value from a function is optional. So whether I return or don't return makes no difference logically, but can I safely assume the same thing performance-wise?
I mean, is there any performance overhead (time or memory) when we use a function that returns some value (e.g. an int) compared with a function that returns void?
This is a dummy project and the question is asked purely out of curiosity.
It depends on the ABI used and whether the called function is inlined.
On x86_64 platforms that use the System V AMD64 Application Binary Interface (Linux, FreeBSD, macOS, Solaris and Windows Subsystem for Linux), return values whose sizeof is at most 16 bytes are generally returned in registers. Returning up to 16 bytes involves loading the return value into one or two 8-byte registers. Returning larger values involves stores into the caller's stack through a hidden return-value pointer passed to the callee; that pointer must also be loaded into the rax register upon return.
See Calling conventions by Agner Fog for a detailed treatment of the calling conventions, in particular §7.1 Passing and returning objects. There are separate calling conventions for passing SIMD types in registers.
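As a rough illustration of the 16-byte boundary, consider the sketch below. The struct and function names are invented for this example, and the comments describe what the x86-64 System V ABI is expected to do; check the generated assembly on your own target.

```c
#include <stdint.h>

/* Hypothetical types, chosen only for their sizes. */
struct Small { int64_t lo, hi; };   /* 16 bytes */
struct Large { int64_t v[4]; };     /* 32 bytes */

/* Assuming x86-64 System V: returned in the rax:rdx register pair. */
struct Small make_small(int64_t a, int64_t b)
{
    struct Small s = { a, b };
    return s;
}

/* Assuming x86-64 System V: the caller passes a hidden pointer to stack
   space it owns, the callee stores the result through that pointer and
   returns the pointer itself in rax. */
struct Large make_large(int64_t a)
{
    struct Large l = { { a, a + 1, a + 2, a + 3 } };
    return l;
}
```

A function returning void skips even the register load, but when the call is inlined the difference usually disappears entirely.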
As stated, what software-visible processor state needs to go in a jmp_buf on an x86-64 processor when setjmp(jmp_buf env) is called? What processor state does not?
I have been reading a lot about setjmp and longjmp but couldn't find a clear answer to my question. I know it is implementation dependent but I would like to know for the x86_64 architecture.
From the following implementation
it seems that on an x86-64 machine all the callee saved registers (%r12-%r15, %rbp, %rbx) need to be saved as well as the stack pointer, program counter and all the saved arguments of the current environment. However I'm not sure about that, hope someone could clarify that for me.
For example, which x86-64 registers need to be saved? What about condition flags? For example, I think the floating point registers do not need to be saved because they don't contribute to the state of the program.
That's because of the calling convention. setjmp is a function-call that can return multiple times (the first time when you actually call it, later times when a child function calls longjmp), but it's still a function call. Like any function call, the compiler assumes that all call-clobbered registers have been clobbered, so longjmp doesn't need to restore them.
So yes, they're not part of the "program state" on a function call boundary because the compiler-generated asm is definitely not keeping any values in them.
You're looking at glibc's implementation for the x86-64 System V ABI, where all vector / x87 registers are call-clobbered and thus don't have to be saved.
In the Windows x86-64 calling convention, xmm6-15 are call-preserved (just the low 128 bits, not the upper portions of y/zmm6-15), and would have to be part of the jmp_buf.
i.e. it's not the CPU architecture that's relevant here, it's the software calling convention.
Besides the call-preserved registers, one key thing is that it's only legal to longjmp to a jmp_buf saved by a parent function, not from any arbitrary function after the function that called setjmp has returned.
If setjmp had to support that, it would have to save the entire stack frame, or actually (for the function to be able to return, and that parent to be able to return, etc.) the whole stack all the way up to the top. This is obviously insane, and thus it's clear why longjmp has that restriction of only being able to jump to parent / (great) grandparent functions, so it just has to restore the stack pointer to point at the still-existing stack frame and restore whatever local variables in that function might have been modified since setjmp.
(On C / C++ implementations on architectures / calling conventions that use something other than a normal call-stack, a similar argument about the jump-target function being able to return still applies.)
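A minimal sketch of the legal pattern described above, where longjmp jumps back into a function that is still on the stack. The names env, descend and run_guarded are made up for illustration.

```c
#include <setjmp.h>

static jmp_buf env;   /* filled in by setjmp() inside run_guarded() */

/* Recurse a few levels, then jump straight back to run_guarded(). */
static void descend(int depth)
{
    if (depth <= 0)
        longjmp(env, 1);   /* legal: run_guarded() has not returned yet */
    descend(depth - 1);
}

int run_guarded(int depth)
{
    /* volatile, because it is modified between setjmp and longjmp and
       read afterwards; otherwise its value would be indeterminate */
    volatile int progress = 0;

    if (setjmp(env) == 0) {
        progress = 1;
        descend(depth);    /* never returns normally */
        return -1;         /* not reached */
    }
    /* Second return from setjmp: only the stack pointer and the
       call-preserved registers had to be restored to get here. */
    return 100 + progress;
}
```

Calling run_guarded(3) returns 101. Calling longjmp(env, 1) after run_guarded has returned would be undefined behaviour, which is exactly the restriction discussed above.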
As the jmp_buf is the only place that can be used to restore processor state on a longjmp, it generally holds everything that is needed to restore the full state of the machine as it was when setjmp was called.
This obviously depends very much on the processor and the compiler (what exactly does it use of the CPU's features to store program state):
On an ideal pure-stack machine that holds CPU state nowhere but on the stack, that would be the stack pointer only. Outside very old or purely academic implementations, such machines rarely exist. You could, however, write a compiler for a modern machine like an x86 that uses only the stack to store such information. For such a hypothetical compiler, saving just the stack pointer would suffice to restore program state.
On a more common, practical machine, this might be the stack pointer and the full set of registers used to store program status.
On some CPUs that store program status information in other places, for example in a zero page, and with compilers that make use of such CPU features, the jmp_buf would also need to store a copy of this zero page (certain 65xx CPUs or Atmel AVR MCUs and their compilers might use this feature).
In a signal handler under Linux, one has access to the saved context (all register values) of the suspended thread. These register values are obviously architecture dependent. For example, for a PowerPC Little Endian (ppc64le) architecture, ucontext->uc_regs->gp_regs is an array that contains the values of the general purpose registers.
For certain architectures there are also defines (e.g., the REG_XXX defines for x86-64) which identify the purpose of the registers. For ppc64le such definitions are missing. How can I figure out which registers are which? The little IBM documentation available did not help...
I'm not aware of this being documented anywhere. However, setup_sigcontext for ppc64 fills in the gp_regs array from a struct pt_regs that forms part of the task state. Therefore, that struct can be taken as a guide for which registers are which. There is also a set of PT_Rxxx defines immediately below the definition of that struct, which confirms bits of the mapping that are not immediately obvious from the struct (e.g. general purpose register 1 is indeed in gp_regs[1]).
Let us say I have three functions, f1(), f2(), and f3(). When f1 is called, it stores information in CPU registers (and I imagine there is other important information as well). Now, depending on a condition that is unknown at compile-time, f1 will call either f2 or f3. f2 and f3 use very different registers, some of which may overlap with those used by f1. Is the following reasoning correct?
The compiler knows which registers a particular function needs during its execution. Therefore, when f1 calls either f2 or f3, the function call code preserves those registers that f2 or f3 use on the stack, regardless of whether or not they are being used by f1.
Or is there some other mechanism by which the compiler preserves registers so that the function that is being returned to doesn't lose its data?
Recall that a programming language is a specification in a document. For C11, read n1570.
Registers do not exist in C (in other words, the nearly obsolete register keyword no longer has much to do with processor registers). They only matter in machine code (often generated by a C compiler).
However, the code generated by a given compiler (for a given instruction set and target system) obeys some conventions, notably the calling conventions and the ABI (read the System V x86-64 ABI, which governs Linux, for an example). These conventions define how registers should be used, and which registers are callee-saved or caller-saved. Register allocation is a difficult part of an optimizing compiler's job.
Often the compiler would emit code to spill some of the registers content into the call stack. And a given register can be used for several things (e.g. it could keep two different variables, if they occur in different places in the same function).
In general, the calling convention does not depend on the called function (recall that you can make indirect calls through function pointers), but mostly on its signature.
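The function-pointer point can be sketched like this. The names mirror the f1/f2/f3 from the question, but the bodies are invented for illustration.

```c
/* Two hypothetical callees with the same signature. */
static int f2(int x) { return x + 1; }
static int f3(int x) { return x * 2; }

int f1(int cond, int x)
{
    /* Which function runs is only known at run time. */
    int (*target)(int) = cond ? f2 : f3;

    /* The call sequence is the same either way: it is derived from the
       signature int(int) and the platform's calling convention, not from
       whatever registers f2 or f3 happen to use internally. */
    return target(x);
}
```

With bodies this small an optimizing compiler may well inline both calls, but the convention is what makes the indirect call work when it cannot.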
"The compiler knows which registers a particular function needs during its execution."
No, it will generally not know this.
For one reason, a function can be from a (third party) library about which the compiler knows nothing. For another reason, what if that function calls another function, and that one another, and so on?
The compiler will just push all "suspect" registers onto the stack and pop them before returning.
I think as others have stated the arguments for a function are typically sent down via a number of registers (thereafter on the stack). Which registers are used depends on the compiler – for gcc see GNU C/assembler: http://cs.lmu.edu/~ray/notes/gasexamples/
A number of principles worth noting:
stack frame
caller (the function calling f1) and callee functions (your f1, f2... functions)
volatile and non-volatile registers. For your question you don't need to worry about non-volatile registers, since a callee must preserve them.
Each function has a stack frame: an expandable block of the stack that temporarily stores data that needs to be moved in and out of registers.
Before each function call (to the callee from the caller), the values you wish to pass down, i.e. your arguments, will be placed in a number of preordained registers (typically 4-6, depending on the compiler; see link). If there are more arguments than preordained registers, the additional values are stored on the stack (typically in the caller's stack frame).
If these preordained registers are being used by the caller, then the compiler will push these values onto the caller's stack frame before assigning the arguments to the registers before making the call to the callee (e.g. your f1 function). Once the called function (callee) returns, these values are restored to their respective registers from the stack.
It doesn't matter how, or in what order, a series of functions is called: the same system is followed when the compiler converts your C code to assembly/opcodes.
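Here is a small sketch of the save/restore situation described above. The function names are hypothetical, and the register names in the comments assume x86-64 System V, where the first integer argument travels in edi.

```c
/* Hypothetical callee; its single argument arrives in the first
   argument register (edi, assuming x86-64 System V). */
int callee(int v)
{
    return v + 1;
}

int caller(int a, int b)
{
    /* On entry a occupies the first argument register and b the second.
       To call callee(b), b has to be moved into the first argument
       register, which would overwrite a. Since a is still needed after
       the call, the compiler first parks it somewhere safe: on caller's
       stack frame, or in a non-volatile register that it saves itself. */
    int r = callee(b);
    return a + r;
}
```

This happens regardless of what callee does internally; the compiler only relies on the convention.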
From the GCC documentation
On the Intel x86, the force_align_arg_pointer attribute may be applied to individual function definitions, generating an alternate prologue and epilogue that realigns the runtime stack. This supports mixing legacy codes that run with a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. The alternate prologue and epilogue are slower and bigger than the regular ones, and the alternate prologue requires a scratch register; this lowers the number of registers available if used in conjunction with the regparm attribute. The force_align_arg_pointer attribute is incompatible with nested functions; this is considered a hard error.
Specifically, I want to know what is a prologue, epilogue, and SSE compatibility?
From gcc manual:
void TARGET_ASM_FUNCTION_PROLOGUE (FILE *file, HOST_WIDE_INT size)
The prologue is responsible for setting up the stack frame, initializing the frame pointer register, saving registers that must be saved, and allocating size additional bytes of storage for the local variables. file is a stdio stream to which the assembler code should be output.
On machines that have “register windows”, the function entry code does not save on the stack the registers that are in the windows, even if they are supposed to be preserved by function calls; instead it takes appropriate steps to “push” the register stack, if any non-call-used registers are used in the function.
On machines where functions may or may not have frame-pointers, the function entry code must vary accordingly; it must set up the frame pointer if one is wanted, and not otherwise. To determine whether a frame pointer is wanted, the macro can refer to the variable frame_pointer_needed. The variable's value will be 1 at run time in a function that needs a frame pointer.
void TARGET_ASM_FUNCTION_EPILOGUE (FILE *file, HOST_WIDE_INT size)
If defined, a function that outputs the assembler code for exit from a function. The epilogue is responsible for restoring the saved registers and stack pointer to their values when the function was called, and returning control to the caller. This macro takes the same arguments as the macro TARGET_ASM_FUNCTION_PROLOGUE, and the registers to restore are determined from regs_ever_live and CALL_USED_REGISTERS in the same way.
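To connect this to ordinary C code, the sketch below marks where a prologue and epilogue would sit in a hypothetical function. The instruction sequences in the comments are what unoptimized GCC output for x86-64 commonly looks like; they are an assumption, not something the manual guarantees, and optimized builds often omit the frame pointer.

```c
/* Hypothetical function, used only to show where the compiler-generated
   prologue and epilogue go. */
int sum_squares(int n)
{
    /* Prologue (assumed typical -O0 x86-64 output):
         push %rbp          save the caller's frame pointer
         mov  %rsp, %rbp    set up this function's frame pointer
         sub  $N, %rsp      allocate space for the locals below */
    int squares[8];
    int total = 0;

    if (n > 8)
        n = 8;             /* stay inside the local array */
    for (int i = 0; i < n; ++i) {
        squares[i] = i * i;
        total += squares[i];
    }

    /* Epilogue (assumed):
         leave              restore %rsp and %rbp
         ret                return control to the caller */
    return total;
}
```

Neither sequence is written by the programmer; the compiler emits both around the function body.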
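SSE (Streaming SIMD Extensions) is an x86 instruction set extension that adds a set of 128-bit CPU registers (xmm0, xmm1, ...). Each register can be packed with four 32-bit floating-point values, after which a single instruction operates on all four elements simultaneously. In contrast, it may take four or more instructions in regular scalar code to do the same thing. "SSE compatibility" in the quoted passage refers to the 16-byte stack alignment that aligned SSE loads and stores rely on.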
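A minimal sketch of the "four at once" idea, using the SSE intrinsics from xmmintrin.h. The function name add4 is made up, and a scalar fallback is included on the assumption that the code might be built for a target without SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Add four floats from a and b element-wise and store them in out. */
void add4(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* One instruction adds all four 32-bit lanes. */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    /* Scalar fallback: four separate additions. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The aligned variants (_mm_load_ps, _mm_store_ps) require 16-byte aligned addresses, which is where the stack alignment mentioned in the GCC documentation comes in.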