Implementing stack backtrace without using ebp - c

How can a stack backtrace be implemented when the compiler is explicitly told not to use ebp as the stack frame pointer?

The answer to this was only ever in comments on the accepted answer on What is the purpose of the EBP frame pointer register?.
Modern debuggers can do stack backtraces even in code compiled with -fomit-frame-pointer, which recent gcc versions enable by default at -O1 and above.
gcc puts the necessary stack-unwind info into the .eh_frame section, indexed by a .eh_frame_hdr section for fast lookup. See this blog post for more details. It's used for runtime exceptions, too. You'll find .eh_frame_hdr (with objdump -h) in most binaries on a Linux system. It's about 16k for /bin/bash, vs. 572B for GNU /bin/true and 108k for ffmpeg.
There is a gcc option to disable generating it, but it's a "normal" data section, not a debug section that strip removes by default. If strip removed it, you couldn't backtrace through a library function that has no debug symbols. That section may be bigger than the push/mov/pop instructions it replaces, but it has near-zero runtime cost, whereas those instructions cost execution time and space in caches such as the uop cache.
I think the info stored there is a mapping from return address to stack frame size. Since every call instruction pushes the address of the following instruction onto the stack, you can identify the caller from that address. Instead of pushing ebp to build a linked list of stack frames on the stack, the offset to the next return address is stored in the unwind tables, where any code that needs to backtrace can look it up.
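To make the idea concrete, here is a minimal C sketch of table-driven unwinding over a simulated stack, with no frame pointer consulted anywhere. It assumes a hypothetical, much-simplified table (struct unwind_entry) that maps a range of code addresses to a fixed frame size in words; real .eh_frame data is DWARF call frame information, which describes how the frame changes instruction by instruction and is far richer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified unwind-table entry: for code addresses in
   [start, end), the function's frame holds frame_words words between the
   stack pointer and the return address pushed by its caller's call. */
struct unwind_entry {
    uintptr_t start, end;
    size_t frame_words;
};

static const struct unwind_entry *
find_entry(const struct unwind_entry *tab, size_t n, uintptr_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tab[i].start && pc < tab[i].end)
            return &tab[i];
    return NULL;                      /* no unwind info for this address */
}

/* Walk a simulated downward-growing stack: stack[0] is the word at the
   current SP, higher indices belong to older (caller) frames. Writes up to
   max return addresses to out[] and returns how many were found. */
size_t unwind(const struct unwind_entry *tab, size_t ntab,
              const uintptr_t *stack, size_t stack_words,
              uintptr_t pc, uintptr_t *out, size_t max)
{
    size_t sp = 0, count = 0;

    while (count < max) {
        /* A return address points just past the call, which may be the
           last instruction of the caller, so look up ra - 1 instead. */
        uintptr_t lookup = (count == 0) ? pc : pc - 1;
        const struct unwind_entry *e = find_entry(tab, ntab, lookup);
        if (e == NULL)
            break;

        size_t ra_slot = sp + e->frame_words;
        if (ra_slot >= stack_words)
            break;                    /* ran off the end of the stack */

        pc = stack[ra_slot];          /* the caller's return address */
        if (pc == 0)
            break;                    /* 0 marks the outermost frame here */
        out[count++] = pc;
        sp = ra_slot + 1;             /* popping it gives the caller's SP */
    }
    return count;
}
```

Each step only needs the current PC and SP: the table says how far above SP the return address lives, and that return address in turn selects the caller's table entry.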

Related

Adding the -O2 option when cross-compiling causes the unwind backtrace to fail

When I add -funwind-tables while cross-compiling, I can successfully unwind the backtrace through the libgcc interface (_Unwind_Backtrace and _Unwind_VRS_Get).
But when I also add the -O2 option while cross-compiling, the unwind backtrace fails. I used -Q -O2 --help=optimizers to print the optimizations that -O2 enables and tested them individually, but the results differ from those with -O2 itself, which is very strange.
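For reference, here is a minimal sketch of collecting a backtrace through the libgcc interface the question mentions. It uses _Unwind_Backtrace with _Unwind_GetIP (both declared in GCC's <unwind.h>) rather than _Unwind_VRS_Get, which is specific to the ARM EHABI; the names trace_buf, trace_cb and capture_backtrace are illustrative. It only works where unwind tables were emitted, e.g. with -funwind-tables as in the question (x86-64 gcc emits them by default).

```c
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>   /* GCC/libgcc unwinder interface */

struct trace_buf {
    uintptr_t *pcs;   /* caller-provided storage */
    size_t len, cap;
};

/* Called by _Unwind_Backtrace once per frame, innermost first. */
static _Unwind_Reason_Code trace_cb(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_buf *tb = arg;
    if (tb->len >= tb->cap)
        return _URC_END_OF_STACK;           /* buffer full: stop walking */
    tb->pcs[tb->len++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;                  /* continue to the next frame */
}

/* Fill pcs[] with up to cap program counters; returns how many. */
size_t capture_backtrace(uintptr_t *pcs, size_t cap)
{
    struct trace_buf tb = { pcs, 0, cap };
    _Unwind_Backtrace(trace_cb, &tb);
    return tb.len;
}
```

The unwinder reads the unwind tables, not a frame pointer, so a walk that stops early usually points at a function for which no table entry was generated.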
You haven't told us which ARM architecture you are building for, but assuming it's a 32-bit architecture, enabling -O2 also enables -fomit-frame-pointer (see the entry for -fomit-frame-pointer in the GCC manual).
The frame-pointer
The frame pointer contains the base of the current function's stack frame (and with the knowledge that the caller's frame pointer is stored on the stack, a linked list of all stack frames in the call-tree). It's usually described as fp in documentation - a synonym for r11.
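A minimal C sketch of walking that linked list, using hand-built frame records rather than the live stack. It assumes each record holds the caller's saved frame pointer followed by the return address, which matches the x86 layout ([ebp] = saved ebp, [ebp+4] = return address) and AArch64; where 32-bit ARM compilers place fp and lr in the record varies with ABI and compiler, so struct frame_record is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed frame-record layout: saved caller FP, then return address. */
struct frame_record {
    const struct frame_record *prev;  /* caller's saved frame pointer */
    uintptr_t ret;                    /* return address into the caller */
};

/* Follow the chain starting at fp, storing up to max return addresses. */
size_t walk_frames(const struct frame_record *fp, uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        out[n++] = fp->ret;
        /* On a downward-growing stack every caller's record lives at a
           higher address. Anything else means the chain is broken (for
           example, a function reused fp as a scratch register), so stop
           instead of following garbage. */
        if (fp->prev != NULL && (uintptr_t)fp->prev <= (uintptr_t)fp)
            break;
        fp = fp->prev;
    }
    return n;
}
```

The sanity check is what makes a broken chain end the walk instead of crashing it, which is exactly the situation described below when fp is used as a general-purpose register.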
Omitting the frame-pointer
The ARM register file is small at 16 registers - one of which is the program counter.
The frame pointer is one of the 15 that remain and is used only for debugging and diagnostics, specifically to provide stack walk-backs and symbolic debugging.
-fomit-frame-pointer tells the compiler not to maintain a frame pointer, thus freeing r11 for other uses and potentially avoiding spilling variables from registers to the stack. It also saves 4 bytes of stack storage per stack frame, plus a store to and a load from the stack.
Naturally, if fp is used as general purpose register, its contents are undefined and walk-backs won't work.
You probably want to re-enable the frame pointer with -fno-omit-frame-pointer for your own sanity.

Stacktrace on ARM cortex-M4

When I run into a fault handler on my ARM Cortex-M4 (Thumb), I get a snapshot of the CPU registers just before the fault occurred. From this I can find where the stack pointer was. Now I want to backtrace through all the functions it passed through. The only problem I see is that I don't have a frame pointer, so I cannot really see where a given subroutine saved the LR, ad infinitum.
How would one tackle this problem if the frame pointer is not available in r7?
This blog post discusses this issue with reference to the MIPS architecture - the principles can be readily adapted to ARM architectures.
In short, it describes three possibilities for locating the stack frame for a given SP and PC:
Using compiler-generated debug information (not included in the executable image) to calculate it.
Using compiler-generated stack-unwinding (exception handling) information (included in the executable image) to calculate it.
Scanning the call site to locate the prologue or epilogue code that adjusts the stack pointer, and deducing the stack frame address from that.
Obviously it's very compiler- and compiler-option dependent, and not guaranteed to work in all cases.
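As an illustration of the third approach, here is a minimal C sketch of prologue analysis for 16-bit Thumb code. It assumes you already know the function's entry address (e.g. from the symbol table) and that the prologue uses only the 16-bit encodings PUSH {rlist, lr} (0xB500 | rlist) and SUB sp, sp, #imm (0xB080 | imm/4); 32-bit Thumb-2 forms such as PUSH.W, SUB.W or VPUSH are not handled. struct prologue_info and scan_thumb_prologue are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct prologue_info {
    int found_push;     /* 1 if a PUSH {..., lr} was found */
    size_t lr_offset;   /* bytes from SP (after prologue) to the saved LR */
    size_t frame_size;  /* bytes from SP (after prologue) to the caller's SP */
};

static int popcount8(unsigned v)
{
    int c = 0;
    while (v) { c += v & 1u; v >>= 1; }
    return c;
}

/* Scan the first n 16-bit halfwords of a Thumb function. */
struct prologue_info scan_thumb_prologue(const uint16_t *code, size_t n)
{
    struct prologue_info pi = {0, 0, 0};
    size_t pushed_words = 0, sub_bytes = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned insn = code[i];
        if (!pi.found_push && (insn & 0xFF00u) == 0xB500u) {
            /* PUSH {rlist, lr}: bit 8 set means LR is in the list */
            pushed_words = (size_t)popcount8(insn & 0xFFu) + 1;
            pi.found_push = 1;
        } else if ((insn & 0xFF80u) == 0xB080u) {
            /* SUB sp, sp, #imm7 * 4 */
            sub_bytes += (size_t)(insn & 0x7Fu) * 4;
        }
        /* Anything else (e.g. "add r7, sp, #imm") is ignored. */
    }
    if (pi.found_push) {
        /* PUSH stores the lowest register at the lowest address, so LR is
           the highest pushed word, directly below the caller's SP. */
        pi.lr_offset = sub_bytes + 4 * (pushed_words - 1);
        pi.frame_size = sub_bytes + 4 * pushed_words;
    }
    return pi;
}
```

Given SP from the fault handler's register snapshot, the caller's return address would be the word at SP + lr_offset and the caller's SP would be SP + frame_size; repeating this per frame chases the chain of saved LRs. This only holds if the fault happened after the prologue had finished executing.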
R7 is not the frame pointer on the M4; it's R11. R7 is the FP for Cortex-M0+/M1, where generally only the lower registers are available. In any case, when Cortex-M code calls a function using BL and its variants, the return address is saved in LR (the link register). At entry to a non-leaf function, LR is saved onto the stack. So, in theory, to get a call trace you would "chase" the chain of saved LRs.
Unfortunately, the location where LR is saved on the stack is not defined by the calling convention, and it must be deduced from the debug info for that function entry in the DWARF records (in the .elf file). I do not know if there is a utility that would extract the LR locations from an ELF file, but it should not be too difficult to write one.
Richard at ImageCraft is right.
More information can be found here
This works fine with C code. I had a harder time applying it to C++, but it's not impossible.

Trying to understand gcc option -fomit-frame-pointer

I asked Google for the meaning of the gcc option -fomit-frame-pointer, and it pointed me to the statement below.
-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
As far as I know, for each function an activation record is created on the process's stack to hold all its local variables and some other information. I assume the frame pointer means the address of a function's activation record.
In that case, which types of functions don't need to keep the frame pointer in a register? If I knew that, I would try to design new functions accordingly (if possible), because if the frame pointer is not kept in a register, some instructions are omitted from the binary. This could noticeably improve performance in an application with many functions.
Most smaller functions don't need a frame pointer - larger functions MAY need one.
It's really about how well the compiler manages to track how the stack is used, and where things are on the stack (local variables, arguments passed to the current function and arguments being prepared for a function about to be called). I don't think it's easy to characterize the functions that need or don't need a frame pointer (technically, NO function HAS to have a frame pointer - it's more a case of "if the compiler deems it necessary to reduce the complexity of other code").
I don't think you should "attempt to make functions not have a frame pointer" as part of your strategy for coding - like I said, simple functions don't need them, so use -fomit-frame-pointer, and you'll get one more register available for the register allocator, and save 1-3 instructions on entry/exit to functions. If your function needs a frame pointer, it's because the compiler decides that's a better option than not using a frame pointer. It's not a goal to have functions without a frame pointer, it's a goal to have code that works both correctly and fast.
Note that "not having a frame pointer" should give better performance, but it's not a magic bullet that gives enormous improvements, particularly not on x86-64, which already has 16 registers to start with. 32-bit x86 has only 8 registers, one of which is the stack pointer, so taking up another as the frame pointer means 25% of the register space is used. Reducing that to 12.5% is quite an improvement. Of course, compiling for 64-bit helps quite a lot too.
This is all about the BP/EBP/RBP register on Intel platforms. Memory accesses through this register default to the stack segment (no segment-override prefix is needed to access the stack segment).
The EBP is the best choice of register for accessing data structures, variables and dynamically allocated work space within the stack. EBP is often used to access elements on the stack relative to a fixed point on the stack rather than relative to the current TOS. It typically identifies the base address of the current stack frame established for the current procedure. When EBP is used as the base register in an offset calculation, the offset is calculated automatically in the current stack segment (i.e., the segment currently selected by SS). Because SS does not have to be explicitly specified, instruction encoding in such cases is more efficient. EBP can also be used to index into segments addressable via other segment registers.
( source - http://css.csail.mit.edu/6.858/2017/readings/i386/s02_03.htm )
Since on most 32-bit platforms the data segment and the stack segment are the same, this association of EBP/RBP with the stack is no longer an issue. The same holds on 64-bit platforms: the x86-64 architecture, introduced by AMD in 2003, has largely dropped support for segmentation in 64-bit mode, and the bases of four of the segment registers (CS, SS, DS and ES) are forced to 0. On both 32-bit and 64-bit x86 platforms, this essentially means that the EBP/RBP register can be used without any prefix in instructions that access memory.
So the compiler option you wrote about allows the BP/EBP/RBP to be used for other means, e.g., to hold a local variable.
"This avoids the instructions to save, set up and restore frame pointers" refers to omitting the following code on entry to each function:
push ebp
mov ebp, esp
or the enter instruction, which was very useful on Intel 80286 and 80386 processors.
Likewise, before the function returns, the following code is executed:
mov esp, ebp
pop ebp
or the leave instruction.
Debugging tools may scan the stack data and use these pushed EBP values to locate call sites, i.e., to display the names of the functions and their arguments in the order in which they were called.
Programmers may ask about stack frames not in the broad sense (a single entity on the stack that serves one function call and holds its return address, arguments and local variables) but in a narrow sense, when the term stack frame is mentioned in the context of compiler options. From the compiler's perspective, a stack frame is just the entry and exit code for the routine, which pushes an anchor onto the stack that can also be used for debugging and for exception handling. Debugging tools may follow these anchors to back-trace the stack and locate call sites.
That's why it is vital for a programmer to understand what a stack frame means in terms of compiler options: the compiler controls whether this code is generated or not.
In some cases, the stack frame (entry and exit code for the routine) can be omitted by the compiler, and the variables are then accessed directly via the stack pointer (SP/ESP/RSP) rather than the convenient base pointer (BP/EBP/RBP).
Conditions for a compiler to omit the stack frames for some functions may be different, for example: (1) the function is a leaf function (i.e., an end-entity that doesn't call other functions); (2) no exceptions are used; (3) no routines are called with outgoing parameters on the stack; (4) the function has no parameters.
Omitting stack frames (entry and exit code for the routine) can make code smaller and faster. However, it may also impair debuggers' ability to back-trace the stack and display it to the programmer. Compiler options determine which conditions a function must satisfy for the compiler to give it stack frame entry and exit code. For example, a compiler may have options to add such entry and exit code to functions (a) always, (b) never, or (c) when needed (specifying the conditions).
Returning from generalities to particulars: with the GCC option -fomit-frame-pointer you gain twice, by dropping the entry and exit code of the routine and by having an additional register available. (If the option is already turned on by default, either directly or implicitly through other options, you already have these gains, and specifying it explicitly adds nothing.) Note, however, that in 16-bit and 32-bit modes, unlike AX with its AL and AH halves, the BP register has no separately addressable 8-bit parts.
Besides allowing the compiler to use EBP as a general-purpose register in optimizations, this option also prevents generating the entry and exit code for the stack frame, which complicates debugging. That's why the GCC documentation explicitly states (unusually, in bold) that enabling this option makes debugging impossible on some machines.
Please also be aware that other compiler options, related to debugging or optimization, may implicitly turn the -fomit-frame-pointer option ON or OFF.
I didn't find any official information at gcc.gnu.org about how other options affect -fomit-frame-pointer on x86 platforms; the GCC 3.4.4 manual at https://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Optimize-Options.html only states the following:
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.
So the documentation itself does not make clear whether -fomit-frame-pointer is turned on if you compile with just a single -O option on an x86 platform. This can be tested empirically, but then there is no commitment from the GCC developers not to change the behavior of this option in the future without notice.
However, Peter Cordes has pointed out in comments that there is a difference for the default settings of the -fomit-frame-pointer between x86-16 platforms and x86-32/64 platforms.
This option, -fomit-frame-pointer, is also relevant to the Intel C++ Compiler 15.0, not only to GCC:
For the Intel Compiler, this option has an alias /Oy.
Here is what Intel wrote about it:
These options determine whether EBP is used as a general-purpose register in optimizations. Options -fomit-frame-pointer and /Oy allow this use. Options -fno-omit-frame-pointer and /Oy- disallow it.
Some debuggers expect EBP to be used as a stack frame pointer, and cannot produce a stack back-trace unless this is so. The -fno-omit-frame-pointer and /Oy- options direct the compiler to generate code that maintains and uses EBP as a stack frame pointer for all functions so that a debugger can still produce a stack back-trace without doing the following:
For -fno-omit-frame-pointer: turning off optimizations with -O0
For /Oy-: turning off /O1, /O2, or /O3 optimizations
The -fno-omit-frame-pointer option is set when you specify option -O0 or the -g option. The -fomit-frame-pointer option is set when you specify option -O1, -O2, or -O3.
The /Oy option is set when you specify the /O1, /O2, or /O3 option. Option /Oy- is set when you specify the /Od option.
Using the -fno-omit-frame-pointer or /Oy- option reduces the number of available general-purpose registers by 1 and can result in slightly less efficient code.
NOTE For Linux* systems: There is currently an issue with GCC 3.2 exception handling. Therefore, the Intel compiler ignores this option when GCC 3.2 is installed for C++ and exception handling is turned on (the default).
Please be aware that the above quote applies only to the Intel C++ 15 compiler, not to GCC.
I haven't come across the term "activation record" before, but I would assume it refers to what is normally called a "stack frame", i.e. the area on the stack used by the current function.
The frame pointer is a register that holds the address of the current function's stack frame. If a frame pointer is used then on entering the function the old frame pointer is saved to the stack and the frame pointer is set to the stack pointer. On leaving the function the old frame pointer is restored.
Most normal functions don't need a frame pointer for their own operation. The compiler can keep track of the stack pointer offset on all codepaths through the function and generate local variable accesses accordingly.
A frame pointer may be important in some contexts for debugging and exception handling. This is becoming increasingly rare though as modern debugging and exception handling formats are designed to support functions without frame pointers in most cases.
The main time a frame pointer is needed nowadays is if a function uses alloca or variable length arrays. In this case the value of the stack pointer cannot be tracked statically.
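A minimal C sketch of such a function, using a C99 variable-length array whose size is only known at run time. Compiled with gcc -O2 -S on x86-64, you should typically see rbp set up as a frame pointer in this function even though -fomit-frame-pointer is in effect, unless the optimizer removes the array altogether; the function name is just illustrative.

```c
#include <stddef.h>

/* The VLA below moves the stack pointer by an amount that depends on n,
   so the compiler cannot address the frame with fixed SP offsets alone. */
long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;                 /* a zero-length VLA is undefined behavior */

    long squares[n];              /* size known only at run time */
    long total = 0;

    for (size_t i = 0; i < n; i++)
        squares[i] = (long)(i * i);
    for (size_t i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```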

In ELF or DWARF, how can I get .PLT section values? -- Trying to get the address of a function on where an instrumentation tool is in

I am working on obtaining all the data of a program using its ELF and DWARF info and by hooking a Pin tool into a process that is currently running; it is a kind of debugger built on a Pin tool.
For getting the local variables from the stack I am working with the registers EIP, EBP and ESP which I have access to from Pin.
What struck me as weird is that I expected EIP to point into the function that was running when the Pin tool attached to the process, but instead EIP points into the .PLT section. In other words, if the Pin tool attached while Foo() was running, I expected EIP to point to some address inside Foo. However, it points to the beginning of the .PLT section.
What I need to know is which function the process is currently in. Is there any way to get the address of the function using the .PLT section? Are there other ways to get the address of the function, from the stack or using Pin? I hope I was clear enough; let me know if there are any questions.
I might not understand exactly what is going on here... is the instruction pointer really in the .plt section, or are you just getting a garbage value from Pin?
You call the instruction pointer you are reading EIP, which might be a problem if you are running on a 64-bit system. Is that the case?
You see, the instruction pointer register is a 32-bit value on a 32-bit system and a 64-bit value on a 64-bit system. So Pin actually provides three REG_* names for the instruction pointer: REG_EIP, REG_RIP and REG_INST_PTR. REG_EIP is always the lower 32-bit half of the register, REG_RIP the 64-bit value, and REG_INST_PTR whichever of the two matches your architecture. Asking for EIP on a 64-bit system gives you garbage, and so does asking for RIP on a 32-bit one.
Otherwise, a quick Google search gives me this. Quoting a bit:
By default the .plt entries are all initialized by the linker not to point to the correct target functions, but instead to point to the dynamic loader itself. Thus, the first time you call any given function, the dynamic loader looks up the function and fixes the target of the .plt so that the next time this .plt slot is used we call the correct function.
And more importantly:
It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application—this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example.
Hope that helps.

Is there a function to invoke a stack dump in C?

Can someone please provide an implementation of a C function that dumps the current stack when invoked? It's for an x86 Linux system. It can be invoked in two ways: explicitly by another function, or after a crash (probably from a trap/interrupt handler). The output can go either to the screen or to a file, as indicated by a parameter (handle). A clear explanation or comments on how the stack is unwound would be very helpful. Thank you.
The documentation for the backtrace() function is in the GNU LIBC MANUAL.
Following on from Adam's answer, the source code that shows how the actual stack backtracing is performed is in GNU libc's backtrace(), under /libc/debug/backtrace.c (not sure if the full link below will get through Stack Overflow's HTML filters...)
http://cvs.savannah.gnu.org/viewvc/*checkout*/libc/debug/backtrace.c?root=libc&revision=1.1.2.1&content-type=text%2Fplain
When function calls are nested, the stack grows downwards and builds a chain of stack frames. At any given point in a program it is theoretically possible to backtrace the sequence of stack frames to the original calling point. The backtrace() function navigates the stack frames from the calling point to the beginning of the program and provides an array of return addresses. The implementation of backtrace() in the glibc library contains platform-specific code for each platform.
In the case of an x86 platform, the contents of the ebp (base pointer) and esp (stack pointer) CPU registers, which hold the address of the current stack frame and of the stack pointer for any given function, are used to follow the chain of pointers and move up to the initial stack frame. This allows the sequence of return addresses to be gathered to build the backtrace.
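A minimal sketch of the function the question asks for, built on glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. dump_stack and MAX_FRAMES are illustrative names, and the fd parameter plays the role of the requested handle (pass STDERR_FILENO for the screen, or the descriptor of an open file). Function names only appear in the output for symbols in the dynamic symbol table, so linking with -rdynamic usually helps.

```c
#include <execinfo.h>   /* glibc: backtrace(), backtrace_symbols_fd() */
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_FRAMES 64

/* Write the current call stack to fd, one frame per line.
   Returns the number of frames captured. */
int dump_stack(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Unlike backtrace_symbols(), this does not call malloc(), which makes
       it the safer choice when dumping from a crash handler. */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

If you call this from a signal handler, be aware that the first call to backtrace() may load libgcc dynamically, which can itself call malloc(); calling dump_stack() once at startup avoids that.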
If you would like to know more information on how backtrace() works and how to use it, I would recommend reading Stack Backtracing Inside Your Program (LINUX Journal).
Since you mentioned executing a backtrace from a signal handler for an x86 platform, I would like to add to Adam's answer and direct you to my response to the question he linked to for details on how to ensure a backtrace from a signal handler points to the actual location of the fault.
