Call stack backtrace in C - c

I am trying to get call stack backtrace at my assert/exception handler. Can't include "execinfo.h" therefore can't use int backtrace(void **buffer, int size);.
Also, tried to use __builtin_return_address() but acording to :http://codingrelic.geekhold.com/2009/05/pre-mortem-backtracing.html
... on some architectures, including my beloved MIPS, only __builtin_return_address(0) works.MIPS has no frame pointer, making it difficult to walk back up the stack. Frame 0 can use the return address register directly.
How can I reproduce full call stack backtrace?

I have successfully used the method described here, to get a call trace from stack on MIPS32.
You can then print out the call stack:
void *retaddrs[16];
int n, i;
n = get_call_stack_no_fp (retaddrs, 16);
printf ("CALL STACK: ");
for (i = 0; i < n; i++) {
printf ("0x%08X ", (uintptr_t)retaddrs[i]);
}
printf ("\r\n");
... and if you have the ELF file, then use the addr2line to convert the return addresses to function names:
addr2line -a -f -p -e xxxxxxx.elf addr addr ...
There are of course many gotchas, when using a method like this, including interrupts and exception handlers or results of code optimization. But nevertheless, it might be helpful sometimes.

I have successfully used the method suggested by #Erki A and described here.
Here is a short summary of the method:
The problem:
get a call stack without a frame pointer.
Solution main idea:
conclude from the assembly code what the debugger understood from debug info.
The information we need:
1. Where the return address is kept.
2. What amount the stack pointer is decremented.
To reproduce the whole stack trace one need to:
1. Get the current $sp and $ra
2. Scan towards the beginning of the function and look for "addui
sp,sp,spofft" command (spofft<0)
3. Reprodece prev. $sp (sp- spofft)
4. Scan forward and look for "sw r31,raofft(sp)"
5. Prev. return address stored at [sp+ raofft]
Above I described one iteration. You stop when the $ra is 0.
How to get the first $ra?
__builtin_return_address(0)
How to get the first $sp?
register unsigned sp asm("29");
asm("" : "=r" (sp));
***Since most of my files compiled with micro-mips optimisation I had to deal with micro-mips-ISA.
A lot of issues arose when I tried to analyze code that compiled with microMips optimization(remember that the goal at each step is to reproduce prev. ra and prev. sp):
It makes things a bit more complicated:
1. ra ($31) register contain unaligned return address.
You may find more information at Linked questions.
The unaligned ra helps you understand that you run over different
ISA(micro-mips-isa)
2. There are functions that do not move the sp. You can find more
information [here][3].
(If a "leaf" function only modifies the temporary registers and returns
to a return statement in its caller's code, then there is no need for
$ra to be changed, and there is no need for a stack frame for that
function.)
3. Functions that do not store the ra
4. MicroMips instructions can be both - 16bit and 32bit: run over the
commnds using unsinged short*.
5. There are functions that perform "addiu sp, sp, spofft" more than once
6. micro-mips-isa has couple variations for the same command
for example: addiu,addiusp.
I have decided to ignore part of the issues and that is why it works for 95% of the cases.

Related

How do I call hex data stored in an array with inline assembly?

I have an OS project that I am working on and I am trying to call data that I have read from the disk in C with inline assembly.
I have already tried reading the code and executing it with the assembly call instruction, using inline assembly.
void driveLoop() {
uint16_t sectors = 31;
uint16_t sector = 0;
uint16_t basesector = 40000;
uint32_t i = 40031;
uint16_t code[sectors][256];
int x = 0;
while(x==0) {
read(i);
for (int p=0; p < 256; p++) {
if (readOut[p] == 0) {
} else {
x = 1;
//kprint_int(i);
}
}
i++;
}
kprint("Found sector!\n");
kprint("Loading OS into memory...\n");
for (sector=0; sector<sectors; sector++) {
read(basesector+sector);
for (int p=0; p<256; p++) {
code[sector][p] = readOut[p];
}
}
kprint("Done loading.\n");
kprint("Attempting to call...\n");
asm volatile("call (%0)" : : "r" (&code));
When the inline assembly is called I expect it to run the code from the sectors I read from the "disk" (this is in a VM, because its a hobby OS). What it does instead is it just hangs.
I probably don't much understand how variables, arrays, and assembly work, so if you could fill me in, that would be nice.
EDIT: The data I am reading from the disk is a binary file that was added
to the disk image file with
cat kernel.bin >> disk.img
and the kernel.bin is compiled with
i686-elf-ld -o kernel.bin -Ttext 0x4C4B40 *insert .o files here* --oformat binary
What it does instead is it just hangs.
Run your OS inside BOCHS so you can use BOCHS's built-in debugger to see exactly where it's stuck.
Being able to debug lockups, including with interrupts disabled, is probably very useful...
asm volatile("call (%0)" : : "r" (&code)); is unsafe because of missing clobbers.
But even worse than that it will load a new EIP value from the first 4 bytes of the array, instead of setting EIP to that address. (Unless the data you're loading is an array of pointers, not actual machine code?)
You have the %0 in parentheses, so it's an addressing mode. The assembler will warn you about an indirect call without *, but will assemble it like call *(%eax), with EAX = the address of code[0][0]. You actually want a call *%eax or whatever register the compiler chooses, register-indirect not memory-indirect.
&code and code are both just a pointer to the start of the array; &code doesn't create an anonymous pointer object storing the address of another address. &code takes the address of the array as a whole. code in this context "decays" to a pointer to the first object.
https://gcc.gnu.org/wiki/DontUseInlineAsm (for this).
You can get the compiler to emit a call instruction by casting the pointer to a function pointer.
__builtin___clear_cache(&code[0][0], &code[30][255]); // don't optimize away stores into the buffer
void (*fptr)(void) = (void*)code; // casting to void* instead of the actual target type is simpler
fptr();
That will compile (with optimization enabled) to something like lea 16(%esp), %eax / call *%eax, for 32-bit x86, because your code[][] buffer is an array on the stack.
Or to have it emit a jmp instead, do it at the end of a void function, or return funcptr(); in a non-void function, so the compiler can optimize the call/ret into a jmp tailcall.
If it doesn't return, you can declare it with __attribute__((noreturn)).
Make sure the memory page / segment is executable. (Your uint16_t code[]; is a local, so gcc will allocate it on the stack. This might not be what you want. The size is a compile-time constant so you could make it static, but if you do that for other arrays in other sibling functions (not parent or child), then you lose out on the ability to reuse a big chunk of stack memory for different arrays.)
This is much better than your unsafe inline asm. (You forgot a "memory" clobber, so nothing tells the compiler that your asm actually reads the pointed-to memory). Also, you forgot to declare any register clobbers; presumably the block of code you loaded will have clobbered some registers if it returns, unless it's written to save/restore everything.
In GNU C you do need to use __builtin__clear_cache when casting a data pointer to a function pointer. On x86 it doesn't actually clear any cache, it's telling the compiler that the stores to that memory are not dead because it's going to be read by execution. See How does __builtin___clear_cache work?
Without that, gcc could optimize away the copying into uint16_t code[sectors][256]; because it looks like a dead store. (Just like with your current inline asm which only asks for the pointer in a register.)
As a bonus, this part of your OS becomes portable to other architectures, including ones like ARM without coherent instruction caches where that builtin expands to a actual instructions. (On x86 it purely affects the optimizer).
read(basesector+sector);
It would probably be a good idea for your read function to take a destination pointer to read into, so you don't need to bounce data through your readOut buffer.
Also, I don't see why you'd want to declare your code as a 2D array; sectors are an artifact of how you're doing your disk I/O, not relevant to using the code after it's loaded. The sector-at-a-time thing should only be in the code for the loop that loads the data, not visible in other parts of your program.
char code[sectors * 512]; would be good.

SPARC assembly jmp \boot

I'll explain the problem briefly. I have a Leon3 board (gr-ut-g99). Using GRMON2 I can load executables at the desired address in the board.
I have two programs. Let's call them A and B. I tried to load both in memory and individually they work.
What I would like to do now is to make the A program call the B program.
Both programs are written in C using a variant of the gcc compiler (the Gaisler Sparc GCC).
To do the jump I wrote a tiny inline assembler function in program A that jumps to a memory address where I loaded the program B.
below a snippet of the program A
unsigned int return_address;
unsigned int * const RAM_pointer = (unsigned int *) RAM_ADDRESS;
printf("RAM pointer set to: 0x%08x \n",(unsigned int)RAM_pointer);
printf("jumping...\n");
__asm__(" nop;" //clean the pipeline
"jmp %1;" // jmp to programB
:"=r" (return_address)
:"r" (RAM_pointer)
);
RAM_ADDRESS is a #define
#define RAM_ADDRESS 0x60000000
The program B is a simple hello world. The program B is loaded at the 0x60000000 address. If I try to run it, it works!
int main()
{
printf ("HELLO! I'M BOOTED! \n");
fflush(stdout);
return 0;
}
What I expect when I run the ProgramA, is to see the "jumping..." message on the console and then see the "HELLO! I'M BOOTED!" from the programB
What happens instead an IU exception.
Below I posted the messages show by grmon2 monitor. I also reported the "inst" report which should show the last operations performed before the exception.
grmon2> run
IU exception (tt = 0x07, mem address not aligned)
0x60004824: 9fc04000 call %g1
grmon2> inst
TIME ADDRESS INSTRUCTION RESULT SYMBOL
407085 600047FC mov %i3, %o2 [600063B8] -
407086 60004800 cmp %i4 [00000013] -
407089 60004804 be 0x60004970 [00000000] -
407090 60004808 mov %i0, %o0 [6000646C] -
407091 6000480C mov %i4, %o3 [00000013] -
407092 60004810 cmp %i4, %l0 [80000413] -
407108 60004814 bleu 0x60004820 [00000000] -
407144 60004818 ld [%i1 + 0x20], %o1 [FFFFFFFF] -
407179 60004820 ld [%i1 + 0x28], %g1 [FFFFFFFF] -
407186 60004824 call %g1 [ TRAP ] -
I also tried to substitute the "jmp" with a "jmpl" or a "call" but it does not worked.
I'm quite confused.
I do not know how to cope well with the problem and therefore I do not know what other information it is necessary to provide.
I can say that, the programB is loaded at 0x60000000 and the entry_point is, of course, 0x60000000. Running directly program B from that entry point it works good!
Thanks in advance for your help!
Looks to me like you did execute the jump, and it got to program B, as evidenced by the addresses of the instructions in the trace buffer. But where you crashed was in stdio trying to print stuff. Stdio makes extensive use of function pointers, and the sequence clearly shows a call instruction with the target address in a register, which indicates use of a function pointer.
I suggest putting fflush(stdout) in program A just before the jump, and this will allow you to see the messages before doing the jump. Then, in program B, instead of using printf, just put some known value in memory that you can examine later via the monitor to verify that it got there.
My guess is that the stdio library has some data or parameter that needs to be set up at the start of the program, and that's not being done or not done properly. Not sure about the platform you are running on, but do you have some sort of debugging or single stepping ability, like in a debugger? If so, just single step through the jump and follow where the program goes.

Buffer overflow in C

I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out, see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%d\n", p - buffer); // computes offset
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry out into another byte. Otherwise the single byte increment would not set the return address how we expected. This example may break if you get unlucky with the return address within main)
I overlooked the 64-bitness of the original question somehow. The equivalent for x86-64 is 8(%rbp) because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above substitute 8(%%rbp) using the double %% to get a single % in the output assembly. This is described in this ABI document. Search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be negative positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise if the two addresses cross a 256-byte boundary you will have exploited your exploit....
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = dummy + 24; /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.

Why GDB jumps unpredictably between lines and prints variables as "<value optimized out>"?

Can anyone explain this behavior of gdb?
900 memset(&new_ckpt_info,'\0',sizeof(CKPT_INFO));
(gdb)
**903 prev_offset = cp_node->offset;**
(gdb)
**905 m_CPND_CKPTINFO_READ(ckpt_info,(char *)cb->shm_addr.ckpt_addr+sizeof(CKPT_** HDR),i_offset);
(gdb)
**903 prev_offset = cp_node->offset;**
(gdb)
**905 m_CPND_CKPTINFO_READ(ckpt_info,(char *)cb->shm_addr.ckpt_addr+sizeof(CKPT_ HDR),i_offset);**
(gdb)
**908 bitmap_offset = client_hdl/32;**
(gdb)
**910 bitmap_value = cpnd_client_bitmap_set(client_hdl%32);**
(gdb)
**908 bitmap_offset = client_hdl/32;**
(gdb)
**910 bitmap_value = cpnd_client_bitmap_set(client_hdl%32);**
(gdb)
**908 bitmap_offset = client_hdl/32;**
(gdb)
**910 bitmap_value = cpnd_client_bitmap_set(client_hdl%32);**
(gdb)
913 found = cpnd_find_exact_ckptinfo(cb , &ckpt_info , bitmap_offset , &offset , &prev_offset);
(gdb)
916 if(!found)
(gdb) p found
$1 = <value optimized out>
(gdb) set found=0
Left operand of assignment is not an lvalue.
Why after executing line 903 it again executes the same for 905 908 910?
Another things is found is a bool-type variable, so why it is showing value optimized out?
I am not able to set the value of found as well.
This seems to be a compiler optimization (in this case its -O2); how can I still set the value of found?
To debug optimized code, learn assembly/machine language.
Use the GDB TUI mode. My copy of GDB enables it when I type the minus and Enter. Then type C-x 2 (that is hold down Control and press X, release both and then press 2). That will put it into split source and disassembly display. Then use stepi and nexti to move one machine instruction at a time. Use C-x o to switch between the TUI windows.
Download a PDF about your CPU's machine language and the function calling conventions. You will quickly learn to recognize what is being done with function arguments and return values.
You can display the value of a register by using a GDB command like p $eax
Recompile without optimizations (-O0 on gcc).
Declare found as "volatile". This should tell the compiler to NOT optimize it out.
volatile int found = 0;
The compiler will start doing very clever things with optimisations turned on. The debugger will show the code jumping forward and backwards alot due to the optimized way variables are stored in registers. This is probably the reason why you can't set your variable (or in some cases see its value) as it has been cleverly distributed between registers for speed, rather than having a direct memory location that the debugger can access.
Compile without optimisations?
You pretty much can't set the value of found. Debugging optimized programs is rarely worth the trouble, the compiler can rearrange the code in ways that it'll in no way correspond to the source code (other than producing the same result), thus confusing debuggers to no end.
Typically, boolean values that are used in branches immediately after they're calculated like this are never actually stored in variables. Instead, the compiler just branches directly off the condition codes that were set from the preceding comparison. For example,
int a = SomeFunction();
bool result = --a >= 0; // use subtraction as example computation
if ( result )
{
foo();
}
else
{
bar();
}
return;
Usually compiles to something like:
call .SomeFunction ; calls to SomeFunction(), which stores its return value in eax
sub eax, 1 ; subtract 1 from eax and store in eax, set S (sign) flag if result is negative
jl ELSEBLOCK ; GOTO label "ELSEBLOCK" if S flag is set
call .foo ; this is the "if" black, call foo()
j FINISH ; GOTO FINISH; skip over the "else" block
ELSEBLOCK: ; label this location to the assembler
call .bar
FINISH: ; both paths end up here
ret ; return
Notice how the "bool" is never actually stored anywhere.
When debugging optimized programs (which may be necessary if the bug doesn't show up in debug builds), you often have to understand assembly compiler generated.
In your particular case, return value of cpnd_find_exact_ckptinfo will be stored in the register which is used on your platform for return values. On ix86, that would be %eax. On x86_64: %rax, etc. You may need to google for '[your processor] procedure calling convention' if it's none of the above.
You can examine that register in GDB and you can set it. E.g. on ix86:
(gdb) p $eax
(gdb) set $eax = 0
Im using QtCreator with gdb.
Adding
QMAKE_CXXFLAGS += -O0
QMAKE_CXXFLAGS -= -O1
QMAKE_CXXFLAGS -= -O2
QMAKE_CXXFLAGS -= -O3
Works well for me

C - How to create a pattern in code segment to recognize it in memory dump?

I dump my RAM (a piece of it - code segment only) in order to find where is which C function being placed. I have no map file and I don't know what boot/init routines exactly do.
I load my program into RAM, then if I dump the RAM, it is very hard to find exactly where is what function. I'd like to use different patterns build in the C source, to recognize them in the memory dump.
I've tryed to start every function with different first variable containing name of function, like:
char this_function_name[]="main";
but it doesn't work, because this string will be placed in the data segment.
I have simple 16-bit RISC CPU and an experimental proprietary compiler (no GCC or any well-known). The system has 16Mb of RAM, shared with other applications (bootloader, downloader). It is almost impossible to find say a unique sequence of N NOPs or smth. like 0xABCD. I would like to find all functions in RAM, so I need unique identificators of functions visible in RAM-dump.
What would be the best pattern for code segment?
If it were me, I'd use the symbol table, e.g. "nm a.out | grep main". Get the real address of any function you want.
If you really have no symbol table, make your own.
struct tab {
void *addr;
char name[100]; // For ease of searching, use an array.
} symtab[] = {
{ (void*)main, "main" },
{ (void*)otherfunc, "otherfunc" },
};
Search for the name, and the address will immediately preceed it. Goto address. ;-)
If your compiler has inline asm you can use it to create a pattern. Write some NOP instructions which you can easily recognize by opcodes in memory dump:
MOV r0,r0
MOV r0,r0
MOV r0,r0
MOV r0,r0
How about a completely different approach to your real problem, which is finding a particular block of code: Use diff.
Compile the code once with the function in question included, and once with it commented out. Produce RAM dumps of both. Then, diff the two dumps to see what's changed -- and that will be the new code block. (You may have to do some sort of processing of the dumps to remove memory addresses in order to get a clean diff, but the order of instructions ought to be the same in either case.)
Numeric constants are placed in the code segment, encoded in the function's instructions. So you could try to use magic numbers like 0xDEADBEEF and so on.
I.e. here's the disassembly view of a simple C function with Visual C++:
void foo(void)
{
00411380 push ebp
00411381 mov ebp,esp
00411383 sub esp,0CCh
00411389 push ebx
0041138A push esi
0041138B push edi
0041138C lea edi,[ebp-0CCh]
00411392 mov ecx,33h
00411397 mov eax,0CCCCCCCCh
0041139C rep stos dword ptr es:[edi]
unsigned id = 0xDEADBEEF;
0041139E mov dword ptr [id],0DEADBEEFh
You can see the 0xDEADBEEF making it into the function's source. Note that what you actually see in the executable depends on the endianness of the CPU (tx. Richard).
This is a x86 example. But RISC CPUs (MIPS, etc) have instructions moving immediates into registers - these immediates can have special recognizable values as well (although only 16-bit for MIPS, IIRC).
Psihodelia - it's getting harder and harder to catch your intention. Is it just a single function you want to find? Then can't you just place 5 NOPs one after another and look for them? Do you control the compiler/assembler/linker/loader? What tools are at your disposal?
As you noted, this:
char this_function_name[]="main";
... will end up setting a pointer in your stack to a data segment containing the string. However, this:
char this_function_name[]= { 'm', 'a', 'i', 'n' };
... will likely put all these bytes in your stack so you will be able to recognize the string in your code (I just tried it on my platform).
Hope this helps
Why not get each function to dump its own address. Something like this:
void* fnaddr( char* fname, void* addr )
{
printf( "%s\t0x%p\n", fname, addr ) ;
return addr ;
}
void test( void )
{
static void* fnaddr_dummy = fnaddr( __FUNCTION__, test ) ;
}
int main (int argc, const char * argv[])
{
static void* fnaddr_dummy = fnaddr( __FUNCTION__, main ) ;
test() ;
test() ;
}
By making fnaddr_dummy static, the dump is done once per-function. Obviously you would need to adapt fnaddr() to support whatever output or logging means you have on your system. Unfortunately, if the system performs lazy initialisation, you'll only get the addresses of the functions that are actually called (which may be good enough).
You could start each function with a call to the same dummy function like:
void identifyFunction( unsigned int identifier)
{
}
Each of your functions would call the identifyFunction-function with a different parameter (1, 2, 3, ...). This will not give you a magic mapfile, but when you inspect the code dump you should be able to quickly find out where the identifyFunction is because there will be lots of jumps to that address. Next scan for those jump and check before the jump to see what parameter is passed. Then you can make your own mapfile. With some scripting this should be fairly automatic.

Resources