call stack unwinding in ARM cortex m3 - c

I would like to create a debugging tool that will help me debug my application better.
I'm working bare-metal (without an OS), using IAR Embedded Workbench on Atmel's SAM3.
I have a watchdog timer that triggers a specific IRQ on timeout (this will be replaced with a software reset in the release build).
In the IRQ handler, I want to print out (over the UART) a stack trace showing exactly where the watchdog timeout occurred.
I searched the web and didn't find any implementation of that functionality.
Does anyone have an idea how to approach this kind of thing?
EDIT: OK, I managed to grab the return address from the stack, so I know exactly where the WDT timeout occurred.
Unwinding the whole stack is not as simple as it first appears, because each function pushes a different amount of data (saved registers and local variables) onto the stack.
The code I ended up with is this (for others who may find it useful):
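void WDT_IrqHandler( void )
{
    uint32_t *WDT_Address;
    Wdt *pWdt = WDT;
    volatile uint32_t dummy;

    /* 16 words = this handler's own stack usage in this particular build
       plus the stacked R0-R3, R12 and LR, which lands on the stacked return
       address. It assumes the interrupted code was running on the MSP and
       breaks if the compiler, its settings or this function change. */
    WDT_Address = (uint32_t *) __get_MSP() + 16;
    LogFatal("Watchdog Timer timeout, the return address is %#X", (unsigned int)*WDT_Address);

    /* Clear status bit to acknowledge interrupt */
    dummy = pWdt->WDT_SR;
    (void)dummy;
}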
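The fixed + 16 above only holds while the handler's own prologue pushes exactly the same amount, and it assumes the interrupted code was running on the MSP. A common, more robust variant is a naked assembly stub that hands the hardware-stacked frame to C, choosing MSP or PSP from bit 2 of the EXC_RETURN value in LR. The sketch below uses GCC syntax (IAR needs its own syntax for naked functions and inline assembly); exc_frame_t and wdt_report are hypothetical names, and wdt_report, which would log frame->pc and acknowledge the watchdog, is left for the application to provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame the Cortex-M3 pushes on exception entry, lowest address first:
 * R0, R1, R2, R3, R12, LR, return address (PC), xPSR. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address = the interrupted instruction */
    uint32_t xpsr;
} exc_frame_t;

_Static_assert(offsetof(exc_frame_t, pc) == 6 * sizeof(uint32_t),
               "stacked PC is the seventh word of the frame");

#if defined(__GNUC__) && (defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__))
/* Hypothetical: formats frame->pc etc., sends it over the UART and
 * acknowledges the watchdog. */
void wdt_report(const exc_frame_t *frame);

/* Naked: no compiler prologue, so MSP/PSP still point at the hardware
 * frame. Bit 2 of EXC_RETURN (in LR) tells which stack holds it. */
__attribute__((naked)) void WDT_IrqHandler(void)
{
    __asm volatile(
        "tst   lr, #4     \n"   /* bit 2 clear: frame is on the MSP */
        "ite   eq         \n"
        "mrseq r0, msp    \n"
        "mrsne r0, psp    \n"
        "b     wdt_report \n"); /* tail call: wdt_report(frame) */
}
#endif
```

Because the stub never adjusts the stack itself, the frame pointer it passes does not depend on how much the C code later pushes.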

ARM defines a pair of sections, .ARM.exidx and .ARM.extab, that contain enough information to unwind the stack without debug symbols. These sections exist for exception handling, but you can use them to perform a backtrace as well. Add -funwind-tables to force GCC to include these sections.
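For GCC specifically, libgcc exposes these tables through _Unwind_Backtrace() (declared in unwind.h), which calls back once per frame. The sketch below collects return addresses into a caller-supplied array; capture_backtrace and trace_state_t are names made up for this example. It is GCC/libgcc-only (it will not help with IAR), on ARM the code must be built with -funwind-tables, and when called from an ISR it starts at the handler's own frame: whether it can step across the hardware exception frame into the interrupted code is not guaranteed, so check it on your toolchain.

```c
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

/* State passed through _Unwind_Backtrace to the callback. */
typedef struct {
    uintptr_t *pcs;
    size_t     max;
    size_t     count;
} trace_state_t;

/* Called once per frame, innermost first; records the frame's address. */
static _Unwind_Reason_Code trace_cb(struct _Unwind_Context *ctx, void *arg)
{
    trace_state_t *st = arg;
    if (st->count >= st->max)
        return _URC_END_OF_STACK;      /* buffer full: stop unwinding */
    st->pcs[st->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;             /* continue with the caller */
}

/* Fills pcs[] with up to max code addresses and returns how many it found. */
size_t capture_backtrace(uintptr_t *pcs, size_t max)
{
    trace_state_t st = { pcs, max, 0 };
    _Unwind_Backtrace(trace_cb, &st);
    return st.count;
}
```

The addresses can then be printed over the UART and resolved to function names offline with the map file or addr2line.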

To do this with ARM, you will need to tell your compiler to generate stack frames. For instance with gcc, check the option -mapcs-frame. It may not be the one you need, but this will be a start.
If you do not have this, it will be nearly impossible to "unroll" the stack, because you will need for each function the exact stack usage depending on parameters and local variables.
If you are looking for some example code, you can check dump_stack() in the Linux kernel sources and find the related piece of code executed for ARM.
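Once frames are generated, the frame-pointer approach boils down to walking a linked list. The sketch below assumes the APCS frame layout that the ARM Linux frame-pointer unwinder reads (fp points at the saved PC slot, with the saved LR, saved SP and the caller's fp in the three words below it). -mapcs-frame targets ARM-state code, so check your compiler's actual output before assuming this layout on a Thumb-only Cortex-M3. apcs_walk is a hypothetical name, and stack_lo/stack_hi are assumed to be the bounds of your stack region.

```c
#include <stddef.h>
#include <stdint.h>

/* APCS frame, as seen from fp:
 *   fp[0]  = saved PC   fp[-1] = saved LR (return address)
 *   fp[-2] = saved SP   fp[-3] = caller's fp
 * Collects up to max return addresses, innermost first. */
size_t apcs_walk(const uintptr_t *fp, uintptr_t stack_lo, uintptr_t stack_hi,
                 uintptr_t *ret_addrs, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        uintptr_t a = (uintptr_t)fp;
        /* fp[-3] .. fp[0] must lie inside the stack region */
        if (a < stack_lo + 3 * sizeof *fp || a >= stack_hi)
            break;
        ret_addrs[n++] = fp[-1];
        const uintptr_t *next = (const uintptr_t *)fp[-3];
        /* callers' frames sit at higher addresses on a descending stack;
         * anything else (including a NULL fp) ends the chain */
        if ((uintptr_t)next <= a)
            break;
        fp = next;
    }
    return n;
}
```

The bounds and direction checks matter in a watchdog handler: a half-built or corrupted frame should end the walk rather than send it off into random memory.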

It should be pretty straightforward to follow execution, though not programmatically in your ISR...
We know from the ARM ARM that on a Cortex-M3 the core pushes xPSR, the return address, LR (R14), R12, R3, R2, R1, and R0 onto the stack. It then loads LR with a special EXC_RETURN value so that it can detect the return from the interrupt, and calls the entry point listed in the vector table. If you implement your ISR in asm to control the stack, you can have a simple loop that disables the interrupt source (turns off the WDT, or whatever; this is going to take some time) and then goes into a loop to dump a portion of the stack.
From that dump you will see the LR/return address and the function/instruction that was interrupted. From a disassembly of your program you can then see what the compiler has placed on the stack for each function; subtract that off at each stage, and go as far back as you like, or as far back as you have printed the stack contents.
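The manual procedure described above (read each function's stack usage from the disassembly, subtract it, pick up the next return address) can be turned into a small table-driven walker. In this sketch the table entries are hypothetical numbers you would fill in by hand from the disassembly or map file, all names are made up, it assumes the interrupted function had already finished its prologue, and it reads from a copied stack image rather than live memory.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per function, read by hand from the disassembly (hypothetical). */
typedef struct {
    uint32_t start, end;   /* code range [start, end) of the function */
    uint32_t frame_bytes;  /* bytes the function's prologue takes off SP */
    uint32_t lr_offset;    /* where the prologue saved LR, relative to the adjusted SP */
} frame_info_t;

/* A copy of (part of) the stack: words[i] was stored at base + 4 * i. */
typedef struct {
    uint32_t        base;
    const uint32_t *words;
    size_t          count;
} stack_image_t;

static uint32_t image_read(const stack_image_t *img, uint32_t addr)
{
    uint32_t idx = (addr - img->base) / 4u;
    return (addr >= img->base && idx < img->count) ? img->words[idx] : 0u;
}

static const frame_info_t *find_frame(const frame_info_t *tab, size_t n, uint32_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tab[i].start && pc < tab[i].end)
            return &tab[i];
    return NULL;
}

/* pc: the stacked return address. sp: the interrupted code's SP, i.e. the
 * hardware frame address + 32 (+4 if bit 9 of the stacked xPSR is set).
 * Writes up to max code addresses, innermost first. */
size_t table_unwind(const frame_info_t *tab, size_t ntab,
                    uint32_t pc, uint32_t sp, const stack_image_t *img,
                    uint32_t *out, size_t max)
{
    size_t n = 0;
    while (n < max) {
        const frame_info_t *fi = find_frame(tab, ntab, pc);
        if (fi == NULL)
            break;                          /* address not in the table: stop */
        out[n++] = pc;
        uint32_t lr = image_read(img, sp + fi->lr_offset);
        sp += fi->frame_bytes;              /* pop this function's frame */
        pc = lr & ~1u;                      /* drop the Thumb bit */
    }
    return n;
}
```

For example, a function starting with PUSH {r4, lr} gets frame_bytes 8 and lr_offset 4; if it then does SUB SP, SP, #8, that becomes frame_bytes 16 and lr_offset 12.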
You could also make a copy of the stack in RAM and dissect it later rather than doing such things in an ISR (the copy still takes too much time, but it is less intrusive than waiting on the UART).
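Copying the stack inside the ISR and decoding it later could look like the sketch below. SNAP_WORDS (64) is an arbitrary choice, stack_snapshot_t and snapshot_stack are made-up names, and stack_top is assumed to be the end of the stack region (typically a linker-defined symbol whose name depends on the toolchain). To survive a reset, the snapshot would additionally have to live in RAM that the startup code does not clear, which is toolchain-specific.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SNAP_WORDS 64u   /* how much stack to keep (arbitrary) */

typedef struct {
    uintptr_t sp;                  /* address the copy starts at */
    uint32_t  count;               /* number of valid words */
    uint32_t  words[SNAP_WORDS];   /* words[i] was at sp + 4 * i */
} stack_snapshot_t;

/* Copies at most SNAP_WORDS words from sp upwards, never past stack_top.
 * It is only a memcpy, so it is cheap enough for the ISR; the slow
 * formatting and UART output can happen later. */
void snapshot_stack(stack_snapshot_t *s, const uint32_t *sp, const uint32_t *stack_top)
{
    size_t avail = (stack_top > sp) ? (size_t)(stack_top - sp) : 0u;
    size_t n = (avail < SNAP_WORDS) ? avail : SNAP_WORDS;
    s->sp = (uintptr_t)sp;
    s->count = (uint32_t)n;
    memcpy(s->words, sp, n * sizeof *sp);
}
```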
If all you are after is the address of the instruction that was interrupted, that is the most trivial task, just read that from the stack, it will be at a known place, and print it out.

Did I hear my name? :)
You will probably need a tiny bit of inline assembly. Just figure out the format of the stack frames and which register holds the ordinary[1] stack pointer, then transfer the relevant values into C variables from which you can format strings for output to the UART.
It shouldn't be too tricky, but of course (being rather low-level) you need to pay attention to the details.
[1] As in "non-exception"; not sure if the ARM has different stacks for ordinary code and exceptions, actually.

Your watchdog timer can fire at any point, even when the stack does not contain enough information to unwind (e.g. stack space has been allocated for register spill, but the registers not copied yet).
For properly optimized code, you need debug info, period. All you can do from a watchdog timer is a register and stack dump in a format that is machine readable enough to allow conversion into a core dump for gdb.
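One way to make such a dump machine-readable is a fixed line format that a host-side script can parse. The format here (R name hex for the eight hardware-stacked registers, in stacking order, and M address hex for stack words) is invented for this sketch, as is dump_state; converting its output into an actual core file for gdb would still be up to that script.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hardware-stacked registers, in stacking order (lowest address first). */
static const char *const reg_names[8] = {
    "r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr"
};

/* Writes "R <name> <hex>" for the 8 stacked registers, then
 * "M <addr> <hex>" for each stack word, one per line. Stops at the last
 * line that fits and returns the number of characters written. */
size_t dump_state(char *buf, size_t len, const uint32_t frame[8],
                  uint32_t stack_addr, const uint32_t *stack, size_t count)
{
    size_t used = 0;
    for (size_t i = 0; i < 8 + count; i++) {
        int n;
        if (i < 8)
            n = snprintf(buf + used, len - used, "R %s %08" PRIX32 "\n",
                         reg_names[i], frame[i]);
        else
            n = snprintf(buf + used, len - used, "M %08" PRIX32 " %08" PRIX32 "\n",
                         stack_addr + 4u * (uint32_t)(i - 8), stack[i - 8]);
        if (n < 0 || (size_t)n >= len - used)
            break;                 /* would not fit: keep only whole lines */
        used += (size_t)n;
    }
    if (len > 0)
        buf[used] = '\0';          /* drop any partially written line */
    return used;
}
```

Every line has a fixed shape, so a parser does not need to know anything about the firmware that produced it.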

Related

Reserving space in SRAM for SRAM decay experiment (C ; AVR atmega644p ; Atmel Studio 7)

I am looking to perform some experiments on an atmega644p looking at evaluating the amount of decay in SRAM between power cycles. My method is to set a number of bytes in SRAM to 0xFF, then when the mcu powers back up, count the number of remaining 1s in these bytes.
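Counting the remaining 1s only needs a bit count over the test region. This sketch uses a plain loop rather than a compiler builtin so it behaves the same with avr-gcc and on a PC; count_ones is a made-up name, and the region's address and length are whatever you settle on for the experiment.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the bits still set in a region that was filled with 0xFF before
 * power was removed. volatile: the region is read as raw SRAM. */
uint32_t count_ones(const volatile uint8_t *p, size_t n)
{
    uint32_t ones = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = p[i];
        while (b) {                  /* clear the lowest set bit each pass */
            b &= (uint8_t)(b - 1);
            ones++;
        }
    }
    return ones;
}
```

For a 64-byte region, 8 × 64 = 512 bits were set before power-down, so the decayed fraction is (512 - count_ones(region, 64)) / 512.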
For this to work, I need to read and write the array of 1s to/from a known memory address in SRAM. So far I have code which writes the values to a specific address using a pointer set to 0x1000, and then on power up I begin reading the array from this address. However, I need a way of guaranteeing that this section of SRAM memory (say, 0x100 + 64 bytes) is not allocated to other variables/overwritten before it can be read.
I have looked online at the possibility of allocating memory segments - I don't know if this is a good solution in this case, and am not even too sure how to go about doing this. Can anyone suggest a neat way of approaching this?
Please ask any questions for clarification.
Thanks for your help.
If you're using the AVR/GNU toolchain, then when a C application starts, the startup code clears the .bss section and initializes global variables as required.
To avoid that, you can configure the linker to exclude all start files using options -nostartfiles -nodefaultlibs -nostdlib
If you're using Atmel Studio, you can set these options in the project properties, under the AVR/GNU Linker settings.
After that, you can mark main() so that it is called as initialization code:
#include <avr/io.h> // SPL, SPH and RAMEND
int main(void) __attribute__((naked, section(".init9")));
Now you'll have the "naked" code, which does not perform ANY initialization.
That means you need at least to initialize the stack pointer and clear register r1 (which is assumed by avr-gcc to contain zero):
int main(void) {
    asm volatile (
        "clr r1\n" // load zero into r1
        "cli"      // clear I flag
    );
    SPL = (uint8_t)RAMEND;
    SPH = (uint8_t)(RAMEND >> 8);
    ... // here goes your code
    for(;;); // do not leave main()!
}
After this, ALL global variables will be uninitialized. You can, for example, declare a global array and check its contents on startup.
You'll need to come up with a custom region in RAM by reserving space for it in your linker file. It is important that it is marked "no init" or similar, or otherwise .bss initialization and similar might happen on it before main() is called. How to do this is linker-specific.
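With avr-gcc and avr-libc there is a ready-made alternative to writing a custom linker region: the default linker scripts include a .noinit section, and variables placed there are not zeroed by the startup code. In this sketch DECAY_BYTES, decay_region and decay_arm are made-up names. The linker chooses the address, so the program should access the buffer through the symbol (or read its address from the map file) rather than through a hard-coded pointer.

```c
#include <stdint.h>
#include <string.h>

#define DECAY_BYTES 64u   /* size of the test region (assumed) */

/* Placed in avr-libc's .noinit section: not zeroed by the C startup code,
 * so whatever survived the power cycle is still there when main() runs. */
uint8_t decay_region[DECAY_BYTES] __attribute__((section(".noinit")));

/* Writes the all-ones pattern before power is removed. */
void decay_arm(void)
{
    memset(decay_region, 0xFF, sizeof decay_region);
}
```

On the next power-up, count the 1s left in decay_region before anything else writes to it.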
However, writing software for this seems needlessly cumbersome. Simply use an in-circuit debugger:
Ensure that you are using a debugger which does not power the target.
Download a program which uses no RAM at all into flash. Confirm this by checking the map file.
Set the whole RAM to 0xFF through the debugger.
Remove power while keeping the in-circuit debugger connected.
Wait x time units.
Power up, hit MCU reset in the debugger, memory dump the whole RAM.
Any half-decent tool chain should be able to do this for you.

Stacktrace on ARM cortex-M4

When I end up in a fault handler on my ARM Cortex-M4 (Thumb), I get a snapshot of the CPU registers from just before the fault occurred. With this information I can find where the stack pointer was. Now, what I want is to backtrace through all the functions it passed through. The only problem I see here is that I don't have a frame pointer, so I cannot really see where a given subroutine saved the LR, and so on up the call chain.
How would one tackle this problem if the frame pointer is not available in r7?
This blog post discusses this issue with reference to the MIPS architecture - the principles can be readily adapted to ARM architectures.
In short, it describes three possibilities for locating the stack frame for a given SP and PC:
Using compiler-generated debug information (not included in the executable image) to calculate it.
Using compiler-generated stack-unwinding (exception handling) information (included in the executable image) to calculate it.
Scanning the call site to locate the prologue or epilogue code that adjusts the stack pointer, and deducing the stack frame address from that.
Obviously it's very compiler- and compiler-option dependent, and not guaranteed to work in all cases.
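The third option, scanning the prologue, can be sketched for Thumb code by decoding the first few instructions of a function and adding up what they push or reserve. This handles only two 16-bit encodings (PUSH {..., LR} and SUB SP, SP, #imm); 32-bit forms such as PUSH.W or SUB.W, and optimized code that interleaves the prologue with other instructions, are not covered. It also assumes you already know the function's start address (for example from the symbol table); scan_prologue and prologue_info_t are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t frame_bytes;  /* bytes pushed or reserved by the prologue */
    int32_t  lr_offset;    /* saved LR relative to the final SP, -1 if LR not saved */
} prologue_info_t;

/* Recognised 16-bit Thumb encodings:
 *   PUSH {reglist}      1011 010M rrrr rrrr   M = LR is pushed too
 *   SUB  SP, SP, #imm   1011 0000 1iii iiii   imm = iiiiiii * 4 */
prologue_info_t scan_prologue(const uint16_t *code, size_t max_insns)
{
    uint32_t pushed = 0;     /* bytes below the SP at function entry */
    int32_t lr_depth = -1;   /* LR was saved at entry_sp - lr_depth */

    for (size_t i = 0; i < max_insns; i++) {
        uint16_t insn = code[i];
        if ((insn & 0xFE00u) == 0xB400u) {            /* PUSH */
            uint32_t regs = insn & 0xFFu, count = 0;
            while (regs) { regs &= regs - 1u; count++; }
            if (insn & 0x0100u) {                     /* LR goes in the highest slot */
                lr_depth = (int32_t)(pushed + 4u);
                count++;
            }
            pushed += count * 4u;
        } else if ((insn & 0xFF80u) == 0xB080u) {     /* SUB SP, SP, #imm */
            pushed += (insn & 0x7Fu) * 4u;
        } else {
            break;                                    /* first unrecognised instruction */
        }
    }

    prologue_info_t info;
    info.frame_bytes = pushed;
    info.lr_offset = (lr_depth < 0) ? -1 : (int32_t)pushed - lr_depth;
    return info;
}
```

With frame_bytes and lr_offset known, one unwinding step is: read LR at SP + lr_offset, add frame_bytes to SP, clear the Thumb bit of LR, and repeat from the function containing that address.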
Which register holds the frame pointer (if one is kept at all) is a compiler convention rather than something the architecture fixes; GCC, for instance, uses R11 for ARM-state code and R7 for Thumb code, which is all a Cortex-M executes. In any case, when a Cortex-M calls a function using BL and its variants, it saves the return address in LR (the link register). A non-leaf function then typically saves LR onto the stack at entry. So in theory, to get a call trace, you would "chase" the chain of saved LRs.
Unfortunately, where LR is saved on the stack is not defined by the calling convention; its location must be deduced from the debug info for that function in the DWARF records (in the .elf file). I do not know of a utility that extracts the LR locations from an ELF file, but it should not be too difficult to write one.
Richard at ImageCraft is right.
More information can be found here
This works fine with C code. I had a harder time applying it to C++, but it's not impossible.

Custom Bootloader for Kinetis MKE06Z microcontrollers on IAR EWARM issue

First I'd like to introduce myself, as I'm new to the site. I'm an Electronic Engineer, specialized in embedded systems design and development. I've been gathering info from the site for a long time, and I think there are a lot of people here with a great deal of knowledge. I'm hoping some of you may have stumbled upon this or a similar issue.
I've been having some trouble in the implementation of a custom bootloader for a Kinetis MKE06Z microcontroller, not in the bootloader itself but in the relocation of the application code and the behavior after jumping to it. The application is completely coded in C.
The bootloader executes everything as expected, determines if it should run or jump to user application. This is the sequence that implements the jump:
__disable_interrupt();
SCB->VTOR = RELOCATION_VECTOR_ADDR & 0x3FFFFE00;
JumpToUserApplication(RELOCATION_VECTOR_ADDR);
where:
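void JumpToUserApplication(uint32_t userStartup)
{
    /* These statements rely on userStartup arriving in r0 (AAPCS) and on
       the compiler not touching r0/r1 between them. */
    /* set up stack pointer */
    asm("LDR r1, [r0]");
    asm("MOV r13, r1");
    /* jump to application reset vector */
    asm("ADDS r0,r0,#0x04 ");
    asm("LDR r0, [r0]");
    asm("BX r0");
}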
as implemented in Freescale's AN4767.
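Independent of the linker issue, a bootloader commonly sanity-checks the application's vector table before jumping: the initial stack pointer should be word-aligned and inside RAM, and the reset vector should have the Thumb bit set and point into the application's flash. A sketch with made-up names; the RAM and flash bounds are parameters because they depend on the exact part, and the numbers in the test are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* vectors: the application's vector table (e.g. at RELOCATION_VECTOR_ADDR).
 * ram_end and app_end are exclusive (one past the last valid byte). */
bool app_vectors_look_valid(const uint32_t *vectors,
                            uint32_t ram_start, uint32_t ram_end,
                            uint32_t app_start, uint32_t app_end)
{
    uint32_t initial_sp = vectors[0];
    uint32_t reset      = vectors[1];

    /* Full-descending stack: the initial SP may equal ram_end, since the
     * first push decrements before storing. Erased flash (0xFFFFFFFF) fails. */
    bool sp_ok = (initial_sp & 3u) == 0u &&
                 initial_sp > ram_start && initial_sp <= ram_end;

    /* Reset handler must be a Thumb address inside the application image. */
    bool pc_ok = (reset & 1u) != 0u &&
                 reset >= app_start && reset < app_end;

    return sp_ok && pc_ok;
}
```

If the check fails, the bootloader can stay resident instead of jumping into an erased or half-programmed image.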
So far, so good. Once the jump is executed, I traced the application behavior (in the Disassembly window) and found that after some instructions it gets stuck at a specific address with a jump instruction, which ends up being an infinite loop. I then ran it step by step to determine which instruction causes this malfunction. It's very strange, as it runs OK and then suddenly jumps to a RAM address. A couple of cycles later it jumps to the infinite loop. I took note of the address of the instruction causing this strange jump and of the one with the infinite loop. Looking at the core registers, I found there was an exception, number 0x03 (Hard Fault). I then switched to debugging the user application.
Once in the user application, I started debugging. The user application works fine when run on its own (with no jump from the bootloader). I then looked up the relevant addresses and discovered that the routine causing the hard fault when jumping from the bootloader is IAR's __iar_data_init3. The thing is, it's part of a precompiled library, and I'm not sure whether it's safe to remove it (by removing the __iar_program_start call and replacing it directly with a call to main in the startup file).
The real question is: why does the application behave like that after the jump from the bootloader, but not when there is no such jump? Why does this routine jump to a RAM address (when it shouldn't)?
Of course, it may be a little too specific, but hopefully there's someone who can help me.
It seems that something IAR does with the linker configuration, which is not very clear to me, has something to do with this problem. The thing is, I had relocated the .text section:
define symbol __ICFEDIT_intvec_start__ = 0x00001800;
define symbol __ICFEDIT_region_ROM_start__ = 0x00002000;
define symbol __ICFEDIT_region_ROM_end__ = 0x0000FFFF;
define region APP_ROM = mem:[from (__ICFEDIT_region_ROM_start__) to (__ICFEDIT_region_ROM_end__)];
place at address mem:__ICFEDIT_intvec_start__ { readonly section .intvec };
place at start of APP_ROM { readonly section .text };
It seems that the linker doesn't appreciate this, and something makes the app misbehave when it is entered from another app. Instead, keeping the original .icf file and editing only .intvec_start in the GUI solved the problem, but then the code starts right after the vector table. That's not an issue, but I wanted to place the code a little farther away.
Thanks.

Forcing a function to restore all registers before making a function call

I am using an EK-LM4F120XL board, which contains a cortex-M4 processor. I also use GCC-ARM-none-eabi as toolchain.
I am building a little hobby project, which is slowly becoming an operating system. An important part of this is that I need to swap out registers to switch processes. This happens inside an interrupt, and this processor makes sure that all the temporary registers (r0-r3, r12, lr) are pushed to the process stack. So to continue, I need to write the contents of r4-r11 and the SP to memory, then load r4-r11 of the new process, load its stack pointer, and return. Additionally, the lr value contains some information about the process that was interrupted, so I need information from that register too.
All of this works, because I wrote it in assembly. I linked the assembly function directly to the interrupt, so I have full control over what happens to the registers. The combination of C and inline assembly did not work, because the compiler usually pushes some registers to the stack, and that is fatal. But the OS is growing, and the context switch is growing along with it: there are now also some global variables that need changing, etc. All of this is doable in assembly, but it's becoming a pain: assembly is hard to read and to debug. So I want a C/assembly combination. Basically, I am looking for something like this:
void contextSwitch(void){
    // Figure out what the next process will be
    // Change every variable that needs changing
    // Restore register state to the moment of interrupt. The following function will not return in the sense that it will end the interrupt.
    swapRegisters(oldProc, newProc);
}
And then write only swapRegisters in assembly. Is there a way to achieve this? Is my solution even the best solution?
There is no portable method of directly accessing CPU registers in C; you will need assembler, in-line assembler, compiler intrinsics or a kernel library (that uses assembler code).
The details of how that is done for Cortex-M are well covered elsewhere and probably too complex to be repeated here: the specifics of doing this on a Cortex-M4(F) are described at the ARM Info Center site here. The approach is broadly similar for the Cortex-M3 except for the FPU considerations; an M3-specific description of context switching is provided in this Embedded.com article.
As you can never have enough explanations (different authors make some things clearer than others, or give better or more directly applicable examples), here's another one; it is also M3-based, but it will work on an M4 that is not using its FPU or has no FPU. And yet another example.
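One common way to get the C/assembly split asked about above is to invert it: a small naked assembly handler (usually PendSV, configured at the lowest priority) saves r4-r11, calls an ordinary C function with the old stack pointer, and restores r4-r11 from whatever stack pointer the C function returns. The C function can then do all the bookkeeping. This is a sketch in GCC syntax, assuming tasks run on the PSP, no FPU context is used, and task stacks have been prepared with an initial frame (not shown); tcb_t, scheduler_add_task and scheduler_switch are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4u   /* illustrative limit */

/* Hypothetical task control block: just the saved process stack pointer. */
typedef struct {
    uint32_t *sp;   /* PSP value after r4-r11 were stored below the hw frame */
} tcb_t;

static tcb_t  tasks[MAX_TASKS];
static size_t task_count;
static size_t current;   /* index of the running task */

/* Registers a task whose stack already holds an initial exception frame
 * plus eight words for r4-r11 (preparing that stack is not shown). */
int scheduler_add_task(uint32_t *prepared_sp)
{
    if (task_count >= MAX_TASKS)
        return -1;
    tasks[task_count].sp = prepared_sp;
    return (int)task_count++;
}

/* The C half: remember the outgoing stack, pick the next task (round robin
 * here) and return its stack. Other bookkeeping can go here in plain C. */
uint32_t *scheduler_switch(uint32_t *old_sp)
{
    if (task_count == 0)
        return old_sp;
    tasks[current].sp = old_sp;
    current = (current + 1u) % task_count;
    return tasks[current].sp;
}

#if defined(__GNUC__) && (defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__))
/* The assembly half. Naked, so no compiler prologue can touch r4-r11
 * before they are saved. Assumes tasks use the PSP and no FPU state. */
__attribute__((naked)) void PendSV_Handler(void)
{
    __asm volatile(
        "mrs   r0, psp           \n" /* outgoing task's stack pointer */
        "stmdb r0!, {r4-r11}     \n" /* save r4-r11 below the hw frame */
        "push  {r3, lr}          \n" /* keep EXC_RETURN; r3 keeps MSP 8-byte aligned */
        "bl    scheduler_switch  \n" /* r0 in: old sp, r0 out: new sp */
        "pop   {r3, lr}          \n"
        "ldmia r0!, {r4-r11}     \n" /* restore the incoming task's r4-r11 */
        "msr   psp, r0           \n"
        "bx    lr                \n"); /* exception return unstacks r0-r3, r12, lr, pc, xPSR */
}
#endif
```

Compared with calling swapRegisters(oldProc, newProc) from C, this sidesteps the problem described in the question: r4-r11 are already saved before any compiler-generated code runs, and the C function follows the normal calling convention, so it cannot corrupt them.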

Saving registers state in COM program

I disassembled a simple DOS .COM program, and there was some code that saves and restores register values:
PUSH AX ; this is the first instruction
PUSH CX
....
POP CX
POP AX
MOV AX, 4C00h ; AH = 4Ch (terminate program), AL = 00h (return code)
INT 21h       ; call DOS interrupt 21h => END
This is very similar to a function prologue and epilogue in C programs. But prologues are added automatically by the compiler, while the program above was written manually in assembler, so the programmer took full responsibility for saving and restoring values in this code.
My question is what will happen if I unintentionally forgot to save some registers in my program?
And what if I intentionally replace these instructions with NOPs in a hex editor? Will this lead to a program crash? And why is the called function responsible for saving the outer context on the stack? From my point of view, this should be done in the calling function, to prevent problems if I use 3rd-party libraries and poorly written code that may break my program's execution.
One problem with making the calling function save all of its working registers before calling another function is that sometimes a function is interrupted (e.g., by a hardware interrupt) without its knowledge. In DOS, for example, there was that pesky timer tick of roughly 55 milliseconds: about 18.2 times per second, a hardware interrupt would transfer control from whatever code was executing to the timer tick handler. This happened automatically unless your program specifically disabled interrupts.
The timer tick handler would then save all of the registers it was going to use, do its work, and then restore the registers it saved before returning.
Sure, you could say that interrupt handlers are special, but why? Even with the paucity of registers on the 8086 (AX, BX, CX, DX, SI, DI, BP, and Flags; I purposely didn't include the segment registers), making a function save its entire state before transferring control means that you'd be using a lot of unnecessary stack space and execution cycles to save things just because they might be modified. But if the called function is responsible for saving only the registers it uses, and it only uses AX and CX, then it can save just those two registers. That makes for smaller and faster code, and much less stack space usage.
When you start talking about call hierarchies that are many levels deep, the difference between pushing 8 registers rather than 2 registers adds up pretty quickly.
Consider x86-64, with its 16 general-purpose registers. Do you really think a function should be forced to save all 16 of those registers before calling another function, even when the called function only uses two of them? Saving 16 64-bit registers requires 128 bytes of stack space, as opposed to 16 bytes for saving two registers.
The primary point of writing things in assembly language these days is to write faster and smaller code than what a compiler can write. A guiding principle is don't do more work than you have to. That means it's up to you to know what registers your assembly language function is using, and to save those registers on entry and restore them on exit.
If you don't want to have to guard against forgetting what to push or pop, I would advise sticking to a higher-level language.
In assembler, if the function is your own then you should save and restore all registers you use within the function except those which return an output from the function. If others wrote the function, look up its documentation. If in doubt, save/restore registers before/after calling the function (except those which are supposed to return a value).
Since the DOS Terminate function does not rely on any register settings (other than AX) for its operation (*), both pushes/pops in the code you have posted seem superfluous. You should, however, be aware that the programmer could have pushed these values for the purpose of using them locally! So replacing both these pushes with NOPs in a hex editor is surely a bad idea. You could, however, replace both pops with NOPs, because at that point in the program neither restoring AX/CX nor balancing the stack is necessary, because of (*).
Since your question is about saving registers at the program level, the answer must be that pushing/popping registers for the sake of saving them is useless. Nothing bad will happen if you unintentionally forget to save some registers in your program.
