SystemInit function ends up at an odd address in Keil ARM compiler

I'm seeing some weird behaviour in the Keil compiler. This is the reset code. I'm using a generic STM32F103C8 board (which uses an M3 core), and the SystemInit function (which is provided by the Keil libraries) has somehow ended up at an odd address (0800043D). You can see from the screenshot below that the pointer at 080001F8 contains this value and, sure enough, when the instruction at the PC (BLX r0) is executed, a hard fault occurs. Has anyone seen this happen before? How can I get it to put the SystemInit function at an even address?
Also, my main() function seems to have ended up at an odd address as well. Shouldn't the compiler align these for me?
These two lines from the map file are interesting:
0x08000000 0x000000ec Data RO 1224 RESET startup_stm32f10x_md.o
0x080000ec 0x00000008 Code RO 1269 * !!!main c_w.l(__main.o)

Related

Hardfault exception when calling memset on STM32

When starting up an STM32, I try to zero the memory for a structure pointed to by a pointer.
TLxbEvents *LxbEvents;
memset((void*)LxbEvents, 0, sizeof(TLxbEvents));
Looking at the disassembly, it always crashes on the line
STMCS r0!,{r2-r3,r12,lr}
I could not find a document describing the STMCS instruction, neither on the ARM website nor on Google or elsewhere.
The registers at that point are
r0 0x2000D694
r2 0x00000000
r3 0x00000000
r12 0x00000000
lr 0x00000000
I tried moving the call to another routine, without any change. I also checked the alignment, and that seems to be okay. Every time the program runs into that line it crashes with a HardFault and, according to some debug variables, it is caused by a watchdog reset, which I do not believe.
What does this line do, and does anyone have an idea what is causing the hard fault?

STMCS is an ARM instruction (the base instruction is STM, and CS is the condition-code suffix). It seems you are compiling your code in ARM mode, but the STM32 is a Cortex-M core and only supports the Thumb-2 instruction set variant. Double-check your build settings and compilation switches.

Stacktrace on ARM cortex-M4

When I run into a fault handler on my ARM Cortex-M4 (Thumb) I get a snapshot of the CPU registers just before the fault occurred. With this information I can find where the stack pointer was. Now, what I want is to backtrace through all the functions it passed through. The only problem I see here is that I don't have a frame pointer, so I cannot really see where a certain subroutine has saved the LR, ad infinitum.
How would one tackle this problem if the frame pointer is not available in r7?

This blog post discusses this issue with reference to the MIPS architecture - the principles can be readily adapted to ARM architectures.
In short, it describes three possibilities for locating the stack frame for a given SP and PC:
Using compiler-generated debug information (not included in the executable image) to calculate it.
Using compiler-generated stack-unwinding (exception handling) information (included in the executable image) to calculate it.
Scanning the call site to locate the prologue or epilogue code that adjusts the stack pointer, and deducing the stack frame address from that.
Obviously it's very compiler- and compiler-option dependent, and not guaranteed to work in all cases.

R7 is not the frame pointer on the M4, it's R11. R7 is the FP for Cortex-M0+/M1, where only the lower registers are generally available. In any case, when a Cortex-M makes a call to a function using BL and its variants, it saves the return address into LR (the link register). At function entry, the LR is saved onto the stack. So in theory, to get a call trace, you would "chase" the chain of the LRs.
Unfortunately, the saved location of LR on the stack is not defined by the calling convention, and its location must be deduced from the debug info for that function entry in the DWARF records (in the .elf file). I do not know if there is a utility that would extract the LR locations from an ELF file, but it should not be too difficult.
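If the DWARF records are not at hand, a cruder fallback is to scan the raw stack for words that merely look like saved LRs. The following is only a sketch of that heuristic, not the DWARF-based method described above: the function name and the code_start/code_end bounds are hypothetical (in practice they would come from the linker map), and it will report false positives whenever ordinary data happens to look like a code address.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Heuristic scan: collect stack words that look like Thumb return addresses.
 * code_start/code_end are assumed bounds of the code region (hypothetical
 * parameters; code_end is one past the last code byte).
 * Returns the number of candidates written to out[].
 */
size_t scan_return_candidates(const uint32_t *stack, size_t words,
                              uint32_t code_start, uint32_t code_end,
                              uint32_t *out, size_t out_max)
{
    size_t found = 0;

    for (size_t i = 0; i < words && found < out_max; i++) {
        uint32_t v = stack[i];

        /* A return address saved from LR on Cortex-M has bit 0 set. */
        if ((v & 1u) == 0u)
            continue;

        /* Strip the Thumb bit to get the actual instruction address. */
        uint32_t addr = v & ~1u;
        if (addr >= code_start && addr < code_end)
            out[found++] = addr;
    }
    return found;
}
```

Each candidate still has to be checked against the disassembly (is the preceding instruction a BL/BLX?) before it can be trusted as a real call site.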

Richard at ImageCraft is right.
More information can be found here
This works fine with C code. I had a harder time applying it to C++, but it's not impossible.

Custom Bootloader for Kinetis MKE06Z microcontrollers on IAR EWARM issue

First I'd like to introduce myself, as I'm new to the site. I'm an Electronic Engineer, specialized in embedded systems design and development. I've been gathering info from the site for a long time, and I think that there's a lot of people with great deal of knowledge. I'm hoping some other of you may have stumbled upon this or a similar issue.
I've been having some trouble in the implementation of a custom bootloader for a Kinetis MKE06Z microcontroller, not in the bootloader itself but in the relocation of the application code and the behavior after jumping to it. The application is completely coded in C.
The bootloader executes everything as expected, determines if it should run or jump to user application. This is the sequence that implements the jump:
__disable_interrupt();
SCB->VTOR = RELOCATION_VECTOR_ADDR & 0x3FFFFE00;
JumpToUserApplication(RELOCATION_VECTOR_ADDR);
where:
void JumpToUserApplication(uint32_t userStartup)
{
/* set up stack pointer */
asm("LDR r1, [r0]");
asm("MOV r13, r1");
/* jump to application reset vector */
asm("ADDS r0,r0,#0x04 ");
asm("LDR r0, [r0]");
asm("BX r0");
}
as implemented in Freescale's AN4767.
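For anyone debugging a similar jump, it can help to sanity-check the two words the jump relies on (initial stack pointer and reset vector) before branching. This is a minimal sketch and not part of AN4767: the type name, function name and the RAM/application bounds are all hypothetical and would have to match your own memory map.

```c
#include <stdbool.h>
#include <stdint.h>

/* First two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;    /* word 0: initial main stack pointer */
    uint32_t reset_handler; /* word 1: reset vector, bit 0 set (Thumb) */
} vector_head_t;

/*
 * Plausibility check before jumping to the application.
 * ram_start/ram_end and app_start/app_end are assumed bounds taken from
 * the memory map; the *_end values are one past the last valid byte.
 */
bool app_vectors_look_valid(const vector_head_t *v,
                            uint32_t ram_start, uint32_t ram_end,
                            uint32_t app_start, uint32_t app_end)
{
    /* The stack is full-descending, so SP may equal the top of RAM. */
    if (v->initial_sp <= ram_start || v->initial_sp > ram_end)
        return false;
    if ((v->initial_sp & 3u) != 0u)
        return false;

    /* Cortex-M only executes Thumb code: the vector must have bit 0 set. */
    if ((v->reset_handler & 1u) == 0u)
        return false;

    uint32_t entry = v->reset_handler & ~1u;
    return entry >= app_start && entry < app_end;
}
```

An erased flash sector reads as 0xFFFFFFFF in both words, which this check rejects, so it also catches the "no application programmed" case.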
So far, so good. Once the jump is executed, I trace the application behavior (on the Disassembly Window) and find out after some instructions, it gets stuck at some specific address with a jump instruction, which ends up being an infinite loop. I then run it step by step to determine which was the instruction that causes this malfunction. It's very strange, as it is running OK and suddenly jumps to a RAM address. A couple of cycles and then jumps to the infinite loop. I took note of the addresses with the instruction causing this strange jump and the one with the infinite loop. I look at the core registers and find out there is an exception, and notice it's the number 0x03 (Hard Fault). Then switch to debugging the user application.
Once in the user application, I start debugging. The user application works fine running like this (no jump from the bootloader). Then I look for the relevant addresses and discover that the routine causing the hard fault when jumping from the bootloader is from IAR: __iar_data_init3. The thing is, it's part of a precompiled library and I'm not sure if it's safe to remove it (by removing __iar_program_start and replacing it directly with the call to main in the startup file).
The real question is: why does the application behave like that after the jump from the bootloader but not if there is no such jump? Why does this routine jump to a RAM address (when it shouldn't)?
Of course, it may be a little too specific, but hopefully there's someone who can help me.

Something IAR does with the linker configuration is not very clear to me, but it has something to do with this problem. The thing is, I relocated the .text segment:
define symbol __ICFEDIT_intvec_start__ = 0x00001800;
define symbol __ICFEDIT_region_ROM_start__ = 0x00002000;
define symbol __ICFEDIT_region_ROM_end__ = 0x0000FFFF;
define region APP_ROM = mem:[from (__ICFEDIT_region_ROM_start__) to (__ICFEDIT_region_ROM_end__)];
place at address mem:__ICFEDIT_intvec_start__ { readonly section .intvec };
place at start of APP_ROM { readonly section .text };
It seems that the linker doesn't appreciate this, and something makes the app misbehave when it is jumped to from another app. Keeping the original .icf file instead and editing only the .intvec start within the GUI solved the problem, but the code then starts right next to the vector table. Not an issue, but I wanted to relocate the code a little farther.
Thanks.

In ELF or DWARF, how can I get .PLT section values? -- Trying to get the address of a function on where an instrumentation tool is in

I am working on obtaining all the data of a program using its ELF and DWARF info and by hooking a Pin tool to a process that is currently running. It is a kind of debugger built on a Pin tool.
For getting the local variables from the stack I am working with the registers EIP, EBP and ESP which I have access to from Pin.
What struck me as weird is that I was expecting EIP to be pointing to the function that was running when the Pin tool was attached to the process, but instead EIP is pointing to the .PLT section. In other words, if the Pin tool was hooked into the process when Foo() was running, then I was expecting EIP to be pointing to some address inside the Foo function. However, it is pointing to the beginning of the .PLT section.
What I need to know is which function the process is currently in -- Is there any way to get the address of the function using the .PLT section? Is there any other ways to get the address of the function from the stack or using Pin? I hope I was clear enough, let me know if there are any questions though.

I might not be understanding exactly what is going on here...is the instruction pointer really in the .plt section or are you just getting a garbage value from Pin ?
You name the instruction pointer you are reading EIP, which might be a problem if you are running on a 64bit system, is that the case ?
You see the instruction pointer register is a 32bit value on a 32bit system, and a 64bit value on a 64bit system. So Pin actually provides 3 REG_* names for the instruction pointer: EIP, RIP and GBP. EIP is always the lower 32bit half of the register, RIP the 64bit value, and GBP one of the two depending on your architecture. Asking for EIP on a 64bit system gives you garbage, same for asking RIP on a 32bit one.
Otherwise, a quick look on Google gives me this. Quoting a bit:
By default the .plt entries are all initialized by the linker not to point to the correct target functions, but instead to point to the dynamic loader itself. Thus, the first time you call any given function, the dynamic loader looks up the function and fixes the target of the .plt so that the next time this .plt slot is used we call the correct function.
And more importantly:
It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application—this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example.
Hope that helps.

call stack unwinding in ARM cortex m3

I would like to create a debugging tool which will help me debug my application better.
I'm working bare-bones (without an OS). using IAR embedded workbench on Atmel's SAM3.
I have a Watchdog timer, which calls a specific IRQ in case of timeout (This will be replaced with a software reset on release).
In the IRQ handler, I want to print out (UART) the stack trace, of where exactly the Watchdog timeout occurred.
I looked on the web, and I didn't find any implementation of that functionality.
Does anyone have an idea how to approach this kind of thing?
EDIT: OK, I managed to grab the return address from the stack, so I know exactly where the WDT timeout occurred.
Unwinding the whole stack is not as simple as it first appears, because each function pushes a different amount of local variables onto the stack.
The code I ended up with is this (for others who may find it useful):
void WDT_IrqHandler( void )
{
uint32_t * WDT_Address;
Wdt *pWdt = WDT ;
volatile uint32_t dummy ;
WDT_Address = (uint32_t *) __get_MSP() + 16 ;
LogFatal ("Watchdog Timer timeout,The Return Address is %#X", *WDT_Address);
/* Clear status bit to acknowledge interrupt */
dummy = pWdt->WDT_SR ;
}

ARM defines a pair of sections, .ARM.exidx and .ARM.extab, that contain enough information to unwind the stack without debug symbols. These sections exist for exception handling but you can use them to perform a backtrace as well. Add -funwind-tables to force GCC to include these sections.
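As a rough sketch of how those tables can be consumed, GCC's unwinder exposes _Unwind_Backtrace through <unwind.h>. This assumes a GCC toolchain whose libgcc unwinder is linked into the image (which is not the IAR setup from the question), and the names trace_state_t, record_frame and capture_backtrace are made up for illustration.

```c
#include <stdint.h>
#include <unwind.h>

/* Hypothetical helper state: where to store the program counters. */
typedef struct {
    uintptr_t *pcs;   /* caller-supplied output buffer */
    int        max;   /* capacity of pcs[] */
    int        count; /* frames recorded so far */
} trace_state_t;

/* Called by the unwinder once per stack frame. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    trace_state_t *st = (trace_state_t *)arg;

    if (st->count >= st->max)
        return _URC_END_OF_STACK;   /* buffer full: stop walking */

    st->pcs[st->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;          /* keep walking */
}

/* Fills pcs[] with up to max program counters; returns how many. */
int capture_backtrace(uintptr_t *pcs, int max)
{
    trace_state_t st = { pcs, max, 0 };

    _Unwind_Backtrace(record_frame, &st);
    return st.count;
}
```

The recorded addresses can then be printed over the UART and resolved to function names offline from the map file. Whether the walk works from inside an interrupt handler depends on the toolchain, so treat that as something to verify rather than assume.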

To do this with ARM, you will need to tell your compiler to generate stack frames. For instance with gcc, check the option -mapcs-frame. It may not be the one you need, but this will be a start.
If you do not have this, it will be nearly impossible to "unroll" the stack, because you will need for each function the exact stack usage depending on parameters and local variables.
If you are looking for some example code, you can check dump_stack() in the Linux kernel sources, and trace back the related piece of code executed for ARM.

It should be pretty straight forward to follow execution. Not programmatically in your isr...
We know from the ARM ARM that on a Cortex-M3 it pushes xPSR, ReturnAddress, LR (R14), R12, R3, R2, R1, and R0 on the stack. It then mangles the LR so it can detect a return from interrupt, and calls the entry point listed in the vector table. If you implement your ISR in asm to control the stack, you can have a simple loop that disables the interrupt source (turns off the WDT, whatever; this is going to take some time) and then goes into a loop to dump a portion of the stack.
From that dump you will see the LR/return address, i.e. the function/instruction that was interrupted. From a disassembly of your program you can then see what the compiler has placed on the stack for each function, subtract that off at each stage, and go as far back as you like or as far back as you have printed the stack contents.
You could also make a copy of the stack in ram and dissect it later rather than doing such things in an isr (the copy still takes too much time but is less intrusive than waiting on the uart).
If all you are after is the address of the instruction that was interrupted, that is the most trivial task, just read that from the stack, it will be at a known place, and print it out.
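The "known place" can be written down as word offsets from the stack pointer value at exception entry. The sketch below assumes a basic frame without floating-point context and that sp_at_entry is the SP before the handler's own prologue has pushed anything; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Word offsets of the registers stacked by hardware on exception entry
 * (basic frame, no FPU context), lowest address first. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Address of the instruction that was interrupted. */
uint32_t interrupted_pc(const uint32_t *sp_at_entry)
{
    return sp_at_entry[FRAME_PC];
}

/* LR of the interrupted code, i.e. a hint at its caller. */
uint32_t interrupted_lr(const uint32_t *sp_at_entry)
{
    return sp_at_entry[FRAME_LR];
}

/* Where the interrupted code's stack begins. Bit 9 of the stacked xPSR
 * is set when the hardware inserted one padding word for alignment. */
const uint32_t *sp_before_exception(const uint32_t *sp_at_entry)
{
    uint32_t pad = (sp_at_entry[FRAME_XPSR] >> 9) & 1u;
    return sp_at_entry + FRAME_WORDS + pad;
}
```

If the handler is written in C, the compiler may push registers or reserve locals before your code runs, so the pointer you read inside the handler has to be corrected by that amount (or captured in a small asm stub first).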

Did I hear my name? :)
You will probably need a tiny bit of inline assembly. Just figure out the format of the stack frames, and which register holds the ordinary [1] stack pointer, and transfer the relevant values into C variables from which you can format strings for output to the UART.
It shouldn't be too tricky, but of course (being rather low-level) you need to pay attention to the details.
[1] As in "non-exception"; not sure if the ARM has different stacks for ordinary code and exceptions, actually.

Your watchdog timer can fire at any point, even when the stack does not contain enough information to unwind (e.g. stack space has been allocated for register spill, but the registers not copied yet).
For properly optimized code, you need debug info, period. All you can do from a watchdog timer is a register and stack dump in a format that is machine readable enough to allow conversion into a core dump for gdb.
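A simple way to make such a dump easy to parse later is one "address: word" pair per line. The format and the function name below are only an illustrative assumption, not something gdb reads directly; a host-side script would still have to turn the text into a core file.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats count stack words as lines of "AAAAAAAA: WWWWWWWW\n" into out.
 * base_addr is the address of words[0]. Stops at the last line that fits
 * completely and returns the number of characters written (excluding the
 * terminating NUL).
 */
size_t format_stack_dump(char *out, size_t out_size, uint32_t base_addr,
                         const uint32_t *words, size_t count)
{
    size_t used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + used, out_size - used,
                         "%08" PRIX32 ": %08" PRIX32 "\n",
                         base_addr + (uint32_t)(4u * i), words[i]);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* drop the partially written line */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

The resulting buffer can be pushed out over the UART in one go after the interrupt source has been silenced, or kept in RAM and read out with the debugger.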