The code that initializes the vector table is placed in the startup file generated by STM32CubeIDE:
.global g_pfnVectors
.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors
g_pfnVectors:
.word _estack
.word Reset_Handler
.word NMI_Handler
.word HardFault_Handler
.
.
.
.word DMA1_Stream0_IRQHandler /* DMA1 Stream 0 */
.
.
.
I want to understand it, so I have some questions if anyone could help:
g_pfnVectors is declared two times, once with .global and once as the label before the .word entries. Is it first declared as global and then its size declared?
Lines 2, 3 and 4 each have comma-separated arguments; what are they?
Is there any reference where I can read up on them?
Note: I know line 6 is the start of the array definition of g_pfnVectors.
It seems the startup file has to hold the IRQ handlers' function pointers; since we have defined the handlers in *_it.c, the table entries will be linked to them, am I right?
In "ARM interrupt vector relocation usage and description" they reassign SysTick_Handler after relocating the table. If we use the linker to move this function to RAM, then we don't need to reassign it, am I right? (Since this function executes every 1 ms by default in CubeIDE applications and has to be fast.)
I can't cover the assembly syntax exactly, but I think some explanation of the vector table and interrupt entry would be helpful. In fact, it's very easy to have zero assembly code in the startup: you can make the vector table in C as an array that you place in its own section (there's a sketch of that below). Also, I'm not super versed in assembly myself. Anyway, here is what the vector table is, how it works and what you can do with it.
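A minimal sketch of a vector table written in C, assuming a GNU toolchain; the handler names, the _estack symbol and the ".isr_vector" section name are assumptions that must match your linker script and startup conventions:

#include <stdint.h>

/* Handlers defined elsewhere (e.g. in *_it.c); Default_Handler is a catch-all. */
void Reset_Handler(void);
void NMI_Handler(void);
void HardFault_Handler(void);
void Default_Handler(void);

extern uint32_t _estack;          /* top-of-stack symbol from the linker script */

typedef void (*vector_t)(void);

/* "used" keeps the otherwise unreferenced table from being garbage-collected;
   the linker script must place this section at the start of Flash. */
__attribute__((section(".isr_vector"), used))
const vector_t g_pfnVectors[] = {
    (vector_t)(uintptr_t)&_estack,   /* word 0: initial main stack pointer      */
    Reset_Handler,                   /* word 1: reset                           */
    NMI_Handler,                     /* word 2: NMI                             */
    HardFault_Handler,               /* word 3: hard fault                      */
    /* ... remaining core exceptions and device IRQs, e.g. Default_Handler ...  */
};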
At the beginning of Flash, before any executable code, you have a vector table. It's just a list of pointers to void (void) functions. (The first word of the vector table is loaded into the main stack pointer on power-on; the others are pointers.) Since Flash is not intended to be changed dynamically at runtime, we can safely assume that in practice those pointers are fixed.
When an interrupt occurs, the CPU reads the corresponding interrupt handler address from the vector table and then jumps to whatever address it read. Thus, if you move the vector table, you also need to make sure that all interrupts have handler addresses in the correct places in the new vector table. Basically, you want to copy the old vector table into the new place.
A small addition: on Cortex-M, the least significant bit of every vector table entry is always set to 1 (like all function pointers). This is an architecture requirement (it indicates the Thumb instruction set rather than the 32-bit ARM instruction set); otherwise you get a usage fault exception. (So for an interrupt handler at 0x20040000, the vector table entry is 0x20040001.)
Let's consider a basic example.
Your vector table is in Flash and it's fixed (unchangeable). Imagine you have interrupt number 20, and a void IRQ20_Handler(void) handler for that interrupt, located also in Flash, where your executable code typically resides. Let's imagine the address of the handler is 0x08004000 (any address in Flash that doesn't overlap the vector table, just an example).
In your Flash, 32-bit word number 36 of the memory is going to contain 0x08004001. 36 because the MSP and the exceptions (hard fault, memmanage fault, bus fault, SysTick, etc.) take the first 16 words, and IRQs start only after that. The vector table layout is in the STM32 MCU's reference manual.
So right now, your vector table is at location 0x08000000 (the beginning of Flash), and word 36 of it has a pointer to IRQ20_Handler, which is also somewhere in Flash. So if the IRQ20 interrupt happens, your MCU reads word number 36 from Flash - 0x08004001 - and jumps to 0x08004000, the handler.
Now imagine you want an interrupt handler that lives in SRAM at 0x20040000. You would have to overwrite Flash so that vector table word number 36 holds the new handler's address - and rewriting Flash just for one thing is not recommended - so instead you move the vector table to RAM, and then you can change vectors as much as you want at runtime.
Thus, you first move the IRQ handler to RAM (say, 0x20040000), then you move the vector table to RAM and make sure all handler addresses are copied over so that all interrupts work just as before, and then in the new vector table you overwrite the IRQ20_Handler entry (word 36) with the address of the SRAM function (+1). In this case, 0x20040001.
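A minimal sketch of that sequence, assuming a Cortex-M3/M4 with CMSIS headers; the device header name, VECTOR_COUNT and SRAM_IRQ20_Handler are placeholders you'd adapt to your part:

#include "stm32f4xx.h"                    /* assumption: your CMSIS device header */

#define VECTOR_COUNT   (16 + 82)          /* 16 core entries + device IRQs (check your device) */

/* VTOR requires the table to be aligned to a power of two >= its size. */
static uint32_t ram_vectors[VECTOR_COUNT] __attribute__((aligned(512)));

extern void SRAM_IRQ20_Handler(void);     /* the handler you placed in SRAM */

void relocate_vector_table(void)
{
    const uint32_t *flash_vectors = (const uint32_t *)0x08000000;

    /* 1. Copy the existing table from the start of Flash. */
    for (uint32_t i = 0; i < VECTOR_COUNT; i++) {
        ram_vectors[i] = flash_vectors[i];
    }

    /* 2. Patch word 36 (16 core entries + IRQ number 20); the compiler sets
       the Thumb bit itself when you take the address of a function. */
    ram_vectors[16 + 20] = (uint32_t)SRAM_IRQ20_Handler;

    /* 3. Point the CPU at the new table. */
    __disable_irq();
    SCB->VTOR = (uint32_t)ram_vectors;
    __DSB();
    __enable_irq();
}

After this, the NVIC fetches IRQ 20's vector from RAM, so you can repoint it again at any time.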
Unfortunately, I can't provide more details about the assembly, since I use it only a little, mainly as inline assembly blocks for context switching or for special instructions not natively supported through C. It looks like it declares a section (and makes the symbol globally visible) and gives the section a pair of attributes.
Related
I am looking to perform some experiments on an ATmega644P, evaluating the amount of decay in SRAM between power cycles. My method is to set a number of bytes in SRAM to 0xFF, then when the MCU powers back up, count the number of remaining 1s in these bytes.
For this to work, I need to read and write the array of 1s to/from a known memory address in SRAM. So far I have code which writes the values to a specific address using a pointer set to 0x1000, and then on power-up I begin reading the array from this address. However, I need a way of guaranteeing that this section of SRAM (say, 0x1000 + 64 bytes) is not allocated to other variables or overwritten before it can be read.
I have looked online at the possibility of allocating memory segments - I don't know if this is a good solution in this case, and I'm not even too sure how to go about doing it. Can anyone suggest a neat way of approaching this?
Please ask any questions for clarification.
Thanks for your help.
If you're using AVR/GNU, then when a C application starts, the startup code clears zero-initialized memory (.bss) and initializes global variables as required.
To avoid that, you can configure the linker to exclude all start files using the options -nostartfiles -nodefaultlibs -nostdlib.
If you're using Atmel Studio you can set these flags in the project's linker options.
After that you can mark your main so that it is called as the initialization code:
int main(void) __attribute__((naked, section(".init9")));
Now you'll have "naked" code, which does not perform ANY initialization.
That means you need at least to initialize the stack pointer and clear register r1 (which avr-gcc assumes contains zero):
#include <avr/io.h>                 // for SPL, SPH, RAMEND

int main(void) {
    asm volatile (
        "clr r1\n"                  // load zero into r1 (avr-gcc assumes r1 == 0)
        "cli"                       // clear the global interrupt flag
    );
    SPL = (uint8_t)RAMEND;          // initialize the stack pointer
    SPH = (uint8_t)(RAMEND >> 8);
    ...                             // here goes your code
    for(;;);                        // do not leave main()!
}
After this, ALL global variables will be uninitialized. You can declare, for example, a global array and check its content on startup.
You'll need to create a custom region in RAM by reserving space for it in your linker file. It is important that it is marked "no init" or similar, otherwise .bss initialization and the like might run over it before main() is called. How to do this is linker-specific; a sketch for avr-gcc follows.
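With avr-gcc specifically, a minimal sketch could use the ready-made .noinit section; the buffer name, its size and the --section-start address below are only illustrative:

#include <stdint.h>

/* Excluded from .bss clearing by the avr-gcc startup code. Pinning it to a
   fixed address additionally needs a linker option such as
   -Wl,--section-start=.noinit=0x800100 (data addresses carry a 0x800000
   offset in the AVR linker; the value here is only an example). */
uint8_t decay_buf[64] __attribute__((section(".noinit")));

/* Count how many 1 bits survived the power cycle. */
uint16_t count_surviving_ones(void)
{
    uint16_t ones = 0;
    for (uint8_t i = 0; i < sizeof(decay_buf); i++) {
        uint8_t b = decay_buf[i];
        while (b) {
            b &= (uint8_t)(b - 1);   /* clear the lowest set bit */
            ones++;
        }
    }
    return ones;
}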
However, writing software for this seems needlessly cumbersome. Simply use an in-circuit debugger:
Ensure that you are using a debugger which does not power the target.
Download a program which uses no RAM at all into flash. Confirm this by checking the map file.
Set the whole RAM to 0xFF through the debugger.
Remove power while keeping the in-circuit debugger connected.
Wait x time units.
Power up, hit MCU reset in the debugger, memory dump the whole RAM.
Any half-decent tool chain should be able to do this for you.
I'm developing an application on an ARM Cortex-M microcontroller which has two RAM banks of 64 kB each. The first bank is directly followed by the second bank in the memory map.
The memory banks are currently split into two regions in my linker script. The first region contains the sections .bss and .data. The second bank is used for .heap and .stack, which only take 1 kB each (I'm using a different stack in FreeRTOS, which also manages its own heap).
My problem is that .bss is too large for the first bank. Therefore I'd like to move some of its content to the second bank.
One way to accomplish this would be to create a new section, let's call it .secondbss, which is linked to the second bank. Single variables could then be added to this section using __attribute__((section(".secondbss"))).
The reasons why I am not using this solution are
I really want to maintain portability of my source code
There might be a whole lot of variables that would require this attribute and I don't want to choose the section for every single variable
Is there a better solution for this? I already thought of treating both memories as one region, but I don't know how to prevent the linker from placing data across the boundary between the two banks.
How can I solve my problem without using __attribute__ flags?
Thank you!
For example, you have two banks at 0x20000000 and 0x20010000, and you want to use Bank2 for the heap and the (main) stack. I assume that your .bss is large because of configTOTAL_HEAP_SIZE in FreeRTOSConfig.h. Now see the heap sources in FreeRTOS/Source/portable/MemMang/. There are 5 implementations of pvPortMalloc() that do memory allocation.
Look at these lines in the heap_X.c that you use:
/* Allocate the memory for the heap. */
#if( configAPPLICATION_ALLOCATED_HEAP == 1 )
    /* The application writer has already defined the array used for the RTOS
    heap - probably so it can be placed in a special segment or address. */
    extern uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];
#else
    static uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];
#endif /* configAPPLICATION_ALLOCATED_HEAP */
So you can set configAPPLICATION_ALLOCATED_HEAP to 1 and tell your linker to place ucHeap at 0x20010000.
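A minimal sketch of that approach, assuming a GNU toolchain; the section name ".bank2" is an assumption and has to be a section your linker script maps to 0x20010000:

/* In FreeRTOSConfig.h */
#define configAPPLICATION_ALLOCATED_HEAP    1

/* In one application source file: define the heap yourself and drop it into
   a section that the linker script places in the second RAM bank. */
#include "FreeRTOS.h"

uint8_t ucHeap[ configTOTAL_HEAP_SIZE ] __attribute__((section(".bank2")));

With that, pvPortMalloc() hands out memory from the second bank without touching the FreeRTOS sources.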
Another way is to write a header for each device that defines the heap and stack addresses, and to edit the heap sources.
For heap_1.c we can make the following changes:
// somewhere in devconfig.h
#define HEAP_ADDR 0x20010000
// in heap_1.c
// remove the code related to ucHeap
//
// remove static uint8_t *pucAlignedHeap = NULL;
// and paste:
static uint8_t *pucAlignedHeap = (uint8_t *)HEAP_ADDR;
For heap_2.c and heap_4.c, edit the function prvHeapInit() in the same way.
Pay attention to heap_5.c, which includes vPortDefineHeapRegions().
Now pvPortMalloc() will return pointers to memory in Bank2. pvPortMalloc() is used to allocate task stacks, TCBs and user variables - read the sources. The location of the main stack depends on your device/architecture; for STM32 (ARM) see the vector table, or see how to change the MSP register.
Link address is the address where execution of a program takes place, while load address is the address in memory where the program is actually placed.
Now I'm confused: what is the value in the program counter? Is it the load address or the link address?
Link address is the address where execution of a program takes place
No, it's not.
while load address is the address in memory where the program is actually placed.
Kind of. The program usually consists of more than one instruction, so it can't be placed at a single "load address".
When people talk about load address, they usually talk about relocatable code that can be relocated (at runtime) to an arbitrary load address.
For example, let's take a program that is linked at address 0x20020 and consists of 100 4-byte instructions, which all execute sequentially (e.g. it's a sequence of ADDs followed by a single SYSCALL to exit the program).
If such a program is loaded at address 0x20020, then at runtime the program counter will have value 0x20020, then it will advance to the next instruction at 0x20024, then to 0x20028, etc. until it reaches the last instruction of the program at 0x201ac.
But if that program is loaded at address 0x80020020 (i.e. if the program is relocated by 0x80000000 from its linked-at address), then the program counter will start at 0x80020020, and the last instruction will be at 0x800201ac.
Note that on many OSes executables are not relocatable and thus have to always be loaded at the same address they were linked at (i.e. with relocation 0; in this case "link address" really is the address where execution starts), while shared libraries are almost always relocatable and are often linked at address 0 and have non-zero relocation.
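If you want to see the difference on a hosted system, a small experiment (assuming Linux with GCC and binutils; a.out is just the default output name) is to compare a function's runtime address with its link-time address from nm:

#include <stdio.h>

int main(void)
{
    /* For a position-independent executable loaded with a non-zero offset
       (ASLR), this runtime address differs from the link-time address shown
       by `nm ./a.out | grep " main"`; for a non-PIE build the two match. */
    printf("main is running at %p\n", (void *)&main);
    return 0;
}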
Both are different concepts, used in different contexts. The linker/loader is mainly responsible for code relocation and modification; the PC is a digital counter that indicates where the processor is in its program sequence (it is not a link-time or load-time address concept like those the linker/loader deals with).
Linking & Loading :-
The heart of a linker or loader's actions is relocation and code
modification. When a compiler or assembler generates an object file,
it generates the code using the unrelocated addresses of code and data
defined within the file, and usually zeros for code and data defined
elsewhere. As part of the linking process, the linker modifies the
object code to reflect the actual addresses assigned. For example,
consider this snippet of x86 code that moves the contents of variable
a to variable b using the eax register.
mov a,%eax
mov %eax,b
If a is defined in the same file at location 1234 hex and b is
imported from somewhere else, the generated object code will be:
A1 34 12 00 00 mov a,%eax
A3 00 00 00 00 mov %eax,b
Each instruction contains a one-byte operation code followed by a
four-byte address. The first instruction has a reference to 1234 (byte
reversed, since the x86 uses a right to left byte order) and the
second a reference to zero since the location of b is unknown.
Now assume that the linker links this code so that the section in
which a is located is relocated by hex 10000 bytes, and b turns out to
be at hex 9A12. The linker modifies the code to be:
A1 34 12 01 00 mov a,%eax
A3 12 9A 00 00 mov %eax,b
That is, it adds 10000 to the address in the first instruction so now
it refers to a's relocated address which is 11234, and it patches in
the address for b. These adjustments affect instructions, but any
pointers in the data part of an object file have to be adjusted as
well.
Program Counter :-
The program counter (PC) is a processor register that indicates where
a computer is in its program sequence.
In a typical central processing unit (CPU), the PC is a digital
counter (which is the origin of the term "program counter") that may
be one of many registers in the CPU hardware. The instruction cycle
begins with a fetch, in which the CPU places the value of the PC on
the address bus to send it to the memory.
The memory responds by
sending the contents of that memory location on the data bus. (This is
the stored-program computer model, in which executable instructions
are stored alongside ordinary data in memory, and handled identically
by it).
Following the fetch, the CPU proceeds to execution, taking
some action based on the memory contents that it obtained. At some
point in this cycle, the PC will be modified so that the next
instruction executed is a different one (typically, incremented so
that the next instruction is the one starting at the memory address
immediately following the last memory location of the current
instruction).
I would put the term "load address" out of your thinking. It does not really exist in a modern operating system. In the old days of multiple programs loaded into the same address space (and each program loaded into a contiguous region of memory), the load address had significance. Now it does not. Here's why.
An executable file is typically going to define a number of different program segments. These may not be loaded contiguously in memory. For example, the linker often directs the creation of stack areas remote from other areas of the program.
The executable will indicate the location that should be the initial value of the PC. This might not be at the start of a program segment, let alone be in the first program segment.
I'm very new to embedded programming (I started yesterday, actually) and I've noticed something I think is strange. I have a very simple program that does nothing but return 0.
int main() {
return 0;
}
When I run this in IAR Embedded Workbench I have a memory view showing me the program's memory. I've noticed that there is some data in memory, then a big block of empty space, and then data again (I'm bad at explaining, so here is an image of the memory).
Please help me understand this a little better. I don't really know what to search for because I'm so new to this.
The first two lines are the 8 interrupt vectors, expressed as 32-bit instructions with the highest byte last. That is, read them in groups of 4 bytes with the highest byte last, and then decode them as instructions in the usual way. The first few vectors, including the reset vector at memory location 0, turn out to be LDR instructions, which load an immediate address into the PC register. This causes the processor to jump to that address. (The reset vector is also the first instruction to run when the device is switched on.)
You can see the structure of an LDR instruction here, or at many other places via an internet search. If we write the reset vector 18 f0 9f e5 as e5 9f f0 18, we see that the PC register is loaded with the address stored at an offset of 0x20 (the literal offset 0x18 plus the PC, which on this core reads as the instruction's address + 8).
So the next two lines are memory locations referred to by the instructions in the first two lines. The reset vector sends the PC to 0x00000080, which is where the C runtime of your program starts. (The other vectors send the PC to 0x00000170, near the end of your program. What that instruction is is left to the reader.)
Typically, the C runtime is code added to the front of your program that loads the global variables into RAM from flash, and sets the uninitialized RAM to 0. Your program starts after that.
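As a rough sketch of what that runtime does (the _sidata/_sdata/_edata/_sbss/_ebss symbols are GNU linker script conventions and are assumptions here; IAR's cstartup does the equivalent with its own symbol names):

#include <stdint.h>

extern uint32_t _sidata;   /* load address of .data in flash */
extern uint32_t _sdata;    /* start of .data in RAM          */
extern uint32_t _edata;    /* end of .data in RAM            */
extern uint32_t _sbss;     /* start of .bss in RAM           */
extern uint32_t _ebss;     /* end of .bss in RAM             */
extern int main(void);

void Reset_Handler(void)
{
    uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;

    while (dst < &_edata) {                      /* copy initialized globals from flash */
        *dst++ = *src++;
    }
    for (dst = &_sbss; dst < &_ebss; dst++) {    /* zero uninitialized globals */
        *dst = 0;
    }

    main();
    for (;;) { }                                 /* trap if main() ever returns */
}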
Your original question was: why have such a big gap of unused flash? The answer is that flash memory is not really at a premium, so we can waste a little, and having extra space there allows for forward compatibility: if the vector table ever needs to grow, the code doesn't need to move. In fact, this interrupt model has been changed in the newer ARM Cortex processors anyway.
Physical (not virtual) memory addresses map to physical circuits. The lowest addresses often map to registers, not RAM arrays. In the interest of consistency, a given address usually maps to the same functionality on different processors of the same family, and missing functionality appears as a small hole in the address mapping.
Furthermore, RAM is assigned to a contiguous address range, after all the I/O registers and housekeeping functions. This produces a big hole between all the registers and the RAM.
Alternately, as #Martin suggests, it may represent uninitialized and read-only Flash memory as -- bytes. Unlike truly unassigned addresses, access to this is unlikely to produce an exception, and you might even be able to make them "reappear" using appropriate Flash controller commands.
On a modern desktop-class machine, virtual memory hides all this from you, and even parts of the physical address map may be configurable. Many embedded-class processors allow configuration to the extent of specifying the location of the interrupt vector table.
UncleO is right but here is some additional information.
The project's linker command file (*.icf for IAR EW) determines where sections are located in memory. (Look under Project->Options->Linker->Config to identify your linker configuration file.) If you view the linker command file with a text editor you may be able to identify where it locates a section named .intvec (or similar) at address 0x00000000. And then it may locate another section (maybe .text) at address 0x00000080.
You can also see these memory sections identified in the .map file, along with their locations. (Ensure "Generate linker map file" is checked under Project->Options->Linker->List.) The map file is an output from the build, however, and it's the linker command file that determines the locations.
So that space in memory is there because the linker command file instructed it to be that way. I'm not sure whether that space is necessary but it's certainly not a problem. You might be able to experiment with the linker command file and move that second section around. But the exception table (a.k.a. interrupt vector table) must be located at 0x00000000. And you'll want to ensure that the reset vector points to the new location of the startup code if you move it.
I would like to create a debugging tool which will help me debug my application better.
I'm working bare-metal (without an OS), using IAR Embedded Workbench on Atmel's SAM3.
I have a watchdog timer, which calls a specific IRQ in case of timeout (this will be replaced with a software reset in the release build).
In the IRQ handler, I want to print out (over UART) the stack trace of where exactly the watchdog timeout occurred.
I looked on the web and didn't find any implementation of that functionality.
Does anyone have an idea on how to approach this kind of thing?
EDIT: OK, I managed to grab the return address from the stack, so I know exactly where the WDT timeout occurred.
Unwinding the whole stack is not as simple as it first appears, because each function pushes a different amount of local variables onto the stack.
The code I ended up with is this (for others who may find it useful):
void WDT_IrqHandler( void )
{
    uint32_t * WDT_Address;
    Wdt *pWdt = WDT ;
    volatile uint32_t dummy ;

    /* 16 words above the handler's MSP is where the stacked return address
       ended up for this build (exception frame plus the handler's own pushes);
       the offset was found from a disassembly of this handler. */
    WDT_Address = (uint32_t *) __get_MSP() + 16 ;
    LogFatal ("Watchdog Timer timeout, The Return Address is %#X", *WDT_Address);

    /* Clear status bit to acknowledge the interrupt */
    dummy = pWdt->WDT_SR ;
}
ARM defines a pair of sections, .ARM.exidx and .ARM.extab, that contain enough information to unwind the stack without debug symbols. These sections exist for exception handling, but you can use them to perform a backtrace as well. Add -funwind-tables to force GCC to include these sections.
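A minimal sketch of walking those tables from code, assuming GCC with -funwind-tables and printf retargeted to your UART; print_backtrace and trace_one_frame are just placeholder names:

#include <unwind.h>
#include <stdio.h>

static _Unwind_Reason_Code trace_one_frame(struct _Unwind_Context *ctx, void *arg)
{
    int *depth = (int *)arg;
    /* _Unwind_GetIP returns the instruction pointer of this frame. */
    printf("#%d: 0x%08lx\r\n", (*depth)++, (unsigned long)_Unwind_GetIP(ctx));
    return _URC_NO_REASON;           /* keep walking up the stack */
}

void print_backtrace(void)
{
    int depth = 0;
    _Unwind_Backtrace(&trace_one_frame, &depth);
}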
To do this on ARM, you will need to tell your compiler to generate stack frames. For instance with gcc, check the option -mapcs-frame. It may not be the one you need, but it's a start.
If you do not have this, it will be nearly impossible to "unroll" the stack, because you would need to know, for each function, the exact stack usage depending on parameters and local variables.
If you are looking for some example code, you can check dump_stack() in the Linux kernel sources and find the related piece of code executed for ARM.
It should be pretty straightforward to follow the execution, though not programmatically in your ISR...
We know from the ARM ARM that on a Cortex-M3 the hardware pushes xPSR, ReturnAddress, LR (R14), R12, R3, R2, R1, and R0 on the stack, mangles the LR so it can detect a return from interrupt, and then calls the entry point listed in the vector table. If you implement your ISR in asm to control the stack, you can have a simple routine that disables the interrupt source (turns off the WDT, whatever; this is going to take some time) and then goes into a loop to dump a portion of the stack.
From that dump you will see the LR/return address and the function/instruction that was interrupted; from a disassembly of your program you can then see what the compiler has placed on the stack for each function, subtract that off at each stage, and go back as far as you like or as far as you have printed the stack contents.
You could also make a copy of the stack in RAM and dissect it later, rather than doing such things in an ISR (the copy still takes too much time but is less intrusive than waiting on the UART).
If all you are after is the address of the instruction that was interrupted, that is the most trivial task: just read it from the stack (it will be at a known place) and print it out.
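As a sketch of that last point in C (GCC-style attribute and inline assembly as an assumption; IAR spells these differently, and LogFatal is assumed to be the asker's existing logger):

#include <stdint.h>

void LogFatal(const char *fmt, ...);     /* the asker's existing UART logger */

/* Called with a pointer to the hardware-stacked exception frame:
   frame[0..7] = R0, R1, R2, R3, R12, LR, ReturnAddress, xPSR. */
void WDT_DumpFrame(uint32_t *frame)
{
    LogFatal("Watchdog timeout at %#X (LR=%#X)",
             (unsigned)frame[6], (unsigned)frame[5]);
}

/* Thin assembly shim: pick whichever stack pointer was active when the
   interrupt hit, then hand it to the C code above. */
__attribute__((naked)) void WDT_IrqHandler(void)
{
    __asm volatile(
        "tst lr, #4          \n"   /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP */
        "ite eq              \n"
        "mrseq r0, msp       \n"
        "mrsne r0, psp       \n"
        "b WDT_DumpFrame     \n"
    );
}

You would still need to acknowledge the watchdog interrupt somewhere (the WDT_SR read in the asker's handler).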
Did I hear my name? :)
You will probably need a tiny bit of inline assembly. Just figure out the format of the stack frames, and which register holds the ordinary1 stack pointer, and transfer the relevant values into C variables from which you can format strings for output to the UART.
It shouldn't be too tricky, but of course (being rather low-level) you need to pay attention to the details.
1As in "non-exception"; not sure if the ARM has different stacks for ordinary code and exceptions, actually.
Your watchdog timer can fire at any point, even when the stack does not contain enough information to unwind (e.g. stack space has been allocated for register spill, but the registers not copied yet).
For properly optimized code, you need debug info, period. All you can do from a watchdog timer is a register and stack dump in a format that is machine readable enough to allow conversion into a core dump for gdb.