Simple struct dereference triggers ARM hard_fault hardware exception - c

I've been debugging this for many hours now.
My application is an embedded program running on the CC2650 ARM M3 processor using the TI RTOS.
This line of C generates an ARM hard_fault exception (LR, the link register, is set to 0xFFFFFFFD):
leaseStartMessageForReplay = *leaseStartMessage;
The code simply dereferences the leaseStartMessage struct pointer and copies the full struct content (2 words) to the leaseStartMessageForReplay struct. (That's the intention at least.)
The actual assembly for that line looks like this:
The assembly seems correct: 1st line loads R0 with the address of leaseStartMessage. 2nd line loads R1 with the address of leaseStartMessageForReplay. 3rd line loads the two words located at address-R0 into R0 and R2. 4th line stores the two words in R0 and R2 at address-R1.
The hard_fault exception happens on the 3rd line. The registers R0, R1, R2 have these values just before executing the 3rd instruction:
As can be seen the two address pointers R0 and R1 are initialized and I have verified that they contain the correct addresses.
Any help on how to debug this would be greatly appreciated.

R0 isn't aligned to a 32-bit boundary, and LDMIA requires word alignment.
Use memcpy() instead.
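A minimal sketch of the memcpy() suggestion. The struct name lease_start_message_t, its two fields, and both helper names are assumptions made for illustration; the question only says the struct is two words long.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-word struct; the real layout is not shown in the question. */
typedef struct {
    uint32_t lease_id;
    uint32_t start_time;
} lease_start_message_t;

/* Returns 1 if p sits on a 4-byte boundary, which LDM/STM need on Cortex-M3. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Copies a message out of a buffer that may start at any byte offset.
 * memcpy() makes no alignment assumption about a byte pointer. */
static lease_start_message_t copy_lease_message(const uint8_t *src)
{
    lease_start_message_t out;
    assert(src != NULL);
    memcpy(&out, src, sizeof out);
    return out;
}
```

Keeping the source as a byte pointer is deliberate here: a compiler is allowed to assume that a struct pointer is suitably aligned, so the unaligned address should not be cast to the struct type before the copy.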

Related

STM32 Hardfault when trying to access memory

I am analyzing code written for an STM32H730 microcontroller. I found the snippet below, which causes a hard fault when BootHoldRequest(&fnBoot) is called.
#define BOOTBLOCK_ADD 0x08000000L
#define BootHoldRequest (*((BOOTLOAD_PROCEED_TYPE *) (BOOTBLOCK_ADD + 0x200)))
typedef void (* CALLBACK_PTR)(void);
typedef uint16_t BOOTLOAD_PROCEED_TYPE(CALLBACK_PTR *);
typedef void (* VOID_FUN_TYPE)(void);
static VOID_FUN_TYPE fnBoot;
if (BootHoldRequest(&fnBoot)) //<--------- HARDFAULT
{
}
As it is impossible to answer your question without seeing the whole project (including linker scripts etc.), I will only show how to debug this issue.
What does this code do?
if (BootHoldRequest(&fnBoot))
ldr r0, .L6
ldr r3, .L6+4
bx r3
.L6:
.word .LANCHOR0
.word 0x8000200
It loads the constant address BOOTBLOCK_ADD + 0x200 (0x8000200) from the literal pool into r3 and then branches to the code located at that address. I do not know if you have valid code there, so you need to check it yourself.
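One thing worth checking in that disassembly is the value handed to bx, the literal 0x8000200, whose bit 0 is clear. On a Cortex-M core a bx target needs bit 0 set, because the core only executes Thumb code. The sketch below only models that check; thumb_entry and is_thumb_target are hypothetical helper names, and whether valid Thumb code really starts at that offset is an assumption that has to be verified on the target.

```c
#include <assert.h>
#include <stdint.h>

#define BOOTBLOCK_ADD 0x08000000u

/* A bx/blx target on Cortex-M must have bit 0 set (Thumb state). */
static int is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0;
}

/* Hypothetical helper: turns a plain, halfword-aligned code address
 * into a value that is safe to hand to bx/blx. */
static uint32_t thumb_entry(uint32_t code_addr)
{
    assert((code_addr & 1u) == 0);
    return code_addr | 1u;
}

/* On the target only (assumption: Thumb code really starts at +0x200):
 *   BOOTLOAD_PROCEED_TYPE *entry =
 *       (BOOTLOAD_PROCEED_TYPE *)(uintptr_t)thumb_entry(BOOTBLOCK_ADD + 0x200u);
 *   entry(&fnBoot);
 */
```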
If you use an IDE (in my example Atollic, which is almost identical to STM32CubeIDE), you can easily check it.
Two methods:
Set the breakpoint at this line.
Use the expression window to see what is at this address:
Enter the instruction debug mode
And follow the code one assembly instruction at a time. You will see if the code does what it is supposed to do.
The code shown is not your code. It is code from the project I work on.

Behavior of LDR on a uint8_t variable in ARM?

I cannot find a straight answer for this anywhere. The ARM registers are 32-bit. I know that LDRB loads a byte-sized value into a register and zeros out the remaining 3 bytes; even if you feed it a value bigger than a byte, it will just take the first byte.
My program combines C with ARM Assembly. I have an extern variable in C that gets loaded into a register directly.
However if I call just LDR on this byte variable, is there a guarantee that it loads the byte and nothing else or will it load random things in the remaining 3 byte space from nearby things in memory to fill out the entire 32-bit register?
I'm only asking because I did LDR R0, =var and always got the correct value out of probably a hundred million executions (software ran for a long time and was tested thoroughly / recompiled many times before this issue was brought up on another setup).
However, someone else with a different setup (not so different; the compiler is the same version, I think) compiled the code successfully, but the value loaded into R0 was polluted with random bits from the memory surrounding the variable. They had to use LDRB to fix it.
Is this a compiler thing? Can it detect this and automatically switch it to LDRB? Or am I just that lucky that the surrounding memory of the variable was just zero due to some optimization?
As a side note the compiler is ARM GCC 9.2.1
"because I did LDR R0, =var"
Are you loading the value or the address of the variable?
Normally, the instruction LDR R0, =var will write the address of the variable var into the register R0 and not the value.
And the address of a variable is always a 32-bit value on a 32-bit ARM CPU - independent of the data type.
"However if I call just LDR on this byte variable, ..."
If you load the value of a variable (e.g. using LDR R1, [R0]), two things may happen:
The upper 24 bits of the register may contain a random value depending on the bytes that follow your variable in memory. If you are lucky, the bytes are always zero.
Depending on the exact CPU type, you may get problems due to alignment (for example an alignment exception or even completely undefined behavior)
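The difference between the two loads can be modelled in C. This is only a sketch: load_byte and load_word are hypothetical names, and memcpy() stands in for the word load so that the example stays valid C at any alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models LDRB: read one byte and zero-extend it to 32 bits. */
static uint32_t load_byte(const uint8_t *addr)
{
    assert(addr != NULL);
    return (uint32_t)*addr;
}

/* Models a word-sized LDR at the same address: all four bytes are read,
 * so the neighbours of the variable end up in the result too. */
static uint32_t load_word(const uint8_t *addr)
{
    uint32_t w;
    assert(addr != NULL);
    memcpy(&w, addr, sizeof w);
    return w;
}
```

With the bytes 0x2A, 0xDE, 0xAD, 0xBE in memory, load_byte returns 0x2A while load_word returns a value that also contains the three neighbouring bytes. Only when those neighbours happen to be zero (and the CPU is little-endian) do the two results agree.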
LDR doesn't know anything about how you declared the variable or what's supposed to be in the 4 bytes it loads. That's why ISAs like ARM have byte loads like LDRB (and its sign-extending equivalent) in the first place.
And no, compilers don't waste 3 bytes (of zeros) after every uint8_t just so you can use word loads on it; that would be silly. uint8_t is normally unsigned char, so sizeof(uint8_t) = 1, CHAR_BIT = 8, and alignof(uint8_t) = 1.
LDR loads an int32_t or uint32_t whole word.
But as Martin points out, LDR r0, =var puts the address of var into a register.
Then you use ldrb r1, [r0]
Fun fact: on early ARM CPUs (ARMv4 and earlier), an unaligned word load uses the low 2 bits of the address as a rotate count (after loading from the aligned word). https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965
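That rotate behaviour can be written down as a small model. It is a sketch of the rule as stated above, not code that runs on such a CPU, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, n in 0..31. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n < 32u);
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Models the legacy behaviour: the aligned word containing addr is
 * fetched, then rotated right by 8 * (addr & 3) bits. */
static uint32_t legacy_unaligned_ldr(uint32_t aligned_word, uint32_t addr)
{
    return rotr32(aligned_word, 8u * (addr & 3u));
}
```

So an aligned address returns the word unchanged, while an address one byte past alignment returns the same four bytes rotated by 8 bits rather than the four bytes that start at that address.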

Sorting ARM Assembly

I am a newbie. I have difficulty understanding the ARM memory map.
I have found an example of a simple sorting algorithm:
AREA ARM, CODE, READONLY
CODE32
PRESERVE8
EXPORT __sortc
; r0 = &arr[0]
; r1 = length
__sortc
stmfd sp!, {r2-r9, lr}
mov r4, r1 ; inner loop counter
mov r3, r4
sub r1, r1, #1
mov r9, r1 ; outer loop counter
outer_loop
mov r5, r0
mov r4, r3
inner_loop
ldr r6, [r5], #4
ldr r7, [r5]
cmp r7, r6
; swap without swp
strls r6, [r5]
strls r7, [r5, #-4]
subs r4, r4, #1
bne inner_loop
subs r9, r9, #1
bne outer_loop
ldmfd sp!, {r2-r9, pc}^
END
And this assembly should be called this way from C code
#define MAX_ELEMENTS 10
extern void __sortc(int *, int);
int main()
{
int arr[MAX_ELEMENTS] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};
__sortc(arr, MAX_ELEMENTS);
return 0;
}
As far as I understand, this code creates an array of integers on the stack and calls the __sortc function, which is implemented in assembly. This function takes these values from the stack, sorts them, and puts them back on the stack. Am I right?
I wonder how I can implement this example using only assembly.
For example, defining an array of integers:
DCD 3, 7, 2, 8, 5, 7, 2, 6
BTW, where are DCD-declared variables stored in memory?
How can I operate on values declared in this way? Please explain how I can implement this using assembly only, without any C code, even without a stack, just with raw data.
I am writing for the ARM7TDMI architecture.
AREA ARM, CODE, READONLY - this marks the start of a section for code in the source.
With a similar AREA myData, DATA, READWRITE you can start a section where it's possible to define data like data1 DCD 1,2,3. This will assemble as three consecutive words with values 1, 2, 3, with the label data1 pointing to the first byte of the first word. (some AREA docs from google).
Where these will land in physical memory after loading the executable depends on how the executable is linked. The linker uses a script file that helps it decide where to put each AREA and how to create the symbol table for dynamic relocation done by the executable loader. By editing the linker script you can adjust where the code and data land, but normally you don't need to do that.
Also the linker script and assembler directives can affect size of available stack, and where it is mapped in physical memory.
So for your particular platform: search the web for its memory map and check the linker script (for a start, just use the linker option to produce a .map file to see where the code and data are targeted to land).
So you can either declare that array in some data area; then, to work with it, you load the symbol data1 into a register ("load address of data1") and use that to fetch the memory content from that address.
Or you can first put all the numbers onto the stack (which is probably set to something reasonable by the OS loader of your executable), and operate in the code with the stack pointer to access the numbers in it.
You can even DCD some values into the CODE area, so those words will end up between the instructions in memory mapped as read-only by the executable loader. You can read those data, but writing to them will likely cause a crash. And of course you shouldn't execute them as instructions by accident (forgetting to put some return/jump instruction ahead of the DCD).
"without stack"
Well, this one is tricky: you have to be careful not to use any call etc., and to have interrupts disabled; basically you must avoid anything that needs the stack.
When people code a bootloader, usually they set up some temporary stack ASAP in first few instructions, so they can use basic stack functionality before setting up whole environment properly, or loading OS. A space for that temporary stack is often reserved somewhere in/after the code, or an unused memory space according to defined machine state after reset.
If you are down to the metal, without OS, usually all memory is writeable after reset, so you can then intermix code and data as you wish (just jumping around the data, not executing them by accident), without using AREA definitions.
But you should make up your mind whether you are creating an application in the user space of some OS (so you have things like stack and data areas well defined and you can use them for your convenience), or you are creating boot loader code which has to set it all up for itself. The latter is more difficult, so I would suggest first going into the user land of some OS. Having a C wrapper around with clib initialized is often handy too, so you can call things like printf from ASM for convenient output.
"How can I operate with values declared in this way"
It doesn't matter in machine code which way the values were declared. All that matters is that you have the address of the memory and that you know the structure of the data stored there. Then you can work with them in any way you want, using any instruction you want. So the body of that asm example will not change if you allocate the data in ASM; you will just pass the pointer as an argument to it, like the C code does.
edit: some example done blindly without testing, may need further syntax fixing to work for OP (or maybe there's even some bug and it will not work at all, let me know in comments if it did):
AREA myData, DATA, READWRITE
SortArray
DCD 5, 4, 1, 3, 2, 12, 55, 64, 77, 10
SortArrayEnd
AREA ARM, CODE, READONLY
CODE32
PRESERVE8
EXPORT __sortasmarray
__sortasmarray
; "ldr rX, =label" loads the full 32-bit address from a literal pool,
; so it works at any distance and across AREAs. The array lives in a
; separate DATA area, so a pc-relative "adr r0, SortArray" is not
; guaranteed to reach it.
ldr r0, =SortArray ; address of array
; calculate array length from the address of its end
; (as I couldn't find an example of something like "equ $-SortArray")
ldr r1, =SortArrayEnd
sub r1, r1, r0
mov r1, r1, lsr #2
; do a direct jump instead of "bl", so that when __sortc returns
; it actually returns to the caller of this routine
b __sortc
; ... rest of your __sortc assembly without change
You can call it from C code as:
extern void __sortasmarray();
int main()
{
__sortasmarray();
return 0;
}
I used among others this Introducing ARM assembly language to refresh my ARM asm memory, but I'm still worried this may not work as is.
As you can see, I didn't change anything in __sortc, because there's no difference between accessing stack memory and "dcd" memory; it's the same computer memory. Once you have the address of a particular word, you can ldr/str its value with that address. The __sortc routine receives the address of the first word in the array to sort in both cases. From there on it's just memory for it, without any context of how that memory was defined in source, allocated, initialized, etc. As long as it's writeable, it's fine for __sortc.
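The same point can be shown from the C side. In this sketch sort_words is a hypothetical stand-in for __sortc (a plain bubble sort, not a translation of the assembly), and it is handed first a stack array and then a statically allocated one.

```c
#include <assert.h>
#include <stddef.h>

/* Statically allocated, like a DCD block in a READWRITE data area. */
static int static_array[] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};

/* Hypothetical stand-in for __sortc: it only sees an address and a length. */
static void sort_words(int *arr, size_t length)
{
    assert(arr != NULL || length == 0);
    for (size_t pass = 0; pass + 1 < length; pass++) {
        for (size_t i = 0; i + 1 < length - pass; i++) {
            if (arr[i + 1] < arr[i]) {
                int tmp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = tmp;
            }
        }
    }
}

/* Returns 1 if arr is in ascending order. */
static int is_sorted(const int *arr, size_t length)
{
    for (size_t i = 0; i + 1 < length; i++) {
        if (arr[i] > arr[i + 1]) {
            return 0;
        }
    }
    return 1;
}
```

Calling sort_words(stack_array, 10) and sort_words(static_array, 10) runs exactly the same code; only the address passed in differs.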
So the only "dcd" related thing from me is loading the array address, and a quick search for ARM examples shows it may be done in several ways. A pc-relative add rX, pc, #offset is the most compact, but it only works when the label is within a limited range of the code. There's also the pseudo-instruction ADR rX, label doing the same thing. For any range, the ldr rX, =label form is used; it is a pseudo-instruction for which the assembler places the address in a literal pool and emits a pc-relative load. Check some tutorials and disassemble the machine code to see how it was assembled.
It's up to you to learn all the ARM assembly peculiarities and how to load addresses of arrays. I don't need ARM ASM at the moment, so I didn't dig into those details.
And there should be some equ way to define the length of the array, instead of calculating it in code from the end address, but I couldn't find any example, and I'm not going to read the full assembler docs to learn about all its directives (in gas I think .equ ArrayLength, (. - SortArray)/4 would work).
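For comparison, here is the same length calculation sketched in C. words_between mirrors the "end minus start, shifted right by 2" arithmetic from the assembly, and the enum constant plays the role of an equ-style compile-time length; all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint32_t sort_array[] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};

/* Same arithmetic as "sub r1, r1, r0" followed by "lsr #2":
 * byte distance between end and start, divided by the 4-byte word size. */
static size_t words_between(const uint32_t *start, const uint32_t *end)
{
    assert(end >= start);
    size_t bytes = (size_t)((const uint8_t *)end - (const uint8_t *)start);
    return bytes >> 2;
}

/* Compile-time equivalent, in the spirit of an "equ" constant. */
enum { SORT_ARRAY_LENGTH = sizeof sort_array / sizeof sort_array[0] };
```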

C code calling an Assembly routine - ARM

I'm currently working on a bootloader for an ARM Cortex M3.
I have two functions, one in C and one in assembly but when I attempt to call the assembly function my program hangs and generates some sort of fault.
The functions are as follows,
C:
extern void asmJump(void* Address) __attribute__((noreturn));
void load(void* Address)
{
asmJump(Address);
}
Assembly:
.section .text
.global asmJump
asmJump: # Accepts the address of the Vector Table
# as its first parameter (passed in r0)
ldr r2, [r0] # Move the stack pointer addr. to a temp register.
ldr r3, [r0, #4] # Move the reset vector addr. to a temp register.
mov sp, r2 # Set the stack pointer
bx r3 # Jump to the reset vector
And my problem is this:
The code prints "Hello" over serial and then calls load. The code that is loaded prints "Good Bye" and then resets the chip.
If I slowly step through the part where load calls asmJump everything works perfectly. However, when I let the code run my code experiences a 'memory fault'. I know that it is a memory fault because it causes a Hard Fault in some way (the Hard Fault handler's infinite while loop is executing when I pause after 4 or 5 seconds).
Has anyone experienced this issue before? If so, can you please let me know how to resolve it?
As you can see, I've tried to use the function attributes to fix the issue but have not managed to arrive at a solution yet. I'm hoping that someone can help me understand what the problem is in the first place.
Edit:
Thanks @JoeHass for your answer, and @MartinRosenau for your comment. I have since gone on to find this SO answer that had a very thorough explanation of why I needed this label. It is a very long read but worth it.
I think you need to tell the assembler to use the unified syntax and explicitly declare your function to be a thumb function. The GNU assembler has directives for that:
.syntax unified
.section .text
.thumb_func
.global asmJump
asmJump:
The .syntax unified directive tells the assembler that you are using the modern syntax for assembly code. I think the need for this directive is an unfortunate relic of some legacy syntax.
The .thumb_func directive tells the assembler that this function will be executed in thumb mode, so the value that is used for the symbol asmJump has its LSB set to one. When a Cortex-M executes an interworking branch such as bx or blx, it checks the LSB of the target address to see if it is a one. If it is, then the target code is executed in thumb mode. Since that is the only mode supported by the Cortex-M, it will fault if the LSB of the target address is a zero.
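The LSB rule can also be checked from C before jumping, for instance on the reset vector that asmJump reads from the vector table. This is a sketch; the struct and function names are assumptions, and only the first two words of the table are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;    /* value asmJump moves into sp */
    uint32_t reset_handler; /* value asmJump hands to bx   */
} vector_table_head_t;

/* Returns 1 if the reset vector can be branched to on a Thumb-only core,
 * i.e. its least significant bit is set. */
static int reset_vector_is_thumb(const vector_table_head_t *vt)
{
    assert(vt != NULL);
    return (vt->reset_handler & 1u) != 0;
}
```

A bootloader could refuse to jump when reset_vector_is_thumb() returns 0, which tends to be easier to diagnose than a fault inside the branch.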
Since you mention you have the debugger working, use it!
Look at the fault status registers to determine the fault source. Maybe it's not asmJump crashing but the code you're invoking.
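As a starting point for reading those registers, the sketch below classifies a snapshot of the Configurable Fault Status Register (CFSR, at 0xE000ED28) and the HardFault Status Register (HFSR, at 0xE000ED2C). The snapshot struct, the enum, and the function name are made up for illustration, and only a handful of the defined bits are decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFSR_PRECISERR  (1u << 9)   /* BusFault: precise data bus error      */
#define CFSR_UNDEFINSTR (1u << 16)  /* UsageFault: undefined instruction     */
#define CFSR_INVSTATE   (1u << 17)  /* UsageFault: branch without Thumb bit  */
#define CFSR_UNALIGNED  (1u << 24)  /* UsageFault: unaligned access          */
#define HFSR_FORCED     (1u << 30)  /* HardFault escalated from another fault */

typedef struct {
    uint32_t cfsr;
    uint32_t hfsr;
} fault_snapshot_t;

typedef enum {
    FAULT_NONE, FAULT_INVSTATE, FAULT_UNALIGNED,
    FAULT_UNDEFINSTR, FAULT_PRECISERR, FAULT_OTHER
} fault_kind_t;

/* On the target the snapshot would be filled in the fault handler, e.g.
 *   snap.cfsr = *(volatile uint32_t *)0xE000ED28u;
 *   snap.hfsr = *(volatile uint32_t *)0xE000ED2Cu;
 */
static fault_kind_t classify_fault(const fault_snapshot_t *snap)
{
    assert(snap != NULL);
    if (snap->cfsr & CFSR_INVSTATE)   return FAULT_INVSTATE;
    if (snap->cfsr & CFSR_UNALIGNED)  return FAULT_UNALIGNED;
    if (snap->cfsr & CFSR_UNDEFINSTR) return FAULT_UNDEFINSTR;
    if (snap->cfsr & CFSR_PRECISERR)  return FAULT_PRECISERR;
    if (snap->cfsr == 0 && (snap->hfsr & HFSR_FORCED) == 0) return FAULT_NONE;
    return FAULT_OTHER;
}
```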
If that is all your code, I suppose your change of SP caused the fault or something like that.
You should save your SP before changing it and restore it after you have used it.
ldr r6, =registerbackup
str sp, [r6]
#your code
...
ldr r6, =registerbackup
ldr sp, [r6]

Interpreting jump tables / branch tables

I've been slowly picking things up with assembly. I am working on a Canon Rebel T1i, here is a small snippet of a code flow chart that I am trying to understand. To my knowledge, I believe the camera has a 132MHz ARM v5 processor:
http://i.imgur.com/PtWC9.png
I have searched the bottom of google attempting to understand how jump tables work, and no matter how much I read I just can't connect things together to understand it. I understand a jump table is similar to a case statement, but I don't understand just how it moves through the table.
Ex: in this example there is only one CMP operation, so I don't understand how exactly this is working. Any help will be greatly appreciated!!
I don't think there is enough info in the screenshot to understand how it connects to your question. But a jump table in general...
In C, think of an array of function pointers. You have initialized each element in the array, and at some point later your code makes some decision and uses an index to choose one of those functions. As you mentioned, a case statement could be implemented that way, but that would be the exception, not the rule; it all depends on the variable being used in the switch and the size/width/nature of the elements in the case statement.
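A minimal sketch of that idea in C. The four handler names are placeholders, and last_called exists only so the effect of a dispatch can be observed.

```c
#include <assert.h>
#include <stddef.h>

static int last_called = -1;  /* records which handler ran, for illustration */

static void ted(void) { last_called = 0; }
static void joe(void) { last_called = 1; }
static void jim(void) { last_called = 2; }
static void tim(void) { last_called = 3; }

typedef void (*handler_t)(void);

/* The "jump table": one code address per index. */
static const handler_t handlers[4] = { ted, joe, jim, tim };

/* Picks a handler by index and calls it. Masking with 3 keeps the
 * index inside the four-entry table. */
static void dispatch(unsigned bob)
{
    handler_t target = handlers[bob & 3u];
    assert(target != NULL);
    target();
}
```

For example, dispatch(2) calls jim, and dispatch(7) calls tim because 7 & 3 is 3.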
You have been picking up assembly, so you understand registers, doing math with registers, storing things in registers, etc. The program counter can be used by many instructions as just another register, the difference is when you write something to it, you change what instruction is executed next.
Let's try a case statement example:
switch(bob&3)
{
case 0: ted(); break;
case 1: joe(); break;
case 2: jim(); bob=2; break;
case 3: tim(); bob=7; break;
}
What you COULD (probably would not) do is:
casetable:
.word a
.word b
.word c
.word d
caseentry:
ldr r1,=bob
ldr r0,[r1]
ldr r2,=casetable
and r0,#3
ldr pc,[r2,r0,lsl #2]
a:
bl ted
b caseend
b:
bl joe
b caseend
c:
bl jim
mov r0,#2
ldr r1,=bob
str r0,[r1]
b caseend
d:
bl tim
mov r0,#7
ldr r1,=bob
str r0,[r1]
b caseend
caseend:
So the four words after the label casetable: are the addresses where the code starts for each of the cases, case0 starts at a: case1 code starts at b: and so on. What we need to do is take the variable used by the switch statement and mathematically compute an address for the item in the table. Then we need to load the address from the table into the program counter. Writing to the program counter is the same as performing a jump.
So the C sample was crafted intentionally to make this easy. First load the contents of the bob variable into r0. AND it with 3. The items in the jump table are 32-bit addresses, or 4 bytes, so we need to multiply r0 by 4 to get the offset in the table. A shift left of 2 is the same as a multiply by 4. And we need to add r0<<2 to the base address of the jump table. So essentially we are computing address_of(casetable)+((bob&3)<<2). Then read memory at that computed address and load that value into the program counter.
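The address arithmetic can be traced in C with a table of plain numbers standing in for the four .word entries. The table values below are made-up addresses and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the four .word entries; the values are made-up addresses. */
static const uint32_t casetable[4] = { 0x1000u, 0x1010u, 0x1020u, 0x1038u };

/* Byte offset of the selected entry: (bob & 3) << 2. */
static uint32_t entry_offset(uint32_t bob)
{
    return (bob & 3u) << 2;
}

/* Value that would end up in pc: the word stored at
 * address_of(casetable) + ((bob & 3) << 2). */
static uint32_t jump_target(uint32_t bob)
{
    uint32_t offset = entry_offset(bob);
    uint32_t target;
    assert(offset < sizeof casetable);
    memcpy(&target, (const uint8_t *)casetable + offset, sizeof target);
    return target;
}
```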
With arm (you mentioned this was arm) you can do much of this in one instruction:
ldr pc,[r2,r0,lsl #2]
Load into the register pc, the contents of the memory location [r2+(r0<<2)]. r2 is the address of casetable, and r0 is bob&3.
Basically a jump table boils down to mathematically computing an offset into a table of addresses. The table of addresses are addresses you want to jump/branch to depending on one of the parameters used in the math operation; in my example above bob is that variable. And the addresses a, b, c, d are the address choices I want to pick from based on the contents of bob. There are a zillion fun and interesting ways to do this sort of thing, but it all boils down to computing at runtime the address to branch to, and shoving that address into the program counter in a way that causes the particular processor to perform what is essentially a jump.
Note another, perhaps easier to read way to compute and jump in my example would be:
mov r3,r0,lsl #2
add r3,r3,r2
ldr r3,[r3]
bx r3
The cores that support thumb use the bx instruction with a register often, normally you see bx lr to return from a branch link (subroutine) call. bx lr means pc = lr. bx r3 means pc = r3.
I hope this is what you were asking about, if I have misunderstood the question, please elaborate.
EDIT:
Looking at the code on your screen shot.
cmp r0,#4
addls pc,pc,r0,lsl #2
The conditional add (ADDLS, add if lower or same) computes the new program counter value (a jump table is a computation stored in the program counter) based on the program counter itself plus an offset of r0 times 4. For ARM processors, at the time of execution, the program counter is two instructions ahead. So, mixing those two lines of code and a portion of my example:
cmp r0,#4
addls pc,pc,r0,lsl #2
ldr pc,=a
ldr pc,=b
ldr pc,=c
ldr pc,=d
...
At the time addls is executed the program counter contains the address of the ldr pc,=b instruction. So if r0 contains a 0 then 0<<2 = 0, and pc plus 0 would branch to the ldr pc,=b instruction; then that instruction causes a branch to the b: label. If r0 contained a 1 at the time of addls then you would execute the ldr pc,=c instruction next, and so on. You can make a table as deep as you want this way. Also note that since the add is conditional, if the condition does not happen you will execute that first instruction after the addls, so maybe you want that to be an unconditional branch to branch over the table, or branch backward and loop, or maybe it is a nop so that you fall into the first jump, or what I did above is have it branch to some other place. So to understand what is going on you need to examine the instructions that follow the addls to figure out what the possible jump table destinations are.
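The address of the instruction that runs after the addls can be worked out with a small model. This sketch assumes ARM (not Thumb) state, where pc reads as the address of the current instruction plus 8; the function name and the example addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the next instruction executed after
 *   cmp   r0, #4
 *   addls pc, pc, r0, lsl #2
 * where addls_addr is the address of the addls instruction. */
static uint32_t next_pc_after_addls(uint32_t addls_addr, uint32_t r0)
{
    assert((addls_addr & 3u) == 0);  /* ARM instructions are word aligned */
    if (r0 <= 4u) {
        /* LS condition true: pc reads as addls_addr + 8 */
        return addls_addr + 8u + (r0 << 2);
    }
    /* condition failed: fall through to the following instruction */
    return addls_addr + 4u;
}
```

With the addls at 0x100, r0 = 0 continues at 0x108 (the second instruction after the addls), r0 = 1 continues at 0x10C, and any r0 above 4 falls through to 0x104.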

Resources