lpc 1768 Secondary Boot Loader error - arm

I am working on lpc 1768 SBL which includes the following code to jump to user application.
#define NVIC_VectTab_FLASH (0x00000000)
#define USER_FLASH_START (0x00002000)
void NVIC_SetVectorTable(DWORD NVIC_VectTab, DWORD Offset)
{
    NVIC_VECT_TABLE = NVIC_VectTab | (Offset & 0x1FFFFF80);
}
void execute_user_code(void)
{
    void (*user_code_entry)(void);
    /* Change the Vector Table to the USER_FLASH_START
       in case the user application uses interrupts */
    NVIC_SetVectorTable(NVIC_VectTab_FLASH, USER_FLASH_START);
    user_code_entry = (void (*)(void))((USER_FLASH_START)+1);
    user_code_entry();
}
It was working without any errors. After adding some heap memory to the code, the machine gets stuck. I tried different values for the heap size, and some of them work. After some deeper debugging, I found that the machine does not get stuck when the value at the first location of the application bin file is divisible by 64.
For example:
When I select a heap size of 0x00002E90, it generates a stack base of 0x10005240. Stack base + stack size (0x2900) then gives 0x10007B40.
I found that this value is loaded at the first location of the application bin file. It is divisible by 64, and the code runs without getting stuck.
But when I select a heap size of 0x00002E88, it generates a stack base of 0x10005238. Stack base + stack size (0x2900) then gives 0x10007B38.
This value is not divisible by 64, and the code gets stuck.
The disassembly is as follows in this case.
When stepping from address 0x0000 2000, it goes to the hard fault handler. In the earlier case it doesn't go to hard fault; it continues and works.
I cannot understand the DCW instruction or why it goes to hard fault.
Can anyone tell me the reason behind this?

Executing the vector table is what you do on older ARM7/ARM9 parts (or bigger Cortex-A ones), where the vectors are instructions and the first entry is a jump to the reset handler. On Cortex-M, the vector table is pure data: the first entry is your initial stack pointer and the second entry is the address of the reset handler, so trying to execute it is liable to go horribly wrong.
As it happens, in this case you can actually get away with executing most of that vector table by sheer chance, because the memory layout leads to each halfword of the flash addresses becoming fairly innocuous instructions:
2: 1000 asrs r0, r0, #32
4: 20d9 movs r0, #217 ; 0xd9
6: 0000 movs r0, r0
8: 20f5 movs r0, #245 ; 0xf5
a: 0000 movs r0, r0
...
Until you eventually bumble through all the remaining NOPs to 0x20d8 where you pick up the real entry point. However, the killer is that initial stack pointer, because thanks to the RAM being higher up, you get this:
0: 7b38 ldrb r0, [r7, #12]
The lower byte of 0x7bxx is where the base register is encoded, so by varying the address you have a crapshoot as to which register that is, and furthermore whether whatever junk value is left in there also happens to be a valid address to load from. Do you feel lucky?
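To make the encoding argument concrete, here is a small host-side sketch in C that pulls apart the 16-bit Thumb LDRB (immediate) encoding. The struct and function names are made up for illustration. The field layout (opcode in bits 15:11, offset in bits 10:6, base register in bits 5:3, destination in bits 2:0) is the standard Thumb one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 16-bit Thumb "LDRB Rt, [Rn, #imm]" instruction. */
struct thumb_ldrb {
    unsigned rt;   /* destination register, bits 2:0 */
    unsigned rn;   /* base register, bits 5:3 */
    unsigned imm;  /* byte offset, bits 10:6 */
};

/* Returns true and fills *out if hw is LDRB (immediate),
   i.e. bits 15:11 are 0b01111. Illustrative helper only. */
static inline bool decode_thumb_ldrb(uint16_t hw, struct thumb_ldrb *out)
{
    if ((hw >> 11) != 0x0F)
        return false;
    out->rt  = hw & 0x7;
    out->rn  = (hw >> 3) & 0x7;
    out->imm = (hw >> 6) & 0x1F;
    return true;
}
```

Feeding it 0x7B38 gives rt = 0, rn = 7, imm = 12, which is the ldrb r0, [r7, #12] shown above. Feeding it 0x7B40 gives rt = 0, rn = 0, imm = 13: a value divisible by 64 has zeros in bits 5:0, so the base register becomes r0, which presumably happened to hold a readable address in the builds that worked.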
Anyway, in summary: Rather than call the address of the vector table directly, you need to load the second word from it, then call whatever address that contains.
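A minimal sketch of that in C, assuming the application's vector table sits at USER_FLASH_START as in the question. The helper names (vt_initial_sp, vt_reset_handler) are hypothetical, and the stack pointer and vector table relocation steps are left as a comment because they are target-specific.

```c
#include <stdint.h>

#define USER_FLASH_START 0x00002000u

typedef void (*entry_fn)(void);

/* Word 0 of a Cortex-M vector table: initial main stack pointer. */
static inline uint32_t vt_initial_sp(const uint32_t *vt) { return vt[0]; }

/* Word 1: address of the reset handler (bit 0 is already set for Thumb,
   so there is no need to add 1). */
static inline uint32_t vt_reset_handler(const uint32_t *vt) { return vt[1]; }

void execute_user_code(void)
{
    const uint32_t *vt = (const uint32_t *)(uintptr_t)USER_FLASH_START;
    entry_fn entry = (entry_fn)(uintptr_t)vt_reset_handler(vt);

    /* Target-specific steps omitted here: relocate the vector table as the
       original code does, and load the main stack pointer with
       vt_initial_sp(vt) before jumping. */
    entry();
}
```

With the table from the disassembly above (0x10007B38 in word 0, 0x000020D9 in word 1), the call goes straight to the real entry point at 0x20d8 instead of executing the table itself.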

Related

Volatile variable not updated despite unoptimized assembly

I'm working on a dual-core Cortex-R52 ARM chip, with an instance of FreeRTOS running in each core (AMP), and using ICCARM (IAR) as my compiler.
I need to ensure that CPU1 initializes some tasks so that it can pass their handles to CPU0 through shared memory. However, both cores start executing at the same time, which creates a problem when CPU0 tries to use a handle that CPU1 has not created yet.
One solution I tried was to create a volatile variable pdSTART at a dedicated address, which keeps CPU0 looping as long as it's equal to 0:
#pragma location = 0x100F900C
__no_init volatile uint8_t pdSTART;
while (pdSTART == 0)
{
vTaskDelay(10 / portTICK_PERIOD_MS);
}
As expected the generated assembly was as follows:
vTaskDelay(10 / portTICK_PERIOD_MS);
0xc3a: 0x200a MOVS R0, #10 ; 0xa
0xc3c: 0xf000 0xf93c BL vTaskDelay ; 0xeb8
while (pdSTART == 0)
0xc40: 0x7b28 LDRB R0, [R5, #0xc]
0xc42: 0x2800 CMP R0, #0
0xc44: 0xd0f9 BEQ.N 0xc3a
With register R5 containing the address 0x100F9000.
Using the debugger I made sure CPU0 reaches the while condition first and gets in the loop, I then made CPU1 change the value of pdSTART, which I confirmed on the memory map
pdSTART:
0x100f'900c: 0x0000'0001 DC32 VECTOR_RBLOCK$$Base
And yet the condition on CPU0 remains false and pdSTART is never updated, both the memory map and "Watch" window of the debugger show the variable updated.
I tried explicitly writing a read from the address of pdSTART:
void func(void)
{
asm volatile ("" : : "r" (*(uint8_t *)0x100F900C));
}
But the generated assembly was the same as the while condition.
Is the old value of pdSTART saved in some kind of stack or cache? Is there a way to force it to update?
Thank you.

LDR Rd,-Label vs LDR Rd,[PC+Offset]

I am new to IAR and Embedded Programming. I was debugging the following C code, and found that R0 gets to hold the address of counter1 through ??main_0, while R1 gets to hold address of counter2 through [PC,#0x20]. This is completely understandable, but I cannot get why it was assigned to R0 to use LDR Rd, -label while R1 used LDR Rd, [PC+Offset] and what is the difference between the two approaches?
I only knew about literal pools after searching but It didn't answer my question. In addition, where did ??main_0 get defined in the first place?
int counter1=1;
int counter2=1;
int main()
{
int *ptr;
int *ptr2;
ptr=&counter1;
ptr2=&counter2;
++(*ptr);
++(*ptr2);
++counter2;
return 0;
}
??main_0 is not "defined" as such, it's just an auto-generated label for the address used here so that when reading the disassembly you don't have to remember that address 0x8c is that counter pointer. In fact it would make sense to have the other counter pointer as ??main_1 and I'm not sure why it shows the bare [PC, #0x20] instead. As you can see on page 144/145 of the IAR assembly reference, those two forms are just different interpretations of the same machine code. If the disassembler decides to assign a label to an address, it can show the label form, otherwise the offset form.
The machine code of the first instruction is 48 07, which means LDR.N R0, [PC, #0x1C]. The interpretation as ??main_0 (and the assignment of a label ??main_0 to address 0x8c in the first place) is just something the disassembler decided to do. You cannot know what the original assembly source (if it even exists and the compiler didn't directly compile to machine code) looked like and whether it used a label there or not.
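As a rough illustration of how the two forms relate, this C sketch decodes the 16-bit Thumb "LDR Rt, [PC, #imm]" encoding and computes the address it loads from. The function names are invented for this example, and the instruction address 0x6C used below is an assumption, since the listing itself isn't shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode 16-bit Thumb "LDR Rt, [PC, #imm8*4]" (bits 15:11 are 0b01001). */
static inline bool decode_thumb_ldr_literal(uint16_t hw, unsigned *rt,
                                            uint32_t *offset)
{
    if ((hw >> 11) != 0x09)
        return false;
    *rt = (hw >> 8) & 0x7;                 /* destination register */
    *offset = (uint32_t)(hw & 0xFF) * 4u;  /* imm8 is counted in words */
    return true;
}

/* Address the literal is loaded from: the PC reads as the instruction
   address plus 4, rounded down to a multiple of 4. */
static inline uint32_t literal_address(uint32_t insn_addr, uint32_t offset)
{
    return ((insn_addr + 4u) & ~3u) + offset;
}
```

Decoding 0x4807 yields R0 and an offset of 0x1C. If the instruction were at 0x6C, literal_address would return 0x8C, the address the disassembler chose to label ??main_0.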

Precise delays on Arduino using nop assembly?

I'm looking to make a very short pulse after a rising edge signal input.
The hard part here is that I would like to control (to high resolution) the timing of the delay before my pulse, and the duration of my pulse. I can easily control this by just stringing together nops by myself, hard coding delays, but I'm not sure how to do it for some arbitrary delay, with the same level of accuracy.
After a lot of headaches chasing down timers, and then eventually realizing I am ultimately limited by the interrupt routine entry/exit time, I am now settling at trying to control my delay via nops.
I had assumed this C switch statement would be what I wanted (after compiling, hoping it would become efficient and just change the program counter to the right spot), but it produces some very odd behavior...
switch(delayTime){
case 10:
__asm__ __volatile__("nop");
case 9:
__asm__ __volatile__("nop");
case 8:
__asm__ __volatile__("nop");
case 7:
__asm__ __volatile__("nop");
case 6:
__asm__ __volatile__("nop");
case 5:
__asm__ __volatile__("nop");
case 4:
__asm__ __volatile__("nop");
case 3:
__asm__ __volatile__("nop");
case 2:
__asm__ __volatile__("nop");
case 1:
__asm__ __volatile__("nop");
}
PORTD = 0x10;
...
Ideally, I would like to essentially run through some code that would compile into this: (it's some weird pseudocode of C and assembly, still not sure how to do some of it in assembly)
0x005 Reg1 = 0xFF-val1 %(where somehow 0xFF is known? / found out?)
0x006 Reg2 =0x1FF-val2
0x007 IJMP Reg1
0x008 NOP
0x009 NOP
0x00A NOP
...
0x0FF MOV 0x40, PORTD % assign the value 0x40 to the static variable "PORTD"
0x100 IJMP Reg2
0x101 NOP
0x102 NOP
0x103 NOP
0x104 NOP
...
0x1FF MOV 0x00, PORTD % assign the value 0x00 to the static variable "PORTD"
I'm just overall not sure how to find the memory location for the code after/during run time so that the "0xFF" and "0x1FF" aspects of this program are not really so bad (it seems like it's super dangerous to just, get the assembly of the code, and then hard code that in... I'd rather not do that). Also, while it's easy to just flood it with the 200+ nops, how to get the IJMP cmd to behave the way I want it to? (I honestly don't even know if that's the command I want)..
I guess in general I'm looking for some assembly command (that I can't seem to find) that allows me to "add N to Program Counter" and I can just make sure that that command is run in assembly with at least N+1 commands of assembly ahead of it, hardcoded in.
As a side note, all of this is executing inside an interrupt routine, so I don't feel so bad about playing around with the PC. Also, I know it's kinda bad to block for up to 500 operations, but for the task at hand, timing is more important than how badly the routine blocks.
I'm not familiar with the AVR instruction set, but the general idea is to use the CALL instruction to put the program counter (PC) on the stack. Then use POP to move the PC to the Z register. Then you can ADD some number to the Z register, and use IJMP to jump to the resulting address.
So something along these lines
delay:  call delay1     ; push the PC onto the stack
delay1: pop  r30        ; pop the PC into the Z registers
        pop  r31
        add  r30,r0     ; add some amount to the PC value
        adc  r31,r1     ; propagate the carry into the high byte
        ijmp            ; use IJMP to jump to the resulting address
        nop
        nop
        nop
        ...
Random thoughts:
On the 8MB machines, you need a third pop to remove the third byte of the PC from the stack.
Z is only sixteen bits, so this code must be in the first 128KB of program memory.
I'm not sure which register (r30 or r31) is supposed to be popped first.
The value added to Z must be relative to delay1, since call is going to push the address of delay1 onto the stack. In other words, the minimum amount that needs to be added is 5, since that's the number of instruction words from delay1 to the first nop (pop, pop, add, adc, ijmp).
The minimum delay is determined by the six instructions up to and including the ijmp. You should increase r1/r0 (reduce the number of nops) accordingly.
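The offset arithmetic can be sketched in C as below. This is only the calculation of the value to add to Z, not the jump itself. The function name and its parameters are hypothetical, and the number of instruction words between delay1 and the first nop is passed in rather than hard-coded, so check it against your own assembler listing.

```c
#include <stdint.h>

/*
 * prologue_words: instruction words between delay1 and the first nop
 * sled_len:       number of nops in the sled
 * nops_wanted:    how many of them should actually execute
 *
 * Returns the word offset to add to the popped return address so that
 * exactly nops_wanted nops run before falling out of the sled.
 */
static inline uint16_t z_offset_for_delay(uint16_t prologue_words,
                                          uint16_t sled_len,
                                          uint16_t nops_wanted)
{
    if (nops_wanted > sled_len)
        nops_wanted = sled_len;   /* cannot run more nops than exist */
    return (uint16_t)(prologue_words + (sled_len - nops_wanted));
}
```

A longer delay means a smaller offset: asking for every nop in the sled returns just prologue_words, and asking for none returns prologue_words + sled_len, which lands on the first instruction after the sled.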
Like I said, I'm no expert on the AVR instruction set, so you should take this as a general suggestion, and be prepared to spend some time working out the particulars. Good luck!

Using #defined values before RAM has been initialised

I am writing the boot-up code for an ARM CPU. There is no internal RAM, but there is 1GB of DDRAM connected to the CPU, which is not directly accessible before initialisation. The code is stored in flash, initialises RAM, then copies itself and the data segment to RAM and continue execution there. My program is:
#define REG_BASE_BOOTUP 0xD0000000
#define INTER_REGS_BASE REG_BASE_BOOTUP
#define SDRAM_FTDLL_REG_DEFAULT_LEFT 0x887000
#define DRAM_BASE 0x0
#define SDRAM_FTDLL_CONFIG_LEFT_REG (DRAM_BASE+ 0x1484)
... //a lot of registers
void sdram_init() __attribute__((section(".text_sdram_init")));
void ram_init()
{
static volatile unsigned int* const sdram_ftdll_config_left_reg = (unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
... //a lot of registers assignments
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
At the moment my program is not working correctly because the register values end up being linked to RAM, and at the moment the program tries to access them only the flash is usable.
How could I change my linker script or my program so that those values have their address in flash? Is there a way I can have those values in the text segment?
And actually are those defined values global or static data when they are declared at file scope?
Edit:
The object file is linked with the following linker script:
MEMORY
{
RAM (rw) : ORIGIN = 0x00001000, LENGTH = 12M-4K
ROM (rx) : ORIGIN = 0x007f1000, LENGTH = 60K
VECTOR (rx) : ORIGIN = 0x007f0000, LENGTH = 4K
}
SECTIONS
{
.startup :
{
KEEP(*(.text.vectors))
sdram_init.o(.sdram_init)
} > VECTOR
...
}
Disassembly from the register assignment:
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
7f0068: e59f3204 ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: e5932000 ldr r2, [r3]
7f0070: e59f3200 ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: e5823000 str r3, [r2]
...
7f0274: 007f2304 .word 0x007f2304
7f0278: 00887000 .word 0x00887000
To answer your question directly -- #defined values are not stored in the program anywhere (besides possibly in debug sections). Macros are expanded at compile time as if you'd typed them out in the function, something like:
*((unsigned int *) 0xd0010000) = 0x800f800f;
The values do end up in the text segment, as part of your compiled code.
What's much more likely here is that there's something else you're doing wrong. Off the top of my head, my first guess would be that your stack isn't initialized properly, or is located in a memory region that isn't available yet.
There are a few options to solve this problem.
Use PC relative data access.
Use a custom linker script.
Use assembler.
Use PC relative data access
The trouble with this method is that you must know the details of how the compiler will generate code. A plain macro such as #define register1 (volatile unsigned int *)0xd0010000UL is not the issue; the problem is that your pointer is being stored as a static variable, which is loaded from its linked SDRAM address.
7f0068: ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: ldr r2, [r3] ; !! This is a problem !!
7f0070: ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: str r3, [r2]
...
7f0274: .word 0x007f2304 ; !! This memory doesn't exist.
7f0278: .word 0x00887000
You must do this,
void ram_init()
{
    /* NO 'static', you can not do that. */
    /* static */ volatile unsigned int* const sdram_reg =
        (unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
    *sdram_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
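Another way to avoid any pointer object at all is to go through a macro, so the address is a compile-time constant. A minimal sketch, assuming a 32-bit register: REG32, reg_write32 and FTDLL_CONFIG_LEFT_ADDR are names invented here, and the address is just INTER_REGS_BASE plus the 0x1484 offset from the question.

```c
#include <stdint.h>

/* Dereference a 32-bit memory-mapped register at a given address. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

#define INTER_REGS_BASE               0xD0000000u
#define FTDLL_CONFIG_LEFT_ADDR        (INTER_REGS_BASE + 0x1484u)
#define SDRAM_FTDLL_REG_DEFAULT_LEFT  0x00887000u

/* No static storage: the address and value are plain constants that the
   compiler places in the instruction stream or a nearby literal pool. */
static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    REG32(addr) = value;
}

/* On the target this would be called as:
   reg_write32(FTDLL_CONFIG_LEFT_ADDR, SDRAM_FTDLL_REG_DEFAULT_LEFT); */
```

Whether the generated code is fully PC relative still depends on the compiler and its options, so it is worth checking the disassembly as you did above.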
Or you may prefer to implement this in assembler, as it is pretty obscure what you can and can't do here. The main effect of the above C code is that everything is either calculated or PC relative. If you opt not to use a linker script, this must be the case. As Duskwuff points out, you can also have stack issues. If you have no ETB memory, etc., that you can use as a temporary stack, then it is probably best to code this in assembler.
Linker script
See gnu linker map... and many other questions on using a linker script in this case. If you want specifics, you need to give the actual addresses used by the processor. With this option you can annotate your function to specify which section it will live in. For instance,
void ram_init() __attribute__((section("FLASH")));
In this case, you would use the GNU linker's MEMORY and AT statements to put this code at the flash address you want it to run from.
Use assembler
Assembler gives you full control over memory use. You can guarantee that no stack is used and that no non-PC-relative code is generated, and it will probably be faster to boot. Here is some table-driven ARM assembler I have used for the case you describe, initializing an SDRAM controller.
/* Macro for table of register writes. */
.macro DCDGEN,type,addr,data
.long \type
.long \addr
.long \data
.endm
.set FTDLL_CONFIG_LEFT, 0xD0001484
sdram_init:
DCDGEN 4, FTDLL_CONFIG_LEFT, 0x887000
1:
init_sdram_bank:
adr r0,sdram_init
adr r1,1b
1:
/* Delay. */
mov r5,#0x100
2: subs r5,r5,#1
bne 2b
ldmia r0!, {r2,r3,r4} /* Load DCD entry. */
cmp r2,#1 /* byte? */
streqb r4,[r3] /* Store byte... */
strne r4,[r3] /* Store word. */
cmp r0,r1 /* table done? */
blo 1b
bx lr
/* Dump literal pool. */
.ltorg
Assembler has many benefits. You can also clear the bss section and set up the stack with simple routines. There are many on the Internet, and I think you can probably code one yourself. The GNU ld script is also beneficial with assembler, as you can ensure that sections like bss are aligned and a multiple of 4, 8, etc., so that the clearing routine doesn't need special cases. Also, you will have to copy the code from flash to SDRAM after it is initialized. This is a fairly expensive, long-running task, and you can speed it up with some short assembler.
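For comparison, the table-driven loop from the assembler listing could look roughly like this in C. This is a sketch only: the struct and function names are made up, the per-entry delay is left out, and whether it is usable before SDRAM is up depends on having a working stack and on the table being linked into flash.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the register-write table (mirrors the DCDGEN macro). */
struct reg_write {
    uint32_t  width;  /* 1 = byte store, anything else = 32-bit store */
    uintptr_t addr;   /* register address */
    uint32_t  data;   /* value to write */
};

/* Walk the table and perform each write in order. */
static inline void apply_reg_table(const struct reg_write *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].width == 1)
            *(volatile uint8_t *)table[i].addr = (uint8_t)table[i].data;
        else
            *(volatile uint32_t *)table[i].addr = table[i].data;
    }
}
```

An entry equivalent to the DCDGEN line above would be { 4, 0xD0001484, 0x887000 }.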

Which is the first address of ARM DA(Decrement After) addressing mode?

I have two questions about DA addressing mode. For example:
STMDA R0!, {R1-R7}
The start address will be R0 - (7 * 4) + 4, that is, R0-24, according to the ARM Architecture Reference Manual, and end_address will be R0.
So:
Will the value of R1 be stored at R0-24 or at R0?
If R1 is stored at R0-24, will subsequent stores then grow towards the top of memory (from R0-24 to R0)?
When using ARM multiple stores and loads, register values are always loaded/stored in ascending order in memory. So, when using a descending multiple store, the registers are written into memory backwards. Your STMDA instruction effectively breaks down into the following steps:
store R7 at R0
store R6 at R0 - 4
store R5 at R0 - 8
store R4 at R0 - 12
store R3 at R0 - 16
store R2 at R0 - 20
store R1 at R0 - 24
subtract 28 from R0 (because of writeback - the !).
So, to answer your questions:
The value of R1 will be stored at R0 - 24. (Here, I mean the value of R0 before executing the instruction, not afterwards. You're using writeback - the ! - so after the instruction, R0 will have had 28 subtracted from it.)
R1 is stored at R0 - 24, but as explained above, R1 is the last register to have its value stored in memory. R7 is stored first, and subsequent stores from there grow downwards in memory.
I have to admit I don't know of any documentation that supports this answer. Also, it's been a while since I last did any ARM coding. However, I definitely remember wondering how the ARM stores registers in a descending multiple store. I figured this out by writing a short program to find out.
Search for "ARM ARM", the ARM Architecture Reference Manual...
The first address formed is the <start_address>, and is the value of the base register minus four times the number of registers specified in <registers>, plus 4. Subsequent addresses are formed by incrementing the previous address by four. One address is produced for each register that is specified in <registers>.
Part of the pseudocode is shown below:
address = start_address
for i = 0 to 15
    if register_list[i] == 1 then
        Memory[address,4] = Ri
        address = address + 4
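The pseudocode can be turned into a small host-side simulation in C to check where each register ends up. This is a sketch under the definitions quoted above; the struct and function names are invented for illustration, and it only computes addresses without modelling memory.

```c
#include <stdint.h>

/* Result of simulating STMDA Rn!, {register list}. */
struct stmda_result {
    uint32_t addr[16];       /* store address of Ri, if bit i of the list is set */
    uint32_t start_address;  /* base - 4 * count + 4 */
    uint32_t end_address;    /* base */
    uint32_t writeback;      /* value of Rn after the instruction */
};

static inline struct stmda_result simulate_stmda(uint32_t base,
                                                 uint16_t register_list)
{
    struct stmda_result r = {0};
    uint32_t count = 0;

    /* Count the registers in the list. */
    for (int i = 0; i < 16; i++)
        count += (register_list >> i) & 1u;

    r.start_address = base - 4u * count + 4u;
    r.end_address = base;
    r.writeback = base - 4u * count;

    /* Lowest-numbered register goes to the lowest address. */
    uint32_t address = r.start_address;
    for (int i = 0; i < 16; i++) {
        if ((register_list >> i) & 1u) {
            r.addr[i] = address;
            address += 4u;
        }
    }
    return r;
}
```

For STMDA R0!, {R1-R7} with R0 = 0x1000 (register list 0x00FE), this gives R1 at 0x0FE8 (R0 - 24), R7 at 0x1000 (R0), and R0 = 0x0FE4 after writeback.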
It seems that the order in which STM stores data has nothing to do with the addressing mode? It always stores data from the lower address to the higher one, and the addressing mode only decides the start address based on R0?

Resources