LDR Rd,-Label vs LDR Rd,[PC+Offset] - c

I am new to IAR and Embedded Programming. I was debugging the following C code, and found that R0 gets to hold the address of counter1 through ??main_0, while R1 gets to hold address of counter2 through [PC,#0x20]. This is completely understandable, but I cannot get why it was assigned to R0 to use LDR Rd, -label while R1 used LDR Rd, [PC+Offset] and what is the difference between the two approaches?
I only knew about literal pools after searching but It didn't answer my question. In addition, where did ??main_0 get defined in the first place?
int counter1=1;
int counter2=1;
int main()
{
int *ptr;
int *ptr2;
ptr=&counter1;
ptr2=&counter2;
++(*ptr);
++(*ptr2);
++counter2;
return 0;
}

??main_0 is not "defined" as such, it's just an auto-generated label for the address used here so that when reading the disassembly you don't have to remember that address 0x8c is that counter pointer. In fact it would make sense to have the other counter pointer as ??main_1 and I'm not sure why it shows the bare [PC, #0x20] instead. As you can see on page 144/145 of the IAR assembly reference, those two forms are just different interpretations of the same machine code. If the disassembler decides to assign a label to an address, it can show the label form, otherwise the offset form.
The machine code of the first instruction is 48 07, which means LDR.N R0, [PC, #0x1C]. The interpretation as ??main_0 (and the assignment of a label ??main_0 to address 0x8c in the first place) is just something the disassembler decided to do. You cannot know what the original assembly source (if it even exists and the compiler didn't directly compile to machine code) looked like and whether it used a label there or not.

Related

Code execution exploit Cortex M4

For testing the MPU and playing around with exploits, I want to execute code from a local buffer running on my STM32F4 dev board.
int main(void)
{
uint16_t func[] = { 0x0301f103, 0x0301f103, 0x0301f103 };
MPU->CTRL = 0;
unsigned int address = (void*)&func+1;
asm volatile(
"mov r4,%0\n"
"ldr pc, [r4]\n"
:
: "r"(address)
);
while(1);
}
In main, I first turn of the MPU. In func my instructions are stored. In the ASM part I load the address (0x2001ffe8 +1 for thumb) into the program counter register. When stepping through the code with GDB, in R4 the correct value is stored and then transfered to PC register. But then I will end up in the HardFault Handler.
Edit:
The stack looks like this:
0x2001ffe8: 0x0301f103 0x0301f103 0x0301f103 0x2001ffe9
The instructions are correct in the memory. Definitive Guide to Cortex says region 0x20000000–0x3FFFFFFF is the SRAM and "this region is executable,
so you can copy program code here and execute it".
You are assigning 32 bit values to a 16 bit array.
Your instructions dont terminate, they continue on to run into whatever is found in ram, so that will crash.
You are not loading the address to the array into the program counter you are loading the first item in the array into the program counter, this will crash, you created a level of indirection.
Look at the BX instruction for this rather than ldr pc
You did not declare the array as static, so the array can be optimized out as dead and unused, so this can cause it to crash.
The compiler should also complain that you are assigning a void* to an unsigned variable, so a typecast is wanted there.
As a habit I recommend address|=1 rather than +=1, in this case either will function.

lpc 1768 Secondary Boot Loader error

I am working on lpc 1768 SBL which includes the following code to jump to user application.
#define NVIC_VectTab_FLASH (0x00000000)
#define USER_FLASH_START (0x00002000)
void NVIC_SetVectorTable(DWORD NVIC_VectTab, DWORD Offset)
{
NVIC_VECT_TABLE = NVIC_VectTab | (Offset & 0x1FFFFF80);
}
void execute_user_code(void)
{
void (*user_code_entry)(void);
/* Change the Vector Table to the USER_FLASH_START
in case the user application uses interrupts */
NVIC_SetVectorTable(NVIC_VectTab_FLASH, USER_FLASH_START);
user_code_entry = (void (*)(void))((USER_FLASH_START)+1);
user_code_entry();
}
It was working without any errors. After adding some heap memory to the code, the machine is stuck. I tried out different values for heap. Some of them are working. After some deep debugging ,I could find out that machine was not stuck when there is a value which is divisible by 64 is at first locations of application bin file.
ie,
When I select heap memory as 0x00002E90 ,it generates stack base as 0x10005240 . Then stack base + stack size(0x2900) gives a value = 0x10007B40.
I found this is loaded at first locations of application bin file. This value is divisible by 64 and the code is running without stuck.
But ,when I select heap memory as 0x00002E88 ,it generates stack base as 0x10005238 . Then stack base + stack size(0x2900) gives a value = 0x10007B38.
This value is not divisible by 64 and the code is stuck.
The disassembly is as follows in this case.
When stepping from address 0x0000 2000 ,it goes to hard fault handler. But in the earlier case it doesn't go to hard fault. It continues and works as well.
I cannot understand the instruction DCW and why it goes to hard fault.
Can anyone tell me the reason behind this?
Executing the vector table is what you do on older ARM7/ARM9 parts (or bigger Cortex-A ones) where the vectors are instructions, and the first entry will be a jump to the reset handler, but on Cortex-M, the vector table is pure data - the first entry is your initial stack pointer, and the second entry is the address of the reset handler - so trying to execute it is liable to go horribly wrong..
As it happens, in this case you can actually get away with executing most of that vector table by sheer chance, because the memory layout leads to each halfword of the flash addresses becoming fairly innocuous instructions:
2: 1000 asrs r0, r0, #32
4: 20d9 movs r0, #217 ; 0xd9
6: 0000 movs r0, r0
8: 20f5 movs r0, #245 ; 0xf5
a: 0000 movs r0, r0
...
Until you eventually bumble through all the remaining NOPs to 0x20d8 where you pick up the real entry point. However, the killer is that initial stack pointer, because thanks to the RAM being higher up, you get this:
0: 7b38 ldrb r0, [r7, #12]
The lower byte of 0x7bxx is where the base register is encoded, so by varying the address you have a crapshoot as to which register that is, and furthermore whether whatever junk value is left in there also happens to be a valid address to load from. Do you feel lucky?
Anyway, in summary: Rather than call the address of the vector table directly, you need to load the second word from it, then call whatever address that contains.

Using #defined values before RAM has been initialised

I am writing the boot-up code for an ARM CPU. There is no internal RAM, but there is 1GB of DDRAM connected to the CPU, which is not directly accessible before initialisation. The code is stored in flash, initialises RAM, then copies itself and the data segment to RAM and continue execution there. My program is:
#define REG_BASE_BOOTUP 0xD0000000
#define INTER_REGS_BASE REG_BASE_BOOTUP
#define SDRAM_FTDLL_REG_DEFAULT_LEFT 0x887000
#define DRAM_BASE 0x0
#define SDRAM_FTDLL_CONFIG_LEFT_REG (DRAM_BASE+ 0x1484)
... //a lot of registers
void sdram_init() __attribute__((section(".text_sdram_init")));
void ram_init()
{
static volatile unsigned int* const sdram_ftdll_config_left_reg = (unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
... //a lot of registers assignments
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
At the moment my program is not working correctly because the register values end up being linked to RAM, and at the moment the program tries to access them only the flash is usable.
How could I change my linker script or my program so that those values have their address in flash? Is there a way I can have those values in the text segment?
And actually are those defined values global or static data when they are declared at file scope?
Edit:
The object file is linked with the following linker script:
MEMORY
{
RAM (rw) : ORIGIN = 0x00001000, LENGTH = 12M-4K
ROM (rx) : ORIGIN = 0x007f1000, LENGTH = 60K
VECTOR (rx) : ORIGIN = 0x007f0000, LENGTH = 4K
}
SECTIONS
{
.startup :
{
KEEP((.text.vectors))
sdram_init.o(.sdram_init)
} > VECTOR
...
}
Disassembly from the register assignment:
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
7f0068: e59f3204 ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: e5932000 ldr r2, [r3]
7f0070: e59f3200 ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: e5823000 str r3, [r2]
...
7f0274: 007f2304 .word 0x007f2304
7f0278: 00887000 .word 0x00887000
To answer your question directly -- #defined values are not stored in the program anywhere (besides possibly in debug sections). Macros are expanded at compile time as if you'd typed them out in the function, something like:
*((unsigned int *) 0xd0010000) = 0x800f800f;
The values do end up in the text segment, as part of your compiled code.
What's much more likely here is that there's something else you're doing wrong. Off the top of my head, my first guess would be that your stack isn't initialized properly, or is located in a memory region that isn't available yet.
There are a few options to solve this problem.
Use PC relative data access.
Use a custom linker script.
Use assembler.
Use PC relative data access
The trouble you have with this method is you must know details of how the compiler will generate code. #define register1 (volatile unsigned int *)0xd0010000UL is that this is being stored as a static variable which is loaded from the linked SDRAM address.
7f0068: ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: ldr r2, [r3] ; !! This is a problem !!
7f0070: ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: str r3, [r2]
...
7f0274: .word 0x007f2304 ; !! This memory doesn't exist.
7f0278: .word 0x00887000
You must do this,
void ram_init()
{
/* NO 'static', you can not do that. */
/* static */ volatile unsigned int* const sdram_reg =
(unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
Or you may prefer to implement this in assembler as it is probably pretty obtuse as to what you can and can't do here. The main effect of the above C code is that every thing is calculated or PC relative. If you opt not to use a linker script, this must be the case. As Duskwuff points out, you also can have stack issues. If you have no ETB memory, etc, that you can use as a temporary stack then it probably best to code this in assembler.
Linker script
See gnu linker map... and many other question on using a linker script in this case. If you want specifics, you need to give actual addresses use by the processor. With this option you can annotate your function to specify which section it will live in. For instance,
void ram_init() __attribute__((section("FLASH")));
In this case, you would use the Gnu Linkers MEMORY statement and AT statements to put this code at the flash address where you desire it to run from.
Use assembler
Assembler gives you full control over memory use. You can garentee that no stack is used, that no non-PC relative code is generated and it will probably be faster to boot. Here is some table driven ARM assembler I have used for the case you describe, initializing an SDRAM controller.
/* Macro for table of register writes. */
.macro DCDGEN,type,addr,data
.long \type
.long \addr
.long \data
.endm
.set FTDLL_CONFIG_LEFT, 0xD0001484
sdram_init:
DCDGEN 4, FTDLL_CONFIG_LEFT, 0x887000
1:
init_sdram_bank:
adr r0,sdram_init
adr r1,1b
1:
/* Delay. */
mov r5,#0x100
2: subs r5,r5,#1
bne 2b
ldmia r0!, {r2,r3,r4} /* Load DCD entry. */
cmp r2,#1 /* byte? */
streqb r4,[r3] /* Store byte... */
strne r4,[r3] /* Store word. */
cmp r0,r1 /* table done? */
blo 1b
bx lr
/* Dump literal pool. */
.ltorg
Assembler has many benefits. You can also clear the bss section and setup the stack with simple routines. There are many on the Internet and I think you can probably code one yourself. The gnu ld script is also beneficial with assembler as you can ensure that sections like bss are aligned and a multiple of 4,8,etc. so that the clearing routine doesn't need special cases. Also, you will have to copy the code from flash to SDRAM after it is initialized. This is a fairly expensive/long running task and you can speed it up with some short assembler.

Using GCC inline assembly with instructions that take immediate values

The problem
I'm working on a custom OS for an ARM Cortex-M3 processor. To interact with my kernel, user threads have to generate a SuperVisor Call (SVC) instruction (previously known as SWI, for SoftWare Interrupt). The definition of this instruction in the ARM ARM is:
Which means that the instruction requires an immediate argument, not a register value.
This is making it difficult for me to architect my interface in a readable fashion. It requires code like:
asm volatile( "svc #0");
when I'd much prefer something like
svc(SVC_YIELD);
However, I'm at a loss to construct this function, because the SVC instruciton requires an immediate argument and I can't provide that when the value is passed in through a register.
The kernel:
For background, the svc instruction is decoded in the kernel as follows
#define SVC_YIELD 0
// Other SVC codes
// Called by the SVC interrupt handler (not shown)
void handleSVC(char code)
{
switch (code) {
case SVC_YIELD:
svc_yield();
break;
// Other cases follow
This case statement is getting rapidly out of hand, but I see no way around this problem. Any suggestions are welcome.
What I've tried
SVC with a register argument
I initially considered
__attribute__((naked)) svc(char code)
{
asm volatile ("scv r0");
}
but that, of course, does not work as SVC requires a register argument.
Brute force
The brute-force attempt to solve the problem looks like:
void svc(char code)
switch (code) {
case 0:
asm volatile("svc #0");
break;
case 1:
asm volatile("svc #1");
break;
/* 253 cases omitted */
case 255:
asm volatile("svc #255");
break;
}
}
but that has a nasty code smell. Surely this can be done better.
Generating the instruction encoding on the fly
A final attempt was to generate the instruction in RAM (the rest of the code is running from read-only Flash) and then run it:
void svc(char code)
{
asm volatile (
"orr r0, 0xDF00 \n\t" // Bitwise-OR the code with the SVC encoding
"push {r1, r0} \n\t" // Store the instruction to RAM (on the stack)
"mov r0, sp \n\t" // Copy the stack pointer to an ordinary register
"add r0, #1 \n\t" // Add 1 to the address to specify THUMB mode
"bx r0 \n\t" // Branch to newly created instruction
"pop {r1, r0} \n\t" // Restore the stack
"bx lr \n\t" // Return to caller
);
}
but this just doesn't feel right either. Also, it doesn't work - There's something I'm doing wrong here; perhaps my instruction isn't properly aligned or I haven't set up the processor to allow running code from RAM at this location.
What should I do?
I have to work on that last option. But still, it feels like I ought to be able to do something like:
__attribute__((naked)) svc(char code)
{
asm volatile ("scv %1"
: /* No outputs */
: "i" (code) // Imaginary directive specifying an immediate argument
// as opposed to conventional "r"
);
}
but I'm not finding any such option in the documentation and I'm at a loss to explain how such a feature would be implemented, so it probably doesn't exist. How should I do this?
You want to use a constraint to force the operand to be allocated as an 8-bit immediate. For ARM, that is constraint I. So you want
#define SVC(code) asm volatile ("svc %0" : : "I" (code) )
See the GCC documentation for a summary of what all the constaints are -- you need to look at the processor-specific notes to see the constraints for specific platforms. In some cases, you may need to look at the .md (machine description) file for the architecture in the gcc source for full information.
There's also some good ARM-specific gcc docs here. A couple of pages down under the heading "Input and output operands" it provides a table of all the ARM constraints
What about using a macro:
#define SVC(i) asm volatile("svc #"#i)
As noted by Chris Dodd in the comments on the macro, it doesn't quite work, but this does:
#define STRINGIFY0(v) #v
#define STRINGIFY(v) STRINGIFY0(v)
#define SVC(i) asm volatile("svc #" STRINGIFY(i))
Note however that it won't work if you pass an enum value to it, only a #defined one.
Therefore, Chris' answer above is the best, as it uses an immediate value, which is what's required, for thumb instructions at least.
My solution ("Generating the instruction encoding on the fly"):
#define INSTR_CODE_SVC (0xDF00)
#define INSTR_CODE_BX_LR (0x4770)
void svc_call(uint32_t svc_num)
{
uint16_t instrs[2];
instrs[0] = (uint16_t)(INSTR_CODE_SVC | svc_num);
instrs[1] = (uint16_t)(INSTR_CODE_BX_LR);
// PC = instrs (or 1 -> thumb mode)
((void(*)(void))((uint32_t)instrs | 1))();
}
It works and its much better than switch-case variant, which takes ~2kb ROM for 256 svc's. This func does not have to be placed in RAM section, FLASH is ok.
You can use it if svc_num should be a runtime variable.
As discussed in this question, the operand of SVC is fixed, that is it should be known to the preprocessor, and it is different from immediate Data-processing operands.
The gcc manual reads
'I'- Integer that is valid as an immediate operand in a data processing instruction. That is, an integer in the range 0 to 255 rotated by a multiple of 2.
Therefore the answers here that use a macro are preferred, and the answer of Chris Dodd is not guaranteed to work, depending on the gcc version and optimization level. See the discussion of the other question.
I wrote one handler recently for my own toy OS on Cortex-M. Works if tasks use PSP pointer.
Idea:
Get interrupted process's stack pointer, get process's stacked PC, it will have the instruction address of instruction after SVC, look up the immediate value in the instruction. It's not as hard as it sounds.
uint8_t __attribute__((naked)) get_svc_code(void){
__asm volatile("MSR R0, PSP"); //Get Process Stack Pointer (We're in SVC ISR, so currently MSP in use)
__asm volatile("ADD R0, #24"); //Pointer to stacked process's PC is in R0
__asm volatile("LDR R1, [R0]"); //Instruction Address after SVC is in R1
__asm volatile("SUB R1, R1, #2"); //Subtract 2 bytes from the address of the current instruction. Now R1 contains address of SVC instruction
__asm volatile("LDRB R0, [R1]"); //Load lower byte of 16-bit instruction into R0. It's immediate value.
//Value is in R0. Function can return
}

ARM-GAS: how to load address of static array defined in some c-file (PIC and regular code)

I have some simple static array defined in c-file (const int data_input[1024];)and I need to access it from my assembly code. What's the right way to do it?
So far, I've been doing it this way:
.global data_input
data_input_ptr:
.word data_input
my_function:
adr r1, data_input_ptr
bx lr
AFAIK, adr is pseudo-op stands to ldr r1, =data_input_ptr or something like that.
To me the way I do it seems not to be very correct: first of all that adr r1, data_input might potentially use pc relative addressing directly if it detected at link time that it's possible.
Another issue is about PIC: what if the code has to be position independent. How then does it work if value of data_input_ptr has to be initialized by the loader (am I correct about that?)
The way you are doing it should work, but another way of handling it would be to use the address of the array as a second argument to the assembly function. Something like this:
Call from c-file:
my_function(original_argument, data_input);
my_function.h:
void my_function(void *original_argument, int *array_address);
my_function.S:
my_function:
/* r1 already contains data_input_ptr since second argument ends up in r1 */
bx lr

Resources