I am modifying a startup file for an ARM Cortex-M3 microcontroller. Everything works fine so far, but I have a question regarding the need of using assembler code to perform the zero-filling of the BSS block.
By default the reset interrupt in the startup file looks as follows:
// Zero fill the bss segment.
__asm( " ldr r0, =_bss\n"
" ldr r1, =_ebss\n"
" mov r2, #0\n"
" .thumb_func\n"
" zero_loop:\n"
" cmp r0, r1\n"
" it lt\n"
" strlt r2, [r0], #4\n"
" blt zero_loop"
);
Using that code everything works as expected. However if I change the previous code for the following it stops working:
// Zero fill the bss segment.
for(pui32Dest = &_bss; pui32Dest < &_ebss; )
{
*pui32Dest++ = 0;
}
In principle both codes should do the same (fill the BSS with zeros), but the second one does not work for some reason that I fail to understand. I belive that the .thumb_func directive must play a role here, but I am not very familiar with ARM assembler. Any ideas or directions to help me understanding? Thanks!
Edit: By the way, the code to initialize the data segment (e.g. copy from Flash to RAM) is as follows and works just fine.
// Copy the data segment initializers from flash to SRAM.
pui32Src = &_etext;
for(pui32Dest = &_data; pui32Dest < &_edata; )
{
*pui32Dest++ = *pui32Src++;
}
Edit: Added the dissasembled code for both functions.
Assembly for the first looks like:
2003bc: 4806 ldr r0, [pc, #24] ; (2003d8 <zero_loop+0x14>)
2003be: 4907 ldr r1, [pc, #28] ; (2003dc <zero_loop+0x18>)
2003c0: f04f 0200 mov.w r2, #0
002003c4 <zero_loop>:
2003c4: 4288 cmp r0, r1
2003c6: bfb8 it lt
2003c8: f840 2b04 strlt.w r2, [r0], #4
2003cc: dbfa blt.n 2003c4 <zero_loop>
Assembly for the second looks like:
2003bc: f645 5318 movw r3, #23832 ; 0x5d18
2003c0: f2c2 0300 movt r3, #8192 ; 0x2000
2003c4: 9300 str r3, [sp, #0]
2003c6: e004 b.n 2003d2 <ResetISR+0x6e>
2003c8: 9b00 ldr r3, [sp, #0]
2003ca: 1d1a adds r2, r3, #4
2003cc: 9200 str r2, [sp, #0]
2003ce: 2200 movs r2, #0
2003d0: 601a str r2, [r3, #0]
2003d2: 9a00 ldr r2, [sp, #0]
2003d4: f644 033c movw r3, #18492 ; 0x483c
2003d8: f2c2 0300 movt r3, #8192 ; 0x2000
2003dc: 429a cmp r2, r3
2003de: d3f3 bcc.n 2003c8 <ResetISR+0x64>
If the initial stack is in the .bss section as suggested, you can see from the disassembly why the C code fails - it's loading the current pointer from the stack, saving the incremented pointer back to the stack, zeroing the location, then reloading the incremented pointer for the next iteration. If you zero the contents of the stack while using them, Bad Things happen.
In this case, turning on optimisation might fix it (a smart compiler should generate pretty much the same as the assembly code if it actually tries). More generally, though, it's probably safer to consider sticking with assembly code when doing things like this that would normally be done at a level below the C runtime environment - bootstrapping a C environment from C code which expects that environment to exist already is risky at best, since you can only hope the code doesn't attempt to use anything that's not yet set up.
After a quick look around (I'm not overly familiar with the specifics of Cortex-M development), it seems an alternative/additional solution might be adjusting the linker script to move the stack somewhere else.
Related
this works, but I have to do it using auto-indexing and I can not figure out that part.
writeloop:
cmp r0, #10
beq writedone
ldr r1, =array1
lsl r2, r0, #2
add r2, r1, r2
str r2, [r2]
add r0, r0, #1
b writeloop
and for data I have
.balign 4
array1: skip 40
What I had tried was this, and yes I know it is probably a poor attempt but I am new to this and do not understand
ldr r1, =array1
writeloop:
cmp r0, #10
beq writedone
ldr r2, [r1], #4
str r2, [r2]
add r0, r0, #1
b writeloop
It says segmentation fault when I try this. What is wrong? What I am thinking should happen is every time it loops through, it sets the element r2 it at = to the address of itself, and then increments to the next element and does the same thing
The ARM architechures gives several different address modes.
From ARM946E-S product overview and many other sources:
Load and store instructions have three primary addressing modes
- offset
- pre-indexed
- post-indexed.
They are formed by adding or subtracting an immediate or register-based offset to or from a base register. Register-based offsets can also be scaled with shift operations. Pre-indexed and post-indexed addressing modes update the base register with the base plus offset calculation. As the PC is a general purpose register, a 32‑bit value can be loaded directly into the PC to perform a jump to any address in the 4GB memory space.
As well, they support write back or updating of the register, hence the reason for pre-indexed and post-indexed. Post-index doesn't make much sense without write back.
Now to your issue, I believe that you want to write the values 0-9 to an array of ten words (length four bytes). Assuming this, you can use indexing and update the value via add. This leads to,
mov r0, #0 ; start value
ldr r1, =array1 ; array pointer
writeloop:
cmp r0, #10
beq writedone
str r0, [r1, r0, lsl #2] ; index with r1 base by r0 scaled by *4
add r0, r0, #1
b writeloop
writedone:
; code to jump somewhere else and not execute data.
.balign 4
array1: skip 40
For interest a more efficient loop can be done by counting and writing down,
mov r0, #9 ; start value
ldr r1, =array1 ; array pointer
writeloop:
str r0, [r1, r0, lsl #2] ; index with r1 base by r0 scaled by *4
subs r0, r0, #1
bne writeloop
Your original example was writing the pointer to the array; often referred to as 'value equals address'. If this is what you want,
ldr r0, =array_end ; finished?
ldr r1, =array1 ; array pointer
write_loop:
str r1, [r1], #4 ; add four and update after storing
cmp r0, r1
bne write_loop
; code to jump somewhere else and not execute data.
.balign 4
array1: skip 40
array_end:
I am trying to copy data from LUT from one location and copying it to another location. Here is the code
AREA Program, CODE, READONLY
EXPORT __main
ENTRY
__main
ldr r0, =SourceL ; Address of SourceL
ldr r1, =DestinationL ; Address of DestinationL
ldr r2, [r0] ; r2 contains data#SourceL
str r3, [r1] ; r3 contains data#DestinationL
mov r4, #245
str r4, [r1]
SWI &11
AREA MyData, DATA, READWRITE
SourceL
DCW &1234
ALIGN
DestinationL
DCW &0
ALIGN
END
This is a very basic ARM7TDMI assembly code.
When I see the address of Labels(in debugger). SourceL is 0x40000000 and DestinationL is 0x40000004.
But when I see the memory locations, they are having zero values .
But in LUT SourceL is having value &1234.
When I try to store some data at memory representing by label DestinationL I am able to successfully do that.
This is not the only case of this above code.
The example code in the ARM website
AREA StrCopy, CODE, READONLY
EXPORT __main
ENTRY ; Mark first instruction to execute
__main
LDR r1, =srcstr ; Pointer to first string
LDR r0, =dststr ; Pointer to second string
BL strcopy ; Call subroutine to do copy
stop
MOV r0, #0x18 ; angel_SWIreason_ReportException
LDR r1, =0x20026 ; ADP_Stopped_ApplicationExit
SVC #0x123456 ; ARM semihosting (formerly SWI)
strcopy
LDRB r2, [r1],#1 ; Load byte and update address
STRB r2, [r0],#1 ; Store byte and update address
CMP r2, #0 ; Check for zero terminator
BNE strcopy ; Keep going if not
MOV pc,lr ; Return
AREA Strings, DATA, READWRITE
srcstr DCB "First string - source",0
dststr DCB "Second string - destination",0
END
The memory corresponding to srcstr is showing zero only.
I am using KEIL IDE for programming.
Why data in LUT is not shown in the memory location?
Thanks!
Your code does not copy #SourceL to #DestinationL. It only loads data into registers and does nothing with it (exactly per the comments).
ldr r0, =SourceL ; Address of SourceL
ldr r1, =DestinationL ; Address of DestinationL
ldr r2, [r0] ; r2 contains data#SourceL
str r2, [r1] ; Copy data#SourceL to DestinationL <<ADD THIS
I am trying to boot a freertos app from UEFI on Qemu
When i run the app from uboot, using the below commands it runs without any errors
fatload mmc 0 80300000 rtosdemo.bin
go 0x80300000
An uefi application loads the elf file at 0x80300000 and then I tried two options.
My boot.s file is below
`start:
_start:
_mainCRTStartup:
ldr r0, .LC6
msr CPSR_c, #MODE_UND|I_BIT|F_BIT /* Undefined Instruction */
mov sp, r0
sub r0, r0, #UND_STACK_SIZE
msr CPSR_c, #MODE_ABT|I_BIT|F_BIT /* Abort Mode */
mov sp, r0
...
`
Disassembly file
`
80300000 <_undf-0x20>:
80300000: ea001424 b 80305098 <start>
80300004: e59ff014 ldr pc, [pc, #20] ; 80300020 <_undf>
80300008: e59ff014 ldr pc, [pc, #20] ; 80300024 <_swi>
8030000c: e59ff014 ldr pc, [pc, #20] ; 80300028 <_pabt>
80300010: e59ff014 ldr pc, [pc, #20] ; 8030002c <_dabt>
...........
80305098 <start>:
80305098: e59f00f4 ldr r0, [pc, #244] ; 80305194 <endless_loop+0x18>
8030509c: e321f0db msr CPSR_c, #219 ; 0xdb
803050a0: e1a0d000 mov sp, r0
803050a4: e2400004 sub r0, r0, #4
`
use goto 0x80305098 which is the entry point addr specified in the elf file. Now it jumps to ldr r0, .. instruction but after that it just seems to be jumping some where in the middle of some function rather than stepping into msr instruction.
Since in uboot its jumping to 0x80300000, I tried by jumping to that addr, now it goes to instruction b 80305098 <start>, but after that instruction instead of jumping to 80305098 it just goes to the next instruction ldr pc, [pc, #20].
So any ideas on where I am going wrong?
EDIT:
I updated boot.s to
start:
_start:
_mainCRTStartup:
.thumb
thumb_entry_point:
blx arm_entry_point
.arm
arm_entry_point:
ldr r0, .LC6
msr CPSR_c, #MODE_UND|I_BIT|F_BIT /* Undefined Instruction Mode */
mov sp, r0
Now it works fine.
This is ARM code, but it sounds very much like it's being jumped to in Thumb state. The word e59f00f4 will be interpreted in Thumb as lsls r4, r6, #3; b 0x80304bde (if I've got my address maths right), which seems consistent with "jumping somewhere in the middle of some function". You can verify by checking bit 5 of the CPSR (assuming you're not in user mode) - if it's set, you've come in in Thumb state.
If that is the case, then the 'proper' solution probably involves making the UEFI loader application clever enough to do the right kind of interworking branch, but a quick and easy hack would be to place a shim somewhere just for the initial entry, something like:
.thumb
thumb_entry_point:
blx arm_entry_point
.arm
arm_entry_point:
b start
I need to reduce the code bloat for the Cortex-M0 microprocessor.
At startup the ROM data has to be copied to the RAM data once. Therefore I have this piece of code:
void __startup( void ){
extern unsigned int __data_init_start;
extern unsigned int __data_start;
extern unsigned int __data_end;
// copy .data section from flash to ram
s = & __data_init_start;
d = & __data_start;
e = & __data_end;
while( d != e ){
*d++ = *s++;
}
}
The assembly code that is generated by the compiler looks like this:
ldr r1, .L10+8
ldr r2, .L10+12
sub r0, r1, r2
lsr r3, r0, #2
add r3, r3, #1
lsl r1, r3, #2
mov r3, #0
.L4:
add r3, r3, #4
cmp r3, r1
beq .L9
.L5:
ldr r4, .L10+16
add r0, r2, r3
add r4, r3, r4
sub r4, r4, #4
ldr r4, [r4]
sub r0, r0, #4
str r4, [r0]
b .L4
How can I optimize this code so the code size is at minimum?
The compiler (or you!) does not realize that the range to copy is end - start. There seems to be some unnecessarily shuffling of data going on -- the 2 add and the sub in the loop. Also, it seems to me the compiler makes sure that the number of copies to make is a multiple of 4. An obvious optimization, then, is to make sure it is in advance! Below I assume it is (if not, the bne will fail and happily keep on copying and trample all over your memory).
Using my decade-old ARM assembler knowlegde (yes, that is a major disclaimer), and post-incrementing, I think the following short snippet is what it can be condensed to. From 18 instructions down to 8, not too bad. If it works.
ldr r1, __data_init_start
ldr r2, __data_start
ldr r3, __data_end
sub r4, r3, r2
.L1:
ldr r3, [r1], #4 ; safe to re-use r3 here
str r3, [r2], #4
subs r4, r4, #4
bne L1
May be that platform guarantees that writing to an unsigned int * you may change an unsigned int * value (i.e. it doesn't take advantage of type mismatch aliasing rules).
Then the code is inefficient because e is a global variable and the generated code logic must take in account that writing to *d may change the value of e.
Making at least e a local should solve this problem (most compilers know that aliasing a local that never had its address taken is not possible from a C point of view).
Here is the c-source code line which crashes on an armv7:
ret = fnPtr (param1, param2);
In the debugger, fnPtr has an address of 0x04216c00. When I disassemble at the pc where it's pointing at the statement above, here is what I get:
0x18918e: movw r0, #0x73c
0x189192: movt r0, #0x1
0x189196: add r0, r2
0x189198: ldr r0, [r0]
0x18919a: str r0, [sp, #0x20]
0x18919c: ldr r0, [sp, #0x20]
0x18919e: ldr r1, [sp, #0x28]
0x1891a0: ldr r2, [sp, #0x2c]
0x1891a2: str r0, [sp, #0x14]
0x1891a4: mov r0, r1
0x1891a6: mov r1, r2
0x1891a8: ldr r2, [sp, #0x14]
0x1891aa: blx r2
Now, when I disassemble the memory at address $r2 (=0x4216c00), I get what is seemingly valid code that should be executed without any problem:
(lldb) disassemble -s 0x4216c00 -C 10
0x4216c00: push {r4, r5, r6, r7, lr}
0x4216c04: add r7, sp, #0xc
0x4216c08: push {r8, r10, r11}
0x4216c0c: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
0x4216c10: sub r7, r7, #0x280
0x4216c14: mov r6, r0
0x4216c18: bx r1
0x4216c1c: add r7, r7, #0x280
Yet what really happens is this:
EXC_BAD_ACCESS (code=2, address=0x4216c00)
Can anyone explain what is wrong and why the address is considered illegal?
Full disclosure: I am no assembly expert. The code compiled and linked is all c-code. Compiler is clang.
Check the value of r2 before calling executing blx instruction. It might be odd, telling the cpu that address is in thumb mode however from the listing it looks like in arm mode.
Try forcing clang to only arm mode by -mno-thumb to test this.
The EXC_BAD_ACCESS exception has two bits of data in it, the first is the "kern_return_t" number describing the access failure, and the second is the address accessed. In your case the code is 2, which means (from /usr/include/mach/kern_return.h):
#define KERN_PROTECTION_FAILURE 2
/* Specified memory is valid, but does not permit the
* required forms of access.
*/
Not sure why this is happening, sounds like you are trying to execute code that doesn't have the execute permission set. What does:
(lldb) image lookup -va 0x4216c00
say?
BTW, the exception types are in /usr/include/mach/exception_types.h, and if the codes have machine specific meanings, those will be in, e.g. /usr/include/mach/i386/exception.h) For ARM info you may have to look in the header in the Xcode SDK.