ARM - Load and Store assembly instructions


I am trying to load and store data from two different arm registers.
int testing[64*1024] __attribute__ ((aligned (8192)));
__asm__("MOV r0, %0" :: "r" (testing) : "r0");
__asm__("STR R5,[R0];");
In my initial attempt I tried to store some data pointed to by the register r0 to register r5. There are absolutely no compilation problems but the data in the register cannot be accessed.
It is the same case for Load as well.
LDR R1,[R0]
(gdb) info registers
r0 0xb6000 745472
r1 0x1 1
r2 0x0 0
r3 0xb6000 745472
r4 0x8961 35169
r5 0x0 0
r6 0x0 0
r7 0xbeba9664 3199899236
r8 0x0 0
r9 0xefb9 61369
r10 0xf02d 61485
r11 0x0 0
r12 0x0 0
sp 0xbeba9664 0xbeba9664
lr 0x89cb 35275
pc 0xeace 0xeace <test48+14>
cpsr 0x60000030 1610612784
(gdb) bt
#0 0x0000eace in test48 ()
#1 0x000089ca in main ()
(gdb) x/x $r5
0x0: Cannot access memory at address 0x0
(gdb) x/x $r0
0xb6000 <testing>: 0x00000000
Essentially I am trying to achieve some memory inline addressing using ldr and str.
I used this guide while building my example.
Any idea where I am going wrong?

Your comment and your code do not match:
In my initial attempt I tried to store some data pointed to by the register r0 to register r5 [...]
__asm__("STR R5,[R0];");
The instruction you wrote stores the value of R5 into the memory location that R0 points to. R5 does not point to any memory location; its value is 0x0 in your example.
The __asm__ statements do not tell the compiler that R5 is used at all (it is neither an operand nor a clobber), so the compiler is free to put any temporary value or variable in it. Likewise, because the two statements are separate, nothing guarantees that R0 still holds the address of testing when the second one runs. This also explains:
(gdb) x/x $r5
0x0: Cannot access memory at address 0x0
Your gdb command tries to read memory at the address held in R5, which is 0x0, and nothing is mapped there. To print the value of the register itself, use p/x $r5 instead.

Related

Why PC is loaded with address containing undefined instruction? - STM32H745

I have a problem enabling the MPU on the STM32H745 MCU. I wanted to just disable the MPU, set a region and then enable it. However, a HardFault showed up. I thought it was a matter of wrong region settings, but after commenting out the region setup, I noticed the problem occurs just by turning on the MPU.
Code:
static syslog_status_t setMPU_sysLog(void)
{
[...]
ARM_MPU_Disable();
/* ARM_MPU_SetRegion(ARM_MPU_RBAR(0, (uint32_t)NON_CACHABLE_RAM4_D3_BASE_ADDR),
ARM_MPU_RASR(0UL, ARM_MPU_AP_FULL, 1UL, 0UL, 0UL, 1UL, 0x00UL, ARM_MPU_REGION_SIZE_8KB)); */
HALT_IF_DEBUGGING();
ARM_MPU_Enable(0);
return SYSLOG_OK;
}
I only use the CMSIS API, so I checked the assembly and, whoops:
>0x80003ec <setMPU_sysLog+36> bkpt 0x0001
0x80003ee <setMPU_sysLog+38> ldr r3, [pc, #28] ; (0x800040c <setMPU_sysLog+68>)
0x80003f0 <setMPU_sysLog+40> movs r2, #1
0x80003f2 <setMPU_sysLog+42> str.w r2, [r3, #148] ; 0x94
0x80003f6 <setMPU_sysLog+46> ldr r2, [r3, #36] ; 0x24
0x80003f8 <setMPU_sysLog+48> orr.w r2, r2, #65536 ; 0x10000
0x80003fc <setMPU_sysLog+52> str r2, [r3, #36] ; 0x24
0x80003fe <setMPU_sysLog+54> dsb sy
0x8000402 <setMPU_sysLog+58> isb sy
0x8000406 <setMPU_sysLog+62> movs r0, #0
0x8000408 <setMPU_sysLog+64> bx lr
0x800040a <setMPU_sysLog+66> nop
0x800040c <setMPU_sysLog+68> ; <UNDEFINED> instruction: 0xed00e000
0x8000410 <initSysLog> push {r3, lr}
Is an UNDEFINED instruction being loaded into the PC at 0x80003ee? What could cause this compiler(?) error? Has anyone encountered such a problem? How should I start debugging it? Additional debug information below:
0x08000398 in my_fault_handler_c (frame=0x2001ffb0) at CM7/exceptionHandlers.c:29
29 HALT_IF_DEBUGGING();
(gdb) p/a *frame
$1 = {r0 = 0xde684c0e, r1 = 0x6cefc92c, r2 = 0xed5b5cfb, r3 = 0xa3feeed1, r12 = 0xef082047, lr = 0xd7121a9e, return_address = 0xf16a13cf, xpsr = 0xf60e2caf}
Fields in SCB > HFSR:
VECTTBL: 0 Vector table hard fault
FORCED: 1 Forced hard fault
DEBUG_VT: 0 Reserved for Debug use
Fields in SCB > CFSR_UFSR_BFSR_MMFSR:
IACCVIOL: 1
DACCVIOL: 0
MUNSTKERR: 0
MSTKERR: 1
MLSPERR: 0
MMARVALID: 0
IBUSERR: 0 Instruction bus error
PRECISERR: 0 Precise data bus error
IMPRECISERR: 0 Imprecise data bus error
UNSTKERR: 0 Bus fault on unstacking for a return from exception
STKERR: 0 Bus fault on stacking for exception entry
LSPERR: 0 Bus fault on floating-point lazy state preservation
BFARVALID: 0 Bus Fault Address Register (BFAR) valid flag
UNDEFINSTR: 0 Undefined instruction usage fault
INVSTATE: 0 Invalid state usage fault
INVPC: 0 Invalid PC load usage fault
NOCP: 0 No coprocessor usage fault.
UNALIGNED: 0 Unaligned access usage fault
DIVBYZERO: 0 Divide by zero usage fault
arm-none-eabi-gcc -v
gcc version 10.2.1 20201103 (release) (GNU Arm Embedded Toolchain 10-2020-q4-major)

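The problem was that the PRIVDEFENA bit was not set. With the MPU enabled and PRIVDEFENA clear, there is no background region. Any instruction fetch or data access to flash or RAM that does not hit an enabled MPU region therefore faults. With no regions configured, the very next instruction fetch faults, which matches IACCVIOL = 1 above. Setting PRIVDEFENA makes the default memory map act as a background region for privileged accesses. Turning on the MPU as follows helped: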
ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk);

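It is not an undefined instruction. It is a data value used by your function (in this case, the address of a hardware register block). A single Thumb instruction cannot load an arbitrary 32-bit value into a register. A MOVW/MOVT pair can, but compilers often do something else instead: they store the value in memory next to the code (a literal pool) and load it from there with a PC-relative LDR.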
It is not a bug - it is something very standard.
Example:
typedef struct
{
volatile uint32_t reg1;
volatile uint32_t reg2;
}MYREG_t;
#define MYREG ((MYREG_t *)0xed00e000)
void foo(uint32_t val)
{
MYREG -> reg2 = val;
}
void bar(uint32_t val)
{
MYREG -> reg1 = val;
}
and generated code:
foo:
ldr r3, .L3
str r0, [r3, #4]
bx lr
.L3:
.word -318709760
bar:
ldr r3, .L6
str r0, [r3]
bx lr
.L6:
.word -318709760
This data is placed where the code never reaches it. The same is true in your code: the function returns (bx lr) before getting there.
A disassembler, like the one you used, does not know these words are data. It decodes them as instructions and reports them as undefined.
BTW, are you using arm-none-eabi-gdb? The register values it shows in the stacked frame look like nonsense.

ARM PC value after Reset

I am new to MCUs and trying to figure out how an ARM (Cortex-M3/M4) based MCU boots. Because booting is specific to each SoC, I took an STM hardware board as a case study.
Board: STMicroelectronics – STM32L476 32-bit.
On this board, when the boot mode is (x0) "Boot from User Flash", the board maps address 0x00000000 to flash memory. In flash I have written my binary, whose first 4 bytes are the first vector table entry, the initial stack pointer (esp). Now if I press the reset button, the ARM documentation says the PC will be set to 0x00000000.
A CPU generally executes a stream of instructions in a PC -> PC + 1 loop. In this case the PC would point to the stack pointer value, which is not an instruction. How does the ARM CPU know not to execute from this address, and instead jump to the value stored at address 0x00000004?
Or is this the case:
Reset produces a special hardware interrupt and causes the PC to be set to the value at 0x00000004? If so, why does the ARM documentation say it sets the PC to 0x00000000?
Ref: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka3761.html
What values are in ARM registers after a power-on reset? Applies to:
ARM1020/22E, ARM1026EJ-S, ARM1136, ARM720T, ARM7EJ-S, ARM7TDMI,
ARM7TDMI-S, ARM920/922T, ARM926EJ-S, ARM940T, ARM946E-S, ARM966E-S,
ARM9TDMI
Answer Registers R0 - R14 (including banked registers) and SPSR (in
all modes) are undefined after reset.
The Program Counter (PC/R15) will be set to 0x000000, or 0xFFFF0000 if
the core has a VINITHI or CFGHIVECS input which is set high as the
core leaves reset. This input should be set to reflect where the base
of the vector table in your system is located.
The Current Program Status Register (CPSR) will indicate that the ARM
core has started in ARM state, Supervisor mode with both FIQ and IRQ
mask bits set. The condition code flags will be undefined. Please see
the ARM Architecture Manual for a detailed description of the CPSR.

The cortex-ms do not boot the same way the traditional, full-sized cores boot. Those, at least for reset as you pointed out, fetch the first instructions from address 0x00000000 (or the alternate if asserted). It's not really fair to call that the PC value, as at this point the PC is somewhat bogus. There are multiple program counters being produced: a fake one in r15, one leading the fetching and one doing prefetch. None of them is really the program counter. Anyway, that doesn't matter here.
The cortex-m is documented in the armv7-m architecture reference manual for the m3 and m4; for the m0 and m0+ see armv6-m, although so far they all boot the same way. These use a vector TABLE, not instructions. The CORE reads address 0x00000000 (or an alternate if a strap is asserted), and that 32-bit value gets loaded into the stack pointer register. It then reads address 0x00000004 and checks the lsbit (maybe not all cores do). If it is set, this is a valid thumb address, so it strips the lsbit off (makes it zero) and begins to fetch the first instructions of the reset handler at that address. So if your flash starts with
0x00000000 : 0x20001000
0x00000004 : 0x00000101
the cortex-m will put 0x20001000 in the stack pointer and fetch the first instructions from address 0x100. Thumb instructions are 16 bits, with the thumb2 extensions being two 16-bit halves; this is not an x86.

The full-sized processors with 32-bit instructions keep the program counter aligned and fetch on aligned addresses 0x0000, 0x0004, 0x0008. They don't increment pc <= pc + 1 but pc = pc + 4. For thumb mode or thumb processors it is pc = pc + 2.

Also, the fetches are not necessarily single-instruction transactions. The full-sized cores may fetch 4 or 8 words per transaction, and some of the cortex-ms, as documented in their technical reference manuals, can be compiled or strapped to fetch 16 bits or 32 bits at a time. So there is no need to talk about or think about execution loops fetching pc = pc + 1; that doesn't make sense even on an x86 these days.
To be fair, ARM's documentation is generally good: on the better side compared to a number of others, though not the best. Unlike the full-sized arm exception table, the vector table in the cortex-m documentation was not done as well as it could have been. They could have/should have done something like the full-sized one, but shown that these are vectors, not instructions. It is in there, though, in the architectural reference manuals for armv6-m and armv7-m. I would assume armv8-m as well, but I have not looked (got some parts last week but the boards are not here yet, will know very soon). You can't search for words like reset; you have to look for interrupt or undefined or hardfault, etc. in that manual.
EDIT
Unwrap your mind from this notion of how the processor starts fetching. It can be any arbitrary address the designers put into the design, and then the execution of the instructions determines the next address, and the next, etc.
Also understand that, unlike say x86 or microchip pic or the avrs, etc., the core and the chip come from two different companies. This applies even to single-company designs, and certainly where there is a clear division between the IP and a known bus. The ARM CORE will read address 0x00000004 on the AMBA/AXI/AHB bus, and the chip vendor can mirror that address in as many different places as they want.

In this case, with the stm32, there probably isn't actually anything at 0x00000000. As their documentation implies, based on the boot pins they map it either to an internal bootloader or to the user application at 0x08000000 (in most stm32s; if there is an exception, that's fine, I have not seen it yet). So when strapped that way, with the logic mirroring those addresses, you will see the same 32-bit values at 0x00000000 and 0x08000000, at 0x00000004 and 0x08000004, and so on for some limited amount of address space.

Linking for 0x00000000 will work to some extent, until you hit that limit, which is probably smaller than the application flash size. This is why most folks link for 0x08000000 and let the hardware take care of the rest, so your table really wants to look like
0x08000000 : 0x20001000
0x08000004 : 0x08000101
for an stm32, at least the dozens I have seen so far.
The processor reads 0x00000000, which is mirrored to the first word of the application flash, and finds 0x20001000. It then reads 0x00000004, which is mirrored to the second word, and gets 0x08000101, which causes a fetch from 0x08000100. Now we are executing from the proper, fully mapped application flash address space.

That holds so long as you don't change the mirroring. I don't know if you can on an stm32 (on nxp chips you can; I don't know about ti or other brands offhand). On some of the cortex-m cores the VTOR register is present and changeable; on others it is fixed at 0x00000000 and you can't change it. You do not need to change it to 0x08000000 for an stm32, at least for all the ones I know about.

You only need it in two cases. One is if you are actively changing the mirroring of the zero address space yourself, where that is possible. The other is if, say, you have your own bootloader and YOUR application space is 0x08004000 and that application wants a vector table of its own. Then you either use VTOR, or you build the bootloader's vector table so that it runs code that reads the vectors at 0x08004000 and branches to those.

NXP and others in the past, certainly with the ARM7TDMI cores, would let you change the mirroring of address zero. Those older cores didn't have a programmable vector table offset register, so this helped you solve that problem in their chip designs. Newer ARM cores with a VTOR eliminate that need, and over time the chip vendors might not bother anymore, if they do at all...
EDIT
I don't know if you have the discovery board or the nucleo. I assume the latter, as the former is not available (wish I knew about that one, I would like to have one; and/or I already have one and it's buried in a drawer and I never got to it).
So here is a somewhat minimal program you can try on your stm32:
.cpu cortex-m0
.thumb
.globl _start
_start:
.word 0x20000400
.word reset
.word loop
.word loop
.thumb_func
loop: b loop
.thumb_func
reset:
ldr r0,=0x20000000
mov r2,sp
str r2,[r0]
add r0,r0,#4
mov r2,pc
str r2,[r0]
add r0,r0,#4
mov r1,#0
top:
str r1,[r0]
add r1,r1,#1
b top
build
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld -Ttext=0x08000000 so.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf -O binary so.bin
This should build with arm-linux-whatever- or other arm-whatever-whatever tools from any binutils from the last 10 years.
The disassembly is important to examine before using the binary; you don't want to brick your chip (with an stm32 there is a way to get unbricked).
08000000 <_start>:
8000000: 20000400 andcs r0, r0, r0, lsl #8
8000004: 08000013 stmdaeq r0, {r0, r1, r4}
8000008: 08000011 stmdaeq r0, {r0, r4}
800000c: 08000011 stmdaeq r0, {r0, r4}
08000010 <loop>:
8000010: e7fe b.n 8000010 <loop>
08000012 <reset>:
8000012: 4805 ldr r0, [pc, #20] ; (8000028 <top+0x6>)
8000014: 466a mov r2, sp
8000016: 6002 str r2, [r0, #0]
8000018: 3004 adds r0, #4
800001a: 467a mov r2, pc
800001c: 6002 str r2, [r0, #0]
800001e: 3004 adds r0, #4
8000020: 2100 movs r1, #0
08000022 <top>:
8000022: 6001 str r1, [r0, #0]
8000024: 3101 adds r1, #1
8000026: e7fc b.n 8000022 <top>
8000028: 20000000 andcs r0, r0, r0
The disassembler doesn't know that the vector table is not instructions, so you can ignore those decodes:
08000000 <_start>:
8000000: 20000400
8000004: 08000013
8000008: 08000011
800000c: 08000011
08000010 <loop>:
8000010: e7fe b.n 8000010 <loop>
08000012 <reset>:
Does it start the vector table at 0x08000000? Check. Is our stack pointer init value the first word, which the core sees at 0x00000000? Yes. The reset vector we had the tools place for us: .thumb_func tells them the following label is an address for some code/function/procedure/whatever_not_data, so they OR the one on there for us. Our reset handler is at address 0x08000012, so we want to see 0x08000013 in the vector table: check. I tossed in a couple more for demonstration purposes, sending them to an infinite loop at address 0x08000010, so the vector table should have 0x08000011: check.
So, assuming you have a nucleo board and not the discovery, you can copy the so.bin file to the thumb drive that shows up when you plug it in.
If you use openocd to connect through the stlink interface into the board, you can now see that it was running (details left to the reader to figure out):
Open On-Chip Debugger
> halt
stm32f0x.cpu: target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000022 msp: 0x20000400
> mdw 0x20000000 20
0x20000000: 20000400 0800001e 0048cd01 200002e7 200002e9 200002eb 200002ed 00000000
0x20000020: 00000000 00000000 00000000 200002f1 200002ef 00000000 200002f3 200002f5
0x20000040: 200002f7 200002f9 200002fb 200002fd
> resume
> halt
stm32f0x.cpu: target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000022 msp: 0x20000400
> mdw 0x20000000 20
0x20000000: 20000400 0800001e 005e168c 200002e7 200002e9 200002eb 200002ed 00000000
0x20000020: 00000000 00000000 00000000 200002f1 200002ef 00000000 200002f3 200002f5
0x20000040: 200002f7 200002f9 200002fb 200002fd
so we can see that the stack pointer had 0x20000400 as expected
0x20000000: 20000400 0800001e 0048cd01
Next, the program counter, which is not some magical thing; they have to somewhat fake it to make the instruction set work:
800001a: 467a mov r2, pc
As defined in the instruction set, the pc value used in this instruction is two instructions ahead of the address of this instruction (in Thumb state, the instruction address + 4). So 0x0800001A + 4 = 0x0800001E, which is what we see in the memory dump.
And the third item is a counter showing we are running. The resume and halt show that the count kept going (0x0048cd01 in the first dump, 0x005e168c in the second):
0x20000000: 20000400 0800001e 005e168c
So this demonstrates, the vector table, initializing the stack pointer, the reset vector, where code execution starts, what the value of the pc is at some point in the program, and seeing the program run.
The .cpu cortex-m0 makes it build the most compatible program for the cortex-m family. The ldr r0,=0x20000000 was cheating; you posted the same feature in your comment. It says "I want to load the address of blah into the register". A label is just an address, and they let you put just an address: =_estack is the address of a label, and =0x20000000 is just a number treated as an address (addresses are just numbers as well, nothing magical about them). I could have used a smaller immediate with a shift, or explicitly done the pc-relative load. Force of habit in this case.
EDIT2
This is an attempt to help a programmer understand that the chip is logic. Only some percentage of it is software/instruction driven, and even within that, it is just logic that does more things than the software instruction itself indicates. When you want to read from memory, your instruction asks the processor to do it, but in a real chip a number of steps are involved in actually performing that. Microcoded or not (ARMs are not microcoded), there are state machines that walk through the various steps of each of these tasks:
- grab the values from the registers
- compute the address
- do the memory transaction, which is itself a handful of separate steps
- take the returned value and place it in the register file
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.word loop
.word loop
.thumb_func
loop: b loop
.thumb_func
reset:
ldr r0,loop_counts
loop_top:
sub r0,r0,#1
bne loop_top
b reset
.align
loop_counts: .word 0x1234
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000013 andeq r0, r0, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <loop>:
10: e7fe b.n 10 <loop>
00000012 <reset>:
12: 4802 ldr r0, [pc, #8] ; (1c <loop_counts>)
00000014 <loop_top>:
14: 3801 subs r0, #1
16: d1fd bne.n 14 <loop_top>
18: e7fb b.n 12 <reset>
1a: 46c0 nop ; (mov r8, r8)
0000001c <loop_counts>:
1c: 00001234 andeq r1, r0, r4, lsr r2
Just barely enough of an instruction set simulator to run that program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ROMMASK 0xFFFF
#define RAMMASK 0xFFF
unsigned short rom[ROMMASK+1];
unsigned short ram[RAMMASK+1];
unsigned int reg[16];
unsigned int pc;
unsigned int cpsr;
unsigned int inst;
int main ( void )
{
unsigned int ra;
unsigned int rb;
unsigned int rc;
unsigned int rx;
//just putting something there, a real chip might have an MBIST, might not.
memset(reg,0xBA,sizeof(reg));
memset(ram,0xCA,sizeof(ram));
memset(rom,0xFF,sizeof(rom));
//in a real chip the rom/flash would contain the program and not
//need to do anything to it, this sim needs to have the program
//various ways to have done this...
//00000000 <_start>:
rom[0x00>>1]=0x1000; // 0: 20001000 andcs r1, r0, r0
rom[0x02>>1]=0x2000;
rom[0x04>>1]=0x0013; // 4: 00000013 andeq r0, r0, r3, lsl r0
rom[0x06>>1]=0x0000;
rom[0x08>>1]=0x0011; // 8: 00000011 andeq r0, r0, r1, lsl r0
rom[0x0A>>1]=0x0000;
rom[0x0C>>1]=0x0011; // c: 00000011 andeq r0, r0, r1, lsl r0
rom[0x0E>>1]=0x0000;
//
//00000010 <loop>:
rom[0x10>>1]=0xe7fe; // 10: e7fe b.n 10 <loop>
//
//00000012 <reset>:
rom[0x12>>1]=0x4802; // 12: 4802 ldr r0, [pc, #8] ; (1c <loop_counts>)
//
//00000014 <loop_top>:
rom[0x14>>1]=0x3801; // 14: 3801 subs r0, #1
rom[0x16>>1]=0xd1fd; // 16: d1fd bne.n 14 <loop_top>
rom[0x18>>1]=0xe7fb; // 18: e7fb b.n 12 <reset>
rom[0x1A>>1]=0x46c0; // 1a: 46c0 nop ; (mov r8, r8)
//
//0000001c <loop_counts>:
rom[0x1C>>1]=0x0004; // 1c: 00001234 andeq r1, r0, r4, lsr r2
rom[0x1E>>1]=0x0000;
//reset
//THIS IS NOT SOFTWARE DRIVEN LOGIC, IT IS JUST LOGIC
ra=rom[0x00>>1];
rb=rom[0x02>>1];
reg[13]=(rb<<16)|ra; //initial stack pointer goes in r13 (sp), not r14 (lr)
ra=rom[0x04>>1];
rb=rom[0x06>>1];
rc=(rb<<16)|ra;
if((rc&1)==0) return(1); //normally run a fault handler here
pc=rc&0xFFFFFFFE;
reg[15]=pc+2;
cpsr=0x000000E0;
//run
//THIS PART BELOW IS SOFTWARE DRIVEN LOGIC
//still you can see that each instruction requires some amount of
//non-software driven logic.
//while(1)
for(rx=0;rx<20;rx++)
{
inst=rom[(pc>>1)&ROMMASK];
printf("0x%08X : 0x%04X\n",pc,inst);
reg[15]=pc+4;
pc+=2;
if((inst&0xF800)==0x4800)
{
//LDR
printf("LDR r%02u,[PC+0x%08X]",(inst>>8)&0x7,(inst&0xFF)<<2);
ra=(inst>>0)&0xFF;
rb=reg[15]&0xFFFFFFFC;
ra=rb+(ra<<2);
printf(" {0x%08X}",ra);
rb=rom[((ra>>1)+0)&ROMMASK];
rc=rom[((ra>>1)+1)&ROMMASK];
ra=(inst>>8)&0x07;
reg[ra]=(rc<<16)|rb;
printf(" {0x%08X}\n",reg[ra]);
continue;
}
if((inst&0xF800)==0x3800)
{
//SUB
ra=(inst>>8)&0x07;
rb=(inst>>0)&0xFF;
printf("SUBS r%u,%u ",ra,rb);
rc=reg[ra];
rc-=rb;
reg[ra]=rc;
printf("{0x%08X}\n",rc);
//do flags
if(rc&0x80000000) cpsr|=0x80000000; else cpsr&=(~0x80000000); //N flag (bit 31)
if(rc==0) cpsr|=0x40000000; else cpsr&=(~0x40000000); //Z flag (bit 30)
//dont need other flags for this example
continue;
}
if((inst&0xF000)==0xD000) //B conditional
{
if(((inst>>8)&0xF)==0x1) //NE
{
ra=(inst>>0)&0xFF;
if(ra&0x80) ra|=0xFFFFFF00;
rb=reg[15]+(ra<<1);
printf("BNE 0x%08X\n",rb);
if((cpsr&0x40000000)==0) //NE: branch if Z is clear
{
pc=rb;
}
continue;
}
}
if((inst&0xF800)==0xE000) //B (16-bit unconditional; 0xE800-0xFFFF start 32-bit instructions)
{
ra=(inst>>0)&0x7FF;
if(ra&0x400) ra|=0xFFFFF800;
rb=reg[15]+(ra<<1);
printf("B 0x%08X\n",rb);
pc=rb;
continue;
}
printf("UNDEFINED INSTRUCTION 0x%08X: 0x%04X\n",pc-2,inst);
break;
}
return(0);
}
You are welcome to hate my coding style; this is a brute-force thing thrown together for this question. No, I don't work for ARM; this can all be pulled from public documents/information. I shortened the loop to 4 counts (the literal is 0x0004 in the sim instead of 0x1234) to see it hit the outer loop:
0x00000012 : 0x4802
LDR r00,[PC+0x00000008] {0x0000001C} {0x00000004}
0x00000014 : 0x3801
SUBS r0,1 {0x00000003}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000014 : 0x3801
SUBS r0,1 {0x00000002}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000014 : 0x3801
SUBS r0,1 {0x00000001}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000014 : 0x3801
SUBS r0,1 {0x00000000}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000018 : 0xE7FB
B 0x00000012
0x00000012 : 0x4802
LDR r00,[PC+0x00000008] {0x0000001C} {0x00000004}
0x00000014 : 0x3801
SUBS r0,1 {0x00000003}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000014 : 0x3801
SUBS r0,1 {0x00000002}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000014 : 0x3801
SUBS r0,1 {0x00000001}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000014 : 0x3801
SUBS r0,1 {0x00000000}
0x00000016 : 0xD1FD
BNE 0x00000014
0x00000018 : 0xE7FB
B 0x00000012
Perhaps this helps, perhaps this makes it worse. Most of the logic is not driven by instructions. Each instruction requires some amount of logic, not counting the common logic like instruction fetching and things like that.
If you add more code, this simulator will break: it ONLY supports this handful of instructions and this loop.

When you're confused about some behaviour of an Arm processor, the most important thing to check is probably which version of the architecture applies. You will find a huge amount of very old legacy documentation relating to ARM7 and ARM9 designs. Whilst not all of it is wrong today, it can be very misleading.
ARM v4, ARM v5, ARM v6: These are legacy designs, rarely even used in derivative products now.
ARM v7A: These are the first of the Cortex series. Cortex-A5 is the entry-level for a linux class device in 2018.
ARM v7M, ARM v6M: These are the common microcontroller devices like your STM32, and already these have over 10 years of history
ARM v8A: These introduce the 64-bit instruction set (T32/A32/A64 in one device), already entry level in the R-pi 3, for example.
ARM v8M: The latest iteration of a microcontroller architecture, with more advanced security features, just starting to become available in 2018Q2.
Specifically, ARMv6M/ARMv7M/ARMv8M provide a very different exception model compared with all of the other ARM architectures (remaining similar within the family). Many of the other differences are more incremental or focused on specialised areas.

LDMIA instruction results in corrupt register data

I'm attempting to run a compiled program bare metal on an ARM Cortex-M3. Before the system even reaches the application code, an odd error blows the program counter away and errors out.
Before the instruction, the registers are observed to be:
r0 0x0 0
r1 0x1 1
r2 0x0 0
r3 0x2 2
r4 0x18564 99684
r5 0x18418 99352
r6 0x0 0
r7 0x0 0
r8 0x8311 33553
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0xc84404 13124612
sp 0x7ffe0 0x7ffe0
lr 0x80df 32991
pc 0x8380 0x8380
The following instruction is executed nominally:
0x829c <__call_exitprocs+112>: ldmia.w sp!, {r4, r5, r6, r7, r8, r9, r10, r11, pc}
Afterwards, the loaded registers contain garbage, and the program counter is sent way off, effectively terminating the program:
...
r3 0x2 2
r4 0xffffffff 4294967295
r5 0xffffffff 4294967295
r6 0xffffffff 4294967295
r7 0xffffffff 4294967295
r8 0xffffffff 4294967295
r9 0xffffffff 4294967295
r10 0xffffffff 4294967295
r11 0x0 0
...
pc 0xfffffffe 0xfffffffe
I've read about a similar issue on Stack Overflow, but it doesn't seem to be the issue I'm facing here. At a quick glance, the ATMEL documentation for this board doesn't specify a limit on the number of registers read at once.
Any thoughts on the problem and, if possible, a workaround in gcc to prevent it?

The instruction and its effect are absolutely correct, but the sp value before this instruction is wrong. Your chip has no RAM at that address; in fact, it probably has no memory at all there. See page 32 of the manual (the memory map).
http://www.atmel.com/Images/Atmel-6430-32-bit-Cortex-M3-Microcontroller-SAM3U4-SAM3U2-SAM3U1_Datasheet.pdf
Your sp should be somewhere within SRAM, so above 0x20000000. The value you have, 0x7ffe0, is somewhere in the "Boot memory" region. If you want to find the problem, find out why sp has an invalid value.

ARM Cortex-M4: issues met when calling printf in assembly

I am trying to call printf from ARM Cortex-M4 assembly and am running into some problems. The purpose is to dump the content of R1. The code is as follows:
.data
.balign 4
output_string:
dcb "content in R1 is 0x%x\n", 0
....
.text
....
push {r0, r1}
mov r1, r0
ldr r0, =output_string
bl printf
pop {r0, r1}
The problem I run into is that when I put the address of "output_string" into R0, an extra 1 is added to the value. For example, if the symbol "output_string" has the value 0x2000, R0 gets the value 0x2001.
I feel this has something to do with THUMB/ARM mode. But I have declared "output_string" in the data section, so why does the assembler still treat it as an instruction address?
Or is there a more formal way to call a function from assembly?

I think you should use:
ldr r0, =output_string
The = prefix is an assembler shorthand to make it load an arbitrary 32-bit constant. See this ARM Information Center page.

LDR - Literal pool - ARM

I know how to load an immediate value using the LDR instruction in ARM.
For example:
LDR R0,=0x0804c088
This instruction loads the value 0x0804c088 into register r0. When I try to access the address where it is stored using x/x $r0 in gdb, I get the message: Cannot access memory at address 0x0804c088. But that is not the address; it is the value stored in that register. The address is a PC-relative address in the literal pool.
What mistake am I making there? Did I understand something wrong?
Moreover, how should I set up the literal pool? Can you give me an example, please?
@Carl Norum: Here is the code:
__asm__("LDR R0,=0x0804c088");
__asm__("LDR R1,[PC, #34];");
Output from gdb:
(gdb) info registers
r0 0x804c088 134529160
r1 0xf2c00300 4072669952
r2 0x0 0
r3 0x1 1
r4 0x8961 35169
r5 0x0 0
r6 0x0 0
r7 0xbe8f4b74 3197062004
r8 0x0 0
r9 0xef99 61337
r10 0xf00d 61453
r11 0x0 0
r12 0x0 0
sp 0xbe8f4b74 0xbe8f4b74
lr 0x89a7 35239
pc 0x8a62 0x8a62 <test46+34>
cpsr 0x60000030 1610612784
(gdb) x/x $r0
0x804c088: Cannot access memory at address 0x804c088
(gdb) p/x$r0
$1 = 0x804c088
(gdb) p/x $r1
$2 = 0xf2c00300
(gdb) x/x $r1
0xf2c00300: Cannot access memory at address 0xf2c00300
(gdb) x/x $r15
0x8a62 <test46+34>: 0x1022f8df

The gdb x command examines memory at the given address, so it inherently dereferences its argument. If you want to print the value in r0, just use p:
p/x $r0
The form of LDR you're using isn't a real instruction; it's an assembler pseudo-instruction. It gets converted into a PC-relative ldr instruction plus a literal value stored somewhere in memory, probably close to where you use it. If you want to find the address of the constant in the literal pool, you need to look at the output binary; your source assembly code doesn't contain it.
For example, let's take this simple example program:
.globl f
f:
ldr r0,=0x12345678
And then build and disassemble it:
$ arm-none-eabi-clang -c example.s
$ arm-none-eabi-objdump -d example.o
example.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <f>:
0: e51f0004 ldr r0, [pc, #-4] ; 4 <f+0x4>
4: 12345678 .word 0x12345678
You can see the literal is right there at offset 4.
You don't need to do anything to "set up the literal pool". Any necessary literals will be set up for you by the assembler.

If you want to know the actual address of the literal pool at runtime, try this:
adr r12, literal_pool_label
.
.
. // your code here
.
.
.
literal_pool_label:
.ltorg
Then you can read r12 which contains the address of the literal pool at runtime.
.ltorg is a directive that forces where the literal pool is placed. For short code, literals are automatically placed at the end of the code. If the code gets larger than about 4KB, the LDR pseudo-instruction will cause an error at assembly time, because the PC-relative offset exceeds the 4095-byte limit of the ARM-state encoding. The 16-bit Thumb LDR has a much shorter reach.
To avoid this, you can put .ltorg in the middle of the code, at a place where it can never be executed as an instruction (after an unconditional branch, for example).
