ARM LPC3250: execute instructions from external RAM

I'm having trouble getting the ARM to execute instructions stored in external RAM.
I wrote a small LED-blink program for the LPC3250. It runs properly if I download it to the LPC3250's internal RAM via the IAR online debugger, but it does not run if I put it in the external RAM.
The external RAM is a block of SRAM built inside a Spartan-6 (Xilinx FPGA): the data width is 32 bits and the memory depth is 4096 words, so the address width is 12 bits. This RAM can be initialized through a COE file.
So I generate the program's BIN file with IAR, then convert the BIN file into a COE file, which is used to initialize the SRAM. But every time, the processor just executes the three E59FF018 (LDR PC, [PC, #0x18]) instructions at the beginning of the SRAM and never jumps to main().
I cannot figure out why. As the LPC3250 requires, I prepended 4 bytes (0x13579BD2) to the BIN file with UltraEdit before generating the COE file. The LPC3250 user manual says the boot ROM will start executing code at address 0xE0000004 of the external memory if the value at 0xE0000000 is 0x13579BD2. In the COE file I can see five identical instructions (E59FF018) right after 0x13579BD2.
Please tell me where I'm going wrong and what I need to do to make this work.
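For reference, the BIN-to-COE step described above can also be done with a small helper program instead of editing the file by hand. This is only a sketch of one way to do it (bin2coe is a hypothetical name, not a tool from the question), assuming a little-endian BIN and the 0x13579BD2 boot word emitted as the first COE entry:

/* Hypothetical bin2coe helper: prepends the LPC3250 boot word 0x13579BD2
 * and writes the BIN contents as 32-bit hex words in Xilinx COE format. */
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: bin2coe program.bin program.coe\n");
        return 1;
    }
    FILE *in  = fopen(argv[1], "rb");
    FILE *out = fopen(argv[2], "w");
    if (!in || !out) {
        fprintf(stderr, "cannot open input/output file\n");
        return 1;
    }

    fprintf(out, "memory_initialization_radix=16;\n");
    fprintf(out, "memory_initialization_vector=\n");
    fprintf(out, "13579BD2");                  /* boot word -> 0xE0000000 */

    uint8_t b[4];
    size_t n;
    while ((n = fread(b, 1, 4, in)) > 0) {
        while (n < 4)                          /* pad a short final word  */
            b[n++] = 0;
        /* BIN is little-endian: assemble one 32-bit word per COE entry  */
        uint32_t w = (uint32_t)b[0]        | ((uint32_t)b[1] << 8) |
                     ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
        fprintf(out, ",\n%08X", (unsigned)w);
    }
    fprintf(out, ";\n");

    fclose(in);
    fclose(out);
    return 0;
}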

Well, I had almost forgotten that I asked this question 3 years ago. I have since found the cause: it is about the address signal output from the ARM. I had assumed the address signal from the ARM was a byte address, when in fact it counts 32-bit words. So I should not have discarded the low two bits of the address when processing it in the FPGA; the ARM has already dropped them.
In a word, my problem was an addressing problem.
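To illustrate the fix in C terms (a software sketch only, not the actual FPGA code): with a 32-bit-wide memory the address presented by the ARM already indexes words, so the block RAM index is the bus value itself, not the bus value shifted right by two.

#include <stdint.h>

/* Software model of the 4096 x 32-bit block RAM from the question. */
static uint32_t bram[4096];

/* What the FPGA side should do: use the address directly as a word index. */
uint32_t bram_read(uint32_t emc_addr)
{
    return bram[emc_addr & 0x0FFF];   /* correct: the bus already counts words    */
    /* wrong (my original mistake):  return bram[(emc_addr >> 2) & 0x0FFF];       */
}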

Related

Put unchanging code in separate memory section, to reduce OTA update size

I am writing C code for the STM32L010RB microcontroller, which has 128 KB of flash memory where the program will reside. I want to implement over-the-air updates of this program code, and I've done this once before with another Cortex-M0+ based microcontroller. I divided the flash memory into four sections: a Bootloader, a Main program section, a New firmware section, and a Persistent memory section.
The actual program which executes resides in the Main program section, and when it is time to update it, it contacts our servers and downloads the new program into the New firmware section. A CRC check is used to ensure the integrity of the downloaded FW. A flag in the Persistent memory section is set to signal that new FW is available, and the program then restarts into the bootloader. The bootloader checks the flag, sees that an update is available, erases the Main program section and copies the contents of the New firmware section into it. The flag is cleared, and it then branches into the now-updated main program. This works perfectly, and is robust against power loss or errors at any point in the process.
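The copy step in that flow is conceptually simple; a rough sketch is shown here, where flash_erase_page() and flash_write_word() are hypothetical helpers standing in for the MCU's flash driver, and the addresses and sizes are placeholders rather than values from the post:

#include <stdint.h>
#include <stdbool.h>

/* Placeholder layout; adjust to your own flash partitioning. */
#define MAIN_APP_BASE   0x08008000UL
#define NEW_FW_BASE     0x08018000UL
#define FW_MAX_SIZE     0x00010000UL
#define FLASH_PAGE_SIZE 128U            /* STM32L0 flash pages are 128 bytes */

/* Hypothetical flash driver functions (not a real HAL API). */
extern bool flash_erase_page(uint32_t addr);
extern bool flash_write_word(uint32_t addr, uint32_t data);

/* Copy the downloaded image over the main application, page by page. */
static bool copy_new_firmware(void)
{
    for (uint32_t off = 0; off < FW_MAX_SIZE; off += FLASH_PAGE_SIZE) {
        if (!flash_erase_page(MAIN_APP_BASE + off))
            return false;
        for (uint32_t w = 0; w < FLASH_PAGE_SIZE; w += 4) {
            uint32_t data = *(const uint32_t *)(NEW_FW_BASE + off + w);
            if (!flash_write_word(MAIN_APP_BASE + off + w, data))
                return false;
        }
    }
    return true;   /* caller clears the "update pending" flag on success */
}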
I want to do a very similar implementation on the STM32. However, this time around my Main program uses a few dependencies that take up quite a lot of flash memory. For instance a printf implementation that is very useful, but takes up quite a lot of space (relative to what I have available). These libraries will naturally be used, probably unchanged, in all future versions of the program.
My question is whether it is possible to relocate certain parts of code that will be used, unchanged, by all future versions of the program, to a separate (fifth) section in the flash memory. This would save space in that (for instance) the printf library would only be allocated once in flash, instead of having to be included within the main program and all updates.
Is there a way to do this, and if so, how? Is it a viable approach in your opinion? Are there any pitfalls that must be avoided?
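One way to express this (just a sketch of the general mechanism, not a complete solution) is to mark the shared routines with a section attribute and then place that section at a fixed address in the linker script, so every firmware image agrees on where the shared code lives. The section name, region name, and address below are made up for illustration:

/* Shared, rarely-changing code gets its own output section.  The name
 * ".shared_lib" and its fixed address are assumptions for illustration. */
__attribute__((section(".shared_lib")))
int shared_printf(const char *fmt, ...)
{
    /* ... printf implementation ... */
    (void)fmt;
    return 0;
}

/*
 * Corresponding GNU ld script fragment (sketch):
 *
 *   MEMORY {
 *     SHARED (rx) : ORIGIN = 0x0801C000, LENGTH = 16K
 *   }
 *   SECTIONS {
 *     .shared_lib : { KEEP(*(.shared_lib)) } > SHARED
 *   }
 */

In practice the main program usually reaches such code through a small table of function pointers at a known address, so the application can be re-linked without the shared section moving; that indirection is a design choice worth planning for up front.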

Using OpenOCD to determine RAM usage in microcontroller (ARM Cortex-M3)

I'd like to see how much RAM is used by the firmware by writing a known pattern, and comparing RAM contents to see how much has been modified.
I've tried
reset halt
load_image pattern.bin 0xaddress
resume
(let target run for a bit)
halt
dump_image sram.bin 0xaddress 0xsize
but it appears I have obtained flash contents and cannot see the test pattern anywhere.
Am I using the proper commands? If I "verify" manually by loading and dumping, the data is identical.
Could halt affect the RAM contents? Otherwise, is it safe to assume that the application in fact initializes all of the RAM, making analysis difficult/impossible?
I should point out that I only have a "dump" of the firmware, i.e. I am not building it.
I had to do soft_reset_halt to get the PC to the reset vector address.
My version of OpenOCD warns me that the command is deprecated.
Then I was able to spot a few occurrences of my test pattern in the RAM dump.
Also, there are notable differences between the RAM image and the firmware, so it seems that the firmware is indeed using most of the RAM.
(this might not be an issue if your interface is using a physical reset line?)
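Once you have the pattern-filled dump back, measuring how much RAM was touched is a matter of counting the words that still hold the fill pattern. A small host-side sketch follows; the pattern value and file name are placeholders for whatever you actually used, and it assumes a little-endian host matching the target's byte order:

/* Count how many 32-bit words of an SRAM dump still contain the fill
 * pattern that was loaded before the target ran. */
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    const uint32_t pattern = 0xDEADBEEFu;      /* assumed fill value */
    if (argc != 2) {
        fprintf(stderr, "usage: ramusage sram.bin\n");
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }

    uint32_t word;
    unsigned long total = 0, untouched = 0;
    while (fread(&word, sizeof word, 1, f) == 1) {
        total++;
        if (word == pattern)
            untouched++;
    }
    fclose(f);

    printf("%lu of %lu words still hold the pattern (~%lu bytes unused)\n",
           untouched, total, untouched * 4);
    return 0;
}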

What is the canonical way to execute code directly from a QEMU device?

I'm modeling a particular evaluation board, which has a LEON3 processor and several banks of MRAM mapped to specific addresses. My goal is to start qemu-system-sparc using my bootloader ELF, and then jump to the base address of an MRAM bank to begin executing the bare-metal programs stored there. To this end, I have been able to successfully run my bootloader and jump to the first instruction, but QEMU immediately stops and exits without reporting any error/trap. I can also run the bare-metal programs in isolation by passing them in ELF format as a kernel to qemu-system-sparc.
Short version: Is there a canonical way to set up a device such that code can be executed from it directly? What steps do I need to take when compiling that code to allow it to execute correctly?
I modeled the MRAM as a device with a MemoryRegion, along with the appropriate read and write operations to expose a heap-allocated array with my program. In my board code (modified version of qemu/hw/sparc/leon3.c), writes to the MRAM address are mapped to the MemoryRegion of the device. Using printfs, I am reporting reads and writes in the style of the unimplemented device (qemu/hw/misc/unimp.c), and I have verified that I am reading and writing to the device correctly.
Unfortunately, this did not work with respect to running the code on the device. I can see the read immediately after the bootloader jumps to the base address of my device, but the instruction read doesn't actually do anything. The bootloader uses a void function pointer, which is tied to the address of the MRAM device to induce a jump.
Another approach I tried is creating an alias to my device starting from address 0; I thought perhaps that my binary has all its addresses set relative to zero, so by mapping writes from addresses [0, MRAM_SIZE) as an alias to my device base address, the code will end up reading the corresponding instructions in the device MemoryRegion.
This approach failed an assert in memory.c:
static void memory_region_add_subregion_common(MemoryRegion *mr,
                                               hwaddr offset,
                                               MemoryRegion *subregion)
{
    assert(!subregion->container);
    subregion->container = mr;
    subregion->addr = offset;
    memory_region_update_container_subregions(subregion);
}
What do I need to do to coerce QEMU to execute the code in my MRAM device? Do I need to produce a binary with absolute addresses?
Older versions of QEMU were simply unable to handle execution from anything other than RAM or ROM, and attempting to do so would give a "qemu: fatal: Trying to execute code outside RAM or ROM" error. QEMU 3.1 and later fixed this limitation, and now can execute code from anywhere -- though execution from a device will be much much slower than executing from RAM.
You mention that you "modeled the MRAM as a device with a MemoryRegion, along with the appropriate read and write operations to expose a heap-allocated array". This sounds like it is probably the wrong approach -- it will work but be very slow. If the MRAM appears to the guest as being like RAM, then model it as RAM (ie with a RAM MemoryRegion). If it's like RAM for reading but writes need to do something other than just-write-to-the-memory (or need to do that some of the time), then model it using a "romd" region, the same way the existing pflash devices do. Nonetheless, modelling it as a device with pure read and write functions should work, it'll just be horribly slow.
The assertion you've run into is the one that says "you can't put a memory region into two things at once" -- the 'subregion' you've passed in is already being used somewhere else, but you've tried to put it into a second container. If you have a MemoryRegion that you need to have appear in two places in the physical memory map, then you need to: create the MemoryRegion; create an alias MemoryRegion that aliases the real one; map the actual MemoryRegion into one place; map the alias into the other. There are plenty of examples of this in existing board models in QEMU.
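A minimal sketch of that pattern is shown here; the function name, MRAM_BASE/alias addresses and size are placeholders, not values from the question:

/* Inside a board's init code (sketch). */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "qapi/error.h"

static void map_mram(MemoryRegion *sysmem, hwaddr base, hwaddr alias_base,
                     uint64_t size)
{
    MemoryRegion *mram  = g_new(MemoryRegion, 1);
    MemoryRegion *alias = g_new(MemoryRegion, 1);

    /* Model the MRAM as RAM so the CPU can execute from it at full speed. */
    memory_region_init_ram(mram, NULL, "mram", size, &error_fatal);
    memory_region_add_subregion(sysmem, base, mram);

    /* Same storage, visible again at a second physical address. */
    memory_region_init_alias(alias, NULL, "mram.alias", mram, 0, size);
    memory_region_add_subregion(sysmem, alias_base, alias);
}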
More generally, you need to figure out what the evaluation board hardware actually is, and then model that. If the eval board has the MRAM visible at multiple physical addresses, then yes, use an alias MR. If it doesn't, then the problem is somewhere else and you need to figure out what's actually happening, not try to bodge around it with aliases that don't exist on the real hardware. QEMU's debug logging (various -d suboptions, plus -D file to log to a file) can be useful for checking what the emulated CPU is really doing in this early bootup phase -- but watch out as the logs can be quite large and they are sometimes tricky to interpret unless you know a little about QEMU internals.

ARM Cortex M3 Bootloader with User code at normal 0 address?

I have been researching ARM M3 bootloaders and most seem to work with the bootloader code sitting in low memory and the user code in higher memory. This requires the user's application to be linked differently when used with the bootloader, i.e. at some address above the bootloader code, which is located at 0x00000000. I think this is an inconvenience.
I would like to know if it is possible to have a bootloader located in high memory that still allows the user's application to be linked normally, so it starts at 0x00000000 in low memory?
I have done this before with PIC bootloaders, which also have reset & stack code at 0x0000, similar to the ARM M3's NVIC table.
The way this has worked for me on the PIC is that the bootloader is located in the top 1 KB of flash. The reset vector and SP code at 0x0000 point to the start address of the bootloader code, so the bootloader starts after reset.
Now, when the bootloader is downloading the user code, it puts it into low memory where the linker expects it to go; the only exception is that it does not allow its own reset and SP vectors at 0x0000 to be overwritten. Instead it 'catches' these and stores them in flash within its high-memory area.
When the bootloader is ready for the user's code to start, it retrieves the initial reset and SP vectors it stored during the download and starts the user code using them.
At the next reset the bootloader starts again, and it can check whether it should wait for a new user program or just start the user's code using the vectors it has stored.
Could the above method be translated for use on the ARM M3 ?
I think the expression 'go with the flow' applies here. You are trying to do something that the ARM is not intended to do, so I think you are creating unnecessary problems for yourself. Putting the app above the bootloader just requires linking it at a different address and then setting the VTOR register to that address.
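For illustration, a minimal sketch of that hand-off on a Cortex-M3 is shown here; APP_BASE is an assumed address (use whatever your linker script defines), and it assumes interrupts are already disabled and peripherals are in a quiescent state:

#include <stdint.h>

#define APP_BASE   0x00004000UL                          /* assumed app start */
#define SCB_VTOR   (*(volatile uint32_t *)0xE000ED08UL)  /* vector table offset reg */

static void start_application(void)
{
    uint32_t app_sp    = *(volatile uint32_t *)(APP_BASE + 0);  /* initial MSP   */
    uint32_t app_entry = *(volatile uint32_t *)(APP_BASE + 4);  /* reset handler */

    SCB_VTOR = APP_BASE;                  /* point NVIC at the app's vector table */

    __asm volatile ("msr msp, %0" : : "r" (app_sp));  /* load the app's stack ptr */
    ((void (*)(void))app_entry)();                    /* jump to its reset handler */
}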

Flow of Startup code in an embedded system , concept of boot loader?

I am working with an embedded board, but I don't know the flow of its startup code (C/assembly).
Can we discuss the general modules/steps performed by the startup code in an embedded system?
Just a high-level (algorithmic) overview is enough. All examples are welcome.
/Kanu__
CPU gets a power on reset, and jumps to a defined point: the reset vector, beginning of flash, ROM, etc.
The startup code (crt - C runtime) is run. This is an important piece of code generated by your compiler/libc, which performs:
Configure and turn on any external memory (if absolutely required, otherwise left for later user code).
Establish a stack pointer
Clear the .bss segment (usually). .bss is the name for the uninitialized (or zeroed) global memory region. Global variables, arrays, etc which don't have an initializing value (beyond 0) are located here. The general practice on a microcontroller is to loop over this region and set all bytes to 0 at startup.
Copy the non-const .data from the end of .text. As most microcontrollers run from flash, they cannot store variable data there. For statements such as int thisGlobal = 5;, the value of thisGlobal must be copied from a persistent area (usually placed after the program in flash by your linker) to RAM. This applies to static values, and to static values in functions. Values which are left undefined are not copied, but are instead cleared as part of the .bss step above. (A sketch of this copy/clear code appears after this list.)
Perform other static initializers.
Call main()
From here, your code is run. Generally, the CPU is left in an interrupts-off state (platform dependent).
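The .data-copy and .bss-clear steps above look roughly like this in C on a Cortex-M part; the symbol names (_sidata, _sdata, _edata, _sbss, _ebss) are assumptions that depend on your linker script and toolchain:

/* Minimal sketch of a C-level reset handler.  On Cortex-M the hardware has
 * already loaded the initial SP from the vector table before this runs. */
#include <stdint.h>

extern uint32_t _sidata;          /* load address of .data (end of .text in flash) */
extern uint32_t _sdata, _edata;   /* .data run address range in RAM                */
extern uint32_t _sbss,  _ebss;    /* .bss range in RAM                             */

extern int main(void);

void Reset_Handler(void)
{
    uint32_t *src = &_sidata;
    uint32_t *dst;

    for (dst = &_sdata; dst < &_edata; )   /* copy initialized data to RAM */
        *dst++ = *src++;

    for (dst = &_sbss; dst < &_ebss; )     /* zero the .bss region         */
        *dst++ = 0;

    /* call static constructors / other init here if needed */

    main();

    for (;;) { }                           /* main() should not return     */
}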
Pretty open-ended question, but here are a few things I have picked up.
For super simple processors there is no true startup code. The CPU gets power and then starts running the first instruction in its memory: no muss, no fuss.
A little further up we have MCUs like AVRs and PICs. These have very little startup code. The only thing that really needs to be done is to set up the interrupt jump table with appropriate addresses. After that it is up to the application code (the only program) to do its thing. The good news is that you as the developer generally don't have to worry about these things: that's what libc is for.
After that we have things like simple ARM-based chips; more complicated than the AVRs and PICs, but still pretty simple. These also have to set up the interrupt table, as well as make sure the clock is set correctly, and start any needed on-chip components (basic interrupts etc.). Have a look at this PDF from Atmel; it details the startup procedure for an ARM7 chip.
Farther up the food chain we have full-on PCs (x86, amd64, etc.). The startup code for these is really the BIOS, which is horrendously complicated.
The big question is whether or not your embedded system will be running an operating system. In general, you'll either run your operating system, start up some form of inversion of control (an example I remember from a school project was a telnet server that would listen for requests using RL-ARM or an open-source TCP/IP stack and then execute callbacks when connections were made or data was received), or enter your own control loop (maybe displaying a menu, then looping until a key has been pressed).
Functions of Startup Code for C/C++
Disables all interrupts
Copies any initialized data from ROM to RAM
Sets the uninitialized data area to zero
Allocates space for and initializes the stack
Initializes the processor’s stack pointer
Creates and initializes the heap
Executes the constructors and initializers for all global variables (C++ only)
Enables interrupts
Calls main
Where is "BOOT LOADER" placed then? It should be placed before the start-up code right?
As per my understanding, from the reset vector the control goes to the boot loader. There the code waits for a small period of time during which it expects for data to be flashed/downloaded to the controller/processor. If it does not detect a data then the control gets transferred to the next step as specified by theatrus. But my doubt is whether the BOOT LOADER code can be re-written. Eg: Can a UART bootloader be changed to a ETHERNET/CAN bootloader or is it that data sent using any protocol are converted to UART using a gateway and then flashed.
