PreLoad Engine (PLE) ARM A9 MPCore - arm

I'm trying to pre-load load SDRAM memory to L2 cache. I have initialised the MMU and made 1 translation table.
I also enabled the cache and I see the software is using the cache as well...
To load some SDRAM to my L2 cache i tried to work with the Preload Engine (PLE).
I have read the documentation about the PLE but i don't get it to work. This is what i have tried:
__asm("MCRR p15,0, %0, %1, c11" : :"r" (0x20000004),"r"(0x20000000)); //program PLE new channel
0x20000004 is the register that contains the start_register value (0x10000000 in my case)
0x20000000 is the register that containts the settings: 0xFFFC007C in my case. This means :
Length =16k, stride = 0 and number of blocks is 32 (32 * 16k = 512k which is the amount of L2 cache available)
When I execute this command, the PLE does nothing with the registers PLEIDR, PLEASR, PLEFSR, PLEUAR and PLEPCR.
The PLEIDR says that the Fifo_size = 16 and the PLE is available.
The PLEASR is always zero which mean the channel is never active
The PLEFSR = 16 which means that there are 16 available entries in the FIFO
The other settings are user defiened...
What do i have to do to get the PLE channel running?

Related

Is it possible to read the Cycle Count Register (DWT_CYCCNT) when executing at unprivileged?

Is it possible to read the Cycle Count Register (DWT_CYCCNT) when executing at unprivileged?
#define DWT_CYCCNT (*(volatile uint32_t*)(0xE0001004)) /**< Cycle Count Register */
CycleCount = DWT_CYCCNT; /* Unprivileged read of the Cycle Count Register causes a Bus Fault. */
Related: Measuring clock cycle count on cortex m7
Short answer: no.
Debug registers are located in special reserved address space
The architecture reserves address space 0xE0000000 to 0xFFFFFFFF for system-level use. ARM reserves the
first 1MB of this system address space, 0xE0000000 to 0xE00FFFFF, as the Private Peripheral Bus (PPB)
Register in question specifically
Data Watchpoint and Trace (DWT) block 0xE0001000-0xE0001FFF
Section "C1.2.1 General rules applying to debug register access" explicitly state that DWT memory, as a rule of thumb, is privilege access only:
The Private Peripheral Bus (PPB), address range 0xE0000000 to
0xE0100000, supports the following general rules:
The region is defined as Strongly-ordered memory
Registers are always accessed little-endian
Debug registers can only be accessed as a word access
A reserved register or bit field has the value UNK/SBZP
Unprivileged access to the PPB causes BusFault errors unless otherwise stated.
Retrieved from "ARMv7-M Architecture Reference Manual" (ARM DDI 0403D)

MPU settings for ARM Cortex M7

I am working on a SoC with several Cortex M7 cores. It has SRAM mapped to region 0x2000 0000 -> 0x3FFF FFFF and DDR mapped to 0x6000 0000 -> 0xDFFF FFFF.
It seems that configuring the 4 partitions of DDR to cached normal memory (WT, WB or WBA) is triggering an HW bug which freezes the whole chip after a few seconds. Even the debugger get disconnected. Note that I do not need to access the DDR to raise the problem. Just running code in SRAM to configure the MPU and do various stuff will, after a random delay, trigger the bug.
Is there any limitation to the attributes I set when configuring the MPU?
I would say only the system space in 0xE000 0000 has fixed attributes and others are freely configurable depending on the HW implementation, but I have a doubt because if I refer to ARMv7-M arch ref manual, I can for instance find this:
B3.1 The system address map (p. 588)
[...] A declared cache type can be demoted but not promoted [...]
I am not sure here if this is just a limitation during runtime for two sequential configurations or an absolute limitation that forbids enabling the cache for partitions at 0xA... and 0xC... even once during init as they are not enabled in the default address map.
Also, the documentation indicates that the DDR is normal memory, but is there any problem if I keep it configured by default as device space (without taking into account slower non re-ordered accesses inherent to this configuration)?
Here is the exact configuration for the 4 DDR partitions:
RBAR -> 60000000 80000000 A0000000 C0000000 (start addresses)
RASR => 03080039 03080039 13010039 13100039 (default settings)
RASR => 03020039 03020039 03020039 03020039 (new settings triggering the bug)
Note that for the SRAM I keep the same setting:
RBAR -> 20000000
RASR => 030B0039

Atmel SAM3X dual bank switching not working

I'm currently working with an Atmel SAM3X8 ARM microcontroller that features a dual banked 2 x 256KB flash memory. I'm trying to implement a firmware update feature, that puts the new firmware into the currently unused flash bank, and when done swaps the banks using the flash remapping to run the new firmware.
The datasheet states to do so I need to set the GPNVM2 bit, then the MCU will remap the memory, so Flash 1 is now at 0x80000 and Flash 0 at 0xC0000. This will also lead to the MCU executing code beginning from Flash 1.
To cite the datasheet:
The GPNVM2 is used only to swap the Flash 0 and Flash 1. If GPNVM2 is ENABLE, the Flash 1 is mapped at
address 0x0008_0000 (Flash 1 and Flash 0 are continuous). If GPNVM2 is DISABLE, the Flash 0 is mapped at
address 0x0008_0000 (Flash 0 and Flash 1 are continuous).
[...]
GPNVM2 enables to select if Flash 0 or Flash 1 is used for the boot.
Setting GPNVM bit 2 selects the boot from Flash 1, clearing it selects the boot from Flash 0.
But when I set GPNVM2, either via SAM-BA or my own firmware using flash_set_gpnvm(2) (ASF SAM Flash Service API), it will still boot from the program in Flash 0, and the new program will still reside at Flash 1's offset 0xC0000. The state of GPNVM2 has been verified by flash_is_gpnvm_set(2)
Flashing the firmware itself to Flash1 bank works flawlessly, that has been verified by dumping the whole flash memory with SAM-BA.
There is an errata from Atmel about an issue, that the flash remapping only works for portions smaller than 64KB. My code is less than that (40KB), so this shouldn't be an issue.
I've not found any other people having this issue, nor any example how to use it, so maybe somebody could tell me if I'm doing something wrong here, or what else to check.
I had the same issue (see here: Atmel SAM3X8E dual bank switching for booting different behaviour).
After some more research I found an Application Note (Link: http://ww1.microchip.com/downloads/en/AppNotes/Atmel-42141-SAM-AT02333-Safe-and-Secure-Bootloader-Implementation-for-SAM3-4_Application-Note.pdf) which explains the boot behaviour of the SAM3X in a more clear way. The problem is that the datasheet is a bit misleading (at least I was confused too). The SAM3X has no ability to remap the the Flash banks. The booting behaviour is a bit different (see the picture in the link, it's a snipped from the Application note, page 33/34):
Booting behaviour SAM3X
Picture 3-9 shows the SAM3X's behaviour at the boot-up. The GPNVM bits 1 and 2 just determine which memory section (ROM/Flash0/Flash1) is mirrored to the boot memory (located at 0x00000000). The mapping of the Flash banks is not changed. Therefore Flash0 still is mapped to 0x00080000 and Flash1 to 0x000C0000).
As the Application Note states some other Atmel microcontrollers are able to really remap the Flash banks (e.g. SAM3SD8 and SAM4SD32/16). These processors change the location of the Flash banks as you can see in picture 3-10.
To be able to update your firmware it is therefore necessary to implement some kind of bootloader. I implemented one by myself and was able to update my firmware even without using the GPNVM bits at all. I also opend a support ticket at Microchip to clarify the booting behaviour. When I receive an answer I hope to tell you more.
EDIT:
Here's the answer from the Microchip support:
Setting the GPNVM2 bit in SAM3X will merely make the CPU 'jump to' or start from flash bank 1 i.e. 0xC0000.
No actual swap of memory addresses will take place.
To use flash bank 1, you will need to change the linker file (flash.ld) to reflect the flash start address 0xC0000.
For flash bank 0 application, change:
rom (rx) : ORIGIN = 0x00080000, LENGTH = 0x00080000 /* Flash, 512K /
to:
rom (rx) : ORIGIN = 0x00080000, LENGTH = 0x00040000 / Flash, 256K */
For flash bank 1 application, change:
rom (rx) : ORIGIN = 0x00080000, LENGTH = 0x00080000 /* Flash, 512K /
to:
rom (rx) : ORIGIN = 0x000C0000, LENGTH = 0x00040000 / Flash, 256K */
If this is not done, the reset handler in the flash 1 application will point to an address in the flash 0 application.
So, although code will start execution in flash 1 (if GPNVM2 is set), it will jump back to the flash 0 application.
The errata stating the 64kb limitation can be ignored.
Therefore the Application Note is right and no actual change of the mmory mapping is performed.
Cheers
Lukas

ARM Bootloader: Disable MMU and Caches

According to some tutorials, we will disable MMU and I/D-Caches at the beginning of bootlaoder. If I understand correctly, it aims to use the physical address directly in the program, so please correct me if I'm wrong. Thank you!
Secondly, we do this to disable MMU and Caches:
mrc P15, 0, R0, C1, C0, 0
bic R0, R0, #0x00002300 # clear bits 13, 9:8
bic R0, R0, #0x00000087 # clear bits 7, 2:0
orr R0, R0, #0x00000002 # set bit 2 (A) Align
orr R0, R0, #0x00001000 # set bit 12 (I) I-Cache
mcr P15, 0, R0, C1, C0, 0
D-Cache, MMU and Data Address Alignment Fault Checking have been disabled by clear bits 2:0, but why we enable bit 2 immediately in the following instrument? To make sure this manipulation is valid?
Last question is why D-cache is disabled but I-caches is able? To speed up instrument process?
Last question is why D-cache is disabled but I-caches is able? To speed up instrument process?
The MMU has settings to determine which memory regions are cacheable or not. If you do not have the mmu on but you have the data cache on (if possible) then you cannot safely talk to peripherals. if you read the uart status register for example that goes through the cache just like any other data operation, whatever that status is stays in the cache for subsequent reads until such time as that cache line is evicted and you get one more shot at the actual register. Lets say for example you have some code that polls the uart status register waiting for a character in the rx buffer. If that first read shows there is no character, that status goes in the cache, you will remain in the loop forever since you will never get to talk to the status register again you will simply get the cached copy of the register. if there was a character in there then that status also gets cached, you read the rx register, and perhaps do something, if when you come back again if the status has not been evicted from the data cache then you get the stale status which shows there is a character, you rx buffer read may or may not also be cached so you may get the stale value in the cache, you may get a stale value or whatever the peripheral does when you read and there is no new value or you might get a new value, but what you dont get in these situations is proper access to the peripheral. When the mmu is on, you use the mmu to mark the address space used by that peripheral as non-(data)-cacheable, and you dont have this problem. With the mmu off you need the data cache off for arm systems.
Leaving the I-cache on is okay because instruction fetches only read instructions...Well for a bare metal application that is okay, it helps for example if you are using a flash that has a potential for read disturb (spi or i2c flashes). The problem is this application is a bootloader, so you must take some extra care. For example your bootloader has some code at address 0x8000 that it runs through at least once, then you choose to use it as a bootloader, the bootloader might be at say address 0x10000000 allowing you to load a new program at 0x8000, this load uses data accesses so it does not go through the instruction cache. So there is a potential that the instruction cache has some or all of the code from the last time you were in the 0x8000 area, and when you branch to the bootloaded code at 0x8000 you will get either the old program from cache or a nasty mixture of old program and new program for the parts that are cached and not cached. So if your bootloader allows for the i-cache to be on, you need to invalidate the cache before branching to bootloaded code.
Lastly, if you or anyone using this bootloader wants to use jtag, then you have that same problem but worse, data cycles that do not go through the i-cache are used to write the new program to ram, when you tell the jtag debugger to then run the new program you will get 1) only the new program, 2) a mixture of the new program and old program fragments from cache 3) the old program from cache.
So d-cache is bad without an mmu because of things that are not in ram, peripherals, etc. The i-cache is a use at your own risk kind of thing which you can mitigate except for the times that jtag is used for debugging.
If you have concerns or have confirmed read-disturb in your (external) flash, then I recommend turn on the i-cache, use a tight loop to copy your application to ram, branch to the ram copy and run there, turn off the i-cache (or use at your own risk) and dont touch the flash again, certainly not heavy read accesses to small areas. A tight uart polling loop like you might have for a command line parser, is a really good place to get hit with read-disturb.
You did not specified on which ARM you are working. Capabilities may vary from one ARM to an other (there is a huge gap between an ARM9 and an ARM Cortex A15).
In the given code, bit 2 is cleared and then set, but it does not matter, as those changes are done in R0. There is no change in the ARM behavior until the write in CP15 register (done by the instruction mcr P15, 0, R0, C1, C0, 0).
Concerning d-cache/i-cache enabling, it is only a matter of choice, there is no requirement. On the products I work on, the bootloader enables L1 I-cache, D-cache, L2 cache, and MMU (and it disables all that stuff before jumping on Linux). Be sure to follow ARM documentations about cache invalidation and memory barriers (according to your actual ARM Core) if you use cache and MMU in your bootloader.

booting from a disk/cd/usb

How can I boot my small console from a disk/cd/usb, with the following configuration:
The media that I want to use will be completely raw i.e no filesystem on it.
When I insert the media in my system or assume that its already inserted, I want to make it boot my own small OS.
The procedure that I want to use is that when my system starts, it boots from the disk/cd/usb, and starts my OS. For now imagine that the OS is going to be a hello world program. I actually want to see how the real world OS implement themselves.
A bootloader must be 512 bytes. No less no more.
And it must end with the standard PC boot signature: 0xAA55.
Also note a PC boots in 16 bits mode.
You need to load your kernel or secondstage bootloader from that code into memory, and then jump to that code (and maybe switch the CPU to 32 bits protected mode).
For instance (nasm):
BITS 16
; Your assembly code here (510 bytes max)...
jmp $
; Fills the remaining space with 0
times 510 - ( $ - $$ ) db 0
; Boot signature
dw 0xAA55
Thas the job of a boot loader. The bootloader should be present in the first 512 bytes of a HardDisk. This location is called MBR(Master boot record)
When bios loads it checks if the media contains the MBR. it verifies the MBR signature 0xAA55 which should be present as the last 2 bytes of MBR.
Then the Bios loads the BootLoader into RAM at address 0x7C00
Then the boot loader is the one who actually loads the kernal into memory, by reading the filesystem.
usually you cannot fit in all the code in 512 bytes, so there will be a secondory boot loader.
secondary bootloader will be loaded by your primary bootloader.
secondary boot loader loads IDT and GDT (Interupt vector table and Global descriptor table). Enables A20 gate to move into protected mode.
secondary boot loader loads the 32 bit kernel from disk into memory, then jumps to the kernel code
For more information you can download Linux kernel v0.01 (First version). Look how it is done. to my surprise the code to read the File system + the code to move into protected mode is fit into 512 bytes of code.

Resources