How to try cache invalidation and cache clean in ARM? - arm

I want to try cache clean and cache invalidate in Raspberry Pi
Can someone guide on this. I just want to do some DMA transfer and try some stuff.
Also if some one can give me a code snippet on how to check if cache has been cleaned in ARM It will be really helpfull
For Eg. I want to see state of cache memory and give instruction
Cache invalidate and see the state of memory

For the Raspberry Pi's ARM1176, cache maintenance is performed via the c7 group of System Control Coprocessor (CP15) operations. Data cache clean+invalidate is mcr p15, 0, r0, c7, c14, 0, and the Cache Dirty Status Register (which simply tells you if the cache is clean or has been written to) can be read with mrc p15, 0, <Rd>, c7, c10, 6.
There's too much information to regurgitate here, so I'd recommend referring to the TRM for the details - if the Raspberry Pi TRM doesn't cover it, you can find the full ARM1176 TRM here (which contains a few example code snippets). As always, the Linux source also provides a handy real-world usage reference.
As for inspecting the actual cache contents, AFAIK you're going to need a JTAG hardware debugger for that, if at all.

Related

Disabling interrupts of Cortex-M3 via the debug port

I'm trying to temporarily mask all interrupts of a Cortex-M3, with only having access to the debug port. I can read and write the memory freely, which so far has been sufficient for accessing processor registers.
My best bet so far was writing 1 to one of the core registers (either PRIMASK or FAULTMASK), and disabling all interrupts that way. The problem is, I can't seem to access them. According to the ARM infocenter, I can use the DCRSR to select core registers, and then read/write them via the DCRDR. However, the infocenter page seems to imply, that there is no way to access the interrupt mask registers (PRIMASK, FAULTMASK and BASEPRI).
I did find a different source, namely the Definitive Guide to the ARM Cortex-M3, which shows on page 244, that the DCRSR can indeed be used to access the special registers, by writing 0b10100 to it. I tried that too, and the value I read from the DCRDR afterwards seemed nothing like what you would expect from the special registers (reserved values were set to 1s or 0s randomly, the first bit of FAULTMASK was 1, yet interrupts seemed to work fine). I still tried setting the first bit of the supposed PRIMASK to 1, in hopes of disabling the interrupts, but to no avail. It seems that the infocenter was right, and the DCRSR can NOT be used to access all special registers.
My question is, what other way is there, to access the interrupt mask registers from the debug port? Is there a direct memory address at which they are stored, or some different register that provides access to them?
Any help is greatly appreciated.

How do you startup the additional cores on an Allwinner H5?

I am trying to figure out how to start cores other than core0 for a quad core allwinner h5. the C_RST_CTRL register (a.k.a CPU2 Reset Control Register) has four bits at the bottom that imply they are four reset controls. The lsbit is one the other three zeros implying setting those releases reset on the other cores, but I dont see that happening (nothing is running code I have left at address zero), at the same time zeroing that lsbit does stop core0 implying that it is a reset control. So I assume there are clock gates somewhere but I cannot find them.
The prcm registers which are not documented in the H5 docs but are on a sunxi wiki page for older allwinners do show what seem to be real PLL settings but the cpu enable registers are marked as A31 only and the cpu0 register(s) are not setup so that would imply that is not how you enable any cpu including 0 for this chip.
What am I missing?
For a pure bare metal solution look at sunxi_cpu_ops.c from the plat/sun50iw1p1 directory of https://github.com/apritzel/arm-trusted-firmware.git
You need to deactivate various power clamps as well as clock gates.
Alternatively, include the Arm Trusted Firmware code and enable a core by an SMC call:
ldr x2,=entry_point
mov x1,#corenumber
mov x0,#0x03
movk x0,#0x8400,lsl #16
smc #0
I've now confirmed this works on an H5.
Does C_CPU_STATUS STANDBY_WFI=0x0E suggest that the secondary cores are sitting in WFI?
Not an answer, I don't have enough rep to comment but I'm just starting the same exercise myself.
As an aside, how did you put code at address 0? Isn't that BROM? I was going to play with the RVBARADDR registers.

why does uboot invalidate TLB s, icache, BP array at beginning

On arm platform, the u-boot will invalidate TLBs, icache and BP array at beginning, but what's the reason? Is it necessary?
cpu_init_crit:
/*
* Invalidate L1 I/D
*/
mov r0, #0 # set up for MCR
mcr p15, 0, r0, c8, c7, 0 # invalidate TLBs
mcr p15, 0, r0, c7, c5, 0 # invalidate icache
mcr p15, 0, r0, c7, c5, 6 # invalidate BP array
mcr p15, 0, r0, c7, c10, 4 # DSB
mcr p15, 0, r0, c7, c5, 4 # ISB
A reset may be hard or soft. It is possible for something to jump to a reset vector. If cache is enabled with stale entries, the code may crash resulting in a system not booting. This can be more common than you think as SDRAM may miss behave after a cold power on and mis-read causing a crash. Often a watchdog or unrecoverable faults will jump to the reset vector. Finally, there maybe u-boot systems in XIP NOR flash. Some systems/code may just jump to this code to perform a soft reset.
In all of these cases, the debug available is NIL. You may have a working system in 99.9999% of the cases. Having to trouble shoot a boot failure on particular hardware which only occurs in a hot bar or an ice cream freezer might make you enjoy this extra code. Ie, there are rare circumstances where it is needed.
Hardware failures are much more common as supply rails come on line. Datasheets describe properly behaving systems with power/temperature, etc in range. u-boot may not operate in this ideal world. If your concern is boot time, you are much better to write your own loader (keep this code though) or optimize things like BSS clearing, etc.
On A15/A7/A12 you do not need to do cache/mmu/btb invalidation on post-reset, so this is done just for "paranoia" reasons. However, there might be cores which do not perform auto-invalidate on reset so this code just makes sure that the same behavior is preserved across different core types.
This is necessary in the cases where the invalidation is not done in hardware on a reset (cold or warm). Another more practical reason is that U-Boot is typically not the first piece of code that runs on the hardware. In most cases there would be the ROM code that runs before U-Boot and to avoid making any assumptions regarding the state of the hardware when U-Boot got control of the system it's safer to always try and get the hardware to a known state and proceed from there.

ARM Bootloader: Disable MMU and Caches

According to some tutorials, we will disable MMU and I/D-Caches at the beginning of bootlaoder. If I understand correctly, it aims to use the physical address directly in the program, so please correct me if I'm wrong. Thank you!
Secondly, we do this to disable MMU and Caches:
mrc P15, 0, R0, C1, C0, 0
bic R0, R0, #0x00002300 # clear bits 13, 9:8
bic R0, R0, #0x00000087 # clear bits 7, 2:0
orr R0, R0, #0x00000002 # set bit 2 (A) Align
orr R0, R0, #0x00001000 # set bit 12 (I) I-Cache
mcr P15, 0, R0, C1, C0, 0
D-Cache, MMU and Data Address Alignment Fault Checking have been disabled by clear bits 2:0, but why we enable bit 2 immediately in the following instrument? To make sure this manipulation is valid?
Last question is why D-cache is disabled but I-caches is able? To speed up instrument process?
Last question is why D-cache is disabled but I-caches is able? To speed up instrument process?
The MMU has settings to determine which memory regions are cacheable or not. If you do not have the mmu on but you have the data cache on (if possible) then you cannot safely talk to peripherals. if you read the uart status register for example that goes through the cache just like any other data operation, whatever that status is stays in the cache for subsequent reads until such time as that cache line is evicted and you get one more shot at the actual register. Lets say for example you have some code that polls the uart status register waiting for a character in the rx buffer. If that first read shows there is no character, that status goes in the cache, you will remain in the loop forever since you will never get to talk to the status register again you will simply get the cached copy of the register. if there was a character in there then that status also gets cached, you read the rx register, and perhaps do something, if when you come back again if the status has not been evicted from the data cache then you get the stale status which shows there is a character, you rx buffer read may or may not also be cached so you may get the stale value in the cache, you may get a stale value or whatever the peripheral does when you read and there is no new value or you might get a new value, but what you dont get in these situations is proper access to the peripheral. When the mmu is on, you use the mmu to mark the address space used by that peripheral as non-(data)-cacheable, and you dont have this problem. With the mmu off you need the data cache off for arm systems.
Leaving the I-cache on is okay because instruction fetches only read instructions...Well for a bare metal application that is okay, it helps for example if you are using a flash that has a potential for read disturb (spi or i2c flashes). The problem is this application is a bootloader, so you must take some extra care. For example your bootloader has some code at address 0x8000 that it runs through at least once, then you choose to use it as a bootloader, the bootloader might be at say address 0x10000000 allowing you to load a new program at 0x8000, this load uses data accesses so it does not go through the instruction cache. So there is a potential that the instruction cache has some or all of the code from the last time you were in the 0x8000 area, and when you branch to the bootloaded code at 0x8000 you will get either the old program from cache or a nasty mixture of old program and new program for the parts that are cached and not cached. So if your bootloader allows for the i-cache to be on, you need to invalidate the cache before branching to bootloaded code.
Lastly, if you or anyone using this bootloader wants to use jtag, then you have that same problem but worse, data cycles that do not go through the i-cache are used to write the new program to ram, when you tell the jtag debugger to then run the new program you will get 1) only the new program, 2) a mixture of the new program and old program fragments from cache 3) the old program from cache.
So d-cache is bad without an mmu because of things that are not in ram, peripherals, etc. The i-cache is a use at your own risk kind of thing which you can mitigate except for the times that jtag is used for debugging.
If you have concerns or have confirmed read-disturb in your (external) flash, then I recommend turn on the i-cache, use a tight loop to copy your application to ram, branch to the ram copy and run there, turn off the i-cache (or use at your own risk) and dont touch the flash again, certainly not heavy read accesses to small areas. A tight uart polling loop like you might have for a command line parser, is a really good place to get hit with read-disturb.
You did not specified on which ARM you are working. Capabilities may vary from one ARM to an other (there is a huge gap between an ARM9 and an ARM Cortex A15).
In the given code, bit 2 is cleared and then set, but it does not matter, as those changes are done in R0. There is no change in the ARM behavior until the write in CP15 register (done by the instruction mcr P15, 0, R0, C1, C0, 0).
Concerning d-cache/i-cache enabling, it is only a matter of choice, there is no requirement. On the products I work on, the bootloader enables L1 I-cache, D-cache, L2 cache, and MMU (and it disables all that stuff before jumping on Linux). Be sure to follow ARM documentations about cache invalidation and memory barriers (according to your actual ARM Core) if you use cache and MMU in your bootloader.

Does anyone know how to enable ARM FIQ?

Does anyone know how to enable ARM FIQ?
Other than enabling or disabling the IRQ/FIQ while you're in supervisor mode, there's no special setup you should have to do on the ARM to use it, unless the system (that the ARM chip is running in) has disabled it in hardware (based on your comment, this is not the case since you're seeing the FIQ input pin driven correctly).
For those unaware of the acronyms, FIQ is simply the last interrupt vector in the list which means it's not limited to a branch instruction as the other interrupts are. That means it can execute faster than the other IRQ handlers.
Normal IRQs are limited to a branch instruction since they have to ensure their code fits into a single word. FIQ, because it won't overwrite any other IRQ vectors, can just run the code directly without a branch instruction (hence the "fast").
The FIQ input line is just a way for external elements to kick the ARM chip into FIQ mode and start executing the correct exception. There's nothing on the ARM itself which prevents that from happening except the CPSR.
To enable FIQ in supervisor mode:
MRS r1, cpsr ; get the cpsr.
BIC r1, r1, #0x40 ; enable FIQ (ORR to disable).
MSR cpsr_c, r1 ; copy it back, control field bit update.
A similar thing can be done for normal IRQs, but using #0x80 instead of #0x40.
The FIQ can be closed off to you by the chip manufacturer using trustzone extensions.
Trustzone creates a Secure world and a normal world. The secure world has its own supervisor, user and memory space. The idea is for secure operations to be routed so they never leave the chip and cannot be traced even if you scan the pins on the bus. I think in OMAP it is used for some cryptography operations.
On Reset the core starts in secure mode. It sets up the secure monitor (gateway between secure and non-secure world) and at this time FIQ can be setup to be routed to the monitor. I think it is the SCR.FIQ bit that may be set and then all FIQs ignore the value of CPSR.F and go to monitor mode. Check out the ARM ARM but if I remember correctly if this is happening there is no way for you to know from nonsecure OS code. Then the monitor will reset the Normal world registers and doing an exception return with PC set to the reset exception vector.
The core will take an interrupt to monitor mode, do its thing and return.
Sorry I can't answer you in the comments, I don't have enough reputation, you could always fix that ;), but I hope you see this

Resources