Cannot write TMC/ETF registers on STM32H753 - arm

This question is a follow-up to this one.
I am trying to configure the ETF in circular mode so that the embedded software can read the execution trace after a critical error, on an STM32H753.
I am following the algorithm described in ARM's Trace Memory Controller reference manual (section 2.2.2).
But I cannot write the ETF registers: I unlocked the ETF macrocell by writing the magic number 0xC5ACCE55 to the ETF_LAR register, but when I read the registers they are all 0 (through the debugger or printf), and when I write them they remain at 0.
Any advice on how to write the ETF registers?

There is actually a mistake in the STM32H7 documentation:
and a few pages later:
My understanding is that the second image is wrong.
Also, I had to use the "Component base address (system bus)" and not the "Component base address (debugger)". This notion of two addresses is pretty confusing, since for the ETM, for example, there is only one address.
Anyway, using 0x5C014000 as the base address, I can read/write the registers.
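For reference, a minimal sketch of that unlock sequence from the application side, assuming the standard CoreSight lock register offsets (LAR at 0xFB0, LSR at 0xFB4); check these against the TMC TRM for your part:

#include <stdint.h>

/* ETF (TMC) accessed through its system-bus base address (see above). */
#define ETF_BASE  0x5C014000UL
#define ETF_LAR   (*(volatile uint32_t *)(ETF_BASE + 0xFB0))  /* Lock Access  */
#define ETF_LSR   (*(volatile uint32_t *)(ETF_BASE + 0xFB4))  /* Lock Status  */

void etf_unlock(void)
{
    ETF_LAR = 0xC5ACCE55;          /* CoreSight unlock key                  */
    (void)ETF_LSR;                 /* optionally read back the lock status  */
}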

When accesses are made using the DAP (SWD or JTAG), the debug subsystem should be powered up as part of the handshake with the tools. In self-hosted debug mode, device-specific control of clocks/power is usually necessary.
For STM32H7 parts:
The ETM is in the same clock/power domain as the core, so it will normally be visible.
CK_DBG_D1 clocks the trace components in the D1 power domain: System ROM table 2, CoreSight trace funnel, ETF, system CTI and TPIU. It is a gated version of the D1 domain system clock (CK_HCLK_D1).
Self-hosted debug clock and power control uses a dedicated set of registers.
All the debug clocks (except DAPCLK) can be enabled and disabled by register bits in the DBGMCU.
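A minimal sketch of what that looks like from the application, assuming the DBGMCU_CR layout described in the STM32H7 reference manual; the register address and bit positions below are assumptions to verify against your device header:

#include <stdint.h>

#define DBGMCU_CR             (*(volatile uint32_t *)0x5C001004UL)  /* assumed */
#define DBGMCU_CR_TRACECLKEN  (1UL << 20)   /* assumed: trace port clock      */
#define DBGMCU_CR_D1DBGCKEN   (1UL << 21)   /* assumed: D1 debug clock        */
#define DBGMCU_CR_D3DBGCKEN   (1UL << 22)   /* assumed: D3 debug clock        */

void enable_selfhosted_debug_clocks(void)
{
    DBGMCU_CR |= DBGMCU_CR_D1DBGCKEN
               | DBGMCU_CR_D3DBGCKEN
               | DBGMCU_CR_TRACECLKEN;
}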

Related

Cortex-M33 MTB configuration - When MTB buffer is full

I am exploring the MTB feature on Cortex-M hardware and doing experiments on an Arm MPS2+ FPGA board with the Cortex-M33 image, which has TrustZone enabled. From what I have learned, the MTB can be configured to record non-sequential branches of the execution. The records are written directly to the allocated SRAM by the MTB unit automatically. During the experiments, I have several questions.
When the MTB buffer is full, there are two ways to handle it. One way is to wrap around and rewrite the buffer from the very beginning. Another way is to set a watermark, which will make the PE enter Debug state. I checked the documentation: Debug state means an external debugger or another core is needed to handle it. Since the board I use is single-core and I prefer not to use an external debugger, is there another way to notify the CPU without halting it when the MTB buffer is full?
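For reference, a rough sketch of the two configurations just described, assuming the usual MTB register layout (POSITION/MASTER/FLOW at offsets 0x000/0x004/0x008); the base address, MASK value and bit positions are placeholders to check against the MTB-M33 TRM and the board's memory map:

#include <stdint.h>

#define MTB_BASE_ADDR  0xF0002000UL                              /* placeholder */
#define MTB_POSITION   (*(volatile uint32_t *)(MTB_BASE_ADDR + 0x000))
#define MTB_MASTER     (*(volatile uint32_t *)(MTB_BASE_ADDR + 0x004))
#define MTB_FLOW       (*(volatile uint32_t *)(MTB_BASE_ADDR + 0x008))

void mtb_start_wrapping(uint32_t mask)
{
    MTB_FLOW     = 0;                     /* no watermark: wrap forever        */
    MTB_POSITION = 0;                     /* start writing at buffer offset 0  */
    MTB_MASTER   = (1UL << 31) | mask;    /* EN = 1, MASK selects buffer size  */
}

void mtb_start_with_watermark(uint32_t mask, uint32_t watermark)
{
    /* AUTOHALT (bit 1): request Debug state once POSITION reaches WATERMARK */
    MTB_FLOW     = (watermark & ~0x7UL) | (1UL << 1);
    MTB_POSITION = 0;
    MTB_MASTER   = (1UL << 31) | mask;
}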
Based on the first question, I had several trials:
I configured the DWT comparator to monitor writes to the data address, but it seems the DWT cannot monitor hardware writes like the ones the MTB performs.
I configured the Non-secure MPU to set the MTB buffer region as read-only, but it still cannot trigger a fault. I checked the document ARM CoreSight MTB-M33, page A2-26: "The MTB-M33 ensures that trace write accesses have priority over AHB accesses." Is that the reason why the MPU cannot constrain MTB writes?
I also configured the SAU to set the MTB buffer as secure, but it still cannot trigger a secure fault when a non-secure trace write happens.
Then I went back to read the MTB document. The MTB actually has the MTB_SECURE register that can partition the MTB buffer as secure or non-secure. But, whether or not that is because the board or the image I use does not fully implement the MTB, I cannot modify the MTB_SECURE register except for the last two bits.
So at last, is there another way to notify the CPU without halting it when the MTB buffer is full? Like what Intel Processor Trace does, which will generate an interrupt.
I hope I have described the question clearly, and I would very much appreciate any suggestions. Thank you so much!

STM32F4 FSMC/FMC SRAM as Heap/Stack results in random hardfaults

We are currently evaluating the use of an external SRAM for C/C++ heap storage on our platform, which uses an STM32F439BI microcontroller.
The problem
Using the SRAM as storage for the heap results in random hardfaults raised from bus errors/imprecise bus errors.
Without placing the heap on the SRAM, memory tests run successfully on the whole SRAM (8-bit, 16-bit and 32-bit accesses).
Connecting a debugger, I can sometimes observe these errors before a hardfault occurs. Most often a word is read from the SRAM and the CPU register is filled with an address of the following format: 0x-1F3-1F3 (where '-' is most often '0', sometimes 'A' or '6'). The pattern '1F3' persists. If the same address is read again a few lines further down, the correct value is read (some other address in the 0x60000000 space).
If I stop the program on a breakpoint at some point early in the program and step a few lines, I get these errors more frequently.
Further details
The SRAM is connected using the FMC/FSMC peripheral on FMC bank 1 and SRAM bank 1 and is therefore memory-mapped to address 0x60000000.
All settings for GPIO pins and FMC configuration are set from the startup file before main() executes or static objects are created.
The SRAM is the following: CY7C1041GN30
We connect all 16 data pins, all 18 address pins, BHE, BLE, OE, WE and CE to our controller. All pins are configured as push-pull alternate function, pull-up, AF_12 (FMC), very high speed. We enable clocks for all necessary pins and the clock for the FMC. Note: initially we started out without pull-up/down, showing the same symptoms.
The controller runs with a clock speed of 168 MHz
As stated above, a memory test runs successfully
We use DMA for SPI, I2C and ADC data transfers
We frequently use interrupts, including external (pin) interrupts
We use the following timing settings:
AddressSetupTime: 2
AddressHoldTime: 4
DataSetupTime: 4
BusTurnAroundDuration: 1
CLKDivision: 2
DataLatency: 2
We configure the FMC as follows (the same settings appear as a HAL init sketch after this list):
NSBank FMC_NORSRAM_BANK1,
DataAddressMux FMC_DATA_ADDRESS_MUX_DISABLE,
MemoryType FMC_MEMORY_TYPE_SRAM,
MemoryDataWidth FMC_NORSRAM_MEM_BUS_WIDTH_16,
BurstAccessMode FMC_BURST_ACCESS_MODE_DISABLE,
WaitSignalPolarity FMC_WAIT_SIGNAL_POLARITY_LOW,
WrapMode FMC_WRAP_MODE_DISABLE,
WaitSignalActive FMC_WAIT_TIMING_BEFORE_WS,
WriteOperation FMC_WRITE_OPERATION_ENABLE,
WaitSignal FMC_WAIT_SIGNAL_DISABLE,
ExtendedMode FMC_EXTENDED_MODE_DISABLE,
AsynchronousWait FMC_ASYNCHRONOUS_WAIT_DISABLE,
WriteBurst FMC_WRITE_BURST_DISABLE,
ContinuousClock FMC_CONTINUOUS_CLOCK_SYNC_ASYNC,
WriteFifo 0,
PageSize 0
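The timing and configuration values above, expressed as an STM32F4 HAL init sketch. Field names follow the HAL SRAM driver; availability of some fields (e.g. WrapMode, WriteFifo, PageSize) depends on the HAL version, and AccessMode is an assumption since it is not listed above:

#include "stm32f4xx_hal.h"

SRAM_HandleTypeDef hsram1;

void fmc_sram_init(void)
{
    FMC_NORSRAM_TimingTypeDef timing = {0};

    hsram1.Instance = FMC_NORSRAM_DEVICE;
    hsram1.Extended = FMC_NORSRAM_EXTENDED_DEVICE;

    hsram1.Init.NSBank             = FMC_NORSRAM_BANK1;
    hsram1.Init.DataAddressMux     = FMC_DATA_ADDRESS_MUX_DISABLE;
    hsram1.Init.MemoryType         = FMC_MEMORY_TYPE_SRAM;
    hsram1.Init.MemoryDataWidth    = FMC_NORSRAM_MEM_BUS_WIDTH_16;
    hsram1.Init.BurstAccessMode    = FMC_BURST_ACCESS_MODE_DISABLE;
    hsram1.Init.WaitSignalPolarity = FMC_WAIT_SIGNAL_POLARITY_LOW;
    hsram1.Init.WrapMode           = FMC_WRAP_MODE_DISABLE;
    hsram1.Init.WaitSignalActive   = FMC_WAIT_TIMING_BEFORE_WS;
    hsram1.Init.WriteOperation     = FMC_WRITE_OPERATION_ENABLE;
    hsram1.Init.WaitSignal         = FMC_WAIT_SIGNAL_DISABLE;
    hsram1.Init.ExtendedMode       = FMC_EXTENDED_MODE_DISABLE;
    hsram1.Init.AsynchronousWait   = FMC_ASYNCHRONOUS_WAIT_DISABLE;
    hsram1.Init.WriteBurst         = FMC_WRITE_BURST_DISABLE;
    hsram1.Init.ContinuousClock    = FMC_CONTINUOUS_CLOCK_SYNC_ASYNC;
    hsram1.Init.WriteFifo          = 0;   /* 0, as listed above */
    hsram1.Init.PageSize           = 0;   /* 0, as listed above */

    timing.AddressSetupTime      = 2;
    timing.AddressHoldTime       = 4;
    timing.DataSetupTime         = 4;
    timing.BusTurnAroundDuration = 1;
    timing.CLKDivision           = 2;
    timing.DataLatency           = 2;
    timing.AccessMode            = FMC_ACCESS_MODE_A;   /* assumed */

    /* No extended-mode timing, since ExtendedMode is disabled. */
    HAL_SRAM_Init(&hsram1, &timing, NULL);
}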
We spent a lot of time experimenting with longer timings and compared all the settings to examples, including this one: Using STM32L476/486 FSMC peripheral to drive external memories (although this one is for the STM32L4, I am fairly certain it applies to this controller as well).
Findings on similar problems
The problem sounds very similar to this errata sheet entry, "2.3.4 Corruption of data read from the FMC", but it also says the error is fixed in our revision of the controller (rev 3).
I hope someone out there has seen this strange behaviour before and can help us. After over one week of debugging, we suspect some kind of error in the controller when interrupts/DMA accesses occur while the CPU accesses the SRAM (when we use it as heap, it is accessed very frequently). Hopefully you can shed some light on this topic.
Sorry for not getting back to you, internet.
Yes, we found out what the issue was (at least in our case). The problem was that the J-Link debugger we use causes problems if it hangs above the power electronics on our PCB (it is mounted vertically). If we route the ribbon cable out at the top (only digital electronics), the error disappears. So our guess is that some noise from the power electronics was picked up by the cable and injected directly into the JTAG port, which caused failures inside the MCU.
Just got confirmation from ST that there is a bug in the STM32F469 FMC that might cause incorrect values if the write FIFO is disabled. The workaround is to keep the FIFO enabled. It is the same issue as in this F7 processor: https://www.st.com/resource/en/errata_sheet/dm00145382.pdf

how to single-step code on-target with no jtag, breakpoints, simulator, emulator

Let's say you have a pointer to a function whose source you do not have and which is "untrusted" because it might read/write a disallowed memory region.
Before it executes each assembly instruction, you want to verify that it doesn't access disallowed memory regions.
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
This is for functionality that needs to be enabled not only during development but also during normal runtime.
Ideally, it'd run something like this:
void (*fptr)(int);
fptr = &someFunction; // untrusted, don't have source
// enable interrupts for each assembly instruction
_EN_INT();
// call the function
fptr();
// every time the PC increments, some other code runs which verifies that if any loads/stores are executed, they don't access the disallowed memory range
// disable interrupts for each assembly instruction
_DIS_INT();
QUESTION
Is it possible to call that function and pause execution after every assembly instruction?
The OS is (almost) bare-metal i.e. a custom RTOS (so no Linux or QNX).
My answer assumes that you can modify the "OS" the way you need it...
Cortex MK20DX256VLH7
This seems to be a Cortex M4 CPU.
how to single-step code on-target with no jtag, breakpoints
From the doc, it doesn't say whether you NEED an external debugger to resume execution.
If the CPU is really stopped, you'll definitely need an external signal (e.g. from a debugger).
However, most CPUs support software debugging. This means that an interrupt service routine is executed whenever a breakpoint is hit. To continue execution you simply return from the interrupt service routine.
I don't know about the Cortex M4, but for the Cortex M3 you'll have to set some special registers to enable that feature. Whenever a "BKPT" instruction is hit, interrupt #12 (*) is executed.
For code in RAM you simply write a BKPT instruction (0xBExx, e.g. 0xBEBE) to the address where you want to set your breakpoint. (Before writing, you read out the original value so you can restore it later on.)
For code in Flash the M3 has a "Flash Patch and Breakpoint" (FPB) unit which allows you to specify a small number of addresses (the number of comparators is implementation-defined) which shall be read out as 0xBExx even if other data is stored there. This allows you to set a few breakpoints in Flash.
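A minimal sketch of programming one FPB comparator (FPBv1, as found on the M3/M4), with addresses and bit layout taken from the ARMv7-M ARM; treat them as something to verify for your exact core:

#include <stdint.h>

#define FP_CTRL   (*(volatile uint32_t *)0xE0002000UL)
#define FP_COMP0  (*(volatile uint32_t *)0xE0002008UL)

void fpb_set_breakpoint(uint32_t addr)      /* addr = Flash code address */
{
    uint32_t replace = (addr & 2) ? (2UL << 30)   /* BKPT on upper halfword */
                                  : (1UL << 30);  /* BKPT on lower halfword */
    FP_COMP0 = replace | (addr & 0x1FFFFFFCUL) | 1UL;   /* comparator + enable */
    FP_CTRL  = (1UL << 1) | 1UL;                        /* KEY + global enable */
}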
Interesting for you: The register controlling the debug features in the M3 (named "DEMCR") also has a bit named "MON_STEP":
If you set this bit in interrupt handler #12, then exactly one instruction is executed after returning from the interrupt handler, and interrupt #12 is triggered again. The use case for this feature is, of course, single-stepping code!
To stop single-stepping you'll have to clear the MON_STEP bit again...
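A minimal sketch of that mechanism using the CMSIS CoreDebug definitions, assuming the DebugMonitor exception is usable (i.e. no external tool has halting debug enabled); the device header name is an assumption:

#include "MK20D7.h"   /* device header (name assumed); pulls in core_cm4.h */

void debugmon_singlestep_enable(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_MON_EN_Msk      /* enable DebugMonitor        */
                      | CoreDebug_DEMCR_MON_STEP_Msk;   /* step one instruction/return */
}

void DebugMon_Handler(void)   /* exception #12 */
{
    /* Runs after each instruction while MON_STEP is set. Inspect the stacked
       PC/registers here; to stop single-stepping, clear MON_STEP:
       CoreDebug->DEMCR &= ~CoreDebug_DEMCR_MON_STEP_Msk; */
}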
Important 1:
I don't know if the MK20DX256VLH7 really has all these features. However, because it is a Cortex M4 chip and the M4 should have nearly all features of the M3, these features should be present...
Important 2:
Implementing single-stepping and debugging is not done quickly. Assembly language knowledge will be very helpful and you'll need a lot of time...
From the doc, ...
You will not only need the documentation for the MK20DX256VLH7 from NXP but you'll also need the Cortex M4 documentation from ARM.
(*) Offset 4*12 in the vector table is meant here (which is named "IRQ(-4)" in some ARM documents); not IRQ12.
Yes, the ARM emulator/interpreter sounds exactly like what I want. Is there a free one?
QEMU is open source; most of it is GPLv2. https://wiki.qemu.org/License. You'd probably need to modify it a lot, because it's designed for use as a stand-alone wrapper for a whole Unix process (qemu-user) or a whole machine (qemu-system).
I googled, and there's also http://www.unicorn-engine.org/, which is designed to be used as part of a larger program (written in C, with bindings for calling from various languages). It's also GPLv2 (not LGPL), so you can use it if the rest of your code is also Free Software.
It's actually based on the CPU-emulation code from QEMU; they stripped out all the device / BIOS emulation stuff to make a flexible library for just emulating CPUs.
Presumably you could configure some memory protections for it and set up the starting machine state, and let it run your function (with a return address that leads to some code that hands control back to your main code?)
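A rough sketch of that idea with Unicorn's C API; the addresses, sizes, "forbidden" range and the code buffer are placeholders, and error checking is omitted:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unicorn/unicorn.h>

/* Called for every emulated load/store: reject disallowed addresses here. */
static void hook_mem(uc_engine *uc, uc_mem_type type, uint64_t addr,
                     int size, int64_t value, void *user_data)
{
    if (addr >= 0x40000000 && addr < 0x60000000)   /* example "forbidden" range */
        printf("blocked access to 0x%llx\n", (unsigned long long)addr);
}

int run_untrusted(const uint8_t *code, size_t code_size)
{
    uc_engine *uc;
    uc_hook hook;

    uc_open(UC_ARCH_ARM, UC_MODE_THUMB, &uc);
    uc_mem_map(uc, 0x20000000, 0x10000, UC_PROT_ALL);    /* placeholder RAM   */
    uc_mem_write(uc, 0x20000000, code, code_size);
    uc_hook_add(uc, &hook, UC_HOOK_MEM_READ | UC_HOOK_MEM_WRITE,
                hook_mem, NULL, 1, 0);                   /* hook all addresses */
    uc_emu_start(uc, 0x20000000 | 1,                     /* |1 = Thumb entry   */
                 0x20000000 + code_size, 0, 0);
    uc_close(uc);
    return 0;
}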

How to correctly use a startup-ipi to start an application processor?

My goal is to let my own kernel start an application CPU. It uses the same mechanism as the Linux kernel (a rough sketch follows after the list):
Send asserting and level triggered init-IPI
Wait...
Send deasserting and level triggered init-IPI
Wait...
Send up to two startup-IPIs with vector number (0x40000 >> 12) (the entry code for the application processor lies there)
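A rough sketch of this INIT-SIPI-SIPI sequence via the xAPIC ICR (MMIO at 0xFEE00300/0xFEE00310). delay_us() and the target APIC ID are placeholders; real code would also poll the ICR delivery-status bit between writes:

#include <stdint.h>

#define LAPIC_ICR_LOW   ((volatile uint32_t *)0xFEE00300)
#define LAPIC_ICR_HIGH  ((volatile uint32_t *)0xFEE00310)

extern void delay_us(unsigned us);      /* placeholder timer routine */

static void lapic_send_ipi(uint8_t apic_id, uint32_t low)
{
    *LAPIC_ICR_HIGH = (uint32_t)apic_id << 24;
    *LAPIC_ICR_LOW  = low;
}

void start_ap(uint8_t apic_id, uint32_t entry)      /* entry = 0x40000 here */
{
    lapic_send_ipi(apic_id, 0x0000C500);   /* INIT, assert, level-triggered    */
    delay_us(10000);
    lapic_send_ipi(apic_id, 0x00008500);   /* INIT, de-assert, level-triggered */
    delay_us(10000);
    for (int i = 0; i < 2; i++) {
        lapic_send_ipi(apic_id, 0x00000600 | (entry >> 12));   /* STARTUP IPI */
        delay_us(200);
    }
}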
Currently I'm just interested in making it work with QEMU. Unfortunately, instead of jumping to 0x40000, the application CPU jumps to 0x0 with the cs register set to 0x4000 (I checked with gdb).
The Intel MultiProcessor Specification (B.4.2) explains that the behavior I noticed is valid if the target processor is halted immediately after RESET or INIT. But shouldn't this also apply to the code of the Linux kernel? It sends the startup-IPI after the init-IPI. Or do I misunderstand the specification?
What can I do to have the application processor jump to 0x000VV000 and not to 0x0 with the cs register set to 0xVV00? I really can't see where Linux does something that changes the behavior.
It seems that I really misunderstood the specification: since the application CPU is started in real mode, 0x000VV000 is equivalent to 0xVV00:0x0000. It is not possible to represent the address in the 16-bit ip register alone; therefore a segment base for the code segment is required.
Additionally, debugging real-mode code with gdb is comparatively complicated because it does not respect the segment base. To see the disassembled code of the trampoline at the current position, it is necessary to calculate the physical location:
x/20i $eip+0xVV000
This makes gdb print the next 20 instructions at 0xVV00:$eip.

What does 'bank'ing a register mean?

Reading 'ARM Architecture' on Wikipedia and found the following statement:
Registers R0-R7 are the same across all CPU modes; they are never banked.
R13 and R14 are banked across all privileged CPU modes except system mode.
What does banking a register mean?
Register banking refers to providing multiple copies of a register at the same address.
Taken from section 1.4.6 of the ARM docs:
The term refers to a solution for the problem that not all registers can be seen at once.
There is a different register bank for each processor mode. The banked registers give rapid context switching for dealing with processor exceptions and privileged operations.
If you're looking for a more theoretical reasoning, I recommend this paper.
Edit: A much deeper answer than mine is given here.
When the processor enters an exception, the banked registers are switched automatically with another set of these registers.
Effectively, the exception handler routine doesn't have to save these registers on the stack to prevent them from being clobbered later on (by the exception handler functions). The processor simply keeps a safe copy of that set and restores the original set on exception return.
This clip explains it well
https://youtu.be/7LqPJGnBPMM?t=1419
Banked registers are register copies that are neither needed by nor accessible from the current execution mode. When the execution mode changes, the registers belonging to the new mode become usable.
