STM32F4 FSMC/FMC SRAM as Heap/Stack results in random hardfaults - arm

We are currently evaluating the use of an external SRAM for C/C++ heap storage on our platform, which is built around an STM32F439BI microcontroller.
The problem
Using the SRAM as heap storage results in random hardfaults raised from bus errors/imprecise bus errors.
Without placing the heap on the SRAM, memory tests run successfully on the whole SRAM (8-bit, 16-bit and 32-bit accesses).
With a debugger connected, I can sometimes observe these errors before a hardfault occurs. Most often a word is read from the SRAM and the CPU register fills with addresses of the following format: 0x-1F3-1F3 ('-' is most often '0', sometimes 'A' or '6'). The pattern '1F3' persists. If the same address is read again a few lines further down, the correct value is read (some other address in the 0x60000000 space).
If I stop on a breakpoint early in the program and step a few lines, I get these errors more frequently.
Further details
The SRAM is connected using the FMC/FSMC peripheral on FMC bank 1 and SRAM bank 1 and is therefore memory-mapped to address 0x60000000.
All settings for GPIO pins and FMC configuration are set from the startup file before main() executes or static objects are created.
The SRAM is the following: CY7C1041GN30
We connect all 16 data pins, all 18 address pins, and the BHE, BLE, OE, WE and CE signals to our controller. All pins are configured as push-pull alternate function, pull-up, AF12 (FMC), very high speed. We enable the clocks for all necessary GPIO ports and the FMC clock. Note: we initially started out without pull-up/down, with the same symptoms.
The controller runs at a clock speed of 168 MHz
As stated above, a memory test runs successfully
We use DMA for SPI, I2C and ADC data transfers
We frequently use interrupts, including external (pin) interrupts
We use the following timing settings:
AddressSetupTime: 2
AddressHoldTime: 4
DataSetupTime: 4
BusTurnAroundDuration: 1
CLKDivision: 2
DataLatency: 2
We configure the FMC as follows:
NSBank FMC_NORSRAM_BANK1,
DataAddressMux FMC_DATA_ADDRESS_MUX_DISABLE,
MemoryType FMC_MEMORY_TYPE_SRAM,
MemoryDataWidth FMC_NORSRAM_MEM_BUS_WIDTH_16,
BurstAccessMode FMC_BURST_ACCESS_MODE_DISABLE,
WaitSignalPolarity FMC_WAIT_SIGNAL_POLARITY_LOW,
WrapMode FMC_WRAP_MODE_DISABLE,
WaitSignalActive FMC_WAIT_TIMING_BEFORE_WS,
WriteOperation FMC_WRITE_OPERATION_ENABLE,
WaitSignal FMC_WAIT_SIGNAL_DISABLE,
ExtendedMode FMC_EXTENDED_MODE_DISABLE,
AsynchronousWait FMC_ASYNCHRONOUS_WAIT_DISABLE,
WriteBurst FMC_WRITE_BURST_DISABLE,
ContinuousClock FMC_CONTINUOUS_CLOCK_SYNC_ASYNC,
WriteFifo 0,
PageSize 0
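For reference, here is how the timing and configuration values above map onto the ST HAL init call; a minimal sketch, assuming GPIO and FMC clock setup has already happened in startup code as described above (the handle and function names are ours). The AccessMode field is not listed above and is assumed to be mode A; the raw WriteFifo/PageSize zeros map to fields only newer HAL versions expose, so they are omitted here.

#include "stm32f4xx_hal.h"

static SRAM_HandleTypeDef hsram;

void sram_init(void)
{
    FMC_NORSRAM_TimingTypeDef timing = {0};

    hsram.Instance = FMC_NORSRAM_DEVICE;
    hsram.Extended = FMC_NORSRAM_EXTENDED_DEVICE;

    hsram.Init.NSBank             = FMC_NORSRAM_BANK1;  // mapped at 0x60000000
    hsram.Init.DataAddressMux     = FMC_DATA_ADDRESS_MUX_DISABLE;
    hsram.Init.MemoryType         = FMC_MEMORY_TYPE_SRAM;
    hsram.Init.MemoryDataWidth    = FMC_NORSRAM_MEM_BUS_WIDTH_16;
    hsram.Init.BurstAccessMode    = FMC_BURST_ACCESS_MODE_DISABLE;
    hsram.Init.WaitSignalPolarity = FMC_WAIT_SIGNAL_POLARITY_LOW;
    hsram.Init.WrapMode           = FMC_WRAP_MODE_DISABLE;
    hsram.Init.WaitSignalActive   = FMC_WAIT_TIMING_BEFORE_WS;
    hsram.Init.WriteOperation     = FMC_WRITE_OPERATION_ENABLE;
    hsram.Init.WaitSignal         = FMC_WAIT_SIGNAL_DISABLE;
    hsram.Init.ExtendedMode       = FMC_EXTENDED_MODE_DISABLE;
    hsram.Init.AsynchronousWait   = FMC_ASYNCHRONOUS_WAIT_DISABLE;
    hsram.Init.WriteBurst         = FMC_WRITE_BURST_DISABLE;
    hsram.Init.ContinuousClock    = FMC_CONTINUOUS_CLOCK_SYNC_ASYNC;

    timing.AddressSetupTime      = 2;  // in HCLK cycles
    timing.AddressHoldTime       = 4;
    timing.DataSetupTime         = 4;
    timing.BusTurnAroundDuration = 1;
    timing.CLKDivision           = 2;
    timing.DataLatency           = 2;
    timing.AccessMode            = FMC_ACCESS_MODE_A;  // assumption, not stated above

    HAL_SRAM_Init(&hsram, &timing, NULL);  // no extended timing (ExtendedMode disabled)
}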
We spent a lot of time experimenting with longer timings and compared all the settings to examples, including this one: Using STM32L476/486 FSMC peripheral to drive external memories (although that application note is for the STM32L4, I am fairly certain it applies to this controller as well).
Findings on similar problems
The problem sounds very similar to this errata sheet entry: "2.3.4 Corruption of data read from the FMC", but the errata also says the issue is fixed in our revision of the controller (rev. 3).
I hope someone out there has seen this strange behaviour before and can help us. After over a week of debugging, we suspect some kind of error in the controller when interrupt/DMA accesses occur while the CPU accesses the SRAM (when we use it as heap, it is accessed very frequently). Hopefully you can shed some light on this topic.

Sorry for not getting back to you, internet.
Yes, we found out what the issue was (at least in our case). The problem was that the J-Link debugger we use causes problems if its cable hangs above the power electronics on our PCB (it is mounted vertically). If we guide the ribbon cable out at the top (over digital electronics only), the error disappears. So our guess is that noise from the power electronics was picked up by the cable and injected directly into the JTAG port, which caused failures inside the MCU.

Just got confirmation from ST that there is a bug in the STM32F469 FMC that might cause incorrect values to be read if the write FIFO is disabled. The workaround is to keep the FIFO enabled. It is the same issue as in this F7 processor: https://www.st.com/resource/en/errata_sheet/dm00145382.pdf
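In HAL terms, the workaround presumably amounts to one line in the init structure above (assuming a HAL version that exposes the field):

hsram.Init.WriteFifo = FMC_WRITE_FIFO_ENABLE;  // keep the write FIFO enabled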

Related

Cortex-M33 MTB configuration - When MTB buffer is full

I am exploring the MTB feature on Cortex-M hardware, experimenting on an Arm MPS2+ FPGA board with the Cortex-M33 image, which enables TrustZone. From what I have learned, the MTB can be configured to record non-sequential branches of the execution; the record is written directly to the allocated SRAM by the MTB unit automatically. During the experiments, I ran into several questions.
When the MTB buffer is full, there are two ways to handle it. One is to rewrite the buffer from the very beginning. The other is to set a watermark, which makes the PE enter Debug state. I checked the documentation: Debug state means we need an external debugger or another core to handle it. Since the board I use is single-core and I prefer not to use an external debugger, is there another way to notify the CPU, without halting it, when the MTB buffer is full?
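To make the two modes concrete, here is a minimal sketch of how they are typically programmed, with register offsets per the ARM MTB TRM; the base address is an assumption for the MPS2+ image and must be taken from its ROM table:

#include <stdint.h>

#define MTB_BASE_ADDR  0xF0002000UL  // assumption: look this up in the ROM table
#define MTB_POSITION   (*(volatile uint32_t *)(MTB_BASE_ADDR + 0x000))
#define MTB_MASTER     (*(volatile uint32_t *)(MTB_BASE_ADDR + 0x004))
#define MTB_FLOW       (*(volatile uint32_t *)(MTB_BASE_ADDR + 0x008))

// Mode 1: wrap around and overwrite the buffer from the beginning.
void mtb_start_circular(uint32_t mask)
{
    MTB_POSITION = 0;                   // write pointer to start of buffer
    MTB_FLOW     = 0;                   // no watermark
    MTB_MASTER   = (1UL << 31) | mask;  // EN | MASK (buffer size = 2^(MASK+4) bytes)
}

// Mode 2: request Debug state when the write pointer reaches the watermark.
void mtb_start_watermark(uint32_t mask, uint32_t watermark)
{
    MTB_POSITION = 0;
    MTB_FLOW     = (watermark & ~7UL) | 1UL;  // WATERMARK[31:3] | AUTOHALT
    MTB_MASTER   = (1UL << 31) | mask;
}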
Based on the first question, I made several attempts:
1.) I configured the DWT comparator to monitor writes to the buffer's address range, but it seems the DWT cannot monitor hardware writes like the ones the MTB performs.
2.) I configured the Non-secure MPU to set the MTB buffer region as read-only, but it still does not trigger a fault. I checked the ARM CoreSight MTB-M33 document, page A2-26: "The MTB-M33 ensures that trace write accesses have priority over AHB accesses." Is that the reason why the MPU cannot constrain MTB writes?
3.) I also configured the SAU to mark the MTB buffer as secure, but it still cannot trigger a secure fault when a non-secure trace write happens.
Then I went back to the MTB document. The MTB actually has an MTB_SECURE register that can partition the MTB buffer as secure or non-secure. But, perhaps because the board or the image I use does not fully implement the MTB, I cannot modify the MTB_SECURE register except for its last two bits.
So, finally: is there another way to notify the CPU, without halting it, when the MTB buffer is full? Something like what Intel Processor Trace does, which generates an interrupt.
I hope I have described the question clearly, and I would very much appreciate any suggestions. Thank you!

WS2812 on LPC11C24 with light_ws2812

I'm trying to get the WS2812 working on the NXP LPC11C24.
I have done all the things listed in the readme, but I have problems with the following requirement:
Set code memory wait states to zero. Important: The library will not work if there are one or more code memory waitstate cycles. This may limit your code clock to 30 MHz or lower on most Cortex-M0 controllers.
The LED strip shines in different colours than I programmed, and I think the wait states are the reason.
I've tried the following two things, but neither of them helped.
1.) Setting the FLASH access time in clocks:
Chip_FMC_SetFLASHAccess(FLASHTIM_50MHZ_CPU);
2.) Placing the function that writes the data into the main RAM region:
__RAMFUNC(RAM) void ws2812_sendarray(uint8_t *ledarray, int length);
How can I set the wait states down to zero?
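If I read the LPCOpen enum correctly, zero wait states corresponds to the 1-clock flash access setting, which is only specified up to a 20 MHz system clock, so the core clock has to be lowered to match. A sketch, assuming the stock LPCOpen FMC driver:

// FLASHTIM_20MHZ_CPU = 1 system clock flash access, i.e. zero wait states;
// only valid while the system clock is at or below 20 MHz.
Chip_FMC_SetFLASHAccess(FLASHTIM_20MHZ_CPU);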

Cannot write TMC/ETF registers on STM32H753

This question is a follow-up to this one.
I am trying to configure the ETF in circular mode so that, in case of a critical error, the embedded software itself can read back the execution trace, on an STM32H753.
I am following the algorithm described in ARM's Trace Memory Controller reference manual (section 2.2.2).
But I cannot write the ETF registers: I unlocked the ETF macrocell by writing the magic number 0xC5ACCE55 to the ETF_LAR register, but when I read the registers they are all 0 (through the debugger or printf), and when I write them they remain 0.
Any advice on how to write the ETF registers?
There is actually a mistake in the STM32H7 documentation: it gives one base address for the ETF, and a few pages later a different one. My understanding is that the second one is wrong.
Also, I had to use the "component base address (system bus)" and not the "component base address (debugger)". This notion of a double address is pretty confusing, as for example the ETM has only one address.
Anyway, using 0x5C014000 as the base address, I can read/write the registers.
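A minimal sketch of the working access, using the system-bus base address; the LAR offset is the standard CoreSight 0xFB0, and RSZ is the TMC RAM-size register at offset 0x004:

#include <stdint.h>

#define ETF_BASE  0x5C014000UL  // "component base address (system bus)"
#define ETF_RSZ   (*(volatile uint32_t *)(ETF_BASE + 0x004))  // RAM size register
#define ETF_LAR   (*(volatile uint32_t *)(ETF_BASE + 0xFB0))  // lock access register

int etf_unlock(void)
{
    ETF_LAR = 0xC5ACCE55;      // CoreSight unlock key
    __asm volatile ("dsb");    // let the write land before further accesses
    return ETF_RSZ != 0;       // non-zero RAM size => registers are accessible
}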
When accesses are made using the DAP (SWD or JTAG), the debug subsystem is powered up as part of the handshake with the tools. In a self-hosted debug mode, device-specific control of clocks/power is usually necessary.
For STM32H7 parts:
The ETM is in the same clock/power domain as the core, so it will normally be visible.
CK_DBG_D1 clocks the trace components in the D1 power domain: System ROM table 2, CoreSight trace funnel, ETF, system CTI and TPIU. It is a gated version of the D1 domain system clock (CK_HCLK_D1).
Self-hosted debug clock and power control uses a dedicated set of registers:
All the debug clocks (except DAPCLK) can be enabled and disabled by register bits in the DBGMCU.
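As a sketch, enabling those clocks from software could look like the following; the bit positions follow the DBGMCU_CR description in RM0433, and raw shifts are used because the CMSIS macro names differ between header versions:

#include "stm32h7xx.h"

void enable_selfhosted_debug_clocks(void)
{
    DBGMCU->CR |= (1UL << 20)   // TRACECLKEN: trace port clock
                | (1UL << 21)   // D1DBGCKEN: D1 domain debug clock
                | (1UL << 22);  // D3DBGCKEN: D3 domain debug clock
}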

Which MCU (Cortex-M) for a time-critical GPIO application?

We have an application which runs on a PIC24H, and we would like to port it to another MCU, preferably an ARM Cortex. The application is extremely time critical, meaning we need extremely deterministic code behaviour. In short: pulses arrive via special hardware at GPIO pins, and the data is analyzed right away. Processing the data is not complex (we don't need a beefy CPU/MCU for it). After analyzing the data, GPIO output pins are set to their values.
The app in 3 short lines:
process input pins
determine the pattern from the input pin processing
based on the detected pattern, write the output pins
The PIC24H runs at 40 MHz and can toggle a pin in 25 ns; we would be glad to have at least 2x that speed for future upgrades. So an MCU which can run deterministic code and toggle pins at 80 MHz (12.5 ns) would be just fine. We don't need to toggle the pins at a constant fast rate; we need an MCU which can toggle a pin in less than 25 ns. We can't waste cycles while toggling: if one cycle is off, we lose synchronization. Everything must be done with one-cycle precision (or two, but a constant two cycles), so the code must be 100% deterministic.
Please let me know if I'm missing something, or if what we need can be done using some other method on a Cortex-M. Just keep in mind that if one cycle is lost (due to cache or similar), we lose signal sync and the app will not do its work correctly, or at all.
Thanks!
Br
According to this blog post, the interrupt latency for Cortex-M ranges from 12 to 16 cycles (assuming you are not using FPU registers) with best-case memories. M0 and M0+ are slower than M3/M4/M7. On top of this, you need to add the GPIO access times (and watch out for different clock frequencies between the core and the peripherals). A Cortex-M7 will support higher clock speeds than M3/M4.
It still isn't clear how many cycles are consumed in recognising a pattern, or how an interrupt is useful in doing this. Generally, a low-latency interface function like this would be an obvious target for dedicated hardware, but since you have an existing software solution, it seems the problem is mis-specified.
Provided you avoid accessing any 'slow' peripherals which might stall the bus, the interrupt latency should be deterministic; any specific device should have documentation which covers this.
NXP have an application note which describes some of the detail of how to measure what is going on.
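One way to sidestep interrupt-entry jitter altogether is to poll in a tight loop and drive the outputs through an atomic set/reset register. A sketch, using STM32-style GPIO registers purely as an example; the ports, pins and the pattern check are placeholders:

#include "stm32f4xx.h"

void poll_pattern_loop(void)
{
    for (;;) {
        uint32_t in = GPIOA->IDR;           // sample all 16 input pins in one read
        if ((in & 0x3U) == 0x1U) {          // placeholder pattern on PA0/PA1
            GPIOB->BSRR = 1UL << 5;         // set PB5 in a single write
        } else {
            GPIOB->BSRR = 1UL << (5 + 16);  // reset PB5 in a single write
        }
    }
}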

Why does a NOP / a few extra lines of code / turning off pointer-aliasing optimization help? [Fujitsu MB90F543 MCU C code]

I am trying to fix a bug found in a mature program for the Fujitsu MB90F543. The program has worked for nearly 10 years, but it was discovered that under some special circumstances it fails to do two things at its very beginning. One of them is crucial.
After low- and high-level initialization (ports, pins, peripherals, IRQ handlers), configuration data is read over SPI from an EEPROM, and status LEDs are turned on for a moment (to light them, data is sent over SPI to a LED driver).
When those special circumstances occur, the first (and only the first) function invoking just a few EEPROM reads fails, and additionally a few of the LEDs that should turn on don't.
The program is written in C and compiled using Softune v30L32.
Surprisingly, it is sufficient to add a single __asm(" NOP ") in the low-level hardware init to make the program work as expected under the mentioned circumstances. Alternatively, it is sufficient to turn off 'Control optimization of pointer aliasing' in the optimization settings. Adding just a few lines of code in various places helps too.
I have compared (diffed) the ASM listings of the compiled program for the versions with and without __asm(" NOP "), and with both of the aforementioned optimizer settings, and they all look just fine.
The only warning the Softune compiler has been printing for years during compilation is as follows:
*** W1372L: The section is placed outside the RAM area or the I/O area (IOXTND)
I realize it's a rather general question, but maybe someone with a bigger picture will be able to point out a possible cause.
Have you got an idea what may cause such weird behaviour? How can I locate the bug and fix it?
During initialization, a few long (about 20 ms) delay loops are used. They don't help, although they were increased from about 2 ms; yet a single NOP in any line of the hardware initialization function, and even before or after the function, helps.
Both wait loops work; I have checked them using an oscilloscope (I added an LED turn-on before and turn-off after).
I have checked the timing hypothesis by slowing the SPI clock from 1 MHz down to 500 kHz. It does not change anything. Slowing down to 250 kHz causes watchdog resets, as some parts of the code then execute too long (>25 ms).
One more thing: I have observed that adding local variables in any source file sometimes makes the problem disappear or reappear. The same goes for initializing uninitialized local variables. Adding a few extra lines of code in any of the files helps, or reveals the problem.
void main(void)
{
    watchdog_init();
    // waiting for power supply to stabilize
    wait; // about 45 ms
    hardware_init();
    clear_watchdog();
    application_init();
    clear_watchdog();
    wait; // about 20 ms
    test_LED();
    {...}
}
void hardware_init(void)
{
    __asm("NOP"); // how come this helps? - it may be in any line of the function
    io_init();    // ports initialization
    clk_init();
    timer_init();
    adc_init();
    spi_init();
    LED_init();
    spi_start();
    key_driver_init();
    can_init();
    irq_init();   // set IRQ priorities and global IRQ enable
}
Could be one of many things, but two spring to mind.
Timing.
Maybe the wait is not long enough for the power to stabilize and not everything is synced to the clock. The NOP gets everything back in sync.
Alignment.
Perhaps the NOP gets your instructions aligned on a 32- or 64-bit boundary expected by the hardware (we used to do this a lot in mainframe assemblers, as I/O operations often expected things to be on doubleword boundaries).
The problem was solved. It was caused by a trivial bug.
The EEPROM's nHOLD and nCS signals were not initialized immediately after the MCU's reset, but only just before the first use of the EEPROM. As a result they were 0, i.e. active.
This means the EEPROM was selected but waiting on hold. Meanwhile, another transfer using the SPI started. After 6 of its 8 CLK pulses, the EEPROM's nHOLD pin was initialized and driven high. The EEPROM was no longer on hold, so it clocked in the last two bits of data intended for another peripheral. Every subsequent operation on the EEPROM found its CLK and MOSI out of sync.
When I added the NOP (or anything else), the moment of the nHOLD 0->1 edge shifted to after the last CLK pulse, so CLK and MOSI stayed in sync.
All I had to do was to initialize all the EEPROM's SPI lines, in particular nHOLD and nCS, right after the MCU reset.
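A minimal sketch of the fix (the F2MC-16LX exposes PDR/DDR port registers; the port number and bit positions here are placeholders for wherever the signals are wired):

#define EEPROM_nCS_BIT    0  // placeholder bit positions
#define EEPROM_nHOLD_BIT  1

void io_init(void)
{
    // Drive nCS and nHOLD inactive (high) before any other SPI traffic.
    PDR1 |= (1 << EEPROM_nCS_BIT) | (1 << EEPROM_nHOLD_BIT);  // latch output values high
    DDR1 |= (1 << EEPROM_nCS_BIT) | (1 << EEPROM_nHOLD_BIT);  // then switch the pins to output
    // ... rest of the port initialization ...
}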
