STM32F4 Jump to Bootloader via Soft Reset and without BOOT0 and BOOT1 Pins - c

I ask because of an answer to a similar question, which can be found here: Jump to Bootloader in STM32 through application i.e using Boot 0 and Boot 1 Pins in Boot mode from User flash
User JF002 answered: "When I want to jump to the bootloader, I write a byte in one of the backup registers and then issue a soft reset. Then, when the processor restarts, at the very beginning of the program, it reads this register. This register contains the value indicating that it should reboot in bootloader mode. Then, the jump to the bootloader is much easier."
Can someone explain that solution to me step-by-step or show a code example?
At the moment I am writing my exam, and I really depend on help with this, because it is only a small programming part and I have no experience with it.

What I think user JF002 is referring to by "backup register" is the backup SRAM on the STM32F4 (the RTC backup registers would also survive a system reset). The following has worked for me:
At the beginning of the program, enable access to the backup domain and the backup SRAM:
RCC_APB1PeriphClockCmd(RCC_APB1Periph_PWR, ENABLE);
PWR_BackupAccessCmd(ENABLE);
RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_BKPSRAM, ENABLE);
PWR_BackupRegulatorCmd(ENABLE);
When the application wants to enter the bootloader, write A_VALUE to backup SRAM:
(*(__IO uint32_t *) (BKPSRAM_BASE + OFFSET)) = A_VALUE;
where OFFSET is the byte offset into backup SRAM; use 0 for the first word.
Issue a soft reset command using NVIC_SystemReset().
On boot, read (*(__IO uint32_t *) (BKPSRAM_BASE + OFFSET)) and check for A_VALUE:
RCC_APB1PeriphClockCmd(RCC_APB1Periph_PWR, ENABLE);
PWR_BackupAccessCmd(ENABLE);
RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_BKPSRAM, ENABLE);
PWR_BackupRegulatorCmd(ENABLE);
void (*SysMemBootJump)(void);
volatile uint32_t addr = 0x1FFF0000; // System memory (ST bootloader) base, e.g. on the STM32F4 Discovery
if ((*(__IO uint32_t *) (BKPSRAM_BASE + 0)) == A_VALUE)
{
    (*(__IO uint32_t *) (BKPSRAM_BASE + 0)) = 0; // Clear the flag so the next reset boots the application
    SysMemBootJump = (void (*)(void)) (*((uint32_t *)(addr + 4))); // Bootloader reset handler (vector table entry 1)
    __set_MSP(*(uint32_t *)addr); // Load the bootloader's initial stack pointer (vector table entry 0)
    SysMemBootJump(); // Execute bootloader
}
else
{
    RunYourApplication();
}
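The same handoff can be exercised off-target. The following is a minimal host-side sketch, assuming a plain array (fake_bkpsram) in place of BKPSRAM_BASE and an arbitrary magic value in place of A_VALUE; both names are hypothetical. It highlights the detail that the flag is cleared before jumping, so that the next reset (for example when the bootloader hands back control) starts the application normally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for backup SRAM; on the STM32F4 this would be
 * (*(__IO uint32_t *)(BKPSRAM_BASE + OFFSET)). */
static volatile uint32_t fake_bkpsram[4];

#define BOOT_REQUEST_INDEX 0u            /* word index, hypothetical */
#define BOOT_REQUEST_MAGIC 0xB007B007u   /* arbitrary stand-in for A_VALUE */

/* Application side: call just before NVIC_SystemReset(). */
void request_bootloader(void)
{
    fake_bkpsram[BOOT_REQUEST_INDEX] = BOOT_REQUEST_MAGIC;
}

/* Start-up side: call at the very start of main(), after enabling backup
 * SRAM access. Returns true once per request, because the flag is cleared
 * before the caller jumps to the bootloader. */
bool consume_bootloader_request(void)
{
    if (fake_bkpsram[BOOT_REQUEST_INDEX] != BOOT_REQUEST_MAGIC) {
        return false;
    }
    fake_bkpsram[BOOT_REQUEST_INDEX] = 0u;
    return true;
}
```

On the target, request_bootloader() would be followed by NVIC_SystemReset(), and consume_bootloader_request() would guard the jump sequence.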

I have a small problem with that answer. There is a non-zero probability that the value of whatever memory location you pick is the same as A_VALUE when you power the system on. If that happens, the software cannot tell whether the value it reads (A_VALUE) is there because it was written before a soft reset, or by random chance alone.
If the OP assumes the former, the system starts a bootload inappropriately, with the potential of trashing the software. If he/she assumes the latter, a required bootload will be missed. Neither is acceptable.
An improvement would be a more secure authentication: write a random pattern to a block of memory that is retained as long as the system is powered, and compute a cyclic redundancy check (CRC) over that block. Then, on soft reboot, the CRC is calculated again. If it still matches, the block is intact and it may be assumed that the boot was caused by a soft reboot.
Is it perfect? No, but the probability that all the bits in a block of memory happen to yield the correct CRC value is much smaller than the probability of a small number of bits causing the value A_VALUE to be read.
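Here is a sketch of the CRC-protected block described in this answer, written as portable C so it can be tested on a PC. The struct layout, the xorshift32 pattern generator and the seed argument are illustrative assumptions (on an STM32F4 the seed could come from the hardware RNG, and the struct would live in backup SRAM); the CRC is the common reflected CRC-32 with polynomial 0xEDB88320.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

#define PATTERN_WORDS 7

/* Hypothetical layout of the block kept in retained RAM. */
typedef struct {
    uint32_t pattern[PATTERN_WORDS];
    uint32_t crc;                     /* CRC-32 over pattern[] */
} boot_marker_t;

/* Before the soft reset: fill the block with a pseudo-random pattern and seal it. */
void marker_arm(boot_marker_t *m, uint32_t seed)
{
    uint32_t x = seed ? seed : 1u;    /* xorshift32 must not start at 0 */
    for (int i = 0; i < PATTERN_WORDS; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        m->pattern[i] = x;
    }
    m->crc = crc32_ieee((const uint8_t *)m->pattern, sizeof m->pattern);
}

/* At start-up: true only if the block is intact; always leaves it invalid. */
bool marker_check_and_clear(boot_marker_t *m)
{
    uint32_t c = crc32_ieee((const uint8_t *)m->pattern, sizeof m->pattern);
    bool ok = (c == m->crc);
    m->crc = ~c;                      /* can never match this pattern again */
    return ok;
}
```

Storing the inverted freshly computed CRC, rather than inverting the stored one, guarantees that an intact block passes only once, and that a garbage block cannot be turned into a valid one by the clear step.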

Related

How to determine if an instruction is long or short at the event of an exception? (Variable Length Instructions)

My question is about Chapter 5 in this link.
I have an ECC (Error Correction Code) error handler that simply increments the program counter (PC) by 2 or 4 bytes, according to the length of the instruction at the time of the exception. The core is e200z4.
As far as I know, the e200z4 also supports fixed-length instructions of 4 bytes.
What I don't understand is this: to determine whether Variable Length Encoding (VLE) is in use, we need to check the VLEMI bit in the ESR (Exception Syndrome Register). However, this register always contains 0x00000000. The only interrupt we end up with is the Machine Check Interrupt (IVOR1), during power on/off tests with increasing on intervals and fixed off intervals.
So why does the CPU not report whether VLE was in use at the moment of the interrupt, for instance via the VLEMI bit in ESR? And how can I determine whether the instruction at the time of the interrupt is fixed-length or variable-length, i.e. 4 bytes or 2 bytes long?
Note 1: isOpCode32Bit below decodes the opcode to determine the instruction length, but it is only relevant when isFixedLength is 0, i.e. when (syndrome & VLEMI_MASK) is nonzero. So we need the VLEMI value in syndrome somehow, but ESR always seems to be 0x00 (why?).
Note 2: As mentioned before, we always end up in IVOR1, and the address of the instruction executing right before the interrupt is available (provided in a register).
// IVOR1 (Machine Check Interrupt Assembly part):
(ASSEMBLY)(mfmcsr r7) // copy MCSR into register 7 (MCSR in Chapter 5 in the link)
(ASSEMBLY)(store r7 &syndrome)
// IVOR2:
(ASSEMBLY)(mfesr r7) // copy ESR into register 7 (ESR in Chapter 5 in the link)
(ASSEMBLY)(store r7 &syndrome)
------------------------------------------------------
#define VLEMI_MASK 0x00000020uL
isFixedLength = ((syndrome & VLEMI_MASK) == 0);
if (isFixedLength || isOpCode32Bit)
{
    PC += 4; // instruction is 32-bit, increase PC by 4
}
else
{
    PC += 2; // instruction is 16-bit, increase PC by 2
}
When it comes to how these exception handlers work in real systems:
Sometimes handling the exception only requires servicing a page fault (e.g. via copy on write or disc reload).  In such cases, we don't even need to know the length of the instruction, just the effective memory address the instruction is accessing, and the CPUs generally offer that value.  If the page fault can be serviced, then re-running that faulting instruction (without advancing the PC) is appropriate (and if not, then halting the program, also without advancing the PC, is appropriate.)
In other cases, such as software emulation for instructions not present in this hardware, presumably hardware designers consider that such a software handler needs to decode the faulting instruction in order to emulate it, and so will figure out the instruction length anyway.
Thus, hardware turns the job of understanding the faulting instruction over to software. As such, system software needs deep knowledge of the instruction set architecture, and likely also requires customization for each different hardware implementation of the instruction set.
So, why does the CPU not provide information about the length of the instruction at the moment of interrupt inside ESR?
No CPU that I know tells us of the length of an instruction that caused an exception.  If they did, that would be convenient — but only for toy exception handlers.  For real systems, ultimately, this isn't a true burden.
How to determine if an instruction is long or short at the event of an exception? (Variable Length Instructions)
Decode the instruction (while considering any instruction modes the CPU was in at the time of exception)!
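As a sketch of "decode the instruction", the VLE length check itself is small. It assumes my reading of the VLE encoding: a 32-bit VLE instruction has bits 0:3 (MSB-first) of its first halfword equal to 0b0001, 0b0011, 0b0101 or 0b0111, and every other value is a 16-bit se_ instruction. Verify that against the VLE Programming Environments Manual before relying on it. The vle_page flag is also an assumption: as far as I know, on e200 cores the VLE attribute belongs to the memory page (its TLB entry), so the handler would look it up for the faulting address rather than read it from ESR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length in bytes of the instruction whose first halfword is hw.
 * ASSUMPTION: VLE 32-bit forms have bits 0:3 == 0b0xx1 (values 1, 3, 5, 7).
 * vle_page: whether the faulting PC lies in a page marked as VLE code. */
unsigned insn_length(bool vle_page, uint16_t hw)
{
    if (!vle_page) {
        return 4u;                          /* classic Book E: always 32-bit */
    }
    unsigned top = hw >> 12;                /* bits 0:3 in IBM bit numbering */
    return ((top & 0x9u) == 0x1u) ? 4u : 2u;
}
```

The handler would then read the halfword at the saved PC and advance it by insn_length(...).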

Issue with global variable while making 32-bit counter

I am trying to do quadrature decoding using an Atmel XMEGA AVR microcontroller. The XMEGA has only 16-bit counters, and in addition I have used up all the available timers.
To make a 32-bit counter, I use one 16-bit counter and, in its overflow/underflow interrupt, increment/decrement a 16-bit global variable, so that combining the two gives a 32-bit counter.
ISR(timer_16bit)
{
    if (quad_enc_mov_forward)
    {
        timer_over_flow++;
    }
    else if (quad_enc_mov_backward)
    {
        timer_over_flow--;
    }
}
So far it is working fine. But I need to use this 32-bit value in various tasks running in parallel. I'm trying to read the 32-bit value as below:
uint32_t current_count = timer_over_flow;
current_count = current_count << 16;
current_count = current_count + timer_16bit_count;
`timer_16bit_count` is a hardware register.
The problem is that between reading timer_over_flow into current_count in the first statement and adding timer_16bit_count, an overflow may occur and the 16-bit timer may already have wrapped to zero. This can produce a completely wrong value.
And I am trying to read this 32-bit value in multiple tasks.
Is there a way to prevent this data corruption and get a working 32-bit value?
Details requested by other members:
My motor can move forward or backward, and the counter increments/decrements accordingly.
For the ISR: before starting the motor, I set the global variables (quad_enc_mov_forward and quad_enc_mov_backward) so that on an overflow/underflow, timer_over_flow changes accordingly.
Variables that are modified in the ISR are declared as volatile.
Multiple tasks means that I'm using an RTOS kernel with about 6 tasks (mostly 3 running in parallel).
On the XMEGA I'm directly reading the TCC0_CNT register for the lower 16 bits.
One solution is:
uint16_t a, b, c;
do {
    a = timer_over_flow;
    b = timer_16bit_count;
    c = timer_over_flow;
} while (a != c);
uint32_t counter = (uint32_t) a << 16 | b;
Per comment from user5329483, this must not be used with interrupts disabled, since the hardware counter fetched into b may be changing while the interrupt service routine (ISR) that modifies timer_over_flow would not run if interrupts are disabled. It is necessary that the ISR interrupt this code if a wrap occurs during it.
This gets the counters and checks whether the high word changed. If it did, this code tries again. When the loop exits, we know the low word did not wrap during the reads. (Unless there is a possibility we read the high word, then the low word wrapped, then we read the low word, then it wrapped the other way, then we read the high word. If that can happen in your system, an alternative is to add a flag that the ISR sets when the high word changes. The reader would clear the flag, read the timer words, and read the flag. If the flag is set, it tries again.)
Note that timer_over_flow, timer_16bit_count, and the flag, if used, must be volatile.
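A sketch of the flag idea from the parenthetical above, with one swap: instead of a flag that readers clear (which, with several RTOS reader tasks, one task could clear while another is mid-read), the ISR increments an 8-bit sequence counter and each reader retries if it changed. overflow_seq and overflow_isr_body() are hypothetical names, and timer_16bit_count is an ordinary variable standing in for the hardware count register so the code compiles on a PC. Because the counter changes on every ISR run, an up-wrap followed by a down-wrap is still detected. Like the loop above, it assumes the overflow ISR actually runs (interrupts enabled) before the check if a wrap happens during the reads.

```c
#include <stdint.h>

volatile uint16_t timer_over_flow;    /* high word, updated by the ISR */
volatile uint16_t timer_16bit_count;  /* stand-in for the hardware counter */
volatile uint8_t  overflow_seq;       /* hypothetical: bumped on every ISR run */

/* Body of the overflow/underflow ISR (hypothetical wrapper). */
void overflow_isr_body(int forward)
{
    if (forward) {
        timer_over_flow++;
    } else {
        timer_over_flow--;
    }
    overflow_seq++;                   /* changes even if up and down wraps cancel */
}

/* Task-level read: retry until no ISR ran between the two samples of overflow_seq. */
uint32_t read_counter32(void)
{
    uint8_t seq;
    uint16_t hi, lo;
    do {
        seq = overflow_seq;
        hi = timer_over_flow;
        lo = timer_16bit_count;
    } while (seq != overflow_seq);
    return ((uint32_t)hi << 16) | lo;
}
```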
If the wrap-two-times scenario cannot happen, then you can eliminate the loop:
Read a, b, and c as above.
Compare b to 0x8000.
If b has a high value, either there was no wrap, it was read before a wrap upward (0xffff to 0), or it was read after a wrap downward. Use the lower of a or c.
Otherwise, either there was no wrap, b was read after a wrap upward, or it was read before a wrap downward. Use the larger of a or c.
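The loop-free selection rule can be written as a small pure function, which makes it easy to test. This is a sketch under the same assumptions as the list: at most one wrap during the reads, and the ISR has already updated the high word by the time c is read. combine_counter() is a hypothetical name. It compares a and c with a modular 16-bit difference rather than a plain <, so the high word's own rollover (0xFFFF to 0x0000) is also handled.

```c
#include <stdint.h>

/* a = high word read before b, b = low word, c = high word read after b. */
uint32_t combine_counter(uint16_t a, uint16_t b, uint16_t c)
{
    int16_t d = (int16_t)(uint16_t)(c - a);   /* +1, 0 or -1 for a single wrap */
    uint16_t lower  = (d < 0) ? c : a;
    uint16_t higher = (d < 0) ? a : c;
    uint16_t hi = (b >= 0x8000u) ? lower : higher;
    return ((uint32_t)hi << 16) | b;
}
```

For example, combine_counter(5, 0xFFFF, 6) yields 0x0005FFFF (low word read just before an upward wrap), while combine_counter(5, 0x0002, 6) yields 0x00060002 (read just after it).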
The #1 fundamental embedded systems programming FAQ:
Any variable shared between the caller and an ISR, or between different ISRs, must be protected against race conditions. To prevent some compilers from doing incorrect optimizations, such variables should also be declared as volatile.
Those who don't understand the above are not qualified to write code containing ISRs. Or programs containing multiple processes or threads for that matter. Programmers who don't realize the above will always write very subtle, very hard-to-catch bugs.
Some means to protect against race conditions could be one of these:
Temporarily disabling the specific interrupt during access.
Temporarily disabling all maskable interrupts during access (crude way).
Atomic access, verified in the machine code.
A mutex or semaphore. On single-core MCUs where interrupts cannot in turn be interrupted, you can use a bool as a "poor man's mutex".
Just reading TCC0_CNT in multithreaded code is a race condition if you do not handle it correctly. Check the section on accessing 16-bit registers in the XMEGA manual. You should read the low byte first (the compiler will probably handle this transparently for you). When the low byte is read, the high byte is (atomically) copied into the TEMP register. Reading the high byte then reads the TEMP register, not the counter. In this way an atomic read of the 16-bit value is ensured, but only if there is no other access to the TEMP register between the low-byte and high-byte reads.
Note that this TEMP register is shared between all counters, so a context switch at the right (wrong) moment will probably trash its content and therefore your high byte. You need to disable interrupts for this 16-bit read. Because the XMEGA executes the instruction following sei before servicing any pending interrupt, the best way is probably:
cli
ld [low_byte]
sei
ld [high byte]
It disables interrupts for four CPU cycles (if I counted it correctly).
An alternative would be to save the shared TEMP register(s) on each context switch. It is possible (not sure if likely) that your OS already does this, but be sure to check. Even so, you need to make sure that no colliding access occurs from an ISR.
This precaution should be applied to any 16bit register read in your code. Either make sure TEMP register is correctly saved/restored (or not used by multiple threads at all) or disable interrupts when reading/writing 16bit value.
This problem is indeed very common and very hard. Every solution will have a caveat regarding timing constraints in the lower-priority layers. To clarify: the highest-priority function in your system is the hardware counter; its response time defines the maximum frequency you can sample. The next lower priority in your solution is the interrupt routine, which tries to keep track of bit 2^16, and the lowest is your application-level code, which tries to read the 32-bit value. The question now is whether you can quantify the shortest time between two level changes on the A and B inputs of your encoder. The shortest time usually occurs not at the highest speed at which your real-world axis rotates, but when halting at a position: through minimal vibrations the encoder can swing back and forth between two increments, producing e.g. a falling and a rising edge on the same encoder output in short succession. Iff (if and only if) you can guarantee that your interrupt processing time is shorter (by a margin) than this minimal time, you can use such a method to virtually extend the coordinate range of your encoder.

Could you tell me how to replace the reset vector for secondary CPUs in ARM architecture v7?

I know that arm reset vectors can be low(0x00000000) or high(0xffff0000).
But some SoC code in the Linux kernel suggests that the reset vector can be changed.
For example, in mach-imx
static int __cpuinit imx_boot_secondary(unsigned int cpu, struct task_struct *idle)
{
    imx_set_cpu_jump(cpu, v7_secondary_startup);
    imx_enable_cpu(cpu, true);
    return 0;
}
void imx_set_cpu_jump(int cpu, void *jump_addr)
{
    cpu = cpu_logical_map(cpu);
    writel_relaxed(virt_to_phys(jump_addr),
                   src_base + SRC_GPR1 + cpu * 8);
}
They say that the secondary CPU can jump wherever you want via jump_addr.
Could you tell me how it works?
On ARMv7 cores implementing the security extensions for TrustZone - which as far as I'm aware is all of them - the "low vectors" address (when SCTLR.V == 0) is not hard-coded to 0, but instead set with the VBAR system register. VBAR is banked between secure and non-secure states, so that their vector tables can each be placed at any 32-byte aligned virtual address without interfering with each other, even with the MMU off in both states.
Note that whilst that's the question you asked, the code here isn't actually that at all. This is just stashing an entry point address in a non-volatile register in the reset controller (a common alternative is using a variable in some shared memory where the bootloader loaded itself to); the secondary CPU will still come out of reset into the default ROM vector and execute a whole bunch of self-initialisation code - code which will, coincidentally, involve setting the non-secure VBAR if the CPU is going to switch into non-secure state. That startup code will eventually end with reading this entry point address from wherever it was stashed and simply jumping to it.
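To illustrate the stash-and-jump mechanism, here is a host-side C sketch of the shared-memory variant mentioned above: the primary publishes an entry point in a per-CPU slot, and the tail of the secondary's own startup code waits for it and jumps. cpu_jump_slot, set_cpu_jump(), secondary_boot_tail() and example_secondary_entry() are hypothetical names. On i.MX the slot is the SRC_GPR register written by imx_set_cpu_jump(), it holds a physical address, and the secondary core is only enabled by imx_enable_cpu(), so it does not need to spin. Real firmware would also need memory barriers and would normally sleep in WFE rather than busy-wait.

```c
#include <stddef.h>

typedef void (*entry_fn)(void);

#define MAX_CPUS 4

/* Hypothetical per-CPU jump slots; NULL means "no entry point yet". */
static entry_fn volatile cpu_jump_slot[MAX_CPUS];

/* Primary CPU side: the role imx_set_cpu_jump() plays. */
void set_cpu_jump(unsigned cpu, entry_fn fn)
{
    cpu_jump_slot[cpu] = fn;
}

/* Last step of a secondary's startup code: wait for an address, then jump. */
void secondary_boot_tail(unsigned cpu)
{
    entry_fn fn;
    while ((fn = cpu_jump_slot[cpu]) == NULL) {
        /* busy-wait; real code would use WFE plus barriers */
    }
    fn();
}

/* Example entry point, standing in for v7_secondary_startup. */
volatile int secondary_started;
void example_secondary_entry(void)
{
    secondary_started = 1;
}
```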

Why do we need to delay when sending char to serial port?

Consider this code here:
// Stupid I/O delay routine necessitated by historical PC design flaws
static void
delay(void)
{
    inb(0x84);
    inb(0x84);
    inb(0x84);
    inb(0x84);
}
What is port 0x84? Why is it a design flaw? delay() is used in the serial_putc() function:
static void
serial_putc(int c)
{
    int i;
    for (i = 0;
         !(inb(COM1 + COM_LSR) & COM_LSR_TXRDY) && i < 12800;
         i++)
        delay();
    outb(COM1 + COM_TX, c);
}
The file is from lab1 of the course Operating System Engineering from OCW.
The serial port is a piece of hardware with semantics you have to accept. It usually has a shift register that converts parallel data to serial. It can have a holding register for the next byte to send, or even a FIFO for more than one byte. That's why you have to poll the line status register (LSR).
There are some hardware revisions out there that don't behave correctly. Your code looks like a workaround for a bug in old hardware. It shouldn't be necessary to read port 0x84 here.
But the delay implementation can't be optimized out when you increase the compiler optimization level, since it accesses the I/O range. Running this code on up-to-date hardware might be problematic if the loop gives too little delay at run time. You will have to verify that the maximum time that can be waited in the loop is sufficient for the UART to shift out one byte. Keep in mind that this time is baud-rate-dependent, while your code example isn't.
Port 0x84 is used to access the "extra page register". But reading this register should be a no-op; only the read operation itself matters, because it consumes CPU cycles.
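That check can be done by comparing the frame time with the loop's worst-case wait. This sketch assumes 8N1 framing (1 start + 8 data + 1 stop = 10 bits per byte); iter_ns is a placeholder for the measured cost of one loop iteration (LSR read plus delay()) on your machine, not a known value for any particular PC. The function names are hypothetical.

```c
#include <stdint.h>

/* Time for the UART to shift out one frame, in microseconds (rounded down). */
uint32_t frame_time_us(uint32_t baud, uint32_t bits_per_frame)
{
    return (uint32_t)((uint64_t)bits_per_frame * 1000000u / baud);
}

/* Non-zero if iterations * iter_ns covers one frame at the given baud rate. */
int loop_covers_frame(uint32_t iterations, uint32_t iter_ns,
                      uint32_t baud, uint32_t bits_per_frame)
{
    uint64_t wait_ns  = (uint64_t)iterations * iter_ns;
    uint64_t frame_ns = (uint64_t)bits_per_frame * 1000000000u / baud;
    return wait_ns >= frame_ns;
}
```

For example, at 9600 baud one frame takes about 1041 µs, so the 12800-iteration loop is long enough only if each iteration takes at least roughly 82 ns.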

Writing Flash on STM32

I am implementing an emulated EEPROM in flash memory on an STM32 microcontroller, mostly based on the application note by ST (AN2594 - EEPROM emulation in STM32F10x microcontrollers).
The basics outlined there and in the respective datasheet and programming manual (PM0075) are quite clear. However, I am unsure about the implications of a power loss or system reset during flash programming and page erase operations. The app note considers this case too, but does not clarify what exactly happens when a programming (write) operation is interrupted:
Does the address have an arbitrary (random) value? OR
Are only some of the bits written? OR
Does it have the default erase value 0xFF?
Thanks for hints or pointers to the relevant documentation.
Arne
This is not really a software question (much less C++). It belongs on electronics.se, but there does not seem to be an option to migrate questions there… only to sites such as superuser or webmasters.se.
The short answer is that hardware is inherently unreliable. Something can always in theory go wrong that interrupts the write process or causes the wrong bit to be written.
The long answer is that Flash circuits are usually designed for maximum reliability. A sudden power loss on write will probably not cause corruption because the driver circuit may have enough capacitance or the capability to operate under a low-voltage condition long enough to finish draining the charge as necessary. A power loss on erasure might be trickier. You really need to consult the manufacturer.
For a "soft" system reset with no power interruption, it would be pretty surprising if the hardware didn't always completely erase whatever bytes it was immediately working on. Usually the bytes are erased in a predefined order, so you can use the first or last ones to indicate whether a page is full or empty.
#include "stm32f10x.h"
#define FLASH_KEY1 ((uint32_t)0x45670123)
#define FLASH_KEY2 ((uint32_t)0xCDEF89AB)
#define Page_127 0x0801FC00 // last 1 KB page of a 128 KB device
uint16_t i;
int main(void)
{
    // FLASH_Unlock: write the two key values to FLASH_KEYR
    FLASH->KEYR = FLASH_KEY1;
    FLASH->KEYR = FLASH_KEY2;
    // FLASH_Erase Page
    while (FLASH->SR & FLASH_SR_BSY);
    FLASH->CR |= FLASH_CR_PER;   // Page Erase Set
    FLASH->AR = Page_127;        // Page Address
    FLASH->CR |= FLASH_CR_STRT;  // Start Page Erase
    while (FLASH->SR & FLASH_SR_BSY);
    FLASH->CR &= ~FLASH_CR_PER;  // Page Erase Clear
    // FLASH_Program HalfWord
    FLASH->CR |= FLASH_CR_PG;
    for (i = 0; i < 1024; i += 2)
    {
        *(__IO uint16_t *)(Page_127 + i) = i;
        while (FLASH->SR & FLASH_SR_BSY); // wait until this half-word is programmed
    }
    FLASH->CR &= ~FLASH_CR_PG;   // safe now: the last write has completed
    FLASH->CR |= FLASH_CR_LOCK;
    while (1);
}
If you are using the EEPROM emulation driver, you shouldn't worry too much about flash corruption, as the driver always keeps a shadow copy in another page. If worst comes to worst, you will lose the most recent values being written to flash. If you look closely at the emulation driver, you will notice that it is essentially just a wrapper around stm32fxx_flash.c in the standard peripheral library.
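As a rough illustration of why at most the latest value is lost, the sketch below scans an append-only page for the last record of a variable, in the spirit of AN2594's read routine but simplified. The record layout (a data halfword followed by a virtual-address halfword, with never-programmed slots reading 0xFFFF) and the name ee_read_latest() are assumptions for the example, not the driver's actual API.

```c
#include <stddef.h>
#include <stdint.h>

#define ERASED_HALFWORD 0xFFFFu

/* Assumed record layout: data halfword, then virtual-address halfword. */
typedef struct {
    uint16_t data;
    uint16_t vaddr;
} ee_record_t;

/* Scan from the end of the page so the newest record for vaddr wins.
 * Slots whose address is still erased (0xFFFF) never match.
 * Returns 0 and stores the value in *out, or -1 if none is found. */
int ee_read_latest(const ee_record_t *page, size_t n, uint16_t vaddr,
                   uint16_t *out)
{
    if (vaddr == ERASED_HALFWORD) {
        return -1;
    }
    for (size_t i = n; i-- > 0; ) {
        if (page[i].vaddr == vaddr) {
            *out = page[i].data;
            return 0;
        }
    }
    return -1;
}
```

If a write is interrupted before the address halfword is programmed, that record never matches, and the previous record for the same variable is returned instead.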
If you look at the application note, you will see the times that the emulation library take for the flash operations. Erasing a page typically takes the longest time (tens of milliseconds on M0 core - this depends on the clock frequency).
If you are using the EEPROM emulation driver, you had better add a function that checks the data after each write has finished.
For example, if you have 10 data bytes to save, you need to write 11 bytes to flash, with the last byte being a checksum. Then verify the checksum after reading the data back from flash.
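Here is a sketch of that 10 + 1 byte record. The inverted 8-bit additive checksum is just one possible choice (a CRC would catch more error patterns). Inverting the sum means that, for this record length, both an all-0xFF (erased) record and an all-0x00 record fail the check. RECORD_DATA_LEN, checksum8(), record_seal() and record_valid() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RECORD_DATA_LEN 10u   /* 10 data bytes + 1 checksum byte */

/* Inverted 8-bit additive checksum over len bytes. */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return (uint8_t)~sum;
}

/* Before writing to flash: store the checksum in the last byte. */
void record_seal(uint8_t rec[RECORD_DATA_LEN + 1])
{
    rec[RECORD_DATA_LEN] = checksum8(rec, RECORD_DATA_LEN);
}

/* After writing (read-back verify) and after every read from flash. */
bool record_valid(const uint8_t rec[RECORD_DATA_LEN + 1])
{
    return checksum8(rec, RECORD_DATA_LEN) == rec[RECORD_DATA_LEN];
}
```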
