Accessing special CP15 c13 Software Thread ID registers in lldb - arm

I need to access the special CP15 c13 Software Thread ID ARM registers and could not find another way of doing it within LLDB. I thought of using
expr __asm__ __volatile__("MRC p15, 0, r0, c13, c0, 3");
It runs, but has no effect on the content of r0. Thanks a lot!

If you're on iOS, you can use _pthread_self() to get this value.

Related

I have an SoC with two processors, a Cortex-R52 and a Cortex-A55. Both have PMU registers. Is there any way to access both sets of PMU registers from a single piece of code?

I use the code below to access the PMU register of the core A55. Is there any similar way to access the same register on the R52?
asm volatile("mrs %0, pmccntr_el0" : "=r" (value));
I want to read the value of this register on both processors.
From the R52 TRM, PMCCNTR is CRn=c9, Op1=0, CRm=c13, Op2=0 (CRm=c14 with the same CRn selects PMUSERENR, not the cycle counter), and the TRM adds the following text,
The PMU counters and their associated control registers are accessible from the internal non-debug system register interface with MCR and MRC instructions.
The core is based on the AArch32 ISA and uses the 'co-processor' instructions. An equivalent is,
asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (value));
The reference is from section 12 of the Cortex-R52 manual. Table 12-1 details the co-processor op-code parameters.
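Since each core can only read its own PMU through system-register instructions, one way to keep "a single piece of code" is a shared source file that picks the instruction at compile time. The following is a minimal sketch, not a tested driver: `PMU_TARGET_A55` and `PMU_TARGET_R52` are hypothetical build flags, `read_cycle_counter` is a made-up name, and the AArch32 encoding used is c9, c13, 0, which is the PMCCNTR encoding given in the Arm architecture manual. It also assumes the counter has already been enabled and that the code runs at a privilege level allowed to read it.

```c
#include <stdint.h>

/*
 * Read the PMU cycle counter of the core this code is running on.
 * PMU_TARGET_A55 / PMU_TARGET_R52 are hypothetical build flags; define
 * exactly one of them when compiling for the matching core.
 */
uint64_t read_cycle_counter(void)
{
#if defined(PMU_TARGET_A55)
    /* AArch64: PMCCNTR_EL0 is read with MRS. */
    uint64_t value;
    __asm__ volatile("mrs %0, pmccntr_el0" : "=r"(value));
    return value;
#elif defined(PMU_TARGET_R52)
    /* AArch32: PMCCNTR is read with MRC p15, 0, <Rt>, c9, c13, 0.
       This form returns the low 32 bits of the counter. */
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
#else
    /* Neither flag defined (for example a host build): no PMU to read. */
    return 0;
#endif
}
```

Each core would be built from this file with its own flag. If one core needs the other core's count, that value would have to be passed through shared memory or a mailbox, which is outside this sketch.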

Writing to memory mapped GPIO-registers does not write anything

On my NUCLEO-H7A3ZI-Q, I am trying to turn on the LED at port PB7 using assembly. According to the STM32H7A3 reference manual (page 129), port B is mapped at address 0x58020400.
The following code should write the value 0xc0 to the address 0x58020400, pointing into the first byte of GPIOB_MODER, which is rw:
.section .text
reset_handler:
nop
ldr r0, GPIO_ADDR
mov r1, #0xc0
strb r1, [r0]
done:
b done
.align 2
GPIO_ADDR: .word 0x58020400
.section .vectors
.word 0x20001ffe # Initial SP
.word reset_handler # Entrypoint
However, this does not work. Reading the memory with STM32CubeProgrammer shows the same value, 0xFFFFFEBF, at 0x58020400 both before and after the strb instruction.
The value 0xFFFFFEBF is the reset value of GPIOB_MODER, which makes sense. However, every other word in the memory-mapped region also reads 0xFFFFFEBF, whereas the documentation gives different reset values for some of those registers. This suggests that I have missed some initialization step. I could not find anything in the manual saying that one is necessary, but the manual is ~3000 pages, so I might have missed something :)
You need to enable the GPIO peripheral clock first. That is done through the RCC registers; until the clock is enabled, writes to the GPIO registers have no effect.
I would discourage you from learning STM32 microcontrollers in assembler. It is the way to nowhere.
Start with the programming manual and the reference manual, which describe low-level programming of ARM microcontrollers: clocks, peripherals, and so on.
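As a rough illustration of that order of operations (clock first, then mode), here is a minimal C sketch. The function names are made up, and the register addresses are deliberately left as parameters: the GPIOB base 0x58020400 comes from the question, but the address of the RCC enable register and the enable bit for port B (believed to be bit 1, GPIOBEN, in RCC_AHB4ENR on STM32H7 parts) are assumptions to confirm in the reference manual. Note also that MODER uses two bits per pin, so PB7 lives in bits 15:14, which is the second byte of the register rather than the first.

```c
#include <stdint.h>

#define GPIO_MODE_OUTPUT 0x1u   /* MODER field value for general-purpose output */

/* Set one enable bit in an RCC clock-enable register.
   rcc_enable_reg and port_bit must be taken from the reference manual. */
void gpio_clock_enable(volatile uint32_t *rcc_enable_reg, unsigned port_bit)
{
    *rcc_enable_reg |= (1u << port_bit);
    /* Read the register back so the write has completed before the
       first GPIO access (ST's own headers do something similar). */
    (void)*rcc_enable_reg;
}

/* Configure one pin as an output: MODER holds a 2-bit field per pin. */
void gpio_set_output(volatile uint32_t *moder, unsigned pin)
{
    uint32_t v = *moder;
    v &= ~(0x3u << (pin * 2u));              /* clear the pin's mode field */
    v |= (GPIO_MODE_OUTPUT << (pin * 2u));   /* 01 = output */
    *moder = v;
}
```

On the target, `moder` would be `(volatile uint32_t *)0x58020400` for GPIOB and `pin` would be 7. Taking pointers as parameters also lets the bit arithmetic be checked on a host with ordinary variables: starting from the reset value 0xFFFFFEBF, setting pin 7 to output gives 0xFFFF7EBF.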

Can DMB instructions be safely omitted in ARM Cortex M4

I am going through the assembly generated by GCC for an ARM Cortex M4, and noticed that atomic_compare_exchange_weak gets two DMB instructions inserted around the condition (compiled with GCC 4.9 using -std=gnu11 -O2):
// if (atomic_compare_exchange_weak(&address, &x, y))
dmb sy
ldrex r0, [r3]
cmp r0, r2
itt eq
strexeq lr, r1, [r3]
cmpeq.w lr, #0
dmb sy
bne.n ...
Since the programming guide to barrier instructions for ARM Cortex M4 states that:
Omitting the DMB or DSB instruction in the examples in Figure 41 and Figure 42 would not cause any error because the Cortex-M processors:
do not re-order memory transfers
do not permit two write transfers to be overlapped.
Is there any reason why these instructions couldn't be removed when targeting Cortex M?
I'm not aware of whether Cortex M4 can be used in a multi-cpu/multi-core configuration, but in general:
Memory barriers are never necessary (can always be omitted) in single-core systems.
Memory barriers are always necessary (can never be omitted) in multi-core systems where threads/processes operating on the same memory may be running on different cores.
Presence or lack of reordering memory writes at the hardware level is irrelevant.
Of course I would expect the DMB instruction to be essentially free on chips that don't support SMP, so I'm not sure why you'd want to try to hack it out.
Please note that, since the question is about the code the compiler produces for atomic intrinsics, I'm assuming the context is synchronization of atomics so that they match the high-level specification, not other uses such as IO barriers for MMIO. The "never" above should not be read as applying to that (unrelated) use (though I suspect, for the reasons you already cited, it doesn't apply to Cortex M4).
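If the goal is simply to avoid the barriers, a less hacky route than editing the generated assembly may be to ask the compiler for a weaker memory order. The sketch below is an illustration under that assumption, not something taken from the question: `fetch_add_relaxed` is a made-up helper built on the standard C11 `atomic_compare_exchange_weak_explicit` with `memory_order_relaxed`. With relaxed ordering, GCC for ARM typically emits the LDREX/STREX loop without the surrounding DMB instructions, but that should be checked in the generated assembly for the toolchain in use.

```c
#include <stdatomic.h>

/*
 * Atomically add delta to *addr and return the previous value, using a
 * compare-exchange loop with relaxed ordering. The operation is still
 * atomic; it just imposes no ordering on surrounding memory accesses.
 */
int fetch_add_relaxed(atomic_int *addr, int delta)
{
    int old = atomic_load_explicit(addr, memory_order_relaxed);
    /* The weak form may fail spuriously, so retry; on failure "old" is
       refreshed with the current value of *addr. */
    while (!atomic_compare_exchange_weak_explicit(addr, &old, old + delta,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
    }
    return old;
}
```

Relaxed ordering is only appropriate when nothing else depends on the order in which this update becomes visible relative to other memory accesses; code that relies on the default sequentially consistent behaviour should keep it.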

inline assembly in lldb expression evaluation?

I try to use inline assembly in LLDB's expression and it does not seem to work. As a toy example I run on ARM:
(lldb) expr __asm__ __volatile__("mov r0, 4");
(lldb) register read
General Purpose Registers:
r0 = 0x00000003
In reality I need to access the special CP15 c13 Software Thread ID registers and could not find another way of doing it within LLDB, so any ideas here would be appreciated. I thought of using
expr __asm__ __volatile__("MRC p15, 0, r0, c13, c0, 3");
It runs, but has no effect on the content of r0.
Thanks a lot!
An expression that writes to a register won't actually change register state. Before running an expression, lldb saves the register state away, and then it runs your expression, then restores the register state when it is done. In almost all cases, you don't want whatever change was made to the registers in order to run some complex expression to get inherited by the program as it continues, or it will just crash.
If you need to change register state explicitly, then you have to use register write.
I know that wasn't the point of your exercise; this comment is more to explain why that approach didn't work.

How to measure elapsed time on ARM Cortex-M4 processor in C? [duplicate]

This question already has answers here:
Cycle counter on ARM Cortex M4 (or M3)?
I'm using an STM32F429 with an ARM Cortex-M4 processor. I should say up front that I don't know ARM assembly, but I need to optimize the code. I read the solution to
How to measure program execution time in ARM Cortex-A8 processor?
which is what I need, but that solution is for the Cortex-A8. On a whim, I tried the code from the link above in my own code, but I get a SEGV at this point:
if (enable_divider)
    value |= 8; // enable "by 64" divider for CCNT.
value |= 16;
// program the performance-counter control-register:
asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value)); /*<---Here I have SEGV error*/
// enable all counters:
asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));
// clear overflows:
asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
How can I adjust this assembly code to work on the ARM Cortex-M4?
Ditch the Cortex-A8 method.
This is the correct way to do it for most Cortex-M based microcontrollers (do not use SysTick!):
Set up a timer, which runs at the same speed as the CPU.
Do not attach an interrupt to the timer.
Poll the timer value by using a single LDR instruction before you start your measuring.
Execute a NOP instruction, then run the code you want to measure.
Execute a NOP instruction, then poll the timer value by using a single LDR instruction when you end your measuring.
The NOP instructions are for accuracy, in order to make sure the pipelining does not disturb your results.
This is necessary on the Cortex-M3, because one LDR instruction takes two clock cycles. Two contiguous LDR instructions can be pipelined, so they take only 3 clock cycles total.
See the Cortex-M4 Technical Reference Manual at the ARM Information Center, for more information on the instruction set timing.
Of course, you should run your code from internal SRAM, in order to make sure it's not slowed down by the slow Flash memory.
I cannot guarantee that this will be 100% cycle-accurate on all devices, but it should get very close. You should also know that this is intended to be used in an environment with no interrupts.
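The polling steps above can be sketched in C roughly as follows. This is an illustration under assumptions, not a drop-in routine: `measure_ticks`, `ticks_elapsed` and `workload_fn` are made-up names, and `timer_count` is assumed to point at the count register of a free-running 32-bit up-counter that has already been configured to run at the CPU clock (which timer that is, and its address, depend on the device). `fake_counter` and `demo_work` are host-side stand-ins included only so the arithmetic can be tried without hardware.

```c
#include <stdint.h>

typedef void (*workload_fn)(void);

/* Elapsed ticks between two samples of a free-running 32-bit up-counter.
   Unsigned subtraction is modulo 2^32, so a single wrap-around between
   the two samples still gives the right difference. */
uint32_t ticks_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Sample the timer, run the code under test, sample the timer again.
   The NOPs follow the answer's advice to separate the loads from the
   measured code. The result includes the indirect call overhead. */
uint32_t measure_ticks(volatile const uint32_t *timer_count, workload_fn work)
{
    uint32_t start = *timer_count;   /* one load of the count register */
    __asm__ volatile("nop");
    work();
    __asm__ volatile("nop");
    uint32_t end = *timer_count;     /* second load of the count register */
    return ticks_elapsed(start, end);
}

/* Host-side stand-ins (assumptions for off-target testing only): a fake
   count register close to overflow and a workload that advances it. */
volatile uint32_t fake_counter = 0xFFFFFFF0u;
void demo_work(void) { fake_counter += 0x20u; }
```

On the target, the call would pass the real timer's count register instead of `&fake_counter`, with interrupts disabled as the answer notes. Measuring an empty workload first gives a baseline for the call and NOP overhead that can be subtracted from later results.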
