Enable ARM errata in inline assembly - c

I'm trying to learn asm by enabling workarounds for errata in a driver. That should be possible because kernel code is executed in the privileged world. The (minimalistic) code looks as follows.
unsigned int cp15c15 = 0, result = 0;
__asm__ volatile("mrc p15, 0, %0, c15, c0, 1" : "=r" (cp15c15));
cp15c15 |= (1<<22); /* Errata 845369 */
__asm__ volatile("mcr p15, 0, %0, c15, c0, 1" : "+r" (cp15c15));
This seems to be working, but when I read the register multiple times, I sometimes get a value without bit 22 enabled. (For example 0x000001 instead of 0x400001).
char buf[11]; /* "0x" + 8 hex digits + terminating NUL */
__asm__ volatile("mrc p15, 0, %0, c15, c0, 1" : "=r" (cp15c15));
snprintf(buf, sizeof(buf), "0x%.8x", cp15c15);
copy_to_user(buffer, buf, 10);
I think I'm doing something wrong in the asm call. If someone can give me insight into why this only works 10% of the time, I would really appreciate it. (Asm is kind of cool).
EDIT:
The assembly code from the original NXP errata description:
MRC p15,0,rt,c15,c0,1
ORR rt,rt,#0x00400000
MCR p15,0,rt,c15,c0,1
EDIT 2:
If I enable this in ./linux/arch/arm/mm/proc-v7.S, the bit remains set when I read it from my driver. However, if I disable it there, the bit seems to switch off and on irregularly. That seems to correlate with when the bit is set.

If the computer has multiple processor cores (like some i.MX 6 chips), then unless you arrange for this code to run on all processor cores, the core it runs on is at the whim of the Linux scheduler.
Therefore you might set the register on one CPU; if the check then runs on that same CPU the modification will show, but if the check runs on a different CPU the original value will be read.
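The effect described above can be modeled in plain C. This is only a simulation under stated assumptions: NUM_CORES, soc_model_t and the function names are hypothetical, and the array stands in for the per-core copies of the CP15 register. In a real driver the kernel's on_each_cpu() from <linux/smp.h> is one way to run a function on every online core, although setting the bit in proc-v7.S (as in EDIT 2) also covers cores that come online later.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 4u                  /* hypothetical core count */
#define ERRATA_845369_BIT (1u << 22)  /* the bit set in the question */

/* Simulated per-core copies of the CP15 c15 diagnostic register. */
typedef struct {
    uint32_t diag[NUM_CORES];
} soc_model_t;

/* What the driver does today: the write lands only on the core it runs on. */
void enable_on_one_core(soc_model_t *soc, unsigned core)
{
    assert(core < NUM_CORES);
    soc->diag[core] |= ERRATA_845369_BIT;
}

/* What a fix has to achieve: the same read-modify-write on every core. */
void enable_on_all_cores(soc_model_t *soc)
{
    for (unsigned core = 0; core < NUM_CORES; core++)
        enable_on_one_core(soc, core);
}

/* Count the cores on which a later read would show the bit set. */
unsigned cores_with_bit_set(const soc_model_t *soc)
{
    unsigned count = 0;
    for (unsigned core = 0; core < NUM_CORES; core++)
        if (soc->diag[core] & ERRATA_845369_BIT)
            count++;
    return count;
}
```

In this model, a read that happens to be scheduled on any core other than the patched one still returns the original value, which would look like the bit switching on and off.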

Related

SAM4S bootloader not running application

I am trying to develop a custom bootloader for the Atmel SAM4S, but am not meeting with much success.
I have seen a few other posts on forums around this issue, but having tried solutions from each post I have found, it's time for my own post.
So far my bootloader receives data over UART, and writes it into flash, starting at address 0x00410000, however it fails to launch my program as expected.
Written based on numerous forum posts and also the Atmel bootloader example, this is my jumpToApp function:
void jumpToApp(void)
{
uint32_t loop;
// // Disable IRQ
Disable_global_interrupt();
__disable_irq();
// Disable system timer
SysTick->CTRL = 0;
// //Disable IRQs
for (loop = 0; loop < 8; loop++)
{
NVIC->ICER[loop] = 0xFFFFFFFF;
}
// Clear pending IRQs
for (loop = 0; loop < 8; loop++)
{
NVIC->ICPR[loop] = 0xFFFFFFFF;
}
// -- Modify vector table location
// Barriers
__DSB();
__ISB();
// Change the vector table
SCB->VTOR = ((uint32_t)FLASH_APP_START_ADDR & SCB_VTOR_TBLOFF_Msk);
// Barriers
__DSB();
__ISB();
// -- Enable interrupts
__enable_irq();
// -- Execute application
//------------------------------------------
__asm volatile("movw r1, #0x4100 \n"
"mov.w r1, r1, lsl #8 \n"
"ldr r0, [r1, #4] \n"
"ldr sp, [r1] \n"
"blx r0"
);
}
I have also tried this C based approach:
uint32_t v=FLASH_APP_START_ADDR;
asm volatile ("ldr sp,[%0,#0]": "=r" (v) : "0" (v));
typedef int(*fn_prt_t)(void);
fn_prt_t main_prog;
main_prog = (fn_prt_t)(appStartAddress);
main_prog();
and this alternative:
__DSB();
__ISB();
__set_MSP(*(uint32_t *) FLASH_APP_START_ADDR);
/* Rebase the vector table base address */
SCB->VTOR = ((uint32_t) FLASH_APP_START_ADDR & SCB_VTOR_TBLOFF_Msk);
__DSB();
__ISB();
__enable_irq();
/* Jump to application Reset Handler in the application */
asm("bx %0"::"r"(appStartAddress));
All with the same results, which makes sense as they all do pretty much the same thing.
Debugging the flash sector, I can see that the initial stack pointer in the new vector table is 0x20003b50, and the reset vector address is 0x004104d5, which both seem reasonable.
Stepping through, when execution is meant to jump to my application, I can see that the Program Counter is sat at 0x004104D0, which is close to, but not the reset vector address it should be.
The Stack Pointer also appears to be slightly off, reading 0x20003B28 instead of 0x20003b50.
Disassembly shows execution sitting at:
004104D0 b #-4
and never moving away.
Given that I have tried so many variations on jumping to the application in flash that seem to have worked for others, I am beginning to think it's my flash data at fault.
It is taken directly from the Intel Hex file generated by Atmel Studio 7.0, compiled with the linker offset flag:
-Wl,--section-start=.text=0x00410000
I am new to bootloaders, and have reached the limits of my understanding of low level cpu execution to provide more observation than the above, so any help or observations would be greatly appreciated!
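When comparing the debugger's PC with the reset vector quoted above, it may help to remember that Cortex-M vector table entries store the handler address with bit 0 set to mark Thumb state, so the vector is always one more than the address of the first instruction. The following is a small host-side sketch for decoding the first two words of an image; the type and function names are hypothetical and not part of any Atmel library.

```c
#include <stdint.h>
#include <stdbool.h>

/* First two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;   /* word 0: initial main stack pointer */
    uint32_t reset_vector; /* word 1: reset handler address, bit 0 = Thumb */
} vector_head_t;

/* Copy the two leading words out of a vector table image. */
vector_head_t read_vector_head(const uint32_t *table)
{
    vector_head_t head = { table[0], table[1] };
    return head;
}

/* Address of the first instruction the core fetches for this vector. */
uint32_t vector_code_address(uint32_t vector)
{
    return vector & ~1u;
}

/* A usable Cortex-M vector must have the Thumb bit set. */
bool vector_is_thumb(uint32_t vector)
{
    return (vector & 1u) != 0;
}
```

For the values in this question, the reset handler's first instruction would be at 0x004104d4, which is four bytes past the 0x004104D0 where the debugger shows execution sitting.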

Flush/Invalidate range by virtual address; ARMv8; Cache;

I'm implementing cache maintenance functions for ARMv8 (Cortex-A53) running in 32 bit mode.
There is a problem when I try to flush a memory region by using virtual addresses (VA). DCacheFlushByRange looks like this:
// some init.
// kDCacheL1 = 0; kDCacheL2 = 2;
while (alignedVirtAddr < endAddr)
{
// Flushing L1
asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r"(kDCacheL1) :); // select cache
isb();
asm volatile("mcr p15, 0, %0, c7, c14, 1" : : "r"(alignedVirtAddr) :); // clean & invalidate
dsb();
// Flushing L2
asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r"(kDCacheL2) :); // select cache
isb();
asm volatile("mcr p15, 0, %0, c7, c14, 1" : : "r"(alignedVirtAddr) :); // clean & invalidate
dsb();
alignedVirtAddr += lineSize;
}
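The loop depends on alignedVirtAddr and endAddr being set up in the elided initialization. One possible way to compute the line-aligned start and the number of lines covered is sketched below; the 64-byte line size and the helper names are assumptions, not the original code.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumed D-cache line size in bytes */

/* Round an address down to the start of its cache line. */
uintptr_t align_down_to_line(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Number of cache lines touched by the byte range [start, start + size). */
size_t lines_in_range(uintptr_t start, size_t size)
{
    if (size == 0)
        return 0;

    uintptr_t first = align_down_to_line(start);
    uintptr_t last = align_down_to_line(start + size - 1u);
    return (size_t)((last - first) / CACHE_LINE_SIZE) + 1u;
}
```

A range that starts or ends in the middle of a line still requires maintenance on the whole line, which is why an unaligned two-byte range can span two lines.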
DMA is used to validate the functions. DMA copies one buffer into another. The source buffer is flushed before the DMA, and the destination buffer is invalidated after DMA completion. Buffers are 64-byte aligned. The test:
for (uint32_t i = 0; i < kBufSize; i++)
buf1[i] = 0;
for (uint32_t i = 0; i < kBufSize; i++)
buf0[i] = kRefValue;
DCacheFlushByRange(buf0, sizeof(buf0));
// run DMA
// ... wait for DMA completion ...
DCacheInvalidateByRange(buf1, sizeof(buf1));
compare(buf0, buf1);
In the dump I can see that buf1 still contains only zeroes. When the caches are turned off, the result is correct, so the DMA itself works correctly.
Another point: when the whole D-cache is flushed/invalidated by set/way, the result is correct.
// loop through ways & sets for L1 & L2
asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r"(setway) :);
In short, flush/invalidate by set/way works correctly, but flushing/invalidating by VA does not. What could be the problem?
PS: kBufSize = 4096; the total buffer size is 4096 * sizeof(uint32_t) == 16KB
The problem is not in the function itself but in a feature of the Cortex-A53 cache implementation.
From Cortex-A53 TRM
DCIMVAC operations in AArch32 and DC IVAC instructions in AArch64 perform an invalidate of the target address. If the data is dirty within the cluster then a clean is performed before the invalidate.
So there is no actual invalidate; there is only clean and invalidate.
The normal (at least for me) sequence is:
flush(src);
dma(); // copy src -> dst
invalidate(dst);
But because invalidate() also cleans dirty lines, the old data from the cache (dst region) is written on top of the data that the DMA transfer placed in memory.
The solution/workaround is:
flush(src);
invalidate(dst);
dma(); // copy src -> dst
invalidate(dst); // again, that's right*.
* Data from the 'dst' memory region could be fetched into the cache in advance. If that happens before the DMA puts its data in memory, the old data from the cache would be used. The second invalidate is safe: since the data is not marked 'dirty', it is performed as a pure invalidate, with no clean/flush.
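The ordering problem can be reproduced with a toy model of a single write-back cache line. This is a sketch for reasoning only, not Cortex-A53 code: line_model_t and all function names are hypothetical, and invalidate_a53() simply implements the TRM behavior quoted above (clean first if the line is dirty, then invalidate).

```c
#include <stdbool.h>
#include <stdint.h>

/* One word of RAM plus the cache line that may shadow it. */
typedef struct {
    uint32_t memory; /* value in RAM, as seen by the DMA engine */
    uint32_t cached; /* value in the cache line */
    bool valid;
    bool dirty;
} line_model_t;

/* CPU store: allocates the line and marks it dirty (write-back cache). */
void cpu_write(line_model_t *l, uint32_t v)
{
    l->cached = v;
    l->valid = true;
    l->dirty = true;
}

/* CPU load: refills from RAM on a miss. */
uint32_t cpu_read(line_model_t *l)
{
    if (!l->valid) {
        l->cached = l->memory;
        l->valid = true;
        l->dirty = false;
    }
    return l->cached;
}

/* DMA store: goes straight to RAM and does not touch the cache. */
void dma_write(line_model_t *l, uint32_t v)
{
    l->memory = v;
}

/* Invalidate as described in the TRM quote: a dirty line is cleaned first. */
void invalidate_a53(line_model_t *l)
{
    if (l->valid && l->dirty)
        l->memory = l->cached;
    l->valid = false;
    l->dirty = false;
}

/* dma(); invalidate(dst): the stale dirty line overwrites the DMA data. */
uint32_t naive_sequence(uint32_t old_value, uint32_t dma_value)
{
    line_model_t dst = { 0 };
    cpu_write(&dst, old_value);
    dma_write(&dst, dma_value);
    invalidate_a53(&dst);
    return cpu_read(&dst);
}

/* invalidate(dst); dma(); invalidate(dst): the DMA data survives. */
uint32_t fixed_sequence(uint32_t old_value, uint32_t dma_value)
{
    line_model_t dst = { 0 };
    cpu_write(&dst, old_value);
    invalidate_a53(&dst); /* writes the dirty line back before the DMA */
    dma_write(&dst, dma_value);
    invalidate_a53(&dst); /* line is not dirty: pure invalidate */
    return cpu_read(&dst);
}
```

In the model, the naive order returns the old value that the CPU wrote before the transfer, which matches the all-zero buf1 seen in the dump.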

ARM v6 IRQ context switch

I'm trying to write a context switch in a timer interrupt handler. Currently, the context switch is able to switch between contexts on command (cooperative). In the interrupt handler, I was trying to:
1. Save the current program counter as the place the old thread needs to keep executing
2. Switch into SVC mode to actually perform the context switch
3. Switch back into IRQ mode and change the link register to be the saved PC from the new thread
4. Return from the IRQ handler to the IRQ link register
I believe I can do the first two properly, but I was wondering: how can I switch back into interrupt mode, or at least modify the SVC R13 and R15 from the interrupt handler?
I'm using an ARM v6 processor; thanks so much for the help!
Edit: here's basically what my switch is:
void interrupt_yield() {
unsigned int old_mode;
__asm__("mrs %0, cpsr" : "=r" (old_mode));
__asm__("msr cpsr_c, %0" : : "r" (MODE_SVC));
PUSH_ALL; // Macro for push {r0-r12, lr}
__asm__("mov %0, sp" : "=r"(sp));
manager->threads[manager->current_thread].sp = sp;
unsigned nt = (manager->current_thread + 1) % manager->thread_counter;
if (CURRENT_THREAD.status == ACTIVE) {
CURRENT_THREAD.status = INACTIVE;
}
manager->current_thread = nt;
CURRENT_THREAD.status = ACTIVE;
SET_SP(CURRENT_THREAD.sp);
POP_ALL;
__asm__("msr cpsr, %0" : : "r" (old_mode));
}
void timer_vector() { // This is called by assembly in interrupt mode
armtimer_clear_interrupt(); // clear timer interrupt
interrupt_yield(); // Calls above function
}
The goal is to change the IRQ link register to return to the new function. I can't seem to switch back into interrupt mode, however, to do this.
1 more edit: I never actually switch the IRQ link register; I realize this but am not even switching back into IRQ mode so this is a later problem to fix.
For the ARMv6 you need to change modes to get the banked registers. Your sample code already has many of the necessary details.
#define MODE_IRQ 0x12
#define MODE_SVC 0x13
unsigned int mode; /* original mode */
/* target data... */
unsigned int lr_irq;
unsigned int sp_irq;
unsigned int spsr;
asm (" mrs %0, cpsr\n" /* Save mode. */
" msr cpsr_c,%4 \n" /* to irq mode */
" mov %1, lr\n" /* Get lr_irq */
" mov %2, sp\n" /* Get sp_irq */
" mrs %3, spsr\n" /* Get spsr_irq */
" msr cpsr, %0\n" /* back to old mode */
: "=&r" (mode), "=r"(lr_irq),
"=r"(sp_irq), "=r"(spsr)
: "I" (MODE_IRQ));
gcc will allocate the lr_irq etc to general registers (non-banked) and you can transfer the data across modes. The ARMv7 with virtualization extensions has an instruction to avoid this switch.
You should be aware that the timer interrupt could occur in many contexts. It is probably prudent to at least check the spsr from the IRQ mode and have some debug (assert like) that verifies it is user mode. If this never triggers and you think an IRQ can only happen in user mode then the 'debug' can be removed.
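The check on the saved status register can be written as a couple of small helpers. This is a sketch: MODE_IRQ and MODE_SVC are the values used above, MODE_USR (0x10) is the architectural user-mode encoding, and the function names are hypothetical. psr_with_mode() also shows how to change only the mode field so that the interrupt mask bits of the current status register are preserved.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_MODE_MASK 0x1fu /* mode field, bits [4:0] of CPSR/SPSR */
#define MODE_USR 0x10u
#define MODE_IRQ 0x12u
#define MODE_SVC 0x13u

/* Extract the processor mode from a CPSR or SPSR value. */
uint32_t psr_mode(uint32_t psr)
{
    return psr & PSR_MODE_MASK;
}

/* True if the IRQ was taken while the core was running in user mode. */
bool interrupted_user_mode(uint32_t spsr_irq)
{
    return psr_mode(spsr_irq) == MODE_USR;
}

/* Replace only the mode field, keeping flags and the I/F mask bits. */
uint32_t psr_with_mode(uint32_t psr, uint32_t mode)
{
    return (psr & ~PSR_MODE_MASK) | (mode & PSR_MODE_MASK);
}
```

A debug assertion in the handler could then be as simple as checking interrupted_user_mode(spsr) before touching the thread state.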
Another method is to do this in the assembler of the IRQ handler and pass the values to the interrupt_yield() routine in r0-r2, for instance. The ARM EABI passes the first four arguments in r0-r3, so interrupt_yield() would need to take them as parameters. Once you have this data there should be no need to return to IRQ mode. I highly recommend this method for production code. The above is good for prototyping.
Related: Explicitly accessing banked registers on ARM

Why this function can't be static in Linux driver

I have some assembly code encapsulated in a static function of my driver code. My code looks like this:
static int _ARMVAtoPA(void *pvAddr)
{
__asm__ __volatile__(
/* ; INTERRUPTS_OFF" */
" mrs r2, CPSR;\n" /* r2 saves current status */
"CPSID iaf;\n" /* Disable interrupts */
/*In order to handle PAGE OUT scenario, we need do the same operation
twice. In the first time, if PAGE OUT happens for the input address,
translation abort will happen and OS will do PAGE IN operation
Then the second time will succeed.
*/
"mcr p15, 0, r0, c7, c8, 0;\n "
/* ; get VA = <Rn> and run nonsecure translation
; with nonsecure privileged read permission.
; if the selected translation table has privileged
; read permission, the PA is loaded in the PA
; Register, otherwise abort information is loaded
; in the PA Register.
*/
/* read in <Rd> the PA value */
"mrc p15, 0, r1, c7, c4, 0;\n"
/* get VA = <Rn> and run nonsecure translation */
" mcr p15, 0, r0, c7, c8, 0;\n"
/* ; with nonsecure privileged read permission.
; if the selected translation table has privileged
; read permission, the PA is loaded in the PA
; Register, otherwise abort information is loaded
; in the PA Register.
*/
"mrc p15, 0, r0, c7, c4, 0;\n" /* read in <Rd> the PA value */
/* restore INTERRUPTS_ON/OFF status*/
"msr cpsr, r2;\n" /* re-enable interrupts */
"tst r0, #0x1;\n"
"ldr r2, =0xffffffff;\n"
/* if error happens,return INVALID_PHYSICAL_ADDRESS */
"movne r0, r2;\n"
"biceq r0, r0, #0xff;\n"
"biceq r0, r0, #0xf00;" /* if ok, clear the flag bits */
);
}
static unsigned long CpuUmAddrToCpuPAddr(void *pvCpuUmAddr)
{
int phyAdrs;
int mask = 0xFFF; /* low 12bit */
int offset = (int)pvCpuUmAddr & mask;
int phyAdrsReg = _ARMVAtoPA((void *)pvCpuUmAddr);
if (INVALID_PHYSICAL_ADDRESS != phyAdrsReg)
phyAdrs = (phyAdrsReg & (~mask)) + offset;
else
phyAdrs = INVALID_PHYSICAL_ADDRESS;
return phyAdrs;
}
As you can see, I am trying to convert a virtual address that comes from user space to a physical address. I'm porting this code from another project; my only change is making _ARMVAtoPA a static function.
When I use static int _ARMVAtoPA(void *pvAddr):
the conversion function (the one with the block of assembly in it) always returns 0xffffffff, which is the error case.
When I use int _ARMVAtoPA(void *pvAddr):
the conversion function works fine.
Can anyone explain why the results vary between the static and non-static function?
Thanks
The asm code doesn't declare which register holds the function argument pvAddr or which register holds the return value. It just assumes the compiler follows the ARM calling convention (argument in r0, return value in r0).
But if the function is inlined (which static makes likely), the register allocation may change, so the asm code can be totally wrong.
To fix the problem, use GCC's extended asm operands to bind the function argument and the return value to registers. Also declare which registers the asm clobbers, so the compiler can save and restore them around the asm if the function is inlined.
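One way to follow this advice is to keep the asm down to the two coprocessor accesses, pass the address and the result through explicit operands, and do the flag test and masking in C. The sketch below assumes a 32-bit ARM target for the asm part. par_to_phys() and va_to_pa() are hypothetical names, and the interrupt masking and the repeated translation from the original code are left out for brevity.

```c
#include <stdint.h>

#define INVALID_PHYSICAL_ADDRESS 0xffffffffu

/*
 * Decode the value read back from the PA register: bit 0 set means the
 * translation aborted; otherwise the upper bits hold the physical page
 * and the low 12 bits are taken from the virtual address.
 */
uint32_t par_to_phys(uint32_t par, uint32_t va)
{
    if (par & 1u)
        return INVALID_PHYSICAL_ADDRESS;
    return (par & ~0xfffu) | (va & 0xfffu);
}

#if defined(__GNUC__) && defined(__arm__)
/*
 * The compiler picks the registers for %0 and %1, so this stays correct
 * whether or not the function is inlined.
 */
uint32_t va_to_pa(const void *addr)
{
    uint32_t va = (uint32_t)(uintptr_t)addr;
    uint32_t par;

    __asm__ __volatile__(
        "mcr p15, 0, %1, c7, c8, 0\n\t" /* start the VA to PA translation */
        "mrc p15, 0, %0, c7, c4, 0"     /* read the PA register */
        : "=r"(par)
        : "r"(va));

    return par_to_phys(par, va);
}
#endif
```

Because no register is named explicitly, there is nothing to list as clobbered, and the C part can be checked on any host.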

Measure executing time on ARM Cortex-A8 using hardware counter

I'm using an Exynos 3110 processor (1 GHz single-core ARM Cortex-A8, e.g. used in the Nexus S) and am trying to measure the execution times of particular functions. I have Android 4.0.3 running on the Nexus S. I tried the method from
[1] How to measure program execution time in ARM Cortex-A8 processor?
I loaded the kernel module to allow reading the register values in user mode. I am using the following program to test the counter:
static inline unsigned int get_cyclecount (void)
{
unsigned int value;
// Read CCNT Register
asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));
return value;
}
static inline void init_perfcounters (int do_reset, int enable_divider)
{
// in general enable all counters (including cycle counter)
int value = 1;
// perform reset:
if (do_reset)
{
value |= 2; // reset all event counters to zero.
value |= 4; // reset cycle counter to zero.
}
if (enable_divider)
value |= 8; // enable "by 64" divider for CCNT.
value |= 16; // enable export of events (PMCR.X)
// program the performance-counter control-register:
asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));
// enable all counters:
asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));
// clear overflows:
asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}
int main(int argc, char **argv)
{
int i = 0;
unsigned int start = 0;
unsigned int end = 0;
printf("Hello Counter\n");
init_perfcounters(1,0);
for(i=0;i<10;i++)
{
start = get_cyclecount();
sleep(1); // sleep one second
end = get_cyclecount();
printf("%u %u %u\n", start, end, end - start);
}
return 0;
}
According to [1] the counter is incremented with each clock cycle. I switched the scaling_governor to userspace and set the CPU frequency to 1GHz to make sure that the clock frequency is not changed by Android.
If I run the program, the one-second sleeps are executed, but the counter values are in the range of ~200e6 instead of the expected 1e9. Is there anything processor-specific I am missing here? Is the clock rate of the counters different from the clock rate of the processor?
Check out this professor's page: http://users.ece.utexas.edu/~valvano/arm/
He has multiple full example programs that have to do with time/periodic-timers/measuring-execution-time, they are developed for ARM Cortex-M3 based microcontrollers. I hope this isn't very different from what you are working on.
I think you would be interested in Performance.c
Are you sure governors are used in Android for performance management the same way as in standard Linux? And are you using a custom Android image or one provided by the manufacturer? I would assume there are lower-level policies in place in a manufacturer-provided image (tied to sleeps or modem activity, etc.). It could also be that the sleep code directly scales voltage and frequency. It might be worthwhile to disable CPUFreq entirely, not just the policies (or governors).
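When checking numbers like these, it can help to separate the arithmetic from the hardware access. The sketch below rebuilds the control-register value used by init_perfcounters() from named bits and converts two counter readings into CPU cycles. The macro and function names are hypothetical; the bit positions are the ones used in the question's code.

```c
#include <stdint.h>

#define PMCR_ENABLE     (1u << 0) /* enable all counters */
#define PMCR_RESET_EVT  (1u << 1) /* reset event counters */
#define PMCR_RESET_CCNT (1u << 2) /* reset cycle counter */
#define PMCR_DIV64      (1u << 3) /* cycle counter ticks every 64 cycles */
#define PMCR_EXPORT     (1u << 4) /* export enable */

/* Same value that init_perfcounters() writes to the control register. */
uint32_t pmcr_value(int do_reset, int enable_divider)
{
    uint32_t value = PMCR_ENABLE | PMCR_EXPORT;

    if (do_reset)
        value |= PMCR_RESET_EVT | PMCR_RESET_CCNT;
    if (enable_divider)
        value |= PMCR_DIV64;
    return value;
}

/* Unsigned subtraction stays correct across one 32-bit wraparound. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert counter ticks to CPU cycles, accounting for the divider. */
uint64_t ticks_to_cycles(uint32_t ticks, int divider_enabled)
{
    return (uint64_t)ticks * (divider_enabled ? 64u : 1u);
}
```

With the divider off, as in init_perfcounters(1,0), a full second at 1 GHz would be expected to show about 1e9 ticks, so a reading near 200e6 means only about a fifth of those cycles were counted.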
