Unhandled exception 0xC0000008: An invalid handle was specified in dynamic recompiler - c

The code is a MIPS->ARM dynamic recompiler. After recompile_function() has run many times, it crashes at the if condition in the code below, even though the same line executed without any problem on earlier calls.
void recompile_function(){
//recompilation code
......
if (out > (u_char *)((u_char *)base_addr + (1 << TARGET_SIZE_2) - MAX_OUTPUT_BLOCK_SIZE - JUMP_TABLE_SIZE))
out = (u_char *)base_addr;
// other code
......
}
The variable out is the pointer used to write the recompiled code, and base_addr always points to the start of the allocated memory. out advances by 4 bytes each time an instruction is written, while base_addr never changes.
extern char extra_memory[33554432];
#define BASE_ADDR ((int)(&extra_memory))
void *base_addr;
u_char *out;
void new_dynarec_init()
{
protect_readwrite();
base_addr = ((int)(&extra_memory));
out = (u_char *)base_addr;
}
The error is "Unhandled exception at 0x7738EC9F (ntdll.dll) in frontend.exe: 0xC0000008: An invalid handle was specified."
This is the disassembly around the faulting instruction.
#if NEW_DYNAREC == NEW_DYNAREC_ARM
__clear_cache((void *)beginning, out);
53830242 ldr r1,[r9]
53830246 add r3,r4,r5,lsl #2
5383024A mov r0,r7
5383024C str r3,[r2]
5383024E blx __clear_cache_bugfix (537D19DCh)
//cacheflush((void *)beginning,out,0);
#endif
// If we're within 256K of the end of the buffer,
// start over from the beginning. (Is 256K enough?)
if (out > (u_char *)((u_char *)base_addr + (1 << TARGET_SIZE_2) - MAX_OUTPUT_BLOCK_SIZE - JUMP_TABLE_SIZE))
53830252 mov r2,#0xAA98
53830256 movt r2,#0x5462
5383025A ldr r3,new_recompile_block+0A1E8h (53830550h)
5383025C ldr r4,[r2]
5383025E ldr r2,[r9]
53830262 add r3,r3,r4
53830264 cmp r2,r3
53830266 bls new_recompile_block+9F06h (5383026Eh)
out = (u_char *)base_addr;
53830268 mov r2,r4
5383026A str r4,[r9]
That is the line the debugger stops on, and the disassembly window points to the same line. If I choose to continue, a new error pops up and the program crashes at the "__fastfail" line in function __report_gsfailure. The new error is "Unhandled exception at 0x53831547 (mupen64plus.dll) in frontend.exe: Stack cookie instrumentation code detected a stack-based buffer overrun". 0x53831546 is the address of the "__fastfail" line.
#pragma warning(push)
#pragma warning(disable: 4100) // unreferenced formal parameter
__declspec(noreturn) void __cdecl __report_gsfailure(GSFAILURE_PARAMETER)
{
5383153C push {r0,r1}
5383153E push {r11,lr}
53831542 mov r11,sp
__fastfail(FAST_FAIL_STACK_COOKIE_CHECK_FAILURE);
53831544 movs r0,#2
53831546 __fastfail
}
// Declare stub for rangecheckfailure, since these occur often enough that the
// code bloat of setting up the parameters hurts performance
__declspec(noreturn) void __cdecl __report_rangecheckfailure()
{
53831548 push {r11,lr}
5383154C mov r11,sp
__report_securityfailure(FAST_FAIL_RANGE_CHECK_FAILURE);
5383154E movs r0,#8
53831550 bl __report_securityfailure (53831558h)
53831554 __debugbreak
The PC register is 53831546, so the execution point is __fastfail.

The error is caused by the __clear_cache call located just above the crashing condition. Disabling that function call fixed the crash.
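Removing the flush entirely can leave stale instructions in the instruction cache on ARM, so a replacement may be preferable to no flush at all. The following is only a sketch of one possible wrapper, assuming the recompiler runs as an ordinary Win32 process where FlushInstructionCache and GetCurrentProcess are available; flush_code_range is a hypothetical name, and the non-Windows branch uses the GCC/Clang builtin purely so the sketch builds elsewhere.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical wrapper: flush the instruction cache for [begin, end).
   Returns 0 on success, -1 on bad arguments or failure. */
static int flush_code_range(void *begin, void *end)
{
    if (begin == NULL || end == NULL || (char *)end < (char *)begin)
        return -1;
#ifdef _WIN32
    /* GetCurrentProcess() returns a pseudo-handle that is always valid
       for the calling process. */
    size_t len = (size_t)((char *)end - (char *)begin);
    return FlushInstructionCache(GetCurrentProcess(), begin, len) ? 0 : -1;
#else
    /* GCC/Clang builtin; a no-op on targets with coherent caches. */
    __builtin___clear_cache((char *)begin, (char *)end);
    return 0;
#endif
}
```

In the recompiler this would be called as flush_code_range((void *)beginning, out) in place of the __clear_cache call.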

APM32 C Copy & Execute function in RAM

Hi, I am using an APM32F003 with the Keil uVision compiler.
It is a little-known microcontroller, but it is compatible with STM32.
I would like to place functions in RAM for different purposes.
I don't want to use a linker attribute to assign the function to RAM;
instead I want to copy a function stored in flash into RAM at run time.
Below is the code I am trying to write, but for now it is not working.
I suspect it is not possible this way. Is that right?
static volatile uint8_t m_buffer_ram[100];
void flash_function()
{
/* Example */
LED2_ON();
}
void flash_function_end()
{
}
void call_function_in_ram()
{
uint32_t size = (uint32_t) flash_function_end - (uint32_t) flash_function;
/* clone function in RAM */
for (uint32_t i = 0; i < size; i++)
m_buffer_ram[i] = (((uint8_t*)&flash_function)[i]);
__disable_irq();
/* cast buffer to function pointer */
void(*func_ptr)(void) = (void(*)(void)) (&m_buffer_ram);
/* call function in ram */
func_ptr();
__enable_irq();
}
Eugene asked if your function is relocatable. This is very important. I have had issues in the past wherein I copied a function from flash to RAM, and the compiler used an absolute address in the "flash" based function. Therefore the code which was running in RAM jumped back into the flash. This is just one example of what might go wrong with moving code which is not relocatable.
If you have a debugger that can disassemble and also step through the compiled code for you, that would be ideal.
Note also that "the busybee" pointed out that code which is adjacent in the source is not guaranteed to be adjacent in the compiled binary, so your method of finding the size of the code is not reliable.
You can look in the map file to determine the size of the function.
I agree with the comment that you would be better off learning to have the linker do the work for you.
None of what I am saying here is new; I am just reinforcing the comments made above.
CODE
static volatile uint8_t m_buffer_ram[200];
static uint32_t m_function_size;
void flash_function(void)
{
LED2_ON();
}
void flash_function_end(void)
{
}
void test(void)
{
m_function_size = (uint32_t) flash_function_end - (uint32_t) flash_function;
/* clone function in RAM */
for (uint16_t i = 0; i < m_function_size; i++)
m_buffer_ram[i] = (((uint8_t*)&flash_function)[i]);
__disable_irq();
/* cast buffer to function pointer, +1 Thumb Code */
void(*func_ptr)(void) = (void(*)(void)) (&m_buffer_ram[1]);
/* call function in ram */
func_ptr();
__enable_irq();
}
MAP
Image Symbol Table
Symbol Name Value Ov Type Size Object(Section)
Local Symbols
.....
m_function_size 0x20000024 Data 4 test.o(.data)
m_buffer_ram 0x200001f0 Data 200 test.o(.bss)
Global Symbols
.....
flash_function 0x00000399 Thumb Code 12 test.o(i.flash_function)
flash_function_end 0x000003a9 Thumb Code 2 test.o(i.flash_function_end)
Memory Map of the image
Exec Addr Load Addr Size Type Attr Idx E Section Name Object
.....
0x00000398 0x00000398 0x00000010 Code RO 355 i.flash_function test.o
0x000003a8 0x000003a8 0x00000002 Code RO 356 i.flash_function_end test.o
DISASSEMBLE
.....
30: m_function_size = (uint32_t) flash_function_end - (uint32_t) flash_function;
31:
0x00000462 480D LDR r0,[pc,#52] ; #0x00000498
0x00000464 4A0D LDR r2,[pc,#52] ; #0x0000049C
0x00000466 4C0E LDR r4,[pc,#56] ; #0x000004A0
0x00000468 1A81 SUBS r1,r0,r2
0x0000046A 6021 STR r1,[r4,#0x00]
32: for (uint16_t i = 0; i < m_function_size; i++)
0x0000046C 2000 MOVS r0,#0x00
33: m_buffer_ram[i] = (((uint8_t*)&flash_function)[i]);
34:
0x0000046E 4B0D LDR r3,[pc,#52] ; #0x000004A4
0x00000470 2900 CMP r1,#0x00
0x00000472 D905 BLS 0x00000480
33: m_buffer_ram[i] = (((uint8_t*)&flash_function)[i]);
0x00000474 5C15 LDRB r5,[r2,r0]
0x00000476 541D STRB r5,[r3,r0]
32: for (uint16_t i = 0; i < m_function_size; i++)
0x00000478 1C40 ADDS r0,r0,#1
0x0000047A B280 UXTH r0,r0
32: for (uint16_t i = 0; i < m_function_size; i++)
0x0000047C 4288 CMP r0,r1
0x0000047E D3F9 BCC 0x00000474
35: __disable_irq();
36:
0x00000480 B672 CPSID I
37: void(*func_ptr)(void) = (void(*)(void)) (&m_buffer_ram[1]);
0x00000482 1C5B ADDS r3,r3,#1
38: func_ptr();
39:
0x00000484 4798 BLX r3
40: __enable_irq();
41:
0x00000486 B662 CPSIE I
I have reported all the information I was able to recover.
I added an offset of 1 for the Thumb code; the calculated function size matches the MAP file.
My suspicion is that under the debugger the pointer cannot jump to an address in RAM, so I switch on an LED to see whether the code runs when it is flashed and executed without the debugger.
As shown below, the values read back match:
(0x000003a8)flash_function_end - (0x00000398)flash_function = 0x10
(0x20000024)m_function_size = 0x10
func_ptr = 0x200001f1;
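One detail is visible in the map file above: flash_function is listed at 0x00000399 while its section starts at 0x00000398. Bit 0 of a Thumb function address is the state bit, not part of the code address, so copying from &flash_function starts one byte late, and the call through &m_buffer_ram[1] then executes from m_buffer_ram[0], which holds shifted bytes. A minimal sketch of handling that bit follows; the helper names are hypothetical, and it does not address the relocatability concerns raised in the answer.

```c
#include <stdint.h>
#include <string.h>

/* Address of the first instruction byte: clear bit 0 (the Thumb state bit). */
static uintptr_t thumb_code_start(uintptr_t fn_addr)
{
    return fn_addr & ~(uintptr_t)1;
}

/* Value to branch through: set bit 0 so that BLX stays in Thumb state. */
static uintptr_t thumb_entry(uintptr_t code_addr)
{
    return code_addr | (uintptr_t)1;
}

/* Hypothetical copy routine. dst must be at least 2-byte aligned and
   large enough for size bytes. Returns the value to cast to a function
   pointer. */
static uintptr_t copy_thumb_function(void *dst, uintptr_t fn_addr, size_t size)
{
    memcpy(dst, (const void *)thumb_code_start(fn_addr), size);
    return thumb_entry((uintptr_t)dst);
}
```

With the addresses from the question, thumb_code_start(0x399) gives 0x398 and thumb_entry(0x200001f0) gives 0x200001f1, so the copy would start at the true beginning of the code and land at m_buffer_ram[0].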

Volatile variable not updated despite unoptimized assembly

I'm working on a dual-core Cortex-R52 ARM chip, with an instance of FreeRTOS running in each core (AMP), and using ICCARM (IAR) as my compiler.
I need CPU1 to initialize some tasks and pass their handlers to CPU0 through shared memory. Both cores start executing at the same time, however, so CPU0 can reach the point where it uses a handler before CPU1 has created it.
One solution I tried was to create a volatile variable pdSTART at a dedicated address, which keeps CPU0 looping as long as it is equal to 0:
#pragma location = 0x100F900C
__no_init volatile uint8_t pdSTART;
while (pdSTART == 0)
{
vTaskDelay(10 / portTICK_PERIOD_MS);
}
As expected the generated assembly was as follows:
vTaskDelay(10 / portTICK_PERIOD_MS);
0xc3a: 0x200a MOVS R0, #10 ; 0xa
0xc3c: 0xf000 0xf93c BL vTaskDelay ; 0xeb8
while (pdSTART == 0)
0xc40: 0x7b28 LDRB R0, [R5, #0xc]
0xc42: 0x2800 CMP R0, #0
0xc44: 0xd0f9 BEQ.N 0xc3a
With register R5 containing the address 0x100F9000.
Using the debugger, I made sure CPU0 reaches the while condition first and enters the loop. I then had CPU1 change the value of pdSTART, which I confirmed in the memory view:
pdSTART:
0x100f'900c: 0x0000'0001 DC32 VECTOR_RBLOCK$$Base
And yet CPU0 keeps looping as if pdSTART were still 0; the value it reads is never updated, even though both the memory view and the "Watch" window of the debugger show the variable updated.
I tried explicitly writing a read from the address of pdSTART:
void func(void)
{
asm volatile ("" : : "r" (*(uint8_t *)0x100F900C));
}
But the generated assembly was the same as the while condition.
Is the old value of pdSTART saved in some kind of stack or cache? Is there a way to force it to be re-read?
Thank you.

MSP430F5xxx RTOS restore context assembler not clear

I'm trying to port the FunkOS RTOS from MSP430F2xxx to MSP430F5529. I'm using CCS 10.4 with the TI v20.2.5 LTS compiler. I have ported most of the code, but I have a problem with the RTOS taking over control. After initializing all the tasks I call the Task_StartTasks function. My problem is with the assembler part of this function.
void Task_StartTasks(void)
{
Task_SetScheduler(TRUE);
Task_Switch();
// Restore the context...
asm(" mov.w &pstCurrentTask, r12");
asm(" mov.w #r12, r1");
asm(" pop r15");
asm(" pop r14");
asm(" pop r13");
asm(" pop r12");
asm(" pop r11");
asm(" pop r10");
asm(" pop r9");
asm(" pop r8");
asm(" pop r7");
asm(" pop r6");
asm(" pop r5");
asm(" pop r4");
asm(" bic.w #0x00F0, 0(SP)");
asm(" reti");
}
pstCurrentTask is a global pointer to following structure:
typedef struct Task_Struct
{
/*! This is the basic task control block in the RTOS. It contains parameters
and state information required for a task, including stack, priority,
timeouts, entry functions, and task pending semaphore.
*/
//--[Task Control Block Entries]-----------------------------------------
WORD *pwTopStack; //!< Pointer to current stack top
WORD *pwStack; //!< Stack pointer, defined by the task.
USHORT usStackSize; //!< Size of the stack in MAU
//--[Task Definitions]---------------------------------------------------
BYTE *pacName; //!< Pointer to the name of the task (ASCII)
TASK_FUNC pfTaskFunc; //!< Pointer to the entry function
UCHAR ucPriority; //!< Task priority
TASK_STATE eState; //!< Current task state
USHORT usTimeLeft; //!< Ticks remaining in blocked/sleep state
BOOL bTimeout; //!< Indicates that an IO operation timed out
struct Task_Struct *pstNext; //!< Pointer to the next task (handled by scheduler)
} TASK_STRUCT;
Task_SetScheduler and Task_Switch make sure that pstCurrentTask is pointing to the correct task structure. As far as I understand this:
asm(" mov.w &pstCurrentTask, r12");
asm(" mov.w #r12, r1");
moves the value of pstCurrentTask (in this case, is this just the address of the structure?) into R1, which on the MSP430 is the stack pointer (why?). Then all registers are popped and the magic happens here.
I don't understand what is going on here:
asm(" bic.w #0x00F0, 0(SP)");
It would be great if someone could explain the assembler here.
Don't miss the #. The stack pointer is (re)set to pstCurrentTask->pwTopStack (since it's the first field in the structure, dereferencing the pointer to the structure will do the trick without any extra offset needed), presumably the pointer to the original stack was stored here after the registers were pushed and is now put back into place.
Then the registers are popped. At the very end reti causes two more registers to be popped: the status register and the program counter (instruction pointer), the latter resulting in a jump/return to the stored value.
But just before that happens, the bic.w clears some bits in this value on the stack, namely it turns off low-power-mode by clearing CPUOFF, OSCOFF, SCG0, SCG1. It operates on the value that is currently on the top of the stack (dereferencing SP with offset zero) which is the soon-to-be status register. This means that even if the stored status register had those bits set, indicating low power mode, they won't be set anymore when it is popped again as part of reti.
If you were to translate that line to C, this is what it would look like:
SP[0] &= ~0x00f0;
// 0x00f0 comes from (CPUOFF | OSCOFF | SCG0 | SCG1)
Note that the 0x00f0 isn't a peripheral or anything like that. It's just a bitmask used on the status register. In the manual, check chapter 2.3.1 on page 40 (entering and exiting lower power mode). A very similar command is used there, but with a sum of named constants instead of a numerical value, in our case that would be CPUOFF+OSCOFF+SCG0+SCG1 instead of 0x00f0. If you look at chapter 3.2.3 on page 46 (status register) you can see why - those 4 flags are at bits 4-7 of the status register, i.e. their values are 0x0010, 0x0020, 0x0040 and 0x0080 respectively, which when added or ORed together gives you 0x00f0.
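To make the arithmetic concrete, here is a small sketch that builds the mask from the four flag values listed above and applies it to a saved status word. The SR_ prefixed names and the function name are hypothetical, chosen so they do not clash with the vendor header's own CPUOFF, OSCOFF, SCG0 and SCG1 definitions.

```c
#include <stdint.h>

/* Status register flag values as given in the text (bits 4-7). */
enum {
    SR_CPUOFF = 0x0010,
    SR_OSCOFF = 0x0020,
    SR_SCG0   = 0x0040,
    SR_SCG1   = 0x0080
};

/* All four low-power-mode bits ORed together. */
#define SR_LPM_BITS (SR_CPUOFF | SR_OSCOFF | SR_SCG0 | SR_SCG1)

/* C equivalent of "bic.w #0x00F0, 0(SP)" applied to a saved SR value:
   clear the low-power bits and leave every other bit untouched. */
static uint16_t clear_low_power_bits(uint16_t saved_sr)
{
    return (uint16_t)(saved_sr & (uint16_t)~SR_LPM_BITS);
}
```

For example, a saved status word of 0x00F8 (all four low-power bits plus bit 3) comes back as 0x0008, so the task resumes in active mode with its other flags preserved.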

lpc 1768 Secondary Boot Loader error

I am working on lpc 1768 SBL which includes the following code to jump to user application.
#define NVIC_VectTab_FLASH (0x00000000)
#define USER_FLASH_START (0x00002000)
void NVIC_SetVectorTable(DWORD NVIC_VectTab, DWORD Offset)
{
NVIC_VECT_TABLE = NVIC_VectTab | (Offset & 0x1FFFFF80);
}
void execute_user_code(void)
{
void (*user_code_entry)(void);
/* Change the Vector Table to the USER_FLASH_START
in case the user application uses interrupts */
NVIC_SetVectorTable(NVIC_VectTab_FLASH, USER_FLASH_START);
user_code_entry = (void (*)(void))((USER_FLASH_START)+1);
user_code_entry();
}
It was working without any errors. After adding some heap memory to the code, the machine gets stuck. I tried different heap sizes, and some of them work. After some deep debugging, I found that the machine does not get stuck when the value at the first location of the application bin file is divisible by 64.
i.e.,
When I select a heap size of 0x00002E90, it generates a stack base of 0x10005240. Then stack base + stack size (0x2900) gives 0x10007B40.
I found that this value is loaded at the first location of the application bin file. It is divisible by 64, and the code runs without getting stuck.
But when I select a heap size of 0x00002E88, it generates a stack base of 0x10005238. Then stack base + stack size (0x2900) gives 0x10007B38.
This value is not divisible by 64, and the code gets stuck.
The disassembly is as follows in this case.
When stepping from address 0x00002000, it goes to the hard fault handler. In the earlier case it does not hard fault; it continues and works fine.
I cannot understand the DCW instruction or why it goes to the hard fault handler.
Can anyone tell me the reason behind this?
Executing the vector table is what you do on older ARM7/ARM9 parts (or bigger Cortex-A ones), where the vectors are instructions and the first entry is a jump to the reset handler. On Cortex-M, however, the vector table is pure data: the first entry is your initial stack pointer and the second entry is the address of the reset handler, so trying to execute it is liable to go horribly wrong.
As it happens, in this case you can actually get away with executing most of that vector table by sheer chance, because the memory layout leads to each halfword of the flash addresses becoming fairly innocuous instructions:
2: 1000 asrs r0, r0, #32
4: 20d9 movs r0, #217 ; 0xd9
6: 0000 movs r0, r0
8: 20f5 movs r0, #245 ; 0xf5
a: 0000 movs r0, r0
...
Until you eventually bumble through all the remaining NOPs to 0x20d8 where you pick up the real entry point. However, the killer is that initial stack pointer, because thanks to the RAM being higher up, you get this:
0: 7b38 ldrb r0, [r7, #12]
The lower byte of 0x7bxx is where the base register is encoded, so by varying the address you have a crapshoot as to which register that is, and furthermore whether whatever junk value is left in there also happens to be a valid address to load from. Do you feel lucky?
Anyway, in summary: Rather than call the address of the vector table directly, you need to load the second word from it, then call whatever address that contains.
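A minimal sketch of that approach follows. The struct, the helper name and the BUILD_FOR_CORTEX_M guard are hypothetical; the target-only part assumes a GCC-style toolchain for the inline assembly and is not a drop-in replacement for the original execute_user_code.

```c
#include <stdint.h>

/* First two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;     /* word 0: initial main stack pointer */
    uint32_t reset_handler;  /* word 1: reset handler address, bit 0 set for Thumb */
} vector_head_t;

/* Read the two words as data instead of branching into the table. */
static vector_head_t read_vector_head(const uint32_t *table)
{
    vector_head_t head;
    head.initial_sp    = table[0];
    head.reset_handler = table[1];
    return head;
}

#ifdef BUILD_FOR_CORTEX_M  /* hypothetical guard: target-only code */
void execute_user_code(void)
{
    vector_head_t head = read_vector_head((const uint32_t *)0x00002000u);
    void (*user_code_entry)(void) =
        (void (*)(void))(uintptr_t)head.reset_handler;

    /* Load the application's stack pointer, then call its reset handler.
       The stored address already carries the Thumb bit, so no +1 is needed. */
    __asm volatile ("msr msp, %0" : : "r" (head.initial_sp));
    user_code_entry();
}
#endif
```

With the failing image from the question, the two words would be 0x10007B38 and 0x000020D9, so the call goes to the real entry point at 0x20d8 regardless of what the stack pointer value happens to decode to as an instruction.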

Using #defined values before RAM has been initialised

I am writing the boot-up code for an ARM CPU. There is no internal RAM, but there is 1GB of DDRAM connected to the CPU, which is not directly accessible before initialisation. The code is stored in flash, initialises RAM, then copies itself and the data segment to RAM and continue execution there. My program is:
#define REG_BASE_BOOTUP 0xD0000000
#define INTER_REGS_BASE REG_BASE_BOOTUP
#define SDRAM_FTDLL_REG_DEFAULT_LEFT 0x887000
#define DRAM_BASE 0x0
#define SDRAM_FTDLL_CONFIG_LEFT_REG (DRAM_BASE+ 0x1484)
... //a lot of registers
void sdram_init() __attribute__((section(".text_sdram_init")));
void ram_init()
{
static volatile unsigned int* const sdram_ftdll_config_left_reg = (unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
... //a lot of registers assignments
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
At the moment my program does not work correctly, because the register values end up being linked into RAM, and at the point where the program tries to access them only the flash is usable.
How could I change my linker script or my program so that those values have their address in flash? Is there a way I can have those values in the text segment?
Also, are those #defined values global or static data when they are declared at file scope?
Edit:
The object file is linked with the following linker script:
MEMORY
{
RAM (rw) : ORIGIN = 0x00001000, LENGTH = 12M-4K
ROM (rx) : ORIGIN = 0x007f1000, LENGTH = 60K
VECTOR (rx) : ORIGIN = 0x007f0000, LENGTH = 4K
}
SECTIONS
{
.startup :
{
KEEP((.text.vectors))
sdram_init.o(.sdram_init)
} > VECTOR
...
}
Disassembly from the register assignment:
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
7f0068: e59f3204 ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: e5932000 ldr r2, [r3]
7f0070: e59f3200 ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: e5823000 str r3, [r2]
...
7f0274: 007f2304 .word 0x007f2304
7f0278: 00887000 .word 0x00887000
To answer your question directly -- #defined values are not stored in the program anywhere (besides possibly in debug sections). Macros are expanded at compile time as if you'd typed them out in the function, something like:
*((unsigned int *) 0xd0010000) = 0x800f800f;
The values do end up in the text segment, as part of your compiled code.
What's much more likely here is that there's something else you're doing wrong. Off the top of my head, my first guess would be that your stack isn't initialized properly, or is located in a memory region that isn't available yet.
There are a few options to solve this problem.
Use PC relative data access.
Use a custom linker script.
Use assembler.
Use PC relative data access
The trouble with this method is that you must know details of how the compiler will generate code. The problem is not a macro such as #define register1 (volatile unsigned int *)0xd0010000UL; it is that the pointer is declared static, so it is stored as a static variable which is loaded from its linked SDRAM address.
7f0068: ldr r3, [pc, #516] ; 7f0274 <sdram_init+0x254>
7f006c: ldr r2, [r3] ; !! This is a problem !!
7f0070: ldr r3, [pc, #512] ; 7f0278 <sdram_init+0x258>
7f0074: str r3, [r2]
...
7f0274: .word 0x007f2304 ; !! This memory doesn't exist.
7f0278: .word 0x00887000
You must do this,
void ram_init()
{
/* NO 'static', you can not do that. */
/* static */ volatile unsigned int* const sdram_ftdll_config_left_reg =
(unsigned int*)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
*sdram_ftdll_config_left_reg = SDRAM_FTDLL_REG_DEFAULT_LEFT;
}
Or you may prefer to implement this in assembler, as it is fairly obscure what you can and can't do here in C. The main effect of the above C code is that everything is calculated or PC relative. If you opt not to use a linker script, this must be the case. As Duskwuff points out, you can also have stack issues. If you have no ETB memory or similar that you can use as a temporary stack, then it is probably best to code this in assembler.
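As a sketch of the "everything is calculated" form, the register address can be kept as a constant expression built from the question's macros, with no pointer object in any data section. The helper names below are hypothetical, and whether the compiler emits the constant inline or as a PC-relative literal still depends on the toolchain and options.

```c
#include <stdint.h>

/* Macros from the question, with unsigned suffixes added. */
#define REG_BASE_BOOTUP              0xD0000000u
#define INTER_REGS_BASE              REG_BASE_BOOTUP
#define DRAM_BASE                    0x0u
#define SDRAM_FTDLL_CONFIG_LEFT_REG  (DRAM_BASE + 0x1484u)
#define SDRAM_FTDLL_REG_DEFAULT_LEFT 0x887000u

/* Hypothetical helper: one 32-bit write through a volatile pointer. */
static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* The register address is a compile-time constant; nothing is read
   back from a data section to obtain it. */
static inline uintptr_t ftdll_left_addr(void)
{
    return (uintptr_t)(INTER_REGS_BASE + SDRAM_FTDLL_CONFIG_LEFT_REG);
}

/* Target-only: performs the real register write. */
void ram_init_sketch(void)
{
    reg_write32(ftdll_left_addr(), SDRAM_FTDLL_REG_DEFAULT_LEFT);
}
```

The macros fold to 0xD0001484, the same address used for FTDLL_CONFIG_LEFT in the assembler table further down.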
Linker script
See gnu linker map... and many other questions on using a linker script in this case. If you want specifics, you need to give the actual addresses used by the processor. With this option you can annotate your function to specify which section it will live in. For instance,
void ram_init() __attribute__((section("FLASH")));
In this case, you would use the GNU linker's MEMORY and AT statements to put this code at the flash address where you want it to run from.
Use assembler
Assembler gives you full control over memory use. You can guarantee that no stack is used and that no non-PC-relative code is generated, and it will probably boot faster. Here is some table-driven ARM assembler I have used for the case you describe, initializing an SDRAM controller.
/* Macro for table of register writes. */
.macro DCDGEN,type,addr,data
.long \type
.long \addr
.long \data
.endm
.set FTDLL_CONFIG_LEFT, 0xD0001484
sdram_init:
DCDGEN 4, FTDLL_CONFIG_LEFT, 0x887000
1:
init_sdram_bank:
adr r0,sdram_init
adr r1,1b
1:
/* Delay. */
mov r5,#0x100
2: subs r5,r5,#1
bne 2b
ldmia r0!, {r2,r3,r4} /* Load DCD entry. */
cmp r2,#1 /* byte? */
streqb r4,[r3] /* Store byte... */
strne r4,[r3] /* Store word. */
cmp r0,r1 /* table done? */
blo 1b
bx lr
/* Dump literal pool. */
.ltorg
Assembler has many benefits. You can also clear the bss section and set up the stack with simple routines. There are many on the Internet, and you can probably code one yourself. The GNU ld script is also beneficial with assembler, as you can ensure that sections like bss are aligned and a multiple of 4, 8, etc., so that the clearing routine doesn't need special cases. Also, you will have to copy the code from flash to SDRAM after it is initialized. This is a fairly expensive, long-running task, and you can speed it up with some short assembler.
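For readers less familiar with ARM assembler, the logic of the table-driven loop above can be written out in C. This is only meant to illustrate what the loop does: a C version may use the stack, which is exactly what the assembler avoids. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry, matching the DCDGEN macro: type, address, data. */
typedef struct {
    uint32_t  type;  /* 1 = byte write, anything else = word write */
    uintptr_t addr;  /* register address */
    uint32_t  data;  /* value to store */
} reg_write_t;

/* Walk the table, with a short delay before each write as in the
   assembler loop. */
static void apply_reg_table(const reg_write_t *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Crude busy-wait, mirroring the 0x100 countdown. */
        for (volatile uint32_t delay = 0x100; delay != 0; delay--) { }

        if (table[i].type == 1)
            *(volatile uint8_t *)table[i].addr = (uint8_t)table[i].data;
        else
            *(volatile uint32_t *)table[i].addr = table[i].data;
    }
}
```

The single entry in the assembler table would correspond to { 4, 0xD0001484, 0x887000 } here.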
