Rebuild QEMU (For RISCV-GNU-TOOLCHAIN) - c

This is a follow-up from the following question Custom Instruction crashing with SIGNAL 4 (Illegal Instruction): RISC-V (32) GNU-Toolchain with QEMU (apologies if I have missed any etiquette points in advance or formatted this in an unsavoury way, as I am still new to posting here). I am using the latest version of the riscv-gnu-toolchain, so I am just wondering what caveats and differences this process will be for this platform.
I was following namely two guides to add custom RISC-V instructions to QEMU - namely https://www.ashling.com/wp-content/uploads/QEMU_CUSTOM_INST_WP.pdf and https://chowdera.com/2021/04/20210430120004272q.html.
After editing qemu/target/riscv/insn32.decode with a free opcode (following Ashling tutorial), and implementing translator function in insn_trans/trans_rvi.c.inc - QEMU hangs when calling the function as an .insn directive (I know this to work - ). Changing opcodes of existing instructions to test if disassembly changes indicated to me that QEMU was not registering the changes I made and that I didn't rebuild/recompile QEMU correctly. I simply ran make clean, reconfigured and make again in the QEMU directory to rebuild it - Is this the correct way to rebuild QEMU in this toolchain or is there something I missed.
The code to call
#include <stdio.h>
static int test_ins(int a, int b) {
int result;
asm volatile(".insn r 0x33, 7, 0x20, %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
return result;
}
int main() {
int a, b, result;
a = 2;
b = 4;
result = test_ins(a,b);
printf("%d\n", result);
}
Instruction in insn32.decode is as follows:
OPCODE = "0110011", FUNCT3 = "111" and FUNCT7 = "0100000".
Implementation of aforementioned instruction is as follows:
static bool trans_bitcnt(DisasContext *ctx, arg_bitcnt *a)
{
TCGLabel *loop_source1 = gen_new_label();
TCGLabel *loop_source2 = gen_new_label();
TCGv source1, source2, dstval, cntval;
source1 = tcg_temp_local_new();
source2 = tcg_temp_local_new();
dstval = tcg_temp_local_new();
cntval = tcg_temp_local_new();
// Count all the bits set in rs1 and rs2 and put that number in rd
gen_get_gpr(source1, a->rs1);
gen_get_gpr(source2, a->rs2);
tcg_gen_movi_tl(cntval, 0x0);
/* Count the bits that are set in the first register */
gen_set_label(loop_source1);
tcg_gen_andi_tl(dstval, source1, 0x1);
tcg_gen_shri_tl(source1, source1, 0x1);
tcg_gen_add_tl(cntval, cntval, dstval);
tcg_gen_brcondi_tl(TCG_COND_NE, source1, 0x0, loop_source1);
/* Count the bits that are set in the second register */
gen_set_label(loop_source2);
tcg_gen_andi_tl(dstval, source2, 0x1);
tcg_gen_shri_tl(source2, source2, 0x1);
tcg_gen_add_tl(cntval, cntval, dstval);
tcg_gen_brcondi_tl(TCG_COND_NE, source2, 0x0, loop_source2);
/* Update the destination register with the bits total */
gen_set_gpr(a->rd, cntval);
tcg_temp_free(source1);
tcg_temp_free(source2);
tcg_temp_free(dstval);
tcg_temp_free(cntval);
return true;
}

Related

How to do computations with addresses at compile/linking time?

I wrote some code for initializing the IDT, which stores 32-bit addresses in two non-adjacent 16-bit halves. The IDT can be stored anywhere, and you tell the CPU where by running the LIDT instruction.
This is the code for initializing the table:
void idt_init(void) {
/* Unfortunately, we can't write this as loops. The first option,
* initializing the IDT with the addresses, here looping over it, and
* reinitializing the descriptors didn't work because assigning a
* a uintptr_t (from (uintptr_t) handler_func) to a descr (a.k.a.
* uint64_t), according to the compiler, "isn't computable at load
* time."
* The second option, storing the addresses as a local array, simply is
* inefficient (took 0.020ms more when profiling with the "time" command
* line program!).
* The third option, storing the addresses as a static local array,
* consumes too much space (the array will probably never be used again
* during the whole kernel runtime).
* But IF my argument against the third option will be invalidated in
* the future, THEN it's the best option I think. */
/* Initialize descriptors of exception handlers. */
idt[EX_DE_VEC] = idt_trap(ex_de);
idt[EX_DB_VEC] = idt_trap(ex_db);
idt[EX_NMI_VEC] = idt_trap(ex_nmi);
idt[EX_BP_VEC] = idt_trap(ex_bp);
idt[EX_OF_VEC] = idt_trap(ex_of);
idt[EX_BR_VEC] = idt_trap(ex_br);
idt[EX_UD_VEC] = idt_trap(ex_ud);
idt[EX_NM_VEC] = idt_trap(ex_nm);
idt[EX_DF_VEC] = idt_trap(ex_df);
idt[9] = idt_trap(ex_res); /* unused Coprocessor Segment Overrun */
idt[EX_TS_VEC] = idt_trap(ex_ts);
idt[EX_NP_VEC] = idt_trap(ex_np);
idt[EX_SS_VEC] = idt_trap(ex_ss);
idt[EX_GP_VEC] = idt_trap(ex_gp);
idt[EX_PF_VEC] = idt_trap(ex_pf);
idt[15] = idt_trap(ex_res);
idt[EX_MF_VEC] = idt_trap(ex_mf);
idt[EX_AC_VEC] = idt_trap(ex_ac);
idt[EX_MC_VEC] = idt_trap(ex_mc);
idt[EX_XM_VEC] = idt_trap(ex_xm);
idt[EX_VE_VEC] = idt_trap(ex_ve);
/* Initialize descriptors of reserved exceptions.
* Thankfully we compile with -std=c11, so declarations within
* for-loops are possible! */
for (size_t i = 21; i < 32; ++i)
idt[i] = idt_trap(ex_res);
/* Initialize descriptors of hardware interrupt handlers (ISRs). */
idt[INT_8253_VEC] = idt_int(int_8253);
idt[INT_8042_VEC] = idt_int(int_8042);
idt[INT_CASC_VEC] = idt_int(int_casc);
idt[INT_SERIAL2_VEC] = idt_int(int_serial2);
idt[INT_SERIAL1_VEC] = idt_int(int_serial1);
idt[INT_PARALL2_VEC] = idt_int(int_parall2);
idt[INT_FLOPPY_VEC] = idt_int(int_floppy);
idt[INT_PARALL1_VEC] = idt_int(int_parall1);
idt[INT_RTC_VEC] = idt_int(int_rtc);
idt[INT_ACPI_VEC] = idt_int(int_acpi);
idt[INT_OPEN2_VEC] = idt_int(int_open2);
idt[INT_OPEN1_VEC] = idt_int(int_open1);
idt[INT_MOUSE_VEC] = idt_int(int_mouse);
idt[INT_FPU_VEC] = idt_int(int_fpu);
idt[INT_PRIM_ATA_VEC] = idt_int(int_prim_ata);
idt[INT_SEC_ATA_VEC] = idt_int(int_sec_ata);
for (size_t i = 0x30; i < IDT_SIZE; ++i)
idt[i] = idt_trap(ex_res);
}
The macros idt_trap and idt_int, and are defined as follows:
#define idt_entry(off, type, priv) \
((descr) (uintptr_t) (off) & 0xffff) | ((descr) (KERN_CODE & 0xff) << \
0x10) | ((descr) ((type) & 0x0f) << 0x28) | ((descr) ((priv) & \
0x03) << 0x2d) | (descr) 0x800000000000 | \
((descr) ((uintptr_t) (off) & 0xffff0000) << 0x30)
#define idt_int(off) idt_entry(off, 0x0e, 0x00)
#define idt_trap(off) idt_entry(off, 0x0f, 0x00)
idt is an array of uint64_t, so these macros are implicitly cast to that type. uintptr_t is the type guaranteed to be capable of holding pointer values as integers and on 32-bit systems usually 32 bits wide. (A 64-bit IDT has 16-byte entries; this code is for 32-bit).
I get the warning that the initializer element is not constant due to the address modification in play.
It is absolutely sure that the address is known at linking time.
Is there anything I can do to make this work? Making the idt array automatic would work but this would require the whole kernel to run in the context of one function and this would be some bad hassle, I think.
I could make this work by some additional work at runtime (as Linux 0.01 also does) but it just annoys me that something technically feasible at linking time is actually infeasible.
Related: Solution needed for building a static IDT and GDT at assemble/compile/link time - a linker script for ld can shift and mask to break up link-time-constant addresses. No earlier step has the final addresses, and relocation entries are limited in what they can represent in a .o.
The main problem is that function addresses are link-time constants, not strictly compile time constants. The compiler can't just get 32b binary integers and stick that into the data segment in two separate pieces. Instead, it has to use the object file format to indicate to the linker where it should fill in the final value (+ offset) of which symbol when linking is done. The common cases are as an immediate operand to an instruction, a displacement in an effective address, or a value in the data section. (But in all those cases it's still just filling in 32-bit absolute address so all 3 use the same ELF relocation type. There's a different relocation for relative displacements for jump / call offsets.)
It would have been possible for ELF to have been designed to store a symbol reference to be substituted at link time with a complex function of an address (or at least high / low halves like on MIPS for lui $t0, %hi(symbol) / ori $t0, $t0, %lo(symbol) to build address constants from two 16-bit immediates). But in fact the only function allowed is addition/subtraction, for use in things like mov eax, [ext_symbol + 16].
It is of course possible for your OS kernel binary to have a static IDT with fully resolved addresses at build time, so all you need to do at runtime is execute a single lidt instruction. However, the standard
build toolchain is an obstacle. You probably can't achieve this without post-processing your executable.
e.g. you could write it this way, to produce a table with the full padding in the final binary, so the data can be shuffled in-place:
#include <stdint.h>
#define PACKED __attribute__((packed))
// Note, this is the 32-bit format. 64-bit is larger
typedef union idt_entry {
// we will postprocess the linker output to have this format
// (or convert at runtime)
struct PACKED runtime { // from OSdev wiki
uint16_t offset_1; // offset bits 0..15
uint16_t selector; // a code segment selector in GDT or LDT
uint8_t zero; // unused, set to 0
uint8_t type_attr; // type and attributes, see below
uint16_t offset_2; // offset bits 16..31
} rt;
// linker output will be in this format
struct PACKED compiletime {
void *ptr; // offset bits 0..31
uint8_t zero;
uint8_t type_attr;
uint16_t selector; // to be swapped with the high16 of ptr
} ct;
} idt_entry;
// #define idt_ct_entry(off, type, priv) { .ptr = off, .type_attr = type, .selector = priv }
#define idt_ct_trap(off) { .ct = { .ptr = off, .type_attr = 0x0f, .selector = 0x00 } }
// generate an entry in compile-time format
extern void ex_de(); // these are the raw interrupt handlers, written in ASM
extern void ex_db(); // they have to save/restore *all* registers, and end with iret, rather than the usual C ABI.
// it might be easier to use asm macros to create this static data,
// just so it can be in the same file and you don't need cross-file prototypes / declarations
// (but all the same limitations about link-time constants apply)
static idt_entry idt[] = {
idt_ct_trap(ex_de),
idt_ct_trap(ex_db),
// ...
};
// having this static probably takes less space than instructions to write it on the fly
// but not much more. It would be easy to make a lidt function that took a struct pointer.
static const struct PACKED idt_ptr {
uint16_t len; // encoded as bytes - 1, so 0xffff means 65536
void *ptr;
} idt_ptr = { sizeof(idt) - 1, idt };
/****** functions *********/
// inline
void load_static_idt(void) {
asm volatile ("lidt %0"
: // no outputs
: "m" (idt_ptr));
// memory operand, instead of writing the addressing mode ourself, allows a RIP-relative addressing mode in 64bit mode
// also allows it to work with -masm=intel or not.
}
// Do this once at at run-time
// **OR** run this to pre-process the binary, after link time, as part of your build
void idt_convert_to_runtime(void) {
#ifdef DEBUG
static char already_done = 0; // make sure this only runs once
if (already_done)
error;
already_done = 1;
#endif
const int count = sizeof idt / sizeof idt[0];
for (int i=0 ; i<count ; i++) {
uint16_t tmp1 = idt[i].rt.selector;
uint16_t tmp2 = idt[i].rt.offset_2;
idt[i].rt.offset_2 = tmp1;
idt[i].rt.selector = tmp2;
// or do this swap in fewer insns with SSE or MMX pshufw, but using vector instructions before setting up the IDT may be insane.
}
}
This does compile. See a diff of the -m32 and -m64 asm output on the Godbolt compiler explorer. Look at the layout in the data section (note that .value is a synonym for .short, and is 16 bits.) (But note that the IDT table format is different for 64-bit mode.)
I think I have the size calculation correct (bytes - 1), as documented in http://wiki.osdev.org/Interrupt_Descriptor_Table. Minimum value 100h bytes long (encoded as 0x99). See also https://en.wikibooks.org/wiki/X86_Assembly/Global_Descriptor_Table. (lgdt size/pointer works the same way, although the table itself has a different format.)
The other option, instead of having the IDT static in the data section, is to have it in the bss section, with the data stored as immediate constants in a function that will initialize it (or in an array read by that function).
Either way, that function (and its data) can be in a .init section whose memory you re-use after it's done. (Linux does this to reclaim memory from code and data that's only needed once, at startup.) This would give you the optimal tradeoff of small binary size (since 32b addresses are smaller than 64b IDT entries), and no runtime memory wasted on code to set up the IDT. A small loop that runs once at startup is negligible CPU time. (The version on Godbolt fully unrolls because I only have 2 entries, and it embeds the address into each instruction as a 32-bit immediate, even with -Os. With a large enough table (just copy/paste to duplicate a line) you get a compact loop even at -O3. The threshold is lower for -Os.)
Without memory-reuse haxx, probably a tight loop to rewrite 64b entries in place is the way to go. Doing it at build time would be even better, but then you'd need a custom tool to run the tranformation on the kernel binary.
Having the data stored in immediates sounds good in theory, but the code for each entry would probably total more than 64b, because it couldn't loop. The code to split an address into two would have to be fully unrolled (or placed in a function and called). Even if you had a loop to store all the same-for-multiple-entries stuff, each pointer would need a mov r32, imm32 to get the address in a register, then mov word [idt+i + 0], ax / shr eax, 16 / mov word [idt+i + 6], ax. That's a lot of machine-code bytes.
One way would be to use an intermediate jump table that is located at a fixed address. You could initialize the idt with addresses of the locations in this table (which will be compile time constant). The locations in the jump table will contain jump instructions to the actual isr routines.
The dispatch to an isr will be indirect as follows:
trap -> jump to intermediate address in the idt -> jump to isr
One way to create a jump table at a fixed address is as follows.
Step 1: Put jump table in a section
// this is a jump table at a fixed address
void jump(void) __attribute__((section(".si.idt")));
void jump(void) {
asm("jmp isr0"); // can also be asm("call ...") depending on need
asm("jmp isr1");
asm("jmp isr2");
}
Step 2: Instruct the linker to locate the section at a fixed address
SECTIONS
{
.so.idt 0x600000 :
{
*(.si.idt)
}
}
Put this in the linker script right after the .text section. This will make sure that the executable code in the table will go into a executable memory region.
You can instruct the linker to use your script as follows using the --script option in the Makefile.
LDFLAGS += -Wl,--script=my_script.lds
The following macro gives you the address of the location which contains the jump (or call) instruction to the corresponding isr.
// initialize the idt at compile time with const values
// you can find a cleaner way to generate offsets
#define JUMP_ADDR(off) ((char*)0x600000 + 4 + (off * 5))
You will then initialize the idt as follows using modified macros.
// your real idt will be initialized as follows
#define idt_entry(addr, type, priv) \
( \
((descr) (uintptr_t) (addr) & 0xffff) | \
((descr) (KERN_CODE & 0xff) << 0x10) | \
((descr) ((type) & 0x0f) << 0x28) | \
((descr) ((priv) & 0x03) << 0x2d) | \
((descr) 0x1 << 0x2F) | \
((descr) ((uintptr_t) (addr) & 0xffff0000) << 0x30) \
)
#define idt_int(off) idt_entry(JUMP_ADDR(off), 0x0e, 0x00)
#define idt_trap(off) idt_entry(JUMP_ADDR(off), 0x0f, 0x00)
descr idt[] =
{
...
idt_trap(ex_de),
...
idt_int(int_casc),
...
};
A demo working example is below, which shows dispatch to a isr with a non-fixed address from a instruction at a fixed address.
#include <stdio.h>
// dummy isrs for demo
void isr0(void) {
printf("==== isr0\n");
}
void isr1(void) {
printf("==== isr1\n");
}
void isr2(void) {
printf("==== isr2\n");
}
// this is a jump table at a fixed address
void jump(void) __attribute__((section(".si.idt")));
void jump(void) {
asm("jmp isr0"); // can be asm("call ...")
asm("jmp isr1");
asm("jmp isr2");
}
// initialize the idt at compile time with const values
// you can find a cleaner way to generate offsets
#define JUMP_ADDR(off) ((char*)0x600000 + 4 + (off * 5))
// dummy idt for demo
// see below for the real idt
char* idt[] =
{
JUMP_ADDR(0),
JUMP_ADDR(1),
JUMP_ADDR(2),
};
int main(int argc, char* argv[]) {
int trap;
char* addr = idt[trap = argc - 1];
printf("==== idt[%d]=%p\n", trap, addr);
asm("jmp *%0\n" : :"m"(addr));
}

Undefined reference to in os kernel linking

i have a problem. I making simple OS kernel with this tutorial: http://wiki.osdev.org/Bare_Bones#Linking_the_Kernel
but,if i want to link files boot.o and kernel.o, gcc compiler returns this error:
boot.o: In function `start':
boot.asm:(.text+0x6): undefined reference to `kernel_main'
collect2.exe: error: ld returned 1 exit status.
sources of files:
boot.asm
; Declare constants used for creating a multiboot header.
MBALIGN equ 1<<0 ; align loaded modules on page boundaries
MEMINFO equ 1<<1 ; provide memory map
FLAGS equ MBALIGN | MEMINFO ; this is the Multiboot 'flag' field
MAGIC equ 0x1BADB002 ; 'magic number' lets bootloader find the header
CHECKSUM equ -(MAGIC + FLAGS) ; checksum of above, to prove we are multiboot
; Declare a header as in the Multiboot Standard. We put this into a special
; section so we can force the header to be in the start of the final program.
; You don't need to understand all these details as it is just magic values that
; is documented in the multiboot standard. The bootloader will search for this
; magic sequence and recognize us as a multiboot kernel.
section .multiboot
align 4
dd MAGIC
dd FLAGS
dd CHECKSUM
; Currently the stack pointer register (esp) points at anything and using it may
; cause massive harm. Instead, we'll provide our own stack. We will allocate
; room for a small temporary stack by creating a symbol at the bottom of it,
; then allocating 16384 bytes for it, and finally creating a symbol at the top.
section .bootstrap_stack
align 4
stack_bottom:
times 16384 db 0
stack_top:
; The linker script specifies _start as the entry point to the kernel and the
; bootloader will jump to this position once the kernel has been loaded. It
; doesn't make sense to return from this function as the bootloader is gone.
section .text
global _start
_start:
; Welcome to kernel mode! We now have sufficient code for the bootloader to
; load and run our operating system. It doesn't do anything interesting yet.
; Perhaps we would like to call printf("Hello, World\n"). You should now
; realize one of the profound truths about kernel mode: There is nothing
; there unless you provide it yourself. There is no printf function. There
; is no <stdio.h> header. If you want a function, you will have to code it
; yourself. And that is one of the best things about kernel development:
; you get to make the entire system yourself. You have absolute and complete
; power over the machine, there are no security restrictions, no safe
; guards, no debugging mechanisms, there is nothing but what you build.
; By now, you are perhaps tired of assembly language. You realize some
; things simply cannot be done in C, such as making the multiboot header in
; the right section and setting up the stack. However, you would like to
; write the operating system in a higher level language, such as C or C++.
; To that end, the next task is preparing the processor for execution of
; such code. C doesn't expect much at this point and we only need to set up
; a stack. Note that the processor is not fully initialized yet and stuff
; such as floating point instructions are not available yet.
; To set up a stack, we simply set the esp register to point to the top of
; our stack (as it grows downwards).
mov esp, stack_top
; We are now ready to actually execute C code. We cannot embed that in an
; assembly file, so we'll create a kernel.c file in a moment. In that file,
; we'll create a C entry point called kernel_main and call it here.
extern kernel_main
call kernel_main
; In case the function returns, we'll want to put the computer into an
; infinite loop. To do that, we use the clear interrupt ('cli') instruction
; to disable interrupts, the halt instruction ('hlt') to stop the CPU until
; the next interrupt arrives, and jumping to the halt instruction if it ever
; continues execution, just to be safe.
cli
.hang:
hlt
jmp .hang
kernel.c
#if !defined(__cplusplus)
#include <stdbool.h> /* C doesn't have booleans by default. */
#endif
#include <stddef.h>
#include <stdint.h>
/* Check if the compiler thinks if we are targeting the wrong operating system. */
#if defined(__linux__)
#error "You are not using a cross-compiler, you will most certainly run into trouble"
#endif
/* This tutorial will only work for the 32-bit ix86 targets. */
#if !defined(__i386__)
#error "This tutorial needs to be compiled with a ix86-elf compiler"
#endif
/* Hardware text mode color constants. */
enum vga_color
{
COLOR_BLACK = 0,
COLOR_BLUE = 1,
COLOR_GREEN = 2,
COLOR_CYAN = 3,
COLOR_RED = 4,
COLOR_MAGENTA = 5,
COLOR_BROWN = 6,
COLOR_LIGHT_GREY = 7,
COLOR_DARK_GREY = 8,
COLOR_LIGHT_BLUE = 9,
COLOR_LIGHT_GREEN = 10,
COLOR_LIGHT_CYAN = 11,
COLOR_LIGHT_RED = 12,
COLOR_LIGHT_MAGENTA = 13,
COLOR_LIGHT_BROWN = 14,
COLOR_WHITE = 15,
};
uint8_t make_color(enum vga_color fg, enum vga_color bg)
{
return fg | bg << 4;
}
uint16_t make_vgaentry(char c, uint8_t color)
{
uint16_t c16 = c;
uint16_t color16 = color;
return c16 | color16 << 8;
}
size_t strlen(const char* str)
{
size_t ret = 0;
while ( str[ret] != 0 )
ret++;
return ret;
}
static const size_t VGA_WIDTH = 80;
static const size_t VGA_HEIGHT = 25;
size_t terminal_row;
size_t terminal_column;
uint8_t terminal_color;
uint16_t* terminal_buffer;
void terminal_initialize()
{
terminal_row = 0;
terminal_column = 0;
terminal_color = make_color(COLOR_LIGHT_GREY, COLOR_BLACK);
terminal_buffer = (uint16_t*) 0xB8000;
for ( size_t y = 0; y < VGA_HEIGHT; y++ )
{
for ( size_t x = 0; x < VGA_WIDTH; x++ )
{
const size_t index = y * VGA_WIDTH + x;
terminal_buffer[index] = make_vgaentry(' ', terminal_color);
}
}
}
void terminal_setcolor(uint8_t color)
{
terminal_color = color;
}
void terminal_putentryat(char c, uint8_t color, size_t x, size_t y)
{
const size_t index = y * VGA_WIDTH + x;
terminal_buffer[index] = make_vgaentry(c, color);
}
void terminal_putchar(char c)
{
terminal_putentryat(c, terminal_color, terminal_column, terminal_row);
if ( ++terminal_column == VGA_WIDTH )
{
terminal_column = 0;
if ( ++terminal_row == VGA_HEIGHT )
{
terminal_row = 0;
}
}
}
void terminal_writestring(const char* data)
{
size_t datalen = strlen(data);
for ( size_t i = 0; i < datalen; i++ )
terminal_putchar(data[i]);
}
void kernel_main()
{
terminal_initialize();
/* Since there is no support for newlines in terminal_putchar yet, \n will
produce some VGA specific character instead. This is normal. */
terminal_writestring("Hello\n");
}
It looks like you’re using GCC on Microsoft® Windows® (for example, with Cygwin), judging from the collect2.exe reference. This means your native executable format, which you appear to be using, prepends an underscore to C identifiers to keep them separate from assembly identifiers, which is something most object formats, but not the ELF format wide-spread under modern Unix, does.
If you change your call to _kernel_main, the link error will likely go away.
But please note this line, quoted from your question:
#error "This tutorial needs to be compiled with a ix86-elf compiler"
You’re violating a basic tenet of the tutorial you’re using. I suggest you get a GNU/Linux or BSD VM for i386 (32-bit), and run the tutorial within that.

Address Error ISR

I am trying to run and debug a C program on a dsPIC30f3011 microcontroller. When I run my code in MPLAB, the code always tends to stop at this ISR and I am stuck with absolutely no output for any variables, with my code not even executing. It seems to be some kind of "trap" program that I assume is for catching simple mistakes (i.e. oscillator failures, etc.) I am using MPLabIDE v8.5, with an MPLab ICD3 in debug mode. It's worth mentioning that MPLAB shows that I am connected to both the target(dsPIC) and the ICD3. Can someone please give me a reason as to why this problem is occurring?
Here is the ISR:
void _ISR __attribute__((no_auto_psv))_AddressError(void)
{
INTCON1bits.ADDRERR = 0;
while(1);
}
Here is my code with initializations first, then PID use, then the DSP functions,
then the actual DSP header file where the syntax/algorithm is derived. There is also some sort of problem where I define DutyCycle.
///////////////////////////////Initializations/////////////////////////////////////////////
#include "dsp.h" //see bottom of program
tPID SPR4535_PID; // Declare a PID Data Structure named, SPR4535_PID, initialize the PID object
/* The SPR4535_PID data structure contains a pointer to derived coefficients in X-space and */
/* pointer to controller state (history) samples in Y-space. So declare variables for the */
/* derived coefficients and the controller history samples */
fractional abcCoefficient[3] __attribute__ ((space(xmemory))); // ABC Coefficients loaded from X memory
fractional controlHistory[3] __attribute__ ((space(ymemory))); // Control History loaded from Y memory
/* The abcCoefficients referenced by the SPR4535_PID data structure */
/* are derived from the gain coefficients, Kp, Ki and Kd */
/* So, declare Kp, Ki and Kd in an array */
fractional kCoeffs[] = {0,0,0};
//////////////////////////////////PID variable use///////////////////////////////
void ControlSpeed(void)
{
LimitSlew();
PID_CHANGE_SPEED(SpeedCMD);
if (timer3avg > 0)
ActualSpeed = SPEEDMULT/timer3avg;
else
ActualSpeed = 0;
max=2*(PTPER+1);
DutyCycle=Fract2Float(PID_COMPUTE(ActualSpeed))*max;
// Just make sure the speed that will be written to the PDC1 register is not greater than the PTPER register
if(DutyCycle>max)
DutyCycle=max;
else if (DutyCycle<0)
DutyCycle=0;
}
//////////////////////////////////PID functions//////////////////////////////////
void INIT_PID(int DESIRED_SPEED)
{
SPR4535_PID.abcCoefficients = &abcCoefficient[0]; //Set up pointer to derived coefficients
SPR4535_PID.controlHistory = &controlHistory[0]; //Set up pointer to controller history samples
PIDInit(&SPR4535_PID); //Clear the controller history and the controller output
kCoeffs[0] = KP; // Sets the K[0] coefficient to the KP
kCoeffs[1] = KI; // Sets the K[1] coefficient to the KI
kCoeffs[2] = KD; // Sets the K[2] coefficient to the Kd
PIDCoeffCalc(&kCoeffs[0], &SPR4535_PID); //Derive the a,b, & c coefficients from the Kp, Ki & Kd
SPR4535_PID.controlReference = DESIRED_SPEED; //Set the Reference Input for your controller
}
int PID_COMPUTE(int MEASURED_OUTPUT)
{
SPR4535_PID.measuredOutput = MEASURED_OUTPUT; // Records the measured output
PID(&SPR4535_PID);
return SPR4535_PID.controlOutput; // Computes the control output
}
void PID_CHANGE_SPEED (int NEW_SPEED)
{
SPR4535_PID.controlReference = NEW_SPEED; // Changes the control reference to change the desired speed
}
/////////////////////////////////////dsp.h/////////////////////////////////////////////////
typedef struct {
fractional* abcCoefficients; /* Pointer to A, B & C coefficients located in X-space */
/* These coefficients are derived from */
/* the PID gain values - Kp, Ki and Kd */
fractional* controlHistory; /* Pointer to 3 delay-line samples located in Y-space */
/* with the first sample being the most recent */
fractional controlOutput; /* PID Controller Output */
fractional measuredOutput; /* Measured Output sample */
fractional controlReference; /* Reference Input sample */
} tPID;
/*...........................................................................*/
extern void PIDCoeffCalc( /* Derive A, B and C coefficients using PID gain values-Kp, Ki & Kd*/
fractional* kCoeffs, /* pointer to array containing Kp, Ki & Kd in sequence */
tPID* controller /* pointer to PID data structure */
);
/*...........................................................................*/
extern void PIDInit ( /* Clear the PID state variables and output sample*/
tPID* controller /* pointer to PID data structure */
);
/*...........................................................................*/
extern fractional* PID ( /* PID Controller Function */
tPID* controller /* Pointer to PID controller data structure */
);
The dsPIC traps don't offer much information free of charge, so I tend to augment the ISRs with a little assembly language pre-prologue. (Note that the Stack Error trap is a little ropey, as it uses RCALL and RETURN instructions when the stack is already out of order.)
/**
* \file trap.s
* \brief Used to provide a little more information during development.
*
* The trapPreprologue function is called on entry to each of the routines
* defined in traps.c. It looks up the stack to find the value of the IP
* when the trap occurred and stores it in the _errAddress memory location.
*/
.global __errAddress
.global __intCon1
.global _trapPreprologue
.section .bss
__errAddress: .space 4
__intCon1: .space 2
.section .text
_trapPreprologue:
; Disable maskable interrupts and save primary regs to shadow regs
bclr INTCON2, #15 ;global interrupt disable
push.s ;Switch to shadow registers
; Retrieve the ISR return address from the stack into w0:w1
sub w15, #4, w2 ;set W2 to the ISR.PC (SP = ToS-4)
mov [--w2], w0 ;get the ISR return address LSW (ToS-6) in w0
bclr w0, #0x0 ;mask out SFA bit (w0<0>)
mov [--w2], w1 ;get the ISR return address MSW (ToS-8) in w1
bclr w1, #0x7 ;mask out IPL<3> bit (w1<7>)
ze w1, w1 ;mask out SR<7:0> bits (w1<15..8>)
; Save it
mov #__errAddress, w2 ;Move address of __errAddress into w2
mov.d w0, [w2] ;save the ISR return address to __errAddress
; Copy the content of the INTCON1 SFR into memory
mov #__intCon1, w2 ;Move address of __intCon1 into w2
mov INTCON1, WREG ;Read the trap flags into w0 (WREG)
mov w0, [w2] ;save the trap flags to __intCon1
; Return to the 'C' handler
pop.s ;Switch back to primary registers
return
Then I keep all the trap ISRs in a single traps.c file that uses the pre-prologue in traps.s. Note that the actual traps may be different for your microcontroller - check the data sheet to see which are implemented.
/**
* \file traps.c
* \brief Micro-controller exception interrupt vectors.
*/
#include <stdint.h>
#include "traps.h" // Internal interface to the micro trap handling.
/* Access to immediate call stack. Implementation in trap.s */
extern volatile unsigned long _errAddress;
extern volatile unsigned int _intCon1;
extern void trapPreprologue(void);
/* Trap information, set by the traps that use them. */
static unsigned int _intCon2;
static unsigned int _intCon3;
static unsigned int _intCon4;
/* Protected functions exposed by traps.h */
void trapsInitialise(void)
{
_errAddress = 0;
_intCon1 = 0;
_intCon2 = 0;
_intCon3 = 0;
_intCon4 = 0;
}
/* Trap Handling */
// The trap routines call the _trapPreprologue assembly routine in traps.s
// to obtain the value of the PC when the trap occurred and store it in
// the _errAddress variable. They reset the interrupt source in the CPU's
// INTCON SFR and invoke the (#defined) vThrow macro to report the fault.
void __attribute__((interrupt(preprologue("rcall _trapPreprologue")),no_auto_psv)) _OscillatorFail(void)
{
INTCON1bits.OSCFAIL = 0; /* Clear the trap flag */
vThrow(_intCon1, _intCon2, _intCon3, _intCon4, _errAddress);
}
void __attribute__((interrupt(preprologue("rcall _trapPreprologue")),no_auto_psv)) _StackError(void)
{
INTCON1bits.STKERR = 0; /* Clear the trap flag */
vThrow(_intCon1, _intCon2, _intCon3, _intCon4, _errAddress);
}
void __attribute__((interrupt(preprologue("rcall _trapPreprologue")),no_auto_psv)) _AddressError(void)
{
INTCON1bits.ADDRERR = 0; /* Clear the trap flag */
vThrow(_intCon1, _intCon2, _intCon3, _intCon4, _errAddress);
}
void __attribute__((interrupt(preprologue("rcall _trapPreprologue")),no_auto_psv)) _MathError(void)
{
INTCON1bits.MATHERR = 0; /* Clear the trap flag */
vThrow(_intCon1, _intCon2, _intCon3, _intCon4, _errAddress);
}
void __attribute__((interrupt(preprologue("rcall _trapPreprologue")),no_auto_psv)) _DMACError(void)
{
INTCON1bits.DMACERR = 0; /* Clear the trap flag */
vThrow(_intCon1, _intCon2, _intCon3, _intCon4, _errAddress);
}
void __attribute__((interrupt(preprologue("rcall _trapPreprologue")),no_auto_psv)) _HardTrapError(void)
{
_intCon4 = INTCON4;
INTCON4 = 0; // Clear the hard trap register
_intCon2 = INTCON2;
INTCON2bits.SWTRAP = 0; // Make sure the software hard trap bit is clear
vThrow(_intCon1, _intCon2, _intCon3, _intCon4, _errAddress);
}
void __attribute__((interrupt(preprologue("rcall _trapPreprologue")),no_auto_psv)) _SoftTrapError(void)
{
_intCon3 = INTCON3;
INTCON3 = 0; // Clear the soft trap register
vThrow(_intCon1, _intCon2, _intCon3, _intCon4, _errAddress);
}
Implementation of the vThrow macro is up to you. However, it should not use the stack, as this may be unavailable (so no puts() debug calls!) During development, it would be reasonable to use a simple endless loop with a NOP statement in it that you can breakpoint on.
(In a production build, my vThrow macro logs the parameters into a reserved area of RAM that is excluded from being zeroed at start-up by the linker script, and resets the microcontroller. During start-up the program inspects the reserved area and if it is non-zero records the error event for diagnostics.)
Once you get a trap, inspecting the content of the _errAddress variable will give you the return address for the ISR, which is the address immediately following the instruction that generated the interrupt. You can then inspect your MAP files to find the routine and, if you're really keen, inspect the disassembly to find the specific instruction. After that, debugging it is up to you.
As suggested in the comments, the while(1) statement is where your code is getting hung. Note however, that your code is executing - you're just in an infinite loop. That's also why you can't view your variables or current program counter. Generally when you're attached to a ucontroller via PC host, you can't view state information while the ucontroller is executing. Everything is running too fast, even on a slow one, to constantly update your screen.
To try to identify the cause, you can set a breakpoint in the ISR and reset the controller. When the breakpoint is hit, execution will halt, and you may be able to investigate your stack frames to see the last line of code executed before the ISR was triggered. This is not guaranteed though - depending on how your particular ucontroller handles interrupts, the call stack may not be continuous between normal program execution and the interrupt context.
If that doesn't work, set a breakpoint in your code before the ISR is invoked, and step through your code until it is. The last line of code you executed before the ISR will be the cause. Keep in mind, this may take some time, especially if the offending line is in the loop and only trips the ISR after a certain number of iterations.
EDIT
After posting this answer, I noticed your last comment about the linkscript warning. This is a perfect example of why you should, with very few exceptions, work just as hard to resolve warnings as you do to resolve compiler errors. Especially if you don't understand what the warning means and what caused it.
A PID algorithm involves multiplication. On a dspic, this is done via the built in hardware multiplier. This multiplier has one register which must point to xmemory space and another pointing to ymemory space. The dsp core then multiplies these two and the result can be found in the accumulator (there a two of them).
An addres error trap will be triggered if an xmemory address range is loaded into the ymemory register and viceversa. You can check this by single stepping the code in the assembly.
This is not the only instance the trap is triggered. There are also silicon bugs that can cause this, check the errata.

Writing ARM machine instructions and executing them from C (On the Raspberry pi)

I'm trying to write some self modifying code in C and ARM. I previously asked a similar question about MIPS and am now trying to port over the project to ARM.
My system := Raspbian on raspberry pi, ARMv6, GCC
There are a few things I am unsure of:
Does ARM require a D-cache write-back/I-cache invalidate (cache flush)? If so, how can we do this?
Also I tried an example
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
int inc(int x){ //increments x
uint16_t *ret = malloc(2 * sizeof(uint16_t));
*(ret + 0) = 0x3001; //add r0 1 := r0 += 1
*(ret + 1) = 0x4770; //bx lr := jump back to inc()
int(*f)(int) = (int (*)(int)) ret;
return (*f)(x);
}
int main(){
printf("%d",inc(6)); //expect '7' to be printed
exit(0);}
but I keep getting a segmentation fault. I'm using the aapcs calling convention, which I've been given to understand is the default for all ARM
I'd be much obliged if someone pointed me in the right direction
Bonus question (meaning, it doesn't really have to be answered, but would be cool to know) - I "come from a MIPS background", how the heck do ARM programmers do without a 0 register? (as in, a register hardcoded to the value 0)
Read Caches and Self-Modifying Code on blogs.arm.com. Article includes an example as well which does what you are describing.
To answer your question from article
... the ARM architecture is often considered to be a Modified Harvard Architecture. ...
The typical drawback of a pure Harvard architecture is that instruction memory is not directly accessible from the same address space as data memory, though this restriction does not apply to ARM. On ARM, you can write instructions into memory, but because the D-cache and I-cache are not coherent, the newly-written instructions might be masked by the existing contents of the I-cache, causing the processor to execute old (or possibly invalid) instructions.
See __clear_cache for how to invalidate cache(s).
I hope you are also aware of ARM/Thumb instruction sets, if you are planning to push your instructions into memory.
Ok, so this works on my raspberry Pi.
#include <stdio.h>
#include <sys/mman.h>
#include <stdint.h>
#include <stdlib.h>
int inc(int x){ //increments x
uint32_t *ret = mmap(NULL,
2 * sizeof(uint32_t), // Space for 16 instructions. (More than enough.)
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS,
-1,0);
if (ret == MAP_FAILED) {
printf("Could not mmap a memory buffer with the proper permissions.\n");
return -1;
}
*(ret + 0) = 0xE2800001; //add r0 r0 #1 := r0 += 1
*(ret + 1) = 0xE12FFF1E; //bx lr := jump back to inc()
__clear_cache((char*) ret, (char*) (ret+2));
int(*f)(int) = (int (*)(int)) ret;
return (*f)(x);
}
int main(){
printf("%d\n",inc(6)); //expect '7' to be printed
exit(0);}
There are a couple of problems.
You don't flush your D-Cache and I-Cache, so most times the I-Cache will fetch stale data from L2. Under linux there is a libc/sys-call which does that for you. Either use __clear_cache(begin, end) or _builtin_clear_cache(begin, end).
You output Thumb-Code, but you don't take care of how your code gets called. The easiest way to fix that would be to use some asm-code to do the actual blx call and OR the address with 1, as this bit sets the mode the processor runs in. As you're malloc address will always be aligned to a word boundary, making you call thumb-code in arm-mode.

I'm writing my own JIT-interpreter. How do I execute generated instructions?

I intend to write my own JIT-interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge about x86 assembly (or C for that matter).
Actually I don't know how a JIT works, but here is my take on it: Read in the program in some intermediate language. Compile that to x86 instructions. Ensure that last instruction returns to somewhere sane back in the VM code. Store the instructions some where in memory. Do an unconditional jump to the first instruction. Voila!
So, with that in mind, I have the following small C program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main() {
int *m = malloc(sizeof(int));
*m = 0x90; // NOP instruction code
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "eax");
return 42;
}
Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't setup any way for the program to return back to main).
Question: Am I on the right path?
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
Question: Other issues I should beware of?
PS: My goal is to gain understanding, not necessarily do everything the right way.
Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
unsigned char *m;
int main() {
unsigned int pagesize = getpagesize();
printf("pagesize: %u\n", pagesize);
m = malloc(1023+pagesize+1);
if(m==NULL) return(1);
printf("%p\n", m);
m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
printf("%p\n", m);
if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
printf("mprotect fail...\n");
return 0;
}
m[0] = 0xc9; //leave
m[1] = 0xc3; //ret
m[2] = 0x90; //nop
printf("%p\n", m);
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "ebx");
return 21;
}
Question: Am I on the right path?
I would say yes.
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.
Question: Other issues I should beware of?
Yes - as a security measure, many operating systems would prevent you from executing code on the heap without making special arrangements. Those special arrangements typically amount to you having to mark the relevant memory page(s) as executable.
On Linux this is done using mprotect() with PROT_EXEC.
If your generated code follows the proper calling convention, then you can declare a pointer-to-function type and invoke the function this way:
typedef void (*generated_function)(void);
void *func = malloc(1024);
unsigned char *o = (unsigned char *)func;
generated_function *func_exec = (generated_function *)func;
*o++ = 0x90; // NOP
*o++ = 0xcb; // RET
func_exec();

Resources