mbed-OS porting to TivaC TM4123, Trouple with dynamic interrupt handling - c

Recently I'm trying to port mbed-OS to Tiva-C launchpad TM4C123, I am facing problem with file supplied by mbed which is cmsis_nvic.c and cmsis_nvic.h
This module is supposed to dynamically allocate the interrupt handler of OS timer to addressable function.(Or as far as I understand).
What happen is, The software jumps to "Hard Fault Handler" after executing the following line
vectors[i] = old_vectors[i];
Here's files which I use
#include "cmsis_nvic.h"
#define NVIC_RAM_VECTOR_ADDRESS (0x02000000) // Vectors positioned at start of RAM
#define NVIC_FLASH_VECTOR_ADDRESS (0x0) // Initial vector position in flash
void NVIC_SetVector(IRQn_Type IRQn, uint32_t vector) {
uint32_t *vectors = (uint32_t*)SCB->VTOR;
uint32_t i;
// Copy and switch to dynamic vectors if the first time called
if (SCB->VTOR == NVIC_FLASH_VECTOR_ADDRESS) {
uint32_t *old_vectors = vectors;
vectors = (uint32_t*)NVIC_RAM_VECTOR_ADDRESS;
for (i=0; i<NVIC_NUM_VECTORS; i++) {
vectors[i] = old_vectors[i];
}
SCB->VTOR = (uint32_t)NVIC_RAM_VECTOR_ADDRESS;
}
vectors[IRQn + 16] = vector;
}
uint32_t NVIC_GetVector(IRQn_Type IRQn) {
uint32_t *vectors = (uint32_t*)SCB->VTOR;
return vectors[IRQn + 16];
}
And here is cmsis_nvic.h
#ifndef MBED_CMSIS_NVIC_H
#define MBED_CMSIS_NVIC_H
#define NVIC_NUM_VECTORS (154) // CORE + MCU Peripherals
#define NVIC_USER_IRQ_OFFSET 16
#include "cmsis.h"
#ifdef __cplusplus
extern "C" {
#endif
void NVIC_SetVector(IRQn_Type IRQn, uint32_t vector);
uint32_t NVIC_GetVector(IRQn_Type IRQn);
#ifdef __cplusplus
}
#endif
#endif
and I am calling
NVIC_SetVector(IRQn_Type IRQn, uint32_t vector)
from file us_ticker.c like this
NVIC_SetVector(TIMER0A_IRQn, (uint32_t)us_ticker_irq_handler);
(my compiler is ARM GCC, I am using CDT for building, And GDB openOCD for debugging, and integrated all those tools on Eclipse)
Can anyone please let me know what is going wrong here? or at least where should I debug or read to help me solve this problem???
UPDATE
I figured out part of the problem, The vector is not pointing to the first address of the target SRAM which should be
#define NVIC_RAM_VECTOR_ADDRESS (0x20000000)
instead of
#define NVIC_RAM_VECTOR_ADDRESS (0x02000000)
So now when calling NVIC_SetVector , the function is executed. But then when enabling the interrupt, Software still jumps to Hard Fault, I guess(just guessing or might be part of solution) that the defines in the header file are not configured correctly, Can someone explain to me what do they mean? and how to calculate the number of vector addresses? and what is the USER OFFSET?

I have solved this issue, here's what I have found
1- NVIC_RAM_VECTOR_ADDRESS wasn't the first address of my target RAM which should be `0x20000000'
2- Linker file should be update so that the stack pointer shouldn't write over the new copied vector table. So shift the RAM address by number of bytes that vector table should occupy.
3-(THE MAIN CAUSE) inside function NVIC_SetVector, i was declared as uint32_t and then compared to less than 255 pre-processor value. So compile get confused by comparing uint32_t with uint8_t, by adding UL to the pre-processor value, it solved the whole issue.

Related

Auto-Allocate to a specific RAM Area in GCC/C

Sorry for my english, its a bit hard for me to explain what exactly i would need.
I'm making some extra code into existing binarys using the GCC compiler.
In this case, its PowerPC, but it should not really matter.
I know, where in the existing binary i have free ram available (i dumped the full RAM to make sure) but i need to define each RAM address manually, currently i am doing it like this:
// #ram.h
//8bit ram
uint8_t* xx1 = (uint8_t*) 0x807F00;
uint8_t* xx2 = (uint8_t*) 0x807F01;
//...and so on
// 16bit ram
uint16_t* xxx1 = (uint16_t*) 0x807F40;
uint16_t* xxx2 = (uint16_t*) 0x807F42;
//...and so on
// 32bit ram
uint32_t* xxxx1 = (uint32_t*) 0x807FA0;
uint32_t* xxxx2 = (uint32_t*) 0x807FA4;
//...and so on
And im accessing my variables like this:
void __attribute__ ((noinline)) silly_demo_function() {
#include "ram.h"
if (*xxx2>*xx1) {
*xxx3 = *xxx3 + *xx1;
}
return;
}
But this gets really boring, if i want to patch my code into another existing binary, where the location of available/free/unused ram can be fully different, or if im replacing/removing some value in the middle. I am using 8, 16 and 32bit variables.
Is there a way, i can define an area like 0x807F00 to 0x00808FFF, and allocate my variables on the fly, and the compiler will allocate it inside my specific location?
I suspect that the big problem here is that those addresses are memory mapped IO (devices) and not RAM; and should not be treated as RAM.
Further, I'd say that you probably should be hiding the "devices that aren't RAM" behind an abstract layer, a little bit like a device driver; partly so that you can make sure that the compiler complies with any constraints caused by it being IO and not RAM (e.g. treated as volatile, possibly honoring any access size restrictions, possibly taking care of any cache coherency management); partly so that you/programmers know what is normal/fast/cached RAM and what isn't; partly so that you can replace the "device" with fake code for testing; and partly so that it's all kept in a single well defined area.
For example; you might have a header file called "src/devices.h" that contains:
#define xx1_address 0x807F00
..and the wrapper code might be a file called "src/devices/xx1.c" that contains something like:
#include "src/devices.h"
static volatile uint8_t * xx1 = (uint8_t*) xx1_address;
uint8_t get_xx1(void) {
return *xx1;
}
void set_xx1(uint8_t x) {
*xx1 = x;
}
However; depending on what these devices actually are, you might need/want some higher level code. For example, maybe xx1 is a temperature sensor and it doesn't make any sense to try to set it, and you want it to scale that raw value so it's "degrees celsius", and the highest bit of the raw value is used to indicate an error condition (and the actual temperature is only 7 bits), so the wrapper might be more like:
#include "src/devices.h"
#define xx1_offset -12.34
#define xx1_scale 1.234
static volatile uint8_t * xx1 = (uint8_t*) xx1_address;
float get_xx1_temperature(void) {
uint8_t raw_temp = *xx1;
if(raw_temp * 0x80 != 0) {
/* Error flag set */
return NAN;
}
/* No error */
return (raw_temp + xx1_offset) * xx1_scale;
}
In the meanwhile, i figured it out.
Its just as easy as defining .data, .bsss and .sbss in the linker directives.
6 Lines of code and its working like a charm.

Can not able to read data from custom AXI peripheral register

I am working with a Zynq board where a custom AXI 4 lite slave peripheral is created and then added from the IP Repository.
And created a synthesizable custom IP in vivado (which is sine wave IP)and also wrote a C code for reading this IP output ( i want to read a data from the register). But somehow it shows something diff. instead of what I expect.
Here I'm attaching a screenshot and my c code for that.
But in teraterm it shows some garbage memory state.
Here, I'm expecting a sinewave output. ( In digit form )
Pls, suggest me correction or suggestion about Where could I have gone wrong or what have I missed in C code?.
#include "xil_printf.h"
#include "xil_io.h"
#include "xparameters.h"
#include "xil_types.h"
#include "xparameters_ps.h"
#include <stdio.h>
//Definitions for peripheral MYIPINETHREE_0 //
#define XPAR_ MYIPINETHREE_0_DEVICE_ID 0
#define XPAR_ MYIPINETHREE_0_S00_AXI_BASEADDR 0x43C00000
#define XPAR_ MYIPINETHREE_0_S00_AXI_HIGHADDR 0x43C0FFFF
int main(){
u32 baseaddr;
int sine, sinephase, enable,reg ;
while (1)
{
xil_printf("start of ip test\r");
if (enable == 1)
reg = 0xFFFFFFFF;
else
reg = 0x00000000;
Xil_Out32(0x43C00000, 32 );
sine = Xil_In32(baseaddr+4);
xil_printf("\r state: %d", sine);
Xil_Out32(0x43C00000, 32);
sinephase = Xil_In32(baseaddr+4);
xil_printf("\r state: %d", sinephase);
return 0;
}
}
To start with: you never initialize baseaddr but use it to read from.
Also I can't say because I have no idea how to verify that your addresses are correct. Normally you should use the defines from your xparameters.h file where the Xilinx board package program puts them. I don't see that happening here.
I am somewhat suspicious as all my Xilinx AXI addresses start with 0x800... but then I might be because I am using a different FPGA.
Please add a board layout or some additional information. You should use the addresses from xparameters.hinstead of hard coding them into your source code. The address space depends on your AXI master interface.

Uart receives correct Bytes but in chaotic order

Using Atmel studio 7, with STK600 and 32UC3C MCU
I'm pulling my hair over this.
I'm sending strings of a variable size over UART once every 5 seconds. The String consists of one letter as opcode, then two chars are following that tell the lenght of the following datastring (without the zero, there is never a zero at the end of any of those strings). In most cases the string will be 3 chars in size, because it has no data ("p00").
After investigation I found out that what supposed to be "p00" was in fact "0p0" or "00p" or (only at first try after restarting the micro "p00"). I looked it up in the memory view of the debugger. Then I started hTerm and confirmed that the data was in fact "p00". So after a while hTerm showed me "p00p00p00p00p00p00p00..." while the memory of my circular uart buffer reads "p000p000p0p000p0p000p0p0..."
edit: Actually "0p0" and "00p" are alternating.
The baud rate is 9600. In the past I was only sending single letters. So everything was running well.
This is the code of the Receiver Interrupt:
I tried different variations in code that were all doing the same in a different way. But all of them showed the exact same behavior.
lastWebCMDWritePtr is a uint8_t* type and so is lastWebCMDRingstartPtr.
lastWebCMDRingRXLen is a uint8_t type.
__attribute__((__interrupt__))
void UartISR_forWebserver()
{
*(lastWebCMDWritePtr++) = (uint8_t)((&AVR32_USART0)->rhr & 0x1ff);
lastWebCMDRingRXLen++;
if(lastWebCMDWritePtr - lastWebCMDRingstartPtr > lastWebCMDRingBufferSIZE)
{
lastWebCMDWritePtr = lastWebCMDRingstartPtr;
}
// Variation 2:
// advanceFifo((uint8_t)((&AVR32_USART0)->rhr & 0x1ff));
// Variation 3:
// if(usart_read_char(&AVR32_USART0, getReadPointer()) == USART_RX_ERROR)
// {
// usart_reset_status(&AVR32_USART0);
// }
//
};
I welcome any of your ideas and advices.
Regarts Someo
P.S. I put the Atmel studio tag in case this has something to do with the myriad of debugger bugs of AS.
For a complete picture you would have to show where and how lastWebCMDWritePtr, lastWebCMDRingRXLen, lastWebCMDRingstartPtr and lastWebCMDRingBufferSIZE are used elsewhere (on the consuming side)
Also I would first try a simpler ISR with no dependencies to other software modules to exclude a hardware resp. register handling problem.
Approach:
#define USART_DEBUG
#define DEBUG_BUF_SIZE 30
__attribute__((__interrupt__))
void UartISR_forWebserver()
{
uint8_t rec_byte;
#ifdef USART_DEBUG
static volatile uint8_t usart_debug_buf[DEBUG_BUF_SIZE]; //circular buffer for debugging
static volatile int usart_debug_buf_index = 0;
#endif
rec_byte = (uint8_t)((&AVR32_USART0)->rhr & 0x1ff);
#ifdef USART_DEBUG
usart_debug_buf_index = usart_debug_buf_index % DEBUG_BUF_SIZE;
usart_debug_buf[usart_debug_buf_index] = rec_byte;
usart_debug_buf_index++
if (!(usart_debug_buf_index < DEBUG_BUF_SIZE)) {
usart_debug_buf_index = 0; //candidate for a breakpoint to see what happened in the past
}
#endif
//uart_recfifo_enqueue(rec_byte);
};

How to do computations with addresses at compile/linking time?

I wrote some code for initializing the IDT, which stores 32-bit addresses in two non-adjacent 16-bit halves. The IDT can be stored anywhere, and you tell the CPU where by running the LIDT instruction.
This is the code for initializing the table:
void idt_init(void) {
/* Unfortunately, we can't write this as loops. The first option,
* initializing the IDT with the addresses, here looping over it, and
* reinitializing the descriptors didn't work because assigning a
* a uintptr_t (from (uintptr_t) handler_func) to a descr (a.k.a.
* uint64_t), according to the compiler, "isn't computable at load
* time."
* The second option, storing the addresses as a local array, simply is
* inefficient (took 0.020ms more when profiling with the "time" command
* line program!).
* The third option, storing the addresses as a static local array,
* consumes too much space (the array will probably never be used again
* during the whole kernel runtime).
* But IF my argument against the third option will be invalidated in
* the future, THEN it's the best option I think. */
/* Initialize descriptors of exception handlers. */
idt[EX_DE_VEC] = idt_trap(ex_de);
idt[EX_DB_VEC] = idt_trap(ex_db);
idt[EX_NMI_VEC] = idt_trap(ex_nmi);
idt[EX_BP_VEC] = idt_trap(ex_bp);
idt[EX_OF_VEC] = idt_trap(ex_of);
idt[EX_BR_VEC] = idt_trap(ex_br);
idt[EX_UD_VEC] = idt_trap(ex_ud);
idt[EX_NM_VEC] = idt_trap(ex_nm);
idt[EX_DF_VEC] = idt_trap(ex_df);
idt[9] = idt_trap(ex_res); /* unused Coprocessor Segment Overrun */
idt[EX_TS_VEC] = idt_trap(ex_ts);
idt[EX_NP_VEC] = idt_trap(ex_np);
idt[EX_SS_VEC] = idt_trap(ex_ss);
idt[EX_GP_VEC] = idt_trap(ex_gp);
idt[EX_PF_VEC] = idt_trap(ex_pf);
idt[15] = idt_trap(ex_res);
idt[EX_MF_VEC] = idt_trap(ex_mf);
idt[EX_AC_VEC] = idt_trap(ex_ac);
idt[EX_MC_VEC] = idt_trap(ex_mc);
idt[EX_XM_VEC] = idt_trap(ex_xm);
idt[EX_VE_VEC] = idt_trap(ex_ve);
/* Initialize descriptors of reserved exceptions.
* Thankfully we compile with -std=c11, so declarations within
* for-loops are possible! */
for (size_t i = 21; i < 32; ++i)
idt[i] = idt_trap(ex_res);
/* Initialize descriptors of hardware interrupt handlers (ISRs). */
idt[INT_8253_VEC] = idt_int(int_8253);
idt[INT_8042_VEC] = idt_int(int_8042);
idt[INT_CASC_VEC] = idt_int(int_casc);
idt[INT_SERIAL2_VEC] = idt_int(int_serial2);
idt[INT_SERIAL1_VEC] = idt_int(int_serial1);
idt[INT_PARALL2_VEC] = idt_int(int_parall2);
idt[INT_FLOPPY_VEC] = idt_int(int_floppy);
idt[INT_PARALL1_VEC] = idt_int(int_parall1);
idt[INT_RTC_VEC] = idt_int(int_rtc);
idt[INT_ACPI_VEC] = idt_int(int_acpi);
idt[INT_OPEN2_VEC] = idt_int(int_open2);
idt[INT_OPEN1_VEC] = idt_int(int_open1);
idt[INT_MOUSE_VEC] = idt_int(int_mouse);
idt[INT_FPU_VEC] = idt_int(int_fpu);
idt[INT_PRIM_ATA_VEC] = idt_int(int_prim_ata);
idt[INT_SEC_ATA_VEC] = idt_int(int_sec_ata);
for (size_t i = 0x30; i < IDT_SIZE; ++i)
idt[i] = idt_trap(ex_res);
}
The macros idt_trap and idt_int, and are defined as follows:
#define idt_entry(off, type, priv) \
((descr) (uintptr_t) (off) & 0xffff) | ((descr) (KERN_CODE & 0xff) << \
0x10) | ((descr) ((type) & 0x0f) << 0x28) | ((descr) ((priv) & \
0x03) << 0x2d) | (descr) 0x800000000000 | \
((descr) ((uintptr_t) (off) & 0xffff0000) << 0x30)
#define idt_int(off) idt_entry(off, 0x0e, 0x00)
#define idt_trap(off) idt_entry(off, 0x0f, 0x00)
idt is an array of uint64_t, so these macros are implicitly cast to that type. uintptr_t is the type guaranteed to be capable of holding pointer values as integers and on 32-bit systems usually 32 bits wide. (A 64-bit IDT has 16-byte entries; this code is for 32-bit).
I get the warning that the initializer element is not constant due to the address modification in play.
It is absolutely sure that the address is known at linking time.
Is there anything I can do to make this work? Making the idt array automatic would work but this would require the whole kernel to run in the context of one function and this would be some bad hassle, I think.
I could make this work by some additional work at runtime (as Linux 0.01 also does) but it just annoys me that something technically feasible at linking time is actually infeasible.
Related: Solution needed for building a static IDT and GDT at assemble/compile/link time - a linker script for ld can shift and mask to break up link-time-constant addresses. No earlier step has the final addresses, and relocation entries are limited in what they can represent in a .o.
The main problem is that function addresses are link-time constants, not strictly compile time constants. The compiler can't just get 32b binary integers and stick that into the data segment in two separate pieces. Instead, it has to use the object file format to indicate to the linker where it should fill in the final value (+ offset) of which symbol when linking is done. The common cases are as an immediate operand to an instruction, a displacement in an effective address, or a value in the data section. (But in all those cases it's still just filling in 32-bit absolute address so all 3 use the same ELF relocation type. There's a different relocation for relative displacements for jump / call offsets.)
It would have been possible for ELF to have been designed to store a symbol reference to be substituted at link time with a complex function of an address (or at least high / low halves like on MIPS for lui $t0, %hi(symbol) / ori $t0, $t0, %lo(symbol) to build address constants from two 16-bit immediates). But in fact the only function allowed is addition/subtraction, for use in things like mov eax, [ext_symbol + 16].
It is of course possible for your OS kernel binary to have a static IDT with fully resolved addresses at build time, so all you need to do at runtime is execute a single lidt instruction. However, the standard
build toolchain is an obstacle. You probably can't achieve this without post-processing your executable.
e.g. you could write it this way, to produce a table with the full padding in the final binary, so the data can be shuffled in-place:
#include <stdint.h>
#define PACKED __attribute__((packed))
// Note, this is the 32-bit format. 64-bit is larger
typedef union idt_entry {
// we will postprocess the linker output to have this format
// (or convert at runtime)
struct PACKED runtime { // from OSdev wiki
uint16_t offset_1; // offset bits 0..15
uint16_t selector; // a code segment selector in GDT or LDT
uint8_t zero; // unused, set to 0
uint8_t type_attr; // type and attributes, see below
uint16_t offset_2; // offset bits 16..31
} rt;
// linker output will be in this format
struct PACKED compiletime {
void *ptr; // offset bits 0..31
uint8_t zero;
uint8_t type_attr;
uint16_t selector; // to be swapped with the high16 of ptr
} ct;
} idt_entry;
// #define idt_ct_entry(off, type, priv) { .ptr = off, .type_attr = type, .selector = priv }
#define idt_ct_trap(off) { .ct = { .ptr = off, .type_attr = 0x0f, .selector = 0x00 } }
// generate an entry in compile-time format
extern void ex_de(); // these are the raw interrupt handlers, written in ASM
extern void ex_db(); // they have to save/restore *all* registers, and end with iret, rather than the usual C ABI.
// it might be easier to use asm macros to create this static data,
// just so it can be in the same file and you don't need cross-file prototypes / declarations
// (but all the same limitations about link-time constants apply)
static idt_entry idt[] = {
idt_ct_trap(ex_de),
idt_ct_trap(ex_db),
// ...
};
// having this static probably takes less space than instructions to write it on the fly
// but not much more. It would be easy to make a lidt function that took a struct pointer.
static const struct PACKED idt_ptr {
uint16_t len; // encoded as bytes - 1, so 0xffff means 65536
void *ptr;
} idt_ptr = { sizeof(idt) - 1, idt };
/****** functions *********/
// inline
void load_static_idt(void) {
asm volatile ("lidt %0"
: // no outputs
: "m" (idt_ptr));
// memory operand, instead of writing the addressing mode ourself, allows a RIP-relative addressing mode in 64bit mode
// also allows it to work with -masm=intel or not.
}
// Do this once at at run-time
// **OR** run this to pre-process the binary, after link time, as part of your build
void idt_convert_to_runtime(void) {
#ifdef DEBUG
static char already_done = 0; // make sure this only runs once
if (already_done)
error;
already_done = 1;
#endif
const int count = sizeof idt / sizeof idt[0];
for (int i=0 ; i<count ; i++) {
uint16_t tmp1 = idt[i].rt.selector;
uint16_t tmp2 = idt[i].rt.offset_2;
idt[i].rt.offset_2 = tmp1;
idt[i].rt.selector = tmp2;
// or do this swap in fewer insns with SSE or MMX pshufw, but using vector instructions before setting up the IDT may be insane.
}
}
This does compile. See a diff of the -m32 and -m64 asm output on the Godbolt compiler explorer. Look at the layout in the data section (note that .value is a synonym for .short, and is 16 bits.) (But note that the IDT table format is different for 64-bit mode.)
I think I have the size calculation correct (bytes - 1), as documented in http://wiki.osdev.org/Interrupt_Descriptor_Table. Minimum value 100h bytes long (encoded as 0x99). See also https://en.wikibooks.org/wiki/X86_Assembly/Global_Descriptor_Table. (lgdt size/pointer works the same way, although the table itself has a different format.)
The other option, instead of having the IDT static in the data section, is to have it in the bss section, with the data stored as immediate constants in a function that will initialize it (or in an array read by that function).
Either way, that function (and its data) can be in a .init section whose memory you re-use after it's done. (Linux does this to reclaim memory from code and data that's only needed once, at startup.) This would give you the optimal tradeoff of small binary size (since 32b addresses are smaller than 64b IDT entries), and no runtime memory wasted on code to set up the IDT. A small loop that runs once at startup is negligible CPU time. (The version on Godbolt fully unrolls because I only have 2 entries, and it embeds the address into each instruction as a 32-bit immediate, even with -Os. With a large enough table (just copy/paste to duplicate a line) you get a compact loop even at -O3. The threshold is lower for -Os.)
Without memory-reuse haxx, probably a tight loop to rewrite 64b entries in place is the way to go. Doing it at build time would be even better, but then you'd need a custom tool to run the tranformation on the kernel binary.
Having the data stored in immediates sounds good in theory, but the code for each entry would probably total more than 64b, because it couldn't loop. The code to split an address into two would have to be fully unrolled (or placed in a function and called). Even if you had a loop to store all the same-for-multiple-entries stuff, each pointer would need a mov r32, imm32 to get the address in a register, then mov word [idt+i + 0], ax / shr eax, 16 / mov word [idt+i + 6], ax. That's a lot of machine-code bytes.
One way would be to use an intermediate jump table that is located at a fixed address. You could initialize the idt with addresses of the locations in this table (which will be compile time constant). The locations in the jump table will contain jump instructions to the actual isr routines.
The dispatch to an isr will be indirect as follows:
trap -> jump to intermediate address in the idt -> jump to isr
One way to create a jump table at a fixed address is as follows.
Step 1: Put jump table in a section
// this is a jump table at a fixed address
void jump(void) __attribute__((section(".si.idt")));
void jump(void) {
asm("jmp isr0"); // can also be asm("call ...") depending on need
asm("jmp isr1");
asm("jmp isr2");
}
Step 2: Instruct the linker to locate the section at a fixed address
SECTIONS
{
.so.idt 0x600000 :
{
*(.si.idt)
}
}
Put this in the linker script right after the .text section. This will make sure that the executable code in the table will go into a executable memory region.
You can instruct the linker to use your script as follows using the --script option in the Makefile.
LDFLAGS += -Wl,--script=my_script.lds
The following macro gives you the address of the location which contains the jump (or call) instruction to the corresponding isr.
// initialize the idt at compile time with const values
// you can find a cleaner way to generate offsets
#define JUMP_ADDR(off) ((char*)0x600000 + 4 + (off * 5))
You will then initialize the idt as follows using modified macros.
// your real idt will be initialized as follows
#define idt_entry(addr, type, priv) \
( \
((descr) (uintptr_t) (addr) & 0xffff) | \
((descr) (KERN_CODE & 0xff) << 0x10) | \
((descr) ((type) & 0x0f) << 0x28) | \
((descr) ((priv) & 0x03) << 0x2d) | \
((descr) 0x1 << 0x2F) | \
((descr) ((uintptr_t) (addr) & 0xffff0000) << 0x30) \
)
#define idt_int(off) idt_entry(JUMP_ADDR(off), 0x0e, 0x00)
#define idt_trap(off) idt_entry(JUMP_ADDR(off), 0x0f, 0x00)
descr idt[] =
{
...
idt_trap(ex_de),
...
idt_int(int_casc),
...
};
A demo working example is below, which shows dispatch to a isr with a non-fixed address from a instruction at a fixed address.
#include <stdio.h>
// dummy isrs for demo
void isr0(void) {
printf("==== isr0\n");
}
void isr1(void) {
printf("==== isr1\n");
}
void isr2(void) {
printf("==== isr2\n");
}
// this is a jump table at a fixed address
void jump(void) __attribute__((section(".si.idt")));
void jump(void) {
asm("jmp isr0"); // can be asm("call ...")
asm("jmp isr1");
asm("jmp isr2");
}
// initialize the idt at compile time with const values
// you can find a cleaner way to generate offsets
#define JUMP_ADDR(off) ((char*)0x600000 + 4 + (off * 5))
// dummy idt for demo
// see below for the real idt
char* idt[] =
{
JUMP_ADDR(0),
JUMP_ADDR(1),
JUMP_ADDR(2),
};
int main(int argc, char* argv[]) {
int trap;
char* addr = idt[trap = argc - 1];
printf("==== idt[%d]=%p\n", trap, addr);
asm("jmp *%0\n" : :"m"(addr));
}

How to debug driver load error?

I've made a driver for Windows, compiled it and tried to start it via SC manager, but I get the system error from the SC manager API:
ERROR_PROC_NOT_FOUND The specified procedure could not be found.
Is there a way to get more information about why exactly the driver fails to start?
WinDbg or something? If I comment out all code in my DriverEntry routine, the driver starts.
The only thing I'm calling is a procedure in another source module (in my own project, though). I can comment out all external dependencies and I still get the same error.
Edit:
I've also tried different DDKs, i.e. 2003 DDK und Vista WDK (but not Win7 WDK)
Edit2:
Here is my driver sour code file driver.cpp:
#ifdef __cplusplus
extern "C" {
#endif
#include <ntddk.h>
#include <ntstrsafe.h>
#ifdef __cplusplus
}; // extern "C"
#endif
#include "../distorm/src/distorm.h"
void DriverUnload(IN PDRIVER_OBJECT DriverObject)
{
}
#define MAX_INSTRUCTIONS 20
#ifdef __cplusplus
extern "C" {
#endif
NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING RegistryPath)
{
UNICODE_STRING pFcnName;
// Holds the result of the decoding.
_DecodeResult res;
// Decoded instruction information.
_DecodedInst decodedInstructions[MAX_INSTRUCTIONS];
// next is used for instruction's offset synchronization.
// decodedInstructionsCount holds the count of filled instructions' array by the decoder.
unsigned int decodedInstructionsCount = 0, i, next;
// Default decoding mode is 32 bits, could be set by command line.
_DecodeType dt = Decode32Bits;
// Default offset for buffer is 0, could be set in command line.
_OffsetType offset = 0;
char* errch = NULL;
// Buffer to disassemble.
char *buf;
int len = 100;
// Register unload routine
DriverObject->DriverUnload = DriverUnload;
DbgPrint("diStorm Loaded!\n");
// Get address of KeBugCheck
RtlInitUnicodeString(&pFcnName, L"KeBugCheck");
buf = (char *)MmGetSystemRoutineAddress(&pFcnName);
offset = (unsigned) (_OffsetType)buf;
DbgPrint("Resolving KeBugCheck # 0x%08x\n", buf);
// Decode the buffer at given offset (virtual address).
while (1) {
res = distorm_decode(offset, (const unsigned char*)buf, len, dt, decodedInstructions, MAX_INSTRUCTIONS, &decodedInstructionsCount);
if (res == DECRES_INPUTERR) {
DbgPrint(("NULL Buffer?!\n"));
break;
}
for (i = 0; i < decodedInstructionsCount; i++) {
// Note that we print the offset as a 64 bits variable!!!
// It might be that you'll have to change it to %08X...
DbgPrint("%08I64x (%02d) %s %s %s\n", decodedInstructions[i].offset, decodedInstructions[i].size,
(char*)decodedInstructions[i].instructionHex.p,
(char*)decodedInstructions[i].mnemonic.p,
(char*)decodedInstructions[i].operands.p);
}
if (res == DECRES_SUCCESS || decodedInstructionsCount == 0) {
break; // All instructions were decoded.
}
// Synchronize:
next = (unsigned int)(decodedInstructions[decodedInstructionsCount-1].offset - offset);
next += decodedInstructions[decodedInstructionsCount-1].size;
// Advance ptr and recalc offset.
buf += next;
len -= next;
offset += next;
}
DbgPrint(("Done!\n"));
return STATUS_SUCCESS;
}
#ifdef __cplusplus
}; // extern "C"
#endif
My directory structure is like this:
base_dir\driver\driver.cpp
\distorm\src\all_the_c_files
\distorm\distorm.h
\distorm\config.h
My SOURCES file:
# $Id$
TARGETNAME=driver
TARGETPATH=obj
TARGETTYPE=DRIVER
# Additional defines for the C/C++ preprocessor
C_DEFINES=$(C_DEFINES) -DSUPPORT_64BIT_OFFSET
SOURCES=driver.cpp \
distorm_dummy.c \
drvversion.rc
INCLUDES=..\distorm\src;
TARGETLIBS=$(DDK_LIB_PATH)\ntdll.lib \
$(DDK_LIB_PATH)\ntstrsafe.lib
You can download diStorm from here: http://ragestorm.net/distorm/dl.php?id=8
distorm_dummy is the same as the dummy.c from the diStorm lib.
Enable "Show loader snaps" using gflags -- in the debug output, you should find information about which import the loader is not able to resolve.
Not surprisingly, you have all the information you need to solve this on your own.
ERROR_PROC_NOT_FOUND The specified procedure could not be found.
This, combined with your dependency Walker output, pretty much points to a broken Import Table
Why is your IT broken? I'm not sure, could be a problem with your build/linker settings, since rather obviously, HAL.DLL is right there in %windir%\system32.
Reasons for a broken load order are many and you'll have to track them down yourself.
Have you tried running Dependency Walker on the compiled .sys and see if there is actually some missing function imports?
Build it with the 6000 WDK/DDK (because with the "actual" Build 7600... it links against wdfldr.sys, but under Windows Vista and XP Systems this sys file is not available).
I don't know where you can download it officially but i did use a torrent...
You can add deferred breakpoints in WinDbg.
If you specify a breakpoint, while the driver is not loaded (or with bu), it will be triggered, when the driver does get loaded and enters the function.
The command for specifiying breakpoints is :
bp <module_name>!<function_name>
e.g. :
bp my_driver!DriverEntry

Resources