detecting NVIDIA GPUs without CUDA - c

I would like to extract a rather limited set of information about NVIDIA GPUs without linking against the CUDA libraries. The only information needed is the compute capability and the name of the GPU; more than this could be useful, but it is not required. The code should be written in C (or C++). The information would be used at configure time (when the CUDA toolkit is not available) and at run time (when the executed binary is not compiled with CUDA support) to tell the user that a supported GPU is present in the system.
As far as I understand, this is possible through the driver API, but I am not very familiar with the technical details of what this would require. So my questions are:
What are the exact steps to fulfill at least the minimum requirement (see above)?
Is there such open-source code available?
Note that my first step would be to have some code for Linux, but ultimately I'd need platform-independent code. Considering the platform availability of CUDA, a complete solution would involve code for x86/AMD64 on Linux, Mac OS, and Windows (at least for now; the list could soon be extended with ARM).
Edit
What I meant by "it's possible through the driver API" is that one should be able to load libcuda.so dynamically and query the device properties through the driver API. I'm not sure about the details, though.

Unfortunately NVML doesn't provide information about device compute capability.
What you need to do is:
Load CUDA library manually (application is not linked against libcuda)
If the library doesn't exist then CUDA driver is not installed
Find pointers to necessary functions in the library
Use driver API to query information about available GPUs
I hope this code will be helpful. I've tested it under Linux but with minor modifications it should also compile under Windows.
#include <cuda.h>
#include <stdio.h>

#ifdef WINDOWS
#include <Windows.h>
#else
#include <dlfcn.h>
#endif

void *loadCudaLibrary() {
#ifdef WINDOWS
    return (void *) LoadLibraryA("nvcuda.dll");
#else
    return dlopen("libcuda.so", RTLD_NOW);
#endif
}

void (*getProcAddress(void *lib, const char *name))(void) {
#ifdef WINDOWS
    return (void (*)(void)) GetProcAddress((HMODULE) lib, name);
#else
    return (void (*)(void)) dlsym(lib, name);
#endif
}

int freeLibrary(void *lib)
{
#ifdef WINDOWS
    return FreeLibrary((HMODULE) lib);
#else
    return dlclose(lib);
#endif
}

typedef CUresult (CUDAAPI *cuInit_pt)(unsigned int Flags);
typedef CUresult (CUDAAPI *cuDeviceGetCount_pt)(int *count);
typedef CUresult (CUDAAPI *cuDeviceComputeCapability_pt)(int *major, int *minor, CUdevice dev);

int main() {
    void *cuLib;
    cuInit_pt my_cuInit = NULL;
    cuDeviceGetCount_pt my_cuDeviceGetCount = NULL;
    cuDeviceComputeCapability_pt my_cuDeviceComputeCapability = NULL;

    if ((cuLib = loadCudaLibrary()) == NULL)
        return 1; // CUDA driver library is not present in the system

    if ((my_cuInit = (cuInit_pt) getProcAddress(cuLib, "cuInit")) == NULL)
        return 1; // something is wrong with the library
    if ((my_cuDeviceGetCount = (cuDeviceGetCount_pt) getProcAddress(cuLib, "cuDeviceGetCount")) == NULL)
        return 1; // something is wrong with the library
    if ((my_cuDeviceComputeCapability = (cuDeviceComputeCapability_pt) getProcAddress(cuLib, "cuDeviceComputeCapability")) == NULL)
        return 1; // something is wrong with the library

    {
        int count, i;
        if (CUDA_SUCCESS != my_cuInit(0))
            return 1; // failed to initialize
        if (CUDA_SUCCESS != my_cuDeviceGetCount(&count))
            return 1; // failed
        for (i = 0; i < count; i++)
        {
            int major, minor;
            if (CUDA_SUCCESS != my_cuDeviceComputeCapability(&major, &minor, i))
                return 1; // failed
            printf("dev %d CUDA compute capability major %d minor %d\n", i, major, minor);
        }
    }
    freeLibrary(cuLib);
    return 0;
}
Test on Linux:
$ gcc main.c -ldl
$ ./a.out
dev 0 CUDA compute capability major 2 minor 0
dev 1 CUDA compute capability major 2 minor 0
Test on Linux with no CUDA driver:
$ ./a.out
$ echo $?
1
Cheers

Surely these people know the answer:
http://www.ozone3d.net/gpu_caps_viewer
but I only know that it can be done with an installation of CUDA or OpenCL.
I think one way could be to use OpenGL directly; maybe that is what you were talking about with the driver API, but I can only give you this example (CUDA required):
http://www.naic.edu/~phil/hardware/nvidia/doc/src/deviceQuery/deviceQuery.cpp

First, I think NVIDIA NVML is the API you are looking for. Second, there is an open-source project based on NVML called PAPI NVML.
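For the GPU name specifically (though, as the answer above notes, not the compute capability in the NVML of that era), a minimal sketch using NVML might look like this, assuming nvml.h and libnvidia-ml are available on the machine:
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count, i;
    if (nvmlInit() != NVML_SUCCESS)
        return 1;                    // NVML / driver not available
    if (nvmlDeviceGetCount(&count) == NVML_SUCCESS) {
        for (i = 0; i < count; i++) {
            nvmlDevice_t dev;
            char name[NVML_DEVICE_NAME_BUFFER_SIZE];
            if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
                nvmlDeviceGetName(dev, name, sizeof name) == NVML_SUCCESS)
                printf("dev %u: %s\n", i, name);
        }
    }
    nvmlShutdown();
    return 0;
}
On Linux this would be built with something like gcc nvml_name.c -lnvidia-ml; the libnvidia-ml library ships with the driver, while the header comes from the toolkit or the GPU Deployment Kit.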

I solved this problem by using and linking statically against the CUDA 6.0 SDK. It produces an application that also works well on machines that do not have NVIDIA cards, or on machines where the SDK is not installed. In such cases it will indicate that there are zero CUDA-capable devices.
There is an example in the samples included with the CUDA SDK called deviceQuery - I used snippets from it to write the following code. It decides whether CUDA-capable devices are present and, if so, which one has the highest compute capability:
#include <cuda_runtime.h>

struct GpuCap
{
    bool QueryFailed;            // True on error
    int  DeviceCount;            // Number of CUDA devices found
    int  StrongestDeviceId;      // ID of best CUDA device
    int  ComputeCapabilityMajor; // Major compute capability (of best device)
    int  ComputeCapabilityMinor; // Minor compute capability
};

GpuCap GetCapabilities()
{
    GpuCap gpu;
    gpu.QueryFailed = false;
    gpu.StrongestDeviceId = -1;
    gpu.ComputeCapabilityMajor = -1;
    gpu.ComputeCapabilityMinor = -1;

    cudaError_t error_id = cudaGetDeviceCount(&gpu.DeviceCount);
    if (error_id != cudaSuccess)
    {
        gpu.QueryFailed = true;
        gpu.DeviceCount = 0;
        return gpu;
    }
    if (gpu.DeviceCount == 0)
        return gpu; // There are no available devices that support CUDA

    // Find the best device
    for (int dev = 0; dev < gpu.DeviceCount; ++dev)
    {
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, dev);
        if (deviceProp.major > gpu.ComputeCapabilityMajor)
        {
            gpu.StrongestDeviceId = dev;
            gpu.ComputeCapabilityMajor = deviceProp.major;
            gpu.ComputeCapabilityMinor = deviceProp.minor;
        }
        else if (deviceProp.major == gpu.ComputeCapabilityMajor &&
                 deviceProp.minor > gpu.ComputeCapabilityMinor)
        {
            gpu.StrongestDeviceId = dev;
            gpu.ComputeCapabilityMinor = deviceProp.minor;
        }
    }
    return gpu;
}
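A hypothetical caller (not part of the original snippet) could then report the result like this:
#include <cstdio>

int main()
{
    GpuCap gpu = GetCapabilities();
    if (gpu.QueryFailed || gpu.DeviceCount == 0)
        std::printf("No CUDA-capable GPU detected.\n");
    else
        std::printf("Found %d device(s); best is device %d with compute capability %d.%d\n",
                    gpu.DeviceCount, gpu.StrongestDeviceId,
                    gpu.ComputeCapabilityMajor, gpu.ComputeCapabilityMinor);
    return 0;
}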

Related

Member value lost when passing object by pointer

I am very new to the FreeBSD world and am currently porting my terminal emulation library from Linux to FreeBSD and Mac OS. I've encountered some very strange behavior such that when I pass a struct by pointer to a subroutine the member values become zeroed out. This does not happen on Linux or Mac OS. It also does not matter if the compiler is GCC or Clang.
I've confirmed that the member value is correct before the subroutine is called and the parent struct is passed by pointer.
I've tested the same code on Linux and Mac OS and they do not exhibit the problem.
I've switched between GCC and Clang on FreeBSD and that seems to have no effect.
I've considered that stack smashing could be happening, but it seems unlikely because ulimit shows that the stack size on Linux is 8 MB while on FreeBSD it's much larger (524 MB). I've also tried compiling with -fstack-protector-strong, but none of this matters.
#include "vterm.h"
#include "vterm_private" // vterm_t and vterm_desc_t defined here
void vterm_cursor_move_backward(vterm_t* vterm) {
vterm_desc_t* v_desc = NULL;
int min_row;
int idx;
// idx = vterm_buffer_get_active(vterm);
idx = 0; // hard set to 0 just for debugging
v_desc = &vterm->vterm_desc[idx];
// printf() will display a value of zero
printf("%d\n\r", v_desc->ccol);
fflush(stdout);
}
void vterm_interpret_ctrl_char(vterm_t* vterm, const char* data) {
vterm_desc_t *v_desc = NULL;
int idx;
char verb;
// idx = vterm_buffer_get_active(vterm);
idx = 0; // hard set to 0 just for debugging
v_desc = &vterm->vterm_desc[idx];
verb = data[0];
switch (verb) {
case '\b': {
// the following printf will print a positive number
printf("%d\n\r", v_desc->ccol);
fflush(stdout);
vterm_cursor_move_backward(vterm);
break;
}
}
}
I expect the value of v_desc->ccol to be identical in both functions. Godbolt link, GitHub link; see files vterm_ctrl_char.c and vterm_cursor.c.
After countless hours of debugging I figured out that data in the vterm_desc_t structure was actually being shifted, causing the member value to read as zero. Although the ncurses header file is included via vterm_private.h, on FreeBSD that doesn't seem to matter. Both GCC and Clang are happy to silently compile the vterm_cursor.c translation unit with a bad / incomplete struct layout.
I would recommend anyone running into this kind of problem to try compiling each translation unit individually, which is how I unearthed it. For example: gcc -S vterm_cursor.c
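To illustrate the failure mode (a made-up sketch, not the actual vterm headers): if a struct's layout depends on a conditionally included header, two translation units can silently disagree about member offsets, and reads of the "same" member land on different bytes.
/* layout.h -- hypothetical header, NOT the real vterm_private.h */
#ifdef HAVE_CURSES_TYPES
#include <curses.h>
typedef struct {
    WINDOW *win;   /* present only when the curses types are visible */
    int     ccol;  /* sits at a later offset in this translation unit */
} desc_t;
#else
typedef struct {
    int     ccol;  /* sits at offset 0 in a unit compiled without the define */
} desc_t;
#endif
A unit compiled without the define reads ccol from offset 0 while the writer stored it further in, which looks exactly like a member "losing" its value; compiling each unit to assembly (gcc -S) makes the differing offsets visible.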
Thank you to everyone who took a look at this.

Cannot open USB device with libusb-1.0 in cygwin

I'm trying to interface with a USB peripheral using libusb-1.0 in cygwin.
libusb_get_device_list(...) works fine, I get a list of USB devices. It finds the device with the correct VendorID and ProductID in the device list, but when libusb_open(...) is called with that device, it always fails with the error code LIBUSB_ERROR_NOT_FOUND.
I don't think it's a permission issue; I've tried running this as admin, and there's a separate error code (LIBUSB_ERROR_ACCESS) for that. This same code works with libusb-1.0 on Linux.
unsigned init_usb(int vendor_id, int product_id, int interface_num)
{
int ret = libusb_init(NULL);
if (ret < 0) return CONTROL_ERROR;
libusb_device **devs = NULL;
int num_dev = libusb_get_device_list(NULL, &devs);
libusb_device *dev = NULL;
for (int i = 0; i < num_dev; i++) {
struct libusb_device_descriptor desc;
libusb_get_device_descriptor(devs[i], &desc);
if (desc.idVendor == vendor_id && desc.idProduct == product_id) {
dev = devs[i];
break;
}
}
if (dev == NULL) return CONTROL_ERROR;
libusb_device_handle *devh = NULL;
ret = libusb_open(dev, &devh);
//ret is always -5 here (in cygwin)!
if (ret < 0) return CONTROL_ERROR;
libusb_free_device_list(devs, 1);
return CONTROL_SUCCESS;
}
It turns out this was a kind of driver issue. I had to tell Windows to associate the particular device I'm using with the libusb drivers.
libusb-win32-1.2.6.0 comes with some tools to make that association (although you may need to configure your system to allow the installation of unsigned drivers).
There's one tricky bit. If you just want to associate the device with libusb, you can use the inf-wizard.exe tool to make that association, but that will change the primary association to be with libusb. In my case, the device is a USB Audio Class device (i.e. USB sound card) that also has some libusb functionality. When I used inf-wizard.exe, libusb started working (yay!), but then it stopped working as an audio device.
In my case, I needed to use the install-filter-win.exe tool to install a filter driver for libusb. That allows the device to still show up as a USB Audio device, but also interact with it using libusb.
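For completeness, once the filter driver is in place and libusb_open() succeeds, the so-far-unused interface_num parameter in the question's init_usb() would typically be consumed like this (a sketch of the remaining steps, not code from the question):
libusb_device_handle *devh = NULL;
ret = libusb_open(dev, &devh);
if (ret < 0) return CONTROL_ERROR;
libusb_free_device_list(devs, 1);

// Claim the interface before doing any transfers; release and close it on teardown.
ret = libusb_claim_interface(devh, interface_num);
if (ret < 0) {
    libusb_close(devh);
    return CONTROL_ERROR;
}
return CONTROL_SUCCESS;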

Are the function signatures correct in the RSA Authentication Agent API documentation?

I have some software which uses the documented API for RSA's Authentication Agent. This is a product which runs as a service on the client machines in a domain, and authenticates users locally by communicating with an "RSA Authentication Manager" installed centrally.
The Authentication Agent's API is publicly documented here: Authentication Agent API 8.1.1 for C Developers Guide. However, the docs seem to be incorrect, and I do not have access to the RSA header files - they are not public; only the PDF documentation is available for download without paying $$ to RSA. If anyone here has access to up to date header files, would you be able to confirm for me whether the documentation is out of date?
The function signatures given in the API docs seem incorrect - in fact, I'm absolutely convinced that they are wrong on x64 machines. For example, the latest PDF documentation shows the following:
int WINAPI AceSetUserData(SDI_HANDLE hdl, unsigned int userData)
int WINAPI AceGetUserData(SDI_HANDLE hdl, unsigned int *pUserData)
The documentation states several times that the "userData" value is a 32-bit quantity, for example in the documentation for AceInit, AceSetUserData, and AceGetUserData. A relevant excerpt from the docs for AceGetUserData:
This function is synchronous and the caller must supply, as the second argument, a pointer to a 32-bit storage area (that is, an unsigned int) into which to copy the user data value.
This is clearly false - from some experimentation, if you pass in a pointer to the center of a buffer filled with 0xff, AceGetUserData is definitely writing out a 64-bit value, not a 32-bit quantity.
My version of aceclnt.dll is 8.1.3.563; the corresponding documentation is labelled "Authentication Agent API 8.1 SP1", and this corresponds to version 7.3.1 of the Authentication Agent itself.
Test code
Full test code is given, even though it's not really relevant to the problem... It's no use to me if someone else runs the test code (I know what it does!); what I need is someone with access to the RSA header files who can confirm the function signatures.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#ifdef WIN32
#include <Windows.h>
#include <tchar.h>
#define SDAPI WINAPI
#else
#define SDAPI
#endif
typedef int SDI_HANDLE;
typedef uint32_t SD_BOOL;
typedef void (SDAPI* AceCallback)(SDI_HANDLE);
#define ACE_SUCCESS 1
#define ACE_PROCESSING 150
typedef SD_BOOL (SDAPI* AceInitializeEx_proto)(const char*, char*, uint32_t);
typedef int (SDAPI* AceInit_proto)(SDI_HANDLE*, void*, AceCallback);
typedef int (SDAPI* AceClose_proto)(SDI_HANDLE, AceCallback);
typedef int (SDAPI* AceGetUserData_proto)(SDI_HANDLE, void*);
typedef int (SDAPI* AceSetUserData_proto)(SDI_HANDLE, void*);
struct Api {
AceInitializeEx_proto AceInitializeEx;
AceInit_proto AceInit;
AceClose_proto AceClose;
AceGetUserData_proto AceGetUserData;
AceSetUserData_proto AceSetUserData;
} api;
static void api_init(struct Api* api) {
// All error-checking stripped...
HMODULE dll = LoadLibrary(_T("aceclnt.dll")); // leak this for the demo
api->AceInitializeEx = (AceInitializeEx_proto)GetProcAddress(dll, "AceInitializeEx");
api->AceInit = (AceInit_proto)GetProcAddress(dll, "AceInit");
api->AceClose = (AceClose_proto)GetProcAddress(dll, "AceClose");
api->AceGetUserData = (AceGetUserData_proto)GetProcAddress(dll, "AceGetUserData");
api->AceSetUserData = (AceSetUserData_proto)GetProcAddress(dll, "AceSetUserData");
int success = api->AceInitializeEx("C:\\my\\conf\\directory", 0, 0);
assert(success);
}
static void demoFunction(SDI_HANDLE handle) {
union {
unsigned char testBuffer[sizeof(void *) * 3];
void *forceAlignment;
} u;
memset(u.testBuffer, 0xA5, sizeof u.testBuffer);
int err = api.AceGetUserData(handle, (void*)(u.testBuffer + sizeof(void*)));
assert(err == ACE_SUCCESS);
fputs("DEBUG: testBuffer =", stderr);
for (size_t i = 0; i < sizeof(u.testBuffer); i++) {
if (i % 4 == 0)
putc(' ', stderr);
fprintf(stderr, "%02x", u.testBuffer[i]);
}
fputc('\n', stderr);
// Prints:
// DEBUG: testBuffer = a5a5a5a5 a5a5a5a5 00000000 00000000 a5a5a5a5 a5a5a5a5
// According to the docs, this should only write out a 32-bit value
}
static void SDAPI demoCallback(SDI_HANDLE h) {
fprintf(stderr, "Callback invoked, handle = %p\n", (void*)h);
}
int main(int argc, const char** argv)
{
api_init(&api);
SDI_HANDLE h;
int err = api.AceInit(&h, /* contentious argument */ 0, &demoCallback);
assert(err == ACE_PROCESSING);
demoFunction(h);
api.AceClose(h, 0);
return 0;
}
As you've copied the function/type definitions out of the documentation, you basically don't have (and never will have) the correct definitions for the version of the .dll you're using, and could always end up with crashes or, worse, undefined behavior.
What you could do is to debug the corresponding .dll:
Do you run Visual Studio? I remember that VS could step into a function call in debug mode and show the assembly; I'm not sure how it works today, but any disassembler should do the trick. In the x64 ABI, register rcx gets the first argument and rdx the second. If the function internally works with the 32-bit register names or clears the upper 32 bits, you can assume a 32-bit integer. If it uses the value to load an address (e.g. with a lea instruction), you can assume a pointer. But as you can see, that's probably not a road you want to go down...
So what else do you have left?
The document you've linked states a 32-bit and a 64-bit library, depending on the platform you use. I guess you use the 64-bit lib and that RSA did not update the documentation for it, but at some point the developers needed to upgrade the library to 64-bit.
So think about it this way: if you were the API developer, what could be migrated to 64-bit and what could not? Everything that needs to work across 32/64-bit implementations (stuff that gets sent over the network or stored and shared on disk) cannot be touched. But everything that is local to the instance can be migrated. As the userData seems to be a runtime thing, it makes sense to support whatever the platform provides: unsigned long on 64-bit and unsigned int on 32-bit.
You've figured out that userData must be 64-bit - but not because the function writes out a 64-bit integer, rather because the function sees a 64-bit value to start with. As integers are passed by value (I guess in general, but definitely in WINAPI), there's absolutely no chance the function could see the full 64-bit value if it were a 32-bit datatype. So most likely the API developers changed the datatype to unsigned long (in any case, to a 64-bit type).
PS: If you end up putting a pointer into userData, cast the pointer to uintptr_t and store/read that type.
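As an illustration of that PS (hypothetical code, built on the function-pointer typedefs from the test program above, whose true signatures are exactly what is in question):
#include <stdint.h>

struct my_ctx { int request_id; };   /* made-up per-handle context */

static void store_ctx(SDI_HANDLE h, struct my_ctx *ctx) {
    /* round-trip the pointer through uintptr_t instead of truncating it */
    api.AceSetUserData(h, (void *)(uintptr_t)ctx);
}

static struct my_ctx *load_ctx(SDI_HANDLE h) {
    void *raw = NULL;
    api.AceGetUserData(h, (void *)&raw);
    return (struct my_ctx *)(uintptr_t)raw;
}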
To avoid questions of undefined behavior, please replace your test function with this one, and report what it prints. Please also show us the complete test program, so that people who have access to this library can compile and run it for themselves and tinker with it. I would especially like to see the declarations of the api global and its type, and the code that initializes api, and to know where the type came from (did you make it up as part of this reverse engineering exercise or did you get it from somewhere?)
static void demoFunction(SDI_HANDLE handle) {
int err = api.AceSetUserData(handle, 0);
assert(err == ACE_SUCCESS);
union {
unsigned char testBuffer[sizeof(void *) * 3];
void *forceAlignment;
} u;
memset(u.testBuffer, 0xA5, sizeof u.testBuffer);
err = api.AceGetUserData(handle, (void *)(u.testBuffer + sizeof(void*)));
assert (err == ACE_SUCCESS);
fputs("DEBUG: testBuffer =", stderr);
for (size_t i = 0; i < sizeof(u.testBuffer); i++) {
if (i % 4 == 0)
putc(' ', stderr);
fprintf(stderr, "%02x", u.testBuffer[i]);
}
fputc('\n', stderr);
}
(If your hypothesis is correct, the output will be
DEBUG: testBuffer = a5a5a5a5 a5a5a5a5 00000000 00000000 a5a5a5a5 a5a5a5a5
.)

Waiting in DOS using djgpp -- alternatives to busy wait?

I recently wrote a little curses game, and since all it needs to work is some timer mechanism and a curses implementation, the idea of trying to build it for DOS comes kind of naturally. Curses is provided by PDCurses for DOS.
Timing is already different between POSIX and Win32, so I have defined this interface:
#ifndef CSNAKE_TICKER_H
#define CSNAKE_TICKER_H
void ticker_init(void);
void ticker_done(void);
void ticker_start(int msec);
void ticker_stop(void);
void ticker_wait(void);
#endif
The game calls ticker_init() and ticker_done() once, ticker_start() with a millisecond interval as soon as it needs ticks and ticker_wait() in its main loop to wait for the next tick.
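For context, a minimal sketch of how a main loop might drive this interface (the header name and the game_* functions are made up for illustration):
#include "csnake_ticker.h"      /* assumed name of the header shown above */

extern int  game_over(void);    /* placeholders for the actual game logic */
extern void game_update(void);
extern void game_draw(void);

void game_run(void)
{
    ticker_init();
    ticker_start(100);          /* one tick every 100 ms */
    while (!game_over())
    {
        ticker_wait();          /* returns when the next tick is due */
        game_update();
        game_draw();
    }
    ticker_stop();
    ticker_done();
}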
Using the same implementation on DOS as on POSIX platforms, based on setitimer(), didn't work. One reason was that the C library coming with djgpp doesn't implement waitsig().
#undef __STRICT_ANSI__
#include <time.h>
uclock_t tick;
uclock_t nextTick;
uclock_t tickTime;
void
ticker_init(void)
{
}
void
ticker_done(void)
{
}
void
ticker_start(int msec)
{
tickTime = msec * UCLOCKS_PER_SEC / 1000;
tick = uclock();
nextTick = tick + tickTime;
}
void
ticker_stop()
{
}
void
ticker_wait(void)
{
while ((tick = uclock()) < nextTick);
nextTick = tick + tickTime;
}
This works like a charm in dosbox (I don't have a real DOS system right now). But my concern is: Is busy waiting really the best I can do on this platform? I'd like to have a solution allowing the CPU to at least save some energy.
For reference, here's the whole source.
Ok, I think I can finally answer my own question (thanks Wyzard for the helpful comment!)
The obvious solution, as there doesn't seem to be any library call doing this, is putting a hlt in inline assembly. Unfortunately, this crashed my program. Looking for the reason, it turns out the default dpmi server used runs the program in ring 3 ... hlt is reserved for ring 0. So to use it, you have to modify the loader stub to load a dpmi server that runs your program in ring 0. See below.
Browsing through the docs, I came across __dpmi_yield(). If we are running in a multitasking environment (Win 3.x or 9x ...), there will already be a dpmi server provided by the operating system, and of course, in that case we want to give up our time slice while waiting instead of trying the privileged hlt.
So, putting it all together, the source for DOS now looks like this:
#undef __STRICT_ANSI__
#include <time.h>
#include <dpmi.h>
#include <errno.h>
static uclock_t nextTick;
static uclock_t tickTime;
static int haveYield;
void
ticker_init(void)
{
errno = 0;
__dpmi_yield();
haveYield = errno ? 0 : 1;
}
void
ticker_done(void)
{
}
void
ticker_start(int msec)
{
tickTime = msec * UCLOCKS_PER_SEC / 1000;
nextTick = uclock() + tickTime;
}
void
ticker_stop()
{
}
void
ticker_wait(void)
{
if (haveYield)
{
while (uclock() < nextTick) __dpmi_yield();
}
else
{
while (uclock() < nextTick) __asm__ volatile ("hlt");
}
nextTick += tickTime;
}
In order for this to work on plain DOS, the loader stub in the compiled executable must be modified like this:
<path to>/stubedit bin/csnake.exe dpmi=CWSDPR0.EXE
CWSDPR0.EXE is a dpmi server running all code in ring 0.
Still to test is whether yielding will mess with the timing when running under win 3.x / 9x. Maybe the time slices are too long, will have to check that. Update: It works great in Windows 95 with this code above.
The usage of the hlt instruction breaks compatibility with dosbox 0.74 in a weird way .. the program seems to hang forever when trying to do a blocking getch() through PDcurses. This doesn't happen however on a real MS-DOS 6.22 in virtualbox. Update: This is a bug in dosbox 0.74 that is fixed in the current SVN tree.
Given those findings, I assume this is the best way to wait "nicely" in a DOS program.
Update: It's possible to do even better by checking all available methods and picking the best one. I found a DOS idle call that should be considered as well. The strategy:
If yield is supported, use this (we are running in a multitasking environment)
If idle is supported, use this. Optionally, if we're in ring-0, do a hlt each time before calling idle, because idle is documented to return immediately when no other program is ready to run.
Otherwise, in ring-0 just use plain hlt instructions.
Busy-waiting as a last resort.
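Putting the strategy together, ticker_wait() could look roughly like this (a sketch only: haveYield, haveIdle, ring and the dosidle() wrapper are assumed to be set up by detection code like the program below, and nextTick/tickTime are the statics from the DOS implementation above):
void
ticker_wait(void)
{
    while (uclock() < nextTick)
    {
        if (haveYield)                         /* multitasking host: give up the time slice */
        {
            __dpmi_yield();
        }
        else if (haveIdle)                     /* plain DOS with idle support */
        {
            if (!ring) __asm__ volatile ("hlt");   /* ring-0 only */
            dosidle();                         /* hypothetical wrapper for the DOS idle call */
        }
        else if (!ring)                        /* ring-0, no idle support */
        {
            __asm__ volatile ("hlt");
        }
        /* else: busy-wait as a last resort */
    }
    nextTick += tickTime;
}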
Here's a little example program (DJGPP) that tests for all possibilities:
#include <stdio.h>
#include <dpmi.h>
#include <errno.h>
static unsigned int ring;
static int
haveDosidle(void)
{
__dpmi_regs regs;
regs.x.ax = 0x1680;
__dpmi_int(0x28, &regs);
return regs.h.al ? 0 : 1;
}
int main()
{
puts("checking idle methods:");
fputs("yield (int 0x2f 0x1680): ", stdout);
errno = 0;
__dpmi_yield();
if (errno)
{
puts("not supported.");
}
else
{
puts("supported.");
}
fputs("idle (int 0x28 0x1680): ", stdout);
if (!haveDosidle())
{
puts("not supported.");
}
else
{
puts("supported.");
}
fputs("ring-0 HLT instruction: ", stdout);
__asm__ ("mov %%cs, %0\n\t"
"and $3, %0" : "=r" (ring));
if (ring)
{
printf("not supported. (running in ring-%u)\n", ring);
}
else
{
puts("supported. (running in ring-0)");
}
}
The code in my github repo reflects the changes.

How to debug driver load error?

I've made a driver for Windows, compiled it and tried to start it via SC manager, but I get the system error from the SC manager API:
ERROR_PROC_NOT_FOUND The specified procedure could not be found.
Is there a way to get more information about why exactly the driver fails to start?
WinDbg or something? If I comment out all code in my DriverEntry routine, the driver starts.
The only thing I'm calling is a procedure in another source module (in my own project, though). I can comment out all external dependencies and I still get the same error.
Edit:
I've also tried different DDKs, i.e. 2003 DDK und Vista WDK (but not Win7 WDK)
Edit2:
Here is my driver source code file driver.cpp:
#ifdef __cplusplus
extern "C" {
#endif
#include <ntddk.h>
#include <ntstrsafe.h>
#ifdef __cplusplus
}; // extern "C"
#endif
#include "../distorm/src/distorm.h"
void DriverUnload(IN PDRIVER_OBJECT DriverObject)
{
}
#define MAX_INSTRUCTIONS 20
#ifdef __cplusplus
extern "C" {
#endif
NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING RegistryPath)
{
UNICODE_STRING pFcnName;
// Holds the result of the decoding.
_DecodeResult res;
// Decoded instruction information.
_DecodedInst decodedInstructions[MAX_INSTRUCTIONS];
// next is used for instruction's offset synchronization.
// decodedInstructionsCount holds the count of filled instructions' array by the decoder.
unsigned int decodedInstructionsCount = 0, i, next;
// Default decoding mode is 32 bits, could be set by command line.
_DecodeType dt = Decode32Bits;
// Default offset for buffer is 0, could be set in command line.
_OffsetType offset = 0;
char* errch = NULL;
// Buffer to disassemble.
char *buf;
int len = 100;
// Register unload routine
DriverObject->DriverUnload = DriverUnload;
DbgPrint("diStorm Loaded!\n");
// Get address of KeBugCheck
RtlInitUnicodeString(&pFcnName, L"KeBugCheck");
buf = (char *)MmGetSystemRoutineAddress(&pFcnName);
offset = (unsigned) (_OffsetType)buf;
DbgPrint("Resolving KeBugCheck # 0x%08x\n", buf);
// Decode the buffer at given offset (virtual address).
while (1) {
res = distorm_decode(offset, (const unsigned char*)buf, len, dt, decodedInstructions, MAX_INSTRUCTIONS, &decodedInstructionsCount);
if (res == DECRES_INPUTERR) {
DbgPrint(("NULL Buffer?!\n"));
break;
}
for (i = 0; i < decodedInstructionsCount; i++) {
// Note that we print the offset as a 64 bits variable!!!
// It might be that you'll have to change it to %08X...
DbgPrint("%08I64x (%02d) %s %s %s\n", decodedInstructions[i].offset, decodedInstructions[i].size,
(char*)decodedInstructions[i].instructionHex.p,
(char*)decodedInstructions[i].mnemonic.p,
(char*)decodedInstructions[i].operands.p);
}
if (res == DECRES_SUCCESS || decodedInstructionsCount == 0) {
break; // All instructions were decoded.
}
// Synchronize:
next = (unsigned int)(decodedInstructions[decodedInstructionsCount-1].offset - offset);
next += decodedInstructions[decodedInstructionsCount-1].size;
// Advance ptr and recalc offset.
buf += next;
len -= next;
offset += next;
}
DbgPrint(("Done!\n"));
return STATUS_SUCCESS;
}
#ifdef __cplusplus
}; // extern "C"
#endif
My directory structure is like this:
base_dir\driver\driver.cpp
\distorm\src\all_the_c_files
\distorm\distorm.h
\distorm\config.h
My SOURCES file:
# $Id$
TARGETNAME=driver
TARGETPATH=obj
TARGETTYPE=DRIVER
# Additional defines for the C/C++ preprocessor
C_DEFINES=$(C_DEFINES) -DSUPPORT_64BIT_OFFSET
SOURCES=driver.cpp \
distorm_dummy.c \
drvversion.rc
INCLUDES=..\distorm\src;
TARGETLIBS=$(DDK_LIB_PATH)\ntdll.lib \
$(DDK_LIB_PATH)\ntstrsafe.lib
You can download diStorm from here: http://ragestorm.net/distorm/dl.php?id=8
distorm_dummy is the same as the dummy.c from the diStorm lib.
Enable "Show loader snaps" using gflags -- in the debug output, you should find information about which import the loader is not able to resolve.
Not surprisingly, you have all the information you need to solve this on your own.
ERROR_PROC_NOT_FOUND The specified procedure could not be found.
This, combined with your Dependency Walker output, pretty much points to a broken Import Table.
Why is your IT broken? I'm not sure, could be a problem with your build/linker settings, since rather obviously, HAL.DLL is right there in %windir%\system32.
Reasons for a broken load order are many and you'll have to track them down yourself.
Have you tried running Dependency Walker on the compiled .sys to see if there are actually some missing function imports?
Build it with the 6000 WDK/DDK (because the current build, 7600..., links against wdfldr.sys, and that sys file is not available on Windows Vista and XP systems).
I don't know where you can download it officially, but I did use a torrent...
You can add deferred breakpoints in WinDbg.
If you specify a breakpoint while the driver is not yet loaded (or use bu), it will be triggered when the driver gets loaded and enters the function.
The command for specifying breakpoints is:
bp <module_name>!<function_name>
e.g. :
bp my_driver!DriverEntry
