I'm working on native call bindings for a virtual machine, and one of the features is to be able to look up standard libc functions by name at runtime. On windows this becomes a bit of a hassle because I need to get a handle to the msvcrt module that's currently loaded in the process. Normally this is msvcrt.dll, but it could be other variants as well (msvcr100.dll, etc) and a call to GetModuleHandle("msvcrt") could fail if a variant with a different name is used.
What I would like to be able to do is a reverse lookup, take a function pointer from libc (which I have in abundance) and get a handle to the module that provides it. Basically, something like this:
HANDLE hlibc = ReverseGetModuleHandle(fprintf); // Any func from libc should do the trick
void *vfunc = GetProcAddress(hlibc);
Is there such a thing in the win32 API, without descending into a manual walk of process handles and symbol tables? Conversely, if I am over-thinking the problem, is there an easier way to look up a libc function by name on win32?
The documented way of obtaining the module handle is by using GetModuleHandleEx.
HMODULE hModule = NULL;
if(GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, // behave like GetModuleHandle
(LPCTSTR)address, &hModule))
{
// hModule should now refer to the module containing the target address.
}
MEMORY_BASIC_INFORMATION mbi;
HMODULE mod;
if (VirtualQuery( vfunc, &mbi, sizeof(mbi) ))
{
mod = (HMODULE)mbi.AllocationBase;
}
Unfortunately you will have to walk through modules as you feared. It's not too bad though. Here is the idea, some code written in notepad:
MODULEENTRY32 me = {0};
HANDLE hSnapshot = CreateToolhelp32Snapshot( TH32CS_SNAPMODULE, 0 );
me.dwSize = sizeof me;
Module32First( hSnapshot, &me );
if( me.modBaseAddr <= funcPtr &&
( me.modBaseAddr + me.modBaseSize ) > funcPtr ) {
...
break;
}
do {
} while( Module32Next( hSnapshot, &me ) );
CloseHandle( hSnapshot );
Related
My question is about whether or not I should use the Windows API if I'm trying to get the most speed out of my program, where I could instead use a Standard Library function.
I assume the answer isn't consistent among every call; Specifically, I'm curious about stat() vs dwFileAttributes, if I wanted to figure out if a file was a directory or not for example (assuming file_name is a string containing the full path to the file):
WIN32_FIND_DATA fileData;
HANDLE hSearch;
hSearch = FindFirstFile(TEXT(file_name), &fileData);
int isDir = fileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY
vs.
struct stat info;
stat(file_name, &info);
int isDir = S_ISDIR(info.st_mode);
If anyone knows, or can elaborate on what the speed difference between these libraries generally is (if any) I'd appreciate it.
Not an answer regarding speed, but #The Corn Inspector, the MSVC CRT code is open sourced. If you look at an older version ( before .net and common UCRT), and look at the stat() function, it is INDEED a wrapper around the same OS call.
int __cdecl _tstat (
REG1 const _TSCHAR *name,
REG2 struct _stat *buf
)
{ // stuff omitted for clarity
_TSCHAR * path;
_TSCHAR pathbuf[ _MAX_PATH ];
int drive; /* A: = 1, B: = 2, etc. */
HANDLE findhandle;
WIN32_FIND_DATA findbuf;
/* Call Find Match File */
findhandle = FindFirstFile((_TSCHAR *)name, &findbuf);
Of course there is addtional code for mapping structures, etc. Looks like it also does some time conversion:
SYSTEMTIME SystemTime;
FILETIME LocalFTime;
if ( !FileTimeToLocalFileTime( &findbuf.ftLastWriteTime,
&LocalFTime ) ||
!FileTimeToSystemTime( &LocalFTime, &SystemTime ) )
{
so theoretically, it could be slower, but probably so insignificant, as to make no practical difference in the context of a complete, complex program. If you are calling stat() a million times, and worry about milliseconds, who knows. Profile it.
What is the intention to set handle to an object as pointer-to pointer but not pointer? Like following code:
FT_Library library;
FT_Error error = FT_Init_FreeType( &library );
where
typedef struct FT_LibraryRec_ *FT_Library
so &library is a FT_LIBraryRec_ handle of type FT_LIBraryRec_**
It's a way to emulate pass by reference in C, which otherwise only have pass by value.
The 'C' library function FT_Init_FreeType has two outputs, the error code and/or the library handle (which is a pointer).
In C++ we'd more naturally either:
return an object which encapsulated the success or failure of the call and the library handle, or
return one output - the library handle, and throw an exception on failure.
C APIs are generally not implemented this way.
It is not unusual for a C Library function to return a success code, and to be passed the addresses of in/out variables to be conditionally mutated, as per the case above.
The approach hides implementation. It speeds up compilation of your code. It allows to upgrade data structures used by the library without breaking existing code that uses them. Finally, it makes sure the address of that object never changes, and that you don’t copy these objects.
Here’s how the version with a single pointer might be implemented:
struct FT_Struct
{
// Some fields/properties go here, e.g.
int field1;
char* field2;
}
FT_Error Init( FT_Struct* p )
{
p->field1 = 11;
p->field2 = malloc( 100 );
if( nullptr == p->field2 )
return E_OUTOFMEMORY;
return S_OK;
}
Or C++ equivalent, without any pointers:
class FT_Struct
{
int field1;
std::vector<char> field2;
public:
FT_Struct() :
field1( 11 )
{
field2.resize( 100 );
}
};
As a user of the library, you have to include struct/class FT_Struct definition. Libraries can be very complex so this will slow down compilation of your code.
If the library is dynamic i.e. *.dll on windows, *.so on linux or *.dylib on osx, you upgrade the library and if the new version changes memory layout of the struct/class, old applications will crash.
Because of the way C++ works, objects are passed by value, i.e. you normally expect them to be movable and copiable, which is not necessarily what library author wants to support.
Now consider the following function instead:
FT_Error Init( FT_Struct** pp )
{
try
{
*pp = new FT_Struct();
return S_OK;
}
catch( std::exception& ex )
{
return E_FAIL;
}
}
As a user of the library, you no longer need to know what’s inside FT_Struct or even what size it is. You don’t need to #include the implementation details, i.e. compilation will be faster.
This plays nicely with dynamic libraries, library author can change memory layout however they please, as long as the C API is stable, old apps will continue to work.
The API guarantees you won’t copy or move the values, you can’t copy structures of unknown lengths.
I'm new at C, so sorry for my lack of knowledge (my C-book here is really massive :)
I would like to extend a shared library (libcustomer.so) with closed source, but public known api.
Is something like this possible?
rename libcustomer.so to liboldcustomer.so
create an extended shared library libcustomer.so (so others implicitly use the extended one)
link liboldcustomer.so into my extended libcustomer.so via -loldcustomer
forward any not extra-implemented methods directly to the old "liboldcustomer.so"
I don't think it would work that way (the name is compiled into the .so, isn't it?).
But what's the alternative?
For #4: is there a general way to do this, or do I have to write a method named like the old one and forward the call (how?)?
Because the original libcustomer.so (=liboldcustomer.so) can change from time to time, all that stuff should work dynamically.
For security reasons, our system has no LD_PRELOAD (otherwise I would take that :( ).
Think about extended validation-checks & some better NPE-handlings.
Thanks in advance for your help!
EDIT:
I'm just implementing my extension as shown in the answer, but I have one unhandled case at the moment:
How can I "proxy" the structs from the extended library?
For example I have this:
customer.h:
struct customer;
customer.c:
struct customer {
int children:1;
int age;
struct house *house_config;
};
Now, in my customer-extension.c I am writing all the public methods form customer.c, but how do I "pass-thru" the structs?
Many thanks for your time & help!
So you have OldLib with
void func1();
int func2();
... etc
The step 4 might look like creating another library with some static initialization.
Create NewLib with contents:
void your_func1();
void (*old_func1_ptr)() = NULL;
int (*old_func2_ptr)() = NULL;
void func1()
{
// in case you don't have static initializers, implement lazy loading
if(!old_func1_ptr)
{
void* lib = dlopen("OldLibFileName.so", RTLD_NOW);
old_func1_ptr = dlsym(lib, "func1");
}
old_func1_ptr();
}
int func2()
{
return old_func2_ptr();
}
// gcc extension, static initializer - will be called on .so's load
// If this is not supported, then you should call this function
// manually after loading the NewLib.so in your program.
// If the user of OldLib.so is not _your_ program,
// then implement lazy-loading in func1, func2 etc. - check function pointers for being NULL
// and do the dlopen/dlsym calls there.
__attribute__((constructor))
void static_global_init()
{
// use dlfcn.h
void* lib = dlopen("OldLibFileName.so", RTLD_NOW);
old_func1_ptr = dlsym(lib, "func1");
...
}
The static_global_init and all the func_ptr's can be autogenerated if you have some description of the old API. After the NewLib is created, you certainly can replace the OldLib.
I am trying to get a list of DLLs that a given process is using, I am trying to achieve that through VirtualQueryEx. My problem is that it return to me just a partial list of DLLs and not all of them (i can see the list using Process Explorer or using VirtualQuery on the given process).
Here's the code:
char szBuf[MAX_PATH * 100] = { 0 };
PBYTE pb = NULL;
MEMORY_BASIC_INFORMATION mbi;
HANDLE h_process = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, iPID);
while (VirtualQueryEx(h_process, pb, &mbi, sizeof(mbi)) == sizeof(mbi)) {
int nLen;
char szModName[MAX_PATH];
if (mbi.State == MEM_FREE)
mbi.AllocationBase = mbi.BaseAddress;
if ((mbi.AllocationBase == hInstDll) ||
(mbi.AllocationBase != mbi.BaseAddress) ||
(mbi.AllocationBase == NULL)) {
// Do not add the module name to the list
// if any of the following is true:
// 1. If this region contains this DLL
// 2. If this block is NOT the beginning of a region
// 3. If the address is NULL
nLen = 0;
} else {
nLen = GetModuleFileNameA((HINSTANCE) mbi.AllocationBase,
szModName, _countof(szModName));
}
if (nLen > 0) {
wsprintfA(strchr(szBuf, 0), "\n%p-%s",
mbi.AllocationBase, szModName);
}
pb += mbi.RegionSize;
}
I am getting the result on szBuf.
This function is part of a DLL file so that it is harder for me to debug.
Right now the DLL is compiled as x64 binary and i am using it against x64 processes.
P.S i know about EnumProcessModules and i am not using it with a reason (too long:).
GetModuleFileName() only gives you the name for modules loaded in your process, not the other process. It will give you a few hits by accident, the Windows operating system DLLs will get loaded at the same address and will thus have the same module handle value.
You will need to use GetModuleFileNameEx() so you can pass the process handle.
Do note the fundamental flaw with your code as posted, you are not doing anything to ensure that you can safely use VirtualQueryEx() on another process. Which requires that you suspend all its threads so it cannot allocate memory while you are iterating it, the kind of thing a debugger does. Also required for EnumProcessModules. The failure mode is nasty, it is random and it can easily get your loop stuck, iterating the same addresses over and over again. Which is why the CreateToolHelp32Snapshot() function exists, emphasis on "snapshot".
Every example I can find is in C++, but I'm trying to keep my project in C. Is it even possible to host the CLR in a C program?
If so, can you point me to an example?
As the above comments hint, there is a set of COM APIs for hosting the CLR, and you should be able to call these COM APIs from both C and C++.
As an example, below is a quick piece of (untested) C code that shows how to start up the CLR and execute a static method of a class in a managed assembly (which takes in a string as an argument and returns an integer). The key difference between this code and its C++ counterpart is the definition of COBJMACROS and the use of the <type>_<method> macros (e.g. ICLRRuntimeHost_Start) to call into the CLR-hosting COM interface. (Note that COBJMACROS must be defined prior to #include'ing mscoree.h to make sure these utility macros get defined.)
#include <windows.h>
#define COBJMACROS
#include <mscoree.h>
int main(int argc, char **argv)
{
HRESULT status;
ICLRRuntimeHost *Host;
BOOL Started;
DWORD Result;
Host = NULL;
Started = FALSE;
status = CorBindToRuntimeEx(
NULL,
NULL,
0,
&CLSID_CLRRuntimeHost,
&IID_ICLRRuntimeHost,
(PVOID *)&Host
);
if (FAILED(status)) {
goto cleanup;
}
status = ICLRRuntimeHost_Start(Host);
if (FAILED(status)) {
goto cleanup;
}
Started = TRUE;
status = ICLRRuntimeHost_ExecuteInDefaultAppDomain(
Host,
L"c:\\path\\to\\assembly.dll",
L"MyNamespace.MyClass",
L"MyMethod",
L"some string argument to MyMethod",
&Result
);
if (FAILED(status)) {
goto cleanup;
}
// inspect Result
// ...
cleanup:
if (Started) {
ICLRRuntimeHost_Stop(Host);
}
if (Host != NULL) {
ICLRRuntimeHost_Release(Host);
}
return SUCCEEDED(status) ? 0 : 1;
}
This sample should work with .NET 2.0+, although it looks like .NET 4.0 (not yet released) has deprecated some of these APIs in favor of a new set of APIs for hosting the CLR. (And if you need this to work with .NET 1.x, you need to use ICorRuntimeHost instead of ICLRRuntimeHost.)