Weird stack corruption due to a DLL call - C

I'm trying to call a DLL function (obtained via GetProcAddress etc.) from C, using the lcc compiler. The function gets called and everything goes well, but it looks like the top of the stack gets corrupted. I've tried playing with calling conventions (__stdcall / __cdecl), but that didn't help.
Unfortunately I don't have access to the DLL code, and I have to use the lcc compiler.
I found that this simple hack avoids stack corruption:
void foo(params)
{
int dummy;
dll_foo(params);
}
Here dll_foo is the pointer returned by GetProcAddress, and the stack is kind of protected by the dummy variable. So it's not the stack pointer that gets corrupted, but the data at the top of the stack. It works like this, but I'd like to know the reason for the corruption.
Any ideas?
UPD:
As asked in the comments, here are the actual function types:
typedef unsigned char (CALLBACK Tfr)(unsigned char);
typedef void (CALLBACK Tfw)(unsigned char,unsigned char);
typedef int (CALLBACK Tfs)(int);
typedef void (CALLBACK Tfwf)(int*,int);
All of them show similar behavior.
Unfortunately, it is not so straightforward to attach a debugger, as the code is compiled and launched by Matlab, using the LCC compiler, and there is no debugging support. Probably I will have to reproduce this problem in a standalone configuration, but it is not that easy to make it.

Sounds like you use MSVC: Debug + Windows + Registers. Look at the value of ESP before and after the call. If it doesn't match, first change the calling convention in the function pointer declaration (did you do that right?). If it still doesn't match, then it is __stdcall and you haven't guessed the arguments you need to pass correctly.
Or the function could simply be clobbering the stack frame; that isn't impossible.
Posting your function pointer declaration that shows the real arguments would probably help diagnose this better.

It sounds to me like you were on the right track with looking at the calling convention. The main thing you need to do is ensure that the caller and callee are both using the same convention. Typically for a DLL, you want to use __stdcall for both, but if (as you say) you have no control over the DLL, then you need to modify your code to match what it's doing. Unfortunately, it's almost impossible to guess what that is -- I'm pretty sure lcc (like most C and C++ compilers) can produce code to use a variety of conventions.
Based on your hack working by putting an extra dword on the stack, it sounds like you currently have a mismatch where both the caller and the callee are trying to clear arguments off the stack (e.g., the caller using __cdecl and the callee using __stdcall).
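For reference, here is a minimal sketch of declaring the function pointer with the convention spelled out inside the parentheses, assuming the DLL really exports __stdcall functions (which is what CALLBACK means in the typedefs from the question on 32-bit Windows). The DLL name "mydll.dll", the export name "fs" and the helpers load_fs/call_fs are placeholders, and the non-Windows branch with a stand-in function exists only so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>   /* CALLBACK is __stdcall on 32-bit x86 */
#else
#define CALLBACK       /* assumption: empty stand-in so the sketch builds off Windows */
#endif

/* Pointer type with the convention inside the parentheses, matching the
   question's "typedef int (CALLBACK Tfs)(int);" function type. */
typedef int (CALLBACK *Tfs_ptr)(int);

#ifndef _WIN32
/* Hypothetical stand-in for the DLL export, used only off Windows. */
static int CALLBACK fake_dll_fs(int x)
{
    return x + 1;
}
#endif

/* Resolves the export; "mydll.dll" and "fs" are placeholder names. */
Tfs_ptr load_fs(void)
{
#ifdef _WIN32
    HMODULE lib = LoadLibraryA("mydll.dll");
    if (lib == NULL)
        return NULL;
    /* The cast only tells the compiler how to call; it must match the DLL. */
    return (Tfs_ptr)GetProcAddress(lib, "fs");
#else
    return fake_dll_fs;
#endif
}

/* Calls through the pointer; with a matching convention and argument list,
   ESP should hold the same value before and after the call. */
int call_fs(Tfs_ptr fs, int value)
{
    assert(fs != NULL);
    return fs(value);
}
```

The library handle is deliberately never freed, so the returned pointer stays valid. If ESP still differs around a call through such a pointer, the argument list is the next thing to check.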

You could try to "follow" the call to dll_foo() in assembler using a debugger, and check exactly what the routine does stack-wise.

Related

Can a function know what's calling it?

Can a function tell what's calling it, perhaps through the use of memory addresses? For example, can foo() find out whether it is being called from main() rather than from some other function?
If so, is it possible to change the behavior of foo() based on what is calling it?
Example:
int foo()
{
if (being called from main())
printf("Hello\n");
if (being called from some other function)
printf("Goodbye\n");
}
This question might be kind of out there, but is there some sort of C trickery that can make this possible?
For highly optimized C it doesn't really make sense. The harder the compiler tries to optimize, the less the final executable resembles the source code (especially for link-time code generation, where the old "separate compilation units" problem no longer prevents lots of optimizations). At least in theory (but often in practice for some compilers), functions that existed in the source code may not exist in the final executable (e.g. they may have been inlined into their caller); functions that didn't exist in the source code may be generated (e.g. the compiler detects common sequences in many functions and "out-lines" them into a new function to avoid code duplication); and functions may be replaced by data (e.g. an "int abcd(uint8_t a, uint8_t b)" replaced by an abcd_table[a][b] lookup table).
For strict C (no extensions or hacks), no. It simply can't support anything like this because it can't expect that (for any compiler including future compilers that don't exist yet) the final output/executable resembles the source code.
An implementation defined extension, or even just a hack involving inline assembly, may be "technically possible" (especially if the compiler doesn't optimize the code well). The most likely approach would be to (ab)use debugging information to determine the caller from "what the function should return to when it returns".
A better way for a compiler to support a hypothetical extension like this may be for the compiler to use some of the optimizations I mentioned - specifically, split the original foo() into 2 separate versions where one version is only ever called from main() and the other version is used for other callers. This has the bonus of letting the compiler optimize out the branches too - it could become like int foo_when_called_from_main() { printf("Hello\n"); }, which could be inlined directly into the caller, so that neither version of foo exists in the final executable. Of course if foo() had other code that's used by all callers then that common code could be lifted out into a new function rather than duplicating it (e.g. so it might become like int foo_when_called_from_main() { printf("Hello\n"); foo_common_code(); }).
There probably isn't any hypothetical compiler that works like that, but there's no real reason you can't do these same optimizations yourself (and have it work on all compilers).
Note: Yes, this was just a crafty way of suggesting that you can/should refactor the code so that it doesn't need to know which function is calling it.
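A minimal sketch of doing that refactoring by hand with the behaviour from the question. The names foo_when_called_from_main, foo_when_called_from_elsewhere and foo_common_code are illustrative, and the shared part is a placeholder that just returns 0:

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder for whatever code every caller of the old foo() needs. */
static int foo_common_code(void)
{
    return 0;
}

/* Only main() calls this version, so no runtime check is needed. */
int foo_when_called_from_main(void)
{
    printf("Hello\n");
    return foo_common_code();
}

/* Every other caller uses this version. */
int foo_when_called_from_elsewhere(void)
{
    printf("Goodbye\n");
    return foo_common_code();
}
```

The decision about which behaviour to use moves to compile time: each call site simply names the version it wants.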
Knowing who called a specific function is essentially what a stack trace visualizes. There is no general, standard way of extracting that, though. In theory one could write code targeting each system type the software would run on, and implement a stack trace function for each of them. In that case you could examine the stack and see what comes before the current function.
But with all that said and done, the question you should probably ask is: why? Writing a function that behaves in a specific way when called from a specific function is not well-isolated logic. Instead, you could consider passing a parameter to the function that controls the change in logic. That would also make the result more testable and reliable.
How to actually extract a stack trace has already received many answers here: How can one grab a stack trace in C?
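As one platform-specific illustration (not portable C), glibc and macOS provide backtrace() and backtrace_symbols_fd() in <execinfo.h>. This sketch, with the hypothetical helper name print_callers, dumps the current call stack to stderr; the second entry is normally the function that called print_callers, but inlining can make frames disappear, as described above.

```c
#include <assert.h>
#include <execinfo.h>   /* glibc and macOS only; not part of ISO C */
#include <unistd.h>     /* STDERR_FILENO */

/* Writes the current call stack to stderr and returns the number of frames
   captured. Function names only show up if the binary exports its symbols
   (e.g. link with -rdynamic on Linux); otherwise you get raw addresses. */
int print_callers(void)
{
    void *frames[32];
    int n = backtrace(frames, 32);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```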
I don't think an if statement in C can have a condition like the one you've written.
If you want to check whether this function is called from main(), you have to put the printf statement in main() and also in the other function.
I don't really know what you are trying to achieve, but from what I understood, what you can do is have each caller pass an additional argument that uniquely identifies it, in the form of a character array, integer or enumeration.
For example:
enum caller { CALLER_MAIN, CALLER_ADD, CALLER_SUB, CALLER_DIV, CALLER_MUL };
and call functions like:
add(3, 5, CALLER_MAIN); // adds 3 and 5; called from main
You will need to update the code whenever you add more functions, but it's an easier way to do it.
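A compilable version of that suggestion, assuming the enumerators are prefixed (plain main and div would clash with main() and with div() from <stdlib.h>). foo() and add() are illustrative:

```c
#include <assert.h>
#include <stdio.h>

/* Prefixed names avoid clashing with main() and div(). */
enum caller { CALLER_MAIN, CALLER_ADD, CALLER_SUB, CALLER_OTHER };

/* foo() changes its behaviour based on an explicit argument instead of
   inspecting the call stack. Returns 1 after "Hello", 0 after "Goodbye". */
int foo(enum caller who)
{
    if (who == CALLER_MAIN) {
        printf("Hello\n");
        return 1;
    }
    printf("Goodbye\n");
    return 0;
}

/* add() also receives its caller, and identifies itself when calling foo(). */
int add(int a, int b, enum caller who)
{
    (void)who;          /* could be logged or forwarded */
    foo(CALLER_ADD);
    return a + b;
}
```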
No. The C language does not support obtaining the name or other information of who called a function.
As all other answers show, this can only be obtained using external tools, for example that use stack traces and compiler/linker emitted symbol tables.

Call C address as function without prototype

I need to call a function in C knowing only its address, with no information about its prototype (I can't cast it to a C function pointer).
The only information I have on this function is its address.
I also know the parameters I want to pass to it (through a void pointer) and the size of the argument array (accessed through the void pointer).
I also want to respect the C calling convention. For the x86 version I pretty much know how to do it (allocate space on the stack, copy the parameters to that space and finally call the function).
The problem is the x64 convention (the Linux one for now), where parameters are passed through registers. I have no idea of the size of each parameter, so I can't fill the registers appropriately; I only know the size of the parameter array.
Also, I don't want to depend on gcc, so I can't use __builtin_apply, which is non-standard and rather obscure.
I want to write my own piece of code to support multiple compilers and also to learn interesting stuff.
So basically, the function I want to write has the same prototype as __builtin_apply, which is:
void *call_ptr(void (*fun)(), void *params, size_t size);
I also want to write it in C (with inline asm) or in pure x64 asm.
So is there a way to do this properly and with respect to the calling convention? Or is this impossible with the x64 convention without knowing the exact prototype of the function being called?
Especially for the x64 calling convention on Linux, this will not work at all.
The reason is the very complicated calling convention.
Some examples:
void funcA(double x);
void funcB(int64_t x);
In these two cases the value "x" is passed to the functions differently because floating point and integer are passed to the functions in different registers.
void funcC(double x, int64_t y);
void funcD(int64_t y, double x);
In these two cases the arguments "x" and "y" are in different order. However they are passed to the function in the same way (both functions use the same register for "x" and the same register for "y").
Conclusion: to create a function that does what you want, you'd have to pass a string containing the type of each argument to the assembler function. The number/size of the arguments is definitely not enough. However, it would definitely be possible, as long as it only has to work on Linux.
I think none of these options will be portable across compilers, because the mechanism of passing arguments to a function (registers, their order, stack, memory) is a compiler-dependent feature...
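A related trick needs no assembly at all: once the signature string is known, plain C can cast the address back to the exact function type and let the compiler load the registers. The sketch below (hypothetical call_with_sig, supporting only four signatures and assuming every target returns double) shows the idea with the funcC/funcD examples above; the price is one branch per supported signature.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Any function pointer can be stored as this type and cast back before calling. */
typedef void (*generic_fn)(void);

/* sig has one letter per argument ('d' = double, 'l' = int64_t); args holds
   the values packed back to back in that order. Returns 0 on success and -1
   for signatures this sketch does not handle. */
int call_with_sig(generic_fn fn, const char *sig, const void *args, double *out)
{
    const unsigned char *p = args;
    double d;
    int64_t l;

    assert(fn != NULL && sig != NULL && out != NULL);
    if (strcmp(sig, "d") == 0) {
        memcpy(&d, p, sizeof d);
        *out = ((double (*)(double))fn)(d);
    } else if (strcmp(sig, "l") == 0) {
        memcpy(&l, p, sizeof l);
        *out = ((double (*)(int64_t))fn)(l);
    } else if (strcmp(sig, "dl") == 0) {
        memcpy(&d, p, sizeof d);
        memcpy(&l, p + sizeof d, sizeof l);
        *out = ((double (*)(double, int64_t))fn)(d, l);
    } else if (strcmp(sig, "ld") == 0) {
        memcpy(&l, p, sizeof l);
        memcpy(&d, p + sizeof l, sizeof d);
        *out = ((double (*)(int64_t, double))fn)(l, d);
    } else {
        return -1;
    }
    return 0;
}

/* Example targets mirroring funcC/funcD, returning double for the sketch. */
double funcC(double x, int64_t y) { return x - (double)y; }
double funcD(int64_t y, double x) { return x - (double)y; }
```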

Heap/Stack corruption on DLL call

I am trying to use a DLL created with Visual Studio 2008 SP1 (with Common Language Runtime Support enabled) from Code::Blocks (which uses GCC under MinGW). Some of the arguments being passed to the DLL have been dynamically allocated by the calling function. My question is:
"Can the arguments being passed to a DLL reside on the heap of the calling function? Is it safe to do this?"
On return from the DLL, the stack of the calling function gets corrupted, and when I try to access those values while debugging this problem in Code::Blocks, I get a SIGTRAP.
What could be the reason for this?
The prototype of the dll function goes like this:
int __cdecl myTesseractOCR(myOCRData* labels_for_ocr);
The myOCRData definition is shown below:
typedef struct __ocr_data
{
char* arr_image [NUMOBJ_LIMIT_HIGH];
int start_x [NUMOBJ_LIMIT_HIGH];
int start_y [NUMOBJ_LIMIT_HIGH];
int width [NUMOBJ_LIMIT_HIGH];
int height [NUMOBJ_LIMIT_HIGH];
int widthstep [NUMOBJ_LIMIT_HIGH];
char number_plate_buff [2*NUMOBJ_LIMIT_HIGH];
int ocr_label_count;
} myOCRData;
arr_image points to data that resides on the calling function's heap, whereas all the other members of the above structure reside on the stack of the calling function. All these members residing on the stack get corrupted and the program generates a SIGTRAP. I have seen such problems discussed in various threads on Stack Overflow, but haven't found a concrete solution yet.
I'd advise that you make your DLL interface as flat as possible; i.e. avoid passing structures, even if they are POD. Since you're using 2 different compilers this is particularly important. If you do decide to pass structures, make sure the packing of the structures is explicitly defined under both compilers.
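A minimal sketch of pinning the layout of the struct from the question. It assumes NUMOBJ_LIMIT_HIGH is 16 (the real value isn't shown) and uses a packing of 4 only as an example value; the same header, with the same pragma, would have to be used when building both the DLL and the caller. #pragma pack(push, n) / #pragma pack(pop) is accepted by both MSVC and GCC. The helper functions are hypothetical and exist so both sides can print and compare the numbers.

```c
#include <assert.h>
#include <stddef.h>

#define NUMOBJ_LIMIT_HIGH 16   /* assumption: the real value is not shown */

/* Identical packing on both sides; 4 is only an example value. */
#pragma pack(push, 4)
typedef struct ocr_data        /* renamed: identifiers starting with __ are reserved */
{
    char *arr_image[NUMOBJ_LIMIT_HIGH];
    int start_x[NUMOBJ_LIMIT_HIGH];
    int start_y[NUMOBJ_LIMIT_HIGH];
    int width[NUMOBJ_LIMIT_HIGH];
    int height[NUMOBJ_LIMIT_HIGH];
    int widthstep[NUMOBJ_LIMIT_HIGH];
    char number_plate_buff[2 * NUMOBJ_LIMIT_HIGH];
    int ocr_label_count;
} myOCRData;
#pragma pack(pop)

/* Print these from both the DLL and the caller; they must agree. */
size_t ocr_data_size(void) { return sizeof(myOCRData); }
size_t ocr_label_count_offset(void) { return offsetof(myOCRData, ocr_label_count); }
```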
It's perfectly reasonable for the DLL to access memory that resides on the calling application's heap. If you couldn't do that, DLLs would be essentially useless.
Your problem must lie elsewhere. Most likely you aren't quite setting up the parameters for the call to the DLL correctly.
Can you cross-check the calling-convention flags used by GCC against the convention of the DLL built with VS2008 SP1 (CLR)?
The heap does not belong to a function. It's perfectly fine to allocate memory in one module and pass it to another; just make sure that the module that allocated the memory is the one that frees it.
A second source of trouble may be a different calling convention. Specify a calling convention for all exported functions.
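A minimal sketch of the "whoever allocates also frees" rule from the answer above: the DLL exports an allocate/free pair, so memory always goes back to the C runtime that handed it out. The names ocr_alloc_image and ocr_free_image are hypothetical, and in a real DLL they would also carry __declspec(dllexport) and an explicit calling convention.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates from the DLL's own C runtime heap. */
char *ocr_alloc_image(size_t bytes)
{
    return malloc(bytes);
}

/* Frees with the same C runtime that allocated the block. */
void ocr_free_image(char *image)
{
    free(image);
}
```

The caller never passes such a block to its own free(), which matters when the DLL and the caller link different C runtimes, as with VS2008 and MinGW.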

Access command line arguments without using char **argv in main

Is there any way to access the command line arguments, without using the argument to main? I need to access it in another function, and I would prefer not passing it in.
I need a solution that only needs to work on Mac OS and Linux with GCC.
I don't know how to do it on MacOS, but I suspect the trick I will describe here can be ported to MacOS with a bit of cross-reading.
On Linux you can use the so-called ".init_array" section of the ELF binary to register a function that gets called during program initialization (before main() is called). This function has the same signature as the normal main() function, except that it returns "void".
Thus, you can use this function to remember or process argc, argv[] and envp[].
Here is some code you can use:
static void my_cool_main(int argc, char* argv[], char* envp[])
{
// your code goes here
}
__attribute__((section(".init_array"))) void (* p_my_cool_main)(int,char*[],char*[]) = &my_cool_main;
PS: This code can also be put in a library, so it should fit your case.
It even works when your program is run under valgrind, where /proc/self/cmdline would show the original valgrind command line, because valgrind does not fork a new process.
PPS: Keep in mind that during this very early stage of program execution many subsystems are not yet fully initialized. I tried libc I/O routines and they seem to work, but don't rely on it; even global variables might not yet be constructed, etc.
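Building on that, a minimal sketch that just stashes the values in globals so any function can read them after startup. It relies on glibc passing (argc, argv, envp) to .init_array functions; other C libraries may call them with no arguments, so treat it as Linux/glibc-only. saved_argc, saved_argv, save_args and p_save_args are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Copies of argc/argv that any function can read after startup. */
int saved_argc;
char **saved_argv;

/* Called by glibc before main(), with the same arguments main() receives. */
static void save_args(int argc, char *argv[], char *envp[])
{
    (void)envp;
    saved_argc = argc;
    saved_argv = argv;
}

/* "used" keeps the compiler from discarding the otherwise unreferenced pointer. */
static void (*p_save_args)(int, char *[], char *[])
    __attribute__((section(".init_array"), used)) = save_args;
```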
In Linux, you can open /proc/self/cmdline (assuming that /proc is present) and parse it manually (this is only required if you need argc/argv before main(), e.g. in a global constructor; otherwise it's better to pass them via global variables).
More solutions are available here: http://blog.linuxgamepublishing.com/2009/10/12/argv-and-argc-and-just-how-to-get-them/
Yeah, it's gross and unportable, but if you are solving practical problems you may not care.
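A minimal sketch of the /proc/self/cmdline approach (Linux only; the file holds the arguments separated by NUL bytes). read_cmdline is a hypothetical helper: it returns pointers into the caller's buffer, so the buffer must outlive them, and very long command lines are truncated to the buffer size.

```c
#include <assert.h>
#include <stdio.h>

/* Reads /proc/self/cmdline into buf and fills argv_out with pointers to the
   individual arguments. Returns the argument count, or -1 on error. */
int read_cmdline(char *buf, size_t bufsize, char **argv_out, int max_args)
{
    assert(buf != NULL && bufsize > 0 && argv_out != NULL);
    FILE *f = fopen("/proc/self/cmdline", "rb");
    if (!f)
        return -1;
    size_t len = fread(buf, 1, bufsize - 1, f);
    fclose(f);
    buf[len] = '\0';   /* terminates the last argument even if truncated */

    int argc = 0;
    size_t i = 0;
    while (i < len && argc < max_args) {
        argv_out[argc++] = &buf[i];
        while (i < len && buf[i] != '\0')
            i++;
        i++;           /* skip the NUL separator */
    }
    return argc;
}
```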
You can copy them into global variables if you want.
I do not think you should do it. The C runtime prepares the arguments and passes them into main via int argc, char **argv; do not attempt to manipulate that behaviour by hacking around it, as it would largely be unportable or possibly undefined behaviour! Stick to the rules and you will have portability; there is no other way of doing it short of breaking them.
You can on some platforms; the Microsoft C runtime, for example, provides the global variables __argc and __argv. But again, I support zneak's comment.
P.S. Use boost::program_options to parse them. Please do not do it any other way in C++.
Is there some reason why passing a pointer to space that is already consumed is so bad? You won't be getting any real savings out of eliminating the argument to the function in question and you could set off an interesting display of fireworks. Skirting around main()'s call stack with creative hackery usually ends up in undefined behavior, or reliance on compiler specific behavior. Both are bad for functionality and portability respectively.
Keep in mind the arguments in question are pointers to arguments, they are going to consume space no matter what you do. The convenience of an index of them is as cheap as sizeof(int), I don't see any reason not to use it.
It sounds like you are optimizing rather aggressively and prematurely, or you are stuck with having to add features into code that you really don't want to mess with. In either case, doing things conventionally will save both time and trouble.

How can I write a generic C function for calling a Win32 function?

To allow access to the Win32 API from a scripting language (written in C), I would like to write a function such as the following:
void Call(LPCSTR DllName, LPCSTR FunctionName,
LPSTR ReturnValue, USHORT ArgumentCount, LPSTR Arguments[])
which will call, generically, any Win32 API function.
(the LPSTR parameters are essentially being used as byte arrays - assume that they have been correctly sized to take the correct data type external to the function. Also I believe that some additional complexity is required to distinguish between pointer and non-pointer arguments but I'm ignoring that for the purposes of this question).
The problem I have is passing the arguments into the Win32 API functions. Because these are stdcall I can't use varargs so the implementation of 'Call' must know about the number of arguments in advance and hence it cannot be generic...
I think I can do this with assembly code (by looping over the arguments, pushing each to the stack) but is this possible in pure C?
Update: I've marked the 'No it is not possible' answer as accepted for now. I will of course change this if a C-based solution comes to light.
Update: ruby/dl looks like it may be implemented using a suitable mechanism. Any details on this would be appreciated.
First things first: you cannot pass a type as a parameter in C. The only option you are left with is macros.
This scheme works with a little modification (array of void * for arguments), provided you are doing a LoadLibrary/GetProcAddress to call Win32 functions. Having a function name string otherwise will be of no use. In C, the only way you call a function is via its name (an identifier) which in most cases decays to a pointer to the function. You also have to take care of casting the return value.
My best bet:
#include <windows.h>
// define a function type to be passed on to the next macro
#define Declare(ret, cc, fn_t, ...) typedef ret (cc *fn_t)(__VA_ARGS__)
// for the time being doesn't work with UNICODE turned on
#define Call(dll, fn, fn_t, ...) do {\
HMODULE lib = LoadLibraryA(dll); \
if (lib) { \
fn_t pfn = (fn_t)GetProcAddress(lib, fn); \
if (pfn) { \
(pfn)(__VA_ARGS__); \
} \
FreeLibrary(lib); \
} \
} while(0)
int main() {
Declare(int, __stdcall, MessageBoxProc, HWND, LPCSTR, LPCSTR, UINT);
Call("user32.dll", "MessageBoxA", MessageBoxProc,
NULL, ((LPCSTR)"?"), ((LPCSTR)"Details"),
(MB_ICONWARNING | MB_CANCELTRYCONTINUE | MB_DEFBUTTON2));
return 0;
}
No, I don't think it's possible to do without writing some assembly. The reason is that you need precise control over what is on the stack before you call the target function, and there's no real way to do that in pure C. It is, of course, simple to do in assembly, though.
Also, you're using LPSTR for all of these arguments, which is really just char *. But since not all of these arguments are strings, what you actually want to use for the return value and for Arguments[] is void * or LPVOID. This is the type you should use when you don't know the true type of the arguments, rather than casting them to char *.
The other posts are right about the almost certain need for assembly or other non-standard tricks to actually make the call, not to mention all of the details of the actual calling conventions.
Windows DLLs use at least two distinct calling conventions for functions: stdcall and cdecl. You would need to handle both, and might even need to figure out which to use.
One way to deal with this is to use an existing library to encapsulate many of the details. Amazingly, there is one: libffi. An example of its use in a scripting environment is the implementation of Lua Alien, a Lua module that allows interfaces to arbitrary DLLs to be created in pure Lua aside from Alien itself.
A lot of Win32 APIs take pointers to structs with specific layouts. Of these, a large subset follow a common pattern where the first DWORD has to be initialized to have the size of the struct before it is called. Sometimes they require a block of memory to be passed, into which they will write a struct, and the memory block must be of a size that is determined by first calling the same API with a NULL pointer and reading the return value to discover the correct size. Some APIs allocate a struct and return a pointer to it, such that the pointer must be deallocated with a second call.
I wouldn't be that surprised if the set of APIs that can be usefully called in one shot, with individual arguments convertible from a simple string representation, is quite small.
To make this idea generally applicable, we would have to go to quite an extreme:
typedef void DynamicFunction(size_t argumentCount, const wchar_t *arguments[],
size_t maxReturnValueSize, wchar_t *returnValue);
DynamicFunction *GenerateDynamicFunction(const wchar_t *code);
You would pass a simple snippet of code to GenerateDynamicFunction, and it would wrap that code in some standard boilerplate and then invoke a C compiler/linker to build a DLL containing the function from it (there are quite a few free options available). It would then LoadLibrary that DLL, use GetProcAddress to find the function, and return it. This would be expensive, but you would do it once and cache the resulting DynamicFunction pointer for repeated use. You could do this dynamically by keeping pointers in a hashtable, keyed by the code snippets themselves.
The boilerplate might be:
#include <windows.h>
// and anything else that might be handy
void DynamicFunctionWrapper(size_t argumentCount, const wchar_t *arguments[],
size_t maxReturnValueSize, wchar_t *returnValue)
{
// --- insert code snippet here
}
So an example usage of this system would be:
DynamicFunction *getUserName = GenerateDynamicFunction(
L"GetUserNameW(returnValue, (LPDWORD)(&maxReturnValueSize))");
wchar_t userName[100];
getUserName(0, NULL, sizeof(userName) / sizeof(wchar_t), userName);
You could enhance this by making GenerateDynamicFunction accept the argument count, so it could generate a check at the start of the wrapper that the correct number of arguments has been passed. And if you put a hashtable in there to cache the functions for each encountered code snippet, you could get close to your original example. The Call function would take a code snippet instead of just an API name, but would otherwise be the same. It would look up the code snippet in the hashtable, and if not present, it would call GenerateDynamicFunction and store the result in the hashtable for next time. It would then perform the call on the function. Example usage:
wchar_t userName[100];
Call(L"GetUserNameW(returnValue, (LPDWORD)(&maxReturnValueSize))",
0, NULL, sizeof(userName) / sizeof(wchar_t), userName);
Of course there wouldn't be much point doing any of this unless the idea was to open up some kind of general security hole, e.g. to expose Call as a web service. The security implications exist for your original idea, but are less apparent simply because the original approach you suggested wouldn't be that effective. The more generally powerful we make it, the more of a security problem it would be.
Update based on comments:
The .NET framework has a feature called p/invoke, which exists precisely to solve your problem. So if you are doing this as a project to learn about stuff, you could look at p/invoke to get an idea of how complex it is. You could possibly target the .NET framework with your scripting language - instead of interpreting scripts in real time, or compiling them to your own bytecode, you could compile them to IL. Or you could host an existing scripting language from the many already available on .NET.
You could try something like this - it works well for win32 API functions:
int CallFunction(int functionPtr, int* stack, int size)
{
if(!stack && size > 0)
return 0;
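for(int i = 0; i < size; i++) {
int v = *stack;
__asm {
push v // push one 32-bit argument
}
stack++;
}
int r;
FARPROC fp = (FARPROC) functionPtr;
__asm {
call fp // assumes __stdcall: the callee pops its own arguments
mov dword ptr[r], eax // the return value comes back in EAX
}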
return r;
}
The parameters in the "stack" argument should be in reverse order (as this is the order they are pushed onto the stack).
Having a function like that sounds like a bad idea, but you can try this:
int Call(LPCSTR DllName, LPCSTR FunctionName,
USHORT ArgumentCount, int args[])
{
int (STDCALL *foobar)() = lookupDLL(...);
switch(ArgumentCount) {
/* Note: If these give some compiler errors, you need to cast
each one to a func ptr type with suitable number of arguments. */
case 0: return foobar();
case 1: return foobar(args[0]);
...
}
}
On a 32-bit system, nearly all values fit into a 32-bit word and shorter values are pushed onto stack as 32-bit words for function call arguments, so you should be able to call virtually all Win32 API functions this way, just cast the arguments to int and the return value from int to the appropriate types.
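A portable variant of the same switch-on-count idea, as a sketch: instead of calling through an unprototyped pointer, each case casts to the exact type for that many int arguments. call_by_count, sum2 and sum3 are illustrative; on 32-bit Windows each cast would also need the callee's convention (e.g. WINAPI), which is left out so the code builds anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Stored pointer type; cast to the exact type before each call. */
typedef int (*any_fn)(void);

/* Calls fn with count int arguments taken from args.
   Returns 0 for argument counts this sketch does not handle. */
int call_by_count(any_fn fn, unsigned count, const int args[])
{
    assert(fn != NULL && (count == 0 || args != NULL));
    switch (count) {
    case 0: return ((int (*)(void))fn)();
    case 1: return ((int (*)(int))fn)(args[0]);
    case 2: return ((int (*)(int, int))fn)(args[0], args[1]);
    case 3: return ((int (*)(int, int, int))fn)(args[0], args[1], args[2]);
    default: return 0;  /* extend with more cases as needed */
    }
}

/* Demo targets standing in for DLL exports. */
int sum2(int a, int b) { return a + b; }
int sum3(int a, int b, int c) { return a + b + c; }
```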
I'm not sure if it will be of interest to you, but an option would be to shell out to RunDll32.exe and have it execute the function call for you. RunDll32 has some limitations and I don't believe you can access the return value whatsoever but if you form the command line arguments properly it will call the function.
First, you should add the size of each argument as an extra parameter. Otherwise, you need to divine the size of each parameter for each function to push onto the stack, which is possible for WinXX functions since they have to be compatible with the parameters they are documented with, but tedious.
Secondly, there isn't a "pure C" way to call a function without knowing the arguments except for a varargs function, and there is no constraint on the calling convention used by a function in a .DLL.
Actually, the second part is more important than the first.
In theory, you could set up a preprocessor macro/#include structure to generate all combinations of parameter types up to, say, 11 parameters, but that implies that you know ahead of time which types will be passed through your function Call, which is kind of crazy if you ask me.
Although, if you really wanted to do this unsafely, you could pass down the C++ mangled name and use UnDecorateSymbolName to extract the types of the parameters. However, that won't work for functions exported with C linkage.
