How to intercept C library calls in Windows? - c

I have a devilish-gui.exe, a devilish.dll and a devilish.h from a C codebase that has been lost.
devilish-gui is still used by the customer, and it uses devilish.dll.
devilish.h is poorly documented in a 30-page PDF: it exposes a few C functions that behave in very different ways according to the values in the structs provided as arguments.
Now, I have to use devilish.dll to write a new devilish-webservice. No, I can't rewrite it.
The documentation is almost useless, but since I have devilish-gui.exe I'd like to write a different implementation of devilish.h that logs each function call and its arguments to a file, and then calls the original DLL function. Something similar to what ltrace does on Linux, but specialized for this weird library.
How can I write such an "intercepting" DLL on Windows and inject it between devilish.dll and devilish-gui.exe?

A couple of possibilities:
Use Detours.
If you put your implementation of devilish.dll in the same directory as devilish-gui.exe, and move the real implementation of devilish.dll into a subdirectory, Windows will load your implementation instead of the real one. Your implementation can then forward to the real one. I'm assuming that devilish-gui isn't hardened against search path attacks.
Another approach would be to use IntelliTrace to collect a trace log of all the calls into devilish.dll.
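The forwarding-DLL idea can be sketched as follows. This is only a minimal, portable illustration of the log-then-forward wrapper, not the actual devilish.h API: the struct devilish_args, the function devilish_do and the helper trace_init are hypothetical names. In a real proxy DLL the pointer to the original function would presumably be obtained with LoadLibrary and GetProcAddress from the relocated devilish.dll; here a stand-in function plays that role so that the sketch runs anywhere.

```c
#include <stdio.h>

/* Hypothetical stand-in for a struct declared in devilish.h. */
typedef struct {
    int mode;
    int flags;
} devilish_args;

/* Signature of the (hypothetical) exported function being wrapped. */
typedef int (*devilish_do_fn)(const devilish_args *);

static devilish_do_fn real_devilish_do; /* the original implementation */
static FILE *trace_out;                 /* where calls are logged */

/* Stand-in for the original DLL function, so the sketch is runnable. */
static int fake_real_devilish_do(const devilish_args *a)
{
    return a->mode + a->flags;
}

/* In a proxy DLL this would run once (e.g. on first call) and fetch the
   pointer with GetProcAddress() from the relocated original DLL. */
void trace_init(devilish_do_fn real, FILE *out)
{
    real_devilish_do = real;
    trace_out = out;
}

/* Same name and signature as the original export: log, forward, log result. */
int devilish_do(const devilish_args *a)
{
    fprintf(trace_out, "devilish_do(mode=%d, flags=%d)", a->mode, a->flags);
    int rc = real_devilish_do(a);
    fprintf(trace_out, " -> %d\n", rc);
    fflush(trace_out); /* keep the log intact if the caller crashes later */
    return rc;
}
```

After trace_init(fake_real_devilish_do, stderr), every call to devilish_do writes one line with the arguments and the return value. One such wrapper would be needed per function exported by the real DLL, each printing the struct fields that matter for that function.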

Related

Prevent external calls to functions inside lib file

Is there a reliable way to prevent external code from calling inner functions of a lib that was compiled from C code?
I would like to deliver a static library with an API header file. The library has different modules, consisting of .c and .h files. I would like to prevent the recipients from using functions declared in the inner .h files.
Is this possible?
Thanks!
Is there a reliable way to prevent external code from calling inner functions of a lib?
No, there cannot be (read about Rice's theorem; statically detecting such non-trivial properties is undecidable). The library or the code might use function pointers. A malicious user could play with function pointers and pointer arithmetic to call some private function (perhaps after having reverse-engineered your code), even if it is static.
On Linux you might play with visibility tricks.
Or you could organize your library as a single translation unit (a bit like SQLite does with its amalgamation) and make all internal functions static ...
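A minimal sketch of the single-translation-unit idea, with hypothetical names (mylib_checksum, fold_byte, MYLIB_API): the internal helper is static, so it has internal linkage and cannot be linked against by name, while only the public function is exported. The visibility attribute is GCC/Clang-specific and only matters when building a shared object, so it is guarded here.

```c
#include <stddef.h>

/* GCC/Clang only: mark the public entry point as exported. This matters
   when the library is built as a shared object with -fvisibility=hidden. */
#if defined(__GNUC__)
#define MYLIB_API __attribute__((visibility("default")))
#else
#define MYLIB_API
#endif

/* Internal helper: `static` gives it internal linkage, so other
   translation units cannot link against it by name. */
static unsigned fold_byte(unsigned acc, unsigned char byte)
{
    return (acc * 31u + byte) & 0xFFFFu;
}

/* Public API: the only function this translation unit exports. */
MYLIB_API unsigned mylib_checksum(const unsigned char *data, size_t len)
{
    unsigned acc = 7u;
    for (size_t i = 0; i < len; i++)
        acc = fold_byte(acc, data[i]);
    return acc;
}
```

This keeps honest users from calling fold_byte by accident; as noted above, it does not stop someone who disassembles the library and jumps to the code directly.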
In general, the library should have naming conventions about its internal functions (e.g. suffix all of them with _). This could be practically helpful (but not against malicious users).
Most importantly, a library should be well documented (with naming conventions being also documented), and a serious user will only use documented functions the way they are documented to be useful.
(so I don't think you should care about internal functions being called; you do need to document which public functions can be called, and how, and when...; a user calling anything else should expect undefined behavior, that is very bad things)
I would like to deliver a static library with an API header file, and would like to prevent the recipients from using the structs I define and the inner functions.
I am not sure that (at least on Linux) delivering a static library is wise. I would recommend delivering a shared library, read Drepper's How to Write Shared Libraries.
And you can't prevent the recipient (assuming a malicious, clever, and determined one) from using inner functions and internal structs. You should just discourage them in the documentation, and document well your public functions and data types.
I would like to prevent the recipients from using functions declared in the inner .h files. Is this possible?
No, that is impossible.
It looks like you seek a technical solution to a social issue. You need to trust your users (and they need to trust you), so you should document what functions can be used (and you could even add to your documentation a sentence saying that directly using any undocumented function yields undefined behavior). You can't do much more. Perhaps (in particular if you are selling your library as proprietary software) you need a lawyer to write a good contract.
You might consider writing your own GCC plugin (or GCC MELT extension) to detect such calls. That could take you weeks of work and is not worth the trouble (and will remain imperfect).
I am not able to guess your motivations and use case (is it some life-critical software driving a nuclear reactor, a medical drug injector, an autonomous vehicle, a missile?). Please explain what would happen to you if some (malicious but clever) user called an internal undocumented function. And what could happen to that user?

Redirect posix file calls in C

We have a "library" (a selection of code we would rather not change) that is written from the perspective that it has access to 2 files directly. It uses "open", "read" and "seek" posix calls directly on a file descriptor.
However, now we have a proprietary file system that cannot be accessed through standard IO calls. Seeing that we do not want to re-write the code, it would be great if we could redirect the IO calls to known functions that could then be used as an interface.
Is there any way of changing the calls used above so that the "read" and "seek" can be over-written with new function calls?
Thanks.
When you say you don't want to change the library code, do you mean you want to use existing binary code, or just source? If you have the source and can recompile, I would simply pass -Dread=my_read -Dopen=my_open etc. to the compiler when building the library, and then provide your own my_read etc. functions.
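A small sketch of the macro-renaming approach, under stated assumptions: the in-memory store below is a hypothetical stand-in for the proprietary file system, legacy_load is a hypothetical piece of "library" code, and the two #define lines simulate what -Dopen=my_open -Dread=my_read would do on the compiler command line.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical backing store standing in for the proprietary file system. */
static const char store[] = "payload from the proprietary store";
static size_t store_pos;

/* Replacement for open(): ignores the path in this sketch. */
int my_open(const char *path, int flags, ...)
{
    (void)path;
    (void)flags;
    store_pos = 0;
    return 3; /* any fake descriptor */
}

/* Replacement for read(): copies from the backing store. */
ssize_t my_read(int fd, void *buf, size_t count)
{
    (void)fd;
    size_t left = sizeof store - 1 - store_pos;
    if (count > left)
        count = left;
    memcpy(buf, store + store_pos, count);
    store_pos += count;
    return (ssize_t)count;
}

/* Same effect as compiling the library with -Dopen=my_open -Dread=my_read. */
#define open my_open
#define read my_read

/* Unchanged "library" code: it still says open() and read(). */
size_t legacy_load(char *dst, size_t cap)
{
    int fd = open("data.bin", 0);
    ssize_t n = read(fd, dst, cap);
    return n < 0 ? 0 : (size_t)n;
}
```

If the library sources include <unistd.h> or <fcntl.h>, the macros rename those prototypes as well, so the replacements should keep compatible signatures (which is why my_open is variadic here). A my_lseek for the "seek" calls would follow the same pattern.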
One thing you can try is library function interposition.
In addition to the already mentioned function interposition and renaming of function calls using a macro, another Linux-only option is to use Filesystem in Userspace (FUSE). This way you can make your proprietary filesystem accessible to other applications which use the standard POSIX filesystem API. The FUSE hello world example is surprisingly short.

Safe cross-platform function to get normalized path

I'd like to have a standard function that will convert relative paths into absolute ones, and if possible I'd like to make it as cross-platform as possible (so I'd like to avoid calling external library functions). This is intended so it's possible to prevent path exploitations.
I am aware that such a function wouldn't be able to detect symbolic links, but I'm ok with that for my application.
I could roll my own code, but there might be some problems with e.g. how a platform handles encoding or variations of the "../" pattern.
Is there something like that already implemented?
There's not a single, universal function you can call, since there's no such function in the C or C++ standard libraries. On Windows, you can use GetFullPathName. On Linux, Mac OS X, and other Unix-like systems, you can use the realpath(3) function, which as a bonus also resolves symbolic links along the way.
Beware: Any solution to this is only reliable in a single-threaded program. If you're using multiple threads, another can go out and change the working directory out from under you unexpectedly, changing the path name resolution.
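For the POSIX side, a containment check built on realpath might look like the sketch below. The function name path_is_inside is hypothetical, and the code relies on realpath(path, NULL) allocating the result, which POSIX.1-2008 specifies; both paths must exist for the check to succeed. On Windows the same idea would have to be built on GetFullPathName instead.

```c
#define _XOPEN_SOURCE 700 /* for realpath() */
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `path` resolves to `base` or something below it, 0 if it
   resolves elsewhere, and -1 if either path cannot be resolved (for
   example because it does not exist). POSIX only. */
int path_is_inside(const char *base, const char *path)
{
    char *rbase = realpath(base, NULL); /* malloc'ed, symlinks resolved */
    char *rpath = realpath(path, NULL);
    int result = -1;

    if (rbase != NULL && rpath != NULL) {
        size_t n = strlen(rbase);
        if (n > 0 && rbase[n - 1] == '/')
            n--; /* only the root directory ends in '/' */
        /* Prefix must match and end on a path-component boundary. */
        result = strncmp(rbase, rpath, n) == 0 &&
                 (rpath[n] == '/' || rpath[n] == '\0');
    }
    free(rbase);
    free(rpath);
    return result;
}
```

Comparing on a component boundary matters: a plain prefix test would wrongly accept /srv/database as being inside /srv/data.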
I think the closest you're going to get to platform independence are the POSIX libraries. In particular you'll wanna check out unistd.h which unfortunately I don't believe has a 'normalized' path concept. If I remember correctly the standard itself doesn't even know much about directories much less relative ones.
To get better than that I think you'll need to roll your own path goodies.

Some general C questions

I am trying to fully understand the process from writing code in some language to its execution by the OS. In my case, the language would be C and the OS would be Windows. So far, I have read many different articles, but I am not sure whether I understand the process correctly, and I would like to ask whether you know some good articles on some subjects I couldn't find.
So, what I think I know about C (and basically other languages):
The C compiler itself handles only data types, basic math operations, pointer operations, and work with functions. By work with functions I mean how to pass arguments to a function and how to get its output. During compilation, a function call is replaced by pushing the arguments onto the stack, and then, if the function is not inline, its call is replaced by some symbol for the linker. The linker then finds the function definition and replaces the symbol with a jump address to that function (and of course the program then jumps back afterwards).
If the above is generally true and I understand it correctly, where in the final .exe file does the linker actually save the functions? After the main() function? And what creates the .exe header? The compiler or the linker?
Now, the additional capabilities of C, today known as the C standard library, are a set of functions and their declarations that other programmers wrote to extend and simplify the use of the C language. But functions like printf() were (or could be?) written in a different language, or in assembler. And here comes my next question: can, for example, the printf() function be written in pure C without the use of assembler?
I know this is quite a big question, but I mostly just want to know whether I am right or not. And trust me, I have read a lot of articles on the web, and I would not ask you if I could find this information together in one place, in one article. Instead I must gather the information piece by piece, so I am not sure if I am right. Thanks.
I think that you're exposed to some information that is less relevant as a beginning C programmer and that might be confusing you - part of the goal of using a higher level language like this is to not have to initially think about how this process works. Over time, however, it is important to understand the process. I think you generally have the right understanding of it.
The C compiler merely takes C code and generates object files that contain machine language. Most of the object file is taken up by the content of the functions. A simple function call in C, for example, would be represented in the compiled form as low-level instructions to push things onto the stack, change the instruction pointer, etc.
The C library and any other libraries you would use are already available in this compiled form.
The linker is the thing that combines all the relevant object files, resolves all the dependencies (e.g., one object file calling a function in the standard library), and then creates the executable.
As for the language libraries are written in: Think of every function as a black box. As long as the black box has a standard interface (the C calling convention; that is, it takes arguments in a certain way, returns values in a certain way, etc.), how it is written internally doesn't matter. Most typically, the functions would be written in C or directly in assembly. By the time they make it into an object file (or as a compiled library), it doesn't really matter how they were initially created, what matters is that they are now in the compiled machine form.
The format of an executable depends on the operating system, but much of the body of an executable on Windows is very similar to that of the object files. Imagine that someone merged together all the object files and then added some glue. The glue does loading-related stuff and then invokes main(). When I was a kid, for example, people got a kick out of "changing the glue" to add another function before main() that would display a splash screen with their name.
One thing to note, though is that regardless of the language you use, eventually you have to make use of operating system services. For example, to display stuff on the screen, to manage processes, etc. Most operating systems have an API that is also callable in a similar way, but its contents are not included in your EXE. For example, when you run your browser, it is an executable, but at some point there is a call to the Windows API to create a window or to load a font. If this was part of your EXE, your EXE would be huge. So even in your executable, there are "missing references". Usually, these are addressed at load time or run time, depending on the operating system.
I am a new user and this system does not allow me to post more than one link. To get around that restriction, I have posted some ideas on my blog: http://zhinkaas.blogspot.com/2010/04/how-does-c-program-work.html. It took me some time to gather all the links, but taken together, they should get you started.
The compiler is responsible for translating all your functions written in C into machine code, which it saves in object files; the linker then combines these into the final binary (a DLL or EXE, for example). So, if you write a .c file that has a main function and a few other functions, the compiler will translate all of those and they end up together in the EXE file. Then, when you run the file, the loader (which is part of the OS) knows where execution starts, and the startup code calls the main function. Otherwise, the main function is just like any other function for the compiler.
The linker is responsible for resolving any references between functions and variables in one object file and their definitions in other files. For example, if you call printf(), since you do not define the function printf() yourself, the linker is responsible for making sure that the call to printf() goes to the right system library where printf() is defined. This is done at link time (or at load time for dynamically linked libraries).
printf() can indeed be written in pure C. What it does is make a system call into the OS, which knows how to actually send characters to the standard output (such as a terminal window). When you call printf() in your program, the linker is responsible for linking your call to the printf() function in the standard C library. When the function is called at run time, printf() formats the arguments properly and then calls the appropriate OS system call to actually display the characters.
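To make the "format in C, then ask the OS to output the bytes" idea concrete, here is a toy sketch. It is not how any real C library implements printf: mini_printf and its helpers are hypothetical names, only %s, %d and %% are handled, and the output step uses the POSIX write() call as an assumption (a Windows C runtime would end up in the Win32 API instead).

```c
#include <stdarg.h>
#include <stddef.h>
#include <unistd.h> /* write(): thin POSIX wrapper around the system call */

/* Append one character if there is room; the length is always counted. */
static void put_char(char *out, size_t cap, size_t *len, char c)
{
    if (*len < cap)
        out[*len] = c;
    (*len)++;
}

/* Hypothetical mini formatter: understands only %s, %d and %%. */
static size_t mini_vformat(char *out, size_t cap, const char *fmt, va_list ap)
{
    size_t len = 0;
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put_char(out, cap, &len, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break; /* lone '%' at the end of the format */
        if (*fmt == 's') {
            for (const char *s = va_arg(ap, const char *); *s != '\0'; s++)
                put_char(out, cap, &len, *s);
        } else if (*fmt == 'd') {
            int v = va_arg(ap, int);
            unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
            char digits[16];
            int n = 0;
            if (v < 0)
                put_char(out, cap, &len, '-');
            do { /* produce digits in reverse order */
                digits[n++] = (char)('0' + u % 10u);
                u /= 10u;
            } while (u != 0u);
            while (n > 0)
                put_char(out, cap, &len, digits[--n]);
        } else {
            put_char(out, cap, &len, *fmt); /* "%%" and unknown specifiers */
        }
    }
    return len < cap ? len : cap; /* bytes actually stored */
}

/* Formats into a local buffer, then hands the bytes to the OS. */
int mini_printf(int fd, const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    size_t len = mini_vformat(buf, sizeof buf, fmt, ap);
    va_end(ap);
    return (int)write(fd, buf, len);
}
```

Everything up to the final write() is ordinary C; only the last step crosses into the operating system, and that crossing is itself hidden behind a function with the normal C calling convention.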

How to walk a directory in C

I am using glib in my application, and I see there are convenience wrappers in glib for C's remove, unlink and rmdir. But these only work on a single file or directory at a time.
As far as I can see, neither the C standard nor glib include any sort of recursive directory walk functionality. Nor do I see any specific way to delete an entire directory tree at once, as with rm -rf.
For what I'm doing, I'm not worried about any complications like permissions, symlinks back up the tree (infinite recursion), or anything that would rule out a very naive implementation... so I am not averse to writing my own function for it.
However, I'm curious if this functionality is out there somewhere in the standard libraries gtk or glib (or in some other easily reused C library) already and I just haven't stumbled on it. Googling this topic generates a lot of false leads.
Otherwise my plan is to use this type of algorithm:
dir_walk(char* path, void* callback(char*)) {
    if(is_dir(path) && has_entries(path)) {
        entries = get_entries(path);
        for(entry in entries) { dir_walk(entry, callback); }
    }
    callback(path);
}
dir_walk("/home/user/trash", remove);
Obviously I would build in some error handling and the like to abort the process as soon as a fatal error is encountered.
Have you looked at <dirent.h>? AFAIK this belongs to the POSIX specification, which should be part of the standard library of most, if not all C compilers. See e.g. this <dirent.h> reference (Single UNIX specification Version 2 by the Open Group).
P.S., before someone comments on this: No, this does not offer recursive directory traversal. But then I think this is best implemented by the developer; requirements can differ quite a lot, so one-size-fits-all recursive traversal function would have to be very powerful. (E.g.: Are symlinks followed up? Should recursion depth be limited? etc.)
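A naive recursive walk on top of <dirent.h> could be sketched like this. It is POSIX-only, the names dir_walk and walk_cb are illustrative, paths longer than the fixed buffer are treated as errors, and symlinks are not followed (lstat is used). The callback is invoked on a directory only after its contents have been visited, which is what a removal callback needs.

```c
#define _XOPEN_SOURCE 700 /* for lstat() */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

typedef int (*walk_cb)(const char *path);

/* Calls `callback` on every entry below `path`, children first, then on
   `path` itself. Stops at the first failure and returns non-zero. */
int dir_walk(const char *path, walk_cb callback)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;

    if (S_ISDIR(st.st_mode)) {
        DIR *dir = opendir(path);
        if (dir == NULL)
            return -1;

        struct dirent *entry;
        int rc = 0;
        while (rc == 0 && (entry = readdir(dir)) != NULL) {
            /* Skip the self and parent links to avoid infinite recursion. */
            if (strcmp(entry->d_name, ".") == 0 ||
                strcmp(entry->d_name, "..") == 0)
                continue;

            char child[4096];
            int n = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
            if (n < 0 || (size_t)n >= sizeof child)
                rc = -1; /* path too long for this sketch */
            else
                rc = dir_walk(child, callback);
        }
        closedir(dir);
        if (rc != 0)
            return rc;
    }
    return callback(path);
}
```

Since remove() from <stdio.h> has a matching signature, dir_walk("/home/user/trash", remove) deletes the whole tree, aborting at the first entry that cannot be removed.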
You can use GFileEnumerator if you want to do it with glib.
Several platforms include ftw and nftw: "(new) file tree walk". Checking the man page on an imac shows that these are legacy, and new users should prefer fts. Portability may be an issue with either of these choices.
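Where nftw is available, an rm -rf style removal can be sketched in a few lines. The names remove_entry and remove_tree are illustrative, and the limit of 16 open descriptors is an arbitrary choice for this sketch.

```c
#define _XOPEN_SOURCE 700 /* for nftw() */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Called by nftw() for every entry; a non-zero return aborts the walk. */
static int remove_entry(const char *path, const struct stat *sb,
                        int type, struct FTW *ftwbuf)
{
    (void)sb;
    (void)type;
    (void)ftwbuf;
    return remove(path); /* handles both files and (empty) directories */
}

/* Removes `root` and everything below it. Returns 0 on success. */
int remove_tree(const char *root)
{
    /* FTW_DEPTH: visit a directory after its contents (post-order).
       FTW_PHYS: do not follow symbolic links. */
    return nftw(root, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

FTW_DEPTH is what makes this work for deletion: each directory is already empty by the time remove() is called on it.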
Standard C libraries are meant to provide primitive functionality. What you are talking about is composite behavior. You can easily implement it using the low level features present in your API of choice -- take a look at this tutorial.
Note that the "convenience wrappers" you mention for remove(), unlink() and rmdir(), assuming you mean the ones declared in <glib/gstdio.h>, are not really "convenience wrappers". What is the convenience in prefixing totally standard functions with a "g_"? (And note that I say this even though I am the one who introduced them in the first place.)
The only reason these wrappers exist is for file name issues on Windows, where these wrappers actually consist of real code; they take file name arguments in Unicode, encoded in UTF-8. The corresponding "unwrapped" Microsoft C library functions take file names in the system codepage.
If you aren't specifically writing code intended to be portable to Windows, there is no reason to use the g_remove() etc wrappers.
