Safe cross-platform function to get normalized path - c

I'd like to have a standard function that will convert relative paths into absolute ones, and if possible I'd like to make it as cross-platform as possible (so I'd like to avoid calling external library functions). This is intended so it's possible to prevent path exploitations.
I am aware that such a function wouldn't be able to detect symbolic links, but I'm ok with that for my application.
I could roll my own code, but there might be some problems with e.g. how a platform handles encoding or variations of the "../" pattern.
Is there something like that already implemented?

There's not a single, universal function you can call, since there's no such function in the C or C++ standard libraries. On Windows, you can use GetFullPathName. On Linux, Mac OS X, and other *Unix-based systems, you can use the realpath(3) function, which as a bonus also resolves symbolic links along the way.
Beware: Any solution to this is only reliable in a single-threaded program. If you're using multiple threads, another can go out and change the working directory out from under you unexpectedly, changing the path name resolution.

I think the closest you're going to get to platform independence are the POSIX libraries. In particular you'll wanna check out unistd.h which unfortunately I don't believe has a 'normalized' path concept. If I remember correctly the standard itself doesn't even know much about directories much less relative ones.
To get better than that I think you'll need to roll your own path goodies.
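If you do roll your own, a purely lexical normalizer is fairly small. The following is only a sketch under several assumptions: the input is already absolute, '/' is the only separator, and no encoding or symlink issues are considered. The function name normalize_path is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Normalizes an absolute '/'-separated path in place: collapses repeated
 * separators, drops "." components and resolves ".." against the previous
 * component. ".." at the root stays at the root.
 * Returns 0 on success, -1 if the path is not absolute. */
static int normalize_path(char *path)
{
    char *src = path;   /* next unread input character */
    char *dst = path;   /* one past the last written output character */

    if (path[0] != '/')
        return -1;

    while (*src != '\0') {
        size_t len;

        while (*src == '/')              /* skip separators */
            src++;
        len = strcspn(src, "/");         /* length of the next component */
        if (len == 0)
            break;

        if (len == 1 && src[0] == '.') {
            /* "." refers to the current directory: nothing to emit */
        } else if (len == 2 && src[0] == '.' && src[1] == '.') {
            /* "..": drop the last component already written, if any */
            while (dst > path && *--dst != '/')
                ;
        } else {
            *dst++ = '/';
            memmove(dst, src, len);      /* output never overtakes input */
            dst += len;
        }
        src += len;
    }

    if (dst == path)                     /* everything cancelled out */
        *dst++ = '/';
    *dst = '\0';
    return 0;
}
```

For example, "/a/./b/../c" becomes "/a/c" and "/../etc//passwd" becomes "/etc/passwd". Because this never touches the file system, it cannot notice symbolic links.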

Related

How to intercept C library calls in windows?

I have a devilish-gui.exe, a devilish.dll and a devilish.h from a C codebase that has been lost.
devilish-gui is still used by the customer and it uses devilish.dll
devilish.h is poorly documented in a 30-pages pdf: it exposes a few C functions that behave in very different ways according to the values in the structs provided as arguments.
Now, I have to use devilish.dll to write a new devilish-webservice. No, I can't rewrite it.
The documentation is almost useless, but since I have devilish-gui.exe I'd like to write a different implementation of devilish.h that logs each function call and its arguments to a file, and then calls the original dll function. Something similar to what ltrace does on linux, but specialized for this weird library.
How can I write such "intercepting" dll on windows and inject it between devilish.dll and devilish-gui.exe?
A couple of possibilities:
Use Detours.
If you put your implementation of devilish.dll in the same directory as devilish-gui.exe, and move the real implementation of devilish.dll into a subdirectory, Windows will load your implementation instead of the real one. Your implementation can then forward to the real one. I'm assuming that devilish-gui isn't hardened against search path attacks.
Another approach would be to use IntelliTrace to collect a trace log of all the calls into devilish.dll.

Is there a way to test whether thread safe functions are available in the C standard library?

In regards to the thread safe functions in newer versions of the C standard library, is there a cross-platform way to tell if these are available via pre-processor definition? I am referring to functions such as localtime_r().
If there is not a standard way, what is the reliable way in GCC? [EDIT] Or posix systems with unistd.h?
There is no standard way to test that, which means there is no way to test it across all platforms. Tools like autoconf will create a tiny C program that calls this function and then try to compile and link it. If this works, the function apparently exists; if not, then it may not exist (or the compiler options are wrong and the appropriate CFLAGS need to be set).
So you have basically 6 options:
Require them to exist. Your code can only work on platforms where they exist; period. If they don't exist, compilation will fail, but that is not your problem, since the platform violates your minimum requirements.
Avoid using them. If you use the non-thread safe ones, maybe protected by a global lock (e.g. a mutex), it doesn't matter if they exist or not. Of course your code will then only work on platforms with POSIX mutexes, however, if a platform has no POSIX mutexes, it won't have POSIX threads either and if it has no POSIX threads (and I guess you are probably using POSIX threads w/o supporting any alternative), why would you have to worry about thread-safety in the first place?
Decide at runtime. Depending on the platform, either do a "weak link", so you can test at runtime if the function was found or not (a pointer to the function will point to NULL if it wasn't) or alternatively resolve the symbol dynamically using something like dlsym() (which is also not really portable, but widely supported in the Linux/UNIX world). However, in that case you need a fallback if the function is not found at runtime.
Use a tool like autoconf, some other tool with similar functionality, or your own configuration script to determine this prior to start of compilation (and maybe set preprocessor macros depending on result). In that case you will also need a fallback solution.
Limit usage to well known platforms. Whether this function is available on a certain platform is usually known (and once it is available, it won't go away in the future). Most platforms expose preprocessor macros to test what kind of platform that is and sometimes even which version. E.g. if you know that GNU/Linux, Android, Free/Open/NetBSD, Solaris, iOS and MacOS X all offer this function, test if you are compiling for one of these platforms and if yes, use it. If the code is compiled for another platform (or if you cannot determine what platform that is), it may or may not offer this function, but since you cannot say for sure, better be safe and use the fallback.
Let the user decide. Either always use the fallback, unless the user has signaled support or do it the other way round (which makes probably more sense), always assume it is there and in case compilation fails, offer a way the user can force your code into "compatibility mode", by somehow specifying that thread-safe-functions are not available (e.g. by setting an environment variable or by using a different make target). Of course this is the least convenient method for the (poor) user.

Change library load order at run time (like LD_PRELOAD but during execution)

How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void* mylib = dlopen("/path/to/library.so",RTLD_NOW);
printf = dlsym(mylib,"printf");
AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.
There is no clean solution but it is possible. I see two options:
Overwrite printf function prolog with jump to your replacement function.
This is quite a popular solution for function hooking on MS Windows. You can find examples of function hooking by code rewriting via Google.
Rewrite ELF relocation/linkage tables.
See this article on codeproject that does almost exactly what you are asking, but only within the scope of dlopen()'ed modules. In your case you also want to edit your main (typically non-PIC) module. I didn't try it, but maybe it's as simple as calling the provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails you'll have to read source of your dynamic linker to figure out what needs to be adapted.
It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.
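A minimal sketch of the #undef/#define approach described above. The names my_printf, my_printf_calls and greet are illustrative only; the replacement here just counts calls and forwards to the system vprintf.

```c
#include <stdarg.h>
#include <stdio.h>

static int my_printf_calls;   /* illustrative: how often the wrapper ran */

/* Replacement that does something extra, then defers to the system
 * implementation through vprintf(). */
static int my_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    my_printf_calls++;
    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/* Must come AFTER all system headers, so only our own code is affected. */
#undef printf
#define printf my_printf

static int greet(const char *name)
{
    return printf("hello, %s\n", name);   /* expands to my_printf(...) */
}
```

Calls to printf made inside libc or other libraries are untouched, since the macro only rewrites source text that follows it in this translation unit.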
You can't change that. In the general *NIX linking concept (or rather lack of concept), a symbol is picked from the first object where it is found. (Except for oddball AIX, which works more like OS/2 by default.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT). See man dlsym for more. It gets out of hand quite quickly, though, which is why it is rarely used.
There is an environment variable, LD_LIBRARY_PATH, where the linker searches for shared libraries. Prepend your path to LD_LIBRARY_PATH; I hope that would work.
Store the dlsym() result in a lookup table (array, hash table, etc). Then #undef printf and #define printf to use your lookup table version.

How to walk a directory in C

I am using glib in my application, and I see there are convenience wrappers in glib for C's remove, unlink and rmdir. But these only work on a single file or directory at a time.
As far as I can see, neither the C standard nor glib include any sort of recursive directory walk functionality. Nor do I see any specific way to delete an entire directory tree at once, as with rm -rf.
For what I'm doing, I'm not worried about any complications like permissions, symlinks back up the tree (infinite recursion), or anything that would rule out a very naive implementation... so I am not averse to writing my own function for it.
However, I'm curious if this functionality is out there somewhere in the standard libraries gtk or glib (or in some other easily reused C library) already and I just haven't stumbled on it. Googling this topic generates a lot of false leads.
Otherwise my plan is to use this type of algorithm:
dir_walk(char* path, void* callback(char*)) {
    if(is_dir(path)) {
        entries = get_entries(path);
        for(entry in entries) { dir_walk(entry, callback); }
    }
    callback(path);  // also handles the directory itself once it is empty
}
dir_walk("/home/user/trash", remove);
Obviously I would build in some error handling and the like to abort the process as soon as a fatal error is encountered.
Have you looked at <dirent.h>? AFAIK this belongs to the POSIX specification, which should be part of the standard library of most, if not all C compilers. See e.g. this <dirent.h> reference (Single UNIX specification Version 2 by the Open Group).
P.S., before someone comments on this: No, this does not offer recursive directory traversal. But then I think this is best implemented by the developer; requirements can differ quite a lot, so one-size-fits-all recursive traversal function would have to be very powerful. (E.g.: Are symlinks followed up? Should recursion depth be limited? etc.)
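For illustration, here is one way the asker's dir_walk could look on top of <dirent.h>. This is a naive sketch for POSIX systems only: it does not follow symlinks, uses a fixed 4096-byte path buffer, stops at the first error, and relies on the common (but not strictly guaranteed) behavior that removing an entry while iterating with readdir() is harmless.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Calls `callback` on everything below `path` (children first) and then on
 * `path` itself. Returns 0 on success, -1 on the first error. */
static int dir_walk(const char *path, int (*callback)(const char *))
{
    struct stat st;

    if (lstat(path, &st) != 0)          /* lstat: do not follow symlinks */
        return -1;

    if (S_ISDIR(st.st_mode)) {
        DIR *dir = opendir(path);
        struct dirent *entry;
        int rc = 0;

        if (dir == NULL)
            return -1;
        while (rc == 0 && (entry = readdir(dir)) != NULL) {
            char child[4096];
            int len;

            if (strcmp(entry->d_name, ".") == 0 ||
                strcmp(entry->d_name, "..") == 0)
                continue;
            len = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
            if (len < 0 || (size_t)len >= sizeof child)
                rc = -1;                /* path too long for the buffer */
            else
                rc = dir_walk(child, callback);
        }
        closedir(dir);
        if (rc != 0)
            return -1;
    }
    return callback(path) == 0 ? 0 : -1;
}
```

Since remove() from <stdio.h> has the matching signature and, on POSIX, also deletes empty directories, dir_walk("/home/user/trash", remove) deletes the whole tree.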
You can use GFileEnumerator if you want to do it with glib.
Several platforms include ftw and nftw: "(new) file tree walk". Checking the man page on an imac shows that these are legacy, and new users should prefer fts. Portability may be an issue with either of these choices.
Standard C libraries are meant to provide primitive functionality. What you are talking about is composite behavior. You can easily implement it using the low level features present in your API of choice -- take a look at this tutorial.
Note that the "convenience wrappers" you mention for remove(), unlink() and rmdir(), assuming you mean the ones declared in <glib/gstdio.h>, are not really "convenience wrappers". What is the convenience in prefixing totally standard functions with a "g_"? (And note that I say this even though I am the one who introduced them in the first place.)
The only reason these wrappers exist is for file name issues on Windows, where these wrappers actually consist of real code; they take file name arguments in Unicode, encoded in UTF-8. The corresponding "unwrapped" Microsoft C library functions take file names in system codepage.
If you aren't specifically writing code intended to be portable to Windows, there is no reason to use the g_remove() etc wrappers.

what is the easiest way to lookup function names of a c binary in a cross-platform manner?

I want to write a small utility to call arbitrary functions from a C shared library. User should be able to list all the exported functions similar to what objdump or nm does. I checked these utilities' source but they are intimidating. Couldn't find enough information on google, if dl library has this functionality either.
(Clarification edit: I don't want to just call a function which is known beforehand. I will appreciate an example fragment along your answer.)
This might be near to what you're looking for:
http://python.net/crew/theller/ctypes/
Well, I'll speak a little bit about Windows. The C functions exported from DLLs do not contain information about the types, names, or number of arguments -- nor do I believe you can determine what the calling convention is for a given function.
For comparison, take a look at National Instrument's LabVIEW programming environment. You can import functions from DLLs, but you have to manually type in the type and names of the arguments before you use a given function. If this limitation is OK, please edit your question to reflect that.
I don't know what is possible with *nix environments.
EDIT: Regarding your clarification. If you don't know what the function is ahead of time, you're pretty screwed on Windows because in general you won't be able to determine what the number and types of arguments the functions take.
You could try ParaDyn's SymtabAPI. It lets you grab all the symbols in a shared library (or executable) and look at their types, offset, etc. It's all wrapped up in a reasonably nice C++ interface and runs on a lot of platforms. It also provides support for binary rewriting, which you could potentially use to do what you're talking about at runtime.
Webpage is here:
http://www.paradyn.org/html/symtab2.1-features.html
Documentation is here:
http://ftp.cs.wisc.edu/paradyn/releases/release5.2/doc/symtabProgGuide.21.pdf
A standard-ish API is the dlopen/dlsym API; AFAIK it's implemented by GNU libc on Linux and Mac OS X's standard C library (libSystem), and it might be implemented on Windows by MinGW or other compatibility packages.
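A small sketch of calling a function by name with dlopen/dlsym. It only covers the lookup, not listing exports, and the caller still has to know the signature, since dlsym returns nothing but an address. The helper names are made up; on older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: looks `name` up in the running program and the
 * libraries it has loaded. Returns the address, or NULL if not found. */
static void *lookup_symbol(const char *name)
{
    void *handle = dlopen(NULL, RTLD_LAZY);   /* NULL = the main program */
    void *sym;

    if (handle == NULL)
        return NULL;
    sym = dlsym(handle, name);
    dlclose(handle);                          /* only drops our reference */
    return sym;
}

/* Calls a function known to have the signature int f(const char *).
 * Returns -1 if the symbol cannot be found. */
static int call_int_function(const char *name, const char *arg)
{
    int (*fn)(const char *);

    /* POSIX-recommended idiom for converting void * to a function pointer */
    *(void **)(&fn) = lookup_symbol(name);
    return fn != NULL ? fn(arg) : -1;
}
```

For instance, call_int_function("atoi", "42") finds atoi in the C library and calls it. Passing a name whose real signature differs is undefined behavior, which is the limitation the Windows answer above describes.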
The only sensible solution (without reinventing the wheel) seems to be to use libbfd. The downsides are that its documentation is scarce and it is a bit bloated for my purposes.
The source code for nm and objdump are available. If you want to start from specification then ELF is what you want to look into.
/Allan
I've written something like this in Perl. On Win32 it runs dumpbin /exports, on POSIX it runs nm -gP. Then, since it's Perl, the results are interpreted using regular expressions: / _(\S+)#\d+/ for Win32 (stdcall functions) and /^(\S+) T/ for POSIX.
Eek! You've touched on one of the very platform-dependent topics of programming. On windows, you have DLLs, on linux, you have ld.so, ld-linux.so, and mac os x's dyld.

Resources