Where is the implementation of the GNU C library? - c

I'm looking through the glibc header files that reside in /usr/include to get a feel for what is going on behind the scenes. All the files I'm looking at simply declare a bunch of macros and functions, but I want to take a look at the implementation of these functions.

The glibc source repository is here:
https://sourceware.org/git/?p=glibc.git;a=tree
Note that a lot of the interesting code is under the sysdeps directory, particularly sysdeps/unix/sysv/linux/*. Also worth noting is that stdio is split between stdio-common and libio, and all of the POSIX threads interfaces are implemented under nptl (which also has its own sysdeps tree).
Further, note that there are a lot of functions for which you will simply not find source code at all. Many of the standard functions are simply entry points for making calls to the kernel (syscalls), and these wrappers are automatically generated as part of the build process.
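To make the point about syscall wrappers concrete, here is a minimal sketch, assuming Linux with glibc. The helper name raw_getpid is hypothetical; it uses the generic syscall() function to ask the kernel for the process ID directly, which should match what the getpid() wrapper returns.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for the process ID through the
 * generic syscall() entry point instead of the getpid() wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() and getpid() in the same process should give the same number, which illustrates that the wrapper adds very little on top of the kernel call.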

The readable form of the implementation of the functions within glibc is its source code, which you can download from the project's website.
Note that some of the functions are stubs that delegate to system calls, and for those the complete implementation is found in the source code of your operating system.

Related

Where can I see the source code of malloc() or any library function in my windows(xp) PC?

I know that when we call a library function in our source code, the function definitions are loaded into RAM at run time (assuming dynamic linking).
But where exactly are the definitions of library functions stored?
If they are not in .c format, how are they stored?
If you need a function's definition, you need to check the source code [that was obvious].
To get the definitions of functions that are part of a library [e.g. glibc], you have to get the source code of the library and browse through it. Usually the library sources [in .c format, if that is what you mean] are compiled to produce a library, either
static [usually with a .a extension]
dynamic [usually with a .so extension, for "shared object"]
to be linked with some other code to produce the final binary.
So yes, they are in .c format (or at least a human-readable format, I should say), which you can browse through.
Note: An online browsable version of glibc.
P.S - Sorry if my answer is biased towards Linux implementations; however, it is still valid for a Windows (XP) PC.
The header file contains the declaration, not the implementation. You can find the header file, named alloc.h, in the include folder. You have to specify the environment you are using. The file is saved with the extension .h.
You can find an example Windows implementation of malloc here. On Windows, it's mostly a wrapper for WinAPI functions such as HeapAlloc. You can find other implementations of this and other functions in various opensource libraries.
Note that on Windows, a compiler doesn't have to provide implementations for the standard C functions, as they are all available in msvcrt.dll. You can't get the source code of these implementations, but you can still disassemble the DLL and look at the assembly.

How to intercept C library calls in windows?

I have a devilish-gui.exe, a devilish.dll and a devilish.h from a C codebase that has been lost.
devilish-gui is still used by the customer, and it uses devilish.dll.
devilish.h is poorly documented in a 30-pages pdf: it exposes a few C functions that behave in very different ways according to the values in the structs provided as arguments.
Now, I have to use devilish.dll to write a new devilish-webservice. No, I can't rewrite it.
The documentation is almost useless, but since I have devilish-gui.exe I'd like to write a different implementation of devilish.h that logs each function call and its arguments to a file, and then calls the original DLL function. Something similar to what ltrace does on Linux, but specialized for this weird library.
How can I write such "intercepting" dll on windows and inject it between devilish.dll and devilish-gui.exe?
A couple of possibilities:
Use Detours.
If you put your implementation of devilish.dll in the same directory as devilish-gui.exe, and move the real implementation of devilish.dll into a subdirectory, Windows will load your implementation instead of the real one. Your implementation can then forward to the real one. I'm assuming that devilish-gui isn't hardened against search path attacks.
Another approach would be to use IntelliTrace to collect a trace log of all the calls into devilish.dll.

What is GLIBC? What is it used for?

I was searching for the source code of the C standard library. What I mean is: how are cos, abs, printf, scanf, fopen, and all the other standard C functions written? I want to see their source code.
While searching for this, I came across GLIBC, but I don't know what it actually is. It is the GNU C Library, and it contains some source code, but what is that code exactly: the source of the standard functions, or something else? And what is it used for?
It's the implementation of the standard C library described in the C standards, plus some extra useful stuff that is not strictly standard but is used frequently.
Its main contents are:
1) The C library described in the ANSI, C99 and C11 standards. It includes macros, symbols, function implementations, etc. (printf(), malloc(), etc.)
2) The POSIX standard library: the "userland" glue for system calls (open(), read(), etc.). Actually, glibc does not "implement" system calls; the kernel does. But glibc provides the userland interface to the services provided by the kernel, so that a user application can use a system call just like an ordinary function.
3) Also some nonstandard but useful stuff.
"use the force, read the source "
$ git clone git://sourceware.org/git/glibc.git
(I was recently pretty enlightened when I looked through malloc.c in glibc)
There are several implementations of the standard. Glibc is the implementation that most Linuxes use, but there are others. Glibc also contains (as Aftnix states) the glue functions which set up the scene for jumps into the kernel (also known as system calls). So many of glibc's 'functions' don't do the actual work but only delegate to the kernel.
To read the source of Glibc, just google for it. There are myriad sites which carry it, and also several variations.
Windows uses Microsoft's own implementation, which I believe is called MSVCR.DLL. I doubt that you will find the source code to that library anywhere. Also note that some functions which a Linux hacker might think of as 'standard', simply don't exist on Windows (notably fork). The reverse is also true.
Other systems will have their own libc.
The glibc package contains standard libraries which are used by multiple programs on the system. In order to save disk space and memory, as well as to make upgrading easier, common system code is kept in one place and shared between programs. This particular package contains the most important sets of shared libraries: the standard C library and the standard math library. Without these two libraries, a Linux system will not function. The glibc package also contains national language (locale) support.
Yes, it's the implementation of the standard library functions.
More specifically, it is the implementation used on all GNU systems and on almost all *NIX systems that use the Linux kernel.
Here are a few "hands-on" points of view:
it implements the POSIX C API on top of the Linux kernel: What is the meaning of "POSIX"?
it contains several assembly hand-optimized versions of ANSI C functions for several different architectures, e.g. strlen:
sysdeps/x86_64/strlen.S
sysdeps/aarch64/strlen.S
how to modify its source, recompile it and use it to understand it better: How to compile my own glibc C standard library from source and use it?
how to GDB step debug it with QEMU and Buildroot: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/9693c23fe6b2ae1409010a1a29ff0c1b7bd4b39e#gdbserver-libc

How to walk a directory in C

I am using glib in my application, and I see there are convenience wrappers in glib for C's remove, unlink and rmdir. But these only work on a single file or directory at a time.
As far as I can see, neither the C standard nor glib include any sort of recursive directory walk functionality. Nor do I see any specific way to delete an entire directory tree at once, as with rm -rf.
For what I'm doing, I'm not worried about any complications like permissions, symlinks back up the tree (infinite recursion), or anything that would rule out a very naive implementation... so I am not averse to writing my own function for it.
However, I'm curious if this functionality is out there somewhere in the standard libraries gtk or glib (or in some other easily reused C library) already and I just haven't stumbled on it. Googling this topic generates a lot of false leads.
Otherwise my plan is to use this type of algorithm:
dir_walk(char* path, void* callback(char*)) {
    if (is_dir(path) && has_entries(path)) {
        entries = get_entries(path);
        for (entry in entries) { dir_walk(entry, callback); }
    }
    else { callback(path); }
}
dir_walk("/home/user/trash", remove);
Obviously I would build in some error handling and the like to abort the process as soon as a fatal error is encountered.
Have you looked at <dirent.h>? AFAIK this belongs to the POSIX specification, which should be part of the standard library of most, if not all C compilers. See e.g. this <dirent.h> reference (Single UNIX specification Version 2 by the Open Group).
P.S., before someone comments on this: No, this does not offer recursive directory traversal. But then I think this is best implemented by the developer; requirements can differ quite a lot, so one-size-fits-all recursive traversal function would have to be very powerful. (E.g.: Are symlinks followed up? Should recursion depth be limited? etc.)
You can use GFileEnumerator if you want to do it with glib.
Several platforms include ftw and nftw: "(new) file tree walk". Checking the man page on an imac shows that these are legacy, and new users should prefer fts. Portability may be an issue with either of these choices.
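For platforms that do provide nftw, a recursive delete can be sketched in a few lines. This is only an illustration, assuming a POSIX.1-2008 system; the names remove_entry and remove_tree are hypothetical, and the descriptor limit of 16 is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>

/* Callback for nftw(): delete whatever entry we are handed. */
int remove_entry(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)typeflag;
    (void)ftwbuf;
    return remove(path);   /* non-zero stops the walk */
}

/* Hypothetical helper: delete path and everything below it.
 * FTW_DEPTH visits a directory's contents before the directory itself;
 * FTW_PHYS makes the walk not follow symbolic links. */
int remove_tree(const char *path)
{
    return nftw(path, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

remove_tree returns 0 when the whole tree was removed and a non-zero value as soon as one entry could not be deleted.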
Standard C libraries are meant to provide primitive functionality. What you are talking about is composite behavior. You can easily implement it using the low level features present in your API of choice -- take a look at this tutorial.
Note that the "convenience wrappers" you mention for remove(), unlink() and rmdir(), assuming you mean the ones declared in <glib/gstdio.h>, are not really "convenience wrappers". What is the convenience in prefixing totally standard functions with a "g_"? (And note that I say this even though I am the one who introduced them in the first place.)
The only reason these wrappers exist is for file name issues on Windows, where these wrappers actually consist of real code; they take file name arguments in Unicode, encoded in UTF-8. The corresponding "unwrapped" Microsoft C library functions take file names in the system codepage.
If you aren't specifically writing code intended to be portable to Windows, there is no reason to use the g_remove() etc wrappers.

what is the easiest way to lookup function names of a c binary in a cross-platform manner?

I want to write a small utility to call arbitrary functions from a C shared library. The user should be able to list all the exported functions, similar to what objdump or nm does. I checked the source of these utilities, but it is intimidating. I also couldn't find enough information on Google about whether the dl library has this functionality.
(Clarification edit: I don't want to just call a function which is known beforehand. I will appreciate an example fragment along your answer.)
This might be near to what you're looking for:
http://python.net/crew/theller/ctypes/
Well, I'll speak a little bit about Windows. The C functions exported from DLLs do not contain information about the types, names, or number of arguments -- nor do I believe you can determine what the calling convention is for a given function.
For comparison, take a look at National Instruments' LabVIEW programming environment. You can import functions from DLLs, but you have to manually type in the types and names of the arguments before you use a given function. If this limitation is OK, please edit your question to reflect that.
I don't know what is possible with *nix environments.
EDIT: Regarding your clarification. If you don't know what the function is ahead of time, you're pretty screwed on Windows because in general you won't be able to determine what the number and types of arguments the functions take.
You could try ParaDyn's SymtabAPI. It lets you grab all the symbols in a shared library (or executable) and look at their types, offset, etc. It's all wrapped up in a reasonably nice C++ interface and runs on a lot of platforms. It also provides support for binary rewriting, which you could potentially use to do what you're talking about at runtime.
Webpage is here:
http://www.paradyn.org/html/symtab2.1-features.html
Documentation is here:
http://ftp.cs.wisc.edu/paradyn/releases/release5.2/doc/symtabProgGuide.21.pdf
A standard-ish API is the dlopen/dlsym API; AFAIK it's implemented by GNU libc on Linux and Mac OS X's standard C library (libSystem), and it might be implemented on Windows by MinGW or other compatibility packages.
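As a minimal sketch of the dlopen/dlsym API, assuming a POSIX system (older glibc versions need -ldl at link time): the helper name lookup_symbol is hypothetical. Note that this API resolves a name you already know; it does not enumerate a library's exports, so it only covers the "call" half of the question.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: resolve `name` in `library` and return its address,
 * or NULL if the library or the symbol cannot be found. Passing NULL as
 * `library` searches the running program and the libraries it has loaded. */
void *lookup_symbol(const char *library, const char *name)
{
    void *handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL)
        return NULL;

    /* The handle is deliberately left open so that the returned address
     * stays valid for the caller. */
    return dlsym(handle, name);
}
```

The caller still has to cast the result to the correct function pointer type, for example size_t (*)(const char *) for strlen, because the library carries no type information for its exports.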
The only sensible solution (without reinventing the wheel) seems to be to use libbfd. The downsides are that its documentation is scarce and that it is a bit bloated for my purposes.
The source code for nm and objdump is available. If you want to start from the specification, then ELF is what you want to look into.
/Allan
I've written something like this in Perl. On Win32 it runs dumpbin /exports, on POSIX it runs nm -gP. Then, since it's Perl, the results are interpreted using regular expressions: / _(\S+)#\d+/ for Win32 (stdcall functions) and /^(\S+) T/ for POSIX.
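The POSIX half of that approach can be sketched in C as well. This assumes the portable output format of nm -P, where each line starts with the symbol name followed by a one-letter type; the function name parse_nm_line and the 255-character name limit are illustrative choices, not part of the Perl tool described above.

```c
#include <stdio.h>

/* Hypothetical helper: if `line` (one line of `nm -gP` output) describes a
 * global text symbol (type 'T'), copy its name into `name` and return 1.
 * Returns 0 for any other line or if the name does not fit. */
int parse_nm_line(const char *line, char *name, size_t name_size)
{
    char sym[256];
    char type;

    /* Portable nm format: "<name> <type> <value> <size>". */
    if (sscanf(line, "%255s %c", sym, &type) != 2 || type != 'T')
        return 0;

    int n = snprintf(name, name_size, "%s", sym);
    if (n < 0 || (size_t)n >= name_size)
        return 0;
    return 1;
}
```

Feeding it each line read from popen("nm -gP libfoo.so", "r") (libfoo.so being a placeholder) would yield the list of exported function names.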
Eek! You've touched on one of the most platform-dependent topics in programming. On Windows, you have DLLs; on Linux, you have ld.so and ld-linux.so; and on Mac OS X, you have dyld.
