Is there a way to access debug symbols at run time? - c

Here's some example code to give an idea of what I want.
int regular_function(void)
{
int x,y,z;
/** do some stuff **/
my_api_call();
return x;
}
...
void my_api_call(void)
{
char* caller = get_caller_file();
int line = get_caller_line();
printf("I was called from %s:%d\n", caller, line);
}
Is there a way to implement the get_caller_file() and get_caller_line()? I've seen/used tricks like #defineing my_api_call as a function call passing in the __FILE__ and __LINE__ macros. I was wondering if there was a way to access that information (assuming it's present) at run time instead of compile time? Wouldn't something like Valgrind have to do something like this in order to get the information it returns?

If you have compiled your binary with debug symbols, you can read them at run time using special libraries, such as libdwarf for the DWARF debug format.

This is highly environment-specific. In most Windows and Linux implementations where debug symbols are provided, the tool vendor provides or documents a way of doing that. For a better answer, provide implementation specifics.

Debugging symbols, if available, need to be stored somewhere so the debugger can get at them. They may or may not be stored in the executable file itself.
You may or may not know the executable file name (argv[0] is not required to have the full path of the program name, or indeed have any useful information in it - see here for details).
Even if you could locate the debugging symbols, you would have to decode them to try and figure out where you were called from.
And your code may be optimised to the point where the information is useless.
That's the long answer. The short answer is that you should probably rely on passing in __FILE__ and __LINE__ as you have been. It's far more portable and reliable.
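The macro approach recommended above can be sketched as follows. This is a minimal illustration, not a definitive implementation: my_api_call_at, last_caller_file and last_caller_line are names invented here, and the two static variables exist only so the recorded call site can be inspected.

```c
#include <stdio.h>

/* Hypothetical bookkeeping, only so the effect is easy to inspect. */
static const char *last_caller_file;
static int last_caller_line;

/* Hypothetical implementation function: receives the call site explicitly. */
void my_api_call_at(const char *file, int line)
{
    last_caller_file = file;
    last_caller_line = line;
    printf("I was called from %s:%d\n", file, line);
}

/* The macro is expanded at each call site, so __FILE__ and __LINE__
   describe the caller rather than my_api_call_at itself. */
#define my_api_call() my_api_call_at(__FILE__, __LINE__)

int regular_function(void)
{
    int x = 0;
    my_api_call();   /* reports this file and this line */
    return x;
}
```

Because the information is captured at compile time, this works regardless of whether debug symbols are present or how heavily the code is optimised.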

Related

Issue preventing GCC from optimizing out global variable

I am using ARM-GCC v4.9 (released 2015-06-23) for a STM32F105RC processor.
I've searched stackoverflow.com and I've found this in order to try to convince gcc not to optimize out a global variable, as you may see below:
static const char AppVersion[] __attribute__((used)) = "v3.05/10.oct.2015";
Yet, to my real surprise, the compiler optimized away the AppVersion variable!
BTW: I am using the optimize level -O0 (default).
I also tried using volatile keyword (as suggested on other thread), but it didn't work either :(
I already tried (void)AppVersion; but it doesn't work...
Smart compiler!? Too smart I suppose...
In the meantime, I use a printf(AppVersion); some place in my code, just to be able to keep the version... But this is a boorish solution :(
So, the question is: Is there any other trick that does the job, i.e. keep the version from being optimized away by GCC?
[EDIT]:
I also tried like this (i.e. without static):
const char AppVersion[] __attribute__((used)) = "v3.05/10.oct.2015";
... and it didn't work either :(
Unfortunately I am not aware of a pragma to do this.
There is however another solution. Change AppVersion to:
static char * AppVersion = "v3.05/10.oct.2015";
and add:
__asm__ ("" : : "" (AppVersion));
to your main function.
You see I dropped the 'used' attribute, according to the documentation this is a function attribute.
Other solutions: Does gcc have any options to add version info in ELF binary file?
Though I found this one to be the easiest. This basically won't let the compiler and linker remove AppVersion since we told it that this piece of inline assembly uses it, even though we don't actually insert any inline assembly.
Hopefully that will be satisfactory to you.
Author: Andre Simoes Dias Vieira
Original link: https://answers.launchpad.net/gcc-arm-embedded/+question/280104
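The inline-assembly trick from the answer above can be sketched like this. It is a GCC/Clang-specific sketch under assumptions: keep_app_version is a made-up helper name, and the "r" input constraint is a substitution chosen here for illustration (the answer itself writes an empty constraint string).

```c
/* Version string from the question. */
static const char *AppVersion = "v3.05/10.oct.2015";

/* Hypothetical helper: call it once, e.g. from main().
   The asm statement emits no instructions, but because AppVersion is
   listed as an input operand the compiler must treat it as used. */
void keep_app_version(void)
{
    __asm__ volatile ("" : : "r" (AppVersion));
}
```

Whether the string survives the final link can still depend on linker options, so it is worth checking the resulting binary (for example with the strings utility).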
Given the presence of "static", all your declaration does is ask the compiler to include the bytes representing characters of the string "v3.05/10.oct.2015" in
some order at some arbitrary location within the file, but not bother to tell
anyone where it put them. Given that the compiler could legitimately write
that sequence of bytes somewhere in the code image file whether or not it
appeared anywhere in the code such a declaration really isn't very useful. To
be sure, it would be unlikely that such a sequence would appear in the code
entirely by chance, and so scanning the binary image for it might be a somewhat
reliable way to determine that it appeared in the code, but in general it's
much better to have some means of affirmatively determining where the string
may be found.
If the string isn't declared static, then the compiler is required to tell the
linker where it is. Since the linker generally outputs the names and
addresses of all symbols in a variety of places including symbol tables,
debug-information files, etc. which may be used in a variety of ways that the
linker knows nothing about, it may be able to tell that a symbol isn't used
within the code, but can generally have no clue about whether some other
utility may be expecting to find it in the symbol table and make use of it. A directive saying the symbol is "used" will tell the linker that even though it doesn't know of anything that's interested in that symbol, something out in the larger universe the linker knows nothing about is interested in it.
It's typical for each compilation unit to give a blob of information to the
linker and say "Here's some stuff; I need a symbol for the start of it, but
I can compute all the addresses of all the internals from that". The linker
has no way of knowing which parts of such a blob are actually used, so it
has no choice but to accept the whole thing verbatim. If the compiler were
to include unused static declarations in its blob, they'd make it through
to the output file. On the other hand, the compiler knows that if it doesn't
export a symbol for something within that blob, nobody else downstream would
be able to find it whether or not the object was included; thus, there would
typically be little benefit to being able to include such a blob, and compiler writers generally have no reason to provide a feature to force such inclusion.
It seems that using a custom section also works.
Instead of
__attribute__((used))
try with
__attribute__((section(".your.section.name.here")))
The linker won't touch it, nor will the strip command.

Is it possible to get the signature of a function in a shared library programmatically?

The title is clear: we can load a library with dlopen etc.
But how can I get the signature of functions in it?
This question cannot be answered in general. Technically, if you compiled your executable with exhaustive debugging information (the code may still be an optimized, release version), then the executable will contain extra sections providing some kind of reflectivity of the binary. On *nix systems (you referred to dlopen) this is implemented through DWARF debugging data in extra sections of the ELF binary. It works similarly for Mach Universal Binaries on Mac OS X.
Windows PEs, however, use a completely different format, so unfortunately DWARF is not truly cross-platform (actually, in the early development stages of my 3D engine I implemented an ELF/DWARF loader for Windows so that I could use a common format for the engine's various modules, so with some serious effort it can be done).
If you don't want to go into implementing your own loaders, or debugging information accessors, then you may embed the reflection information through some extra symbols exported (by some standard naming scheme) which refer to a table of function names, mapping to their signature. In the case of C source files writing a parser to extract the information from the source file itself is rather trivial. C++ OTOH is so notoriously difficult to parse correctly, that you need some fully fledged compiler to get it right. For this purpose GCCXML was developed, technically a GCC that emits the AST in XML form instead of an object binary. The emitted XML then is much easier to parse.
From the extracted information, create a source file with some kind of linked list/array/etc. structure describing each function. If you don't directly export each function's symbol but instead initialize a field in the reflection structure with the function pointer, you get a really nice and clean annotated exporting scheme. Technically you could place this information in a separate section of the binary, but putting it in the read-only data section does the job as well.
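A reflection table of the kind described above might look like the following sketch. All names here (func_info, reflection_table, find_function, and the two sample functions) are invented for illustration; in a real shared library the table would be the exported symbol, looked up under whatever naming scheme you agree on.

```c
#include <stddef.h>
#include <string.h>

/* One entry per function: name, human-readable signature, address. */
struct func_info {
    const char *name;
    const char *signature;
    void (*address)(void);   /* generic function pointer; cast back before calling */
};

/* Sample functions; they are not exported themselves. */
static int add(int a, int b) { return a + b; }
static double half(double x) { return x / 2.0; }

/* The table is const, so it ends up in read-only data. */
const struct func_info reflection_table[] = {
    { "add",  "int add(int, int)",   (void (*)(void))add  },
    { "half", "double half(double)", (void (*)(void))half },
    { NULL, NULL, NULL }              /* sentinel marks the end */
};

/* Linear search by name; returns NULL if the function is unknown. */
const struct func_info *find_function(const char *name)
{
    const struct func_info *f;

    for (f = reflection_table; f->name != NULL; ++f)
        if (strcmp(f->name, name) == 0)
            return f;
    return NULL;
}
```

A caller reads the signature string to decide which function pointer type to cast address back to before calling it. The signature is only text, so the compiler cannot check that the cast matches.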
However, if you're given a 3rd party binary – say, worst case scenario, it has been compiled from C source, with no debugging information and all symbols not externally referenced stripped – you're pretty much screwed. The best you could do is apply some binary analysis of the way the function accesses the various places in which parameters can be passed.
This will only tell you the number of parameters and the size of each parameter value, but not the type or name/meaning. When reverse engineering some program (e.g. malware analysis or security audit), identifying the type and meaning of the parameters passed to functions is one of the major efforts. Recently I came across a driver I had to reverse for debugging purposes, and you cannot believe how astounded I was to find C++ symbols in a Linux kernel module (you can't use C++ in the Linux kernel in a sane way), but also relieved, because the C++ name mangling provided me with plenty of information.
On Linux (or Mac) you can use a combination of "nm" and "c++filt" (for C++ libraries)
nm mylibrary.so | c++filt
or
nm mylibrary.a | c++filt
"nm" will give you the mangled form and "c++filt" attempts to put them in a more human-readable format. You might want to use some options in nm to filter down the results, especially if the library is large (or you can "grep" the final output to find a particular item)
No, this is not possible. The signature of a function doesn't mean anything at runtime; it's a piece of information useful at compile time for the compiler to validate your program.
You can't. Either the library publishes a public API in a header, or you need to know the signature by some other means.
The parameters of a function at the lower level depend on how many stack arguments in the stack frame you consider and how you interpret them. Therefore, once the function is compiled into object code, it is not possible to get the signature like that. One remote possibility is to disassemble the code and read how the function works to learn the number of parameters, but the types would still be difficult or impossible to determine. In a word, it is not possible.
This information is not available. Not even the debugger knows:
$ cat foo.c
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
char foo[10] = { 0 };
char bar[10] = { 0 };
printf("%s\n", "foo");
memcpy(bar, foo, sizeof(foo));
return 0;
}
$ gcc -g -o foo foo.c
$ gdb foo
Reading symbols from foo...done.
(gdb) b main
Breakpoint 1 at 0x4005f3: file foo.c, line 5.
(gdb) r
Starting program: foo
Breakpoint 1, main (argc=1, argv=0x7fffffffe3e8) at foo.c:5
5 {
(gdb) ptype printf
type = int ()
(gdb) ptype memcpy
type = int ()
(gdb)

Change library load order at run time (like LD_PRELOAD but during execution)

How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void* mylib = dlopen("/path/to/library.so",RTLD_NOW);
printf = dlsym(mylib,"printf");
AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.
There is no clean solution but it is possible. I see two options:
Overwrite printf function prolog with jump to your replacement function.
It is quite a popular solution for function hooking in MS Windows. You can find examples of function hooking by code rewriting on Google.
Rewrite ELF relocation/linkage tables.
See this article on codeproject that does almost exactly what you are asking, but only in the scope of dlopen()'ed modules. In your case you also want to edit your main (typically non-PIC) module. I didn't try it, but maybe it's as simple as calling the provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails you'll have to read source of your dynamic linker to figure out what needs to be adapted.
It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
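The #undef/#define approach described above can be sketched as follows. This is only an illustration: my_printf_calls and greet are names made up here, and the call counter exists purely to make the redirection observable.

```c
#include <stdarg.h>
#include <stdio.h>   /* all system headers come first */

static int my_printf_calls;   /* counts calls, only to make the effect visible */

/* Replacement function. It is defined before the macro below and calls
   vprintf, so the real C library output routine is still used. */
static int my_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    ++my_printf_calls;
    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/* From here on, the name printf in this file means my_printf. */
#undef printf
#define printf my_printf

int greet(void)
{
    return printf("hello %d\n", 42);   /* expands to my_printf(...) */
}
```

Only code compiled after the #define sees the replacement; libraries that call printf internally are unaffected, which is the point of doing it this way.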
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.
You can't change that. In the general *NIX linking concept (or rather lack of concept), a symbol is picked from the first object where it is found. (Except for oddball AIX, which works more like OS/2 by default.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT); see man dlsym for more. It gets out of hand quite quickly, though, which is why it is rarely used.
There is an environment variable, LD_LIBRARY_PATH, that the linker searches for shared libraries. Prepend your path to LD_LIBRARY_PATH; I hope that would work.
Store the dlsym() result in a lookup table (array, hash table, etc). Then #undef printf and #define printf to use your lookup table version.

Using Sparse to check C code

Does anyone have experience with Sparse? I seem unable to find any documentation, so the warnings, and errors it produces are unclear to me. I tried checking the mailing list and man page but there really isn't much in either.
For instance, I use INT_MAX in one of my files. This generates an error (undefined identifier) even though I #include limits.h.
Is there any place where the errors and warnings have been explained?
Sparse isn't intended to be a lint, per se. Sparse is intended to produce a parse tree of arbitrary code so that it can be further analyzed.
In your example, you probably want to define _GNU_SOURCE (which I believe turns on __GNUC__), which exposes the bits you need in limits.h.
I would avoid defining __GNUC__ on its own, as several things it activates might behave in an undefined way without all of the other switches that _GNU_SOURCE turns on being defined.
My point isn't to help you squash error by error; it's to reiterate that sparse is mostly used as a library, not as a stand-alone static analysis tool.
From my copy of the README (not sure if I have the current version) :
This means that a user of the library will literally just need to do
struct string_list *filelist = NULL;
char *file;
action(sparse_initialize(argc, argv, filelist));
FOR_EACH_PTR_NOTAG(filelist, file) {
action(sparse(file));
} END_FOR_EACH_PTR_NOTAG(file);
and he is now done - having a full C parse of the file he opened. The
library doesn't need any more setup, and once done does not impose any
more requirements. The user is free to do whatever he wants with the
parse tree that got built up, and needs not worry about the library ever
again. There is no extra state, there are no parser callbacks, there is
only the parse tree that is described by the header files. The action
function takes a pointer to a symbol_list and does whatever it likes with it.
The library also contains (as an example user) a few clients that do the
preprocessing, parsing and type evaluation and just print out the
results. These clients were done to verify and debug the library, and
also as trivial examples of what you can do with the parse tree once it
is formed, so that users can see how the tree is organized.
The included clients are more 'functional test suites and examples' than anything. It's a very useful tool, but you might consider another usage angle if you want to employ it. I like it because it doesn't use *lex / bison, which makes it remarkably easier to hack.
If you look at limits.h you'll see that INT_MAX is defined inside this #if
/* If we are not using GNU CC we have to define all the symbols ourself.
Otherwise use gcc's definitions (see below). */
#if !defined __GNUC__ || __GNUC__ < 2
so to get it to work you should undefine __GNUC__ before including limits.h

Adding a pass to gcc?

Has anybody added a pass to gcc ? or not really a pass but adding an option to do some nasty things... :-) ...
I still have the same problem about calling a function just before returning from another...so I would like to investigate it by implementing something in gcc...
Cheers.
EDIT: Adding a pass to a compiler means revisiting the tree to perform some optimizations or some analysis. I would like to emulate the behavior of __cyg_profile_func_exit but only for some functions and be able to access the original return value.
So I'm going to try to improve my question. I would like to emulate really basic AOSD-like behavior. AOSD, or aspect-oriented programming, makes it possible to add crosscutting concerns (debugging is a crosscutting concern).
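#include <stdio.h>
int foo(int arg_num); /* declared before its use in main */
int main(int argc, char ** argv) {
return foo(argc);
}
int foo(int arg_num) {
int result = arg_num > 3 ? arg_num : 42;
return result;
}
int dbg(int returned) {
printf("Return %d", returned);
return returned; /* pass the value through so the caller still receives it */
}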
I would like to be able to say, I'd like to trigger the dbg function after function foo has been executed. The problem is how to tell the compiler to modify the control flow and execute dbg. dbg should be executed between return and foo(argc) ...
That's really like __cyg_profile_func_exit but only in some cases (and the problem with __cyg_profile_func_exit is that you cannot easily see and modify the returned value).
If you still are interested in adding a GCC pass, you can start reading up GCC Wiki material just about that:
http://gcc.gnu.org/wiki/WritingANewPass and "Implementing Passes" from http://www.airs.com/dnovillo/200711-GCC-Internals/ on how to, well, add a pass.
The intermediate representation you are interested in is called GIMPLE. Some introduction is at http://www.airs.com/dnovillo/200711-GCC-Internals/200711-GCC-Internals-3-IR.pdf
Other information at http://gcc.gnu.org/wiki/GettingStarted
Just for future reference: Upcoming versions of gcc (4.4.0+) will provide support for plugins specifically meant for use cases such as adding optimization passes to the compiler without having to bootstrap the whole compiler.
May 6, 2009:GCC can now be extended using a generic plugin framework on host platforms that support dynamically loadable objects.
(see gcc.gnu.org)
To answer your question: gcc is a pretty popular compiler platform to do compiler research on, so yes, I'm sure someone has done it.
However, hooking into gcc's code generation is not something you'd do over a weekend. (I'm not sure what your scope is and how much time you're willing to invest.) If you really do want to hack gcc to do what you want, you most certainly want to start by discussing it on one of the gcc mailing lists.
Tips: don't assume that people have read your other questions. If you want to refer to a question, please add a link to it if you want people to find it.
Do you need to use GCC? LLVM looks like it would work. It is written in C++, and it is very easy to write a pass.
It's an interesting question. I'm going to address concepts around the question rather than answer the question directly because, well, I don't know that much about gcc internals.
You've probably already explored some higher-level manipulation of the source code to achieve what you want to accomplish; some kind of
int main(int argc, char ** argv) {
return dbg(foo(argc));
}
inserted with a macro on the function "foo", perhaps. If you're looking for a compiler hack, though, then you probably don't want to modify source.
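One way such a macro could look is sketched below, reusing foo and dbg from the question. The helper call_foo is a name invented here, and dbg is assumed to return the value it was given so that it could also modify it.

```c
#include <stdio.h>

int foo(int arg_num)
{
    int result = arg_num > 3 ? arg_num : 42;
    return result;
}

/* Sees the value foo returned and may replace it; here it passes it through. */
int dbg(int returned)
{
    printf("Return %d\n", returned);
    return returned;
}

/* A macro name is not expanded again inside its own replacement,
   so the inner foo refers to the real function defined above. */
#define foo(arg) dbg(foo(arg))

int call_foo(int n)
{
    return foo(n);   /* expands to dbg(foo(n)) */
}
```

The macro must be defined after the definition of foo and only affects callers compiled after it, so it does require touching the source or a shared header.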
There are some gcc extensions discussed here that sound a bit like what you're going for. If gcc has anything that does what you want, it'll probably be documented in the C-language extensions area of the documentation. I couldn't find anything that sounded exactly like what you've described, but perhaps since you understand best what you're looking for, you'll know better how to find it.
A gdb script would do a pretty good job of outputting debug, but it sounds like you've got bigger plans than simply doing printf's. Inserting significant logic into the code seems to be what you're after.
Which reminds me of some dynamic linker tricks I've come across recently. Library interposing could insert code around function calls without affecting the original source. The example I've encountered was on Solaris, but there is probably an analog on other platforms.
Just came across the -finstrument-functions option documented here
-finstrument-functions
Generate instrumentation calls for entry and exit to functions. Just after function
entry and just before function exit, the following profiling functions will be called
with the address of the current function and its call site. (On some platforms,
__builtin_return_address does not work beyond the current function, so the call site
information may not be available to the profiling functions otherwise.)
void __cyg_profile_func_enter (void *this_fn,
void *call_site);
void __cyg_profile_func_exit (void *this_fn,
void *call_site);
But I guess this doesn't work because you are not able to modify the return value from the profiling functions.
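For reference, a pair of hooks for -finstrument-functions could be sketched as below. This assumes GCC (or a compatible compiler) and that the program is built with -finstrument-functions; without that flag the hooks are never called automatically. The call_depth counter and the output format are choices made here for illustration.

```c
#include <stdio.h>

static int call_depth;   /* current nesting level, maintained by the hooks */

/* The hooks themselves must not be instrumented, or they would recurse. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    ++call_depth;
    fprintf(stderr, "%*senter %p (called from %p)\n",
            2 * call_depth, "", this_fn, call_site);
}

void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    fprintf(stderr, "%*sexit  %p (called from %p)\n",
            2 * call_depth, "", this_fn, call_site);
    --call_depth;
}
```

As the answer notes, the hooks only receive addresses, so they can trace calls but cannot see or change the instrumented function's return value.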
GCC, the GNU Compiler Collection, is a large suite, and I don't think hacking up its source code is your answer for finding problems in a single application.
It sounds like you are looking more for debugging or profiling tools, such as gdb and its various front-ends (xgdb, ddd) and gprof. Memory/bounds checking tools like electric fence, glibc's memcheck, valgrind, and mudflap might help if this is a memory or pointer issue. Enabling compiler flags for warnings and newer C standards might be useful: -std=c99 -Wall -pedantic.
I cannot understand what you mean by
I still have the same problem about
calling a function just before
returning from another.
So I am not certain what you are looking for. Can you give a trivial or pseudo-code example?
I.e.
#include <stdio.h>
void b(void); /* forward declaration so a() can call b() */
void a(void) {
b();
}
void b(void) {
printf("Hello World\n");
}
int main(int ac, char *av[]) {
a();
return 0;
}
