Adding a pass to gcc? - c

Has anybody added a pass to gcc? Or, not really a pass, but an option to do some nasty things... :-)
I still have the same problem about calling a function just before returning from another, so I would like to investigate it by implementing something in gcc.
Cheers.
EDIT: Adding a pass to a compiler means revisiting the tree to perform some optimizations or some analysis. I would like to emulate the behavior of __cyg_profile_func_exit but only for some functions and be able to access the original return value.
So I'm going to try to improve my question. I would like to emulate really basic AOSD-like behavior. AOSD, or aspect-oriented programming, makes it possible to add crosscutting concerns (debugging is a crosscutting concern).
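#include <stdio.h>

int foo(int arg_num);

int main(int argc, char ** argv) {
    return foo(argc);
}
int foo(int arg_num) {
    int result = arg_num > 3 ? arg_num : 42;
    return result;
}
int dbg(int returned) {
    printf("Return %d\n", returned);
    return returned; /* pass the value through so the caller still sees it */
}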
I would like to be able to say: trigger the dbg function after function foo has been executed. The problem is how to tell the compiler to modify the control flow and execute dbg. dbg should be executed between return and foo(argc).
That's really like __cyg_profile_func_exit, but only in some cases (and the problem with __cyg_profile_func_exit is that you cannot easily see and modify the returned value).

If you still are interested in adding a GCC pass, you can start reading up GCC Wiki material just about that:
http://gcc.gnu.org/wiki/WritingANewPass and "Implementing Passes" from http://www.airs.com/dnovillo/200711-GCC-Internals/ on how to, well, add a pass.
The intermediate representation you are interested in is called GIMPLE. Some introduction is at http://www.airs.com/dnovillo/200711-GCC-Internals/200711-GCC-Internals-3-IR.pdf
Other information at http://gcc.gnu.org/wiki/GettingStarted

Just for future reference: Upcoming versions of gcc (4.4.0+) will provide support for plugins specifically meant for use cases such as adding optimization passes to the compiler without having to bootstrap the whole compiler.
May 6, 2009: GCC can now be extended using a generic plugin framework on host platforms that support dynamically loadable objects.
(see gcc.gnu.org)

To answer your question: gcc is a pretty popular compiler platform to do compiler research on, so yes, I'm sure someone has done it.
However, I don't think this is something done in a weekend: hooking into gcc's code generation is a substantial job. (I'm not sure what your scope is and how much time you're willing to invest.) If you really do want to hack gcc to do what you want, you most certainly want to start by discussing it on one of the gcc mailing lists.
Tip: don't assume that people have read your other questions. If you refer to another question, please link to it so people can find it.

Do you need to use GCC? LLVM looks like it would work. It is written in C++, and it is very easy to write a pass.

It's an interesting question. I'm going to address concepts around the question rather than answer the question directly because, well, I don't know that much about gcc internals.
You've probably already explored some higher-level manipulation of the source code to achieve what you want to accomplish; some kind of
int main(int argc, char ** argv) {
return dbg(foo(argc));
}
inserted with a macro on the function "foo", perhaps. If you're looking for a compiler hack, though, then you probably don't want to modify source.
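A minimal sketch of that macro idea, assuming the macro can be defined after foo itself (the helper name call_foo is made up for illustration). A function-like macro is not expanded again inside its own expansion, so foo(x) can be redefined as dbg(foo(x)), and dbg can pass the value through or change it:

```c
#include <stdio.h>

/* The original function, unchanged. */
int foo(int arg_num)
{
    int result = arg_num > 3 ? arg_num : 42;
    return result;
}

/* Sees the value foo returned; whatever dbg returns is what the caller gets. */
int dbg(int returned)
{
    fprintf(stderr, "Return %d\n", returned);
    return returned;
}

/* From here on, every call written as foo(x) becomes dbg(foo(x)).
   The inner foo is not expanded again, so this does not recurse. */
#define foo(x) dbg(foo(x))

/* Hypothetical caller, standing in for main() in the question. */
int call_foo(int argc)
{
    return foo(argc);
}
```

This only affects call sites compiled after the #define, which is also its limit: calls made through a function pointer or from already-compiled code are not wrapped.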
There are some gcc extensions discussed here that sound a bit like what you're going for. If gcc has anything that does what you want, it'll probably be documented in the C-language extensions area of the documentation. I couldn't find anything that sounded exactly like what you've described, but perhaps since you understand best what you're looking for, you'll know better how to find it.
A gdb script would do a pretty good job of outputting debug, but it sounds like you've got bigger plans than simply doing printf's. Inserting significant logic into the code seems to be what you're after.
Which reminds me of some dynamic linker tricks I've come across recently. Library interposing could insert code around function calls without affecting the original source. The example I've encountered was on Solaris, but there is probably an analog on other platforms.
Just came across the -finstrument-functions option documented here
-finstrument-functions
Generate instrumentation calls for entry and exit to functions. Just after function
entry and just before function exit, the following profiling functions will be called
with the address of the current function and its call site. (On some platforms,
__builtin_return_address does not work beyond the current function, so the call site
information may not be available to the profiling functions otherwise.)
void __cyg_profile_func_enter (void *this_fn,
void *call_site);
void __cyg_profile_func_exit (void *this_fn,
void *call_site);
But I guess this doesn't work because you are not able to modify the return value from the profiling functions.

GCC, the GNU Compiler Collection, is a large suite, and I don't think hacking its source code is the answer to finding problems in a single application.
It sounds like you are looking more for debugging or profiling tools, such as gdb and its various front-ends (xgdb, ddd) and gprof. Memory and bounds-checking tools like electric fence, glibc's memcheck, valgrind, and mudflap might help if this is a memory or pointer issue. Enabling compiler flags for warnings and newer C standards might also be useful: -std=c99 -Wall -pedantic.
I cannot understand what you mean by
I still have the same problem about
calling a function just before
returning from another.
So I am not certain what you are looking for. Can you give a trivial or pseudo-code example?
I.e.
#include <stdio.h>
void b(void); /* declared before use in a() */
void a(void) {
    b();
}
void b(void) {
    printf("Hello World\n");
}
int main(int ac, char *av[]) {
    a();
    return 0;
}

Related

How can I plant assembly instructions in the prologue and epilogue of function via gcc

I am trying to build a profiler for a C project.
I want gcc to plant some assembly instructions at every function entry and exit point at compile time.
I searched the web for guides, but without success.
Where can I learn how to do that?
Thanks in advance.
Apparently you can use the -finstrument-functions flag to get gcc to generate instrumentation calls
void __cyg_profile_func_enter(void *func, void *callsite);
void __cyg_profile_func_exit(void *func, void *callsite);
at function entry and exit. I've never used this, but a quick search brings up information and examples here, here, here and here.
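A minimal sketch of what such hooks might look like, assuming the program is built with gcc -finstrument-functions. The call_depth counter and the output format are made up for illustration. The hooks are marked no_instrument_function so that they are not instrumented in turn:

```c
#include <stdio.h>

/* Hypothetical nesting counter, used only to indent the trace. */
static int call_depth = 0;

/* Called by the instrumented code just after a function is entered. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    fprintf(stderr, "%*senter %p (from %p)\n",
            call_depth * 2, "", this_fn, call_site);
    call_depth++;
}

/* Called by the instrumented code just before a function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    call_depth--;
    fprintf(stderr, "%*sexit  %p (from %p)\n",
            call_depth * 2, "", this_fn, call_site);
}
```

The hooks only receive addresses; turning them into function names is a separate step, for example with a tool such as addr2line on a binary built with debug information.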
Unless you want to modify gcc (which is non-trivial!), I would think that there are two fairly obvious approaches.
Pre-process the C code itself - it's not easy, but not terribly hard either. Find the beginning and end of a function, and add your code to it, then let the compiler proper do the job of making the code... There are quite a few tools on the market that do this in one way or another, for a variety of purposes [code flow analysis, profiling, etc].
Take the assembler output of gcc and process it to add code to functions there. This is in some ways easier, and in some ways harder. Identifying functions is probably no more difficult, but "not breaking" the assembler code may be harder unless your inserted assembler code is completely "safe".
Obviously, the option of modifying gcc is also a possibility, but the compiler code is fairly complex, and unless you basically take all the existing hooks for gprof, I don't think it's a school project - unless you are on your way to a PhD or some such.

Avoiding gcc function prologue overhead?

I've lately encountered a lot of functions where gcc generates really bad code on x86. They all fit a pattern of:
if (some_condition) {
/* do something really simple and return */
} else {
/* something complex that needs lots of registers */
}
Think of the simple case as something so small that half or more of the work is spent pushing and popping registers that won't be modified at all. If I were writing the asm by hand, I would save and restore the saved-across-calls registers inside the complex case, and avoid touching the stack pointer at all in the simple case.
Is there any way to get gcc to be a little bit smarter and do this itself? Preferably with command line options rather than ugly hacks in the source...
Edit: To make it concrete, here's something very close to some of the functions I'm dealing with:
if (buf->pos < buf->end) {
return *buf->pos++;
} else {
/* fill buffer */
}
and another one:
if (!initialized) {
/* complex initialization procedure */
}
return &initialized_object;
and another:
if (mutex->type == SIMPLE) {
return atomic_swap(&mutex->lock, 1);
} else {
/* deal with ownership, etc. */
}
Edit 2: I should have mentioned to begin with: these functions cannot be inlined. They have external linkage and they're library code. Allowing them to be inlined in the application would result in all kinds of problems.
Update
To explicitly suppress inlining for a single function in gcc, use:
__attribute__ ((noinline)) void foo()
{
    ...
}
See also How can I tell gcc not to inline a function?
Functions like this will regularly be inlined automatically unless compiled -O0 (disable optimization).
In C++ you can hint the compiler using the inline keyword
If the compiler won't take your hint you are probably using too many registers/branches inside the function. The situation is almost certainly resolved by extracting the 'complicated' block into its own function.
Update: I noticed you added the fact that they are extern symbols. (Please update the question with that crucial info.) Well, in a sense, with external functions, all bets are off. I cannot really believe that gcc will by definition inline all of a complex function into a tiny caller simply because it is only called from there. Perhaps you can give some sample code that demonstrates the behaviour and we can find the proper optimization flags to remedy that?
Also, is this C or C++? In C++ I know it is commonplace to include the trivial decision functions inline (mostly as members defined in the class declaration). This won't give a linkage conflict like with simple (extern) C functions.
Also you can have template functions defined that will inline perfectly in all compilation modules without resulting in link conflicts.
I hope you are using C++ because it will give you a ton of options here.
I would do it like this:
static void complex_function() {}
void foo()
{
if(simple_case) {
// do whatever
return;
} else {
complex_function();
}
}
The compiler may insist on inlining complex_function(), in which case you can use the noinline attribute on it.
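Applied to the buffer example from the question, that split might look like the sketch below. The struct layout, the refills counter and the stubbed-out refill are assumptions for illustration. Whether gcc then really keeps the fast path free of register saves depends on the version and flags, so check the generated assembly:

```c
#include <stddef.h>

/* Hypothetical buffer type, modelled on the buf->pos / buf->end snippet. */
struct buf {
    const unsigned char *pos;
    const unsigned char *end;
    int refills;               /* counts slow-path calls, for illustration */
};

/* Slow path, kept out of line so its register pressure stays out of buf_get.
   A real version would refill the buffer; this stub just reports "no data". */
__attribute__((noinline))
static int buf_refill_and_get(struct buf *b)
{
    b->refills++;
    return -1;
}

/* Fast path: one compare, one load, one increment. The slow path is a tail
   call, so nothing has to be kept live across it. */
int buf_get(struct buf *b)
{
    if (b->pos < b->end)
        return *b->pos++;
    return buf_refill_and_get(b);
}
```

Writing the slow path as the last thing the function does gives the compiler the chance to turn it into a plain jump, which is what lets the simple case avoid a stack frame.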
Perhaps upgrade your version of gcc? 4.6 has just been released. As far as I understand, it supports "partial inlining". That is, an easily integrated outer part of a function is inlined and the expensive part is transformed into a call. But I have to admit that I haven't tried it myself yet.
Edit: The statement I was referring to from the ChangeLog:
Partial inlining is now supported and
enabled by default at -O2 and greater.
The feature can be controlled via
-fpartial-inlining.
Partial inlining splits functions with
short hot path to return. This allows
more aggressive inlining of the hot
path leading to better performance and
often to code size reductions (because
cold parts of functions are not
duplicated).
...
Inlining when optimizing for size
(either in cold regions of a program
or when compiling with -Os) was
improved to better handle C++ programs
with larger abstraction penalty,
leading to smaller and faster code.
I would probably refactor the code to encourage inlining of the simple case. That said, you can use -finline-limit to make gcc consider inlining larger functions, or -fomit-frame-pointer -fno-exceptions to minimize the stack frame. (Note that the latter may break debugging and cause C++ exceptions to misbehave badly.)
Probably you won't be able to get much from tweaking compiler options, though, and will have to refactor.
Seeing as these are external calls, it might be possible that gcc is treating them as unsafe and preserving registers for the function call (hard to know without seeing the registers that it preserves, including the ones you say 'aren't used'). Out of curiosity, does this excessive register spilling still occur with all optimizations disabled?

Is there a way to access debug symbols at run time?

Here's some example code to give an idea of what I want.
int regular_function(void)
{
int x,y,z;
/** do some stuff **/
my_api_call();
return x;
}
...
void my_api_call(void)
{
char* caller = get_caller_file();
int line = get_caller_line();
printf("I was called from %s:%d\n", caller, line);
}
Is there a way to implement the get_caller_file() and get_caller_line()? I've seen/used tricks like #defineing my_api_call as a function call passing in the __FILE__ and __LINE__ macros. I was wondering if there was a way to access that information (assuming it's present) at run time instead of compile time? Wouldn't something like Valgrind have to do something like this in order to get the information it returns?
If you have compiled your binary with debug symbols, you may access it using special libraries, like libdwarf for DWARF debug format.
This is highly environment-specific. In most Windows and Linux implementations where debug symbols are provided, the tool vendor provides or documents a way of doing that. For a better answer, provide implementation specifics.
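As one concrete case, and assuming a glibc-based system: glibc's <execinfo.h> exposes the return addresses of the current call chain at run time. That is not yet a file and line; turning an address into file:line still requires the debug information (for example through addr2line or a DWARF library). The function name print_callers is made up for this sketch:

```c
#include <execinfo.h>   /* glibc-specific: backtrace(), backtrace_symbols_fd() */
#include <unistd.h>     /* STDERR_FILENO */

/* Writes up to 16 frames of the current call chain to stderr and
   returns how many frames were captured. */
int print_callers(void)
{
    void *frames[16];
    int n = backtrace(frames, 16);

    /* Prints one line per frame without allocating memory. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Function names typically only show up in the output if the binary exports its symbols, for instance when it is linked with -rdynamic; otherwise only raw addresses are printed.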
Debugging symbols, if available, need to be stored somewhere so the debugger can get at them. They may or may not be stored in the executable file itself.
You may or may not know the executable file name (argv[0] is not required to have the full path of the program name, or indeed have any useful information in it - see here for details).
Even if you could locate the debugging symbols, you would have to decode them to try and figure out where you were called from.
And your code may be optimised to the point where the information is useless.
That's the long answer. The short answer is that you should probably rely on passing in __FILE__ and __LINE__ as you have been. It's far more portable and reliable.
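For reference, the __FILE__/__LINE__ approach can be sketched as follows. The name my_api_call_at and the two last_caller_* variables are made up for illustration; the macro keeps the call sites looking exactly like the ones in the question:

```c
#include <stdio.h>

/* Hypothetical record of the most recent call site, kept so it can be inspected. */
static const char *last_caller_file = NULL;
static int last_caller_line = 0;

/* The real implementation takes the call site as explicit arguments. */
void my_api_call_at(const char *file, int line)
{
    last_caller_file = file;
    last_caller_line = line;
    printf("I was called from %s:%d\n", file, line);
}

/* Callers keep writing my_api_call(); the macro fills in their location. */
#define my_api_call() my_api_call_at(__FILE__, __LINE__)

int regular_function(void)
{
    int x = 0;
    /** do some stuff **/
    my_api_call();
    return x;
}
```

The location is fixed at compile time, so it survives optimization and stripped binaries, at the cost of every caller having to be recompiled against the macro.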

Does Klocwork detect never called functions?

my code is a mix up of different bits and pieces from older code.
I would like to erase all never used functions in order to keep the code simple.
Is Klocwork the tool? How do I do it?
Thanks,
Moshe.
You could use the -p or -pg options to gcc to cause code to be added to the prologue and epilogue of every function so that a profile database is written when the program executes. The tool prof is used to analyze the output from -p and gprof for -pg. These tools produce reports showing what functions were used, how many calls, and how much time was spent in each. Unused functions will be missing from the profile database.
You could also use gcov to get a report of what lines of code were actually executed. Functions never called will be executed 0 times....
Klocwork will find unused function/methods. There is a special checker pack you can download on my.klocwork.com (if you have an account) that will give you these special checkers.
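I am not familiar with Klocwork, but gcc has the warning option -Wunused-function, which detects unused static functions (functions with external linkage are not reported, since another translation unit might call them). -Wunused-function is part of -Wall.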
Klocwork doesn't detect uncalled functions. It's used for static analysis only.
You can check it like this:
#include <stdlib.h>

void foo(void)
{
    char *a;
    a = malloc(100); /* never freed, and foo itself is never called */
}
void bar(void)
{
    char a[100];
}
int main(void)
{
    bar();
}
This would probably report a leak in function foo, which is actually uncalled. However, as schot suggested, you can look into compiler options.

debugging c programs

Programming, in a sense, is easy. Bugs, though, always cause trouble. Can anyone recommend good debugging tricks and software for C?
From "The Elements of Programming Style" Brian Kernighan, 2nd edition, chapter 2:
Everyone knows that debugging is twice
as hard as writing a program in the
first place. So if you're as clever as
you can be when you write it, how will
you ever debug it?
So from that; don't be "too clever"!
But apart from that and the answers already given; use a debugger! That is your starting point tool-wise. You'd be amazed how many programmers struggle along without the aid of a debugger, and they are fools to do so.
But before you even get to the debugger, get your compiler to help you as much as possible; set the warning level to high, and set warnings as errors. A static analysis tool such as lint, pclint, or QA-C would be even better.
Tools for debugging are all well and good and for some classes of error they will just point you straight to the problem. The best tip that I have for debugging is that you need to think about it in the right way. What works for me is the following:
The compiler probably isn't broken. I've been working with C for 25 years now and in all that time it's almost invariably something I'm doing wrong.
Read the error messages. Often I've looked back at the error message and in hindsight realized it was telling me exactly what was wrong.
Read the documentation. Make sure you aren't making assumptions about the language or library that aren't true.
Make a mental model of the problem. I ask myself what needs to be happening in my code in order for the results I'm seeing to occur. Then add debug statements, assertions or just step through in the debugger (if you can) to see what is really happening.
Talk the problem through with someone else. Just describing it to a third party often results in a revelation about what might be happening.
Other people will have other ways of approaching debugging, but I find that if you have a structured approach to it, rather than flailing around changing stuff at random, you usually get there. And when you do, be prepared for the inevitable "Why didn't I see that straight away!"
Best debugger for C
gdb
Best tools for memory leak checking:
Valgrind
The following are popular debugging tools.
Valgrind
Purify
Duma
Some very simple Tricks/Suggestions
-> Always check that nowhere in your code you have dereferenced a wild/dangling pointer
Example 1)
int main()
{
    int *p;
    *p=10; //Undefined Behaviour (crash on most implementations)
}
Example 2)
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int *p=malloc(sizeof(int));
    //do something with p
    free(p);
    printf("%d", *p); //Undefined Behaviour (use after free; may crash or print garbage)
}
-> Always initialize variables before using
#include <stdio.h>
int main()
{
    int k;
    for(int i= k;i<10;++i) /* Ouch: k is read before it is initialized */
        printf("%d",i);
}
In addition to all the other suggestions (gdb, valgrind, all that), some simple rules when writing the code help a lot when debugging afterwards.
Always use types with the proper semantics. Unsigned types (best size_t) for array indices and numbers that represent a cardinal, ptrdiff_t for pointer differences, off_t for file offsets etc. enum types for tags and case distinctions. There is almost no need for the builtin types int, long, char or whatever. Avoid them whenever possible. In particular don't use char for arithmetic, the signedness problems with that are a plague. Use uint8_t or int8_t if you feel the need for such a thing.
Always initialize variables, all of them: integer, double, pointers, struct. It is not true that this is less efficient with a modern compiler. In most cases it will just be optimized away when not necessary. But especially pointer variables that are not properly initialized can produce spurious errors and make code hard to debug. If you have them initialized to NULL your program will fail early, and your debugger will show you the place.
Compile with all warnings on, and don't finish tidying your code until the compiler doesn't give a single warning. They are quite good at that nowadays, take advantage.
Compile with different optimization options on, or even better with different versions of your compiler, or still better with completely different compilers on different platforms.
Use the assert macro. This forces you to think of your assumptions and also make your code fail early if they are not fulfilled.
Unit testing. Makes getting your software correct a lot easier.
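A small sketch of the typing and assert advice from the list above, using a made-up function sum_prefix: the counts are size_t, and the assumptions about the arguments are written down as assertions so that a violation fails at the call rather than somewhere downstream:

```c
#include <assert.h>
#include <stddef.h>

/* Sums the first n elements of values, which holds count elements. */
int sum_prefix(const int *values, size_t count, size_t n)
{
    /* Assumptions made explicit; compiled out when NDEBUG is defined. */
    assert(values != NULL);
    assert(n <= count);

    int total = 0;   /* initialized at the point of declaration */
    for (size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}
```

Defining NDEBUG for release builds removes the checks, so they should guard programmer errors only, not conditions that can legitimately occur at run time.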
gdb is a debugger to analyse your program.
Another technique is to use printf or logs
Valgrind provides dynamic analysis of the executable
Purify provides static and dynamic analysis. Sparrow and Prevent are some other tools in competition to Purify.
This can be separated into:
Prevention measures:
Use strict coding styles, don't make a mess
Use comments and code revisions
Use static code analysis tools
Use assertions where it's possible
Don't over complicate
Post-factum
Use debugger/tracer
Use memory checking tools
Use regression testing
Use your brain
Off the top of my head, Valgrind.
You might also want to hone your debugging skills by reading the book Debugging by David Agans. Every programmer should read this early on in their career.
Use valgrind for memory problems if you're on Linux, and gdb/ddd as well. On Windows, a lot of programmers don't seem to know windbg. It is very useful but has a learning curve like gdb, and it is more powerful than the debugger built into Visual Studio. Learn to use assert: you will catch lots of stuff, and you can turn it off in release code if you so choose. Use a unit testing framework like Check, CUnit, etc. Always initialize your pointers, to NULL if nothing else. When you free a pointer, set it to NULL. Better for you to catch a segfault than your user. Pick a coding standard and stick to it; consistency will help you make fewer mistakes. Keep your functions small if at all possible; this will keep you from having braces nested 10 levels deep, which are logic nightmares. If compiling with gcc, use -Wall and -Wextra. Use the strn* functions instead of the str* functions; they are well worth the extra thinking they force you to do.
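The "set it to NULL after free" habit can be wrapped in a small macro. FREE_AND_NULL and demo_release are made-up names for this sketch, not standard ones:

```c
#include <stdlib.h>

/* Frees the pointer and resets it, so that a later use typically faults at
   once instead of silently touching freed memory. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Returns 1 if the pointer was left NULL after release, 0 otherwise. */
int demo_release(void)
{
    int *p = NULL;              /* initialized even before it is assigned */

    p = malloc(sizeof *p);
    if (p == NULL)
        return 0;

    *p = 10;
    FREE_AND_NULL(p);           /* p is NULL from here on */
    return p == NULL;
}
```

This only resets the one variable passed to the macro; other copies of the same pointer still dangle, which is where a tool like valgrind remains useful.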
