I've been working with a large codebase written primarily by programmers who no longer work at the company. One of the programmers apparently had a special place in his heart for very long macros. The only benefit I can see to using macros is being able to write functions that don't need to be passed in all their parameters (which is recommended against in a best practices guide I've read). Other than that I see no benefit over an inline function.
Some of the macros are so complicated I have a hard time imagining someone even writing them. I tried creating one in that spirit and it was a nightmare. Debugging is extremely difficult, as it takes N+ lines of code into 1 in the a debugger (e.g. there was a segfault somewhere in this large block of code. Good luck!). I had to actually pull the macro out and run it un-macro-tized to debug it. The only way I could see the person having written these is by automatically generating them out of code written in a function after he had debugged it (or by being smarter than me and writing it perfectly the first time, which is always possible I guess).
Am I missing something? Am I crazy? Are there debugging tricks I'm not aware of? Please fill me in. I would really like to hear from the macro-lovers in the audience. :)
To me the best use of macros is to compress code and reduce errors. The downside is obviously in debugging, so they have to be used with care.
I tend to think that if the resulting code isn't an order of magnitude smaller and less prone to errors (meaning the macros take care of some bookkeeping details) then it wasn't worth it.
In C++, many uses like this can be replaced with templates, but not all. A simple example of Macros that are useful are in the event handler macros of MFC -- without them, creating event tables would be much harder to get right and the code you'd have to write (and read) would be much more complex.
If the macros are extremely long, they probably make the code short but efficient. In effect, he might have used macros to explicitly inline code or remove decision points from the run-time code path.
It might be important to understand that, in the past, such optimizations weren't done by many compilers, and some things that we take for granted today, like fast function calls, weren't valid then.
To me, macros are evil. With their so many side effects, and the fact that in C++ you can gain same perf gains with inline, they are not worth the risk.
For ex. see this short macro:
#define max(a, b) ((a)>(b)?(a):(b))
then try this call:
max(i++, j++)
More. Say you have
#define PLANETS 8
#define SOCCER_MIDDLE_RIGHT 8
if an error is thrown, it will refer to '8', but not either of its meaninful representations.
I only know of two reasons for doing what you describe.
First is to force functions to be inlined. This is pretty much pointless, since the inline keyword usually does the same thing, and function inlining is often a premature micro-optimization anyway.
Second is to simulate nested functions in C or C++. This is related to your "writing functions that don't need to be passed in all their parameters" but can actually be quite a bit more powerful than that. Walter Bright gives examples of where nested functions can be useful.
There are other reasons to use of macros, such as using preprocessor-specific functionality (like including __FILE__ and __LINE__ in autogenerated error messages) or reducing boilerplate code in ways that functions and templates can't (the Boost.Preprocessor library excels here; see Boost.ScopeExit or this sample enum code for examples), but these reasons don't seem to apply for doing what you describe.
Very long macros will have performance drawbacks, like increased compiled binary size, and there are certainly other reasons for not using them.
For the most problematic macros, I would consider running the code through the preprocessor, and replacing the macro output with function calls (inline if possible) or straight LOC. If the macros exists for compatibility with other architectures/OS's, you might be stuck though.
Part of the benefit is code replication without the eventual maintenance cost - that is, instead of copying code elsewhere you create a macro from it and only have to edit it once...
Of course, you could also just make a method to be called but that is sort of more work... I'm against much macro use myself, just trying to present a potential rationale.
There are a number of good reasons to write macros in C.
Some of the most important are for creating configuration tables using x-macros, for making function like macros that can accept multiple parameter types as inputs and converting tables from human readable/configurable/understandable values into computer used values.
I cant really see a reason for people to write very long macros, except for the historic automatic function inline.
I would say that when debugging complex macros, (when writing X macros etc) I tend to preprocess the source file and substitute the preprocessed file for the original.
This allows you to see the C code generated, and gives you real lines to work with in the debugger.
I don't use macros at all. Inline functions serve every useful purpose a macro can do. Macro allow you to do very weird and counterintuitive things like splitting up identifiers (How does someone search for the identifier then?).
I have also worked on a product where a legacy programmer (who thankfully is long gone) also had a special love affair with Macros. His 'custom' scripting language is the height of sloppiness. This was compounded by the fact that he wrote his C++ classes in C, meaning all class functions and variables were all public. Anyways, he wrote almost everything in macro's and variadic functions (Another hideous monstrosity foisted on the world). So instead of writing a proper template class he would use a Macro instead! He also resorted to macro's to create factory classes as well, instead of normal code... His code is pretty much unmaintanable.
From what I have seen, macro's can be used when they are small and are used declaratively and don't contain moving parts like loops, and other program flow expressions. It's OK if the macro is one or at the most two lines long and it declares and instance of something. Something that won't break during runtime. Also macro's should not contain class definitions, or function definitions. If the macro contains code that needs to be stepped into using a debugger than the macro should be removed and replace with something else.
They can also be useful for wrapping custom tracing/debugging functionality. For instance you want custom tracing in debug builds but not release builds.
Anyways when you are working in legacy code like that, just be sure to remove a bit of the macro mess a bit at a time. If you keep it up, with enough time eventually you will remove them all and make life a bit easier for yourself. I have done this in the past, with especially messy macro's. What I do is turn on the compiler switch to have the preprocessor generate an output file. Then I raid that file, and copy the code, re-indent it, and replace the macro with the generated code. Thank goodness for that compiler feature.
Some of the legacy code I've worked with used macros very extensively in the place of methods. The reasoning was that the computer/OS/runtime had an extremely small stack, so that stack overflows were a common problem. Using macros instead of methods meant that there were fewer methods on the stack.
Luckily, most of that code was obsolete, so it is (mostly) gone now.
C89 did not have inline functions. If using a compiler with extensions disabled (which is a desirable thing to do for several reasons), then the macro might be the only option.
Although C99 came out in 1999, there was resistance to it for a long time; commercial compiler vendors didn't feel it was worth their time to implement C99. Some (e.g. MS) still haven't. So for many companies it was not a viable practical decision to use C99 conforming mode, even up to today in the case of some compilers.
I have used C89 compilers that did have an extension for inline functions, but the extension was buggy (e.g. multiple definition errors when there should not be), things like that may dissuade a programmer from using inline functions.
Another thing is that the macro version effectively forces that the function will actually be inlined. The C99 inline keyword is only a compiler hint and the compiler may still decide to generate a single instance of the function code which is linked like a non-inline function. (One compiler that I still use will do this if the function is not trivial and returning void).
Related
Recently I got to view an embedded code in that they are using
#define print() printf("hello world")
instead of
void print() { printf("hello world"); }
My question what is the gain on using #define instead of creating a function?
It may be related to performance.
A function call has some overhead (i.e. calling, saving things on the stack, returning, etc) while a macro is a direct substitution of the macro name with it's contents (i.e. no overhead).
In this example the functions foo and bar does exactly the same. foo uses a macro while bar uses a function call.
As you can see bar and printY together requires more instructions than foo
.
So by using a macro the performance got a little better.
But... there are downsides to this approach:
Macros are hard to debug as you can't single step a macro
Extensive use of a macro increases the size of the binary (compared to using function call). Something that can impact performance in a negative direction.
Also notice that modern compilers (with optimization on) are really good at figuring out when it's a good idea to automatically inline a function (i.e. your code is written with a function call but the compiler decides to inline the function as if it was a macro). So you might get the same performance using function call.
Further, you can use the inline key word as a hint to the compiler that you think it will be good to inline a function. But even with that keyword the compiler may decide not to inline. The only way to make sure that the code gets inline, is by using a macro.
There is no advantage. Using #define like this is quite ancient C programming style.
In the year 1999, the C language got the inline keyword to make all such macros obsolete. And with modern compilers, inline is often superfluous too, since the compiler is nowadays better than the programmer when it comes to determining when to inline.
Some of the embedded compilers out can still be rather bad at such optimizations though, and that's why embedded C code tends to lag behind in modernization.
In general, doing micro-optimizations like this is called "pre-mature optimizations", meaning the programmer is meddling with optimizations that they should leave to the compiler. Even in hard real time systems. Optimizations should only be the last resort when you have 1) detected an actual bottleneck, and 2) disassembled to see if manual inlining actually does anything good for performance.
Sometimes you want to stub out functionality at compile time. Macros give you an easy way to do this.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
As I dive deeper into the C programming language I am having trouble understanding why macros should be used as a last resort. For instance, this post. This is not the first time I've heard chatter that they are a last resort. Some suggest that the memory footprint is more excessive than calling a function (this post). Albeit I understand these arguments, as well as why they should not be used for C++ (compiler optimizations and the like), I do not understand the following:
Since macros 'unroll' (if you will) in the .text segment of the stack there is less overhead associated with macros as opposed to function calls - e.g. memory need not be allocated between the frame pointer and stack pointer. This memory overhead is quantifiable, where as this post suggest that macros are not.
Much of the work I do is in embedded systems, micro-controllers, and systems programming. I have read many books and articles by Bjarne Stroustrup. I am also in the process of reading Clean Code - where Robert Martin persists that readability is king (not implying macros increase readability in their own right).
TLDR:
Considering macros reduce the overhead associated with stack frames and (if used appropriately) can increase readability - Why the negative stigma? They are littered through BSD papers and man pages.
Some will vote to close because this could be construed as an opinion question, but there are some facts.
If you have a decent compiler, calls to simple functions are expanded inline just like macros, but they're considerably more type safe. I've seen cases where inlined functions were faster because the compiler missed common sub-expressions in the the compiled macro that were eliminated when function arguments were inlined.
When you call a simple function multiple times, you can give the compiler hints on whether to expand it inline each time or call it -- trading code space for tiny stack and runtime overhead you mention. Better yet, lots of compilers (like gcc) will make this call automatically based on heuristics that may be better than your intuition. With a macro, you can only modify the macro to call a function. Code changes are more error prone than build hints.
In most compilation systems, error messages and debuggers don't reference the body code of macros. OTOH, modern debuggers will step correctly line-by-line through the body of a function even though it's actually been expanded inline. Similarly, compilers will correctly point to error locations in function bodies.
It's extremely easy to code subtle bugs by expanding multiple arg references: #define MAX(X,Y) (X>Y?X:Y) followed by MAX(++x, 3). Here x is incremented once if it's less than 3, twice otherwise. Ouch.
Multi-level macro expansion (where a macro call produces other macro calls) relies on complex rules and is therefore error-prone. E.g. it's not hard to create macros that depend on specific behavior of one preprocessor so that they fail on another.
Functions can be recursive. Macros can't.
C11 features provide ways to do things where macros have been (ab)used in the past: aggregate literals for example. Again, the advantages are type safety, error, and debugger references.
The upshot is that when you use a macro, you're adding error risk and maintenance cost that don't exist with inlined functions. It's hard to escape the conclusion that you should use macros only as a fallback. The main defensible reason to do otherwise is that you're forced to use an old or junky compiler.
No typesafety and have side effects of pure textual replacement. These are main things for which we should avoid.
Macros can do a lot - quite a lot - but as is well-known it's easy to accidentally build macros that do the wrong thing, either by mishandling side-effects, accidentally evaluating arguments multiple times, or messing up operator precedence. I think that the right evaluation to make would be to evaluate the upsides of the benefits of macros versus the potential risks or drawbacks, then make the call from there.
For example, let's take function-like macros. As you mention, they're often used to make code faster by eliminating short function calls. But nowadays, you can achieve the same thing either by using the new inline keyword adopted from C++ or just by cranking up compiler optimization settings, since compilers these days are dramatically better at optimization than they were many years back. If you're using those macros because you want to perform operations like taking a min or max that have basically identical code but different types, use the ‘_Generic` keyword. These other options are less error-prone and easier to debug and test, so it's probably worth avoiding the risk of a macro error by using them.
Then there's defining constants with#define. This fell out of favor in C++ in favor of constants, and that's also now possible in C using static const variables at global scope. You can use macros to do this, but it's more type-safe to use the other option instead. Most compilers are smart enough to inline the constants and do optimizations on the values in ways that previously only macros would guarantee.
For these more routine operations, the benefits of using macros aren't as high as they used to be because new language features and vastly smarter compilers have provided less-risky alternatives. That's the main reason why in general the advice is to avoid using macros - they're just not the best tool for the job.
This doesn't mean to never use macros. There are many times and places where they're fantastic. X Macros are a really neat way to automatically generate code, and macro substitution is super helpful for taking advantage of compiler-specific or OS-specific features while maintaining portability. I don't foresee those uses going away any time soon. But do consider the alternatives in other cases, since in many instances they were specifically invented to address weaknesses of the macro preprocessing system!
A good optimizing compiler should give efficient code for calls to inline functions, as efficient as if you use macros (think of getc as an example)
But you may want to use macros when they are not replacable by inline functions. (Here is an example).
I wonder why other languages do not support this feature. What I can understand that C / C++ code is platform dependent so to make it work (compile and execute) across various platform, is achieved by using preprocessor directives. And there are many other uses of this apart from this. Like you can put all your debug printf's inside #if DEBUG ... #endif. So while making the release build these lines of code do not get compiled in the binary. But in other languages, achieving this thing (later part) is difficult (or may be impossible, I'm not sure). All code will get compiled in the binary increasing its size. So my question is "why do Java, or other modern compiled languages no support this kind of feature?" which allows you to include or exclude some piece of code from the binary in a much handy way.
The major languages that don't have a preprocessor usually have a different, often cleaner, way to achieve the same effects.
Having a text-preprocessor like cpp is a mixed blessing. Since cpp doesn't actually know C, all it does is transform text into other text. This causes many maintenance problems. Take C++ for example, where many uses of the preprocessor have been explicitly deprecated in favor of better features like:
For constants, const instead of #define
For small functions, inline instead of #define macros
The C++ FAQ calls macros evil and gives multiple reasons to avoid using them.
The portability benefits of the preprocessor are far outweighed by the possibilities for abuse. Here are some examples from real codes I have seen in industry:
A function body becomes so tangled with #ifdef that it is very hard to read the function and figure out what is going on. Remember that the preprocessor works with text not syntax, so you can do things that are wildly ungrammatical
Code can become duplicated in different branches of an #ifdef, making it hard to maintain a single point of truth about what's going on.
When an application is intended for multiple platforms, it becomes very hard to compile all the code as opposed to whatever code happens to be selected for the developer's platform. You may need to have multiple machines set up. (It is expensive, say, on a BSD system to set up a cross-compilation environment that accurately simulates GNU headers.) In the days when most varieties of Unix were proprietary and vendors had to support them all, this problem was very serious. Today when so many versions of Unix are free, it's less of a problem, although it's still quite challenging to duplicate native Windows headers in a Unix environment.
It Some code is protected by so many #ifdefs that you can't figure out what combination of -D options is needed to select the code. The problem is NP-hard, so the best known solutions require trying exponentially many different combinations of definitions. This is of course impractical, so the real consequence is that gradually your system fills with code that hasn't been compiled. This problem kills refactoring, and of course such code is completely immune to your unit tests and your regression tests—unless you set up a huge, multiplatform testing farm, and maybe not even then.
In the field, I have seen this problem lead to situations where a refactored application is carefully tested and shipped, only to receive immediate bug reports that the application won't even compile on other platforms. If code is hidden by #ifdef and we can't select it, we have no guarantee that it typechecks—or even that it is syntactically correct.
The flip side of the coin is that more advanced languages and programming techniques have reduced the need for conditional compilation in the preprocessor:
For some languages, like Java, all the platform-dependent code is in the implementation of the JVM and in the associated libraries. People have gone to huge lengths to make JVMs and libraries that are platform-independent.
In many languages, such as Haskell, Lua, Python, Ruby, and many more, the designers have gone to some trouble to reduce the amount of platform-dependent code compared to C.
In a modern language, you can put platform-dependent code in a separate compilation unit behind a compiled interface. Many modern compilers have good facilities for inlining functions across interface boundaries, so that you don't pay much (or any) penalty for this kind of abstraction. This wasn't the case for C because (a) there are no separately compiled interfaces; the separate-compilation model assumes #include and the preprocessor; and (b) C compilers came of age on machines with 64K of code space and 64K of data space; a compiler sophisticated enough to inline across module boundaries was almost unthinkable. Today such compilers are routine. Some advanced compilers inline and specialize methods dynamically.
Summary: by using linguistic mechanisms, rather than textual replacement, to isolate platform-dependent code, you expose all your code to the compiler, everything gets type-checked at least, and you have a chance of doing things like static analysis to ensure suitable test coverage. You also rule out a whole bunch of coding practices that lead to unreadable code.
Because modern compilers are smart enough to remove dead code in most any case, making manually feeding the compiler this way no longer necessary. I.e. instead of :
#include <iostream>
#define DEBUG
int main()
{
#ifdef DEBUG
std::cout << "Debugging...";
#else
std::cout << "Not debugging.";
#endif
}
you can do:
#include <iostream>
const bool debugging = true;
int main()
{
if (debugging)
{
std::cout << "Debugging...";
}
else
{
std::cout << "Not debugging.";
}
}
and you'll probably get the same, or at least similar, code output.
Edit/Note: In C and C++, I'd absolutely never do this -- I'd use the preprocessor, if nothing else that it makes it instantly clear to the reader of my code that a chunk of it isn't supposed to be complied under certain conditions. I am saying, however, that this is why many languages eschew the preprocessor.
A better question to ask is why did C resort to using a pre-processor to implement these sorts of meta-programming tasks? It isn't a feature as much as it is a compromise to the technology of the time.
The pre-processor directives in C were developed at a time when machine resources (CPU speed, RAM) were scarce (and expensive). The pre-processor provided a way to implement these features on slow machines with limited memory. For example, the first machine I ever owned had 56KB of RAM and a 2Mhz CPU. It still had a full K&R C compiler available, which pushed the system's resources to the limit, but was workable.
More modern languages take advantage of today's more powerful machines to provide better ways of handling the sorts of meta-programming tasks that the pre-processor used to deal with.
Other languages do support this feature, by using a generic preprocessor such as m4.
Do we really want every language to have its own text-substitution-before-execution implementation?
The C pre-processor can be run on any text file, it need not be C.
Of course, if run on another language, it might tokenize in weird ways, but for simple block structures like #ifdef DEBUG, you can put that in any language, run the C pre-processor on it, then run your language specific compiler on it, and it will work.
Note that macros/preprocessing/conditionals/etc are usually considered a compiler/interpreter feature, as opposed to a language feature, because they are usually completely independent of the formal language definition, and might vary from compiler to compiler implementation for the same language.
A situation in many languages where conditional compilation directives can be better than if-then-else runtime code is when compile-time statements (such as variable declarations) need to be conditional. For example
$if debug
array x
$endif
...
$if debug
dump x
$endif
only declares/allocates/compiles x when needing x, whereas
array x
boolean debug
...
if debug then dump x
probably has to declare x regardless of whether debug is true.
Many modern languages actually have syntactic metaprogramming capabilities that go way beyond CPP. Pretty much all modern Lisps (Arc, Clojure, Common Lisp, Scheme, newLISP, Qi, PLOT, MISC, ...) for example have extremely powerful (Turing-complete, actually) macro systems, so why should they limit themselves to the crappy CPP style macros which aren't even real macros, just text snippets?
Other languages with powerful syntactic metaprogramming include Io, Ioke, Perl 6, OMeta, Converge.
Because decreasing the size of the binary:
Can be done in other ways (compare the average size of a C++ executable to a C# executable, for example).
Is not that important, when it you weigh it against being able to write programs that actually work.
Other languages also have better dynamic binding. For example, we have some code that we cannot ship to some customers for export reasons. Our "C" libraries use #ifdef statements and elaborate Makefile tricks (which is pretty much the same).
The Java code uses plugins (ala Eclipse), so that we just don't ship that code.
You can do the same thing in C through the use of shared libraries... but the preprocessor is a lot simpler.
A other point nobody else mentioned is platform support.
Most modern languages can not run on the same platforms as C or C++ can and are not intended to run on this platforms. For example, Java, Python and also native compiled languages like C# need a heap, they are designed to run on a OS with memory management, libraries and large amount of space, they do not run in a freestanding environment. There you can use other ways to archive the same. C can be used to program controllers with 2KiB ROM, there you need a preprocessor for most applications.
This is a more theoretical question about macros (I think). I know macros take source code and produce object code without evaluating it, enabling programmers to create more versatile syntactic structures. If I had to classify these two macro systems, I'd say there was the "C style" macro and the "Lisp style" macro.
It seems that debugging macros can be a bit tricky because at runtime, the code that is actually running differs from the source.
How does the debugger keep track of the execution of the program in terms of the preprocessed source code? Is there a special "debug mode" that must be set to capture extra data about the macro?
In C, I can understand that you'd set a compile time switch for debugging, but how would an interpreted language, such as some forms of Lisp, do it?
Apologize for not trying this out, but the lisp toolchain requires more time than I have to spend to figure out.
I don't think there's a fundamental difference in "C style" and "Lisp style" macros in how they're compiled. Both transform the source before the compiler-proper sees it. The big difference is that C's macros use the C preprocessor (a weaker secondary language that's mostly for simple string substitution), while Lisp's macros are written in Lisp itself (and hence can do anything at all).
(As an aside: I haven't seen a non-compiled Lisp in a while ... certainly not since the turn of the century. But if anything, being interpreted would seem to make the macro debugging problem easier, not harder, since you have more information around.)
I agree with Michael: I haven't seen a debugger for C that handles macros at all. Code that uses macros gets transformed before anything happens. The "debug" mode for compiling C code generally just means it stores functions, types, variables, filenames, and such -- I don't think any of them store information about macros.
For debugging programs that use
macros, Lisp is pretty much the same
as C here: your debugger sees the
compiled code, not the macro
application. Typically macros are
kept simple, and debugged
independently before use, to avoid
the need for this, just like C.
For debugging the macros
themselves, before you go and use it somewhere, Lisp does have features
that make this easier than in C,
e.g., the repl and
macroexpand-1 (though in C
there is obviously a way to
macroexpand an entire file, fully, at
once). You can see the
before-and-after of a macroexpansion,
right in your editor, when you write
it.
I can't remember any time I ran across a situation where debugging into a macro definition itself would have been useful. Either it's a bug in the macro definition, in which case macroexpand-1 isolates the problem immediately, or it's a bug below that, in which case the normal debugging facilities work fine and I don't care that a macroexpansion occurred between two frames of my call stack.
In LispWorks developers can use the Stepper tool.
LispWorks provides a stepper, where one can step through the full macro expansion process.
You should really look into the kind of support that Racket has for debugging code with macros. This support has two aspects, as Ken mentions. On one hand there is the issue of debugging macros: in Common Lisp the best way to do that is to just expand macro forms manually. With CPP the situation is similar but more primitive -- you'd run the code through only the CPP expansion and inspect the result. However, both of these are insufficient for more involved macros, and this was the motivation for having a macro debugger in Racket -- it shows you the syntax expansion steps one by one, with additional gui-based indications for things like bound identifiers etc.
On the side of using macros, Racket has always been more advanced than other Scheme and Lisp implementations. The idea is that each expression (as a syntactic object) is the code plus additional data that contains its source location. This way when a form is a macro, the expanded code that has parts coming from the macro will have the correct source location -- from the definition of the macro rather than from its use (where the forms are not really present). Some Scheme and Lisp implementations will implement a limited for of this using the identity of subforms, as dmitry-vk mentioned.
I don't know about lisp macros (which I suspect are probably quite different than C macros) or debugging, but many - probably most - C/C++ debuggers do not handle source-level debugging of C preprocessor macros particularly well.
Generally, C/C++ debuggers they don't 'step' into the macro definition. If a macro expands into multiple statements, then the debugger will usually just stay on the same source line (where the macro is invoked) for each debugger 'step' operation.
This can make debugging macros a little more painful than they might otherwise be - yet another reason to avoid them in C/C++. If a macro is misbehaving in a truly mysterious way, I'll drop into assembly mode to debug it or expand the macro (either manually or using the compiler's switch). It's pretty rare that you have to go to that extreme; if you're writing macros that are that complicated, you're probably taking the wrong approach.
Usually in C source-level debugging has line granularity ("next" command) or instruction-level granularity ("step into"). Macro processors insert special directives into processed source that allow compiler to map compiled sequences of CPU instructions to source code lines.
In Lisp there exists no convention between macros and compiler to track source code to compiled code mapping, so it is not always possible to do single-stepping in source code.
Obvious option is to do single stepping in macroexpanded code. Compiler already sees final, expanded, version of code and can track source code to machine code mapping.
Other option is to use the fact that lisp expressions during manipulation have identity. If the macro is simple and just does destructuring and pasting code into template then some expressions of expanded code will be identical (with respect to EQ comparison) to expressions that were read from source code. In this case compiler can map some expressions from expanded code to source code.
The simple answer is that it is complicated ;-) There are several different things that contribute to being able to debug a program, and even more for tracking macros.
In C and C++, the preprocessor is used to expand macros and includes into actual source code. The originating filenames and line numbers are tracked in this expanded source file using #line directives.
http://msdn.microsoft.com/en-us/library/b5w2czay(VS.80).aspx
When a C or C++ program is compiled with debugging enabled, the assembler generates additional information in the object file that tracks source lines, symbol names, type descriptors, etc.
http://sources.redhat.com/gdb/onlinedocs/stabs.html
The operating system has features that make it possible for a debugger to attach to a process and control the process execution; pausing, single stepping, etc.
When a debugger is attached to the program, it translates the process stack and program counter back into symbolic form by looking up the meaning of program addresses in the debugging information.
Dynamic languages typically execute in a virtual machine, whether it is an interpreter or a bytecode VM. It is the VM that provides hooks to allow a debugger to control program flow and inspect program state.
I'm working on a refactoring tool for C with preprocessor support...
I don't know the kind of refactoring involved in large C projects and I would like to know what people actually do when refactoring C code (and preprocessor directives)
I'd like to know also if some features that would be really interesting are not present in any tool and so the refactoring has to be done completely manually... I've seen for instance that Xref could not refactor macros that are used as iterators (don't know exactly what that means though)...
thanks
Anybody interested in this (specific to C), might want to take a look at the coccinelle tool:
Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.
Huge topic!
The stuff I need to clean up is contorted nests of #ifdefs. A refactoring tool would understand when conditional stuff appears in argument lists (function declaration or definitions), and improve that.
If it was really good, it would recognize that
#if defined(SysA) || defined(SysB) || ... || defined(SysJ)
was really equivalent to:
#if !defined(SysK) && !defined(SysL)
If you managed that, I'd be amazed.
It would allow me to specify 'this macro is now defined - which code is visible' (meaning, visible to the compiler); it would also allow me to choose to see the code that is invisible.
It would handle a system spread across over 100 top-level directories, with varying levels of sub-directories under those. It would handle tens of thousands of files, with lengths of 20K lines in places.
It would identify where macro definitions come from makefiles instead of header files (aargh!).
Well, since it is part of the preprocessor... #include refactoring is a huge huge topic and I'm not aware of any tools that do it really well.
Trivial problems a tool could tackle:
Enforcing consistent case and backslash usage in #includes
Enforce a consistent header guarding convention, automatically add redundant external guards, etc.
Harder problems a tool could tackle:
Finding and removing spurious includes.
Suggest the use of predeclarations wherever practical.
For macros... perhaps some sort of scoping would be interesting, where if you #define a macro inside a block, the tool would automatically #undef it at the end of a block. Other quick things I can think of:
A quick analysis on macro safety could be helpful as a lot of people still don't know to use do { } while (0) and other techniques.
Alternately, find and flag spots where expressions with side-effects are passed as macro arguments. This could possibly be really helpful for things like... asserts with unintentional side-effects.
Macros can often get quite complex, so I wouldn't try supporting much more than simple renaming.
I will tell you honestly that there are no good tools for refactoring C++ like there are for Java. Most of it will be painful search and replace, but this depends on the actual task. Look at Netbeans and Eclipse C++ plugins.
I've seen for instance that Xref could
not refactor macros that are used as
iterators (don't know exactly what
that means though)
To be honest, you might be in over your head - consider if you are the right person for this task.
If you can handle reliable renaming of various types, variables and macros over a big project with an arbitrarily complex directory hierarchy, I want to use your product.
Just discovered this old question, but I wanted to mention that I've rescued the free version of Xrefactory for C, now named c-xrefactory, which manages to do some refactorings in macros such as rename macro, rename macro parameter. It is an Emacs plugin.