How to find compiler errors in C macros? - c

We have a code base that relies on lots of generated code generated by C macros.
If something goes wrong and there is a error or a warning, the compiler points at the line of the first macro expansion without telling more about where it went wrong inside the expanded code. I my particular case they are those /analyze warnings in Visual Studio.
Are there any tricks and tips that help find problems in complex preprocessor macros?
EDIT:
If you wonder why this code base have complex macros.
This is an emulator project where the decoding phase and execution phase is separated. For example instead of finding out during the execution of each instruction what addressing mode or operand size, etc is used, we generate a function for each combination with a DEFINE_INSTRUCTION macro which in turn generate the functions for all combinations. And chain these functions.

idea: dont ;) don't use macros that are complicated as you loose a lot of IDE support / compiler support
=> if you have such macros, refactor them into functions... maybe even inline functions
but seriously. to help you with the bad macros you're stuck with: As TripeHound said, there are flags to 'compile' C files only to the stage of preprocessed C files --
On the command line, clang -E foo.m will show you the preprocessed output.

Related

Is it possible to see the macros of a compiled C program?

I am trying to learn C and I have this C file that I want view the macros of. Is there a tool to view the macros of the compiled C file.
No. That's literally impossible.
The preprocessor is a textual replacement that happens before the main compile pass. There is no difference between using a macro and putting the code the macro expands to in its place.*
*Ignoring the debugger output. But even then you can do it if you know the right #pragma to tell it the file and line number.
They're always defined in the header file(s) that you've imported with #include, or that those files in turn #include.
This may involve a lot of digging. It may involve going into files that make no sense to you because they're not written for casual inspection.
Any macros of any importance are usually documented. They may use other more complex implementation-specific macros that you shouldn't concern yourself with ordinarily, but if you're curious how they work the source is all there.
That being said, this is only relevant if you have the source and more specifically a complete build environment. Once compiled all these definitions, like the source itself, do not appear in the executable and cannot be inferred directly from the executable, especially not a release build.
Unlike Java or C#, C compiles directly to machine code so there's no way to easily reverse that back to the source. There are "decompilers" that try, but they can only really guess as to the original source. VM-based languages like Java and C# only lightly compile the code, sot here are a lot of hints as to how that code was generated and reversing it is an easier process.

Extending the C preprocessor to inject code

I am working on a project where I need to inject code to C (or C++) files given some smart comments in the source. The code injected is provided by an external file. Does anyone know of any such attempts and can point me to examples - of course I need to preserve original line numbers with #line. My thinking is to replace the cpp with a script which first does this and then calls the system cpp.
Any suggestions will be appreciated
Thanks
Danny
Providing your modified cpp external program won't usually work, at least in recent GCC where the preprocessing is internal to the compiler (so is part of cc1 or cc1plus). Hence, there is no more any cpp program involved in most GCC compilations (but libcpp is an internal library of GCC).
If using mostly GCC, I would suggest to inject code with you own #pragmas (not comments!). You could add your own GCC plugin, or code your own MELT extension, for that purpose (since GCC plugins can add pragmas and builtins but cannot currently affect preprocessing).
As Ira Baxter commented, you could simply put some weird macro invocations and define these macros in separate files.
I don't exactly guess what precise kind of code injection you want.
Alternatively, you could generate your C or C++ code with your own generator (which could emit #line directives) and feed that to gcc

Preprocessor-like substitution into a parser

I am making a parser currently which aims to be able to input data in a program.
The syntax used is greatly inspired from C.
I would enjoy to reproduct a kind of preprocessor inline substitution into it.
for example
#define HELLO ((variable1 + variable2 + variable3))
int variable1 = 37;
int variable2 = 82;
int variable3 = 928;
Thing is... I'm actually using C. I'm also using standard functions from stdio.h to parse through my files.
So... what techniques I could use to make this work correctly and efficiently?
Does the standard compilers substitute the text by re-copying the stream buffer and making the substitution there as the re-copying occurs or what? Is there more efficient techniques?
I guess we say preprocessor because it first substitutes everything until theres no preproc directives (recursive approach maybe?), and then, it starts doing the real compile job?
Excuse my lack of knowledge!
Thanks!
No, modern C compilers don't implement the preprocessor as a text processor, but they have the different compiler phases (preprocessing being one of them) tangled. This is particularly important for the efficiency of the compiler itself and to be able to track errors back into the original source code.
Also implementing a preprocessor by yourself is a tedious task. Think twice before you start such a project.
Yes, you are right about preprocessors. It has the job of bringing together all files which are requires for the execution of the program to 1 file for eg. stdio.h. Then it allows the compiler to compile the program. The file you want to compile is given as argument to the compiler and the techniques used by the compiler may vary according to the os and the compiler itself
The C preprocessor works on tokens not text. In particular, macro expansion cannot contain preprocessor directives. Other preprocessors, such as m4, work differently.

C/C++ Compiler listing what's defined

This question : Is there a way to tell whether code is now being compiled as part of a PCH? lead me to thinking about this.
Is there a way, in perhaps only certain compilers, of getting a C/C++ compiler to dump out the defines that it's currently using?
Edit: I know this is technically a pre-processor issue but let's add that within the term compiler.
Yes. In GCC
g++ -E -dM <file>
I would bet it is possible in nearly all compilers.
Boost Wave (a preprocessor library that happens to include a command line driver) includes a tracing capability to trace macro expansions. It's probably a bit more than you're asking for though -- it doesn't just display the final result, but essentially every step of expanding a macro (even a very complex one).
The clang preprocessor is somewhat similar. It's also basically a library that happens to include a command line driver. The preprocessor defines a macro_iterator type and macro_begin/macro_end of that type, that will let you walk the preprocessor symbol table and do pretty much whatever you want with it (including printing out the symbols, of course).

How does a macro-enabled language keep track of the source code for debugging?

This is a more theoretical question about macros (I think). I know macros take source code and produce object code without evaluating it, enabling programmers to create more versatile syntactic structures. If I had to classify these two macro systems, I'd say there was the "C style" macro and the "Lisp style" macro.
It seems that debugging macros can be a bit tricky because at runtime, the code that is actually running differs from the source.
How does the debugger keep track of the execution of the program in terms of the preprocessed source code? Is there a special "debug mode" that must be set to capture extra data about the macro?
In C, I can understand that you'd set a compile time switch for debugging, but how would an interpreted language, such as some forms of Lisp, do it?
Apologize for not trying this out, but the lisp toolchain requires more time than I have to spend to figure out.
I don't think there's a fundamental difference in "C style" and "Lisp style" macros in how they're compiled. Both transform the source before the compiler-proper sees it. The big difference is that C's macros use the C preprocessor (a weaker secondary language that's mostly for simple string substitution), while Lisp's macros are written in Lisp itself (and hence can do anything at all).
(As an aside: I haven't seen a non-compiled Lisp in a while ... certainly not since the turn of the century. But if anything, being interpreted would seem to make the macro debugging problem easier, not harder, since you have more information around.)
I agree with Michael: I haven't seen a debugger for C that handles macros at all. Code that uses macros gets transformed before anything happens. The "debug" mode for compiling C code generally just means it stores functions, types, variables, filenames, and such -- I don't think any of them store information about macros.
For debugging programs that use
macros, Lisp is pretty much the same
as C here: your debugger sees the
compiled code, not the macro
application. Typically macros are
kept simple, and debugged
independently before use, to avoid
the need for this, just like C.
For debugging the macros
themselves, before you go and use it somewhere, Lisp does have features
that make this easier than in C,
e.g., the repl and
macroexpand-1 (though in C
there is obviously a way to
macroexpand an entire file, fully, at
once). You can see the
before-and-after of a macroexpansion,
right in your editor, when you write
it.
I can't remember any time I ran across a situation where debugging into a macro definition itself would have been useful. Either it's a bug in the macro definition, in which case macroexpand-1 isolates the problem immediately, or it's a bug below that, in which case the normal debugging facilities work fine and I don't care that a macroexpansion occurred between two frames of my call stack.
In LispWorks developers can use the Stepper tool.
LispWorks provides a stepper, where one can step through the full macro expansion process.
You should really look into the kind of support that Racket has for debugging code with macros. This support has two aspects, as Ken mentions. On one hand there is the issue of debugging macros: in Common Lisp the best way to do that is to just expand macro forms manually. With CPP the situation is similar but more primitive -- you'd run the code through only the CPP expansion and inspect the result. However, both of these are insufficient for more involved macros, and this was the motivation for having a macro debugger in Racket -- it shows you the syntax expansion steps one by one, with additional gui-based indications for things like bound identifiers etc.
On the side of using macros, Racket has always been more advanced than other Scheme and Lisp implementations. The idea is that each expression (as a syntactic object) is the code plus additional data that contains its source location. This way when a form is a macro, the expanded code that has parts coming from the macro will have the correct source location -- from the definition of the macro rather than from its use (where the forms are not really present). Some Scheme and Lisp implementations will implement a limited for of this using the identity of subforms, as dmitry-vk mentioned.
I don't know about lisp macros (which I suspect are probably quite different than C macros) or debugging, but many - probably most - C/C++ debuggers do not handle source-level debugging of C preprocessor macros particularly well.
Generally, C/C++ debuggers they don't 'step' into the macro definition. If a macro expands into multiple statements, then the debugger will usually just stay on the same source line (where the macro is invoked) for each debugger 'step' operation.
This can make debugging macros a little more painful than they might otherwise be - yet another reason to avoid them in C/C++. If a macro is misbehaving in a truly mysterious way, I'll drop into assembly mode to debug it or expand the macro (either manually or using the compiler's switch). It's pretty rare that you have to go to that extreme; if you're writing macros that are that complicated, you're probably taking the wrong approach.
Usually in C source-level debugging has line granularity ("next" command) or instruction-level granularity ("step into"). Macro processors insert special directives into processed source that allow compiler to map compiled sequences of CPU instructions to source code lines.
In Lisp there exists no convention between macros and compiler to track source code to compiled code mapping, so it is not always possible to do single-stepping in source code.
Obvious option is to do single stepping in macroexpanded code. Compiler already sees final, expanded, version of code and can track source code to machine code mapping.
Other option is to use the fact that lisp expressions during manipulation have identity. If the macro is simple and just does destructuring and pasting code into template then some expressions of expanded code will be identical (with respect to EQ comparison) to expressions that were read from source code. In this case compiler can map some expressions from expanded code to source code.
The simple answer is that it is complicated ;-) There are several different things that contribute to being able to debug a program, and even more for tracking macros.
In C and C++, the preprocessor is used to expand macros and includes into actual source code. The originating filenames and line numbers are tracked in this expanded source file using #line directives.
http://msdn.microsoft.com/en-us/library/b5w2czay(VS.80).aspx
When a C or C++ program is compiled with debugging enabled, the assembler generates additional information in the object file that tracks source lines, symbol names, type descriptors, etc.
http://sources.redhat.com/gdb/onlinedocs/stabs.html
The operating system has features that make it possible for a debugger to attach to a process and control the process execution; pausing, single stepping, etc.
When a debugger is attached to the program, it translates the process stack and program counter back into symbolic form by looking up the meaning of program addresses in the debugging information.
Dynamic languages typically execute in a virtual machine, whether it is an interpreter or a bytecode VM. It is the VM that provides hooks to allow a debugger to control program flow and inspect program state.

Resources