How to perform step-wise expansion of a C preprocessor macro? - c-preprocessor

gdb has a documented, yet still unimplemented command (as of version 8.3) called macro expand-once. Its purpose is to perform a single-step macro expansion without recursing into other macro invocations. From the docs:
macro expand-once expression
macro exp1 expression
(This command is not yet implemented.) Show the results of expanding those preprocessor macro invocations that appear explicitly in expression. Macro invocations appearing in that expansion are left unchanged. This command allows you to see the effect of a particular macro more clearly, without being confused by further expansions. Since GDB simply expands macros, but does not parse the result, expression need not be a valid expression; it can be any string of tokens.
What an excruciating tease! Such a feature would lay the groundwork for a conceptually simple, iterative gdb script to output each step of a macro expansion, which is exactly the information I am seeking. Whether or not it happens to be delivered by gdb is secondary to me, but I do want this to be automated somehow — I am tired of digging through code and writing everything out by hand.
Until the macro expand-once command of gdb is implemented, is there some other programmatic way to perform step-wise expansion of a C preprocessor macro? I imagine it might be possible by dumping macro definitions from cpp, parsing the output, and making a sort of "call graph", but maybe I'm being naively optimistic.
NOTE: Although the bounty note states, "Simply providing references to library functions for constructing a potential solution will not be rewarded the bounty," I may still accept such an answer if a bounty-qualifying solution has not been posted by the end of the bounty period.

According to the repository history, here is an excerpt from the ChangeLog, which dates back to 2002:
2002-05-16 Jim Blandy
Add commands for manually expanding macros and showing their
definitions.
However, it appears that it was never followed up on.
Per the GCC Documentation,
Macro expansion is a tricky operation, fraught with nasty corner cases and situations that render what you thought was a nifty way to optimize the preprocessor’s expansion algorithm wrong in quite subtle ways.
Perhaps this is why it was never implemented?
The preprocessor stores macro expansions in tokenized form. This saves repeated lexing passes during expansion, at the cost of a small increase in memory consumption on average. The tokens are stored contiguously in memory, so a pointer to the first one and a token count is all you need to get the replacement list of a macro.
In theory, it would be possible to step through each token via pointer arithmetic; however, if it were so simple then I cannot imagine why it has not already been implemented.
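To make the "pointer plus token count" idea concrete, here is a minimal sketch of walking such a replacement list. The `token` and `macro_def` types and the `replacement_text` function are hypothetical names invented for illustration; they are not the actual cpplib or gdb data structures, which carry far more state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical records for illustration only; NOT the real cpplib types. */
struct token { const char *spelling; };
struct macro_def {
    const char *name;
    const struct token *first;  /* first token of the replacement list */
    size_t count;               /* number of tokens stored contiguously */
};

/* Walk the replacement list by pointer arithmetic and copy the token
   spellings into out, separated by single spaces.
   Returns 0 on success, -1 if out is too small. */
static int replacement_text(const struct macro_def *m, char *out, size_t cap) {
    size_t used = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    for (const struct token *t = m->first; t != m->first + m->count; t++) {
        size_t len = strlen(t->spelling);
        size_t sep = (t != m->first) ? 1 : 0;
        if (used + sep + len + 1 > cap) return -1;
        if (sep) out[used++] = ' ';
        memcpy(out + used, t->spelling, len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Reading the list is the easy part. A real single-step expander would also have to handle argument substitution, # and ##, and the rule that a macro is not re-expanded inside its own expansion, which is presumably where the corner cases come from.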

You could use wave (part of the boost library):
Example from the documentation:
File test.cpp:
// test.cpp
#define X(x) x
#define Y() 2
#define CONCAT_(x, y) x ## y
#define CONCAT(x, y) CONCAT_(x, y)
#pragma wave trace(enable)
// this macro expansion is to be traced
CONCAT(X(1), Y()) // should expand to 12
#pragma wave trace(disable)
Running wave -t test.trace test.cpp creates a file test.trace:
test.cpp:8:1: CONCAT(X(1), Y())
test.cpp:5:9: see macro definition: CONCAT(x, y)
invoked with
[
x = X(1)
y = Y()
]
[
test.cpp:2:9: see macro definition: X(x)
invoked with
[
x = 1
]
[
1
rescanning
[
1
]
]
test.cpp:3:9: see macro definition: Y()
[
2
rescanning
[
2
]
]
CONCAT_(1, 2)
rescanning
[
test.cpp:4:9: see macro definition: CONCAT_(x, y)
invoked with
[
x = 1
y = 2
]
[
12
rescanning
[
12
]
]
12
]
]

Related

How to define a macro that takes an arbitrary number of arguments and expands to give only the even arguments? [duplicate]

I am working on a recursive macro. However, it seems that it is not expanded recursively. Here is a minimal working example to show what I mean:
// ignore input, do nothing
#define ignore(...)
// choose between 6 names, depending on arity
#define choose(_1,_2,_3,_4,_5,_6,NAME,...) NAME
// if more than one parameter is given to this macro, then execute f, otherwise ignore
#define ifMore(f,...) choose(__VA_ARGS__,f,f,f,f,f,ignore)(__VA_ARGS__)
// call recursively if there are more parameters
#define recursive(first,args...) first:ifMore(recursive,args)
recursive(a,b,c,d)
// should print: a:b:c:d
// prints: a:recursive(b,c,d)
The recursive macro should expand itself recursively and always concatenate the result, separated with a colon. However, it doesn't work. The recursive macro is generated correctly (as can be seen from the result a:recursive(b,c,d), which includes a well-formed call to the macro again), but the generated recursive call is not expanded.
Why is this the case and how can I get the behaviour I want?
You can't get the behaviour you want. The C preprocessor is, by design, not Turing complete.
You can use multiple macros to get multiple replacements, but you will not achieve true recursion with an arbitrary number of replacements.
As others have mentioned, pure recursion is impossible with C macros. It is, however, possible to simulate recursion-like effects.
The Boost Pre-Processor tools do this well for both C and C++ and are a stand-alone library:
http://www.boost.org/doc/libs/1_60_0/libs/preprocessor/doc/index.html
The compiler pre-processor will not re-expand the macro that you define. That is, it will blindly replace whatever string is found in the macro statement with the string that it finds in the definition. See, for example, the questions "Can we have recursive macros?", "Macro recursive expansion to a sequence" and "C preprocessor, recursive macros".
That is, recursive(a,b,c,d) will be expanded to a:recursive(b,c,d) and the pre-processor will then continue to the next line in the base code. It will not loop around to try to continue to expand the string (see the links that I cited).
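A minimal sketch of the "multiple macros" approach mentioned above: one helper macro per arity, each handing the tail to the next-smaller helper, so no macro ever has to invoke itself. The names `CHAIN`, `CHAIN_1` to `CHAIN_4` and `CHAIN_PICK` are made up for this example. It uses `+` instead of `:` as the separator so that the result can be checked at compile time, and it only supports one to four arguments.

```c
#include <assert.h>

/* One helper per arity; each passes the remaining arguments down a level. */
#define CHAIN_1(a)      a
#define CHAIN_2(a, ...) a + CHAIN_1(__VA_ARGS__)
#define CHAIN_3(a, ...) a + CHAIN_2(__VA_ARGS__)
#define CHAIN_4(a, ...) a + CHAIN_3(__VA_ARGS__)

/* Select the helper by argument count, using the same trick as `choose`
   in the question. The trailing 0 keeps the variadic part non-empty. */
#define CHAIN_PICK(_1, _2, _3, _4, NAME, ...) NAME
#define CHAIN(...) \
    CHAIN_PICK(__VA_ARGS__, CHAIN_4, CHAIN_3, CHAIN_2, CHAIN_1, 0)(__VA_ARGS__)

/* CHAIN(1, 2, 3, 4) expands step by step to 1 + 2 + 3 + 4. */
static_assert(CHAIN(1, 2, 3, 4) == 10, "four-argument chain");
static_assert(CHAIN(5) == 5, "single-argument chain");
```

Replacing `+` with `:` in the three helper bodies gives the colon-separated form from the question. Supporting more arguments means adding more helpers by hand, which is essentially what Boost.Preprocessor does at a larger scale.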

How to process macros in LEX?

How do I implement #define in yacc/bison?
For Example:
#define f(x) x*x
If anywhere f(x) appears in any function then it is replaced by the right side of the
macro substituting for the argument ‘x’.
For example, f(3) would be replaced with 3*3. The macro can call another macro too.
It's not usually possible to do macro expansion inside a parser, at least not C-style macros, because C-style macro expansion doesn't respect syntax. For example
#define IF if(
#define THEN )
is legal (although very bad style IMHO). But for that to be handled inside the grammar, it would be necessary to allow a macro identifier to appear anywhere in the input, not just where an identifier might be expected. The necessary modifications to the grammar are going to make it much less readable and are very likely to introduce parser action conflicts. [Note 1]
Alternatively, you could do the macro expansion in the lexical analyzer. The lexical analyzer is not a parser, but parsing a C-style macro invocation doesn't require much sophistication, and if macro parameters were not allowed, it would be even simpler. This is how Flex handles macro replacement in its regular expressions. ({identifier}, for example.) [Note 2] Since Flex macros are just raw character sequences, not token lists as with C-style macros, they can be handled by pushing the replacement text back into the input stream. (F)lex provides the unput special action for this purpose. unput pushes one character back into the input stream, so if you want to push an entire macro replacement, you have to unput it one character at a time, back to front so that the last character unput is the first one to be read afterwards.
That's workable but ugly. And it's not really scalable to even the small feature list provided by the C preprocessor. And it violates the fundamental principle of software design, which is that each component does just one thing (so that it can do it well).
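To illustrate the back-to-front order that pushing a replacement text requires, here is a small sketch in plain C. The pushback stack and the functions `push_char`, `next_char` and `push_text` are hypothetical stand-ins for the scanner's input stream and for unput; this is not (f)lex code and makes no claim about how (f)lex implements it internally.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the scanner input: a small LIFO pushback stack. */
enum { PUSHBACK_MAX = 256 };
static char pushback[PUSHBACK_MAX];
static size_t pushback_len = 0;

/* Push one character back, playing the role that unput plays in (f)lex. */
static int push_char(char c) {
    if (pushback_len == PUSHBACK_MAX) return -1;
    pushback[pushback_len++] = c;
    return 0;
}

/* Read the most recently pushed character, or -1 if nothing is pending. */
static int next_char(void) {
    return pushback_len ? (unsigned char)pushback[--pushback_len] : -1;
}

/* Push a whole replacement text, last character first, so that the
   first character of the text is the next one to be read. */
static int push_text(const char *text) {
    for (size_t i = strlen(text); i > 0; i--)
        if (push_char(text[i - 1]) != 0) return -1;
    return 0;
}
```

After `push_text("3*3")`, successive calls to `next_char()` return the characters in their original order, which is what the scanner needs in order to re-lex the replacement.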
So that leaves the most common approach, which is to add a separate macro processor component, so that instead of dividing the parse into lexical scan/syntax analysis, the parse becomes lexical scan/macro expansion/syntax analysis. [Note 3]
A C-style macro processor which works between the lexical analyser and the syntactic analyser could itself be written in Bison. As I mentioned above, the parsing requirements are generally minimal, but there is still parsing to be done and Bison is presumably already part of the project. Although I don't know of any macro processor (other than proof-of-concept programs I've written myself) which do this, I think it's a very flexible solution. In particular, the Bison syntactic analysis phase could be implemented with a push-parser, which avoids the need to produce the entire macro-expanded token stream in order to make it available to a traditional pull-parser.
That's not the only way to design macros, though. Indeed, it has a lot of shortcomings, because the macro expansions are not hygienic, respecting neither syntax nor scope. Probably anyone who has used C macros has at one time or other been bitten by these problems; the simplest manifestation is defining a macro like:
#define NEXT(a) a + 1
and then writing
int x = NEXT(a) * 3;
which is not going to produce the expected result (unless what is expected is a violation of the syntactic form of the last statement). Also, any macro expansion which needs to use a local variable will sooner or later produce an incorrect expansion because of unexpected name collision. Hygienic macro expansion seeks to solve these issues by viewing macro expansion as an operation on syntax trees, not token streams, making the parsing paradigm lexical scan/syntax analysis/macro expansion (of the parse tree). For that operation, the appropriate tool might well be some kind of tree parser.
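A quick compile-time check of the NEXT example, as a sketch. `NEXT_SAFE` is a hypothetical parenthesized variant added for comparison. It only addresses the precedence problem; it does nothing about the name-collision problem that hygienic macros are meant to solve.

```c
#include <assert.h>

#define NEXT(a)      a + 1      /* as in the text: no parentheses */
#define NEXT_SAFE(a) ((a) + 1)  /* hypothetical parenthesized variant */

/* NEXT(2) * 3 expands to 2 + 1 * 3, which groups as 2 + (1 * 3). */
static_assert(NEXT(2) * 3 == 5, "token substitution ignores the intended grouping");

/* NEXT_SAFE(2) * 3 expands to ((2) + 1) * 3. */
static_assert(NEXT_SAFE(2) * 3 == 9, "parentheses restore the grouping");
```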
Notes
1. Also, you'd want to remove the token from the parse tree. Yacc/bison does have a poorly documented feature, YYBACKUP, which might help accomplish this. I don't know if that's one of its intended use cases; indeed, it is not clear to me what its intended use cases are.
2. The (f)lex documentation calls these definitions, but they really are macros, and they suffer from all the usual problems macros bring with them, such as mysterious interactions with surrounding syntax.
3. Another possibility is macro expansion/lexical scan/syntax analysis, which could be implemented using a macro processor like M4. But that completely divorces the macros from the rest of the language.
yacc and lex generate C source at the end. So you can use macros inside the parser and lexer actions.
The actual #define preprocessor directives can go in the first section of the lexer and parser file
%{
// Somewhere here
#define f(x) x*x
%}
These sections will be copied verbatim to the generated C source.

Understanding recursive Macro Expansions

I came across this question in an Embedded interview question set.
#define cat(x,y) x##y
concatenates x to y. But cat(cat(1,2),3) does not expand and instead gives a preprocessor warning. Why?
Does C not encourage recursive macro expansions? My assumption is that the expression should display 1##2##3. Am I wrong?
The problem is that cat(cat(1,2),3) isn't expanded the way you might expect, where cat(1,2) would give 12 and cat(12,3) would then give 123.
Macro parameters that are preceded or followed by ## in a replacement list aren't expanded at the time of substitution. As a result, cat(cat(1,2),3) becomes cat(1,2)##3, which tries to paste ) and 3. That does not form a valid preprocessing token, hence the warning, and the resulting cat(1,2)3 is not expanded further, because the name cat is not replaced again while the result of expanding cat is being rescanned.
So the simple rule is that macros whose replacement lists depend on ## usually can't be called in a nested fashion.
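The usual workaround is an extra level of indirection, so that the arguments are fully expanded before they reach the ## operator. A minimal sketch, where `CAT_RAW` and `CAT` are names chosen for this example:

```c
#include <assert.h>

#define CAT_RAW(x, y) x##y           /* pastes its arguments unexpanded */
#define CAT(x, y)     CAT_RAW(x, y)  /* wrapper: x and y are expanded first */

/* The inner CAT(1, 2) is expanded to 12 while the outer call's arguments
   are being collected, so CAT_RAW finally pastes 12 and 3. */
static_assert(CAT(CAT(1, 2), 3) == 123, "nested concatenation via indirection");
```

Because x and y are not adjacent to ## in the body of `CAT`, they go through normal argument expansion before `CAT_RAW` sees them.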

Using a #define-d list as input to a C preprocessor macro

In an example project, I defined the macro
#define FOO(x, y) x + y
This works perfectly well. For example, FOO(42, 1337) is evaluated to 1379.
However, I now want to use another #define:
#define SAMPLE 42, 1337
When I now call FOO(SAMPLE), this won't work. The compiler tells me that FOO takes two arguments, but is only called with one argument.
I guess the reason for this is that, although the arguments of a macro are evaluated in advance of the macro itself, the preprocessor does not parse the whole instruction again after this evaluation. This is similar to the fact that it is not possible to output additional preprocessor directives from a macro.
Is there any possibility to get the desired functionality?
Replacing the FOO macro with a C function is not a possibility. The original macro is located in third party code I cannot change, and it outputs a comma-separated list of values to be directly used in array initializers. Therefore, a C function cannot replicate the same behaviour.
If it is not possible to accomplish this task by simple means: How would you store the (x, y) pairs in a maintainable form? In my case, there are 8 arguments. Therefore, storing the individual parts in separate #define-s is also not easily maintainable.
You're running into a problem where the preprocessor is not matching and expanding macros in the order you want. Now you can generally get it to do what you want by inserting some extra macros to force it to get the order right, but in order to do that you need to understand what the normal order is.
When the compiler sees the name of a macro with arguments followed by a (, it first scans in that argument list, breaking it into arguments WITHOUT recognizing or expanding any macros in the arguments.
After parsing and separating the arguments, it then rescans each argument for macros, and expands any it finds within the argument UNLESS the argument is used with # or ## in the macro body.
It then replaces each instance of the argument in the body with the (now possibly expanded) argument.
Finally, it rescans the body for any OTHER macros that may exist within the body for expansion. In this one scan, the original macro WILL NOT be recognized and reexpanded, so you can't have recursive macro expansions.
So you can get the effect you want by careful use of an EXPAND macro that takes a single argument and expands it, allowing you to force extra expansions at the right point in the process:
#define EXPAND(X) X
#define FOO(x,y) x + y
#define SAMPLE 42, 1337
EXPAND(FOO EXPAND((SAMPLE)))
In this case you first explicitly expand macros in the argument list, and then manually expand the resulting macro call afterwards.
Update by question poster
#define INVOKE(macro, ...) macro(__VA_ARGS__)
INVOKE(FOO, SAMPLE)
provides an extended solution that works without cluttering the code with EXPANDs.
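As a sketch of how the INVOKE approach fits the array-initializer use case from the question: `TRIPLE` below is a hypothetical stand-in for the third-party macro (the real one is not shown in the question), chosen so that it emits a comma-separated list of values.

```c
#include <assert.h>

/* Hypothetical stand-in for the third-party macro: emits initializer values. */
#define TRIPLE(x, y) (x), (y), (x) + (y)

/* The whole argument list is kept in one place. */
#define SAMPLE 42, 1337

/* SAMPLE is expanded while INVOKE's arguments are processed, so the
   rescan sees TRIPLE(42, 1337) with two separate arguments. */
#define INVOKE(macro, ...) macro(__VA_ARGS__)

static const int values[] = { INVOKE(TRIPLE, SAMPLE) };

static_assert(sizeof values / sizeof values[0] == 3, "three initializers");
```

With eight arguments the pattern stays the same: each argument list lives in a single #define and is passed through INVOKE.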

Can a C macro definition refer to other macros?

What I'm trying to figure out is if something such as this (written in C):
#define FOO 15
#define BAR 23
#define MEH (FOO / BAR)
is allowed? I would want the preprocessor to replace every instance of
MEH
with
(15 / 23)
but I'm not so sure that will work. Certainly if the preprocessor only goes through the code once then I don't think it'd work out the way I'd like.
I found several similar examples but all were really too complicated for me to understand. If someone could help me out with this simple one I'd be eternally grateful!
Short answer yes. You can nest defines and macros like that - as many levels as you want as long as it isn't recursive.
The answer is "yes", and two other people have correctly said so.
As for why the answer is yes, the gory details are in the C standard, section 6.10.3.4, "Rescanning and further replacement". The OP might not benefit from this, but others might be interested.
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and
# and ## processing has taken place, all placemarker preprocessing tokens are removed.
Then, the resulting preprocessing token sequence
is rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace.
If the name of the macro being replaced is found during this scan of
the replacement list (not including the rest of the source file's
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced. These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing token
would otherwise have been replaced.
The resulting completely macro-replaced preprocessing token sequence
is not processed as a preprocessing directive even if it resembles
one, but all pragma unary operator expressions within it are then
processed as specified in 6.10.9 below.
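The second paragraph of the quoted section is the rule that makes self-referential macros terminate. A minimal sketch, with the variable `limit` and the function `read_limit` invented for illustration (shadowing a variable with a macro like this is for demonstration only):

```c
#include <assert.h>

static int limit = 40;

/* Self-referential macro: during rescanning, the inner `limit` is the name
   of the macro being replaced, so it is left alone and ends up referring
   to the variable declared above. */
#define limit (limit + 2)

static int read_limit(void) {
    return limit;   /* expands exactly once, to (limit + 2) */
}
```

Without that rule, the expansion of `limit` would never finish.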
Yes, it's going to work.
But for your personal information, here are some simplified rules about macros that might help you (it's out of scope, but will probably help you in the future). I'll try to keep it as simple as possible.
The defines are "defined" in the order they are included/read. That means that you cannot use a define that wasn't defined previously.
Useful preprocessor directives: #define, #undef, #else, #elif, #ifdef, #ifndef, #if
You can use any other previously #define in your macro. They will be expanded. (like in your question)
Function macro definitions accept two special operators (# and ##)
operator # stringizes the argument:
#define str(x) #x
str(test); // would translate to "test"
operator ## concatenates two arguments
#define concat(a,b) a ## b
concat(hello, world); // would translate to "helloworld"
There are some predefined macros (from the language) as well that you can use:
__LINE__, __FILE__, __cplusplus, etc
See your compiler section on that to have an extensive list since it's not "cross platform"
Pay attention to the macro expansion
You'll see that people use a lot of round brackets "()" when defining macros. The reason is that when you call a macro, it's expanded "as is"
#define mult(a, b) a * b
mult(1+2, 3+4); // will be expanded like: 1 + 2 * 3 + 4 = 11 instead of 21.
#define mult_fix(a, b) ((a) * (b))
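One consequence of the # operator worth knowing: it stringizes its argument as written, without expanding it first. If the expanded value is wanted, a second macro level is needed. A short sketch, where `STR_RAW`, `STR` and `VERSION` are names made up for this example:

```c
#include <assert.h>
#include <string.h>

#define STR_RAW(x) #x          /* stringizes the argument as written */
#define STR(x)     STR_RAW(x)  /* wrapper: x is expanded before stringizing */
#define VERSION 3              /* hypothetical macro used as the argument */

/* STR_RAW(VERSION) yields "VERSION": the argument is not expanded. */
static int raw_keeps_name(void) {
    return strcmp(STR_RAW(VERSION), "VERSION") == 0;
}

/* STR(VERSION) yields "3": VERSION is expanded on the way through STR. */
static int wrapped_expands(void) {
    return strcmp(STR(VERSION), "3") == 0;
}
```

The same two-level pattern applies to ##, and it is commonly used to turn __LINE__ into a string.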
Yes, and there is one more advantage of this feature. You can leave some macro undefined and set its value as a name of another macro in the compilation command.
#include <stdio.h>
#define STR "string"
int main(void) { printf("value=%s\n", VALUE); return 0; }
In the command line you can say that the macro "VALUE" takes value from another macro "STR":
$ gcc -o test_macro -DVALUE=STR main.c
$ ./test_macro
Output:
value=string
This approach works as well for MSC compiler on Windows. I find it very flexible.
I'd like to add a gotcha that tripped me up.
Function-style macros cannot do this.
Example that doesn't compile when used:
#define FOO 1
#define FSMACRO(x) FOO + x
Yes, that is supported. And used quite a lot!
One important thing to note though is to make sure you parenthesize the expression, otherwise you might run into nasty issues!
#define MEH FOO/BAR
// vs
#define MEH (FOO / BAR)
// the first could be expanded in an expression like 5 * MEH to mean something
// completely different than the second
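To put numbers on that difference, here is a small compile-time check. `MEH_BARE` and `MEH_PAREN` are hypothetical names for the two variants, so that both can exist side by side.

```c
#include <assert.h>

#define FOO 15
#define BAR 23
#define MEH_BARE   FOO / BAR    /* unparenthesized variant */
#define MEH_PAREN (FOO / BAR)   /* parenthesized variant */

/* 5 * MEH_BARE is 5 * 15 / 23, i.e. (5 * 15) / 23 = 75 / 23 = 3
   in integer arithmetic. */
static_assert(5 * MEH_BARE == 3, "multiplication happens before the division");

/* 5 * MEH_PAREN is 5 * (15 / 23) = 5 * 0 = 0. */
static_assert(5 * MEH_PAREN == 0, "division happens first");
```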

Resources