From The C Programming Language, by K&R
After
#define cat(x, y) x ## y
the call cat(var, 123) yields var123. However, the call
cat(cat(1,2),3) is undefined: the presence of ## prevents
the arguments of the outer call from being expanded. Thus it
produces the token string cat ( 1 , 2 )3
and )3 (the catenation of the last token of the first argument with
the first token of the second) is not a legal token.
If a second level of macro definition is introduced,
#define xcat(x, y) cat(x,y)
things work more smoothly; xcat(xcat(1, 2), 3) does produce
123, because the expansion of xcat itself does not involve the
## operator.
What is the property of ## that makes the difference between the two examples?
Why is the inner cat(1,2) in the first example not expanded, while the inner xcat(1,2) in the second example is?
Thanks!
It is one of the (not-so-well-known) characteristics of the macro ## operator that it inhibits expansion of its arguments (it just treats them as plain tokens). An excerpt from the GCC preprocessor docs:
...As with stringification, the actual argument is not macro-expanded first...
That is, arguments to ## are not expanded.
By adding a level of indirection with your xcat macro you work around the problem: xcat's replacement list contains no ##, so the argument prescan expands its arguments before they are handed on to cat, and the result is then scanned a second time.
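A minimal sketch of that workaround. PREFIX, STR, XSTR and the function names are hypothetical, introduced only for illustration; the STR/XSTR pair merely makes the pasted token visible as a string.

```c
#include <assert.h>
#include <string.h>

#define cat(x, y)  x ## y
#define xcat(x, y) cat(x, y)

/* Illustration-only helpers: make the resulting token visible as a string. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Hypothetical object-like macro passed as an argument. */
#define PREFIX var

/* cat pastes its arguments unexpanded, so PREFIX stays as written. */
static const char *direct_paste(void)   { return XSTR(cat(PREFIX, 123)); }

/* xcat has no ## of its own, so PREFIX is expanded before cat pastes it. */
static const char *indirect_paste(void) { return XSTR(xcat(PREFIX, 123)); }

/* Checks both results; call it from main() to try the sketch. */
static void check_paste(void)
{
    assert(strcmp(direct_paste(), "PREFIX123") == 0);
    assert(strcmp(indirect_paste(), "var123") == 0);
}
```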
Related
If we define a macro
#define M(x, ...) { x, __VA_ARGS__ }
and then use it passing itself as an argument
M(M(1, 2), M(3, 4), M(5, 6))
then it expands to the expected form:
{ { 1, 2 }, { 3, 4 }, { 5, 6 } }
However, when we use the ## operator (to prevent dangling comma from appearing in the output in the case of the single argument invocations, as documented in the GCC manual), i.e.
#define M0(x, ...) { x, ## __VA_ARGS__ }
then the expansion of arguments in
M0(M0(1,2), M0(3,4), M0(5,6))
seems to stop after the first argument, i.e. we get:
{ { 1,2 }, M0(3,4), M0(5,6) }
Is this behavior a bug, or does it stem from some principle?
(I have also checked it with clang, and it behaves in the same way as GCC)
Way down at the end of this answer there is a possible solution.
Is this behavior a bug, or does it stem from some principle?
It stems from two principles whose interaction is pretty subtle. So I agree that it is surprising, but it's not a bug.
The two principles are the following:
Inside the replacement of a macro invocation, that macro is not expanded. (See the GCC Manual Section 3.10.5, Self-Referential Macros or the C Standard, §6.10.3.4 paragraph 2.) This precludes recursive macro expansion, which in most cases would produce infinite recursion if allowed. Although it is likely that no-one anticipated such uses, it turns out that there would be ways of using recursive macro expansion which would not result in infinite recursion (see the Boost Preprocessor Library documentation for a thorough discussion of this issue), but the standard isn't going to get changed now.
If ## is applied to a macro argument, it suppresses macro expansion of that argument. (See the GCC Manual section 3.5, Concatenation or the C Standard, §6.10.3.3 paragraph 2.) The suppression of expansion is part of the C Standard, but GCC/Clang's extension to allow use of ## to conditionally suppress the comma preceding __VA_ARGS__ is non-standard. (See the GCC Manual Section 3.6, Variadic Macros.) Apparently, the extension still respects the standard's rule about not expanding concatenated macro arguments.
Now, the curious thing about the second point, with respect to optional comma suppression, is that you hardly ever notice it in practice. You can use ## to conditionally suppress commas and arguments will still get expanded as normal:
#define SHOW_ARGS(arg1, ...) Arguments are (arg1, ##__VA_ARGS__)
#define DOUBLE(a) (2 * a)
SHOW_ARGS(DOUBLE(2))
SHOW_ARGS(DOUBLE(2), DOUBLE(3))
This expands to:
Arguments are ((2 * 2))
Arguments are ((2 * 2), (2 * 3))
Both DOUBLE(2) and DOUBLE(3) are expanded normally, despite the fact that one of them is an argument to the concatenation operator.
But there's a subtlety to macro expansion. Expansion happens twice:
First, macro arguments are expanded. (This expansion is in the context of the text which invokes the macro.) These expanded arguments are substituted for the parameters in the macro replacement body (but only where the parameter is not an argument to # or ##).
Then the # and ## operators are applied to the replacement token list.
Finally, the resulting replacement tokens are inserted into the input stream, so that they are expanded again. This time, the expansion is in the context of the macro so recursive invocation is suppressed.
With that in mind, we see that in SHOW_ARGS(DOUBLE(2), DOUBLE(3)), DOUBLE(2) is expanded in step 1, before being inserted into the replacement token list, and DOUBLE(3) is expanded in step 3, as part of the replacement token list.
This doesn't make a difference with DOUBLE inside SHOW_ARGS, since they're different macros. But the difference would become apparent if they were the same macro.
To see the difference, consider the following macro:
#define INVOKE(A, ...) A(__VA_ARGS__)
That macro creates a macro invocation (or a function invocation, but here we're only interested in the case where it's a macro). That is, it turns INVOKE(X, Y) into X(Y). (That's a simplification of a useful feature, where the named macro is actually invoked several times, possibly with slightly different arguments.)
That works fine with SHOW_ARGS:
INVOKE(SHOW_ARGS, one arg)
⇒ Arguments are (one arg)
But if we try to INVOKE the macro INVOKE itself, we find that the ban on recursive invocation takes effect:
INVOKE(INVOKE, SHOW_ARGS, one arg)
⇒ INVOKE(SHOW_ARGS, one arg)
"Of course", we could expand INVOKE as an argument to INVOKE:
INVOKE(SHOW_ARGS, INVOKE(SHOW_ARGS, one arg))
⇒ Arguments are (Arguments are (one arg))
That works fine because there is no ## inside INVOKE, so expansion of the argument is not suppressed. But if the expansion of the argument had been suppressed, then the argument would be inserted into the macro body unexpanded, and then it would become a recursive expansion.
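The nesting case can be sketched with a macro that produces a value. ADD and the function names are hypothetical and not part of the answer; the sketch only assumes the INVOKE definition given above.

```c
#include <assert.h>

#define INVOKE(A, ...) A(__VA_ARGS__)

/* Hypothetical macro to invoke; not part of the answer. */
#define ADD(a, b) ((a) + (b))

/* INVOKE(ADD, 1, 2) becomes ADD(1, 2), which the rescan turns into ((1) + (2)). */
static int add_once(void) { return INVOKE(ADD, 1, 2); }

/* The inner INVOKE is an argument that is not used with # or ##, so it is
   fully expanded before the outer INVOKE's replacement list is built. */
static int add_nested(void) { return INVOKE(ADD, INVOKE(ADD, 1, 2), 3); }

/* By contrast, INVOKE(INVOKE, ADD, 1, 2) would stop at INVOKE(ADD, 1, 2):
   that INVOKE only appears during the rescan of INVOKE's own replacement. */

static void check_invoke(void)
{
    assert(add_once() == 3);
    assert(add_nested() == 6);
}
```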
So that's what is going on in your example:
#define M0(x, ...) { x, ## __VA_ARGS__ }
M0(M0(1,2), M0(3,4), M0(5,6))
⇒ { { 1,2 }, M0(3,4), M0(5,6) }
Here, the first argument to the outer M0, M0(1,2), is not used with ##, so it is expanded as part of the invocation. The other two arguments are part of __VA_ARGS__, which is used with ##. Consequently, they are not expanded prior to being substituted into the macro's replacement list. But as part of the macro's replacement list, their expansion is suppressed by the no-recursive-macros rule.
You can easily work around that by defining two versions of the M0 macro, with the same contents but different names (as suggested in a comment to the OP):
#define M0(x, ...) { x, ## __VA_ARGS__ }
#define M1(x, ...) { x, ## __VA_ARGS__ }
M0(M1(1,2), M1(3,4), M1(5,6))
⇒ { { 1,2 }, { 3,4 }, { 5,6 } }
But that's not very pleasant.
Solution: Use __VA_OPT__
C++20 includes a feature designed specifically to assist with suppressing commas in variadic invocations: the __VA_OPT__ function-like macro. Inside a variadic macro expansion, __VA_OPT__(x) expands to its argument provided that there is at least one token in the variadic arguments. But if __VA_ARGS__ expands to an empty token list, so does __VA_OPT__(x). Thus, __VA_OPT__(,) can be used for conditional suppression of a comma just like the GCC ## extension, but unlike ##, it does not trigger suppression of macro expansion.
__VA_OPT__ has since been adopted into C23 as well, and recent versions of GCC and Clang also accept it as an extension in earlier C modes. (See the GCC Manual Section 3.6, Variadic Macros.) So if you're willing to rely on relatively recent compiler versions, there is a very clean solution:
#define M0(x, ...) { x __VA_OPT__(,) __VA_ARGS__ }
M0(M0(1,2), M0(3,4), M0(5,6))
⇒ { { 1 , 2 } , { 3 , 4 }, { 5 , 6 } }
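As a compilable sketch, the same macro can be used as an array initializer. This assumes a compiler that accepts __VA_OPT__ in C and tolerates an empty variadic argument list; the array and function names are hypothetical.

```c
#include <assert.h>

/* Assumes a compiler that accepts __VA_OPT__ in C (recent GCC or Clang). */
#define M0(x, ...) { x __VA_OPT__(,) __VA_ARGS__ }

/* Nested use: every inner M0 is expanded as an ordinary argument. */
static const int pairs[3][2] = M0(M0(1, 2), M0(3, 4), M0(5, 6));

/* Single-argument use: __VA_OPT__(,) vanishes, leaving { 7 }. */
static const int single[1] = M0(7);

static int last_pair_sum(void) { return pairs[2][0] + pairs[2][1]; }

static void check_va_opt(void)
{
    assert(pairs[0][1] == 2);
    assert(last_pair_sum() == 11);
    assert(single[0] == 7);
}
```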
Notes:
You can see these examples on Godbolt
This question was originally closed as a duplicate of Variadic macros: expansion of pasted tokens but I don't think that answer is really adequate to this particular situation.
I see below code snippet in fwts code base:
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)
#define FWTS_ASSERT(e, m) \
enum { FWTS_CONCAT_EXPAND(FWTS_ASSERT_ ## m ## _in_line_, __LINE__) = 1 / !!(e) }
#define FWTS_REGISTER_FEATURES(name, ops, priority, flags, features) \
/* Ensure name is not too long */ \
FWTS_ASSERT(FWTS_ARRAY_LEN(name) < 16, \
fwts_register_name_too_long);
My questions are:
For the definition of FWTS_ASSERT(e, m), I know the !! can convert whatever value into 1 or 0. But doesn't it cause error for FWTS_ASSERT() when !!(e) evaluates to 0 thus leads to 1/0 ?
And btw, the FWTS_CONCAT_EXPAND(a,b) and FWTS_CONCAT(a, b) seem to be duplicated, why do we need 2 of them?
ADD 1
Based on @Klas Lindbäck's answer, I want to go through the macro expansion with a concrete example.
Suppose I have:
#define M_1 abc
#define M_2 123
Then I guess the expansion process of FWTS_CONCAT_EXPAND(M_1,M_2) should be:
FWTS_CONCAT_EXPAND(M_1,M_2)
->
FWTS_CONCAT(abc, 123)
->
abc123
If I directly apply FWTS_CONCAT(M_1, M_2), will it be expanded like this?
FWTS_CONCAT(M_1, M_2)
->
M_1M_2
->
Bang! M_1M_2 is an invalid symbol!
(Please correct me if I am wrong...)
ADD 2
Tried with gcc -E macroTest.c -o macroTest.i:
(macroTest.c)
#define M_1 abc
#define M_2 123
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)
FWTS_CONCAT_EXPAND(M_1, M_2)
FWTS_CONCAT(M_1,M_2)
(macroTest.i)
# 1 "macroTest.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "macroTest.c"
abc123
M_1M_2
I think I get the point of the macro expansion rule. Below are some related concepts and quotation:
Argument Prescan:
Macro arguments are completely macro-expanded before they are
substituted into a macro body, unless they are stringified or pasted
with other tokens. After substitution, the entire macro body,
including the substituted arguments, is scanned again for macros to be
expanded. The result is that the arguments are scanned twice to expand
macro calls in them.
Stringification
When a macro parameter is used with a leading ‘#’, the preprocessor
replaces it with the literal text of the actual argument, converted to
a string constant.
Token Pasting / Token Concatenation:
It is often useful to merge two tokens into one while expanding
macros. This is called token pasting or token concatenation. The ‘##’
preprocessing operator performs token pasting. When a macro is
expanded, the two tokens on either side of each ‘##’ operator are
combined into a single token, which then replaces the ‘##’ and the two
original tokens in the macro expansion.
So the detailed process of my scenario is like this:
FWTS_CONCAT_EXPAND(M_1, M_2)
-> FWTS_CONCAT_EXPAND(abc, 123) // M_1, M_2 pre-expanded since FWTS_CONCAT_EXPAND has no ##.
-> FWTS_CONCAT(abc, 123) // FWTS_CONCAT_EXPAND expanded into FWTS_CONCAT
-> abc123 // FWTS_CONCAT expanded
FWTS_CONCAT(M_1,M_2)
-> M_1M_2 //M_1, M_2 are not pre-expanded because of the ## in FWTS_CONCAT
-> DEADEND
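The two traces can also be checked in compiled code rather than with gcc -E. In this sketch the pasted results are used as enumerator names; the function names are hypothetical. Note that M_1M_2 is still a valid identifier, it is simply a different one from abc123.

```c
#include <assert.h>

#define M_1 abc
#define M_2 123
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a, b) FWTS_CONCAT(a, b)

/* Arguments are expanded before pasting: this declares abc123. */
enum { FWTS_CONCAT_EXPAND(M_1, M_2) = 1 };

/* Arguments are pasted as written: this declares M_1M_2. */
enum { FWTS_CONCAT(M_1, M_2) = 2 };

static int expanded_then_pasted(void) { return abc123; }
static int pasted_as_written(void)    { return M_1M_2; }

static void check_concat(void)
{
    assert(expanded_then_pasted() == 1);
    assert(pasted_as_written() == 2);
}
```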
The macros are used for compile time checking. This is useful when you write code that will be compiled and run on many different platforms and where some platforms may not be compatible.
If the first parameter to FWTS_ASSERT evaluates to non-zero (true) then !!(e) will evaluate to 1 and the enum will be created with the name FWTS_ASSERT_<second parameter>_in_line_<line>. I suspect that the enum is never actually used.
If the first parameter to FWTS_ASSERT evaluates to 0 (= false) then the compiler will try to compute 1/0 and generate a compile error, which will hopefully name the enum member that caused it, in this case something like FWTS_ASSERT_fwts_register_name_too_long_in_line_4.
And btw, the FWTS_CONCAT_EXPAND(a,b) and FWTS_CONCAT(a, b) seem to be duplicated, why do we need 2 of them?
FWTS_CONCAT_EXPAND is done in 2 steps because we want to first expand any macros in the parameters and then perform the concatenation. Doing it in two steps makes the preprocessor do macro expansion of the arguments before it does the token concatenation.
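A small sketch of the same compile-time check outside fwts. STATIC_CHECK, ARRAY_LEN, name and the function names are hypothetical stand-ins, not the fwts definitions.

```c
#include <assert.h>

#define CONCAT(a, b) a ## b
#define CONCAT_EXPAND(a, b) CONCAT(a, b)

/* Hypothetical check in the style of FWTS_ASSERT: the enumerator's value is
   1 / 1 when e is true and 1 / 0 (a compile-time error) when e is false. */
#define STATIC_CHECK(e, m) \
    enum { CONCAT_EXPAND(check_ ## m ## _in_line_, __LINE__) = 1 / !!(e) }

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const char name[] = "short_name";

/* Compiles, because the array (including its terminator) has 11 elements. */
STATIC_CHECK(ARRAY_LEN(name) < 16, name_too_long);

/* Uncommenting this should make compilation fail with a division by zero. */
/* STATIC_CHECK(ARRAY_LEN(name) < 4, name_too_long); */

static int name_len(void) { return (int)ARRAY_LEN(name); }

/* Runtime counterpart for comparison: only checked when the function runs. */
static void runtime_check(void) { assert(name_len() < 16); }
```

Because __LINE__ is expanded before the paste, two checks with the same message on different lines still get distinct enumerator names.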
Please explain the code
#include <stdio.h>
#define A(a,b) a##b
#define B(a) #a
#define C(a) B(a)
int main(void)
{
printf("%s\n",C(A(1,2)));
printf("%s\n",B(A(1,2)));
}
Output
12
A(1,2)
I don't understand, how the first printf evaluates to 12?
Isn't it similar to the second, as C macro is simply a wrapper to B macro?
As mentioned in Wikipedia in C-preprocessor :
The ## operator (known as the "Token Pasting Operator") concatenates
two tokens into one token.
The # operator (known as the "Stringification Operator") converts a
token into a string, escaping any quotes or backslashes appropriately.
If you want to stringify the expansion of a macro argument, you have
to use two levels of macros:
You cannot combine a macro argument with additional text and stringify
it all together. You can however write a series of adjacent string
constants and stringified arguments: the C compiler will then combine
all the adjacent string constants into one long string.
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
str (foo) // outputs "foo"
xstr (foo) // outputs "4"
Also, from C-FAQ Question 11.17 :
It turns out that the definition of # says that it's supposed to
stringize a macro argument immediately, without further expanding it
(if the argument happens to be the name of another macro).
So, similarly, going along these lines :
you're doing C(A(1,2)),
which would roll to C(12), // since no #, so inner argument is expanded
and then to B(12)
// [since you've done two levels of macros in the code:
// 1. from C() to B(), and then, 2. B() to #a]
= 12 .
Whereas, in the second case, only 1 level of stringification is plainly done as per the definition of B(a) (since the argument gets stringified immediately because of #)
macro-replacement of B(A(1,2))
= stringification of A(1,2)
= A(1,2).
The confusion here comes from a simple rule.
When evaluating a macro the pre-processor first resolves the macros in the arguments passed to the macro. However, as a special case, if an argument is right of # or adjacent to ##, it doesn't resolve macros within such arguments. Such are the rules.
Your first case
C(A(1,2))
The pre-processor first applies the C(a) macro, which is defined as B(a). There's no # or ## adjacent to the argument in the definition (none of them in B(a) at all), thus the pre-processor must resolve macros in the argument:
A(1,2)
The definition of A(a,b) is a##b which evaluates into 12.
After the macros in the arguments of the C(a) macro are evaluated, the C macro becomes:
C(12)
The pre-processor now resolves the C(a) macro, which according to its definition becomes
B(12)
Once this is done, the pre-processor evaluates macros inside the result once again and applies the B(a) macro, so the result becomes
"12"
Your second case
B(A(1,2))
Similar to the first case, the pre-processor first applies the B(a) macro. But this time, the definition of the macro is such that the argument is preceded by #. Therefore, the special rule applies and macros inside the argument are not evaluated. Therefore, the result immediately becomes:
"A(1,2)"
The preprocessor goes over the result again trying to find more macros to expand, but now everything is a part of the string, and macros don't get expanded within strings. So the final result is:
"A(1,2)"
The C preprocessor has two operators, # and ##. The # operator turns the argument of a function-like macro into a quoted string, whereas the ## operator concatenates two tokens.
#define A(a,b) a##b concatenates a with b, yielding the single token ab.
So A(1,2) yields 12.
#define B(a) #a yields a as a string.
#define C(a) B(a) invokes the previous macro and yields a as a string, after a has been expanded.
So C(A(1,2)) = C(12) = B(12) = 12 (as a string).
B(A(1,2)) = A(1,2) because A(1,2) is taken as the argument exactly as written and returned as the string A(1,2).
There are two operators used in the function-like macros:
## causes a macro to concatenate two parameters.
# causes the input to be effectively turned into a string literal.
In A(a,b) ## causes a to be concatenated with b. In B(a), # effectively creates a string literal from the input. So the expansion runs as follows:
C(A(1,2)) -> C(12) -> B(12) -> "12"
B(A(1,2)) -> "A(1,2)"
Because for C(A(1,2)), the A(1,2) part is evaluated first to turn into 12, the two statements aren't equal like they would appear to be.
You can read more about these at cppreference.
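The two expansions traced in these answers can be confirmed with a short sketch. The macros are the ones from the question; the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define A(a, b) a ## b
#define B(a)    #a
#define C(a)    B(a)

/* C's parameter is not used with # or ##, so A(1,2) is expanded to 12 first. */
static const char *through_c(void) { return C(A(1,2)); }

/* B's parameter is used with #, so the argument is stringified as written. */
static const char *through_b(void) { return B(A(1,2)); }

static void check_stringify(void)
{
    assert(strcmp(through_c(), "12") == 0);
    assert(strcmp(through_b(), "A(1,2)") == 0);
}
```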
I was reading a book on C programming language where I found:
#define cat(x,y) x##y
#define xcat(x,y) cat(x,y)
calling cat(cat(1,2),3) produces error whereas calling xcat(xcat(1,2),3) produces expected result 123.
How are both working differently ?
Macros whose replacement lists depend on ## usually can't be called in nested fashion.
cat(cat(1,2),3) is not expanded in a normal fashion, with cat(1,2) yielding 12 and then cat(12, 3) yielding 123.
Macro parameters that are preceded or followed by ## in a replacement list aren't expanded at the time of substitution.
6.10.3.1 Argument substitution
1 After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been
expanded. Before being substituted, each argument’s preprocessing tokens are
completely macro replaced as if they formed the rest of the preprocessing file; no other
preprocessing tokens are available.
As a result, cat(cat(1,2),3) expands to cat(1,2)3: pasting ) and 3 does not produce a valid token, and the inner cat is not expanded on rescanning because it appears inside cat's own replacement.
In case
#define xcat(x,y) cat(x,y)
writing xcat(xcat(1,2),3) will work. As the preprocessor expands the outer call of xcat, it will expand xcat(1,2) as well; the difference is that xcat's replacement list does not contain ## anymore.
xcat(xcat(1,2),3) ==> cat(12, 3) ==> 12##3 ==> 123
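The chain above can be checked by using the result as an integer constant. This is a sketch with hypothetical function names.

```c
#include <assert.h>

#define cat(x, y)  x ## y
#define xcat(x, y) cat(x, y)

/* Inner xcat(1, 2) is expanded to 12 during argument substitution, so the
   outer call becomes cat(12, 3) and pastes to the single token 123. */
static int nested(void) { return xcat(xcat(1, 2), 3); }

/* One level is fine with cat directly, since 1 and 2 need no expansion. */
static int flat(void) { return cat(1, 2); }

static void check_nested(void)
{
    assert(nested() == 123);
    assert(flat() == 12);
}
```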
This is a very simplified version of some code I just ran into at work:
#include <stdio.h>
#define F(G) G(1)
#define G(x) x+1
int main() {
printf("%d\n", F(G));
}
prints 2.
Now, I can see that F(G) expands to G(1) and then G(1) expands to 2, but it's not clear to me why. I would have expected to get an error that G is not a function from the printf line.
How does the pre-processor parse code like this?
A function-like macro is only invoked if its name is followed by a (.
In F(G), G is not followed by a (, so the G there is not a macro invocation.
In #define F(G) G(1), G is a macro parameter and thus is not macro-replaced directly (this is a very confusing macro you've got :-O). In the replacement list G(1), G is replaced by the argument corresponding to the parameter G, which also happens to be G. That replacement is then rescanned and G(1) is evaluated to 1 + 1.
If we rewrite your macros so that you aren't using G in multiple different ways, it's far easier to understand:
#define F(x) x(1)
#define G(x) x + 1
Here, F(G) is replaced by G(1). This is then rescanned, and the invocation of G is evaluated, yielding 1 + 1.
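The rule that a function-like macro is only invoked when its name is followed by ( can be seen in a sketch where a function and a macro share one name. succ, APPLY_TO_ONE and the other names are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical function and function-like macro sharing one name. */
static int succ(int x) { return x + 1; }
#define succ(x) ((x) + 100)

#define APPLY_TO_ONE(f) f(1)

/* Name followed by ( : the macro is invoked. */
static int direct_call(void) { return succ(1); }

/* Name not followed by ( : no macro invocation, so this is the function. */
static int (*const succ_fn)(int) = succ;

/* Inside APPLY_TO_ONE the argument is a bare name; only after substitution
   does the rescan see succ(1) and invoke the macro. */
static int via_apply(void) { return APPLY_TO_ONE(succ); }

static void check_invocation(void)
{
    assert(direct_call() == 101);
    assert(succ_fn(1) == 2);
    assert(via_apply() == 101);
}
```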
Expanding on James McNellis' answer, the C99 standard prescribes:
6.10.3.4 Rescanning and further replacement
1 After all parameters in the replacement list have been substituted and # and ##
processing has taken place, all placemarker preprocessing tokens are removed. Then, the
resulting preprocessing token sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for more macro names to replace.
2 If the name of the macro being replaced is found during this scan of the replacement list
(not including the rest of the source file’s preprocessing tokens), it is not replaced.
Furthermore, if any nested replacements encounter the name of the macro being replaced,
it is not replaced. These nonreplaced macro name preprocessing tokens are no longer
available for further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced.
3 The resulting completely macro-replaced preprocessing token sequence is not processed
as a preprocessing directive even if it resembles one, but all pragma unary operator
expressions within it are then processed as specified in 6.10.9 below.
#defines do very basic string replacement:
printf("%d\n", F(G));
goes to
printf("%d\n", G(1));
which goes to:
printf("%d\n", 1+1);
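The preprocessor makes one pass over the source, rescanning each replacement as it goes, but you are thinking that it makes one pass per #define. So on the printf line it first matches and replaces F(G), where no G(...) invocation is present yet; the G(1) that results is then expanded when that replacement is rescanned.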