Macro replacement list rescanning for replacement

I'm reading the Standard (N1570) on macro replacement and I don't understand some of the wording in 6.10.3.4.
1 After all parameters in the replacement list have been substituted
and # and ## processing has taken place, all placemarker preprocessing
tokens are removed. The resulting preprocessing token sequence is then
rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace.
So after all # and ## are resolved, we rescan the replacement list. But paragraph 2 specifies:
2 If the name of the macro being replaced is found during this scan of
the replacement list (not including the rest of the source file’s
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced.
This looks contradictory to me. So what kind of replacement is possible in that rescan? I tried the following example:
#define FOOBAR(a, b) printf(#a #b)
#define INVOKE(a, b) a##b(a, b)
int main() {
INVOKE(FOO, BAR); //expands to printf("FOO" "BAR")
}
So INVOKE(FOO, BAR) expands to FOOBAR(FOO, BAR) after ## is processed. Then the replacement list FOOBAR(FOO, BAR) is rescanned. But paragraph 2 says that if the name of the macro being replaced (FOOBAR) is found (and it is, since it is defined above), it is not replaced; yet it actually is replaced, as the demo shows.
Can you please clarify that wording? What did I miss?

The (original) macro being replaced is not FOOBAR, it's INVOKE. When you're expanding INVOKE and you find FOOBAR, you expand FOOBAR normally. However, if INVOKE had been found when expanding INVOKE, it would no longer be expanded.
Let's take the following code:
#define FOOBAR(a, b) printf(#a #b)
#define INVOKE(a, b) e1 a##b(a, b)
int main() {
INVOKE(INV, OKE);
}
I added the e1 to the expansion of INVOKE to be able to visualise how many expansions happen. The result of preprocessing main is:
e1 INVOKE(INV, OKE);
This proves that INVOKE was expanded once and then, upon rescanning, not expanded again.
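The same result can be observed from inside a program by stringifying the fully expanded text. This is only a sketch: the helper macros STR and XSTR and the function names are hypothetical and do not come from the question, and the comparison ignores blanks because the exact spacing of stringified expansions is best not relied upon.

```c
/* Hypothetical helper: compares two strings while ignoring blanks. */
int same_ignoring_spaces(const char *lhs, const char *rhs)
{
    for (;;) {
        while (*lhs == ' ') lhs++;
        while (*rhs == ' ') rhs++;
        if (*lhs != *rhs) return 0;
        if (*lhs == '\0') return 1;
        lhs++;
        rhs++;
    }
}

/* Hypothetical helpers: XSTR expands its argument first, then STR
   turns the resulting tokens into a string literal. */
#define STR(...)  #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* Macro from the answer above. */
#define INVOKE(a, b) e1 a##b(a, b)

/* Returns the text that INVOKE(INV, OKE) expands to. The pasted name
   INVOKE is found while rescanning INVOKE's own replacement list, so
   it is left alone and only one e1 appears. */
const char *expanded_invoke(void)
{
    return XSTR(INVOKE(INV, OKE));
}
```

Calling expanded_invoke() should give the text e1 INVOKE(INV, OKE), up to spacing.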

Consider the following simple example:
#include<stdio.h>
const int FOO = 42;
#define FOO (42 + FOO)
int main()
{
printf("%d", FOO);
}
Here the output will be 84.
After preprocessing, the printf call becomes:
printf("%d", (42 + FOO));
This means that when the macro FOO is expanded, the expansion stops when the second FOO is found; it is not expanded further. The remaining FOO is left for the compiler, which resolves it to the const int variable, giving 42 + 42. Otherwise, you would end up with endless recursion resulting in: 42 + (42 + (42 + (42 + ....)
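A minimal sketch of the same idea, wrapped in a function so the value can be checked; the function name foo_value is made up for illustration.

```c
static const int FOO = 42;   /* an ordinary variable named FOO */
#define FOO (42 + FOO)       /* a macro with the same name, defined afterwards */

/* The preprocessor rewrites the return statement to
   return (42 + FOO);
   and stops there: the inner FOO is not expanded again, so the
   compiler sees the variable FOO. */
int foo_value(void)
{
    return FOO;
}
```

Here foo_value() returns 84, because the macro is applied exactly once and the leftover FOO names the variable.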

Related

Confusion about C macro expansion in enum

I see below code snippet in fwts code base:
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)
#define FWTS_ASSERT(e, m) \
enum { FWTS_CONCAT_EXPAND(FWTS_ASSERT_ ## m ## _in_line_, __LINE__) = 1 / !!(e) }
#define FWTS_REGISTER_FEATURES(name, ops, priority, flags, features) \
/* Ensure name is not too long */ \
FWTS_ASSERT(FWTS_ARRAY_LEN(name) < 16, \
fwts_register_name_too_long);
My questions are:
For the definition of FWTS_ASSERT(e, m), I know the !! can convert whatever value into 1 or 0. But doesn't it cause error for FWTS_ASSERT() when !!(e) evaluates to 0 thus leads to 1/0 ?
And btw, the FWTS_CONCAT_EXPAND(a,b) and FWTS_CONCAT(a, b) seem to be duplicated, why do we need 2 of them?
ADD 1
Based on @Klas Lindbäck's answer, I want to go through the macro expansion with a concrete example.
Suppose I have:
#define M_1 abc
#define M_2 123
Then I guess the expansion process of FWTS_CONCAT_EXPAND(M_1,M_2) should be:
FWTS_CONCAT_EXPAND(M_1,M_2)
->
FWTS_CONCAT(abc, 123)
->
abc123
If I apply FWTS_CONCAT(M_1, M_2) directly, will it be expanded like this?
FWTS_CONCAT(M_1, M_2)
->
M_1M_2
->
Bang! M_1M_2 is an invalid symbol!
(Please correct me if I am wrong...)
ADD 2
Tried with gcc -E macroTest.c -o macroTest.i:
(macroTest.c)
#define M_1 abc
#define M_2 123
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)
FWTS_CONCAT_EXPAND(M_1, M_2)
FWTS_CONCAT(M_1,M_2)
(macroTest.i)
# 1 "macroTest.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "macroTest.c"
abc123
M_1M_2
I think I get the point of the macro expansion rule. Below are some related concepts and quotation:
Argument Prescan:
Macro arguments are completely macro-expanded before they are
substituted into a macro body, unless they are stringified or pasted
with other tokens. After substitution, the entire macro body,
including the substituted arguments, is scanned again for macros to be
expanded. The result is that the arguments are scanned twice to expand
macro calls in them.
Stringification
When a macro parameter is used with a leading ‘#’, the preprocessor
replaces it with the literal text of the actual argument, converted to
a string constant.
Token Pasting / Token Concatenation:
It is often useful to merge two tokens into one while expanding
macros. This is called token pasting or token concatenation. The ‘##’
preprocessing operator performs token pasting. When a macro is
expanded, the two tokens on either side of each ‘##’ operator are
combined into a single token, which then replaces the ‘##’ and the two
original tokens in the macro expansion.
So the detailed process of my scenario is like this:
FWTS_CONCAT_EXPAND(M_1, M_2)
-> FWTS_CONCAT_EXPAND(abc, 123) // M_1, M_2 pre-expanded since FWTS_CONCAT_EXPAND has no ##.
-> FWTS_CONCAT(abc, 123) // FWTS_CONCAT_EXPAND expanded into FWTS_CONCAT
-> abc123 // FWTS_CONCAT expanded
FWTS_CONCAT(M_1,M_2)
-> M_1M_2 //M_1, M_2 are not pre-expanded because of the ## in FWTS_CONCAT
-> DEADEND

The macros are used for compile time checking. This is useful when you write code that will be compiled and run on many different platforms and where some platforms may not be compatible.
If the first parameter to FWTS_ASSERT evaluates to non-zero (true) then !!(e) will evaluate to 1 and the enum will be created with the name FWTS_ASSERT_<second parameter>_in_line_<line>. I suspect that the enum is never actually used.
If the first parameter to FWTS_ASSERT evaluates to 0 (= false) then the compiler will try to compute 1/0 and generate a compile error where it will hopefully tell which enum member caused the error, in this case FWTS_ASSERT_fwts_register_name_too_long_in_line_4.
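The enum trick can be tried in isolation. The sketch below uses hypothetical names (COMPILE_ASSERT, CONCAT, CONCAT_EXPAND, enumerator_value_when_true) that merely mirror the shape of the fwts macros; it is not the fwts code itself.

```c
#define CONCAT(a, b) a ## b
#define CONCAT_EXPAND(a, b) CONCAT(a, b)

/* Declares an anonymous enum whose single enumerator is named after
   the message m and the current line. Its value is 1 / 1 when e is
   true and 1 / 0 when e is false. */
#define COMPILE_ASSERT(e, m) \
    enum { CONCAT_EXPAND(COMPILE_ASSERT_ ## m ## _in_line_, __LINE__) = 1 / !!(e) }

/* True condition: the enumerator gets the value 1 and nothing else happens. */
COMPILE_ASSERT(sizeof(char) == 1, char_is_one_byte);

/* False condition: 1 / 0 is not a valid constant expression, so a
   compiler should reject this line if it is un-commented.
COMPILE_ASSERT(sizeof(char) == 2, char_is_two_bytes);
*/

/* The value the enumerator receives for a true condition. */
int enumerator_value_when_true(void)
{
    return 1 / !!(sizeof(char) == 1);
}
```

The enumerator itself is never used; it exists only so that the division has to be evaluated at compile time.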
And btw, the FWTS_CONCAT_EXPAND(a,b) and FWTS_CONCAT(a, b) seem to be duplicated, why do we need 2 of them?
FWTS_CONCAT_EXPAND is done in 2 steps because we want to first expand any macros in the parameters and then perform the concatenation. Doing it in two steps makes the preprocessor do macro expansion of the arguments before it does the token concatenation.
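The difference between the one-step and two-step versions can be checked by letting each one produce a variable name. This is a sketch with shortened, hypothetical macro names (CONCAT and CONCAT_EXPAND standing in for the FWTS_ versions) and made-up accessor functions.

```c
#define M_1 abc
#define M_2 123
#define CONCAT(a, b) a ## b
#define CONCAT_EXPAND(a, b) CONCAT(a, b)

/* Arguments are expanded before CONCAT sees them: declares abc123. */
int CONCAT_EXPAND(M_1, M_2) = 1;

/* Arguments next to ## are pasted unexpanded: declares M_1M_2. */
int CONCAT(M_1, M_2) = 2;

int read_abc123(void) { return abc123; }
int read_M_1M_2(void) { return M_1M_2; }
```

Both declarations compile, which shows that M_1M_2 is a perfectly valid identifier; it is simply not the name that was intended.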

How to understand rescanning the replacement token sequence for more defined identifiers in case of # or ## in macros?

From The C Programming Language by K&R, about the operators # and ## in macro definitions:
Two special operators influence the replacement process. First,
if an occurrence of a parameter in the replacement token sequence
is immediately preceded by #, string quotes (") are placed around
the corresponding parameter, and then both the # and the
parameter identifier are replaced by the quoted argument. A \
character is inserted before each " or \ character that appears
surrounding, or inside, a string literal or character constant
in the argument.
Second, if the definition token sequence for either kind of macro
contains a ## operator, then just after replacement of the
parameters, each ## is deleted, together with any white space on
either side, so as to concatenate the adjacent tokens and
form a new token. The effect is undefined if invalid tokens are
produced, or if the result depends on the order of processing of the ## operators. Also, ## may not appear at the beginning or end of a replacement token sequence.
In both kinds of macro, the replacement token sequence is
repeatedly rescanned for more defined identifiers. However, once a
given identifier has been replaced in a given expansion, it is not
replaced if it turns up again during rescanning; instead it is left
unchanged.
I am having trouble understanding the last paragraph, especially the sentences in bold.
Could you rephrase it, and/or give some examples? Thanks.

Consider the snippet:
#define A B + C
#define B 1
#define C 2
int k = A;
In this case the first pass will replace A:
int k = B + C;
The second pass will replace B and C
int k = 1 + 2;
Now consider another snippet:
#define A B + C
#define B A
#define C A
int k = A;
Now the first pass will expand A once, as before:
int k = B + C;
The second will replace B and C as before:
int k = A + A;
But here it will stop, as A was already expanded before in the first pass.
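The second snippet can be checked by stringifying the expansion of A. The helpers STR, XSTR and same_ignoring_spaces are hypothetical names introduced only for this sketch.

```c
/* Hypothetical helper: compares two strings while ignoring blanks. */
int same_ignoring_spaces(const char *lhs, const char *rhs)
{
    for (;;) {
        while (*lhs == ' ') lhs++;
        while (*rhs == ' ') rhs++;
        if (*lhs != *rhs) return 0;
        if (*lhs == '\0') return 1;
        lhs++;
        rhs++;
    }
}

/* Hypothetical helpers: expand the argument, then stringify it. */
#define STR(...)  #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

#define A B + C
#define B A
#define C A

/* A becomes B + C, then B and C each become A. Those two A tokens
   appear while A is still being replaced, so they are left as they are. */
const char *expanded_a(void)
{
    return XSTR(A);
}

/* Keep the short macro names from leaking into later code. */
#undef A
#undef B
#undef C
```

expanded_a() should yield the text A + A, up to spacing.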

To rephrase the first emphasized sentence: after the preprocessor replaces a macro name with its replacement, it scans the result again and replaces any further macro names it finds there.
But if a macro's replacement contains the macro's own name, that name is not replaced again and is left for the compiler to process. This means you can't define recursive macros like this:
#define recursion(a) ((a) > 0 ? recursion((a) - 1) : (a))
If you then write:
printf("%d\n", recursion(3));
Then the expansion would contain a call to recursion((3) - 1), and the compiler will treat it as a call to a nonexistent function.

Behavior of ## operator in nested call

I was reading a book on C programming language where I found:
#define cat(x,y) x##y
#define xcat(x,y) cat(x,y)
calling cat(cat(1,2),3) produces error whereas calling xcat(xcat(1,2),3) produces expected result 123.
How are both working differently ?

Macros whose replacement lists depend on ## usually can't be called in nested fashion.
cat(cat(1,2),3) is not expanded in a normal fashion, with cat(1,2) yielding 12 and then cat(12, 3) yielding 123.
Macro parameters that are preceded or followed by ## in a replacement list aren't expanded at the time of substitution.
6.10.3.1 Argument substitution
1 After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been
expanded. Before being substituted, each argument’s preprocessing tokens are
completely macro replaced as if they formed the rest of the preprocessing file; no other
preprocessing tokens are available.
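As a result, the inner cat(1,2) is not expanded before pasting, and cat(cat(1,2),3) becomes cat(1,2)3: the ## tries to paste the closing ) with 3, which does not form a valid preprocessing token (GCC reports an error for this). The leftover cat is not expanded on rescanning either, because it is the name of the macro being replaced.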
In case
#define xcat(x,y) cat(x,y)
writing xcat(xcat(1,2),3) will work. As the preprocessor expands the outer call of xcat, it will expand xcat(1,2) as well; the difference is that xcat's replacement list does not contain ## anymore.
xcat(xcat(1,2),3) ==> cat(12, 3) ==> 12##3 ==> 123
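A small sketch that checks the working case; the function names are made up for illustration, and the failing cat(cat(1,2),3) call is deliberately left out because it does not compile.

```c
#define cat(x, y)  x##y
#define xcat(x, y) cat(x, y)

/* The inner xcat(1, 2) is an ordinary argument of xcat, so it is fully
   expanded to 12 before substitution; cat(12, 3) then pastes 12 and 3. */
int nested_xcat(void)
{
    return xcat(xcat(1, 2), 3);
}

/* A single, non-nested use of cat works as expected. */
int single_cat(void)
{
    return cat(1, 2);
}
```

nested_xcat() returns the integer 123 and single_cat() returns 12.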

How does the C preprocessor handle circular dependencies?

I want to know how the C preprocessor handles circular dependencies (of #defines). This is my program:
#define ONE TWO
#define TWO THREE
#define THREE ONE
int main()
{
int ONE, TWO, THREE;
ONE = 1;
TWO = 2;
THREE = 3;
printf ("ONE, TWO, THREE = %d, %d, %d \n",ONE, TWO, THREE);
}
Here is the preprocessor output. I'm unable to figure out why the output is as such. I would like to know the various steps a preprocessor takes in this case to give the following output.
# 1 "check_macro.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "check_macro.c"
int main()
{
int ONE, TWO, THREE;
ONE = 1;
TWO = 2;
THREE = 3;
printf ("ONE, TWO, THREE = %d, %d, %d \n",ONE, TWO, THREE);
}
I'm running this program on linux 3.2.0-49-generic-pae and compiling in gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5).

While a preprocessor macro is being expanded, that macro's name is not expanded. So all three of your symbols are defined as themselves:
ONE -> TWO -> THREE -> ONE (not expanded because expansion of ONE is in progress)
TWO -> THREE -> ONE -> TWO ( " TWO " )
THREE -> ONE -> TWO -> THREE ( " THREE " )
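The three chains above can be verified with a sketch like the following, which packs the three variables into one number so the result is easy to check; the function name circular_digits is hypothetical.

```c
#define ONE TWO
#define TWO THREE
#define THREE ONE

/* Each name expands around the cycle and ends up as itself, so the
   declaration below really declares ONE, TWO and THREE. */
int circular_digits(void)
{
    int ONE = 1, TWO = 2, THREE = 3;
    return ONE * 100 + TWO * 10 + THREE;
}

/* Keep the macros from leaking into later code. */
#undef ONE
#undef TWO
#undef THREE
```

circular_digits() returns 123: every identifier still refers to its own variable.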
This behaviour is set by §6.10.3.4 of the C standard (section number from the C11 draft, although as far as I know, the wording and numbering of the section is unchanged since C89). When a macro name is encountered, it is replaced with its definition (and # and ## preprocessor operators are dealt with, as well as parameters to function-like macros). Then the result is rescanned for more macros (in the context of the rest of the file):
2/ If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced…
The clause goes on to say that any token which is not replaced because of a recursive call is effectively "frozen": it will never be replaced:
… These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
The situation to which the last sentence refers rarely comes up in practice, but here is the simplest case I could think of:
#define two one,two
#define a(x) b(x)
#define b(x,y) x,y
a(two)
The result is one, two. two is expanded to one,two during the replacement of a, and the expanded two is marked as completely expanded. Subsequently, b(one,two) is expanded. This is no longer in the context of the replacement of two, but the two which is the second argument of b has been frozen, so it is not expanded again.
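The frozen token can be observed by stringifying the result. In this sketch STR, XSTR and same_ignoring_spaces are hypothetical helpers; the variadic form is used because the expansion contains a comma.

```c
/* Hypothetical helper: compares two strings while ignoring blanks. */
int same_ignoring_spaces(const char *lhs, const char *rhs)
{
    for (;;) {
        while (*lhs == ' ') lhs++;
        while (*rhs == ' ') rhs++;
        if (*lhs != *rhs) return 0;
        if (*lhs == '\0') return 1;
        lhs++;
        rhs++;
    }
}

/* Hypothetical helpers: expand the argument, then stringify it. */
#define STR(...)  #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* Macros from the answer above. */
#define two one,two
#define a(x) b(x)
#define b(x,y) x,y

/* If the inner two were expanded again when b is replaced, the text
   would contain one twice. */
const char *expanded_a_two(void)
{
    return XSTR(a(two));
}

/* Keep the short macro names from leaking into later code. */
#undef two
#undef a
#undef b
```

expanded_a_two() should yield the text one,two, up to spacing.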

Your question is answered by publication ISO/IEC 9899:TC2 section 6.10.3.4 "Rescanning and further replacement", paragraph 2, which I quote here for your convenience; in the future, please consider reading the specification when you have a question about the specification.
If the name of the macro being replaced is found during this scan of the replacement list
(not including the rest of the source file’s preprocessing tokens), it is not replaced.
Furthermore, if any nested replacements encounter the name of the macro being replaced,
it is not replaced. These nonreplaced macro name preprocessing tokens are no longer
available for further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced.

https://gcc.gnu.org/onlinedocs/cpp/Self-Referential-Macros.html#Self-Referential-Macros answers the question about self referential macros.
The crux of the answer is that when the preprocessor finds a self-referential macro, it expands the macro once but leaves the self-reference inside the expansion unexpanded.
I suspect, the same logic is used to prevent expansion of circularly defined macros. Otherwise, the preprocessor will be in an infinite expansion.

In your example you do the macro processing before defining variables of the same name, so regardless of what the result of the macro processing is, you always print 1, 2, 3!
Here is an example where the variables are defined first:
#include <stdio.h>
int main()
{
int A = 1, B = 2, C = 3;
#define A B
#define B C
//#define C A
printf("%d\n", A);
printf("%d\n", B);
printf("%d\n", C);
}
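This prints 3 3 3. Somewhat insidiously, un-commenting #define C A closes the cycle, so every name expands back to itself and the output becomes 1 2 3; in particular it changes the behaviour of the line printf("%d\n", B);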
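Both variants can be compared side by side. The sketch below wraps each one in a function (the names without_cycle and with_cycle are made up) and packs the three values into one number.

```c
/* Variant with #define C A left out: A -> B -> C and B -> C. */
int without_cycle(void)
{
    int A = 1, B = 2, C = 3;
#define A B
#define B C
    return A * 100 + B * 10 + C;
}
#undef A
#undef B

/* Variant with #define C A enabled: each name expands around the
   cycle and stops when it meets itself again. */
int with_cycle(void)
{
    int A = 1, B = 2, C = 3;
#define A B
#define B C
#define C A
    return A * 100 + B * 10 + C;
}
#undef A
#undef B
#undef C
```

without_cycle() returns 333 and with_cycle() returns 123, matching the outputs 3 3 3 and 1 2 3.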

Here's a nice demonstration of the behavior described in rici's and Eric Lippert's answers, i.e. that a macro name is not re-expanded if it is encountered again while already expanding the same macro.
Content of test.c:
#define ONE 1, TWO
#define TWO 2, THREE
#define THREE 3, ONE
int foo[] = {
ONE,
TWO,
THREE
};
Output of gcc -E test.c (excluding initial # 1 ... lines):
int foo[] = {
1, 2, 3, ONE,
2, 3, 1, TWO,
3, 1, 2, THREE
};

C Preprocessor treating an identifier as object-like instead of function-like

This is a very simplified version of some code I just ran into at work:
#include <stdio.h>
#define F(G) G(1)
#define G(x) x+1
int main() {
printf("%d\n", F(G));
}
prints 2.
Now, I can see that F(G) expands to G(1) and then G(1) expands to 1+1, which is 2, but it's not clear to me why. I would have expected to get an error that G is not a function from the printf line.
How does the pre-processor parse code like this?

A function-like macro is only invoked if its name is followed by a (.
In F(G), G is not followed by a (, so the G there is not a macro invocation.
In #define F(G) G(1), G is a macro parameter and thus is not macro-replaced directly (this is a very confusing macro you've got :-O). In G(1), G is replaced by the argument corresponding to the parameter G, which also happens to be G. That replacement is then rescanned and G(1) is expanded to 1+1.
If we rewrite your macros so that you aren't using G in multiple different ways, it's far easier to understand:
#define F(x) x(1)
#define G(x) x + 1
Here, F(G) is replaced by G(1). This is then rescanned, and the invocation of G is evaluated, yielding 1 + 1.
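A minimal sketch of the rewritten macros in use; the function name f_of_g is hypothetical.

```c
#define F(x) x(1)
#define G(x) x + 1

/* F(G): the argument G is not followed by (, so it is not a macro call
   and is substituted as is, giving G(1). On rescanning, G(1) is a
   macro call and becomes 1 + 1. */
int f_of_g(void)
{
    return F(G);
}

/* Keep the single-letter macros from leaking into later code. */
#undef F
#undef G
```

f_of_g() returns 2, the same value the original program prints.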

Expanding on James McNellis' answer, the C99 standard prescribes:
6.10.3.4 Rescanning and further replacement
1 After all parameters in the replacement list have been substituted and # and ##
processing has taken place, all placemarker preprocessing tokens are removed. Then, the
resulting preprocessing token sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for more macro names to replace.
2 If the name of the macro being replaced is found during this scan of the replacement list
(not including the rest of the source file’s preprocessing tokens), it is not replaced.
Furthermore, if any nested replacements encounter the name of the macro being replaced,
it is not replaced. These nonreplaced macro name preprocessing tokens are no longer
available for further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced.
3 The resulting completely macro-replaced preprocessing token sequence is not processed
as a preprocessing directive even if it resembles one, but all pragma unary operator
expressions within it are then processed as specified in 6.10.9 below.

#defines do very basic textual replacement, token by token:
printf("%d\n", F(G));
goes to
printf("%d\n", G(1));
which goes to:
printf("%d\n", 1+1);

The preprocessor makes one pass, but you are thinking that it makes one pass per #define. So during its pass the preprocessor matches and replaces F(G) but doesn't match any G(x).
