strange result on macro expansion


Consider the following code snippet
#include<stdio.h>
#define A -B
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A);
return 0;
}
Output
The value of A is 5
But this shouldn't compile at all, because after expansion it should look something like printf("The value of A is %d\n", --5); and then give a compile error saying that an lvalue is required. Shouldn't it?

Pass it the -E option (Ex: gcc -E a.c). This will output preprocessed source code.
int main()
{
printf("The value of A is %d\n", - -5);
return 0;
}
So the preprocessed output has a space between - and -5. The two minus signs are therefore not read as the decrement operator --, and printf prints 5.
The GCC documentation on token spacing explains why the extra space is produced:
First, consider an issue that only concerns the stand-alone preprocessor: there needs to be a guarantee that re-reading its preprocessed output results in an identical token stream. Without taking special measures, this might not be the case because of macro substitution. For example:
#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)
==> + + - - + + = = =
not
==> ++ -- ++ ===
One solution would be to simply insert a space between all adjacent tokens. However, we would like to keep space insertion to a minimum, both for aesthetic reasons and because it causes problems for people who still try to abuse the preprocessor for things like Fortran source and Makefiles.
For now, just notice that when tokens are added (or removed, as shown by the EMPTY example) from the original lexed token stream, we need to check for accidental token pasting. We call this paste avoidance. Token addition and removal can only occur because of macro expansion, but accidental pasting can occur in many places: both before and after each macro replacement, each argument replacement, and additionally each token created by the # and ## operators.

I do not think so. Even though macro expansion looks like text processing, it is impossible to create a token across macro boundaries. Therefore the result is -(-5), not --5, because -- is a single token.

The preprocessor introduces a space in-between the expansion of B and C:
#define A -B
#define B -C
#define C 5
A
with output (generated via cpp < test.c)
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 329 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
- -5

In C language the program source code is split into so called preprocessing tokens at a very early stage of translation (phase 3), before macro substitution takes place (phase 4). Later (at phase 7) preprocessing tokens are converted into regular tokens which are fed into syntactic and semantic analyzer of the compiler proper (see "5.1.1.2 Translation phases" in the language specification).
Phase 3 is the stage when the preprocessing tokens for future C language operators and other lexical elements are formed (identifiers, numbers, punctuators, string literals etc.) Multi-character punctuators like --, >>= and so on are formed at that early stage. In order to eventually obtain a token for -- operator at phase 7 you need to have that -- early as a complete punctuator at phase 3. No additional punctuator concatenation occurs when transitioning from preprocessing tokens to regular tokens at phase 7, which means that two adjacent - punctuators detected at phase 3 will NOT become a single token -- at phase 7. The compiler proper will never have a chance to see these two adjacent - and a single token --.
In other words, in C you cannot use preprocessor to concatenate things by placing them next to each other. This is why preprocessor has dedicated features like ## to facilitate concatenation. And ## is what you have to use to perform concatenation of two tokens into a single token.
BTW, it is not correct to explain this behavior by claiming that the preprocessor places a space character between your - characters. Nothing like that is present in the language specification. What really happens is that in the internal structures of the compiler your - tokens forever remain two separate tokens. How the preprocessor and compiler achieve that is an internal implementation detail. In implementations with a loosely coupled preprocessor and compiler proper (e.g. completely independent modules that communicate through an intermediate textual representation), injecting a space between adjacent punctuators is definitely a natural way to implement the required separation of tokens.
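To see the two separate - tokens at work, here is a minimal sketch using the macros from the question. The helper names value_of_A, negate_twice and NEG are made up for this illustration. NEG is there so that the two minus signs sit next to each other with no whitespace in the source, which rules out the "inserted space" explanation at the language level.

```c
#include <assert.h>

#define A -B
#define B -C
#define C 5

/* A expands to the three tokens  -  -  5, which parse as -(-5). */
static int value_of_A(void) { return A; }

/* Hypothetical helper macro: expands to a unary minus applied to x. */
#define NEG(x) -x

/* -NEG(v) yields two adjacent '-' tokens. They stay separate, so v is
   negated twice and never decremented. */
static int negate_twice(int v)
{
    int result = -NEG(v);
    assert(result == v);
    return result;
}

/* Keep the single-letter macros from leaking into later code. */
#undef A
#undef B
#undef C
```

If the two - tokens were merged into --, negate_twice would modify v and return v - 1, and value_of_A would not compile at all.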

Related

Why doesn’t the preprocessor cause two adjacent minus signs to be a decrement? [duplicate]

Consider the following code:
#include <stdio.h>
#define A -B
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A);
return 0;
}
Here preprocessing should take place in the following manner:
first A should get replaced with -B
then B should get replaced with -C thus expression resulting into --C
then C should get replaced with 5 thus expression resulting into --5
So the resultant expression should give a compilation error( lvalue error ).
But the correct answer is 5, how can the output be 5?
Please help me in this.

It preprocesses to (note the space):
int main()
{
printf("The value of A is %d\n", - -5);
return 0;
}
The preprocessor pastes tokens, not strings. It won't create -- out of two adjacent - tokens unless you force token concatenation with ##:
#define CAT_(A,B) A##B
#define CAT(A,B) CAT_(A,B)
#define A CAT(-,B)
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A); /* A is --5 here—no space */
return 0;
}
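For contrast, here is a small sketch where ## is applied to something that can actually be decremented, so the pasted -- compiles. The function name paste_decrement is hypothetical, and the CAT macros are the same two-level pair as above with lowercase parameter names.

```c
#include <assert.h>

#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)

/* CAT(-, -) pastes the two '-' tokens into the single token '--',
   so the statement below is a real pre-decrement of x. */
static int paste_decrement(int x)
{
    CAT(-, -)x;
    return x;
}

static void check_paste_decrement(void)
{
    assert(paste_decrement(5) == 4);
}
```

Only the explicit paste creates the decrement operator. Placing the same two tokens next to each other through ordinary macro replacement never does.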

Although the C preprocessor often feels like it’s literally doing a search and replace on the code, the preprocessor actually works a bit differently.
Before the preprocessor runs, the source file is split into preprocessing tokens, which are individual units of text. For example, a single minus sign is treated not as a character, but as a token consisting of a minus sign, and a double minus sign is treated as a token consisting of two minus signs.
The C preprocessor kicks in and replaces each macro not with the literal text of the macro replacement, but rather with the series of preprocessor tokens in that replacement. In this case, the preprocessor replaces A with a minus followed by B, then replaces B with a minus followed by C, then replaces C with 5. The effect here is that there are two unary minuses applied to the 5, rather than a decrement operator, even though a literal search and replace would have generated a decrement operator that produces a syntax error.
This is interesting in that there’s no way you can write two consecutive minus signs in source code and have it interpreted as two unary minuses. This only works because by the time the preprocessor splices everything together, it already knows it’s looking at two unary minuses. The resulting C code isn’t then rescanned to be tokenized a second time around.
Now the legalese: §5.1.1.2/7 says that after macro substitution is done, each preprocessing token (here there are two of them, the two minus signs) is converted into an actual token, and the resulting tokens are then syntactically and semantically analyzed. That means that there’s no opportunity for the compiler to rescan those tokens and reinterpret them as a single token. So this is a weird case where the resulting token stream can’t actually be typed into the source code without changing the meaning.

Think of the resultant expression as this instead:
-(-(5))

Confusion about C macro expansion in enum

I see below code snippet in fwts code base:
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)
#define FWTS_ASSERT(e, m) \
enum { FWTS_CONCAT_EXPAND(FWTS_ASSERT_ ## m ## _in_line_, __LINE__) = 1 / !!(e) }
#define FWTS_REGISTER_FEATURES(name, ops, priority, flags, features) \
/* Ensure name is not too long */ \
FWTS_ASSERT(FWTS_ARRAY_LEN(name) < 16, \
fwts_register_name_too_long);
My questions are:
For the definition of FWTS_ASSERT(e, m), I know the !! can convert whatever value into 1 or 0. But doesn't it cause error for FWTS_ASSERT() when !!(e) evaluates to 0 thus leads to 1/0 ?
And btw, the FWTS_CONCAT_EXPAND(a,b) and FWTS_CONCAT(a, b) seem to be duplicated, why do we need 2 of them?
ADD 1
Based on @Klas Lindbäck's answer, I want to go through the macro expansion with a concrete example.
Suppose I have:
#define M_1 abc
#define M_2 123
Then I guess the expansion process of FWTS_CONCAT_EXPAND(M_1,M_2) should be:
FWTS_CONCAT_EXPAND(M_1,M_2)
->
FWTS_CONCAT(abc, 123)
->
abc123
If I directly apply FWTS_CONCAT(M_1, M_2), will it be expanded like this?
FWTS_CONCAT(M_1, M_2)
->
M_1M_2
->
Bang! M_1M_2 is an invalid symbol!
(Please correct me if I am wrong...)
ADD 2
Tried with gcc -E macroTest.c -o macroTest.i:
(macroTest.c)
#define M_1 abc
#define M_2 123
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)
FWTS_CONCAT_EXPAND(M_1, M_2)
FWTS_CONCAT(M_1,M_2)
(macroTest.i)
# 1 "macroTest.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "macroTest.c"
abc123
M_1M_2
I think I get the point of the macro expansion rule. Below are some related concepts and quotation:
Argument Prescan:
Macro arguments are completely macro-expanded before they are
substituted into a macro body, unless they are stringified or pasted
with other tokens. After substitution, the entire macro body,
including the substituted arguments, is scanned again for macros to be
expanded. The result is that the arguments are scanned twice to expand
macro calls in them.
Stringification
When a macro parameter is used with a leading ‘#’, the preprocessor
replaces it with the literal text of the actual argument, converted to
a string constant.
Token Pasting / Token Concatenation:
It is often useful to merge two tokens into one while expanding
macros. This is called token pasting or token concatenation. The ‘##’
preprocessing operator performs token pasting. When a macro is
expanded, the two tokens on either side of each ‘##’ operator are
combined into a single token, which then replaces the ‘##’ and the two
original tokens in the macro expansion.
So the detailed process of my scenario is like this:
FWTS_CONCAT_EXPAND(M_1, M_2)
-> FWTS_CONCAT_EXPAND(abc, 123) // M_1, M_2 pre-expanded since FWTS_CONCAT_EXPAND has no ##.
-> FWTS_CONCAT(abc, 123) // FWTS_CONCAT_EXPAND expanded into FWTS_CONCAT
-> abc123 // FWTS_CONCAT expanded
FWTS_CONCAT(M_1,M_2)
-> M_1M_2 //M_1, M_2 are not pre-expanded because of the ## in FWTS_CONCAT
-> DEADEND

The macros are used for compile time checking. This is useful when you write code that will be compiled and run on many different platforms and where some platforms may not be compatible.
If the first parameter to FWTS_ASSERT evaluates to non-zero (true) then !!(e) will evaluate to 1 and the enum will be created with the name FWTS_ASSERT_<second parameter>_in_line_<line>. I suspect that the enum is never actually used.
If the first parameter to FWTS_ASSERT evaluates to 0 (= false) then the compiler will try to compute 1/0 and generate a compiler error where it will hopefully tell which enum member caused the error, in this case FWTS_ASSERT_fwts_register_name_too_long_in_line_4.
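A minimal sketch of the compile-time check in isolation, reusing the three macros from the question. The array demo_name, the message demo_name_too_long and the helper functions are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)
#define FWTS_ASSERT(e, m) \
    enum { FWTS_CONCAT_EXPAND(FWTS_ASSERT_ ## m ## _in_line_, __LINE__) = 1 / !!(e) }

/* Hypothetical name: 5 bytes including the terminating NUL. */
static const char demo_name[] = "klog";

/* Passes: 1 / !!(5 < 16) is 1 / 1, a valid enumerator value. */
FWTS_ASSERT(sizeof(demo_name) < 16, demo_name_too_long);

/* Would be rejected by the compiler: 1 / !!(5 < 4) is 1 / 0 inside a
   constant expression.
FWTS_ASSERT(sizeof(demo_name) < 4, demo_name_too_long);
*/

static size_t demo_name_size(void) { return sizeof(demo_name); }

static void check_demo_name(void)
{
    assert(demo_name_size() == 5);
}
```

The enum constant is never referenced. Its only job is to force the compiler to evaluate 1 / !!(e) as a constant expression, and the generated name tells you which check failed and on which line.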
And btw, the FWTS_CONCAT_EXPAND(a,b) and FWTS_CONCAT(a, b) seem to be duplicated, why do we need 2 of them?
FWTS_CONCAT_EXPAND is done in 2 steps because we want to first expand any macros in the parameters and then perform the concatenation. Doing it in two steps makes the preprocessor do macro expansion of the parameters before it does the token concatenation.
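The difference between the one-step and two-step forms can also be checked from inside a program by turning the result into a string. This is only a sketch: STR and STR_ are hypothetical helpers (the usual expand-then-stringify pair), and M_1 and M_2 are the example macros from the question.

```c
#include <assert.h>
#include <string.h>

#define M_1 abc
#define M_2 123
#define FWTS_CONCAT(a, b) a ## b
#define FWTS_CONCAT_EXPAND(a,b) FWTS_CONCAT(a, b)

/* Hypothetical helpers: STR expands its argument, then STR_ stringifies it. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Two steps: M_1 and M_2 are expanded before ## sees them. */
static const char *two_step(void) { return STR(FWTS_CONCAT_EXPAND(M_1, M_2)); }

/* One step: ## pastes the unexpanded names M_1 and M_2. */
static const char *one_step(void) { return STR(FWTS_CONCAT(M_1, M_2)); }

static void check_concat(void)
{
    assert(strcmp(two_step(), "abc123") == 0);
    assert(strcmp(one_step(), "M_1M_2") == 0);
}
```

STR itself relies on the same trick: it needs the extra level so that its argument is expanded before # turns it into a string.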

C language macro code - #define with 2 '##'

I recently came across this question and could not find supporting document or data in explanation. The question was asked to me and the person was not willing to share the answer.
#define BIT(A) BIT_##A
#define PIN_0 0
"Do we get BIT_0 by using macro BIT(PIN_0)? If no make necessary corrections?"
I dont know the answer to the above question?

The macro
#define BIT(A) BIT_##A
means to create a single token from what would otherwise be two separate tokens. Without using ## (the token concatenation operator), you might be tempted to do one of:
#define BIT(A) BIT_A
#define BIT(A) BIT_ A
The problem with the first is that, because BIT_A is already a single token, no attempt to match the A to the passed argument will succeed, and you'll get the literal expansion BIT_A no matter what you've used as an argument:
BIT(42) -> BIT_A
The problem with the second is that, even though A is a separate token and will therefore be subject to replacement, the final expansion will not be a single token:
BIT(42) -> BIT_ 42
The ## in your macro takes the value specified by A, and appends it to the literal BIT_ forming one token so, for example,
BIT(7) -> BIT_7
BIT(PIN_0) -> BIT_PIN_0, but see below if you want BIT_0
This is covered in C11 6.10.3.3 The ## operator:
... each instance of a ## preprocessing token in the replacement list (not from an argument) is deleted and the preceding preprocessing
token is concatenated with the following preprocessing token.
The resulting token is available for further macro replacement.
Now, if you want a macro that will concatenate together BIT_ and another already-evaluated macro into a single token, you have to use some trickery to get it to do the initial macro substitution before the concatenation.
That's because the standard states that the concatenation is performed before regular macro replacement, which is why this trickery is needed. The problem with what you have:
#define PIN_0 0
#define BIT(A) BIT_##A
is that the ## expansion of BIT(PIN_0) will initially result in the single token BIT_PIN_0. Now, although that's subject to further macro replacement, that single token doesn't actually have a macro replacement, so it's left as is.
To get around this, you have to use levels of indirection to coerce the preprocessor into doing regular macro replacement before ##:
#define CONCAT(x,y) x ## y
#define PIN_0 0
#define BIT(A) CONCAT(BIT_,A)
The macros above expand in these stages (the argument PIN_0 is fully expanded before it is substituted into BIT, because A is not an operand of # or ## there):
BIT(PIN_0)
-> CONCAT(BIT_,0)
-> BIT_0
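Here is a small compilable sketch of the indirection, using the PIN_0 spelling from the question. The enum values for BIT_0 and BIT_1, the extra PIN_1 macro and the function names are assumptions made up for this example.

```c
#include <assert.h>

/* Hypothetical bit masks that the pasted names should resolve to. */
enum { BIT_0 = 0x01, BIT_1 = 0x02 };

#define PIN_0 0
#define PIN_1 1
#define CONCAT(x, y) x ## y
#define BIT(A) CONCAT(BIT_, A)

static unsigned pin_mask(void)
{
    /* BIT(PIN_0) -> CONCAT(BIT_, 0) -> BIT_0, and likewise for PIN_1. */
    return BIT(PIN_0) | BIT(PIN_1);
}

static void check_pin_mask(void)
{
    assert(pin_mask() == 0x03);
}
```

With the direct definition #define BIT(A) BIT_##A, the same function would refer to BIT_PIN_0 and BIT_PIN_1, which are not declared here, so it would not compile.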

How does the C preprocessor handle circular dependencies?

I want to know how the C preprocessor handles circular dependencies (of #defines). This is my program:
#define ONE TWO
#define TWO THREE
#define THREE ONE
int main()
{
int ONE, TWO, THREE;
ONE = 1;
TWO = 2;
THREE = 3;
printf ("ONE, TWO, THREE = %d, %d, %d \n",ONE, TWO, THREE);
}
Here is the preprocessor output. I'm unable to figure out why the output is as such. I would like to know the various steps a preprocessor takes in this case to give the following output.
# 1 "check_macro.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "check_macro.c"
int main()
{
int ONE, TWO, THREE;
ONE = 1;
TWO = 2;
THREE = 3;
printf ("ONE, TWO, THREE = %d, %d, %d \n",ONE, TWO, THREE);
}
I'm running this program on linux 3.2.0-49-generic-pae and compiling in gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5).

While a preprocessor macro is being expanded, that macro's name is not expanded. So all three of your symbols are defined as themselves:
ONE -> TWO -> THREE -> ONE (not expanded because expansion of ONE is in progress)
TWO -> THREE -> ONE -> TWO ( " TWO " )
THREE -> ONE -> TWO -> THREE ( " THREE " )
This behaviour is set by §6.10.3.4 of the C standard (section number from the C11 draft, although as far as I know, the wording and numbering of the section is unchanged since C89). When a macro name is encountered, it is replaced with its definition (and # and ## preprocessor operators are dealt with, as well as parameters to function-like macros). Then the result is rescanned for more macros (in the context of the rest of the file):
2/ If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced…
The clause goes on to say that any token which is not replaced because of a recursive call is effectively "frozen": it will never be replaced:
… These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
The situation to which the last sentence refers rarely comes up in practice, but here is the simplest case I could think of:
#define two one,two
#define a(x) b(x)
#define b(x,y) x,y
a(two)
The result is one, two. two is expanded to one,two during the replacement of a, and the expanded two is marked as completely expanded. Subsequently, b(one,two) is expanded. This is no longer in the context of the replacement of two, but the two which is the second argument of b has been frozen, so it is not expanded again.
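The same case can be turned into a compilable sketch by giving one and two real meanings before two becomes a macro. The enum values, the array name expanded and the helper functions are hypothetical and only serve to make the result observable.

```c
#include <assert.h>

/* Declared before `two` becomes a macro, so these are plain enumerators. */
enum { one = 1, two = 2 };

#define two one,two
#define a(x) b(x)
#define b(x,y) x,y

/* a(two) -> b(one,two) -> one,two ; the inner `two` is frozen and is
   not expanded again, so the initializer has exactly two elements. */
static const int expanded[] = { a(two) };

#undef two
#undef a
#undef b

static int expanded_count(void)
{
    return (int)(sizeof expanded / sizeof expanded[0]);
}

static void check_expanded(void)
{
    assert(expanded_count() == 2);
    assert(expanded[0] == one && expanded[1] == two);
}
```

If the frozen two were expanded again during the replacement of b, the initializer would contain more than two elements.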

Your question is answered by publication ISO/IEC 9899:TC2 section 6.10.3.4 "Rescanning and further replacement", paragraph 2, which I quote here for your convenience; in the future, please consider reading the specification when you have a question about the specification.
If the name of the macro being replaced is found during this scan of the replacement list
(not including the rest of the source file’s preprocessing tokens), it is not replaced.
Furthermore, if any nested replacements encounter the name of the macro being replaced,
it is not replaced. These nonreplaced macro name preprocessing tokens are no longer
available for further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced.

https://gcc.gnu.org/onlinedocs/cpp/Self-Referential-Macros.html#Self-Referential-Macros answers the question about self referential macros.
The crux of the answer is that when the pre-processor finds self referential macros, it doesn't expand them at all.
I suspect, the same logic is used to prevent expansion of circularly defined macros. Otherwise, the preprocessor will be in an infinite expansion.

In your example you do the macro processing before defining
variables of the same name, so regardless of what the result
of the macro processing is, you always print 1, 2, 3!
Here is an example where the variables are defined first:
#include <stdio.h>
int main()
{
int A = 1, B = 2, C = 3;
#define A B
#define B C
//#define C A
printf("%d\n", A);
printf("%d\n", B);
printf("%d\n", C);
}
This prints 3 3 3. Somewhat insidiously, un-commenting #define C A changes the behaviour of the line printf("%d\n", B);
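To make the effect of un-commenting #define C A visible without editing the file, here is a sketch that evaluates the same expression before and after the third #define. The function names and the digit-packing expression are my own choices for this example; the variables are declared at file scope so that they exist before any of the macros do.

```c
#include <assert.h>

/* Declared before any of the macros exist. */
static const int A = 1, B = 2, C = 3;

#define A B
#define B C
/* A -> B -> C and B -> C, so every name ends up as C. */
static int without_cycle(void) { return A * 100 + B * 10 + C; }

#define C A
/* Each chain now stops when it comes back to the name it started from,
   so A, B and C name themselves again. */
static int with_cycle(void) { return A * 100 + B * 10 + C; }

#undef A
#undef B
#undef C

static void check_cycle(void)
{
    assert(without_cycle() == 333);
    assert(with_cycle() == 123);
}
```

So closing the cycle changes not only what C means but also what A and B mean, because the suppressed name is different for each starting point.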

Here's a nice demonstration of the behavior described in rici's and Eric Lippert's answers, i.e. that a macro name is not re-expanded if it is encountered again while already expanding the same macro.
Content of test.c:
#define ONE 1, TWO
#define TWO 2, THREE
#define THREE 3, ONE
int foo[] = {
ONE,
TWO,
THREE
};
Output of gcc -E test.c (excluding initial # 1 ... lines):
int foo[] = {
1, 2, 3, ONE,
2, 3, 1, TWO,
3, 1, 2, THREE
};
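The leftover ONE, TWO and THREE in that output are ordinary identifiers as far as the compiler is concerned. As a sketch, giving them values before the macros are defined lets the array compile and be inspected. The enum values 10, 20 and 30 and the helper names are assumptions made for this illustration.

```c
#include <assert.h>

/* Hypothetical values for the names that are left unexpanded. */
enum { ONE = 10, TWO = 20, THREE = 30 };

#define ONE 1, TWO
#define TWO 2, THREE
#define THREE 3, ONE

/* Each row stops expanding when it reaches the name it started from. */
static const int foo[] = {
    ONE,
    TWO,
    THREE
};

#undef ONE
#undef TWO
#undef THREE

static int foo_count(void) { return (int)(sizeof foo / sizeof foo[0]); }

static void check_foo(void)
{
    assert(foo_count() == 12);
    assert(foo[3] == ONE && foo[7] == TWO && foo[11] == THREE);
}
```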
(I would post this as a comment, but including substantial code blocks in comments is kind of awkward, so I'm making this a Community Wiki answer instead. If you feel it would be better included as part of an existing answer, feel free to copy it and ask me to delete this CW version.)

C Preprocessor treating an identifier as object-like instead of function-like

This is a very simplified version of some code I just ran into at work:
#include <stdio.h>
#define F(G) G(1)
#define G(x) x+1
int main() {
printf("%d\n", F(G));
}
prints 2.
Now, I can see that F(G) expands to G(1) and then G(1) expands to 2, but it's not clear to me why. I would have expected to get an error that G is not a function from the printf line.
How does the pre-processor parse code like this?

A function-like macro is only invoked if its name is followed by a (.
In F(G), G is not followed by a (, so the G there is not a macro invocation.
In the definition F(G) G(1), G is a macro parameter and thus is not macro-replaced directly (this is a very confusing macro you've got :-O). In the replacement list G(1), G is replaced by the argument corresponding to the parameter G, which also happens to be G. That replacement is then rescanned and G(1) is evaluated to 1 + 1.
If we rewrite your macros so that you aren't using G in multiple different ways, it's far easier to understand:
#define F(x) x(1)
#define G(x) x + 1
Here, F(G) is replaced by G(1). This is then rescanned, and the invocation of G is evaluated, yielding 1 + 1.
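Both points can be checked with a short sketch based on the rewritten macros. The function names and the local variable are hypothetical. The second function relies on the rule that G is only a macro invocation when it is followed by (.

```c
#include <assert.h>

#define F(x) x(1)
#define G(x) x + 1

/* F(G) -> G(1) -> 1 + 1 */
static int call_through_f(void) { return F(G); }

static int bare_name(void)
{
    int G = 10;        /* G is not followed by '(', so it is a plain identifier */
    return G + G(1);   /* variable G, then the macro: 10 + 1 + 1 */
}

static void check_function_like(void)
{
    assert(call_through_f() == 2);
    assert(bare_name() == 12);
}
```

Reusing a macro name as a variable like this is only meant to show the rule; it is not something to do in real code.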

Expanding on James McNellis' answer, the C99 standard prescribes:
6.10.3.4 Rescanning and further replacement
1 After all parameters in the replacement list have been substituted and # and ##
processing has taken place, all placemarker preprocessing tokens are removed. Then, the
resulting preprocessing token sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for more macro names to replace.
2 If the name of the macro being replaced is found during this scan of the replacement list
(not including the rest of the source file’s preprocessing tokens), it is not replaced.
Furthermore, if any nested replacements encounter the name of the macro being replaced,
it is not replaced. These nonreplaced macro name preprocessing tokens are no longer
available for further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced.
3 The resulting completely macro-replaced preprocessing token sequence is not processed
as a preprocessing directive even if it resembles one, but all pragma unary operator
expressions within it are then processed as specified in 6.10.9 below.

#defines do very basic string replacement:
printf("%d\n", F(G));
goes to
printf("%d\n", G(1));
which goes to:
printf("%d\n", 1+1);

The preprocessor makes one pass, but you are thinking that it makes one pass per #define. So during its pass the preprocessor matches and replaces F(G) but doesn't match any G(x).