How does the C preprocessor handle circular dependencies?

How does the C preprocessor handle circular dependencies? - c

I want to know how the C preprocessor handles circular dependencies (of #defines). This is my program:
#define ONE TWO
#define TWO THREE
#define THREE ONE
int main()
{
int ONE, TWO, THREE;
ONE = 1;
TWO = 2;
THREE = 3;
printf ("ONE, TWO, THREE = %d, %d, %d \n",ONE, TWO, THREE);
}
Here is the preprocessor output. I'm unable to figure out why the output is as such. I would like to know the various steps a preprocessor takes in this case to give the following output.
# 1 "check_macro.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "check_macro.c"
int main()
{
int ONE, TWO, THREE;
ONE = 1;
TWO = 2;
THREE = 3;
printf ("ONE, TWO, THREE = %d, %d, %d \n",ONE, TWO, THREE);
}
I'm running this program on linux 3.2.0-49-generic-pae and compiling in gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5).

While a preprocessor macro is being expanded, that macro's name is not expanded. So all three of your symbols are defined as themselves:
ONE -> TWO -> THREE -> ONE (not expanded because expansion of ONE is in progress)
TWO -> THREE -> ONE -> TWO ( " TWO " )
THREE -> ONE -> TWO -> THREE ( " THREE " )
This behaviour is set by §6.10.3.4 of the C standard (section number from the C11 draft, although as far as I know, the wording and numbering of the section is unchanged since C89). When a macro name is encountered, it is replaced with its definition (and # and ## preprocessor operators are dealt with, as well as parameters to function-like macros). Then the result is rescanned for more macros (in the context of the rest of the file):
2/ If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced…
The clause goes on to say that any token which is not replaced because of a recursive call is effectively "frozen": it will never be replaced:
… These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
The situation which the last sentence refers rarely comes up in practice, but here is the simplest case I could think of:
#define two one,two
#define a(x) b(x)
#define b(x,y) x,y
a(two)
The result is one, two. two is expanded to one,two during the replacement of a, and the expanded two is marked as completely expanded. Subsequently, b(one,two) is expanded. This is no longer in the context of the replacement of two, but the two which is the second argument of b has been frozen, so it is not expanded again.

Your question is answered by publication ISO/IEC 9899:TC2 section 6.10.3.4 "Rescanning and further replacement", paragraph 2, which I quote here for your convenience; in the future, please consider reading the specificaftion when you have a question about the specification.
If the name of the macro being replaced is found during this scan of the replacement list
(not including the rest of the source file’s preprocessing tokens), it is not replaced.
Furthermore, if any nested replacements encounter the name of the macro being replaced,
it is not replaced. These nonreplaced macro name preprocessing tokens are no longer
available for further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced.

https://gcc.gnu.org/onlinedocs/cpp/Self-Referential-Macros.html#Self-Referential-Macros answers the question about self referential macros.
The crux of the answer is that when the pre-processor finds self referential macros, it doesn't expand them at all.
I suspect, the same logic is used to prevent expansion of circularly defined macros. Otherwise, the preprocessor will be in an infinite expansion.

In your example you do the macro processing before defining
variables of the same name, so regardless of what the result
of the macro processing is, you always print 1, 2, 3!
Here is an example where the variables are defined first:
#include <stdio.h>
int main()
{
int A = 1, B = 2, C = 3;
#define A B
#define B C
//#define C A
printf("%d\n", A);
printf("%d\n", B);
printf("%d\n", C);
}
This prints 3 3 3. Somewhat insidiously, un-commenting #define C A changes the behaviour of the line printf("%d\n", B);

Here's a nice demonstration of the behavior described in rici's and Eric Lippert's answers, i.e. that a macro name is not re-expanded if it is encountered again while already expanding the same macro.
Content of test.c:
#define ONE 1, TWO
#define TWO 2, THREE
#define THREE 3, ONE
int foo[] = {
ONE,
TWO,
THREE
};
Output of gcc -E test.c (excluding initial # 1 ... lines):
int foo[] = {
1, 2, 3, ONE,
2, 3, 1, TWO,
3, 1, 2, THREE
};
(I would post this as a comment, but including substantial code blocks in comments is kind of awkward, so I'm making this a Community Wiki answer instead. If you feel it would be better included as part of an existing answer, feel free to copy it and ask me to delete this CW version.)

Related

Macro replacement list rescanning for replacement

I'm reading the Standard N1570 about macro replacement and misunderstand some wording from 6.10.3.4.
1 After all parameters in the replacement list have been substituted
and # and ## processing has taken place, all placemarker preprocessing
tokens are removed. The resulting preprocessing token sequence is then
rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace
So after all # and ## are resolved we rescan the replacement list. But the section 2 specifies:
2 If the name of the macro being replaced is found during this scan of
the replacement list (not including the rest of the source file’s
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced.
It looks contradictory to me. So what kind of replacement possible in that rescan? I tried the following example:
#define FOOBAR(a, b) printf(#a #b)
#define INVOKE(a, b) a##b(a, b)
int main() {
INVOKE(FOO, BAR); //expands to printf("FOO" "BAR")
}
So INVOKE(FOO, BAR) expands to FOOBAR(FOO, BAR) after substitution of ##. Then the replacement list FOOBAR(FOO, BAR) is rescanned. But the section 2. specifies that the name of the macro being replaced (FOOBAR) is found (yes, defined above) it is not replaced (but actually replaced as can be seen in th demo).
Can you please clarify that wording? What did I miss?
LIVE DEMO

The (original) macro being replaced is not FOOBAR, it's INVOKE. When you're expanding INVOKE and you find FOOBAR, you expand FOOBAR normally. However, if INVOKE had been found when expanding INVOKE, it would no longer be expanded.
Let's take the following code:
#define FOOBAR(a, b) printf(#a #b)
#define INVOKE(a, b) e1 a##b(a, b)
int main() {
INVOKE(INV, OKE);
}
I added the e1 to the expansion of INVOKE to be able to visualise how many expansions happen. The result of preprocessing main is:
e1 INVOKE(INV, OKE);
This proves that INVOKE was expanded once and then, upon rescanning, not expanded again.
[Live example]

Consider the following simple example:
#include<stdio.h>
const int FOO = 42;
#define FOO (42 + FOO)
int main()
{
printf("%d", FOO);
}
Here the output will be 84.
The printf will be expanded to:
printf("%d", 42 + 42);
This means that when the macro FOO is expanded, the expansion will stop when the second FOO is found. It will not be further expanded. Otherwise, you will end up with endless recursion resulting in: 42 + (42 + (42 + (42 + ....)
Live demo here.

Why doesn’t the preprocessor cause two adjacent minus signs to be a decrement? [duplicate]

This question already has answers here:
strange result on macro expansion
(4 answers)
Closed 5 years ago.
Consider the following code:
#include <stdio.h>
#define A -B
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A);
return 0;
}
Here preprocessing should take place in the following manner:
first A should get replaced with -B
then B should get replaced with -C thus expression resulting into --C
then C should get replaced with 5 thus expression resulting into --5
So the resultant expression should give a compilation error( lvalue error ).
But the correct answer is 5, how can the output be 5?
Please help me in this.

It preprocesses to (note the space):
int main()
{
printf("The value of A is %d\n", - -5);
return 0;
}
The preprocessor pastes tokens, not strings. It won't create -- out of two adjacent - tokens unless you force token concatenation with ##:
#define CAT_(A,B) A##B
#define CAT(A,B) CAT_(A,B)
#define A CAT(-,B)
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A); /* A is --5 here—no space */
return 0;
}

Although the C preprocessor often feels like it’s literally doing a search and replace on the code, the preprocessor actually works a bit differently.
Before the preprocessor runs, the source file is split into preprocessing tokens, which are individual units of text. For example, a single minus sign is treated not as a character, but as a token consisting of a minus sign, and a double minus sign is treated as a token consisting of two minus sign.
The C preprocessor kicks in and replaces each macro not with the literal text of the macro replacement, but rather with the series of preprocessor tokens in that replacement. In this case, the preprocessor replaces A with a minus followed by B, then replaces B with a minus followed by C, then replaces C with 5. The effect here is that there are two unary minuses applied to the 5, rather than a decrement operator, even though a literal search and replace would have generated a decrement operator that produces a syntax error.
This is interesting in that there’s no way you can write two consecutive minus signs in source code and have it interpreted as two unary minuses. This only works because by the time the preprocessor splices everything together, it already knows it’s looking at two unary minuses. The resulting C code isn’t then rescanned to be tokenized a second time around.
Now the legalese: section §5.1.1.2/7 says that after macro substitution is done, each preprocessing token - and here there are two of them (the two minus signs) - are converted into actual tokens, and then they’re syntactically and semantically analyzed. That means that there’s no opportunity for the compiler to rescan those tokens to reinterpret them as a single token. So this is a weird case where the resulting token stream can’t actually be typed into the source code without changing the meaning.

Think of the resultant expression as this instead:
-(-(5))

strange result on macro expansion

Consider the following code snippet
#include<stdio.h>
#define A -B
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A);
return 0;
}
Output
The value of A is 5
But this shouldn't compile at all because after expansion it should look something like printf("The value of A is %d\n", --5); and then it should give compile error saying lvalue required. Isn't it ?

Pass it the -E option (Ex: gcc -E a.c). This will output preprocessed source code.
int main()
{
printf("The value of A is %d\n", - -5);
return 0;
}
So it will introduce a space between - and -5 hence it will be not considered as an decrement operator --, so printf will print 5.
GCC Documentation On Token Spacing provides the Information on Why There is an Extra Space Produced:
First, consider an issue that only concerns the stand-alone preprocessor: there needs to be a guarantee that re-reading its preprocessed output results in an identical token stream. Without taking special measures, this might not be the case because of macro substitution. For example:
#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)
==> + + - - + + = = =
not
==> ++ -- ++ ===
One solution would be to simply insert a space between all adjacent tokens. However, we would like to keep space insertion to a minimum, both for aesthetic reasons and because it causes problems for people who still try to abuse the preprocessor for things like Fortran source and Makefiles.
For now, just notice that when tokens are added (or removed, as shown by the EMPTY example) from the original lexed token stream, we need to check for accidental token pasting. We call this paste avoidance. Token addition and removal can only occur because of macro expansion, but accidental pasting can occur in many places: both before and after each macro replacement, each argument replacement, and additionally each token created by the # and ## operators.

I do not think so. Even macro expansion is text processing, it is impossible to create a token from across macro boundaries. Therefore it as -(-5), not --5, because -- is a single token.

The preprocessor introduces a space in-between the expansion of B and C:
#define A -B
#define B -C
#define C 5
A
with output (generated via cpp < test.c)
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 329 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
- -5

In C language the program source code is split into so called preprocessing tokens at a very early stage of translation (phase 3), before macro substitution takes place (phase 4). Later (at phase 7) preprocessing tokens are converted into regular tokens which are fed into syntactic and semantic analyzer of the compiler proper (see "5.1.1.2 Translation phases" in the language specification).
Phase 3 is the stage when the preprocessing tokens for future C language operators and other lexical elements are formed (identifiers, numbers, punctuators, string literals etc.) Multi-character punctuators like --, >>= and so on are formed at that early stage. In order to eventually obtain a token for -- operator at phase 7 you need to have that -- early as a complete punctuator at phase 3. No additional punctuator concatenation occurs when transitioning from preprocessing tokens to regular tokens at phase 7, which means that two adjacent - punctuators detected at phase 3 will NOT become a single token -- at phase 7. The compiler proper will never have a chance to see these two adjacent - and a single token --.
In other words, in C you cannot use preprocessor to concatenate things by placing them next to each other. This is why preprocessor has dedicated features like ## to facilitate concatenation. And ## is what you have to use to perform concatenation of two tokens into a single token.
BTW, it is not correct to explain this behavior by claiming that preprocessor will place a space character between your - characters. Nothing like that is present in the language specification. What really happens is that in the internal structures of the compiler your - tokens forever remain as two separate tokens. How preprocessor and compiler achieve that is their internal implementation detail. In implementations with loosely coupled preprocessor and compiler proper (e.g. completely independent modules that communicate through an intermediate textual representation) injecting a space between adjacent punctuators is defintely a natural way to implement the required separation of tokens.

C Preprocessor treating an identifier as object-like instead of function-like

This is a very simplified version of some code I just ran into at work:
#include <stdio.h>
#define F(G) G(1)
#define G(x) x+1
int main() {
printf("%d\n", F(G));
}
prints 2.
Now, I can see that F(G) expands to G(1) and then G(1) expands to 2, but its not clear to me why. I would have expected to get an error that G is not a function from the printf line.
How does the pre-processor parse code like this?

A function-like macro is only invoked if its name is followed by a (.
In F(G), G is not followed by a (, so the G there is not a macro invocation.
In F(G) G(1), G is a macro parameter and thus is not macro-replaced directly (this is a very confusing macro you've got :-O). In G(1), G is replaced by the argument corresponding to the parameter G, which also happens to be G. That replacement is then rescanned and G(1) is evaluated to 1 + 1.
If we rewrite your macros so that you aren't using G in multiple different ways, it's far easier to understand:
#define F(x) x(1)
#define G(x) x + 1
Here, F(G) is replaced by G(1). This is then rescanned, and the invocation of G is evaluated, yielding 1 + 1.

Expanding on James McNellis' answer, the C99 standard prescribes:
6.10.3.4 Rescanning and further replacement
1 After all parameters in the replacement list have been substituted and # and ##
processing has taken place, all placemarker preprocessing tokens are removed. Then, the
resulting preprocessing token sequence is rescanned, along with all subsequent
preprocessing tokens of the source ﬁle, for more macro names to replace.
2 If the name of the macro being replaced is found during this scan of the replacement list
(not including the rest of the source ﬁle’s preprocessing tokens), it is not replaced.
Furthermore, if any nested replacements encounter the name of the macro being replaced,
it is not replaced. These nonreplaced macro name preprocessing tokens are no longer
available for further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced.
3 The resulting completely macro-replaced preprocessing token sequence is not processed
as a preprocessing directive even if it resembles one, but all pragma unary operator
expressions within it are then processed as speciﬁed in 6.10.9 below.

#defines do very basic string replacement:
printf("%d\n", F(G));
goes to
printf("%d\n", G(1));
which goes to:
printf("%d\n", 1+1);

The preprocessor makes one pass, but you are thinking that it makes one pass per #define. So during its pass the preprocessor matches and replaces F(G) but doesn't match any G(x).

Compound literals and function-like macros: bug in gcc or the C standard?

In C99, we have compound literals, and they can be passed to functions as in:
f((int[2]){ 1, 2 });
However, if f is not a function but rather a function-like macro, gcc barfs on this due to the preprocessor parsing it not as one argument but as two arguments, "(int[2]){ 1" and "2 }".
Is this a bug in gcc or in the C standard? If it's the latter, that pretty much rules out all transparent use of function-like macros, which seems like a huge defect...
Edit: As an example, one would expect the following to be a conforming program fragment:
fgetc((FILE *[2]){ f1, f2 }[i]);
But since fgetc could be implemented as a macro (albeit being required to protect its argument and not evaluate it more than once), this code would actually be incorrect. That seems surprising to me.

This "bug" has existed in the standard since C89:
#include <stdio.h>
void function(int a) {
printf("%d\n", a);
}
#define macro(a) do { printf("%d\n", a); } while (0)
int main() {
function(1 ? 1, 2: 3); /* comma operator */
macro(1 ? 1, 2: 3); /* macro argument separator - invalid code */
return 0;
}
I haven't actually looked through the standard to check this parse, I've taken gcc's word for it, but informally the need for a matching : to each ? trumps both operator precedence and argument list syntax to make the first statement work. No such luck with the second.

This is per the C Standard, similar to how in C++, the following is a problem:
f(ClassTemplate<X, Y>) // f gets two arguments: 'ClassTemplate<X' and 'Y>'
If it is legal to add some extra parentheses there in C99, you can use:
f(((int[2]){ 1, 2 }));
^ ^
The rule specifying this behavior, from C99 §6.10.3/11, is as follows:
The sequence of preprocessing tokens bounded by the outside-most matching parentheses
forms the list of arguments for the function-like macro.
The individual arguments within the list are separated by comma preprocessing tokens, but comma preprocessing tokens between matching inner parentheses do not separate arguments.

To the extent that it's a bug at all, it's with the standard, not gcc (i.e., in this respect I believe gcc is doing what the standard requires).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight