Take, for example, the following:
#define FOO
FOO #define BAR 1
BAR
What should, according to each of the ANSI C and C99 standards, be the preprocessed output of the above code?
It seems to me that this should be evaluated to 1; however, running the above example through both gcc -E and clang -E produces the following:
#define BAR 1
BAR
The draft standard "ISO/IEC 9899:201x Committee Draft — April 12, 2011 N1570" section 6.10 actually contains an example of this:
EXAMPLE In:
#define EMPTY
EMPTY # include <file.h>
the sequence of preprocessing tokens on the second line is not a preprocessing directive, because it does not begin with a # at the start of translation phase 4, even though it will do so after the macro EMPTY has been replaced.
It tells us that "... the second line is not a preprocessing directive ..."
So for your code
FOO #define BAR 1
is not a preprocessing directive meaning that only FOO will be replaced and BAR will not be defined. Consequently the output of the preprocessor is:
#define BAR 1
BAR
Your code is not valid
ISO/IEC 9899:2011, Section 6.10 Preprocessing directives:
A preprocessing directive consists of a sequence of preprocessing
tokens that satisfies the following constraints: The first token in
the sequence is a # preprocessing token that (at the start of
translation phase 4) is either the first character in the source file
(optionally after white space containing no new-line characters) or
that follows white space containing at least one new-line character.
This example actually occurs in the Standard (C17 6.10/8):
EXAMPLE In:
#define EMPTY
EMPTY # include <file.h>
the sequence of preprocessing tokens on the second line is not a preprocessing directive, because it does not begin with a # at the start of translation phase 4, even though it will do so after the macro EMPTY has been replaced.
So the output you see from gcc -E is correct. (Note: the amount of whitespace here is not significant , at that stage of translation the program has been translated to a sequence of preprocessing tokens; the different amounts of whitespace in the output is just an artefact of how gcc -E works).
Related
Were there widely used pre-ANSI C compilers† that required the # to be on the first column?
† I would accept any compiler on this list. If I can find mention of it in the comp.lang.c Usenet newsgroup in a post dated before 1995, I would accept it.
K&R C did not specify whether whitespace was permitted before the #. From the original The C Programming Language, §12¶1 of the "C Reference Manual" in Appendix A:
The C compiler contains a preprocessor capable of macro substitution, conditional compilation, and inclusion of named files. Lines beginning with # communicate with this preprocessor.
Thus, whether or not whitespace was permitted to precede the # was unspecified. This would mean a pre-ANSI compiler could fail to compile a program if the directive did not begin on the first column.
In ISO C (and in ANSI C before that), the C preprocessing directives were explicitly permitted to be prefixed with whitespace. In ANSI C (C-89):
A preprocessing directive consists of a sequence of preprocessing
tokens that begins with a # preprocessing token that is either the
first character in the source file (optionally after white space
containing no new-line characters) or that follows white space
containing at least one new-line character, and is ended by the next
new-line character.
ISO C.2011 has similar language, but is clarified even further:
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the
following constraints: The first token in the sequence is a # preprocessing token that (at
the start of translation phase 4) is either the first character in the source file (optionally
after white space containing no new-line characters) or that follows white space
containing at least one new-line character. The last token in the sequence is the first newline
character that follows the first token in the sequence.165) A new-line character ends
the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.
165) Thus, preprocessing directives are commonly called ‘‘lines’’. These ‘‘lines’’ have no other syntactic
significance, as all white space is equivalent except in certain situations during preprocessing (see the
# character string literal creation operator in 6.10.3.2, for example).
Short answer: Yes.
I remember writing things like
#if foo
/* ... */
#else
#if bar
/* ... */
#else
#error "neither foo nor bar specified"
#endif
#endif
so that the various pre-ANSI compilers that I once used wouldn't complain about "unrecognized preprocessor directive '#error'". This would have been with Ritchie's original cc for the pdp11, or pcc (the "portable C compiler" which, IIRC, was the basis for the Vax cc of the 80's or so). Both of those compilers -- more accurately, the preprocessor used with both of those compilers -- definitely required the # to be in the first column. (Actually, although those compilers were very different, they might both have used different variants of basically the same preprocessor, which was always a separate program in those days.)
I would like to have the expansion of these C preprocessor lines:
#define _POUND_ #define
_POUND_ _FALSE 0
_FALSE
expand so the last line (i.e. _FALSE) expands to 0. I understand recursive CPP isn't possible directly but that it can be done. Unfortunately, I'm not fully sure I follow the logic presented in this link.
I think I need to force an additional evaluation but I don't know how to do that in this case (i.e. I have tried and failed).
Can you help?
As indicated several times over in comments, what you are looking for is not supported. Here's what the standard has to say about it:
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character.
(C2011, 6.10/2; emphasis added)
Translation phase 4 is the one in which preprocessing directives are executed, so it follows that macro expansion during phase 4 cannot cause bona fide preprocessing directives to be created. Macros can be expanded to text that has the form of a preprocessing directive, but such text cannot actually be a directive.
It is true that the text resulting from a macro expansion is re-scanned for more macros to expand, but that process does not involve recognizing preprocessing directives that were not already there.
Is it possible to simulate a one-line comment (//) using a preprocessor macro (or magic)? For example, can this compile with gcc -std=c99?
#define LINE_COMMENT() ???
int main() {
LINE_COMMENT() asd(*&##)($*?><?><":}{)(#
return 0;
}
No. Here is an extract from the standard showing the phases of translation of a C program:
The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
As you can see, comments are removed before macros are expanded, so a macro cannot expand into a comment.
You can obviously define a macro that takes an argument and expands to nothing, but it's slightly more restrictive than a comment, as its argument must consist only of valid preprocessor token characters (e.g. no # or unmatched quotes). Not very useful for general commenting purposes.
No. Comments are processed at preprocessor phase. You can do selective compilation (without regard to comments) with #if directives, as in:
#if 0
... // this stuff will not be compiled
...
#endif // up to here.
that's all the magic you can do with the limited macro preprocessor available in C/C++.
Given this piece of C code:
char s[] =
"start"
#ifdef BLAH
"mid"
#endif
"end";
what should the output of the preprocessor be? In other words, what should the actual compiler receive and be able to handle? To narrow the possibilities, let's stick to C99.
I'm seeing that some preprocessors output this:
#line 1 "tst00.c"
char s[] =
"start"
#line 9
"end";
or this:
# 1 "tst00.c"
char s[] =
"start"
# 7 "tst00.c"
"end";
gcc -E outputs this:
# 1 "tst00.c"
# 1 "<command-line>"
# 1 "tst00.c"
char s[] =
"start"
"end";
And gcc is perfectly fine compiling all of the above preprocessed code even with the -fpreprocessed option, meaning that no further preprocessing should be done as all of it has been done already.
The confusion stems from this wording of the 1999 C standard:
5.1.1.2 Translation phases
1 The precedence among the syntax rules of translation is specified by the following
phases.
...
4. Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. ... All preprocessing directives are
then deleted.
...
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are syntactically
and semantically analyzed and translated as a translation unit.
In other words, is it legal for the #line directive to appear between adjacent string literals? If it is, it means that the actual compiler must do another round of string literal concatenation, but that's not mentioned in the standard.
Or are we simply dealing with non-standard compiler implementations, gcc included?
The #line or # 1 lines you get from GCC -E (or a compatible tool) are added for the sake of human readers and any tools that might attempt to work with a text form of the output of the preprocessor. They are just for convenience.
In general, yes, directives may appear between concatenated string literal tokens. #line is no different from #ifdef in your example.
Or are we simply dealing with non-standard compiler implementations, gcc included?
-E and -fpreprocessed modes are not standardized. A standard preprocessor always feeds its output into a compiler, not a text file. Moreover:
The output of the preprocessor has no standard textual representation.
The reason for inserting #line directives is so that any __LINE__ and __FILE__ macros that you might insert into the already-preprocessed file, before preprocessing it again, will expand correctly. Perhaps, when compiling such a file, the compiler may notice and use the values when reporting errors. Usage of "preprocessed text files" is nonstandard and generally discouraged.
Given:
#error /*
*/ foo
Microsoft C++ outputs an error message of /* and GCC outputs foo.
Which is correct?
GCC is correct.
Replacement of comments (including line-breaks) happens in translation phase 3, pre-processing in translation phase 4 (ISO/IEC 9899:1999, §5.1.1.2).
Hence, the preprocessing part of the compiler does not "see" the line-breaks anymore.
And, #error is defined like this (§6.10.5):
A preprocessing directive of the form
# error pp-tokens_opt new-line
causes the implementation to produce a diagnostic message that includes the specified
sequence of preprocessing tokens.
So, the foo has to be part of the output.
GCC is correct because it should be replaced by a single space / * ... * / in the standard.