Is there anything in the C Standard preventing me from doing the following?
// main.c
#define DECORATE(x) ***x***
#include "call_macro.h"
this is the text I want decorated)
// call_macro.h
DECORATE(
When running it through gcc -E main.c, I expected to get
*** this is the text I want decorated***
Instead, it complained about macro_call.h:2: error: unterminated argument list invoking macro "DECORATE", but I can't actually find any prohibition of it in the standard.
Thoughts?
5.1.1.2 Translation phases
(1.4) Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed... A #include preprocessing directive causes the named header or source file to be processed from phase 1
through phase 4, recursively. All preprocessing directives are then deleted.
I believe this says that each included header is preprocessed separately, before being "merged" into the overall translation unit. It is at this point that an incomplete function-like macro invocation would be ill-formed:
6.10.3/4 ... There shall exist a ) preprocessing token that terminates the invocation.
This ...
#define DECORATE(x) ***x***
... is a complete definition of function-like macro DECORATE (C17 6.10.3/10). Its scope is the remainder of the translation unit (there being no corresponding #undef; C17 6.10.3.5/1) including source and header files #included into that portion of the translation unit.
For it to be possible for the text of an invocation of that macro to start in an included file and complete in the main file, the model for #include would need to be similar to that for macro expansion: the whole text of the included file being inserted and then processed in the context of the surrounding preprocessing tokens. But that is not the model.
Paragraph 5.1.1.2/1 describes the logical phases for translating C source code. The relevant one here is phase 4 (emphasis added):
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a
character sequence that matches the syntax of a universal character
name is produced by token concatenation (6.10.3.3), the behavior is
undefined. A #include preprocessing directive causes the named
header or source file to be processed from phase 1 through phase 4,
recursively. All preprocessing directives are then deleted.
That #includeing a file causes that file to be processed from phase 1 through phase 4 means, among other things, that any function-like macro invocation that starts in that file must be complete within that file. Preprocessing tokens following the #include directive are not relevant to the preprocessing of the included file.
And paragraph 6.10.3/10 says (in part):
Each subsequent instance of the function-like macro name followed by a
( as the next preprocessing token introduces the sequence of
preprocessing tokens that is replaced by the replacement list in the
definition (an invocation of the macro).
That does not leave room to interpret the
DECORATE(
in macro_call.h as anything other than the beginning of an invocation of DECORATE(), yet that cannot be processed because
The replaced sequence of preprocessing tokens is terminated by the matching ) preprocessing token
and in fact, paragraph 6.10.3/4 requires there to be such a token,
but no such preprocessing token appears in the file.
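For contrast, here is a minimal one-file sketch of what is allowed: an invocation may span several lines, as long as the matching ) appears in the same file. The numeric replacement list is only a hypothetical stand-in for ***x***, chosen so that the result compiles as C.

```c
/* Hypothetical numeric stand-in for the ***x*** decoration. */
#define DECORATE(x) (1000 + (x))

int decorated(void)
{
    /* Within an invocation, new-lines count as ordinary white space,
       so the closing ) may sit on a later line of the SAME file. */
    return DECORATE(
        20 + 3
    );
}
```

Moving the argument and the closing ) into a different file from the DECORATE( is exactly what the included-file model rules out.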
Related
Consider the following code snippet:
#define FOO() BAR
#define BAR() FOO
FOO()()()
The C standard tells us that after argument substitution etc., the preprocessing tokens resulting from a macro invocation are re-scanned for further macro names, ignoring the name of the macro that generated them (C99 6.10.3.4p1-2).
Thus, I'd expect the preprocessor to turn the snippet into BAR()(), then FOO(), and then stop, because the token FOO is a result of the macro FOO, and isn't recognized as a macro name.
But both GCC and clang give me the result BAR, indicating that it is, in fact, expanding one more time. This makes sense only if the invocation of the macro "happens" at the argument list - where the macro name FOO is no longer ignored - and not at the macro name itself. This is very unintuitive, and I find no mention of it in the standard. What am I missing?
Thanks in advance!
Here is the relevant passage from the C Standard:
6.10.3.4 Rescanning and further replacement
1 After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
2 If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
3 The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it resembles one, but all _Pragma unary operator expressions within it are then processed as specified in 6.10.9 below.
If for example you had written
#define QQ() QQ
QQ()()()
The expansion would be just QQ()() because, as per paragraph 2, when QQ is found during the scan of the replacement list, it is not expanded.
Conversely, in your example, FOO is not found in the replacement list of FOO(). BAR is followed by (), which causes it to be expanded, and in turn BAR is not found in the replacement list of BAR(). The resulting FOO, followed by the last set of (), is then expanded again.
The phrase "if any nested replacements encounter the name of the macro being replaced, it is not replaced" refers to replacements occurring during expansion of macro arguments. In your example, the replacements occur iteratively, not recursively, hence an extra set of () will cause further expansion.
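Both cases can be checked from inside a program by stringifying the fully expanded result. This is only a sketch: STR, STR_, alternating and self_named are hypothetical helper names, and the behaviour described is what the question reports for GCC and Clang.

```c
#define STR_(x) #x          /* stringify the argument as written */
#define STR(x)  STR_(x)     /* expand the argument first, then stringify */

#define FOO() BAR
#define BAR() FOO
#define QQ()  QQ

/* FOO()()() -> BAR()() -> FOO() -> BAR: each step consumes one () pair. */
const char *alternating(void) { return STR(FOO()()()); }

/* QQ() -> QQ, and that QQ is never replaced again; the rest stays as is. */
const char *self_named(void)  { return STR(QQ()()()); }
```

On those compilers alternating() should yield the text BAR, while self_named() yields a string that starts with QQ and still carries the unexpanded parentheses.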
The C preprocessor implements Prosser's blue paint algorithm.
At the moment a macro name is expanded, that name is painted blue, and a blue name is not expanded again.
To fully understand how CPP works you must google for "blue paint" and read...
Recently I encountered the following issue. My implementation was looking like this:
#define MY_CODE_VERSION PROJ_VERSION
#include "project.h"
#if (3 != MY_CODE_VERSION)
PROJ_VERSION was defined in project.h. Why didn't I get a compilation warning or error? After all, I was defining MY_CODE_VERSION in terms of a macro that was not yet known when the compiler reached the line #define MY_CODE_VERSION PROJ_VERSION.
I looked over the translation phases in ANSI C, but I can't figure out the reason (the actual behaviour of the compiler, i.e. at which phase MY_CODE_VERSION takes the value of PROJ_VERSION).
My assumption is that this replacement takes place only at the line "#if (3 != MY_CODE_VERSION)", and by this time PROJ_VERSION is already known to the compiler from the inclusion of project.h above.
Thank you in advance
I'll not hash out what you already know. What you apparently did not know:
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing
tokens are removed. Then, the resulting preprocessing token sequence
is rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced. These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing token
would otherwise have been replaced.
The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it
resembles one, but all _Pragma unary operator expressions within it are
then processed as specified in 6.10.9 below.
In short, once a macro has been expanded and all stringification (#) and concatenation (##) operations have been performed, the resulting token sequence is scanned once again for more macros to replace. If the name of the macro being expanded is found, it is not replaced.
So what you're seeing is standard-defined.
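A small sketch of that late binding, assuming a plain #define of PROJ_VERSION stands in for the contents of project.h (version_now and version_later are hypothetical names): the replacement list is stored as tokens and only expanded where MY_CODE_VERSION is used.

```c
#define MY_CODE_VERSION PROJ_VERSION  /* PROJ_VERSION not defined yet: fine */
#define PROJ_VERSION 3                /* stands in for #include "project.h" */

#if 3 != MY_CODE_VERSION              /* MY_CODE_VERSION -> PROJ_VERSION -> 3 */
#error "version mismatch"
#endif

int version_now(void) { return MY_CODE_VERSION; }    /* returns 3 */

/* Redefining PROJ_VERSION changes what MY_CODE_VERSION expands to later. */
#undef PROJ_VERSION
#define PROJ_VERSION 4

int version_later(void) { return MY_CODE_VERSION; }  /* returns 4 */
```

Nothing is evaluated at the #define line itself, which is why no diagnostic appears there.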
I would like to have the expansion of these C preprocessor lines:
#define _POUND_ #define
_POUND_ _FALSE 0
_FALSE
expand so the last line (i.e. _FALSE) expands to 0. I understand recursive CPP isn't possible directly but that it can be done. Unfortunately, I'm not fully sure I follow the logic presented in this link.
I think I need to force an additional evaluation but I don't know how to do that in this case (i.e. I have tried and failed).
Can you help?
As indicated several times over in comments, what you are looking for is not supported. Here's what the standard has to say about it:
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character.
(C2011, 6.10/2; emphasis added)
Translation phase 4 is the one in which preprocessing directives are executed, so it follows that macro expansion during phase 4 cannot cause bona fide preprocessing directives to be created. Macros can be expanded to text that has the form of a preprocessing directive, but such text cannot actually be a directive.
It is true that the text resulting from a macro expansion is re-scanned for more macros to expand, but that process does not involve recognizing preprocessing directives that were not already there.
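To see this from inside a program, one can stringify the expansion and check that no macro was defined along the way. This is just a sketch: POUND_DEFINE, MY_FALSE, STR and STR_ are hypothetical names (the original _POUND_ and _FALSE are avoided because identifiers starting with an underscore and an uppercase letter are reserved).

```c
#define STR_(x) #x
#define STR(x)  STR_(x)               /* expand first, then stringify */

#define POUND_DEFINE #define          /* # is an ordinary token here */

/* The expansion has the shape of a directive, but it is only text. */
const char *looks_like_directive(void)
{
    return STR(POUND_DEFINE MY_FALSE 0);
}

/* MY_FALSE was never defined, because no directive was executed. */
int my_false_is_defined(void)
{
#ifdef MY_FALSE
    return 1;
#else
    return 0;
#endif
}
```

I would expect looks_like_directive() to return text of the form #define MY_FALSE 0, while my_false_is_defined() returns 0.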
Let's say I have two files, a.h:
#if 1
#include "b.h"
and b.h:
#endif
Both gcc's and clang's preprocessors reject a.h:
$ cpp -ansi -pedantic a.h >/dev/null
In file included from a.h:2:0:
b.h:1:2: error: #endif without #if
#endif
^
a.h:1:0: error: unterminated #if
#if 1
^
However, the C standard (N1570 6.10.2p3) says:
A preprocessing directive of the form
# include "q-char-sequence" new-line
causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters.
which appears to permit the construct above.
Are gcc and clang not compliant in rejecting my code?
The C standard defines 8 translation phases. A source file is processed by each of the 8 phases in sequence (or in an equivalent manner).
Phase 4, as defined in N1570 section 5.1.1.2, is:
Preprocessing directives are executed, macro invocations are expanded,
and _Pragma unary operator expressions are executed. If a
character sequence that matches the syntax of a universal character
name is produced by token concatenation (6.10.3.3), the behavior is
undefined. A #include preprocessing directive causes the named
header or source file to be processed from phase 1 through phase 4,
recursively. All preprocessing directives are then deleted.
The relevant sentence here is:
A #include preprocessing directive causes the named
header or source file to be processed from phase 1 through phase 4,
recursively.
which implies that each included source file is preprocessed by itself. This precludes having a #if in one file and the corresponding #endif in another.
(As "A wild elephant" mentioned in comments, and as rodrigo's answer says, the grammar in section 6.10 also says that an if-section, which starts with a #if (or #ifdef or #ifndef) line and ends with a #endif line, can only appear as part of a preprocessing-file.)
I think the compilers are right, or at best the standard is ambiguous.
The trick is not in how #include is implemented, but in the order in which preprocessing is done.
Look at the grammar rules in section 6.10 of the C99 standard:
preprocessing-file:
group[opt]
group:
group-part
group group-part
group-part:
if-section
control-line
text-line
# non-directive
if-section:
if-group elif-groups[opt] else-group[opt] endif-line
if-group:
# if constant-expression new-line group[opt]
...
control-line:
# include pp-tokens new-line
...
As you can see, the #include stuff is nested inside the group, and group is the thing inside the #if / #endif.
For example, in a well-formed file such as:
#if 1
#include <a.h>
#endif
That will parse as #if 1, plus a group, plus #endif. And the inside group has an #include.
But in your example:
#if 1
#include <a.h>
The rule if-section does not apply to this source, so the group productions are not even checked.
You could probably argue that the standard is ambiguous, because it does not specify when the replacement of the #include directive happens, and that a conforming implementation could shift a lot of grammar rules and replace the #include before failing for not finding the #endif. But these ambiguities are impossible to avoid if the side effects of the syntax modify the text you are parsing. Isn't C wonderful?
Thinking of a C preprocessor as a very simple compiler, to translate a file a C preprocessor conceptually carries out a few phases.
Lexical analysis – Groups the sequence of characters making up the preprocessing translation unit into strings having an identified meaning (tokens) in the preprocessor language.
Syntactic analysis – Groups the tokens of the preprocessing translation unit into syntactic structures built according to the preprocessing language grammar.
Code generation – Translates all files making up the preprocessing translation unit into a single file containing 'pure' C instructions only.
Strictly speaking, the translation phases mentioned in §5.1.1.2 of the C Standard (ISO/IEC 9899:201x) relating to preprocessing are phase 3 and phase 4. Phase 3 corresponds almost exactly to lexical analysis while phase 4 is about code generation.
Syntactic analysis (parsing) seems to be missing from that picture. Indeed, the C preprocessor grammar is so simple that real preprocessors/compilers perform it along with lexical analysis.
If the syntactic analysis phase ends successfully – i.e. all statements in the preprocessing translation unit are legal according to the preprocessor grammar – code generation can take place and all preprocessing directives are executed.
Executing a preprocessing directive means transforming the source file according to its semantics and then removing the directive from the source file.
The semantics for each preprocessor directive is specified in §6.10.1-6.10.9 of the C Standard.
Getting back to your sample program, the 2 files you provided, i.e. a.h and b.h, are conceptually processed as follows.
Lexical Analysis - Each individual preprocessing token is delimited by a '{' on the left and a '}' on the right.
a.h
{#}{if} {1}
{#}{include} {"b.h"}
b.h
{#}{endif}
This phase is performed without errors and its result, the sequence of preprocessing tokens, is passed to the subsequent phase: syntactic analysis.
Syntactic Analysis
A tentative derivation for a.h is given below
preprocessing-file →
group →
group-part →
if-section →
if-group endif-line →
if-group #endif new-line →
…
and it is clear that the contents of a.h cannot be derived from the preprocessing grammar – in fact the terminating #endif is missing – and therefore a.h is not syntactically correct. This is exactly what your compiler is telling you when it writes
a.h:1:0: error: unterminated #if
Something similar happens for b.h; reasoning backwards, the #endif can only be derived from the rule
if-section →
if-group elif-groups[opt] else-group[opt] endif-line
This means the file contents should be derived from one of the following 3 groups
# if constant-expression new-line group[opt]
# ifdef identifier new-line group[opt]
# ifndef identifier new-line group[opt]
Since that is not the case (b.h contains no # if/# ifdef/# ifndef, only the single #endif line), the contents of b.h are not syntactically correct either, and your compiler tells you so this way
In file included from a.h:2:0:
b.h:1:2: error: #endif without #if
Code Generation
Of course, since your program is lexically sound but syntactically not correct, this phase never gets performed.
#if / #ifdef / #ifndef
#elif
#else
#endif
must be matched within one file.
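A minimal sketch of the well-formed layout, assuming a hypothetical switch HAVE_B and a hypothetical function b_enabled: the whole if-section, from #if to #endif, sits in one file, and an included header would contribute only ordinary declarations.

```c
#define HAVE_B 1    /* hypothetical configuration switch */

#if HAVE_B
/* An #include "b.h" would go here; b.h itself holds no stray #endif. */
int b_enabled(void) { return 1; }
#else
int b_enabled(void) { return 0; }
#endif  /* closes the #if opened in this same file */
```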
Is it possible to simulate a one-line comment (//) using a preprocessor macro (or magic)? For example, can this compile with gcc -std=c99?
#define LINE_COMMENT() ???
int main() {
LINE_COMMENT() asd(*&##)($*?><?><":}{)(#
return 0;
}
No. Here is an extract from the standard showing the phases of translation of a C program:
The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
As you can see, comments are removed before macros are expanded, so a macro cannot expand into a comment.
You can obviously define a macro that takes an argument and expands to nothing, but it's slightly more restrictive than a comment, as its argument must consist only of valid preprocessor token characters (e.g. no # or unmatched quotes). Not very useful for general commenting purposes.
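A sketch of that argument-swallowing macro, assuming C99 variadic macros so that commas in the text are accepted too (COMMENT and after_comment are hypothetical names, and the variadic form is my choice rather than something the answer prescribes):

```c
#define COMMENT(...)   /* expands to nothing, discarding its arguments */

int after_comment(void)
{
    /* The text must still be valid preprocessing tokens with balanced
       parentheses; a stray quote or unmatched ) would break it. */
    COMMENT(plain words, numbers like 42, and (balanced) parentheses are fine)
    return 7;
}
```

Unlike a real // comment, this stops at the matching ) rather than at the end of the line.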
No. Comments are removed in an early translation phase, before macros are expanded. You can do selective compilation (without regard to comments) with #if directives, as in:
#if 0
... // this stuff will not be compiled
...
#endif // up to here.
That's all the magic you can do with the limited macro preprocessor available in C/C++.
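A compilable sketch of the #if 0 approach (active_value is a hypothetical name). The skipped group is still split into preprocessing tokens, so it is safest to avoid stray quote characters in it, but it does not have to be valid C.

```c
int active_value(void)
{
    int v = 1;
#if 0
    v = 2 this line is not valid C but the compiler proper never sees it
#endif
    return v;   /* still 1: the skipped group was discarded */
}
```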