C translation phase 4

Recently I encountered the following issue. My implementation looked like this:
#define MY_CODE_VERSION PROJ_VERSION
#include "project.h"
#if (3 != MY_CODE_VERSION)
PROJ_VERSION was defined in project.h. Why didn't I get a compilation warning or error? I expected one, because I was defining a macro in terms of another macro that was not yet known when the compiler reached the line #define MY_CODE_VERSION PROJ_VERSION.
I looked over the translation phases in ANSI C, but I can't figure out the reason (i.e. the actual behaviour of the compiler: at which phase does MY_CODE_VERSION take the value of PROJ_VERSION?).
My assumption is that this replacement takes place only at the line "#if (3 != MY_CODE_VERSION)", and by that time PROJ_VERSION is already known to the compiler from the inclusion of project.h above.
Thank you in advance

I won't rehash what you already know. What you apparently did not know:
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. Then, the resulting preprocessing token sequence is rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it resembles one, but all pragma unary operator expressions within it are then processed as specified in 6.10.9 below.
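In short, once a macro has been expanded and all stringification (#) and concatenation (##) have been performed, the resulting token sequence is scanned once again for more macro names to replace. If the name of the macro being replaced turns up again, it is not replaced. Because #define only records its replacement list and does not expand it, MY_CODE_VERSION becomes PROJ_VERSION only where it is used, on the #if line, and the rescan then replaces PROJ_VERSION, which project.h has defined by that point.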
So what you're seeing is standard-defined.
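A minimal single-file sketch of the same situation, assuming a hypothetical value of 3 for PROJ_VERSION in place of the real contents of project.h. It shows that the #define merely stores the token PROJ_VERSION, and that the name is resolved only where MY_CODE_VERSION is used (the function code_version is likewise hypothetical):

```c
/* The replacement list "PROJ_VERSION" is stored as-is; it is not expanded
   here, so it does not matter that PROJ_VERSION is not defined yet. */
#define MY_CODE_VERSION PROJ_VERSION

/* Hypothetical stand-in for what project.h defines. */
#define PROJ_VERSION 3

/* Here MY_CODE_VERSION -> PROJ_VERSION, and the rescan then replaces
   PROJ_VERSION -> 3, so the condition is false and #error is skipped. */
#if (3 != MY_CODE_VERSION)
#error "MY_CODE_VERSION did not expand to 3"
#endif

/* The same two-step replacement happens in ordinary code. */
int code_version(void) {
    return MY_CODE_VERSION;
}
```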

Related

Split function-like macro call between two source files

Is there anything in the C Standard preventing me from doing the following?
// main.c
#define DECORATE(x) ***x***
#include "call_macro.h"
this is the text I want decorated)
// call_macro.h
DECORATE(
When running it through gcc -E main.c, I expected to get
*** this is the text I want decorated***
Instead, it complained about macro_call.h:2: error: unterminated argument list invoking macro "DECORATE", but I can't actually find any prohibition of it in the standard.
Thoughts?
5.1.1.2 Translation phases
(1.4) Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed... A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
I believe this says that each included header is preprocessed separately, before being "merged" into the overall translation unit. It is at this point that an incomplete function-like macro invocation would be ill-formed:
6.10.3/4 ... There shall exist a ) preprocessing token that terminates the invocation.
This ...
#define DECORATE(x) ***x***
... is a complete definition of function-like macro DECORATE (C17 6.10.3/10). Its scope is the remainder of the translation unit (there being no corresponding #undef; C17 6.10.3.5/1) including source and header files #included into that portion of the translation unit.
For it to be possible for the text of an invocation of that macro to start in an included file and complete in the main file, the model for #include would need to be similar to that for macro expansion: the whole text of the included file being inserted and then processed in the context of the surrounding preprocessing tokens. But that is not the model.
Paragraph 5.1.1.2/1 describes the logical phases for translating C source code. The relevant one here is phase 4 (emphasis added):
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
That #includeing a file causes that file to be processed from phase 1 through phase 4 means, among other things, that any function-like macro invocation that starts in that file must be complete within that file. Preprocessing tokens following the #include directive are not relevant to the preprocessing of the included file.
And paragraph 6.10.3/10 says (in part):
Each subsequent instance of the function-like macro name followed by a ( as the next preprocessing token introduces the sequence of preprocessing tokens that is replaced by the replacement list in the definition (an invocation of the macro).
That does not leave room to interpret the DECORATE( in macro_call.h as anything other than the beginning of an invocation of DECORATE(), yet that cannot be processed because
The replaced sequence of preprocessing tokens is terminated by the matching ) preprocessing token
and in fact, paragraph 6.10.3/4 requires there to be such a token, but no such preprocessing token appears in the file.
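By contrast, an invocation may span several lines as long as its closing ) is in the same file, since new-lines inside an argument list are treated as ordinary white space. A minimal sketch, using a hypothetical variant of DECORATE that stringizes its argument so the result is valid C (decorated is also a hypothetical name):

```c
/* Hypothetical variant of DECORATE: # turns the argument into a string
   literal, and adjacent string literals are concatenated later (phase 6). */
#define DECORATE(x) "***" #x "***"

/* The argument list runs over three lines, but the matching ')' is in the
   same file, so the invocation is complete. Leading and trailing white
   space (including the new-lines) is dropped by the # operator. */
const char *decorated(void) {
    return DECORATE(
        this is the text I want decorated
    );
}
```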

Nested invocations of function-like macros

Consider the following code snippet:
#define FOO() BAR
#define BAR() FOO
FOO()()()
The C standard tells us that after argument substitution etc., the preprocessing tokens resulting from a macro invocation are re-scanned for further macro-names, ignoring the name of the macro that generated them (c99, 6.10.3.4p1-2).
Thus, I'd expect the preprocessor to turn the snippet into BAR()(), then FOO(), and then stop, because the token FOO is a result of the macro FOO, and isn't recognized as a macro name.
But both GCC and clang give me the result BAR, indicating that it is, in fact, expanding one more time. This makes sense only if the invocation of the macro "happens" at the argument list - where the macro name FOO is no longer ignored - and not at the macro name itself. This is very unintuitive, and I find no mention of it in the standard. What am I missing?
Thanks in advance!
Here is the relevant passage from the C Standard:
6.10.3.4 Rescanning and further replacement
1 After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
2 If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
3 The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it resembles one, but all pragma unary operator expressions within it are then processed as specified in 6.10.9 below.
If for example you had written
#define QQ() QQ
QQ()()()
The expansion would be just QQ()(), because, as per paragraph 2, when QQ is found during the scan of the replacement list, it is not expanded.
Conversely, in your example, FOO does not appear in the replacement list of FOO(). BAR is followed by (), which causes it to be expanded; BAR in turn does not appear in the replacement list of BAR(), but FOO, followed by the last set of (), is again expanded.
The phrase "if any nested replacements encounter the name of the macro being replaced, it is not replaced" refers to replacements occurring during expansion of macro arguments. In your example, the replacements occur iteratively, not recursively, hence each extra set of () causes a further expansion.
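The iterative expansion can be observed by stringizing the result. This sketch assumes GCC or Clang, which per the question produce BAR; STR, STR_ and foo_chain are hypothetical helper names:

```c
/* Two-step stringizing: STR expands its argument first, STR_ then quotes it. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define FOO() BAR
#define BAR() FOO

/* FOO() -> BAR; BAR() -> FOO; FOO() -> BAR. Each step consumes the next
   "()" from the remaining tokens, so the sequence ends as the token BAR. */
const char *foo_chain(void) {
    return STR(FOO()()());
}
```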
The C preprocessor implements Prosser's blue paint algorithm.
At the moment a macro name is expanded, that name is painted blue, and a blue-painted name is never expanded again.
To fully understand how CPP works, you must google for "blue paint" and read...
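The classic illustration of the painting rule is a self-referential macro. In this sketch (all names are hypothetical), foo expands once to (4 + foo); the inner foo is painted blue, so it is left alone and ends up naming the ordinary variable declared before the macro:

```c
/* Declared before the macro exists, so this really is a variable named foo. */
static int foo = 1;

/* Self-referential macro: inside its own expansion, foo is "painted blue". */
#define foo (4 + foo)

/* "return foo;" becomes "return (4 + foo);" and the inner foo is not
   expanded again, so it refers to the variable above: 4 + 1. */
int blue_paint_demo(void) {
    return foo;
}
```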

C Preprocessor: Dynamic #Define Creation

I would like to have the expansion of these C preprocessor lines:
#define _POUND_ #define
_POUND_ _FALSE 0
_FALSE
expand so the last line (i.e. _FALSE) expands to 0. I understand recursive CPP isn't possible directly but that it can be done. Unfortunately, I'm not fully sure I follow the logic presented in this link.
I think I need to force an additional evaluation but I don't know how to do that in this case (i.e. I have tried and failed).
Can you help?
As indicated several times over in comments, what you are looking for is not supported. Here's what the standard has to say about it:
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character.
(C2011, 6.10/2; emphasis added)
Translation phase 4 is the one in which preprocessing directives are executed, so it follows that macro expansion during phase 4 cannot cause bona fide preprocessing directives to be created. Macros can be expanded to text that has the form of a preprocessing directive, but such text cannot actually be a directive.
It is true that the text resulting from a macro expansion is re-scanned for more macros to expand, but that process does not involve recognizing preprocessing directives that were not already there.
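A small sketch of that point, with hypothetical names (identifiers such as _POUND_ and _FALSE, which begin with an underscore and an uppercase letter, are reserved for the implementation, so they are renamed here). Stringizing the expansion shows text that looks like a directive, yet the macro it appears to define never exists:

```c
/* Object-like macro whose replacement list resembles the start of a directive. */
#define POUND_DEFINE #define

/* Two-step stringizing so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* The expansion is just the tokens #, define, MY_FALSE, 0;
   here they are merely turned into a string literal. */
const char generated_text[] = STR(POUND_DEFINE MY_FALSE 0);

/* Nothing defined MY_FALSE: the generated text was never a directive. */
int my_false_is_defined(void) {
#ifdef MY_FALSE
    return 1;
#else
    return 0;
#endif
}
```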

Macro Expansion: Argument with Commas

The code I'm working on uses some very convoluted macro voodoo in order to generate code, but in the end there is a construct that looks like this:
#define ARGS 1,2,3
#define MACROFUNC_OUTER(PARAMS) MACROFUNC_INNER(PARAMS)
#define MACROFUNC_INNER(A,B,C) A + B + C
int a = MACROFUNC_OUTER(ARGS);
The expected result is
int a = 1 + 2 + 3;
This works well for the compiler it was originally written for (GHS) and also for GCC, but MSVC (2008) treats PARAMS as a single preprocessing token that it won't expand, so A is set to the whole of PARAMS and B and C to nothing. The result is this:
int a = 1,2,3 + + ;
and MSVC warns: not enough actual parameters for macro 'MACROFUNC_INNER'.
Is it possible to get MSVC to do the expansion with some trick (another layer of macro to force a second expansion, some well-placed ## or #, ...), given that changing the way the construct works is not an option? (i.e.: can I solve the problem myself?)
What does the C standard say about such a corner case? I couldn't find anything in the C11 standard that explicitly says how to handle arguments that contain a list of arguments. (i.e.: can I argue with the author of the code that he has to rewrite it, or is MSVC simply non-conforming?)
MSVC is non-conformant. The standard is actually clear on the point, although it does not feel the need to mention this particular case, which is not exceptional.
When a function-like macro invocation is encountered, the preprocessor:
§6.10.3/11 identifies the arguments, which are possibly empty sequences of tokens separated by unprotected commas (a comma is protected if it is inside parentheses).
§6.10.3.1/1 does a first pass over the macro body, substituting each parameter which is not used in a # or ## operation with the corresponding fully macro-expanded argument. (It does no other substitutions in the macro body in this step.)
§6.10.3.4/1 rescans the substituted replacement token sequence, performing more macro replacements as necessary.
(The above mostly ignores stringification (#) and token concatenation (##), which are not relevant to this question.)
This order of operations unambiguously leads to the behaviour expected by whoever wrote the software.
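Applied to the construct from the question, those steps give 6 on a conforming preprocessor: ARGS is expanded to 1,2,3 before being substituted for PARAMS, and only during the rescan is MACROFUNC_INNER's invocation recognized, now with three arguments. A minimal check, assuming a conforming compiler such as GCC or Clang (the wrapper function sum_args is hypothetical):

```c
#define ARGS 1,2,3
#define MACROFUNC_OUTER(PARAMS) MACROFUNC_INNER(PARAMS)
#define MACROFUNC_INNER(A,B,C) A + B + C

/* MACROFUNC_OUTER(ARGS)
   -> ARGS fully expanded, then substituted:  MACROFUNC_INNER(1,2,3)
   -> the rescan sees three arguments:        1 + 2 + 3              */
int sum_args(void) {
    return MACROFUNC_OUTER(ARGS);
}
```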
Apparently (according to @dxiv, and verified here) the following standards-compliant workaround works on some versions of MS Visual Studio:
#define CALL(A,B) A B
#define OUTER(PARAM) CALL(INNER,(PARAM))
#define INNER(A,B,C) whatever
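Filled in with the sum from the question (the body of INNER is an assumption standing in for "whatever", and sum_via_call is a hypothetical wrapper), the CALL workaround expands like this on a conforming preprocessor:

```c
#define ARGS 1,2,3
#define CALL(A,B) A B
#define OUTER(PARAM) CALL(INNER,(PARAM))
#define INNER(A,B,C) A + B + C   /* assumed body in place of "whatever" */

/* OUTER(ARGS) -> CALL(INNER,(1,2,3))   the parentheses protect the commas
               -> INNER (1,2,3)         the rescan now sees INNER followed by (
               -> 1 + 2 + 3                                                    */
int sum_via_call(void) {
    return OUTER(ARGS);
}
```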
For reference, the actual language from the C11 standard, skipping over the references to # and ## handling:
§6.10.3 11 The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro. The individual arguments within the list are separated by comma preprocessing tokens, but comma preprocessing tokens between matching inner parentheses do not separate arguments.…
§6.10.3.1 1 After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list… is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file…
§6.10.3.4 1 After all parameters in the replacement list have been substituted… [t]he resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
C11 says that each appearance of an object-like macro's name
[is] replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. The replacement list is then rescanned for more macro names as specified below.
[6.10.3/9]
Of function-like macros it says this:
If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments [...] in an invocation of a function-like macro shall equal the number of parameters in the macro definition.
[6.10.3/4]
and this:
The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro.
[6.10.3/11]
and this:
After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list [...] is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.
[6.10.3.1/1]
Of macros in general it also says this:
After all parameters in the replacement list have been substituted [... t]he resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
[6.10.3.4/1]
MSVC++ does not properly expand the arguments to function-like macros before rescanning the expansion of such macros. It seems unlikely that there is any easy workaround.
UPDATE:
In light of @dxiv's answer, however, it may be that there is a solution after all. The problem with his solution with respect to standard-conforming behavior is that there needs to be one more expansion than is actually performed. That can easily enough be supplied. This variation on his approach works with GCC, as it should, and inasmuch as it is based on code that dxiv claims works with MSVC++, it seems likely to work there, too:
#define EXPAND(x) x
#define PAREN(...) (__VA_ARGS__)
#define EXPAND_F(m, ...) EXPAND(m PAREN(__VA_ARGS__))
#define SUM3(a,b,c) a + b + c
#define ARGS 1,2,3
int sum = EXPAND_F(SUM3, ARGS);
I have of course made it a little more generic than perhaps it needs to be, but that may serve you well if you have a lot of these to deal with.
Curiously enough, the following appears to work in MSVC (tested with 2010 and 2015).
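The "one more expansion" supplied by EXPAND can be made visible on a conforming compiler. This sketch repeats the EXPAND and PAREN definitions with a hypothetical INNER, and adds a real function also named INNER, declared with its name in parentheses so the function-like macro does not expand there. Without the extra pass, INNER is examined before PAREN(...) produces the parenthesis, so the function is called; with EXPAND, the macro is used:

```c
#define EXPAND(x) x
#define PAREN(...) (__VA_ARGS__)
#define INNER(a, b, c) ((a) + (b) + (c))

/* A real function with the same name. Putting the name in parentheses stops
   the function-like macro INNER from expanding in this definition. */
static int (INNER)(int a, int b, int c) {
    return -(a + b + c);
}

/* No extra pass: INNER is followed by PAREN, not '(', so it is not a macro
   invocation; PAREN(1, 2, 3) then becomes (1, 2, 3) and the function is called. */
int without_extra_pass(void) {
    return INNER PAREN(1, 2, 3);
}

/* EXPAND's argument is first expanded to INNER (1, 2, 3); the rescan after
   substitution now sees INNER followed by '(' and expands the macro. */
int with_extra_pass(void) {
    return EXPAND(INNER PAREN(1, 2, 3));
}
```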
#define ARGS 1,2,3
#define OUTER(...) INNER PARAN(__VA_ARGS__)
#define PARAN(...) (__VA_ARGS__)
#define INNER(A,B,C) A + B + C
int a = OUTER(ARGS);
I doubt it is supposed to work by the letter of the standard; in fact, I have a hunch it's not. It could still be conditionally compiled just for MSVC, as a workaround.
[EDIT] P.S. As pointed out in the comments, the above is (another) non-standard MSVC behavior. Instead, the alternative workarounds posted by @rici and @JohnBollinger in their respective replies are compliant, and thus recommended.

Inserting a one-line line comment with a preprocessor macro

Is it possible to simulate a one-line comment (//) using a preprocessor macro (or magic)? For example, can this compile with gcc -std=c99?
#define LINE_COMMENT() ???
int main() {
LINE_COMMENT() asd(*&##)($*?><?><":}{)(#
return 0;
}
No. Here is an extract from the standard showing the phases of translation of a C program:
3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
As you can see, comments are removed before macros are expanded, so a macro cannot expand into a comment.
You can obviously define a macro that takes an argument and expands to nothing, but it's slightly more restrictive than a comment, as its argument must consist of valid preprocessing tokens with balanced parentheses (e.g. no unmatched quotes). Not very useful for general commenting purposes.
No. Comments are replaced by a space in translation phase 3, before macros are expanded in phase 4, so no macro can expand into one. You can, however, do selective compilation, comments and all, with #if directives, as in:
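A sketch of such a macro (IGNORE and ignore_demo are hypothetical names). Making it variadic at least lets the discarded text contain commas; the other restrictions remain, and for instance the behavior is undefined if the text contains a lone ' or " character:

```c
/* Discards whatever tokens it is given; the expansion is empty. */
#define IGNORE(...)

int ignore_demo(void) {
    IGNORE(this text is discarded, commas and all)
    return 42;
}
```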
#if 0
... // this stuff will not be compiled
...
#endif // up to here.
That's all the magic you can do with the limited macro preprocessor available in C/C++.
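A runnable sketch of the #if 0 approach (the function if0_demo is hypothetical). Lines in a skipped group are still split into preprocessing tokens, so they should avoid unmatched quotes, but they need not be valid C:

```c
int if0_demo(void) {
    int x = 1;
#if 0
    x = this line is not valid C, but it is skipped before compilation;
#endif
    return x;   /* the skipped assignment never happens */
}
```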
