Macro Expansion: Argument with Commas - c

The code I'm working on uses some very convoluted macro voodoo to generate code, but in the end there is a construct that looks like this:
#define ARGS 1,2,3
#define MACROFUNC_OUTER(PARAMS) MACROFUNC_INNER(PARAMS)
#define MACROFUNC_INNER(A,B,C) A + B + C
int a = MACROFUNC_OUTER(ARGS);
What is expected is to get
int a = 1 + 2 + 3;
This works well for the compiler it was originally written for (GHS) and also for GCC, but MSVC (2008) treats PARAMS as a single preprocessing token that it won't expand, so A is set to the whole of PARAMS and B and C to nothing. The result is this:
int a = 1,2,3 + + ;
and MSVC warns about "not enough actual parameters for macro 'MACROFUNC_INNER'".
Is it possible to get MSVC to do the expansion with some trick (another layer of macro to force a second expansion, some well-placed ## or #, ...), given that changing the way the construct works is not an option? (I.e., can I solve the problem myself?)
What does the C standard say about such a corner case? I couldn't find anything in the C11 standard that explicitly says how to handle an argument that contains a list of arguments. (I.e., can I argue with the author of the code that he has to write it again, or is MSVC just non-conforming?)

MSVC is non-conformant. The standard is actually clear on the point, although it does not feel the need to mention this particular case, which is not exceptional.
When a function-like macro invocation is encountered, the preprocessor:
§6.10.3/11 identifies the arguments, which are possibly empty sequences of tokens separated by non-protected commas , (a comma is protected if it is inside parentheses ()).
§6.10.3.1/1 does a first pass over the macro body, substituting each parameter which is not used in a # or ## operation with the corresponding fully macro-expanded argument. (It does no other substitutions in the macro body in this step.)
§6.10.3.4/1 rescans the substituted replacement token sequence, performing more macro replacements as necessary.
(The above mostly ignores stringification (#) and token concatenation (##), which are not relevant to this question.)
This order of operations unambiguously leads to the behaviour expected by whoever wrote the software.
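As a minimal sketch of those three steps applied to the macros from the question (the helper name `check_outer` is only an illustration, and this assumes a conforming preprocessor such as GCC's), the expansion can be traced like this:

```c
#include <assert.h>

#define ARGS 1,2,3
#define MACROFUNC_OUTER(PARAMS) MACROFUNC_INNER(PARAMS)
#define MACROFUNC_INNER(A,B,C) A + B + C

/*
 * MACROFUNC_OUTER(ARGS)
 *   1. arguments identified:  PARAMS = ARGS      (a single argument)
 *   2. argument fully expanded and substituted:
 *                             MACROFUNC_INNER(1,2,3)
 *   3. rescan:                1 + 2 + 3
 */
static int sum_via_macros(void)
{
    return MACROFUNC_OUTER(ARGS);
}

/* Hypothetical self-check; fails on a preprocessor that skips step 2. */
static void check_outer(void)
{
    assert(sum_via_macros() == 6);
}
```

The commas only become argument separators in step 3, when the rescan sees `MACROFUNC_INNER(1,2,3)` as a fresh invocation.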
Apparently (according to @dxiv, and verified here) the following standards-compliant workaround works on some versions of MS Visual Studio:
#define CALL(A,B) A B
#define OUTER(PARAM) CALL(INNER,(PARAM))
#define INNER(A,B,C) whatever
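To make that concrete, here is the same workaround with `INNER` filled in with the sum from the question (an assumption standing in for "whatever"; `sum_via_call` and `check_call` are illustrative names). I have only reasoned through the conforming behaviour here, not tested it on MSVC:

```c
#include <assert.h>

#define ARGS 1,2,3
#define CALL(A,B) A B
#define OUTER(PARAM) CALL(INNER,(PARAM))
#define INNER(A,B,C) A + B + C   /* stand-in for "whatever" */

/*
 * OUTER(ARGS) -> CALL(INNER,(1,2,3))   PARAM is expanded, the parentheses
 *                                      protect the commas from CALL
 *             -> INNER (1,2,3)         CALL simply juxtaposes its arguments
 *             -> 1 + 2 + 3             the rescan now sees an INNER invocation
 */
static int sum_via_call(void)
{
    return OUTER(ARGS);
}

static void check_call(void)
{
    assert(sum_via_call() == 6);
}
```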
For reference, the actual language from the C11 standard, skipping over the references to # and ## handling:
§6.10.3 11 The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro. The individual arguments within the list are separated by comma preprocessing tokens, but comma preprocessing tokens between matching inner parentheses do not separate arguments.…
§6.10.3.1 1 After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list… is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file…
§6.10.3.4 1 After all parameters in the replacement list have been substituted… [t]he resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.

C11 says that each appearance of an object-like macro's name
[is] replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. The replacement list is then rescanned for more macro names as specified below.
[6.10.3/9]
Of function-like macros it says this:
If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments [...] in an invocation of a function-like macro shall equal the number of parameters in the macro definition.
[6.10.3/4]
and this:
The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro.
[6.10.3/11]
and this:
After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list [...] is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.
[6.10.3.1/1]
Of macros in general it also says this:
After all parameters in the replacement list have been substituted [... t]he resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
[6.10.3.4/1]
MSVC++ does not properly expand the arguments to function-like macros before rescanning the expansion of such macros. It seems unlikely that there is any easy workaround.
UPDATE:
In light of @dxiv's answer, however, it may be that there is a solution after all. The problem with his solution with respect to standard-conforming behavior is that there needs to be one more expansion than is actually performed. That can easily enough be supplied. This variation on his approach works with GCC, as it should, and inasmuch as it is based on code that dxiv claims works with MSVC++, it seems likely to work there, too:
#define EXPAND(x) x
#define PAREN(...) (__VA_ARGS__)
#define EXPAND_F(m, ...) EXPAND(m PAREN(__VA_ARGS__))
#define SUM3(a,b,c) a + b + c
#define ARGS 1,2,3
int sum = EXPAND_F(SUM3, ARGS);
I have of course made it a little more generic than perhaps it needs to be, but that may serve you well if you have a lot of these to deal with.

Curiously enough, the following appears to work in MSVC (tested with 2010 and 2015).
#define ARGS 1,2,3
#define OUTER(...) INNER PARAN(__VA_ARGS__)
#define PARAN(...) (__VA_ARGS__)
#define INNER(A,B,C) A + B + C
int a = OUTER(ARGS);
I don't know that it's supposed to work by the letter of the standard, in fact I have a hunch it's not. Could still be conditionally compiled just for MSVC, as a workaround.
[EDIT] P.S. As pointed out in the comments, the above is (another) non-standard MSVC behavior. Instead, the alternative workarounds posted by @rici and @JohnBollinger in the respective replies are compliant, thus recommended.
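If someone does want to go the conditional-compilation route, a rough sketch could look like the following. It assumes that MSVC's predefined `_MSVC_TRADITIONAL` macro (available from Visual Studio 2017 15.8, and 0 when the conforming preprocessor is enabled) is a good enough way to detect the legacy preprocessor; older versions don't define it and are treated as legacy. The MSVC branch is the trick from this answer and is not something I could test here; `sum_conditional` and `check_conditional` are illustrative names.

```c
#include <assert.h>

#define ARGS 1,2,3
#define INNER(A,B,C) A + B + C

#if defined(_MSC_VER) && (!defined(_MSVC_TRADITIONAL) || _MSVC_TRADITIONAL)
/* Legacy MSVC preprocessor: rely on its non-standard behaviour. */
#define PARAN(...) (__VA_ARGS__)
#define OUTER(...) INNER PARAN(__VA_ARGS__)
#else
/* Conforming preprocessors: the plain construct works as written. */
#define OUTER(PARAMS) INNER(PARAMS)
#endif

static int sum_conditional(void)
{
    return OUTER(ARGS);
}

static void check_conditional(void)
{
    assert(sum_conditional() == 6);
}
```

The compliant workarounds mentioned above avoid the `#if` altogether, which is why they remain the better choice.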

Related

How does C expand function-like macros at the end of a macro replacement list?

You can iterate over a preprocessor sequence using the following construct:
#define A() B
#define B() A
A()()()()()
This expands to B on most compilers/preprocessors: clang, gcc, tcc, chibicc, SDCC. (I couldn't test MSVC because it didn't work on godbolt, but if you want to test it, make sure to use the /Zc:preprocessor flag, because otherwise the preprocessor will be non-conformant.)
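One way to observe this without reading preprocessor output is to stringify the result. This is only a sketch: the `STR`/`STR_` helpers and the function names are mine, and the asserted result relies on the GCC/Clang behaviour described above, which the standard does not guarantee.

```c
#include <assert.h>
#include <string.h>

#define A() B
#define B() A

/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* On GCC and Clang: A() -> B, B() -> A, A() -> B, B() -> A, A() -> B */
static const char *alternating(void)
{
    return STR(A()()()()());
}

static void check_alternating(void)
{
    assert(strcmp(alternating(), "B") == 0);
}
```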
Reading 6.10.3.4 seems to suggest that the expansion of B happens inside A, which would prevent the second expansion of A: it would be painted blue instead, and the expansion would stop.
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
But Annex J.1 says that whether this is done using nesting or not is unspecified behavior:
When a fully expanded macro replacement list contains a function-like macro name as its last preprocessing token and the next preprocessing token from the source file is a (, and the fully expanded replacement of that macro ends with the name of the first macro and the next preprocessing token from the source file is again a (, whether that is considered a nested replacement (6.10.3).
OK, fair, so most preprocessors use the non-nesting approach, but what allows the following to work?
#define A() B(
#define B() A(
A()))))
Now granted, this will give you an error ("unterminated argument list invoking macro 'B'"), but wouldn't you expect it to expand to A())), where A is now painted blue, which shouldn't give an error?
Furthermore, you can get rid of the error by detecting the last closing parenthesis, which shows that this also does not seem to use nesting. That is weird: where does the standard suggest that this is valid?
There is already a similar question on SO, but I don't see how the answer has anything to do with the question, since the passage quoted is only talking about argument substitution:
6.10.3.1 Argument substitution
After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter
in the replacement list, unless preceded by a # or ## preprocessing
token or followed by a ## preprocessing token (see below), is replaced
by the corresponding argument after all macros contained therein have
been expanded. Before being substituted, each argument’s preprocessing
tokens are completely macro replaced as if they formed the rest of the
preprocessing file; no other preprocessing tokens are available.
Which makes sense: e.g. with #define A(x) x x x, the argument passed for x only needs to be expanded once, in isolation, and the resulting tokens are then inserted in place of each occurrence of x in the replacement list.
This also explains the following behavior:
#define STR(x) #x
#define f(x) x
#define F(x) STR(x(23))
F(f) // expands to "f(23)"
So the "in isolation" part refers to the arguments themselves and not to what happens in the rescan, which is detailed in 6.10.3.4, my initial standard quote.
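To see the difference between the isolated argument prescan and the later rescan side by side, here is a small sketch. `G` is a hypothetical macro I added for contrast; the other three are the ones from the example above.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x
#define f(x) x
#define F(x) STR(x(23))   /* x(23) becomes the operand of # inside STR  */
#define G(x) x(23)        /* hypothetical contrast: no stringification  */

static void check_isolation(void)
{
    /* The prescan sees the lone token f: no "(" follows it inside the
       argument, so f is passed on unexpanded and STR stringifies f(23). */
    assert(strcmp(F(f), "f(23)") == 0);

    /* Here nothing blocks the rescan, which finds f(23) and expands it. */
    assert(G(f) == 23);
}
```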
So what is going on here, how should I think about the macro expansion process?
From my reading of DR 17, whether the ) is joined with the result of the expansion to its left has been intentionally left unspecified in the standard. The behavior is undefined. Strictly conforming programs shouldn't rely on this.
Why is the macro-name not painted blue?
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_017.html
https://port70.net/%7Ensz/c/c11/n1570.html#6.10.3.4p4

Nested invocations of function-like macros

Consider the following code snippet:
#define FOO() BAR
#define BAR() FOO
FOO()()()
The C standard tells us that after argument substitution etc., the preprocessing tokens resulting from a macro invocation are rescanned for further macro names, ignoring the name of the macro that generated them (C99, 6.10.3.4p1-2).
Thus, I'd expect the preprocessor to turn the snippet into BAR()(), then FOO(), and then stop, because the token FOO is a result of the macro FOO, and isn't recognized as a macro name.
But both GCC and clang give me the result BAR, indicating that it is, in fact, expanding one more time. This makes sense only if the invocation of the macro "happens" at the argument list - where the macro name FOO is no longer ignored - and not at the macro name itself. This is very unintuitive, and I find no mention of it in the standard. What am I missing?
Thanks in advance!
Here is the relevant passage from the C Standard:
6.10.3.4 Rescanning and further replacement
1 After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
2 If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
3 The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it resembles one, but all pragma unary operator expressions within it are then processed as specified in 6.10.9 below.
If for example you had written
#define QQ() QQ
QQ()()()
The expansion would be just QQ()() because as per 2) when QQ is found during the scan of the replacement list, it is not expanded.
Conversely, in your example, FOO is not found in the replacement list of FOO(). BAR is followed by (), which causes it to be expanded, and in turn BAR is not found in the replacement list of BAR(); FOO, followed by the last set of (), is then expanded again.
The phrase "if any nested replacements encounter the name of the macro being replaced, it is not replaced" refers to replacements occurring during expansion of macro arguments. In your example, the replacements occur iteratively, not recursively, hence an extra set of () will cause further expansion.
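A quick way to compare the two cases is to stringify both expansions. This is a sketch with helper names of my own (`STR`, `STR_`, `check_nesting`); the FOO/BAR result is what GCC and Clang produce, as reported in the question, and I only check the leading token of the QQ case because the spacing of the stringified remainder may differ between implementations.

```c
#include <assert.h>
#include <string.h>

#define FOO() BAR
#define BAR() FOO
#define QQ() QQ

/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

static const char *alternating(void)    { return STR(FOO()()()); }
static const char *self_reference(void) { return STR(QQ()()()); }

static void check_nesting(void)
{
    /* FOO() -> BAR, BAR() -> FOO, FOO() -> BAR on GCC and Clang. */
    assert(strcmp(alternating(), "BAR") == 0);

    /* QQ is found in its own replacement list, so it is never expanded
       again: the result is QQ followed by the two unused "()" pairs. */
    assert(strncmp(self_reference(), "QQ", 2) == 0);
    assert(strchr(self_reference(), '(') != NULL);
}
```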
The C preprocessor implements Prosser's blue paint algorithm.
At the moment a function-like macro name is expanded, that name is painted blue, and a blue symbol is not expanded again.
To fully understand how CPP works you must google for "blue paint" and read...

Parameterized macros involving the ## operator in the replacement-list

In the book that I am reading "C Programming A Modern Approach", there is a section on Page 343 that discusses some tricks you can use to get around certain deficits in macros.
The example problem is depicted as follows:
#define CONCAT(x,y) x##y (Directive 1)
The author then explains that the following line of code will fail to function as intended if using the aforementioned directive:
CONCAT(a, CONCAT(b,c))
This line of code will result in aCONCAT(b,c) as opposed to the desired abc.
In order to address this shortcoming, the author proposes the following work-around:
#define CONCAT2(x,y) CONCAT(x,y) (Directive 2)
The author explains that the presence of Directive 1 and Directive 2 will ensure that the slightly different line of code CONCAT2(a, CONCAT2(b,c)) is correctly replaced with abc.
(notice that this line of code is different than the original line of code...CONCAT2 is used instead of CONCAT.)
Could someone please walk me through why this will successfully carry out the desired objective? From what I understand, the preprocessor will keep scanning the precompiled code until all defined terms have been dealt with. For a given scan, how many defined words are updated per line?
I would think that the following flow of preprocessing replacements take place:
Given CONCAT2(a, CONCAT2(b,c))...
First pass over: CONCAT(a, CONCAT2(b,c))
However, for the second pass over, does CONCAT get expanded to its replacement list expression? Or does CONCAT2 get expanded to its replacement list expression? In either case, it seems like we once again arrive at a failed expression of either aCONCAT2(b,c) or CONCAT(a, CONCAT(b,c)), which would therefore still fail just like the very original case we presented.
Any help is greatly appreciated!
When the preprocessor detects a function-like macro invocation while scanning a source line, it completely expands the macro's arguments before substituting them into the macro's replacement text, except that where an argument appears as an operand of the stringification (#) or token-pasting (##) operator, its literal value is used for the operation. The resulting replacement text, with expanded arguments and the results of any # and ## operations substituted, is then rescanned for additional macros to expand.
Thus, with
CONCAT(a, CONCAT(b,c))
the literal values of both arguments are used as operands for the token-pasting operation. The result is
aCONCAT(b,c)
That is rescanned for further macros to expand, but aCONCAT is not defined as a macro name, so no further macro expansion occurs.
Now consider
CONCAT2(a, CONCAT2(b,c))
In CONCAT2, neither argument is an operand of # or ##, so both are fully macro-expanded before being substituted. Of course a is unchanged, but CONCAT2(b,c) expands to CONCAT(b,c), which upon rescan is expanded to bc. By substitution of the expanded argument values into its replacement text, the outer CONCAT2 invocation expands to
CONCAT(a, bc)
That expansion is then rescanned, in the context of the surrounding source text, for further macro expansion, yielding
abc
That is again rescanned, but there are no further macro expansions to perform, so that's the final result.
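Both outcomes can be checked by stringifying the expansions. The `STR`/`STR_` helpers and the function names below are my own additions for illustration; for the direct case I only check the pasted leading token, since the spacing of the stringified remainder may vary between preprocessors.

```c
#include <assert.h>
#include <string.h>

#define CONCAT(x,y) x##y
#define CONCAT2(x,y) CONCAT(x,y)

/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

static const char *direct(void)   { return STR(CONCAT(a, CONCAT(b,c))); }
static const char *indirect(void) { return STR(CONCAT2(a, CONCAT2(b,c))); }

static void check_concat(void)
{
    /* ## pastes a onto the unexpanded token CONCAT: aCONCAT(b,c) */
    assert(strncmp(direct(), "aCONCAT", 7) == 0);

    /* CONCAT2 expands its arguments first, so the inner call becomes bc */
    assert(strcmp(indirect(), "abc") == 0);
}
```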

Substitute parent tokens in child macro

I'm looking to replace some tokens within a called macro but can't seem to determine the right ordering of expansion and/or deferral. For example:
#define EXPAND(...) __VA_ARGS__
#define REPLACE(hello,y) EXPAND(y)
REPLACE(goodbye, hello world)
In my mind the REPLACE macro would call the EXPAND macro, making it functionally identical to:
#define REPLACE(hello,y) hello world
Allowing the hello world to be transformed into goodbye world.
My compiler (MSVC 2017) doesn't seem to be doing that, so I suspect that I'm in the wrong here. I've read up on expansion and deferral and have tried many different combinations of DEFER() and EXPAND(), but none seem to give the result I'm after.
Does anyone have any insight into what I'm doing wrong?
That is not how macro parameters are handled, and for a reason. If the use of a macro parameter name in the arguments to a macro could be replaced, then it would be impossible to write safe macros: accidentally using the name of a macro parameter would cause chaos, and there is no reason why a macro caller needs to know what the names of the parameters are. Macro parameters are local to the macro expansion, similar to the way that function parameters are local to the body of the function.
Here's the actual substitution algorithm, from §6.10.3.1/1 [Argument Substitution] of the C standard:
After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list… is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.
Note that the arguments are macro replaced before being placed into the macro expansion. Once that is done, the parameter names in the replacement list are no longer relevant, and are not part of the replaced text.
Once the macro invocation has been replaced with its expansion, the resulting tokens are then scanned again (§6.10.3.4: "The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace."). However, since the macro invocation has been completely replaced prior to this rescan, the parameter tokens no longer appear.
So this particular solution to your problem is a dead-end. I recommend that you back up a step and focus on the problem you actually wish to solve.
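For what it's worth, the behaviour is easy to confirm by stringifying the result. In this sketch the `STR`/`STR_` helpers and the function names are mine; `EXPAND` and `REPLACE` are the macros from the question.

```c
#include <assert.h>
#include <string.h>

#define EXPAND(...) __VA_ARGS__
#define REPLACE(hello,y) EXPAND(y)

/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

static const char *replaced(void)
{
    return STR(REPLACE(goodbye, hello world));
}

static void check_replace(void)
{
    /* The parameter named hello exists only in REPLACE's replacement list;
       the word hello inside the argument y is just an ordinary token. */
    assert(strcmp(replaced(), "hello world") == 0);
}
```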

Unmatched bracket macro weirdness

What is the correct output of preprocessing the following 3 lines under the C99 rules?
#define y(x) x
#define x(a) y(a
x(1) x(2)))
BTW, cpp under Linux produces an error message, but I can't see why the answer isn't simply
1 2
Assuming cpp is correct and I'm wrong, I'd be very grateful for an explanation.
When a macro is found, the preprocessor gathers up the arguments to the macro and then scans each macro argument in isolation for other macros to expand within the argument BEFORE the first macro is expanded:
6.10.3.1 Argument substitution
After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is
replaced by the corresponding argument after all macros contained therein have been
expanded. Before being substituted, each argument’s preprocessing tokens are
completely macro replaced as if they formed the rest of the preprocessing file; no other
preprocessing tokens are available.
So in this specific example, it sees x(1) and expands that, giving
y(1 x(2)))
It then identifies the macro call y(1 x(2)), with the argument 1 x(2), and prescans that for macros to expand. Within that it finds x(2), which expands to y(2 and then triggers the error because there is no ) for the y macro. Note that at this point it's still looking to expand the argument of the first y macro, so it's looking at it in isolation WITHOUT considering the rest of the input file, unlike the expansion that takes place for 6.10.3.4.
Now there's some question as to whether this should actually be an error, or whether the preprocessor should treat this y(2 sequence as not being a macro invocation at all, as there is no ')'. If it does the latter, then it will expand that y call to 1 y(2, which will then be combined with the rest of the input (the remaining )) and ultimately expand to 1 2.
After a macro is expanded, attempts to expand macros in the resulting text occur in isolation before it is combined with the surrounding text. Thus the attempt to expand y(1 gives this error. It would actually be very difficult to specify macro expansion that works the way you want, while still meeting lots of the other required behaviors (such as lack of infinite recursion).
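As a contrast to the failing case, here is a sketch of the part that does work: a macro that leaves a parenthesis open is fine as long as the closing parenthesis is found by the ordinary rescan over the rest of the source file. The names `ID` and `OPEN_ID` are hypothetical stand-ins for y and x in the question, renamed to keep them apart from ordinary identifiers.

```c
#include <assert.h>

#define ID(v) v
#define OPEN_ID(a) ID(a    /* deliberately leaves the parenthesis open */

/*
 * OPEN_ID(1) expands to the tokens  ID ( 1 . The rescan (6.10.3.4) continues
 * into the following source tokens, finds the ")" and completes ID(1).
 */
static const int one = OPEN_ID(1));

static void check_rescan(void)
{
    assert(one == 1);
}
```

Passing `OPEN_ID(2)` as an argument to another macro is what runs into the error discussed above, because the argument is then expanded with no further tokens available, so that case is left out to keep the sketch compilable.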

Resources