About the ## operator in the C preprocessor

Given
#define cat(x,y) x##y
The call cat(a,1) expands to a1, but cat(cat(1,2),3) is undefined.
However if I also define #define xcat(x,y) cat(x,y), then the result of xcat(xcat(1,2),3) is now 123. Can anybody please explain in detail why this is so?

I tested this using both GCC and Clang.
GCC gives the error:
test.c:6:1: error: pasting ")" and "3" does not give a valid preprocessing token
Clang gives the error:
test.c:6:11: error: pasting formed ')3', an invalid preprocessing token
int b = cat(cat(1,2),3);
What is actually happening is that the operands of ## are not macro-expanded before pasting. In cat(cat(1,2),3), the parameter x therefore receives the unexpanded token sequence cat ( 1 , 2 ), and ## joins its last token, ), with the first token of y, 3. The result )3 is not a valid preprocessing token, so the behavior is undefined, and both GCC and Clang diagnose exactly that paste in the error messages above.
The common opinion is "when using the token-pasting operator (##), you should use two levels of indirection" (i.e., use your xcat workaround). See Why do I need double layer of indirection for macros? and What should be done with macros that need to paste two tokens together?.
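A minimal sketch of the rule behind this (C11 6.10.3.1): a parameter that is an operand of ## is replaced by its argument exactly as written, while any other parameter gets a fully macro-expanded argument. The object-like macro PREFIX and the variables PREFIX_total and counter_total are hypothetical, chosen only to make the difference observable at run time.

```c
#define cat(x, y) x##y
#define xcat(x, y) cat(x, y)

/* Hypothetical object-like macro used to see whether an operand gets expanded. */
#define PREFIX counter

/* PREFIX_total is one identifier token, so PREFIX is not expanded inside it. */
static int PREFIX_total = 1;
static int counter_total = 2;

/* x and y are operands of ##, so PREFIX is pasted unexpanded: PREFIX_total. */
static int unexpanded_operand(void) { return cat(PREFIX, _total); }

/* xcat's parameters are not next to ##, so PREFIX becomes counter first: counter_total. */
static int expanded_operand(void) { return xcat(PREFIX, _total); }
```

The same rule explains cat(cat(1,2),3): the inner call is never expanded, so ## sees the token ) rather than 12.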

In xcat(x,y), x and y are not adjacent to the ## operator, so their arguments undergo macro expansion before being substituted. In xcat(xcat(1,2),3), x is identified as xcat(1,2) and y as 3. Prior to substitution, x is macro-expanded: xcat(1,2) becomes cat(1,2), which becomes 1##2, i.e. 12. So xcat(xcat(1,2),3) ultimately expands to cat(12,3), which yields 123.
This works --> xcat(xcat(1,2),3) --> cat(12,3) --> 123 (the argument xcat(1,2) having first been expanded via cat(1,2) to 12)
The behavior is well-defined because every token paste produces a valid preprocessing token; i.e., the result of ## must be a valid token at every stage of the expansion.
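A quick check of the expansion chain above, assuming a hypothetical variable a1 so that cat(a,1) names something that can be read at run time:

```c
#define cat(x, y) x##y
#define xcat(x, y) cat(x, y)

/* Hypothetical variable; cat(a, 1) pastes a and 1 into the identifier a1. */
static int a1 = 7;

static int direct(void) { return cat(a, 1); }

/* The argument xcat(1, 2) is expanded first (to cat(1, 2), then 12),
   so the outer call becomes cat(12, 3), which pastes to the pp-number 123. */
static int nested(void) { return xcat(xcat(1, 2), 3); }

/* cat(cat(1, 2), 3) is deliberately absent: it would paste ')' and '3'. */
```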

I don't think cat is actually expanded twice in a row, which is why I wonder why the compiler would even produce a message like 'pasting ")" and "3" does not give a valid preprocessing token'.
Also, I don't think the inner cat is expanded first, so I presume the output would be cat(1,2)3. That makes me wonder how the compiler interprets this.

Related

C variadic macro with two named parameters

I want to use a variadic macro but it appears to be designed to only treat the first parameter specially. I want the first two parameters to be named and the rest not, like so:
#define FOO(AA,BB,...) AA->BB(AA,##...)
FOO(mystruct,funcname,123)
However this is not working with LLVM. Am I doing something wrong, or is there a limitation to how the variadic macro works?
UPDATE
The correct answer is: use ##__VA_ARGS__ instead of ##...
Some web pages claim that "..." is valid there, but at least with the macOS LLVM it is not.
The variable arguments are not substituted for ... in the macro expansion. How could they be? Then you couldn't have a macro that used an ellipsis in its expansion. Instead, they are available through the special parameter __VA_ARGS__.
With this, the following program
#define FOO(AA,BB,...) AA->BB(AA, __VA_ARGS__)
FOO(mystruct,funcname,123)
FOO(mystruct,funcname,123,456)
will be preprocessed to
mystruct->funcname(mystruct, 123)
mystruct->funcname(mystruct, 123,456)
## is the token-pasting operator: it makes a single preprocessing token out of two. , ## ... attempts to form the preprocessing token ,..., which is not a valid C token, and that is why Clang reports
<source>:3:1: error: pasting formed ',...', an invalid preprocessing token
The arguments matched by ... are substituted into the macro body through __VA_ARGS__.
The problem is how to allow for it to be empty.
If it is empty, you'll usually want the comma before it erased, and
you can use the GNU ##__VA_ARGS__ extension to achieve that.
#define FOO(AA,BB,...) AA->BB(AA,##__VA_ARGS__) /*GNU extension*/
FOO(mystruct,funcname) //warning with -pedantic
FOO(mystruct,funcname,123)
The above, however, will trigger warnings if compiled with -pedantic.
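If you want your macro to be usable without warnings at -pedantic, you could perhaps achieve that by swapping the first two arguments in the macro definition, so that the object becomes the first of the variadic arguments (which are then never empty) and is extracted again with a FIRST helper: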
#define FIRST(...) FIRST_(__VA_ARGS__,)
#define FIRST_(X,...) X
#define BAR_(CallExpr,...) CallExpr(__VA_ARGS__)
#define BAR(BB,/*AA,*/...) BAR_(FIRST(__VA_ARGS__)->BB,__VA_ARGS__)
BAR(funcname,mystruct) //no warning
BAR(funcname,mystruct,123)
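A compilable sketch of both macros from this answer, using a hypothetical struct counter whose members get and add stand in for funcname and a pointer p that stands in for mystruct. FOO is the standard C99 form and needs at least one argument after BB; BAR is the swapped form, which also works with no extra arguments.

```c
/* Hypothetical object type: get and add play the role of funcname. */
struct counter {
    int value;
    int (*get)(struct counter *self);
    int (*add)(struct counter *self, int amount);
};

static int counter_get(struct counter *self) { return self->value; }
static int counter_add(struct counter *self, int amount) {
    self->value += amount;
    return self->value;
}

/* Standard C99 form: needs at least one argument after BB. */
#define FOO(AA, BB, ...) AA->BB(AA, __VA_ARGS__)

/* Swapped form: the object is the first variadic argument,
   so __VA_ARGS__ is never empty and no GNU extension is needed. */
#define FIRST(...) FIRST_(__VA_ARGS__, )
#define FIRST_(X, ...) X
#define BAR_(CallExpr, ...) CallExpr(__VA_ARGS__)
#define BAR(BB, /*AA,*/ ...) BAR_(FIRST(__VA_ARGS__)->BB, __VA_ARGS__)

static int demo(void) {
    struct counter c = { 0, counter_get, counter_add };
    struct counter *p = &c;
    FOO(p, add, 5);     /* p->add(p, 5): value is now 5 */
    BAR(add, p, 10);    /* p->add(p, 10): value is now 15 */
    return BAR(get, p); /* p->get(p): no extra arguments, returns 15 */
}
```

Both macros mention the object expression twice, so it should be free of side effects (a plain pointer variable, as here).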

Why does a variadic macro give me an error?

Given this sample code:
#define vX(a, ...) ((a)(__VA_ARGS__) ? 1 : 0)
{
int f();
vX(f);
}
I get error C2155: '?': invalid left operand, expected arithmetic or pointer type
On the other hand, if I provide a second argument to the macro, it compiles fine, e.g.:
vX(f,1)
is OK. I'm compiling C code with the msvc compiler.
Sorry to bother everyone, but the mistake was on my side: the two functions that were giving me errors not only took no arguments but also had a void return type. That was causing my problem, not anything macro-related.
From the GCC documentation:
When the macro is invoked, all the tokens in its argument list after the last named argument (this macro has none), including any commas, become the variable argument. This sequence of tokens replaces the identifier __VA_ARGS__ in the macro body wherever it appears.
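So, basically, in standard C before C23 the __VA_ARGS__ part cannot be omitted entirely (it may be an empty argument, but the comma before it must be present). Omitting it requires an extension or a newer standard, and dropping the preceding comma when it is empty is what __VA_OPT__ is for.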
The preprocessor specification is somewhat verbose on this point, but suffice it to say that, as you defined it, vX must be invoked with at least two arguments.
The reason is that the number of arguments is determined by the number of top-level commas in the macro invocation. So, for instance, vX(f,) would make your error go away as well, because it again provides two arguments: f and an empty sequence of tokens after the comma.
One trick to get around it, is to split the macro across two expansions:
#define vX_(a, ...) ((a)(__VA_ARGS__) ? 1 : 0)
#define vX(...) vX_(__VA_ARGS__,)
Note how I added that comma? Now when you write vX(f), it expands to vX_(f,), which expands again to give you the expression you wanted. However, that will not work in the general case, since you'll get a trailing comma: vX(f,1) becomes vX_(f,1,), i.e. a call (f)(1,). That is why __VA_OPT__ exists (standardized in C++20 and C23 and supported by GCC; see unwind's answer): it lets the comma be added conditionally.
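A sketch of the two-level trick with hypothetical int-returning functions always_true and always_false. As the asker discovered, the function must return a scalar: a void call cannot be the condition of ?:, with or without the macro.

```c
/* The asker's macro, split so that vX always supplies an argument for "...". */
#define vX_(a, ...) ((a)(__VA_ARGS__) ? 1 : 0)
#define vX(...) vX_(__VA_ARGS__, )

/* Hypothetical zero-argument functions returning int (not void). */
static int always_true(void) { return 42; }
static int always_false(void) { return 0; }

/* vX(always_true) -> vX_(always_true, ) -> ((always_true)() ? 1 : 0) */
static int check_true(void) { return vX(always_true); }
static int check_false(void) { return vX(always_false); }

/* Limitation: vX(g, 1) would become ((g)(1, ) ? 1 : 0), a syntax error. */
```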

Expansion of function-like macro creates a separate token

I just found out that gcc seems to treat the result of the expansion of a function-like macro as a separate token. Here is a simple example showing the behavior of gcc:
#define f() foo
void f()_bar(void);
void f()bar(void);
void f()-bar(void);
When I execute gcc -E -P test.c (running just the preprocessor), I get the following output:
void foo _bar(void);
void foo bar(void);
void foo-bar(void);
It seems like, in the first two definitions, gcc inserts space after the expanded macro to ensure it is a separate token. Is that really what is happening here?
Is this mandated by any standard (I couldn't find documentation on the topic)?
I want to make _bar part of the same token. Is there any way to do this? I could use the token concatenation operator ## but it will require several levels of macros (since in the real code f() is more complex). I was wondering if there is a simple (and probably more readable) solution.
It seems like, in the first two definitions, gcc inserts space after the expanded macro to ensure it is a separate token. Is that really what is happening here?
Yes.
Is this mandated by any standard (I couldn't find documentation on the topic)?
Yes, although an implementation would be allowed to insert more than one whitespace character to separate the tokens.
f()_bar
here you have 4 tokens after lexical analysis (they are actually preprocessing tokens at this stage, but let's call them tokens): f, (, ) and _bar.
The function-like macro replacement semantics (as defined in C11, 6.10.3) have to replace the three tokens f, (, ) with the new token foo. The implementation is not allowed to work on other tokens and change the final _bar token. To preserve _bar, it has to insert at least one whitespace character; otherwise the output would read foo_bar, which is a single token.
gcc preprocessor somewhat documents it here:
Once the input file is broken into tokens, the token boundaries never change, except when the ‘##’ preprocessing operator is used to paste tokens together. See Concatenation. For example,
#define foo() bar
foo()baz
==> bar baz
not
==> barbaz
In the other case, f()-bar, there are 5 tokens: f, (, ), - and bar. (- is a punctuator token in C, whereas the _ in _bar is simply a character of the identifier token.) The implementation does not have to insert a token separator (such as whitespace) here, because after macro replacement foo-bar still lexes as the separate tokens foo, - and bar.
The gcc preprocessor (cpp) does not insert whitespace here simply because it does not have to. The cpp documentation on token spacing says (about a different issue):
However, we would like to keep space insertion to a minimum, both for aesthetic reasons and because it causes problems for people who still try to abuse the preprocessor for things like Fortran source and Makefiles.
I didn't address the solution to your issue in this answer, but I think you have to use the operator explicitly designed for concatenating tokens: the ## token-pasting operator.
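For the f()_bar case specifically, a sketch of the ## route: in this simple case only one extra level of indirection is needed, because the outer macro's arguments are macro-expanded before the inner one pastes them. PASTE and PASTE_ are hypothetical names.

```c
#define f() foo

/* Two-level paste: PASTE's arguments are macro-expanded (f() becomes foo)
   before PASTE_ applies ##, which joins foo and _bar into one identifier. */
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)

/* Defines a function whose name is the single token foo_bar. */
static int PASTE(f(), _bar)(void) { return 7; }
```

Writing PASTE_(f(), _bar) directly would not work: the operand f() would not be expanded, and ## would try to paste ) with _bar.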
The only way I can think of (if you cannot use the token concatenation operator ##) is to use traditional (pre-standard) C preprocessing:
gcc -E -P -traditional-cpp test.c
Output:
void foo_bar(void);
void foobar(void);
void foo-bar(void);

Unmatched bracket macro weirdness

What is the correct output of preprocessing the following 3 lines under the C99 rules?
#define y(x) x
#define x(a) y(a
x(1) x(2)))
BTW, cpp on Linux produces an error message, but I can't see why the answer isn't simply
1 2
Assuming cpp is correct and I'm wrong, I'd be very grateful for an explanation.
When a macro is found, the preprocessor gathers up the arguments to the macro and then scans each macro argument in isolation for other macros to expand within the argument BEFORE the first macro is expanded:
6.10.3.1 Argument substitution
After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.
So in this specific example, it sees x(1) and expands that, giving
y(1 x(2)))
It then identifies the macro call y(1 x(2)), with the argument 1 x(2), and prescans that for macros to expand. Within that it finds x(2), which expands to y(2, and then triggers the error because there is no ) for the y macro. Note that at this point it's still looking to expand the argument of the first y macro, so it's looking at it in isolation WITHOUT considering the rest of the input file, unlike the rescanning that takes place under 6.10.3.4.
Now, there's some question as to whether this should actually be an error, or whether the preprocessor should treat this y(2 sequence as not being a macro invocation at all, since there is no ')'. If it does the latter, it will expand that y call to 1 y(2, which will then be combined with the rest of the input, ), and ultimately expand to 1 2.
After a macro is expanded, attempts to expand macros in the resulting text occur in isolation before it is combined with the surrounding text. Thus the attempt to expand y(1 gives this error. It would actually be very difficult to specify macro expansion that works the way you want, while still meeting lots of the other required behaviors (such as lack of infinite recursion).
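A sketch contrasting the two contexts, reusing the question's macros. At the top level, the rescan of 6.10.3.4 continues into the rest of the source, so an unbalanced expansion can borrow the ) that follows it; inside a macro argument, the prescan of 6.10.3.1 sees only the argument's own tokens. The function name top_level is hypothetical.

```c
/* The question's macros. */
#define y(x) x
#define x(a) y(a

/* Top level: x(1) becomes y(1, and the rescan (6.10.3.4) continues into the
   following source, where it finds the ')' that completes y(1) -> 1. */
static int top_level(void) { return x(1)); }

/* By contrast, the question's input x(1) x(2))) fails: y's argument 1 x(2) is
   prescanned in isolation (6.10.3.1), and the y(2 produced there has no ')'. */

/* Remove the one-letter macros so they cannot affect later code. */
#undef x
#undef y
```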

During C macro expansion, is there a special case for macros that would expand to "/*"?

Here's a relevant example. It's obviously not valid C, but I'm just dealing with the preprocessor here, so the code doesn't actually have to compile.
#define IDENTITY(x) x
#define PREPEND_ASTERISK(x) *x
#define PREPEND_SLASH(x) /x
IDENTITY(literal)
PREPEND_ASTERISK(literal)
PREPEND_SLASH(literal)
IDENTITY(*pointer)
PREPEND_ASTERISK(*pointer)
PREPEND_SLASH(*pointer)
Running gcc's preprocessor on it:
gcc -std=c99 -E macrotest.c
This yields:
(...)
literal
*literal
/literal
*pointer
**pointer
/ *pointer
Please note the extra space in the last line.
This looks like a feature to prevent macros from expanding to "/*" to me, which I'm sure is well-intentioned. But at a glance, I couldn't find anything pertaining to this behaviour in the C99 standard. Then again, I'm inexperienced at C. Can someone shed some light on this? Where is this specified? I would guess that a compiler adhering to C99 should not just insert extra spaces during macro expansion just because it would probably prevent programming mistakes.
The source code is already tokenized before macro expansion takes place.
So what you have is a / token and a * token, which will not be combined implicitly into a /* "token" (since /* is not really a preprocessing token, I put it in quotes).
If you use -E to output the preprocessed source, cpp needs to insert a space so that /* is not read as the start of a comment by a subsequent compiler pass.
The same mechanism prevents, e.g., two + signs from different macros from being combined into a ++ token in the output.
The only way to really paste two preprocessor tokens together is with the ## operator:
#define P(x,y) x##y
...
P(foo,bar)
results in the token foobar
P(+,+)
results in the token ++, but
P(/,*)
is not valid since /* is not a valid preprocessor token.
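A small sketch of the token-boundary rule, using the P macro from this answer plus a hypothetical PLUS macro: only ## can merge two + tokens into ++, while a + produced by a macro stays a separate token from an adjacent + in the source, even with no space between them.

```c
#define P(x, y) x##y
#define PLUS +

/* ## pastes two '+' tokens into one '++' token, so this is a pre-increment. */
static int pasted_increment(int v) { return P(+, +)v; }

/* Without ##, the '+' from PLUS and the following '+' remain two tokens:
   this is unary plus applied twice, not ++, so v is returned unchanged. */
static int juxtaposed(int v) { return PLUS+v; }
```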
The behavior of the preprocessor is standardized. In the summary at http://en.wikipedia.org/wiki/C_preprocessor, the results you are observing are the effect of:
"3: Tokenization - The preprocessor breaks the result into preprocessing tokens and whitespace. It replaces comments with whitespace".
This takes place before:
"4: Macro Expansion and Directive Handling".
