Is # permitted in an object-like macro, and if so, what happens?
The C standard only defines the behaviour of # in a macro for function-like macros.
Sample code:
#include <stdio.h>
#define A X#Y
#define B(X) #X
#define C(X) B(X)
int main()
{
printf(C(A) "\n");
}
gcc outputs X#Y, suggesting that it permits # to be present and performs no special processing. However, since the definition of the # operator does not define the behaviour in this case, is it actually undefined behaviour?
As you noticed, # only has a defined effect in function-like macros. § 6.10.3.2/1 (all references to the standard are to the C11 draft (N1570)). To see what happens in object-like macros, we must look elsewhere.
A preprocessing directive of the form
# define identifier replacement-list new-line
defines an
object-like macro
that causes each subsequent instance of the macro name
to be replaced by the replacement list of preprocessing tokens that constitute the
remainder of the directive. [...]
§ 6.10.3/9
Therefore, the only question is whether # is allowed in a replacement-list. If so, it takes part in replacement as usual.
We find the syntax in § 6.10/1:
replacement-list:
pp-tokens (opt.)
pp-tokens:
preprocessing-token
pp-tokens preprocessing-token
Now, is # a valid preprocessing-token? § 6.4/1 says:
preprocessing-token:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the above
It's certainly not a header-name (§ 6.4.7/1), it's not allowed in identifier tokens (§ 6.4.2.1/1), nor is it a pp-number (which is basically any number in an allowed format, § 6.4.8/1), nor a character-constant (such as u'c', § 6.4.4.4/1) or a string-literal (exactly what you'd expect, e.g. L"String", § 6.4.5/1).
However, it is listed as a punctuator in § 6.4.6/1. Therefore, it is allowed in the replacement-list of an object-like macro and will be copied verbatim. It is now subject to rescanning as described in § 6.10.3.4. Let us look at your example:
C(A) will be replaced with C(X#Y). # has no special effect here, because it is not in the replacement-list of C, but its argument. C(X#Y) is obviously turned into B(X#Y). Then B's argument is turned into a string literal via the # operator in B's replacement-list, yielding "X#Y"
Therefore, you don't have undefined behavior.
Related
When the c preprocessor runs an #if/#elif preprocessing directive, it performs 4 operations on the tokens that directly follow:
Replace every occurrence of defined {identifier} with 1 if {identifier} is defined, 0 otherwise.
Invoke all macros.
Replace every remaining identifier with 0.
Parse and evaluate the result as a constant-expression.
Now, it's pretty clear from the standard (c99, 6.10.1) that steps 3 and 4 actually happen in that order, and after 1 and 2 are completed. But I can't find any clarification on the order of 1 and 2.
From some limited testing I did, it seems that gcc executes steps 1 and 2 based on the ordering of the tokens - in defined MACRO, the defined executes first, but in MACRO(defined ID) the macro does.
Is this behavior required by the standard? Implementation-defined? Undefined?
Your step 2 is performed first. The order is steps 2, 1, 3, 4:
C 2018 6.10.1 4 says [emphasis mine]:
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text…
6.10.1 1 says that the controlling expression of an #if or #elif shall be an integer constant expression which may additionally contain the expressions defined identifier or defined ( identifier ) which:
… evaluate to 1 if the identifier is currently defined as a macro name (that is, if it is predefined or if it has been the subject of a #define preprocessing directive without an intervening #undef directive with the same subject identifier), 0 if it is not.
Thus, first macro replacement is performed, per 6.10.1 4, and later the evaluation of the expression is performed.
Note that, in spite of the fact that macro replacement occurs first, if it generates defined tokens, they are not necessarily evaluated, because 6.10.1 4 goes on to say:
… If the token defined is generated as a result of this replacement process…, the behavior is undefined.
Then your step 3 occurs, again per 6.10.1 4:
… After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0, and then each preprocessing token is converted into a token…
Then the controlling expression is evaluated, your step 4:
… The resulting tokens compose the controlling constant expression which is evaluated according to the rules of 6.6…
please, tell me what's wrong with this code:
// part of main.c
#if have(some_macro)
printf("Import succeeded!\n");
#else
# error Import failed!
#endif
// part of utils.h
#define have(macro) defined(macro)
If I write just #if defined(some_macro) it works.
What's the problem?
You cannot have anything following #if expand to defined. That is subject to undefined behavior.
From the C99 Standard (emphasis mine):
6.10.1 Conditional inclusion
...
3 Preprocessing directives of the forms
# if *constant-expression new-line group<sub>opt</sub>*
# elif *constant-expression new-line group<sub>opt</sub>*
check whether the controlling constant expression evaluates to nonzero.
4 Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined.
Platform: Ubuntu 14.04 LTS
Compiler: GCC 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
I am writing a header that needs to be backwards complaint with C90. I have some optional functions that are included if the complier supports C99 or C11. So I use the standard feature test macros. I just want to know if the way I do this is correct.
#if __STDC__
//code
#endif
#if __STDC_VERSION__ >= 199000L
//code
#endif
#if __STDC_VERSION__ >= 201100L
//code
#endif
I am assuming that if the compiler is strictly C90 or C89 then the compiler will error out and say __STDC_VERSION__ is undefined am I correct? If so should I use defined()? For example #if defined(__STDC__) rather than just #if __STDC__.
Undefined identifiers in preprocessor-conditionals are replaced by 0 if they are not argument to define, thus no error:
6.10.1 Conditional inclusion
3 Preprocessing directives of the forms
#if constant-expression new-line groupopt
#elif constant-expression new-line groupopt
check whether the controlling constant expression evaluates to nonzero.
4 Prior to evaluation, macro invocations in the list of preprocessing tokens that will become
the controlling constant expression are replaced (except for those macro names modified
by the defined unary operator), just as in normal text. If the token defined is
generated as a result of this replacement process or use of the defined unary operator
does not match one of the two specified forms prior to macro replacement, the behavior is
undefined. After all replacements due to macro expansion and the defined unary
operator have been performed, all remaining identifiers (including those lexically
identical to keywords) are replaced with the pp-number 0, and then each preprocessing
token is converted into a token. [...]
You can use the #error directive to force a compilation-error.
Let's say I have the following piece of code. Is it correct to say that the conditional directive always evaluates to zero since M is expanded before compile time where the value of i is zero according to C specification? Could someone explain it if I'm wrong:
#include <stdio.h>
#define M i
int main() {
int i = 10;
#if M == 0
i = 0;
#endif
printf("%d", i);
return 0;
}
With #if identifiers that are not macros are all considered to be the number zero. The gcc docs for #if lay out the requirements nicel, which says (emphasis mine going forward):
#if expression
expression is a C expression of integer type, subject to stringent restrictions. It may contain
Integer constants.
Character constants, which are interpreted as they would be in normal code.
Arithmetic operators for addition, subtraction, multiplication, division, bitwise operations, shifts, comparisons, and logical operations (&& and ||). The latter two obey the usual short-circuiting rules of standard C.
Macros. All macros in the expression are expanded before actual computation of the expression's value begins.
Uses of the defined operator, which lets you check whether macros are defined in the middle of an ‘#if’.
Identifiers that are not macros, which are all considered to be the number zero. This allows you to write #if MACRO instead of #ifdef MACRO, if you know that MACRO, when defined, will always have a nonzero value. Function-like macros used without their function call parentheses are also treated as zero.
and notes that:
In some contexts this shortcut is undesirable. The -Wundef option causes GCC to warn whenever it encounters an identifier which is not a macro in an ‘#if’.
This is covered in the draft C99 standard section 6.10.1 Conditional inclusion paragraph 4 which says:
[...]After all replacements due to macro expansion and the defined unary
operator have been performed, all remaining identifiers (including those lexically
identical to keywords) are replaced with the pp-number 0, and then each preprocessing
token is converted into a token.[...]
Yes, they are treated as the number 0.
C11 §6.10.1 Conditional inclusion
... After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0, and then each preprocessing token is converted into a token.
pp-number here refers to preprocessing numbers in §6.4.8.
The C standard gives the following example:
#define hash_hash # ## #
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define join(c, d) in_between(c hash_hash d)
char p[] = join(x, y); // equivalent to char p[] = "x ## y";
But it also says 'The order of evaluation of # and ## operators is unspecified.'
Why is the expansion of hash_hash guaranteed to be interpreted as the ## operator applied to #'s, instead of the # operator applied to ##?
Because '#' only acts as an operator if it appears in a function-like macro and is followed by a parameter name ... but hash_hash isn't a function-like macro and those '#' aren't followed by parameter names.
Quote C99:
A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to be
performed (which in turn may yield a value or a function designator,
produce a side effect, or some combination thereof) in which case it
is known as an operator (other forms of operator also exist in some
contexts). An operand is an entity on which an operator acts.
# and ## are punctuators.
Besides :
6.10.3.1 Argument substitution
After the arguments for the invocation of a function-like macro have been identified, argument
substitution takes place. A parameter in the replacement list, unless
preceded by a # or ## preprocessing token or followed by a ##
preprocessing token (see below), is replaced by the corresponding
argument after all macros contained therein have been expanded. Before
being substituted, each argument’s preprocessing tokens are completely
macro replaced as if they formed the rest of the preprocessing file;
no other preprocessing tokens are available.