int main(void)
{
#if 0
something"
#endif
return 0;
}
The simple program above generates a warning in gcc: missing terminating " character. This seems odd, because it means the compiler allows the block between #if 0 and #endif to contain invalid statements like something here, but not unpaired double quotes ". The same happens with #ifdef and #ifndef.
Real comments are fine here:
int main(void)
{
/*
something"
*/
return 0;
}
Why? The single quote ' behaves similarly. Are there any other tokens that are treated specially?
See the comp.lang.c FAQ, 11.19:
Under ANSI C, the text inside a "turned off" #if, #ifdef, or #ifndef must still consist of "valid preprocessing tokens." This means that the characters " and ' must each be paired just as in real C code, and the pairs mustn't cross line boundaries.
Compilation goes through several stages before an executable binary is generated.
You are not in the compiler proper yet; the preprocessor is flagging this error. It does not check C language syntax, but it still has to split the text into tokens, so an unterminated quote is a preprocessor-level problem.
After the preprocessor pass, your code goes to the C compiler, which detects the kind of error you are expecting.
The preprocessor works at the token level, and a string literal is considered a single token. The preprocessor is warning you that you have an invalid token.
According to the C99 standard, a preprocessing token is one of these things:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the
above
The standard also says:
If a ' or a " character matches the last category, the behavior is
undefined.
Things like something above are invalid to the C compiler, but something is a valid preprocessing token (an identifier), and the preprocessor discards it with the rest of the skipped block before it gets to the compiler.
In addition to Kevin's answer, the GCC documentation page "Incompatibilities of GCC" says:
GCC complains about unterminated character constants inside of preprocessing conditionals that fail. Some programs have English comments enclosed in conditionals that are guaranteed to fail; if these comments contain apostrophes, GCC will probably report an error. For example, this code would produce an error:
#if 0
You can't expect this to work.
#endif
The best solution to such a problem is to put the text into an actual C comment delimited by /*...*/.
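As a minimal sketch of the rule quoted above, the following function (the name skipped_group_demo is hypothetical, chosen only for illustration) keeps nonsense text inside a skipped #if 0 group. It should compile without the warning on a conforming compiler because every quote in the skipped group is paired, while prose containing an apostrophe is moved into a real comment.

```c
/* Returns 1 simply to show that this translation unit compiled. */
int skipped_group_demo(void)
{
#if 0
    this is not valid C at all, but each word is a valid preprocessing token
    "a properly paired string" 'x'
#endif
    /* Prose with an apostrophe belongs in a real comment: it can't break here. */
    return 1;
}
```

Removing the closing " from the skipped string should bring back the "missing terminating" diagnostic in GCC, even though the group is never compiled.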
I was reading about tokens and counting the number of tokens in a program.
Previously I read somewhere that preprocessor commands are not counted as tokens.
But the GeeksforGeeks article on tokens lists the following in its "special symbols" section:
pre processor(#): The preprocessor is a macro processor that is used automatically by the compiler to transform your program before actual compilation.
So I am confused: if we write #define in a program, is it a token?
For example:
#include<stdio.h>
#define max 100
int main()
{
printf("max is %d", max);
return 0;
}
How many tokens are in this example?
The linked article is full of basic errors, and should not be relied upon.
The process of parsing C or C++ is defined as a series of transformations [1]:
1. Backslash-newline is replaced with nothing whatsoever -- not even a space.
2. Comments are removed and replaced with a single space each.
3. The surviving text is converted into a series of preprocessing tokens. These are less specific than the tokens used by the language proper: for instance, the keyword if is an IF token to the language proper, but just an IDENT token to the preprocessor.
4. Preprocessing directives are executed and macros are expanded.
5. Each preprocessing token is converted into a token.
6. The stream of tokens is parsed into an abstract syntax tree, and the rest of the compiler takes it from there.
Your example program
#include<stdio.h>
#define max 100
int main()
{
printf("max is %d", max);
return 0;
}
will, after transformation 3, be this series of 23 preprocessing tokens:
PUNCT:# IDENT:include INCLUDE-ARG:<stdio.h>
PUNCT:# IDENT:define IDENT:max PP-NUMBER:100
IDENT:int IDENT:main PUNCT:( PUNCT:)
PUNCT:{
IDENT:printf PUNCT:( STRING:"max is %d" PUNCT:, IDENT:max PUNCT:) PUNCT:;
IDENT:return PP-NUMBER:0 PUNCT:;
PUNCT:}
The directives are still present at this stage. Please notice that #include and #define are each two tokens: the # and the directive name are separate. Some people like to write complex #if nests with the hashmarks all in column 1 but the directive names indented.
After transformation 5, though, the directives are gone and we have this series of 16+n tokens:
[ ... some large volume of tokens produced from the contents of stdio.h ... ]
INT IDENT:main LPAREN RPAREN
LBRACE
IDENT:printf LPAREN STRING:"max is %d" COMMA DECIMAL-INTEGER:100 RPAREN SEMICOLON
RETURN DECIMAL-INTEGER:0 SEMICOLON
RBRACE
where 'n' is however many tokens came from stdio.h.
Preprocessing directives (#include, #define, #if, etc.) are always removed from the token stream and perhaps replaced with something else, so you will never have tokens after transformation 6 that directly result from the text of a directive line. But you will usually have tokens that result from the effects of each directive, such as the contents of stdio.h, and DECIMAL-INTEGER:100 replacing IDENT:max.
Finally, C and C++ do this series of operations almost, but not quite, the same, and the specifications are formally independent. You can usually rely on preprocessing operations to behave the same in both languages, as long as you're only doing simple things with the preprocessor, which is best practice nowadays anyway.
[1] You will sometimes see people talking about translation phases, which are the way the C and C++ standards officially describe this series of operations. My list is not the list of translation phases; it includes separate bullet points for some things that are grouped as a single phase by the standards, and leaves out several steps that aren't relevant to this discussion.
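The effect of macro expansion on the token stream can be observed from inside the program. This is a small sketch, not part of the original answer: the helper macros STR and XSTR and the two function names are hypothetical. The second directive is written with the # separated from the directive name to show that they are distinct tokens.

```c
#define max 100
#   define STR(x) #x      /* '#' and 'define' are separate preprocessing tokens */
#define XSTR(x) STR(x)    /* extra level so the argument is expanded first */

/* By the time the compiler proper runs, max has been replaced by 100. */
int max_value(void)
{
    return max;
}

/* The same replacement, captured as text by stringification. */
const char *max_as_text(void)
{
    return XSTR(max);
}
```

Calling max_as_text() should yield the string "100" rather than "max", which matches the DECIMAL-INTEGER:100 token shown in the listing above.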
Consider this (horrible, terrible, no good, very bad) code structure:
#define foo(x) // commented out debugging code
// Misformatted to not obscure the point
if (a)
foo(a);
bar(a);
I've seen two compilers' preprocessors generate different results on this code:
if (a)
bar(a);
and
if (a)
;
bar(a);
Obviously, this is a bad thing for a portable code base.
My question: What is the preprocessor supposed to do with this? Elide comments first, or expand macros first?
Unfortunately, the original ANSI C Specification specifically excludes any Preprocessor features in section 4 ("This specification describes only the C language. It makes no provision for either the library or the preprocessor.").
The C99 specification handles this explicitly, though. Comments are replaced with a single space during translation phase 3, which happens before preprocessing directives are parsed. (Section 6.10 for details).
VC++ and the GNU C Compiler both follow this paradigm - other compilers may not be compliant if they're older, but if it's C99 compliant, you should be safe.
As described in this copy-n-pasted description of the translation phases in the C99 standard, removing comments (they are replaced by a single whitespace) occurs in translation phase 3, while preprocessing directives are handled and macros are expanded in phase 4.
In the C90 standard (which I only have in hard copy, so no copy-n-paste) these two phases occur in the same order, though the description of the translation phases is slightly different in some details from the C99 standard - the fact that comments are removed and replaced by a single whitespace character before preprocessing directives are handled and macros expanded is not different.
Again, the C++ standard has these 2 phases occur in the same order.
As far as how the '//' comments should be handled, the C99 standard says this (6.4.9/2):
Except within a character constant, a string literal, or a comment, the characters //
introduce a comment that includes all multibyte characters up to, but not including, the
next new-line character.
And the C++ standard says (2.7):
The characters // start a comment, which terminates with the next newline
character.
So your first example is clearly an error on the part of that translator - the ';' character after the foo(a) should be retained when the foo() macro is expanded - the comment characters should not be part of the 'contents' of the foo() macro.
But since you're faced with a buggy translator, you might want to change the macro definition to:
#define foo(x) /* junk */
to work around the bug.
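The conforming behavior can be checked with a short sketch. The counter and the function count_bar_calls are hypothetical additions for illustration, and the code assumes a compiler that accepts // comments (C99 or later). Because the comment is removed before the directive is processed, foo(a) expands to nothing and only the semicolon remains.

```c
static int calls;

static void bar(int a)
{
    (void)a;
    ++calls;
}

#define foo(x) // commented out debugging code

/* Returns how many times bar() ran for a given value of a. */
int count_bar_calls(int a)
{
    calls = 0;
    if (a)
        foo(a);   /* expands to nothing, leaving the empty statement ';' */
    bar(a);       /* not governed by the if on a conforming compiler */
    return calls;
}
```

On a conforming translator count_bar_calls should return 1 for both zero and nonzero arguments. A translator that swallowed the semicolon would return 0 for a zero argument.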
However (and I'm drifting off topic here...), since line splicing (backslashes just before a new-line) occurs before comments are processed, you can run into something like this bit of nasty code:
#define evil( x) printf( "hello "); // hi there, \
printf( "%s\n", x); // you!
int main( int argc, char** argv)
{
evil( "bastard");
return 0;
}
Which might surprise whoever wrote it.
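Here is a self-checking variant of the same trap, as a sketch only. The buffer out and the helpers emit and run_evil are hypothetical names, used instead of printf so the result can be inspected. The backslash splices the second line onto the // comment, so the second call never becomes part of the macro. GCC and Clang typically warn about a multi-line comment here when warnings are enabled.

```c
#include <string.h>

static char out[64];

static void emit(const char *s)
{
    strcat(out, s);
}

/* The backslash joins the next line to the // comment, so emit(x) is lost. */
#define evil(x) emit("hello "); // hi there, \
                emit(x);        // you!

const char *run_evil(void)
{
    out[0] = '\0';
    evil("world");
    return out;
}
```

After splicing and comment removal the macro body is just emit("hello ");, so run_evil() should produce "hello " with no second word.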
Or even better, try the following, written by someone (certainly not me!) who likes box-style comments:
int main( int argc, char** argv)
{
//----------------/
printf( "hello "); // Hey, what the??/
printf( "%s\n", "you"); // heck?? /
//----------------/
return 0;
}
Depending on whether your compiler defaults to processing trigraphs or not (compilers are supposed to, but since trigraphs surprise nearly everyone who runs across them, some compilers decide to turn them off by default), you may or may not get the behavior you want - whatever behavior that is, of course.
According to MSDN, comments are replaced with a single space in the tokenization phase,
which happens before the preprocessing phase where macros are expanded.
Never put // comments in your macros. If you must put comments, use /* */. In addition, you have a mistake in your macro:
#define foo(x) do { } while(0) /* junk */
This way, foo is always safe to use. For example:
if (some condition)
foo(x);
will never throw a compiler error regardless of whether or not foo is defined to some expression.
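A slightly fuller sketch of this idiom follows. The ENABLE_TRACE switch and the classify function are assumptions made up for the example. The point is that the macro behaves like a single statement whether or not it does anything, so it can sit between an if and an else.

```c
#include <stdio.h>

#ifdef ENABLE_TRACE   /* hypothetical build switch */
#define foo(x) do { printf("trace: %d\n", (x)); } while (0)
#else
#define foo(x) do { } while (0) /* disabled debugging code */
#endif

int classify(int a)
{
    if (a)
        foo(a);      /* exactly one statement, so the else still pairs up */
    else
        return -1;
    return 1;
}
```

If foo were defined as a braced block without the do ... while (0) wrapper, the semicolon after foo(a) would end the if statement and the else would fail to compile.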
#ifdef _TEST_
#define _cerr cerr
#else
#define _cerr / ## / cerr
#endif
will work on some compilers (VC++). When _TEST_ is not defined,
_cerr ...
will be replaced by the comment line
// cerr ...
I seem to recall that compliance requires three steps:
strip
expand macros
strip again
The reason for this has to do with the compiler being able to accept .i files directly.
I've made a trivial reduction of my issue:
#define STR_BEG "
#define STR_END "
int main()
{
char * s = STR_BEG abc STR_END;
printf("%s\n", s);
}
When compiling this, I get the following error:
static2.c:12:16: error: expected expression
char * s = STR_BEG abc STR_END;
^
static2.c:7:17: note: expanded from macro 'STR_BEG'
#define STR_BEG "
Now, if I just run the preprocessor, gcc -E myfile.c, I get:
int main()
{
char * s = " abc ";
printf("%s\n", s);
}
Which is exactly what I wanted, and perfectly legal resultant code. So what's the deal?
The macro isn't really expanding "correctly", because this isn't a valid C Preprocessor program. As Kerrek says, the preprocessor doesn't quite work on arbitrary character sequences - it works on whole tokens. Tokens are punctuation characters, identifiers, numbers, strings, etc. of the same form (more or less) as the ones that form valid C code. Those defines do not describe valid strings - they open them, and fail to close them before the end of the line. So an invalid token sequence is being passed to the preprocessor. The fact it manages to produce output from an invalid program is arguably handy, but it doesn't make it correct and it almost certainly guarantees garbage output from the preprocessor at best. You need to terminate your strings for them to form whole tokens - right now they form garbage input.
To actually wrap a token, or token sequence, in quotes, use the stringification operator #:
#define STRFY(A) #A
STRFY(abc) // -> "abc"
GCC and similar compilers will warn you about mistakes like this if you compile or preprocess with the -Wall flag enabled.
(I assume you only get errors when you try to compile as C, but not when you do it in two passes, because internally to the compiler, it retains the information that these are "broken" tokens, which is lost if you write out an intermediate file and then compile the preprocessed source in a second pass... if so, this is an implementation detail, don't rely on it.)
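To see what the stringification operator produces, here is a short sketch reusing the STRFY macro from above. The three function names are hypothetical. It also shows two details of the # operator: whitespace between tokens collapses to a single space, and quotes inside the argument are escaped.

```c
#define STRFY(A) #A

const char *quoted(void)
{
    return STRFY(abc);            /* becomes "abc" */
}

const char *spaced(void)
{
    return STRFY(  a    b  );     /* becomes "a b" */
}

const char *nested(void)
{
    return STRFY("x");            /* becomes "\"x\"" */
}
```

In every case the argument is a sequence of complete tokens, which is why this works where the unterminated STR_BEG and STR_END definitions do not.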
One possible solution to your actual problem might look like this:
#define LPR (
#define start STRFY LPR
#define end )
#define STRFY(A) #A
#define ID(...) __VA_ARGS__
ID(
char * s = start()()()end; // -> char * s = "()()()";
)
The ID wrapper is necessary, though. There's no way to do it without that (it can go around any number of lines, or even your whole program, but it must exist for reasons that are well-covered in other questions).