With the following code, what's the output of this code, and why?
#include <stdio.h>
int main() {
printf("Hello world\n"); // \\
printf("What's the meaning of this?");
return 0;
}
The backslash at the end of the 4th line is escaping the following new line so that they become one continuous line. And because we can see the // beginning a comment, the 5th line is commented out.
That is, your code is the equivalent of:
#include <stdio.h>
int main() {
printf("Hello world\n"); // \printf("What's the meaning of this?");
return 0;
}
The output is simply "Hello world" with a new line.
Edit: As Erik and pmg both said, this is true in C99 but not C89. Credit where credit is due.
It is defined in the 2nd phase of translation (ISO/IEC 9899:1999 §5.1.1.2):
Each instance of a backslash character (\) immediately followed by a new-line
character is deleted, splicing physical source lines to form logical source lines.
It's "Hello world\n". Didn't you try? Line continuation (and e.g trigraphs) are well documented, look it up. A syntax highlighting editor (e.g. Visual Studio with VA X) will make this obvious.
Note that this works in C99 and C++ - not C89
The trailing backslash causes the next line to be 'spliced' to the line that ends inthe backslash - even if it's part of a comment. This is nearly always unintentional (unless it's a deliberate obfuscation trick), and will cause a bug unless the next line is entirely whitespace or a comment itself.
This happens because 'line-splicing' occurs in translation phase 2, while removing comments happens in phase 3.
Newer compilers will warn about the single-line comment being continued to the next line (I'm not sure exactly what warning level might be required though):
GCC 4.5.1 (MinGW)
C:\temp\test.c:4:34: warning: multi-line comment
MSVC 9 (VS 2008) or 10 (VS 2010):
C:\temp\test.c(5) : warning C4010: single-line comment contains line-continuation character
C:\temp\test.c:4:34: warning: multi-line comment
C:\temp\test.c(5) : warning C4010: single-line comment contains line-continuation character
Related
This question already has answers here:
In which step of compilation are comments removed?
(2 answers)
Closed 5 years ago.
Consider this (horrible, terrible, no good, very bad) code structure:
#define foo(x) // commented out debugging code
// Misformatted to not obscure the point
if (a)
foo(a);
bar(a);
I've seen two compilers' preprocessors generate different results on this code:
if (a)
bar(a);
and
if (a)
;
bar(a);
Obviously, this is a bad thing for a portable code base.
My question: What is the preprocessor supposed to do with this? Elide comments first, or expand macros first?
Unfortunately, the original ANSI C Specification specifically excludes any Preprocessor features in section 4 ("This specification describes only the C language. It makes no provision for either the library or the preprocessor.").
The C99 specification handles this explicity, though. The comments are replaced with a single space in the "translation phase", which happens prior to the Preprocessing directive parsing. (Section 6.10 for details).
VC++ and the GNU C Compiler both follow this paradigm - other compilers may not be compliant if they're older, but if it's C99 compliant, you should be safe.
As described in this copy-n-pasted decription of the translation phases in the C99 standard, removing comments (they are replaced by a single whitespace) occurs in translation phase 3, while preprocessing directives are handled and macros are expanded in phase 4.
In the C90 standard (which I only have in hard copy, so no copy-n-paste) these two phases occur in the same order, though the description of the translation phases is slightly different in some details from the C99 standard - the fact that comments are removed and replaced by a single whitespace character before preprocessing directives are handled and macros expanded is not different.
Again, the C++ standard has these 2 phases occur in the same order.
As far as how the '//' comments should be handled, the C99 standard says this (6.4.9/2):
Except within a character constant, a string literal, or a comment, the characters //
introduce a comment that includes all multibyte characters up to, but not including, the
next new-line character.
And the C++ standard says (2.7):
The characters // start a comment, which terminates with the next newline
character.
So your first example is clearly an error on the part of that translator - the ';' character after the foo(a) should be retained when the foo() macro is expanded - the comment characters should not be part of the 'contents' of the foo() macro.
But since you're faced with a buggy translator, you might want to change the macro definition to:
#define foo(x) /* junk */
to workaround the bug.
However (and I'm drifting off topic here...), since line splicing (backslashes just before a new-line) occurs before comments are processed, you can run into something like this bit of nasty code:
#define evil( x) printf( "hello "); // hi there, \
printf( "%s\n", x); // you!
int main( int argc, char** argv)
{
evil( "bastard");
return 0;
}
Which might surprise whoever wrote it.
Or even better, try the following, written by someone (certainly not me!) who likes box-style comments:
int main( int argc, char** argv)
{
//----------------/
printf( "hello "); // Hey, what the??/
printf( "%s\n", "you"); // heck?? /
//----------------/
return 0;
}
Depending on whether your compiler defaults to processing trigraphs or not (compilers are supposed to, but since trigraphs surprise nearly everyone who runs across them, some compilers decide to turn them off by default), you may or may not get the behavior you want - whatever behavior that is, of course.
According to MSDN, comments are replaced with a single space in the tokenization phase,
which happens before the preprocessing phase where macros are expanded.
Never put // comments in your macros. If you must put comments, use /* */. In addition, you have a mistake in your macro:
#define foo(x) do { } while(0) /* junk */
This way, foo is always safe to use. For example:
if (some condition)
foo(x);
will never throw a compiler error regardless of whether or not foo is defined to some expression.
#ifdef _TEST_
#define _cerr cerr
#else
#define _cerr / ## / cerr
#endif
will work on some compilers (VC++). When _TEST_ is not defined,
_cerr ...
will be replaced by the comment line
// cerr ...
I seem to recall that compliance requires three steps:
strip
expand macros
strip again
The reason for this has to do with the compiler being able to accept .i files directly.
I am reading the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie(2nd edition, published by PHI). In the first article 1.1 Getting started of the first chapter A Tutorial Introduction, page number 7, they say that one must use \n in the printf() argument, otherwise the C compile will produce an error message. But when I compiled the program without \n in printf(), it went fine. I did not see any error message. I am using Dev-C portable with "MinGW GCC 4.6.2 32-bit" compiler.
Why I do not get the error message?
Here is the passage in question, from page 7 of the second edition of K&R:
You must use \n to include a newline character in the printf argument; if you try something like
printf("hello, world
");
the C compiler will produce an error message.
This means that you can't embed a literal newline in a quoted string.
Either one of the lines below, however, are fine:
printf("hello, world"); /* does not print a newline */
printf("hello, world\n"); /* prints a newline */
All the text above is saying is that you can't have a quoted string that spans multiple lines in the source code.
You can also escape a newline with a backslash. The C preprocessor will remove the backslash and newline, so the following two statements are equivalent:
printf("hello, world\
");
printf("hello, world");
And if you have a lot of text, you can put multiple quoted strings next to each other, or separated by whitespace, and the compiler will join them for you:
printf("hello, world\n"
"this is a second line of text\n"
"but you still need to include backslash-n to break each line\n");
You don't get a compile-time error message because there is no error.
In the first article they say that one must use \n in the printf() argument, otherwise the C compiler will produce an error message.
Can you cite (by section and/or page number) where that statement appears? I seriously do not believe that K&R (you're using the second edition, right?) says that. If it did say that, it would be an error in the book.
Update: What the book says, quite correctly, is that a newline in a string literal is represented by the two-character sequence \n, not by an actual newline character. A string literal must be on a single logical source line; something like
printf("hello
world");
is a syntax error. This applies to all string literals, whether they're printf format strings or not.
An actual newline in a string literal is an error. A \n sequence that represents a newline is optional; its lack is not an error, but a printf format string should usually end with a \n.
There is no requirement for a printf call to include the \n character, and I've never seen a compiler complain about a printf that lacks a \n.
There is an issue here, but it's not a compile-time error.
Some examples:
printf("No newline");
This is a perfectly legal call. It prints the specified string on standard output without a newline character.
printf("hello%c", '\n');
There's no \n in the format string, but it prints hello followed by a newline. Again, this is perfectly legal.
The actual issue is that you should (almost) always print a newline at the very end of your output. This complete program:
#include <stdio.h>
int main(void) {
printf("hello");
return 0;
}
is legal, but its behavior may be undefined in some implementations. The relevant rule is in the standard, section 7.21.2 paragraph 2 (the quote is from the N1570 draft):
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined.
Whether that terminating newline character is required or not, it's (almost always) a very good idea to end your output with a newline. If I run it on my system, I get the string hello immediately followed by my shell prompt on the same line. It's not illegal, but it's inconvenient and ugly.
But that applies only at the very end of the program's output. This program is perfectly valid and has well defined behavior:
#include <stdio.h>
int main(void) {
printf("hello");
putchar('\n');
return 0;
}
Still, the easiest and most reliable way to produce clean output is for each printf call to print exactly one line, which ends with exactly one '\n' character. This isn't a universal rule; sometimes it's convenient to print a line a piece at a time, or to print two or more lines in a single printf.
Very often, if you don't end your printf format string with a \n, some of the output stays in the stdout buffer, and you need to call fflush to get all the output shown.
This means that if you don't get all the expected output you should add fflush at appropriate places (e.g. before calls to fork).
But you won't get a compiler message in such case, because it is not an error (it may be a mistake many beginners are doing). If you really wanted, you could customize your compiler (e.g. with MELT if using a recent GCC compiler) to get the warning. I believe it is not worth the effort (because there are legitimate calls to printf without any \n....)
An example of legitimate printf calls without newlines would be if you coded a (recursive) function to output an expression from its AST; you certainly should not emit a newline after each token.
See documentation of printf(3), fflush(3), stdio(3), setvbuf(3) etc...
int main(void)
{
#if 0
something"
#endif
return 0;
}
A simple program above generates a warning: missing terminating " character in gcc. This seems odd, because it means that the compiler allow the code blocks between #if 0 and endif have invalid statement like something here, but not double quotes " that don't pair. The same happens in the use of #ifdef and #ifndef.
Real comments are fine here:
int main(void)
{
/*
something"
*/
return 0;
}
Why? And the single quote ' behave similarly, is there any other tokens that are treating specially?
See the comp.Lang.c FAQ, 11.19:
Under ANSI C, the text inside a "turned off" #if, #ifdef, or #ifndef must still consist of "valid preprocessing tokens." This means that the characters " and ' must each be paired just as in real C code, and the pairs mustn't cross line boundaries.
Compilation needs to go through many cycles, before generating executable binary.
You are not in the compiler yet. Your pre-processor is flagging this error. This will not check for C language syntax, but missing quotes, braces and things like that are pre-processor errors.
After this pre-processor pass, Your code will go to the C Compiler which will detect the error you are expecting...
The preprocessor works at the token level, and a string literal is considered a single token. The preprocessor is warning you that you have an invalid token.
According to the C99 standard, a preprocessing token is one of these things:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the
above
The standard also says:
If a ' or a " character matches the last category, the behavior is
undefined.
Things like "statement" above are invalid to the C compiler, but it is a valid token, and the preprocessor eliminates this token before it gets to the compiler.
Beside the Kevin's answer, Incompatibilities of GCC says:
GCC complains about unterminated character constants inside of preprocessing conditionals that fail. Some programs have English comments enclosed in conditionals that are guaranteed to fail; if these comments contain apostrophes, GCC will probably report an error. For example, this code would produce an error:
#if 0
You can't expect this to work.
#endif
The best solution to such a problem is to put the text into an actual C comment delimited by /*...*/.
int main(void)
{
#if 0
something"
#endif
return 0;
}
A simple program above generates a warning: missing terminating " character in gcc. This seems odd, because it means that the compiler allow the code blocks between #if 0 and endif have invalid statement like something here, but not double quotes " that don't pair. The same happens in the use of #ifdef and #ifndef.
Real comments are fine here:
int main(void)
{
/*
something"
*/
return 0;
}
Why? And the single quote ' behave similarly, is there any other tokens that are treating specially?
See the comp.Lang.c FAQ, 11.19:
Under ANSI C, the text inside a "turned off" #if, #ifdef, or #ifndef must still consist of "valid preprocessing tokens." This means that the characters " and ' must each be paired just as in real C code, and the pairs mustn't cross line boundaries.
Compilation needs to go through many cycles, before generating executable binary.
You are not in the compiler yet. Your pre-processor is flagging this error. This will not check for C language syntax, but missing quotes, braces and things like that are pre-processor errors.
After this pre-processor pass, Your code will go to the C Compiler which will detect the error you are expecting...
The preprocessor works at the token level, and a string literal is considered a single token. The preprocessor is warning you that you have an invalid token.
According to the C99 standard, a preprocessing token is one of these things:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the
above
The standard also says:
If a ' or a " character matches the last category, the behavior is
undefined.
Things like "statement" above are invalid to the C compiler, but it is a valid token, and the preprocessor eliminates this token before it gets to the compiler.
Beside the Kevin's answer, Incompatibilities of GCC says:
GCC complains about unterminated character constants inside of preprocessing conditionals that fail. Some programs have English comments enclosed in conditionals that are guaranteed to fail; if these comments contain apostrophes, GCC will probably report an error. For example, this code would produce an error:
#if 0
You can't expect this to work.
#endif
The best solution to such a problem is to put the text into an actual C comment delimited by /*...*/.
This question already has answers here:
In which step of compilation are comments removed?
(2 answers)
Closed 5 years ago.
Consider this (horrible, terrible, no good, very bad) code structure:
#define foo(x) // commented out debugging code
// Misformatted to not obscure the point
if (a)
foo(a);
bar(a);
I've seen two compilers' preprocessors generate different results on this code:
if (a)
bar(a);
and
if (a)
;
bar(a);
Obviously, this is a bad thing for a portable code base.
My question: What is the preprocessor supposed to do with this? Elide comments first, or expand macros first?
Unfortunately, the original ANSI C Specification specifically excludes any Preprocessor features in section 4 ("This specification describes only the C language. It makes no provision for either the library or the preprocessor.").
The C99 specification handles this explicity, though. The comments are replaced with a single space in the "translation phase", which happens prior to the Preprocessing directive parsing. (Section 6.10 for details).
VC++ and the GNU C Compiler both follow this paradigm - other compilers may not be compliant if they're older, but if it's C99 compliant, you should be safe.
As described in this copy-n-pasted decription of the translation phases in the C99 standard, removing comments (they are replaced by a single whitespace) occurs in translation phase 3, while preprocessing directives are handled and macros are expanded in phase 4.
In the C90 standard (which I only have in hard copy, so no copy-n-paste) these two phases occur in the same order, though the description of the translation phases is slightly different in some details from the C99 standard - the fact that comments are removed and replaced by a single whitespace character before preprocessing directives are handled and macros expanded is not different.
Again, the C++ standard has these 2 phases occur in the same order.
As far as how the '//' comments should be handled, the C99 standard says this (6.4.9/2):
Except within a character constant, a string literal, or a comment, the characters //
introduce a comment that includes all multibyte characters up to, but not including, the
next new-line character.
And the C++ standard says (2.7):
The characters // start a comment, which terminates with the next newline
character.
So your first example is clearly an error on the part of that translator - the ';' character after the foo(a) should be retained when the foo() macro is expanded - the comment characters should not be part of the 'contents' of the foo() macro.
But since you're faced with a buggy translator, you might want to change the macro definition to:
#define foo(x) /* junk */
to workaround the bug.
However (and I'm drifting off topic here...), since line splicing (backslashes just before a new-line) occurs before comments are processed, you can run into something like this bit of nasty code:
#define evil( x) printf( "hello "); // hi there, \
printf( "%s\n", x); // you!
int main( int argc, char** argv)
{
evil( "bastard");
return 0;
}
Which might surprise whoever wrote it.
Or even better, try the following, written by someone (certainly not me!) who likes box-style comments:
int main( int argc, char** argv)
{
//----------------/
printf( "hello "); // Hey, what the??/
printf( "%s\n", "you"); // heck?? /
//----------------/
return 0;
}
Depending on whether your compiler defaults to processing trigraphs or not (compilers are supposed to, but since trigraphs surprise nearly everyone who runs across them, some compilers decide to turn them off by default), you may or may not get the behavior you want - whatever behavior that is, of course.
According to MSDN, comments are replaced with a single space in the tokenization phase,
which happens before the preprocessing phase where macros are expanded.
Never put // comments in your macros. If you must put comments, use /* */. In addition, you have a mistake in your macro:
#define foo(x) do { } while(0) /* junk */
This way, foo is always safe to use. For example:
if (some condition)
foo(x);
will never throw a compiler error regardless of whether or not foo is defined to some expression.
#ifdef _TEST_
#define _cerr cerr
#else
#define _cerr / ## / cerr
#endif
will work on some compilers (VC++). When _TEST_ is not defined,
_cerr ...
will be replaced by the comment line
// cerr ...
I seem to recall that compliance requires three steps:
strip
expand macros
strip again
The reason for this has to do with the compiler being able to accept .i files directly.