What is the meaning of multi-line comment warnings in C? - c

I'm working on a C file for a homework assignment and I thought it might help the graders if I made my answers visible like so:
//**********|ANSWER|************\\
//blah blah blah, answering the
//questions, etc etc
and found when compiling with gcc that those backslash characters at the end of the first line seemed to be triggering a "multi-line comment" warning. When I removed them, the warning disappeared. So my question is twofold:
a) how exactly does the presence of the backslash characters make it a "multi-line comment", and
b) why would a multi-line comment be a problem anyway?

C (since the 1999 standard) has two forms of comments.
Old-style comments are introduced by /* and terminated by */, and can span a portion of a line, a complete line, or multiple lines.
C++-style comments are introduced by // and terminated by the end of the line.
But a backslash at the end of a line causes that line to be spliced to the next line. So you can legally introduce a comment with //, put a backslash at the end of the line, and cause the comment to span multiple physical lines (but only one logical line).
That's what you're doing on your first line:
//**********|ANSWER|************\\
Just use something other than backslash at the end of the line, for example:
//**********|ANSWER|************//
Though even that is potentially misleading, since it almost looks like an old-style /* .. */ comment. You might consider something a little simpler:
/////////// |ANSWER| ////////////
or:
/**********|ANSWER|************/

The compiler is simply telling you that you might have inadvertently commented out the next line of code by ending the previous comment line with \, which is the line-continuation character in C. This causes the second line to be concatenated with the first, which in turn makes the // comment cover both original lines. In your case it is not a problem, since the next line is a comment as well.
But if the next line was not intended to be a comment, then you might have ended up with "weird behavior": the compiler ignoring the second line for no apparent reason. The situation is often complicated by the fact that some syntax-highlighting code editors do not detect this situation and fail to highlight the next line as a comment.
Generally, for this specific reason, it is not a good idea to abuse the \ character at the code level. Use it only if you really have to, i.e. only if you really want to stitch several lines into one.
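A minimal sketch of that failure mode, with a function name made up for illustration. The trailing backslash splices the assignment on the next physical line into the // comment, so it never runs. With GCC this should trigger the "multi-line comment" warning whenever -Wcomment is active.

```c
/* Hypothetical demo: the assignment after the comment never executes. */
int value_after_comment(void)
{
    int x = 1;
    // bump x to the new default \
    x = 2;
    return x;   /* still 1: "x = 2;" became part of the // comment */
}
```

Removing the backslash (or switching to a /* ... */ comment) makes the assignment live again.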

Nobody asked, but this is the top result on Google, so:
This specific warning can be suppressed with the -Wno-comment option.

a) how exactly does the presence of the backslash characters make it a "multi-line comment", and
A backslash as the last character on a line means that the compiler should discard the backslash and the newline character, and it does this before it looks for comments. So before removing comments it effectively sees
//**********|ANSWER|************\//blah blah blah, answering the
//questions, etc etc
It now sees the // at the start and ignores the rest of the (spliced) line.
b) why would a multi-line comment be a problem anyway?
In your example it isn't since the second line is a comment anyway, but what if you had written something useful on the second line?
Well since you asked question "a" it's likely that you didn't realize that the compiler behaved this way, and if you don't realize that you've commented out a line of code, then it's quite nice of the compiler to warn you.
Another reason is that, even if you had known this, an editor normally does not show whitespace, so it's easy to miss whether the backslash really is the last character on the line. For example:
int i = 42;
// backslash+space: \
i++;
// backslash and no space: \
i--;
printf("%d\n", i);
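Under the standard's rules this would result in 43, since the i-- is commented out but the i++ isn't (because the last character on that line is a space, not the backslash). Note that GCC, as an extension, still treats a backslash followed by whitespace as a continuation, so there both lines may end up commented out.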

That will comment the line below it as well. If you want to do that all on one line without a warning try
/* // Bla \\ */

Related

Misplaced preprocessor character '\'

I'm trying to get a bunch of C modules written in 1994 for a Panasonic 3DO lib to compile with armcc. I've run into an error which I'm kind of confused about. My knowledge of C is not that deep, so perhaps one of you would be so kind as to help me figure this out:
#define DS_MSG_HEADER \
long whatToDo; /* opcode determining msg contents */ \
Item msgItem; /* message item for sending this buffer */ \
void* privatePtr; /* ptr to sender's private data */ \
void* link /* user defined -- for linking msg into lists */
The \ character is used in many include files in this library. I'm unfamiliar with this syntax, and the ARM compiler seems to hate it:
Serious error: misplaced preprocessor character '\'
If you know why these \ characters are being used, could you please explain? (Sorry if it's a noob question.) Also, is there an alternative way to write this so the compiler is happy?
This error is shown (among other reasons) if the backslash '\' is not the last character on the line.
I can think of two reasons:
1. Somehow you got at least one whitespace character (space, tab) after the backslash. I never had this problem.
2. The source is stored with Windows-style end-of-line markers, that is '\r' and '\n' ("carriage return" and "line feed"), and you are trying to compile it on a Unix-like system (Linux?) or with a compiler that expects Unix-style end-of-line markers, that is only '\n' ("line feed"). (Or the other way around.) This is quite a common problem that hits me time after time.
In any case, open the source in a capable editor and enable the display of invisible characters, commonly an option with this icon: ¶. Check for whitespace after the backslashes, then check the end-of-line encoding, and save with the appropriate one.
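If no such editor is at hand, both cases can also be checked programmatically. The following is only a sketch: has_broken_continuation is a hypothetical helper, not part of any toolchain, and it assumes that a stray carriage return should be flagged the same way as a space or tab.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `line` has a backslash that is followed
 * only by spaces, tabs, or carriage returns before the end of the line
 * (an optional '\n'), i.e. a continuation that is probably broken.
 * Returns 0 for a clean backslash-newline or when there is no such backslash.
 */
int has_broken_continuation(const char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        len--;                      /* ignore the line feed itself */

    size_t end = len;
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t' ||
                       line[end - 1] == '\r'))
        end--;                      /* skip trailing blanks and CRs */

    /* Broken only if something was skipped and a backslash precedes it. */
    return end < len && end > 0 && line[end - 1] == '\\';
}
```

Feeding each line of a header through this (for example from an fgets loop) would point at the lines the compiler is likely to choke on.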

Is it 100% safe to strip trailing whitespace from .c/.h source files?

I'd like to automate removal of all trailing whitespace from .c and .h files, to reduce garbage that causes merge conflicts in git history, etc.
Is there any conceivable way this could change the output of the compilation stage? Or is it perfectly safe to do this automatically?
The only case I can think of where this could change the meaning is if there's a macro that ends with backslash followed by spaces:
#define FOO bar\<space>
where <space> represents a space character. Before trimming, the backslash escapes the space, which I don't think has any effect. But when you remove the space it escapes the newline, so the next line will become part of the expansion.
Since there's no reason to write an escaped space like that, this seems like a very unlikely problem. In fact, if there's code that looks like this, I think it's more likely that they intended to write a multi-line expansion, and the space was added by accident.
Outside macros and string literals, all sequences of whitespace are treated as a single space, and there's no difference between spaces and newlines.
UPDATE:
This case isn't actually valid. C doesn't allow escape sequences outside string or character literals, and only newlines can be escaped with backslash. GCC has an extension to treat this as an escaped newline (in case the programmer made the mistake I describe above), and it prints a warning when it's doing it. So removing the spaces will produce the same result but get rid of the warning.
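For what it's worth, the stripping step itself is small. Here is a sketch of a per-line version; strip_trailing_ws is a made-up name, and treating '\r' as trailing whitespace is an assumption you may not want if the files are meant to keep CRLF line endings.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: removes trailing spaces, tabs and carriage returns
 * from `line` in place, keeping a final '\n' if there was one.
 * Returns the number of bytes removed.
 */
size_t strip_trailing_ws(char *line)
{
    size_t len = strlen(line);
    int had_newline = len > 0 && line[len - 1] == '\n';
    size_t body = had_newline ? len - 1 : len;  /* length without '\n' */
    size_t end = body;

    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t' ||
                       line[end - 1] == '\r'))
        end--;

    size_t removed = body - end;
    if (had_newline)
        line[end++] = '\n';                     /* put the newline back */
    line[end] = '\0';
    return removed;
}
```

A line such as "#define FOO bar\" followed by a space comes out ending in a plain backslash-newline, which is the case discussed above.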

Using \ to extend single-line comments

I just noticed that I can use \ to extend the single-line comment to the next line, similarly to doing so in pre-processor directives.
Why is nobody talking about this language feature?
I didn't even see it in books.
What language version supports this?
It's part of C; it is called line splicing.
The K&R book talks about it:
Lines that end with the backslash character \ are folded by deleting the backslash and the
following newline character. This occurs before division into tokens.
This occurs in the preprocessing phase.
So a single-line comment can be made to span several physical lines, like this:
//This is \
still a single line comment
The same works inside string literals:
char str[]="Hello \
world. This is \
a string";
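One detail worth seeing in code: whatever indentation the continued line has ends up inside the string, so the continuation has to start at column 0. The sketch below (function names are made up) compares that with adjacent string literals, which the compiler concatenates in a later phase.

```c
/* Backslash-newline inside a literal: the next line must start at column 0,
   otherwise its leading spaces become part of the string. */
const char *spliced_literal(void)
{
    return "Hello \
world";
}

/* Adjacent literals are concatenated by the compiler, so each piece can be
   indented freely. */
const char *adjacent_literals(void)
{
    return "Hello "
           "world";
}
```

Both functions return the same text, "Hello world".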
Edit: As noted in the comments, single line comments were not there in ANSI C but were introduced as part of the standard in C99 though many compilers already supported it.
From C99,
Except within a character constant, a string literal, or a comment, the characters // introduce a comment that includes all multibyte characters up to, but not including, the next new-line character. The contents of such a comment are examined only to identify multibyte characters and to find the terminating new-line character.
As far as line splicing is concerned, it is specified in C89 itself
2.1.1.2 Translation phases
Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character.
Look at KamiKaze's answer to see the relevant part of C99.
While it's true that a \ will effectively escape the newline at the end of a single-line comment, splicing the line with the following one (just as it does on any other line), you could claim that this is a bug in the Standard. At any rate, the situation is fantastically confusing. You might believe that both of these facts are true:
The single-line comment syntax // turns the rest of the line, up to the next newline, into a comment, which is not interpreted in any way, i.e. is ignored.
At the end of any line, a \ character eliminates the newline and splices the line to the following line.
But these two rules are basically in conflict; it looks like they can't both be true at the same time.
Now in fact, by definition, the second rule "wins", and the first rule really has to say that the rest of the line is not interpreted in any way except to check whether the last character is a \, in which case it retains its line-splicing meaning.
(Now, if you're a compiler writer or a language lawyer, of course, you don't think about it that way. If you're a compiler writer or language lawyer, you know that the \ was processed during an earlier phase of compilation, before comments are parsed, meaning that the first rule is perfectly true as stated. But most people don't think like compiler writers and language lawyers.)
My point is that this situation is basically fraught with peril. I would bet good money that there are compilers or other language processors out there that get this wrong. I would urge any sane programmer not to rely on this, not to put a \ at the end of any line that contains a single-line comment. (And if I were writing a compiler or other language processor, I'd try to warn about this.)
This is not a feature of comments but a general feature of the language, as it applies to all newline-characters.
The following is found in the C99 standard:
5.1.1.2 Translation phases
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.
So it is standard compliant for C99 at least.
It is not much talked about, because the relevant usecases (except for large macros and strings) are quite rare. If you need a multiline comment (the standard comment in C, // was added from C++ later on), you could just use
/* multi
line
comment
*/
Every use except for large macros and strings will make the code harder to read and might even make it quite confusing. So generally it is not used except for the mentioned niches.
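To make the quoted rule concrete, here is a rough sketch of the splicing step as a plain string transformation. splice_lines is a hypothetical function, not how any real compiler does it, and it ignores trigraphs and CRLF line endings.

```c
#include <stddef.h>

/*
 * Sketch of the splicing phase: copies `src` to `dst`, deleting every
 * backslash that is immediately followed by a new-line, together with
 * that new-line. `dst` must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written (excluding the terminator).
 */
size_t splice_lines(const char *src, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\\' && src[i + 1] == '\n') {
            i++;                /* skip the new-line as well */
            continue;
        }
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```

Given a line ending in two backslashes, only the last one is removed, which matches "only the last backslash on any physical source line" in the quote.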
Every instance of \ followed by a newline is removed from the source during an early translation phase (phase 2 in the standard), before tokenisation and comment handling.
As a consequence, a single line comment can be extended to the next line of source code by escaping this newline with a \ (or a ??/ trigraph sequence):
// this is a single \
line comment
Note how the stackoverflow code highlighter is fooled by this trick and does not colorize the end of the comment line.
This feature can be further abused to make really weird looking comments:
/\
/\ This is a single line comment /\
\/ \/
/\
*\ This is a multi-line comment
*\
/
Any token can be broken in pieces this way. Check this corner case:
\
r\
et\
urn\
0x7\
ffff;\

evaluate expression with backslash in middle of statement

Suppose in C, I have the following code:
i=5 \
+6;
If I print i, it gives me 11.
I do not understand how the above code executes correctly. At first glance, I guessed it to be compiler error because of unrecognized token \. Can somebody explain the logic? Is it related to maximal munch logic?
A backslash at the end of a line tells the compiler to ignore the new-line character.
It is a way of formatting lines to be readable for humans without interrupting the source text. E.g., if you have a long string enclosed in quotation marks, you can use a backslash to continue the string on a new line without inserting a new-line character in the string.
(This was more useful before the C standard added the property that adjacent strings, such as "abc" "def", are concatenated. Now you can put strings on consecutive lines, and they will be concatenated. Prior to that, you had to use the backslash to do it.)
Nowadays the most common use of the backslash is, as heretolearn points out, to continue preprocessor macro definitions. Unlike regular C statements, preprocessor statements must be on a single line. However, some preprocessor macro definitions are quite long. To format them (somewhat) nicely, a definition is spread over multiple physical lines, but the backslash makes them into one line for the compiler (including the preprocessor).
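As an illustration of that macro use, here is a small sketch; SWAP_INT and swap_demo are names invented for this example. Every physical line of the definition except the last ends with a backslash and nothing after it.

```c
/* One logical #define spread over several physical lines. */
#define SWAP_INT(a, b)          \
    do {                        \
        int swap_tmp_ = (a);    \
        (a) = (b);              \
        (b) = swap_tmp_;        \
    } while (0)

int swap_demo(void)
{
    int x = 1, y = 2;
    SWAP_INT(x, y);
    return x * 10 + y;          /* 21 after the swap */
}
```

The do { ... } while (0) wrapper is a common idiom for making the macro behave like a single statement; it is not required by the line continuation itself.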
A backslash followed by a new-line character is completely removed from the source text by the compiler, unlike a new-line character by itself. So the source text:
abc\
def
is equivalent to the single identifier abcdef, not abc def. You can use it in the middle of any operator or other language construction except trigraph sequences (trigraph sequences, such as ??=, are converted to replacement characters, such as #, before the backslash-new-line processing):
MyStructureVariable-\
>MemberName
IncrementMe+\
+
However, do not do that. Use it reasonably.
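Purely to show that it really compiles, here is a sketch with an identifier and an operator each split across two physical lines (the names are made up). The continuation lines must start at column 0, since any leading whitespace would separate the pieces into different tokens.

```c
/* The identifier abcdef, written across two physical lines. */
static int abc\
def = 41;

int spliced_tokens(void)
{
    int n = abcdef;
    n+\
+;                      /* spliced into n++; */
    return n;           /* 42 */
}
```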
The practice of escaping the newline at the end of a line is indeed to mark the continuation of the statement onto the next line. Apparently that was needed in old C compilers. There is only one place where I'm sure it is still needed, and that is in function-like macro definitions, something that is generally frowned upon in C++.
A continued line is a line which ends with a backslash, '\'. The
backslash is removed and the following line is joined with the current
one. No space is inserted, so you may split a line anywhere, even in
the middle of a word. (It is generally more readable to split lines
only at white space.)
The trailing backslash on a continued line is commonly referred to as
a backslash-newline.
If there is white space between a backslash and the end of a line,
that is still a continued line. However, as this is usually the result
of an editing mistake, and many compilers will not accept it as a
continued line, GCC will warn you about it.
Reference

What does '\' actually do in C?

As far as I know \ in C just appends the next line as if there was not a line break.
Consider the following code:
main(){\
return 0;
}
When I saw the pre-processed code(gcc -E) it shows
main(){return
0;
}
and not
main(){return 0;
}
What is the reason for this kind of behaviour? Also, how can I get the code I expected?
Yes, your expected result is the one required by the C and C++ standards. The backslash simply escapes the newline, i.e. the backslash-newline sequence is deleted.
GCC 4.2.1 from my OS X installation gives the expected result, as does Clang. Furthermore, adding a #define to the beginning and testing with
#define main(){\
return 0;
}
main()
yields the correct result
}
{return 0;
Perhaps gcc -E does some extra processing after preprocessing and before outputting it. In any case, the line break seen by the rest of the preprocessor seems to be in the right place. So it's a cosmetic bug.
UPDATE: According to the GCC FAQ, -E (or the default setting of the cpp command) attempts to put output tokens in roughly the same visual location as input tokens. To get "raw" output, specify -P as well. This fixes the observed issues.
Probably what happened:
In preserving visual appearance, tokens not separated by spaces are kept together.
Line splicing happens before spaces are identified for the above.
The { and return tokens are grouped into the same visual block.
0 follows a space and its location on the next line is duly noted.
PLUG: If this is really important to you, I have implemented my own preprocessor with correct implementation of both raw-preprocessed and whitespace-preserving "pretty" modes. Following this discussion I added line splices to the preserved whitespace. It's not really intended as a standalone tool, though. It's a testbed for a compiler framework which happens to be a fully compliant C++11 preprocessor library, which happens to have a miniature command-line driver. (The error messages are on par with GCC, or Clang, sans color, though.)
From K&R section A.12 Preprocessing:
A.12.2 Line Splicing
Lines that end with the backslash character \ are
folded by deleting the backslash and the following newline character.
This occurs before division into tokens.
It doesn't matter :/ The tokenizer will not see any difference. 1
Update In response to the comments:
There seems to be a fair amount of confusion as to what the expected output of the preprocessor should be. My point is that the expectation /seems/ reasonable at a glance but doesn't actually need to be specified in this way for the output to be valid. The amount of whitespace present in the output is simply irrelevant to the parser. What matters is that the preprocessor should treat the continued line as one line while interpreting it.
In other words: the preprocessor is not a text transformation tool, it's a token manipulation tool.
If it matters to you, you're probably
using the preprocessor for something other than C/C++
treating C++ code as text, which is a ... code smell. (libclang and various less complete parser libraries come to mind).
1 (The preprocessor is free to achieve the specified result in whichever way it sees fit. The result you are seeing is possibly the most efficient way the implementors have found to implement this particular transformation)
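To illustrate the token-oriented view, here is a small sketch using the # stringification operator; STR and both function names are made up for this example. The amount of whitespace between tokens does not survive, and a line splice in the middle of a word still yields a single token.

```c
#define STR(x) #x

/* Runs of white space between tokens collapse to one space when stringified. */
const char *stringify_spaced(void)
{
    return STR(a   +   b);      /* "a + b" */
}

/* The splice happens before tokenization, so this is the single token `return`. */
const char *stringify_spliced(void)
{
    return STR(ret\
urn   0);                       /* "return 0" */
}
```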

Resources