Inserting a one-line line comment with a preprocessor macro - c

Is it possible to simulate a one-line comment (//) using a preprocessor macro (or magic)? For example, can this compile with gcc -std=c99?
#define LINE_COMMENT() ???
int main() {
LINE_COMMENT() asd(*&##)($*?><?><":}{)(#
return 0;
}

No. Here is an extract from the standard showing the phases of translation of a C program:
The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
As you can see, comments are removed before macros are expanded, so a macro cannot expand into a comment.
You can obviously define a macro that takes an argument and expands to nothing, but it's slightly more restrictive than a comment, as its argument must consist only of valid preprocessor token characters (e.g. no # or unmatched quotes). Not very useful for general commenting purposes.

No. Comments are processed at preprocessor phase. You can do selective compilation (without regard to comments) with #if directives, as in:
#if 0
... // this stuff will not be compiled
...
#endif // up to here.
that's all the magic you can do with the limited macro preprocessor available in C/C++.

Related

C translation phase 4

Recently I encountered the following issue. My implementation was looking like this:
#define MY_CODE_VERSION PROJ_VERSION
#include "project.h"
if (3 != MY_CODE_VERSION)
PROJ_VERSION was defined in project.h. Why didn't I get a compilaton warning/error? Because I was trying to define something on a macro that was not known by the time the compiler was reaching the line #define MY_CODE_VERSION PROJ_VERSION.
I took a look over these phases from ANSI C but I can't figure it out the reason (the actual behaviour of the compiler, at which phase MY_CODE_VERSION takes the value of PROJ_VERSION).
My assuption is that this replacement takes place only at line "#if (3 != MY_CODE_VERSION)" and by this time PROJ_VERSION is already known by the compiler from the inclusion of project.h above.
Thank you in advance
I'll not hash out what you already know. What you apparently did not know:
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing
tokens are removed. Then, the resulting preprocessing token sequence
is rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced. These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing token
would otherwise have been replaced.
The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it
resembles one, but all pragma unary operator expressions within it are
then processed as specified in 6.10.9 below.
In short, once a macro has been expanded and all the string-izer and concatenations have been performed, the resulting "thing" is scanned once again for more stuff to replace. If the same name is found, it is not replaced.
So what you're seeing is standard-defined.

C Preprocessor: Dynamic #Define Creation

I would like to have the expansion of these C preprocessor lines:
#define _POUND_ #define
_POUND_ _FALSE 0
_FALSE
expand so the last line (i.e. _FALSE) expands to 0. I understand recursive CPP isn't possible directly but that it can be done. Unfortunately, I'm not fully sure I follow the logic presented in this link.
I think I need to force an additional evaluation but I don't know how to do that in this case (i.e. I have tried and failed).
Can you help?
As indicated several times over in comments, what you are looking for is not supported. Here's what the standard has to say about it:
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character.
(C2011, 6.10/2; emphasis added)
Translation phase 4 is the one in which preprocessing directives are executed, so it follows that macro expansion during phase 4 cannot cause bona fide preprocessing directives to be created. Macros can be expanded to text that has the form of a preprocessing directive, but such text cannot actually be a directive.
It is true that the text resulting from a macro expansion is re-scanned for more macros to expand, but that process does not involve recognizing preprocessing directives that were not already there.

Macro definition for ".global"

I have a "c" program that uses ".global" assembly language code. My compiler does not allow this, i need to ignore the same, i tried the following:
#define .global //global
But this gives a compiler error. Is there any other option that I can use. The compilation error is:
"expected an identifier"
You are almost certainly going to end up writing your own preprocessing script; it shouldn't be too difficult, if your source files are reasonably controlled. If you don't use the .global construct in string literals or comments, for example, it would be sufficient to do something like:
sed 's/.global [_[:alpha:]][_[:alnum:]]*;//g'
(perhaps with a bit more attention to detail about whitespace).
You cannot manufacture a comment with a macro. (You also cannot define a macro whose name starts with a ., although I suppose there could be a compiler which accepts that as an extension.)
Comments are replaced with whitespace in phase 3 of the translation process. Preprocessor directives are not examined until phase 4, by which time all the comments have disappeared.
So there is no difference between
#define COMMENT //comment
#define COMMENT
Standards reference: §5.1.1.2/1:
The source file is decomposed into preprocessing tokens7) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character.…
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed.…

Why does this C program compile without an error?

I'm a beginner in C, and I was playing with C. I typed a C code like this:
#include <stdio.h>
int main()
{
printf("hello world\n");
\
return 0;
}
Even though I used \ knowingly, the C compiler doesn't throw any error. What is this symbol used for in the C language?
Edit:
Even this works:
"\n";
The sequence backslash-newline is removed from the code in a very early phase (phase 2) of the translation process. It used to be how you created long string literals before there was string concatenation, and is how you still extend macros over multiple lines.
See §5.1.1.2 Translation Phases of the C99 standard:
The precedence among the syntax rules of translation is specified by the following
phases.5)
Physical source file multibyte characters are mapped, in an implementation defined
manner, to the source character set (introducing new-line characters for
end-of-line indicators) if necessary. Trigraph sequences are replaced by
corresponding single-character internal representations.
Each instance of a backslash character (\) immediately followed by a new-line
character is deleted, splicing physical source lines to form logical source lines.
Only the last backslash on any physical source line shall be eligible for being part
of such a splice. A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character before any such
splicing takes place.
The source file is decomposed into preprocessing tokens6) and sequences of
white-space characters (including comments). A source file shall not end in a
partial preprocessing token or in a partial comment. Each comment is replaced by
one space character. New-line characters are retained. Whether each nonempty
sequence of white-space characters other than new-line is retained or replaced by
one space character is implementation-defined.
Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. If a character sequence that
matches the syntax of a universal character name is produced by token
concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing
directive causes the named header or source file to be processed from phase 1
through phase 4, recursively. All preprocessing directives are then deleted.
Each source character set member and escape sequence in character constants and
string literals is converted to the corresponding member of the execution character
set; if there is no corresponding member, it is converted to an implementation defined
member other than the null (wide) character.7)
Adjacent string literal tokens are concatenated.
White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are
syntactically and semantically analyzed and translated as a translation unit.
All external object and function references are resolved. Library components are
linked to satisfy external references to functions and objects not defined in the
current translation. All such translator output is collected into a program image
which contains information needed for execution in its execution environment.
5) Implementations shall behave as if these separate phases occur, even though many are typically folded together in practice.
6) As described in 6.4, the process of dividing a source file’s characters into preprocessing tokens is
context-dependent. For example, see the handling of < within a #include preprocessing directive.
7) An implementation need not convert all non-corresponding source characters to the same execution
character.
If you had a blank or any other character after your stray backslash, you would have a compilation error. We can tell that you don't have anything after it because you don't have a compilation error.
The other part of your question, about:
"\n";
is quite different. It is a simple expression that has no side-effects and therefore no effect on the program. The optimizer will completely discard it. When you write:
i = 1;
you have an expression with a value that is discarded; it is evaluated for its side-effect of modifying i.
Sometimes, you'll find code like:
*ptr++;
The compiler will warn you that the result of the expression is discarded; the expression can be simplified to:
ptr++;
and will achieve the same effect in the program.
The \, when immediately followed by a newline, is consumed by preprocessing and causes the next "physical" line to be joined to the current logical line. This is very important for writing long preprocessing directives, which have to be all on one logical line:
#define SHORT very log macro \
consisting of lots and \
lots of preprocessor \
tokens
If you remove the backslash-newline sequences, it is no longer correct. Some other languages from the Unix culture have a similar backslash line continuation syntax: the POSIX shell language derived from the Bourne shell, and also makefiles.
$ this is \
one shell command
About "\n";, that is a primary expression used to form an expression-statement. In C, expressions can be used as statements, and this is exploited all the time. Your printf call, for instance, is an expression statement. printf("hello world\n") is a postfix expression which calls a function, obtaining a return value. Because you used this expression as a statement, the return value is thrown away. The return value of printf
indicates how many characters were printed, or whether it was successful at all, so by throwing it away, your program makes itself oblivious to whether the printf call actually worked.
Since the value of an expression-statement is discarded, if such a statement also has no side effects, it is a useless statement which does nothing (like your "\n"). But such useless expression statements are not erroneous. If you add warning options to your compiler command line you might get a warning such as "statement with no effect" or something like that.
The backslash \ get interpreted by the C preprocessor. It protect its following character (the new line character on your case).
The backslash is simply escaping the next character. In this case, probably a line end (CR) character. Perfectly reasonable.
The backslash plus what is following it is an escape sequence; "\n" together is the newline character (prints a newline). Another important one is "\t", for tab.

Working of the C Preprocessor

How does the following piece of code work, in other words what is the algorithm of the C preprocessor? Does this work on all compilers?
#include <stdio.h>
#define b a
#define a 170
int main() {
printf("%i", b);
return 0;
}
The preprocessor just replaces b with a wherever it finds it in the program and then replaces a with 170 It is just plain textual replacement.
Works on gcc.
It's at §6.10.3 (Macro Replacement):
6.10.3.4 Rescanning and further replacement
1) After all parameters in the replacement list have been substituted and #
and ## processing has taken place, all placemarker preprocessing tokens are removed. Then, the resulting preprocessing token sequence
is rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace.
Further paragraphs state some complementary rules and exceptions, but this is basically it.
Though it may violate some definitions of "single pass", it's very useful. Like the recursive preprocessing of included files (§5.1.1.2p4).
This simple replacement (first b with a and then a with 170) should work with any compiler.
You should be careful with more complicated cases (usually involving stringification '#' and token concatenation '##') as there are corner case handled differently at least by MSVC and gcc.
In doubt, you can always check the ISO standard (a draft is available online) to see how things are supposed to work :). Section 6.10.3 is the most relevant in your case.
The preprocessor just replaces the symbols sequentially whenever they appear. The order of the definitions does not matter in this case, b is replaced by a first, and the printf statement becomes
printf("%i", a);
and after a is replaced by 170, it becomes
printf("%i", 170);
If the order of definition was changed, i.e
#define a 170
#define b a
Then preprocessor replaces a first, and the 2nd definition becomes
#define b 170
So, finally the printf statement becomes
printf("%i",170);
This works for any compiler.
To get detailed info you can try gcc -E to analyse your pre-processor output which can easily clear your doubt
#define simply assigns a value to a keyword.
Here, 'b' is first assigned value 'a' then 'a' is assigned value '170'. For simplicity, it can be expressed as follows:
b=a=170
It's just a different way of defining the same thing.
I think you are trying to get the information how the source code is processed by compiler. To know exactly you have to go through Translation Phases. The general steps that are followed by every compiler (tried to give every detail - gathered from different blogs and websites) are below:
First Step by Compiler - Physical source file characters are mapped to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
Second Step by Compiler - Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character.
Third Step by Compiler - The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of other white-space characters is retained or replaced by one space character is implementation-defined.
Fourth Step by Compiler - Preprocessing directives are executed and macro invocations are expanded. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.
Fivth Step by Compler - Each escape sequence in character constants and string literals is converted to a member of the execution character set.
Sixth Step by Compiler - Adjacent character string literal tokens are concatenated and adjacent wide string literal tokens are concatenated.
Seventh Step by Compiler - White-space characters separating tokens are no longer significant. Preprocessing tokens are converted into tokens. The resulting tokens are syntactically and semantically analyzed and translated.
Last Step - All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.

Resources