Are there any guarantees about consistency of __LINE__ directives? - c

GCC 9 has recently changed the behavior of the __LINE__ directive in some cases. The program below illustrates the change:
#include <stdio.h>
#define expand() __LINE__
int main() {
printf("%d\n",expand(
));
return 0;
}
Because the macro expand() (which expands to __LINE__) spans more than one line, GCC up to 8.3 (and Clang up to 8.0) considers the number of the last line of the expansion, printing 5. But GCC 9 considers the first line, and prints 4.
(Godbolt link: https://godbolt.org/z/3Nk2al)
The C11 standard is not very precise about the exact behavior of __LINE__, other than:
6.10.8 Predefined macro names
The values of the predefined macros listed in the following subclauses (except for __FILE__ and __LINE__) remain constant throughout the translation unit.
(...)
6.10.8.1 Mandatory macros
The following macro names shall be defined by the implementation:
(...)
__LINE__ The presumed line number (within the current source file) of the current source line (an integer constant).
I assume this means that the exact value is implementation-defined, and therefore one cannot expect its value to remain constant across different compiler versions, or different compilers. Or is there some argument to that effect elsewhere in the standard?
For instance, could one argue that the presumed line number of the current source line should be stable as long as the source itself did not change?

While the general case of finding a line number from a specific instruction is hard (for example, GDB trying to work out a line number for some code that crashed), the __LINE__ in your printf call is relatively straightforward, because the compiler substitutes the number as a constant at a specific location.
The C11 standard itself only says that __LINE__ should reflect where you are in the program after the macro has been expanded, not the line on which the macro is defined. This allows you to do things like report the line number of the caller of your function, by creating a macro that prints the line number and then calls your function. For example:
#define f(x) ({ printf("called f(x) at line=%d\n", __LINE__); f__real(x); })
As to the exact line number that is presented, that is compiler dependent and is not part of any standard. For instance, your example could be rewritten as:
int main() {
printf("%d\n",
__LINE__);
}
and in this case a valid response could be either 2 or 3.
If you are trying to determine the line number dynamically, that is provided through system backtrace libraries, e.g. backtrace() on Linux.

Related

Using Backslash-Newline outside of a macro in C

int main(){\
int a = 5;\
return a;\
}
Above compiles fine. I assume the C preprocessor removes the backslashes before compilation?
output of gcc -E:
int main(){
int a = 5;
return a;}
It seems that not all the \n (newline) characters get removed the way they are in macros; it mainly just removed the backslashes.
I have seen this used in multiline macros such as:
#define TEST(in)\
int a = in; \
int b = 6;
int main(){
TEST(5)
return 0;
}
output of gcc -E:
int main(){
int a = 5; int b = 6;
return 0;
}
The preprocessor removes the backslash as well as the \n character in the above example, so why does it not remove all the newline characters in my first example?
"Splices" -- backslash newline sequences -- are removed before the preprocessor processes the program text. At least that's the theory, bearing in mind that the C standard does not actually define a process called the "preprocessor".
What it does define is a procedure for converting the program text into a stream of tokens which can be parsed, and then turning that into an executable. The procedure consists of eight translation phases, and the compiler must produce the same result as would be produced if the phases were executed one at a time, each one taking as input the output of the previous phase. (Most of the inputs and output are streams of tokens, rather than character strings. So the output GCC produces when run with the -E flag doesn't correspond to anything in the standard, allowing GCC to basically produce whatever output it finds convenient. Or that its authors thought you would find convenient.)
The "as if" clause means that a particular compiler can combine phases or execute them in pieces, as long as it doesn't change the result. So you can really only look at the process as the abstract description of an algorithm. Still, it's useful to understand. The full text is found in §5.1.1.2 of the standard.
What follows is a highly condensed and commented description of the phases. It is incomplete and somewhat imprecise in its details, in the hope that it's easier to digest than the language in the standard. But do read it in the original.
1. Remove trigraphs (which are now deprecated, so don't worry if you don't know what they are) and, if necessary, convert the program text to whatever character encoding the compiler requires.
2. Remove splices. All backslash-newline sequences are simply removed from the program text, leaving nothing behind. (OK, that's the theory. In practice, most compilers still know the original source line number of every bit of text. But this information is only used for producing diagnostics.)
3. Split the text into tokens and whitespace sequences, and replace all comments with a single space character.
4. "Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed". This is as close as the standard gets to defining the preprocessor, so it's reasonable to say that the "preprocessor" is the execution of phase 4. #include directives are preprocessor directives, and processing the include directive starts with passing the included file through phases 1-3 before inserting it into the token stream to be further preprocessed.
5. Replace all the escape sequences in character and string literals with the actual characters (possibly wide characters) which will be used during execution.
6. Concatenate adjacent string literals.
7. Remove all whitespace, leaving only tokens. Convert preprocessing tokens into syntactic tokens. Parse the resulting token stream and convert it into a "translation unit". Or, in other words, compile the program into an object file (although that's way more specific than the language in the standard).
8. Combine all the translation units and necessary library modules into a single executable image. Informally, this is the linking phase and the result is something you can hand to the operating system for execution.
That's what the standard mandates. But real-world compilers do lots of other stuff, like generate more or less readable error messages; rearrange the code in ways that might make it execute faster and/or occupy less space; insert debugging information into the executable; and produce whatever additional analyses and reports the user has requested (none of which are standardised). This, for example, includes the -E and/or -S outputs. The compiler does these things as a favour to you, and they can be helpful in understanding the way your program was compiled. But you shouldn't take them too seriously, since the official result of the compilation process is the actual executable.
Most compilation toolchains can also produce libraries, so it is not the case that all programs are immediately fully processed into executable images. But that's the only outcome which is standardised. Although the standard refers to libraries, particularly the standard library, it does not make any assumptions about how libraries come into existence.
The standard libraries (and headers) don't even have to exist in the filesystem; it's enough that the compiler recognises their names and responds appropriately. Some of the stuff the standard library has to implement cannot be written in portable C, so it is quite possible that the standard library source code, if it exists, is not all in the form of a standard C program. Standard library headers might include constructs which receive special handling by the compiler, and thus cannot be used by other compilers or copied directly into your program.
This might all seem too much in the air, but the intention was to make it possible to have C implementations which run on extremely limited processors, including processors without any external storage at all. (And it is still quite common to target embedded systems which might be missing lots of things you normally take for granted.) And, on the whole, it's served us pretty well over the years.
The C Standard does not specify how preprocessing can be performed with textual output, as gcc -E does. Compilers try to produce textual output that can be fed back to the compiler to produce the same program. Whitespace details are largely irrelevant for this as long as the same tokens are produced. In your example, gcc outputs readable text where escaped newlines are output as newlines as long as they are surrounded by white space. This is optional: escaped newlines should actually be removed completely from the input during an early phase, allowing tokens that are broken across separate lines to be reconstituted.
Here is a pathological example:
#inc\
lude\
<st\
dio.\
h>
int \
main\
() {\
retu\
rn 0\
; }
Regarding macros with definitions spanning multiple lines with escaped newlines, their expansion is described precisely by the C Standard: sequences of whitespace and comments must be replaced by a single space.
Here is an illustrative example:
int main(){\
int a = 5;\
return a;\
}
#define TEST() int main(){\
int a = 5;\
return a;\
}
#define XSTR(x) #x
#define STR(x) XSTR(x)
TEST()
STR(TEST())
Output of gcc -E:
# 1 "prepmain.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "prepmain.c"
int main(){
int a = 5;
return a;}
# 14 "prepmain.c"
int main(){ int a = 5; return a;}
"int main(){ int a = 5; return a;}"
Output of gcc -E -P:
int main(){ int a = 5; return a;}
int main(){ int a = 5; return a;}
"int main(){ int a = 5; return a;}"

How does a backslash-newline combo affect the value of the C preprocessor's __LINE__ macro?

Per the C standard, collapsing multiple physical lines joined by a backslash-newline sequence is an earlier phase of translation than executing the preprocessor.
Assuming no complications due to an earlier #line directive, then, does the value of the __LINE__ macro reflect the physical line number before such lines are spliced? That is what you'd find e.g. by manually inspecting the source or what a text editor would report the line # as, and would probably be the more useful alternative. Or does it reflect the line # subsequent to splicing, which presumably is what the preprocessor would actually see given the order of translation phases actually specified in the standard?
(And if I understand correctly--which I very well may not--the preprocessor would have no way of knowing whether a given line was the product of splicing or not.)
Compilers implement __LINE__ by remembering physical line numbers in ways not specified by the C standard.
C 2018 6.10.8.1 1 tells us __LINE__ is replaced by “The presumed line number (within the current source file) of the current source line (an integer constant).” This specification is vague and cannot be implemented in a useful way while adhering to the standard literally.
Consider this code:
#define Assert(test) do { if (!(test)) printf("Assertion on line %d failed.\n", __LINE__); } while (0)
... Many lines of code follow, including some with line splicing.
Assert(condition);
... Many lines of code.
To be useful, this code must print the physical line number on which the Assert is used. It needs to be the physical line number so that the user can locate the line in a text editor, and it needs to be the line on which the Assert macro is replaced, not defined, because that is where the problem is detected. Both GCC and Clang do this.
However, this requires that the physical line number from before line splicing be provided during macro replacement, which occurs after line splicing. In C 2018 5.1.1.2 1, the standard specifies a translation model in which:
in phase 2, “Each instance of a backslash character (\) followed immediately by a new-line character is deleted, splicing physical source lines to form logical source lines,” and,
in phase 3, “The source file is decomposed into preprocessing tokens and white-space characters,” including new-line characters but not ones deleted in phase 2, and,
in phase 4, macro invocations are expanded.
So, if a compiler replaces a __LINE__ macro in phase 4 and literally has only the preprocessing tokens and remaining white-space characters, it cannot know the physical line number to provide.
Therefore, a compiler cannot be implemented literally following the standard’s model of translation. To be useful, it must associate a physical line number with each preprocessing token that could be a macro name. Whenever a macro is replaced, it must propagate the associated physical line number. Then, when a __LINE__ token is finally replaced, the compiler will have the associated physical line number to replace it with.
A multiline macro joined by backslash-newlines gets translated into a single logical line of code, so __LINE__ gives you the line on which the macro is invoked.
For example, given the following code:
#define TEST do {\
printf("line1=%d\n", __LINE__);\
printf("line2=%d\n", __LINE__);\
printf("line3=%d\n", __LINE__);\
} while (0)
int main(void)
{
TEST;
return 0;
}
Passing it through the preprocessor in gcc gives you this:
# 1 "x1.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "x1.c"
int main(void)
{
do { printf("line1=%d\n", 9); printf("line2=%d\n", 9); printf("line3=%d\n", 9);} while (0);
return 0;
}
return 0;
}

Quotation Mark Inside C Macro Leads to Bizarre Behavior [duplicate]

Consider this (horrible, terrible, no good, very bad) code structure:
#define foo(x) // commented out debugging code
// Misformatted to not obscure the point
if (a)
foo(a);
bar(a);
I've seen two compilers' preprocessors generate different results on this code:
if (a)
bar(a);
and
if (a)
;
bar(a);
Obviously, this is a bad thing for a portable code base.
My question: What is the preprocessor supposed to do with this? Elide comments first, or expand macros first?
Unfortunately, the original ANSI C Specification specifically excludes any Preprocessor features in section 4 ("This specification describes only the C language. It makes no provision for either the library or the preprocessor.").
The C99 specification handles this explicitly, though. Comments are replaced with a single space during the translation phases, before preprocessing directives are parsed. (See Section 5.1.1.2 for the translation phases and Section 6.10 for preprocessing directives.)
VC++ and the GNU C Compiler both follow this paradigm - other compilers may not be compliant if they're older, but if it's C99 compliant, you should be safe.
As described in this copy-n-pasted description of the translation phases in the C99 standard, removing comments (they are replaced by a single whitespace) occurs in translation phase 3, while preprocessing directives are handled and macros are expanded in phase 4.
In the C90 standard (which I only have in hard copy, so no copy-n-paste) these two phases occur in the same order, though the description of the translation phases is slightly different in some details from the C99 standard - the fact that comments are removed and replaced by a single whitespace character before preprocessing directives are handled and macros expanded is not different.
Again, the C++ standard has these 2 phases occur in the same order.
As far as how the '//' comments should be handled, the C99 standard says this (6.4.9/2):
Except within a character constant, a string literal, or a comment, the characters //
introduce a comment that includes all multibyte characters up to, but not including, the
next new-line character.
And the C++ standard says (2.7):
The characters // start a comment, which terminates with the next newline
character.
So your first example is clearly an error on the part of that translator - the ';' character after the foo(a) should be retained when the foo() macro is expanded - the comment characters should not be part of the 'contents' of the foo() macro.
But since you're faced with a buggy translator, you might want to change the macro definition to:
#define foo(x) /* junk */
to work around the bug.
However (and I'm drifting off topic here...), since line splicing (backslashes just before a new-line) occurs before comments are processed, you can run into something like this bit of nasty code:
#define evil( x) printf( "hello "); // hi there, \
printf( "%s\n", x); // you!
int main( int argc, char** argv)
{
evil( "bastard");
return 0;
}
Which might surprise whoever wrote it.
Or even better, try the following, written by someone (certainly not me!) who likes box-style comments:
int main( int argc, char** argv)
{
//----------------/
printf( "hello "); // Hey, what the??/
printf( "%s\n", "you"); // heck?? /
//----------------/
return 0;
}
Depending on whether your compiler defaults to processing trigraphs or not (compilers are supposed to, but since trigraphs surprise nearly everyone who runs across them, some compilers decide to turn them off by default), you may or may not get the behavior you want - whatever behavior that is, of course.
According to MSDN, comments are replaced with a single space in the tokenization phase,
which happens before the preprocessing phase where macros are expanded.
Never put // comments in your macros. If you must put comments, use /* */. In addition, you have a mistake in your macro:
#define foo(x) do { } while(0) /* junk */
This way, foo is always safe to use. For example:
if (some condition)
foo(x);
will never throw a compiler error regardless of whether or not foo is defined to some expression.
#ifdef _TEST_
#define _cerr cerr
#else
#define _cerr / ## / cerr
#endif
will work on some compilers (VC++). When _TEST_ is not defined,
_cerr ...
will be replaced by the comment line
// cerr ...
I seem to recall that compliance requires three steps:
strip
expand macros
strip again
The reason for this has to do with the compiler being able to accept .i files directly.

Different behavior in visual c++ versus gcc/clang while stringifying parameter which contains comma

I'm using the stringizing operator to convert a macro parameter that may contain a comma into a string. As far as I know, some characters cannot be stringified, notably the comma (,), because it is used to delimit parameters, and the right parenthesis ()), because it marks the end of the parameter list. So I use a variadic macro to pass commas to the stringizing operator like this:
#include <stdio.h>
#define TEST 10, 20
#define MAKE_STRING(...) #__VA_ARGS__
#define STRING(x) MAKE_STRING(x)
int main()
{
printf("%s\n", STRING(TEST) );
return 0;
}
It works fine. But I wondered what would happen without the variadic macro, so I changed the macro to #define MAKE_STRING(x) #x. Unexpectedly, it compiles fine in Visual C++ 2008/2010 and outputs 10, 20, while gcc/clang give the compilation error as expected:
macro "MAKE_STRING" passed 2 arguments, but takes just 1
So my question: is Visual C++ doing additional work, or is the behavior undefined?
VS in general allows extra parameters in macros and then just drops them silently:
STRING(10, 20, 30) - still works and prints 10. This is not the case here, but it pretty much means VS doesn't even have the error gcc threw at you.
It's not any additional work but "merely" a difference in substitution order.
I am not sure if this will answer your question, but I hope it helps you solve your problem. When defining a string constant in C, you should enclose it in double quotes (so that it can contain spaces). Also, the # operator wraps the argument's text in double quotes, so, for example, #a becomes "a".
#include <stdio.h>
#define TEST "hello, world"
#define MAKE_STRING(x) #x
int main()
{
int a;
printf("%s\n", TEST);
printf("%s\n", MAKE_STRING(a));
return 0;
}
I compiled this code using gcc 4.7.1 and the output is:
hello, world
a
I dunno why this has upvotes, or an answer got downvoted (so the poster deleted it) but I don't know what you expect!
#__VA_ARGS__ makes no sense, suppose I have MACRO(a,b,c) do you want "a,b,c" as the string?
http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros
Read, that became standard behaviour, variable length arguments in macros allow what they do in variable length arguments to functions. The pre-processor operates on text!
The only special case involving # is ##, which (as a GNU extension) deletes a comma before the ## if there are no extra arguments (thus preventing a syntax error)
NOTE:
It is really important you read the MACRO(a,b,c) part and what do you expect, a string "a,b,c"? or "a, b, c" if you want the string "a, b, c" WRITE THE STRING "a, b, c"
Using the # operator is great for stuff like
#define REGISTER_THING(THING) core_of_program.register_thing(THING); printf("%s registered\n",#THING);

Resources