C preprocessor stringification weirdness

I am defining a macro that evaluates to a constant string, holding the filename and the line number, for logging purposes.
It works fine, but I just can't figure out why 2 additional macros are needed - STRINGIFY and TOSTRING, when intuition suggests simply __FILE__ ":" #__LINE__.
#include <stdio.h>
#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)
#define THIS_ORIGIN (__FILE__ ":" TOSTRING(__LINE__))
int main (void) {
/* correctly prints "test.c:9" */
printf("%s", THIS_ORIGIN);
return 0;
}
This just seems like an ugly hack to me.
Can someone explain in detail what happens stage by stage so that __LINE__ is stringified correctly, and why neither of __FILE__ ":" STRINGIFY(__LINE__) and __FILE__ ":" #__LINE__ works?

Because of the order of expansion. The GCC documentation says:
Macro arguments are completely macro-expanded before they are substituted into a macro body, unless they are stringified or pasted with other tokens. After substitution, the entire macro body, including the substituted arguments, is scanned again for macros to be expanded. The result is that the arguments are scanned twice to expand macro calls in them.
So if the argument will be stringified, it is not expanded first. You get the literal text between the parentheses. But if it is passed on to another macro, it is expanded. Therefore, if you want it expanded before it is stringified, you need two levels of macros.
This is done because there are cases where you do not want to expand the argument before stringification, most common being the assert() macro. If you write:
assert(MIN(width, height) >= 240);
you want the message to be:
Assertion MIN(width, height) >= 240 failed
and not some insane thing the MIN macro expands to (in gcc it uses several gcc-specific extensions and is quite long IIRC).

You can't simply use __FILE__":"#__LINE__ because the stringify operator # can only be applied to a macro parameter.
__FILE__ ":" STRINGIFY(__LINE__) would work with ordinary text (e.g. __FILE__ ":" STRINGIFY(foo)), but not when the argument is itself a macro (which is all __LINE__ really is): the argument is stringified as written, so the macro is never substituted.

Related

How to defer a macro substitution in a concatenate-stringify cascade

This is a follow-up question to this one (and also closer to the actual problem at hand).
If I have the following case:
#include <stdio.h>
#define FOO_ONE 12
#define FOO_TWO 34
#define BAR_ONE 56
#define BAR_TWO 78
#define FOO 99
#define STRINGIFY(mac) #mac
#define CONCAT(mac1, mac2) STRINGIFY(mac1) STRINGIFY(mac2)
#define MAKE_MAC(mac) CONCAT(mac##_ONE, mac##_TWO)
#define PRINT(mac) printf(#mac ": " MAKE_MAC(mac) "\n")
int main(void)
{
PRINT(FOO);
PRINT(BAR);
}
As can be seen, the stringified, concatenated macros are then substituted inside a printf() statement which itself is inside a macro.
Because FOO is defined (as 99), it happens that it is expanded before the concatenation with _ONE and _TWO, effectively creating the tokens 99_ONE and 99_TWO.
This program outputs:
FOO: 99_ONE99_TWO
BAR: 5678
How can I defer the expansion of the FOO macro (effectively eliminating it altogether) to get the required output of:
FOO: 1234
BAR: 5678
NOTE: assume the PRINT() macro signature cannot be changed (i.e., can't add a parameter, etc.). However, its implementation can be changed. Also, the FOO, FOO_* and BAR_* definitions cannot be modified either.
If you can do something each time FOO is used, you could undefine the macro first so that PRINT expands as you want, and then redefine it, as in
#undef FOO
PRINT(FOO);
#define FOO 99
in which case it will expand to
printf("FOO" ": " "12" "34" "\n");
printf("BAR" ": " "56" "78" "\n");
to print what you want.
How can I defer the expansion of the FOO macro ... NOTE: assume the PRINT() macro signature cannot be changed (i.e., can't add a parameter, etc.)
You can't.
Macro expansions go through a series of steps:
Argument substitution
Paste and stringification in no particular order
Rescan and further replacement
Argument substitution operates on your arguments (for the invocation PRINT(FOO), that is FOO), and it is the very first step. By the time the preprocessor even recognizes that something in your replacement list is a macro, you're long past argument substitution.
The rule for argument substitution is that if any of your parameters are mentioned in your replacement list, and those parameters are being neither stringified nor pasted, then the corresponding argument is fully evaluated and those mentions of the parameters are replaced with the result. In this case, PRINT(FOO), after argument substitution, results in the replacement list:
printf(#mac ": " MAKE_MAC(99) "\n")
Again, MAKE_MAC's definition doesn't matter; it's not even recognized as a macro until rescan and further replacement.
Now, without your restriction, you could defer FOO from expanding... by adding a second parameter to PRINT and pasting to it (that disqualifies it from argument substitution, and by the time rescan and replacement comes along it'd be invoking your next macro). But with your restriction, you're DOA.
The actual expansion of FOO to 99 happens at the moment when PRINT is expanded and before its body is re-scanned one more time. The relevant part of the ANSI standard is:
6.10.3.1 Argument substitution
1 After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is
replaced by the corresponding argument after all macros contained therein have been
expanded. Before being substituted, each argument’s preprocessing tokens are
completely macro replaced as if they formed the rest of the preprocessing file; no other
preprocessing tokens are available.
In your case you can avoid expansion of FOO (inside mac) by replacing
#define PRINT(mac) printf(#mac ": " MAKE_MAC(mac) "\n")
with
#define PRINT(mac) printf(#mac ": " CONCAT(mac##_ONE, mac##_TWO) "\n")

How can macro output double quotes

ATOMIC_JOIN(prefix, detail_platform) is a macro that outputs a token sequence such as:
base/atomic/gcc_gnu_x64
Another macro, ATOMIC_DETAIL_HEADER, is expected to output:
"base/atomic/gcc_gnu_x64.hpp" // notice: double quotes included in the output
I tried to write ATOMIC_DETAIL_HEADER in several ways:
#define ATOMIC_DETAIL_HEADER(prefix) "ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp"
#define ATOMIC_DETAIL_HEADER(prefix) \"ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp\"
#define ATOMIC_DETAIL_HEADER(prefix) "##ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp##"
... all of which failed!
But if I want the output to be:
<base/atomic/gcc_gnu_x64.hpp>
the following macro definition does the right thing:
#define ATOMIC_DETAIL_HEADER(prefix) <ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp>
A cpp macro cannot build strings this way. It can join tokens to form new tokens, but at every stage it must be a valid token. Your example with angle-brackets works because the bracket characters are distinct tokens whereas the double-quotes cannot exist floating-off like that, and you cannot apply ## to it.
In most contexts, the compiler will concatenate adjacent string literals, so it may be sufficient to stringify each piece with # and let the compiler do that.
While luser droog correctly stated why your use of quotes didn't work, he didn't show exactly how the goal can be accomplished. Indeed, the # operator replaces a parameter with a string literal, i.e. it puts quotation marks around the argument. This is slightly complicated by the fact that your token sequence has to be expanded first, so an additional level of macro substitution is needed:
#define QUOTED(a) #a
#define QUOTE(a) QUOTED(a)
#define ATOMIC_DETAIL_HEADER(prefix) QUOTE(ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp)

When is using whitespace for readability NOT allowed?

C seems to be pretty permissive when it comes to whitespace.
We can use or omit whitespace around an operator, between a function name and its parenthesized list of arguments, between an array name and its index, etc. in order to make code more readable. I understand this is a matter of preference.
The only place I can think of where whitespace is NOT allowed is this:
#include < stdio.h > // fatal error: stdio.h : No such file or directory
What are the other contexts in C where whitespace cannot be used for readability?
In most cases, adding whitespace within a single token either makes the program invalid or changes the meaning of the token. An obvious example: "foo" and " foo " are both valid string literals with different values, because a string literal is a single token. Changing 123456 to 123 456 changes it from a single integer constant to two integer constants, resulting in a syntax error.
The exceptions to this involve the preprocessor.
You've already mentioned the #include directive. Note that given:
#include "header.h"
the "header.h" is not syntactically a string literal; it's processed before string literals are meaningful. The syntax is similar, but for example a \t sequence in a header name isn't necessarily replaced by a tab character.
Newlines (which are a form of whitespace) are significant in preprocessor directives; you can't legally write:
#ifdef
FOO
/* ... */
#endif
But whitespace other than newlines is permitted:
# if SPACES_ARE_ALLOWED_HERE
#endif
And there's one case I can think of where whitespace is permitted between preprocessor tokens but it changes the meaning. In the definition of a function-like macro, the ( that introduces the parameter list must immediately follow the macro name. This:
#define TWICE(x) ((x) + (x))
defines TWICE as a function-like macro that takes one argument. But this:
#define NOT_TWICE (x) ((x) + (x))
defines NOT_TWICE as an ordinary macro with no arguments that expands to (x) ((x) + (x)).
This rule applies only to macro definitions; a macro invocation follows the normal rules, so you can write either TWICE(42) or TWICE ( 42 ).
Whitespace is significant (and so cannot be added for readability) within a lexical token, i.e. within an identifier (foo bar is different from foobar), within a number (123 456 is different from 123456), within a string (that's your example basically) or within an operator (+ + is different from ++ and + = is different from +=). Between tokens you can add as much whitespace as you want, but when you add whitespace inside a token you break it into two separate tokens (or change the value, in the case of string constants), thus changing the meaning of your code.
In most cases the code with the added white space is either equivalent to the original code or results in a syntax error. But there are exceptions. For example:
return a +++ b;
is the same as
return a ++ + b;
but is different from:
return a + ++ b;
As I recall, you need to be very careful with function-like macros, as in this dummy example:
#include <stdio.h>
#define sum(x, y) ((x)+(y))
int main(void)
{
printf("%d\n", sum(2, 2));
return 0;
}
Here the definition:
#define sum(x, y) ((x)+(y))
is a different thing from, say:
#define sum (x, y) ((x)+(y))
The latter is an object-like macro that is replaced with exactly (x, y) ((x)+(y)); that is, the parameters are not substituted (as happens in a function-like macro).

Macro within macro does not work

I came across one more piece of code that is even more confusing..
#include "stdio.h"
#define f(a,b) a##b
#define g(a) #a
#define h(a) g(a)
int main(void)
{
printf("%s\n",h(f(1,2)));
printf("%s\n",g(1));
printf("%s\n",g(f(1,2)));
return 0;
}
output is
12
1
f(1,2)
My assumption was
1) first f(1,2) is replaced by 12, because macro f(a,b) concatenates its arguments
2) then g(a) macro replaces 1 by a string literal "1"
3) the output should be 1
But why is g(f(1,2)) not substituted with 12?
I'm sure I'm missing something here.
Can someone explain this program to me?
Macro replacement occurs from the outside in. (Strictly speaking, the preprocessor is required to behave as though it replaces macros one at a time, starting from the beginning of the file and restarting after each replacement.)
The standard (C99 §6.10.3.2/2) says
If, in the replacement list, a parameter is immediately preceded by a # preprocessing
token, both are replaced by a single character string literal preprocessing token that
contains the spelling of the preprocessing token sequence for the corresponding
argument.
Since # is present in the replacement list for the macro g, the argument f(1,2) is converted to a string immediately, and the result is "f(1,2)".
On the other hand, in h(f(1,2)), since the replacement list doesn't contain #, §6.10.3.1/1 applies,
After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is
replaced by the corresponding argument after all macros contained therein have been
expanded.
and the argument f(1, 2) is macro expanded to give 12, so the result is g(12) which then becomes "12" when the file is "re-scanned".
Macros can't expand into preprocessing directives. From C99 6.10.3.4/3 "Rescanning and further replacement":
The resulting completely macro-replaced preprocessing token sequence
is not processed as a preprocessing directive even if it resembles
one,
Source: https://stackoverflow.com/a/2429368/2591612
But you can call f(a,b) from g like you did with h. f(a,b) is interpreted as a string literal, as @Red Alert states.

Different behavior in visual c++ versus gcc/clang while stringifying parameter which contains comma

I'm using the stringizing operator to convert a macro parameter, which may contain commas, into a string. As far as I know, some characters cannot be stringified, notably the comma (,), because it is used to delimit parameters, and the right parenthesis ()), because it marks the end of the parameter list. So I use a variadic macro to pass commas to the stringizing operator like this:
#include <stdio.h>
#define TEST 10, 20
#define MAKE_STRING(...) #__VA_ARGS__
#define STRING(x) MAKE_STRING(x)
int main()
{
printf("%s\n", STRING(TEST) );
return 0;
}
It works fine. But I wondered what would happen without the variadic macro, so I modified the macro: #define MAKE_STRING(x) #x. Unexpectedly, it compiles fine in Visual C++ 2008/2010 and outputs 10, 20, while gcc/clang give the compilation error as expected:
macro "MAKE_STRING" passed 2 arguments, but takes just 1
So my question: is Visual C++ doing additional work, or is the behavior undefined?
VS in general allows extra parameters in macros and then just drops them silently:
STRING(10, 20, 30) - still works and prints 10. This is not the case here, but it pretty much means VS doesn't even have the error gcc threw at you.
It's not any additional work but "merely" a difference in substitution order.
I am not sure if this will answer your question, but I hope it helps you solve your problem. When defining a string constant in C, you should enclose it in double quotes (for spaces). Also, the # operator wraps the argument in double quotes so, for example, #a becomes "a".
#include <stdio.h>
#define TEST "hello, world"
#define MAKE_STRING(x) #x
int main()
{
int a;
printf("%s\n", TEST);
printf("%s\n", MAKE_STRING(a));
return 0;
}
I compiled this code using gcc 4.7.1 and the output is:
hello, world
a
I dunno why this has upvotes, or an answer got downvoted (so the poster deleted it) but I don't know what you expect!
#__VA_ARGS__ makes no sense, suppose I have MACRO(a,b,c) do you want "a,b,c" as the string?
http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros
Read, that became standard behaviour, variable length arguments in macros allow what they do in variable length arguments to functions. The pre-processor operates on text!
The only special case involving # is ##, which deletes a comma before the ## if there are no extra arguments (thus preventing a syntax error)
NOTE:
It is really important you read the MACRO(a,b,c) part and what do you expect, a string "a,b,c"? or "a, b, c" if you want the string "a, b, c" WRITE THE STRING "a, b, c"
Using the # operator is great for stuff like
#define REGISTER_THING(THING) core_of_program.register_thing(THING); printf("%s registered\n",#THING);
