How does the C preprocessor actually work?

I made the code snippet simpler to explain
// Example 1
#define sum2(a, b) (a + b)
#define sum3(a, b, c) (sum2(a, sum2(b, c)))
sum3(1, 2, 3) // will be expanded to ((1 + (2 + 3)))
// Example 2
#define score student_exam_score
#define print_score(student_exam_score) printf("%d\n", score)
#undef score
print_score(80); // will be expanded to printf("%d\n", score);
// but not printf("%d\n", 80); that I expect
The first one is intuitive, and that kind of code appears in several places, such as finding the maximum or minimum number. However, I want to use that technique to make my code clean and easy to read, so I replace some words in a macro with a shorter and more meaningful name.
AFAIK, the C preprocessor runs only once per compilation unit and only performs string replacement, so why can't print_score be expanded to printf("%d\n", 80);?
This is the replacement procedure I expected:
#define score student_exam_score
#define print_score(student_exam_score) printf("%d\n", score)
#undef score
print_score(80);
// -->
#define score student_exam_score // runs this first
#define print_score(student_exam_score) printf("%d\n", student_exam_score) // changed
#undef score
print_score(80);
// -->
#define score student_exam_score
#define print_score(student_exam_score) printf("%d\n", student_exam_score) // then this
#undef score
printf("%d\n", 80); // changed

It's a sequencing issue. First the macros are defined, and score is undefined before it is ever used. Then, when print_score is expanded, it first substitutes all instances of student_exam_score, of which there are none. It then rescans the result, looking for further macros to expand, but there are none since score has been undefined and is no longer available.
Even if you moved #undef score down below the reference to print_score, it still wouldn't work since parameter substitution only happens once (score would be expanded but student_exam_score would not).
Note that score is not substituted into the body of print_score at the time it is defined. Substitution only happens when the macro is invoked, which is why #undef score results in the score macro having no effect whatsoever.
These examples will make it clearer. First, consider the following:
#define foo bar
#define baz(bar) (foo)
baz(123)
This is expanded as follows:
baz(123)
-> (foo)
-> (bar)
Expansion stops here. Parameter substitution was done before foo was expanded, and does not happen again.
Now consider the following:
#define foo bar
#define baz(bar) (foo)
#undef foo
baz(123)
This is expanded as follows:
baz(123)
-> (foo)
Expansion stops here because foo is no longer defined. Its earlier definition had no effect on the definition of baz, because macro substitution does not happen when macros are defined. It only happens when they are expanded.
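One way to observe these two expansions without reading preprocessor output is to stringify the result. The sketch below assumes two helper macros, STR and STR_ (names chosen for this illustration, not part of the original code), and captures what baz(123) expands to before and after #undef foo. The function check_baz_expansions is likewise hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Helper macros (hypothetical names): STR fully expands its argument,
   then STR_ turns the resulting tokens into a string literal. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define foo bar
#define baz(bar) (foo)

/* foo is still a macro here, so the rescan turns (foo) into (bar). */
static const char *const with_foo = STR(baz(123));

#undef foo

/* Same macro, same call, but foo is no longer defined: expansion stops at (foo). */
static const char *const without_foo = STR(baz(123));

/* Verifies both captured expansions; call it from main(). */
void check_baz_expansions(void)
{
    assert(strstr(with_foo, "bar") != NULL);
    assert(strstr(with_foo, "foo") == NULL);
    assert(strstr(with_foo, "123") == NULL);   /* the argument is never substituted */
    assert(strstr(without_foo, "foo") != NULL);
}
```

The definition of baz is identical in both cases; only the set of macros defined at the point of use differs.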

Related

Macro replacement list rescanning for replacement

I'm reading the Standard (N1570) on macro replacement and am confused by some wording in 6.10.3.4.
1 After all parameters in the replacement list have been substituted
and # and ## processing has taken place, all placemarker preprocessing
tokens are removed. The resulting preprocessing token sequence is then
rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace
So after all # and ## are resolved, we rescan the replacement list. But paragraph 2 specifies:
2 If the name of the macro being replaced is found during this scan of
the replacement list (not including the rest of the source file’s
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced.
It looks contradictory to me. So what kind of replacement is possible in that rescan? I tried the following example:
#define FOOBAR(a, b) printf(#a #b)
#define INVOKE(a, b) a##b(a, b)
int main() {
INVOKE(FOO, BAR); //expands to printf("FOO" "BAR")
}
So INVOKE(FOO, BAR) expands to FOOBAR(FOO, BAR) after substitution of ##. Then the replacement list FOOBAR(FOO, BAR) is rescanned. But paragraph 2 specifies that if the name of the macro being replaced (FOOBAR) is found (and it is, as defined above), it is not replaced (yet it actually is replaced, as can be seen in the demo).
Can you please clarify that wording? What did I miss?
LIVE DEMO
The (original) macro being replaced is not FOOBAR, it's INVOKE. When you're expanding INVOKE and you find FOOBAR, you expand FOOBAR normally. However, if INVOKE had been found when expanding INVOKE, it would no longer be expanded.
Let's take the following code:
#define FOOBAR(a, b) printf(#a #b)
#define INVOKE(a, b) e1 a##b(a, b)
int main() {
INVOKE(INV, OKE);
}
I added the e1 to the expansion of INVOKE to be able to visualise how many expansions happen. The result of preprocessing main is:
e1 INVOKE(INV, OKE);
This proves that INVOKE was expanded once and then, upon rescanning, not expanded again.
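Both behaviours can also be checked in one compilable sketch. It assumes a variant of FOOBAR that yields string literals instead of calling printf, plus two stringification helpers, STR and STR_. These names and check_rescan are hypothetical and not from the question.

```c
#include <assert.h>
#include <string.h>

/* Helpers (hypothetical): STR expands its argument, STR_ quotes the result. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define FOOBAR(a, b) #a #b          /* adjacent literals, joined by the compiler */
#define INVOKE(a, b) a##b(a, b)

/* INVOKE(FOO, BAR) -> FOOBAR(FOO, BAR) -> "FOO" "BAR":
   a different macro found during the rescan is expanded normally. */
static const char *const other_macro = INVOKE(FOO, BAR);

/* INVOKE(INV, OKE) -> INVOKE(INV, OKE):
   the macro's own name, produced by ##, is left unexpanded. */
static const char *const own_name = STR(INVOKE(INV, OKE));

/* Verifies both results; call it from main(). */
void check_rescan(void)
{
    assert(strcmp(other_macro, "FOOBAR") == 0);
    assert(strncmp(own_name, "INVOKE(", 7) == 0);
}
```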
Consider the following simple example:
#include<stdio.h>
const int FOO = 42;
#define FOO (42 + FOO)
int main()
{
printf("%d", FOO);
}
Here the output will be 84.
The preprocessor expands the printf call to:
printf("%d", (42 + FOO));
When the macro FOO is expanded, the FOO inside its replacement list is not expanded again, so it refers to the const int variable and the expression evaluates to 42 + 42. Otherwise, you would end up with endless recursion: (42 + (42 + (42 + ...
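The same rule makes it possible to wrap a function in a macro of the same name. The following is a minimal sketch; square, call_count and check_wrapper are hypothetical names used only for illustration.

```c
#include <assert.h>

static int call_count = 0;

static int square(int x)
{
    return x * x;
}

/* The inner `square` is not expanded again during the rescan,
   so it names the function defined above. */
#define square(x) (++call_count, square(x))

/* Verifies that one use of the macro makes exactly one function call. */
void check_wrapper(void)
{
    int before = call_count;
    assert(square(3) == 9);
    assert(call_count == before + 1);
}
```

If the macro's own name were expanded again, square(3) could never finish expanding.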

Defining C macros in preprocessor if statements

Below I choose which function to call depending on the value of INPUT:
#include <stdio.h>
#define INPUT second
#if INPUT == first
#define FUNCTOCALL(X) first(X)
#elif INPUT == second
#define FUNCTOCALL(X) second(X)
#endif
void first(int x) {
printf("first %d\n", x);
}
void second(int x) {
printf("second %d\n", x);
}
int main() {
FUNCTOCALL(3);
return 0;
}
However, the output is first 3, even if INPUT is equal to second, as above. In fact, the first branch is always entered, regardless of the value of INPUT. I'm completely stumped by this - could someone explain what stupid mistake I'm making?
The C preprocessor only works on integer constant expressions in its conditionals.
If you give it tokens it can't expand (such as first or second where first and second aren't macros)
it'll treat them as 0 and 0 == 0 was true last time I used math. That's why the first branch is always taken.
6.10.1p4:
... After all replacements due to macro expansion and the defined
unary operator have been performed, all remaining identifiers
(including those lexically identical to keywords) are replaced with
the pp-number 0, and then each preprocessing token is converted into a
token. ...
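The effect of that rule can be reproduced in isolation. In this sketch, BRANCH_TAKEN and check_branch are hypothetical names added to record which branch the preprocessor selects. Nothing else is defined, so first and second are plain identifiers.

```c
#include <assert.h>

#define INPUT second            /* `second` is an ordinary identifier, not a macro */

#if INPUT == first              /* both sides become 0, so this is 0 == 0 */
#define BRANCH_TAKEN 1
#elif INPUT == second           /* never reached */
#define BRANCH_TAKEN 2
#endif

/* Verifies that the first branch was selected; call it from main(). */
void check_branch(void)
{
    assert(BRANCH_TAKEN == 1);
}
```

With GCC or Clang, the -Wundef option warns about identifiers that are silently replaced by 0 in an #if.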
You have no macros first and second defined. Be aware that the pre-processor is not aware of C or C++ function names!* In #if expressions (e.g. #if value or #if 2*X == Y), any identifier that is not a defined macro (never defined, or undefined again) evaluates to 0. A macro defined with an empty replacement list does not evaluate to 0; it simply disappears, which usually leaves an invalid expression. So INPUT expands to second, and since neither first nor second is a macro, both are replaced by 0 and the comparison in the first #if evaluates to 0 == 0...
However, if you did define the two macros as needed, they would collide with the C function names and the pre-processor would replace these with the macro values as you just defined them, most likely resulting in invalid code (e. g. functions named 1 and 2)...
You might try this instead:
#define INPUT SECOND
#define FIRST 1
#define SECOND 2
#if INPUT == FIRST
#define FUNCTOCALL(X) first(X)
#elif INPUT == SECOND
#define FUNCTOCALL(X) second(X)
#else
# error INPUT not defined
#endif
Note the difference in case, making the macro and the function name differ.
* To be more precise, the pre-processor is not aware of any C or C++ tokens, so it does not know about types like int, double, structs or classes, ... – all it knows is what you make it explicitly aware of with #define, everything else is just text it operates on and, if encountering some known text nodes, replacing them with whatever you defined.
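Put together, a complete version of the fix might look like the sketch below. It assumes stand-in functions that return a tagged value instead of printing, so the selection can be checked. Those return values and check_dispatch are illustrative assumptions, not part of the original program.

```c
#include <assert.h>

#define FIRST  1
#define SECOND 2
#define INPUT  SECOND

#if INPUT == FIRST
#define FUNCTOCALL(X) first(X)
#elif INPUT == SECOND
#define FUNCTOCALL(X) second(X)
#else
#error "INPUT must be FIRST or SECOND"
#endif

/* Stand-ins for the original functions (assumed return values). */
int first(int x)  { return 100 + x; }
int second(int x) { return 200 + x; }

/* Verifies that FUNCTOCALL dispatches to second(); call it from main(). */
void check_dispatch(void)
{
    assert(FUNCTOCALL(3) == 203);
}
```

The order of the three #define lines does not matter, because INPUT is only expanded when the #if is evaluated.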

Can I capture the underlying value of one macro when defining another?

Imagine I want to #define a macro such that it is equal to the current value of another macro (if such a concept exists).
For example:
#include "def_a.h" // defines macro A
#define B A
This defines B to be A. If A later changes definition (i.e., through a redefinition) the value of B also changes (because B expands to A at the point of use, which further expands to the new value of A).
What I'd like is some way to "bake in" the value of A into B so that B just expands to the value of A, not A itself.
For example:
#define A first
#define B BAKE_IN(A)
#undef A
#define A second
#define C BAKE_IN(A)
#undef A
#define A third
// here I want B to expand to first, and C to expand to second
Of course BAKE_IN is not a real thing, but I'm wondering if there is some way to achieve this effect.
Now, I didn't really say what should happen if A itself is defined in terms of other macros, but I'm OK with both "one level of expansion" (i.e., B gets the value of A expanded once, so any further macros remain) and "full expansion" (i.e., A is fully expanded, recursively, as it would be at a point of use).
Macros are never replaced in the body of a #define directive, so there is no way to define a macro as the current value of another macro, except for the limited case of macros whose value is a constant arithmetic expression.
In the latter case, you can use BOOST_PP_ASSIGN_SLOT from the Boost preprocessor library. (Although most of the Boost libraries are C++-specific, the Boost preprocessor library works for both C and C++, and has no dependency on any other Boost component.)
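For integer constant values there is also a plain-C alternative that avoids macros for the captured name. This is a different technique from the Boost mechanism mentioned above: an enumeration constant is evaluated where it is declared, so it keeps the value A had at that point. The sketch assumes A holds integer values, and check_baked_values is a hypothetical name.

```c
#include <assert.h>

#define A 1
enum { B = A };      /* A is expanded here, so B is fixed at 1 */
#undef A

#define A 2
enum { C = A };      /* C is fixed at 2 */
#undef A

#define A 3

/* Verifies that B and C kept the earlier values of A. */
void check_baked_values(void)
{
    assert(B == 1);
    assert(C == 2);
    assert(A == 3);
}
```

The trade-off is that B and C are not macros, so they cannot be tested in #if or used with # and ##.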
I don't think there is a clean solution. The closest thing that I found is to preserve "stringified" values within char arrays:
#include <stdio.h>
#define BAKE_IN(X, id) BAKE_IN_REAL(X ## _, X, id)
#define BAKE_IN_REAL(X, Y, id) static const char X ## id[] = #Y;
#define BAKE_OUT(X, id) X ## _ ## id
#define A first
BAKE_IN(A, 1)
#define B BAKE_OUT(A, 1)
#undef A
#define A second
BAKE_IN(A, 2)
#define C BAKE_OUT(A, 2)
#undef A
int main(void)
{
printf("%s\n", B); // prints "first"
printf("%s\n", C); // prints "second"
return 0;
}
The idea is that the BAKE_IN macro defines an object named, e.g., A_1, which holds the current expansion of A.
There are two major limitations:
Every pair of BAKE_IN and BAKE_OUT needs a unique id.
The expansion is only available in "stringified" form.

Extract Argument from C Macro

I have a number of definitions consisting of two comma-separated expressions, like this:
#define PIN_ALARM GPIOC,14
I want to pass the second expression of those definitions (14 in the case above) to unary macros like the following:
#define _PIN_MODE_OUTPUT(n) (1U << ((n) * 2U))
How can I extract the second number? I want a macro, call it "PICK_RIGHT", which will do this for me:
#define PICK_RIGHT(???) ???
So that I can make a new macro that can take my "PIN" definitions:
#define PIN_MODE_OUTPUT(???) _PIN_MODE_OUTPUT(PICK_RIGHT(???))
And I can simply do:
#define RESULT PIN_MODE_OUTPUT(PIN_ALARM)
Do not use macros for this. If you must, the following will work by throwing away the left part first so just the number remains. Use with care. No guarantees.
#include <stdio.h>
#define PIN_ALARM GPIOC,14
#define RIGHTPART_ONLY(a,b) b
#define PIN_MODE_OUTPUT(a) RIGHTPART_ONLY(a)
#define RESULT PIN_MODE_OUTPUT(PIN_ALARM)
int main (void)
{
printf ("we'll pick ... %d\n", PIN_MODE_OUTPUT(PIN_ALARM));
printf ("or maybe %d\n", RESULT);
return 0;
}
If you want the left part as a string, you can use this (with the same warnings as above), where the left part gets converted to a string by #:
#define LEFTPART_ONLY(a,b) #a
#define PIN_MODE_NAME(a) LEFTPART_ONLY(a)
There is a practical reason this is not entirely without problems. GPIOC is a symbol and as such may be defined elsewhere. Fortunately, it is not a problem if it is undefined, or if it is defined as something simple; after all, the first thing the macros do is "throw away the left part". But as Jonathan Leffler comments:
Note that if GPIOC maps to a macro containing commas, you're likely to get compilation errors.
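Applied to the macros from the question, the same idea could be sketched as follows. It assumes C99 variadic macros and renames _PIN_MODE_OUTPUT to PIN_MODE_OUTPUT_, since identifiers starting with an underscore and an uppercase letter are reserved. PICK_RIGHT_ and check_pin_macros are hypothetical helper names.

```c
#include <assert.h>

#define PIN_ALARM GPIOC,14

/* PICK_RIGHT is variadic because by the time it is invoked from
   PIN_MODE_OUTPUT, the pin has already expanded to two arguments. */
#define PICK_RIGHT_(a, b) b
#define PICK_RIGHT(...) PICK_RIGHT_(__VA_ARGS__)

#define PIN_MODE_OUTPUT_(n) (1U << ((n) * 2U))
#define PIN_MODE_OUTPUT(pin) PIN_MODE_OUTPUT_(PICK_RIGHT(pin))

#define RESULT PIN_MODE_OUTPUT(PIN_ALARM)

/* Verifies the extracted pin number and the resulting mask. */
void check_pin_macros(void)
{
    assert(PICK_RIGHT(PIN_ALARM) == 14);
    assert(RESULT == (1U << 28));
}
```

GPIOC never reaches the compiler here, so it does not need to be declared for this sketch to build.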

Renaming a macro in C

Let's say I have already defined 9 macros from
ABC_1 to ABC_9
If there is another macro XYZ(num) whose objective is to call one of the ABC_{i} based on the value of num, what is a good way to do this? i.e. XYZ(num) should call/return ABC_num.
This is what the concatenation operator ## is for:
#define XYZ(num) ABC_ ## num
However, macro arguments that are operands of ## are treated differently: they are not macro-expanded before being pasted (this is what makes name-pasting possible), and expansion only happens in the rescan pass. So if the number is stored in a second macro (or is the result of any kind of expansion, rather than a plain literal), you'll need another layer of expansion:
#define XYZ(num) XYZ_(num)
#define XYZ_(num) ABC_ ## num
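The difference between the one-level and two-level versions shows up as soon as the number comes from another macro. In this sketch the values of ABC_1 and ABC_2, the INDEX macro, the ABC_INDEX marker and the name XYZ_DIRECT are all assumptions made for illustration.

```c
#include <assert.h>

#define ABC_1 10
#define ABC_2 20
#define ABC_INDEX (-1)        /* only here to make the direct paste visible */

#define INDEX 2

#define XYZ_DIRECT(num) ABC_ ## num   /* one level: pastes the argument as written */
#define XYZ(num)  XYZ_(num)           /* two levels: argument is expanded first */
#define XYZ_(num) ABC_ ## num

/* Verifies which name each variant ends up pasting. */
void check_xyz(void)
{
    assert(XYZ_DIRECT(1) == 10);       /* ABC_1 */
    assert(XYZ_DIRECT(INDEX) == -1);   /* pasted to ABC_INDEX, INDEX not expanded */
    assert(XYZ(INDEX) == 20);          /* INDEX -> 2 first, then ABC_2 */
}
```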
In the comments you say that num should be a variable, not a constant. The preprocessor builds compile-time expressions, not dynamic ones, so a macro isn't really going to be very useful here.
If you really wanted XYZ to have a macro definition, you could use something like this:
#define XYZ(num) ((int[]){ \
0, ABC_1, ABC_2, ABC_3, ABC_4, ABC_5, ABC_6, ABC_7, ABC_8, ABC_9 \
}[num])
Assuming ABC_{i} are defined as int values (at any rate they must all be the same type - this applies to any method of dynamically selecting one of them), this selects one with a dynamic num by building a temporary array and selecting from it.
This has no obvious advantages over a completely non-macro solution, though. (Even if you wanted to use macro metaprogramming to generate the list of names, you could still do that in a function or array definition.)
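A non-macro version of the same lookup could be sketched like this. The values given to ABC_1 through ABC_9 are placeholders, since the question does not say what they expand to, and the function name xyz and its out-of-range result of 0 are assumptions.

```c
#include <assert.h>

/* Placeholder values; the real definitions are not shown in the question. */
#define ABC_1 11
#define ABC_2 22
#define ABC_3 33
#define ABC_4 44
#define ABC_5 55
#define ABC_6 66
#define ABC_7 77
#define ABC_8 88
#define ABC_9 99

/* Returns the value of ABC_<num> for num in 1..9, or 0 otherwise. */
int xyz(int num)
{
    static const int table[] = {
        0, ABC_1, ABC_2, ABC_3, ABC_4, ABC_5, ABC_6, ABC_7, ABC_8, ABC_9
    };
    if (num < 1 || num > 9)
        return 0;
    return table[num];
}

/* Verifies in-range and out-of-range lookups. */
void check_xyz_lookup(void)
{
    int n = 4;                 /* a run-time value, not a literal */
    assert(xyz(n) == ABC_4);
    assert(xyz(10) == 0);
}
```

Unlike the compound-literal macro, the function builds the table only once and can check its index.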
Yes, that's possible, using concatenation. For example:
#define FOO(x, y) BAR ##x(y)
#define BAR1(y) "hello " #y
#define BAR2(y) int y()
#define BAR3(y) return y
FOO(2, main)
{
puts(FOO(1, world));
FOO(3, 0);
}
This becomes:
int main()
{
puts("hello " "world");
return 0;
}
