Is a repeated macro invocation via token concatenation unspecified behavior?

The C11 standard admits vagueness with regard to at least one situation that can arise in macro expansion: a function-like macro expands to its uninvoked name, which is then invoked by the next preprocessing token. The example given in the standard is this.
#define f(a) a*g
#define g(a) f(a)
// may produce either 2*f(9) or 2*9*g
f(2)(9)
That example does not clarify what happens when a macro, M, is expanded, and all or part of the result contributes via token concatenation to a second preprocessing token, M, which is invoked.
Question: Is such an invocation blocked?
Here is an example of such an invocation. This issue tends to only come up when using a fairly complicated set of macros, so this example is contrived for the sake of simplicity.
// arity gives the arity of its args as a decimal integer (good up to 4 args)
#define arity(...) arity_help(__VA_ARGS__,4,3,2,1,)
#define arity_help(_1,_2,_3,_4,_5,...) _5
// define 'test' to mimic 'arity' by calling it twice
#define test(...) test_help_A( arity(__VA_ARGS__) )
#define test_help_A(k) test_help_B(k)
#define test_help_B(k) test_help_##k
#define test_help_1 arity(1)
#define test_help_2 arity(1,2)
#define test_help_3 arity(1,2,3)
#define test_help_4 arity(1,2,3,4)
// does this expand to '1' or 'arity(1)'?
test(X)
test(X) expands to test_help_A( arity(X) ), which invokes test_help_A on rescanning, which expands its arg before substitution, and so is identical to test_help_A(1), which produces test_help_B(1), which produces test_help_1. This much is clear.
So, the question comes in here. test_help_1 is produced using a character, 1, that came from an expansion of arity. So can the expansion of test_help_1 invoke arity again? My versions of gcc and clang each think so.
Can anyone argue that the interpretations made by gcc and clang are required by something in the standard?
Is anyone aware of an implementation that interprets this situation differently?

I think that gcc's and clang's interpretations are correct. The two expansions of arity are not in the same call path. The first descends from the expansion of test_help_A's argument, the second from the expansion of test_help_A itself.
The point of these rules is to guarantee that there can't be infinite recursion, and that guarantee holds here: the evaluation of the macro makes progress between the two calls.
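One way to see what a particular compiler does is to stringify the result of test(X). The following is only a sketch: the macros are copied from the question, while STR, STR_ and second_arity_was_expanded are hypothetical names added for illustration. It records an observation about the compiler in use (the question reports that gcc and clang expand arity a second time); it does not show that the standard requires that outcome.

```c
#include <assert.h>
#include <string.h>

/* Macros copied from the question. */
#define arity(...) arity_help(__VA_ARGS__,4,3,2,1,)
#define arity_help(_1,_2,_3,_4,_5,...) _5
#define test(...) test_help_A( arity(__VA_ARGS__) )
#define test_help_A(k) test_help_B(k)
#define test_help_B(k) test_help_##k
#define test_help_1 arity(1)
#define test_help_2 arity(1,2)
#define test_help_3 arity(1,2,3)
#define test_help_4 arity(1,2,3,4)

/* Hypothetical helpers: the extra level expands the argument before '#'. */
#define STR_(...) #__VA_ARGS__
#define STR(...) STR_(__VA_ARGS__)

/* Returns 1 if the second arity call was expanded, 0 if it was left
   behind as "arity(1)". Any other spelling trips the assertion. */
static int second_arity_was_expanded(void)
{
    const char *text = STR(test(X));
    if (strcmp(text, "1") == 0)
        return 1;
    assert(strcmp(text, "arity(1)") == 0);
    return 0;
}

/* Keep this very generic macro name from leaking into later code. */
#undef test
```

Calling second_arity_was_expanded() from main and printing the result is enough to compare implementations.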

Related

Restricting preprocessing-numbers in a C preprocessor to only handle valid floating and integer constants

I'm currently implementing a C11 compiler and I'm aiming to integrate the preprocessor into the rest of the compiler rather than have it as a stand-alone component. As such, the preprocessor can safely assume that its output will be valid in the following stages.
Reading about the preprocessing number token, it seems like it only exists to simplify the implementation of a stand-alone preprocessor. Simplifying the format of numbers, it doesn't have to handle the full complexity of numeral expressions. Quoting the GCC docs:
The purpose of this unusual definition is to isolate the preprocessor from the full complexity of numeric constants. It does not have to distinguish between lexically valid and invalid floating-point numbers, which is complicated.
As the preprocessor will be integrated into the rest of the compiler framework, this is not an issue for me.
In section 6.4.8, paragraph 4 [Preprocessing numbers; Semantics] of the C11 standard, it says
A preprocessing number does not have type or a value; it acquires both after a successful conversion (as part of translation phase 7) to a floating constant token or an integer constant token.
So it seems like every preprocessing-number will be converted into a floating or integer constant later on in the compilation process. I cannot find any other references to preprocessing-numbers in the standard, so it seems like this is their only purpose, but I may be wrong.
My question is, would it be valid for the preprocessor to restrict preprocessing-numbers to only valid integer and floating point constants? Or are there cases where having such a restriction would cause otherwise valid programs to fail?
There are certainly valid programs which include pp-numbers not convertible to an integer or float. The common case is a preprocessing token which does not become a token.
For example, it might be stringified:
#include <stdio.h>
#define STRINGIFY_(X) #X
#define STRINGIFY(V) STRINGIFY_(V)
#define VERSION 3.4.6a
#define PROGNAME foo
int main(void) {
printf("%s-%s\n", STRINGIFY(PROGNAME), STRINGIFY(VERSION));
}
Moreover, the version number in the above example could have been produced with token concatenation, another way preprocessing tokens never become program tokens:
#include <stdio.h>
#define STRINGIFY_(X) #X
#define STRINGIFY(V) STRINGIFY_(V)
#define CONCAT3_(x,y,z) x##y##z
#define CONCAT3(x,y,z) CONCAT3_(x,y,z)
#define CONCAT_V(mj, mn, pl) CONCAT3(mj, ., CONCAT3(mn, ., pl))
#define MAJOR 3
#define MINOR 4
#define PATCH 6a
#define VERSION CONCAT_V(MAJOR, MINOR, PATCH)
#define PROGNAME foo
int main(void) {
printf("%s-%s\n", STRINGIFY(PROGNAME), STRINGIFY(VERSION));
}
There are other ways for a pp-number (or any other preprocessing token) to never be converted to a token:
As the argument to a macro which does not use the corresponding parameter in its replacement text.
In program text in a preprocessor conditional whose controlling expression is false.
This is often used "in the wild" to hide not-completely written code inside an #if 0 … #endif block; the excluded code may have almost arbitrary syntax errors, as long as comments and strings are terminated, including invalid pp-numbers and even stray punctuation. (# is a valid preprocessing token which cannot be converted to a token.)
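Both of those cases can be sketched in a few lines. FIRST_OF, STR and demo_unconverted_pp_numbers are hypothetical names made up for this illustration, and 0x1.2.3zz is simply a spelling that is a pp-number but not a valid constant.

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x
#define STR(x) STR_(x)

/* Hypothetical macro that never uses its second parameter. */
#define FIRST_OF(a, b) a

#if 0
  skipped text may hold malformed numbers: 12.34.56e+xyz 0x 1__2 @
#endif

static void demo_unconverted_pp_numbers(void)
{
    /* The second argument is dropped before it could become a token. */
    assert(FIRST_OF(7, 0x1.2.3zz) == 7);
    /* Stringified, the same spelling ends up inside a string literal. */
    assert(strcmp(STR(0x1.2.3zz), "0x1.2.3zz") == 0);
}
```

A preprocessor that rejected 0x1.2.3zz at lexing time would refuse this translation unit even though no invalid constant ever reaches the compiler proper.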

What does #define (integer) do?

This is certainly a dup, and I shall remove it as soon as I run into an answer. I just can't find what I'm looking for.
What do these two lines in C mean?
#define NN_DIGITS ( 4)
#define MM_MARKS_DONE (255)
I know what #define and #define () do, where #define () executes the macro in (), but I don't know this particular variant (with an integer).
Is it actually redundant to write down () to define an integer value? Shall these values be interpreted bitwise? What will happen if we shan't write (). Will 4 and 255 be taken as a string?
Keyword: "execute". This is the root of your misunderstanding.
Macros aren't executed. They are substituted. The preprocessor replaces the token NN_DIGITS by the token sequence ( 4). As a matter of fact, it would replace it with practically any token sequence. Even #define NN_DIGITS ( %34 (DDd ] is a valid macro definition (courtesy of my cat), despite the fact we most certainly don't want to try and expand it.
Is it actually redundant to write down () to define an integer value?
From a practical standpoint, yes, it's redundant. But some would probably do it to maintain consistency with other macros where the resulting expressions can depend on the presence of parentheses.
Shall these values be interpreted bitwise?
Everything is bitwise to a computer.
What will happen if we shan't write (). Will 4 and 255 be taken as a string?
No, it will just be the tokens 4 and 255 as opposed to the sequences ( 4) and (255) respectively. The preprocessor deals only in tokens; it knows practically nothing about the type system. If the macro appears in a program, say:
int a = NN_DIGITS;
It will be turned by the preprocessor into:
int a = ( 4);
And then compiled further by the other steps in the pipeline of turning a program into an executable.
The parentheses do absolutely nothing in this case - they're just noise.
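To make the substitution visible, one can stringify the macro and compare it with the value the compiler ends up with. This is a small sketch; PLAIN_DIGITS, STR and demo_macro_tokens are hypothetical names introduced only for the comparison.

```c
#include <assert.h>
#include <string.h>

#define NN_DIGITS     ( 4)
#define MM_MARKS_DONE (255)
#define PLAIN_DIGITS  4       /* hypothetical variant without parentheses */

/* Two levels, so the macro is expanded before it is stringified. */
#define STR_(x) #x
#define STR(x) STR_(x)

static void demo_macro_tokens(void)
{
    int a = NN_DIGITS;        /* the compiler sees: int a = ( 4); */
    int b = PLAIN_DIGITS;     /* the compiler sees: int b = 4;    */
    assert(a == 4 && b == 4);
    assert(MM_MARKS_DONE == 255);
    /* Stringifying exposes the substituted token sequence itself. */
    assert(strcmp(STR(NN_DIGITS), "( 4)") == 0);
    assert(strcmp(STR(PLAIN_DIGITS), "4") == 0);
}
```

The values are identical either way; only the token sequence differs.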
There's a general rule of survival saying that function-like macros should always:
Wrap each occurrence of a macro parameter in parentheses, and
Wrap the whole macro in outer parentheses
That is:
#define ADD(x,y) x + y // BAD
#define ADD(x,y) (x) + (y) // BAD
#define ADD(x,y) ((x) + (y)) // correct
This is to dodge issues of operator precedence and will be addressed by any decent beginner-level learning material.
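A short sketch of what goes wrong with the two BAD variants. The three macros are renamed (ADD_BARE, ADD_PARAMS, ADD_SAFE) so they can coexist in one file; those names and the helper functions are made up for this example.

```c
#include <assert.h>

#define ADD_BARE(x, y)   x + y          /* BAD */
#define ADD_PARAMS(x, y) (x) + (y)      /* BAD */
#define ADD_SAFE(x, y)   ((x) + (y))    /* correct */

/* Missing outer parentheses: the multiplication binds to the first operand. */
static int twice_bare(int a, int b)   { return 2 * ADD_BARE(a, b); }
static int twice_params(int a, int b) { return 2 * ADD_PARAMS(a, b); }
static int twice_safe(int a, int b)   { return 2 * ADD_SAFE(a, b); }

/* Missing parameter parentheses: a + b < 3 parses as (a + b) < 3. */
static int bare_plus_compare(int a, int b) { return ADD_BARE(a, b < 3); }
static int safe_plus_compare(int a, int b) { return ADD_SAFE(a, b < 3); }

static void demo_add_macros(void)
{
    assert(twice_bare(1, 2) == 4);          /* 2 * 1 + 2     */
    assert(twice_params(1, 2) == 4);        /* 2 * (1) + (2) */
    assert(twice_safe(1, 2) == 6);          /* 2 * ((1) + (2)) */
    assert(bare_plus_compare(1, 2) == 0);   /* (1 + 2) < 3   */
    assert(safe_plus_compare(1, 2) == 2);   /* 1 + (2 < 3)   */
}
```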
Overly pedantic people who've learned the above rules tend to apply them to all macros, not just function-like macros. But when the macro contains nothing but a single integer constant (a single preprocessing token), the parentheses achieve absolutely nothing.
Is it actually redundant to write down () to define an integer value?
Yes, it just adds noise.
Shall these values be interpreted bitwise?
Macros are best regarded as mostly just text replacement. What you do with the value in the calling code is no business of the macro.
What will happen if we shan't write ()
The code will get slightly easier to read.
Will 4 and 255 be taken as a string?
No, why would they?
There is a specific case where the parentheses cause harm though, and that is when you use macros to convert a pre-processor constant to a string. Suppose I have this program:
#include <stdio.h>
#define STR(x) #x
#define AGE(x) STR(x)
#define DOG_AGE 5
int main(void)
{
puts("My dog is " AGE(DOG_AGE) " years old.");
}
AGE expands the macro DOG_AGE to 5 and then the next macro converts it to a string. So this prints My dog is 5 years old. because the # operator converts the pre-processor token exactly as it is given. If I add "useless noise parenthesis" to the macro:
#define DOG_AGE (5)
Then the output becomes My dog is (5) years old. Not what I intended.

How to use the token pasting operator with a variable number of arguments?

I thought of having a generic version of #define concatenate(a, b, c) a ## b ## c
I tried it like this:
#include <stdio.h>
#define concatenate(arg1, ...) arg1 ## __VA_ARGS__
int main()
{
int dob = 121201;
printf("%d", concatenate(d, o, b));
return 0;
}
I also tried many other ways:
#define concatenate(arg1, ...) arg1 ## ##__VA_ARGS__
#define concatenate(...) ## ##__VA_ARGS__
#define concatenate(...) ##__VA_ARGS__
#define concatenate(arg1, ...) arg1 ## ...
#define concatenate(arg1, ...) arg1 ## concatenate(##__VA_ARGS__)
Alas, all my attempts failed. I was wondering if it is even possible to do this in any way?
It's possible. Jens Gustedt's interesting P99 macro library includes the macro P99_PASTE, which has precisely the signature of your concatenate, as well as the same semantics.
The mechanics which P99 utilizes to implement that function are complex, to say the least. In particular, they rely on several hundred numbered macros which compensate for the fact that the C preprocessor does not allow recursive macro expansion.
Another useful explanation of how to do iteration in the C preprocessor is found in the documentation for the Boost Preprocessor Library, particularly the topic on reentrancy.
Jens' documentation for P99_PASTE emphasizes the fact that the macro pastes left-to-right to avoid the ambiguity of ##. That might need a bit of explanation.
The token-paste (##) operator is a binary operator; if you want to paste more than two segments into a single token, you need to do it a pair at a time, which means that all intermediate results must be valid tokens. That can require a certain amount of caution. Consider, for example, this macro which attempts to add an exponent to the end of an integer:
#define EXPONENT(INT, EXP) INT ## E ## EXP
(This will only work if both macro arguments are literal integers. In order to allow the macro arguments to be macros, we would need to introduce another level of indirection in the macro expansion. But that's not the point here.)
What we will almost immediately discover is that EXPONENT(42,-3) doesn't work, because -3 is not a single token. It's two tokens, - and 3, and the paste operator will only paste the -. That will result in a two-token sequence 42E- 3, which will eventually lead to a compiler error.
42E and 42E- are valid tokens, by the way. They are ppnumbers, preprocessing numbers, which are any combination of dots, digits, letters and exponents, provided that the token starts with a digit or a dot followed by a digit. (Exponents are one of the letters E or P, possibly lower-case and possibly followed by a sign. Otherwise, sign characters cannot appear in a ppnumber.)
So we could try to fix this by asking the user to separate the sign from the number:
#define EXPONENT(INT, SIGN, EXP) INT ## E ## SIGN ## EXP
EXPONENT(42,-,3)
That will work if the ## operators are evaluated from left-to-right. But the C standard does not impose any particular evaluation order of multiple ## operators. If we're using a preprocessor which works from right to left, then the first thing it will try to do is to paste - and 3, which won't work because -3 is not a single token, just as with the simpler definition.
Now, I can't offer an example of a compiler which will fail on this macro, since I don't have a right-to-left preprocessor handy. Both gcc and clang evaluate ## left-to-right, and I think that's far and away the most common evaluation order. But you can't rely on that; in order to write portable code, you need to ensure that the paste operators are evaluated in the expected order. And that's the guarantee offered by P99_PASTE.
Note: It's possible that there is an application in which right-to-left pasting is required, but after thinking about it for some time, the only example I could come up with of a token paste which would work right-to-left but not left-to-right is the following rather obscure corner case:
#define DOUBLE_HASH %: ## % ## :
and I can't think of any plausible context in which that might come up.
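For a small, fixed upper bound on the number of arguments, the same idea can be sketched without a library. This is not how P99 implements P99_PASTE; it is a minimal illustration limited to four arguments, and every helper name in it (PASTE2, CAT_1 to CAT_4, CAT_COUNT, CAT_PICK) is hypothetical. Each step pastes exactly two pieces through a nested macro call, so the order is left-to-right regardless of how the preprocessor orders multiple ## operators within one replacement list.

```c
#include <assert.h>

/* Two-level paste: the outer macro expands its arguments first. */
#define PASTE2_(a, b) a##b
#define PASTE2(a, b) PASTE2_(a, b)

/* Fixed-arity helpers; each one pastes a single piece onto the right. */
#define CAT_1(a) a
#define CAT_2(a, b) PASTE2(a, b)
#define CAT_3(a, b, c) PASTE2(CAT_2(a, b), c)
#define CAT_4(a, b, c, d) PASTE2(CAT_3(a, b, c), d)

/* Count 1 to 4 arguments, then select the matching helper. */
#define CAT_COUNT_(p1, p2, p3, p4, n, ...) n
#define CAT_COUNT(...) CAT_COUNT_(__VA_ARGS__, 4, 3, 2, 1, 0)
#define CAT_PICK_(n) CAT_##n
#define CAT_PICK(n) CAT_PICK_(n)
#define concatenate(...) CAT_PICK(CAT_COUNT(__VA_ARGS__))(__VA_ARGS__)

static void demo_concatenate(void)
{
    int dob = 121201;
    assert(concatenate(d, o, b) == 121201);
    /* 42 ## E, then 42E ## -, then 42E- ## 3: every step is a pp-number. */
    assert(concatenate(42, E, -, 3) == 42E-3);
    assert(concatenate(dob) == dob);
}
```

Supporting more arguments means adding more CAT_n helpers and extending the counting list, which is the same kind of numbered-macro bookkeeping described above, just on a much smaller scale.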

Working of conditional compilation #if and #else (and others) in c

I tried to write a program using some conditional compilation pre-processing directives instead of "if-else" as follows.
#include<stdio.h>
int main ()
{
int x;
scanf ("%d",&x);
#if (x==5)
printf ("x is 5");
#else
printf ("x not 5");
#endif
}
But the thing is, it always prints the else part even though the value of x is 5. My simple question is: why?
Is it possible to successfully complete this program (i.e. taking the value of x from the user, checking the condition with the #if directive and printing the statement under #if)?
During compilation it shows a warning "'x' is not defined, evaluates to 0". But x seems defined to me. Does that mean x should be defined using #define? Please explain the concept behind conditional compilation.
x is not an integer literal or an integer literal expression (integer literals + operators) or a macro expanding to those, so in a conditional, the preprocessor replaces it with 0 (6.10.1p4). 0==5 is false, so the #else branch is taken.
The preprocessor doesn't know about C declarations, types and such. It only works with tokens (and macros that ultimately expand to those).
6.10.1p4
After all replacements due to macro expansion and the defined unary
operator have been performed, all remaining identifiers (including
those lexically identical to keywords) are replaced with the pp-number
0, and then each preprocessing token is converted into a token.
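The effect of that paragraph can be checked with a few directives. This is a sketch; the X_EQUALS_5, X_EQUALS_0 and X_IS_MACRO macros and the demo function are hypothetical names used only to capture which branch was taken.

```c
#include <assert.h>

/* No macro named x exists, so x is replaced by 0 in both directives. */
#if (x == 5)
#define X_EQUALS_5 1
#else
#define X_EQUALS_5 0
#endif

#if (x == 0)
#define X_EQUALS_0 1
#else
#define X_EQUALS_0 0
#endif

/* The defined operator is evaluated before that replacement happens. */
#if defined(x)
#define X_IS_MACRO 1
#else
#define X_IS_MACRO 0
#endif

static void demo_if_identifiers(void)
{
    int x = 5;   /* a C variable; invisible to the directives above */
    assert(x == 5);
    assert(X_EQUALS_5 == 0);
    assert(X_EQUALS_0 == 1);
    assert(X_IS_MACRO == 0);
}
```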
Preprocessing takes place before the compilation. So the preprocessor does not know anything about your C code or variables. You can't use any C variables in conditions.
Conditional compilation is for different purposes.
#define DEBUG
/* ....*/
#ifdef DEBUG
printf("Some debug value %d\n", val);
#endif
Operands in #if statements can be only constants, things defined with #define, and a special defined operator. Any other identifiers in the expression are replaced with 0. The x in your sample code is not defined with #define, so (x==5) becomes (0==5).
In the C 2018 standard, clause 6.10.1 tells us that evaluation of the expression in an #if statement includes:
Preprocessor macros (things defined with #define) are replaced according to their definitions.
Uses of the defined operator are replaced with 0 or 1.
Any remaining identifiers are replaced with 0.
Because the x in your sample code is not defined with #define, it is replaced with 0 in the #if statement. This results in (0==5), which is false, so code between the #if and the #else is skipped.
In a preprocessor statement, you cannot evaluate variables based on values that will be set during program execution.
It's the "pre-processor". "Pre" means "before".
You're trying to use a runtime value during preprocessing! The preprocessor of course has no access to that information during the build.
This problem isn't limited to runtime values, but is more fundamental. Even if you were trying to use a (named) compile-time constant such as constexpr int x = 2, you couldn't do that. These are two languages interleaving, like generating HTML with PHP; the HTML has no knowledge of PHP variables, and the PHP has no knowledge of what widgets the user clicks on the page. These are completely different execution contexts with no built-in interaction or cross-compatibility.
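To put the two contexts side by side: a value known when the program is built can drive #if through a macro, while a value read from the user needs an ordinary if. The sketch below assumes a hypothetical macro BUILD_X (which could be overridden with a -DBUILD_X=... compiler option) and a hypothetical function message_for; neither appears in the question.

```c
#include <assert.h>
#include <string.h>

/* Build-time choice: BUILD_X is a hypothetical macro fixed before compilation. */
#ifndef BUILD_X
#define BUILD_X 5
#endif

#if BUILD_X == 5
#define BUILD_MESSAGE "x is 5"
#else
#define BUILD_MESSAGE "x not 5"
#endif

/* Run-time choice: the value is only known while the program runs. */
static const char *message_for(int x)
{
    if (x == 5)
        return "x is 5";
    else
        return "x not 5";
}

static void demo_build_vs_run_time(void)
{
    /* Holds with the default BUILD_X above. */
    assert(strcmp(BUILD_MESSAGE, "x is 5") == 0);
    assert(strcmp(message_for(5), "x is 5") == 0);
    assert(strcmp(message_for(7), "x not 5") == 0);
}
```

In the original program, the value read by scanf would be passed to something like message_for and the result printed with printf.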

sizeof() is not executed by preprocessor

#if sizeof(int) != 4
/* do something */
Using sizeof inside #if doesn't work while inside #define it works, why?
#define size(x) sizeof(x)/sizeof(x[0]) /*works*/
Nothing is evil - everything can be misused, or in your case misunderstood. The sizeof operator is a compiler feature, but compiler features are not available to the preprocessor (which runs before the compiler gets involved), and so cannot be used in #if preprocessor directives.
However, when you say:
#define size(x) sizeof(x)/sizeof(x[0])
and use it:
size(a)
the preprocessor performs a textual substitution that is handed to the compiler:
sizeof(a)/sizeof(a[0])
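A minimal sketch of that substitution in use. The size macro is the one from the question; ARRAY_LEN is a hypothetical, fully parenthesized variant added to show that the pasted-in text is subject to the usual precedence rules once the compiler sees it.

```c
#include <assert.h>
#include <stddef.h>

#define size(x) sizeof(x)/sizeof(x[0])              /* as in the question */
#define ARRAY_LEN(x) (sizeof(x) / sizeof((x)[0]))   /* hypothetical variant */

static void demo_size_macro(void)
{
    int a[6] = {0};
    size_t n = size(a);    /* the compiler sees: sizeof(a)/sizeof(a[0]) */
    (void)a;
    assert(n == 6);
    assert(ARRAY_LEN(a) == 6);
    assert(60 / ARRAY_LEN(a) == 10);
    /* Plain text substitution: this is 60 / sizeof(a) / sizeof(a[0]). */
    assert(60 / size(a) == 60 / sizeof(a) / sizeof(a[0]));
}
```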
C "Preprocessor" Macros Only Evaluate Constants and Other Macros
The short answer is a preprocessor expression only provides a meaningful evaluation of an expression composed of other preprocessor macros and constants.
Try this, you will not get an error:
#if sizeof < 2
int f(int x) { return x; }
#endif
If you generate assembly, you will find that sizeof < 2 compiles the function and sizeof >= 2 does not. Neither returns an error.
What's going on? It turns out that, except for preprocessor macros themselves, all identifiers in a preprocessor ("macro") expression are replaced with 0. So the above #if is the same as saying:
#if Easter_Bunny < 2
or
#if 0 < 2
This is why you don't actually get any sort of error when mistakenly using the sizeof operator in a preprocessor expression.
As it happens, sizeof is an operator, but it's also an identifier, and identifiers that are not themselves macros all turn into 0 in preprocessor expressions. The preprocessor runs, at least conceptually, before the compiler. It can turn non-C syntax into C so at the point it is running, the C program hasn't even been parsed yet. It isn't possible to reference actual C objects yet: they don't exist.
And naturally, a sizeof in the replacement text of a definition is simply passed through to the compiler as, well, the replacement text where the macro is used.
The preprocessor cannot evaluate the results of the sizeof operator. That is calculated by the compiler, long after the preprocessor is finished.
Since the second expression results in a compile-time computation, it works. The first is an impossible test for the preprocessor.
#define is merely text replacement. #if, being a conditional preprocessor directive, has to evaluate its expression, but at the time of preprocessing the preprocessor has no idea what sizeof() is. The preprocessor runs before the compiler proper analyzes the program.
sizeof is replaced at compile time.
Preprocessing runs before compile starts.
The compiler doesn't touch either line. Rather, the preprocessor rips through the file, replacing any instances of size(x) with your macro. The compiler DOES see these replacements.
The preprocessor doesn't know the sizeof operator; it just cannot understand it. So #if doesn't work, since it would have to understand sizeof to work: it is a conditional preprocessor directive, and it needs to know whether its expression evaluates to true or false.
But #define doesn't need to understand sizeof, as #define is just for text replacement. Preprocessor searches size macro (defined in #define) in the source code, and replaces it with what it is defined to be, which is in your case sizeof(x)/sizeof(x[0]).
The reason it doesn't work is because the pre-processor macros are 'evaluated' in a pass before the code reaches the compiler. So in the if pre-processor directive, the sizeof(int) (actually the sizeof(int) != 4) cannot be evaluated because that is done by the compiler, not the pre-processor.
The define statement though, simply does a text substitution, and so when it comes to the compiler, everywhere you had 'size(x)' you would have 'sizeof(x)/sizeof(x[0])' instead, and then this evaluates there at the compile stage... at every point in the code where you had 'size(x)'
If you want to check the size of the integer in the preprocessor, use your make system to discover the size of integer on your system before running the preprocessor and write it to a header file as e.g. #define SIZEOF_INT 4, include this header file and do #if SIZEOF_INT == 4
For example, if you use cmake, you can use the CMAKE_SIZEOF_INT variable which has the size of the integer which you can put in a macro.
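A sketch of how such a generated value might be consumed. The #ifndef block stands in for the generated header, and the value 4 is an assumption carried over from the example above, not something detected here. The C11 _Static_assert is an addition of this sketch: it lets the compiler proper, which does understand sizeof, confirm that the generated number is right. INT_BYTES_MESSAGE and the demo function are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the header written by the build system (assumed value). */
#ifndef SIZEOF_INT
#define SIZEOF_INT 4
#endif

/* The preprocessor can test the macro, since it is a plain integer. */
#if SIZEOF_INT == 4
#define INT_BYTES_MESSAGE "int is 4 bytes"
#else
#define INT_BYTES_MESSAGE "int is not 4 bytes"
#endif

/* C11: compilation fails if the generated value is wrong for this target. */
_Static_assert(SIZEOF_INT == sizeof(int),
               "SIZEOF_INT does not match sizeof(int)");

static void demo_generated_size(void)
{
    /* Holds whenever the stand-in value 4 is in effect. */
    assert(strcmp(INT_BYTES_MESSAGE, "int is 4 bytes") == 0);
    assert(sizeof(int) == SIZEOF_INT);
}
```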

Resources