Difference between macro and preprocessor - c

From what I understand , #define blah 8 is a macro . While , # is the pre-processor directive .
Can we say #include,#if,#ifdef,etc. are also macros , or are they called something else ? Or is it that macro is just a term used for #define statements only?
Please correct me if I am wrong.

Lines that start with # are preprocessing directives. They are directives that tell the preprocessor to do something.
#include, #if, #ifdef, #ifndef, #else, #elif, #endif, #define, #undef, #line, #error, and #pragma are all preprocessing directives. (A line containing only # is also a preprocessing directive, but it has no effect.)
#define blah 8 is a preprocessing directive, it is not a macro. blah is a macro. This #define preprocessing directive defines the macro named blah as an object-like macro replaced by the token 8.

#include, #if, etc. are features of the preprocessor.
#define blah 8
Is a preprocessor directive and declares a new macro named blah.
Macros are the result of a #define statement.
The preprocessor is a feature of C.

Preporcessor: the program that does the preprocessing (file inclusion, macro expansion, conditional compilation).
Macro: a word defined by the #define preprocessor directive that evaluates to some other expression.
Preprocessor directive: a special #-keyword, recognized by the preprocessor.

preprocessor modifies the source file before handing it over the compiler.
Consider preprocessor as a program that runs before compiler.
Preprocessor directives are like commands to the preprocessor program.Some common preprocessor directives in C are
#include <header name> - Instructs the preprocessor to paste the text of the given file to the current file.
#if <value> - Checks whether the value is true if so it will include the code until #endif
#define - Useful for defining a constant and creating a macro
macros are name for some fragment of code.So wherever the name is used it get replaced by the fragment of code by the preprocessor program.
eg:
#define BUFFER_SIZE 100
In your code wherever you use BUFFER_SIZE it gets replaced by 100
int a=BUFFER_SIZE;
a becomes 100 here
There are also many predefined macros in C for example __DATE__,__TIME__ etc.

Related

Check if a constructed constant is #define'd

I am trying to build a test that checks if a certain file defines a header guard with a certain namespace. Because the test is generic, this namespace is only known at compile-time and passed in as -DTHENAMESPACE=BLA. We then use some magic from https://stackoverflow.com/a/1489985/1711232 to paste that together.
This means I want to do something like:
#define PASTER(x, y) x##_##y
#define EVALUATOR(x, y) PASTER(x, y)
#define NAMESPACE(fun) EVALUATOR(THENAMESPACE, fun)
#ifndef NAMESPACE(API_H) // evaluates to BLA_API_H
# error "namespace not properly defined"
#endif
But this does not work properly, with cpp complaining about the ifndef not expecting the parentheses.
How can I do this properly, if it is possible at all?
I have also tried adding more layers of indirection, but not with a lot of success.
So directly, properly executing the #ifdef this at least appears to not be possible:
Considering the defined operator:
If the defined operator appears as a result of a macro expansion, the C standard says the behavior is undefined. GNU cpp treats it as a genuine defined operator and evaluates it normally. It will warn wherever your code uses this feature if you use the command-line option -Wpedantic, since other compilers may handle it differently. The warning is also enabled by -Wextra, and can also be enabled individually with -Wexpansion-to-defined.
https://gcc.gnu.org/onlinedocs/cpp/Defined.html#Defined
and ifdef expects a MACRO, and does not do further expansion.
https://gcc.gnu.org/onlinedocs/cpp/Ifdef.html#Ifdef
But maybe it is possible to trigger an 'undefined constant' warning (-Wundef), which would also allow my test pipeline to catch this problem.
If we assume that include guards always looks like
#define NAME /* no more tokens here */
and if, as you said, any compile time error (rather than #error exclusively) is acceptable, then you can do following:
#define THENAMESPACE BLA
#define BLA_API_H // Comment out to get a error.
#define CAT(x,y) CAT_(x,y)
#define CAT_(x,y) x##y
#define NAMESPACE(x) static int CAT(UNUSED_,__LINE__) = CAT(CAT(THENAMESPACE,CAT(_,x)),+1);
NAMESPACE(API_H)
Here, NAMESPACE(API_H) tries to concatenate BLA_API_H and + using ##.
This results in error: pasting "BLA_API_H" and "+" does not give a valid preprocessing token except if BLA_API_H is #defined to 'no tokens'.
In presence of #define BLA_API_H, NAMESPACE(API_H) simply becomes
static int UNUSED_/*line number here*/ = +1;
If you settle for a less robust solution, you can even get nice error messages:
#define THENAMESPACE BLA
#define BLA_API_H // Comment out to get a error.
#define TRUTHY_VALUE_X 1
#define CAT(x,y) CAT_(x,y)
#define CAT_(x,y) x##y
#define NAMESPACE(x) CAT(CAT(TRUTHY_VALUE_,CAT(THENAMESPACE,CAT(_,x))),X)
#if !NAMESPACE(API_H)
#error "namespace not properly defined"
#endif
Here, if BLA_API_H is defined, then #if !NAMESPACE(API_H) expands to #if 1.
If BLA_API_H is not defined, then it expands to #if TRUTHY_VALUE_BLA_API_HX, and TRUTHY_VALUE_BLA_API_HX evaluates to false due to being undefined.
The problem here is that if TRUTHY_VALUE_BLA_API_HX accidentally turns out to be defined to something truthy, you'll get a false negatie.
I don't think that macro expansion inside #ifndef directive is possible in the realm of standard C.
#ifndef symbol is equivalent of #if !defined symbol. Standard says this about it (§6.10.1 Conditional inclusion):
... it may contain unary operator expressions of the form
defined identifier
or
defined ( identifier )
and
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary
operator), just as in normal text. If the token defined is generated as a result of this replacement
process or use of the defined unary operator does not match one of the two specified forms prior to
macro replacement, the behavior is undefined. ...
So basically identifiers in defined expression are not expanded, and your current NAMESPACE(API_H) is not valid form of identifier.
Possible workaround could be to simply use:
#if NAMESPACE(API_H) == 0
# error "namespace not properly defined"
#endif
This works because non-existing identifiers are replaced with 0. Problem with this approach is that there will be false error if BLA_API_H is defined as 0, but depending on your situation that may be acceptable.
You can also do this using preprocessor pattern matching.
#define PASTE3(A,B,C) PASTE3_I(A,B,C)
#define PASTE3_I(A,B,C) A##B##C
#define PASTE(A,B) PASTE_I(A,B)
#define PASTE_I(A,B) A##B
#define THIRD(...) THIRD_I(__VA_ARGS__,,,)
#define THIRD_I(A,B,C,...) C
#define EMPTINESS_DETECTOR ,
#if THIRD(PASTE3(EMPTINESS_,PASTE(THENAMESPACE,_API_H),DETECTOR),0,1)
# error "namespace not properly defined"
#endif
This is the same idea as #HolyBlackCat's answer except that it's done in the preprocessor; the inner paste in the expression in the #if directive generates a token based on THENAMESPACE, pasting it to your required _API_H. If that token itself is defined in a macro, it will expand to a placemarker during the PASTE3 operation; this effectively pastes EMPTINESS_ [placemarker] DETECTOR, which is a macro expanding to a comma. That comma will shift the arguments of the indirect THIRD, placing 0 there, making the conditional equivalent to #if 0. Anything else won't shift the arguments, which results in THIRD selecting 1, which triggers the #error.
This also makes the same assumption HolyBlackCat's answer makes... that inclusion guards always look like #define BLA_API_H, but you can accommodate specific alternate styles using expanded pattern matching... for example, if you want to accept inclusion guards like #define BLAH_API_H 1 (who does that?), you could add #define EMPTINESS_1DETECTOR ,.
The language defines no way other than use of the defined operator or an equivalent to test whether identifiers are defined as macro names. In particular, a preprocessor directive of the form
#ifndef identifier
is equivalent to
#if ! defined identifier
(C11, 6.10.1/5). Similar applies to #ifdef.
The defined operator takes a single identifier as its operand, however, not an expression (C11, 6.10.1/1). Moreover, although the expression associated with an #if directive is macro expanded prior to evaluation, the behavior is undefined if in that context macro expansion produces the token "defined", and macro names modified by the defined unary operator are explicitly excluded from expansion (C11, 6.10.1/4).
Thus, although it is possible in many contexts to construct macro names via token pasting, and in such contexts the results are thereafter be subject to macro expansion, the operand of a defined operator is not such a context. The language therefore defines no way to test whether a constructed or indirectly-specified identifier is defined as a macro name.
HOWEVER, you can avoid relying on defined if you are in control of all the header guards, and you are willing to deviate slightly from traditional style. Instead of merely #defineing your header guards, define them to some nonzero integer value, say 1:
#if ! MYPREFIX_SOMEHEADER_H
#define MYPREFIX_SOMEHEADER_H 1
// body of someheader.h ...
#endif
You can then drop the defined operator from your test expression:
#if ! NAMESPACE(API_H)
# error "namespace not properly defined"
#endif
Do note, however, that the #define directive has a similar issue: it defines a single identifier, which is not subject to prior macro expansion. Thus, you cannot dynamically construct header guard names. I'm not sure whether you had that in mind, but if you did, then all the foregoping is probably moot.

Using #undef before #define

In many places I see the usage of undefine macro before defining the same macro. For example:
#undef FORMULA
#ifdef SOMETHING
#define FORMULA 1
#else
#define FORMULA 2
#endif
What for the #undefine FORMULA used?
I may guess that it deals with the case when the macro was already defined before. But isn't the new definition overrides the old one? Thanks!
A macro name currently defined cannot be redefined with a different definition (see below), so #undef allows that macro name to be redefined with a different definition.
Here's the relevant legalese:
Both C and C++ Standards (same wording):
A macro definition lasts (independent of block structure) until a corresponding #undef directive is encountered or (if none is encountered) until the end of the preprocessing translation unit.
Slight differences in wording, same meaning:
C Standard (N1256), §6.10.3/2:
An identifier currently defined as an object-like macro shall not be redefined by another #define preprocessing directive unless the second definition is an object-like macro definition and the two replacement lists are identical. Likewise, an identifier currently defined as a function-like macro shall not be redefined by another #define preprocessing directive unless the second definition is a function-like macro definition that has the same number and spelling of parameters, and the two replacement lists are identical.
C++ Standard (N3337) §16.3/2
An identifier currently defined as an object-like macro may be redefined by another #define preprocessing directive provided that the second definition is an object-like macro definition and the two replacement lists are identical, otherwise the program is ill-formed. Likewise, an identifier currently defined as a function-like macro may be redefined by another #define preprocessing directive provided that the second definition is a function-like macro definition that has the same number and spelling of parameters, and the two replacement lists are identical, otherwise the program is ill-formed.
Same wording in both Standards:
Two replacement lists are identical if and only if the preprocessing tokens in both have the same number, ordering, spelling, and white-space separation, where all white-space separations are considered identical.
So:
#define X(y) (y+1)
#define X(z) (z+1) // ill-formed, not identical
IMHO, using #undef is generally dangerous due to the scoping rules for preprocessor macros. I'd prefer to get a warning or error from the preprocessor and come up with a different preprocessor macro rather than have some translation unit silently accept a wrong macro definition that introduces a bug into the program. Consider:
// header1.h
#undef PORT_TO_WRITE_TO
#define PORT_TO_WRITE_TO 0x400
// header2.h
#undef PORT_TO_WRITE_TO
#define PORT_TO_WRITE_TO 0x410
and have a translation unit #include both headers. No warning, probably not the intended result.
Yes, the new definition overrides all previous. But there is corresponding warning message. With #undef you have no warnings.
But isn't the new definition overrides the old one?
Yes, it does (when your compiler allows it). However, redefining a macro results in a compiler warning, which using #undefine lets you avoid.
This may be important in programming shops with strict rules on compiler warnings - for example, by requiring all production code to be compiled with -Werror flag, which treats all warnings as errors.
#undef removes the macro, so the name is free to be defined again.
If the macro was never defined in the first place, #undef has no effect, so there's no downside. #undef … #define should be read as replacing any potential previous definition, but not to imply that it must already be defined.
Popular compilers do allow you to skip the #undef, but this is not allowed by the official standard ISO C and C++ language specifications. It is not portable to do so.

What do parentheses mean in an #if defined preprocessor operator?

(I am working on a SDK wherein I have the code of the particular SDK in reference and I am not able to trace out the flow of the program.)
What does
#if defined (AR7x00)
mean? Specifically, what is the purpose of parentheses in a such a preprocessor operator?
These three preprocessor directives:
#if defined (AR7x00)
#if defined AR7x00
#ifdef AR7x00
all mean exactly the same thing: that the following code is to be processed only if the macro AR7x00 is currently defined.
The #ifdef ... directive is simply a convenient alternative to #if defined .... There's also a #ifndef ... directive; #ifndef FOO is equivalent to #if ! defined FOO.
As for the parentheses, the syntax for the defined operator allows for either an identifier, or an identifier in parentheses, with no difference in meaning. I'm not entirely sure why the parentheses are optional; I suspect it's just historical. (The language reference in the 1978 first edition of K&R doesn't mention the defined operator. The second edition shows both forms, with and without parentheses.)
Strictly speaking, these are not grouping parentheses of the kind you see in ordinary expressions; they're specifically part of the syntax of the defined operator, which can only be used in preprocessor #if directives. In particular, this:
#if defined ((AR7x00))
is a syntax error.
I think the parantheses are more important when using multiple conditionals in the preprocessor. For instance the following conditional expression
#if defined( foo ) || defined( bar )
#error SOME PROBLEM
#endif
evaluates differently in my gnu-arm-gcc environment than this one
#if defined foo || defined bar
#error SOME PROBLEM
#endif
The first example does what I expect, while the other doesn't seem to ever match regardless of if the tokens "foo" and/or "bar" are #defined.
The gnu.org online docs state that you can use defined with or without parentheses.
From the MSDN entry for #if:
You can also group symbols and operators with parentheses.
The parentheses don't do anything other than group symbols and operators together to modify precedence.
C11, 6.10.1 Conditional inclusion, 5:
Preprocessing directives of the forms
# ifdef identifier new-line groupopt
# ifndef identifier new-line groupopt
check whether the identifier is or is not currently defined as a macro name. Their conditions are equivalent to #if defined identifier and #if !defined identifier respectively.
Here we see that parentheses are not required.

Can a C macro definition refer to other macros?

What I'm trying to figure out is if something such as this (written in C):
#define FOO 15
#define BAR 23
#define MEH (FOO / BAR)
is allowed? I would want the preprocessor to replace every instance of
MEH
with
(15 / 23)
but I'm not so sure that will work. Certainly if the preprocessor only goes through the code once then I don't think it'd work out the way I'd like.
I found several similar examples but all were really too complicated for me to understand. If someone could help me out with this simple one I'd be eternally grateful!
Short answer yes. You can nest defines and macros like that - as many levels as you want as long as it isn't recursive.
The answer is "yes", and two other people have correctly said so.
As for why the answer is yes, the gory details are in the C standard, section 6.10.3.4, "Rescanning and further replacement". The OP might not benefit from this, but others might be interested.
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and
# and ## processing has taken place, all placemarker preprocessing tokens are removed.
Then, the resulting preprocessing token sequence
is rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace.
If the name of the macro being replaced is found during this scan of
the replacement list (not including the rest of the source file's
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced. These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing token
would otherwise have been replaced.
The resulting completely macro-replaced preprocessing token sequence
is not processed as a preprocessing directive even if it resembles
one, but all pragma unary operator expressions within it are then
processed as specified in 6.10.9 below.
Yes, it's going to work.
But for your personal information, here are some simplified rules about macros that might help you (it's out of scope, but will probably help you in the future). I'll try to keep it as simple as possible.
The defines are "defined" in the order they are included/read. That means that you cannot use a define that wasn't defined previously.
Usefull pre-processor keyword: #define, #undef, #else, #elif, #ifdef, #ifndef, #if
You can use any other previously #define in your macro. They will be expanded. (like in your question)
Function macro definitions accept two special operators (# and ##)
operator # stringize the argument:
#define str(x) #x
str(test); // would translate to "test"
operator ## concatenates two arguments
#define concat(a,b) a ## b
concat(hello, world); // would translate to "helloworld"
There are some predefined macros (from the language) as well that you can use:
__LINE__, __FILE__, __cplusplus, etc
See your compiler section on that to have an extensive list since it's not "cross platform"
Pay attention to the macro expansion
You'll see that people uses a log of round brackets "()" when defining macros. The reason is that when you call a macro, it's expanded "as is"
#define mult(a, b) a * b
mult(1+2, 3+4); // will be expanded like: 1 + 2 * 3 + 4 = 11 instead of 21.
mult_fix(a, b) ((a) * (b))
Yes, and there is one more advantage of this feature. You can leave some macro undefined and set its value as a name of another macro in the compilation command.
#define STR "string"
void main() { printf("value=%s\n", VALUE); }
In the command line you can say that the macro "VALUE" takes value from another macro "STR":
$ gcc -o test_macro -DVALUE=STR main.c
$ ./test_macro
Output:
value=string
This approach works as well for MSC compiler on Windows. I find it very flexible.
I'd like to add a gotcha that tripped me up.
Function-style macros cannot do this.
Example that doesn't compile when used:
#define FOO 1
#define FSMACRO(x) FOO + x
Yes, that is supported. And used quite a lot!
One important thing to note though is to make sure you paranthesize the expression otherwise you might run into nasty issues!
#define MEH FOO/BAR
// vs
#define MEH (FOO / BAR)
// the first could be expanded in an expression like 5 * MEH to mean something
// completely different than the second

Why # is required before #include<stdio.h>?

What is the function of #?
It denotes a preprocessor directive:
One important thing you need to remember is that the C preprocessor is not part of the C compiler.
The C preprocessor uses a different syntax. All directives in the C preprocessor begin with a pound sign (#). In other words, the pound sign denotes the beginning of a preprocessor directive, and it must be the first nonspace character on the line.
# was probably chosen arbitrarily as an otherwise unused character in C syntax. # would have worked just as well, I presume.
If there wasn't a character denoting it, then there would probably be trouble differentiating between code intended for the preprocessor -- how would you tell whether if (FOO) was meant to be preprocessed or not?
Because # is the standard prefix for introducing preprocessor statements.
In early C compilers, the pre-processor was a separate program which would handle all the preprocessor statements (similar to the way early C++ "compilers" such as cfront generated C code) and generate C code for the compiler (it may still be a separate program but it may also be just a phase of the compiler nowadays).
The # symbol is just a useful character that can be recognised by the preprocessor and acted upon, such as:
#include <stdio.h>
#if 0
#endif
#pragma treat_warnings_as_errors
#define USE_BUGGY_CODE
and so on.
Preprocessor directives are lines included in the code of our programs that are not program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#). The preprocessor is executed before the actual compilation of code begins, therefore the preprocessor digests all these directives before any code is generated by the statements.
Source: http://www.cplusplus.com/doc/tutorial/preprocessor/
It's because # is an indicator that its a preprocessor statement
meaning before it compiles your code, it is going to include the file stdio.h
# is a pre-processor directive. The preprocessor handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if).
When the pre-processor encounters this, it will include the headers, expand the macros and proceeds towards compilation. It can be used for other purposes like halting compilation using the #error directive. This is called conditional compilation.
We know, without preprocessor programm do not run. And preprocessor is # or #include or #define or other. So # is required before #include .

Resources