C preprocessor concatenation outside of #define

I was wondering why we can't use token concatenation outside of defines.
This comes up when I want these at the same time:
conflict-free naming in a library (or for "generics")
debuggability; when using a define for this, the whole code gets merged into one line and the debugger will only show the line where the define was used
Some people might want an example (the actual question is below it):
lib.inc:
#ifndef NAME
#error includer should first define NAME
#endif
void NAME() { // works
}
// void NAME##Init() { // doesn't work
// }
main.c:
#define NAME conflictfree
#include "lib.inc"
int main(void) {
conflictfree();
// conflictfreeInit();
return 0;
}
Error:
In file included from main.c:2:0:
lib.h:6:10: error: stray '##' in program
void NAME##Init();
^
The rule of thumb is "concatenate only inside a #define". And if I remember correctly, the reason has to do with the preprocessor phases.
Question: Why does it not work? The phases argument sounds like it was once an implementation limitation (rather than a logical reason) that then found its way into the standard. What could be so difficult about accepting NAME##Init() if NAME() works fine?

Why it is this way is not an easy question. Maybe it's time to ask the standards committee why they were crazy enough to standardize the (now removed) gets() function as well?
Sometimes, the standard is simply brain-dead, whether we like it or not. The first C was not today's C. It was not "designed" to be today's C, but "grew up" into it. This has led to quite a few inconsistencies and design flaws along the way. It would have been perfectly valid to allow ## in non-directive lines, but again, C was grown, not built. And let's not even start on the consequences that same model brought into C++...
Anyway, we're not here to glorify the standards, so one way to get around this follows. First of all, in lib.inc...
#include <stdio.h>
#ifndef NAME
#error Includer should first define 'NAME'!
#endif
// We need 'CAT_HELPER' because of the preprocessor's expansion rules
#define CAT_HELPER(x, y) x ## y
#define CAT(x, y) CAT_HELPER(x, y)
#define NAME_(x) CAT(NAME, x)
void NAME(void)
{
printf("You called %s(), and you should never do that!\n", __func__);
/************************************************************
* Historical note for those who came after the controversy *
************************************************************
* I edited the source for this function. It's 100% safe now.
* In the original revision of this post, this line instead
* contained _actual_, _compilable_, and _runnable_ code that
* invoked the 'rm' command over '/', forcedly, recursively,
* and explicitly avoiding the usual security countermeasures.
* All of this under the effects of 'sudo'. It was a _bad_ idea,
* but hopefully I didn't actually harm anyone. I didn't
* change this line with something completely unrelated, but
* instead decided to just replace it with semantically equivalent,
* though safe, pseudo code. I never had malicious intentions.
*/
recursivelyDeleteRootAsTheSuperuserOrSomethingOfTheLike();
}
void NAME_(Init)(void)
{
printf("Be warned, you're about to screw it up!\n");
}
Then, in main.c...
#define NAME NeverRunThis
#include "lib.inc"
int main() {
NeverRunThisInit();
NeverRunThis();
return 0;
}

Section 3.8.3.3 of the ANSI C Rationale explains the reasoning behind the ## operator. One of its basic principles states:
A formal parameter (or normal operand) as an operand for ## is not expanded before pasting.
This means that you would get the following:
#define NAME foo
void NAME##init(); // yields "NAMEinit", not "fooinit"
This makes it rather useless in this context, and it explains why you have to use two layers of macros to concatenate something stored in a macro. Simply changing the operator to always expand its operands first wouldn't be an ideal solution either, because then you could no longer (in this example) concatenate with the literal token NAME if you wanted to; it would always be expanded to the macro's value first.
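To see the difference concretely, compare a direct paste with one that goes through an extra expansion layer (the macro names PASTE and XPASTE are just for illustration):
#define NAME foo
#define PASTE(a, b)  a ## b          /* pastes the tokens exactly as written                 */
#define XPASTE(a, b) PASTE(a, b)     /* expands its arguments first, then hands them to PASTE */
PASTE(NAME, Init)    /* -> NAMEInit : NAME is an operand of ##, so it is not expanded   */
XPASTE(NAME, Init)   /* -> fooInit  : the extra layer forces expansion before pasting   */
Running this through the preprocessor (for example with cc -E) shows NAMEInit for the first use and fooInit for the second.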

While much of the C language had evolved and developed before its standardization, ## was invented by the C89 committee, so they could indeed have chosen another approach. I am not a psychic, so I cannot tell why the C89 committee decided to standardize token pasting exactly the way it did, but the ANSI C Rationale (3.8.3.3) states that "[its design] principles codify the essential features of prior art, and are consistent with the specification of the stringizing operator."
But changing the standard so that X ## Y would be allowed outside a macro body would not be of much use in your case either: X and Y aren't expanded before ## is applied inside macro bodies, so even if NAME ## Init were permitted outside a macro body, the semantics of ## would have to change for it to give the intended result. Were its semantics not changed, you'd still need indirection, and the only way to get that indirection is to use ## within a macro body anyway!
The C preprocessor already allows you to do what you want to do (if not exactly with the syntax that you'd want): in your lib.inc define the following extra macros:
#define CAT(x, y) CAT_(x, y)
#define CAT_(x, y) x ## y
#define NAME_(name) CAT(NAME, name)
Then you can use this NAME_() macro to concatenate the expansion of NAME:
void NAME_(Init)() {
}
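For completeness, a sketch of the includer side, reusing the names from the original question:
main.c:
#define NAME conflictfree
#include "lib.inc"
int main(void) {
    conflictfree();       /* the question's NAME()                  */
    conflictfreeInit();   /* NAME_(Init)() pastes and expands to it */
    return 0;
}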

Related

Is it possible to determine the type of object on-line in one-pass, including macros?

I have a very simple parser that covers a small subset of the C language; it looks at a well-formed translation unit and, in one pass and online, determines the global symbols and what they are (function, struct, union, variable), provided one is not trying to trick it. However, I'm having trouble determining whether it's a struct or a function in this example:
#define CAT_(x, y) x ## y
#define CAT(x, y) CAT_(x, y)
#define F_(thing) CAT(foo, thing)
static struct F_(widget) { int i; }
F_(widget);
static struct F_(widget) a(void) { int i;
return i = 42, F_(widget).i = i, F_(widget); }
int main(void) {
a();
return 0;
}
It assumes that the parenthesis indicates a function and parses it this way,
[ID<stati>, ID<struc>, ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, LBRA<{>, RBRA<}>].
[ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, SEMI<;>].
[ID<stati>, ID<struc>, ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, ID<a>, LPAR<(>, ID<void>, RPAR<)>, LBRA<{>, RBRA<}>].
[ID<int>, ID<main>, LPAR<(>, ID<void>, RPAR<)>, LBRA<{>, RBRA<}>].
When in fact, what it thinks is a function at the top is actually a struct declaration, and the top two should be concatenated. What is the simplest way to recognise this?
Two-pass, emulating what actually happens in macro replacement; I would have to build a subset of the C pre-processor;
like the C lexer hack, except with macros;
backtrack with the semicolon at the end; that seems hard;
somehow recognise the difference at the beginning (probably requiring me to add struct to my symbol table).
As mentioned in the comments, if you want to be able to handle preprocessor macros, you will need to implement (or borrow) a preprocessor.
Writing a preprocessor mostly involves coming to terms with the formal description in the C standard, but it is not otherwise particularly challenging. It can be done online with the resulting token stream fed into a parser, so it doesn't really require a second pass.
(This depends on how you define a "pass" I suppose, but in my usage a one-pass parser reads the input only once without creating and rereading a temporary file. And that is definitely doable.)
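Nothing here is meant as a definitive implementation, just a sketch of what "online" means in that pipeline: a token filter sits between the lexer and the parser and splices macro replacements into the stream as tokens are pulled. All names (Macro, expand_next, raw_next_token, the WIDGET macro) are made up, and a real version would also have to collect arguments for function-like macros, apply ##, and rescan replacements:
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *replacement[4];
    int len;
} Macro;

/* hypothetical macro table built while scanning #define lines */
static const Macro table[] = {
    { "WIDGET", { "struct", "foowidget" }, 2 },
};

/* stand-in for the real lexer: a canned token stream */
static const char *input[] = { "static", "WIDGET", "{", "int", "i", ";", "}", ";", NULL };
static int pos = 0;

static const char *pending[16];
static int npending = 0;

static const char *raw_next_token(void)
{
    return input[pos] ? input[pos++] : NULL;
}

static const char *expand_next(void)
{
    if (npending > 0)
        return pending[--npending];

    const char *tok = raw_next_token();
    if (tok == NULL)
        return NULL;

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(tok, table[i].name) == 0) {
            /* push the replacement in reverse so it pops in order;
             * no rescanning of the replacement is done in this sketch */
            for (int j = table[i].len - 1; j >= 0; j--)
                pending[npending++] = table[i].replacement[j];
            return expand_next();
        }
    }
    return tok;
}

int main(void)
{
    const char *tok;
    while ((tok = expand_next()) != NULL)
        printf("%s ", tok);    /* prints: static struct foowidget { int i ; } ; */
    putchar('\n');
    return 0;
}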

Resource Acquisition Is Initialization in C lang

The question is: could you please help me better understand the RAII macro in the C language (not C++), using only the resources I supply at the bottom of this question? I am trying to analyse it in my mind so as to understand what it says and how it makes sense (it does not make sense to me yet). The syntax is hard. The focus of the question is that I have trouble reading and understanding the unusual syntax and its implementation in C.
For instance, I can easily read, understand and analyse (it makes sense to me) the following swap macro:
#define myswap(type,A,B) {type _z; _z = (A); (A) = (B); (B) = _z;}
(the following passage is lifted from the book: Understanding C pointers)
In C language the GNU compiler provides a nonstandard extension to
support RAII.
The GNU extension uses a macro called RAII_VARIABLE. It declares a
variable and associates with the variable:
A type
A function to execute when the variable is created
A function to execute when the variable goes out of scope
The macro is shown below:
#define RAII_VARIABLE(vartype,varname,initval,dtor) \
void _dtor_ ## varname (vartype * v) { dtor(*v); } \
vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)
Example:
void raiiExample() {
RAII_VARIABLE(char*, name, (char*)malloc(32), free);
strcpy(name,"RAII Example");
printf("%s\n",name);
}
int main(void){
raiiExample();
}
When this function is executed, the string “RAII_Example” will be displayed. Similar results can be achieved without using the GNU extension.
Of course you can achieve anything without using RAII. The RAII use case is to not have to think about releasing resources explicitly. A pattern like:
void f() {
char *v = malloc(...);
// use v
free(v);
}
requires you to take care of releasing the memory; if you don't, you get a memory leak. As it is not always easy to release resources correctly, RAII provides a way to automate the freeing:
void f() {
RAII_VARIABLE(char*, v, malloc(...), free);
// use v
}
What is interesting is that the resource will be released whatever the path of execution turns out to be. So even if your code is a kind of spaghetti code, full of complex conditions and tests, etc., RAII frees your mind from worrying about releasing resources...
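As a small sketch of that point (GCC only, since RAII_VARIABLE relies on nested functions; the process() function and its checks are made up for illustration), every return path below releases the buffer:
#include <stdlib.h>

#define RAII_VARIABLE(vartype,varname,initval,dtor) \
    void _dtor_ ## varname (vartype * v) { dtor(*v); } \
    vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)

int process(size_t n)                             /* hypothetical example function */
{
    RAII_VARIABLE(char*, buf, malloc(n), free);   /* free(NULL) is harmless */
    if (buf == NULL) return -1;   /* allocation failed: cleanup still runs */
    if (n < 16)      return -2;   /* early exit: buf is freed here as well */
    /* ... use buf ... */
    return 0;                     /* normal exit: buf is freed here too */
}

int main(void)
{
    return process(32) == 0 ? 0 : 1;
}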
Ok, let's look at the parts of the macro line by line
#define RAII_VARIABLE(vartype,varname,initval,dtor) \
This first line is, of course, the macro name plus its argument list. Nothing unexpected here: we seem to pass a type, a token name, some expression to initialize a variable, and some destructor that will hopefully get called in the end. So far, so easy.
void _dtor_ ## varname (vartype * v) { dtor(*v); } \
The second line declares a function. It takes the provided token varname and prepends it with the prefix _dtor_ (the ## operator instructs the preprocessor to fuse the two tokens together into a single token). This function takes a pointer to vartype as an argument, and calls the provided destructor with that argument.
This syntax may be unexpected here (like the use of the ## operator, or the fact that it relies on the ability to declare nested functions), but it's no real magic yet. The magic appears on the third line:
vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)
Here the variable is declared; without the __attribute__() this looks pretty straightforward: vartype varname = (initval). The magic is the __attribute__((cleanup(_dtor_ ## varname))) part. It instructs the compiler to ensure that the provided function is called when the variable falls out of scope.
The __attribute__() syntax is a language extension provided by the compiler, so you are deep in implementation-defined territory here. You cannot rely on other compilers providing the same __attribute__((cleanup())). Many may provide it, but none has to. Some older compilers may not even know the __attribute__() syntax at all, in which case the standard procedure is to #define __attribute__(x) as empty, stripping all __attribute__() annotations from the code. You don't want that to happen to RAII variables. So, if you rely on an __attribute__(), know that you've lost the ability to compile with just any standard-conforming compiler.
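If you go down that road anyway, one defensive sketch (not the only way; the CLEANUP name is made up) is to make the build fail loudly instead of silently losing the cleanup:
/* Fail at compile time rather than silently leak at run time. */
#if defined(__GNUC__) || defined(__clang__)
#define CLEANUP(fn) __attribute__((cleanup(fn)))
#else
#error "this code requires __attribute__((cleanup)) support"
#endif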
The syntax is a little bit tricky, because __attribute__((cleanup)) expects to be passed a function that takes a pointer to the variable. From the GCC documentation (emphasis mine):
The function must take one parameter, a pointer to a type compatible
with the variable. The return value of the function (if any) is
ignored.
Consider the following incorrect example:
char *name __attribute__((cleanup(free))) = malloc(32);
It would be much simpler to implement it like that; however, in this case the free function implicitly receives a pointer to name, whose type is char **. You need some way to force passing the proper object, which is the whole idea of the RAII_VARIABLE function-like macro.
A simplified and non-generic incarnation of RAII_VARIABLE would be to define a function, say raii_free:
#include <stdlib.h>
void raii_free(char **var) { free(*var); }
int main(void)
{
char *name __attribute__((cleanup(raii_free))) = malloc(32);
return 0;
}

"Overloading" a Macro With a `#define` Constant

I'm trying to do something like the following:
//Bad; can't redefine macros; uses later definition.
#define foo )
#define foo(arg) ,arg)
That is, I want foo (note: not a macro function) to map to one thing, and I want the macro function foo(arg) to map to something else. So foo needs to be some #define constant (catching both cases) that maps onto . . . something.
I haven't been able to figure out a way (and since this is both a macro and a constant, the many previous questions do not apply). How can I do this?
Evil compiler-/platform-specific options are great too. Tagging this c for C-macros, although I'm using C++14.
Sidenote (by request): this could be used for e.g. making your own debug overloads for new that would work with placement new as well:
#define new new(__FILE__,__LINE__ foo
//...
void const* p1 = new int();
void const* p2 = new (ptr) int();
You seem to be looking for a preprocessor that has different namespaces for function-like (with arguments) and object-like (plain #define) macros. I don't know of any that has that, and I assume it would be fundamentally confusing if there were one.
I understand you want something like this (note it is not actually trying to provide a solution):
// NOTE: Example might work in some very trivial use cases
#ifdef DEBUG
#define new(x) new(x);lognew(__FILE__,__LINE__)
#define pnew(p,x) new (p) (x); lognew(__FILE__,__LINE__)
#else
#define new(x) new(x)
#define pnew(p,x) new(p) (x)
#endif
And pnew and new should use the same keyword. No, I don't think that is possible (and I'm glad it isn't ;) ).
Functions with varying signatures under one name are only allowed in C++, where this is called "function overloading".

Function-like C macro without parentheses

I have encountered the following debug macro in an embedded device codebase:
extern void DebugPrint(uint8_t *s);
#define DEBUG_MSG(x) do { PRINT_CURRENT_TIME; \
DebugPrint x ; } while(0)
Since there are no parentheses around x in the macro body (at the DebugPrint x part), all calls to this macro (all over the codebase) add another set of parentheses around strings:
DEBUG_MSG(("some debug text"));
Is there any reason to do this? Does it simplify optimizing away these calls in release builds, or something like that? Or is it just plain nonsense?
I thought perhaps there would be additional overloads of DebugPrint with more arguments, but there are none.
Here's a theory:
The preprocessor parses the arguments of a macro expansion in a way that mimics the compiler's expression parsing. In particular it parses terms in parentheses as a single argument.
So the DEBUG_MSG author's intention might have been to enforce the use of parentheses.
This would make sense if the DebugPrint function were actually a printf-style variadic function. You could then call the macro with a single string literal or with a variable number of arguments:
DEBUG_MSG(("reached this point in code"));
DEBUG_MSG(("value of x = %i", x));
But this is pure speculation. Can't you just ask the author?
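To illustrate the theory, here is how the double parentheses would pair with a variadic function; note that this DebugPrint is hypothetical, since the one in the question only takes a single string:
#include <stdarg.h>
#include <stdio.h>

#define PRINT_CURRENT_TIME ((void)0)            /* stand-in for the real macro */

static void DebugPrint(const char *fmt, ...)    /* hypothetical variadic version */
{
    va_list ap;
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
    putchar('\n');
}

#define DEBUG_MSG(x) do { PRINT_CURRENT_TIME; \
                          DebugPrint x ; } while (0)

int main(void)
{
    int x = 42;
    DEBUG_MSG(("reached this point in code"));  /* -> DebugPrint ("reached this point in code"); */
    DEBUG_MSG(("value of x = %i", x));          /* -> DebugPrint ("value of x = %i", x);         */
    return 0;
}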
I believe the answer is no. Macros are replaced by the preprocessor before compilation, so they have nothing to do with execution speed. This:
#define MACRO(x) do_something(x)
MACRO("test");
Is no different than this
#define MACRO(x) do_something x
MACRO(("test"));
since the preprocessor will replace them both with the same output:
do_something("test");
which will then compile to produce the same object code.

#undef-ing in Practice?

I'm wondering about the practical use of #undef in C. I'm working through K&R, and am up to the preprocessor. Most of this was material I (more or less) understood, but something on page 90 (second edition) stuck out at me:
Names may be undefined with #undef,
usually to ensure that a routine is
really a function, not a macro:
#undef getchar
int getchar(void) { ... }
Is this a common practice to defend against someone #define-ing a macro with the same name as your function? Or is this really more of a sample that wouldn't occur in reality? (EG, no one in his right, wrong nor insane mind should be rewriting getchar(), so it shouldn't come up.) With your own function names, do you feel the need to do this? Does that change if you're developing a library for others to use?
What it does
If you read Plauger's The Standard C Library (1992), you will see that the <stdio.h> header is allowed to provide getchar() and getc() as function-like macros (with special permission for getc() to evaluate its file pointer argument more than once!). However, even if it provides macros, the implementation is also obliged to provide actual functions that do the same job, primarily so that you can take a pointer to the getchar() or getc() function and pass it to other functions.
That is, by doing:
#include <stdio.h>
#undef getchar
extern int some_function(int (*)(void));
int core_function(void)
{
int c = some_function(getchar);
return(c);
}
As written, the core_function() is pretty meaningless, but it illustrates the point. You can do the same thing with the isxxxx() macros in <ctype.h> too, for example.
Normally, you don't want to do that - you don't normally want to remove the macro definition. But, when you need the real function, you can get hold of it. People who provide libraries can emulate the functionality of the standard C library to good effect.
Seldom needed
Also note that one of the reasons you seldom need to use the explicit #undef is because you can invoke the function instead of the macro by writing:
int c = (getchar)();
Because the token after getchar is not an (, it is not an invocation of the function-like macro, so it must be a reference to the function. Similarly, the first example above would compile and run correctly even without the #undef.
If you implement your own function with a macro override, you can use this to good effect, though it might be slightly confusing unless explained.
/* function.h */
…
extern int function(int c);
extern int other_function(int c, FILE *fp);
#define function(c) other_function(c, stdout)
…
/* function.c */
…
/* Provide function despite macro override */
int (function)(int c)
{
return function(c);   /* expands to other_function(c, stdout) */
}
The function definition line doesn't invoke the macro because the token after function is not (. The return line does invoke the macro.
Macros are often used to generate bulk code. It's usually a fairly localized usage, and it's safe to #undef any helper macros at the end of the particular header in order to avoid name clashes; that way only the actual generated code gets imported elsewhere, and the macros used to generate it don't.
Edit: As an example, I've used this to generate structs. The following is an excerpt from an actual project:
#define MYLIB_MAKE_PC_PROVIDER(name) \
struct PcApi##name { \
    /* many members … */ \
};
MYLIB_MAKE_PC_PROVIDER(SA)
MYLIB_MAKE_PC_PROVIDER(SSA)
MYLIB_MAKE_PC_PROVIDER(AF)
#undef MYLIB_MAKE_PC_PROVIDER
Because preprocessor #defines all live in one global namespace, it's easy for name conflicts to result, especially when using third-party libraries. For example, if you wanted to create a function named OpenFile, it might not compile correctly, because the header file <windows.h> defines the token OpenFile to map to either OpenFileA or OpenFileW (depending on whether UNICODE is defined). The correct solution is to #undef OpenFile before defining your function.
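A minimal sketch of that workaround, with a hypothetical header standing in for the offending one:
thirdparty.h (hypothetical):
#define OpenFile OpenFileImpl2
ourcode.c:
#include "thirdparty.h"
#undef OpenFile               /* reclaim the name for our own function */
int OpenFile(const char *path)
{
    (void)path;               /* ... open the file ... */
    return 0;
}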
Although I think Jonathan Leffler gave you the right answer, here is a very rare case where I use an #undef. Normally a macro should be reusable inside many functions; that's why you define it at the top of a file or in a header file. But sometimes you have some repetitive code inside a function that can be shortened with a macro.
int foo(int x, int y)
{
#define OUT_OF_RANGE(v, vlower, vupper) \
if (v < vlower) {v = vlower; goto EXIT;} \
else if (v > vupper) {v = vupper; goto EXIT;}
/* do some calcs */
x += (x + y)/2;
OUT_OF_RANGE(x, 0, 100);
y += (x - y)/2;
OUT_OF_RANGE(y, -10, 50);
/* do some more calcs and range checks*/
...
EXIT:
/* undefine OUT_OF_RANGE, because we don't need it anymore */
#undef OUT_OF_RANGE
...
return x;
}
To show the reader that this macro is only useful inside of the function, it is undefined at the end. I don't want to encourage anyone to use such hackish macros. But if you have to, #undef them at the end.
I only use it when a macro in an #included file is interfering with one of my functions (e.g., it has the same name). Then I #undef the macro so I can use my own function.
Is this a common practice to defend against someone #define-ing a macro with the same name as your function? Or is this really more of a sample that wouldn't occur in reality? (EG, no one in his right, wrong nor insane mind should be rewriting getchar(), so it shouldn't come up.)
A little of both. Good code will not require use of #undef, but there's lots of bad code out there you have to work with. #undef can prove invaluable when somebody pulls a trick like #define bool int.
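A sketch of recovering from exactly that trick (the offending header is hypothetical):
legacy.h (hypothetical):
#define bool int
fix.c:
#include "legacy.h"
#undef bool            /* throw away the fake bool ...  */
#include <stdbool.h>   /* ... and get the real one back */
static bool flag = true;
int main(void)
{
    return flag ? 0 : 1;
}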
In addition to fixing problems with macros polluting the global namespace, another use of #undef is the situation where a macro might be required to have different behavior in different places. This is not a really common scenario, but a couple that come to mind are:
the assert macro can have its definition changed in the middle of a compilation unit, for the case where you might want to perform debugging on some portion of your code but not others. In addition to assert itself needing to be #undef'ed to do this, the NDEBUG macro needs to be redefined to reconfigure the desired behavior of assert (a sketch of this appears after the globals example below)
I've seen a technique used to ensure that globals are defined exactly once: a macro declares the variables as extern, but it is redefined to nothing in the single place where the header's declarations are used to define the variables.
Something like (I'm not saying this is necessarily a good technique, just one I've seen in the wild):
/* globals.h */
/* ------------------------------------------------------ */
#undef GLOBAL
#ifdef DEFINE_GLOBALS
#define GLOBAL
#else
#define GLOBAL extern
#endif
GLOBAL int g_x;
GLOBAL char* g_name;
/* ------------------------------------------------------ */
/* globals.c */
/* ------------------------------------------------------ */
#include "some_master_header_that_happens_to_include_globals.h"
/* define the globals here (and only here) using globals.h */
#define DEFINE_GLOBALS
#include "globals.h"
/* ------------------------------------------------------ */
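And for the first scenario above, a sketch of switching assert on and off within one file (the function names are made up); the standard allows <assert.h> to be included more than once, and each inclusion redefines assert according to the current state of NDEBUG:
#include <assert.h>            /* assertions active here */
static void checked(int x)  { assert(x > 0); }

#define NDEBUG
#include <assert.h>            /* re-inclusion: assert() now expands to nothing */
static void hot_path(int x) { (void)x; assert(x > 0); /* compiled out */ }

#undef NDEBUG
#include <assert.h>            /* assertions active again below this point */

int main(void)
{
    checked(1);
    hot_path(0);               /* does not abort: its assert was disabled */
    return 0;
}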
If a macro can be #defined, there must be a facility to #undef it.
A memory tracker I use defines its own new/delete macros to track file/line information; this macro breaks the SC++L (Standard C++ Library) headers:
#pragma push_macro( "new" )
#undef new
#include <vector>
#pragma pop_macro( "new" )
Regarding your more specific question: namespaces are often emulated in C by prefixing library functions with an identifier.
Blindly #undefing macros is going to add confusion, reduce maintainability, and may break things that rely on the original behavior. If you are forced to, at least use push_macro/pop_macro to preserve the original behavior everywhere else.

Resources