I'm thinking about the best way to write a C #define processor that can handle macros. Unfortunately, nothing intelligent comes to mind.
It should behave exactly like one in C, so it handles expressions like this:
#define max(a, b) (a > b ? a : b)
printf("%d\n", max(a, b));
Or this:
#define F 10
#define max(a, b) (a > b ? a : b)
printf("%d\n", max(a, F));
I know about the install and lookup functions from K&R2. What else do I need for
replacing text inside parentheses?
Does anyone have any advice or some pseudo-code maybe?
I know it's a complex task, but still, what would be the best possible way to do it?
Macro processors are very interesting but can become a difficult beast to tame (think about recursive expansions, for example).
You can look at the implementation of already existing macro processors like M4 (http://www.scs.stanford.edu/~reddy/links/gnu/m4.pdf).
In very general terms you will need:
a parser that first extracts the macro definitions from your files (deleting them from the file, of course)
another parser that identifies where macros need to be expanded and performs the expansion (e.g. you will want to skip strings and comments!)
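The second pass described above could be sketched roughly as follows for object-like macros only. This is a minimal sketch, not a full preprocessor: the names table, append and expand are made up for illustration, install and lookup only borrow their names from K&R2, and the code does not rescan expansions, handle comments or character literals, or support function-like macros.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_MACROS 64

struct macro { const char *name; const char *body; };
static struct macro table[MAX_MACROS];
static size_t macro_count;

/* Record a definition; returns 0 on success, -1 if the table is full. */
static int install(const char *name, const char *body)
{
    if (macro_count == MAX_MACROS) return -1;
    table[macro_count].name = name;
    table[macro_count].body = body;
    macro_count++;
    return 0;
}

/* Find the body for an identifier of length len, or NULL if it is not a macro. */
static const char *lookup(const char *id, size_t len)
{
    for (size_t i = 0; i < macro_count; i++)
        if (strlen(table[i].name) == len && strncmp(table[i].name, id, len) == 0)
            return table[i].body;
    return NULL;
}

/* Append n bytes of src to out (capacity cap), silently truncating if full. */
static void append(char *out, size_t cap, size_t *pos, const char *src, size_t n)
{
    for (size_t i = 0; i < n && *pos + 1 < cap; i++)
        out[(*pos)++] = src[i];
}

/* Copy in to out, replacing known identifiers; string literals are copied verbatim. */
static void expand(const char *in, char *out, size_t cap)
{
    size_t pos = 0;
    if (cap == 0) return;
    while (*in) {
        const char *start = in;
        if (*in == '"') {                            /* string literal */
            in++;
            while (*in && *in != '"') {
                if (*in == '\\' && in[1]) in++;      /* keep escaped characters */
                in++;
            }
            if (*in == '"') in++;
            append(out, cap, &pos, start, (size_t)(in - start));
        } else if (isdigit((unsigned char)*in)) {    /* numbers such as 1F are not identifiers */
            while (isalnum((unsigned char)*in) || *in == '_' || *in == '.') in++;
            append(out, cap, &pos, start, (size_t)(in - start));
        } else if (isalpha((unsigned char)*in) || *in == '_') {
            while (isalnum((unsigned char)*in) || *in == '_') in++;
            size_t len = (size_t)(in - start);
            const char *body = lookup(start, len);
            if (body) append(out, cap, &pos, body, strlen(body));
            else      append(out, cap, &pos, start, len);
        } else {
            append(out, cap, &pos, in++, 1);
        }
    }
    out[pos] = '\0';
}
```

After install("F", "10"), expanding the text printf("F=%d\n", F); leaves the F inside the string literal alone and replaces only the trailing F.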
I think it's a very interesting exercise. The proper data structure to handle all this is not trivial.
This is a pattern matching problem. You should take a look at regular expressions to start with; then, when you've grasped the theory behind them, you could move on to reading about lexers.
A regular expression describes a pattern that strings are matched against.
Some regexp (short for regular expression) software/libraries:
- Boost.Regex
- GNU C library regex (POSIX regcomp/regexec)
- PCRE
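As an illustration of the regular expression approach, here is a minimal sketch that uses the POSIX regcomp/regexec interface from <regex.h> to recognise a #define line and pull out the macro name. The function name match_define and the pattern are assumptions made for this example, and a POSIX system is assumed.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* If line is a "#define NAME ..." directive, copy NAME into name and
 * return 1; otherwise return 0. */
static int match_define(const char *line, char *name, size_t cap)
{
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    /* Optional blanks, '#', optional blanks, "define", blanks, identifier. */
    if (regcomp(&re, "^[ \t]*#[ \t]*define[ \t]+([A-Za-z_][A-Za-z0-9_]*)",
                REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < cap) {
            memcpy(name, line + m[1].rm_so, len);
            name[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

A regular expression is enough to find the directive and its name, but the parameter list and the body still need a real scanner, which is where a lexer comes in.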
And a lexer is a piece of software that does something with the matched text, for example, replacing that piece of text with some other piece of text, basically what you seem to need.
Some known lexers:
- flex
- Boost.Wave
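For the "text inside parentheses" part of the question, a hand-written scanner is often enough. The following is a minimal sketch, assuming a hypothetical helper split_args and fixed-size buffers, that splits the arguments of a macro call at top-level commas while respecting nested parentheses. It does not handle commas or parentheses inside string or character literals.

```c
#include <stddef.h>

#define MAX_ARGS 8
#define MAX_ARG_LEN 64

/*
 * s points just after the '(' of a macro call. Split the text up to the
 * matching ')' at top-level commas. Returns the number of arguments, or -1
 * on unbalanced input or overflow. Leading blanks of each argument are dropped.
 */
static int split_args(const char *s, char args[MAX_ARGS][MAX_ARG_LEN])
{
    int depth = 0, count = 0;
    size_t len = 0;

    for (; *s; s++) {
        if (*s == ')' && depth == 0) {          /* end of the call */
            args[count][len] = '\0';
            return count + 1;
        }
        if (*s == ',' && depth == 0) {          /* top-level separator */
            args[count][len] = '\0';
            if (++count == MAX_ARGS) return -1;
            len = 0;
            continue;
        }
        if (*s == '(') depth++;
        if (*s == ')') depth--;
        if (len == 0 && *s == ' ') continue;    /* trim leading blanks */
        if (len + 1 >= MAX_ARG_LEN) return -1;
        args[count][len++] = *s;
    }
    return -1;                                  /* no closing parenthesis */
}
```

Given the text a, f(b, c)) this yields the two arguments a and f(b, c), which can then be substituted for the parameter names in the macro body.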
2 suggestions:
use boost wave (http://www.boost.org/doc/libs/1_40_0/libs/wave/index.html)
use the preprocessor that comes with your compiler
i.e. "don't try this at home".
I'm currently using the __COUNTER__ macro in my C library code to generate unique integer identifiers. It works nicely, but I see two issues:
It's not part of any C or C++ standard.
Independent code that also uses __COUNTER__ might get confused.
I thus wish to implement an equivalent to __COUNTER__ myself.
Alternatives that I'm aware of, but do not want to use:
__LINE__ (because multiple macros per line wouldn't get unique ids)
BOOST_PP_COUNTER (because I don't want a boost dependency)
BOOST_PP_COUNTER proves that this can be done, even though other answers claim it is impossible.
In essence, I'm looking for a header file "mycounter.h", such that
#include "mycounter.h"
__MYCOUNTER__
__MYCOUNTER__ __MYCOUNTER__
__MYCOUNTER__
will be preprocessed by gcc -E to
(...)
0
1 2
3
without using the built-in __COUNTER__.
Note: Earlier, this question was marked as a duplicate of this, which deals with using __COUNTER__ rather than avoiding it.
You can't implement __COUNTER__ directly. The preprocessor is purely functional - no state changes. A hidden counter is inherently impossible in such a system. (BOOST_PP_COUNTER does not prove what you want can be done - it relies on #include and is therefore one-per-line only - may as well use __LINE__. That said, the implementation is brilliant, you should read it anyway.)
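The "one-per-line" limitation of __LINE__ mentioned above can be seen in a small sketch. The enumerator names are made up for illustration.

```c
/* Each enumerator captures the number of the line it was written on. */
enum { ID_A = __LINE__ };
enum { ID_B = __LINE__ };

/* Two uses on the same line cannot be told apart. */
enum { ID_C = __LINE__, ID_D = __LINE__ };
```

ID_A and ID_B differ because they sit on different lines, while ID_C and ID_D are equal, which is exactly why __LINE__ cannot stand in for a real counter.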
What you can do is refactor your metaprogram so that the counter could be applied to the input data by a pure function. e.g. using good ol' Order:
#include <order/interpreter.h>
#define ORDER_PP_DEF_8map_count \
ORDER_PP_FN(8fn(8L, 8rec_mc(8L, 8nil, 0)))
#define ORDER_PP_DEF_8rec_mc \
ORDER_PP_FN(8fn(8L, 8R, 8C, \
8if(8is_nil(8L), \
8R, \
8let((8H, 8seq_head(8L)) \
(8T, 8seq_tail(8L)) \
(8D, 8plus(8C, 1)), \
8if(8is_seq(8H), \
8rec_mc(8T, 8seq_append(8R, 8seq_take(1, 8L)), 8C), \
8rec_mc(8T, 8seq_append(8R, 8seq(8C)), 8D) )))))
ORDER_PP (
8map_count(8seq( 8seq(8(A)), 8true, 8seq(8(C)), 8true, 8true )) //((A))(0)((C))(1)(2)
)
(recurses down the list, leaving sublist elements where they are and replacing non-list elements, represented here by 8true, with an incrementing counter variable)
I assume you don't actually want to simply drop __COUNTER__ values at the program toplevel, so if you can place the code into which you need to weave __COUNTER__ values inside a wrapper macro that splits it into some kind of sequence or list, you can then feed the list to a pure function similar to the example.
Of course a metaprogramming library capable of expressing such code is going to be significantly less portable and maintainable than __COUNTER__ anyway. __COUNTER__ is supported by Intel, GCC, Clang and MSVC. (not everyone, e.g. pcc doesn't have it, but does anyone even use that?) Arguably if you demonstrate the feature in use in real code, it makes a stronger case to the standardisation committee that __COUNTER__ should become part of the next C standard.
You are confusing two different things:
1 - the preprocessor, which handles #define, #include and the like. It works only at the text level (meaning character sequences) and has very few computing capabilities. It is so limited that it cannot implement __COUNTER__. The preprocessor's work consists only of macro expansion and file inclusion. The crucial point is that it occurs before compilation even starts.
2 - the C++ language, and in particular the template (meta)programming language, which can be used to compute things during the compilation phase. It is indeed Turing complete but, as I already said, compilation starts after preprocessing.
So what you are asking is not doable in standard C or C++. To solve this problem, Boost implements its own preprocessor, which is not standard compliant and has much more computing capabilities. In particular, it is possible to build an analogue of __COUNTER__ with it.
This small header of mine contains my own implementation of a C preprocessor counter (it uses a slightly different syntax).
I would like to be able to write preprocessor macros using a more fully fledged language. Such a language would ideally include the following features:
boolean and natural arithmetic and comparisons
branching based on comparisons
list representation
recursion
variable binding
functions as first class values and higher order functions
partial function application and currying
functional primitives, such as map and fold
useful functions/structures for common code generation tasks
Is this possible to implement within the C preprocessor?
Incredibly, the answer is yes! The Order header-only library provides a set of macros that implement a functional language inside the C preprocessor. It includes all of the specified features and more. You can use it as long as your C preprocessor is nearly completely C99 compliant. The GNU CPP (as used in GCC and G++) is compatible, as is the Boost Wave preprocessor. Order has been around since 2004. Although it is no longer maintained, it is very full featured, if not fully documented.
Here is a simple example use of Order:
#define AVERAGE(...) ((ORDER_PP( \
8seq_for_each_with_delimiter( \
8put, \
8emit(8quote(+)), \
8tuple_to_seq(8quote((__VA_ARGS__)))))) / \
ORDER_PP(8to_lit(8tuple_size(8quote((__VA_ARGS__))))))
The macro AVERAGE expands to an expression expressing the mean of the provided arguments. AVERAGE(a, b, c) (for example) expands to ((a + b + c) / 3). This is a very simple example that does not use all of Order's features.
Another simple example, showing use of pre-compilation arithmetic (using an arbitrary precision natural number representation), functions as first class values (see the use of 8plus) and variable binding, is a macro for computing (integer arithmetic) averages in the preprocessor:
#define AVERAGE_LITERAL(...) ORDER_PP(\
8let((8A, 8quote((__VA_ARGS__))), \
8to_lit(8quotient( \
8seq_fold(8plus, 0, 8tuple_to_seq(8A)), \
8tuple_size(8A)))))
AVERAGE_LITERAL(5, 6, 8, 9) expands to 7.
I have only touched upon a few of the features. More practical examples are provided in the accompanying documentation and tutorial, including those showing how Order can help remove practically all tedious code repetition.
Order is very powerful, and it is still very relevant to C++ programmers - templates and inlining can only solve some problems. Order solves most of the others. The only inherent limitations I can find are the inability to manipulate string literals or manipulate tokens in any way other than replacing or concatenating, as this is imposed by the C preprocessor.
I have some experience in programming in C but I would not dare to call myself proficient.
Recently, I encountered the following macro:
#define CONST(x) (x)
I find it typically used in expressions like for instance:
double x, y;
x = CONST(2.0)*y;
Completely baffled by the point of this macro, I extensively researched the advantages/disadvantages and properties of macros but still I can not figure out what the use of this particular macro would be. Am I missing something?
As presented in the question, you are right that the macro does nothing.
This looks like some artificial structure imposed by whoever wrote that code, maybe to make it abundantly clear where the constants are, and be able to search for them? I could see the advantage in having searchable constants, but this is not the best way to achieve that goal.
It's also possible that this was part of some other macro scheme that either never got implemented or was only partially removed.
Some (old) C compilers do not support the const keyword, and this macro is most probably a remnant of a more elaborate sequence of macros that handled different compilers. Used as in x = CONST(2.0)*y;, though, it makes no sense.
You can check this section from the Autoconf documentation for more details.
EDIT: Another purpose of this macro might be custom preprocessing (for extracting and/or replacing certain constants for example), like Qt Framework's Meta Object Compiler does.
There is absolutely no benefit of that macro and whoever wrote it must be confused. The code is completely equivalent to x = 2.0*y;.
Well, this kind of macro could actually be useful when there is a need to work around macro expansion.
A typical example of such a need is the stringification macro. Refer to the following question for an example: C Preprocessor, Stringify the result of a macro
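A minimal sketch of that stringification idiom, with the macro names STR, XSTR and VERSION chosen only for illustration:

```c
#define STR(x)  #x        /* stringifies the argument exactly as written  */
#define XSTR(x) STR(x)    /* expands the argument first, then stringifies */

#define VERSION 42

static const char *raw      = STR(VERSION);   /* "VERSION" */
static const char *expanded = XSTR(VERSION);  /* "42"      */
```

The extra pass-through macro forces VERSION to be expanded before the # operator sees it.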
Now, in your specific case, I don't see the benefit apart from extreme documentation or code parsing purposes.
Another use could be to reserve those values as future function invocations, something like this:
/* #define CONST(x) (x) */
#define CONST(x) some_function(x)
// ...
double x, y;
x = CONST(2.0)*y; // x = some_function(2.0)*y;
Another good thing about this macro would be something like this
result=CONST(number+number)*2;
or something related to comparisons
result=CONST(number>0)*2;
If there is some problem with this macro, it is probably the name. This "CONST" thing isn't related to constants but to something else. It would be nice to look through the rest of the code to find out why the author called it CONST.
This macro does have the effect of wrapping parentheses around x during the macro expansion.
I'm guessing someone is trying to allow for something along the lines of
CONST(3+2)*y
which, without the parens, would become
3+2*y
but with the parens becomes
(3+2)*y
I seem to recall that we had the need for something like this in a previous development lifetime.
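The difference can be checked with a small sketch. BARE is a hypothetical variant without the parentheses, added here only for comparison.

```c
#define CONST(x) (x)
#define BARE(x)  x      /* hypothetical variant without the parentheses */

static int with_parens(int y)    { return CONST(3+2)*y; }  /* (3+2)*y */
static int without_parens(int y) { return BARE(3+2)*y; }   /* 3+2*y   */
```

With y equal to 4, the first function returns 20 and the second returns 11.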
Ignoring that there are sometimes better non-macro ways to do this (I have good reasons, sadly), I need to write a big bunch of generic code using macros. Essentially a macro library that will generate a large number of functions for some pre-specified types.
To avoid breaking a large number of pre-existing unit tests, one of the things the library must do is, for every type, generate the name of that type in all caps for printing. E.g. a type "flag" must be printed as "FLAG".
I could just manually write out constants for each type, e.g.
#define flag_ALLCAPSNAME FLAG
but this is not ideal. I'd like to be able to do this programmatically.
At present, I've hacked this together:
char capname_buf[BUFSIZ];
#define __MACRO_TO_UPPERCASE(arg) strcpy(capname_buf, arg); \
for(char *c=capname_buf;*c;c++)*c = (*c >= 'a' && *c <= 'z')? *c - 'a' + 'A': *c;
__MACRO_TO_UPPERCASE(#flag)
which does what I want to some extent (i.e. after this bit of code, capname_buf has "FLAG" as its contents), but I would prefer a solution that would allow me to define a string literal using macros instead, avoiding the need for this silly buffer.
I can't see how to do this, but perhaps I'm missing something obvious?
I have a variadic foreach loop macro written (like this one), but I can't mutate the contents of the string literal produced by #flag, and in any case, my loop macro would need a list of character pointers to iterate over (i.e. it iterates over lists, not over indices or the like).
Thoughts?
It is not possible in portable C99 to have a macro which converts a constant string to all uppercase letters (in particular because the notion of a letter depends on the character encoding: a UTF-8 letter is not the same as an ASCII one).
However, you might consider some other solutions.
customize your editor to do that. For example, you could write some emacs code which would update each C source file as you require.
use some preprocessor on your C source code (perhaps a simple C code generator script which would emit a bunch of #define in some #include-d file).
use GCC extensions to have perhaps
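#include <ctype.h> /* toupper */
/* The extra level lets __COUNTER__ expand to a number before ## pastes it. */
#define TO_UPPERCASE(Str) TO_UPPERCASE_EXPANDED(Str,__COUNTER__)
#define TO_UPPERCASE_EXPANDED(Str,Cnt) TO_UPPERCASE_COUNTED(Str,Cnt)
/* GCC statement expression: the value of the last expression is the result. */
#define TO_UPPERCASE_COUNTED(Str,Cnt) ({ \
    static char buf_##Cnt[sizeof(Str)+4]; \
    const char *str_##Cnt = Str; \
    int ix_##Cnt = 0; \
    for (; *str_##Cnt; str_##Cnt++, ix_##Cnt++) \
        if (ix_##Cnt < (int)sizeof(buf_##Cnt)-1) \
            buf_##Cnt[ix_##Cnt] = toupper((unsigned char)*str_##Cnt); \
    buf_##Cnt; })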
customize GCC, perhaps using MELT (a domain specific language to extend GCC), to provide your __builtin_capitalize_constant to do the job (edit: MELT is now an inactive project). Or code in C++ your own GCC plugin doing that (caveat, it will work with only one given GCC version).
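The code generator option from the list above could look something like the following sketch. The function name format_allcaps_define is hypothetical; the flag_ALLCAPSNAME naming follows the question. A build step would call it once per type name and write the lines into a generated header.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build "#define <type>_ALLCAPSNAME <TYPE>" in out.
 * Returns 0 on success, or -1 if the result does not fit in cap bytes. */
static int format_allcaps_define(char *out, size_t cap, const char *type)
{
    int n = snprintf(out, cap, "#define %s_ALLCAPSNAME ", type);
    if (n < 0 || (size_t)n + strlen(type) + 1 > cap)
        return -1;
    for (const char *p = type; *p; p++)
        out[n++] = (char)toupper((unsigned char)*p);
    out[n] = '\0';
    return 0;
}
```

For the type name flag this produces the same line the question writes by hand, #define flag_ALLCAPSNAME FLAG, so the library itself stays free of runtime buffers.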
It's not possible to do this entirely using the C preprocessor. The reason is that the preprocessor reads the input as (atomic) pp-tokens, from which it composes the output. There's no construct that lets the preprocessor decompose a pp-token into individual characters (none that would help you here, anyway).
In your example, when the preprocessor reads the string literal "flag", it is basically an atomic chunk of text to the preprocessor. It has constructs to conditionally remove such chunks or glue them together into larger chunks.
The only construct that allows you, in some sense, to decompose a pp-token is via some expressions. However, these expressions can only work on arithmetic types, which is why they won't help you here.
Your approach circumvents this problem by using C language constructs, i.e. you do the conversion at runtime. The only thing the preprocessor does then is insert the C code that converts the string.
In a previous question what I thought was a good answer was voted down for the suggested use of macros
#define radian2degree(a) (a * 57.295779513082)
#define degree2radian(a) (a * 0.017453292519)
instead of inline functions. Please excuse the newbie question, but what is so evil about macros in this case?
Most of the other answers discuss why macros are evil including how your example has a common macro use flaw. Here's Stroustrup's take: http://www.research.att.com/~bs/bs_faq2.html#macro
But your question was asking what macros are still good for. There are some things where macros are better than inline functions, and that's where you're doing things that simply can't be done with inline functions, such as:
token pasting
dealing with line numbers or such (as for creating error messages in assert())
dealing with things that aren't expressions (for example, how many implementations of offsetof() use a type name to create a cast operation)
the macro to get a count of array elements (can't do it with a function, as the array name decays to a pointer too easily)
creating 'type polymorphic' function-like things in C where templates aren't available
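Two of the items above, sketched minimally. The names ARRAY_COUNT, DEFINE_GETTER, struct point and primes are invented for this example.

```c
#include <stddef.h>

/* Element count of a true array; gives a wrong answer if passed a pointer. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Token pasting: generate one accessor function per field name. */
#define DEFINE_GETTER(field) \
    static int get_##field(const struct point *p) { return p->field; }

struct point { int x; int y; };
DEFINE_GETTER(x)   /* defines get_x */
DEFINE_GETTER(y)   /* defines get_y */

static const int primes[] = { 2, 3, 5, 7, 11 };
```

Neither can be written as an inline function: a function parameter would see primes as a pointer, and a function cannot manufacture new identifiers such as get_x.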
But with a language that has inline functions, the more common uses of macros shouldn't be necessary. I'm even reluctant to use macros when I'm dealing with a C compiler that doesn't support inline functions. And I try not to use them to create type-agnostic functions if at all possible (creating several functions with a type indicator as a part of the name instead).
I've also moved to using enums for named numeric constants instead of #define.
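A short sketch of why an enum constant is friendlier than a #define. BUFFER_SIZE and local_size are names chosen for illustration.

```c
/* An enumerator is a real identifier, so it obeys scope and can be shadowed. */
enum { BUFFER_SIZE = 256 };

static int local_size(void)
{
    int BUFFER_SIZE = 16;   /* legal: hides the enumerator inside this function */
    return BUFFER_SIZE;
}

/* With "#define BUFFER_SIZE 256" the declaration above would expand to
 * "int 256 = 16;" and fail to compile. */
```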
There's a couple of strictly evil things about macros.
They're text processing, and aren't scoped. If you #define foo 1, then any subsequent use of foo as an identifier will fail. This can lead to odd compilation errors and hard-to-find runtime bugs.
They don't take arguments in the normal sense. You can write a function that will take two int values and return the maximum, because the arguments will be evaluated once and the values used thereafter. You can't write a macro to do that, because it will evaluate at least one argument twice, and fail with something like max(x++, --y).
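The double evaluation can be observed directly. This is a minimal sketch; MAX_MACRO, max_func and the two helper functions are names made up for the example.

```c
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

static inline int max_func(int a, int b) { return a > b ? a : b; }

/* Returns the final value of x after MAX_MACRO(x++, y). */
static int x_after_macro(void)
{
    int x = 5, y = 3;
    int m = MAX_MACRO(x++, y);   /* x++ is evaluated twice: m == 6, x == 7 */
    (void)m;
    return x;
}

/* Returns the final value of x after max_func(x++, y). */
static int x_after_func(void)
{
    int x = 5, y = 3;
    int m = max_func(x++, y);    /* x++ is evaluated once: m == 5, x == 6 */
    (void)m;
    return x;
}
```

Even the fully parenthesized macro increments x twice, because the argument text is pasted into the expansion at every place the parameter appears.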
There are also common pitfalls. It's hard to get multiple statements right in them, and they require a lot of possibly superfluous parentheses.
In your case, you need parentheses:
#define radian2degree(a) (a * 57.295779513082)
needs to be
#define radian2degree(a) ((a) * 57.295779513082)
and you're still stepping on anybody who writes a function radian2degree in some inner scope, confident that that definition will work in its own scope.
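As for the multiple-statement pitfall mentioned in this answer, the usual workaround is the do { ... } while (0) idiom. A minimal sketch, with SWAP_INT and order_pair as illustrative names:

```c
/* Wrapping the statements in do { ... } while (0) makes the macro behave
 * as a single statement, so it is safe in an unbraced if/else. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Put the pair in ascending order; returns 1 if a swap happened, else 0. */
static int order_pair(int *lo, int *hi)
{
    int swapped = 1;
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        swapped = 0;
    return swapped;
}
```

With a plain { ... } block instead, the semicolon after SWAP_INT(...) would end the if statement and the else would no longer compile.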
For this specific macro, if I use it as follows:
int x=1;
x = radian2degree(x);
float y=1;
y = radian2degree(y);
there would be no type checking, and x,y will contain different values.
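A minimal sketch of that difference, using the parenthesized form of the macro and two helper functions invented for the example:

```c
#define radian2degree(a) ((a) * 57.295779513082)

/* The double result is silently truncated when stored back into an int. */
static int as_int(void)     { int x = 1;   x = radian2degree(x); return x; }

/* Stored into a float, the fractional part survives. */
static float as_float(void) { float y = 1; y = radian2degree(y); return y; }
```

as_int returns 57, while as_float returns roughly 57.2958.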
Furthermore, the following code
float x=1, y=2;
float z = radian2degree(x+y);
will not do what you think, since it will translate to
float z = x+y*57.295779513082;
instead of
float z = (x+y)*57.295779513082;
which is the expected result.
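The two expansions can be compared side by side. In this sketch the _bad and _ok suffixes and the helper functions are added only to keep both variants in one file.

```c
#define radian2degree_bad(a) (a * 57.295779513082)     /* as in the question      */
#define radian2degree_ok(a)  ((a) * 57.295779513082)   /* argument parenthesized  */

/* Expands to (x + y * 57.29...): only y is scaled. */
static double bad_sum(double x, double y)  { return radian2degree_bad(x + y); }

/* Expands to ((x + y) * 57.29...): the whole sum is scaled. */
static double good_sum(double x, double y) { return radian2degree_ok(x + y); }
```

For x = 1 and y = 2 the first gives about 115.59 and the second about 171.89.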
These are just a few examples of the misbehavior and misuse macros are prone to.
Edit
you can see additional discussions about this here
If possible, always use an inline function. These are type-safe and cannot be easily redefined.
#defines can be redefined or undefined, and there is no type checking.
Macros are relatively often abused and one can easily make mistakes using them as shown by your example. Take the expression radian2degree(1 + 1):
with the macro it will expand to 1 + 1 * 57.29... = 58.29...
with a function it will be what you want it to be, namely (1 + 1) * 57.29... = ...
More generally, macros are evil because they look like functions, so they trick you into using them just like functions, but they have subtle rules of their own. In this case, the correct way to write it would be (notice the parentheses around a):
#define radian2degree(a) ((a) * 57.295779513082)
But you should stick to inline functions. See these links from the C++ FAQ Lite for more examples of evil macros and their subtleties:
inline vs. macros
macros containing if
macros with multiple lines
macros used to paste two tokens together
The compiler's preprocessor is a finicky thing, and therefore a terrible candidate for clever tricks. As others have pointed out, it's easy for the compiler to misunderstand your intention with the macro, and it's easy for you to misunderstand what the macro will actually do, but most importantly, you can't step into macros in the debugger!
Macros are evil because you may end up passing more than a variable or a scalar to one, and this could result in unwanted behavior (define a max macro to determine the max of a and b, then pass a++ and b++ to the macro and see what happens).
If your function is going to be inlined anyway, there is no performance difference between a function and a macro. However, there are several usability differences between a function and a macro, all of which favor using a function.
If you build the macro correctly, there is no problem. But if you use a function, the compiler will do it correctly for you every time. So using a function makes it harder to write bad code.