Unit Testing a Function Macro - c

I'm writing unit tests for some function macros, which are just wrappers around some function calls, with a little housekeeping thrown in.
I've been writing tests all morning and I'm starting to get tedium of the brainpan, so this might just be a case of tunnel vision, but:
Is there a valid case to be made for unit testing for macro expansion? By that I mean checking that the correct function behavior is produced for the various source code forms of the function macro's arguments. For example, function arguments can take the form, in source code of a:
literal
variable
operator expression
struct member access
pointer-to-struct member access
pointer dereference
array index
function call
macro expansion
(feel free to point out any that I've missed)
If the macro doesn't expand properly, then the code usually won't even compile. So then, is there even any sensible point in a different unit test if the argument was a float literal or a float variable, or the result of a function call?
Should the expansion be part of the unit test?

As I noted in a comment:
Using expressions such as value & 1 could reveal that the macros are careless, but code inspections can do that too.
I think going through the full panoply of tests is overkill; the tedium is giving you a relevant warning.
There is an additional mode of checking that might be relevant, namely side-effects such as: x++ + ++y as an argument. If the argument to the macro is evaluated more than once, the side-effects will probably be scrambled, or at least repeated. An I/O function (getchar(), or printf("Hello World\n")) as the argument might also reveal mismanagement of arguments.
It also depends in part on the rules you want to apply to the macros. However, if they're supposed to look like and behave like function calls, they should only evaluate arguments once (but they should evaluate each argument — if the macro doesn't evaluate an argument at all, then the side-effects won't occur that should occur (that would occur if the macro was really a function).
Also, don't underestimate the value of inline functions.

Based on the comments and some of the points made in #Jonathan Leffler's answer, I've come to the conclusion that this is something that is better tested in functional testing, preferably with a fuzz tester.
That way, using a couple of automation scripts, the fuzzer can throw a jillion arguments at the function macro and log those that either don't compile, produce compiler warnings, or compile and run, but produce the incorrect result.
Since fuzz tests aren't supposed to run quickly (like unit tests), there's no problem just adding it to the fuzz suite and letting it run over the weekend.

The goal of testing is to find errors. And, your macro definitions can contain errors. Therefore, there is a case for testing macros in general, and unit-testing in particular can find many specific errors, as will be explained below.
Code inspection can obviously also be used to find errors, however, there are good points in favor of doing both: Unit-tests can cheaply be repeated whenever the respective code is modified, say, for reactoring.
Code inspections can not cheaply be repeated (at least they cause more effort than re-running unit-tests), but they also can find other points that tests can never detect, like, wrong or bad documentation, design issues like code duplication etc.
That said, there are a number of issues you can find when unit-testing macros, some of which were already mentioned. And, it may in principle be possible that there are fuzz testers which also check for such problems, but I doubt that problems with macro definitions are already in focus of fuzz-testers:
wrong algorithm: Expressions and statements in macro definitions can just be as wrong as they can be in normal non-macro code.
unsufficient parenthesization (as mentioned in the comments): This is a potential problem with macros, and it can be detected, possibly even at compile time, by passing expressions with operators with low precedence as macro arguments. For example, calling FOO(x = 2) in test code will lead to a compile error if FOO(a) is defined as (2 * a) instead of (2 * (a)).
unintended multiple use of arguments in the expansion (as mentioned by Jonathan): This also is a potential problem specific to macros. It should be part of the specification of a macro how often its arguments will be evaluated in the expanded code (and sometimes there can no fixed number be given, see assert). Such statements about how often an argument will be evaluated can be tested by passing macro arguments with side effects that can afterwards be checked by the test code. For example, if FOO(a) is defined to be ((a) * (a)), then the call FOO(++x) will result in x being incremented twice rather than once.
unintended expansion: Sometimes a macro shall expand in a way that causes no code to be produced. assert with NDEBUG is an example here, which shall expand such that the expanded code will be optimized away completely. Whether a macro shall expand in such a way typically depends on configuration macros. To check that a macro actually 'disappears' for the respective configuration, syntactically wrong macro arguments can be used: FOO(++ ++) for example can be a compile-time test to see if instead of the empty expansion one of the non-empty expansions was used (whether this works, however, depends on whether the non-empty expansions use the argument).
bad semicolon: to ensure that a function like macro expands cleanly into a compound statement (with proper do-while(0) wrapper but without trailing semicolon), a compile time check like if (0) FOO(42); else .. can be used.
Note: Those tests I mentioned to be compile-time tests are, strictly speaking, just some form of static analysis. In contrast to using a static analysis tool, such tests have the benefit to specifically test those properties that the macros are expected to have according to their design. Like, static analysis tools typically issue warnings when macro arguments are used without parentheses in the expansion - however, in many expansions parentheses are intentionally not used.

Related

C macro or GCC directive to consume next code statement

Is there a C macro, a GCC directive or pragma, to consume the next code statement following the macro, without explicitly passing it as an argument to the macro?
Something like this:
#define CONSUME_NEXT_IF(a) if (a) { (<next>) } else;
And I would use it as:
CONSUME_NEXT_IF(a) stmt1;
And expect it to expand to:
if (a) stmt1;
else;
I am using an if statement here just as an example. The conditional statement isn't the point, rather the ability to consume stmt1 by the macro without actually passing it as an argument.
#define CONSUME_NEXT_IF(a) if (!(a)) {} else
will achieve the effect of only executing the "next statement" (between use of the macro and next ;) if a is true (or non-zero). If you have suitable constraints on what type of expression a is, you might be able to remove the () on (a).
Personally, although you've explained in comments that you want a similar effect to annotations, I consider this will introduce more maintenance concerns - including code obfuscation - than it alleviates. Particularly if it interacts with other macros being used in a or stmt1.
And, of course, it would be necessary to modify your "large code base" to use the macro.
This also leaves dead code in your executable - it doesn't stop code for stmt1 being emitted to the executable (unless a has a fixed compile-time value, and your compiler has capability to detect and optimise code in such circumstances). Therefore such a construct will mean you cannot satisfy requirements of several assurance standards that require prevention of dead code.

What is better: function or define

I have couple of simple functions like
#define JacobiLog(x1,x2) ((x1>x2)?x1:x2)+log(1+exp(-fabs(x1-x2)))
What is better to implement (code, compile, memory...) - as above with define or to write some simple function
double JacobiLog(double x1,double x2)
{
return ((x1>x2) ? x1 : x2) + log(1+exp(-fabs(x1-x2)));
}
The compiler will probably automatically set your function as inline. You should use it and not a define.
It will also avoid unexpected comportment in the case where you use your define as
double num = JacobiLog(x++, y++);
I let you imagine the problem with code replacement...
define can possibly be little faster, but most probably compiler will inline the function anyway (or you can mark as inline) and they will be the same. But function is better, because it is more readable and easier to debug.
The function is better, assuming a good compiler.
With the function, it is left to the compiler whether the code is inlined, or not (assuming the definition of the function is accessible to everyone who uses it, for example if it is an inline function declared in a header for C++, or just a plain function with all of its users in the same translation unit). With the macro, it is always inlined, which is not necessarily faster, as it may lead to code bloat and therefore more cache misses and page faults.
Not to mention macros are difficult to read and, even worse, to debug.
Even though the 'define' is faster (since it prevents a function call), the compiler can optimize and inline your function, and make it as fast.
If you are in a c++ environment, you should always use template and functions. It will make you're program more readable and prevent type error.
In C, macro can be useful since the type is not specified (see example below):
/* Will work with int, long, double, short, etc. */
#HIGHER(VAL1, VAL2) ((VAL1) > (VAL2) ? (VAL1) : (VAL2))
It's a micro-optimization. Unless you're doing embedded programming and every instruction counts, go with the function. Not to mention that the log is likely about 100x slower than the overhead to call a function. So you can only get about a 1% saving if your program consists mainly of calling this function. [1] Once your program starts doing significant other things, this saving will be reduced to basically unnoticeable.
The compiler is free to inline the function wherever possible, which would make the two identical. However, you can't force the compiler to do so. There is an inline keyword in C++, but this is just a hint, the compiler is free to ignore it.
See this for some differences between the two (this covers inline versus non-inline functions, but, as stated above, inline functions are essentially the same as #define's). The basic conclusion to the link is "it depends".
Also note that, behaviourally, a #define and a function are not 100% equivalent.
[1]: Figures largely made up. Benchmark if you want accurate results.
First (for a complete answer) we have to acknowledge that using a macro can have surprise side-effects which you might not intend, and that a function ensures that you know the incoming types and you know that each parameter is evaluated exactly once.
More often than not, these effects of using a macro are a source of problems.
Generally a compiler will inline the function as appropriate, and if it does its job right then it should have nearly all the advantages of a macro but without the rarely-intended side-effects.
Occasionally, though, you can actually get some benefits that an inlining compiler mightn't recognise. For example your macro will temporarily defer converting the arguments to double if they were int or long and perform more operations in integer arithmetic (which might have a performance or precision advantage). You might also get integer overflow and incorrect results.
Since you included 'memory' in your list of "better" factors, it's tempting to say that the function is smaller (assuming you configure your compiler to optimise for size), but this isn't necessarily true.
Obviously as a function you need only one copy of it in memory and all callers can use that same code, whereas inlined or expanded at every use duplicates the code. Your compiler is very unlikely to isolate a macro and convert it into a function called from many different places in the code.
Where a never-inlined function can fail to be smaller is where it stands in the way of simplifications. There are three common cases I can think of:
If all of the uses of the function involve constant parameters, the inlined simplifications might come out smaller than the whole original function.
The register marshalling code required to execute a function call with the parameters in the correct registers can be longer than the function itself.
Adding a function call can add to the register pressure in the caller, forcing it to generate more complicated code, possibly forcing it to create a stack frame and save more registers on entry and exit.

Parsing C files without preprocessing it

I want to run simple analysis on C files (such as if you call foo macro with INT_TYPE as argument, then cast the response to int*), I do not want to prerprocess the file, I just want to parse it (so that, for instance, I'll have correct line numbers).
Ie, I want to get from
#include <a.h>
#define FOO(f)
int f() {FOO(1);}
an list of tokens like
<include_directive value="a.h"/>
<macro name="FOO"><param name="f"/><result/></macro>
<function name="f">
<return>int</return>
<body>
<macro_call name="FOO"><param>1</param></macro_call>
</body>
</function>
with no need to set include path, etc.
Is there any preexisting parser that does it? All parsers I know assume C is preprocessed. I want to have access to the macros and actual include instructions.
Our C Front End can parse code containing preprocesser elements can do this to fair extent and still build a usable AST. (Yes, the parse tree has precise file/line/column number information).
There are a number of restrictions, which allows it to handle most code. In those few cases it cannot handle, often a small, easy change to the source file giving equivalent code solves the problem.
Here's a rough set of rules and restrictions:
#includes and #defines can occur wherever a declaration or statement can occur, but not in the middle of a statement. These rarely cause a problem.
macro calls can occur where function calls occur in expressions, or can appear without semicolon in place of statements. Macro calls that span non-well-formed chunks are not handled well (anybody surprised?). The latter occur occasionally but not rarely and need manual revision. OP's example of "j(v,oid)*" is problematic, but this is really rare in code.
#if ... #endif must be wrapped around major language concepts (nonterminals) (constant, expression, statement, declaration, function) or sequences of such entities, or around certain non-well-formed but commonly occurring idioms, such as if (exp) {. Each arm of the conditional must contain the same kind of syntactic construct as the other arms. #if wrapped around random text used as bad kind of comment is problematic, but easily fixed in the source by making a real comment. Where these conditions are not met, you need to modify the original source code, often by moving the #if #elsif #else #end a few tokens.
In our experience, one can revise a code base of 50,000 lines in a few hours to get around these issues. While that seems annoying (and it is), the alternative is to not be able to parse the source code at all, which is far worse than annoying.
You also want more than just a parser. See Life After Parsing, to know what happens after you succeed in getting a parse tree. We've done some additional work in building symbol tables in which the declarations are recorded with the preprocessor context in which they are embedded, enabling type checking to include the preprocessor conditions.
You can have a look at this ANTLR grammar. You will have to add rules for preprocessor tokens, though.
Your specific example can be handled by writing your own parsing and ignore macro expansion.
Because FOO(1) itself can be interpreted as a function call.
When more cases are considered however, the parser is much more difficult. You can refer PDF Link to find more information.

Using a macro for a small operation, is this good practice?

I have a small piece of code that requires to read 4-bit values bitpacked in a 32-bit integer. Since I need to call this operation several times, even if it's simple, I require max speed on it.
I was pondering about macros and inline functions, thus I made this macro:
#define UI32TO4(x, p) (x >> ((p - 1) *4) & 15)
And I have an inline function that does the same thing.
static inline Uint8 foo_getval(Uint32 bits, int pos){
return (bits >> ((pos-1)*4)) & 15;
}
Considering the simplicity of the operation, and that the values are already prepared for this call (so no possibility of calling on the wrong types, or pass values that are too big or that stuff), what would be the best one to use? Or, at least, the most comprehensible for someone else potentially reading/modifying the code later on?
EDIT! Forgot to mention, I am using C99.
The function is safer. Your assumptions that the values are always "right" only holds while you're developing that code. You can't tell if someone down the line (or yourself when you're tired) won't pass unexpected values.
The compiler will do the inlining when it sees it as effective. Use type-safe functions whenever you can, use macros only when you have no other practical choice.
I would use the inline function because macros can cause unwanted side effects. Use macros only to save typing if necessary.
If a macro name is the same name as a function name in an other compilation unit you would get strange compilation errors. These problems can be hard to find, especially if the macro is expanded elsewhere and no error occurs.
Additionally a function warns you about parameter types and would not let you give a double for pos. The macro could allow this.
It's late, and I'm grumpy (and I'll probably delete this post later) but I get tired of hearing the same arguments against macros parroted over and over again (a double redundacy):
Joachim Pileborg (above) states "using a function allows the compiler to do better typechecking". This is often stated, but I don't believe it. With macros, the compiler already has all the available type information at its fingertips. Functions simply destroy this. (And possibly destroy optimization, by pushing registers out to the stack, but that's a side issue.)
And frast (above) states "macros can cause unwanted side effects". True--but so can functions. I think the rule is to always use UPPER_CASE for macros which don't have function semantics. This rule has often been broken. But it doesnt apply here: the OP has redundantly used both uppercase and function semantics.
But I would suggest a tiny improvement. The OP has quite correctly placed parentheses around the whole macro, but there should also be parentheses around each argument:
#define UI32TO4(x, p) ((x) >> (((p) - 1) * 4) & 15)
Always enclose your macro args in parentheses, unless you are doing string or token concatenting, etc.
Macros are, of course, dangerous, but so are functions. (And the less said of STL, the better).

When should you use macros instead of inline functions?

In a previous question what I thought was a good answer was voted down for the suggested use of macros
#define radian2degree(a) (a * 57.295779513082)
#define degree2radian(a) (a * 0.017453292519)
instead of inline functions. Please excuse the newbie question, but what is so evil about macros in this case?
Most of the other answers discuss why macros are evil including how your example has a common macro use flaw. Here's Stroustrup's take: http://www.research.att.com/~bs/bs_faq2.html#macro
But your question was asking what macros are still good for. There are some things where macros are better than inline functions, and that's where you're doing things that simply can't be done with inline functions, such as:
token pasting
dealing with line numbers or such (as for creating error messages in assert())
dealing with things that aren't expressions (for example how many implementations of offsetof() use using a type name to create a cast operation)
the macro to get a count of array elements (can't do it with a function, as the array name decays to a pointer too easily)
creating 'type polymorphic' function-like things in C where templates aren't available
But with a language that has inline functions, the more common uses of macros shouldn't be necessary. I'm even reluctant to use macros when I'm dealing with a C compiler that doesn't support inline functions. And I try not to use them to create type-agnostic functions if at all possible (creating several functions with a type indicator as a part of the name instead).
I've also moved to using enums for named numeric constants instead of #define.
There's a couple of strictly evil things about macros.
They're text processing, and aren't scoped. If you #define foo 1, then any subsequent use of foo as an identifier will fail. This can lead to odd compilation errors and hard-to-find runtime bugs.
They don't take arguments in the normal sense. You can write a function that will take two int values and return the maximum, because the arguments will be evaluated once and the values used thereafter. You can't write a macro to do that, because it will evaluate at least one argument twice, and fail with something like max(x++, --y).
There's also common pitfalls. It's hard to get multiple statements right in them, and they require a lot of possibly superfluous parentheses.
In your case, you need parentheses:
#define radian2degree(a) (a * 57.295779513082)
needs to be
#define radian2degree(a) ((a) * 57.295779513082)
and you're still stepping on anybody who writes a function radian2degree in some inner scope, confident that that definition will work in its own scope.
For this specific macro, if I use it as follows:
int x=1;
x = radian2degree(x);
float y=1;
y = radian2degree(y);
there would be no type checking, and x,y will contain different values.
Furthermore, the following code
float x=1, y=2;
float z = radian2degree(x+y);
will not do what you think, since it will translate to
float z = x+y*0.017453292519;
instead of
float z = (x+y)+0.017453292519;
which is the expected result.
These are just a few examples for the misbehavior ans misuse macros might have.
Edit
you can see additional discussions about this here
if possible, always use inline function. These are typesafe and can not be easily redefined.
defines can be redfined undefined, and there is no type checking.
Macros are relatively often abused and one can easily make mistakes using them as shown by your example. Take the expression radian2degree(1 + 1):
with the macro it will expand to 1 + 1 * 57.29... = 58.29...
with a function it will be what you want it to be, namely (1 + 1) * 57.29... = ...
More generally, macros are evil because they look like functions so they trick you into using them just like functions but they have subtle rules of their own. In this case, the correct way would be to write it would be (notice the paranthesis around a):
#define radian2degree(a) ((a) * 57.295779513082)
But you should stick to inline functions. See these links from the C++ FAQ Lite for more examples of evil macros and their subtleties:
inline vs. macros
macros containing if
macros with multiple lines
macros used to paste two tokens together
The compiler's preprocessor is a finnicky thing, and therefore a terrible candidate for clever tricks. As others have pointed out, it's easy to for the compiler to misunderstand your intention with the macro, and it's easy for you to misunderstand what the macro will actually do, but most importantly, you can't step into macros in the debugger!
Macros are evil because you may end up passing more than a variable or a scalar to it and this could resolve in an unwanted behavior (define a max macro to determine max between a and b but pass a++ and b++ to the macro and see what happens).
If your function is going to be inlined anyway, there is no performance difference between a function and a macro. However, there are several usability differences between a function and a macro, all of which favor using a function.
If you build the macro correctly, there is no problem. But if you use a function, the compiler will do it correctly for you every time. So using a function makes it harder to write bad code.

Resources