Is this function macro safe? - c

Can you tell me if anything and what can go wrong with this C "function macro"?
#define foo(P,I,X) do { (P)[I] = X; } while(0)
My goal is that foo behaves exactly like the following function foofunc for any POD data type T (i.e. int, float*, struct my_struct { int a,b,c; }):
static inline void foofunc(T* p, size_t i, T x) { p[i] = x; }
For example this is working correctly:
int i = 0;
float p;
foo(&p,i++,42.0f);
It can handle things like &p due to putting P in parentheses, it does increment i exactly once because I appears only once in the macro and it requires a semicolon at the end of the line due to do {} while(0).
Are there other situations that I am not aware of in which the macro foo would not behave like the function foofunc?
In C++ one could define foofunc as a template and would not need the macro. But I am looking for a solution that works in plain C (C99).

The fact that your macro works for arbitrary X arguments hinges on the details of operator precedence. I recommend using parentheses even if they happen not to be necessary here.
#define foo(P,I,X) do { (P)[I] = (X); } while(0)
This is a statement, not an expression, so it cannot be used everywhere foofunc(P,I,X) could be. Even if foofunc returns void, it can be used in comma expressions; foo can't. But you can easily define foo as an expression, with a cast to void if you don't want to risk using the result.
#define foo(P,I,X) ((void)((P)[I] = (X)))
With a macro instead of a function, all you lose is the error checking. For example, you can write foo(3, ptr, 42) instead of foo(ptr, 3, 42). In an implementation where size_t is smaller than ptrdiff_t, using the function may truncate I, but the macro's behavior is more intuitive. The type of X may be different from the type that P points to: an automatic conversion will take place, so in effect it is the type of P that determines which typed foofunc is equivalent.
In the important respects, the macro is safe. With appropriate parentheses, if you pass syntactically reasonable arguments, you get a well-formed expansion. Since each parameter is used exactly once, all side effects will take place exactly once. The order of evaluation between the parameters is unspecified either way.
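To illustrate the point about expressions, here is a minimal sketch using the expression form of foo in the third clause of a for loop, a place where the do { } while(0) form would not compile. The function name foo_expression_demo is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Expression form of the macro from the answer above. */
#define foo(P, I, X) ((void)((P)[I] = (X)))

/* Hypothetical demo: fills a small array from the loop header, where only
   an expression (not a statement) is allowed. */
static int foo_expression_demo(void)
{
    int buf[3] = {0, 0, 0};
    size_t i = 0;
    int n;

    for (n = 0; n < 3; foo(buf, i++, n * 10), n++) {
        /* empty body: all the work happens in the loop header */
    }

    assert(i == 3);   /* I was evaluated once per expansion */
    return buf[0] + buf[1] + buf[2];
}
```

Calling foo_expression_demo() should return 30 (0 + 10 + 20).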

The do { ... } while(0) construct protects your result from any harm, and your inputs P and I are protected by () and [], respectively. What is not protected is X. So the question is whether protection is needed for X.
Looking at the operator precedence table (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_precedence), we see that only two operators are listed as having lower precedence than = so that the assignment could steal their argument: the throw operator (this is C++ only) and the , operator.
Now, apart from being C++ only, the throw operator is harmless because it does not have a left hand argument that could be stolen.
The , operator, on the other hand, would be a problem if X could contain it as a top level operator. But if you parse the statement
foo(array, index, x += y, y)
you see that the , operator would be interpreted to delimit a fourth argument, and
foo(array, index, (x += y, y))
already comes with the parentheses it requires.
To make a long story short:
Yes, your definition is safe.
However, your definition relies on the fact that stuff, more_stuff cannot be passed as a single macro argument without adding parentheses. I would prefer not to rely on such intricacies, and just write the obviously safe
#define foo(P, I, X) do { (P)[I] = (X); } while(0)
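As a small sketch of the comma case discussed above: once the caller wraps the comma expression in parentheses, it arrives in the macro as a single argument. The function name comma_argument_demo is hypothetical.

```c
#include <assert.h>

#define foo(P, I, X) do { (P)[I] = (X); } while (0)

/* Hypothetical demo: passes a comma expression as the X argument. */
static int comma_argument_demo(void)
{
    int array[2] = {0, 0};
    int x = 1, y = 2;

    /* The inner parentheses make the comma expression one macro argument. */
    foo(array, 1, (x += y, y));

    assert(x == 3);      /* x += y ran exactly once */
    return array[1];     /* the value of the comma expression, i.e. y */
}
```

Without the inner parentheses the preprocessor would see four arguments and reject the call.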

Related

Double evaluation within macro: a case of sizeof() to determine array's size passed as compound literal

C99 makes it possible to define arrays basically anywhere, as compound literals.
For example, given a trivial function sumf() that accepts an array of float as input, we would expect the prototype to be :
float sumf(const float* arrayf, size_t size);
It can then be used like this:
float total = sumf( (const float[]){ f1, f2, f3 }, 3 );
It's convenient because there is no need to declare a variable beforehand.
The syntax is slightly ugly, but this could be hidden behind a macro.
However, note the final 3. This is the size of the array. It is required so that sumf() knows where to stop. But as code ages and gets refactored, it's also an easy source of errors, because this second argument must be kept in sync with the definition of the first one. For example, adding f4 requires updating this value to 4; otherwise the function returns a wrong result (and no warning flags the issue).
So it would be better to keep both in sync.
If the array were declared as a variable, it would be easy.
We could have a macro that simplifies the expression like this: float total = sumf( ARRAY(array_f) ); with just #define ARRAY(a) (a) , sizeof(a) / sizeof(*(a)). But then array_f must be defined before calling the function, so it's no longer a compound literal.
Since it's a compound literal, it has no name, so it can't be referenced. Hence I could not find any better way than to repeat the compound literal in both parameters.
#define LIST_F(...) (const float*)( __VA_ARGS__) , sizeof((const float*)( __VA_ARGS__)) / sizeof(float)
float total = sumf ( LIST_F( f1, f2, f3 ) );
and this would work. Adding an f4 into the list would automatically update the size argument to correct size.
However, this all works fine as long as all members are variables. But what about cases where it's a function ? Would the function be invoked twice ?
Say for example : float total = sumf ( LIST_F( v1, f2() ) );, will f2() be invoked twice? This is unclear to me as f2() is mentioned within sizeof(), so it could, in theory, know the return type size without actually invoking f2(). But I'm unsure what the standard says about that. Is there a guarantee ? Is it implementation dependent ?
will f2() be invoked twice?
No, the operand of sizeof is not evaluated (unless its type is a variable length array, which is not the case here).
what the standard says about that. Is there a guarantee ?
From C11 6.5.3.4p2:
The sizeof operator yields the size (in bytes) of its operand, [...] If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Is it implementation dependent ?
No, it should always be fine.
Note that your macro uses (const float*)(__VA_ARGS__), which will not work: that is a cast applied to a comma expression, not a compound literal. The compound literal syntax is (float[]){ stuff }. I would also fold everything into a single macro to save typing:
#define SUMF_ARRAY(...) \
sumf( \
(const float[]){__VA_ARGS__}, \
sizeof((const float[]){__VA_ARGS__}) / sizeof(float))
float total = SUMF_ARRAY(f1(), f2(), f3());
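A quick way to convince yourself is to count the calls. The sketch below assumes a trivial sumf() implementation and a made-up function next_value() with a call counter; neither is from the question.

```c
#include <assert.h>
#include <stddef.h>

static int value_calls = 0;   /* counts invocations of next_value() */

/* Hypothetical function with a visible side effect. */
static float next_value(void)
{
    value_calls++;
    return 1.5f;
}

/* Assumed implementation of sumf(): plain sum over the array. */
static float sumf(const float *arrayf, size_t size)
{
    float total = 0.0f;
    for (size_t i = 0; i < size; i++)
        total += arrayf[i];
    return total;
}

#define SUMF_ARRAY(...) \
    sumf((const float[]){__VA_ARGS__}, \
         sizeof((const float[]){__VA_ARGS__}) / sizeof(float))

static float sum_demo(void)
{
    value_calls = 0;
    /* next_value() appears twice in the source and twice more inside
       sizeof, but the sizeof operand is never evaluated. */
    float total = SUMF_ARRAY(next_value(), next_value(), 2.0f);
    assert(value_calls == 2);
    return total;
}
```

Here sum_demo() returns 5.0f and the counter ends at 2, not 4.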

Multiline macro function with "return" statement

I'm currently working on a project, and a particular part needs a multi-line macro function (a regular function won't work here as far as I know).
The goal is to make a stack manipulation macro, that pulls data of an arbitrary type off the stack (being the internal stack from a function call, not a high-level "stack" data type). If it were a function, it'd look like this:
type MY_MACRO_FUNC(void *ptr, type);
Where type is the type of data being pulled from the stack.
I currently have a working implementation of this for my platform (AVR):
#define MY_MACRO_FUNC(ptr, type) (*(type*)ptr); \
(ptr = /* Pointer arithmetic and other stuff here */)
This allows me to write something like:
int i = MY_MACRO_FUNC(ptr, int);
As you can see in the implementation, this works because the statement which assigns i is the first line in the macro: (*(type*)ptr).
However, what I'd really like is to be able to have a statement before this, to verify that ptr is a valid pointer before anything gets broken. But, this would cause the macro to be expanded with the int i = pointing to that pointer check. Is there any way to get around this issue in standard C? Thanks for any help!
As John Bollinger points out, macros expanding to multiple statements can have surprising results. A way to make several statements (and declarations!) a single statement is to wrap them into a block (surrounded by do … while(0)).
In this case, however, the macro should evaluate to something, so it must be an expression (and not a statement). Everything but declarations and iteration and jump statements (for, while, goto) can be transformed to an expression: Several expressions can be sequenced with the comma operator, if-else-clauses can be replaced by the conditional operator (?:).
Given that the original value of ptr can be recovered (I’ll assume "arithmetic and other stuff here" as adding 4 for the sake of having an example)
#define MY_MACRO_FUNC(ptr, type) \
( (ptr) && (uintptr_t)(ptr)%4 == 0 \
? (ptr) += 4 , *(type*)((ptr) - 4) \
: (abort() , (type){ 0 }) )
Note that I put parentheses around ptr and around the whole expression, so that the macro still parses as intended when it is passed a compound argument or used inside a larger expression.
The second and third operands of ?: must be of the same type, so I included (type){0} after the abort call. This expression is never evaluated. You just need some valid dummy object; here, type cannot be a function type.
If you use C89 and can’t use compound literals, you can use (type)0, but that wouldn’t allow for structure or union types.
Just as a note, GCC has an extension, Statements and Declarations in Expressions, that allows statements inside an expression.
This is very nasty:
#define MY_MACRO_FUNC(ptr, type) (*(type*)ptr); \
(ptr = /* Pointer arithmetic and other stuff here */)
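It may have unexpected results in certain innocuous-looking circumstances, such as
if (foo) bar = MY_MACRO_FUNC(ptr, int);
Consider what happens if foo is 0: only the first statement of the expansion belongs to the if, so bar is not assigned, yet the pointer update still runs.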
I think you would be better off implementing this in a form that assigns the popped value instead of 'returning' it:
#define MY_POP(stack, type, v) do { \
if (!(stack)) abort_abort_abort(); \
(v) = *((type *) (stack)); \
(stack) = (... compute new value ...); \
} while (0)
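Here is a runnable sketch of that idea. It makes two assumptions that are not in the question: the "compute new value" step simply advances the pointer by sizeof(type), and the error handler is the standard abort(). The names pop_demo and frame are made up.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: popping means reading a value and advancing by its size. */
#define MY_POP(stack, type, v) do {                            \
        if (!(stack)) abort();                                 \
        (v) = *(type *)(stack);                                \
        (stack) = (unsigned char *)(stack) + sizeof(type);     \
    } while (0)

/* Hypothetical demo: pops two ints from a small buffer. */
static int pop_demo(void)
{
    int frame[2] = {7, 9};
    unsigned char *sp = (unsigned char *)frame;
    int first = 0, second = 0;
    int want_first = 1;

    if (want_first)
        MY_POP(sp, int, first);   /* the whole pop is guarded by the if */
    else
        second = -1;              /* an else still attaches correctly */

    MY_POP(sp, int, second);
    assert(sp == (unsigned char *)(frame + 2));
    return first * 10 + second;
}
```

Because the expansion is a single statement, it behaves like a function call inside if/else, and pop_demo() returns 79.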

Anonymous functions using GCC statement expressions

This question isn't terribly specific; it's really for my own C enrichment and I hope others can find it useful as well.
Disclaimer: I know many will have the impulse to respond with "if you're trying to do FP then just use a functional language". I work in an embedded environment that needs to link to many other C libraries, and doesn't have much space for many more large shared libs and does not support many language runtimes. Moreover, dynamic memory allocation is out of the question. I'm also just really curious.
Many of us have seen this nifty C macro for lambda expressions:
#define lambda(return_type, function_body) \
({ \
return_type __fn__ function_body \
__fn__; \
})
And an example usage is:
int (*max)(int, int) = lambda (int, (int x, int y) { return x > y ? x : y; });
max(4, 5); // Example
Using gcc -std=c89 -E test.c, the lambda expands to:
int (*max)(int, int) = ({ int __fn__ (int x, int y) { return x > y ? x : y; } __fn__; });
So, these are my questions:
What precisely does the line int (*X); declare? Of course, int * X; is a pointer to an integer, but how do these two differ?
Taking a look at the expanded macro, what on earth does the final __fn__ do? If I write a test function void test() { printf("hello"); } test; - that immediately throws an error. I do not understand that syntax.
What does this mean for debugging? (I'm planning to experiment myself with this and gdb, but others' experiences or opinions would be great). Would this screw up static analyzers?
This declaration (at block scope):
int (*max)(int, int) =
({
int __fn__ (int x, int y) { return x > y ? x : y; }
__fn__;
});
is not C but is valid GNU C.
It makes use of two gcc extensions:
nested functions
statement expressions
Both nested functions (defining a function inside a compound statement) and statement expressions (({}), basically a block that yields a value) are not part of standard C; they come from GNU C.
In a statement expression, the last expression statement is the value of the construct. This is why the nested function __fn__ appears as an expression statement at the end of the statement expression. A function designator (__fn__ in the last expression statement) in an expression is converted to a pointer to a function by the usual conversions. This is the value used to initialize the function pointer max.
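The conversion from function designator to pointer is ordinary standard C and can be seen without any GNU extension. A minimal sketch, with an ordinary named function max_int standing in for the nested __fn__ (both max_int and function_pointer_demo are made-up names):

```c
#include <assert.h>

/* Ordinary file-scope function standing in for the nested __fn__. */
static int max_int(int x, int y)
{
    return x > y ? x : y;
}

static int function_pointer_demo(void)
{
    /* max is a pointer to a function taking two ints and returning int. */
    int (*max)(int, int) = max_int;    /* designator converts to a pointer */
    int (*same)(int, int) = &max_int;  /* explicit address-of is equivalent */

    assert(max == same);
    return max(4, 5);
}
```

The parentheses around *max are what make this a pointer to a function; int *max(int, int) would instead declare a function returning int *.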
Your lambda macro exploits two funky features. First, it uses nested functions to actually define the body of your function, so your lambda is not really anonymous: it just uses an implicit __fn__ identifier (which should be renamed to something else, as double-leading-underscore names are reserved for the implementation, so maybe something like yourapp__fn__ would be better).
Second, all of this is itself performed within a GCC statement expression (see http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs), the basic format of which goes something like:
({ ...; retval; })
the last statement being an expression that yields the address of the just-declared function. Now, int (*max)(int,int) simply gets initialized with the value of the statement expression, which is the pointer to the 'anonymous' function just declared.
Debugging macros are a royal pain of course.
As for why test; fails: at least here, I get 'test redeclared as different type of symbol', which I assume means GCC is treating it as a declaration and not a (useless) expression. Because untyped variables default to int and because you have already declared test as a function, you get that, but I could be wrong about that.
This is not portable by any stretch of the imagination though.
Partial answer:
It isn't int (*X) you are interested in. It is int (*X)(y,z). That declares X as a pointer to a function which takes (y,z) and returns int.
For debugging, this will be really hard. Most debuggers can't trace through a macro. You would most likely have to debug the assembly.
int (*max)(int, int) declares the variable max as a pointer to a function that takes two ints as parameters and returns int.
__fn__ is the name of the nested function defined inside the macro; as the last expression in the block it yields a pointer to that function, and that pointer is what max is initialized with.
I don't have an answer there. I would imagine you can step through it if you have run it through the preprocessor.

How to check if a parameter is an integral constant expression in a C preprocessor macro?

I'm currently cleaning up an existing C-library to publish it shamelessly.
A preprocessor macro NPOT is used to calculate the next greater power of two for a given integral constant expression at compile time. The macro is normally used in direct initialisations. For all other cases (e.g. using variable parameters), there is an inline function with the same functionality.
But if the user passes a variable, the algorithm expands to a huge piece of machine code. My question is:
What may I do to prevent a user from passing anything but an integral constant expression to my macro?
#define NPOT(x) complex_algorithm(x)
const int c=10;
int main(void) {
int i=5;
foo = NPOT(5); // works, and does everything it should
foo = NPOT(c); // works also, but blows up the code extremely
foo = NPOT(i); // blows up the code also
}
What I already tried:
Define the macro as #define NPOT(x) complex_algorithm(x ## u). It still works for literals and throws a compiler error (even if a hardly helpful one) for variable parameters, unless there happens to be a variable named like iu... Dirty, dangerous, don't want it.
Documentation, didn't work for most users.
You can use any expression that needs a constant integral expression and that will then be optimized out.
#define NPOT(X) \
(1 \
? complex_algorithm(X) \
: sizeof(struct { int needs_constant[1 ? 1 : (X)]; }) \
)
You should probably also cast the result of the sizeof to the appropriate integer type, so that the resulting expression has the type you'd expect.
I am using an untagged struct here to
have a type so really no temporary is produced
have a unique type such that the expression can be repeated anywhere in the code without causing conflicts
trigger the use of a VLA, which is not allowed inside a struct as of C99:
A member of a structure or union may have any object type other than a variably modified type.
I am using the ternary ?: with 1 as the selecting expression to ensure that the branch after the : is always checked for its type, but never evaluated as an expression.
Edit: It seems that gcc accepts a VLA inside a struct as an extension and doesn't even warn about it, even when I explicitly say -std=c99. That is really a bad idea on their part.
For such a weird compiler :) you could use sizeof((int[X]){ 0 }), instead. This is "as forbidden" as the above version, but additionally even gcc complains about it.
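To try the struct-based check end to end, here is a sketch with a made-up stand-in for complex_algorithm. The SMEAR helpers and the bit-smearing approach are assumptions for illustration only, not the library's real algorithm, and they only handle small positive values. The result is cast to int as suggested above.

```c
#include <assert.h>

/* Hypothetical stand-in for complex_algorithm: rounds a small positive
   value up to a power of two by smearing the highest set bit downwards. */
#define SMEAR1(x) ((x) | ((x) >> 1))
#define SMEAR2(x) (SMEAR1(x) | (SMEAR1(x) >> 2))
#define SMEAR4(x) (SMEAR2(x) | (SMEAR2(x) >> 4))
#define SMEAR8(x) (SMEAR4(x) | (SMEAR4(x) >> 8))
#define complex_algorithm(x) (SMEAR8((x) - 1) + 1)

/* The check from the answer: the second branch is never evaluated, but
   its array size must be a constant for the struct to be valid. */
#define NPOT(X) \
    ((int)(1 \
        ? complex_algorithm(X) \
        : sizeof(struct { int needs_constant[1 ? 1 : (X)]; })))

static int npot_demo(void)
{
    int a = NPOT(5);    /* 8 */
    int b = NPOT(10);   /* 16 */

    /* int i = 5; NPOT(i);  <- meant to be rejected, because the array size
       would no longer be constant (GCC accepts it as an extension). */
    assert(a == 8);
    return a * 100 + b;
}
```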
#define INTEGRAL_CONST_EXPR(x) ((void) sizeof (struct {int a:(x);}), (x))
This will give a compile error if x is not an integral constant expression.
my_function(INTEGRAL_CONST_EXPR(1 + 2 + 3)); // OK
my_function(INTEGRAL_CONST_EXPR(1.0 + 2 + 3)); // compile error
Note that this solution does not work for initializing a static variable:
static int a = INTEGRAL_CONST_EXPR(2 + 3);
will trigger a compile error because an expression containing the , operator is not a constant expression.
As @JensGustedt put it in a comment, an integral constant expression resolving to a negative integer cannot be used with this solution, as a bit-field width cannot be negative (a named bit-field cannot have width zero either).
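A minimal sketch of the bit-field variant in use; the function name ice_demo is hypothetical, and the rejected case is left in a comment so the example still compiles.

```c
#include <assert.h>

#define INTEGRAL_CONST_EXPR(x) ((void) sizeof (struct {int a:(x);}), (x))

static int ice_demo(void)
{
    /* 1 + 2 + 3 is an integer constant expression, so the bit-field
       width is valid and the macro simply yields 6. */
    int value = INTEGRAL_CONST_EXPR(1 + 2 + 3);

    /* int n = 3;
       value = INTEGRAL_CONST_EXPR(n);   <- does not compile: the bit-field
                                            width is not a constant */
    assert(value == 6);
    return value;
}
```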

Macro vs Function in C

I often see instances in which using a macro is better than using a function.
Could someone explain to me, with an example, the disadvantages of a macro compared to a function?
Macros are error-prone because they rely on textual substitution and do not perform type-checking. For example, this macro:
#define square(a) a * a
works fine when used with an integer:
square(5) --> 5 * 5 --> 25
but does very strange things when used with expressions:
square(1 + 2) --> 1 + 2 * 1 + 2 --> 1 + 2 + 2 --> 5
square(x++) --> x++ * x++ --> increments x twice
Putting parentheses around arguments helps but doesn't completely eliminate these problems.
When macros contain multiple statements, you can get in trouble with control-flow constructs:
#define swap(x, y) t = x; x = y; y = t;
if (x < y) swap(x, y); -->
if (x < y) t = x; x = y; y = t; --> if (x < y) { t = x; } x = y; y = t;
The usual strategy for fixing this is to put the statements inside a "do { ... } while (0)" loop.
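A minimal sketch of that fix, using a hypothetical int-only SWAP_INT so the temporary can be declared inside the block (the original swap relied on an outer t):

```c
#include <assert.h>

/* Hypothetical int-only variant of swap; t_ is local to the block. */
#define SWAP_INT(x, y) do { int t_ = (x); (x) = (y); (y) = t_; } while (0)

/* Returns the two digits with the larger one first, e.g. (1, 2) -> 21. */
static int larger_first(int x, int y)
{
    if (x < y)
        SWAP_INT(x, y);   /* the whole swap is guarded by the if */

    assert(x >= y);
    return x * 10 + y;
}
```

With the unwrapped version, larger_first(3, 2) would still execute x = y; y = t; even though the condition is false.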
If you have two structures that happen to contain a field with the same name but different semantics, the same macro might work on both, with strange results:
struct shirt
{
int numButtons;
};
struct webpage
{
int numButtons;
};
#define num_button_holes(shirt) ((shirt).numButtons * 4)
struct webpage page;
page.numButtons = 2;
num_button_holes(page) -> 8
Finally, macros can be difficult to debug: they produce weird syntax errors or runtime errors that you have to expand to understand (e.g. with gcc -E), and debuggers cannot step through them. For example:
#define print(x, y) printf(x y) /* accidentally forgot comma */
print("foo %s", "bar") /* expands to printf("foo %s" "bar"): the literals are concatenated into "foo %sbar" and %s is left without an argument */
Inline functions and constants help to avoid many of these problems with macros, but aren't always applicable. Where macros are deliberately used to specify polymorphic behavior, unintentional polymorphism may be difficult to avoid. C++ has a number of features such as templates to help create complex polymorphic constructs in a typesafe way without the use of macros; see Stroustrup's The C++ Programming Language for details.
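Macro features:
Macros are expanded by the preprocessor
No type checking of arguments
Code size grows with every use, since the body is pasted at each call site
Arguments with side effects can be evaluated more than once
No call overhead, so execution can be faster
The macro name is replaced by its body before compilation
Useful where a small piece of code appears many times
Errors in a macro are only reported where it is expanded
Function features:
Functions are compiled
Argument types are checked
Code size stays the same no matter how often the function is called
Arguments are evaluated exactly once
A call can be slower because of call overhead (unless the compiler inlines it)
Control is transferred to the function on each call
Useful where a large piece of code appears many times
Errors are reported in the function definition itself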
Side-effects are a big one. Here's a typical case:
#define min(a, b) (a < b ? a : b)
min(x++, y)
gets expanded to:
(x++ < y ? x++ : y)
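x gets incremented twice: once in the comparison and once more when it is selected as the result. (The sequence point after the condition of ?: keeps this particular case well defined, but it is almost never what the caller intended.)
Writing multi-line macros is also a pain: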
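Because ?: has a sequence point after its first operand, the effect of this expansion can be observed directly. A small sketch, with the macro renamed to unsafe_min (a made-up name) to avoid clashing with any library min:

```c
#include <assert.h>

/* Same definition as min above, under a hypothetical name. */
#define unsafe_min(a, b) (a < b ? a : b)

/* Returns the final value of x after one "min" call. */
static int double_increment_demo(void)
{
    int x = 1, y = 5;

    /* Expands to (x++ < y ? x++ : y). */
    int result = unsafe_min(x++, y);

    assert(result == 2);   /* the second x++ yields the already bumped x */
    return x;
}
```

A function call min_func(x++, y) would have returned 1 and left x at 2; here the result is 2 and x ends up at 3.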
#define foo(a,b,c) \
a += 10; \
b += 10; \
c += 10;
They require a \ at the end of each line.
Macros can't "return" anything unless you make it a single expression:
int foo(int *a, int *b){
side_effect0();
side_effect1();
return a[0] + b[0];
}
Can't do that in a macro unless you use GCC's statement expressions. (EDIT: You can use a comma operator though... overlooked that... But it might still be less readable.)
Order of Operations: (courtesy of @ouah)
#define min(a,b) (a < b ? a : b)
min(x & 0xFF, 42)
gets expanded to:
(x & 0xFF < 42 ? x & 0xFF : 42)
But & has lower precedence than <. So 0xFF < 42 gets evaluated first; it is 0, so the condition x & 0 is always false and the macro always yields 42.
When in doubt, use functions (or inline functions).
The answers here mostly explain the problems with macros, rather than taking the simplistic view that macros are evil because silly accidents are possible. You can be aware of the pitfalls and learn to avoid them, then use macros only when there is a good reason to.
There are certain exceptional cases where there are advantages to using macros. These include:
Generic functions: as noted below, you can have a macro that can be used on different types of input arguments.
A variable number of arguments can map to different functions instead of using C's va_args, e.g. https://stackoverflow.com/a/24837037/432509.
They can optionally include local info, such as debug strings (__FILE__, __LINE__, __func__), check for pre/post conditions, assert on failure, or even static-assert so the code won't compile on improper use (mostly useful for debug builds).
Inspecting input args: you can do tests on input args such as checking their type or sizeof, or checking that struct members are present before casting (can be useful for polymorphic types), or check that an array meets some length condition. See: https://stackoverflow.com/a/29926435/432509
While it's noted that functions do type checking, C will coerce values too (ints/floats for example). In rare cases this may be problematic. It's possible to write macros which are more exacting than a function about their input args. See: https://stackoverflow.com/a/25988779/432509
Use as wrappers to functions: in some cases you may want to avoid repeating yourself, e.g. func(FOO, "FOO");. You could define a macro that expands the string for you: func_wrapper(FOO);
When you want to manipulate variables in the caller's local scope, passing a pointer to a pointer normally works just fine, but in some cases it's less trouble to use a macro (assignments to multiple variables for per-pixel operations is an example where you might prefer a macro over a function, though it still depends a lot on the context, since inline functions may be an option).
Admittedly, some of these rely on compiler extensions which aren't standard C, meaning you may end up with less portable code, or have to ifdef them in so they're only taken advantage of when the compiler supports them.
Avoiding multiple argument instantiation
Noting this since it's one of the most common causes of errors in macros (passing in x++, for example, where a macro may increment multiple times).
It's possible to write macros that avoid side effects from multiple instantiation of arguments.
C11 Generic
If you'd like a square macro that works with various types and have C11 support, you could do this...
static inline float _square_fl(float a) { return a * a; }
static inline double _square_dbl(double a) { return a * a; }
static inline int _square_i(int a) { return a * a; }
static inline unsigned int _square_ui(unsigned int a) { return a * a; }
static inline short _square_s(short a) { return a * a; }
static inline unsigned short _square_us(unsigned short a) { return a * a; }
/* ... long, char ... etc */
#define square(a) \
_Generic((a), \
float: _square_fl(a), \
double: _square_dbl(a), \
int: _square_i(a), \
unsigned int: _square_ui(a), \
short: _square_s(a), \
unsigned short: _square_us(a))
Statement expressions
This is a compiler extension supported by GCC, Clang, EKOPath & Intel C++ (but not MSVC);
#define square(a_) __extension__ ({ \
typeof(a_) a = (a_); \
(a * a); })
So the disadvantage with macros is you need to know to use these to begin with, and that they aren't supported as widely.
One benefit is, in this case, you can use the same square function for many different types.
Example 1:
#define SQUARE(x) ((x)*(x))
int main() {
int x = 2;
int y = SQUARE(x++); // Undefined behavior even though it doesn't look
// like it here
return 0;
}
whereas:
int square(int x) {
return x * x;
}
int main() {
int x = 2;
int y = square(x++); // fine
return 0;
}
Example 2:
struct foo {
int bar;
};
#define GET_BAR(f) ((f)->bar)
int main() {
struct foo f;
int a = GET_BAR(&f); // fine
int b = GET_BAR(&a); // error, but the message won't make much sense unless you
// know what the macro does
return 0;
}
Compared to:
struct foo {
int bar;
};
int get_bar(struct foo *f) {
return f->bar;
}
int main() {
struct foo f;
int a = get_bar(&f); // fine
int b = get_bar(&a); // error, but compiler complains about passing int* where
// struct foo* should be given
return 0;
}
There is no type checking of parameters, and the code is repeated at every use, which can lead to code bloat. The macro syntax can also lead to any number of weird edge cases where semicolons or order of precedence can get in the way.
One drawback to macros is that debuggers read source code, which does not have expanded macros, so running a debugger in a macro is not necessarily useful. Needless to say, you cannot set a breakpoint inside a macro like you can with functions.
Functions do type checking. This gives you an extra layer of safety.
Adding to this answer:
Macros are substituted directly into the program by the preprocessor (since they basically are preprocessor directives). So they typically use more memory space than a corresponding function. On the other hand, a function requires more time to be called and to return results, and this overhead can be avoided by using macros.
Macros also have some special tools that can help with program portability across platforms.
Macros don't need a data type assigned to their arguments, in contrast with functions.
Overall they are a useful tool in programming, and both macros and functions can be used depending on the circumstances.
I did not notice, in the answers above, one advantage of functions over macros that I think is very important:
Functions can be passed as arguments, macros cannot.
Concrete example: You want to write an alternate version of the standard 'strpbrk' function that will accept, rather than an explicit list of characters to search for within another string, a (pointer to a) function that will return 0 until a character is found that passes some test (user-defined). One reason you might want to do this is so that you can exploit other standard library functions: instead of providing an explicit string full of punctuation, you could pass ctype.h's 'ispunct' instead, etc. If 'ispunct' was implemented only as a macro, this wouldn't work.
There are lots of other examples. For example, if your comparison is accomplished by macro rather than function, you can't pass it to stdlib.h's 'qsort'.
An analogous situation in Python is 'print' in version 2 vs. version 3 (non-passable statement vs. passable function).
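For the qsort case, a short sketch of what passing a function looks like; compare_ints and sort_demo are made-up names. A comparison written as a macro could not be handed to qsort this way.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison function with the signature qsort expects. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* -1, 0 or 1 without overflow */
}

static int sort_demo(void)
{
    int values[] = {3, 1, 2};

    /* The function name converts to a pointer that qsort can call. */
    qsort(values, 3, sizeof values[0], compare_ints);

    assert(values[0] <= values[1] && values[1] <= values[2]);
    return values[0] * 100 + values[1] * 10 + values[2];
}
```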
If you pass a function call as an argument to a macro, it will be evaluated every time the argument appears in the expansion.
For example, if you call one of the most popular macros:
#define MIN(a,b) ((a)<(b) ? (a) : (b))
like this:
int min = MIN(functionThatTakeLongTime(1),functionThatTakeLongTime(2));
functionThatTakeLongTime will be called 3 times (once per argument for the comparison, then once more for the selected result), which can significantly hurt performance
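The number of calls is easy to check with a counter. In this sketch, counted() is a made-up stand-in for the slow function.

```c
#include <assert.h>

#define MIN(a,b) ((a)<(b) ? (a) : (b))

static int call_count = 0;

/* Hypothetical stand-in for the slow function: counts its calls. */
static int counted(int value)
{
    call_count++;
    return value;
}

/* Returns how many times counted() ran for a single MIN use. */
static int min_call_count_demo(void)
{
    call_count = 0;
    int smallest = MIN(counted(1), counted(2));

    assert(smallest == 1);
    return call_count;
}
```

Both arguments are evaluated for the comparison, and the selected one is evaluated again for the result, so min_call_count_demo() returns 3.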
