I'm currently working on a project, and a particular part needs a multi-line macro function (a regular function won't work here as far as I know).
The goal is to make a stack-manipulation macro that pulls data of an arbitrary type off the stack (the call stack used internally by function calls, not a high-level "stack" data type). If it were a function, it'd look like this:
type MY_MACRO_FUNC(void *ptr, type);
Where type is the type of data being pulled from the stack.
I currently have a working implementation of this for my platform (AVR):
#define MY_MACRO_FUNC(ptr, type) (*(type*)ptr); \
(ptr = /* Pointer arithmetic and other stuff here */)
This allows me to write something like:
int i = MY_MACRO_FUNC(ptr, int);
As you can see in the implementation, this works because the statement which assigns i is the first line in the macro: (*(type*)ptr).
However, what I'd really like is to have a statement before this one, to verify that ptr is a valid pointer before anything gets broken. But that would cause the macro to expand so that int i = receives the result of the pointer check instead. Is there any way to get around this issue in standard C? Thanks for any help!
As John Bollinger points out, macros expanding to multiple statements can have surprising results. A way to make several statements (and declarations!) a single statement is to wrap them into a block (surrounded by do … while(0), see for example here).
In this case, however, the macro should evaluate to something, so it must be an expression (and not a statement). Everything but declarations and iteration and jump statements (for, while, goto) can be transformed to an expression: Several expressions can be sequenced with the comma operator, if-else-clauses can be replaced by the conditional operator (?:).
Given that the original value of ptr can be recovered (for the sake of having an example, I'll assume "arithmetic and other stuff here" means adding 4; uintptr_t needs <stdint.h> and abort needs <stdlib.h>):
#define MY_MACRO_FUNC(ptr, type) \
( (ptr) && (uintptr_t)(ptr)%4 == 0 \
? (ptr) += 4 , *(type*)((ptr) - 4) \
: (abort() , (type){ 0 }) )
Note that I put parentheses around ptr and around the whole expression; see e.g. here for an explanation.
The second and third operands of ?: must have the same type, so I included (type){0} after the abort call. This expression is never evaluated; you just need some valid dummy object. Here, type cannot be a function type.
If you use C89 and can’t use compound literals, you can use (type)0, but that wouldn’t allow for structure or union types.
Just as a note, GCC has an extension: Statements and Declarations in Expressions.
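With that extension the check can simply come first inside the braces. A rough sketch, non-standard and GCC-specific (abort() needs <stdlib.h>; the pointer update is left as a placeholder, exactly as in the original):
#define MY_MACRO_FUNC(ptr, type) __extension__ ({ \
    type result_; \
    if (!(ptr)) abort(); /* validity check runs before any dereference */ \
    result_ = *(type*)(ptr); \
    (ptr) = /* Pointer arithmetic and other stuff here */; \
    result_; /* value of the whole ({ ... }) expression */ \
})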
This is very nasty:
#define MY_MACRO_FUNC(ptr, type) (*(type*)ptr); \
(ptr = /* Pointer arithmetic and other stuff here */)
It may have unexpected results in certain innocuous-looking circumstances, such as
if (foo) bar = MY_MACRO_FUNC(ptr, int);
Consider: what happens then if foo is 0?
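Spelled out, the expansion is two statements, and only the first one is controlled by the if; the pointer update runs unconditionally even when foo is 0:
if (foo) bar = (*(int*)ptr);
(ptr = /* Pointer arithmetic and other stuff here */);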
I think you would be better off implementing this in a form that assigns the popped value instead of 'returning' it:
#define MY_POP(stack, type, v) do { \
if (!stack) abort_abort_abort(); \
v = *((type *) stack); \
stack = (... compute new value ...); \
} while (0)
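A hypothetical call site then looks like this (stack_ptr and value are just illustrative names):
int value;
MY_POP(stack_ptr, int, value); /* the pointer is checked before anything is dereferenced or assigned */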
Is it possible to implement something similar to C++20's std::bit_cast in C? It would be a lot more convenient than using union or casting pointers to different types and dereferencing.
If you had a bit_cast, then implementing some floating point functions would be easier:
float Q_rsqrt( float number )
{
int i = 0x5f3759df - ( bit_cast(int, number) >> 1 );
float y = bit_cast(float, i);
y = y * ( 1.5f - ( number * 0.5f * y * y ) );
y = y * ( 1.5f - ( number * 0.5f * y * y ) );
return y;
}
See also Fast inverse square root
The naive solution is:
#define bit_cast(T, ...) (*(T*) &(__VA_ARGS__))
But it has major problems:
it is undefined behavior because it violates strict aliasing
it doesn't work for bit-casting rvalues because we are taking the address of the second operand directly
it doesn't make sure that the operands have the same size
Can we implement a bit_cast without these issues?
It is possible in non-standard C, thanks to typeof. typeof is also a proposed feature for C23, so it may become possible in standard C23. One of the solutions below makes some sacrifices that allow C99 compliance.
Implementation Using union
Let's look at how the approach using union works first:
#define bit_cast(T, ...) \
((union{typeof(T) a; typeof(__VA_ARGS__) b;}) {.b=(__VA_ARGS__)}.a)
We are creating a compound literal from an anonymous union made of T and whatever type the given expression has. We initialize this literal to .b= ... using designated initializers and then access the .a member of type T.
The typeof(T) is necessary if we want to pun function pointers, arrays, etc., due to C's type syntax.
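For example, assuming <stdint.h>, a compiler with typeof, and IEEE-754 floats, the bit pattern of a float can be inspected like this:
#include <stdint.h>
float f = 1.0f;
uint32_t bits = bit_cast(uint32_t, f); /* 0x3F800000 on IEEE-754 platforms */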
Implementation using memcpy
This implementation is slightly longer, but it has the advantage that it can be reduced to plain C99 and can even be made to work without typeof (shown further below):
#define bit_cast(T, ...) \
(*(typeof(T)*) memcpy(&(T){0}, &(typeof(__VA_ARGS__)) {(__VA_ARGS__)}, sizeof(T)))
We are copying from one compound literal to another and then accessing the destination's value:
the source literal is a copy of our input expression, which allows us to take its address, even for bit_cast(float, 123) where 123 is an rvalue
the destination is a zero-initialized literal of type T
memcpy returns the destination operand, so we can cast the result to typeof(T)* and then dereference that pointer.
We can completely eliminate typeof here and make this C99-compliant, but there are downsides:
#define bit_cast(T, ...) \
(*((T*) memcpy(&(T){0}, &(__VA_ARGS__), sizeof(T))))
We are now taking the address of the expression directly, so we can't use bit_cast on rvalues anymore. We are using T* without typeof, so we can no longer convert to function pointers, arrays, etc.
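To illustrate the first restriction with this typeof-free variant (assuming <stdint.h> and <string.h> are included):
float f = 1.0f;
uint32_t ok = bit_cast(uint32_t, f); /* fine: f is an lvalue, so &f is valid */
/* uint32_t bad = bit_cast(uint32_t, 1.0f); error: cannot take the address of an rvalue */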
Implementing Size Checking (since C11)
As for the last issue, which is that we don't verify that both operands have the same size: We can use _Static_assert (since C11) to make sure of that. Unfortunately, _Static_assert is a declaration, not an expression, so we have to wrap it up:
#define static_assert_expr(...) \
((void) (struct{_Static_assert(__VA_ARGS__); int _;}) {0})
We are creating a compound literal that contains the assertion and discarding the expression.
We can easily integrate this in the previous two implementations using the comma operator:
#define bit_cast_memcpy(T, ...) ( \
static_assert_expr(sizeof(T) == sizeof(__VA_ARGS__), "operands must have the same size"), \
(*(typeof(T)*) memcpy(&(T){0}, &(typeof(__VA_ARGS__)) {(__VA_ARGS__)}, sizeof(T))) \
)
#define bit_cast_union(T, ...) ( \
static_assert_expr(sizeof(T) == sizeof(__VA_ARGS__), "operands must have the same size"), \
((union{typeof(T) a; typeof(__VA_ARGS__) b;}) {.b=(__VA_ARGS__)}.a) \
)
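For example, assuming <stdint.h> and 32-bit float, a size mismatch is now rejected at compile time:
float f = 1.0f;
uint32_t ok = bit_cast_union(uint32_t, f); /* sizes match: compiles */
/* uint64_t bad = bit_cast_union(uint64_t, f); rejected: the static assertion fails */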
Known and Unfixable Issues
Because of how macros work, we cannot use this if the punned type contains a comma:
bit_cast(int[0,1], x)
This doesn't work because the preprocessor only treats parentheses, not square brackets, as protecting commas: the 1] would not be considered part of the type but would end up in __VA_ARGS__.
C99 makes it possible to define arrays basically anywhere, as compound literals.
For example, given a trivial function sumf() that accepts an array of float as input, we would expect the prototype to be:
float sumf(const float* arrayf, size_t size);
This can then be used like this:
float total = sumf( (const float[]){ f1, f2, f3 }, 3 );
It's convenient because there is no need to declare a variable beforehand.
The syntax is slightly ugly, but this could be hidden behind a macro.
However, note the final 3. This is the size of the array. It is required so that sumf() knows where to stop. But as code ages and gets refactored, it's also an easy source of errors, because this second argument must be kept in sync with the first one. For example, adding f4 requires updating this value to 4, otherwise the function returns a wrong result (and there is no warning about the issue).
So it would be better if both were kept in sync automatically.
If the array were declared as a named variable, it would be easy.
We could have a macro that simplifies the call to float total = sumf( ARRAY(array_f) ); with just #define ARRAY(a) (a) , sizeof(a) / sizeof(*(a)). But then array_f would have to be defined before calling the function, so it's no longer a compound literal.
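Spelled out, that named-array variant would look like this (array_f is just a hypothetical name):
#define ARRAY(a) (a) , sizeof(a) / sizeof(*(a))
const float array_f[] = { f1, f2, f3 };
float total = sumf( ARRAY(array_f) ); /* effectively sumf( array_f, 3 ) */
But again, this only helps for a named array; the compound-literal case is the one that needs solving.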
Since it's a compound literal, it has no name, so it can't be referenced. Hence I could not find any better way than to repeat the compound literal in both parameters.
#define LIST_F(...) (const float*)( __VA_ARGS__) , sizeof((const float*)( __VA_ARGS__)) / sizeof(float)
float total = sumf ( LIST_F( f1, f2, f3 ) );
and this would work. Adding an f4 into the list would automatically update the size argument to the correct value.
However, this all works fine as long as all members are variables. But what about cases where one of them is a function call? Would the function be invoked twice?
Say, for example, we write float total = sumf ( LIST_F( v1, f2() ) );. Will f2() be invoked twice? This is unclear to me, since f2() only appears within sizeof(), so the compiler could, in theory, know the size of the return type without actually invoking f2(). But I'm unsure what the standard says about that. Is there a guarantee? Is it implementation dependent?
will f2() be invoked twice?
No, the operand of sizeof is not evaluated (unless it has variable length array type, which it doesn't here).
what the standard says about that. Is there a guarantee?
From C11 6.5.3.4p2:
The sizeof operator yields the size (in bytes) of its operand, [...] If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Is it implementation dependent?
No, it is always fine.
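A quick way to convince yourself (f2 here is just a stand-in with a visible side effect):
#include <stdio.h>
float f2(void) { puts("called"); return 1.0f; }
int main(void) {
    printf("%zu\n", sizeof(f2())); /* prints the size of float; "called" never appears */
    return 0;
}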
Note that your other macro uses (const float*)(__VA_ARGS__), which will not work; the compound-literal syntax is (float[]){ stuff }. Anyway, I would just use one macro, why two, too much typing. Just:
#define SUMF_ARRAY(...) \
sumf( \
(const float[]){__VA_ARGS__}, \
sizeof((const float[]){__VA_ARGS__}) / sizeof(float))
float total = SUMF_ARRAY(f1(), f2(), f3());
I am curious to know the purpose of the parentheses around both filp and x in the following assignment inside a macro:
#define init_sync_kiocb(x, filp) \
do { \
struct task_struct *tsk = current; \
(x)->ki_flags = 0; \
(x)->ki_users = 1; \
(x)->ki_key = KIOCB_SYNC_KEY; \
(x)->ki_filp = (filp); /* <-- this line here */ \
....
....
Source:
https://github.com/gp-b2g/gp-peak-kernel/blob/master/include/linux/aio.h#L135
These are used in a macro definition which is handled by the preprocessor as text substitution. The fact that it is text substitution can result in weird expressions. Consider:
p = &a_struct_array[10];
init_sync_kiocb(p + 20, filp)
without the parens, it turns into:
p + 20->ki_filp = (filp);
with the parens:
(p + 20)->ki_filp = (filp);
I couldn't come up with one, but I bet similar examples can be found for filp too; or at least, you never know for sure.
The left-hand side is just a typical safety measure, since x is a macro parameter. It could expand to something that makes the -> operator fail unless the "thing that needs to be a struct pointer" is protected.
The right-hand side is less obvious to me but might be done just for reasons of consistency and symmetry; always protect macro arguments with parentheses. Some people treat that as a hard rule, and perhaps that project's style guide does, too.
It is inside a macro. This is a common and good habit. Imagine you invoke the macro init_sync_kiocb as e.g.
init_sync_kiocb(pp?*pp:&x,fil?fil:somfil+1);
with the parentheses this gets expanded as
(pp?*pp:&x)->ki_filp = (fil?fil:somfil+1);
without the parentheses the macro expansion would be wrong (a typing or parsing error):
pp?*pp:&x->ki_filp = fil?fil:somfil+1;
Don't forget to mention this is part of a function macro expansion. Such parameters should always be parenthesised to avoid bugs if the passed-in expressions are complex.
Can you tell me if anything and what can go wrong with this C "function macro"?
#define foo(P,I,X) do { (P)[I] = X; } while(0)
My goal is that foo behaves exactly like the following function foofunc for any POD data type T (i.e. int, float*, struct my_struct { int a,b,c; }):
static inline void foofunc(T* p, size_t i, T x) { p[i] = x; }
For example this is working correctly:
int i = 0;
float p;
foo(&p,i++,42.0f);
It can handle things like &p because P is put in parentheses, it increments i exactly once because I appears only once in the macro, and it requires a semicolon at the end of the line thanks to the do {} while(0).
Are there other situations, of which I am not aware, in which the macro foo would not behave like the function foofunc?
In C++ one could define foofunc as a template and would not need the macro. But I look for a solution which works in plain C (C99).
The fact that your macro works for arbitrary X arguments hinges on the details of operator precedence. I recommend using parentheses even if they happen not to be necessary here.
#define foo(P,I,X) do { (P)[I] = (X); } while(0)
This is an instruction, not an expression, so it cannot be used everywhere foofunc(P,I,X) could be. Even if foofunc returns void, it can be used in comma expressions; foo can't. But you can easily define foo as an expression, with a cast to void if you don't want to risk using the result.
#define foo(P,I,X) ((void)((P)[I] = (X)))
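As an expression it can then go where the do/while form cannot, for instance in the third clause of a for loop (a contrived sketch, assuming the expression form of foo above):
int a[4];
for (int i = 0; i < 4; foo(a, i, i * i), ++i)
    ; /* fills a[] with squares; the do/while form would not parse here */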
With a macro instead of a function, all you lose is the error checking. For example, you can write foo(3, ptr, 42) instead of foo(ptr, 3, 42). In an implementation where size_t is smaller than ptrdiff_t, using the function may truncate I, but the macro's behavior is more intuitive. The type of X may be different from the type that P points to: an automatic conversion will take place, so in effect it is the type of P that determines which typed version of foofunc the macro is equivalent to.
In the important respects, the macro is safe. With appropriate parentheses, if you pass syntactically reasonable arguments, you get a well-formed expansion. Since each parameter is used exactly once, all side effects will take place. The order of evaluation between the parameters is undefined either way.
The do { ... } while(0) construct protects your result from any harm, your inputs P and I are protected by () and [], respectively. What is not protected, is X. So the question is, whether protection is needed for X.
Looking at the operator precedence table (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_precedence), we see that only two operators are listed as having lower precedence than = so that the assignment could steal their argument: the throw operator (this is C++ only) and the , operator.
Now, apart from being C++ only, the throw operator is uncritical because it does not have a left hand argument that could be stolen.
The , operator, on the other hand, would be a problem if X could contain it as a top level operator. But if you parse the statement
foo(array, index, x += y, y)
you see that the , operator would be interpreted to delimit a fourth argument, and
foo(array, index, (x += y, y))
already comes with the parentheses it requires.
To make a long story short:
Yes, your definition is safe.
However, your definition relies on the impossibility of passing stuff, more_stuff as a single macro parameter without adding parentheses. I would prefer not to rely on such intricacies, and just write the obviously safe
#define foo(P, I, X) do { (P)[I] = (X); } while(0)
I'm currently cleaning up an existing C-library to publish it shamelessly.
A preprocessor macro NPOT is used to calculate the next greater power of two for a given integral constant expression at compile time. The macro is normally used in direct initialisations. For all other cases (e.g. variable parameters), there is an inline function with the same functionality.
But if the user passes a variable, the algorithm expands to a huge piece of machine code. My question is:
What may I do to prevent a user from passing anything but an integral constant expression to my macro?
#define NPOT(x) complex_algorithm(x)
const int c=10;
int main(void) {
int i=5;
foo = NPOT(5); // works, and does everything it should
foo = NPOT(c); // works also, but blows up the code extremely
foo = NPOT(i); // blows up the code also
}
What I already tried:
Define the macro as #define NPOT(x) complex_algorithm(x ## u). It still works for constants and throws a (hardly helpful) compiler error for variable arguments, unless a variable named iu happens to exist... Dirty, dangerous, don't want it.
Documentation; it didn't work for most users.
You can add any expression that requires an integral constant expression and that will then be optimized out.
#define NPOT(X) \
(1 \
? complex_algorithm(X) \
: sizeof(struct { int needs_constant[1 ? 1 : (X)]; }) \
)
You should possibly cast the result of the sizeof to the appropriate integer type, so that the return expression has the type you'd expect.
I am using an untagged struct here to
have a type so really no temporary is produced
have a unique type such that the expression can be repeated anywhere in the code without causing conflicts
trigger the use of a VLA, which is not allowed inside a struct as of C99:
"A member of a structure or union may have any object type other than a variably modified type."
I am using the ternary ?: with 1 as the selecting expression to ensure that the operand after : is always checked for its type, but never evaluated.
Edit: It seems that gcc accepts VLAs inside structs as an extension and doesn't even warn about it, even when I explicitly say -std=c99. This is really a bad idea on their part.
For such a weird compiler :) you could use sizeof((int[X]){ 0 }) instead. This is "as forbidden" as the above version, but additionally even gcc complains about it.
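Plugged into the macro, that variant would look roughly like this (a sketch; the cast just gives the never-evaluated branch a plain integer type):
#define NPOT(X) \
(1 \
? complex_algorithm(X) \
: (int) sizeof((int[(X)]){ 0 }) \
)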
#define INTEGRAL_CONST_EXPR(x) ((void) sizeof (struct {int a:(x);}), (x))
This will give a compile error if x is not an integral constant expression.
my_function(INTEGRAL_CONST_EXPR(1 + 2 + 3)); // OK
my_function(INTEGRAL_CONST_EXPR(1.0 + 2 + 3)); // compile error
Note that this solution does not work for initializing a static variable:
static int a = INTEGRAL_CONST_EXPR(2 + 3);
will trigger a compile error, because an expression containing the , operator is not a constant expression.
As Jens Gustedt pointed out in a comment, an integral constant expression that resolves to a negative number cannot be used with this solution, since a bit-field width cannot be negative.
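For example, reusing my_function from above:
my_function(INTEGRAL_CONST_EXPR(5 - 3)); // OK
my_function(INTEGRAL_CONST_EXPR(3 - 5)); // compile error: negative bit-field width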