Here's an example of a macro that wraps iterator functions in C.
Macro definition:
/* helper macros for iterating over tree types */
#define NODE_TREE_TYPES_BEGIN(ntype) \
{ \
GHashIterator *__node_tree_type_iter__ = ntreeTypeGetIterator(); \
for (; !BLI_ghashIterator_done(__node_tree_type_iter__); BLI_ghashIterator_step(__node_tree_type_iter__)) { \
bNodeTreeType *ntype = BLI_ghashIterator_getValue(__node_tree_type_iter__);
#define NODE_TREE_TYPES_END \
} \
BLI_ghashIterator_free(__node_tree_type_iter__); \
} (void)0
Example use:
NODE_TREE_TYPES_BEGIN(nt)
{
if (nt->ext.free) {
nt->ext.free(nt->ext.data);
}
}
NODE_TREE_TYPES_END;
However, nested use (while functional) causes shadowing warnings (gcc's -Wshadow):
NODE_TREE_TYPES_BEGIN(nt_a)
{
NODE_TREE_TYPES_BEGIN(nt_b)
{
/* do something */
}
NODE_TREE_TYPES_END;
}
NODE_TREE_TYPES_END;
The only way I can think of to avoid this is to pass a unique identifier to NODE_TREE_TYPES_BEGIN and NODE_TREE_TYPES_END. So my question is...
Is there a way to prevent shadowing of variables declared within an iterator macro when its scope is nested?
You don't need to insert the same unique identifier in two places, if you can restructure the block so that it never needs the second macro to close it - then you only have one macro invocation and can use simple solutions like __LINE__ or __COUNTER__.
You can restructure the block by taking further advantage of for, to insert operations intended to happen after the block, in a position textually before it:
#define NODE_TREE_TYPES(ntype) \
for (GHashIterator *__node_tree_type_iter__ = ntreeTypeGetIterator(); \
__node_tree_type_iter__; \
(BLI_ghashIterator_free(__node_tree_type_iter__), __node_tree_type_iter__ = NULL)) \
for (bNodeTreeType *ntype = NULL; \
(ntype = BLI_ghashIterator_getValue(__node_tree_type_iter__), !BLI_ghashIterator_done(__node_tree_type_iter__)); \
BLI_ghashIterator_step(__node_tree_type_iter__))
The outer level of your original macro pairs is a compound statement, containing exactly three things: a declaration+initialization, an enclosed for structure, and a single free operation after which the declared variable is not used again.
This makes it very easy to restructure as a for of its own instead of an explicit compound statement: the declaration+initialization goes in the first clause of the for (wouldn't be as easy if you'd had two variables, although it is still possible); the enclosed for can be placed after the end of the for header we're building, since it's a single statement; and the free operation is placed in the third clause. Since the variable is not used in any further statements, we can take advantage of it: combine the free with an explicit assignment of NULL, using the comma operator, and then make the middle clause a check that the variable is not NULL, ensuring the loop runs exactly once.
The nested for gets a similar but more minor modification. Its statement body contains a declaration and per-loop initialization, but we can still hoist this out; put the declaration in the unused first clause of the for (which will still put it in the new scope), and initialize it in the second clause so that it happens at the start of every iteration; combine that initialization with the actual test using the comma operator again. This removes all boilerplate from the statement block and therefore means you no longer have any braces, and thus no need for a second macro to close the braces.
Then you have a single macro invocation you can use like this:
NODE_TREE_TYPES (nt) {
if (nt->ext.free) {
nt->ext.free(nt->ext.data);
}
}
(you can then apply the generation of a unique identifier to this to get rid of shadowing easily, using techniques shown in other questions)
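A minimal sketch of that last step, assuming a hypothetical stand-in iterator (DemoIter and every demo_* / DEMO_* name are invented for illustration and are not the Blender API). The iterator variable's name is built from __LINE__, so two loops nested on different source lines get distinct identifiers and -Wshadow has nothing to report:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap-allocated iterator. */
typedef struct {
    const int *items;
    size_t count;
    size_t pos;
} DemoIter;

static DemoIter *demo_iter_new(const int *items, size_t count)
{
    DemoIter *it = malloc(sizeof *it);
    if (it == NULL) {
        abort();
    }
    it->items = items;
    it->count = count;
    it->pos = 0;
    return it;
}

static int demo_iter_done(const DemoIter *it) { return it->pos >= it->count; }
static int demo_iter_value(const DemoIter *it) { return it->items[it->pos]; }
static void demo_iter_step(DemoIter *it) { it->pos++; }

/* Two-level concatenation so __LINE__ is expanded before pasting. */
#define DEMO_CONCAT_(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_(a, b)

/* Outer for: allocate, run the body once, free. Inner for: the real loop.
 * The value is only fetched after the "done" check has passed. */
#define DEMO_FOREACH_IMPL(var, items, count, it) \
    for (DemoIter *it = demo_iter_new((items), (count)); it != NULL; \
         (free(it), it = NULL)) \
        for (int var = 0; \
             !demo_iter_done(it) && ((var = demo_iter_value(it)), 1); \
             demo_iter_step(it))

#define DEMO_FOREACH(var, items, count) \
    DEMO_FOREACH_IMPL(var, items, count, DEMO_CONCAT(demo_iter_, __LINE__))

/* Sums a[i] * b[j] over all pairs using two nested loops. */
static int demo_sum_of_products(const int *a, size_t na,
                                const int *b, size_t nb)
{
    int total = 0;
    DEMO_FOREACH (x, a, na) {
        DEMO_FOREACH (y, b, nb) {
            total += x * y;
        }
    }
    return total;
}
```

A break inside the body leaves only the inner for, so the outer for still frees the iterator; a return or goto out of the body would skip the free. Two invocations on the same source line would still collide, which is where __COUNTER__ (a common compiler extension, not standard C) can help.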
Is this ugly? Does abusing the for statement and comma operator make the average C programmer's skin crawl? Oh lord yes. BUT, it's a bit cleaner, and it's arguably the "right" way to mess about if you really have to mess about.
Having a "close" macro that inserts compound-statement-breaks or hides close braces is a much worse idea, because not only does it give you problems with identifiers and matching scope, but it also hides the block structure of the program from the reader; abuse of the for statement at least means that the block structure of the program, and variable scope and so on, is not mutilated as well.
Both the kernel coding style and GNOME's C style guide state that:
Do not unnecessarily use braces where a single statement will do.
if (condition)
action();
but at the same time braces should sometimes be used anyway, as in the else branch of:
if (condition) {
do_this();
do_that();
} else {
otherwise();
}
Are there any technical or usability reasons to prefer it this way? Are there any objective reasons not to put the braces there every time?
There are only stylistic and ease-of-editing-related reasons.
Whether you omit the braces or not, C compilers must act as if the braces were there (plus a pair around the whole selection statement, i.e. the if or if-else).
6.8.4p3:
A selection statement is a block whose scope is a strict subset of the
scope of its enclosing block. Each associated substatement is also a
block whose scope is a strict subset of the scope of the selection
statement.
The existence of these implicit blocks can be nicely demonstrated with enums:
#include <stdio.h>
int main()
{
enum{ e=0};
printf("%d\n", (int)e);
if(1) printf("%d\n", (sizeof(enum{e=1}),(int)e));
if(sizeof(enum{e=2})) printf("%d\n", (int)e);
printf("%d\n", (int)e);
//prints 0 1 2 0
}
A similar rule also exists for iteration statements: 6.8.5p5.
These implicit blocks also mean that a compound literal defined inside an iteration or selection statement is limited to such an implicit block. That is why example http://port70.net/~nsz/c/c11/n1570.html#6.5.2.5p15 from the standard puts a compound literal between a label and an explicit goto instead of simply using a while statement, which would limit the scope of the literal, regardless of whether or not explicit braces were used.
While it may be tempting, don't ever do:
if (!Ptr) Ptr = &(type){0}; //WRONG way to provide a default for Ptr
The above leads to UB (and actually fails to work with gcc -O3) because of the scoping rules: the compound literal's lifetime ends with the implicit block of the if, so Ptr is left dangling.
The correct way to do the above is either with:
type default_val = {0};
if (!Ptr) Ptr = &default_val; //OK
or with:
Ptr = Ptr ? Ptr : &(type){0}; //OK
These implicit blocks are new in C99 and the inner ones (for selection statements (=ifs)) are well rationalized (C99RationaleV5.10.pdf, section 6.8) as aids in refactoring, preventing braces that are added to previously unbraced branches from changing meaning.
The outermost block around the whole selection statement doesn't appear to be so well rationalized, unfortunately (more accurately, it's not rationalized at all). It appears to be copied from the rule for iteration statements, which in turn appears to copy the C++ rules where for-loop-local variables are destructed at the very end of the whole for loop (as if the for loop were braced).
(Unfortunately, I think that for selection statement the outermost implicit {} does more harm than good as it prevents you from having macros that stack-allocate in just the scope of the caller but also need a check, because then you can only check such macros with ?: but not with if, which is weird.)
Well, there's one special case in which braces do need to be used: Suppose you have the following code:
if (a)
    if (b)
        f();
else g();
As it is indented, one could assume the else g(); statement belongs to the first if(a) statement, but C syntax rules say that it is interpreted as (now with braces):
if (a) {
if (b) {
f();
}
else {
g();
}
}
in case you wanted the other possibility, then you must use braces. For example you can write it this way:
if (a) {
if (b)
f();
}
else
g();
which actually means:
if (a) {
if (b) {
f();
}
}
else {
g();
}
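To check the binding rule rather than take it on trust, here is a small sketch (the demo_* names are hypothetical) that records which branch runs. The two functions differ only in the braces around the inner if:

```c
/* Returns 'f', 'g', or '-' (neither) depending on which branch ran. */
static char demo_unbraced(int a, int b)
{
    char taken = '-';
    if (a)
        if (b)
            taken = 'f';
        else
            taken = 'g'; /* binds to the nearest if, i.e. if (b) */
    return taken;
}

static char demo_braced(int a, int b)
{
    char taken = '-';
    if (a) {
        if (b)
            taken = 'f';
    }
    else
        taken = 'g'; /* binds to if (a) because of the braces */
    return taken;
}
```

With a = 0 the unbraced version runs neither branch, while the braced version takes the else. Compilers such as gcc typically warn about the unbraced form when -Wall is enabled.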
Note
As all elementary programming books recommend: If you are in doubt about operator precedence, then use parentheses; if you extend that to statements coding, if you are in doubt, use braces! :)
I hate those "if in doubt" guidelines with a passion. They engender laziness that pushes the cost onto the code reader.
Such guidelines lead to code that is more cluttered, slower to read, and therefore harder to debug.
If in doubt go and read the precedence table.
If still hesitating, write some test code to verify the interpretation.
Repeat this every time you code until precedence becomes second nature.
When you are sure you have a firm grasp of precedence, then and only then write the production code.
If you really can't manage that, then always break up your statements so that they contain no more than two levels of grouping parentheses in any one statement. If that means you have to make up lots of temporary variable names, that's a good thing.
I am writing some code that I will want to use multiple times with slightly different function and variable names. I want to replace part of the function and variable names with a macro. gcc filename.c -E shows that the substitution is not being made. How do I rectify this?
Here is some code from the file, before substitution:
#define _CLASS Object
#define POOLLEVEL1 1024
#define POOLLEVEL2 1024
typedef struct {
int Self;
int Prev;
int Next;
int In_Use;
//----data----//
//----function pointers----//
} Object;
_CLASS* _CLASS_Pool[POOLLEVEL1] = { 0 };
//Note on POOLLEVEL1, POOLLEVEL2: _CLASS_Pool[] is an array of pointers to arrays of type _CLASS. The number of objects in these arrays is LEVEL2, the maximum number of arrays of type object is LEVEL1; The arrays of type object are allocated when needed.
int _CLASS_Available_Head = -1;
int _CLASS_Available_Tail = -1;
//Start and finish of list of available objects in pool.
// More follows
The preprocessor operates on tokens. And when it comes to identifiers _CLASS is one token, while _CLASS_Pool is another entirely, since they are different identifiers. The preprocessor is not going to stop in the middle of parsing an identifier to check if part of it is another identifier. No, it will gobble up all of _CLASS_Pool before recognizing what the identifier is.
If you ever heard the preprocessor does pure textual substitution, that was a gross over-simplification. It operates on tokens, something best to always keep in mind.
So what you need is a mechanism by which the preprocessor accepts _CLASS as a token, expands it, and then pastes it to another token. Fortunately for you, those mechanisms already exist. It can be written as follows:
#define CONCAT(a, b) CONCAT_(a, b)
#define CONCAT_(a, b) a ## b
To be used like this:
_CLASS* CONCAT(_CLASS, _Pool)[POOLLEVEL1] = { 0 };
int CONCAT(_CLASS, _Available_Head) = -1;
/* and so forth */
The first CONCAT accepts your arguments and forwards them to another function-like macro. Forwarding them allows for any intermediate expansion, like _CLASS -> Object. Tokens that aren't object-like macros remain unchanged. CONCAT_ then simply applies the built-in token pasting operator. You can examine the result and tweak it further.
As an aside, the C standard reserves all identifiers that begin with an underscore followed by an uppercase letter (_[A-Z][0-9a-zA-Z]*) to the implementation, for any use. Using them yourself leaves you open to undefined behavior. In general, try to avoid leading underscores in identifiers, unless you know all the rules for reserved identifiers by heart.
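Putting both points together, here is a sketch of the start of the file with the two-level CONCAT in place. CLASS_NAME is a hypothetical non-reserved replacement for _CLASS; the comments show what gcc -E should produce for each line:

```c
#define CLASS_NAME Object /* hypothetical, non-reserved stand-in for _CLASS */
#define POOLLEVEL1 1024

#define CONCAT(a, b) CONCAT_(a, b)
#define CONCAT_(a, b) a ## b

typedef struct {
    int Self;
    int Prev;
    int Next;
    int In_Use;
} Object;

/* Expands to: Object *Object_Pool[1024] = { 0 }; */
CLASS_NAME *CONCAT(CLASS_NAME, _Pool)[POOLLEVEL1] = { 0 };

/* Expands to: int Object_Available_Head = -1; */
int CONCAT(CLASS_NAME, _Available_Head) = -1;

/* Expands to: int Object_Available_Tail = -1; */
int CONCAT(CLASS_NAME, _Available_Tail) = -1;
```

Calling CONCAT_ directly would paste the unexpanded name and yield CLASS_NAME_Pool, which is why the extra forwarding level is needed.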
I'm a little confused about whether I can use #define to point to a function. I have a codec/DSP whose tool automatically generates pages of code like this:
SIGMA_WRITE_REGISTER(address, data, length);
SIGMA_WRITE_REGISTER(address, data, length);
SIGMA_WRITE_REGISTER(address, data, length);
....
Then in another .h file they do this:
#define SIGMA_WRITE_REGISTER( address, data, length ) {
/*TODO: implement macro or define as function*/}
Which helpfully doesn't define anything about writing registers. That's fine though; I wrote some code for my micro to write registers over I2C and that seems to be working. Now I don't want to just paste that code into the above define and have it instantiated 1000 times. I was hoping I could just use this define as an alias to my function?
Something like:
#define SIGMA_WRITE_REGISTER(address, data, length) { my_i2_c_func(address, data, length)}
I tried something like this and it compiled, I'm not so sure it's working though. Is this a valid thing to do or am I barking up the wrong tree?
Yes, you can surely use #define to point to a full-fledged function or alias.
Consider a simpler example below, just for understanding
#define STRLEN(x) my_strlen(x)
and
int my_strlen(char *p)
{
// check for NULL pointer argument?
int x;
for (x = 0; *p++; x++);
return x;
}
now, in your code, you can use STRLEN as you wish.
Note: Regarding the presence of { }, you can either get rid of them, wrap the body in a do { ... } while (0), or define the function body as part of the macro itself. The choice is yours. However, because a macro is expanded during the preprocessing stage (much like a textual replacement), you need to be extra careful about the {} and the ; usage. The macro usage should not break the surrounding code.
It's (almost) valid, but you need a semicolon after the last close parenthesis and before the close brace.
The braces in the replacement are completely superfluous, so you'll remove them anyway, and then you don't need the semicolon you just added. The braces give you a null statement after each statement block (and if you keep the semicolon in the macro, you also get a null statement after each macro invocation).
The comments in the .h file indicate that you can replace the macro with a function (call). What you're doing is basically fine.
#define SIGMA_WRITE_REGISTER my_i2_c_func
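A sketch of why the plain alias is the least surprising form. The body of my_i2_c_func below is a hypothetical stand-in that only counts calls, and its parameter types are assumptions, since the question does not show the real signature:

```c
#include <stddef.h>

/* Hypothetical stand-in for the real I2C write routine. */
static int demo_write_count = 0;

static void my_i2_c_func(int address, const unsigned char *data, size_t length)
{
    (void)address;
    (void)data;
    (void)length;
    demo_write_count++;
}

/* Plain alias: the call site supplies the argument list and the semicolon. */
#define SIGMA_WRITE_REGISTER my_i2_c_func

/* Issues two writes and returns how many were made. */
static int demo_program_codec(int enable)
{
    static const unsigned char payload[] = { 0x01, 0x02 };

    demo_write_count = 0;
    /* A macro of the form { call; } followed by the caller's ';' would
     * put a stray null statement before the else and fail to compile. */
    if (enable)
        SIGMA_WRITE_REGISTER(0x10, payload, sizeof payload);
    else
        SIGMA_WRITE_REGISTER(0x11, payload, sizeof payload);
    SIGMA_WRITE_REGISTER(0x12, payload, sizeof payload);
    return demo_write_count;
}
```

Because the macro expands to nothing but the function name, each generated line is an ordinary function call and the I2C code exists once, not once per invocation.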
I'm currently working on a project, and a particular part needs a multi-line macro function (a regular function won't work here as far as I know).
The goal is to make a stack manipulation macro, that pulls data of an arbitrary type off the stack (being the internal stack from a function call, not a high-level "stack" data type). If it were a function, it'd look like this:
type MY_MACRO_FUNC(void *ptr, type);
Where type is the type of data being pulled from the stack.
I currently have a working implementation of this for my platform (AVR):
#define MY_MACRO_FUNC(ptr, type) (*(type*)ptr); \
(ptr = /* Pointer arithmetic and other stuff here */)
This allows me to write something like:
int i = MY_MACRO_FUNC(ptr, int);
As you can see in the implementation, this works because the statement which assigns i is the first line in the macro: (*(type*)ptr).
However, what I'd really like is to be able to have a statement before this, to verify that ptr is a valid pointer before anything gets broken. But, this would cause the macro to be expanded with the int i = pointing to that pointer check. Is there any way to get around this issue in standard C? Thanks for any help!
As John Bollinger points out, macros expanding to multiple statements can have surprising results. A way to make several statements (and declarations!) a single statement is to wrap them into a block (surrounded by do … while(0), see for example here).
In this case, however, the macro should evaluate to something, so it must be an expression (and not a statement). Everything but declarations and iteration and jump statements (for, while, goto) can be transformed to an expression: Several expressions can be sequenced with the comma operator, if-else-clauses can be replaced by the conditional operator (?:).
Given that the original value of ptr can be recovered (I’ll assume "arithmetic and other stuff here" as adding 4 for the sake of having an example)
#define MY_MACRO_FUNC(ptr, type) \
( (ptr) && (uintptr_t)(ptr)%4 == 0 \
? (ptr) += 4 , *(type*)((ptr) - 4) \
: (abort() , (type){ 0 }) )
Note that I put parentheses around ptr and around the whole expression, see e.g. here for an explanation.
The second and third operand of ?: must be of the same type, so I included (type){0} after the abort call. This expression is never evaluated. You just need some valid dummy object; here, type cannot be a function type.
If you use C89 and can’t use compound literals, you can use (type)0, but that wouldn’t allow for structure or union types.
Just as a note, Gcc has an extension Statements and Declarations in Expressions.
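For reference, a compilable variant of the conditional-operator macro above. This is a sketch under assumptions: the cursor is an unsigned char * (so the pointer arithmetic is standard C), the step is sizeof(type) rather than the fixed 4 used above, the alignment check uses C11's _Alignof, and the DEMO_POP and demo_pop_two names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Validates the cursor, advances it past one object of the given type,
 * and evaluates to that object. Aborts on a null or misaligned cursor. */
#define DEMO_POP(ptr, type) \
    ((ptr) != NULL && (uintptr_t)(ptr) % _Alignof(type) == 0 \
         ? ((ptr) += sizeof(type), *(type *)((ptr) - sizeof(type))) \
         : (abort(), (type){ 0 }))

/* Reads two ints through a byte cursor and returns their sum.
 * The memory is expected to actually hold int objects. */
static int demo_pop_two(unsigned char *cursor)
{
    int first = DEMO_POP(cursor, int);
    int second = DEMO_POP(cursor, int);
    return first + second;
}
```

The macro evaluates ptr several times, so the argument should be a plain variable without side effects.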
This is very nasty:
#define MY_MACRO_FUNC(ptr, type) (*(type*)ptr); \
(ptr = /* Pointer arithmetic and other stuff here */)
It may have unexpected results in certain innocuous-looking circumstances, such as
if (foo) bar = MY_MACRO_FUNC(ptr, int);
Consider: what happens then if foo is 0?
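To make that question concrete, here is a sketch with "advance by one element" filled in as an assumed example of the pointer arithmetic (DEMO_BAD_POP and demo_cursor_advance are hypothetical names). Only the first statement of the expansion is governed by the if:

```c
/* Same shape as the macro above: two statements, not one expression. */
#define DEMO_BAD_POP(ptr, type) (*(type *)ptr); \
    (ptr = ptr + 1)

/* Returns how many elements the cursor moved. */
static int demo_cursor_advance(int foo)
{
    int values[2] = { 7, 8 };
    int *cursor = values;
    int bar = -1;

    /* Expands to:
     *   if (foo) bar = (*(int *)cursor); (cursor = cursor + 1);
     * so the pointer update runs whether or not foo is set. */
    if (foo) bar = DEMO_BAD_POP(cursor, int);

    (void)bar;
    return (int)(cursor - values);
}
```

The cursor advances by one element even when foo is 0, although nothing was read.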
I think you would be better off implementing this in a form that assigns the popped value instead of 'returning' it:
#define MY_POP(stack, type, v) do { \
if (!stack) abort_abort_abort(); \
v = *((type *) stack); \
stack = (... compute new value ...); \
} while (0)
I am curious to know the use of parentheses for both filp and x pointers in the following assignment operation:
#define init_sync_kiocb(x, filp) \
do { \
struct task_struct *tsk = current; \
(x)->ki_flags = 0; \
(x)->ki_users = 1; \
(x)->ki_key = KIOCB_SYNC_KEY; \
(x)->ki_filp = (filp); \ // This line here
....
....
Source:
https://github.com/gp-b2g/gp-peak-kernel/blob/master/include/linux/aio.h#L135
These are used in a macro definition which is handled by the preprocessor as text substitution. The fact that it is text substitution can result in weird expressions. Consider:
p = &a_struct_array[10];
init_sync_kiocb(p + 20, filp)
without the parens, it turns into:
p + 20->ki_filp = (filp);
with the parens:
(p + 20)->ki_filp = (filp);
I couldn't come up with one, but I bet similar examples can be found for filp too, or at least you never know for sure.
The left-hand side is just a typical safety measure since x is a macro parameter. It could expand to something that makes the -> operator fail unless the "thing that needs to be a struct pointer" is protected.
The right-hand side is less obvious to me but might be done just for reasons of consistency and symmetry; always protect macro arguments with parentheses. Some people treat that as a hard rule, and perhaps that project's style guide does, too.
It is inside a macro. This is common and good habit. Imagine you invoke the macro init_sync_kiocb as e.g.
init_sync_kiocb(pp?*pp:&x,fil?fil:somfil+1);
with the parentheses this gets expanded as
(pp?*pp:&x)->ki_filp = (fil?fil:somfil+1);
without the parentheses the macro expansion would be wrong (a type error or a parsing error):
pp?*pp:&x->ki_filp = fil?fil:somfil+1;
Don't forget to mention this is part of a function macro expansion. Such parameters should always be parenthesised to avoid bugs if the passed-in expressions are complex.
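The kernel examples above fail to compile without the parentheses. A smaller sketch (with hypothetical DEMO_* names) shows the quieter failure mode, where the unprotected version compiles but computes something else:

```c
#define DEMO_DEREF_BAD(p)  *p     /* argument not protected */
#define DEMO_DEREF_GOOD(p) (*(p)) /* argument and result protected */

static int demo_bad(void)
{
    int v[2] = { 10, 20 };
    return DEMO_DEREF_BAD(v + 1); /* expands to *v + 1, i.e. 10 + 1 */
}

static int demo_good(void)
{
    int v[2] = { 10, 20 };
    return DEMO_DEREF_GOOD(v + 1); /* expands to (*(v + 1)), i.e. v[1] */
}
```

Here demo_bad() returns 11 and demo_good() returns 20, with no diagnostic to hint at the difference.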