Create custom macro similar to __COUNTER__ in GCC plugin - c

I am refactoring a program that requires each type to have a globally unique number which increases by one for every type (i.e. the maximum number should equal the number of types declared with this ID) across the entire project (__COUNTER__ only works within the current file).
This currently looks like
struct foo {
    static const int index = __GLOBAL_COUNTER__(TYPES, _3862e1e60a2749c2bfd2add9f3ddbb25);
};
A Python script is run on this which runs the normal C++ preprocessor, then uses a regex to find uses of __GLOBAL_COUNTER__ and replaces them with a number. The first macro argument is the name of the counter to use and the second is a UUID (so that the value is constant between includes).
The issues with this approach are that the macro doesn't work properly when mixed with other macros, and that the Python script and regex can replace things I don't intend (e.g. in strings). Having to manually generate a UUID for each type is also cumbersome.
So my question is whether it is possible to write a macro as a GCC plugin to provide this functionality and where I should start. I have searched the documentation and read some of the GCC source code, but I haven't found anything.
NOTE: this is generally merged into another macro to save a bit of typing
#define TYPE_INDEX(x) static const int index = __GLOBAL_COUNTER__(TYPES, x);
So other approaches would also work if they are easier, for example changing the syntax to something like this, but I am not sure how I would go about it.
indexed_struct foo {
};

Related

Is it possible to get a function name without invoking it?

How can I get a function's name without calling/invoking it, or is that even possible?
I have an array of sorting functions; my goal is to be able to list the name of each one, dynamically, without having to invoke any.
After searching the web, I couldn't find any solution that doesn't require the function to be invoked and use __FUNCTION__ or __func__.
The array of functions that I use:
// Pointer to functions
char *(*srtFunc[])(int *, int) = {selection, bubble, recursiveBubble, insertion, recursiveInsertion};
More information about what I want to achieve with this:
I want to loop over each function in the given array, create a file with the name of the function, invoke the function 100 times with different arguments each time, and print the time spent by the function each time in its dedicated file, redo for the remaining functions.
Unfortunately, not easily. C is not built for introspection and doesn't have features like this-- the name of function foo and the call to function foo are compiled down to just some jump and call instructions in the output; the actual name "foo" is essentially a convenience for you when programming and disappears in the compiled output.
__FUNCTION__ looks like a preprocessor macro, but in current compilers it is not one: like the standard __func__, it is an identifier that the compiler implicitly defines inside each function body to hold the name of the enclosing function. As you note, that is why it only works within a function; there is no way to point it at some other function from the outside.
There are various ways to get the effective result you want here, including most simply just manually building a table of string literals that have the same names as your functions. You can do this in fairly clean ways (see @nielsen's answer for a useful snippet) using macros. But the preprocessor/compiler can't help you derive or enforce a table from the actual functions so you will always have some risk of an issue at runtime when you make changes to it. Unfortunately C just doesn't have the capability for the kind of elegance you're looking for in this design.
You may be able to do something with smart preprocessor tricks, but your code would be difficult to read. I think I would go for the really low-tech solution here and just add an array of the function names matching the array of function pointers:
#define ARRAY_SIZE(A) (sizeof(A)/sizeof(A[0]))
// Pointer to functions
char *(*srtFunc[])(int *, int) = {selection, bubble, recursiveBubble, insertion, recursiveInsertion};
const char *srtFuncNames[] = {"selection", "bubble", "recursiveBubble", "insertion", "recursiveInsertion"};
_Static_assert(ARRAY_SIZE(srtFuncNames)==ARRAY_SIZE(srtFunc), "Function table and names out of synch!");
Having the two definitions right after each other makes it easy to keep them synchronized, and the code is easy to read. The _Static_assert (available from C11) will help you remember to add new names as new functions are added.
Alternatively, a structure can be defined holding a function pointer and corresponding name. This can be initialized using a macro as follows:
typedef struct
{
    char *(*srtFunc)(int *, int);
    const char *srtName;
} sortMethod;
#define SORT_METHOD(S) {(S), #S}
sortMethod methods[] = {
    SORT_METHOD(selection),
    SORT_METHOD(bubble),
    SORT_METHOD(recursiveBubble),
    SORT_METHOD(insertion),
    SORT_METHOD(recursiveInsertion)
};
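With such a table in place, the loop described in the question becomes straightforward. The following is only a minimal sketch: the two sort functions are empty stand-ins, and timing_file_name, methodCount and the ".txt" suffix are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Empty stand-ins for the real sorting routines (assumption). */
char *selection(int *a, int n) { (void)a; (void)n; return NULL; }
char *bubble(int *a, int n)    { (void)a; (void)n; return NULL; }

typedef struct {
    char *(*srtFunc)(int *, int);
    const char *srtName;
} sortMethod;

#define SORT_METHOD(S) {(S), #S}
#define ARRAY_SIZE(A) (sizeof(A) / sizeof((A)[0]))

const sortMethod methods[] = {
    SORT_METHOD(selection),
    SORT_METHOD(bubble),
};
const size_t methodCount = ARRAY_SIZE(methods);

/* Hypothetical helper: writes "<name>.txt" for methods[i] into buf.
   Returns 0 on success, -1 if i is out of range or buf is too small. */
int timing_file_name(size_t i, char *buf, size_t size) {
    int n;
    if (i >= methodCount)
        return -1;
    n = snprintf(buf, size, "%s.txt", methods[i].srtName);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would loop i from 0 to methodCount - 1, open the file named by timing_file_name, then call methods[i].srtFunc in a timing loop and write each measurement to that file.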

#define in C, legal character

There is a C structure
struct a
{
    int val1, val2;
};
I have made changes to the code like
struct b
{
    int val2;
};
struct a
{
    int val1;
    struct b b_obj;
};
Now, usage of val2 in the other C files is like a_obj->val2;.
I want to redirect every such use, and there are a lot of them, so I defined a macro in the header file where struct a is defined, as follows:
#define a_obj->val2 (a_obj->b_obj.val2)
It's not working. Is -> illegal in the identifier part of a macro definition #define?
Could someone please tell me where I am going wrong?
Edit as suggested by @Basile -
It's a legacy source code, a very huge project. Not sure of LOC.
I want to make such changes because I want to make it more modular.
For example I want to group similar fields of the structure under a same name and that's the reason I want to create another struct B with fields which are related to B feature and also common to A.
I can't use Find Replace feature of other text editors, I am using VIM.
This kind of macro magic will get you into trouble soon,
because it is making your source code unreadable and brittle (credits Basile for the phrasing).
But this should work for what you describe.
struct b
{
    int val2m;
};
struct a
{
    int val1;
    struct b b_obj;
};
#define val2 b_obj.val2m
The trick is to give the actual identifier inside the struct declaration a new name (val2m), so that the name all the other code uses can be turned into a magic alias,
which then can contain the modified access to take a detour via the additionally introduced inner struct.
This is only a kind of band-aid for the problematic situation of having to change something backstage in existing code with many references. Only use it if there is no chance of refactoring the code cleanly. ("band-aid", appropriate image by StoryTeller, credits).
I explicitly recommend looking at Basile's answer for a cleaner, more "future-proof" way. It is the way to go to avoid the trouble I predict with using this macro magic. Use it unless you are forced by very good reasons to do otherwise.
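To see that existing call sites keep compiling unchanged, here is a minimal sketch; the function set_val2 is a hypothetical stand-in for the legacy code. Note that the macro rewrites every val2 token that follows it, including unrelated variables or fields with that name, which is the brittleness warned about above.

```c
struct b {
    int val2m;               /* renamed so the old name is free for the alias */
};

struct a {
    int val1;
    struct b b_obj;
};

/* Must come after the struct definitions; from here on, every token
   `val2` is replaced by `b_obj.val2m`. */
#define val2 b_obj.val2m

/* Hypothetical legacy code, written before the refactoring. */
void set_val2(struct a *a_obj, int v) {
    a_obj->val2 = v;         /* expands to a_obj->b_obj.val2m = v */
}
```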
As others explained, the preprocessor works only on tokens, and you can only #define a name. Read the documentation of cpp and the C11 standard n1570.
What you want to do is very ugly (and there are few occasions where it is worthwhile). It makes your code messy, unreadable, and brittle.
Learn to use your source code editor better (you probably have some interactive replace, or interactive replace with regexps; if you don't, switch to a better editor like GNU emacs or vim, and study the documentation of your editor). You could also use scripting tools like ed, sed, grep, awk etc. to help you in doing those replacements.
In a small project, replacing relevant occurrences of ->val2 (or .val2) with ->b_obj.val2 (or .b_obj.val2) is really easy, even if you have a hundred of them. And that keeps your code readable. Don't forget to use some version control system (to keep both old and new versions of your code).
In a large project of at least a million lines of source code, you might ask how to find every occurrence of field usage of val2 for a given type (but you should probably name val2 well enough to have most occurrences of it be relevant; in other words, take care of the naming of your fields). That is a very different question (e.g. you could write some GCC plugin to find such occurrences and help you in replacing the relevant ones).
If you are refactoring an old and large legacy code, you need to be sure to keep it readable, and you don't want fancy macro tricks. For example, you might add some static inline function to access that field. And it could be then worthwhile to use some better tools (e.g. a compiler plugin, some kind of C parser, etc...) to help you in that refactoring.
Keep the source code readable by human developers. Otherwise, you are shooting yourself in the foot. What you want to do is unreasonable, it decreases the readability of the code base.
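The static inline accessor idea mentioned above could look like the following minimal sketch. The names a_get_val2 and a_set_val2 are hypothetical; the point is that call sites stop depending on where the field lives, so a later layout change only touches these two functions.

```c
struct b {
    int val2;
};

struct a {
    int val1;
    struct b b_obj;
};

/* Hypothetical accessors: the only code that knows val2 now lives
   inside the nested struct b. */
static inline int a_get_val2(const struct a *p) {
    return p->b_obj.val2;
}

static inline void a_set_val2(struct a *p, int v) {
    p->b_obj.val2 = v;
}
```

Each former a_obj->val2 read becomes a_get_val2(a_obj) and each assignment becomes a_set_val2(a_obj, value), which is a mechanical replacement that an editor script can perform.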
I can't use Find Replace feature of other text editors, I am using VIM.
vim is scriptable (e.g. in lua) and accepts plugins (so if interactive replace is not enough, consider writing some vim plugin or script to help you), and has powerful find-replace-regexp facilities. You might also use some combination of scripts to help you. In many cases they are enough. If they are not, you should explain why.
Also, you could temporarily replace the val2 field of struct a with a unique name like val2_3TYRxW1PuK7 (or whatever is appropriate, making some unique "random-looking" name is easy). Then you run your full build (e.g. after some make clean). The compiler would emit error messages for every place where you need to replace val2 used as a field of struct a (but won't mind for any other occurrence of the val2 name used for some other purpose). That could help you a lot -once you have corrected your code to get rid of all errors- (especially when combined with some editor scripting) because then you just need to replace val2_3TYRxW1PuK7 with b_obj.val2 everywhere.
Is -> illegal in #define?
Yes.
A #define identifier can only contain letters, digits and underscores.
Macro names must be regular identifiers, so you can't use any special character like - or >.
I thought that maybe you could use a union, like this:
struct b
{
    int val2;
};
struct a
{
    int val1;
    union {
        struct b b_obj;
        int val2;
    };
};
so you can still use a_obj->val2 (anonymous unions like this one require C11 or a compiler extension).

'Reverse' a collection of C preprocessor macros easily

I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
In the real application, each definition corresponds to an instruction in an interpreter virtual machine. The macros are also not sequential in numbering to leave space for future instructions; there may be a #define FOO 41, then the next one is #define BAR 64.
I'm now working on a debugger for this virtual machine, and need to effectively 'reverse' these preprocessor macros. In other words, I need a function which takes the number and returns the macro name, e.g. an input of 2 returns "BAR".
Of course, I could create a function using a switch myself:
const char* instruction_by_id(int id) {
    switch (id) {
    case FOO:
        return "FOO";
    case BAR:
        return "BAR";
    case BAZ:
        return "BAZ";
    default:
        return "???";
    }
}
However, this will be a nightmare to maintain, since renaming, removing or adding instructions will require this function to be modified too.
Is there another macro which I can use to create a function like this for me, or is there some other approach? If not, is it possible to create a macro to perform this task?
I'm using gcc 6.3 on Windows 10.
You have the wrong approach. Read SICP if you have not read it.
I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
Remember that C or C++ code can be generated, and it is quite easy to instruct your build automation tool to generate some particular C file (with GNU make or ninja you just add some rule or recipe).
For example, you could use some different preprocessor (like GPP or m4), or some script (e.g. in awk or Python or Guile), or write your own program (in C, C++, Ocaml, etc.), to generate the header file containing these #define-s. And another script or program (or the same one, invoked differently) could generate the C code of instruction_by_id.
Such basic metaprogramming techniques (of generating some or several C files from something higher level but specific) have been used since at least the 1980s (e.g. with yacc or RPCGEN). The C preprocessor facilitates that with its #include directive (since you can even include lines inside some function body, etc...). Actually, the idea that code is data (and proof) and data is code is even older (Church-Turing thesis, Curry-Howard correspondence, Halting problem). The Gödel, Escher, Bach book is very entertaining....
For example, you could decide to have a textual file opcodes.txt (or even some sqlite database containing stuff....) like
# ignore lines starting with a hash sign
FOO 1
BAR 2
and have two small awk or Python scripts (or two tiny C specialized programs), one generating the #define-s (into opcode-defines.h) and another generating the body of instruction_by_id (into opcode-instr.inc). Then you need to adapt your Makefile to generate these, and put #include "opcode-defines.h" inside some global header, and have
const char* instruction_by_id(int id) {
    switch (id) {
#include "opcode-instr.inc"
    default: return "???";
    }
}
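As a rough idea of how small such a "tiny C specialized program" can be, here is a sketch of its core. The function name generate_opcodes and its as_defines flag are assumptions for illustration; it expects the opcodes.txt layout shown above (one NAME VALUE pair per line, lines starting with # ignored).

```c
#include <stdio.h>

/* Hypothetical generator core. If as_defines is nonzero it writes the
   #define lines (for opcode-defines.h), otherwise the case lines
   (for opcode-instr.inc). Returns the number of opcodes written. */
int generate_opcodes(FILE *in, FILE *out, int as_defines) {
    char line[256];
    char name[128];
    long value;
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '#')
            continue;                       /* comment line */
        if (sscanf(line, "%127s %ld", name, &value) != 2)
            continue;                       /* blank or malformed line */
        if (as_defines)
            fprintf(out, "#define %s %ld\n", name, value);
        else
            fprintf(out, "case %s: return \"%s\";\n", name, name);
        count++;
    }
    return count;
}
```

A main function would open opcodes.txt and the output file, call generate_opcodes once per output, and be invoked from a Makefile rule so the generated files are rebuilt whenever opcodes.txt changes.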
this will be a nightmare to maintain,
Not so with such a metaprogramming approach. You'll just maintain opcodes.txt and the scripts using it, but you express a given "knowledge element" (the relation of FOO to 1) only once (in a single line of opcodes.txt). Of course you need to document that (at the very least, with comments in your Makefile).
Metaprogramming from some higher-level, declarative formalization is a very powerful paradigm. In France, J. Pitrat pioneered it (and he is writing an interesting blog today, while being retired) since the 1960s. In the US, J. McCarthy and the Lisp community did so as well.
For an entertaining talk, see Liam Proven FOSDEM 2018 talk on The circuit less traveled
Large software projects use that metaprogramming approach quite often. For example, the GCC compiler has about a dozen C++ code generators (in total, they emit more than a million lines of C++).
Another way of looking at such an approach is the idea of domain-specific languages that could be compiled to C. If you use an operating system providing dynamic loading, you can even write a program emitting C code, forking a process to compile it into some plugin, then loading that plugin (on POSIX or Linux, with dlopen). Interestingly, computers are now fast enough to enable such an approach in an interactive application (in some sort of REPL): you can emit a C file of a few thousand lines, compile it into some .so shared object file, and dlopen that, in a fraction of second. You could also use JIT-compiling libraries like GCCJIT or LLVM to generate code at runtime. You could embed an interpreter (like Lua or Guile) into your program.
BTW, metaprogramming approaches are one of the reasons why basic compilation techniques should be known by most developers (and not only by people in the compiler business); another reason is that parsing problems are very common. So read the Dragon Book.
Be aware of Greenspun's tenth rule. It is much more than a joke, actually a profound truth about large software.
In a similar case I've resorted to defining a text file format that defines the instructions, and writing a program to read this file and write out the C source of the actual instruction definitions and the C source of functions like your instruction_by_id(). This way you only need to maintain the text file.
As awesome as general code generation is, I’m surprised that nobody mentioned that (if you relax your problem definition just a bit) the C preprocessor is perfectly capable of generating the necessary code, using a technique called X macros. In fact every simple bytecode VM in C that I’ve seen uses this approach.
The technique works as follows. First, there is a file (call it insns.h) containing the authoritative list of instructions,
INSN(FOO, 1)
INSN(BAR, 2)
INSN(BAZ, 3)
or alternatively a macro in some other header containing the same,
#define INSNS \
    INSN(FOO, 1) \
    INSN(BAR, 2) \
    INSN(BAZ, 3)
whichever is more convenient for you. (I’ll use the first option in the following.) Note that INSN is not defined anywhere. (Traditionally it would be called X, thus the name of the technique.) Wherever you want to loop over your instructions, define INSN to generate the code you want, include insns.h, then undefine INSN again.
In your disassembler, write
const char *instruction_by_id(int id) {
    switch (id) {
#define INSN(NAME, VALUE) \
    case NAME: return #NAME;
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
    default: return "???";
    }
}
using the prefix stringification operator # to turn names-as-identifiers into names-as-string-literals.
You obviously can’t define the constants this way, because macros cannot define other macros in the C preprocessor. However, if you don’t insist that the instruction constants be preprocessor constants, there’s a different perfectly serviceable constant facility in the C language: enumerations. Whether or not you use an enumerated type, the enumerators defined inside it are regular integer constants from the point of view of the compiler (though not the preprocessor—you cannot use #ifdef with them, for example). So, using an anonymous enumeration type, define your constants like this:
enum {
#define INSN(NAME, VALUE) \
    NAME = VALUE,
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
    NINSNS /* C89 doesn’t allow trailing commas in enumerations (but C99+ does), and you may find this constant useful in any case */
};
If you want to statically initialize an array indexed by your bytecodes, you’ll have to use C99 designated initializers {[FOO] = foovalue, [BAR] = barvalue, /* ... */} whether or not you use X macros. However, if you don’t insist on assigning custom codes to your instructions, you can eliminate VALUE from the above and have the enumeration assign consecutive codes automatically, and then the array can be simply initialized in order, {foovalue, barvalue, /* ... */}. As a bonus, NINSNS above then becomes equal to the number of the instructions and the size of any such array, which is why I called it that.
There are more tricks you can use here. For example, if some instructions have variants for several data types, the instruction list X macro can call the type list X macro to generate the variants automatically. (The somewhat ugly second option of storing the X macro list in a large macro and not an include file may be more handy here.) The INSN macro may take additional arguments such as the mode name, which would be ignored in the code list but used to call the appropriate decoding routine in the disassembler. You can use the token pasting operator ## to add prefixes to the names of the constants, as in INSN_ ## NAME to generate INSN_FOO, INSN_BAR, etc. And so on.
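Putting several of these pieces together, here is a self-contained sketch that uses the single-macro form of the list, the INSN_ prefix via ##, and a C99 designated-initializer table in place of the switch. The opcode values (including the gap before 64) and the name INSN_LIMIT are made up for illustration.

```c
#include <stddef.h>

/* The authoritative instruction list (values are illustrative). */
#define INSNS \
    INSN(FOO, 1) \
    INSN(BAR, 2) \
    INSN(BAZ, 64)

/* First expansion: prefixed enumeration constants. */
enum {
#define INSN(NAME, VALUE) INSN_##NAME = VALUE,
    INSNS
#undef INSN
    INSN_LIMIT              /* one past the last listed opcode value */
};

/* Second expansion: sparse name table indexed by opcode value.
   Slots without an instruction stay NULL. */
static const char *const insn_names[INSN_LIMIT] = {
#define INSN(NAME, VALUE) [VALUE] = #NAME,
    INSNS
#undef INSN
};

const char *instruction_by_id(int id) {
    if (id < 0 || id >= INSN_LIMIT || insn_names[id] == NULL)
        return "???";
    return insn_names[id];
}
```

The table costs one pointer per possible opcode value up to the largest one, so for very sparse numbering the switch-based version above may be the better trade.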

How can I get the function name as text not string in a macro?

I am trying to use a function-like macro to generate an object-like macro name (generically, a symbol). The following will not work because __func__ (C99 6.4.2.2-1) puts quotes around the function name.
#define MAKE_AN_IDENTIFIER(x) __func__##__##x
The desired result of calling MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED) would be MyFunctionName__NULL_POINTER_PASSED. There may be other reasons this would not work (such as __func__ being taken literally and not interpreted, but I could fix that) but my question is what will provide a predefined macro like __func__ except without the quotes? I believe this is not possible within the C99 standard so valid answers could be references to other preprocessors.
Presently I have simply created my own object-like macro and redefined it manually before each function to be the function name. Obviously this is a poor and probably unacceptable practice. I am aware that I could take an existing cpp program or library and modify it to provide this functionality. I am hoping there is either a commonly used cpp replacement which provides this or a preprocessor library (prefer Python) which is designed for extensibility so as to allow me to 'configure' it to create the macro I need.
I wrote the above to try to provide a concise and well defined question but it is certainly the Y referred to by @Ruud. The X is...
I am trying to manage unique values for reporting errors in an embedded system. The values will be passed as a parameter to a(some) particular function(s). I have already written a Python program using pycparser to parse my code and identify all symbols being passed to the function(s) of interest. It generates a .h file of #defines maintaining the values of previously existing entries, commenting out removed entries (to avoid reusing the value and also allow for reintroduction with the same value), assigning new unique numbers for new identifiers, reporting malformed identifiers, and also reporting multiple use of any given identifier. This means that I can simply write:
void MyFunc(int * p)
{
    if (p == NULL)
    {
        myErrorFunc(MYFUNC_NULL_POINTER_PASSED);
        return;
    }
    // do something actually interesting here
}
and the Python program will create the #define MYFUNC_NULL_POINTER_PASSED 7 (or whatever next available number) for me with all the listed considerations. I have also written a set of macros that further simplify the above to:
#define FUNC MYFUNC
void MyFunc(int * p)
{
    RETURN_ASSERT_NOT_NULL(p);
    // do something actually interesting here
}
assuming I provide the #define FUNC. I want to use the function name since that will be constant throughout many changes (as opposed to __LINE__) and will be much easier for someone to transfer the value from the old generated #define to the new generated #define when the function itself is renamed. Honestly, I think the only reason I am trying to 'solve' this 'issue' is because I have to work in C rather than C++. At work we are writing fairly object oriented C and so there is a lot of NULL pointer checking and IsInitialized checking. I have two-line functions that turn into 30 because of all these basic checks (these macros reduce those lines by a factor of five). While I do enjoy the challenge of crazy macro development, I much prefer to avoid them. That said, I dislike repeating myself and hiding the functional code in a pile of error checking even more than I dislike crazy macros.
If you prefer to take a stab at this issue, have at.
__FUNCTION__ used to compile to a string literal (I think in gcc 2.96), but it hasn't for many years. Now instead we have __func__, which compiles to a string array, and __FUNCTION__ is a deprecated alias for it. (The change was a bit painful.)
But in neither case was it possible to use this predefined macro to generate a valid C identifier (i.e. "remove the quotes").
But could you instead use the line number rather than function name as part of your identifier?
If so, the following would work. As an example, compiling the following 5-line source file:
#define CONCAT_TOKENS4(a,b,c,d) a##b##c##d
#define EXPAND_THEN_CONCAT4(a,b,c,d) CONCAT_TOKENS4(a,b,c,d)
#define MAKE_AN_IDENTIFIER(x) EXPAND_THEN_CONCAT4(line_,__LINE__,__,x)
static int MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED);
will generate the warning:
foo.c:5: warning: 'line_5__NULL_POINTER_PASSED' defined but not used
As pointed out by others, there is no macro that returns the (unquoted) function name (mainly because the C preprocessor has insufficient syntactic knowledge to recognize functions). You would have to explicitly define such a macro yourself, as you already did:
#define FUNC MYFUNC
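With FUNC defined by hand like this, the pasting the question asks for could be sketched as follows. The question does not show how RETURN_ASSERT_NOT_NULL is implemented, so the macro body, the value 7, last_error and the myErrorFunc stub are assumptions for illustration. The extra EXPAND_PASTE3 level is what lets FUNC expand to MYFUNC before ## is applied.

```c
#include <stddef.h>

#define PASTE3(a, b, c) a##b##c
#define EXPAND_PASTE3(a, b, c) PASTE3(a, b, c)   /* expands arguments first */
#define MAKE_AN_IDENTIFIER(x) EXPAND_PASTE3(FUNC, __, x)

/* In the question's setup this line would live in the generated header. */
#define MYFUNC__NULL_POINTER_PASSED 7

/* Stub error reporter (assumption): remembers the last code it was given. */
static int last_error = 0;
static void myErrorFunc(int code) { last_error = code; }

/* Hypothetical implementation of the question's helper macro. */
#define RETURN_ASSERT_NOT_NULL(p) \
    do { \
        if ((p) == NULL) { \
            myErrorFunc(MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED)); \
            return; \
        } \
    } while (0)

#define FUNC MYFUNC
void MyFunc(int *p) {
    RETURN_ASSERT_NOT_NULL(p);   /* reports MYFUNC__NULL_POINTER_PASSED */
    *p = 1;                      /* the "actually interesting" part */
}
#undef FUNC
```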
To avoid having to do this manually, you could write your own preprocessor to add the macro definition automatically. A similar question is this: How to automatically insert pragmas in your program
If your source code has a consistent coding style (particularly indentation), then a simple line-based filter (sed, awk, perl) might do. In its most naive form: every function starts with a line that does not start with a hash or whitespace, and ends with a closing parenthesis or a comma. With awk:
{
    print $0;
}
/^[^# \t].*[,\)][ \t]*$/ {
    sub(/\(.*$/, "");
    sub(/^.*[ \t]/, "");
    print "#define FUNC " toupper($0);
}
For a more robust solution, you need a compiler framework like ROSE.
Gnu-C has a __FUNCTION__ macro, but sadly even that cannot be used in the way you are asking.

many wrapping functions in C

My C header file contains about 300 various functions, their names all beginning with "foo_db_" and accepting a "db_t" as their first parameter (knowing exactly what a db_t is is not really relevant here, it's just a struct).
function foo_db_my_first_function(db_t *db, char *param1, int param2);
function foo_db_my_second_function(db_t *db, double param1, const char *param2, int param3);
(...)
function foo_db_my_Nth_function(db_t *db, int param1);
My job is to write another 300 wrapping functions named "foo_XXXX" (XXXX being the suffix of the "foo_db_" function) with a default value for the first parameter.
static __inline function foo_my_first_function(char *param1, int param2) {
    foo_db_my_first_function(DEFAULT_DB, param1, param2);
}
(...)
I was wondering if I could write some macros to ease my job: declare the "db" function and the corresponding "default" function (without the first parameter).
Unfortunately, I cannot use C99 and variadic macros arguments :( so I think I'm screwed :), but I prefer to ask first here before burning my fingers to write those 300 functions :/
Assuming the original header file for the API is regular enough, then a script in your favorite text processing language (Perl, Lua, Python, Awk, or even /bin/sh in a pinch) will likely be the simplest approach.
Your script would collect all public function declarations using a regex or simple text matching to identify them (likely based on the foo_db_ prefix). It could then write two output files. First, a suitable .h file declaring your wrappers, and second the .c source file implementing them by stuffing DEFAULT_DB into their first parameter. You will need to do a minimal amount of work to copy the rest of the parameters through, but with luck the declarations are all regular enough that the text manipulation can be as simple as "rest of line" or the like.
Having done that, I would check the script into revision control, and get it invoked at build time, treating the generated files as transient build products. However, if you don't have a sufficiently flexible build system (this is why I still prefer make to nearly everything else I've seen proposed) then you will have to find a suitable kludge to signal that your generated default wrappers are out of date when the API changes.
This approach will require investing some time in the code generator script, but you should be ahead on that well before the time you imagine hand-coding your 100th wrapper. And the second time you run it....
In extreme cases, you could end up needing to implement much of the front-end of a C compiler. In that case, I see two approaches that are both more socially acceptable than arranging a meeting with the architect in a dark alley. First, there is a GCC back-end that emits its AST in XML; the resulting XML is a bear, but has been reduced down to a tree of tokens that can be manipulated. Second, there is always LPeg, a full parser that is easily used from Lua (and I suspect that there are other PEG parsers out there for other scripting languages too). Sample code for LPeg that lexes and parses C is referenced at the Wiki page.
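The answer above favours generating the wrappers with a script. If that is not an option, the macro idea from the question can be made to work in C89 without variadic macros by writing one macro per parameter count. The sketch below only illustrates that alternative: the contents of db_t, default_db, foo_db_add, foo_db_reset and the FOO_WRAP names are all made up.

```c
/* Hypothetical stand-ins for the real API. */
typedef struct { int id; } db_t;
static db_t default_db = { 42 };
#define DEFAULT_DB (&default_db)

int  foo_db_add(db_t *db, int a, int b) { return db->id + a + b; }
void foo_db_reset(db_t *db, int value)  { db->id = value; }

/* One macro per parameter count. Functions returning void need their
   own variant, because C does not allow `return expr;` in a void
   function. */
#define FOO_WRAP2(ret, name, T1, T2) \
    static ret foo_##name(T1 p1, T2 p2) \
    { return foo_db_##name(DEFAULT_DB, p1, p2); }

#define FOO_WRAP1_VOID(name, T1) \
    static void foo_##name(T1 p1) \
    { foo_db_##name(DEFAULT_DB, p1); }

/* Each wrapper is then a single line. */
FOO_WRAP2(int, add, int, int)
FOO_WRAP1_VOID(reset, int)
```

This still means writing 300 lines by hand, and parameter types that cannot be written as a plain "T name" pair (function pointers, arrays) would need a typedef first, so the script remains the less error-prone route when the header is regular.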
Do it in Excel. Create a cell with "foo_db_ (db_t *db)", drag it down as many places as you need, fill in all the blanks, then copy it all into your program (you can test that the copy will work ahead of time, but I just tried with Notepad and it seems to work as intended). Now you have all your function headers, and can fill in the rest from there.
