Case insensitive #define - c-preprocessor

Is it possible to issue a case insensitive #define statement with the preprocessor?
For example, I want to convert any casing of foobar to spameggs, i.e.:
FooBar -> spameggs
foobar -> spameggs
fooBar -> spameggs
Foobar -> spameggs
FOOBAR -> spameggs
FOOBAr -> spameggs (an odd possibility I know)
etc
The reason behind this is that I want to #define some fortran subroutines to have different names, and they of course are case insensitive. Please note that I don't really care about preserving the capitalisation scheme (which in the last example seems kinda nonsense).

Alas, as you know, C identifiers are case sensitive. Hence, so are pre-processor symbols (if one were case sensitive and the other were not, you could get some very strange behaviours when you intended to alter just one of the symbols with the preprocessor). There isn't an override flag for this behaviour, nor an alternative define construct (at least that I'm aware of in GCC compiler front-ends for C/++).
The most obvious solution will be to grep your code for foobar, case insensitively. Use the results to build a table of all possible foobar casings and either
Correct them all to one consistent casing
Create a single pre-processor file that does the redefinitions for all the cases.
In the later solution, you needn't pollute some human readable code with this - just machine generate a FixFooBar.h file full of these remappings, and include it where needed.

Have you tried using the entry command:
subroutine name1 (args)
entry name2 (args)
entry name3 (args)
....
return
end

Related

'Reverse' a collection of C preprocessor macros easily

I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
In the real application, each definition corresponds to an instruction in an interpreter virtual machine. The macros are also not sequential in numbering to leave space for future instructions; there may be a #define FOO 41, then the next one is #define BAR 64.
I'm now working on a debugger for this virtual machine, and need to effectively 'reverse' these preprecessor macros. In other words, I need a function which takes the number and returns the macro name, e.g. an input of 2 returns "BAR".
Of course, I could create a function using a switch myself:
const char* instruction_by_id(int id) {
switch (id) {
case FOO:
return "FOO";
case BAR:
return "BAR";
case BAZ:
return "BAZ";
default:
return "???";
}
}
However, this will a nightmare to maintain, since renaming, removing or adding instructions will require this function to be modified too.
Is there another macro which I can use to create a function like this for me, or is there some other approach? If not, is it possible to create a macro to perform this task?
I'm using gcc 6.3 on Windows 10.
You have the wrong approach. Read SICP if you have not read it.
I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
Remember that C or C++ code can be generated, and it is quite easy to instruct your build automation tool to generate some particular C file (with GNU make or ninja you just add some rule or recipe).
For example, you could use some different preprocessor (liek GPP or m4), or some script -e.g. in awk or Python or Guile, etc..., or write your own program (in C, C++, Ocaml, etc...), to generate the header file containing these #define-s. And another script or program (or the same one, invoked differently) could generate the C code of instruction_by_id
Such basic metaprogramming techniques (of generating some or several C files from something higher level but specific) have been used since at least the 1980s (e.g. with yacc or RPCGEN). The C preprocessor facilitates that with its #include directive (since you can even include lines inside some function body, etc...). Actually, the idea that code is data (and proof) and data is code is even older (Church-Turing thesis, Curry-Howard correspondence, Halting problem). The Gödel, Escher, Bach book is very entertaining....
For example, you could decide to have a textual file opcodes.txt (or even some sqlite database containing stuff....) like
# ignore lines starting with an hashsign
FOO 1
BAR 2
and have two small awk or Python scripts (or two tiny C specialized programs), one generating the #define-s (into opcode-defines.h) and another generating the body of instruction_by_id (into opcode-instr.inc). Then you need to adapt your Makefile to generate these, and put #include "opcode-defines.h" inside some global header, and have
const char* instruction_by_id(int id) {
switch (id) {
#include "opcode-instr.inc"
default: return "???";
}
}
this will a nightmare to maintain,
Not so with such a metaprogramming approach. You'll just maintain opcodes.txt and the scripts using it, but you express a given "knowledge element" (the relation of FOO to 1) only once (in a single line of opcode.txt). Of course you need to document that (at the very least, with comments in your Makefile).
Metaprogramming from some higher-level, declarative formalization, is a very powerful paradigm. In France, J.Pitrat pioneered it (and he is writing an interesting blog today, while being retired) since the 1960s. In the US, J.MacCarthy and the Lisp community also.
For an entertaining talk, see Liam Proven FOSDEM 2018 talk on The circuit less traveled
Large software are using that metaprogramming approach quite often. For example, the GCC compiler have about a dozen of C++ code generators (in total, they are emitting more than a million of C++ lines).
Another way of looking at such an approach is the idea of domain-specific languages that could be compiled to C. If you use an operating system providing dynamic loading, you can even write a program emitting C code, forking a process to compile it into some plugin, then loading that plugin (on POSIX or Linux, with dlopen). Interestingly, computers are now fast enough to enable such an approach in an interactive application (in some sort of REPL): you can emit a C file of a few thousand lines, compile it into some .so shared object file, and dlopen that, in a fraction of second. You could also use JIT-compiling libraries like GCCJIT or LLVM to generate code at runtime. You could embed an interpreter (like Lua or Guile) into your program.
BTW, metaprogramming approaches is one of the reasons why basic compilation techniques should be known by most developers (and not only just people in the compiler business); another reason is that parsing problems are very common. So read the Dragon Book.
Be aware of Greenspun's tenth rule. It is much more than a joke, actually a profound truth about large software.
In a similar case I've resorted to defining a text file format that defines the instructions, and writing a program to read this file and write out the C source of the actual instruction definitions and the C source of functions like your instruction_by_id(). This way you only need to maintain the text file.
As awesome as general code generation is, I’m surprised that nobody mentioned that (if you relax your problem definition just a bit) the C preprocessor is perfectly capable of generating the necessary code, using a technique called X macros. In fact every simple bytecode VM in C that I’ve seen uses this approach.
The technique works as follows. First, there is a file (call it insns.h) containing the authoritative list of instructions,
INSN(FOO, 1)
INSN(BAR, 2)
INSN(BAZ, 3)
or alternatively a macro in some other header containing the same,
#define INSNS \
INSN(FOO, 1) \
INSN(BAR, 2) \
INSN(BAZ, 3)
whichever is more conveinent for you. (I’ll use the first option in the following.) Note that INSN is not defined anywhere. (Traditionally it would be called X, thus the name of the technique.) Wherever you want to loop over your instructions, define INSN to generate the code you want, include insns.h, then undefine INSN again.
In your disassembler, write
const char *instruction_by_id(int id) {
switch (id) {
#define INSN(NAME, VALUE) \
case NAME: return #NAME;
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
default: return "???";
}
}
using the prefix stringification operator # to turn names-as-identifiers into names-as-string-literals.
You obviously can’t define the constants this way, because macros cannot define other macros in the C preprocessor. However, if you don’t insist that the instruction constants be preprocessor constants, there’s a different perfectly serviceable constant facility in the C language: enumerations. Whether or not you use an enumerated type, the enumerators defined inside it are regular integer constants from the point of view of the compiler (though not the preprocessor—you cannot use #ifdef with them, for example). So, using an anonymous enumeration type, define your constants like this:
enum {
#define INSN(NAME, VALUE) \
NAME = VALUE,
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
NINSNS /* C89 doesn’t allow trailing commas in enumerations (but C99+ does), and you may find this constant useful in any case */
};
If you want to statically initialize an array indexed by your bytecodes, you’ll have to use C99 designated initializers {[FOO] = foovalue, [BAR] = barvalue, /* ... */} whether or not you use X macros. However, if you don’t insist on assigning custom codes to your instructions, you can eliminate VALUE from the above and have the enumeration assign consecutive codes automatically, and then the array can be simply initialized in order, {foovalue, barvalue, /* ... */}. As a bonus, NINSNS above then becomes equal to the number of the instructions and the size of any such array, which is why I called it that.
There are more tricks you can use here. For example, if some instructions have variants for several data types, the instruction list X macro can call the type list X macro to generate the variants automatically. (The somewhat ugly second option of storing the X macro list in a large macro and not an include file may be more handy here.) The INSN macro may take additional arguments such as the mode name, which would ignored in the code list but used to call the appropriate decoding routine in the disassembler. You can use token pasting operator ## to add prefixes to the names of the constants, as in INSN_ ## NAME to generate INSN_FOO, INSN_BAR, etc. And so on.

Is there a tool that checks what predefined macros a C file depends on?

To avoid impossible situation one could reduce the problem to two cases.
Case 1
The first (simplest) case is situation where the preprocessor has a chance to detect it, that is there's a preprocessor directive that depends on a macro being predefined (that is defined before the first line of input) or not. For example:
#ifdef FOO
#define BAR 42
#else
#define BAR 43
#endif
depends on FOO being predefined or not. However the file
#undef FOO
#ifdef FOO
#define BAR 42
#endif
does not. A harder case would be to detect if the dependency actually does matter, which it doesn't in the above cases (as neither FOO or BAR affects the output).
Case 2
The second (harder) case is where successful compilation depends on predefined macros:
INLINE int fubar(void) {
return 42;
}
which is perfectly fine as far as the preprocessor is concerned whether or not ENTRY_POINT is predefined, but unless INLINE is carefully defined that code won't compile. Similarily we could in this case it might be possible to exclude cases where the output isn't affected, but I can't find an example of that. The complication here is that in the example:
int fubar(void) {
return 42;
}
the fubar being predefined can alter the successful compilation of this, so one would probably need to restrict it to cases where a symbol need to be predefined in order to compile successfully.
I guess such a tool would be something similar to a preprocessor (and C parser in the second case). The question is if there is such a tool? Or is there a tool that only handles the first case? Or none at all?
In C everything can be (re)defined, so there is no way to know in advance what is intended to be (re)defined. Usually some naming conventions helps us to figure out what is meant to be a macro (like upper-case). Therefore it is not possible to have such tool. Of course if you assume that the compilation errors are caused by missing macro definitions then you can use them to analyze what is missing.

How can I get the function name as text not string in a macro?

I am trying to use a function-like macro to generate an object-like macro name (generically, a symbol). The following will not work because __func__ (C99 6.4.2.2-1) puts quotes around the function name.
#define MAKE_AN_IDENTIFIER(x) __func__##__##x
The desired result of calling MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED) would be MyFunctionName__NULL_POINTER_PASSED. There may be other reasons this would not work (such as __func__ being taken literally and not interpreted, but I could fix that) but my question is what will provide a predefined macro like __func__ except without the quotes? I believe this is not possible within the C99 standard so valid answers could be references to other preprocessors.
Presently I have simply created my own object-like macro and redefined it manually before each function to be the function name. Obviously this is a poor and probably unacceptable practice. I am aware that I could take an existing cpp program or library and modify it to provide this functionality. I am hoping there is either a commonly used cpp replacement which provides this or a preprocessor library (prefer Python) which is designed for extensibility so as to allow me to 'configure' it to create the macro I need.
I wrote the above to try to provide a concise and well defined question but it is certainly the Y referred to by #Ruud. The X is...
I am trying to manage unique values for reporting errors in an embedded system. The values will be passed as a parameter to a(some) particular function(s). I have already written a Python program using pycparser to parse my code and identify all symbols being passed to the function(s) of interest. It generates a .h file of #defines maintaining the values of previously existing entries, commenting out removed entries (to avoid reusing the value and also allow for reintroduction with the same value), assigning new unique numbers for new identifiers, reporting malformed identifiers, and also reporting multiple use of any given identifier. This means that I can simply write:
void MyFunc(int * p)
{
if (p == NULL)
{
myErrorFunc(MYFUNC_NULL_POINTER_PASSED);
return;
}
// do something actually interesting here
}
and the Python program will create the #define MYFUNC_NULL_POINTER_PASSED 7 (or whatever next available number) for me with all the listed considerations. I have also written a set of macros that further simplify the above to:
#define FUNC MYFUNC
void MyFunc(int * p)
{
RETURN_ASSERT_NOT_NULL(p);
// do something actually interesting here
}
assuming I provide the #define FUNC. I want to use the function name since that will be constant throughout many changes (as opposed to LINE) and will be much easier for someone to transfer the value from the old generated #define to the new generated #define when the function itself is renamed. Honestly, I think the only reason I am trying to 'solve' this 'issue' is because I have to work in C rather than C++. At work we are writing fairly object oriented C and so there is a lot of NULL pointer checking and IsInitialized checking. I have two line functions that turn into 30 because of all these basic checks (these macros reduce those lines by a factor of five). While I do enjoy the challenge of crazy macro development, I much prefer to avoid them. That said, I dislike repeating myself and hiding the functional code in a pile of error checking even more than I dislike crazy macros.
If you prefer to take a stab at this issue, have at.
__FUNCTION__ used to compile to a string literal (I think in gcc 2.96), but it hasn't for many years. Now instead we have __func__, which compiles to a string array, and __FUNCTION__ is a deprecated alias for it. (The change was a bit painful.)
But in neither case was it possible to use this predefined macro to generate a valid C identifier (i.e. "remove the quotes").
But could you instead use the line number rather than function name as part of your identifier?
If so, the following would work. As an example, compiling the following 5-line source file:
#define CONCAT_TOKENS4(a,b,c,d) a##b##c##d
#define EXPAND_THEN_CONCAT4(a,b,c,d) CONCAT_TOKENS4(a,b,c,d)
#define MAKE_AN_IDENTIFIER(x) EXPAND_THEN_CONCAT4(line_,__LINE__,__,x)
static int MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED);
will generate the warning:
foo.c:5: warning: 'line_5__NULL_POINTER_PASSED' defined but not used
As pointed out by others, there is no macro that returns the (unquoted) function name (mainly because the C preprocessor has insufficient syntactic knowledge to recognize functions). You would have to explicitly define such a macro yourself, as you already did yourself:
#define FUNC MYFUNC
To avoid having to do this manually, you could write your own preprocessor to add the macro definition automatically. A similar question is this: How to automatically insert pragmas in your program
If your source code has a consistent coding style (particularly indentation), then a simple line-based filter (sed, awk, perl) might do. In its most naive form: every function starts with a line that does not start with a hash or whitespace, and ends with a closing parenthesis or a comma. With awk:
{
print $0;
}
/^[^# \t].*[,\)][ \t]*$/ {
sub(/\(.*$/, "");
sub(/^.*[ \t]/, "");
print "#define FUNC " toupper($0);
}
For a more robust solution, you need a compiler framework like ROSE.
Gnu-C has a __FUNCTION__ macro, but sadly even that cannot be used in the way you are asking.

Is introspection on a function's arguments in C possible?

For example, in Python I can use getargspec from inspect to access a function's arguments in the follow way:
>>> def test(a,b,c):
... return a*b*c
...
>>> getargspec(test)
ArgSpec(args=['a', 'b', 'c'], varargs=None, keywords=None, defaults=None)
Is this possible to do in C at all? More specifically I am only interested in the arguments' names, I don't particularly care about their types.
The language doesn't include anything along this line at all.
Depending on the implementation, there's a pretty fair chance that if you want this badly enough, you can get at it. To do so, you'll typically have to compile with debugging information enabled, and use code specific to a precise combination of compiler and platform to do it. Most compilers do support creating and accessing debugging information that would include the names of the parameters to a function -- but code to do it will not be portable (and in many cases, it'll also be pretty ugly).
No, all variable names are gone during compilation (except perhaps file-scope variables with "extern" storage duration), so you can't get the declaration names of the arguments.
No, this is absolutely impossible in C, since variable names exist only at compile time.
Probably it would be solvable via macro.
You could define
#define defFunc3(resultType,name,type0,arg0name,type1,arg1name,type2,arg2name) \
...store the variable names somehow... \
...you can access the variable name strings with the # operator, e.g. #arg0name \
...or build a function with concatenation: getargspec_##name \
resultType name(type0 arg0name, type1 arg1name, type2 arg2name )
and then declare your function with this macro:
defFunc3( float, test, float, a, float, b, float, c ) {
return a * b * c;
}
In the macro body, you could somehow store the variable names (with the stringification preprocessor operator) and/or the function address somewhere or create some kind of "getargspec" function (via the concatenation preprocessor operator).
But this will be definitely ugly, error prone and tricky (since you can not execute code in such a function definition directly). I would avoid such macro magic whenever possible.

Slashes and dots in function names and prototypes?

I'm new to C and looking at Go's source tree I found this:
https://code.google.com/p/go/source/browse/src/pkg/runtime/race.c
void runtime∕race·Read(int32 goid, void *addr, void *pc);
void runtime∕race·Write(int32 goid, void *addr, void *pc);
void
runtime·raceinit(void)
{
// ...
}
What do the slashes and dots (·) mean? Is this valid C?
IMPORTANT UPDATE:
The ultimate answer is certainly the one you got from Russ Cox, one of Go authors, on the golang-nuts mailing list. That said, I'm leaving some of my earlier notes below, they might help to understand some things.
Also, from reading this answer linked above, I believe the ∕ "pseudo-slash" may now be translated to regular / slash too (like the middot is translated to dot) in newer versions of Go C compiler than the one I've tested below - but I don't have time to verify.
The file is compiled by the Go Language Suite's internal C compiler, which originates in the Plan 9 C compiler(1)(2), and has some differences (mostly extensions, AFAIK) to the C standard.
One of the extensions is, that it allows UTF-8 characters in identifiers.
Now, in the Go Language Suite's C compiler, the middot character (·) is treated in a special way, as it is translated to a regular dot (.) in object files, which is interpreted by Go Language Suite's internal linker as namespace separator character.
Example
For the following file example.c (note: it must be saved as UTF-8 without BOM):
void ·Bar1() {}
void foo·bar2() {}
void foo∕baz·bar3() {}
the internal C compiler produces the following symbols:
$ go tool 8c example.c
$ go tool nm example.8
T "".Bar1
T foo.bar2
T foo∕baz.bar3
Now, please note I've given the ·Bar1() a capital B. This is
because that way, I can make it visible to regular Go code - because
it is translated to the exact same symbol as would result from
compiling the following Go code:
package example
func Bar1() {} // nm will show: T "".Bar1
Now, regarding the functions you named in the question, the story goes further down the rabbit hole. I'm a bit less sure if I'm right here, but I'll try to explain based on what I know. Thus, each sentence below this point should be read as if it had "AFAIK" written just at the end.
So, the next missing piece needed to better understand this puzzle, is to know something more about the strange "" namespace, and how the Go suite's linker handles it. The "" namespace is what we might want to call an "empty" (because "" for a programmer means "an empty string") namespace, or maybe better, a "placeholder" namespace. And when the linker sees an import going like this:
import examp "path/to/package/example"
//...
func main() {
examp.Bar1()
}
then it takes the $GOPATH/pkg/.../example.a library file, and during import phase substitutes on the fly each "" with path/to/package/example. So now, in the linked program, we will see a symbol like this:
T path/to/package/example.Bar1
The "·" character is \xB7 according to my Javascript console.
The "∕" character is \x2215.
The dot falls within Annex D of the C99 standard lists which special characters which are valid as identifiers in C source. The slash doesn't seem to, so I suspect it's used as something else (perhaps namespacing) via a #define or preprocessor magic.
That would explain why the dot is present in the actual function definition, but the slash is not.
Edit: Check This Answer for some additional information. It's possible that the unicode slash is just allowed by GCC's implementation.
It appears this is not standard C, nor C99. In particular, it both gcc and clang complain about the dot, even when in C99 mode.
This source code is compiled by the Part 9 compiler suite (in particular, ./pkg/tool/darwin_amd64/6c on OS X), which is bootstrapped by the Go build system. According to this document, bottom of page 8, Plan 9 and its compiler do not use ASCII at all, but use Unicode instead. At bottom of page 9, it it stated that any character with a sufficiently high code point is considered valid for use in an identifier name.
There's no pre-processing magic at all - the definition of functions do not match the declaration of functions simply because those are different functions. For example, void runtime∕race·Initialize(); is an external function whose definition appears in ./src/pkg/runtime/race/race.go; likewise for void runtime∕race·MapShadow(…).
The function which appears later, void runtime·raceinit(void), is a completely different function, which is aparant by the fact it actually calls runtime∕race·Initialize();.
The go compiler/runtime is compiled using the C compilers originally developed for plan9. When you build go from source, it'll first build the plan9 compilers, then use those to build Go.
The plan9 compilers support unicode function names [1], and the Go developers use unicode characters in their function names as pseudo namespaces.
[1] It looks like this might actually be standards compliant: g++ unicode variable name but gcc doesn't support unicode function/variable names.

Resources