For loop macro which unrolled on the pre-processor phase? - c

I want to use gcc pre-processor to write almost the same code declaration for 500 times. let's say for demonstration purposes I would like to use a macro FOR_MACRO:
#define FOR_MACRO(x) \
#for i in {1 ... x}: \
const int arr_len_##x[i] = {i};
and calling FOR_MACRO(100) will be converted into:
const int arr_len_1[1] = {1};
const int arr_len_2[2] = {2};
...
const int arr_len_100[100] = {100};

This is not a good idea:
While possible in principle, using the preprocessor means you have to manually unroll the loop at least once, you end up with some arbitrary implementation-defined limit on loop depth and all statements will be generated in a single line.
Better use the scripting language of your choice to generate the code (possibly in a separate includeable file) and integrate that with your build process.

You can use Order-PP for this, if you desperately need to.
It's a scripting language implemented in the preprocessor. This means it's conceptually similar to using a scripting language to generate C code (in fact, the same) except there are no external tools and the script runs at the same time as the C compiler: everything is done with C macros. Despite being built on the preprocessor, there are no real limits to loop iterations, recursion depth, or anything like that (the limit is somewhere in the billions, you don't need to worry about it).
To emit the code requested in the question example, you could write:
#include <order/interpreter.h>
ORDER_PP( // runs Order code
8for_each_in_range(8fn(8I,
8print( 8cat(8(const int arr_len_), 8I)
([) 8I (] = {) 8I (};) )),
1, 101)
)
I can't fathom why you would do this instead of simply integrating an external language like Python into your build process (Order might be implemented using macros, but it's still a separate language to understand), but the option is there.
Order only works with GCC as far as I know; other preprocessors run out of stack too quickly (even Clang), or are not perfectly standard-compliant.

Instead of providing you with a solution for exactly your problem, are you sure it cannot be handled in a better way?
Maybe it would be better to
use one array with one more dimension
fill the data with the help of an array at runtime, as you obviously want to fill out the first entry of each array. If you leave the array uninitialized, it will (provided it is defined on module level) be put into .bss segment instead of .data and will probably need less space in the binary file.

You could use e.g P99 to do such preprocessor code unrolling. But because of the limited capacities of the preprocessor this comes with a limit, and that limit is normally way below 500.

Related

'Reverse' a collection of C preprocessor macros easily

I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
In the real application, each definition corresponds to an instruction in an interpreter virtual machine. The macros are also not sequential in numbering to leave space for future instructions; there may be a #define FOO 41, then the next one is #define BAR 64.
I'm now working on a debugger for this virtual machine, and need to effectively 'reverse' these preprecessor macros. In other words, I need a function which takes the number and returns the macro name, e.g. an input of 2 returns "BAR".
Of course, I could create a function using a switch myself:
const char* instruction_by_id(int id) {
switch (id) {
case FOO:
return "FOO";
case BAR:
return "BAR";
case BAZ:
return "BAZ";
default:
return "???";
}
}
However, this will a nightmare to maintain, since renaming, removing or adding instructions will require this function to be modified too.
Is there another macro which I can use to create a function like this for me, or is there some other approach? If not, is it possible to create a macro to perform this task?
I'm using gcc 6.3 on Windows 10.
You have the wrong approach. Read SICP if you have not read it.
I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
Remember that C or C++ code can be generated, and it is quite easy to instruct your build automation tool to generate some particular C file (with GNU make or ninja you just add some rule or recipe).
For example, you could use some different preprocessor (liek GPP or m4), or some script -e.g. in awk or Python or Guile, etc..., or write your own program (in C, C++, Ocaml, etc...), to generate the header file containing these #define-s. And another script or program (or the same one, invoked differently) could generate the C code of instruction_by_id
Such basic metaprogramming techniques (of generating some or several C files from something higher level but specific) have been used since at least the 1980s (e.g. with yacc or RPCGEN). The C preprocessor facilitates that with its #include directive (since you can even include lines inside some function body, etc...). Actually, the idea that code is data (and proof) and data is code is even older (Church-Turing thesis, Curry-Howard correspondence, Halting problem). The Gödel, Escher, Bach book is very entertaining....
For example, you could decide to have a textual file opcodes.txt (or even some sqlite database containing stuff....) like
# ignore lines starting with an hashsign
FOO 1
BAR 2
and have two small awk or Python scripts (or two tiny C specialized programs), one generating the #define-s (into opcode-defines.h) and another generating the body of instruction_by_id (into opcode-instr.inc). Then you need to adapt your Makefile to generate these, and put #include "opcode-defines.h" inside some global header, and have
const char* instruction_by_id(int id) {
switch (id) {
#include "opcode-instr.inc"
default: return "???";
}
}
this will a nightmare to maintain,
Not so with such a metaprogramming approach. You'll just maintain opcodes.txt and the scripts using it, but you express a given "knowledge element" (the relation of FOO to 1) only once (in a single line of opcode.txt). Of course you need to document that (at the very least, with comments in your Makefile).
Metaprogramming from some higher-level, declarative formalization, is a very powerful paradigm. In France, J.Pitrat pioneered it (and he is writing an interesting blog today, while being retired) since the 1960s. In the US, J.MacCarthy and the Lisp community also.
For an entertaining talk, see Liam Proven FOSDEM 2018 talk on The circuit less traveled
Large software are using that metaprogramming approach quite often. For example, the GCC compiler have about a dozen of C++ code generators (in total, they are emitting more than a million of C++ lines).
Another way of looking at such an approach is the idea of domain-specific languages that could be compiled to C. If you use an operating system providing dynamic loading, you can even write a program emitting C code, forking a process to compile it into some plugin, then loading that plugin (on POSIX or Linux, with dlopen). Interestingly, computers are now fast enough to enable such an approach in an interactive application (in some sort of REPL): you can emit a C file of a few thousand lines, compile it into some .so shared object file, and dlopen that, in a fraction of second. You could also use JIT-compiling libraries like GCCJIT or LLVM to generate code at runtime. You could embed an interpreter (like Lua or Guile) into your program.
BTW, metaprogramming approaches is one of the reasons why basic compilation techniques should be known by most developers (and not only just people in the compiler business); another reason is that parsing problems are very common. So read the Dragon Book.
Be aware of Greenspun's tenth rule. It is much more than a joke, actually a profound truth about large software.
In a similar case I've resorted to defining a text file format that defines the instructions, and writing a program to read this file and write out the C source of the actual instruction definitions and the C source of functions like your instruction_by_id(). This way you only need to maintain the text file.
As awesome as general code generation is, I’m surprised that nobody mentioned that (if you relax your problem definition just a bit) the C preprocessor is perfectly capable of generating the necessary code, using a technique called X macros. In fact every simple bytecode VM in C that I’ve seen uses this approach.
The technique works as follows. First, there is a file (call it insns.h) containing the authoritative list of instructions,
INSN(FOO, 1)
INSN(BAR, 2)
INSN(BAZ, 3)
or alternatively a macro in some other header containing the same,
#define INSNS \
INSN(FOO, 1) \
INSN(BAR, 2) \
INSN(BAZ, 3)
whichever is more conveinent for you. (I’ll use the first option in the following.) Note that INSN is not defined anywhere. (Traditionally it would be called X, thus the name of the technique.) Wherever you want to loop over your instructions, define INSN to generate the code you want, include insns.h, then undefine INSN again.
In your disassembler, write
const char *instruction_by_id(int id) {
switch (id) {
#define INSN(NAME, VALUE) \
case NAME: return #NAME;
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
default: return "???";
}
}
using the prefix stringification operator # to turn names-as-identifiers into names-as-string-literals.
You obviously can’t define the constants this way, because macros cannot define other macros in the C preprocessor. However, if you don’t insist that the instruction constants be preprocessor constants, there’s a different perfectly serviceable constant facility in the C language: enumerations. Whether or not you use an enumerated type, the enumerators defined inside it are regular integer constants from the point of view of the compiler (though not the preprocessor—you cannot use #ifdef with them, for example). So, using an anonymous enumeration type, define your constants like this:
enum {
#define INSN(NAME, VALUE) \
NAME = VALUE,
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
NINSNS /* C89 doesn’t allow trailing commas in enumerations (but C99+ does), and you may find this constant useful in any case */
};
If you want to statically initialize an array indexed by your bytecodes, you’ll have to use C99 designated initializers {[FOO] = foovalue, [BAR] = barvalue, /* ... */} whether or not you use X macros. However, if you don’t insist on assigning custom codes to your instructions, you can eliminate VALUE from the above and have the enumeration assign consecutive codes automatically, and then the array can be simply initialized in order, {foovalue, barvalue, /* ... */}. As a bonus, NINSNS above then becomes equal to the number of the instructions and the size of any such array, which is why I called it that.
There are more tricks you can use here. For example, if some instructions have variants for several data types, the instruction list X macro can call the type list X macro to generate the variants automatically. (The somewhat ugly second option of storing the X macro list in a large macro and not an include file may be more handy here.) The INSN macro may take additional arguments such as the mode name, which would ignored in the code list but used to call the appropriate decoding routine in the disassembler. You can use token pasting operator ## to add prefixes to the names of the constants, as in INSN_ ## NAME to generate INSN_FOO, INSN_BAR, etc. And so on.

initialising constant static array with algorhythm [duplicate]

I am thinking about the following problem: I want to program a microcontroller (let's say an AVR mega type) with a program that uses some sort of look-up tables.
The first attempt would be to locate the table in a separate file and create it using any other scripting language/program/.... In this case there is quite some effort in creating the necessary source files for C.
My thought was now to use the preprocessor and compiler to handle things. I tried to implement this with a table of sine values (just as an example):
#include <avr/io.h>
#include <math.h>
#define S1(i,n) ((uint8_t) sin(M_PI*(i)/n*255))
#define S4(i,n) S1(i,n), S1(i+1,n), S1(i+2,n), S1(i+3,n)
uint8_t lut[] = {S4(0,4)};
void main()
{
uint8_t val, i;
for(i=0; i<4; i++)
{
val = lut[i];
}
}
If I compile this code I get warnings about the sin function. Further in the assembly there is nothing in the section .data. If I just remove the sin in the third line I get the data in the assembly. Clearly all information are available at compile time.
Can you tell me if there is a way to achieve what I intent: The compiler calculates as many values as offline possible? Or is the best way to go using an external script/program/... to calculate the table entries and add these to a separate file that will just be #included?
The general problem here is that sin call makes this initialization de facto illegal, according to rules of C language, as it's not constant expression per se and you're initializing array of static storage duration, which requires that. This also explains why your array is not in .data section.
C11 (N1570) §6.6/2,3 Constant expressions (emphasis mine)
A constant expression can be evaluated during translation rather than
runtime, and accordingly may be used in any place that a constant may
be.
Constant expressions shall not contain assignment, increment,
decrement, function-call, or comma operators, except when they are
contained within a subexpression that is not evaluated.115)
However as by #ShafikYaghmour's comment GCC will replace sin function call with its built-in counterpart (unless -fno-builtin option is present), that is likely to be treated as constant expression. According to 6.57 Other Built-in Functions Provided by GCC:
GCC includes built-in versions of many of the functions in the
standard C library. The versions prefixed with __builtin_ are always
treated as having the same meaning as the C library function even if
you specify the -fno-builtin option.
What you are trying is not part of the C language. In situations like this, I have written code following this pattern:
#if GENERATE_SOURCECODE
int main (void)
{
... Code that uses printf to write C code to stdout
}
#else
// Source code generated by the code above
... Here I paste in what the code above generated
// The rest of the program
#endif
Every time you need to change it, you run the code with GENERATE_SOURCECODE defined, and paste in the output. Works well if your code is self contained and the generated output only ever changes if the code generating it changes.
First of all, it should go without saying that you should evaluate (probably by experiment) whether this is worth doing. Your lookup table is going to increase your data size and programmer effort, but may or may not provide a runtime speed increase that you need.
If you still want to do it, I don't think the C preprocessor can do it straightforwardly, because it has no facilities for iteration or recursion.
The most robust way to go about this would be to write a program in C or some other language to print out C source for the table, and then include that file in your program using the preprocessor. If you are using a tool like make, you can create a rule to generate the table file and have your .c file depend on that file.
On the other hand, if you are sure you are never going to change this table, you could write a program to generate it once and just paste it in.

Frama-C code slicer not loading any C files

I have a 1000 lines C file with 10 maths algorithms written by a professor, I need to delete 9 maths functions and all their dependencies from the 1000 lines, so i am having a go using Frama-C Boron windows binary installer.
Now it won't load the simplest example.c file... i select source file and nothing loads.
Boron edition is from 2010 so i checked how to compile a later Frama-C: they say having a space in my windows 7 user name can cause problems, which is not encouraging.
Is Frama-C my best option for my slicing task?
Here is an example file that won't load:
// File swap.c:
/*# requires \valid(a) && \valid(b);
# ensures A: *a == \old(*b) ;
# ensures B: *b == \old(*a) ;
# assigns *a,*b ;
#*/
void swap(int *a,int *b)
{
int tmp = *a ;
*a = *b ;
*b = tmp ;
return ;
}
Here is the code i wish to take only one function from, the option labelled smooth and swvd. https://sites.google.com/site/kootsoop/Home/cohens_class_code
I looked at the code you linked to, and it does not seem like the best candidate for a Frama-C analysis. For instance, that code is not strictly C99-conforming, using e.g. some old-style prototypes (including implicit int return types), functions that are used before they are defined without forward declarations (fft), and a missing header inclusion (stdlib.h). Those are not big issues, since the changes are relatively simple, and some of them are treated similarly to how gcc -std=c99 works: they emit warnings but not errors. However, it's important to notice that they do require a non-zero amount of time, therefore this won't be a "plug-and-play" solution.
On a more general note, Frama-C relies on CIL (C Intermediate Language) for C code normalization, so the sliced program will probably not be identical to "the original program minus the sliced statements". If the objective is merely to remove some statements but keep the code syntactically identical otherwise, then Frama-C will not be ideal1.
Finally, it is worth noting that some Frama-C analyses can help finding dead code, and the result is even clearer if the code is already split into functions. For instance, using Value analysis on a properly configured program, it is possible to see which statements/functions are never executed. But this does rely on the absence of at least some kinds of undefined behavior.
E.g. if uninitialized variables are used in your program (which is forbidden by the C standard, but occasionally happens and goes unnoticed), the Value analysis will stop its propagation and the code afterwards may be marked as dead, since it is semantically dead w.r.t. the standard. It's important to be aware of that, since a naive approach would be misleading.
Overall, for the code size you mention, I'm not sure Frama-C would be a cost-effective approach, especially if (1) you have never used Frama-C and are having trouble compiling it (Boron is a really old release, not recommended) and (2) if you already know your code base, and therefore would be relatively proficient in manually slicing its parts.
1That said, I do not know of any C slicer that preserves statements like that; my point is that, while intuitively one might think that a C slicer would straightforwardly preserve most of the syntax, C is such a devious language that doing so is very hard, hence why most tools will do some normalization steps beforehand.

How to make GCC evaluate functions at compile time?

I am thinking about the following problem: I want to program a microcontroller (let's say an AVR mega type) with a program that uses some sort of look-up tables.
The first attempt would be to locate the table in a separate file and create it using any other scripting language/program/.... In this case there is quite some effort in creating the necessary source files for C.
My thought was now to use the preprocessor and compiler to handle things. I tried to implement this with a table of sine values (just as an example):
#include <avr/io.h>
#include <math.h>
#define S1(i,n) ((uint8_t) sin(M_PI*(i)/n*255))
#define S4(i,n) S1(i,n), S1(i+1,n), S1(i+2,n), S1(i+3,n)
uint8_t lut[] = {S4(0,4)};
void main()
{
uint8_t val, i;
for(i=0; i<4; i++)
{
val = lut[i];
}
}
If I compile this code I get warnings about the sin function. Further in the assembly there is nothing in the section .data. If I just remove the sin in the third line I get the data in the assembly. Clearly all information are available at compile time.
Can you tell me if there is a way to achieve what I intent: The compiler calculates as many values as offline possible? Or is the best way to go using an external script/program/... to calculate the table entries and add these to a separate file that will just be #included?
The general problem here is that sin call makes this initialization de facto illegal, according to rules of C language, as it's not constant expression per se and you're initializing array of static storage duration, which requires that. This also explains why your array is not in .data section.
C11 (N1570) §6.6/2,3 Constant expressions (emphasis mine)
A constant expression can be evaluated during translation rather than
runtime, and accordingly may be used in any place that a constant may
be.
Constant expressions shall not contain assignment, increment,
decrement, function-call, or comma operators, except when they are
contained within a subexpression that is not evaluated.115)
However as by #ShafikYaghmour's comment GCC will replace sin function call with its built-in counterpart (unless -fno-builtin option is present), that is likely to be treated as constant expression. According to 6.57 Other Built-in Functions Provided by GCC:
GCC includes built-in versions of many of the functions in the
standard C library. The versions prefixed with __builtin_ are always
treated as having the same meaning as the C library function even if
you specify the -fno-builtin option.
What you are trying is not part of the C language. In situations like this, I have written code following this pattern:
#if GENERATE_SOURCECODE
int main (void)
{
... Code that uses printf to write C code to stdout
}
#else
// Source code generated by the code above
... Here I paste in what the code above generated
// The rest of the program
#endif
Every time you need to change it, you run the code with GENERATE_SOURCECODE defined, and paste in the output. Works well if your code is self contained and the generated output only ever changes if the code generating it changes.
First of all, it should go without saying that you should evaluate (probably by experiment) whether this is worth doing. Your lookup table is going to increase your data size and programmer effort, but may or may not provide a runtime speed increase that you need.
If you still want to do it, I don't think the C preprocessor can do it straightforwardly, because it has no facilities for iteration or recursion.
The most robust way to go about this would be to write a program in C or some other language to print out C source for the table, and then include that file in your program using the preprocessor. If you are using a tool like make, you can create a rule to generate the table file and have your .c file depend on that file.
On the other hand, if you are sure you are never going to change this table, you could write a program to generate it once and just paste it in.

initializng a C array with repetitive data

I would like to initialize an array of structs, with the same element repetitively, ie
struct st ar[] = { {1,2}, {1,2}, {1,2} };
However I do NOT want to run any code for that, I wish that the layout of the memory upon program's execution would be like so, without any CPU instructions involved (it would increase boot time on very slow CPU and relatively large arrays).
This makes sense, when using the array as mini ad-hoc database (maps id to struct) and when one wishes to use a default value to all database values.
My best solution was to use something of the form
#define I {1,2}
struct st ar[SIZE_OF_ARRAY] = { I,I,I };
So that the compiler will warn me if I'm having too much or too little Is. But this is far from ideal.
I think there's no solution for that in ANSI-C, but I thought that maybe there's a macro-abuse, or gcc extension that would do the work. Ideally I would like a standard solution, but even compiler specific ones would suffice.
I thought That I would somehow be able to define a macro recursively so that I(27) would be resolved to 27 {1,2}s, but I don't think that's possible. But maybe I'm mistaken, is there any hack for that?
Maybe inline assemby would do the trick? It would be very easy to define such memory layout with MASM or TASM, but I'm not sure it is possible to embed memory layout instructions within C code.
Is there any linker trick that would lure it to initialize memory according to my orders?
PS
I know I can generate automatically C file with some script. Using custom scripts is not desirable. If I'd use a custom script, I'll invent a C-macro REP(count,exp,sep) and would write a mini-C-preprocessor to replace that with exp sep exp sep ... exp {exp appears count time}.
The boost preprocessor library (which works fine for C) could help.
#include <boost/preprocessor/repetition/enum.hpp>
#define VALUE(z, n, text) {1,2}
struct st ar[] = {
BOOST_PP_ENUM(27, VALUE, _)
};
#undef VALUE
If you wish to use it, you'll just need the boost/preprocessor directory from boost - it's entirely self contained.
Although, it does have some arbitrary limits on the number of elements (I think it's 256 repetitions in this case). There is an alternative called chaos which doesn't, but it's experimental and will only work for preprocessors that follow the standard precisely (GCC's does).
The easiest way I can think of is to write a script to generate the initialisers that you can include into your C code:
{1,2},
{1,2},
{1,2}
Then in your source:
struct st ar[] = {
#include "init.inc"
};
You can arrange your Makefile to generate this automatically if you like.
I'm guessing you can, on the file with the array definition, do:
#include "rep_array_autogen.c"
struct st ar[SIZE_OF_ARRAY] = REPARRAY_DATA;
and have your Makefile generate rep_array_autogen.c with a format like
#define SIZE_OF_ARRAY 3
#define REPARRAY_DATA {{1, 2}, {1, 2}, {1, 2}, }
Since rep_array_autogen.c is built on your machine, it would be just as fast as hand-coding it in there.
What you have in the form of Is is the best Standard C can give you. I don't understand how you can be averse to standard initialization code but okay with assembly. Embedding assembly is necessarily a non-standard non-portable mechanism.
The problem with recursive macros is that you would not know where to stop.
Elazar,
Can you give data to support your view that pre-initialized data is faster than run-time initialization?
If you are running code on a standard machine with Hard Disk Drive (HDD) and your program resides on the HDD, then the time to copy the data from data section of your binary image to RAM is essentially the same amount of time as using some type of run-time initialization.
Kip Leitner

Resources