What is this madness? - c

I've never seen anything like this; I can't seem to wrap my head around it. What does this code even do? It looks super fancy, and I'm pretty sure this stuff is not described anywhere in my C book. :(
union u;
typedef union u (*funcptr)();
union u {
funcptr f;
int i;
};
typedef union u $;
int main() {
int printf(const char *, ...);
$ fact =
($){.f = ({
$ lambda($ n) {
return ($){.i = n.i == 0 ? 1 : n.i * fact.f(($){.i = n.i - 1}).i};
}
lambda;
})};
$ make_adder = ($){.f = ({
$ lambda($ n) {
return ($){.f = ({
$ lambda($ x) {
return ($){.i = n.i + x.i};
}
lambda;
})};
}
lambda;
})};
$ add1 = make_adder.f(($){.i = 1});
$ mul3 = ($){.f = ({
$ lambda($ n) { return ($){.i = n.i * 3}; }
lambda;
})};
$ compose = ($){
.f = ({
$ lambda($ f, $ g) {
return ($){.f = ({
$ lambda($ n) {
return ($){.i = f.f(($){.i = g.f(($){.i = n.i}).i}).i};
}
lambda;
})};
}
lambda;
})};
$ mul3add1 = compose.f(mul3, add1);
printf("%d\n", fact.f(($){.i = 5}).i);
printf("%d\n", mul3.f(($){.i = add1.f(($){.i = 10}).i}).i);
printf("%d\n", mul3add1.f(($){.i = 10}).i);
return 0;
}

This example primarily builds on two GCC extensions: nested functions, and statement expressions.
The nested function extension allows you to define a function within the body of another function. Regular block scoping rules apply, so the nested function has access to the local variables of the outer function when it is called:
int outer(int x) {
int inner(int y) {
return x + y;
}
return inner(6);
}
...
int z = outer(4); // z == 10
The statement expression extension allows you to wrap up a C block statement (any code you would normally be able to place within braces: variable declarations, for loops, etc.) for use in a value-producing context. It looks like a block statement in parentheses:
int foo(int x) {
return 5 + ({
int y = 0;
while (y < 10) ++y;
x + y;
});
}
...
int z = foo(6); // z == 21 (5 + 6 + 10)
The last statement in the wrapped block provides the value. So it works pretty much like you might imagine an inlined function body.
These two extensions used in combination let you define a function body with access to the variables of the surrounding scope, and use it immediately in an expression, creating a kind of basic lambda expression. Since a statement expression can contain any statement, and a nested function definition is a statement, and a function's name is a value, a statement expression can define a function and immediately return a pointer to that function to the surrounding expression:
int foo(int x) {
int (*f)(int) = ({ // statement expression
int nested(int y) { // statement 1: function definition
return x + y;
}
nested; // statement 2 (value-producing): function name
}); // f == nested
return f(6); // return nested(6) == return x + 6
}
The code in the example is dressing this up further by using the dollar sign as a shortened identifier for a return type (another GCC extension, much less important to the functionality of the example). lambda in the example isn't a keyword or macro (but the dollar is supposed to make it look like one), it's just the name of the function (reused several times) being defined within the statement expression's scope. C's rules of scope nesting mean it's perfectly OK to reuse the same name within a deeper scope (nested "lambdas"), especially when there's no expectation of the body code using the name for any other purpose (lambdas are normally anonymous, so the functions aren't expected to "know" that they're actually called lambda).
If you read the GCC documentation for nested functions, you'll see that this technique is quite limited, though. Nested functions expire when the lifetime of their containing frame ends. That means they can't be returned, and they can't really be stored usefully. They can be passed up by pointer into other functions called from the containing frame that expect a normal function pointer, so they are fairly useful still. But they don't have anywhere near the flexibility of true lambdas, which take ownership (shared or total depends on the language) of the variables they close over, and can be passed in all directions as true values or stored for later use by a completely unrelated part of the program. The syntax is also fairly ungainly, even if you wrap it up in a lot of helper macros.
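To make the lifetime limitation concrete: a nested function borrows its parent's stack frame, so code that wants a function value to outlive that frame has to carry the captured variables itself. The sketch below is not what GCC does internally; it is plain ISO C, and `struct adder`, `make_adder` and `adder_call` are hypothetical names standing in for the `make_adder` lambda from the question.

```c
/* Hypothetical names throughout: an explicit environment struct replaces
   the variable that a nested function would borrow from its parent's frame. */
struct adder {
    int n; /* the "captured" variable, stored by value */
};

/* Safe to return: the struct is copied out, no stack frame is referenced. */
struct adder make_adder(int n) {
    struct adder a = { n };
    return a;
}

/* The closure "body" receives its environment as an explicit argument. */
int adder_call(const struct adder *self, int x) {
    return self->n + x;
}
```

Usage would look like `struct adder add1 = make_adder(1);` followed by `adder_call(&add1, 10)`. The cost is that callers must pass the environment around by hand, which is exactly the bookkeeping a true lambda does for you.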
C will most likely be getting true lambdas in the next version of the language, currently called C2x. You can read more about the proposed form here - it doesn't really look much like this (it copies the anonymous function syntax and semantics found in Objective-C). The functions created this way have lifetimes that can exceed their creating scope; the function bodies are true expressions, without the need for a statement-containing hack; and the functions themselves are truly anonymous, no intermediate names like lambda required.
A C2x version of the above example will most likely look something like this:
#include <stdio.h>
int main(void) {
typedef int (^ F)(int);
__block F fact; // needs to be mutable - block can't copy-capture
// its own variable before initializing it
fact = ^(int n) {
return n == 0 ? 1 : n * fact(n - 1);
};
F (^ make_adder)(int) = ^(int n) {
return _Closure_copy(^(int x) { return n + x; });
};
F add1 = make_adder(1);
F mul3 = ^(int n) { return n * 3; };
F (^ compose)(F, F) = ^(F f, F g) {
return _Closure_copy(^(int n) { return f(g(n)); });
};
F mul3add1 = compose(mul3, add1);
printf("%d\n", fact(5));
printf("%d\n", mul3(add1(10)));
printf("%d\n", mul3add1(10));
_Closure_free(add1);
_Closure_free(mul3add1);
return 0;
}
Much simpler without all that union stuff.
(You can compile and run this modified example in Clang right now - use the -fblocks flag to enable the lambda extension, add #include <Block.h> to the top of the file, and replace _Closure_copy and _Closure_free with Block_copy and Block_release respectively.)

Related

using function names as functions in a C macro

Suppose i have code like this in my program:
if (!strcmp(current, "sin")) {
pushFloat(sin(x), &operands);
} else if (!strcmp(current, "cos")) {
pushFloat(cos(x), &operands);
} else if (!strcmp(current, "tan")) {
pushFloat(tan(x), &operands);
} else if (!strcmp(current, "ctg")) {
pushFloat(1. / tan(x), &operands);
} else if (!strcmp(current, "ln")) {
pushFloat(log(x), &operands);
} else if (!strcmp(current, "sqrt")) {
pushFloat(sqrt(x), &operands);
}
There are function names such as "sin" or "cos" saved in the current char array.
Instead of using this long if block, or replacing it with an even longer switch block, I wanted to write a simple macro like this: #define PUSHFUNC(stack, func, value)(pushFloat(func(value), &stack)) and call it like this: PUSHFUNC(operands, current, x)
Doing it this way produces the error "current is not a function or function pointer". I initially thought macros are just text replacement, so if I passed a string that equals the name of an actual function into a macro, it would expand to the function itself, but it looks like I was wrong. Is there a way to achieve what I want using a macro, or should I just write a map block?
I initially thought macros are just text replacement,
That's your problem: macros are just text replacement. So if you have:
#define PUSHFUNC(stack, func, value) (pushFloat(func(value), &stack))
And you write:
PUSHFUNC(operands, current, x)
You get:
(pushFloat(current(x), &operands))
And indeed, you have no function named current. Macros are expanded before your code compiles; the preprocessor has no knowledge of the content of your variables.
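A small way to convince yourself of this is the stringizing operator, which shows exactly what text the preprocessor received. This is only an illustrative sketch; `SPELLING_OF` and `macro_sees_identifier` are made-up names, not part of the asker's program.

```c
#include <string.h>

/* Hypothetical helper macro: # turns the argument's tokens into a string. */
#define SPELLING_OF(arg) #arg

/* Returns 1 if the macro saw the identifier itself rather than its contents. */
int macro_sees_identifier(void) {
    const char *current = "sin"; /* run-time contents, as in the question */
    (void)current;               /* only the variable's name reaches the macro */
    return strcmp(SPELLING_OF(current), "current") == 0;
}
```

The macro argument is the spelling `current`, never the characters `sin`, because the string only exists once the program is running.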
If you really want to avoid a long chain of if statements, you could implement some sort of table lookup:
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <math.h>
typedef double (*floatop)(double x);
typedef struct {
char *name;
floatop operation;
} entry;
double ctg(double);
entry opertable[] = {
{"sin", sin},
{"cos", cos},
{"tan", tan},
{"ctg", ctg},
{"sqrt", sqrt},
{NULL, NULL},
};
double ctg(double x) {
return 1. / tan(x);
}
floatop findop(const char *name) {
int i;
for (i=0; opertable[i].name; i++) {
if (strcmp(opertable[i].name, name) == 0) {
return opertable[i].operation;
}
}
return NULL; // name not found in the table
}
int main() {
float x = 4;
printf("sin(%f) = %f\n", x, findop("sin")(x));
printf("sqrt(%f) = %f\n", x, findop("sqrt")(x));
printf("tan(%f) = %f\n", x, findop("tan")(x));
printf("ctg(%f) = %f\n", x, findop("ctg")(x));
}
...but this requires that all of your functions take the same arguments, so for things like ctg you would need to add a helper function. You also need to decide if the increased complexity of the table lookup makes sense: it really depends on how many different operation names you expect to implement.
The output of the above code is:
sin(4.000000) = -0.756802
sqrt(4.000000) = 2.000000
tan(4.000000) = 1.157821
ctg(4.000000) = 0.863691
Is there a way to achieve what I want using a macro, or should I just write a map block?
I would recommend using an enum containing symbols for all the functions you might want to call, and using that in a switch-case block, instead of comparing a bunch of strings. Here's a very brief sample that only uses some of the functions you refer to...
enum which_func { SIN, COS, TAN, };
enum which_func which = SIN;
switch (which) {
case SIN:
pushFloat(sin(x), &operands);
break;
case COS:
pushFloat(cos(x), &operands);
break;
case TAN:
pushFloat(tan(x), &operands);
break;
default:
assert(false); // shouldn't be reachable if enum value is well-defined
}
This version will be easier to maintain in the long run, more efficient to execute and possibly more robust to logic errors (there are some compiler warnings that you can enable which will warn you if you're not handling all enum values, which can help you catch missed cases in your logic).
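The sample above starts from an enum value, so the string still has to be translated once, typically when the token is parsed. One possible shape for that step is sketched here; `parse_func` and the extra `UNKNOWN_FUNC` enumerator are assumptions added for illustration.

```c
#include <string.h>

enum which_func { SIN, COS, TAN, UNKNOWN_FUNC };

/* Hypothetical helper: translate the token once, at parse time, so later
   code can switch on the enum instead of comparing strings again. */
enum which_func parse_func(const char *name) {
    /* must stay in the same order as the enumerators above */
    static const char *const names[] = { "sin", "cos", "tan" };
    for (int i = 0; i < (int)(sizeof names / sizeof names[0]); i++) {
        if (strcmp(names[i], name) == 0) {
            return (enum which_func)i;
        }
    }
    return UNKNOWN_FUNC;
}
```

With this in place the string comparisons happen once per token, and the switch can handle `UNKNOWN_FUNC` explicitly instead of relying on the assert.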
To add to what other answers said, what you can do is to make a macro that expands to the "basic block" of your if chain, avoiding some repetitions thanks to the stringizing operator:
#define HANDLE_FN_EXPR(fn, expr) \
else if(!strcmp(current, #fn)) \
pushFloat((expr), &operands)
#define HANDLE_FN(fn) \
HANDLE_FN_EXPR(fn, fn(x))
Then you can do
if(0);
HANDLE_FN(sin);
HANDLE_FN(cos);
HANDLE_FN(tan);
HANDLE_FN_EXPR(ctg, 1./tan(x));
HANDLE_FN(ln);
HANDLE_FN(sqrt);
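For anyone wondering how that chain fits together, here is a self-contained sketch of the same macros inside a function. The `pushFloat` stub, the `operands` variable and the helper functions `half` and `twice` are stand-ins invented for this example (they replace the math functions so the fragment compiles on its own), and `dispatch` is a hypothetical wrapper.

```c
#include <string.h>

/* Stand-ins (assumptions) so the fragment compiles on its own: the real
   pushFloat and operand stack come from the asker's program. */
static double last_pushed;
static int operands;
static void pushFloat(double v, int *stack) { (void)stack; last_pushed = v; }
static double half(double v) { return v / 2; }
static double twice(double v) { return v * 2; }

#define HANDLE_FN_EXPR(fn, expr) \
    else if (!strcmp(current, #fn)) \
        pushFloat((expr), &operands)
#define HANDLE_FN(fn) HANDLE_FN_EXPR(fn, fn(x))

/* Returns the pushed value, or -1 if the name matched nothing. */
double dispatch(const char *current, double x) {
    last_pushed = -1;
    if (0);                      /* empty head so every case can start with else */
    HANDLE_FN(half);             /* else if (!strcmp(current, "half")) pushFloat((half(x)), &operands); */
    HANDLE_FN(twice);
    HANDLE_FN_EXPR(inv, 1. / x); /* a name with no C function of its own */
    return last_pushed;
}
```

Note that the macros silently depend on variables named `current`, `x` and `operands` being in scope at the point of use, which is the main maintenance cost of this approach.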
Macros do in fact do text replacement. Given your macro definition, this:
PUSHFUNC(operands, current, x)
expands to this:
(pushFloat(current(x), &operands))
So as you can see, the text that is being replaced is the name of the variable, not the text that it contains.
And even if this did work as you expected, it wouldn't be able to properly handle the 1. / tan(x) case.
This means there isn't really a better way to do what you want.
Why not create some objects for each function type? I know, this is C not C++, but the idea will still work. First, create the function object type:-
typedef struct _Function
{
char *name;
float (*function) (float argument);
} Function;
And now create an array of function objects:-
Function functions [] =
{
{ "sin", sin },
{ "cos", cos }
// and so on
};
where the functions are defined:-
float sin(float x)
{
return 0; // put correct code here
}
float cos(float x)
{
return 0; // put correct code here
}
Finally, parse the input:-
for (int i = 0; i < sizeof functions / sizeof functions[0]; ++i)
{
if (strcmp(functions[i].name, current) == 0)
{
pushFloat(functions[i].function(x), &operands);
break;
}
}
I find using enums for stuff like this very hard to maintain! Adding a new function means going through the code to find every place where the enum is used and updating it, which is prone to errors (like missing a place!).
Just because it's not C++ doesn't mean you can't use objects! There's just no language support for them, so you have to do a bit more work (and, yeah, there are features missing!)

Can gcc/clang optimize initialization computing?

I recently wrote a parser generator tool that takes a BNF grammar (as a string) and a set of actions (as a function pointer array) and outputs a parser (= a state automaton, allocated on the heap). I then use another function to run that parser on my input data and generate an abstract syntax tree.
The initial parser generation involves quite a lot of steps, and I was wondering whether gcc or clang can optimize this, given constant inputs to the parser generation function (and never using the pointer values, only dereferencing them). Is it possible to run the function at compile time and embed the result (i.e., the allocated memory) in the executable?
(obviously, that would be using link time optimization, since the compiler would need to be able to check that the whole function does indeed have the same result with the same parameters)
What you could do in this case is have code that generates code.
Have your initial parser generator as a separate piece of code that runs independently. The output of this code would be a header file containing a set of variable definitions initialized to the proper values. You then use this file in your main code.
As an example, suppose you have a program that needs to know the number of bits that are set in a given byte. You could do this manually whenever you need:
int count_bits(uint8_t b)
{
int count = 0;
while (b) {
count += b & 1;
b >>= 1;
}
return count;
}
Or you can generate the table in a separate program:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(void)
{
    FILE *header = fopen("bitcount.h", "w");
    if (!header) {
        perror("fopen failed");
        exit(1);
    }
    fprintf(header, "int bit_counts[256] = {\n");
    unsigned v;
    for (v = 0; v < 256; v++) {
        int count = 0; // reset for every byte value
        uint8_t b = v;
        while (b) {
            count += b & 1;
            b >>= 1;
        }
        fprintf(header, " %d,\n", count);
    }
    fprintf(header, "};\n");
    fclose(header);
    return 0;
}
This creates a file called bitcount.h that looks like this:
int bit_counts[256] = {
0,
1,
1,
2,
...
8,
};
You can then include that file in your "real" code.

pass struct of arrays into function

I am trying to pass a struct of 2D arrays and to do calculations on them.
typedef struct{
float X[80][2];
float Y[80][2];
float Z[80][2];
int T[80][2];
int K[80];
} STATS;
void MovingAverage(STATS *stat_array, int last_stat) {
//Average = Average(Prev) + (ValueToAverage/n) - (Average(Prev)/n)
stat_array->X[last_stat][0] = stat_array->X[last_stat][0] +
(stat_array->X[last_stat][1] / stat_array->T[last_stat][0]) -
(stat_array->X[last_stat][0] / stat_array->T[last_stat][0]);
}
calling the function:
MovingAverage(*stat_array, last_stat);
My question is:
How do I access X, Y and Z in a generic way inside the MovingAverage function?
Edit:
void MovingAverage(STATS *stat_array, int last_stat, char *array_idx) {
//Average = Average(Prev) + (ValueToAverage/n) - (Average(Prev)/n)
stat_array->array_idx[last_stat][0] =
stat_array->array_idx[last_stat][0] +
(stat_array->array_idx[last_stat][1] /
stat_array->T[last_stat][0]) -
(stat_array->array_idx[last_stat][0] /
stat_array->T[last_stat][0]);
}
I know this won't compile; it is only meant to show what I am trying to do.
Somebody here (not me) could probably come up with some preprocessor magic to do what you're asking, but that is a solution I would not pursue. I consider it bad practice since macros can quickly get hairy and tough to debug. You can't have "variables" inside your source code, if that makes sense. During the build procedure, one of the first things that runs is the preprocessor, which resolves all your macros. It then passes that source code to the compiler. The compiler is not going to do any text substitutions for you, it cranks on the source code it has. To achieve what you want, write a function that operates on the type you want, and call that function with all your types. I'd change your MovingAverage function to something like this:
void MovingAverage(float arr[80][2], const int T[80][2], int last_stat)
{
arr[last_stat][0] = ... // whatever calculation you want to do here
}
int main(void)
{
STATS stat_array;
int last_stat;
// .. initialize stat_array and last_stat
// now call MovingAverage with each of your 3 arrays
MovingAverage(stat_array.X, stat_array.T, last_stat);
MovingAverage(stat_array.Y, stat_array.T, last_stat);
MovingAverage(stat_array.Z, stat_array.T, last_stat);
...
return 0;
}
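For completeness, here is one way the elided calculation could be filled in, using the formula from the question's comment. Treat it as a sketch: it assumes column 0 holds the running average, column 1 holds the new sample and `T[i][0]` holds the sample count, which is how the question's code reads. The `T` parameter is left non-const here to keep the call sites simple.

```c
typedef struct {
    float X[80][2];
    float Y[80][2];
    float Z[80][2];
    int T[80][2];
    int K[80];
} STATS;

/* Assumed layout: arr[i][0] is the running average, arr[i][1] the new
   sample, and T[i][0] the sample count n, as in the question. */
void MovingAverage(float arr[80][2], int T[80][2], int last_stat) {
    float n = (float)T[last_stat][0];
    /* Average = Average(Prev) + (ValueToAverage/n) - (Average(Prev)/n) */
    arr[last_stat][0] = arr[last_stat][0]
                      + arr[last_stat][1] / n
                      - arr[last_stat][0] / n;
}
```

Because the function only sees a `float [80][2]`, the same code serves X, Y and Z without any macro tricks; the caller picks the member.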

"Fake" OOP in C - how to deal with destructors and fake function epilogues on returning

I am doing some experiments with faking OOP in C, and I've stumbled upon a conundrum. In C++ I assume the compiler inserts destructors in the function epilogue, after the return statement has been executed.
Faking that in C would require the destructors be manually invoked in the appropriate order, but the problem is the return value might depend on some of those objects, so at one hand destruction cannot occur before the return statement, on the other hand statements after the return statements are never reached. And the issue becomes more complicated by the fact there might be multiple return statements from inside different blocks which require their own respective fake epilogues.
So the question is how can I possibly deal with it? It doesn't have to be nice, since it doesn't look like it can be...
So far the best I could come up with was to "cache" the return value at the point where it would be returned, do all the cleanup, and only then return the cached value. I wonder whether a more efficient solution exists and, as a side note, how well the compiler will handle this pattern and minimize its overhead. Something like:
T foo() {
T _retValue;
...
if (something) {
...
_retValue = someValue;
goto blockID_cleanup;
blockID_cleanup:
...
goto foo_cleanup; // goto parent block until function block
}
_retValue = somethingElse;
goto foo_cleanup;
foo_cleanup:
...
return _retValue;
}
Edit: Seems you're actually asking how objects are returned from functions, your question isn't 100% clear but here goes:
class A
{
public:
A(int value)
: mTest(value) {}
A operator + (const A& other)
{
return mTest + other.mTest;
}
operator int()
{
return mTest;
}
private:
int mTest = 0;
};
int foo()
{
A a(2);
A aa(4);
return a + aa;
}
This would become the following pseudocode:
int foo()
{
A a;
A aa;
a_ctor(&a, 2);
a_ctor(&aa, 4);
A temp;
a_copy(temp, a_operator_plus(a, aa)); // temp is another "instance"
// no need to worry about the dtors, the return value references nothing from these objects that isn't in scope anymore. If it did then this would be an error even in C++, so don't worry about that
a_dtor(&aa);
a_dtor(&a);
return temp.mTest;
}
C++ "generated" code will not call dtors "after" the return statement. Dtors are called just like any other function.
Assume the C++ code is:
class A
{
public:
A(const A&) = delete;
A& operator = (const A&) = delete;
A()
{
std::cout << "A ctor" << std::endl;
mExampleBuffer = new char[128]; // allocate resources example, we don't do anything with this..
}
~A()
{
std::cout << "A dtor" << std::endl;
delete[] mExampleBuffer;
}
private:
char* mExampleBuffer = nullptr; // in real code this would be a std::vector or std::unique_ptr
};
Then used as:
void foo()
{
A a;
return; // not required, but here for clarity
}
Then in C this would be:
typedef struct A
{
// there is no "private" in C, so we need people to read this comment and not mess with mExampleBuffer
char* mExampleBuffer;
} A; // the typedef lets the code below write A instead of struct A
void a_ctor(A* thisPtr)
{
printf("A ctor\n");
thisPtr->mExampleBuffer = malloc(sizeof(char)*128);
if (!thisPtr->mExampleBuffer)
{
// TODO: In C++ this would throw; in C you would have to use setjmp/longjmp or similar to simulate it, plus some sort of "cleanup stack" to do the unwinding
}
}
void a_dtor(A* thisPtr)
{
printf("A dtor\n");
free(thisPtr->mExampleBuffer);
}
void foo()
{
A a = {0};
a_ctor(&a);
a_dtor(&a); // nothing magic here, simply called before the return statement
return;
}
As you can see for lots of classes using "real" C++ with RAII this would become a complete nightmare.. also you're not taking into account that the actual generated code would probably inline this so that there is no "class", i.e it would look something like:
void foo()
{
printf("A ctor\n");
char* mExampleBuffer = malloc(sizeof(char)*128); // not sure if would remove this or not since not used :) didn't check
printf("A dtor\n");
free(mExampleBuffer);
return;
}
Hopefully this explains the dtor mechanism. Don't forget that with inheritance each dtor must call the base.
I'd like to illustrate a way to return a complex object in C by mimicking move semantics, to expand on Peter G.'s answer.
#include <stdlib.h>
#include <string.h>
typedef struct T {
    char *data;
} T;
void swap(T *a, T *b) {
    char *tmp = a->data;
    a->data = b->data;
    b->data = tmp;
}
void destruct(T *d) {
    free(d->data);
}
void foo(T *rv) {
    T x = {malloc(sizeof "Valuable data")}; // heap copy, so destruct() may free it
    if (x.data) strcpy(x.data, "Valuable data");
    swap(rv, &x); // This is what return in C++ does
    destruct(&x); // This happens when the function scope in C++ ends
}
void bar(void) {
    T holder = {0};
    foo(&holder);
    destruct(&holder);
}
Notice how allocation and deallocation of an object are always in the same scope.
In C++, a value returned from a function must not refer to memory of local objects, that would be an error. So, to me it looks like you're possibly trying to solve a problem not even a C++ compiler has to solve.
If, on the other hand, you simply want to return a value computed by one of the local objects, first assign that value to a local variable, destruct the object, and then return the saved value.
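A minimal sketch of that "save the value, clean up, then return" pattern, using a single exit path with goto. Everything here is hypothetical (`Buffer`, `buffer_ctor`, `buffer_dtor`, `combined_length` and the `live_buffers` counter are made up for illustration); the counter exists only to show that every constructed object gets destroyed.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *text;
} Buffer; /* hypothetical "class" that owns heap memory */

static int live_buffers; /* constructed-but-not-yet-destroyed objects */

static int buffer_ctor(Buffer *b, const char *s) {
    b->text = malloc(strlen(s) + 1);
    if (!b->text) return 0;
    strcpy(b->text, s);
    live_buffers++;
    return 1;
}

static void buffer_dtor(Buffer *b) {
    free(b->text);
    b->text = NULL;
    live_buffers--;
}

/* Returns the combined length of two strings, or -1 on allocation failure. */
int combined_length(const char *s1, const char *s2) {
    int ret = -1; /* cached return value */
    Buffer a, b;
    if (!buffer_ctor(&a, s1)) goto done;
    if (!buffer_ctor(&b, s2)) goto destroy_a;
    ret = (int)(strlen(a.text) + strlen(b.text)); /* computed while both are alive */
    buffer_dtor(&b); /* destructors run in reverse order of construction */
destroy_a:
    buffer_dtor(&a);
done:
    return ret;
}
```

The labels are ordered so that an early failure jumps past the destructors of objects that were never constructed, which mirrors the order a C++ compiler would use for the real thing.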

When is CAMLparamX required?

I am writing an interface to a C-library using external declarations in OCaml. I used ctypes for testing but it involved a 100% overhead for fast calls (measured by a core_bench micro benchmark).
The functions look like this:
/* external _create_var : float -> int -> int -> int -> _npnum = "ocaml_tnp_number_create_var" ;; */
value ocaml_tnp_number_create_var(value v, value nr, value p, value o) {
//CAMLparam4(v, nr, p, o);
const int params = Int_val(p);
const int order = Int_val(o);
const int number = Int_val(nr);
const double value = Double_val(v);
return CTYPES_FROM_PTR(tnp_number_create_variable(value, number, params, order));
}
/* external _delete : _npnum -> unit = "ocaml_tnp_number_delete" ;; */
value ocaml_tnp_number_delete(value num) {
//CAMLparam1(num);
struct tnp_number* n = CTYPES_TO_PTR(num);
tnp_number_delete(n);
return Val_unit;
}
I borrowed the CTYPES_* macros, so I am basically moving pointers around as Int64 values.
#define CTYPES_FROM_PTR(P) caml_copy_int64((intptr_t)P)
#define CTYPES_TO_PTR(I64) ((void *)Int64_val(I64))
#define CTYPES_PTR_PLUS(I64, I) caml_copy_int64(Int64_val(I64) + I)
AFAIK, those values are represented as boxes which are tagged as "custom", which should be left untouched by the GC.
Do I need to uncomment the CAMLparamX macros to notify the GC about my usage or is it legal to omit them?
According to the comment in byterun/memory.h your function must start with a CAMLparamN macro with all value parameters.

Resources