Fake anonymous functions in C - c

In this SO thread, Brian Postow suggested a solution involving fake anonymous functions:
make a comp(L) function that returns the version of comp for arrays of length L... that way L becomes a parameter, not a global
How do I implement such a function?

See the answer I just posted to that question. You can use the callback(3) library to generate new functions at runtime. It's not standards compliant, since it involves lots of ugly platform-specific hacks, but it does work on a large number of systems.
The library takes care of allocating memory, making sure that memory is executable, and flushing the instruction cache if necessary, in order to ensure that code which is dynamically generated (i.e. the closure) is executable. It essentially generates stubs of code that might look like this on x86:
pop %ecx
push $THUNK
push %ecx
jmp $function
THUNK:
.long $parameter
And then returns the address of the first instruction. What this stub does is stores the the return address into ECX (a scratch register in the x86 calling convention), pushes an extra parameter onto the stack (a pointer to a thunk), and then re-pushes the return address. Then, it jumps to the actual function. This results in the function getting fooled into thinking it has an extra parameter, which is the hidden context of the closure.
It's actually more complicated than that (the actual function called at the end of the stub is __vacall_r, not the function itself, and __vacall_r() handles more implementation details), but that's the basic principle.

I don't believe you can do that with C99 -- there's no partial application or closure facility available unless you start manually generating machine code at runtime.
Apple's recently proposed blocks would work, though you need compiler support for that. Here's a brief overview of blocks. I have no idea when/if any vendor outside apple will support them.

It is not possible to generate ordinary functions during run-time in either C or C++. What Brian suggested is based on a big "if": "...if you can fake anonymous functions...". And the answer to that "if" is: no, you can't. (Although it is not clear what he meant by "fake".)
(In C++ it is possible to generate function-like objects at run time, but not ordinary functions.)
The above applies to standard C and C++ languages. Particular implementations can support various implementation-provided extensions and/or manually-implemented hacks, like "closures", "delegates" and similar stuff. Nothing of that, of course, have anything to do with standard C/C++ languages.

Related

Why calling conventions aren't used in all C programs

I am new to programming and while reading Charles Petzold book Programming Windows, I stumbled upon WINAPI (actually was surprised by the presence of another word before a function's name besides the return type) and found that it is a calling convention and to the best of my understanding it is a way of how a function pushes variables on the stack and gets the return value, I wondered why we do not use them in every C programs? Are they just exclusive to OS programming?
Calling conventions are typically tied to compiler, architecture and (when it comes to using system runtime libraries) the OS; they're not part of the C standard at all. In most cases, there's only one calling convention for a given architecture/compiler/OS combo, so you don't need to think about it; it just uses the only convention that OS supports.
The one place where it has mattered a lot in recent history was on 32 bit x86 systems, particularly on Windows. x86 had very few general purpose registers, so only a few were available at all, less than what a typical function might need for its arguments, and using them for argument passing meant you often needed to push whatever they used to contain to the stack, so there were a lot of trade-offs involved in calling conventions (it's faster to pass arguments in registers, but only if the caller could spare the registers or at least not be forced into excessive spilling to stack), and Windows went with "we'll use 'em all in different scenarios".
In modern usage on x86-64 (which is far less register starved) and on non-x86 architectures (which usually had enough registers), most compilers/OSes stick with a single common calling convention, so again, you don't need to pay attention. It's a curiosity, not something you need to personally pay attention to unless you're hand-writing whole functions in assembly.
We DO use calling conventions in all C programs, but they are typically defaulted in the compiler settings and so do not have to be expressed explicitly in code, unless actually necessary (library interactions, etc).
Calling conventions are not part of the C language itself, but are handled by compiler vendors as extensions to the language.
Most C compilers typically default to the __cdecl calling convention, but this can be changed by the compiler user if needed. Once upon a time, Windows APIs used to use __pascal, but for a very long time now __stdcall is being used instead. Hence the existence of the WINAPI preprocessor macro so Microsoft could switch between them without requiring most existing code to be rewritten.

What is not allowed in restricted C for ebpf?

From bpf man page:
eBPF programs can be written in a restricted C that is compiled
(using the clang compiler) into eBPF bytecode. Various features are
omitted from this restricted C, such as loops, global variables,
variadic functions, floating-point numbers, and passing structures as
function arguments.
AFAIK the man page it's not updated. I'd like to know what is exactly forbidden when using restricted C to write an eBPF program? Is what the man page says still true?
It is not really a matter of what is “allowed” in the ELF file itself. This sentence means that once compiled into eBPF instructions, your C code may produce code that would be rejected by the verifier. For example, loops in BPF programs have long been rejected in BPF programs, because there was no guaranty that they would terminate (the only workaround was to unroll them at compile time).
So you can basically use pretty much whatever you want in C and produce successfully an ELF object file. But then you want it to pass the verifier. What components will surely result in the verifier complaining? Let's have a look at the list from man page:
Loops: Linux version 5.3 introduces support for bounded loops, so loops now work to some extent. “Bounded loops” means loops for which the verifier has a way to tell they will eventually finish: typically, a for (i = 0; i < CONSTANT; i++) kind loop should work (assuming i is not modified in the block).
Global variables: There has been some work recently to support global variables, but they are processed in a specific way (as single-entry maps) and I have not really experimented with them, so I don't know how transparent this is and if you can simply have global variables defined in your program. Feel free to experiment :).
Variadic functions: Pretty sure this is not supported, I don't see how that would translate in eBPF at the moment.
Floating point numbers: Still not supported.
Passing structure as function arguments: Not supported, although passing pointers to structs should work I think.
If you are interested in this level of details, you should really have a look at Cilium's documentation on BPF. It is not completely up-to-date (only the very new features are missing), but much more complete and accurate than the man page. In particular, the LLVM section has a list of items that should or should not work in C programs compiled to eBPF. In addition to the aforedmentioned items, they cite:
(All function needing to be inlined, no function calls) -> This one is outdated, BPF has function calls.
No shared library calls: This is true. You cannot call functions from standard libraries, or functions defined in other BPF programs. You can only call into functions defined in the same BPF programs, or BPF helpers implemented in the kernel, or perform “tail calls”.
Exception: LLVM built-in functions for memset()/memcpy()/memmove()/memcmp() are available (I think they're pretty much the only functions you can call, other than BPF helpers and your other BPF functions).
No const string or arrays allowed (because of how they are handled in the ELF file): I think this is still valid today?
BPF program stack is limited to 512 bytes, so your C program must not result in an executable that attempts to use more.
Additional allowed or good-to-know items are listed. I can only encourage you to dive into it!
Let's go over these:
Variadic functions, floating-point numbers, and passing structures as function arguments are still not possible. As far as I know, there is no ongoing work to support these.
Global variables should be supported in kernels >=5.2, thanks to a recent patchset by Daniel Borkmann.
Unbounded loops are still unsupported. There is limited support for bounded loops in kernels >=5.3. I'm using "limited" here because a small program may still be rejected if the loop is too large.
The best way to know what the last versions of Linux allow is to read the verifier's code and/or to follow the bpf mailing list (you might also want to follow the netdev mailing list as some patchsets may still land there). I've also found checking patchwork to be quite efficient as it more clearly outlines the state of each patchset.

From Compiler to assembler

I have a question regarding the assembler. I was thinking of how the C function that takes multiple parameters as an argument is transformed into assembly. So my question is, is there a subroutine in assembly that takes arguments as a parameter to operate?
The code might look something like this:
Call label1, R16.
Where R16 is the subroutine input parameter.
If that's not the case then that means that EACH time the C function is called, it gets assembled into a subroutine with the parameters related to the specific call being substituted automatically in it. That basically means that whenever a C function is called, the compiler transforms it into an inline function which am sure is not the case either :D
So which is right?
Thanks alot! :)
The compiler uses a "calling convention" which can be specific to that one compiler for that one target architecture (x86, arm, mips, pdp-11, etc). For architectures with "plenty" of general purpose registers, the calling convention often starts with passing parameters in registers, and then uses the stack, for architectures with not a lot of registers the stack is primarily if not completely used for parameter passing and the return.
The calling convention is a set of rules, such that if everyone follows the rules you can compile functions into objects and link them with other objects and they will be able to call each others functions or call themselves.
So it is a bit of a hybrid of what you were assuming. The code built for that function is in some respects custom to that function as the number and type of parameters dictate what registers or how much stack is consumed and how. At the same time all functions conform to the same formula so they look more alike than different.
On an arm for example you might have three integers being passed in to a function, they would for all the arm calling conventions I have seen (generally you find that even though it could vary across compilers it often doesnt or in the case of arm and mips and some others they try to dictate the convention for everyone rather than the compiler folks trying to do it) the first parameter in the C function would come in in r0, the second in r1 and third in r2. If the first parameter were a 64 bit integer though then r0 and r1 are used for that first parameter and r2 gets the second and r3 the third, after r3 you use the stack, ordering of parameters on the stack is also dictated by the convention. So when a caller or a callee's code is compiled using the same C prototype then both sides know exactly where to find the parameters and construct the assembly language to do that.
There might be some minimal options in some instruction sets, but in general that is not the case.
Some assemblers have macros though that mimic procedural calls (usually with only a few registrable basetypes).
And no, only in the case of inline functions a new function is generated with the parametrised with the parameters substituted.
A compiler doesn't generate code for a procedure by textual substitution of parameters, but by putting all relevant parameters in registers or on the stack in a fixed regime called the "calling convention".
The code that calculates and loads the parameters (in registers or on stack) is generated for each invocation, and the procedure/function remains unmodified and loads the parameters from where it knows it can find them

Emulating lambdas in C?

I should mention that I'm generating code in C, as opposed to doing this manually. I say this because it doesn't matter too much if there's a lot of code behind it, because the compiler should manage it all. Anyway, how would I go around emulating a lambda in C? I was thinking I could just generate a function with some random name somewhere in the source code and then call that? I'm not too sure. I haven't really tried anything just yet, since I wanted to get the idea down before I implement it.
Is there some kind of preprocessor directive I can do, or some macro that will make this cleaner to do? I've been inspired by Jon Blow to try out compiler development, and he seemed to implement Lambdas in his language Jai. However, I think he does something where he generates bytecode, and then into C? I'm not sure.
Edit:
I'm working on a compiler, the compiler is just a project of mine to keep me busy, plus I wanted to learn more about compilers. I primarily use clang, I'm on Ubuntu 14.10. I don't have any garbage collection, but I wanted to try my hand at some kind of smart pointer-y/rust/ARC inspired memory model for garbage collection, i.e. little to no overhead. I chose C because I wanted to dabble in it more. My project is free software, just a hobby project.
There are several ways of doing it ("having" lambdas in C). The important thing to understand is that lambdas give closures and that closures are mixing "code" with "data" (the closed values); notice that objects are also mixing "code" with "data" and there is a similarity between objects and closures. See also this answer on Programmers.
Traditionally, in C, you not only use function pointers, but you adopt a convention regarding callbacks. This for instance is the case with GTK: every time you pass a function pointer, you also pass some data with it. You can view callbacks (the convention of giving C function pointer with some void*data) as a way to implement closures.
Since you generate C code (which is a wise idea, I'm doing similar things in MELT which -on Linux- generates C++ code at runtime, compile it into a shared object, and dlopen-s that) you could adopt a callback convention and pass some closed values to every function that you generate.
You might also consider closed values as static variables, but this approach is generally unwise.
There have been in the past some lambda.h header library which generates a machine-specific trampoline code for closures (essentially generating a code which pushes some closed values as arguments then call some routine). You might use some JIT compilation techniques (using libjit, GNU lightning, LLVM, asmjit, ....) to do the same. See also libffi to call an arbitrary function (of signature known at runtime only).
Notice that there is a strong -but indirect- relation between closures and garbage collection (read the GC handbook for more), and it is not by accident that every functional language has a GC. C++11 lambda functions are an exception on this (and it is difficult to understand all the intricacies of memory management of C++11 closures). So if you are generating C code, you could and probably should use Boehm's conservative garbage collector (which is wrapping dlopen) and you would have closure GC-ed values. (You could use some other GC libraries, e.g. Ravenbrook's MPS or my unmaintained Qish...) Then you could have the convention that every generated C function takes its closure as first argument.
I would suggest to read Scott's book on Programming Language Pragmatics and (assuming you know a tiny bit of Scheme or Lisp; if you don't you should learn a bit of Scheme and read SICP) Queinnec's book Lisp In Small Pieces (if you happen to read French, read the latest French variant).

Mixing Assembly language and C programs

I am using a bootloader program which is in Assembly and I am calling a C function frequently to SEND and RECEIVE a Character at a time. The controller I am using seems to have just 3 general purpose registers which it uses frequently. Apart from that I am storing some bytes in fixed RAM locations.
SO, my question is:
Will C function overwrite these RAM location, which were defined in Assembly?
I am doing PUSH and PULL of the concerned registers before going and after coming from these C functions.
If I understand your question correctly, you are concerned about the RAM locations used in your assembly module overlapping with some variable declared in a C module. You can examine the list file output by your linker to determine if this is the case. The linker list file will show all of the RAM addresses used by your C modules which you can compare to the fixed RAM locations used in the assembly module.
Note that if your linker does not produce a list file automatically, you will have to read through your linker's documentation to find the right command line option to do so.
As long as you are keeping the previous values on the stack when doing the c calls you should be fine. Just make sure that you are pushing onto stack before the call and popping off the stack after returning.
It all depends on the C calling convention that the C code was compiled in. Calling convention is how the caller and callee will communicate with regards to passing data into the function and returning values afterwards. This includes who wil do stuff like back up registers onto the stack before/after calling, will it be necessary to prep the registers before calling the C function, can you guarantee that the registers will return the way they were, etc.
You'll need to find out how the C code was compiled (with what Calling Convention setting). Note that this is also architecture specific. A summary of the different calling conventions and a description of what each entails can be found at Wikipedia here:
http://en.wikipedia.org/wiki/Calling_convention
http://en.wikipedia.org/wiki/X86_calling_conventions
On x86, cdecl and stdcall are the most popular conventions. cdecl means your ASM code should do the cleanup, while stdcall says the function being called is responsible for it. If you have the source code for the C function, I would suggest passing the necessary flags to the compiler to make it a "Callee cleanup" convention (usually stdcall, but safecall and fastcall are also options) which means you can safely call the C function without worrying about register corruption.

Resources