is it possible to execute C code during C++ stack unwinding / exception - c

I need to write a C library which will be integrated into a C++ code base. This library may call C++ code passed as a callback. These functions may throw C++ exceptions.
I'd like to ensure that the cleanup code is run during the stack unwinding process. I could use the cleanup attribute to ensure that:
If -fexceptions is enabled, then cleanup_function is run during the stack unwinding that happens during the processing of the exception.
From the GCC docs.
Unfortunately I can't use the cleanup attribute. I'd like to register the cleanup function to be run during stack unwinding programmatically using portable C.
Is this possible?

I'd like to register the cleanup function to be run during stack unwinding programmatically using portable C.
Not possible in portable C.
The C11 standard n1570 does not even require any call stack and permit compiler optimizations not using it. In some cases, there is no "stack unwinding". Think of tail-call optimizations (try gcc -Wall -O3 -S -fverbose-asm with a recent GCC) and read this draft report explaining some gcc optimizations (work in progress in June 2020). If you think of C++, read n3337, its C++11 standard.
However, if you decide to use (specifically) a recent enough GCC (so GCC 10 in June 2020) you could consider using specific builtins or pragmas. GCC has a chapter about C language extensions and another one on C++ extensions and also one about invoking it.
You might even be interested in writing your GCC plugin, or in using its libgccjit or in reusing its libbacktrace by Ian Taylor.
On Linux, see also dlopen(3) and dlsym(3) and consider using Clang.
You could ask some help from e.g. AdaCore or on gcc-help#gcc.gnu.org public mailing list.

Related

Why would gcc change the order of functions in a binary?

Many questions about forcing the order of functions in a binary to match the order of the source file
For example, this post, that post and others
I can't understand why would gcc want to change their order in the first place?
What could be gained from that?
Moreover, why is toplevel-reorder default value is true?
GCC can change the order of functions, because the C standard (e.g. n1570 or newer) allows to do that.
There is no obligation for GCC to compile a C function into a single function in the sense of the ELF format. See elf(5) on Linux
In practice (with optimizations enabled: try compiling foo.c with gcc -Wall -fverbose-asm -O3 foo.c then look into the emitted foo.s assembler file), the GCC compiler is building intermediate representations like GIMPLE. A big lot of optimizations are transforming GIMPLE to better GIMPLE.
Once the GIMPLE representation is "good enough", the compiler is transforming it to RTL
On Linux systems, you could use dladdr(3) to find the nearest ELF function to a given address. You can also use backtrace(3) to inspect your call stack at runtime.
GCC can even remove functions entirely, in particular static functions whose calls would be inline expanded (even without any inline keyword).
I tend to believe that if you compile and link your entire program with gcc -O3 -flto -fwhole-program some non static but unused functions can be removed too....
And you can always write your own GCC plugin to change the order of functions.
If you want to guess how GCC works: download and study its source code (since it is free software) and compile it on your machine, invoke it with GCC developer options, ask questions on GCC mailing lists...
See also the bismon static source code analyzer (some work in progress which could interest you), and the DECODER project. You can contact me by email about both. You could also contribute to RefPerSys and use it to generate GCC plugins (in C++ form).
What could be gained from that?
Optimization. If the compiler thinks some code is like to be used a lot it may put that code in a different region than code which is not expected to execute often (or is an error path, where performance is not as important). And code which is likely to execute after or temporally near some other code should be placed nearby, so it is more likely to be in cache when needed.
__attribute__((hot)) and __attribute__((cold)) exist for some of the same reasons.
why is toplevel-reorder default value is true?
Because 99% of developers are not bothered by this default, and it makes programs faster. The 1% of developers who need to care about ordering use the attributes, profile-guided optimization or other features which are likely to conflict with no-toplevel-reorder anyway.

When to use -std=c11 while compiling a C source code using ubunto

I am trying to compile a C source code to a machine code using an ubunto terminal
My tutor instruction was to use the following command:
running clang myprogramm.c -std=c11
Why shall I use the keyword -std=c11 and what is the difference to using just
clang myprogramm.c
Using std= options is required by your tutor (I'm divinig her motives, I'm particularly good at this!) because she wants to make sure you stay away from all those nifty Clang features that turn the accepted language from C to A LANGUAGE SUPERFICIALLY LOOKING LIKE C BUT ACTUALLY A DIFFERENT LANGUAGE NOT SUPPORTED BY OTHER C COMPILERS.
That is more than just additional library functions. It include syntax changes that break the grammar of Standard C, as defined by ISO. A grasshopper should not use these while learning. Using -std=c11 makes sure Clang either warns about or even rejects, with an error, such constructs.
When to specify the standard? Whenever you use the compiler. It is never a good idea to let the compiler just use whatever it wants.
If someone tries to use a compiler that is too old, then they will get a warning or error, and they will understand why the compile fails.
If a code contributor (maybe even yourself!) tries to add code using features that are too new, their code will be rejected. That's very important if you intend to keep compatibility with an older standard.
By explicitly stating the standard, using new features or extensions are a choice and don't happen by accident.

How can I get the GCC compiler to not optimize a standard library function call like 'printf'?

Is there a way that GCC does not optimize any function calls?
In the generated assembly code, the printf function is replaced by putchar. This happens even with the default -O0 minimal optimization flag.
#include <stdio.h>
int main(void) {
printf("a");
return 0;
}
(Godbolt is showing GCC 9 doing it, and Clang 8 keeping it unchanged.)
Use -fno-builtin to disable all replacement and inlining of standard C functions with equivalents. (This is very bad for performance in code that assumes memcpy(x,y, 4) will compile to just an unaligned/aliasing-safe load, not a function call. And disables constant-propagation such as strlen of string literals. So normally you'd want to avoid that for practical use.)
Or use -fno-builtin-FUNCNAME for a specific function, like -fno-builtin-printf.
By default, some commonly-used standard C functions are handled as builtin functions, similar to __builtin_popcount. The handler for printf replaces it with putchar or puts
if possible.
6.59 Other Built-in Functions Provided by GCC
The implementation details of a C statement like printf("a") are not considered a visible side effect by default, so they aren't something that get preserved. You can still set a breakpoint at the call site and step into the function (at least in assembly, or in source mode if you have debug symbols installed).
To disable other kinds of optimizations for a single function, see __attribute__((optimize(0))) on a function or #pragma GCC optimize. But beware:
The optimize attribute should be used for debugging purposes only. It is not suitable in production code.
You can't disable all optimizations. Some optimization is inherent in the way GCC transforms through an internal representation on the way to assembly. See Disable all optimization options in GCC.
E.g., even at -O0, GCC will optimize x / 10 to a multiplicative inverse.
It still stores everything to memory between C statements (for consistent debugging; that's what -O0 really means); GCC doesn't have a "fully dumb" mode that tries to transliterate C to assembly as naively as possible. Use tcc for that. Clang and ICC with -O0 are somewhat more literal than GCC, and so is MSVC debug mode.
Note that -g never has any effect on code generation, only on the metadata emitted. GCC uses other options (mostly -O, -f*, and -m*) to control code generation, so you can always safely enable -g without hurting performance, other than a larger binary. It's not debug mode (that's -O0); it's just debug symbols.

what gcc compiler options can I use for gfortran

I studied Option Summary for gfortran but found no compiler option to detect integer overflow. Then I found the GCC (GNU Compiler Collection) flag option -fsanitize=signed-integer-overflow here and used it when invoking gfortran. It works--integer overflow can be detected at run time!
So what does -fsanitize=signed-integer-overflow do here? Just adding to the machine code generated by gfortran some machine-level pieces that check integer overflow?
What is the relation between GCC (GNU Compiler Collection) flag options and gfortran compiler options ? What gcc compiler options can I use for gfortran, g++ etc ?
There is the GCC - GNU Compiler Collection. It shares the common backend and middleend and has frontends for different languages. For example frontends for C, C++ and Fortran which are usually invoked by commands gcc, g++ and gfortran.
It is actually more complicated, you can call gcc on a Fortran source and gfortran on a C source and it will work almost the same with the exceptions of libraries being linked (there are some other fine points). The appropriate frontend will be called based on the file extension or the language requested.
You can look almost all GCC (not just gcc) flags for all of the mentioned frontends. There are certain flags which are language specific. Normally you will get a warning like
gfortran -fcheck=all source.c
cc1: warning: command line option ‘-fcheck=all’ is valid for Fortran but not for C
but the file will compile fine, the option is just ignored and you will get a warning about that. Notice it is a C file and it is compiled by the gfortran command just fine.
The sanitization options are AFAIK not that language specific and work for multiple languages implemented in GCC, maybe with some exceptions for some obviously language specific checks. Especially -fsanitize=signed-integer-overflow which you ask about works perfectly fine for both C and C++. Signed integer overwlow is undefined behaviour in C and C++ and it is not allowed by the Fortran standard (which effectively means the same, Fortran just uses different words).
This isn't a terribly precise answer to your question, but an aha! moment, when learning about compilers, is learning that gcc (the GNU Compiler Collection), like llvm, is an example of a three-stage compiler.
The ‘front end’ parses the syntax of whichever language you're interested, and spits out an Abstract Syntax Tree (AST), which represents your program in a language-independent way.
Then the ‘middle end’ (terrible name, but ‘the clever bit’) reorganises that AST into another AST which is semantically equivalent but easier to turn into machine code.
Then the ‘back end’ turns that reorganised AST into assembler for one-or-other processor, possibly doing platform-specific micro-optimisations along the way.
That's why the (huge number of) gcc/llvm options are unexpectedly common to (apparently wildly) different languages. A few of the options are specific to C, or Fortran, or Objective-C, or whatever, but the majority of them (probably) are concerned with the middle and last bits, and so are common to all of the languages that gcc/llvm supports.
Thus the various options are specific to stage 1, 2 or 3, but may not be conveniently labelled as such; with this in mind, however, you might reasonably intuit what is and isn't relevant to the particular language you're interested in.
(It's for this sort of reason that I will dogmatically claim that CC++FortranJavaPerlPython is essentially a single language, with only trivial syntactical and library minutiae to distinguish between dialects).

Can I run GCC as a daemon (or use it as a library)?

I would like to use GCC kind of as a JIT compiler, where I just compile short snippets of code every now and then. While I could of course fork a GCC process for each function I want to compile, I find that GCC's startup overhead is too large for that (it seems to be about 50 ms on my computer, which would make it take 50 seconds to compile 1000 functions). Therefore, I'm wondering if it's possible to run GCC as a daemon or use it as a library or something similar, so that I can just submit a function for compilation without the startup overhead.
In case you're wondering, the reason I'm not considering using an actual JIT library is because I haven't found one that supports all the features I want, which include at least good knowledge of the ABI so that it can handle struct arguments (lacking in GNU Lightning), nested functions with closure (lacking in libjit) and having a C-only interface (lacking in LLVM; I also think LLVM lacks nested functions).
And no, I don't think I can batch functions together for compilation; half the point is that I'd like to compile them only once they're actually called for the first time.
I've noticed libgccjit, but from what I can tell, it seems very experimental.
My answer is "No (you can't run GCC as a daemon process, or use it as a library)", assuming you are trying to use the standard GCC compiler code. I see at least two problems:
The C compiler deals in complete translation units, and once it has finished reading the source, compiles it and exits. You'd have to rejig the code (the compiler driver program) to stick around after reading each file. Since it runs multiple sub-processes, I'm not sure that you'll save all that much time with it, anyway.
You won't be able to call the functions you create as if they were normal statically compiled and linked functions. At the least you will have to load them (using dlopen() and its kin, or writing code to do the mapping yourself) and then call them via the function pointer.
The first objection deals with the direct question; the second addresses a question raised in the comments.
I'm late to the party, but others may find this useful.
There exists a REPL (read–eval–print loop) for c++ called Cling, which is based on the Clang compiler. A big part of what it does is JIT for c & c++. As such you may be able to use Cling to get what you want done.
The even better news is that Cling is undergoing an attempt to upstream a lot of the Cling infrastructure into Clang and LLVM.
#acorn pointed out that you'd ruled out LLVM and co. for lack of a c API, but Clang itself does have one which is the only one they guarantee stability for: https://clang.llvm.org/doxygen/group__CINDEX.html

Resources