GCC plugin: copying function's arguments - c

I develop a GCC plugin that instruments the applications being compiled. The applications are written in C and are built with GCC 4.7 (4.8 and 4.9 are also an option) on an x86 Linux system.
My plugin implements a compilation pass which is placed after "ssa" standard pass and operates on GIMPLE representation. Among other things, I need to implement the following but currently cannot figure out how to do it correctly.
When processing a C function, I need to insert the code at its beginning that copies its arguments to the local variables I create, for future processing.
My first naive implementation looked as follows:
tree p;
gimple_seq seq = NULL;
gimple_stmt_iterator gsi = gsi_start_bb(single_succ(ENTRY_BLOCK_PTR));
for (p = DECL_ARGUMENTS(current_function_decl); p; p = DECL_CHAIN(param)) {
tree copy_par;
copy_par = create_tmp_var(TREE_TYPE(p), NULL);
add_referenced_var(copy_par);
copy_par = make_ssa_name(copy_par, NULL);
g = gimple_build_assign(copy_par, p);
SSA_NAME_DEF_STMT(copy_par) = g;
gimple_seq_add_stmt_without_update (&seq, g);
... // more processing here
}
...
gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
This way, however, an invalid assignment of the parameter declaration to a variable is created according to a dump:
gimple_assign <parm_decl, D.2206_11, par, NULL>
D.2206_11 corresponds to the local variable I created, par - the argument of the function I want to copy.
GCC crashes at some later pass as a result, trying to process this added statement. I suppose this is because p is not the variable holding the value of the respective argument but a declaration of that variable. Is it this way? And how to get that variable?
I tried using gimple_build_assign_with_ops(NOP_EXPR, copy_par, p, NULL_TREE) instead of gimple_build_assign() but it did not do either. GCC still crashes at the same place. I can provide the backtrace but I feel I am just missing something fundamental.
I also looked at traversal of trees starting from TYPE_ARG_TYPES (TREE_TYPE (current_function_decl)) and further via TREE_CHAIN(...) but that seems to give the types of the arguments rather than the respective variables.
So, the question is, how to add copying of the function's arguments properly.
Note
Perhaps, this could be done with the help of MELT or GCC Python plugin, but in this project, I need to perform all transformations of the code using only what GCC itself provides.

I found that copying of the arguments works with GCC 4.8 and 4.9 but not with 4.7. Here is the code that works for me now (instrument_fentry() performs the copying).
To avoid the copies being removed at the later compile passes, I made the corresponding variables volatile.
In addition, if there were no SSA_NAME with default definition statement specified for a given parameter (see SSA_NAME_IS_DEFAULT_DEF()), I added such SSA_NAMEs with GIMPLE_NOPs as the defining statements.
This has worked OK with GCC 4.8 and 4.9 so far, so it looks like a bug in GCC 4.7 or some change of rules between GCC 4.7 and 4.8. I could not pinpoint the exact commit though.

Related

A more compact way of passing a variable number of arguments in C [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 years ago.
Improve this question
I would like to call a function (called here just function) that accepts a variable list of arguments without knowing in advance how many arguments will be needed.
I have come up with this:
int param_num
char **param // initialized and populated somewhere else
...
if (param_num == 0) result = function();
else if (param_num == 1) result = function(param[0]);
else if (param_num == 2) result = function(param[0],param[1]);
...
The above code is just a proof of concept and is not intended to be compilable. The actual function has at least one fixed argument. I cannot change the function because it belongs to an external library. The actual code is more complex and is working as expected but...
My question is: is there a more compact way of writing the same code without touching "function"?
PS working in Linux with gcc 7
PPS it is just a curiosity. There is no real problem that I am trying to solve. The above code is working as expected. Just wondering if there were a way to make it prettier.
Thanks
Read about <stdarg.h> and variadic functions. On Linux, see also stdarg(3). It is how you can define in C such functions of variable arity (like printf). The first few arguments should be fixed, and of known type, so you cannot have exactly what you want (with a call like function() without arguments). So a variadic function is required to have at least one first fixed-type argument (which usually determines, like in printf, the type and number of following variadic arguments). So the C language specification forbids to call the same function without any arguments and with two of them.
Be also aware of the ABI of your system. It defines calling conventions (which usually vary with the signature of the called function) For 64 bits x86 on Linux, see this (floating point and pointer arguments get passed in different registers).
GCC (specifically) has also builtins for constructing function calls, but the C11 standard n1570 don't have anything like that.
It is unclear what you exactly want to achieve, but in some cases you might generate code (e.g. pedantically use metaprogramming) at runtime. There are several ways to achieve that on Linux:
You could generate (in a file) some C code in /tmp/emittedcode.c, fork its compilation as a plugin (by running gcc -fPIC -shared -Wall -O -g /tmp/emittedcode.c -o /tmp/generatedplugin.so) then load that plugin using dlopen(3) (as void*dlh = dlopen("/tmp/generatedplugin.so", RTLD_NOW); but you should handle failure) and dlsym(3) of symbols there. My manydl.c example shows that you can practically do that for many hundred thousands of generated plugins.
You could use some JIT-compiling library like libgccjit or asmjit or libjit, or GNU lightning, etc...
In both cases, you'll then (conceptually) add a new code segment to your process and get the address of a freshly created function there into some function pointer.
Be also aware of closures and anonymous functions, and notice that C don't have them (but have to use callbacks). Read SICP.
As commented by Antti Haapala, look also into libffi (it enables you to call an arbitrary function with arbitrary arguments by "interpreting" that call, which you would describe by ad-hoc data structures).
If you exactly know what type of args you'll need to pass, you can create a structure with possible args defined in it
you can then pass the structure and do your stuff
struct args {
define args....
}
.
.
struct args* a;
result = function(a);
keep in mind that you'll have to add some checks in the function to see what args are passed.

How to compile and keep "unused" C declarations with clang -emit-llvm

Context
I'm writing a compiler for a language that requires lots of runtime functions. I'm using LLVM as my backend, so the codegen needs types for all those runtime types (functions, structs, etc) and instead of defining all of them manually using the LLVM APIs or handwriting the LLVM IR I'd like to write the headers in C and compile to the bitcode that the compiler can pull in with LLVMParseBitcodeInContext2.
Issue
The issue I'm having is that clang doesn't seem to keep any of the type declarations that aren't used by any any function definitions. Clang has -femit-all-decls which sounds like it's supposed to solve it, but it unfortunately isn't and Googling suggests it's misnamed as it only affects unused definitions, not declarations.
I then thought perhaps if I compile the headers only into .gch files I could pull them in with LLVMParseBitcodeInContext2 the same way (since the docs say they use "the same" bitcode format", however doing so errors with error: Invalid bitcode signature so something must be different. Perhaps the difference is small enough to workaround?
Any suggestions or relatively easy workarounds that can be automated for a complex runtime? I'd also be interested if someone has a totally alternative suggestion on approaching this general use case, keeping in mind I don't want to statically link in the runtime function bodies for every single object file I generate, just the types. I imagine this is something other compilers have needed as well so I wouldn't be surprised if I'm approaching this wrong.
e.g. given this input:
runtime.h
struct Foo {
int a;
int b;
};
struct Foo * something_with_foo(struct Foo *foo);
I need a bitcode file with this equivalent IR
runtime.ll
; ...etc...
%struct.Foo = type { i32, i32 }
declare %struct.Foo* #something_with_foo(%struct.Foo*)
; ...etc...
I could write it all by hand, but this would be duplicative as I also need to create C headers for other interop and it'd be ideal not to have to keep them in sync manually. The runtime is rather large. I guess I could also do things the other way around: write the declarations in LLVM IR and generate the C headers.
Someone else asked about this years back, but the proposed solutions are rather hacky and fairly impractical for a runtime of this size and type complexity: Clang - Compiling a C header to LLVM IR/bitcode
Clang's precompiled headers implementation does not seem to output LLVM IR, but only the AST (Abstract Syntax Tree) so that the header does not need to be parsed again:
The AST file itself contains a serialized representation of Clang’s
abstract syntax trees and supporting data structures, stored using the
same compressed bitstream as LLVM’s bitcode file format.
The underlying binary format may be the same, but it sounds like the content is different and LLVM's bitcode format is merely a container in this case. This is not very clear from the help page on the website, so I am just speculating. A LLVM/Clang expert could help clarify this point.
Unfortunately, there does not seem to be an elegant way around this. What I suggest in order to minimize the effort required to achieve what you want is to build a minimal C/C++ source file that in some way uses all the declarations that you want to be compiled to LLVM IR. For example, you just need to declare a pointer to a struct to ensure it does not get optimized away, and you may just provide an empty definition for a function to keep its signature.
Once you have a minimal source file, compile it with clang -O0 -c -emit-llvm -o precompiled.ll to get a module with all definitions in LLVM IR format.
An example from the snippet you posted:
struct Foo {
int a;
int b;
};
// Fake function definition.
struct Foo * something_with_foo(struct Foo *foo)
{
return NULL;
}
// A global variable.
struct Foo* x;
Output that shows that definitions are kept: https://godbolt.org/g/2F89BH
So, clang doesn't actually filter out the unused declarations. It defers emitting forward declarations till their first use. Whenever a function is used it checks if it has been emitted already, if not it emits the function declaration.
You can look at these lines in the clang repo.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition())
return;
The simple fix here would be to either comment the last two lines or just add && false to the second condition.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition() && false)
return;
This will cause clang to emit a declaration as soon as it sees it, this might also change the order in which definitions appear in your .ll (or .bc) files. Assuming that is not an issue.
To make it cleaner you can also add a command line flag --emit-all-declarations and check that here before you continue.

OCaml dynamically check for badly behaved native functions

Is it possible to check at runtime for badly-behaved native functions in OCaml? This would be useful in mixed C/OCaml projects.
When implementing an intrinsic OCaml function in C, care has to be taken to live in harmony with the runtime.
For instance, in the following example add in libadd intentionally does not use CAMLreturn as would be appropriate.
(* foo.ml *)
external add : int -> int -> int = "add";;
Printf.printf "%d\n" (add 4 5);;
and the C source file
// libadd.c
#include <caml/memory.h>
#include <caml/mlvalues.h>
CAMLprim value
add(value ml_x, value ml_y)
{
CAMLparam2(ml_x, ml_y);
long x = Long_val(ml_x);
long y = Long_val(ml_y);
// intentional mistake here
// don't use CAMLreturn
return Val_long(x + y);
}
If you compile this code using either OCaml compiler
$ ocamlopt foo.ml libadd.c
$ ocamlc -custom foo.ml libadd.c
Then a.out just prints 9 without complaint.
./a.out
9
Is there a way to get either compiler to emit additional checks around native function calls to check that the OCaml calling convention has been adhered to?
ocaml does nothing wrt to this issue, the error relies in the C code that is compiled by gcc. And gcc cannot check that the return is compatible or not with Ocaml.
May be one way to limit mis-written C for Ocaml is to redefine return to avoid using it :
#define return _forbidden_
Your initial C code will fail to compile if you include those define in your code.
It does not solve the issue, but it may be useful to force the user to take care of the way the function shall return.
Another way is having a sanity script checking that any function whose return type is CAML* does not contain any return...
The CAML macros are just simple preprocessor macros. You can always just write the underlying C code directly instead of using the macro. Nothing short of changing gcc to know about how to interface with ocaml will fix that.
There is one simple trick for matching BEGIN and END style macros though to fail if one of the two is forgotten accidentally. The trick is to have an opening { in the BEGIN macro and a closing } on the END macro. Forgetting one of them will give an error because then {} don't balance.
Problem is that a function can have multiple return statements making the use of unbalanced {} impossible.

How to add a builtin function in a GCC plugin?

It is possible for a GCC plugin to add a new builtin function? If so, how to do it properly?
GCC version is 5.3 (or newer). The code is compiled and processed by the plugin written in C.
It is mentioned in the rationale for GCC plugins at gcc-melt.org that this is doable but I cannot see how.
As far as I can see in the sources of GCC, the builtins are created using add_builtin_function() from gcc/langhooks.c:
tree
add_builtin_function (const char *name,
tree type,
int function_code,
enum built_in_class cl,
const char *library_name,
tree attrs)
It is more or less clear which values the arguments of this function should have, except for function_code, a unique numeric ID of the function.
Looks like (see add_builtin_function_common()), a value from enum built_in_function is expected there but a GCC plugin cannot change that enum.
One cannot pass any random value greater than END_BUILTINS as function_code either, it seems. builtin_decl_implicit() and builtin_decl_explicit() would have a failed assertion in that case.
So, what is the proper way to add a builtin in a GCC plugin (without using MELT and such, just GCC plugin API)?
Update
I looked again at the implementation of add_builtin_function_common() and of langhooks.builtin_function() for C as well as at how these are used in GCC. It seems that 0 is acceptable as function_code in some cases. You cannot use builtin_decl_implicit() then but you can save the DECL returned by add_builtin_function() and use it later.
Looks like the only event when I can try to create built-ins that way is PLUGIN_START_UNIT (otherwise GCC may crash due to external_scope variable being NULL).
I tried the following at that stage (fntype was created before):
decl = add_builtin_function (
"my_helper", fntype,
0 /* function_code */,
BUILT_IN_NORMAL /* enum built_in_class cl */,
NULL /* library_name */,
NULL_TREE /* attrs */)
my_helper was defined in a different C source file compiled and linked with the main source file. Then I used decl to insert the calls to that function into other functions (gimple_build_call) during my GIMPLE pass.
GCC output no errors and indeed inserted the call to my_helper but as a call to an ordinary function. I actually needed a builtin to avoid a call but rather insert the body of the function.
On the other hand, tsan0 pass, which executes right after my pass, inserts the calls of builtin functions just like one would expect: there is no explicit call as a result, just the body of the function is inserted. Its builtins, however, are defined by GCC itself rather than by the plugins.
So I suppose my builtin still needs something to be a valid builtin, but I do not know what it is. What could that be?
I'm assuming what you want to do (from your comment and linked post) is insert C code into a function. In that case, I would have thought you wouldn't need to go so far as to write a compiler plugin. Have a look at Boost.Preprocessor, which can do very advanced manipulations of C code using only the preprocessor.

Why don't we get a compile time error even if we don't include stdio.h in a C program?

How does the compiler know the prototype of sleep function or even printf function, when I did not include any header file in the first place?
Moreover, if I specify sleep(1,1,"xyz") or any arbitrary number of arguments, the compiler still compiles it.
But the strange thing is that gcc is able to find the definition of this function at link time, I don't understand how is this possible, because actual sleep() function takes a single argument only, but our program mentioned three arguments.
/********************************/
int main()
{
short int i;
for(i = 0; i<5; i++)
{
printf("%d",i);`print("code sample");`
sleep(1);
}
return 0;
}
Lacking a more specific prototype, the compiler will assume that the function returns int and takes whatever number of arguments you provide.
Depending on the CPU architecture arguments can be passed in registers (for example, a0 through a3 on MIPS) or by pushing them onto the stack as in the original x86 calling convention. In either case, passing extra arguments is harmless. The called function won't use the registers passed in nor reference the extra arguments on the stack, but nothing bad happens.
Passing in fewer arguments is more problematic. The called function will use whatever garbage happened to be in the appropriate register or stack location, and hijinks may ensue.
In classic C, you don't need a prototype to call a function. The compiler will infer that the function returns an int and takes a unknown number of parameters. This may work on some architectures, but it will fail if the function returns something other than int, like a structure, or if there are any parameter conversions.
In your example, sleep is seen and the compiler assumes a prototype like
int sleep();
Note that the argument list is empty. In C, this is NOT the same as void. This actually means "unknown". If you were writing K&R C code, you could have unknown parameters through code like
int sleep(t)
int t;
{
/* do something with t */
}
This is all dangerous, especially on some embedded chips where the way parameters are passed for a unprototyped function differs from one with a prototype.
Note: prototypes aren't needed for linking. Usually, the linker automatically links with a C runtime library like glibc on Linux. The association between your use of sleep and the code that implements it happens at link time long after the source code has been processed.
I'd suggest that you use the feature of your compiler to require prototypes to avoid problems like this. With GCC, it's the -Wstrict-prototypes command line argument. In the CodeWarrior tools, it was the "Require Prototypes" flag in the C/C++ Compiler panel.
C will guess int for unknown types. So, it probably thinks sleep has this prototype:
int sleep(int);
As for giving multiple parameters and linking...I'm not sure. That does surprise me. If that really worked, then what happened at run-time?
This is to do with something called 'K & R C' and 'ANSI C'.
In good old K & R C, if something is not declared, it is assumed to be int.
So any thing that looks like a function call, but not declared as function
will automatically take return value of 'int' and argument types depending
on the actuall call.
However people later figured out that this can be very bad sometimes. So
several compilers added warning. C++ made this error. I think gcc has some
flag ( -ansic or -pedantic? ) , which make this condition an error.
So, In a nutshell, this is historical baggage.
Other answers cover the probable mechanics (all guesses as compiler not specified).
The issue that you have is that your compiler and linker have not been set to enable every possible error and warning. For any new project there is (virtually) no excuse for not doing so. for legacy projects more excuse - but should strive to enable as many as possible
Depends on the compiler, but with gcc (for example, since that's the one you referred to), some of the standard (both C and POSIX) functions have builtin "compiler intrinsics". This means that the compiler library shipped with your compiler (libgcc in this case) contains an implementation of the function. The compiler will allow an implicit declaration (i.e., using the function without a header), and the linker will find the implementation in the compiler library because you're probably using the compiler as a linker front-end.
Try compiling your objects with the '-c' flag (compile only, no link), and then link them directly using the linker. You will find that you get the linker errors you expect.
Alternatively, gcc supports options to disable the use of intrinsics: -fno-builtin or for granular control, -fno-builtin-function. There are further options that may be useful if you're doing something like building a homebrew kernel or some other kind of on-the-metal app.
In a non-toy example another file may include the one you missed. Reviewing the output from the pre-processor is a nice way to see what you end up with compiling.

Resources