OCaml dynamically check for badly behaved native functions

Is it possible to check at runtime for badly-behaved native functions in OCaml? This would be useful in mixed C/OCaml projects.
When implementing an intrinsic OCaml function in C, care has to be taken to live in harmony with the runtime.
For instance, in the following example add in libadd intentionally does not use CAMLreturn as would be appropriate.
(* foo.ml *)
external add : int -> int -> int = "add";;
Printf.printf "%d\n" (add 4 5);;
and the C source file
// libadd.c
#include <caml/memory.h>
#include <caml/mlvalues.h>
CAMLprim value
add(value ml_x, value ml_y)
{
CAMLparam2(ml_x, ml_y);
long x = Long_val(ml_x);
long y = Long_val(ml_y);
// intentional mistake here
// don't use CAMLreturn
return Val_long(x + y);
}
If you compile this code using either OCaml compiler
$ ocamlopt foo.ml libadd.c
$ ocamlc -custom foo.ml libadd.c
Then a.out just prints 9 without complaint.
./a.out
9
Is there a way to get either compiler to emit additional checks around native function calls to check that the OCaml calling convention has been adhered to?

OCaml does nothing about this issue: the error lies in the C code, which is compiled by gcc, and gcc cannot check whether a return is compatible with OCaml.
Maybe one way to limit badly written C for OCaml is to redefine return so that it cannot be used:
#define return _forbidden_
Your initial C code will fail to compile if you include that define in your code.
It does not solve the issue, but it may be useful for forcing the user to think about how the function should return.
Another way is to have a sanity script check that any function whose return type is CAML* does not contain a plain return...

The CAML macros are just simple preprocessor macros. You can always write the underlying C code directly instead of using the macro. Nothing short of changing gcc to know how to interface with OCaml will fix that.
There is one simple trick for matching BEGIN- and END-style macros, though, that fails the build if one of the two is accidentally forgotten. The trick is to put an opening { in the BEGIN macro and a closing } in the END macro. Forgetting either of them gives an error because the {} then don't balance.
The problem is that a function can have multiple return statements, which makes the use of unbalanced {} impossible.
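The brace trick can be sketched in plain C as follows. This is only an illustration: FRAME_BEGIN, FRAME_END, and frame_depth are hypothetical names, not part of the OCaml runtime, and the counter merely stands in for whatever bookkeeping a real runtime would do.

```c
#include <assert.h>

/* Stand-in for the bookkeeping a runtime would do on entry and exit
   (hypothetical; the real OCaml macros manage a list of local roots). */
static int frame_depth = 0;

/* FRAME_BEGIN opens a block and FRAME_END closes it. Omitting either
   one leaves the braces unbalanced, so the file fails to compile. */
#define FRAME_BEGIN()      { frame_depth++;
#define FRAME_END(result)  assert(frame_depth > 0); frame_depth--; return (result); }

static long add_checked(long x, long y)
{
    FRAME_BEGIN()
    long sum = x + y;
    FRAME_END(sum)
}
```

Deleting the FRAME_END line makes the function body end with an unclosed brace, which the compiler rejects. A plain return placed between the two macros still compiles, which is exactly the limitation described above.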

Related

initialising constant static array with algorhythm [duplicate]

I am thinking about the following problem: I want to program a microcontroller (let's say an AVR mega type) with a program that uses some sort of look-up tables.
The first attempt would be to locate the table in a separate file and create it using any other scripting language/program/.... In this case there is quite some effort in creating the necessary source files for C.
My thought was now to use the preprocessor and compiler to handle things. I tried to implement this with a table of sine values (just as an example):
#include <avr/io.h>
#include <math.h>
#define S1(i,n) ((uint8_t) (sin(M_PI*(i)/(n))*255))
#define S4(i,n) S1(i,n), S1(i+1,n), S1(i+2,n), S1(i+3,n)
uint8_t lut[] = {S4(0,4)};
void main()
{
uint8_t val, i;
for(i=0; i<4; i++)
{
val = lut[i];
}
}
If I compile this code I get warnings about the sin function. Furthermore, in the assembly there is nothing in the .data section. If I just remove the sin in the third line, I get the data in the assembly. Clearly all the information is available at compile time.
Can you tell me if there is a way to achieve what I intend: have the compiler calculate as many values as possible offline? Or is the best way to use an external script/program/... to calculate the table entries and add these to a separate file that will just be #included?

The general problem here is that the sin call makes this initialization invalid under the rules of the C language: a function call is not a constant expression, and you are initializing an array of static storage duration, which requires one. This also explains why your array is not in the .data section.
C11 (N1570) §6.6/2,3 Constant expressions (emphasis mine)
A constant expression can be evaluated during translation rather than
runtime, and accordingly may be used in any place that a constant may
be.
Constant expressions shall not contain assignment, increment,
decrement, function-call, or comma operators, except when they are
contained within a subexpression that is not evaluated. [footnote 115]
However, as per @ShafikYaghmour's comment, GCC will replace the sin function call with its built-in counterpart (unless the -fno-builtin option is present), which is likely to be treated as a constant expression. According to 6.57 Other Built-in Functions Provided by GCC:
GCC includes built-in versions of many of the functions in the
standard C library. The versions prefixed with __builtin_ are always
treated as having the same meaning as the C library function even if
you specify the -fno-builtin option.
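One way to stay inside the constant-expression rule quoted above, without depending on GCC folding the builtin, is to build each entry from arithmetic operators only. The sketch below is an assumption-laden illustration, not the asker's code: PI_D, SIN9, and lut_at are hypothetical names, and SIN9 is a 9th-order Taylor polynomial whose error grows as the argument approaches pi, so a larger table would want more terms or use of symmetry.

```c
#include <assert.h>
#include <stdint.h>

#define PI_D 3.14159265358979323846

/* 9th-order Taylor polynomial of sin(x) in Horner form. It contains only
   arithmetic operators, so with constant arguments it is an arithmetic
   constant expression and is valid in a static initializer. */
#define SIN9(x) ((x) * (1.0 - (x) * (x) / 6.0 * (1.0 - (x) * (x) / 20.0 * \
                (1.0 - (x) * (x) / 42.0 * (1.0 - (x) * (x) / 72.0)))))

#define S1(i, n) ((uint8_t)(255.0 * SIN9(PI_D * (i) / (n))))
#define S4(i, n) S1(i, n), S1((i) + 1, n), S1((i) + 2, n), S1((i) + 3, n)

/* No function call appears in the initializer. */
static const uint8_t lut[] = { S4(0, 4) };

static uint8_t lut_at(unsigned i)
{
    assert(i < sizeof lut / sizeof lut[0]);
    return lut[i];
}
```

For n = 4 the four entries come out as 0, 180, 255, 180.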

What you are trying is not part of the C language. In situations like this, I have written code following this pattern:
#if GENERATE_SOURCECODE
int main (void)
{
... Code that uses printf to write C code to stdout
}
#else
// Source code generated by the code above
... Here I paste in what the code above generated
// The rest of the program
#endif
Every time you need to change it, you run the code with GENERATE_SOURCECODE defined, and paste in the output. Works well if your code is self contained and the generated output only ever changes if the code generating it changes.

First of all, it should go without saying that you should evaluate (probably by experiment) whether this is worth doing. Your lookup table is going to increase your data size and programmer effort, but may or may not provide a runtime speed increase that you need.
If you still want to do it, I don't think the C preprocessor can do it straightforwardly, because it has no facilities for iteration or recursion.
The most robust way to go about this would be to write a program in C or some other language to print out C source for the table, and then include that file in your program using the preprocessor. If you are using a tool like make, you can create a rule to generate the table file and have your .c file depend on that file.
On the other hand, if you are sure you are never going to change this table, you could write a program to generate it once and just paste it in.
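The "print out C source for the table" step could look roughly like this. It is a minimal sketch: format_table is a hypothetical helper, and how the entries themselves are computed (for example with sin on the build host) is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render n table entries as a C array definition
   into buf. Returns the length of the text, or -1 if cap is too small. */
static int format_table(char *buf, size_t cap, const char *name,
                        const uint8_t *vals, size_t n)
{
    assert(buf != NULL && name != NULL && vals != NULL);

    size_t used = 0;
    int w = snprintf(buf, cap, "const uint8_t %s[%zu] = {", name, n);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        /* Entries are separated by commas; the first has none before it. */
        w = snprintf(buf + used, cap - used, "%s %u",
                     i ? "," : "", (unsigned)vals[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }

    w = snprintf(buf + used, cap - used, " };\n");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;

    return (int)strlen(buf);
}
```

A generator program would write the resulting text to a file, and a make rule would list that file as a prerequisite of the .c file that #includes it.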

How to add a builtin function in a GCC plugin?

Is it possible for a GCC plugin to add a new builtin function? If so, how to do it properly?
GCC version is 5.3 (or newer). The code is compiled and processed by the plugin written in C.
It is mentioned in the rationale for GCC plugins at gcc-melt.org that this is doable but I cannot see how.
As far as I can see in the sources of GCC, the builtins are created using add_builtin_function() from gcc/langhooks.c:
tree
add_builtin_function (const char *name,
tree type,
int function_code,
enum built_in_class cl,
const char *library_name,
tree attrs)
It is more or less clear which values the arguments of this function should have, except for function_code, a unique numeric ID of the function.
It looks like a value from enum built_in_function is expected there (see add_builtin_function_common()), but a GCC plugin cannot change that enum.
It seems one cannot pass an arbitrary value greater than END_BUILTINS as function_code either: builtin_decl_implicit() and builtin_decl_explicit() would fail an assertion in that case.
So, what is the proper way to add a builtin in a GCC plugin (without using MELT and such, just GCC plugin API)?
Update
I looked again at the implementation of add_builtin_function_common() and of langhooks.builtin_function() for C as well as at how these are used in GCC. It seems that 0 is acceptable as function_code in some cases. You cannot use builtin_decl_implicit() then but you can save the DECL returned by add_builtin_function() and use it later.
Looks like the only event when I can try to create built-ins that way is PLUGIN_START_UNIT (otherwise GCC may crash due to external_scope variable being NULL).
I tried the following at that stage (fntype was created before):
decl = add_builtin_function (
"my_helper", fntype,
0 /* function_code */,
BUILT_IN_NORMAL /* enum built_in_class cl */,
NULL /* library_name */,
NULL_TREE /* attrs */);
my_helper was defined in a different C source file compiled and linked with the main source file. Then I used decl to insert the calls to that function into other functions (gimple_build_call) during my GIMPLE pass.
GCC output no errors and indeed inserted the call to my_helper, but as a call to an ordinary function. What I actually needed was a builtin, so that the body of the function is inserted instead of a call.
On the other hand, tsan0 pass, which executes right after my pass, inserts the calls of builtin functions just like one would expect: there is no explicit call as a result, just the body of the function is inserted. Its builtins, however, are defined by GCC itself rather than by the plugins.
So I suppose my builtin still needs something to be a valid builtin, but I do not know what it is. What could that be?

I'm assuming what you want to do (from your comment and linked post) is insert C code into a function. In that case, I would have thought you wouldn't need to go so far as to write a compiler plugin. Have a look at Boost.Preprocessor, which can do very advanced manipulations of C code using only the preprocessor.

How could I make a constant in C except using a number

I am working on a C math library that uses macros to do most of its work, and I am now facing a problem.
This is what the macro looks like:
the_macro(a, b, c)
and the macro itself does something like:
(a - b > 0) ? error_function : 1
The error_function is used to stop the user at compile time: if (a - b > 0) is true, the macro expands to a call to a function that has no definition, which causes a linkage error.
Everything seemed good, but today my boss told me we need to do some unit testing, so I wrote a function that wraps the macro:
int my_func(int a, int b, int c)
{
return the_macro(a, b, c);
}
Here comes the problem: the code no longer links. If I call the_macro with variables instead of constants, the references to error_function end up in the .o file, because int a, int b, int c are only known at run time. So I can only call the macro with constants, as in the_macro(2, 3, 4). Is there any way to avoid this? Or is there a better way to unit-test this macro?
EDIT:
The code I'm working on is confidential... but I made an example which demonstrates the problem:
#include <stdio.h>
#define the_macro(a, b)\
((a) > (b) ? error_function() : 1)
// Comment out my_func() and the program runs normally.
// Leave it in and the linkage error appears.
void my_func(int a, int b)
{
the_macro(a, b);
}
int main()
{
printf("%d\n", the_macro(1, 10));
return 0;
}
I'm using gcc-4

Regardless of where you use the macro, if error_function is not declared, you should get a compiler error. If it is declared but not defined, you have undefined behavior. Whether the arguments to the macro are constants or not changes nothing in this respect. (It may affect what the actual behavior is in the case of undefined behavior.)

When you call the macro with constants, the compiler knows the values and thus, perhaps as an optimization, the expression the_macro (4, 5, 0) gets replaced by 1 instead of error_function. When your expression a-b evaluates to > 0, your compiler replaces it with error_function, and the build stops at link time.
On the other hand, when you use variables, the compiler doesn't know the result of the expression and has to use the full expansion of the macro, which contains a call to an undefined function, and hence you get the linkage error.

For the purposes of your unit tests (only), why not define error_function() as part of your unit test and have it unconditionally return an error that your test framework can detect? That way you should be able to mimic the behaviour you're seeing at compile time using either constants or variables.
It's not exactly what you want, but unit test frameworks are, by their nature, run-time testing mechanisms, so an automated compile-time test is probably not going to be possible.
Alternatively, you could use system() to run a command line build including your library, redirect the output, including errors into a file. You could then open the file and scan for known text of the linkage error.
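A minimal sketch of the first suggestion, assuming the test build is allowed to supply its own error_function. The counter error_calls and the function test_my_func are hypothetical names introduced here; the_macro follows the two-argument example from the question.

```c
#include <assert.h>

/* Test-only definition. The library leaves error_function undefined so
   that misuse fails at link time; the test build supplies one instead. */
static int error_calls = 0;   /* hypothetical counter */

int error_function(void)
{
    error_calls++;
    return -1;                /* sentinel a test can detect */
}

#define the_macro(a, b) (((a) > (b)) ? error_function() : 1)

int my_func(int a, int b)
{
    return the_macro(a, b);
}

static void test_my_func(void)
{
    assert(my_func(1, 10) == 1);    /* valid arguments: no error call */
    assert(error_calls == 0);
    assert(my_func(10, 1) == -1);   /* invalid arguments: error path taken */
    assert(error_calls == 1);
}
```

Because error_function now has a definition, the wrapper links regardless of whether its arguments are constants, and the test observes the error path at run time instead of at link time.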

Let's see if I understand this correctly:
You want a way to break compilation if a-b>0? The language has no built-in construct for this unless you use C11. In your case you are trying to use a combination of the optimizer and the linker to get the desired behavior. But this cannot work reliably.
The expression (a - b > 0) ? error_function : 1 may be reduced by the optimizer to 1 if a-b>0 is known to be false, but this is not guaranteed. The C standard defines the behavior a compiler must exhibit, and it does not mention an optimizer. The same optimizer may sometimes reduce the expression and sometimes not, depending on other things in your code. Or it may or may not reduce it depending on the command-line flags you are passing.
So by using this macro you are writing code that may suddenly break when you switch compiler, compiler version, operating system, or target architecture, or when you add or remove linked libraries. Code that suddenly breaks depending on such changes is very bad. Don't do this to your fellow developers.
Better to write portable code that you can be sure future compilers will understand because it follows the standard. Before C11 the language has no dedicated construct for this. If you really need this, tell your boss the only way is to use C11, which has the _Static_assert keyword (spelled static_assert via <assert.h>) and gives you conditional abortion of the compilation.
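As a rough sketch of that approach, assuming a C11 compiler: the compile-time check covers arguments that are constants, and a separate run-time function (the_macro_checked, a hypothetical name) covers values that are only known at run time, such as those in unit tests. ARG_A and ARG_B are placeholder constants.

```c
#include <assert.h>  /* C11: provides static_assert */

/* Hypothetical constants standing in for literal macro arguments. */
#define ARG_A 2
#define ARG_B 3

/* Rejected at compile time, with this message, if the condition is false. */
static_assert(ARG_A - ARG_B <= 0, "the_macro requires a - b <= 0");

/* Run-time variant for values known only at run time, e.g. in unit tests.
   Returns 1 on success and -1 where the macro would have failed the build. */
static int the_macro_checked(int a, int b)
{
    return (a - b > 0) ? -1 : 1;
}
```

Swapping the two constants makes the translation unit fail to compile with the given message, independent of optimization level. Note that static_assert only accepts constant expressions, so it cannot check function parameters.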

Why don't we get a compile time error even if we don't include stdio.h in a C program?

How does the compiler know the prototype of sleep function or even printf function, when I did not include any header file in the first place?
Moreover, if I specify sleep(1,1,"xyz") or any arbitrary number of arguments, the compiler still compiles it.
But the strange thing is that gcc is able to find the definition of this function at link time. I don't understand how this is possible, because the actual sleep() function takes a single argument only, but our program passed three.
/********************************/
int main()
{
short int i;
for(i = 0; i<5; i++)
{
printf("%d",i);
sleep(1);
}
return 0;
}

Lacking a more specific prototype, the compiler will assume that the function returns int and takes whatever number of arguments you provide.
Depending on the CPU architecture, arguments can be passed in registers (for example, a0 through a3 on MIPS) or by pushing them onto the stack as in the original x86 calling convention. In either case, passing extra arguments is usually harmless in practice, although the C standard leaves the behavior undefined. The called function won't use the extra registers passed in nor reference the extra arguments on the stack, so typically nothing bad happens.
Passing in fewer arguments is more problematic. The called function will use whatever garbage happened to be in the appropriate register or stack location, and hijinks may ensue.

In classic C, you don't need a prototype to call a function. The compiler will infer that the function returns an int and takes an unknown number of parameters. This may work on some architectures, but it will fail if the function returns something other than int, like a structure, or if there are any parameter conversions.
In your example, sleep is seen and the compiler assumes a prototype like
int sleep();
Note that the argument list is empty. In C, this is NOT the same as void. This actually means "unknown". If you were writing K&R C code, you could have unknown parameters through code like
int sleep(t)
int t;
{
/* do something with t */
}
This is all dangerous, especially on some embedded chips where the way parameters are passed for an unprototyped function differs from one with a prototype.
Note: prototypes aren't needed for linking. Usually, the linker automatically links with a C runtime library like glibc on Linux. The association between your use of sleep and the code that implements it happens at link time long after the source code has been processed.
I'd suggest that you use the feature of your compiler to require prototypes to avoid problems like this. With GCC, it's the -Wstrict-prototypes command line argument, and -Wimplicit-function-declaration (enabled by -Wall) warns about calls to undeclared functions. In the CodeWarrior tools, it was the "Require Prototypes" flag in the C/C++ Compiler panel.
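For comparison, here is the loop from the question with both prototypes in scope. It is a sketch assuming a POSIX system, where sleep is declared in <unistd.h>; count_and_sleep is a hypothetical wrapper added so the loop can be called with a chosen delay.

```c
#include <assert.h>  /* only needed if you assert on the result */
#include <stdio.h>   /* declares printf */
#include <unistd.h>  /* POSIX: declares unsigned sleep(unsigned) */

/* Same loop as in the question. With the prototype visible, a call such
   as sleep(1, 1, "xyz") is rejected by the compiler instead of being
   accepted silently. Returns the number of iterations performed. */
static int count_and_sleep(unsigned seconds)
{
    short int i;
    for (i = 0; i < 5; i++) {
        printf("%d", i);
        sleep(seconds);
    }
    return i;
}
```

Calling count_and_sleep(1) reproduces the original behavior of printing 0 through 4 with a one-second pause after each digit.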

C will guess int for unknown types. So, it probably thinks sleep has this prototype:
int sleep(int);
As for giving multiple parameters and linking...I'm not sure. That does surprise me. If that really worked, then what happened at run-time?

This has to do with the difference between 'K & R C' and 'ANSI C'.
In good old K & R C, if something is not declared, it is assumed to be int. So anything that looks like a function call but is not declared as a function automatically gets a return type of int, and argument types depending on the actual call.
However, people later figured out that this can sometimes be very bad, so several compilers added a warning. C++ made this an error. I think gcc has a flag (-Werror=implicit-function-declaration?) that makes this condition an error.
So, in a nutshell, this is historical baggage.

Other answers cover the probable mechanics (all guesses as compiler not specified).
The issue that you have is that your compiler and linker have not been set to enable every possible error and warning. For any new project there is (virtually) no excuse for not doing so. Legacy projects have more of an excuse, but should still strive to enable as many as possible.

Depends on the compiler, but with gcc (for example, since that's the one you referred to), some of the standard C functions, such as printf, are also known to the compiler as builtin "compiler intrinsics". This means that the compiler already knows the function's signature and may even expand the call itself. The compiler will allow an implicit declaration (i.e., using the function without a header), and the linker will find the implementation in the C library, which is linked automatically because you're probably using the compiler as a linker front-end.
Try compiling your objects with the '-c' flag (compile only, no link), and then link them directly using the linker. You will find that you get the linker errors you expect.
Alternatively, gcc supports options to disable the use of intrinsics: -fno-builtin or for granular control, -fno-builtin-function. There are further options that may be useful if you're doing something like building a homebrew kernel or some other kind of on-the-metal app.

In a non-toy example another file may include the one you missed. Reviewing the output from the pre-processor is a nice way to see what you end up with compiling.
