Calling conventions and language bindings - c

I am a little confused about how to best handle calling convention differences in a public API and keep it in sync with its bindings. Let's say I am writing a C API, made available through a shared object library or a DLL. Now assume I have been told I should not use the default calling convention on Windows - that is, on Linux and other Unixes I should use the standard calling convention used by the compiler (probably cdecl) but that on Windows I should force the use of stdcall. So I have some #ifdef logic in the headers that sets the right calling convention as needed. The C headers of the library necessarily take care of that, so the C public API is usable.
Now suppose I want to write bindings for my library in another language. That means I have to rewrite the calling convention logic (depending on the current system) in that language too, for the bindings to correctly map to the library. And so on for all bindings. Some languages may not have good (or any) support for this.
Is there a more elegant way to do this? Should I just use the default calling convention everywhere, and assume that other languages will pick the right one for external/imported functions? Do I even need to worry about this stuff (I think so)? Thanks.

Many languages use a built-in or third-party library to simplify calls into shared libraries, and these libraries often support both calling conventions. One example is JNA, for invoking native shared libraries from Java. That said, if you don't want to rely on other languages using a single calling convention, you can implement the shared library with both types of functions included and provide initializers that return the appropriate bindings for each type. For instance, if your library has two functions named function1 and function2, you could implement it like this:
typedef struct
{
int (*function1)(int a, int b);
char* (*function2)(void);
}API;
// stdcall implementation
// these functions are compiled to use the stdcall calling convention
int stdcall_function1(int a, int b)
{
/*...*/
}
char* stdcall_function2(void)
{
/*...*/
}
API getSTDCallInstance()
{
API api;
api.function1 = &stdcall_function1;
api.function2 = &stdcall_function2;
return api;
}
// cdecl implementation
// these functions are compiled to use the cdecl calling convention
int cdecl_function1(int a, int b)
{
/*...*/
}
char* cdecl_function2(void)
{
/*...*/
}
API getCDECLInstance()
{
API api;
api.function1 = &cdecl_function1;
api.function2 = &cdecl_function2;
return api;
}
If you implement your library in this manner, then the loading language can use the appropriate initializer to get a handle to the struct containing the correct implementation for them.
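One detail the listing above leaves implicit is that the calling convention is part of a function pointer's type, so a single struct type cannot safely describe both variants. The following is a minimal sketch of one way to spell that out, not the answer's actual implementation. The macro names MYLIB_STDCALL and MYLIB_CDECL, the two struct types, and the trivial function bodies are assumptions made for illustration.

```c
/* Hypothetical macro names. __stdcall and __cdecl are understood by MSVC
   and by GCC/Clang when targeting Windows; elsewhere they expand to nothing. */
#if defined(_WIN32)
#  define MYLIB_STDCALL __stdcall
#  define MYLIB_CDECL   __cdecl
#else
#  define MYLIB_STDCALL
#  define MYLIB_CDECL
#endif

/* One table type per convention: the convention is part of the pointer type. */
typedef struct {
    int (MYLIB_STDCALL *function1)(int a, int b);
} StdcallApi;

typedef struct {
    int (MYLIB_CDECL *function1)(int a, int b);
} CdeclApi;

/* Both entry points share one implementation (a placeholder body here). */
static int function1_impl(int a, int b) { return a + b; }

static int MYLIB_STDCALL stdcall_function1(int a, int b) { return function1_impl(a, b); }
static int MYLIB_CDECL   cdecl_function1(int a, int b)   { return function1_impl(a, b); }

StdcallApi getSTDCallInstance(void)
{
    StdcallApi api = { stdcall_function1 };
    return api;
}

CdeclApi getCDECLInstance(void)
{
    CdeclApi api = { cdecl_function1 };
    return api;
}
```

A binding then only has to call the one initializer that matches the convention its FFI layer uses.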

Related

Removing functions included from a header from scope of the next files

In my project we make heavy use of a C header that provides an API to communicate with external software. Long story short, bugs in our project show up most often around calls to the functions declared in those headers (it is old and ugly legacy code).
I would like to implement an indirection on the calling of those functions, so I could include some profiling before calling the actual implementation.
Because I'm not the only person working on this project, I would like to make those wrappers in a such way that if someone uses the original implementations directly it should cause a compile error.
If those headers were C++ sources, I would be able to simply make a namespace, wrap the included files in it, and implement my functions using it (the other developers would be able to use the original implementation using the :: operator, but just not being able to call it directly is enough encapsulation for me). However, the headers are C sources (which I have to include inside an extern "C" block), so namespaces won't help me AFAIK.
I tried to play around with defines, but with no luck, like this:
#define my_func api_func
#define api_func NULL
What I wanted with the above code is to make my_func to be translated to api_func during the preprocessing, while making a direct call to api_func give a compile error, but that won't work because it will actually make my_func to be translated to NULL too.
So, basically, I would like to make a wrapper, and make sure the only way to access the API is through this wrapper (unless the other developers make some workaround, but this is inevitable).
Please note that I need to wrap hundreds of functions, which show up spread in the whole code several times.
My wrapper will necessarily have to include those C headers, but I would like their declarations to go out of scope outside my wrapper's file, so that they are unavailable to every other file that includes my wrapper. I guess this is not possible in C/C++.
You have several options, none of them wonderful.
If you have the sources of the legacy software, so that you can recompile it, you can just change the names of the API functions to make room for the wrapper functions. If you additionally make the original functions static and put the wrappers in the same source files, then you can ensure that the originals are called only via the wrappers. Example:
static int api_func_real(int arg);
int api_func(int arg) {
// ... instrumentation ...
int result = api_func_real(arg);
// ... instrumentation ...
return result;
}
static int api_func_real(int arg) {
// ...
}
The preprocessor can help you with that, but I hesitate to recommend specifics without any details to work with.
If you do not have sources for the legacy software, or if you are otherwise unwilling to modify it, then you need to make all the callers call your wrappers instead of the original functions. In this case you can modify the headers, or include an additional header beforehand that uses #define to change each of the original function names. That header must not be included in the source files containing the API function implementations, nor in those providing the wrapper function implementations. Each define would be of the form:
#define api_func api_func_wrapper
You would then implement the various api_func_wrapper() functions.
Among the ways those cases differ is that if you change the legacy function names, then internal calls among those functions will go through the wrappers bearing the original names (unless you change the calls, too), but if you implement wrappers with new names then they will be used only when called explicitly, which will not happen for internal calls within the legacy code (unless, again, you modify those calls).
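The #define approach can be sketched in a single translation unit as follows. This is only an illustration under stated assumptions: the body of api_func and the call counter are stand-ins invented here, and in a real project the #define would live in a separate header that the wrapper's own source file does not include.

```c
/* Stand-in for one legacy API function; the real one comes from the library. */
int api_func(int arg)
{
    return arg * 2;
}

static int api_func_calls = 0;  /* hypothetical instrumentation: a call counter */

/* Defined before the macro below, so api_func here still means the original. */
int api_func_wrapper(int arg)
{
    ++api_func_calls;           /* instrumentation before the real call */
    return api_func(arg);
}

/* From this point on, every use of the name api_func is rewritten
   to api_func_wrapper by the preprocessor. */
#define api_func api_func_wrapper

/* Ordinary caller code, unchanged, now goes through the wrapper. */
int caller(int x)
{
    return api_func(x);
}

int api_func_call_count(void)
{
    return api_func_calls;
}
```

Because the rewrite is purely textual, code compiled without the #define (including the legacy library's own internal calls) still reaches the original function.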
You can do something like
[your wrapper's include file]
int origFunc1 (int x);
int origFunc2 (int x, int y);
/* The wrappers are declared unconditionally so that callers see a prototype. */
int wrappedFunc1(int x);
int wrappedFunc2(int x, int y);
#ifndef WRAPPER_IMPL
#define origFunc1 wrappedFunc1
#define origFunc2 wrappedFunc2
#endif
[your wrapper implementation]
#define WRAPPER_IMPL
#include <stdio.h>
#include "wrapper.h"
int wrappedFunc1 (int x) {
printf("Wrapper1 called\n");
return origFunc1(x);
}
Your wrapper's C file obviously needs to #define WRAPPER_IMPL before including the header.
That is neither nice nor clean (and anyone who wants to cheat can simply define WRAPPER_IMPL), but it is at least one way to go.
There are two ways to wrap or override C functions in Linux:
Using LD_PRELOAD:
Linux has an environment variable called LD_PRELOAD,
which can be set to the path of a shared library;
that library will be loaded before any other library (including glibc).
Using 'ld --wrap=symbol':
This tells the linker to use a wrapper function for symbol.
Undefined references to symbol are resolved to __wrap_symbol, and references to __real_symbol are resolved to the original symbol.
A complete writeup can be found at:
http://samanbarghi.com/blog/2014/09/05/how-to-wrap-a-system-call-libc-function-in-linux/

Exposing functions instead of constants for describing library features

I've noticed that in a lot of libraries, version information, as well as information on the availability of special features that may differ or be absent depending on the build, is made accessible to client applications not through a constant but through a function call returning a constant, e.g.:
const char *libversion(void) {
return "0.2";
}
bool support_ssl(void) {
return LIB_SSL_ENABLED; /* whatever */
}
instead of simply:
const char *libversion = "0.2";
bool support_ssl = LIB_SSL_ENABLED;
Is there a practical reason for doing this, or is it only some kind of convention?
I'd say both…
A practical reason I see for this, is that when you distribute your library, your users install a compiled version of it as a shared object, and access its data using the header. If the constant is accessible through a function, its prototype is declared in the header, but the value is defined in the compilation unit, linked in the shared object file. Edit: I'm not saying it's not possible, but a good reason for doing so is to keep the possibility to keep the API stable, while switching from a constant value to a calculated value for a given function, cf reason #3.
Another practical reason I can see is that you could access that API through some sort of "middleware", like CORBA, that lets you access functions but not constants (please be kind if I'm wrong about that particular point, I haven't done any CORBA in 10 years…).
And in the end, it's somehow good OOP convention, the header file being a pure functional interface, and all the members being encapsulated enabling a full decoupling of the inner workings of the library and the exposed behaviour.
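One way to read the first point is that a function reports what the installed shared object was built with, whereas anything baked into the header reflects what the client was compiled against. The sketch below illustrates that reading; it is an assumption layered on the answer, and the names MYLIB_HEADER_VERSION, mylib_version and mylib_version_matches_header are hypothetical.

```c
#include <string.h>

/* In a real library this macro would live in the public header,
   so its value is frozen into the client at compile time. */
#define MYLIB_HEADER_VERSION "0.2"

/* ...and this definition would live inside the compiled shared object,
   so its value comes from whatever library is loaded at run time. */
const char *mylib_version(void)
{
    return "0.2";
}

/* Returns 1 when the header the client was compiled against matches
   the library actually loaded, 0 otherwise. */
int mylib_version_matches_header(void)
{
    return strcmp(mylib_version(), MYLIB_HEADER_VERSION) == 0;
}
```

A client could call such a check once at startup and refuse to continue on a mismatch.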

Function overloading in C shared library (different return type, different amount of arguments)

My job is to create a shared library that should be usable as a replacement for several (older) versions of another shared library.
Now the problem:
I have to combine:
Library a:
const char *mixer_ctl_get_enum_string(struct mixer_ctl *ctl, unsigned int enum_id);
const char *mixer_ctl_get_name(struct mixer_ctl *ctl);
Library b:
int mixer_ctl_get_enum_string(struct mixer_ctl *ctl, unsigned int enum_id, char *string, unsigned int size);
int mixer_ctl_get_name(struct mixer_ctl *ctl, char *name, unsigned int size);
I found out how to handle several amounts of input-params, but now they also have different return-types. I found examples for this in C++, but not for C.
How can I do this?
If C worked like Java, I would just implement both and everything would be fine, but in C?
Thanks for your help & kind regards!
There is no easy or general solution.
In C++ you could package up functions into classes, and function names only have to be unique in the class. C doesn't have this.
In C++, the return type and types of arguments count as part of a function name (so void foo(int) and void foo(float) are actually different functions and the compiler knows which one to call). C doesn't have this.
In C there is a single, global namespace, and the types do not count as part of the function name. As others have noted, the standard C function names are different for different return types: sqrt() returns double but sqrtf() returns float.
There are functions in C that can take a varying number of arguments; a classic example is printf(). But these are tricky to write, and not a general solution to your problem. In the case of printf() there is a "format string" argument, and the printf() function just has to trust that the format string correctly matches up with the arguments to printf(). (Well, since printf() is common, some compilers actually check the format string against the arguments, but your library functions don't have this advantage!)
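To make the "trust" problem concrete, here is a minimal sketch of a variadic function that relies on an explicit count instead of a format string. The function name sum_ints is made up for this illustration.

```c
#include <stdarg.h>

/* Sums `count` int arguments. The function cannot verify that the caller
   really passed `count` ints; passing fewer, or other types, is undefined
   behaviour. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

Note that this varies only the arguments; the return type is still fixed, so it does not address the two mixer_ctl_get_name signatures from the question.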
I've done a lot of work in C, and the single global namespace is one of the most annoying limitations of C. Is there any chance that you can actually use C++ for this project? If you use the basic features of C++, you can treat it as "C with classes" and just take advantage of namespaces and function overloading.
Otherwise I think your best bet is to use a refactor tool, or a really good search-and-replace feature in a text editor, to change the function names to be globally unique. An obvious way to do this is to change every function to have the library name as a prefix:
const char *a_mixer_ctl_get_name(struct mixer_ctl *ctl); // library a
int b_mixer_ctl_get_name(struct mixer_ctl *ctl, char *name, unsigned int size); // library b
Then you would have to refactor or search-and-replace the programs using the old libraries, but since the libraries are mutually contradictory you should have an easy time getting things working again.
There is no function overloading in C.
The reason is that C compilers do not mangle names when generating symbols. In C++, overloaded functions with different types get different mangled names; in C they do not.
So the common practice is to give functions with different types different names, like
float sqrtf(float x);
double sqrt(double x);
long double sqrtl (long double x);
And so on...
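As a side note that goes beyond this answer: C11's _Generic can select among such distinctly named functions at compile time based on an argument's type. It is a sketch of call-site convenience only. The exported symbols keep their separate names, and it cannot dispatch on argument count or return type, so it does not solve the shared library problem from the question. The names abs_int, abs_double and absolute are invented for the example.

```c
/* Two ordinary C functions with distinct names, as C requires. */
int abs_int(int v)          { return v < 0 ? -v : v; }
double abs_double(double v) { return v < 0 ? -v : v; }

/* C11 generic selection: picks one of the functions above from the static
   type of x, then calls it. Any type other than int or double is rejected
   at compile time. */
#define absolute(x) _Generic((x), int: abs_int, double: abs_double)(x)
```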
C does not provide function overloading facilities, neither does the shared library mechanism. The only way to differentiate the function calls will be to have different names.
To solve your problem of different interfaces (return types, parameters, ...), I'd suggest to build 3 shared libraries:
A shared library with combined functionality with a certain interface
A proxy shared library for application A, providing the interface expected, passing the calls to the combined library
A proxy shared library for application B, providing the interface expected, passing the calls to the combined library
Needless to say, you should eventually change applications A and B to use the combined library directly.
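The three-library layout might look roughly like the sketch below, collapsed into one file for readability. Everything here is an assumption made for illustration: the struct mixer_ctl layout is a stand-in, the combined_ name is invented, and library b's return convention (0 on success, -1 on failure) is a guess. In the real setup each proxy would be its own shared object and would export the unprefixed name mixer_ctl_get_name; the a_ and b_ prefixes exist only so both fit in one file.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in definition; the real struct belongs to the mixer library. */
struct mixer_ctl { const char *name; };

/* Combined library: one implementation under a globally unique name. */
const char *combined_mixer_ctl_get_name(const struct mixer_ctl *ctl)
{
    return ctl ? ctl->name : NULL;
}

/* Proxy for applications built against library a. */
const char *a_mixer_ctl_get_name(struct mixer_ctl *ctl)
{
    return combined_mixer_ctl_get_name(ctl);
}

/* Proxy for applications built against library b: copies into the caller's
   buffer. Assumed convention: 0 on success, -1 on any failure. */
int b_mixer_ctl_get_name(struct mixer_ctl *ctl, char *name, unsigned int size)
{
    const char *src = combined_mixer_ctl_get_name(ctl);
    size_t len;

    if (src == NULL || name == NULL)
        return -1;
    len = strlen(src);
    if (len >= size)            /* no room for the terminating NUL */
        return -1;
    memcpy(name, src, len + 1);
    return 0;
}
```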

In C, given a variable list of arguments, how to build a function call using them?

Suppose there's a list of arguments stored somehow, in a array for example.
Given a function pointer, how could I make a call to it passing the stored list of arguments?
I'm not trying to pass the array as an argument ok. You got it, ok? I want to pass each of its elements as an argument. An array is just to illustrate, I could be storing the arguments in some tuple structure. Also, look that I have at hand a function pointer and may have a signature in string format. I'm not trying to just define a function that is able to deal with a variadic list.
The only way I see how to do that is by employing assembly (by __asm push et al.) or this:
void (*f)(...);
int main()
{
f = <some function pointer>;
int args[]; <stored in a array, just to illustrate>
int num_args = <some value>;
switch(num_args)
{
case 0:
f();
break;
case 1:
f(args[0]);
break;
case 2:
f(args[0], args[1]);
break;
/* etc */
}
return 0;
}
I don't like this approach too much...
Is there another portable and shorter form?
Several script languages are able to call C functions.
How do scripting languages like Python or Ruby do that? How do they implement it in a portable way? Do they just use assembly for several platforms, or the approach above in the end?
Look that I'm really not asking about details of parameter marshaling and other stuff from script languages to C, I'm interested only in how, in the end, internally, the call to the C function by the script language is built.
EDIT
I'll keep the question's title but I think a better way for asking it is:
How to call a C function with its pointer and signature available only at runtime?
UPDATE
From Foreign Interface for PLT Scheme:
A call-out is a normal function call. In a dynamic setting,
we create a “call-interface” object which specifies (binary)
input/output types; this object can be used with an arbitrary
function pointer and an array of input values to perform a callout to the function and retrieve its result. Doing this requires
manipulating the stack and knowing how a function is called,
these are details that libffi deals with.
Thanks @AnttiHaapala for searching, finding and pointing out libffi. It's what I was looking for: it's used by a bunch of scripting languages, and it's a portable library, implemented across several architectures and compilers.
You asked what is the portable way to call any function pointer with given number of arguments. The correct answer is that there is no such way.
For example, Python is able to call C functions through the ctypes module, but this is portable only as long as you know the exact prototype and calling convention. In C, the easiest way to achieve the same is to know the prototype of the function pointer at compile time.
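"Knowing the prototype at compile time" can still be combined with a signature chosen at run time, as long as every supported signature is enumerated in the source. The sketch below assumes a made-up signature notation ("i(i)", "i(ii)") and hypothetical helper names; it is essentially the switch from the question keyed on a string, not a general mechanism.

```c
#include <string.h>

/* Generic function pointer type used only for storage. */
typedef void (*generic_fn)(void);

/* Two example targets with different prototypes. */
int example_negate(int a)     { return -a; }
int example_add(int a, int b) { return a + b; }

/* Calls fn according to sig, taking arguments from args.
   Returns 0 on success, -1 if the signature is not one we were compiled
   to handle. The cast restores the pointer's true type before the call,
   which is what keeps the call well defined. */
int call_by_signature(generic_fn fn, const char *sig,
                      const int *args, int *result)
{
    if (strcmp(sig, "i(i)") == 0) {
        *result = ((int (*)(int))fn)(args[0]);
        return 0;
    }
    if (strcmp(sig, "i(ii)") == 0) {
        *result = ((int (*)(int, int))fn)(args[0], args[1]);
        return 0;
    }
    return -1;
}
```

Every new prototype needs another branch, which is the limitation that motivates a library that builds the call frame at run time.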
Update
For the Python / ctypes example: on each platform that has the ctypes module enabled, Python knows how to lay out the call stack for a given set of arguments. On Windows, for example, Python knows of two standard calling conventions: cdecl, where the caller cleans up the stack, and stdcall, where the callee does. On Linux it does need to worry about whether to call 32- or 64-bit shared objects, and so forth. If Python is compiled for another platform, ctypes needs changes as well; the C code in the ctypes module is not, as such, portable.
Update 2
For Python the magic is in here: ctypes source code. Notably it seems to link http://sourceware.org/libffi/ which might be just what you needed.
I am the author of libffi. It will do what you are asking.
@AnttiHaapala pointed out libffi. Here's some information about it:
What is libffi?
Some programs may not know at the time of compilation what arguments are to be passed to a function. For instance, an interpreter may be told at run-time about the number and types of arguments used to call a given function. ‘libffi’ can be used in such programs to provide a bridge from the interpreter program to compiled code.
The ‘libffi’ library provides a portable, high level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.
FFI stands for Foreign Function Interface. A foreign function interface is the popular name for the interface that allows code written in one language to call code written in another language. The ‘libffi’ library really only provides the lowest, machine dependent layer of a fully featured foreign function interface. A layer must exist above ‘libffi’ that handles type conversions for values passed between the two languages.
‘libffi’ assumes that you have a pointer to the function you wish to call and that you know the number and types of arguments to pass it, as well as the return type of the function.
Historic background
libffi, originally developed by Anthony Green (SO user: anthony-green), was inspired by the Gencall library from Silicon Graphics. Gencall was developed by Gianni Mariani, then employed by SGI, for the purpose of allowing calls to functions by address and creating a call frame for the particular calling convention. Anthony Green refined the idea, extended it to other architectures and calling conventions, and open-sourced libffi.
Calling pow with libffi
#include <stdio.h>
#include <math.h>
#include <ffi.h>
int main()
{
ffi_cif call_interface;
ffi_type *ret_type;
ffi_type *arg_types[2];
/* pow signature */
ret_type = &ffi_type_double;
arg_types[0] = &ffi_type_double;
arg_types[1] = &ffi_type_double;
/* prepare pow function call interface */
if (ffi_prep_cif(&call_interface, FFI_DEFAULT_ABI, 2, ret_type, arg_types) == FFI_OK)
{
void *arg_values[2];
double x, y, z;
/* z stores the return */
z = 0;
/* arg_values elements point to actual arguments */
arg_values[0] = &x;
arg_values[1] = &y;
x = 2;
y = 3;
/* call pow */
ffi_call(&call_interface, FFI_FN(pow), &z, arg_values);
/* 2^3=8 */
printf("%.0f^%.0f=%.0f\n", x, y, z);
}
return 0;
}
I think I can assert that libffi is a portable way to do what I asked, contrary to Antti Haapala's assertion that there isn't such a way. If we can't call libffi a portable technology, given how widely it has been ported across compilers and architectures, and given that its interface complies with the C standard, then we can't call C, or anything else, portable either.
Information and history extracted from:
https://github.com/atgreen/libffi/blob/master/doc/libffi.info
http://en.wikipedia.org/wiki/Libffi
For safety you should unpack the variables before they are sent. Using assembler to hack the parameter stack might not be portable between compilers. Calling conventions might vary.
I can't speak for Ruby, but I have written quite a few programs using the C interfaces to Perl and Python. Perl and Python variables are not directly comparable with C variables; they have many more features. For example, a Perl scalar might have dual string and numeric values, only one of which is valid at any one time.
Conversion between Perl/Python variables and C is done using pack and unpack (in the struct module in Python). At the C interface you have to call specific APIs to do the conversion, depending on type. So, it is not just a straight pointer transfer, and it certainly does not involve assembler.

Detecting OS version and choosing threading functions accordingly

Is there a good way to check the OS version (in this case, Windows Vista+ or not) and decide at runtime which version of a function is going to be used?
Concretely I am talking about implementing pthreads in Win32 threads. In my ideal case, the pthreads library would determine at program startup which OS is running. If it is Vista+, all function calls will be redirected to the cool new and fast functions, otherwise, the old emulation layer will be used.
So in effect, the library will have two versions of each function, one new and one old. A one-time check at startup, before the program enters main so to speak, would determine which version it's going to use. I know there are libraries that detect CPU features like SSE at runtime and use the relevant functions, but I think they check at each function call. That would be too expensive to do in a low-level threading library IMO.
Is this possible? Can function calls be "relinked"/redirected at runtime, so to speak?
EDIT: crazy things like custom crt startup code would be possible for this (I'm talking about winpthreads for mingw-w64, which provides its own startup code)
The simple answer? Define and build a dispatch table/structure for your library. Something like this:
// Define function pointers and dispatch structure.
typedef void( *PFN_pthread_exit )( void *value_ptr );
typedef struct tag_PTHREAD_IMPL
{
PFN_pthread_exit ptr_pthread_exit;
// Add the rest here.
} PTHREAD_IMPL;
// Define your various implementations dispatcher structures.
static PTHREAD_IMPL legacy_impl = {
&legacy_pthread_exit_impl
};
static PTHREAD_IMPL latest_andgreatest_impl = {
&pthread_exit_impl
};
static PTHREAD_IMPL* s_pImpl = NULL;
Next, your library's initialize function should contain something like this:
int StaticInitialize( )
{
// Initialize dispatcher
if( latest and greatest OS version )
s_pImpl = &latest_andgreatest_impl;
else
s_pImpl = &legacy_impl;
return 0;
}
Finally, your library's exported functions should look something like this:
void pthread_exit( void *value_ptr )
{
ASSERT( s_pImpl );
ASSERT( s_pImpl->ptr_pthread_exit );
s_pImpl->ptr_pthread_exit( value_ptr );
}
Naturally, you'll need to ensure that your modern implementations utilize runtime binding for exports that don't exist on legacy platforms.
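The runtime binding mentioned above, and the version check itself, can both be done by probing for an export instead of comparing version numbers. The sketch below assumes that the presence of InitializeSRWLock in kernel32.dll (an API introduced with Windows Vista) is an acceptable stand-in for "latest and greatest". The helper names are hypothetical, and on non-Windows builds the probe simply reports 0 so the file still compiles.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if kernel32.dll exports InitializeSRWLock, 0 otherwise.
   Always 0 when not built for Windows. */
int has_srwlock_support(void)
{
#ifdef _WIN32
    HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
    return kernel32 != NULL &&
           GetProcAddress(kernel32, "InitializeSRWLock") != NULL;
#else
    return 0;
#endif
}

/* Picks an implementation once and caches the choice. This simple cache is
   not thread-safe, so it is meant to be called from startup code before any
   threads exist. */
const char *select_impl_name(void)
{
    static const char *chosen = NULL;

    if (chosen == NULL)
        chosen = has_srwlock_support() ? "modern" : "legacy";
    return chosen;
}
```

In the dispatch table design, the same probe would decide which PTHREAD_IMPL structure s_pImpl points to, and the modern implementation would call the new functions through pointers obtained from GetProcAddress, so the library still loads on older systems.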
Have fun!
