C Header Files and ABI

I'd like to know how C header files and ABIs relate. The sizes of various types are architecture- and even compiler-dependent, so how can one reliably link to a C library?
For a more specific problem: when using Haskell's FFI, one uses only Haskell types like CDouble to define (that is, duplicate the definition of) the C library interface. I don't know where the binary type-size information is coming from. What is the trick that makes the linking work?

Please see this link: https://code.google.com/p/tabi
It may help you avoid difficulties with possible ABI differences between Haskell and C.

The library type information comes from magic macros that are run to insert information grabbed from the C compiler by autoconf.
For example, see the definition of CDouble here: https://hackage.haskell.org/package/base-4.8.2.0/docs/src/Foreign.C.Types.html#CDouble
and then see where the HTYPE_DOUBLE size comes from in this autoconf input here: https://hackage.haskell.org/package/base-4.8.2.0/src/include/HsBaseConfig.h.in
Since GHC compiles against the compiler and architecture it was built with (except in the special cross-compiler modes, which are new and different in ways I'm not fully cognizant of), this makes everything tie out with the ABI properly.
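For a concrete picture, the generated HsBaseConfig.h simply #defines each HTYPE_* macro to the Haskell type whose size autoconf measured for the corresponding C type. A hypothetical excerpt for a 64-bit Linux target (the macro names come from the linked file; the values shown here are illustrative and vary by platform and compiler):

/* Hypothetical autoconf output; values depend on the probed platform. */
#define HTYPE_DOUBLE Double   /* Haskell type matching C's double */
#define HTYPE_INT Int32       /* Haskell type matching C's int    */
#define HTYPE_LONG Int64      /* Haskell type matching C's long   */

CDouble is then a newtype wrapper around HTYPE_DOUBLE, so its in-memory representation matches the C compiler's double exactly.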

Related

How to work around compiler built-in types in C standard header files

I am working on a static analysis tool for C. I need to pass the code being analysed through the C preprocessor so that the tool can see the library function prototypes, type definitions, etc. Unfortunately, both with clang on Mac OS X and gcc on Linux distros, some of the standard header files refer to compiler built-in types like __builtin_va_list that my tool doesn't know about. Does anyone have any suggestions for how to work around this? One possibility, if it's available somewhere, would be a vanilla-flavoured set of header files that produce C conforming strictly to the standard. The header files don't have to map to any ABI, as the tool doesn't need to compile and run the code: they just have to provide the API promised by the C standard. Any suggestions will be gratefully received.
Instead of finding a set of standard header files, you can just use a set of empty files with the expected names and pass the source code through the compiler's preprocessor with a -Idirectory option. Your syntax analysis tool should be able to deal with the remaining symbols.
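A minimal sketch of this trick (paths and file names are illustrative; -nostdinc keeps the compiler's real headers out of the search path):

/* Layout:
 *   stubs/stdio.h   <- an empty file
 *   input.c         <- the code under analysis
 *
 * Preprocess with the stub directory first:
 *   cc -E -nostdinc -I stubs input.c > input.i
 */
#include <stdio.h>   /* resolves to the empty stub */

int main(void)
{
    /* printf survives preprocessing as a bare identifier; the analysis
     * tool must treat unknown functions as implicitly declared. */
    printf("hello\n");
    return 0;
}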
It would be useful to have a preprocessor option in addition to -dI to preserve #include lines instead of handling them.
In the meantime, you can try using the include files from my nolibc repository.

C and assembly - how can it work?

I am wondering how mixing C and assembly can be possible, since compilers generate code in different ways. For example, many C compilers will pass arguments in registers rather than pushing them to the stack when making a function call, and the called function then moves those registers into the appropriate memory locations. Given this, what happens if you write assembly code, or link with an object file created by a different compiler, that calls the C function but pushes the arguments to the stack instead of setting the registers?
My guess is that the C compiler's assembly output does this in such a clever way that it doesn't make a difference and it will still work, but I can't be sure; looking at the assembly code, it doesn't appear it would work.
Can anyone answer my question? I am writing a compiler and need to know this so I don't make any mistakes, should I want to link with a C module in the future.
The conventions that are used for calling functions are part of what's called the "application binary interface" (ABI). If this interface is specified, then all code that follows the specification can be linked together.
There is no standard ABI for C. However, most popular platforms have one prevailing C compiler that effectively produces a de-facto standard ABI (e.g. there's one for Windows, one for Linux on x86 (32 and 64 bit), one for Linux on ARM, etc.). ABIs may specify a large number of separate "calling conventions", and your C compiler will typically let you specify the desired convention at the point of function declaration using some vendor extension.
Conversely, if there is no documented ABI for your C compiler, or for an existing bit of object code, then you cannot in general link (or otherwise interact) with it successfully.
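As a platform-specific illustration, under the System V AMD64 ABI (the de-facto standard for Linux on x86-64), the first integer arguments travel in rdi, rsi, rdx, rcx, r8, r9 and the result returns in rax. A sketch of what that contract looks like from C (the codegen comments are approximate and vary by compiler and flags):

#include <stdio.h>

/* Callee: per the SysV AMD64 ABI it receives a in rdi, b in rsi,
 * c in rdx, and returns the result in rax. */
long add3(long a, long b, long c) { return a + b + c; }

int main(void)
{
    /* A conforming compiler emits roughly:
     *   mov edi, 1 ; mov esi, 2 ; mov edx, 3 ; call add3
     * Hand-written assembly that follows the same register contract
     * can call add3 (or be called from C) without any trouble. */
    printf("%ld\n", add3(1, 2, 3));
    return 0;
}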

Should a Fortran-compiled and C-compiled DLL be able to import interchangeably? (x86 target)

The premise: I'm writing a plug-in DLL which conforms to an industry standard interface / function signature. This will be used in at least two different software packages used internally at my company, both of which have some example skeleton code or empty shells of this particular interface. One vendor authors their example in C/C++, the other in Fortran.
Ideally I'd like to just have to write and maintain this library code in one language and not duplicate it (especially as I'm only just now getting some comfort level in various flavors of C, but haven't touched Fortran).
I've emailed off to both our vendors to see if there's anything specific their solvers need when they import this DLL, but this has made me curious at a more fundamental level. If I compile a DLL with an exposed method void foo(int bar) in both C and Fortran, by the time it's down to x86 machine instructions, does it make any difference in how that method is called by program "X"? I've gathered so far that if I were to use C++ I'd need the extern "C" bit to avoid "mangling" - is there anything else I should be aware of?
It matters. The exported function must use a specific calling convention; there are several incompatible ones in common use in 32-bit code. The calling convention dictates where the function arguments are stored, in what order they are passed, how they are removed again, and how the function return value is passed back.
And the name of the function matters, exported function names are often decorated with extra characters. Which is what extern "C" is all about, it suppresses the name mangling that a C++ compiler uses to prevent overloaded functions from having the same exported name. So the name is one that the linker for a C compiler can recognize.
The way a C compiler makes function calls is pretty much the standard for interop with code written in other languages. Any modern Fortran compiler supports declarations that make its routines compatible with a C program. Surely this is already used by whatever software vendor you are working with that provides an add-on written in Fortran. And the other way around: as long as you provide functions that can be called as a C compiler would call them, the Fortran programmer has a good chance of being able to call them.
Yes, this has been discussed here many times. Study the questions and answers under this tag: https://stackoverflow.com/questions/tagged/fortran-iso-c-binding
The equivalent of extern "C" in Fortran is bind(C). The equivalence of the data types is established using the intrinsic module iso_c_binding.
Also be sure to use the same calling conventions. If you do not specify anything manually, the default is usually the same for both. On Linux this is a non-issue.
extern "C" is used in C++ code. So if you DLL is written in C++, you mustn't pass any C++ objects (classes).
If you stick with C types, you need to make sure the function passes parameters in a single way e.g. use C's default of _cdecl. Not sure what Fortran uses.
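Pulling the above together, the C side of such a plug-in export is often written along these lines (a sketch: the PLUGIN_EXPORT/PLUGIN_CALL macros are hypothetical, and whether the vendor expects __cdecl, __stdcall, or something else must come from their documentation):

/* plugin.h - illustrative export of the interface function */
#ifdef _WIN32
  #define PLUGIN_EXPORT __declspec(dllexport)
  #define PLUGIN_CALL   __cdecl      /* pick the vendor-specified convention */
#else
  #define PLUGIN_EXPORT
  #define PLUGIN_CALL
#endif

#ifdef __cplusplus
extern "C" {   /* suppress C++ name mangling so the export is plain "foo" */
#endif

PLUGIN_EXPORT void PLUGIN_CALL foo(int bar);

#ifdef __cplusplus
}
#endif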

Where is pow function defined and implemented in C?

I read that the pow(double, double) function is defined in "math.h" but I can't find its declaration.
Does anybody know where this function is declared? And where is it implemented in C?
Reference:
http://publications.gbdirect.co.uk/c_book/chapter9/maths_functions.html
Quite often, an include file such as <math.h> will include other header files that actually declare the functions you would expect to see in <math.h>. The idea is that the program gets what it expects when it includes <math.h>, even if the actual function definitions are in some other header file.
Finding the implementation of a standard library function such as pow() is quite another matter. You will have to dig up the source code to your C standard runtime library and find the implementation in there.
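For instance, a program only needs the declaration from <math.h>; the machine code comes from the math library at link time (on many Unix systems you link it explicitly, e.g. cc demo.c -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* pow is declared in <math.h>; its definition lives in libm. */
    printf("%f\n", pow(2.0, 10.0));   /* prints 1024.000000 */
    return 0;
}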
Where it's defined depends on your environment. The code is inside a compiled C standard library somewhere.
Its "definition" is in the source code for your c standard library distribution. One such distribution is eglibc. This is browsable online, or in a source distribution:
w_pow.c
math_private.h
Short answer: In the C standard library source code.
The actual implementation of pow may vary from compiler to compiler. Generally, math.h (or a vendor-specific file included by math.h) provides the prototype for pow (i.e., its declaration), but the implementation is buried in some library file such as libm.a. Depending on your compiler, the actual source code for pow or any other library function may not be available.
declared: in the include directory of your system/SDK (e.g.: /usr/include;/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.2.sdk/usr/include/architecture/arm/math.h)
defined (implemented):
as a library (compiled, binary code): in the library directory of your system/SDK (e.g. /usr/lib; in the case of the math library it's libm.dylib)
as source (program code): this is the interesting part. I'm working on Mac OS X 10.6.x right now. The sources for the functions declared in math.h (e.g. extern double pow(double, double);) are not shipped with the installation (at least I couldn't find them). You are likely to find those sources in your system/SDK's C library. In my case the math library (libm) is a separate project; some of its sources are provided by Apple: http://www.opensource.apple.com/tarballs/Libm/Libm-315.tar.gz
The extern keyword in the declaration of pow means that it's defined somewhere else. Math functions are low-level, high-performance implementations, mostly done in assembly code (*.s). The assembly routines (taking arguments/returning results via registers/stack) are linked with the rest of the C library. The linking/exporting of the function/routine names is platform-specific and doesn't really matter if one's goal is not to dive into assembly coding.
If you are seeking how the calculation is implemented, you can find it here:
http://fossies.org/dox/gcc-4.5.3/e__pow_8c_source.html
The function is named __ieee754_pow, and it is called by the pow function.
It's really declared in math.h. Have you tried including math.h and simply using pow? What do you mean by "can't find it"?
Here's a C implementation for fdlibm: http://www.netlib.org/fdlibm/e_pow.c
For what it's worth, when v8 dropped its cos/sine tables, it pulled from fdlibm's implementation to do so: https://code.google.com/p/v8/source/detail?r=22918
From the change commit comments: "Implement trigonometric functions using a fdlibm port."
Mozilla, on the other hand, calls the cstdlib math functions, which will have variable performance by build and system (e.g. they may or may not invoke the chip-level implementations of transcendental functions), while C# bytecode seems to make explicit references to chip-level functions when it can. However, pow is not one of those, IIRC (it doesn't seem to have a chip-level implementation) and is implemented elsewhere.
See also https://bugzilla.mozilla.org/show_bug.cgi?id=967709 for a cos/sine discussion in the Mozilla community, comparing Mozilla's implementation with the old v8 implementation.
See also: How is Math.Pow() implemented in .NET Framework?
Intrinsic functions are chip-level, actually implemented on the processor. (We don't necessarily need lookup tables any more.)
It's here and also here.
Also go on Wikipedia; you will find pow there.

Reflection support in C

I know it is not supported, but I am wondering if there are any tricks around it. Any tips?
Reflection in general is a means for a program to analyze the structure of some code.
This analysis is used to change the effective behavior of the code.
Reflection as analysis is generally very weak; usually it can only provide access to function and field names. This weakness comes from the language implementers essentially not wanting to make the full source code available at runtime, along with the appropriate analysis routines to extract what one wants from the source code.
Another approach is to tackle program analysis head-on, by using a strong program analysis tool, e.g., one that can parse the source text exactly the way the compiler does.
(Often people propose to abuse the compiler itself to do this, but that usually doesn't work; the compiler machinery wants to be a compiler, and it is darn hard to bend it to other purposes.)
What is needed is a tool that:
Parses language source text
Builds abstract syntax trees representing every detail of the program (it is helpful if the ASTs retain comments and other details of the source code layout, such as column numbers and literal radix values)
Builds symbol tables showing the scope and meaning of every identifier
Can extract control flow from functions
Can extract data flow from the code
Can construct a call graph for the system
Can determine what each pointer points to
Enables the construction of custom analyzers using the above facts
Can transform the code according to such custom analyses (usually by revising the ASTs that represent the parsed code)
Can regenerate source text (including layout and comments) from the revised ASTs
Using such machinery, one implements analysis at whatever level of detail is needed, and then transforms the code to achieve the effect that runtime reflection would accomplish.
There are several major benefits:
The detail level or amount of analysis is a matter of ambition (it isn't limited to what runtime reflection can do)
There isn't any runtime overhead to achieve the reflected change in behavior
The machinery involved can be general and applied across many languages, rather than being limited to what a specific language implementation provides
This is compatible with the C/C++ idea that you don't pay for what you don't use: if you don't need reflection, you don't need this machinery, and your language doesn't need the intellectual baggage of weak reflection built in
See our DMS Software Reengineering Toolkit for a system that can do all of the above for C, Java, and COBOL, and most of it for C++.
[EDIT August 2017: Now handles C11 and C++2017]
Tips and tricks always exist. Take a look at the Metaresc library: https://github.com/alexanderchuranov/Metaresc
It provides an interface for type declarations that also generates meta-data for the type. Based on the meta-data you can easily serialize/deserialize objects of any complexity. Out of the box you can serialize/deserialize XML, JSON, XDR, Lisp-like notation, and C-init notation.
Here is a simple example:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "metaresc.h"

TYPEDEF_STRUCT (point_t,
                double x,
                double y
                );

int main (int argc, char * argv[])
{
  point_t point = {
    .x = M_PI,
    .y = M_E,
  };
  char * str = MR_SAVE_XML (point_t, &point);
  if (str)
    {
      printf ("%s\n", str);
      free (str);
    }
  return (EXIT_SUCCESS);
}
This program will output
$ ./point
<?xml version="1.0"?>
<point>
<x>3.1415926535897931</x>
<y>2.7182818284590451</y>
</point>
The library works fine with recent gcc and clang on Linux, macOS, FreeBSD, and Windows.
A custom macro language is one of the options. Alternatively, the user could declare types as usual and generate type descriptors from DWARF debug info. This moves complexity to the build process, but makes adoption much easier.
any tricks around it? Any tips?
The compiler can optionally generate a 'debug symbol file', which a debugger uses to help debug the code. The linker may also generate a 'map file'.
A trick/tip might be to generate and then read these files.
Based on the responses to How can I add reflection to a C++ application? (Stack Overflow) and the fact that C++ is considered a "superset" of C, I would say you're out of luck.
There's also a nice long answer about why C++ doesn't have reflection (Stack Overflow).
I needed reflection in a bunch of structs in a C++ project.
I created an XML file with the description of all those structs - fortunately the field types were primitive types.
I used a template (not a C++ template) to auto-generate a class for each struct, along with setter/getter methods.
In each class I used a map to associate string names and class members (pointers to members).
I didn't regret using reflection because it opened new ways to design my core functionality that I couldn't even imagine without reflection.
(BTW, it was an external report generator for a program that uses a raw database)
So, I used code generation, function pointers and maps to simulate reflection.
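A plain-C cousin of that string-to-member map can be built with offsetof. A minimal sketch (all names are hypothetical, and for brevity every field is a double):

#include <stddef.h>
#include <stdio.h>

struct point { double x, y; };

/* One entry per field: its name and its byte offset within the struct. */
struct field_desc {
    const char *name;
    size_t      offset;
};

static const struct field_desc point_fields[] = {
    { "x", offsetof(struct point, x) },
    { "y", offsetof(struct point, y) },
};

int main(void)
{
    struct point p = { 1.5, 2.5 };
    /* "Reflective" access: walk fields by name, not by hard-coded member. */
    for (size_t i = 0; i < sizeof point_fields / sizeof point_fields[0]; i++) {
        const double *field =
            (const double *)((const char *)&p + point_fields[i].offset);
        printf("%s = %f\n", point_fields[i].name, *field);
    }
    return 0;
}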
I know of the following options, but all come at cost and a lot of limitations:
Use libdl (#include <dlfcn.h>)
Call a tool like objdump or nm
Parse the object files yourself (using a corresponding library)
Involve a parser and generate the necessary information at compile time.
"Abuse" the linker to generate symbol arrays.
I'll use a bit of unit test frameworks as examples further down, because automatic test discovery for unit test frameworks is a typical example where reflection comes in very handy, and it's something that most unit test frameworks for C fall short of.
Using libdl (#include <dlfcn.h>) (POSIX)
If you're on a POSIX environment, a little bit of reflection can be done using libdl. Plugins are developed that way.
Use
#include <dlfcn.h>
in your source code and link with -ldl.
Then you have access to functions dlopen(), dlerror(), dlsym() and dlclose() with which you could load and access / run shared objects at runtime. However, it does not give you easy access to the symbol table.
Another disadvantage of this approach is that you basically restrict reflection to objects loaded as dynamic library (shared object loaded at runtime via dlopen()).
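A minimal dlopen/dlsym sketch (the soname libm.so.6 is illustrative; on glibc before 2.34 you also need to link with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Load a shared object at runtime. */
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Look a symbol up by name and call through it. */
    double (*p)(double, double) =
        (double (*)(double, double)) dlsym(handle, "pow");
    if (p)
        printf("%f\n", p(2.0, 10.0));

    dlclose(handle);
    return 0;
}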
Running nm or objdump
You could run nm or objdump to show the symbol table and parse the output.
For me, nm -P --defined-only -g xyz.o gives good results, and parsing the output is trivial.
You'd be interested in the first word of each line only, which is the symbol name, and maybe the second one, which is the section type.
If you do not know the object name in some static way, i.e. the object is actually a shared object, at least on Linux you then might want to skip symbol names starting with '_'.
objdump, nm or similar tools are also often available outside POSIX environments.
Parsing the object files yourself
You could parse the object files yourself. You probably don't want to implement that from scratch but use an existing library for that. This is how nm, objdump and even libdl are implemented. You could peek at the source code of nm, objdump and libdl and the libraries they use in order to find out how they do what they do.
Involving a Parser
You could write a parser and code generator which generates the necessary reflective information at compile time and stores it in the object file. Then you have a lot of freedom and could even implement primitive forms of annotations. That's what some unit test frameworks like AceUnit do.
I found that writing a parser which covers straightforward C syntax is fairly trivial. Writing a parser which really understands C and can deal with all cases is NOT trivial.
So, this has limitations which depend on how exotic the C syntax is that you want to reflect upon.
"Abusing" the linker to generate symbol arrays
You could put references to symbols which you want to reflect upon in a special section and use a linker configuration to emit the section boundaries so you can access them in C.
I've described how this works here: Dependency injection in C - better way than linker-defined arrays?
But beware, this is depending on a lot of things and not very portable. I have only tried this with GCC/ld, and I know it doesn't work with all compilers / linkers. Also, it's almost guaranteed that dead code elimination will not detect how you call this stuff, so if you use dead code elimination, you will have to add all the reflected symbols as entry points.
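A minimal GCC/ld sketch of the trick (the section name test_registry is hypothetical; GNU ld synthesizes __start_/__stop_ symbols for any section whose name is a valid C identifier):

#include <stdio.h>

typedef void (*test_fn)(void);

/* Drop a pointer to each registered function into a named section.
 * "used" keeps the compiler's dead-code elimination from discarding it. */
#define REGISTER_TEST(fn) \
    static const test_fn fn##_entry \
        __attribute__((used, section("test_registry"))) = fn

static void test_a(void) { puts("running test_a"); }
REGISTER_TEST(test_a);

static void test_b(void) { puts("running test_b"); }
REGISTER_TEST(test_b);

/* Provided by GNU ld for identifier-named sections. */
extern const test_fn __start_test_registry[];
extern const test_fn __stop_test_registry[];

int main(void)
{
    /* Walk the section like an array: discovery without listing tests here. */
    for (const test_fn *t = __start_test_registry; t < __stop_test_registry; t++)
        (*t)();
    return 0;
}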
Pitfalls
For some of the mechanisms, dead code elimination can be a problem, in particular when you "abuse" the linker to generate symbol arrays. It can be worked around by declaring the reflected symbols as entry points to the linker, and depending on the number of symbols this might be neither nice nor convenient.
Conclusion
Combining nm and libdl can actually give quite good results. The combination can be almost as powerful as the level of Reflection used by JUnit 3.x in Java. The level of reflection given is sufficient to implement a JUnit 3.x-style unit test framework for C, including test-case discovery by naming convention.
Involving a parser is more work and limited to objects that you compile yourself, but gives you most power and freedom. The level of reflection given can be sufficient to implement a JUnit 4.x-style unit test framework for C, including test-case discovery by annotations. AceUnit is a unit test framework for C that does exactly this.
Combining parsing and the linker to generate symbol arrays can give very nice results - if your environment is so much under your control that you can ensure that working with the linker that way works for you.
And of course you can combine all approaches to stitch together the bits and pieces until they fit your needs.
You would need to implement it yourself from the ground up. In straight C, there is no runtime information whatsoever kept on structure and composite types. Metadata simply does not exist in the standard.
Implementing reflection for C would be much simpler... because C is a simple language.
There are some basic options for analyzing a program, like detecting whether a function exists by calling dlopen/dlsym - it depends on your needs.
There are tools for creating code that can modify/extend itself using tcc.
You may use the above tool to create your own code analyzers.
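For instance, libtcc can compile a string of C at runtime and hand back a callable function pointer. A sketch (API as of tcc 0.9.27; link with -ltcc):

#include <libtcc.h>
#include <stdio.h>

int main(void)
{
    TCCState *s = tcc_new();
    if (!s) return 1;
    tcc_set_output_type(s, TCC_OUTPUT_MEMORY);

    /* Compile freshly generated C code into this process. */
    if (tcc_compile_string(s, "int square(int x) { return x * x; }") < 0)
        return 1;
    if (tcc_relocate(s, TCC_RELOCATE_AUTO) < 0)
        return 1;

    /* Fish the new function out by name and call it. */
    int (*square)(int) = (int (*)(int)) tcc_get_symbol(s, "square");
    if (square)
        printf("%d\n", square(7));   /* prints 49 */

    tcc_delete(s);
    return 0;
}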
For similar reasons to the author of the question, I have been working on a C-type-reflection-API along with a C reflection graph database format and a clang plug-in that writes reflection metadata.
The intent is to use the C reflection API for writing serialization and deserialization routines, such as mappers for ASN.1, function argument printers, function proxies, fuzzers, etc. Clang and GCC both have plugin APIs that allow access to the AST but there currently is no standard graph format for C reflection metadata.
The proposed C reflection API is called Crefl:
https://github.com/michaeljclark/crefl
The Crefl API provides runtime access to reflection metadata for C structure declarations with support for arbitrarily nested combinations of: intrinsic, set, enum, struct, union, field (member), array, constant, variable.
The Crefl reflection graph database format stores portable reflection metadata.
The Crefl clang plug-in outputs the C reflection metadata used by the library.
The Crefl API provides task-oriented query access to that metadata. The Crefl C reflection data model is essentially a transcription of the C data types in ISO/IEC 9899.
C intrinsic data types.
integer types.
floating-point types.
complex number types.
boolean type.
nested struct, union, field, and bitfield
arrays and pointers
typedef type aliases
enum and enum constants
functions and function parameters
const, volatile and restrict qualifiers
GNU C-style attributes (__attribute__).
The library is still a work in progress. The hope is to find others who are interested in reflection support in C.
Parsers and debug symbols are great ideas. However, the gotcha is that C does not really have arrays - just pointers to stuff.
For example, there is no way by reading the source code to know whether a char * points to a character, a string, or a fixed array of bytes based on some "nearby" length field. This is a problem for human readers let alone any automated tool.
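For example, these declarations are indistinguishable to any tool that sees only the types (a sketch; all names are hypothetical):

#include <stddef.h>

char *one_char;    /* points at a single character */
char *c_string;    /* points at a NUL-terminated string */
char *raw_bytes;   /* points at a buffer whose length lives elsewhere */

struct packet {
    char  *payload;      /* how long? only the neighbouring field knows */
    size_t payload_len;
};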
Why not use a modern language, like Java or .NET? It can be faster than C as well.
