Writing a D (D2) binding for existing C libraries

Writing a D (D2) binding for existing C libraries - c

I'd really like to get more into D, but the lack of good library support is really hindering me. Therefore I'd like to create some D bindings for existing C libraries I'd like to use. I've never done any binding, but it doesn't look too difficult either.
I'm planning to do this for D2 (not specifically D1, but if it could be for both, even better). I am using the DMD2 compiler.
What conventions should be used (I noticed version statements, aliases and regular constants / function definitions)?
What would be the difference between binding to a static library (and thus linked against) or a dynamic library? Is there any difference in the binding?
For binding a static library, the DMD compiler doesn't seem to accept .a or .o files, only .lib and .obj. Does this mean the libraries must be compiled with the DMC compiler (as opposed to the GCC compiler), and then linked through the DMD compiler?
If someone had a very short example of how a binding would be accomplished, I would be great full. Currently I can compile C code with DMC, link the object files and run functions from the C code in D. However, most C libraries just need a header file inclusion AND need to be linked against in C. I'm uncertain how to make bindings that work for that...
Thanks!

A few things to note:
DMD and its linker Optlink work with the older OMF object file format, not COFF. This means that the C files you link against need to also be OMF. If you don't want to use DMC, there are tools that will convert COFF to OMF, though I don't know the details about them.
As far as translating .h files to .d files, a utility called htod is packaged with DMD, and will do this translation for you, albeit somewhat imperfectly if you severely abuse the preprocessor. Generally, you use const, immutable, or enum for manifest constants, version statements for conditional compilation, and regular (possibly templated) functions for macro functions.
As far as examples, one place to look would be in druntime, which contains bindings for the entire C standard library.

You may have a look at how Aldacron does with Derelict2.

Related

How to work around compiler built-in types in C standard header files

I am working on a static analysis tool for C. I need to pass the code being analysed through the C preprocessor so that the tool can see the library function prototypes, type definitions, etc. Unfortunately both with clang on Mac OS X and gcc on Linux distros, some of the standard header files refer to compiler built-in types like __builtin_va_list that my tool doesn't know about. Does anyone have any suggestions for how to work around this. One possibility, if it's available somewhere, would be a vanilla-flavoured set of header files that produce C that conforms strictly to the standard. The header files don't have to map to any ABI, as the tool doesn't need to compile and run the code: they just have to give the API promised by the C standard. Any suggestions will be gratefully received.

Instead of finding a set of standard standard header files, you can just use a set of empty files with the expected names and pass the source code through the compiler preprocessor with a -Idirectory option. Your syntax analysis tool should be able to deal with the remaining symbols.
It would be useful to have a preprocessor option in addition to -dI to preserve #include lines instead of handling them.
In the mean time, you can try using the include files from my nolibc repository.

Is it possible to see the macros of a compiled C program?

I am trying to learn C and I have this C file that I want view the macros of. Is there a tool to view the macros of the compiled C file.

No. That's literally impossible.
The preprocessor is a textual replacement that happens before the main compile pass. There is no difference between using a macro and putting the code the macro expands to in its place.*
*Ignoring the debugger output. But even then you can do it if you know the right #pragma to tell it the file and line number.

They're always defined in the header file(s) that you've imported with #include, or that those files in turn #include.
This may involve a lot of digging. It may involve going into files that make no sense to you because they're not written for casual inspection.
Any macros of any importance are usually documented. They may use other more complex implementation-specific macros that you shouldn't concern yourself with ordinarily, but if you're curious how they work the source is all there.
That being said, this is only relevant if you have the source and more specifically a complete build environment. Once compiled all these definitions, like the source itself, do not appear in the executable and cannot be inferred directly from the executable, especially not a release build.
Unlike Java or C#, C compiles directly to machine code so there's no way to easily reverse that back to the source. There are "decompilers" that try, but they can only really guess as to the original source. VM-based languages like Java and C# only lightly compile the code, sot here are a lot of hints as to how that code was generated and reversing it is an easier process.

Why call static linker instead of preprocessor?

Suppose we have a static library and we want to use it for our main.c file, now the question is
Why we must call the linker (ld) ? since all we do is copy - pasting the code from our static lib in our main.c file ?
Couldn't the preprocessor deal with that ?

It could do it, in exactly the same way you could use a fish to fell a tree. It's not really the job it was designed to do.
The preprocessor phase is meant to morph the source code before being given to the compilation phase. Though some may complain about how deficient it seems, it does actually do this job reasonably well.
The linker, on the other hand, does not understand source code at all. It's primary purpose is to tie together object files (which may come from C, C++, nasm, gfortran, BCPL or even more bizarre compilers) to create an executable capable of running on the target system.

What is the difference between include and link when linking to a library?

What does include and link REALLY do? What are the differences? And why do I need to specify both of them?
When I write #include math.h and then write -lm to compile it, what does #include math.h and -lm do respectively?
In my understanding, when linking a library, you need its .h file and its .o file. Does this suggest #include math.h means take in the .h file while -lm take in the .o file?

The reason that you need both a header (the interface description) and the library (the implementation) is that C separates the two clearer than languages like C# or Java do. One can compile a C function (e.g. by invoking gcc -c <sourcefile>) which calls library code even when the called library is not present; the header, which contains the interface description, suffices. (This is not possible with C# or Java; the assemblies resp. class files/jars must be present.) During the link stage though the library must be there, even when it's dynamic, afaik.
With C#, Java, or script languages, by contrast, the implementation contains all information necessary to define the interface. The compiler (which is not as clearly separated from the linker) looks in the jar file or the C# assembly which contain called implementations and obtains information about function signatures and types from there.
Theoretically, that information could probably be present in a library written in C as well — it's basically the debug information. But the classic C compiler (as opposed to the linker) is oblivious to libraries or object files and cannot parse them. (One should remember that the "compiler" executable you usually use to compile a C program , e.g. gcc, is a "compiler driver" which interprets the command line arguments and calls the programs which actually do stuff, e.g. the preprocessor, actual compiler and actual linker, to create the desired output.)
So in theory, if you have a properly annotated library in a known location, you could probably write a compiler which compiles a C function against it without having function declarations and type definitions; the compiler would have to produce the proper declarations. The compiler would have to know which library to parse (which corresponds to setting a C# project "Reference" in VS or having a class path and name/class correspondence in Java).
It would probably be easiest to use a well-known debugging format like stabs or dwarf and extract the interface definitions from it with a little helper program which uses the API for the debug format, extracts the information and produces a C header which is prepended to every source file. That would be the job of the compiler driver, and the actual compiler would still be oblivious to that.

It's because headers files contain only declaration and .o files (or .obj, .dll or .lib) contain definitions of methods.
If you open an .h file, you will not see the code of methods, because that is in the libraries.
One reason is commercial, because you need to publish your code and have the source code in your company. Libraries are compiled, so you could publish it.
Header files only tell compiler, what classes and methods it can find in the library.

The header files are kind of a table-of-contents plus a kind of dictionary for the compiler. It tells the compiler what the library offers and gives special values readable names.
The library file itself contains the contents.

What you are asking are entirely two different things.
Don't worry , i will explain them to you.
You use # symbol to instruct the preprocessor to include the math.h header files which internally contain the function prototypes of fabs(),ceil() etc..
And you use -lm to instruct the linker, to include the pre-compiled function definitions of fabs(),ceil() etc. functions in the exe file .
Now, you may ask why we have to explicitly link library file of math functions unlike for other functions and the answer is ,it is due to some undefined historical reasons.

Any good reason to #include source (.c .cpp) files?

i've been working for some time with an opensource library ("fast artificial neural network"). I'm using it's source in my static library. When i compile it however, i get hundreds of linker warnings which are probably caused by the fact that the library includes it's *.c files in other *.c files (as i'm only including some headers i need and i did not touch the code of the lib itself).
My question: Is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least i've been told all my life that this is bad and from my own experience i believe it IS bad). Or is it just bad design and there is no gain in this approach?
I'm aware of this related question but it does not answer my question. I'm looking for reasons that might justify this.
A bonus question: Is there a way how to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)

As far as I see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
They provide makefiles and MSVS projects which should already not link doublefann.o (from doublefann.c) with either fann.o (from fann.c) or fixedfann.o (from fixedfann.c) and so on, and either their files are screwed up or something similar has gone wrong.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files and many tools assume it. The only possible solution is to fix the project settings to not link these together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c, et. al. and possibly just removing those from the project is enough – then they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but would not be surprised to see this summary explained a bit more in-depth there.)

Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.

If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.

I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well structured .h files for others and then the compiled library for linking. If it wants to include function definitions in header files it should either mark them as static (old fashioned) or as inline (possible since C99).

I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.

I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight