Linking object files from different C compilers - c

Say I have two compilers, or even a single compiler with two different option sets. Each compiler compiles some C code into an object and I try to link the two .o files with a common linker. Will this succeed?
My initial thought is: not always. If the compilers are using the same object file format and have compatible options, then it would succeed. But, if the compilers have conflicting options, or (and this is an easy one) are using two different object file formats, it would not work correctly.
Does anyone have more insight on this? What standards would the object files need to comply with to gain confidence that this will work?

Most flavors of *nix OSes have a well-defined and open ABI and mostly use the ELF object file format, so this is not a problem at all on *nix.
Windows is less strictly defined, and different compilers may vary in some calling conventions (for example, __fastcall may not be supported by some compilers or may behave differently; see https://en.wikipedia.org/wiki/X86_calling_conventions). But the main set of calling conventions (__stdcall, __cdecl, etc.) is standard enough to ensure that a function compiled by one compiler can be called successfully from code produced by another; otherwise the program wouldn't work at all, since, unlike Linux, every system call in Windows is wrapped by a function from a DLL which you need to be able to call successfully.
The other problem is that there is no standard common format for object files. Although most tools (MS, Intel, GCC (MinGW), Clang) use the COFF format, some may use OMF (Watcom) or ELF (TinyC).
Another problem is so-called "name mangling". Although it was introduced to support overloading of C++ functions with the same name, it was adopted by C compilers to prevent linking of functions defined with different calling conventions. For example, int __cdecl fun(void); will get the compiled name _fun, whilst int __stdcall fun(void); will get the name _fun@0. For more information on name mangling, see https://en.wikipedia.org/wiki/Name_mangling.
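As a rough illustration (assuming a 32-bit Windows target and the MSVC/MinGW-style decoration described above; the function names are made up):

int __cdecl   c_fun(void);          /* decorated as  _c_fun                        */
int __stdcall s_fun(void);          /* decorated as  _s_fun@0                      */
int __stdcall s_add(int a, int b);  /* decorated as  _s_add@8  (8 bytes of args)   */

A mismatch in either the underscore prefix or the @N suffix shows up as an unresolved symbol at link time, rather than as two incompatible conventions being silently mixed.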
Lastly, default behavior may differ between compilers, so yes, options may prevent successful linking of object files produced by different compilers or even by the same compiler. For example, TinyC uses the __cdecl convention by default, whilst Clang uses __stdcall. TinyC with default options may also not produce code that can be linked with the others, because it does not prefix names with an underscore; to make it cross-linkable it needs the -fleading-underscore option.
But keeping in mind all of the above, code may be successfully intermixed. For example, I have successfully linked together code produced by Visual Studio, Intel Parallel Studio, GCC (MinGW), Clang, TinyC, and NASM.

Related

Are static libraries version independent?

Say you're using C11, and you need to use a library (either static or dynamic) written in C17. Can you compile the library to object files, and then just link those with your program's object files? I mean, object files are just executable files (in machine code, or binary?), except that you need to link them before they're executable. Anything crazy in what I just wrote?
By the way, is an object file without any dependencies executable?
For one routine to be able to call another, they need to pass and receive arguments in a compatible way. Computing platforms typically have an application binary interface (ABI) that says how arguments are passed. Routines written in C, C++, FORTRAN, PL/I, or other languages can call each other as long as they use the same ABI. Routines compiled with different C standards can call each other as long as they use the same ABI.
There are compatibility issues other than passing arguments. One version of a library might have some feature that requires specifying a length. The next version might require specifying the length and a width, or it might have the length required and the width optional. A program written for one version of the library might not be able to use another version because it is not passing the arguments the library requires, even though the way it is passing the arguments conforms to the ABI.
If you have the source code for a library and for your own program and you compile them both using the same compiler, just with a C 2017/2018 switch for one and C 2011 for the other, they can work together if you call the library routines correctly.
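A minimal sketch of that scenario (the file names, function names, and gcc invocations are illustrative, and assume a single GCC toolchain that supports both -std switches):

/* lib.h - shared header; valid under both C11 and C17 */
int lib_add(int a, int b);

/* lib.c - build with:  gcc -std=c17 -c lib.c        */
#include "lib.h"
int lib_add(int a, int b) { return a + b; }

/* main.c - build with: gcc -std=c11 -c main.c
   then link:           gcc main.o lib.o -o app      */
#include <stdio.h>
#include "lib.h"
int main(void) { printf("%d\n", lib_add(2, 3)); return 0; }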
Object files are generally not executable because they are in a different format than executable files. There are different object formats, and, as far as what is theoretically possible if not actually practiced, somebody could design an object file format that is executable if it contains no dependencies, or could design a program loader that reads an object file format and loads it for execution, again if it has no dependencies. In that regard, though, you could also "execute" a source file by making a program loader that compiled, linked, and loaded it.

Can I write (x86) assembly language which will build with both GCC and MSVC?

I have a project which is entirely written in C. The same C files can be compiled using either GCC for Linux or MSVC for Windows. For performance reasons, I need to re-write some of the code as x86 assembly language.
Is it possible to write this assembly language as a source file which will build with both the GCC and MSVC toolchains? Alternatively, if I write an assembly source file for one toolchain, is there a tool to convert it to work with the other?
Or, am I stuck either maintaining two copies of the assembly source code, or using a third-party assembler such as NASM?
I see two problems:
masm and gas have different syntax. gas can be configured to use Intel syntax with the .syntax intel,noprefix directive, but even then small differences remain (such as different directives). A possible approach is to preprocess your assembly source with the C preprocessor, using macros for all directives that differ between the two. This also has the advantage of providing a unified comment syntax.
However, just using a portable third party assembler like nasm is likely to be less of a hassle.
Linux and Windows have different calling conventions. A possible solution for x86-32 is to stick to a well-supported calling convention like stdcall. You can tell gcc what calling convention to use when calling a function using function attributes. For example, this would declare foo to use the stdcall calling convention:
extern int foo(int x, int y) __attribute__((stdcall));
In MSVC, the calling convention is written directly in the declaration with the __stdcall keyword, solving this issue.
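A sketch of how the two spellings could live in one shared header for an x86-32 build (the function name asm_sum is made up; the assembly routine itself would be assembled separately):

#if defined(_MSC_VER)
int __stdcall asm_sum(int x, int y);                  /* MSVC spelling      */
#else
int asm_sum(int x, int y) __attribute__((stdcall));   /* GCC/Clang spelling */
#endif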
On x86-64, a similar solution is likely possible, but I'm not exactly sure what attributes you have to set.
You can of course also use the same cpp-approach as for the first problem to generate slightly different function prologues and epilogues depending on what calling convention you need. However, this might be less maintainable.

C and assembly how can it work?

I am wondering how mixing C and assembly can be possible, given that compilers generate code in different ways. For example, many C compilers will use registers rather than pushing to the stack when making a function call; the called function will then move those registers into the appropriate memory locations. Because of this, what happens if you write assembly code, or link with an object file created by a different compiler, that calls the C function but instead pushes the arguments onto the stack rather than setting the registers?
My guess is that the C compiler's assembly output has done it in such a clever way that it doesn't make a difference and it will still work, but I can't be sure; looking at the assembly code, it doesn't appear that it would work.
Can anyone answer my question? I am writing a compiler and need to know this so I don't make any mistakes, should I want to link with a C module in the future.
The conventions that are used for calling functions are part of what's called the "application binary interface" (ABI). If this interface is specified, then all code that follows the specification can be linked together.
There is no standard ABI for C. However, most popular platforms have one prevailing C compiler that effectively produces a de-facto standard ABI (e.g. there's one for Windows, one for Linux on x86 (32 and 64 bit), one for Linux on ARM, etc.). ABIs may specify a large number of separate "calling conventions", and your C compiler will typically let you specify the desired convention at the point of function declaration using some vendor extension.
Conversely, if there is no documented ABI for your C compiler, or for an existing bit of object code, then you cannot in general link (or otherwise interact) with it successfully.
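For a concrete picture of what such an ABI pins down, here is a small sketch for x86-64 Linux (System V AMD64 ABI), where the first two integer arguments travel in RDI and RSI and the result comes back in RAX; sum2 is a made-up name and could equally be hand-written assembly or C built by another conforming compiler:

extern long sum2(long a, long b);   /* defined elsewhere - in C or in assembly */

long call_it(void)
{
    /* The calling compiler loads 40 into RDI and 2 into RSI, issues
       `call sum2`, and reads the result from RAX - exactly what the
       callee expects, because both sides follow the same ABI. */
    return sum2(40, 2);
}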

Passing struct between code generated by different compilers

The memory layout of a struct is up to the compiler. So what happens when some code compiled by one compiler uses a struct generated by code compiled by another compiler?
For example, say I have a header file that declares a struct somestruct, and a function that returns that struct. One source file defines that function and is compiled by compiler A. Another source file uses that function, is compiled by compiler B, and links against the binary of the other source file.
If the two compilers create two different layouts for somestruct, then what's the layout of the variable returned by the function? Does it defer to one compiler's layout, or will there be a memory bug when the second source file tries to access elements of the struct returned by the first source file? Is it an error at compile time or link time?
The function will return the structure as specified by the ABI of the compiler that compiled it. The caller's compiler will simply treat the function as if it conformed to its own ABI.
Assuming the two compilers use a similar ABI, in most cases no errors will be reported at compile time or link time, or even at runtime. For some compatible compilers, like Clang, GCC, and the Intel C Compiler on OS X and Linux, no errors should result (if there are errors, then it's a compiler bug). However, in the real world it is usually difficult to find fully compatible compilers: in most cases their ABIs are similar but not exactly the same, and such ABI errors are even harder to track down because your app appears normal and only crashes when some really weird circumstances are encountered at runtime.
Just as Basile said, name mangling for C++ poses an additional difference in ABI, but such differences are more easily caught, at link time, because the linker simply can't find the symbol of the function, rather than finding a function that is not compatible.
Also, passing structures is another headache in terms of ABI, because there are multiple structure-packing ABIs, sometimes differing even between "compatible" compilers like GCC/MinGW and MSVC. (See also the -m[no-]ms-bitfields option in GCC, which forces GCC to use the MSVC ABI for structures.) I have also seen cases where passing structures by pointer is more reliable than passing structures by value.
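As a sketch of the scenario in the question (all names here are invented for illustration), one cheap way to catch gross layout disagreements early is to have the unit built by compiler A export its own idea of the struct's size, and check it from the unit built by compiler B:

/* somestruct.h - the shared header, seen by both compilers */
#include <stddef.h>

struct somestruct {
    int    id;
    char   tag;
    double value;
};

struct somestruct make_somestruct(void);  /* defined in the unit built by compiler A */
size_t somestruct_abi_size(void);         /* also built by compiler A; returns sizeof(struct somestruct) there */

/* check.c - built by compiler B; refuses to run if the layouts disagree */
#include <stdlib.h>
#include "somestruct.h"

void check_layout(void)
{
    if (somestruct_abi_size() != sizeof(struct somestruct))
        abort();   /* the two compilers padded/packed the struct differently */
}

offsetof(struct somestruct, value) can be cross-checked in the same way if the sizes happen to agree but the member layout does not.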
The layout of data (e.g. structures, etc.) and the call protocol (how calls are made at the processor level) are defined in a processor- and operating-system-specific document called the Application Binary Interface. If both compilers follow the same ABI (for the same processor and the same operating system), their generated code should be interoperable.
See e.g. the wikipage for x86 calling conventions and the x86-64 ABI specification.
Name mangling, notably for C++, might also be an issue.
Read also Levine's book on Linkers and Loaders

Delphi dcu to obj

Is there a way to convert a Delphi .dcu file to an .obj file so that it can be linked using a compiler like GCC? I've not used Delphi for a couple of years but would like to use if for a project again if this is possible.
Delphi can output .obj files, but they are in a 32-bit variant of Intel OMF. GCC, on the other hand, works with ELF (Linux, most Unixes), COFF (on Windows) or Mach-O (Mac).
But that alone is not enough. It's hard to write much code without using the runtime library, and the implementation of the runtime library will be dependent on low-level details of the compiler and linker architecture, for things like correct order of initialization.
Moreover, there's more to compatibility than just the object file format; code on Linux, in particular, needs to be position-independent, which means it can't use absolute values to reference global symbols, but rather must index all its global data from a register or relative to the instruction pointer, so that the code can be relocated in memory without rewriting references.
DCU files are a serialization of the Delphi symbol tables and code generated for each proc, and are thus highly dependent on the implementation details of the compiler, which changes from one version to the next.
All this is to say that it's unlikely that you'd be able to get much Delphi (dcc32) code linking into a GNU environment, unless you restricted yourself to the absolute minimum of non-managed data types (no strings, no interfaces) and procedural code (no classes, no initialization section, no data that needs initialization, etc.)
(answer to various FPC remarks, but I need more room)
For a good understanding, you have to know that a Delphi .dcu translates to two different FPC files: a .ppu file with the mentioned symtable stuff, which includes non-linkable code like inline functions and generic definitions, and a .o which is mingw compatible (COFF) on Windows. Cygwin is mingw compatible too at the linking level (but the runtime is different and scary). Anyway, mingw32/64 is our reference gcc on Windows.
The PPU has a similar version problem as Delphi's DCU, probably for the same reasons. The ppu format is different in nearly every major release (so 2.0, 2.2, 2.4), and typically changes 2-3 times a year in trunk.
So while FPC on Windows uses its own assemblers and linkers, the .o's it generates are still compatible with mingw32. In general FPC's output is very gcc compatible, and it is often possible to link in gcc static libs directly, allowing e.g. mysql and postgres linklibs to be linked into apps with a suitable license (like e.g. GPL). On 64-bit they should be compatible too, but this is probably less tested than win32.
The textmode IDE even links in the entire GDB debugger in library form. GDB is one of the main reasons for gcc compatibility on Windows.
While Barry's points about the runtime in general hold for FPC too, it might be slightly easier to work around this. It might only require calling certain functions to initialize the FPC RTL from your startup code, and similarly for finalization. Compile a minimal FPC program with -al and look at the resulting assembler in the .s file (most notably initializeunits and finalizeunits). Moreover, the RTL is more flexible and probably more easily cut down to a minimum.
Of course, as soon as you also require exceptions to work across gcc<->fpc boundaries you are out of luck. FPC does not use SEH, or any scheme compatible with anything else ATM (contrary to Delphi, which uses SEH, which at least in theory should give you an advantage there, Barry?). OTOH, gcc might use its own libunwind instead of SEH.
Note that the default calling convention of FPC on x86 is the Delphi-compatible register convention, so you might need to insert proper cdecl modifiers (which should be gcc compatible), or you can even set it for an entire unit at a time using {$calling cdecl}.
On *nix this is bog standard (e.g. apache modules), I don't know many people that do this on win32 though.
About compatibility: FPC can compile packages like Indy, Teechart, Zeos, ICS, Synapse, VST and reams more with little or no mods. The dialect level of released versions is a mix of D7 and up, with the focus on D7. The dialect level is slowly creeping towards D2006 in trunk versions (with for-in, class abstract, etc.).
Yes. Have a look at the Project Options dialog box:
[screenshot of the Project Options dialog box]
As far as I am aware, Delphi only supports the OMF object file format. You may want to try an object format converter such as Agner Fog's.
Since the DCU format is proprietary and has a tendency of changing from one version of Delphi to the next, there's probably no reliable way to convert a DCU to an OBJ. Your best bet is to build them in OBJ format in the first place, as per Andreas's answer.
