Crosscompiler Binary compatibility in C - c

I need to verify something for which I have doubts. If a shared library ( .dll) is written in C, with the C99 standard and compiled under a compiler. Say MinGw. Then in my experience it is binary compatible and hence useable from any other compiler. Say MS Visual Studio. I say in my experience because I have tried it successfully more than once. But I need to verify if this is a rule.
And in addition I would like to ask if it is indeed so, then why libraries written completely in C, like openCV for example don't provide compiled binaries for every different OS? I know that the obvious reason would be to set all the compile-time parameters, but other than that there is none right?
EDIT: I am adding an additional question which I see as a logical extension to the original. Isn't this how one would go and create a closed source library? Since the option of giving source goes out of the window there, giving binaries is the only choice. And in that case providing binaries for as many architectures as possible is the desired result, with C being an obvious choice for having the best portability between systems and compilers. Right?

In the specific case of C compilers (MSVC and GCC/MinGW) in the Windows world, you are correct in the assumption of binary compatibility. One can link a C interface DLL compiled by GCC to a program in Visual Studio. This is the way C99 projects like ffmpeg allow developers to write application wiht Visual Studio. One only needs to create the import library with lib.exe found in the Microsoft toolchain from the DLL. Or vice versa, using mingw.org's pexports or better, mingw-w64's gendef tool, one can create a GCC import lib for a MSVC produced DLL.
This handy interoperability breaks down when you enter the C++ interface world, where the ABI of MSVC and GCC is different and incompatible. It may work, it may not, no guarantees are made and no effort is (currently) being done in changing that. Also, debugging info is obviously different, until someone writes a debug information generator/writer in GCC that is compatible to MSVC's debugger (along with gdb support of course).
I don't think C99 specifically changes anything to function declarations or the way arguments are handled in symbol definitions, so there should be no problem here either.
Note that as Vijay said, there is still the architecture difference, so a x86 library can't be used when linking to an AMD64 library.
To also answer your additional question about closed source binaries and distributing a version for all available compilers/architectures.
This is exactly the way you would create a closed source binary. In addition to the import library, it is also very important to hide exports from the DLL, making the DLL itself useless for linking (if you don't want client code to use private functions in the library, see for example the output of dumpbin /exports on a MSOffice DLL, lots of hidden stuff there). You can achieve the same thing with GCC (I believe, never used or tried it) using things like __attribute(hidden) etc...
Some compiler specific points:
MSVC comes with four (well, actually only three remaining in newer versions) different runtime libraries through /MT, /MD, and /LD. On top of this, you would have to provide a build for each version of Visual Studio (including Service Packs) to assure compatibility. But that is closed source binary and Windows for you...
GCC does not have this problem; MinGW always links to msvcrt.dll provided by Windows (since Windows 98), equivalent with /MD (and maybe also a debug library equivalent with /MDd). But I there are two versions of MinGW (mingw.org and mingw-w64) which do not guarantee binary compatibility. THe latter is more complete as it provides 64-bit options as well as 32-bit, and provides a more complete header/library set (including a substantial part of DirectX and DDK).

The general rule is that IF your OS/CPU combination has a standard ABI, and IF that ABI is powerful enough for your language, most compilers will follow that ABI and as a result will be binary compatible, allowing you to link libraries (shared or static) compiled with different compilers to programs compiled with other compilers just fine.
The problem is that most ABIs are fairly weak -- they're designed around low-level languages like C and FORTRAN and date back to the days before object oriented languages like C++. So they tend to lack support for things like function overloading, user-defined operators, exceptions, global contructors and destructors, virtual functions, inheritance, and such that are needed by C++.
This lack was recognized when C++ was designed which is why C++ has extern "C" -- which causes the compiler to limit itself to the standard ABI for certain functions, while disabling all the extra C++ features that the ABIs generally don't support.

A shared library or dll compiled to a particular architecture can be linked to applications compiled by other compilers that target the same architecture. (By architecture, I mean a processor/OS combination). But it is not practical for a library developer to compile against all possible architectures. Moreover, when a library is distributed in source form, users can build binaries optimized to their specific requirements.

Related

Are C standard library structures compatible between compilers and library versions on macOS or Linux?

My host application took over the ownership of e.g. a FILE object which came from a dynamic library. Can I call fclose() on this object safely even though my host application and the dynamic library are compiled with different versions of clang / gcc?
Background
On Windows (with different VS runtimes) it would be illegal and I have to first extract the fclose() function from the runtime library which is used by the dynamic library since all runtimes have their own pools and internal structures for file or memory objects.
An illustration for the situation in Windows would look like this:
Does this restriction apply for Linux and macOS as well?
The issue is not whether your application and the dynamic libraries were compiled with different versions of clang and/or gcc. The issue is whether, ultimately, there's one underlying C library that manipulates one kind of FILE * object and has one, compatible implementation of fclose().
Under MacOS and Linux, at least, the answer to all these questions is likely to be "yes". In my experience it's hard to get two different, incompatible C libraries into the mix; you'd have to really work at it.
Addendum: I suppose I should admit, however, that my experience may be getting dated. In my experience, on any Unix-like system, there's exactly one C library, generally /lib/libc.{a,so}. But I gather that "modern" compilers are tending to access their own compiler- and version-specific libraries off in special places, meaning that the scenario you're worried about could be a problem. To me, it seems, this way lies madness, but then again, it seems that more and more of the world seems to be embracing dependency hell, rather than trying to eliminate it.
It is not generally safe to use a library designed for one compiler with code compiled by a different compiler. A compiler may generate code that implements the nominal functions in the standard library using internal routines or interfaces, and those routines or interfaces may be different or missing in the library designed for another compiler.
Nor is it safe to take any pointer to some internal data structure from one library and use it with another library.
If the sources are just compiled with different versions of one compiler (e.g., clang 73 and clang 89), not different compilers (e.g., Apple clang versus GCC), the compiler might offer some guarantee about library compatibility. You would have to check its documentation. Or, if the compiler is intended to use the library provided with the operating system, that could work. Again, you would have to check its documentation.
On Linux, if both your code and the other library dynamically link to the same library (such as libc.so.6), both will get the same version and implementation of that library at runtime. You can check which libraries a given dynamic library links to with ldd.
If you were linking to a library that statically linked in a supporting library, you would need to be careful to pass any structures to or from it against the same version of the library. But this is more likely to come up in practice with libc++ and libstdc++ than with libc.
So, don't statically link your library to another and then pass a data structure that requires client code to separately link to the same library.

C headers: compiler specific vs library specific?

Is there some clear-cut distinction between standard C *.h header files that are provided by the C compiler, as oppossed to those which are provided by a standard C library? Is there some list, or some standard locations?
Motivation: int this answer I got a while ago, regarding a missing unistd.h in the latest TinyC compiler, the author argued that unistd.h (contrarily to sys/unistd.h) should not be provided by the compiler but by your C library.
I could not make much sense of that response (for one thing shouldn't that also apply to, say, stdio.h?) but I'm still wondering about it. Is that correct? Where is some authoritative reference for this?
Looking in other compilers, I see that other "self contained" POSIX C compilers that are hosted in Windows (like the GCC toolchain that comes with MinGW, in several incarnations; or Digital Mars compiler), include all header files.
And in a standard Linux distribution (say, Centos 5.10) I see that the gcc package provides a few header files (eg, stdbool.h, syslimits.h) in /usr/lib/gcc/i386-redhat-linux/4.1.1/include/, and the glibc-headers package provides the majority of the headers in /usr/include/ (including stdio.h, /usr/include/unistd.h and /usr/include/sys/unistd.h).
So, in neither case I see support for the above claim.
No, there is no clear-cut distinction.
As far as the C standard is concerned (here's a recent draft), the compiler and the library together make up the implementation, distinguished mostly by being described in sections 6 and 7 of the standard, respectively.
For some implementations, the compiler and the runtime library are provided by the same vendor/organization/person, either as a single installable package or as two separate packages. For other implementations (including gcc), the bulk of the standard library is provided by the underlying operating system, but the installation package for the compiler includes a few of its own headers.
Another example: When you install gcc from source on Solaris, the installer runs a script that grabs copies of some of the existing header files (provided by Sun's Oracle's runtime library) and edits them, installing the modified copies in a separate directory.
On GNU/Linux systems, the default C compiler is usually gcc, and the runtime library is provided by glibc -- both GNU packages, but developed separately. The MinGW implementation under Windows uses the gcc compiler with Microsoft's runtime library (which leads to some problems because they disagree on the representation of long double).
The choice of which standard headers need to be provided by the compiler is made by the authors of the compiler. Headers whose implementation is tightly tied to a particular compiler (such as <stdint.h>, <limits.h>, and <float.h>) are typically provided by the compiler; headers that provide an interface to operating system services, like <stdio.h> and <stdlib.h> are typically provided by the runtime library or perhaps by the OS.
The C standard provides no direct guidance regarding how this choice should be made.
Except for embedded (free-standing) C implementations, it makes little sense to separate C into compiler and libraries. Neither a compiler without libraries, not a C library without a compiler make much sense. Only a compiler together with a library makes up a complete implementation.
Since C89, the standard library is part of the C standard, and the list of required header files is listed in the standard. Other sets of libraries are standardized by Posix, X/Open ...
See this answer for a list: List of standard header files in C and C++
There are some headers that are by nature closer to the compiler itself, e.g. limits.h, which specifies the size of the data types. Some headers are closer to the OS, i.e. unistd.h. In both cases however, there are intersections, and if the OS' idea of, say, size_t and the compiler's implementation do not agree, nothing will work.
One could also argue that unistd.h should be provided by the OS and not by the C library - after all, the knowledge how to call a kernel function belongs into the OS.
In summary, I think this distinction into compiler, C library, operating system makes little sense.

A simple explanation of what is MinGW

I'm an avid Python user and it seems that I require MinGW to be installed on my Windows machine to compile some libraries. I'm a little confused about MinGW and GCC. Here's my question (from a real dummy point of view):
So Python is language which both interpreted and compiled. There are Linux and Windows implementations of Python which one simply installs and used the binary to a execute his code. They come bundled with a bunch of built-in libraries that you can use. It's the same with Ruby from what I've read.
Now, I've done a tiny bit a of C and I know that one has a to compile it. It has its built-in libraries which seem to be called header files which you can use. Now, back in the school day's, C, was writing code in a vi-like IDE called Turbo-C and then hitting F9 to compile it. That's pretty much where my C education ends.
What is MinGW and what is GCC? I've been mainly working on Windows systems and have even recently begun using Cygwin. Aren't they the same?
A simple explanation hitting these areas would be helpful.
(My apologies if this post sounds silly/stupid. I thought I'd ask here. Ignoring these core bits never made anyone a better programmer.)
Thanks everyone.
MinGW is a complete GCC toolchain (including half a dozen frontends, such as C, C++, Ada, Go, and whatnot) for the Windows platform which compiles for and links to the Windows OS component C Runtime Library in msvcrt.dll. Rather it tries to be minimal (hence the name).
This means, unlike Cygwin, MinGW does not attempt to offer a complete POSIX layer on top of Windows, but on the other hand it does not require you to link with a special compatibility library.
It therefore also does not have any GPL-license implications for the programs you write (notable exception: profiling libraries, but you will not normally distribute those so that does not matter).
The newer MinGW-w64 comes with a roughly 99% complete Windows API binding (excluding ATL and such) including x64 support and experimental ARM implementations. You may occasionally find some exotic constant undefined, but for what 99% of the people use 99% of the time, it just works perfectly well.
You can also use the bigger part of what's in POSIX, as long as it is implemented in some form under Windows. The one major POSIX thing that does not work with MinGW is fork, simply because there is no such thing under Windows (Cygwin goes through a lot of pain to implement it).
There are a few other minor things, but all in all, most things kind of work anyway.
So, in a very very simplified sentence: MinGW(-w64) is a "no-frills compiler thingie" that lets you write native binary executables for Windows, not only in C and C++, but also other languages.
To compile C program you need a C implementation for your specific computer.
C implementations consist, basically, of a compiler (its preprocesser and headers) and a library (the ready-made executable code).
On a computer with Windows installed, the library that contains most ready-made executable code is not compatible with gcc compiler ... so to use this compiler in Windows you need a different library: that's where MinGW enters. MinGW provides, among other things, the library(ies) needed for making a C implementation together with gcc.
The Windows library and MSVC together make a different implementation.
MinGW is a suite of development tools that contains GCC (among others), and GCC is a C compiler within that suite.
MinGW is an implementation of most of the GNU building utilities, like gcc and make on windows, while gcc is only the compiler. Cygwin is a lot bigger and sophisticated package, wich installs a lot more than MinGW.
The only reason for existence of MinGW is to provide linux-like environment for developers not capable of using native windows tools. It is inferior in almost every respect to Microsoft tooolchains on Win32/Win64 platforms, BUT it provides environment where linux developer does not have to learn anything new AND he/she can compile linux code almost without modifications. It is a questionable approach , but many people find that convenience more important than other aspects of the development .
It has nothing to do with C or C++ as was indicated in earlier answers, it has everything to do with the environment developer wants. Argument about GNU toolchains on windows and its nessessety, is just that - an argument
GCC - unix/linux compiler,
MinGW - approximation of GCC on Windows environment,
Microsoft compiler and Intel compiler - more of the same as names suggest(both produce much , much better programs on Windows then MinGW, btw)

Why in Linux compiler we have to give additional arguments while compiling and running C programs?

I have implemented semaphores in Linux last year. But for that I have to use -lpthread.
Now while implementing log10() function in C, I surfed the INTERNET and I saw that I have to use -lm.
I want to know why these kind of command line arguments are necessary in Linux.And Does this rule is compiler oriented?
(In windows Turboc compiler, I never used these kind of arguments.)
You are instructing the compiler to look for certain libraries and use them to try and produce a final object file.
When you were doing your threading code, you used threading primitives. These threading primitives are implemented in a library called pthread, -lpthread tells the linker to use the library pthread, without providing this switch the compiler will not be able to produce a valid object file as it is missing threading code implementation.
On the file system the libraries can be found in /usr/lib and lib (among others) when you look in these directories you will see files start with the lib prefix. for example libpthreadxxxxxx. You will have to do your own research to figure out what the xxxx means.
The development cycle using unix style tools is very granular on the surface, when you use heavyweight IDE's (read: visual studiio for C++), the IDE implicetly links against loads of standard libraries, so often you do not need to supply the name of the libraries you will commonly use. However, when you start doing more advanced programming you will probably have to install and configure your IDE to use external code libraries. If you were to use threading primitives in visual studio, you most likely will not have to provide the compiler with information on where to look for threading primitives, Microsoft considers this a common library and every new project will implicitly link against it.
A little discussion on GCC
GCC is a very diverse compiler producing code for various different usage scenarios. As such they try to be neutral and do not make assumptions. For example pthread is a particular threading primitives implementation. However, even through now on Linux at least it is the defacto standard, it is not the only one. Other Unix implementation have had different implementation. When such choices exist it is not fair for the compiler developers to implicitly link against libraries. They do however implicitly link against standard libraries; for example G++ is just a wrapper command to the internal compiler code, it is a C++ front-end so it implicitly links against an implementation of the C++ standard library. Similarly the C front end links against a the standard C library.
People often do not want to use certain standard library implementation, and instead they might want to use another implementation, in such cases you have to explicetly inform the compiler to use an implementation that you provide. Such use cases are very granular and are surface level issues with G++. In visual studio, you would have to tinker a lot to make such changes generally, since it is not an anticipated use-case anymore.
wikipedia will provide you with more information.
Edit: I'll fix the spelling and Grammatical issues later :D
The option -l indicates to gcc what libraries must be used for linking. -lpthread stands for "use the pthread library", and -lm stands for "use the m library" which is the math library. These commands are relative to gcc, not linux.
Because by default, gcc only links the C library (libc), which contains the well-known functions printf, scanf, and many more.
log10 exists in a different library called libm, and thus you need to explictly tell gcc to link that library, with -lm. The same logic applies for -lpthread.
This is purely a backwards, harmful practice. Separating parts of the standard library into separate .so files does nothing but increase load time and memory usage. Good luck getting anyone to change it though... Just accept that you have to do it (and that POSIX specifically allows, but does not require, that an implementation require -lm for using the math functions and -lpthread for using threads, etc.) and move on to more important things.
Or, go pester Drepper about it on the glibc bug tracker/mailing list. He won't change his mind, but if you enjoy flamewars you can get some kicks...

C libraries are distributed along with compilers or directly by the OS?

As per my understanding, C libraries must be distributed along with compilers. For example, GCC must be distributing it's own C library and Forte must be distributing it's own C library. Is my understanding correct?
But, can a user library compiled with GCC work with Forte C library? If both the C libraries are present in a system, which one will get invoked during run time?
Also, if an application is linking to multiple libraries some compiled with GCC and some with Forte, will libraries compiled with GCC automatically link to the GCC C library and will it behave likewise for Forte.
GCC comes with libgcc which includes helper functions to do things like long division (or even simpler things like multiplication on CPUs with no multiply instruction). It does not require a specific libc implementation. FreeBSD uses a BSD derived one, glibc is very popular on Linux and there are special ones for embedded systems like avr-libc.
Systems can have many libraries installed (libc and other) and the rules for selecting them vary by OS. If you link statically it's entirely determined at compile time. If you link dynamically there are versioning and path rules which come into play. Generally you cannot mix and match at runtime because of bits of the library (from headers) that got compiled into the executable.
The compile products of two compilers should be compatible if they both follow the ABI for the platform. That's the purpose of defining specific register and calling conventions.
As far as Solaris is concerned, you assumption is incorrect. Being the interface between the kernel and the userland, the standard C library is provided with the operating system. That means whatever C compiler you use (Forte/studio or gcc), the same libc is always used. In any case, the rare ports of the Gnu standard C library (glibc) to Solaris are quite limited and probably lacking too much features to be usable. http://csclub.uwaterloo.ca/~dtbartle/opensolaris/
None of the other answers (yet) mentions an important feature that promotes interworking between compilers and libraries - the ABI or Application Binary Interface. On Unix-like machines, there is a well documented ABI, and the C compilers on the system all follow the ABI. This allows a great deal of mix'n'match. Normally, you use the system-provided C library, but you can use a replacement version provided with a compiler, or created separately. And normally, you can use a library compiled by one compiler with programs compiled by other compilers.
Sometimes, one compiler uses a runtime support library for some operations - perhaps 64-bit arithmetic routines on a 32-bit machine. If you use a library built with this compiler as part of a program built with another compiler, you may need to link this library. However, I've not seen that as a problem for a long time - with pure C.
Linking C++ is a different matter. There isn't the same degree of interworking between different C++ compilers - they disagree on details of class layout (vtables, etc) and on how exception handling is done, and so on. You have to work harder to create libraries built with one C++ compiler that can be used by others.
Only few things of the C library are mandatory in the sense that they are not needed for a freestanding environment. It only has to provide what is necessary for the headers
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>
These usually don't implement a lot of functions that must be provided.
The other type of environments are called "hosted" environments. As the name indicated they suppose that there is some entity that "hosts" the running program, usually the OS. So usually the C library is provided by that "hosting environment", but as Ben said, on different systems there may even be alternative implementations.
Forte? That's really old.
The preferred compilers and developer tools for Solaris are all contained in Oracle Solaris Studio.
C/C++/Fortran with a debugger, performance analyzer, and IDE based on NetBeans, and lots of libraries.
http://www.oracle.com/technetwork/server-storage/solarisstudio/index.html
It's (still) free, too.
I think there a is a bit of confusion about terms: a library is NOT DLL's or .so: in the real sense of programming languages, Libraries are compiled code the LINKER will merge with our binary (.o). So the linker (or the compiler via some directives...) can manage them, but OS can't, simply is NOT a concept related to OS.
We are used to think OSes are written in C and we can rebuild the OS using gcc/libraries or similar, but C is NOT linux / unix.
We can also have an OS written in Pascal (Mac OS was in this manner many years ago..) AND use libraries with our favorite C compiler, OR have an OS written in ASM (even if not all, as in first Windows version), but we must have C libraries to build an exe.

Resources