C headers: compiler specific vs library specific? - c

Is there some clear-cut distinction between standard C *.h header files that are provided by the C compiler, as oppossed to those which are provided by a standard C library? Is there some list, or some standard locations?
Motivation: int this answer I got a while ago, regarding a missing unistd.h in the latest TinyC compiler, the author argued that unistd.h (contrarily to sys/unistd.h) should not be provided by the compiler but by your C library.
I could not make much sense of that response (for one thing shouldn't that also apply to, say, stdio.h?) but I'm still wondering about it. Is that correct? Where is some authoritative reference for this?
Looking in other compilers, I see that other "self contained" POSIX C compilers that are hosted in Windows (like the GCC toolchain that comes with MinGW, in several incarnations; or Digital Mars compiler), include all header files.
And in a standard Linux distribution (say, Centos 5.10) I see that the gcc package provides a few header files (eg, stdbool.h, syslimits.h) in /usr/lib/gcc/i386-redhat-linux/4.1.1/include/, and the glibc-headers package provides the majority of the headers in /usr/include/ (including stdio.h, /usr/include/unistd.h and /usr/include/sys/unistd.h).
So, in neither case I see support for the above claim.

No, there is no clear-cut distinction.
As far as the C standard is concerned (here's a recent draft), the compiler and the library together make up the implementation, distinguished mostly by being described in sections 6 and 7 of the standard, respectively.
For some implementations, the compiler and the runtime library are provided by the same vendor/organization/person, either as a single installable package or as two separate packages. For other implementations (including gcc), the bulk of the standard library is provided by the underlying operating system, but the installation package for the compiler includes a few of its own headers.
Another example: When you install gcc from source on Solaris, the installer runs a script that grabs copies of some of the existing header files (provided by Sun's Oracle's runtime library) and edits them, installing the modified copies in a separate directory.
On GNU/Linux systems, the default C compiler is usually gcc, and the runtime library is provided by glibc -- both GNU packages, but developed separately. The MinGW implementation under Windows uses the gcc compiler with Microsoft's runtime library (which leads to some problems because they disagree on the representation of long double).
The choice of which standard headers need to be provided by the compiler is made by the authors of the compiler. Headers whose implementation is tightly tied to a particular compiler (such as <stdint.h>, <limits.h>, and <float.h>) are typically provided by the compiler; headers that provide an interface to operating system services, like <stdio.h> and <stdlib.h> are typically provided by the runtime library or perhaps by the OS.
The C standard provides no direct guidance regarding how this choice should be made.

Except for embedded (free-standing) C implementations, it makes little sense to separate C into compiler and libraries. Neither a compiler without libraries, not a C library without a compiler make much sense. Only a compiler together with a library makes up a complete implementation.
Since C89, the standard library is part of the C standard, and the list of required header files is listed in the standard. Other sets of libraries are standardized by Posix, X/Open ...
See this answer for a list: List of standard header files in C and C++
There are some headers that are by nature closer to the compiler itself, e.g. limits.h, which specifies the size of the data types. Some headers are closer to the OS, i.e. unistd.h. In both cases however, there are intersections, and if the OS' idea of, say, size_t and the compiler's implementation do not agree, nothing will work.
One could also argue that unistd.h should be provided by the OS and not by the C library - after all, the knowledge how to call a kernel function belongs into the OS.
In summary, I think this distinction into compiler, C library, operating system makes little sense.

Related

C - What are C implementations?

Could you give me an explanation of what exactly an implementation is, taking inspiration from the three highlighted expressions?
From "C Primer Plus" > Language Standards
Currently, many C implementations are available. Ideally, when you
write a C program, it should work the same on any implementation,
providing it doesn’t use machine-specific programming. For this to be
true in practice, different implementations need to conform to a
recognized standard.
A standard conforming C implementation consists of a compiler that translates compilation units as mandated by the standard, an implementation of the standard library for all functions required by the standard and something (normally a linker) that puts everything together to build an executable file. In fact the implementation also includes all the software required to then run the produced executable.
We commonly speak of compilers (gcc, clang, msvc) when we should speak of C development environment. And inside each vendor system, you may have different implementations, because for example gcc or clang can generate executables for different int sizes (32 or 64 bits) and eventually different endiannesses. Each configuration then constitutes a specific implementation.
To be more exhaustive, it should be noted that support the standard library may be optional in so called standalone execution environment (by opposition to hosted execution environment). In real world, standalone mode is used for kernel development, because the kernel must be able to start before all the functions from the standard library are available. Else we would have a chicken and egg problem if the kernel required functions that it only can provide when it is fully loaded...
References: The draft n1570 for C11 defines an implementation as:
3.12
implementation:
particular set of software, running in a particular translation environment under particular
control options, that performs translation of programs for, and supports execution of
functions in, a particular execution environment
C is typically implemented by a C compiler. In that paragraph you could simply replace this accordingly.
A reason for the more generic term "implementation" is that a compiler might contain multiple frontends or be made out of different tasks, or there are theoretically other forms than a compiler thinkable.
The C standard is defined against a "abstract machine" which behaves as the standard describes but doesn't have any other requirements and defines mapping to real machines as exercise for the implementor.
An implementation is an application to a particular specification.
For example, the specification could be the ISO international standard, and an implementation could be gcc or clang; the compilers that have implemented the standard specifications coupled with an implementation of the standard library.

I'm confused with C libraries

Ok here's the thing.
Most people learn about the C standard library simultaneously as they first get in contact with the C language and I wasn't an exception either. But as I am studying linux now, I tend to get confused with C libraries. well first, I know that you get a nice old C standard lib as you install gcc on your linux distro as a static lib. After that, you get a new stable version of glibc pretty soon as you connect to the internet.
I started to look into glibc API and here's where I got messed up. glibc seems to support vast amount of lib basically starting from POSIX C Standard lib (which implements the standard C lib(including C99 as I know of)) to it's own extensions based on the POSIX standard C lib.
Does this mean that glibc actually modified or added functions in the POSIX C Standard lib? or even add whole new header set? Cause I see some functions that are not in the standard C lib but actually included in the standard C header (such as strnlen() in
Also referring to what I mentioned about a 'glibc making whole new header set', is because I'm starting to see some header files that seems pretty unique such as linux/blahblah.h or sys/syscalls.h <= (are these the libs that only glibc support?)
Next Ques is that I actually heard linux is built based on C language. Does this mean linux compiles itself with it's own gcc compiler???????
For the first question, glibc follows both standard C and POSIX, from About glibc
The GNU C Library is primarily designed to be a portable and high performance C library. It follows all relevant standards including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.
For the second question, yes, you can compile Linux using gcc. Even gcc itself can be compiled using gcc, it's called bootstrapping.
Glibc implements the POSIX, ANSI and ISO C standards, and adds its own 'fluff', which it calls "glibc extensions". The reason that they are all "mixed together" is because they wrote the library as one package, there is no separate POSIX-only glibc.
<linux/blah> is not part of glibc. It is a set headers written specifically for the operating system, by people outside of glibc, to give the programmer access to the Linux kernel API. It is "part" of the Linux kernel and is installed with it, and is used for kernel hacking. <sys/blah> is part of glibc, and is specific to Linux. It gives access to a fairly abstracted Linux system API.
As for your second question, yes. Linux is written in C, as it is (according to Linus) the only programming language for kernel and system programming. The way this is done is through a technique called bootstrapping, where a small compiler is built (usually manually in ASM) and builds the entire kernel or the entirety of GCC.
There is one more thing to be aware of: one of the purposes of the libc is to abstract from the actual system kernel. As such, the libc is the one part of your app that is kernel specific. If you had a different kernel with different syscalls, you would need to have a specially compiled libc. AFAIK, the libc is therefore usually linked as a shared library.
On linux, we usually have the glibc installed, because linux systems usually are GNU/Linux systems with a GNU toolchain on top of the linux kernel.
And yes, the glibc does expand the standards in certain spots: The asprintf() function for instance originated as a gnu-addition. It almost made it into the C11 standard subsequently, but until it becomes part of them, it's use will require a glibc-based system, or statically linking with the glibc.
By default, the glibc headers do not define these gnu additions. You can switch them on by defining the preprocessor macro GNU_SOURCE before including the appropriate headers, or by specifying -std=gnu11 to the gcc call.

Crosscompiler Binary compatibility in C

I need to verify something for which I have doubts. If a shared library ( .dll) is written in C, with the C99 standard and compiled under a compiler. Say MinGw. Then in my experience it is binary compatible and hence useable from any other compiler. Say MS Visual Studio. I say in my experience because I have tried it successfully more than once. But I need to verify if this is a rule.
And in addition I would like to ask if it is indeed so, then why libraries written completely in C, like openCV for example don't provide compiled binaries for every different OS? I know that the obvious reason would be to set all the compile-time parameters, but other than that there is none right?
EDIT: I am adding an additional question which I see as a logical extension to the original. Isn't this how one would go and create a closed source library? Since the option of giving source goes out of the window there, giving binaries is the only choice. And in that case providing binaries for as many architectures as possible is the desired result, with C being an obvious choice for having the best portability between systems and compilers. Right?
In the specific case of C compilers (MSVC and GCC/MinGW) in the Windows world, you are correct in the assumption of binary compatibility. One can link a C interface DLL compiled by GCC to a program in Visual Studio. This is the way C99 projects like ffmpeg allow developers to write application wiht Visual Studio. One only needs to create the import library with lib.exe found in the Microsoft toolchain from the DLL. Or vice versa, using mingw.org's pexports or better, mingw-w64's gendef tool, one can create a GCC import lib for a MSVC produced DLL.
This handy interoperability breaks down when you enter the C++ interface world, where the ABI of MSVC and GCC is different and incompatible. It may work, it may not, no guarantees are made and no effort is (currently) being done in changing that. Also, debugging info is obviously different, until someone writes a debug information generator/writer in GCC that is compatible to MSVC's debugger (along with gdb support of course).
I don't think C99 specifically changes anything to function declarations or the way arguments are handled in symbol definitions, so there should be no problem here either.
Note that as Vijay said, there is still the architecture difference, so a x86 library can't be used when linking to an AMD64 library.
To also answer your additional question about closed source binaries and distributing a version for all available compilers/architectures.
This is exactly the way you would create a closed source binary. In addition to the import library, it is also very important to hide exports from the DLL, making the DLL itself useless for linking (if you don't want client code to use private functions in the library, see for example the output of dumpbin /exports on a MSOffice DLL, lots of hidden stuff there). You can achieve the same thing with GCC (I believe, never used or tried it) using things like __attribute(hidden) etc...
Some compiler specific points:
MSVC comes with four (well, actually only three remaining in newer versions) different runtime libraries through /MT, /MD, and /LD. On top of this, you would have to provide a build for each version of Visual Studio (including Service Packs) to assure compatibility. But that is closed source binary and Windows for you...
GCC does not have this problem; MinGW always links to msvcrt.dll provided by Windows (since Windows 98), equivalent with /MD (and maybe also a debug library equivalent with /MDd). But I there are two versions of MinGW (mingw.org and mingw-w64) which do not guarantee binary compatibility. THe latter is more complete as it provides 64-bit options as well as 32-bit, and provides a more complete header/library set (including a substantial part of DirectX and DDK).
The general rule is that IF your OS/CPU combination has a standard ABI, and IF that ABI is powerful enough for your language, most compilers will follow that ABI and as a result will be binary compatible, allowing you to link libraries (shared or static) compiled with different compilers to programs compiled with other compilers just fine.
The problem is that most ABIs are fairly weak -- they're designed around low-level languages like C and FORTRAN and date back to the days before object oriented languages like C++. So they tend to lack support for things like function overloading, user-defined operators, exceptions, global contructors and destructors, virtual functions, inheritance, and such that are needed by C++.
This lack was recognized when C++ was designed which is why C++ has extern "C" -- which causes the compiler to limit itself to the standard ABI for certain functions, while disabling all the extra C++ features that the ABIs generally don't support.
A shared library or dll compiled to a particular architecture can be linked to applications compiled by other compilers that target the same architecture. (By architecture, I mean a processor/OS combination). But it is not practical for a library developer to compile against all possible architectures. Moreover, when a library is distributed in source form, users can build binaries optimized to their specific requirements.

C libraries are distributed along with compilers or directly by the OS?

As per my understanding, C libraries must be distributed along with compilers. For example, GCC must be distributing it's own C library and Forte must be distributing it's own C library. Is my understanding correct?
But, can a user library compiled with GCC work with Forte C library? If both the C libraries are present in a system, which one will get invoked during run time?
Also, if an application is linking to multiple libraries some compiled with GCC and some with Forte, will libraries compiled with GCC automatically link to the GCC C library and will it behave likewise for Forte.
GCC comes with libgcc which includes helper functions to do things like long division (or even simpler things like multiplication on CPUs with no multiply instruction). It does not require a specific libc implementation. FreeBSD uses a BSD derived one, glibc is very popular on Linux and there are special ones for embedded systems like avr-libc.
Systems can have many libraries installed (libc and other) and the rules for selecting them vary by OS. If you link statically it's entirely determined at compile time. If you link dynamically there are versioning and path rules which come into play. Generally you cannot mix and match at runtime because of bits of the library (from headers) that got compiled into the executable.
The compile products of two compilers should be compatible if they both follow the ABI for the platform. That's the purpose of defining specific register and calling conventions.
As far as Solaris is concerned, you assumption is incorrect. Being the interface between the kernel and the userland, the standard C library is provided with the operating system. That means whatever C compiler you use (Forte/studio or gcc), the same libc is always used. In any case, the rare ports of the Gnu standard C library (glibc) to Solaris are quite limited and probably lacking too much features to be usable. http://csclub.uwaterloo.ca/~dtbartle/opensolaris/
None of the other answers (yet) mentions an important feature that promotes interworking between compilers and libraries - the ABI or Application Binary Interface. On Unix-like machines, there is a well documented ABI, and the C compilers on the system all follow the ABI. This allows a great deal of mix'n'match. Normally, you use the system-provided C library, but you can use a replacement version provided with a compiler, or created separately. And normally, you can use a library compiled by one compiler with programs compiled by other compilers.
Sometimes, one compiler uses a runtime support library for some operations - perhaps 64-bit arithmetic routines on a 32-bit machine. If you use a library built with this compiler as part of a program built with another compiler, you may need to link this library. However, I've not seen that as a problem for a long time - with pure C.
Linking C++ is a different matter. There isn't the same degree of interworking between different C++ compilers - they disagree on details of class layout (vtables, etc) and on how exception handling is done, and so on. You have to work harder to create libraries built with one C++ compiler that can be used by others.
Only few things of the C library are mandatory in the sense that they are not needed for a freestanding environment. It only has to provide what is necessary for the headers
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>
These usually don't implement a lot of functions that must be provided.
The other type of environments are called "hosted" environments. As the name indicated they suppose that there is some entity that "hosts" the running program, usually the OS. So usually the C library is provided by that "hosting environment", but as Ben said, on different systems there may even be alternative implementations.
Forte? That's really old.
The preferred compilers and developer tools for Solaris are all contained in Oracle Solaris Studio.
C/C++/Fortran with a debugger, performance analyzer, and IDE based on NetBeans, and lots of libraries.
http://www.oracle.com/technetwork/server-storage/solarisstudio/index.html
It's (still) free, too.
I think there a is a bit of confusion about terms: a library is NOT DLL's or .so: in the real sense of programming languages, Libraries are compiled code the LINKER will merge with our binary (.o). So the linker (or the compiler via some directives...) can manage them, but OS can't, simply is NOT a concept related to OS.
We are used to think OSes are written in C and we can rebuild the OS using gcc/libraries or similar, but C is NOT linux / unix.
We can also have an OS written in Pascal (Mac OS was in this manner many years ago..) AND use libraries with our favorite C compiler, OR have an OS written in ASM (even if not all, as in first Windows version), but we must have C libraries to build an exe.

How does linking to OS C libraries under Windows and Linux work?

I understand Linux ships with a c library, which implements the ISO C functions and system call functions, and that this library is there to be linked against when developing C. However, different c compilers do not necessarily produce linkable code (e.g. one might pad datastructures used in function arguments differently from another). How is the built-in c library meant to be linked to when I could use any compiler to compile my C? Is the story any different for static versus dynamic linking?
Under Windows on the other hand, each compiler provides its own standard library, which solves part of the problem, but system calls are still in a single set of DLLs. How are C applications linked to these DLLs successfully? How about different languages? (The same DLLs can be used by pre-.Net Visual Basic, etc.)
Each platform has some "calling conventions" that each C implementation must adhere to in order to be able to talk to the operating system correctly. For Windows, for example, all OS-based functions have to be called using stdcall convention, as opposed to the default C convention of cdecl.
In Linux, since the standard C library (and kernel) is compiled using GCC, any other compilers for Linux must make sure their calling conventions are compatible to the one used by GCC.
Compilers do come with their implementations of the standard library. It's just that under Linux it's assumed that any compiler will follow the same conventions the version of GCC that compiled the library had.
As of interoperability, it can be easier than you think. There are established calling conventions that will allow compilers to produce a valid call to a function, even if the function wasn't compiled with the same software.
As of structures and padding, you'll notice that most frameworks work with opaque types, that is, pointers to structures. Often, the structure's layout isn't even available to clients. As such, they never works with the actual data, only pointers to the data, which clears the padding issue.
Standards. You'll note that stdlib stuff operates on primitive values and arrays - and the standard for that stuff is pretty explicit on how things are to be done.

Resources