relationship of c compiler and c standard library - c

I have been doing a lot reading lately about how glibc functions wrap system calls in linux. I am wondering however about the relationship between glibc and the GNU C Compiler.
Lets say for example I wanted to write my own C Standard implementation and write a new library called "newglibc" and I change things just slightly. Like for example I take more checks and actions before and after the system calls. Would I have to write a new compiler? Or would I be able to use the same GNU gcc compiler?
If the compiler is completely separate from the library, then would someone be able to, THEORETICALLY, use the gcc on windows system if they could turn it into a .exe and provide the standard C library that windows provides?
Thank you

The Linux kernel, the GNU C Library ("glibc"), and the GNU Compiler Collection (gcc) are three separate development projects. They are often used all together, but they don't have to be. The ones with "GNU" in their name are offically part of the GNU Project; Linux isn't.
The C standard does not make a distinction between the "compiler" and the "library"; it's all one "implementation" to the committee. It is largely a historical accident that GCC is a separate development project from glibc—but a motivated one: back in the day, each commercial Unix variant shipped with its own C library and compiler, and they were terrible, 90% bugs by volume was typical. GNU got its start providing a less terrible replacement for the compiler (and the shell utilities, which were also terrible).
Replacing the compiler on a traditional commercial Unix is a lot easier than replacing the C library, because the C library isn't just the functions defined in clause 7 of the C standard; as you have noticed, it also provides the lowest-level interface to the kernel, and often that wasn't very well documented. glibc did at one time at least sort-of support a bunch of these Unixes, but nowadays it can only be used with Linux and an experimental kernel called the Hurd. By contrast, GCC supports dozens of different CPUs and kernels, and Linux supports dozens of different CPUs.
If you write your own C library and/or kernel, it is relatively easy to write a "back end" so that GCC can generate code for them as a cross-compiler, and somewhat more difficult to port GCC to run in that environment. You may also need to write a back end for the assembler and linker, which are yet a fourth project ("GNU Binutils"). Porting glibc to a new CPU running Linux is a large but straightforward task; porting glibc to a new operating system is hard, especially if that OS is not Unix-ish. (Windows is decidedly not Unix-ish, so much so that when Microsoft wanted to make it easier to run programs written for Unix under Windows, the path of least resistance was to bolt an in-house clone of the Linux kernel onto the side of the NT kernel. I am not making this up.)
If you write your own C compiler, you will have to make it conform to the expectations of the library and kernel that it is generating code for. A lot of that is documented in the "ABI" specification for the environment you're working in, but not all, unfortunately.
If that doesn't clarify, please let us know what is still unclear.

Related

The C language and Mac OSX

I was wondering whether anybody here could help me better understand the relationship between OSX and C. There's some developer information related to C++ in xcode but nothing for C.
I believe one fundamental difference is that osx uses libc as opposed to glibc. Can anybody point me to libc documentation? I can't seem to find any.
I've seen the usr/includes folder but all that does is make me wonder where I can get a reference that elucidates all the options available to me. For instance, I just discovered <tree.h>. That's all well and good but is there any documentation? Or do I need to trawl the includes folder?
It seems that you're asking whether the functionality that OSX provides to you as a programmer is partially different from other *nix systems; focusing on the functionality that OSX's implementation of the C Standard Library provides you with.
Now keep in mind that while the C Standard Library is a very common way to take advantage of the functionality the operating system kernel exposes, it's not the only way. You can use other low-level libraries, or write low-level functions yourself.
Having said that, consider the following:
OSX, like many other *nix systems, is "mostly POSIX-compliant". Meaning that its particular C Standard Library implementation will likely expose headers defined by the POSIX standard. This is the stuff you can rely on regardless of whether you use libc, glibc, or some other implementation of the C Standard Library.
Depending on the particular C Standard Library you're using, it might come with additional functionality, like BSD libc - we say "superset of the POSIX Standard Library" to that. While it can contain implementations of things specific to BSD (and therefore OSX), it mostly seems to contain things that can be implemented regardless of the operating system flavour. For example, the sys/tree.h header that you mention is "an implementation of Red-black tree and Splay tree" - by no means something that couldn't have been implemented on a Linux system!
To sum up:
OSX comes with an implementation of the C Standard Library called BSD libc that provides some additional headers on top of what the POSIX Standard defines.
The difference in functionality between the XNU kernel used by OSX and other *nix kernels will not necessarily be captured in the difference between the C Standard Library implementations. If you want to know what the XNU kernel can do for you that the Linux kernel can't, the place to start is with the kernels themselves.
So your question can be split into:
What is the difference between glibc and BSD libc?
and
What is the difference between the XNU kernel and the Linux kernel?
It's a bit unclear what you're asking.
OS X is based on top of FreeBSD, a POSIX-compliant UNIX operating system. The relationship between OS X and C is that C is one of many programming languages that you can code in to develop for the platform (C is the core of Objective-C, an otherwise unused language that Apple champions).
OS X doesn't use libc. clang, the compiler that ships as part of Apple's developer tools package for OS X, uses libc. There's a difference. If you want to use glib, grab GCC from Homebrew or Macports and use it to compile your programs instead of clang.
Lastly, you can't find documentation for libc, as all C libraries, like libc, glibc, etc, all provide the same set of functions if they are standards-compliant. There tend to be few differences end-user-wise between the different C libraries; so, if you want to find out about a header file, use man, like this: man clang to read clang documentation, for example.
Hope this helps.

Do you have to build a new compiler for a new operating system?

I would like to build an OS some time in the future, and now thinking of some light sketches on how it would be. I have pretty much been coding in C compiled for the Windows environment (and some little Java). I would have to recompile any of my C programs should I want to run it under Linux. So the binaries, the product of compilation, must be different for each operating system. If I design a totally new OS from scratch, for both hobby and academic purpose, without using the Linux kernel or any known base code of an OS, what I understand as to happen is that I cannot compile my C programs with GCC since my OS will not be among its target systems. Here my question written on the title emerges. Thanks in advance for any hints.
It depends. You could easily choose to re-use an existing compiler, such as the immemorial example GCC, and thus you would reap the benefits of an existing compiler. But there are some big provisos that must be cleared up.
Regards of whether or not you choose to build a new compiler, the challenge will remain in porting a C library. You technically can use C without a standard library (which is what the Linux kernel, or any self-hosted example for that matter, has to do, for example) but this is a ridiculous proposition for programs intended to run under an operating system, as most systems impose memory restrictions, etc, meaning that you cannot just have carte blanche in terms of using memory. Thusly, a C library call such as malloc is required.
Since any programs under your kernel (99% of your OS in all likelihood) will need a set of functions to link against, porting a C library is your biggest task. The C library is a huge monolith, and writing your own would be rather silly, especially with many implementations already available, the most well known being GCC's. So, the question you really should be asking is, do you want to write my own version of libc? (The answer is almost always no, and most alternative implementations are for niche use cases.) Plus, if you want to make your OS POSIX-compliant, then you'll have to implement more functions, adding to the hassle.
Whether you write your own compiler for your OS is a minor detail compared to which C library will be included with it. You can always use your own compiler with an already-written implementation of the C library.
My advice to your rather opinion-based question: no. Port an existing compiler such as GCC or clang, and then use that. Plus, that has several advantages:
Compatibility with existing tools and toolchains
A familiar program (no need for your users to learn how to use a new compiler)
They're open source - and in spite of that, you'd be insane to go at it alone. Heck, even Apple integrated two already existing compilers - GCC and clang - into their toolchains rather than do it themselves, and they're a billion-dollar company.
Take a look at this page. It demonstrates how to port GCC to your OS using Newlib as your C library.
No, you can just port an existing compiler. You can even choose an existing executable format, such as ELF, and use your standard GCC + GNU Binutils toolchain. You will need to port the standard library and C runtime, and you will need to write an ELF loader into your operating system.
I suspect the majority of the work will be in porting the C library.
A search turned up this page: Porting GCC to your OS
(1) No, you usually don't have to write your own compiler. Writing a good optimizing compiler can be actually big task which I would better avoid.
But in order to enable writing applications for your OS in some higher level language you will either need to provide
some (2.1) API emulation layer (so that code written and compiled for other OS can be run on your OS)
or you'll have to (2.2) port some existing compiler to your OS
or at least make your OS a new available (2.3) target platform in an existing compiler
or some other option I don't know about
The choices are multiple each with its own pros/cons.
Some examples (other then the obvious GCC already mentioned by #dietrich-epp, #sevenbits) to help you decide which way you want to follow:
(3.1) Free Pascal (see http://www.freepascal.org) compiler can be extended with another target platform
Free Pascal is a 32,64 and 16 bit professional Pascal compiler. It can target multiple processor architectures: Intel x86, AMD64/x86-64, PowerPC, PowerPC64, SPARC, and ARM. Supported operating systems include Linux, FreeBSD, Haiku, Mac OS X/iOS/Darwin, DOS, Win32, Win64, WinCE, OS/2, MorphOS, Nintendo GBA, Nintendo DS, and Nintendo Wii. Additionally, JVM, MIPS (big and little endian variants), i8086 and Motorola 68k architecture targets are available in the development versions
...
Source: http://www.freepascal.org
(3.2) Inferno Operating System (see http://www.vitanuova.com/inferno) has its own application language (see Limbo) with OS specific words, own compiler etc. Applications run in virtual machine (see Dis)
Inferno® is a compact operating system designed for building distributed and networked systems on a wide variety of devices and platforms. With many advanced and unique features, Inferno puts an unrivalled set of tools into your hands...Inferno can run as a user application on top of an existing operating system or as a stand alone operating system...
Source: http://www.vitanuova.com/inferno
(3.3) Squeak (see http://en.wikipedia.org/wiki/Squeak) is a self contained OS with graphics and everything. It uses Smalltalk-80 as the language. Compiler included, applications run in virtual machine (see Cog VM). The VM could be emitted as portable C code and then ported to a bare-bone hardware.
Squeak is a modern, open source, full-featured implementation of the powerful Smalltalk programming language and environment. Squeak is highly-portable, running on almost any platform you could name and you can really truly write once run anywhere. Squeak is the vehicle for a wide range of projects from multimedia applications and educational platforms to commercial web application development...
Source: http://www.squeak.org
(3.4) MenuetOS (see http://www.menuetos.net/) is 64bit OS written in assembly language. Flat Assembler (see FASM) compiler which can emit native binaries was ported to the OS including OS API and is included in basic installation. Later on C library was also ported
MenuetOS is an Operating System in development for the PC written entirely in 32/64 bit assembly language...supports 32/64 bit x86 assembly programming for smaller, faster and less resource hungry applications...Menuet isn't based on other operating system nor has it roots within UNIX or the POSIX standards. The design goal, since the first release in year 2000, has been to remove the extra layers between different parts of an OS, which normally complicate programming and create bugs...
Source: http://www.menuetos.net
(3.5) Google's Android OS (see Wikipedia: Android (operating system)) ported Java Virtual Machine (see Dalvik later replaced by Android Runtime) and provided OS APIs for the Java programming language, reusing existing compilers and IDEs just consuming the produced binaries
Android Runtime (ART) is an application runtime environment used by the Android mobile operating system. ART replaces Dalvik, which is the process virtual machine originally used by Android, and performs transformation of the application's bytecode into native instructions that are later executed by the device's runtime environment...
Source: http://en.wikipedia.org/wiki/Android_Runtime
There are many more useful examples available. Whether you have to or don't have to basically depends on the programming paradigm your new OS will introduce. Why you want to build it and how will it differ from the existing ones.
Examples for no are: (3.1), (3.4), (3.5)
Examples for yes are: (3.2), (3.3)

I'm confused with C libraries

Ok here's the thing.
Most people learn about the C standard library simultaneously as they first get in contact with the C language and I wasn't an exception either. But as I am studying linux now, I tend to get confused with C libraries. well first, I know that you get a nice old C standard lib as you install gcc on your linux distro as a static lib. After that, you get a new stable version of glibc pretty soon as you connect to the internet.
I started to look into glibc API and here's where I got messed up. glibc seems to support vast amount of lib basically starting from POSIX C Standard lib (which implements the standard C lib(including C99 as I know of)) to it's own extensions based on the POSIX standard C lib.
Does this mean that glibc actually modified or added functions in the POSIX C Standard lib? or even add whole new header set? Cause I see some functions that are not in the standard C lib but actually included in the standard C header (such as strnlen() in
Also referring to what I mentioned about a 'glibc making whole new header set', is because I'm starting to see some header files that seems pretty unique such as linux/blahblah.h or sys/syscalls.h <= (are these the libs that only glibc support?)
Next Ques is that I actually heard linux is built based on C language. Does this mean linux compiles itself with it's own gcc compiler???????
For the first question, glibc follows both standard C and POSIX, from About glibc
The GNU C Library is primarily designed to be a portable and high performance C library. It follows all relevant standards including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.
For the second question, yes, you can compile Linux using gcc. Even gcc itself can be compiled using gcc, it's called bootstrapping.
Glibc implements the POSIX, ANSI and ISO C standards, and adds its own 'fluff', which it calls "glibc extensions". The reason that they are all "mixed together" is because they wrote the library as one package, there is no separate POSIX-only glibc.
<linux/blah> is not part of glibc. It is a set headers written specifically for the operating system, by people outside of glibc, to give the programmer access to the Linux kernel API. It is "part" of the Linux kernel and is installed with it, and is used for kernel hacking. <sys/blah> is part of glibc, and is specific to Linux. It gives access to a fairly abstracted Linux system API.
As for your second question, yes. Linux is written in C, as it is (according to Linus) the only programming language for kernel and system programming. The way this is done is through a technique called bootstrapping, where a small compiler is built (usually manually in ASM) and builds the entire kernel or the entirety of GCC.
There is one more thing to be aware of: one of the purposes of the libc is to abstract from the actual system kernel. As such, the libc is the one part of your app that is kernel specific. If you had a different kernel with different syscalls, you would need to have a specially compiled libc. AFAIK, the libc is therefore usually linked as a shared library.
On linux, we usually have the glibc installed, because linux systems usually are GNU/Linux systems with a GNU toolchain on top of the linux kernel.
And yes, the glibc does expand the standards in certain spots: The asprintf() function for instance originated as a gnu-addition. It almost made it into the C11 standard subsequently, but until it becomes part of them, it's use will require a glibc-based system, or statically linking with the glibc.
By default, the glibc headers do not define these gnu additions. You can switch them on by defining the preprocessor macro GNU_SOURCE before including the appropriate headers, or by specifying -std=gnu11 to the gcc call.

A simple explanation of what is MinGW

I'm an avid Python user and it seems that I require MinGW to be installed on my Windows machine to compile some libraries. I'm a little confused about MinGW and GCC. Here's my question (from a real dummy point of view):
So Python is language which both interpreted and compiled. There are Linux and Windows implementations of Python which one simply installs and used the binary to a execute his code. They come bundled with a bunch of built-in libraries that you can use. It's the same with Ruby from what I've read.
Now, I've done a tiny bit a of C and I know that one has a to compile it. It has its built-in libraries which seem to be called header files which you can use. Now, back in the school day's, C, was writing code in a vi-like IDE called Turbo-C and then hitting F9 to compile it. That's pretty much where my C education ends.
What is MinGW and what is GCC? I've been mainly working on Windows systems and have even recently begun using Cygwin. Aren't they the same?
A simple explanation hitting these areas would be helpful.
(My apologies if this post sounds silly/stupid. I thought I'd ask here. Ignoring these core bits never made anyone a better programmer.)
Thanks everyone.
MinGW is a complete GCC toolchain (including half a dozen frontends, such as C, C++, Ada, Go, and whatnot) for the Windows platform which compiles for and links to the Windows OS component C Runtime Library in msvcrt.dll. Rather it tries to be minimal (hence the name).
This means, unlike Cygwin, MinGW does not attempt to offer a complete POSIX layer on top of Windows, but on the other hand it does not require you to link with a special compatibility library.
It therefore also does not have any GPL-license implications for the programs you write (notable exception: profiling libraries, but you will not normally distribute those so that does not matter).
The newer MinGW-w64 comes with a roughly 99% complete Windows API binding (excluding ATL and such) including x64 support and experimental ARM implementations. You may occasionally find some exotic constant undefined, but for what 99% of the people use 99% of the time, it just works perfectly well.
You can also use the bigger part of what's in POSIX, as long as it is implemented in some form under Windows. The one major POSIX thing that does not work with MinGW is fork, simply because there is no such thing under Windows (Cygwin goes through a lot of pain to implement it).
There are a few other minor things, but all in all, most things kind of work anyway.
So, in a very very simplified sentence: MinGW(-w64) is a "no-frills compiler thingie" that lets you write native binary executables for Windows, not only in C and C++, but also other languages.
To compile C program you need a C implementation for your specific computer.
C implementations consist, basically, of a compiler (its preprocesser and headers) and a library (the ready-made executable code).
On a computer with Windows installed, the library that contains most ready-made executable code is not compatible with gcc compiler ... so to use this compiler in Windows you need a different library: that's where MinGW enters. MinGW provides, among other things, the library(ies) needed for making a C implementation together with gcc.
The Windows library and MSVC together make a different implementation.
MinGW is a suite of development tools that contains GCC (among others), and GCC is a C compiler within that suite.
MinGW is an implementation of most of the GNU building utilities, like gcc and make on windows, while gcc is only the compiler. Cygwin is a lot bigger and sophisticated package, wich installs a lot more than MinGW.
The only reason for existence of MinGW is to provide linux-like environment for developers not capable of using native windows tools. It is inferior in almost every respect to Microsoft tooolchains on Win32/Win64 platforms, BUT it provides environment where linux developer does not have to learn anything new AND he/she can compile linux code almost without modifications. It is a questionable approach , but many people find that convenience more important than other aspects of the development .
It has nothing to do with C or C++ as was indicated in earlier answers, it has everything to do with the environment developer wants. Argument about GNU toolchains on windows and its nessessety, is just that - an argument
GCC - unix/linux compiler,
MinGW - approximation of GCC on Windows environment,
Microsoft compiler and Intel compiler - more of the same as names suggest(both produce much , much better programs on Windows then MinGW, btw)

C libraries are distributed along with compilers or directly by the OS?

As per my understanding, C libraries must be distributed along with compilers. For example, GCC must be distributing it's own C library and Forte must be distributing it's own C library. Is my understanding correct?
But, can a user library compiled with GCC work with Forte C library? If both the C libraries are present in a system, which one will get invoked during run time?
Also, if an application is linking to multiple libraries some compiled with GCC and some with Forte, will libraries compiled with GCC automatically link to the GCC C library and will it behave likewise for Forte.
GCC comes with libgcc which includes helper functions to do things like long division (or even simpler things like multiplication on CPUs with no multiply instruction). It does not require a specific libc implementation. FreeBSD uses a BSD derived one, glibc is very popular on Linux and there are special ones for embedded systems like avr-libc.
Systems can have many libraries installed (libc and other) and the rules for selecting them vary by OS. If you link statically it's entirely determined at compile time. If you link dynamically there are versioning and path rules which come into play. Generally you cannot mix and match at runtime because of bits of the library (from headers) that got compiled into the executable.
The compile products of two compilers should be compatible if they both follow the ABI for the platform. That's the purpose of defining specific register and calling conventions.
As far as Solaris is concerned, you assumption is incorrect. Being the interface between the kernel and the userland, the standard C library is provided with the operating system. That means whatever C compiler you use (Forte/studio or gcc), the same libc is always used. In any case, the rare ports of the Gnu standard C library (glibc) to Solaris are quite limited and probably lacking too much features to be usable. http://csclub.uwaterloo.ca/~dtbartle/opensolaris/
None of the other answers (yet) mentions an important feature that promotes interworking between compilers and libraries - the ABI or Application Binary Interface. On Unix-like machines, there is a well documented ABI, and the C compilers on the system all follow the ABI. This allows a great deal of mix'n'match. Normally, you use the system-provided C library, but you can use a replacement version provided with a compiler, or created separately. And normally, you can use a library compiled by one compiler with programs compiled by other compilers.
Sometimes, one compiler uses a runtime support library for some operations - perhaps 64-bit arithmetic routines on a 32-bit machine. If you use a library built with this compiler as part of a program built with another compiler, you may need to link this library. However, I've not seen that as a problem for a long time - with pure C.
Linking C++ is a different matter. There isn't the same degree of interworking between different C++ compilers - they disagree on details of class layout (vtables, etc) and on how exception handling is done, and so on. You have to work harder to create libraries built with one C++ compiler that can be used by others.
Only few things of the C library are mandatory in the sense that they are not needed for a freestanding environment. It only has to provide what is necessary for the headers
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>
These usually don't implement a lot of functions that must be provided.
The other type of environments are called "hosted" environments. As the name indicated they suppose that there is some entity that "hosts" the running program, usually the OS. So usually the C library is provided by that "hosting environment", but as Ben said, on different systems there may even be alternative implementations.
Forte? That's really old.
The preferred compilers and developer tools for Solaris are all contained in Oracle Solaris Studio.
C/C++/Fortran with a debugger, performance analyzer, and IDE based on NetBeans, and lots of libraries.
http://www.oracle.com/technetwork/server-storage/solarisstudio/index.html
It's (still) free, too.
I think there a is a bit of confusion about terms: a library is NOT DLL's or .so: in the real sense of programming languages, Libraries are compiled code the LINKER will merge with our binary (.o). So the linker (or the compiler via some directives...) can manage them, but OS can't, simply is NOT a concept related to OS.
We are used to think OSes are written in C and we can rebuild the OS using gcc/libraries or similar, but C is NOT linux / unix.
We can also have an OS written in Pascal (Mac OS was in this manner many years ago..) AND use libraries with our favorite C compiler, OR have an OS written in ASM (even if not all, as in first Windows version), but we must have C libraries to build an exe.

Resources