Subtleties dealing with cross compilation, freestanding libgcc, etc - c

I have a few questions about https://wiki.osdev.org/Meaty_Skeleton, which says:
The GCC documentation explicitly states that libgcc requires the
freestanding environment to supply the memcmp, memcpy, memmove, and
memset functions, as well as abort on some platforms. We will satisfy
this requirement by creating a special kernel C library (libk) that
contains the parts of the user-space libc that are freestanding
(doesn't require any kernel features) as opposed to hosted libc
features that need to do system calls
Alright, I understand that libgcc is a 'private library' that is used by gcc, i.e. It has significance only during the compilation process, when gcc is in use, kind of like a helper for gcc. Is this right? I understand that the machine I run gcc on is called the build machine, and the host machine is my own OS. What is the freestanding environment specified here? Since gcc runs on the build machine, libgcc must use the build machine libgcc I guess? Where does freestanding come into the picture?
Also, what is the libk? I think that I really don't understand freestanding and hosted environments yet :(

libgcc
This library must be linked in some cases when you compile with GCC. Because GCC produces calls to functions in this library, if the target architecture doesn't support a specific feature.
For example, if you use 64-Bit arithmetic in your kernel and you want compile for i386, then GCC make a call to the specific function which resides in libgcc:
uint64_t div64(uint64_t dividend, uint64_t divisor)
{
return (dividend / divisor);
}
Here you get an undefined reference to __udivdi3 error from the linker when you try to link your kernel without libgcc. On the other side, libgcc also make calls to standard C library functions. So these specific functions must be implemented.
Here is another IMHO good explanation. It's called bare-metal programming here for the ARM architecture, but also valid for x86.

Related

Is ARM toolchain's gcc regular c or embedded c?

I'm unsure which I'm using. I look it up and some answers say that it's the the host machine that determines if c is embedded c or not, so like, pc -> c, mcu -> embedded c.
edit: I use arm-none-eabi-xxx
arm-none-eabi-gcc is a cross-compiler for bare-metal ARM. ARM is the target, the host is the machine you run the compiler on - the development host.
In most cases you would use such a compiler for developing for bare-metal (i.e. no fully featured OS) embedded systems, but equally you could be using it to develop bootstrap code for a desktop system, or developing an operating system (though less likely).
In any event, it is not the compiler that determines if the system is embedded. An embedded system is simply a system running software that is not a general-purpose computer. For example many network routers run on embedded Linux such as OpenWRT and in that case you might use arm-linux-eabi-gcc. What distinguishes it is that it is still a cross-compiler; the host on which you build the code, is not the same machine architecture or OS as that which will run it.
Neither does being a cross-compiler make it "embedded" - it is entirely possible to cross compile Linux executables on Windows or vice versa with neither target being embedded.
The point is, there is no "Embedded C" language, it is all "Regular C". What makes it embedded is the intended target. So don't get hung-up on classification of the toolchain. If you are using it to generate code for embedded systems it is embedded, simple as that. Many years ago I worked on a project that used Microsoft C V6 (normally for MS-DOS) in an embedded system with code on ROM and running VRTX RTOS. Microsoft C was by no definition an embedded systems compiler; the "magic" was done by replacing Microsoft's linker to "retarget" it to produce ROM images rather than a .exe.
There are of course some targets that are invariably "embedded", such as 8051, and a cross-compiler for such a target could be said to be exclusively "embedded" I guess, but it is a distinction that serves little purpose.
There are many versions of arm gcc toolchains. The established naming convention for cross compilers is,
arm-linux-xxx - this is a 'Regular C', where the library is supported by the Linux OS. The 'xxx' is an ABI format.
arm-none-xxx - this is the 'embedded C'. The 'xxx' again indicates an ABI. It is generally 'newlib' based. Newlib can be hosted by an RTOS and then it will be almost 'Regular C'. If it is 'baremetal' and the newlib features are defaulted to error this is most likely termed 'embedded C'.
Embedded 'C++' was a term that was common sometime ago, but has generally been abandoned as a technology. As stated, the compiler itself is independant. However, the library/OS is typically what defines the 'spirit' of what you asked.
The ABI can be something like 'gnueabhf' for hard floating point, etc. It is a calling convention between routines and often depends on whether there is an FPU on the system or not. It may also depend on ISA, PIC, etc.
The C language allows two different flavours of targets: hosted and freestanding. Hosted means programs running on top of a OS like Windows or Linux, freestanding meaning everything else. Examples of freestanding systems are "bare metal" microcontrollers, RTOS on microcontrollers, the hosted OS itself.
There are just a few minor differences between hosted and freestanding:
The standard library
Hosted compilers must support all mandatory parts of the C standard library.
Freestanding compilers only need to implement <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h> and <stdnoreturn.h>. All other standard headers are optional.
Valid forms of main()
Hosted compilers must support int main(void) and int main(int argc, char *argv[]). Other implementation-defined forms of main() need not be supported.
Freestanding compilers always use an implementation-defined form of main(), since there is nobody to return to. The most common form is void main (void).
"Embedded C" just means C for embedded systems. It's not a formal term. (For example not to be confused with for example Embedded C++ which was actually a subset dialect of the C++ language.)
Embedded systems are usually freestanding systems. The gcc-arm-none-eabi compiler is the compiler port for freestanding ARM systems (using ARM's Embedded ABI). It will not come with various OS-specific libs.
When using gcc to compile for freestanding systems you should use the -ffreestanding option. This enables the common implementation-defined form of main() and together with -fno-builtin it might block various standard library function calls from getting "inlined" into your code. Which we don't want since those libs might not even be present.

Is the executable code generated by a C compiler on Z/OS executable on a an IBM z/Linux platform?

In the IBM environment a hybrid C / Assembler compiler call Metal C exists and it permits C source programs to be intermixed with Assembler source programs and compiled as an executable.
My question is this, ignoring all the external linkage references that the resultant executable might have for a moment, could the combined C / Assembler executable run on a z/Linux platform, its just machine instructions, right?
The underlying Operating System and its associated call stack determines compatibility. z/OS uses a specific calling convention, system calls and libraries that are fundamentally different than Linux. Thus they are not binary compatible.
That said, if you are using languages that are available on both platforms then you may be able to simply recompile and "port" your application.

I'm confused with C libraries

Ok here's the thing.
Most people learn about the C standard library simultaneously as they first get in contact with the C language and I wasn't an exception either. But as I am studying linux now, I tend to get confused with C libraries. well first, I know that you get a nice old C standard lib as you install gcc on your linux distro as a static lib. After that, you get a new stable version of glibc pretty soon as you connect to the internet.
I started to look into glibc API and here's where I got messed up. glibc seems to support vast amount of lib basically starting from POSIX C Standard lib (which implements the standard C lib(including C99 as I know of)) to it's own extensions based on the POSIX standard C lib.
Does this mean that glibc actually modified or added functions in the POSIX C Standard lib? or even add whole new header set? Cause I see some functions that are not in the standard C lib but actually included in the standard C header (such as strnlen() in
Also referring to what I mentioned about a 'glibc making whole new header set', is because I'm starting to see some header files that seems pretty unique such as linux/blahblah.h or sys/syscalls.h <= (are these the libs that only glibc support?)
Next Ques is that I actually heard linux is built based on C language. Does this mean linux compiles itself with it's own gcc compiler???????
For the first question, glibc follows both standard C and POSIX, from About glibc
The GNU C Library is primarily designed to be a portable and high performance C library. It follows all relevant standards including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.
For the second question, yes, you can compile Linux using gcc. Even gcc itself can be compiled using gcc, it's called bootstrapping.
Glibc implements the POSIX, ANSI and ISO C standards, and adds its own 'fluff', which it calls "glibc extensions". The reason that they are all "mixed together" is because they wrote the library as one package, there is no separate POSIX-only glibc.
<linux/blah> is not part of glibc. It is a set headers written specifically for the operating system, by people outside of glibc, to give the programmer access to the Linux kernel API. It is "part" of the Linux kernel and is installed with it, and is used for kernel hacking. <sys/blah> is part of glibc, and is specific to Linux. It gives access to a fairly abstracted Linux system API.
As for your second question, yes. Linux is written in C, as it is (according to Linus) the only programming language for kernel and system programming. The way this is done is through a technique called bootstrapping, where a small compiler is built (usually manually in ASM) and builds the entire kernel or the entirety of GCC.
There is one more thing to be aware of: one of the purposes of the libc is to abstract from the actual system kernel. As such, the libc is the one part of your app that is kernel specific. If you had a different kernel with different syscalls, you would need to have a specially compiled libc. AFAIK, the libc is therefore usually linked as a shared library.
On linux, we usually have the glibc installed, because linux systems usually are GNU/Linux systems with a GNU toolchain on top of the linux kernel.
And yes, the glibc does expand the standards in certain spots: The asprintf() function for instance originated as a gnu-addition. It almost made it into the C11 standard subsequently, but until it becomes part of them, it's use will require a glibc-based system, or statically linking with the glibc.
By default, the glibc headers do not define these gnu additions. You can switch them on by defining the preprocessor macro GNU_SOURCE before including the appropriate headers, or by specifying -std=gnu11 to the gcc call.

C libraries are distributed along with compilers or directly by the OS?

As per my understanding, C libraries must be distributed along with compilers. For example, GCC must be distributing it's own C library and Forte must be distributing it's own C library. Is my understanding correct?
But, can a user library compiled with GCC work with Forte C library? If both the C libraries are present in a system, which one will get invoked during run time?
Also, if an application is linking to multiple libraries some compiled with GCC and some with Forte, will libraries compiled with GCC automatically link to the GCC C library and will it behave likewise for Forte.
GCC comes with libgcc which includes helper functions to do things like long division (or even simpler things like multiplication on CPUs with no multiply instruction). It does not require a specific libc implementation. FreeBSD uses a BSD derived one, glibc is very popular on Linux and there are special ones for embedded systems like avr-libc.
Systems can have many libraries installed (libc and other) and the rules for selecting them vary by OS. If you link statically it's entirely determined at compile time. If you link dynamically there are versioning and path rules which come into play. Generally you cannot mix and match at runtime because of bits of the library (from headers) that got compiled into the executable.
The compile products of two compilers should be compatible if they both follow the ABI for the platform. That's the purpose of defining specific register and calling conventions.
As far as Solaris is concerned, you assumption is incorrect. Being the interface between the kernel and the userland, the standard C library is provided with the operating system. That means whatever C compiler you use (Forte/studio or gcc), the same libc is always used. In any case, the rare ports of the Gnu standard C library (glibc) to Solaris are quite limited and probably lacking too much features to be usable. http://csclub.uwaterloo.ca/~dtbartle/opensolaris/
None of the other answers (yet) mentions an important feature that promotes interworking between compilers and libraries - the ABI or Application Binary Interface. On Unix-like machines, there is a well documented ABI, and the C compilers on the system all follow the ABI. This allows a great deal of mix'n'match. Normally, you use the system-provided C library, but you can use a replacement version provided with a compiler, or created separately. And normally, you can use a library compiled by one compiler with programs compiled by other compilers.
Sometimes, one compiler uses a runtime support library for some operations - perhaps 64-bit arithmetic routines on a 32-bit machine. If you use a library built with this compiler as part of a program built with another compiler, you may need to link this library. However, I've not seen that as a problem for a long time - with pure C.
Linking C++ is a different matter. There isn't the same degree of interworking between different C++ compilers - they disagree on details of class layout (vtables, etc) and on how exception handling is done, and so on. You have to work harder to create libraries built with one C++ compiler that can be used by others.
Only few things of the C library are mandatory in the sense that they are not needed for a freestanding environment. It only has to provide what is necessary for the headers
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>
These usually don't implement a lot of functions that must be provided.
The other type of environments are called "hosted" environments. As the name indicated they suppose that there is some entity that "hosts" the running program, usually the OS. So usually the C library is provided by that "hosting environment", but as Ben said, on different systems there may even be alternative implementations.
Forte? That's really old.
The preferred compilers and developer tools for Solaris are all contained in Oracle Solaris Studio.
C/C++/Fortran with a debugger, performance analyzer, and IDE based on NetBeans, and lots of libraries.
http://www.oracle.com/technetwork/server-storage/solarisstudio/index.html
It's (still) free, too.
I think there a is a bit of confusion about terms: a library is NOT DLL's or .so: in the real sense of programming languages, Libraries are compiled code the LINKER will merge with our binary (.o). So the linker (or the compiler via some directives...) can manage them, but OS can't, simply is NOT a concept related to OS.
We are used to think OSes are written in C and we can rebuild the OS using gcc/libraries or similar, but C is NOT linux / unix.
We can also have an OS written in Pascal (Mac OS was in this manner many years ago..) AND use libraries with our favorite C compiler, OR have an OS written in ASM (even if not all, as in first Windows version), but we must have C libraries to build an exe.

What GNU C extensions are available that aren't trivial to implement in C99?

How come the Linux kernel can compile only with GCC? What GNU C extensions are really necessary for some projects and why?
Here's a couple gcc extensions the Linux kernel uses:
inline assembly
gcc builtins, such as __builtin_expect,__builtin_constant,__builtin_return_address
function attributes to specify e.g. what registers to use (e.g. __attribute__((regparm(0)),__attribute__((packed, aligned(PAGE_SIZE))) ) )
specific code depending on gcc predefined macros (e.g. workarounds for certain gcc bugs in certain versions)
ranges in switch cases (case 8 ... 15:)
Here's a few more: http://www.ibm.com/developerworks/linux/library/l-gcc-hacks/
Much of these gcc specifics are very architecture dependent, or is made possible because of how gcc is implemented, and probably do not make sense to be specified by a C standard. Others are just convenient extensions to C. As the Linux kernel is built to rely on these extensions, other compilers have to provide the same extensions as gcc to be able to build the kernel.
It's not that Linux had to rely on these features of gcc, e.g. the NetBSD kernel relies very little on gcc specific stuff.
GCC supports Nested Functions, which are not part of the C99 standard. That said, some analysis is required to see how prevalent they actually are within the linux kernel.
Linux kernel was written to be compiled by GCC, so standard compliance was never an objective for kernel developers.
And if GCC offers some useful extensions that make coding easier or compiled kernel smaller or faster, it was a natural choice to use these extensions.
I guess it's not they are really that necessary. Just there are many useful ones and cross-compiler portability is not that much an issue for Linux kernel to forgo the niceties. Not to mention sheer amount of work it would take to get rid of relying on extensions.

Resources